From patchwork Mon Dec 18 17:18:19 2023
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 180587
From: "Roger Sayle"
To:
Cc: "'Uros Bizjak'"
Subject: [x86 PATCH] Improved TImode (128-bit) integer constants on x86_64.
Date: Mon, 18 Dec 2023 17:18:19 -0000
Message-ID: <01db01da31d6$33055200$990ff600$@nextmovesoftware.com>
List-Id: Gcc-patches mailing list

This patch fixes two issues with the handling of 128-bit TImode integer
constants in the x86_64 backend.  The main issue is that GCC always
tries to load 128-bit integer constants via broadcasts to vector SSE
registers, even if the result is required in general registers.
This is seen in the two closely related functions below:

__int128 m;
#define CONST (((__int128)0x0123456789abcdefULL<<64) | 0x0123456789abcdefULL)
void foo() { m &= CONST; }
void bar() { m = CONST; }

When compiled with -O2 -mavx, we currently generate:

foo:    movabsq $81985529216486895, %rax
        vmovq   %rax, %xmm0
        vpunpcklqdq     %xmm0, %xmm0, %xmm0
        vmovq   %xmm0, %rax
        vpextrq $1, %xmm0, %rdx
        andq    %rax, m(%rip)
        andq    %rdx, m+8(%rip)
        ret

bar:    movabsq $81985529216486895, %rax
        vmovq   %rax, %xmm1
        vpunpcklqdq     %xmm1, %xmm1, %xmm0
        vpextrq $1, %xmm0, %rdx
        vmovq   %xmm0, m(%rip)
        movq    %rdx, m+8(%rip)
        ret

With this patch we defer the decision to use vector broadcast for
TImode until we know we actually want an SSE register result, by moving
the call to ix86_convert_const_wide_int_to_broadcast from the RTL
expansion pass to the scalar-to-vector (STV) pass.  With this change
(and a minor tweak described below) we now generate:

foo:    movabsq $81985529216486895, %rax
        andq    %rax, m(%rip)
        andq    %rax, m+8(%rip)
        ret

bar:    movabsq $81985529216486895, %rax
        vmovq   %rax, %xmm0
        vpunpcklqdq     %xmm0, %xmm0, %xmm0
        vmovdqa %xmm0, m(%rip)
        ret

showing that we now correctly use vector mode broadcasts (only) where
appropriate.

The one minor tweak mentioned above is to enable the un-cprop hi/lo
optimization, which I originally contributed back in September 2004
https://gcc.gnu.org/pipermail/gcc-patches/2004-September/148756.html
even when not optimizing for size.  Without this (and currently with
just -O2) the function foo above generates:

foo:    movabsq $81985529216486895, %rax
        movabsq $81985529216486895, %rdx
        andq    %rax, m(%rip)
        andq    %rdx, m+8(%rip)
        ret

I'm not sure why (back in 2004) I thought that a second load, avoiding
the implicit "movq %rax, %rdx", was faster, perhaps avoiding a
dependency to allow better scheduling, but nowadays "movq %rax, %rdx"
is either eliminated by GCC's hardreg cprop pass, or special cased by
modern hardware, making the single-movabsq foo above preferable, not
only shorter but also faster.
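Both the single movabsq in foo and the broadcast in bar depend on the two 64-bit halves of CONST being the same value. The following is a minimal sketch that checks this in plain C, assuming a compiler with __int128 support; the helper names lo64, hi64, same_halves_p and check_const are illustrative only and are not part of GCC:

```c
#include <assert.h>
#include <stdint.h>

/* Same constant as in foo and bar.  */
#define CONST (((__int128)0x0123456789abcdefULL<<64) | 0x0123456789abcdefULL)

/* Hypothetical helper: low 64-bit word of a 128-bit value.  */
static uint64_t lo64 (__int128 x)
{
  return (uint64_t) x;
}

/* Hypothetical helper: high 64-bit word of a 128-bit value.  */
static uint64_t hi64 (__int128 x)
{
  return (uint64_t) ((unsigned __int128) x >> 64);
}

/* Nonzero if a single 64-bit immediate can supply both words of X.  */
static int same_halves_p (__int128 x)
{
  return lo64 (x) == hi64 (x);
}

/* The immediate in "movabsq $81985529216486895, %rax" is the shared half.  */
static void check_const (void)
{
  assert (lo64 (CONST) == 81985529216486895ULL);
  assert (same_halves_p (CONST));
}
```

Calling check_const () from a test program should pass silently; a constant whose halves differ would fail same_halves_p, and one 64-bit immediate could not then supply both words.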
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
and with/without -march=cascadelake with no new failures.
Ok for mainline?


2023-12-18  Roger Sayle

gcc/ChangeLog
        * config/i386/i386-expand.cc
        (ix86_convert_const_wide_int_to_broadcast): Remove static.
        (ix86_expand_move): Don't attempt to convert wide constants
        to SSE using ix86_convert_const_wide_int_to_broadcast here.
        (ix86_split_long_move): Always un-cprop multi-word constants.
        * config/i386/i386-expand.h
        (ix86_convert_const_wide_int_to_broadcast): Prototype here.
        * config/i386/i386-features.cc: Include i386-expand.h.
        (timode_scalar_chain::convert_insn): When converting TImode to
        v1TImode, try ix86_convert_const_wide_int_to_broadcast.

gcc/testsuite/ChangeLog
        * gcc.target/i386/movti-2.c: New test case.
        * gcc.target/i386/movti-3.c: Likewise.


Thanks in advance,
Roger

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index fad4f34..57a108a 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -289,7 +289,7 @@ ix86_broadcast (HOST_WIDE_INT v, unsigned int width,
 
 /* Convert the CONST_WIDE_INT operand OP to broadcast in MODE.  */
 
-static rtx
+rtx
 ix86_convert_const_wide_int_to_broadcast (machine_mode mode, rtx op)
 {
   /* Don't use integer vector broadcast if we can't move from GPR to SSE
@@ -541,14 +541,6 @@ ix86_expand_move (machine_mode mode, rtx operands[])
               return;
             }
         }
-      else if (CONST_WIDE_INT_P (op1)
-               && GET_MODE_SIZE (mode) >= 16)
-        {
-          rtx tmp = ix86_convert_const_wide_int_to_broadcast
-            (GET_MODE (op0), op1);
-          if (tmp != nullptr)
-            op1 = tmp;
-        }
     }
 }
 
@@ -6323,18 +6315,15 @@ ix86_split_long_move (rtx operands[])
         }
     }
 
-  /* If optimizing for size, attempt to locally unCSE nonzero constants.  */
-  if (optimize_insn_for_size_p ())
-    {
-      for (j = 0; j < nparts - 1; j++)
-        if (CONST_INT_P (operands[6 + j])
-            && operands[6 + j] != const0_rtx
-            && REG_P (operands[2 + j]))
-          for (i = j; i < nparts - 1; i++)
-            if (CONST_INT_P (operands[7 + i])
-                && INTVAL (operands[7 + i]) == INTVAL (operands[6 + j]))
-              operands[7 + i] = operands[2 + j];
-    }
+  /* Attempt to locally unCSE nonzero constants.  */
+  for (j = 0; j < nparts - 1; j++)
+    if (CONST_INT_P (operands[6 + j])
+        && operands[6 + j] != const0_rtx
+        && REG_P (operands[2 + j]))
+      for (i = j; i < nparts - 1; i++)
+        if (CONST_INT_P (operands[7 + i])
+            && INTVAL (operands[7 + i]) == INTVAL (operands[6 + j]))
+          operands[7 + i] = operands[2 + j];
 
   for (i = 0; i < nparts; i++)
     emit_move_insn (operands[2 + i], operands[6 + i]);
diff --git a/gcc/config/i386/i386-expand.h b/gcc/config/i386/i386-expand.h
index 997cb7d..e9e94bf 100644
--- a/gcc/config/i386/i386-expand.h
+++ b/gcc/config/i386/i386-expand.h
@@ -57,5 +57,6 @@ bool ix86_notrack_prefixed_insn_p (rtx_insn *);
 machine_mode ix86_split_reduction (machine_mode mode);
 void ix86_expand_divmod_libfunc (rtx libfunc, machine_mode mode, rtx op0,
                                  rtx op1, rtx *quot_p, rtx *rem_p);
+rtx ix86_convert_const_wide_int_to_broadcast (machine_mode mode, rtx op);
 
 #endif /* GCC_I386_EXPAND_H */
diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index e6fc135..3fcbb81 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -90,6 +90,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dwarf2out.h"
 #include "i386-builtins.h"
 #include "i386-features.h"
+#include "i386-expand.h"
 
 const char * const xlogue_layout::STUB_BASE_NAMES[XLOGUE_STUB_COUNT] = {
   "savms64",
@@ -1853,14 +1854,25 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
         {
           /* Since there are no instructions to store 128-bit constant,
              temporary register usage is required.  */
+          bool use_move;
           start_sequence ();
-          src = gen_rtx_CONST_VECTOR (V1TImode, gen_rtvec (1, src));
-          src = validize_mem (force_const_mem (V1TImode, src));
+          tmp = ix86_convert_const_wide_int_to_broadcast (TImode, src);
+          if (tmp)
+            {
+              src = lowpart_subreg (V1TImode, tmp, TImode);
+              use_move = true;
+            }
+          else
+            {
+              src = gen_rtx_CONST_VECTOR (V1TImode, gen_rtvec (1, src));
+              src = validize_mem (force_const_mem (V1TImode, src));
+              use_move = MEM_P (dst);
+            }
           rtx_insn *seq = get_insns ();
           end_sequence ();
           if (seq)
             emit_insn_before (seq, insn);
-          if (MEM_P (dst))
+          if (use_move)
             {
               tmp = gen_reg_rtx (V1TImode);
               emit_insn_before (gen_rtx_SET (tmp, src), insn);
diff --git a/gcc/testsuite/gcc.target/i386/movti-2.c b/gcc/testsuite/gcc.target/i386/movti-2.c
new file mode 100644
index 0000000..73f69d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/movti-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -mavx" } */
+__int128 m;
+
+void foo()
+{
+  m &= ((__int128)0x0123456789abcdefULL<<64) | 0x0123456789abcdefULL;
+}
+
+/* { dg-final { scan-assembler-times "movabsq" 1 } } */
+/* { dg-final { scan-assembler-not "vmovq" } } */
+/* { dg-final { scan-assembler-not "vpunpcklqdq" } } */
diff --git a/gcc/testsuite/gcc.target/i386/movti-3.c b/gcc/testsuite/gcc.target/i386/movti-3.c
new file mode 100644
index 0000000..535e5dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/movti-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -mavx" } */
+
+__int128 m;
+
+void bar()
+{
+  m = ((__int128)0x0123456789abcdefULL<<64) | 0x0123456789abcdefULL;
+}
+
+/* { dg-final { scan-assembler "vmovdqa" } } */
+/* { dg-final { scan-assembler-not "vpextrq" } } */
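
For readers unfamiliar with the operands[] layout, the unCSE loop that the ix86_split_long_move hunk now runs unconditionally can be modelled outside GCC roughly as follows. This is only a sketch under stated assumptions: struct operand, enum kind and uncse_parts are hypothetical stand-ins for rtx values and are not GCC APIs; dst[] plays the role of operands[2 + i] and src[] the role of operands[6 + i].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the rtx codes the real loop tests.  */
enum kind { KIND_CONST, KIND_REG };

struct operand {
  enum kind kind;
  int64_t value;   /* immediate if KIND_CONST, register number if KIND_REG */
};

/* Model of the loop: when part J is a nonzero constant going into a
   register, any later part with the same constant is rewritten to copy
   from that register instead of loading the immediate again.  */
static void uncse_parts (const struct operand *dst, struct operand *src,
                         int nparts)
{
  assert (nparts >= 1);
  for (int j = 0; j < nparts - 1; j++)
    if (src[j].kind == KIND_CONST
        && src[j].value != 0
        && dst[j].kind == KIND_REG)
      for (int i = j; i < nparts - 1; i++)
        if (src[i + 1].kind == KIND_CONST
            && src[i + 1].value == src[j].value)
          src[i + 1] = dst[j];
}
```

With two register destinations and both source parts equal to 81985529216486895, the second source becomes a copy of the first register, which corresponds to the "movq %rax, %rdx" discussed in the description. Zero parts are left alone, matching the const0_rtx test in the patch.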