From patchwork Mon Oct 30 17:26:58 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 159831
From: "Roger Sayle"
To:
Cc: "'Uros Bizjak'"
Subject: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2.
Date: Mon, 30 Oct 2023 17:26:58 -0000
Message-ID: <00c901da0b56$4a3ff750$debfe5f0$@nextmovesoftware.com>
List-Id: Gcc-patches mailing list

This patch is a follow-up to my previous PR target/110551 patch, this
time to address the additional move after mulx, seen on TARGET_BMI2
architectures (such as -march=haswell).  The complication here is
that the flexible multiple-set mulx instruction is introduced into RTL
after reload, by split2, and therefore can't benefit from register
preferencing.
This results in RTL like the following:

(insn 32 31 17 2 (parallel [
            (set (reg:DI 4 si [orig:101 r ] [101])
                (mult:DI (reg:DI 1 dx [109])
                    (reg:DI 5 di [109])))
            (set (reg:DI 5 di [ r+8 ])
                (umul_highpart:DI (reg:DI 1 dx [109])
                    (reg:DI 5 di [109])))
        ]) "pr110551-2.c":8:17 -1
     (nil))
(insn 17 32 9 2 (set (reg:DI 0 ax [107])
        (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ])
        (nil)))

Here insn 32, the mulx instruction, places its results in si and di,
and then immediately after decides to move di to ax, with di now dead.
This can be trivially cleaned up by a peephole2.  I've added an
additional constraint that the two SET_DESTs can't be the same
register to avoid confusing the middle-end, but this has well-defined
behaviour on x86_64/BMI2, encoding a umul_highpart.

For the new test case, compiled on x86_64 with -O2 -march=haswell:

Before:
mulx64: movabsq $-7046029254386353131, %rdx
        mulx    %rdi, %rsi, %rdi
        movq    %rdi, %rax
        xorq    %rsi, %rax
        ret

After:
mulx64: movabsq $-7046029254386353131, %rdx
        mulx    %rdi, %rsi, %rax
        xorq    %rsi, %rax
        ret

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-10-30  Roger Sayle

gcc/ChangeLog
        PR target/110551
        * config/i386/i386.md (*bmi2_umul<mode><dwi>3_1): Tidy condition
        as operands[2] with predicate register_operand must be !MEM_P.
        (peephole2): Optimize a mulx followed by a register-to-register
        move, to place result in the correct destination if possible.

gcc/testsuite/ChangeLog
        PR target/110551
        * gcc.target/i386/pr110551-2.c: New test case.

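For illustration only, the matching condition and operand rewrite of
the peephole2 can be modelled in plain C as below.  This is a sketch
and not GCC code: the struct, the function name fold_mov_into_mulx and
the use of bare integers for hard register numbers are assumptions
made for the example.  The register numbers 0 (ax), 4 (si) and 5 (di)
are the ones shown in the RTL dump above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the two destinations of a mulx, as hard
   register numbers.  */
struct mulx_dests
{
  int lo_dest;   /* receives the low half (mult) */
  int hi_dest;   /* receives the high half (umul_highpart) */
};

/* Model of the peephole2 for "mulx; mov mov_src -> mov_dst".
   Returns true and retargets one mulx destination when the move can
   be folded away, mirroring the condition in the patch:
   the move source is one of the mulx results, the other result does
   not already live in the move destination, and the move source is
   dead after the move.  */
static bool
fold_mov_into_mulx (struct mulx_dests *m, int mov_dst, int mov_src,
                    bool src_dead_after_mov)
{
  if (!src_dead_after_mov)
    return false;

  if (mov_src == m->lo_dest && m->hi_dest != mov_dst)
    {
      m->lo_dest = mov_dst;
      return true;
    }
  if (mov_src == m->hi_dest && m->lo_dest != mov_dst)
    {
      m->hi_dest = mov_dst;
      return true;
    }
  return false;
}

/* The case from the RTL dump: mulx writes si (4) and di (5), then di
   is copied to ax (0) and dies.  */
static void
demo_from_rtl_dump (void)
{
  struct mulx_dests m = { 4, 5 };
  assert (fold_mov_into_mulx (&m, 0, 5, true));
  assert (m.lo_dest == 4 && m.hi_dest == 0);
}
```

Calling demo_from_rtl_dump () exercises the si/di/ax case; the model
refuses to fold when the copied register is still live, or when
folding would make both mulx destinations the same register.
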
Thanks in advance,
Roger

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index eb4121b..a314f1a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9747,13 +9747,37 @@
 	  (match_operand:DWIH 3 "nonimmediate_operand" "rm")))
    (set (match_operand:DWIH 1 "register_operand" "=r")
 	(umul_highpart:DWIH (match_dup 2) (match_dup 3)))]
-  "TARGET_BMI2
-   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
+  "TARGET_BMI2"
   "mulx\t{%3, %0, %1|%1, %0, %3}"
   [(set_attr "type" "imulx")
    (set_attr "prefix" "vex")
    (set_attr "mode" "<MODE>")])
 
+;; Tweak *bmi2_umul<mode><dwi>3_1 to eliminate following mov.
+(define_peephole2
+  [(parallel [(set (match_operand:DWIH 0 "general_reg_operand")
+		   (mult:DWIH (match_operand:DWIH 2 "register_operand")
+			      (match_operand:DWIH 3 "nonimmediate_operand")))
+	      (set (match_operand:DWIH 1 "general_reg_operand")
+		   (umul_highpart:DWIH (match_dup 2) (match_dup 3)))])
+   (set (match_operand:DWIH 4 "general_reg_operand")
+	(match_operand:DWIH 5 "general_reg_operand"))]
+  "TARGET_BMI2
+   && ((REGNO (operands[5]) == REGNO (operands[0])
+        && REGNO (operands[1]) != REGNO (operands[4]))
+       || (REGNO (operands[5]) == REGNO (operands[1])
+	   && REGNO (operands[0]) != REGNO (operands[4])))
+   && peep2_reg_dead_p (2, operands[5])"
+  [(parallel [(set (match_dup 0) (mult:DWIH (match_dup 2) (match_dup 3)))
+	      (set (match_dup 1)
+		   (umul_highpart:DWIH (match_dup 2) (match_dup 3)))])]
+{
+  if (REGNO (operands[5]) == REGNO (operands[0]))
+    operands[0] = operands[4];
+  else
+    operands[1] = operands[4];
+})
+
 (define_insn "*umul<mode><dwi>3_1"
   [(set (match_operand:<DWI> 0 "register_operand" "=r,A")
 	(mult:<DWI>
diff --git a/gcc/testsuite/gcc.target/i386/pr110551-2.c b/gcc/testsuite/gcc.target/i386/pr110551-2.c
new file mode 100644
index 0000000..4936adf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110551-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -march=haswell" } */
+
+typedef unsigned long long uint64_t;
+
+uint64_t mulx64(uint64_t x)
+{
+    __uint128_t r = (__uint128_t)x * 0x9E3779B97F4A7C15ull;
+    return (uint64_t)r ^ (uint64_t)( r >> 64 );
+}
+
+/* { dg-final { scan-assembler-not "movq" } } */
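
As a reading aid for the new test case: mulx64 multiplies x by a
64-bit constant and XORs the low and high halves of the 128-bit
product, which are the two values a single mulx produces.  The
portable C sketch below computes the same result without __uint128_t.
The helper names mul64x64 and mulx64_model are hypothetical and are
not part of the patch or of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 64x64 -> 128 multiply built from 32-bit partial products.
   *lo receives bits 0..63 of the product, *hi bits 64..127.  */
static void
mul64x64 (uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
  const uint64_t mask = 0xffffffffu;
  uint64_t a_lo = a & mask, a_hi = a >> 32;
  uint64_t b_lo = b & mask, b_hi = b >> 32;

  uint64_t p0 = a_lo * b_lo;   /* contributes at bit 0 */
  uint64_t p1 = a_lo * b_hi;   /* contributes at bit 32 */
  uint64_t p2 = a_hi * b_lo;   /* contributes at bit 32 */
  uint64_t p3 = a_hi * b_hi;   /* contributes at bit 64 */

  /* Sum of everything landing in bits 32..63, with its carry kept in
     the upper half of mid; this cannot overflow 64 bits.  */
  uint64_t mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);

  *lo = (mid << 32) | (p0 & mask);
  *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Same computation as mulx64 in pr110551-2.c: low half XOR high half
   of x * 0x9E3779B97F4A7C15.  */
static uint64_t
mulx64_model (uint64_t x)
{
  uint64_t lo, hi;
  mul64x64 (x, 0x9E3779B97F4A7C15ull, &lo, &hi);
  return lo ^ hi;
}

/* Small sanity checks on values whose products are easy to verify by
   hand.  */
static void
check_mulx64_model (void)
{
  assert (mulx64_model (0) == 0);
  assert (mulx64_model (1) == 0x9E3779B97F4A7C15ull);
}
```

On a target with __uint128_t, mulx64_model (x) should agree with the
test case's mulx64 (x) for every x; the sketch only spells out which
two 64-bit halves are being combined.
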