From patchwork Tue Aug 29 08:06:23 2023
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 137107
Date: Tue, 29 Aug 2023 10:06:23 +0200
From: Jakub Jelinek
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-ssa-math-opts: Improve uaddc/usubc pattern matching [PR111209]
Hi!

The usual uaddc/usubc matching is of the .{ADD,SUB}_OVERFLOW pair in the
middle of a chain, which adds/subtracts the carry-in (from lower limbs) and
computes the carry-out (to higher limbs).  Before optimizations (unless the
user intentionally writes it that way already), all the steps look the same,
but optimizations simplify the handling of the least significant limb (the
one which adds/subtracts a 0 carry-in) to just a single .{ADD,SUB}_OVERFLOW,
and the handling of the most significant limb, if the computed carry-out is
ignored, to a normal addition/subtraction of multiple operands.

Now, match_uaddc_usubc has code to turn that least significant
.{ADD,SUB}_OVERFLOW call into a .U{ADD,SUB}C call with 0 carry-in if a more
significant limb above it is matched into .U{ADD,SUB}C; this isn't necessary
for functionality, as .ADD_OVERFLOW (x, y) is functionally equal to
.UADDC (x, y, 0) (provided the types of the operands are the same and the
result is a complex type with that type as element).  It also has code to
match the most significant limb with ignored carry-out (in that case one
pattern match turns both the penultimate limb pair of .{ADD,SUB}_OVERFLOW
into .U{ADD,SUB}C and the addition/subtraction of the 4 values (2 carries)
into another .U{ADD,SUB}C).

As the following patch shows, what we weren't handling is the case when one
uses either the __builtin_{add,sub}c builtins or hand-written forms thereof
(either __builtin_*_overflow or even written by hand) for just 2 limbs,
where the least significant limb has 0 carry-in and the most significant
limb ignores the carry-out.  The following patch matches that, e.g.
  _16 = .ADD_OVERFLOW (_1, _2);
  _17 = REALPART_EXPR <_16>;
  _18 = IMAGPART_EXPR <_16>;
  _15 = _3 + _4;
  _12 = _15 + _18;
into
  _16 = .UADDC (_1, _2, 0);
  _17 = REALPART_EXPR <_16>;
  _18 = IMAGPART_EXPR <_16>;
  _19 = .UADDC (_3, _4, _18);
  _12 = IMAGPART_EXPR <_19>;
so that we can emit better code.

As the two later comments show, we must do this carefully, because the pass
walks the IL from the first to the last stmt in a bb, and we must avoid
pattern matching something here that should instead be matched differently
on a later instruction.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-08-29  Jakub Jelinek

	PR middle-end/79173
	PR middle-end/111209
	* tree-ssa-math-opts.cc (match_uaddc_usubc): Match also just 2 limb
	uaddc/usubc with 0 carry-in on lower limb and ignored carry-out on
	higher limb.  Don't match it though if it could be matched later
	on 4 argument addition/subtraction.
	* gcc.target/i386/pr79173-12.c: New test.

	Jakub

--- gcc/tree-ssa-math-opts.cc.jj	2023-08-08 15:55:09.498122557 +0200
+++ gcc/tree-ssa-math-opts.cc	2023-08-28 20:51:31.893886862 +0200
@@ -4641,8 +4641,135 @@ match_uaddc_usubc (gimple_stmt_iterator
 	 __imag__ of something, verify it is .UADDC/.USUBC.  */
       tree rhs1 = gimple_assign_rhs1 (im);
       gimple *ovf = SSA_NAME_DEF_STMT (TREE_OPERAND (rhs1, 0));
+      tree ovf_lhs = NULL_TREE;
+      tree ovf_arg1 = NULL_TREE, ovf_arg2 = NULL_TREE;
       if (gimple_call_internal_p (ovf, code == PLUS_EXPR
-				       ? IFN_UADDC : IFN_USUBC)
+				       ? IFN_ADD_OVERFLOW
+				       : IFN_SUB_OVERFLOW))
+	{
+	  /* Or verify it is .ADD_OVERFLOW/.SUB_OVERFLOW.
+	     This is for the case of 2 chained .UADDC/.USUBC,
+	     where the first one uses 0 carry-in and the second
+	     one ignores the carry-out.
+	     So, something like:
+	     _16 = .ADD_OVERFLOW (_1, _2);
+	     _17 = REALPART_EXPR <_16>;
+	     _18 = IMAGPART_EXPR <_16>;
+	     _15 = _3 + _4;
+	     _12 = _15 + _18;
+	     where the first 3 statements come from the lower
+	     limb addition and the last 2 from the higher limb
+	     which ignores carry-out.  */
+	  ovf_lhs = gimple_call_lhs (ovf);
+	  tree ovf_lhs_type = TREE_TYPE (TREE_TYPE (ovf_lhs));
+	  ovf_arg1 = gimple_call_arg (ovf, 0);
+	  ovf_arg2 = gimple_call_arg (ovf, 1);
+	  /* In that case we need to punt if the types don't
+	     match.  */
+	  if (!types_compatible_p (type, ovf_lhs_type)
+	      || !types_compatible_p (type, TREE_TYPE (ovf_arg1))
+	      || !types_compatible_p (type,
+				      TREE_TYPE (ovf_arg2)))
+	    ovf_lhs = NULL_TREE;
+	  else
+	    {
+	      for (int i = (code == PLUS_EXPR ? 1 : 0);
+		   i >= 0; --i)
+		{
+		  tree r = gimple_call_arg (ovf, i);
+		  if (TREE_CODE (r) != SSA_NAME)
+		    continue;
+		  if (uaddc_is_cplxpart (SSA_NAME_DEF_STMT (r),
+					 REALPART_EXPR))
+		    {
+		      /* Punt if one of the args which isn't
+			 subtracted isn't __real__; that could
+			 then prevent better match later.
+			 Consider:
+			 _3 = .ADD_OVERFLOW (_1, _2);
+			 _4 = REALPART_EXPR <_3>;
+			 _5 = IMAGPART_EXPR <_3>;
+			 _7 = .ADD_OVERFLOW (_4, _6);
+			 _8 = REALPART_EXPR <_7>;
+			 _9 = IMAGPART_EXPR <_7>;
+			 _12 = _10 + _11;
+			 _13 = _12 + _9;
+			 _14 = _13 + _5;
+			 We want to match this when called on
+			 the last stmt as a pair of .UADDC calls,
+			 but without this check we could turn
+			 that prematurely on _13 = _12 + _9;
+			 stmt into .UADDC with 0 carry-in just
+			 on the second .ADD_OVERFLOW call and
+			 another replacing the _12 and _13
+			 additions.  */
+		      ovf_lhs = NULL_TREE;
+		      break;
+		    }
+		}
+	    }
+	  if (ovf_lhs)
+	    {
+	      use_operand_p use_p;
+	      imm_use_iterator iter;
+	      tree re_lhs = NULL_TREE;
+	      FOR_EACH_IMM_USE_FAST (use_p, iter, ovf_lhs)
+		{
+		  gimple *use_stmt = USE_STMT (use_p);
+		  if (is_gimple_debug (use_stmt))
+		    continue;
+		  if (use_stmt == im)
+		    continue;
+		  if (!uaddc_is_cplxpart (use_stmt,
+					  REALPART_EXPR))
+		    {
+		      ovf_lhs = NULL_TREE;
+		      break;
+		    }
+		  re_lhs = gimple_assign_lhs (use_stmt);
+		}
+	      if (ovf_lhs && re_lhs)
+		{
+		  FOR_EACH_IMM_USE_FAST (use_p, iter, re_lhs)
+		    {
+		      gimple *use_stmt = USE_STMT (use_p);
+		      if (is_gimple_debug (use_stmt))
+			continue;
+		      internal_fn ifn
+			= gimple_call_internal_fn (ovf);
+		      /* Punt if the __real__ of lhs is used
+			 in the same .*_OVERFLOW call.
+			 Consider:
+			 _3 = .ADD_OVERFLOW (_1, _2);
+			 _4 = REALPART_EXPR <_3>;
+			 _5 = IMAGPART_EXPR <_3>;
+			 _7 = .ADD_OVERFLOW (_4, _6);
+			 _8 = REALPART_EXPR <_7>;
+			 _9 = IMAGPART_EXPR <_7>;
+			 _12 = _10 + _11;
+			 _13 = _12 + _5;
+			 _14 = _13 + _9;
+			 We want to match this when called on
+			 the last stmt as a pair of .UADDC calls,
+			 but without this check we could turn
+			 that prematurely on _13 = _12 + _5;
+			 stmt into .UADDC with 0 carry-in just
+			 on the first .ADD_OVERFLOW call and
+			 another replacing the _12 and _13
+			 additions.  */
+		      if (gimple_call_internal_p (use_stmt, ifn))
+			{
+			  ovf_lhs = NULL_TREE;
+			  break;
+			}
+		    }
+		}
+	    }
+	}
+      if ((ovf_lhs
+	   || gimple_call_internal_p (ovf,
+				      code == PLUS_EXPR
+				      ? IFN_UADDC : IFN_USUBC))
 	  && (optab_handler (code == PLUS_EXPR
 			     ? uaddc5_optab : usubc5_optab,
 			     TYPE_MODE (type))
@@ -4668,6 +4795,26 @@ match_uaddc_usubc (gimple_stmt_iterator
 					   TREE_TYPE (ilhs), nlhs));
 	  gsi_replace (gsi, g, true);
+	  /* And if it is initialized from result of __imag__
+	     of .{ADD,SUB}_OVERFLOW call, replace that
+	     call with .U{ADD,SUB}C call with the same arguments,
+	     just 0 added as third argument.  This isn't strictly
+	     necessary, .ADD_OVERFLOW (x, y) and .UADDC (x, y, 0)
+	     produce the same result, but may result in better
+	     generated code on some targets where the backend can
+	     better prepare in how the result will be used.  */
+	  if (ovf_lhs)
+	    {
+	      tree zero = build_zero_cst (type);
+	      g = gimple_build_call_internal (code == PLUS_EXPR
+					      ? IFN_UADDC
+					      : IFN_USUBC,
+					      3, ovf_arg1,
+					      ovf_arg2, zero);
+	      gimple_call_set_lhs (g, ovf_lhs);
+	      gimple_stmt_iterator gsi2 = gsi_for_stmt (ovf);
+	      gsi_replace (&gsi2, g, true);
+	    }
 	  return true;
 	}
     }
--- gcc/testsuite/gcc.target/i386/pr79173-12.c.jj	2023-08-28 19:51:23.574518679 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-12.c	2023-08-28 19:51:52.260124877 +0200
@@ -0,0 +1,48 @@
+/* PR middle-end/79173 */
+/* PR middle-end/111209 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+  unsigned long r;
+  unsigned long c1 = __builtin_add_overflow (x, y, &r);
+  unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
+  *carry_out = c1 + c2;
+  return r;
+}
+
+static unsigned long
+usubc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+  unsigned long r;
+  unsigned long c1 = __builtin_sub_overflow (x, y, &r);
+  unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
+  *carry_out = c1 + c2;
+  return r;
+}
+
+void
+foo (unsigned long *p, unsigned long *q)
+{
+  unsigned long c;
+  p[0] = uaddc (p[0], q[0], 0, &c);
+  p[1] = uaddc (p[1], q[1], c, &c);
+}
+
+void
+bar (unsigned long *p, unsigned long *q)
+{
+  unsigned long c;
+  p[0] = usubc (p[0], q[0], 0, &c);
+  p[1] = usubc (p[1], q[1], c, &c);
+}