Message ID | 20220901032400.23692-1-guojiufu@linux.ibm.com |
---|---|
State | New, archived |
Headers |
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:ecc5:0:0:0:0:0 with SMTP id s5csp57197wro; Wed, 31 Aug 2022 20:25:03 -0700 (PDT) X-Google-Smtp-Source: AA6agR5o170soAhEesn6lB+pwLYwZ1D6GeLrMryOMifSvVLIvl1NZ4q1oXei5+b3BsPXDm5p81xC X-Received: by 2002:a05:6402:e01:b0:442:dd7e:f49d with SMTP id h1-20020a0564020e0100b00442dd7ef49dmr26669749edh.355.1662002702996; Wed, 31 Aug 2022 20:25:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1662002702; cv=none; d=google.com; s=arc-20160816; b=c8kyujqaMxuBDwpxVqED1rFtsp5ZQt4u5cmzKok3s9ZBM7+04xKcQhUakXY+dyQs19 wLSt8yicAlZELx7ltdyvA3IH3d27QTjDj3McYBE4+I+BqoXNvUvTVAepq0MyRL33IT4M YuC7okDGIB1WatDMsiUkTePHL8zfAnQ3mN/kQ4ZulKm0uqlFGxhjtFCwad3sUEJT/EA4 1Y4dohsqRHeJFW4TY8qkmU3yLtnEK/HRe2c9+Mw5E9f3wVjWJRryiS1RfJnpPu9//KAg JQIy4KiAsgQpSP1crcI1dQUdd600Tsg+DHsudv8SjGYtigCKQb9F1I/I5LLlw+maALPj 2qMg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:reply-to:from:list-subscribe:list-help :list-post:list-archive:list-unsubscribe:list-id:precedence :mime-version:message-id:date:subject:to:dmarc-filter:delivered-to :dkim-signature:dkim-filter; bh=/R/EPxNjOZSVKzuELq+ONanIFwQSoA7140WyrF6eYPc=; b=IPNNroZxJCTqice26LhSvSrG3cAeezd2IBQNNwxEJw1FB1t5995e8wMO08u53KcXNB vpNGeDusQ0R0VGz1cNQXOLda2HvySz0hzld6Nl6JkGuAtgvAR0FOltFXdDGLlfiWfW00 LMQvAcJQswOlAKTTAzrsGiOaSEyrFvN7ZydGr3EoTNi2OYDl43oI1OY1WvXtNbQz9s2g qV+Vx05+qYie1KHfskDNVjqJWEsy1ZBZfAn80ZgccbcSifU7HPAZfOTy1gx0obN0p9s7 sb6sXJTwks/pMEjT9C4TGZ2kUiFHX5F8LJZ9T9W+jBN+jIXrM+d5mvSWtfrAJtLDQvdW Xaiw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=Fsbz92rS; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id cw18-20020a170906479200b00741827c60c4si8841353ejc.900.2022.08.31.20.25.02 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 31 Aug 2022 20:25:02 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=Fsbz92rS; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8FE6338300A6 for <ouuuleilei@gmail.com>; Thu, 1 Sep 2022 03:24:54 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8FE6338300A6 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1662002694; bh=/R/EPxNjOZSVKzuELq+ONanIFwQSoA7140WyrF6eYPc=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=Fsbz92rSF34OaaJa+mzv+AxprhXoljGwrh5fizjSwlVoxRVllEYTRMjs+lF+UfwQ2 xfxdz2AN7Rh687x72io7WRLLkGQoKtG7cWgvcRGGC5CxmXDFSfxVzOho1mqDyjernn LW0G1v9ot9bWamtA2FHW05gzgZ1Hxh1Pyki+QXik= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id F20163834F16; Thu, 1 Sep 2022 03:24:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org F20163834F16 Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2813IH0C003956; Thu, 1 Sep 2022 03:24:08 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3jamqc04ye-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 01 Sep 2022 03:24:08 +0000 Received: from m0098409.ppops.net (m0098409.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2813KF3P011702; Thu, 1 Sep 2022 03:24:08 GMT Received: from ppma01fra.de.ibm.com (46.49.7a9f.ip4.static.sl-reverse.com [159.122.73.70]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3jamqc04x2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 01 Sep 2022 03:24:07 +0000 Received: from pps.filterd (ppma01fra.de.ibm.com [127.0.0.1]) by ppma01fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2813M9Qc006897; Thu, 1 Sep 2022 03:24:05 GMT Received: from b06cxnps4075.portsmouth.uk.ibm.com (d06relay12.portsmouth.uk.ibm.com [9.149.109.197]) by ppma01fra.de.ibm.com with ESMTP id 3j8hkab8tv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 01 Sep 2022 03:24:05 +0000 Received: from d06av24.portsmouth.uk.ibm.com (d06av24.portsmouth.uk.ibm.com [9.149.105.60]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2813O2hQ32702794 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 1 Sep 2022 03:24:02 GMT Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4305F4203F; Thu, 1 Sep 2022 03:24:02 +0000 (GMT) Received: from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5F35942045; Thu, 1 Sep 2022 03:24:01 +0000 (GMT) Received: from pike.rch.stglabs.ibm.com (unknown [9.5.12.127]) by d06av24.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 1 Sep 2022 03:24:01 +0000 (GMT) To: gcc-patches@gcc.gnu.org Subject: [PATCH 1/2] Using pli(paddi) and rotate to build 64bit constants Date: Thu, 1 Sep 2022 11:24:00 +0800 Message-Id: <20220901032400.23692-1-guojiufu@linux.ibm.com> X-Mailer: git-send-email 2.17.1 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: TG_QEFwUV8BMra6PPr1ILEbrG3mhr2QQ X-Proofpoint-ORIG-GUID: kbRTokjymdBt0zqitpqet8FYUuF_qn1U X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-09-01_01,2022-08-31_03,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 mlxscore=0 malwarescore=0 suspectscore=0 clxscore=1015 phishscore=0 priorityscore=1501 adultscore=0 bulkscore=0 mlxlogscore=999 spamscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2207270000 definitions=main-2209010011 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Jiufu Guo <guojiufu@linux.ibm.com> Cc: meissner@linux.ibm.com, dje.gcc@gmail.com, segher@kernel.crashing.org, linkw@gcc.gnu.org Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org> X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1742736146390944823?= X-GMAIL-MSGID: =?utf-8?q?1742736146390944823?= |
Series |
[1/2] Using pli(paddi) and rotate to build 64bit constants
|
|
Commit Message
Jiufu Guo
Sept. 1, 2022, 3:24 a.m. UTC
Hi, As mentioned in PR106550, since pli could support 34bits immediate, we could use less instructions(3insn would be ok) to build 64bits constant with pli. For example, for constant 0x020805006106003, we could generate it with: asm code1: pli 9,101736451 (0x6106003) sldi 9,9,32 paddi 9,9, 2130000 (0x0208050) or asm code2: pli 10, 2130000 pli 9, 101736451 rldimi 9, 10, 32, 0 Testing with simple cases as below, run them a lot of times: f1.c long __attribute__ ((noinline)) foo (long *arg,long *,long*) { *arg = 0x2351847027482577; } 5insns: base pli+sldi+paddi: similar -0.08% pli+pli+rldimi: faster +0.66% f2.c long __attribute__ ((noinline)) foo (long *arg, long *arg2, long *arg3) { *arg = 0x2351847027482577; *arg2 = 0x3257845024384680; *arg3 = 0x1245abcef9240dec; } 5nisns: base pli+sldi+paddi: faster +1.35% pli+pli+rldimi: faster +5.49% f2.c would be more meaningful. Because 'sched passes' are effective for f2.c, but 'scheds' do less thing for f1.c. Compare with previous patch: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599525.html This one updates code slightly and extracts changes on rs6000.md to a seperate patch. This patch pass boostrap and regtest on ppc64le(includes p10). Is it ok for trunk? BR, Jeff(Jiufu) PR target/106550 gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add 'pli' for constant building. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr106550.c: New test. --- gcc/config/rs6000/rs6000.cc | 39 +++++++++++++++++++++ gcc/testsuite/gcc.target/powerpc/pr106550.c | 14 ++++++++ 2 files changed, 53 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106550.c
Comments
Hi! This patch is a clear improvement :-) On Thu, Sep 01, 2022 at 11:24:00AM +0800, Jiufu Guo wrote: > As mentioned in PR106550, since pli could support 34bits immediate, we could > use less instructions(3insn would be ok) to build 64bits constant with pli. > For example, for constant 0x020805006106003, we could generate it with: > asm code1: > pli 9,101736451 (0x6106003) > sldi 9,9,32 > paddi 9,9, 2130000 (0x0208050) 3 insns, 2 insns dependent on the previous, each. > or asm code2: > pli 10, 2130000 > pli 9, 101736451 > rldimi 9, 10, 32, 0 3 insns, 1 insn dependent on both others. > Testing with simple cases as below, run them a lot of times: > f1.c > long __attribute__ ((noinline)) foo (long *arg,long *,long*) > { > *arg = 0x2351847027482577; > } > 5insns: base > pli+sldi+paddi: similar -0.08% > pli+pli+rldimi: faster +0.66% This mostly tests how well this micro-benchmark is scheduled. More time is spent in the looping and function calls (not shown)! > f2.c > long __attribute__ ((noinline)) foo (long *arg, long *arg2, long *arg3) > { > *arg = 0x2351847027482577; > *arg2 = 0x3257845024384680; > *arg3 = 0x1245abcef9240dec; > } > 5nisns: base > pli+sldi+paddi: faster +1.35% > pli+pli+rldimi: faster +5.49% > > f2.c would be more meaningful. Because 'sched passes' are effective for > f2.c, but 'scheds' do less thing for f1.c. It still is a too small example to mean much without looking at a pipeview, or at the very least perf. But the results show a solid improvement as expected ;-) > gcc/ChangeLog: > PR target/106550 > * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add 'pli' for > constant building. "Use pli." ? > gcc/testsuite/ChangeLog: > PR target/106550 > * gcc.target/powerpc/pr106550.c: New test. > + else if (TARGET_PREFIXED) > + { > + /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0. */ But not just 9 and 10. Use A and B or X and Y or H and L or something like that? The comment goes... > + if (can_create_pseudo_p ()) > + { ... here. > + temp = gen_reg_rtx (DImode); > + rtx temp1 = gen_reg_rtx (DImode); > + emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3)); > + emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1)); > + > + emit_insn (gen_rotldi3_insert_3 (dest, temp, GEN_INT (32), temp1, > + GEN_INT (0xffffffff))); > + } > + No blank line here please. > + /* pli 9,high32 + sldi 9,32 + paddi 9,9,low32. */ > + else > + { The comment goes here, in the block it refers to. Comments for a block are the first thing *in* the block. > + emit_move_insn (copy_rtx (dest), GEN_INT ((ud4 << 16) | ud3)); > + > + emit_move_insn (copy_rtx (dest), > + gen_rtx_ASHIFT (DImode, copy_rtx (dest), > + GEN_INT (32))); > + > + bool can_use_paddi = REGNO (dest) != FIRST_GPR_REGNO; There should be a test that we so the right thing (or *a* right thing, anyway; a working thing; but hopefully a reasonably fast thing) for !can_use_paddi. > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c > @@ -0,0 +1,14 @@ > +/* PR target/106550 */ > +/* { dg-options "-O2 -std=c99 -mdejagnu-cpu=power10" } */ > + > +void > +foo (unsigned long long *a) > +{ > + *a++ = 0x020805006106003; > + *a++ = 0x2351847027482577; > +} > + > +/* 3 insns for each constant: pli+sldi+paddi or pli+pli+rldimi. > + And 3 additional insns: std+std+blr: 9 insns totally. */ > +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */ Also test the expected insns separately please? The std's (just with \mstd so it will catch all variations as well), the blr, the pli's and the rldimi etc.? We also should test all special cases as well. Especially those that do not happen all over the place, we will notice something is broken there easy enough. But unlikely cases can take years to show up. Okay for trunk with the formatting fixed. Thank you! Segher
Hi Jeff, Thanks for the patch, some comments on nits are inline. on 2022/9/1 11:24, Jiufu Guo wrote: > Hi, > > As mentioned in PR106550, since pli could support 34bits immediate, we could > use less instructions(3insn would be ok) to build 64bits constant with pli. > > For example, for constant 0x020805006106003, we could generate it with: > asm code1: > pli 9,101736451 (0x6106003) > sldi 9,9,32 > paddi 9,9, 2130000 (0x0208050) > > or asm code2: > pli 10, 2130000 > pli 9, 101736451 > rldimi 9, 10, 32, 0 > > Testing with simple cases as below, run them a lot of times: > f1.c > long __attribute__ ((noinline)) foo (long *arg,long *,long*) > { > *arg = 0x2351847027482577; > } > 5insns: base > pli+sldi+paddi: similar -0.08% > pli+pli+rldimi: faster +0.66% > > f2.c > long __attribute__ ((noinline)) foo (long *arg, long *arg2, long *arg3) > { > *arg = 0x2351847027482577; > *arg2 = 0x3257845024384680; > *arg3 = 0x1245abcef9240dec; > } > 5nisns: base > pli+sldi+paddi: faster +1.35% > pli+pli+rldimi: faster +5.49% > > f2.c would be more meaningful. Because 'sched passes' are effective for > f2.c, but 'scheds' do less thing for f1.c. > > Compare with previous patch: > https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599525.html > This one updates code slightly and extracts changes on rs6000.md to a > seperate patch. > > This patch pass boostrap and regtest on ppc64le(includes p10). > Is it ok for trunk? > > BR, > Jeff(Jiufu) > > > PR target/106550 > > gcc/ChangeLog: > > * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add 'pli' for > constant building. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr106550.c: New test. > > --- > gcc/config/rs6000/rs6000.cc | 39 +++++++++++++++++++++ > gcc/testsuite/gcc.target/powerpc/pr106550.c | 14 ++++++++ > 2 files changed, 53 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106550.c > > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index df491bee2ea..1ccb2ff30a1 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -10181,6 +10181,45 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c) > gen_rtx_IOR (DImode, copy_rtx (temp), > GEN_INT (ud1))); > } > + else if (TARGET_PREFIXED) > + { > + /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0. */ > + if (can_create_pseudo_p ()) > + { > + temp = gen_reg_rtx (DImode); > + rtx temp1 = gen_reg_rtx (DImode); > + emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3)); > + emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1)); > + Nit: copy_rtx here seems not necessary, as both temp and temp1 are with CODE REG. The function copy_rtx returns the given rtx for code REG. > + emit_insn (gen_rotldi3_insert_3 (dest, temp, GEN_INT (32), temp1, > + GEN_INT (0xffffffff))); > + } > + > + /* pli 9,high32 + sldi 9,32 + paddi 9,9,low32. */ > + else > + { > + emit_move_insn (copy_rtx (dest), GEN_INT ((ud4 << 16) | ud3)); > + > + emit_move_insn (copy_rtx (dest), > + gen_rtx_ASHIFT (DImode, copy_rtx (dest), > + GEN_INT (32))); > + > + bool can_use_paddi = REGNO (dest) != FIRST_GPR_REGNO; > + The REGNO usage has asserted dest is with CODE REG, if it's always true I don't see why we need copy_rtx around. Or do I miss something? > + /* Use paddi for the low32 bits. */ > + if (ud2 != 0 && ud1 != 0 && can_use_paddi) > + emit_move_insn (dest, gen_rtx_PLUS (DImode, copy_rtx (dest), > + GEN_INT ((ud2 << 16) | ud1))); > + /* Use oris, ori for low32 bits. */ > + if (ud2 != 0 && (ud1 == 0 || !can_use_paddi)) > + emit_move_insn (ud1 != 0 ? copy_rtx (dest) : dest, > + gen_rtx_IOR (DImode, copy_rtx (dest), > + GEN_INT (ud2 << 16))); > + if (ud1 != 0 && (ud2 == 0 || !can_use_paddi)) > + emit_move_insn (dest, gen_rtx_IOR (DImode, copy_rtx (dest), > + GEN_INT (ud1))); > + } > + } > else > { > temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); > diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c b/gcc/testsuite/gcc.target/powerpc/pr106550.c > new file mode 100644 > index 00000000000..c6f4116bb9a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c > @@ -0,0 +1,14 @@ > +/* PR target/106550 */ > +/* { dg-options "-O2 -std=c99 -mdejagnu-cpu=power10" } */ > + Need to check power10_ok, like: /* { dg-require-effective-target power10_ok } */ Nit: -std=c99 is not needed? BR, Kewen
Hi, Segher Boessenkool <segher@kernel.crashing.org> writes: > Hi! > > This patch is a clear improvement :-) > > On Thu, Sep 01, 2022 at 11:24:00AM +0800, Jiufu Guo wrote: >> As mentioned in PR106550, since pli could support 34bits immediate, we could >> use less instructions(3insn would be ok) to build 64bits constant with pli. > >> For example, for constant 0x020805006106003, we could generate it with: >> asm code1: >> pli 9,101736451 (0x6106003) >> sldi 9,9,32 >> paddi 9,9, 2130000 (0x0208050) > > 3 insns, 2 insns dependent on the previous, each. Yeap. > >> or asm code2: >> pli 10, 2130000 >> pli 9, 101736451 >> rldimi 9, 10, 32, 0 > > 3 insns, 1 insn dependent on both others. Yes. > >> Testing with simple cases as below, run them a lot of times: >> f1.c >> long __attribute__ ((noinline)) foo (long *arg,long *,long*) >> { >> *arg = 0x2351847027482577; >> } >> 5insns: base >> pli+sldi+paddi: similar -0.08% >> pli+pli+rldimi: faster +0.66% > > This mostly tests how well this micro-benchmark is scheduled. More time > is spent in the looping and function calls (not shown)! > >> f2.c >> long __attribute__ ((noinline)) foo (long *arg, long *arg2, long *arg3) >> { >> *arg = 0x2351847027482577; >> *arg2 = 0x3257845024384680; >> *arg3 = 0x1245abcef9240dec; >> } >> 5nisns: base >> pli+sldi+paddi: faster +1.35% >> pli+pli+rldimi: faster +5.49% >> >> f2.c would be more meaningful. Because 'sched passes' are effective for >> f2.c, but 'scheds' do less thing for f1.c. > > It still is a too small example to mean much without looking at a > pipeview, or at the very least perf. But the results show a solid > improvement as expected ;-) Right, checking how the 'cycles' are using on each instructions would be more meaningful to demonstrate how the runtime is changing. > >> gcc/ChangeLog: >> PR target/106550 >> * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add 'pli' for >> constant building. > > "Use pli." ? Thanks, will update. > >> gcc/testsuite/ChangeLog: >> PR target/106550 >> * gcc.target/powerpc/pr106550.c: New test. > >> + else if (TARGET_PREFIXED) >> + { >> + /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0. */ > > But not just 9 and 10. Use A and B or X and Y or H and L or something > like that? OK, will updata accordingly. > > The comment goes... > >> + if (can_create_pseudo_p ()) >> + { > > ... here. > >> + temp = gen_reg_rtx (DImode); >> + rtx temp1 = gen_reg_rtx (DImode); >> + emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3)); >> + emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1)); >> + >> + emit_insn (gen_rotldi3_insert_3 (dest, temp, GEN_INT (32), temp1, >> + GEN_INT (0xffffffff))); >> + } >> + > > No blank line here please. > >> + /* pli 9,high32 + sldi 9,32 + paddi 9,9,low32. */ >> + else >> + { > > The comment goes here, in the block it refers to. Comments for a block > are the first thing *in* the block. OK, great! I like the format you sugguested here :-) > >> + emit_move_insn (copy_rtx (dest), GEN_INT ((ud4 << 16) | ud3)); >> + >> + emit_move_insn (copy_rtx (dest), >> + gen_rtx_ASHIFT (DImode, copy_rtx (dest), >> + GEN_INT (32))); >> + >> + bool can_use_paddi = REGNO (dest) != FIRST_GPR_REGNO; > > There should be a test that we so the right thing (or *a* right thing, > anyway; a working thing; but hopefully a reasonably fast thing) for > !can_use_paddi. To catch this test point, we need let the splitter run after RA, and register 0 happen to be the dest of an assignment. Oh, below case would be useful for this test point: /* { dg-options "-O2 -std=c99 -mdejagnu-cpu=power10 -fdisable-rtl-split1" } */ /* force the constant splitter run after RA: -fdisable-rtl-split1 a few assignments to make sure r0 is allocated as dest of an assign. */ void foo (unsigned long long *a) { *a++ = 0x020805006106003; *a++ = 0x2351847027482587; *a++ = 0x22513478874a2578; *a++ = 0x02180570670b003; *a++ = 0x2311847029488587; *a++ = 0x335184b02748757f; *a++ = 0x720805006096003; *a++ = 0x23a18b70a74e257e; *a++ = 0x2a518a70a74a2567; *a++ = 0x5208a5da0606a03; *a++ = 0x1391a47a2749257a; *a++ = 0x235a847027488576; *a++ = 0x23a1847027482677; } /* { dg-final { scan-assembler-times {\moris\M} 1 } } */ /* { dg-final { scan-assembler-times {\mori\M} 1 } } */ I will add this test case in patch. Is this ok? Any sugguestions? > >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c >> @@ -0,0 +1,14 @@ >> +/* PR target/106550 */ >> +/* { dg-options "-O2 -std=c99 -mdejagnu-cpu=power10" } */ >> + >> +void >> +foo (unsigned long long *a) >> +{ >> + *a++ = 0x020805006106003; >> + *a++ = 0x2351847027482577; >> +} >> + >> +/* 3 insns for each constant: pli+sldi+paddi or pli+pli+rldimi. >> + And 3 additional insns: std+std+blr: 9 insns totally. */ >> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */ > > Also test the expected insns separately please? The std's (just with > \mstd so it will catch all variations as well), the blr, the pli's and > the rldimi etc.? The reason of using "(?n)^\s+[a-z]" is to keep this test case pass no matter the splitter running before or after RA. Yes, using real instructions to check would be better. Will update the case to check real instructions 'pli' and 'rldimi'. > > We also should test all special cases as well. Especially those that do > not happen all over the place, we will notice something is broken there > easy enough. But unlikely cases can take years to show up. Understand :) BR, Jeff(Jiufu) > > Okay for trunk with the formatting fixed. Thank you! > > > Segher
On 9/1/22 4:52 PM, Segher Boessenkool wrote: > On Thu, Sep 01, 2022 at 11:24:00AM +0800, Jiufu Guo wrote: >> As mentioned in PR106550, since pli could support 34bits immediate, we could >> use less instructions(3insn would be ok) to build 64bits constant with pli. > >> For example, for constant 0x020805006106003, we could generate it with: >> asm code1: >> pli 9,101736451 (0x6106003) >> sldi 9,9,32 >> paddi 9,9, 2130000 (0x0208050) > > 3 insns, 2 insns dependent on the previous, each. > >> or asm code2: >> pli 10, 2130000 >> pli 9, 101736451 >> rldimi 9, 10, 32, 0 > > 3 insns, 1 insn dependent on both others. Yeah, the improvement here is the fewer dependent instructions, since 2 64-bit + 1 32-bit instructions is the same size as 5 32-bit insns. Those 5 32-bit insns are all dependent on the previous insn, so not ideal. It's too bad we don't have a paddis or poris insns where we could specify in the prefix a shift of 32-bits rather than the normal 16-bits. If we had those, we could generate the constant with just 2 64-bit insns. Peter
Hi! On Fri, Sep 02, 2022 at 02:56:21PM +0800, Jiufu Guo wrote: > >> + /* pli 9,high32 + sldi 9,32 + paddi 9,9,low32. */ > >> + else > >> + { > > > > The comment goes here, in the block it refers to. Comments for a block > > are the first thing *in* the block. > OK, great! I like the format you sugguested here :-) It's the normal GCC style, not my invention :-) > >> + emit_move_insn (copy_rtx (dest), GEN_INT ((ud4 << 16) | ud3)); > >> + > >> + emit_move_insn (copy_rtx (dest), > >> + gen_rtx_ASHIFT (DImode, copy_rtx (dest), > >> + GEN_INT (32))); > >> + > >> + bool can_use_paddi = REGNO (dest) != FIRST_GPR_REGNO; > > > > There should be a test that we so the right thing (or *a* right thing, > > anyway; a working thing; but hopefully a reasonably fast thing) for > > !can_use_paddi. > To catch this test point, we need let the splitter run after RA, > and register 0 happen to be the dest of an assignment. Or force the testcase to use r0 some other way. Well, "forcing" cannot be done, but we can probably encourage it (via a local register asm for example, or by tying the var to the output of an asm that is hard reg 0, or perhaps there are other ways as well :-) ) > I will add this test case in patch. > Is this ok? Any sugguestions? Sounds useful yes. Maybe describe the expected output in words as well (in the testcase, not in email)? > >> +/* 3 insns for each constant: pli+sldi+paddi or pli+pli+rldimi. > >> + And 3 additional insns: std+std+blr: 9 insns totally. */ > >> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */ > > > > Also test the expected insns separately please? The std's (just with > > \mstd so it will catch all variations as well), the blr, the pli's and > > the rldimi etc.? > The reason of using "(?n)^\s+[a-z]" is to keep this test case pass no > matter the splitter running before or after RA. Ah. Some short comment in the testcase please? Thanks again, Segher
On Fri, Sep 02, 2022 at 10:29:35AM -0500, Peter Bergner wrote: > On 9/1/22 4:52 PM, Segher Boessenkool wrote: > > On Thu, Sep 01, 2022 at 11:24:00AM +0800, Jiufu Guo wrote: > >> As mentioned in PR106550, since pli could support 34bits immediate, we could > >> use less instructions(3insn would be ok) to build 64bits constant with pli. > > > >> For example, for constant 0x020805006106003, we could generate it with: > >> asm code1: > >> pli 9,101736451 (0x6106003) > >> sldi 9,9,32 > >> paddi 9,9, 2130000 (0x0208050) > > > > 3 insns, 2 insns dependent on the previous, each. > > > >> or asm code2: > >> pli 10, 2130000 > >> pli 9, 101736451 > >> rldimi 9, 10, 32, 0 > > > > 3 insns, 1 insn dependent on both others. > > Yeah, the improvement here is the fewer dependent instructions, since > 2 64-bit + 1 32-bit instructions is the same size as 5 32-bit insns. It also helps CSE if you do say 0x1200aa00bb0034 and 0x1200aa00bb0056, or even just 0x1200aa001200aa maybe (we probably have a separate pattern for the latter though :-) ) Segher
"Kewen.Lin" <linkw@linux.ibm.com> writes: > Hi Jeff, > > Thanks for the patch, some comments on nits are inline. > > on 2022/9/1 11:24, Jiufu Guo wrote: >> Hi, >> >> As mentioned in PR106550, since pli could support 34bits immediate, we could >> use less instructions(3insn would be ok) to build 64bits constant with pli. >> >> For example, for constant 0x020805006106003, we could generate it with: >> asm code1: >> pli 9,101736451 (0x6106003) >> sldi 9,9,32 >> paddi 9,9, 2130000 (0x0208050) >> >> or asm code2: >> pli 10, 2130000 >> pli 9, 101736451 >> rldimi 9, 10, 32, 0 >> >> Testing with simple cases as below, run them a lot of times: >> f1.c >> long __attribute__ ((noinline)) foo (long *arg,long *,long*) >> { >> *arg = 0x2351847027482577; >> } >> 5insns: base >> pli+sldi+paddi: similar -0.08% >> pli+pli+rldimi: faster +0.66% >> >> f2.c >> long __attribute__ ((noinline)) foo (long *arg, long *arg2, long *arg3) >> { >> *arg = 0x2351847027482577; >> *arg2 = 0x3257845024384680; >> *arg3 = 0x1245abcef9240dec; >> } >> 5nisns: base >> pli+sldi+paddi: faster +1.35% >> pli+pli+rldimi: faster +5.49% >> >> f2.c would be more meaningful. Because 'sched passes' are effective for >> f2.c, but 'scheds' do less thing for f1.c. >> >> Compare with previous patch: >> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599525.html >> This one updates code slightly and extracts changes on rs6000.md to a >> seperate patch. >> >> This patch pass boostrap and regtest on ppc64le(includes p10). >> Is it ok for trunk? >> >> BR, >> Jeff(Jiufu) >> >> >> PR target/106550 >> >> gcc/ChangeLog: >> >> * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add 'pli' for >> constant building. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/pr106550.c: New test. >> >> --- >> gcc/config/rs6000/rs6000.cc | 39 +++++++++++++++++++++ >> gcc/testsuite/gcc.target/powerpc/pr106550.c | 14 ++++++++ >> 2 files changed, 53 insertions(+) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106550.c >> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> index df491bee2ea..1ccb2ff30a1 100644 >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -10181,6 +10181,45 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c) >> gen_rtx_IOR (DImode, copy_rtx (temp), >> GEN_INT (ud1))); >> } >> + else if (TARGET_PREFIXED) >> + { >> + /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0. */ >> + if (can_create_pseudo_p ()) >> + { >> + temp = gen_reg_rtx (DImode); >> + rtx temp1 = gen_reg_rtx (DImode); >> + emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3)); >> + emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1)); >> + > > Nit: copy_rtx here seems not necessary, as both temp and temp1 are with CODE REG. > The function copy_rtx returns the given rtx for code REG. > >> + emit_insn (gen_rotldi3_insert_3 (dest, temp, GEN_INT (32), temp1, >> + GEN_INT (0xffffffff))); >> + } >> + >> + /* pli 9,high32 + sldi 9,32 + paddi 9,9,low32. */ >> + else >> + { >> + emit_move_insn (copy_rtx (dest), GEN_INT ((ud4 << 16) | ud3)); >> + >> + emit_move_insn (copy_rtx (dest), >> + gen_rtx_ASHIFT (DImode, copy_rtx (dest), >> + GEN_INT (32))); >> + >> + bool can_use_paddi = REGNO (dest) != FIRST_GPR_REGNO; >> + > > The REGNO usage has asserted dest is with CODE REG, if it's always true > I don't see why we need copy_rtx around. Or do I miss something? Thanks a lot for point this out! Yes, now rs6000_emit_set_long_const is called on DImode for constant splitter; and it should be always with CODE REG, and then copy_rtx would not be needed here! I would update the patch accordingly. > >> + /* Use paddi for the low32 bits. */ >> + if (ud2 != 0 && ud1 != 0 && can_use_paddi) >> + emit_move_insn (dest, gen_rtx_PLUS (DImode, copy_rtx (dest), >> + GEN_INT ((ud2 << 16) | ud1))); >> + /* Use oris, ori for low32 bits. */ >> + if (ud2 != 0 && (ud1 == 0 || !can_use_paddi)) >> + emit_move_insn (ud1 != 0 ? copy_rtx (dest) : dest, >> + gen_rtx_IOR (DImode, copy_rtx (dest), >> + GEN_INT (ud2 << 16))); >> + if (ud1 != 0 && (ud2 == 0 || !can_use_paddi)) >> + emit_move_insn (dest, gen_rtx_IOR (DImode, copy_rtx (dest), >> + GEN_INT (ud1))); >> + } >> + } >> else >> { >> temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c b/gcc/testsuite/gcc.target/powerpc/pr106550.c >> new file mode 100644 >> index 00000000000..c6f4116bb9a >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c >> @@ -0,0 +1,14 @@ >> +/* PR target/106550 */ >> +/* { dg-options "-O2 -std=c99 -mdejagnu-cpu=power10" } */ >> + > > Need to check power10_ok, like: > /* { dg-require-effective-target power10_ok } */ > > Nit: -std=c99 is not needed? Thanks for catching this! BR, Jeff(Jiufu) > > BR, > Kewen
Hi, Segher Boessenkool <segher@kernel.crashing.org> writes: > Hi! > > On Fri, Sep 02, 2022 at 02:56:21PM +0800, Jiufu Guo wrote: >> >> + /* pli 9,high32 + sldi 9,32 + paddi 9,9,low32. */ >> >> + else >> >> + { >> > >> > The comment goes here, in the block it refers to. Comments for a block >> > are the first thing *in* the block. >> OK, great! I like the format you sugguested here :-) > > It's the normal GCC style, not my invention :-) > >> >> + emit_move_insn (copy_rtx (dest), GEN_INT ((ud4 << 16) | ud3)); >> >> + >> >> + emit_move_insn (copy_rtx (dest), >> >> + gen_rtx_ASHIFT (DImode, copy_rtx (dest), >> >> + GEN_INT (32))); >> >> + >> >> + bool can_use_paddi = REGNO (dest) != FIRST_GPR_REGNO; >> > >> > There should be a test that we so the right thing (or *a* right thing, >> > anyway; a working thing; but hopefully a reasonably fast thing) for >> > !can_use_paddi. >> To catch this test point, we need let the splitter run after RA, >> and register 0 happen to be the dest of an assignment. > > Or force the testcase to use r0 some other way. Well, "forcing" cannot > be done, but we can probably encourage it (via a local register asm for > example, or by tying the var to the output of an asm that is hard reg 0, > or perhaps there are other ways as well :-) ) Specify register is very helpful here! Both below two cases could help register 0 to be selected for a variable. Great! Thanks! code1 ------- void __attribute__ ((noinline)) foo (unsigned long long *a) { register long long d asm("r0") = 0x1245abcef9240dec; long long n; asm("cntlzd %0, %1" : "=r"(n) : "r"(d)); *a = n; } code2 ----------- register long long d asm ("r0"); void __attribute__ ((noinline)) foo () { d = 0x2351847027482577ULL; } > >> I will add this test case in patch. >> Is this ok? Any sugguestions? > > Sounds useful yes. Maybe describe the expected output in words as well > (in the testcase, not in email)? OK. Will update the patch accordingly. > >> >> +/* 3 insns for each constant: pli+sldi+paddi or pli+pli+rldimi. >> >> + And 3 additional insns: std+std+blr: 9 insns totally. */ >> >> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */ >> > >> > Also test the expected insns separately please? The std's (just with >> > \mstd so it will catch all variations as well), the blr, the pli's and >> > the rldimi etc.? >> The reason of using "(?n)^\s+[a-z]" is to keep this test case pass no >> matter the splitter running before or after RA. > > Ah. Some short comment in the testcase please? Thanks. We can check individual instructions to check better asm pli+pli+rldimi is generated, since the splitter could run in split1 pass. BR, Jeff(Jiufu) > > Thanks again, > > > Segher
On Mon, Sep 05, 2022 at 02:25:29PM +0800, Jiufu Guo wrote: > > On Fri, Sep 02, 2022 at 02:56:21PM +0800, Jiufu Guo wrote: > > Or force the testcase to use r0 some other way. Well, "forcing" cannot > > be done, but we can probably encourage it (via a local register asm for > > example, or by tying the var to the output of an asm that is hard reg 0, > > or perhaps there are other ways as well :-) ) > Specify register is very helpful here! Both below two cases could help > register 0 to be selected for a variable. Great! Thanks! > >> The reason of using "(?n)^\s+[a-z]" is to keep this test case pass no > >> matter the splitter running before or after RA. > > > > Ah. Some short comment in the testcase please? > Thanks. We can check individual instructions to check better asm > pli+pli+rldimi is generated, since the splitter could run in split1 pass. It's always a tradeoff: if we scan for too specific code the testcase will need a lot of maintenance, and that does not scale. If on the other hand we do test less than we really want, well, that is less than we really want to test :-) The changes / additions we agreed to are preapproved btw. But please post the thing you eventually commit to the mailing list :-) Segher
Segher Boessenkool <segher@kernel.crashing.org> writes: > On Mon, Sep 05, 2022 at 02:25:29PM +0800, Jiufu Guo wrote: >> > On Fri, Sep 02, 2022 at 02:56:21PM +0800, Jiufu Guo wrote: >> > Or force the testcase to use r0 some other way. Well, "forcing" cannot >> > be done, but we can probably encourage it (via a local register asm for >> > example, or by tying the var to the output of an asm that is hard reg 0, >> > or perhaps there are other ways as well :-) ) >> Specify register is very helpful here! Both below two cases could help >> register 0 to be selected for a variable. Great! Thanks! > >> >> The reason of using "(?n)^\s+[a-z]" is to keep this test case pass no >> >> matter the splitter running before or after RA. >> > >> > Ah. Some short comment in the testcase please? >> Thanks. We can check individual instructions to check better asm >> pli+pli+rldimi is generated, since the splitter could run in split1 pass. > > It's always a tradeoff: if we scan for too specific code the testcase > will need a lot of maintenance, and that does not scale. If on the > other hand we do test less than we really want, well, that is less than > we really want to test :-) Yes :) > > The changes / additions we agreed to are preapproved btw. But please > post the thing you eventually commit to the mailing list :-) Sure! I updated the patch for testcases and also for code part; I would submit for review before commit again for new comments. :) The updated patch is as: https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601150.html Thanks again! BR, Jeff(Jiufu) > > > Segher
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index df491bee2ea..1ccb2ff30a1 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -10181,6 +10181,45 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c) gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud1))); } + else if (TARGET_PREFIXED) + { + /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0. */ + if (can_create_pseudo_p ()) + { + temp = gen_reg_rtx (DImode); + rtx temp1 = gen_reg_rtx (DImode); + emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3)); + emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1)); + + emit_insn (gen_rotldi3_insert_3 (dest, temp, GEN_INT (32), temp1, + GEN_INT (0xffffffff))); + } + + /* pli 9,high32 + sldi 9,32 + paddi 9,9,low32. */ + else + { + emit_move_insn (copy_rtx (dest), GEN_INT ((ud4 << 16) | ud3)); + + emit_move_insn (copy_rtx (dest), + gen_rtx_ASHIFT (DImode, copy_rtx (dest), + GEN_INT (32))); + + bool can_use_paddi = REGNO (dest) != FIRST_GPR_REGNO; + + /* Use paddi for the low32 bits. */ + if (ud2 != 0 && ud1 != 0 && can_use_paddi) + emit_move_insn (dest, gen_rtx_PLUS (DImode, copy_rtx (dest), + GEN_INT ((ud2 << 16) | ud1))); + /* Use oris, ori for low32 bits. */ + if (ud2 != 0 && (ud1 == 0 || !can_use_paddi)) + emit_move_insn (ud1 != 0 ? copy_rtx (dest) : dest, + gen_rtx_IOR (DImode, copy_rtx (dest), + GEN_INT (ud2 << 16))); + if (ud1 != 0 && (ud2 == 0 || !can_use_paddi)) + emit_move_insn (dest, gen_rtx_IOR (DImode, copy_rtx (dest), + GEN_INT (ud1))); + } + } else { temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c b/gcc/testsuite/gcc.target/powerpc/pr106550.c new file mode 100644 index 00000000000..c6f4116bb9a --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c @@ -0,0 +1,14 @@ +/* PR target/106550 */ +/* { dg-options "-O2 -std=c99 -mdejagnu-cpu=power10" } */ + +void +foo (unsigned long long *a) +{ + *a++ = 0x020805006106003; + *a++ = 0x2351847027482577; +} + +/* 3 insns for each constant: pli+sldi+paddi or pli+pli+rldimi. + And 3 additional insns: std+std+blr: 9 insns totally. */ +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */ +