From patchwork Fri Nov 10 23:09:28 2023
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 164022
Date: Fri, 10 Nov 2023 18:09:28 -0500
From: Michael Meissner
To: Michael Meissner, gcc-patches@gcc.gnu.org, Segher Boessenkool, "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: [PATCH 1/4] Add support for floating point vector pair built-in functions
List-Id: Gcc-patches mailing list

This patch adds a series of built-in functions to allow users to write code to
do a number of simple operations where the loop is done using the __vector_pair
type.  The __vector_pair type is an opaque type.  These built-in functions keep
the two 128-bit vectors within the __vector_pair together, and split the
operation after register allocation.

This patch provides vector pair operations for 32-bit floating point and 64-bit
floating point.

I have built and tested these patches on:

    *   A little endian power10 server using --with-cpu=power10
    *   A little endian power9 server using --with-cpu=power9
    *   A big endian power9 server using --with-cpu=power9.

Can I check this patch into the master branch?

2023-11-09  Michael Meissner

gcc/

        * config/rs6000/rs6000-builtins.def (__builtin_vpair_f32_*): Add vector
        pair built-in functions for float.
        (__builtin_vpair_f64_*): Add vector pair built-in functions for double.
        * config/rs6000/rs6000-protos.h (split_unary_vector_pair): Add
        declaration.
        (split_binary_vector_pair): Likewise.
        (split_fma_vector_pair): Likewise.
        * config/rs6000/rs6000.cc (split_unary_vector_pair): New helper function
        for vector pair built-in functions.
        (split_binary_vector_pair): Likewise.
        (split_fma_vector_pair): Likewise.
        * config/rs6000/rs6000.md (toplevel): Include vector-pair.md.
        * config/rs6000/t-rs6000 (MD_INCLUDES): Add vector-pair.md.
        * config/rs6000/vector-pair.md: New file.
        * doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document the
        floating point and general vector pair built-in functions.

gcc/testsuite/

        * gcc.target/powerpc/vector-pair-1.c: New test.
        * gcc.target/powerpc/vector-pair-2.c: New test.
        * gcc.target/powerpc/vector-pair-3.c: New test.
        * gcc.target/powerpc/vector-pair-4.c: New test.
---
 gcc/config/rs6000/rs6000-builtins.def         |  52 +++
 gcc/config/rs6000/rs6000-protos.h             |   5 +
 gcc/config/rs6000/rs6000.cc                   |  74 ++++
 gcc/config/rs6000/rs6000.md                   |   1 +
 gcc/config/rs6000/t-rs6000                    |   1 +
 gcc/config/rs6000/vector-pair.md              | 329 ++++++++++++++++++
 gcc/doc/extend.texi                           |  46 +++
 .../gcc.target/powerpc/vector-pair-1.c        | 135 +++++++
 .../gcc.target/powerpc/vector-pair-2.c        | 134 +++++++
 .../gcc.target/powerpc/vector-pair-3.c        |  60 ++++
 .../gcc.target/powerpc/vector-pair-4.c        |  60 ++++
 11 files changed, 897 insertions(+)
 create mode 100644 gcc/config/rs6000/vector-pair.md
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-4.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index ce40600e803..89b248b50ef 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -4131,3 +4131,55 @@ void __builtin_vsx_stxvp (v256, unsigned long, const v256 *); STXVP nothing
{mma,pair}
+
+;; vector pair built-in functions for 8 32-bit float values
+
+  v256 __builtin_vpair_f32_abs (v256);
+    VPAIR_F32_ABS vpair_abs_v8sf2 {mma,pair}
+
+  v256 __builtin_vpair_f32_add (v256, v256);
+    VPAIR_F32_ADD vpair_add_v8sf3 {mma,pair}
+
+  v256 __builtin_vpair_f32_fma (v256, v256, v256);
+    VPAIR_F32_FMA vpair_fma_v8sf4 {mma,pair}
+
+  v256 __builtin_vpair_f32_max (v256, v256);
+    VPAIR_F32_MAX vpair_smax_v8sf3 {mma,pair}
+
+  v256 __builtin_vpair_f32_min (v256, v256);
+    VPAIR_F32_MIN vpair_smin_v8sf3 {mma,pair}
+
+  v256 __builtin_vpair_f32_mul (v256, v256);
+    VPAIR_F32_MUL vpair_mul_v8sf3 {mma,pair}
+
+  v256 __builtin_vpair_f32_neg (v256);
+    VPAIR_F32_NEG vpair_neg_v8sf2 {mma,pair}
+
+  v256 __builtin_vpair_f32_sub (v256, v256);
+    VPAIR_F32_SUB vpair_sub_v8sf3 {mma,pair}
+
+;; vector pair built-in functions for 4 64-bit double values
+
+  v256 __builtin_vpair_f64_abs (v256);
+    VPAIR_F64_ABS vpair_abs_v4df2 {mma,pair}
+
+  v256 __builtin_vpair_f64_add (v256, v256);
+    VPAIR_F64_ADD vpair_add_v4df3 {mma,pair}
+
+  v256 __builtin_vpair_f64_fma (v256, v256, v256);
+    VPAIR_F64_FMA vpair_fma_v4df4 {mma,pair}
+
+  v256 __builtin_vpair_f64_max (v256, v256);
+    VPAIR_F64_MAX vpair_smax_v4df3 {mma,pair}
+
+  v256 __builtin_vpair_f64_min (v256, v256);
+    VPAIR_F64_MIN vpair_smin_v4df3 {mma,pair}
+
+  v256 __builtin_vpair_f64_mul (v256, v256);
+    VPAIR_F64_MUL vpair_mul_v4df3 {mma,pair}
+
+  v256 __builtin_vpair_f64_neg (v256);
+    VPAIR_F64_NEG vpair_neg_v4df2 {mma,pair}
+
+  v256 __builtin_vpair_f64_sub (v256, v256);
+    VPAIR_F64_SUB vpair_sub_v4df3 {mma,pair}
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index f70118ea40f..bbd899d7562 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -138,6 +138,11 @@ extern void rs6000_emit_swsqrt (rtx, rtx, bool);
 extern void output_toc (FILE *, rtx, int, machine_mode);
 extern void rs6000_fatal_bad_address (rtx);
 extern rtx create_TOC_reference (rtx, rtx);
+extern void
split_unary_vector_pair (machine_mode, rtx [], rtx (*)(rtx, rtx));
+extern void split_binary_vector_pair (machine_mode, rtx [],
+                                      rtx (*)(rtx, rtx, rtx));
+extern void split_fma_vector_pair (machine_mode, rtx [],
+                                   rtx (*)(rtx, rtx, rtx, rtx));
 extern void rs6000_split_multireg_move (rtx, rtx);
 extern void rs6000_emit_le_vsx_permute (rtx, rtx, machine_mode);
 extern void rs6000_emit_le_vsx_move (rtx, rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index db60d3ca960..99352400197 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -27396,6 +27396,80 @@ rs6000_split_logical (rtx operands[3],
   return;
 }
 
+/* Split a unary vector pair insn into two separate vector insns.  */
+
+void
+split_unary_vector_pair (machine_mode mode,             /* vector mode.  */
+                         rtx operands[],                /* dest, src.  */
+                         rtx (*func)(rtx, rtx))         /* create insn.  */
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  machine_mode orig_mode = GET_MODE (op0);
+
+  rtx reg0_vector0 = simplify_gen_subreg (mode, op0, orig_mode, 0);
+  rtx reg1_vector0 = simplify_gen_subreg (mode, op1, orig_mode, 0);
+  rtx reg0_vector1 = simplify_gen_subreg (mode, op0, orig_mode, 16);
+  rtx reg1_vector1 = simplify_gen_subreg (mode, op1, orig_mode, 16);
+
+  emit_insn (func (reg0_vector0, reg1_vector0));
+  emit_insn (func (reg0_vector1, reg1_vector1));
+  return;
+}
+
+/* Split a binary vector pair insn into two separate vector insns.  */
+
+void
+split_binary_vector_pair (machine_mode mode,            /* vector mode.  */
+                          rtx operands[],               /* dest, src.  */
+                          rtx (*func)(rtx, rtx, rtx))   /* create insn.
*/
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  machine_mode orig_mode = GET_MODE (op0);
+
+  rtx reg0_vector0 = simplify_gen_subreg (mode, op0, orig_mode, 0);
+  rtx reg1_vector0 = simplify_gen_subreg (mode, op1, orig_mode, 0);
+  rtx reg2_vector0 = simplify_gen_subreg (mode, op2, orig_mode, 0);
+  rtx reg0_vector1 = simplify_gen_subreg (mode, op0, orig_mode, 16);
+  rtx reg1_vector1 = simplify_gen_subreg (mode, op1, orig_mode, 16);
+  rtx reg2_vector1 = simplify_gen_subreg (mode, op2, orig_mode, 16);
+
+  emit_insn (func (reg0_vector0, reg1_vector0, reg2_vector0));
+  emit_insn (func (reg0_vector1, reg1_vector1, reg2_vector1));
+  return;
+}
+
+/* Split a fused multiply-add vector pair insn into two separate vector
+   insns.  */
+
+void
+split_fma_vector_pair (machine_mode mode,               /* vector mode.  */
+                       rtx operands[],                  /* dest, src.  */
+                       rtx (*func)(rtx, rtx, rtx, rtx)) /* create insn.  */
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  rtx op3 = operands[3];
+  machine_mode orig_mode = GET_MODE (op0);
+
+  rtx reg0_vector0 = simplify_gen_subreg (mode, op0, orig_mode, 0);
+  rtx reg1_vector0 = simplify_gen_subreg (mode, op1, orig_mode, 0);
+  rtx reg2_vector0 = simplify_gen_subreg (mode, op2, orig_mode, 0);
+  rtx reg3_vector0 = simplify_gen_subreg (mode, op3, orig_mode, 0);
+
+  rtx reg0_vector1 = simplify_gen_subreg (mode, op0, orig_mode, 16);
+  rtx reg1_vector1 = simplify_gen_subreg (mode, op1, orig_mode, 16);
+  rtx reg2_vector1 = simplify_gen_subreg (mode, op2, orig_mode, 16);
+  rtx reg3_vector1 = simplify_gen_subreg (mode, op3, orig_mode, 16);
+
+  emit_insn (func (reg0_vector0, reg1_vector0, reg2_vector0, reg3_vector0));
+  emit_insn (func (reg0_vector1, reg1_vector1, reg2_vector1, reg3_vector1));
+  return;
+}
+
 /* Emit instructions to move SRC to DST.  Called by splitters for multi-register moves.
It will emit at most one instruction for each register that is accessed; that is, it won't emit li/lis pairs
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index dcf1f3526f5..5a17adc1bc3 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -15767,6 +15767,7 @@ (define_insn "hashchk"
 (include "vsx.md")
 (include "altivec.md")
 (include "mma.md")
+(include "vector-pair.md")
 (include "dfp.md")
 (include "crypto.md")
 (include "htm.md")
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index f183b42ce1d..5fc89499795 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -128,6 +128,7 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
 	$(srcdir)/config/rs6000/vsx.md \
 	$(srcdir)/config/rs6000/altivec.md \
 	$(srcdir)/config/rs6000/mma.md \
+	$(srcdir)/config/rs6000/vector-pair.md \
 	$(srcdir)/config/rs6000/crypto.md \
 	$(srcdir)/config/rs6000/htm.md \
 	$(srcdir)/config/rs6000/dfp.md \
diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md
new file mode 100644
index 00000000000..2dcac6a31e2
--- /dev/null
+++ b/gcc/config/rs6000/vector-pair.md
@@ -0,0 +1,329 @@
+;; Vector pair arithmetic support.
+;; Copyright (C) 2020-2023 Free Software Foundation, Inc.
+;; Contributed by Peter Bergner and
+;;                Michael Meissner
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+;;
+;; This file adds support for doing vector operations on pairs of vector
+;; registers.  Most of the instructions use vector pair instructions to load
+;; and possibly store registers, but splitting the operation after register
+;; allocation to do 2 separate operations.  The second scheduler pass can
+;; interleave other instructions between these pairs of instructions if
+;; possible.
+
+(define_c_enum "unspec"
+  [UNSPEC_VPAIR_V4DF
+   UNSPEC_VPAIR_V8SF
+   ])
+
+;; Iterator doing unary/binary arithmetic on vector pairs
+(define_code_iterator VP_FP_UNARY  [abs neg])
+(define_code_iterator VP_FP_BINARY [minus mult plus smin smax])
+
+;; Return the insn name from the VP_* code iterator
+(define_code_attr vp_insn [(abs   "abs")
+                           (minus "sub")
+                           (mult  "mul")
+                           (neg   "neg")
+                           (plus  "add")
+                           (smin  "smin")
+                           (smax  "smax")
+                           (xor   "xor")])
+
+;; Iterator for creating the unspecs for vector pair built-ins
+(define_int_iterator VP_FP [UNSPEC_VPAIR_V4DF
+                            UNSPEC_VPAIR_V8SF])
+
+;; Map VP_* to vector mode of the arguments after they are split
+(define_int_attr VP_VEC_MODE [(UNSPEC_VPAIR_V4DF "V2DF")
+                              (UNSPEC_VPAIR_V8SF "V4SF")])
+
+;; Map VP_* to a lower case name to identify the vector pair.
+(define_int_attr vp_pmode [(UNSPEC_VPAIR_V4DF "v4df")
+                           (UNSPEC_VPAIR_V8SF "v8sf")])
+
+;; Map VP_* to a lower case name to identify the vector after the vector pair
+;; has been split.
+(define_int_attr vp_vmode [(UNSPEC_VPAIR_V4DF "v2df")
+                           (UNSPEC_VPAIR_V8SF "v4sf")])
+
+
+;; Vector pair floating point unary operations
+(define_insn_and_split "vpair_<vp_insn>_<vp_pmode>2"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+        (unspec:OO [(VP_FP_UNARY:OO
+                     (match_operand:OO 1 "vsx_register_operand" "wa"))]
+                   VP_FP))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_unary_vector_pair (<VP_VEC_MODE>mode, operands,
+                           gen_<vp_insn><vp_vmode>2);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair negate of absolute value
+(define_insn_and_split "vpair_nabs_<vp_pmode>2"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+        (unspec:OO
+         [(neg:OO
+           (unspec:OO
+            [(abs:OO (match_operand:OO 1 "vsx_register_operand" "ww"))]
+            VP_FP))]
+         VP_FP))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_unary_vector_pair (<VP_VEC_MODE>mode, operands,
+                           gen_vsx_nabs<vp_vmode>2);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair floating binary operations
+(define_insn_and_split "vpair_<vp_insn>_<vp_pmode>3"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+        (unspec:OO [(VP_FP_BINARY:OO
+                     (match_operand:OO 1 "vsx_register_operand" "wa")
+                     (match_operand:OO 2 "vsx_register_operand" "wa"))]
+                   VP_FP))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VP_VEC_MODE>mode, operands,
+                            gen_<vp_insn><vp_vmode>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair fused multiply-add floating point operations
+(define_insn_and_split "vpair_fma_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+        (unspec:OO
+         [(fma:OO
+           (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+           (match_operand:OO 2 "vsx_register_operand" "wa,0")
+           (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+         VP_FP))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_fma_vector_pair (<VP_VEC_MODE>mode, operands,
+                         gen_fma<vp_vmode>4);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+(define_insn_and_split "vpair_fms_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+        (unspec:OO
+         [(fma:OO
+           (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+           (match_operand:OO 2 "vsx_register_operand" "wa,0")
+           (unspec:OO
+            [(neg:OO (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+            VP_FP))]
+         VP_FP))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_fma_vector_pair (<VP_VEC_MODE>mode, operands,
+                         gen_fms<vp_vmode>4);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+(define_insn_and_split "vpair_nfma_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+        (unspec:OO
+         [(neg:OO
+           (unspec:OO
+            [(fma:OO
+              (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+              (match_operand:OO 2 "vsx_register_operand" "wa,0")
+              (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+            VP_FP))]
+         VP_FP))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_fma_vector_pair (<VP_VEC_MODE>mode, operands,
+                         gen_nfma<vp_vmode>4);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+(define_insn_and_split "vpair_nfms_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+        (unspec:OO
+         [(neg:OO
+           (unspec:OO
+            [(fma:OO
+              (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+              (match_operand:OO 2 "vsx_register_operand" "wa,0")
+              (unspec:OO
+               [(neg:OO (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+               VP_FP))]
+            VP_FP))]
+         VP_FP))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_fma_vector_pair (<VP_VEC_MODE>mode, operands,
+                         gen_nfms<vp_vmode>4);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair (a * b) + c into vector pair fma (a, b, c).
+(define_insn_and_split "*vpair_fma_fpcontract_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+        (unspec:OO
+         [(plus:OO
+           (unspec:OO
+            [(mult:OO
+              (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+              (match_operand:OO 2 "vsx_register_operand" "wa,0"))]
+            VP_FP)
+           (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+         VP_FP))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (unspec:OO
+         [(fma:OO
+           (match_dup 1)
+           (match_dup 2)
+           (match_dup 3))]
+         VP_FP))]
+{
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair (a * b) - c into vector pair fma (a, b, -c)
+(define_insn_and_split "*vpair_fms_fpcontract_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+        (unspec:OO
+         [(minus:OO
+           (unspec:OO
+            [(mult:OO
+              (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+              (match_operand:OO 2 "vsx_register_operand" "wa,0"))]
+            VP_FP)
+           (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+         VP_FP))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (unspec:OO
+         [(fma:OO
+           (match_dup 1)
+           (match_dup 2)
+           (unspec:OO
+            [(neg:OO
+              (match_dup 3))]
+            VP_FP))]
+         VP_FP))]
+{
+}
+  [(set_attr "length" "8")])
+
+
+;; Optimize vector pair -((a * b) + c) into vector pair -fma (a, b, c).
+(define_insn_and_split "*vpair_nfma_fpcontract_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+        (unspec:OO
+         [(neg:OO
+           (unspec:OO
+            [(plus:OO
+              (unspec:OO
+               [(mult:OO
+                 (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+                 (match_operand:OO 2 "vsx_register_operand" "wa,0"))]
+               VP_FP)
+              (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+            VP_FP))]
+         VP_FP))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (unspec:OO
+         [(neg:OO
+           (unspec:OO
+            [(fma:OO
+              (match_dup 1)
+              (match_dup 2)
+              (match_dup 3))]
+            VP_FP))]
+         VP_FP))]
+{
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair -((a * b) - c) into vector pair -fma (a, b, -c)
+(define_insn_and_split "*vpair_nfms_fpcontract_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+        (unspec:OO
+         [(neg:OO
+           (unspec:OO
+            [(minus:OO
+              (unspec:OO
+               [(mult:OO
+                 (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+                 (match_operand:OO 2 "vsx_register_operand" "wa,0"))]
+               VP_FP)
+              (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+            VP_FP))]
+         VP_FP))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (unspec:OO
+         [(neg:OO
+           (unspec:OO
+            [(fma:OO
+              (match_dup 1)
+              (match_dup 2)
+              (unspec:OO
+               [(neg:OO
+                 (match_dup 3))]
+               VP_FP))]
+            VP_FP))]
+         VP_FP))]
+{
+}
+  [(set_attr "length" "8")])
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7cdfdf8c83b..a830ad06b90 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -15038,6 +15038,7 @@ instructions, but allow the compiler to schedule those calls.
 * NDS32 Built-in Functions::
 * Nvidia PTX Built-in Functions::
 * Basic PowerPC Built-in Functions::
+* PowerPC Vector Pair Built-in Functions Available on ISA 3.1::
 * PowerPC AltiVec/VSX Built-in Functions::
 * PowerPC Hardware Transactional Memory Built-in Functions::
 * PowerPC Atomic Memory Operation Functions::
@@ -21368,6 +21369,51 @@ int vec_any_le (vector unsigned __int128, vector unsigned __int128);
 @end smallexample
 
+@node PowerPC Vector Pair Built-in Functions Available on ISA 3.1
+@subsection PowerPC Vector Pair Built-in Functions Available on ISA 3.1
+
+GCC provides functions to speed up processing by using the type
+@code{__vector_pair} to hold two 128-bit vectors on processors that
+support ISA 3.1 (power10).  The @code{__vector_pair} type and the
+vector pair built-in functions require the MMA instruction set
+(@option{-mmma}) to be enabled, which is on by default for
+@option{-mcpu=power10}.
+
+By default, @code{__vector_pair} types are loaded into vectors with a
+single load vector pair instruction.  The processing for the built-in
+function is done as two separate vector instructions on each of the
+two 128-bit vectors stored in the vector pair.  The
+@code{__vector_pair} type is usually stored with a single vector pair
+store instruction.
+
+The following built-in functions operate on pairs of
+@code{vector float} values:
+
+@smallexample
+__vector_pair __builtin_vpair_f32_abs (__vector_pair);
+__vector_pair __builtin_vpair_f32_add (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f32_fma (__vector_pair, __vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f32_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f32_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f32_mul (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f32_neg (__vector_pair);
+__vector_pair __builtin_vpair_f32_sub (__vector_pair, __vector_pair);
+@end smallexample
+
+The following built-in functions operate on pairs of
+@code{vector double} values:
+
+@smallexample
+__vector_pair __builtin_vpair_f64_abs (__vector_pair);
+__vector_pair __builtin_vpair_f64_add (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f64_fma (__vector_pair, __vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f64_mul (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f64_neg (__vector_pair);
+__vector_pair __builtin_vpair_f64_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f64_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f64_sub (__vector_pair, __vector_pair);
+@end smallexample
+
 @node PowerPC Hardware Transactional Memory Built-in Functions
 @subsection PowerPC Hardware Transactional Memory Built-in Functions
 GCC provides two interfaces for accessing the Hardware Transactional
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-1.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-1.c
new file mode 100644
index 00000000000..e74840cebc0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-1.c
@@ -0,0 +1,135 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector built-in code generates the expected
instructions for
+   vector pairs with 4 double elements.  */
+
+void
+test_add (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvadddp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_add (*x, *y);
+}
+
+void
+test_sub (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvsubdp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_sub (*x, *y);
+}
+
+void
+test_multiply (__vector_pair *dest,
+               __vector_pair *x,
+               __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvmuldp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_mul (*x, *y);
+}
+
+void
+test_min (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvmindp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_min (*x, *y);
+}
+
+void
+test_max (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvmaxdp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_max (*x, *y);
+}
+
+void
+test_negate (__vector_pair *dest,
+             __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvnegdp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_neg (*x);
+}
+
+void
+test_abs (__vector_pair *dest,
+          __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvabsdp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_abs (*x);
+}
+
+void
+test_negative_abs (__vector_pair *dest,
+                   __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvnabsdp, 1 stxvp.  */
+  __vector_pair ab = __builtin_vpair_f64_abs (*x);
+  *dest = __builtin_vpair_f64_neg (ab);
+}
+
+void
+test_fma (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y,
+          __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvmadd{a,m}dp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_fma (*x, *y, *z);
+}
+
+void
+test_fms (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y,
+          __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvmsub{a,m}dp, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_f64_neg (*z);
+  *dest = __builtin_vpair_f64_fma (*x, *y, n);
+}
+
+void
+test_nfma (__vector_pair *dest,
+           __vector_pair *x,
+           __vector_pair *y,
+           __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvnmadd{a,m}dp, 1 stxvp.
*/ + __vector_pair w = __builtin_vpair_f64_fma (*x, *y, *z); + *dest = __builtin_vpair_f64_neg (w); +} + +void +test_nfms (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvnmsub{a,m}dp, 1 stxvp. */ + __vector_pair n = __builtin_vpair_f64_neg (*z); + __vector_pair w = __builtin_vpair_f64_fma (*x, *y, n); + *dest = __builtin_vpair_f64_neg (w); +} + +/* { dg-final { scan-assembler-times {\mlxvp\M} 25 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 12 } } */ +/* { dg-final { scan-assembler-times {\mxvabsdp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvadddp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmadd.dp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmindp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmsub.dp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmuldp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnabsdp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnegdp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmadd.dp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmsub.dp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvsubdp\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-2.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-2.c new file mode 100644 index 00000000000..2facb727053 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-2.c @@ -0,0 +1,134 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test whether the vector built-in code generates the expected instructions for + vector pairs with 8 float elements. */ + +void +test_add (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvaddsp, 1 stxvp.
*/ + *dest = __builtin_vpair_f32_add (*x, *y); +} + +void +test_sub (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvsubsp, 1 stxvp. */ + *dest = __builtin_vpair_f32_sub (*x, *y); +} + +void +test_multiply (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvmulsp, 1 stxvp. */ + *dest = __builtin_vpair_f32_mul (*x, *y); +} + +void +test_max (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvmaxsp, 1 stxvp. */ + *dest = __builtin_vpair_f32_max (*x, *y); +} + +void +test_min (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xvminsp, 1 stxvp. */ + *dest = __builtin_vpair_f32_min (*x, *y); +} + +void +test_negate (__vector_pair *dest, + __vector_pair *x) +{ + /* 1 lxvp, 2 xvnegsp, 1 stxvp. */ + *dest = __builtin_vpair_f32_neg (*x); +} + +void +test_abs (__vector_pair *dest, + __vector_pair *x) +{ + /* 1 lxvp, 2 xvabssp, 1 stxvp. */ + *dest = __builtin_vpair_f32_abs (*x); +} + +void +test_negative_abs (__vector_pair *dest, + __vector_pair *x) +{ + /* 1 lxvp, 2 xvnabssp, 1 stxvp. */ + __vector_pair ab = __builtin_vpair_f32_abs (*x); + *dest = __builtin_vpair_f32_neg (ab); +} + +void +test_fma (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvmadd{a,m}sp, 1 stxvp. */ + *dest = __builtin_vpair_f32_fma (*x, *y, *z); +} + +void +test_fms (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvmsub{a,m}sp, 1 stxvp. */ + __vector_pair n = __builtin_vpair_f32_neg (*z); + *dest = __builtin_vpair_f32_fma (*x, *y, n); +} + +void +test_nfma (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvnmadd{a,m}sp, 1 stxvp.
*/ + __vector_pair w = __builtin_vpair_f32_fma (*x, *y, *z); + *dest = __builtin_vpair_f32_neg (w); +} + +void +test_nfms (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvnmsub{a,m}sp, 1 stxvp. */ + __vector_pair n = __builtin_vpair_f32_neg (*z); + __vector_pair w = __builtin_vpair_f32_fma (*x, *y, n); + *dest = __builtin_vpair_f32_neg (w); +} + +/* { dg-final { scan-assembler-times {\mlxvp\M} 25 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 12 } } */ +/* { dg-final { scan-assembler-times {\mxvabssp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvaddsp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmadd.sp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvminsp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmsub.sp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmulsp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnabssp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnegsp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmadd.sp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmsub.sp\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-3.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-3.c new file mode 100644 index 00000000000..65bfc44f85d --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-3.c @@ -0,0 +1,60 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -Ofast" } */ + +/* Test whether the vector built-in code combines multiply, add/subtract, and + negate operations to the appropriate fused multiply-add instruction for + vector pairs with 4 double elements. */ + +void +test_fma (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvmadd{a,m}dp, 1 stxvp.
*/ + __vector_pair m = __builtin_vpair_f64_mul (*x, *y); + *dest = __builtin_vpair_f64_add (m, *z); +} + +void +test_fms (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvmsub{a,m}dp, 1 stxvp. */ + __vector_pair m = __builtin_vpair_f64_mul (*x, *y); + *dest = __builtin_vpair_f64_sub (m, *z); +} + +void +test_nfma (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvnmadd{a,m}dp, 1 stxvp. */ + __vector_pair m = __builtin_vpair_f64_mul (*x, *y); + __vector_pair w = __builtin_vpair_f64_add (m, *z); + *dest = __builtin_vpair_f64_neg (w); +} + +void +test_nfms (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvnmsub{a,m}dp, 1 stxvp. */ + __vector_pair m = __builtin_vpair_f64_mul (*x, *y); + __vector_pair w = __builtin_vpair_f64_sub (m, *z); + *dest = __builtin_vpair_f64_neg (w); +} + +/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxvmadd.dp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmsub.dp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmadd.dp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmsub.dp\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-4.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-4.c new file mode 100644 index 00000000000..b62871be1fd --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-4.c @@ -0,0 +1,60 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -Ofast" } */ + +/* Test whether the vector built-in code combines multiply, add/subtract, and + negate operations to the appropriate fused multiply-add instruction for + vector pairs with 8 float elements.
*/ + +void +test_fma (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvmadd{a,m}sp, 1 stxvp. */ + __vector_pair m = __builtin_vpair_f32_mul (*x, *y); + *dest = __builtin_vpair_f32_add (m, *z); +} + +void +test_fms (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvmsub{a,m}sp, 1 stxvp. */ + __vector_pair m = __builtin_vpair_f32_mul (*x, *y); + *dest = __builtin_vpair_f32_sub (m, *z); +} + +void +test_nfma (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvnmadd{a,m}sp, 1 stxvp. */ + __vector_pair m = __builtin_vpair_f32_mul (*x, *y); + __vector_pair w = __builtin_vpair_f32_add (m, *z); + *dest = __builtin_vpair_f32_neg (w); +} + +void +test_nfms (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y, + __vector_pair *z) +{ + /* 3 lxvp, 2 xvnmsub{a,m}sp, 1 stxvp. */ + __vector_pair m = __builtin_vpair_f32_mul (*x, *y); + __vector_pair w = __builtin_vpair_f32_sub (m, *z); + *dest = __builtin_vpair_f32_neg (w); +} + +/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxvmadd.sp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvmsub.sp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmadd.sp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvnmsub.sp\M} 2 } } */ From patchwork Fri Nov 10 23:11:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 164025 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b129:0:b0:403:3b70:6f57 with SMTP id q9csp1440574vqs; Fri, 10 Nov 2023 15:16:27 -0800 (PST) X-Google-Smtp-Source: AGHT+IE8K5k4zzPfIp3X1CE6eJyD+5rAzD7pwgPy8ewcMM89sHrWZVwcjXMUdKq3c1ZfvGoiaB8H X-Received: by 2002:a05:6358:7209:b0:168:ff1b:8f59 with SMTP id
h9-20020a056358720900b00168ff1b8f59mr644868rwa.4.1699658186983; Fri, 10 Nov 2023 15:16:26 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1699658186; cv=pass; d=google.com; s=arc-20160816; b=ik+on+Spvm8jIVDWskPwcDFzmlSaF7Ls1DeG/0J3gKyzjp22BtyroOOGyGDwh3t3Qi drhG7/AbzTytN+ymedRcv64Tks2hxQQynHNW6qoOE7WIrp7inRpYEBf/OJo8+2LmIS8n 7xeyF/D1XtknKCv+SxjBDrBN7QgWr5rW1SRjqjTKDbN5IWqQYepJKezOSFmqu7EpWRSw wsfgDc1NIZj+iM1Jx0B2OJJGNYodMBSJPV2QCEZvm18CyG2lTOnDzQiNU0v7DIaWULKK LlOHd2kkKMRrOIiDDG70en4dSUozP1Jyb2mZAErapP+9HOTq6UAQjiLtrqNgwVpmzlBR c9KA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:in-reply-to:content-disposition :mime-version:references:mail-followup-to:message-id:subject:to:from :date:dkim-signature:arc-filter:dmarc-filter:delivered-to; bh=/A++58z6q3ji2B0Q3+eLmJskKgZ8sZfqzRHc1NhN400=; fh=jH+DijE7mz3ySVsRmzRqEe/ioBeGu3vnvA+jm2JjCm8=; b=mxkm4Jkwu9htNBRQ7Z+NaIHMnIbqPYIQWs6+ekTvjVgiZbFVAZ7Y6xCuRRwHMx45JR WJJTl334JHpGOsnOgR5Uifgbzujs5Fvvi8kh015nKbvZtaWgP6QPqbvD19KhOnEv2FJF onwLtm+PGCAfnmrkybaKczAO8P2FK9ISxj/Rq2gKbB1jO/d/HH5O06MOtp0bbkXz/MLG 6uG72w/aKevnW4A9yADuP1C7/1OoLXMwh7MQJyKvS+vMUARIM1HZNaOj5NdBI4nLMzRh 5VJjJ6C4MezsM01TnhTA6P4bRUZPW6dd/mlSEywMct3PiT6ut9/bCGh9Qn9ez8cqShr9 in8g== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=qVJV9ec+; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from server2.sourceware.org (server2.sourceware.org. 
[2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id v4-20020ac85784000000b00418154b60e5si461210qta.540.2023.11.10.15.16.26 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 10 Nov 2023 15:16:26 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=qVJV9ec+; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B5EB13858407 for ; Fri, 10 Nov 2023 23:16:26 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id B07E73858D37 for ; Fri, 10 Nov 2023 23:15:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B07E73858D37 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org B07E73858D37 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699658162; cv=none; b=IE8Qc8GnSkW05L8zyfXOfjQEBGOmUI9NqXJ3cztyOsw1xqRd1toT/vnskECg4tk0AJwIrepSt0OdT+dCC60EYlkwXdsF3aQFDHDlipA9OzYmNs3R5w7MhYmBhe5998Cp74DGT+sGVi0/lHtYAb2K14mznkrPxvzdjLVVRBtte7M= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699658162; c=relaxed/simple; 
bh=16bC0KXttu0U0NWnaZxjWK4dI5bV1Rx6wdbXjLAE6jw=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=wiuB0sH/wCp2eDRld3YqnJbygyoRk5zBaIIvh/zyrIHPWNIxz2u8Wrf5R3nnYTz6/UnZrI577rDkGoB3gBRfG3N3Qn1z22XnlR2ScyQQOOoDyq2qlVWgT/mn49rZuEq6X6U+Tq4Yyb5O6MYaVnVNeIs9UD/sVbQ4OSU8GQOf7+Q= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0356517.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3AAMlPs2005079; Fri, 10 Nov 2023 23:15:56 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=date : from : to : subject : message-id : references : mime-version : content-type : in-reply-to; s=pp1; bh=/A++58z6q3ji2B0Q3+eLmJskKgZ8sZfqzRHc1NhN400=; b=qVJV9ec+inQwd9GtBj6F3NOpGq5KHVPMzSGhgkF/TpsqV3Q4OV5YIHRw0J4zQJzRZqqL mvtpaXqjZ5rt0p82zKTzTJySwtqyNP71G4Tsy26jJiQOE59Y0W34ZYjGYYCzosbDfnN6 Or17rNlJi3QwGKSrJAzwmGbPmikzBmA8vjZLuaBYm1Ay5Zj4th3S6JWGYyhkM+gu1xKH o5Q07U+N3YH6SlnctffOAH7TXwWz9qUFgQ7cUA/6W5PR9mX+/FwVmmLjjJzxvGPx+jLD G5hjl2rMtjnczKeqAE8LFDrGcSJZYErD0QK3zGLrLEnTVbUZjnMqrmp0KK3wjsKjL4j2 XA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3u9wm78naa-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 10 Nov 2023 23:15:55 +0000 Received: from m0356517.ppops.net (m0356517.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 3AANFrWf019649; Fri, 10 Nov 2023 23:15:53 GMT Received: from ppma21.wdc07v.mail.ibm.com (5b.69.3da9.ip4.static.sl-reverse.com [169.61.105.91]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3u9wm78mp0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 10 Nov 2023 23:15:52 +0000 Received: from pps.filterd (ppma21.wdc07v.mail.ibm.com [127.0.0.1]) by ppma21.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 3AALpikU000662; Fri, 10 Nov 2023 23:11:23 GMT Received: from smtprelay05.dal12v.mail.ibm.com ([172.16.1.7]) by 
ppma21.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3u7w23eauu-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 10 Nov 2023 23:11:23 +0000 Received: from smtpav02.dal12v.mail.ibm.com (smtpav02.dal12v.mail.ibm.com [10.241.53.101]) by smtprelay05.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 3AANBMIx5898770 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 10 Nov 2023 23:11:22 GMT Received: from smtpav02.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B43405805A; Fri, 10 Nov 2023 23:11:22 +0000 (GMT) Received: from smtpav02.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 333F758051; Fri, 10 Nov 2023 23:11:22 +0000 (GMT) Received: from cowardly-lion.the-meissners.org (unknown [9.61.104.206]) by smtpav02.dal12v.mail.ibm.com (Postfix) with ESMTPS; Fri, 10 Nov 2023 23:11:22 +0000 (GMT) Date: Fri, 10 Nov 2023 18:11:20 -0500 From: Michael Meissner To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner Subject: [PATCH 2/4] Add support for integer vector pair built-ins Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-GUID: hBgw1IdO05P4x7GSH96QBbtUK21JySyX X-Proofpoint-ORIG-GUID: CJnJIf1aQEDmcraOBbN44GYSeF1BM1UB X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.987,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-11-10_21,2023-11-09_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 impostorscore=0 mlxscore=0 adultscore=0 lowpriorityscore=0 bulkscore=0 clxscore=1015 phishscore=0 spamscore=0 mlxlogscore=999 priorityscore=1501 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2311060000 
definitions=main-2311100192 X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1782220783293340424 X-GMAIL-MSGID: 1782220783293340424

This patch adds a series of built-in functions that let users write loops performing a number of simple operations on values of the __vector_pair type. The __vector_pair type is an opaque type. These built-in functions keep the two 128-bit vectors within the __vector_pair together, and split the operation after register allocation.

This patch provides vector pair operations for 8, 16, 32, and 64-bit integers.

I have built and tested these patches on:

* A little endian power10 server using --with-cpu=power10
* A little endian power9 server using --with-cpu=power9
* A big endian power9 server using --with-cpu=power9.

Can I check this patch into the master branch after the preceding patch is checked in?

2023-11-09 Michael Meissner

gcc/

	* config/rs6000/rs6000-builtins.def (__builtin_vpair_i8*): Add built-in functions for integer vector pairs.
	(__builtin_vpair_i16*): Likewise.
	(__builtin_vpair_i32*): Likewise.
	(__builtin_vpair_i64*): Likewise.
	* config/rs6000/vector-pair.md (UNSPEC_VPAIR_V32QI): New unspec.
	(UNSPEC_VPAIR_V16HI): Likewise.
	(UNSPEC_VPAIR_V8SI): Likewise.
	(UNSPEC_VPAIR_V4DI): Likewise.
	(VP_INT_BINARY): New iterator for integer vector pair.
	(vp_insn): Add support for integer vector pairs.
	(vp_ireg): New code attribute for integer vector pairs.
	(vp_ipredicate): Likewise.
	(VP_INT): New int iterator for integer vector pairs.
	(VP_VEC_MODE): Likewise.
	(vp_pmode): Likewise.
	(vp_vmode): Likewise.
	(vp_neg_reg): New int iterator for integer vector pairs.
	(vpair_neg_): Add integer vector pair support insns.
	(vpair_not_2): Likewise.
	(vpair__3): Likewise.
	(vpair_andc_): Likewise.
	(vpair_nand__1): Likewise.
	(vpair_nand__2): Likewise.
	(vpair_nor__1): Likewise.
	(vpair_nor__2): Likewise.
	* doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document the integer vector pair built-in functions.

gcc/testsuite/

	* gcc.target/powerpc/vector-pair-5.c: New test.
	* gcc.target/powerpc/vector-pair-6.c: New test.
	* gcc.target/powerpc/vector-pair-7.c: New test.
	* gcc.target/powerpc/vector-pair-8.c: New test.
---
 gcc/config/rs6000/rs6000-builtins.def | 144 +++++++++
 gcc/config/rs6000/vector-pair.md | 280 +++++++++++++++++-
 gcc/doc/extend.texi | 72 +++++
 .../gcc.target/powerpc/vector-pair-5.c | 193 ++++++++++++
 .../gcc.target/powerpc/vector-pair-6.c | 193 ++++++++++++
 .../gcc.target/powerpc/vector-pair-7.c | 193 ++++++++++++
 .../gcc.target/powerpc/vector-pair-8.c | 194 ++++++++++++
 7 files changed, 1266 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-5.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-6.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-7.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-8.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 89b248b50ef..3b2db39c1ab 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -4183,3 +4183,147 @@ v256 __builtin_vpair_f64_sub (v256, v256); VPAIR_F64_SUB vpair_sub_v4df3 {mma,pair} + +;; vector pair built-in functions for 32 8-bit unsigned char or +;; signed char values + + v256 __builtin_vpair_i8_add (v256, v256); + VPAIR_I8_ADD
vpair_add_v32qi3 {mma,pair} + + v256 __builtin_vpair_i8_and (v256, v256); + VPAIR_I8_AND vpair_and_v32qi3 {mma,pair} + + v256 __builtin_vpair_i8_ior (v256, v256); + VPAIR_I8_IOR vpair_ior_v32qi3 {mma,pair} + + v256 __builtin_vpair_i8_max (v256, v256); + VPAIR_I8_MAX vpair_smax_v32qi3 {mma,pair} + + v256 __builtin_vpair_i8_min (v256, v256); + VPAIR_I8_MIN vpair_smin_v32qi3 {mma,pair} + + v256 __builtin_vpair_i8_neg (v256); + VPAIR_I8_NEG vpair_neg_v32qi2 {mma,pair} + + v256 __builtin_vpair_i8_not (v256); + VPAIR_I8_NOT vpair_not_v32qi2 {mma,pair} + + v256 __builtin_vpair_i8_sub (v256, v256); + VPAIR_I8_SUB vpair_sub_v32qi3 {mma,pair} + + v256 __builtin_vpair_i8_xor (v256, v256); + VPAIR_I8_XOR vpair_xor_v32qi3 {mma,pair} + + v256 __builtin_vpair_i8u_max (v256, v256); + VPAIR_I8U_MAX vpair_umax_v32qi3 {mma,pair} + + v256 __builtin_vpair_i8u_min (v256, v256); + VPAIR_I8U_MIN vpair_umin_v32qi3 {mma,pair} + +;; vector pair built-in functions for 16 16-bit unsigned short or +;; signed short values + + v256 __builtin_vpair_i16_add (v256, v256); + VPAIR_I16_ADD vpair_add_v16hi3 {mma,pair} + + v256 __builtin_vpair_i16_and (v256, v256); + VPAIR_I16_AND vpair_and_v16hi3 {mma,pair} + + v256 __builtin_vpair_i16_ior (v256, v256); + VPAIR_I16_IOR vpair_ior_v16hi3 {mma,pair} + + v256 __builtin_vpair_i16_max (v256, v256); + VPAIR_I16_MAX vpair_smax_v16hi3 {mma,pair} + + v256 __builtin_vpair_i16_min (v256, v256); + VPAIR_I16_MIN vpair_smin_v16hi3 {mma,pair} + + v256 __builtin_vpair_i16_neg (v256); + VPAIR_I16_NEG vpair_neg_v16hi2 {mma,pair} + + v256 __builtin_vpair_i16_not (v256); + VPAIR_I16_NOT vpair_not_v16hi2 {mma,pair} + + v256 __builtin_vpair_i16_sub (v256, v256); + VPAIR_I16_SUB vpair_sub_v16hi3 {mma,pair} + + v256 __builtin_vpair_i16_xor (v256, v256); + VPAIR_I16_XOR vpair_xor_v16hi3 {mma,pair} + + v256 __builtin_vpair_i16u_max (v256, v256); + VPAIR_I16U_MAX vpair_umax_v16hi3 {mma,pair} + + v256 __builtin_vpair_i16u_min (v256, v256); + VPAIR_I16U_MIN vpair_umin_v16hi3 
{mma,pair} + +;; vector pair built-in functions for 8 32-bit unsigned int or +;; signed int values + + v256 __builtin_vpair_i32_add (v256, v256); + VPAIR_I32_ADD vpair_add_v8si3 {mma,pair} + + v256 __builtin_vpair_i32_and (v256, v256); + VPAIR_I32_AND vpair_and_v8si3 {mma,pair} + + v256 __builtin_vpair_i32_ior (v256, v256); + VPAIR_I32_IOR vpair_ior_v8si3 {mma,pair} + + v256 __builtin_vpair_i32_max (v256, v256); + VPAIR_I32_MAX vpair_smax_v8si3 {mma,pair} + + v256 __builtin_vpair_i32_min (v256, v256); + VPAIR_I32_MIN vpair_smin_v8si3 {mma,pair} + + v256 __builtin_vpair_i32_neg (v256); + VPAIR_I32_NEG vpair_neg_v8si2 {mma,pair} + + v256 __builtin_vpair_i32_not (v256); + VPAIR_I32_NOT vpair_not_v8si2 {mma,pair} + + v256 __builtin_vpair_i32_sub (v256, v256); + VPAIR_I32_SUB vpair_sub_v8si3 {mma,pair} + + v256 __builtin_vpair_i32_xor (v256, v256); + VPAIR_I32_XOR vpair_xor_v8si3 {mma,pair} + + v256 __builtin_vpair_i32u_max (v256, v256); + VPAIR_I32U_MAX vpair_umax_v8si3 {mma,pair} + + v256 __builtin_vpair_i32u_min (v256, v256); + VPAIR_I32U_MIN vpair_umin_v8si3 {mma,pair} + +;; vector pair built-in functions for 4 64-bit unsigned long long or +;; signed long long values + + v256 __builtin_vpair_i64_add (v256, v256); + VPAIR_I64_ADD vpair_add_v4di3 {mma,pair} + + v256 __builtin_vpair_i64_and (v256, v256); + VPAIR_I64_AND vpair_and_v4di3 {mma,pair} + + v256 __builtin_vpair_i64_ior (v256, v256); + VPAIR_I64_IOR vpair_ior_v4di3 {mma,pair} + + v256 __builtin_vpair_i64_max (v256, v256); + VPAIR_I64_MAX vpair_smax_v4di3 {mma,pair} + + v256 __builtin_vpair_i64_min (v256, v256); + VPAIR_I64_MIN vpair_smin_v4di3 {mma,pair} + + v256 __builtin_vpair_i64_neg (v256); + VPAIR_I64_NEG vpair_neg_v4di2 {mma,pair} + + v256 __builtin_vpair_i64_not (v256); + VPAIR_I64_NOT vpair_not_v4di2 {mma,pair} + + v256 __builtin_vpair_i64_sub (v256, v256); + VPAIR_I64_SUB vpair_sub_v4di3 {mma,pair} + + v256 __builtin_vpair_i64_xor (v256, v256); + VPAIR_I64_XOR vpair_xor_v4di3 {mma,pair} + + v256 
__builtin_vpair_i64u_max (v256, v256); + VPAIR_I64U_MAX vpair_umax_v4di3 {mma,pair} + + v256 __builtin_vpair_i64u_min (v256, v256); + VPAIR_I64U_MIN vpair_umin_v4di3 {mma,pair} diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md index 2dcac6a31e2..cd14430f47a 100644 --- a/gcc/config/rs6000/vector-pair.md +++ b/gcc/config/rs6000/vector-pair.md @@ -29,38 +29,102 @@ (define_c_enum "unspec" [UNSPEC_VPAIR_V4DF UNSPEC_VPAIR_V8SF + UNSPEC_VPAIR_V32QI + UNSPEC_VPAIR_V16HI + UNSPEC_VPAIR_V8SI + UNSPEC_VPAIR_V4DI ]) ;; Iterator doing unary/binary arithmetic on vector pairs (define_code_iterator VP_FP_UNARY [abs neg]) (define_code_iterator VP_FP_BINARY [minus mult plus smin smax]) +(define_code_iterator VP_INT_BINARY [and ior minus plus smax smin umax umin xor]) + ;; Return the insn name from the VP_* code iterator (define_code_attr vp_insn [(abs "abs") + (and "and") + (ior "ior") (minus "sub") (mult "mul") + (not "one_cmpl") (neg "neg") (plus "add") (smin "smin") (smax "smax") + (umin "umin") + (umax "umax") (xor "xor")]) +;; Return the register constraint ("v" or "wa") for the integer code iterator +;; used. For arithmetic operations, we need to use "v" in order to use the +;; Altivec instruction. For logical operations, we can use wa. 
+(define_code_attr vp_ireg [(and "wa") + (ior "wa") + (minus "v") + (not "wa") + (neg "v") + (plus "v") + (smax "v") + (smin "v") + (umax "v") + (umin "v") + (xor "wa")]) + +;; Return the register predicate for the integer code iterator used +(define_code_attr vp_ipredicate [(and "vsx_register_operand") + (ior "vsx_register_operand") + (minus "altivec_register_operand") + (not "vsx_register_operand") + (neg "altivec_register_operand") + (plus "altivec_register_operand") + (smax "altivec_register_operand") + (smin "altivec_register_operand") + (umax "altivec_register_operand") + (umin "altivec_register_operand") + (xor "vsx_register_operand")]) + ;; Iterator for creating the unspecs for vector pair built-ins (define_int_iterator VP_FP [UNSPEC_VPAIR_V4DF UNSPEC_VPAIR_V8SF]) +(define_int_iterator VP_INT [UNSPEC_VPAIR_V4DI + UNSPEC_VPAIR_V8SI + UNSPEC_VPAIR_V16HI + UNSPEC_VPAIR_V32QI]) + ;; Map VP_* to vector mode of the arguments after they are split (define_int_attr VP_VEC_MODE [(UNSPEC_VPAIR_V4DF "V2DF") - (UNSPEC_VPAIR_V8SF "V4SF")]) + (UNSPEC_VPAIR_V8SF "V4SF") + (UNSPEC_VPAIR_V32QI "V16QI") + (UNSPEC_VPAIR_V16HI "V8HI") + (UNSPEC_VPAIR_V8SI "V4SI") + (UNSPEC_VPAIR_V4DI "V2DI")]) ;; Map VP_* to a lower case name to identify the vector pair. (define_int_attr vp_pmode [(UNSPEC_VPAIR_V4DF "v4df") - (UNSPEC_VPAIR_V8SF "v8sf")]) + (UNSPEC_VPAIR_V8SF "v8sf") + (UNSPEC_VPAIR_V32QI "v32qi") + (UNSPEC_VPAIR_V16HI "v16hi") + (UNSPEC_VPAIR_V8SI "v8si") + (UNSPEC_VPAIR_V4DI "v4di")]) ;; Map VP_* to a lower case name to identify the vector after the vector pair ;; has been split. (define_int_attr vp_vmode [(UNSPEC_VPAIR_V4DF "v2df") - (UNSPEC_VPAIR_V8SF "v4sf")]) + (UNSPEC_VPAIR_V8SF "v4sf") + (UNSPEC_VPAIR_V32QI "v16qi") + (UNSPEC_VPAIR_V16HI "v8hi") + (UNSPEC_VPAIR_V8SI "v4si") + (UNSPEC_VPAIR_V4DI "v2di")]) + +;; Map VP_INT to constraints used for the negate scratch register.
For vectors +;; of QI and HI, we need to change -a into 0 - a since we don't have a negate +;; operation. We do have a vnegw/vnegd operation for SI and DI modes. +(define_int_attr vp_neg_reg [(UNSPEC_VPAIR_V32QI "&v") + (UNSPEC_VPAIR_V16HI "&v") + (UNSPEC_VPAIR_V8SI "X") + (UNSPEC_VPAIR_V4DI "X")]) ;; Vector pair floating point unary operations @@ -327,3 +391,213 @@ (define_insn_and_split "*vpair_nfms_fpcontract_4" { } [(set_attr "length" "8")]) + + +;; Vector pair integer negate support. +(define_insn_and_split "vpair_neg_2" + [(set (match_operand:OO 0 "altivec_register_operand" "=v") + (unspec:OO [(neg:OO + (match_operand:OO 1 "altivec_register_operand" "v"))] + VP_INT)) + (clobber (match_scratch: 2 "="))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(set (match_dup 2) (match_dup 3)) + (set (match_dup 4) (minus: (match_dup 2) + (match_dup 5))) + (set (match_dup 6) (minus: (match_dup 2) + (match_dup 7)))] +{ + unsigned reg0 = reg_or_subregno (operands[0]); + unsigned reg1 = reg_or_subregno (operands[1]); + machine_mode vmode = mode; + + operands[3] = CONST0_RTX (vmode); + + operands[4] = gen_rtx_REG (vmode, reg0); + operands[5] = gen_rtx_REG (vmode, reg1); + + operands[6] = gen_rtx_REG (vmode, reg0 + 1); + operands[7] = gen_rtx_REG (vmode, reg1 + 1); + + /* If the vector integer size is 32 or 64 bits, we can use the vneg{w,d} + instructions. */ + if (vmode == V4SImode) + { + emit_insn (gen_negv4si2 (operands[4], operands[5])); + emit_insn (gen_negv4si2 (operands[6], operands[7])); + DONE; + } + else if (vmode == V2DImode) + { + emit_insn (gen_negv2di2 (operands[4], operands[5])); + emit_insn (gen_negv2di2 (operands[6], operands[7])); + DONE; + } +} + [(set_attr "length" "8")]) + +;; Vector pair integer not support. 
+(define_insn_and_split "vpair_not_2" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO [(not:OO (match_operand:OO 1 "vsx_register_operand" "wa"))] + VP_INT))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_unary_vector_pair (mode, operands, + gen_one_cmpl2); + DONE; +} + [(set_attr "length" "8")]) + +;; Vector pair integer binary operations. +(define_insn_and_split "vpair__3" + [(set (match_operand:OO 0 "" "=") + (unspec:OO [(VP_INT_BINARY:OO + (match_operand:OO 1 "" "") + (match_operand:OO 2 "" ""))] + VP_INT))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_binary_vector_pair (mode, operands, + gen_3); + DONE; +} + [(set_attr "length" "8")]) + +;; Optimize vector pair a & ~b +(define_insn_and_split "*vpair_andc_" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO [(and:OO + (unspec:OO + [(not:OO + (match_operand:OO 1 "vsx_register_operand" "wa"))] + VP_INT) + (match_operand:OO 2 "vsx_register_operand" "wa"))] + VP_INT))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_binary_vector_pair (mode, operands, + gen_andc3); + DONE; +} + [(set_attr "length" "8")]) + +;; Optimize vector pair a | ~b +(define_insn_and_split "*vpair_iorc_" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO [(ior:OO + (unspec:OO + [(not:OO + (match_operand:OO 1 "vsx_register_operand" "wa"))] + VP_INT) + (match_operand:OO 2 "vsx_register_operand" "wa"))] + VP_INT))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_binary_vector_pair (mode, operands, + gen_orc3); + DONE; +} + [(set_attr "length" "8")]) + +;; Optimize vector pair ~(a & b) or ((~a) | (~b)) +(define_insn_and_split "*vpair_nand__1" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO + [(not:OO + (unspec:OO [(and:OO + (match_operand:OO 1 "vsx_register_operand" "wa") + (match_operand:OO 2 "vsx_register_operand" "wa"))] + VP_INT))] + VP_INT))] +
"TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_binary_vector_pair (mode, operands, + gen_nand3); + DONE; +} + [(set_attr "length" "8")]) + +(define_insn_and_split "*vpair_nand__2" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO + [(ior:OO + (unspec:OO + [(not:OO + (match_operand:OO 1 "vsx_register_operand" "wa"))] + VP_INT) + (unspec:OO + [(not:OO + (match_operand:OO 2 "vsx_register_operand" "wa"))] + VP_INT))] + VP_INT))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_binary_vector_pair (mode, operands, + gen_nand3); + DONE; +} + [(set_attr "length" "8")]) + +;; Optiomize vector pair ~(a | b) or ((~a) & (~b)) +(define_insn_and_split "*vpair_nor__1" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO + [(not:OO + (unspec:OO [(ior:OO + (match_operand:OO 1 "vsx_register_operand" "wa") + (match_operand:OO 2 "vsx_register_operand" "wa"))] + VP_INT))] + VP_INT))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_binary_vector_pair (mode, operands, + gen_nor3); + DONE; +} + [(set_attr "length" "8")]) + +(define_insn_and_split "*vpair_nor__2" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO + [(ior:OO + (unspec:OO + [(not:OO (match_operand:OO 1 "vsx_register_operand" "wa"))] + VP_INT) + (unspec:OO + [(not:OO (match_operand:OO 2 "vsx_register_operand" "wa"))] + VP_INT))] + VP_INT))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_binary_vector_pair (mode, operands, + gen_nor3); + DONE; +} + [(set_attr "length" "8")]) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index a830ad06b90..ff7918c7a58 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -21414,6 +21414,78 @@ __vector_pair __builtin_vpair_f64_min (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_f64_sub (__vector_pair, __vector_pair); @end smallexample +The following built-in functions operate on pairs of +@code{vector long 
long} or @code{vector unsigned long long} values: + +@smallexample +__vector_pair __builtin_vpair_i64_add (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i64_and (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i64_ior (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i64_max (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i64_min (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i64_neg (__vector_pair); +__vector_pair __builtin_vpair_i64_not (__vector_pair); +__vector_pair __builtin_vpair_i64_sub (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i64_xor (__vector_pair, __vector_pair); + +__vector_pair __builtin_vpair_i64u_max (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i64u_min (__vector_pair, __vector_pair); +@end smallexample + +The following built-in functions operate on pairs of +@code{vector int} or @code{vector unsigned int} values: + +@smallexample +__vector_pair __builtin_vpair_i32_add (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i32_and (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i32_ior (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i32_neg (__vector_pair); +__vector_pair __builtin_vpair_i32_not (__vector_pair); +__vector_pair __builtin_vpair_i32_max (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i32_min (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i32_sub (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i32_xor (__vector_pair, __vector_pair); + +__vector_pair __builtin_vpair_i32u_max (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i32u_min (__vector_pair, __vector_pair); +@end smallexample + +The following built-in functions operate on pairs of +@code{vector short} or @code{vector unsigned short} values: + +@smallexample +__vector_pair __builtin_vpair_i16_add (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i16_and 
(__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i16_ior (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i16_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i16_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i16_neg (__vector_pair);
+__vector_pair __builtin_vpair_i16_not (__vector_pair);
+__vector_pair __builtin_vpair_i16_sub (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i16_xor (__vector_pair, __vector_pair);
+
+__vector_pair __builtin_vpair_i16u_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i16u_min (__vector_pair, __vector_pair);
+@end smallexample
+
+The following built-in functions operate on pairs of
+@code{vector signed char} or @code{vector unsigned char} values:
+
+@smallexample
+__vector_pair __builtin_vpair_i8_add (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8_and (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8_ior (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8_neg (__vector_pair);
+__vector_pair __builtin_vpair_i8_not (__vector_pair);
+__vector_pair __builtin_vpair_i8_sub (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8_xor (__vector_pair, __vector_pair);
+
+__vector_pair __builtin_vpair_i8u_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8u_min (__vector_pair, __vector_pair);
+@end smallexample
+
 @node PowerPC Hardware Transactional Memory Built-in Functions
 @subsection PowerPC Hardware Transactional Memory Built-in Functions
 GCC provides two interfaces for accessing the Hardware Transactional
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-5.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-5.c
new file mode 100644
index 00000000000..924919cae1b
--- /dev/null
+++
b/gcc/testsuite/gcc.target/powerpc/vector-pair-5.c
@@ -0,0 +1,193 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector built-in code generates the expected instructions
+   for vector pairs with 4 64-bit integer elements.  */
+
+void
+test_add (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y)
+{
+  /* 2 lxvp, 2 vaddudm, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_add (*x, *y);
+}
+
+void
+test_sub (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y)
+{
+  /* 2 lxvp, 2 vsubudm, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_sub (*x, *y);
+}
+
+void
+test_and (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxland, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_and (*x, *y);
+}
+
+void
+test_or (__vector_pair *dest,
+         __vector_pair *x,
+         __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlor, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_ior (*x, *y);
+}
+
+void
+test_xor (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlxor, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_xor (*x, *y);
+}
+
+void
+test_smax (__vector_pair *dest,
+           __vector_pair *x,
+           __vector_pair *y)
+{
+  /* 2 lxvp, 2 vmaxsd, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_max (*x, *y);
+}
+
+void
+test_smin (__vector_pair *dest,
+           __vector_pair *x,
+           __vector_pair *y)
+{
+  /* 2 lxvp, 2 vminsd, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_min (*x, *y);
+}
+
+void
+test_umax (__vector_pair *dest,
+           __vector_pair *x,
+           __vector_pair *y)
+{
+  /* 2 lxvp, 2 vmaxud, 1 stxvp.  */
+  *dest = __builtin_vpair_i64u_max (*x, *y);
+}
+
+void
+test_umin (__vector_pair *dest,
+           __vector_pair *x,
+           __vector_pair *y)
+{
+  /* 2 lxvp, 2 vminud, 1 stxvp.  */
+  *dest = __builtin_vpair_i64u_min (*x, *y);
+}
+
+void
+test_negate (__vector_pair *dest,
+             __vector_pair *x)
+{
+  /* 2 lxvp, 2 vnegd, 1 stxvp.
*/ + *dest = __builtin_vpair_i64_neg (*x); +} + +void +test_not (__vector_pair *dest, + __vector_pair *x) +{ + /* 2 lxvp, 2 xxlnor, 1 stxvp. */ + *dest = __builtin_vpair_i64_not (*x); +} + +/* Combination of logical operators. */ + +void +test_andc_1 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlandc, 1 stxvp. */ + __vector_pair n = __builtin_vpair_i64_not (*y); + *dest = __builtin_vpair_i64_and (*x, n); +} + +void +test_andc_2 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlandc, 1 stxvp. */ + __vector_pair n = __builtin_vpair_i64_not (*x); + *dest = __builtin_vpair_i64_and (n, *y); +} + +void +test_orc_1 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlorc, 1 stxvp. */ + __vector_pair n = __builtin_vpair_i64_not (*y); + *dest = __builtin_vpair_i64_ior (*x, n); +} + +void +test_orc_2 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlorc, 1 stxvp. */ + __vector_pair n = __builtin_vpair_i64_not (*x); + *dest = __builtin_vpair_i64_ior (n, *y); +} + +void +test_nand_1 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlnand, 1 stxvp. */ + __vector_pair a = __builtin_vpair_i64_and (*x, *y); + *dest = __builtin_vpair_i64_not (a); +} + +void +test_nand_2 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlnand, 1 stxvp. */ + __vector_pair nx = __builtin_vpair_i64_not (*x); + __vector_pair ny = __builtin_vpair_i64_not (*y); + *dest = __builtin_vpair_i64_ior (nx, ny); +} + +void +test_nor (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlnor, 1 stxvp. 
*/
+  __vector_pair a = __builtin_vpair_i64_ior (*x, *y);
+  *dest = __builtin_vpair_i64_not (a);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M} 34 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 18 } } */
+/* { dg-final { scan-assembler-times {\mvaddudm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxsd\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxud\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvminsd\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvminud\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvnegd\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsubudm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-6.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-6.c
new file mode 100644
index 00000000000..f22949c1f95
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-6.c
@@ -0,0 +1,193 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector built-in code generates the expected instructions
+   for vector pairs with 8 32-bit integer elements.  */
+
+void
+test_add (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y)
+{
+  /* 2 lxvp, 2 vadduwm, 1 stxvp.  */
+  *dest = __builtin_vpair_i32_add (*x, *y);
+}
+
+void
+test_sub (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y)
+{
+  /* 2 lxvp, 2 vsubuwm, 1 stxvp.
*/ + *dest = __builtin_vpair_i32_sub (*x, *y); +} + +void +test_and (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxland, 1 stxvp. */ + *dest = __builtin_vpair_i32_and (*x, *y); +} + +void +test_or (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlor, 1 stxvp. */ + *dest = __builtin_vpair_i32_ior (*x, *y); +} + +void +test_xor (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlxor, 1 stxvp. */ + *dest = __builtin_vpair_i32_xor (*x, *y); +} + +void +test_smax (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 vmaxsw, 1 stxvp. */ + *dest = __builtin_vpair_i32_max (*x, *y); +} + +void +test_smin (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 vminsw, 1 stxvp. */ + *dest = __builtin_vpair_i32_min (*x, *y); +} + +void +test_umax (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 vmaxuw, 1 stxvp. */ + *dest = __builtin_vpair_i32u_max (*x, *y); +} + +void +test_umin (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 vminuw, 1 stxvp. */ + *dest = __builtin_vpair_i32u_min (*x, *y); +} + +void +test_negate (__vector_pair *dest, + __vector_pair *x) +{ + /* 2 lxvp, 2 vnegw, 1 stxvp. */ + *dest = __builtin_vpair_i32_neg (*x); +} + +void +test_not (__vector_pair *dest, + __vector_pair *x) +{ + /* 2 lxvp, 2 xxlnor, 1 stxvp. */ + *dest = __builtin_vpair_i32_not (*x); +} + +/* Combination of logical operators. */ + +void +test_andc_1 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlandc, 1 stxvp. */ + __vector_pair n = __builtin_vpair_i32_not (*y); + *dest = __builtin_vpair_i32_and (*x, n); +} + +void +test_andc_2 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlandc, 1 stxvp. 
*/ + __vector_pair n = __builtin_vpair_i32_not (*x); + *dest = __builtin_vpair_i32_and (n, *y); +} + +void +test_orc_1 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlorc, 1 stxvp. */ + __vector_pair n = __builtin_vpair_i32_not (*y); + *dest = __builtin_vpair_i32_ior (*x, n); +} + +void +test_orc_2 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlorc, 1 stxvp. */ + __vector_pair n = __builtin_vpair_i32_not (*x); + *dest = __builtin_vpair_i32_ior (n, *y); +} + +void +test_nand_1 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlnand, 1 stxvp. */ + __vector_pair a = __builtin_vpair_i32_and (*x, *y); + *dest = __builtin_vpair_i32_not (a); +} + +void +test_nand_2 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlnand, 1 stxvp. */ + __vector_pair nx = __builtin_vpair_i32_not (*x); + __vector_pair ny = __builtin_vpair_i32_not (*y); + *dest = __builtin_vpair_i32_ior (nx, ny); +} + +void +test_nor (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlnor, 1 stxvp. 
*/
+  __vector_pair a = __builtin_vpair_i32_ior (*x, *y);
+  *dest = __builtin_vpair_i32_not (a);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M} 34 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 18 } } */
+/* { dg-final { scan-assembler-times {\mvadduwm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxsw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxuw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvminsw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvminuw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvnegw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsubuwm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-7.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-7.c
new file mode 100644
index 00000000000..71452f59284
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-7.c
@@ -0,0 +1,193 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector built-in code generates the expected instructions
+   for vector pairs with 16 16-bit integer elements.  */
+
+void
+test_add (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y)
+{
+  /* 2 lxvp, 2 vadduhm, 1 stxvp.  */
+  *dest = __builtin_vpair_i16_add (*x, *y);
+}
+
+void
+test_sub (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y)
+{
+  /* 2 lxvp, 2 vsubuhm, 1 stxvp.
*/ + *dest = __builtin_vpair_i16_sub (*x, *y); +} + +void +test_and (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxland, 1 stxvp. */ + *dest = __builtin_vpair_i16_and (*x, *y); +} + +void +test_or (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlor, 1 stxvp. */ + *dest = __builtin_vpair_i16_ior (*x, *y); +} + +void +test_xor (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlxor, 1 stxvp. */ + *dest = __builtin_vpair_i16_xor (*x, *y); +} + +void +test_smax (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 vmaxsh, 1 stxvp. */ + *dest = __builtin_vpair_i16_max (*x, *y); +} + +void +test_smin (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 vminsh, 1 stxvp. */ + *dest = __builtin_vpair_i16_min (*x, *y); +} + +void +test_umax (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 vmaxuh, 1 stxvp. */ + *dest = __builtin_vpair_i16u_max (*x, *y); +} + +void +test_umin (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 vminuh, 1 stxvp. */ + *dest = __builtin_vpair_i16u_min (*x, *y); +} + +void +test_negate (__vector_pair *dest, + __vector_pair *x) +{ + /* 2 lxvp, 1 xxspltib, 2 vsubuhm, 1 stxvp. */ + *dest = __builtin_vpair_i16_neg (*x); +} + +void +test_not (__vector_pair *dest, + __vector_pair *x) +{ + /* 2 lxvp, 2 xxlnor, 1 stxvp. */ + *dest = __builtin_vpair_i16_not (*x); +} + +/* Combination of logical operators. */ + +void +test_andc_1 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlandc, 1 stxvp. */ + __vector_pair n = __builtin_vpair_i16_not (*y); + *dest = __builtin_vpair_i16_and (*x, n); +} + +void +test_andc_2 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlandc, 1 stxvp. 
*/ + __vector_pair n = __builtin_vpair_i16_not (*x); + *dest = __builtin_vpair_i16_and (n, *y); +} + +void +test_orc_1 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlorc, 1 stxvp. */ + __vector_pair n = __builtin_vpair_i16_not (*y); + *dest = __builtin_vpair_i16_ior (*x, n); +} + +void +test_orc_2 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlorc, 1 stxvp. */ + __vector_pair n = __builtin_vpair_i16_not (*x); + *dest = __builtin_vpair_i16_ior (n, *y); +} + +void +test_nand_1 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlnand, 1 stxvp. */ + __vector_pair a = __builtin_vpair_i16_and (*x, *y); + *dest = __builtin_vpair_i16_not (a); +} + +void +test_nand_2 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlnand, 1 stxvp. */ + __vector_pair nx = __builtin_vpair_i16_not (*x); + __vector_pair ny = __builtin_vpair_i16_not (*y); + *dest = __builtin_vpair_i16_ior (nx, ny); +} + +void +test_nor (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlnor, 1 stxvp. 
*/
+  __vector_pair a = __builtin_vpair_i16_ior (*x, *y);
+  *dest = __builtin_vpair_i16_not (a);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M} 34 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 18 } } */
+/* { dg-final { scan-assembler-times {\mvadduhm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxsh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxuh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvminsh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvminuh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsubuhm\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-8.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-8.c
new file mode 100644
index 00000000000..8db9056d4cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-8.c
@@ -0,0 +1,194 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector built-in code generates the expected instructions
+   for vector pairs with 32 8-bit integer elements.  */
+
+
+void
+test_add (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y)
+{
+  /* 2 lxvp, 2 vaddubm, 1 stxvp.  */
+  *dest = __builtin_vpair_i8_add (*x, *y);
+}
+
+void
+test_sub (__vector_pair *dest,
+          __vector_pair *x,
+          __vector_pair *y)
+{
+  /* 2 lxvp, 2 vsububm, 1 stxvp.
*/ + *dest = __builtin_vpair_i8_sub (*x, *y); +} + +void +test_and (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxland, 1 stxvp. */ + *dest = __builtin_vpair_i8_and (*x, *y); +} + +void +test_or (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlor, 1 stxvp. */ + *dest = __builtin_vpair_i8_ior (*x, *y); +} + +void +test_xor (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlxor, 1 stxvp. */ + *dest = __builtin_vpair_i8_xor (*x, *y); +} + +void +test_smax (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 vmaxsb, 1 stxvp. */ + *dest = __builtin_vpair_i8_max (*x, *y); +} + +void +test_smin (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 vminsb, 1 stxvp. */ + *dest = __builtin_vpair_i8_min (*x, *y); +} + +void +test_umax (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 vmaxub, 1 stxvp. */ + *dest = __builtin_vpair_i8u_max (*x, *y); +} + +void +test_umin (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 vminub, 1 stxvp. */ + *dest = __builtin_vpair_i8u_min (*x, *y); +} + +void +test_negate (__vector_pair *dest, + __vector_pair *x) +{ + /* 2 lxvp, 1 xxspltib, 2 vsububm, 1 stxvp. */ + *dest = __builtin_vpair_i8_neg (*x); +} + +void +test_not (__vector_pair *dest, + __vector_pair *x) +{ + /* 2 lxvp, 2 xxlnor, 1 stxvp. */ + *dest = __builtin_vpair_i8_not (*x); +} + +/* Combination of logical operators. */ + +void +test_andc_1 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlandc, 1 stxvp. */ + __vector_pair n = __builtin_vpair_i8_not (*y); + *dest = __builtin_vpair_i8_and (*x, n); +} + +void +test_andc_2 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlandc, 1 stxvp. 
*/ + __vector_pair n = __builtin_vpair_i8_not (*x); + *dest = __builtin_vpair_i8_and (n, *y); +} + +void +test_orc_1 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlorc, 1 stxvp. */ + __vector_pair n = __builtin_vpair_i8_not (*y); + *dest = __builtin_vpair_i8_ior (*x, n); +} + +void +test_orc_2 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlorc, 1 stxvp. */ + __vector_pair n = __builtin_vpair_i8_not (*x); + *dest = __builtin_vpair_i8_ior (n, *y); +} + +void +test_nand_1 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlnand, 1 stxvp. */ + __vector_pair a = __builtin_vpair_i8_and (*x, *y); + *dest = __builtin_vpair_i8_not (a); +} + +void +test_nand_2 (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlnand, 1 stxvp. */ + __vector_pair nx = __builtin_vpair_i8_not (*x); + __vector_pair ny = __builtin_vpair_i8_not (*y); + *dest = __builtin_vpair_i8_ior (nx, ny); +} + +void +test_nor (__vector_pair *dest, + __vector_pair *x, + __vector_pair *y) +{ + /* 2 lxvp, 2 xxlnor, 1 stxvp. 
*/
+  __vector_pair a = __builtin_vpair_i8_ior (*x, *y);
+  *dest = __builtin_vpair_i8_not (a);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M} 34 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 18 } } */
+/* { dg-final { scan-assembler-times {\mvaddubm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxsb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxub\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvminsb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvminub\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsububm\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 1 } } */

From patchwork Fri Nov 10 23:12:47 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 164023
Date: Fri, 10 Nov 2023 18:12:47 -0500
From: Michael Meissner
To: Michael Meissner, gcc-patches@gcc.gnu.org, Segher Boessenkool,
 "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: [PATCH 3/4] Add support for initializing and extracting from vector pairs

This patch adds a series of built-in functions that let users write simple
loops whose operations are done on the __vector_pair type.  The __vector_pair
type is an opaque type.  These built-in functions keep the two 128-bit vectors
within the __vector_pair together, and split the operation after register
allocation.

This patch provides vector pair operations for initializing a vector pair to
all 0's, to a scalar value duplicated (splat) into every element, or to the
combination of two vectors.  This patch also provides vector pair built-ins to
extract one of the vectors held in a vector pair.

I have built and tested these patches on:

    *	A little endian power10 server using --with-cpu=power10
    *	A little endian power9 server using --with-cpu=power9
    *	A big endian power9 server using --with-cpu=power9.

Can I check this patch into the master branch after the preceding patches have
been checked in?

2023-11-09  Michael Meissner

gcc/

	* config/rs6000/predicates.md (mma_assemble_input_operand): Allow any
	16-byte vector, not just V16QImode.
	* config/rs6000/rs6000-builtins.def (__builtin_vpair_zero): New vector
	pair initialization built-in functions.
	(__builtin_vpair_*_assemble): Likewise.
	(__builtin_vpair_*_splat): Likewise.
	(__builtin_vpair_*_extract_vector): New vector pair extraction built-in
	functions.
	* config/rs6000/vector-pair.md (UNSPEC_VPAIR_V32QI): New unspec.
	(UNSPEC_VPAIR_V16HI): Likewise.
	(UNSPEC_VPAIR_V8SI): Likewise.
	(UNSPEC_VPAIR_V4DI): Likewise.
	(VP_INT_BINARY): New iterator for integer vector pair.
	(vp_insn): Add support for integer vector pairs.
	(vp_ireg): New code attribute for integer vector pairs.
	(vp_ipredicate): Likewise.
	(VP_INT): New int iterator for integer vector pairs.
	(VP_VEC_MODE): Likewise.
	(vp_pmode): Likewise.
	(vp_vmode): Likewise.
	(vp_neg_reg): New int iterator for integer vector pairs.
	(vpair_neg_): Add integer vector pair support insns.
	(vpair_not_2): Likewise.
	(vpair__3): Likewise.
	(vpair_andc_): Likewise.
	(vpair_nand__1): Likewise.
	(vpair_nand__2): Likewise.
	(vpair_nor__1): Likewise.
	(vpair_nor__2): Likewise.
	* doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document the
	integer vector pair built-in functions.

gcc/testsuite/

	* gcc.target/powerpc/vector-pair-5.c: New test.
	* gcc.target/powerpc/vector-pair-6.c: New test.
	* gcc.target/powerpc/vector-pair-7.c: New test.
	* gcc.target/powerpc/vector-pair-8.c: New test.
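
The semantics described above for the zero, splat, assemble, and extract
built-ins can be sketched with a portable C model.  This is only an
illustration, not the implementation in this patch: the type vpair_f32_model
and the model_* helpers are hypothetical names, and the model assumes that the
first argument of assemble becomes the vector that extract_vector returns for
index 0.  The index convention follows the memory form of the extract pattern
in the patch, which reads vector N from byte offset 16 * N.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the opaque __vector_pair type: two 128-bit
   vectors, each modelled here as four floats.  */
typedef struct
{
  float vec[2][4];
} vpair_f32_model;

/* Model of __builtin_vpair_zero: all 256 bits are zero.  */
static inline vpair_f32_model
model_vpair_zero (void)
{
  vpair_f32_model vp;
  memset (&vp, 0, sizeof (vp));
  return vp;
}

/* Model of __builtin_vpair_f32_splat: copy X into all 8 elements.  */
static inline vpair_f32_model
model_f32_splat (float x)
{
  vpair_f32_model vp;
  for (int v = 0; v < 2; v++)
    for (int i = 0; i < 4; i++)
      vp.vec[v][i] = x;
  return vp;
}

/* Model of __builtin_vpair_f32_assemble.  ASSUMPTION: V0 becomes vector 0
   and V1 becomes vector 1 of the pair.  */
static inline vpair_f32_model
model_f32_assemble (const float v0[4], const float v1[4])
{
  vpair_f32_model vp;
  memcpy (vp.vec[0], v0, sizeof (vp.vec[0]));
  memcpy (vp.vec[1], v1, sizeof (vp.vec[1]));
  return vp;
}

/* Model of __builtin_vpair_f32_extract_vector: copy vector WHICH of the
   pair into OUT.  The built-in only accepts a constant 0 or 1.  */
static inline void
model_f32_extract_vector (const vpair_f32_model *vp, int which, float out[4])
{
  assert (which == 0 || which == 1);
  memcpy (out, vp->vec[which], sizeof (vp->vec[which]));
}
```

In this model, assembling {1, 2, 3, 4} and {5, 6, 7, 8} and then extracting
vector 1 yields {5, 6, 7, 8}.  Whether the real built-ins order the two vectors
this way on both endiannesses should be checked against the patterns in
vector-pair.md rather than taken from this sketch.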
--- gcc/config/rs6000/predicates.md | 2 +- gcc/config/rs6000/rs6000-builtins.def | 95 +++++++++ gcc/config/rs6000/vector-pair.md | 185 ++++++++++++++++++ gcc/doc/extend.texi | 44 +++++ .../gcc.target/powerpc/vector-pair-10.c | 86 ++++++++ .../gcc.target/powerpc/vector-pair-11.c | 84 ++++++++ .../gcc.target/powerpc/vector-pair-12.c | 156 +++++++++++++++ .../gcc.target/powerpc/vector-pair-13.c | 139 +++++++++++++ .../gcc.target/powerpc/vector-pair-14.c | 141 +++++++++++++ .../gcc.target/powerpc/vector-pair-15.c | 139 +++++++++++++ .../gcc.target/powerpc/vector-pair-9.c | 13 ++ 11 files changed, 1083 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-10.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-11.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-12.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-13.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-14.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-15.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-9.c diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index ef7d3f214c4..922a77716c4 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -1301,7 +1301,7 @@ (define_predicate "splat_input_operand" ;; Return 1 if this operand is valid for a MMA assemble accumulator insn. 
(define_special_predicate "mma_assemble_input_operand" - (match_test "(mode == V16QImode + (match_test "(VECTOR_MODE_P (mode) && GET_MODE_SIZE (mode) == 16 && (vsx_register_operand (op, mode) || (MEM_P (op) && (indexed_or_indirect_address (XEXP (op, 0), mode) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 3b2db39c1ab..fbd416ceb87 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -4132,6 +4132,11 @@ void __builtin_vsx_stxvp (v256, unsigned long, const v256 *); STXVP nothing {mma,pair} +;; General vector pair built-in functions + + v256 __builtin_vpair_zero (); + VPAIR_ZERO vpair_zero {mma} + ;; vector pair built-in functions for 8 32-bit float values v256 __builtin_vpair_f32_abs (v256); @@ -4140,6 +4145,12 @@ v256 __builtin_vpair_f32_add (v256, v256); VPAIR_F32_ADD vpair_add_v8sf3 {mma,pair} + v256 __builtin_vpair_f32_assemble (vf, vf); + VPAIR_F32_ASSEMBLE vpair_assemble_v8sf {mma,pair} + + vf __builtin_vpair_f32_extract_vector (v256, const int<1>); + VPAIR_F32_EXTRACT_VECTOR vpair_extract_vector_v8sf {mma,pair} + v256 __builtin_vpair_f32_fma (v256, v256, v256); VPAIR_F32_FMA vpair_fma_v8sf4 {mma,pair} @@ -4155,6 +4166,9 @@ v256 __builtin_vpair_f32_neg (v256); VPAIR_F32_NEG vpair_neg_v8sf2 {mma,pair} + v256 __builtin_vpair_f32_splat (float); + VPAIR_F32_SPLAT vpair_splat_v8sf {mma,pair} + v256 __builtin_vpair_f32_sub (v256, v256); VPAIR_F32_SUB vpair_sub_v8sf3 {mma,pair} @@ -4166,6 +4180,12 @@ v256 __builtin_vpair_f64_add (v256, v256); VPAIR_F64_ADD vpair_add_v4df3 {mma,pair} +v256 __builtin_vpair_f64_assemble (vd, vd); + VPAIR_F64_ASSEMBLE vpair_assemble_v4df {mma,pair} + + vd __builtin_vpair_f64_extract_vector (v256, const int<1>); + VPAIR_F64_EXTRACT_VECTOR vpair_extract_vector_v4df {mma,pair} + v256 __builtin_vpair_f64_fma (v256, v256, v256); VPAIR_F64_FMA vpair_fma_v4df4 {mma,pair} @@ -4181,6 +4201,9 @@ v256 __builtin_vpair_f64_neg (v256); VPAIR_F64_NEG 
vpair_neg_v4df2 {mma,pair} + v256 __builtin_vpair_f64_splat (double); + VPAIR_F64_SPLAT vpair_splat_v4df {mma,pair} + v256 __builtin_vpair_f64_sub (v256, v256); VPAIR_F64_SUB vpair_sub_v4df3 {mma,pair} @@ -4193,6 +4216,12 @@ v256 __builtin_vpair_i8_and (v256, v256); VPAIR_I8_AND vpair_and_v32qi3 {mma,pair} + v256 __builtin_vpair_i8_assemble (vsc, vsc); + VPAIR_I8_ASSEMBLE vpair_assemble_v32qi {mma,pair} + + vsc __builtin_vpair_i8_extract_vector (v256, const int<1>); + VPAIR_I8_EXTRACT_VECTOR vpair_extract_vector_v32qi {mma,pair} + v256 __builtin_vpair_i8_ior (v256, v256); VPAIR_I8_IOR vpair_ior_v32qi3 {mma,pair} @@ -4208,18 +4237,30 @@ v256 __builtin_vpair_i8_not (v256); VPAIR_I8_NOT vpair_not_v32qi2 {mma,pair} + v256 __builtin_vpair_i8_splat (signed char); + VPAIR_I8_SPLAT vpair_splat_v32qi {mma,pair} + v256 __builtin_vpair_i8_sub (v256, v256); VPAIR_I8_SUB vpair_sub_v32qi3 {mma,pair} v256 __builtin_vpair_i8_xor (v256, v256); VPAIR_I8_XOR vpair_xor_v32qi3 {mma,pair} + v256 __builtin_vpair_i8u_assemble (vuc, vuc); + VPAIR_I8U_ASSEMBLE vpair_assemble_v32qi {mma,pair} + + vuc __builtin_vpair_i8u_extract_vector (v256, const int<1>); + VPAIR_I8U_EXTRACT_VECTOR vpair_extract_vector_v32qi {mma,pair} + v256 __builtin_vpair_i8u_max (v256, v256); VPAIR_I8U_MAX vpair_umax_v32qi3 {mma,pair} v256 __builtin_vpair_i8u_min (v256, v256); VPAIR_I8U_MIN vpair_umin_v32qi3 {mma,pair} + v256 __builtin_vpair_i8u_splat (unsigned char); + VPAIR_I8U_SPLAT vpair_splat_v32qi {mma,pair} + ;; vector pair built-in functions for 16 16-bit unsigned short or ;; signed short values @@ -4229,6 +4270,12 @@ v256 __builtin_vpair_i16_and (v256, v256); VPAIR_I16_AND vpair_and_v16hi3 {mma,pair} + v256 __builtin_vpair_i16_assemble (vss, vss); + VPAIR_I16_ASSEMBLE vpair_assemble_v16hi {mma,pair} + + vss __builtin_vpair_i16_extract_vector (v256, const int<1>); + VPAIR_I16_EXTRACT_VECTOR vpair_extract_vector_v16hi {mma,pair} + v256 __builtin_vpair_i16_ior (v256, v256); VPAIR_I16_IOR vpair_ior_v16hi3 
{mma,pair} @@ -4244,18 +4291,30 @@ v256 __builtin_vpair_i16_not (v256); VPAIR_I16_NOT vpair_not_v16hi2 {mma,pair} + v256 __builtin_vpair_i16_splat (short); + VPAIR_I16_SPLAT vpair_splat_v16hi {mma,pair} + v256 __builtin_vpair_i16_sub (v256, v256); VPAIR_I16_SUB vpair_sub_v16hi3 {mma,pair} v256 __builtin_vpair_i16_xor (v256, v256); VPAIR_I16_XOR vpair_xor_v16hi3 {mma,pair} + v256 __builtin_vpair_i16u_assemble (vus, vus); + VPAIR_I16U_ASSEMBLE vpair_assemble_v16hi {mma,pair} + + vus __builtin_vpair_i16u_extract_vector (v256, const int<1>); + VPAIR_I16U_EXTRACT_VECTOR vpair_extract_vector_v16hi {mma,pair} + v256 __builtin_vpair_i16u_max (v256, v256); VPAIR_I16U_MAX vpair_umax_v16hi3 {mma,pair} v256 __builtin_vpair_i16u_min (v256, v256); VPAIR_I16U_MIN vpair_umin_v16hi3 {mma,pair} + v256 __builtin_vpair_i16u_splat (unsigned short); + VPAIR_I16U_SPLAT vpair_splat_v16hi {mma,pair} + ;; vector pair built-in functions for 8 32-bit unsigned int or ;; signed int values @@ -4265,6 +4324,12 @@ v256 __builtin_vpair_i32_and (v256, v256); VPAIR_I32_AND vpair_and_v8si3 {mma,pair} + v256 __builtin_vpair_i32_assemble (vsi, vsi); + VPAIR_I32_ASSEMBLE vpair_assemble_v8si {mma,pair} + + vsi __builtin_vpair_i32_extract_vector (v256, const int<1>); + VPAIR_I32_EXTRACT_VECTOR vpair_extract_vector_v8si {mma,pair} + v256 __builtin_vpair_i32_ior (v256, v256); VPAIR_I32_IOR vpair_ior_v8si3 {mma,pair} @@ -4280,18 +4345,30 @@ v256 __builtin_vpair_i32_not (v256); VPAIR_I32_NOT vpair_not_v8si2 {mma,pair} + v256 __builtin_vpair_i32_splat (int); + VPAIR_I32_SPLAT vpair_splat_v8si {mma,pair} + v256 __builtin_vpair_i32_sub (v256, v256); VPAIR_I32_SUB vpair_sub_v8si3 {mma,pair} v256 __builtin_vpair_i32_xor (v256, v256); VPAIR_I32_XOR vpair_xor_v8si3 {mma,pair} + v256 __builtin_vpair_i32u_assemble (vui, vui); + VPAIR_I32U_ASSEMBLE vpair_assemble_v8si {mma,pair} + + vui __builtin_vpair_i32u_extract_vector (v256, const int<1>); + VPAIR_I32U_EXTRACT_VECTOR vpair_extract_vector_v8si {mma,pair} + v256 
__builtin_vpair_i32u_max (v256, v256); VPAIR_I32U_MAX vpair_umax_v8si3 {mma,pair} v256 __builtin_vpair_i32u_min (v256, v256); VPAIR_I32U_MIN vpair_umin_v8si3 {mma,pair} + v256 __builtin_vpair_i32u_splat (unsigned int); + VPAIR_I32U_SPLAT vpair_splat_v8si {mma,pair} + ;; vector pair built-in functions for 4 64-bit unsigned long long or ;; signed long long values @@ -4301,6 +4378,12 @@ v256 __builtin_vpair_i64_and (v256, v256); VPAIR_I64_AND vpair_and_v4di3 {mma,pair} + v256 __builtin_vpair_i64_assemble (vsll, vsll); + VPAIR_I64_ASSEMBLE vpair_assemble_v4di {mma,pair} + + vsll __builtin_vpair_i64_extract_vector (v256, const int<1>); + VPAIR_I64_EXTRACT_VECTOR vpair_extract_vector_v4di {mma,pair} + v256 __builtin_vpair_i64_ior (v256, v256); VPAIR_I64_IOR vpair_ior_v4di3 {mma,pair} @@ -4316,14 +4399,26 @@ v256 __builtin_vpair_i64_not (v256); VPAIR_I64_NOT vpair_not_v4di2 {mma,pair} + v256 __builtin_vpair_i64_splat (long long); + VPAIR_I64_SPLAT vpair_splat_v4di {mma,pair} + v256 __builtin_vpair_i64_sub (v256, v256); VPAIR_I64_SUB vpair_sub_v4di3 {mma,pair} v256 __builtin_vpair_i64_xor (v256, v256); VPAIR_I64_XOR vpair_xor_v4di3 {mma,pair} + v256 __builtin_vpair_i64u_assemble (vull, vull); + VPAIR_I64U_ASSEMBLE vpair_assemble_v4di {mma,pair} + + vull __builtin_vpair_i64u_extract_vector (v256, const int<1>); + VPAIR_I64U_EXTRACT_VECTOR vpair_extract_vector_v4di {mma,pair} + v256 __builtin_vpair_i64u_max (v256, v256); VPAIR_I64U_MAX vpair_umax_v4di3 {mma,pair} v256 __builtin_vpair_i64u_min (v256, v256); VPAIR_I64U_MIN vpair_umin_v4di3 {mma,pair} + + v256 __builtin_vpair_i64u_splat (unsigned long long); + VPAIR_I64U_SPLAT vpair_splat_v4di {mma,pair} diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md index cd14430f47a..f6d0b2a39fc 100644 --- a/gcc/config/rs6000/vector-pair.md +++ b/gcc/config/rs6000/vector-pair.md @@ -33,6 +33,8 @@ (define_c_enum "unspec" UNSPEC_VPAIR_V16HI UNSPEC_VPAIR_V8SI UNSPEC_VPAIR_V4DI + UNSPEC_VPAIR_ZERO + 
UNSPEC_VPAIR_SPLAT ]) ;; Iterator doing unary/binary arithmetic on vector pairs @@ -93,6 +95,13 @@ (define_int_iterator VP_INT [UNSPEC_VPAIR_V4DI UNSPEC_VPAIR_V16HI UNSPEC_VPAIR_V32QI]) +(define_int_iterator VP_ALL [UNSPEC_VPAIR_V4DF + UNSPEC_VPAIR_V8SF + UNSPEC_VPAIR_V4DI + UNSPEC_VPAIR_V8SI + UNSPEC_VPAIR_V16HI + UNSPEC_VPAIR_V32QI]) + ;; Map VP_* to vector mode of the arguments after they are split (define_int_attr VP_VEC_MODE [(UNSPEC_VPAIR_V4DF "V2DF") (UNSPEC_VPAIR_V8SF "V4SF") @@ -126,6 +135,182 @@ (define_int_attr vp_neg_reg [(UNSPEC_VPAIR_V32QI "&v") (UNSPEC_VPAIR_V8SI "X") (UNSPEC_VPAIR_V4DI "X")]) +;; Modes of the vector element to splat to vector pair +(define_mode_iterator VP_SPLAT [DF SF DI SI HI QI]) + +;; Modes of the vector to splat to vector pair +(define_mode_iterator VP_SPLAT_VEC [V2DF V4SF V2DI V4SI V8HI V16QI]) + +;; Map VP_SPLAT and VP_SPLAT_VEC to the mode of the vector pair operation +(define_mode_attr vp_splat_pmode [(DF "v4df") + (V2DF "v4df") + (SF "v8sf") + (V4SF "v8sf") + (DI "v4di") + (V2DI "v4di") + (SI "v8si") + (V4SI "v8si") + (HI "v16hi") + (V8HI "v16hi") + (QI "v32qi") + (V16QI "v32qi")]) + +;; Map VP_SPLAT to the mode of the vector containing the element +(define_mode_attr VP_SPLAT_VMODE [(DF "V2DF") + (SF "V4SF") + (DI "V2DI") + (SI "V4SI") + (HI "V8HI") + (QI "V16QI")]) + +;; Initialize a vector pair to 0 +(define_insn_and_split "vpair_zero" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO [(const_int 0)] UNSPEC_VPAIR_ZERO))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(set (match_dup 1) (match_dup 3)) + (set (match_dup 2) (match_dup 3))] +{ + rtx op0 = operands[0]; + unsigned offset_hi = (WORDS_BIG_ENDIAN) ? 0 : 16; + unsigned offset_lo = (WORDS_BIG_ENDIAN) ? 
16 : 0; + + operands[1] = simplify_gen_subreg (V2DImode, op0, OOmode, offset_hi); + operands[2] = simplify_gen_subreg (V2DImode, op0, OOmode, offset_lo); + operands[3] = CONST0_RTX (V2DImode); +} + [(set_attr "length" "8")]) + +;; Assemble a vector pair from two vectors. Unlike +;; __builtin_mma_assemble_pair, this function produces a vector pair output +;; directly and it takes all of the vector types. +;; +;; We cannot update the two output registers atomically, so mark the output as +;; an early clobber so we don't accidentally clobber the input operands. + +(define_insn_and_split "vpair_assemble_" + [(set (match_operand:OO 0 "vsx_register_operand" "=&wa") + (unspec:OO + [(match_operand: 1 "mma_assemble_input_operand" "mwa") + (match_operand: 2 "mma_assemble_input_operand" "mwa")] + VP_ALL))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + rtx src = gen_rtx_UNSPEC (OOmode, + gen_rtvec (2, operands[1], operands[2]), + UNSPEC_VSX_ASSEMBLE); + rs6000_split_multireg_move (operands[0], src); + DONE; +} + [(set_attr "length" "8")]) + +;; Extract one of the two 128-bit vectors from a vector pair. +(define_insn_and_split "vpair_extract_vector_" + [(set (match_operand: 0 "vsx_register_operand" "=wa") + (unspec: + [(match_operand:OO 1 "vsx_register_operand" "wa") + (match_operand 2 "const_0_to_1_operand" "n")] + VP_ALL))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(set (match_dup 0) (match_dup 3))] +{ + machine_mode vmode = mode; + unsigned reg_num = UINTVAL (operands[2]); + if (!WORDS_BIG_ENDIAN) + reg_num = 1 - reg_num; + + operands[3] = simplify_gen_subreg (vmode, operands[1], OOmode, reg_num * 16); +}) + +;; Optimize extracting a 128-bit vector from a vector pair in memory. 
+(define_insn_and_split "*vpair_extract_vector__mem" + [(set (match_operand: 0 "vsx_register_operand" "=wa") + (unspec: + [(match_operand:OO 1 "memory_operand" "o") + (match_operand 2 "const_0_to_1_operand" "n")] + VP_ALL))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(set (match_dup 0) (match_dup 3))] +{ + operands[3] = adjust_address (operands[1], mode, + 16 * INTVAL (operands[2])); +} + [(set_attr "type" "vecload")]) + +;; Create a vector pair with a value splat'ed (duplicated) to all of the +;; elements. +(define_expand "vpair_splat_" + [(use (match_operand:OO 0 "vsx_register_operand")) + (use (match_operand:VP_SPLAT 1 "input_operand"))] + "TARGET_MMA" +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + machine_mode element_mode = mode; + machine_mode vector_mode = mode; + + if (op1 == CONST0_RTX (element_mode)) + { + emit_insn (gen_vpair_zero (op0)); + DONE; + } + + rtx vec = gen_reg_rtx (vector_mode); + unsigned num_elements = GET_MODE_NUNITS (vector_mode); + rtvec elements = rtvec_alloc (num_elements); + for (size_t i = 0; i < num_elements; i++) + RTVEC_ELT (elements, i) = copy_rtx (op1); + + rs6000_expand_vector_init (vec, gen_rtx_PARALLEL (vector_mode, elements)); + emit_insn (gen_vpair_splat__internal (op0, vec)); + DONE; +}) + +;; Inner splat support. Operand1 is the vector splat created above. Allow +;; operand 1 to overlap with the output registers to eliminate one move +;; instruction. +(define_insn_and_split "vpair_splat__internal" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa") + (unspec:OO + [(match_operand:VP_SPLAT_VEC 1 "vsx_register_operand" "0,wa")] + UNSPEC_VPAIR_SPLAT))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(const_int 0)] +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx op0_vector0 = simplify_gen_subreg (mode, op0, OOmode, 0); + rtx op0_vector1 = simplify_gen_subreg (mode, op0, OOmode, 16); + + /* Check if the input is one of the output registers. 
*/ + if (rtx_equal_p (op0_vector0, op1)) + emit_move_insn (op0_vector1, op1); + + else if (rtx_equal_p (op0_vector1, op1)) + emit_move_insn (op0_vector0, op1); + + else + { + emit_move_insn (op0_vector0, op1); + emit_move_insn (op0_vector1, op1); + } + + DONE; +} + [(set_attr "length" "*,8") + (set_attr "type" "vecmove")]) + ;; Vector pair floating point unary operations (define_insn_and_split "vpair__2" diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index ff7918c7a58..600e2c393db 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -21386,17 +21386,27 @@ two 128-bit vectors stored in the vector pair. The @code{__vector_pair} type is usually stored with a single vector pair store instruction. +The following built-in functions are independent of the type of the +underlying vector: + +@smallexample +__vector_pair __builtin_vpair_zero (); +@end smallexample + The following built-in functions operate on pairs of @code{vector float} values: @smallexample __vector_pair __builtin_vpair_f32_abs (__vector_pair); __vector_pair __builtin_vpair_f32_add (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_f32_assemble (vector float, vector float); +vector float __builtin_vpair_f32_extract_vector (__vector_pair, int); __vector_pair __builtin_vpair_f32_fma (__vector_pair, __vector_pair, __vector_pair); __vector_pair __builtin_vpair_f32_max (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_f32_min (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_f32_mul (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_f32_neg (__vector_pair); +__vector_pair __builtin_vpair_f32_splat (float); __vector_pair __builtin_vpair_f32_sub (__vector_pair, __vector_pair); @end smallexample @@ -21406,11 +21416,14 @@ The following built-in functions operate on pairs of @smallexample __vector_pair __builtin_vpair_f64_abs (__vector_pair); __vector_pair __builtin_vpair_f64_add (__vector_pair, __vector_pair); +__vector_pair 
__builtin_vpair_f64_assemble (vector double, vector double); +vector double __builtin_vpair_f64_extract_vector (__vector_pair, int); __vector_pair __builtin_vpair_f64_fma (__vector_pair, __vector_pair, __vector_pair); __vector_pair __builtin_vpair_f64_mul (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_f64_neg (__vector_pair); __vector_pair __builtin_vpair_f64_max (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_f64_min (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_f64_splat (double); __vector_pair __builtin_vpair_f64_sub (__vector_pair, __vector_pair); @end smallexample @@ -21420,16 +21433,24 @@ The following built-in functions operate on pairs of @smallexample __vector_pair __builtin_vpair_i64_add (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i64_and (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i64_assemble (vector long long, + vector long long); +vector long long __builtin_vpair_i64_extract_vector (__vector_pair, int); __vector_pair __builtin_vpair_i64_ior (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i64_max (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i64_min (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i64_neg (__vector_pair); __vector_pair __builtin_vpair_i64_not (__vector_pair); +__vector_pair __builtin_vpair_i64_splat (long long); __vector_pair __builtin_vpair_i64_sub (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i64_xor (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i64u_assemble (vector unsigned long long, + vector unsigned long long); +vector unsigned long long __builtin_vpair_i64u_extract_vector (__vector_pair, int); __vector_pair __builtin_vpair_i64u_max (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i64u_min (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i64u_splat (unsigned long long); @end smallexample The following built-in functions operate on pairs of @@ 
-21438,16 +21459,23 @@ The following built-in functions operate on pairs of @smallexample __vector_pair __builtin_vpair_i32_add (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i32_and (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i32_assemble (vector int, vector int); +vector int __builtin_vpair_i32_extract_vector (__vector_pair, int); __vector_pair __builtin_vpair_i32_ior (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i32_neg (__vector_pair); __vector_pair __builtin_vpair_i32_not (__vector_pair); __vector_pair __builtin_vpair_i32_max (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i32_min (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i32_splat (int); __vector_pair __builtin_vpair_i32_sub (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i32_xor (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i32u_assemble (vector unsigned int, + vector unsigned int); +vector unsigned int __builtin_vpair_i32u_extract_vector (__vector_pair, int); __vector_pair __builtin_vpair_i32u_max (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i32u_min (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i32u_splat (unsigned int); @end smallexample The following built-in functions operate on pairs of @@ -21456,6 +21484,10 @@ The following built-in functions operate on pairs of @smallexample __vector_pair __builtin_vpair_i16_add (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i16_and (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i16_assemble (vector short, + vector short); +__vector_pair __builtin_vpair_i16_splat (short); +vector short __builtin_vpair_i16_extract_vector (__vector_pair, int); __vector_pair __builtin_vpair_i16_ior (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i16_max (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i16_min (__vector_pair, __vector_pair); @@ -21464,6 +21496,10 @@ __vector_pair 
__builtin_vpair_i16_not (__vector_pair); __vector_pair __builtin_vpair_i16_sub (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i16_xor (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i16u_assemble (vector unsigned short, + vector unsigned short); +vector unsigned short __builtin_vpair_i16u_extract_vector (__vector_pair, int); +__vector_pair __builtin_vpair_i16u_splat (unsigned short); __vector_pair __builtin_vpair_i16u_max (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i16u_min (__vector_pair, __vector_pair); @end smallexample @@ -21474,6 +21510,10 @@ The following built-in functions operate on pairs of @smallexample __vector_pair __builtin_vpair_i8_add (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i8_and (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i8_assemble (vector signed char, + vector signed char); +vector signed char __builtin_vpair_i8_extract_vector (__vector_pair, int); +__vector_pair __builtin_vpair_i8_splat (signed char); __vector_pair __builtin_vpair_i8_ior (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i8_max (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i8_min (__vector_pair, __vector_pair); @@ -21482,8 +21522,12 @@ __vector_pair __builtin_vpair_i8_not (__vector_pair); __vector_pair __builtin_vpair_i8_sub (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i8_xor (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i8u_assemble (vector unsigned char, + vector unsigned char); +vector unsigned char __builtin_vpair_i8u_extract_vector (__vector_pair, int); __vector_pair __builtin_vpair_i8_umax (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i8_umin (__vector_pair, __vector_pair); +__vector_pair __builtin_vpair_i8u_splat (unsigned char); @end smallexample @node PowerPC Hardware Transactional Memory Built-in Functions diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-10.c 
b/gcc/testsuite/gcc.target/powerpc/vector-pair-10.c new file mode 100644 index 00000000000..df1c4019245 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-10.c @@ -0,0 +1,86 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test the vector pair built-in functions for creation and extraction of + vector pair operations using 32-bit floats. */ + +void +test_f32_splat_0 (__vector_pair *p) +{ + /* 2 xxspltib, 1 stxvp. */ + *p = __builtin_vpair_f32_splat (0.0f); +} + +void +test_f32_splat_1 (__vector_pair *p) +{ + /* 1 xxspltiw, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_f32_splat (1.0f); +} + +void +test_f32_splat_var (__vector_pair *p, + float f) +{ + /* 1 xscvdpspn, 1 xxspltw, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_f32_splat (f); +} + +void +test_f32_splat_mem (__vector_pair *p, + float *q) +{ + /* 1 lxvwsx, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_f32_splat (*q); +} + +void +test_f32_assemble (__vector_pair *p, + vector float v1, + vector float v2) +{ + /* 2 xxlor, 1 stxvp. */ + *p = __builtin_vpair_f32_assemble (v1, v2); +} + +vector float +test_f32_extract_0_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_f32_extract_vector (vp, 0); +} + +vector float +test_f32_extract_1_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_f32_extract_vector (vp, 1); +} + +vector float +test_f32_extract_0_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_f32_extract_vector (p[1], 0); +} + +vector float +test_f32_extract_1_mem (__vector_pair *p) +{ + /* 1 lxv. 
*/ + return __builtin_vpair_f32_extract_vector (p[2], 1); +} + +/* { dg-final { scan-assembler-times {\mlxv\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mlxvwsx\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 5 } } */ +/* { dg-final { scan-assembler-times {\mxscvdpspn\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxxspltiw\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxxspltw\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-11.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-11.c new file mode 100644 index 00000000000..397d7f60f45 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-11.c @@ -0,0 +1,84 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test the vector pair built-in functions for creation and extraction of + vector pair operations using 64-bit doubles. */ + +void +test_f64_splat_0 (__vector_pair *p) +{ + /* 2 xxspltib. */ + *p = __builtin_vpair_f64_splat (0.0); +} + +void +test_f64_splat_1 (__vector_pair *p) +{ + /* 1 xxspltidp, 1 xxlor. */ + *p = __builtin_vpair_f64_splat (1.0); +} + +void +test_f64_splat_var (__vector_pair *p, + double d) +{ + /* 1 xxpermdi, 1 xxlor. */ + *p = __builtin_vpair_f64_splat (d); +} + +void +test_f64_splat_mem (__vector_pair *p, + double *q) +{ + /* 1 lxvdsx, 1 xxlor. */ + *p = __builtin_vpair_f64_splat (*q); +} + +void +test_f64_assemble (__vector_pair *p, + vector double v1, + vector double v2) +{ + /* 2 xxlor, 1 stxvp. */ + *p = __builtin_vpair_f64_assemble (v1, v2); +} + +vector double +test_f64_extract_0_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. 
*/ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_f64_extract_vector (vp, 0); +} + +vector double +test_f64_extract_1_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_f64_extract_vector (vp, 1); +} + +vector double +test_f64_extract_0_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_f64_extract_vector (p[1], 0); +} + +vector double +test_f64_extract_1_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_f64_extract_vector (p[2], 1); +} + +/* { dg-final { scan-assembler-times {\mlxvdsx\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 5 } } */ +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-12.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-12.c new file mode 100644 index 00000000000..0990dfe28d5 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-12.c @@ -0,0 +1,156 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test the vector pair built-in functions for creation and extraction of + vector pair operations using 64-bit integers. */ + +void +test_i64_splat_0 (__vector_pair *p) +{ + /* 2 xxspltib, 1 stxvp. */ + *p = __builtin_vpair_i64_splat (0); +} + +void +test_i64_splat_1 (__vector_pair *p) +{ + /* 1 xxspltib, 1 vextsb2d, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i64_splat (1); +} + +void +test_i64_splat_var (__vector_pair *p, + long long ll) +{ + /* 1 mtvsrdd, 1 xxlor, 1 stxvp. 
*/ + *p = __builtin_vpair_i64_splat (ll); +} + +void +test_i64_splat_mem (__vector_pair *p, + long long *q) +{ + /* 1 lxvdsx, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i64_splat (*q); +} + +void +test_i64_assemble (__vector_pair *p, + vector long long v1, + vector long long v2) +{ + /* 2 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i64_assemble (v1, v2); +} + +vector long long +test_i64_extract_0_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i64_extract_vector (vp, 0); +} + +vector long long +test_i64_extract_1_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i64_extract_vector (vp, 1); +} + +vector long long +test_i64_extract_0_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_i64_extract_vector (p[1], 0); +} + +vector long long +test_i64_extract_1_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_i64_extract_vector (p[2], 1); +} + +void +test_i64u_splat_0 (__vector_pair *p) +{ + /* 2 xxspltib, 1 stxvp. */ + *p = __builtin_vpair_i64u_splat (0); +} + +void +test_i64u_splat_1 (__vector_pair *p) +{ + /* 1 xxspltib, 1 vextsb2d, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i64u_splat (1); +} + +void +test_i64u_splat_var (__vector_pair *p, + unsigned long long ull) +{ + /* 1 mtvsrdd, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i64u_splat (ull); +} + +void +test_i64u_splat_mem (__vector_pair *p, + unsigned long long *q) +{ + /* 1 lxvdsx, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i64u_splat (*q); +} + +void +test_i64u_assemble (__vector_pair *p, + vector unsigned long long v1, + vector unsigned long long v2) +{ + /* 2 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i64u_assemble (v1, v2); +} + +vector unsigned long long +test_i64u_extract_0_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. 
*/ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i64u_extract_vector (vp, 0); +} + +vector unsigned long long +test_i64u_extract_1_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i64u_extract_vector (vp, 0); +} + +vector unsigned long long +test_i64u_extract_0_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_i64u_extract_vector (p[1], 0); +} + +vector unsigned long long +test_i64u_extract_1_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_i64u_extract_vector (p[2], 1); +} + +/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mlxvdsx\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mlxvp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mmtvsrdd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 10 } } */ +/* { dg-final { scan-assembler-times {\mvextsb2d\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxxspltib\M} 6 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-13.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-13.c new file mode 100644 index 00000000000..8174f6b1cc3 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-13.c @@ -0,0 +1,139 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test the vector pair built-in functions for creation and extraction of + vector pair operations using 32-bit integers. */ + +void +test_i32_splat_0 (__vector_pair *p) +{ + /* 2 xxspltib, 1 stxvp. */ + *p = __builtin_vpair_i32_splat (0); +} + +void +test_i32_splat_1 (__vector_pair *p) +{ + /* 1 vspltisw, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i32_splat (1); +} + +void +test_i32_splat_mem (__vector_pair *p, + int *q) +{ + /* 1 lxvwsx, 1 xxlor, 1 stxvp. 
*/ + *p = __builtin_vpair_i32_splat (*q); +} + +void +test_i32_assemble (__vector_pair *p, + vector int v1, + vector int v2) +{ + /* 2 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i32_assemble (v1, v2); +} + +vector int +test_i32_extract_0_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i32_extract_vector (vp, 0); +} + +vector int +test_i32_extract_1_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i32_extract_vector (vp, 0); +} + +vector int +test_i32_extract_0_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_i32_extract_vector (p[1], 0); +} + +vector int +test_i32_extract_1_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_i32_extract_vector (p[2], 1); +} + +void +test_i32u_splat_0 (__vector_pair *p) +{ + /* 2 xxspltib, 1 stxvp. */ + *p = __builtin_vpair_i32u_splat (0); +} + +void +test_i32u_splat_1 (__vector_pair *p) +{ + /* 1 vspltisw, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i32u_splat (1); +} + +void +test_i32u_splat_mem (__vector_pair *p, + unsigned int *q) +{ + /* 1 lxvwsx, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i32u_splat (*q); +} + +void +test_i32u_assemble (__vector_pair *p, + vector unsigned int v1, + vector unsigned int v2) +{ + /* 2 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i32u_assemble (v1, v2); +} + +vector unsigned int +test_i32u_extract_0_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i32u_extract_vector (vp, 0); +} + +vector unsigned int +test_i32u_extract_1_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. 
*/ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i32u_extract_vector (vp, 0); +} + +vector unsigned int +test_i32u_extract_0_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_i32u_extract_vector (p[1], 0); +} + +vector unsigned int +test_i32u_extract_1_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_i32u_extract_vector (p[2], 1); +} + +/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mlxvp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mlxvwsx\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 8 } } */ +/* { dg-final { scan-assembler-times {\mvspltisw\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxxspltib\M} 4 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-14.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-14.c new file mode 100644 index 00000000000..fe63df795d6 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-14.c @@ -0,0 +1,141 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test the vector pair built-in functions for creation and extraction of + vector pair operations using 16-bit integers. */ + +void +test_i16_splat_0 (__vector_pair *p) +{ + /* 2 xxspltib, 1 stxvp. */ + *p = __builtin_vpair_i16_splat (0); +} + +void +test_i16_splat_1 (__vector_pair *p) +{ + /* 1 vspltish, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i16_splat (1); +} + +void +test_i16_splat_mem (__vector_pair *p, + short *q) +{ + /* 1 lxsihzx, 1 vsplth, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i16_splat (*q); +} + +void +test_i16_assemble (__vector_pair *p, + vector short v1, + vector short v2) +{ + /* 2 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i16_assemble (v1, v2); +} + +vector short +test_i16_extract_0_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. 
*/ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i16_extract_vector (vp, 0); +} + +vector short +test_i16_extract_1_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i16_extract_vector (vp, 0); +} + +vector short +test_i16_extract_0_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_i16_extract_vector (p[1], 0); +} + +vector short +test_i16_extract_1_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_i16_extract_vector (p[2], 1); +} + +void +test_i16u_splat_0 (__vector_pair *p) +{ + /* 2 xxspltib, 1 stxvp. */ + *p = __builtin_vpair_i16u_splat (0); +} + +void +test_i16u_splat_1 (__vector_pair *p) +{ + /* 1 vspltish, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i16u_splat (1); +} + +void +test_i16u_splat_mem (__vector_pair *p, + unsigned short *q) +{ + /* 1 lxsihzx, 1 vsplth, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i16u_splat (*q); +} + +void +test_i16u_assemble (__vector_pair *p, + vector unsigned short v1, + vector unsigned short v2) +{ + /* 2 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i16u_assemble (v1, v2); +} + +vector unsigned short +test_i16u_extract_0_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i16u_extract_vector (vp, 0); +} + +vector unsigned short +test_i16u_extract_1_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i16u_extract_vector (vp, 0); +} + +vector unsigned short +test_i16u_extract_0_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_i16u_extract_vector (p[1], 0); +} + +vector unsigned short +test_i16u_extract_1_mem (__vector_pair *p) +{ + /* 1 lxv. 
*/ + return __builtin_vpair_i16u_extract_vector (p[2], 1); +} + +/* { dg-final { scan-assembler-times {\mlxsihzx\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mlxvp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 8 } } */ +/* { dg-final { scan-assembler-times {\mvsplth\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mvspltish\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxxlor\M} 12 } } */ +/* { dg-final { scan-assembler-times {\mxxspltib\M} 4 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-15.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-15.c new file mode 100644 index 00000000000..bd494327af6 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-15.c @@ -0,0 +1,139 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test the vector pair built-in functions for creation and extraction of + vector pair operations using 8-bit integers. */ + +void +test_i8_splat_0 (__vector_pair *p) +{ + /* 2 xxspltib, 1 stxvp. */ + *p = __builtin_vpair_i8_splat (0); +} + +void +test_i8_splat_1 (__vector_pair *p) +{ + /* 1 vspltisb, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i8_splat (1); +} + +void +test_i8_splat_mem (__vector_pair *p, + signed char *q) +{ + /* 1 lxsibzx, 1 vspltb, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i8_splat (*q); +} + +void +test_i8_assemble (__vector_pair *p, + vector signed char v1, + vector signed char v2) +{ + /* 2 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i8_assemble (v1, v2); +} + +vector signed char +test_i8_extract_0_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i8_extract_vector (vp, 0); +} + +vector signed char +test_i8_extract_1_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. 
*/ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i8_extract_vector (vp, 0); +} + +vector signed char +test_i8_extract_0_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_i8_extract_vector (p[1], 0); +} + +vector signed char +test_i8_extract_1_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_i8_extract_vector (p[2], 1); +} + +void +test_i8u_splat_0 (__vector_pair *p) +{ + /* 2 xxspltib, 1 stxvp. */ + *p = __builtin_vpair_i8u_splat (0); +} + +void +test_i8u_splat_1 (__vector_pair *p) +{ + /* 1 vspltisb, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i8u_splat (1); +} + +void +test_i8u_splat_mem (__vector_pair *p, + unsigned char *q) +{ + /* 1 lxsibzx, 1 vspltb, 1 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i8u_splat (*q); +} + +void +test_i8u_assemble (__vector_pair *p, + vector unsigned char v1, + vector unsigned char v2) +{ + /* 2 xxlor, 1 stxvp. */ + *p = __builtin_vpair_i8u_assemble (v1, v2); +} + +vector unsigned char +test_i8u_extract_0_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i8u_extract_vector (vp, 0); +} + +vector unsigned char +test_i8u_extract_1_reg (__vector_pair *p) +{ + /* 1 lxvp, 1 xxlor. */ + __vector_pair vp = *p; + __asm__ (" # extract in register %x0" : "+wa" (vp)); + return __builtin_vpair_i8u_extract_vector (vp, 0); +} + +vector unsigned char +test_i8u_extract_0_mem (__vector_pair *p) +{ + /* 1 lxv. */ + return __builtin_vpair_i8u_extract_vector (p[1], 0); +} + +vector unsigned char +test_i8u_extract_1_mem (__vector_pair *p) +{ + /* 1 lxv. 
*/ + return __builtin_vpair_i8u_extract_vector (p[2], 1); +} + +/* { dg-final { scan-assembler-times {\mlxsibzx\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mlxvp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 8 } } */ +/* { dg-final { scan-assembler-times {\mvspltb\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxxspltib\M} 6 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-9.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-9.c new file mode 100644 index 00000000000..95504a5afd0 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-9.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +void +test_zero (__vector_pair *p) +{ + /* 2 xxspltib. */ + *p = __builtin_vpair_zero (); +} + +/* { dg-final { scan-assembler-times {\mstxvp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
From patchwork Fri Nov 10 23:13:58 2023
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 164024
Date: Fri, 10 Nov 2023 18:13:58 -0500
From: Michael Meissner
To: Michael Meissner, gcc-patches@gcc.gnu.org, Segher Boessenkool, "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: [PATCH 4/4] Add support for doing a horizontal add on vector pair elements.

This patch adds a series of built-in functions to allow users to write code to do a number of simple operations where the loop is done using the __vector_pair type. The __vector_pair type is an opaque type. These built-in functions keep the two 128-bit vectors within the __vector_pair together, and split the operation after register allocation.

This patch provides vector pair built-in functions to do a horizontal add on
vector pair elements.  Only floating point and 64-bit integer horizontal adds
are provided in this patch.

I have built and tested these patches on:

    *	A little endian power10 server using --with-cpu=power10
    *	A little endian power9 server using --with-cpu=power9
    *	A big endian power9 server using --with-cpu=power9.

Can I check this patch into the master branch after the preceding patches have
been checked in?

2023-11-08  Michael Meissner

gcc/

	* config/rs6000/rs6000-builtins.def (__builtin_vpair_f32_add_elements):
	New built-in function.
	(__builtin_vpair_f64_add_elements): Likewise.
	(__builtin_vpair_i64_add_elements): Likewise.
	(__builtin_vpair_i64u_add_elements): Likewise.
	* config/rs6000/vector-pair.md (UNSPEC_VPAIR_REDUCE_PLUS_F32): New
	unspec.
	(UNSPEC_VPAIR_REDUCE_PLUS_F64): Likewise.
	(UNSPEC_VPAIR_REDUCE_PLUS_I64): Likewise.
	(vpair_reduc_plus_scale_v8sf): New insn.
	(vpair_reduc_plus_scale_v4df): Likewise.
	(vpair_reduc_plus_scale_v4di): Likewise.
	* doc/extend.texi (__builtin_vpair_f32_add_elements): Document.
	(__builtin_vpair_f64_add_elements): Likewise.
	(__builtin_vpair_i64_add_elements): Likewise.

gcc/testsuite/

	* gcc.target/powerpc/vector-pair-16.c: New test.

--- gcc/config/rs6000/rs6000-builtins.def | 12 +++ gcc/config/rs6000/vector-pair.md | 93 +++++++++++++++++++ gcc/doc/extend.texi | 3 + .../gcc.target/powerpc/vector-pair-16.c | 45 +++++++++ 4 files changed, 153 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-16.c diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index fbd416ceb87..b9a16c01420 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -4145,6 +4145,9 @@ v256 __builtin_vpair_f32_add (v256, v256); VPAIR_F32_ADD vpair_add_v8sf3 {mma,pair} + float __builtin_vpair_f32_add_elements (v256); + VPAIR_F32_ADD_ELEMENTS vpair_reduc_plus_scale_v8sf {mma,pair} + v256 __builtin_vpair_f32_assemble (vf, vf); VPAIR_F32_ASSEMBLE vpair_assemble_v8sf {mma,pair} @@ -4180,6 +4183,9 @@ v256 __builtin_vpair_f64_add (v256, v256); VPAIR_F64_ADD vpair_add_v4df3 {mma,pair} + double __builtin_vpair_f64_add_elements (v256); + VPAIR_F64_ADD_ELEMENTS vpair_reduc_plus_scale_v4df {mma,pair} + v256 __builtin_vpair_f64_assemble (vd, vd); VPAIR_F64_ASSEMBLE vpair_assemble_v4df {mma,pair} @@ -4375,6 +4381,9 @@ v256 __builtin_vpair_f64_assemble (vd, vd); v256 __builtin_vpair_i64_add (v256, v256); VPAIR_I64_ADD vpair_add_v4di3 {mma,pair} + long long __builtin_vpair_i64_add_elements (v256); + VPAIR_I64_ADD_ELEMENTS vpair_reduc_plus_scale_v4di {mma,pair,no32bit} + v256 __builtin_vpair_i64_and (v256, v256); VPAIR_I64_AND vpair_and_v4di3 {mma,pair} @@ -4408,6 +4417,9 @@ v256 __builtin_vpair_f64_assemble (vd, vd); v256 __builtin_vpair_i64_xor (v256, v256); VPAIR_I64_XOR vpair_xor_v4di3 {mma,pair} + unsigned long long __builtin_vpair_i64u_add_elements (v256); + VPAIR_I64U_ADD_ELEMENTS vpair_reduc_plus_scale_v4di {mma,pair,no32bit} + v256 __builtin_vpair_i64u_assemble (vull, vull); VPAIR_I64U_ASSEMBLE vpair_assemble_v4di {mma,pair} diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md index 
f6d0b2a39fc..b5e9330e71f 100644 --- a/gcc/config/rs6000/vector-pair.md +++ b/gcc/config/rs6000/vector-pair.md @@ -35,6 +35,9 @@ (define_c_enum "unspec" UNSPEC_VPAIR_V4DI UNSPEC_VPAIR_ZERO UNSPEC_VPAIR_SPLAT + UNSPEC_VPAIR_REDUCE_PLUS_F32 + UNSPEC_VPAIR_REDUCE_PLUS_F64 + UNSPEC_VPAIR_REDUCE_PLUS_I64 ]) ;; Iterator doing unary/binary arithmetic on vector pairs @@ -577,6 +580,66 @@ (define_insn_and_split "*vpair_nfms_fpcontract_4" } [(set_attr "length" "8")]) + +;; Add all elements in a pair of V4SF vectors. +(define_insn_and_split "vpair_reduc_plus_scale_v8sf" + [(set (match_operand:SF 0 "vsx_register_operand" "=wa") + (unspec:SF [(match_operand:OO 1 "vsx_register_operand" "v")] + UNSPEC_VPAIR_REDUCE_PLUS_F32)) + (clobber (match_scratch:V4SF 2 "=&v")) + (clobber (match_scratch:V4SF 3 "=&v"))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(pc)] +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx tmp1 = operands[2]; + rtx tmp2 = operands[3]; + unsigned r = reg_or_subregno (op1); + rtx op1_hi = gen_rtx_REG (V4SFmode, r); + rtx op1_lo = gen_rtx_REG (V4SFmode, r + 1); + + emit_insn (gen_addv4sf3 (tmp1, op1_hi, op1_lo)); + emit_insn (gen_altivec_vsldoi_v4sf (tmp2, tmp1, tmp1, GEN_INT (8))); + emit_insn (gen_addv4sf3 (tmp2, tmp1, tmp2)); + emit_insn (gen_altivec_vsldoi_v4sf (tmp1, tmp2, tmp2, GEN_INT (4))); + emit_insn (gen_addv4sf3 (tmp2, tmp1, tmp2)); + emit_insn (gen_vsx_xscvspdp_scalar2 (op0, tmp2)); + DONE; +} + [(set_attr "length" "24")]) + +;; Add all elements in a pair of V2DF vectors +(define_insn_and_split "vpair_reduc_plus_scale_v4df" + [(set (match_operand:DF 0 "vsx_register_operand" "=&wa") + (unspec:DF [(match_operand:OO 1 "vsx_register_operand" "wa")] + UNSPEC_VPAIR_REDUCE_PLUS_F64)) + (clobber (match_scratch:DF 2 "=&wa")) + (clobber (match_scratch:V2DF 3 "=&wa"))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(set (match_dup 3) + (plus:V2DF (match_dup 4) + (match_dup 5))) + (set (match_dup 2) + (vec_select:DF (match_dup 3) + (parallel [(match_dup 
6)]))) + (set (match_dup 0) + (plus:DF (match_dup 7) + (match_dup 2)))] +{ + unsigned reg1 = reg_or_subregno (operands[1]); + unsigned reg3 = reg_or_subregno (operands[3]); + + operands[4] = gen_rtx_REG (V2DFmode, reg1); + operands[5] = gen_rtx_REG (V2DFmode, reg1 + 1); + operands[6] = GEN_INT (BYTES_BIG_ENDIAN ? 1 : 0); + operands[7] = gen_rtx_REG (DFmode, reg3); +}) + ;; Vector pair integer negate support. (define_insn_and_split "vpair_neg_2" @@ -786,3 +849,33 @@ (define_insn_and_split "*vpair_nor__2" DONE; } [(set_attr "length" "8")]) + +;; Add all elements in a pair of V2DI vectors +(define_insn_and_split "vpair_reduc_plus_scale_v4di" + [(set (match_operand:DI 0 "gpc_reg_operand" "=&r") + (unspec:DI [(match_operand:OO 1 "altivec_register_operand" "v")] + UNSPEC_VPAIR_REDUCE_PLUS_I64)) + (clobber (match_scratch:V2DI 2 "=&v")) + (clobber (match_scratch:DI 3 "=&r"))] + "TARGET_MMA && TARGET_POWERPC64" + "#" + "&& reload_completed" + [(set (match_dup 2) + (plus:V2DI (match_dup 4) + (match_dup 5))) + (set (match_dup 3) + (vec_select:DI (match_dup 2) + (parallel [(const_int 0)]))) + (set (match_dup 0) + (vec_select:DI (match_dup 2) + (parallel [(const_int 1)]))) + (set (match_dup 0) + (plus:DI (match_dup 0) + (match_dup 3)))] +{ + unsigned reg1 = reg_or_subregno (operands[1]); + + operands[4] = gen_rtx_REG (V2DImode, reg1); + operands[5] = gen_rtx_REG (V2DImode, reg1 + 1); +} + [(set_attr "length" "16")]) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 600e2c393db..0e6e74b8087 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -21399,6 +21399,7 @@ The following built-in functions operate on pairs of @smallexample __vector_pair __builtin_vpair_f32_abs (__vector_pair); __vector_pair __builtin_vpair_f32_add (__vector_pair, __vector_pair); +float __builtin_vpair_f32_add_elements (__vector_pair); __vector_pair __builtin_vpair_f32_assemble (vector float, vector float); vector float __builtin_vpair_f32_extract_vector (__vector_pair, int); 
__vector_pair __builtin_vpair_f32_fma (__vector_pair, __vector_pair, __vector_pair); @@ -21416,6 +21417,7 @@ The following built-in functions operate on pairs of @smallexample __vector_pair __builtin_vpair_f64_abs (__vector_pair); __vector_pair __builtin_vpair_f64_add (__vector_pair, __vector_pair); +double __builtin_vpair_f64_add_elements (__vector_pair); __vector_pair __builtin_vpair_f64_assemble (vector double, vector double); vector double __builtin_vpair_f64_extract_vector (__vector_pair, int); __vector_pair __builtin_vpair_f64_fma (__vector_pair, __vector_pair, __vector_pair); @@ -21432,6 +21434,7 @@ The following built-in functions operate on pairs of @smallexample __vector_pair __builtin_vpair_i64_add (__vector_pair, __vector_pair); +long long __builtin_vpair_i64_add_elements (__vector_pair); __vector_pair __builtin_vpair_i64_and (__vector_pair, __vector_pair); __vector_pair __builtin_vpair_i64_assemble (vector long long, vector long long); diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-16.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-16.c new file mode 100644 index 00000000000..a8c206c4093 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-16.c @@ -0,0 +1,45 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test vector pair built-in functions to do a horizontal add of the + elements. */ + +float +f32_add_elements (__vector_pair *p) +{ + /* 1 lxvp, 1 xvaddsp, 2 vsldoi, 2 xvaddsp, 1 xscvspdp. */ + return __builtin_vpair_f32_add_elements (*p); +} + +double +f64_add_elements (__vector_pair *p) +{ + /* 1 lxvp, 1 xvadddp, 1 xxpermdi, 1 fadd/xsadddp. */ + return __builtin_vpair_f64_add_elements (*p); +} + +long long +i64_add_elements (__vector_pair *p) +{ + /* 1 lxvp, 1 vaddudm, 1 mfvsrld, 1 mfvsrd, 1 add.
*/ + return __builtin_vpair_i64_add_elements (*p); +} + +unsigned long long +i64u_add_elements (__vector_pair *p) +{ + /* 1 lxvp, 1 vaddudm, 1 mfvsrld, 1 mfvsrd, 1 add. */ + return __builtin_vpair_i64u_add_elements (*p); +} + +/* { dg-final { scan-assembler-times {\mfadd\M|\mxsadddp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mlxvp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mmfvsrd\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mmfvsrld\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mvaddudm\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mvsldoi\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxscvspdp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvadddp\M} 1 } } */ +/* { dg-final { scan-assembler-times {\mxvaddsp\M} 3 } } */ +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
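
For readers without a power10 system, the f32 split in vpair_reduc_plus_scale_v8sf (one vector add of the two halves, then two rotate-and-add steps) can be modelled in portable C. This is only a sketch: v4sf_model, v4sf_add, v4sf_rotate and vpair_f32_add_elements_model are hypothetical names introduced here, the rotate helper merely stands in for vsldoi with both inputs equal, and nothing below is part of the patch or of GCC.

```c
#include <assert.h>

/* Hypothetical stand-in for one 128-bit V4SF register.  */
typedef struct { float e[4]; } v4sf_model;

/* Element-wise add; plays the role of xvaddsp in this model.  */
static v4sf_model
v4sf_add (v4sf_model a, v4sf_model b)
{
  v4sf_model r;
  for (int i = 0; i < 4; i++)
    r.e[i] = a.e[i] + b.e[i];
  return r;
}

/* Rotate by N elements; plays the role of vsldoi with identical inputs
   (a shift of 8 bytes is N == 2, a shift of 4 bytes is N == 1).  */
static v4sf_model
v4sf_rotate (v4sf_model a, int n)
{
  assert (n >= 0 && n < 4);
  v4sf_model r;
  for (int i = 0; i < 4; i++)
    r.e[i] = a.e[(i + n) % 4];
  return r;
}

/* Model of the horizontal add over both halves of a vector pair.  */
float
vpair_f32_add_elements_model (v4sf_model hi, v4sf_model lo)
{
  v4sf_model tmp1 = v4sf_add (hi, lo);		/* 4 pairwise sums.  */
  v4sf_model tmp2 = v4sf_rotate (tmp1, 2);	/* Rotate by 8 bytes.  */
  tmp2 = v4sf_add (tmp1, tmp2);			/* 2 distinct partial sums.  */
  tmp1 = v4sf_rotate (tmp2, 1);			/* Rotate by 4 bytes.  */
  tmp2 = v4sf_add (tmp1, tmp2);			/* Every lane holds the total.  */
  return tmp2.e[0];
}
```

Because every lane ends up holding the full sum in this model, the lane read at the end does not matter. The additions are grouped as a tree rather than left to right, so for general inputs the result may differ in rounding from a sequential scalar sum.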