From: Michael Meissner
Date: Sun, 19 Nov 2023 23:23:01 -0500
To: Michael Meissner, gcc-patches@gcc.gnu.org, Segher Boessenkool, "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: [PATCH 1/4] Add vector pair modes to PowerPC (patch attached)
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 166943

(Sorry, I just posted the ChangeLog entry and not the patch.)

We have had several users ask us to implement ways of using the Power10
load vector pair and store vector pair instructions to give their code a
speed up due to reduced memory bandwidth.

I had originally posted the following patches:

  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636077.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636078.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636083.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636080.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636081.html

to add a set of built-in functions that use the PowerPC __vector_pair type
and that provide a set of functions to do basic operations on vector pairs.

After I posted these patches, it was decided that it would be better to
have a new type rather than a bunch of new built-in functions. Within the
GCC context, the best way to add this support is to extend the vector modes
so that V4DFmode, V8SFmode, V4DImode, V8SImode, V16HImode, and V32QImode
are used.

While in theory you could add a whole new type that isn't a larger size
vector, my experience with IEEE 128-bit floating point is that GCC really
doesn't like 2 modes that are the same size but have different
implementations (such as we see with IEEE 128-bit floating point and IBM
double-double 128-bit floating point). So I did not consider adding a new
mode for use with vector pairs.

My original intention was to just implement V4DFmode and V8SFmode, since
the primary users asking for vector pair support are people implementing
the high end math libraries like Eigen and BLAS. However, in implementing
this code, I discovered that we will need integer vector pair support as
well as floating point vector pair support. The integer modes and types
are needed to properly implement byte shuffling and vector comparisons,
both of which need integer vector pairs.

With the current patches, vector pair support is not enabled by default.
The main reason is I have not implemented the support for byte shuffling
which various tests depend on.

I would also like to implement overloads for the vector built-in functions
like vec_add, vec_sum, etc., so that a vector pair argument is handled just
like a vector argument. In addition, once the various bugs are addressed,
I would then implement the support so that automatic vectorization would
consider using vector pairs instead of vectors.

This is the first patch in the series. It implements the basic modes, and
it allows for initialization of the modes. I've added some optimizations
for extracting and setting fields within the vector pair.
The second patch will implement the floating point vector pair support.
The third patch will implement the integer vector pair support.
The fourth patch will provide new tests to the test suite.

When I test a saxpy type loop (a[i] += (b[i] * c[i])), I generally see a
10% improvement over either auto-vectorization, or just using the vector
types.

I have tested these patches on a little endian power10 system. With
-mvector-size-32 disabled by default, there are no regressions in the test
suite.

I have also built and run the tests on both little endian power9 and big
endian power9 systems, and there are no regressions. Can I check these
patches into the master branch?

2023-11-19  Michael Meissner

gcc/

	* config/rs6000/constraints.md (eV): New constraint.
	* config/rs6000/predicates.md (const_0_to_31_operand): New predicate.
	(easy_vector_constant): Add support for vector pair constants.
	(easy_vector_pair_constant): New predicate.
	(mma_assemble_input_operand): Allow other 16-byte vector modes than
	V16QImode.
	* config/rs6000/rs6000-c.cc (rs6000_cpu_cpp_builtins): Define
	__VECTOR_SIZE_32__ if -mvector-size-32.
	* config/rs6000/rs6000-protos.h (vector_pair_to_vector_mode): New
	declaration.
	(split_vector_pair_constant): Likewise.
	(rs6000_expand_vector_pair_init): Likewise.
	* config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached): Use
	VECTOR_PAIR_MODE instead of comparing mode to OOmode.
	(rs6000_modes_tieable_p): Allow various vector pair modes to pair with
	each other.  Allow 16-byte vectors to pair with vector pair modes.
	(rs6000_setup_reg_addr_masks): Use VECTOR_PAIR_MODE instead of
	comparing mode to OOmode.
	(rs6000_init_hard_regno_mode_ok): Setup vector pair mode basic type
	information and reload handlers.
	(rs6000_option_override_internal): Warn if -mvector-size-32 is used
	without -mcpu=power10 or -mmma.
	(vector_pair_to_vector_mode): New function.
	(split_vector_pair_constant): Likewise.
	(rs6000_expand_vector_pair_init): Likewise.
	(reg_offset_addressing_ok_p): Add support for vector pair modes.
	(rs6000_emit_move): Likewise.
	(rs6000_preferred_reload_class): Likewise.
	(altivec_expand_vec_perm_le): Likewise.
	(rs6000_opt_vars): Add -mvector-size-32 switch.
	(rs6000_split_multireg_move): Add support for vector pair modes.
	* config/rs6000/rs6000.h (VECTOR_PAIR_MODE): New macro.
	* config/rs6000/rs6000.md (wd mode attribute): Add vector pair modes.
	(RELOAD mode iterator): Likewise.
	(toplevel): Include vector-pair.md.
	* config/rs6000/rs6000.opt (-mvector-size-32): New option.
	* config/rs6000/vector-pair.md: New file.
	* doc/md.texi (PowerPC constraints): Document the eV constraint.
---
 gcc/config/rs6000/constraints.md  |   6 +
 gcc/config/rs6000/predicates.md   |  32 ++-
 gcc/config/rs6000/rs6000-c.cc     |   3 +
 gcc/config/rs6000/rs6000-protos.h |   3 +
 gcc/config/rs6000/rs6000.cc       | 294 +++++++++++++++++++++++++--
 gcc/config/rs6000/rs6000.h        |   6 +
 gcc/config/rs6000/rs6000.md       |   7 +-
 gcc/config/rs6000/rs6000.opt      |   4 +
 gcc/config/rs6000/vector-pair.md  | 319 ++++++++++++++++++++++++++++++
 gcc/doc/md.texi                   |   4 +
 10 files changed, 656 insertions(+), 22 deletions(-)
 create mode 100644 gcc/config/rs6000/vector-pair.md

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c4a6ccf4efb..f28e7701a4e 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -219,6 +219,12 @@ (define_constraint "eQ"
   "An IEEE 128-bit constant that can be loaded into VSX registers."
   (match_operand 0 "easy_vector_constant_ieee128"))
 
+;; A vector pair constant that can be loaded into registers without using a
+;; load operation.
+(define_constraint "eV"
+  "A vector pair constant that can be loaded into VSX registers."
+  (match_operand 0 "easy_vector_pair_constant"))
+
 ;; Floating-point constraints. These two are defined so that insn
 ;; length attributes can be calculated exactly.
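
Reading aid for the eV constraint above (not part of the patch): the
predicate behind it, easy_vector_pair_constant in the predicates.md hunk,
accepts a vector pair constant only when it can be split into a high and a
low half and each half is itself an easy 16-byte constant. A minimal sketch
of that idea in plain C follows. It is an illustration under stated
assumptions, not the GCC implementation: split_pair, easy_half, easy_pair
and MAX_HALF are hypothetical names, elements are modeled as int64_t
instead of RTL, and "easy" is reduced to all-zeros or all-ones, whereas the
real easy_vector_constant accepts more cases.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound on elements per half (V32QImode has 32
   elements, so 16 per half).  */
#define MAX_HALF 16

/* Split PAIR (N_PAIR elements) into HIGH (first half) and LOW (second
   half), using the same element order as split_vector_pair_constant in
   the patch.  Return false if the input cannot be split evenly.  */
static bool
split_pair (const int64_t *pair, size_t n_pair, int64_t *high, int64_t *low)
{
  if (pair == NULL || n_pair == 0 || n_pair % 2 != 0)
    return false;

  size_t n_half = n_pair / 2;
  for (size_t i = 0; i < n_half; i++)
    {
      high[i] = pair[i];
      low[i] = pair[i + n_half];
    }
  return true;
}

/* Simplified stand-in for "easy constant": every element is 0, or every
   element has all bits set.  This is an assumption for illustration.  */
static bool
easy_half (const int64_t *v, size_t n)
{
  bool all_zero = true;
  bool all_ones = true;

  for (size_t i = 0; i < n; i++)
    {
      if (v[i] != 0)
        all_zero = false;
      if (v[i] != -1)
        all_ones = false;
    }
  return all_zero || all_ones;
}

/* A pair is easy only if it splits and both halves are easy.  */
static bool
easy_pair (const int64_t *pair, size_t n_pair)
{
  int64_t high[MAX_HALF];
  int64_t low[MAX_HALF];

  if (n_pair / 2 > MAX_HALF)
    return false;

  return (split_pair (pair, n_pair, high, low)
          && easy_half (high, n_pair / 2)
          && easy_half (low, n_pair / 2));
}
```

With this sketch, a pair such as {0, 0, -1, -1} is accepted because each
half is uniform, while {0, 1, -1, -1} is rejected because its high half is
neither all zeros nor all ones.
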
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index ef7d3f214c4..1a1ebfd0e72 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -327,6 +327,11 @@ (define_predicate "const_0_to_15_operand"
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 0, 15)")))
 
+;; Match op = 0..31
+(define_predicate "const_0_to_31_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 31)")))
+
 ;; Return 1 if op is a 34-bit constant integer.
 (define_predicate "cint34_operand"
   (match_code "const_int")
@@ -729,6 +734,9 @@ (define_predicate "easy_vector_constant"
   if (zero_constant (op, mode) || all_ones_constant (op, mode))
     return true;
 
+  if (VECTOR_PAIR_MODE (mode) && easy_vector_pair_constant (op, mode))
+    return true;
+
   /* Constants that can be generated with ISA 3.1 instructions are easy. */
   vec_const_128bit_type vsx_const;
 
@@ -759,6 +767,26 @@ (define_predicate "easy_vector_constant"
   return false;
 })
 
+;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
+;; pair of vector registers without using memory.
+(define_predicate "easy_vector_pair_constant"
+  (match_code "const_vector")
+{
+  rtx hi_constant, lo_constant;
+  machine_mode vmode;
+
+  if (!TARGET_MMA || !TARGET_VECTOR_SIZE_32 || !VECTOR_PAIR_MODE (mode))
+    return false;
+
+  vmode = vector_pair_to_vector_mode (mode);
+  if (vmode == VOIDmode)
+    return false;
+
+  return (split_vector_pair_constant (op, &hi_constant, &lo_constant)
+          && easy_vector_constant (hi_constant, vmode)
+          && easy_vector_constant (lo_constant, vmode));
+})
+
 ;; Same as easy_vector_constant but only for EASY_VECTOR_15_ADD_SELF.
 (define_predicate "easy_vector_constant_add_self"
   (and (match_code "const_vector")
@@ -1301,8 +1329,10 @@ (define_predicate "splat_input_operand"
 
 ;; Return 1 if this operand is valid for a MMA assemble accumulator insn.
 (define_special_predicate "mma_assemble_input_operand"
-  (match_test "(mode == V16QImode
+  (match_test "(GET_MODE_SIZE (mode) == 16 && VECTOR_MODE_P (mode)
                 && (vsx_register_operand (op, mode)
+                    || op == CONST0_RTX (mode)
+                    || vsx_prefixed_constant (op, mode)
                     || (MEM_P (op)
                         && (indexed_or_indirect_address (XEXP (op, 0), mode)
                             || quad_address_p (XEXP (op, 0), mode, false)))))"))
 
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 65be0ac43e2..27114b14022 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -631,6 +631,9 @@ rs6000_cpu_cpp_builtins (cpp_reader *pfile)
     builtin_define ("__SIZEOF_IBM128__=16");
   if (ieee128_float_type_node)
     builtin_define ("__SIZEOF_IEEE128__=16");
+  if (TARGET_MMA && TARGET_VECTOR_SIZE_32)
+    builtin_define ("__VECTOR_SIZE_32__");
+
 #ifdef TARGET_LIBC_PROVIDES_HWCAP_IN_TCB
   builtin_define ("__BUILTIN_CPU_SUPPORTS__");
 #endif
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index f70118ea40f..e17d73cb4ca 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -61,6 +61,9 @@ extern bool rs6000_move_128bit_ok_p (rtx []);
 extern bool rs6000_split_128bit_ok_p (rtx []);
 extern void rs6000_expand_float128_convert (rtx, rtx, bool);
 extern void rs6000_expand_vector_init (rtx, rtx);
+extern machine_mode vector_pair_to_vector_mode (machine_mode);
+extern bool split_vector_pair_constant (rtx, rtx *, rtx *);
+extern void rs6000_expand_vector_pair_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, rtx);
 extern void rs6000_expand_vector_extract (rtx, rtx, rtx);
 extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 0dd21e67dde..c9bd8c35e63 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1843,7 +1843,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
 
   /* Vector pair modes need
      even/odd VSX register pairs. Only allow vector registers. */
-  if (mode == OOmode)
+  if (VECTOR_PAIR_MODE (mode))
     return (TARGET_MMA && VSX_REGNO_P (regno) && (regno & 1) == 0);
 
   /* MMA accumulator modes need FPR registers divisible by 4. */
@@ -1954,9 +1954,10 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
    GPR registers, and TImode can go in any GPR as well as VSX registers (PR
    57744).
 
-   Similarly, don't allow OOmode (vector pair, restricted to even VSX
-   registers) or XOmode (vector quad, restricted to FPR registers divisible
-   by 4) to tie with other modes.
+   Similarly, don't allow XOmode (vector quad, restricted to FPR registers
+   divisible by 4) to tie with other modes.
+
+   Vector pair modes can tie with other vector pair modes.
 
    Altivec/VSX vector tests were moved ahead of scalar float mode, so that IEEE
    128-bit floating point on VSX systems ties with other vectors. */
@@ -1964,9 +1965,14 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 static bool
 rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 {
-  if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
-      || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode)
-    return mode1 == mode2;
+  if (mode1 == PTImode || mode1 == XOmode
+      || mode2 == PTImode || mode2 == XOmode)
+    return mode1 == mode2;
+
+  if (VECTOR_PAIR_MODE (mode1))
+    return VECTOR_PAIR_MODE (mode2);
+  if (VECTOR_PAIR_MODE (mode2))
+    return ALTIVEC_OR_VSX_VECTOR_MODE (mode1);
 
   if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1))
     return ALTIVEC_OR_VSX_VECTOR_MODE (mode2);
@@ -2715,13 +2721,13 @@ rs6000_setup_reg_addr_masks (void)
              of the LXVP or STXVP instructions, do not allow indexed mode so
              that we can split the load/store.
             */
           else if ((addr_mask != 0) && TARGET_MMA
-                   && (m2 == OOmode || m2 == XOmode))
+                   && (VECTOR_PAIR_MODE (m2) || m2 == XOmode))
             {
               addr_mask |= RELOAD_REG_OFFSET;
               if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
                 {
                   addr_mask |= RELOAD_REG_QUAD_OFFSET;
-                  if (m2 == OOmode
+                  if (VECTOR_PAIR_MODE (m2)
                       && TARGET_LOAD_VECTOR_PAIR
                       && TARGET_STORE_VECTOR_PAIR)
                     addr_mask |= RELOAD_REG_INDEXED;
@@ -2941,6 +2947,33 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
       rs6000_vector_align[XOmode] = 512;
     }
 
+  if (TARGET_MMA && TARGET_VECTOR_SIZE_32)
+    {
+      rs6000_vector_unit[V32QImode] = VECTOR_NONE;
+      rs6000_vector_mem[V32QImode] = VECTOR_VSX;
+      rs6000_vector_align[V32QImode] = 256;
+
+      rs6000_vector_unit[V16HImode] = VECTOR_NONE;
+      rs6000_vector_mem[V16HImode] = VECTOR_VSX;
+      rs6000_vector_align[V16HImode] = 256;
+
+      rs6000_vector_unit[V8SImode] = VECTOR_NONE;
+      rs6000_vector_mem[V8SImode] = VECTOR_VSX;
+      rs6000_vector_align[V8SImode] = 256;
+
+      rs6000_vector_unit[V8SFmode] = VECTOR_NONE;
+      rs6000_vector_mem[V8SFmode] = VECTOR_VSX;
+      rs6000_vector_align[V8SFmode] = 256;
+
+      rs6000_vector_unit[V4DImode] = VECTOR_NONE;
+      rs6000_vector_mem[V4DImode] = VECTOR_VSX;
+      rs6000_vector_align[V4DImode] = 256;
+
+      rs6000_vector_unit[V4DFmode] = VECTOR_NONE;
+      rs6000_vector_mem[V4DFmode] = VECTOR_VSX;
+      rs6000_vector_align[V4DFmode] = 256;
+    }
+
   /* Register class constraints for the constraints that depend on compile
      switches. When the VSX code was added, different constraints were added
      based on the type (DFmode, V2DFmode, V4SFmode).
     For the vector types, all
@@ -3072,6 +3105,22 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
           reg_addr[XOmode].reload_store = CODE_FOR_reload_xo_di_store;
           reg_addr[XOmode].reload_load = CODE_FOR_reload_xo_di_load;
         }
+
+      if (TARGET_MMA && TARGET_VECTOR_SIZE_32)
+        {
+          reg_addr[V32QImode].reload_store = CODE_FOR_reload_v32qi_di_store;
+          reg_addr[V32QImode].reload_load = CODE_FOR_reload_v32qi_di_load;
+          reg_addr[V16HImode].reload_store = CODE_FOR_reload_v16hi_di_store;
+          reg_addr[V16HImode].reload_load = CODE_FOR_reload_v16hi_di_load;
+          reg_addr[V8SImode].reload_store = CODE_FOR_reload_v8si_di_store;
+          reg_addr[V8SImode].reload_load = CODE_FOR_reload_v8si_di_load;
+          reg_addr[V8SFmode].reload_store = CODE_FOR_reload_v8sf_di_store;
+          reg_addr[V8SFmode].reload_load = CODE_FOR_reload_v8sf_di_load;
+          reg_addr[V4DImode].reload_store = CODE_FOR_reload_v4di_di_store;
+          reg_addr[V4DImode].reload_load = CODE_FOR_reload_v4di_di_load;
+          reg_addr[V4DFmode].reload_store = CODE_FOR_reload_v4df_di_store;
+          reg_addr[V4DFmode].reload_load = CODE_FOR_reload_v4df_di_load;
+        }
         }
     }
   else
@@ -3129,6 +3178,22 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
           reg_addr[DDmode].reload_fpr_gpr = CODE_FOR_reload_fpr_from_gprdd;
           reg_addr[DFmode].reload_fpr_gpr = CODE_FOR_reload_fpr_from_gprdf;
         }
+
+      if (TARGET_MMA && TARGET_VECTOR_SIZE_32)
+        {
+          reg_addr[V32QImode].reload_store = CODE_FOR_reload_v32qi_si_store;
+          reg_addr[V32QImode].reload_load = CODE_FOR_reload_v32qi_si_load;
+          reg_addr[V16HImode].reload_store = CODE_FOR_reload_v16hi_si_store;
+          reg_addr[V16HImode].reload_load = CODE_FOR_reload_v16hi_si_load;
+          reg_addr[V8SImode].reload_store = CODE_FOR_reload_v8si_si_store;
+          reg_addr[V8SImode].reload_load = CODE_FOR_reload_v8si_si_load;
+          reg_addr[V8SFmode].reload_store = CODE_FOR_reload_v8sf_si_store;
+          reg_addr[V8SFmode].reload_load = CODE_FOR_reload_v8sf_si_load;
+          reg_addr[V4DImode].reload_store = CODE_FOR_reload_v4di_si_store;
+          reg_addr[V4DImode].reload_load = CODE_FOR_reload_v4di_si_load;
+          reg_addr[V4DFmode].reload_store = CODE_FOR_reload_v4df_si_store;
+          reg_addr[V4DFmode].reload_load = CODE_FOR_reload_v4df_si_load;
+        }
     }
 
   reg_addr[DFmode].scalar_in_vmx_p = true;
@@ -4429,6 +4494,15 @@ rs6000_option_override_internal (bool global_init_p)
       rs6000_isa_flags &= OPTION_MASK_STORE_VECTOR_PAIR;
     }
 
+  if (!TARGET_MMA && TARGET_VECTOR_SIZE_32)
+    {
+      if (OPTION_SET_P (TARGET_VECTOR_SIZE_32))
+        warning (0, "%qs should not be used unless you use %qs",
+                 "-mvector-size-32", "-mmma");
+
+      TARGET_VECTOR_SIZE_32 = 0;
+    }
+
   /* Enable power10 fusion if we are tuning for power10, even if we aren't
      generating power10 instructions. */
   if (!(rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION))
@@ -7275,6 +7349,142 @@ rs6000_expand_vector_init (rtx target, rtx vals)
   emit_move_insn (target, mem);
 }
 
+/* For a vector pair mode, return the equivalent vector mode or VOIDmode. */
+
+machine_mode
+vector_pair_to_vector_mode (machine_mode mode)
+{
+  machine_mode vmode;
+
+  switch (mode)
+    {
+    case E_V32QImode: vmode = V16QImode; break;
+    case E_V16HImode: vmode = V8HImode; break;
+    case E_V8SImode: vmode = V4SImode; break;
+    case E_V4DImode: vmode = V2DImode; break;
+    case E_V8SFmode: vmode = V4SFmode; break;
+    case E_V4DFmode: vmode = V2DFmode; break;
+    case E_OOmode: vmode = V1TImode; break;
+    default: vmode = VOIDmode; break;
+    }
+
+  return vmode;
+}
+
+/* Split a vector constant for a type that can be held into a vector register
+   pair into 2 separate constants that can be held in a single vector register.
+   Return true if we can split the constant.
+   */
+
+bool
+split_vector_pair_constant (rtx op, rtx *high, rtx *low)
+{
+  machine_mode vmode = vector_pair_to_vector_mode (GET_MODE (op));
+
+  *high = *low = NULL_RTX;
+
+  if (!CONST_VECTOR_P (op) || vmode == GET_MODE (op))
+    return false;
+
+  size_t nunits = GET_MODE_NUNITS (vmode);
+  rtvec hi_vec = rtvec_alloc (nunits);
+  rtvec lo_vec = rtvec_alloc (nunits);
+
+  for (size_t i = 0; i < nunits; i++)
+    {
+      RTVEC_ELT (hi_vec, i) = CONST_VECTOR_ELT (op, i);
+      RTVEC_ELT (lo_vec, i) = CONST_VECTOR_ELT (op, i + nunits);
+    }
+
+  *high = gen_rtx_CONST_VECTOR (vmode, hi_vec);
+  *low = gen_rtx_CONST_VECTOR (vmode, lo_vec);
+  return true;
+}
+
+/* Initialize vector pair TARGET to VALS. */
+
+void
+rs6000_expand_vector_pair_init (rtx target, rtx vals)
+{
+  machine_mode mode_vpair = GET_MODE (target);
+  machine_mode mode_vector;
+  size_t n_elts_vpair = GET_MODE_NUNITS (mode_vpair);
+  bool all_same = true;
+  rtx first = XVECEXP (vals, 0, 0);
+  rtx (*gen_splat) (rtx, rtx);
+  rtx (*gen_concat) (rtx, rtx, rtx);
+
+  switch (mode_vpair)
+    {
+    case E_V32QImode:
+      mode_vector = V16QImode;
+      gen_splat = gen_vpair_splat_v32qi;
+      gen_concat = gen_vpair_concat_v32qi;
+      break;
+
+    case E_V16HImode:
+      mode_vector = V8HImode;
+      gen_splat = gen_vpair_splat_v16hi;
+      gen_concat = gen_vpair_concat_v16hi;
+      break;
+
+    case E_V8SImode:
+      mode_vector = V4SImode;
+      gen_splat = gen_vpair_splat_v8si;
+      gen_concat = gen_vpair_concat_v8si;
+      break;
+
+    case E_V4DImode:
+      mode_vector = V2DImode;
+      gen_splat = gen_vpair_splat_v4di;
+      gen_concat = gen_vpair_concat_v4di;
+      break;
+
+    case E_V8SFmode:
+      mode_vector = V4SFmode;
+      gen_splat = gen_vpair_splat_v8sf;
+      gen_concat = gen_vpair_concat_v8sf;
+      break;
+
+    case E_V4DFmode:
+      mode_vector = V2DFmode;
+      gen_splat = gen_vpair_splat_v4df;
+      gen_concat = gen_vpair_concat_v4df;
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  /* See if we can do a splat operation.
+     */
+  for (size_t i = 1; i < n_elts_vpair; ++i)
+    {
+      if (!rtx_equal_p (XVECEXP (vals, 0, i), first))
+        {
+          all_same = false;
+          break;
+        }
+    }
+
+  if (all_same)
+    {
+      emit_insn (gen_splat (target, first));
+      return;
+    }
+
+  /* Break the initialization into two parts. */
+  rtx vector_hi = gen_reg_rtx (mode_vector);
+  rtx vector_lo = gen_reg_rtx (mode_vector);
+  rtx vals_hi;
+  rtx vals_lo;
+
+  split_vector_pair_constant (vals, &vals_hi, &vals_lo);
+
+  rs6000_expand_vector_init (vector_hi, vals_hi);
+  rs6000_expand_vector_init (vector_lo, vals_lo);
+  emit_insn (gen_concat (target, vector_hi, vector_lo));
+  return;
+}
+
 /* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
    is variable and also counts by vector element size for p9 and above. */
 
@@ -8694,6 +8904,12 @@ reg_offset_addressing_ok_p (machine_mode mode)
       /* The vector pair/quad types support offset addressing if the
          underlying vectors support offset addressing. */
     case E_OOmode:
+    case E_V32QImode:
+    case E_V16HImode:
+    case E_V8SImode:
+    case E_V8SFmode:
+    case E_V4DImode:
+    case E_V4DFmode:
     case E_XOmode:
       return TARGET_MMA;
 
@@ -11202,6 +11418,12 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
     case E_V2DFmode:
     case E_V2DImode:
     case E_V1TImode:
+    case E_V32QImode:
+    case E_V16HImode:
+    case E_V8SFmode:
+    case E_V8SImode:
+    case E_V4DFmode:
+    case E_V4DImode:
       if (CONSTANT_P (operands[1])
           && !easy_vector_constant (operands[1], mode))
         operands[1] = force_const_mem (mode, operands[1]);
@@ -13456,7 +13678,7 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
      the GPR registers. */
   if (rclass == GEN_OR_FLOAT_REGS)
     {
-      if (mode == OOmode)
+      if (VECTOR_PAIR_MODE (mode))
         return VSX_REGS;
 
       if (mode == XOmode)
@@ -23417,6 +23639,7 @@ altivec_expand_vec_perm_le (rtx operands[4])
   rtx tmp = target;
   rtx norreg = gen_reg_rtx (V16QImode);
   machine_mode mode = GET_MODE (target);
+  machine_mode qi_vmode = VECTOR_PAIR_MODE (mode) ? V32QImode : V16QImode;
 
   /* Get everything in regs so the pattern matches. */
   if (!REG_P (op0))
@@ -23424,7 +23647,7 @@ altivec_expand_vec_perm_le (rtx operands[4])
   if (!REG_P (op1))
     op1 = force_reg (mode, op1);
   if (!REG_P (sel))
-    sel = force_reg (V16QImode, sel);
+    sel = force_reg (qi_vmode, sel);
   if (!REG_P (target))
     tmp = gen_reg_rtx (mode);
 
@@ -23437,10 +23660,10 @@ altivec_expand_vec_perm_le (rtx operands[4])
     {
       /* Invert the selector with a VNAND if available, else a VNOR.
          The VNAND is preferred for future fusion opportunities. */
-      notx = gen_rtx_NOT (V16QImode, sel);
+      notx = gen_rtx_NOT (qi_vmode, sel);
       iorx = (TARGET_P8_VECTOR
-              ? gen_rtx_IOR (V16QImode, notx, notx)
-              : gen_rtx_AND (V16QImode, notx, notx));
+              ? gen_rtx_IOR (qi_vmode, notx, notx)
+              : gen_rtx_AND (qi_vmode, notx, notx));
       emit_insn (gen_rtx_SET (norreg, iorx));
 
       /* Permute with operands reversed and adjusted selector. */
@@ -24572,6 +24795,9 @@ static struct rs6000_opt_var const rs6000_opt_vars[] =
   { "speculate-indirect-jumps",
     offsetof (struct gcc_options, x_rs6000_speculate_indirect_jumps),
     offsetof (struct cl_target_option, x_rs6000_speculate_indirect_jumps), },
+  { "vector-size-32",
+    offsetof (struct gcc_options, x_TARGET_VECTOR_SIZE_32),
+    offsetof (struct cl_target_option, x_TARGET_VECTOR_SIZE_32), },
 };
 
 /* Inner function to handle attribute((target("..."))) and #pragma GCC target
@@ -27426,6 +27652,8 @@ rs6000_split_multireg_move (rtx dst, rtx src)
   int reg_mode_size;
   /* The number of registers that will be moved. */
   int nregs;
+  /* Hi/lo values for splitting vector pair constants. */
+  rtx vpair_hi, vpair_lo;
 
   reg = REG_P (dst) ? REGNO (dst) : REGNO (src);
   mode = GET_MODE (dst);
@@ -27441,8 +27669,11 @@ rs6000_split_multireg_move (rtx dst, rtx src)
     }
   /* If we have a vector pair/quad mode, split it into two/four
      separate vectors.
*/ - else if (mode == OOmode || mode == XOmode) - reg_mode = V1TImode; + else if (VECTOR_PAIR_MODE (mode) || mode == XOmode) + { + machine_mode vmode = vector_pair_to_vector_mode (mode); + reg_mode = (vmode == VOIDmode) ? V1TImode : vmode; + } else if (FP_REGNO_P (reg)) reg_mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode : (TARGET_HARD_FLOAT ? DFmode : SFmode); @@ -27454,6 +27685,29 @@ rs6000_split_multireg_move (rtx dst, rtx src) gcc_assert (reg_mode_size * nregs == GET_MODE_SIZE (mode)); + /* Handle vector pair constants. */ + if (CONST_VECTOR_P (src) && VECTOR_PAIR_MODE (mode) && TARGET_MMA + && split_vector_pair_constant (src, &vpair_hi, &vpair_lo) + && VSX_REGNO_P (reg)) + { + reg_mode = GET_MODE (vpair_hi); + rtx reg_hi = gen_rtx_REG (reg_mode, reg); + rtx reg_lo = gen_rtx_REG (reg_mode, reg + 1); + + emit_move_insn (reg_hi, vpair_hi); + + /* 0.0 is easy. For other constants, copy the high register into the low + register if the two sets of constants are equal. This means we won't + be doing back to back prefixed load immediate instructions. */ + if (rtx_equal_p (vpair_hi, vpair_lo) + && !rtx_equal_p (vpair_hi, CONST0_RTX (reg_mode))) + emit_move_insn (reg_lo, reg_hi); + else + emit_move_insn (reg_lo, vpair_lo); + + return; + } + /* TDmode residing in FP registers is special, since the ISA requires that the lower-numbered word of a register pair is always the most significant word, even in little-endian mode. This does not match the usual subreg @@ -27493,7 +27747,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) below. This means the last register gets the first memory location. We also need to be careful of using the right register numbers if we are splitting XO to OO. 
*/ - if (mode == OOmode || mode == XOmode) + if (VECTOR_PAIR_MODE (mode) || mode == XOmode) { nregs = hard_regno_nregs (reg, mode); int reg_mode_nregs = hard_regno_nregs (reg, reg_mode); @@ -27553,7 +27807,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) gcc_assert (REG_P (dst)); if (GET_MODE (src) == XOmode) gcc_assert (FP_REGNO_P (REGNO (dst))); - if (GET_MODE (src) == OOmode) + if (VECTOR_PAIR_MODE (GET_MODE (src))) gcc_assert (VSX_REGNO_P (REGNO (dst))); int nvecs = XVECLEN (src, 0); @@ -27628,7 +27882,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) overlap. */ int i; /* XO/OO are opaque so cannot use subregs. */ - if (mode == OOmode || mode == XOmode ) + if (VECTOR_PAIR_MODE (mode) || mode == XOmode ) { for (i = nregs - 1; i >= 0; i--) { @@ -27802,7 +28056,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) continue; /* XO/OO are opaque so cannot use subregs. */ - if (mode == OOmode || mode == XOmode ) + if (VECTOR_PAIR_MODE (mode) || mode == XOmode ) { rtx dst_i = gen_rtx_REG (reg_mode, REGNO (dst) + j); rtx src_i = gen_rtx_REG (reg_mode, REGNO (src) + j); diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 326c45221e9..32848f7d15b 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -1006,6 +1006,12 @@ enum data_align { align_abi, align_opt, align_both }; (ALTIVEC_VECTOR_MODE (MODE) || VSX_VECTOR_MODE (MODE) \ || (MODE) == V2DImode || (MODE) == V1TImode) +/* Whether a mode is held in paired vector registers. */ +#define VECTOR_PAIR_MODE(MODE) \ + ((MODE) == OOmode \ + || (MODE) == V32QImode || (MODE) == V16HImode || (MODE) == V8SImode \ + || (MODE) == V4DImode || (MODE) == V8SFmode || (MODE) == V4DFmode) + /* Post-reload, we can't use any new AltiVec registers, as we already emitted the vrsave mask. 
*/ diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index dcf1f3526f5..e9f2244c216 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -683,9 +683,13 @@ (define_mode_attr wd [(QI "b") (HI "h") (SI "w") (DI "d") + (V32QI "b") (V16QI "b") + (V16HI "h") (V8HI "h") + (V8SI "w") (V4SI "w") + (V4DI "d") (V2DI "d") (V1TI "q") (TI "q")]) @@ -812,7 +816,7 @@ (define_mode_attr BOOL_REGS_UNARY [(TI "r,0,0,wa,v") ;; supplement addressing modes. (define_mode_iterator RELOAD [V16QI V8HI V4SI V2DI V4SF V2DF V1TI SF SD SI DF DD DI TI PTI KF IF TF - OO XO]) + OO XO V32QI V16HI V8SI V8SF V4DI V4DF]) ;; Iterate over smin, smax (define_code_iterator fp_minmax [smin smax]) @@ -15767,6 +15771,7 @@ (define_insn "hashchk" (include "vsx.md") (include "altivec.md") (include "mma.md") +(include "vector-pair.md") (include "dfp.md") (include "crypto.md") (include "htm.md") diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index 369095df9ed..bc2966f6120 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -605,6 +605,10 @@ mstore-vector-pair Target Undocumented Mask(STORE_VECTOR_PAIR) Var(rs6000_isa_flags) Generate (do not generate) store vector pair instructions. +mvector-size-32 +Target Undocumented Var(TARGET_VECTOR_SIZE_32) Init(0) Save +Generate (do not generate) vector pair instructions for vector_size(32). + mrelative-jumptables Target Undocumented Var(rs6000_relative_jumptables) Init(1) Save diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md new file mode 100644 index 00000000000..068f562200a --- /dev/null +++ b/gcc/config/rs6000/vector-pair.md @@ -0,0 +1,319 @@ +;; Vector pair arithmetic and logical instruction support. +;; Copyright (C) 2020-2023 Free Software Foundation, Inc. +;; Contributed by Peter Bergner and +;; Michael Meissner +;; +;; This file is part of GCC. 
+;; +;; GCC is free software; you can redistribute it and/or modify it +;; under the terms of the GNU General Public License as published +;; by the Free Software Foundation; either version 3, or (at your +;; option) any later version. +;; +;; GCC is distributed in the hope that it will be useful, but WITHOUT +;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +;; or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public +;; License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; . + +;; This function adds support for doing vector operations on pairs of vector +;; registers. Most of the instructions use vector pair instructions to load +;; and possibly store registers, but splitting the operation after register +;; allocation to do 2 separate operations. The second scheduler pass can +;; interleave other instructions between these pairs of instructions if +;; possible. + +;; Iterator for all vector pair modes. Even though we do not provide integer +;; vector pair operations at this time, we need to support loading and storing +;; integer vector pairs for perumte operations (and eventually compare). +(define_mode_iterator VPAIR [V32QI V16HI V8SI V4DI V8SF V4DF]) + +;; Iterator for vector pairs with double word elements +(define_mode_iterator VPAIR_DWORD [V4DI V4DF]) + +;; Map vector pair mode to vector mode in upper case after the vector pair is +;; split to two vectors. +(define_mode_attr VPAIR_VECTOR [(V32QI "V16QI") + (V16HI "V8HI") + (V8SI "V4SI") + (V4DI "V2DI") + (V8SF "V4SF") + (V4DF "V2DF")]) + +;; Map vector pair mode to vector mode in lower case after the vector pair is +;; split to two vectors. +(define_mode_attr vpair_vector_l [(V32QI "v16qi") + (V16HI "v8hi") + (V8SI "v4si") + (V4DI "v2di") + (V8SF "v4sf") + (V4DF "v2df")]) + +;; Map vector pair mode to the base element mode. 
+(define_mode_attr VPAIR_ELEMENT [(V32QI "QI") + (V16HI "HI") + (V8SI "SI") + (V4DI "DI") + (V8SF "SF") + (V4DF "DF")]) + +;; Map vector pair mode to the base element mode in lower case. +(define_mode_attr vpair_element_l [(V32QI "qi") + (V16HI "hi") + (V8SI "si") + (V4DI "di") + (V8SF "sf") + (V4DF "df")]) + +;; Vector pair move support. +(define_expand "mov" + [(set (match_operand:VPAIR 0 "nonimmediate_operand") + (match_operand:VPAIR 1 "input_operand"))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" +{ + rs6000_emit_move (operands[0], operands[1], mode); + DONE; +}) + +(define_insn_and_split "*mov" + [(set (match_operand:VPAIR 0 "nonimmediate_operand" + "=wa, wa, ZwO, QwO, wa, wa, wa") + + (match_operand:VPAIR 1 "input_operand" + "ZwO, QwO, wa, wa, wa, j, eV"))] + "TARGET_MMA + && (gpc_reg_operand (operands[0], mode) + || gpc_reg_operand (operands[1], mode))" + "@ + lxvp%X1 %x0,%1 + # + stxvp%X0 %x1,%0 + # + # + # + #" + "&& reload_completed + && ((MEM_P (operands[0]) && !TARGET_STORE_VECTOR_PAIR) + || (MEM_P (operands[1]) && !TARGET_LOAD_VECTOR_PAIR) + || (!MEM_P (operands[0]) && !MEM_P (operands[1])))" + [(const_int 0)] +{ + rs6000_split_multireg_move (operands[0], operands[1]); + DONE; +} + [(set_attr "size" "256") + (set_attr "type" "vecload, vecload, vecstore, vecstore, veclogical, + vecperm, vecperm") + (set_attr "length" "*, 8, *, 8, 8, + 8, 24") + (set_attr "isa" "lxvp, *, stxvp, *, *, + *, *")]) + +;; Vector pair initialization +(define_expand "vec_init" + [(match_operand:VPAIR 0 "vsx_register_operand") + (match_operand:VPAIR 1 "")] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" +{ + rs6000_expand_vector_pair_init (operands[0], operands[1]); + DONE; +}) + +;; Set an element in a vector pair with double word elements. 
+(define_insn_and_split "vec_set" + [(set (match_operand:VPAIR_DWORD 0 "vsx_register_operand" "+&wa") + (unspec:VPAIR_DWORD + [(match_dup 0) + (match_operand: 1 "vsx_register_operand" "wa") + (match_operand 2 "const_0_to_3_operand" "n")] + UNSPEC_VSX_SET)) + (clobber (match_scratch: 3 "=&wa"))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" + "#" + "&& reload_completed" + [(const_int 0)] +{ + rtx dest = operands[0]; + rtx value = operands[1]; + HOST_WIDE_INT elt = INTVAL (operands[2]); + rtx tmp = operands[3]; + machine_mode mode = mode; + machine_mode vmode = mode; + unsigned vsize = GET_MODE_SIZE (mode); + unsigned reg_num = ((WORDS_BIG_ENDIAN && elt >= vsize) + || (!WORDS_BIG_ENDIAN && elt < vsize)); + + rtx vreg = simplify_gen_subreg (vmode, dest, mode, reg_num * 16); + + if ((elt & 0x1) == 0) + { + emit_insn (gen_vsx_extract_ (tmp, vreg, const1_rtx)); + emit_insn (gen_vsx_concat_ (vreg, value, tmp)); + } + else + { + emit_insn (gen_vsx_extract_ (tmp, vreg, const0_rtx)); + emit_insn (gen_vsx_concat_ (vreg, tmp, value)); + } + + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "vecperm")]) + +;; Extract DF/DI from V4DF/V4DI, convert it into extract from V2DF/V2DI. 
+(define_insn_and_split "vec_extract" + [(set (match_operand: 0 "gpc_reg_operand" "=wa,r") + (vec_select: + (match_operand:VPAIR_DWORD 1 "gpc_reg_operand" "wa,wa") + (parallel + [(match_operand:QI 2 "const_0_to_3_operand" "n,n")])))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" + "#" + "&& reload_completed" + [(set (match_dup 0) + (vec_select: + (match_dup 3) + (parallel [(match_dup 4)])))] +{ + machine_mode vmode = mode; + rtx op1 = operands[1]; + HOST_WIDE_INT element = INTVAL (operands[2]); + unsigned reg_num = 0; + + if ((WORDS_BIG_ENDIAN && element >= 2) + || (!WORDS_BIG_ENDIAN && element < 2)) + reg_num++; + + operands[3] = simplify_gen_subreg (vmode, op1, mode, reg_num * 16); + operands[4] = GEN_INT (element & 1); +} + [(set_attr "type" "mfvsr,vecperm")]) + +;; Extract a SFmode element from V8SF +(define_insn_and_split "vec_extractv8sfsf" + [(set (match_operand:SF 0 "vsx_register_operand" "=wa") + (vec_select:SF + (match_operand:V8SF 1 "vsx_register_operand" "wa") + (parallel [(match_operand:QI 2 "const_0_to_7_operand" "n")])))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" + "#" + "&& reload_completed" + [(const_int 0)] +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx tmp; + HOST_WIDE_INT element = INTVAL (operands[2]); + unsigned reg_num = 0; + + if ((WORDS_BIG_ENDIAN && element >= 4) + || (!WORDS_BIG_ENDIAN && element < 4)) + reg_num++; + + rtx vreg = simplify_gen_subreg (V4SFmode, op1, V8SFmode, reg_num * 16); + HOST_WIDE_INT vreg_elt = element & 3; + + /* Get the element into position 0 if it isn't there already. */ + if (!vreg_elt) + tmp = vreg; + else + { + tmp = gen_rtx_REG (V4SFmode, reg_or_subregno (op0)); + emit_insn (gen_vsx_xxsldwi_v4sf (tmp, vreg, vreg, GEN_INT (vreg_elt))); + } + + /* Convert the float element to double precision. */ + emit_insn (gen_vsx_xscvspdp_scalar2 (op0, tmp)); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "fp")]) + +;; Assemble a vector pair from two vectors. 
+;; +;; We have both endian versions to change which input register will be moved +;; to the first register in the vector pair. +(define_expand "vpair_concat_" + [(set (match_operand:VPAIR 0 "vsx_register_operand") + (vec_concat:VPAIR + (match_operand: 1 "input_operand") + (match_operand: 2 "input_operand")))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32") + +(define_insn_and_split "*vpair_concat__be" + [(set (match_operand:VPAIR 0 "vsx_register_operand" "=wa,&wa") + (vec_concat:VPAIR + (match_operand: 1 "input_operand" "0,mwajeP") + (match_operand: 2 "input_operand" "mwajeP,mwajeP")))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32 && WORDS_BIG_ENDIAN" + "#" + "&& reload_completed" + [(set (match_dup 3) (match_dup 1)) + (set (match_dup 4) (match_dup 2))] +{ + machine_mode vmode = mode; + rtx op0 = operands[0]; + operands[3] = simplify_gen_subreg (vmode, op0, mode, 0); + operands[4] = simplify_gen_subreg (vmode, op0, mode, 16); +} + [(set_attr "length" "8")]) + +(define_insn_and_split "*vpair_concat__le" + [(set (match_operand:VPAIR 0 "vsx_register_operand" "=&wa,wa") + (vec_concat:VPAIR + (match_operand: 1 "input_operand" "mwajeP,0") + (match_operand: 2 "input_operand" "mwajeP,mwajeP")))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32 && !WORDS_BIG_ENDIAN" + "#" + "&& reload_completed" + [(set (match_dup 3) (match_dup 1)) + (set (match_dup 4) (match_dup 2))] +{ + machine_mode vmode = mode; + rtx op0 = operands[0]; + operands[3] = simplify_gen_subreg (vmode, op0, mode, 0); + operands[4] = simplify_gen_subreg (vmode, op0, mode, 16); +} + [(set_attr "length" "8")]) + +;; Zero a vector pair +(define_expand "vpair_zero_" + [(set (match_operand:VPAIR 0 "vsx_register_operand") (match_dup 1))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" +{ + operands[1] = CONST0_RTX (mode); +}) + +;; Create a vector pair with a value splat'ed (duplicated) to all of the +;; elements. 
+(define_expand "vpair_splat_" + [(use (match_operand:VPAIR 0 "vsx_register_operand")) + (use (match_operand: 1 "input_operand"))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" +{ + machine_mode vmode = mode; + rtx op0 = operands[0]; + rtx op1 = operands[1]; + + if (op1 == CONST0_RTX (vmode)) + { + emit_insn (gen_vpair_zero_ (op0)); + DONE; + } + + rtx tmp = gen_reg_rtx (vmode); + + unsigned num_elements = GET_MODE_NUNITS (vmode); + rtvec elements = rtvec_alloc (num_elements); + for (size_t i = 0; i < num_elements; i++) + RTVEC_ELT (elements, i) = copy_rtx (op1); + + rtx vec_elements = gen_rtx_PARALLEL (vmode, elements); + rs6000_expand_vector_init (tmp, vec_elements); + emit_insn (gen_vpair_concat_ (op0, tmp, tmp)); + DONE; +}) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index e01cdcbe22c..23c151f90de 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -3509,6 +3509,10 @@ loaded to a VSX register with one prefixed instruction. An IEEE 128-bit constant that can be loaded into a VSX register with the @code{lxvkq} instruction. +@item eV +A vector pair constant that can be loaded to a VSX register with two +separate instructions. 
+ @ifset INTERNALS @item G A floating point constant that can be loaded into a register with one
From patchwork Mon Nov 20 04:23:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 166944
Date: Sun, 19 Nov 2023 23:23:57 -0500
From: Michael Meissner
To: Michael Meissner, gcc-patches@gcc.gnu.org, Segher Boessenkool, "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: [PATCH 2/4] Vector pair floating point support for PowerPC

The first patch in the vector pair series was previously posted. This patch needs that first patch. The first patch implemented the basic modes, and it allows for initialization of the modes. In addition, I added some optimizations for extracting and setting fields within the vector pair.

This is the second patch in the vector pair series. It adds the basic support to do the normal floating point arithmetic operations like add, subtract, etc. I have also put in combine insns to enable combining the fma (fused multiply-add) instructions with negation to generate the 4 fma operations on the PowerPC.

The third patch will implement the integer vector pair support. The fourth patch will provide new tests to the test suite.
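As a rough illustration of the four fma forms mentioned above, the following C sketch writes each one out with GCC's generic vector extension. It is not taken from the patch or its tests: the typedef name v4df and the function names are hypothetical, and whether the compiler actually fuses the multiply and add depends on the target and on the floating point contraction setting.

```c
/* Illustrative sketch only; v4df and the vpair_* helpers below are
   hypothetical names, not part of the patch.  */

/* Four doubles in a 32-byte vector, i.e. the vector_size(32) case.  */
typedef double v4df __attribute__ ((vector_size (32)));

/* fma:  (a * b) + c  */
void
vpair_fma (v4df *d, const v4df *a, const v4df *b, const v4df *c)
{
  *d = (*a) * (*b) + (*c);
}

/* fms:  (a * b) - c  */
void
vpair_fms (v4df *d, const v4df *a, const v4df *b, const v4df *c)
{
  *d = (*a) * (*b) - (*c);
}

/* nfma:  -((a * b) + c)  */
void
vpair_nfma (v4df *d, const v4df *a, const v4df *b, const v4df *c)
{
  *d = -((*a) * (*b) + (*c));
}

/* nfms:  -((a * b) - c)  */
void
vpair_nfms (v4df *d, const v4df *a, const v4df *b, const v4df *c)
{
  *d = -((*a) * (*b) - (*c));
}
```

The helpers take pointers rather than passing 32-byte vectors by value, so the sketch also compiles without ABI warnings on targets that have no 256-bit vector registers.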
When I test a saxpy type loop (a[i] += (b[i] * c[i])), I generally see a 10% improvement over either auto-vectorization, or just using the vector types.

I have tested these patches on a little endian power10 system. With -mvector-size-32 disabled by default, there are no regressions in the test suite.

I have also built and run the tests on both little endian power9 and big endian power9 systems, and there are no regressions. Can I check these patches into the master branch?

2023-11-19 Michael Meissner

gcc/

* config/rs6000/rs6000-protos.h (split_unary_vector_pair): New declaration.
(split_binary_vector_pair): Likewise.
(split_fma_vector_pair): Likewise.
* config/rs6000/rs6000.cc (split_unary_vector_pair): New function.
(split_binary_vector_pair): Likewise.
(split_fma_vector_pair): Likewise.
* config/rs6000/vector-pair.md (VPAIR_FP): New mode iterator.
(VPAIR_FP_UNARY): New code iterator.
(VPAIR_FP_BINARY): Likewise.
(vpair_op): New code attribute.
(2, VPAIR_FP and VPAIR_FP_UNARY iterators): New insns.
(sqrtv8sf2): Likewise.
(sqrtv4df2): Likewise.
(nabs2): Likewise.
(3, VPAIR_FP and VPAIR_FP_BINARY iterators): Likewise.
(divv8sf3): Likewise.
(divv4df3): Likewise.
(fma4): Likewise.
(fms4): Likewise.
(nfma4): Likewise.
(nfms4): Likewise.
(fma_fpcontract_4): Likewise.
(fms_fpcontract_4): Likewise.
(nfma_fpcontract_): Likewise.
(nfms_fpcontract_): Likewise.
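For readers who want to try the saxpy type loop mentioned in the cover text, one way to write it with vector_size(32) types is sketched below. This is an assumption about the shape of such a test, not the code that was measured; the typedef v8sf and the function name saxpy_vpair are hypothetical.

```c
#include <stddef.h>

/* Eight floats in a 32-byte vector.  The typedef name is hypothetical.  */
typedef float v8sf __attribute__ ((vector_size (32)));

/* a[i] += b[i] * c[i], where each array element is a whole 32-byte
   vector, so one loop iteration covers eight single precision values.  */
void
saxpy_vpair (v8sf *a, const v8sf *b, const v8sf *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] += b[i] * c[i];
}
```

On a power10 compiler with this series applied, the loop would presumably be built with -mcpu=power10 -mvector-size-32. On other targets GCC lowers the 32-byte operations generically, so the sketch still compiles and runs there.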
--- gcc/config/rs6000/rs6000-protos.h | 5 + gcc/config/rs6000/rs6000.cc | 74 +++++++ gcc/config/rs6000/vector-pair.md | 310 ++++++++++++++++++++++++++++++ 3 files changed, 389 insertions(+) diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index e17d73cb4ca..dac48f199ab 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -141,6 +141,11 @@ extern void rs6000_emit_swsqrt (rtx, rtx, bool); extern void output_toc (FILE *, rtx, int, machine_mode); extern void rs6000_fatal_bad_address (rtx); extern rtx create_TOC_reference (rtx, rtx); +extern void split_unary_vector_pair (machine_mode, rtx [], rtx (*)(rtx, rtx)); +extern void split_binary_vector_pair (machine_mode, rtx [], + rtx (*)(rtx, rtx, rtx)); +extern void split_fma_vector_pair (machine_mode, rtx [], + rtx (*)(rtx, rtx, rtx, rtx)); extern void rs6000_split_multireg_move (rtx, rtx); extern void rs6000_emit_le_vsx_permute (rtx, rtx, machine_mode); extern void rs6000_emit_le_vsx_move (rtx, rtx, machine_mode); diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index c9bd8c35e63..aeac7c9fa42 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -27634,6 +27634,80 @@ rs6000_split_logical (rtx operands[3], return; } +/* Split a unary vector pair insn into two separate vector insns. */ + +void +split_unary_vector_pair (machine_mode mode, /* vector mode. */ + rtx operands[], /* dest, src. */ + rtx (*func)(rtx, rtx)) /* create insn. 
*/ +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + machine_mode orig_mode = GET_MODE (op0); + + rtx reg0_vector0 = simplify_gen_subreg (mode, op0, orig_mode, 0); + rtx reg1_vector0 = simplify_gen_subreg (mode, op1, orig_mode, 0); + rtx reg0_vector1 = simplify_gen_subreg (mode, op0, orig_mode, 16); + rtx reg1_vector1 = simplify_gen_subreg (mode, op1, orig_mode, 16); + + emit_insn (func (reg0_vector0, reg1_vector0)); + emit_insn (func (reg0_vector1, reg1_vector1)); + return; +} + +/* Split a binary vector pair insn into two separate vector insns. */ + +void +split_binary_vector_pair (machine_mode mode, /* vector mode. */ + rtx operands[], /* dest, src. */ + rtx (*func)(rtx, rtx, rtx)) /* create insn. */ +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx op2 = operands[2]; + machine_mode orig_mode = GET_MODE (op0); + + rtx reg0_vector0 = simplify_gen_subreg (mode, op0, orig_mode, 0); + rtx reg1_vector0 = simplify_gen_subreg (mode, op1, orig_mode, 0); + rtx reg2_vector0 = simplify_gen_subreg (mode, op2, orig_mode, 0); + rtx reg0_vector1 = simplify_gen_subreg (mode, op0, orig_mode, 16); + rtx reg1_vector1 = simplify_gen_subreg (mode, op1, orig_mode, 16); + rtx reg2_vector1 = simplify_gen_subreg (mode, op2, orig_mode, 16); + + emit_insn (func (reg0_vector0, reg1_vector0, reg2_vector0)); + emit_insn (func (reg0_vector1, reg1_vector1, reg2_vector1)); + return; +} + +/* Split a fused multiply-add vector pair insn into two separate vector + insns. */ + +void +split_fma_vector_pair (machine_mode mode, /* vector mode. */ + rtx operands[], /* dest, src. */ + rtx (*func)(rtx, rtx, rtx, rtx)) /* create insn. 
*/ +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx op2 = operands[2]; + rtx op3 = operands[3]; + machine_mode orig_mode = GET_MODE (op0); + + rtx reg0_vector0 = simplify_gen_subreg (mode, op0, orig_mode, 0); + rtx reg1_vector0 = simplify_gen_subreg (mode, op1, orig_mode, 0); + rtx reg2_vector0 = simplify_gen_subreg (mode, op2, orig_mode, 0); + rtx reg3_vector0 = simplify_gen_subreg (mode, op3, orig_mode, 0); + + rtx reg0_vector1 = simplify_gen_subreg (mode, op0, orig_mode, 16); + rtx reg1_vector1 = simplify_gen_subreg (mode, op1, orig_mode, 16); + rtx reg2_vector1 = simplify_gen_subreg (mode, op2, orig_mode, 16); + rtx reg3_vector1 = simplify_gen_subreg (mode, op3, orig_mode, 16); + + emit_insn (func (reg0_vector0, reg1_vector0, reg2_vector0, reg3_vector0)); + emit_insn (func (reg0_vector1, reg1_vector1, reg2_vector1, reg3_vector1)); + return; +} + /* Emit instructions to move SRC to DST. Called by splitters for multi-register moves. It will emit at most one instruction for each register that is accessed; that is, it won't emit li/lis pairs diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md index 068f562200a..8e2d7e5cc5b 100644 --- a/gcc/config/rs6000/vector-pair.md +++ b/gcc/config/rs6000/vector-pair.md @@ -31,9 +31,34 @@ ;; integer vector pairs for perumte operations (and eventually compare). (define_mode_iterator VPAIR [V32QI V16HI V8SI V4DI V8SF V4DF]) +;; Floating point vector pair ops +(define_mode_iterator VPAIR_FP [V8SF V4DF]) + +;; Iterator for floating point unary/binary operations. 
+(define_code_iterator VPAIR_FP_UNARY [abs neg]) +(define_code_iterator VPAIR_FP_BINARY [plus minus mult smin smax]) + ;; Iterator for vector pairs with double word elements (define_mode_iterator VPAIR_DWORD [V4DI V4DF]) +;; Give the insn name from the operation +(define_code_attr vpair_op [(abs "abs") + (div "div") + (and "and") + (fma "fma") + (ior "ior") + (minus "sub") + (mult "mul") + (neg "neg") + (not "one_cmpl") + (plus "add") + (smin "smin") + (smax "smax") + (sqrt "sqrt") + (umin "umin") + (umax "umax") + (xor "xor")]) + ;; Map vector pair mode to vector mode in upper case after the vector pair is ;; split to two vectors. (define_mode_attr VPAIR_VECTOR [(V32QI "V16QI") @@ -317,3 +342,288 @@ (define_expand "vpair_splat_" emit_insn (gen_vpair_concat_ (op0, tmp, tmp)); DONE; }) + +;; Vector pair floating point arithmetic unary operations +(define_insn_and_split "2" + [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa") + (VPAIR_FP_UNARY:VPAIR_FP + (match_operand:VPAIR_FP 1 "vsx_register_operand" "wa")))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_unary_vector_pair (mode, operands, + gen_2); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "vecfloat")]) + +;; Sqrt needs different type attributes between V8SF and V4DF +(define_insn_and_split "sqrtv8sf2" + [(set (match_operand:V8SF 0 "vsx_register_operand" "=wa") + (sqrt:V8SF + (match_operand:V8SF 1 "vsx_register_operand" "wa")))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_unary_vector_pair (V4SFmode, operands, gen_sqrtv4sf2); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "vecfdiv")]) + +(define_insn_and_split "sqrtv4df2" + [(set (match_operand:V4DF 0 "vsx_register_operand" "=wa") + (sqrt:V4DF + (match_operand:V4DF 1 "vsx_register_operand" "wa")))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_unary_vector_pair (V2DFmode, 
operands, gen_sqrtv2df2); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "vecdiv")]) + +;; Optimize negative absolute value (both floating point and integer) +(define_insn_and_split "nabs2" + [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa") + (neg:VPAIR_FP + (abs:VPAIR_FP + (match_operand:VPAIR_FP 1 "vsx_register_operand" "wa"))))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_unary_vector_pair (mode, operands, + gen_vsx_nabs2); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "vecfloat")]) + +;; Vector pair floating point arithmetic binary operations +(define_insn_and_split "3" + [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa") + (VPAIR_FP_BINARY:VPAIR_FP + (match_operand:VPAIR_FP 1 "vsx_register_operand" "wa") + (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa")))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_binary_vector_pair (mode, operands, + gen_3); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "vecfloat")]) + +;; Divide needs different type attributes between V8SF and V4DF +(define_insn_and_split "divv8sf3" + [(set (match_operand:V8SF 0 "vsx_register_operand" "=wa") + (div:V8SF + (match_operand:V8SF 1 "vsx_register_operand" "wa") + (match_operand:V8SF 2 "vsx_register_operand" "wa")))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_binary_vector_pair (V4SFmode, operands, gen_divv4sf3); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "vecfdiv")]) + +(define_insn_and_split "divv4df3" + [(set (match_operand:V4DF 0 "vsx_register_operand" "=wa") + (div:V4DF + (match_operand:V4DF 1 "vsx_register_operand" "wa") + (match_operand:V4DF 2 "vsx_register_operand" "wa")))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_binary_vector_pair (V2DFmode, operands, gen_divv2df3); + DONE; +} + [(set_attr 
"length" "8") + (set_attr "type" "vecdiv")]) + +;; Vector pair floating point fused multiply-add +(define_insn_and_split "fma4" + [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa") + (fma:VPAIR_FP + (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa") + (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0") + (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa")))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_fma_vector_pair (mode, operands, + gen_fma4); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "vecfloat")]) + +;; Vector pair floating point fused multiply-subtract +(define_insn_and_split "fms4" + [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa") + (fma:VPAIR_FP + (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa") + (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0") + (neg:VPAIR_FP + (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa"))))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_fma_vector_pair (mode, operands, + gen_fms4); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "vecfloat")]) + +;; Vector pair floating point negative fused multiply-add +(define_insn_and_split "nfma4" + [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa") + (neg:VPAIR_FP + (fma:VPAIR_FP + (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa") + (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0") + (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa"))))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_fma_vector_pair (mode, operands, + gen_nfma4); + DONE; +} + [(set_attr "length" "8")]) + +;; Vector pair floating point fused negative multiply-subtract +(define_insn_and_split "nfms4" + [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa") + (neg:VPAIR_FP + (fma:VPAIR_FP + (match_operand:VPAIR_FP 1 
"vsx_register_operand" "%wa,wa") + (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0") + (neg:VPAIR_FP + (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa")))))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32" + "#" + "&& reload_completed" + [(const_int 0)] +{ + split_fma_vector_pair (mode, operands, + gen_nfms4); + DONE; +} + [(set_attr "length" "8") + (set_attr "type" "vecfloat")]) + +;; Optimize vector pair (a * b) + c into fma (a, b, c) +(define_insn_and_split "*fma_fpcontract_4" + [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa") + (plus:VPAIR_FP + (mult:VPAIR_FP + (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa") + (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0")) + (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa")))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32 + && flag_fp_contract_mode == FP_CONTRACT_FAST" + "#" + "&& 1" + [(set (match_dup 0) + (fma:VPAIR_FP (match_dup 1) + (match_dup 2) + (match_dup 3)))] +{ +} + [(set_attr "length" "8")]) + +;; Optimize vector pair (a * b) - c into fma (a, b, -c) +(define_insn_and_split "*fms_fpcontract_4" + [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa") + (minus:VPAIR_FP + (mult:VPAIR_FP + (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa") + (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0")) + (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa")))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32 + && flag_fp_contract_mode == FP_CONTRACT_FAST" + "#" + "&& 1" + [(set (match_dup 0) + (fma:VPAIR_FP (match_dup 1) + (match_dup 2) + (neg:VPAIR_FP + (match_dup 3))))] +{ +} + [(set_attr "length" "8") + (set_attr "type" "vecfloat")]) + +;; Optimize vector pair -((a * b) + c) into -fma (a, b, c) +(define_insn_and_split "*nfma_fpcontract_4" + [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa") + (neg:VPAIR_FP + (plus:VPAIR_FP + (mult:VPAIR_FP + (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa") + (match_operand:VPAIR_FP 2 "vsx_register_operand" 
"wa,0")) + (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa"))))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32 + && flag_fp_contract_mode == FP_CONTRACT_FAST" + "#" + "&& 1" + [(set (match_dup 0) + (neg:VPAIR_FP + (fma:VPAIR_FP (match_dup 1) + (match_dup 2) + (match_dup 3))))] +{ +} + [(set_attr "length" "8")]) + +;; Optimize vector pair -((a * b) - c) into -fma (a, b, -c) +(define_insn_and_split "*nfms_fpcontract_4" + [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa") + (neg:VPAIR_FP + (minus:VPAIR_FP + (mult:VPAIR_FP + (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa") + (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0")) + (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa"))))] + "TARGET_MMA && TARGET_VECTOR_SIZE_32 + && flag_fp_contract_mode == FP_CONTRACT_FAST" + "#" + "&& 1" + [(set (match_dup 0) + (neg:VPAIR_FP + (fma:VPAIR_FP (match_dup 1) + (match_dup 2) + (neg:VPAIR_FP + (match_dup 3)))))] +{ +} + [(set_attr "length" "8") + (set_attr "type" "vecfloat")]) + From patchwork Mon Nov 20 04:26:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Meissner X-Patchwork-Id: 166945 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:9910:0:b0:403:3b70:6f57 with SMTP id i16csp1974984vqn; Sun, 19 Nov 2023 20:26:34 -0800 (PST) X-Google-Smtp-Source: AGHT+IHJCfZvL4FoRaWAsXlOgmWof9ejVkIM8BupnajjZ5ASd91yXkj5cEsuQ+HmFlzvrJM/t1+m X-Received: by 2002:a05:620a:2a08:b0:779:eb01:8390 with SMTP id o8-20020a05620a2a0800b00779eb018390mr7676837qkp.49.1700454394476; Sun, 19 Nov 2023 20:26:34 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1700454394; cv=pass; d=google.com; s=arc-20160816; b=kIV/YSYAqsd5IPYdoF+Dt50nBNZgTKgcNmaIwE2TPhIOiAmWJ0/pIcvEanttw6BYOT sF4JRV3BX0xcauH4KyswEjPepl3+WsafVjj7BOqXdowCLUYdvT9MGhKTWEJVtu+VNqyn kQt57fyZeBzqx3sYjugIDCwFHbYiX8qnEL4kjTncBO5eE2ZLw96YhaEJQ38zFZ6EB/18 qwVtvt4xZ4nBjKo14E9WIibhqTEUXs8ysAuB/xeN62v4RhoKo6HW1QllXHUAr1kG6Fxn 
Date: Sun, 19 Nov 2023 23:26:03 -0500
From: Michael Meissner
To: Michael Meissner, gcc-patches@gcc.gnu.org, Segher Boessenkool, "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: [PATCH 3/4] Add integer vector pair mode support to PowerPC
List-Id: Gcc-patches mailing list

The first two patches in the vector pair series were previously posted. This patch needs those two patches. The first patch implemented the basic modes, and it allows for initialization of the modes. In addition, I added some optimizations for extracting and setting fields within the vector pair. The second patch in the vector pair series implemented floating point support.

This third patch implements the integer vector pair support. This adds the basic support for doing integer operations on vector pairs. I have implemented most of the arithmetic and logical operations that will be needed in the future when byte shuffling is added. I did add various combiner insns to fold the logical instructions (e.g. ior of not becomes orc). Since the PowerPC architecture does not have a negate instruction for vectors of 8/16-bit elements, I have added alternate code that creates a 0 and then does a subtract.

The main instructions that are not supported are shift and rotate instructions. In addition, if people want to use vector pair support on integer types, it might make sense to add support for saturating adds and subtracts, along with the various specialized instructions (bpermd, etc.).

The fourth patch will provide new tests to the test suite.
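
As an illustration only (this is not code from the patch), the two ideas above can be modeled in portable C: negating 8-bit elements by building one zero vector and subtracting each 16-byte half from it, and the `(~a) | b` shape that the combiner patterns fold into a single orc per half. The names `HALF_BYTES`, `PAIR_BYTES`, `vec_sub_u8`, `pair_neg_u8`, and `pair_orc_u8` are hypothetical; plain byte arrays stand in for the vector registers.

```c
#include <stdint.h>
#include <string.h>

#define HALF_BYTES 16 /* one 128-bit vector register */
#define PAIR_BYTES 32 /* a vector pair is modeled as two such registers */

/* Hypothetical model of one 16-byte vector subtract (per lane, wrapping).  */
void
vec_sub_u8 (uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
  for (int i = 0; i < HALF_BYTES; i++)
    dst[i] = (uint8_t) (a[i] - b[i]);
}

/* Negate a 32-byte pair: create one zero vector, then compute 0 - x
   separately on the low and high halves, as the split pattern does.  */
void
pair_neg_u8 (uint8_t *dst, const uint8_t *src)
{
  uint8_t zero[HALF_BYTES];

  memset (zero, 0, sizeof zero);
  vec_sub_u8 (dst, zero, src);
  vec_sub_u8 (dst + HALF_BYTES, zero, src + HALF_BYTES);
}

/* (~a) | b per byte: the expression shape that is folded into orc.  */
void
pair_orc_u8 (uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
  for (int i = 0; i < PAIR_BYTES; i++)
    dst[i] = (uint8_t) (~a[i] | b[i]);
}
```

Calling `pair_neg_u8 (n, x)` on a 32-byte buffer leaves `n[i] + x[i]` equal to zero modulo 256 in every lane, which is the property the subtract-from-zero sequence relies on.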
When I test a saxpy type loop (a[i] += (b[i] * c[i])), I generally see a 10% improvement over either auto-vectorization, or just using the vector types.

I have tested these patches on a little endian power10 system. With -vector-size-32 disabled by default, there are no regressions in the test suite.

I have also built and run the tests on both little endian power9 and big endian power9 systems, and there are no regressions. Can I check these patches into the master branch?

2023-11-19  Michael Meissner

gcc/

	* config/rs6000/vector-pair.md (VPAIR_INT): New mode iterator.
	(VPAIR_NEG_VNEG): Likewise.
	(VPAIR_NEG_SUB): Likewise.
	(VPAIR_INT_BINARY): New code iterator.
	(neg2, VPAIR_NEG_VNEG iterator): New insn.
	(neg2, VPAIR_NEG_SUB iterator): Likewise.
	(2, VPAIR_LOGICAL_UNARY and VPAIR_INT iterators): Likewise.
	(3, VPAIR_LOGICAL_BINARY and VPAIR_INT iterators): Likewise.
	(nor3_1): Likewise.
	(nor3_2): Likewise.
	(andc3): Likewise.
	(eqv3): Likewise.
	(nand3_1): Likewise.
	(nand3_2): Likewise.
	(orc): Likewise.
---
 gcc/config/rs6000/vector-pair.md | 252 +++++++++++++++++++++++++++++++
 1 file changed, 252 insertions(+)

diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md
index 8e2d7e5cc5b..dc71ea28293 100644
--- a/gcc/config/rs6000/vector-pair.md
+++ b/gcc/config/rs6000/vector-pair.md
@@ -38,6 +38,22 @@ (define_mode_iterator VPAIR_FP [V8SF V4DF])
 (define_code_iterator VPAIR_FP_UNARY [abs neg])
 (define_code_iterator VPAIR_FP_BINARY [plus minus mult smin smax])
+;; Integer vector pair ops.  We need the basic logical ops to support
+;; permutation on little endian systems.
+(define_mode_iterator VPAIR_INT [V32QI V16HI V8SI V4DI])
+
+;; Special iterators for NEG (V4SI and V2DI have vneg{w,d}), while V16QI and
+;; V8HI have to use a subtract from 0.
+(define_mode_iterator VPAIR_NEG_VNEG [V4DI V8SI])
+(define_mode_iterator VPAIR_NEG_SUB [V32QI V16HI])
+
+;; Iterators for integer unary/binary operations.  Logical operations can be
+;; done on all VSX registers, while the binary int operators need Altivec
+;; registers.
+(define_code_iterator VPAIR_LOGICAL_UNARY [not])
+(define_code_iterator VPAIR_LOGICAL_BINARY [and ior xor])
+
+(define_code_iterator VPAIR_INT_BINARY [plus minus smin smax umin umax])
+
 ;; Iterator for vector pairs with double word elements
 (define_mode_iterator VPAIR_DWORD [V4DI V4DF])
@@ -626,4 +642,240 @@ (define_insn_and_split "*nfms_fpcontract_4"
 }
   [(set_attr "length" "8")
    (set_attr "type" "vecfloat")])
+
+;; Vector pair negate if we have the VNEGx instruction.
+(define_insn_and_split "neg2"
+  [(set (match_operand:VPAIR_NEG_VNEG 0 "vsx_register_operand" "=v")
+        (neg:VPAIR_NEG_VNEG
+         (match_operand:VPAIR_NEG_VNEG 1 "vsx_register_operand" "v")))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_unary_vector_pair (mode, operands,
+                           gen_neg2);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "vecfloat")])
+
+;; Vector pair negate if we have to do a subtract from 0
+(define_insn_and_split "neg2"
+  [(set (match_operand:VPAIR_NEG_SUB 0 "vsx_register_operand" "=v")
+        (neg:VPAIR_NEG_SUB
+         (match_operand:VPAIR_NEG_SUB 1 "vsx_register_operand" "v")))
+   (clobber (match_scratch: 2 "=&v"))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  enum machine_mode mode = mode;
+  rtx tmp = operands[2];
+  unsigned reg0 = reg_or_subregno (operands[0]);
+  unsigned reg1 = reg_or_subregno (operands[1]);
+
+  emit_move_insn (tmp, CONST0_RTX (mode));
+  emit_insn (gen_sub3 (gen_rtx_REG (mode, reg0),
+                       tmp,
+                       gen_rtx_REG (mode, reg1)));
+
+  emit_insn (gen_sub3 (gen_rtx_REG (mode, reg0 + 1),
+                       tmp,
+                       gen_rtx_REG (mode, reg1 + 1)));
+
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "vecfloat")])
+
+;; Vector pair logical unary operations.  These operations can use all VSX
+;; registers.
+(define_insn_and_split "2"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (VPAIR_LOGICAL_UNARY:VPAIR_INT
+         (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_unary_vector_pair (mode, operands,
+                           gen_2);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+;; Vector pair logical binary operations.  These operations can use all VSX
+;; registers.
+(define_insn_and_split "3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (VPAIR_LOGICAL_BINARY:VPAIR_INT
+         (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")
+         (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa")))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+;; Vector pair integer binary operations.  These operations require Altivec
+;; registers.
+(define_insn_and_split "3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=v")
+        (VPAIR_INT_BINARY:VPAIR_INT
+         (match_operand:VPAIR_INT 1 "vsx_register_operand" "v")
+         (match_operand:VPAIR_INT 2 "vsx_register_operand" "v")))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "vecsimple")])
+
+;; Optimize vector pair ~(a | b) or ((~a) & (~b)) to produce xxlnor
+(define_insn_and_split "*nor3_1"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (not:VPAIR_INT
+         (ior:VPAIR_INT
+          (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")
+          (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_nor3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+(define_insn_and_split "*nor3_2"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (and:VPAIR_INT
+         (not:VPAIR_INT
+          (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa"))
+         (not:VPAIR_INT
+          (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_nor3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+;; Optimize vector pair (~a) & b to use xxlandc
+(define_insn_and_split "*andc3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (and:VPAIR_INT
+         (not:VPAIR_INT
+          (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa"))
+         (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa")))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_andc3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+;; Optimize vector pair ~(a ^ b) to produce xxleqv
+(define_insn_and_split "*eqv3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (not:VPAIR_INT
+         (xor:VPAIR_INT
+          (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")
+          (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_eqv3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+;; Optimize vector pair ~(a & b) or ((~a) | (~b)) to produce xxlnand
+(define_insn_and_split "*nand3_1"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (not:VPAIR_INT
+         (and:VPAIR_INT
+          (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")
+          (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_nand3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+(define_insn_and_split "*nand3_2"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (ior:VPAIR_INT
+         (not:VPAIR_INT
+          (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa"))
+         (not:VPAIR_INT
+          (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_nand3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+;; Optimize vector pair (~a) | b to produce xxlorc
+(define_insn_and_split "*orc3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (ior:VPAIR_INT
+         (not:VPAIR_INT
+          (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa"))
+         (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa")))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_orc3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])

From patchwork Mon Nov 20 04:26:56 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 166946
id 3ufuwrvew6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 20 Nov 2023 04:27:00 +0000 Received: from pps.filterd (ppma12.dal12v.mail.ibm.com [127.0.0.1]) by ppma12.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 3AK1psDW015174; Mon, 20 Nov 2023 04:26:59 GMT Received: from smtprelay06.dal12v.mail.ibm.com ([172.16.1.8]) by ppma12.dal12v.mail.ibm.com (PPS) with ESMTPS id 3uf7ksq4jt-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 20 Nov 2023 04:26:59 +0000 Received: from smtpav02.dal12v.mail.ibm.com (smtpav02.dal12v.mail.ibm.com [10.241.53.101]) by smtprelay06.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 3AK4QwqY8389332 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 20 Nov 2023 04:26:58 GMT Received: from smtpav02.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B3B7C5805A; Mon, 20 Nov 2023 04:26:58 +0000 (GMT) Received: from smtpav02.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3770558051; Mon, 20 Nov 2023 04:26:58 +0000 (GMT) Received: from cowardly-lion.the-meissners.org (unknown [9.61.1.46]) by smtpav02.dal12v.mail.ibm.com (Postfix) with ESMTPS; Mon, 20 Nov 2023 04:26:58 +0000 (GMT) Date: Sun, 19 Nov 2023 23:26:56 -0500 From: Michael Meissner To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner Subject: [PATCH 4/4] Add vector pair tests to PowerPC Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: Xg7JkpKj1b_71WH-75jIDmCx70NihuJ4 X-Proofpoint-GUID: GKzjRZKTNzNSO-vCCbGfR_ykWGuxzmMv X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.987,Hydra:6.0.619,FMLib:17.11.176.26 
definitions=2023-11-20_01,2023-11-17_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 phishscore=0 adultscore=0 priorityscore=1501 lowpriorityscore=0 mlxlogscore=999 spamscore=0 clxscore=1015 suspectscore=0 mlxscore=0 bulkscore=0 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2311060000 definitions=main-2311200029 X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1783055721750387269 X-GMAIL-MSGID: 1783055721750387269 The first patch in the vector pair series was previous posted. This patch needs that first patch. The first patch implemented the basic modes, and it allows for initialization of the modes. In addition, I added some optimizations for extracting and setting fields within the vector pair. The second patch in the vector pair series implemented floating point support. The third patch in the vector pair series implemented integer point support. This fourth patch provide new tests to the test suite. When I test a saxpy type loop (a[i] += (b[i] * c[i])), I generally see a 10% improvement over either auto-factorization, or just using the vector types. I have tested these patches on a little endian power10 system. With -vector-size-32 disabled by default, there are no regressions in the test suite. 
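For readers who want to picture the saxpy type loop mentioned above, here is a minimal sketch written directly with a 32-byte vector type.  The function name saxpy_v32, the memcpy-based loads and stores (used to avoid alignment assumptions), and the scalar tail loop are all assumptions made for illustration; none of this is code from the patch series, and whether it is turned into lxvp/stxvp depends on the target and on the options described in this series.

```c
#include <stddef.h>
#include <string.h>

/* 32-byte vector of 4 doubles, the same shape the new tests use.  */
typedef double v4df_t __attribute__ ((__vector_size__ (32)));

/* Hypothetical helper (not part of the patch): a[i] += b[i] * c[i].  */
void
saxpy_v32 (double *a, const double *b, const double *c, size_t n)
{
  size_t i = 0;

  /* Main loop: process 4 doubles (one 32-byte vector) per iteration.  */
  for (; i + 4 <= n; i += 4)
    {
      v4df_t va, vb, vc;

      /* memcpy keeps the sketch safe for arrays that are not 32-byte
	 aligned; the compiler is free to turn these into vector loads.  */
      memcpy (&va, a + i, sizeof va);
      memcpy (&vb, b + i, sizeof vb);
      memcpy (&vc, c + i, sizeof vc);

      va += vb * vc;

      memcpy (a + i, &va, sizeof va);
    }

  /* Scalar tail for the remaining 0 to 3 elements.  */
  for (; i < n; i++)
    a[i] += b[i] * c[i];
}
```

Calling saxpy_v32 (a, b, c, n) updates a in place; the generic vector extension makes this compile on any GCC target, so it can be compared against a plain scalar loop.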
I have also built and run the tests on both little endian power9 and big
endian power9 systems, and there are no regressions.  Can I check these
patches into the master branch?

2023-11-19  Michael Meissner

gcc/testsuite/

	* gcc.target/powerpc/vector-size-32-1.c: New test.
	* gcc.target/powerpc/vector-size-32-2.c: New test.
	* gcc.target/powerpc/vector-size-32-3.c: New test.
	* gcc.target/powerpc/vector-size-32-4.c: New test.
	* gcc.target/powerpc/vector-size-32-5.c: New test.
	* gcc.target/powerpc/vector-size-32-6.c: New test.
	* gcc.target/powerpc/vector-size-32-7.c: New test.
---
 .../gcc.target/powerpc/vector-size-32-1.c     | 106 ++++++++++++++
 .../gcc.target/powerpc/vector-size-32-2.c     | 106 ++++++++++++++
 .../gcc.target/powerpc/vector-size-32-3.c     | 137 ++++++++++++++++++
 .../gcc.target/powerpc/vector-size-32-4.c     | 137 ++++++++++++++++++
 .../gcc.target/powerpc/vector-size-32-5.c     | 137 ++++++++++++++++++
 .../gcc.target/powerpc/vector-size-32-6.c     | 137 ++++++++++++++++++
 .../gcc.target/powerpc/vector-size-32-7.c     |  31 ++++
 7 files changed, 791 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-size-32-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-size-32-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-size-32-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-size-32-4.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-size-32-5.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-size-32-6.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-size-32-7.c

diff --git a/gcc/testsuite/gcc.target/powerpc/vector-size-32-1.c b/gcc/testsuite/gcc.target/powerpc/vector-size-32-1.c
new file mode 100644
index 00000000000..fd1e2decea7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-size-32-1.c
@@ -0,0 +1,106 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mvector-size-32" } */
+
+/* Test whether the __attribute__((__vector_size__(32))) generates paired vector
+   loads and stores with the -mvector-size-32 option.  This file tests 32-byte
+   vectors with 4 double elements.  */
+
+typedef double vectype_t __attribute__((__vector_size__(32)));
+
+void
+test_add (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xvadddp, 1 stxvp.  */
+  *dest = *a + *b;
+}
+
+void
+test_sub (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xvsubdp, 1 stxvp.  */
+  *dest = *a - *b;
+}
+
+void
+test_multiply (vectype_t *dest,
+               vectype_t *a,
+               vectype_t *b)
+{
+  /* 2 lxvp, 2 xvmuldp, 1 stxvp.  */
+  *dest = *a * *b;
+}
+
+void
+test_divide (vectype_t *dest,
+             vectype_t *a,
+             vectype_t *b)
+{
+  /* 2 lxvp, 2 xvdivdp, 1 stxvp.  */
+  *dest = *a / *b;
+}
+
+void
+test_negate (vectype_t *dest,
+             vectype_t *a,
+             vectype_t *b)
+{
+  /* 1 lxvp, 2 xvnegdp, 1 stxvp.  */
+  *dest = - *a;
+}
+
+void
+test_fma (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b,
+          vectype_t *c)
+{
+  /* 3 lxvp, 2 xvmadd{a,m}dp, 1 stxvp.  */
+  *dest = (*a * *b) + *c;
+}
+
+void
+test_fms (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b,
+          vectype_t *c)
+{
+  /* 3 lxvp, 2 xvmsub{a,m}dp, 1 stxvp.  */
+  *dest = (*a * *b) - *c;
+}
+
+void
+test_nfma (vectype_t *dest,
+           vectype_t *a,
+           vectype_t *b,
+           vectype_t *c)
+{
+  /* 3 lxvp, 2 xvnmadd{a,m}dp, 1 stxvp.  */
+  *dest = -((*a * *b) + *c);
+}
+
+void
+test_nfms (vectype_t *dest,
+           vectype_t *a,
+           vectype_t *b,
+           vectype_t *c)
+{
+  /* 3 lxvp, 2 xvnmsub{a,m}dp, 1 stxvp.  */
+  *dest = -((*a * *b) - *c);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M} 21 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 9 } } */
+/* { dg-final { scan-assembler-times {\mxvadddp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvdivdp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvmadd.dp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvmsub.dp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvmuldp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvnegdp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmadd.dp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmsub.dp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvsubdp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-size-32-2.c b/gcc/testsuite/gcc.target/powerpc/vector-size-32-2.c
new file mode 100644
index 00000000000..eccc9c7aabf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-size-32-2.c
@@ -0,0 +1,106 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mvector-size-32" } */
+
+/* Test whether the __attribute__((__vector_size__(32))) generates paired vector
+   loads and stores with the -mvector-size-32 option.  This file tests 32-byte
+   vectors with 8 float elements.  */
+
+typedef float vectype_t __attribute__((__vector_size__(32)));
+
+void
+test_add (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xvaddsp, 1 stxvp.  */
+  *dest = *a + *b;
+}
+
+void
+test_sub (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xvsubsp, 1 stxvp.  */
+  *dest = *a - *b;
+}
+
+void
+test_multiply (vectype_t *dest,
+               vectype_t *a,
+               vectype_t *b)
+{
+  /* 2 lxvp, 2 xvmulsp, 1 stxvp.  */
+  *dest = *a * *b;
+}
+
+void
+test_divide (vectype_t *dest,
+             vectype_t *a,
+             vectype_t *b)
+{
+  /* 2 lxvp, 2 xvdivsp, 1 stxvp.  */
+  *dest = *a / *b;
+}
+
+void
+test_negate (vectype_t *dest,
+             vectype_t *a,
+             vectype_t *b)
+{
+  /* 1 lxvp, 2 xvnegsp, 1 stxvp.  */
+  *dest = - *a;
+}
+
+void
+test_fma (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b,
+          vectype_t *c)
+{
+  /* 3 lxvp, 2 xvmadd{a,m}sp, 1 stxvp.  */
+  *dest = (*a * *b) + *c;
+}
+
+void
+test_fms (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b,
+          vectype_t *c)
+{
+  /* 3 lxvp, 2 xvmsub{a,m}sp, 1 stxvp.  */
+  *dest = (*a * *b) - *c;
+}
+
+void
+test_nfma (vectype_t *dest,
+           vectype_t *a,
+           vectype_t *b,
+           vectype_t *c)
+{
+  /* 3 lxvp, 2 xvnmadd{a,m}sp, 1 stxvp.  */
+  *dest = -((*a * *b) + *c);
+}
+
+void
+test_nfms (vectype_t *dest,
+           vectype_t *a,
+           vectype_t *b,
+           vectype_t *c)
+{
+  /* 3 lxvp, 2 xvnmsub{a,m}sp, 1 stxvp.  */
+  *dest = -((*a * *b) - *c);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M} 21 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 9 } } */
+/* { dg-final { scan-assembler-times {\mxvaddsp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvdivsp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvmadd.sp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvmsub.sp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvmulsp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvnegsp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmadd.sp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmsub.sp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvsubsp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-size-32-3.c b/gcc/testsuite/gcc.target/powerpc/vector-size-32-3.c
new file mode 100644
index 00000000000..b1952b046f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-size-32-3.c
@@ -0,0 +1,137 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mvector-size-32" } */
+
+/* Test whether the __attribute__((__vector_size__(32))) generates paired vector
+   loads and stores with the -mvector-size-32 option.  This file tests 32-byte
+   vectors with 4 64-bit integer elements.  */
+
+typedef long long vectype_t __attribute__((__vector_size__(32)));
+
+void
+test_add (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 vaddudm, 1 stxvp.  */
+  *dest = *a + *b;
+}
+
+void
+test_sub (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 vsubudm, 1 stxvp.  */
+  *dest = *a - *b;
+}
+
+void
+test_negate (vectype_t *dest,
+             vectype_t *a)
+{
+  /* 1 lxvp, 2 vnegd, 1 stxvp.  */
+  *dest = - *a;
+}
+
+void
+test_not (vectype_t *dest,
+          vectype_t *a)
+{
+  /* 1 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~ *a;
+}
+
+void
+test_and (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xxland, 1 stxvp.  */
+  *dest = *a & *b;
+}
+
+void
+test_or (vectype_t *dest,
+         vectype_t *a,
+         vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlor, 1 stxvp.  */
+  *dest = *a | *b;
+}
+
+void
+test_xor (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlxor, 1 stxvp.  */
+  *dest = *a ^ *b;
+}
+
+void
+test_andc_1 (vectype_t *dest,
+             vectype_t *a,
+             vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = (~ *a) & *b;
+}
+
+void
+test_andc_2 (vectype_t *dest,
+             vectype_t *a,
+             vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = *a & (~ *b);
+}
+
+void
+test_orc_1 (vectype_t *dest,
+            vectype_t *a,
+            vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = (~ *a) | *b;
+}
+
+void
+test_orc_2 (vectype_t *dest,
+            vectype_t *a,
+            vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = *a | (~ *b);
+}
+
+void
+test_nand (vectype_t *dest,
+           vectype_t *a,
+           vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  *dest = ~(*a & *b);
+}
+
+void
+test_nor (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~(*a | *b);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M} 24 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 13 } } */
+/* { dg-final { scan-assembler-times {\mvaddudm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvnegd\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsubudm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-size-32-4.c b/gcc/testsuite/gcc.target/powerpc/vector-size-32-4.c
new file mode 100644
index 00000000000..110292bb4df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-size-32-4.c
@@ -0,0 +1,137 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mvector-size-32" } */
+
+/* Test whether the __attribute__((__vector_size__(32))) generates paired vector
+   loads and stores with the -mvector-size-32 option.  This file tests 32-byte
+   vectors with 8 32-bit integer elements.  */
+
+typedef int vectype_t __attribute__((__vector_size__(32)));
+
+void
+test_add (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 vadduwm, 1 stxvp.  */
+  *dest = *a + *b;
+}
+
+void
+test_sub (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 vsubuwm, 1 stxvp.  */
+  *dest = *a - *b;
+}
+
+void
+test_negate (vectype_t *dest,
+             vectype_t *a)
+{
+  /* 1 lxvp, 2 vnegw, 1 stxvp.  */
+  *dest = - *a;
+}
+
+void
+test_not (vectype_t *dest,
+          vectype_t *a)
+{
+  /* 1 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~ *a;
+}
+
+void
+test_and (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xxland, 1 stxvp.  */
+  *dest = *a & *b;
+}
+
+void
+test_or (vectype_t *dest,
+         vectype_t *a,
+         vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlor, 1 stxvp.  */
+  *dest = *a | *b;
+}
+
+void
+test_xor (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlxor, 1 stxvp.  */
+  *dest = *a ^ *b;
+}
+
+void
+test_andc_1 (vectype_t *dest,
+             vectype_t *a,
+             vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = (~ *a) & *b;
+}
+
+void
+test_andc_2 (vectype_t *dest,
+             vectype_t *a,
+             vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = *a & (~ *b);
+}
+
+void
+test_orc_1 (vectype_t *dest,
+            vectype_t *a,
+            vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = (~ *a) | *b;
+}
+
+void
+test_orc_2 (vectype_t *dest,
+            vectype_t *a,
+            vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = *a | (~ *b);
+}
+
+void
+test_nand (vectype_t *dest,
+           vectype_t *a,
+           vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  *dest = ~(*a & *b);
+}
+
+void
+test_nor (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~(*a | *b);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M} 24 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 13 } } */
+/* { dg-final { scan-assembler-times {\mvadduwm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvnegw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsubuwm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-size-32-5.c b/gcc/testsuite/gcc.target/powerpc/vector-size-32-5.c
new file mode 100644
index 00000000000..8921b04c468
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-size-32-5.c
@@ -0,0 +1,137 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mvector-size-32" } */
+
+/* Test whether the __attribute__((__vector_size__(32))) generates paired vector
+   loads and stores with the -mvector-size-32 option.  This file tests 32-byte
+   vectors with 16 16-bit integer elements.  */
+
+typedef short vectype_t __attribute__((__vector_size__(32)));
+
+void
+test_add (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 vadduhm, 1 stxvp.  */
+  *dest = *a + *b;
+}
+
+void
+test_sub (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 vsubuhm, 1 stxvp.  */
+  *dest = *a - *b;
+}
+
+void
+test_negate (vectype_t *dest,
+             vectype_t *a)
+{
+  /* 1 lxvp, 1 xxspltib, 2 vsubuhm, 1 stxvp.  */
+  *dest = - *a;
+}
+
+void
+test_not (vectype_t *dest,
+          vectype_t *a)
+{
+  /* 1 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~ *a;
+}
+
+void
+test_and (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xxland, 1 stxvp.  */
+  *dest = *a & *b;
+}
+
+void
+test_or (vectype_t *dest,
+         vectype_t *a,
+         vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlor, 1 stxvp.  */
+  *dest = *a | *b;
+}
+
+void
+test_xor (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlxor, 1 stxvp.  */
+  *dest = *a ^ *b;
+}
+
+void
+test_andc_1 (vectype_t *dest,
+             vectype_t *a,
+             vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = (~ *a) & *b;
+}
+
+void
+test_andc_2 (vectype_t *dest,
+             vectype_t *a,
+             vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = *a & (~ *b);
+}
+
+void
+test_orc_1 (vectype_t *dest,
+            vectype_t *a,
+            vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = (~ *a) | *b;
+}
+
+void
+test_orc_2 (vectype_t *dest,
+            vectype_t *a,
+            vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = *a | (~ *b);
+}
+
+void
+test_nand (vectype_t *dest,
+           vectype_t *a,
+           vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  *dest = ~(*a & *b);
+}
+
+void
+test_nor (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~(*a | *b);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M} 24 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 13 } } */
+/* { dg-final { scan-assembler-times {\mvadduhm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsubuhm\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-size-32-6.c b/gcc/testsuite/gcc.target/powerpc/vector-size-32-6.c
new file mode 100644
index 00000000000..a905e6b0a31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-size-32-6.c
@@ -0,0 +1,137 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mvector-size-32" } */
+
+/* Test whether the __attribute__((__vector_size__(32))) generates paired vector
+   loads and stores with the -mvector-size-32 option.  This file tests 32-byte
+   vectors with 32 8-bit integer elements.  */
+
+typedef unsigned char vectype_t __attribute__((__vector_size__(32)));
+
+void
+test_add (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 vaddubm, 1 stxvp.  */
+  *dest = *a + *b;
+}
+
+void
+test_sub (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 vsububm, 1 stxvp.  */
+  *dest = *a - *b;
+}
+
+void
+test_negate (vectype_t *dest,
+             vectype_t *a)
+{
+  /* 1 lxvp, 1 xxspltib, 2 vsububm, 1 stxvp.  */
+  *dest = - *a;
+}
+
+void
+test_not (vectype_t *dest,
+          vectype_t *a)
+{
+  /* 1 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~ *a;
+}
+
+void
+test_and (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xxland, 1 stxvp.  */
+  *dest = *a & *b;
+}
+
+void
+test_or (vectype_t *dest,
+         vectype_t *a,
+         vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlor, 1 stxvp.  */
+  *dest = *a | *b;
+}
+
+void
+test_xor (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlxor, 1 stxvp.  */
+  *dest = *a ^ *b;
+}
+
+void
+test_andc_1 (vectype_t *dest,
+             vectype_t *a,
+             vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = (~ *a) & *b;
+}
+
+void
+test_andc_2 (vectype_t *dest,
+             vectype_t *a,
+             vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = *a & (~ *b);
+}
+
+void
+test_orc_1 (vectype_t *dest,
+            vectype_t *a,
+            vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = (~ *a) | *b;
+}
+
+void
+test_orc_2 (vectype_t *dest,
+            vectype_t *a,
+            vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = *a | (~ *b);
+}
+
+void
+test_nand (vectype_t *dest,
+           vectype_t *a,
+           vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  *dest = ~(*a & *b);
+}
+
+void
+test_nor (vectype_t *dest,
+          vectype_t *a,
+          vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~(*a | *b);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M} 24 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 13 } } */
+/* { dg-final { scan-assembler-times {\mvaddubm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsububm\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-size-32-7.c b/gcc/testsuite/gcc.target/powerpc/vector-size-32-7.c
new file mode 100644
index 00000000000..a6e8582ba4f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-size-32-7.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mvector-size-32" } */
+
+/* Test whether we can load vector pair constants into registers without using
+   a load instruction.  */
+
+typedef double vectype_t __attribute__((__vector_size__(32)));
+
+void
+zero (vectype_t *p)
+{
+  *p = (vectype_t) { 0.0, 0.0, 0.0, 0.0 };
+}
+
+void
+one (vectype_t *p)
+{
+  *p = (vectype_t) { 1.0, 1.0, 1.0, 1.0 };
+}
+
+void
+mixed (vectype_t *p)
+{
+  *p = (vectype_t) { 0.0, 0.0, 1.0, 1.0 };
+}
+
+/* { dg-final { scan-assembler-not {\mp?lxvpx?\M} } } */
+/* { dg-final { scan-assembler-times {\mp?stxvpx?\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
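
The tests above are compile-only and check instruction counts in the assembler output.  As a possible runtime companion, the sketch below checks that a few of the same 32-byte vector expressions produce the same bytes as scalar code.  It is not part of the patch: the function names check_vector_pair_ops and mismatches, the input pattern, and the choice of operations are assumptions made for illustration, and the code relies only on the generic GCC vector extension rather than on -mvector-size-32.

```c
#include <string.h>

/* 32-byte vector of 32 unsigned chars, as in vector-size-32-6.c.  */
typedef unsigned char vectype_t __attribute__ ((__vector_size__ (32)));

/* The same expression shapes as the compile-only tests.  */
static void test_andc (vectype_t *d, vectype_t *a, vectype_t *b) { *d = *a & (~ *b); }
static void test_orc (vectype_t *d, vectype_t *a, vectype_t *b) { *d = *a | (~ *b); }
static void test_nand (vectype_t *d, vectype_t *a, vectype_t *b) { *d = ~(*a & *b); }
static void test_nor (vectype_t *d, vectype_t *a, vectype_t *b) { *d = ~(*a | *b); }
static void test_negate (vectype_t *d, vectype_t *a) { *d = - *a; }

/* Hypothetical helper: count the bytes of VEC that differ from EXPECT.  */
static int
mismatches (const vectype_t *vec, const unsigned char *expect)
{
  unsigned char got[32];
  int bad = 0;

  memcpy (got, vec, sizeof got);
  for (int i = 0; i < 32; i++)
    bad += got[i] != expect[i];
  return bad;
}

/* Return the total number of mismatching bytes; 0 means every vector
   result agrees with the scalar computation.  */
int
check_vector_pair_ops (void)
{
  unsigned char ea[32], eb[32], expect[32];
  vectype_t a, b, d;
  int bad = 0;

  /* Arbitrary input pattern (an assumption, any bytes would do).  */
  for (int i = 0; i < 32; i++)
    {
      ea[i] = (unsigned char) (i * 7 + 3);
      eb[i] = (unsigned char) (255 - i * 5);
    }
  memcpy (&a, ea, sizeof a);
  memcpy (&b, eb, sizeof b);

  test_andc (&d, &a, &b);
  for (int i = 0; i < 32; i++)
    expect[i] = (unsigned char) (ea[i] & ~eb[i]);
  bad += mismatches (&d, expect);

  test_orc (&d, &a, &b);
  for (int i = 0; i < 32; i++)
    expect[i] = (unsigned char) (ea[i] | ~eb[i]);
  bad += mismatches (&d, expect);

  test_nand (&d, &a, &b);
  for (int i = 0; i < 32; i++)
    expect[i] = (unsigned char) ~(ea[i] & eb[i]);
  bad += mismatches (&d, expect);

  test_nor (&d, &a, &b);
  for (int i = 0; i < 32; i++)
    expect[i] = (unsigned char) ~(ea[i] | eb[i]);
  bad += mismatches (&d, expect);

  test_negate (&d, &a);
  for (int i = 0; i < 32; i++)
    expect[i] = (unsigned char) -ea[i];
  bad += mismatches (&d, expect);

  return bad;
}
```

A caller would treat a non-zero return from check_vector_pair_ops () as a failure.  The vectors are passed by pointer, as in the tests, which also avoids ABI warnings on targets without native 32-byte vector registers.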