From patchwork Fri Oct 13 23:41:13 2023
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 152860
Date: Fri, 13 Oct 2023 19:41:13 -0400
From: Michael Meissner
To: gcc-patches@gcc.gnu.org, Michael Meissner, Segher Boessenkool, "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: [PATCH] Power10: Add options to disable load and store vector pair.

In working on some future patches that involve utilizing vector pair
instructions, I wanted to be able to tune my program to enable or disable using
the vector pair load or store operations while still keeping the other
operations on the vector pair.

This patch adds two undocumented tuning options.  The -mno-load-vector-pair
option would tell GCC to generate two load vector instructions instead of a
single load vector pair.
The -mno-store-vector-pair option would tell GCC to generate two store vector
instructions instead of a single store vector pair.

If either -mno-load-vector-pair or -mno-store-vector-pair is used, GCC will
not generate the indexed lxvpx or stxvpx instructions.  The reason for this is
to enable splitting the {,p}lxvp or {,p}stxvp instructions after reload without
needing a scratch GPR register.

The default for -mcpu=power10 is that both load vector pair and store vector
pair are enabled.

I decided that if the user explicitly used the __builtin_vsx_lxvp or the
__builtin_vsx_stxvp built-in functions to load or store a vector pair, those
functions would always generate a vector pair instruction.

I added code so that the user code can modify these settings using either a
'#pragma GCC target' directive or __attribute__((__target__(...))) in the
function declaration.

I added tests for the switches, #pragma, and attribute options.

I have built this on both little endian power10 systems and big endian power9
systems doing the normal bootstrap and test.  There were no regressions in any
of the tests, and the new tests passed.  Can I check this patch into the master
branch?

2023-10-13  Michael Meissner

gcc/

	* config/rs6000/mma.md (movoo): Add support for -mload-vector-pair and
	-mstore-vector-pair.
	* config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Likewise.
	(POWERPC_MASKS): Likewise.
	* config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): If either load
	vector pair or store vector pair instructions are not being generated,
	don't allow lxvpx or stxvpx to be generated.
	(rs6000_option_override_internal): Add warnings if either
	-mload-vector-pair or -mstore-vector-pair is used without having MMA
	instructions.
	(rs6000_opt_masks): Allow user to override -mload-vector-pair or
	-mstore-vector-pair via #pragma or attribute.
	* config/rs6000/rs6000.opt (-mload-vector-pair): New option.
	(-mstore-vector-pair): Likewise.

gcc/testsuite/

	* gcc.target/powerpc/vector-pair-attribute.c: New test.
	* gcc.target/powerpc/vector-pair-pragma.c: New test.
	* gcc.target/powerpc/vector-pair-switch1.c: New test.
	* gcc.target/powerpc/vector-pair-switch2.c: New test.
	* gcc.target/powerpc/vector-pair-switch3.c: New test.
	* gcc.target/powerpc/vector-pair-switch4.c: New test.
---
 gcc/config/rs6000/mma.md                      | 44 +++++++++++++++
 gcc/config/rs6000/rs6000-builtin.cc           | 46 +++++++++++++---
 gcc/config/rs6000/rs6000-builtins.def         |  6 ++
 gcc/config/rs6000/rs6000-cpus.def             |  8 ++-
 gcc/config/rs6000/rs6000.cc                   | 30 +++++++++-
 gcc/config/rs6000/rs6000.opt                  |  8 +++
 .../powerpc/vector-pair-attribute.c           | 39 +++++++++++++
 .../gcc.target/powerpc/vector-pair-builtin.c  | 40 ++++++++++++++
 .../gcc.target/powerpc/vector-pair-pragma.c   | 55 +++++++++++++++++++
 .../gcc.target/powerpc/vector-pair-switch1.c  | 16 ++++++
 .../gcc.target/powerpc/vector-pair-switch2.c  | 17 ++++++
 .../gcc.target/powerpc/vector-pair-switch3.c  | 17 ++++++
 .../gcc.target/powerpc/vector-pair-switch4.c  | 17 ++++++
 13 files changed, 331 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-attribute.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-builtin.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-pragma.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-switch1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-switch2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-switch3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vector-pair-switch4.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 575751d477e..fc7e95bc167 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -91,6 +91,7 @@ (define_c_enum "unspec"
    UNSPEC_MMA_XVI8GER4SPP
    UNSPEC_MMA_XXMFACC
    UNSPEC_MMA_XXMTACC
+   UNSPEC_MMA_VECTOR_PAIR_MEMORY
   ])

 (define_c_enum "unspecv"
@@ -298,6 +299,49 @@ (define_insn_and_split "*movoo"
   "TARGET_MMA
    && (gpc_reg_operand (operands[0], OOmode)
        || gpc_reg_operand (operands[1], OOmode))"
+{
+  if (MEM_P (operands[0]))
+    return TARGET_STORE_VECTOR_PAIR ? "stxvp%X0 %x1,%0" : "#";
+
+  if (MEM_P (operands[1]))
+    return TARGET_LOAD_VECTOR_PAIR ? "lxvp%X1 %x0,%1" : "#";
+
+  return "#";
+}
+  "&& reload_completed
+   && ((MEM_P (operands[0]) && !TARGET_STORE_VECTOR_PAIR)
+       || (MEM_P (operands[1]) && !TARGET_LOAD_VECTOR_PAIR)
+       || (!MEM_P (operands[0]) && !MEM_P (operands[1])))"
+  [(const_int 0)]
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "type" "vecload,vecstore,veclogical")
+   (set_attr "size" "256")
+   (set_attr "length" "*,*,8")])
+
+;; Normally __builtin_vsx_lxvp and __builtin_vsx_stxvp are converted to a
+;; direct move insns, but if -mno-load-vector-pair or -mno-store-vector-pair
+;; are used, we use these insns to guarantee that the load vector pair is
+;; generated when the user explicitly uses the built-in function.
+(define_expand "lxvp_internal"
+  [(set (match_operand:OO 0 "gpc_reg_operand")
+        (unspec:OO [(mem:OO (match_operand 1 "address_operand"))]
+                   UNSPEC_MMA_VECTOR_PAIR_MEMORY))]
+  "TARGET_MMA")
+
+(define_expand "stxvp_internal"
+  [(set (mem:OO (match_operand 0 "address_operand"))
+        (unspec:OO [(match_operand:OO 1 "gpc_reg_operand")]
+                   UNSPEC_MMA_VECTOR_PAIR_MEMORY))]
+  "TARGET_MMA")
+
+(define_insn_and_split "*vector_pair_memory_builtin"
+  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa")
+        (unspec:OO [(match_operand:OO 1 "input_operand" "ZwO,wa,wa")]
+                   UNSPEC_MMA_VECTOR_PAIR_MEMORY))]
+  "TARGET_MMA"
   "@
    lxvp%X1 %x0,%1
    stxvp%X0 %x1,%0
diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 82cc3a19447..f44427d116e 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -1,4 +1,4 @@
-/* Target-specific built-in function support for the Power architecture.
+ /* Target-specific built-in function support for the Power architecture.
    See also rs6000-c.c, rs6000-gen-builtins.c, rs6000-builtins.def, and
    rs6000-overloads.def.
    Note that "normal" builtins (generic math functions, etc.) are handled
@@ -1165,9 +1165,23 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi,
       if (TREE_TYPE (TREE_TYPE (ptr)) != vector_pair_type_node)
         ptr = build1 (NOP_EXPR,
                       build_pointer_type (vector_pair_type_node), ptr);
-      tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR,
-                                               TREE_TYPE (ptr), ptr, offset));
-      gimplify_assign (lhs, mem, &new_seq);
+      tree addr = build2 (POINTER_PLUS_EXPR, TREE_TYPE (ptr), ptr, offset);
+
+      if (TARGET_LOAD_VECTOR_PAIR)
+        gimplify_assign (lhs, build_simple_mem_ref (addr), &new_seq);
+      else
+        {
+          tree ptrtype = build_pointer_type (vector_pair_type_node);
+          tree addrssa = create_tmp_reg_or_ssa_name (ptrtype);
+          tree lhsssa = create_tmp_reg_or_ssa_name (vector_pair_type_node);
+          gimplify_assign (addrssa, addr, &new_seq);
+          new_decl = rs6000_builtin_decls[RS6000_BIF_LXVP_INTERNAL];
+          new_call = gimple_build_call (new_decl, 1, addrssa);
+          gimple_call_set_lhs (new_call, lhsssa);
+          gimple_seq_add_stmt (&new_seq, new_call);
+          gimplify_assign (lhs, lhsssa, &new_seq);
+        }
+
       pop_gimplify_context (NULL);
       gsi_replace_with_seq (gsi, new_seq, true);
       return true;
@@ -1182,9 +1196,20 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi,
       if (TREE_TYPE (TREE_TYPE (ptr)) != vector_pair_type_node)
         ptr = build1 (NOP_EXPR,
                       build_pointer_type (vector_pair_type_node), ptr);
-      tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR,
-                                               TREE_TYPE (ptr), ptr, offset));
-      gimplify_assign (mem, src, &new_seq);
+      tree addr = build2 (POINTER_PLUS_EXPR, TREE_TYPE (ptr), ptr, offset);
+
+      if (TARGET_STORE_VECTOR_PAIR)
+        gimplify_assign (build_simple_mem_ref (addr), src, &new_seq);
+      else
+        {
+          tree ptrtype = build_pointer_type (vector_pair_type_node);
+          tree addrssa = create_tmp_reg_or_ssa_name (ptrtype);
+          gimplify_assign (addrssa, addr, &new_seq);
+          new_decl = rs6000_builtin_decls[RS6000_BIF_STXVP_INTERNAL];
+          new_call = gimple_build_call (new_decl, 2, addrssa, src);
+          gimple_seq_add_stmt (&new_seq, new_call);
+        }
+
       pop_gimplify_context (NULL);
       gsi_replace_with_seq (gsi, new_seq, true);
       return true;
@@ -3002,8 +3027,13 @@ mma_expand_builtin (tree exp, rtx target, insn_code icode,
       rtx opnd;
       const struct insn_operand_data *insn_op;
       insn_op = &insn_data[icode].operand[nopnds];
+      /* The internal built-in functions for lxvp and stxvp must use normal
+         expansion to allow passing the address of a variable to the
+         built-in.  */
       if (TREE_CODE (arg) == ADDR_EXPR
-          && MEM_P (DECL_RTL (TREE_OPERAND (arg, 0))))
+          && MEM_P (DECL_RTL (TREE_OPERAND (arg, 0)))
+          && fcode != RS6000_BIF_LXVP_INTERNAL
+          && fcode != RS6000_BIF_STXVP_INTERNAL)
         opnd = DECL_RTL (TREE_OPERAND (arg, 0));
       else
         opnd = expand_normal (arg);
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index ce40600e803..b661a226843 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -4129,5 +4129,11 @@
   v256 __builtin_vsx_lxvp (unsigned long, const v256 *);
     LXVP nothing {mma}

+  v256 __builtin_vsx_lxvp_internal (const v256 *);
+    LXVP_INTERNAL lxvp_internal {mma}
+
   void __builtin_vsx_stxvp (v256, unsigned long, const v256 *);
     STXVP nothing {mma,pair}
+
+  void __builtin_vsx_stxvp_internal (v256 *, v256);
+    STXVP_INTERNAL stxvp_internal {mma}
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 4f350da378c..8c530a22da8 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -77,10 +77,12 @@
 /* Flags that need to be turned off if -mno-power10.  */
 /* We comment out PCREL_OPT here to disable it by default because SPEC2017
    performance was degraded by it.  */
-#define OTHER_POWER10_MASKS     (OPTION_MASK_MMA                        \
+#define OTHER_POWER10_MASKS     (OPTION_MASK_LOAD_VECTOR_PAIR           \
+                                 | OPTION_MASK_MMA                      \
                                  | OPTION_MASK_PCREL                    \
                                  /* | OPTION_MASK_PCREL_OPT */          \
-                                 | OPTION_MASK_PREFIXED)
+                                 | OPTION_MASK_PREFIXED                 \
+                                 | OPTION_MASK_STORE_VECTOR_PAIR)

 #define ISA_3_1_MASKS_SERVER    (ISA_3_0_MASKS_SERVER                   \
                                  | OPTION_MASK_POWER10                  \
@@ -134,6 +136,7 @@
                                  | OPTION_MASK_P10_FUSION               \
                                  | OPTION_MASK_HTM                      \
                                  | OPTION_MASK_ISEL                     \
+                                 | OPTION_MASK_LOAD_VECTOR_PAIR         \
                                  | OPTION_MASK_MFCRF                    \
                                  | OPTION_MASK_MMA                      \
                                  | OPTION_MASK_MODULO                   \
@@ -156,6 +159,7 @@
                                  | OPTION_MASK_QUAD_MEMORY_ATOMIC       \
                                  | OPTION_MASK_RECIP_PRECISION          \
                                  | OPTION_MASK_SOFT_FLOAT               \
+                                 | OPTION_MASK_STORE_VECTOR_PAIR        \
                                  | OPTION_MASK_STRICT_ALIGN_OPTIONAL    \
                                  | OPTION_MASK_VSX)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index cc24dd5301e..8f06b37171a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -2711,7 +2711,9 @@ rs6000_setup_reg_addr_masks (void)
           /* Vector pairs can do both indexed and offset loads if the
              instructions are enabled, otherwise they can only do offset loads
              since it will be broken into two vector moves.  Vector quads can
-             only do offset loads.  */
+             only do offset loads.  If the user restricted generation of either
+             of the LXVP or STXVP instructions, do not allow indexed mode so
+             that we can split the load/store.  */
           else if ((addr_mask != 0) && TARGET_MMA
                    && (m2 == OOmode || m2 == XOmode))
             {
@@ -2719,7 +2721,9 @@ rs6000_setup_reg_addr_masks (void)
               if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
                 {
                   addr_mask |= RELOAD_REG_QUAD_OFFSET;
-                  if (m2 == OOmode)
+                  if (m2 == OOmode
+                      && TARGET_LOAD_VECTOR_PAIR
+                      && TARGET_STORE_VECTOR_PAIR)
                     addr_mask |= RELOAD_REG_INDEXED;
                 }
             }
@@ -4405,6 +4409,26 @@ rs6000_option_override_internal (bool global_init_p)
       rs6000_isa_flags &= ~OPTION_MASK_MMA;
     }

+  /* Warn if -mload-vector-pair or -mstore-vector-pair are used and MMA is
+     not set.  */
+  if (!TARGET_MMA && TARGET_LOAD_VECTOR_PAIR)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_LOAD_VECTOR_PAIR) != 0)
+        warning (0, "%qs should not be used unless you use %qs",
+                 "-mload-vector-pair", "-mmma");
+
+      rs6000_isa_flags &= ~OPTION_MASK_LOAD_VECTOR_PAIR;
+    }
+
+  if (!TARGET_MMA && TARGET_STORE_VECTOR_PAIR)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_STORE_VECTOR_PAIR) != 0)
+        warning (0, "%qs should not be used unless you use %qs",
+                 "-mstore-vector-pair", "-mmma");
+
+      rs6000_isa_flags &= ~OPTION_MASK_STORE_VECTOR_PAIR;
+    }
+
   /* Enable power10 fusion if we are tuning for power10, even if we aren't
      generating power10 instructions.  */
   if (!(rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION))
@@ -24437,6 +24461,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "hard-dfp",                 OPTION_MASK_DFP,                false, true  },
   { "htm",                      OPTION_MASK_HTM,                false, true  },
   { "isel",                     OPTION_MASK_ISEL,               false, true  },
+  { "load-vector-pair",         OPTION_MASK_LOAD_VECTOR_PAIR,   false, true  },
   { "mfcrf",                    OPTION_MASK_MFCRF,              false, true  },
   { "mfpgpr",                   0,                              false, true  },
   { "mma",                      OPTION_MASK_MMA,                false, true  },
@@ -24461,6 +24486,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "quad-memory-atomic",       OPTION_MASK_QUAD_MEMORY_ATOMIC, false, true  },
   { "recip-precision",          OPTION_MASK_RECIP_PRECISION,    false, true  },
   { "save-toc-indirect",        OPTION_MASK_SAVE_TOC_INDIRECT,  false, true  },
+  { "store-vector-pair",        OPTION_MASK_STORE_VECTOR_PAIR,  false, true  },
   { "string",                   0,                              false, true  },
   { "update",                   OPTION_MASK_NO_UPDATE,          true , true  },
   { "vsx",                      OPTION_MASK_VSX,                false, true  },
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index bde6d3ff664..369095df9ed 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -597,6 +597,14 @@ mmma
 Target Mask(MMA) Var(rs6000_isa_flags)
 Generate (do not generate) MMA instructions.

+mload-vector-pair
+Target Undocumented Mask(LOAD_VECTOR_PAIR) Var(rs6000_isa_flags)
+Generate (do not generate) load vector pair instructions.
+
+mstore-vector-pair
+Target Undocumented Mask(STORE_VECTOR_PAIR) Var(rs6000_isa_flags)
+Generate (do not generate) store vector pair instructions.
+
 mrelative-jumptables
 Target Undocumented Var(rs6000_relative_jumptables) Init(1) Save

diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-attribute.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-attribute.c
new file mode 100644
index 00000000000..985a44aca85
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-attribute.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test if we can control generating load and store vector pair via the target
+   attribute.  */
+
+__attribute__((__target__("load-vector-pair,store-vector-pair")))
+void
+test_load_store (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* 1 lxvp, 1 stxvp.  */
+}
+
+__attribute__((__target__("load-vector-pair,no-store-vector-pair")))
+void
+test_load_no_store (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* 1 lxvp, 2 stxv.  */
+}
+
+__attribute__((__target__("no-load-vector-pair,store-vector-pair")))
+void
+test_store_no_load (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* 2 lxv, 1 stxvp.  */
+}
+
+__attribute__((__target__("no-load-vector-pair,no-store-vector-pair")))
+void
+test_no_load_or_store (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* 2 lxv, 2 stxv.  */
+}
+
+/* { dg-final { scan-assembler-times {\mp?lxvpx?\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mp?stxvpx?\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxvx?\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mp?stxvx?\M}  4 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-builtin.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-builtin.c
new file mode 100644
index 00000000000..82790b116b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-builtin.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mno-load-vector-pair -mno-store-vector-pair" } */
+
+/* Test if we do not generate load and store vector pair if directed to on
+   power 10 for normal loads and stores, but the built-in versions still
+   generate the load/store vector pair instructions.  Also check that the
+   prefixed plxvp or pstxvp are generated when appropriate.  */
+
+static __vector_pair vp;
+
+void foo_assign (__vector_pair *p, const __vector_pair *q)
+{
+  *p = *q;			/* 2 lxv, 2 stxv.  */
+}
+
+void foo_builtin (__vector_pair *p, const __vector_pair *q)
+{
+  /* 1 lxvp, 1 stxvp.  */
+  __builtin_vsx_stxvp (__builtin_vsx_lxvp (16, q), 32, p);
+}
+
+void foo_builtin_static_load (__vector_pair *p)
+{
+  /* 1 plxvp, 1 stxvp.  */
+  __builtin_vsx_stxvp (__builtin_vsx_lxvp (0, &vp), 0, p);
+}
+
+void foo_builtin_static_store (const __vector_pair *p)
+{
+  /* 1 lxvp, 1 pstxvp.  */
+  __builtin_vsx_stxvp (__builtin_vsx_lxvp (0, p), 0, &vp);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvx?\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mlxvpx?\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mplxvp\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mpstxvp\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mstxvx?\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mstxvpx?\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-pragma.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-pragma.c
new file mode 100644
index 00000000000..74c6baf8185
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-pragma.c
@@ -0,0 +1,55 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test if we can control generating load and store vector pair via the #pragma
+   directive.  */
+
+#pragma GCC push_options
+#pragma GCC target("load-vector-pair,store-vector-pair")
+
+void
+test_load_store (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* 1 lxvp, 1 stxvp.  */
+}
+
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target("load-vector-pair,no-store-vector-pair")
+
+void
+test_load_no_store (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* 1 lxvp, 2 stxv.  */
+}
+
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target("no-load-vector-pair,store-vector-pair")
+
+void
+test_store_no_load (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* 2 lxv, 1 stxvp.  */
+}
+
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target("no-load-vector-pair,no-store-vector-pair")
+
+void
+test_no_load_or_store (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* 2 lxv, 2 stxv.  */
+}
+
+#pragma GCC pop_options
+
+/* { dg-final { scan-assembler-times {\mp?lxvpx?\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mp?stxvpx?\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxvx?\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mp?stxvx?\M}  4 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-switch1.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch1.c
new file mode 100644
index 00000000000..48e433b378e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test if we generate load and store vector pair by default on power 10.  */
+
+void
+test (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* 1 lxvp, 1 stxvp.  */
+}
+
+/* { dg-final { scan-assembler-times {\mp?lxvpx?\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mp?stxvpx?\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mp?lxvx?\M}     } } */
+/* { dg-final { scan-assembler-not   {\mp?stxvx?\M}    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-switch2.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch2.c
new file mode 100644
index 00000000000..2a38c2f2aae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mno-store-vector-pair" } */
+
+/* Test if we generate load vector pair but not store vector pair if
+   -mno-store-vector-pair is used on power10.  */
+
+void
+test (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* 1 lxvp, 2 stxv.  */
+}
+
+/* { dg-final { scan-assembler-times {\mp?lxvpx?\M}  1 } } */
+/* { dg-final { scan-assembler-not   {\mp?stxvpx?\M}   } } */
+/* { dg-final { scan-assembler-not   {\mp?lxvx?\M}     } } */
+/* { dg-final { scan-assembler-times {\mp?stxvx?\M}  2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-switch3.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch3.c
new file mode 100644
index 00000000000..fd273056b8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mno-load-vector-pair" } */
+
+/* Test if we do not generate load vector pair but generate store vector pair
+   if -mno-load-vector-pair is used on power10.  */
+
+void
+test (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* 2 lxv, 1 stxvp.  */
+}
+
+/* { dg-final { scan-assembler-not   {\mp?lxvpx?\M}    } } */
+/* { dg-final { scan-assembler-times {\mp?stxvpx?\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mp?lxvx?\M}   2 } } */
+/* { dg-final { scan-assembler-not   {\mp?stxvx?\M}    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-switch4.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch4.c
new file mode 100644
index 00000000000..01686e073fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mno-load-vector-pair -mno-store-vector-pair" } */
+
+/* Test if we do not generate load and store vector pair if directed to on
+   power 10.  */
+
+void
+test (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;			/* 2 lxv, 2 stxv.  */
+}
+
+/* { dg-final { scan-assembler-not   {\mp?lxvpx?\M}    } } */
+/* { dg-final { scan-assembler-not   {\mp?stxvpx?\M}   } } */
+/* { dg-final { scan-assembler-times {\mp?lxvx?\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mp?stxvx?\M}  2 } } */