From patchwork Fri Jul 21 01:32:16 2023
X-Patchwork-Submitter: HAO CHEN GUI
X-Patchwork-Id: 123505
Message-ID: <894768a2-5ebe-60f0-e6e9-73bdc9f1425d@linux.ibm.com>
Date: Fri, 21 Jul 2023 09:32:16 +0800
From: HAO CHEN GUI
To: gcc-patches
Cc: Segher Boessenkool, David, "Kewen.Lin", Peter Bergner
Subject: [PATCHv2, rs6000] Generate mfvsrwz for all subtargets and remove redundant zero extend [PR106769]

Hi,
  This patch modifies the vsx extract expand and generates mfvsrwz/stxsiwx
for all subtargets when the mode is V4SI and the index of the extracted
element is 1 for BE and 2 for LE. The patch also adds an insn pattern for
mfvsrwz, which helps eliminate the redundant zero extend.

  Compared to the last version, the main change is to add a new expand for
V4SI and to split "vsx_extract_si" into two insn patterns.
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622101.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: Generate mfvsrwz for all subtargets and remove redundant zero extend

mfvsrwz has lower latency than xxextractuw or vextuw[lr]x, so it should be
generated even with p9 vector enabled. The instruction also zero-extends
its result already, so a combine pattern is needed to eliminate redundant
zero extend instructions.

gcc/
        PR target/106769
        * config/rs6000/vsx.md (expand vsx_extract_): Set it only for
        V8HI and V16QI.
        (vsx_extract_v4si): New expand for V4SI.
        (*vsx_extract__di_p9): Do not generate the insn when it can be
        generated by mfvsrwz.
        (mfvsrwz): New insn pattern for zero extended vsx_extract_v4si.
        (*vsx_extract_si): Removed.
        (vsx_extract_v4si_0): New insn pattern to deal with V4SI extract
        when the index of the extracted element is 1 with BE and 2 with LE.
        (vsx_extract_v4si_1): New insn and split pattern which deals with
        the cases not handled by vsx_extract_v4si_0.

gcc/testsuite/
        PR target/106769
        * gcc.target/powerpc/pr106769.h: New.
        * gcc.target/powerpc/pr106769-p8.c: New.
        * gcc.target/powerpc/pr106769-p9.c: New.
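
As a rough illustration of the index rule described above (not part of the
patch), the sketch below models which V4SI element can be read directly.
It assumes that mfvsrwz and stxsiwx operate on hardware word 1 of the
register and that LE element numbering is the reverse of the hardware word
order. The helper names register_word and is_direct_move_lane are
hypothetical and do not exist in GCC.

```c
#include <stdbool.h>

/* Hypothetical helpers, not GCC code.  A V4SI register holds four words;
   the hardware numbers them 0..3 from the most significant end.  */

/* Register word that source-level element INDEX lives in.  On LE the
   element order is reversed, hence 3 - INDEX (3 is the lane count - 1).  */
int register_word (int index, bool big_endian)
{
  return big_endian ? index : 3 - index;
}

/* True when the element sits in register word 1, i.e. when a single
   mfvsrwz or stxsiwx can access it without a prior splat.  */
bool is_direct_move_lane (int index, bool big_endian)
{
  return register_word (index, big_endian) == 1;
}
```

With this model, index 1 on BE and index 2 on LE both map to register
word 1, which matches the (BYTES_BIG_ENDIAN ? 1 : 2) test used throughout
the patch.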
patch.diff

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0a34ceebeb5..ad249441bcf 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3722,9 +3722,9 @@ (define_insn "vsx_xxpermdi2__1"
 (define_expand "vsx_extract_"
   [(parallel [(set (match_operand: 0 "gpc_reg_operand")
                    (vec_select:
-                    (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand")
+                    (match_operand:VSX_EXTRACT_I2 1 "gpc_reg_operand")
                     (parallel [(match_operand:QI 2 "const_int_operand")])))
-              (clobber (match_scratch:VSX_EXTRACT_I 3))])]
+              (clobber (match_scratch:VSX_EXTRACT_I2 3))])]
   "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
 {
   /* If we have ISA 3.0, we can do a xxextractuw/vextractu{b,h}.  */
@@ -3736,6 +3736,23 @@ (define_expand "vsx_extract_"
     }
 })
 
+(define_expand "vsx_extract_v4si"
+  [(parallel [(set (match_operand:SI 0 "gpc_reg_operand")
+                   (vec_select:SI
+                    (match_operand:V4SI 1 "gpc_reg_operand")
+                    (parallel [(match_operand:QI 2 "const_0_to_3_operand")])))
+              (clobber (match_scratch:V4SI 3))])]
+  "TARGET_DIRECT_MOVE_64BIT"
+{
+  if (TARGET_P9_VECTOR
+      && INTVAL (operands[2]) != (BYTES_BIG_ENDIAN ? 1 : 2))
+    {
+      emit_insn (gen_vsx_extract_v4si_p9 (operands[0], operands[1],
+                                          operands[2]));
+      DONE;
+    }
+})
+
 (define_insn "vsx_extract__p9"
   [(set (match_operand: 0 "gpc_reg_operand" "=r,")
         (vec_select:
@@ -3798,7 +3815,9 @@ (define_insn_and_split "*vsx_extract__di_p9"
            (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v,")
            (parallel [(match_operand:QI 2 "const_int_operand" "n,n")]))))
    (clobber (match_scratch:SI 3 "=r,X"))]
-  "VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"
+  "TARGET_VEXTRACTUB
+   && (mode != V4SImode
+       || INTVAL (operands[2]) != (BYTES_BIG_ENDIAN ? 1 : 2))"
   "#"
   "&& reload_completed"
   [(parallel [(set (match_dup 4)
@@ -3830,58 +3849,78 @@ (define_insn_and_split "*vsx_extract__store_p9"
    (set (match_dup 0)
         (match_dup 3))])
 
-(define_insn_and_split "*vsx_extract_si"
+(define_insn "mfvsrwz"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (zero_extend:DI
+          (vec_select:SI
+            (match_operand:V4SI 1 "vsx_register_operand" "wa")
+            (parallel [(match_operand:QI 2 "const_int_operand" "n")]))))
+   (clobber (match_scratch:V4SI 3 "=v"))]
+  "TARGET_DIRECT_MOVE_64BIT
+   && INTVAL (operands[2]) == (BYTES_BIG_ENDIAN ? 1 : 2)"
+  "mfvsrwz %0,%x1"
+  [(set_attr "type" "mfvsr")
+   (set_attr "isa" "p8v")])
+
+(define_insn "vsx_extract_v4si_0"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,wa,Z,wa")
+        (vec_select:SI
+         (match_operand:V4SI 1 "gpc_reg_operand" "v,v,v,0")
+         (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n,n")])))
+   (clobber (match_scratch:V4SI 3 "=v,v,v,v"))]
+  "TARGET_DIRECT_MOVE_64BIT
+   && (!TARGET_P9_VECTOR || INTVAL (operands[2]) == (BYTES_BIG_ENDIAN ? 1 : 2))"
+{
+  if (which_alternative == 0)
+    return "mfvsrwz %0,%x1";
+
+  if (which_alternative == 1)
+    return "xxlor %x0,%x1,%x1";
+
+  if (which_alternative == 2)
+    return "stxsiwx %x1,%y0";
+
+  return ASM_COMMENT_START " vec_extract to same register";
+}
+  [(set_attr "type" "mfvsr,veclogical,fpstore,*")
+   (set_attr "length" "4,4,4,0")
+   (set_attr "isa" "p8v,*,p8v,*")])
+
+(define_insn_and_split "vsx_extract_v4si_1"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r,wa,Z")
         (vec_select:SI
          (match_operand:V4SI 1 "gpc_reg_operand" "v,v,v")
          (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n")])))
    (clobber (match_scratch:V4SI 3 "=v,v,v"))]
-  "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT && !TARGET_P9_VECTOR"
+  "TARGET_DIRECT_MOVE_64BIT
+   && !TARGET_P9_VECTOR
+   && INTVAL (operands[2]) != (BYTES_BIG_ENDIAN ? 1 : 2)"
   "#"
-  "&& reload_completed"
+  "&& 1"
   [(const_int 0)]
 {
   rtx dest = operands[0];
   rtx src = operands[1];
   rtx element = operands[2];
-  rtx vec_tmp = operands[3];
-  int value;
+  rtx vec_tmp;
+
+  if (GET_CODE (operands[3]) == SCRATCH)
+    vec_tmp = gen_reg_rtx (V4SImode);
+  else
+    vec_tmp = operands[3];
 
   /* Adjust index for LE element ordering, the below minuend 3 is computed by
      GET_MODE_NUNITS (V4SImode) - 1.  */
   if (!BYTES_BIG_ENDIAN)
     element = GEN_INT (3 - INTVAL (element));
 
-  /* If the value is in the correct position, we can avoid doing the VSPLT
-     instruction.  */
-  value = INTVAL (element);
-  if (value != 1)
-    emit_insn (gen_altivec_vspltw_direct (vec_tmp, src, element));
-  else
-    vec_tmp = src;
+  emit_insn (gen_altivec_vspltw_direct (vec_tmp, src, element));
 
-  if (MEM_P (operands[0]))
-    {
-      if (can_create_pseudo_p ())
-        dest = rs6000_force_indexed_or_indirect_mem (dest);
-
-      if (TARGET_P8_VECTOR)
-        emit_move_insn (dest, gen_rtx_REG (SImode, REGNO (vec_tmp)));
-      else
-        emit_insn (gen_stfiwx (dest, gen_rtx_REG (DImode, REGNO (vec_tmp))));
-    }
-
-  else if (TARGET_P8_VECTOR)
-    emit_move_insn (dest, gen_rtx_REG (SImode, REGNO (vec_tmp)));
-  else
-    emit_move_insn (gen_rtx_REG (DImode, REGNO (dest)),
-                    gen_rtx_REG (DImode, REGNO (vec_tmp)));
+  int value = BYTES_BIG_ENDIAN ? 1 : 2;
+  emit_insn (gen_vsx_extract_v4si_0 (dest, vec_tmp, GEN_INT (value)));
 
   DONE;
-}
-  [(set_attr "type" "mfvsr,vecperm,fpstore")
-   (set_attr "length" "8")
-   (set_attr "isa" "*,p8v,*")])
+})
 
 (define_insn_and_split "*vsx_extract__p8"
   [(set (match_operand: 0 "nonimmediate_operand" "=r")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106769-p8.c b/gcc/testsuite/gcc.target/powerpc/pr106769-p8.c
new file mode 100644
index 00000000000..e7cdbc76298
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106769-p8.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+#include "pr106769.h"
+
+/* { dg-final { scan-assembler {\mmfvsrwz\M} } } */
+/* { dg-final { scan-assembler {\mstxsiwx\M} } } */
+/* { dg-final { scan-assembler-not {\mrldicl\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106769-p9.c b/gcc/testsuite/gcc.target/powerpc/pr106769-p9.c
new file mode 100644
index 00000000000..2205e434a86
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106769-p9.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+#include "pr106769.h"
+
+/* { dg-final { scan-assembler {\mmfvsrwz\M} } } */
+/* { dg-final { scan-assembler {\mstxsiwx\M} } } */
+/* { dg-final { scan-assembler-not {\mrldicl\M} } } */
+/* { dg-final { scan-assembler-not {\mxxextractuw\M} } } */
+/* { dg-final { scan-assembler-not "vextuw\[rl\]x" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106769.h b/gcc/testsuite/gcc.target/powerpc/pr106769.h
new file mode 100644
index 00000000000..1c8c8a024f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106769.h
@@ -0,0 +1,17 @@
+#include
+
+#ifdef __BIG_ENDIAN__
+#define LANE 1
+#else
+#define LANE 2
+#endif
+
+unsigned int foo1 (vector unsigned int v)
+{
+  return vec_extract(v, LANE);
+}
+
+void foo2 (vector unsigned int v, unsigned int* p)
+{
+  *p = vec_extract(v, LANE);
+}
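
For readers who want to check the intent of the vsx_extract_v4si_1 split
without a PowerPC toolchain, the following is a minimal portable model of
the two-step sequence it emits on pre-Power9 targets: splat the wanted
word into every lane, then read the lane that mfvsrwz can access. It is a
sketch only. The type vreg_model and the functions model_vspltw,
model_mfvsrwz and model_extract are hypothetical names, and the model
assumes that mfvsrwz returns hardware word 1 zero-extended to 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not GCC code: a vector register as four 32-bit
   words in hardware order (word[0] is the most significant).  */
typedef struct { uint32_t word[4]; } vreg_model;

/* vspltw model: replicate hardware word N into all four words.  */
vreg_model model_vspltw (vreg_model src, int n)
{
  vreg_model out;
  for (int i = 0; i < 4; i++)
    out.word[i] = src.word[n];
  return out;
}

/* mfvsrwz model: hardware word 1, zero-extended to 64 bits.  */
uint64_t model_mfvsrwz (vreg_model src)
{
  return (uint64_t) src.word[1];
}

/* Two-step sequence for an arbitrary element INDEX: convert the index to
   a hardware word number, splat that word, then move word 1 out.  */
uint64_t model_extract (vreg_model src, int index, bool big_endian)
{
  int n = big_endian ? index : 3 - index;
  return model_mfvsrwz (model_vspltw (src, n));
}
```

Because the modelled mfvsrwz result already has its upper 32 bits clear, a
following zero extension (the rldicl that the new tests scan for) would
have nothing left to do.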