From patchwork Fri Feb 3 21:36:18 2023
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 52659
Date: Fri, 3 Feb 2023 16:36:18 -0500
From: Michael Meissner
To: Michael Meissner, gcc-patches@gcc.gnu.org, Segher Boessenkool,
 "Kewen.Lin", David Edelsohn, Peter Bergner, Will Schmidt
Subject: [PATCH 7/8] Support load/store vector with right length.

This patch adds support for new instructions that may be added to the PowerPC
architecture in the future to enhance the load and store vector with length
instructions.

The current instructions (lxvl, lxvll, stxvl, and stxvll) are inconvenient to
use because the byte count must be placed in the top 8 bits of the GPR instead
of the bottom 8 bits.  This means that code generating these instructions
typically has to shift the count left by 56 bits to get it into the right
position.

In a future version of the PowerPC architecture, new variants of these
instructions might be added that expect the count to be in the bottom 8 bits
of the GPR.  These patches add that support to GCC when the user compiles with
the -mcpu=future option.

I discovered that the code in rs6000-string.cc that generates the ISA 3.1
lxvl/stxvl and future lxvll/stxvll instructions would generate these
instructions on 32-bit.  However, the patterns for these instructions are only
defined on 64-bit systems, so I added a check for 64-bit support before
generating the instructions.

I tested this patch on a little endian power10 system with long double using
the traditional IBM double-double format.  Assuming the other 6 patches for
-mcpu=future are checked in (or at least the first patch), can I check this
patch into the master branch for GCC 13?

Note, I will be on vacation from Tuesday February 7th through Tuesday
February 14th.
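To illustrate the inconvenience described above, here is a small example that
is not part of the patch (the function name copy_tail is made up for
illustration) using the existing vec_xl_len/vec_xst_len built-ins from
<altivec.h>:

#include <altivec.h>
#include <stddef.h>

/* Partial-vector copy of n bytes (n is assumed to be less than 16 here).
   The "load/store vector with length" built-ins take the length in a GPR.  */
void
copy_tail (unsigned char *dst, unsigned char *src, size_t n)
{
  vector unsigned char v = vec_xl_len (src, n);
  vec_xst_len (v, dst, n);
}

With -mcpu=power10 -O2 today, each built-in is expected to expand to an sldi
by 56 followed by lxvl or stxvl; with this patch series and -mcpu=future, the
shift should instead be folded into the proposed lxvrl and stxvrl
instructions.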
2023-02-03   Michael Meissner

gcc/

	* config/rs6000/rs6000-string.cc (expand_block_move): Do not
	generate lxvl and stxvl on 32-bit.
	* config/rs6000/vsx.md (lxvl): If -mcpu=future, generate the lxvl
	with the shift count automatically used in the insn.
	(lxvrl): New insn for -mcpu=future.
	(lxvrll): Likewise.
	(stxvl): If -mcpu=future, generate the stxvl with the shift count
	automatically used in the insn.
	(stxvrl): New insn for -mcpu=future.
	(stxvrll): Likewise.

gcc/testsuite/

	* gcc.target/powerpc/lxvrl.c: New test.
	* lib/target-supports.exp (check_effective_target_powerpc_future_ok):
	New effective target.
---
 gcc/config/rs6000/rs6000-string.cc       |   1 +
 gcc/config/rs6000/vsx.md                 | 122 +++++++++++++++++++----
 gcc/testsuite/gcc.target/powerpc/lxvrl.c |  32 ++++++
 gcc/testsuite/lib/target-supports.exp    |  16 ++-
 4 files changed, 148 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/lxvrl.c

diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 75e6f8803a5..9b2f1b83b22 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -2811,6 +2811,7 @@ expand_block_move (rtx operands[], bool might_overlap)
       gen_func.mov = gen_vsx_movv2di_64bit;
     }
   else if (TARGET_BLOCK_OPS_UNALIGNED_VSX
+	   && TARGET_POWERPC64
	   && TARGET_POWER10 && bytes < 16
	   && orig_bytes > 16
	   && !(bytes == 1 || bytes == 2
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..1ab8dc373c0 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5582,20 +5582,32 @@ (define_expand "first_mismatch_or_eos_index_<mode>"
   DONE;
 })

-;; Load VSX Vector with Length
+;; Load VSX Vector with Length.  If we have lxvrl, we don't have to do an
+;; explicit shift left into a pseudo.
 (define_expand "lxvl"
-  [(set (match_dup 3)
-	(ashift:DI (match_operand:DI 2 "register_operand")
-		   (const_int 56)))
-   (set (match_operand:V16QI 0 "vsx_register_operand")
-	(unspec:V16QI
-	 [(match_operand:DI 1 "gpc_reg_operand")
-	  (mem:V16QI (match_dup 1))
-	  (match_dup 3)]
-	 UNSPEC_LXVL))]
+  [(use (match_operand:V16QI 0 "vsx_register_operand"))
+   (use (match_operand:DI 1 "gpc_reg_operand"))
+   (use (match_operand:DI 2 "gpc_reg_operand"))]
   "TARGET_P9_VECTOR && TARGET_64BIT"
 {
-  operands[3] = gen_reg_rtx (DImode);
+  rtx shift_len = gen_rtx_ASHIFT (DImode, operands[2], GEN_INT (56));
+  rtx len;
+
+  if (TARGET_FUTURE)
+    len = shift_len;
+  else
+    {
+      len = gen_reg_rtx (DImode);
+      emit_insn (gen_rtx_SET (len, shift_len));
+    }
+
+  rtx dest = operands[0];
+  rtx addr = operands[1];
+  rtx mem = gen_rtx_MEM (V16QImode, addr);
+  rtvec rv = gen_rtvec (3, addr, mem, len);
+  rtx lxvl = gen_rtx_UNSPEC (V16QImode, rv, UNSPEC_LXVL);
+  emit_insn (gen_rtx_SET (dest, lxvl));
+  DONE;
 })

 (define_insn "*lxvl"
@@ -5619,6 +5631,34 @@ (define_insn "lxvll"
   "lxvll %x0,%1,%2"
   [(set_attr "type" "vecload")])

+;; For lxvrl and lxvrll, use the combiner to eliminate the shift.  The
+;; define_expand for lxvl will already incorporate the shift in generating the
+;; insn.  The lxvll built-in function required the user to have already done
+;; the shift.  Defining lxvrll this way will optimize cases where the user has
+;; done the shift immediately before the built-in.
+(define_insn "*lxvrl" + [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa") + (unspec:V16QI + [(match_operand:DI 1 "gpc_reg_operand" "b") + (mem:V16QI (match_dup 1)) + (ashift:DI (match_operand:DI 2 "register_operand" "r") + (const_int 56))] + UNSPEC_LXVL))] + "TARGET_FUTURE && TARGET_64BIT" + "lxvrl %x0,%1,%2" + [(set_attr "type" "vecload")]) + +(define_insn "*lxvrll" + [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa") + (unspec:V16QI [(match_operand:DI 1 "gpc_reg_operand" "b") + (mem:V16QI (match_dup 1)) + (ashift:DI (match_operand:DI 2 "register_operand" "r") + (const_int 56))] + UNSPEC_LXVLL))] + "TARGET_FUTURE" + "lxvrll %x0,%1,%2" + [(set_attr "type" "vecload")]) + ;; Expand for builtin xl_len_r (define_expand "xl_len_r" [(match_operand:V16QI 0 "vsx_register_operand") @@ -5650,18 +5690,29 @@ (define_insn "stxvll" ;; Store VSX Vector with Length (define_expand "stxvl" - [(set (match_dup 3) - (ashift:DI (match_operand:DI 2 "register_operand") - (const_int 56))) - (set (mem:V16QI (match_operand:DI 1 "gpc_reg_operand")) - (unspec:V16QI - [(match_operand:V16QI 0 "vsx_register_operand") - (mem:V16QI (match_dup 1)) - (match_dup 3)] - UNSPEC_STXVL))] + [(use (match_operand:V16QI 0 "vsx_register_operand")) + (use (match_operand:DI 1 "gpc_reg_operand")) + (use (match_operand:DI 2 "gpc_reg_operand"))] "TARGET_P9_VECTOR && TARGET_64BIT" { - operands[3] = gen_reg_rtx (DImode); + rtx shift_len = gen_rtx_ASHIFT (DImode, operands[2], GEN_INT (56)); + rtx len; + + if (TARGET_FUTURE) + len = shift_len; + else + { + len = gen_reg_rtx (DImode); + emit_insn (gen_rtx_SET (len, shift_len)); + } + + rtx src = operands[0]; + rtx addr = operands[1]; + rtx mem = gen_rtx_MEM (V16QImode, addr); + rtvec rv = gen_rtvec (3, src, mem, len); + rtx stxvl = gen_rtx_UNSPEC (V16QImode, rv, UNSPEC_STXVL); + emit_insn (gen_rtx_SET (mem, stxvl)); + DONE; }) ;; Define optab for vector access with length vectorization exploitation. @@ -5705,6 +5756,35 @@ (define_insn "*stxvl" "stxvl %x0,%1,%2" [(set_attr "type" "vecstore")]) +;; For stxvrl and stxvrll, use the combiner to eliminate the shift. The +;; define_expand for stxvl will already incorporate the shift in generating the +;; insn. The stxvll buitl-in function required the user to have already done +;; the shift. Defining stxvrll this way, will optimize cases where the user +;; has done the shift immediately before the built-in. 
+ +(define_insn "*stxvrl" + [(set (mem:V16QI (match_operand:DI 1 "gpc_reg_operand" "b")) + (unspec:V16QI + [(match_operand:V16QI 0 "vsx_register_operand" "wa") + (mem:V16QI (match_dup 1)) + (ashift:DI (match_operand:DI 2 "register_operand" "r") + (const_int 56))] + UNSPEC_STXVL))] + "TARGET_FUTURE && TARGET_64BIT" + "stxvrl %x0,%1,%2" + [(set_attr "type" "vecstore")]) + +(define_insn "*stxvrll" + [(set (mem:V16QI (match_operand:DI 1 "gpc_reg_operand" "b")) + (unspec:V16QI [(match_operand:V16QI 0 "vsx_register_operand" "wa") + (mem:V16QI (match_dup 1)) + (ashift:DI (match_operand:DI 2 "register_operand" "r") + (const_int 56))] + UNSPEC_STXVLL))] + "TARGET_FUTURE" + "stxvrll %x0,%1,%2" + [(set_attr "type" "vecstore")]) + ;; Expand for builtin xst_len_r (define_expand "xst_len_r" [(match_operand:V16QI 0 "vsx_register_operand" "=wa") diff --git a/gcc/testsuite/gcc.target/powerpc/lxvrl.c b/gcc/testsuite/gcc.target/powerpc/lxvrl.c new file mode 100644 index 00000000000..71854c50c91 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/lxvrl.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_future_ok } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-mdejagnu-cpu=future -O2" } */ + +/* Test whether the lxvrl and stxvrl instructions are generated for + -mcpu=future on memory copy operations. */ + +#ifndef VSIZE +#define VSIZE 2 +#endif + +#ifndef LSIZE +#define LSIZE 5 +#endif + +struct foo { + vector unsigned char vc[VSIZE]; + unsigned char leftover[LSIZE]; +}; + +void memcpy_ptr (struct foo *p, struct foo *q) +{ + __builtin_memcpy ((void *) p, /* lxvrl and stxvrl. */ + (void *) q, + (sizeof (vector unsigned char) * VSIZE) + LSIZE); +} + +/* { dg-final { scan-assembler {\mlxvrl\M} } } */ +/* { dg-final { scan-assembler {\mstxvrl\M} } } */ +/* { dg-final { scan-assembler-not {\mlxvl\M} } } */ +/* { dg-final { scan-assembler-not {\mstxvl\M} } } */ diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 9586ed3ae47..47adf407f83 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -6581,8 +6581,8 @@ proc check_effective_target_power10_ok { } { } } -# Return 1 if this is a PowerPC target supporting -mcpu=future or -mdense-math -# which enables the dense math operations. +# Return 1 if this is a PowerPC target supporting -mcpu=future which enables +# the dense math operations. proc check_effective_target_powerpc_dense_math_ok { } { return [check_no_compiler_messages_nocache powerpc_dense_math_ok assembly { __vector_quad vq; @@ -6600,6 +6600,18 @@ proc check_effective_target_powerpc_dense_math_ok { } { } "-mcpu=future"] } +# Return 1 if this is a PowerPC target supporting -mcpu=future which enables +# the saturating subtract instruction. +proc check_effective_target_powerpc_future_ok { } { + return [check_no_compiler_messages powerpc_future_ok object { + #ifndef _ARCH_PWR_FUTURE + #error "not -mcpu=future" + #else + int dummy; + #endif + } "-mcpu=future"] +} + # Return 1 if this is a PowerPC target supporting -mfloat128 via either # software emulation on power7/power8 systems or hardware support on power9.