From patchwork Mon Nov 6 09:47:53 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: HAO CHEN GUI
X-Patchwork-Id: 161912
Message-ID: <201dd572-e1fc-48c4-bd18-2f894ce31cb0@linux.ibm.com>
Date: Mon, 6 Nov 2023 17:47:53 +0800
To: gcc-patches
Cc: Segher Boessenkool, David, "Kewen.Lin", Peter Bergner
From: HAO CHEN GUI
Subject: [PATCH-3v2, rs6000] Enable 16-byte by pieces move [PR111449]

Hi,
  Patch 2 enables 16-byte by pieces move on rs6000. This patch fixes the
regression cases caused by that patch.

  For sra-17/18, the array of 4 longs can be loaded by one 16-byte by
pieces move on 32-bit platforms. So the array is no longer constructed in
LC0 and the SRA optimization does not take place. The "-mno-vsx" option is
added for 32-bit platforms, as it sets MOVE_MAX_PIECES to 4 bytes there,
so the array can't be loaded by one by pieces move.

  Another regression is on P8 LE. The 16-byte memory to memory move is
implemented by a TImode load and store. The TImode load/store is finally
split into two pairs of DImode load/store on P8 LE, as it doesn't have
unaligned vector load/store instructions. Actually, a 16-byte memory to
memory move can be implemented by one pair of V2DI reversed load/store on
P8 LE. The patch creates an insn_and_split pattern for this optimization.

  Compared to the previous version, it fixes the syntax errors in the
test cases.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable 16-byte by pieces move

This patch enables 16-byte by pieces move. The 16-byte move is generated
with TImode and finally implemented by vector instructions. There are
several regression cases after the enablement.

A 16-byte TImode memory to memory move was originally implemented by two
pairs of DImode load/store on P8 LE, as there is no unaligned VSX
load/store on it. The patch fixes the problem by creating an
insn_and_split pattern that converts it to one pair of reversed
load/store.

Two SRA cases lost the SRA optimization on 32-bit platforms, as the array
can be loaded by one 16-byte move and so is no longer initialized in LC0.
Fix them by adding the -mno-vsx option.

gcc/
	PR target/111449
	* config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New.

gcc/testsuite/
	PR target/111449
	* gcc.dg/tree-ssa/sra-17.c: Add no-vsx option for powerpc ilp32.
	* gcc.dg/tree-ssa/sra-18.c: Likewise.
	* gcc.target/powerpc/pr111449-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f3b40229094..9f6bc49998a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -414,6 +414,27 @@ (define_mode_attr VM3_char [(V2DI "d")
 
 ;; VSX moves
 
+;; TImode memory to memory move optimization on LE with p8vector
+(define_insn_and_split "*vsx_le_mem_to_mem_mov_ti"
+  [(set (match_operand:TI 0 "indexed_or_indirect_operand" "=Z")
+	(match_operand:TI 1 "indexed_or_indirect_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
+   && !MEM_VOLATILE_P (operands[0])
+   && !MEM_VOLATILE_P (operands[1])
+   && !reload_completed"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp = gen_reg_rtx (V2DImode);
+  rtx src = adjust_address (operands[1], V2DImode, 0);
+  emit_insn (gen_vsx_ld_elemrev_v2di (tmp, src));
+  rtx dest = adjust_address (operands[0], V2DImode, 0);
+  emit_insn (gen_vsx_st_elemrev_v2di (dest, tmp));
+  DONE;
+}
+  [(set_attr "length" "16")])
+
 ;; The patterns for LE permuted loads and stores come before the general
 ;; VSX moves so they match first.
 (define_insn_and_split "*vsx_le_perm_load_"
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
index 221d96b6cd9..b0d4811e77b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
 /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
+/* { dg-additional-options "-mno-vsx" { target { powerpc*-*-* && ilp32 } } } */
 
 extern void abort (void);
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
index f5e6a21c2ae..2cdeae6e9e7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
 /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
+/* { dg-additional-options "-mno-vsx" { target { powerpc*-*-* && ilp32 } } } */
 
 extern void abort (void);
 struct foo { long x; };
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-2.c b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
new file mode 100644
index 00000000000..7003bdc0208
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { has_arch_pwr8 } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mvsx -O2" } */
+
+/* Ensure 16-byte by pieces move is enabled.  */
+
+void move1 (void *s1, void *s2)
+{
+  __builtin_memcpy (s1, s2, 16);
+}
+
+void move2 (void *s1)
+{
+  __builtin_memcpy (s1, "0123456789012345", 16);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mp?lxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxv\M} 2 } } */
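
For reference only (not part of the patch): a minimal C sketch of why one
pair of element-reversed V2DI load/store is enough for a 16-byte memory to
memory move on P8 LE. It assumes a simplified model in which the reversed
load places the two doublewords in swapped order in the register and the
reversed store applies the same swap again, so the two swaps cancel and no
extra permute is needed. The names vreg, load_elemrev, store_elemrev and
move16 are hypothetical and only illustrate the idea; they are not GCC or
hardware interfaces.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register as two doublewords.  */
typedef struct { uint64_t dw[2]; } vreg;

/* Assumed model of an element-reversed load: the doubleword at the lower
   address ends up in element 1, the one at the higher address in
   element 0.  */
static vreg load_elemrev (const unsigned char *p)
{
  vreg r;
  memcpy (&r.dw[1], p, 8);
  memcpy (&r.dw[0], p + 8, 8);
  return r;
}

/* Assumed model of an element-reversed store: the same swap as the load,
   applied on the way back to memory.  */
static void store_elemrev (unsigned char *p, vreg r)
{
  memcpy (p, &r.dw[1], 8);
  memcpy (p + 8, &r.dw[0], 8);
}

/* 16-byte memory to memory move as one reversed load plus one reversed
   store.  The register content is swapped, but memory is not.  */
static void move16 (void *dst, const void *src)
{
  store_elemrev ((unsigned char *) dst,
		 load_elemrev ((const unsigned char *) src));
}

/* Return 1 if move16 produces the same bytes as a plain 16-byte copy.  */
static int move16_matches_memcpy (void)
{
  unsigned char src[16], dst[16], ref[16];
  for (int i = 0; i < 16; i++)
    src[i] = (unsigned char) (i + 1);
  memset (dst, 0, sizeof dst);
  memcpy (ref, src, sizeof ref);
  move16 (dst, src);
  assert (sizeof (vreg) == 16);
  return memcmp (dst, ref, sizeof ref) == 0;
}
```

In this model the value held in the temporary register is never inspected,
which is why the swapped element order is harmless for a pure memory to
memory move.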