From patchwork Mon Nov 6 02:38:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: HAO CHEN GUI
X-Patchwork-Id: 161807
Message-ID: <35c10f52-facc-4da5-b3f9-d9a59dab424b@linux.ibm.com>
Date: Mon, 6 Nov 2023 10:38:57 +0800
To: gcc-patches
Cc: Segher Boessenkool, David, "Kewen.Lin", Peter Bergner
From: HAO CHEN GUI
Subject: [PATCH-3, rs6000] Enable 16-byte by pieces move [PR111449]

Hi,
  Patch 2 enables 16-byte by pieces move on rs6000. This patch fixes the
regressions caused by that patch. For sra-17/18, the long array with 4
elements can be loaded with a single 16-byte by pieces move on 32-bit
platforms, so the array is no longer constructed in LC0 and the SRA
optimization is not performed. The "-mno-vsx" option is added for 32-bit
platforms: it sets MOVE_MAX_PIECES to 4 bytes there, so the array can no
longer be loaded with a single by pieces move.

  The other regression is on P8 LE. The 16-byte memory to memory move is
implemented by a TImode load/store pair, which is finally split into two
DImode load/store pairs on P8 LE because it has no unaligned vector
load/store instructions. In fact, a 16-byte memory to memory move can be
implemented by one pair of V2DI reversed load/store on P8 LE. The patch
creates an insn_and_split pattern for this optimization.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Enable 16-byte by pieces move

This patch enables 16-byte by pieces move.
The 16-byte move is generated with TImode and finally implemented by
vector instructions. There are several regressions after the enablement.
A 16-byte TImode memory to memory move was originally implemented by two
pairs of DImode load/store on P8 LE, as it has no unaligned VSX
load/store. The patch fixes the problem by creating an insn_and_split
pattern that converts the move to one pair of reversed load/store. Two
SRA cases lost the SRA optimization because the array can be loaded by
one 16-byte move and so is no longer initialized in LC0 on 32-bit
platforms. Fix them by adding the no-vsx option.

gcc/
	PR target/111449
	* config/rs6000/vsx.md (*vsx_le_mem_to_mem_mov_ti): New.

gcc/testsuite/
	PR target/111449
	* gcc.dg/tree-ssa/sra-17.c: Add no-vsx option for powerpc ilp32.
	* gcc.dg/tree-ssa/sra-18.c: Likewise.
	* gcc.target/powerpc/pr111449-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f3b40229094..9f6bc49998a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -414,6 +414,27 @@ (define_mode_attr VM3_char [(V2DI "d")
 
 ;; VSX moves
 
+;; TImode memory to memory move optimization on LE with p8vector
+(define_insn_and_split "*vsx_le_mem_to_mem_mov_ti"
+  [(set (match_operand:TI 0 "indexed_or_indirect_operand" "=Z")
+	(match_operand:TI 1 "indexed_or_indirect_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR
+   && !MEM_VOLATILE_P (operands[0])
+   && !MEM_VOLATILE_P (operands[1])
+   && !reload_completed"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp = gen_reg_rtx (V2DImode);
+  rtx src = adjust_address (operands[1], V2DImode, 0);
+  emit_insn (gen_vsx_ld_elemrev_v2di (tmp, src));
+  rtx dest = adjust_address (operands[0], V2DImode, 0);
+  emit_insn (gen_vsx_st_elemrev_v2di (dest, tmp));
+  DONE;
+}
+  [(set_attr "length" "16")])
+
 ;; The patterns for LE permuted loads and stores come before the general
 ;; VSX moves so they match first.
 (define_insn_and_split "*vsx_le_perm_load_<mode>"
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
index 221d96b6cd9..36d72c9256b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-17.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
 /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
+/* { dg-additional-options "-mno-vsx" { target powerpc*-*-* && ilp32 } } */
 
 extern void abort (void);
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
index f5e6a21c2ae..3682a9a8c29 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-18.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target { aarch64*-*-* alpha*-*-* arm*-*-* hppa*-*-* powerpc*-*-* s390*-*-* } } } */
 /* { dg-options "-O2 -fdump-tree-esra --param sra-max-scalarization-size-Ospeed=32" } */
 /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
+/* { dg-additional-options "-mno-vsx" { target powerpc*-*-* && ilp32 } } */
 
 extern void abort (void);
 struct foo { long x; };
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449-2.c b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
new file mode 100644
index 00000000000..7003bdc0208
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { has_arch_pwr8 } } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mvsx -O2" } */
+
+/* Ensure 16-byte by pieces move is enabled.  */
+
+void move1 (void *s1, void *s2)
+{
+  __builtin_memcpy (s1, s2, 16);
+}
+
+void move2 (void *s1)
+{
+  __builtin_memcpy (s1, "0123456789012345", 16);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mp?lxv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxv\M} 2 } } */
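
For reviewers, a small host-side sketch of why one reversed load/store pair is enough for the memory to memory case. It is a simplified model, not the hardware semantics: it assumes the element-reversed load places the two doublewords in the register in swapped order and that the element-reversed store swaps them again on the way out, so the two swaps cancel and memory sees a plain 16-byte copy. The names v2di_model, load_elemrev, store_elemrev and move16 are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 16-byte vector register as two doublewords.  */
typedef struct { uint64_t dw[2]; } v2di_model;

/* Model of an element-reversed load: the two doublewords read from
   memory land in the register in swapped order.  */
static v2di_model
load_elemrev (const void *src)
{
  uint64_t mem[2];
  memcpy (mem, src, sizeof mem);
  v2di_model r = { { mem[1], mem[0] } };
  return r;
}

/* Model of an element-reversed store: the doublewords are swapped
   again before they are written to memory.  */
static void
store_elemrev (void *dest, v2di_model r)
{
  uint64_t mem[2] = { r.dw[1], r.dw[0] };
  memcpy (dest, mem, sizeof mem);
}

/* 16-byte memory to memory move built from one reversed load/store
   pair.  The register holds swapped data, but memory does not.  */
static void
move16 (void *dest, const void *src)
{
  store_elemrev (dest, load_elemrev (src));
}
```

Because the intermediate register value is never used for anything else, its element order does not matter here. That is the property the split relies on when it replaces the TImode move with the reversed pair instead of adding permutes.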