From patchwork Sun Sep 10 17:28:32 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Surya Kumari Jangala
X-Patchwork-Id: 137864
Message-ID: <0976f3cd-9e80-d7bf-ad9a-44e72452aebd@linux.vnet.ibm.com>
Date: Sun, 10 Sep 2023 22:58:32 +0530
To: Segher Boessenkool, GCC Patches
Cc: Peter Bergner
Subject: [PATCH v2] swap: Fix incorrect lane extraction by vec_extract() [PR106770]
List-Id: Gcc-patches mailing list
From: Surya Kumari Jangala

swap: Fix incorrect lane extraction by vec_extract() [PR106770]

In the routine rs6000_analyze_swaps(), special handling of swappable
instructions is done even if the webs that contain the swappable
instructions are not optimized, i.e., the webs do not contain any
permuting load/store instructions along with the associated register
swap instructions.
Doing special handling in such webs will result in the extracted lane
being adjusted unnecessarily for vec_extract.

Another issue is that existing code treats non-permuting loads/stores
as special swappables. Non-permuting loads/stores (that have not yet
been split into a permuting load/store and a swap) are handled by
converting them into a permuting load/store (which effectively removes
the swap). As a result, if special swappables are handled only in webs
containing permuting loads/stores, then non-optimal code is generated
for non-permuting loads/stores.

Hence, in this patch, all webs containing either permuting loads/stores
or non-permuting loads/stores are marked as requiring special handling
of swappables. Swaps associated with permuting loads/stores are marked
for removal, and non-permuting loads/stores are converted to permuting
loads/stores. Then the special swappables in the webs are fixed up.

Another issue with always handling swappable instructions is that it is
incorrect to do so in webs where loads/stores on quad word aligned
addresses are changed to lvx/stvx. Similarly, in webs where
swap(load(vector constant)) instructions are replaced with
load(swapped vector constant), the swappable instructions should not be
modified.

2023-09-10  Surya Kumari Jangala

gcc/
        PR rtl-optimization/PR106770
        * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New
        function.
        (handle_non_permuting_mem_insn): New function.
        (rs6000_analyze_swaps): Handle swappable instructions only in
        certain webs.
        (web_requires_special_handling): New instance variable.
        (handle_special_swappables): Remove handling of non-permuting
        load/store instructions.

gcc/testsuite/
        PR rtl-optimization/PR106770
        * gcc.target/powerpc/pr106770.c: New test.
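
The lane adjustment mentioned above can be illustrated with a small
stand-alone model. This is only a sketch: swapped_lane() is a
hypothetical helper, not a GCC function, and it assumes that the fix-up
moves an element half a vector away because the two doublewords of the
register trade places.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: models the lane fix-up applied
   to an extract once the doubleword swaps in a web have been removed.
   Element LANE of an NUNITS-element vector is then found half a vector
   away from its original position.  */
static unsigned
swapped_lane (unsigned lane, unsigned nunits)
{
  assert (lane < nunits);
  unsigned half = nunits / 2;
  return lane >= half ? lane - half : lane + half;
}
```

Under this model a vector long long (two elements) maps lane 1 to
lane 0, so applying the fix-up in a web where no swap was removed would
make vec_extract (vr, 1) read the other element.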
diff --git a/gcc/config/rs6000/rs6000-p8swap.cc b/gcc/config/rs6000/rs6000-p8swap.cc
index 0388b9bd736..3a695aa1318 100644
--- a/gcc/config/rs6000/rs6000-p8swap.cc
+++ b/gcc/config/rs6000/rs6000-p8swap.cc
@@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
   unsigned int special_handling : 4;
   /* Set if the web represented by this entry cannot be optimized.  */
   unsigned int web_not_optimizable : 1;
+  /* Set if the swappable insns in the web represented by this entry
+     have to be fixed. Swappable insns have to be fixed in :
+       - webs containing permuting loads/stores and the swap insns
+         in such webs have been marked for removal
+       - webs where non-permuting loads/stores have been converted
+         to permuting loads/stores  */
+  unsigned int web_requires_special_handling : 1;
   /* Set if this insn should be deleted.  */
   unsigned int will_delete : 1;
 };
@@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry *insn_entry, unsigned i)
       if (dump_file)
         fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
       break;
-    case SH_NOSWAP_LD:
-      /* Convert a non-permuting load to a permuting one.  */
-      permute_load (insn);
-      break;
-    case SH_NOSWAP_ST:
-      /* Convert a non-permuting store to a permuting one.  */
-      permute_store (insn);
-      break;
     case SH_EXTRACT:
       /* Change the lane on an extract operation.  */
       adjust_extract (insn);
@@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (function *fun)
   free (to_delete);
 }
 
+/* Return true if insn is a non-permuting load/store.  */
+static bool
+non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
+{
+  return (insn_entry[i].special_handling == SH_NOSWAP_LD ||
+          insn_entry[i].special_handling == SH_NOSWAP_ST);
+}
+
+/* Convert a non-permuting load/store insn to a permuting one.  */
+static void
+handle_non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
+{
+  rtx_insn *insn = insn_entry[i].insn;
+  if (insn_entry[i].special_handling == SH_NOSWAP_LD)
+    permute_load (insn);
+  else if (insn_entry[i].special_handling == SH_NOSWAP_ST)
+    permute_store (insn);
+}
+
 /* Main entry point for this pass.  */
 unsigned int
 rs6000_analyze_swaps (function *fun)
@@ -2624,25 +2642,56 @@ rs6000_analyze_swaps (function *fun)
       dump_swap_insn_table (insn_entry);
     }
 
-  /* For each load and store in an optimizable web (which implies
-     the loads and stores are permuting), find the associated
-     register swaps and mark them for removal.  Due to various
-     optimizations we may mark the same swap more than once.  Also
-     perform special handling for swappable insns that require it.  */
+  /* There are two kinds of optimizations that can be performed on an
+     optimizable web:
+     1. Remove the register swaps associated with permuting load/store
+     in an optimizable web
+     2. Convert the vanilla loads/stores (that have not yet been split
+     into a permuting load/store and a swap) into a permuting
+     load/store (which effectively removes the swap)
+     In both the cases, swappable instructions in the webs need
+     special handling to fix them up.  */
   for (i = 0; i < e; ++i)
+    /* For each permuting load/store in an optimizable web, find
+       the associated register swaps and mark them for removal.
+       Due to various optimizations we may mark the same swap more
+       than once.  */
     if ((insn_entry[i].is_load || insn_entry[i].is_store)
         && insn_entry[i].is_swap)
       {
         swap_web_entry* root_entry
           = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
         if (!root_entry->web_not_optimizable)
-          mark_swaps_for_removal (insn_entry, i);
+          {
+            mark_swaps_for_removal (insn_entry, i);
+            root_entry->web_requires_special_handling = true;
+          }
       }
-    else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
+    /* Convert the non-permuting loads/stores into a permuting
+       load/store.  */
+    else if (insn_entry[i].is_swappable
+             && non_permuting_mem_insn (insn_entry, i))
       {
         swap_web_entry* root_entry
           = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
         if (!root_entry->web_not_optimizable)
+          {
+            handle_non_permuting_mem_insn (insn_entry, i);
+            root_entry->web_requires_special_handling = true;
+          }
+      }
+
+  /* Perform special handling for swappable insns that require it.
+     Note that special handling should be done only for those
+     swappable insns that are present in webs marked as requiring
+     special handling.  */
+  for (i = 0; i < e; ++i)
+    if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
+        !non_permuting_mem_insn (insn_entry, i))
+      {
+        swap_web_entry* root_entry
+          = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
+        if (root_entry->web_requires_special_handling)
           handle_special_swappables (insn_entry, i);
       }
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c b/gcc/testsuite/gcc.target/powerpc/pr106770.c
new file mode 100644
index 00000000000..11efe39abc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2 " } */
+/* The 2 xxpermdi instructions are generated by the two
+   calls to vec_promote() */
+/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
+
+/* Test case to resolve PR106770 */
+
+#include <altivec.h>
+
+int cmp2(double a, double b)
+{
+    vector double va = vec_promote(a, 1);
+    vector double vb = vec_promote(b, 1);
+    vector long long vlt = (vector long long)vec_cmplt(va, vb);
+    vector long long vgt = (vector long long)vec_cmplt(vb, va);
+    vector signed long long vr = vec_sub(vlt, vgt);
+
+    return vec_extract(vr, 1);
+}
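
The two-pass structure that the patch gives rs6000_analyze_swaps() can
be modelled outside GCC roughly as follows. This is a simplified sketch
under stated assumptions, not the pass itself: struct model_entry,
enum model_kind and model_analyze_swaps() are hypothetical names, the
union-find lookup is reduced to a stored root index, and the actual
rewriting of insns is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for swap_web_entry.  The flag
   names mirror the patch, but this is not GCC code.  */
enum model_kind { PERMUTING_MEM, NON_PERMUTING_MEM, SPECIAL_SWAPPABLE, OTHER };

struct model_entry
{
  enum model_kind kind;
  size_t root;                         /* Index of the web's root entry.  */
  bool web_not_optimizable;            /* Meaningful on the root only.  */
  bool web_requires_special_handling;  /* Meaningful on the root only.  */
  bool fixed_up;                       /* Set when the insn would be fixed.  */
};

/* Pass 1 marks optimizable webs in which swaps are removed or vanilla
   loads/stores are converted.  Pass 2 fixes up special swappables only
   in webs marked by pass 1.  Returns the number of insns fixed up.  */
static size_t
model_analyze_swaps (struct model_entry *e, size_t n)
{
  size_t fixed = 0;

  for (size_t i = 0; i < n; ++i)
    {
      assert (e[i].root < n);
      if (e[i].kind == PERMUTING_MEM || e[i].kind == NON_PERMUTING_MEM)
        {
          struct model_entry *root = &e[e[i].root];
          if (!root->web_not_optimizable)
            root->web_requires_special_handling = true;
        }
    }

  for (size_t i = 0; i < n; ++i)
    if (e[i].kind == SPECIAL_SWAPPABLE
        && e[e[i].root].web_requires_special_handling)
      {
        e[i].fixed_up = true;
        ++fixed;
      }

  return fixed;
}
```

In this model a special swappable (for example an extract) whose web
contains no permuting or non-permuting load/store is left untouched,
which is the behaviour the patch aims for in the PR106770 test case.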