From patchwork Thu Dec 21 01:37:52 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: HAO CHEN GUI
X-Patchwork-Id: 181940
Message-ID: <138fd05a-75c8-4a4b-b358-9633e087da20@linux.ibm.com>
Date: Thu, 21 Dec 2023 09:37:52 +0800
User-Agent: Mozilla Thunderbird
Content-Language: en-US
To: gcc-patches
Cc: Segher Boessenkool, David, "Kewen.Lin", Peter Bergner
From: HAO CHEN GUI
Subject: [Patchv3, rs6000] Clean up pre-checkings of expand_block_compare
List-Id: Gcc-patches mailing list

Hi,
  This patch cleans up the pre-checks of expand_block_compare. It does
the following:
1. Asserts that only P7 and above can enter this function, as that is
already guarded by the expander.
2. Removes the P7 processor test, as only P7 and above can enter this
function and P7 LE is excluded by targetm.slow_unaligned_access. On P7 BE,
the expansion performs better than the library call when the length is
long.

  Compared to the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640833.html
the main changes are to split the optimization for size into a separate
patch and to add a testcase for P7 BE.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up the pre-checkings of expand_block_compare

Remove P7 CPU test as only P7 above can enter this function and P7 LE is
excluded by the checking of targetm.slow_unaligned_access on word_mode.
Also performance test shows the expand of block compare is better than
library on P7 BE when the length is from 16 bytes to 64 bytes.

gcc/
	* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
	only P7 above can enter this function.  Remove P7 CPU test and let
	P7 BE do the expand.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-4.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 5149273b80e..09db57255fa 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1947,15 +1947,12 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
+  /* TARGET_POPCNTD is already guarded at expand cmpmemsi.  */
+  gcc_assert (TARGET_POPCNTD);
+
   if (optimize_insn_for_size_p ())
     return false;
 
-  rtx target = operands[0];
-  rtx orig_src1 = operands[1];
-  rtx orig_src2 = operands[2];
-  rtx bytes_rtx = operands[3];
-  rtx align_rtx = operands[4];
-
   /* This case is complicated to handle because the subtract
      with carry instructions do not generate the 64-bit
      carry and so we must emit code to calculate it ourselves.
@@ -1963,23 +1960,19 @@ expand_block_compare (rtx operands[])
   if (TARGET_32BIT && TARGET_POWERPC64)
     return false;
 
-  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
-
   /* Allow this param to shut off all expansion.  */
   if (rs6000_block_compare_inline_limit == 0)
     return false;
 
-  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
-     However slow_unaligned_access returns true on P7 even though the
-     performance of this code is good there.  */
-  if (!isP7
-      && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
-          || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2))))
-    return false;
+  rtx target = operands[0];
+  rtx orig_src1 = operands[1];
+  rtx orig_src2 = operands[2];
+  rtx bytes_rtx = operands[3];
+  rtx align_rtx = operands[4];
 
-  /* Unaligned l*brx traps on P7 so don't do this.  However this should
-     not affect much because LE isn't really supported on P7 anyway.  */
-  if (isP7 && !BYTES_BIG_ENDIAN)
+  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
+  if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
+      || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
     return false;
 
   /* If this is not a fixed size compare, try generating loop code and
@@ -2027,14 +2020,6 @@ expand_block_compare (rtx operands[])
   if (!IN_RANGE (bytes, 1, max_bytes))
     return expand_compare_loop (operands);
 
-  /* The code generated for p7 and older is not faster than glibc
-     memcmp if alignment is small and length is not short, so bail
-     out to avoid those conditions.  */
-  if (targetm.slow_unaligned_access (word_mode, base_align * BITS_PER_UNIT)
-      && ((base_align == 1 && bytes > 16)
-          || (base_align == 2 && bytes > 32)))
-    return false;
-
   rtx final_label = NULL;
 
   if (use_vec)
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
new file mode 100644
index 00000000000..c86febae68a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target be } } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+/* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } } */
+
+/* Test that it does expand for memcmpsi instead of calling library on
+   P7 BE when length is less than 32 bytes.  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 31);
+}
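
For illustration only (not part of the patch): block-cmp-4.c is a compile-only
test that scans the assembly for a call to memcmp. A runtime companion could
look roughly like the sketch below, which checks that a constant 31-byte
__builtin_memcmp agrees in sign with a byte-wise reference at several source
alignments. The helper names sign_of, ref_memcmp, cmp31 and check_cmp31 are
hypothetical, and whether the compare is expanded inline still depends on the
target and options used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reduce a memcmp-style result to -1, 0 or 1.  */
static int
sign_of (int v)
{
  return (v > 0) - (v < 0);
}

/* Hypothetical reference: compare byte by byte as unsigned char,
   which is how the C standard defines the result of memcmp.  */
static int
ref_memcmp (const void *a, const void *b, size_t n)
{
  const unsigned char *p = a;
  const unsigned char *q = b;
  for (size_t i = 0; i < n; i++)
    if (p[i] != q[i])
      return p[i] < q[i] ? -1 : 1;
  return 0;
}

/* Same shape as foo in block-cmp-4.c: a constant 31-byte length.
   noinline keeps the compiler from folding the compare away.  */
__attribute__ ((noinline)) static int
cmp31 (const char *s1, const char *s2)
{
  return __builtin_memcmp (s1, s2, 31);
}

/* Return the number of cases where cmp31 disagrees in sign with the
   reference.  The differing byte moves through all 31 positions
   (pos == 31 means the buffers are equal) and the start of both
   buffers is shifted by 0..7 bytes to vary the alignment.  */
static int
check_cmp31 (void)
{
  char a[40], b[40];
  int failures = 0;

  for (size_t off = 0; off < 8; off++)
    for (size_t pos = 0; pos <= 31; pos++)
      {
        assert (off + 31 <= sizeof a);
        memset (a, 0x55, sizeof a);
        memset (b, 0x55, sizeof b);
        if (pos < 31)
          {
            /* 0x10 < 0xf0 only when compared as unsigned char.  */
            a[off + pos] = (char) 0x10;
            b[off + pos] = (char) 0xf0;
          }
        if (sign_of (cmp31 (a + off, b + off))
            != ref_memcmp (a + off, b + off, 31))
          failures++;
        if (sign_of (cmp31 (b + off, a + off))
            != ref_memcmp (b + off, a + off, 31))
          failures++;
      }
  return failures;
}
```

A caller would treat a nonzero return from check_cmp31 () as a failure. The
bytes 0x10 and 0xf0 are chosen so that a signed-char comparison would give the
wrong sign, which is the kind of mistake an inline expansion could make.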