From: HAO CHEN GUI
Date: Mon, 26 Feb 2024 10:25:35 +0800
To: gcc-patches
Cc: Segher Boessenkool, David, "Kewen.Lin", Peter Bergner
Subject: [Patch, rs6000] Enable overlap memory store for block memory clear
Hi,

This patch enables overlapping memory stores for block memory clear, which
reduces the number of store instructions.  The expander calls
widest_fixed_size_mode_for_block_clear to get the mode for the looped
stores and calls smallest_fixed_size_mode_for_block_clear to get the mode
for the last, overlapping store.

Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions.  Is it OK for the trunk or next stage 1?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable overlap memory store for block memory clear

gcc/
	* config/rs6000/rs6000-string.cc
	(widest_fixed_size_mode_for_block_clear): New.
	(smallest_fixed_size_mode_for_block_clear): New.
	(expand_block_clear): Call widest_fixed_size_mode_for_block_clear to
	get the mode for looped memory stores and call
	smallest_fixed_size_mode_for_block_clear to get the mode for the last
	overlapped memory store.

gcc/testsuite/
	* gcc.target/powerpc/block-clear-1.c: New.
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 133e5382af2..c2a6095a586 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -38,6 +38,49 @@
 #include "profile-count.h"
 #include "predict.h"
 
+/* Return the widest mode whose size is less than or equal to SIZE and
+   which can be stored given ALIGN and UNALIGNED_VSX_OK.  */
+static fixed_size_mode
+widest_fixed_size_mode_for_block_clear (unsigned int size, unsigned int align,
+                                        bool unaligned_vsx_ok)
+{
+  machine_mode mode;
+
+  if (TARGET_ALTIVEC
+      && size >= 16
+      && (align >= 128
+          || unaligned_vsx_ok))
+    mode = V4SImode;
+  else if (size >= 8
+           && TARGET_POWERPC64
+           && (align >= 64
+               || !STRICT_ALIGNMENT))
+    mode = DImode;
+  else if (size >= 4
+           && (align >= 32
+               || !STRICT_ALIGNMENT))
+    mode = SImode;
+  else if (size >= 2
+           && (align >= 16
+               || !STRICT_ALIGNMENT))
+    mode = HImode;
+  else
+    mode = QImode;
+
+  return as_a <fixed_size_mode> (mode);
+}
+
+/* Return the smallest mode whose size is greater than or equal to
+   SIZE.  */
+static fixed_size_mode
+smallest_fixed_size_mode_for_block_clear (unsigned int size)
+{
+  if (size > UNITS_PER_WORD)
+    return as_a <fixed_size_mode> (V4SImode);
+
+  return smallest_int_mode_for_size (size * BITS_PER_UNIT);
+}
+
 /* Expand a block clear operation, and return 1 if successful.  Return 0
    if we should let the compiler generate normal code.
 
@@ -55,7 +98,6 @@ expand_block_clear (rtx operands[])
   HOST_WIDE_INT align;
   HOST_WIDE_INT bytes;
   int offset;
-  int clear_bytes;
   int clear_step;
 
   /* If this is not a fixed size move, just call memcpy */
@@ -89,62 +131,36 @@ expand_block_clear (rtx operands[])
 
   bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);
 
-  for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
+  auto mode = widest_fixed_size_mode_for_block_clear (bytes, align,
+                                                      unaligned_vsx_ok);
+  offset = 0;
+  rtx dest;
+
+  do
     {
-      machine_mode mode = BLKmode;
-      rtx dest;
+      unsigned int size = GET_MODE_SIZE (mode);
 
-      if (TARGET_ALTIVEC
-          && (bytes >= 16 && (align >= 128 || unaligned_vsx_ok)))
+      while (bytes >= size)
         {
-          clear_bytes = 16;
-          mode = V4SImode;
-        }
-      else if (bytes >= 8 && TARGET_POWERPC64
-               && (align >= 64 || !STRICT_ALIGNMENT))
-        {
-          clear_bytes = 8;
-          mode = DImode;
-          if (offset == 0 && align < 64)
-            {
-              rtx addr;
+          dest = adjust_address (orig_dest, mode, offset);
+          emit_move_insn (dest, CONST0_RTX (mode));
 
-              /* If the address form is reg+offset with offset not a
-                 multiple of four, reload into reg indirect form here
-                 rather than waiting for reload.  This way we get one
-                 reload, not one per store.  */
-              addr = XEXP (orig_dest, 0);
-              if ((GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
-                  && CONST_INT_P (XEXP (addr, 1))
-                  && (INTVAL (XEXP (addr, 1)) & 3) != 0)
-                {
-                  addr = copy_addr_to_reg (addr);
-                  orig_dest = replace_equiv_address (orig_dest, addr);
-                }
-            }
-        }
-      else if (bytes >= 4 && (align >= 32 || !STRICT_ALIGNMENT))
-        {                       /* move 4 bytes */
-          clear_bytes = 4;
-          mode = SImode;
-        }
-      else if (bytes >= 2 && (align >= 16 || !STRICT_ALIGNMENT))
-        {                       /* move 2 bytes */
-          clear_bytes = 2;
-          mode = HImode;
-        }
-      else /* move 1 byte at a time */
-        {
-          clear_bytes = 1;
-          mode = QImode;
+          offset += size;
+          bytes -= size;
         }
 
-      dest = adjust_address (orig_dest, mode, offset);
+      if (bytes == 0)
+        return 1;
 
-      emit_move_insn (dest, CONST0_RTX (mode));
+      mode = smallest_fixed_size_mode_for_block_clear (bytes);
+      int gap = GET_MODE_SIZE (mode) - bytes;
+      if (gap > 0)
+        {
+          offset -= gap;
+          bytes += gap;
+        }
     }
-
-  return 1;
+  while (1);
 }
 
 /* Figure out the correct instructions to generate to load data for
diff --git a/gcc/testsuite/gcc.target/powerpc/block-clear-1.c b/gcc/testsuite/gcc.target/powerpc/block-clear-1.c
new file mode 100644
index 00000000000..5e16c44fea3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-clear-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not {\mst[hb]\M} } } */
+
+/* Verify that block memory clear uses overlapping stores, not sth/stb.  */
+void foo (char *s1)
+{
+  __builtin_memset (s1, 0, 31);
+}