From patchwork Sun Feb 4 20:10:44 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Law
X-Patchwork-Id: 196568
Message-ID: <7b53cdea-e75e-4df5-a0ae-91f7c15fb0fa@ventanamicro.com>
Date: Sun, 4 Feb 2024 13:10:44 -0700
From: Jeff Law
To: "gcc-patches@gcc.gnu.org"
Subject: [committed] Reasonably handle SUBREGs in risc-v cost modeling
List-Id: Gcc-patches mailing list

This patch adjusts the costs so that we treat REG and SUBREG expressions the same for costing.
This was motivated by bt_skip_func and bt_find_func in xz and results in nearly a 5% improvement in the dynamic instruction count for input #2 and smaller, but definitely visible improvements pretty much across the board. Exceptions would be perlbench input #1 and exchange2 which showed small regressions.

In the bt_find_func and bt_skip_func cases we have something like this:

> (insn 10 7 11 2 (set (reg/v:DI 136 [ x ])
>         (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip}
>      (nil))
> (insn 11 10 12 2 (set (reg:DI 142 [ _1 ])
>         (plus:DI (reg/v:DI 136 [ x ])
>             (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3}
>      (nil))

[ ... ]

> (insn 13 12 14 2 (set (reg:DI 143 [ _2 ])
>         (plus:DI (reg/v:DI 136 [ x ])
>             (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3}
>      (nil))

Note the two uses of (reg 136). The best way to handle that in combine might be a 3->2 split. But there's a much better approach if we look at fwprop...

(set (reg:DI 142 [ _1 ])
    (plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))
        (reg/v:DI 139 [ b ])))
change not profitable (cost 4 -> cost 8)

So that should be the same cost as a regular DImode addition when the ZBA extension is enabled. But it ends up costing more because the clause to cost this variant isn't prepared to handle a SUBREG. That results in the RTL above having too high a cost and fwprop gives up.

One approach would be to replace the REG_P with REG_P || SUBREG_P in the costing code. I ultimately decided against that and instead check if the operand in question passes register_operand.

By far the most important case to handle is the DImode PLUS. But for the sake of consistency, I changed the other instances in riscv_rtx_costs as well. For those other cases we're talking about improvements in the .000001% range.

While we are into stage4, this just hits cost modeling which we've generally agreed is still appropriate for the RISC-V backend (though we were mostly talking about vector).
So I'm going to extend that general agreement ever so slightly and include scalar cost modeling :-)

Built and regression tested on rv64gc. Pushing to the trunk.

Shout out to Jivan who took the original somewhat vague report about bt_skip_func and boiled it down to a very simple testcase along with info on a couple attempted fixes that didn't work out.

Jeff

commit 777df37a12e55ecbc135efbed2749a8a8a756d4d
Author: Jeff Law
Date:   Sun Feb 4 13:01:50 2024 -0700

    [committed] Reasonably handle SUBREGs in risc-v cost modeling

    This patch adjusts the costs so that we treat REG and SUBREG expressions the
    same for costing.

    This was motivated by bt_skip_func and bt_find_func in xz and results in nearly
    a 5% improvement in the dynamic instruction count for input #2 and smaller, but
    definitely visible improvements pretty much across the board. Exceptions would
    be perlbench input #1 and exchange2 which showed very small regressions.

    In the bt_find_func and bt_skip_func cases we have something like this:

    > (insn 10 7 11 2 (set (reg/v:DI 136 [ x ])
    >         (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip}
    >      (nil))
    > (insn 11 10 12 2 (set (reg:DI 142 [ _1 ])
    >         (plus:DI (reg/v:DI 136 [ x ])
    >             (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3}
    >      (nil))

    [ ... ]

    > (insn 13 12 14 2 (set (reg:DI 143 [ _2 ])
    >         (plus:DI (reg/v:DI 136 [ x ])
    >             (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3}
    >      (nil))

    Note the two uses of (reg 136). The best way to handle that in combine might
    be a 3->2 split. But there's a much better approach if we look at fwprop...

    (set (reg:DI 142 [ _1 ])
        (plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))
            (reg/v:DI 139 [ b ])))
    change not profitable (cost 4 -> cost 8)

    So that should be the same cost as a regular DImode addition when the ZBA
    extension is enabled. But it ends up costing more because the clause to cost
    this variant isn't prepared to handle a SUBREG.
    That results in the RTL above having too high a cost and fwprop gives up.

    One approach would be to replace the REG_P with REG_P || SUBREG_P in the costing
    code. I ultimately decided against that and instead check if the operand in
    question passes register_operand.

    By far the most important case to handle is the DImode PLUS. But for the sake
    of consistency, I changed the other instances in riscv_rtx_costs as well. For
    those other cases we're talking about improvements in the .000001% range.

    While we are into stage4, this just hits cost modeling which we've generally
    agreed is still appropriate (though we were mostly talking about vector). So
    I'm going to extend that general agreement ever so slightly and include scalar
    cost modeling :-)

    gcc/
            * config/riscv/riscv.cc (riscv_rtx_costs): Handle SUBREG and REG
            similarly.

    gcc/testsuite/
            * gcc.target/riscv/reg_subreg_costs.c: New test.

        Co-authored-by: Jivan Hakobyan

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d7cdd7183c2..d6868a65b31 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3055,7 +3055,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
     case SET:
       /* If we are called for an INSN that's a simple set of a register,
          then cost based on the SET_SRC alone.  */
-      if (outer_code == INSN && REG_P (SET_DEST (x)))
+      if (outer_code == INSN
+          && register_operand (SET_DEST (x), GET_MODE (SET_DEST (x))))
         {
           riscv_rtx_costs (SET_SRC (x), mode, outer_code, opno, total, speed);
           return true;
@@ -3172,7 +3173,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
           rtx and_rhs = XEXP (x, 1);
           rtx ashift_lhs = XEXP (XEXP (x, 0), 0);
           rtx ashift_rhs = XEXP (XEXP (x, 0), 1);
-          if (REG_P (ashift_lhs)
+          if (register_operand (ashift_lhs, GET_MODE (ashift_lhs))
               && CONST_INT_P (ashift_rhs)
               && CONST_INT_P (and_rhs)
               && ((INTVAL (and_rhs) >> INTVAL (ashift_rhs)) == 0xffffffff))
@@ -3188,7 +3189,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
         }
       /* bclr pattern for zbs.  */
       if (TARGET_ZBS
-          && REG_P (XEXP (x, 1))
+          && register_operand (XEXP (x, 1), GET_MODE (XEXP (x, 1)))
           && GET_CODE (XEXP (x, 0)) == ROTATE
           && CONST_INT_P (XEXP ((XEXP (x, 0)), 0))
           && INTVAL (XEXP ((XEXP (x, 0)), 0)) == -2)
@@ -3344,7 +3345,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
       if (TARGET_ZBA
           && (TARGET_64BIT && (mode == DImode))
           && GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
-          && REG_P (XEXP (XEXP (x, 0), 0))
+          && register_operand (XEXP (XEXP (x, 0), 0),
+                               GET_MODE (XEXP (XEXP (x, 0), 0)))
           && GET_MODE (XEXP (XEXP (x, 0), 0)) == SImode)
         {
           *total = COSTS_N_INSNS (1);
@@ -3355,7 +3357,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
           && ((!TARGET_64BIT && (mode == SImode)) ||
               (TARGET_64BIT && (mode == DImode)))
           && (GET_CODE (XEXP (x, 0)) == ASHIFT)
-          && REG_P (XEXP (XEXP (x, 0), 0))
+          && register_operand (XEXP (XEXP (x, 0), 0),
+                               GET_MODE (XEXP (XEXP (x, 0), 0)))
           && CONST_INT_P (XEXP (XEXP (x, 0), 1))
           && IN_RANGE (INTVAL (XEXP (XEXP (x, 0), 1)), 1, 3))
         {
@@ -3368,7 +3371,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
       if (TARGET_ZBA
           && mode == word_mode
           && GET_CODE (XEXP (x, 0)) == MULT
-          && REG_P (XEXP (XEXP (x, 0), 0))
+          && register_operand (XEXP (XEXP (x, 0), 0),
+                               GET_MODE (XEXP (XEXP (x, 0), 0)))
           && CONST_INT_P (XEXP (XEXP (x, 0), 1))
           && pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1)))
           && IN_RANGE (exact_log2 (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
@@ -3390,7 +3394,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
       if (TARGET_ZBA
           && (TARGET_64BIT && (mode == DImode))
           && (GET_CODE (XEXP (x, 0)) == AND)
-          && (REG_P (XEXP (x, 1))))
+          && register_operand (XEXP (x, 1), GET_MODE (XEXP (x, 1))))
         {
           do {
             rtx and_lhs = XEXP (XEXP (x, 0), 0);
diff --git a/gcc/testsuite/gcc.target/riscv/reg_subreg_costs.c b/gcc/testsuite/gcc.target/riscv/reg_subreg_costs.c
new file mode 100644
index 00000000000..874dff3a688
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/reg_subreg_costs.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+/* { dg-options "-march=rv64gc_zba" } */
+
+#include
+void foo(uint32_t a, uint64_t *b_ptr, uint64_t b, uint64_t *c_ptr, uint64_t c)
+{
+  uint64_t x = a;
+  *b_ptr = b + x;
+  *c_ptr = c + x;
+}
+
+/* { dg-final { scan-assembler-not "\\szext.w\\s" } } */
+
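
Illustrative note (not part of the committed patch): the "cost 4 -> cost 8" fwprop rejection described above can be reproduced with a toy model. Everything in this sketch is hypothetical: the `toy_expr` type, `toy_cost`, and both predicates are simplified stand-ins and are not GCC's rtx, riscv_rtx_costs, REG_P, or register_operand. The only thing borrowed from GCC is that COSTS_N_INSNS (1) is 4. The sketch assumes a PLUS whose first operand is a ZERO_EXTEND of a register-like value costs one instruction, and otherwise costs one instruction plus its operands.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for RTL codes.  */
enum toy_code { TOY_REG, TOY_SUBREG, TOY_ZERO_EXTEND, TOY_PLUS };

struct toy_expr {
  enum toy_code code;
  const struct toy_expr *op0;
  const struct toy_expr *op1;
};

/* Mirrors COSTS_N_INSNS (1) == 4; the rest of the model is invented.  */
#define TOY_ONE_INSN 4

typedef bool (*toy_reg_pred) (const struct toy_expr *);

/* Strict check, analogous in spirit to testing only for a bare REG.  */
static bool toy_reg_only (const struct toy_expr *e)
{
  return e->code == TOY_REG;
}

/* Relaxed check: accepts a REG or a SUBREG wrapped around a REG.  */
static bool toy_reg_or_subreg (const struct toy_expr *e)
{
  return e->code == TOY_REG
         || (e->code == TOY_SUBREG && e->op0 != NULL
             && e->op0->code == TOY_REG);
}

static int toy_cost (const struct toy_expr *e, toy_reg_pred is_reg)
{
  switch (e->code)
    {
    case TOY_REG:
    case TOY_SUBREG:
      return 0;
    case TOY_ZERO_EXTEND:
      return TOY_ONE_INSN + toy_cost (e->op0, is_reg);
    case TOY_PLUS:
      /* Fused zero-extend-and-add shape: one instruction if recognized.  */
      if (e->op0->code == TOY_ZERO_EXTEND && is_reg (e->op0->op0))
        return TOY_ONE_INSN;
      return TOY_ONE_INSN + toy_cost (e->op0, is_reg)
             + toy_cost (e->op1, is_reg);
    }
  return 0;
}

/* Cost (plus (zero_extend (subreg (reg a))) (reg b)) under both predicates.  */
static void toy_fwprop_costs (int *strict, int *relaxed)
{
  const struct toy_expr a = { TOY_REG, NULL, NULL };
  const struct toy_expr b = { TOY_REG, NULL, NULL };
  const struct toy_expr sub = { TOY_SUBREG, &a, NULL };
  const struct toy_expr zext = { TOY_ZERO_EXTEND, &sub, NULL };
  const struct toy_expr plus = { TOY_PLUS, &zext, &b };

  *strict = toy_cost (&plus, toy_reg_only);
  *relaxed = toy_cost (&plus, toy_reg_or_subreg);
}
```

Calling `toy_fwprop_costs` yields 8 with the strict predicate and 4 with the relaxed one. In this toy model the propagated form only ties with a plain addition once the SUBREG is accepted as a register-like operand, which is the effect the patch aims for in the real cost code.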