From patchwork Thu May 11 10:12:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Li, Pan2 via Gcc-patches"
X-Patchwork-Id: 92471
To: gcc-patches@gcc.gnu.org
Cc: Lili Cui
Subject: [PATCH 2/2] Add a tune option to control the length of the chain with FMA
Date: Thu, 11 May 2023 10:12:01 +0000
Message-Id: <20230511101201.2052667-2-lili.cui@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230511101201.2052667-1-lili.cui@intel.com>
References: <20230511101201.2052667-1-lili.cui@intel.com>
X-Patchwork-Original-From: "Cui, Lili via Gcc-patches"
From: "Li, Pan2 via Gcc-patches"
Reply-To: "Cui, Lili"

From: Lili Cui

Set the maximum length of an FMA chain to 5 for icelake_cost.

With this patch applied,
SPR multi-copy: 508.namd_r increased by 3%
ICX multi-copy: 508.namd_r increased by 3.5%,
                507.cactuBSSN_r increased by 3.7%

Using FMA instead of mult + add reduces register pressure and the number
of instructions retired.

gcc/ChangeLog:

	* config/i386/i386-options.cc (ix86_option_override_internal):
	Set param_reassoc_max_chain_length_with_fma.
	* config/i386/i386.h (struct processor_costs): Add new tune
	parameter reassoc_max_chain_length_with_fma.
	* config/i386/x86-tune-costs.h (struct processor_costs): Set
	reassoc_max_chain_length_with_fma to 5 for icelake.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/fma-chain.c: New test.
---
 gcc/config/i386/i386-options.cc           |  2 ++
 gcc/config/i386/i386.h                    |  3 ++
 gcc/config/i386/x86-tune-costs.h          | 35 +++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/fma-chain.c | 11 +++++++
 4 files changed, 51 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/fma-chain.c

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 2cb0bddcd35..67d35d89d91 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -2684,6 +2684,8 @@ ix86_option_override_internal (bool main_args_p,
                        ix86_tune_cost->l1_cache_size);
   SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size,
                        ix86_tune_cost->l2_cache_size);
+  SET_OPTION_IF_UNSET (opts, opts_set, param_reassoc_max_chain_length_with_fma,
+                       ix86_tune_cost->reassoc_max_chain_length_with_fma);
 
   /* 64B is the accepted value for these for all x86.  */
   SET_OPTION_IF_UNSET (&global_options, &global_options_set,
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index c7439f89bdf..c7fa7312a67 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -206,6 +206,9 @@ struct processor_costs {
                                    to number of instructions executed in
                                    parallel.  See also
                                    ix86_reassociation_width.  */
+  const int reassoc_max_chain_length_with_fma;
+                                /* Specify max reassociation chain length with
+                                   FMA.  */
   struct stringop_algs *memcpy, *memset;
   const int cond_taken_branch_cost;    /* Cost of taken branch for vectorizer
                                           cost model.  */
diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 4f7a67ca5c5..1f57a5ee2a7 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -127,6 +127,7 @@ struct processor_costs ix86_size_cost = {/* costs for tuning for size */
   COSTS_N_BYTES (2),                   /* cost of SQRTSS instruction.  */
   COSTS_N_BYTES (2),                   /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   ix86_size_memcpy,
   ix86_size_memset,
   COSTS_N_BYTES (1),                   /* cond_taken_branch_cost.  */
@@ -238,6 +239,7 @@ struct processor_costs i386_cost = { /* 386 specific costs */
   COSTS_N_INSNS (122),                 /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (122),                 /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   i386_memcpy,
   i386_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -350,6 +352,7 @@ struct processor_costs i486_cost = { /* 486 specific costs */
   COSTS_N_INSNS (83),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (83),                  /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   i486_memcpy,
   i486_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -460,6 +463,7 @@ struct processor_costs pentium_cost = {
   COSTS_N_INSNS (70),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (70),                  /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   pentium_memcpy,
   pentium_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -563,6 +567,7 @@ struct processor_costs lakemont_cost = {
   COSTS_N_INSNS (31),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (63),                  /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   pentium_memcpy,
   pentium_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -681,6 +686,7 @@ struct processor_costs pentiumpro_cost = {
   COSTS_N_INSNS (31),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (31),                  /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   pentiumpro_memcpy,
   pentiumpro_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -790,6 +796,7 @@ struct processor_costs geode_cost = {
   COSTS_N_INSNS (54),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (54),                  /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   geode_memcpy,
   geode_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -902,6 +909,7 @@ struct processor_costs k6_cost = {
   COSTS_N_INSNS (56),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (56),                  /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   k6_memcpy,
   k6_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -1015,6 +1023,7 @@ struct processor_costs athlon_cost = {
   COSTS_N_INSNS (19),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (19),                  /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   athlon_memcpy,
   athlon_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -1137,6 +1146,7 @@ struct processor_costs k8_cost = {
   COSTS_N_INSNS (19),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (27),                  /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   k8_memcpy,
   k8_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -1267,6 +1277,7 @@ struct processor_costs amdfam10_cost = {
   COSTS_N_INSNS (19),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (27),                  /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   amdfam10_memcpy,
   amdfam10_memset,
   COSTS_N_INSNS (2),                   /* cond_taken_branch_cost.  */
@@ -1390,6 +1401,7 @@ const struct processor_costs bdver_cost = {
   COSTS_N_INSNS (15),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (26),                  /* cost of SQRTSD instruction.  */
   1, 2, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   bdver_memcpy,
   bdver_memset,
   COSTS_N_INSNS (4),                   /* cond_taken_branch_cost.  */
@@ -1545,6 +1557,7 @@ struct processor_costs znver1_cost = {
      plus/minus operations per cycle but only one multiply.  This is adjusted
      in ix86_reassociation_width.  */
   4, 4, 3, 6,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   znver1_memcpy,
   znver1_memset,
   COSTS_N_INSNS (4),                   /* cond_taken_branch_cost.  */
@@ -1704,6 +1717,7 @@ struct processor_costs znver2_cost = {
      plus/minus operations per cycle but only one multiply.  This is adjusted
      in ix86_reassociation_width.  */
   4, 4, 3, 6,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   znver2_memcpy,
   znver2_memset,
   COSTS_N_INSNS (4),                   /* cond_taken_branch_cost.  */
@@ -1838,6 +1852,7 @@ struct processor_costs znver3_cost = {
      plus/minus operations per cycle but only one multiply.  This is adjusted
      in ix86_reassociation_width.  */
   4, 4, 3, 6,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   znver2_memcpy,
   znver2_memset,
   COSTS_N_INSNS (4),                   /* cond_taken_branch_cost.  */
@@ -1974,6 +1989,7 @@ struct processor_costs znver4_cost = {
      plus/minus operations per cycle but only one multiply.  This is adjusted
      in ix86_reassociation_width.  */
   4, 4, 3, 6,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   znver2_memcpy,
   znver2_memset,
   COSTS_N_INSNS (4),                   /* cond_taken_branch_cost.  */
@@ -2100,6 +2116,7 @@ struct processor_costs skylake_cost = {
   COSTS_N_INSNS (12),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (18),                  /* cost of SQRTSD instruction.  */
   1, 4, 2, 2,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   skylake_memcpy,
   skylake_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -2228,6 +2245,12 @@ struct processor_costs icelake_cost = {
   COSTS_N_INSNS (12),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (18),                  /* cost of SQRTSD instruction.  */
   1, 4, 2, 2,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  /* Icelake-server prefers FMA chains to breaking dependencies into
+     mult + add, which can reduce the number of instructions retired.
+     1 means do not keep the FMA chain.  When the value is greater than 1,
+     an FMA chain is generated.  When the actual FMA chain length exceeds
+     this value, the chain is split according to the reassociation width.  */
+  5,                                   /* Reassoc max FMA chain length.  */
   icelake_memcpy,
   icelake_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -2350,6 +2373,7 @@ struct processor_costs alderlake_cost = {
   COSTS_N_INSNS (14),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (18),                  /* cost of SQRTSD instruction.  */
   1, 4, 3, 3,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   alderlake_memcpy,
   alderlake_memset,
   COSTS_N_INSNS (4),                   /* cond_taken_branch_cost.  */
@@ -2465,6 +2489,7 @@ const struct processor_costs btver1_cost = {
   COSTS_N_INSNS (14),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (48),                  /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   btver1_memcpy,
   btver1_memset,
   COSTS_N_INSNS (2),                   /* cond_taken_branch_cost.  */
@@ -2577,6 +2602,7 @@ const struct processor_costs btver2_cost = {
   COSTS_N_INSNS (16),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (21),                  /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   btver2_memcpy,
   btver2_memset,
   COSTS_N_INSNS (2),                   /* cond_taken_branch_cost.  */
@@ -2688,6 +2714,7 @@ struct processor_costs pentium4_cost = {
   COSTS_N_INSNS (23),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (38),                  /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   pentium4_memcpy,
   pentium4_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -2802,6 +2829,7 @@ struct processor_costs nocona_cost = {
   COSTS_N_INSNS (32),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (41),                  /* cost of SQRTSD instruction.  */
   1, 1, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   nocona_memcpy,
   nocona_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -2914,6 +2942,7 @@ struct processor_costs atom_cost = {
   COSTS_N_INSNS (31),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (63),                  /* cost of SQRTSD instruction.  */
   2, 2, 2, 2,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   atom_memcpy,
   atom_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -3026,6 +3055,7 @@ struct processor_costs slm_cost = {
   COSTS_N_INSNS (20),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (35),                  /* cost of SQRTSD instruction.  */
   1, 2, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   slm_memcpy,
   slm_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -3152,6 +3182,7 @@ struct processor_costs tremont_cost = {
   COSTS_N_INSNS (14),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (18),                  /* cost of SQRTSD instruction.  */
   1, 4, 3, 3,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   tremont_memcpy,
   tremont_memset,
   COSTS_N_INSNS (4),                   /* cond_taken_branch_cost.  */
@@ -3264,6 +3295,7 @@ struct processor_costs intel_cost = {
   COSTS_N_INSNS (40),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (40),                  /* cost of SQRTSD instruction.  */
   1, 4, 1, 1,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   intel_memcpy,
   intel_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
@@ -3381,6 +3413,7 @@ struct processor_costs lujiazui_cost = {
   COSTS_N_INSNS (32),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (60),                  /* cost of SQRTSD instruction.  */
   1, 4, 3, 3,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   lujiazui_memcpy,
   lujiazui_memset,
   COSTS_N_INSNS (4),                   /* cond_taken_branch_cost.  */
@@ -3502,6 +3535,7 @@ struct processor_costs generic_cost = {
   COSTS_N_INSNS (14),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (18),                  /* cost of SQRTSD instruction.  */
   1, 4, 3, 3,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   generic_memcpy,
   generic_memset,
   COSTS_N_INSNS (4),                   /* cond_taken_branch_cost.  */
@@ -3630,6 +3664,7 @@ struct processor_costs core_cost = {
   COSTS_N_INSNS (30),                  /* cost of SQRTSS instruction.  */
   COSTS_N_INSNS (58),                  /* cost of SQRTSD instruction.  */
   1, 4, 2, 2,                          /* reassoc int, fp, vec_int, vec_fp.  */
+  1,                                   /* Reassoc max FMA chain length.  */
   core_memcpy,
   core_memset,
   COSTS_N_INSNS (3),                   /* cond_taken_branch_cost.  */
diff --git a/gcc/testsuite/gcc.target/i386/fma-chain.c b/gcc/testsuite/gcc.target/i386/fma-chain.c
new file mode 100644
index 00000000000..9de61f1b6ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/fma-chain.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=icelake-server -Wno-attributes " } */
+
+/* Test that the compiler properly optimizes multiply and add
+   to generate more FMA instructions.  */
+float
+foo (float a, float b, float c, float d, float e, float f, float g, float h, float j)
+{
+  return a * b + c * d + e * f + g * h + j;
+}
+/* { dg-final { scan-assembler-times "vfm" 4 } } */
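
Note for readers: the two shapes that the chain length limit chooses between
can be written out by hand for the expression used in fma-chain.c. The sketch
below is illustrative only and not part of the patch. The function names
fma_chain and fma_split are hypothetical, and the grouping in fma_split is
just one plausible way a chain could be split at width 2; it is an assumption,
not the output of the reassociation pass. The code uses plain multiply and
add, so whether any of it becomes FMA instructions depends on the compiler,
the target, and the floating-point contraction settings in effect.

```c
/* Illustrative only; not part of the patch.  Both functions compute
   a*b + c*d + e*f + g*h + j, differing only in how the sum is associated.  */

/* One dependent chain: every step is a multiply feeding an add, so each
   of the four steps is a candidate for a single FMA.  */
float
fma_chain (float a, float b, float c, float d, float e, float f,
           float g, float h, float j)
{
  float acc = g * h + j;
  acc = e * f + acc;
  acc = c * d + acc;
  return a * b + acc;
}

/* Hypothetical width-2 split: two shorter chains joined by a final add.
   c * d has no addend of its own, so it remains a separate multiply.  */
float
fma_split (float a, float b, float c, float d, float e, float f,
           float g, float h, float j)
{
  float lo = a * b + c * d;
  float hi = e * f + (g * h + j);
  return lo + hi;
}
```

In the chain form all four multiplies can pair with an add, which matches the
four "vfm" matches the new test expects. In the split form only three can;
one multiply and one add are left over, but the two halves no longer depend
on each other.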