From patchwork Wed Aug 30 10:35:16 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 137167
To: gcc-patches@gcc.gnu.org
Cc: rguenther@suse.de, hubicka@ucw.cz
Subject: [PATCH] Adjust costing of emulated vectorized gather/scatter
Date: Wed, 30 Aug 2023 18:35:16 +0800
Message-Id: <20230830103516.882926-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.31.1
List-Id: Gcc-patches mailing list
From: liuhongt
Reply-To: liuhongt

r14-332-g24905a4bd1375c adjusts costing of emulated vectorized gather/scatter.
----
commit 24905a4bd1375ccd99c02510b9f9529015a48315
Author: Richard Biener
Date:   Wed Jan 18 11:04:49 2023 +0100

    Adjust costing of emulated vectorized gather/scatter

    Emulated gather/scatter behave similar to strided elementwise
    accesses in that they need to decompose the offset vector
    and construct or decompose the data vector so handle them
    the same way, pessimizing the cases with may elements.
----

But for emulated gather/scatter, the offset vector load/vec_construct has
already been counted, and in real cases it is probably eliminated by a
later optimization pass.  Also, after decomposing, element loads from
contiguous memory could be less bounded than a normal elementwise load.
The patch decreases the cost a little bit.

This enables gather emulation for the loop below with VF=8 (ymm):

double
foo (double* a, double* b, unsigned int* c, int n)
{
  double sum = 0;
  for (int i = 0; i != n; i++)
    sum += a[i] * b[c[i]];
  return sum;
}

For the loop above, a microbenchmark on ICX shows that emulated gather
with VF=8 is 30% faster than emulated gather with VF=4 when the trip
count is big enough.  It brings back ~4% for 510.parest; there is still a
~5% regression compared to the gather instruction because the code is
throughput bound.

For -march=znver1/2/3/4, the change doesn't enable VF=8 (ymm) for the
loop; VF remains 4 (xmm) as before (I guess this is related to their own
cost model).

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

	PR target/111064
	* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
	Decrease cost a little bit for vec_to_scalar(offset vector) in
	emulated gather.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr111064.c: New test.
---
 gcc/config/i386/i386.cc                  | 11 ++++++++++-
 gcc/testsuite/gcc.target/i386/pr111064.c | 12 ++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr111064.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 1bc3f11ff07..337e0f1bfbb 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24079,7 +24079,16 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
           || STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER))
     {
       stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
-      stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
+      /* For emulated gather/scatter, offset vector load/vec_construct has
+         already been counted and in real case, it's probably eliminated by
+         later optimizer.
+         Also after decomposing, element loads from continous memory
+         could be less bounded compared to normal elementwise load.  */
+      if (kind == vec_to_scalar
+          && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
+        stmt_cost *= TYPE_VECTOR_SUBPARTS (vectype);
+      else
+        stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
     }
   else if ((kind == vec_construct || kind == scalar_to_vec)
            && node
diff --git a/gcc/testsuite/gcc.target/i386/pr111064.c b/gcc/testsuite/gcc.target/i386/pr111064.c
new file mode 100644
index 00000000000..aa2589bd36f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr111064.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=icelake-server -mno-gather" } */
+/* { dg-final { scan-assembler-times {(?n)vfmadd[123]*pd.*ymm} 2 { target { ! ia32 } } } } */
+
+double
+foo (double* a, double* b, unsigned int* c, int n)
+{
+  double sum = 0;
+  for (int i = 0; i != n; i++)
+    sum += a[i] * b[c[i]];
+  return sum;
+}
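
Note for readers less familiar with gather emulation: the sketch below is
a hand-written scalar model of what the commit message describes. It is
not the code GCC emits. foo_emulated, scaled_stmt_cost, the VF macro and
the parameter names are hypothetical, and any unit cost passed in is made
up. It shows (1) the offset vector being decomposed into scalars, one
scalar load per lane, and the data vector being rebuilt, and (2) how the
multiplier in add_stmt_cost changes from nunits + 1 to nunits for
vec_to_scalar in an emulated gather.

```c
#include <assert.h>

#define VF 8 /* lanes per iteration, mirroring VF=8 in the message above */

/* Scalar reference: the loop from pr111064.c.  */
double
foo (double *a, double *b, unsigned int *c, int n)
{
  double sum = 0;
  for (int i = 0; i != n; i++)
    sum += a[i] * b[c[i]];
  return sum;
}

/* Hypothetical model of an emulated gather; illustration only.  */
double
foo_emulated (double *a, double *b, unsigned int *c, int n)
{
  assert (n >= 0);
  double acc[VF] = { 0 };
  int i = 0;

  for (; i + VF <= n; i += VF)
    {
      unsigned int off[VF];
      double gathered[VF];

      /* Load the offset vector from contiguous memory and decompose it
         into scalars (the vec_to_scalar step).  */
      for (int l = 0; l < VF; l++)
        off[l] = c[i + l];

      /* One scalar load per lane, then rebuild the data vector
         (the vec_construct step).  */
      for (int l = 0; l < VF; l++)
        gathered[l] = b[off[l]];

      /* Lane-wise multiply-accumulate.  */
      for (int l = 0; l < VF; l++)
        acc[l] += a[i + l] * gathered[l];
    }

  /* Reduce the per-lane accumulators.  */
  double sum = 0;
  for (int l = 0; l < VF; l++)
    sum += acc[l];

  /* Scalar epilogue for the remaining iterations.  */
  for (; i < n; i++)
    sum += a[i] * b[c[i]];
  return sum;
}

/* Hypothetical mirror of the scaling in the hunk above: vec_to_scalar in
   an emulated gather is multiplied by nunits, everything else on this
   path by nunits + 1.  */
int
scaled_stmt_cost (int unit_cost, int nunits, int emulated_gather_vec_to_scalar)
{
  if (emulated_gather_vec_to_scalar)
    return unit_cost * nunits;
  return unit_cost * (nunits + 1);
}
```

The lane-wise accumulation reorders the floating-point additions, so the
two functions are only guaranteed to agree bit-for-bit when the sums are
exact (for example small integer-valued inputs); that reassociation is
what -Ofast permits in the test case.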