From patchwork Fri Jan 5 05:03:00 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Levy Hsu
X-Patchwork-Id: 185219
From: Levy Hsu
To: gcc-patches@gcc.gnu.org
Cc: admin@levyhsu.com, liwei.xu@intel.com, crazylht@gmail.com
Subject: [x86_64 PATCH] PR target/107563: Add 3-instruction subroutine vector shift in ix86_expand_vec_perm_const_1
Date: Fri, 5 Jan 2024 13:03:00 +0800
Message-Id: <20240105050300.3455412-1-admin@levyhsu.com>
X-Mailer: git-send-email 2.31.1
List-Id: Gcc-patches mailing list

From: Liwei Xu

This patch optimizes byte swaps in vectors using SSE2 instructions.
It targets 8-byte and 16-byte vectors and handles permutations that
swap each adjacent pair of bytes, such as
__builtin_shufflevector(v, v, 1, 0, 3, 2, ...), with a word shift
right, a word shift left and an OR.

	PR target/107563

gcc/ChangeLog:

	* config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por):
	New subroutine.
	(ix86_expand_vec_perm_const_1): New Entry.

gcc/testsuite/ChangeLog:

	* g++.target/i386/pr107563.C: New test.
---
 gcc/config/i386/i386-expand.cc           | 64 ++++++++++++++++++++++++
 gcc/testsuite/g++.target/i386/pr107563.C | 23 +++++++++
 2 files changed, 87 insertions(+)
 create mode 100755 gcc/testsuite/g++.target/i386/pr107563.C

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 527fcc63506..ba5ea20daf7 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -21826,6 +21826,67 @@ expand_vec_perm_2perm_pblendv (struct expand_vec_perm_d *d, bool two_insn)
   return true;
 }
 
+/* A subroutine of ix86_expand_vec_perm_const_1.
+   Implement a permutation with psrlw, psllw and por.
+   It handles case:
+   __builtin_shufflevector (v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
+   __builtin_shufflevector (v,v,1,0,3,2,5,4,7,6); */
+
+static bool
+expand_vec_perm_psrlw_psllw_por (struct expand_vec_perm_d *d)
+{
+  unsigned i;
+  rtx (*gen_shr) (rtx, rtx, rtx);
+  rtx (*gen_shl) (rtx, rtx, rtx);
+  rtx (*gen_or) (rtx, rtx, rtx);
+  machine_mode mode = VOIDmode;
+
+  if (!TARGET_SSE2 || !d->one_operand_p)
+    return false;
+
+  switch (d->vmode)
+    {
+    case E_V8QImode:
+      if (!TARGET_MMX_WITH_SSE)
+	return false;
+      mode = V4HImode;
+      gen_shr = gen_lshrv4hi3;
+      gen_shl = gen_ashlv4hi3;
+      gen_or = gen_iorv4hi3;
+      break;
+    case E_V16QImode:
+      mode = V8HImode;
+      gen_shr = gen_vlshrv8hi3;
+      gen_shl = gen_vashlv8hi3;
+      gen_or = gen_iorv8hi3;
+      break;
+    default: return false;
+    }
+
+  if (!rtx_equal_p (d->op0, d->op1))
+    return false;
+
+  for (i = 0; i < d->nelt; i += 2)
+    if (d->perm[i] != i + 1 || d->perm[i + 1] != i)
+      return false;
+
+  if (d->testing_p)
+    return true;
+
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx tmp2 = gen_reg_rtx (mode);
+  rtx op0 = force_reg (d->vmode, d->op0);
+
+  emit_move_insn (tmp1, lowpart_subreg (mode, op0, d->vmode));
+  emit_move_insn (tmp2, lowpart_subreg (mode, op0, d->vmode));
+  emit_insn (gen_shr (tmp1, tmp1, GEN_INT (8)));
+  emit_insn (gen_shl (tmp2, tmp2, GEN_INT (8)));
+  emit_insn (gen_or (tmp1, tmp1, tmp2));
+  emit_move_insn (d->target, lowpart_subreg (d->vmode, tmp1, mode));
+
+  return true;
+}
+
 /* A subroutine of ix86_expand_vec_perm_const_1.  Implement a V4DF
    permutation using two vperm2f128, followed by a vshufpd insn blending
    the two vectors together.  */
@@ -23243,6 +23304,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_2perm_pblendv (d, false))
     return true;
 
+  if (expand_vec_perm_psrlw_psllw_por (d))
+    return true;
+
   /* Try sequences of four instructions.  */
 
   if (expand_vec_perm_even_odd_trunc (d))
diff --git a/gcc/testsuite/g++.target/i386/pr107563.C b/gcc/testsuite/g++.target/i386/pr107563.C
new file mode 100755
index 00000000000..5b0c648e8f1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr107563.C
@@ -0,0 +1,23 @@
+/* PR target/107563.C */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-std=c++2b -O3 -msse2" } */
+/* { dg-final { scan-assembler-not "movzbl" } } */
+/* { dg-final { scan-assembler-not "salq" } } */
+/* { dg-final { scan-assembler-not "orq" } } */
+/* { dg-final { scan-assembler-not "punpcklqdq" } } */
+/* { dg-final { scan-assembler-times "psllw" 2 } } */
+/* { dg-final { scan-assembler-times "psrlw" 2 } } */
+/* { dg-final { scan-assembler-not "psraw" } } */
+/* { dg-final { scan-assembler-times "por" 2 } } */
+
+using temp_vec_type [[__gnu__::__vector_size__ (16)]] = char;
+void foo (temp_vec_type& v) noexcept
+{
+  v = __builtin_shufflevector(v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
+}
+
+using temp_vec_type2 [[__gnu__::__vector_size__ (8)]] = char;
+void foo2 (temp_vec_type2& v) noexcept
+{
+  v=__builtin_shufflevector(v,v,1,0,3,2,5,4,7,6);
+}
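
For reviewers who want to check the idea outside the compiler, the following is a rough, portable scalar model of what the new subroutine is meant to do. It is only a sketch and not GCC code: is_adjacent_swap and swap_bytes_in_words are hypothetical names made up for this illustration. The first mirrors the permutation check (perm[i] == i + 1 and perm[i + 1] == i for every even i); the second views the bytes as 16-bit lanes and computes (lane >> 8) | (lane << 8), standing in for psrlw, psllw and por.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of GCC): returns true when perm swaps each
// adjacent pair of elements, i.e. perm = {1, 0, 3, 2, ...}.
template <std::size_t N>
bool is_adjacent_swap(const std::array<unsigned, N>& perm)
{
    static_assert(N % 2 == 0, "element count must be even");
    for (std::size_t i = 0; i < N; i += 2)
        if (perm[i] != i + 1 || perm[i + 1] != i)
            return false;
    return true;
}

// Hypothetical scalar model of the psrlw/psllw/por sequence: treat the bytes
// as 16-bit lanes and rotate each lane by 8 bits using two shifts and an OR.
template <std::size_t N>
std::array<std::uint8_t, N>
swap_bytes_in_words(const std::array<std::uint8_t, N>& v)
{
    static_assert(N % 2 == 0, "byte count must be even");
    std::array<std::uint8_t, N> out{};
    for (std::size_t i = 0; i < N; i += 2) {
        std::uint16_t lane;
        std::memcpy(&lane, &v[i], sizeof lane);
        // Logical right shift: the vacated upper byte is zero-filled.
        std::uint16_t shr = static_cast<std::uint16_t>(lane >> 8);  // psrlw $8
        // Left shift: the vacated lower byte is zero-filled.
        std::uint16_t shl = static_cast<std::uint16_t>(lane << 8);  // psllw $8
        std::uint16_t res = static_cast<std::uint16_t>(shr | shl);  // por
        std::memcpy(&out[i], &res, sizeof res);
    }
    return out;
}
```

The right shift in this model is a logical one. With an arithmetic shift, any lane whose upper byte is 0x80 or above would have copies of its sign bit shifted in, and the OR would then corrupt the byte moved up by the left shift. Rotating a 16-bit lane by 8 swaps its two bytes regardless of host endianness, so the model is not tied to x86.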