From patchwork Tue Jan 16 21:59:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 188623 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:42cf:b0:101:a8e8:374 with SMTP id q15csp545932dye; Tue, 16 Jan 2024 14:00:05 -0800 (PST) X-Google-Smtp-Source: AGHT+IEev6WHc4QgNb0HpMKNkGFZ06wK22DyQNZFGMkv8xbsz1VAmjFr7P4wkGzcFNww2wkWmaRB X-Received: by 2002:a05:6808:1190:b0:3bd:346e:1f53 with SMTP id j16-20020a056808119000b003bd346e1f53mr9096721oil.5.1705442404928; Tue, 16 Jan 2024 14:00:04 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1705442404; cv=pass; d=google.com; s=arc-20160816; b=arNhyfhpmdZEfoFB5ZSpzT6D704hd8Hy9eFq6w7SjzIZMANhEcSZu3gvzt1CwcKB5I 1PWXjFSTAFDbXC9+5371VGmv3N2PxLeDgAc2vp0ba43GEy0u8du4ivRjaVM1FdvMPNo3 HzqfCg+ikZews2Ntc+IirLAcEndM5DJpkPSDWdNbALSUvlCX0s5veTtMX8GFrca17s2S PMrLBf0feK1/pYzEf8xtRuLga+/C5SWnPIfl8V4t7TvxRvCZ8fkRbEk4vXjm6CSWIGDl kNYnBhdtUbtKByftodNgCcLEvoPN3U11exlsmv1UghrThoHIzx4aB0nl8ZG9aI4DrEfT G8OQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-language:thread-index :mime-version:message-id:date:subject:cc:to:from:dkim-signature :arc-filter:dmarc-filter:delivered-to; bh=rHbUzhOExOMu6VtfGWF9+Q6x3RVaG5dE1NHj+6GwvFw=; fh=CKAXBFfUtqyxxXNvUuwDxVCE3dDrGd8ikn/xuStPTv0=; b=s+bCuxmdrVQUjrCy4nUEQMhAQOOJgBW0SPAWulnA9OPzoUg3evBSzjP4utLKbLijvD bGbBf0h9rXGEFU1laxPOee4jSLf8zvvOoBTLF9jp3rInMkf+PpEXH/yTLwilBunJK6jN e4Qq+c+Tn+LBNu57uVfasv/iYFGWzoLQh+NSO4yBSUkXplhYaBXnjPvciP1RTYrcq2KT emZhKnNfHFW6Srys9oWx3ysPvXPWyISLRSshFru01ce/+gHu6qYtIUVX/KgeoJ/VDWzY jQgMducyWZmxMnBtYzP+/HnH93ImMbZPq0EVHLpoTzOsFzSx4xu7NE/er5PZNOGtO/4p mOzg== ARC-Authentication-Results: i=2; mx.google.com; dkim=fail header.i=@nextmovesoftware.com header.s=default header.b=FKBFRNCW; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id o20-20020a05622a009400b00429f990ff9dsi3421744qtw.136.2024.01.16.14.00.04 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 16 Jan 2024 14:00:04 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=fail header.i=@nextmovesoftware.com header.s=default header.b=FKBFRNCW; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8E0943858C53 for ; Tue, 16 Jan 2024 22:00:04 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 642DC3858D37 for ; Tue, 16 Jan 2024 21:59:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 642DC3858D37 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 642DC3858D37 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=162.254.253.69 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705442366; cv=none; b=MkYh4VwqxIqf9BL/vBuCtHmTCw6c+BkgOBOTbxWHDsrd6CpConGay0XXn7OibgaY6a91Kii4MuqDYaBmMyz+UzK9jxuTQmcWkogUCEfFxUzT+G7KheVRQ7PpmP4cwsjBZlD9fKWjKAlV3shGn3UbvcdwJn6Osn9SMKXDesNx3QY= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705442366; c=relaxed/simple; bh=buBShggpFVKLDZ6sEknK22yGXe/D+h/S/Nw1CR/BKHo=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=jCqpd5ACG6bUqgENFG74QK5zc6qj9j/Hjum9/BWZOTC4HJ3s54lHFGRJctMzDMOaRRfZ8MUhS6mAvL3ijSEccpT4bLDm8ceQtLKPLxhUx/tyW0naDJFCFMEHsdzXb8c9VH2HZVF/RDf63VrW9SMF3dwzv+qVJZ36KtKZWm1A+Pc= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=rHbUzhOExOMu6VtfGWF9+Q6x3RVaG5dE1NHj+6GwvFw=; b=FKBFRNCWyJRmWvcI3VeAbc2N7P diJaZnw8IEUvZhWlGmwGFXYuGuKVAQ5292dsNP+eTDIqqJV2nKMwtKEIkhxAMPtQEAiNknNY7lFVE hc+80krWibT73RQ7tMyI2sGIiscl4EJX5BZPZT4IUrIOZmSJRcGHo4dF1boNZAz6TFzIXdhey4RcX sJqB5cB6sQ+usDtRVP165m80IFEyunQySg+ta9IfQNavstUJJMzKFfHPGKA1Pp5y0aCkfVYI7oNbe VVu3YZTlBVx7mCiiexqNDvR5US2CzBH/up1HW3UhJVnvGJoe0iqzbw4kjgf7sfS6/iUnFyQDgFJ2W i696bEow==; Received: from host109-154-238-190.range109-154.btcentralplus.com ([109.154.238.190]:57392 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96.2) (envelope-from ) id 1rPrSr-0002pE-0z; Tue, 16 Jan 2024 16:59:21 -0500 From: "Roger Sayle" To: Cc: "'Hongtao Liu'" , "'Uros Bizjak'" Subject: [x86 PATCH] PR target/106060: Improved SSE vector constant materialization. Date: Tue, 16 Jan 2024 21:59:18 -0000 Message-ID: <031901da48c7$42c37b10$c84a7130$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdpIxhgcu6QGDxjgRuWF03Ll8HIx0w== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1788285975477961793 X-GMAIL-MSGID: 1788285975477961793 I thought I'd just missed the bug fixing season of stage3, but there appears to a little latitude in early stage4 (for vector patches), so I'll post this now. This patch resolves PR target/106060 by providing efficient methods for materializing/synthesizing special "vector" constants on x86. Currently there are three methods of materializing a vector constant; the most general is to load a vector from the constant pool, secondly "duplicated" constants can be synthesized by moving an integer between units and broadcasting (or shuffling it), and finally the special cases of the all-zeros vector and all-ones vectors can be loaded via a single SSE instruction. This patch handles additional cases that can be synthesized in two instructions, loading an all-ones vector followed by another SSE instruction. Following my recent patch for PR target/112992, there's conveniently a single place in i386-expand.cc where these special cases can be handled. Two examples are given in the original bugzilla PR for 106060. __m256i should_be_cmpeq_abs () { return _mm256_set1_epi8 (1); } is now generated (with -O3 -march=x86-64-v3) as: vpcmpeqd %ymm0, %ymm0, %ymm0 vpabsb %ymm0, %ymm0 ret and __m256i should_be_cmpeq_add () { return _mm256_set1_epi8 (-2); } is now generated as: vpcmpeqd %ymm0, %ymm0, %ymm0 vpaddb %ymm0, %ymm0, %ymm0 ret This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2024-01-16 Roger Sayle gcc/ChangeLog PR target/106060 * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New. (struct ix86_vec_bcast_map_simode_t): New type for table below. (ix86_vec_bcast_map_simode): Table of SImode constants that may be efficiently synthesized by a ix86_vec_bcast_alg method. (ix86_vec_bcast_map_simode_cmp): New comparator for bsearch. (ix86_vector_duplicate_simode_const): Efficiently synthesize V4SImode and V8SImode constants that duplicate special constants. (ix86_vector_duplicate_value): Attempt to synthesize "special" vector constants using ix86_vector_duplicate_simode_const. * config/i386/i386.cc (ix86_rtx_costs) : ABS of a vector integer mode costs with a single SSE instruction. gcc/testsuite/ChangeLog PR target/106060 * gcc.target/i386/auto-init-8.c: Update test case. * gcc.target/i386/avx512fp16-3.c: Likewise. * gcc.target/i386/pr100865-9a.c: Likewise. * gcc.target/i386/pr106060-1.c: New test case. * gcc.target/i386/pr106060-2.c: Likewise. * gcc.target/i386/pr106060-3.c: Likewise. * gcc.target/i386/pr70314-3.c: Update test case. * gcc.target/i386/vect-shiftv4qi.c: Likewise. * gcc.target/i386/vect-shiftv8qi.c: Likewise. Thanks in advance, Roger diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 52754e1..f8f8af6 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -15638,6 +15638,288 @@ s4fma_expand: gcc_unreachable (); } +/* See below where shifts are handled for explanation of this enum. */ +enum ix86_vec_bcast_alg +{ + VEC_BCAST_PXOR, + VEC_BCAST_PCMPEQ, + VEC_BCAST_PABSB, + VEC_BCAST_PADDB, + VEC_BCAST_PSRLW, + VEC_BCAST_PSRLD, + VEC_BCAST_PSLLW, + VEC_BCAST_PSLLD +}; + +struct ix86_vec_bcast_map_simode_t +{ + unsigned int key; + enum ix86_vec_bcast_alg alg; + unsigned int arg; +}; + +/* This table must be kept sorted as values are looked-up using bsearch. */ +static const ix86_vec_bcast_map_simode_t ix86_vec_bcast_map_simode[] = { + { 0x00000000, VEC_BCAST_PXOR, 0 }, + { 0x00000001, VEC_BCAST_PSRLD, 31 }, + { 0x00000003, VEC_BCAST_PSRLD, 30 }, + { 0x00000007, VEC_BCAST_PSRLD, 29 }, + { 0x0000000f, VEC_BCAST_PSRLD, 28 }, + { 0x0000001f, VEC_BCAST_PSRLD, 27 }, + { 0x0000003f, VEC_BCAST_PSRLD, 26 }, + { 0x0000007f, VEC_BCAST_PSRLD, 25 }, + { 0x000000ff, VEC_BCAST_PSRLD, 24 }, + { 0x000001ff, VEC_BCAST_PSRLD, 23 }, + { 0x000003ff, VEC_BCAST_PSRLD, 22 }, + { 0x000007ff, VEC_BCAST_PSRLD, 21 }, + { 0x00000fff, VEC_BCAST_PSRLD, 20 }, + { 0x00001fff, VEC_BCAST_PSRLD, 19 }, + { 0x00003fff, VEC_BCAST_PSRLD, 18 }, + { 0x00007fff, VEC_BCAST_PSRLD, 17 }, + { 0x0000ffff, VEC_BCAST_PSRLD, 16 }, + { 0x00010001, VEC_BCAST_PSRLW, 15 }, + { 0x0001ffff, VEC_BCAST_PSRLD, 15 }, + { 0x00030003, VEC_BCAST_PSRLW, 14 }, + { 0x0003ffff, VEC_BCAST_PSRLD, 14 }, + { 0x00070007, VEC_BCAST_PSRLW, 13 }, + { 0x0007ffff, VEC_BCAST_PSRLD, 13 }, + { 0x000f000f, VEC_BCAST_PSRLW, 12 }, + { 0x000fffff, VEC_BCAST_PSRLD, 12 }, + { 0x001f001f, VEC_BCAST_PSRLW, 11 }, + { 0x001fffff, VEC_BCAST_PSRLD, 11 }, + { 0x003f003f, VEC_BCAST_PSRLW, 10 }, + { 0x003fffff, VEC_BCAST_PSRLD, 10 }, + { 0x007f007f, VEC_BCAST_PSRLW, 9 }, + { 0x007fffff, VEC_BCAST_PSRLD, 9 }, + { 0x00ff00ff, VEC_BCAST_PSRLW, 8 }, + { 0x00ffffff, VEC_BCAST_PSRLD, 8 }, + { 0x01010101, VEC_BCAST_PABSB, 0 }, + { 0x01ff01ff, VEC_BCAST_PSRLW, 7 }, + { 0x01ffffff, VEC_BCAST_PSRLD, 7 }, + { 0x03ff03ff, VEC_BCAST_PSRLW, 6 }, + { 0x03ffffff, VEC_BCAST_PSRLD, 6 }, + { 0x07ff07ff, VEC_BCAST_PSRLW, 5 }, + { 0x07ffffff, VEC_BCAST_PSRLD, 5 }, + { 0x0fff0fff, VEC_BCAST_PSRLW, 4 }, + { 0x0fffffff, VEC_BCAST_PSRLD, 4 }, + { 0x1fff1fff, VEC_BCAST_PSRLW, 3 }, + { 0x1fffffff, VEC_BCAST_PSRLD, 3 }, + { 0x3fff3fff, VEC_BCAST_PSRLW, 2 }, + { 0x3fffffff, VEC_BCAST_PSRLD, 2 }, + { 0x7fff7fff, VEC_BCAST_PSRLW, 1 }, + { 0x7fffffff, VEC_BCAST_PSRLD, 1 }, + { 0x80000000, VEC_BCAST_PSLLD, 31 }, + { 0x80008000, VEC_BCAST_PSLLW, 15 }, + { 0xc0000000, VEC_BCAST_PSLLD, 30 }, + { 0xc000c000, VEC_BCAST_PSLLW, 14 }, + { 0xe0000000, VEC_BCAST_PSLLD, 29 }, + { 0xe000e000, VEC_BCAST_PSLLW, 13 }, + { 0xf0000000, VEC_BCAST_PSLLD, 28 }, + { 0xf000f000, VEC_BCAST_PSLLW, 12 }, + { 0xf8000000, VEC_BCAST_PSLLD, 27 }, + { 0xf800f800, VEC_BCAST_PSLLW, 11 }, + { 0xfc000000, VEC_BCAST_PSLLD, 26 }, + { 0xfc00fc00, VEC_BCAST_PSLLW, 10 }, + { 0xfe000000, VEC_BCAST_PSLLD, 25 }, + { 0xfe00fe00, VEC_BCAST_PSLLW, 9 }, + { 0xfefefefe, VEC_BCAST_PADDB, 0 }, + { 0xff000000, VEC_BCAST_PSLLD, 24 }, + { 0xff00ff00, VEC_BCAST_PSLLW, 8 }, + { 0xff800000, VEC_BCAST_PSLLD, 23 }, + { 0xff80ff80, VEC_BCAST_PSLLW, 7 }, + { 0xffc00000, VEC_BCAST_PSLLD, 22 }, + { 0xffc0ffc0, VEC_BCAST_PSLLW, 6 }, + { 0xffe00000, VEC_BCAST_PSLLD, 21 }, + { 0xffe0ffe0, VEC_BCAST_PSLLW, 5 }, + { 0xfff00000, VEC_BCAST_PSLLD, 20 }, + { 0xfff0fff0, VEC_BCAST_PSLLW, 4 }, + { 0xfff80000, VEC_BCAST_PSLLD, 19 }, + { 0xfff8fff8, VEC_BCAST_PSLLW, 3 }, + { 0xfffc0000, VEC_BCAST_PSLLD, 18 }, + { 0xfffcfffc, VEC_BCAST_PSLLW, 2 }, + { 0xfffe0000, VEC_BCAST_PSLLD, 17 }, + { 0xfffefffe, VEC_BCAST_PSLLW, 1 }, + { 0xffff0000, VEC_BCAST_PSLLD, 16 }, + { 0xffff8000, VEC_BCAST_PSLLD, 15 }, + { 0xffffc000, VEC_BCAST_PSLLD, 14 }, + { 0xffffe000, VEC_BCAST_PSLLD, 13 }, + { 0xfffff000, VEC_BCAST_PSLLD, 12 }, + { 0xfffff800, VEC_BCAST_PSLLD, 11 }, + { 0xfffffc00, VEC_BCAST_PSLLD, 10 }, + { 0xfffffe00, VEC_BCAST_PSLLD, 9 }, + { 0xffffff00, VEC_BCAST_PSLLD, 8 }, + { 0xffffff80, VEC_BCAST_PSLLD, 7 }, + { 0xffffffc0, VEC_BCAST_PSLLD, 6 }, + { 0xffffffe0, VEC_BCAST_PSLLD, 5 }, + { 0xfffffff0, VEC_BCAST_PSLLD, 4 }, + { 0xfffffff8, VEC_BCAST_PSLLD, 3 }, + { 0xfffffffc, VEC_BCAST_PSLLD, 2 }, + { 0xfffffffe, VEC_BCAST_PSLLD, 1 }, + { 0xffffffff, VEC_BCAST_PCMPEQ, 0 } +}; + +/* Comparator for bsearch on ix86_vec_bcast_map. */ +static int +ix86_vec_bcast_map_simode_cmp (const void *key, const void *entry) +{ + return (*(const unsigned int*)key) + - ((const ix86_vec_bcast_map_simode_t*)entry)->key; +} + +/* A subroutine of ix86_vector_duplicate_value. Tries to efficiently + materialize V4SImode and V8SImode vectors from SImode integer + constants. */ +static bool +ix86_vector_duplicate_simode_const (machine_mode mode, rtx target, + unsigned int val) +{ + const ix86_vec_bcast_map_simode_t *entry; + rtx tmp1, tmp2; + + entry = (const ix86_vec_bcast_map_simode_t*) + bsearch(&val, ix86_vec_bcast_map_simode, + ARRAY_SIZE (ix86_vec_bcast_map_simode), + sizeof (ix86_vec_bcast_map_simode_t), + ix86_vec_bcast_map_simode_cmp); + if (!entry) + return false; + + switch (entry->alg) + { + case VEC_BCAST_PXOR: + if (mode == V8SImode && !TARGET_AVX2) + return false; + emit_move_insn (target, CONST0_RTX (mode)); + return true; + + case VEC_BCAST_PCMPEQ: + if ((mode == V4SImode && !TARGET_SSE2) + || (mode == V8SImode && !TARGET_AVX2)) + return false; + emit_move_insn (target, CONSTM1_RTX (mode)); + return true; + + case VEC_BCAST_PABSB: + if (mode == V4SImode) + { + tmp1 = gen_reg_rtx (V16QImode); + emit_move_insn (tmp1, CONSTM1_RTX (V16QImode)); + tmp2 = gen_reg_rtx (V16QImode); + emit_insn (gen_absv16qi2 (tmp2, tmp1)); + } + else if (mode == V8SImode && TARGET_AVX2) + { + tmp1 = gen_reg_rtx (V32QImode); + emit_move_insn (tmp1, CONSTM1_RTX (V32QImode)); + tmp2 = gen_reg_rtx (V32QImode); + emit_insn (gen_absv32qi2 (tmp2, tmp1)); + } + else + return false; + break; + + case VEC_BCAST_PADDB: + if (mode == V4SImode) + { + tmp1 = gen_reg_rtx (V16QImode); + emit_move_insn (tmp1, CONSTM1_RTX (V16QImode)); + tmp2 = gen_reg_rtx (V16QImode); + emit_insn (gen_addv16qi3 (tmp2, tmp1, tmp1)); + } + else if (mode == V8SImode && TARGET_AVX2) + { + tmp1 = gen_reg_rtx (V32QImode); + emit_move_insn (tmp1, CONSTM1_RTX (V32QImode)); + tmp2 = gen_reg_rtx (V32QImode); + emit_insn (gen_addv32qi3 (tmp2, tmp1, tmp1)); + } + else + return false; + break; + + case VEC_BCAST_PSRLW: + if (mode == V4SImode) + { + tmp1 = gen_reg_rtx (V8HImode); + emit_move_insn (tmp1, CONSTM1_RTX (V8HImode)); + tmp2 = gen_reg_rtx (V8HImode); + emit_insn (gen_lshrv8hi3 (tmp2, tmp1, GEN_INT (entry->arg))); + } + else if (mode == V8SImode && TARGET_AVX2) + { + tmp1 = gen_reg_rtx (V16HImode); + emit_move_insn (tmp1, CONSTM1_RTX (V16HImode)); + tmp2 = gen_reg_rtx (V16HImode); + emit_insn (gen_lshrv16hi3 (tmp2, tmp1, GEN_INT (entry->arg))); + } + else + return false; + break; + + case VEC_BCAST_PSRLD: + if (mode == V4SImode) + { + tmp1 = gen_reg_rtx (V4SImode); + emit_move_insn (tmp1, CONSTM1_RTX (V4SImode)); + emit_insn (gen_lshrv4si3 (target, tmp1, GEN_INT (entry->arg))); + return true; + } + else if (mode == V8SImode && TARGET_AVX2) + { + tmp1 = gen_reg_rtx (V8SImode); + emit_move_insn (tmp1, CONSTM1_RTX (V8SImode)); + emit_insn (gen_lshrv8si3 (target, tmp1, GEN_INT (entry->arg))); + return true; + } + else + return false; + break; + + case VEC_BCAST_PSLLW: + if (mode == V4SImode) + { + tmp1 = gen_reg_rtx (V8HImode); + emit_move_insn (tmp1, CONSTM1_RTX (V8HImode)); + tmp2 = gen_reg_rtx (V8HImode); + emit_insn (gen_ashlv8hi3 (tmp2, tmp1, GEN_INT (entry->arg))); + } + else if (mode == V8SImode && TARGET_AVX2) + { + tmp1 = gen_reg_rtx (V16HImode); + emit_move_insn (tmp1, CONSTM1_RTX (V16HImode)); + tmp2 = gen_reg_rtx (V16HImode); + emit_insn (gen_ashlv16hi3 (tmp2, tmp1, GEN_INT (entry->arg))); + } + else + return false; + break; + + case VEC_BCAST_PSLLD: + if (mode == V4SImode) + { + tmp1 = gen_reg_rtx (V4SImode); + emit_move_insn (tmp1, CONSTM1_RTX (V4SImode)); + emit_insn (gen_ashlv4si3 (target, tmp1, GEN_INT (entry->arg))); + return true; + } + else if (mode == V8SImode && TARGET_AVX2) + { + tmp1 = gen_reg_rtx (V8SImode); + emit_move_insn (tmp1, CONSTM1_RTX (V8SImode)); + emit_insn (gen_ashlv8si3 (target, tmp1, GEN_INT (entry->arg))); + return true; + } + else + return false; + + default: + return false; + } + + emit_move_insn (target, gen_lowpart (mode, tmp2)); + return true; +} + /* A subroutine of ix86_expand_vector_init_duplicate. Tries to fill target with val via vec_duplicate. */ @@ -15647,6 +15929,12 @@ ix86_vector_duplicate_value (machine_mode mode, rtx target, rtx val) bool ok; rtx_insn *insn; rtx dup; + + if ((mode == V4SImode || mode == V8SImode) + && CONST_INT_P (val) + && ix86_vector_duplicate_simode_const (mode, target, INTVAL (val))) + return true; + /* Save/restore recog_data in case this is called from splitters or other routines where recog_data needs to stay valid across force_reg. See PR106577. */ diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 8010532..da4a6dd 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -22076,6 +22076,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, *total = cost->fabs; else if (FLOAT_MODE_P (mode)) *total = ix86_vec_cost (mode, cost->sse_op); + else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT) + *total = cost->sse_op; return false; case SQRT: diff --git a/gcc/testsuite/gcc.target/i386/auto-init-8.c b/gcc/testsuite/gcc.target/i386/auto-init-8.c index 7023d72..666ee14 100644 --- a/gcc/testsuite/gcc.target/i386/auto-init-8.c +++ b/gcc/testsuite/gcc.target/i386/auto-init-8.c @@ -29,7 +29,7 @@ double foo() return result; } -/* { dg-final { scan-rtl-dump-times "0xfffffffffefefefe" 3 "expand" } } */ +/* { dg-final { scan-rtl-dump-times "0xfffffffffefefefe" 1 "expand" } } */ /* { dg-final { scan-rtl-dump-times "\\\[0xfefefefefefefefe\\\]" 2 "expand" } } */ /* { dg-final { scan-rtl-dump-times "0xfffffffffffffffe\\\]\\\) repeated x16" 2 "expand" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-13.c b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c index f431b8a..9902c81 100644 --- a/gcc/testsuite/gcc.target/i386/avx512fp16-13.c +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c @@ -126,7 +126,6 @@ abs256_ph (__m256h a) return _mm256_abs_ph (a); } -/* { dg-final { scan-assembler-times "vpbroadcastd\[^\n\]*%ymm\[0-9\]+" 1 { target { ! ia32 } } } } */ /* { dg-final { scan-assembler-times "vpand\[^\n\]*%ymm\[0-9\]+" 1 } } */ __m128h @@ -136,5 +135,4 @@ abs_ph (__m128h a) return _mm_abs_ph (a); } -/* { dg-final { scan-assembler-times "vpbroadcastd\[^\n\]*%xmm\[0-9\]+" 1 { target { ! ia32 } } } } */ /* { dg-final { scan-assembler-times "vpand\[^\n\]*%xmm\[0-9\]+" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9a.c b/gcc/testsuite/gcc.target/i386/pr100865-9a.c index f2ac1bd..91cfeda 100644 --- a/gcc/testsuite/gcc.target/i386/pr100865-9a.c +++ b/gcc/testsuite/gcc.target/i386/pr100865-9a.c @@ -18,7 +18,7 @@ foo (void) { int i; for (i = 0; i < sizeof (array) / sizeof (array[0]); i++) - array[i] = MK_CONST128_BROADCAST (0x1fff); + array[i] = MK_CONST128_BROADCAST (0x1234); } /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr106060-1.c b/gcc/testsuite/gcc.target/i386/pr106060-1.c new file mode 100644 index 0000000..a734d56 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr106060-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=x86-64-v3" } */ +#include + +__m256i +foo () +{ + /* shouldnt_have_movabs */ + return _mm256_set1_epi8 (123); +} + +/* { dg-final { scan-assembler-not "movabs" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr106060-2.c b/gcc/testsuite/gcc.target/i386/pr106060-2.c new file mode 100644 index 0000000..23933ab --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr106060-2.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=x86-64-v3" } */ +#include + +__m256i +foo () +{ + /* should_be_cmpeq_abs */ + return _mm256_set1_epi8 (1); +} + +/* { dg-final { scan-assembler "pcmpeq" } } */ +/* { dg-final { scan-assembler "pabsb" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr106060-3.c b/gcc/testsuite/gcc.target/i386/pr106060-3.c new file mode 100644 index 0000000..59c128c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr106060-3.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=x86-64-v3" } */ +#include + +__m256i +foo () +{ + /* should_be_cmpeq_add */ + return _mm256_set1_epi8 (-2); +} + +/* { dg-final { scan-assembler "pcmpeq" } } */ +/* { dg-final { scan-assembler "paddb" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/pr70314.c b/gcc/testsuite/gcc.target/i386/pr70314.c index aad8dd9..181d2b4 100644 --- a/gcc/testsuite/gcc.target/i386/pr70314.c +++ b/gcc/testsuite/gcc.target/i386/pr70314.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-march=skylake-avx512 -O2" } */ -/* { dg-final { scan-assembler-times "cmp" 2 } } */ +/* { dg-final { scan-assembler-times "cmp\[dq\]" 2 } } */ /* { dg-final { scan-assembler-not "and" } } */ typedef long vec __attribute__((vector_size(16))); diff --git a/gcc/testsuite/gcc.target/i386/vect-shiftv4qi.c b/gcc/testsuite/gcc.target/i386/vect-shiftv4qi.c index c6a6390..b7e45c2 100644 --- a/gcc/testsuite/gcc.target/i386/vect-shiftv4qi.c +++ b/gcc/testsuite/gcc.target/i386/vect-shiftv4qi.c @@ -28,7 +28,7 @@ __vu srl_c (__vu a) return a >> 5; } -/* { dg-final { scan-assembler-times "psrlw" 2 } } */ +/* { dg-final { scan-assembler-times "psrlw" 5 } } */ __vi sra (__vi a, int n) { diff --git a/gcc/testsuite/gcc.target/i386/vect-shiftv8qi.c b/gcc/testsuite/gcc.target/i386/vect-shiftv8qi.c index 244b0db..2471e6e 100644 --- a/gcc/testsuite/gcc.target/i386/vect-shiftv8qi.c +++ b/gcc/testsuite/gcc.target/i386/vect-shiftv8qi.c @@ -28,7 +28,7 @@ __vu srl_c (__vu a) return a >> 5; } -/* { dg-final { scan-assembler-times "psrlw" 2 } } */ +/* { dg-final { scan-assembler-times "psrlw" 5 } } */ __vi sra (__vi a, int n) {