From patchwork Wed Nov 15 09:47:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hongyu Wang X-Patchwork-Id: 165243 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b909:0:b0:403:3b70:6f57 with SMTP id t9csp2430214vqg; Wed, 15 Nov 2023 01:52:09 -0800 (PST) X-Google-Smtp-Source: AGHT+IGY4UuAWgHn+9C52hnymXJJTcp5I09/8hpyqbSy5U4gU6Z+IXEjrhSkdWnJNeJiR9GxaWrb X-Received: by 2002:a67:c196:0:b0:45d:9d27:215c with SMTP id h22-20020a67c196000000b0045d9d27215cmr12059848vsj.1.1700041929168; Wed, 15 Nov 2023 01:52:09 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1700041929; cv=pass; d=google.com; s=arc-20160816; b=uRMVWV0zi4bcePc5Hwon7NznzOOJWk+HUeL5u7s0qeuUtyaU3DXVGOE2yKSQTYKDgP HZnBQMI/SWpfjqe5TcuiaVeOJuKSvQAQixnaM7EvvruE0YhutavjdSHZXu10pu5SOlgt 5NL4Q2XknuPVGnQ5vVvDLfe2JWcqz5X6Ly9PoJYpK11q96PgQerwKsRoX11UEPjgCWls plIx7VeGYNPEg/R16vwZl/CepQ8nHXhKwwlRvWLuWFDXL+9g3CiPIGYpBs+sQlUJlLlv 5UJNBFdlgMVYGdKuEyYh8cZnUwjwsHWU4i5dJYtf/jdAHsZe011/An6CwNB4wj4SoRfv obXg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature:arc-filter:dmarc-filter:delivered-to; bh=rWW3bnQHalUWOXB/lhUNdRjYEftq3ToxFX1moPfjUes=; fh=n8eNxIWSYJwy/CU3QSXzDvE/zeEoomCGojuOcYEQEyQ=; b=aMLPY5Afa6o1JKKrOJRBwVT7CtERQKd8sJg39QF8UAH8JAeBBWxOSNstCE3insheZL GGpmA61HLKXVlW1InEEhlm+16lk3VYAjArj/1Ga/DyOe4j3ZJgNV3Sn4Yp31jChP5cNR a2cPL/UE72rk3scu8Qfz4UakJ34hD8h/cJAiVJ37AigYLjN/A1kmU4HTe9b6PmICvr1W /3WTcoJQ4SDf3Vzlwo8Mr6IKbPJzJ9G9l0lABf5SV/xPzIhlGx17HIcuSy+jvC3Hsu7v 36+fDKaCFprdur9tYetwy8awvsUvsatqbOdj8FUaf/BbWPC0vt2XL0uWsdIExkO5G9WP kk4w== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=lmoe3uQx; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id k10-20020a05622a03ca00b004053819f665si8450238qtx.608.2023.11.15.01.52.09 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 15 Nov 2023 01:52:09 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=lmoe3uQx; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D6AA63858283 for ; Wed, 15 Nov 2023 09:52:08 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by sourceware.org (Postfix) with ESMTPS id 080A63857C4B for ; Wed, 15 Nov 2023 09:47:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 080A63857C4B Authentication-Results: sourceware.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 080A63857C4B Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=134.134.136.100 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700041645; cv=none; b=IlGxhfxGshgNpmKqn05kDZgz8pxb7DgqAyLhyuiCxKuS+9nwgZ+/GcFs4w01YCvNJSNZs9j/fPvGpDx02rBDNMi4N05k/CdxS9CZuZG0zgBVFnr+CCAEuTzz3DHFytvBBH+ZvA4tWdftz7gKnS5vWZZDyI8hKa7XkMPeW9B6/eM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700041645; c=relaxed/simple; bh=2be3W85hD1jUMYCdJoxqreOSwF5889G+qsLBJdr1Vao=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=wsUn8+tCOZOU+oVVR7D1nl6ZjaKo6tus0r6hE2//KJWcUgdDu0kVIWiu2imSPO/Bykahy+D8iRX7Tq7WLxepxAK3cCMfCr0BCud+c+KlYHRye/Ax+hsE/FrrAuE2s5WTzFiZyD09AWvlOuxxbSMpfOucZg8oUVy2sWRlkutptCk= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700041643; x=1731577643; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2be3W85hD1jUMYCdJoxqreOSwF5889G+qsLBJdr1Vao=; b=lmoe3uQx/QA/4vlnocRB6s3Fr4wjnUIe+G3lPjtrho5ViNn7tKdYuUDf ZTsROL9Ts4RhOGIMYv+HBQaejL1R72Ze+kD4v6/RNkJFAWCFPAcRjXUqh 3AJd+avt09xk8tG9T2JVVVfF76nzXoSDBFFFEOuNc6XxiOGPm22ChHNYj /d9TwBUAAIew0R/mtZFLGjWSzRKlKGBxQ+lIrkhWADXWj0Hc4ZLvh9wEQ QUMvHwGeQyjrDOSwdbrXlewrRISGjRlOoEV3BwkcZq/rR48+OwDn4rzBm ZGpnrnx+Zkh96iAPokuEZkDBSGfjJCfsAC0gsnKdj7YSlwaa+xpHzW9jI w==; X-IronPort-AV: E=McAfee;i="6600,9927,10894"; a="457342620" X-IronPort-AV: E=Sophos;i="6.03,304,1694761200"; d="scan'208";a="457342620" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Nov 2023 01:47:16 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10894"; a="938431722" X-IronPort-AV: E=Sophos;i="6.03,304,1694761200"; d="scan'208";a="938431722" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by orsmga005.jf.intel.com with ESMTP; 15 Nov 2023 01:47:12 -0800 Received: from shliclel4217.sh.intel.com (shliclel4217.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id CCD791005686; Wed, 15 Nov 2023 17:47:05 +0800 (CST) From: Hongyu Wang To: gcc-patches@gcc.gnu.org Cc: ubizjak@gmail.com, hongtao.liu@intel.com Subject: [PATCH 15/16] [APX NDD] Support APX NDD for shld/shrd insns Date: Wed, 15 Nov 2023 17:47:04 +0800 Message-Id: <20231115094705.3976553-16-hongyu.wang@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20231115094705.3976553-1-hongyu.wang@intel.com> References: <20231115094705.3976553-1-hongyu.wang@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_SHORT, SPF_HELO_NONE, SPF_SOFTFAIL, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1782623165451334275 X-GMAIL-MSGID: 1782623165451334275 For shld/shrd insns, the old pattern use match_dup 0 as its shift src and use +r*m as its constraint. To support NDD we added new define_insns to handle NDD form pattern with extra input and dest operand to be fixed in register. gcc/ChangeLog: * config/i386/i386.md (x86_64_shld_ndd): New define_insn. (x86_64_shld_ndd_1): Likewise. (*x86_64_shld_ndd_2): Likewise. (x86_shld_ndd): Likewise. (x86_shld_ndd_1): Likewise. (*x86_shld_ndd_2): Likewise. (x86_64_shrd_ndd): Likewise. (x86_64_shrd_ndd_1): Likewise. (*x86_64_shrd_ndd_2): Likewise. (x86_shrd_ndd): Likewise. (x86_shrd_ndd_1): Likewise. (*x86_shrd_ndd_2): Likewise. (*x86_64_shld_shrd_1_nozext): Adjust codegen under TARGET_APX_NDD. (*x86_shld_shrd_1_nozext): Likewise. (*x86_64_shrd_shld_1_nozext): Likewise. (*x86_shrd_shld_1_nozext): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-ndd-shld-shrd.c: New test. --- gcc/config/i386/i386.md | 323 +++++++++++++++++- .../gcc.target/i386/apx-ndd-shld-shrd.c | 24 ++ 2 files changed, 345 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 760c0d32f4d..2e3d37d08b0 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -14183,6 +14183,24 @@ (define_insn "x86_64_shld" (set_attr "amdfam10_decode" "vector") (set_attr "bdver1_decode" "vector")]) +(define_insn "x86_64_shld_ndd" + [(set (match_operand:DI 0 "register_operand" "=r") + (ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand" "rm") + (and:QI (match_operand:QI 3 "nonmemory_operand" "Jc") + (const_int 63))) + (subreg:DI + (lshiftrt:TI + (zero_extend:TI + (match_operand:DI 2 "register_operand" "r")) + (minus:QI (const_int 64) + (and:QI (match_dup 3) (const_int 63)))) 0))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT && TARGET_APX_NDD" + "shld{q}\t{%s3%2, %1, %0|%0, %1, %2, %3}" + [(set_attr "type" "ishift") + (set_attr "prefix_0f" "1") + (set_attr "mode" "DI")]) + (define_insn "x86_64_shld_1" [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m") (ior:DI (ashift:DI (match_dup 0) @@ -14204,6 +14222,24 @@ (define_insn "x86_64_shld_1" (set_attr "amdfam10_decode" "vector") (set_attr "bdver1_decode" "vector")]) +(define_insn "x86_64_shld_ndd_1" + [(set (match_operand:DI 0 "register_operand" "=r") + (ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand" "rm") + (match_operand:QI 3 "const_0_to_63_operand")) + (subreg:DI + (lshiftrt:TI + (zero_extend:TI + (match_operand:DI 2 "register_operand" "r")) + (match_operand:QI 4 "const_0_to_255_operand")) 0))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT && TARGET_APX_NDD + && INTVAL (operands[4]) == 64 - INTVAL (operands[3])" + "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}" + [(set_attr "type" "ishift") + (set_attr "mode" "DI") + (set_attr "length_immediate" "1")]) + + (define_insn_and_split "*x86_64_shld_shrd_1_nozext" [(set (match_operand:DI 0 "nonimmediate_operand") (ior:DI (ashift:DI (match_operand:DI 4 "nonimmediate_operand") @@ -14229,6 +14265,23 @@ (define_insn_and_split "*x86_64_shld_shrd_1_nozext" operands[4] = force_reg (DImode, operands[4]); emit_insn (gen_x86_64_shrd_1 (operands[0], operands[4], operands[3], operands[2])); } + else if (TARGET_APX_NDD) + { + rtx tmp = gen_reg_rtx (DImode); + if (MEM_P (operands[4])) + { + operands[1] = force_reg (DImode, operands[1]); + emit_insn (gen_x86_64_shld_ndd_1 (tmp, operands[4], operands[1], + operands[2], operands[3])); + } + else if (MEM_P (operands[1])) + emit_insn (gen_x86_64_shrd_ndd_1 (tmp, operands[1], operands[4], + operands[3], operands[2])); + else + emit_insn (gen_x86_64_shld_ndd_1 (tmp, operands[4], operands[1], + operands[2], operands[3])); + emit_move_insn (operands[0], tmp); + } else { operands[1] = force_reg (DImode, operands[1]); @@ -14261,6 +14314,33 @@ (define_insn_and_split "*x86_64_shld_2" (const_int 63)))) 0))) (clobber (reg:CC FLAGS_REG))])]) +(define_insn_and_split "*x86_64_shld_ndd_2" + [(set (match_operand:DI 0 "nonimmediate_operand") + (ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand") + (match_operand:QI 3 "nonmemory_operand")) + (lshiftrt:DI (match_operand:DI 2 "register_operand") + (minus:QI (const_int 64) (match_dup 3))))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT && TARGET_APX_NDD + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel [(set (match_dup 4) + (ior:DI (ashift:DI (match_dup 1) + (and:QI (match_dup 3) (const_int 63))) + (subreg:DI + (lshiftrt:TI + (zero_extend:TI (match_dup 2)) + (minus:QI (const_int 64) + (and:QI (match_dup 3) + (const_int 63)))) 0))) + (clobber (reg:CC FLAGS_REG)) + (set (match_dup 0) (match_dup 4))])] +{ + operands[4] = gen_reg_rtx (DImode); + emit_move_insn (operands[4], operands[0]); +}) + (define_insn "x86_shld" [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m") (ior:SI (ashift:SI (match_dup 0) @@ -14283,6 +14363,24 @@ (define_insn "x86_shld" (set_attr "amdfam10_decode" "vector") (set_attr "bdver1_decode" "vector")]) +(define_insn "x86_shld_ndd" + [(set (match_operand:SI 0 "nonimmediate_operand" "=r") + (ior:SI (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "rm") + (and:QI (match_operand:QI 3 "nonmemory_operand" "Ic") + (const_int 31))) + (subreg:SI + (lshiftrt:DI + (zero_extend:DI + (match_operand:SI 2 "register_operand" "r")) + (minus:QI (const_int 32) + (and:QI (match_dup 3) (const_int 31)))) 0))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_APX_NDD" + "shld{l}\t{%s3%2, %1, %0|%0, %1, %2, %3}" + [(set_attr "type" "ishift") + (set_attr "mode" "SI")]) + + (define_insn "x86_shld_1" [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m") (ior:SI (ashift:SI (match_dup 0) @@ -14304,6 +14402,24 @@ (define_insn "x86_shld_1" (set_attr "amdfam10_decode" "vector") (set_attr "bdver1_decode" "vector")]) +(define_insn "x86_shld_ndd_1" + [(set (match_operand:SI 0 "register_operand" "=r") + (ior:SI (ashift:SI (match_operand:SI 1 "nonimmediate_operand" "rm") + (match_operand:QI 3 "const_0_to_31_operand")) + (subreg:SI + (lshiftrt:DI + (zero_extend:DI + (match_operand:SI 2 "register_operand" "r")) + (match_operand:QI 4 "const_0_to_63_operand")) 0))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_APX_NDD && + INTVAL (operands[4]) == 32 - INTVAL (operands[3])" + "shld{l}\t{%3, %2, %1, %0|%0, %1, %2, %3}" + [(set_attr "type" "ishift") + (set_attr "length_immediate" "1") + (set_attr "mode" "SI")]) + + (define_insn_and_split "*x86_shld_shrd_1_nozext" [(set (match_operand:SI 0 "nonimmediate_operand") (ior:SI (ashift:SI (match_operand:SI 4 "nonimmediate_operand") @@ -14328,7 +14444,24 @@ (define_insn_and_split "*x86_shld_shrd_1_nozext" operands[4] = force_reg (SImode, operands[4]); emit_insn (gen_x86_shrd_1 (operands[0], operands[4], operands[3], operands[2])); } - else + else if (TARGET_APX_NDD) + { + rtx tmp = gen_reg_rtx (SImode); + if (MEM_P (operands[4])) + { + operands[1] = force_reg (SImode, operands[1]); + emit_insn (gen_x86_shld_ndd_1 (tmp, operands[4], operands[1], + operands[2], operands[3])); + } + else if (MEM_P (operands[1])) + emit_insn (gen_x86_shrd_ndd_1 (tmp, operands[1], operands[4], + operands[3], operands[2])); + else + emit_insn (gen_x86_shld_ndd_1 (tmp, operands[4], operands[1], + operands[2], operands[3])); + emit_move_insn (operands[0], tmp); + } + else { operands[1] = force_reg (SImode, operands[1]); rtx tmp = gen_reg_rtx (SImode); @@ -14360,6 +14493,33 @@ (define_insn_and_split "*x86_shld_2" (const_int 31)))) 0))) (clobber (reg:CC FLAGS_REG))])]) +(define_insn_and_split "*x86_shld_ndd_2" + [(set (match_operand:SI 0 "nonimmediate_operand") + (ior:SI (ashift:SI (match_operand:SI 1 "nonimmediate_operand") + (match_operand:QI 3 "nonmemory_operand")) + (lshiftrt:SI (match_operand:SI 2 "register_operand") + (minus:QI (const_int 32) (match_dup 3))))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT && TARGET_APX_NDD + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel [(set (match_dup 4) + (ior:SI (ashift:SI (match_dup 1) + (and:QI (match_dup 3) (const_int 31))) + (subreg:SI + (lshiftrt:DI + (zero_extend:DI (match_dup 2)) + (minus:QI (const_int 32) + (and:QI (match_dup 3) + (const_int 31)))) 0))) + (clobber (reg:CC FLAGS_REG)) + (set (match_dup 0) (match_dup 4))])] +{ + operands[4] = gen_reg_rtx (SImode); + emit_move_insn (operands[4], operands[0]); +}) + (define_expand "@x86_shift_adj_1" [(set (reg:CCZ FLAGS_REG) (compare:CCZ (and:QI (match_operand:QI 2 "register_operand") @@ -15308,6 +15468,24 @@ (define_insn "x86_64_shrd" (set_attr "amdfam10_decode" "vector") (set_attr "bdver1_decode" "vector")]) +(define_insn "x86_64_shrd_ndd" + [(set (match_operand:DI 0 "register_operand" "=r") + (ior:DI (lshiftrt:DI (match_operand:DI 1 "nonimmediate_operand" "rm") + (and:QI (match_operand:QI 3 "nonmemory_operand" "Jc") + (const_int 63))) + (subreg:DI + (ashift:TI + (zero_extend:TI + (match_operand:DI 2 "register_operand" "r")) + (minus:QI (const_int 64) + (and:QI (match_dup 3) (const_int 63)))) 0))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT && TARGET_APX_NDD" + "shrd{q}\t{%s3%2, %1, %0|%0, %1, %2, %3}" + [(set_attr "type" "ishift") + (set_attr "mode" "DI")]) + + (define_insn "x86_64_shrd_1" [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m") (ior:DI (lshiftrt:DI (match_dup 0) @@ -15329,6 +15507,24 @@ (define_insn "x86_64_shrd_1" (set_attr "amdfam10_decode" "vector") (set_attr "bdver1_decode" "vector")]) +(define_insn "x86_64_shrd_ndd_1" + [(set (match_operand:DI 0 "register_operand" "=r") + (ior:DI (lshiftrt:DI (match_operand:DI 1 "nonimmediate_operand" "rm") + (match_operand:QI 3 "const_0_to_63_operand")) + (subreg:DI + (ashift:TI + (zero_extend:TI + (match_operand:DI 2 "register_operand" "r")) + (match_operand:QI 4 "const_0_to_255_operand")) 0))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT && TARGET_APX_NDD + && INTVAL (operands[4]) == 64 - INTVAL (operands[3])" + "shrd{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}" + [(set_attr "type" "ishift") + (set_attr "length_immediate" "1") + (set_attr "mode" "DI")]) + + (define_insn_and_split "*x86_64_shrd_shld_1_nozext" [(set (match_operand:DI 0 "nonimmediate_operand") (ior:DI (lshiftrt:DI (match_operand:DI 4 "nonimmediate_operand") @@ -15354,6 +15550,23 @@ (define_insn_and_split "*x86_64_shrd_shld_1_nozext" operands[4] = force_reg (DImode, operands[4]); emit_insn (gen_x86_64_shld_1 (operands[0], operands[4], operands[3], operands[2])); } + else if (TARGET_APX_NDD) + { + rtx tmp = gen_reg_rtx (DImode); + if (MEM_P (operands[4])) + { + operands[1] = force_reg (DImode, operands[1]); + emit_insn (gen_x86_64_shrd_ndd_1 (tmp, operands[4], operands[1], + operands[2], operands[3])); + } + else if (MEM_P (operands[1])) + emit_insn (gen_x86_64_shld_ndd_1 (tmp, operands[1], operands[4], + operands[3], operands[2])); + else + emit_insn (gen_x86_64_shrd_ndd_1 (tmp, operands[4], operands[1], + operands[2], operands[3])); + emit_move_insn (operands[0], tmp); + } else { operands[1] = force_reg (DImode, operands[1]); @@ -15386,6 +15599,33 @@ (define_insn_and_split "*x86_64_shrd_2" (const_int 63)))) 0))) (clobber (reg:CC FLAGS_REG))])]) +(define_insn_and_split "*x86_64_shrd_ndd_2" + [(set (match_operand:DI 0 "nonimmediate_operand") + (ior:DI (lshiftrt:DI (match_operand:DI 1 "nonimmediate_operand") + (match_operand:QI 3 "nonmemory_operand")) + (ashift:DI (match_operand:DI 2 "register_operand") + (minus:QI (const_int 64) (match_dup 2))))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT && TARGET_APX_NDD + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel [(set (match_dup 4) + (ior:DI (lshiftrt:DI (match_dup 1) + (and:QI (match_dup 3) (const_int 63))) + (subreg:DI + (ashift:TI + (zero_extend:TI (match_dup 2)) + (minus:QI (const_int 64) + (and:QI (match_dup 3) + (const_int 63)))) 0))) + (clobber (reg:CC FLAGS_REG)) + (set (match_dup 0) (match_dup 4))])] +{ + operands[4] = gen_reg_rtx (DImode); + emit_move_insn (operands[4], operands[0]); +}) + (define_insn "x86_shrd" [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m") (ior:SI (lshiftrt:SI (match_dup 0) @@ -15408,6 +15648,23 @@ (define_insn "x86_shrd" (set_attr "amdfam10_decode" "vector") (set_attr "bdver1_decode" "vector")]) +(define_insn "x86_shrd_ndd" + [(set (match_operand:SI 0 "register_operand" "=r") + (ior:SI (lshiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "rm") + (and:QI (match_operand:QI 3 "nonmemory_operand" "Ic") + (const_int 31))) + (subreg:SI + (ashift:DI + (zero_extend:DI + (match_operand:SI 2 "register_operand" "r")) + (minus:QI (const_int 32) + (and:QI (match_dup 3) (const_int 31)))) 0))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_APX_NDD" + "shrd{l}\t{%s3%2, %1, %0|%0, %1, %2, %3}" + [(set_attr "type" "ishift") + (set_attr "mode" "SI")]) + (define_insn "x86_shrd_1" [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m") (ior:SI (lshiftrt:SI (match_dup 0) @@ -15429,6 +15686,24 @@ (define_insn "x86_shrd_1" (set_attr "amdfam10_decode" "vector") (set_attr "bdver1_decode" "vector")]) +(define_insn "x86_shrd_ndd_1" + [(set (match_operand:SI 0 "register_operand" "=r") + (ior:SI (lshiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "rm") + (match_operand:QI 3 "const_0_to_31_operand")) + (subreg:SI + (ashift:DI + (zero_extend:DI + (match_operand:SI 2 "register_operand" "r")) + (match_operand:QI 4 "const_0_to_63_operand")) 0))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_APX_NDD + && (INTVAL (operands[4]) == 32 - INTVAL (operands[3]))" + "shrd{l}\t{%3, %2, %1, %0|%0, %1, %2, %3}" + [(set_attr "type" "ishift") + (set_attr "length_immediate" "1") + (set_attr "mode" "SI")]) + + (define_insn_and_split "*x86_shrd_shld_1_nozext" [(set (match_operand:SI 0 "nonimmediate_operand") (ior:SI (lshiftrt:SI (match_operand:SI 4 "nonimmediate_operand") @@ -15453,7 +15728,24 @@ (define_insn_and_split "*x86_shrd_shld_1_nozext" operands[4] = force_reg (SImode, operands[4]); emit_insn (gen_x86_shld_1 (operands[0], operands[4], operands[3], operands[2])); } - else + else if (TARGET_APX_NDD) + { + rtx tmp = gen_reg_rtx (SImode); + if (MEM_P (operands[4])) + { + operands[1] = force_reg (SImode, operands[1]); + emit_insn (gen_x86_shrd_ndd_1 (tmp, operands[4], operands[1], + operands[2], operands[3])); + } + else if (MEM_P (operands[1])) + emit_insn (gen_x86_shld_ndd_1 (tmp, operands[1], operands[4], + operands[3], operands[2])); + else + emit_insn (gen_x86_shrd_ndd_1 (tmp, operands[4], operands[1], + operands[2], operands[3])); + emit_move_insn (operands[0], tmp); + } + else { operands[1] = force_reg (SImode, operands[1]); rtx tmp = gen_reg_rtx (SImode); @@ -15485,6 +15777,33 @@ (define_insn_and_split "*x86_shrd_2" (const_int 31)))) 0))) (clobber (reg:CC FLAGS_REG))])]) +(define_insn_and_split "*x86_shrd_ndd_2" + [(set (match_operand:SI 0 "nonimmediate_operand") + (ior:SI (lshiftrt:SI (match_operand:SI 1 "nonimmediate_operand") + (match_operand:QI 3 "nonmemory_operand")) + (ashift:SI (match_operand:SI 2 "register_operand") + (minus:QI (const_int 32) (match_dup 3))))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT && TARGET_APX_NDD + && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel [(set (match_dup 4) + (ior:SI (lshiftrt:SI (match_dup 1) + (and:QI (match_dup 3) (const_int 31))) + (subreg:SI + (ashift:DI + (zero_extend:DI (match_dup 2)) + (minus:QI (const_int 32) + (and:QI (match_dup 3) + (const_int 31)))) 0))) + (clobber (reg:CC FLAGS_REG)) + (set (match_dup 0) (match_dup 4))])] +{ + operands[4] = gen_reg_rtx (SImode); + emit_move_insn (operands[4], operands[0]); +}) + ;; Base name for insn mnemonic. (define_mode_attr cvt_mnemonic [(SI "{cltd|cdq}") (DI "{cqto|cqo}")]) diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c b/gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c new file mode 100644 index 00000000000..87068ea31aa --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c @@ -0,0 +1,24 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -Wno-shift-count-overflow -m64 -mapxf" } */ +/* { dg-final { scan-assembler-times {(?n)shld[ql]?[\t ]*\$2} 4 } } */ +/* { dg-final { scan-assembler-times {(?n)shrd[ql]?[\t ]*\$2} 4 } } */ + +typedef unsigned long u64; +typedef unsigned int u32; + +long a; +int c; +const char n = 2; + +long test64r (long e) { long t = ((u64)a >> n) | (e << (64 - n)); return t;} +long test64l (u64 e) { long t = (a << n) | (e >> (64 - n)); return t;} +int test32r (int f) { int t = ((u32)c >> n) | (f << (32 - n)); return t; } +int test32l (u32 f) { int t = (c << n) | (f >> (32 - n)); return t; } + +u64 ua; +u32 uc; + +u64 testu64l (u64 ue) { u64 ut = (ua << n) | (ue >> (64 - n)); return ut; } +u64 testu64r (u64 ue) { u64 ut = (ua >> n) | (ue << (64 - n)); return ut; } +u32 testu32l (u32 uf) { u32 ut = (uc << n) | (uf >> (32 - n)); return ut; } +u32 testu32r (u32 uf) { u32 ut = (uc >> n) | (uf << (32 - n)); return ut; }