From patchwork Tue Dec 5 02:29:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hongyu Wang X-Patchwork-Id: 173691 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp3176678vqy; Mon, 4 Dec 2023 18:41:59 -0800 (PST) X-Google-Smtp-Source: AGHT+IH9yy1BEd0xl+8NiBxtwDbXitp2kC6hfzjMR9jVbjfxQY+PvapQ6N5Uvfn56sst5T9h9xfn X-Received: by 2002:a05:622a:180d:b0:425:4043:18ab with SMTP id t13-20020a05622a180d00b00425404318abmr734310qtc.94.1701744119136; Mon, 04 Dec 2023 18:41:59 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1701744119; cv=pass; d=google.com; s=arc-20160816; b=mB7K2LjDK5qFavZRzoqy1DGSvbib5IT7pbJApxbebO5l4l0dnSPu9Ko5Sx0T7FZP/z SHybzNBDPUyAe5BcWabGs6LaqYKTepSgxnBOSYu9o0jEn4rEUoHb1yexUGtccwWP+yqV wV0rDJI61XV/dwqfOH0h2Z8UFNwelovsHCIM8QCcCPEIF2R/vCtcL4OhV16gqCry6ZA5 RHTx2CmJmHIFZ6XC2FJwug0tBKajhDlSq9WBCcyWLKxd8YOaFhkdZ4b34Kji4NHYtgDe HVynx8vMqI/wG9w9EPYwuvgyYzjbQrySrVZZ2WM9JqPDqYMLuo9ENt3RSIjM69T4THAg OlVw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature:arc-filter:dmarc-filter:delivered-to; bh=ncRz2VySLMpiiW1btngQ5xBQDo/Ly8x/tin+gWZT4pY=; fh=n8eNxIWSYJwy/CU3QSXzDvE/zeEoomCGojuOcYEQEyQ=; b=dkm6Ai2s/AXP5gJtRaeoNfqV0GVG8P4k2x6vP8wxHdvc0c6jF2rPu69jX7Iwp1y10S tpxRHwu+7TiGmwMwTWusty3U4789bwvF7pnWezajlyBx53tXyBCgARa5Hn40xlXEoXfN 3F2cC9itexC8vgzShRNH7kFvhUKTIVQE+GG9wXvKtCoj+vxwHhtXT2edpZo7eBfedhhx OHBbGVCTYiEkSV4Me2g7pN/sOnnZXJpSitWu01hxTkYpJ2B3q/awAriVDXoUNI/3BwRl T3uUyqRXtdrFYljeJ6/s0Fk/KNr6w/sgv9Ycy48mj8z66v0gC3f7apYQqrlMOqa7XnO3 /JHA== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=C1EDh8Qy; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id bw25-20020a05622a099900b00423807a611dsi11896541qtb.424.2023.12.04.18.41.59 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Dec 2023 18:41:59 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=C1EDh8Qy; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D93A03ABA6A6 for ; Tue, 5 Dec 2023 02:33:33 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from eggs.gnu.org (eggs.gnu.org [IPv6:2001:470:142:3::10]) by sourceware.org (Postfix) with ESMTPS id 556C13898C67 for ; Tue, 5 Dec 2023 02:31:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 556C13898C67 Authentication-Results: sourceware.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 556C13898C67 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2001:470:142:3::10 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701743503; cv=none; b=Zzwd2GGInipvadpkHy8zV5svD0vLyBXJWdpR9wSAmJ1kYgSJJLd6U0OP3u+Ab0pnRkTQfosbPThY25092OGqxb5yPFWIopRro83VGnX1WRnj0qlIwznNJMUhXjxu8Bwbvg3+h9vUPu7xiLskag7iGaO+zBTfdbVqxhyaya9iOKk= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701743503; c=relaxed/simple; bh=Ok+OyFV+zE2gERJkWqPUHnvBdcsFzSjivsaUoc8N/G8=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=E6FnFybytlkBlUUuFEcwfsUQTYLRwWshqdZYARQ4Gt9Z8aE98O9DiUuoOwikVU+hVrLe62Rcgv5JrDkmmO0BDb+IC2uqA3PHH+8jAZJ3Q4PtLiTngS10/Q/kOOtMLmYZOx7/d4jFtb9FjeBRUjy8Rv34C7OdF264czKLdkvVk6Q= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from mgamail.intel.com ([192.55.52.136]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rALDf-0001YN-30 for gcc-patches@gcc.gnu.org; Mon, 04 Dec 2023 21:31:33 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1701743491; x=1733279491; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Ok+OyFV+zE2gERJkWqPUHnvBdcsFzSjivsaUoc8N/G8=; b=C1EDh8QyLeA40kuJ2JFmtXk5nZQrn6yc2toITdDIkbIMLFjy1eYR9yzJ Neu3YQ40X7xjhFstaGpL10ssFs0k29kH8Gv+rtadxRZj3wK3AA7NyKFdd kH0hOecACaAnxx/pqukJj9g2yU0kCEqpvL7OqOp/dt/ZI8iyN72sfErdf yB5SxIhQKgu/QmI1ykmYlk5WZE+SJ6PQnpIRhXmcdiUdHVMHjWoPht10J 7/3+rN59FlFZtTfkvifmR4inYAkMoCdVZwqb/09qdSOOdqy3/RXSDoUgJ WxnMxxSP3f4cQcYJ7dabqBf5G1jP0iU1z23iztI7sb183+ukSWQLjXYuX w==; X-IronPort-AV: E=McAfee;i="6600,9927,10914"; a="373277819" X-IronPort-AV: E=Sophos;i="6.04,251,1695711600"; d="scan'208";a="373277819" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Dec 2023 18:29:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10914"; a="841275550" X-IronPort-AV: E=Sophos;i="6.04,251,1695711600"; d="scan'208";a="841275550" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by fmsmga004.fm.intel.com with ESMTP; 04 Dec 2023 18:29:54 -0800 Received: from shliclel4217.sh.intel.com (shliclel4217.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 69132100780E; Tue, 5 Dec 2023 10:29:48 +0800 (CST) From: Hongyu Wang To: gcc-patches@gcc.gnu.org Cc: ubizjak@gmail.com, hongtao.liu@intel.com Subject: [PATCH 17/17] [APX NDD] Support TImode shift for NDD Date: Tue, 5 Dec 2023 10:29:48 +0800 Message-Id: <20231205022948.504790-18-hongyu.wang@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20231205022948.504790-1-hongyu.wang@intel.com> References: <20231205022948.504790-1-hongyu.wang@intel.com> MIME-Version: 1.0 Received-SPF: softfail client-ip=192.55.52.136; envelope-from=wwwhhhyyy333@gmail.com; helo=mgamail.intel.com X-Spam_score_int: -28 X-Spam_score: -2.9 X-Spam_bar: -- X-Spam_report: (-2.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FORGED_FROMDOMAIN=0.249, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_SOFTFAIL=0.665, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_SHORT, SPF_HELO_PASS, SPF_SOFTFAIL, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784408041556245778 X-GMAIL-MSGID: 1784408041556245778 For TImode shifts, they are splitted by splitter functions, which assume operands[0] and operands[1] to be the same. For the NDD alternative the assumption may not be true so add split functions for NDD to emit the NDD form instructions, and omit the handling of !64bit target split. Although the NDD form allows memory src, for post-reload splitter there are no extra register to accept NDD form shift, especially shld/shrd. So only accept register alternative for shift src under NDD. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_split_ashl_ndd): New function to split NDD form lshift. (ix86_split_rshift_ndd): Likewise for l/ashiftrt. * config/i386/i386-protos.h (ix86_split_ashl_ndd): New prototype. (ix86_split_rshift_ndd): Likewise. * config/i386/i386.md (ashl3_doubleword): Add NDD alternative, call ndd split function when operands[0] not equal to operands[1]. (define_split for doubleword lshift): Likewise. (define_peephole for doubleword lshift): Likewise. (3_doubleword): Likewise for l/ashiftrt. (define_split for doubleword l/ashiftrt): Likewise. (define_peephole for doubleword l/ashiftrt): Likewise. gcc/ChangeLog: * gcc.target/i386/apx-ndd-ti-shift.c: New test. --- gcc/config/i386/i386-expand.cc | 136 ++++++++++++++++++ gcc/config/i386/i386-protos.h | 2 + gcc/config/i386/i386.md | 56 ++++++-- .../gcc.target/i386/apx-ndd-ti-shift.c | 91 ++++++++++++ 4 files changed, 273 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index d4bbd33ce07..a53d69d5400 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -6678,6 +6678,142 @@ ix86_split_lshr (rtx *operands, rtx scratch, machine_mode mode) } } +/* Helper function to split TImode ashl under NDD. */ +void +ix86_split_ashl_ndd (rtx *operands, rtx scratch) +{ + gcc_assert (TARGET_APX_NDD); + int half_width = GET_MODE_BITSIZE (TImode) >> 1; + + rtx low[2], high[2]; + int count; + + split_double_mode (TImode, operands, 2, low, high); + if (CONST_INT_P (operands[2])) + { + count = INTVAL (operands[2]) & (GET_MODE_BITSIZE (TImode) - 1); + + if (count >= half_width) + { + count = count - half_width; + if (count == 0) + { + if (!rtx_equal_p (high[0], low[1])) + emit_move_insn (high[0], low[1]); + } + else if (count == 1) + emit_insn (gen_adddi3 (high[0], low[1], low[1])); + else + emit_insn (gen_ashldi3 (high[0], low[1], GEN_INT (count))); + + ix86_expand_clear (low[0]); + } + else if (count == 1) + { + rtx x3 = gen_rtx_REG (CCCmode, FLAGS_REG); + rtx x4 = gen_rtx_LTU (TImode, x3, const0_rtx); + emit_insn (gen_add3_cc_overflow_1 (DImode, low[0], + low[1], low[1])); + emit_insn (gen_add3_carry (DImode, high[0], high[1], high[1], + x3, x4)); + } + else + { + emit_insn (gen_x86_64_shld_ndd (high[0], high[1], low[1], + GEN_INT (count))); + emit_insn (gen_ashldi3 (low[0], low[1], GEN_INT (count))); + } + } + else + { + emit_insn (gen_x86_64_shld_ndd (high[0], high[1], low[1], + operands[2])); + emit_insn (gen_ashldi3 (low[0], low[1], operands[2])); + if (TARGET_CMOVE && scratch) + { + ix86_expand_clear (scratch); + emit_insn (gen_x86_shift_adj_1 + (DImode, high[0], low[0], operands[2], scratch)); + } + else + emit_insn (gen_x86_shift_adj_2 (DImode, high[0], low[0], operands[2])); + } +} + +/* Helper function to split TImode l/ashr under NDD. */ +void +ix86_split_rshift_ndd (enum rtx_code code, rtx *operands, rtx scratch) +{ + gcc_assert (TARGET_APX_NDD); + int half_width = GET_MODE_BITSIZE (TImode) >> 1; + bool ashr_p = code == ASHIFTRT; + rtx (*gen_shr)(rtx, rtx, rtx) = ashr_p ? gen_ashrdi3 + : gen_lshrdi3; + + rtx low[2], high[2]; + int count; + + split_double_mode (TImode, operands, 2, low, high); + if (CONST_INT_P (operands[2])) + { + count = INTVAL (operands[2]) & (GET_MODE_BITSIZE (TImode) - 1); + + if (ashr_p && (count == GET_MODE_BITSIZE (TImode) - 1)) + { + emit_insn (gen_shr (high[0], high[1], + GEN_INT (half_width - 1))); + emit_move_insn (low[0], high[0]); + } + else if (count >= half_width) + { + if (ashr_p) + emit_insn (gen_shr (high[0], high[1], + GEN_INT (half_width - 1))); + else + ix86_expand_clear (high[0]); + + if (count > half_width) + emit_insn (gen_shr (low[0], high[1], + GEN_INT (count - half_width))); + else + emit_move_insn (low[0], high[1]); + } + else + { + emit_insn (gen_x86_64_shrd_ndd (low[0], low[1], high[1], + GEN_INT (count))); + emit_insn (gen_shr (high[0], high[1], GEN_INT (count))); + } + } + else + { + emit_insn (gen_x86_64_shrd_ndd (low[0], low[1], high[1], + operands[2])); + emit_insn (gen_shr (high[0], high[1], operands[2])); + + if (TARGET_CMOVE && scratch) + { + if (ashr_p) + { + emit_move_insn (scratch, high[0]); + emit_insn (gen_shr (scratch, scratch, + GEN_INT (half_width - 1))); + } + else + ix86_expand_clear (scratch); + + emit_insn (gen_x86_shift_adj_1 + (DImode, low[0], high[0], operands[2], scratch)); + } + else if (ashr_p) + emit_insn (gen_x86_shift_adj_3 + (DImode, low[0], high[0], operands[2])); + else + emit_insn (gen_x86_shift_adj_2 + (DImode, low[0], high[0], operands[2])); + } +} + /* Expand move of V1TI mode register X to a new TI mode register. */ static rtx ix86_expand_v1ti_to_ti (rtx x) diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index fa952409729..56349064a6c 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -174,8 +174,10 @@ extern void x86_initialize_trampoline (rtx, rtx, rtx); extern rtx ix86_zero_extend_to_Pmode (rtx); extern void ix86_split_long_move (rtx[]); extern void ix86_split_ashl (rtx *, rtx, machine_mode); +extern void ix86_split_ashl_ndd (rtx *, rtx); extern void ix86_split_ashr (rtx *, rtx, machine_mode); extern void ix86_split_lshr (rtx *, rtx, machine_mode); +extern void ix86_split_rshift_ndd (enum rtx_code, rtx *, rtx); extern void ix86_expand_v1ti_shift (enum rtx_code, rtx[]); extern void ix86_expand_v1ti_rotate (enum rtx_code, rtx[]); extern void ix86_expand_v1ti_ashiftrt (rtx[]); diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 853f53c2bb9..331dda89b29 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -14420,13 +14420,14 @@ (define_insn_and_split "*ashl3_doubleword_mask_1" }) (define_insn "ashl3_doubleword" - [(set (match_operand:DWI 0 "register_operand" "=&r") - (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n") - (match_operand:QI 2 "nonmemory_operand" "c"))) + [(set (match_operand:DWI 0 "register_operand" "=&r,r") + (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n,r") + (match_operand:QI 2 "nonmemory_operand" "c,c"))) (clobber (reg:CC FLAGS_REG))] "" "#" - [(set_attr "type" "multi")]) + [(set_attr "type" "multi") + (set_attr "isa" "*,apx_ndd")]) (define_split [(set (match_operand:DWI 0 "register_operand") @@ -14435,7 +14436,15 @@ (define_split (clobber (reg:CC FLAGS_REG))] "epilogue_completed" [(const_int 0)] - "ix86_split_ashl (operands, NULL_RTX, mode); DONE;") +{ + if (TARGET_APX_NDD + && !rtx_equal_p (operands[0], operands[1]) + && REG_P (operands[1])) + ix86_split_ashl_ndd (operands, NULL_RTX); + else + ix86_split_ashl (operands, NULL_RTX, mode); + DONE; +}) ;; By default we don't ask for a scratch register, because when DWImode ;; values are manipulated, registers are already at a premium. But if @@ -14451,7 +14460,15 @@ (define_peephole2 (match_dup 3)] "TARGET_CMOVE" [(const_int 0)] - "ix86_split_ashl (operands, operands[3], mode); DONE;") +{ + if (TARGET_APX_NDD + && !rtx_equal_p (operands[0], operands[1]) + && (REG_P (operands[1]))) + ix86_split_ashl_ndd (operands, operands[3]); + else + ix86_split_ashl (operands, operands[3], mode); + DONE; +}) (define_insn_and_split "*ashl3_doubleword_highpart" [(set (match_operand: 0 "register_operand" "=r") @@ -15708,16 +15725,24 @@ (define_insn_and_split "*3_doubleword_mask_1" }) (define_insn_and_split "3_doubleword" - [(set (match_operand:DWI 0 "register_operand" "=&r") - (any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0") - (match_operand:QI 2 "nonmemory_operand" "c"))) + [(set (match_operand:DWI 0 "register_operand" "=&r,r") + (any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0,r") + (match_operand:QI 2 "nonmemory_operand" "c,c"))) (clobber (reg:CC FLAGS_REG))] "" "#" "epilogue_completed" [(const_int 0)] - "ix86_split_ (operands, NULL_RTX, mode); DONE;" - [(set_attr "type" "multi")]) +{ + if (TARGET_APX_NDD + && !rtx_equal_p (operands[0], operands[1])) + ix86_split_rshift_ndd (, operands, NULL_RTX); + else + ix86_split_ (operands, NULL_RTX, mode); + DONE; +} + [(set_attr "type" "multi") + (set_attr "isa" "*,apx_ndd")]) ;; By default we don't ask for a scratch register, because when DWImode ;; values are manipulated, registers are already at a premium. But if @@ -15733,7 +15758,14 @@ (define_peephole2 (match_dup 3)] "TARGET_CMOVE" [(const_int 0)] - "ix86_split_ (operands, operands[3], mode); DONE;") +{ + if (TARGET_APX_NDD + && !rtx_equal_p (operands[0], operands[1])) + ix86_split_rshift_ndd (, operands, operands[3]); + else + ix86_split_ (operands, operands[3], mode); + DONE; +}) ;; Split truncations of double word right shifts into x86_shrd_1. (define_insn_and_split "3_doubleword_lowpart" diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c b/gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c new file mode 100644 index 00000000000..0489712b7f6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/apx-ndd-ti-shift.c @@ -0,0 +1,91 @@ +/* { dg-do run { target { int128 && { ! ia32 } } } } */ +/* { dg-require-effective-target apxf } */ +/* { dg-options "-O2" } */ + +#include + +#define APX_TARGET __attribute__((noinline, target("apxf"))) +#define NO_APX __attribute__((noinline, target("no-apxf"))) +typedef __uint128_t u128; +typedef __int128 i128; + +#define TI_SHIFT_FUNC(TYPE, op, name) \ +APX_TARGET \ +TYPE apx_##name##TYPE (TYPE a, char b) \ +{ \ + return a op b; \ +} \ +TYPE noapx_##name##TYPE (TYPE a, char b) \ +{ \ + return a op b; \ +} \ + +#define TI_SHIFT_FUNC_CONST(TYPE, i, op, name) \ +APX_TARGET \ +TYPE apx_##name##TYPE##_const (TYPE a) \ +{ \ + return a op i; \ +} \ +NO_APX \ +TYPE noapx_##name##TYPE##_const (TYPE a) \ +{ \ + return a op i; \ +} + +#define TI_SHIFT_TEST(TYPE, name, val) \ +{\ + if (apx_##name##TYPE (val, b) != noapx_##name##TYPE (val, b)) \ + abort (); \ +} + +#define TI_SHIFT_CONST_TEST(TYPE, name, val) \ +{\ + if (apx_##name##1##TYPE##_const (val) \ + != noapx_##name##1##TYPE##_const (val)) \ + abort (); \ + if (apx_##name##2##TYPE##_const (val) \ + != noapx_##name##2##TYPE##_const (val)) \ + abort (); \ + if (apx_##name##3##TYPE##_const (val) \ + != noapx_##name##3##TYPE##_const (val)) \ + abort (); \ + if (apx_##name##4##TYPE##_const (val) \ + != noapx_##name##4##TYPE##_const (val)) \ + abort (); \ +} + +TI_SHIFT_FUNC(i128, <<, ashl) +TI_SHIFT_FUNC(i128, >>, ashr) +TI_SHIFT_FUNC(u128, >>, lshr) + +TI_SHIFT_FUNC_CONST(i128, 1, <<, ashl1) +TI_SHIFT_FUNC_CONST(i128, 65, <<, ashl2) +TI_SHIFT_FUNC_CONST(i128, 64, <<, ashl3) +TI_SHIFT_FUNC_CONST(i128, 87, <<, ashl4) +TI_SHIFT_FUNC_CONST(i128, 127, >>, ashr1) +TI_SHIFT_FUNC_CONST(i128, 87, >>, ashr2) +TI_SHIFT_FUNC_CONST(i128, 27, >>, ashr3) +TI_SHIFT_FUNC_CONST(i128, 64, >>, ashr4) +TI_SHIFT_FUNC_CONST(u128, 127, >>, lshr1) +TI_SHIFT_FUNC_CONST(u128, 87, >>, lshr2) +TI_SHIFT_FUNC_CONST(u128, 27, >>, lshr3) +TI_SHIFT_FUNC_CONST(u128, 64, >>, lshr4) + +int main (void) +{ + if (!__builtin_cpu_supports ("apxf")) + return 0; + + u128 ival = 0x123456788765432FLL; + u128 uval = 0xF234567887654321ULL; + char b = 28; + + TI_SHIFT_TEST(i128, ashl, ival) + TI_SHIFT_TEST(i128, ashr, ival) + TI_SHIFT_TEST(u128, lshr, uval) + TI_SHIFT_CONST_TEST(i128, ashl, ival) + TI_SHIFT_CONST_TEST(i128, ashr, ival) + TI_SHIFT_CONST_TEST(u128, lshr, uval) + + return 0; +}