From patchwork Wed Dec 6 07:04:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 174335 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp3929225vqy; Tue, 5 Dec 2023 23:05:36 -0800 (PST) X-Google-Smtp-Source: AGHT+IFtKs5JxnxV602ubnauhoN/lmkTowe9qdr2H7K0t7/UNiWn5SHLgilyJcMi1VVrmbGy9myz X-Received: by 2002:a05:620a:8224:b0:77e:fba3:93ab with SMTP id ow36-20020a05620a822400b0077efba393abmr457167qkn.141.1701846336575; Tue, 05 Dec 2023 23:05:36 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1701846336; cv=pass; d=google.com; s=arc-20160816; b=EOS3tdpmYuep+dKR24AK8pBzM7YjPQ2brvK6JX3CyeBNiRyJvonHC6jPM8P9790xIG n+ZkAgTjUVATkEtVxF9el2LQA6okeusmX5VUeyexQmzbqAacPzK9N1N8BxCrLLE9pTsq /IM2rDoZL+CJPdyugndMaRGI3xYPG4CGv2rpj3/zyJfZDDaqzc9XOz5ag8voplriCDEt CZciK4s5sLJn6nGkP1tlHtDmEmSbhDqf9YvFjd4N1vyIraD2fLS0yDcu19JHm89mEIi+ jb6+Os3Vk0U1DfA5UOFir3yuU7FOwvgOynNK3VJlz6/Or8O1usoxuN9B18rPVzaRPw/8 6dDg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:arc-filter:dmarc-filter:delivered-to; bh=WKpr78abrEMQ6qcOSChxGeGvjJ/gw0ZJ3I1tpT51Uyw=; fh=w+xsGLuzpTj56E0bzPMCc39RWHkBXt2f2AGs+4pGimo=; b=NUH4Yx5/9aOdfBcvJcoP+PcrYZUPFKSiuaZOhrfWpiLP0a6ddk2kIKh8zYC25HGLwy KdjjwKRobyzTYel7nVpgO0q4whavG8cuiC8ZwHPozt5J3pPhIxBDkcBzICim2RHZoS6z i3LgpT3gDxdMgqd5urfNRz7S3pnuD1O7L0kuVRIFKo3pg06opkXru/yYCBXWYph8V7v6 P8jsSBWQNweovjEeQwx1YKfuGP0fARUh58rFg/iHJ887AcbcBFJbiCroYrafLtfU8N3m HwQ774CQQfcgI9/GSm6MbdZXYmGsRysbH8ocxfJrDQ+WsRP2qFjQ44kgLypE8GvwzQzg wmog== ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id e8-20020a37ac08000000b0077f2d713eddsi558524qkm.114.2023.12.05.23.05.36 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Dec 2023 23:05:36 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 5579D386D636 for ; Wed, 6 Dec 2023 07:05:35 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id C72D0386C5A5 for ; Wed, 6 Dec 2023 07:05:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C72D0386C5A5 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org C72D0386C5A5 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=114.242.206.163 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701846312; cv=none; b=QRulH0vUHGHLiyfIMO+D/duhS4ehds7V97j28+lw/OuJzok3RlrTcEEBxSfJB28EgezG4BL1qwvrDuGnNkidyHC0pi9P4T1Q0kLh0SWV4PYtTq/ohg9qpc1v50Cpy1iu9WwoLvei48IpL/v+PgHMYl5yPSmBr+2GLKaJShXFCPk= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701846312; c=relaxed/simple; bh=8g8MbX5LagJGwRTPWwMhyvVL+MeJ+0CciaeqAruD5+c=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=mxb9z29rQ+TWoiGCpMmdS5d/RgVRPA55uhvfJjPLhhHBiRaJU1NX5KvppA2tHo0gkAi1P2ay+NkesJ/pPui8QjPYMzi5ocCMBnwpNaJWmBw53fLK7vk2sIiEruJuVSAB31ndz0qD+y4plD8yH9PHZNPm+0j332/Kxk85m6fNXqA= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8AxZ+gcHXBlQzs_AA--.25131S3; Wed, 06 Dec 2023 15:05:00 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dxvi8XHXBlp0BWAA--.59594S5; Wed, 06 Dec 2023 15:04:57 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH v3 1/5] LoongArch: Add support for LoongArch V1.1 approximate instructions. Date: Wed, 6 Dec 2023 15:04:49 +0800 Message-Id: <20231206070453.3252-2-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231206070453.3252-1-xujiahao@loongson.cn> References: <20231206070453.3252-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dxvi8XHXBlp0BWAA--.59594S5 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj9fXoWfur1UXFW5Wr4fAF1ktF4kZrc_yoW5WF4DGo WrCF4DJa1xGryIyrW5KrnxXrWjvayFyF4DAay3Zws5Ca1xJr90k347W3WFy342qF1kWrn8 C3s5W3sxXa4xJFs5l-sFpf9Il3svdjkaLaAFLSUrUUUUjb8apTn2vfkv8UJUUUU8wcxFpf 9Il3svdxBIdaVrn0xqx4xG64xvF2IEw4CE5I8CrVC2j2Jv73VFW2AGmfu7bjvjm3AaLaJ3 UjIYCTnIWjp_UUUY17kC6x804xWl14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI 8IcIk0rVWrJVCq3wAFIxvE14AKwVWUGVWUXwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xG Y2AK021l84ACjcxK6xIIjxv20xvE14v26r1I6r4UM28EF7xvwVC0I7IYx2IY6xkF7I0E14 v26r4j6F4UM28EF7xvwVC2z280aVAFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIEc7CjxVAF wI0_Gr1j6F4UJwAS0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI 0UMc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUAVWUtwAv7VC2z280 aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JMxAIw28Icx kI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2Iq xVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUAVWUtwCIc40Y0x0EwIxGrwCI42 IY6xIIjxv20xvE14v26r1j6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVWUJVW8JwCI42IY 6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aV CY1x0267AKxVWUJVW8JbIYCTnIWIevJa73UjIFyTuYvjxU7MmhUUUUU X-Spam-Status: No, score=-13.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784515224271564893 X-GMAIL-MSGID: 1784515224271564893 This patch adds define_insn/builtins/intrinsics for these instructions, and add option -mfrecipe to control instruction generation. gcc/ChangeLog: * config/loongarch/genopts/isa-evolution.in (fecipe): Add. * config/loongarch/larchintrin.h (__frecipe_s): New intrinsic. (__frecipe_d): Ditto. (__frsqrte_s): Ditto. (__frsqrte_d): Ditto. * config/loongarch/lasx.md (lasx_xvfrecipe_): New insn pattern. (lasx_xvfrsqrte_): Ditto. * config/loongarch/lasxintrin.h (__lasx_xvfrecipe_s): New intrinsic. (__lasx_xvfrecipe_d): Ditto. (__lasx_xvfrsqrte_s): Ditto. (__lasx_xvfrsqrte_d): Ditto. * config/loongarch/loongarch-builtins.cc (AVAIL_ALL): Add predicates. (LSX_EXT_BUILTIN): New macro. (LASX_EXT_BUILTIN): Ditto. * config/loongarch/loongarch-cpucfg-map.h: Regenerate. * config/loongarch/loongarch-c.cc: Add builtin macro "__loongarch_frecipe". * config/loongarch/loongarch-def.cc: Regenerate. * config/loongarch/loongarch-str.h (OPTSTR_FRECIPE): Regenerate. * config/loongarch/loongarch.cc (loongarch_asm_code_end): Dump status for TARGET_FRECIPE. * config/loongarch/loongarch.md (loongarch_frecipe_): New insn pattern. (loongarch_frsqrte_): Ditto. * config/loongarch/loongarch.opt: Regenerate. * config/loongarch/lsx.md (lsx_vfrecipe_): New insn pattern. (lsx_vfrsqrte_): Ditto. * config/loongarch/lsxintrin.h (__lsx_vfrecipe_s): New intrinsic. (__lsx_vfrecipe_d): Ditto. (__lsx_vfrsqrte_s): Ditto. (__lsx_vfrsqrte_d): Ditto. * doc/extend.texi: Add documentation for LoongArch new builtins and intrinsics. gcc/testsuite/ChangeLog: * gcc.target/loongarch/larch-frecipe-builtin.c: New test. * gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c: New test. * gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c: New test. diff --git a/gcc/config/loongarch/genopts/isa-evolution.in b/gcc/config/loongarch/genopts/isa-evolution.in index a6bc3f87f20..11a198b649f 100644 --- a/gcc/config/loongarch/genopts/isa-evolution.in +++ b/gcc/config/loongarch/genopts/isa-evolution.in @@ -1,3 +1,4 @@ +2 25 frecipe Support frecipe.{s/d} and frsqrte.{s/d} instructions. 2 26 div32 Support div.w[u] and mod.w[u] instructions with inputs not sign-extended. 2 27 lam-bh Support am{swap/add}[_db].{b/h} instructions. 2 28 lamcas Support amcas[_db].{b/h/w/d} instructions. diff --git a/gcc/config/loongarch/larchintrin.h b/gcc/config/loongarch/larchintrin.h index e571ed27b37..bb1cda831eb 100644 --- a/gcc/config/loongarch/larchintrin.h +++ b/gcc/config/loongarch/larchintrin.h @@ -333,6 +333,44 @@ __iocsrwr_d (unsigned long int _1, unsigned int _2) } #endif +#ifdef __loongarch_frecipe +/* Assembly instruction format: fd, fj. */ +/* Data types in instruction templates: SF, SF. */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +__frecipe_s (float _1) +{ + __builtin_loongarch_frecipe_s ((float) _1); +} + +/* Assembly instruction format: fd, fj. */ +/* Data types in instruction templates: DF, DF. */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +__frecipe_d (double _1) +{ + __builtin_loongarch_frecipe_d ((double) _1); +} + +/* Assembly instruction format: fd, fj. */ +/* Data types in instruction templates: SF, SF. */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +__frsqrte_s (float _1) +{ + __builtin_loongarch_frsqrte_s ((float) _1); +} + +/* Assembly instruction format: fd, fj. */ +/* Data types in instruction templates: DF, DF. */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +__frsqrte_d (double _1) +{ + __builtin_loongarch_frsqrte_d ((double) _1); +} +#endif + /* Assembly instruction format: ui15. */ /* Data types in instruction templates: USI. */ #define __dbar(/*ui15*/ _1) __builtin_loongarch_dbar ((_1)) diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index 116b30c0774..f6e5208a6f1 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -40,8 +40,10 @@ (define_c_enum "unspec" [ UNSPEC_LASX_XVFCVTL UNSPEC_LASX_XVFLOGB UNSPEC_LASX_XVFRECIP + UNSPEC_LASX_XVFRECIPE UNSPEC_LASX_XVFRINT UNSPEC_LASX_XVFRSQRT + UNSPEC_LASX_XVFRSQRTE UNSPEC_LASX_XVFCMP_SAF UNSPEC_LASX_XVFCMP_SEQ UNSPEC_LASX_XVFCMP_SLE @@ -1633,6 +1635,17 @@ (define_insn "lasx_xvfrecip_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Instructions. + +(define_insn "lasx_xvfrecipe_" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRECIPE))] + "ISA_HAS_LASX && TARGET_FRECIPE" + "xvfrecipe.\t%u0,%u1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lasx_xvfrsqrt_" [(set (match_operand:FLASX 0 "register_operand" "=f") (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] @@ -1642,6 +1655,17 @@ (define_insn "lasx_xvfrsqrt_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Square Root Instructions. + +(define_insn "lasx_xvfrsqrte_" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRSQRTE))] + "ISA_HAS_LASX && TARGET_FRECIPE" + "xvfrsqrte.\t%u0,%u1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lasx_xvftint_u__" [(set (match_operand: 0 "register_operand" "=f") (unspec: [(match_operand:FLASX 1 "register_operand" "f")] diff --git a/gcc/config/loongarch/lasxintrin.h b/gcc/config/loongarch/lasxintrin.h index 7bce2c757f1..5e65e76e74c 100644 --- a/gcc/config/loongarch/lasxintrin.h +++ b/gcc/config/loongarch/lasxintrin.h @@ -2399,6 +2399,40 @@ __m256d __lasx_xvfrecip_d (__m256d _1) return (__m256d)__builtin_lasx_xvfrecip_d ((v4f64)_1); } +#if defined(__loongarch_frecipe) +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrecipe_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrecipe_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrecipe_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrecipe_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrsqrte_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrsqrte_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrsqrte_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrsqrte_d ((v4f64)_1); +} +#endif + /* Assembly instruction format: xd, xj. */ /* Data types in instruction templates: V8SF, V8SF. */ extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc index 5d037ab7f10..507fc953c72 100644 --- a/gcc/config/loongarch/loongarch-builtins.cc +++ b/gcc/config/loongarch/loongarch-builtins.cc @@ -120,6 +120,9 @@ struct loongarch_builtin_description AVAIL_ALL (hard_float, TARGET_HARD_FLOAT_ABI) AVAIL_ALL (lsx, ISA_HAS_LSX) AVAIL_ALL (lasx, ISA_HAS_LASX) +AVAIL_ALL (frecipe, TARGET_FRECIPE && TARGET_HARD_FLOAT_ABI) +AVAIL_ALL (lsx_frecipe, ISA_HAS_LSX && TARGET_FRECIPE) +AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE) /* Construct a loongarch_builtin_description from the given arguments. @@ -164,6 +167,15 @@ AVAIL_ALL (lasx, ISA_HAS_LASX) "__builtin_lsx_" #INSN, LARCH_BUILTIN_DIRECT, \ FUNCTION_TYPE, loongarch_builtin_avail_lsx } + /* Define an LSX LARCH_BUILTIN_DIRECT function __builtin_lsx_ + for instruction CODE_FOR_lsx_. FUNCTION_TYPE is a builtin_description + field. AVAIL is the name of the availability predicate, without the leading + loongarch_builtin_avail_. */ +#define LSX_EXT_BUILTIN(INSN, FUNCTION_TYPE, AVAIL) \ + { CODE_FOR_lsx_ ## INSN, \ + "__builtin_lsx_" #INSN, LARCH_BUILTIN_DIRECT, \ + FUNCTION_TYPE, loongarch_builtin_avail_##AVAIL } + /* Define an LSX LARCH_BUILTIN_LSX_TEST_BRANCH function __builtin_lsx_ for instruction CODE_FOR_lsx_. FUNCTION_TYPE is a builtin_description @@ -189,6 +201,15 @@ AVAIL_ALL (lasx, ISA_HAS_LASX) "__builtin_lasx_" #INSN, LARCH_BUILTIN_LASX, \ FUNCTION_TYPE, loongarch_builtin_avail_lasx } +/* Define an LASX LARCH_BUILTIN_DIRECT function __builtin_lasx_ + for instruction CODE_FOR_lasx_. FUNCTION_TYPE is a builtin_description + field. AVAIL is the name of the availability predicate, without the leading + loongarch_builtin_avail_. */ +#define LASX_EXT_BUILTIN(INSN, FUNCTION_TYPE, AVAIL) \ + { CODE_FOR_lasx_ ## INSN, \ + "__builtin_lasx_" #INSN, LARCH_BUILTIN_LASX, \ + FUNCTION_TYPE, loongarch_builtin_avail_##AVAIL } + /* Define an LASX LARCH_BUILTIN_DIRECT_NO_TARGET function __builtin_lasx_ for instruction CODE_FOR_lasx_. FUNCTION_TYPE is a builtin_description field. */ @@ -804,6 +825,27 @@ static const struct loongarch_builtin_description loongarch_builtins[] = { DIRECT_NO_TARGET_BUILTIN (syscall, LARCH_VOID_FTYPE_USI, default), DIRECT_NO_TARGET_BUILTIN (break, LARCH_VOID_FTYPE_USI, default), + /* Built-in functions for frecipe.{s/d} and frsqrte.{s/d}. */ + + DIRECT_BUILTIN (frecipe_s, LARCH_SF_FTYPE_SF, frecipe), + DIRECT_BUILTIN (frecipe_d, LARCH_DF_FTYPE_DF, frecipe), + DIRECT_BUILTIN (frsqrte_s, LARCH_SF_FTYPE_SF, frecipe), + DIRECT_BUILTIN (frsqrte_d, LARCH_DF_FTYPE_DF, frecipe), + + /* Built-in functions for new LSX instructions. */ + + LSX_EXT_BUILTIN (vfrecipe_s, LARCH_V4SF_FTYPE_V4SF, lsx_frecipe), + LSX_EXT_BUILTIN (vfrecipe_d, LARCH_V2DF_FTYPE_V2DF, lsx_frecipe), + LSX_EXT_BUILTIN (vfrsqrte_s, LARCH_V4SF_FTYPE_V4SF, lsx_frecipe), + LSX_EXT_BUILTIN (vfrsqrte_d, LARCH_V2DF_FTYPE_V2DF, lsx_frecipe), + + /* Built-in functions for new LASX instructions. */ + + LASX_EXT_BUILTIN (xvfrecipe_s, LARCH_V8SF_FTYPE_V8SF, lasx_frecipe), + LASX_EXT_BUILTIN (xvfrecipe_d, LARCH_V4DF_FTYPE_V4DF, lasx_frecipe), + LASX_EXT_BUILTIN (xvfrsqrte_s, LARCH_V8SF_FTYPE_V8SF, lasx_frecipe), + LASX_EXT_BUILTIN (xvfrsqrte_d, LARCH_V4DF_FTYPE_V4DF, lasx_frecipe), + /* Built-in functions for LSX. */ LSX_BUILTIN (vsll_b, LARCH_V16QI_FTYPE_V16QI_V16QI), LSX_BUILTIN (vsll_h, LARCH_V8HI_FTYPE_V8HI_V8HI), diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc index fbc33a10351..44f52245c78 100644 --- a/gcc/config/loongarch/loongarch-c.cc +++ b/gcc/config/loongarch/loongarch-c.cc @@ -102,6 +102,9 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile) else builtin_define ("__loongarch_frlen=0"); + if (TARGET_HARD_FLOAT && TARGET_FRECIPE) + builtin_define ("__loongarch_frecipe"); + if (ISA_HAS_LSX) { builtin_define ("__loongarch_simd"); diff --git a/gcc/config/loongarch/loongarch-cpucfg-map.h b/gcc/config/loongarch/loongarch-cpucfg-map.h index 02ff1671255..148333c249c 100644 --- a/gcc/config/loongarch/loongarch-cpucfg-map.h +++ b/gcc/config/loongarch/loongarch-cpucfg-map.h @@ -29,6 +29,7 @@ static constexpr struct { unsigned int cpucfg_bit; HOST_WIDE_INT isa_evolution_bit; } cpucfg_map[] = { + { 2, 1u << 25, OPTION_MASK_ISA_FRECIPE }, { 2, 1u << 26, OPTION_MASK_ISA_DIV32 }, { 2, 1u << 27, OPTION_MASK_ISA_LAM_BH }, { 2, 1u << 28, OPTION_MASK_ISA_LAMCAS }, diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc index bc6997e45b5..c41804a180e 100644 --- a/gcc/config/loongarch/loongarch-def.cc +++ b/gcc/config/loongarch/loongarch-def.cc @@ -60,7 +60,8 @@ array_arch loongarch_cpu_default_isa = .fpu_ (ISA_EXT_FPU64) .simd_ (ISA_EXT_SIMD_LASX) .evolution_ (OPTION_MASK_ISA_DIV32 | OPTION_MASK_ISA_LD_SEQ_SA - | OPTION_MASK_ISA_LAM_BH | OPTION_MASK_ISA_LAMCAS)); + | OPTION_MASK_ISA_LAM_BH | OPTION_MASK_ISA_LAMCAS + | OPTION_MASK_ISA_FRECIPE)); static inline loongarch_cache la464_cache () { diff --git a/gcc/config/loongarch/loongarch-str.h b/gcc/config/loongarch/loongarch-str.h index 7c78d1443d5..4d1bfd675e8 100644 --- a/gcc/config/loongarch/loongarch-str.h +++ b/gcc/config/loongarch/loongarch-str.h @@ -68,6 +68,7 @@ along with GCC; see the file COPYING3. If not see #define STR_EXPLICIT_RELOCS_NONE "none" #define STR_EXPLICIT_RELOCS_ALWAYS "always" +#define OPTSTR_FRECIPE "frecipe" #define OPTSTR_DIV32 "div32" #define OPTSTR_LAM_BH "lam-bh" #define OPTSTR_LAMCAS "lamcas" diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 3545e66a10e..57a20bec8a4 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -11503,6 +11503,7 @@ loongarch_asm_code_end (void) loongarch_cpu_strings [la_target.cpu_tune]); fprintf (asm_out_file, "%s Base ISA: %s\n", ASM_COMMENT_START, loongarch_isa_base_strings [la_target.isa.base]); + DUMP_FEATURE (TARGET_FRECIPE); DUMP_FEATURE (TARGET_DIV32); DUMP_FEATURE (TARGET_LAM_BH); DUMP_FEATURE (TARGET_LAMCAS); diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 7a101dd64b7..07beede8892 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -59,6 +59,12 @@ (define_c_enum "unspec" [ ;; Stack tie UNSPEC_TIE + ;; RSQRT + UNSPEC_RSQRTE + + ;; RECIP + UNSPEC_RECIPE + ;; CRC UNSPEC_CRC UNSPEC_CRCC @@ -220,6 +226,7 @@ (define_attr "qword_mode" "no,yes" ;; fmadd floating point multiply-add ;; fdiv floating point divide ;; frdiv floating point reciprocal divide +;; frecipe floating point approximate reciprocal ;; fabs floating point absolute value ;; flogb floating point exponent extract ;; fneg floating point negation @@ -229,6 +236,7 @@ (define_attr "qword_mode" "no,yes" ;; fscaleb floating point scale ;; fsqrt floating point square root ;; frsqrt floating point reciprocal square root +;; frsqrte floating point approximate reciprocal square root ;; multi multiword sequence (or user asm statements) ;; atomic atomic memory update instruction ;; syncloop memory atomic operation implemented as a sync loop @@ -238,8 +246,8 @@ (define_attr "type" "unknown,branch,jump,call,load,fpload,fpidxload,store,fpstore,fpidxstore, prefetch,prefetchx,condmove,mgtf,mftg,const,arith,logical, shift,slt,signext,clz,trap,imul,idiv,move, - fmove,fadd,fmul,fmadd,fdiv,frdiv,fabs,flogb,fneg,fcmp,fcopysign,fcvt, - fscaleb,fsqrt,frsqrt,accext,accmod,multi,atomic,syncloop,nop,ghost, + fmove,fadd,fmul,fmadd,fdiv,frdiv,frecipe,fabs,flogb,fneg,fcmp,fcopysign,fcvt, + fscaleb,fsqrt,frsqrt,frsqrte,accext,accmod,multi,atomic,syncloop,nop,ghost, simd_div,simd_fclass,simd_flog2,simd_fadd,simd_fcvt,simd_fmul,simd_fmadd, simd_fdiv,simd_bitins,simd_bitmov,simd_insert,simd_sld,simd_mul,simd_fcmp, simd_fexp2,simd_int_arith,simd_bit,simd_shift,simd_splat,simd_fill, @@ -908,6 +916,18 @@ (define_insn "*recip3" [(set_attr "type" "frdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Instructions. + +(define_insn "loongarch_frecipe_" + [(set (match_operand:ANYF 0 "register_operand" "=f") + (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] + UNSPEC_RECIPE))] + "TARGET_FRECIPE" + "frecipe.\t%0,%1" + [(set_attr "type" "frecipe") + (set_attr "mode" "") + (set_attr "insn_count" "1")]) + ;; Integer division and modulus. (define_expand "3" [(set (match_operand:GPR 0 "register_operand") @@ -1133,6 +1153,17 @@ (define_insn "*rsqrtb" [(set_attr "type" "frsqrt") (set_attr "mode" "") (set_attr "insn_count" "1")]) + +;; Approximate Reciprocal Square Root Instructions. + +(define_insn "loongarch_frsqrte_" + [(set (match_operand:ANYF 0 "register_operand" "=f") + (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] + UNSPEC_RSQRTE))] + "TARGET_FRECIPE" + "frsqrte.\t%0,%1" + [(set_attr "type" "frsqrte") + (set_attr "mode" "")]) ;; ;; .................... diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt index 41e6424e861..cdd59ae4fcf 100644 --- a/gcc/config/loongarch/loongarch.opt +++ b/gcc/config/loongarch/loongarch.opt @@ -260,6 +260,10 @@ default value is 4. Variable HOST_WIDE_INT isa_evolution = 0 +mfrecipe +Target Mask(ISA_FRECIPE) Var(isa_evolution) +Support frecipe.{s/d} and frsqrte.{s/d} instructions. + mdiv32 Target Mask(ISA_DIV32) Var(isa_evolution) Support div.w[u] and mod.w[u] instructions with inputs not sign-extended. diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index 23239993404..e2393aed139 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -42,8 +42,10 @@ (define_c_enum "unspec" [ UNSPEC_LSX_VFCVTL UNSPEC_LSX_VFLOGB UNSPEC_LSX_VFRECIP + UNSPEC_LSX_VFRECIPE UNSPEC_LSX_VFRINT UNSPEC_LSX_VFRSQRT + UNSPEC_LSX_VFRSQRTE UNSPEC_LSX_VFCMP_SAF UNSPEC_LSX_VFCMP_SEQ UNSPEC_LSX_VFCMP_SLE @@ -1546,6 +1548,17 @@ (define_insn "lsx_vfrecip_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Instructions. + +(define_insn "lsx_vfrecipe_" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRECIPE))] + "ISA_HAS_LSX && TARGET_FRECIPE" + "vfrecipe.\t%w0,%w1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lsx_vfrsqrt_" [(set (match_operand:FLSX 0 "register_operand" "=f") (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] @@ -1555,6 +1568,17 @@ (define_insn "lsx_vfrsqrt_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Square Root Instructions. + +(define_insn "lsx_vfrsqrte_" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRSQRTE))] + "ISA_HAS_LSX && TARGET_FRECIPE" + "vfrsqrte.\t%w0,%w1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lsx_vftint_u__" [(set (match_operand: 0 "register_operand" "=f") (unspec: [(match_operand:FLSX 1 "register_operand" "f")] diff --git a/gcc/config/loongarch/lsxintrin.h b/gcc/config/loongarch/lsxintrin.h index 29553c093fa..57a6fc40a8f 100644 --- a/gcc/config/loongarch/lsxintrin.h +++ b/gcc/config/loongarch/lsxintrin.h @@ -2480,6 +2480,40 @@ __m128d __lsx_vfrecip_d (__m128d _1) return (__m128d)__builtin_lsx_vfrecip_d ((v2f64)_1); } +#if defined(__loongarch_frecipe) +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrecipe_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrecipe_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrecipe_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrecipe_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrsqrte_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrsqrte_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrsqrte_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrsqrte_d ((v2f64)_1); +} +#endif + /* Assembly instruction format: vd, vj. */ /* Data types in instruction templates: V4SF, V4SF. */ extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 32ae15e1d5b..98c6d320fbe 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -17027,6 +17027,14 @@ The intrinsics provided are listed below: void __builtin_loongarch_break (imm0_32767) @end smallexample +These instrisic functions are available by using @option{-mfrecipe}. +@smallexample + float __builtin_loongarch_frecipe_s (float); + double __builtin_loongarch_frecipe_d (double); + float __builtin_loongarch_frsqrte_s (float); + double __builtin_loongarch_frsqrte_d (double); +@end smallexample + @emph{Note:}Since the control register is divided into 32-bit and 64-bit, but the access instruction is not distinguished. So GCC renames the control instructions when implementing intrinsics. @@ -17099,6 +17107,15 @@ function you need to include @code{larchintrin.h}. void __break (imm0_32767) @end smallexample +These instrisic functions are available by including @code{larchintrin.h} and +using @option{-mfrecipe}. +@smallexample + float __frecipe_s (float); + double __frecipe_d (double); + float __frsqrte_s (float); + double __frsqrte_d (double); +@end smallexample + Additional built-in functions are available for LoongArch family processors to efficiently use 128-bit floating-point (__float128) values. @@ -17939,6 +17956,15 @@ __m128i __lsx_vxori_b (__m128i, imm0_255); __m128i __lsx_vxor_v (__m128i, __m128i); @end smallexample +These instrisic functions are available by including @code{lsxintrin.h} and +using @option{-mfrecipe} and @option{-mlsx}. +@smallexample +__m128d __lsx_vfrecipe_d (__m128d); +__m128 __lsx_vfrecipe_s (__m128); +__m128d __lsx_vfrsqrte_d (__m128d); +__m128 __lsx_vfrsqrte_s (__m128); +@end smallexample + @node LoongArch ASX Vector Intrinsics @subsection LoongArch ASX Vector Intrinsics @@ -18778,6 +18804,15 @@ __m256i __lasx_xvxori_b (__m256i, imm0_255); __m256i __lasx_xvxor_v (__m256i, __m256i); @end smallexample +These instrisic functions are available by including @code{lasxintrin.h} and +using @option{-mfrecipe} and @option{-mlasx}. +@smallexample +__m256d __lasx_xvfrecipe_d (__m256d); +__m256 __lasx_xvfrecipe_s (__m256); +__m256d __lasx_xvfrsqrte_d (__m256d); +__m256 __lasx_xvfrsqrte_s (__m256); +@end smallexample + @node MIPS DSP Built-in Functions @subsection MIPS DSP Built-in Functions diff --git a/gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c b/gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c new file mode 100644 index 00000000000..b9329f34676 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c @@ -0,0 +1,28 @@ +/* Test builtins for frecipe.{s/d} and frsqrte.{s/d} instructions */ +/* { dg-do compile } */ +/* { dg-options "-mfrecipe" } */ +/* { dg-final { scan-assembler-times "test_frecipe_s:.*frecipe\\.s.*test_frecipe_s" 1 } } */ +/* { dg-final { scan-assembler-times "test_frecipe_d:.*frecipe\\.d.*test_frecipe_d" 1 } } */ +/* { dg-final { scan-assembler-times "test_frsqrte_s:.*frsqrte\\.s.*test_frsqrte_s" 1 } } */ +/* { dg-final { scan-assembler-times "test_frsqrte_d:.*frsqrte\\.d.*test_frsqrte_d" 1 } } */ + +float +test_frecipe_s (float _1) +{ + return __builtin_loongarch_frecipe_s (_1); +} +double +test_frecipe_d (double _1) +{ + return __builtin_loongarch_frecipe_d (_1); +} +float +test_frsqrte_s (float _1) +{ + return __builtin_loongarch_frsqrte_s (_1); +} +double +test_frsqrte_d (double _1) +{ + return __builtin_loongarch_frsqrte_d (_1); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c new file mode 100644 index 00000000000..522535b45a3 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c @@ -0,0 +1,30 @@ +/* Test builtins for xvfrecipe.{s/d} and xvfrsqrte.{s/d} instructions */ +/* { dg-do compile } */ +/* { dg-options "-mlasx -mfrecipe" } */ +/* { dg-final { scan-assembler-times "lasx_xvfrecipe_s:.*xvfrecipe\\.s.*lasx_xvfrecipe_s" 1 } } */ +/* { dg-final { scan-assembler-times "lasx_xvfrecipe_d:.*xvfrecipe\\.d.*lasx_xvfrecipe_d" 1 } } */ +/* { dg-final { scan-assembler-times "lasx_xvfrsqrte_s:.*xvfrsqrte\\.s.*lasx_xvfrsqrte_s" 1 } } */ +/* { dg-final { scan-assembler-times "lasx_xvfrsqrte_d:.*xvfrsqrte\\.d.*lasx_xvfrsqrte_d" 1 } } */ + +#include + +v8f32 +__lasx_xvfrecipe_s (v8f32 _1) +{ + return __builtin_lasx_xvfrecipe_s (_1); +} +v4f64 +__lasx_xvfrecipe_d (v4f64 _1) +{ + return __builtin_lasx_xvfrecipe_d (_1); +} +v8f32 +__lasx_xvfrsqrte_s (v8f32 _1) +{ + return __builtin_lasx_xvfrsqrte_s (_1); +} +v4f64 +__lasx_xvfrsqrte_d (v4f64 _1) +{ + return __builtin_lasx_xvfrsqrte_d (_1); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c new file mode 100644 index 00000000000..4ad0cb0ffd6 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c @@ -0,0 +1,30 @@ +/* Test builtins for vfrecipe.{s/d} and vfrsqrte.{s/d} instructions */ +/* { dg-do compile } */ +/* { dg-options "-mlsx -mfrecipe" } */ +/* { dg-final { scan-assembler-times "lsx_vfrecipe_s:.*vfrecipe\\.s.*lsx_vfrecipe_s" 1 } } */ +/* { dg-final { scan-assembler-times "lsx_vfrecipe_d:.*vfrecipe\\.d.*lsx_vfrecipe_d" 1 } } */ +/* { dg-final { scan-assembler-times "lsx_vfrsqrte_s:.*vfrsqrte\\.s.*lsx_vfrsqrte_s" 1 } } */ +/* { dg-final { scan-assembler-times "lsx_vfrsqrte_d:.*vfrsqrte\\.d.*lsx_vfrsqrte_d" 1 } } */ + +#include + +v4f32 +__lsx_vfrecipe_s (v4f32 _1) +{ + return __builtin_lsx_vfrecipe_s (_1); +} +v2f64 +__lsx_vfrecipe_d (v2f64 _1) +{ + return __builtin_lsx_vfrecipe_d (_1); +} +v4f32 +__lsx_vfrsqrte_s (v4f32 _1) +{ + return __builtin_lsx_vfrsqrte_s (_1); +} +v2f64 +__lsx_vfrsqrte_d (v2f64 _1) +{ + return __builtin_lsx_vfrsqrte_d (_1); +}