From patchwork Tue Nov 7 03:53:39 2023
X-Patchwork-Submitter: chenxiaolong
X-Patchwork-Id: 162259
From: chenxiaolong <chenxiaolong@loongson.cn>
To: gcc-patches@gcc.gnu.org
Cc: xry111@xry111.site, i@xen0n.name, xuchenghua@loongson.cn, chenglulu@loongson.cn, chenxiaolong
Subject: [PATCH v1] LoongArch: Add instructions for the use of vector functions.
Date: Tue, 7 Nov 2023 11:53:39 +0800
Message-Id: <20231107035339.28242-1-chenxiaolong@loongson.cn>
X-Mailer: git-send-email 2.20.1
MIME-Version: 1.0
gcc/ChangeLog:

	* doc/extend.texi: Add documentation for the SX and ASX vector
	intrinsics, covering the function type aliases, the accepted
	immediate-operand types, and the vector function prototypes.
---
 gcc/doc/extend.texi | 1673 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1673 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 618f49b3968..470015a7488 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -15055,6 +15055,8 @@ instructions, but allow the compiler to schedule those calls.
 * BPF Built-in Functions::
 * FR-V Built-in Functions::
 * LoongArch Base Built-in Functions::
+* LoongArch SX Vector Intrinsics::
+* LoongArch ASX Vector Intrinsics::
 * MIPS DSP Built-in Functions::
 * MIPS Paired-Single Support::
 * MIPS Loongson Built-in Functions::
@@ -16839,6 +16841,1677 @@ Returns the value that is currently set in the @samp{tp} register.
 void * __builtin_thread_pointer (void)
 @end smallexample
 
+@node LoongArch SX Vector Intrinsics
+@subsection LoongArch SX Vector Intrinsics
+
+Currently, GCC provides support for 128-bit (SX) and 256-bit (ASX) vector
+operations on the LoongArch architecture.  To use the 128-bit vector
+intrinsics, include the header file @code{lsxintrin.h} and compile with
+the @code{-mlsx} option to enable vectorization.
+The types used by these intrinsics are defined in C as follows:
+
+@smallexample
+typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
+typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
+typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__));
+typedef int i32;
+typedef unsigned int u32;
+typedef long int i64;
+typedef unsigned long int u64;
+@end smallexample
+
+@code{__m128} is a 128-bit (16-byte) vector type whose elements are of
+type @code{float}; it carries the @code{__may_alias__} attribute, so
+accesses through it may alias objects of any other type.  Similarly,
+@code{__m128i} and @code{__m128d} are 128-bit vector types with elements
+of type @code{long long} and @code{double} respectively.  @code{i32} and
+@code{i64} are aliases for signed integers, while @code{u32} and
+@code{u64} are aliases for unsigned integers.
+
+Also, some built-in functions prefer or require immediate operands,
+because the corresponding instructions accept either immediate and
+register operands, or immediate operands only.  The immediate parameter
+kinds are listed as follows:
+
+@smallexample
+* imm0_1, an integer literal in range 0 to 1.
+* imm0_3, an integer literal in range 0 to 3.
+* imm0_7, an integer literal in range 0 to 7.
+* imm0_15, an integer literal in range 0 to 15.
+* imm0_31, an integer literal in range 0 to 31.
+* imm0_63, an integer literal in range 0 to 63.
+* imm0_127, an integer literal in range 0 to 127.
+* imm0_255, an integer literal in range 0 to 255.
+* imm_n16_15, an integer literal in range -16 to 15.
+* imm_n128_127, an integer literal in range -128 to 127.
+* imm_n256_255, an integer literal in range -256 to 255.
+* imm_n512_511, an integer literal in range -512 to 511.
+* imm_n1024_1023, an integer literal in range -1024 to 1023.
+* imm_n2048_2047, an integer literal in range -2048 to 2047.
+@end smallexample
+
+There are some points to note about the built-in functions implemented
+for the LoongArch architecture, as shown below:
+
+ * For instructions whose source and destination operands are the same
+register, the first argument of the built-in function call is used as
+the destination operand.
+
+ * The vector instruction @code{vldi vd,i13} is implemented according to
+whether the highest bit of the immediate is 0 or 1, as shown in the
+following two cases.
+
+@smallexample
+a. When the highest bit of the immediate (i13) is 0:
+   The values of bits @code{i13[11:10]} select one of the following four
+instructions.
+   If @code{i13[11:10]} is 00, the @code{vrepli.b vd,s10} instruction is
+used to implement the function;
+   If @code{i13[11:10]} is 01, the @code{vrepli.h vd,s10} instruction is
+used to implement the function;
+   If @code{i13[11:10]} is 10, the @code{vrepli.w vd,s10} instruction is
+used to implement the function;
+   If @code{i13[11:10]} is 11, the @code{vrepli.d vd,s10} instruction is
+used to implement the function.
+
+   In the above four instructions, @code{s10} denotes a 10-bit signed
+immediate (the low 10 bits of i13).
+
+b. When the highest bit of the immediate (i13) is 1:
+   The compiler does not implement built-in functions for this case.
+@end smallexample
+
+ * In order to support the @code{vset} instructions of the LoongArch
+architecture, a number of built-in functions were added to GCC; each of
+them is implemented as a combination of two instructions.
+
+@smallexample
+ The corresponding assembly instructions for the @code{__lsx_bz_v} function
+are @code{vseteqz.v} and @code{bcnez}.
+ The corresponding assembly instructions for the @code{__lsx_bz_b} function
+are @code{vsetanyeqz.b} and @code{bcnez}.
+ The corresponding assembly instructions for the @code{__lsx_bz_h} function
+are @code{vsetanyeqz.h} and @code{bcnez}.
+ The corresponding assembly instructions for the @code{__lsx_bz_w} function
+are @code{vsetanyeqz.w} and @code{bcnez}.
+ The corresponding assembly instructions for the @code{__lsx_bz_d} function
+are @code{vsetanyeqz.d} and @code{bcnez}.
+ The corresponding assembly instructions for the @code{__lsx_bnz_v} function
+are @code{vsetnez.v} and @code{bcnez}.
+ The corresponding assembly instructions for the @code{__lsx_bnz_b} function
+are @code{vsetallnez.b} and @code{bcnez}.
+ The corresponding assembly instructions for the @code{__lsx_bnz_h} function
+are @code{vsetallnez.h} and @code{bcnez}.
+ The corresponding assembly instructions for the @code{__lsx_bnz_w} function
+are @code{vsetallnez.w} and @code{bcnez}.
+ The corresponding assembly instructions for the @code{__lsx_bnz_d} function
+are @code{vsetallnez.d} and @code{bcnez}.
+@end smallexample
+
+ The intrinsics provided are listed below:
+@smallexample
+i32 __lsx_bnz_b (__m128i);
+i32 __lsx_bnz_d (__m128i);
+i32 __lsx_bnz_h (__m128i);
+i32 __lsx_bnz_v (__m128i);
+i32 __lsx_bnz_w (__m128i);
+i32 __lsx_bz_b (__m128i);
+i32 __lsx_bz_d (__m128i);
+i32 __lsx_bz_h (__m128i);
+i32 __lsx_bz_v (__m128i);
+i32 __lsx_bz_w (__m128i);
+__m128i __lsx_vabsd_b (__m128i, __m128i);
+__m128i __lsx_vabsd_bu (__m128i, __m128i);
+__m128i __lsx_vabsd_d (__m128i, __m128i);
+__m128i __lsx_vabsd_du (__m128i, __m128i);
+__m128i __lsx_vabsd_h (__m128i, __m128i);
+__m128i __lsx_vabsd_hu (__m128i, __m128i);
+__m128i __lsx_vabsd_w (__m128i, __m128i);
+__m128i __lsx_vabsd_wu (__m128i, __m128i);
+__m128i __lsx_vadda_b (__m128i, __m128i);
+__m128i __lsx_vadda_d (__m128i, __m128i);
+__m128i __lsx_vadda_h (__m128i, __m128i);
+__m128i __lsx_vadda_w (__m128i, __m128i);
+__m128i __lsx_vadd_b (__m128i, __m128i);
+__m128i __lsx_vadd_d (__m128i, __m128i);
+__m128i __lsx_vadd_h (__m128i, __m128i);
+__m128i __lsx_vaddi_bu (__m128i, imm0_31);
+__m128i __lsx_vaddi_du (__m128i, imm0_31);
+__m128i __lsx_vaddi_hu (__m128i, imm0_31);
+__m128i __lsx_vaddi_wu (__m128i, imm0_31);
+__m128i __lsx_vadd_q (__m128i, __m128i);
+__m128i __lsx_vadd_w (__m128i, __m128i);
+__m128i __lsx_vaddwev_d_w (__m128i, __m128i);
+__m128i __lsx_vaddwev_d_wu (__m128i, __m128i);
+__m128i __lsx_vaddwev_d_wu_w (__m128i, __m128i);
+__m128i __lsx_vaddwev_h_b (__m128i, __m128i);
+__m128i __lsx_vaddwev_h_bu (__m128i, __m128i);
+__m128i __lsx_vaddwev_h_bu_b (__m128i, __m128i);
+__m128i __lsx_vaddwev_q_d (__m128i, __m128i);
+__m128i __lsx_vaddwev_q_du (__m128i, __m128i);
+__m128i __lsx_vaddwev_q_du_d (__m128i, __m128i);
+__m128i __lsx_vaddwev_w_h (__m128i, __m128i);
+__m128i __lsx_vaddwev_w_hu (__m128i, __m128i);
+__m128i __lsx_vaddwev_w_hu_h (__m128i, __m128i);
+__m128i __lsx_vaddwod_d_w (__m128i, __m128i);
+__m128i __lsx_vaddwod_d_wu (__m128i, __m128i);
+__m128i __lsx_vaddwod_d_wu_w (__m128i, __m128i);
+__m128i __lsx_vaddwod_h_b (__m128i, __m128i);
+__m128i __lsx_vaddwod_h_bu (__m128i, __m128i);
+__m128i __lsx_vaddwod_h_bu_b (__m128i, __m128i);
+__m128i __lsx_vaddwod_q_d (__m128i, __m128i);
+__m128i __lsx_vaddwod_q_du (__m128i, __m128i);
+__m128i __lsx_vaddwod_q_du_d (__m128i, __m128i);
+__m128i __lsx_vaddwod_w_h (__m128i, __m128i);
+__m128i __lsx_vaddwod_w_hu (__m128i, __m128i);
+__m128i __lsx_vaddwod_w_hu_h (__m128i, __m128i);
+__m128i __lsx_vandi_b (__m128i, imm0_255);
+__m128i __lsx_vandn_v (__m128i, __m128i);
+__m128i __lsx_vand_v (__m128i, __m128i);
+__m128i __lsx_vavg_b (__m128i, __m128i);
+__m128i __lsx_vavg_bu (__m128i, __m128i);
+__m128i __lsx_vavg_d (__m128i, __m128i);
+__m128i __lsx_vavg_du (__m128i, __m128i);
+__m128i __lsx_vavg_h (__m128i, __m128i);
+__m128i __lsx_vavg_hu (__m128i, __m128i);
+__m128i __lsx_vavgr_b (__m128i, __m128i);
+__m128i __lsx_vavgr_bu (__m128i, __m128i);
+__m128i __lsx_vavgr_d (__m128i, __m128i);
+__m128i __lsx_vavgr_du (__m128i, __m128i);
+__m128i __lsx_vavgr_h (__m128i, __m128i);
+__m128i __lsx_vavgr_hu (__m128i, __m128i);
+__m128i __lsx_vavgr_w (__m128i, __m128i);
+__m128i __lsx_vavgr_wu (__m128i, __m128i);
+__m128i __lsx_vavg_w (__m128i, __m128i);
+__m128i __lsx_vavg_wu (__m128i, __m128i);
+__m128i __lsx_vbitclr_b (__m128i, __m128i);
+__m128i __lsx_vbitclr_d (__m128i, __m128i);
+__m128i __lsx_vbitclr_h (__m128i, __m128i);
+__m128i __lsx_vbitclri_b (__m128i, imm0_7);
+__m128i __lsx_vbitclri_d (__m128i, imm0_63);
+__m128i __lsx_vbitclri_h (__m128i, imm0_15);
+__m128i __lsx_vbitclri_w (__m128i, imm0_31);
+__m128i __lsx_vbitclr_w (__m128i, __m128i);
+__m128i __lsx_vbitrev_b (__m128i, __m128i);
+__m128i __lsx_vbitrev_d (__m128i, __m128i);
+__m128i __lsx_vbitrev_h (__m128i, __m128i);
+__m128i __lsx_vbitrevi_b (__m128i, imm0_7);
+__m128i __lsx_vbitrevi_d (__m128i, imm0_63);
+__m128i __lsx_vbitrevi_h (__m128i, imm0_15);
+__m128i __lsx_vbitrevi_w (__m128i, imm0_31);
+__m128i __lsx_vbitrev_w (__m128i, __m128i);
+__m128i __lsx_vbitseli_b (__m128i, __m128i, imm0_255);
+__m128i __lsx_vbitsel_v (__m128i, __m128i, __m128i);
+__m128i __lsx_vbitset_b (__m128i, __m128i);
+__m128i __lsx_vbitset_d (__m128i, __m128i);
+__m128i __lsx_vbitset_h (__m128i, __m128i);
+__m128i __lsx_vbitseti_b (__m128i, imm0_7);
+__m128i __lsx_vbitseti_d (__m128i, imm0_63);
+__m128i __lsx_vbitseti_h (__m128i, imm0_15);
+__m128i __lsx_vbitseti_w (__m128i, imm0_31);
+__m128i __lsx_vbitset_w (__m128i, __m128i);
+__m128i __lsx_vbsll_v (__m128i, imm0_31);
+__m128i __lsx_vbsrl_v (__m128i, imm0_31);
+__m128i __lsx_vclo_b (__m128i);
+__m128i __lsx_vclo_d (__m128i);
+__m128i __lsx_vclo_h (__m128i);
+__m128i __lsx_vclo_w (__m128i);
+__m128i __lsx_vclz_b (__m128i);
+__m128i __lsx_vclz_d (__m128i);
+__m128i __lsx_vclz_h (__m128i);
+__m128i __lsx_vclz_w (__m128i);
+__m128i __lsx_vdiv_b (__m128i, __m128i);
+__m128i __lsx_vdiv_bu (__m128i, __m128i);
+__m128i __lsx_vdiv_d (__m128i, __m128i);
+__m128i __lsx_vdiv_du (__m128i, __m128i);
+__m128i __lsx_vdiv_h (__m128i, __m128i);
+__m128i __lsx_vdiv_hu (__m128i, __m128i);
+__m128i __lsx_vdiv_w (__m128i, __m128i);
+__m128i __lsx_vdiv_wu (__m128i, __m128i);
+__m128i __lsx_vexth_du_wu (__m128i);
+__m128i __lsx_vexth_d_w (__m128i);
+__m128i __lsx_vexth_h_b (__m128i);
+__m128i __lsx_vexth_hu_bu (__m128i);
+__m128i __lsx_vexth_q_d (__m128i);
+__m128i __lsx_vexth_qu_du (__m128i);
+__m128i __lsx_vexth_w_h (__m128i);
+__m128i __lsx_vexth_wu_hu (__m128i);
+__m128i __lsx_vextl_q_d (__m128i);
+__m128i __lsx_vextl_qu_du (__m128i);
+__m128i __lsx_vextrins_b (__m128i, __m128i, imm0_255);
+__m128i __lsx_vextrins_d (__m128i, __m128i, imm0_255);
+__m128i __lsx_vextrins_h (__m128i, __m128i, imm0_255);
+__m128i __lsx_vextrins_w (__m128i, __m128i, imm0_255);
+__m128d __lsx_vfadd_d (__m128d, __m128d);
+__m128 __lsx_vfadd_s (__m128, __m128);
+__m128i __lsx_vfclass_d (__m128d);
+__m128i __lsx_vfclass_s (__m128);
+__m128i __lsx_vfcmp_caf_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_caf_s (__m128, __m128);
+__m128i __lsx_vfcmp_ceq_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_ceq_s (__m128, __m128);
+__m128i __lsx_vfcmp_cle_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_cle_s (__m128, __m128);
+__m128i __lsx_vfcmp_clt_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_clt_s (__m128, __m128);
+__m128i __lsx_vfcmp_cne_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_cne_s (__m128, __m128);
+__m128i __lsx_vfcmp_cor_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_cor_s (__m128, __m128);
+__m128i __lsx_vfcmp_cueq_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_cueq_s (__m128, __m128);
+__m128i __lsx_vfcmp_cule_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_cule_s (__m128, __m128);
+__m128i __lsx_vfcmp_cult_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_cult_s (__m128, __m128);
+__m128i __lsx_vfcmp_cun_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_cune_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_cune_s (__m128, __m128);
+__m128i __lsx_vfcmp_cun_s (__m128, __m128);
+__m128i __lsx_vfcmp_saf_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_saf_s (__m128, __m128);
+__m128i __lsx_vfcmp_seq_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_seq_s (__m128, __m128);
+__m128i __lsx_vfcmp_sle_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_sle_s (__m128, __m128);
+__m128i __lsx_vfcmp_slt_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_slt_s (__m128, __m128);
+__m128i __lsx_vfcmp_sne_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_sne_s (__m128, __m128);
+__m128i __lsx_vfcmp_sor_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_sor_s (__m128, __m128);
+__m128i __lsx_vfcmp_sueq_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_sueq_s (__m128, __m128);
+__m128i __lsx_vfcmp_sule_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_sule_s (__m128, __m128);
+__m128i __lsx_vfcmp_sult_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_sult_s (__m128, __m128);
+__m128i __lsx_vfcmp_sun_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_sune_d (__m128d, __m128d);
+__m128i __lsx_vfcmp_sune_s (__m128, __m128);
+__m128i __lsx_vfcmp_sun_s (__m128, __m128);
+__m128d __lsx_vfcvth_d_s (__m128);
+__m128i __lsx_vfcvt_h_s (__m128, __m128);
+__m128 __lsx_vfcvth_s_h (__m128i);
+__m128d __lsx_vfcvtl_d_s (__m128);
+__m128 __lsx_vfcvtl_s_h (__m128i);
+__m128 __lsx_vfcvt_s_d (__m128d, __m128d);
+__m128d __lsx_vfdiv_d (__m128d, __m128d);
+__m128 __lsx_vfdiv_s (__m128, __m128);
+__m128d __lsx_vffint_d_l (__m128i);
+__m128d __lsx_vffint_d_lu (__m128i);
+__m128d __lsx_vffinth_d_w (__m128i);
+__m128d __lsx_vffintl_d_w (__m128i);
+__m128 __lsx_vffint_s_l (__m128i, __m128i);
+__m128 __lsx_vffint_s_w (__m128i);
+__m128 __lsx_vffint_s_wu (__m128i);
+__m128d __lsx_vflogb_d (__m128d);
+__m128 __lsx_vflogb_s (__m128);
+__m128d __lsx_vfmadd_d (__m128d, __m128d, __m128d);
+__m128 __lsx_vfmadd_s (__m128, __m128, __m128);
+__m128d __lsx_vfmaxa_d (__m128d, __m128d);
+__m128 __lsx_vfmaxa_s (__m128, __m128);
+__m128d __lsx_vfmax_d (__m128d, __m128d);
+__m128 __lsx_vfmax_s (__m128, __m128);
+__m128d __lsx_vfmina_d (__m128d, __m128d);
+__m128 __lsx_vfmina_s (__m128, __m128);
+__m128d __lsx_vfmin_d (__m128d, __m128d);
+__m128 __lsx_vfmin_s (__m128, __m128);
+__m128d __lsx_vfmsub_d (__m128d, __m128d, __m128d);
+__m128 __lsx_vfmsub_s (__m128, __m128, __m128);
+__m128d __lsx_vfmul_d (__m128d, __m128d);
+__m128 __lsx_vfmul_s (__m128, __m128);
+__m128d __lsx_vfnmadd_d (__m128d, __m128d, __m128d);
+__m128 __lsx_vfnmadd_s (__m128, __m128, __m128);
+__m128d __lsx_vfnmsub_d (__m128d, __m128d, __m128d);
+__m128 __lsx_vfnmsub_s (__m128, __m128, __m128);
+__m128d __lsx_vfrecip_d (__m128d);
+__m128 __lsx_vfrecip_s (__m128);
+__m128d __lsx_vfrint_d (__m128d);
+__m128i __lsx_vfrintrm_d (__m128d);
+__m128i __lsx_vfrintrm_s (__m128);
+__m128i __lsx_vfrintrne_d (__m128d);
+__m128i __lsx_vfrintrne_s (__m128);
+__m128i __lsx_vfrintrp_d (__m128d);
+__m128i __lsx_vfrintrp_s (__m128);
+__m128i __lsx_vfrintrz_d (__m128d);
+__m128i __lsx_vfrintrz_s (__m128);
+__m128 __lsx_vfrint_s (__m128);
+__m128d __lsx_vfrsqrt_d (__m128d);
+__m128 __lsx_vfrsqrt_s (__m128);
+__m128i __lsx_vfrstp_b (__m128i, __m128i, __m128i);
+__m128i __lsx_vfrstp_h (__m128i, __m128i, __m128i);
+__m128i __lsx_vfrstpi_b (__m128i, __m128i, imm0_31);
+__m128i __lsx_vfrstpi_h (__m128i, __m128i, imm0_31);
+__m128d __lsx_vfsqrt_d (__m128d);
+__m128 __lsx_vfsqrt_s (__m128);
+__m128d __lsx_vfsub_d (__m128d, __m128d);
+__m128 __lsx_vfsub_s (__m128, __m128);
+__m128i __lsx_vftinth_l_s (__m128);
+__m128i __lsx_vftint_l_d (__m128d);
+__m128i __lsx_vftintl_l_s (__m128);
+__m128i __lsx_vftint_lu_d (__m128d);
+__m128i __lsx_vftintrmh_l_s (__m128);
+__m128i __lsx_vftintrm_l_d (__m128d);
+__m128i __lsx_vftintrml_l_s (__m128);
+__m128i __lsx_vftintrm_w_d (__m128d, __m128d);
+__m128i __lsx_vftintrm_w_s (__m128);
+__m128i __lsx_vftintrneh_l_s (__m128);
+__m128i __lsx_vftintrne_l_d (__m128d);
+__m128i __lsx_vftintrnel_l_s (__m128);
+__m128i __lsx_vftintrne_w_d (__m128d, __m128d);
+__m128i __lsx_vftintrne_w_s (__m128);
+__m128i __lsx_vftintrph_l_s (__m128);
+__m128i __lsx_vftintrp_l_d (__m128d);
+__m128i __lsx_vftintrpl_l_s (__m128);
+__m128i __lsx_vftintrp_w_d (__m128d, __m128d);
+__m128i __lsx_vftintrp_w_s (__m128);
+__m128i __lsx_vftintrzh_l_s (__m128);
+__m128i __lsx_vftintrz_l_d (__m128d);
+__m128i __lsx_vftintrzl_l_s (__m128);
+__m128i __lsx_vftintrz_lu_d (__m128d);
+__m128i __lsx_vftintrz_w_d (__m128d, __m128d);
+__m128i __lsx_vftintrz_w_s (__m128);
+__m128i __lsx_vftintrz_wu_s (__m128);
+__m128i __lsx_vftint_w_d (__m128d, __m128d);
+__m128i __lsx_vftint_w_s (__m128);
+__m128i __lsx_vftint_wu_s (__m128);
+__m128i __lsx_vhaddw_du_wu (__m128i, __m128i);
+__m128i __lsx_vhaddw_d_w (__m128i, __m128i);
+__m128i __lsx_vhaddw_h_b (__m128i, __m128i);
+__m128i __lsx_vhaddw_hu_bu (__m128i, __m128i);
+__m128i __lsx_vhaddw_q_d (__m128i, __m128i);
+__m128i __lsx_vhaddw_qu_du (__m128i, __m128i);
+__m128i __lsx_vhaddw_w_h (__m128i, __m128i);
+__m128i __lsx_vhaddw_wu_hu (__m128i, __m128i);
+__m128i __lsx_vhsubw_du_wu (__m128i, __m128i);
+__m128i __lsx_vhsubw_d_w (__m128i, __m128i);
+__m128i __lsx_vhsubw_h_b (__m128i, __m128i);
+__m128i __lsx_vhsubw_hu_bu (__m128i, __m128i);
+__m128i __lsx_vhsubw_q_d (__m128i, __m128i);
+__m128i __lsx_vhsubw_qu_du (__m128i, __m128i);
+__m128i __lsx_vhsubw_w_h (__m128i, __m128i);
+__m128i __lsx_vhsubw_wu_hu (__m128i, __m128i);
+__m128i __lsx_vilvh_b (__m128i, __m128i);
+__m128i __lsx_vilvh_d (__m128i, __m128i);
+__m128i __lsx_vilvh_h (__m128i, __m128i);
+__m128i __lsx_vilvh_w (__m128i, __m128i);
+__m128i __lsx_vilvl_b (__m128i, __m128i);
+__m128i __lsx_vilvl_d (__m128i, __m128i);
+__m128i __lsx_vilvl_h (__m128i, __m128i);
+__m128i __lsx_vilvl_w (__m128i, __m128i);
+__m128i __lsx_vinsgr2vr_b (__m128i, i32, imm0_15);
+__m128i __lsx_vinsgr2vr_d (__m128i, i64, imm0_1);
+__m128i __lsx_vinsgr2vr_h (__m128i, i32, imm0_7);
+__m128i __lsx_vinsgr2vr_w (__m128i, i32, imm0_3);
+__m128i __lsx_vld (void *, imm_n2048_2047);
+__m128i __lsx_vldi (imm_n1024_1023);
+__m128i __lsx_vldrepl_b (void *, imm_n2048_2047);
+__m128i __lsx_vldrepl_d (void *, imm_n256_255);
+__m128i __lsx_vldrepl_h (void *, imm_n1024_1023);
+__m128i __lsx_vldrepl_w (void *, imm_n512_511);
+__m128i __lsx_vldx (void *, i64);
+__m128i __lsx_vmadd_b (__m128i, __m128i, __m128i);
+__m128i __lsx_vmadd_d (__m128i, __m128i, __m128i);
+__m128i __lsx_vmadd_h (__m128i, __m128i, __m128i);
+__m128i __lsx_vmadd_w (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwev_d_w (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwev_d_wu (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwev_d_wu_w (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwev_h_b (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwev_h_bu (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwev_h_bu_b (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwev_q_d (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwev_q_du (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwev_q_du_d (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwev_w_h (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwev_w_hu (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwev_w_hu_h (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwod_d_w (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwod_d_wu (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwod_d_wu_w (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwod_h_b (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwod_h_bu (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwod_h_bu_b (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwod_q_d (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwod_q_du (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwod_q_du_d (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwod_w_h (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwod_w_hu (__m128i, __m128i, __m128i);
+__m128i __lsx_vmaddwod_w_hu_h (__m128i, __m128i, __m128i);
+__m128i __lsx_vmax_b (__m128i, __m128i);
+__m128i __lsx_vmax_bu (__m128i, __m128i);
+__m128i __lsx_vmax_d (__m128i, __m128i);
+__m128i __lsx_vmax_du (__m128i, __m128i);
+__m128i __lsx_vmax_h (__m128i, __m128i);
+__m128i __lsx_vmax_hu (__m128i, __m128i);
+__m128i __lsx_vmaxi_b (__m128i, imm_n16_15);
+__m128i __lsx_vmaxi_bu (__m128i, imm0_31);
+__m128i __lsx_vmaxi_d (__m128i, imm_n16_15);
+__m128i __lsx_vmaxi_du (__m128i, imm0_31);
+__m128i __lsx_vmaxi_h (__m128i, imm_n16_15);
+__m128i __lsx_vmaxi_hu (__m128i, imm0_31);
+__m128i __lsx_vmaxi_w (__m128i, imm_n16_15);
+__m128i __lsx_vmaxi_wu (__m128i, imm0_31);
+__m128i __lsx_vmax_w (__m128i, __m128i);
+__m128i __lsx_vmax_wu (__m128i, __m128i);
+__m128i __lsx_vmin_b (__m128i, __m128i);
+__m128i __lsx_vmin_bu (__m128i, __m128i);
+__m128i __lsx_vmin_d (__m128i, __m128i);
+__m128i __lsx_vmin_du (__m128i, __m128i);
+__m128i __lsx_vmin_h (__m128i, __m128i);
+__m128i __lsx_vmin_hu (__m128i, __m128i);
+__m128i __lsx_vmini_b (__m128i, imm_n16_15);
+__m128i __lsx_vmini_bu (__m128i, imm0_31);
+__m128i __lsx_vmini_d (__m128i, imm_n16_15);
+__m128i __lsx_vmini_du (__m128i, imm0_31);
+__m128i __lsx_vmini_h (__m128i, imm_n16_15);
+__m128i __lsx_vmini_hu (__m128i, imm0_31);
+__m128i __lsx_vmini_w (__m128i, imm_n16_15);
+__m128i __lsx_vmini_wu (__m128i, imm0_31);
+__m128i __lsx_vmin_w (__m128i, __m128i);
+__m128i __lsx_vmin_wu (__m128i, __m128i);
+__m128i __lsx_vmod_b (__m128i, __m128i);
+__m128i __lsx_vmod_bu (__m128i, __m128i);
+__m128i __lsx_vmod_d (__m128i, __m128i);
+__m128i __lsx_vmod_du (__m128i, __m128i);
+__m128i __lsx_vmod_h (__m128i, __m128i);
+__m128i __lsx_vmod_hu (__m128i, __m128i);
+__m128i __lsx_vmod_w (__m128i, __m128i);
+__m128i __lsx_vmod_wu (__m128i, __m128i);
+__m128i __lsx_vmskgez_b (__m128i);
+__m128i __lsx_vmskltz_b (__m128i);
+__m128i __lsx_vmskltz_d (__m128i);
+__m128i __lsx_vmskltz_h (__m128i);
+__m128i __lsx_vmskltz_w (__m128i);
+__m128i __lsx_vmsknz_b (__m128i);
+__m128i __lsx_vmsub_b (__m128i, __m128i, __m128i);
+__m128i __lsx_vmsub_d (__m128i, __m128i, __m128i);
+__m128i __lsx_vmsub_h (__m128i, __m128i, __m128i);
+__m128i __lsx_vmsub_w (__m128i, __m128i, __m128i);
+__m128i __lsx_vmuh_b (__m128i, __m128i);
+__m128i __lsx_vmuh_bu (__m128i, __m128i);
+__m128i __lsx_vmuh_d (__m128i, __m128i);
+__m128i __lsx_vmuh_du (__m128i, __m128i);
+__m128i __lsx_vmuh_h (__m128i, __m128i);
+__m128i __lsx_vmuh_hu (__m128i, __m128i);
+__m128i __lsx_vmuh_w (__m128i, __m128i);
+__m128i __lsx_vmuh_wu (__m128i, __m128i);
+__m128i __lsx_vmul_b (__m128i, __m128i);
+__m128i __lsx_vmul_d (__m128i, __m128i);
+__m128i __lsx_vmul_h (__m128i, __m128i);
+__m128i __lsx_vmul_w (__m128i, __m128i);
+__m128i __lsx_vmulwev_d_w (__m128i, __m128i);
+__m128i __lsx_vmulwev_d_wu (__m128i, __m128i);
+__m128i __lsx_vmulwev_d_wu_w (__m128i, __m128i);
+__m128i __lsx_vmulwev_h_b (__m128i, __m128i);
+__m128i __lsx_vmulwev_h_bu (__m128i, __m128i);
+__m128i __lsx_vmulwev_h_bu_b (__m128i, __m128i);
+__m128i __lsx_vmulwev_q_d (__m128i, __m128i);
+__m128i __lsx_vmulwev_q_du (__m128i, __m128i);
+__m128i __lsx_vmulwev_q_du_d (__m128i, __m128i);
+__m128i __lsx_vmulwev_w_h (__m128i, __m128i);
+__m128i __lsx_vmulwev_w_hu (__m128i, __m128i);
+__m128i __lsx_vmulwev_w_hu_h (__m128i, __m128i);
+__m128i __lsx_vmulwod_d_w (__m128i, __m128i);
+__m128i __lsx_vmulwod_d_wu (__m128i, __m128i);
+__m128i __lsx_vmulwod_d_wu_w (__m128i, __m128i);
+__m128i __lsx_vmulwod_h_b (__m128i, __m128i);
+__m128i __lsx_vmulwod_h_bu (__m128i, __m128i);
+__m128i __lsx_vmulwod_h_bu_b (__m128i, __m128i);
+__m128i __lsx_vmulwod_q_d (__m128i, __m128i);
+__m128i __lsx_vmulwod_q_du (__m128i, __m128i);
+__m128i __lsx_vmulwod_q_du_d (__m128i, __m128i);
+__m128i __lsx_vmulwod_w_h (__m128i, __m128i);
+__m128i __lsx_vmulwod_w_hu (__m128i, __m128i);
+__m128i __lsx_vmulwod_w_hu_h (__m128i, __m128i);
+__m128i __lsx_vneg_b (__m128i);
+__m128i __lsx_vneg_d (__m128i);
+__m128i __lsx_vneg_h (__m128i);
+__m128i __lsx_vneg_w (__m128i);
+__m128i __lsx_vnori_b (__m128i, imm0_255);
+__m128i __lsx_vnor_v (__m128i, __m128i);
+__m128i __lsx_vori_b (__m128i, imm0_255);
+__m128i __lsx_vorn_v (__m128i, __m128i);
+__m128i __lsx_vor_v (__m128i, __m128i);
+__m128i __lsx_vpackev_b (__m128i, __m128i);
+__m128i __lsx_vpackev_d (__m128i, __m128i);
+__m128i __lsx_vpackev_h (__m128i, __m128i);
+__m128i __lsx_vpackev_w (__m128i, __m128i);
+__m128i __lsx_vpackod_b (__m128i, __m128i);
+__m128i __lsx_vpackod_d (__m128i, __m128i);
+__m128i __lsx_vpackod_h (__m128i, __m128i);
+__m128i __lsx_vpackod_w (__m128i, __m128i);
+__m128i __lsx_vpcnt_b (__m128i);
+__m128i __lsx_vpcnt_d (__m128i); +__m128i __lsx_vpcnt_h (__m128i); +__m128i __lsx_vpcnt_w (__m128i); +__m128i __lsx_vpermi_w (__m128i, __m128i, imm0_255); +__m128i __lsx_vpickev_b (__m128i, __m128i); +__m128i __lsx_vpickev_d (__m128i, __m128i); +__m128i __lsx_vpickev_h (__m128i, __m128i); +__m128i __lsx_vpickev_w (__m128i, __m128i); +__m128i __lsx_vpickod_b (__m128i, __m128i); +__m128i __lsx_vpickod_d (__m128i, __m128i); +__m128i __lsx_vpickod_h (__m128i, __m128i); +__m128i __lsx_vpickod_w (__m128i, __m128i); +i32 __lsx_vpickve2gr_b (__m128i, imm0_15); +u32 __lsx_vpickve2gr_bu (__m128i, imm0_15); +i64 __lsx_vpickve2gr_d (__m128i, imm0_1); +u64 __lsx_vpickve2gr_du (__m128i, imm0_1); +i32 __lsx_vpickve2gr_h (__m128i, imm0_7); +u32 __lsx_vpickve2gr_hu (__m128i, imm0_7); +i32 __lsx_vpickve2gr_w (__m128i, imm0_3); +u32 __lsx_vpickve2gr_wu (__m128i, imm0_3); +__m128i __lsx_vreplgr2vr_b (i32); +__m128i __lsx_vreplgr2vr_d (i64); +__m128i __lsx_vreplgr2vr_h (i32); +__m128i __lsx_vreplgr2vr_w (i32); +__m128i __lsx_vrepli_b (imm_n512_511); +__m128i __lsx_vrepli_d (imm_n512_511); +__m128i __lsx_vrepli_h (imm_n512_511); +__m128i __lsx_vrepli_w (imm_n512_511); +__m128i __lsx_vreplve_b (__m128i, i32); +__m128i __lsx_vreplve_d (__m128i, i32); +__m128i __lsx_vreplve_h (__m128i, i32); +__m128i __lsx_vreplvei_b (__m128i, imm0_15); +__m128i __lsx_vreplvei_d (__m128i, imm0_1); +__m128i __lsx_vreplvei_h (__m128i, imm0_7); +__m128i __lsx_vreplvei_w (__m128i, imm0_3); +__m128i __lsx_vreplve_w (__m128i, i32); +__m128i __lsx_vrotr_b (__m128i, __m128i); +__m128i __lsx_vrotr_d (__m128i, __m128i); +__m128i __lsx_vrotr_h (__m128i, __m128i); +__m128i __lsx_vrotri_b (__m128i, imm0_7); +__m128i __lsx_vrotri_d (__m128i, imm0_63); +__m128i __lsx_vrotri_h (__m128i, imm0_15); +__m128i __lsx_vrotri_w (__m128i, imm0_31); +__m128i __lsx_vrotr_w (__m128i, __m128i); +__m128i __lsx_vsadd_b (__m128i, __m128i); +__m128i __lsx_vsadd_bu (__m128i, __m128i); +__m128i __lsx_vsadd_d (__m128i, __m128i); +__m128i 
__lsx_vsadd_du (__m128i, __m128i); +__m128i __lsx_vsadd_h (__m128i, __m128i); +__m128i __lsx_vsadd_hu (__m128i, __m128i); +__m128i __lsx_vsadd_w (__m128i, __m128i); +__m128i __lsx_vsadd_wu (__m128i, __m128i); +__m128i __lsx_vsat_b (__m128i, imm0_7); +__m128i __lsx_vsat_bu (__m128i, imm0_7); +__m128i __lsx_vsat_d (__m128i, imm0_63); +__m128i __lsx_vsat_du (__m128i, imm0_63); +__m128i __lsx_vsat_h (__m128i, imm0_15); +__m128i __lsx_vsat_hu (__m128i, imm0_15); +__m128i __lsx_vsat_w (__m128i, imm0_31); +__m128i __lsx_vsat_wu (__m128i, imm0_31); +__m128i __lsx_vseq_b (__m128i, __m128i); +__m128i __lsx_vseq_d (__m128i, __m128i); +__m128i __lsx_vseq_h (__m128i, __m128i); +__m128i __lsx_vseqi_b (__m128i, imm_n16_15); +__m128i __lsx_vseqi_d (__m128i, imm_n16_15); +__m128i __lsx_vseqi_h (__m128i, imm_n16_15); +__m128i __lsx_vseqi_w (__m128i, imm_n16_15); +__m128i __lsx_vseq_w (__m128i, __m128i); +__m128i __lsx_vshuf4i_b (__m128i, imm0_255); +__m128i __lsx_vshuf4i_d (__m128i, __m128i, imm0_255); +__m128i __lsx_vshuf4i_h (__m128i, imm0_255); +__m128i __lsx_vshuf4i_w (__m128i, imm0_255); +__m128i __lsx_vshuf_b (__m128i, __m128i, __m128i); +__m128i __lsx_vshuf_d (__m128i, __m128i, __m128i); +__m128i __lsx_vshuf_h (__m128i, __m128i, __m128i); +__m128i __lsx_vshuf_w (__m128i, __m128i, __m128i); +__m128i __lsx_vsigncov_b (__m128i, __m128i); +__m128i __lsx_vsigncov_d (__m128i, __m128i); +__m128i __lsx_vsigncov_h (__m128i, __m128i); +__m128i __lsx_vsigncov_w (__m128i, __m128i); +__m128i __lsx_vsle_b (__m128i, __m128i); +__m128i __lsx_vsle_bu (__m128i, __m128i); +__m128i __lsx_vsle_d (__m128i, __m128i); +__m128i __lsx_vsle_du (__m128i, __m128i); +__m128i __lsx_vsle_h (__m128i, __m128i); +__m128i __lsx_vsle_hu (__m128i, __m128i); +__m128i __lsx_vslei_b (__m128i, imm_n16_15); +__m128i __lsx_vslei_bu
(__m128i, imm0_31); +__m128i __lsx_vslei_d (__m128i, imm_n16_15); +__m128i __lsx_vslei_du (__m128i, imm0_31); +__m128i __lsx_vslei_h (__m128i, imm_n16_15); +__m128i __lsx_vslei_hu (__m128i, imm0_31); +__m128i __lsx_vslei_w (__m128i, imm_n16_15); +__m128i __lsx_vslei_wu (__m128i, imm0_31); +__m128i __lsx_vsle_w (__m128i, __m128i); +__m128i __lsx_vsle_wu (__m128i, __m128i); +__m128i __lsx_vsll_b (__m128i, __m128i); +__m128i __lsx_vsll_d (__m128i, __m128i); +__m128i __lsx_vsll_h (__m128i, __m128i); +__m128i __lsx_vslli_b (__m128i, imm0_7); +__m128i __lsx_vslli_d (__m128i, imm0_63); +__m128i __lsx_vslli_h (__m128i, imm0_15); +__m128i __lsx_vslli_w (__m128i, imm0_31); +__m128i __lsx_vsll_w (__m128i, __m128i); +__m128i __lsx_vsllwil_du_wu (__m128i, imm0_31); +__m128i __lsx_vsllwil_d_w (__m128i, imm0_31); +__m128i __lsx_vsllwil_h_b (__m128i, imm0_7); +__m128i __lsx_vsllwil_hu_bu (__m128i, imm0_7); +__m128i __lsx_vsllwil_w_h (__m128i, imm0_15); +__m128i __lsx_vsllwil_wu_hu (__m128i, imm0_15); +__m128i __lsx_vslt_b (__m128i, __m128i); +__m128i __lsx_vslt_bu (__m128i, __m128i); +__m128i __lsx_vslt_d (__m128i, __m128i); +__m128i __lsx_vslt_du (__m128i, __m128i); +__m128i __lsx_vslt_h (__m128i, __m128i); +__m128i __lsx_vslt_hu (__m128i, __m128i); +__m128i __lsx_vslti_b (__m128i, imm_n16_15); +__m128i __lsx_vslti_bu (__m128i, imm0_31); +__m128i __lsx_vslti_d (__m128i, imm_n16_15); +__m128i __lsx_vslti_du (__m128i, imm0_31); +__m128i __lsx_vslti_h (__m128i, imm_n16_15); +__m128i __lsx_vslti_hu (__m128i, imm0_31); +__m128i __lsx_vslti_w (__m128i, imm_n16_15); +__m128i __lsx_vslti_wu (__m128i, imm0_31); +__m128i __lsx_vslt_w (__m128i, __m128i); +__m128i __lsx_vslt_wu (__m128i, __m128i); +__m128i __lsx_vsra_b (__m128i, __m128i); +__m128i __lsx_vsra_d (__m128i, __m128i); +__m128i __lsx_vsra_h (__m128i, __m128i); +__m128i __lsx_vsrai_b (__m128i, imm0_7); +__m128i __lsx_vsrai_d (__m128i, imm0_63); +__m128i __lsx_vsrai_h (__m128i, imm0_15); +__m128i __lsx_vsrai_w (__m128i, imm0_31); 
+__m128i __lsx_vsran_b_h (__m128i, __m128i); +__m128i __lsx_vsran_h_w (__m128i, __m128i); +__m128i __lsx_vsrani_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vsrani_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vsrani_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vsrani_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vsran_w_d (__m128i, __m128i); +__m128i __lsx_vsrar_b (__m128i, __m128i); +__m128i __lsx_vsrar_d (__m128i, __m128i); +__m128i __lsx_vsrar_h (__m128i, __m128i); +__m128i __lsx_vsrari_b (__m128i, imm0_7); +__m128i __lsx_vsrari_d (__m128i, imm0_63); +__m128i __lsx_vsrari_h (__m128i, imm0_15); +__m128i __lsx_vsrari_w (__m128i, imm0_31); +__m128i __lsx_vsrarn_b_h (__m128i, __m128i); +__m128i __lsx_vsrarn_h_w (__m128i, __m128i); +__m128i __lsx_vsrarni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vsrarni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vsrarni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vsrarni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vsrarn_w_d (__m128i, __m128i); +__m128i __lsx_vsrar_w (__m128i, __m128i); +__m128i __lsx_vsra_w (__m128i, __m128i); +__m128i __lsx_vsrl_b (__m128i, __m128i); +__m128i __lsx_vsrl_d (__m128i, __m128i); +__m128i __lsx_vsrl_h (__m128i, __m128i); +__m128i __lsx_vsrli_b (__m128i, imm0_7); +__m128i __lsx_vsrli_d (__m128i, imm0_63); +__m128i __lsx_vsrli_h (__m128i, imm0_15); +__m128i __lsx_vsrli_w (__m128i, imm0_31); +__m128i __lsx_vsrln_b_h (__m128i, __m128i); +__m128i __lsx_vsrln_h_w (__m128i, __m128i); +__m128i __lsx_vsrlni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vsrlni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vsrlni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vsrlni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vsrln_w_d (__m128i, __m128i); +__m128i __lsx_vsrlr_b (__m128i, __m128i); +__m128i __lsx_vsrlr_d (__m128i, __m128i); +__m128i __lsx_vsrlr_h (__m128i, __m128i); +__m128i __lsx_vsrlri_b (__m128i, imm0_7); +__m128i __lsx_vsrlri_d (__m128i, imm0_63); +__m128i __lsx_vsrlri_h
(__m128i, imm0_15); +__m128i __lsx_vsrlri_w (__m128i, imm0_31); +__m128i __lsx_vsrlrn_b_h (__m128i, __m128i); +__m128i __lsx_vsrlrn_h_w (__m128i, __m128i); +__m128i __lsx_vsrlrni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vsrlrni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vsrlrni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vsrlrni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vsrlrn_w_d (__m128i, __m128i); +__m128i __lsx_vsrlr_w (__m128i, __m128i); +__m128i __lsx_vsrl_w (__m128i, __m128i); +__m128i __lsx_vssran_b_h (__m128i, __m128i); +__m128i __lsx_vssran_bu_h (__m128i, __m128i); +__m128i __lsx_vssran_hu_w (__m128i, __m128i); +__m128i __lsx_vssran_h_w (__m128i, __m128i); +__m128i __lsx_vssrani_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrani_bu_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrani_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrani_du_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrani_hu_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrani_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrani_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrani_wu_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssran_w_d (__m128i, __m128i); +__m128i __lsx_vssran_wu_d (__m128i, __m128i); +__m128i __lsx_vssrarn_b_h (__m128i, __m128i); +__m128i __lsx_vssrarn_bu_h (__m128i, __m128i); +__m128i __lsx_vssrarn_hu_w (__m128i, __m128i); +__m128i __lsx_vssrarn_h_w (__m128i, __m128i); +__m128i __lsx_vssrarni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrarni_bu_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrarni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrarni_du_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrarni_hu_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrarni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrarni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrarni_wu_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrarn_w_d (__m128i, __m128i); +__m128i __lsx_vssrarn_wu_d (__m128i, __m128i); +__m128i
__lsx_vssrln_b_h (__m128i, __m128i); +__m128i __lsx_vssrln_bu_h (__m128i, __m128i); +__m128i __lsx_vssrln_hu_w (__m128i, __m128i); +__m128i __lsx_vssrln_h_w (__m128i, __m128i); +__m128i __lsx_vssrlni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrlni_bu_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrlni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrlni_du_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrlni_hu_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrlni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrlni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrlni_wu_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrln_w_d (__m128i, __m128i); +__m128i __lsx_vssrln_wu_d (__m128i, __m128i); +__m128i __lsx_vssrlrn_b_h (__m128i, __m128i); +__m128i __lsx_vssrlrn_bu_h (__m128i, __m128i); +__m128i __lsx_vssrlrn_hu_w (__m128i, __m128i); +__m128i __lsx_vssrlrn_h_w (__m128i, __m128i); +__m128i __lsx_vssrlrni_b_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrlrni_bu_h (__m128i, __m128i, imm0_15); +__m128i __lsx_vssrlrni_d_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrlrni_du_q (__m128i, __m128i, imm0_127); +__m128i __lsx_vssrlrni_hu_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrlrni_h_w (__m128i, __m128i, imm0_31); +__m128i __lsx_vssrlrni_w_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrlrni_wu_d (__m128i, __m128i, imm0_63); +__m128i __lsx_vssrlrn_w_d (__m128i, __m128i); +__m128i __lsx_vssrlrn_wu_d (__m128i, __m128i); +__m128i __lsx_vssub_b (__m128i, __m128i); +__m128i __lsx_vssub_bu (__m128i, __m128i); +__m128i __lsx_vssub_d (__m128i, __m128i); +__m128i __lsx_vssub_du (__m128i, __m128i); +__m128i __lsx_vssub_h (__m128i, __m128i); +__m128i __lsx_vssub_hu (__m128i, __m128i); +__m128i __lsx_vssub_w (__m128i, __m128i); +__m128i __lsx_vssub_wu (__m128i, __m128i); +void __lsx_vst (__m128i, void *, imm_n2048_2047); +void __lsx_vstelm_b (__m128i, void *, imm_n128_127, idx); +void __lsx_vstelm_d (__m128i, void *, imm_n128_127, idx); +void
__lsx_vstelm_h (__m128i, void *, imm_n128_127, idx); +void __lsx_vstelm_w (__m128i, void *, imm_n128_127, idx); +void __lsx_vstx (__m128i, void *, i64); +__m128i __lsx_vsub_b (__m128i, __m128i); +__m128i __lsx_vsub_d (__m128i, __m128i); +__m128i __lsx_vsub_h (__m128i, __m128i); +__m128i __lsx_vsubi_bu (__m128i, imm0_31); +__m128i __lsx_vsubi_du (__m128i, imm0_31); +__m128i __lsx_vsubi_hu (__m128i, imm0_31); +__m128i __lsx_vsubi_wu (__m128i, imm0_31); +__m128i __lsx_vsub_q (__m128i, __m128i); +__m128i __lsx_vsub_w (__m128i, __m128i); +__m128i __lsx_vsubwev_d_w (__m128i, __m128i); +__m128i __lsx_vsubwev_d_wu (__m128i, __m128i); +__m128i __lsx_vsubwev_h_b (__m128i, __m128i); +__m128i __lsx_vsubwev_h_bu (__m128i, __m128i); +__m128i __lsx_vsubwev_q_d (__m128i, __m128i); +__m128i __lsx_vsubwev_q_du (__m128i, __m128i); +__m128i __lsx_vsubwev_w_h (__m128i, __m128i); +__m128i __lsx_vsubwev_w_hu (__m128i, __m128i); +__m128i __lsx_vsubwod_d_w (__m128i, __m128i); +__m128i __lsx_vsubwod_d_wu (__m128i, __m128i); +__m128i __lsx_vsubwod_h_b (__m128i, __m128i); +__m128i __lsx_vsubwod_h_bu (__m128i, __m128i); +__m128i __lsx_vsubwod_q_d (__m128i, __m128i); +__m128i __lsx_vsubwod_q_du (__m128i, __m128i); +__m128i __lsx_vsubwod_w_h (__m128i, __m128i); +__m128i __lsx_vsubwod_w_hu (__m128i, __m128i); +__m128i __lsx_vxori_b (__m128i, imm0_255); +__m128i __lsx_vxor_v (__m128i, __m128i); +@end smallexample + +@node LoongArch ASX Vector Intrinsics +@subsection LoongArch ASX Vector Intrinsics + + Currently, GCC provides support for 128-bit and 256-bit vector operations on +the LoongArch architecture. When using @code{256-bit} vector functions, you +need to include the header file @code{lasxintrin.h} and use the compile option +@code{-mlasx} to enable vector operations.
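The 256-bit types these intrinsics operate on are ordinary GCC generic vectors, so their lane-wise semantics can be sketched portably on any target. The snippet below is only an illustration: the typedef name @code{v4i64} and the helpers @code{add_d} and @code{seq_d} are hypothetical stand-ins that mimic what @code{__lasx_xvadd_d} and @code{__lasx_xvseq_d} compute per 64-bit lane; the real intrinsics require @code{-mlasx} and a LoongArch target.

```c
/* Portable sketch of LASX lane-wise semantics using GCC's generic
   vector extension.  The names v4i64, add_d and seq_d are hypothetical;
   on LoongArch the corresponding intrinsics would be __lasx_xvadd_d and
   __lasx_xvseq_d, compiled with -mlasx.  */
typedef long long v4i64 __attribute__ ((__vector_size__ (32), __may_alias__));

/* Lane-wise 64-bit addition, as __lasx_xvadd_d performs.  */
static v4i64
add_d (v4i64 a, v4i64 b)
{
  return a + b;
}

/* Lane-wise equality: a lane becomes all-ones (-1) where a == b and 0
   elsewhere, matching the mask convention of __lasx_xvseq_d.  */
static v4i64
seq_d (v4i64 a, v4i64 b)
{
  return a == b;
}
```

Each vector holds four 64-bit lanes, so @code{add_d} applied to @{1,2,3,4@} and @{10,20,30,40@} yields @{11,22,33,44@}.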
They can be defined in C as +follows: + +@smallexample +typedef float __m256 __attribute__ ((__vector_size__ (32), __may_alias__)); +typedef long long __m256i __attribute__ ((__vector_size__ (32), __may_alias__)); +typedef double __m256d __attribute__ ((__vector_size__ (32), __may_alias__)); +typedef int i32; +typedef unsigned int u32; +typedef long int i64; +typedef unsigned long int u64; +@end smallexample + + @code{__m256} is a vector type based on @code{float}, @code{256 bits} (32 +bytes) long; it carries the @code{__may_alias__} attribute, which exempts it +from the compiler's strict-aliasing rules. Similarly, @code{__m256i} and +@code{__m256d} are vector types based on long long and double respectively. +@code{i32} and @code{i64} are used as aliases for signed integers, while +@code{u32} and @code{u64} are used as aliases for unsigned integers. + + Also, some built-in functions prefer or require immediate numbers as +parameters, because the corresponding instructions either accept both immediate +numbers and register operands, or accept immediate numbers only. The immediate +parameters are listed as follows. + +@smallexample +* imm0_1, an integer literal in range 0 to 1. +* imm0_3, an integer literal in range 0 to 3. +* imm0_7, an integer literal in range 0 to 7. +* imm0_15, an integer literal in range 0 to 15. +* imm0_31, an integer literal in range 0 to 31. +* imm0_63, an integer literal in range 0 to 63. +* imm0_127, an integer literal in range 0 to 127. +* imm0_255, an integer literal in range 0 to 255. +* imm_n16_15, an integer literal in range -16 to 15. +* imm_n128_127, an integer literal in range -128 to 127. +* imm_n256_255, an integer literal in range -256 to 255. +* imm_n512_511, an integer literal in range -512 to 511. +* imm_n1024_1023, an integer literal in range -1024 to 1023. +* imm_n2048_2047, an integer literal in range -2048 to 2047.
+@end smallexample + + In the builtin functions implemented on the LoongArch architecture, there are +some special points to note, as shown below: + + * For instructions whose destination operand is also a source operand, the +first parameter of the builtin function call is used as the destination +operand. + + * The vector instruction @code{xvldi vd,i13} is implemented according to +whether the highest bit of the immediate is 0 or 1, as shown in the following +two cases. + +@smallexample +a. When the highest bit of the immediate number (i13) is 0: + Different values of bits @code{i13[11:10]} correspond to the following four +instructions. + If @code{i13[11:10]} is set to 00, the @code{xvrepli.b vd,s10} instruction is +used to implement the function; + If @code{i13[11:10]} is set to 01, the @code{xvrepli.h vd,s10} instruction is +used to implement the function; + If @code{i13[11:10]} is set to 10, the @code{xvrepli.w vd,s10} instruction is +used to implement the function; + If @code{i13[11:10]} is set to 11, the @code{xvrepli.d vd,s10} instruction is +used to implement the function; + + In the above four instructions, @code{s10} represents the 10-bit signed +immediate @code{i13[9:0]}. + +b. When the highest bit of the immediate number (i13) is 1: + The builtin function for this case has not been implemented in the compiler. +@end smallexample + + * In order to support the @code{xvseteqz}-class instructions on the LoongArch +architecture, a number of builtin functions were added to the GCC compiler, +each implemented by combining two instructions. +@smallexample + The corresponding assembly instructions for the @code{__lasx_xbz_v} function +are @code{xvseteqz.v} and @code{bcnez}. + The corresponding assembly instructions for the @code{__lasx_xbz_b} function +are @code{xvsetanyeqz.b} and @code{bcnez}. + The corresponding assembly instructions for the @code{__lasx_xbz_h} function +are @code{xvsetanyeqz.h} and @code{bcnez}. + The corresponding assembly instructions for the @code{__lasx_xbz_w} function +are @code{xvsetanyeqz.w} and @code{bcnez}.
+ The corresponding assembly instructions for the @code{__lasx_xbz_d} function +are @code{xvsetanyeqz.d} and @code{bcnez}. + The corresponding assembly instructions for the @code{__lasx_xbnz_v} function +are @code{xvsetnez.v} and @code{bcnez}. + The corresponding assembly instructions for the @code{__lasx_xbnz_b} function +are @code{xvsetallnez.b} and @code{bcnez}. + The corresponding assembly instructions for the @code{__lasx_xbnz_h} function +are @code{xvsetallnez.h} and @code{bcnez}. + The corresponding assembly instructions for the @code{__lasx_xbnz_w} function +are @code{xvsetallnez.w} and @code{bcnez}. +@end smallexample + + The intrinsics provided are listed below: + +@smallexample +__m256i __lasx_vext2xv_d_b (__m256i); +__m256i __lasx_vext2xv_d_h (__m256i); +__m256i __lasx_vext2xv_du_bu (__m256i); +__m256i __lasx_vext2xv_du_hu (__m256i); +__m256i __lasx_vext2xv_du_wu (__m256i); +__m256i __lasx_vext2xv_d_w (__m256i); +__m256i __lasx_vext2xv_h_b (__m256i); +__m256i __lasx_vext2xv_hu_bu (__m256i); +__m256i __lasx_vext2xv_w_b (__m256i); +__m256i __lasx_vext2xv_w_h (__m256i); +__m256i __lasx_vext2xv_wu_bu (__m256i); +__m256i __lasx_vext2xv_wu_hu (__m256i); +i32 __lasx_xbnz_b (__m256i); +i32 __lasx_xbnz_d (__m256i); +i32 __lasx_xbnz_h (__m256i); +i32 __lasx_xbnz_v (__m256i); +i32 __lasx_xbnz_w (__m256i); +i32 __lasx_xbz_b (__m256i); +i32 __lasx_xbz_d (__m256i); +i32 __lasx_xbz_h (__m256i); +i32 __lasx_xbz_v (__m256i); +i32 __lasx_xbz_w (__m256i); +__m256i __lasx_xvabsd_b (__m256i, __m256i); +__m256i __lasx_xvabsd_bu (__m256i, __m256i); +__m256i __lasx_xvabsd_d (__m256i, __m256i); +__m256i __lasx_xvabsd_du (__m256i, __m256i); +__m256i __lasx_xvabsd_h (__m256i, __m256i); +__m256i __lasx_xvabsd_hu (__m256i, __m256i); +__m256i __lasx_xvabsd_w (__m256i, __m256i); +__m256i __lasx_xvabsd_wu (__m256i, __m256i); +__m256i __lasx_xvadda_b (__m256i, __m256i);
+__m256i __lasx_xvadda_d (__m256i, __m256i); +__m256i __lasx_xvadda_h (__m256i, __m256i); +__m256i __lasx_xvadda_w (__m256i, __m256i); +__m256i __lasx_xvadd_b (__m256i, __m256i); +__m256i __lasx_xvadd_d (__m256i, __m256i); +__m256i __lasx_xvadd_h (__m256i, __m256i); +__m256i __lasx_xvaddi_bu (__m256i, imm0_31); +__m256i __lasx_xvaddi_du (__m256i, imm0_31); +__m256i __lasx_xvaddi_hu (__m256i, imm0_31); +__m256i __lasx_xvaddi_wu (__m256i, imm0_31); +__m256i __lasx_xvadd_q (__m256i, __m256i); +__m256i __lasx_xvadd_w (__m256i, __m256i); +__m256i __lasx_xvaddwev_d_w (__m256i, __m256i); +__m256i __lasx_xvaddwev_d_wu (__m256i, __m256i); +__m256i __lasx_xvaddwev_d_wu_w (__m256i, __m256i); +__m256i __lasx_xvaddwev_h_b (__m256i, __m256i); +__m256i __lasx_xvaddwev_h_bu (__m256i, __m256i); +__m256i __lasx_xvaddwev_h_bu_b (__m256i, __m256i); +__m256i __lasx_xvaddwev_q_d (__m256i, __m256i); +__m256i __lasx_xvaddwev_q_du (__m256i, __m256i); +__m256i __lasx_xvaddwev_q_du_d (__m256i, __m256i); +__m256i __lasx_xvaddwev_w_h (__m256i, __m256i); +__m256i __lasx_xvaddwev_w_hu (__m256i, __m256i); +__m256i __lasx_xvaddwev_w_hu_h (__m256i, __m256i); +__m256i __lasx_xvaddwod_d_w (__m256i, __m256i); +__m256i __lasx_xvaddwod_d_wu (__m256i, __m256i); +__m256i __lasx_xvaddwod_d_wu_w (__m256i, __m256i); +__m256i __lasx_xvaddwod_h_b (__m256i, __m256i); +__m256i __lasx_xvaddwod_h_bu (__m256i, __m256i); +__m256i __lasx_xvaddwod_h_bu_b (__m256i, __m256i); +__m256i __lasx_xvaddwod_q_d (__m256i, __m256i); +__m256i __lasx_xvaddwod_q_du (__m256i, __m256i); +__m256i __lasx_xvaddwod_q_du_d (__m256i, __m256i); +__m256i __lasx_xvaddwod_w_h (__m256i, __m256i); +__m256i __lasx_xvaddwod_w_hu (__m256i, __m256i); +__m256i __lasx_xvaddwod_w_hu_h (__m256i, __m256i); +__m256i __lasx_xvandi_b (__m256i, imm0_255); +__m256i __lasx_xvandn_v (__m256i, __m256i); +__m256i __lasx_xvand_v (__m256i, __m256i); +__m256i __lasx_xvavg_b (__m256i, __m256i); +__m256i __lasx_xvavg_bu (__m256i, __m256i); +__m256i __lasx_xvavg_d 
(__m256i, __m256i); +__m256i __lasx_xvavg_du (__m256i, __m256i); +__m256i __lasx_xvavg_h (__m256i, __m256i); +__m256i __lasx_xvavg_hu (__m256i, __m256i); +__m256i __lasx_xvavgr_b (__m256i, __m256i); +__m256i __lasx_xvavgr_bu (__m256i, __m256i); +__m256i __lasx_xvavgr_d (__m256i, __m256i); +__m256i __lasx_xvavgr_du (__m256i, __m256i); +__m256i __lasx_xvavgr_h (__m256i, __m256i); +__m256i __lasx_xvavgr_hu (__m256i, __m256i); +__m256i __lasx_xvavgr_w (__m256i, __m256i); +__m256i __lasx_xvavgr_wu (__m256i, __m256i); +__m256i __lasx_xvavg_w (__m256i, __m256i); +__m256i __lasx_xvavg_wu (__m256i, __m256i); +__m256i __lasx_xvbitclr_b (__m256i, __m256i); +__m256i __lasx_xvbitclr_d (__m256i, __m256i); +__m256i __lasx_xvbitclr_h (__m256i, __m256i); +__m256i __lasx_xvbitclri_b (__m256i, imm0_7); +__m256i __lasx_xvbitclri_d (__m256i, imm0_63); +__m256i __lasx_xvbitclri_h (__m256i, imm0_15); +__m256i __lasx_xvbitclri_w (__m256i, imm0_31); +__m256i __lasx_xvbitclr_w (__m256i, __m256i); +__m256i __lasx_xvbitrev_b (__m256i, __m256i); +__m256i __lasx_xvbitrev_d (__m256i, __m256i); +__m256i __lasx_xvbitrev_h (__m256i, __m256i); +__m256i __lasx_xvbitrevi_b (__m256i, imm0_7); +__m256i __lasx_xvbitrevi_d (__m256i, imm0_63); +__m256i __lasx_xvbitrevi_h (__m256i, imm0_15); +__m256i __lasx_xvbitrevi_w (__m256i, imm0_31); +__m256i __lasx_xvbitrev_w (__m256i, __m256i); +__m256i __lasx_xvbitseli_b (__m256i, __m256i, imm0_255); +__m256i __lasx_xvbitsel_v (__m256i, __m256i, __m256i); +__m256i __lasx_xvbitset_b (__m256i, __m256i); +__m256i __lasx_xvbitset_d (__m256i, __m256i); +__m256i __lasx_xvbitset_h (__m256i, __m256i); +__m256i __lasx_xvbitseti_b (__m256i, imm0_7); +__m256i __lasx_xvbitseti_d (__m256i, imm0_63); +__m256i __lasx_xvbitseti_h (__m256i, imm0_15); +__m256i __lasx_xvbitseti_w (__m256i, imm0_31); +__m256i __lasx_xvbitset_w (__m256i, __m256i); +__m256i __lasx_xvbsll_v (__m256i, imm0_31); +__m256i __lasx_xvbsrl_v (__m256i, imm0_31); +__m256i __lasx_xvclo_b (__m256i); +__m256i 
__lasx_xvclo_d (__m256i); +__m256i __lasx_xvclo_h (__m256i); +__m256i __lasx_xvclo_w (__m256i); +__m256i __lasx_xvclz_b (__m256i); +__m256i __lasx_xvclz_d (__m256i); +__m256i __lasx_xvclz_h (__m256i); +__m256i __lasx_xvclz_w (__m256i); +__m256i __lasx_xvdiv_b (__m256i, __m256i); +__m256i __lasx_xvdiv_bu (__m256i, __m256i); +__m256i __lasx_xvdiv_d (__m256i, __m256i); +__m256i __lasx_xvdiv_du (__m256i, __m256i); +__m256i __lasx_xvdiv_h (__m256i, __m256i); +__m256i __lasx_xvdiv_hu (__m256i, __m256i); +__m256i __lasx_xvdiv_w (__m256i, __m256i); +__m256i __lasx_xvdiv_wu (__m256i, __m256i); +__m256i __lasx_xvexth_du_wu (__m256i); +__m256i __lasx_xvexth_d_w (__m256i); +__m256i __lasx_xvexth_h_b (__m256i); +__m256i __lasx_xvexth_hu_bu (__m256i); +__m256i __lasx_xvexth_q_d (__m256i); +__m256i __lasx_xvexth_qu_du (__m256i); +__m256i __lasx_xvexth_w_h (__m256i); +__m256i __lasx_xvexth_wu_hu (__m256i); +__m256i __lasx_xvextl_q_d (__m256i); +__m256i __lasx_xvextl_qu_du (__m256i); +__m256i __lasx_xvextrins_b (__m256i, __m256i, imm0_255); +__m256i __lasx_xvextrins_d (__m256i, __m256i, imm0_255); +__m256i __lasx_xvextrins_h (__m256i, __m256i, imm0_255); +__m256i __lasx_xvextrins_w (__m256i, __m256i, imm0_255); +__m256d __lasx_xvfadd_d (__m256d, __m256d); +__m256 __lasx_xvfadd_s (__m256, __m256); +__m256i __lasx_xvfclass_d (__m256d); +__m256i __lasx_xvfclass_s (__m256); +__m256i __lasx_xvfcmp_caf_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_caf_s (__m256, __m256); +__m256i __lasx_xvfcmp_ceq_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_ceq_s (__m256, __m256); +__m256i __lasx_xvfcmp_cle_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cle_s (__m256, __m256); +__m256i __lasx_xvfcmp_clt_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_clt_s (__m256, __m256); +__m256i __lasx_xvfcmp_cne_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cne_s (__m256, __m256); +__m256i __lasx_xvfcmp_cor_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cor_s (__m256, __m256); +__m256i __lasx_xvfcmp_cueq_d (__m256d, 
__m256d); +__m256i __lasx_xvfcmp_cueq_s (__m256, __m256); +__m256i __lasx_xvfcmp_cule_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cule_s (__m256, __m256); +__m256i __lasx_xvfcmp_cult_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cult_s (__m256, __m256); +__m256i __lasx_xvfcmp_cun_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cune_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_cune_s (__m256, __m256); +__m256i __lasx_xvfcmp_cun_s (__m256, __m256); +__m256i __lasx_xvfcmp_saf_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_saf_s (__m256, __m256); +__m256i __lasx_xvfcmp_seq_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_seq_s (__m256, __m256); +__m256i __lasx_xvfcmp_sle_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sle_s (__m256, __m256); +__m256i __lasx_xvfcmp_slt_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_slt_s (__m256, __m256); +__m256i __lasx_xvfcmp_sne_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sne_s (__m256, __m256); +__m256i __lasx_xvfcmp_sor_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sor_s (__m256, __m256); +__m256i __lasx_xvfcmp_sueq_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sueq_s (__m256, __m256); +__m256i __lasx_xvfcmp_sule_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sule_s (__m256, __m256); +__m256i __lasx_xvfcmp_sult_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sult_s (__m256, __m256); +__m256i __lasx_xvfcmp_sun_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sune_d (__m256d, __m256d); +__m256i __lasx_xvfcmp_sune_s (__m256, __m256); +__m256i __lasx_xvfcmp_sun_s (__m256, __m256); +__m256d __lasx_xvfcvth_d_s (__m256); +__m256i __lasx_xvfcvt_h_s (__m256, __m256); +__m256 __lasx_xvfcvth_s_h (__m256i); +__m256d __lasx_xvfcvtl_d_s (__m256); +__m256 __lasx_xvfcvtl_s_h (__m256i); +__m256 __lasx_xvfcvt_s_d (__m256d, __m256d); +__m256d __lasx_xvfdiv_d (__m256d, __m256d); +__m256 __lasx_xvfdiv_s (__m256, __m256); +__m256d __lasx_xvffint_d_l (__m256i); +__m256d __lasx_xvffint_d_lu (__m256i); +__m256d __lasx_xvffinth_d_w (__m256i); +__m256d __lasx_xvffintl_d_w 
(__m256i); +__m256 __lasx_xvffint_s_l (__m256i, __m256i); +__m256 __lasx_xvffint_s_w (__m256i); +__m256 __lasx_xvffint_s_wu (__m256i); +__m256d __lasx_xvflogb_d (__m256d); +__m256 __lasx_xvflogb_s (__m256); +__m256d __lasx_xvfmadd_d (__m256d, __m256d, __m256d); +__m256 __lasx_xvfmadd_s (__m256, __m256, __m256); +__m256d __lasx_xvfmaxa_d (__m256d, __m256d); +__m256 __lasx_xvfmaxa_s (__m256, __m256); +__m256d __lasx_xvfmax_d (__m256d, __m256d); +__m256 __lasx_xvfmax_s (__m256, __m256); +__m256d __lasx_xvfmina_d (__m256d, __m256d); +__m256 __lasx_xvfmina_s (__m256, __m256); +__m256d __lasx_xvfmin_d (__m256d, __m256d); +__m256 __lasx_xvfmin_s (__m256, __m256); +__m256d __lasx_xvfmsub_d (__m256d, __m256d, __m256d); +__m256 __lasx_xvfmsub_s (__m256, __m256, __m256); +__m256d __lasx_xvfmul_d (__m256d, __m256d); +__m256 __lasx_xvfmul_s (__m256, __m256); +__m256d __lasx_xvfnmadd_d (__m256d, __m256d, __m256d); +__m256 __lasx_xvfnmadd_s (__m256, __m256, __m256); +__m256d __lasx_xvfnmsub_d (__m256d, __m256d, __m256d); +__m256 __lasx_xvfnmsub_s (__m256, __m256, __m256); +__m256d __lasx_xvfrecip_d (__m256d); +__m256 __lasx_xvfrecip_s (__m256); +__m256d __lasx_xvfrint_d (__m256d); +__m256i __lasx_xvfrintrm_d (__m256d); +__m256i __lasx_xvfrintrm_s (__m256); +__m256i __lasx_xvfrintrne_d (__m256d); +__m256i __lasx_xvfrintrne_s (__m256); +__m256i __lasx_xvfrintrp_d (__m256d); +__m256i __lasx_xvfrintrp_s (__m256); +__m256i __lasx_xvfrintrz_d (__m256d); +__m256i __lasx_xvfrintrz_s (__m256); +__m256 __lasx_xvfrint_s (__m256); +__m256d __lasx_xvfrsqrt_d (__m256d); +__m256 __lasx_xvfrsqrt_s (__m256); +__m256i __lasx_xvfrstp_b (__m256i, __m256i, __m256i); +__m256i __lasx_xvfrstp_h (__m256i, __m256i, __m256i); +__m256i __lasx_xvfrstpi_b (__m256i, __m256i, imm0_31); +__m256i __lasx_xvfrstpi_h (__m256i, __m256i, imm0_31); +__m256d __lasx_xvfsqrt_d (__m256d); +__m256 __lasx_xvfsqrt_s (__m256); +__m256d __lasx_xvfsub_d (__m256d, __m256d); +__m256 __lasx_xvfsub_s (__m256, __m256); +__m256i 
__lasx_xvftinth_l_s (__m256); +__m256i __lasx_xvftint_l_d (__m256d); +__m256i __lasx_xvftintl_l_s (__m256); +__m256i __lasx_xvftint_lu_d (__m256d); +__m256i __lasx_xvftintrmh_l_s (__m256); +__m256i __lasx_xvftintrm_l_d (__m256d); +__m256i __lasx_xvftintrml_l_s (__m256); +__m256i __lasx_xvftintrm_w_d (__m256d, __m256d); +__m256i __lasx_xvftintrm_w_s (__m256); +__m256i __lasx_xvftintrneh_l_s (__m256); +__m256i __lasx_xvftintrne_l_d (__m256d); +__m256i __lasx_xvftintrnel_l_s (__m256); +__m256i __lasx_xvftintrne_w_d (__m256d, __m256d); +__m256i __lasx_xvftintrne_w_s (__m256); +__m256i __lasx_xvftintrph_l_s (__m256); +__m256i __lasx_xvftintrp_l_d (__m256d); +__m256i __lasx_xvftintrpl_l_s (__m256); +__m256i __lasx_xvftintrp_w_d (__m256d, __m256d); +__m256i __lasx_xvftintrp_w_s (__m256); +__m256i __lasx_xvftintrzh_l_s (__m256); +__m256i __lasx_xvftintrz_l_d (__m256d); +__m256i __lasx_xvftintrzl_l_s (__m256); +__m256i __lasx_xvftintrz_lu_d (__m256d); +__m256i __lasx_xvftintrz_w_d (__m256d, __m256d); +__m256i __lasx_xvftintrz_w_s (__m256); +__m256i __lasx_xvftintrz_wu_s (__m256); +__m256i __lasx_xvftint_w_d (__m256d, __m256d); +__m256i __lasx_xvftint_w_s (__m256); +__m256i __lasx_xvftint_wu_s (__m256); +__m256i __lasx_xvhaddw_du_wu (__m256i, __m256i); +__m256i __lasx_xvhaddw_d_w (__m256i, __m256i); +__m256i __lasx_xvhaddw_h_b (__m256i, __m256i); +__m256i __lasx_xvhaddw_hu_bu (__m256i, __m256i); +__m256i __lasx_xvhaddw_q_d (__m256i, __m256i); +__m256i __lasx_xvhaddw_qu_du (__m256i, __m256i); +__m256i __lasx_xvhaddw_w_h (__m256i, __m256i); +__m256i __lasx_xvhaddw_wu_hu (__m256i, __m256i); +__m256i __lasx_xvhsubw_du_wu (__m256i, __m256i); +__m256i __lasx_xvhsubw_d_w (__m256i, __m256i); +__m256i __lasx_xvhsubw_h_b (__m256i, __m256i); +__m256i __lasx_xvhsubw_hu_bu (__m256i, __m256i); +__m256i __lasx_xvhsubw_q_d (__m256i, __m256i); +__m256i __lasx_xvhsubw_qu_du (__m256i, __m256i); +__m256i __lasx_xvhsubw_w_h (__m256i, __m256i); +__m256i __lasx_xvhsubw_wu_hu (__m256i, __m256i); 
+__m256i __lasx_xvilvh_b (__m256i, __m256i);
+__m256i __lasx_xvilvh_d (__m256i, __m256i);
+__m256i __lasx_xvilvh_h (__m256i, __m256i);
+__m256i __lasx_xvilvh_w (__m256i, __m256i);
+__m256i __lasx_xvilvl_b (__m256i, __m256i);
+__m256i __lasx_xvilvl_d (__m256i, __m256i);
+__m256i __lasx_xvilvl_h (__m256i, __m256i);
+__m256i __lasx_xvilvl_w (__m256i, __m256i);
+__m256i __lasx_xvinsgr2vr_d (__m256i, i64, imm0_3);
+__m256i __lasx_xvinsgr2vr_w (__m256i, i32, imm0_7);
+__m256i __lasx_xvinsve0_d (__m256i, __m256i, imm0_3);
+__m256i __lasx_xvinsve0_w (__m256i, __m256i, imm0_7);
+__m256i __lasx_xvld (void *, imm_n2048_2047);
+__m256i __lasx_xvldi (imm_n1024_1023);
+__m256i __lasx_xvldrepl_b (void *, imm_n2048_2047);
+__m256i __lasx_xvldrepl_d (void *, imm_n256_255);
+__m256i __lasx_xvldrepl_h (void *, imm_n1024_1023);
+__m256i __lasx_xvldrepl_w (void *, imm_n512_511);
+__m256i __lasx_xvldx (void *, i64);
+__m256i __lasx_xvmadd_b (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmadd_d (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmadd_h (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmadd_w (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwev_d_w (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwev_d_wu (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwev_d_wu_w (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwev_h_b (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwev_h_bu (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwev_h_bu_b (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwev_q_d (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwev_q_du (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwev_q_du_d (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwev_w_h (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwev_w_hu (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwev_w_hu_h (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwod_d_w (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwod_d_wu (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwod_d_wu_w (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwod_h_b (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwod_h_bu (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwod_h_bu_b (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwod_q_d (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwod_q_du (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwod_q_du_d (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwod_w_h (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwod_w_hu (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmaddwod_w_hu_h (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmax_b (__m256i, __m256i);
+__m256i __lasx_xvmax_bu (__m256i, __m256i);
+__m256i __lasx_xvmax_d (__m256i, __m256i);
+__m256i __lasx_xvmax_du (__m256i, __m256i);
+__m256i __lasx_xvmax_h (__m256i, __m256i);
+__m256i __lasx_xvmax_hu (__m256i, __m256i);
+__m256i __lasx_xvmaxi_b (__m256i, imm_n16_15);
+__m256i __lasx_xvmaxi_bu (__m256i, imm0_31);
+__m256i __lasx_xvmaxi_d (__m256i, imm_n16_15);
+__m256i __lasx_xvmaxi_du (__m256i, imm0_31);
+__m256i __lasx_xvmaxi_h (__m256i, imm_n16_15);
+__m256i __lasx_xvmaxi_hu (__m256i, imm0_31);
+__m256i __lasx_xvmaxi_w (__m256i, imm_n16_15);
+__m256i __lasx_xvmaxi_wu (__m256i, imm0_31);
+__m256i __lasx_xvmax_w (__m256i, __m256i);
+__m256i __lasx_xvmax_wu (__m256i, __m256i);
+__m256i __lasx_xvmin_b (__m256i, __m256i);
+__m256i __lasx_xvmin_bu (__m256i, __m256i);
+__m256i __lasx_xvmin_d (__m256i, __m256i);
+__m256i __lasx_xvmin_du (__m256i, __m256i);
+__m256i __lasx_xvmin_h (__m256i, __m256i);
+__m256i __lasx_xvmin_hu (__m256i, __m256i);
+__m256i __lasx_xvmini_b (__m256i, imm_n16_15);
+__m256i __lasx_xvmini_bu (__m256i, imm0_31);
+__m256i __lasx_xvmini_d (__m256i, imm_n16_15);
+__m256i __lasx_xvmini_du (__m256i, imm0_31);
+__m256i __lasx_xvmini_h (__m256i, imm_n16_15);
+__m256i __lasx_xvmini_hu (__m256i, imm0_31);
+__m256i __lasx_xvmini_w (__m256i, imm_n16_15);
+__m256i __lasx_xvmini_wu (__m256i, imm0_31);
+__m256i __lasx_xvmin_w (__m256i, __m256i);
+__m256i __lasx_xvmin_wu (__m256i, __m256i);
+__m256i __lasx_xvmod_b (__m256i, __m256i);
+__m256i __lasx_xvmod_bu (__m256i, __m256i);
+__m256i __lasx_xvmod_d (__m256i, __m256i);
+__m256i __lasx_xvmod_du (__m256i, __m256i);
+__m256i __lasx_xvmod_h (__m256i, __m256i);
+__m256i __lasx_xvmod_hu (__m256i, __m256i);
+__m256i __lasx_xvmod_w (__m256i, __m256i);
+__m256i __lasx_xvmod_wu (__m256i, __m256i);
+__m256i __lasx_xvmskgez_b (__m256i);
+__m256i __lasx_xvmskltz_b (__m256i);
+__m256i __lasx_xvmskltz_d (__m256i);
+__m256i __lasx_xvmskltz_h (__m256i);
+__m256i __lasx_xvmskltz_w (__m256i);
+__m256i __lasx_xvmsknz_b (__m256i);
+__m256i __lasx_xvmsub_b (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmsub_d (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmsub_h (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmsub_w (__m256i, __m256i, __m256i);
+__m256i __lasx_xvmuh_b (__m256i, __m256i);
+__m256i __lasx_xvmuh_bu (__m256i, __m256i);
+__m256i __lasx_xvmuh_d (__m256i, __m256i);
+__m256i __lasx_xvmuh_du (__m256i, __m256i);
+__m256i __lasx_xvmuh_h (__m256i, __m256i);
+__m256i __lasx_xvmuh_hu (__m256i, __m256i);
+__m256i __lasx_xvmuh_w (__m256i, __m256i);
+__m256i __lasx_xvmuh_wu (__m256i, __m256i);
+__m256i __lasx_xvmul_b (__m256i, __m256i);
+__m256i __lasx_xvmul_d (__m256i, __m256i);
+__m256i __lasx_xvmul_h (__m256i, __m256i);
+__m256i __lasx_xvmul_w (__m256i, __m256i);
+__m256i __lasx_xvmulwev_d_w (__m256i, __m256i);
+__m256i __lasx_xvmulwev_d_wu (__m256i, __m256i);
+__m256i __lasx_xvmulwev_d_wu_w (__m256i, __m256i);
+__m256i __lasx_xvmulwev_h_b (__m256i, __m256i);
+__m256i __lasx_xvmulwev_h_bu (__m256i, __m256i);
+__m256i __lasx_xvmulwev_h_bu_b (__m256i, __m256i);
+__m256i __lasx_xvmulwev_q_d (__m256i, __m256i);
+__m256i __lasx_xvmulwev_q_du (__m256i, __m256i);
+__m256i __lasx_xvmulwev_q_du_d (__m256i, __m256i);
+__m256i __lasx_xvmulwev_w_h (__m256i, __m256i);
+__m256i __lasx_xvmulwev_w_hu (__m256i, __m256i);
+__m256i __lasx_xvmulwev_w_hu_h (__m256i, __m256i);
+__m256i __lasx_xvmulwod_d_w (__m256i, __m256i);
+__m256i __lasx_xvmulwod_d_wu (__m256i, __m256i);
+__m256i __lasx_xvmulwod_d_wu_w (__m256i, __m256i);
+__m256i __lasx_xvmulwod_h_b (__m256i, __m256i);
+__m256i __lasx_xvmulwod_h_bu (__m256i, __m256i);
+__m256i __lasx_xvmulwod_h_bu_b (__m256i, __m256i);
+__m256i __lasx_xvmulwod_q_d (__m256i, __m256i);
+__m256i __lasx_xvmulwod_q_du (__m256i, __m256i);
+__m256i __lasx_xvmulwod_q_du_d (__m256i, __m256i);
+__m256i __lasx_xvmulwod_w_h (__m256i, __m256i);
+__m256i __lasx_xvmulwod_w_hu (__m256i, __m256i);
+__m256i __lasx_xvmulwod_w_hu_h (__m256i, __m256i);
+__m256i __lasx_xvneg_b (__m256i);
+__m256i __lasx_xvneg_d (__m256i);
+__m256i __lasx_xvneg_h (__m256i);
+__m256i __lasx_xvneg_w (__m256i);
+__m256i __lasx_xvnori_b (__m256i, imm0_255);
+__m256i __lasx_xvnor_v (__m256i, __m256i);
+__m256i __lasx_xvori_b (__m256i, imm0_255);
+__m256i __lasx_xvorn_v (__m256i, __m256i);
+__m256i __lasx_xvor_v (__m256i, __m256i);
+__m256i __lasx_xvpackev_b (__m256i, __m256i);
+__m256i __lasx_xvpackev_d (__m256i, __m256i);
+__m256i __lasx_xvpackev_h (__m256i, __m256i);
+__m256i __lasx_xvpackev_w (__m256i, __m256i);
+__m256i __lasx_xvpackod_b (__m256i, __m256i);
+__m256i __lasx_xvpackod_d (__m256i, __m256i);
+__m256i __lasx_xvpackod_h (__m256i, __m256i);
+__m256i __lasx_xvpackod_w (__m256i, __m256i);
+__m256i __lasx_xvpcnt_b (__m256i);
+__m256i __lasx_xvpcnt_d (__m256i);
+__m256i __lasx_xvpcnt_h (__m256i);
+__m256i __lasx_xvpcnt_w (__m256i);
+__m256i __lasx_xvpermi_d (__m256i, imm0_255);
+__m256i __lasx_xvpermi_q (__m256i, __m256i, imm0_255);
+__m256i __lasx_xvpermi_w (__m256i, __m256i, imm0_255);
+__m256i __lasx_xvperm_w (__m256i, __m256i);
+__m256i __lasx_xvpickev_b (__m256i, __m256i);
+__m256i __lasx_xvpickev_d (__m256i, __m256i);
+__m256i __lasx_xvpickev_h (__m256i, __m256i);
+__m256i __lasx_xvpickev_w (__m256i, __m256i);
+__m256i __lasx_xvpickod_b (__m256i, __m256i);
+__m256i __lasx_xvpickod_d (__m256i, __m256i);
+__m256i __lasx_xvpickod_h (__m256i, __m256i);
+__m256i __lasx_xvpickod_w (__m256i, __m256i);
+i64 __lasx_xvpickve2gr_d (__m256i, imm0_3);
+u64 __lasx_xvpickve2gr_du (__m256i, imm0_3);
+i32 __lasx_xvpickve2gr_w (__m256i, imm0_7);
+u32 __lasx_xvpickve2gr_wu (__m256i, imm0_7);
+__m256i __lasx_xvpickve_d (__m256i, imm0_3);
+__m256d __lasx_xvpickve_d_f (__m256d, imm0_3);
+__m256i __lasx_xvpickve_w (__m256i, imm0_7);
+__m256 __lasx_xvpickve_w_f (__m256, imm0_7);
+__m256i __lasx_xvrepl128vei_b (__m256i, imm0_15);
+__m256i __lasx_xvrepl128vei_d (__m256i, imm0_1);
+__m256i __lasx_xvrepl128vei_h (__m256i, imm0_7);
+__m256i __lasx_xvrepl128vei_w (__m256i, imm0_3);
+__m256i __lasx_xvreplgr2vr_b (i32);
+__m256i __lasx_xvreplgr2vr_d (i64);
+__m256i __lasx_xvreplgr2vr_h (i32);
+__m256i __lasx_xvreplgr2vr_w (i32);
+__m256i __lasx_xvrepli_b (imm_n512_511);
+__m256i __lasx_xvrepli_d (imm_n512_511);
+__m256i __lasx_xvrepli_h (imm_n512_511);
+__m256i __lasx_xvrepli_w (imm_n512_511);
+__m256i __lasx_xvreplve0_b (__m256i);
+__m256i __lasx_xvreplve0_d (__m256i);
+__m256i __lasx_xvreplve0_h (__m256i);
+__m256i __lasx_xvreplve0_q (__m256i);
+__m256i __lasx_xvreplve0_w (__m256i);
+__m256i __lasx_xvreplve_b (__m256i, i32);
+__m256i __lasx_xvreplve_d (__m256i, i32);
+__m256i __lasx_xvreplve_h (__m256i, i32);
+__m256i __lasx_xvreplve_w (__m256i, i32);
+__m256i __lasx_xvrotr_b (__m256i, __m256i);
+__m256i __lasx_xvrotr_d (__m256i, __m256i);
+__m256i __lasx_xvrotr_h (__m256i, __m256i);
+__m256i __lasx_xvrotri_b (__m256i, imm0_7);
+__m256i __lasx_xvrotri_d (__m256i, imm0_63);
+__m256i __lasx_xvrotri_h (__m256i, imm0_15);
+__m256i __lasx_xvrotri_w (__m256i, imm0_31);
+__m256i __lasx_xvrotr_w (__m256i, __m256i);
+__m256i __lasx_xvsadd_b (__m256i, __m256i);
+__m256i __lasx_xvsadd_bu (__m256i, __m256i);
+__m256i __lasx_xvsadd_d (__m256i, __m256i);
+__m256i __lasx_xvsadd_du (__m256i, __m256i);
+__m256i __lasx_xvsadd_h (__m256i, __m256i);
+__m256i __lasx_xvsadd_hu (__m256i, __m256i);
+__m256i __lasx_xvsadd_w (__m256i, __m256i);
+__m256i __lasx_xvsadd_wu (__m256i, __m256i);
+__m256i __lasx_xvsat_b (__m256i, imm0_7);
+__m256i __lasx_xvsat_bu (__m256i, imm0_7);
+__m256i __lasx_xvsat_d (__m256i, imm0_63);
+__m256i __lasx_xvsat_du (__m256i, imm0_63);
+__m256i __lasx_xvsat_h (__m256i, imm0_15);
+__m256i __lasx_xvsat_hu (__m256i, imm0_15);
+__m256i __lasx_xvsat_w (__m256i, imm0_31);
+__m256i __lasx_xvsat_wu (__m256i, imm0_31);
+__m256i __lasx_xvseq_b (__m256i, __m256i);
+__m256i __lasx_xvseq_d (__m256i, __m256i);
+__m256i __lasx_xvseq_h (__m256i, __m256i);
+__m256i __lasx_xvseqi_b (__m256i, imm_n16_15);
+__m256i __lasx_xvseqi_d (__m256i, imm_n16_15);
+__m256i __lasx_xvseqi_h (__m256i, imm_n16_15);
+__m256i __lasx_xvseqi_w (__m256i, imm_n16_15);
+__m256i __lasx_xvseq_w (__m256i, __m256i);
+__m256i __lasx_xvshuf4i_b (__m256i, imm0_255);
+__m256i __lasx_xvshuf4i_d (__m256i, __m256i, imm0_255);
+__m256i __lasx_xvshuf4i_h (__m256i, imm0_255);
+__m256i __lasx_xvshuf4i_w (__m256i, imm0_255);
+__m256i __lasx_xvshuf_b (__m256i, __m256i, __m256i);
+__m256i __lasx_xvshuf_d (__m256i, __m256i, __m256i);
+__m256i __lasx_xvshuf_h (__m256i, __m256i, __m256i);
+__m256i __lasx_xvshuf_w (__m256i, __m256i, __m256i);
+__m256i __lasx_xvsigncov_b (__m256i, __m256i);
+__m256i __lasx_xvsigncov_d (__m256i, __m256i);
+__m256i __lasx_xvsigncov_h (__m256i, __m256i);
+__m256i __lasx_xvsigncov_w (__m256i, __m256i);
+__m256i __lasx_xvsle_b (__m256i, __m256i);
+__m256i __lasx_xvsle_bu (__m256i, __m256i);
+__m256i __lasx_xvsle_d (__m256i, __m256i);
+__m256i __lasx_xvsle_du (__m256i, __m256i);
+__m256i __lasx_xvsle_h (__m256i, __m256i);
+__m256i __lasx_xvsle_hu (__m256i, __m256i);
+__m256i __lasx_xvslei_b (__m256i, imm_n16_15);
+__m256i __lasx_xvslei_bu (__m256i, imm0_31);
+__m256i __lasx_xvslei_d (__m256i, imm_n16_15);
+__m256i __lasx_xvslei_du (__m256i, imm0_31);
+__m256i __lasx_xvslei_h (__m256i, imm_n16_15);
+__m256i __lasx_xvslei_hu (__m256i, imm0_31);
+__m256i __lasx_xvslei_w (__m256i, imm_n16_15);
+__m256i __lasx_xvslei_wu (__m256i, imm0_31);
+__m256i __lasx_xvsle_w (__m256i, __m256i);
+__m256i __lasx_xvsle_wu (__m256i, __m256i);
+__m256i __lasx_xvsll_b (__m256i, __m256i);
+__m256i __lasx_xvsll_d (__m256i, __m256i);
+__m256i __lasx_xvsll_h (__m256i, __m256i);
+__m256i __lasx_xvslli_b (__m256i, imm0_7);
+__m256i __lasx_xvslli_d (__m256i, imm0_63);
+__m256i __lasx_xvslli_h (__m256i, imm0_15);
+__m256i __lasx_xvslli_w (__m256i, imm0_31);
+__m256i __lasx_xvsll_w (__m256i, __m256i);
+__m256i __lasx_xvsllwil_du_wu (__m256i, imm0_31);
+__m256i __lasx_xvsllwil_d_w (__m256i, imm0_31);
+__m256i __lasx_xvsllwil_h_b (__m256i, imm0_7);
+__m256i __lasx_xvsllwil_hu_bu (__m256i, imm0_7);
+__m256i __lasx_xvsllwil_w_h (__m256i, imm0_15);
+__m256i __lasx_xvsllwil_wu_hu (__m256i, imm0_15);
+__m256i __lasx_xvslt_b (__m256i, __m256i);
+__m256i __lasx_xvslt_bu (__m256i, __m256i);
+__m256i __lasx_xvslt_d (__m256i, __m256i);
+__m256i __lasx_xvslt_du (__m256i, __m256i);
+__m256i __lasx_xvslt_h (__m256i, __m256i);
+__m256i __lasx_xvslt_hu (__m256i, __m256i);
+__m256i __lasx_xvslti_b (__m256i, imm_n16_15);
+__m256i __lasx_xvslti_bu (__m256i, imm0_31);
+__m256i __lasx_xvslti_d (__m256i, imm_n16_15);
+__m256i __lasx_xvslti_du (__m256i, imm0_31);
+__m256i __lasx_xvslti_h (__m256i, imm_n16_15);
+__m256i __lasx_xvslti_hu (__m256i, imm0_31);
+__m256i __lasx_xvslti_w (__m256i, imm_n16_15);
+__m256i __lasx_xvslti_wu (__m256i, imm0_31);
+__m256i __lasx_xvslt_w (__m256i, __m256i);
+__m256i __lasx_xvslt_wu (__m256i, __m256i);
+__m256i __lasx_xvsra_b (__m256i, __m256i);
+__m256i __lasx_xvsra_d (__m256i, __m256i);
+__m256i __lasx_xvsra_h (__m256i, __m256i);
+__m256i __lasx_xvsrai_b (__m256i, imm0_7);
+__m256i __lasx_xvsrai_d (__m256i, imm0_63);
+__m256i __lasx_xvsrai_h (__m256i, imm0_15);
+__m256i __lasx_xvsrai_w (__m256i, imm0_31);
+__m256i __lasx_xvsran_b_h (__m256i, __m256i);
+__m256i __lasx_xvsran_h_w (__m256i, __m256i);
+__m256i __lasx_xvsrani_b_h (__m256i, __m256i, imm0_15);
+__m256i __lasx_xvsrani_d_q (__m256i, __m256i, imm0_127);
+__m256i __lasx_xvsrani_h_w (__m256i, __m256i, imm0_31);
+__m256i __lasx_xvsrani_w_d (__m256i, __m256i, imm0_63);
+__m256i __lasx_xvsran_w_d (__m256i, __m256i);
+__m256i __lasx_xvsrar_b (__m256i, __m256i);
+__m256i __lasx_xvsrar_d (__m256i, __m256i);
+__m256i __lasx_xvsrar_h (__m256i, __m256i);
+__m256i __lasx_xvsrari_b (__m256i, imm0_7);
+__m256i __lasx_xvsrari_d (__m256i, imm0_63);
+__m256i __lasx_xvsrari_h (__m256i, imm0_15);
+__m256i __lasx_xvsrari_w (__m256i, imm0_31);
+__m256i __lasx_xvsrarn_b_h (__m256i, __m256i);
+__m256i __lasx_xvsrarn_h_w (__m256i, __m256i);
+__m256i __lasx_xvsrarni_b_h (__m256i, __m256i, imm0_15);
+__m256i __lasx_xvsrarni_d_q (__m256i, __m256i, imm0_127);
+__m256i __lasx_xvsrarni_h_w (__m256i, __m256i, imm0_31);
+__m256i __lasx_xvsrarni_w_d (__m256i, __m256i, imm0_63);
+__m256i __lasx_xvsrarn_w_d (__m256i, __m256i);
+__m256i __lasx_xvsrar_w (__m256i, __m256i);
+__m256i __lasx_xvsra_w (__m256i, __m256i);
+__m256i __lasx_xvsrl_b (__m256i, __m256i);
+__m256i __lasx_xvsrl_d (__m256i, __m256i);
+__m256i __lasx_xvsrl_h (__m256i, __m256i);
+__m256i __lasx_xvsrli_b (__m256i, imm0_7);
+__m256i __lasx_xvsrli_d (__m256i, imm0_63);
+__m256i __lasx_xvsrli_h (__m256i, imm0_15);
+__m256i __lasx_xvsrli_w (__m256i, imm0_31);
+__m256i __lasx_xvsrln_b_h (__m256i, __m256i);
+__m256i __lasx_xvsrln_h_w (__m256i, __m256i);
+__m256i __lasx_xvsrlni_b_h (__m256i, __m256i, imm0_15);
+__m256i __lasx_xvsrlni_d_q (__m256i, __m256i, imm0_127);
+__m256i __lasx_xvsrlni_h_w (__m256i, __m256i, imm0_31);
+__m256i __lasx_xvsrlni_w_d (__m256i, __m256i, imm0_63);
+__m256i __lasx_xvsrln_w_d (__m256i, __m256i);
+__m256i __lasx_xvsrlr_b (__m256i, __m256i);
+__m256i __lasx_xvsrlr_d (__m256i, __m256i);
+__m256i __lasx_xvsrlr_h (__m256i, __m256i);
+__m256i __lasx_xvsrlri_b (__m256i, imm0_7);
+__m256i __lasx_xvsrlri_d (__m256i, imm0_63);
+__m256i __lasx_xvsrlri_h (__m256i, imm0_15);
+__m256i __lasx_xvsrlri_w (__m256i, imm0_31);
+__m256i __lasx_xvsrlrn_b_h (__m256i, __m256i);
+__m256i __lasx_xvsrlrn_h_w (__m256i, __m256i);
+__m256i __lasx_xvsrlrni_b_h (__m256i, __m256i, imm0_15);
+__m256i __lasx_xvsrlrni_d_q (__m256i, __m256i, imm0_127);
+__m256i __lasx_xvsrlrni_h_w (__m256i, __m256i, imm0_31);
+__m256i __lasx_xvsrlrni_w_d (__m256i, __m256i, imm0_63);
+__m256i __lasx_xvsrlrn_w_d (__m256i, __m256i);
+__m256i __lasx_xvsrlr_w (__m256i, __m256i);
+__m256i __lasx_xvsrl_w (__m256i, __m256i);
+__m256i __lasx_xvssran_b_h (__m256i, __m256i);
+__m256i __lasx_xvssran_bu_h (__m256i, __m256i);
+__m256i __lasx_xvssran_hu_w (__m256i, __m256i);
+__m256i __lasx_xvssran_h_w (__m256i, __m256i);
+__m256i __lasx_xvssrani_b_h (__m256i, __m256i, imm0_15);
+__m256i __lasx_xvssrani_bu_h (__m256i, __m256i, imm0_15);
+__m256i __lasx_xvssrani_d_q (__m256i, __m256i, imm0_127);
+__m256i __lasx_xvssrani_du_q (__m256i, __m256i, imm0_127);
+__m256i __lasx_xvssrani_hu_w (__m256i, __m256i, imm0_31);
+__m256i __lasx_xvssrani_h_w (__m256i, __m256i, imm0_31);
+__m256i __lasx_xvssrani_w_d (__m256i, __m256i, imm0_63);
+__m256i __lasx_xvssrani_wu_d (__m256i, __m256i, imm0_63);
+__m256i __lasx_xvssran_w_d (__m256i, __m256i);
+__m256i __lasx_xvssran_wu_d (__m256i, __m256i);
+__m256i __lasx_xvssrarn_b_h (__m256i, __m256i);
+__m256i __lasx_xvssrarn_bu_h (__m256i, __m256i);
+__m256i __lasx_xvssrarn_hu_w (__m256i, __m256i);
+__m256i __lasx_xvssrarn_h_w (__m256i, __m256i);
+__m256i __lasx_xvssrarni_b_h (__m256i, __m256i, imm0_15);
+__m256i __lasx_xvssrarni_bu_h (__m256i, __m256i, imm0_15);
+__m256i __lasx_xvssrarni_d_q (__m256i, __m256i, imm0_127);
+__m256i __lasx_xvssrarni_du_q (__m256i, __m256i, imm0_127);
+__m256i __lasx_xvssrarni_hu_w (__m256i, __m256i, imm0_31);
+__m256i __lasx_xvssrarni_h_w (__m256i, __m256i, imm0_31);
+__m256i __lasx_xvssrarni_w_d (__m256i, __m256i, imm0_63);
+__m256i __lasx_xvssrarni_wu_d (__m256i, __m256i, imm0_63);
+__m256i __lasx_xvssrarn_w_d (__m256i, __m256i);
+__m256i __lasx_xvssrarn_wu_d (__m256i, __m256i);
+__m256i __lasx_xvssrln_b_h (__m256i, __m256i);
+__m256i __lasx_xvssrln_bu_h (__m256i, __m256i);
+__m256i __lasx_xvssrln_hu_w (__m256i, __m256i);
+__m256i __lasx_xvssrln_h_w (__m256i, __m256i);
+__m256i __lasx_xvssrlni_b_h (__m256i, __m256i, imm0_15);
+__m256i __lasx_xvssrlni_bu_h (__m256i, __m256i, imm0_15);
+__m256i __lasx_xvssrlni_d_q (__m256i, __m256i, imm0_127);
+__m256i __lasx_xvssrlni_du_q (__m256i, __m256i, imm0_127);
+__m256i __lasx_xvssrlni_hu_w (__m256i, __m256i, imm0_31);
+__m256i __lasx_xvssrlni_h_w (__m256i, __m256i, imm0_31);
+__m256i __lasx_xvssrlni_w_d (__m256i, __m256i, imm0_63);
+__m256i __lasx_xvssrlni_wu_d (__m256i, __m256i, imm0_63);
+__m256i __lasx_xvssrln_w_d (__m256i, __m256i);
+__m256i __lasx_xvssrln_wu_d (__m256i, __m256i);
+__m256i __lasx_xvssrlrn_b_h (__m256i, __m256i);
+__m256i __lasx_xvssrlrn_bu_h (__m256i, __m256i);
+__m256i __lasx_xvssrlrn_hu_w (__m256i, __m256i);
+__m256i __lasx_xvssrlrn_h_w (__m256i, __m256i);
+__m256i __lasx_xvssrlrni_b_h (__m256i, __m256i, imm0_15);
+__m256i __lasx_xvssrlrni_bu_h (__m256i, __m256i, imm0_15);
+__m256i __lasx_xvssrlrni_d_q (__m256i, __m256i, imm0_127);
+__m256i __lasx_xvssrlrni_du_q (__m256i, __m256i, imm0_127);
+__m256i __lasx_xvssrlrni_hu_w (__m256i, __m256i, imm0_31);
+__m256i __lasx_xvssrlrni_h_w (__m256i, __m256i, imm0_31);
+__m256i __lasx_xvssrlrni_w_d (__m256i, __m256i, imm0_63);
+__m256i __lasx_xvssrlrni_wu_d (__m256i, __m256i, imm0_63);
+__m256i __lasx_xvssrlrn_w_d (__m256i, __m256i);
+__m256i __lasx_xvssrlrn_wu_d (__m256i, __m256i);
+__m256i __lasx_xvssub_b (__m256i, __m256i);
+__m256i __lasx_xvssub_bu (__m256i, __m256i);
+__m256i __lasx_xvssub_d (__m256i, __m256i);
+__m256i __lasx_xvssub_du (__m256i, __m256i);
+__m256i __lasx_xvssub_h (__m256i, __m256i);
+__m256i __lasx_xvssub_hu (__m256i, __m256i);
+__m256i __lasx_xvssub_w (__m256i, __m256i);
+__m256i __lasx_xvssub_wu (__m256i, __m256i);
+void __lasx_xvst (__m256i, void *, imm_n2048_2047);
+void __lasx_xvstelm_b (__m256i, void *, imm_n128_127, idx);
+void __lasx_xvstelm_d (__m256i, void *, imm_n128_127, idx);
+void __lasx_xvstelm_h (__m256i, void *, imm_n128_127, idx);
+void __lasx_xvstelm_w (__m256i, void *, imm_n128_127, idx);
+void __lasx_xvstx (__m256i, void *, i64);
+__m256i __lasx_xvsub_b (__m256i, __m256i);
+__m256i __lasx_xvsub_d (__m256i, __m256i);
+__m256i __lasx_xvsub_h (__m256i, __m256i);
+__m256i __lasx_xvsubi_bu (__m256i, imm0_31);
+__m256i __lasx_xvsubi_du (__m256i, imm0_31);
+__m256i __lasx_xvsubi_hu (__m256i, imm0_31);
+__m256i __lasx_xvsubi_wu (__m256i, imm0_31);
+__m256i __lasx_xvsub_q (__m256i, __m256i);
+__m256i __lasx_xvsub_w (__m256i, __m256i);
+__m256i __lasx_xvsubwev_d_w (__m256i, __m256i);
+__m256i __lasx_xvsubwev_d_wu (__m256i, __m256i);
+__m256i __lasx_xvsubwev_h_b (__m256i, __m256i);
+__m256i __lasx_xvsubwev_h_bu (__m256i, __m256i);
+__m256i __lasx_xvsubwev_q_d (__m256i, __m256i);
+__m256i __lasx_xvsubwev_q_du (__m256i, __m256i);
+__m256i __lasx_xvsubwev_w_h (__m256i, __m256i);
+__m256i __lasx_xvsubwev_w_hu (__m256i, __m256i);
+__m256i __lasx_xvsubwod_d_w (__m256i, __m256i);
+__m256i __lasx_xvsubwod_d_wu (__m256i, __m256i);
+__m256i __lasx_xvsubwod_h_b (__m256i, __m256i);
+__m256i __lasx_xvsubwod_h_bu (__m256i, __m256i);
+__m256i __lasx_xvsubwod_q_d (__m256i, __m256i);
+__m256i __lasx_xvsubwod_q_du (__m256i, __m256i);
+__m256i __lasx_xvsubwod_w_h (__m256i, __m256i);
+__m256i __lasx_xvsubwod_w_hu (__m256i, __m256i);
+__m256i __lasx_xvxori_b (__m256i, imm0_255);
+__m256i __lasx_xvxor_v (__m256i, __m256i);
+@end smallexample
+
 @node MIPS DSP Built-in Functions
 @subsection MIPS DSP Built-in Functions