From patchwork Mon Nov 20 00:47:28 2023
X-Patchwork-Submitter: Xi Ruoyao
X-Patchwork-Id: 166914
From: Xi Ruoyao
To: gcc-patches@gcc.gnu.org
Cc: chenglulu, i@xen0n.name, xuchenghua@loongson.cn, Xi Ruoyao
Subject: [PATCH v3 5/5] LoongArch: Use LSX for scalar FP rounding with explicit rounding mode
Date: Mon, 20 Nov 2023 08:47:28 +0800
Message-ID: <20231120004728.205167-6-xry111@xry111.site>
In-Reply-To: <20231120004728.205167-1-xry111@xry111.site>
References: <20231120004728.205167-1-xry111@xry111.site>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

In the LoongArch FP base ISA there is only the frint.{s/d} instruction,
which reads the global rounding mode.  Use LSX for rounding with an
explicit rounding mode even if the operand is scalar.  This may seem to
waste CPU power, but it is still much faster than calling the library
function.

gcc/ChangeLog:

	* config/loongarch/simd.md (LSX_SCALAR_FRINT): New int iterator.
	(VLSX_FOR_FMODE): New mode attribute.
	(2): New expander, expanding to vreplvei.{w/d} +
	vfrint{rp/rz/rm/rne}.{s/d}.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vect-frint-scalar.c: New test.
	* gcc.target/loongarch/vect-frint-scalar-no-inexact.c: New test.
---
 gcc/config/loongarch/simd.md                  | 29 +++++++++++++
 .../loongarch/vect-frint-scalar-no-inexact.c  | 23 ++++++++++
 .../gcc.target/loongarch/vect-frint-scalar.c  | 43 +++++++++++++++++++
 3 files changed, 95 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c

diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 6937477e3df..e592de49aa0 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -150,6 +150,35 @@ (define_expand "ftrunc2"
 	   UNSPEC_SIMD_FRINTRZ))]
   "")
 
+;; Use LSX for scalar ceil/floor/trunc/roundeven when -mlsx and -ffp-int-
+;; builtin-inexact.  The base FP instruction set lacks these operations.
+;; Yes we are wasting 50% or even 75% of the CPU horsepower, but it's still
+;; much faster than calling a libc function: on LA464 and LA664 there is a
+;; 3x ~ 5x speed up.
+;;
+;; Note that a vreplvei instruction is needed or we'll also operate on the
+;; junk in high bits of the vector register and produce random FP exceptions.
+
+(define_int_iterator LSX_SCALAR_FRINT
+  [UNSPEC_SIMD_FRINTRP
+   UNSPEC_SIMD_FRINTRZ
+   UNSPEC_SIMD_FRINTRM
+   UNSPEC_SIMD_FRINTRNE])
+
+(define_mode_attr VLSX_FOR_FMODE [(DF "V2DF") (SF "V4SF")])
+
+(define_expand "2"
+  [(set (match_dup 2)
+     (vec_duplicate:
+       (match_operand:ANYF 1 "register_operand")))
+   (set (match_dup 2)
+	(unspec: [(match_dup 2)] LSX_SCALAR_FRINT))
+   (set (match_operand:ANYF 0 "register_operand")
+	(vec_select:ANYF (match_dup 2) (parallel [(const_int 0)])))
+   (clobber (match_scratch: 3))]
+  "ISA_HAS_LSX && (flag_fp_int_builtin_inexact || !flag_trapping_math)"
+  "operands[2] = gen_reg_rtx (mode);")
+
 ;; vftint.{/rp/rz/rm}
 (define_insn "_vftint__"
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
new file mode 100644
index 00000000000..002e3b92df7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlsx -fno-fp-int-builtin-inexact" } */
+
+#include "vect-frint-scalar.c"
+
+/* cannot use LSX for these with -fno-fp-int-builtin-inexact,
+   call library function.  */
+/* { dg-final { scan-assembler "\tb\t%plt\\(ceil\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(ceilf\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(floor\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(floorf\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(trunc\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(truncf\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(roundeven\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(roundevenf\\)" } } */
+
+/* nearbyint has not been allowed to raise FE_INEXACT for decades */
+/* { dg-final { scan-assembler "\tb\t%plt\\(nearbyint\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(nearbyintf\\)" } } */
+
+/* rint should just use basic FP operation */
+/* { dg-final { scan-assembler "\tfrint\.s" } } */
+/* { dg-final { scan-assembler "\tfrint\.d" } } */
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
new file mode 100644
index 00000000000..c7cb40be7d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlsx" } */
+
+#define test(func, suffix) \
+__typeof__ (1.##suffix) \
+_##func##suffix (__typeof__ (1.##suffix) x) \
+{ \
+  return __builtin_##func##suffix (x); \
+}
+
+test (ceil, f)
+test (ceil, )
+test (floor, f)
+test (floor, )
+test (trunc, f)
+test (trunc, )
+test (roundeven, f)
+test (roundeven, )
+test (nearbyint, f)
+test (nearbyint, )
+test (rint, f)
+test (rint, )
+
+/* { dg-final { scan-assembler "\tvfrintrp\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrm\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrz\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrne\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrp\.d" } } */
+/* { dg-final { scan-assembler "\tvfrintrm\.d" } } */
+/* { dg-final { scan-assembler "\tvfrintrz\.d" } } */
+/* { dg-final { scan-assembler "\tvfrintrne\.d" } } */
+
+/* must do vreplvei first */
+/* { dg-final { scan-assembler-times "\tvreplvei\.w\t\\\$vr0,\\\$vr0,0" 4 } } */
+/* { dg-final { scan-assembler-times "\tvreplvei\.d\t\\\$vr0,\\\$vr0,0" 4 } } */
+
+/* nearbyint has not been allowed to raise FE_INEXACT for decades */
+/* { dg-final { scan-assembler "\tb\t%plt\\(nearbyint\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(nearbyintf\\)" } } */
+
+/* rint should just use basic FP operation */
+/* { dg-final { scan-assembler "\tfrint\.s" } } */
+/* { dg-final { scan-assembler "\tfrint\.d" } } */
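
Illustration (not part of the patch): the simd.md comment says a vreplvei
instruction is needed because the rounding would otherwise also operate on
junk in the high bits of the vector register.  The C sketch below is only a
software model of that reasoning, not the hardware definition.  The type
vreg_model and the functions model_vreplvei_w0, model_vfrintrz_s and
scalar_trunc_model are hypothetical names made up for this note.  It assumes
a four-lane single-precision register and a lane-wise round toward zero that
reports an inexact condition whenever any lane changes value, which is a
simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a 128-bit register viewed as four float lanes.  */
#define LANES 4

typedef struct
{
  float lane[LANES];
} vreg_model;

/* Broadcast lane 0 to every lane, modeling vreplvei.w with index 0.  */
void
model_vreplvei_w0 (vreg_model *v)
{
  assert (v != NULL);
  for (size_t i = 1; i < LANES; i++)
    v->lane[i] = v->lane[0];
}

/* Lane-wise round toward zero.  Returns true if any lane changed value,
   standing in for a raised inexact exception.  NaNs and values with
   magnitude >= 2^23 are left untouched; the sign of zero is not modeled.  */
bool
model_vfrintrz_s (vreg_model *v)
{
  bool inexact = false;
  assert (v != NULL);
  for (size_t i = 0; i < LANES; i++)
    {
      float x = v->lane[i];
      if (x > -8388608.0f && x < 8388608.0f)
        {
          float r = (float) (long) x;  /* C conversion truncates.  */
          if (r != x)
            inexact = true;
          v->lane[i] = r;
        }
    }
  return inexact;
}

/* Truncate scalar X held in lane 0 of a register whose other lanes start
   out as JUNK.  With REPLICATE false the junk lanes are rounded too.  */
float
scalar_trunc_model (float x, const vreg_model *junk, bool replicate,
                    bool *inexact)
{
  vreg_model v;
  assert (junk != NULL && inexact != NULL);
  v = *junk;
  v.lane[0] = x;
  if (replicate)
    model_vreplvei_w0 (&v);
  *inexact = model_vfrintrz_s (&v);
  return v.lane[0];
}
```

In this model, truncating 2.0f with a junk lane holding 0.5f and replicate
set to false reports the inexact condition even though 2.0 is already
integral.  With replicate set to true only the scalar operand decides the
outcome, which is the behavior the vreplvei step in the expander is there to
guarantee.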