From patchwork Thu Dec 21 08:57:50 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 182060
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Juzhe-Zhong
Subject: [Committed] RISC-V: Add dynamic LMUL test for x264
Date: Thu, 21 Dec 2023 16:57:50 +0800
Message-Id: <20231221085750.3541650-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3
List-Id: Gcc-patches mailing list

While evaluating x264 performance, I noticed that the best LMUL for this
case with -march=rv64gcv is LMUL = 2.

LMUL = 1:

x264_pixel_8x8:
	add	a4,a1,a2
	addi	a6,a0,16
	vsetivli	zero,4,e8,mf4,ta,ma
	add	a5,a4,a2
	vle8.v	v12,0(a6)
	vle8.v	v2,0(a4)
	addi	a6,a0,4
	addi	a4,a4,4
	vle8.v	v11,0(a6)
	vle8.v	v9,0(a4)
	addi	a6,a1,4
	addi	a4,a0,32
	vle8.v	v13,0(a0)
	vle8.v	v1,0(a1)
	vle8.v	v4,0(a6)
	vle8.v	v8,0(a4)
	vle8.v	v7,0(a5)
	vwsubu.vv	v3,v13,v1
	add	a3,a5,a2
	addi	a6,a0,20
	addi	a4,a0,36
	vle8.v	v10,0(a6)
	vle8.v	v6,0(a4)
	addi	a5,a5,4
	vle8.v	v5,0(a5)
	vsetvli	zero,zero,e16,mf2,ta,mu
	vmslt.vi	v0,v3,0
	vneg.v	v3,v3,v0.t
	vsetvli	zero,zero,e8,mf4,ta,ma
	vwsubu.vv	v1,v12,v2
	vsetvli	zero,zero,e16,mf2,ta,mu
	vmslt.vi	v0,v1,0
	vneg.v	v1,v1,v0.t
	vmv1r.v	v2,v1
	vwadd.vv	v1,v3,v2
	vsetvli	zero,zero,e8,mf4,ta,ma
	vwsubu.vv	v2,v11,v4
	vsetvli	zero,zero,e16,mf2,ta,mu
	vmslt.vi	v0,v2,0
	vneg.v	v2,v2,v0.t
	vsetvli	zero,zero,e8,mf4,ta,ma
	vwsubu.vv	v3,v10,v9
	vsetvli	zero,zero,e16,mf2,ta,mu
	vmv1r.v	v4,v2
	vmslt.vi	v0,v3,0
	vneg.v	v3,v3,v0.t
	vwadd.vv	v2,v4,v3
	vsetvli	zero,zero,e8,mf4,ta,ma
	vwsubu.vv	v3,v8,v7
	vsetvli	zero,zero,e16,mf2,ta,mu
	add	a4,a3,a2
	vmslt.vi	v0,v3,0
	vneg.v	v3,v3,v0.t
	vwadd.wv	v1,v1,v3
	vsetvli	zero,zero,e8,mf4,ta,ma
	add	a5,a4,a2
	vwsubu.vv	v3,v6,v5
	addi	a6,a0,48
	vsetvli	zero,zero,e16,mf2,ta,mu
	vle8.v	v16,0(a3)
	vle8.v	v12,0(a4)
	addi	a3,a3,4
	addi	a4,a4,4
	vle8.v	v17,0(a6)
	vle8.v	v14,0(a3)
	vle8.v	v10,0(a4)
	vle8.v	v8,0(a5)
	add	a6,a5,a2
	addi	a3,a0,64
	addi	a4,a0,80
	addi	a5,a5,4
	vle8.v	v13,0(a3)
	vle8.v	v4,0(a5)
	vle8.v	v9,0(a4)
	vle8.v	v6,0(a6)
	vmslt.vi	v0,v3,0
	addi	a7,a0,52
	vneg.v	v3,v3,v0.t
	vle8.v	v15,0(a7)
	vwadd.wv	v2,v2,v3
	addi	a3,a0,68
	addi	a4,a0,84
	vle8.v	v11,0(a3)
	vle8.v	v5,0(a4)
	addi	a5,a0,96
	vle8.v	v7,0(a5)
	vsetvli	zero,zero,e8,mf4,ta,ma
	vwsubu.vv	v3,v17,v16
	vsetvli	zero,zero,e16,mf2,ta,mu
	vmslt.vi	v0,v3,0
	vneg.v	v3,v3,v0.t
	vwadd.wv	v1,v1,v3
	vsetvli	zero,zero,e8,mf4,ta,ma
	vwsubu.vv	v3,v15,v14
	vsetvli	zero,zero,e16,mf2,ta,mu
	vmslt.vi	v0,v3,0
	vneg.v	v3,v3,v0.t
	vwadd.wv	v2,v2,v3
	vsetvli	zero,zero,e8,mf4,ta,ma
	vwsubu.vv	v3,v13,v12
	vsetvli	zero,zero,e16,mf2,ta,mu
	slli	a4,a2,3
	vmslt.vi	v0,v3,0
	vneg.v	v3,v3,v0.t
	vwadd.wv	v1,v1,v3
	vsetvli	zero,zero,e8,mf4,ta,ma
	sub	a4,a4,a2
	vwsubu.vv	v3,v11,v10
	vsetvli	zero,zero,e16,mf2,ta,mu
	add	a1,a1,a4
	vmslt.vi	v0,v3,0
	vneg.v	v3,v3,v0.t
	vwadd.wv	v2,v2,v3
	vsetvli	zero,zero,e8,mf4,ta,ma
	lbu	a7,0(a1)
	vwsubu.vv	v3,v9,v8
	lbu	a5,112(a0)
	vsetvli	zero,zero,e16,mf2,ta,mu
	subw	a5,a5,a7
	vmslt.vi	v0,v3,0
	lbu	a3,113(a0)
	vneg.v	v3,v3,v0.t
	lbu	a4,1(a1)
	vwadd.wv	v1,v1,v3
	addi	a6,a6,4
	vsetvli	zero,zero,e8,mf4,ta,ma
	subw	a3,a3,a4
	vwsubu.vv	v3,v5,v4
	addi	a2,a0,100
	vsetvli	zero,zero,e16,mf2,ta,mu
	vle8.v	v4,0(a6)
	sraiw	a6,a5,31
	vle8.v	v5,0(a2)
	sraiw	a7,a3,31
	vmslt.vi	v0,v3,0
	xor	a2,a5,a6
	vneg.v	v3,v3,v0.t
	vwadd.wv	v2,v2,v3
	vsetvli	zero,zero,e8,mf4,ta,ma
	lbu	a4,114(a0)
	vwsubu.vv	v3,v7,v6
	lbu	t1,2(a1)
	vsetvli	zero,zero,e16,mf2,ta,mu
	subw	a2,a2,a6
	xor	a6,a3,a7
	vmslt.vi	v0,v3,0
	subw	a4,a4,t1
	vneg.v	v3,v3,v0.t
	lbu	t1,3(a1)
	vwadd.wv	v1,v1,v3
	lbu	a5,115(a0)
	subw	a6,a6,a7
	vsetvli	zero,zero,e8,mf4,ta,ma
	li	a7,0
	vwsubu.vv	v3,v5,v4
	sraiw	t3,a4,31
	vsetvli	zero,zero,e16,mf2,ta,mu
	subw	a5,a5,t1
	vmslt.vi	v0,v3,0
	vneg.v	v3,v3,v0.t
	vwadd.wv	v2,v2,v3
	sraiw	t1,a5,31
	vsetvli	zero,zero,e32,m1,ta,ma
	xor	a4,a4,t3
	vadd.vv	v1,v1,v2
	vmv.s.x	v2,a7
	vredsum.vs	v1,v1,v2
	vmv.x.s	a7,v1
	addw	a2,a7,a2
	subw	a4,a4,t3
	addw	a6,a6,a2
	xor	a2,a5,t1
	lbu	a3,116(a0)
	lbu	t4,4(a1)
	addw	a4,a4,a6
	subw	a2,a2,t1
	lbu	a5,5(a1)
	subw	a3,a3,t4
	addw	a2,a2,a4
	lbu	a4,117(a0)
	lbu	t1,6(a1)
	sraiw	a7,a3,31
	subw	a4,a4,a5
	lbu	a5,118(a0)
	sraiw	a6,a4,31
	subw	a5,a5,t1
	xor	a3,a3,a7
	lbu	t1,7(a1)
	lbu	a0,119(a0)
	sraiw	a1,a5,31
	subw	a0,a0,t1
	subw	a3,a3,a7
	xor	a4,a4,a6
	addw	a3,a3,a2
	subw	a4,a4,a6
	sraiw	a2,a0,31
	xor	a5,a5,a1
	addw	a4,a4,a3
	subw	a5,a5,a1
	xor	a0,a0,a2
	addw	a5,a5,a4
	subw	a0,a0,a2
	addw	a0,a0,a5
	ret

LMUL = dynamic:

x264_pixel_8x8:
	add	a7,a1,a2
	vsetivli	zero,8,e8,mf2,ta,ma
	add	a6,a7,a2
	vle8.v	v1,0(a1)
	add	a3,a6,a2
	vle8.v	v2,0(a7)
	add	a4,a3,a2
	vle8.v	v13,0(a0)
	vle8.v	v7,0(a4)
	vwsubu.vv	v4,v13,v1
	vle8.v	v11,0(a6)
	vle8.v	v9,0(a3)
	add	a5,a4,a2
	addi	t1,a0,16
	vle8.v	v5,0(a5)
	vle8.v	v3,0(t1)
	addi	a7,a0,32
	addi	a6,a0,48
	vle8.v	v12,0(a7)
	vle8.v	v10,0(a6)
	addi	a3,a0,64
	addi	a4,a0,80
	vle8.v	v8,0(a3)
	vle8.v	v6,0(a4)
	vsetvli	zero,zero,e16,m1,ta,mu
	vmslt.vi	v0,v4,0
	vneg.v	v4,v4,v0.t
	vsetvli	zero,zero,e8,mf2,ta,ma
	vwsubu.vv	v1,v3,v2
	vsetvli	zero,zero,e16,m1,ta,mu
	vmslt.vi	v0,v1,0
	vneg.v	v1,v1,v0.t
	vwadd.vv	v2,v4,v1
	vsetvli	zero,zero,e8,mf2,ta,ma
	vwsubu.vv	v1,v12,v11
	vsetvli	zero,zero,e16,m1,ta,mu
	vmslt.vi	v0,v1,0
	vneg.v	v1,v1,v0.t
	vwadd.wv	v2,v2,v1
	vsetvli	zero,zero,e8,mf2,ta,ma
	vwsubu.vv	v1,v10,v9
	vsetvli	zero,zero,e16,m1,ta,mu
	vmslt.vi	v0,v1,0
	vneg.v	v1,v1,v0.t
	vwadd.wv	v2,v2,v1
	vsetvli	zero,zero,e8,mf2,ta,ma
	vwsubu.vv	v1,v8,v7
	vsetvli	zero,zero,e16,m1,ta,mu
	slli	a4,a2,3
	vmslt.vi	v0,v1,0
	vneg.v	v1,v1,v0.t
	vwadd.wv	v2,v2,v1
	vsetvli	zero,zero,e8,mf2,ta,ma
	sub	a4,a4,a2
	vwsubu.vv	v1,v6,v5
	vsetvli	zero,zero,e16,m1,ta,mu
	addi	a3,a0,96
	vmslt.vi	v0,v1,0
	vle8.v	v7,0(a3)
	vneg.v	v1,v1,v0.t
	add	a5,a5,a2
	vwadd.wv	v2,v2,v1
	vle8.v	v6,0(a5)
	addi	a0,a0,112
	add	a1,a1,a4
	vle8.v	v5,0(a0)
	vle8.v	v4,0(a1)
	vsetvli	zero,zero,e8,mf2,ta,ma
	vwsubu.vv	v1,v7,v6
	vsetvli	zero,zero,e16,m1,ta,mu
	vmslt.vi	v0,v1,0
	vneg.v	v1,v1,v0.t
	vwadd.wv	v2,v2,v1
	vsetvli	zero,zero,e32,m2,ta,ma
	li	a5,0
	vmv.s.x	v1,a5
	vredsum.vs	v1,v2,v1
	vmv.x.s	a0,v1
	vsetvli	zero,zero,e8,mf2,ta,ma
	vwsubu.vv	v1,v5,v4
	vsetvli	zero,zero,e16,m1,ta,mu
	vmslt.vi	v0,v1,0
	vneg.v	v1,v1,v0.t
	vsetivli	zero,1,e32,m1,ta,ma
	vmv.s.x	v2,a5
	vsetivli	zero,8,e16,m1,ta,ma
	vwredsumu.vs	v1,v1,v2
	vsetivli	zero,0,e32,m1,ta,ma
	vmv.x.s	a5,v1
	addw	a0,a0,a5
	ret

We get much better codegen and a performance gain with
--param=riscv-autovec-lmul=dynamic, which is able to pick the best
LMUL (M2).

Add a test so that future changes do not regress performance on x264.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: New test.

---
 .../costmodel/riscv/rvv/dynamic-lmul2-7.c     | 24 +++++++++++++++++++
 1 file changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c
new file mode 100644
index 00000000000..87e963edc47
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize --param riscv-autovec-lmul=dynamic" } */
+
+int
+x264_pixel_8x8 (unsigned char *pix1, unsigned char *pix2, int i_stride_pix2)
+{
+  int i_sum = 0;
+  for (int y = 0; y < 8; y++)
+    {
+      i_sum += __builtin_abs (pix1[0] - pix2[0]);
+      i_sum += __builtin_abs (pix1[1] - pix2[1]);
+      i_sum += __builtin_abs (pix1[2] - pix2[2]);
+      i_sum += __builtin_abs (pix1[3] - pix2[3]);
+      i_sum += __builtin_abs (pix1[4] - pix2[4]);
+      i_sum += __builtin_abs (pix1[5] - pix2[5]);
+      i_sum += __builtin_abs (pix1[6] - pix2[6]);
+      i_sum += __builtin_abs (pix1[7] - pix2[7]);
+      pix1 += 16;
+      pix2 += i_stride_pix2;
+    }
+  return i_sum;
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
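
Not part of the committed patch: the new test only scans the assembly for
e32,m2, so it does not check the computed value. For anyone who wants to
sanity-check the kernel on a host, here is a minimal sketch that compares the
unrolled function from the test against a plain nested-loop sum of absolute
differences. The helper name sad_8x8_ref is hypothetical and exists only for
this illustration. Like the test, it assumes a fixed pix1 row stride of 16
bytes and a compiler that provides __builtin_abs (GCC or Clang).

```c
#include <stdlib.h>

/* Kernel copied verbatim from dynamic-lmul2-7.c.  */
int
x264_pixel_8x8 (unsigned char *pix1, unsigned char *pix2, int i_stride_pix2)
{
  int i_sum = 0;
  for (int y = 0; y < 8; y++)
    {
      i_sum += __builtin_abs (pix1[0] - pix2[0]);
      i_sum += __builtin_abs (pix1[1] - pix2[1]);
      i_sum += __builtin_abs (pix1[2] - pix2[2]);
      i_sum += __builtin_abs (pix1[3] - pix2[3]);
      i_sum += __builtin_abs (pix1[4] - pix2[4]);
      i_sum += __builtin_abs (pix1[5] - pix2[5]);
      i_sum += __builtin_abs (pix1[6] - pix2[6]);
      i_sum += __builtin_abs (pix1[7] - pix2[7]);
      pix1 += 16;
      pix2 += i_stride_pix2;
    }
  return i_sum;
}

/* Hypothetical reference (not in the patch): the same 8x8 sum of absolute
   differences written with rolled loops.  pix1 rows are 16 bytes apart,
   pix2 rows are stride2 bytes apart.  */
int
sad_8x8_ref (const unsigned char *pix1, const unsigned char *pix2, int stride2)
{
  int sum = 0;
  for (int y = 0; y < 8; y++)
    for (int x = 0; x < 8; x++)
      sum += abs (pix1[y * 16 + x] - pix2[y * stride2 + x]);
  return sum;
}
```

Both functions read 8 rows of 8 bytes, so pix1 needs at least 7 * 16 + 8 = 120
bytes and pix2 at least 7 * stride + 8 bytes. Running the two side by side on
a RISC-V target built with each LMUL setting would be one way to confirm that
the different code sequences agree on the result.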