Message ID | 20231202135202.4071-1-jszhang@kernel.org |
---|---|
State | New |
Headers |
From: Jisheng Zhang <jszhang@kernel.org>
To: Paul Walmsley <paul.walmsley@sifive.com>, Palmer Dabbelt <palmer@dabbelt.com>, Albert Ou <aou@eecs.berkeley.edu>
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Samuel Holland <samuel.holland@sifive.com>
Subject: [PATCH v2] riscv: select ARCH_HAS_FAST_MULTIPLIER
Date: Sat, 2 Dec 2023 21:52:02 +0800
Message-Id: <20231202135202.4071-1-jszhang@kernel.org>
X-Mailer: git-send-email 2.40.0
List-ID: <linux-kernel.vger.kernel.org> |
Series |
[v2] riscv: select ARCH_HAS_FAST_MULTIPLIER
|
|
Commit Message
Jisheng Zhang
Dec. 2, 2023, 1:52 p.m. UTC
Currently, riscv linux requires at least IMA, so all platforms have a
multiplier. I assume the efficiency of 'mul' is comparable to or better
than that of a sequence of five or so register-dependent arithmetic
instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer
codegen. Refer to commit f9b4192923fa ("[PATCH] bitops: hweight()
speedup") for more details.

A simple benchmark calling hweight64() in a loop showed:

- about 14% performance improvement on JH7110, tested on Milkv Mars.

- about 23% performance improvement on TH1520 and SG2042, tested on
  Sipeed LPI4A and an SG2042 platform.

- a slight performance drop on CV1800B, tested on milkv duo. Among all
  the riscv platforms I have on hand, this is the only one that sees a
  slight performance drop, which means its 'mul' isn't quick enough.
  However, the same situation exists on x86: for example, P4 doesn't
  have fast integer multiplies, as noted in the above commit, yet x86
  still selects ARCH_HAS_FAST_MULTIPLIER. So let's select
  ARCH_HAS_FAST_MULTIPLIER, which benefits almost all riscv platforms.

Samuel also provided some performance numbers:

On Unmatched: 20% speedup for __sw_hweight32 and 30% speedup for
__sw_hweight64.
On D1: 8% speedup for __sw_hweight32 and 8% slowdown for
__sw_hweight64.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
---

since v1:
 - fix typo in commit msg
 - add some performance numbers provided by Samuel
 - collect Reviewed-by and Tested-by tag

 arch/riscv/Kconfig | 1 +
 1 file changed, 1 insertion(+)
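For readers who want to see what the trade-off between 'mul' and the
shift/add sequence looks like, here is a minimal userspace sketch of the
two population-count strategies. It is modelled on the general shape of
the generic software hweight64 code, but it is not copied from the
kernel: the function names hweight64_mul, hweight64_shift_add and
hweight64_selfcheck are hypothetical and exist only for illustration.
Both variants share the same three reduction steps and differ only in
how the per-byte counts are summed at the end.

```c
#include <assert.h>
#include <stdint.h>

/* Shared first stage: reduce w to eight per-byte bit counts (0..8 each). */
static inline uint64_t byte_counts(uint64_t w)
{
	w -= (w >> 1) & 0x5555555555555555ULL;            /* 2-bit sums */
	w = (w & 0x3333333333333333ULL) +
	    ((w >> 2) & 0x3333333333333333ULL);           /* 4-bit sums */
	return (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;    /* 8-bit sums */
}

/*
 * Variant for a fast multiplier (hypothetical name): a single multiply
 * accumulates all eight byte counts into the top byte.
 */
unsigned int hweight64_mul(uint64_t w)
{
	return (unsigned int)((byte_counts(w) * 0x0101010101010101ULL) >> 56);
}

/*
 * Variant without a fast multiplier (hypothetical name): the byte counts
 * are folded together with a chain of dependent shift/add steps.
 */
unsigned int hweight64_shift_add(uint64_t w)
{
	uint64_t res = byte_counts(w);

	res += res >> 8;
	res += res >> 16;
	res += res >> 32;
	return (unsigned int)(res & 0xff);
}

/* Small consistency check: both variants must agree on every input. */
void hweight64_selfcheck(void)
{
	static const uint64_t samples[] = {
		0x0ULL, 0x1ULL, 0xffULL, 0x8000000000000001ULL,
		0x0123456789abcdefULL, UINT64_MAX,
	};
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(hweight64_mul(samples[i]) ==
		       hweight64_shift_add(samples[i]));
}
```

Which variant wins depends on the latency of 'mul' relative to the three
dependent shift/add pairs it replaces, which is consistent with the
mixed results reported above for cores with a slower multiplier.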
Comments
On Sat, Dec 02, 2023 at 09:52:02PM +0800, Jisheng Zhang wrote:
> Currently, riscv linux requires at least IMA, so all platforms have a
> multiplier. And I assume the 'mul' efficiency is comparable or better
> than a sequence of five or so register-dependent arithmetic
> instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer
> codegen. Refer to commit f9b4192923fa ("[PATCH] bitops: hweight()
> speedup") for more details.
>
> In a simple benchmark test calling hweight64() in a loop, it got:
> about 14% performance improvement on JH7110, tested on Milkv Mars.
>
> about 23% performance improvement on TH1520 and SG2042, tested on
> Sipeed LPI4A and SG2042 platform.
>
> a slight performance drop on CV1800B, tested on milkv duo. Among all
> riscv platforms in my hands, this is the only one which sees a slight
> performance drop. It means the 'mul' isn't quick enough. However, the
> situation exists on x86 too, for example, P4 doesn't have fast
> integer multiplies as said in the above commit, x86 also selects
> ARCH_HAS_FAST_MULTIPLIER. So let's select ARCH_HAS_FAST_MULTIPLIER
> which can benefit almost riscv platforms.
>
> Samuel also provided some performance numbers:
> On Unmatched: 20% speedup for __sw_hweight32 and 30% speedup for
> __sw_hweight64.
> On D1: 8% speedup for __sw_hweight32 and 8% slowdown for
> __sw_hweight64.
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
> Tested-by: Samuel Holland <samuel.holland@sifive.com>

Hi @Palmer,

I saw this simple patch is missed in your for-next tree, could you
please pick it up?

Thanks in advance

> ---
>
> since v1:
>  - fix typo in commit msg
>  - add some performance numbers provided by Samuel
>  - collect Reviewed-by and Tested-by tag
>
>  arch/riscv/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 95a2a06acc6a..e4834fa76417 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -23,6 +23,7 @@ config RISCV
>  	select ARCH_HAS_DEBUG_VIRTUAL if MMU
>  	select ARCH_HAS_DEBUG_VM_PGTABLE
>  	select ARCH_HAS_DEBUG_WX
> +	select ARCH_HAS_FAST_MULTIPLIER
>  	select ARCH_HAS_FORTIFY_SOURCE
>  	select ARCH_HAS_GCOV_PROFILE_ALL
>  	select ARCH_HAS_GIGANTIC_PAGE
> --
> 2.42.0
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Hi Jisheng,

On 02/12/2023 14:52, Jisheng Zhang wrote:
> Currently, riscv linux requires at least IMA, so all platforms have a
> multiplier. And I assume the 'mul' efficiency is comparable or better
> than a sequence of five or so register-dependent arithmetic
> instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer
> codegen. Refer to commit f9b4192923fa ("[PATCH] bitops: hweight()
> speedup") for more details.
>
> In a simple benchmark test calling hweight64() in a loop, it got:
> about 14% performance improvement on JH7110, tested on Milkv Mars.
>
> about 23% performance improvement on TH1520 and SG2042, tested on
> Sipeed LPI4A and SG2042 platform.
>
> a slight performance drop on CV1800B, tested on milkv duo. Among all
> riscv platforms in my hands, this is the only one which sees a slight
> performance drop. It means the 'mul' isn't quick enough. However, the
> situation exists on x86 too, for example, P4 doesn't have fast
> integer multiplies as said in the above commit, x86 also selects
> ARCH_HAS_FAST_MULTIPLIER. So let's select ARCH_HAS_FAST_MULTIPLIER
> which can benefit almost riscv platforms.
>
> Samuel also provided some performance numbers:
> On Unmatched: 20% speedup for __sw_hweight32 and 30% speedup for
> __sw_hweight64.
> On D1: 8% speedup for __sw_hweight32 and 8% slowdown for
> __sw_hweight64.
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
> Tested-by: Samuel Holland <samuel.holland@sifive.com>
> ---
>
> since v1:
>  - fix typo in commit msg
>  - add some performance numbers provided by Samuel
>  - collect Reviewed-by and Tested-by tag
>
>  arch/riscv/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 95a2a06acc6a..e4834fa76417 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -23,6 +23,7 @@ config RISCV
>  	select ARCH_HAS_DEBUG_VIRTUAL if MMU
>  	select ARCH_HAS_DEBUG_VM_PGTABLE
>  	select ARCH_HAS_DEBUG_WX
> +	select ARCH_HAS_FAST_MULTIPLIER
>  	select ARCH_HAS_FORTIFY_SOURCE
>  	select ARCH_HAS_GCOV_PROFILE_ALL
>  	select ARCH_HAS_GIGANTIC_PAGE

You can add:

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Thanks,

Alex
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 95a2a06acc6a..e4834fa76417 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -23,6 +23,7 @@ config RISCV
 	select ARCH_HAS_DEBUG_VIRTUAL if MMU
 	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DEBUG_WX
+	select ARCH_HAS_FAST_MULTIPLIER
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE