Message ID | 20231121144340.3492-1-jszhang@kernel.org |
---|---|
State | New |
Headers |
From: Jisheng Zhang <jszhang@kernel.org>
To: Paul Walmsley <paul.walmsley@sifive.com>, Palmer Dabbelt <palmer@dabbelt.com>, Albert Ou <aou@eecs.berkeley.edu>
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH] riscv: select ARCH_HAS_FAST_MULTIPLIER
Date: Tue, 21 Nov 2023 22:43:40 +0800
Message-Id: <20231121144340.3492-1-jszhang@kernel.org> |
Series |
riscv: select ARCH_HAS_FAST_MULTIPLIER
Commit Message
Jisheng Zhang
Nov. 21, 2023, 2:43 p.m. UTC
Currently, riscv linux requires at least IMA, so all platforms have a
multiplier. And I assume the 'mul' efficiency is comparable or better
than a sequence of five or so register-dependent arithmetic
instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer
codegen. Refer to commit f9b4192923fa ("[PATCH] bitops: hweight()
speedup") for more details.

In a simple benchmark calling hweight64() in a loop, the results were:

about 14% performance improvement on JH7110, tested on Milkv Mars.

about 23% performance improvement on TH1520 and SG2042, tested on
Sipeed LPI4A and SG2042 platform.

a slight performance drop on CV1800B, tested on milkv duo. Among all
riscv platforms in my hands, this is the only one which sees a slight
performance drop, which suggests its 'mul' isn't quick enough. However,
the same situation exists on x86: as noted in the above commit, P4
doesn't have fast integer multiplies, yet x86 also selects
ARCH_HAS_FAST_MULTIPLIER. So let's select ARCH_HAS_FAST_MULTIPLIER,
which benefits almost all riscv platforms.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
arch/riscv/Kconfig | 1 +
1 file changed, 1 insertion(+)
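
For readers unfamiliar with what the option changes: the sketch below is a userspace approximation of the two 64-bit population-count code paths that ARCH_HAS_FAST_MULTIPLIER chooses between, modelled on the generic software hweight implementation. The names hweight64_mul(), hweight64_shift() and hweight64_selfcheck() are hypothetical and used only for illustration; this is not the kernel source itself.

```c
#include <assert.h>
#include <stdint.h>

/* Variant used when a fast multiplier is assumed: the final byte sum
 * is done with a single multiply and shift. */
static unsigned int hweight64_mul(uint64_t w)
{
	w -= (w >> 1) & 0x5555555555555555ULL;                 /* 2-bit counts */
	w = (w & 0x3333333333333333ULL) +
	    ((w >> 2) & 0x3333333333333333ULL);                /* 4-bit counts */
	w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;            /* per-byte counts */
	/* Multiplying by 0x0101...01 sums all eight bytes into the top byte. */
	return (unsigned int)((w * 0x0101010101010101ULL) >> 56);
}

/* Variant used otherwise: the byte sum is a chain of dependent
 * shift-and-add steps instead of one multiply. */
static unsigned int hweight64_shift(uint64_t w)
{
	w -= (w >> 1) & 0x5555555555555555ULL;
	w = (w & 0x3333333333333333ULL) +
	    ((w >> 2) & 0x3333333333333333ULL);
	w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
	w += w >> 8;
	w += w >> 16;
	w += w >> 32;
	return (unsigned int)(w & 0xff);
}

/* Hypothetical helper: checks that both variants agree on a stream of
 * pseudo-random inputs generated with a xorshift step. */
static void hweight64_selfcheck(void)
{
	uint64_t x = 0x0123456789abcdefULL;

	for (int i = 0; i < 1000; i++) {
		assert(hweight64_mul(x) == hweight64_shift(x));
		x ^= x << 13;
		x ^= x >> 7;
		x ^= x << 17;
	}
}
```

Both functions return the same result for every input; the only difference is whether the last reduction costs one 'mul' or several dependent shifts and adds, which is why the benefit depends on the multiplier latency of each core.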
Comments
On 2023-11-21 8:43 AM, Jisheng Zhang wrote:
> Currently, riscv linux requires at least IMA, so all platforms have a
> multiplier. And I assume the 'mul' efficiency is comparable or better
> than a sequence of five or so register-dependent arithmetic
> instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer
> codegen. Refer to commit f9b4192923fa ("[PATCH] bitops: hweight()
> speedup") for more details.
>
> In a simple benchmark test calling hweight64() in a loop, it got:
> about 14% preformance improvement on JH7110, tested on Milkv Mars.

typo: performance

> about 23% performance improvement on TH1520 and SG2042, tested on
> Sipeed LPI4A and SG2042 platform.
>
> a slight performance drop on CV1800B, tested on milkv duo. Among all
> riscv platforms in my hands, this is the only one which sees a slight
> performance drop. It means the 'mul' isn't quick enough. However, the
> situation exists on x86 too, for example, P4 doesn't have fast
> integer multiplies as said in the above commit, x86 also selects
> ARCH_HAS_FAST_MULTIPLIER. So let's select ARCH_HAS_FAST_MULTIPLIER
> which can benefit almost riscv platforms.

On Unmatched: 20% speedup for __sw_hweight32 and 30% speedup for __sw_hweight64.
On D1: 8% speedup for __sw_hweight32 and 8% slowdown for __sw_hweight64.
So overall still an improvement.

> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> ---
> arch/riscv/Kconfig | 1 +
> 1 file changed, 1 insertion(+)

Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>

...
> However, the
> situation exists on x86 too, for example, P4 doesn't have fast
> integer multiplies
...

P4 doesn't have fast anything :-)
More interestingly does anything modern not have fast multiplies?
(that you'd consider running Linux on).
The silicon required isn't that big.
IIRC an 'n' bit multiplier requires n^2 full adders and has the
latency of 2n adders.

David

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 95a2a06acc6a..e4834fa76417 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -23,6 +23,7 @@ config RISCV
 	select ARCH_HAS_DEBUG_VIRTUAL if MMU
 	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DEBUG_WX
+	select ARCH_HAS_FAST_MULTIPLIER
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE