Message ID | Y2xlRIxBIy3VG/xL@toto.the-meissners.org |
---|---|
Series |
PowerPC Dense Math prelimary support (-mcpu=future)
|
|
Message
Michael Meissner
Nov. 10, 2022, 2:43 a.m. UTC
This patch is very preliminary support for a potential new feature to the PowerPC that extends the current power10 MMA architecture. This feature may or may not be present in any specific future PowerPC processor.

In the current MMA subsystem for Power10, there are 8 512-bit accumulator registers. These accumulators are each tied to sets of 4 FPR registers. When you issue a prime instruction, it makes sure the accumulator is a copy of the 4 FPR registers the accumulator is tied to. When you issue a deprime instruction, it makes sure that the accumulator data content is logically copied to the matching FPR register.

In the potential dense math system, the accumulators are moved to separate registers called dense math registers (DM registers or DMR). The DMRs are then extended to 1,024 bits and new instructions will be added to deal with all 1,024 bits of the DMRs.

If you take existing MMA code, it will work as long as you don't do anything with accumulators, and you follow the rules in the ISA 3.1 documentation for using the MMA subsystem.

These patches add support for the 512-bit accumulators within the dense math system, and for allocation of the 1,024-bit DMRs. At this time, no additional built-in functions will be done to support any dense math features other than doing data movement between the DMRs and the VSX registers. Before we can look at adding any new dense math support other than data movement, we need the GCC compiler to be able to allocate and use these DMRs.

There are 6 patches in this patch set:

1) The first patch just adds -mcpu=future as an option to add new support. This is similar to the -mcpu=future that we did before power10 was announced.

2) The second patch enables GCC to use the load and store vector pair instructions to optimize memory copy operations in the compiler. For power10, we needed to just stay with normal vector load/stores for memory copy operations.

3) The third patch enables 512-bit accumulators stored in DMRs. This patch enables the register allocation, but it does not move the existing MMA to use these registers.

4) The fourth patch switches the MMA subsystem to use 512-bit accumulators within DMRs if you use -mcpu=future.

5) The fifth patch switches the names of the MMA instructions to use the dense math equivalent name if -mcpu=future.

6) The sixth patch enables using the full 1,024-bit DMRs. Right now, all you can do with DMRs is move a VSX register to a DMR register, and move a DMR register to a VSX register.

In terms of changes, we now use the wD constraint for accumulators. If you compile with -mcpu=power10, the wD constraint will match the equivalent FPR register that overlaps with the accumulator. If you compile with -mcpu=future, the wD constraint will match the DMR register and not the FPR register.

This patch also modifies the print_operand %A output modifier to print out DMR register numbers if -mcpu=future, and continue to print out the FPR register number divided by 4 for -mcpu=power10.

In general, if you only use the built-in functions, things work between the two systems. If you use extended asm, you will likely need to modify the code. Going forward, hopefully if you modify your code to use the wD constraint and %A output modifier, you can write code that switches more easily between the two systems.

There is one bug that I noticed. When you use the full DMR instruction the constant copy propagation patch issues internal errors. I believe this is due to the CCP pass not handling opaque types cleanly enough, and it only shows up in larger types. I would like to get these patches committed, and then work with the maintainers of the CCP to fix the problem.

Again, these are preliminary patches for a potential future machine. Things will likely change in terms of implementation and usage over time.
Comments
Hi!

On Wed, Nov 09, 2022 at 09:43:16PM -0500, Michael Meissner wrote:
> This patch is very preliminary support for a potential new feature to the
> PowerPC that extends the current power10 MMA architecture. This feature may or
> may not be present in any specific future PowerPC processor.

MMA is an optional facility in ISA 3.1 -- please don't say it is power10 only.

> In the current MMA subsystem for Power10, there are 8 512-bit accumulator
> registers. These accumulators are each tied to sets of 4 FPR registers.

Four VSRs. FPRs are only 64 bits. You mean this is VSRs 0..31.

> When
> you issue a prime instruction, it makes sure the accumulator is a copy of the 4

I suppose you mean the xxmtacc instruction?

> FPR registers the accumulator is tied to. When you issue a deprime
> instruction, it makes sure that the accumulator data content is logically
> copied to the matching FPR register.

And xxmfacc.

Very importantly all the other rules in 7.2.1.3 "VSX Accumulators" apply as well. That should make old code work on new systems transparently.

> In terms of changes, we now use the wD constraint for accumulators. If you
> compile with -mcpu=power10, the wD constraint will match the equivalent FPR
> register that overlaps with the accumulator.

The set of *four* *VSX* registers. Of course in the end it is just a number, but :-)

> If you compile with -mcpu=future,
> the wD constraint will match the DMR register and not the FPR register.

Constraints do not "match" anything. "Will allow" perhaps?

> In general, if you only use the built-in functions, things work between the two
> systems. If you use extended asm, you will likely need to modify the code.
> Going forward, hopefully if you modify your code to use the wD constraint and
> %A output modifier, you can write code that switches more easily between the
> two systems.

You *already* are required to follow all these rules that make this painless and transparent.

> There is one bug that I noticed. When you use the full DMR instruction the
> constant copy propagation patch issues internal errors. I believe this is due
> to the CCP pass not handling opaque types cleanly enough, and it only shows up
> in larger types. I would like to get these patches committed, and then work
> with the maintainers of the CCP to fix the problem.

Erm. If the compiler ICEs, we can not include this code. But hopefully you mean something else?

Segher
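The register overlap discussed in this exchange (8 accumulators of 512 bits, each overlapping four 128-bit VSRs, covering VSRs 0..31) can be sketched as plain arithmetic. This is only an illustration in C; the constant and function names are hypothetical and are not taken from GCC or from the patches.

```c
#include <assert.h>

/* Illustrative constants; the names are hypothetical, not GCC's.  */
enum {
  NUM_ACCUMULATORS = 8,   /* accumulators in the ISA 3.1 MMA facility */
  VSRS_PER_ACC     = 4,   /* each accumulator overlaps four VSRs */
  VSR_BITS         = 128  /* width of one VSX register */
};

/* First VSR overlapped by accumulator ACC (0..7).  */
int acc_first_vsr (int acc)
{
  assert (acc >= 0 && acc < NUM_ACCUMULATORS);
  return acc * VSRS_PER_ACC;
}

/* Accumulator number for a VSR in 0..31.  This is the "register number
   divided by 4" that the %A output modifier is described as printing
   for -mcpu=power10.  */
int vsr_to_acc (int vsr)
{
  assert (vsr >= 0 && vsr < NUM_ACCUMULATORS * VSRS_PER_ACC);
  return vsr / VSRS_PER_ACC;
}

/* Width of one accumulator in bits: 4 * 128 = 512.  */
int acc_bits (void)
{
  return VSRS_PER_ACC * VSR_BITS;
}
```

For example, accumulator 7 starts at VSR 28, and any of VSRs 28..31 maps back to accumulator 7.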
On Fri, Jan 27, 2023 at 01:59:00PM -0600, Segher Boessenkool wrote:
> > There is one bug that I noticed. When you use the full DMR instruction the
> > constant copy propagation patch issues internal errors. I believe this is due
> > to the CCP pass not handling opaque types cleanly enough, and it only shows up
> > in larger types. I would like to get these patches committed, and then work
> > with the maintainers of the CCP to fix the problem.
>
> Erm. If the compiler ICEs, we can not include this code. But hopefully
> you mean something else?

I realize we can't include the code for final release. But as a temporary measure I was hoping we would put in the code, we could allow somebody more familiar with ccp to debug it. Then if there were changes needed in the PowerPC back end, we could make them, once ccp was fixed.

But that is a moot point, ccp no longer dies with the code, so I have removed the comment and the no tree ccp option in the next set of patches.
On Sat, Jan 28, 2023 at 02:29:04AM -0500, Michael Meissner wrote:
> On Fri, Jan 27, 2023 at 01:59:00PM -0600, Segher Boessenkool wrote:
> > > There is one bug that I noticed. When you use the full DMR instruction the
> > > constant copy propagation patch issues internal errors. I believe this is due
> > > to the CCP pass not handling opaque types cleanly enough, and it only shows up
> > > in larger types. I would like to get these patches committed, and then work
> > > with the maintainers of the CCP to fix the problem.
> >
> > Erm. If the compiler ICEs, we can not include this code. But hopefully
> > you mean something else?
>
> I realize we can't include the code for final release. But as a temporary
> measure I was hoping we would put in the code, we could allow somebody more
> familiar with ccp to debug it. Then if there were changes needed in the PowerPC
> back end, we could make them, once ccp was fixed.
>
> But that is a moot point, ccp no longer dies with the code, so I have removed
> the comment and the no tree ccp option in the next set of patches.

Unfortunately, while it worked on my x86 as a cross compiler, when I did the builds for real, it is a problem, so I will need to look into it.
On Sun, Jan 29, 2023 at 09:52:38PM -0500, Michael Meissner wrote:
> On Sat, Jan 28, 2023 at 02:29:04AM -0500, Michael Meissner wrote:
> > On Fri, Jan 27, 2023 at 01:59:00PM -0600, Segher Boessenkool wrote:
> > > > There is one bug that I noticed. When you use the full DMR instruction the
> > > > constant copy propagation patch issues internal errors. I believe this is due
> > > > to the CCP pass not handling opaque types cleanly enough, and it only shows up
> > > > in larger types. I would like to get these patches committed, and then work
> > > > with the maintainers of the CCP to fix the problem.
> > >
> > > Erm. If the compiler ICEs, we can not include this code. But hopefully
> > > you mean something else?
> >
> > I realize we can't include the code for final release. But as a temporary
> > measure I was hoping we would put in the code, we could allow somebody more
> > familiar with ccp to debug it. Then if there were changes needed in the PowerPC
> > back end, we could make them, once ccp was fixed.
> >
> > But that is a moot point, ccp no longer dies with the code, so I have removed
> > the comment and the no tree ccp option in the next set of patches.
>
> Unfortunately, while it worked on my x86 as a cross compiler, when I did the
> builds for real, it is a problem, so I will need to look into it.

Ok, I tracked down the source of the bug. The CCP pass is depending on the precision field. Unfortunately in tree-core.h, the precision is a 10-bit integer bit field, so 1,024 will become 0.

Having a 0 precision meant that the hwint function for sign extending a value would generate:

    (HOST_WIDE_INT)(((unsigned HOST_WIDE_INT)value << 64) >> 64)

which is undefined behavior in C and C++. On the x86_64 doing the shift left and then right gives you the initial value (which was -1), while on the PowerPC it always gives you 0. The CCP code was assuming if it wasn't -1, that it was an integer, but the TDO type is opaque, not integer.

The solution was to grow precision by 1 bit and decrease the extra bits in the placeholder entry by 1 bit. I'm testing it now.
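The failure mode described above can be sketched in a few lines of C. The struct and function names below are hypothetical stand-ins, not GCC's tree-core.h layout or its hwint helpers; only the 10-bit field width, the 1,024 value, and the shift-by-64 problem come from the message. The sign extension relies on arithmetic right shift of negative values, which is implementation-defined but is what GCC provides.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a type node.  toy_type_10 mimics the 10-bit
   precision field; toy_type_11 mimics the described fix (one more bit of
   precision, one fewer spare bit).  */
struct toy_type_10 { unsigned precision : 10; unsigned spare : 22; };
struct toy_type_11 { unsigned precision : 11; unsigned spare : 21; };

/* Value read back after storing BITS in the 10-bit field.  The store is
   reduced modulo 1024, so 1024 comes back as 0.  */
unsigned stored_precision_10 (unsigned bits)
{
  struct toy_type_10 t = { 0, 0 };
  t.precision = bits;
  return t.precision;
}

/* Same store with an 11-bit field: 1024 now fits.  */
unsigned stored_precision_11 (unsigned bits)
{
  struct toy_type_11 t = { 0, 0 };
  t.precision = bits;
  return t.precision;
}

/* Sign-extend the low PREC bits of VALUE.  The shift count is 64 - PREC,
   so PREC == 0 would mean shifting a 64-bit value by 64, which is
   undefined; this sketch rejects it instead of shifting.  */
int64_t sign_extend (int64_t value, unsigned prec)
{
  assert (prec >= 1 && prec <= 64);
  unsigned shift = 64 - prec;
  return (int64_t) ((uint64_t) value << shift) >> shift;
}
```

With these definitions, `stored_precision_10 (1024)` yields 0 while `stored_precision_11 (1024)` yields 1024, which is the effect of growing the field by one bit.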
On Tue, Jan 31, 2023 at 10:31:03PM -0500, Michael Meissner wrote:
> Ok, I tracked down the source of the bug. The CCP pass is depending on the
> precision field. Unfortunately in tree-core.h, the precision is a 10-bit integer
> bit field, so 1,024 will become 0.
>
> Having a 0 precision meant that the hwint function for sign extending a value
> would generate:
>
>     (HOST_WIDE_INT)(((unsigned HOST_WIDE_INT)value << 64) >> 64)
>
> which is undefined behavior in C and C++. On the x86_64 doing the shift left
> and then right gives you the initial value (which was -1), while on the PowerPC
> it always gives you 0. The CCP code was assuming if it wasn't -1, that it was
> an integer, but the TDO type is opaque, not integer.

Variable 64-bit shifts on x86 mask the shift amount to 6 bits, while on PowerPC it is masked to 7 bits. It sounds like that is what you hit, with some -O0 build perhaps.

But either way UB is UB, the program has no meaning, any output is correct, no output is correct as well :-) Nasal demons and all that.

bootstrap-ubsan should have found this?

Segher
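The masking difference described in this reply can be modelled in well-defined C by applying the mask explicitly before shifting. This is a software model of the 6-bit and 7-bit masking as stated above, not a claim about what any compiler emits for the undefined expression; the function names are hypothetical. Like the original expression, it assumes arithmetic right shift of negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Model: shift amount masked to 6 bits, so a count of 64 acts as 0.  */
uint64_t shl_mask6 (uint64_t v, unsigned n) { return v << (n & 63); }
int64_t  sar_mask6 (int64_t v, unsigned n)  { return v >> (n & 63); }

/* Model: shift amount masked to 7 bits, so counts 64..127 shift every
   bit out (left shift gives 0, arithmetic right shift gives the sign).  */
uint64_t shl_mask7 (uint64_t v, unsigned n)
{
  n &= 127;
  return n >= 64 ? 0 : v << n;
}

int64_t sar_mask7 (int64_t v, unsigned n)
{
  n &= 127;
  if (n >= 64)
    return v < 0 ? -1 : 0;
  return v >> n;
}

/* The shape of the expression from the thread, (value << n) >> n, under
   each model.  */
int64_t roundtrip_mask6 (int64_t value, unsigned n)
{
  return sar_mask6 ((int64_t) shl_mask6 ((uint64_t) value, n), n);
}

int64_t roundtrip_mask7 (int64_t value, unsigned n)
{
  return sar_mask7 ((int64_t) shl_mask7 ((uint64_t) value, n), n);
}
```

Under this model, a round trip of -1 with a count of 64 returns -1 with 6-bit masking and 0 with 7-bit masking, which is consistent with the x86_64 and PowerPC results reported earlier in the thread.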