From: Andrew Stubbs
Date: Wed, 28 Sep 2022 16:05:38 +0100
Subject: [PATCH] vect: while_ult for integer mask
To: gcc-patches@gcc.gnu.org

This patch is a prerequisite for some amdgcn patches I'm working on to support shorter vector lengths (having fixed 64 lanes tends to miss optimizations, and masking is not supported everywhere yet).

The problem is that, unlike AArch64, I'm not using different mask modes for different-sized vectors, so all loops end up using the while_ultsidi pattern, regardless of vector length. In theory I could use SImode for V32, HImode for V16, etc., but there's no mode to fit V4 or V2, so something else is needed. Moving to vector masks in the backend is not a natural fit for GCN, and would be a huge task in any case.

This patch adds an additional length operand so that we can distinguish the different uses in the backend and don't end up with more lanes enabled than there ought to be.
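To make the failure mode concrete, here is a small C model (not GCC code; both function names are hypothetical) of the mask that the constant-operand path of while_ultsidi builds from the loop bounds alone, and of the same mask limited to the vector length. It assumes one mask bit per lane with lane 0 in bit 0, which is how the expander's ~(-1 << diff) constant lays the lanes out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a DImode while_ult mask computed from the
   bounds only: lane I is enabled when START + I < END, for up to 64
   lanes.  Unlike the sketch of the expander it is based on, it also
   returns an empty mask when START >= END, to avoid a negative shift.  */
static uint64_t
while_ult_mask_no_length (int64_t start, int64_t end)
{
  int64_t diff = end - start;
  if (diff <= 0)
    return 0;
  if (diff >= 64)
    return UINT64_MAX;
  return ~(UINT64_MAX << diff);
}

/* The same mask with every lane at or above LANES cleared, i.e. what
   the extra length operand allows the backend to do.  */
static uint64_t
while_ult_mask (int64_t start, int64_t end, unsigned lanes)
{
  assert (lanes <= 64);
  uint64_t mask = while_ult_mask_no_length (start, end);
  if (lanes < 64)
    mask &= ~(UINT64_MAX << lanes);
  return mask;
}
```

For a 4-lane loop with 10 iterations remaining, while_ult_mask_no_length (0, 10) returns 0x3ff, enabling ten lanes of a four-lane vector, whereas while_ult_mask (0, 10, 4) returns 0xf.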
I've made the extra operand conditional on the mode so that I don't have to modify the AArch64 backend; that uses the while_ family of patterns in a lot of places, via iterators, so adding an inactive operand would touch a lot of code, and I don't have a way to test it properly. I've confirmed that AArch64 builds and expands while_ult correctly in a simple example.

OK for mainline?

Thanks
Andrew

vect: while_ult for integer masks

Add a vector length parameter needed by amdgcn without breaking aarch64.

All amdgcn vector masks are DImode, regardless of vector length, so we can't tell what length is implied simply from the mask mode. (Even if we used different integer modes, there's no mode small enough to differentiate a 2- or 4-lane mask.) Without knowing the intended length we end up using a mask with too many lanes enabled, which leads to undefined behaviour.

The extra operand is not added for vector mask types, so AArch64 does not need to be adjusted.

gcc/ChangeLog:

        * config/gcn/gcn-valu.md (while_ultsidi): Limit mask length using
        operand 3.
        * doc/md.texi (while_ult): Document new operand 3 usage.
        * internal-fn.cc (expand_while_optab_fn): Set operand 3 when lhs_type
        maps to a non-vector mode.
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 3bfdf8213fc..dec81e863f7 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -3052,7 +3052,8 @@ (define_expand "vcondu_exec"
 (define_expand "while_ultsidi"
   [(match_operand:DI 0 "register_operand")
    (match_operand:SI 1 "")
-   (match_operand:SI 2 "")]
+   (match_operand:SI 2 "")
+   (match_operand:SI 3 "")]
   ""
   {
     if (GET_CODE (operands[1]) != CONST_INT
@@ -3077,6 +3078,11 @@ (define_expand "while_ultsidi"
                               : ~((unsigned HOST_WIDE_INT)-1 << diff));
         emit_move_insn (operands[0], gen_rtx_CONST_INT (VOIDmode, mask));
       }
+    if (INTVAL (operands[3]) < 64)
+      emit_insn (gen_anddi3 (operands[0], operands[0],
+                             gen_rtx_CONST_INT (VOIDmode,
+                                                ~((unsigned HOST_WIDE_INT)-1
+                                                  << INTVAL (operands[3])))));
     DONE;
   })
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d46963f468c..d8e2a5a83f4 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4950,9 +4950,10 @@ This pattern is not allowed to @code{FAIL}.
 @cindex @code{while_ult@var{m}@var{n}} instruction pattern
 @item @code{while_ult@var{m}@var{n}}
 Set operand 0 to a mask that is true while incrementing operand 1
-gives a value that is less than operand 2.  Operand 0 has mode @var{n}
-and operands 1 and 2 are scalar integers of mode @var{m}.
-The operation is equivalent to:
+gives a value that is less than operand 2, for a vector length up to operand 3.
+Operand 0 has mode @var{n} and operands 1 to 3 are scalar integers of mode
+@var{m}.  Operand 3 should be omitted when @var{n} is a vector mode.  The
+operation for vector modes is equivalent to:
 
 @smallexample
 operand0[0] = operand1 < operand2;
@@ -4960,6 +4961,14 @@ for (i = 1; i < GET_MODE_NUNITS (@var{n}); i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+And for non-vector modes the operation is equivalent to:
+
+@smallexample
+operand0[0] = operand1 < operand2;
+for (i = 1; i < operand3; i++)
+  operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
+@end smallexample
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 651d99eaeb9..c306240c2ac 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3664,7 +3664,7 @@ expand_direct_optab_fn (internal_fn fn, gcall *stmt, direct_optab optab,
 static void
 expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
-  expand_operand ops[3];
+  expand_operand ops[4];
   tree rhs_type[2];
 
   tree lhs = gimple_call_lhs (stmt);
@@ -3680,10 +3680,24 @@ expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
       create_input_operand (&ops[i + 1], rhs_rtx, TYPE_MODE (rhs_type[i]));
     }
 
+  int opcnt;
+  if (!VECTOR_MODE_P (TYPE_MODE (lhs_type)))
+    {
+      /* When the mask is an integer mode the exact vector length may not
+         be clear to the backend, so we pass it in operand[3].
+         Use the vector in arg2 for the most reliable intended size.  */
+      tree type = TREE_TYPE (gimple_call_arg (stmt, 2));
+      create_integer_operand (&ops[3], TYPE_VECTOR_SUBPARTS (type));
+      opcnt = 4;
+    }
+  else
+    /* The mask has a vector type so the length operand is unnecessary.  */
+    opcnt = 3;
+
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (rhs_type[0]),
                                            TYPE_MODE (lhs_type));
 
-  expand_insn (icode, 3, ops);
+  expand_insn (icode, opcnt, ops);
   if (!rtx_equal_p (lhs_rtx, ops[0].value))
     emit_move_insn (lhs_rtx, ops[0].value);
 }