Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
Message ID: 8f805fb1-d4ae-b0e3-ff26-57fd2c1fc1f7@arm.com
State: New, archived
Series: Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
Commit Message
Andre Vieira (lists)
Aug. 8, 2022, 2:06 p.m. UTC
Hi,

So I've changed the approach from the RFC as suggested, moving the bitfield lowering to the if-convert pass.

To reiterate: ifcvt will lower COMPONENT_REFs with DECL_BIT_FIELD fields to either BIT_FIELD_REF if they are reads or BIT_INSERT_EXPR if they are writes, using loads and writes of 'representatives' that are big enough to contain the bitfield value.

In vect_recog I added two patterns to replace these BIT_FIELD_REF and BIT_INSERT_EXPR with shifts and masks as appropriate.

I'd like to see if it is possible to remove the 'load' part of a BIT_INSERT_EXPR if the representative write doesn't change any relevant bits. For example:

struct s {
  int dont_care;
  char a : 3;
};

s.a = <value>;

should not require a load & write cycle; in fact it wouldn't require any masking either. Though to achieve this we'd need to make sure the representative doesn't overlap with any other field. Any suggestions on how to do this would be great, though I don't think we need to wait for that, as that's merely a nice-to-have optimization I guess?

I am not sure where I should document this change of behavior in ifcvt, and/or whether we should change the name of the pass, since it's doing more than if-conversion now?

Bootstrapped and regression tested this patch on aarch64-none-linux-gnu.

gcc/ChangeLog:
2022-08-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>

	* tree-if-conv.cc (includes): Add expr.h and langhooks.h to list
	of includes.
	(need_to_lower_bitfields): New static bool.
	(need_to_ifcvt): Likewise.
	(version_loop_for_if_conversion): Adapt to work for bitfield
	lowering-only path.
	(bitfield_data_t): New typedef.
	(get_bitfield_data): New function.
	(lower_bitfield): New function.
	(bitfields_to_lower_p): New function.
	(tree_if_conversion): Change to lower bitfields too.
	* tree-vect-data-refs.cc (vect_find_stmt_data_reference): Modify
	dump message to be more accurate.
	* tree-vect-patterns.cc (includes): Add gimplify-me.h include.
	(vect_recog_bitfield_ref_pattern): New function.
	(vect_recog_bit_insert_pattern): New function.
	(vect_vect_recog_func_ptrs): Add two new patterns.

gcc/testsuite/ChangeLog:
2022-08-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>

	* gcc.dg/vect/vect-bitfield-read-1.c: New test.
	* gcc.dg/vect/vect-bitfield-read-2.c: New test.
	* gcc.dg/vect/vect-bitfield-read-3.c: New test.
	* gcc.dg/vect/vect-bitfield-read-4.c: New test.
	* gcc.dg/vect/vect-bitfield-write-1.c: New test.
	* gcc.dg/vect/vect-bitfield-write-2.c: New test.
	* gcc.dg/vect/vect-bitfield-write-3.c: New test.

Kind regards,
Andre
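For context, the kind of loop this enables is what the new tests exercise; the following is a trimmed sketch based on vect-bitfield-read-1.c from the patch (names shortened, otherwise as in the test):

struct s { int i : 31; };

int
sum (struct s *ptr, unsigned n)
{
  int res = 0;
  for (unsigned i = 0; i < n; ++i)
    res += ptr[i].i;   /* Bitfield read: previously rejected with
                          "statement is bitfield access", now lowered by
                          ifcvt and picked up by the new vect patterns.  */
  return res;
}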
Comments
On Mon, 8 Aug 2022, Andre Vieira (lists) wrote: > Hi, > > So I've changed the approach from the RFC as suggested, moving the bitfield > lowering to the if-convert pass. > > So to reiterate, ifcvt will lower COMPONENT_REF's with DECL_BIT_FIELD field's > to either BIT_FIELD_REF if they are reads or BIT_INSERT_EXPR if they are > writes, using loads and writes of 'representatives' that are big enough to > contain the bitfield value. > > In vect_recog I added two patterns to replace these BIT_FIELD_REF and > BIT_INSERT_EXPR with shift's and masks as appropriate. > > I'd like to see if it was possible to remove the 'load' part of a > BIT_INSERT_EXPR if the representative write didn't change any relevant bits. > For example: > > struct s{ > int dont_care; > char a : 3; > }; > > s.a = <value>; > > Should not require a load & write cycle, in fact it wouldn't even require any > masking either. Though to achieve this we'd need to make sure the > representative didn't overlap with any other field. Any suggestions on how to > do this would be great, though I don't think we need to wait for that, as > that's merely a nice-to-have optimization I guess? Hmm. I'm not sure the middle-end can simply ignore padding. If some language standard says that would be OK then I think we should exploit this during lowering when the frontend is still around to ask - which means somewhen during early optimization. > I am not sure where I should 'document' this change of behavior to ifcvt, > and/or we should change the name of the pass, since it's doing more than > if-conversion now? It's preparation for vectorization anyway since it will emit .MASK_LOAD/STORE and friends already. So I don't think anything needs to change there. @@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv) auto_vec<edge> critical_edges; /* Loop is not well formed. */ - if (num <= 2 || loop->inner || !single_exit (loop)) + if (num <= 2 || loop->inner) return false; body = get_loop_body (loop); this doesn't appear in the ChangeLog nor is it clear to me why it's needed? Likewise - /* Save BB->aux around loop_version as that uses the same field. */ - save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; - void **saved_preds = XALLOCAVEC (void *, save_length); - for (unsigned i = 0; i < save_length; i++) - saved_preds[i] = ifc_bbs[i]->aux; + void **saved_preds = NULL; + if (any_complicated_phi || need_to_predicate) + { + /* Save BB->aux around loop_version as that uses the same field. */ + save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; + saved_preds = XALLOCAVEC (void *, save_length); + for (unsigned i = 0; i < save_length; i++) + saved_preds[i] = ifc_bbs[i]->aux; + } is that just premature optimization? + /* BITSTART and BITEND describe the region we can safely load from inside the + structure. BITPOS is the bit position of the value inside the + representative that we will end up loading OFFSET bytes from the start + of the struct. BEST_MODE is the mode describing the optimal size of the + representative chunk we load. If this is a write we will store the same + sized representative back, after we have changed the appropriate bits. 
*/ + get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset); I think you need to give up when get_bit_range sets bitstart = bitend to zero + if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend, + TYPE_ALIGN (TREE_TYPE (struct_expr)), + INT_MAX, false, &best_mode)) + tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL, + NULL_TREE, rep_type); + /* Load from the start of 'offset + bitpos % alignment'. */ + uint64_t extra_offset = bitpos.to_constant (); you shouldn't build a new FIELD_DECL. Either you use DECL_BIT_FIELD_REPRESENTATIVE directly or you use a BIT_FIELD_REF accessing the "representative". DECL_BIT_FIELD_REPRESENTATIVE exists so it can maintain a variable field offset, you can also subset that with an intermediate BIT_FIELD_REF if DECL_BIT_FIELD_REPRESENTATIVE is too large for your taste. I'm not sure all the offset calculation you do is correct, but since you shouldn't invent a new FIELD_DECL it probably needs to change anyway ... Note that for optimization it will be important that all accesses to the bitfield members of the same bitfield use the same underlying area (CSE and store-forwarding will thank you). + + need_to_lower_bitfields = bitfields_to_lower_p (loop, &bitfields_to_lower); + if (!ifcvt_split_critical_edges (loop, aggressive_if_conv) + && !need_to_lower_bitfields) goto cleanup; so we lower bitfields even when we cannot split critical edges? why? + need_to_ifcvt + = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree); + if (!need_to_ifcvt && !need_to_lower_bitfields) goto cleanup; likewise - if_convertible_loop_p performs other checks, the only one we want to elide is the loop->num_nodes <= 2 check since we want to lower bitfields in single-block loops as well. That means we only have to scan for bitfield accesses in the first block "prematurely". So I would interwind the need_to_lower_bitfields into if_convertible_loop_p and if_convertible_loop_p_1 and put the loop->num_nodes <= 2 after it when !need_to_lower_bitfields. + tree op = gimple_get_lhs (stmt); + bool write = TREE_CODE (op) == COMPONENT_REF; + + if (!write) + op = gimple_assign_rhs1 (stmt); + + if (TREE_CODE (op) != COMPONENT_REF) + continue; + + if (DECL_BIT_FIELD (TREE_OPERAND (op, 1))) note the canonical test for a bitfield access is to check DECL_BIT_FIELD_TYPE, not DECL_BIT_FIELD. In particular for struct { int a : 4; int b : 4; int c : 8; int d : 4; int e : 12; } 'c' will _not_ have DECL_BIT_FIELD set but you want to lower it's access since you otherwise likely will get conflicting accesses for the other fields (store forwarding). +static bool +bitfields_to_lower_p (class loop *loop, auto_vec <bitfield_data_t *, 4> *to_lower) don't pass auto_vec<> *, just pass vec<>&, auto_vec will properly decay. +vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info, + tree *type_out) +{ + gassign *nop_stmt = dyn_cast <gassign *> (stmt_info->stmt); + if (!nop_stmt + || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (nop_stmt)) + tree bf_ref = gimple_assign_rhs1 (bf_stmt); + + tree load = TREE_OPERAND (bf_ref, 0); + tree size = TREE_OPERAND (bf_ref, 1); + tree offset = TREE_OPERAND (bf_ref, 2); use bit_field_{size,offset} + /* Bail out if the load is already a vector type. */ + if (VECTOR_TYPE_P (TREE_TYPE (load))) + return NULL; I think you want a "positive" check, what kind of type you handle for the load. An (unsigned?) INTEGRAL_TYPE_P one I guess. 
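Spelling out that DECL_BIT_FIELD vs. DECL_BIT_FIELD_TYPE point as a small C example (the comments merely restate the description above):

struct s
{
  int a : 4;
  int b : 4;
  int c : 8;    /* Per the above: DECL_BIT_FIELD is not set because 'c'
                   fills a whole byte, but DECL_BIT_FIELD_TYPE is, and 'c'
                   shares a representative with the surrounding fields, so
                   its accesses should be lowered too.  */
  int d : 4;
  int e : 12;
};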
+ tree ret_type = TREE_TYPE (gimple_get_lhs (nop_stmt)); + gimple_assign_lhs + if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type)) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL), + NOP_EXPR, lhs); + lhs = gimple_get_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + } hm - so you have for example int _1 = MEM; int:3 _2 = BIT_FIELD_REF <_1, ...> type _3 = (type) _2; and that _3 = (type) _2 is because of integer promotion and you perform all the shifting in that type. I suppose you should verify that the cast is indeed promoting, not narrowing, since otherwise you'll produce wrong code? That said, shouldn't you perform the shift / mask in the type of _1 instead? (the hope is, of course, that typeof (_1) == type in most cases) Similar comments apply to vect_recog_bit_insert_pattern. Overall it looks reasonable but it does still need some work. Thanks, Richard. > Bootstrapped and regression tested this patch on aarch64-none-linux-gnu. > > gcc/ChangeLog: > 2022-08-08 Andre Vieira <andre.simoesdiasvieira@arm.com> > > * tree-if-conv.cc (includes): Add expr.h and langhooks.h to list of > includes. > (need_to_lower_bitfields): New static bool. > (need_to_ifcvt): Likewise. > (version_loop_for_if_conversion): Adapt to work for bitfield > lowering-only path. > (bitfield_data_t): New typedef. > (get_bitfield_data): New function. > (lower_bitfield): New function. > (bitfields_to_lower_p): New function. > (tree_if_conversion): Change to lower-bitfields too. > * tree-vect-data-refs.cc (vect_find_stmt_data_reference): > Modify dump message to be more accurate. > * tree-vect-patterns.cc (includes): Add gimplify-me.h include. > (vect_recog_bitfield_ref_pattern): New function. > (vect_recog_bit_insert_pattern): New function. > (vect_vect_recog_func_ptrs): Add two new patterns. > > gcc/testsuite/ChangeLog: > 2022-08-08 Andre Vieira <andre.simoesdiasvieira@arm.com> > > * gcc.dg/vect/vect-bitfield-read-1.c: New test. > * gcc.dg/vect/vect-bitfield-read-2.c: New test. > * gcc.dg/vect/vect-bitfield-read-3.c: New test. > * gcc.dg/vect/vect-bitfield-read-4.c: New test. > * gcc.dg/vect/vect-bitfield-write-1.c: New test. > * gcc.dg/vect/vect-bitfield-write-2.c: New test. > * gcc.dg/vect/vect-bitfield-write-3.c: New test. > > Kind regards, > Andre
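As a scalar illustration of the read lowering discussed above (a sketch only, with made-up names; the actual pattern emits GIMPLE statements and, per the review, should do the shifting and masking in the representative's type):

unsigned int
extract_bitfield (unsigned int rep, unsigned int bitpos, unsigned int bitsize)
{
  /* Shift the field down to the LSB and mask off the rest: the
     shift-and-mask sequence the bitfield_ref pattern produces per lane.  */
  unsigned int mask = bitsize < 32 ? (1u << bitsize) - 1u : ~0u;
  return (rep >> bitpos) & mask;
}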
Hi, New version of the patch attached, but haven't recreated the ChangeLog yet, just waiting to see if this is what you had in mind. See also some replies to your comments in-line below: On 09/08/2022 15:34, Richard Biener wrote: > @@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool > aggressive_if_conv) > auto_vec<edge> critical_edges; > > /* Loop is not well formed. */ > - if (num <= 2 || loop->inner || !single_exit (loop)) > + if (num <= 2 || loop->inner) > return false; > > body = get_loop_body (loop); > > this doesn't appear in the ChangeLog nor is it clear to me why it's > needed? Likewise So both these and... > > - /* Save BB->aux around loop_version as that uses the same field. */ > - save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; > - void **saved_preds = XALLOCAVEC (void *, save_length); > - for (unsigned i = 0; i < save_length; i++) > - saved_preds[i] = ifc_bbs[i]->aux; > + void **saved_preds = NULL; > + if (any_complicated_phi || need_to_predicate) > + { > + /* Save BB->aux around loop_version as that uses the same field. > */ > + save_length = loop->inner ? loop->inner->num_nodes : > loop->num_nodes; > + saved_preds = XALLOCAVEC (void *, save_length); > + for (unsigned i = 0; i < save_length; i++) > + saved_preds[i] = ifc_bbs[i]->aux; > + } > > is that just premature optimization? .. these changes are to make sure we can still use the loop versioning code even for cases where there are bitfields to lower but no ifcvts (i.e. num of BBs <= 2). I wasn't sure about the loop-inner condition and the small examples I tried it seemed to work, that is loop version seems to be able to handle nested loops. The single_exit condition is still required for both, because the code to create the loop versions depends on it. It does look like I missed this in the ChangeLog... > + /* BITSTART and BITEND describe the region we can safely load from > inside the > + structure. BITPOS is the bit position of the value inside the > + representative that we will end up loading OFFSET bytes from the > start > + of the struct. BEST_MODE is the mode describing the optimal size of > the > + representative chunk we load. If this is a write we will store the > same > + sized representative back, after we have changed the appropriate > bits. */ > + get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset); > > I think you need to give up when get_bit_range sets bitstart = bitend to > zero > > + if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend, > + TYPE_ALIGN (TREE_TYPE (struct_expr)), > + INT_MAX, false, &best_mode)) > > + tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL, > + NULL_TREE, rep_type); > + /* Load from the start of 'offset + bitpos % alignment'. */ > + uint64_t extra_offset = bitpos.to_constant (); > > you shouldn't build a new FIELD_DECL. Either you use > DECL_BIT_FIELD_REPRESENTATIVE directly or you use a > BIT_FIELD_REF accessing the "representative". > DECL_BIT_FIELD_REPRESENTATIVE exists so it can maintain > a variable field offset, you can also subset that with an > intermediate BIT_FIELD_REF if DECL_BIT_FIELD_REPRESENTATIVE is > too large for your taste. > > I'm not sure all the offset calculation you do is correct, but > since you shouldn't invent a new FIELD_DECL it probably needs > to change anyway ... I can use the DECL_BIT_FIELD_REPRESENTATIVE, but I'll still need some offset calculation/extraction. 
It's easier to example with an example: In vect-bitfield-read-3.c the struct: typedef struct { int c; int b; bool a : 1; } struct_t; and field access 'vect_false[i].a' or 'vect_true[i].a' will lead to a DECL_BIT_FIELD_REPRESENTATIVE of TYPE_SIZE of 8 (and TYPE_PRECISION is also 8 as expected). However, the DECL_FIELD_OFFSET of either the original field decl, the actual bitfield member, or the DECL_BIT_FIELD_REPRESENTATIVE is 0 and the DECL_FIELD_BIT_OFFSET is 64. These will lead to the correct load: _1 = vect_false[i].D; D here being the representative is an 8-bit load from vect_false[i] + 64bits. So all good there. However, when we construct BIT_FIELD_REF we can't simply use DECL_FIELD_BIT_OFFSET (field_decl) as the BIT_FIELD_REF's bitpos. During `verify_gimple` it checks that bitpos + bitsize < TYPE_SIZE (TREE_TYPE (load)) where BIT_FIELD_REF (load, bitsize, bitpos). So instead I change bitpos such that: align_of_representative = TYPE_ALIGN (TREE_TYPE (representative)); bitpos -= bitpos.to_constant () / align_of_representative * align_of_representative; I've now rewritten this to: poly_int64 q,r; if (can_trunc_div_p(bitpos, align_of_representative, &q, &r)) bitpos = r; It makes it slightly clearer, also because I no longer need the changes to the original tree offset as I'm just using D for the load. > Note that for optimization it will be important that all > accesses to the bitfield members of the same bitfield use the > same underlying area (CSE and store-forwarding will thank you). > > + > + need_to_lower_bitfields = bitfields_to_lower_p (loop, > &bitfields_to_lower); > + if (!ifcvt_split_critical_edges (loop, aggressive_if_conv) > + && !need_to_lower_bitfields) > goto cleanup; > > so we lower bitfields even when we cannot split critical edges? > why? > > + need_to_ifcvt > + = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree); > + if (!need_to_ifcvt && !need_to_lower_bitfields) > goto cleanup; > > likewise - if_convertible_loop_p performs other checks, the only > one we want to elide is the loop->num_nodes <= 2 check since > we want to lower bitfields in single-block loops as well. That > means we only have to scan for bitfield accesses in the first > block "prematurely". So I would interwind the need_to_lower_bitfields > into if_convertible_loop_p and if_convertible_loop_p_1 and > put the loop->num_nodes <= 2 after it when !need_to_lower_bitfields. I'm not sure I understood this. But I'd rather keep the 'need_to_ifcvt' (new) and 'need_to_lower_bitfields' separate. One thing I did change is that we no longer check for bitfields to lower if there are if-stmts that we can't lower, since we will not be vectorizing this loop anyway so no point in wasting time lowering bitfields. At the same time though, I'd like to be able to lower-bitfields if there are no ifcvts. > + if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type)) > + { > + pattern_stmt > + = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL), > + NOP_EXPR, lhs); > + lhs = gimple_get_lhs (pattern_stmt); > + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); > + } > > hm - so you have for example > > int _1 = MEM; > int:3 _2 = BIT_FIELD_REF <_1, ...> > type _3 = (type) _2; > > and that _3 = (type) _2 is because of integer promotion and you > perform all the shifting in that type. I suppose you should > verify that the cast is indeed promoting, not narrowing, since > otherwise you'll produce wrong code? That said, shouldn't you > perform the shift / mask in the type of _1 instead? 
(the hope > is, of course, that typeof (_1) == type in most cases) > > Similar comments apply to vect_recog_bit_insert_pattern. Good shout, hadn't realized that yet because of how the testcases didn't have that problem, but when using the REPRESENTATIVE macro they do test that now. I don't think the bit_insert is a problem though. In bit_insert, 'value' always has the relevant bits starting at its LSB. So regardless of whether the load (and store) type is larger or smaller than the type, performing the shifts and masks in this type should be OK as you'll only be 'cutting off' the MSB's which would be the ones that would get truncated anyway? Or am missing something here? diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c new file mode 100644 index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c @@ -0,0 +1,40 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { int i : 31; }; + +#define ELT0 {0} +#define ELT1 {1} +#define ELT2 {2} +#define ELT3 {3} +#define N 32 +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].i; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c new file mode 100644 index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char a : 4; +}; + +#define N 32 +#define ELT0 {0x7FFFFFFFUL, 0} +#define ELT1 {0x7FFFFFFFUL, 1} +#define ELT2 {0x7FFFFFFFUL, 2} +#define ELT3 {0x7FFFFFFFUL, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].a; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c new file mode 100644 index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" +#include <stdbool.h> + +extern void abort(void); + +typedef struct { + int c; + int b; + bool a : 1; +} struct_t; + +#define N 16 +#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 } +#define 
ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 } + +struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, + ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F }; +struct_t vect_true[N] = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, + ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F }; +int main (void) +{ + unsigned ret = 0; + for (unsigned i = 0; i < N; i++) + { + ret |= vect_false[i].a; + } + if (ret) + abort (); + + for (unsigned i = 0; i < N; i++) + { + ret |= vect_true[i].a; + } + if (!ret) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c new file mode 100644 index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c @@ -0,0 +1,45 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char x : 2; + char a : 4; +}; + +#define N 32 +#define ELT0 {0x7FFFFFFFUL, 3, 0} +#define ELT1 {0x7FFFFFFFUL, 3, 1} +#define ELT2 {0x7FFFFFFFUL, 3, 2} +#define ELT3 {0x7FFFFFFFUL, 3, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].a; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c new file mode 100644 index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c @@ -0,0 +1,39 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { int i : 31; }; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].i = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].i != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c new file mode 100644 index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char a : 4; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i 
< N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c new file mode 100644 index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char x : 2; + char a : 4; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..f450dbb1922586b3d405281f605fb0d8a7fc8fc2 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -91,6 +91,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "ssa.h" #include "expmed.h" +#include "expr.h" #include "optabs-query.h" #include "gimple-pretty-print.h" #include "alias.h" @@ -123,6 +124,9 @@ along with GCC; see the file COPYING3. If not see #include "tree-vectorizer.h" #include "tree-eh.h" +/* For lang_hooks.types.type_for_mode. */ +#include "langhooks.h" + /* Only handle PHIs with no more arguments unless we are asked to by simd pragma. */ #define MAX_PHI_ARG_NUM \ @@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined; before phi_convertible_by_degenerating_args. */ static bool any_complicated_phi; +/* True if we have bitfield accesses we can lower. */ +static bool need_to_lower_bitfields; + +/* True if there is any ifcvting to be done. */ +static bool need_to_ifcvt; + /* Hash for struct innermost_loop_behavior. It depends on the user to free the memory. */ @@ -2898,18 +2908,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds) class loop *new_loop; gimple *g; gimple_stmt_iterator gsi; - unsigned int save_length; + unsigned int save_length = 0; g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2, build_int_cst (integer_type_node, loop->num), integer_zero_node); gimple_call_set_lhs (g, cond); - /* Save BB->aux around loop_version as that uses the same field. */ - save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; - void **saved_preds = XALLOCAVEC (void *, save_length); - for (unsigned i = 0; i < save_length; i++) - saved_preds[i] = ifc_bbs[i]->aux; + void **saved_preds = NULL; + if (any_complicated_phi || need_to_predicate) + { + /* Save BB->aux around loop_version as that uses the same field. */ + save_length = loop->inner ? 
loop->inner->num_nodes : loop->num_nodes; + saved_preds = XALLOCAVEC (void *, save_length); + for (unsigned i = 0; i < save_length; i++) + saved_preds[i] = ifc_bbs[i]->aux; + } initialize_original_copy_tables (); /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED @@ -2921,8 +2935,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds) profile_probability::always (), true); free_original_copy_tables (); - for (unsigned i = 0; i < save_length; i++) - ifc_bbs[i]->aux = saved_preds[i]; + if (any_complicated_phi || need_to_predicate) + for (unsigned i = 0; i < save_length; i++) + ifc_bbs[i]->aux = saved_preds[i]; if (new_loop == NULL) return NULL; @@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv) auto_vec<edge> critical_edges; /* Loop is not well formed. */ - if (num <= 2 || loop->inner || !single_exit (loop)) + if (loop->inner) return false; body = get_loop_body (loop); @@ -3259,6 +3274,196 @@ ifcvt_hoist_invariants (class loop *loop, edge pe) free (body); } +/* Returns the DECL_FIELD_BIT_OFFSET of the bitfield accesse in stmt iff its + type mode is not BLKmode. If BITPOS is not NULL it will hold the poly_int64 + value of the DECL_FIELD_BIT_OFFSET of the bitfield access and STRUCT_EXPR, + if not NULL, will hold the tree representing the base struct of this + bitfield. */ + +static tree +get_bitfield_rep (gassign *stmt, bool write, poly_int64 *bitpos, + tree *struct_expr) +{ + tree comp_ref = write ? gimple_get_lhs (stmt) + : gimple_assign_rhs1 (stmt); + + if (struct_expr) + *struct_expr = TREE_OPERAND (comp_ref, 0); + + tree field_decl = TREE_OPERAND (comp_ref, 1); + if (bitpos) + *bitpos = tree_to_poly_int64 (DECL_FIELD_BIT_OFFSET (field_decl)); + + tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl); + /* Bail out if the representative is BLKmode as we will not be able to + vectorize this. */ + if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode) + return NULL_TREE; + + return rep_decl; + +} + +/* Lowers the bitfield described by DATA. + For a write like: + + struct.bf = _1; + + lower to: + + __ifc_1 = struct.<representative>; + __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos); + struct.<representative> = __ifc_2; + + For a read: + + _1 = struct.bf; + + lower to: + + __ifc_1 = struct.<representative>; + _1 = BIT_FIELD_REF (__ifc_1, bitsize, bitpos); + + where representative is a legal load that contains the bitfield value, + bitsize is the size of the bitfield and bitpos the offset to the start of + the bitfield within the representative. */ + +static void +lower_bitfield (gassign *stmt, bool write) +{ + tree struct_expr; + poly_int64 bitpos; + tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr); + tree rep_type = TREE_TYPE (rep_decl); + tree bf_type = TREE_TYPE (gimple_get_lhs (stmt)); + + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Lowering:\n"); + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + fprintf (dump_file, "to:\n"); + } + + /* BITPOS represents the position of the first bit of the bitfield field we + are accessing. However, sometimes it can be from the start of the struct, + and sometimes from the start of the representative we are loading. For + the first, the following code will adapt BITPOS to the latter since that + is the value BIT_FIELD_REF is expecting as bitposition. For the latter + this should no effect. 
*/ + HOST_WIDE_INT q; + poly_int64 r; + poly_int64 rep_align = TYPE_ALIGN (rep_type); + if (can_div_trunc_p (bitpos, rep_align, &q, &r)) + bitpos = r; + + /* REP_COMP_REF is a COMPONENT_REF for the representative. NEW_VAL is it's + defining SSA_NAME. */ + tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl, + NULL_TREE); + tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + + tree bitpos_tree = build_int_cst (bitsizetype, bitpos); + if (write) + { + new_val = ifc_temp_var (rep_type, + build3 (BIT_INSERT_EXPR, rep_type, new_val, + unshare_expr (gimple_assign_rhs1 (stmt)), + bitpos_tree), &gsi); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + + gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref), + new_val); + gimple_set_vuse (new_stmt, gimple_vuse (stmt)); + tree vdef = gimple_vdef (stmt); + gimple_set_vdef (new_stmt, vdef); + SSA_NAME_DEF_STMT (vdef) = new_stmt; + gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM); + } + else + { + tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val, + build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)), + bitpos_tree); + new_val = ifc_temp_var (bf_type, bfr, &gsi); + redundant_ssa_names.safe_push (std::make_pair (gimple_get_lhs (stmt), + new_val)); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + } + + gsi_remove (&gsi, true); +} + +/* Return TRUE if there are bitfields to lower in this LOOP. Fill TO_LOWER + with data structures representing these bitfields. */ + +static bool +bitfields_to_lower_p (class loop *loop, + vec <gassign *> &reads_to_lower, + vec <gassign *> &writes_to_lower) +{ + basic_block *bbs = get_loop_body (loop); + gimple_stmt_iterator gsi; + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num); + } + + for (unsigned i = 0; i < loop->num_nodes; ++i) + { + basic_block bb = bbs[i]; + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi)); + if (!stmt) + continue; + + tree op = gimple_get_lhs (stmt); + bool write = TREE_CODE (op) == COMPONENT_REF; + + if (!write) + op = gimple_assign_rhs1 (stmt); + + if (TREE_CODE (op) != COMPONENT_REF) + continue; + + if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + + if (!get_bitfield_rep (stmt, write, NULL, NULL)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\t Bitfield NOT OK to lower," + " representative is BLKmode.\n"); + return false; + } + + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\tBitfield OK to lower.\n"); + if (write) + writes_to_lower.safe_push (stmt); + else + reads_to_lower.safe_push (stmt); + } + } + } + return !reads_to_lower.is_empty () || !writes_to_lower.is_empty (); +} + + /* If-convert LOOP when it is legal. For the moment this pass has no profitability analysis. Returns non-zero todo flags when something changed. 
*/ @@ -3269,12 +3474,18 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) unsigned int todo = 0; bool aggressive_if_conv; class loop *rloop; + vec <gassign *> reads_to_lower; + vec <gassign *> writes_to_lower; bitmap exit_bbs; edge pe; again: + reads_to_lower.create (4); + writes_to_lower.create (4); rloop = NULL; ifc_bbs = NULL; + need_to_lower_bitfields = false; + need_to_ifcvt = false; need_to_predicate = false; need_to_rewrite_undefined = false; any_complicated_phi = false; @@ -3290,16 +3501,30 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) aggressive_if_conv = true; } - if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)) + if (!single_exit (loop)) goto cleanup; - if (!if_convertible_loop_p (loop) - || !dbg_cnt (if_conversion_tree)) - goto cleanup; + /* If there are more than two BBs in the loop then there is at least one if + to convert. */ + if (loop->num_nodes > 2) + { + need_to_ifcvt = true; + if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)) + goto cleanup; + + if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree)) + goto cleanup; + + if ((need_to_predicate || any_complicated_phi) + && ((!flag_tree_loop_vectorize && !loop->force_vectorize) + || loop->dont_vectorize)) + goto cleanup; + } - if ((need_to_predicate || any_complicated_phi) - && ((!flag_tree_loop_vectorize && !loop->force_vectorize) - || loop->dont_vectorize)) + need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower, + writes_to_lower); + + if (!need_to_ifcvt && !need_to_lower_bitfields) goto cleanup; /* The edge to insert invariant stmts on. */ @@ -3310,7 +3535,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) Either version this loop, or if the pattern is right for outer-loop vectorization, version the outer loop. In the latter case we will still if-convert the original inner loop. */ - if (need_to_predicate + if (need_to_lower_bitfields + || need_to_predicate || any_complicated_phi || flag_tree_loop_if_convert != 1) { @@ -3350,10 +3576,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) pe = single_pred_edge (gimple_bb (preds->last ())); } - /* Now all statements are if-convertible. Combine all the basic - blocks into one huge basic block doing the if-conversion - on-the-fly. */ - combine_blocks (loop); + if (need_to_lower_bitfields) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "-------------------------\n"); + fprintf (dump_file, "Start lowering bitfields\n"); + } + while (!reads_to_lower.is_empty ()) + lower_bitfield (reads_to_lower.pop (), false); + while (!writes_to_lower.is_empty ()) + lower_bitfield (writes_to_lower.pop (), true); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Done lowering bitfields\n"); + fprintf (dump_file, "-------------------------\n"); + } + } + if (need_to_ifcvt) + { + /* Now all statements are if-convertible. Combine all the basic + blocks into one huge basic block doing the if-conversion + on-the-fly. */ + combine_blocks (loop); + } /* Perform local CSE, this esp. helps the vectorizer analysis if loads and stores are involved. 
CSE only the loop body, not the entry @@ -3380,6 +3627,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) todo |= TODO_cleanup_cfg; cleanup: + reads_to_lower.release (); + writes_to_lower.release (); if (ifc_bbs) { unsigned int i; diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644 --- a/gcc/tree-vect-data-refs.cc +++ b/gcc/tree-vect-data-refs.cc @@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt, free_data_ref (dr); return opt_result::failure_at (stmt, "not vectorized:" - " statement is bitfield access %G", stmt); + " statement is an unsupported" + " bitfield access %G", stmt); } if (DR_BASE_ADDRESS (dr) diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..5486aa72a33274db954abf275c2c30dae3accc1c 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-eh.h" #include "gimplify.h" #include "gimple-iterator.h" +#include "gimplify-me.h" #include "cfgloop.h" #include "tree-vectorizer.h" #include "dumpfile.h" @@ -1828,6 +1829,206 @@ vect_recog_widen_sum_pattern (vec_info *vinfo, return pattern_stmt; } +/* Function vect_recog_bitfield_ref_pattern + + Try to find the following pattern: + + _2 = BIT_FIELD_REF (_1, bitsize, bitpos); + _3 = (type) _2; + + where type is a non-bitfield type, that is to say, it's precision matches + 2^(TYPE_SIZE(type) - (TYPE_UNSIGNED (type) ? 1 : 2)). + + Input: + + * STMT_VINFO: The stmt from which the pattern search begins. + here it starts with: + _3 = (type) _2; + + Output: + + * TYPE_OUT: The vector type of the output of this pattern. + + * Return value: A new stmt that will be used to replace the sequence of + stmts that constitute the pattern. In this case it will be: + patt1 = (type) _1; + patt2 = patt1 >> bitpos; + _3 = patt2 & ((1 << bitsize) - 1); + +*/ + +static gimple * +vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info, + tree *type_out) +{ + gassign *nop_stmt = dyn_cast <gassign *> (stmt_info->stmt); + if (!nop_stmt + || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (nop_stmt)) + || TREE_CODE (gimple_assign_rhs1 (nop_stmt)) != SSA_NAME) + return NULL; + + gassign *bf_stmt + = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (gimple_assign_rhs1 (nop_stmt))); + + if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF) + return NULL; + + tree bf_ref = gimple_assign_rhs1 (bf_stmt); + tree lhs = TREE_OPERAND (bf_ref, 0); + + if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))) + return NULL; + + gimple *pattern_stmt; + tree ret_type = TREE_TYPE (gimple_assign_lhs (nop_stmt)); + + /* We move the conversion earlier if the loaded type is smaller than the + return type to enable the use of widening loads. 
*/ + if (TYPE_PRECISION (TREE_TYPE (lhs)) < TYPE_PRECISION (ret_type) + && !useless_type_conversion_p (TREE_TYPE (lhs), ret_type)) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL), + NOP_EXPR, lhs); + lhs = gimple_get_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + } + + unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant (); + unsigned HOST_WIDE_INT mask_i = bit_field_size (bf_ref).to_constant (); + tree mask = build_int_cst (TREE_TYPE (lhs), + ((1ULL << mask_i) - 1) << shift_n); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL), + BIT_AND_EXPR, lhs, mask); + lhs = gimple_get_lhs (pattern_stmt); + if (shift_n) + { + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, + get_vectype_for_scalar_type (vinfo, + TREE_TYPE (lhs))); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL), + RSHIFT_EXPR, lhs, + build_int_cst (sizetype, shift_n)); + lhs = gimple_get_lhs (pattern_stmt); + } + + if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type)) + { + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL), + NOP_EXPR, lhs); + lhs = gimple_get_lhs (pattern_stmt); + } + + *type_out = STMT_VINFO_VECTYPE (stmt_info); + vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt); + + return pattern_stmt; +} + +/* Function vect_recog_bit_insert_pattern + + Try to find the following pattern: + + _3 = BIT_INSERT_EXPR (_1, _2, bitpos); + + Input: + + * STMT_VINFO: The stmt we want to replace. + + Output: + + * TYPE_OUT: The vector type of the output of this pattern. + + * Return value: A new stmt that will be used to replace the sequence of + stmts that constitute the pattern. In this case it will be: + patt1 = _2 & mask; // Clearing of the non-relevant bits in the + // 'to-write value'. + patt2 = patt1 << bitpos; // Shift the cleaned value in to place. + patt3 = _1 & ~(mask << bitpos); // Clearing the bits we want to write to, + // from the value we want to write to. + _3 = patt3 | patt2; // Write bits. + + + where mask = ((1 << TYPE_PRECISION (_2)) - 1), a mask to keep the number of + bits corresponding to the real size of the bitfield value we are writing to. + +*/ + +static gimple * +vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info, + tree *type_out) +{ + gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt); + if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR) + return NULL; + + tree load = gimple_assign_rhs1 (bf_stmt); + tree value = gimple_assign_rhs2 (bf_stmt); + tree offset = gimple_assign_rhs3 (bf_stmt); + + tree bf_type = TREE_TYPE (value); + tree load_type = TREE_TYPE (load); + + if (!INTEGRAL_TYPE_P (load_type)) + return NULL; + + gimple *pattern_stmt; + + if (!useless_type_conversion_p (TREE_TYPE (value), load_type)) + { + value = fold_build1 (NOP_EXPR, load_type, value); + if (!CONSTANT_CLASS_P (value)) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL), + value); + value = gimple_get_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + } + } + + unsigned HOST_WIDE_INT mask_i = (1ULL << TYPE_PRECISION (bf_type)) - 1; + tree mask_t = build_int_cst (load_type, mask_i); + /* Clear bits we don't want to write back from value and shift it in place. 
*/ + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL), + fold_build2 (BIT_AND_EXPR, load_type, value, + mask_t)); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset); + if (shift_n) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL), + LSHIFT_EXPR, value, offset); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + value = gimple_get_lhs (pattern_stmt); + } + /* Mask off the bits in the loaded value. */ + mask_i <<= shift_n; + mask_i = ~mask_i; + mask_t = build_int_cst (load_type, mask_i); + + tree lhs = vect_recog_temp_ssa_var (load_type, NULL); + pattern_stmt = gimple_build_assign (lhs, BIT_AND_EXPR,load, mask_t); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + + /* Compose the value to write back. */ + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL), + BIT_IOR_EXPR, lhs, value); + + *type_out = STMT_VINFO_VECTYPE (stmt_info); + vect_pattern_detected ("bit_insert pattern", stmt_info->stmt); + + return pattern_stmt; +} + + /* Recognize cases in which an operation is performed in one type WTYPE but could be done more efficiently in a narrower type NTYPE. For example, if we have: @@ -5623,6 +5824,8 @@ struct vect_recog_func taken which means usually the more complex one needs to preceed the less comples onex (widen_sum only after dot_prod or sad for example). */ static vect_recog_func vect_vect_recog_func_ptrs[] = { + { vect_recog_bitfield_ref_pattern, "bitfield_ref" }, + { vect_recog_bit_insert_pattern, "bit_insert" }, { vect_recog_over_widening_pattern, "over_widening" }, /* Must come after over_widening, which narrows the shift as much as possible beforehand. */
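The scalar equivalent of the sequence documented in vect_recog_bit_insert_pattern above (patt1 = _2 & mask; patt2 = patt1 << bitpos; patt3 = _1 & ~(mask << bitpos); _3 = patt3 | patt2), written as a sketch with illustrative names:

unsigned int
insert_bitfield (unsigned int rep, unsigned int value,
                 unsigned int bitpos, unsigned int bitsize)
{
  /* Mask the value to its declared width, shift it into place, clear the
     destination bits in the loaded representative, and merge.  */
  unsigned int mask = bitsize < 32 ? (1u << bitsize) - 1u : ~0u;
  return (rep & ~(mask << bitpos)) | ((value & mask) << bitpos);
}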
On Tue, 16 Aug 2022, Andre Vieira (lists) wrote: > Hi, > > New version of the patch attached, but haven't recreated the ChangeLog yet, > just waiting to see if this is what you had in mind. See also some replies to > your comments in-line below: > > On 09/08/2022 15:34, Richard Biener wrote: > > > @@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool > > aggressive_if_conv) > > auto_vec<edge> critical_edges; > > > > /* Loop is not well formed. */ > > - if (num <= 2 || loop->inner || !single_exit (loop)) > > + if (num <= 2 || loop->inner) > > return false; > > > > body = get_loop_body (loop); > > > > this doesn't appear in the ChangeLog nor is it clear to me why it's > > needed? Likewise > So both these and... > > > > - /* Save BB->aux around loop_version as that uses the same field. */ > > - save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; > > - void **saved_preds = XALLOCAVEC (void *, save_length); > > - for (unsigned i = 0; i < save_length; i++) > > - saved_preds[i] = ifc_bbs[i]->aux; > > + void **saved_preds = NULL; > > + if (any_complicated_phi || need_to_predicate) > > + { > > + /* Save BB->aux around loop_version as that uses the same field. > > */ > > + save_length = loop->inner ? loop->inner->num_nodes : > > loop->num_nodes; > > + saved_preds = XALLOCAVEC (void *, save_length); > > + for (unsigned i = 0; i < save_length; i++) > > + saved_preds[i] = ifc_bbs[i]->aux; > > + } > > > > is that just premature optimization? > > .. these changes are to make sure we can still use the loop versioning code > even for cases where there are bitfields to lower but no ifcvts (i.e. num of > BBs <= 2). > I wasn't sure about the loop-inner condition and the small examples I tried it > seemed to work, that is loop version seems to be able to handle nested loops. > > The single_exit condition is still required for both, because the code to > create the loop versions depends on it. It does look like I missed this in the > ChangeLog... > > > + /* BITSTART and BITEND describe the region we can safely load from > > inside the > > + structure. BITPOS is the bit position of the value inside the > > + representative that we will end up loading OFFSET bytes from the > > start > > + of the struct. BEST_MODE is the mode describing the optimal size of > > the > > + representative chunk we load. If this is a write we will store the > > same > > + sized representative back, after we have changed the appropriate > > bits. */ > > + get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset); > > > > I think you need to give up when get_bit_range sets bitstart = bitend to > > zero > > > > + if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend, > > + TYPE_ALIGN (TREE_TYPE (struct_expr)), > > + INT_MAX, false, &best_mode)) > > > > + tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL, > > + NULL_TREE, rep_type); > > + /* Load from the start of 'offset + bitpos % alignment'. */ > > + uint64_t extra_offset = bitpos.to_constant (); > > > > you shouldn't build a new FIELD_DECL. Either you use > > DECL_BIT_FIELD_REPRESENTATIVE directly or you use a > > BIT_FIELD_REF accessing the "representative". > > DECL_BIT_FIELD_REPRESENTATIVE exists so it can maintain > > a variable field offset, you can also subset that with an > > intermediate BIT_FIELD_REF if DECL_BIT_FIELD_REPRESENTATIVE is > > too large for your taste. 
> > > > I'm not sure all the offset calculation you do is correct, but > > since you shouldn't invent a new FIELD_DECL it probably needs > > to change anyway ... > I can use the DECL_BIT_FIELD_REPRESENTATIVE, but I'll still need some offset > calculation/extraction. It's easier to example with an example: > > In vect-bitfield-read-3.c the struct: > typedef struct { > int c; > int b; > bool a : 1; > } struct_t; > > and field access 'vect_false[i].a' or 'vect_true[i].a' will lead to a > DECL_BIT_FIELD_REPRESENTATIVE of TYPE_SIZE of 8 (and TYPE_PRECISION is also 8 > as expected). However, the DECL_FIELD_OFFSET of either the original field > decl, the actual bitfield member, or the DECL_BIT_FIELD_REPRESENTATIVE is 0 > and the DECL_FIELD_BIT_OFFSET is 64. These will lead to the correct load: > _1 = vect_false[i].D; > > D here being the representative is an 8-bit load from vect_false[i] + 64bits. > So all good there. However, when we construct BIT_FIELD_REF we can't simply > use DECL_FIELD_BIT_OFFSET (field_decl) as the BIT_FIELD_REF's bitpos. During > `verify_gimple` it checks that bitpos + bitsize < TYPE_SIZE (TREE_TYPE (load)) > where BIT_FIELD_REF (load, bitsize, bitpos). Yes, of course. What you need to do is subtract DECL_FIELD_BIT_OFFSET of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield access - that's the offset within the representative (by construction both fields share DECL_FIELD_OFFSET). > So instead I change bitpos such that: > align_of_representative = TYPE_ALIGN (TREE_TYPE (representative)); > bitpos -= bitpos.to_constant () / align_of_representative * > align_of_representative; ? Not sure why alignment comes into play here? > I've now rewritten this to: > poly_int64 q,r; > if (can_trunc_div_p(bitpos, align_of_representative, &q, &r)) > bitpos = r; > > It makes it slightly clearer, also because I no longer need the changes to the > original tree offset as I'm just using D for the load. > > > Note that for optimization it will be important that all > > accesses to the bitfield members of the same bitfield use the > > same underlying area (CSE and store-forwarding will thank you). > > > > + > > + need_to_lower_bitfields = bitfields_to_lower_p (loop, > > &bitfields_to_lower); > > + if (!ifcvt_split_critical_edges (loop, aggressive_if_conv) > > + && !need_to_lower_bitfields) > > goto cleanup; > > > > so we lower bitfields even when we cannot split critical edges? > > why? > > > > + need_to_ifcvt > > + = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree); > > + if (!need_to_ifcvt && !need_to_lower_bitfields) > > goto cleanup; > > > > likewise - if_convertible_loop_p performs other checks, the only > > one we want to elide is the loop->num_nodes <= 2 check since > > we want to lower bitfields in single-block loops as well. That > > means we only have to scan for bitfield accesses in the first > > block "prematurely". So I would interwind the need_to_lower_bitfields > > into if_convertible_loop_p and if_convertible_loop_p_1 and > > put the loop->num_nodes <= 2 after it when !need_to_lower_bitfields. > I'm not sure I understood this. But I'd rather keep the 'need_to_ifcvt' (new) > and 'need_to_lower_bitfields' separate. One thing I did change is that we no > longer check for bitfields to lower if there are if-stmts that we can't lower, > since we will not be vectorizing this loop anyway so no point in wasting time > lowering bitfields. At the same time though, I'd like to be able to > lower-bitfields if there are no ifcvts. Sure. 
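Going back to the representative offsets discussed above, the arithmetic for the vect-bitfield-read-3.c example works out as follows (the numbers are the ones quoted in this thread; the lowered statements are only a sketch):

/* typedef struct { int c; int b; bool a : 1; } struct_t;

   DECL_FIELD_OFFSET (a)   == 0    DECL_FIELD_BIT_OFFSET (a)   == 64
   DECL_FIELD_OFFSET (rep) == 0    DECL_FIELD_BIT_OFFSET (rep) == 64

   position of 'a' inside the representative:
     DECL_FIELD_BIT_OFFSET (a) - DECL_FIELD_BIT_OFFSET (rep) == 0

   so the read lowers to something like:
     _rep = vect_false[i].D;               8-bit load of the representative
     _1   = BIT_FIELD_REF <_rep, 1, 0>;    extract bit 0 of the representative  */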
> > + if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type)) > > + { > > + pattern_stmt > > + = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL), > > + NOP_EXPR, lhs); > > + lhs = gimple_get_lhs (pattern_stmt); > > + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); > > + } > > > > hm - so you have for example > > > > int _1 = MEM; > > int:3 _2 = BIT_FIELD_REF <_1, ...> > > type _3 = (type) _2; > > > > and that _3 = (type) _2 is because of integer promotion and you > > perform all the shifting in that type. I suppose you should > > verify that the cast is indeed promoting, not narrowing, since > > otherwise you'll produce wrong code? That said, shouldn't you > > perform the shift / mask in the type of _1 instead? (the hope > > is, of course, that typeof (_1) == type in most cases) > > > > Similar comments apply to vect_recog_bit_insert_pattern. > Good shout, hadn't realized that yet because of how the testcases didn't have > that problem, but when using the REPRESENTATIVE macro they do test that now. I > don't think the bit_insert is a problem though. In bit_insert, 'value' always > has the relevant bits starting at its LSB. So regardless of whether the load > (and store) type is larger or smaller than the type, performing the shifts and > masks in this type should be OK as you'll only be 'cutting off' the MSB's > which would be the ones that would get truncated anyway? Or am missing > something here? Not sure what you are saying but "yes", all shifting and masking should happen in the type of the representative. + tree bitpos_tree = build_int_cst (bitsizetype, bitpos); for your convenience there's bitsize_int (bitpos) you can use. I don't think you are using the correct bitpos though, you fail to adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR. + build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)), the size of the bitfield reference is DECL_SIZE of the original FIELD_DECL - it might be bigger than the precision of its type. You probably want to double-check it's equal to the precision (because of the insert but also because of all the masking) and refuse to lower if not. +/* Return TRUE if there are bitfields to lower in this LOOP. Fill TO_LOWER + with data structures representing these bitfields. */ + +static bool +bitfields_to_lower_p (class loop *loop, + vec <gassign *> &reads_to_lower, + vec <gassign *> &writes_to_lower) +{ + basic_block *bbs = get_loop_body (loop); + gimple_stmt_iterator gsi; as said I'd prefer to do this walk as part of the other walks we already do - if and if only because get_loop_body () is a DFS walk over the loop body (you should at least share that). + gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi)); + if (!stmt) + continue; + + tree op = gimple_get_lhs (stmt); gimple_assign_lhs (stmt) + bool write = TREE_CODE (op) == COMPONENT_REF; + + if (!write) + op = gimple_assign_rhs1 (stmt); + + if (TREE_CODE (op) != COMPONENT_REF) + continue; + + if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1))) + { rumors say that at least with Ada you can have non-integral, maybe even aggregate "bitfields", so please add && INTEGRAL_TYPE_P (TREE_TYPE (op)) @@ -3269,12 +3474,18 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) unsigned int todo = 0; bool aggressive_if_conv; class loop *rloop; + vec <gassign *> reads_to_lower; + vec <gassign *> writes_to_lower; bitmap exit_bbs; you should be able to use auto_vec<> here again: + reads_to_lower.create (4); + writes_to_lower.create (4); I think repeated .create will not release what is there. 
With auto_vec<> above there's no need to .create, just do truncate (0) here. + tree mask = build_int_cst (TREE_TYPE (lhs), + ((1ULL << mask_i) - 1) << shift_n); please use wide_int_to_tree (TREE_TYPE (lhs), wi::shifted_mask (shift_n, mask_i, false , TYPE_PRECISION (TREE_TYPE (lhs))); 1ULL would better be (unsigned HOST_WIDE_INT)1 or HOST_WIDE_INT_1U. But note the representative could be __int128_t where uint64_t mask operations fall apart... Btw, instead of (val & mask) >> shift it might be better to use (val >> shift) & mask since the resulting mask values are "smaller" and maybe easier to code generate? + patt1 = _2 & mask; // Clearing of the non-relevant bits in the + // 'to-write value'. + patt2 = patt1 << bitpos; // Shift the cleaned value in to place. + patt3 = _1 & ~(mask << bitpos); // Clearing the bits we want to write to, same here, shifting patt1 first and then masking allows to just invert the mask (or use andn), no need for two different (constant) masks? + value = fold_build1 (NOP_EXPR, load_type, value); fold_convert (load_type, value) + if (!CONSTANT_CLASS_P (value)) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL), + value); + value = gimple_get_lhs (pattern_stmt); there's in principle gimple_seq stmts = NULL; value = gimple_convert (&stmts, load_type, value); if (!gimple_seq_empty_p (stmts)) { pattern_stmt = gimple_seq_first_stmt (stmts); append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); } though a append_pattern_def_seq helper to add a convenience sequence would be nice to have here. + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL), + fold_build2 (BIT_AND_EXPR, load_type, value, + mask_t)); please avoid building GENERIC and then gimple from it. Either use gimple_build_assing (..., BIT_AND_EXPR, load_type, value, mask_t); or, if you want to fold, use result_value = gimple_build (&stmts, BIT_AND_EXPR, load_type, value, mask_t); as above with gimple_convert. See my comment about the nice to have helper so you can block-process the 'stmts' sequence as pattern def sequence. + mask_i <<= shift_n; + mask_i = ~mask_i; you have to use wide_ints again, a HOST_WIDE_INT might not be large enough. You probably want to double-check your lowering code by bootstrapping / testing with -ftree-loop-if-convert. Richard.
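The suggestion to shift first and mask second is easiest to see in scalar terms. A sketch of the two equivalent read sequences (function names are invented; assumes an unsigned representative, bitsize smaller than the width of unsigned, and bitpos + bitsize no larger than it):

/* Both return the same field value; only the second gets away with the
   small (1 << bitsize) - 1 constant, the first needs it pre-shifted.  */
static inline unsigned
read_mask_first (unsigned rep, unsigned bitpos, unsigned bitsize)
{
  unsigned mask = ((1u << bitsize) - 1) << bitpos;
  return (rep & mask) >> bitpos;
}

static inline unsigned
read_shift_first (unsigned rep, unsigned bitpos, unsigned bitsize)
{
  unsigned mask = (1u << bitsize) - 1;
  return (rep >> bitpos) & mask;
}

The same reasoning applies on the insert side: shifting the new value first means a single shifted mask (and its inversion) serves both operands, rather than two different constant masks.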
On 17/08/2022 13:49, Richard Biener wrote: > Yes, of course. What you need to do is subtract DECL_FIELD_BIT_OFFSET > of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield > access - that's the offset within the representative (by construction > both fields share DECL_FIELD_OFFSET). Doh! That makes sense... >> So instead I change bitpos such that: >> align_of_representative = TYPE_ALIGN (TREE_TYPE (representative)); >> bitpos -= bitpos.to_constant () / align_of_representative * >> align_of_representative; > ? Not sure why alignment comes into play here? Yeah just forget about this... it was my ill attempt at basically doing what you described above. > Not sure what you are saying but "yes", all shifting and masking should > happen in the type of the representative. > > + tree bitpos_tree = build_int_cst (bitsizetype, bitpos); > > for your convenience there's bitsize_int (bitpos) you can use. > > I don't think you are using the correct bitpos though, you fail to > adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR. Not sure I understand what you mean? I do adjust it, I've changed it now so it should hopefully be clearer. > > + build_int_cst (bitsizetype, TYPE_PRECISION > (bf_type)), > > the size of the bitfield reference is DECL_SIZE of the original > FIELD_DECL - it might be bigger than the precision of its type. > You probably want to double-check it's equal to the precision > (because of the insert but also because of all the masking) and > refuse to lower if not. I added a check for this but out of curiosity, how can the DECL_SIZE of a bitfield FIELD_DECL be different than it's type's precision? > > +/* Return TRUE if there are bitfields to lower in this LOOP. Fill > TO_LOWER > + with data structures representing these bitfields. */ > + > +static bool > +bitfields_to_lower_p (class loop *loop, > + vec <gassign *> &reads_to_lower, > + vec <gassign *> &writes_to_lower) > +{ > + basic_block *bbs = get_loop_body (loop); > + gimple_stmt_iterator gsi; > > as said I'd prefer to do this walk as part of the other walks we > already do - if and if only because get_loop_body () is a DFS > walk over the loop body (you should at least share that). I'm now sharing the use of ifc_bbs. The reason why I'd rather not share the walk over them is because it becomes quite complex to split out the decision to not lower if's because there are none, for which we will still want to lower bitfields, versus not lowering if's when they are there but aren't lowerable at which point we will forego lowering bitfields since we will not vectorize this loop anyway. > > + value = fold_build1 (NOP_EXPR, load_type, value); > > fold_convert (load_type, value) > > + if (!CONSTANT_CLASS_P (value)) > + { > + pattern_stmt > + = gimple_build_assign (vect_recog_temp_ssa_var (load_type, > NULL), > + value); > + value = gimple_get_lhs (pattern_stmt); > > there's in principle > > gimple_seq stmts = NULL; > value = gimple_convert (&stmts, load_type, value); > if (!gimple_seq_empty_p (stmts)) > { > pattern_stmt = gimple_seq_first_stmt (stmts); > append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); > } > > though a append_pattern_def_seq helper to add a convenience sequence > would be nice to have here. Ended up using the existing 'vect_convert_input', seems to do nicely here. > You probably want to double-check your lowering code by > bootstrapping / testing with -ftree-loop-if-convert. 
Done, this lead me to find a new failure mode, where the type of the first operand of BIT_FIELD_REF was a FP type (TF mode), which then lead to failures when constructing the masking and shifting. I ended up adding a nop-conversion to an INTEGER type of the same width first if necessary. Also did a follow-up bootstrap with the addition of `-ftree-vectorize` and `-fno-vect-cost-model` to further test the codegen. All seems to be working on aarch64-linux-gnu. diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c new file mode 100644 index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c @@ -0,0 +1,40 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { int i : 31; }; + +#define ELT0 {0} +#define ELT1 {1} +#define ELT2 {2} +#define ELT3 {3} +#define N 32 +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].i; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c new file mode 100644 index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char a : 4; +}; + +#define N 32 +#define ELT0 {0x7FFFFFFFUL, 0} +#define ELT1 {0x7FFFFFFFUL, 1} +#define ELT2 {0x7FFFFFFFUL, 2} +#define ELT3 {0x7FFFFFFFUL, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].a; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c new file mode 100644 index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" +#include <stdbool.h> + +extern void abort(void); + +typedef struct { + int c; + int b; + bool a : 1; +} struct_t; + +#define N 16 +#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 } +#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 } + +struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, + ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F }; +struct_t vect_true[N] = { ELT_F, 
ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, + ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F }; +int main (void) +{ + unsigned ret = 0; + for (unsigned i = 0; i < N; i++) + { + ret |= vect_false[i].a; + } + if (ret) + abort (); + + for (unsigned i = 0; i < N; i++) + { + ret |= vect_true[i].a; + } + if (!ret) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c new file mode 100644 index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c @@ -0,0 +1,45 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char x : 2; + char a : 4; +}; + +#define N 32 +#define ELT0 {0x7FFFFFFFUL, 3, 0} +#define ELT1 {0x7FFFFFFFUL, 3, 1} +#define ELT2 {0x7FFFFFFFUL, 3, 2} +#define ELT3 {0x7FFFFFFFUL, 3, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].a; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c new file mode 100644 index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c @@ -0,0 +1,39 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { int i : 31; }; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].i = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].i != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c new file mode 100644 index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char a : 4; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { 
scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c new file mode 100644 index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char x : 2; + char a : 4; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..c5c6d937a645e9caa0092c941c52c5192363bbd7 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -91,6 +91,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "ssa.h" #include "expmed.h" +#include "expr.h" #include "optabs-query.h" #include "gimple-pretty-print.h" #include "alias.h" @@ -123,6 +124,9 @@ along with GCC; see the file COPYING3. If not see #include "tree-vectorizer.h" #include "tree-eh.h" +/* For lang_hooks.types.type_for_mode. */ +#include "langhooks.h" + /* Only handle PHIs with no more arguments unless we are asked to by simd pragma. */ #define MAX_PHI_ARG_NUM \ @@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined; before phi_convertible_by_degenerating_args. */ static bool any_complicated_phi; +/* True if we have bitfield accesses we can lower. */ +static bool need_to_lower_bitfields; + +/* True if there is any ifcvting to be done. */ +static bool need_to_ifcvt; + /* Hash for struct innermost_loop_behavior. It depends on the user to free the memory. */ @@ -1411,15 +1421,6 @@ if_convertible_loop_p_1 (class loop *loop, vec<data_reference_p> *refs) calculate_dominance_info (CDI_DOMINATORS); - /* Allow statements that can be handled during if-conversion. */ - ifc_bbs = get_loop_body_in_if_conv_order (loop); - if (!ifc_bbs) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Irreducible loop\n"); - return false; - } - for (i = 0; i < loop->num_nodes; i++) { basic_block bb = ifc_bbs[i]; @@ -2898,18 +2899,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds) class loop *new_loop; gimple *g; gimple_stmt_iterator gsi; - unsigned int save_length; + unsigned int save_length = 0; g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2, build_int_cst (integer_type_node, loop->num), integer_zero_node); gimple_call_set_lhs (g, cond); - /* Save BB->aux around loop_version as that uses the same field. */ - save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; - void **saved_preds = XALLOCAVEC (void *, save_length); - for (unsigned i = 0; i < save_length; i++) - saved_preds[i] = ifc_bbs[i]->aux; + void **saved_preds = NULL; + if (any_complicated_phi || need_to_predicate) + { + /* Save BB->aux around loop_version as that uses the same field. */ + save_length = loop->inner ? 
loop->inner->num_nodes : loop->num_nodes; + saved_preds = XALLOCAVEC (void *, save_length); + for (unsigned i = 0; i < save_length; i++) + saved_preds[i] = ifc_bbs[i]->aux; + } initialize_original_copy_tables (); /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED @@ -2921,8 +2926,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds) profile_probability::always (), true); free_original_copy_tables (); - for (unsigned i = 0; i < save_length; i++) - ifc_bbs[i]->aux = saved_preds[i]; + if (any_complicated_phi || need_to_predicate) + for (unsigned i = 0; i < save_length; i++) + ifc_bbs[i]->aux = saved_preds[i]; if (new_loop == NULL) return NULL; @@ -2998,7 +3004,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv) auto_vec<edge> critical_edges; /* Loop is not well formed. */ - if (num <= 2 || loop->inner || !single_exit (loop)) + if (loop->inner) return false; body = get_loop_body (loop); @@ -3259,6 +3265,200 @@ ifcvt_hoist_invariants (class loop *loop, edge pe) free (body); } +/* Returns the DECL_FIELD_BIT_OFFSET of the bitfield accesse in stmt iff its + type mode is not BLKmode. If BITPOS is not NULL it will hold the poly_int64 + value of the DECL_FIELD_BIT_OFFSET of the bitfield access and STRUCT_EXPR, + if not NULL, will hold the tree representing the base struct of this + bitfield. */ + +static tree +get_bitfield_rep (gassign *stmt, bool write, tree *bitpos, + tree *struct_expr) +{ + tree comp_ref = write ? gimple_assign_lhs (stmt) + : gimple_assign_rhs1 (stmt); + + tree field_decl = TREE_OPERAND (comp_ref, 1); + tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl); + + /* Bail out if the representative is BLKmode as we will not be able to + vectorize this. */ + if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode) + return NULL_TREE; + + /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's + precision. */ + unsigned HOST_WIDE_INT decl_size = tree_to_uhwi (DECL_SIZE (field_decl)); + if (TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt))) != decl_size) + return NULL_TREE; + + if (struct_expr) + *struct_expr = TREE_OPERAND (comp_ref, 0); + + if (bitpos) + *bitpos + = fold_build2 (MINUS_EXPR, bitsizetype, + DECL_FIELD_BIT_OFFSET (field_decl), + DECL_FIELD_BIT_OFFSET (rep_decl)); + + return rep_decl; + +} + +/* Lowers the bitfield described by DATA. + For a write like: + + struct.bf = _1; + + lower to: + + __ifc_1 = struct.<representative>; + __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos); + struct.<representative> = __ifc_2; + + For a read: + + _1 = struct.bf; + + lower to: + + __ifc_1 = struct.<representative>; + _1 = BIT_FIELD_REF (__ifc_1, bitsize, bitpos); + + where representative is a legal load that contains the bitfield value, + bitsize is the size of the bitfield and bitpos the offset to the start of + the bitfield within the representative. */ + +static void +lower_bitfield (gassign *stmt, bool write) +{ + tree struct_expr; + tree bitpos; + tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr); + tree rep_type = TREE_TYPE (rep_decl); + tree bf_type = TREE_TYPE (gimple_assign_lhs (stmt)); + + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Lowering:\n"); + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + fprintf (dump_file, "to:\n"); + } + + /* REP_COMP_REF is a COMPONENT_REF for the representative. NEW_VAL is it's + defining SSA_NAME. 
*/ + tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl, + NULL_TREE); + tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + + if (write) + { + new_val = ifc_temp_var (rep_type, + build3 (BIT_INSERT_EXPR, rep_type, new_val, + unshare_expr (gimple_assign_rhs1 (stmt)), + bitpos), &gsi); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + + gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref), + new_val); + gimple_set_vuse (new_stmt, gimple_vuse (stmt)); + tree vdef = gimple_vdef (stmt); + gimple_set_vdef (new_stmt, vdef); + SSA_NAME_DEF_STMT (vdef) = new_stmt; + gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM); + } + else + { + tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val, + build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)), + bitpos); + new_val = ifc_temp_var (bf_type, bfr, &gsi); + redundant_ssa_names.safe_push (std::make_pair (gimple_assign_lhs (stmt), + new_val)); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + } + + gsi_remove (&gsi, true); +} + +/* Return TRUE if there are bitfields to lower in this LOOP. Fill TO_LOWER + with data structures representing these bitfields. */ + +static bool +bitfields_to_lower_p (class loop *loop, + vec <gassign *> &reads_to_lower, + vec <gassign *> &writes_to_lower) +{ + gimple_stmt_iterator gsi; + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num); + } + + for (unsigned i = 0; i < loop->num_nodes; ++i) + { + basic_block bb = ifc_bbs[i]; + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi)); + if (!stmt) + continue; + + tree op = gimple_assign_lhs (stmt); + bool write = TREE_CODE (op) == COMPONENT_REF; + + if (!write) + op = gimple_assign_rhs1 (stmt); + + if (TREE_CODE (op) != COMPONENT_REF) + continue; + + if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + + if (!INTEGRAL_TYPE_P (TREE_TYPE (op))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\t Bitfield NO OK to lower," + " field type is not Integral.\n"); + return false; + } + + if (!get_bitfield_rep (stmt, write, NULL, NULL)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\t Bitfield NOT OK to lower," + " representative is BLKmode.\n"); + return false; + } + + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\tBitfield OK to lower.\n"); + if (write) + writes_to_lower.safe_push (stmt); + else + reads_to_lower.safe_push (stmt); + } + } + } + return !reads_to_lower.is_empty () || !writes_to_lower.is_empty (); +} + + /* If-convert LOOP when it is legal. For the moment this pass has no profitability analysis. Returns non-zero todo flags when something changed. 
*/ @@ -3269,12 +3469,16 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) unsigned int todo = 0; bool aggressive_if_conv; class loop *rloop; + auto_vec <gassign *, 4> reads_to_lower; + auto_vec <gassign *, 4> writes_to_lower; bitmap exit_bbs; edge pe; again: rloop = NULL; ifc_bbs = NULL; + need_to_lower_bitfields = false; + need_to_ifcvt = false; need_to_predicate = false; need_to_rewrite_undefined = false; any_complicated_phi = false; @@ -3290,16 +3494,40 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) aggressive_if_conv = true; } - if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)) + if (!single_exit (loop)) goto cleanup; - if (!if_convertible_loop_p (loop) - || !dbg_cnt (if_conversion_tree)) + /* If there are more than two BBs in the loop then there is at least one if + to convert. */ + if (loop->num_nodes > 2 + && !ifcvt_split_critical_edges (loop, aggressive_if_conv)) goto cleanup; - if ((need_to_predicate || any_complicated_phi) - && ((!flag_tree_loop_vectorize && !loop->force_vectorize) - || loop->dont_vectorize)) + ifc_bbs = get_loop_body_in_if_conv_order (loop); + if (!ifc_bbs) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Irreducible loop\n"); + goto cleanup; + } + + if (loop->num_nodes > 2) + { + need_to_ifcvt = true; + + if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree)) + goto cleanup; + + if ((need_to_predicate || any_complicated_phi) + && ((!flag_tree_loop_vectorize && !loop->force_vectorize) + || loop->dont_vectorize)) + goto cleanup; + } + + need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower, + writes_to_lower); + + if (!need_to_ifcvt && !need_to_lower_bitfields) goto cleanup; /* The edge to insert invariant stmts on. */ @@ -3310,7 +3538,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) Either version this loop, or if the pattern is right for outer-loop vectorization, version the outer loop. In the latter case we will still if-convert the original inner loop. */ - if (need_to_predicate + if (need_to_lower_bitfields + || need_to_predicate || any_complicated_phi || flag_tree_loop_if_convert != 1) { @@ -3350,10 +3579,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) pe = single_pred_edge (gimple_bb (preds->last ())); } - /* Now all statements are if-convertible. Combine all the basic - blocks into one huge basic block doing the if-conversion - on-the-fly. */ - combine_blocks (loop); + if (need_to_lower_bitfields) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "-------------------------\n"); + fprintf (dump_file, "Start lowering bitfields\n"); + } + while (!reads_to_lower.is_empty ()) + lower_bitfield (reads_to_lower.pop (), false); + while (!writes_to_lower.is_empty ()) + lower_bitfield (writes_to_lower.pop (), true); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Done lowering bitfields\n"); + fprintf (dump_file, "-------------------------\n"); + } + } + if (need_to_ifcvt) + { + /* Now all statements are if-convertible. Combine all the basic + blocks into one huge basic block doing the if-conversion + on-the-fly. */ + combine_blocks (loop); + } /* Perform local CSE, this esp. helps the vectorizer analysis if loads and stores are involved. 
CSE only the loop body, not the entry @@ -3393,6 +3643,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) if (rloop != NULL) { loop = rloop; + reads_to_lower.truncate (0); + writes_to_lower.truncate (0); goto again; } diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644 --- a/gcc/tree-vect-data-refs.cc +++ b/gcc/tree-vect-data-refs.cc @@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt, free_data_ref (dr); return opt_result::failure_at (stmt, "not vectorized:" - " statement is bitfield access %G", stmt); + " statement is an unsupported" + " bitfield access %G", stmt); } if (DR_BASE_ADDRESS (dr) diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..731b7c2bc1962ff22288c4439679c0b11232cb4a 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-eh.h" #include "gimplify.h" #include "gimple-iterator.h" +#include "gimple-fold.h" +#include "gimplify-me.h" #include "cfgloop.h" #include "tree-vectorizer.h" #include "dumpfile.h" @@ -1828,6 +1830,294 @@ vect_recog_widen_sum_pattern (vec_info *vinfo, return pattern_stmt; } +/* Function vect_recog_bitfield_ref_pattern + + Try to find the following pattern: + + _2 = BIT_FIELD_REF (_1, bitsize, bitpos); + _3 = (type_out) _2; + + where type_out is a non-bitfield type, that is to say, it's precision matches + 2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)). + + Input: + + * STMT_VINFO: The stmt from which the pattern search begins. + here it starts with: + _3 = (type_out) _2; + + Output: + + * TYPE_OUT: The vector type of the output of this pattern. + + * Return value: A new stmt that will be used to replace the sequence of + stmts that constitute the pattern. If the precision of type_out is bigger + than the precision type of _1 we perform the widening before the shifting, + since the new precision will be large enough to shift the value and moving + widening operations up the statement chain enables the generation of + widening loads. If we are widening and the operation after the pattern is + an addition then we mask first and shift later, to enable the generation of + shifting adds. In the case of narrowing we will always mask first, shift + last and then perform a narrowing operation. This will enable the + generation of narrowing shifts. + + Widening with mask first, shift later: + patt1 = (type_out) _1; + patt2 = patt1 & (((1 << bitsize) - 1) << bitpos); + _3 = patt2 >> bitpos; + + Widening with shift first, mask last: + patt1 = (type_out) _1; + patt2 = patt1 >> bitpos; + _3 = patt2 & ((1 <<bitsize) - 1); + + Narrowing: + patt1 = _1 & (((1 << bitsize) - 1) << bitpos); + patt2 = patt1 >> bitpos; + _3 = (type_out) patt2; + + The shifting is always optional depending on whether bitpos != 0. 
+ +*/ + +static gimple * +vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info, + tree *type_out) +{ + gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt); + + if (!first_stmt) + return NULL; + + gassign *bf_stmt; + if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt)) + && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME) + { + gimple *second_stmt + = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt)); + if (!second_stmt || gimple_code (second_stmt) != GIMPLE_ASSIGN + || gimple_assign_rhs_code (second_stmt) != BIT_FIELD_REF) + return NULL; + bf_stmt = static_cast <gassign *> (second_stmt); + } + else + return NULL; + + tree bf_ref = gimple_assign_rhs1 (bf_stmt); + tree lhs = TREE_OPERAND (bf_ref, 0); + + if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref))) + return NULL; + + gimple *use_stmt, *pattern_stmt; + use_operand_p use_p; + tree ret = gimple_assign_lhs (first_stmt); + tree ret_type = TREE_TYPE (ret); + bool shift_first = true; + + /* We move the conversion earlier if the loaded type is smaller than the + return type to enable the use of widening loads. */ + if (TYPE_PRECISION (TREE_TYPE (lhs)) < TYPE_PRECISION (ret_type) + && !useless_type_conversion_p (TREE_TYPE (lhs), ret_type)) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL), + NOP_EXPR, lhs); + lhs = gimple_get_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + } + else if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type)) + /* If we are doing the conversion last then also delay the shift as we may + be able to combine the shift and conversion in certain cases. */ + shift_first = false; + + tree vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs)); + /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert + it to one of the same width so we can perform the necessary masking and + shifting. */ + if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))) + { + tree int_type + = build_nonstandard_integer_type (TYPE_PRECISION (TREE_TYPE (lhs)), + true); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (int_type, NULL), + NOP_EXPR, lhs); + vectype = get_vectype_for_scalar_type (vinfo, int_type); + lhs = gimple_assign_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + } + + /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a + PLUS_EXPR then do the shift last as some targets can combine the shift and + add into a single instruction. 
*/ + if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt)) + { + if (gimple_code (use_stmt) == GIMPLE_ASSIGN + && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR) + shift_first = false; + } + + unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant (); + unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant (); + unsigned int prec = TYPE_PRECISION (TREE_TYPE (lhs)); + if (shift_first) + { + if (shift_n) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), + NULL), + RSHIFT_EXPR, lhs, + build_int_cst (sizetype, shift_n)); + lhs = gimple_assign_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + } + + tree mask = wide_int_to_tree (TREE_TYPE (lhs), + wi::mask (mask_width, false, prec)); + + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), + NULL), + BIT_AND_EXPR, lhs, mask); + lhs = gimple_assign_lhs (pattern_stmt); + } + else + { + tree mask = wide_int_to_tree (TREE_TYPE (lhs), + wi::shifted_mask (shift_n, mask_width, + false, prec)); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), + NULL), + BIT_AND_EXPR, lhs, mask); + lhs = gimple_assign_lhs (pattern_stmt); + if (shift_n) + { + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), + NULL), + RSHIFT_EXPR, lhs, + build_int_cst (sizetype, shift_n)); + lhs = gimple_assign_lhs (pattern_stmt); + } + } + + if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type)) + { + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL), + NOP_EXPR, lhs); + lhs = gimple_get_lhs (pattern_stmt); + } + + *type_out = STMT_VINFO_VECTYPE (stmt_info); + vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt); + + return pattern_stmt; +} + +/* Function vect_recog_bit_insert_pattern + + Try to find the following pattern: + + _3 = BIT_INSERT_EXPR (_1, _2, bitpos); + + Input: + + * STMT_VINFO: The stmt we want to replace. + + Output: + + * TYPE_OUT: The vector type of the output of this pattern. + + * Return value: A new stmt that will be used to replace the sequence of + stmts that constitute the pattern. In this case it will be: + patt1 = _2 << bitpos; // Shift value into place + patt2 = patt1 & (mask << bitpos); // Clearing of the non-relevant bits in the + // 'to-write value'. + patt3 = _1 & ~(mask << bitpos); // Clearing the bits we want to write to, + // from the value we want to write to. + _3 = patt3 | patt2; // Write bits. + + + where mask = ((1 << TYPE_PRECISION (_2)) - 1), a mask to keep the number of + bits corresponding to the real size of the bitfield value we are writing to. 
+ +*/ + +static gimple * +vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info, + tree *type_out) +{ + gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt); + if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR) + return NULL; + + tree load = gimple_assign_rhs1 (bf_stmt); + tree value = gimple_assign_rhs2 (bf_stmt); + tree offset = gimple_assign_rhs3 (bf_stmt); + + tree bf_type = TREE_TYPE (value); + tree load_type = TREE_TYPE (load); + + if (!INTEGRAL_TYPE_P (load_type)) + return NULL; + + gimple *pattern_stmt; + + vect_unpromoted_value unprom; + unprom.set_op (value, vect_internal_def); + value = vect_convert_input (vinfo, stmt_info, load_type, &unprom, + get_vectype_for_scalar_type (vinfo, load_type)); + + unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset); + if (shift_n) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL), + LSHIFT_EXPR, value, offset); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + value = gimple_get_lhs (pattern_stmt); + } + + unsigned HOST_WIDE_INT mask_width = TYPE_PRECISION (bf_type); + unsigned int prec = TYPE_PRECISION (load_type); + tree mask_t + = wide_int_to_tree (load_type, + wi::shifted_mask (shift_n, mask_width, false, prec)); + + /* Clear bits we don't want to write back from value and shift it in place. */ + gimple_seq stmts = NULL; + value = gimple_build (&stmts, BIT_AND_EXPR, load_type, value, mask_t); + if (!gimple_seq_empty_p (stmts)) + { + pattern_stmt = gimple_seq_first_stmt (stmts); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + } + + /* Mask off the bits in the loaded value. */ + mask_t = wide_int_to_tree (load_type, + wi::shifted_mask (shift_n, mask_width, true, prec)); + tree lhs = vect_recog_temp_ssa_var (load_type, NULL); + pattern_stmt = gimple_build_assign (lhs, BIT_AND_EXPR,load, mask_t); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + + /* Compose the value to write back. */ + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL), + BIT_IOR_EXPR, lhs, value); + + *type_out = STMT_VINFO_VECTYPE (stmt_info); + vect_pattern_detected ("bit_insert pattern", stmt_info->stmt); + + return pattern_stmt; +} + + /* Recognize cases in which an operation is performed in one type WTYPE but could be done more efficiently in a narrower type NTYPE. For example, if we have: @@ -5623,6 +5913,8 @@ struct vect_recog_func taken which means usually the more complex one needs to preceed the less comples onex (widen_sum only after dot_prod or sad for example). */ static vect_recog_func vect_vect_recog_func_ptrs[] = { + { vect_recog_bitfield_ref_pattern, "bitfield_ref" }, + { vect_recog_bit_insert_pattern, "bit_insert" }, { vect_recog_over_widening_pattern, "over_widening" }, /* Must come after over_widening, which narrows the shift as much as possible beforehand. */
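To make the two wi::shifted_mask constants in the bit_insert pattern above concrete, here is a worked example with illustrative numbers (shift_n = 31, mask_width = 4, a 64-bit load type):

/* wi::shifted_mask (31, 4, false, 64) == 0x0000000780000000   bits 31..34 set
   wi::shifted_mask (31, 4, true,  64) == 0xfffffff87fffffff   bits 31..34 clear

   value  = (value << 31) & 0x0000000780000000;
   load   =  load         & 0xfffffff87fffffff;
   result =  load | value;  */

Using the negated form for the second mask replaces the HOST_WIDE_INT ~mask computation from the earlier revision and sidesteps the __int128 representative concern raised above.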
Ping. On 25/08/2022 10:09, Andre Vieira (lists) via Gcc-patches wrote: > > On 17/08/2022 13:49, Richard Biener wrote: >> Yes, of course. What you need to do is subtract DECL_FIELD_BIT_OFFSET >> of the representative from DECL_FIELD_BIT_OFFSET of the original >> bitfield >> access - that's the offset within the representative (by construction >> both fields share DECL_FIELD_OFFSET). > Doh! That makes sense... >>> So instead I change bitpos such that: >>> align_of_representative = TYPE_ALIGN (TREE_TYPE (representative)); >>> bitpos -= bitpos.to_constant () / align_of_representative * >>> align_of_representative; >> ? Not sure why alignment comes into play here? > Yeah just forget about this... it was my ill attempt at basically > doing what you described above. >> Not sure what you are saying but "yes", all shifting and masking should >> happen in the type of the representative. >> >> + tree bitpos_tree = build_int_cst (bitsizetype, bitpos); >> >> for your convenience there's bitsize_int (bitpos) you can use. >> >> I don't think you are using the correct bitpos though, you fail to >> adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR. > Not sure I understand what you mean? I do adjust it, I've changed it > now so it should hopefully be clearer. >> >> + build_int_cst (bitsizetype, TYPE_PRECISION >> (bf_type)), >> >> the size of the bitfield reference is DECL_SIZE of the original >> FIELD_DECL - it might be bigger than the precision of its type. >> You probably want to double-check it's equal to the precision >> (because of the insert but also because of all the masking) and >> refuse to lower if not. > I added a check for this but out of curiosity, how can the DECL_SIZE > of a bitfield FIELD_DECL be different than it's type's precision? >> >> +/* Return TRUE if there are bitfields to lower in this LOOP. Fill >> TO_LOWER >> + with data structures representing these bitfields. */ >> + >> +static bool >> +bitfields_to_lower_p (class loop *loop, >> + vec <gassign *> &reads_to_lower, >> + vec <gassign *> &writes_to_lower) >> +{ >> + basic_block *bbs = get_loop_body (loop); >> + gimple_stmt_iterator gsi; >> >> as said I'd prefer to do this walk as part of the other walks we >> already do - if and if only because get_loop_body () is a DFS >> walk over the loop body (you should at least share that). > I'm now sharing the use of ifc_bbs. The reason why I'd rather not > share the walk over them is because it becomes quite complex to split > out the decision to not lower if's because there are none, for which > we will still want to lower bitfields, versus not lowering if's when > they are there but aren't lowerable at which point we will forego > lowering bitfields since we will not vectorize this loop anyway. >> >> + value = fold_build1 (NOP_EXPR, load_type, value); >> >> fold_convert (load_type, value) >> >> + if (!CONSTANT_CLASS_P (value)) >> + { >> + pattern_stmt >> + = gimple_build_assign (vect_recog_temp_ssa_var (load_type, >> NULL), >> + value); >> + value = gimple_get_lhs (pattern_stmt); >> >> there's in principle >> >> gimple_seq stmts = NULL; >> value = gimple_convert (&stmts, load_type, value); >> if (!gimple_seq_empty_p (stmts)) >> { >> pattern_stmt = gimple_seq_first_stmt (stmts); >> append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); >> } >> >> though a append_pattern_def_seq helper to add a convenience sequence >> would be nice to have here. > Ended up using the existing 'vect_convert_input', seems to do nicely > here. 
>> You probably want to double-check your lowering code by >> bootstrapping / testing with -ftree-loop-if-convert. > Done, this led me to find a new failure mode, where the type of the > first operand of BIT_FIELD_REF was a FP type (TF mode), which then > led to failures when constructing the masking and shifting. I ended > up adding a nop-conversion to an INTEGER type of the same width first > if necessary. Also did a follow-up bootstrap with the addition of > `-ftree-vectorize` and `-fno-vect-cost-model` to further test the > codegen. All seems to be working on aarch64-linux-gnu.
On Thu, 25 Aug 2022, Andre Vieira (lists) wrote: > > On 17/08/2022 13:49, Richard Biener wrote: > > Yes, of course. What you need to do is subtract DECL_FIELD_BIT_OFFSET > > of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield > > access - that's the offset within the representative (by construction > > both fields share DECL_FIELD_OFFSET). > Doh! That makes sense... > >> So instead I change bitpos such that: > >> align_of_representative = TYPE_ALIGN (TREE_TYPE (representative)); > >> bitpos -= bitpos.to_constant () / align_of_representative * > >> align_of_representative; > > ? Not sure why alignment comes into play here? > Yeah just forget about this... it was my ill attempt at basically doing what > you described above. > > Not sure what you are saying but "yes", all shifting and masking should > > happen in the type of the representative. > > > > + tree bitpos_tree = build_int_cst (bitsizetype, bitpos); > > > > for your convenience there's bitsize_int (bitpos) you can use. > > > > I don't think you are using the correct bitpos though, you fail to > > adjust it for the BIT_FIELD_REF/BIT_INSERT_EXPR. > Not sure I understand what you mean? I do adjust it, I've changed it now so it > should hopefully be clearer. > > > > + build_int_cst (bitsizetype, TYPE_PRECISION > > (bf_type)), > > > > the size of the bitfield reference is DECL_SIZE of the original > > FIELD_DECL - it might be bigger than the precision of its type. > > You probably want to double-check it's equal to the precision > > (because of the insert but also because of all the masking) and > > refuse to lower if not. > I added a check for this but out of curiosity, how can the DECL_SIZE of a > bitfield FIELD_DECL be different than it's type's precision? It's probably not possible to create a C testcase but I don't see what makes this impossible in general to have padding in a bitfield object. > > > > +/* Return TRUE if there are bitfields to lower in this LOOP. Fill > > TO_LOWER > > + with data structures representing these bitfields. */ > > + > > +static bool > > +bitfields_to_lower_p (class loop *loop, > > + vec <gassign *> &reads_to_lower, > > + vec <gassign *> &writes_to_lower) > > +{ > > + basic_block *bbs = get_loop_body (loop); > > + gimple_stmt_iterator gsi; > > > > as said I'd prefer to do this walk as part of the other walks we > > already do - if and if only because get_loop_body () is a DFS > > walk over the loop body (you should at least share that). > I'm now sharing the use of ifc_bbs. The reason why I'd rather not share the > walk over them is because it becomes quite complex to split out the decision > to not lower if's because there are none, for which we will still want to > lower bitfields, versus not lowering if's when they are there but aren't > lowerable at which point we will forego lowering bitfields since we will not > vectorize this loop anyway. 
> > > > + value = fold_build1 (NOP_EXPR, load_type, value); > > > > fold_convert (load_type, value) > > > > + if (!CONSTANT_CLASS_P (value)) > > + { > > + pattern_stmt > > + = gimple_build_assign (vect_recog_temp_ssa_var (load_type, > > NULL), > > + value); > > + value = gimple_get_lhs (pattern_stmt); > > > > there's in principle > > > > gimple_seq stmts = NULL; > > value = gimple_convert (&stmts, load_type, value); > > if (!gimple_seq_empty_p (stmts)) > > { > > pattern_stmt = gimple_seq_first_stmt (stmts); > > append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); > > } > > > > though a append_pattern_def_seq helper to add a convenience sequence > > would be nice to have here. > Ended up using the existing 'vect_convert_input', seems to do nicely here. > > You probably want to double-check your lowering code by > > bootstrapping / testing with -ftree-loop-if-convert. > Done, this lead me to find a new failure mode, where the type of the first > operand of BIT_FIELD_REF was a FP type (TF mode), which then lead to failures > when constructing the masking and shifting. I ended up adding a nop-conversion > to an INTEGER type of the same width first if necessary. You want a VIEW_CONVERT (aka bit-cast) here. > Also did a follow-up > bootstrap with the addition of `-ftree-vectorize` and `-fno-vect-cost-model` > to further test the codegen. All seems to be working on aarch64-linux-gnu. +static tree +get_bitfield_rep (gassign *stmt, bool write, tree *bitpos, + tree *struct_expr) ... + /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's + precision. */ + unsigned HOST_WIDE_INT decl_size = tree_to_uhwi (DECL_SIZE (field_decl)); + if (TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt))) != decl_size) + return NULL_TREE; you can use compare_tree_int (DECL_SIZE (field_decl), TYPE_PRECISION (...)) != 0 which avoids caring for the case the size isn't a uhwi ... + gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref), + new_val); + gimple_set_vuse (new_stmt, gimple_vuse (stmt)); + tree vdef = gimple_vdef (stmt); + gimple_set_vdef (new_stmt, vdef); + SSA_NAME_DEF_STMT (vdef) = new_stmt; you can use gimple_move_vops (new_stmt, stmt); here + tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val, + build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)), + bitpos); + new_val = ifc_temp_var (bf_type, bfr, &gsi); + redundant_ssa_names.safe_push (std::make_pair (gimple_assign_lhs (stmt), + new_val)); I'm curious, why the push to redundant_ssa_names? That could use a comment ... + need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower, + writes_to_lower); do we want to conditionalize this on flag_tree_loop_vectorize? That is, I think the lowering should for now happen only on the loop version guarded by .IFN_VECTORIZED. There's if ((need_to_predicate || any_complicated_phi) && ((!flag_tree_loop_vectorize && !loop->force_vectorize) || loop->dont_vectorize)) goto cleanup; for the cases that will force versioning, but I think we should simply not lower bitfields in the ((!flag_tree_loop_vectorize && !loop->force_vectorize) || loop->dont_vectorize) case? 
+ if (!second_stmt || gimple_code (second_stmt) != GIMPLE_ASSIGN + || gimple_assign_rhs_code (second_stmt) != BIT_FIELD_REF) + return NULL; the first || goes to a new line + bf_stmt = static_cast <gassign *> (second_stmt); "nicer" and shorter is bf_stmt = dyn_cast <gassign *> (second_stmt); if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF) return NULL; + tree lhs = TREE_OPERAND (bf_ref, 0); + + if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref))) + return NULL; + + gimple *use_stmt, *pattern_stmt; + use_operand_p use_p; + tree ret = gimple_assign_lhs (first_stmt); just when reading, generic variables like 'lhs' are not helpful (when they are not an actual lhs even less so ...). You have nice docs ontop of the function - when you use atual names for _2 = BIT_FIELD_REF (_1, ...) variables you can even use them in the code so docs and code match up nicely. + /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert + it to one of the same width so we can perform the necessary masking and + shifting. */ + if (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))) + { + tree int_type + = build_nonstandard_integer_type (TYPE_PRECISION (TREE_TYPE (lhs)), + true); so you probably run into this from code that's not lowered from original bitfield reads? Note you should use TYPE_SIZE here, definitely not TYPE_PRECISION on arbitrary types (if its a vector type then that will yield the number of units for example). + unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant (); + unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant (); is there anything that prevents this to run on VLA vector extractions? I think it would be nice to test constantness at the start of the function. + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), + NULL), eh, seeing that multiple times the vect_recog_temp_ssa_var needs a defaulted NULL second argument ... Note I fear we will have endianess issues when translating bit-field accesses to BIT_FIELD_REF/INSERT and then to shifts. Rules for memory and register operations do not match up (IIRC, I repeatedly run into issues here myself). The testcases all look like they won't catch this - I think an example would be sth like struct X { unsigned a : 23; unsigned b : 9; }, can you see to do testing on a big-endian target? Otherwise the patch looks good, so there's only minor things to fix up (in case the endianess issue turns out to be a non-issue). Sorry for the delay in reviewing. Thanks, Richard.
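On the endianness point, a test along the lines suggested could follow the existing vect-bitfield-write tests. This is only a sketch (field widths taken from the struct X example above, harness copied from the earlier testcases), not one of the testcases that were actually added:

/* { dg-require-effective-target vect_int } */

#include <stdarg.h>
#include "tree-vect.h"

extern void abort(void);

struct X { unsigned a : 23; unsigned b : 9; };

#define N 32
#define V 5
struct X A[N];

void __attribute__ ((noipa))
f(struct X *ptr, unsigned n) {
  for (int i = 0; i < n; ++i)
    ptr[i].b = V;
}

void __attribute__ ((noipa))
check_f(struct X *ptr) {
  for (unsigned i = 0; i < N; ++i)
    if (ptr[i].b != V || ptr[i].a != 0)
      abort ();
}

int main (void)
{
  check_vect ();
  __builtin_memset (&A[0], 0, sizeof(struct X) * N);

  f(&A[0], N);
  check_f (&A[0]);

  return 0;
}

Since the 23-bit and 9-bit fields occupy different bit positions in the 32-bit representative depending on the target's byte order, a lowering that got the bit numbering wrong on big-endian would store 'b' into the wrong bits and clobber 'a', which the check of 'a' above would catch.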
On 08/09/2022 12:51, Richard Biener wrote: > > I'm curious, why the push to redundant_ssa_names? That could use > a comment ... So I purposefully left a #if 0 #else #endif in there so you can see the two options. But the reason I used redundant_ssa_names is because ifcvt seems to use that as a container for all pairs of (old, new) ssa names to replace later. So I just piggy backed on that. I don't know if there's a specific reason they do the replacement at the end? Maybe some ordering issue? Either way both adding it to redundant_ssa_names or doing the replacement inline work for the bitfield lowering (or work in my testing at least). > Note I fear we will have endianess issues when translating > bit-field accesses to BIT_FIELD_REF/INSERT and then to shifts. Rules > for memory and register operations do not match up (IIRC, I repeatedly > run into issues here myself). The testcases all look like they > won't catch this - I think an example would be sth like > struct X { unsigned a : 23; unsigned b : 9; }, can you see to do > testing on a big-endian target? I've done some testing and you were right, it did fall apart on big-endian. I fixed it by changing the way we compute the 'shift' value and added two extra testcases for read and write each. > > Sorry for the delay in reviewing. No worries, apologies myself for the delay in reworking this, had a nice little week holiday in between :) I'll write the ChangeLogs once the patch has stabilized. Thanks, Andre diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c new file mode 100644 index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c @@ -0,0 +1,40 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { int i : 31; }; + +#define ELT0 {0} +#define ELT1 {1} +#define ELT2 {2} +#define ELT3 {3} +#define N 32 +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].i; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c new file mode 100644 index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char a : 4; +}; + +#define N 32 +#define ELT0 {0x7FFFFFFFUL, 0} +#define ELT1 {0x7FFFFFFFUL, 1} +#define ELT2 {0x7FFFFFFFUL, 2} +#define ELT3 {0x7FFFFFFFUL, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + 
res += ptr[i].a; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c new file mode 100644 index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" +#include <stdbool.h> + +extern void abort(void); + +typedef struct { + int c; + int b; + bool a : 1; +} struct_t; + +#define N 16 +#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 } +#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 } + +struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, + ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F }; +struct_t vect_true[N] = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, + ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F }; +int main (void) +{ + unsigned ret = 0; + for (unsigned i = 0; i < N; i++) + { + ret |= vect_false[i].a; + } + if (ret) + abort (); + + for (unsigned i = 0; i < N; i++) + { + ret |= vect_true[i].a; + } + if (!ret) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c new file mode 100644 index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c @@ -0,0 +1,45 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char x : 2; + char a : 4; +}; + +#define N 32 +#define ELT0 {0x7FFFFFFFUL, 3, 0} +#define ELT1 {0x7FFFFFFFUL, 3, 1} +#define ELT2 {0x7FFFFFFFUL, 3, 2} +#define ELT3 {0x7FFFFFFFUL, 3, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].a; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c new file mode 100644 index 0000000000000000000000000000000000000000..1dc24d3eded192144dc9ad94589b4c5c3d999e65 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned a : 23; unsigned b : 9; +}; + +#define N 32 +#define ELT0 {0x7FFFFFUL, 0} +#define ELT1 {0x7FFFFFUL, 1} +#define ELT2 {0x7FFFFFUL, 2} +#define ELT3 {0x7FFFFFUL, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, 
unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].b; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c new file mode 100644 index 0000000000000000000000000000000000000000..7d24c29975865883a7cdc7aa057fbb6bf413e0bc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned a : 23; unsigned b : 8; +}; + +#define N 32 +#define ELT0 {0x7FFFFFUL, 0} +#define ELT1 {0x7FFFFFUL, 1} +#define ELT2 {0x7FFFFFUL, 2} +#define ELT3 {0x7FFFFFUL, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].b; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c new file mode 100644 index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c @@ -0,0 +1,39 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { int i : 31; }; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].i = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].i != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c new file mode 100644 index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char a : 4; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c new file mode 100644 
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char x : 2; + char a : 4; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c new file mode 100644 index 0000000000000000000000000000000000000000..fae6ea3557dcaba7b330ebdaa471281d33d2ba15 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned b : 23; + unsigned a : 9; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c new file mode 100644 index 0000000000000000000000000000000000000000..99360c2967b076212c67eb4f34b8fd91711d8821 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned b : 23; + unsigned a : 8; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..ee6226b7bee713598141468de00728abff675e52 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -91,6 +91,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "ssa.h" #include "expmed.h" +#include "expr.h" #include "optabs-query.h" #include "gimple-pretty-print.h" #include "alias.h" @@ -123,6 +124,9 @@ along with GCC; see the file COPYING3. If not see #include "tree-vectorizer.h" #include "tree-eh.h" +/* For lang_hooks.types.type_for_mode. 
*/ +#include "langhooks.h" + /* Only handle PHIs with no more arguments unless we are asked to by simd pragma. */ #define MAX_PHI_ARG_NUM \ @@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined; before phi_convertible_by_degenerating_args. */ static bool any_complicated_phi; +/* True if we have bitfield accesses we can lower. */ +static bool need_to_lower_bitfields; + +/* True if there is any ifcvting to be done. */ +static bool need_to_ifcvt; + /* Hash for struct innermost_loop_behavior. It depends on the user to free the memory. */ @@ -1411,15 +1421,6 @@ if_convertible_loop_p_1 (class loop *loop, vec<data_reference_p> *refs) calculate_dominance_info (CDI_DOMINATORS); - /* Allow statements that can be handled during if-conversion. */ - ifc_bbs = get_loop_body_in_if_conv_order (loop); - if (!ifc_bbs) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Irreducible loop\n"); - return false; - } - for (i = 0; i < loop->num_nodes; i++) { basic_block bb = ifc_bbs[i]; @@ -2898,18 +2899,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds) class loop *new_loop; gimple *g; gimple_stmt_iterator gsi; - unsigned int save_length; + unsigned int save_length = 0; g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2, build_int_cst (integer_type_node, loop->num), integer_zero_node); gimple_call_set_lhs (g, cond); - /* Save BB->aux around loop_version as that uses the same field. */ - save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; - void **saved_preds = XALLOCAVEC (void *, save_length); - for (unsigned i = 0; i < save_length; i++) - saved_preds[i] = ifc_bbs[i]->aux; + void **saved_preds = NULL; + if (any_complicated_phi || need_to_predicate) + { + /* Save BB->aux around loop_version as that uses the same field. */ + save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; + saved_preds = XALLOCAVEC (void *, save_length); + for (unsigned i = 0; i < save_length; i++) + saved_preds[i] = ifc_bbs[i]->aux; + } initialize_original_copy_tables (); /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED @@ -2921,8 +2926,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds) profile_probability::always (), true); free_original_copy_tables (); - for (unsigned i = 0; i < save_length; i++) - ifc_bbs[i]->aux = saved_preds[i]; + if (any_complicated_phi || need_to_predicate) + for (unsigned i = 0; i < save_length; i++) + ifc_bbs[i]->aux = saved_preds[i]; if (new_loop == NULL) return NULL; @@ -2998,7 +3004,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv) auto_vec<edge> critical_edges; /* Loop is not well formed. */ - if (num <= 2 || loop->inner || !single_exit (loop)) + if (loop->inner) return false; body = get_loop_body (loop); @@ -3259,6 +3265,202 @@ ifcvt_hoist_invariants (class loop *loop, edge pe) free (body); } +/* Returns the DECL_FIELD_BIT_OFFSET of the bitfield accesse in stmt iff its + type mode is not BLKmode. If BITPOS is not NULL it will hold the poly_int64 + value of the DECL_FIELD_BIT_OFFSET of the bitfield access and STRUCT_EXPR, + if not NULL, will hold the tree representing the base struct of this + bitfield. */ + +static tree +get_bitfield_rep (gassign *stmt, bool write, tree *bitpos, + tree *struct_expr) +{ + tree comp_ref = write ? 
gimple_assign_lhs (stmt) + : gimple_assign_rhs1 (stmt); + + tree field_decl = TREE_OPERAND (comp_ref, 1); + tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl); + + /* Bail out if the representative is BLKmode as we will not be able to + vectorize this. */ + if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode) + return NULL_TREE; + + /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's + precision. */ + unsigned HOST_WIDE_INT bf_prec + = TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt))); + if (compare_tree_int (DECL_SIZE (field_decl), bf_prec) != 0) + return NULL_TREE; + + if (struct_expr) + *struct_expr = TREE_OPERAND (comp_ref, 0); + + if (bitpos) + *bitpos + = fold_build2 (MINUS_EXPR, bitsizetype, + DECL_FIELD_BIT_OFFSET (field_decl), + DECL_FIELD_BIT_OFFSET (rep_decl)); + + return rep_decl; + +} + +/* Lowers the bitfield described by DATA. + For a write like: + + struct.bf = _1; + + lower to: + + __ifc_1 = struct.<representative>; + __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos); + struct.<representative> = __ifc_2; + + For a read: + + _1 = struct.bf; + + lower to: + + __ifc_1 = struct.<representative>; + _1 = BIT_FIELD_REF (__ifc_1, bitsize, bitpos); + + where representative is a legal load that contains the bitfield value, + bitsize is the size of the bitfield and bitpos the offset to the start of + the bitfield within the representative. */ + +static void +lower_bitfield (gassign *stmt, bool write) +{ + tree struct_expr; + tree bitpos; + tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr); + tree rep_type = TREE_TYPE (rep_decl); + tree bf_type = TREE_TYPE (gimple_assign_lhs (stmt)); + + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Lowering:\n"); + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + fprintf (dump_file, "to:\n"); + } + + /* REP_COMP_REF is a COMPONENT_REF for the representative. NEW_VAL is it's + defining SSA_NAME. */ + tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl, + NULL_TREE); + tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + + if (write) + { + new_val = ifc_temp_var (rep_type, + build3 (BIT_INSERT_EXPR, rep_type, new_val, + unshare_expr (gimple_assign_rhs1 (stmt)), + bitpos), &gsi); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + + gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref), + new_val); + gimple_move_vops (new_stmt, stmt); + gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM); + } + else + { + tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val, + build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)), + bitpos); + new_val = ifc_temp_var (bf_type, bfr, &gsi); +#if 0 + redundant_ssa_names.safe_push (std::make_pair (gimple_assign_lhs (stmt), + new_val)); +#else + replace_uses_by (gimple_assign_lhs (stmt), new_val); +#endif + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + } + + gsi_remove (&gsi, true); +} + +/* Return TRUE if there are bitfields to lower in this LOOP. Fill TO_LOWER + with data structures representing these bitfields. 
*/ + +static bool +bitfields_to_lower_p (class loop *loop, + vec <gassign *> &reads_to_lower, + vec <gassign *> &writes_to_lower) +{ + gimple_stmt_iterator gsi; + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num); + } + + for (unsigned i = 0; i < loop->num_nodes; ++i) + { + basic_block bb = ifc_bbs[i]; + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi)); + if (!stmt) + continue; + + tree op = gimple_assign_lhs (stmt); + bool write = TREE_CODE (op) == COMPONENT_REF; + + if (!write) + op = gimple_assign_rhs1 (stmt); + + if (TREE_CODE (op) != COMPONENT_REF) + continue; + + if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + + if (!INTEGRAL_TYPE_P (TREE_TYPE (op))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\t Bitfield NO OK to lower," + " field type is not Integral.\n"); + return false; + } + + if (!get_bitfield_rep (stmt, write, NULL, NULL)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\t Bitfield NOT OK to lower," + " representative is BLKmode.\n"); + return false; + } + + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\tBitfield OK to lower.\n"); + if (write) + writes_to_lower.safe_push (stmt); + else + reads_to_lower.safe_push (stmt); + } + } + } + return !reads_to_lower.is_empty () || !writes_to_lower.is_empty (); +} + + /* If-convert LOOP when it is legal. For the moment this pass has no profitability analysis. Returns non-zero todo flags when something changed. */ @@ -3269,12 +3471,16 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) unsigned int todo = 0; bool aggressive_if_conv; class loop *rloop; + auto_vec <gassign *, 4> reads_to_lower; + auto_vec <gassign *, 4> writes_to_lower; bitmap exit_bbs; edge pe; again: rloop = NULL; ifc_bbs = NULL; + need_to_lower_bitfields = false; + need_to_ifcvt = false; need_to_predicate = false; need_to_rewrite_undefined = false; any_complicated_phi = false; @@ -3290,16 +3496,42 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) aggressive_if_conv = true; } - if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)) + if (!single_exit (loop)) goto cleanup; - if (!if_convertible_loop_p (loop) - || !dbg_cnt (if_conversion_tree)) + /* If there are more than two BBs in the loop then there is at least one if + to convert. 
*/ + if (loop->num_nodes > 2 + && !ifcvt_split_critical_edges (loop, aggressive_if_conv)) goto cleanup; - if ((need_to_predicate || any_complicated_phi) - && ((!flag_tree_loop_vectorize && !loop->force_vectorize) - || loop->dont_vectorize)) + ifc_bbs = get_loop_body_in_if_conv_order (loop); + if (!ifc_bbs) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Irreducible loop\n"); + goto cleanup; + } + + if (loop->num_nodes > 2) + { + need_to_ifcvt = true; + + if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree)) + goto cleanup; + + if ((need_to_predicate || any_complicated_phi) + && ((!flag_tree_loop_vectorize && !loop->force_vectorize) + || loop->dont_vectorize)) + goto cleanup; + } + + if ((flag_tree_loop_vectorize || loop->force_vectorize) + && !loop->dont_vectorize) + need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower, + writes_to_lower); + + if (!need_to_ifcvt && !need_to_lower_bitfields) goto cleanup; /* The edge to insert invariant stmts on. */ @@ -3310,7 +3542,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) Either version this loop, or if the pattern is right for outer-loop vectorization, version the outer loop. In the latter case we will still if-convert the original inner loop. */ - if (need_to_predicate + if (need_to_lower_bitfields + || need_to_predicate || any_complicated_phi || flag_tree_loop_if_convert != 1) { @@ -3350,10 +3583,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) pe = single_pred_edge (gimple_bb (preds->last ())); } - /* Now all statements are if-convertible. Combine all the basic - blocks into one huge basic block doing the if-conversion - on-the-fly. */ - combine_blocks (loop); + if (need_to_lower_bitfields) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "-------------------------\n"); + fprintf (dump_file, "Start lowering bitfields\n"); + } + while (!reads_to_lower.is_empty ()) + lower_bitfield (reads_to_lower.pop (), false); + while (!writes_to_lower.is_empty ()) + lower_bitfield (writes_to_lower.pop (), true); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Done lowering bitfields\n"); + fprintf (dump_file, "-------------------------\n"); + } + } + if (need_to_ifcvt) + { + /* Now all statements are if-convertible. Combine all the basic + blocks into one huge basic block doing the if-conversion + on-the-fly. */ + combine_blocks (loop); + } /* Perform local CSE, this esp. helps the vectorizer analysis if loads and stores are involved. 
CSE only the loop body, not the entry @@ -3393,6 +3647,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) if (rloop != NULL) { loop = rloop; + reads_to_lower.truncate (0); + writes_to_lower.truncate (0); goto again; } diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644 --- a/gcc/tree-vect-data-refs.cc +++ b/gcc/tree-vect-data-refs.cc @@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt, free_data_ref (dr); return opt_result::failure_at (stmt, "not vectorized:" - " statement is bitfield access %G", stmt); + " statement is an unsupported" + " bitfield access %G", stmt); } if (DR_BASE_ADDRESS (dr) diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..9042599f04399eca37fe9038d2bd5c9f78e3a9e4 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-eh.h" #include "gimplify.h" #include "gimple-iterator.h" +#include "gimple-fold.h" +#include "gimplify-me.h" #include "cfgloop.h" #include "tree-vectorizer.h" #include "dumpfile.h" @@ -663,7 +665,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code, is NULL, the caller must set SSA_NAME_DEF_STMT for the returned SSA var. */ static tree -vect_recog_temp_ssa_var (tree type, gimple *stmt) +vect_recog_temp_ssa_var (tree type, gimple *stmt = NULL) { return make_temp_ssa_name (type, stmt, "patt"); } @@ -1828,6 +1830,329 @@ vect_recog_widen_sum_pattern (vec_info *vinfo, return pattern_stmt; } +/* Function vect_recog_bitfield_ref_pattern + + Try to find the following pattern: + + bf_value = BIT_FIELD_REF (container, bitsize, bitpos); + result = (type_out) bf_value; + + where type_out is a non-bitfield type, that is to say, it's precision matches + 2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)). + + Input: + + * STMT_VINFO: The stmt from which the pattern search begins. + here it starts with: + result = (type_out) bf_value; + + Output: + + * TYPE_OUT: The vector type of the output of this pattern. + + * Return value: A new stmt that will be used to replace the sequence of + stmts that constitute the pattern. If the precision of type_out is bigger + than the precision type of _1 we perform the widening before the shifting, + since the new precision will be large enough to shift the value and moving + widening operations up the statement chain enables the generation of + widening loads. If we are widening and the operation after the pattern is + an addition then we mask first and shift later, to enable the generation of + shifting adds. In the case of narrowing we will always mask first, shift + last and then perform a narrowing operation. This will enable the + generation of narrowing shifts. + + Widening with mask first, shift later: + container = (type_out) container; + masked = container & (((1 << bitsize) - 1) << bitpos); + result = patt2 >> masked; + + Widening with shift first, mask last: + container = (type_out) container; + shifted = container >> bitpos; + result = shifted & ((1 << bitsize) - 1); + + Narrowing: + masked = container & (((1 << bitsize) - 1) << bitpos); + result = masked >> bitpos; + result = (type_out) result; + + The shifting is always optional depending on whether bitpos != 0. 
+ +*/ + +static gimple * +vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info, + tree *type_out) +{ + gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt); + + if (!first_stmt) + return NULL; + + gassign *bf_stmt; + if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt)) + && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME) + { + gimple *second_stmt + = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt)); + bf_stmt = dyn_cast <gassign *> (second_stmt); + if (!bf_stmt + || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF) + return NULL; + } + else + return NULL; + + tree bf_ref = gimple_assign_rhs1 (bf_stmt); + tree container = TREE_OPERAND (bf_ref, 0); + + if (!bit_field_offset (bf_ref).is_constant () + || !bit_field_size (bf_ref).is_constant () + || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (container)))) + return NULL; + + if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref))) + return NULL; + + gimple *use_stmt, *pattern_stmt; + use_operand_p use_p; + tree ret = gimple_assign_lhs (first_stmt); + tree ret_type = TREE_TYPE (ret); + bool shift_first = true; + tree vectype; + + /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert + it to one of the same width so we can perform the necessary masking and + shifting. */ + if (!INTEGRAL_TYPE_P (TREE_TYPE (container))) + { + unsigned HOST_WIDE_INT container_size = + tree_to_uhwi (TYPE_SIZE (TREE_TYPE (container))); + tree int_type = build_nonstandard_integer_type (container_size, true); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (int_type), + VIEW_CONVERT_EXPR, container); + vectype = get_vectype_for_scalar_type (vinfo, int_type); + container = gimple_assign_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + } + else + vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (container)); + + /* We move the conversion earlier if the loaded type is smaller than the + return type to enable the use of widening loads. */ + if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type) + && !useless_type_conversion_p (TREE_TYPE (container), ret_type)) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (ret_type), + NOP_EXPR, container); + container = gimple_get_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + } + else if (!useless_type_conversion_p (TREE_TYPE (container), ret_type)) + /* If we are doing the conversion last then also delay the shift as we may + be able to combine the shift and conversion in certain cases. */ + shift_first = false; + + tree container_type = TREE_TYPE (container); + + /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a + PLUS_EXPR then do the shift last as some targets can combine the shift and + add into a single instruction. */ + if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt)) + { + if (gimple_code (use_stmt) == GIMPLE_ASSIGN + && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR) + shift_first = false; + } + + unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant (); + unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant (); + unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type)); + if (BYTES_BIG_ENDIAN) + shift_n = prec - shift_n - mask_width; + + /* If we don't have to shift we only generate the mask, so just fix the + code-path to shift_first. 
*/ + if (shift_n == 0) + shift_first = true; + + tree result; + if (shift_first) + { + tree shifted = container; + if (shift_n) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + RSHIFT_EXPR, container, + build_int_cst (sizetype, shift_n)); + shifted = gimple_assign_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + } + + tree mask = wide_int_to_tree (container_type, + wi::mask (mask_width, false, prec)); + + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + BIT_AND_EXPR, shifted, mask); + result = gimple_assign_lhs (pattern_stmt); + } + else + { + tree mask = wide_int_to_tree (container_type, + wi::shifted_mask (shift_n, mask_width, + false, prec)); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + BIT_AND_EXPR, container, mask); + tree masked = gimple_assign_lhs (pattern_stmt); + + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + RSHIFT_EXPR, masked, + build_int_cst (sizetype, shift_n)); + result = gimple_assign_lhs (pattern_stmt); + } + + if (!useless_type_conversion_p (TREE_TYPE (result), ret_type)) + { + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (ret_type), + NOP_EXPR, result); + } + + *type_out = STMT_VINFO_VECTYPE (stmt_info); + vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt); + + return pattern_stmt; +} + +/* Function vect_recog_bit_insert_pattern + + Try to find the following pattern: + + written = BIT_INSERT_EXPR (container, value, bitpos); + + Input: + + * STMT_VINFO: The stmt we want to replace. + + Output: + + * TYPE_OUT: The vector type of the output of this pattern. + + * Return value: A new stmt that will be used to replace the sequence of + stmts that constitute the pattern. In this case it will be: + value = (container_type) value; // Make sure + shifted = value << bitpos; // Shift value into place + masked = shifted & (mask << bitpos); // Mask off the non-relevant bits in + // the 'to-write value'. + cleared = container & ~(mask << bitpos); // Clearing the bits we want to + // write to from the value we want + // to write to. + written = cleared | masked; // Write bits. + + + where mask = ((1 << TYPE_PRECISION (value)) - 1), a mask to keep the number of + bits corresponding to the real size of the bitfield value we are writing to. + The shifting is always optional depending on whether bitpos != 0. 
+ +*/ + +static gimple * +vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info, + tree *type_out) +{ + gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt); + if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR) + return NULL; + + tree container = gimple_assign_rhs1 (bf_stmt); + tree value = gimple_assign_rhs2 (bf_stmt); + tree shift = gimple_assign_rhs3 (bf_stmt); + + tree bf_type = TREE_TYPE (value); + tree container_type = TREE_TYPE (container); + + if (!INTEGRAL_TYPE_P (container_type) + || !tree_fits_uhwi_p (TYPE_SIZE (container_type))) + return NULL; + + gimple *pattern_stmt; + + vect_unpromoted_value unprom; + unprom.set_op (value, vect_internal_def); + value = vect_convert_input (vinfo, stmt_info, container_type, &unprom, + get_vectype_for_scalar_type (vinfo, + container_type)); + + unsigned HOST_WIDE_INT mask_width = TYPE_PRECISION (bf_type); + unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type)); + unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (shift); + if (BYTES_BIG_ENDIAN) + { + shift_n = prec - shift_n - mask_width; + shift = build_int_cst (TREE_TYPE (shift), shift_n); + } + + if (!useless_type_conversion_p (TREE_TYPE (value), container_type)) + { + pattern_stmt = + gimple_build_assign (vect_recog_temp_ssa_var (container_type), + NOP_EXPR, value); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + value = gimple_get_lhs (pattern_stmt); + } + + /* Shift VALUE into place. */ + tree shifted = value; + if (shift_n) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + LSHIFT_EXPR, value, shift); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + shifted = gimple_get_lhs (pattern_stmt); + } + + tree mask_t + = wide_int_to_tree (container_type, + wi::shifted_mask (shift_n, mask_width, false, prec)); + + /* Clear bits we don't want to write back from SHIFTED. */ + gimple_seq stmts = NULL; + tree masked = gimple_build (&stmts, BIT_AND_EXPR, container_type, shifted, + mask_t); + if (!gimple_seq_empty_p (stmts)) + { + pattern_stmt = gimple_seq_first_stmt (stmts); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + } + + /* Mask off the bits in the container that we are to write to. */ + mask_t = wide_int_to_tree (container_type, + wi::shifted_mask (shift_n, mask_width, true, prec)); + tree cleared = vect_recog_temp_ssa_var (container_type); + pattern_stmt = gimple_build_assign (cleared, BIT_AND_EXPR, container, mask_t); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + + /* Write MASKED into CLEARED. */ + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + BIT_IOR_EXPR, cleared, masked); + + *type_out = STMT_VINFO_VECTYPE (stmt_info); + vect_pattern_detected ("bit_insert pattern", stmt_info->stmt); + + return pattern_stmt; +} + + /* Recognize cases in which an operation is performed in one type WTYPE but could be done more efficiently in a narrower type NTYPE. For example, if we have: @@ -5623,6 +5948,8 @@ struct vect_recog_func taken which means usually the more complex one needs to preceed the less comples onex (widen_sum only after dot_prod or sad for example). */ static vect_recog_func vect_vect_recog_func_ptrs[] = { + { vect_recog_bitfield_ref_pattern, "bitfield_ref" }, + { vect_recog_bit_insert_pattern, "bit_insert" }, { vect_recog_over_widening_pattern, "over_widening" }, /* Must come after over_widening, which narrows the shift as much as possible beforehand. */
On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:

> On 08/09/2022 12:51, Richard Biener wrote:
> >
> > I'm curious, why the push to redundant_ssa_names?  That could use
> > a comment ...
> So I purposefully left a #if 0 #else #endif in there so you can see the two
> options.  But the reason I used redundant_ssa_names is because ifcvt seems
> to use that as a container for all pairs of (old, new) ssa names to replace
> later.  So I just piggybacked on that.  I don't know if there's a specific
> reason they do the replacement at the end?  Maybe some ordering issue?
> Either way, both adding it to redundant_ssa_names and doing the replacement
> inline work for the bitfield lowering (or work in my testing at least).

Possibly because we (in the past?) inserted/copied stuff based on
predicates generated at analysis time after we decide to elide something,
so we need to watch for later-appearing uses.  But who knows ... my mind
fails me here.

If it works to replace uses immediately please do so.  But now
I wonder why we need this - the value shouldn't change, so you
should get away with re-using the existing SSA name for the final value?

> > Note I fear we will have endianness issues when translating
> > bit-field accesses to BIT_FIELD_REF/INSERT and then to shifts.  Rules
> > for memory and register operations do not match up (IIRC, I repeatedly
> > ran into issues here myself).  The testcases all look like they
> > won't catch this - I think an example would be sth like
> > struct X { unsigned a : 23; unsigned b : 9; }, can you see to do
> > testing on a big-endian target?
> I've done some testing and you were right, it did fall apart on big-endian.
> I fixed it by changing the way we compute the 'shift' value and added two
> extra testcases for read and write each.
> >
> > Sorry for the delay in reviewing.
> No worries, apologies myself for the delay in reworking this, had a nice
> little week holiday in between :)
>
> I'll write the ChangeLogs once the patch has stabilized.

Thanks,
Richard.
On 27/09/2022 13:34, Richard Biener wrote:
> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote:
>
>> On 08/09/2022 12:51, Richard Biener wrote:
>>> I'm curious, why the push to redundant_ssa_names?  That could use
>>> a comment ...
>> So I purposefully left a #if 0 #else #endif in there so you can see the two
>> options.  But the reason I used redundant_ssa_names is because ifcvt seems
>> to use that as a container for all pairs of (old, new) ssa names to replace
>> later.  So I just piggybacked on that.  I don't know if there's a specific
>> reason they do the replacement at the end?  Maybe some ordering issue?
>> Either way, both adding it to redundant_ssa_names and doing the replacement
>> inline work for the bitfield lowering (or work in my testing at least).
> Possibly because we (in the past?) inserted/copied stuff based on
> predicates generated at analysis time after we decide to elide something,
> so we need to watch for later-appearing uses.  But who knows ... my mind
> fails me here.
>
> If it works to replace uses immediately please do so.  But now
> I wonder why we need this - the value shouldn't change, so you
> should get away with re-using the existing SSA name for the final value?

Yeah... good point.  A quick change and minor testing seems to agree.
I'm sure I had a good reason to do it initially ;)

I'll run a full regression on this change to make sure I didn't miss
anything.
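For reference, the quick change is essentially to keep the original lhs as
the result of the lowered read rather than recording a replacement pair -
roughly the following in lower_bitfield's read path, which is what the
reworked patch below ends up doing (sketch only; dump output omitted):

      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
                         build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)),
                         bitpos);
      new_val = ifc_temp_var (bf_type, bfr, &gsi);

      /* Re-use STMT's lhs instead of pushing (lhs, new_val) onto
         redundant_ssa_names.  */
      gimple *new_stmt = gimple_build_assign (gimple_assign_lhs (stmt),
                                              new_val);
      gimple_move_vops (new_stmt, stmt);
      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);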
Made the change and also created the ChangeLogs. gcc/ChangeLog: * tree-if-conv.cc (if_convertible_loop_p_1): Move ordering of loop bb's from here... (tree_if_conversion): ... to here. Also call bitfield lowering when appropriate. (version_loop_for_if_conversion): Adapt to enable loop versioning when we only need to lower bitfields. (ifcvt_split_critical_edges): Relax condition of expected loop form as this is checked earlier. (get_bitfield_rep): New function. (lower_bitfield): Likewise. (bitfields_to_lower_p): Likewise. (need_to_lower_bitfields): New global boolean. (need_to_ifcvt): Likewise. * tree-vect-data-refs.cc (vect_find_stmt_data_reference): Improve diagnostic message. * tree-vect-patterns.cc (vect_recog_temp_ssa_var): Add default value for last parameter. (vect_recog_bitfield_ref_pattern): New. (vect_recog_bit_insert_pattern): New. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-bitfield-read-1.c: New test. * gcc.dg/vect/vect-bitfield-read-2.c: New test. * gcc.dg/vect/vect-bitfield-read-3.c: New test. * gcc.dg/vect/vect-bitfield-read-4.c: New test. * gcc.dg/vect/vect-bitfield-read-5.c: New test. * gcc.dg/vect/vect-bitfield-read-6.c: New test. * gcc.dg/vect/vect-bitfield-write-1.c: New test. * gcc.dg/vect/vect-bitfield-write-2.c: New test. * gcc.dg/vect/vect-bitfield-write-3.c: New test. * gcc.dg/vect/vect-bitfield-write-4.c: New test. * gcc.dg/vect/vect-bitfield-write-5.c: New test. On 28/09/2022 10:43, Andre Vieira (lists) via Gcc-patches wrote: > > On 27/09/2022 13:34, Richard Biener wrote: >> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote: >> >>> On 08/09/2022 12:51, Richard Biener wrote: >>>> I'm curious, why the push to redundant_ssa_names? That could use >>>> a comment ... >>> So I purposefully left a #if 0 #else #endif in there so you can see >>> the two >>> options. But the reason I used redundant_ssa_names is because ifcvt >>> seems to >>> use that as a container for all pairs of (old, new) ssa names to >>> replace >>> later. So I just piggy backed on that. I don't know if there's a >>> specific >>> reason they do the replacement at the end? Maybe some ordering >>> issue? Either >>> way both adding it to redundant_ssa_names or doing the replacement >>> inline work >>> for the bitfield lowering (or work in my testing at least). >> Possibly because we (in the past?) inserted/copied stuff based on >> predicates generated at analysis time after we decide to elide something >> so we need to watch for later appearing uses. But who knows ... my mind >> fails me here. >> >> If it works to replace uses immediately please do so. But now >> I wonder why we need this - the value shouldn't change so you >> should get away with re-using the existing SSA name for the final value? > > Yeah... good point. A quick change and minor testing seems to agree. > I'm sure I had a good reason to do it initially ;) > > I'll run a full-regression on this change to make sure I didn't miss > anything. 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c new file mode 100644 index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c @@ -0,0 +1,40 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { int i : 31; }; + +#define ELT0 {0} +#define ELT1 {1} +#define ELT2 {2} +#define ELT3 {3} +#define N 32 +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].i; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c new file mode 100644 index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char a : 4; +}; + +#define N 32 +#define ELT0 {0x7FFFFFFFUL, 0} +#define ELT1 {0x7FFFFFFFUL, 1} +#define ELT2 {0x7FFFFFFFUL, 2} +#define ELT3 {0x7FFFFFFFUL, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].a; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c new file mode 100644 index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" +#include <stdbool.h> + +extern void abort(void); + +typedef struct { + int c; + int b; + bool a : 1; +} struct_t; + +#define N 16 +#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 } +#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 } + +struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, + ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F }; +struct_t vect_true[N] = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, + ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F }; +int main (void) +{ + unsigned ret = 0; + for (unsigned i = 0; i < N; i++) + { + ret |= vect_false[i].a; + } + if (ret) + abort (); + + for (unsigned i = 0; i < N; i++) + { + ret |= vect_true[i].a; + } + if (!ret) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */ diff --git 
a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c new file mode 100644 index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c @@ -0,0 +1,45 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char x : 2; + char a : 4; +}; + +#define N 32 +#define ELT0 {0x7FFFFFFFUL, 3, 0} +#define ELT1 {0x7FFFFFFFUL, 3, 1} +#define ELT2 {0x7FFFFFFFUL, 3, 2} +#define ELT3 {0x7FFFFFFFUL, 3, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].a; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c new file mode 100644 index 0000000000000000000000000000000000000000..1dc24d3eded192144dc9ad94589b4c5c3d999e65 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned a : 23; unsigned b : 9; +}; + +#define N 32 +#define ELT0 {0x7FFFFFUL, 0} +#define ELT1 {0x7FFFFFUL, 1} +#define ELT2 {0x7FFFFFUL, 2} +#define ELT3 {0x7FFFFFUL, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].b; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c new file mode 100644 index 0000000000000000000000000000000000000000..7d24c29975865883a7cdc7aa057fbb6bf413e0bc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned a : 23; unsigned b : 8; +}; + +#define N 32 +#define ELT0 {0x7FFFFFUL, 0} +#define ELT1 {0x7FFFFFUL, 1} +#define ELT2 {0x7FFFFFUL, 2} +#define ELT3 {0x7FFFFFUL, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].b; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff 
--git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c new file mode 100644 index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c @@ -0,0 +1,39 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { int i : 31; }; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].i = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].i != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c new file mode 100644 index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char a : 4; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c new file mode 100644 index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char x : 2; + char a : 4; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c new file mode 100644 index 0000000000000000000000000000000000000000..fae6ea3557dcaba7b330ebdaa471281d33d2ba15 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned b : 23; + unsigned a : 9; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct 
s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c new file mode 100644 index 0000000000000000000000000000000000000000..99360c2967b076212c67eb4f34b8fd91711d8821 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned b : 23; + unsigned a : 8; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..d13b2fa6661d56e911bb9ec37cd3a9885fa653bb 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -91,6 +91,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "ssa.h" #include "expmed.h" +#include "expr.h" #include "optabs-query.h" #include "gimple-pretty-print.h" #include "alias.h" @@ -123,6 +124,9 @@ along with GCC; see the file COPYING3. If not see #include "tree-vectorizer.h" #include "tree-eh.h" +/* For lang_hooks.types.type_for_mode. */ +#include "langhooks.h" + /* Only handle PHIs with no more arguments unless we are asked to by simd pragma. */ #define MAX_PHI_ARG_NUM \ @@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined; before phi_convertible_by_degenerating_args. */ static bool any_complicated_phi; +/* True if we have bitfield accesses we can lower. */ +static bool need_to_lower_bitfields; + +/* True if there is any ifcvting to be done. */ +static bool need_to_ifcvt; + /* Hash for struct innermost_loop_behavior. It depends on the user to free the memory. */ @@ -1411,15 +1421,6 @@ if_convertible_loop_p_1 (class loop *loop, vec<data_reference_p> *refs) calculate_dominance_info (CDI_DOMINATORS); - /* Allow statements that can be handled during if-conversion. */ - ifc_bbs = get_loop_body_in_if_conv_order (loop); - if (!ifc_bbs) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Irreducible loop\n"); - return false; - } - for (i = 0; i < loop->num_nodes; i++) { basic_block bb = ifc_bbs[i]; @@ -2898,18 +2899,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds) class loop *new_loop; gimple *g; gimple_stmt_iterator gsi; - unsigned int save_length; + unsigned int save_length = 0; g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2, build_int_cst (integer_type_node, loop->num), integer_zero_node); gimple_call_set_lhs (g, cond); - /* Save BB->aux around loop_version as that uses the same field. */ - save_length = loop->inner ? 
loop->inner->num_nodes : loop->num_nodes; - void **saved_preds = XALLOCAVEC (void *, save_length); - for (unsigned i = 0; i < save_length; i++) - saved_preds[i] = ifc_bbs[i]->aux; + void **saved_preds = NULL; + if (any_complicated_phi || need_to_predicate) + { + /* Save BB->aux around loop_version as that uses the same field. */ + save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; + saved_preds = XALLOCAVEC (void *, save_length); + for (unsigned i = 0; i < save_length; i++) + saved_preds[i] = ifc_bbs[i]->aux; + } initialize_original_copy_tables (); /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED @@ -2921,8 +2926,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds) profile_probability::always (), true); free_original_copy_tables (); - for (unsigned i = 0; i < save_length; i++) - ifc_bbs[i]->aux = saved_preds[i]; + if (any_complicated_phi || need_to_predicate) + for (unsigned i = 0; i < save_length; i++) + ifc_bbs[i]->aux = saved_preds[i]; if (new_loop == NULL) return NULL; @@ -2998,7 +3004,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv) auto_vec<edge> critical_edges; /* Loop is not well formed. */ - if (num <= 2 || loop->inner || !single_exit (loop)) + if (loop->inner) return false; body = get_loop_body (loop); @@ -3259,6 +3265,201 @@ ifcvt_hoist_invariants (class loop *loop, edge pe) free (body); } +/* Returns the DECL_FIELD_BIT_OFFSET of the bitfield accesse in stmt iff its + type mode is not BLKmode. If BITPOS is not NULL it will hold the poly_int64 + value of the DECL_FIELD_BIT_OFFSET of the bitfield access and STRUCT_EXPR, + if not NULL, will hold the tree representing the base struct of this + bitfield. */ + +static tree +get_bitfield_rep (gassign *stmt, bool write, tree *bitpos, + tree *struct_expr) +{ + tree comp_ref = write ? gimple_assign_lhs (stmt) + : gimple_assign_rhs1 (stmt); + + tree field_decl = TREE_OPERAND (comp_ref, 1); + tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl); + + /* Bail out if the representative is BLKmode as we will not be able to + vectorize this. */ + if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode) + return NULL_TREE; + + /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's + precision. */ + unsigned HOST_WIDE_INT bf_prec + = TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt))); + if (compare_tree_int (DECL_SIZE (field_decl), bf_prec) != 0) + return NULL_TREE; + + if (struct_expr) + *struct_expr = TREE_OPERAND (comp_ref, 0); + + if (bitpos) + *bitpos + = fold_build2 (MINUS_EXPR, bitsizetype, + DECL_FIELD_BIT_OFFSET (field_decl), + DECL_FIELD_BIT_OFFSET (rep_decl)); + + return rep_decl; + +} + +/* Lowers the bitfield described by DATA. + For a write like: + + struct.bf = _1; + + lower to: + + __ifc_1 = struct.<representative>; + __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos); + struct.<representative> = __ifc_2; + + For a read: + + _1 = struct.bf; + + lower to: + + __ifc_1 = struct.<representative>; + _1 = BIT_FIELD_REF (__ifc_1, bitsize, bitpos); + + where representative is a legal load that contains the bitfield value, + bitsize is the size of the bitfield and bitpos the offset to the start of + the bitfield within the representative. 
*/ + +static void +lower_bitfield (gassign *stmt, bool write) +{ + tree struct_expr; + tree bitpos; + tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr); + tree rep_type = TREE_TYPE (rep_decl); + tree bf_type = TREE_TYPE (gimple_assign_lhs (stmt)); + + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Lowering:\n"); + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + fprintf (dump_file, "to:\n"); + } + + /* REP_COMP_REF is a COMPONENT_REF for the representative. NEW_VAL is it's + defining SSA_NAME. */ + tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl, + NULL_TREE); + tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + + if (write) + { + new_val = ifc_temp_var (rep_type, + build3 (BIT_INSERT_EXPR, rep_type, new_val, + unshare_expr (gimple_assign_rhs1 (stmt)), + bitpos), &gsi); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + + gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref), + new_val); + gimple_move_vops (new_stmt, stmt); + gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM); + } + else + { + tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val, + build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)), + bitpos); + new_val = ifc_temp_var (bf_type, bfr, &gsi); + + gimple *new_stmt = gimple_build_assign (gimple_assign_lhs (stmt), + new_val); + gimple_move_vops (new_stmt, stmt); + gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM); + } + + gsi_remove (&gsi, true); +} + +/* Return TRUE if there are bitfields to lower in this LOOP. Fill TO_LOWER + with data structures representing these bitfields. 
*/ + +static bool +bitfields_to_lower_p (class loop *loop, + vec <gassign *> &reads_to_lower, + vec <gassign *> &writes_to_lower) +{ + gimple_stmt_iterator gsi; + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num); + } + + for (unsigned i = 0; i < loop->num_nodes; ++i) + { + basic_block bb = ifc_bbs[i]; + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi)); + if (!stmt) + continue; + + tree op = gimple_assign_lhs (stmt); + bool write = TREE_CODE (op) == COMPONENT_REF; + + if (!write) + op = gimple_assign_rhs1 (stmt); + + if (TREE_CODE (op) != COMPONENT_REF) + continue; + + if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + + if (!INTEGRAL_TYPE_P (TREE_TYPE (op))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\t Bitfield NO OK to lower," + " field type is not Integral.\n"); + return false; + } + + if (!get_bitfield_rep (stmt, write, NULL, NULL)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\t Bitfield NOT OK to lower," + " representative is BLKmode.\n"); + return false; + } + + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\tBitfield OK to lower.\n"); + if (write) + writes_to_lower.safe_push (stmt); + else + reads_to_lower.safe_push (stmt); + } + } + } + return !reads_to_lower.is_empty () || !writes_to_lower.is_empty (); +} + + /* If-convert LOOP when it is legal. For the moment this pass has no profitability analysis. Returns non-zero todo flags when something changed. */ @@ -3269,12 +3470,16 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) unsigned int todo = 0; bool aggressive_if_conv; class loop *rloop; + auto_vec <gassign *, 4> reads_to_lower; + auto_vec <gassign *, 4> writes_to_lower; bitmap exit_bbs; edge pe; again: rloop = NULL; ifc_bbs = NULL; + need_to_lower_bitfields = false; + need_to_ifcvt = false; need_to_predicate = false; need_to_rewrite_undefined = false; any_complicated_phi = false; @@ -3290,16 +3495,42 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) aggressive_if_conv = true; } - if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)) + if (!single_exit (loop)) goto cleanup; - if (!if_convertible_loop_p (loop) - || !dbg_cnt (if_conversion_tree)) + /* If there are more than two BBs in the loop then there is at least one if + to convert. 
*/ + if (loop->num_nodes > 2 + && !ifcvt_split_critical_edges (loop, aggressive_if_conv)) goto cleanup; - if ((need_to_predicate || any_complicated_phi) - && ((!flag_tree_loop_vectorize && !loop->force_vectorize) - || loop->dont_vectorize)) + ifc_bbs = get_loop_body_in_if_conv_order (loop); + if (!ifc_bbs) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Irreducible loop\n"); + goto cleanup; + } + + if (loop->num_nodes > 2) + { + need_to_ifcvt = true; + + if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree)) + goto cleanup; + + if ((need_to_predicate || any_complicated_phi) + && ((!flag_tree_loop_vectorize && !loop->force_vectorize) + || loop->dont_vectorize)) + goto cleanup; + } + + if ((flag_tree_loop_vectorize || loop->force_vectorize) + && !loop->dont_vectorize) + need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower, + writes_to_lower); + + if (!need_to_ifcvt && !need_to_lower_bitfields) goto cleanup; /* The edge to insert invariant stmts on. */ @@ -3310,7 +3541,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) Either version this loop, or if the pattern is right for outer-loop vectorization, version the outer loop. In the latter case we will still if-convert the original inner loop. */ - if (need_to_predicate + if (need_to_lower_bitfields + || need_to_predicate || any_complicated_phi || flag_tree_loop_if_convert != 1) { @@ -3350,10 +3582,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) pe = single_pred_edge (gimple_bb (preds->last ())); } - /* Now all statements are if-convertible. Combine all the basic - blocks into one huge basic block doing the if-conversion - on-the-fly. */ - combine_blocks (loop); + if (need_to_lower_bitfields) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "-------------------------\n"); + fprintf (dump_file, "Start lowering bitfields\n"); + } + while (!reads_to_lower.is_empty ()) + lower_bitfield (reads_to_lower.pop (), false); + while (!writes_to_lower.is_empty ()) + lower_bitfield (writes_to_lower.pop (), true); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Done lowering bitfields\n"); + fprintf (dump_file, "-------------------------\n"); + } + } + if (need_to_ifcvt) + { + /* Now all statements are if-convertible. Combine all the basic + blocks into one huge basic block doing the if-conversion + on-the-fly. */ + combine_blocks (loop); + } /* Perform local CSE, this esp. helps the vectorizer analysis if loads and stores are involved. 
CSE only the loop body, not the entry @@ -3393,6 +3646,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) if (rloop != NULL) { loop = rloop; + reads_to_lower.truncate (0); + writes_to_lower.truncate (0); goto again; } diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644 --- a/gcc/tree-vect-data-refs.cc +++ b/gcc/tree-vect-data-refs.cc @@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt, free_data_ref (dr); return opt_result::failure_at (stmt, "not vectorized:" - " statement is bitfield access %G", stmt); + " statement is an unsupported" + " bitfield access %G", stmt); } if (DR_BASE_ADDRESS (dr) diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..9042599f04399eca37fe9038d2bd5c9f78e3a9e4 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-eh.h" #include "gimplify.h" #include "gimple-iterator.h" +#include "gimple-fold.h" +#include "gimplify-me.h" #include "cfgloop.h" #include "tree-vectorizer.h" #include "dumpfile.h" @@ -663,7 +665,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code, is NULL, the caller must set SSA_NAME_DEF_STMT for the returned SSA var. */ static tree -vect_recog_temp_ssa_var (tree type, gimple *stmt) +vect_recog_temp_ssa_var (tree type, gimple *stmt = NULL) { return make_temp_ssa_name (type, stmt, "patt"); } @@ -1828,6 +1830,329 @@ vect_recog_widen_sum_pattern (vec_info *vinfo, return pattern_stmt; } +/* Function vect_recog_bitfield_ref_pattern + + Try to find the following pattern: + + bf_value = BIT_FIELD_REF (container, bitsize, bitpos); + result = (type_out) bf_value; + + where type_out is a non-bitfield type, that is to say, it's precision matches + 2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)). + + Input: + + * STMT_VINFO: The stmt from which the pattern search begins. + here it starts with: + result = (type_out) bf_value; + + Output: + + * TYPE_OUT: The vector type of the output of this pattern. + + * Return value: A new stmt that will be used to replace the sequence of + stmts that constitute the pattern. If the precision of type_out is bigger + than the precision type of _1 we perform the widening before the shifting, + since the new precision will be large enough to shift the value and moving + widening operations up the statement chain enables the generation of + widening loads. If we are widening and the operation after the pattern is + an addition then we mask first and shift later, to enable the generation of + shifting adds. In the case of narrowing we will always mask first, shift + last and then perform a narrowing operation. This will enable the + generation of narrowing shifts. + + Widening with mask first, shift later: + container = (type_out) container; + masked = container & (((1 << bitsize) - 1) << bitpos); + result = patt2 >> masked; + + Widening with shift first, mask last: + container = (type_out) container; + shifted = container >> bitpos; + result = shifted & ((1 << bitsize) - 1); + + Narrowing: + masked = container & (((1 << bitsize) - 1) << bitpos); + result = masked >> bitpos; + result = (type_out) result; + + The shifting is always optional depending on whether bitpos != 0. 
+ +*/ + +static gimple * +vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info, + tree *type_out) +{ + gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt); + + if (!first_stmt) + return NULL; + + gassign *bf_stmt; + if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt)) + && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME) + { + gimple *second_stmt + = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt)); + bf_stmt = dyn_cast <gassign *> (second_stmt); + if (!bf_stmt + || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF) + return NULL; + } + else + return NULL; + + tree bf_ref = gimple_assign_rhs1 (bf_stmt); + tree container = TREE_OPERAND (bf_ref, 0); + + if (!bit_field_offset (bf_ref).is_constant () + || !bit_field_size (bf_ref).is_constant () + || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (container)))) + return NULL; + + if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref))) + return NULL; + + gimple *use_stmt, *pattern_stmt; + use_operand_p use_p; + tree ret = gimple_assign_lhs (first_stmt); + tree ret_type = TREE_TYPE (ret); + bool shift_first = true; + tree vectype; + + /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert + it to one of the same width so we can perform the necessary masking and + shifting. */ + if (!INTEGRAL_TYPE_P (TREE_TYPE (container))) + { + unsigned HOST_WIDE_INT container_size = + tree_to_uhwi (TYPE_SIZE (TREE_TYPE (container))); + tree int_type = build_nonstandard_integer_type (container_size, true); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (int_type), + VIEW_CONVERT_EXPR, container); + vectype = get_vectype_for_scalar_type (vinfo, int_type); + container = gimple_assign_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + } + else + vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (container)); + + /* We move the conversion earlier if the loaded type is smaller than the + return type to enable the use of widening loads. */ + if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type) + && !useless_type_conversion_p (TREE_TYPE (container), ret_type)) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (ret_type), + NOP_EXPR, container); + container = gimple_get_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + } + else if (!useless_type_conversion_p (TREE_TYPE (container), ret_type)) + /* If we are doing the conversion last then also delay the shift as we may + be able to combine the shift and conversion in certain cases. */ + shift_first = false; + + tree container_type = TREE_TYPE (container); + + /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a + PLUS_EXPR then do the shift last as some targets can combine the shift and + add into a single instruction. */ + if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt)) + { + if (gimple_code (use_stmt) == GIMPLE_ASSIGN + && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR) + shift_first = false; + } + + unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant (); + unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant (); + unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type)); + if (BYTES_BIG_ENDIAN) + shift_n = prec - shift_n - mask_width; + + /* If we don't have to shift we only generate the mask, so just fix the + code-path to shift_first. 
*/ + if (shift_n == 0) + shift_first = true; + + tree result; + if (shift_first) + { + tree shifted = container; + if (shift_n) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + RSHIFT_EXPR, container, + build_int_cst (sizetype, shift_n)); + shifted = gimple_assign_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + } + + tree mask = wide_int_to_tree (container_type, + wi::mask (mask_width, false, prec)); + + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + BIT_AND_EXPR, shifted, mask); + result = gimple_assign_lhs (pattern_stmt); + } + else + { + tree mask = wide_int_to_tree (container_type, + wi::shifted_mask (shift_n, mask_width, + false, prec)); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + BIT_AND_EXPR, container, mask); + tree masked = gimple_assign_lhs (pattern_stmt); + + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + RSHIFT_EXPR, masked, + build_int_cst (sizetype, shift_n)); + result = gimple_assign_lhs (pattern_stmt); + } + + if (!useless_type_conversion_p (TREE_TYPE (result), ret_type)) + { + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (ret_type), + NOP_EXPR, result); + } + + *type_out = STMT_VINFO_VECTYPE (stmt_info); + vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt); + + return pattern_stmt; +} + +/* Function vect_recog_bit_insert_pattern + + Try to find the following pattern: + + written = BIT_INSERT_EXPR (container, value, bitpos); + + Input: + + * STMT_VINFO: The stmt we want to replace. + + Output: + + * TYPE_OUT: The vector type of the output of this pattern. + + * Return value: A new stmt that will be used to replace the sequence of + stmts that constitute the pattern. In this case it will be: + value = (container_type) value; // Make sure + shifted = value << bitpos; // Shift value into place + masked = shifted & (mask << bitpos); // Mask off the non-relevant bits in + // the 'to-write value'. + cleared = container & ~(mask << bitpos); // Clearing the bits we want to + // write to from the value we want + // to write to. + written = cleared | masked; // Write bits. + + + where mask = ((1 << TYPE_PRECISION (value)) - 1), a mask to keep the number of + bits corresponding to the real size of the bitfield value we are writing to. + The shifting is always optional depending on whether bitpos != 0. 
+ +*/ + +static gimple * +vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info, + tree *type_out) +{ + gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt); + if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR) + return NULL; + + tree container = gimple_assign_rhs1 (bf_stmt); + tree value = gimple_assign_rhs2 (bf_stmt); + tree shift = gimple_assign_rhs3 (bf_stmt); + + tree bf_type = TREE_TYPE (value); + tree container_type = TREE_TYPE (container); + + if (!INTEGRAL_TYPE_P (container_type) + || !tree_fits_uhwi_p (TYPE_SIZE (container_type))) + return NULL; + + gimple *pattern_stmt; + + vect_unpromoted_value unprom; + unprom.set_op (value, vect_internal_def); + value = vect_convert_input (vinfo, stmt_info, container_type, &unprom, + get_vectype_for_scalar_type (vinfo, + container_type)); + + unsigned HOST_WIDE_INT mask_width = TYPE_PRECISION (bf_type); + unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type)); + unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (shift); + if (BYTES_BIG_ENDIAN) + { + shift_n = prec - shift_n - mask_width; + shift = build_int_cst (TREE_TYPE (shift), shift_n); + } + + if (!useless_type_conversion_p (TREE_TYPE (value), container_type)) + { + pattern_stmt = + gimple_build_assign (vect_recog_temp_ssa_var (container_type), + NOP_EXPR, value); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + value = gimple_get_lhs (pattern_stmt); + } + + /* Shift VALUE into place. */ + tree shifted = value; + if (shift_n) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + LSHIFT_EXPR, value, shift); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + shifted = gimple_get_lhs (pattern_stmt); + } + + tree mask_t + = wide_int_to_tree (container_type, + wi::shifted_mask (shift_n, mask_width, false, prec)); + + /* Clear bits we don't want to write back from SHIFTED. */ + gimple_seq stmts = NULL; + tree masked = gimple_build (&stmts, BIT_AND_EXPR, container_type, shifted, + mask_t); + if (!gimple_seq_empty_p (stmts)) + { + pattern_stmt = gimple_seq_first_stmt (stmts); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + } + + /* Mask off the bits in the container that we are to write to. */ + mask_t = wide_int_to_tree (container_type, + wi::shifted_mask (shift_n, mask_width, true, prec)); + tree cleared = vect_recog_temp_ssa_var (container_type); + pattern_stmt = gimple_build_assign (cleared, BIT_AND_EXPR, container, mask_t); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + + /* Write MASKED into CLEARED. */ + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + BIT_IOR_EXPR, cleared, masked); + + *type_out = STMT_VINFO_VECTYPE (stmt_info); + vect_pattern_detected ("bit_insert pattern", stmt_info->stmt); + + return pattern_stmt; +} + + /* Recognize cases in which an operation is performed in one type WTYPE but could be done more efficiently in a narrower type NTYPE. For example, if we have: @@ -5623,6 +5948,8 @@ struct vect_recog_func taken which means usually the more complex one needs to preceed the less comples onex (widen_sum only after dot_prod or sad for example). */ static vect_recog_func vect_vect_recog_func_ptrs[] = { + { vect_recog_bitfield_ref_pattern, "bitfield_ref" }, + { vect_recog_bit_insert_pattern, "bit_insert" }, { vect_recog_over_widening_pattern, "over_widening" }, /* Must come after over_widening, which narrows the shift as much as possible beforehand. */
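
[Editorial illustration, not part of the patch.] To make the tree-if-conv.cc lowering above concrete: for a field such as 'a' in struct s { unsigned b : 23; unsigned a : 9; } (as in vect-bitfield-write-4.c), the representative is the containing 32-bit word, and lower_bitfield rewrites the access as a full load of that word plus a BIT_FIELD_REF (read) or BIT_INSERT_EXPR followed by a full store (write). The following C sketch shows the equivalent scalar semantics only; it assumes a little-endian bit layout and a 32-bit representative, REP_T/BITPOS/BITSIZE/bf_read/bf_write are made-up names, and the pass itself of course emits GIMPLE using DECL_BIT_FIELD_REPRESENTATIVE rather than C.

#include <stdint.h>

struct s { unsigned b : 23; unsigned a : 9; };  /* both fields share one 32-bit word */

typedef uint32_t REP_T;   /* type of the representative load/store (assumption) */
#define BITPOS  23u       /* offset of 'a' within the representative */
#define BITSIZE 9u        /* precision of 'a' */

/* _1 = ptr->a;  lowered to a whole-word load plus a BIT_FIELD_REF-style extract.  */
static unsigned
bf_read (const struct s *ptr)
{
  REP_T rep;
  __builtin_memcpy (&rep, ptr, sizeof rep);          /* __ifc_1 = struct.<representative> */
  return (rep >> BITPOS) & ((1u << BITSIZE) - 1u);   /* BIT_FIELD_REF (__ifc_1, 9, 23) */
}

/* ptr->a = val;  lowered to load, BIT_INSERT_EXPR-style merge, whole-word store.  */
static void
bf_write (struct s *ptr, unsigned val)
{
  REP_T rep;
  __builtin_memcpy (&rep, ptr, sizeof rep);          /* __ifc_1 = struct.<representative> */
  REP_T mask = ((1u << BITSIZE) - 1u) << BITPOS;
  rep = (rep & ~mask) | ((val << BITPOS) & mask);    /* __ifc_2 = BIT_INSERT_EXPR (__ifc_1, val, 23) */
  __builtin_memcpy (ptr, &rep, sizeof rep);          /* struct.<representative> = __ifc_2 */
}

Because both the load and the store now touch the full representative word, the resulting accesses are ordinary integer data references that the vectorizer can handle.
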
On Wed, Sep 28, 2022 at 7:32 PM Andre Vieira (lists) via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Made the change and also created the ChangeLogs. OK if bootstrap / testing succeeds. Thanks, Richard. > gcc/ChangeLog: > > * tree-if-conv.cc (if_convertible_loop_p_1): Move ordering of > loop bb's from here... > (tree_if_conversion): ... to here. Also call bitfield lowering > when appropriate. > (version_loop_for_if_conversion): Adapt to enable loop > versioning when we only need > to lower bitfields. > (ifcvt_split_critical_edges): Relax condition of expected loop > form as this is checked earlier. > (get_bitfield_rep): New function. > (lower_bitfield): Likewise. > (bitfields_to_lower_p): Likewise. > (need_to_lower_bitfields): New global boolean. > (need_to_ifcvt): Likewise. > * tree-vect-data-refs.cc (vect_find_stmt_data_reference): > Improve diagnostic message. > * tree-vect-patterns.cc (vect_recog_temp_ssa_var): Add default > value for last parameter. > (vect_recog_bitfield_ref_pattern): New. > (vect_recog_bit_insert_pattern): New. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-bitfield-read-1.c: New test. > * gcc.dg/vect/vect-bitfield-read-2.c: New test. > * gcc.dg/vect/vect-bitfield-read-3.c: New test. > * gcc.dg/vect/vect-bitfield-read-4.c: New test. > * gcc.dg/vect/vect-bitfield-read-5.c: New test. > * gcc.dg/vect/vect-bitfield-read-6.c: New test. > * gcc.dg/vect/vect-bitfield-write-1.c: New test. > * gcc.dg/vect/vect-bitfield-write-2.c: New test. > * gcc.dg/vect/vect-bitfield-write-3.c: New test. > * gcc.dg/vect/vect-bitfield-write-4.c: New test. > * gcc.dg/vect/vect-bitfield-write-5.c: New test. > > On 28/09/2022 10:43, Andre Vieira (lists) via Gcc-patches wrote: > > > > On 27/09/2022 13:34, Richard Biener wrote: > >> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote: > >> > >>> On 08/09/2022 12:51, Richard Biener wrote: > >>>> I'm curious, why the push to redundant_ssa_names? That could use > >>>> a comment ... > >>> So I purposefully left a #if 0 #else #endif in there so you can see > >>> the two > >>> options. But the reason I used redundant_ssa_names is because ifcvt > >>> seems to > >>> use that as a container for all pairs of (old, new) ssa names to > >>> replace > >>> later. So I just piggy backed on that. I don't know if there's a > >>> specific > >>> reason they do the replacement at the end? Maybe some ordering > >>> issue? Either > >>> way both adding it to redundant_ssa_names or doing the replacement > >>> inline work > >>> for the bitfield lowering (or work in my testing at least). > >> Possibly because we (in the past?) inserted/copied stuff based on > >> predicates generated at analysis time after we decide to elide something > >> so we need to watch for later appearing uses. But who knows ... my mind > >> fails me here. > >> > >> If it works to replace uses immediately please do so. But now > >> I wonder why we need this - the value shouldn't change so you > >> should get away with re-using the existing SSA name for the final value? > > > > Yeah... good point. A quick change and minor testing seems to agree. > > I'm sure I had a good reason to do it initially ;) > > > > I'll run a full-regression on this change to make sure I didn't miss > > anything. > >
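
[Editorial illustration, not part of the patch.] For reference, the shift/mask sequences documented in the two new pattern recognizers above reduce to the following scalar arithmetic. This is only a sketch: extract_bf and insert_bf are hypothetical names, CONT_T stands for the integral container type, bitsize is assumed to be smaller than the container width, and on big-endian targets the patterns first remap the bit position as prec - bitpos - bitsize before generating these operations.

#include <stdint.h>

typedef uint32_t CONT_T;   /* integral container type (assumption) */

/* vect_recog_bitfield_ref_pattern, "shift first, mask last" variant:
     shifted = container >> bitpos;
     result  = shifted & ((1 << bitsize) - 1);
   The shift is dropped when bitpos == 0.  */
static CONT_T
extract_bf (CONT_T container, unsigned bitpos, unsigned bitsize)
{
  CONT_T shifted = bitpos ? container >> bitpos : container;
  return shifted & (((CONT_T) 1 << bitsize) - 1);
}

/* vect_recog_bit_insert_pattern:
     shifted = value << bitpos;
     masked  = shifted & (mask << bitpos);
     cleared = container & ~(mask << bitpos);
     written = cleared | masked;  */
static CONT_T
insert_bf (CONT_T container, CONT_T value, unsigned bitpos, unsigned bitsize)
{
  CONT_T mask = (((CONT_T) 1 << bitsize) - 1) << bitpos;
  CONT_T masked = (value << bitpos) & mask;
  CONT_T cleared = container & ~mask;
  return cleared | masked;
}

The "mask first, shift later" variant of the read pattern simply applies the shifted mask to the container before the right shift, which is what enables widening loads and shift-and-add combinations as described in the pattern comment.
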
Hi, Whilst running a bootstrap with extra options to force bitfield vectorization '-O2 -ftree-vectorize -ftree-loop-if-convert -fno-vect-cost-model' I ran into an ICE in vect-patterns where a bit_field_ref had a container that wasn't INTEGRAL_TYPE and had a E_BLKmode, which meant we failed to build an integer type with the same size. For that reason I added a check to bail out earlier if the TYPE_MODE of the container is indeed E_BLKmode. The pattern for the bitfield inserts required no change as we currently don't support containers that aren't integer typed. Also changed a testcase because in BIG-ENDIAN it was not vectorizing due to a different size of container that wasn't supported. This passes the same bootstrap and regressions on aarch64-none-linux and no regressions on aarch64_be-none-elf either. I assume you are OK with these changes Richard, but I don't like to commit on Friday in case something breaks over the weekend, so I'll leave it until Monday. Thanks, Andre On 29/09/2022 08:54, Richard Biener wrote: > On Wed, Sep 28, 2022 at 7:32 PM Andre Vieira (lists) via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: >> Made the change and also created the ChangeLogs. > OK if bootstrap / testing succeeds. > > Thanks, > Richard. > >> gcc/ChangeLog: >> >> * tree-if-conv.cc (if_convertible_loop_p_1): Move ordering of >> loop bb's from here... >> (tree_if_conversion): ... to here. Also call bitfield lowering >> when appropriate. >> (version_loop_for_if_conversion): Adapt to enable loop >> versioning when we only need >> to lower bitfields. >> (ifcvt_split_critical_edges): Relax condition of expected loop >> form as this is checked earlier. >> (get_bitfield_rep): New function. >> (lower_bitfield): Likewise. >> (bitfields_to_lower_p): Likewise. >> (need_to_lower_bitfields): New global boolean. >> (need_to_ifcvt): Likewise. >> * tree-vect-data-refs.cc (vect_find_stmt_data_reference): >> Improve diagnostic message. >> * tree-vect-patterns.cc (vect_recog_temp_ssa_var): Add default >> value for last parameter. >> (vect_recog_bitfield_ref_pattern): New. >> (vect_recog_bit_insert_pattern): New. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.dg/vect/vect-bitfield-read-1.c: New test. >> * gcc.dg/vect/vect-bitfield-read-2.c: New test. >> * gcc.dg/vect/vect-bitfield-read-3.c: New test. >> * gcc.dg/vect/vect-bitfield-read-4.c: New test. >> * gcc.dg/vect/vect-bitfield-read-5.c: New test. >> * gcc.dg/vect/vect-bitfield-read-6.c: New test. >> * gcc.dg/vect/vect-bitfield-write-1.c: New test. >> * gcc.dg/vect/vect-bitfield-write-2.c: New test. >> * gcc.dg/vect/vect-bitfield-write-3.c: New test. >> * gcc.dg/vect/vect-bitfield-write-4.c: New test. >> * gcc.dg/vect/vect-bitfield-write-5.c: New test. >> >> On 28/09/2022 10:43, Andre Vieira (lists) via Gcc-patches wrote: >>> On 27/09/2022 13:34, Richard Biener wrote: >>>> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote: >>>> >>>>> On 08/09/2022 12:51, Richard Biener wrote: >>>>>> I'm curious, why the push to redundant_ssa_names? That could use >>>>>> a comment ... >>>>> So I purposefully left a #if 0 #else #endif in there so you can see >>>>> the two >>>>> options. But the reason I used redundant_ssa_names is because ifcvt >>>>> seems to >>>>> use that as a container for all pairs of (old, new) ssa names to >>>>> replace >>>>> later. So I just piggy backed on that. I don't know if there's a >>>>> specific >>>>> reason they do the replacement at the end? Maybe some ordering >>>>> issue? 
Either >>>>> way both adding it to redundant_ssa_names or doing the replacement >>>>> inline work >>>>> for the bitfield lowering (or work in my testing at least). >>>> Possibly because we (in the past?) inserted/copied stuff based on >>>> predicates generated at analysis time after we decide to elide something >>>> so we need to watch for later appearing uses. But who knows ... my mind >>>> fails me here. >>>> >>>> If it works to replace uses immediately please do so. But now >>>> I wonder why we need this - the value shouldn't change so you >>>> should get away with re-using the existing SSA name for the final value? >>> Yeah... good point. A quick change and minor testing seems to agree. >>> I'm sure I had a good reason to do it initially ;) >>> >>> I'll run a full-regression on this change to make sure I didn't miss >>> anything. >>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c new file mode 100644 index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c @@ -0,0 +1,40 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { int i : 31; }; + +#define ELT0 {0} +#define ELT1 {1} +#define ELT2 {2} +#define ELT3 {3} +#define N 32 +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].i; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c new file mode 100644 index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char a : 4; +}; + +#define N 32 +#define ELT0 {0x7FFFFFFFUL, 0} +#define ELT1 {0x7FFFFFFFUL, 1} +#define ELT2 {0x7FFFFFFFUL, 2} +#define ELT3 {0x7FFFFFFFUL, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].a; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c new file mode 100644 index 0000000000000000000000000000000000000000..849f4a017e1818eee4abd66385417a326c497696 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c @@ -0,0 +1,44 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" +#include 
<stdbool.h> + +extern void abort(void); + +typedef struct { + int c; + int b; + bool a : 1; + int d : 31; +} struct_t; + +#define N 16 +#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0, 0x7FFFFFFF } +#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1, 0x7FFFFFFF } + +struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, + ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F }; +struct_t vect_true[N] = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, + ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F }; +int main (void) +{ + unsigned ret = 0; + for (unsigned i = 0; i < N; i++) + { + ret |= vect_false[i].a; + } + if (ret) + abort (); + + for (unsigned i = 0; i < N; i++) + { + ret |= vect_true[i].a; + } + if (!ret) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c new file mode 100644 index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c @@ -0,0 +1,45 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char x : 2; + char a : 4; +}; + +#define N 32 +#define ELT0 {0x7FFFFFFFUL, 3, 0} +#define ELT1 {0x7FFFFFFFUL, 3, 1} +#define ELT2 {0x7FFFFFFFUL, 3, 2} +#define ELT3 {0x7FFFFFFFUL, 3, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].a; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c new file mode 100644 index 0000000000000000000000000000000000000000..1dc24d3eded192144dc9ad94589b4c5c3d999e65 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned a : 23; unsigned b : 9; +}; + +#define N 32 +#define ELT0 {0x7FFFFFUL, 0} +#define ELT1 {0x7FFFFFUL, 1} +#define ELT2 {0x7FFFFFUL, 2} +#define ELT3 {0x7FFFFFUL, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].b; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c new file mode 100644 index 0000000000000000000000000000000000000000..7d24c29975865883a7cdc7aa057fbb6bf413e0bc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c @@ -0,0 +1,42 @@ +/* { 
dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned a : 23; unsigned b : 8; +}; + +#define N 32 +#define ELT0 {0x7FFFFFUL, 0} +#define ELT1 {0x7FFFFFUL, 1} +#define ELT2 {0x7FFFFFUL, 2} +#define ELT3 {0x7FFFFFUL, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].b; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c new file mode 100644 index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c @@ -0,0 +1,39 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { int i : 31; }; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].i = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].i != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c new file mode 100644 index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char a : 4; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c new file mode 100644 index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char x : 2; + char a : 4; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for 
(unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c new file mode 100644 index 0000000000000000000000000000000000000000..fae6ea3557dcaba7b330ebdaa471281d33d2ba15 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned b : 23; + unsigned a : 9; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c new file mode 100644 index 0000000000000000000000000000000000000000..99360c2967b076212c67eb4f34b8fd91711d8821 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned b : 23; + unsigned a : 8; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index bac29fb557462f5d3193481ef180f1412e8bc639..e468a4659fa28a3a31c3390cf19bee65f4590b80 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -91,6 +91,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "ssa.h" #include "expmed.h" +#include "expr.h" #include "optabs-query.h" #include "gimple-pretty-print.h" #include "alias.h" @@ -123,6 +124,9 @@ along with GCC; see the file COPYING3. If not see #include "tree-vectorizer.h" #include "tree-eh.h" +/* For lang_hooks.types.type_for_mode. */ +#include "langhooks.h" + /* Only handle PHIs with no more arguments unless we are asked to by simd pragma. */ #define MAX_PHI_ARG_NUM \ @@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined; before phi_convertible_by_degenerating_args. */ static bool any_complicated_phi; +/* True if we have bitfield accesses we can lower. */ +static bool need_to_lower_bitfields; + +/* True if there is any ifcvting to be done. */ +static bool need_to_ifcvt; + /* Hash for struct innermost_loop_behavior. It depends on the user to free the memory. 
*/ @@ -1411,15 +1421,6 @@ if_convertible_loop_p_1 (class loop *loop, vec<data_reference_p> *refs) calculate_dominance_info (CDI_DOMINATORS); - /* Allow statements that can be handled during if-conversion. */ - ifc_bbs = get_loop_body_in_if_conv_order (loop); - if (!ifc_bbs) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Irreducible loop\n"); - return false; - } - for (i = 0; i < loop->num_nodes; i++) { basic_block bb = ifc_bbs[i]; @@ -2899,18 +2900,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds) class loop *new_loop; gimple *g; gimple_stmt_iterator gsi; - unsigned int save_length; + unsigned int save_length = 0; g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2, build_int_cst (integer_type_node, loop->num), integer_zero_node); gimple_call_set_lhs (g, cond); - /* Save BB->aux around loop_version as that uses the same field. */ - save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; - void **saved_preds = XALLOCAVEC (void *, save_length); - for (unsigned i = 0; i < save_length; i++) - saved_preds[i] = ifc_bbs[i]->aux; + void **saved_preds = NULL; + if (any_complicated_phi || need_to_predicate) + { + /* Save BB->aux around loop_version as that uses the same field. */ + save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; + saved_preds = XALLOCAVEC (void *, save_length); + for (unsigned i = 0; i < save_length; i++) + saved_preds[i] = ifc_bbs[i]->aux; + } initialize_original_copy_tables (); /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED @@ -2922,8 +2927,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds) profile_probability::always (), true); free_original_copy_tables (); - for (unsigned i = 0; i < save_length; i++) - ifc_bbs[i]->aux = saved_preds[i]; + if (any_complicated_phi || need_to_predicate) + for (unsigned i = 0; i < save_length; i++) + ifc_bbs[i]->aux = saved_preds[i]; if (new_loop == NULL) return NULL; @@ -2999,7 +3005,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv) auto_vec<edge> critical_edges; /* Loop is not well formed. */ - if (num <= 2 || loop->inner || !single_exit (loop)) + if (loop->inner) return false; body = get_loop_body (loop); @@ -3260,6 +3266,201 @@ ifcvt_hoist_invariants (class loop *loop, edge pe) free (body); } +/* Returns the DECL_FIELD_BIT_OFFSET of the bitfield accesse in stmt iff its + type mode is not BLKmode. If BITPOS is not NULL it will hold the poly_int64 + value of the DECL_FIELD_BIT_OFFSET of the bitfield access and STRUCT_EXPR, + if not NULL, will hold the tree representing the base struct of this + bitfield. */ + +static tree +get_bitfield_rep (gassign *stmt, bool write, tree *bitpos, + tree *struct_expr) +{ + tree comp_ref = write ? gimple_assign_lhs (stmt) + : gimple_assign_rhs1 (stmt); + + tree field_decl = TREE_OPERAND (comp_ref, 1); + tree rep_decl = DECL_BIT_FIELD_REPRESENTATIVE (field_decl); + + /* Bail out if the representative is BLKmode as we will not be able to + vectorize this. */ + if (TYPE_MODE (TREE_TYPE (rep_decl)) == E_BLKmode) + return NULL_TREE; + + /* Bail out if the DECL_SIZE of the field_decl isn't the same as the BF's + precision. 
*/ + unsigned HOST_WIDE_INT bf_prec + = TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (stmt))); + if (compare_tree_int (DECL_SIZE (field_decl), bf_prec) != 0) + return NULL_TREE; + + if (struct_expr) + *struct_expr = TREE_OPERAND (comp_ref, 0); + + if (bitpos) + *bitpos + = fold_build2 (MINUS_EXPR, bitsizetype, + DECL_FIELD_BIT_OFFSET (field_decl), + DECL_FIELD_BIT_OFFSET (rep_decl)); + + return rep_decl; + +} + +/* Lowers the bitfield described by DATA. + For a write like: + + struct.bf = _1; + + lower to: + + __ifc_1 = struct.<representative>; + __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos); + struct.<representative> = __ifc_2; + + For a read: + + _1 = struct.bf; + + lower to: + + __ifc_1 = struct.<representative>; + _1 = BIT_FIELD_REF (__ifc_1, bitsize, bitpos); + + where representative is a legal load that contains the bitfield value, + bitsize is the size of the bitfield and bitpos the offset to the start of + the bitfield within the representative. */ + +static void +lower_bitfield (gassign *stmt, bool write) +{ + tree struct_expr; + tree bitpos; + tree rep_decl = get_bitfield_rep (stmt, write, &bitpos, &struct_expr); + tree rep_type = TREE_TYPE (rep_decl); + tree bf_type = TREE_TYPE (gimple_assign_lhs (stmt)); + + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Lowering:\n"); + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + fprintf (dump_file, "to:\n"); + } + + /* REP_COMP_REF is a COMPONENT_REF for the representative. NEW_VAL is it's + defining SSA_NAME. */ + tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl, + NULL_TREE); + tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + + if (write) + { + new_val = ifc_temp_var (rep_type, + build3 (BIT_INSERT_EXPR, rep_type, new_val, + unshare_expr (gimple_assign_rhs1 (stmt)), + bitpos), &gsi); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + + gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref), + new_val); + gimple_move_vops (new_stmt, stmt); + gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM); + } + else + { + tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val, + build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)), + bitpos); + new_val = ifc_temp_var (bf_type, bfr, &gsi); + + gimple *new_stmt = gimple_build_assign (gimple_assign_lhs (stmt), + new_val); + gimple_move_vops (new_stmt, stmt); + gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM); + } + + gsi_remove (&gsi, true); +} + +/* Return TRUE if there are bitfields to lower in this LOOP. Fill TO_LOWER + with data structures representing these bitfields. 
*/ + +static bool +bitfields_to_lower_p (class loop *loop, + vec <gassign *> &reads_to_lower, + vec <gassign *> &writes_to_lower) +{ + gimple_stmt_iterator gsi; + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num); + } + + for (unsigned i = 0; i < loop->num_nodes; ++i) + { + basic_block bb = ifc_bbs[i]; + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi)); + if (!stmt) + continue; + + tree op = gimple_assign_lhs (stmt); + bool write = TREE_CODE (op) == COMPONENT_REF; + + if (!write) + op = gimple_assign_rhs1 (stmt); + + if (TREE_CODE (op) != COMPONENT_REF) + continue; + + if (DECL_BIT_FIELD_TYPE (TREE_OPERAND (op, 1))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + + if (!INTEGRAL_TYPE_P (TREE_TYPE (op))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\t Bitfield NO OK to lower," + " field type is not Integral.\n"); + return false; + } + + if (!get_bitfield_rep (stmt, write, NULL, NULL)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\t Bitfield NOT OK to lower," + " representative is BLKmode.\n"); + return false; + } + + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\tBitfield OK to lower.\n"); + if (write) + writes_to_lower.safe_push (stmt); + else + reads_to_lower.safe_push (stmt); + } + } + } + return !reads_to_lower.is_empty () || !writes_to_lower.is_empty (); +} + + /* If-convert LOOP when it is legal. For the moment this pass has no profitability analysis. Returns non-zero todo flags when something changed. */ @@ -3270,12 +3471,16 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) unsigned int todo = 0; bool aggressive_if_conv; class loop *rloop; + auto_vec <gassign *, 4> reads_to_lower; + auto_vec <gassign *, 4> writes_to_lower; bitmap exit_bbs; edge pe; again: rloop = NULL; ifc_bbs = NULL; + need_to_lower_bitfields = false; + need_to_ifcvt = false; need_to_predicate = false; need_to_rewrite_undefined = false; any_complicated_phi = false; @@ -3291,16 +3496,42 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) aggressive_if_conv = true; } - if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)) + if (!single_exit (loop)) goto cleanup; - if (!if_convertible_loop_p (loop) - || !dbg_cnt (if_conversion_tree)) + /* If there are more than two BBs in the loop then there is at least one if + to convert. 
*/ + if (loop->num_nodes > 2 + && !ifcvt_split_critical_edges (loop, aggressive_if_conv)) goto cleanup; - if ((need_to_predicate || any_complicated_phi) - && ((!flag_tree_loop_vectorize && !loop->force_vectorize) - || loop->dont_vectorize)) + ifc_bbs = get_loop_body_in_if_conv_order (loop); + if (!ifc_bbs) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Irreducible loop\n"); + goto cleanup; + } + + if (loop->num_nodes > 2) + { + need_to_ifcvt = true; + + if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree)) + goto cleanup; + + if ((need_to_predicate || any_complicated_phi) + && ((!flag_tree_loop_vectorize && !loop->force_vectorize) + || loop->dont_vectorize)) + goto cleanup; + } + + if ((flag_tree_loop_vectorize || loop->force_vectorize) + && !loop->dont_vectorize) + need_to_lower_bitfields = bitfields_to_lower_p (loop, reads_to_lower, + writes_to_lower); + + if (!need_to_ifcvt && !need_to_lower_bitfields) goto cleanup; /* The edge to insert invariant stmts on. */ @@ -3311,7 +3542,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) Either version this loop, or if the pattern is right for outer-loop vectorization, version the outer loop. In the latter case we will still if-convert the original inner loop. */ - if (need_to_predicate + if (need_to_lower_bitfields + || need_to_predicate || any_complicated_phi || flag_tree_loop_if_convert != 1) { @@ -3351,10 +3583,31 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) pe = single_pred_edge (gimple_bb (preds->last ())); } - /* Now all statements are if-convertible. Combine all the basic - blocks into one huge basic block doing the if-conversion - on-the-fly. */ - combine_blocks (loop); + if (need_to_lower_bitfields) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "-------------------------\n"); + fprintf (dump_file, "Start lowering bitfields\n"); + } + while (!reads_to_lower.is_empty ()) + lower_bitfield (reads_to_lower.pop (), false); + while (!writes_to_lower.is_empty ()) + lower_bitfield (writes_to_lower.pop (), true); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Done lowering bitfields\n"); + fprintf (dump_file, "-------------------------\n"); + } + } + if (need_to_ifcvt) + { + /* Now all statements are if-convertible. Combine all the basic + blocks into one huge basic block doing the if-conversion + on-the-fly. */ + combine_blocks (loop); + } /* Perform local CSE, this esp. helps the vectorizer analysis if loads and stores are involved. 
CSE only the loop body, not the entry @@ -3394,6 +3647,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds) if (rloop != NULL) { loop = rloop; + reads_to_lower.truncate (0); + writes_to_lower.truncate (0); goto again; } diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc index e03b50498d164144da3220df8ee5bcf4248db821..4a23d6172aaa12ad7049dc626e5c4afbd5ca3f74 100644 --- a/gcc/tree-vect-data-refs.cc +++ b/gcc/tree-vect-data-refs.cc @@ -4302,7 +4302,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt, free_data_ref (dr); return opt_result::failure_at (stmt, "not vectorized:" - " statement is bitfield access %G", stmt); + " statement is an unsupported" + " bitfield access %G", stmt); } if (DR_BASE_ADDRESS (dr) diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index d2bd15b5e9005bce2612f0b32c0acf6ffe776343..0cc315d312667c05a27df4cdf435f0d0e6fd4a52 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -35,6 +35,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-eh.h" #include "gimplify.h" #include "gimple-iterator.h" +#include "gimple-fold.h" +#include "gimplify-me.h" #include "cfgloop.h" #include "tree-vectorizer.h" #include "dumpfile.h" @@ -663,7 +665,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code, is NULL, the caller must set SSA_NAME_DEF_STMT for the returned SSA var. */ static tree -vect_recog_temp_ssa_var (tree type, gimple *stmt) +vect_recog_temp_ssa_var (tree type, gimple *stmt = NULL) { return make_temp_ssa_name (type, stmt, "patt"); } @@ -1829,6 +1831,330 @@ vect_recog_widen_sum_pattern (vec_info *vinfo, return pattern_stmt; } +/* Function vect_recog_bitfield_ref_pattern + + Try to find the following pattern: + + bf_value = BIT_FIELD_REF (container, bitsize, bitpos); + result = (type_out) bf_value; + + where type_out is a non-bitfield type, that is to say, it's precision matches + 2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)). + + Input: + + * STMT_VINFO: The stmt from which the pattern search begins. + here it starts with: + result = (type_out) bf_value; + + Output: + + * TYPE_OUT: The vector type of the output of this pattern. + + * Return value: A new stmt that will be used to replace the sequence of + stmts that constitute the pattern. If the precision of type_out is bigger + than the precision type of _1 we perform the widening before the shifting, + since the new precision will be large enough to shift the value and moving + widening operations up the statement chain enables the generation of + widening loads. If we are widening and the operation after the pattern is + an addition then we mask first and shift later, to enable the generation of + shifting adds. In the case of narrowing we will always mask first, shift + last and then perform a narrowing operation. This will enable the + generation of narrowing shifts. + + Widening with mask first, shift later: + container = (type_out) container; + masked = container & (((1 << bitsize) - 1) << bitpos); + result = patt2 >> masked; + + Widening with shift first, mask last: + container = (type_out) container; + shifted = container >> bitpos; + result = shifted & ((1 << bitsize) - 1); + + Narrowing: + masked = container & (((1 << bitsize) - 1) << bitpos); + result = masked >> bitpos; + result = (type_out) result; + + The shifting is always optional depending on whether bitpos != 0. 
+ +*/ + +static gimple * +vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info, + tree *type_out) +{ + gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt); + + if (!first_stmt) + return NULL; + + gassign *bf_stmt; + if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt)) + && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME) + { + gimple *second_stmt + = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt)); + bf_stmt = dyn_cast <gassign *> (second_stmt); + if (!bf_stmt + || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF) + return NULL; + } + else + return NULL; + + tree bf_ref = gimple_assign_rhs1 (bf_stmt); + tree container = TREE_OPERAND (bf_ref, 0); + + if (!bit_field_offset (bf_ref).is_constant () + || !bit_field_size (bf_ref).is_constant () + || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (container)))) + return NULL; + + if (!INTEGRAL_TYPE_P (TREE_TYPE (bf_ref)) + || TYPE_MODE (TREE_TYPE (container)) == E_BLKmode) + return NULL; + + gimple *use_stmt, *pattern_stmt; + use_operand_p use_p; + tree ret = gimple_assign_lhs (first_stmt); + tree ret_type = TREE_TYPE (ret); + bool shift_first = true; + tree vectype; + + /* If the first operand of the BIT_FIELD_REF is not an INTEGER type, convert + it to one of the same width so we can perform the necessary masking and + shifting. */ + if (!INTEGRAL_TYPE_P (TREE_TYPE (container))) + { + unsigned HOST_WIDE_INT container_size = + tree_to_uhwi (TYPE_SIZE (TREE_TYPE (container))); + tree int_type = build_nonstandard_integer_type (container_size, true); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (int_type), + VIEW_CONVERT_EXPR, container); + vectype = get_vectype_for_scalar_type (vinfo, int_type); + container = gimple_assign_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + } + else + vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (container)); + + /* We move the conversion earlier if the loaded type is smaller than the + return type to enable the use of widening loads. */ + if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type) + && !useless_type_conversion_p (TREE_TYPE (container), ret_type)) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (ret_type), + NOP_EXPR, container); + container = gimple_get_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + } + else if (!useless_type_conversion_p (TREE_TYPE (container), ret_type)) + /* If we are doing the conversion last then also delay the shift as we may + be able to combine the shift and conversion in certain cases. */ + shift_first = false; + + tree container_type = TREE_TYPE (container); + + /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a + PLUS_EXPR then do the shift last as some targets can combine the shift and + add into a single instruction. */ + if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt)) + { + if (gimple_code (use_stmt) == GIMPLE_ASSIGN + && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR) + shift_first = false; + } + + unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant (); + unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant (); + unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type)); + if (BYTES_BIG_ENDIAN) + shift_n = prec - shift_n - mask_width; + + /* If we don't have to shift we only generate the mask, so just fix the + code-path to shift_first. 
*/ + if (shift_n == 0) + shift_first = true; + + tree result; + if (shift_first) + { + tree shifted = container; + if (shift_n) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + RSHIFT_EXPR, container, + build_int_cst (sizetype, shift_n)); + shifted = gimple_assign_lhs (pattern_stmt); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + } + + tree mask = wide_int_to_tree (container_type, + wi::mask (mask_width, false, prec)); + + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + BIT_AND_EXPR, shifted, mask); + result = gimple_assign_lhs (pattern_stmt); + } + else + { + tree mask = wide_int_to_tree (container_type, + wi::shifted_mask (shift_n, mask_width, + false, prec)); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + BIT_AND_EXPR, container, mask); + tree masked = gimple_assign_lhs (pattern_stmt); + + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + RSHIFT_EXPR, masked, + build_int_cst (sizetype, shift_n)); + result = gimple_assign_lhs (pattern_stmt); + } + + if (!useless_type_conversion_p (TREE_TYPE (result), ret_type)) + { + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype); + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (ret_type), + NOP_EXPR, result); + } + + *type_out = STMT_VINFO_VECTYPE (stmt_info); + vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt); + + return pattern_stmt; +} + +/* Function vect_recog_bit_insert_pattern + + Try to find the following pattern: + + written = BIT_INSERT_EXPR (container, value, bitpos); + + Input: + + * STMT_VINFO: The stmt we want to replace. + + Output: + + * TYPE_OUT: The vector type of the output of this pattern. + + * Return value: A new stmt that will be used to replace the sequence of + stmts that constitute the pattern. In this case it will be: + value = (container_type) value; // Make sure + shifted = value << bitpos; // Shift value into place + masked = shifted & (mask << bitpos); // Mask off the non-relevant bits in + // the 'to-write value'. + cleared = container & ~(mask << bitpos); // Clearing the bits we want to + // write to from the value we want + // to write to. + written = cleared | masked; // Write bits. + + + where mask = ((1 << TYPE_PRECISION (value)) - 1), a mask to keep the number of + bits corresponding to the real size of the bitfield value we are writing to. + The shifting is always optional depending on whether bitpos != 0. 
+ +*/ + +static gimple * +vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info, + tree *type_out) +{ + gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt); + if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR) + return NULL; + + tree container = gimple_assign_rhs1 (bf_stmt); + tree value = gimple_assign_rhs2 (bf_stmt); + tree shift = gimple_assign_rhs3 (bf_stmt); + + tree bf_type = TREE_TYPE (value); + tree container_type = TREE_TYPE (container); + + if (!INTEGRAL_TYPE_P (container_type) + || !tree_fits_uhwi_p (TYPE_SIZE (container_type))) + return NULL; + + gimple *pattern_stmt; + + vect_unpromoted_value unprom; + unprom.set_op (value, vect_internal_def); + value = vect_convert_input (vinfo, stmt_info, container_type, &unprom, + get_vectype_for_scalar_type (vinfo, + container_type)); + + unsigned HOST_WIDE_INT mask_width = TYPE_PRECISION (bf_type); + unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type)); + unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (shift); + if (BYTES_BIG_ENDIAN) + { + shift_n = prec - shift_n - mask_width; + shift = build_int_cst (TREE_TYPE (shift), shift_n); + } + + if (!useless_type_conversion_p (TREE_TYPE (value), container_type)) + { + pattern_stmt = + gimple_build_assign (vect_recog_temp_ssa_var (container_type), + NOP_EXPR, value); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + value = gimple_get_lhs (pattern_stmt); + } + + /* Shift VALUE into place. */ + tree shifted = value; + if (shift_n) + { + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + LSHIFT_EXPR, value, shift); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + shifted = gimple_get_lhs (pattern_stmt); + } + + tree mask_t + = wide_int_to_tree (container_type, + wi::shifted_mask (shift_n, mask_width, false, prec)); + + /* Clear bits we don't want to write back from SHIFTED. */ + gimple_seq stmts = NULL; + tree masked = gimple_build (&stmts, BIT_AND_EXPR, container_type, shifted, + mask_t); + if (!gimple_seq_empty_p (stmts)) + { + pattern_stmt = gimple_seq_first_stmt (stmts); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + } + + /* Mask off the bits in the container that we are to write to. */ + mask_t = wide_int_to_tree (container_type, + wi::shifted_mask (shift_n, mask_width, true, prec)); + tree cleared = vect_recog_temp_ssa_var (container_type); + pattern_stmt = gimple_build_assign (cleared, BIT_AND_EXPR, container, mask_t); + append_pattern_def_seq (vinfo, stmt_info, pattern_stmt); + + /* Write MASKED into CLEARED. */ + pattern_stmt + = gimple_build_assign (vect_recog_temp_ssa_var (container_type), + BIT_IOR_EXPR, cleared, masked); + + *type_out = STMT_VINFO_VECTYPE (stmt_info); + vect_pattern_detected ("bit_insert pattern", stmt_info->stmt); + + return pattern_stmt; +} + + /* Recognize cases in which an operation is performed in one type WTYPE but could be done more efficiently in a narrower type NTYPE. For example, if we have: @@ -5622,6 +5948,8 @@ struct vect_recog_func taken which means usually the more complex one needs to preceed the less comples onex (widen_sum only after dot_prod or sad for example). */ static vect_recog_func vect_vect_recog_func_ptrs[] = { + { vect_recog_bitfield_ref_pattern, "bitfield_ref" }, + { vect_recog_bit_insert_pattern, "bit_insert" }, { vect_recog_over_widening_pattern, "over_widening" }, /* Must come after over_widening, which narrows the shift as much as possible beforehand. */
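For anyone reading the two pattern comments above without paging through the GIMPLE, the sequences they describe are ordinary scalar shift-and-mask code; the vectorized loop just does the same thing element-wise. A minimal C sketch follows (purely illustrative, not part of the patch: the helper names are made up, a 32-bit container is assumed, and bitpos + bitsize <= 32).

```c
#include <stdint.h>

/* Read a BITSIZE-wide field starting at BITPOS of an integer container,
   in the "shift first, mask last" order the bitfield_ref pattern prefers
   when no conversion has to be delayed.  */
static inline uint32_t
bf_extract (uint32_t container, unsigned bitpos, unsigned bitsize)
{
  uint32_t mask = bitsize >= 32 ? ~0u : (1u << bitsize) - 1u;
  return (container >> bitpos) & mask;
}

/* The same read in the "mask first, shift later" order, used when the
   widening conversion is done up front or the result must be narrowed.  */
static inline uint32_t
bf_extract_mask_first (uint32_t container, unsigned bitpos, unsigned bitsize)
{
  uint32_t mask = bitsize >= 32 ? ~0u : (1u << bitsize) - 1u;
  uint32_t masked = container & (mask << bitpos);
  return masked >> bitpos;
}

/* Write VALUE into the same field, mirroring the bit_insert pattern:
   shift the value into place, mask off the bits outside the field,
   clear the destination bits in the container, then OR the halves.  */
static inline uint32_t
bf_insert (uint32_t container, uint32_t value, unsigned bitpos,
	   unsigned bitsize)
{
  uint32_t mask = bitsize >= 32 ? ~0u : (1u << bitsize) - 1u;
  uint32_t shifted = value << bitpos;
  uint32_t masked = shifted & (mask << bitpos);
  uint32_t cleared = container & ~(mask << bitpos);
  return cleared | masked;
}
```

Both extract orders compute the same value; the choice only matters for how the shift and mask combine with neighbouring statements, which is why the pattern shifts last when the result feeds a PLUS_EXPR (so targets with shift-and-add forms can fuse them) and hoists the widening conversion above the shift when the container is narrower than the result type (so a widening load can be used).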
This commit failed tests FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq FAIL: gcc.target/i386/pr101668.c scan-assembler vpmovsxdq FAIL: gcc.target/i386/pr92645.c scan-tree-dump-times optimized "vec_unpack_" 4 FAIL: gcc.target/i386/pr92645.c scan-tree-dump-times optimized "vec_unpack_" 4 FAIL: gcc.target/i386/pr92645.c scan-tree-dump-times optimized "vec_unpack_" 4 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbd 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbq 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxbw 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxdq 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwd 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2 FAIL: gcc.target/i386/pr92658-avx2-2.c scan-assembler-times pmovsxwq 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbd 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbq 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxbw 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxdq 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwd 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2 FAIL: gcc.target/i386/pr92658-avx2.c scan-assembler-times pmovzxwq 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbd 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbq 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 
2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxbw 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxdq 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwd 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2 FAIL: gcc.target/i386/pr92658-avx512bw-2.c scan-assembler-times pmovsxwq 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbd 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbq 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxbw 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxdq 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwd 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2 FAIL: gcc.target/i386/pr92658-avx512bw.c scan-assembler-times pmovzxwq 2 FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3 FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3 FAIL: gcc.target/i386/pr92658-avx512bw-trunc.c scan-assembler-times vpmovwb 3 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdb 1 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovdw 1 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqb 1 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqd 1 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1 FAIL: gcc.target/i386/pr92658-avx512f.c scan-assembler-times vpmovqw 1 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb 2 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdb 2 FAIL: gcc.target/i386/pr92658-avx512vl.c 
scan-assembler-times vpmovdb 2 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovdw 2 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%xmm 1 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqb[ \t]*%ymm 1 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqd 2 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2 FAIL: gcc.target/i386/pr92658-avx512vl.c scan-assembler-times vpmovqw 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbd 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbq 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxbw 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxdq 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwd 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2 FAIL: gcc.target/i386/pr92658-sse4-2.c scan-assembler-times pmovsxwq 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbd 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbq 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxbw 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxdq 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwd 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2 
FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2 FAIL: gcc.target/i386/pr92658-sse4.c scan-assembler-times pmovzxwq 2 On Fri, Oct 7, 2022 at 10:21 PM Andre Vieira (lists) via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Hi, > > Whilst running a bootstrap with extra options to force bitfield > vectorization '-O2 -ftree-vectorize -ftree-loop-if-convert > -fno-vect-cost-model' I ran into an ICE in vect-patterns where a > bit_field_ref had a container that wasn't INTEGRAL_TYPE and had a > E_BLKmode, which meant we failed to build an integer type with the same > size. For that reason I added a check to bail out earlier if the > TYPE_MODE of the container is indeed E_BLKmode. The pattern for the > bitfield inserts required no change as we currently don't support > containers that aren't integer typed. > > Also changed a testcase because in BIG-ENDIAN it was not vectorizing due > to a different size of container that wasn't supported. > > This passes the same bootstrap and regressions on aarch64-none-linux and > no regressions on aarch64_be-none-elf either. > > I assume you are OK with these changes Richard, but I don't like to > commit on Friday in case something breaks over the weekend, so I'll > leave it until Monday. > > Thanks, > Andre > > On 29/09/2022 08:54, Richard Biener wrote: > > On Wed, Sep 28, 2022 at 7:32 PM Andre Vieira (lists) via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > >> Made the change and also created the ChangeLogs. > > OK if bootstrap / testing succeeds. > > > > Thanks, > > Richard. > > > >> gcc/ChangeLog: > >> > >> * tree-if-conv.cc (if_convertible_loop_p_1): Move ordering of > >> loop bb's from here... > >> (tree_if_conversion): ... to here. Also call bitfield lowering > >> when appropriate. > >> (version_loop_for_if_conversion): Adapt to enable loop > >> versioning when we only need > >> to lower bitfields. > >> (ifcvt_split_critical_edges): Relax condition of expected loop > >> form as this is checked earlier. > >> (get_bitfield_rep): New function. > >> (lower_bitfield): Likewise. > >> (bitfields_to_lower_p): Likewise. > >> (need_to_lower_bitfields): New global boolean. > >> (need_to_ifcvt): Likewise. > >> * tree-vect-data-refs.cc (vect_find_stmt_data_reference): > >> Improve diagnostic message. > >> * tree-vect-patterns.cc (vect_recog_temp_ssa_var): Add default > >> value for last parameter. > >> (vect_recog_bitfield_ref_pattern): New. > >> (vect_recog_bit_insert_pattern): New. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.dg/vect/vect-bitfield-read-1.c: New test. > >> * gcc.dg/vect/vect-bitfield-read-2.c: New test. > >> * gcc.dg/vect/vect-bitfield-read-3.c: New test. > >> * gcc.dg/vect/vect-bitfield-read-4.c: New test. > >> * gcc.dg/vect/vect-bitfield-read-5.c: New test. > >> * gcc.dg/vect/vect-bitfield-read-6.c: New test. > >> * gcc.dg/vect/vect-bitfield-write-1.c: New test. > >> * gcc.dg/vect/vect-bitfield-write-2.c: New test. > >> * gcc.dg/vect/vect-bitfield-write-3.c: New test. > >> * gcc.dg/vect/vect-bitfield-write-4.c: New test. > >> * gcc.dg/vect/vect-bitfield-write-5.c: New test. > >> > >> On 28/09/2022 10:43, Andre Vieira (lists) via Gcc-patches wrote: > >>> On 27/09/2022 13:34, Richard Biener wrote: > >>>> On Mon, 26 Sep 2022, Andre Vieira (lists) wrote: > >>>> > >>>>> On 08/09/2022 12:51, Richard Biener wrote: > >>>>>> I'm curious, why the push to redundant_ssa_names? That could use > >>>>>> a comment ... > >>>>> So I purposefully left a #if 0 #else #endif in there so you can see > >>>>> the two > >>>>> options. 
But the reason I used redundant_ssa_names is because ifcvt > >>>>> seems to > >>>>> use that as a container for all pairs of (old, new) ssa names to > >>>>> replace > >>>>> later. So I just piggy backed on that. I don't know if there's a > >>>>> specific > >>>>> reason they do the replacement at the end? Maybe some ordering > >>>>> issue? Either > >>>>> way both adding it to redundant_ssa_names or doing the replacement > >>>>> inline work > >>>>> for the bitfield lowering (or work in my testing at least). > >>>> Possibly because we (in the past?) inserted/copied stuff based on > >>>> predicates generated at analysis time after we decide to elide something > >>>> so we need to watch for later appearing uses. But who knows ... my mind > >>>> fails me here. > >>>> > >>>> If it works to replace uses immediately please do so. But now > >>>> I wonder why we need this - the value shouldn't change so you > >>>> should get away with re-using the existing SSA name for the final value? > >>> Yeah... good point. A quick change and minor testing seems to agree. > >>> I'm sure I had a good reason to do it initially ;) > >>> > >>> I'll run a full-regression on this change to make sure I didn't miss > >>> anything. > >>>
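To make the get_bitfield_rep/lower_bitfield entries in the ChangeLog above a little more concrete: the if-conversion side rewrites each bitfield access into a full-width load or store of the field's representative (DECL_BIT_FIELD_REPRESENTATIVE) plus explicit bit manipulation, and the two new vectorizer patterns then turn that bit manipulation into vector shifts and masks. Below is a rough C rendering for the struct s { int i : 31; } case from vect-bitfield-read-1.c (illustration only; the real transform is done on GIMPLE via BIT_FIELD_REF/BIT_INSERT_EXPR, little-endian layout is assumed, and read_i/write_i are made-up names).

```c
#include <stdint.h>
#include <string.h>

struct s { int i : 31; };

/* _ifc_1 = p->D.repr;  _1 = BIT_FIELD_REF <_ifc_1, 31, 0>;  result = (int) _1;  */
static int
read_i (const struct s *p)
{
  uint32_t rep;
  memcpy (&rep, p, sizeof rep);        /* load the 32-bit representative */
  uint32_t bits = rep & 0x7fffffffu;   /* extract bits [0, 31) */
  /* Sign-extend the 31-bit value to int (the "(type) bf_value" step).  */
  return (int32_t) (bits ^ 0x40000000u) - 0x40000000;
}

/* _ifc_1 = p->D.repr;  _ifc_2 = BIT_INSERT_EXPR <_ifc_1, val, 0>;
   p->D.repr = _ifc_2;  */
static void
write_i (struct s *p, int val)
{
  uint32_t rep;
  memcpy (&rep, p, sizeof rep);                 /* load the representative */
  uint32_t ins = (uint32_t) val & 0x7fffffffu;  /* mask value to field width */
  rep = (rep & ~0x7fffffffu) | ins;             /* clear field bits, OR value in */
  memcpy (p, &rep, sizeof rep);                 /* store the representative back */
}
```

Because the memory accesses after the rewrite are plain full-width loads and stores of the representative, the existing data-reference analysis is happy with them; only the shift/mask arithmetic needs the new vect_recog_bitfield_ref_pattern / vect_recog_bit_insert_pattern handling.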
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107226

On Wed, Oct 12, 2022 at 9:55 AM Hongtao Liu <crazylht@gmail.com> wrote:
> This commit failed tests
> [snip: same FAIL list and quoted thread as in the message above]
>
> --
> BR,
> Hongtao
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c new file mode 100644 index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c @@ -0,0 +1,40 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { int i : 31; }; + +#define ELT0 {0} +#define ELT1 {1} +#define ELT2 {2} +#define ELT3 {3} +#define N 32 +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].i; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c new file mode 100644 index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char a : 4; +}; + +#define N 32 +#define ELT0 {0x7FFFFFFFUL, 0} +#define ELT1 {0x7FFFFFFFUL, 1} +#define ELT2 {0x7FFFFFFFUL, 2} +#define ELT3 {0x7FFFFFFFUL, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].a; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c new file mode 100644 index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" +#include <stdbool.h> + +extern void abort(void); + +typedef struct { + int c; + int b; + bool a : 1; +} struct_t; + +#define N 16 +#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 } +#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 } + +struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, + ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F }; +struct_t vect_true[N] = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, + ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F }; +int main (void) +{ + unsigned ret = 0; + for (unsigned i = 0; i < N; i++) + { + ret |= vect_false[i].a; + } + if (ret) + abort (); + + for (unsigned i = 0; i < N; i++) + { + ret |= vect_true[i].a; + } + if (!ret) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */ diff --git 
a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c new file mode 100644 index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c @@ -0,0 +1,45 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char x : 2; + char a : 4; +}; + +#define N 32 +#define ELT0 {0x7FFFFFFFUL, 3, 0} +#define ELT1 {0x7FFFFFFFUL, 3, 1} +#define ELT2 {0x7FFFFFFFUL, 3, 2} +#define ELT3 {0x7FFFFFFFUL, 3, 3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + res += ptr[i].a; + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c new file mode 100644 index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c @@ -0,0 +1,39 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { int i : 31; }; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].i = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].i != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c new file mode 100644 index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c @@ -0,0 +1,42 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char a : 4; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c new file mode 100644 index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c @@ -0,0 +1,43 @@ +/* { dg-require-effective-target vect_int } */ + +#include <stdarg.h> 
+#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char x : 2; + char a : 4; +}; + +#define N 32 +#define V 5 +struct s A[N]; + +void __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + for (int i = 0; i < n; ++i) + ptr[i].a = V; +} + +void __attribute__ ((noipa)) +check_f(struct s *ptr) { + for (unsigned i = 0; i < N; ++i) + if (ptr[i].a != V) + abort (); +} + +int main (void) +{ + check_vect (); + __builtin_memset (&A[0], 0, sizeof(struct s) * N); + + f(&A[0], N); + check_f (&A[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ + diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..4070fa2f45970e564f13de794707613356cb5045 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -91,6 +91,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "ssa.h" #include "expmed.h" +#include "expr.h" #include "optabs-query.h" #include "gimple-pretty-print.h" #include "alias.h" @@ -123,6 +124,9 @@ along with GCC; see the file COPYING3. If not see #include "tree-vectorizer.h" #include "tree-eh.h" +/* For lang_hooks.types.type_for_mode. */ +#include "langhooks.h" + /* Only handle PHIs with no more arguments unless we are asked to by simd pragma. */ #define MAX_PHI_ARG_NUM \ @@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined; before phi_convertible_by_degenerating_args. */ static bool any_complicated_phi; +/* True if we have bitfield accesses we can lower. */ +static bool need_to_lower_bitfields; + +/* True if there is any ifcvting to be done. */ +static bool need_to_ifcvt; + /* Hash for struct innermost_loop_behavior. It depends on the user to free the memory. */ @@ -2898,18 +2908,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds) class loop *new_loop; gimple *g; gimple_stmt_iterator gsi; - unsigned int save_length; + unsigned int save_length = 0; g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2, build_int_cst (integer_type_node, loop->num), integer_zero_node); gimple_call_set_lhs (g, cond); - /* Save BB->aux around loop_version as that uses the same field. */ - save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; - void **saved_preds = XALLOCAVEC (void *, save_length); - for (unsigned i = 0; i < save_length; i++) - saved_preds[i] = ifc_bbs[i]->aux; + void **saved_preds = NULL; + if (any_complicated_phi || need_to_predicate) + { + /* Save BB->aux around loop_version as that uses the same field. */ + save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes; + saved_preds = XALLOCAVEC (void *, save_length); + for (unsigned i = 0; i < save_length; i++) + saved_preds[i] = ifc_bbs[i]->aux; + } initialize_original_copy_tables (); /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED @@ -2921,8 +2935,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds) profile_probability::always (), true); free_original_copy_tables (); - for (unsigned i = 0; i < save_length; i++) - ifc_bbs[i]->aux = saved_preds[i]; + if (any_complicated_phi || need_to_predicate) + for (unsigned i = 0; i < save_length; i++) + ifc_bbs[i]->aux = saved_preds[i]; if (new_loop == NULL) return NULL; @@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv) auto_vec<edge> critical_edges; /* Loop is not well formed. 
*/ - if (num <= 2 || loop->inner || !single_exit (loop)) + if (num <= 2 || loop->inner) return false; body = get_loop_body (loop); @@ -3259,6 +3274,225 @@ ifcvt_hoist_invariants (class loop *loop, edge pe) free (body); } +typedef struct +{ + scalar_int_mode best_mode; + tree struct_expr; + tree bf_type; + tree offset; + poly_int64 bitpos; + bool write; + gassign *stmt; +} bitfield_data_t; + +/* Return TRUE if we can lower the bitfield in STMT. Fill DATA with the + relevant information required to lower this bitfield. */ + +static bool +get_bitfield_data (gassign *stmt, bool write, bitfield_data_t *data) +{ + poly_uint64 bitstart, bitend; + scalar_int_mode best_mode; + tree comp_ref = write ? gimple_get_lhs (stmt) + : gimple_assign_rhs1 (stmt); + tree struct_expr = TREE_OPERAND (comp_ref, 0); + tree field_decl = TREE_OPERAND (comp_ref, 1); + tree bf_type = TREE_TYPE (field_decl); + poly_int64 bitpos + = tree_to_poly_int64 (DECL_FIELD_BIT_OFFSET (field_decl)); + unsigned HOST_WIDE_INT bitsize = TYPE_PRECISION (bf_type); + tree offset = DECL_FIELD_OFFSET (field_decl); + /* BITSTART and BITEND describe the region we can safely load from inside the + structure. BITPOS is the bit position of the value inside the + representative that we will end up loading OFFSET bytes from the start + of the struct. BEST_MODE is the mode describing the optimal size of the + representative chunk we load. If this is a write we will store the same + sized representative back, after we have changed the appropriate bits. */ + get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset); + if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend, + TYPE_ALIGN (TREE_TYPE (struct_expr)), + INT_MAX, false, &best_mode)) + { + data->best_mode = best_mode; + data->struct_expr = struct_expr; + data->bf_type = bf_type; + data->offset = offset; + data->bitpos = bitpos; + data->write = write; + data->stmt = stmt; + return true; + } + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "\t\tCan not lower Bitfield, could not determine" + " best mode.\n"); + } + return false; +} + +/* Lowers the bitfield described by DATA. + For a write like: + + struct.bf = _1; + + lower to: + + __ifc_1 = struct.<representative>; + __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos); + struct.<representative> = __ifc_2; + + For a read: + + _1 = struct.bf; + + lower to: + + __ifc_1 = struct.<representative>; + _1 = BIT_FIELD_REF (__ifc_1, bitsize, bitpos); + + where representative is a legal load that contains the bitfield value, + bitsize is the size of the bitfield and bitpos the offset to the start of + the bitfield within the representative. */ + +static void +lower_bitfield (bitfield_data_t *data) +{ + scalar_int_mode best_mode = data->best_mode; + tree struct_expr = data->struct_expr; + tree bf_type = data->bf_type; + tree offset = data->offset; + poly_int64 bitpos = data->bitpos; + bool write = data->write; + gassign *stmt = data->stmt; + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + /* Type of the representative. */ + tree rep_type + = lang_hooks.types.type_for_mode (best_mode, TYPE_UNSIGNED (bf_type)); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Lowering:\n"); + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + fprintf (dump_file, "to:\n"); + } + + tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL, + NULL_TREE, rep_type); + /* Load from the start of 'offset + bitpos % alignment'. 
*/ + uint64_t extra_offset = bitpos.to_constant (); + extra_offset /= TYPE_ALIGN (bf_type); + extra_offset *= TYPE_ALIGN (bf_type); + offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, + build_int_cst (TREE_TYPE (offset), + extra_offset / BITS_PER_UNIT)); + /* Adapt the BITPOS to reflect the number of bits between the start of the + load and the start of the bitfield value. */ + bitpos -= extra_offset; + DECL_FIELD_BIT_OFFSET (rep_decl) = build_zero_cst (bitsizetype); + DECL_FIELD_OFFSET (rep_decl) = offset; + DECL_SIZE (rep_decl) = TYPE_SIZE (rep_type); + DECL_CONTEXT (rep_decl) = TREE_TYPE (struct_expr); + tree bitpos_tree = build_int_cst (bitsizetype, bitpos); + /* REP_COMP_REF is a COMPONENT_REF for the representative. */ + tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl, + NULL_TREE); + tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + + if (write) + { + new_val = ifc_temp_var (rep_type, + build3 (BIT_INSERT_EXPR, rep_type, new_val, + unshare_expr (gimple_assign_rhs1 (stmt)), + bitpos_tree), &gsi); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + + gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref), + new_val); + gimple_set_vuse (new_stmt, gimple_vuse (stmt)); + tree vdef = gimple_vdef (stmt); + gimple_set_vdef (new_stmt, vdef); + SSA_NAME_DEF_STMT (vdef) = new_stmt; + gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM); + } + else + { + tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val, + build_int_cst (bitsizetype, TYPE_PRECISION (bf_type)), + bitpos_tree); + new_val = ifc_temp_var (bf_type, bfr, &gsi); + redundant_ssa_names.safe_push (std::make_pair (gimple_get_lhs (stmt), + new_val)); + + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM); + } + + gsi_remove (&gsi, true); +} + +/* Return TRUE if there are bitfields to lower in this LOOP. Fill TO_LOWER + with data structures representing these bitfields. */ + +static bool +bitfields_to_lower_p (class loop *loop, auto_vec <bitfield_data_t *, 4> *to_lower) +{ + basic_block *bbs = get_loop_body (loop); + gimple_stmt_iterator gsi; + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num); + } + + for (unsigned i = 0; i < loop->num_nodes; ++i) + { + basic_block bb = bbs[i]; + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gassign *stmt = dyn_cast<gassign*> (gsi_stmt (gsi)); + if (!stmt) + continue; + + tree op = gimple_get_lhs (stmt); + bool write = TREE_CODE (op) == COMPONENT_REF; + + if (!write) + op = gimple_assign_rhs1 (stmt); + + if (TREE_CODE (op) != COMPONENT_REF) + continue; + + if (DECL_BIT_FIELD (TREE_OPERAND (op, 1))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + + bitfield_data_t *data = new bitfield_data_t (); + if (get_bitfield_data (stmt, write, data)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\tBitfield OK to lower.\n"); + to_lower->safe_push (data); + } + else + { + delete data; + return false; + } + } + } + } + return !to_lower->is_empty (); +} + + /* If-convert LOOP when it is legal. 
@@ -3269,12 +3503,15 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  auto_vec <bitfield_data_t *, 4> bitfields_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3290,11 +3527,17 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
       aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
+    goto cleanup;
+
+  need_to_lower_bitfields = bitfields_to_lower_p (loop, &bitfields_to_lower);
+  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
+      && !need_to_lower_bitfields)
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
+  need_to_ifcvt
+    = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree);
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   if ((need_to_predicate || any_complicated_phi)
@@ -3310,7 +3553,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3350,10 +3594,32 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
       pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!bitfields_to_lower.is_empty ())
+	{
+	  bitfield_data_t *data = bitfields_to_lower.pop ();
+	  lower_bitfield (data);
+	  delete data;
+	}
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
+	 blocks into one huge basic block doing the if-conversion
+	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3395,6 +3661,11 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
       loop = rloop;
       goto again;
     }
+  while (!bitfields_to_lower.is_empty ())
+    {
+      bitfield_data_t *data = bitfields_to_lower.pop ();
+      delete data;
+    }
 
   return todo;
 }
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..435b75f860784a929041d5214d39c876c5ba790a 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1828,6 +1829,204 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   _2 = BIT_FIELD_REF (_1, bitsize, bitpos);
+   _3 = (type) _2;
+
+   where type is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type) - (TYPE_UNSIGNED (type) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   _3 = (type) _2;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern.  In this case it will be:
+   patt1 = (type) _1;
+   patt2 = patt1 >> bitpos;
+   _3 = patt2 & ((1 << bitsize) - 1);
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *nop_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!nop_stmt
+      || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR
+      || TREE_CODE (gimple_assign_rhs1 (nop_stmt)) != SSA_NAME)
+    return NULL;
+
+  gassign *bf_stmt
+    = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (gimple_assign_rhs1 (nop_stmt)));
+
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+
+  tree load = TREE_OPERAND (bf_ref, 0);
+  tree size = TREE_OPERAND (bf_ref, 1);
+  tree offset = TREE_OPERAND (bf_ref, 2);
+
+  /* Bail out if the load is already a vector type.  */
+  if (VECTOR_TYPE_P (TREE_TYPE (load)))
+    return NULL;
+
+
+  gimple *pattern_stmt;
+  tree lhs = load;
+  tree ret_type = TREE_TYPE (gimple_get_lhs (nop_stmt));
+
+  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+			       NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset);
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			       RSHIFT_EXPR, lhs, offset);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT mask_i = tree_to_uhwi (size);
+  tree mask = build_int_cst (TREE_TYPE (lhs), (1ULL << mask_i) - 1);
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			   BIT_AND_EXPR, lhs, mask);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_field_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   _3 = BIT_INSERT_EXPR (_1, _2, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern.  In this case it will be:
+   patt1 = _2 & mask;		   // Clearing of the non-relevant bits in the
+				   // 'to-write value'.
+   patt2 = patt1 << bitpos;	   // Shift the cleaned value into place.
+   patt3 = _1 & ~(mask << bitpos); // Clearing the bits we want to write to,
+				   // from the value we want to write to.
+   _3 = patt3 | patt2;		   // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (_2)) - 1), a mask to keep the number of
+   bits corresponding to the real size of the bitfield value we are writing to.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree load = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree offset = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree load_type = TREE_TYPE (load);
+
+  /* Bail out if the load is already of vector type.  */
+  if (VECTOR_TYPE_P (load_type))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  if (CONSTANT_CLASS_P (value))
+    value = fold_build1 (NOP_EXPR, load_type, value);
+  else
+    {
+      if (TREE_CODE (value) != SSA_NAME)
+	return NULL;
+      gassign *nop_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (value));
+      if (!nop_stmt || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR)
+	return NULL;
+      if (!useless_type_conversion_p (TREE_TYPE (value), load_type))
+	{
+	  value = fold_build1 (NOP_EXPR, load_type, gimple_assign_rhs1 (nop_stmt));
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+				   value);
+	  value = gimple_get_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+	}
+    }
+
+  unsigned HOST_WIDE_INT mask_i = (1ULL << TYPE_PRECISION (bf_type)) - 1;
+  tree mask_t = build_int_cst (load_type, mask_i);
+  /* Clear bits we don't want to write back from value and shift it in
+     place.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   fold_build2 (BIT_AND_EXPR, load_type, value,
+					mask_t));
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset);
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			       LSHIFT_EXPR, value, offset);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+  /* Mask off the bits in the loaded value.  */
+  mask_i <<= shift_n;
+  mask_i = ~mask_i;
+  mask_t = build_int_cst (load_type, mask_i);
+
+  tree lhs = vect_recog_temp_ssa_var (load_type, NULL);
+  pattern_stmt = gimple_build_assign (lhs, BIT_AND_EXPR, load, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Compose the value to write back.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   BIT_IOR_EXPR, lhs, value);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_field_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For
    example, if we have:
@@ -5623,6 +5822,8 @@ struct vect_recog_func
    taken which means usually the more complex one needs to preceed the
    less comples onex (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much as
      possible beforehand.  */
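
To make the intended end-to-end flow easier to follow, here is a small
illustrative sketch.  It is not part of the patch: the struct layout, the
field widths and all names such as 'struct s', 'val', 'sum_vals' and
'clear_vals' are made up for the example, and the lowered forms shown in
the trailing comment are only approximate.

    /* Bitfield loops of the shape the series is meant to enable.  */
    struct s
    {
      unsigned other : 4;
      unsigned val : 12;
    };

    unsigned
    sum_vals (struct s *p, int n)
    {
      unsigned sum = 0;
      for (int i = 0; i < n; ++i)
        sum += p[i].val;	/* bitfield read */
      return sum;
    }

    void
    clear_vals (struct s *p, int n)
    {
      for (int i = 0; i < n; ++i)
        p[i].val = 0;		/* bitfield write */
    }

    /* After the ifcvt lowering, the read becomes roughly (GIMPLE-like,
       assuming a 16-bit representative and bitpos 4 for this layout):

	 __ifc_1 = p[i].<representative>;
	 _2 = BIT_FIELD_REF <__ifc_1, 12, 4>;
	 sum_5 = _2 + sum_3;

       and the write becomes a load/BIT_INSERT_EXPR/store of the same
       representative, as described above lower_bitfield.  */

Without the series, vect_find_stmt_data_reference rejects both loops with
the "unsupported bitfield access" failure shown in the tree-vect-data-refs.cc
hunk.  With it, if-conversion rewrites the accesses in terms of the
representative, and vect_recog_bitfield_ref_pattern /
vect_recog_bit_insert_pattern then replace the BIT_FIELD_REF and
BIT_INSERT_EXPR with the shift-and-mask sequences documented in their
comments, which the vectorizer already knows how to handle.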