From patchwork Mon Aug 8 14:06:56 2022
From: "Andre Vieira (lists)" via Gcc-patches
Date: Mon, 8 Aug 2022 15:06:56 +0100
Subject: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)
Message-ID: <8f805fb1-d4ae-b0e3-ff26-57fd2c1fc1f7@arm.com>
To: Richard Biener
Cc: Jakub Jelinek, Richard Sandiford, "gcc-patches@gcc.gnu.org"
References: <4a6f2350-f070-1473-63a5-3232968d3a89@arm.com>
 <4ea91767-de43-9d30-5bd2-a9a0759760c7@arm.com>
 <69abe824-94f5-95b5-fb7f-6fa076973e05@arm.com>
List-Id: Gcc-patches mailing list
Hi,

So I've changed the approach from the RFC as suggested, moving the bitfield
lowering to the if-convert pass.

To reiterate: ifcvt will now lower COMPONENT_REFs of DECL_BIT_FIELD fields to
either BIT_FIELD_REF, if they are reads, or BIT_INSERT_EXPR, if they are
writes, using loads and stores of 'representatives' that are big enough to
contain the bitfield value.  In vect_recog I added two patterns that replace
these BIT_FIELD_REFs and BIT_INSERT_EXPRs with shifts and masks as
appropriate; a small standalone C model of the two rewrites follows the
ChangeLog below.

I'd also like to see whether it is possible to remove the 'load' part of a
BIT_INSERT_EXPR when the representative write does not change any relevant
bits.  For example:

struct s
{
  int dont_care;
  char a : 3;
};

s.a = ;

should not require a load & write cycle; in fact it wouldn't require any
masking either.  To achieve this, though, we'd need to make sure the
representative does not overlap with any other field.  Any suggestions on how
to do this would be welcome, but I don't think we need to wait for it, as it
is merely a nice-to-have optimization.

I am also not sure where I should document this change of behaviour in ifcvt,
and whether we should rename the pass, since it now does more than
if-conversion.

Bootstrapped and regression tested this patch on aarch64-none-linux-gnu.

gcc/ChangeLog:

2022-08-08  Andre Vieira

	* tree-if-conv.cc (includes): Add expr.h and langhooks.h to list
	of includes.
	(need_to_lower_bitfields): New static bool.
	(need_to_ifcvt): Likewise.
	(version_loop_for_if_conversion): Adapt to work for bitfield
	lowering-only path.
	(bitfield_data_t): New typedef.
	(get_bitfield_data): New function.
	(lower_bitfield): New function.
	(bitfields_to_lower_p): New function.
	(tree_if_conversion): Change to lower bitfields too.
	* tree-vect-data-refs.cc (vect_find_stmt_data_reference): Modify
	dump message to be more accurate.
	* tree-vect-patterns.cc (includes): Add gimplify-me.h include.
	(vect_recog_bitfield_ref_pattern): New function.
	(vect_recog_bit_insert_pattern): New function.
	(vect_vect_recog_func_ptrs): Add the two new patterns.

gcc/testsuite/ChangeLog:

2022-08-08  Andre Vieira

	* gcc.dg/vect/vect-bitfield-read-1.c: New test.
	* gcc.dg/vect/vect-bitfield-read-2.c: New test.
	* gcc.dg/vect/vect-bitfield-read-3.c: New test.
	* gcc.dg/vect/vect-bitfield-read-4.c: New test.
	* gcc.dg/vect/vect-bitfield-write-1.c: New test.
	* gcc.dg/vect/vect-bitfield-write-2.c: New test.
	* gcc.dg/vect/vect-bitfield-write-3.c: New test.
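To make the rewrites concrete, here is a small self-contained C model of what
the two new patterns compute (the model_* names are mine, purely for
illustration; the vectorizer of course emits GIMPLE, not C).  It assumes
bitsize is smaller than the width of the representative, so the mask
computation below is well defined:

#include <stdio.h>

/* Model of the read pattern: BIT_FIELD_REF (rep, bitsize, bitpos) becomes
   a right shift followed by a mask.  */
static unsigned
model_bit_field_ref (unsigned rep, unsigned bitsize, unsigned bitpos)
{
  return (rep >> bitpos) & ((1u << bitsize) - 1);
}

/* Model of the write pattern: BIT_INSERT_EXPR (rep, val, bitpos) becomes
   clear the irrelevant bits of the value, shift it into place, clear the
   destination bits in the representative, then OR the two together.  */
static unsigned
model_bit_insert (unsigned rep, unsigned val, unsigned bitsize,
		  unsigned bitpos)
{
  unsigned mask = (1u << bitsize) - 1;
  val &= mask;                  /* patt1 = _2 & mask */
  val <<= bitpos;               /* patt2 = patt1 << bitpos */
  rep &= ~(mask << bitpos);     /* patt3 = _1 & ~(mask << bitpos) */
  return rep | val;             /* _3 = patt3 | patt2 */
}

int
main (void)
{
  /* Write 5 into a hypothetical 4-bit field at bit offset 8 of an all-ones
     representative, then read it back; prints 5.  */
  unsigned rep = model_bit_insert (0xffffffffu, 5, 4, 8);
  printf ("%u\n", model_bit_field_ref (rep, 4, 8));
  return 0;
}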
Kind regards,
Andre

diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..01cf34fb44484ca926ca5de99eef76dd99b69e92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define N 32
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].i;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..1a4a1579c1478b9407ad21b19e8fbdca9f674b42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..216611a29fd8bbfbafdbdb79d790e520f44ba672
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+#include <stdbool.h>
+
+extern void abort(void);
+
+typedef struct {
+    int  c;
+    int  b;
+    bool a : 1;
+} struct_t;
+
+#define N 16
+#define ELT_F { 0xFFFFFFFF, 0xFFFFFFFF, 0 }
+#define ELT_T { 0xFFFFFFFF, 0xFFFFFFFF, 1 }
+
+struct_t vect_false[N] = { ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F };
+struct_t vect_true[N]  = { ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F,
+			   ELT_F, ELT_F, ELT_T, ELT_F, ELT_F, ELT_F, ELT_F, ELT_F };
+int main (void)
+{
+  unsigned ret = 0;
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_false[i].a;
+  }
+  if (ret)
+    abort ();
+
+  for (unsigned i = 0; i < N; i++)
+  {
+      ret |= vect_true[i].a;
+  }
+  if (!ret)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5bc9c412e9616aefcbf49a4518f1603380a54b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 3, 0}
+#define ELT1 {0x7FFFFFFFUL, 3, 1}
+#define ELT2 {0x7FFFFFFFUL, 3, 2}
+#define ELT3 {0x7FFFFFFFUL, 3, 3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      res += ptr[i].a;
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..19683d277b1ade1034496136f1d03bb2b446900f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].i = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].i != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d550dd35ab75eb67f6e53f89fbf55b7315e50bc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3303d2610ff972d986be172962c129634ee64254
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char x : 2;
+    char a : 4;
+};
+
+#define N 32
+#define V 5
+struct s A[N];
+
+void __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    for (int i = 0; i < n; ++i)
+      ptr[i].a = V;
+}
+
+void __attribute__ ((noipa))
+check_f(struct s *ptr) {
+    for (unsigned i = 0; i < N; ++i)
+      if (ptr[i].a != V)
+	abort ();
+}
+
+int main (void)
+{
+  check_vect ();
+  __builtin_memset (&A[0], 0, sizeof(struct s) * N);
+
+  f(&A[0], N);
+  check_f (&A[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 1c8e1a45234b8c3565edaacd55abbee23d8ea240..4070fa2f45970e564f13de794707613356cb5045 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "ssa.h"
 #include "expmed.h"
+#include "expr.h"
 #include "optabs-query.h"
 #include "gimple-pretty-print.h"
 #include "alias.h"
@@ -123,6 +124,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 
+/* For lang_hooks.types.type_for_mode.  */
+#include "langhooks.h"
+
 /* Only handle PHIs with no more arguments unless we are asked to by
    simd pragma.  */
 #define MAX_PHI_ARG_NUM \
@@ -145,6 +149,12 @@ static bool need_to_rewrite_undefined;
    before phi_convertible_by_degenerating_args.  */
 static bool any_complicated_phi;
 
+/* True if we have bitfield accesses we can lower.  */
+static bool need_to_lower_bitfields;
+
+/* True if there is any ifcvting to be done.  */
+static bool need_to_ifcvt;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
    free the memory.  */
 
@@ -2898,18 +2908,22 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
   class loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
-  unsigned int save_length;
+  unsigned int save_length = 0;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
				   build_int_cst (integer_type_node, loop->num),
				   integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds = NULL;
+  if (any_complicated_phi || need_to_predicate)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
@@ -2921,8 +2935,9 @@ version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
				   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (any_complicated_phi || need_to_predicate)
+    for (unsigned i = 0; i < save_length; i++)
+      ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
     return NULL;
@@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (num <= 2 || loop->inner)
     return false;
 
   body = get_loop_body (loop);
@@ -3259,6 +3274,225 @@ ifcvt_hoist_invariants (class loop *loop, edge pe)
   free (body);
 }
 
+typedef struct
+{
+  scalar_int_mode best_mode;
+  tree struct_expr;
+  tree bf_type;
+  tree offset;
+  poly_int64 bitpos;
+  bool write;
+  gassign *stmt;
+} bitfield_data_t;
+
+/* Return TRUE if we can lower the bitfield in STMT.  Fill DATA with the
+   relevant information required to lower this bitfield.  */
+
+static bool
+get_bitfield_data (gassign *stmt, bool write, bitfield_data_t *data)
+{
+  poly_uint64 bitstart, bitend;
+  scalar_int_mode best_mode;
+  tree comp_ref = write ? gimple_get_lhs (stmt)
+			: gimple_assign_rhs1 (stmt);
+  tree struct_expr = TREE_OPERAND (comp_ref, 0);
+  tree field_decl = TREE_OPERAND (comp_ref, 1);
+  tree bf_type = TREE_TYPE (field_decl);
+  poly_int64 bitpos
+    = tree_to_poly_int64 (DECL_FIELD_BIT_OFFSET (field_decl));
+  unsigned HOST_WIDE_INT bitsize = TYPE_PRECISION (bf_type);
+  tree offset = DECL_FIELD_OFFSET (field_decl);
+  /* BITSTART and BITEND describe the region we can safely load from inside
+     the structure.  BITPOS is the bit position of the value inside the
+     representative that we will end up loading OFFSET bytes from the start
+     of the struct.  BEST_MODE is the mode describing the optimal size of
+     the representative chunk we load.  If this is a write we will store the
+     same sized representative back, after we have changed the appropriate
+     bits.  */
+  get_bit_range (&bitstart, &bitend, comp_ref, &bitpos, &offset);
+  if (get_best_mode (bitsize, bitpos.to_constant (), bitstart, bitend,
+		     TYPE_ALIGN (TREE_TYPE (struct_expr)),
+		     INT_MAX, false, &best_mode))
+    {
+      data->best_mode = best_mode;
+      data->struct_expr = struct_expr;
+      data->bf_type = bf_type;
+      data->offset = offset;
+      data->bitpos = bitpos;
+      data->write = write;
+      data->stmt = stmt;
+      return true;
+    }
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\t\tCan not lower Bitfield, could not determine"
+			  " best mode.\n");
+    }
+  return false;
+}
+
+/* Lowers the bitfield described by DATA.
+   For a write like:
+
+     struct.bf = _1;
+
+   lower to:
+
+     __ifc_1 = struct.<representative>;
+     __ifc_2 = BIT_INSERT_EXPR (__ifc_1, _1, bitpos);
+     struct.<representative> = __ifc_2;
+
+   For a read:
+
+     _1 = struct.bf;
+
+   lower to:
+
+     __ifc_1 = struct.<representative>;
+     _1 = BIT_FIELD_REF (__ifc_1, bitsize, bitpos);
+
+   where <representative> is a legal load that contains the bitfield value,
+   bitsize is the size of the bitfield and bitpos the offset to the start of
+   the bitfield within the representative.  */
+
+static void
+lower_bitfield (bitfield_data_t *data)
+{
+  scalar_int_mode best_mode = data->best_mode;
+  tree struct_expr = data->struct_expr;
+  tree bf_type = data->bf_type;
+  tree offset = data->offset;
+  poly_int64 bitpos = data->bitpos;
+  bool write = data->write;
+  gassign *stmt = data->stmt;
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  /* Type of the representative.  */
+  tree rep_type
+    = lang_hooks.types.type_for_mode (best_mode, TYPE_UNSIGNED (bf_type));
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Lowering:\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+      fprintf (dump_file, "to:\n");
+    }
+
+  tree rep_decl = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+			      NULL_TREE, rep_type);
+  /* Load from the start of 'offset + bitpos % alignment'.  */
+  uint64_t extra_offset = bitpos.to_constant ();
+  extra_offset /= TYPE_ALIGN (bf_type);
+  extra_offset *= TYPE_ALIGN (bf_type);
+  offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset,
+			build_int_cst (TREE_TYPE (offset),
+				       extra_offset / BITS_PER_UNIT));
+  /* Adapt the BITPOS to reflect the number of bits between the start of the
+     load and the start of the bitfield value.  */
+  bitpos -= extra_offset;
+  DECL_FIELD_BIT_OFFSET (rep_decl) = build_zero_cst (bitsizetype);
+  DECL_FIELD_OFFSET (rep_decl) = offset;
+  DECL_SIZE (rep_decl) = TYPE_SIZE (rep_type);
+  DECL_CONTEXT (rep_decl) = TREE_TYPE (struct_expr);
+  tree bitpos_tree = build_int_cst (bitsizetype, bitpos);
+  /* REP_COMP_REF is a COMPONENT_REF for the representative.  */
+  tree rep_comp_ref = build3 (COMPONENT_REF, rep_type, struct_expr, rep_decl,
+			      NULL_TREE);
+  tree new_val = ifc_temp_var (rep_type, rep_comp_ref, &gsi);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0, TDF_SLIM);
+
+  if (write)
+    {
+      new_val = ifc_temp_var (rep_type,
+			      build3 (BIT_INSERT_EXPR, rep_type, new_val,
+				      unshare_expr (gimple_assign_rhs1 (stmt)),
+				      bitpos_tree), &gsi);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0,
+			   TDF_SLIM);
+
+      gimple *new_stmt = gimple_build_assign (unshare_expr (rep_comp_ref),
+					      new_val);
+      gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+      tree vdef = gimple_vdef (stmt);
+      gimple_set_vdef (new_stmt, vdef);
+      SSA_NAME_DEF_STMT (vdef) = new_stmt;
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+  else
+    {
+      tree bfr = build3 (BIT_FIELD_REF, bf_type, new_val,
+			 build_int_cst (bitsizetype,
+					TYPE_PRECISION (bf_type)),
+			 bitpos_tree);
+      new_val = ifc_temp_var (bf_type, bfr, &gsi);
+      redundant_ssa_names.safe_push (std::make_pair (gimple_get_lhs (stmt),
+						     new_val));
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	print_gimple_stmt (dump_file, SSA_NAME_DEF_STMT (new_val), 0,
+			   TDF_SLIM);
+    }
+
+  gsi_remove (&gsi, true);
+}
+
+/* Return TRUE if there are bitfields to lower in this LOOP.  Fill TO_LOWER
+   with data structures representing these bitfields.  */
+
+static bool
+bitfields_to_lower_p (class loop *loop, auto_vec <bitfield_data_t *> *to_lower)
+{
+  basic_block *bbs = get_loop_body (loop);
+  gimple_stmt_iterator gsi;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Analyzing loop %d for bitfields:\n", loop->num);
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; ++i)
+    {
+      basic_block bb = bbs[i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gassign *stmt = dyn_cast <gassign *> (gsi_stmt (gsi));
+	  if (!stmt)
+	    continue;
+
+	  tree op = gimple_get_lhs (stmt);
+	  bool write = TREE_CODE (op) == COMPONENT_REF;
+
+	  if (!write)
+	    op = gimple_assign_rhs1 (stmt);
+
+	  if (TREE_CODE (op) != COMPONENT_REF)
+	    continue;
+
+	  if (DECL_BIT_FIELD (TREE_OPERAND (op, 1)))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+
+	      bitfield_data_t *data = new bitfield_data_t ();
+	      if (get_bitfield_data (stmt, write, data))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "\tBitfield OK to lower.\n");
+		  to_lower->safe_push (data);
+		}
+	      else
+		{
+		  delete data;
+		  return false;
+		}
+	    }
+	}
+    }
+  return !to_lower->is_empty ();
+}
+
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3269,12 +3503,15 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   unsigned int todo = 0;
   bool aggressive_if_conv;
   class loop *rloop;
+  auto_vec <bitfield_data_t *> bitfields_to_lower;
   bitmap exit_bbs;
   edge pe;
 
  again:
   rloop = NULL;
   ifc_bbs = NULL;
+  need_to_lower_bitfields = false;
+  need_to_ifcvt = false;
   need_to_predicate = false;
   need_to_rewrite_undefined = false;
   any_complicated_phi = false;
@@ -3290,11 +3527,17 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
       aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  if (!single_exit (loop))
+    goto cleanup;
+
+  need_to_lower_bitfields = bitfields_to_lower_p (loop, &bitfields_to_lower);
+  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
+      && !need_to_lower_bitfields)
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
-      || !dbg_cnt (if_conversion_tree))
+  need_to_ifcvt
+    = if_convertible_loop_p (loop) && dbg_cnt (if_conversion_tree);
+  if (!need_to_ifcvt && !need_to_lower_bitfields)
     goto cleanup;
 
   if ((need_to_predicate || any_complicated_phi)
@@ -3310,7 +3553,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
      still if-convert the original inner loop.  */
-  if (need_to_predicate
+  if (need_to_lower_bitfields
+      || need_to_predicate
       || any_complicated_phi
       || flag_tree_loop_if_convert != 1)
     {
@@ -3350,10 +3594,32 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
       pe = single_pred_edge (gimple_bb (preds->last ()));
     }
 
-  /* Now all statements are if-convertible.  Combine all the basic
-     blocks into one huge basic block doing the if-conversion
-     on-the-fly.  */
-  combine_blocks (loop);
+  if (need_to_lower_bitfields)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "-------------------------\n");
+	  fprintf (dump_file, "Start lowering bitfields\n");
+	}
+      while (!bitfields_to_lower.is_empty ())
+	{
+	  bitfield_data_t *data = bitfields_to_lower.pop ();
+	  lower_bitfield (data);
+	  delete data;
+	}
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Done lowering bitfields\n");
+	  fprintf (dump_file, "-------------------------\n");
+	}
+    }
+  if (need_to_ifcvt)
+    {
+      /* Now all statements are if-convertible.  Combine all the basic
	 blocks into one huge basic block doing the if-conversion
	 on-the-fly.  */
+      combine_blocks (loop);
+    }
 
   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
      and stores are involved.  CSE only the loop body, not the entry
@@ -3395,6 +3661,11 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
       loop = rloop;
       goto again;
     }
+  while (!bitfields_to_lower.is_empty ())
+    {
+      bitfield_data_t *data = bitfields_to_lower.pop ();
+      delete data;
+    }
 
   return todo;
 }
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index b279a82551eb70379804d405983ae5dc44b66bf5..e93cdc727da4bb7863b2ad13f29f7d550492adea 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -4301,7 +4301,8 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
       free_data_ref (dr);
       return opt_result::failure_at (stmt,
				     "not vectorized:"
-				     " statement is bitfield access %G", stmt);
+				     " statement is an unsupported"
+				     " bitfield access %G", stmt);
     }
 
   if (DR_BASE_ADDRESS (dr)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfbfb71b3c69a0205ccc1b287cb50fa02a70942e..435b75f860784a929041d5214d39c876c5ba790a 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimplify-me.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1828,6 +1829,204 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+/* Function vect_recog_bitfield_ref_pattern
+
+   Try to find the following pattern:
+
+   _2 = BIT_FIELD_REF (_1, bitsize, bitpos);
+   _3 = (type) _2;
+
+   where type is a non-bitfield type, that is to say, its precision matches
+   2^(TYPE_SIZE(type) - (TYPE_UNSIGNED (type) ? 1 : 2)).
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with:
+   _3 = (type) _2;
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern.  In this case it will be:
+   patt1 = (type) _1;
+   patt2 = patt1 >> bitpos;
+   _3 = patt2 & ((1 << bitsize) - 1);
+
+*/
+
+static gimple *
+vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+				 tree *type_out)
+{
+  gassign *nop_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!nop_stmt
+      || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR
+      || TREE_CODE (gimple_assign_rhs1 (nop_stmt)) != SSA_NAME)
+    return NULL;
+
+  gassign *bf_stmt
+    = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (gimple_assign_rhs1 (nop_stmt)));
+
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+    return NULL;
+
+  tree bf_ref = gimple_assign_rhs1 (bf_stmt);
+
+  tree load = TREE_OPERAND (bf_ref, 0);
+  tree size = TREE_OPERAND (bf_ref, 1);
+  tree offset = TREE_OPERAND (bf_ref, 2);
+
+  /* Bail out if the load is already a vector type.  */
+  if (VECTOR_TYPE_P (TREE_TYPE (load)))
+    return NULL;
+
+
+  gimple *pattern_stmt;
+  tree lhs = load;
+  tree ret_type = TREE_TYPE (gimple_get_lhs (nop_stmt));
+
+  if (!useless_type_conversion_p (TREE_TYPE (lhs), ret_type))
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (ret_type, NULL),
+			       NOP_EXPR, lhs);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset);
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs),
+							NULL),
+			       RSHIFT_EXPR, lhs, offset);
+      lhs = gimple_get_lhs (pattern_stmt);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+    }
+
+  unsigned HOST_WIDE_INT mask_i = tree_to_uhwi (size);
+  tree mask = build_int_cst (TREE_TYPE (lhs), (1ULL << mask_i) - 1);
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL),
+			   BIT_AND_EXPR, lhs, mask);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_field_ref pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+/* Function vect_recog_bit_insert_pattern
+
+   Try to find the following pattern:
+
+   _3 = BIT_INSERT_EXPR (_1, _2, bitpos);
+
+   Input:
+
+   * STMT_VINFO: The stmt we want to replace.
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern.  In this case it will be:
+   patt1 = _2 & mask;		   // Clearing of the non-relevant bits in the
+				   // 'to-write value'.
+   patt2 = patt1 << bitpos;	   // Shift the cleaned value in to place.
+   patt3 = _1 & ~(mask << bitpos); // Clearing the bits we want to write to,
+				   // from the value we want to write to.
+   _3 = patt3 | patt2;		   // Write bits.
+
+
+   where mask = ((1 << TYPE_PRECISION (_2)) - 1), a mask to keep the number
+   of bits corresponding to the real size of the bitfield value we are
+   writing to.
+
+*/
+
+static gimple *
+vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
+			       tree *type_out)
+{
+  gassign *bf_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!bf_stmt || gimple_assign_rhs_code (bf_stmt) != BIT_INSERT_EXPR)
+    return NULL;
+
+  tree load = gimple_assign_rhs1 (bf_stmt);
+  tree value = gimple_assign_rhs2 (bf_stmt);
+  tree offset = gimple_assign_rhs3 (bf_stmt);
+
+  tree bf_type = TREE_TYPE (value);
+  tree load_type = TREE_TYPE (load);
+
+  /* Bail out if the load is already of vector type.  */
+  if (VECTOR_TYPE_P (load_type))
+    return NULL;
+
+  gimple *pattern_stmt;
+
+  if (CONSTANT_CLASS_P (value))
+    value = fold_build1 (NOP_EXPR, load_type, value);
+  else
+    {
+      if (TREE_CODE (value) != SSA_NAME)
+	return NULL;
+      gassign *nop_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (value));
+      if (!nop_stmt || gimple_assign_rhs_code (nop_stmt) != NOP_EXPR)
+	return NULL;
+      if (!useless_type_conversion_p (TREE_TYPE (value), load_type))
+	{
+	  value = fold_build1 (NOP_EXPR, load_type,
			       gimple_assign_rhs1 (nop_stmt));
+	  pattern_stmt
+	    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+				   value);
+	  value = gimple_get_lhs (pattern_stmt);
+	  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+	}
+    }
+
+  unsigned HOST_WIDE_INT mask_i = (1ULL << TYPE_PRECISION (bf_type)) - 1;
+  tree mask_t = build_int_cst (load_type, mask_i);
+  /* Clear bits we don't want to write back from value and shift it in
+     place.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   fold_build2 (BIT_AND_EXPR, load_type, value,
+					mask_t));
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+  value = gimple_get_lhs (pattern_stmt);
+  unsigned HOST_WIDE_INT shift_n = tree_to_uhwi (offset);
+  if (shift_n)
+    {
+      pattern_stmt
+	= gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			       LSHIFT_EXPR, value, offset);
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+      value = gimple_get_lhs (pattern_stmt);
+    }
+  /* Mask off the bits in the loaded value.  */
+  mask_i <<= shift_n;
+  mask_i = ~mask_i;
+  mask_t = build_int_cst (load_type, mask_i);
+
+  tree lhs = vect_recog_temp_ssa_var (load_type, NULL);
+  pattern_stmt = gimple_build_assign (lhs, BIT_AND_EXPR, load, mask_t);
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
+
+  /* Compose the value to write back.  */
+  pattern_stmt
+    = gimple_build_assign (vect_recog_temp_ssa_var (load_type, NULL),
+			   BIT_IOR_EXPR, lhs, value);
+
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  vect_pattern_detected ("bit_insert pattern", stmt_info->stmt);
+
+  return pattern_stmt;
+}
+
+
 /* Recognize cases in which an operation is performed in one type WTYPE
    but could be done more efficiently in a narrower type NTYPE.  For example,
    if we have:
@@ -5623,6 +5822,8 @@ struct vect_recog_func
    taken which means usually the more complex one needs to preceed the
    less comples onex (widen_sum only after dot_prod or sad for example).  */
 static vect_recog_func vect_vect_recog_func_ptrs[] = {
+  { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
+  { vect_recog_bit_insert_pattern, "bit_insert" },
   { vect_recog_over_widening_pattern, "over_widening" },
   /* Must come after over_widening, which narrows the shift as much
      as possible beforehand.  */