From patchwork Wed Apr 19 08:51:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Jelinek X-Patchwork-Id: 85261 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp225536vqo; Wed, 19 Apr 2023 01:52:33 -0700 (PDT) X-Google-Smtp-Source: AKy350afLhjZ3CYeGQw5OjZCorL7rI4bi9G6HYi2OWizHgT64pFZKeo0pPUx3MATZM5SKi7Sd8FE X-Received: by 2002:a17:907:2d1f:b0:94f:322d:909c with SMTP id gs31-20020a1709072d1f00b0094f322d909cmr13329064ejc.34.1681894352944; Wed, 19 Apr 2023 01:52:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1681894352; cv=none; d=google.com; s=arc-20160816; b=yKHKaHhbt0hR4r07qnjcXDk+nuyeATIokoW9ozdP/p4+mWY+8GJZKflii0i4jX4mpY u93LbVmk9Z4E25Q2d4kWIlruH62GbHMLE9Nx3c/L/hwdhJS8T9TWesb6cfH8Jr+8qDNa O4zSeCeerDjWK84+tPCpCa3Kw1vk/KSgmPm+rTUUCWMKkMyyzXL3qRPIsW/5MgKjD2zX c47oNzX+oRx3+oB41dh53ebQsoTUqscPSMXhXIIvA250Vr06GvmqqR0IlGEyWDLCETvN A9oHHHB4Dm7QuMU2/AUZ4fxwwrFyryjfbjFbTGa5aFI40lhWEWzaJHi3XoTF+QY0tNdS YhqA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:from:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence :content-disposition:mime-version:message-id:subject:cc:to:date :dmarc-filter:delivered-to:dkim-signature:dkim-filter; bh=qskINQd1H3gPHm6gYptRqFAtQtHcSljeyRl/OIUj1eI=; b=gIwE7D7jYNnmnfV9lhsiPv8wWpQZ+xZgesbhtLCSToC66D7MFwqfIcFpHyKOgTGtJT R+ymeNhy+DLcwvvek/qAq6WXtJrwotHx6PLeRmAWyl2CvNZO2A8GlI/zQ7u0M+mMRdf2 TaB0G2hk5UArOCRcxbe1LhSPUbldWjG80gYJo9Ssb/3qqiy09Mo+z2kHBKyvVyBZFjwq H99UcX0IEfXGv5Zjt/2LqJq/zZfEyfHrtJ2DLDeKTTzXD392Uq0bxiI9bXJUJS6RTkCF XzkiombgOSYU8wqFFnGII7A0TdDim2P+QOdgworCSfhfDw6VUy6rCUXjFB5ecO24Wb1m jrKA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=lddYmmD1; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id x4-20020aa7dac4000000b0050698dda5bdsi7373770eds.66.2023.04.19.01.52.32 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 Apr 2023 01:52:32 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=lddYmmD1; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6A8613858404 for ; Wed, 19 Apr 2023 08:52:22 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6A8613858404 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1681894342; bh=qskINQd1H3gPHm6gYptRqFAtQtHcSljeyRl/OIUj1eI=; h=Date:To:Cc:Subject:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:From; b=lddYmmD1Ni3Q4NasiuMliCOKxdqznB31OucShI/BMiFJcZLFZMyqjo3To4DdmnSYL CKju22OXzePRczBKuMmhFvHR+XqyidnjoOLUQTGpRSu83hTigaJHMr2BoJeQ5AAP0h pFNTo8YpSoIwDlqxo3Se+I1k/7ts3/YH+yghUGwU= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id AAD873857716 for ; Wed, 19 Apr 2023 08:51:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AAD873857716 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-611-J1PbBFdaPBe0JXudXtuzDA-1; Wed, 19 Apr 2023 04:51:23 -0400 X-MC-Unique: J1PbBFdaPBe0JXudXtuzDA-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9402B185A78F; Wed, 19 Apr 2023 08:51:23 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.194.25]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 5936140C201F; Wed, 19 Apr 2023 08:51:23 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 33J8pK8v2502843 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Wed, 19 Apr 2023 10:51:21 +0200 Received: (from jakub@localhost) by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 33J8pKIt2502842; Wed, 19 Apr 2023 10:51:20 +0200 Date: Wed, 19 Apr 2023 10:51:19 +0200 To: Richard Biener Cc: gcc-patches@gcc.gnu.org Subject: [PATCH] tree-vect-patterns: Improve __builtin_{clz,ctz,ffs}ll vectorization [PR109011] Message-ID: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Disposition: inline X-Spam-Status: No, score=-3.4 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Jakub Jelinek via Gcc-patches From: Jakub Jelinek Reply-To: Jakub Jelinek Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1763594053244060861?= X-GMAIL-MSGID: =?utf-8?q?1763594053244060861?= Hi! For __builtin_popcountll tree-vect-patterns.cc has vect_recog_popcount_pattern, which improves the vectorized code. Without that the vectorization is always multi-type vectorization in the loop (at least int and long long types) where we emit two .POPCOUNT calls with long long arguments and int return value and then widen to long long, so effectively after vectorization do the V?DImode -> V?DImode popcount twice, then pack the result into V?SImode and immediately unpack. The following patch extends that handling to __builtin_{clz,ctz,ffs}ll builtins as well (as long as there is an optab for them; more to come laster). Bootstrapped/regtested on x86_64-linux and i686-linux, plus tested on the testcase in crosses to powerpc64le-linux and s390x-linux. Ok for trunk? x86 can do __builtin_popcountll with -mavx512vpopcntdq, __builtin_clzll with -mavx512cd, ppc can do __builtin_popcountll and __builtin_clzll with -mpower8-vector and __builtin_ctzll with -mpower9-vector, s390 can do __builtin_{popcount,clz,ctz}ll with -march=z13 -mzarch (i.e. VX). 2023-04-19 Jakub Jelinek PR tree-optimization/109011 * tree-vect-patterns.cc (vect_recog_popcount_pattern): Rename to ... (vect_recog_popcount_clz_ctz_ffs_pattern): ... this. Handle also CLZ, CTZ and FFS. Remove vargs variable, use gimple_build_call_internal rather than gimple_build_call_internal_vec. (vect_vect_recog_func_ptrs): Adjust popcount entry. * gcc.dg/vect/pr109011-1.c: New test. Jakub --- gcc/tree-vect-patterns.cc.jj 2023-03-01 09:51:27.995362601 +0100 +++ gcc/tree-vect-patterns.cc 2023-04-18 17:16:42.733935262 +0200 @@ -1501,7 +1501,7 @@ vect_recog_widen_minus_pattern (vec_info "vect_recog_widen_minus_pattern"); } -/* Function vect_recog_popcount_pattern +/* Function vect_recog_popcount_clz_ctz_ffs_pattern Try to find the following pattern: @@ -1530,16 +1530,20 @@ vect_recog_widen_minus_pattern (vec_info * Return value: A new stmt that will be used to replace the sequence of stmts that constitute the pattern. In this case it will be: B = .POPCOUNT (A); + + Similarly for clz, ctz and ffs. */ static gimple * -vect_recog_popcount_pattern (vec_info *vinfo, - stmt_vec_info stmt_vinfo, tree *type_out) +vect_recog_popcount_clz_ctz_ffs_pattern (vec_info *vinfo, + stmt_vec_info stmt_vinfo, + tree *type_out) { gassign *last_stmt = dyn_cast (stmt_vinfo->stmt); - gimple *popcount_stmt, *pattern_stmt; + gimple *call_stmt, *pattern_stmt; tree rhs_oprnd, rhs_origin, lhs_oprnd, lhs_type, vec_type, new_var; - auto_vec vargs; + internal_fn ifn = IFN_LAST; + int addend = 0; /* Find B = (TYPE1) temp_out. */ if (!last_stmt) @@ -1557,51 +1561,137 @@ vect_recog_popcount_pattern (vec_info *v if (TREE_CODE (rhs_oprnd) != SSA_NAME || !has_single_use (rhs_oprnd)) return NULL; - popcount_stmt = SSA_NAME_DEF_STMT (rhs_oprnd); + call_stmt = SSA_NAME_DEF_STMT (rhs_oprnd); /* Find temp_out = __builtin_popcount{,l,ll} (temp_in); */ - if (!is_gimple_call (popcount_stmt)) + if (!is_gimple_call (call_stmt)) return NULL; - switch (gimple_call_combined_fn (popcount_stmt)) + switch (gimple_call_combined_fn (call_stmt)) { + int val; CASE_CFN_POPCOUNT: + ifn = IFN_POPCOUNT; + break; + CASE_CFN_CLZ: + ifn = IFN_CLZ; + /* Punt if call result is unsigned and defined value at zero + is negative, as the negative value doesn't extend correctly. */ + if (TYPE_UNSIGNED (TREE_TYPE (rhs_oprnd)) + && gimple_call_internal_p (call_stmt) + && CLZ_DEFINED_VALUE_AT_ZERO + (SCALAR_INT_TYPE_MODE (TREE_TYPE (rhs_oprnd)), val) == 2 + && val < 0) + return NULL; + break; + CASE_CFN_CTZ: + ifn = IFN_CTZ; + /* Punt if call result is unsigned and defined value at zero + is negative, as the negative value doesn't extend correctly. */ + if (TYPE_UNSIGNED (TREE_TYPE (rhs_oprnd)) + && gimple_call_internal_p (call_stmt) + && CTZ_DEFINED_VALUE_AT_ZERO + (SCALAR_INT_TYPE_MODE (TREE_TYPE (rhs_oprnd)), val) == 2 + && val < 0) + return NULL; + break; + CASE_CFN_FFS: + ifn = IFN_FFS; break; default: return NULL; } - if (gimple_call_num_args (popcount_stmt) != 1) + if (gimple_call_num_args (call_stmt) != 1) return NULL; - rhs_oprnd = gimple_call_arg (popcount_stmt, 0); + rhs_oprnd = gimple_call_arg (call_stmt, 0); vect_unpromoted_value unprom_diff; - rhs_origin = vect_look_through_possible_promotion (vinfo, rhs_oprnd, - &unprom_diff); + rhs_origin + = vect_look_through_possible_promotion (vinfo, rhs_oprnd, &unprom_diff); if (!rhs_origin) return NULL; - /* Input and output of .POPCOUNT should be same-precision integer. - Also A should be unsigned or same precision as temp_in, - otherwise there would be sign_extend from A to temp_in. */ - if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (lhs_type) - || (!TYPE_UNSIGNED (unprom_diff.type) - && (TYPE_PRECISION (unprom_diff.type) - != TYPE_PRECISION (TREE_TYPE (rhs_oprnd))))) + /* Input and output of .POPCOUNT should be same-precision integer. */ + if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (lhs_type)) return NULL; - vargs.safe_push (unprom_diff.op); - vect_pattern_detected ("vec_regcog_popcount_pattern", popcount_stmt); + /* Also A should be unsigned or same precision as temp_in, otherwise + different builtins/internal functions have different behaviors. */ + if (TYPE_PRECISION (unprom_diff.type) + != TYPE_PRECISION (TREE_TYPE (rhs_oprnd))) + switch (ifn) + { + case IFN_POPCOUNT: + /* For popcount require zero extension, which doesn't add any + further bits to the count. */ + if (!TYPE_UNSIGNED (unprom_diff.type)) + return NULL; + break; + case IFN_CLZ: + /* clzll (x) == clz (x) + 32 for unsigned x != 0, so ok + if it is undefined at zero or if it matches also for the + defined value there. */ + if (!TYPE_UNSIGNED (unprom_diff.type)) + return NULL; + if (!type_has_mode_precision_p (lhs_type) + || !type_has_mode_precision_p (TREE_TYPE (rhs_oprnd))) + return NULL; + addend = (TYPE_PRECISION (TREE_TYPE (rhs_oprnd)) + - TYPE_PRECISION (lhs_type)); + if (gimple_call_internal_p (call_stmt)) + { + int val1, val2; + int d1 + = CLZ_DEFINED_VALUE_AT_ZERO + (SCALAR_INT_TYPE_MODE (TREE_TYPE (rhs_oprnd)), val1); + int d2 + = CLZ_DEFINED_VALUE_AT_ZERO (SCALAR_INT_TYPE_MODE (lhs_type), + val2); + if (d1 != 2) + break; + if (d2 != 2 || val1 != val2 + addend) + return NULL; + } + break; + case IFN_CTZ: + /* ctzll (x) == ctz (x) for unsigned or signed x != 0, so ok + if it is undefined at zero or if it matches also for the + defined value there. */ + if (gimple_call_internal_p (call_stmt)) + { + int val1, val2; + int d1 + = CTZ_DEFINED_VALUE_AT_ZERO + (SCALAR_INT_TYPE_MODE (TREE_TYPE (rhs_oprnd)), val1); + int d2 + = CTZ_DEFINED_VALUE_AT_ZERO (SCALAR_INT_TYPE_MODE (lhs_type), + val2); + if (d1 != 2) + break; + if (d2 != 2 || val1 != val2) + return NULL; + } + break; + case IFN_FFS: + /* ffsll (x) == ffs (x) for unsigned or signed x. */ + break; + default: + gcc_unreachable (); + } + + vect_pattern_detected ("vec_recog_popcount_clz_ctz_ffs_pattern", + call_stmt); vec_type = get_vectype_for_scalar_type (vinfo, lhs_type); - /* Do it only if the backend has popcount2 pattern. */ + /* Do it only if the backend has popcount2 etc. pattern. */ if (!vec_type - || !direct_internal_fn_supported_p (IFN_POPCOUNT, vec_type, + || !direct_internal_fn_supported_p (ifn, vec_type, OPTIMIZE_FOR_SPEED)) return NULL; /* Create B = .POPCOUNT (A). */ new_var = vect_recog_temp_ssa_var (lhs_type, NULL); - pattern_stmt = gimple_build_call_internal_vec (IFN_POPCOUNT, vargs); + pattern_stmt = gimple_build_call_internal (ifn, 1, unprom_diff.op); gimple_call_set_lhs (pattern_stmt, new_var); gimple_set_location (pattern_stmt, gimple_location (last_stmt)); *type_out = vec_type; @@ -1609,6 +1699,14 @@ vect_recog_popcount_pattern (vec_info *v if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "created pattern stmt: %G", pattern_stmt); + + if (addend) + { + append_pattern_def_seq (vinfo, stmt_vinfo, pattern_stmt, vec_type); + tree ret_var = vect_recog_temp_ssa_var (lhs_type, NULL); + pattern_stmt = gimple_build_assign (ret_var, PLUS_EXPR, new_var, + build_int_cst (lhs_type, addend)); + } return pattern_stmt; } @@ -6051,7 +6149,7 @@ static vect_recog_func vect_vect_recog_f { vect_recog_sad_pattern, "sad" }, { vect_recog_widen_sum_pattern, "widen_sum" }, { vect_recog_pow_pattern, "pow" }, - { vect_recog_popcount_pattern, "popcount" }, + { vect_recog_popcount_clz_ctz_ffs_pattern, "popcount_clz_ctz_ffs" }, { vect_recog_widen_shift_pattern, "widen_shift" }, { vect_recog_rotate_pattern, "rotate" }, { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" }, --- gcc/testsuite/gcc.dg/vect/pr109011-1.c.jj 2023-04-18 14:40:47.117397908 +0200 +++ gcc/testsuite/gcc.dg/vect/pr109011-1.c 2023-04-18 14:40:05.124004362 +0200 @@ -0,0 +1,48 @@ +/* PR tree-optimization/109011 */ +/* { dg-do compile } */ +/* { dg-options "-O3 -fno-unroll-loops --param=vect-epilogues-nomask=0 -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx512cd" { target { { i?86-*-* x86_64-*-* } && avx512cd } } } */ +/* { dg-additional-options "-mavx512vpopcntdq" { target { { i?86-*-* x86_64-*-* } && avx512vpopcntdq } } } */ +/* { dg-additional-options "-mpower8-vector" { target powerpc_p8vector_ok } } */ +/* { dg-additional-options "-mpower9-vector" { target powerpc_p9vector_ok } } */ +/* { dg-additional-options "-march=z13 -mzarch" { target s390_vx } } */ + +void +foo (long long *p, long long *q) +{ +#pragma omp simd + for (int i = 0; i < 2048; ++i) + p[i] = __builtin_popcountll (q[i]); +} + +/* { dg-final { scan-tree-dump-times " = \.POPCOUNT \\\(" 1 "optimized" { target { { i?86-*-* x86_64-*-* } && avx512vpopcntdq } } } } */ +/* { dg-final { scan-tree-dump-times " = \.POPCOUNT \\\(" 1 "optimized" { target { powerpc_p8vector_ok || s390_vx } } } } */ + +void +bar (long long *p, long long *q) +{ +#pragma omp simd + for (int i = 0; i < 2048; ++i) + p[i] = __builtin_clzll (q[i]); +} + +/* { dg-final { scan-tree-dump-times " = \.CLZ \\\(" 1 "optimized" { target { { i?86-*-* x86_64-*-* } && avx512cd } } } } */ +/* { dg-final { scan-tree-dump-times " = \.CLZ \\\(" 1 "optimized" { target { powerpc_p8vector_ok || s390_vx } } } } */ + +void +baz (long long *p, long long *q) +{ +#pragma omp simd + for (int i = 0; i < 2048; ++i) + p[i] = __builtin_ctzll (q[i]); +} + +/* { dg-final { scan-tree-dump-times " = \.CTZ \\\(" 1 "optimized" { target { powerpc_p9vector_ok || s390_vx } } } } */ + +void +qux (long long *p, long long *q) +{ +#pragma omp simd + for (int i = 0; i < 2048; ++i) + p[i] = __builtin_ffsll (q[i]); +}