From patchwork Wed Oct 18 09:20:04 2023
X-Patchwork-Submitter: Robin Dapp
X-Patchwork-Id: 154770
Message-ID: <01d52528-7616-422a-8e8f-6073af049b39@gmail.com>
Date: Wed, 18 Oct 2023 11:20:04 +0200
From: Robin Dapp
Subject: [PATCH] RISC-V: Add popcount fallback expander.
To: gcc-patches, palmer, Kito Cheng, jeffreyalaw, "juzhe.zhong@rivai.ai"
Cc: rdapp.gcc@gmail.com

Hi,

as I didn't manage to get back to the generic vectorizer fallback for
popcount in time (the generic costing problem is still open), I figured
I'd rather implement the popcount fallback in the RISC-V backend.  It
uses the Wilkes-Wheeler-Gill (WWG) algorithm from libgcc.

rvv.exp is unchanged; the vect and dg.exp testsuites are currently
running.

Regards
 Robin

gcc/ChangeLog:

        * config/riscv/autovec.md (popcount<mode>2): New expander.
        * config/riscv/riscv-protos.h (expand_popcount): Define.
        * config/riscv/riscv-v.cc (expand_popcount): Vectorize popcount
        with the WWG algorithm.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/unop/popcount-1.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/unop/popcount.c: New test.
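To make the transformation easier to follow, here is a minimal scalar sketch
of the Wilkes-Wheeler-Gill steps that the expander applies per vector element
(assuming 64-bit elements; the helper name popcount64_wwg is illustrative
only and not part of the patch):

  #include <stdint.h>

  /* Scalar sketch of the Wilkes-Wheeler-Gill bit-counting steps; the
     expander applies the same operations to whole vector registers.  */
  static inline uint64_t
  popcount64_wwg (uint64_t x)
  {
    x = x - ((x >> 1) & 0x5555555555555555ULL);   /* 2-bit partial counts.  */
    x = (x & 0x3333333333333333ULL)
        + ((x >> 2) & 0x3333333333333333ULL);     /* 4-bit partial counts.  */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;   /* 8-bit partial counts.  */
    return (x * 0x0101010101010101ULL) >> 56;     /* Sum the byte counts.  */
  }

For narrower element types the masks are truncated to the element mode and
the final shift is the element size minus 8 rather than 56, which is what
GET_MODE_BITSIZE (imode) - 8 computes in the expander below.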
---
 gcc/config/riscv/autovec.md                        |   14 +
 gcc/config/riscv/riscv-protos.h                    |    1 +
 gcc/config/riscv/riscv-v.cc                        |   71 +
 .../riscv/rvv/autovec/unop/popcount-1.c            |   20 +
 .../riscv/rvv/autovec/unop/popcount-run-1.c        |   49 +
 .../riscv/rvv/autovec/unop/popcount.c              | 1464 +++++++++++++++++
 6 files changed, 1619 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index c5b1e52cbf9..dfe836f705d 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1484,6 +1484,20 @@ (define_expand "xorsign<mode>3"
   DONE;
 })
 
+;; -------------------------------------------------------------------------------
+;; - [INT] POPCOUNT.
+;; -------------------------------------------------------------------------------
+
+(define_expand "popcount<mode>2"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_popcount (operands);
+  DONE;
+})
+
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT] Highpart multiplication
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 49bdcdf2f93..4aeccdd961b 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -515,6 +515,7 @@ void expand_fold_extract_last (rtx *);
 void expand_cond_unop (unsigned, rtx *);
 void expand_cond_binop (unsigned, rtx *);
 void expand_cond_ternop (unsigned, rtx *);
+void expand_popcount (rtx *);
 
 /* Rounding mode bitfield for fixed point VXRM. */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 21d86c3f917..8b594b7127e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4152,4 +4152,75 @@ expand_vec_lfloor (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
   emit_vec_cvt_x_f (op_0, op_1, UNARY_OP_FRM_RDN, vec_fp_mode);
 }
 
+/* Vectorize popcount by the Wilkes-Wheeler-Gill algorithm that libgcc uses as
+   well.  */
+void
+expand_popcount (rtx *ops)
+{
+  rtx dst = ops[0];
+  rtx src = ops[1];
+  machine_mode mode = GET_MODE (dst);
+  scalar_mode imode = GET_MODE_INNER (mode);
+  static const uint64_t m5 = 0x5555555555555555ULL;
+  static const uint64_t m3 = 0x3333333333333333ULL;
+  static const uint64_t mf = 0x0F0F0F0F0F0F0F0FULL;
+  static const uint64_t m1 = 0x0101010101010101ULL;
+
+  rtx x1 = gen_reg_rtx (mode);
+  rtx x2 = gen_reg_rtx (mode);
+  rtx x3 = gen_reg_rtx (mode);
+  rtx x4 = gen_reg_rtx (mode);
+
+  /* x1 = src - (src >> 1) & 0x555...); */
+  rtx shift1 = expand_binop (mode, lshr_optab, src, GEN_INT (1), NULL, true,
+			     OPTAB_DIRECT);
+
+  rtx and1 = gen_reg_rtx (mode);
+  rtx ops1[] = {and1, shift1, gen_int_mode (m5, imode)};
+  emit_vlmax_insn (code_for_pred_scalar (AND, mode), riscv_vector::BINARY_OP,
+		   ops1);
+
+  x1 = expand_binop (mode, sub_optab, src, and1, NULL, true, OPTAB_DIRECT);
+
+  /* x2 = (x1 & 0x3333333333333333ULL) + ((x1 >> 2) & 0x3333333333333333ULL);
+   */
+  rtx and2 = gen_reg_rtx (mode);
+  rtx ops2[] = {and2, x1, gen_int_mode (m3, imode)};
+  emit_vlmax_insn (code_for_pred_scalar (AND, mode), riscv_vector::BINARY_OP,
+		   ops2);
+
+  rtx shift2 = expand_binop (mode, lshr_optab, x1, GEN_INT (2), NULL, true,
+			     OPTAB_DIRECT);
+
+  rtx and22 = gen_reg_rtx (mode);
+  rtx ops22[] = {and22, shift2, gen_int_mode (m3, imode)};
+  emit_vlmax_insn (code_for_pred_scalar (AND, mode), riscv_vector::BINARY_OP,
+		   ops22);
+
+  x2 = expand_binop (mode, add_optab, and2, and22, NULL, true, OPTAB_DIRECT);
+
+  /* x3 = (x2 + (x2 >> 4)) & 0x0f0f0f0f0f0f0f0fULL; */
+  rtx shift3 = expand_binop (mode, lshr_optab, x2, GEN_INT (4), NULL, true,
+			     OPTAB_DIRECT);
+
+  rtx plus3
+    = expand_binop (mode, add_optab, x2, shift3, NULL, true, OPTAB_DIRECT);
+
+  rtx ops3[] = {x3, plus3, gen_int_mode (mf, imode)};
+  emit_vlmax_insn (code_for_pred_scalar (AND, mode), riscv_vector::BINARY_OP,
+		   ops3);
+
+  /* dest = (x3 * 0x0101010101010101ULL) >> 56; */
+  rtx mul4 = gen_reg_rtx (mode);
+  rtx ops4[] = {mul4, x3, gen_int_mode (m1, imode)};
+  emit_vlmax_insn (code_for_pred_scalar (MULT, mode), riscv_vector::BINARY_OP,
+		   ops4);
+
+  x4 = expand_binop (mode, lshr_optab, mul4,
+		     GEN_INT (GET_MODE_BITSIZE (imode) - 8), NULL, true,
+		     OPTAB_DIRECT);
+
+  emit_move_insn (dst, x4);
+}
+
 } // namespace riscv_vector
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-1.c
new file mode 100644
index 00000000000..3169ebbff71
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d --param=riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+#include <stdint.h>
+
+void __attribute__ ((noipa))
+popcount_32 (uint32_t *restrict dst, uint32_t *restrict src, int size)
+{
+  for (int i = 0; i < size; ++i)
+    dst[i] = __builtin_popcount (src[i]);
+}
+
+void __attribute__ ((noipa))
+popcount_64 (uint64_t *restrict dst, uint64_t *restrict src, int size)
+{
+  for (int i = 0; i < size; ++i)
+    dst[i] = __builtin_popcountll (src[i]);
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c
new file mode 100644
index 00000000000..38f1633da99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c
@@ -0,0 +1,49 @@
+/* { dg-do run {
target { riscv_v } } } */ + +#include "popcount-1.c" + +extern void abort (void) __attribute__ ((noreturn)); + +unsigned int data[] = { + 0x11111100, 6, + 0xe0e0f0f0, 14, + 0x9900aab3, 13, + 0x00040003, 3, + 0x000e000c, 5, + 0x22227777, 16, + 0x12341234, 10, + 0x0, 0 +}; + +int __attribute__ ((optimize (1))) +main (void) +{ + unsigned int count = sizeof (data) / sizeof (data[0]) / 2; + + uint32_t in32[count]; + uint32_t out32[count]; + for (unsigned int i = 0; i < count; ++i) + { + in32[i] = data[i * 2]; + asm volatile ("" ::: "memory"); + } + popcount_32 (out32, in32, count); + for (unsigned int i = 0; i < count; ++i) + if (out32[i] != data[i * 2 + 1]) + abort (); + + count /= 2; + uint64_t in64[count]; + uint64_t out64[count]; + for (unsigned int i = 0; i < count; ++i) + { + in64[i] = ((uint64_t) data[i * 4] << 32) | data[i * 4 + 2]; + asm volatile ("" ::: "memory"); + } + popcount_64 (out64, in64, count); + for (unsigned int i = 0; i < count; ++i) + if (out64[i] != data[i * 4 + 1] + data[i * 4 + 3]) + abort (); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c new file mode 100644 index 00000000000..585a522aa81 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c @@ -0,0 +1,1464 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options { -O2 -fdump-tree-vect-details -fno-vect-cost-model } } */ + +#include "stdint-gcc.h" +#include + +#define DEF64(TYPEDST, TYPESRC) \ + void __attribute__ ((noipa)) \ + popcount64_##TYPEDST##TYPESRC (TYPEDST *restrict dst, TYPESRC *restrict src, \ + int size) \ + { \ + for (int i = 0; i < size; ++i) \ + dst[i] = __builtin_popcountll (src[i]); \ + } + +#define DEF32(TYPEDST, TYPESRC) \ + void __attribute__ ((noipa)) \ + popcount32_##TYPEDST##TYPESRC (TYPEDST *restrict dst, TYPESRC *restrict src, \ + int size) \ + { \ + for (int i = 0; i < size; ++i) \ + dst[i] = __builtin_popcount (src[i]); \ + } + +#define DEFCTZ64(TYPEDST, TYPESRC) \ + void __attribute__ ((noipa)) \ + ctz64_##TYPEDST##TYPESRC (TYPEDST *restrict dst, TYPESRC *restrict src, \ + int size) \ + { \ + for (int i = 0; i < size; ++i) \ + dst[i] = __builtin_ctzll (src[i]); \ + } + +#define DEFCTZ32(TYPEDST, TYPESRC) \ + void __attribute__ ((noipa)) \ + ctz32_##TYPEDST##TYPESRC (TYPEDST *restrict dst, TYPESRC *restrict src, \ + int size) \ + { \ + for (int i = 0; i < size; ++i) \ + dst[i] = __builtin_ctz (src[i]); \ + } + +#define DEFFFS64(TYPEDST, TYPESRC) \ + void __attribute__ ((noipa)) \ + ffs64_##TYPEDST##TYPESRC (TYPEDST *restrict dst, TYPESRC *restrict src, \ + int size) \ + { \ + for (int i = 0; i < size; ++i) \ + dst[i] = __builtin_ffsll (src[i]); \ + } + +#define DEFFFS32(TYPEDST, TYPESRC) \ + void __attribute__ ((noipa)) \ + ffs32_##TYPEDST##TYPESRC (TYPEDST *restrict dst, TYPESRC *restrict src, \ + int size) \ + { \ + for (int i = 0; i < size; ++i) \ + dst[i] = __builtin_ffs (src[i]); \ + } + +#define DEF_ALL() \ + DEF64 (uint64_t, uint64_t) \ + DEF64 (uint64_t, uint32_t) \ + DEF64 (uint64_t, uint16_t) \ + DEF64 (uint64_t, uint8_t) \ + DEF64 (uint64_t, int64_t) \ + DEF64 (uint64_t, int32_t) \ + DEF64 (uint64_t, int16_t) \ + DEF64 (uint64_t, int8_t) \ + DEF64 (int64_t, uint64_t) \ + DEF64 (int64_t, uint32_t) \ + DEF64 (int64_t, uint16_t) \ + DEF64 (int64_t, uint8_t) \ + DEF64 (int64_t, int64_t) \ + DEF64 (int64_t, int32_t) \ + DEF64 (int64_t, int16_t) \ + DEF64 (int64_t, int8_t) \ + DEF64 (uint32_t, uint64_t) \ + DEF64 (uint32_t, uint32_t) \ + 
DEF64 (uint32_t, uint16_t) \ + DEF64 (uint32_t, uint8_t) \ + DEF64 (uint32_t, int64_t) \ + DEF64 (uint32_t, int32_t) \ + DEF64 (uint32_t, int16_t) \ + DEF64 (uint32_t, int8_t) \ + DEF64 (int32_t, uint64_t) \ + DEF64 (int32_t, uint32_t) \ + DEF64 (int32_t, uint16_t) \ + DEF64 (int32_t, uint8_t) \ + DEF64 (int32_t, int64_t) \ + DEF64 (int32_t, int32_t) \ + DEF64 (int32_t, int16_t) \ + DEF64 (int32_t, int8_t) \ + DEF64 (uint16_t, uint64_t) \ + DEF64 (uint16_t, uint32_t) \ + DEF64 (uint16_t, uint16_t) \ + DEF64 (uint16_t, uint8_t) \ + DEF64 (uint16_t, int64_t) \ + DEF64 (uint16_t, int32_t) \ + DEF64 (uint16_t, int16_t) \ + DEF64 (uint16_t, int8_t) \ + DEF64 (int16_t, uint64_t) \ + DEF64 (int16_t, uint32_t) \ + DEF64 (int16_t, uint16_t) \ + DEF64 (int16_t, uint8_t) \ + DEF64 (int16_t, int64_t) \ + DEF64 (int16_t, int32_t) \ + DEF64 (int16_t, int16_t) \ + DEF64 (int16_t, int8_t) \ + DEF64 (uint8_t, uint64_t) \ + DEF64 (uint8_t, uint32_t) \ + DEF64 (uint8_t, uint16_t) \ + DEF64 (uint8_t, uint8_t) \ + DEF64 (uint8_t, int64_t) \ + DEF64 (uint8_t, int32_t) \ + DEF64 (uint8_t, int16_t) \ + DEF64 (uint8_t, int8_t) \ + DEF64 (int8_t, uint64_t) \ + DEF64 (int8_t, uint32_t) \ + DEF64 (int8_t, uint16_t) \ + DEF64 (int8_t, uint8_t) \ + DEF64 (int8_t, int64_t) \ + DEF64 (int8_t, int32_t) \ + DEF64 (int8_t, int16_t) \ + DEF64 (int8_t, int8_t) \ + DEF32 (uint64_t, uint64_t) \ + DEF32 (uint64_t, uint32_t) \ + DEF32 (uint64_t, uint16_t) \ + DEF32 (uint64_t, uint8_t) \ + DEF32 (uint64_t, int64_t) \ + DEF32 (uint64_t, int32_t) \ + DEF32 (uint64_t, int16_t) \ + DEF32 (uint64_t, int8_t) \ + DEF32 (int64_t, uint64_t) \ + DEF32 (int64_t, uint32_t) \ + DEF32 (int64_t, uint16_t) \ + DEF32 (int64_t, uint8_t) \ + DEF32 (int64_t, int64_t) \ + DEF32 (int64_t, int32_t) \ + DEF32 (int64_t, int16_t) \ + DEF32 (int64_t, int8_t) \ + DEF32 (uint32_t, uint64_t) \ + DEF32 (uint32_t, uint32_t) \ + DEF32 (uint32_t, uint16_t) \ + DEF32 (uint32_t, uint8_t) \ + DEF32 (uint32_t, int64_t) \ + DEF32 (uint32_t, int32_t) \ + DEF32 (uint32_t, int16_t) \ + DEF32 (uint32_t, int8_t) \ + DEF32 (int32_t, uint64_t) \ + DEF32 (int32_t, uint32_t) \ + DEF32 (int32_t, uint16_t) \ + DEF32 (int32_t, uint8_t) \ + DEF32 (int32_t, int64_t) \ + DEF32 (int32_t, int32_t) \ + DEF32 (int32_t, int16_t) \ + DEF32 (int32_t, int8_t) \ + DEF32 (uint16_t, uint64_t) \ + DEF32 (uint16_t, uint32_t) \ + DEF32 (uint16_t, uint16_t) \ + DEF32 (uint16_t, uint8_t) \ + DEF32 (uint16_t, int64_t) \ + DEF32 (uint16_t, int32_t) \ + DEF32 (uint16_t, int16_t) \ + DEF32 (uint16_t, int8_t) \ + DEF32 (int16_t, uint64_t) \ + DEF32 (int16_t, uint32_t) \ + DEF32 (int16_t, uint16_t) \ + DEF32 (int16_t, uint8_t) \ + DEF32 (int16_t, int64_t) \ + DEF32 (int16_t, int32_t) \ + DEF32 (int16_t, int16_t) \ + DEF32 (int16_t, int8_t) \ + DEF32 (uint8_t, uint64_t) \ + DEF32 (uint8_t, uint32_t) \ + DEF32 (uint8_t, uint16_t) \ + DEF32 (uint8_t, uint8_t) \ + DEF32 (uint8_t, int64_t) \ + DEF32 (uint8_t, int32_t) \ + DEF32 (uint8_t, int16_t) \ + DEF32 (uint8_t, int8_t) \ + DEF32 (int8_t, uint64_t) \ + DEF32 (int8_t, uint32_t) \ + DEF32 (int8_t, uint16_t) \ + DEF32 (int8_t, uint8_t) \ + DEF32 (int8_t, int64_t) \ + DEF32 (int8_t, int32_t) \ + DEF32 (int8_t, int16_t) \ + DEF32 (int8_t, int8_t) \ + DEFCTZ64 (uint64_t, uint64_t) \ + DEFCTZ64 (uint64_t, uint32_t) \ + DEFCTZ64 (uint64_t, uint16_t) \ + DEFCTZ64 (uint64_t, uint8_t) \ + DEFCTZ64 (uint64_t, int64_t) \ + DEFCTZ64 (uint64_t, int32_t) \ + DEFCTZ64 (uint64_t, int16_t) \ + DEFCTZ64 (uint64_t, int8_t) \ + DEFCTZ64 (int64_t, uint64_t) \ + DEFCTZ64 
(int64_t, uint32_t) \ + DEFCTZ64 (int64_t, uint16_t) \ + DEFCTZ64 (int64_t, uint8_t) \ + DEFCTZ64 (int64_t, int64_t) \ + DEFCTZ64 (int64_t, int32_t) \ + DEFCTZ64 (int64_t, int16_t) \ + DEFCTZ64 (int64_t, int8_t) \ + DEFCTZ64 (uint32_t, uint64_t) \ + DEFCTZ64 (uint32_t, uint32_t) \ + DEFCTZ64 (uint32_t, uint16_t) \ + DEFCTZ64 (uint32_t, uint8_t) \ + DEFCTZ64 (uint32_t, int64_t) \ + DEFCTZ64 (uint32_t, int32_t) \ + DEFCTZ64 (uint32_t, int16_t) \ + DEFCTZ64 (uint32_t, int8_t) \ + DEFCTZ64 (int32_t, uint64_t) \ + DEFCTZ64 (int32_t, uint32_t) \ + DEFCTZ64 (int32_t, uint16_t) \ + DEFCTZ64 (int32_t, uint8_t) \ + DEFCTZ64 (int32_t, int64_t) \ + DEFCTZ64 (int32_t, int32_t) \ + DEFCTZ64 (int32_t, int16_t) \ + DEFCTZ64 (int32_t, int8_t) \ + DEFCTZ64 (uint16_t, uint64_t) \ + DEFCTZ64 (uint16_t, uint32_t) \ + DEFCTZ64 (uint16_t, uint16_t) \ + DEFCTZ64 (uint16_t, uint8_t) \ + DEFCTZ64 (uint16_t, int64_t) \ + DEFCTZ64 (uint16_t, int32_t) \ + DEFCTZ64 (uint16_t, int16_t) \ + DEFCTZ64 (uint16_t, int8_t) \ + DEFCTZ64 (int16_t, uint64_t) \ + DEFCTZ64 (int16_t, uint32_t) \ + DEFCTZ64 (int16_t, uint16_t) \ + DEFCTZ64 (int16_t, uint8_t) \ + DEFCTZ64 (int16_t, int64_t) \ + DEFCTZ64 (int16_t, int32_t) \ + DEFCTZ64 (int16_t, int16_t) \ + DEFCTZ64 (int16_t, int8_t) \ + DEFCTZ64 (uint8_t, uint64_t) \ + DEFCTZ64 (uint8_t, uint32_t) \ + DEFCTZ64 (uint8_t, uint16_t) \ + DEFCTZ64 (uint8_t, uint8_t) \ + DEFCTZ64 (uint8_t, int64_t) \ + DEFCTZ64 (uint8_t, int32_t) \ + DEFCTZ64 (uint8_t, int16_t) \ + DEFCTZ64 (uint8_t, int8_t) \ + DEFCTZ64 (int8_t, uint64_t) \ + DEFCTZ64 (int8_t, uint32_t) \ + DEFCTZ64 (int8_t, uint16_t) \ + DEFCTZ64 (int8_t, uint8_t) \ + DEFCTZ64 (int8_t, int64_t) \ + DEFCTZ64 (int8_t, int32_t) \ + DEFCTZ64 (int8_t, int16_t) \ + DEFCTZ64 (int8_t, int8_t) \ + DEFCTZ32 (uint64_t, uint64_t) \ + DEFCTZ32 (uint64_t, uint32_t) \ + DEFCTZ32 (uint64_t, uint16_t) \ + DEFCTZ32 (uint64_t, uint8_t) \ + DEFCTZ32 (uint64_t, int64_t) \ + DEFCTZ32 (uint64_t, int32_t) \ + DEFCTZ32 (uint64_t, int16_t) \ + DEFCTZ32 (uint64_t, int8_t) \ + DEFCTZ32 (int64_t, uint64_t) \ + DEFCTZ32 (int64_t, uint32_t) \ + DEFCTZ32 (int64_t, uint16_t) \ + DEFCTZ32 (int64_t, uint8_t) \ + DEFCTZ32 (int64_t, int64_t) \ + DEFCTZ32 (int64_t, int32_t) \ + DEFCTZ32 (int64_t, int16_t) \ + DEFCTZ32 (int64_t, int8_t) \ + DEFCTZ32 (uint32_t, uint64_t) \ + DEFCTZ32 (uint32_t, uint32_t) \ + DEFCTZ32 (uint32_t, uint16_t) \ + DEFCTZ32 (uint32_t, uint8_t) \ + DEFCTZ32 (uint32_t, int64_t) \ + DEFCTZ32 (uint32_t, int32_t) \ + DEFCTZ32 (uint32_t, int16_t) \ + DEFCTZ32 (uint32_t, int8_t) \ + DEFCTZ32 (int32_t, uint64_t) \ + DEFCTZ32 (int32_t, uint32_t) \ + DEFCTZ32 (int32_t, uint16_t) \ + DEFCTZ32 (int32_t, uint8_t) \ + DEFCTZ32 (int32_t, int64_t) \ + DEFCTZ32 (int32_t, int32_t) \ + DEFCTZ32 (int32_t, int16_t) \ + DEFCTZ32 (int32_t, int8_t) \ + DEFCTZ32 (uint16_t, uint64_t) \ + DEFCTZ32 (uint16_t, uint32_t) \ + DEFCTZ32 (uint16_t, uint16_t) \ + DEFCTZ32 (uint16_t, uint8_t) \ + DEFCTZ32 (uint16_t, int64_t) \ + DEFCTZ32 (uint16_t, int32_t) \ + DEFCTZ32 (uint16_t, int16_t) \ + DEFCTZ32 (uint16_t, int8_t) \ + DEFCTZ32 (int16_t, uint64_t) \ + DEFCTZ32 (int16_t, uint32_t) \ + DEFCTZ32 (int16_t, uint16_t) \ + DEFCTZ32 (int16_t, uint8_t) \ + DEFCTZ32 (int16_t, int64_t) \ + DEFCTZ32 (int16_t, int32_t) \ + DEFCTZ32 (int16_t, int16_t) \ + DEFCTZ32 (int16_t, int8_t) \ + DEFCTZ32 (uint8_t, uint64_t) \ + DEFCTZ32 (uint8_t, uint32_t) \ + DEFCTZ32 (uint8_t, uint16_t) \ + DEFCTZ32 (uint8_t, uint8_t) \ + DEFCTZ32 (uint8_t, int64_t) \ + DEFCTZ32 (uint8_t, int32_t) \ + DEFCTZ32 
(uint8_t, int16_t) \ + DEFCTZ32 (uint8_t, int8_t) \ + DEFCTZ32 (int8_t, uint64_t) \ + DEFCTZ32 (int8_t, uint32_t) \ + DEFCTZ32 (int8_t, uint16_t) \ + DEFCTZ32 (int8_t, uint8_t) \ + DEFCTZ32 (int8_t, int64_t) \ + DEFCTZ32 (int8_t, int32_t) \ + DEFCTZ32 (int8_t, int16_t) \ + DEFCTZ32 (int8_t, int8_t) \ + DEFFFS64 (uint64_t, uint64_t) \ + DEFFFS64 (uint64_t, uint32_t) \ + DEFFFS64 (uint64_t, uint16_t) \ + DEFFFS64 (uint64_t, uint8_t) \ + DEFFFS64 (uint64_t, int64_t) \ + DEFFFS64 (uint64_t, int32_t) \ + DEFFFS64 (uint64_t, int16_t) \ + DEFFFS64 (uint64_t, int8_t) \ + DEFFFS64 (int64_t, uint64_t) \ + DEFFFS64 (int64_t, uint32_t) \ + DEFFFS64 (int64_t, uint16_t) \ + DEFFFS64 (int64_t, uint8_t) \ + DEFFFS64 (int64_t, int64_t) \ + DEFFFS64 (int64_t, int32_t) \ + DEFFFS64 (int64_t, int16_t) \ + DEFFFS64 (int64_t, int8_t) \ + DEFFFS64 (uint32_t, uint64_t) \ + DEFFFS64 (uint32_t, uint32_t) \ + DEFFFS64 (uint32_t, uint16_t) \ + DEFFFS64 (uint32_t, uint8_t) \ + DEFFFS64 (uint32_t, int64_t) \ + DEFFFS64 (uint32_t, int32_t) \ + DEFFFS64 (uint32_t, int16_t) \ + DEFFFS64 (uint32_t, int8_t) \ + DEFFFS64 (int32_t, uint64_t) \ + DEFFFS64 (int32_t, uint32_t) \ + DEFFFS64 (int32_t, uint16_t) \ + DEFFFS64 (int32_t, uint8_t) \ + DEFFFS64 (int32_t, int64_t) \ + DEFFFS64 (int32_t, int32_t) \ + DEFFFS64 (int32_t, int16_t) \ + DEFFFS64 (int32_t, int8_t) \ + DEFFFS64 (uint16_t, uint64_t) \ + DEFFFS64 (uint16_t, uint32_t) \ + DEFFFS64 (uint16_t, uint16_t) \ + DEFFFS64 (uint16_t, uint8_t) \ + DEFFFS64 (uint16_t, int64_t) \ + DEFFFS64 (uint16_t, int32_t) \ + DEFFFS64 (uint16_t, int16_t) \ + DEFFFS64 (uint16_t, int8_t) \ + DEFFFS64 (int16_t, uint64_t) \ + DEFFFS64 (int16_t, uint32_t) \ + DEFFFS64 (int16_t, uint16_t) \ + DEFFFS64 (int16_t, uint8_t) \ + DEFFFS64 (int16_t, int64_t) \ + DEFFFS64 (int16_t, int32_t) \ + DEFFFS64 (int16_t, int16_t) \ + DEFFFS64 (int16_t, int8_t) \ + DEFFFS64 (uint8_t, uint64_t) \ + DEFFFS64 (uint8_t, uint32_t) \ + DEFFFS64 (uint8_t, uint16_t) \ + DEFFFS64 (uint8_t, uint8_t) \ + DEFFFS64 (uint8_t, int64_t) \ + DEFFFS64 (uint8_t, int32_t) \ + DEFFFS64 (uint8_t, int16_t) \ + DEFFFS64 (uint8_t, int8_t) \ + DEFFFS64 (int8_t, uint64_t) \ + DEFFFS64 (int8_t, uint32_t) \ + DEFFFS64 (int8_t, uint16_t) \ + DEFFFS64 (int8_t, uint8_t) \ + DEFFFS64 (int8_t, int64_t) \ + DEFFFS64 (int8_t, int32_t) \ + DEFFFS64 (int8_t, int16_t) \ + DEFFFS64 (int8_t, int8_t) \ + DEFFFS32 (uint64_t, uint64_t) \ + DEFFFS32 (uint64_t, uint32_t) \ + DEFFFS32 (uint64_t, uint16_t) \ + DEFFFS32 (uint64_t, uint8_t) \ + DEFFFS32 (uint64_t, int64_t) \ + DEFFFS32 (uint64_t, int32_t) \ + DEFFFS32 (uint64_t, int16_t) \ + DEFFFS32 (uint64_t, int8_t) \ + DEFFFS32 (int64_t, uint64_t) \ + DEFFFS32 (int64_t, uint32_t) \ + DEFFFS32 (int64_t, uint16_t) \ + DEFFFS32 (int64_t, uint8_t) \ + DEFFFS32 (int64_t, int64_t) \ + DEFFFS32 (int64_t, int32_t) \ + DEFFFS32 (int64_t, int16_t) \ + DEFFFS32 (int64_t, int8_t) \ + DEFFFS32 (uint32_t, uint64_t) \ + DEFFFS32 (uint32_t, uint32_t) \ + DEFFFS32 (uint32_t, uint16_t) \ + DEFFFS32 (uint32_t, uint8_t) \ + DEFFFS32 (uint32_t, int64_t) \ + DEFFFS32 (uint32_t, int32_t) \ + DEFFFS32 (uint32_t, int16_t) \ + DEFFFS32 (uint32_t, int8_t) \ + DEFFFS32 (int32_t, uint64_t) \ + DEFFFS32 (int32_t, uint32_t) \ + DEFFFS32 (int32_t, uint16_t) \ + DEFFFS32 (int32_t, uint8_t) \ + DEFFFS32 (int32_t, int64_t) \ + DEFFFS32 (int32_t, int32_t) \ + DEFFFS32 (int32_t, int16_t) \ + DEFFFS32 (int32_t, int8_t) \ + DEFFFS32 (uint16_t, uint64_t) \ + DEFFFS32 (uint16_t, uint32_t) \ + DEFFFS32 (uint16_t, uint16_t) \ + DEFFFS32 (uint16_t, 
uint8_t) \ + DEFFFS32 (uint16_t, int64_t) \ + DEFFFS32 (uint16_t, int32_t) \ + DEFFFS32 (uint16_t, int16_t) \ + DEFFFS32 (uint16_t, int8_t) \ + DEFFFS32 (int16_t, uint64_t) \ + DEFFFS32 (int16_t, uint32_t) \ + DEFFFS32 (int16_t, uint16_t) \ + DEFFFS32 (int16_t, uint8_t) \ + DEFFFS32 (int16_t, int64_t) \ + DEFFFS32 (int16_t, int32_t) \ + DEFFFS32 (int16_t, int16_t) \ + DEFFFS32 (int16_t, int8_t) \ + DEFFFS32 (uint8_t, uint64_t) \ + DEFFFS32 (uint8_t, uint32_t) \ + DEFFFS32 (uint8_t, uint16_t) \ + DEFFFS32 (uint8_t, uint8_t) \ + DEFFFS32 (uint8_t, int64_t) \ + DEFFFS32 (uint8_t, int32_t) \ + DEFFFS32 (uint8_t, int16_t) \ + DEFFFS32 (uint8_t, int8_t) \ + DEFFFS32 (int8_t, uint64_t) \ + DEFFFS32 (int8_t, uint32_t) \ + DEFFFS32 (int8_t, uint16_t) \ + DEFFFS32 (int8_t, uint8_t) \ + DEFFFS32 (int8_t, int64_t) \ + DEFFFS32 (int8_t, int32_t) \ + DEFFFS32 (int8_t, int16_t) \ + DEFFFS32 (int8_t, int8_t) + +DEF_ALL () + +#define SZ 512 + +#define TEST64(TYPEDST, TYPESRC) \ + void __attribute__ ((optimize ("0"))) test64_##TYPEDST##TYPESRC () \ + { \ + TYPESRC src[SZ]; \ + TYPEDST dst[SZ]; \ + for (int i = 0; i < SZ; i++) \ + { \ + int ia = i + 1; \ + src[i] = ia * 1234567890; \ + dst[i] = 0; \ + } \ + popcount64_##TYPEDST##TYPESRC (dst, src, SZ); \ + for (int i = 0; i < SZ; i++) \ + { \ + assert (dst[i] == __builtin_popcountll (src[i])); \ + } \ + } + +#define TEST64N(TYPEDST, TYPESRC) \ + void __attribute__ ((optimize ("0"))) test64n_##TYPEDST##TYPESRC () \ + { \ + TYPESRC src[SZ]; \ + TYPEDST dst[SZ]; \ + for (int i = 0; i < SZ; i++) \ + { \ + int ia = i + 1; \ + src[i] = ia * -1234567890; \ + dst[i] = 0; \ + } \ + popcount64_##TYPEDST##TYPESRC (dst, src, SZ); \ + for (int i = 0; i < SZ; i++) \ + { \ + assert (dst[i] == __builtin_popcountll (src[i])); \ + } \ + } + +#define TEST32(TYPEDST, TYPESRC) \ + void __attribute__ ((optimize ("0"))) test32_##TYPEDST##TYPESRC () \ + { \ + TYPESRC src[SZ]; \ + TYPEDST dst[SZ]; \ + for (int i = 0; i < SZ; i++) \ + { \ + int ia = i + 1; \ + src[i] = ia * 1234567; \ + dst[i] = 0; \ + } \ + popcount32_##TYPEDST##TYPESRC (dst, src, SZ); \ + for (int i = 0; i < SZ; i++) \ + { \ + assert (dst[i] == __builtin_popcount (src[i])); \ + } \ + } + +#define TEST32N(TYPEDST, TYPESRC) \ + void __attribute__ ((optimize ("0"))) test32n_##TYPEDST##TYPESRC () \ + { \ + TYPESRC src[SZ]; \ + TYPEDST dst[SZ]; \ + for (int i = 0; i < SZ; i++) \ + { \ + int ia = i + 1; \ + src[i] = ia * -1234567; \ + dst[i] = 0; \ + } \ + popcount32_##TYPEDST##TYPESRC (dst, src, SZ); \ + for (int i = 0; i < SZ; i++) \ + { \ + assert (dst[i] == __builtin_popcount (src[i])); \ + } \ + } + +#define TESTCTZ64(TYPEDST, TYPESRC) \ + void __attribute__ ((optimize ("0"))) testctz64_##TYPEDST##TYPESRC () \ + { \ + TYPESRC src[SZ]; \ + TYPEDST dst[SZ]; \ + for (int i = 0; i < SZ; i++) \ + { \ + int ia = i + 1; \ + src[i] = ia * 1234567890; \ + dst[i] = 0; \ + } \ + ctz64_##TYPEDST##TYPESRC (dst, src, SZ); \ + for (int i = 0; i < SZ; i++) \ + { \ + if (src[i] != 0) \ + assert (dst[i] == __builtin_ctzll (src[i])); \ + } \ + } + +#define TESTCTZ64N(TYPEDST, TYPESRC) \ + void __attribute__ ((optimize ("0"))) testctz64n_##TYPEDST##TYPESRC () \ + { \ + TYPESRC src[SZ]; \ + TYPEDST dst[SZ]; \ + for (int i = 0; i < SZ; i++) \ + { \ + int ia = i + 1; \ + src[i] = ia * -1234567890; \ + dst[i] = 0; \ + } \ + ctz64_##TYPEDST##TYPESRC (dst, src, SZ); \ + for (int i = 0; i < SZ; i++) \ + { \ + if (src[i] != 0) \ + assert (dst[i] == __builtin_ctzll (src[i])); \ + } \ + } + +#define TESTCTZ32(TYPEDST, TYPESRC) \ + void 
__attribute__ ((optimize ("0"))) testctz32_##TYPEDST##TYPESRC () \ + { \ + TYPESRC src[SZ]; \ + TYPEDST dst[SZ]; \ + for (int i = 0; i < SZ; i++) \ + { \ + int ia = i + 1; \ + src[i] = ia * 1234567; \ + dst[i] = 0; \ + } \ + ctz32_##TYPEDST##TYPESRC (dst, src, SZ); \ + for (int i = 0; i < SZ; i++) \ + { \ + if (src[i] != 0) \ + assert (dst[i] == __builtin_ctz (src[i])); \ + } \ + } + +#define TESTCTZ32N(TYPEDST, TYPESRC) \ + void __attribute__ ((optimize ("0"))) testctz32n_##TYPEDST##TYPESRC () \ + { \ + TYPESRC src[SZ]; \ + TYPEDST dst[SZ]; \ + for (int i = 0; i < SZ; i++) \ + { \ + int ia = i + 1; \ + src[i] = ia * -1234567; \ + dst[i] = 0; \ + } \ + ctz32_##TYPEDST##TYPESRC (dst, src, SZ); \ + for (int i = 0; i < SZ; i++) \ + { \ + if (src[i] != 0) \ + assert (dst[i] == __builtin_ctz (src[i])); \ + } \ + } + +#define TESTFFS64(TYPEDST, TYPESRC) \ + void __attribute__ ((optimize ("0"))) testffs64_##TYPEDST##TYPESRC () \ + { \ + TYPESRC src[SZ]; \ + TYPEDST dst[SZ]; \ + for (int i = 0; i < SZ; i++) \ + { \ + int ia = i + 1; \ + src[i] = ia * 1234567890; \ + dst[i] = 0; \ + } \ + ffs64_##TYPEDST##TYPESRC (dst, src, SZ); \ + for (int i = 0; i < SZ; i++) \ + { \ + assert (dst[i] == __builtin_ffsll (src[i])); \ + } \ + } + +#define TESTFFS64N(TYPEDST, TYPESRC) \ + void __attribute__ ((optimize ("0"))) testffs64n_##TYPEDST##TYPESRC () \ + { \ + TYPESRC src[SZ]; \ + TYPEDST dst[SZ]; \ + for (int i = 0; i < SZ; i++) \ + { \ + int ia = i + 1; \ + src[i] = ia * -1234567890; \ + dst[i] = 0; \ + } \ + ffs64_##TYPEDST##TYPESRC (dst, src, SZ); \ + for (int i = 0; i < SZ; i++) \ + { \ + assert (dst[i] == __builtin_ffsll (src[i])); \ + } \ + } + +#define TESTFFS32(TYPEDST, TYPESRC) \ + void __attribute__ ((optimize ("0"))) testffs32_##TYPEDST##TYPESRC () \ + { \ + TYPESRC src[SZ]; \ + TYPEDST dst[SZ]; \ + for (int i = 0; i < SZ; i++) \ + { \ + int ia = i + 1; \ + src[i] = ia * 1234567; \ + dst[i] = 0; \ + } \ + ffs32_##TYPEDST##TYPESRC (dst, src, SZ); \ + for (int i = 0; i < SZ; i++) \ + { \ + assert (dst[i] == __builtin_ffs (src[i])); \ + } \ + } + +#define TESTFFS32N(TYPEDST, TYPESRC) \ + void __attribute__ ((optimize ("0"))) testffs32n_##TYPEDST##TYPESRC () \ + { \ + TYPESRC src[SZ]; \ + TYPEDST dst[SZ]; \ + for (int i = 0; i < SZ; i++) \ + { \ + int ia = i + 1; \ + src[i] = ia * -1234567; \ + dst[i] = 0; \ + } \ + ffs32_##TYPEDST##TYPESRC (dst, src, SZ); \ + for (int i = 0; i < SZ; i++) \ + { \ + assert (dst[i] == __builtin_ffs (src[i])); \ + } \ + } + +#define TEST_ALL() \ + TEST64 (uint64_t, uint64_t) \ + TEST64 (uint64_t, uint32_t) \ + TEST64 (uint64_t, uint16_t) \ + TEST64 (uint64_t, uint8_t) \ + TEST64 (uint64_t, int64_t) \ + TEST64 (uint64_t, int32_t) \ + TEST64 (uint64_t, int16_t) \ + TEST64 (uint64_t, int8_t) \ + TEST64N (int64_t, uint64_t) \ + TEST64N (int64_t, uint32_t) \ + TEST64N (int64_t, uint16_t) \ + TEST64N (int64_t, uint8_t) \ + TEST64N (int64_t, int64_t) \ + TEST64N (int64_t, int32_t) \ + TEST64N (int64_t, int16_t) \ + TEST64N (int64_t, int8_t) \ + TEST64 (uint32_t, uint64_t) \ + TEST64 (uint32_t, uint32_t) \ + TEST64 (uint32_t, uint16_t) \ + TEST64 (uint32_t, uint8_t) \ + TEST64 (uint32_t, int64_t) \ + TEST64 (uint32_t, int32_t) \ + TEST64 (uint32_t, int16_t) \ + TEST64 (uint32_t, int8_t) \ + TEST64N (int32_t, uint64_t) \ + TEST64N (int32_t, uint32_t) \ + TEST64N (int32_t, uint16_t) \ + TEST64N (int32_t, uint8_t) \ + TEST64N (int32_t, int64_t) \ + TEST64N (int32_t, int32_t) \ + TEST64N (int32_t, int16_t) \ + TEST64N (int32_t, int8_t) \ + TEST64 (uint16_t, uint64_t) \ + TEST64 
(uint16_t, uint32_t) \ + TEST64 (uint16_t, uint16_t) \ + TEST64 (uint16_t, uint8_t) \ + TEST64 (uint16_t, int64_t) \ + TEST64 (uint16_t, int32_t) \ + TEST64 (uint16_t, int16_t) \ + TEST64 (uint16_t, int8_t) \ + TEST64N (int16_t, uint64_t) \ + TEST64N (int16_t, uint32_t) \ + TEST64N (int16_t, uint16_t) \ + TEST64N (int16_t, uint8_t) \ + TEST64N (int16_t, int64_t) \ + TEST64N (int16_t, int32_t) \ + TEST64N (int16_t, int16_t) \ + TEST64N (int16_t, int8_t) \ + TEST64 (uint8_t, uint64_t) \ + TEST64 (uint8_t, uint32_t) \ + TEST64 (uint8_t, uint16_t) \ + TEST64 (uint8_t, uint8_t) \ + TEST64 (uint8_t, int64_t) \ + TEST64 (uint8_t, int32_t) \ + TEST64 (uint8_t, int16_t) \ + TEST64 (uint8_t, int8_t) \ + TEST64N (int8_t, uint64_t) \ + TEST64N (int8_t, uint32_t) \ + TEST64N (int8_t, uint16_t) \ + TEST64N (int8_t, uint8_t) \ + TEST64N (int8_t, int64_t) \ + TEST64N (int8_t, int32_t) \ + TEST64N (int8_t, int16_t) \ + TEST64N (int8_t, int8_t) \ + TEST32 (uint64_t, uint64_t) \ + TEST32 (uint64_t, uint32_t) \ + TEST32 (uint64_t, uint16_t) \ + TEST32 (uint64_t, uint8_t) \ + TEST32 (uint64_t, int64_t) \ + TEST32 (uint64_t, int32_t) \ + TEST32 (uint64_t, int16_t) \ + TEST32 (uint64_t, int8_t) \ + TEST32N (int64_t, uint64_t) \ + TEST32N (int64_t, uint32_t) \ + TEST32N (int64_t, uint16_t) \ + TEST32N (int64_t, uint8_t) \ + TEST32N (int64_t, int64_t) \ + TEST32N (int64_t, int32_t) \ + TEST32N (int64_t, int16_t) \ + TEST32N (int64_t, int8_t) \ + TEST32 (uint32_t, uint64_t) \ + TEST32 (uint32_t, uint32_t) \ + TEST32 (uint32_t, uint16_t) \ + TEST32 (uint32_t, uint8_t) \ + TEST32 (uint32_t, int64_t) \ + TEST32 (uint32_t, int32_t) \ + TEST32 (uint32_t, int16_t) \ + TEST32 (uint32_t, int8_t) \ + TEST32N (int32_t, uint64_t) \ + TEST32N (int32_t, uint32_t) \ + TEST32N (int32_t, uint16_t) \ + TEST32N (int32_t, uint8_t) \ + TEST32N (int32_t, int64_t) \ + TEST32N (int32_t, int32_t) \ + TEST32N (int32_t, int16_t) \ + TEST32N (int32_t, int8_t) \ + TEST32 (uint16_t, uint64_t) \ + TEST32 (uint16_t, uint32_t) \ + TEST32 (uint16_t, uint16_t) \ + TEST32 (uint16_t, uint8_t) \ + TEST32 (uint16_t, int64_t) \ + TEST32 (uint16_t, int32_t) \ + TEST32 (uint16_t, int16_t) \ + TEST32 (uint16_t, int8_t) \ + TEST32N (int16_t, uint64_t) \ + TEST32N (int16_t, uint32_t) \ + TEST32N (int16_t, uint16_t) \ + TEST32N (int16_t, uint8_t) \ + TEST32N (int16_t, int64_t) \ + TEST32N (int16_t, int32_t) \ + TEST32N (int16_t, int16_t) \ + TEST32N (int16_t, int8_t) \ + TEST32 (uint8_t, uint64_t) \ + TEST32 (uint8_t, uint32_t) \ + TEST32 (uint8_t, uint16_t) \ + TEST32 (uint8_t, uint8_t) \ + TEST32 (uint8_t, int64_t) \ + TEST32 (uint8_t, int32_t) \ + TEST32 (uint8_t, int16_t) \ + TEST32 (uint8_t, int8_t) \ + TEST32N (int8_t, uint64_t) \ + TEST32N (int8_t, uint32_t) \ + TEST32N (int8_t, uint16_t) \ + TEST32N (int8_t, uint8_t) \ + TEST32N (int8_t, int64_t) \ + TEST32N (int8_t, int32_t) \ + TEST32N (int8_t, int16_t) \ + TEST32N (int8_t, int8_t) \ + TESTCTZ64 (uint64_t, uint64_t) \ + TESTCTZ64 (uint64_t, uint32_t) \ + TESTCTZ64 (uint64_t, uint16_t) \ + TESTCTZ64 (uint64_t, uint8_t) \ + TESTCTZ64 (uint64_t, int64_t) \ + TESTCTZ64 (uint64_t, int32_t) \ + TESTCTZ64 (uint64_t, int16_t) \ + TESTCTZ64 (uint64_t, int8_t) \ + TESTCTZ64N (int64_t, uint64_t) \ + TESTCTZ64N (int64_t, uint32_t) \ + TESTCTZ64N (int64_t, uint16_t) \ + TESTCTZ64N (int64_t, uint8_t) \ + TESTCTZ64N (int64_t, int64_t) \ + TESTCTZ64N (int64_t, int32_t) \ + TESTCTZ64N (int64_t, int16_t) \ + TESTCTZ64N (int64_t, int8_t) \ + TESTCTZ64 (uint32_t, uint64_t) \ + TESTCTZ64 (uint32_t, uint32_t) \ + 
TESTCTZ64 (uint32_t, uint16_t) \ + TESTCTZ64 (uint32_t, uint8_t) \ + TESTCTZ64 (uint32_t, int64_t) \ + TESTCTZ64 (uint32_t, int32_t) \ + TESTCTZ64 (uint32_t, int16_t) \ + TESTCTZ64 (uint32_t, int8_t) \ + TESTCTZ64N (int32_t, uint64_t) \ + TESTCTZ64N (int32_t, uint32_t) \ + TESTCTZ64N (int32_t, uint16_t) \ + TESTCTZ64N (int32_t, uint8_t) \ + TESTCTZ64N (int32_t, int64_t) \ + TESTCTZ64N (int32_t, int32_t) \ + TESTCTZ64N (int32_t, int16_t) \ + TESTCTZ64N (int32_t, int8_t) \ + TESTCTZ64 (uint16_t, uint64_t) \ + TESTCTZ64 (uint16_t, uint32_t) \ + TESTCTZ64 (uint16_t, uint16_t) \ + TESTCTZ64 (uint16_t, uint8_t) \ + TESTCTZ64 (uint16_t, int64_t) \ + TESTCTZ64 (uint16_t, int32_t) \ + TESTCTZ64 (uint16_t, int16_t) \ + TESTCTZ64 (uint16_t, int8_t) \ + TESTCTZ64N (int16_t, uint64_t) \ + TESTCTZ64N (int16_t, uint32_t) \ + TESTCTZ64N (int16_t, uint16_t) \ + TESTCTZ64N (int16_t, uint8_t) \ + TESTCTZ64N (int16_t, int64_t) \ + TESTCTZ64N (int16_t, int32_t) \ + TESTCTZ64N (int16_t, int16_t) \ + TESTCTZ64N (int16_t, int8_t) \ + TESTCTZ64 (uint8_t, uint64_t) \ + TESTCTZ64 (uint8_t, uint32_t) \ + TESTCTZ64 (uint8_t, uint16_t) \ + TESTCTZ64 (uint8_t, uint8_t) \ + TESTCTZ64 (uint8_t, int64_t) \ + TESTCTZ64 (uint8_t, int32_t) \ + TESTCTZ64 (uint8_t, int16_t) \ + TESTCTZ64 (uint8_t, int8_t) \ + TESTCTZ64N (int8_t, uint64_t) \ + TESTCTZ64N (int8_t, uint32_t) \ + TESTCTZ64N (int8_t, uint16_t) \ + TESTCTZ64N (int8_t, uint8_t) \ + TESTCTZ64N (int8_t, int64_t) \ + TESTCTZ64N (int8_t, int32_t) \ + TESTCTZ64N (int8_t, int16_t) \ + TESTCTZ64N (int8_t, int8_t) \ + TESTCTZ32 (uint64_t, uint64_t) \ + TESTCTZ32 (uint64_t, uint32_t) \ + TESTCTZ32 (uint64_t, uint16_t) \ + TESTCTZ32 (uint64_t, uint8_t) \ + TESTCTZ32 (uint64_t, int64_t) \ + TESTCTZ32 (uint64_t, int32_t) \ + TESTCTZ32 (uint64_t, int16_t) \ + TESTCTZ32 (uint64_t, int8_t) \ + TESTCTZ32N (int64_t, uint64_t) \ + TESTCTZ32N (int64_t, uint32_t) \ + TESTCTZ32N (int64_t, uint16_t) \ + TESTCTZ32N (int64_t, uint8_t) \ + TESTCTZ32N (int64_t, int64_t) \ + TESTCTZ32N (int64_t, int32_t) \ + TESTCTZ32N (int64_t, int16_t) \ + TESTCTZ32N (int64_t, int8_t) \ + TESTCTZ32 (uint32_t, uint64_t) \ + TESTCTZ32 (uint32_t, uint32_t) \ + TESTCTZ32 (uint32_t, uint16_t) \ + TESTCTZ32 (uint32_t, uint8_t) \ + TESTCTZ32 (uint32_t, int64_t) \ + TESTCTZ32 (uint32_t, int32_t) \ + TESTCTZ32 (uint32_t, int16_t) \ + TESTCTZ32 (uint32_t, int8_t) \ + TESTCTZ32N (int32_t, uint64_t) \ + TESTCTZ32N (int32_t, uint32_t) \ + TESTCTZ32N (int32_t, uint16_t) \ + TESTCTZ32N (int32_t, uint8_t) \ + TESTCTZ32N (int32_t, int64_t) \ + TESTCTZ32N (int32_t, int32_t) \ + TESTCTZ32N (int32_t, int16_t) \ + TESTCTZ32N (int32_t, int8_t) \ + TESTCTZ32 (uint16_t, uint64_t) \ + TESTCTZ32 (uint16_t, uint32_t) \ + TESTCTZ32 (uint16_t, uint16_t) \ + TESTCTZ32 (uint16_t, uint8_t) \ + TESTCTZ32 (uint16_t, int64_t) \ + TESTCTZ32 (uint16_t, int32_t) \ + TESTCTZ32 (uint16_t, int16_t) \ + TESTCTZ32 (uint16_t, int8_t) \ + TESTCTZ32N (int16_t, uint64_t) \ + TESTCTZ32N (int16_t, uint32_t) \ + TESTCTZ32N (int16_t, uint16_t) \ + TESTCTZ32N (int16_t, uint8_t) \ + TESTCTZ32N (int16_t, int64_t) \ + TESTCTZ32N (int16_t, int32_t) \ + TESTCTZ32N (int16_t, int16_t) \ + TESTCTZ32N (int16_t, int8_t) \ + TESTCTZ32 (uint8_t, uint64_t) \ + TESTCTZ32 (uint8_t, uint32_t) \ + TESTCTZ32 (uint8_t, uint16_t) \ + TESTCTZ32 (uint8_t, uint8_t) \ + TESTCTZ32 (uint8_t, int64_t) \ + TESTCTZ32 (uint8_t, int32_t) \ + TESTCTZ32 (uint8_t, int16_t) \ + TESTCTZ32 (uint8_t, int8_t) \ + TESTCTZ32N (int8_t, uint64_t) \ + TESTCTZ32N (int8_t, uint32_t) \ + TESTCTZ32N 
(int8_t, uint16_t) \ + TESTCTZ32N (int8_t, uint8_t) \ + TESTCTZ32N (int8_t, int64_t) \ + TESTCTZ32N (int8_t, int32_t) \ + TESTCTZ32N (int8_t, int16_t) \ + TESTCTZ32N (int8_t, int8_t) \ + TESTFFS64 (uint64_t, uint64_t) \ + TESTFFS64 (uint64_t, uint32_t) \ + TESTFFS64 (uint64_t, uint16_t) \ + TESTFFS64 (uint64_t, uint8_t) \ + TESTFFS64 (uint64_t, int64_t) \ + TESTFFS64 (uint64_t, int32_t) \ + TESTFFS64 (uint64_t, int16_t) \ + TESTFFS64 (uint64_t, int8_t) \ + TESTFFS64N (int64_t, uint64_t) \ + TESTFFS64N (int64_t, uint32_t) \ + TESTFFS64N (int64_t, uint16_t) \ + TESTFFS64N (int64_t, uint8_t) \ + TESTFFS64N (int64_t, int64_t) \ + TESTFFS64N (int64_t, int32_t) \ + TESTFFS64N (int64_t, int16_t) \ + TESTFFS64N (int64_t, int8_t) \ + TESTFFS64 (uint32_t, uint64_t) \ + TESTFFS64 (uint32_t, uint32_t) \ + TESTFFS64 (uint32_t, uint16_t) \ + TESTFFS64 (uint32_t, uint8_t) \ + TESTFFS64 (uint32_t, int64_t) \ + TESTFFS64 (uint32_t, int32_t) \ + TESTFFS64 (uint32_t, int16_t) \ + TESTFFS64 (uint32_t, int8_t) \ + TESTFFS64N (int32_t, uint64_t) \ + TESTFFS64N (int32_t, uint32_t) \ + TESTFFS64N (int32_t, uint16_t) \ + TESTFFS64N (int32_t, uint8_t) \ + TESTFFS64N (int32_t, int64_t) \ + TESTFFS64N (int32_t, int32_t) \ + TESTFFS64N (int32_t, int16_t) \ + TESTFFS64N (int32_t, int8_t) \ + TESTFFS64 (uint16_t, uint64_t) \ + TESTFFS64 (uint16_t, uint32_t) \ + TESTFFS64 (uint16_t, uint16_t) \ + TESTFFS64 (uint16_t, uint8_t) \ + TESTFFS64 (uint16_t, int64_t) \ + TESTFFS64 (uint16_t, int32_t) \ + TESTFFS64 (uint16_t, int16_t) \ + TESTFFS64 (uint16_t, int8_t) \ + TESTFFS64N (int16_t, uint64_t) \ + TESTFFS64N (int16_t, uint32_t) \ + TESTFFS64N (int16_t, uint16_t) \ + TESTFFS64N (int16_t, uint8_t) \ + TESTFFS64N (int16_t, int64_t) \ + TESTFFS64N (int16_t, int32_t) \ + TESTFFS64N (int16_t, int16_t) \ + TESTFFS64N (int16_t, int8_t) \ + TESTFFS64 (uint8_t, uint64_t) \ + TESTFFS64 (uint8_t, uint32_t) \ + TESTFFS64 (uint8_t, uint16_t) \ + TESTFFS64 (uint8_t, uint8_t) \ + TESTFFS64 (uint8_t, int64_t) \ + TESTFFS64 (uint8_t, int32_t) \ + TESTFFS64 (uint8_t, int16_t) \ + TESTFFS64 (uint8_t, int8_t) \ + TESTFFS64N (int8_t, uint64_t) \ + TESTFFS64N (int8_t, uint32_t) \ + TESTFFS64N (int8_t, uint16_t) \ + TESTFFS64N (int8_t, uint8_t) \ + TESTFFS64N (int8_t, int64_t) \ + TESTFFS64N (int8_t, int32_t) \ + TESTFFS64N (int8_t, int16_t) \ + TESTFFS64N (int8_t, int8_t) \ + TESTFFS32 (uint64_t, uint64_t) \ + TESTFFS32 (uint64_t, uint32_t) \ + TESTFFS32 (uint64_t, uint16_t) \ + TESTFFS32 (uint64_t, uint8_t) \ + TESTFFS32 (uint64_t, int64_t) \ + TESTFFS32 (uint64_t, int32_t) \ + TESTFFS32 (uint64_t, int16_t) \ + TESTFFS32 (uint64_t, int8_t) \ + TESTFFS32N (int64_t, uint64_t) \ + TESTFFS32N (int64_t, uint32_t) \ + TESTFFS32N (int64_t, uint16_t) \ + TESTFFS32N (int64_t, uint8_t) \ + TESTFFS32N (int64_t, int64_t) \ + TESTFFS32N (int64_t, int32_t) \ + TESTFFS32N (int64_t, int16_t) \ + TESTFFS32N (int64_t, int8_t) \ + TESTFFS32 (uint32_t, uint64_t) \ + TESTFFS32 (uint32_t, uint32_t) \ + TESTFFS32 (uint32_t, uint16_t) \ + TESTFFS32 (uint32_t, uint8_t) \ + TESTFFS32 (uint32_t, int64_t) \ + TESTFFS32 (uint32_t, int32_t) \ + TESTFFS32 (uint32_t, int16_t) \ + TESTFFS32 (uint32_t, int8_t) \ + TESTFFS32N (int32_t, uint64_t) \ + TESTFFS32N (int32_t, uint32_t) \ + TESTFFS32N (int32_t, uint16_t) \ + TESTFFS32N (int32_t, uint8_t) \ + TESTFFS32N (int32_t, int64_t) \ + TESTFFS32N (int32_t, int32_t) \ + TESTFFS32N (int32_t, int16_t) \ + TESTFFS32N (int32_t, int8_t) \ + TESTFFS32 (uint16_t, uint64_t) \ + TESTFFS32 (uint16_t, uint32_t) \ + TESTFFS32 (uint16_t, 
uint16_t) \ + TESTFFS32 (uint16_t, uint8_t) \ + TESTFFS32 (uint16_t, int64_t) \ + TESTFFS32 (uint16_t, int32_t) \ + TESTFFS32 (uint16_t, int16_t) \ + TESTFFS32 (uint16_t, int8_t) \ + TESTFFS32N (int16_t, uint64_t) \ + TESTFFS32N (int16_t, uint32_t) \ + TESTFFS32N (int16_t, uint16_t) \ + TESTFFS32N (int16_t, uint8_t) \ + TESTFFS32N (int16_t, int64_t) \ + TESTFFS32N (int16_t, int32_t) \ + TESTFFS32N (int16_t, int16_t) \ + TESTFFS32N (int16_t, int8_t) \ + TESTFFS32 (uint8_t, uint64_t) \ + TESTFFS32 (uint8_t, uint32_t) \ + TESTFFS32 (uint8_t, uint16_t) \ + TESTFFS32 (uint8_t, uint8_t) \ + TESTFFS32 (uint8_t, int64_t) \ + TESTFFS32 (uint8_t, int32_t) \ + TESTFFS32 (uint8_t, int16_t) \ + TESTFFS32 (uint8_t, int8_t) \ + TESTFFS32N (int8_t, uint64_t) \ + TESTFFS32N (int8_t, uint32_t) \ + TESTFFS32N (int8_t, uint16_t) \ + TESTFFS32N (int8_t, uint8_t) \ + TESTFFS32N (int8_t, int64_t) \ + TESTFFS32N (int8_t, int32_t) \ + TESTFFS32N (int8_t, int16_t) \ + TESTFFS32N (int8_t, int8_t) + +TEST_ALL () + +#define RUN64(TYPEDST, TYPESRC) test64_##TYPEDST##TYPESRC (); +#define RUN64N(TYPEDST, TYPESRC) test64n_##TYPEDST##TYPESRC (); +#define RUN32(TYPEDST, TYPESRC) test32_##TYPEDST##TYPESRC (); +#define RUN32N(TYPEDST, TYPESRC) test32n_##TYPEDST##TYPESRC (); +#define RUNCTZ64(TYPEDST, TYPESRC) testctz64_##TYPEDST##TYPESRC (); +#define RUNCTZ64N(TYPEDST, TYPESRC) testctz64n_##TYPEDST##TYPESRC (); +#define RUNCTZ32(TYPEDST, TYPESRC) testctz32_##TYPEDST##TYPESRC (); +#define RUNCTZ32N(TYPEDST, TYPESRC) testctz32n_##TYPEDST##TYPESRC (); +#define RUNFFS64(TYPEDST, TYPESRC) testffs64_##TYPEDST##TYPESRC (); +#define RUNFFS64N(TYPEDST, TYPESRC) testffs64n_##TYPEDST##TYPESRC (); +#define RUNFFS32(TYPEDST, TYPESRC) testffs32_##TYPEDST##TYPESRC (); +#define RUNFFS32N(TYPEDST, TYPESRC) testffs32n_##TYPEDST##TYPESRC (); + +#define RUN_ALL() \ + RUN64 (uint64_t, uint64_t) \ + RUN64 (uint64_t, uint32_t) \ + RUN64 (uint64_t, uint16_t) \ + RUN64 (uint64_t, uint8_t) \ + RUN64 (uint64_t, int64_t) \ + RUN64 (uint64_t, int32_t) \ + RUN64 (uint64_t, int16_t) \ + RUN64 (uint64_t, int8_t) \ + RUN64N (int64_t, uint64_t) \ + RUN64N (int64_t, uint32_t) \ + RUN64N (int64_t, uint16_t) \ + RUN64N (int64_t, uint8_t) \ + RUN64N (int64_t, int64_t) \ + RUN64N (int64_t, int32_t) \ + RUN64N (int64_t, int16_t) \ + RUN64N (int64_t, int8_t) \ + RUN64 (uint32_t, uint64_t) \ + RUN64 (uint32_t, uint32_t) \ + RUN64 (uint32_t, uint16_t) \ + RUN64 (uint32_t, uint8_t) \ + RUN64 (uint32_t, int64_t) \ + RUN64 (uint32_t, int32_t) \ + RUN64 (uint32_t, int16_t) \ + RUN64 (uint32_t, int8_t) \ + RUN64N (int32_t, uint64_t) \ + RUN64N (int32_t, uint32_t) \ + RUN64N (int32_t, uint16_t) \ + RUN64N (int32_t, uint8_t) \ + RUN64N (int32_t, int64_t) \ + RUN64N (int32_t, int32_t) \ + RUN64N (int32_t, int16_t) \ + RUN64N (int32_t, int8_t) \ + RUN64 (uint16_t, uint64_t) \ + RUN64 (uint16_t, uint32_t) \ + RUN64 (uint16_t, uint16_t) \ + RUN64 (uint16_t, uint8_t) \ + RUN64 (uint16_t, int64_t) \ + RUN64 (uint16_t, int32_t) \ + RUN64 (uint16_t, int16_t) \ + RUN64 (uint16_t, int8_t) \ + RUN64N (int16_t, uint64_t) \ + RUN64N (int16_t, uint32_t) \ + RUN64N (int16_t, uint16_t) \ + RUN64N (int16_t, uint8_t) \ + RUN64N (int16_t, int64_t) \ + RUN64N (int16_t, int32_t) \ + RUN64N (int16_t, int16_t) \ + RUN64N (int16_t, int8_t) \ + RUN64 (uint8_t, uint64_t) \ + RUN64 (uint8_t, uint32_t) \ + RUN64 (uint8_t, uint16_t) \ + RUN64 (uint8_t, uint8_t) \ + RUN64 (uint8_t, int64_t) \ + RUN64 (uint8_t, int32_t) \ + RUN64 (uint8_t, int16_t) \ + RUN64 (uint8_t, int8_t) \ + RUN64N (int8_t, uint64_t) 
\ + RUN64N (int8_t, uint32_t) \ + RUN64N (int8_t, uint16_t) \ + RUN64N (int8_t, uint8_t) \ + RUN64N (int8_t, int64_t) \ + RUN64N (int8_t, int32_t) \ + RUN64N (int8_t, int16_t) \ + RUN64N (int8_t, int8_t) \ + RUN32 (uint64_t, uint64_t) \ + RUN32 (uint64_t, uint32_t) \ + RUN32 (uint64_t, uint16_t) \ + RUN32 (uint64_t, uint8_t) \ + RUN32 (uint64_t, int64_t) \ + RUN32 (uint64_t, int32_t) \ + RUN32 (uint64_t, int16_t) \ + RUN32 (uint64_t, int8_t) \ + RUN32N (int64_t, uint64_t) \ + RUN32N (int64_t, uint32_t) \ + RUN32N (int64_t, uint16_t) \ + RUN32N (int64_t, uint8_t) \ + RUN32N (int64_t, int64_t) \ + RUN32N (int64_t, int32_t) \ + RUN32N (int64_t, int16_t) \ + RUN32N (int64_t, int8_t) \ + RUN32 (uint32_t, uint64_t) \ + RUN32 (uint32_t, uint32_t) \ + RUN32 (uint32_t, uint16_t) \ + RUN32 (uint32_t, uint8_t) \ + RUN32 (uint32_t, int64_t) \ + RUN32 (uint32_t, int32_t) \ + RUN32 (uint32_t, int16_t) \ + RUN32 (uint32_t, int8_t) \ + RUN32N (int32_t, uint64_t) \ + RUN32N (int32_t, uint32_t) \ + RUN32N (int32_t, uint16_t) \ + RUN32N (int32_t, uint8_t) \ + RUN32N (int32_t, int64_t) \ + RUN32N (int32_t, int32_t) \ + RUN32N (int32_t, int16_t) \ + RUN32N (int32_t, int8_t) \ + RUN32 (uint16_t, uint64_t) \ + RUN32 (uint16_t, uint32_t) \ + RUN32 (uint16_t, uint16_t) \ + RUN32 (uint16_t, uint8_t) \ + RUN32 (uint16_t, int64_t) \ + RUN32 (uint16_t, int32_t) \ + RUN32 (uint16_t, int16_t) \ + RUN32 (uint16_t, int8_t) \ + RUN32N (int16_t, uint64_t) \ + RUN32N (int16_t, uint32_t) \ + RUN32N (int16_t, uint16_t) \ + RUN32N (int16_t, uint8_t) \ + RUN32N (int16_t, int64_t) \ + RUN32N (int16_t, int32_t) \ + RUN32N (int16_t, int16_t) \ + RUN32N (int16_t, int8_t) \ + RUN32 (uint8_t, uint64_t) \ + RUN32 (uint8_t, uint32_t) \ + RUN32 (uint8_t, uint16_t) \ + RUN32 (uint8_t, uint8_t) \ + RUN32 (uint8_t, int64_t) \ + RUN32 (uint8_t, int32_t) \ + RUN32 (uint8_t, int16_t) \ + RUN32 (uint8_t, int8_t) \ + RUN32N (int8_t, uint64_t) \ + RUN32N (int8_t, uint32_t) \ + RUN32N (int8_t, uint16_t) \ + RUN32N (int8_t, uint8_t) \ + RUN32N (int8_t, int64_t) \ + RUN32N (int8_t, int32_t) \ + RUN32N (int8_t, int16_t) \ + RUN32N (int8_t, int8_t) \ + RUNCTZ64 (uint64_t, uint64_t) \ + RUNCTZ64 (uint64_t, uint32_t) \ + RUNCTZ64 (uint64_t, uint16_t) \ + RUNCTZ64 (uint64_t, uint8_t) \ + RUNCTZ64 (uint64_t, int64_t) \ + RUNCTZ64 (uint64_t, int32_t) \ + RUNCTZ64 (uint64_t, int16_t) \ + RUNCTZ64 (uint64_t, int8_t) \ + RUNCTZ64N (int64_t, uint64_t) \ + RUNCTZ64N (int64_t, uint32_t) \ + RUNCTZ64N (int64_t, uint16_t) \ + RUNCTZ64N (int64_t, uint8_t) \ + RUNCTZ64N (int64_t, int64_t) \ + RUNCTZ64N (int64_t, int32_t) \ + RUNCTZ64N (int64_t, int16_t) \ + RUNCTZ64N (int64_t, int8_t) \ + RUNCTZ64 (uint32_t, uint64_t) \ + RUNCTZ64 (uint32_t, uint32_t) \ + RUNCTZ64 (uint32_t, uint16_t) \ + RUNCTZ64 (uint32_t, uint8_t) \ + RUNCTZ64 (uint32_t, int64_t) \ + RUNCTZ64 (uint32_t, int32_t) \ + RUNCTZ64 (uint32_t, int16_t) \ + RUNCTZ64 (uint32_t, int8_t) \ + RUNCTZ64N (int32_t, uint64_t) \ + RUNCTZ64N (int32_t, uint32_t) \ + RUNCTZ64N (int32_t, uint16_t) \ + RUNCTZ64N (int32_t, uint8_t) \ + RUNCTZ64N (int32_t, int64_t) \ + RUNCTZ64N (int32_t, int32_t) \ + RUNCTZ64N (int32_t, int16_t) \ + RUNCTZ64N (int32_t, int8_t) \ + RUNCTZ64 (uint16_t, uint64_t) \ + RUNCTZ64 (uint16_t, uint32_t) \ + RUNCTZ64 (uint16_t, uint16_t) \ + RUNCTZ64 (uint16_t, uint8_t) \ + RUNCTZ64 (uint16_t, int64_t) \ + RUNCTZ64 (uint16_t, int32_t) \ + RUNCTZ64 (uint16_t, int16_t) \ + RUNCTZ64 (uint16_t, int8_t) \ + RUNCTZ64N (int16_t, uint64_t) \ + RUNCTZ64N (int16_t, uint32_t) \ + RUNCTZ64N (int16_t, 
uint16_t) \ + RUNCTZ64N (int16_t, uint8_t) \ + RUNCTZ64N (int16_t, int64_t) \ + RUNCTZ64N (int16_t, int32_t) \ + RUNCTZ64N (int16_t, int16_t) \ + RUNCTZ64N (int16_t, int8_t) \ + RUNCTZ64 (uint8_t, uint64_t) \ + RUNCTZ64 (uint8_t, uint32_t) \ + RUNCTZ64 (uint8_t, uint16_t) \ + RUNCTZ64 (uint8_t, uint8_t) \ + RUNCTZ64 (uint8_t, int64_t) \ + RUNCTZ64 (uint8_t, int32_t) \ + RUNCTZ64 (uint8_t, int16_t) \ + RUNCTZ64 (uint8_t, int8_t) \ + RUNCTZ64N (int8_t, uint64_t) \ + RUNCTZ64N (int8_t, uint32_t) \ + RUNCTZ64N (int8_t, uint16_t) \ + RUNCTZ64N (int8_t, uint8_t) \ + RUNCTZ64N (int8_t, int64_t) \ + RUNCTZ64N (int8_t, int32_t) \ + RUNCTZ64N (int8_t, int16_t) \ + RUNCTZ64N (int8_t, int8_t) \ + RUNCTZ32 (uint64_t, uint64_t) \ + RUNCTZ32 (uint64_t, uint32_t) \ + RUNCTZ32 (uint64_t, uint16_t) \ + RUNCTZ32 (uint64_t, uint8_t) \ + RUNCTZ32 (uint64_t, int64_t) \ + RUNCTZ32 (uint64_t, int32_t) \ + RUNCTZ32 (uint64_t, int16_t) \ + RUNCTZ32 (uint64_t, int8_t) \ + RUNCTZ32N (int64_t, uint64_t) \ + RUNCTZ32N (int64_t, uint32_t) \ + RUNCTZ32N (int64_t, uint16_t) \ + RUNCTZ32N (int64_t, uint8_t) \ + RUNCTZ32N (int64_t, int64_t) \ + RUNCTZ32N (int64_t, int32_t) \ + RUNCTZ32N (int64_t, int16_t) \ + RUNCTZ32N (int64_t, int8_t) \ + RUNCTZ32 (uint32_t, uint64_t) \ + RUNCTZ32 (uint32_t, uint32_t) \ + RUNCTZ32 (uint32_t, uint16_t) \ + RUNCTZ32 (uint32_t, uint8_t) \ + RUNCTZ32 (uint32_t, int64_t) \ + RUNCTZ32 (uint32_t, int32_t) \ + RUNCTZ32 (uint32_t, int16_t) \ + RUNCTZ32 (uint32_t, int8_t) \ + RUNCTZ32N (int32_t, uint64_t) \ + RUNCTZ32N (int32_t, uint32_t) \ + RUNCTZ32N (int32_t, uint16_t) \ + RUNCTZ32N (int32_t, uint8_t) \ + RUNCTZ32N (int32_t, int64_t) \ + RUNCTZ32N (int32_t, int32_t) \ + RUNCTZ32N (int32_t, int16_t) \ + RUNCTZ32N (int32_t, int8_t) \ + RUNCTZ32 (uint16_t, uint64_t) \ + RUNCTZ32 (uint16_t, uint32_t) \ + RUNCTZ32 (uint16_t, uint16_t) \ + RUNCTZ32 (uint16_t, uint8_t) \ + RUNCTZ32 (uint16_t, int64_t) \ + RUNCTZ32 (uint16_t, int32_t) \ + RUNCTZ32 (uint16_t, int16_t) \ + RUNCTZ32 (uint16_t, int8_t) \ + RUNCTZ32N (int16_t, uint64_t) \ + RUNCTZ32N (int16_t, uint32_t) \ + RUNCTZ32N (int16_t, uint16_t) \ + RUNCTZ32N (int16_t, uint8_t) \ + RUNCTZ32N (int16_t, int64_t) \ + RUNCTZ32N (int16_t, int32_t) \ + RUNCTZ32N (int16_t, int16_t) \ + RUNCTZ32N (int16_t, int8_t) \ + RUNCTZ32 (uint8_t, uint64_t) \ + RUNCTZ32 (uint8_t, uint32_t) \ + RUNCTZ32 (uint8_t, uint16_t) \ + RUNCTZ32 (uint8_t, uint8_t) \ + RUNCTZ32 (uint8_t, int64_t) \ + RUNCTZ32 (uint8_t, int32_t) \ + RUNCTZ32 (uint8_t, int16_t) \ + RUNCTZ32 (uint8_t, int8_t) \ + RUNCTZ32N (int8_t, uint64_t) \ + RUNCTZ32N (int8_t, uint32_t) \ + RUNCTZ32N (int8_t, uint16_t) \ + RUNCTZ32N (int8_t, uint8_t) \ + RUNCTZ32N (int8_t, int64_t) \ + RUNCTZ32N (int8_t, int32_t) \ + RUNCTZ32N (int8_t, int16_t) \ + RUNCTZ32N (int8_t, int8_t) \ + RUNFFS64 (uint64_t, uint64_t) \ + RUNFFS64 (uint64_t, uint32_t) \ + RUNFFS64 (uint64_t, uint16_t) \ + RUNFFS64 (uint64_t, uint8_t) \ + RUNFFS64 (uint64_t, int64_t) \ + RUNFFS64 (uint64_t, int32_t) \ + RUNFFS64 (uint64_t, int16_t) \ + RUNFFS64 (uint64_t, int8_t) \ + RUNFFS64N (int64_t, uint64_t) \ + RUNFFS64N (int64_t, uint32_t) \ + RUNFFS64N (int64_t, uint16_t) \ + RUNFFS64N (int64_t, uint8_t) \ + RUNFFS64N (int64_t, int64_t) \ + RUNFFS64N (int64_t, int32_t) \ + RUNFFS64N (int64_t, int16_t) \ + RUNFFS64N (int64_t, int8_t) \ + RUNFFS64 (uint32_t, uint64_t) \ + RUNFFS64 (uint32_t, uint32_t) \ + RUNFFS64 (uint32_t, uint16_t) \ + RUNFFS64 (uint32_t, uint8_t) \ + RUNFFS64 (uint32_t, int64_t) \ + RUNFFS64 (uint32_t, int32_t) \ + RUNFFS64 
(uint32_t, int16_t) \ + RUNFFS64 (uint32_t, int8_t) \ + RUNFFS64N (int32_t, uint64_t) \ + RUNFFS64N (int32_t, uint32_t) \ + RUNFFS64N (int32_t, uint16_t) \ + RUNFFS64N (int32_t, uint8_t) \ + RUNFFS64N (int32_t, int64_t) \ + RUNFFS64N (int32_t, int32_t) \ + RUNFFS64N (int32_t, int16_t) \ + RUNFFS64N (int32_t, int8_t) \ + RUNFFS64 (uint16_t, uint64_t) \ + RUNFFS64 (uint16_t, uint32_t) \ + RUNFFS64 (uint16_t, uint16_t) \ + RUNFFS64 (uint16_t, uint8_t) \ + RUNFFS64 (uint16_t, int64_t) \ + RUNFFS64 (uint16_t, int32_t) \ + RUNFFS64 (uint16_t, int16_t) \ + RUNFFS64 (uint16_t, int8_t) \ + RUNFFS64N (int16_t, uint64_t) \ + RUNFFS64N (int16_t, uint32_t) \ + RUNFFS64N (int16_t, uint16_t) \ + RUNFFS64N (int16_t, uint8_t) \ + RUNFFS64N (int16_t, int64_t) \ + RUNFFS64N (int16_t, int32_t) \ + RUNFFS64N (int16_t, int16_t) \ + RUNFFS64N (int16_t, int8_t) \ + RUNFFS64 (uint8_t, uint64_t) \ + RUNFFS64 (uint8_t, uint32_t) \ + RUNFFS64 (uint8_t, uint16_t) \ + RUNFFS64 (uint8_t, uint8_t) \ + RUNFFS64 (uint8_t, int64_t) \ + RUNFFS64 (uint8_t, int32_t) \ + RUNFFS64 (uint8_t, int16_t) \ + RUNFFS64 (uint8_t, int8_t) \ + RUNFFS64N (int8_t, uint64_t) \ + RUNFFS64N (int8_t, uint32_t) \ + RUNFFS64N (int8_t, uint16_t) \ + RUNFFS64N (int8_t, uint8_t) \ + RUNFFS64N (int8_t, int64_t) \ + RUNFFS64N (int8_t, int32_t) \ + RUNFFS64N (int8_t, int16_t) \ + RUNFFS64N (int8_t, int8_t) \ + RUNFFS32 (uint64_t, uint64_t) \ + RUNFFS32 (uint64_t, uint32_t) \ + RUNFFS32 (uint64_t, uint16_t) \ + RUNFFS32 (uint64_t, uint8_t) \ + RUNFFS32 (uint64_t, int64_t) \ + RUNFFS32 (uint64_t, int32_t) \ + RUNFFS32 (uint64_t, int16_t) \ + RUNFFS32 (uint64_t, int8_t) \ + RUNFFS32N (int64_t, uint64_t) \ + RUNFFS32N (int64_t, uint32_t) \ + RUNFFS32N (int64_t, uint16_t) \ + RUNFFS32N (int64_t, uint8_t) \ + RUNFFS32N (int64_t, int64_t) \ + RUNFFS32N (int64_t, int32_t) \ + RUNFFS32N (int64_t, int16_t) \ + RUNFFS32N (int64_t, int8_t) \ + RUNFFS32 (uint32_t, uint64_t) \ + RUNFFS32 (uint32_t, uint32_t) \ + RUNFFS32 (uint32_t, uint16_t) \ + RUNFFS32 (uint32_t, uint8_t) \ + RUNFFS32 (uint32_t, int64_t) \ + RUNFFS32 (uint32_t, int32_t) \ + RUNFFS32 (uint32_t, int16_t) \ + RUNFFS32 (uint32_t, int8_t) \ + RUNFFS32N (int32_t, uint64_t) \ + RUNFFS32N (int32_t, uint32_t) \ + RUNFFS32N (int32_t, uint16_t) \ + RUNFFS32N (int32_t, uint8_t) \ + RUNFFS32N (int32_t, int64_t) \ + RUNFFS32N (int32_t, int32_t) \ + RUNFFS32N (int32_t, int16_t) \ + RUNFFS32N (int32_t, int8_t) \ + RUNFFS32 (uint16_t, uint64_t) \ + RUNFFS32 (uint16_t, uint32_t) \ + RUNFFS32 (uint16_t, uint16_t) \ + RUNFFS32 (uint16_t, uint8_t) \ + RUNFFS32 (uint16_t, int64_t) \ + RUNFFS32 (uint16_t, int32_t) \ + RUNFFS32 (uint16_t, int16_t) \ + RUNFFS32 (uint16_t, int8_t) \ + RUNFFS32N (int16_t, uint64_t) \ + RUNFFS32N (int16_t, uint32_t) \ + RUNFFS32N (int16_t, uint16_t) \ + RUNFFS32N (int16_t, uint8_t) \ + RUNFFS32N (int16_t, int64_t) \ + RUNFFS32N (int16_t, int32_t) \ + RUNFFS32N (int16_t, int16_t) \ + RUNFFS32N (int16_t, int8_t) \ + RUNFFS32 (uint8_t, uint64_t) \ + RUNFFS32 (uint8_t, uint32_t) \ + RUNFFS32 (uint8_t, uint16_t) \ + RUNFFS32 (uint8_t, uint8_t) \ + RUNFFS32 (uint8_t, int64_t) \ + RUNFFS32 (uint8_t, int32_t) \ + RUNFFS32 (uint8_t, int16_t) \ + RUNFFS32 (uint8_t, int8_t) \ + RUNFFS32N (int8_t, uint64_t) \ + RUNFFS32N (int8_t, uint32_t) \ + RUNFFS32N (int8_t, uint16_t) \ + RUNFFS32N (int8_t, uint8_t) \ + RUNFFS32N (int8_t, int64_t) \ + RUNFFS32N (int8_t, int32_t) \ + RUNFFS32N (int8_t, int16_t) \ + RUNFFS32N (int8_t, int8_t) + +int +main () +{ + RUN_ALL () +} + +/* { dg-final { scan-tree-dump-times 
"LOOP VECTORIZED" 229 "vect" } } */