From patchwork Thu Oct 26 20:29:39 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Robin Dapp
X-Patchwork-Id: 158681
Message-ID: <207cb501-558a-445b-95b5-b891819aefb6@gmail.com>
Date: Thu, 26 Oct 2023 22:29:39 +0200
User-Agent: Mozilla Thunderbird
Cc: rdapp.gcc@gmail.com
Content-Language: en-US
To: gcc-patches, palmer, Kito Cheng, jeffreyalaw, "juzhe.zhong@rivai.ai"
From: Robin Dapp
Subject: [PATCH] RISC-V: Add rawmemchr expander.

Hi,

this patch adds a vectorized rawmemchr expander.  It's basically memchr
without a length limit, for 8-, 16- and 32-bit needles.  Apart from
adjusting the common-code tests I re-used a similar test that Stefan
added to the s390 backend.

Regards
 Robin

gcc/ChangeLog:

	* config/riscv/autovec.md (rawmemchr): New expander.
	* config/riscv/riscv-protos.h (enum insn_type): Define.
	(expand_rawmemchr): New function.
	* config/riscv/riscv-v.cc (expand_rawmemchr): Add vectorized
	rawmemchr.
	* internal-fn.cc (expand_RAWMEMCHR): Fix typo.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ldist-rawmemchr-1.c: Add riscv.
	* gcc.dg/tree-ssa/ldist-rawmemchr-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/rawmemchr-1.c: New test.
---
 gcc/config/riscv/autovec.md                   | 13 +++
 gcc/config/riscv/riscv-protos.h               |  1 +
 gcc/config/riscv/riscv-v.cc                   | 99 +++++++++++++++++++
 gcc/internal-fn.cc                            |  2 +-
 .../gcc.dg/tree-ssa/ldist-rawmemchr-1.c       |  8 +-
 .../gcc.dg/tree-ssa/ldist-rawmemchr-2.c       |  8 +-
 .../riscv/rvv/autovec/rawmemchr-1.c           | 99 +++++++++++++++++++
 7 files changed, 221 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/rawmemchr-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 80910ba3cc2..5f49d73be44 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2367,3 +2367,16 @@ (define_expand "lfloor2"
     DONE;
   }
 )
+
+;; Implement rawmemchr[qi|si|hi].
+(define_expand "rawmemchr<ANYI:mode>"
+  [(match_operand      0 "register_operand")
+   (match_operand      1 "memory_operand")
+   (match_operand:ANYI 2 "const_int_operand")]
+  "TARGET_VECTOR"
+  {
+    riscv_vector::expand_rawmemchr(<MODE>mode, operands[0], operands[1],
+                                   operands[2]);
+    DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 668d75043ca..3c092c82ab1 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -522,6 +522,7 @@ void expand_cond_unop (unsigned, rtx *);
 void expand_cond_binop (unsigned, rtx *);
 void expand_cond_ternop (unsigned, rtx *);
 void expand_popcount (rtx *);
+void expand_rawmemchr (machine_mode, rtx, rtx, rtx);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 3fe8125801b..653796a5ad2 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2215,6 +2215,105 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
   return true;
 }
 
+/* Implement rawmemchr using vector instructions.
+   It can be assumed that the needle is in the haystack, otherwise the
+   behavior is undefined.  */
+
+void
+expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
+{
+  /*
+    rawmemchr:
+    loop:
+       vsetvli a1, zero, e[8,16,32,64], m1, ta, ma
+       vle[8,16,32,64]ff.v v8, (a0)  # Load.
+       csrr a1, vl                   # Get number of bytes read.
+       vmseq.vx v0, v8, pat          # v0 = (v8 == {pat, pat, ...})
+       vfirst.m a2, v0               # Find first hit.
+       add a0, a0, a1                # Bump pointer.
+       bltz a2, loop                 # Not found?
+
+       sub a0, a0, a1                # Go back by a1.
+       shll a2, a2, [0,1,2,3]        # Shift to get byte offset.
+       add a0, a0, a2                # Add the offset.
+
+       ret
+  */
+  gcc_assert (TARGET_VECTOR);
+
+  machine_mode vmode;
+  switch (mode)
+    {
+      case QImode:
+        vmode = E_RVVM1QImode;
+        break;
+      case HImode:
+        vmode = E_RVVM1HImode;
+        break;
+      case SImode:
+        vmode = E_RVVM1SImode;
+        break;
+      case DImode:
+        vmode = E_RVVM1DImode;
+        break;
+      default:
+        gcc_unreachable ();
+    }
+  machine_mode mask_mode = get_mask_mode (vmode);
+
+  rtx cnt = gen_reg_rtx (Pmode);
+  rtx end = gen_reg_rtx (Pmode);
+  rtx vec = gen_reg_rtx (vmode);
+  rtx mask = gen_reg_rtx (mask_mode);
+
+  /* After finding the first vector element matching the needle, we
+     need to multiply by the vector element width (SEW) in order to
+     return a pointer to the matching byte.  */
+  unsigned int shift = exact_log2 (GET_MODE_SIZE (mode).to_constant ());
+
+  rtx src_addr = copy_addr_to_reg (XEXP (src, 0));
+
+  rtx loop = gen_label_rtx ();
+  emit_label (loop);
+
+  rtx vsrc = change_address (src, vmode, src_addr);
+
+  /* Emit a first-fault load.  */
+  rtx vlops[] = {vec, vsrc};
+  emit_vlmax_insn (code_for_pred_fault_load (vmode), UNARY_OP, vlops);
+
+  /* Read how far we read.  */
+  if (Pmode == SImode)
+    emit_insn (gen_read_vlsi (cnt));
+  else
+    emit_insn (gen_read_vldi_zero_extend (cnt));
+
+  /* Compare needle with haystack and store in a mask.  */
+  rtx eq = gen_rtx_EQ (mask_mode, gen_const_vec_duplicate (vmode, pat), vec);
+  rtx vmsops[] = {mask, eq, vec, pat};
+  emit_nonvlmax_insn (code_for_pred_eqne_scalar (vmode), COMPARE_OP, vmsops,
+                      cnt);
+
+  /* Find the first bit in the mask.  */
+  rtx vfops[] = {end, mask};
+  emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
+                      CPOP_OP, vfops, cnt);
+
+  /* Bump the pointer.  */
+  emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, cnt)));
+
+  /* Emit the loop condition.  */
+  rtx test = gen_rtx_LT (VOIDmode, end, const0_rtx);
+  emit_jump_insn (gen_cbranch4 (Pmode, test, end, const0_rtx, loop));
+
+  /* We overran by CNT, subtract it.  */
+  emit_insn (gen_rtx_SET (src_addr, gen_rtx_MINUS (Pmode, src_addr, cnt)));
+
+  /* We found something at SRC + END * [1,2,4,8].  */
+  emit_insn (gen_rtx_SET (end, gen_rtx_ASHIFT (Pmode, end, GEN_INT (shift))));
+  emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+}
+
 /* Return the vectorization machine mode for RVV according to LMUL.  */
 machine_mode
 preferred_simd_mode (scalar_mode mode)
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 61d5a9e4772..e7451b96353 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3208,7 +3208,7 @@ expand_VEC_CONVERT (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
-/* Expand IFN_RAWMEMCHAR internal function.  */
+/* Expand IFN_RAWMEMCHR internal function.  */
 
 void
 expand_RAWMEMCHR (internal_fn, gcall *stmt)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-rawmemchr-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-rawmemchr-1.c
index bf6335f6360..adf53b10def 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-rawmemchr-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-rawmemchr-1.c
@@ -1,9 +1,9 @@
-/* { dg-do run { target s390x-*-* } } */
+/* { dg-do run { target { { s390x-*-* } || { riscv_v } } } } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-details" } */
 /* { dg-additional-options "-march=z13 -mzarch" { target s390x-*-* } } */
-/* { dg-final { scan-tree-dump-times "generated rawmemchrQI" 2 "ldist" { target s390x-*-* } } } */
-/* { dg-final { scan-tree-dump-times "generated rawmemchrHI" 2 "ldist" { target s390x-*-* } } } */
-/* { dg-final { scan-tree-dump-times "generated rawmemchrSI" 2 "ldist" { target s390x-*-* } } } */
+/* { dg-final { scan-tree-dump-times "generated rawmemchrQI" 2 "ldist" { target { { s390x-*-* } || { riscv_v } } } } } */
+/* { dg-final { scan-tree-dump-times "generated rawmemchrHI" 2 "ldist" { target { { s390x-*-* } || { riscv_v } } } } } */
+/* { dg-final { scan-tree-dump-times "generated rawmemchrSI" 2 "ldist" { target { { s390x-*-* } || { riscv_v } } } } } */
 
 /* Rawmemchr pattern: reduction stmt and no store */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-rawmemchr-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-rawmemchr-2.c
index 83f5a35a322..6c8a485a3aa 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-rawmemchr-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-rawmemchr-2.c
@@ -1,9 +1,9 @@
-/* { dg-do run { target s390x-*-* } } */
+/* { dg-do run { target { { s390x-*-* } || { riscv_v } } } } */
 /* { dg-options "-O2 -ftree-loop-distribution -fdump-tree-ldist-details" } */
 /* { dg-additional-options "-march=z13 -mzarch" { target s390x-*-* } } */
-/* { dg-final { scan-tree-dump-times "generated rawmemchrQI" 2 "ldist" { target s390x-*-* } } } */
-/* { dg-final { scan-tree-dump-times "generated rawmemchrHI" 2 "ldist" { target s390x-*-* } } } */
-/* { dg-final { scan-tree-dump-times "generated rawmemchrSI" 2 "ldist" { target s390x-*-* } } } */
+/* { dg-final { scan-tree-dump-times "generated rawmemchrQI" 2 "ldist" { target { { s390x-*-* } || { riscv_v } } } } } */
+/* { dg-final { scan-tree-dump-times "generated rawmemchrHI" 2 "ldist" { target { { s390x-*-* } || { riscv_v } } } } } */
+/* { dg-final { scan-tree-dump-times "generated rawmemchrSI" 2 "ldist" { target { { s390x-*-* } || { riscv_v } } } } } */
 
 /* Rawmemchr pattern: reduction stmt and store */
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/rawmemchr-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/rawmemchr-1.c
new file mode 100644
index 00000000000..ba83cb3836f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/rawmemchr-1.c
@@ -0,0 +1,99 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=gnu99 -O2 -ftree-loop-distribution -fdump-tree-ldist-details" } */
+/* { dg-final { scan-tree-dump-times "generated rawmemchrQI" 2 "ldist" } } */
+/* { dg-final { scan-tree-dump-times "generated rawmemchrHI" 2 "ldist" } } */
+/* { dg-final { scan-tree-dump-times "generated rawmemchrSI" 2 "ldist" } } */
+
+#include <string.h>
+#include <assert.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+#define rawmemchrT(T, pattern)     \
+__attribute__((noinline,noclone))  \
+T* rawmemchr_##T (T *s)            \
+{                                  \
+  while (*s != pattern)            \
+    ++s;                           \
+  return s;                        \
+}
+
+rawmemchrT(int8_t, (int8_t)0xde)
+rawmemchrT(uint8_t, 0xde)
+rawmemchrT(int16_t, (int16_t)0xdead)
+rawmemchrT(uint16_t, 0xdead)
+rawmemchrT(int32_t, (int32_t)0xdeadbeef)
+rawmemchrT(uint32_t, 0xdeadbeef)
+
+#define runT(T, pattern)                           \
+void run_##T ()                                    \
+{                                                  \
+  T *buf = malloc (4096 * 2 * sizeof(T));          \
+  assert (buf != NULL);                            \
+  memset (buf, 0xa, 4096 * 2 * sizeof(T));         \
+  /* ensure q is 4096-byte aligned */              \
+  T *q = (T*)((unsigned char *)buf                 \
+              + (4096 - ((uintptr_t)buf & 4095))); \
+  T *p;                                            \
+  /* unaligned + block boundary + 1st load */      \
+  p = (T *) ((uintptr_t)q - 8);                    \
+  p[2] = pattern;                                  \
+  assert ((rawmemchr_##T (&p[0]) == &p[2]));       \
+  p[2] = (T) 0xaaaaaaaa;                           \
+  /* unaligned + block boundary + 2nd load */      \
+  p = (T *) ((uintptr_t)q - 8);                    \
+  p[6] = pattern;                                  \
+  assert ((rawmemchr_##T (&p[0]) == &p[6]));       \
+  p[6] = (T) 0xaaaaaaaa;                           \
+  /* unaligned + 1st load */                       \
+  q[5] = pattern;                                  \
+  assert ((rawmemchr_##T (&q[2]) == &q[5]));       \
+  q[5] = (T) 0xaaaaaaaa;                           \
+  /* unaligned + 2nd load */                       \
+  q[14] = pattern;                                 \
+  assert ((rawmemchr_##T (&q[2]) == &q[14]));      \
+  q[14] = (T) 0xaaaaaaaa;                          \
+  /* unaligned + 3rd load */                       \
+  q[19] = pattern;                                 \
+  assert ((rawmemchr_##T (&q[2]) == &q[19]));      \
+  q[19] = (T) 0xaaaaaaaa;                          \
+  /* unaligned + 4th load */                       \
+  q[25] = pattern;                                 \
+  assert ((rawmemchr_##T (&q[2]) == &q[25]));      \
+  q[25] = (T) 0xaaaaaaaa;                          \
+  /* aligned + 1st load */                         \
+  q[5] = pattern;                                  \
+  assert ((rawmemchr_##T (&q[0]) == &q[5]));       \
+  q[5] = (T) 0xaaaaaaaa;                           \
+  /* aligned + 2nd load */                         \
+  q[14] = pattern;                                 \
+  assert ((rawmemchr_##T (&q[0]) == &q[14]));      \
+  q[14] = (T) 0xaaaaaaaa;                          \
+  /* aligned + 3rd load */                         \
+  q[19] = pattern;                                 \
+  assert ((rawmemchr_##T (&q[0]) == &q[19]));      \
+  q[19] = (T) 0xaaaaaaaa;                          \
+  /* aligned + 4th load */                         \
+  q[25] = pattern;                                 \
+  assert ((rawmemchr_##T (&q[0]) == &q[25]));      \
+  q[25] = (T) 0xaaaaaaaa;                          \
+  free (buf);                                      \
+}
+
+runT(int8_t, (int8_t)0xde)
+runT(uint8_t, 0xde)
+runT(int16_t, (int16_t)0xdead)
+runT(uint16_t, 0xdead)
+runT(int32_t, (int32_t)0xdeadbeef)
+runT(uint32_t, 0xdeadbeef)
+
+int main (void)
+{
+  run_uint8_t ();
+  run_int8_t ();
+  run_uint16_t ();
+  run_int16_t ();
+  run_uint32_t ();
+  run_int32_t ();
+  return 0;
+}
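
As a reading aid for the loop sketched in the expand_rawmemchr comment,
here is a scalar C model for 16-bit elements.  It is only a sketch under
stated assumptions, not part of the patch: MODEL_VL, model_vfirst and
model_rawmemchr_hi are made-up names, and a fixed MODEL_VL stands in for
the vl that the fault-only-first load would report.  The model advances
by whole elements and turns the element index of the hit into a byte
offset by shifting with log2 of the element size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk length in elements.  Real hardware derives vl from
   VLEN, SEW and how far the fault-only-first load got.  */
#define MODEL_VL 4

/* Model of vmseq.vx followed by vfirst.m: index of the first element in
   chunk[0..vl) equal to PAT, or -1 if none matches.  It stops at the
   first match, so it never reads past the needle.  */
static long
model_vfirst (const uint16_t *chunk, size_t vl, uint16_t pat)
{
  for (size_t i = 0; i < vl; i++)
    if (chunk[i] == pat)
      return (long) i;
  return -1;
}

/* Scalar model of the strip-mined search.  As with rawmemchr itself,
   the needle must be present in the haystack.  */
static uint16_t *
model_rawmemchr_hi (uint16_t *s, uint16_t pat)
{
  const unsigned shift = 1;  /* log2 (sizeof (uint16_t)).  */

  for (;;)
    {
      long end = model_vfirst (s, MODEL_VL, pat);
      if (end >= 0)
        {
          /* Element index -> byte offset, then add it to the chunk base.  */
          size_t byte_off = (size_t) end << shift;
          return (uint16_t *) ((unsigned char *) s + byte_off);
        }
      s += MODEL_VL;  /* No hit in this chunk, advance by vl elements.  */
    }
}
```

Calling model_rawmemchr_hi (buf, 0xdead) on a buffer whose tenth element
is 0xdead returns &buf[9], regardless of whether the search starts at a
chunk boundary.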