From patchwork Tue Dec 19 09:53:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sergei Lewis X-Patchwork-Id: 180851 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1826635dyi; Tue, 19 Dec 2023 01:54:21 -0800 (PST) X-Google-Smtp-Source: AGHT+IGRSOflLQjhlDIZtwPWGjw68E2JmVhrHmcV6SeEsJ9RqlLH02uLbUCaGLVJrA7aZYzB2DqU X-Received: by 2002:a05:622a:1708:b0:41e:1d15:69a6 with SMTP id h8-20020a05622a170800b0041e1d1569a6mr27372713qtk.31.1702979661041; Tue, 19 Dec 2023 01:54:21 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1702979661; cv=pass; d=google.com; s=arc-20160816; b=S4wEvAMdbVdTc49PEmbuKtHd+hbO1aHidGpSEqY27tsUGMIPgNFbktGToUxQmUtyMH //eC5mC7WdywnyGPlFHLyuECQEiT9Rj4hXSyhPjx71VhNgJOqTfqJOfOsJkzf6Kt6RG2 NT6dPFp050AkEgnqOhADw9F6PJ3yKwlZSAQ+VqOrJo4XxFc/s19c0HRZIIL6urBlmPRy gNdj6QZJ1HheGg1LZbVo5Ei4Tluj4pj+x0TTHwWXtpVOSMlDD8Kt5HtQ0jmhF3GKxfy6 JfBZpyIh7NG27w8Vw1WSRVZzhhEz3jI1vkRpMmH8ekZ597idc7WaQX8PAi8J3R2owcSK FbcA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature:arc-filter:dmarc-filter:delivered-to; bh=0QiLqOW6hhzpOcupW89dRnkLpt88za/TgK25dLcOyBY=; fh=hPrbWPhweUx4V0GV9uXJqbyAzg2ABmTz7kczrAQqMmM=; b=pP2andmL9UiuPKdUcrQD7wnoO1Kawa3AR8aly16cMPMe5AhiwDk1SyGiimp0Za5Nwf GCcuDru95kwLnM0H8VabE/Mo+Jz/+XTtyCQTCWpu6aXANZtWb87M7eGhrQCT/+3TWNHa rGm4vm+W2h1r/5v1WOIwr0XZP8SISj5Gswly5V1WCUSNVKiEUI5BWzuKoYc6nNIpuCe5 uepAlvqBBLewDbgfmofLPgeE32Z35Wsc8c8OJ/DAv3uXroGNXtYzxOZRBTC8Ngz+TUS/ hhx9QutHdpTgnNvG3KOTtyWSsnltNnHxfW9dTPMjZDJO03B2x5oj5CLEQyhZqino4tSS nSbg== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=c53gS7Tb; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id bv6-20020a05622a0a0600b004238551943bsi28438718qtb.516.2023.12.19.01.54.20 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Dec 2023 01:54:21 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=c53gS7Tb; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4B475385E45F for ; Tue, 19 Dec 2023 09:54:19 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by sourceware.org (Postfix) with ESMTPS id D66FA3858439 for ; Tue, 19 Dec 2023 09:53:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D66FA3858439 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D66FA3858439 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::42f ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702979634; cv=none; b=l3oRlp6zNGuRzW5pLUtRpremuwWLFBpH/rP46XpQE7I6RyPxLwLQWH280Lz6UDwe2KMUs2EwCAa9sVB59G3Tc2ZjgX0PToDhPohjdu1vgiRdF+DCOSFSA12KnAiYOcfw8qIENFf2qQ0NF8fFIZtW8SkJHG7zMPVu+6tUI88uc6M= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702979634; c=relaxed/simple; bh=AzmlhSvYWgVR6k748R1nfJl8Uj8eb8o49vjFI0kVdVk=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=JAjlzaZDjJ5r3PPAIkPRKolkDb6aaQBEtCkbTmh5KWeWGV4E6hn1lgrNjU5MVgsIxGWd7/AwJ71B7z817oBqY0uJLNCbNM6wAVtg5k8Cpc1+BjxvOIpn1XNUZ5NinZh5mSTM/X171SzDGUZnvIdAM3KXTEDX0Cl55WrvaOGkqwI= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wr1-x42f.google.com with SMTP id ffacd0b85a97d-33664b4e038so2302864f8f.3 for ; Tue, 19 Dec 2023 01:53:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1702979631; x=1703584431; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=0QiLqOW6hhzpOcupW89dRnkLpt88za/TgK25dLcOyBY=; b=c53gS7TbXU24rYYKR8vaaFHdTZ0IaV/mRwCSeDFOorv6Bb0zUB6HiYvPL+Qe6ZSpqG AcfERc9FPN9FzU1LiVwCtkumuiTvgD5lLQfEhbcjTESrqCYGkb0u9oUMD3viFh05aPew VbXi0CeNMw9t0UTSMmNg/9GibV9PqaeA7Kpjb8ky1UP/noywtY9j6cUvWxUpnfqvOTYQ a2nCQdtfjjHMHzdOLAGRDXgzn44+AeHiPKvq5gik89uZPAeBiuQwA2M2gSKc8xP2VniJ 8Kb1czgTNDYdjlDgbAPIBYOaacjgVVtH+6yEnv/DXOqaVi4WcbNaQYmjgh1VYCbMRf3p oMOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702979631; x=1703584431; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=0QiLqOW6hhzpOcupW89dRnkLpt88za/TgK25dLcOyBY=; b=aLC8qCw9Bm/CQ15KoJsUz57vSpsq7UmYAHKxfCPSq5Z7RyOIYIapIIN+0OT0a7Lbfy OT2PZPyueqRwEW/4j43+3z1V3Ns2uyDqz1JtCv+w2cwjdTkljJz9zqzomeDnaimwpMbt Pe7gaVoC3xiZNGDIS4/izdcfpjqj2quUl78ZqColgdFrXddHiFULxaLvswRdZLD8XYnV ZzuXwkdbGgHjrA76VekVOn8jQW5nx/kebiLyaanpBcelMPIQGfF6V9CouJVccbpbFDur kwe5bpRzwocgueBLnkv8njqGVuA6Ms6QWkcL5KK9K29yI9yh3NW4ZzYYzKOd/nrZpaWA s4ng== X-Gm-Message-State: AOJu0YznSnvDNMIDaAsIXtQjZqzhU7rzXk7Xr+DvzAukdpVgudk2nbtB JwVjtWhXqWnxjk6VTC3E2+/g4+Pgfqor1FBh6O1Jvg== X-Received: by 2002:a05:600c:1395:b0:40c:6e33:e212 with SMTP id u21-20020a05600c139500b0040c6e33e212mr3130872wmf.67.1702979630241; Tue, 19 Dec 2023 01:53:50 -0800 (PST) Received: from slewis-laptop.ba.rivosinc.com ([51.52.155.69]) by smtp.gmail.com with ESMTPSA id q19-20020a05600c46d300b0040b632f31d2sm2079985wmo.5.2023.12.19.01.53.49 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Dec 2023 01:53:49 -0800 (PST) From: Sergei Lewis To: gcc-patches@gcc.gnu.org Subject: [PATCH v2 1/3] RISC-V: movmem for RISCV with V extension Date: Tue, 19 Dec 2023 09:53:46 +0000 Message-Id: <20231219095348.356551-2-slewis@rivosinc.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231219095348.356551-1-slewis@rivosinc.com> References: <20231219095348.356551-1-slewis@rivosinc.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785703601168138973 X-GMAIL-MSGID: 1785703601168138973 gcc/ChangeLog * config/riscv/riscv.md (movmem): Use riscv_vector::expand_block_move, if and only if we know the entire operation can be performed using one vector load followed by one vector store gcc/testsuite/ChangeLog PR target/112109 * gcc.target/riscv/rvv/base/movmem-1.c: New test --- gcc/config/riscv/riscv.md | 22 +++++++ .../gcc.target/riscv/rvv/base/movmem-1.c | 60 +++++++++++++++++++ 2 files changed, 82 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index ee8b71c22aa..1b3f66fd15c 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -2365,6 +2365,28 @@ FAIL; }) +;; Inlining general memmove is a pessimisation: we can't avoid having to decide +;; which direction to go at runtime, which is costly in instruction count +;; however for situations where the entire move fits in one vector operation +;; we can do all reads before doing any writes so we don't have to worry +;; so generate the inline vector code in such situations +;; nb. prefer scalar path for tiny memmoves. +(define_expand "movmem" + [(parallel [(set (match_operand:BLK 0 "general_operand") + (match_operand:BLK 1 "general_operand")) + (use (match_operand:P 2 "const_int_operand")) + (use (match_operand:SI 3 "const_int_operand"))])] + "TARGET_VECTOR" +{ + if ((INTVAL (operands[2]) >= TARGET_MIN_VLEN/8) + && (INTVAL (operands[2]) <= TARGET_MIN_VLEN) + && riscv_vector::expand_block_move (operands[0], operands[1], + operands[2])) + DONE; + else + FAIL; +}) + ;; Expand in-line code to clear the instruction cache between operand[0] and ;; operand[1]. (define_expand "clear_cache" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c new file mode 100644 index 00000000000..0ecc3f7e3b7 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c @@ -0,0 +1,60 @@ +/* { dg-do compile } */ +/* { dg-add-options riscv_v } */ +/* { dg-additional-options "-O3 --param=riscv-autovec-lmul=dynamic" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) + +/* Tiny memmoves should not be vectorised. +** f1: +** li\s+a2,\d+ +** tail\s+memmove +*/ +char * +f1 (char *a, char const *b) +{ + return __builtin_memmove (a, b, MIN_VECTOR_BYTES - 1); +} + +/* Vectorise+inline minimum vector register width with LMUL=1 +** f2: +** ( +** vsetivli\s+zero,16,e8,m1,ta,ma +** | +** li\s+[ta][0-7],\d+ +** vsetvli\s+zero,[ta][0-7],e8,m1,ta,ma +** ) +** vle8\.v\s+v\d+,0\(a1\) +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +char * +f2 (char *a, char const *b) +{ + return __builtin_memmove (a, b, MIN_VECTOR_BYTES); +} + +/* Vectorise+inline up to LMUL=8 +** f3: +** li\s+[ta][0-7],\d+ +** vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma +** vle8\.v\s+v\d+,0\(a1\) +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +char * +f3 (char *a, char const *b) +{ + return __builtin_memmove (a, b, MIN_VECTOR_BYTES * 8); +} + +/* Don't vectorise if the move is too large for one operation +** f4: +** li\s+a2,\d+ +** tail\s+memmove +*/ +char * +f4 (char *a, char const *b) +{ + return __builtin_memmove (a, b, MIN_VECTOR_BYTES * 8 + 1); +} From patchwork Tue Dec 19 09:53:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sergei Lewis X-Patchwork-Id: 180853 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1826792dyi; Tue, 19 Dec 2023 01:54:52 -0800 (PST) X-Google-Smtp-Source: AGHT+IF3A6owcXi0ure/lChYMBpgXwiLTL3vLw2cGMETPGuiHjUrovQBR3OwS9+IXT4iZ1a04HwB X-Received: by 2002:a05:622a:38e:b0:421:ba91:f590 with SMTP id j14-20020a05622a038e00b00421ba91f590mr23578036qtx.1.1702979691937; Tue, 19 Dec 2023 01:54:51 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1702979691; cv=pass; d=google.com; s=arc-20160816; b=F+hD7UkC+PZoDggKSM65Ay2IB5DmctyCuA/IP3gwBVNZF+/ih01XET708YvkIkdc0f eBkTnKKYpVoYnwl3KkBiD9w3mzjWCMXLox14yoHjBs1gLsB1ZAjhH/dFOM9BRaTSTM51 snqZaS1hEzHBS3jDfo7VeYpkJ8YLGc7Z70TBS1a+bwAlRp65Ri5JDAKIZjeTwW7mI3QC 6+CZ21C+OPBEsYiN+48wsvkHaVn8BZZWuTJWFfmBTZZK6Z+GuGtjxJTvokT7lu75vE46 9zwAXgm5H23FjIqREVHSSc8TwlBUjWcVnVkjcgvcRw1hreGKS3fqhB5ioeSgc2rT72YJ mD4w== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature:arc-filter:dmarc-filter:delivered-to; bh=4svtwVnCcahwTuOm53ref/4bH6PwPLfcm7TatQZA5/4=; fh=hPrbWPhweUx4V0GV9uXJqbyAzg2ABmTz7kczrAQqMmM=; b=sn0RooDSYI9WRXjK670xfu3Hhs2S65wtnGC9ChqMtPd+RTnp0E32UokPL9Os348yQb GvZOtv8+4Ra4tOtbtnLgeWelJkbMrakmNEmOgYblf8xP5EwHluGT/0LF/giF3KZjvkxy AbsRuS5TZ7Fx22uw4A0ffL0lQNv7eNwSXpH/oFgabsMCeDGRRigfP4Eoq+x6HbyWmEZC m9s7Xw7o/Cj0g4gdlNH4NxF3Amx6SuljccS1sj0r4UYqS0aPaxHzKxrE3rGsitvH7y2A wKl5pnSneA09IELq9/rBmTVDiJwI8Z0jEYTNWp2wiz5j82XvGbaVJnN0M9tVt+/rSCPZ dYNg== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=ZNM5dE05; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id kg16-20020a05622a761000b004238a6e25fbsi24222284qtb.77.2023.12.19.01.54.51 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Dec 2023 01:54:51 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=ZNM5dE05; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id BA0AA386181A for ; Tue, 19 Dec 2023 09:54:48 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wm1-x335.google.com (mail-wm1-x335.google.com [IPv6:2a00:1450:4864:20::335]) by sourceware.org (Postfix) with ESMTPS id AAA613858282 for ; Tue, 19 Dec 2023 09:53:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AAA613858282 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org AAA613858282 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::335 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702979636; cv=none; b=KauaQ4pHh4wpMCQ1KwQcxtqkKM9Ku9ib5tnWPT3QQL3p2Z6L1OxVyfGWW4d/KTdWGP9AaXx0pTqdiZTDBhIb3uVKyWkrs4m5NjF2pwD13zC3nex5VZftbtbod6ZoDk8O3HzYgcqf/NOvJBnZE1eW9iZY7Gc3DdnJ1sCYQDu2XNU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702979636; c=relaxed/simple; bh=hYzLOiNx2razIdnMnmogHY+w+m/oxGTg3rMCk3o3/74=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=i5nu6+5y06hyK+/Ez/l8gmd8YSPNiz+NEz691lzmQEgn5iwaIkwWW9HS235zZfLmr2knTNJtGWXXad50qpx3CfdZrvwQMpdFWoJpbwrGR1zVu3k6FY6CdAoL9O7EOqxl5C6Hu++rI6J4SLY3ePcK5RUQEDt3JY2CKBK/5LUhrVE= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x335.google.com with SMTP id 5b1f17b1804b1-40c38e292c8so21537575e9.0 for ; Tue, 19 Dec 2023 01:53:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1702979631; x=1703584431; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=4svtwVnCcahwTuOm53ref/4bH6PwPLfcm7TatQZA5/4=; b=ZNM5dE05oV4SZ6lTcY2uIfPG4cywwbCGqDCK+4615OVX6LF1V93Nfhm10pZxeW0slS Fomq6c2ap3j6Qe8VCegj+su2BKSiNSAQCq6LUygsj9SdsiRQq/7NI61jwzDkXFi/CDDR qkAT2EJhnf1YEpDNuk3SmcrRG3T5QL+LWqzpXLhVxKFa2qMMpyEz+uQqyOGuNNeS6U4P jW3BdHfuvJiUor2SELLX48NawQY7lrDr9kUI5UnkpzvKQCILssl5e55f7xTgnPMaMKuP 5wDi49ecL6mHVNnW0CyFhHOEKoPfg6IPlaVPXbOtTT3ID4pHWht8RDcfamb+gu13YXrd becQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702979631; x=1703584431; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4svtwVnCcahwTuOm53ref/4bH6PwPLfcm7TatQZA5/4=; b=bca5XNVE91whQjbnK6F3C9ttKzu75xUw1jRXzYXb+cm5cGAQum3hMM/2GlZqBGkU/1 sLGiOmurEzSAZNdOE+Q7ddhPcRTk1/KN0nX3PDUZrO3/FuuGMnefIl4vIssx4KQSpZyi eqh57PY6AjG9bw9eMZKR1Sa4O+Jm3HKLAnv9TvU25mJSkCPTfppjcxruun4m8YpjlLQO beCTSjvWSHK8aefe399B7wT2wzr6tr9Ksg3l3i/GsnzWzo/YmY933DCfVNgsNti2fG8F T0Fg8WdZOmsUgOviBJ0ebU7hXA4hie4NV2TZBlCxrwCP5oH9Z2cQI28JRVGNkn1ywRIj fu1w== X-Gm-Message-State: AOJu0YydhZ+FHC9b2oGRZq3ICP/J1smXSzRGf720kwB7SZcFOpzXH7l+ dFxu/MUKO/Oqo7tOXLlIBLx56KfBclFvDsTeJJMzxw== X-Received: by 2002:a05:600c:695:b0:40c:2710:f67 with SMTP id a21-20020a05600c069500b0040c27100f67mr386600wmn.85.1702979630772; Tue, 19 Dec 2023 01:53:50 -0800 (PST) Received: from slewis-laptop.ba.rivosinc.com ([51.52.155.69]) by smtp.gmail.com with ESMTPSA id q19-20020a05600c46d300b0040b632f31d2sm2079985wmo.5.2023.12.19.01.53.50 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Dec 2023 01:53:50 -0800 (PST) From: Sergei Lewis To: gcc-patches@gcc.gnu.org Subject: [PATCH v2 2/3] RISC-V: setmem for RISCV with V extension Date: Tue, 19 Dec 2023 09:53:47 +0000 Message-Id: <20231219095348.356551-3-slewis@rivosinc.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231219095348.356551-1-slewis@rivosinc.com> References: <20231219095348.356551-1-slewis@rivosinc.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785703633079715705 X-GMAIL-MSGID: 1785703633079715705 gcc/ChangeLog * config/riscv/riscv-protos.h (riscv_vector::expand_vec_setmem): New function declaration. * config/riscv/riscv-string.cc (riscv_vector::expand_vec_setmem): New function: this generates an inline vectorised memory set, if and only if we know the entire operation can be performed in a single vector store * config/riscv/riscv.md (setmem): Try riscv_vector::expand_vec_setmem for constant lengths gcc/testsuite/ChangeLog * gcc.target/riscv/rvv/base/setmem-1.c: New tests * gcc.target/riscv/rvv/base/setmem-2.c: New tests * gcc.target/riscv/rvv/base/setmem-3.c: New tests --- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-string.cc | 90 +++++++++++++++ gcc/config/riscv/riscv.md | 14 +++ .../gcc.target/riscv/rvv/base/setmem-1.c | 103 ++++++++++++++++++ .../gcc.target/riscv/rvv/base/setmem-2.c | 51 +++++++++ .../gcc.target/riscv/rvv/base/setmem-3.c | 69 ++++++++++++ 6 files changed, 328 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index eaee53ce94e..c4531589300 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -637,6 +637,7 @@ void expand_popcount (rtx *); void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false); bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool); void emit_vec_extract (rtx, rtx, rtx); +bool expand_vec_setmem (rtx, rtx, rtx, rtx); /* Rounding mode bitfield for fixed point VXRM. */ enum fixed_point_rounding_mode diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index 11c1f74d0b3..e506b92a552 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -1247,4 +1247,94 @@ expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes, return true; } +/* Check we are permitted to vectorise a memory operation. + If so, return true and populate lmul_out. + Otherwise, return false and leave lmul_out unchanged. */ +static bool +check_vectorise_memory_operation (rtx length_in, HOST_WIDE_INT &lmul_out) +{ + /* If we either can't or have been asked not to vectorise, respect this. */ + if (!TARGET_VECTOR) + return false; + if (!(stringop_strategy & STRATEGY_VECTOR)) + return false; + + /* If we can't reason about the length, don't vectorise. */ + if (!CONST_INT_P (length_in)) + return false; + + HOST_WIDE_INT length = INTVAL (length_in); + + /* If it's tiny, default operation is likely better; maybe worth + considering fractional lmul in the future as well. */ + if (length < (TARGET_MIN_VLEN / 8)) + return false; + + /* If we've been asked to use a specific LMUL, + check the operation fits and do that. */ + if (riscv_autovec_lmul != RVV_DYNAMIC) + { + lmul_out = TARGET_MAX_LMUL; + return (length <= ((TARGET_MAX_LMUL * TARGET_MIN_VLEN) / 8)); + } + + /* Find smallest lmul large enough for entire op. */ + HOST_WIDE_INT lmul = 1; + while ((lmul <= 8) && (length > ((lmul * TARGET_MIN_VLEN) / 8))) + { + lmul <<= 1; + } + + if (lmul > 8) + return false; + + lmul_out = lmul; + return true; +} + +/* Used by setmemdi in riscv.md. */ +bool +expand_vec_setmem (rtx dst_in, rtx length_in, rtx fill_value_in, + rtx alignment_in) +{ + HOST_WIDE_INT lmul; + /* Check we are able and allowed to vectorise this operation; + bail if not. */ + if (!check_vectorise_memory_operation (length_in, lmul)) + return false; + + machine_mode vmode + = riscv_vector::get_vector_mode (QImode, BYTES_PER_RISCV_VECTOR * lmul) + .require (); + rtx dst_addr = copy_addr_to_reg (XEXP (dst_in, 0)); + rtx dst = change_address (dst_in, vmode, dst_addr); + + rtx fill_value = gen_reg_rtx (vmode); + rtx broadcast_ops[] = { fill_value, fill_value_in }; + + /* If the length is exactly vlmax for the selected mode, do that. + Otherwise, use a predicated store. */ + if (known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in))) + { + emit_vlmax_insn (code_for_pred_broadcast (vmode), UNARY_OP, + broadcast_ops); + emit_move_insn (dst, fill_value); + } + else + { + if (!satisfies_constraint_K (length_in)) + length_in = force_reg (Pmode, length_in); + emit_nonvlmax_insn (code_for_pred_broadcast (vmode), UNARY_OP, + broadcast_ops, length_in); + machine_mode mask_mode + = riscv_vector::get_vector_mode (BImode, GET_MODE_NUNITS (vmode)) + .require (); + rtx mask = CONSTM1_RTX (mask_mode); + emit_insn (gen_pred_store (vmode, dst, mask, fill_value, length_in, + get_avl_type_rtx (riscv_vector::NONVLMAX))); + } + + return true; +} + } diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 1b3f66fd15c..dd34211ca80 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -2387,6 +2387,20 @@ FAIL; }) +(define_expand "setmemsi" + [(set (match_operand:BLK 0 "memory_operand") ;; Dest + (match_operand:QI 2 "nonmemory_operand")) ;; Value + (use (match_operand:SI 1 "const_int_operand")) ;; Length + (match_operand:SI 3 "const_int_operand")] ;; Align + "TARGET_VECTOR" +{ + if (riscv_vector::expand_vec_setmem (operands[0], operands[1], operands[2], + operands[3])) + DONE; + else + FAIL; +}) + ;; Expand in-line code to clear the instruction cache between operand[0] and ;; operand[1]. (define_expand "clear_cache" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c new file mode 100644 index 00000000000..1c08be978a6 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c @@ -0,0 +1,103 @@ +/* { dg-do compile } */ +/* { dg-add-options riscv_v } */ +/* { dg-additional-options "-O3 --param=riscv-autovec-lmul=dynamic" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) + +/* Tiny memsets should use scalar ops. +** f1: +** sb\s+a1,0\(a0\) +** ret +*/ +void * +f1 (void *a, int const b) +{ + return __builtin_memset (a, b, 1); +} + +/* Tiny memsets should use scalar ops. +** f2: +** sb\s+a1,0\(a0\) +** sb\s+a1,1\(a0\) +** ret +*/ +void * +f2 (void *a, int const b) +{ + return __builtin_memset (a, b, 2); +} + +/* Tiny memsets should use scalar ops. +** f3: +** sb\s+a1,0\(a0\) +** sb\s+a1,1\(a0\) +** sb\s+a1,2\(a0\) +** ret +*/ +void * +f3 (void *a, int const b) +{ + return __builtin_memset (a, b, 3); +} + +/* Vectorise+inline minimum vector register width with LMUL=1 +** f4: +** ( +** vsetivli\s+zero,\d+,e8,m1,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m1,ta,ma +** ) +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +void * +f4 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES); +} + +/* Vectorised code should use smallest lmul known to fit length +** f5: +** ( +** vsetivli\s+zero,\d+,e8,m2,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m2,ta,ma +** ) +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +void * +f5 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES + 1); +} + +/* Vectorise+inline up to LMUL=8 +** f6: +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m8,ta,ma +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +void * +f6 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES * 8); +} + +/* Don't vectorise if the move is too large for one operation. +** f7: +** li\s+a2,\d+ +** tail\s+memset +*/ +void * +f7 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES * 8 + 1); +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c new file mode 100644 index 00000000000..82d181dff3f --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c @@ -0,0 +1,51 @@ +/* { dg-do compile } */ +/* { dg-add-options riscv_v } */ +/* { dg-additional-options "-O3 --param riscv-autovec-lmul=m1" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) + +/* Small memsets shouldn't be vectorised. +** f1: +** ( +** sb\s+a1,0\(a0\) +** ... +** | +** li\s+a2,\d+ +** tail\s+memset +** ) +*/ +void * +f1 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES - 1); +} + +/* Vectorise+inline minimum vector register width using requested lmul. +** f2: +** ( +** vsetivli\s+zero,\d+,e8,m1,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m1,ta,ma +** ) +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +void * +f2 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES); +} + +/* Don't vectorise if the move is too large for requested lmul. +** f3: +** li\s+a2,\d+ +** tail\s+memset +*/ +void * +f3 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES + 1); +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c new file mode 100644 index 00000000000..f043d9e0784 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c @@ -0,0 +1,69 @@ +/* { dg-do compile } */ +/* { dg-add-options riscv_v } */ +/* { dg-additional-options "-O3 --param riscv-autovec-lmul=m8" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) + +/* Small memsets shouldn't be vectorised. +** f1: +** ( +** sb\s+a1,0\(a0\) +** ... +** | +** li\s+a2,\d+ +** tail\s+memset +** ) +*/ +void * +f1 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES - 1); +} + +/* Vectorise+inline minimum vector register width using requested lmul. +** f2: +** ( +** vsetivli\s+zero,\d+,e8,m8,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m8,ta,ma +** ) +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +void * +f2 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES); +} + +/* Vectorise+inline operations up to requested lmul. +** f3: +** ( +** vsetivli\s+zero,\d+,e8,m8,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m8,ta,ma +** ) +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +void * +f3 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES * 8); +} + +/* Don't vectorise if the move is too large for requested lmul. +** f4: +** li\s+a2,\d+ +** tail\s+memset +*/ +void * +f4 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES * 8 + 1); +} From patchwork Tue Dec 19 09:53:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sergei Lewis X-Patchwork-Id: 180852 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1826744dyi; Tue, 19 Dec 2023 01:54:41 -0800 (PST) X-Google-Smtp-Source: AGHT+IF+epZEa1IareyLPb7PwQlJTwtqk3Bh5Jt5zw1TvMvReIBXTHXSW/jhB2d7JrHygq9q5+vI X-Received: by 2002:a05:622a:1c8:b0:425:4043:1d8a with SMTP id t8-20020a05622a01c800b0042540431d8amr22668081qtw.93.1702979681188; Tue, 19 Dec 2023 01:54:41 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1702979681; cv=pass; d=google.com; s=arc-20160816; b=RR635bU9Khb4WrK6UzqUwxw4F+oQCQCA4FqIPjf58qfAYV83Yt+MeU6LRYhfNU3yTa /Dbicz7j9Wtc6gInWG0EB6wR0R7Ks3IchluZclEK+9EFlUSC+1pT8iEeNaD4Ey2quKQL zh7eHDVQa0Umtz8YKFHElGQIW4jz8ECnHxoOIGECiFiBAL3dFap7wKsO4BSSg3IWmt4m z9a3XLGATjHx80QMO9ib8qpGJBBFDWHqadFFNFN7m6y3gca3Zpz+k4MCRh2KxtX4eH8W S7177XFl65uahQe+4jNUJzFJMlEpzqIGq/lDKDf9i8uQbD/oP1fTM9u2SINgIH9mkt0C 3Mzg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :dkim-signature:arc-filter:dmarc-filter:delivered-to; bh=Mrj6HRmH3G1MhcYY31iFtdJ/GcIY9QP9yyjd8lMOPKk=; fh=hPrbWPhweUx4V0GV9uXJqbyAzg2ABmTz7kczrAQqMmM=; b=t7Gy+sNcclOXzKICvLNQGZdw8W5j3XykE4+mvMNxUPnIFzs9UQP6IZTdSSoRrNOMaz Z4VB4xK3R3d80eCIGO9dn1ciiQMBRl+r3ZMuJB+9D5su5KGpJkNw18Q4pB1NAkusvvEF 7OX5mJAoQwLMnCLvlZRY7D2EYJPvp/ws7TV1hVAm7raCUcVuZL4/2gxxMu5/pk15/gxo f1kweJd5vi/MdI6G9cQo0/naHlHG33MxyEYRyZqA6TA3T1/hB/XdjiVtwuF8oHpS+mZT xG5zUZ+5ttRU7534FTdkULThjkBffKhcAWdy+0Y2Q5MJgWNy37S/H94CPn8XLaxBWYfK sQGQ== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=HgsowDYT; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id b10-20020ac844ca000000b0042377597590si23792385qto.379.2023.12.19.01.54.41 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Dec 2023 01:54:41 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=HgsowDYT; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D69DE3853328 for ; Tue, 19 Dec 2023 09:54:35 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wm1-x32c.google.com (mail-wm1-x32c.google.com [IPv6:2a00:1450:4864:20::32c]) by sourceware.org (Postfix) with ESMTPS id 405E9385801F for ; Tue, 19 Dec 2023 09:53:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 405E9385801F Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 405E9385801F Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::32c ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702979637; cv=none; b=KqRTJNJ6ngtW9CqdEiDoH+UqO0KrK+YW1gUP1T4Bs0SkIHi7RTjpcjcYLWEbM8unzhvlv26vHzxuvC2gPLX+yZQGlTT2M6IB+lg/n5MV8NatQ4E5ZqwZX5q8jSDF0vF6GRnzdpzqp5co4GcOp7j7wKqeovRRCtKBIxQq9fiIvlM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702979637; c=relaxed/simple; bh=1s8sfb2HthRb8B9t9k4SRtY2FXoYoo3+h9VNUAWqFxk=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=dmD9NYCI7F2FuvKOwx/2mPAITeU64MZA7aAnoe5nySFkxX3Jt4uy/NfgubxRAiOrr105Mjqz+OEmGik0lA6wrFTswerDmVLThRBO6JqXtk4bs+itlHiBFTyBPZYGt3kgaxNT+5dlYCgy+SYAhuhPhtr+1ew4jU8QrO2ao9CKxyg= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x32c.google.com with SMTP id 5b1f17b1804b1-40c580ba223so54327875e9.3 for ; Tue, 19 Dec 2023 01:53:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1702979633; x=1703584433; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=Mrj6HRmH3G1MhcYY31iFtdJ/GcIY9QP9yyjd8lMOPKk=; b=HgsowDYTue1laf2Wau1OyYwIQY9gl+dnzZfP8vS6dTgzGlO3tLk2oTJb79r7SJTFEB CAiXiNsiiraD+jGz9IAdTNn6DpalfE4V3nFmTydKKZDGdqOaUW6Lxvr2YspKYcMWAKQp Y4EmpbC2ZCvOMSD+wCtwz98LPizG/PDdJ4YSHAtdc2qPkkM8TPzYPCtj3ue/MYN1wr6y 7EiT1hJh6XgxXbkBB5bLghN7OKiCrS9X3mi3ZR93zJD2PiXgkMx5q2MLufHF3oq6kqdP R9HFvDte1Q9mJCRxlnm3rxvkhWFfe/y6MNwZoUGPnqTj/iD/XebIbBlTgXpLce3VKK0Y xZ5w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702979633; x=1703584433; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Mrj6HRmH3G1MhcYY31iFtdJ/GcIY9QP9yyjd8lMOPKk=; b=wigpk8W0aUBWRFk4XNeZ2jegL4fweEHobAaDykt5z1ESe5lrSGH+uMpPrN6C5D2NRx 3U0RxY3PTeVGF7GqGXmOB94iGN8bYrf6KFF8LBmiELq+sfSVyQzDmSz7GbHPNLjaKqKq Hm/NyB2UiukuhRnEmLMEhqwFl5mpuyS8NyOfU4Rem0PUhWbsgvcbXSI2Dv2YVNSoHJmE Mto0HgtyYymWdWpM1XAyanCg/F1z8RNXyaZ0uE909kNtEQFsrLRdE4cEfsgv7Uo5xWXi I90RkfPqZaQgpPCbcgMknyr51meP76W7MBFqANm/CsB1kTl6cSpfLCxkvBJmg1W3VmRV j02A== X-Gm-Message-State: AOJu0YzHCRFlpq/RnlaXzzlRJyk+zTXd+madvizVf46TM9tGTuc4X58Q u1PiTS/ag6S+o+u4MQxlL0a0MjTNO05an+g3O0HpGg== X-Received: by 2002:a05:600c:1f1a:b0:40c:16ee:3219 with SMTP id bd26-20020a05600c1f1a00b0040c16ee3219mr10991145wmb.165.1702979632941; Tue, 19 Dec 2023 01:53:52 -0800 (PST) Received: from slewis-laptop.ba.rivosinc.com ([51.52.155.69]) by smtp.gmail.com with ESMTPSA id q19-20020a05600c46d300b0040b632f31d2sm2079985wmo.5.2023.12.19.01.53.50 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Dec 2023 01:53:51 -0800 (PST) From: Sergei Lewis To: gcc-patches@gcc.gnu.org Subject: [PATCH v2 3/3] RISC-V: cmpmem for RISCV with V extension Date: Tue, 19 Dec 2023 09:53:48 +0000 Message-Id: <20231219095348.356551-4-slewis@rivosinc.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231219095348.356551-1-slewis@rivosinc.com> References: <20231219095348.356551-1-slewis@rivosinc.com> MIME-Version: 1.0 X-Spam-Status: No, score=-11.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785703622102360602 X-GMAIL-MSGID: 1785703622102360602 gcc/ChangeLog: * config/riscv/riscv-protos.h (riscv_vector::expand_vec_cmpmem): New function declaration. * config/riscv/riscv-string.cc (riscv_vector::expand_vec_cmpmem): New function; this generates an inline vectorised memory compare, if and only if we know the entire operation can be performed in a single vector load per input * config/riscv/riscv.md (cmpmemsi): Try riscv_vector::expand_vec_cmpmem for constant lengths gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/cmpmem-1.c: New codegen tests * gcc.target/riscv/rvv/base/cmpmem-2.c: New execution tests * gcc.target/riscv/rvv/base/cmpmem-3.c: New codegen tests * gcc.target/riscv/rvv/base/cmpmem-4.c: New codegen tests --- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-string.cc | 100 ++++++++++++++++++ gcc/config/riscv/riscv.md | 15 +++ .../gcc.target/riscv/rvv/base/cmpmem-1.c | 88 +++++++++++++++ .../gcc.target/riscv/rvv/base/cmpmem-2.c | 74 +++++++++++++ .../gcc.target/riscv/rvv/base/cmpmem-3.c | 45 ++++++++ .../gcc.target/riscv/rvv/base/cmpmem-4.c | 62 +++++++++++ 7 files changed, 385 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-4.c diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index c4531589300..301aa9b8889 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -638,6 +638,7 @@ void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false); bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool); void emit_vec_extract (rtx, rtx, rtx); bool expand_vec_setmem (rtx, rtx, rtx, rtx); +bool expand_vec_cmpmem (rtx, rtx, rtx, rtx); /* Rounding mode bitfield for fixed point VXRM. */ enum fixed_point_rounding_mode diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index e506b92a552..3b634851753 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -1337,4 +1337,104 @@ expand_vec_setmem (rtx dst_in, rtx length_in, rtx fill_value_in, return true; } +/* Used by cmpmemsi in riscv.md. */ + +bool +expand_vec_cmpmem (rtx result_out, rtx blk_a_in, rtx blk_b_in, rtx length_in) +{ + HOST_WIDE_INT lmul; + /* Check we are able and allowed to vectorise this operation; + bail if not. */ + if (!check_vectorise_memory_operation (length_in, lmul)) + return false; + + /* Strategy: + load entire blocks at a and b into vector regs + generate mask of bytes that differ + find first set bit in mask + find offset of first set bit in mask, use 0 if none set + result is ((char*)a[offset] - (char*)b[offset]) + */ + + machine_mode vmode + = riscv_vector::get_vector_mode (QImode, BYTES_PER_RISCV_VECTOR * lmul) + .require (); + rtx blk_a_addr = copy_addr_to_reg (XEXP (blk_a_in, 0)); + rtx blk_a = change_address (blk_a_in, vmode, blk_a_addr); + rtx blk_b_addr = copy_addr_to_reg (XEXP (blk_b_in, 0)); + rtx blk_b = change_address (blk_b_in, vmode, blk_b_addr); + + rtx vec_a = gen_reg_rtx (vmode); + rtx vec_b = gen_reg_rtx (vmode); + + machine_mode mask_mode = get_mask_mode (vmode); + rtx mask = gen_reg_rtx (mask_mode); + rtx mismatch_ofs = gen_reg_rtx (Pmode); + + rtx ne = gen_rtx_NE (mask_mode, vec_a, vec_b); + rtx vmsops[] = { mask, ne, vec_a, vec_b }; + rtx vfops[] = { mismatch_ofs, mask }; + + /* If the length is exactly vlmax for the selected mode, do that. + Otherwise, use a predicated store. */ + + if (known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in))) + { + emit_move_insn (vec_a, blk_a); + emit_move_insn (vec_b, blk_b); + emit_vlmax_insn (code_for_pred_cmp (vmode), riscv_vector::COMPARE_OP, + vmsops); + + emit_vlmax_insn (code_for_pred_ffs (mask_mode, Pmode), + riscv_vector::CPOP_OP, vfops); + } + else + { + if (!satisfies_constraint_K (length_in)) + length_in = force_reg (Pmode, length_in); + + rtx memmask = CONSTM1_RTX (mask_mode); + + rtx m_ops_a[] = { vec_a, memmask, blk_a }; + rtx m_ops_b[] = { vec_b, memmask, blk_b }; + + emit_nonvlmax_insn (code_for_pred_mov (vmode), + riscv_vector::UNARY_OP_TAMA, m_ops_a, length_in); + emit_nonvlmax_insn (code_for_pred_mov (vmode), + riscv_vector::UNARY_OP_TAMA, m_ops_b, length_in); + + emit_nonvlmax_insn (code_for_pred_cmp (vmode), riscv_vector::COMPARE_OP, + vmsops, length_in); + + emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode), + riscv_vector::CPOP_OP, vfops, length_in); + } + + /* Mismatch_ofs is -1 if blocks match, or the offset of + the first mismatch otherwise. */ + rtx ltz = gen_reg_rtx (Xmode); + emit_insn (gen_slt_3 (LT, Xmode, Xmode, ltz, mismatch_ofs, const0_rtx)); + /* mismatch_ofs += (mismatch_ofs < 0) ? 1 : 0. */ + emit_insn ( + gen_rtx_SET (mismatch_ofs, gen_rtx_PLUS (Pmode, mismatch_ofs, ltz))); + + /* Unconditionally load the bytes at mismatch_ofs and subtract them + to get our result. */ + emit_insn (gen_rtx_SET (blk_a_addr, + gen_rtx_PLUS (Pmode, mismatch_ofs, blk_a_addr))); + emit_insn (gen_rtx_SET (blk_b_addr, + gen_rtx_PLUS (Pmode, mismatch_ofs, blk_b_addr))); + + blk_a = change_address (blk_a, QImode, blk_a_addr); + blk_b = change_address (blk_b, QImode, blk_b_addr); + + rtx byte_a = gen_reg_rtx (SImode); + rtx byte_b = gen_reg_rtx (SImode); + do_zero_extendqi2 (byte_a, blk_a); + do_zero_extendqi2 (byte_b, blk_b); + + emit_insn (gen_rtx_SET (result_out, gen_rtx_MINUS (SImode, byte_a, byte_b))); + + return true; +} } diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index dd34211ca80..08dd22ea733 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -2401,6 +2401,21 @@ FAIL; }) +(define_expand "cmpmemsi" + [(set (match_operand:SI 0 "register_operand" "") + (compare:SI (match_operand:BLK 1 "memory_operand" "") + (match_operand:BLK 2 "memory_operand" ""))) + (use (match_operand:SI 3 "general_operand" "")) + (use (match_operand:SI 4 "" ""))] + "TARGET_VECTOR" +{ + if (riscv_vector::expand_vec_cmpmem (operands[0], operands[1], + operands[2], operands[3])) + DONE; + else + FAIL; +}) + ;; Expand in-line code to clear the instruction cache between operand[0] and ;; operand[1]. (define_expand "clear_cache" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c new file mode 100644 index 00000000000..d4c665dc791 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c @@ -0,0 +1,88 @@ +/* { dg-do compile } */ +/* { dg-add-options riscv_v } */ +/* { dg-additional-options "-O3 --param=riscv-autovec-lmul=dynamic" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) + +/* Trivial memcmp should use inline scalar ops. +** f1: +** lbu\s+a\d+,0\(a0\) +** lbu\s+a\d+,0\(a1\) +** subw?\s+a0,a\d+,a\d+ +** ret +*/ +int +f1 (void *a, void *b) +{ + return __builtin_memcmp (a, b, 1); +} + +/* Tiny __builtin_memcmp should use libc. +** f2: +** li\s+a\d,\d+ +** tail\s+memcmp +*/ +int +f2 (void *a, void *b) +{ + return __builtin_memcmp (a, b, MIN_VECTOR_BYTES - 1); +} + +/* Vectorise+inline minimum vector register width with LMUL=1 +** f3: +** ( +** vsetivli\s+zero,\d+,e8,m1,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m1,ta,ma +** ) +** ... +** ret +*/ +int +f3 (void *a, void *b) +{ + return __builtin_memcmp (a, b, MIN_VECTOR_BYTES); +} + +/* Vectorised code should use smallest lmul known to fit length +** f4: +** ( +** vsetivli\s+zero,\d+,e8,m2,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m2,ta,ma +** ) +** ... +** ret +*/ +int +f4 (void *a, void *b) +{ + return __builtin_memcmp (a, b, MIN_VECTOR_BYTES + 1); +} + +/* Vectorise+inline up to LMUL=8 +** f5: +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m8,ta,ma +** ... +** ret +*/ +int +f5 (void *a, void *b) +{ + return __builtin_memcmp (a, b, MIN_VECTOR_BYTES * 8); +} + +/* Don't inline if the length is too large for one operation. +** f6: +** li\s+a2,\d+ +** tail\s+memcmp +*/ +int +f6 (void *a, void *b) +{ + return __builtin_memcmp (a, b, MIN_VECTOR_BYTES * 8 + 1); +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-2.c new file mode 100644 index 00000000000..81c8bdb33ca --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-2.c @@ -0,0 +1,74 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-add-options riscv_v } */ +/* { dg-options "-O2 --param=riscv-autovec-lmul=dynamic" } */ + +#include + +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) + +static inline __attribute__ ((always_inline)) void +do_one_test (int const size, int const diff_offset, int const diff_dir) +{ + unsigned char A[size]; + unsigned char B[size]; + unsigned char const fill_value = 0x55; + __builtin_memset (A, fill_value, size); + __builtin_memset (B, fill_value, size); + + if (diff_dir != 0) + { + if (diff_dir < 0) + { + A[diff_offset] = fill_value - 1; + } + else + { + A[diff_offset] = fill_value + 1; + } + } + + if (__builtin_memcmp (A, B, size) != diff_dir) + { + abort (); + } +} + +int +main () +{ + do_one_test (0, 0, 0); + + do_one_test (1, 0, -1); + do_one_test (1, 0, 0); + do_one_test (1, 0, 1); + + do_one_test (MIN_VECTOR_BYTES - 1, 0, -1); + do_one_test (MIN_VECTOR_BYTES - 1, 0, 0); + do_one_test (MIN_VECTOR_BYTES - 1, 0, 1); + do_one_test (MIN_VECTOR_BYTES - 1, 1, -1); + do_one_test (MIN_VECTOR_BYTES - 1, 1, 0); + do_one_test (MIN_VECTOR_BYTES - 1, 1, 1); + + do_one_test (MIN_VECTOR_BYTES, 0, -1); + do_one_test (MIN_VECTOR_BYTES, 0, 0); + do_one_test (MIN_VECTOR_BYTES, 0, 1); + do_one_test (MIN_VECTOR_BYTES, MIN_VECTOR_BYTES - 1, -1); + do_one_test (MIN_VECTOR_BYTES, MIN_VECTOR_BYTES - 1, 0); + do_one_test (MIN_VECTOR_BYTES, MIN_VECTOR_BYTES - 1, 1); + + do_one_test (MIN_VECTOR_BYTES + 1, 0, -1); + do_one_test (MIN_VECTOR_BYTES + 1, 0, 0); + do_one_test (MIN_VECTOR_BYTES + 1, 0, 1); + do_one_test (MIN_VECTOR_BYTES + 1, MIN_VECTOR_BYTES, -1); + do_one_test (MIN_VECTOR_BYTES + 1, MIN_VECTOR_BYTES, 0); + do_one_test (MIN_VECTOR_BYTES + 1, MIN_VECTOR_BYTES, 1); + + do_one_test (MIN_VECTOR_BYTES * 8, 0, -1); + do_one_test (MIN_VECTOR_BYTES * 8, 0, 0); + do_one_test (MIN_VECTOR_BYTES * 8, 0, 1); + do_one_test (MIN_VECTOR_BYTES * 8, MIN_VECTOR_BYTES * 8 - 1, -1); + do_one_test (MIN_VECTOR_BYTES * 8, MIN_VECTOR_BYTES * 8 - 1, 0); + do_one_test (MIN_VECTOR_BYTES * 8, MIN_VECTOR_BYTES * 8 - 1, 1); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-3.c b/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-3.c new file mode 100644 index 00000000000..dfad1b96c60 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-3.c @@ -0,0 +1,45 @@ +/* { dg-do compile } */ +/* { dg-add-options riscv_v } */ +/* { dg-additional-options "-O3 --param riscv-autovec-lmul=m1" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) + +/* Tiny __builtin_memcmp should use libc. +** f1: +** li\s+a\d,\d+ +** tail\s+memcmp +*/ +int +f1 (void *a, void *b) +{ + return __builtin_memcmp (a, b, MIN_VECTOR_BYTES - 1); +} + +/* Vectorise+inline minimum vector register width with LMUL=1 +** f2: +** ( +** vsetivli\s+zero,\d+,e8,m1,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m1,ta,ma +** ) +** ... +** ret +*/ +int +f2 (void *a, void *b) +{ + return __builtin_memcmp (a, b, MIN_VECTOR_BYTES); +} + +/* Don't inline if the length is too large for one operation. +** f3: +** li\s+a2,\d+ +** tail\s+memcmp +*/ +int +f3 (void *a, void *b) +{ + return __builtin_memcmp (a, b, MIN_VECTOR_BYTES + 1); +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-4.c b/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-4.c new file mode 100644 index 00000000000..55a61eae029 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-4.c @@ -0,0 +1,62 @@ +/* { dg-do compile } */ +/* { dg-add-options riscv_v } */ +/* { dg-additional-options "-O3 --param riscv-autovec-lmul=m8" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) + +/* Tiny __builtin_memcmp should use libc. +** f1: +** li\s+a\d,\d+ +** tail\s+memcmp +*/ +int +f1 (void *a, void *b) +{ + return __builtin_memcmp (a, b, MIN_VECTOR_BYTES - 1); +} + +/* Vectorise+inline minimum vector register width with LMUL=8 as requested +** f2: +** ( +** vsetivli\s+zero,\d+,e8,m8,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m8,ta,ma +** ) +** ... +** ret +*/ +int +f2 (void *a, void *b) +{ + return __builtin_memcmp (a, b, MIN_VECTOR_BYTES); +} + +/* Vectorise+inline anything that fits +** f3: +** ( +** vsetivli\s+zero,\d+,e8,m8,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m8,ta,ma +** ) +** ... +** ret +*/ +int +f3 (void *a, void *b) +{ + return __builtin_memcmp (a, b, MIN_VECTOR_BYTES * 8); +} + +/* Don't inline if the length is too large for one operation. +** f4: +** li\s+a2,\d+ +** tail\s+memcmp +*/ +int +f4 (void *a, void *b) +{ + return __builtin_memcmp (a, b, MIN_VECTOR_BYTES * 8 + 1); +}