From patchwork Fri Feb 3 07:16:33 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Prathamesh Kulkarni
X-Patchwork-Id: 52347
From: Prathamesh Kulkarni
Date: Fri, 3 Feb 2023 12:46:33 +0530
Subject: [aarch64] Code-gen for vector initialization involving constants
To: gcc Patches, Richard Sandiford
List-Id: Gcc-patches mailing list

Hi Richard,
While digging through aarch64_expand_vector_init, I noticed that it gives
priority to loading a constant first:

  /* Initialise a vector which is part-variable.  We want to first try
     to build those lanes which are constant in the most efficient way we
     can.  */

which results in suboptimal code-gen for the following case:

int16x8_t f_s16(int16_t x)
{
  return (int16x8_t) { x, x, x, x, x, x, x, 1 };
}

code-gen trunk:
f_s16:
        movi    v0.8h, 0x1
        ins     v0.h[0], w0
        ins     v0.h[1], w0
        ins     v0.h[2], w0
        ins     v0.h[3], w0
        ins     v0.h[4], w0
        ins     v0.h[5], w0
        ins     v0.h[6], w0
        ret

The attached patch tweaks the following condition:

if (n_var == n_elts && n_elts <= 16)
  {
    ...
  }

so that it also passes if maxv >= 80% of n_elts, with 80% being an
arbitrary "high enough" threshold, provided the most repeated element is
not itself a constant. The intent is to dup the most repeated variable if
its repetition count is "high enough" and then insert the constants, which
should be "better" than loading the constant first and inserting the
variables, as in the above case.

Alternatively, I suppose we could remove the threshold and, when constants
are involved, generate both sequences and check which one is more
efficient?

code-gen with patch:
f_s16:
        dup     v0.8h, w0
        movi    v1.4h, 0x1
        ins     v0.h[7], v1.h[0]
        ret

The patch is lightly tested: the vec[t]-init-*.c tests pass, and
bootstrap+test is in progress.
Does this look OK?

Thanks,
Prathamesh

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index acc0cfe5f94..df33509c6e4 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22079,30 +22079,36 @@ aarch64_expand_vector_init (rtx target, rtx vals)
      and matches[X][1] with the count of duplicate elements (if X is the
      earliest element which has duplicates).  */

-  if (n_var == n_elts && n_elts <= 16)
+  int matches[16][2] = {0};
+  for (int i = 0; i < n_elts; i++)
     {
-      int matches[16][2] = {0};
-      for (int i = 0; i < n_elts; i++)
+      for (int j = 0; j <= i; j++)
         {
-          for (int j = 0; j <= i; j++)
+          if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
             {
-              if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
-                {
-                  matches[i][0] = j;
-                  matches[j][1]++;
-                  break;
-                }
+              matches[i][0] = j;
+              matches[j][1]++;
+              break;
             }
         }
-      int maxelement = 0;
-      int maxv = 0;
-      for (int i = 0; i < n_elts; i++)
-        if (matches[i][1] > maxv)
-          {
-            maxelement = i;
-            maxv = matches[i][1];
-          }
+    }
+  int maxelement = 0;
+  int maxv = 0;
+  for (int i = 0; i < n_elts; i++)
+    if (matches[i][1] > maxv)
+      {
+        maxelement = i;
+        maxv = matches[i][1];
+      }
+
+  rtx max_elem = XVECEXP (vals, 0, maxelement);
+  if (n_elts <= 16
+      && ((n_var == n_elts)
+          || (maxv >= (int)(0.8 * n_elts)
+              && !CONST_INT_P (max_elem)
+              && !CONST_DOUBLE_P (max_elem))))
+    {
       /* Create a duplicate of the most common element, unless all elements
          are equally useless to us, in which case just immediately set the
          vector register using the first element.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-18.c b/gcc/testsuite/gcc.target/aarch64/vec-init-18.c
new file mode 100644
index 00000000000..e20b813559e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-18.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <arm_neon.h>
+
+/*
+** f1_s16:
+**	...
+**	dup	v[0-9]+\.8h, w[0-9]+
+**	movi	v[0-9]+\.4h, 0x1
+**	ins	v[0-9]+\.h\[7\], v[0-9]+\.h\[0\]
+**	...
+**	ret
+*/
+
+int16x8_t f1_s16(int16_t x)
+{
+  return (int16x8_t) {x, x, x, x, x, x, x, 1};
+}
+
+/*
+** f2_s16:
+**	...
+**	dup	v[0-9]+\.8h, w[0-9]+
+**	movi	v[0-9]+\.4h, 0x1
+**	movi	v[0-9]+\.4h, 0x2
+**	ins	v[0-9]+\.h\[6\], v[0-9]+\.h\[0\]
+**	ins	v[0-9]+\.h\[7\], v[0-9]+\.h\[0\]
+**	...
+**	ret
+*/
+
+int16x8_t f2_s16(int16_t x)
+{
+  return (int16x8_t) { x, x, x, x, x, x, 1, 2 };
+}
+
+/*
+** f3_s16:
+**	...
+**	movi	v[0-9]+\.8h, 0x1
+**	ins	v[0-9]+\.h\[0\], w0
+**	ins	v[0-9]+\.h\[1\], w0
+**	ins	v[0-9]+\.h\[2\], w0
+**	...
+**	ret
+*/
+
+int16x8_t f3_s16(int16_t x)
+{
+  return (int16x8_t) {x, x, x, 1, 1, 1, 1, 1};
+}
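
For anyone who wants to experiment with the threshold outside of GCC, here
is a minimal standalone sketch of the selection logic in plain C. It is not
the code in the patch: struct lane, lane_equal, most_common_lane and
prefer_dup_first are hypothetical names standing in for the rtx elements,
rtx_equal_p and the new condition, and a constant lane is modelled with a
simple flag rather than CONST_INT_P / CONST_DOUBLE_P.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ELTS 16

/* Hypothetical stand-in for one rtx element of the initializer: a lane is
   either a variable or a constant, distinguished from other lanes of the
   same kind by id.  */
struct lane
{
  bool is_const;
  int id;
};

/* Assumed equivalent of rtx_equal_p for this toy model.  */
bool
lane_equal (struct lane a, struct lane b)
{
  return a.is_const == b.is_const && a.id == b.id;
}

/* Return the index of the earliest lane whose value occurs most often and
   store its repeat count in *maxv.  Mirrors the matches[][] loops of the
   patch.  */
int
most_common_lane (const struct lane *vals, int n_elts, int *maxv)
{
  assert (n_elts > 0 && n_elts <= MAX_ELTS);

  int matches[MAX_ELTS][2] = {{0}};
  for (int i = 0; i < n_elts; i++)
    for (int j = 0; j <= i; j++)
      if (lane_equal (vals[i], vals[j]))
        {
          matches[i][0] = j;   /* Earliest lane equal to lane i.  */
          matches[j][1]++;     /* One more occurrence of that value.  */
          break;
        }

  int maxelement = 0;
  *maxv = 0;
  for (int i = 0; i < n_elts; i++)
    if (matches[i][1] > *maxv)
      {
        maxelement = i;
        *maxv = matches[i][1];
      }
  return maxelement;
}

/* True if the model would dup the most common lane first: either every
   lane is variable, or a variable lane fills at least 80% of the vector
   (the product is truncated to int, as in the patch).  */
bool
prefer_dup_first (const struct lane *vals, int n_elts)
{
  int n_var = 0;
  for (int i = 0; i < n_elts; i++)
    if (!vals[i].is_const)
      n_var++;

  int maxv;
  int maxelement = most_common_lane (vals, n_elts, &maxv);
  return n_var == n_elts
         || (maxv >= (int) (0.8 * n_elts) && !vals[maxelement].is_const);
}
```

Under these assumptions, the lanes of f1_s16 and f2_s16 take the dup-first
path (7 and 6 repeats of x against a truncated threshold of
(int)(0.8 * 8) = 6), while f3_s16 does not: its most repeated lane is the
constant 1, with 5 repeats, which is also below the threshold.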