From patchwork Fri Aug 12 09:40:55 2022
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 484
Date: Fri, 12 Aug 2022 17:40:55 +0800
From: "Kewen.Lin"
To: GCC Patches
Cc: Richard Sandiford
Subject: [PATCH] vect: Don't allow vect_emulated_vector_p type in vectorizable_call [PR106322]

Hi,

As PR106322 shows, for a vector type whose TYPE_MODE is a scalar integral
mode rather than a vector mode, we can obtain wrong target support
information by querying with that scalar integral mode.  For example, for
the test case in PR106322 on 32-bit ppc64, the vectorizer chooses the
vector type "vector(2) short unsigned int" for the scalar type "short
unsigned int"; its mode is SImode rather than V2HImode.  The target
support query then checks the umul_highpart optab with SImode, finds it
supported, and the vectorizer goes on to generate a .MULH IFN call for
that vector type.  Unfortunately, SImode umul_highpart support says
nothing about the highpart multiply of this emulated vector type, so the
generated call is wrong.

This patch teaches the vectorizable_call analysis to reject a
vect_emulated_vector_p type for either vectype_in or vectype_out, as
Richi suggested.

Bootstrapped and regtested on x86_64-redhat-linux, aarch64-linux-gnu and
powerpc64{,le}-linux-gnu.

Is it ok for trunk?  If so, I guess we also want this to be backported?

BR,
Kewen
-----
	PR tree-optimization/106322

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_call): Don't allow a
	vect_emulated_vector_p type for either vectype_in or vectype_out.

gcc/testsuite/ChangeLog:

	* g++.target/i386/pr106322.C: New test.
	* g++.target/powerpc/pr106322.C: New test.
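To illustrate, here is a distilled sketch of the kernel that goes wrong
(my own reduction, not part of the patch; the function name is made up).
Built at -O2 with -march=i686 (or -mdejagnu-cpu=power4 on powerpc), the
two-iteration loop is vectorized with "vector(2) short unsigned int",
whose TYPE_MODE is SImode, and the ">> 16" highpart pattern then gets
emitted as an SImode .MULH:

  /* Hypothetical reduction of MulHigh from the testcases below.  */
  void
  mul_high (unsigned short *a, const unsigned short *b)
  {
    for (int i = 0; i < 2; ++i)
      /* Widen to 32 bits so the multiply is well defined; the vectorizer
	 recognizes the multiply plus ">> 16" as a highpart multiply.  */
      a[i] = ((unsigned int) a[i] * (unsigned int) b[i]) >> 16;
  }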
---
 gcc/testsuite/g++.target/i386/pr106322.C    | 196 ++++++++++++++++++++
 gcc/testsuite/g++.target/powerpc/pr106322.C | 195 +++++++++++++++++++
 gcc/tree-vect-stmts.cc                      |   8 +
 3 files changed, 399 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/i386/pr106322.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr106322.C

-- 
2.27.0

diff --git a/gcc/testsuite/g++.target/i386/pr106322.C b/gcc/testsuite/g++.target/i386/pr106322.C
new file mode 100644
index 00000000000..3cd8d6bf225
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr106322.C
@@ -0,0 +1,196 @@
+/* { dg-do run } */
+/* { dg-require-effective-target ia32 } */
+/* { dg-require-effective-target c++11 } */
+/* { dg-options "-O2 -mtune=generic -march=i686" } */
+
+/* As PR106322, verify this can execute well (not abort).  */
+
+#include <memory>
+#include <cstring>
+#include <cstdint>
+#include <cstdlib>
+#include <cassert>
+#include <atomic>
+#include <limits>
+
+__attribute__((noipa))
+bool BytesEqual(const void *bytes1, const void *bytes2, const size_t size) {
+  return memcmp(bytes1, bytes2, size) == 0;
+}
+
+#define HWY_ALIGNMENT 64
+constexpr size_t kAlignment = HWY_ALIGNMENT;
+constexpr size_t kAlias = kAlignment * 4;
+
+namespace hwy {
+namespace N_EMU128 {
+template <typename T, size_t N = 16 / sizeof(T)> struct Vec128 {
+  T raw[16 / sizeof(T)] = {};
+};
+} // namespace N_EMU128
+} // namespace hwy
+
+template <typename T, size_t N>
+static void Store(const hwy::N_EMU128::Vec128<T, N> v,
+                  T *__restrict__ aligned) {
+  __builtin_memcpy(aligned, v.raw, sizeof(T) * N);
+}
+
+template <typename T, size_t N>
+static hwy::N_EMU128::Vec128<T, N> Load(const T *__restrict__ aligned) {
+  hwy::N_EMU128::Vec128<T, N> v;
+  __builtin_memcpy(v.raw, aligned, sizeof(T) * N);
+  return v;
+}
+
+template <size_t N>
+static hwy::N_EMU128::Vec128<uint16_t, N>
+MulHigh(hwy::N_EMU128::Vec128<uint16_t, N> a,
+        const hwy::N_EMU128::Vec128<uint16_t, N> b) {
+  for (size_t i = 0; i < N; ++i) {
+    // Cast to uint32_t first to prevent overflow. Otherwise the result of
+    // uint16_t * uint16_t is in "int" which may overflow. In practice the
+    // result is the same but this way it is also defined.
+    a.raw[i] = static_cast<uint16_t>(
+        (static_cast<uint32_t>(a.raw[i]) * static_cast<uint32_t>(b.raw[i])) >>
+        16);
+  }
+  return a;
+}
+
+#define HWY_ASSERT(condition) assert((condition))
+#define HWY_ASSUME_ALIGNED(ptr, align) __builtin_assume_aligned((ptr), (align))
+
+#pragma pack(push, 1)
+struct AllocationHeader {
+  void *allocated;
+  size_t payload_size;
+};
+#pragma pack(pop)
+
+static void FreeAlignedBytes(const void *aligned_pointer) {
+  HWY_ASSERT(aligned_pointer != nullptr);
+  if (aligned_pointer == nullptr)
+    return;
+
+  const uintptr_t payload = reinterpret_cast<uintptr_t>(aligned_pointer);
+  HWY_ASSERT(payload % kAlignment == 0);
+  const AllocationHeader *header =
+      reinterpret_cast<const AllocationHeader *>(payload) - 1;
+
+  free(header->allocated);
+}
+
+class AlignedFreer {
+public:
+  template <typename T> void operator()(T *aligned_pointer) const {
+    FreeAlignedBytes(aligned_pointer);
+  }
+};
+
+template <typename T>
+using AlignedFreeUniquePtr = std::unique_ptr<T, AlignedFreer>;
+
+static inline constexpr size_t ShiftCount(size_t n) {
+  return (n <= 1) ? 0 : 1 + ShiftCount(n / 2);
+}
+
+namespace {
+static size_t NextAlignedOffset() {
+  static std::atomic<uint32_t> next{0};
+  constexpr uint32_t kGroups = kAlias / kAlignment;
+  const uint32_t group = next.fetch_add(1, std::memory_order_relaxed) % kGroups;
+  const size_t offset = kAlignment * group;
+  HWY_ASSERT((offset % kAlignment == 0) && offset <= kAlias);
+  return offset;
+}
+} // namespace
+
+static void *AllocateAlignedBytes(const size_t payload_size) {
+  HWY_ASSERT(payload_size != 0); // likely a bug in caller
+  if (payload_size >= std::numeric_limits<size_t>::max() / 2) {
+    HWY_ASSERT(false && "payload_size too large");
+    return nullptr;
+  }
+
+  size_t offset = NextAlignedOffset();
+
+  // What: | misalign | unused | AllocationHeader |payload
+  // Size: |<= kAlias | offset |payload_size
+  //       ^allocated.^aligned.^header............^payload
+  // The header must immediately precede payload, which must remain aligned.
+  // To avoid wasting space, the header resides at the end of `unused`,
+  // which therefore cannot be empty (offset == 0).
+  if (offset == 0) {
+    offset = kAlignment; // = RoundUpTo(sizeof(AllocationHeader), kAlignment)
+    static_assert(sizeof(AllocationHeader) <= kAlignment, "Else: round up");
+  }
+
+  const size_t allocated_size = kAlias + offset + payload_size;
+  void *allocated = malloc(allocated_size);
+  HWY_ASSERT(allocated != nullptr);
+  if (allocated == nullptr)
+    return nullptr;
+  // Always round up even if already aligned - we already asked for kAlias
+  // extra bytes and there's no way to give them back.
+  uintptr_t aligned = reinterpret_cast<uintptr_t>(allocated) + kAlias;
+  static_assert((kAlias & (kAlias - 1)) == 0, "kAlias must be a power of 2");
+  static_assert(kAlias >= kAlignment, "Cannot align to more than kAlias");
+  aligned &= ~(kAlias - 1);
+
+  const uintptr_t payload = aligned + offset; // still aligned
+
+  // Stash `allocated` and payload_size inside header for FreeAlignedBytes().
+  // The allocated_size can be reconstructed from the payload_size.
+  AllocationHeader *header = reinterpret_cast<AllocationHeader *>(payload) - 1;
+  header->allocated = allocated;
+  header->payload_size = payload_size;
+
+  return HWY_ASSUME_ALIGNED(reinterpret_cast<void *>(payload), kAlignment);
+}
+
+template <typename T> static T *AllocateAlignedItems(size_t items) {
+  constexpr size_t size = sizeof(T);
+
+  constexpr bool is_pow2 = (size & (size - 1)) == 0;
+  constexpr size_t bits = ShiftCount(size);
+  static_assert(!is_pow2 || (1ull << bits) == size, "ShiftCount is incorrect");
+
+  const size_t bytes = is_pow2 ? items << bits : items * size;
+  const size_t check = is_pow2 ? bytes >> bits : bytes / size;
+  if (check != items) {
+    return nullptr; // overflowed
+  }
+  return static_cast<T *>(AllocateAlignedBytes(bytes));
+}
+
+template <typename T>
+static AlignedFreeUniquePtr<T[]> AllocateAligned(const size_t items) {
+  return AlignedFreeUniquePtr<T[]>(AllocateAlignedItems<T>(items),
+                                   AlignedFreer());
+}
+
+int main() {
+  AlignedFreeUniquePtr<uint16_t[]> in_lanes = AllocateAligned<uint16_t>(2);
+  uint16_t expected_lanes[2];
+  in_lanes[0] = 65535;
+  in_lanes[1] = 32767;
+  expected_lanes[0] = 65534;
+  expected_lanes[1] = 16383;
+  hwy::N_EMU128::Vec128<uint16_t, 2> v = Load<uint16_t, 2>(in_lanes.get());
+  hwy::N_EMU128::Vec128<uint16_t, 2> actual = MulHigh(v, v);
+  {
+    auto actual_lanes = AllocateAligned<uint16_t>(2);
+    Store(actual, actual_lanes.get());
+    const uint8_t *expected_array =
+        reinterpret_cast<const uint8_t *>(expected_lanes);
+    const uint8_t *actual_array =
+        reinterpret_cast<const uint8_t *>(actual_lanes.get());
+    for (size_t i = 0; i < 2; ++i) {
+      const uint8_t *expected_ptr = expected_array + i * 2;
+      const uint8_t *actual_ptr = actual_array + i * 2;
+      if (!BytesEqual(expected_ptr, actual_ptr, 2)) {
+        abort();
+      }
+    }
+  }
+}
diff --git a/gcc/testsuite/g++.target/powerpc/pr106322.C b/gcc/testsuite/g++.target/powerpc/pr106322.C
new file mode 100644
index 00000000000..1de6e5e37e5
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr106322.C
@@ -0,0 +1,195 @@
+/* { dg-do run } */
+/* { dg-require-effective-target c++11 } */
+/* { dg-options "-O2 -mdejagnu-cpu=power4" } */
+
+/* As PR106322, verify this can execute well (not abort).  */
+
+#include <memory>
+#include <cstring>
+#include <cstdint>
+#include <cstdlib>
+#include <cassert>
+#include <atomic>
+#include <limits>
+
+__attribute__((noipa))
+bool BytesEqual(const void *bytes1, const void *bytes2, const size_t size) {
+  return memcmp(bytes1, bytes2, size) == 0;
+}
+
+#define HWY_ALIGNMENT 64
+constexpr size_t kAlignment = HWY_ALIGNMENT;
+constexpr size_t kAlias = kAlignment * 4;
+
+namespace hwy {
+namespace N_EMU128 {
+template <typename T, size_t N = 16 / sizeof(T)> struct Vec128 {
+  T raw[16 / sizeof(T)] = {};
+};
+} // namespace N_EMU128
+} // namespace hwy
+
+template <typename T, size_t N>
+static void Store(const hwy::N_EMU128::Vec128<T, N> v,
+                  T *__restrict__ aligned) {
+  __builtin_memcpy(aligned, v.raw, sizeof(T) * N);
+}
+
+template <typename T, size_t N>
+static hwy::N_EMU128::Vec128<T, N> Load(const T *__restrict__ aligned) {
+  hwy::N_EMU128::Vec128<T, N> v;
+  __builtin_memcpy(v.raw, aligned, sizeof(T) * N);
+  return v;
+}
+
+template <size_t N>
+static hwy::N_EMU128::Vec128<uint16_t, N>
+MulHigh(hwy::N_EMU128::Vec128<uint16_t, N> a,
+        const hwy::N_EMU128::Vec128<uint16_t, N> b) {
+  for (size_t i = 0; i < N; ++i) {
+    // Cast to uint32_t first to prevent overflow. Otherwise the result of
+    // uint16_t * uint16_t is in "int" which may overflow. In practice the
+    // result is the same but this way it is also defined.
+    a.raw[i] = static_cast<uint16_t>(
+        (static_cast<uint32_t>(a.raw[i]) * static_cast<uint32_t>(b.raw[i])) >>
+        16);
+  }
+  return a;
+}
+
+#define HWY_ASSERT(condition) assert((condition))
+#define HWY_ASSUME_ALIGNED(ptr, align) __builtin_assume_aligned((ptr), (align))
+
+#pragma pack(push, 1)
+struct AllocationHeader {
+  void *allocated;
+  size_t payload_size;
+};
+#pragma pack(pop)
+
+static void FreeAlignedBytes(const void *aligned_pointer) {
+  HWY_ASSERT(aligned_pointer != nullptr);
+  if (aligned_pointer == nullptr)
+    return;
+
+  const uintptr_t payload = reinterpret_cast<uintptr_t>(aligned_pointer);
+  HWY_ASSERT(payload % kAlignment == 0);
+  const AllocationHeader *header =
+      reinterpret_cast<const AllocationHeader *>(payload) - 1;
+
+  free(header->allocated);
+}
+
+class AlignedFreer {
+public:
+  template <typename T> void operator()(T *aligned_pointer) const {
+    FreeAlignedBytes(aligned_pointer);
+  }
+};
+
+template <typename T>
+using AlignedFreeUniquePtr = std::unique_ptr<T, AlignedFreer>;
+
+static inline constexpr size_t ShiftCount(size_t n) {
+  return (n <= 1) ? 0 : 1 + ShiftCount(n / 2);
+}
+
+namespace {
+static size_t NextAlignedOffset() {
+  static std::atomic<uint32_t> next{0};
+  constexpr uint32_t kGroups = kAlias / kAlignment;
+  const uint32_t group = next.fetch_add(1, std::memory_order_relaxed) % kGroups;
+  const size_t offset = kAlignment * group;
+  HWY_ASSERT((offset % kAlignment == 0) && offset <= kAlias);
+  return offset;
+}
+} // namespace
+
+static void *AllocateAlignedBytes(const size_t payload_size) {
+  HWY_ASSERT(payload_size != 0); // likely a bug in caller
+  if (payload_size >= std::numeric_limits<size_t>::max() / 2) {
+    HWY_ASSERT(false && "payload_size too large");
+    return nullptr;
+  }
+
+  size_t offset = NextAlignedOffset();
+
+  // What: | misalign | unused | AllocationHeader |payload
+  // Size: |<= kAlias | offset |payload_size
+  //       ^allocated.^aligned.^header............^payload
+  // The header must immediately precede payload, which must remain aligned.
+  // To avoid wasting space, the header resides at the end of `unused`,
+  // which therefore cannot be empty (offset == 0).
+  if (offset == 0) {
+    offset = kAlignment; // = RoundUpTo(sizeof(AllocationHeader), kAlignment)
+    static_assert(sizeof(AllocationHeader) <= kAlignment, "Else: round up");
+  }
+
+  const size_t allocated_size = kAlias + offset + payload_size;
+  void *allocated = malloc(allocated_size);
+  HWY_ASSERT(allocated != nullptr);
+  if (allocated == nullptr)
+    return nullptr;
+  // Always round up even if already aligned - we already asked for kAlias
+  // extra bytes and there's no way to give them back.
+  uintptr_t aligned = reinterpret_cast<uintptr_t>(allocated) + kAlias;
+  static_assert((kAlias & (kAlias - 1)) == 0, "kAlias must be a power of 2");
+  static_assert(kAlias >= kAlignment, "Cannot align to more than kAlias");
+  aligned &= ~(kAlias - 1);
+
+  const uintptr_t payload = aligned + offset; // still aligned
+
+  // Stash `allocated` and payload_size inside header for FreeAlignedBytes().
+  // The allocated_size can be reconstructed from the payload_size.
+  AllocationHeader *header = reinterpret_cast<AllocationHeader *>(payload) - 1;
+  header->allocated = allocated;
+  header->payload_size = payload_size;
+
+  return HWY_ASSUME_ALIGNED(reinterpret_cast<void *>(payload), kAlignment);
+}
+
+template <typename T> static T *AllocateAlignedItems(size_t items) {
+  constexpr size_t size = sizeof(T);
+
+  constexpr bool is_pow2 = (size & (size - 1)) == 0;
+  constexpr size_t bits = ShiftCount(size);
+  static_assert(!is_pow2 || (1ull << bits) == size, "ShiftCount is incorrect");
+
+  const size_t bytes = is_pow2 ? items << bits : items * size;
+  const size_t check = is_pow2 ? bytes >> bits : bytes / size;
+  if (check != items) {
+    return nullptr; // overflowed
+  }
+  return static_cast<T *>(AllocateAlignedBytes(bytes));
+}
+
+template <typename T>
+static AlignedFreeUniquePtr<T[]> AllocateAligned(const size_t items) {
+  return AlignedFreeUniquePtr<T[]>(AllocateAlignedItems<T>(items),
+                                   AlignedFreer());
+}
+
+int main() {
+  AlignedFreeUniquePtr<uint16_t[]> in_lanes = AllocateAligned<uint16_t>(2);
+  uint16_t expected_lanes[2];
+  in_lanes[0] = 65535;
+  in_lanes[1] = 32767;
+  expected_lanes[0] = 65534;
+  expected_lanes[1] = 16383;
+  hwy::N_EMU128::Vec128<uint16_t, 2> v = Load<uint16_t, 2>(in_lanes.get());
+  hwy::N_EMU128::Vec128<uint16_t, 2> actual = MulHigh(v, v);
+  {
+    auto actual_lanes = AllocateAligned<uint16_t>(2);
+    Store(actual, actual_lanes.get());
+    const uint8_t *expected_array =
+        reinterpret_cast<const uint8_t *>(expected_lanes);
+    const uint8_t *actual_array =
+        reinterpret_cast<const uint8_t *>(actual_lanes.get());
+    for (size_t i = 0; i < 2; ++i) {
+      const uint8_t *expected_ptr = expected_array + i * 2;
+      const uint8_t *actual_ptr = actual_array + i * 2;
+      if (!BytesEqual(expected_ptr, actual_ptr, 2)) {
+        abort();
+      }
+    }
+  }
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f582d238984..c9dab217f05 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3423,6 +3423,14 @@ vectorizable_call (vec_info *vinfo,
       return false;
     }
 
+  if (vect_emulated_vector_p (vectype_in) || vect_emulated_vector_p (vectype_out))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "use emulated vector type for call\n");
+      return false;
+    }
+
   /* FORNOW */
   nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
   nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
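
P.S.: for context, vect_emulated_vector_p returns true for vector types the
target does not support directly, i.e. whose operations would have to be
emulated (lowered to scalar code).  Quoting its definition in
tree-vect-stmts.cc from memory here, so please double-check it against the
current tree:

  bool
  vect_emulated_vector_p (tree vectype)
  {
    return (!VECTOR_MODE_P (TYPE_MODE (vectype))
	    && (!VECTOR_BOOLEAN_TYPE_P (vectype)
		|| TYPE_PRECISION (TREE_TYPE (vectype)) != 1));
  }

Since the "vector(2) short unsigned int" type in the PR has the scalar
SImode as its TYPE_MODE, the new check rejects it before any optab query
is made with that mode.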