From patchwork Wed Jun 14 05:59:05 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 107708
Message-ID: <68c1aa7d-0a7b-1427-55f8-edc6302f00dc@suse.com>
Date: Wed, 14 Jun 2023 07:59:05 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0
Content-Language: en-US
To: "gcc-patches@gcc.gnu.org"
Cc: Kirill Yukhin, Hongtao Liu
Subject: [PATCH] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: Jan Beulich via Gcc-patches
From: Jan Beulich
Reply-To: Jan Beulich
Sender: "Gcc-patches"
X-GMAIL-THRID: =?utf-8?q?1768656623065641172?=
X-GMAIL-MSGID:
=?utf-8?q?1768656623065641172?=

There's no reason to constrain this to AVX512VL: the 512-bit operation
can stand in for narrower operands in every case except when the
possible memory source is a non-broadcast one (a full-width load would
read beyond the narrower operand). This way even the scalar copysign3
can benefit from the operation being a single-insn one (leaving aside
moves which the compiler decides to insert for unclear reasons, and
leaving aside the fact that bcst_mem_operand() is too restrictive for
the broadcast to be embedded right into VPTERNLOG*).

Along with this, also request value duplication in
ix86_expand_copysign()'s call to ix86_build_signbit_mask(). This
eliminates excess space allocation in .rodata.*, which was filled with
zeros that are never read.

gcc/

	* config/i386/i386-expand.cc (ix86_expand_copysign): Request
	value duplication by ix86_build_signbit_mask() when AVX512F and
	not HFmode.
	* config/i386/sse.md (*_vternlog_all): Convert to 2-alternative
	form. Adjust "mode" attribute. Add "enabled" attribute.
	(*_vpternlog_1): Relax to just TARGET_AVX512F.
	(*_vpternlog_2): Likewise.
	(*_vpternlog_3): Likewise.
---
I guess the underlying pattern, going along the lines of what one_cmpl2
uses, can be applied elsewhere as well.

HFmode could use embedded broadcast too for copysign and the like, but
that would need to be V2HF -> V8HF (for which I don't think there are
any existing patterns).
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -2266,7 +2266,7 @@ ix86_expand_copysign (rtx operands[])
   else
     dest = NULL_RTX;
   op1 = lowpart_subreg (vmode, force_reg (mode, operands[2]), mode);
-  mask = ix86_build_signbit_mask (vmode, 0, 0);
+  mask = ix86_build_signbit_mask (vmode, TARGET_AVX512F && mode != HFmode, 0);
 
   if (CONST_DOUBLE_P (operands[1]))
     {
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -12399,11 +12399,11 @@
    (set_attr "mode" "")])
 
 (define_insn "*_vternlog_all"
-  [(set (match_operand:V 0 "register_operand" "=v")
+  [(set (match_operand:V 0 "register_operand" "=v,v")
 	(unspec:V
-	  [(match_operand:V 1 "register_operand" "0")
-	   (match_operand:V 2 "register_operand" "v")
-	   (match_operand:V 3 "bcst_vector_operand" "vmBr")
+	  [(match_operand:V 1 "register_operand" "0,0")
+	   (match_operand:V 2 "register_operand" "v,v")
+	   (match_operand:V 3 "bcst_vector_operand" "vBr,m")
 	   (match_operand:SI 4 "const_0_to_255_operand")]
 	  UNSPEC_VTERNLOG))]
   "TARGET_AVX512F
@@ -12411,10 +12411,22 @@
       it's not real AVX512FP16 instruction.  */
   && (GET_MODE_SIZE (GET_MODE_INNER (mode)) >= 4
      || GET_CODE (operands[3]) != VEC_DUPLICATE)"
-  "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}"
+{
+  if (TARGET_AVX512VL)
+    return "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}";
+  else
+    return "vpternlog\t{%4, %g3, %g2, %g0|%g0, %g2, %g3, %4}";
+}
   [(set_attr "type" "sselog")
    (set_attr "prefix" "evex")
-   (set_attr "mode" "")])
+   (set (attr "mode")
+	(if_then_else (match_test "TARGET_AVX512VL")
+		      (const_string "")
+		      (const_string "XI")))
+   (set (attr "enabled")
+	(if_then_else (eq_attr "alternative" "1")
+	  (symbol_ref " == 64 || TARGET_AVX512VL")
+	  (const_string "*")))])
 
 ;; There must be lots of other combinations like
 ;;
@@ -12443,7 +12455,7 @@
 	  (any_logic2:V
 	    (match_operand:V 3 "regmem_or_bitnot_regmem_operand")
 	    (match_operand:V 4 "regmem_or_bitnot_regmem_operand"))))]
-  "( == 64 || TARGET_AVX512VL)
+  "TARGET_AVX512F
    && ix86_pre_reload_split ()
    && (rtx_equal_p (STRIP_UNARY (operands[1]),
 		    STRIP_UNARY (operands[4]))
@@ -12527,7 +12539,7 @@
 	      (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
 	    (match_operand:V 3 "regmem_or_bitnot_regmem_operand"))
 	  (match_operand:V 4 "regmem_or_bitnot_regmem_operand")))]
-  "( == 64 || TARGET_AVX512VL)
+  "TARGET_AVX512F
    && ix86_pre_reload_split ()
    && (rtx_equal_p (STRIP_UNARY (operands[1]),
 		    STRIP_UNARY (operands[4]))
@@ -12610,7 +12622,7 @@
 	    (match_operand:V 1 "regmem_or_bitnot_regmem_operand")
 	    (match_operand:V 2 "regmem_or_bitnot_regmem_operand"))
 	  (match_operand:V 3 "regmem_or_bitnot_regmem_operand")))]
-  "( == 64 || TARGET_AVX512VL)
+  "TARGET_AVX512F
    && ix86_pre_reload_split ()"
   "#"
   "&& 1"