From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH] [x86] Support sdot_prodv*qi with emulation of sdot_prodv*hi.
Date: Wed, 29 Nov 2023 10:40:07 +0800
Message-Id: <20231129024007.493958-1-hongtao.liu@intel.com>

Currently sdot_prodv*qi is available only under TARGET_AVXVNNIINT8, but
it can be emulated by

  vec_unpacks_lo_v32qi
  vec_unpacks_lo_v32qi
  vec_unpacks_hi_v32qi
  vec_unpacks_hi_v32qi
  sdot_prodv16hi
  sdot_prodv16hi
  addv8si3

which is faster than the original sequence

  vect_patt_39.11_48 = WIDEN_MULT_LO_EXPR ;
  vect_patt_39.11_49 = WIDEN_MULT_HI_EXPR ;
  vect_patt_38.14_54 = [vec_unpack_lo_expr] vect_patt_39.11_48;
  vect_patt_38.14_55 = [vec_unpack_hi_expr] vect_patt_39.11_48;
  vect_patt_38.14_56 = [vec_unpack_lo_expr] vect_patt_39.11_49;
  vect_patt_38.14_57 = [vec_unpack_hi_expr] vect_patt_39.11_49;
  vect_sum_15.15_59 = vect_patt_38.14_54 + vect_patt_38.14_55;
  vect_sum_15.15_60 = vect_patt_38.14_56 + vect_sum_15.15_59;
  vect_sum_15.15_61 = vect_patt_38.14_57 + vect_sum_15.15_60;

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

	* config/i386/sse.md (sdot_prodv64qi): New expander.
	(sseunpackmodelower): New mode attr.
	(sdot_prod): Emulate sdot_prodv*qi with sdot_prodv*hi when
	TARGET_AVXVNNIINT8 is not available.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/sdotprodint8_emulate.c: New test.
---
 gcc/config/i386/sse.md                        | 87 ++++++++++++++++---
 .../gcc.target/i386/sdotprodint8_emulate.c    | 15 ++++
 2 files changed, 90 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sdotprodint8_emulate.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f94a77d0b6d..e29311d83cc 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1291,6 +1291,11 @@ (define_mode_attr sseunpackmode
    (V32QI "V16HI") (V16HI "V8SI") (V8SI "V4DI")
    (V32HI "V16SI") (V64QI "V32HI") (V16SI "V8DI")])
 
+(define_mode_attr sseunpackmodelower
+  [(V16QI "v8hi") (V8HI "v4si") (V4SI "v2di")
+   (V32QI "v16hi") (V16HI "v8si") (V8SI "v4di")
+   (V32HI "v16si") (V64QI "v32hi") (V16SI "v8di")])
+
 (define_mode_attr ssepackmode
   [(V8HI "V16QI") (V4SI "V8HI") (V2DI "V4SI")
    (V16HI "V32QI") (V8SI "V16HI") (V4DI "V8SI")
@@ -30742,20 +30747,78 @@ (define_int_attr vpdotprodtype
 
 (define_expand "sdot_prod"
   [(match_operand: 0 "register_operand")
-   (match_operand:VI1 1 "register_operand")
-   (match_operand:VI1 2 "register_operand")
+   (match_operand:VI1_AVX2 1 "register_operand")
+   (match_operand:VI1_AVX2 2 "register_operand")
    (match_operand: 3 "register_operand")]
-  "TARGET_AVXVNNIINT8"
+  "TARGET_SSE2"
 {
-  operands[1] = lowpart_subreg (mode,
-                                force_reg (mode, operands[1]),
-                                mode);
-  operands[2] = lowpart_subreg (mode,
-                                force_reg (mode, operands[2]),
-                                mode);
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  emit_insn (gen_vpdpbssd_ (operands[0], operands[3],
-                                operands[1], operands[2]));
+  if (TARGET_AVXVNNIINT8)
+    {
+      operands[1] = lowpart_subreg (mode,
+                                    force_reg (mode, operands[1]),
+                                    mode);
+      operands[2] = lowpart_subreg (mode,
+                                    force_reg (mode, operands[2]),
+                                    mode);
+      emit_insn (gen_rtx_SET (operands[0], operands[3]));
+      emit_insn (gen_vpdpbssd_ (operands[0], operands[3],
+                                    operands[1], operands[2]));
+    }
+  else
+    {
+      /* Emulate with vpdpwssd.  */
+      rtx op1_lo = gen_reg_rtx (mode);
+      rtx op1_hi = gen_reg_rtx (mode);
+      rtx op2_lo = gen_reg_rtx (mode);
+      rtx op2_hi = gen_reg_rtx (mode);
+
+      emit_insn (gen_vec_unpacks_lo_ (op1_lo, operands[1]));
+      emit_insn (gen_vec_unpacks_lo_ (op2_lo, operands[2]));
+      emit_insn (gen_vec_unpacks_hi_ (op1_hi, operands[1]));
+      emit_insn (gen_vec_unpacks_hi_ (op2_hi, operands[2]));
+
+      rtx res1 = gen_reg_rtx (mode);
+      rtx res2 = gen_reg_rtx (mode);
+      rtx sum = gen_reg_rtx (mode);
+
+      emit_move_insn (sum, CONST0_RTX (mode));
+      emit_insn (gen_sdot_prod (res1, op1_lo,
+                                op2_lo, sum));
+      emit_insn (gen_sdot_prod (res2, op1_hi,
+                                op2_hi, operands[3]));
+      emit_insn (gen_add3 (operands[0], res1, res2));
+    }
+
+  DONE;
+})
+
+(define_expand "sdot_prodv64qi"
+  [(match_operand:V16SI 0 "register_operand")
+   (match_operand:V64QI 1 "register_operand")
+   (match_operand:V64QI 2 "register_operand")
+   (match_operand:V16SI 3 "register_operand")]
+  "(TARGET_AVX512VNNI || TARGET_AVX512BW) && TARGET_EVEX512"
+{
+  /* Emulate with vpdpwssd.  */
+  rtx op1_lo = gen_reg_rtx (V32HImode);
+  rtx op1_hi = gen_reg_rtx (V32HImode);
+  rtx op2_lo = gen_reg_rtx (V32HImode);
+  rtx op2_hi = gen_reg_rtx (V32HImode);
+
+  emit_insn (gen_vec_unpacks_lo_v64qi (op1_lo, operands[1]));
+  emit_insn (gen_vec_unpacks_lo_v64qi (op2_lo, operands[2]));
+  emit_insn (gen_vec_unpacks_hi_v64qi (op1_hi, operands[1]));
+  emit_insn (gen_vec_unpacks_hi_v64qi (op2_hi, operands[2]));
+
+  rtx res1 = gen_reg_rtx (V16SImode);
+  rtx res2 = gen_reg_rtx (V16SImode);
+  rtx sum = gen_reg_rtx (V16SImode);
+
+  emit_move_insn (sum, CONST0_RTX (V16SImode));
+  emit_insn (gen_sdot_prodv32hi (res1, op1_lo, op2_lo, sum));
+  emit_insn (gen_sdot_prodv32hi (res2, op1_hi, op2_hi, operands[3]));
+
+  emit_insn (gen_addv16si3 (operands[0], res1, res2));
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/sdotprodint8_emulate.c b/gcc/testsuite/gcc.target/i386/sdotprodint8_emulate.c
new file mode 100644
index 00000000000..ed584606820
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sdotprodint8_emulate.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavxvnni -O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "DOT_PROD_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-assembler-times "vpdpwssd" 2 } } */
+
+int
+foo (char* a, char* b)
+{
+  int sum = 0;
+  for (int i = 0; i != 16; i++)
+    {
+      sum += a[i] * b[i];
+    }
+  return sum;
+}
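For reviewers, a hypothetical scalar C model of the V32QI case may help check the arithmetic behind the emulation; it is not part of the patch and none of its function names are GCC internals. It assumes that vec_unpacks_lo/hi sign-extend the low- and high-numbered halves of the bytes, and that sdot_prodv16hi pairs two adjacent 16-bit elements per 32-bit lane, the way vpdpwssd or pmaddwd plus an add does. Under those assumptions the per-lane grouping differs from vpdpbssd (which sums four adjacent bytes per lane), but the sum over all eight lanes is identical, which, as far as we understand the DOT_PROD_EXPR contract, is the only property the vectorized reduction relies on.

```c
#include <stdint.h>

/* Hypothetical scalar model of the V32QI -> V8SI emulation.
   All names are illustrative only; they are not GCC internals.  */
enum { NQI = 32, NHI = NQI / 2, NSI = NQI / 4 };

/* Like vec_unpacks_lo/hi: sign-extend the low (hi == 0) or the
   high (hi != 0) half of the bytes to 16 bits.  */
void
model_unpacks_v32qi (const int8_t *src, int hi, int16_t *dst)
{
  for (int i = 0; i < NHI; i++)
    dst[i] = src[i + (hi ? NHI : 0)];
}

/* Assumed sdot_prodv16hi behaviour (vpdpwssd-like): each 32-bit lane
   adds the products of two adjacent 16-bit pairs to ACC.  */
void
model_sdot_prodv16hi (const int16_t *a, const int16_t *b,
                      const int32_t *acc, int32_t *out)
{
  for (int i = 0; i < NSI; i++)
    out[i] = acc[i] + (int32_t) a[2 * i] * b[2 * i]
             + (int32_t) a[2 * i + 1] * b[2 * i + 1];
}

/* Emulated sdot_prodv32qi: unpack both inputs, run two word-sized
   dot products (only the high one uses the real accumulator), add.  */
void
model_sdot_prodv32qi_emulated (const int8_t *a, const int8_t *b,
                               const int32_t *acc, int32_t *out)
{
  int16_t a_lo[NHI], a_hi[NHI], b_lo[NHI], b_hi[NHI];
  int32_t zero[NSI] = { 0 }, res1[NSI], res2[NSI];

  model_unpacks_v32qi (a, 0, a_lo);
  model_unpacks_v32qi (b, 0, b_lo);
  model_unpacks_v32qi (a, 1, a_hi);
  model_unpacks_v32qi (b, 1, b_hi);
  model_sdot_prodv16hi (a_lo, b_lo, zero, res1);
  model_sdot_prodv16hi (a_hi, b_hi, acc, res2);
  for (int i = 0; i < NSI; i++)
    out[i] = res1[i] + res2[i];   /* corresponds to addv8si3 */
}

/* vpdpbssd-style reference: each lane adds four adjacent byte products.  */
void
model_sdot_prodv32qi_direct (const int8_t *a, const int8_t *b,
                             const int32_t *acc, int32_t *out)
{
  for (int i = 0; i < NSI; i++)
    {
      out[i] = acc[i];
      for (int k = 0; k < 4; k++)
        out[i] += (int32_t) a[4 * i + k] * b[4 * i + k];
    }
}

/* Horizontal sum of the lanes, as done once after the vectorized loop.  */
int32_t
model_reduce_v8si (const int32_t *v)
{
  int32_t s = 0;
  for (int i = 0; i < NSI; i++)
    s += v[i];
  return s;
}
```

Feeding the same inputs to both variants and reducing with model_reduce_v8si should give equal totals, while individual lanes (lane 0, for example) generally differ between the two.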