From patchwork Thu May 18 18:50:25 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Uros Bizjak <ubizjak@gmail.com>
X-Patchwork-Id: 96022
Date: Thu, 18 May 2023 20:50:25 +0200
Message-ID: 
 <CAFULd4Yq+s2JkNrazY-H1bANSTFW10+kJ81z8wefUrAPeN+Szg@mail.gmail.com>
Subject: [COMMITTED] i386: Add infrastructure for QImode partial vector mult
 and shift operations
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: Hongtao Liu <crazylht@gmail.com>
From: Uros Bizjak <ubizjak@gmail.com>
Reply-To: Uros Bizjak <ubizjak@gmail.com>

QImode partial vector multiplications and shifts can be implemented using
their HImode counterparts.  Add infrastructure to handle V8QImode and
V4QImode vectors by extending (interleaving) their input operands to
V8HImode, performing the V8HImode operation and truncating the output back
to the original QImode vector.
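
As a rough illustration of why this works for multiplication, here is a
scalar C model of the widen/multiply/truncate sequence.  It is only a
sketch: the function names are hypothetical and it models the arithmetic,
not the RTL the expander emits.  The high byte of each widened word is a
copy of the low byte, mirroring the self-interleave used for MULT; the low
byte of the 16-bit product does not depend on it.

```c
#include <stdint.h>

/* Hypothetical helper: widen a byte to a 16-bit word by interleaving it
   with itself, so the high byte is a copy of the low byte.  */
static uint16_t
widen_self_interleave (uint8_t x)
{
  return (uint16_t) (x | (x << 8));
}

/* Hypothetical scalar model of an 8-element byte multiply done through
   16-bit words: widen both inputs, multiply as words, keep the low byte.  */
static void
mul_v8qi_via_v8hi (uint8_t r[8], const uint8_t a[8], const uint8_t b[8])
{
  for (int i = 0; i < 8; i++)
    {
      uint16_t ha = widen_self_interleave (a[i]);
      uint16_t hb = widen_self_interleave (b[i]);
      /* Multiply in 32 bits to avoid signed overflow after integer
	 promotion, then wrap to 16 bits like a word multiply would.  */
      uint16_t hres = (uint16_t) ((uint32_t) ha * hb);
      /* Truncate back to a byte.  */
      r[i] = (uint8_t) hres;
    }
}
```

Since x | (x << 8) equals x * 257 and 257 is congruent to 1 modulo 256,
each result byte equals (uint8_t) (a[i] * b[i]).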

The patch implements V8QImode and V4QImode multiplication for SSE2 targets,
using a generic permutation to truncate the output operand, but still taking
advantage of the VPMOVWB down-convert instruction when available.
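
The permutation-based truncation can be pictured with the following scalar
C model.  Again this is a sketch under assumptions: the helper names are
hypothetical, and the byte layout is written out explicitly in
little-endian order to match x86.  The selector i * 2 picks the low byte
of every word; indices of 16 and above fall into the second operand, which
is the same vector here.

```c
#include <stdint.h>

/* Hypothetical model of a two-operand byte permutation: SEL indexes the
   32-byte concatenation of OP0 and OP1.  */
static void
perm_v16qi (uint8_t dst[16], const uint8_t op0[16], const uint8_t op1[16],
	    const uint8_t sel[16])
{
  for (int i = 0; i < 16; i++)
    dst[i] = sel[i] < 16 ? op0[sel[i]] : op1[sel[i] - 16];
}

/* Hypothetical model of truncating eight 16-bit results to eight bytes
   by selecting the even-numbered bytes of the 16-byte view.  */
static void
trunc_v8hi_to_v8qi (uint8_t dst[8], const uint16_t hres[8])
{
  uint8_t qres[16], sel[16], out[16];

  /* View the words as bytes, low byte first (little-endian).  */
  for (int i = 0; i < 8; i++)
    {
      qres[2 * i] = (uint8_t) (hres[i] & 0xff);
      qres[2 * i + 1] = (uint8_t) (hres[i] >> 8);
    }

  /* Same selector as the expander: element i takes byte i * 2.  */
  for (int i = 0; i < 16; i++)
    sel[i] = (uint8_t) (i * 2);

  perm_v16qi (out, qres, qres, sel);

  /* Only the low eight bytes carry the V8QImode result.  */
  for (int i = 0; i < 8; i++)
    dst[i] = out[i];
}
```

For V4QImode only the low four bytes of the result are of interest; the
rest of the register is ignored when the low part is moved to the
destination.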

The patch also stops setting the REG_EQUAL note on the last insn of the
ix86_expand_vecop_qihi expander.  Generic code does this automatically
when a named pattern is expanded.

gcc/ChangeLog:

    * config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial): New.
    (ix86_expand_vecop_qihi): Add op2vec bool variable.
    Do not set REG_EQUAL note.
    * config/i386/i386-protos.h (ix86_expand_vecop_qihi_partial):
    Add prototype.
    * config/i386/i386.cc (ix86_multiplication_cost): Handle
    V4QImode and V8QImode.
    * config/i386/mmx.md (mulv8qi3): New expander.
    (mulv4qi3): Ditto.
    * config/i386/sse.md (mulv8qi3): Remove.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/avx512vl-pr95488-1.c: Adjust
    expected scan-assembler-times frequency and strings.
    * gcc.target/i386/vect-mulv4qi.c: New test.
    * gcc.target/i386/vect-mulv8qi.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 8a869eb3b30..d5116801498 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -23270,6 +23270,116 @@ ix86_expand_vec_shift_qihi_constant (enum rtx_code code,
   return true;
 }
 
+void
+ix86_expand_vecop_qihi_partial (enum rtx_code code, rtx dest, rtx op1, rtx op2)
+{
+  machine_mode qimode = GET_MODE (dest);
+  rtx qop1, qop2, hop1, hop2, qdest, hres;
+  bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
+  bool uns_p = true;
+
+  switch (qimode)
+    {
+    case E_V4QImode:
+    case E_V8QImode:
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  qop1 = lowpart_subreg (V16QImode, force_reg (qimode, op1), qimode);
+
+  if (op2vec)
+    qop2 = lowpart_subreg (V16QImode, force_reg (qimode, op2), qimode);
+  else
+    qop2 = op2;
+
+  switch (code)
+    {
+    case MULT:
+      gcc_assert (op2vec);
+      /* Unpack data such that we've got a source byte in each low byte of
+	 each word.  We don't care what goes into the high byte of each word.
+	 Rather than trying to get zero in there, most convenient is to let
+	 it be a copy of the low byte.  */
+      hop1 = copy_to_reg (qop1);
+      hop2 = copy_to_reg (qop2);
+      emit_insn (gen_vec_interleave_lowv16qi (hop1, hop1, hop1));
+      emit_insn (gen_vec_interleave_lowv16qi (hop2, hop2, hop2));
+      break;
+
+    case ASHIFTRT:
+      uns_p = false;
+      /* FALLTHRU */
+    case ASHIFT:
+    case LSHIFTRT:
+      hop1 = gen_reg_rtx (V8HImode);
+      ix86_expand_sse_unpack (hop1, qop1, uns_p, false);
+      /* vashr/vlshr/vashl  */
+      if (op2vec)
+	{
+	  hop2 = gen_reg_rtx (V8HImode);
+	  ix86_expand_sse_unpack (hop2, qop2, uns_p, false);
+	}
+      else
+	hop2 = qop2;
+
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (code != MULT && op2vec)
+    {
+      /* Expand vashr/vlshr/vashl.  */
+      hres = gen_reg_rtx (V8HImode);
+      emit_insn (gen_rtx_SET (hres,
+			      simplify_gen_binary (code, V8HImode,
+						   hop1, hop2)));
+    }
+  else
+    /* Expand mult/ashr/lshr/ashl.  */
+    hres = expand_simple_binop (V8HImode, code, hop1, hop2,
+				NULL_RTX, 1, OPTAB_DIRECT);
+
+  if (TARGET_AVX512BW && TARGET_AVX512VL)
+    {
+      if (qimode == V8QImode)
+	qdest = dest;
+      else
+	qdest = gen_reg_rtx (V8QImode);
+
+      emit_insn (gen_truncv8hiv8qi2 (qdest, hres));
+    }
+  else
+    {
+      struct expand_vec_perm_d d;
+      rtx qres = gen_lowpart (V16QImode, hres);
+      bool ok;
+      int i;
+
+      qdest = gen_reg_rtx (V16QImode);
+
+      /* Merge the data back into the right place.  */
+      d.target = qdest;
+      d.op0 = qres;
+      d.op1 = qres;
+      d.vmode = V16QImode;
+      d.nelt = 16;
+      d.one_operand_p = false;
+      d.testing_p = false;
+
+      for (i = 0; i < d.nelt; ++i)
+	d.perm[i] = i * 2;
+
+      ok = ix86_expand_vec_perm_const_1 (&d);
+      gcc_assert (ok);
+    }
+
+  if (qdest != dest)
+    emit_move_insn (dest, gen_lowpart (qimode, qdest));
+}
+
 /* Expand a vector operation CODE for a V*QImode in terms of the
    same operation on V*HImode.  */
 
@@ -23281,6 +23391,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, rtx op1, rtx op2)
   rtx (*gen_il) (rtx, rtx, rtx);
   rtx (*gen_ih) (rtx, rtx, rtx);
   rtx op1_l, op1_h, op2_l, op2_h, res_l, res_h;
+  bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
   struct expand_vec_perm_d d;
   bool full_interleave = true;
   bool uns_p = true;
@@ -23315,6 +23426,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, rtx op1, rtx op2)
   switch (code)
     {
     case MULT:
+      gcc_assert (op2vec);
       /* Unpack data such that we've got a source byte in each low byte of
 	 each word.  We don't care what goes into the high byte of each word.
 	 Rather than trying to get zero in there, most convenient is to let
@@ -23360,7 +23472,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, rtx op1, rtx op2)
       ix86_expand_sse_unpack (op1_l, op1, uns_p, false);
       ix86_expand_sse_unpack (op1_h, op1, uns_p, true);
       /* vashr/vlshr/vashl  */
-      if (GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT)
+      if (op2vec)
 	{
 	  rtx tmp = force_reg (qimode, op2);
 	  op2_l = gen_reg_rtx (himode);
@@ -23376,8 +23488,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, rtx op1, rtx op2)
       gcc_unreachable ();
     }
 
-  if (code != MULT
-      && GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT)
+  if (code != MULT && op2vec)
     {
       /* Expand vashr/vlshr/vashl.  */
       res_l = gen_reg_rtx (himode);
@@ -23435,9 +23546,6 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, rtx op1, rtx op2)
 
   ok = ix86_expand_vec_perm_const_1 (&d);
   gcc_assert (ok);
-
-  set_unique_reg_note (get_last_insn (), REG_EQUAL,
-		       gen_rtx_fmt_ee (code, qimode, op1, op2));
 }
 
 /* Helper function of ix86_expand_mul_widen_evenodd.  Return true
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 71ae95ffef7..d0f5783173e 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -215,6 +215,7 @@ extern void ix86_expand_round (rtx, rtx);
 extern void ix86_expand_rounddf_32 (rtx, rtx);
 extern void ix86_expand_round_sse4 (rtx, rtx);
 
+extern void ix86_expand_vecop_qihi_partial (enum rtx_code, rtx, rtx, rtx);
 extern void ix86_expand_vecop_qihi (enum rtx_code, rtx, rtx, rtx);
 extern rtx ix86_split_stack_guard (void);
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 9ab24242b59..369a718c880 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20465,6 +20465,14 @@ ix86_multiplication_cost (const struct processor_costs *cost,
   else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
     switch (mode)
       {
+      case V4QImode:
+      case V8QImode:
+	/* Partial V*QImode is emulated with 4-5 insns.  */
+	if ((TARGET_AVX512BW && TARGET_AVX512VL) || TARGET_XOP)
+	  return ix86_vec_cost (mode, cost->mulss + cost->sse_op * 3);
+	else
+	  return ix86_vec_cost (mode, cost->mulss + cost->sse_op * 4);
+
       case V16QImode:
 	/* V*QImode is emulated with 4-11 insns.  */
 	if (TARGET_AVX512BW && TARGET_AVX512VL)
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index b2954fff8ae..45773673049 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2149,6 +2149,26 @@ (define_insn "mulv2hi3"
    (set_attr "type" "ssemul")
    (set_attr "mode" "TI")])
 
+(define_expand "mulv8qi3"
+  [(set (match_operand:V8QI 0 "register_operand")
+	(mult:V8QI (match_operand:V8QI 1 "register_operand")
+		   (match_operand:V8QI 2 "register_operand")))]
+  "TARGET_MMX_WITH_SSE"
+{
+  ix86_expand_vecop_qihi_partial (MULT, operands[0], operands[1], operands[2]);
+  DONE;
+})
+
+(define_expand "mulv4qi3"
+  [(set (match_operand:V4QI 0 "register_operand")
+	(mult:V4QI (match_operand:V4QI 1 "register_operand")
+		   (match_operand:V4QI 2 "register_operand")))]
+  "TARGET_SSE2"
+{
+  ix86_expand_vecop_qihi_partial (MULT, operands[0], operands[1], operands[2]);
+  DONE;
+})
+
 (define_expand "mmx_smulv4hi3_highpart"
   [(set (match_operand:V4HI 0 "register_operand")
 	(truncate:V4HI
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f14a9c24ebd..26dd0b1aa10 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -14987,16 +14987,6 @@ (define_split
 	(eq:VI12_AVX2 (match_dup 4) (match_dup 1)))]
   "operands[4] = gen_reg_rtx (<MODE>mode);")
 
-(define_expand "mulv8qi3"
-  [(set (match_operand:V8QI 0 "register_operand")
-	(mult:V8QI (match_operand:V8QI 1 "register_operand")
-		   (match_operand:V8QI 2 "register_operand")))]
-  "TARGET_AVX512VL && TARGET_AVX512BW && TARGET_64BIT"
-{
-  ix86_expand_vecop_qihi (MULT, operands[0], operands[1], operands[2]);
-  DONE;
-})
-
 (define_expand "mul<mode>3"
   [(set (match_operand:VI1_AVX512 0 "register_operand")
 	(mult:VI1_AVX512 (match_operand:VI1_AVX512 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-pr95488-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-pr95488-1.c
index dc684a167c8..5e9f4f2805c 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-pr95488-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-pr95488-1.c
@@ -1,7 +1,8 @@
 /* PR target/pr95488  */
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512bw -mavx512vl" }  */
-/* { dg-final { scan-assembler-times "vpmovzxbw" 8 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpmovzxbw" 4 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpunpcklbw" 4 { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler-times "vpmullw\[^\n\]*ymm" 2 } } */
 /* { dg-final { scan-assembler-times "vpmullw\[^\n\]*xmm" 2 { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler-times "vpmovwb" 4 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-mulv4qi.c b/gcc/testsuite/gcc.target/i386/vect-mulv4qi.c
new file mode 100644
index 00000000000..d64bf044e91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-mulv4qi.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+#define N 4
+
+unsigned char ur[N], ua[N], ub[N];
+
+void mul (void)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    ur[i] = ua[i] * ub[i];
+}
+
+void mul_slp (void)
+{
+  ur[0] = ua[0] * ub[0];
+  ur[1] = ua[1] * ub[1];
+  ur[2] = ua[2] * ub[2];
+  ur[3] = ua[3] * ub[3];
+}
+
+/* { dg-final { scan-assembler-times "pmullw" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-mulv8qi.c b/gcc/testsuite/gcc.target/i386/vect-mulv8qi.c
new file mode 100644
index 00000000000..05003644ec7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-mulv8qi.c
@@ -0,0 +1,28 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+#define N 8
+
+unsigned char ur[N], ua[N], ub[N];
+
+void mul (void)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    ur[i] = ua[i] * ub[i];
+}
+
+void mul_slp (void)
+{
+  ur[0] = ua[0] * ub[0];
+  ur[1] = ua[1] * ub[1];
+  ur[2] = ua[2] * ub[2];
+  ur[3] = ua[3] * ub[3];
+  ur[4] = ua[4] * ub[4];
+  ur[5] = ua[5] * ub[5];
+  ur[6] = ua[6] * ub[6];
+  ur[7] = ua[7] * ub[7];
+}
+
+/* { dg-final { scan-assembler-times "pmullw" 2 } } */