dmaengine: idxd: Change wmb() to smp_wmb() when copying completion record to user space
Message ID | 20240130025806.2027284-1-fenghua.yu@intel.com |
---|---|
State | New |
Headers |
From: Fenghua Yu <fenghua.yu@intel.com> To: "Vinod Koul" <vkoul@kernel.org>, "Dave Jiang" <dave.jiang@intel.com> Cc: dmaengine@vger.kernel.org, "linux-kernel" <linux-kernel@vger.kernel.org>, Fenghua Yu <fenghua.yu@intel.com>, Nikhil Rao <nikhil.rao@intel.com>, Tony Zhu <tony.zhu@intel.com> Subject: [PATCH] dmaengine: idxd: Change wmb() to smp_wmb() when copying completion record to user space Date: Mon, 29 Jan 2024 18:58:06 -0800 Message-Id: <20240130025806.2027284-1-fenghua.yu@intel.com> |
Series |
dmaengine: idxd: Change wmb() to smp_wmb() when copying completion record to user space
|
|
Commit Message
Fenghua Yu
Jan. 30, 2024, 2:58 a.m. UTC
wmb() is used to ensure status in the completion record is written
after the rest of the completion record, making it visible to the user.
However, on SMP systems, this may not guarantee visibility across
different CPUs.

Considering this scenario that event log handler is running on CPU1 while
user app is polling completion record (cr) status on CPU2:

        CPU1                            CPU2
event log handler                       user app

                                        1. cr = 0 (status = 0)
2. copy X to user cr except "status"
3. wmb()
4. copy Y to user cr "status"
                                        5. poll status value Y
                                        6. read rest cr which is still 0.
                                           cr handling fails
7. cr value X visible now

Although wmb() ensure value Y is written and visible after X is written
on CPU1, the order is not guaranteed on CPU2. So user app may see status
value Y while cr value X is still not visible yet on CPU2. This will
cause reading 0 from the rest of cr and cr handling fails.

Changing wmb() to smp_wmb() ensures Y is written after X on both CPU1
and CPU2. This guarantees that user app can consume cr in right order.

Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
Suggested-by: Nikhil Rao <nikhil.rao@intel.com>
Tested-by: Tony Zhu <tony.zhu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 drivers/dma/idxd/cdev.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
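The writer-side ordering that the commit message describes (payload first, barrier, status last) can be sketched in user-space C11 as below. This is only an illustration, not the driver code: `struct cr_sketch`, `CR_REST` and `publish_cr()` are hypothetical names, the 31-byte payload size is an assumption, and a C11 release fence stands in for the kernel's write barrier. The only layout detail taken from the patch is that the status is the first byte of the record.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical payload size; the real completion record layout is not shown in this thread. */
#define CR_REST 31

/* Hypothetical stand-in for a completion record: status byte first, then the rest. */
struct cr_sketch {
	_Atomic uint8_t status;   /* the reader polls this for a non-zero value */
	uint8_t rest[CR_REST];    /* everything except the status byte */
};

/*
 * Publish a completion record so that the status store is ordered
 * after the stores to the rest of the record.
 */
static void publish_cr(struct cr_sketch *dst, uint8_t status, const uint8_t *rest)
{
	assert(status != 0);      /* zero means "not completed yet" in this sketch */

	/* Step 2 in the commit message: copy everything except "status". */
	memcpy(dst->rest, rest, CR_REST);

	/* Step 3: order the payload stores before the status store. */
	atomic_thread_fence(memory_order_release);

	/* Step 4: write "status" last. */
	atomic_store_explicit(&dst->status, status, memory_order_relaxed);
}
```

A caller would zero the record, hand it to the poller, and later call `publish_cr(&cr, 1, payload)`. Note that this covers the writer only; it says nothing about what the polling side must do.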
Comments
On 1/29/24 19:58, Fenghua Yu wrote:
> wmb() is used to ensure status in the completion record is written
> after the rest of the completion record, making it visible to the user.
> However, on SMP systems, this may not guarantee visibility across
> different CPUs.
>
> Considering this scenario that event log handler is running on CPU1 while
> user app is polling completion record (cr) status on CPU2:
>
>         CPU1                            CPU2
> event log handler                       user app
>
>                                         1. cr = 0 (status = 0)
> 2. copy X to user cr except "status"
> 3. wmb()
> 4. copy Y to user cr "status"
>                                         5. poll status value Y
>                                         6. read rest cr which is still 0.
>                                            cr handling fails
> 7. cr value X visible now
>
> Although wmb() ensure value Y is written and visible after X is written
> on CPU1, the order is not guaranteed on CPU2. So user app may see status
> value Y while cr value X is still not visible yet on CPU2. This will
> cause reading 0 from the rest of cr and cr handling fails.
>
> Changing wmb() to smp_wmb() ensures Y is written after X on both CPU1
> and CPU2. This guarantees that user app can consume cr in right order.
>
> Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
> Suggested-by: Nikhil Rao <nikhil.rao@intel.com>
> Tested-by: Tony Zhu <tony.zhu@intel.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  drivers/dma/idxd/cdev.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
> index 77f8885cf407..9b7388a23cbe 100644
> --- a/drivers/dma/idxd/cdev.c
> +++ b/drivers/dma/idxd/cdev.c
> @@ -681,9 +681,10 @@ int idxd_copy_cr(struct idxd_wq *wq, ioasid_t pasid, unsigned long addr,
>  		 * Ensure that the completion record's status field is written
>  		 * after the rest of the completion record has been written.
>  		 * This ensures that the user receives the correct completion
> -		 * record information once polling for a non-zero status.
> +		 * record information on any CPU once polling for a non-zero
> +		 * status.
>  		 */
> -		wmb();
> +		smp_wmb();
>  		status = *(u8 *)cr;
>  		if (put_user(status, (u8 __user *)addr))
>  			left += status_size;
On 1/29/2024 8:58 PM, Fenghua Yu wrote:
> wmb() is used to ensure status in the completion record is written
> after the rest of the completion record, making it visible to the user.
> However, on SMP systems, this may not guarantee visibility across
> different CPUs.
>
> Considering this scenario that event log handler is running on CPU1 while
> user app is polling completion record (cr) status on CPU2:
>
>         CPU1                            CPU2
> event log handler                       user app
>
>                                         1. cr = 0 (status = 0)
> 2. copy X to user cr except "status"
> 3. wmb()
> 4. copy Y to user cr "status"
>                                         5. poll status value Y
>                                         6. read rest cr which is still 0.
>                                            cr handling fails
> 7. cr value X visible now
>
> Although wmb() ensure value Y is written and visible after X is written
> on CPU1, the order is not guaranteed on CPU2. So user app may see status
> value Y while cr value X is still not visible yet on CPU2. This will
> cause reading 0 from the rest of cr and cr handling fails.
>
> Changing wmb() to smp_wmb() ensures Y is written after X on both CPU1
> and CPU2. This guarantees that user app can consume cr in right order.
>
> Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
> Suggested-by: Nikhil Rao <nikhil.rao@intel.com>
> Tested-by: Tony Zhu <tony.zhu@intel.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

Acked-by: Lijun Pan <lijun.pan@intel.com>

some minor comments below.

> ---
>  drivers/dma/idxd/cdev.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
> index 77f8885cf407..9b7388a23cbe 100644
> --- a/drivers/dma/idxd/cdev.c
> +++ b/drivers/dma/idxd/cdev.c
> @@ -681,9 +681,10 @@ int idxd_copy_cr(struct idxd_wq *wq, ioasid_t pasid, unsigned long addr,
>  		 * Ensure that the completion record's status field is written
>  		 * after the rest of the completion record has been written.
>  		 * This ensures that the user receives the correct completion
> -		 * record information once polling for a non-zero status.
> +		 * record information on any CPU once polling for a non-zero

Maybe add "smp system" to the sentence to be more explicit since people
usually only read above comments instead of the commit message later on.
say,
"record information on any CPUs of a SMP system once polling for a non-zero"

> +		 * status.
>  		 */
> -		wmb();
> +		smp_wmb();
>  		status = *(u8 *)cr;
>  		if (put_user(status, (u8 __user *)addr))
>  			left += status_size;
This patch might be ok (it looks reasonable as an optimization), but I think
the description of wmb() and smp_wmb() is incorrect. I also think that you're
missing an rmb()/smp_rmb()eor equivalent on the reader side.

On Mon, Jan 29, 2024 at 06:58:06PM -0800, Fenghua Yu wrote:
> wmb() is used to ensure status in the completion record is written
> after the rest of the completion record, making it visible to the user.
> However, on SMP systems, this may not guarantee visibility across
> different CPUs.
>
> Considering this scenario that event log handler is running on CPU1 while
> user app is polling completion record (cr) status on CPU2:
>
>         CPU1                            CPU2
> event log handler                       user app
>
>                                         1. cr = 0 (status = 0)
> 2. copy X to user cr except "status"
> 3. wmb()
> 4. copy Y to user cr "status"
>                                         5. poll status value Y
>                                         6. read rest cr which is still 0.
>                                            cr handling fails
> 7. cr value X visible now
>
> Although wmb() ensure value Y is written and visible after X is written
> on CPU1, the order is not guaranteed on CPU2. So user app may see status
> value Y while cr value X is still not visible yet on CPU2. This will
> cause reading 0 from the rest of cr and cr handling fails.

The wmb() on CPU1 ensures the order of the reads, but you need an rmb() on CPU2
between reading the 'status' and 'rest' parts; otherwise CPU2 (or the
compiler!) is permitted to hoist the read of 'rest' early, before reading from
'status', and hence you can end up with a sequence that is effectively:

        CPU1                            CPU2
event log handler                       user app

                                        1. cr = 0 (status = 0)
                                        6a. read rest cr which is still 0.
2. copy X to user cr except "status"
3. wmb()
4. copy Y to user cr "status"
                                        5. poll status value Y
                                        6b. cr handling fails
7. cr value X visible now

Since this is all to regular cacheable memory, it's *sufficient* to use
smp_wmb() and smp_rmb(), but that's an optimization rather than an ordering
fix.

Note that on x86_64, TSO means that the stores are in-order (and so smp_wmb()
is just a compiler barrier), and IIUC loads are not reordered w.r.t. other
loads (and so smp_rmb() is also just a compiler barrier).

> Changing wmb() to smp_wmb() ensures Y is written after X on both CPU1
> and CPU2. This guarantees that user app can consume cr in right order.

This implies that smp_wmb() is *stronger* than wmb(), whereas smp_wmb() is
actually *weaker* (e.g. on x86_64 wmb() is an sfence, whereas smp_wmb() is a
barrier()).

Thanks,
Mark.

>
> Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
> Suggested-by: Nikhil Rao <nikhil.rao@intel.com>
> Tested-by: Tony Zhu <tony.zhu@intel.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
>  drivers/dma/idxd/cdev.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
> index 77f8885cf407..9b7388a23cbe 100644
> --- a/drivers/dma/idxd/cdev.c
> +++ b/drivers/dma/idxd/cdev.c
> @@ -681,9 +681,10 @@ int idxd_copy_cr(struct idxd_wq *wq, ioasid_t pasid, unsigned long addr,
>  		 * Ensure that the completion record's status field is written
>  		 * after the rest of the completion record has been written.
>  		 * This ensures that the user receives the correct completion
> -		 * record information once polling for a non-zero status.
> +		 * record information on any CPU once polling for a non-zero
> +		 * status.
>  		 */
> -		wmb();
> +		smp_wmb();
>  		status = *(u8 *)cr;
>  		if (put_user(status, (u8 __user *)addr))
>  			left += status_size;
> --
> 2.37.1
>
>
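The pairing Mark describes, a write barrier on the producer and a read barrier (or equivalent) on the consumer between the 'status' load and the 'rest' loads, can be sketched in user-space C11 as follows. This is a hedged illustration rather than the idxd or application code: `struct cr_sketch`, `writer_publish()` and `reader_try_consume()` are hypothetical names, the payload size is assumed, and C11 release/acquire fences stand in for smp_wmb()/smp_rmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CR_REST 31  /* hypothetical payload size */

struct cr_sketch {
	_Atomic uint8_t status;   /* polled by the reader; 0 means "not ready" */
	uint8_t rest[CR_REST];
};

/* Producer side (CPU1 in the thread): payload, write barrier, then status. */
static void writer_publish(struct cr_sketch *cr, uint8_t status, const uint8_t *rest)
{
	assert(status != 0);
	memcpy(cr->rest, rest, CR_REST);
	atomic_thread_fence(memory_order_release);              /* plays the role of smp_wmb() */
	atomic_store_explicit(&cr->status, status, memory_order_relaxed);
}

/*
 * Consumer side (CPU2): returns false while status is still 0. Once a
 * non-zero status is seen, the acquire fence keeps the loads of rest[]
 * from being hoisted above the status load, by CPU or compiler.
 */
static bool reader_try_consume(struct cr_sketch *cr, uint8_t *out)
{
	if (atomic_load_explicit(&cr->status, memory_order_relaxed) == 0)
		return false;

	atomic_thread_fence(memory_order_acquire);              /* plays the role of smp_rmb() */
	memcpy(out, cr->rest, CR_REST);
	return true;
}
```

Without the fence in `reader_try_consume()`, the read of `rest` may be performed before the read of `status`, which is the "6a" reordering in the sequence above; no choice of barrier on the writer alone can prevent that.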
On Tue, Jan 30, 2024 at 05:58:24PM +0000, Mark Rutland wrote:
> This patch might be ok (it looks reasonable as an optimization), but I think
> the description of wmb() and smp_wmb() is incorrect. I also think that you're
> missing an rmb()/smp_rmb()eor equivalent on the reader side.

Sorry, the above should have said:

  an rmb()/smp_rmb() *or* equivalent

>
> On Mon, Jan 29, 2024 at 06:58:06PM -0800, Fenghua Yu wrote:
> > wmb() is used to ensure status in the completion record is written
> > after the rest of the completion record, making it visible to the user.
> > However, on SMP systems, this may not guarantee visibility across
> > different CPUs.
> >
> > Considering this scenario that event log handler is running on CPU1 while
> > user app is polling completion record (cr) status on CPU2:
> >
> >         CPU1                            CPU2
> > event log handler                       user app
> >
> >                                         1. cr = 0 (status = 0)
> > 2. copy X to user cr except "status"
> > 3. wmb()
> > 4. copy Y to user cr "status"
> >                                         5. poll status value Y
> >                                         6. read rest cr which is still 0.
> >                                            cr handling fails
> > 7. cr value X visible now
> >
> > Although wmb() ensure value Y is written and visible after X is written
> > on CPU1, the order is not guaranteed on CPU2. So user app may see status
> > value Y while cr value X is still not visible yet on CPU2. This will
> > cause reading 0 from the rest of cr and cr handling fails.
>
> The wmb() on CPU1 ensures the order of the reads, but you need an rmb() on CPU2

Sorry again, the above should have said:

  The wmb() on CPU1 ensures the order of the *writes*

Apologies for any confusion resulting from those mistakes.

Mark.

> between reading the 'status' and 'rest' parts; otherwise CPU2 (or the
> compiler!) is permitted to hoist the read of 'rest' early, before reading from
> 'status', and hence you can end up with a sequence that is effectively:
>
>         CPU1                            CPU2
> event log handler                       user app
>
>                                         1. cr = 0 (status = 0)
>                                         6a. read rest cr which is still 0.
> 2. copy X to user cr except "status"
> 3. wmb()
> 4. copy Y to user cr "status"
>                                         5. poll status value Y
>                                         6b. cr handling fails
> 7. cr value X visible now
>
> Since this is all to regular cacheable memory, it's *sufficient* to use
> smp_wmb() and smp_rmb(), but that's an optimization rather than an ordering
> fix.
>
> Note that on x86_64, TSO means that the stores are in-order (and so smp_wmb()
> is just a compiler barrier), and IIUC loads are not reordered w.r.t. other
> loads (and so smp_rmb() is also just a compiler barrier).
>
> > Changing wmb() to smp_wmb() ensures Y is written after X on both CPU1
> > and CPU2. This guarantees that user app can consume cr in right order.
>
> This implies that smp_wmb() is *stronger* than wmb(), whereas smp_wmb() is
> actually *weaker* (e.g. on x86_64 wmb() is an sfence, whereas smp_wmb() is a
> barrier()).
>
> Thanks,
> Mark.
>
> >
> > Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
> > Suggested-by: Nikhil Rao <nikhil.rao@intel.com>
> > Tested-by: Tony Zhu <tony.zhu@intel.com>
> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> > ---
> >  drivers/dma/idxd/cdev.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
> > index 77f8885cf407..9b7388a23cbe 100644
> > --- a/drivers/dma/idxd/cdev.c
> > +++ b/drivers/dma/idxd/cdev.c
> > @@ -681,9 +681,10 @@ int idxd_copy_cr(struct idxd_wq *wq, ioasid_t pasid, unsigned long addr,
> >  		 * Ensure that the completion record's status field is written
> >  		 * after the rest of the completion record has been written.
> >  		 * This ensures that the user receives the correct completion
> > -		 * record information once polling for a non-zero status.
> > +		 * record information on any CPU once polling for a non-zero
> > +		 * status.
> >  		 */
> > -		wmb();
> > +		smp_wmb();
> >  		status = *(u8 *)cr;
> >  		if (put_user(status, (u8 __user *)addr))
> >  			left += status_size;
> > --
> > 2.37.1
> >
> >
>
On Tue, Jan 30, 2024 at 05:58:24PM +0000, Mark Rutland wrote:
> This patch might be ok (it looks reasonable as an optimization), but I think
> the description of wmb() and smp_wmb() is incorrect. I also think that you're

Agreed. A wmb() -> smp_wmb() change can only be an optimization rather
than a fix.

> missing an rmb()/smp_rmb()eor equivalent on the reader side.
>
> On Mon, Jan 29, 2024 at 06:58:06PM -0800, Fenghua Yu wrote:
> > wmb() is used to ensure status in the completion record is written
> > after the rest of the completion record, making it visible to the user.
> > However, on SMP systems, this may not guarantee visibility across
> > different CPUs.
> >
> > Considering this scenario that event log handler is running on CPU1 while
> > user app is polling completion record (cr) status on CPU2:
> >
> >         CPU1                            CPU2
> > event log handler                       user app
> >
> >                                         1. cr = 0 (status = 0)
> > 2. copy X to user cr except "status"
> > 3. wmb()
> > 4. copy Y to user cr "status"
> >                                         5. poll status value Y
> >                                         6. read rest cr which is still 0.
> >                                            cr handling fails
> > 7. cr value X visible now
> >
> > Although wmb() ensure value Y is written and visible after X is written
> > on CPU1, the order is not guaranteed on CPU2. So user app may see status
> > value Y while cr value X is still not visible yet on CPU2. This will
> > cause reading 0 from the rest of cr and cr handling fails.
>
> The wmb() on CPU1 ensures the order of the reads, but you need an rmb() on CPU2
> between reading the 'status' and 'rest' parts; otherwise CPU2 (or the
> compiler!) is permitted to hoist the read of 'rest' early, before reading from
> 'status', and hence you can end up with a sequence that is effectively:
>
>         CPU1                            CPU2
> event log handler                       user app
>
>                                         1. cr = 0 (status = 0)
>                                         6a. read rest cr which is still 0.
> 2. copy X to user cr except "status"
> 3. wmb()
> 4. copy Y to user cr "status"
>                                         5. poll status value Y
>                                         6b. cr handling fails
> 7. cr value X visible now
>
> Since this is all to regular cacheable memory, it's *sufficient* to use
> smp_wmb() and smp_rmb(), but that's an optimization rather than an ordering
> fix.
>
> Note that on x86_64, TSO means that the stores are in-order (and so smp_wmb()
> is just a compiler barrier), and IIUC loads are not reordered w.r.t. other
> loads (and so smp_rmb() is also just a compiler barrier).
>
> > Changing wmb() to smp_wmb() ensures Y is written after X on both CPU1
> > and CPU2. This guarantees that user app can consume cr in right order.

A barrier can only provide ordering for memory accesses on the same CPU,
so this doesn't make any sense.

>
> This implies that smp_wmb() is *stronger* than wmb(), whereas smp_wmb() is
> actually *weaker* (e.g. on x86_64 wmb() is an sfence, whereas smp_wmb() is a
> barrier()).
>
> Thanks,
> Mark.
>
> >
> > Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
> > Suggested-by: Nikhil Rao <nikhil.rao@intel.com>
> > Tested-by: Tony Zhu <tony.zhu@intel.com>

Since it has a "Fixes" tag and a "Tested-by" tag, I'd assume there has
been a test w/ and w/o this patch showing it can resolve a real issue
*constantly*? If so, I think x86 might be broken somewhere.

[Cc x86 maintainers]

Regards,
Boqun

> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> > ---
> >  drivers/dma/idxd/cdev.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
> > index 77f8885cf407..9b7388a23cbe 100644
> > --- a/drivers/dma/idxd/cdev.c
> > +++ b/drivers/dma/idxd/cdev.c
> > @@ -681,9 +681,10 @@ int idxd_copy_cr(struct idxd_wq *wq, ioasid_t pasid, unsigned long addr,
> >  		 * Ensure that the completion record's status field is written
> >  		 * after the rest of the completion record has been written.
> >  		 * This ensures that the user receives the correct completion
> > -		 * record information once polling for a non-zero status.
> > +		 * record information on any CPU once polling for a non-zero
> > +		 * status.
> >  		 */
> > -		wmb();
> > +		smp_wmb();
> >  		status = *(u8 *)cr;
> >  		if (put_user(status, (u8 __user *)addr))
> >  			left += status_size;
> > --
> > 2.37.1
> >
> >
On 1/30/24 11:53, Boqun Feng wrote:
>>> Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
>>> Suggested-by: Nikhil Rao <nikhil.rao@intel.com>
>>> Tested-by: Tony Zhu <tony.zhu@intel.com>
> Since it has a "Fixes" tag and a "Tested-by" tag, I'd assume there has
> been a test w/ and w/o this patch showing it can resolve a real issue
> *constantly*? If so, I think x86 might be broken somewhere.
>
> [Cc x86 maintainers]

Fenghua, could you perhaps explain how this problem affects end users?
What symptom was observed that made it obvious something was broken and
what changes with this patch?
Hi, Dave, Boqun, and Mark,

On 1/30/24 12:30, Dave Hansen wrote:
> On 1/30/24 11:53, Boqun Feng wrote:
>>>> Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
>>>> Suggested-by: Nikhil Rao <nikhil.rao@intel.com>
>>>> Tested-by: Tony Zhu <tony.zhu@intel.com>
>> Since it has a "Fixes" tag and a "Tested-by" tag, I'd assume there has
>> been a test w/ and w/o this patch showing it can resolve a real issue
>> *constantly*? If so, I think x86 might be broken somewhere.
>>
>> [Cc x86 maintainers]
>
> Fenghua, could you perhaps explain how this problem affects end users?
> What symptom was observed that made it obvious something was broken and
> what changes with this patch?

There is no issue found by any test. This wmb() code was reviewed and
was "thought" that it may have a potential issue. The patch was tested
without breaking any existing tests.

From the discussions with Boqun and Mark, this patch might just be an
optimization rather than a fix. Let me think about it further and may
update commit message in v2 or withdraw this patch since it won't really
fix an issue.

Thank you very much for review!

-Fenghua
> -----Original Message-----
> From: Yu, Fenghua <fenghua.yu@intel.com>
> Sent: Wednesday, January 31, 2024 2:12 AM
> To: Hansen, Dave <dave.hansen@intel.com>; Boqun Feng
> <boqun.feng@gmail.com>; Mark Rutland <mark.rutland@arm.com>
> Cc: Vinod Koul <vkoul@kernel.org>; Jiang, Dave <dave.jiang@intel.com>;
> dmaengine@vger.kernel.org; linux-kernel <linux-kernel@vger.kernel.org>; Rao,
> Nikhil <nikhil.rao@intel.com>; Zhu, Tony <tony.zhu@intel.com>; Mathieu
> Desnoyers <mathieu.desnoyers@efficios.com>; Thomas Gleixner
> <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>; Borislav Petkov
> <bp@alien8.de>; Dave Hansen <dave.hansen@linux.intel.com>;
> x86@kernel.org
> Subject: Re: [PATCH] dmaengine: idxd: Change wmb() to smp_wmb() when
> copying completion record to user space
>
> Hi, Dave, Boqun, and Mark,
>
> On 1/30/24 12:30, Dave Hansen wrote:
> > On 1/30/24 11:53, Boqun Feng wrote:
> >>>> Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy
> >>>> user completion record during page fault handling")
> >>>> Suggested-by: Nikhil Rao <nikhil.rao@intel.com>
> >>>> Tested-by: Tony Zhu <tony.zhu@intel.com>
> >> Since it has a "Fixes" tag and a "Tested-by" tag, I'd assume there
> >> has been a test w/ and w/o this patch showing it can resolve a real
> >> issue *constantly*? If so, I think x86 might be broken somewhere.
> >>
> >> [Cc x86 maintainers]
> >
> > Fenghua, could you perhaps explain how this problem affects end users?
> > What symptom was observed that made it obvious something was broken
> > and what changes with this patch?
>
> There is no issue found by any test. This wmb() code was reviewed and was
> "thought" that it may have a potential issue.

I had made this suggestion since the code only needed an smp_wmb(). If the
review refers to my suggestion, sorry if my message indicated a potential
issue; that certainly wasn't my intention. memory-barriers.txt does say that
mandatory barriers (of which wmb() is one) should not be used to control SMP
effects.

Nikhil
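As a rough illustration of the difference between a mandatory barrier and an SMP barrier discussed in this thread, the user-space sketch below mimics what Mark describes for x86_64: wmb() is an sfence instruction, while smp_wmb() is only a compiler barrier. The `sketch_*` macros and the two helper functions are hypothetical names and not kernel APIs; on non-x86_64 targets the fallbacks use GCC/Clang `__atomic_thread_fence()` purely as placeholders and do not claim to match any particular architecture's kernel definitions.

```c
#include <assert.h>

/* Compiler-only barrier: prevents the compiler from reordering memory accesses across it. */
#define sketch_compiler_barrier() __asm__ __volatile__("" ::: "memory")

#if defined(__x86_64__)
/* Mandatory write barrier on x86_64 per the thread: an sfence instruction. */
#define sketch_wmb()     __asm__ __volatile__("sfence" ::: "memory")
/* SMP write barrier on x86_64 per the thread: just a compiler barrier (TSO keeps stores in order). */
#define sketch_smp_wmb() sketch_compiler_barrier()
#else
/* Placeholder fallbacks for other targets (assumption, not the kernel's definitions). */
#define sketch_wmb()     __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define sketch_smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#endif

/* Store data, then flag, ordered with the weaker SMP barrier (enough for cacheable memory). */
static void store_with_smp_wmb(volatile int *data, volatile int *flag, int value)
{
	assert(data != flag);
	*data = value;
	sketch_smp_wmb();
	*flag = 1;
}

/* Same sequence with the stronger mandatory barrier; never weaker, but potentially more costly. */
static void store_with_wmb(volatile int *data, volatile int *flag, int value)
{
	assert(data != flag);
	*data = value;
	sketch_wmb();
	*flag = 1;
}
```

Both helpers leave memory in the same final state; the point is only that replacing the mandatory barrier with the SMP one removes a fence instruction on x86_64 rather than adding ordering, which is why the reviewers call the patch an optimization.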
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 77f8885cf407..9b7388a23cbe 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -681,9 +681,10 @@ int idxd_copy_cr(struct idxd_wq *wq, ioasid_t pasid, unsigned long addr,
 		 * Ensure that the completion record's status field is written
 		 * after the rest of the completion record has been written.
 		 * This ensures that the user receives the correct completion
-		 * record information once polling for a non-zero status.
+		 * record information on any CPU once polling for a non-zero
+		 * status.
 		 */
-		wmb();
+		smp_wmb();
 		status = *(u8 *)cr;
 		if (put_user(status, (u8 __user *)addr))
 			left += status_size;