Message ID | 20221118000524.1477383-1-ira.weiny@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp690500wrr; Thu, 17 Nov 2022 16:07:06 -0800 (PST) X-Google-Smtp-Source: AA0mqf49UB095VZWn0MQHLmlpBJSHv05Lck6tbI+OFGBCUCzsrw8rznanRA22uqySYX/XU2ejD7H X-Received: by 2002:a05:6402:5406:b0:467:4b3d:f2ed with SMTP id ev6-20020a056402540600b004674b3df2edmr4207244edb.101.1668730025866; Thu, 17 Nov 2022 16:07:05 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668730025; cv=none; d=google.com; s=arc-20160816; b=VlZxn1xxrS21lL3acX9z+3atx0VWNMJdFqdGtrtvu3PiRIqp7idT8vW1LBHKdvRFfd +4nTVGsbg/U/SQIY1P0dCTxYVONJJGwDGhxcC7YyY/qCOylepSQucCnKIC1dlQiRmu/V Tq1DnJc9F4AOpSOLEsAezVv4nkGJTOhGbikbhG1QBtptkPMZIb9At34Bnigw9XGW05tv s3X82+deXV+ChCkn9OGukXRjuXqFzTR8SfGjDBEQEYmjxDikqMbUYGDsk0DELdDKHfBj oIseO5EgB/s6mr3ObuvQLpGLrwV6RHkNKGDGhk86K56zfRwa1CFWDViipy8VLJ7ILM6t L/7w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=PyyshX2MXtCgxFh5/yxvPHPhEZ9CUlXVS8URKYK9m/0=; b=Jpt5z5LXif6BHFU0f9l9vVW6ZnEXURpwkbJf038Gw2EFOXLVGrqU3WnxOa06xS8WVO kfcbDYMVHPeOHPfo9FTKqtSzvt893fTRcDpSXurik8apWvLFTpYc55s6wa3np6RXJykS r5uDTUID85MEMZDegUJ5zZaw26AGqORcEV0L8WL+xR4wNzJtnaUimXlYEf5wpsSI5EuQ eG7fx2ChSdRxjQaDyDKrKkj0NGxjrnbBcwvRnE6m7GD+xrcgAPT52sFy1lO2GvUC7YCT cFX/xnwClNykZpMzD2TCOeHm8AgM+p60uvx4t20iBcuk3CdgMHQ121vEikp+rXitY52o n/tw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=RjVMaBgD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id g23-20020a170906349700b007ad812915c9si1455497ejb.581.2022.11.17.16.06.35; Thu, 17 Nov 2022 16:07:05 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=RjVMaBgD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235133AbiKRAFe (ORCPT <rfc822;a1648639935@gmail.com> + 99 others); Thu, 17 Nov 2022 19:05:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57606 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233270AbiKRAFa (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 17 Nov 2022 19:05:30 -0500 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 74C7ABC0; Thu, 17 Nov 2022 16:05:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668729929; x=1700265929; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=hWL8ZNgUOuvv1B4SV56p3dDMPMbvsjEleV962v2qcqc=; b=RjVMaBgDDUqIB3RA8JHiiYYKXgzaF/Dm8V9C71i0jLSP8h8khC+XCr+n T8gZwQ5yv36gaA+29UvaxC8ndzzzEHGg+OyaIiTOJH1py68nvwhsjT+HT tmJiJs9e91MFG2uajznUD+JGZ2BqrLXOu1BId21OpDnrxvMQBwXWKonyH u6gdWwt9IP4FCdLrk9163NVysTjheBROnmnNKXNkzxJiQJxQssU2xuoNg gohLESHOCwNM7lY0twhvbxmr2Q/4K7t+5eJadzxijGBW3hndfrKnmD2xB 8DbXMV4wZi0zgvf87ErURkvO3wPzbzLCcSJ/yZ08y+duU2MYr23Zj7F4r A==; X-IronPort-AV: E=McAfee;i="6500,9779,10534"; a="300546400" X-IronPort-AV: E=Sophos;i="5.96,172,1665471600"; d="scan'208";a="300546400" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Nov 2022 16:05:28 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10534"; a="590814123" X-IronPort-AV: E=Sophos;i="5.96,172,1665471600"; d="scan'208";a="590814123" Received: from jmartin2-mobl.amr.corp.intel.com (HELO localhost) ([10.209.2.15]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Nov 2022 16:05:27 -0800 From: ira.weiny@intel.com To: Bjorn Helgaas <bhelgaas@google.com>, Dan Williams <dan.j.williams@intel.com> Cc: Ira Weiny <ira.weiny@intel.com>, Bjorn Helgaas <helgaas@kernel.org>, Gregory Price <gregory.price@memverge.com>, Jonathan Cameron <Jonathan.Cameron@huawei.com>, Lukas Wunner <lukas@wunner.de>, Vishal Verma <vishal.l.verma@intel.com>, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org Subject: [PATCH V2] PCI/DOE: Detect on stack work items automatically Date: Thu, 17 Nov 2022 16:05:24 -0800 Message-Id: <20221118000524.1477383-1-ira.weiny@intel.com> X-Mailer: git-send-email 2.37.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1749790255337548979?= X-GMAIL-MSGID: =?utf-8?q?1749790255337548979?= |
Series |
[V2] PCI/DOE: Detect on stack work items automatically
|
|
Commit Message
Ira Weiny
Nov. 18, 2022, 12:05 a.m. UTC
From: Ira Weiny <ira.weiny@intel.com> Work item initialization needs to be done with either INIT_WORK_ONSTACK() or INIT_WORK() depending on how the work item is allocated. The callers of pci_doe_submit_task() allocate struct pci_doe_task on the stack and pci_doe_submit_task() incorrectly used INIT_WORK(). Jonathan suggested creating doe task allocation macros such as DECLARE_CDAT_DOE_TASK_ONSTACK().[1] The issue with this is the work function is not known to the callers and must be initialized correctly. A follow up suggestion was to have an internal 'pci_doe_work' item allocated by pci_doe_submit_task().[2] This requires an allocation which could restrict the context where tasks are used. Another idea was to have an intermediate step to initialize the task struct with a new call.[3] This added a lot of complexity. Lukas pointed out that object_is_on_stack() is available to detect this automatically. Use object_is_on_stack() to determine the correct init work function to call. [1] https://lore.kernel.org/linux-cxl/20221014151045.24781-1-Jonathan.Cameron@huawei.com/T/#m88a7f50dcce52f30c8bf5c3dcc06fa9843b54a2d [2] https://lore.kernel.org/linux-cxl/20221014151045.24781-1-Jonathan.Cameron@huawei.com/T/#m63c636c5135f304480370924f4d03c00357be667 [3] https://lore.kernel.org/all/20221115011943.1051039-1-ira.weiny@intel.com/ Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: Gregory Price <gregory.price@memverge.com> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ira Weiny <ira.weiny@intel.com> --- Changes from V1 Update oneliner Use object_is_on_stack() to make this a simple fix --- drivers/pci/doe.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) base-commit: 30a0b95b1335e12efef89dd78518ed3e4a71a763 prerequisite-patch-id: dfea657e07f37aa9d7c3d477d68b07f64fe78721 prerequisite-patch-id: e27264e76e637214ee50cdab0e5854b223d44b4e
Comments
On Thu, Nov 17, 2022 at 04:05:24PM -0800, Ira wrote: > From: Ira Weiny <ira.weiny@intel.com> Sorry about the extra pre-patches listed below. They are not needed. > > Work item initialization needs to be done with either > INIT_WORK_ONSTACK() or INIT_WORK() depending on how the work item is > allocated. > > The callers of pci_doe_submit_task() allocate struct pci_doe_task on the > stack and pci_doe_submit_task() incorrectly used INIT_WORK(). > > Jonathan suggested creating doe task allocation macros such as > DECLARE_CDAT_DOE_TASK_ONSTACK().[1] The issue with this is the work > function is not known to the callers and must be initialized correctly. > > A follow up suggestion was to have an internal 'pci_doe_work' item > allocated by pci_doe_submit_task().[2] This requires an allocation which > could restrict the context where tasks are used. > > Another idea was to have an intermediate step to initialize the task > struct with a new call.[3] This added a lot of complexity. > > Lukas pointed out that object_is_on_stack() is available to detect this > automatically. > > Use object_is_on_stack() to determine the correct init work function to > call. > > [1] https://lore.kernel.org/linux-cxl/20221014151045.24781-1-Jonathan.Cameron@huawei.com/T/#m88a7f50dcce52f30c8bf5c3dcc06fa9843b54a2d > [2] https://lore.kernel.org/linux-cxl/20221014151045.24781-1-Jonathan.Cameron@huawei.com/T/#m63c636c5135f304480370924f4d03c00357be667 > [3] https://lore.kernel.org/all/20221115011943.1051039-1-ira.weiny@intel.com/ > > Cc: Bjorn Helgaas <helgaas@kernel.org> > Cc: Dan Williams <dan.j.williams@intel.com> > Reported-by: Gregory Price <gregory.price@memverge.com> > Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Suggested-by: Lukas Wunner <lukas@wunner.de> > Signed-off-by: Ira Weiny <ira.weiny@intel.com> > > --- > Changes from V1 > Update oneliner > Use object_is_on_stack() to make this a simple fix > --- > drivers/pci/doe.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c > index e402f05068a5..42de517022d9 100644 > --- a/drivers/pci/doe.c > +++ b/drivers/pci/doe.c > @@ -19,6 +19,7 @@ > #include <linux/pci.h> > #include <linux/pci-doe.h> > #include <linux/workqueue.h> > +#include <linux/sched/task_stack.h> > > #define PCI_DOE_PROTOCOL_DISCOVERY 0 > > @@ -529,7 +530,10 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) > return -EIO; > > task->doe_mb = doe_mb; > - INIT_WORK(&task->work, doe_statemachine_work); > + if (object_is_on_stack(&task->work)) > + INIT_WORK_ONSTACK(&task->work, doe_statemachine_work); > + else > + INIT_WORK(&task->work, doe_statemachine_work); > queue_work(doe_mb->work_queue, &task->work); > return 0; > } > > base-commit: 30a0b95b1335e12efef89dd78518ed3e4a71a763 This is correct. > prerequisite-patch-id: dfea657e07f37aa9d7c3d477d68b07f64fe78721 > prerequisite-patch-id: e27264e76e637214ee50cdab0e5854b223d44b4e These are not needed... Sorry, should I resend? Ira > -- > 2.37.2 >
On Thu, Nov 17, 2022 at 04:07:48PM -0800, Ira Weiny wrote: > On Thu, Nov 17, 2022 at 04:05:24PM -0800, Ira wrote: > > From: Ira Weiny <ira.weiny@intel.com> > > Sorry about the extra pre-patches listed below. They are not needed. > ... > > base-commit: 30a0b95b1335e12efef89dd78518ed3e4a71a763 > > This is correct. > > > prerequisite-patch-id: dfea657e07f37aa9d7c3d477d68b07f64fe78721 > > prerequisite-patch-id: e27264e76e637214ee50cdab0e5854b223d44b4e > > These are not needed... > > Sorry, should I resend? No worries, no need to resend it! Bjorn
From: ira.weiny@intel.com > Sent: 18 November 2022 00:05 > > Work item initialization needs to be done with either > INIT_WORK_ONSTACK() or INIT_WORK() depending on how the work item is > allocated. > > The callers of pci_doe_submit_task() allocate struct pci_doe_task on the > stack and pci_doe_submit_task() incorrectly used INIT_WORK(). > > Jonathan suggested creating doe task allocation macros such as > DECLARE_CDAT_DOE_TASK_ONSTACK().[1] The issue with this is the work > function is not known to the callers and must be initialized correctly. > > A follow up suggestion was to have an internal 'pci_doe_work' item > allocated by pci_doe_submit_task().[2] This requires an allocation which > could restrict the context where tasks are used. > > Another idea was to have an intermediate step to initialize the task > struct with a new call.[3] This added a lot of complexity. > > Lukas pointed out that object_is_on_stack() is available to detect this > automatically. > > Use object_is_on_stack() to determine the correct init work function to > call. This is all a bit strange. The 'onstack' flag is needed for the diagnostic check: is_on_stack = object_is_on_stack(addr); if (is_on_stack == onstack) return; pr_warn(...); WARN_ON(1); So setting the flag to the location of the buffer just subverts the check. It that is sane there ought to be a proper way to do it. OTOH using an on-stack structure for INIT_WORK seems rather strange. Since the kernel thread must sleep waiting for the 'work' to complete why not just perform the required code there. Also you really don't want to OOPS with anything from the stack linked into global kernel data structures. While wait queues are pretty limited in scope and probably ok, this looks like a big accident waiting to happen. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
David Laight wrote: > From: ira.weiny@intel.com > > Sent: 18 November 2022 00:05 > > > > Work item initialization needs to be done with either > > INIT_WORK_ONSTACK() or INIT_WORK() depending on how the work item is > > allocated. > > > > The callers of pci_doe_submit_task() allocate struct pci_doe_task on the > > stack and pci_doe_submit_task() incorrectly used INIT_WORK(). > > > > Jonathan suggested creating doe task allocation macros such as > > DECLARE_CDAT_DOE_TASK_ONSTACK().[1] The issue with this is the work > > function is not known to the callers and must be initialized correctly. > > > > A follow up suggestion was to have an internal 'pci_doe_work' item > > allocated by pci_doe_submit_task().[2] This requires an allocation which > > could restrict the context where tasks are used. > > > > Another idea was to have an intermediate step to initialize the task > > struct with a new call.[3] This added a lot of complexity. > > > > Lukas pointed out that object_is_on_stack() is available to detect this > > automatically. > > > > Use object_is_on_stack() to determine the correct init work function to > > call. > > This is all a bit strange. > The 'onstack' flag is needed for the diagnostic check: > is_on_stack = object_is_on_stack(addr); > if (is_on_stack == onstack) > return; > pr_warn(...); > WARN_ON(1); > > So setting the flag to the location of the buffer just subverts the check. > It that is sane there ought to be a proper way to do it. > > OTOH using an on-stack structure for INIT_WORK seems rather strange. > Since the kernel thread must sleep waiting for the 'work' to complete > why not just perform the required code there. To have the option to support both async and sync flows through this driver interface. It is similar to the internal distinction between: submit_bio_wait() ...and: submit_bio() Where the former just layers an on on-stack completion over the asynchronous submit_bio(). > Also you really don't want to OOPS with anything from the stack > linked into global kernel data structures. > While wait queues are pretty limited in scope and probably ok, > this looks like a big accident waiting to happen. I do not see the cause for alarm, this sync-wait design pattern is not new.
On Fri, Nov 18, 2022 at 09:20:38AM +0000, David Laight wrote: > From: ira.weiny@intel.com > > Sent: 18 November 2022 00:05 > > > > Work item initialization needs to be done with either > > INIT_WORK_ONSTACK() or INIT_WORK() depending on how the work item is > > allocated. > > > > The callers of pci_doe_submit_task() allocate struct pci_doe_task on the > > stack and pci_doe_submit_task() incorrectly used INIT_WORK(). > > > > Jonathan suggested creating doe task allocation macros such as > > DECLARE_CDAT_DOE_TASK_ONSTACK().[1] The issue with this is the work > > function is not known to the callers and must be initialized correctly. > > > > A follow up suggestion was to have an internal 'pci_doe_work' item > > allocated by pci_doe_submit_task().[2] This requires an allocation which > > could restrict the context where tasks are used. > > > > Another idea was to have an intermediate step to initialize the task > > struct with a new call.[3] This added a lot of complexity. > > > > Lukas pointed out that object_is_on_stack() is available to detect this > > automatically. > > > > Use object_is_on_stack() to determine the correct init work function to > > call. > > This is all a bit strange. > The 'onstack' flag is needed for the diagnostic check: > is_on_stack = object_is_on_stack(addr); > if (is_on_stack == onstack) > return; > pr_warn(...); > WARN_ON(1); > :-( > So setting the flag to the location of the buffer just subverts the check. > It that is sane there ought to be a proper way to do it. Ok this brings me back to my previous point and suggested patch.[*] The fundamental bug is that the work item is allocated in different code from the code which uses it. Separating the work item from the task. [*] https://lore.kernel.org/linux-cxl/20221014151045.24781-1-Jonathan.Cameron@huawei.com/T/#m63c636c5135f304480370924f4d03c00357be667 Bjorn would this solution be acceptable and just use GFP_KERNEL and mark the required context for pci_doe_submit_task()? > OTOH using an on-stack structure for INIT_WORK seems rather strange. > Since the kernel thread must sleep waiting for the 'work' to complete > why not just perform the required code there. It is not strange if some task submitters want to wait while others do not. It was suggested that all submit task operations be async and the callers who wanted to be synchronous would wait like this. As Dan said there is a difference between submit_bio() and submit_bio_wait(). We have simply left the wait part up to the users who all wait right now. > > Also you really don't want to OOPS with anything from the stack > linked into global kernel data structures. I'm not following what you mean here. I'm not seeing anything like this in the current code nor any of the solutions suggested. Ira > While wait queues are pretty limited in scope and probably ok, > this looks like a big accident waiting to happen. > > David > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales) >
Ira Weiny wrote: > On Fri, Nov 18, 2022 at 09:20:38AM +0000, David Laight wrote: > > From: ira.weiny@intel.com > > > Sent: 18 November 2022 00:05 > > > > > > Work item initialization needs to be done with either > > > INIT_WORK_ONSTACK() or INIT_WORK() depending on how the work item is > > > allocated. > > > > > > The callers of pci_doe_submit_task() allocate struct pci_doe_task on the > > > stack and pci_doe_submit_task() incorrectly used INIT_WORK(). > > > > > > Jonathan suggested creating doe task allocation macros such as > > > DECLARE_CDAT_DOE_TASK_ONSTACK().[1] The issue with this is the work > > > function is not known to the callers and must be initialized correctly. > > > > > > A follow up suggestion was to have an internal 'pci_doe_work' item > > > allocated by pci_doe_submit_task().[2] This requires an allocation which > > > could restrict the context where tasks are used. > > > > > > Another idea was to have an intermediate step to initialize the task > > > struct with a new call.[3] This added a lot of complexity. > > > > > > Lukas pointed out that object_is_on_stack() is available to detect this > > > automatically. > > > > > > Use object_is_on_stack() to determine the correct init work function to > > > call. > > > > This is all a bit strange. > > The 'onstack' flag is needed for the diagnostic check: > > is_on_stack = object_is_on_stack(addr); > > if (is_on_stack == onstack) > > return; > > pr_warn(...); > > WARN_ON(1); > > > > :-( > > > So setting the flag to the location of the buffer just subverts the check. > > It that is sane there ought to be a proper way to do it. > > Ok this brings me back to my previous point and suggested patch.[*] The > fundamental bug is that the work item is allocated in different code from > the code which uses it. Separating the work item from the task. > > [*] https://lore.kernel.org/linux-cxl/20221014151045.24781-1-Jonathan.Cameron@huawei.com/T/#m63c636c5135f304480370924f4d03c00357be667 > > Bjorn would this solution be acceptable and just use GFP_KERNEL and mark the > required context for pci_doe_submit_task()? It is a waste to have an allocation when one is not needed. The value of having INIT_WORK_ONSTACK and DECLARE_COMPLETION_ONSTACK is to be clear at the call site that an async context cares about this stack frame not going out of scope. However, coming full circle, we have zero async users today, and having the completion and work struct in the task are causing a maintenance burden. So let's just rip it out for now with something like the following and circle back to add async support later when it becomes necessary: (only compile tested) diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c index 0dbbe8d39b07..69873cdcc911 100644 --- a/drivers/cxl/core/pci.c +++ b/drivers/cxl/core/pci.c @@ -488,21 +488,14 @@ static struct pci_doe_mb *find_cdat_doe(struct device *uport) CXL_DOE_TABLE_ACCESS_TABLE_TYPE_CDATA) | \ FIELD_PREP(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE, (entry_handle))) -static void cxl_doe_task_complete(struct pci_doe_task *task) -{ - complete(task->private); -} - struct cdat_doe_task { u32 request_pl; u32 response_pl[32]; - struct completion c; struct pci_doe_task task; }; #define DECLARE_CDAT_DOE_TASK(req, cdt) \ struct cdat_doe_task cdt = { \ - .c = COMPLETION_INITIALIZER_ONSTACK(cdt.c), \ .request_pl = req, \ .task = { \ .prot.vid = PCI_DVSEC_VENDOR_ID_CXL, \ @@ -511,8 +504,6 @@ struct cdat_doe_task cdt = { \ .request_pl_sz = sizeof(cdt.request_pl), \ .response_pl = cdt.response_pl, \ .response_pl_sz = sizeof(cdt.response_pl), \ - .complete = cxl_doe_task_complete, \ - .private = &cdt.c, \ } \ } @@ -523,12 +514,12 @@ static int cxl_cdat_get_length(struct device *dev, DECLARE_CDAT_DOE_TASK(CDAT_DOE_REQ(0), t); int rc; - rc = pci_doe_submit_task(cdat_doe, &t.task); + rc = pci_doe_submit_task_wait(cdat_doe, &t.task); if (rc < 0) { dev_err(dev, "DOE submit failed: %d", rc); return rc; } - wait_for_completion(&t.c); + if (t.task.rv < sizeof(u32)) return -EIO; @@ -552,12 +543,11 @@ static int cxl_cdat_read_table(struct device *dev, u32 *entry; int rc; - rc = pci_doe_submit_task(cdat_doe, &t.task); + rc = pci_doe_submit_task_wait(cdat_doe, &t.task); if (rc < 0) { dev_err(dev, "DOE submit failed: %d", rc); return rc; } - wait_for_completion(&t.c); /* 1 DW header + 1 DW data min */ if (t.task.rv < (2 * sizeof(u32))) return -EIO; diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c index e402f05068a5..115a8ff14afc 100644 --- a/drivers/pci/doe.c +++ b/drivers/pci/doe.c @@ -18,7 +18,6 @@ #include <linux/mutex.h> #include <linux/pci.h> #include <linux/pci-doe.h> -#include <linux/workqueue.h> #define PCI_DOE_PROTOCOL_DISCOVERY 0 @@ -40,7 +39,6 @@ * @cap_offset: Capability offset * @prots: Array of protocols supported (encoded as long values) * @wq: Wait queue for work item - * @work_queue: Queue of pci_doe_work items * @flags: Bit array of PCI_DOE_FLAG_* flags */ struct pci_doe_mb { @@ -49,7 +47,6 @@ struct pci_doe_mb { struct xarray prots; wait_queue_head_t wq; - struct workqueue_struct *work_queue; unsigned long flags; }; @@ -211,7 +208,6 @@ static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *tas static void signal_task_complete(struct pci_doe_task *task, int rv) { task->rv = rv; - task->complete(task); } static void signal_task_abort(struct pci_doe_task *task, int rv) @@ -231,10 +227,9 @@ static void signal_task_abort(struct pci_doe_task *task, int rv) signal_task_complete(task, rv); } -static void doe_statemachine_work(struct work_struct *work) + +static void exec_task(struct pci_doe_task *task) { - struct pci_doe_task *task = container_of(work, struct pci_doe_task, - work); struct pci_doe_mb *doe_mb = task->doe_mb; struct pci_dev *pdev = doe_mb->pdev; int offset = doe_mb->cap_offset; @@ -295,18 +290,12 @@ static void doe_statemachine_work(struct work_struct *work) signal_task_complete(task, rc); } -static void pci_doe_task_complete(struct pci_doe_task *task) -{ - complete(task->private); -} - static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid, u8 *protocol) { u32 request_pl = FIELD_PREP(PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX, *index); u32 response_pl; - DECLARE_COMPLETION_ONSTACK(c); struct pci_doe_task task = { .prot.vid = PCI_VENDOR_ID_PCI_SIG, .prot.type = PCI_DOE_PROTOCOL_DISCOVERY, @@ -314,17 +303,13 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid, .request_pl_sz = sizeof(request_pl), .response_pl = &response_pl, .response_pl_sz = sizeof(response_pl), - .complete = pci_doe_task_complete, - .private = &c, }; int rc; - rc = pci_doe_submit_task(doe_mb, &task); + rc = pci_doe_submit_task_wait(doe_mb, &task); if (rc < 0) return rc; - wait_for_completion(&c); - if (task.rv != sizeof(response_pl)) return -EIO; @@ -376,13 +361,6 @@ static void pci_doe_xa_destroy(void *mb) xa_destroy(&doe_mb->prots); } -static void pci_doe_destroy_workqueue(void *mb) -{ - struct pci_doe_mb *doe_mb = mb; - - destroy_workqueue(doe_mb->work_queue); -} - static void pci_doe_flush_mb(void *mb) { struct pci_doe_mb *doe_mb = mb; @@ -393,9 +371,6 @@ static void pci_doe_flush_mb(void *mb) /* Cancel an in progress work item, if necessary */ set_bit(PCI_DOE_FLAG_CANCEL, &doe_mb->flags); wake_up(&doe_mb->wq); - - /* Flush all work items */ - flush_workqueue(doe_mb->work_queue); } /** @@ -429,19 +404,6 @@ struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset) if (rc) return ERR_PTR(rc); - doe_mb->work_queue = alloc_ordered_workqueue("%s %s DOE [%x]", 0, - dev_driver_string(&pdev->dev), - pci_name(pdev), - doe_mb->cap_offset); - if (!doe_mb->work_queue) { - pci_err(pdev, "[%x] failed to allocate work queue\n", - doe_mb->cap_offset); - return ERR_PTR(-ENOMEM); - } - rc = devm_add_action_or_reset(dev, pci_doe_destroy_workqueue, doe_mb); - if (rc) - return ERR_PTR(rc); - /* Reset the mailbox by issuing an abort */ rc = pci_doe_abort(doe_mb); if (rc) { @@ -496,23 +458,20 @@ bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type) EXPORT_SYMBOL_GPL(pci_doe_supports_prot); /** - * pci_doe_submit_task() - Submit a task to be processed by the state machine + * pci_doe_submit_task_wait() - Submit and execute a task * * @doe_mb: DOE mailbox capability to submit to - * @task: task to be queued + * @task: task to be run * - * Submit a DOE task (request/response) to the DOE mailbox to be processed. - * Returns upon queueing the task object. If the queue is full this function - * will sleep until there is room in the queue. - * - * task->complete will be called when the state machine is done processing this - * task. + * Submit and run DOE task (request/response) to the DOE mailbox to be + * processed. * * Excess data will be discarded. * - * RETURNS: 0 when task has been successfully queued, -ERRNO on error + * RETURNS: 0 when task was executed, the @task->rv holds the status + * result of the executed opertion, -ERRNO on failure to submit. */ -int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) +int pci_doe_submit_task_wait(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) { if (!pci_doe_supports_prot(doe_mb, task->prot.vid, task->prot.type)) return -EINVAL; @@ -529,8 +488,8 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) return -EIO; task->doe_mb = doe_mb; - INIT_WORK(&task->work, doe_statemachine_work); - queue_work(doe_mb->work_queue, &task->work); + exec_task(task); + return 0; } -EXPORT_SYMBOL_GPL(pci_doe_submit_task); +EXPORT_SYMBOL_GPL(pci_doe_submit_task_wait); diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h index ed9b4df792b8..c94122a66221 100644 --- a/include/linux/pci-doe.h +++ b/include/linux/pci-doe.h @@ -30,8 +30,6 @@ struct pci_doe_mb; * @response_pl_sz: Size of the response payload (bytes) * @rv: Return value. Length of received response or error (bytes) * @complete: Called when task is complete - * @private: Private data for the consumer - * @work: Used internally by the mailbox * @doe_mb: Used internally by the mailbox * * The payload sizes and rv are specified in bytes with the following @@ -50,11 +48,6 @@ struct pci_doe_task { u32 *response_pl; size_t response_pl_sz; int rv; - void (*complete)(struct pci_doe_task *task); - void *private; - - /* No need for the user to initialize these fields */ - struct work_struct work; struct pci_doe_mb *doe_mb; }; @@ -72,6 +65,5 @@ struct pci_doe_task { struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset); bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type); -int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task); - +int pci_doe_submit_task_wait(struct pci_doe_mb *doe_mb, struct pci_doe_task *task); #endif > > OTOH using an on-stack structure for INIT_WORK seems rather strange. > > Since the kernel thread must sleep waiting for the 'work' to complete > > why not just perform the required code there. > > It is not strange if some task submitters want to wait while others do not. It > was suggested that all submit task operations be async and the callers who > wanted to be synchronous would wait like this. > > As Dan said there is a difference between submit_bio() and submit_bio_wait(). > > We have simply left the wait part up to the users who all wait right now. Yeah, my bad for jumping ahead to worry about async when it is not yet needed.
On 11/19/2022 3:46 AM, Dan Williams wrote: > Ira Weiny wrote: >> On Fri, Nov 18, 2022 at 09:20:38AM +0000, David Laight wrote: >>> From: ira.weiny@intel.com >>>> Sent: 18 November 2022 00:05 >>>> >>>> Work item initialization needs to be done with either >>>> INIT_WORK_ONSTACK() or INIT_WORK() depending on how the work item is >>>> allocated. >>>> >>>> The callers of pci_doe_submit_task() allocate struct pci_doe_task on the >>>> stack and pci_doe_submit_task() incorrectly used INIT_WORK(). >>>> >>>> Jonathan suggested creating doe task allocation macros such as >>>> DECLARE_CDAT_DOE_TASK_ONSTACK().[1] The issue with this is the work >>>> function is not known to the callers and must be initialized correctly. >>>> >>>> A follow up suggestion was to have an internal 'pci_doe_work' item >>>> allocated by pci_doe_submit_task().[2] This requires an allocation which >>>> could restrict the context where tasks are used. >>>> >>>> Another idea was to have an intermediate step to initialize the task >>>> struct with a new call.[3] This added a lot of complexity. >>>> >>>> Lukas pointed out that object_is_on_stack() is available to detect this >>>> automatically. >>>> >>>> Use object_is_on_stack() to determine the correct init work function to >>>> call. >>> >>> This is all a bit strange. >>> The 'onstack' flag is needed for the diagnostic check: >>> is_on_stack = object_is_on_stack(addr); >>> if (is_on_stack == onstack) >>> return; >>> pr_warn(...); >>> WARN_ON(1); >>> >> >> :-( >> >>> So setting the flag to the location of the buffer just subverts the check. >>> It that is sane there ought to be a proper way to do it. >> >> Ok this brings me back to my previous point and suggested patch.[*] The >> fundamental bug is that the work item is allocated in different code from >> the code which uses it. Separating the work item from the task. >> >> [*] https://lore.kernel.org/linux-cxl/20221014151045.24781-1-Jonathan.Cameron@huawei.com/T/#m63c636c5135f304480370924f4d03c00357be667 >> >> Bjorn would this solution be acceptable and just use GFP_KERNEL and mark the >> required context for pci_doe_submit_task()? > > It is a waste to have an allocation when one is not needed. The value of > having INIT_WORK_ONSTACK and DECLARE_COMPLETION_ONSTACK is to be clear > at the call site that an async context cares about this stack frame not > going out of scope. > > However, coming full circle, we have zero async users today, and having > the completion and work struct in the task are causing a maintenance > burden. So let's just rip it out for now with something like the > following and circle back to add async support later when it becomes > necessary: (only compile tested) > > > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c > index 0dbbe8d39b07..69873cdcc911 100644 > --- a/drivers/cxl/core/pci.c > +++ b/drivers/cxl/core/pci.c > @@ -488,21 +488,14 @@ static struct pci_doe_mb *find_cdat_doe(struct device *uport) > CXL_DOE_TABLE_ACCESS_TABLE_TYPE_CDATA) | \ > FIELD_PREP(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE, (entry_handle))) > > -static void cxl_doe_task_complete(struct pci_doe_task *task) > -{ > - complete(task->private); > -} > - > struct cdat_doe_task { > u32 request_pl; > u32 response_pl[32]; > - struct completion c; > struct pci_doe_task task; > }; > > #define DECLARE_CDAT_DOE_TASK(req, cdt) \ > struct cdat_doe_task cdt = { \ > - .c = COMPLETION_INITIALIZER_ONSTACK(cdt.c), \ > .request_pl = req, \ > .task = { \ > .prot.vid = PCI_DVSEC_VENDOR_ID_CXL, \ > @@ -511,8 +504,6 @@ struct cdat_doe_task cdt = { \ > .request_pl_sz = sizeof(cdt.request_pl), \ > .response_pl = cdt.response_pl, \ > .response_pl_sz = sizeof(cdt.response_pl), \ > - .complete = cxl_doe_task_complete, \ > - .private = &cdt.c, \ > } \ > } > > @@ -523,12 +514,12 @@ static int cxl_cdat_get_length(struct device *dev, > DECLARE_CDAT_DOE_TASK(CDAT_DOE_REQ(0), t); > int rc; > > - rc = pci_doe_submit_task(cdat_doe, &t.task); > + rc = pci_doe_submit_task_wait(cdat_doe, &t.task); > if (rc < 0) { > dev_err(dev, "DOE submit failed: %d", rc); > return rc; > } > - wait_for_completion(&t.c); > + > if (t.task.rv < sizeof(u32)) > return -EIO; > > @@ -552,12 +543,11 @@ static int cxl_cdat_read_table(struct device *dev, > u32 *entry; > int rc; > > - rc = pci_doe_submit_task(cdat_doe, &t.task); > + rc = pci_doe_submit_task_wait(cdat_doe, &t.task); > if (rc < 0) { > dev_err(dev, "DOE submit failed: %d", rc); > return rc; > } > - wait_for_completion(&t.c); > /* 1 DW header + 1 DW data min */ > if (t.task.rv < (2 * sizeof(u32))) > return -EIO; > diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c > index e402f05068a5..115a8ff14afc 100644 > --- a/drivers/pci/doe.c > +++ b/drivers/pci/doe.c > @@ -18,7 +18,6 @@ > #include <linux/mutex.h> > #include <linux/pci.h> > #include <linux/pci-doe.h> > -#include <linux/workqueue.h> > > #define PCI_DOE_PROTOCOL_DISCOVERY 0 > > @@ -40,7 +39,6 @@ > * @cap_offset: Capability offset > * @prots: Array of protocols supported (encoded as long values) > * @wq: Wait queue for work item > - * @work_queue: Queue of pci_doe_work items > * @flags: Bit array of PCI_DOE_FLAG_* flags > */ > struct pci_doe_mb { > @@ -49,7 +47,6 @@ struct pci_doe_mb { > struct xarray prots; > > wait_queue_head_t wq; > - struct workqueue_struct *work_queue; > unsigned long flags; > }; > > @@ -211,7 +208,6 @@ static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *tas > static void signal_task_complete(struct pci_doe_task *task, int rv) > { > task->rv = rv; > - task->complete(task); > } > > static void signal_task_abort(struct pci_doe_task *task, int rv) > @@ -231,10 +227,9 @@ static void signal_task_abort(struct pci_doe_task *task, int rv) > signal_task_complete(task, rv); > } > > -static void doe_statemachine_work(struct work_struct *work) > + > +static void exec_task(struct pci_doe_task *task) > { > - struct pci_doe_task *task = container_of(work, struct pci_doe_task, > - work); > struct pci_doe_mb *doe_mb = task->doe_mb; > struct pci_dev *pdev = doe_mb->pdev; > int offset = doe_mb->cap_offset; > @@ -295,18 +290,12 @@ static void doe_statemachine_work(struct work_struct *work) > signal_task_complete(task, rc); > } > > -static void pci_doe_task_complete(struct pci_doe_task *task) > -{ > - complete(task->private); > -} > - > static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid, > u8 *protocol) > { > u32 request_pl = FIELD_PREP(PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX, > *index); > u32 response_pl; > - DECLARE_COMPLETION_ONSTACK(c); > struct pci_doe_task task = { > .prot.vid = PCI_VENDOR_ID_PCI_SIG, > .prot.type = PCI_DOE_PROTOCOL_DISCOVERY, > @@ -314,17 +303,13 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid, > .request_pl_sz = sizeof(request_pl), > .response_pl = &response_pl, > .response_pl_sz = sizeof(response_pl), > - .complete = pci_doe_task_complete, > - .private = &c, > }; > int rc; > > - rc = pci_doe_submit_task(doe_mb, &task); > + rc = pci_doe_submit_task_wait(doe_mb, &task); > if (rc < 0) > return rc; > > - wait_for_completion(&c); > - > if (task.rv != sizeof(response_pl)) > return -EIO; > > @@ -376,13 +361,6 @@ static void pci_doe_xa_destroy(void *mb) > xa_destroy(&doe_mb->prots); > } > > -static void pci_doe_destroy_workqueue(void *mb) > -{ > - struct pci_doe_mb *doe_mb = mb; > - > - destroy_workqueue(doe_mb->work_queue); > -} > - > static void pci_doe_flush_mb(void *mb) > { > struct pci_doe_mb *doe_mb = mb; > @@ -393,9 +371,6 @@ static void pci_doe_flush_mb(void *mb) > /* Cancel an in progress work item, if necessary */ > set_bit(PCI_DOE_FLAG_CANCEL, &doe_mb->flags); > wake_up(&doe_mb->wq); > - > - /* Flush all work items */ > - flush_workqueue(doe_mb->work_queue); > } > > /** > @@ -429,19 +404,6 @@ struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset) > if (rc) > return ERR_PTR(rc); > > - doe_mb->work_queue = alloc_ordered_workqueue("%s %s DOE [%x]", 0, > - dev_driver_string(&pdev->dev), > - pci_name(pdev), > - doe_mb->cap_offset); > - if (!doe_mb->work_queue) { > - pci_err(pdev, "[%x] failed to allocate work queue\n", > - doe_mb->cap_offset); > - return ERR_PTR(-ENOMEM); > - } > - rc = devm_add_action_or_reset(dev, pci_doe_destroy_workqueue, doe_mb); > - if (rc) > - return ERR_PTR(rc); > - > /* Reset the mailbox by issuing an abort */ > rc = pci_doe_abort(doe_mb); > if (rc) { > @@ -496,23 +458,20 @@ bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type) > EXPORT_SYMBOL_GPL(pci_doe_supports_prot); > > /** > - * pci_doe_submit_task() - Submit a task to be processed by the state machine > + * pci_doe_submit_task_wait() - Submit and execute a task > * > * @doe_mb: DOE mailbox capability to submit to > - * @task: task to be queued > + * @task: task to be run > * > - * Submit a DOE task (request/response) to the DOE mailbox to be processed. > - * Returns upon queueing the task object. If the queue is full this function > - * will sleep until there is room in the queue. > - * > - * task->complete will be called when the state machine is done processing this > - * task. > + * Submit and run DOE task (request/response) to the DOE mailbox to be > + * processed. > * > * Excess data will be discarded. > * > - * RETURNS: 0 when task has been successfully queued, -ERRNO on error > + * RETURNS: 0 when task was executed, the @task->rv holds the status > + * result of the executed opertion, -ERRNO on failure to submit. > */ > -int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) > +int pci_doe_submit_task_wait(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) > { > if (!pci_doe_supports_prot(doe_mb, task->prot.vid, task->prot.type)) > return -EINVAL; > @@ -529,8 +488,8 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) > return -EIO; > > task->doe_mb = doe_mb; > - INIT_WORK(&task->work, doe_statemachine_work); > - queue_work(doe_mb->work_queue, &task->work); > + exec_task(task); > + > return 0; > } > -EXPORT_SYMBOL_GPL(pci_doe_submit_task); > +EXPORT_SYMBOL_GPL(pci_doe_submit_task_wait); > diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h > index ed9b4df792b8..c94122a66221 100644 > --- a/include/linux/pci-doe.h > +++ b/include/linux/pci-doe.h > @@ -30,8 +30,6 @@ struct pci_doe_mb; > * @response_pl_sz: Size of the response payload (bytes) > * @rv: Return value. Length of received response or error (bytes) > * @complete: Called when task is complete > - * @private: Private data for the consumer > - * @work: Used internally by the mailbox > * @doe_mb: Used internally by the mailbox > * > * The payload sizes and rv are specified in bytes with the following > @@ -50,11 +48,6 @@ struct pci_doe_task { > u32 *response_pl; > size_t response_pl_sz; > int rv; > - void (*complete)(struct pci_doe_task *task); > - void *private; > - > - /* No need for the user to initialize these fields */ > - struct work_struct work; > struct pci_doe_mb *doe_mb; > }; > > @@ -72,6 +65,5 @@ struct pci_doe_task { > > struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset); > bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type); > -int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task); > - > +int pci_doe_submit_task_wait(struct pci_doe_mb *doe_mb, struct pci_doe_task *task); > #endif > > good to see that we can have a sync interface. I think that we still need some methods to garantee doe_task can be handled one by one in doe_mb? When more than one kernel thread are going to transfer data over a same doe_mb, only one kernel thread can use it and others will failed in exec_task(). Thanks Ming >>> OTOH using an on-stack structure for INIT_WORK seems rather strange. >>> Since the kernel thread must sleep waiting for the 'work' to complete >>> why not just perform the required code there. >> >> It is not strange if some task submitters want to wait while others do not. It >> was suggested that all submit task operations be async and the callers who >> wanted to be synchronous would wait like this. >> >> As Dan said there is a difference between submit_bio() and submit_bio_wait(). >> >> We have simply left the wait part up to the users who all wait right now. > > Yeah, my bad for jumping ahead to worry about async when it is not yet > needed.
Li, Ming wrote: > On 11/19/2022 3:46 AM, Dan Williams wrote: > > Ira Weiny wrote: > >> On Fri, Nov 18, 2022 at 09:20:38AM +0000, David Laight wrote: > >>> From: ira.weiny@intel.com > >>>> Sent: 18 November 2022 00:05 > >>>> > >>>> Work item initialization needs to be done with either > >>>> INIT_WORK_ONSTACK() or INIT_WORK() depending on how the work item is > >>>> allocated. > >>>> > >>>> The callers of pci_doe_submit_task() allocate struct pci_doe_task on the > >>>> stack and pci_doe_submit_task() incorrectly used INIT_WORK(). > >>>> > >>>> Jonathan suggested creating doe task allocation macros such as > >>>> DECLARE_CDAT_DOE_TASK_ONSTACK().[1] The issue with this is the work > >>>> function is not known to the callers and must be initialized correctly. > >>>> > >>>> A follow up suggestion was to have an internal 'pci_doe_work' item > >>>> allocated by pci_doe_submit_task().[2] This requires an allocation which > >>>> could restrict the context where tasks are used. > >>>> > >>>> Another idea was to have an intermediate step to initialize the task > >>>> struct with a new call.[3] This added a lot of complexity. > >>>> > >>>> Lukas pointed out that object_is_on_stack() is available to detect this > >>>> automatically. > >>>> > >>>> Use object_is_on_stack() to determine the correct init work function to > >>>> call. > >>> > >>> This is all a bit strange. > >>> The 'onstack' flag is needed for the diagnostic check: > >>> is_on_stack = object_is_on_stack(addr); > >>> if (is_on_stack == onstack) > >>> return; > >>> pr_warn(...); > >>> WARN_ON(1); > >>> > >> > >> :-( > >> > >>> So setting the flag to the location of the buffer just subverts the check. > >>> It that is sane there ought to be a proper way to do it. > >> > >> Ok this brings me back to my previous point and suggested patch.[*] The > >> fundamental bug is that the work item is allocated in different code from > >> the code which uses it. Separating the work item from the task. > >> > >> [*] https://lore.kernel.org/linux-cxl/20221014151045.24781-1-Jonathan.Cameron@huawei.com/T/#m63c636c5135f304480370924f4d03c00357be667 > >> > >> Bjorn would this solution be acceptable and just use GFP_KERNEL and mark the > >> required context for pci_doe_submit_task()? > > > > It is a waste to have an allocation when one is not needed. The value of > > having INIT_WORK_ONSTACK and DECLARE_COMPLETION_ONSTACK is to be clear > > at the call site that an async context cares about this stack frame not > > going out of scope. > > > > However, coming full circle, we have zero async users today, and having > > the completion and work struct in the task are causing a maintenance > > burden. So let's just rip it out for now with something like the > > following and circle back to add async support later when it becomes > > necessary: (only compile tested) > > > > > > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c > > index 0dbbe8d39b07..69873cdcc911 100644 > > --- a/drivers/cxl/core/pci.c > > +++ b/drivers/cxl/core/pci.c > > @@ -488,21 +488,14 @@ static struct pci_doe_mb *find_cdat_doe(struct device *uport) > > CXL_DOE_TABLE_ACCESS_TABLE_TYPE_CDATA) | \ > > FIELD_PREP(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE, (entry_handle))) > > > > -static void cxl_doe_task_complete(struct pci_doe_task *task) > > -{ > > - complete(task->private); > > -} > > - > > struct cdat_doe_task { > > u32 request_pl; > > u32 response_pl[32]; > > - struct completion c; > > struct pci_doe_task task; > > }; > > > > #define DECLARE_CDAT_DOE_TASK(req, cdt) \ > > struct cdat_doe_task cdt = { \ > > - .c = COMPLETION_INITIALIZER_ONSTACK(cdt.c), \ > > .request_pl = req, \ > > .task = { \ > > .prot.vid = PCI_DVSEC_VENDOR_ID_CXL, \ > > @@ -511,8 +504,6 @@ struct cdat_doe_task cdt = { \ > > .request_pl_sz = sizeof(cdt.request_pl), \ > > .response_pl = cdt.response_pl, \ > > .response_pl_sz = sizeof(cdt.response_pl), \ > > - .complete = cxl_doe_task_complete, \ > > - .private = &cdt.c, \ > > } \ > > } > > > > @@ -523,12 +514,12 @@ static int cxl_cdat_get_length(struct device *dev, > > DECLARE_CDAT_DOE_TASK(CDAT_DOE_REQ(0), t); > > int rc; > > > > - rc = pci_doe_submit_task(cdat_doe, &t.task); > > + rc = pci_doe_submit_task_wait(cdat_doe, &t.task); > > if (rc < 0) { > > dev_err(dev, "DOE submit failed: %d", rc); > > return rc; > > } > > - wait_for_completion(&t.c); > > + > > if (t.task.rv < sizeof(u32)) > > return -EIO; > > > > @@ -552,12 +543,11 @@ static int cxl_cdat_read_table(struct device *dev, > > u32 *entry; > > int rc; > > > > - rc = pci_doe_submit_task(cdat_doe, &t.task); > > + rc = pci_doe_submit_task_wait(cdat_doe, &t.task); > > if (rc < 0) { > > dev_err(dev, "DOE submit failed: %d", rc); > > return rc; > > } > > - wait_for_completion(&t.c); > > /* 1 DW header + 1 DW data min */ > > if (t.task.rv < (2 * sizeof(u32))) > > return -EIO; > > diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c > > index e402f05068a5..115a8ff14afc 100644 > > --- a/drivers/pci/doe.c > > +++ b/drivers/pci/doe.c > > @@ -18,7 +18,6 @@ > > #include <linux/mutex.h> > > #include <linux/pci.h> > > #include <linux/pci-doe.h> > > -#include <linux/workqueue.h> > > > > #define PCI_DOE_PROTOCOL_DISCOVERY 0 > > > > @@ -40,7 +39,6 @@ > > * @cap_offset: Capability offset > > * @prots: Array of protocols supported (encoded as long values) > > * @wq: Wait queue for work item > > - * @work_queue: Queue of pci_doe_work items > > * @flags: Bit array of PCI_DOE_FLAG_* flags > > */ > > struct pci_doe_mb { > > @@ -49,7 +47,6 @@ struct pci_doe_mb { > > struct xarray prots; > > > > wait_queue_head_t wq; > > - struct workqueue_struct *work_queue; > > unsigned long flags; > > }; > > > > @@ -211,7 +208,6 @@ static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *tas > > static void signal_task_complete(struct pci_doe_task *task, int rv) > > { > > task->rv = rv; > > - task->complete(task); > > } > > > > static void signal_task_abort(struct pci_doe_task *task, int rv) > > @@ -231,10 +227,9 @@ static void signal_task_abort(struct pci_doe_task *task, int rv) > > signal_task_complete(task, rv); > > } > > > > -static void doe_statemachine_work(struct work_struct *work) > > + > > +static void exec_task(struct pci_doe_task *task) > > { > > - struct pci_doe_task *task = container_of(work, struct pci_doe_task, > > - work); > > struct pci_doe_mb *doe_mb = task->doe_mb; > > struct pci_dev *pdev = doe_mb->pdev; > > int offset = doe_mb->cap_offset; > > @@ -295,18 +290,12 @@ static void doe_statemachine_work(struct work_struct *work) > > signal_task_complete(task, rc); > > } > > > > -static void pci_doe_task_complete(struct pci_doe_task *task) > > -{ > > - complete(task->private); > > -} > > - > > static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid, > > u8 *protocol) > > { > > u32 request_pl = FIELD_PREP(PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX, > > *index); > > u32 response_pl; > > - DECLARE_COMPLETION_ONSTACK(c); > > struct pci_doe_task task = { > > .prot.vid = PCI_VENDOR_ID_PCI_SIG, > > .prot.type = PCI_DOE_PROTOCOL_DISCOVERY, > > @@ -314,17 +303,13 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid, > > .request_pl_sz = sizeof(request_pl), > > .response_pl = &response_pl, > > .response_pl_sz = sizeof(response_pl), > > - .complete = pci_doe_task_complete, > > - .private = &c, > > }; > > int rc; > > > > - rc = pci_doe_submit_task(doe_mb, &task); > > + rc = pci_doe_submit_task_wait(doe_mb, &task); > > if (rc < 0) > > return rc; > > > > - wait_for_completion(&c); > > - > > if (task.rv != sizeof(response_pl)) > > return -EIO; > > > > @@ -376,13 +361,6 @@ static void pci_doe_xa_destroy(void *mb) > > xa_destroy(&doe_mb->prots); > > } > > > > -static void pci_doe_destroy_workqueue(void *mb) > > -{ > > - struct pci_doe_mb *doe_mb = mb; > > - > > - destroy_workqueue(doe_mb->work_queue); > > -} > > - > > static void pci_doe_flush_mb(void *mb) > > { > > struct pci_doe_mb *doe_mb = mb; > > @@ -393,9 +371,6 @@ static void pci_doe_flush_mb(void *mb) > > /* Cancel an in progress work item, if necessary */ > > set_bit(PCI_DOE_FLAG_CANCEL, &doe_mb->flags); > > wake_up(&doe_mb->wq); > > - > > - /* Flush all work items */ > > - flush_workqueue(doe_mb->work_queue); > > } > > > > /** > > @@ -429,19 +404,6 @@ struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset) > > if (rc) > > return ERR_PTR(rc); > > > > - doe_mb->work_queue = alloc_ordered_workqueue("%s %s DOE [%x]", 0, > > - dev_driver_string(&pdev->dev), > > - pci_name(pdev), > > - doe_mb->cap_offset); > > - if (!doe_mb->work_queue) { > > - pci_err(pdev, "[%x] failed to allocate work queue\n", > > - doe_mb->cap_offset); > > - return ERR_PTR(-ENOMEM); > > - } > > - rc = devm_add_action_or_reset(dev, pci_doe_destroy_workqueue, doe_mb); > > - if (rc) > > - return ERR_PTR(rc); > > - > > /* Reset the mailbox by issuing an abort */ > > rc = pci_doe_abort(doe_mb); > > if (rc) { > > @@ -496,23 +458,20 @@ bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type) > > EXPORT_SYMBOL_GPL(pci_doe_supports_prot); > > > > /** > > - * pci_doe_submit_task() - Submit a task to be processed by the state machine > > + * pci_doe_submit_task_wait() - Submit and execute a task > > * > > * @doe_mb: DOE mailbox capability to submit to > > - * @task: task to be queued > > + * @task: task to be run > > * > > - * Submit a DOE task (request/response) to the DOE mailbox to be processed. > > - * Returns upon queueing the task object. If the queue is full this function > > - * will sleep until there is room in the queue. > > - * > > - * task->complete will be called when the state machine is done processing this > > - * task. > > + * Submit and run DOE task (request/response) to the DOE mailbox to be > > + * processed. > > * > > * Excess data will be discarded. > > * > > - * RETURNS: 0 when task has been successfully queued, -ERRNO on error > > + * RETURNS: 0 when task was executed, the @task->rv holds the status > > + * result of the executed opertion, -ERRNO on failure to submit. > > */ > > -int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) > > +int pci_doe_submit_task_wait(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) > > { > > if (!pci_doe_supports_prot(doe_mb, task->prot.vid, task->prot.type)) > > return -EINVAL; > > @@ -529,8 +488,8 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) > > return -EIO; > > > > task->doe_mb = doe_mb; > > - INIT_WORK(&task->work, doe_statemachine_work); > > - queue_work(doe_mb->work_queue, &task->work); > > + exec_task(task); > > + > > return 0; > > } > > -EXPORT_SYMBOL_GPL(pci_doe_submit_task); > > +EXPORT_SYMBOL_GPL(pci_doe_submit_task_wait); > > diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h > > index ed9b4df792b8..c94122a66221 100644 > > --- a/include/linux/pci-doe.h > > +++ b/include/linux/pci-doe.h > > @@ -30,8 +30,6 @@ struct pci_doe_mb; > > * @response_pl_sz: Size of the response payload (bytes) > > * @rv: Return value. Length of received response or error (bytes) > > * @complete: Called when task is complete > > - * @private: Private data for the consumer > > - * @work: Used internally by the mailbox > > * @doe_mb: Used internally by the mailbox > > * > > * The payload sizes and rv are specified in bytes with the following > > @@ -50,11 +48,6 @@ struct pci_doe_task { > > u32 *response_pl; > > size_t response_pl_sz; > > int rv; > > - void (*complete)(struct pci_doe_task *task); > > - void *private; > > - > > - /* No need for the user to initialize these fields */ > > - struct work_struct work; > > struct pci_doe_mb *doe_mb; > > }; > > > > @@ -72,6 +65,5 @@ struct pci_doe_task { > > > > struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset); > > bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type); > > -int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task); > > - > > +int pci_doe_submit_task_wait(struct pci_doe_mb *doe_mb, struct pci_doe_task *task); > > #endif > > > > > > good to see that we can have a sync interface. > I think that we still need some methods to garantee doe_task can be > handled one by one in doe_mb? When more than one kernel thread are > going to transfer data over a same doe_mb, only one kernel thread can > use it and others will failed in exec_task(). > Oh, good catch, yes, this likely needs a mutex_lock_interruptible() over exec_task(), or similar.
On Fri, Nov 18, 2022 at 09:11:58PM -0800, Dan Williams wrote: > Li, Ming wrote: > > On 11/19/2022 3:46 AM, Dan Williams wrote: > > > Ira Weiny wrote: > > >> On Fri, Nov 18, 2022 at 09:20:38AM +0000, David Laight wrote: > > >>> From: ira.weiny@intel.com > > >>>> Sent: 18 November 2022 00:05 > > >>>> > > >>>> Work item initialization needs to be done with either > > >>>> INIT_WORK_ONSTACK() or INIT_WORK() depending on how the work item is > > >>>> allocated. > > >>>> > > >>>> The callers of pci_doe_submit_task() allocate struct pci_doe_task on the > > >>>> stack and pci_doe_submit_task() incorrectly used INIT_WORK(). > > >>>> > > >>>> Jonathan suggested creating doe task allocation macros such as > > >>>> DECLARE_CDAT_DOE_TASK_ONSTACK().[1] The issue with this is the work > > >>>> function is not known to the callers and must be initialized correctly. > > >>>> > > >>>> A follow up suggestion was to have an internal 'pci_doe_work' item > > >>>> allocated by pci_doe_submit_task().[2] This requires an allocation which > > >>>> could restrict the context where tasks are used. > > >>>> > > >>>> Another idea was to have an intermediate step to initialize the task > > >>>> struct with a new call.[3] This added a lot of complexity. > > >>>> > > >>>> Lukas pointed out that object_is_on_stack() is available to detect this > > >>>> automatically. > > >>>> > > >>>> Use object_is_on_stack() to determine the correct init work function to > > >>>> call. > > >>> > > >>> This is all a bit strange. > > >>> The 'onstack' flag is needed for the diagnostic check: > > >>> is_on_stack = object_is_on_stack(addr); > > >>> if (is_on_stack == onstack) > > >>> return; > > >>> pr_warn(...); > > >>> WARN_ON(1); > > >>> > > >> > > >> :-( > > >> > > >>> So setting the flag to the location of the buffer just subverts the check. > > >>> It that is sane there ought to be a proper way to do it. > > >> > > >> Ok this brings me back to my previous point and suggested patch.[*] The > > >> fundamental bug is that the work item is allocated in different code from > > >> the code which uses it. Separating the work item from the task. > > >> > > >> [*] https://lore.kernel.org/linux-cxl/20221014151045.24781-1-Jonathan.Cameron@huawei.com/T/#m63c636c5135f304480370924f4d03c00357be667 > > >> > > >> Bjorn would this solution be acceptable and just use GFP_KERNEL and mark the > > >> required context for pci_doe_submit_task()? > > > > > > It is a waste to have an allocation when one is not needed. The value of > > > having INIT_WORK_ONSTACK and DECLARE_COMPLETION_ONSTACK is to be clear > > > at the call site that an async context cares about this stack frame not > > > going out of scope. > > > > > > However, coming full circle, we have zero async users today, and having > > > the completion and work struct in the task are causing a maintenance > > > burden. So let's just rip it out for now with something like the > > > following and circle back to add async support later when it becomes > > > necessary: (only compile tested) > > > > > > > > > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c > > > index 0dbbe8d39b07..69873cdcc911 100644 > > > --- a/drivers/cxl/core/pci.c > > > +++ b/drivers/cxl/core/pci.c > > > @@ -488,21 +488,14 @@ static struct pci_doe_mb *find_cdat_doe(struct device *uport) > > > CXL_DOE_TABLE_ACCESS_TABLE_TYPE_CDATA) | \ > > > FIELD_PREP(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE, (entry_handle))) > > > > > > -static void cxl_doe_task_complete(struct pci_doe_task *task) > > > -{ > > > - complete(task->private); > > > -} > > > - > > > struct cdat_doe_task { > > > u32 request_pl; > > > u32 response_pl[32]; > > > - struct completion c; > > > struct pci_doe_task task; > > > }; > > > > > > #define DECLARE_CDAT_DOE_TASK(req, cdt) \ > > > struct cdat_doe_task cdt = { \ > > > - .c = COMPLETION_INITIALIZER_ONSTACK(cdt.c), \ > > > .request_pl = req, \ > > > .task = { \ > > > .prot.vid = PCI_DVSEC_VENDOR_ID_CXL, \ > > > @@ -511,8 +504,6 @@ struct cdat_doe_task cdt = { \ > > > .request_pl_sz = sizeof(cdt.request_pl), \ > > > .response_pl = cdt.response_pl, \ > > > .response_pl_sz = sizeof(cdt.response_pl), \ > > > - .complete = cxl_doe_task_complete, \ > > > - .private = &cdt.c, \ > > > } \ > > > } > > > > > > @@ -523,12 +514,12 @@ static int cxl_cdat_get_length(struct device *dev, > > > DECLARE_CDAT_DOE_TASK(CDAT_DOE_REQ(0), t); > > > int rc; > > > > > > - rc = pci_doe_submit_task(cdat_doe, &t.task); > > > + rc = pci_doe_submit_task_wait(cdat_doe, &t.task); > > > if (rc < 0) { > > > dev_err(dev, "DOE submit failed: %d", rc); > > > return rc; > > > } > > > - wait_for_completion(&t.c); > > > + > > > if (t.task.rv < sizeof(u32)) > > > return -EIO; > > > > > > @@ -552,12 +543,11 @@ static int cxl_cdat_read_table(struct device *dev, > > > u32 *entry; > > > int rc; > > > > > > - rc = pci_doe_submit_task(cdat_doe, &t.task); > > > + rc = pci_doe_submit_task_wait(cdat_doe, &t.task); > > > if (rc < 0) { > > > dev_err(dev, "DOE submit failed: %d", rc); > > > return rc; > > > } > > > - wait_for_completion(&t.c); > > > /* 1 DW header + 1 DW data min */ > > > if (t.task.rv < (2 * sizeof(u32))) > > > return -EIO; > > > diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c > > > index e402f05068a5..115a8ff14afc 100644 > > > --- a/drivers/pci/doe.c > > > +++ b/drivers/pci/doe.c > > > @@ -18,7 +18,6 @@ > > > #include <linux/mutex.h> > > > #include <linux/pci.h> > > > #include <linux/pci-doe.h> > > > -#include <linux/workqueue.h> > > > > > > #define PCI_DOE_PROTOCOL_DISCOVERY 0 > > > > > > @@ -40,7 +39,6 @@ > > > * @cap_offset: Capability offset > > > * @prots: Array of protocols supported (encoded as long values) > > > * @wq: Wait queue for work item > > > - * @work_queue: Queue of pci_doe_work items > > > * @flags: Bit array of PCI_DOE_FLAG_* flags > > > */ > > > struct pci_doe_mb { > > > @@ -49,7 +47,6 @@ struct pci_doe_mb { > > > struct xarray prots; > > > > > > wait_queue_head_t wq; > > > - struct workqueue_struct *work_queue; > > > unsigned long flags; > > > }; > > > > > > @@ -211,7 +208,6 @@ static int pci_doe_recv_resp(struct pci_doe_mb *doe_mb, struct pci_doe_task *tas > > > static void signal_task_complete(struct pci_doe_task *task, int rv) > > > { > > > task->rv = rv; > > > - task->complete(task); > > > } > > > > > > static void signal_task_abort(struct pci_doe_task *task, int rv) > > > @@ -231,10 +227,9 @@ static void signal_task_abort(struct pci_doe_task *task, int rv) > > > signal_task_complete(task, rv); > > > } > > > > > > -static void doe_statemachine_work(struct work_struct *work) > > > + > > > +static void exec_task(struct pci_doe_task *task) > > > { > > > - struct pci_doe_task *task = container_of(work, struct pci_doe_task, > > > - work); > > > struct pci_doe_mb *doe_mb = task->doe_mb; > > > struct pci_dev *pdev = doe_mb->pdev; > > > int offset = doe_mb->cap_offset; > > > @@ -295,18 +290,12 @@ static void doe_statemachine_work(struct work_struct *work) > > > signal_task_complete(task, rc); > > > } > > > > > > -static void pci_doe_task_complete(struct pci_doe_task *task) > > > -{ > > > - complete(task->private); > > > -} > > > - > > > static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid, > > > u8 *protocol) > > > { > > > u32 request_pl = FIELD_PREP(PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX, > > > *index); > > > u32 response_pl; > > > - DECLARE_COMPLETION_ONSTACK(c); > > > struct pci_doe_task task = { > > > .prot.vid = PCI_VENDOR_ID_PCI_SIG, > > > .prot.type = PCI_DOE_PROTOCOL_DISCOVERY, > > > @@ -314,17 +303,13 @@ static int pci_doe_discovery(struct pci_doe_mb *doe_mb, u8 *index, u16 *vid, > > > .request_pl_sz = sizeof(request_pl), > > > .response_pl = &response_pl, > > > .response_pl_sz = sizeof(response_pl), > > > - .complete = pci_doe_task_complete, > > > - .private = &c, > > > }; > > > int rc; > > > > > > - rc = pci_doe_submit_task(doe_mb, &task); > > > + rc = pci_doe_submit_task_wait(doe_mb, &task); > > > if (rc < 0) > > > return rc; > > > > > > - wait_for_completion(&c); > > > - > > > if (task.rv != sizeof(response_pl)) > > > return -EIO; > > > > > > @@ -376,13 +361,6 @@ static void pci_doe_xa_destroy(void *mb) > > > xa_destroy(&doe_mb->prots); > > > } > > > > > > -static void pci_doe_destroy_workqueue(void *mb) > > > -{ > > > - struct pci_doe_mb *doe_mb = mb; > > > - > > > - destroy_workqueue(doe_mb->work_queue); > > > -} > > > - > > > static void pci_doe_flush_mb(void *mb) > > > { > > > struct pci_doe_mb *doe_mb = mb; > > > @@ -393,9 +371,6 @@ static void pci_doe_flush_mb(void *mb) > > > /* Cancel an in progress work item, if necessary */ > > > set_bit(PCI_DOE_FLAG_CANCEL, &doe_mb->flags); > > > wake_up(&doe_mb->wq); > > > - > > > - /* Flush all work items */ > > > - flush_workqueue(doe_mb->work_queue); > > > } > > > > > > /** > > > @@ -429,19 +404,6 @@ struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset) > > > if (rc) > > > return ERR_PTR(rc); > > > > > > - doe_mb->work_queue = alloc_ordered_workqueue("%s %s DOE [%x]", 0, > > > - dev_driver_string(&pdev->dev), > > > - pci_name(pdev), > > > - doe_mb->cap_offset); > > > - if (!doe_mb->work_queue) { > > > - pci_err(pdev, "[%x] failed to allocate work queue\n", > > > - doe_mb->cap_offset); > > > - return ERR_PTR(-ENOMEM); > > > - } > > > - rc = devm_add_action_or_reset(dev, pci_doe_destroy_workqueue, doe_mb); > > > - if (rc) > > > - return ERR_PTR(rc); > > > - > > > /* Reset the mailbox by issuing an abort */ > > > rc = pci_doe_abort(doe_mb); > > > if (rc) { > > > @@ -496,23 +458,20 @@ bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type) > > > EXPORT_SYMBOL_GPL(pci_doe_supports_prot); > > > > > > /** > > > - * pci_doe_submit_task() - Submit a task to be processed by the state machine > > > + * pci_doe_submit_task_wait() - Submit and execute a task > > > * > > > * @doe_mb: DOE mailbox capability to submit to > > > - * @task: task to be queued > > > + * @task: task to be run > > > * > > > - * Submit a DOE task (request/response) to the DOE mailbox to be processed. > > > - * Returns upon queueing the task object. If the queue is full this function > > > - * will sleep until there is room in the queue. > > > - * > > > - * task->complete will be called when the state machine is done processing this > > > - * task. > > > + * Submit and run DOE task (request/response) to the DOE mailbox to be > > > + * processed. > > > * > > > * Excess data will be discarded. > > > * > > > - * RETURNS: 0 when task has been successfully queued, -ERRNO on error > > > + * RETURNS: 0 when task was executed, the @task->rv holds the status > > > + * result of the executed opertion, -ERRNO on failure to submit. > > > */ > > > -int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) > > > +int pci_doe_submit_task_wait(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) > > > { > > > if (!pci_doe_supports_prot(doe_mb, task->prot.vid, task->prot.type)) > > > return -EINVAL; > > > @@ -529,8 +488,8 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) > > > return -EIO; > > > > > > task->doe_mb = doe_mb; > > > - INIT_WORK(&task->work, doe_statemachine_work); > > > - queue_work(doe_mb->work_queue, &task->work); > > > + exec_task(task); > > > + > > > return 0; > > > } > > > -EXPORT_SYMBOL_GPL(pci_doe_submit_task); > > > +EXPORT_SYMBOL_GPL(pci_doe_submit_task_wait); > > > diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h > > > index ed9b4df792b8..c94122a66221 100644 > > > --- a/include/linux/pci-doe.h > > > +++ b/include/linux/pci-doe.h > > > @@ -30,8 +30,6 @@ struct pci_doe_mb; > > > * @response_pl_sz: Size of the response payload (bytes) > > > * @rv: Return value. Length of received response or error (bytes) > > > * @complete: Called when task is complete > > > - * @private: Private data for the consumer > > > - * @work: Used internally by the mailbox > > > * @doe_mb: Used internally by the mailbox > > > * > > > * The payload sizes and rv are specified in bytes with the following > > > @@ -50,11 +48,6 @@ struct pci_doe_task { > > > u32 *response_pl; > > > size_t response_pl_sz; > > > int rv; > > > - void (*complete)(struct pci_doe_task *task); > > > - void *private; > > > - > > > - /* No need for the user to initialize these fields */ > > > - struct work_struct work; > > > struct pci_doe_mb *doe_mb; > > > }; > > > > > > @@ -72,6 +65,5 @@ struct pci_doe_task { > > > > > > struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset); > > > bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type); > > > -int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task); > > > - > > > +int pci_doe_submit_task_wait(struct pci_doe_mb *doe_mb, struct pci_doe_task *task); > > > #endif > > > > > > > > > > good to see that we can have a sync interface. > > I think that we still need some methods to garantee doe_task can be > > handled one by one in doe_mb? When more than one kernel thread are > > going to transfer data over a same doe_mb, only one kernel thread can > > use it and others will failed in exec_task(). > > > > Oh, good catch, yes, this likely needs a mutex_lock_interruptible() over > exec_task(), or similar. Indeed, good catch. I'll get to this today; coded up and tested. Thanks, Ira
[+cc Thomas Gleixner, author of dc186ad741c1] On Fri, Nov 18, 2022 at 09:20:38AM +0000, David Laight wrote: > > From: ira.weiny@intel.com > > Sent: 18 November 2022 00:05 > > > > Work item initialization needs to be done with either > > INIT_WORK_ONSTACK() or INIT_WORK() depending on how the work item is > > allocated. > > > > The callers of pci_doe_submit_task() allocate struct pci_doe_task on the > > stack and pci_doe_submit_task() incorrectly used INIT_WORK(). > > > > Jonathan suggested creating doe task allocation macros such as > > DECLARE_CDAT_DOE_TASK_ONSTACK().[1] The issue with this is the work > > function is not known to the callers and must be initialized correctly. > > > > A follow up suggestion was to have an internal 'pci_doe_work' item > > allocated by pci_doe_submit_task().[2] This requires an allocation which > > could restrict the context where tasks are used. > > > > Another idea was to have an intermediate step to initialize the task > > struct with a new call.[3] This added a lot of complexity. > > > > Lukas pointed out that object_is_on_stack() is available to detect this > > automatically. > > > > Use object_is_on_stack() to determine the correct init work function to > > call. > > This is all a bit strange. > The 'onstack' flag is needed for the diagnostic check: > is_on_stack = object_is_on_stack(addr); > if (is_on_stack == onstack) > return; > pr_warn(...); > WARN_ON(1); > > So setting the flag to the location of the buffer just subverts the check. > It that is sane there ought to be a proper way to do it. If object_is_on_stack() is sufficient to check whether a struct is on the stack or not, why doesn't __init_work() use it to auto-detect whether to call debug_object_init_on_stack() or debug_object_init()? Forcing developers to use a specific initializer for something that can be auto-detected is akin to treating them like kids and telling them "You didn't say the magic word." What's the point? Thanks, Lukas
Now with Thomas added to cc for real. On Tue, Nov 22, 2022 at 06:13:09PM +0100, Lukas Wunner wrote: > [+cc Thomas Gleixner, author of dc186ad741c1] > > On Fri, Nov 18, 2022 at 09:20:38AM +0000, David Laight wrote: > > > From: ira.weiny@intel.com > > > Sent: 18 November 2022 00:05 > > > > > > Work item initialization needs to be done with either > > > INIT_WORK_ONSTACK() or INIT_WORK() depending on how the work item is > > > allocated. > > > > > > The callers of pci_doe_submit_task() allocate struct pci_doe_task on the > > > stack and pci_doe_submit_task() incorrectly used INIT_WORK(). > > > > > > Jonathan suggested creating doe task allocation macros such as > > > DECLARE_CDAT_DOE_TASK_ONSTACK().[1] The issue with this is the work > > > function is not known to the callers and must be initialized correctly. > > > > > > A follow up suggestion was to have an internal 'pci_doe_work' item > > > allocated by pci_doe_submit_task().[2] This requires an allocation which > > > could restrict the context where tasks are used. > > > > > > Another idea was to have an intermediate step to initialize the task > > > struct with a new call.[3] This added a lot of complexity. > > > > > > Lukas pointed out that object_is_on_stack() is available to detect this > > > automatically. > > > > > > Use object_is_on_stack() to determine the correct init work function to > > > call. > > > > This is all a bit strange. > > The 'onstack' flag is needed for the diagnostic check: > > is_on_stack = object_is_on_stack(addr); > > if (is_on_stack == onstack) > > return; > > pr_warn(...); > > WARN_ON(1); > > > > So setting the flag to the location of the buffer just subverts the check. > > It that is sane there ought to be a proper way to do it. > > If object_is_on_stack() is sufficient to check whether a struct > is on the stack or not, why doesn't __init_work() use it to > auto-detect whether to call debug_object_init_on_stack() or > debug_object_init()? > > Forcing developers to use a specific initializer for something > that can be auto-detected is akin to treating them like kids > and telling them "You didn't say the magic word." > > What's the point? > > Thanks, > > Lukas
Lukas Wunner wrote: > Now with Thomas added to cc for real. > > On Tue, Nov 22, 2022 at 06:13:09PM +0100, Lukas Wunner wrote: > > [+cc Thomas Gleixner, author of dc186ad741c1] > > > > On Fri, Nov 18, 2022 at 09:20:38AM +0000, David Laight wrote: > > > > From: ira.weiny@intel.com > > > > Sent: 18 November 2022 00:05 > > > > > > > > Work item initialization needs to be done with either > > > > INIT_WORK_ONSTACK() or INIT_WORK() depending on how the work item is > > > > allocated. > > > > > > > > The callers of pci_doe_submit_task() allocate struct pci_doe_task on the > > > > stack and pci_doe_submit_task() incorrectly used INIT_WORK(). > > > > > > > > Jonathan suggested creating doe task allocation macros such as > > > > DECLARE_CDAT_DOE_TASK_ONSTACK().[1] The issue with this is the work > > > > function is not known to the callers and must be initialized correctly. > > > > > > > > A follow up suggestion was to have an internal 'pci_doe_work' item > > > > allocated by pci_doe_submit_task().[2] This requires an allocation which > > > > could restrict the context where tasks are used. > > > > > > > > Another idea was to have an intermediate step to initialize the task > > > > struct with a new call.[3] This added a lot of complexity. > > > > > > > > Lukas pointed out that object_is_on_stack() is available to detect this > > > > automatically. > > > > > > > > Use object_is_on_stack() to determine the correct init work function to > > > > call. > > > > > > This is all a bit strange. > > > The 'onstack' flag is needed for the diagnostic check: > > > is_on_stack = object_is_on_stack(addr); > > > if (is_on_stack == onstack) > > > return; > > > pr_warn(...); > > > WARN_ON(1); > > > > > > So setting the flag to the location of the buffer just subverts the check. > > > It that is sane there ought to be a proper way to do it. > > > > If object_is_on_stack() is sufficient to check whether a struct > > is on the stack or not, why doesn't __init_work() use it to > > auto-detect whether to call debug_object_init_on_stack() or > > debug_object_init()? > > > > Forcing developers to use a specific initializer for something > > that can be auto-detected is akin to treating them like kids > > and telling them "You didn't say the magic word." > > > > What's the point? I had this initial reaction as well, but INIT_WORK_ONSTACK() documents an important detail of the object's lifetime. Here are 2 examples of functions that would become trickier to read if the kernel did a global s/INIT_WORK_ONSTACK()/INIT_WORK()/ synchronize_rcu_expedited_queue_work() insert_wq_barrier() ...where those take arguments that are known to come from the stack and be used in async context.
From: Dan Williams > Sent: 22 November 2022 20:23 ... > > > > > Lukas pointed out that object_is_on_stack() is available to detect this > > > > > automatically. > > > > > > > > > > Use object_is_on_stack() to determine the correct init work function to > > > > > call. > > > > > > > > This is all a bit strange. > > > > The 'onstack' flag is needed for the diagnostic check: > > > > is_on_stack = object_is_on_stack(addr); > > > > if (is_on_stack == onstack) > > > > return; > > > > pr_warn(...); > > > > WARN_ON(1); > > > > > > > > So setting the flag to the location of the buffer just subverts the check. > > > > It that is sane there ought to be a proper way to do it. > > > > > > If object_is_on_stack() is sufficient to check whether a struct > > > is on the stack or not, why doesn't __init_work() use it to > > > auto-detect whether to call debug_object_init_on_stack() or > > > debug_object_init()? > > > > > > Forcing developers to use a specific initializer for something > > > that can be auto-detected is akin to treating them like kids > > > and telling them "You didn't say the magic word." > > > > > > What's the point? > > I had this initial reaction as well, but INIT_WORK_ONSTACK() documents > an important detail of the object's lifetime. Here are 2 examples of > functions that would become trickier to read if the kernel did a > global s/INIT_WORK_ONSTACK()/INIT_WORK()/ > > synchronize_rcu_expedited_queue_work() > insert_wq_barrier() > > ...where those take arguments that are known to come from the stack and > be used in async context. I suspect the check was added in response to some code that added on on-stack item and then slept after returning from that function. One option would be to change the diagnostic check to: is_on_stack != !object_is_on_stack(addr) and then pass in 2 so the test always succeeds. But I suspect that won't be liked. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
diff --git a/drivers/pci/doe.c b/drivers/pci/doe.c index e402f05068a5..42de517022d9 100644 --- a/drivers/pci/doe.c +++ b/drivers/pci/doe.c @@ -19,6 +19,7 @@ #include <linux/pci.h> #include <linux/pci-doe.h> #include <linux/workqueue.h> +#include <linux/sched/task_stack.h> #define PCI_DOE_PROTOCOL_DISCOVERY 0 @@ -529,7 +530,10 @@ int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task) return -EIO; task->doe_mb = doe_mb; - INIT_WORK(&task->work, doe_statemachine_work); + if (object_is_on_stack(&task->work)) + INIT_WORK_ONSTACK(&task->work, doe_statemachine_work); + else + INIT_WORK(&task->work, doe_statemachine_work); queue_work(doe_mb->work_queue, &task->work); return 0; }