Message ID | 20221102203405.1797491-2-ogabbay@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp130158wru; Wed, 2 Nov 2022 13:40:04 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6WQYvMgb0GbUZTKu8dz7XA2VAju1hA0Kr2lMQ55PoRNQPH4VpWCdFJohQFKQWB+pw4Ol9w X-Received: by 2002:aa7:c718:0:b0:462:ff35:95dc with SMTP id i24-20020aa7c718000000b00462ff3595dcmr25317159edq.32.1667421604133; Wed, 02 Nov 2022 13:40:04 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667421604; cv=none; d=google.com; s=arc-20160816; b=dksE3A7ZUsSRBJvM4czyPtLRQHPyNG4YBa4gw8bN7yn7678GV4SuYo9Dw5NueaIMd8 jIhFC+AS0veMvzB/mPGPChi4IYeO+HtvMuoF8KRAdbQemeUxX3ock5AethuaUFG2jpcP shoawVXtB9vj0oL+evLVM77YHUNiDl1Hh5rW3ObZSCerbgzsvK40c43ZD9izRwAsxc9G soevXscZV9sZbFCLL4cdqsBUsufeea1ek9pXD096q4kHT5o1P8RW3hqtgtkUBSpM9M2Z 0xW8CkbxWogUvdXoKlBRBoaVF0oOxo2UBTbiAZ3cjdziimYA2V3BdcFz+gaYmNuKjwWm aGDg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=GVJlYlSkAseNvLzjIQaomJlHHi+SClvNR9wZWGE/lQA=; b=kJDi5adWQoyv8VtxqwS78o5qCK+ea6tX+ZRYcNx3DcB9EcG6eAulu5ucOKfdYQjosc 5uMsRngjNtdxZ46W5h3A2KNf+I3hedXGESTQ+XyaFySzBFYWJauKt7PB2RC1wlmQDzEM +ElKw6CGx4SRQH1HlknnWT0S9jNpTmDxLxC/F+QkM/Ye04ATftNfNQX2rRZxPEwFBd8N QvCpPGMUZCrEA1rYFU/OATIPns1w59JTEtxRmMdjVKBx11k+AGjs0vosSqSmcloKYK9I +MusZUUxxOIaKtrR3u7otsWLlqTtEVFGanti+jfaNLxVWhvecCGMEQ+ZjTlr/rgNXRVZ 0eCQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=vCJ8tsAO; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id d9-20020a50ea89000000b00461c531d8casi14941227edo.305.2022.11.02.13.39.28; Wed, 02 Nov 2022 13:40:04 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=vCJ8tsAO; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231253AbiKBUea (ORCPT <rfc822;yves.mi.zy@gmail.com> + 99 others); Wed, 2 Nov 2022 16:34:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59884 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231224AbiKBUeY (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 2 Nov 2022 16:34:24 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A17CC6400 for <linux-kernel@vger.kernel.org>; Wed, 2 Nov 2022 13:34:22 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 56C1FB823C4 for <linux-kernel@vger.kernel.org>; Wed, 2 Nov 2022 20:34:21 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B4E80C433B5; Wed, 2 Nov 2022 20:34:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667421260; bh=iJP0+58bg6H21ZtOSo1JNsjuytNpyr5vbaN+xzNBUTg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=vCJ8tsAOc9AwtQU93KtI6qVLtY0NAR7NDFFnFqFRzqJ5xGQJw1/wuUpvL/kJNh5gV lJLEPv91FtZbeDA9clLnWwWSiqKJSeC2jCWYcJBmjgyO7GOIqsjDKvCyXWq7h98SH4 8qwutIHyRRtnJl0MS+ZF12lS3pVTlN5pZkPGk6iWdVzIia8Jy6U9qlP6kMHT2YdLlL ySnldnYE7n3mwcC7GPLOm/4EeHa7nja6+9xCJeD8Zdd6eX4fpnmcFJ7uJcfB2gPj+n DBAGFXJ4T5DcTaJjcsE3IIj+OSpWQPuRQu7ydTZ5q1fyuoaHadtgJKJ8EcNxQva/04 mgETA+skfPIDw== From: Oded Gabbay <ogabbay@kernel.org> To: David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>, Arnd Bergmann <arnd@arndb.de>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, Jason Gunthorpe <jgg@nvidia.com>, John Hubbard <jhubbard@nvidia.com>, Alex Deucher <alexander.deucher@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>, Maxime Ripard <mripard@kernel.org>, Thomas Zimmermann <tzimmermann@suse.de>, Yuji Ishikawa <yuji2.ishikawa@toshiba.co.jp>, Jiho Chu <jiho.chu@samsung.com>, Daniel Stone <daniel@fooishbar.org>, Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>, Jeffrey Hugo <quic_jhugo@quicinc.com>, Christoph Hellwig <hch@infradead.org>, Kevin Hilman <khilman@baylibre.com>, Jagan Teki <jagan@amarulasolutions.com>, Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>, Maciej Kwapulinski <maciej.kwapulinski@linux.intel.com> Subject: [RFC PATCH v2 1/3] drivers/accel: define kconfig and register a new major Date: Wed, 2 Nov 2022 22:34:03 +0200 Message-Id: <20221102203405.1797491-2-ogabbay@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221102203405.1797491-1-ogabbay@kernel.org> References: <20221102203405.1797491-1-ogabbay@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-8.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748418275776090076?= X-GMAIL-MSGID: =?utf-8?q?1748418275776090076?= |
Series |
new subsystem for compute accelerator devices
|
|
Commit Message
Oded Gabbay
Nov. 2, 2022, 8:34 p.m. UTC
Add a new Kconfig for the accel subsystem. The Kconfig currently
contains only the basic CONFIG_ACCEL option that will be used to
decide whether to compile the accel registration code. That code will
be called directly from the DRM core code.
The accelerator devices will be exposed to the user space with a new,
dedicated major number - 261.
The accel init function registers the new major number as a char device
and create corresponding sysfs and debugfs root entries, similar to
what is done in DRM init function.
I added a new header called drm_accel.h to include/drm/, that will hold
the prototypes of the accel_drv.c functions. In case CONFIG_ACCEL is
set to 'N', that header will contain empty inline implementations of
those functions, to allow DRM core code to compile successfully
without dependency on CONFIG_ACCEL.
I Updated the MAINTAINERS file accordingly with the newly added folder
and I have taken the liberty to appropriate the dri-devel mailing list
and the dri-devel IRC channel for the accel subsystem.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
---
Changes in v2:
- Created accel_drv.c that will hold the accel framework core functions,
instead of embedding the code inside drm core functions.
- Created drm/drm_accel.h
- Removed all #ifdef CONFIG_ACCEL from drm_drv.c
Documentation/admin-guide/devices.txt | 5 ++
MAINTAINERS | 8 ++
drivers/Kconfig | 2 +
drivers/Makefile | 3 +
drivers/accel/Kconfig | 24 ++++++
drivers/accel/Makefile | 10 +++
drivers/accel/accel_drv.c | 112 ++++++++++++++++++++++++++
include/drm/drm_accel.h | 31 +++++++
8 files changed, 195 insertions(+)
create mode 100644 drivers/accel/Kconfig
create mode 100644 drivers/accel/Makefile
create mode 100644 drivers/accel/accel_drv.c
create mode 100644 include/drm/drm_accel.h
--
2.25.1
Comments
On 11/2/2022 2:34 PM, Oded Gabbay wrote: > diff --git a/drivers/accel/accel_drv.c b/drivers/accel/accel_drv.c > new file mode 100644 > index 000000000000..6132765ea054 > --- /dev/null > +++ b/drivers/accel/accel_drv.c > @@ -0,0 +1,112 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +/* > + * Copyright 2022 HabanaLabs, Ltd. > + * All Rights Reserved. > + * > + */ > + > +#include <linux/module.h> Alphebetical order? > +#include <linux/debugfs.h> > +#include <linux/device.h> > + > +#include <drm/drm_accel.h> > +#include <drm/drm_ioctl.h> > +#include <drm/drm_print.h> > + > +static struct dentry *accel_debugfs_root; > +struct class *accel_class; Static?
On 11/2/22 13:34, Oded Gabbay wrote: > diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig > new file mode 100644 > index 000000000000..282ea24f90c5 > --- /dev/null > +++ b/drivers/accel/Kconfig > @@ -0,0 +1,24 @@ > +# SPDX-License-Identifier: GPL-2.0-only > +# > +# Compute Acceleration device configuration > +# > +# This framework provides support for compute acceleration devices, such > +# as, but not limited to, Machine-Learning and Deep-Learning acceleration > +# devices > +# > +menuconfig ACCEL > + tristate "Compute Acceleration Framework" > + depends on DRM > + help > + Framework for device drivers of compute acceleration devices, such > + as, but not limited to, Machine-Learning and Deep-Learning > + acceleration devices. > + If you say Y here, you need to select the module that's right for > + your acceleration device from the list below. > + This framework is integrated with the DRM subsystem as compute > + accelerators and GPUs share a lot in common and can use almost the > + same infrastructure code. > + Having said that, acceleration devices will have a different > + major number than GPUs, and will be exposed to user-space using > + different device files, called accel/accel* (in /dev, sysfs > + and debugfs) Please add a period at the end of the help text. + and debugfs).
On Wed, Nov 02, 2022 at 10:34:03PM +0200, Oded Gabbay wrote: > --- /dev/null > +++ b/drivers/accel/Kconfig > @@ -0,0 +1,24 @@ > +# SPDX-License-Identifier: GPL-2.0-only > +# > +# Compute Acceleration device configuration > +# > +# This framework provides support for compute acceleration devices, such > +# as, but not limited to, Machine-Learning and Deep-Learning acceleration > +# devices > +# > +menuconfig ACCEL > + tristate "Compute Acceleration Framework" > + depends on DRM > + help > + Framework for device drivers of compute acceleration devices, such > + as, but not limited to, Machine-Learning and Deep-Learning > + acceleration devices. > + If you say Y here, you need to select the module that's right for > + your acceleration device from the list below. > + This framework is integrated with the DRM subsystem as compute > + accelerators and GPUs share a lot in common and can use almost the > + same infrastructure code. > + Having said that, acceleration devices will have a different > + major number than GPUs, and will be exposed to user-space using > + different device files, called accel/accel* (in /dev, sysfs > + and debugfs) Module name if "M" is chosen? > +static char *accel_devnode(struct device *dev, umode_t *mode) > +{ > + return kasprintf(GFP_KERNEL, "accel/%s", dev_name(dev)); > +} > + > +static CLASS_ATTR_STRING(accel_version, 0444, "accel 1.0.0 20221018"); What is a version number doing here? Please no, I understand that DRM has this, but it really does not make sense for any in-kernel code. And that's not how sysfs is supposed to work anyway (if a file is present, the value is documented, if the file is not present, the value is just not there, userspace has to handle it all.) > + > +/** > + * accel_sysfs_init - initialize sysfs helpers > + * > + * This is used to create the ACCEL class, which is the implicit parent of any > + * other top-level ACCEL sysfs objects. > + * > + * You must call accel_sysfs_destroy() to release the allocated resources. > + * > + * Return: 0 on success, negative error code on failure. > + */ > +static int accel_sysfs_init(void) > +{ > + int err; > + > + accel_class = class_create(THIS_MODULE, "accel"); > + if (IS_ERR(accel_class)) > + return PTR_ERR(accel_class); > + > + err = class_create_file(accel_class, &class_attr_accel_version.attr); Hint, if you ever find yourself adding sysfs files "by hand" like this, you are doing something wrong. The driver code should create them automatically for you by setting up default groups, _OR_ as in this case, you shouldn't be adding the file at all :) > +static void accel_sysfs_destroy(void) > +{ > + if (IS_ERR_OR_NULL(accel_class)) > + return; > + class_remove_file(accel_class, &class_attr_accel_version.attr); No need to manually destroy files when you remove a device. But you will remove this file anyway for the next version of this patch, so it's not a big deal :) > + class_destroy(accel_class); > + accel_class = NULL; > +} > + > +static int accel_stub_open(struct inode *inode, struct file *filp) > +{ > + DRM_DEBUG("Operation not supported"); ftrace is wonderful, please use that and not hand-rolled custom "I am here!" type messages like this. thanks, greg k-h
On Wed, Nov 2, 2022 at 11:04 PM Jeffrey Hugo <quic_jhugo@quicinc.com> wrote: > > On 11/2/2022 2:34 PM, Oded Gabbay wrote: > > diff --git a/drivers/accel/accel_drv.c b/drivers/accel/accel_drv.c > > new file mode 100644 > > index 000000000000..6132765ea054 > > --- /dev/null > > +++ b/drivers/accel/accel_drv.c > > @@ -0,0 +1,112 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > + > > +/* > > + * Copyright 2022 HabanaLabs, Ltd. > > + * All Rights Reserved. > > + * > > + */ > > + > > +#include <linux/module.h> > > Alphebetical order? ok > > > +#include <linux/debugfs.h> > > +#include <linux/device.h> > > + > > +#include <drm/drm_accel.h> > > +#include <drm/drm_ioctl.h> > > +#include <drm/drm_print.h> > > + > > +static struct dentry *accel_debugfs_root; > > +struct class *accel_class; > > Static? > yes, thx.
On Thu, Nov 3, 2022 at 12:58 AM Randy Dunlap <rdunlap@infradead.org> wrote: > > > > On 11/2/22 13:34, Oded Gabbay wrote: > > diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig > > new file mode 100644 > > index 000000000000..282ea24f90c5 > > --- /dev/null > > +++ b/drivers/accel/Kconfig > > @@ -0,0 +1,24 @@ > > +# SPDX-License-Identifier: GPL-2.0-only > > +# > > +# Compute Acceleration device configuration > > +# > > +# This framework provides support for compute acceleration devices, such > > +# as, but not limited to, Machine-Learning and Deep-Learning acceleration > > +# devices > > +# > > +menuconfig ACCEL > > + tristate "Compute Acceleration Framework" > > + depends on DRM > > + help > > + Framework for device drivers of compute acceleration devices, such > > + as, but not limited to, Machine-Learning and Deep-Learning > > + acceleration devices. > > + If you say Y here, you need to select the module that's right for > > + your acceleration device from the list below. > > + This framework is integrated with the DRM subsystem as compute > > + accelerators and GPUs share a lot in common and can use almost the > > + same infrastructure code. > > + Having said that, acceleration devices will have a different > > + major number than GPUs, and will be exposed to user-space using > > + different device files, called accel/accel* (in /dev, sysfs > > + and debugfs) > > Please add a period at the end of the help text. > > + and debugfs). sure, thx. Oded > > -- > ~Randy
On Thu, Nov 3, 2022 at 2:31 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Wed, Nov 02, 2022 at 10:34:03PM +0200, Oded Gabbay wrote: > > --- /dev/null > > +++ b/drivers/accel/Kconfig > > @@ -0,0 +1,24 @@ > > +# SPDX-License-Identifier: GPL-2.0-only > > +# > > +# Compute Acceleration device configuration > > +# > > +# This framework provides support for compute acceleration devices, such > > +# as, but not limited to, Machine-Learning and Deep-Learning acceleration > > +# devices > > +# > > +menuconfig ACCEL > > + tristate "Compute Acceleration Framework" > > + depends on DRM > > + help > > + Framework for device drivers of compute acceleration devices, such > > + as, but not limited to, Machine-Learning and Deep-Learning > > + acceleration devices. > > + If you say Y here, you need to select the module that's right for > > + your acceleration device from the list below. > > + This framework is integrated with the DRM subsystem as compute > > + accelerators and GPUs share a lot in common and can use almost the > > + same infrastructure code. > > + Having said that, acceleration devices will have a different > > + major number than GPUs, and will be exposed to user-space using > > + different device files, called accel/accel* (in /dev, sysfs > > + and debugfs) > > Module name if "M" is chosen? Will add > > > > +static char *accel_devnode(struct device *dev, umode_t *mode) > > +{ > > + return kasprintf(GFP_KERNEL, "accel/%s", dev_name(dev)); > > +} > > + > > +static CLASS_ATTR_STRING(accel_version, 0444, "accel 1.0.0 20221018"); > > What is a version number doing here? > > Please no, I understand that DRM has this, but it really does not make > sense for any in-kernel code. And that's not how sysfs is supposed to > work anyway (if a file is present, the value is documented, if the file > is not present, the value is just not there, userspace has to handle > it all.) Actually I don't know if drm even uses that. I just copied it to be identical to drm's sysfs, but as accel doesn't have any history, I can remove it. > > > + > > +/** > > + * accel_sysfs_init - initialize sysfs helpers > > + * > > + * This is used to create the ACCEL class, which is the implicit parent of any > > + * other top-level ACCEL sysfs objects. > > + * > > + * You must call accel_sysfs_destroy() to release the allocated resources. > > + * > > + * Return: 0 on success, negative error code on failure. > > + */ > > +static int accel_sysfs_init(void) > > +{ > > + int err; > > + > > + accel_class = class_create(THIS_MODULE, "accel"); > > + if (IS_ERR(accel_class)) > > + return PTR_ERR(accel_class); > > + > > + err = class_create_file(accel_class, &class_attr_accel_version.attr); > > Hint, if you ever find yourself adding sysfs files "by hand" like this, > you are doing something wrong. The driver code should create them > automatically for you by setting up default groups, _OR_ as in this > case, you shouldn't be adding the file at all :) ok, removed. > > > +static void accel_sysfs_destroy(void) > > +{ > > + if (IS_ERR_OR_NULL(accel_class)) > > + return; > > + class_remove_file(accel_class, &class_attr_accel_version.attr); > > No need to manually destroy files when you remove a device. But you > will remove this file anyway for the next version of this patch, so it's > not a big deal :) :) > > > + class_destroy(accel_class); > > + accel_class = NULL; > > +} > > + > > +static int accel_stub_open(struct inode *inode, struct file *filp) > > +{ > > + DRM_DEBUG("Operation not supported"); > > ftrace is wonderful, please use that and not hand-rolled custom "I am > here!" type messages like this. I'll just remove it as this line is being replaced anyway in the next patch. Thanks, Oded > > thanks, > > greg k-h
On Thu, Nov 3, 2022 at 3:31 PM Oded Gabbay <ogabbay@kernel.org> wrote: > > On Thu, Nov 3, 2022 at 2:31 AM Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > > > On Wed, Nov 02, 2022 at 10:34:03PM +0200, Oded Gabbay wrote: > > > --- /dev/null > > > +++ b/drivers/accel/Kconfig > > > @@ -0,0 +1,24 @@ > > > +# SPDX-License-Identifier: GPL-2.0-only > > > +# > > > +# Compute Acceleration device configuration > > > +# > > > +# This framework provides support for compute acceleration devices, such > > > +# as, but not limited to, Machine-Learning and Deep-Learning acceleration > > > +# devices > > > +# > > > +menuconfig ACCEL > > > + tristate "Compute Acceleration Framework" > > > + depends on DRM > > > + help > > > + Framework for device drivers of compute acceleration devices, such > > > + as, but not limited to, Machine-Learning and Deep-Learning > > > + acceleration devices. > > > + If you say Y here, you need to select the module that's right for > > > + your acceleration device from the list below. > > > + This framework is integrated with the DRM subsystem as compute > > > + accelerators and GPUs share a lot in common and can use almost the > > > + same infrastructure code. > > > + Having said that, acceleration devices will have a different > > > + major number than GPUs, and will be exposed to user-space using > > > + different device files, called accel/accel* (in /dev, sysfs > > > + and debugfs) > > > > Module name if "M" is chosen? > Will add So, unfortunately, the path of doing accel as a kernel module won't work cleanly (Thanks stanislaw for pointing this out to me). The reason is the circular dependency between drm and accel. drm calls accel exported symbols during init and when devices are registering (all the minor handling), and accel calls drm exported symbols because I don't want to duplicate the entire drm core code. I'll keep this menuconfig to provide the ability to disable this code for people who think it is too "experimental". And in the future, when drivers will join this subsystem, they will need this place for their kconfig. Thanks, Oded
On 11/3/22 13:39, Oded Gabbay wrote: > On Thu, Nov 3, 2022 at 3:31 PM Oded Gabbay <ogabbay@kernel.org> wrote: >> >> On Thu, Nov 3, 2022 at 2:31 AM Greg Kroah-Hartman >> <gregkh@linuxfoundation.org> wrote: >>> >>> On Wed, Nov 02, 2022 at 10:34:03PM +0200, Oded Gabbay wrote: >>>> --- /dev/null >>>> +++ b/drivers/accel/Kconfig >>>> @@ -0,0 +1,24 @@ >>>> +# SPDX-License-Identifier: GPL-2.0-only >>>> +# >>>> +# Compute Acceleration device configuration >>>> +# >>>> +# This framework provides support for compute acceleration devices, such >>>> +# as, but not limited to, Machine-Learning and Deep-Learning acceleration >>>> +# devices >>>> +# >>>> +menuconfig ACCEL >>>> + tristate "Compute Acceleration Framework" >>>> + depends on DRM >>>> + help >>>> + Framework for device drivers of compute acceleration devices, such >>>> + as, but not limited to, Machine-Learning and Deep-Learning >>>> + acceleration devices. >>>> + If you say Y here, you need to select the module that's right for >>>> + your acceleration device from the list below. >>>> + This framework is integrated with the DRM subsystem as compute >>>> + accelerators and GPUs share a lot in common and can use almost the >>>> + same infrastructure code. >>>> + Having said that, acceleration devices will have a different >>>> + major number than GPUs, and will be exposed to user-space using >>>> + different device files, called accel/accel* (in /dev, sysfs >>>> + and debugfs) >>> >>> Module name if "M" is chosen? >> Will add > So, unfortunately, the path of doing accel as a kernel module won't > work cleanly (Thanks stanislaw for pointing this out to me). > The reason is the circular dependency between drm and accel. drm calls > accel exported symbols during init and when devices are registering > (all the minor handling), and accel calls drm exported symbols because > I don't want to duplicate the entire drm core code. But DRM is a tristate symbol, so during drm init (loadable module), couldn't it call accel init code (loadable module)? Or are you saying that they only work together if both of them are builtin? > I'll keep this menuconfig to provide the ability to disable this code > for people who think it is too "experimental". And in the future, when > drivers will join this subsystem, they will need this place for their > kconfig. > > Thanks, > Oded
On Thu, Nov 03, 2022 at 04:01:08PM -0700, Randy Dunlap wrote: > >>> Module name if "M" is chosen? > >> Will add > > So, unfortunately, the path of doing accel as a kernel module won't > > work cleanly (Thanks stanislaw for pointing this out to me). > > The reason is the circular dependency between drm and accel. drm calls > > accel exported symbols during init and when devices are registering > > (all the minor handling), and accel calls drm exported symbols because > > I don't want to duplicate the entire drm core code. > > But DRM is a tristate symbol, so during drm init (loadable module), couldn't > it call accel init code (loadable module)? > > Or are you saying that they only work together if both of them are builtin? Yes, with current state of the patches, we can not build code as modules. There are symbols in accel that are from drm and we use accel symbols in drm. This could be fixed by separating symbols accel requires in separate module i.e. drm_file_helper.ko, however Oded proposed to make CONFIG_ACCEL compile option for DRM and all accel code will be build in drm.ko . I think that ok, since accel is not big. Regards Stanislaw
On Thu, Nov 03, 2022 at 10:39:36PM +0200, Oded Gabbay wrote: > On Thu, Nov 3, 2022 at 3:31 PM Oded Gabbay <ogabbay@kernel.org> wrote: > > > > On Thu, Nov 3, 2022 at 2:31 AM Greg Kroah-Hartman > > <gregkh@linuxfoundation.org> wrote: > > > > > > On Wed, Nov 02, 2022 at 10:34:03PM +0200, Oded Gabbay wrote: > > > > --- /dev/null > > > > +++ b/drivers/accel/Kconfig > > > > @@ -0,0 +1,24 @@ > > > > +# SPDX-License-Identifier: GPL-2.0-only > > > > +# > > > > +# Compute Acceleration device configuration > > > > +# > > > > +# This framework provides support for compute acceleration devices, such > > > > +# as, but not limited to, Machine-Learning and Deep-Learning acceleration > > > > +# devices > > > > +# > > > > +menuconfig ACCEL > > > > + tristate "Compute Acceleration Framework" > > > > + depends on DRM > > > > + help > > > > + Framework for device drivers of compute acceleration devices, such > > > > + as, but not limited to, Machine-Learning and Deep-Learning > > > > + acceleration devices. > > > > + If you say Y here, you need to select the module that's right for > > > > + your acceleration device from the list below. > > > > + This framework is integrated with the DRM subsystem as compute > > > > + accelerators and GPUs share a lot in common and can use almost the > > > > + same infrastructure code. > > > > + Having said that, acceleration devices will have a different > > > > + major number than GPUs, and will be exposed to user-space using > > > > + different device files, called accel/accel* (in /dev, sysfs > > > > + and debugfs) > > > > > > Module name if "M" is chosen? > > Will add > So, unfortunately, the path of doing accel as a kernel module won't > work cleanly (Thanks stanislaw for pointing this out to me). > The reason is the circular dependency between drm and accel. drm calls > accel exported symbols during init and when devices are registering > (all the minor handling), and accel calls drm exported symbols because > I don't want to duplicate the entire drm core code. I really don't think this is the right way to integrate with DRM. Accel should be a layer over top of DRM, not have these wakky co-dependencies. The fact you are running into stuff like this already smells really bad. Jason
On Mon, Nov 7, 2022 at 2:56 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Thu, Nov 03, 2022 at 10:39:36PM +0200, Oded Gabbay wrote: > > On Thu, Nov 3, 2022 at 3:31 PM Oded Gabbay <ogabbay@kernel.org> wrote: > > > > > > On Thu, Nov 3, 2022 at 2:31 AM Greg Kroah-Hartman > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > On Wed, Nov 02, 2022 at 10:34:03PM +0200, Oded Gabbay wrote: > > > > > --- /dev/null > > > > > +++ b/drivers/accel/Kconfig > > > > > @@ -0,0 +1,24 @@ > > > > > +# SPDX-License-Identifier: GPL-2.0-only > > > > > +# > > > > > +# Compute Acceleration device configuration > > > > > +# > > > > > +# This framework provides support for compute acceleration devices, such > > > > > +# as, but not limited to, Machine-Learning and Deep-Learning acceleration > > > > > +# devices > > > > > +# > > > > > +menuconfig ACCEL > > > > > + tristate "Compute Acceleration Framework" > > > > > + depends on DRM > > > > > + help > > > > > + Framework for device drivers of compute acceleration devices, such > > > > > + as, but not limited to, Machine-Learning and Deep-Learning > > > > > + acceleration devices. > > > > > + If you say Y here, you need to select the module that's right for > > > > > + your acceleration device from the list below. > > > > > + This framework is integrated with the DRM subsystem as compute > > > > > + accelerators and GPUs share a lot in common and can use almost the > > > > > + same infrastructure code. > > > > > + Having said that, acceleration devices will have a different > > > > > + major number than GPUs, and will be exposed to user-space using > > > > > + different device files, called accel/accel* (in /dev, sysfs > > > > > + and debugfs) > > > > > > > > Module name if "M" is chosen? > > > Will add > > So, unfortunately, the path of doing accel as a kernel module won't > > work cleanly (Thanks stanislaw for pointing this out to me). > > The reason is the circular dependency between drm and accel. drm calls > > accel exported symbols during init and when devices are registering > > (all the minor handling), and accel calls drm exported symbols because > > I don't want to duplicate the entire drm core code. > > I really don't think this is the right way to integrate with > DRM. Accel should be a layer over top of DRM, not have these wakky > co-dependencies. > > The fact you are running into stuff like this already smells really > bad. > > Jason I don't agree with your statement that it should be "a layer over top of DRM". Anything on top of DRM is a device driver. Accel is not a device driver, it is a new type of drm minor / drm driver. Please look at v3 of the patch-set. There I abandoned the idea of having accel as a separate module and instead it is part of drm.ko, as it should be because it is just a new drm minor. The only alternative imo to that is to abandon the idea of reusing drm, and just make an independant accel core code. Oded
On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote: > I don't agree with your statement that it should be "a layer over top of DRM". > Anything on top of DRM is a device driver. > Accel is not a device driver, it is a new type of drm minor / drm driver. Yeah, I still think this is not the right way, you are getting almost nothing from DRM and making everything more complicated in the process. > The only alternative imo to that is to abandon the idea of reusing > drm, and just make an independant accel core code. Not quite really, layer it properly and librarize parts of DRM into things accel can re-use so they are not intimately tied to the DRM struct device notion. IMHO this is much better, because accel has very little need of DRM to manage a struct device/cdev in the first place. Jason
Hi On Mon, Nov 07, 2022 at 09:10:36AM -0400, Jason Gunthorpe wrote: > On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote: > > I don't agree with your statement that it should be "a layer over top of DRM". > > Anything on top of DRM is a device driver. > > Accel is not a device driver, it is a new type of drm minor / drm driver. > > Yeah, I still think this is not the right way, you are getting almost > nothing from DRM and making everything more complicated in the > process. > > > The only alternative imo to that is to abandon the idea of reusing > > drm, and just make an independant accel core code. > > Not quite really, layer it properly and librarize parts of DRM into > things accel can re-use so they are not intimately tied to the DRM > struct device notion. What do you mean by "struct device notion" ? struct drm_devce ? Regards Stanislaw
On Mon, Nov 7, 2022 at 3:10 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote: > > I don't agree with your statement that it should be "a layer over top of DRM". > > Anything on top of DRM is a device driver. > > Accel is not a device driver, it is a new type of drm minor / drm driver. > > Yeah, I still think this is not the right way, you are getting almost > nothing from DRM and making everything more complicated in the > process. > > > The only alternative imo to that is to abandon the idea of reusing > > drm, and just make an independant accel core code. > > Not quite really, layer it properly and librarize parts of DRM into > things accel can re-use so they are not intimately tied to the DRM > struct device notion. > > IMHO this is much better, because accel has very little need of DRM to > manage a struct device/cdev in the first place. > > Jason I'm not following. How can an accel device be a new type of drm_minor, if it doesn't have access to all its functions and members ? How will accel device leverage, for example, the GEM code without being a drm_minor ? Librarizing parts of DRM sounds nice in theory but the reality is that everything there is interconnected, all the structures are interdependent. I would have to re-write the entire DRM library to make such a thing work. I don't think this was the intention. The current design makes the accel device an integral part of drm, with very minimal code duplication and without re-writing DRM. Oded
On Mon, Nov 07, 2022 at 04:02:01PM +0200, Oded Gabbay wrote: > On Mon, Nov 7, 2022 at 3:10 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > > > On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote: > > > I don't agree with your statement that it should be "a layer over top of DRM". > > > Anything on top of DRM is a device driver. > > > Accel is not a device driver, it is a new type of drm minor / drm driver. > > > > Yeah, I still think this is not the right way, you are getting almost > > nothing from DRM and making everything more complicated in the > > process. > > > > > The only alternative imo to that is to abandon the idea of reusing > > > drm, and just make an independant accel core code. > > > > Not quite really, layer it properly and librarize parts of DRM into > > things accel can re-use so they are not intimately tied to the DRM > > struct device notion. > > > > IMHO this is much better, because accel has very little need of DRM to > > manage a struct device/cdev in the first place. > > > > Jason > I'm not following. How can an accel device be a new type of drm_minor, > if it doesn't have access to all its functions and members ? "drm_minor" is not necessary anymore. Strictly managing minor numbers lost its value years ago when /dev/ was reorganized. Just use dynamic minors fully. > How will accel device leverage, for example, the GEM code without > being a drm_minor ? Split GEM into a library so it doesn't require that. > Librarizing parts of DRM sounds nice in theory but the reality is that > everything there is interconnected, all the structures are > interdependent. Yes, the kernel is full of stuff that needs improving. Let's not take shortcuts. > I would have to re-write the entire DRM library to make such a thing > work. I don't think this was the intention. Not necessarily you, whoever someday needs GEM would have to do some work. > The current design makes the accel device an integral part of drm, > with very minimal code duplication and without re-writing DRM. And it smells bad, you can't even make it into a proper module. Who knows what other problems will come. Jason
On Mon, Nov 7, 2022 at 4:10 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Mon, Nov 07, 2022 at 04:02:01PM +0200, Oded Gabbay wrote: > > On Mon, Nov 7, 2022 at 3:10 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > > > > > On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote: > > > > I don't agree with your statement that it should be "a layer over top of DRM". > > > > Anything on top of DRM is a device driver. > > > > Accel is not a device driver, it is a new type of drm minor / drm driver. > > > > > > Yeah, I still think this is not the right way, you are getting almost > > > nothing from DRM and making everything more complicated in the > > > process. > > > > > > > The only alternative imo to that is to abandon the idea of reusing > > > > drm, and just make an independant accel core code. > > > > > > Not quite really, layer it properly and librarize parts of DRM into > > > things accel can re-use so they are not intimately tied to the DRM > > > struct device notion. > > > > > > IMHO this is much better, because accel has very little need of DRM to > > > manage a struct device/cdev in the first place. > > > > > > Jason > > I'm not following. How can an accel device be a new type of drm_minor, > > if it doesn't have access to all its functions and members ? > > "drm_minor" is not necessary anymore. Strictly managing minor numbers > lost its value years ago when /dev/ was reorganized. Just use > dynamic minors fully. drm minor is not just about handling minor numbers. It contains the entire code to manage devices that register with drm framework (e.g. supply callbacks to file operations), manage their lifecycle, resources (e.g. automatic free of resources on release), sysfs, debugfs, etc. To take all of that out of drm.ko and make it a separate kernel module is a big change, which I don't know if the drm people even want me to do. > > > How will accel device leverage, for example, the GEM code without > > being a drm_minor ? > > Split GEM into a library so it doesn't require that. I don't see the advantage of doing that over defining accel as a new type of drm minor. > > > Librarizing parts of DRM sounds nice in theory but the reality is that > > everything there is interconnected, all the structures are > > interdependent. > > Yes, the kernel is full of stuff that needs improving. Let's not take > shortcuts. It's not about shortcuts. It's about a different way to solve this issue which I don't think is anyway hacky or inappropriate. > > > I would have to re-write the entire DRM library to make such a thing > > work. I don't think this was the intention. > > Not necessarily you, whoever someday needs GEM would have to do some > work. > > > The current design makes the accel device an integral part of drm, > > with very minimal code duplication and without re-writing DRM. > > And it smells bad, you can't even make it into a proper module. Who > knows what other problems will come. I would argue that if accel is part of drm, it doesn't have to be a proper module. I don't see that as a hard requirement. Oded > > Jason
On Mon, Nov 07, 2022 at 05:53:55PM +0200, Oded Gabbay wrote: > On Mon, Nov 7, 2022 at 4:10 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > > > On Mon, Nov 07, 2022 at 04:02:01PM +0200, Oded Gabbay wrote: > > > On Mon, Nov 7, 2022 at 3:10 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > > > > > > > On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote: > > > > > I don't agree with your statement that it should be "a layer over top of DRM". > > > > > Anything on top of DRM is a device driver. > > > > > Accel is not a device driver, it is a new type of drm minor / drm driver. > > > > > > > > Yeah, I still think this is not the right way, you are getting almost > > > > nothing from DRM and making everything more complicated in the > > > > process. > > > > > > > > > The only alternative imo to that is to abandon the idea of reusing > > > > > drm, and just make an independant accel core code. > > > > > > > > Not quite really, layer it properly and librarize parts of DRM into > > > > things accel can re-use so they are not intimately tied to the DRM > > > > struct device notion. > > > > > > > > IMHO this is much better, because accel has very little need of DRM to > > > > manage a struct device/cdev in the first place. > > > > > > > > Jason > > > I'm not following. How can an accel device be a new type of drm_minor, > > > if it doesn't have access to all its functions and members ? > > > > "drm_minor" is not necessary anymore. Strictly managing minor numbers > > lost its value years ago when /dev/ was reorganized. Just use > > dynamic minors fully. > drm minor is not just about handling minor numbers. It contains the > entire code to manage devices that register with drm framework (e.g. > supply callbacks to file operations), manage their lifecycle, > resources (e.g. automatic free of resources on release), sysfs, > debugfs, etc. This is why you are having such troubles, this is already good library code. You don't need DRM to wrapper debugfs APIs, for instance. We have devm, though maybe it is not a good idea, etc Greg already pointed out the sysfs was not being done correctly anyhow. I don't think DRM is improving on these core kernel services. Just use the normal stuff directly. > > > How will accel device leverage, for example, the GEM code without > > > being a drm_minor ? > > > > Split GEM into a library so it doesn't require that. > I don't see the advantage of doing that over defining accel as a new > type of drm minor. Making things into smaller libraries is recognized as a far better kernel approach than trying to make a gigantic wide midlayer that stuffs itself into everything. LWN called this the "midlayer mistake" and wrote about the pitfalls a long time ago: https://lwn.net/Articles/336262/ It is exactly what you are experiencing trying to stretch a midlayer even further out. Jason
On Mon, Nov 7, 2022 at 6:31 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Mon, Nov 07, 2022 at 05:53:55PM +0200, Oded Gabbay wrote: > > On Mon, Nov 7, 2022 at 4:10 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > > > > > On Mon, Nov 07, 2022 at 04:02:01PM +0200, Oded Gabbay wrote: > > > > On Mon, Nov 7, 2022 at 3:10 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > > > > > > > > > On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote: > > > > > > I don't agree with your statement that it should be "a layer over top of DRM". > > > > > > Anything on top of DRM is a device driver. > > > > > > Accel is not a device driver, it is a new type of drm minor / drm driver. > > > > > > > > > > Yeah, I still think this is not the right way, you are getting almost > > > > > nothing from DRM and making everything more complicated in the > > > > > process. > > > > > > > > > > > The only alternative imo to that is to abandon the idea of reusing > > > > > > drm, and just make an independant accel core code. > > > > > > > > > > Not quite really, layer it properly and librarize parts of DRM into > > > > > things accel can re-use so they are not intimately tied to the DRM > > > > > struct device notion. > > > > > > > > > > IMHO this is much better, because accel has very little need of DRM to > > > > > manage a struct device/cdev in the first place. > > > > > > > > > > Jason > > > > I'm not following. How can an accel device be a new type of drm_minor, > > > > if it doesn't have access to all its functions and members ? > > > > > > "drm_minor" is not necessary anymore. Strictly managing minor numbers > > > lost its value years ago when /dev/ was reorganized. Just use > > > dynamic minors fully. > > drm minor is not just about handling minor numbers. It contains the > > entire code to manage devices that register with drm framework (e.g. > > supply callbacks to file operations), manage their lifecycle, > > resources (e.g. automatic free of resources on release), sysfs, > > debugfs, etc. > > This is why you are having such troubles, this is already good library > code. You don't need DRM to wrapper debugfs APIs, for instance. We > have devm, though maybe it is not a good idea, etc > > Greg already pointed out the sysfs was not being done correctly > anyhow. > > I don't think DRM is improving on these core kernel services. Just use > the normal stuff directly. I get what you are saying but if I do all that, then how is this subsystem related to DRM and re-using its code ? (at least at this stage) btw, using the basic stuff directly was my original intention, if you remember the original accel mail thread from July/August. And then we all decided in LPC that we shouldn't do that and instead accel should use the DRM code and just expose a new major+minor for the new drivers. So, something doesn't add up... imo, we need to choose between doing accel either as a small new feature in drm, or as an independent subsystem. I just don't see how I do the former without calling drm code directly and using all its wrappers. > > > > > How will accel device leverage, for example, the GEM code without > > > > being a drm_minor ? > > > > > > Split GEM into a library so it doesn't require that. > > I don't see the advantage of doing that over defining accel as a new > > type of drm minor. > > Making things into smaller libraries is recognized as a far better > kernel approach than trying to make a gigantic wide midlayer that stuffs > itself into everything. LWN called this the "midlayer mistake" and > wrote about the pitfalls a long time ago: > > https://lwn.net/Articles/336262/ > > It is exactly what you are experiencing trying to stretch a > midlayer even further out. > > Jason I'm all for breaking it down to smaller libraries, I completely agree with you. But as you wrote above, why do I even need to use the drm wrappers for the basic stuff ? I'll just call the kernel api directly. And if that's the case then I don't need to rip that code out of the heart of drm and make it a separate module. For GEM (as an example of something less basic) it might be a different story, but we are not there yet. Oded
On Mon, 7 Nov 2022 at 23:10, Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Mon, Nov 07, 2022 at 03:01:08PM +0200, Oded Gabbay wrote: > > I don't agree with your statement that it should be "a layer over top of DRM". > > Anything on top of DRM is a device driver. > > Accel is not a device driver, it is a new type of drm minor / drm driver. > > Yeah, I still think this is not the right way, you are getting almost > nothing from DRM and making everything more complicated in the > process. You are looking at the small picture that is these patches, there are just infrastructure to start the process of merging drivers and reusing other parts of the drm code. We aren't going to ever get anywhere if we start splitting code out of drm just in case, we get this stuff rolling in the tree and if we have a pressing need to refactor it out into separate libraries later then we can address that from a more educated place, instead of just throwing huge refactors around before we have any code to even use them. > > IMHO this is much better, because accel has very little need of DRM to > manage a struct device/cdev in the first place. Right now it doesn't, but when drivers start leveraging the other code it will reuse a lot more code. I'm not going to spend too much time entertaining this, devm vs drmm memory etc are real problems drm has already identified if not completely solved. Dave.
> > > > > > "drm_minor" is not necessary anymore. Strictly managing minor numbers > > > lost its value years ago when /dev/ was reorganized. Just use > > > dynamic minors fully. > > drm minor is not just about handling minor numbers. It contains the > > entire code to manage devices that register with drm framework (e.g. > > supply callbacks to file operations), manage their lifecycle, > > resources (e.g. automatic free of resources on release), sysfs, > > debugfs, etc. > > This is why you are having such troubles, this is already good library > code. You don't need DRM to wrapper debugfs APIs, for instance. We > have devm, though maybe it is not a good idea, etc > > Greg already pointed out the sysfs was not being done correctly > anyhow. > > I don't think DRM is improving on these core kernel services. Just use > the normal stuff directly. At plumbers we decided a direction, I think the direction is good, if there is refactoring to be done, I'd rather it was done in tree with a clear direction. Coming in now and saying we should go down a different path isn't really helpful. We need to get rolling on this, we have drivers that want to land somewhere now, which means we need to just get a framework in place, leveraging drm code is the way to do it. There is no need to an "accel" module, what does that even buy you, the idea is to have an accel subsystem that allows drivers to use drm features, not an accel subsystem that refactors drm features, that would take years. There are already drivers for this subsystem wanting to use GEM, and I don't think holding them up for a year to refactor something that we don't have a clear reason or goal behind refactoring. If there is a problem with the drm subsystem interactions with the kernel standard implementations then let's go fix that and accel will also get fixed, but there's no reason to start going down that road at the same time as introducing accel. Also with the idr/xarray stuff, this isn't the patchset to be introducing a bunch of new and divergent work, if this patchset identifies deficiencies then let's document them and work on them in parallel instead of blocking the initial landing in favour of some future refactors with no in-tree users. Dave.
On Tue, Nov 08, 2022 at 06:33:23AM +1000, Dave Airlie wrote: > At plumbers we decided a direction, I think the direction is good, if > there is refactoring to be done, I'd rather it was done in tree with a > clear direction. > > Coming in now and saying we should go down a different path isn't > really helpful. We need to get rolling on this, we have drivers that > want to land somewhere now, which means we need to just get a > framework in place, leveraging drm code is the way to do it. It is not a different path, at plumbers we decided accel should try to re-use parts of DRM that make sense. I think that should be done by making those DRM parts into libraries that can be re-used, not by trying to twist DRM into something weird. If this thing needs special major/minor numbers, it's own class, its own debufs, sysfs, etc, then it should not be abusing the DRM struct device infrastructure to create that very basic kernel infrastructure. Somehow we ended up with the worst of both worlds. If you want to to be DRM then it should just be DRM and we shouldn't see all this core infrastructue code for debugfs/sysfs/cdevs/etc in thes patches at all. Jason
On Tue, 8 Nov 2022 at 22:28, Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Tue, Nov 08, 2022 at 06:33:23AM +1000, Dave Airlie wrote: > > > At plumbers we decided a direction, I think the direction is good, if > > there is refactoring to be done, I'd rather it was done in tree with a > > clear direction. > > > > Coming in now and saying we should go down a different path isn't > > really helpful. We need to get rolling on this, we have drivers that > > want to land somewhere now, which means we need to just get a > > framework in place, leveraging drm code is the way to do it. > > It is not a different path, at plumbers we decided accel should try to > re-use parts of DRM that make sense. I think that should be done by > making those DRM parts into libraries that can be re-used, not by > trying to twist DRM into something weird. There isn't much twisting here, the thing is this is just the code for sharing, there isn't going to be mountains more. This code gives accel drivers access to a lot of things. Refactoring it out will take a year or so, and I don't think buys us anything. > > If this thing needs special major/minor numbers, it's own class, its > own debufs, sysfs, etc, then it should not be abusing the DRM struct > device infrastructure to create that very basic kernel infrastructure. > > Somehow we ended up with the worst of both worlds. If you want to to > be DRM then it should just be DRM and we shouldn't see all this core > infrastructue code for debugfs/sysfs/cdevs/etc in thes patches at all. We can refactor this out even clearer in the long run if it needs to, but you are overly focusing on the small picture of these patches and not the larger sharing this enables. At this point I'm going to be merging close to what we have here, so we can move forward with getting some drivers lined up. Dave.
diff --git a/Documentation/admin-guide/devices.txt b/Documentation/admin-guide/devices.txt index 9764d6edb189..06c525e01ea5 100644 --- a/Documentation/admin-guide/devices.txt +++ b/Documentation/admin-guide/devices.txt @@ -3080,6 +3080,11 @@ ... 255 = /dev/osd255 256th OSD Device + 261 char Compute Acceleration Devices + 0 = /dev/accel/accel0 First acceleration device + 1 = /dev/accel/accel1 Second acceleration device + ... + 384-511 char RESERVED FOR DYNAMIC ASSIGNMENT Character devices that request a dynamic allocation of major number will take numbers starting from 511 and downward, diff --git a/MAINTAINERS b/MAINTAINERS index ab07cf28844e..9b34f756e343 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6825,6 +6825,14 @@ F: include/drm/drm* F: include/linux/vga* F: include/uapi/drm/drm* +DRM COMPUTE ACCELERATORS DRIVERS AND FRAMEWORK +M: Oded Gabbay <ogabbay@kernel.org> +L: dri-devel@lists.freedesktop.org +S: Maintained +C: irc://irc.oftc.net/dri-devel +T: git https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel.git +F: drivers/accel/ + DRM DRIVERS FOR ALLWINNER A10 M: Maxime Ripard <mripard@kernel.org> M: Chen-Yu Tsai <wens@csie.org> diff --git a/drivers/Kconfig b/drivers/Kconfig index 19ee995bd0ae..968bd0a6fd78 100644 --- a/drivers/Kconfig +++ b/drivers/Kconfig @@ -99,6 +99,8 @@ source "drivers/media/Kconfig" source "drivers/video/Kconfig" +source "drivers/accel/Kconfig" + source "sound/Kconfig" source "drivers/hid/Kconfig" diff --git a/drivers/Makefile b/drivers/Makefile index bdf1c66141c9..658199dcee96 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -62,6 +62,9 @@ obj-y += iommu/ # gpu/ comes after char for AGP vs DRM startup and after iommu obj-y += gpu/ +# accel is part of drm so it must come after gpu +obj-$(CONFIG_ACCEL) += accel/ + obj-$(CONFIG_CONNECTOR) += connector/ # i810fb and intelfb depend on char/agp/ diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig new file mode 100644 index 000000000000..282ea24f90c5 --- /dev/null +++ b/drivers/accel/Kconfig @@ -0,0 +1,24 @@ +# SPDX-License-Identifier: GPL-2.0-only +# +# Compute Acceleration device configuration +# +# This framework provides support for compute acceleration devices, such +# as, but not limited to, Machine-Learning and Deep-Learning acceleration +# devices +# +menuconfig ACCEL + tristate "Compute Acceleration Framework" + depends on DRM + help + Framework for device drivers of compute acceleration devices, such + as, but not limited to, Machine-Learning and Deep-Learning + acceleration devices. + If you say Y here, you need to select the module that's right for + your acceleration device from the list below. + This framework is integrated with the DRM subsystem as compute + accelerators and GPUs share a lot in common and can use almost the + same infrastructure code. + Having said that, acceleration devices will have a different + major number than GPUs, and will be exposed to user-space using + different device files, called accel/accel* (in /dev, sysfs + and debugfs) diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile new file mode 100644 index 000000000000..b5b7d812a8ef --- /dev/null +++ b/drivers/accel/Makefile @@ -0,0 +1,10 @@ +# SPDX-License-Identifier: GPL-2.0 + +# Makefile for the accel framework. This framework provides support for +# compute acceleration devices, such as, but not limited to, Machine-Learning +# and Deep-Learning acceleration devices + +accel-y := \ + accel_drv.o + +obj-$(CONFIG_ACCEL) += accel.o diff --git a/drivers/accel/accel_drv.c b/drivers/accel/accel_drv.c new file mode 100644 index 000000000000..6132765ea054 --- /dev/null +++ b/drivers/accel/accel_drv.c @@ -0,0 +1,112 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Copyright 2022 HabanaLabs, Ltd. + * All Rights Reserved. + * + */ + +#include <linux/module.h> +#include <linux/debugfs.h> +#include <linux/device.h> + +#include <drm/drm_accel.h> +#include <drm/drm_ioctl.h> +#include <drm/drm_print.h> + +static struct dentry *accel_debugfs_root; +struct class *accel_class; + +static char *accel_devnode(struct device *dev, umode_t *mode) +{ + return kasprintf(GFP_KERNEL, "accel/%s", dev_name(dev)); +} + +static CLASS_ATTR_STRING(accel_version, 0444, "accel 1.0.0 20221018"); + +/** + * accel_sysfs_init - initialize sysfs helpers + * + * This is used to create the ACCEL class, which is the implicit parent of any + * other top-level ACCEL sysfs objects. + * + * You must call accel_sysfs_destroy() to release the allocated resources. + * + * Return: 0 on success, negative error code on failure. + */ +static int accel_sysfs_init(void) +{ + int err; + + accel_class = class_create(THIS_MODULE, "accel"); + if (IS_ERR(accel_class)) + return PTR_ERR(accel_class); + + err = class_create_file(accel_class, &class_attr_accel_version.attr); + if (err) { + class_destroy(accel_class); + accel_class = NULL; + return err; + } + + accel_class->devnode = accel_devnode; + + return 0; +} + +/** + * accel_sysfs_destroy - destroys ACCEL class + * + * Destroy the ACCEL device class. + */ +static void accel_sysfs_destroy(void) +{ + if (IS_ERR_OR_NULL(accel_class)) + return; + class_remove_file(accel_class, &class_attr_accel_version.attr); + class_destroy(accel_class); + accel_class = NULL; +} + +static int accel_stub_open(struct inode *inode, struct file *filp) +{ + DRM_DEBUG("Operation not supported"); + + return -EOPNOTSUPP; +} + +static const struct file_operations accel_stub_fops = { + .owner = THIS_MODULE, + .open = accel_stub_open, + .llseek = noop_llseek, +}; + +void accel_core_exit(void) +{ + unregister_chrdev(ACCEL_MAJOR, "accel"); + debugfs_remove(accel_debugfs_root); + accel_sysfs_destroy(); +} + +int __init accel_core_init(void) +{ + int ret; + + ret = accel_sysfs_init(); + if (ret < 0) { + DRM_ERROR("Cannot create ACCEL class: %d\n", ret); + goto error; + } + + accel_debugfs_root = debugfs_create_dir("accel", NULL); + + ret = register_chrdev(ACCEL_MAJOR, "accel", &accel_stub_fops); + if (ret < 0) + goto error; + +error: + /* Any cleanup will be done in drm_core_exit() that will call + * to accel_core_exit() + */ + return ret; +} diff --git a/include/drm/drm_accel.h b/include/drm/drm_accel.h new file mode 100644 index 000000000000..cf43a7b30f34 --- /dev/null +++ b/include/drm/drm_accel.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: GPL-2.0 + * + * Copyright 2022 HabanaLabs, Ltd. + * All Rights Reserved. + * + */ + +#ifndef DRM_ACCEL_H_ +#define DRM_ACCEL_H_ + +#define ACCEL_MAJOR 261 + +#if IS_ENABLED(CONFIG_ACCEL) + +void accel_core_exit(void); +int accel_core_init(void); + +#else + +static inline void accel_core_exit(void) +{ +} + +static inline int __init accel_core_init(void) +{ + return 0; +} + +#endif /* IS_ENABLED(CONFIG_ACCEL) */ + +#endif /* DRM_ACCEL_H_ */