[RFC,v3,0/3] new subsystem for compute accelerator devices

Message ID: 20221106210225.2065371-1-ogabbay@kernel.org
Series: new subsystem for compute accelerator devices

Message

Oded Gabbay Nov. 6, 2022, 9:02 p.m. UTC
  This is the third version of the RFC following the comments given on the
second version, but more importantly, following testing done by the VPU
driver people and myself. We found out that there is a circular dependency
between DRM and accel. DRM calls accel exported symbols during init and when
accel devices are registering (all the minor handling), then accel calls DRM
exported symbols. Therefore, if the two components are compiled as modules,
there is a circular dependency.

To overcome this, I have decided to compile the accel core code as part of
the DRM kernel module (drm.ko). IMO, this is in line with the spirit of the
design choice to have accel reuse the DRM core code and avoid code
duplication.

Another important change is that I have reverted to using IDR instead of
xarray for minor handling. This is because I found that xarray doesn't
cope well with allocating a NULL entry and later exchanging it for a real
pointer: it appears xarray still considers that entry a "zero" entry. This
is unfortunate because DRM works that way (it first allocates a NULL entry
and then replaces it with a real pointer).

I decided to revert to IDR because I don't want to hold up these patches,
as many people are blocked until accel support is merged. The xarray
issue should be fixed in a separate patch, either by fixing the xarray code
or by changing how DRM and accel handle minor IDs.

The patches are in the following repo:
https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel.git/log/?h=accel_v3

As in v2, the HEAD of that branch is a commit adding a dummy driver that
registers an accel device using the new framework. This can serve
as a simple reference. I have tested inserting and removing the dummy driver,
as well as opening and closing /dev/accel/accel0, and nothing broke :)
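A minimal userspace smoke test for the open/close check above might look like the following. It assumes only that nodes are named /dev/accel/accelN, as in the text; the helper names accel_node_path() and accel_open_close() are hypothetical and not part of the series.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: build the node path for accel index @index.
 * Returns 0 on success, -1 if @buf is too small. */
static int accel_node_path(char *buf, size_t len, unsigned int index)
{
	int n = snprintf(buf, len, "/dev/accel/accel%u", index);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: open the node and close it again.
 * Returns 0 on success or a negative errno value on failure. */
static int accel_open_close(unsigned int index)
{
	char path[64];
	int fd;

	if (accel_node_path(path, sizeof(path), index))
		return -ENAMETOOLONG;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -errno;

	return close(fd) ? -errno : 0;
}
```

With the dummy driver loaded, accel_open_close(0) would be expected to return 0; without any accel driver the node does not exist and the call typically returns -ENOENT.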

v1 cover letter:
https://lkml.org/lkml/2022/10/22/544

v2 cover letter:
https://lore.kernel.org/lkml/20221102203405.1797491-1-ogabbay@kernel.org/T/

Thanks,
Oded.

Oded Gabbay (3):
  drivers/accel: define kconfig and register a new major
  accel: add dedicated minor for accelerator devices
  drm: initialize accel framework

 Documentation/admin-guide/devices.txt |   5 +
 MAINTAINERS                           |   8 +
 drivers/Kconfig                       |   2 +
 drivers/accel/Kconfig                 |  24 ++
 drivers/accel/drm_accel.c             | 322 ++++++++++++++++++++++++++
 drivers/gpu/drm/Makefile              |   1 +
 drivers/gpu/drm/drm_drv.c             | 102 +++++---
 drivers/gpu/drm/drm_file.c            |   2 +-
 drivers/gpu/drm/drm_sysfs.c           |  24 +-
 include/drm/drm_accel.h               |  97 ++++++++
 include/drm/drm_device.h              |   3 +
 include/drm/drm_drv.h                 |   8 +
 include/drm/drm_file.h                |  21 +-
 13 files changed, 582 insertions(+), 37 deletions(-)
 create mode 100644 drivers/accel/Kconfig
 create mode 100644 drivers/accel/drm_accel.c
 create mode 100644 include/drm/drm_accel.h

--
2.25.1
  

Comments

Jeffrey Hugo Nov. 7, 2022, 4:07 p.m. UTC | #1
On 11/6/2022 2:02 PM, Oded Gabbay wrote:
> This is the third version of the RFC following the comments given on the
> second version, but more importantly, following testing done by the VPU
> driver people and myself. We found out that there is a circular dependency
> between DRM and accel. DRM calls accel exported symbols during init and when
> accel devices are registering (all the minor handling), then accel calls DRM
> exported symbols. Therefore, if the two components are compiled as modules,
> there is a circular dependency.
> 
> To overcome this, I have decided to compile the accel core code as part of
> the DRM kernel module (drm.ko). IMO, this is in line with the spirit of the
> design choice to have accel reuse the DRM core code and avoid code
> duplication.
> 
> Another important change is that I have reverted to using IDR instead of
> xarray for minor handling. This is because I found that xarray doesn't
> cope well with allocating a NULL entry and later exchanging it for a real
> pointer: it appears xarray still considers that entry a "zero" entry. This
> is unfortunate because DRM works that way (it first allocates a NULL entry
> and then replaces it with a real pointer).
> 
> I decided to revert to IDR because I don't want to hold up these patches,
> as many people are blocked until accel support is merged. The xarray
> issue should be fixed in a separate patch, either by fixing the xarray code
> or by changing how DRM and accel handle minor IDs.

This sounds sane to me.  However, this appears to be something that
Matthew Wilcox should be aware of (added for visibility).  Perhaps he
has a very quick solution.  If not, at least he might have ideas on how
best to address it in the future.

  
Jason Gunthorpe Nov. 7, 2022, 4:20 p.m. UTC | #2
On Sun, Nov 06, 2022 at 11:02:22PM +0200, Oded Gabbay wrote:
> Another important change is that I have reverted to using IDR instead of
> xarray for minor handling. This is because I found that xarray doesn't
> cope well with allocating a NULL entry and later exchanging it for a real
> pointer: it appears xarray still considers that entry a "zero" entry. This
> is unfortunate because DRM works that way (it first allocates a NULL entry
> and then replaces it with a real pointer).

This is what XA_ZERO_ENTRY is for.

Some APIs, like xa_alloc, automatically promote NULL to XA_ZERO_ENTRY;
others require it to be explicit.

If you use the usual pattern of xa_alloc(NULL) followed by xa_store(!NULL),
you should be fine, as far as I know, as long as the xarray was
initialized as an allocating array (XA_FLAGS_ALLOC).
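The reserve-then-publish pattern described above can be illustrated with a small userspace model. This is a hypothetical sketch, not the xarray API: a fixed-size table (MAX_MINORS and the minor_* helpers are made-up names) uses a private marker in place of XA_ZERO_ENTRY, so a reserved index is skipped by later allocations yet still reads back as NULL. In the kernel, the corresponding calls would be xa_alloc() with a NULL entry on an array defined with DEFINE_XARRAY_ALLOC() (which sets XA_FLAGS_ALLOC), followed by xa_store() with the real pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_MINORS 64

/* Private marker playing the role of XA_ZERO_ENTRY:
 * "index reserved, nothing published yet". */
static char reserved_marker;
#define MINOR_RESERVED ((void *)&reserved_marker)

/* Each slot is NULL (free), MINOR_RESERVED, or a published pointer. */
static void *minor_slots[MAX_MINORS];

/* Models xa_alloc(xa, &id, NULL, ...): reserve the lowest free index. */
static int minor_reserve(unsigned int *id)
{
	for (unsigned int i = 0; i < MAX_MINORS; i++) {
		if (minor_slots[i] == NULL) {
			minor_slots[i] = MINOR_RESERVED;
			*id = i;
			return 0;
		}
	}
	return -EBUSY; /* xa_alloc() also returns -EBUSY when the range is full */
}

/* Models xa_store(xa, id, ptr, ...) replacing the reservation. */
static void minor_publish(unsigned int id, void *ptr)
{
	assert(id < MAX_MINORS && minor_slots[id] == MINOR_RESERVED);
	assert(ptr != NULL);
	minor_slots[id] = ptr;
}

/* Models xa_load(): a reserved index reads back as NULL. */
static void *minor_lookup(unsigned int id)
{
	void *p = id < MAX_MINORS ? minor_slots[id] : NULL;

	return p == MINOR_RESERVED ? NULL : p;
}

/* Models xa_erase(): the index becomes free for reuse. */
static void minor_release(unsigned int id)
{
	if (id < MAX_MINORS)
		minor_slots[id] = NULL;
}
```

The property the model captures is that "reserved" and "free" are distinct states: lookups treat both as empty, but allocation must not hand out a reserved index a second time.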

Jason
  
Matthew Wilcox Nov. 7, 2022, 4:21 p.m. UTC | #3
On Mon, Nov 07, 2022 at 09:07:28AM -0700, Jeffrey Hugo wrote:
> > Another important change is that I have reverted to using IDR instead of
> > xarray for minor handling. This is because I found that xarray doesn't
> > cope well with allocating a NULL entry and later exchanging it for a real
> > pointer: it appears xarray still considers that entry a "zero" entry. This
> > is unfortunate because DRM works that way (it first allocates a NULL entry
> > and then replaces it with a real pointer).
> > 
> > I decided to revert to IDR because I don't want to hold up these patches,
> > as many people are blocked until accel support is merged. The xarray
> > issue should be fixed in a separate patch, either by fixing the xarray code
> > or by changing how DRM and accel handle minor IDs.
> 
> This sounds sane to me.  However, this appears to be something that Matthew
> Wilcox should be aware of (added for visibility).  Perhaps he has a very
> quick solution.  If not, at least he might have ideas on how best to address
> it in the future.

Thanks for cc'ing me.  I wasn't aware of this problem because I hadn't
seen Oded's email yet.  The "problem" is simply a misuse of the API.
  
Christopher Friedt Nov. 11, 2022, 10:03 p.m. UTC | #4
Hi Oded,

On Sun, Nov 6, 2022 at 4:03 PM Oded Gabbay <ogabbay@kernel.org> wrote:
> The patches are in the following repo:
> https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel.git/log/?h=accel_v3
>
> As in v2, the HEAD of that branch is a commit adding a dummy driver that
> registers an accel device using the new framework. This can serve
> as a simple reference. I have tested inserting and removing the dummy driver,
> as well as opening and closing /dev/accel/accel0, and nothing broke :)
>
> v1 cover letter:
> https://lkml.org/lkml/2022/10/22/544
>
> v2 cover letter:
> https://lore.kernel.org/lkml/20221102203405.1797491-1-ogabbay@kernel.org/T/

I was in the room at Plumbers when a lot of this was discussed (in
2022 and also 2019), but I haven't really had an opportunity to
provide feedback until now. In general, I think it's great, and thanks
for pushing it forward and gathering feedback.

The v1 cover letter mentioned RAS (reliability, availability,
serviceability) and Dave also mentioned it here [1]. There was a
suggestion to use Netlink. It's an area that I'm fairly interested in
because I do a lot of development on the firmware side (and
specifically, with Zephyr).

Personally, I think Netlink could be one option for serializing and
deserializing RAS information but it would be helpful for that
interface to be somewhat flexible, like a void * and length, and to
provide userspace the capability of querying which RAS formats are
supported.

For example, AntMicro used OpenAMP + rpmsg in their NVMe accelerator,
and gave a talk on it at ZDS and Plumbers this year [2][3].

In Zephyr, the LGPL license for Netlink might be a non-starter
(although I'm no lawyer). However, Zephyr already supports
OpenAMP, protobuf, and JSON, and will soon support Thrift.

Some companies might prefer to use Netlink. Others might prefer to use
ASN.1. Some companies might prefer to use key-value pairs and limit
the parameters and messages to uint32s. Some might handle all of the
RAS details in-kernel, while others might want the kernel to act more
like a transport to firmware.
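To make the most constrained option above concrete (key-value pairs limited to uint32s), here is a hypothetical sketch of how a host might parse such a report once it has arrived as a plain buffer and length. The key names, struct layout, and helpers are illustrative assumptions, not part of the posted series or of any existing accel uAPI.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RAS keys; real ones would be defined per device or vendor. */
enum ras_key {
	RAS_KEY_CORRECTABLE_ERRS = 1,
	RAS_KEY_UNCORRECTABLE_ERRS = 2,
	RAS_KEY_RESET_COUNT = 3,
};

/* One fixed-size record: both key and value are plain uint32s. */
struct ras_kv32 {
	uint32_t key;
	uint32_t val;
};

/* Convert a raw buffer length into a record count, rejecting partial records. */
static int ras_kv32_count(size_t len, size_t *count)
{
	if (len % sizeof(struct ras_kv32))
		return -1;
	*count = len / sizeof(struct ras_kv32);
	return 0;
}

/* Scan @count records for @key. Returns 0 and writes *val if found, else -1. */
static int ras_kv32_find(const struct ras_kv32 *recs, size_t count,
			 uint32_t key, uint32_t *val)
{
	for (size_t i = 0; i < count; i++) {
		if (recs[i].key == key) {
			*val = recs[i].val;
			return 0;
		}
	}
	return -1;
}
```

The appeal of this format is that it needs no serialization library on either side; the cost is that every value must fit in 32 bits and the key space has to be agreed out of band.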

Companies already producing accelerators may have a particular
preference for serialization / deserialization in their own
datacenters.

With that, it would be helpful to be able to query RAS capabilities via ioctl.

#define ACCEL_CAP_RAS_KEY_VAL_32 BIT(0)
#define ACCEL_CAP_RAS_NETLINK BIT(1)
#define ACCEL_CAP_RAS_JSON BIT(2)
#define ACCEL_CAP_RAS_PROTOBUF BIT(3)
#define ACCEL_CAP_RAS_GRPC BIT(4)
#define ACCEL_CAP_RAS_THRIFT BIT(5)
#define ACCEL_CAP_RAS_ASN1 BIT(7)

or something along those lines. Anyway, just putting the idea out there.
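As a hypothetical sketch of how userspace might consume such a capability mask once an ioctl has returned it: the ACCEL_CAP_RAS_* values follow the list above (with JSON listed only once), BIT() is defined locally for a userspace build, and ras_pick_format() is a made-up helper. No such ioctl exists in the posted series, so the ioctl call itself is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Capability bits as proposed above. */
#define ACCEL_CAP_RAS_KEY_VAL_32 BIT(0)
#define ACCEL_CAP_RAS_NETLINK    BIT(1)
#define ACCEL_CAP_RAS_JSON       BIT(2)
#define ACCEL_CAP_RAS_PROTOBUF   BIT(3)
#define ACCEL_CAP_RAS_GRPC       BIT(4)
#define ACCEL_CAP_RAS_THRIFT     BIT(5)
#define ACCEL_CAP_RAS_ASN1       BIT(7)

/*
 * Return the first format in the caller's preference list (each entry a
 * single ACCEL_CAP_RAS_* bit) that the device advertises in @dev_caps,
 * or 0 if none of them is supported.
 */
static uint32_t ras_pick_format(uint32_t dev_caps, const uint32_t *prefs,
				size_t nprefs)
{
	for (size_t i = 0; i < nprefs; i++) {
		if (dev_caps & prefs[i])
			return prefs[i];
	}
	return 0;
}
```

Keeping the choice in userspace lets each vendor or datacenter apply its own preference order against whatever the device reports.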

I'm sure there are a lot of opinions on this topic and that there are
a lot of implications of using this or that serialization format.
Obviously there can be security implications as well.

Apologies if I've already missed some of this discussion.

Cheers,

C

[1] https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html
[2] https://zephyr2022.sched.com/event/10CFD/open-source-nvme-ai-accelerator-platform-with-zephyr-karol-gugala-antmicro
[3] https://lpc.events/event/16/contributions/1245/
  
Oded Gabbay Nov. 13, 2022, 3:05 p.m. UTC | #5
On Sat, Nov 12, 2022 at 12:04 AM Christopher Friedt
<chrisfriedt@gmail.com> wrote:
> The v1 cover letter mentioned RAS (reliability, availability,
> serviceability) and Dave also mentioned it here [1]. There was a
> suggestion to use Netlink. It's an area that I'm fairly interested in
> because I do a lot of development on the firmware side (and
> specifically, with Zephyr).
>
> Personally, I think Netlink could be one option for serializing and
> deserializing RAS information but it would be helpful for that
> interface to be somewhat flexible, like a void * and length, and to
> provide userspace the capability of querying which RAS formats are
> supported.

Hi Christopher,
Thanks for all this information.
At this stage, I'm mainly trying to gather information on the current
status of RAS in the OCP (Open Compute Project) and in the Linux kernel,
so your email was on point :)
It seems to me that this topic is broader than just accelerators or
GPUs, because other device types (e.g. NICs) also implement some
kind of RAS.
My gut feeling is that the end solution would be some kind of generic
kernel driver/framework that will expose RAS to userspace for any
device type, but it's too early to tell.
I'll update once I have the full picture.

Thanks,
Oded