[v4,0/11] Introduce a unified API for SCMI Server testing

Message ID	20221019204626.3813043-1-cristian.marussi@arm.com
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; From: Cristian Marussi <cristian.marussi@arm.com> To: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org Cc: sudeep.holla@arm.com, james.quinlan@broadcom.com, Jonathan.Cameron@Huawei.com, f.fainelli@gmail.com, etienne.carriere@linaro.org, vincent.guittot@linaro.org, souvik.chakravarty@arm.com, wleavitt@marvell.com, peter.hilber@opensynergy.com, nicola.mazzucato@arm.com, tarek.el-sherbiny@arm.com, quic_kshivnan@quicinc.com, cristian.marussi@arm.com Subject: [PATCH v4 0/11] Introduce a unified API for SCMI Server testing Date: Wed, 19 Oct 2022 21:46:15 +0100 Message-Id: <20221019204626.3813043-1-cristian.marussi@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Introduce a unified API for SCMI Server testing \| [v4,0/11] Introduce a unified API for SCMI Server testing [v4,01/11] firmware: arm_scmi: Refactor xfer in-flight registration routines [v4,02/11] firmware: arm_scmi: Simplify chan_available transport operation [v4,03/11] firmware: arm_scmi: Use dedicated devices to initialize channels [v4,04/11] firmware: arm_scmi: Refactor polling helpers [v4,05/11] firmware: arm_scmi: Refactor scmi_wait_for_message_response [v4,06/11] firmware: arm_scmi: Add xfer raw helpers [v4,07/11] firmware: arm_scmi: Move errors defs and code to common.h [v4,08/11] firmware: arm_scmi: Add raw transmission support [v4,09/11] firmware: arm_scmi: Add debugfs ABI documentation for Raw mode [v4,10/11] firmware: arm_scmi: Reject SCMI drivers while in Raw mode [v4,11/11] firmware: arm_scmi: Call Raw mode hooks from the core stack

Message ID

20221019204626.3813043-1-cristian.marussi@arm.com

Headers

Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
From: Cristian Marussi <cristian.marussi@arm.com>
To: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Cc: sudeep.holla@arm.com, james.quinlan@broadcom.com,
        Jonathan.Cameron@Huawei.com, f.fainelli@gmail.com,
        etienne.carriere@linaro.org, vincent.guittot@linaro.org,
        souvik.chakravarty@arm.com, wleavitt@marvell.com,
        peter.hilber@opensynergy.com, nicola.mazzucato@arm.com,
        tarek.el-sherbiny@arm.com, quic_kshivnan@quicinc.com,
        cristian.marussi@arm.com
Subject: [PATCH v4 0/11] Introduce a unified API for SCMI Server testing
Date: Wed, 19 Oct 2022 21:46:15 +0100
Message-Id: <20221019204626.3813043-1-cristian.marussi@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

Introduce a unified API for SCMI Server testing |

Message

Cristian Marussi Oct. 19, 2022, 8:46 p.m. UTC

  Hi all,

This series aims to introduce a new SCMI unified userspace interface meant
to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
from the perspective of the OSPM agent (non-secure world only ...)

It is proposed as a testing/development facility, it is NOT meant to be a
feature to use in production, but only enabled in Kconfig for test
deployments.

Currently an SCMI Compliance Suite like the one at [1] can only work by
injecting SCMI messages at the SCMI transport layer using the mailbox test
driver (CONFIG_MAILBOX_TEST) via its few debugfs entries and looking at
the related replies from the SCMI backend Server.

This approach has a few drawbacks:

- the SCMI Server under test MUST be reachable through a mailbox based
  SCMI transport: any other SCMI Server placement is not possible (like in
  a VM reachable via SCMI Virtio). In order to cover other placements in
  the current scenario we should write some sort of test driver for each
  and every existent SCMI transport and for any future additional transport
  ...this clearly does not scale.

- even in the mailbox case the userspace Compliance suite cannot simply
  send and receive bare SCMI messages BUT it has to properly lay them out
  into the shared memory exposed by the mailbox test driver as expected by
  the transport definitions. In other words such a userspace test
  application has to, not only use a proper transport driver for the system
  at hand, but it also has to have a comprehensive knowledge of the
  internals of the underlying transport in order to operate.

- last but not least, the system under test has to be specifically
  configured and built, in terms of Kconfig and DT, to perform such kind of
  testing, it cannot be used for anything else, which is unfortunate for
  CI/CD deployments.

This series introduces a new SCMI Raw mode support feature that, when
configured and enabled exposes a new interface in debugfs through which:

- a userspace application can inject bare SCMI binary messages into the
  SCMI core stack; such messages will be routed by the SCMI regular kernel
  stack to the backend Server using the currently configured transport
  transparently: in other words you can test the SCMI server, no matter
  where it is placed, as long as it is reachable from the currently
  configured SCMI stack.
  Same goes the other way around on the reading path: any SCMI server reply
  can be read as a bare SCMI binary message from the same debugfs path.

- as a direct consequence of this way of injecting bare messages in the
  middle of the SCMI stack (instead of beneath it at the transport layer)
  the user application has to handle only bare SCMI messages without having
  to worry about the specific underlying transport internals that will be
  taken care of by the SCMI core stack itself using its own machinery,
  without duplicating such logic.

- a system under test, once configured with SCMI Raw support enabled in
  Kconfig, can be booted without any particular DT change.

In V2 the runtime enable/disable switching capability has been removed
(for now) since still not deemed to be stable/reliable enough: as a
consequence when SCMI Raw support is compiled in, the regular SCMI stack
drivers are now inhibited permanently for that Kernel.

In V4 it has been added the support for transports lacking a completion_irq
or configured forcibly in polled mode.

A quick and trivial example from the shell...reading from a sensor
injecting a properly crafted packet in raw mode:

	# INJECT THE SENSOR_READING MESSAGE FOR SENSOR ID=1 (binary little endian)
	root@deb-buster-arm64:~# echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00 > /sys/kernel/debug/scmi_raw/message

	# READING BACK THE REPLY...
	root@deb-buster-arm64:~# cat /sys/kernel/debug/scmi_raw/message | od --endian=little -t x4
	0000000 00005406 00000000 00000335 00000000
	0000020

while doing that, since Raw mode makes (partial) use of the regular SCMI
stack, you can observe the messages going through the SCMI stack with the
usual traces:

              bash-329     [000] ..... 14183.446808: scmi_msg_dump: pt=15 t=CMND msg_id=06 seq=0000 s=0 pyld=0100000000000000
   irq/35-mhu_db_l-81      [000] ..... 14183.447809: scmi_msg_dump: pt=15 t=RESP msg_id=06 seq=0000 s=0 pyld=3503000000000000


..trying to read in async when the backend server does NOT supports asyncs:

	# AN ASYNC SENSOR READING REQUEST...
	root@deb-buster-arm64:~# echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x01\\x00\\x00\\x00 > /sys/kernel/debug/scmi_raw/message_async

              bash-329     [000] ..... 16415.938739: scmi_msg_dump: pt=15 t=CMND msg_id=06 seq=0000 s=0 pyld=0100000001000000
   irq/35-mhu_db_l-81      [000] ..... 16415.944129: scmi_msg_dump: pt=15 t=RESP msg_id=06 seq=0000 s=-1 pyld=

	# RETURNS A STATUS -1 FROM THE SERVER NOT SUPPORTING IT
	root@deb-buster-arm64:~# cat /sys/kernel/debug/scmi_raw/message | od --endian=little -t x4
	0000000 00005406 ffffffff
	0000010

Note that this was on a JUNO, BUT exactly the same steps can be used to
reach an SCMI Server living on a VM reachable via virtio as long as the
system under test if properly configured to work with a virtio transport.

In a nutshell the exposed API is as follows:

/sys/kernel/debug/scmi_raw/
├── errors
├── message
├── message_async
├── notification
├── reset
├── transport_max_msg_size
├── transport_rx_timeout_ms
└── transport_tx_max_msg

where:

 - message*: used to send sync/async commands and read back immediate and
   delayed responses (if any)
 - errors: used to report timeout and unexpected replies
 - reset: used to reset the SCMI Raw stack, flushing all queues from
   received messages still pending to be read out (useful to be sure to
   cleanup between test suite runs...)
 - notification: used to read any notification being spit by the system
   (if previously enabled by the user app)
 - transport*: a bunch of configuration useful to setup the user
   application expectations in terms of timeouts and message
   characteristics.

Each write corresponds to one command request and the replies or delayed
response are read back one message at time (receiving an EOF at each
message boundary).

The user application running the test is in charge of handling timeouts
and properly choosing SCMI sequence numbers for the outgoing requests: note
that the same fixed number can be re-used (...though discouraged...) as
long as the suite does NOT expect to send multiple in-flight commands
concurrently.

Since the SCMI core regular stack is partially used to deliver and collect
the messages, late replies after timeouts and any other sort of unexpected
message sent by the SCMI server platform back can be identified by the SCMI
core as usual and it will be reported under /errors for later analysis.
(a userspace test-app will have anyway properly detected the timeout on
 /message* ...)

All of the above has been roughly tested against a standard JUNO SCP SCMI
Server (mailbox trans) and an emulated SCMI Server living in a VM (virtio
trans) using a custom experimental version of the scmi-tests Compliance
suite patched to support Raw mode and posted at [2]. (still in development
...merge requests are in progress...for now it is just a mean for me to
test the testing API ... O_o)

The series is based on v6.1-rc1.

Having said that (in such a concise and brief way :P) ...
	
...any feedback/comments are welcome !

Thanks,
Cristian

---
V3 --> v4
- rebased on v6.1-rc1
- addedd missing support for 'polled' transports and transport lacking a
  completion_irq (like smc/optee)
- removed a few inlines
- refactored SCMI Raw RX patch to make use more extensively of the regular
  non-Raw RX path
- fix handling of O_NONBLOCK raw_mode read requests

v2 --> v3
- fixed some sparse warning on LE and __poll_t
- reworked and simplified deferred worker in charge of xfer delayed waiting
- allow for injection of DT-unknown protocols messages when in Raw mode
  (needed for any kind of fuzzing...)

v1 --> v2
- added comments and debugfs docs
- added dedicated transport devices for channels initialization
- better channels handling in Raw mode
- removed runtime enable, moved to static compile time exclusion
  of SCMI regular stack

[1]: https://gitlab.arm.com/tests/scmi-tests
[2]: https://gitlab.arm.com/tests/scmi-tests/-/commits/raw_mode_support_devel/


Cristian Marussi (11):
  firmware: arm_scmi: Refactor xfer in-flight registration routines
  firmware: arm_scmi: Simplify chan_available transport operation
  firmware: arm_scmi: Use dedicated devices to initialize channels
  firmware: arm_scmi: Refactor polling helpers
  firmware: arm_scmi: Refactor scmi_wait_for_message_response
  firmware: arm_scmi: Add xfer raw helpers
  firmware: arm_scmi: Move errors defs and code to common.h
  firmware: arm_scmi: Add raw transmission support
  firmware: arm_scmi: Add debugfs ABI documentation for Raw mode
  firmware: arm_scmi: Reject SCMI drivers while in Raw mode
  firmware: arm_scmi: Call Raw mode hooks from the core stack

 Documentation/ABI/testing/debugfs-scmi-raw |   88 ++
 drivers/firmware/arm_scmi/Kconfig          |   13 +
 drivers/firmware/arm_scmi/Makefile         |    1 +
 drivers/firmware/arm_scmi/common.h         |   72 +-
 drivers/firmware/arm_scmi/driver.c         |  521 +++++---
 drivers/firmware/arm_scmi/mailbox.c        |    4 +-
 drivers/firmware/arm_scmi/optee.c          |    4 +-
 drivers/firmware/arm_scmi/raw_mode.c       | 1244 ++++++++++++++++++++
 drivers/firmware/arm_scmi/raw_mode.h       |   29 +
 drivers/firmware/arm_scmi/smc.c            |    4 +-
 drivers/firmware/arm_scmi/virtio.c         |    2 +-
 11 files changed, 1827 insertions(+), 155 deletions(-)
 create mode 100644 Documentation/ABI/testing/debugfs-scmi-raw
 create mode 100644 drivers/firmware/arm_scmi/raw_mode.c
 create mode 100644 drivers/firmware/arm_scmi/raw_mode.h

Comments

Vincent Guittot Oct. 28, 2022, 2:40 p.m. UTC | #1

On Wed, 19 Oct 2022 at 22:46, Cristian Marussi <cristian.marussi@arm.com> wrote:
>
> Hi all,
>
> This series aims to introduce a new SCMI unified userspace interface meant
> to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
> from the perspective of the OSPM agent (non-secure world only ...)
>
> It is proposed as a testing/development facility, it is NOT meant to be a
> feature to use in production, but only enabled in Kconfig for test
> deployments.
>
> Currently an SCMI Compliance Suite like the one at [1] can only work by
> injecting SCMI messages at the SCMI transport layer using the mailbox test
> driver (CONFIG_MAILBOX_TEST) via its few debugfs entries and looking at
> the related replies from the SCMI backend Server.
>
> This approach has a few drawbacks:
>
> - the SCMI Server under test MUST be reachable through a mailbox based
>   SCMI transport: any other SCMI Server placement is not possible (like in
>   a VM reachable via SCMI Virtio). In order to cover other placements in
>   the current scenario we should write some sort of test driver for each
>   and every existent SCMI transport and for any future additional transport
>   ...this clearly does not scale.
>
> - even in the mailbox case the userspace Compliance suite cannot simply
>   send and receive bare SCMI messages BUT it has to properly lay them out
>   into the shared memory exposed by the mailbox test driver as expected by
>   the transport definitions. In other words such a userspace test
>   application has to, not only use a proper transport driver for the system
>   at hand, but it also has to have a comprehensive knowledge of the
>   internals of the underlying transport in order to operate.
>
> - last but not least, the system under test has to be specifically
>   configured and built, in terms of Kconfig and DT, to perform such kind of
>   testing, it cannot be used for anything else, which is unfortunate for
>   CI/CD deployments.
>
> This series introduces a new SCMI Raw mode support feature that, when
> configured and enabled exposes a new interface in debugfs through which:
>
> - a userspace application can inject bare SCMI binary messages into the
>   SCMI core stack; such messages will be routed by the SCMI regular kernel
>   stack to the backend Server using the currently configured transport
>   transparently: in other words you can test the SCMI server, no matter
>   where it is placed, as long as it is reachable from the currently
>   configured SCMI stack.
>   Same goes the other way around on the reading path: any SCMI server reply
>   can be read as a bare SCMI binary message from the same debugfs path.
>
> - as a direct consequence of this way of injecting bare messages in the
>   middle of the SCMI stack (instead of beneath it at the transport layer)
>   the user application has to handle only bare SCMI messages without having
>   to worry about the specific underlying transport internals that will be
>   taken care of by the SCMI core stack itself using its own machinery,
>   without duplicating such logic.
>
> - a system under test, once configured with SCMI Raw support enabled in
>   Kconfig, can be booted without any particular DT change.
>
> In V2 the runtime enable/disable switching capability has been removed
> (for now) since still not deemed to be stable/reliable enough: as a
> consequence when SCMI Raw support is compiled in, the regular SCMI stack
> drivers are now inhibited permanently for that Kernel.
>
> In V4 it has been added the support for transports lacking a completion_irq
> or configured forcibly in polled mode.
>
> A quick and trivial example from the shell...reading from a sensor
> injecting a properly crafted packet in raw mode:
>
>         # INJECT THE SENSOR_READING MESSAGE FOR SENSOR ID=1 (binary little endian)
>         root@deb-buster-arm64:~# echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00 > /sys/kernel/debug/scmi_raw/message
>
>         # READING BACK THE REPLY...
>         root@deb-buster-arm64:~# cat /sys/kernel/debug/scmi_raw/message | od --endian=little -t x4
>         0000000 00005406 00000000 00000335 00000000
>         0000020
>
> while doing that, since Raw mode makes (partial) use of the regular SCMI
> stack, you can observe the messages going through the SCMI stack with the
> usual traces:
>
>               bash-329     [000] ..... 14183.446808: scmi_msg_dump: pt=15 t=CMND msg_id=06 seq=0000 s=0 pyld=0100000000000000
>    irq/35-mhu_db_l-81      [000] ..... 14183.447809: scmi_msg_dump: pt=15 t=RESP msg_id=06 seq=0000 s=0 pyld=3503000000000000
>
>
> ..trying to read in async when the backend server does NOT supports asyncs:
>
>         # AN ASYNC SENSOR READING REQUEST...
>         root@deb-buster-arm64:~# echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x01\\x00\\x00\\x00 > /sys/kernel/debug/scmi_raw/message_async
>
>               bash-329     [000] ..... 16415.938739: scmi_msg_dump: pt=15 t=CMND msg_id=06 seq=0000 s=0 pyld=0100000001000000
>    irq/35-mhu_db_l-81      [000] ..... 16415.944129: scmi_msg_dump: pt=15 t=RESP msg_id=06 seq=0000 s=-1 pyld=
>
>         # RETURNS A STATUS -1 FROM THE SERVER NOT SUPPORTING IT
>         root@deb-buster-arm64:~# cat /sys/kernel/debug/scmi_raw/message | od --endian=little -t x4
>         0000000 00005406 ffffffff
>         0000010
>
> Note that this was on a JUNO, BUT exactly the same steps can be used to
> reach an SCMI Server living on a VM reachable via virtio as long as the
> system under test if properly configured to work with a virtio transport.
>
> In a nutshell the exposed API is as follows:
>
> /sys/kernel/debug/scmi_raw/
> ├── errors
> ├── message
> ├── message_async
> ├── notification
> ├── reset
> ├── transport_max_msg_size
> ├── transport_rx_timeout_ms
> └── transport_tx_max_msg
>
> where:
>
>  - message*: used to send sync/async commands and read back immediate and
>    delayed responses (if any)
>  - errors: used to report timeout and unexpected replies
>  - reset: used to reset the SCMI Raw stack, flushing all queues from
>    received messages still pending to be read out (useful to be sure to
>    cleanup between test suite runs...)
>  - notification: used to read any notification being spit by the system
>    (if previously enabled by the user app)
>  - transport*: a bunch of configuration useful to setup the user
>    application expectations in terms of timeouts and message
>    characteristics.
>
> Each write corresponds to one command request and the replies or delayed
> response are read back one message at time (receiving an EOF at each
> message boundary).
>
> The user application running the test is in charge of handling timeouts
> and properly choosing SCMI sequence numbers for the outgoing requests: note
> that the same fixed number can be re-used (...though discouraged...) as
> long as the suite does NOT expect to send multiple in-flight commands
> concurrently.
>
> Since the SCMI core regular stack is partially used to deliver and collect
> the messages, late replies after timeouts and any other sort of unexpected
> message sent by the SCMI server platform back can be identified by the SCMI
> core as usual and it will be reported under /errors for later analysis.
> (a userspace test-app will have anyway properly detected the timeout on
>  /message* ...)
>
> All of the above has been roughly tested against a standard JUNO SCP SCMI
> Server (mailbox trans) and an emulated SCMI Server living in a VM (virtio
> trans) using a custom experimental version of the scmi-tests Compliance
> suite patched to support Raw mode and posted at [2]. (still in development
> ...merge requests are in progress...for now it is just a mean for me to
> test the testing API ... O_o)
>
> The series is based on v6.1-rc1.
>
> Having said that (in such a concise and brief way :P) ...
>
> ...any feedback/comments are welcome !

Hi Cristian,

I have tested your series with an optee message transport layer and
been able to send raw messages to the scmi server PTA

FWIW

Tested-by: Vincent Guittot <vincent.guittot@linaro.org>

>
> Thanks,
> Cristian
>
> ---
> V3 --> v4
> - rebased on v6.1-rc1
> - addedd missing support for 'polled' transports and transport lacking a
>   completion_irq (like smc/optee)
> - removed a few inlines
> - refactored SCMI Raw RX patch to make use more extensively of the regular
>   non-Raw RX path
> - fix handling of O_NONBLOCK raw_mode read requests
>
> v2 --> v3
> - fixed some sparse warning on LE and __poll_t
> - reworked and simplified deferred worker in charge of xfer delayed waiting
> - allow for injection of DT-unknown protocols messages when in Raw mode
>   (needed for any kind of fuzzing...)
>
> v1 --> v2
> - added comments and debugfs docs
> - added dedicated transport devices for channels initialization
> - better channels handling in Raw mode
> - removed runtime enable, moved to static compile time exclusion
>   of SCMI regular stack
>
> [1]: https://gitlab.arm.com/tests/scmi-tests
> [2]: https://gitlab.arm.com/tests/scmi-tests/-/commits/raw_mode_support_devel/
>
>
> Cristian Marussi (11):
>   firmware: arm_scmi: Refactor xfer in-flight registration routines
>   firmware: arm_scmi: Simplify chan_available transport operation
>   firmware: arm_scmi: Use dedicated devices to initialize channels
>   firmware: arm_scmi: Refactor polling helpers
>   firmware: arm_scmi: Refactor scmi_wait_for_message_response
>   firmware: arm_scmi: Add xfer raw helpers
>   firmware: arm_scmi: Move errors defs and code to common.h
>   firmware: arm_scmi: Add raw transmission support
>   firmware: arm_scmi: Add debugfs ABI documentation for Raw mode
>   firmware: arm_scmi: Reject SCMI drivers while in Raw mode
>   firmware: arm_scmi: Call Raw mode hooks from the core stack
>
>  Documentation/ABI/testing/debugfs-scmi-raw |   88 ++
>  drivers/firmware/arm_scmi/Kconfig          |   13 +
>  drivers/firmware/arm_scmi/Makefile         |    1 +
>  drivers/firmware/arm_scmi/common.h         |   72 +-
>  drivers/firmware/arm_scmi/driver.c         |  521 +++++---
>  drivers/firmware/arm_scmi/mailbox.c        |    4 +-
>  drivers/firmware/arm_scmi/optee.c          |    4 +-
>  drivers/firmware/arm_scmi/raw_mode.c       | 1244 ++++++++++++++++++++
>  drivers/firmware/arm_scmi/raw_mode.h       |   29 +
>  drivers/firmware/arm_scmi/smc.c            |    4 +-
>  drivers/firmware/arm_scmi/virtio.c         |    2 +-
>  11 files changed, 1827 insertions(+), 155 deletions(-)
>  create mode 100644 Documentation/ABI/testing/debugfs-scmi-raw
>  create mode 100644 drivers/firmware/arm_scmi/raw_mode.c
>  create mode 100644 drivers/firmware/arm_scmi/raw_mode.h
>
> --
> 2.34.1
>

Florian Fainelli Oct. 28, 2022, 2:44 p.m. UTC | #2

Hi Christian,

On 10/19/2022 1:46 PM, Cristian Marussi wrote:
> Hi all,
> 
> This series aims to introduce a new SCMI unified userspace interface meant
> to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
> from the perspective of the OSPM agent (non-secure world only ...)
> 
> It is proposed as a testing/development facility, it is NOT meant to be a
> feature to use in production, but only enabled in Kconfig for test
> deployments.
> 
> Currently an SCMI Compliance Suite like the one at [1] can only work by
> injecting SCMI messages at the SCMI transport layer using the mailbox test
> driver (CONFIG_MAILBOX_TEST) via its few debugfs entries and looking at
> the related replies from the SCMI backend Server.

I plan on giving this a try on our systems later today and will let you 
know the outcome. This is very useful for making sure the SCMI 
implementation is both correct and properly hardened.

Cristian Marussi Oct. 28, 2022, 3:03 p.m. UTC | #3

On Fri, Oct 28, 2022 at 04:40:02PM +0200, Vincent Guittot wrote:
> On Wed, 19 Oct 2022 at 22:46, Cristian Marussi <cristian.marussi@arm.com> wrote:
> >
> > Hi all,
> >

Hi Vincent,

> > This series aims to introduce a new SCMI unified userspace interface meant
> > to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
> > from the perspective of the OSPM agent (non-secure world only ...)
> >
> > It is proposed as a testing/development facility, it is NOT meant to be a
> > feature to use in production, but only enabled in Kconfig for test
> > deployments.
> >
> > Currently an SCMI Compliance Suite like the one at [1] can only work by
> > injecting SCMI messages at the SCMI transport layer using the mailbox test
> > driver (CONFIG_MAILBOX_TEST) via its few debugfs entries and looking at
> > the related replies from the SCMI backend Server.
> >
> > This approach has a few drawbacks:
> >
> > - the SCMI Server under test MUST be reachable through a mailbox based
> >   SCMI transport: any other SCMI Server placement is not possible (like in
> >   a VM reachable via SCMI Virtio). In order to cover other placements in
> >   the current scenario we should write some sort of test driver for each
> >   and every existent SCMI transport and for any future additional transport
> >   ...this clearly does not scale.
> >
> > - even in the mailbox case the userspace Compliance suite cannot simply
> >   send and receive bare SCMI messages BUT it has to properly lay them out
> >   into the shared memory exposed by the mailbox test driver as expected by
> >   the transport definitions. In other words such a userspace test
> >   application has to, not only use a proper transport driver for the system
> >   at hand, but it also has to have a comprehensive knowledge of the
> >   internals of the underlying transport in order to operate.
> >
> > - last but not least, the system under test has to be specifically
> >   configured and built, in terms of Kconfig and DT, to perform such kind of
> >   testing, it cannot be used for anything else, which is unfortunate for
> >   CI/CD deployments.
> >
> > This series introduces a new SCMI Raw mode support feature that, when
> > configured and enabled exposes a new interface in debugfs through which:
> >
> > - a userspace application can inject bare SCMI binary messages into the
> >   SCMI core stack; such messages will be routed by the SCMI regular kernel
> >   stack to the backend Server using the currently configured transport
> >   transparently: in other words you can test the SCMI server, no matter
> >   where it is placed, as long as it is reachable from the currently
> >   configured SCMI stack.
> >   Same goes the other way around on the reading path: any SCMI server reply
> >   can be read as a bare SCMI binary message from the same debugfs path.
> >
> > - as a direct consequence of this way of injecting bare messages in the
> >   middle of the SCMI stack (instead of beneath it at the transport layer)
> >   the user application has to handle only bare SCMI messages without having
> >   to worry about the specific underlying transport internals that will be
> >   taken care of by the SCMI core stack itself using its own machinery,
> >   without duplicating such logic.
> >
> > - a system under test, once configured with SCMI Raw support enabled in
> >   Kconfig, can be booted without any particular DT change.
> >
> > In V2 the runtime enable/disable switching capability has been removed
> > (for now) since still not deemed to be stable/reliable enough: as a
> > consequence when SCMI Raw support is compiled in, the regular SCMI stack
> > drivers are now inhibited permanently for that Kernel.
> >
> > In V4 it has been added the support for transports lacking a completion_irq
> > or configured forcibly in polled mode.
> >
> > A quick and trivial example from the shell...reading from a sensor
> > injecting a properly crafted packet in raw mode:
> >
> >         # INJECT THE SENSOR_READING MESSAGE FOR SENSOR ID=1 (binary little endian)
> >         root@deb-buster-arm64:~# echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00 > /sys/kernel/debug/scmi_raw/message
> >
> >         # READING BACK THE REPLY...
> >         root@deb-buster-arm64:~# cat /sys/kernel/debug/scmi_raw/message | od --endian=little -t x4
> >         0000000 00005406 00000000 00000335 00000000
> >         0000020
> >
> > while doing that, since Raw mode makes (partial) use of the regular SCMI
> > stack, you can observe the messages going through the SCMI stack with the
> > usual traces:
> >
> >               bash-329     [000] ..... 14183.446808: scmi_msg_dump: pt=15 t=CMND msg_id=06 seq=0000 s=0 pyld=0100000000000000
> >    irq/35-mhu_db_l-81      [000] ..... 14183.447809: scmi_msg_dump: pt=15 t=RESP msg_id=06 seq=0000 s=0 pyld=3503000000000000
> >
> >
> > ..trying to read in async when the backend server does NOT supports asyncs:
> >
> >         # AN ASYNC SENSOR READING REQUEST...
> >         root@deb-buster-arm64:~# echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x01\\x00\\x00\\x00 > /sys/kernel/debug/scmi_raw/message_async
> >
> >               bash-329     [000] ..... 16415.938739: scmi_msg_dump: pt=15 t=CMND msg_id=06 seq=0000 s=0 pyld=0100000001000000
> >    irq/35-mhu_db_l-81      [000] ..... 16415.944129: scmi_msg_dump: pt=15 t=RESP msg_id=06 seq=0000 s=-1 pyld=
> >
> >         # RETURNS A STATUS -1 FROM THE SERVER NOT SUPPORTING IT
> >         root@deb-buster-arm64:~# cat /sys/kernel/debug/scmi_raw/message | od --endian=little -t x4
> >         0000000 00005406 ffffffff
> >         0000010
> >
> > Note that this was on a JUNO, BUT exactly the same steps can be used to
> > reach an SCMI Server living on a VM reachable via virtio as long as the
> > system under test if properly configured to work with a virtio transport.
> >
> > In a nutshell the exposed API is as follows:
> >
> > /sys/kernel/debug/scmi_raw/
> > ├── errors
> > ├── message
> > ├── message_async
> > ├── notification
> > ├── reset
> > ├── transport_max_msg_size
> > ├── transport_rx_timeout_ms
> > └── transport_tx_max_msg
> >
> > where:
> >
> >  - message*: used to send sync/async commands and read back immediate and
> >    delayed responses (if any)
> >  - errors: used to report timeout and unexpected replies
> >  - reset: used to reset the SCMI Raw stack, flushing all queues from
> >    received messages still pending to be read out (useful to be sure to
> >    cleanup between test suite runs...)
> >  - notification: used to read any notification being spit by the system
> >    (if previously enabled by the user app)
> >  - transport*: a bunch of configuration useful to setup the user
> >    application expectations in terms of timeouts and message
> >    characteristics.
> >
> > Each write corresponds to one command request and the replies or delayed
> > response are read back one message at time (receiving an EOF at each
> > message boundary).
> >
> > The user application running the test is in charge of handling timeouts
> > and properly choosing SCMI sequence numbers for the outgoing requests: note
> > that the same fixed number can be re-used (...though discouraged...) as
> > long as the suite does NOT expect to send multiple in-flight commands
> > concurrently.
> >
> > Since the SCMI core regular stack is partially used to deliver and collect
> > the messages, late replies after timeouts and any other sort of unexpected
> > message sent by the SCMI server platform back can be identified by the SCMI
> > core as usual and it will be reported under /errors for later analysis.
> > (a userspace test-app will have anyway properly detected the timeout on
> >  /message* ...)
> >
> > All of the above has been roughly tested against a standard JUNO SCP SCMI
> > Server (mailbox trans) and an emulated SCMI Server living in a VM (virtio
> > trans) using a custom experimental version of the scmi-tests Compliance
> > suite patched to support Raw mode and posted at [2]. (still in development
> > ...merge requests are in progress...for now it is just a mean for me to
> > test the testing API ... O_o)
> >
> > The series is based on v6.1-rc1.
> >
> > Having said that (in such a concise and brief way :P) ...
> >
> > ...any feedback/comments are welcome !
> 
> Hi Cristian,
> 
> I have tested your series with an optee message transport layer and
> been able to send raw messages to the scmi server PTA
> 
> FWIW
> 
> Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
> 

Thanks a lot for your test and feedback !

... but I was going to reply to this saying that I spotted another issue
with the current SCMI Raw implementation (NOT related to optee/smc) so
that I'll post a V5 next week :P

Anyway, the issue is much related to the debugfs root used by SCMI Raw,
i.e.:

	/sys/kernel/debug/scmi_raw/

..this works fine unless you run it on a system sporting multiple DT-defined
server instances ...that is not officially supported but....ehm...a little
bird told these system with multiple servers do exists :D

In such a case the SCMI core stack is probed multiuple times and so it
will try to register multiple debugfs Raw roots: there is no chanche to
root the SCMI Raw entries at the same point clearly ... and then anyway
there is the issue of recognizing which server is rooted where ... with
the additional pain that a server CANNOT be recognized by querying...cause
there is only one by teh spec with agentID ZERO ... in theory :D...

So my tentaive solution for V5 would be:

- change the Raw root debugfs as:

	/sys/kernel/debug/scmi_raw/0/... (first server defined)

	/sys/kernel/debug/scmi_raw/1/... (possible additional server(s)..)

- expose the DT scmi-server root-node name of the server somewhere under
  that debugfs root like:

	..../scmi_raw/0/transport_name -> 'scmi-mbx'

	..../scmi_raw/1/transport_name -> 'scmi-virtio'

  so that if you know HOW you have configured your own system in the DT
  (maybe multiple servers with different kind of transports ?), you can
  easily select programmatically which one is which, and so decide
  which Raw debugfs fs to use...

... I plan to leave the SCMI ACS suite use by default the first system
rooted at /sys/kernel/debug/scmi_raw/0/...maybe adding a commandline
option to choose an alternative path for SCMI Raw.

Does all of this sound reasonable ?

Thanks,
Cristian

Cristian Marussi Oct. 28, 2022, 3:23 p.m. UTC | #4

On Fri, Oct 28, 2022 at 07:44:32AM -0700, Florian Fainelli wrote:
> Hi Christian,
> 
> On 10/19/2022 1:46 PM, Cristian Marussi wrote:
> > Hi all,
> > 

Hi Florian,

> > This series aims to introduce a new SCMI unified userspace interface meant
> > to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
> > from the perspective of the OSPM agent (non-secure world only ...)
> > 
> > It is proposed as a testing/development facility, it is NOT meant to be a
> > feature to use in production, but only enabled in Kconfig for test
> > deployments.
> > 
> > Currently an SCMI Compliance Suite like the one at [1] can only work by
> > injecting SCMI messages at the SCMI transport layer using the mailbox test
> > driver (CONFIG_MAILBOX_TEST) via its few debugfs entries and looking at
> > the related replies from the SCMI backend Server.
> 
> I plan on giving this a try on our systems later today and will let you know
> the outcome.

Great ! It would be much appreciated...

> This is very useful for making sure the SCMI implementation is
> both correct and properly hardened.

... that was the plan :P

Note that the upstream SCMI ACS suite that I am using for stressing/testing
this Raw thing is still WIP in term of supporting Raw mode injection
(i.e. functional but ALL still to be merged)..but if you need I can give
you pointers on how to use it....unless of course you have your suite or
you just want to test using the shell as in the cover-letter examples...

... on my side I tried to fuzz me with a brutal

	'dd bs=128 count=1 if=/dev/random of=<scmi_raw>/message'

as a poor man fuzzying tool :D ... so I was thinking if it was meaningful
to think about upstreaming some common tools for fuzzying or simply
pre-building bare payloads (in proper endianity) to be injected with this
SCMI raw thing... (I mean something useful that could live in tools/)

...any feedbacks/hints in these regards are welcome.

Thanks,
Cristian

Vincent Guittot Oct. 28, 2022, 4:18 p.m. UTC | #5

On Fri, 28 Oct 2022 at 17:04, Cristian Marussi <cristian.marussi@arm.com> wrote:
>
> On Fri, Oct 28, 2022 at 04:40:02PM +0200, Vincent Guittot wrote:
> > On Wed, 19 Oct 2022 at 22:46, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > >
> > > Hi all,
> > >
>
> Hi Vincent,
>
> > > This series aims to introduce a new SCMI unified userspace interface meant
> > > to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
> > > from the perspective of the OSPM agent (non-secure world only ...)
> > >

[ snip]

> > Hi Cristian,
> >
> > I have tested your series with an optee message transport layer and
> > been able to send raw messages to the scmi server PTA
> >
> > FWIW
> >
> > Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
> >
>
> Thanks a lot for your test and feedback !
>
> ... but I was going to reply to this saying that I spotted another issue
> with the current SCMI Raw implementation (NOT related to optee/smc) so
> that I'll post a V5 next week :P
>
> Anyway, the issue is much related to the debugfs root used by SCMI Raw,
> i.e.:
>
>         /sys/kernel/debug/scmi_raw/
>
> ..this works fine unless you run it on a system sporting multiple DT-defined
> server instances ...that is not officially supported but....ehm...a little
> bird told these system with multiple servers do exists :D

;-)

>
> In such a case the SCMI core stack is probed multiuple times and so it
> will try to register multiple debugfs Raw roots: there is no chanche to
> root the SCMI Raw entries at the same point clearly ... and then anyway
> there is the issue of recognizing which server is rooted where ... with
> the additional pain that a server CANNOT be recognized by querying...cause
> there is only one by teh spec with agentID ZERO ... in theory :D...
>
> So my tentaive solution for V5 would be:
>
> - change the Raw root debugfs as:
>
>         /sys/kernel/debug/scmi_raw/0/... (first server defined)
>
>         /sys/kernel/debug/scmi_raw/1/... (possible additional server(s)..)
>
> - expose the DT scmi-server root-node name of the server somewhere under
>   that debugfs root like:
>
>         ..../scmi_raw/0/transport_name -> 'scmi-mbx'
>
>         ..../scmi_raw/1/transport_name -> 'scmi-virtio'

I was about to say that you would display the server name but that
means that you have send a request to the server which probably
defeats the purpose of the raw mode

>
>   so that if you know HOW you have configured your own system in the DT
>   (maybe multiple servers with different kind of transports ?), you can
>   easily select programmatically which one is which, and so decide
>   which Raw debugfs fs to use...
>
> ... I plan to leave the SCMI ACS suite use by default the first system
> rooted at /sys/kernel/debug/scmi_raw/0/...maybe adding a commandline
> option to choose an alternative path for SCMI Raw.
>
> Does all of this sound reasonable ?

Yes, adding an index looks good to me.

As we are there, should we consider adding a per channel entry in the
tree when there are several channels shared with the same server ?

Vincent

>
> Thanks,
> Cristian
>

Cristian Marussi Oct. 28, 2022, 4:58 p.m. UTC | #6

On Fri, Oct 28, 2022 at 06:18:52PM +0200, Vincent Guittot wrote:
> On Fri, 28 Oct 2022 at 17:04, Cristian Marussi <cristian.marussi@arm.com> wrote:
> >
> > On Fri, Oct 28, 2022 at 04:40:02PM +0200, Vincent Guittot wrote:
> > > On Wed, 19 Oct 2022 at 22:46, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > > >
> > > > Hi all,
> > > >
> >
> > Hi Vincent,
> >
> > > > This series aims to introduce a new SCMI unified userspace interface meant
> > > > to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
> > > > from the perspective of the OSPM agent (non-secure world only ...)
> > > >
> 
> [ snip]
> 
> > > Hi Cristian,
> > >
> > > I have tested your series with an optee message transport layer and
> > > been able to send raw messages to the scmi server PTA
> > >
> > > FWIW
> > >
> > > Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
> > >
> >
> > Thanks a lot for your test and feedback !
> >
> > ... but I was going to reply to this saying that I spotted another issue
> > with the current SCMI Raw implementation (NOT related to optee/smc) so
> > that I'll post a V5 next week :P
> >
> > Anyway, the issue is much related to the debugfs root used by SCMI Raw,
> > i.e.:
> >
> >         /sys/kernel/debug/scmi_raw/
> >
> > ..this works fine unless you run it on a system sporting multiple DT-defined
> > server instances ...that is not officially supported but....ehm...a little
> > bird told these system with multiple servers do exists :D
> 
> ;-)
> 
> >
> > In such a case the SCMI core stack is probed multiuple times and so it
> > will try to register multiple debugfs Raw roots: there is no chanche to
> > root the SCMI Raw entries at the same point clearly ... and then anyway
> > there is the issue of recognizing which server is rooted where ... with
> > the additional pain that a server CANNOT be recognized by querying...cause
> > there is only one by teh spec with agentID ZERO ... in theory :D...
> >
> > So my tentaive solution for V5 would be:
> >
> > - change the Raw root debugfs as:
> >
> >         /sys/kernel/debug/scmi_raw/0/... (first server defined)
> >
> >         /sys/kernel/debug/scmi_raw/1/... (possible additional server(s)..)
> >
> > - expose the DT scmi-server root-node name of the server somewhere under
> >   that debugfs root like:
> >
> >         ..../scmi_raw/0/transport_name -> 'scmi-mbx'
> >
> >         ..../scmi_raw/1/transport_name -> 'scmi-virtio'
> 
> I was about to say that you would display the server name but that
> means that you have send a request to the server which probably
> defeats the purpose of the raw mode
> 
> >
> >   so that if you know HOW you have configured your own system in the DT
> >   (maybe multiple servers with different kind of transports ?), you can
> >   easily select programmatically which one is which, and so decide
> >   which Raw debugfs fs to use...
> >
> > ... I plan to leave the SCMI ACS suite use by default the first system
> > rooted at /sys/kernel/debug/scmi_raw/0/...maybe adding a commandline
> > option to choose an alternative path for SCMI Raw.
> >
> > Does all of this sound reasonable ?
> 
> Yes, adding an index looks good to me.

Ok, I'll rework accordingly.

> 
> As we are there, should we consider adding a per channel entry in the
> tree when there are several channels shared with the same server ?
> 

So, I was thinking about this and, even though, it seems not strictly
needed for Compliance testing (as discussed offline) I do think that
could be a sensible option to have as an additional mean to stress the
server transport implementation (as you wish).

Having said that, this week, I was reasoning about an alternative
interface to do this, i.e. to avoid to add even more debugfs entries
for this chosen-channel config or possibly in the future to ask for
transport polling mode (if supported by the underlying transport)

My idea (not thought fully through as of now eh..) would be as follows:

since the current Raw implementation enforces a minimum size of 4 bytes
for the injected message (more on this later down below in NOTE), I was
thinking about using less-than-4-bytes-sized messages to sort of
pre-configure the Raw stack.

IOW, instead of having a number of new additional entries like

	../message_ch10
	../message_ch13
	../message_poll

we could design a sort of API (in the API :D) that defines how
3-bytes message payload are to be interpreted, so that in normal usage
everything will go on as it is now, while if a 3-bytes message is
injected by a specially crafted testcase, it would be used to configure
the behaviour stack for the subsequent set of Raw transactions
(i.e. for the currently opened fd...) like:

- open message fd

- send a configure message:

	| proto_chan_# 	|     flags (polling..)  |
	------------------------------------------
	0		7			21

- send/receive your test messages

- repeat or close (then the config will vanish...)

This would mean adding some sort entry under scmi_raw to expose the
configured available channels on the system though.

[maybe the flags above could also include an async flag and avoid
 also to add the message_async entries...]

I discarded the idea to run the above config process via IOCTLs since
it seemed to me even more frowned upon to use IOCTLs on a debugfs entry
:P...but I maybe wrong ah...

All of this is still to be explored anyway, any thoughts ? or evident
drawbacks ? (beside having to clearly define an API for these message
config machinery)

Anyway, whatever direction we'll choose (additional entries vs 3-bytes
config msg), I would prefer to add this per-channel (or polling)
capabilities with separate series to post on top of this in teh next
cycle.

..too many words even this time :P

Thanks,
Cristian

P.S: NOTE min_injected_msg_size:
--------------------------------
Thinking about all of the above, at first, I was a bit dubious if
instead I should not allow, in Raw mode, the injection of shorter than
4 bytes messages (i.e. shorter than a SCMI header) for the purpose of
fuzzing: then I realized that even though I should allow the injection
of smaller messages, the underlying transports, as they are defined, both
sides (platform and agent) will anyway carry out a 4bytes transaction,
it's just that all the other non-provided bytes will be zeroed in the
memory layout; this is just how the transports itself (shmem or msg
based) are designed to work both sides. (and again would be transport
layer testing more than SCMI spec verification..)

So at the end I thought this kind of less-than-4-bytes transmissions
gave no benefit and I came up with the above trick to use such tiny
message for configuration.

Florian Fainelli Oct. 29, 2022, 2:38 a.m. UTC | #7

Hi Christian,

On 10/19/2022 1:46 PM, Cristian Marussi wrote:
[snip]

> In V2 the runtime enable/disable switching capability has been removed
> (for now) since still not deemed to be stable/reliable enough: as a
> consequence when SCMI Raw support is compiled in, the regular SCMI stack
> drivers are now inhibited permanently for that Kernel.

For our platforms (ARCH_BRCMSTB) we would need to have the ability to 
start with the regular SCMI stack to satisfy if nothing else, all clock 
consumers otherwise it makes it fairly challenging for us to boot to a 
prompt as we purposely turn off all unnecessary peripherals to conserve 
power. We could introduce a "full on" mode to remove the clock provider 
dependency, but I suspect others on "real" silicon may suffer from the 
same short comings.

Once user-space is reached, I suppose we could find a way to unbind from 
all SCMI consumers, and/or ensure that runtime PM is disabled, cpufreq 
is in a governor that won't do any active frequency switching etc.

What do you think?

Cristian Marussi Oct. 29, 2022, 11:14 a.m. UTC | #8

On Fri, Oct 28, 2022 at 07:38:25PM -0700, Florian Fainelli wrote:
> Hi Christian,

Hi Florian,

> 
> On 10/19/2022 1:46 PM, Cristian Marussi wrote:
> [snip]
> 
> > In V2 the runtime enable/disable switching capability has been removed
> > (for now) since still not deemed to be stable/reliable enough: as a
> > consequence when SCMI Raw support is compiled in, the regular SCMI stack
> > drivers are now inhibited permanently for that Kernel.
> 
> For our platforms (ARCH_BRCMSTB) we would need to have the ability to start
> with the regular SCMI stack to satisfy if nothing else, all clock consumers
> otherwise it makes it fairly challenging for us to boot to a prompt as we
> purposely turn off all unnecessary peripherals to conserve power. We could
> introduce a "full on" mode to remove the clock provider dependency, but I
> suspect others on "real" silicon may suffer from the same short comings.
> 

Indeed in V1 of this series the Raw mode was dynamically switched on/off at
runtime, so that you could have booted your system with a full working
SCMI stack and then Raw mode could have been enabled/disabled via
scmi_raw/enable entry, so causing the SCMI drivers to be unbound after
boot when entering Raw mode.

The idea, indeed, was initially to be able to boot a regular system, perform
any kind of non-SCMI testing and then switch at will into Raw mode, perform
your SCMI testing, and then back from the grave into normal mode when needed.
(this way you could have deployed into CI one single image for all
testing scenarios...)

The valid objections/worries were around the stability/relliability of such
a solution both when entering Raw mode and then coming back to normal
use: i.e. not being sure to be able to safely unbind all and to safely
bind back all the stack at the end.

The full discussion about this is in this thread if you'd want to
chime in with your point of view:

https://lore.kernel.org/all/Yvvb6Y+lzuABT1fy@sirena.org.uk/

So we removed it, but the idea was not fullly abandoned, we could revive
it with some variations, most probably binding this feature to a Kconfig
option (default=N).

Any feedback/idea from You in these regards is highly welcome.

> Once user-space is reached, I suppose we could find a way to unbind from all
> SCMI consumers, and/or ensure that runtime PM is disabled, cpufreq is in a
> governor that won't do any active frequency switching etc.
> 
> What do you think?

Anyway, thinking about your scenario, maybe this dynamic-switch is NOT
a good solution in your case, because that was an all-or-nothing switch
that caused the full SCMI stack to be unbound, you could not selectively
keep alive what you possibly need to stay on even after boot.

I think that an alternative, maybe better, option in your case, since
you are willing to manually fine-tune at runtime which parts of the SCMI
has to be inhibited to avoid interferences while Raw-testing (via unbind/
unload or policy governors changes), a better option could be a
'full-coexistence' Raw mode solution.

In such a COEX configuration you'll boot a normal system with all the
SCMI drivers operational as configured in the DT, BUT with also the
Raw mode initialized and ready to be used.

In this scenario, basically, you'll have the normal message transactions,
coming from the regular SCMI drivers, and the Raw transactions, injected
from your test suite, that happily coexist side-by-side at the pure
trasaction level: this does NOT mean that you won't suffer any interference
at the protocol level so, as said, you'll have anyway to inhibit properly the
SCMI drivers by hand to avoid false-positives in your results.
(imagine testcase generating a series of Raw get/set/get transaction on
a resource while the regular stack issue a set on the same
res...notifications interferences are even worst...)

Now, the GOOD_NEWS : is ... that this can be done already with an additional
slim patch that has to be applied on top of this series, patch that I
have not posted still since not sure of its utility, but that I am using
heavily in my setup and which works fine for me (with really rare
interferences on testing even without fine-tuning/disabling anything by hand..)

I attached such patch at the end of this mail so that you can
immediately be unblocked and experiment further with Raw mode as you
planned.

I'll cleaned it up and post it also to the next V5 at this point.

Once that COEX is enabled, you should see something like this at boot:

[    1.824269] arm-scmi firmware:scmi: SCMI RAW Mode initialized.
[    1.830155] arm-scmi firmware:scmi: SCMI RAW Mode COEX enabled !
[    1.836473] arm-scmi firmware:scmi: SCMI Notifications - Core Enabled.
[    1.847481] arm-scmi firmware:scmi: SCMI Protocol v2.0 'arm:arm' Firmware version 0x20a0000
...

... and then you can just use the scmi_raw entries as you wish.

Any transaction, normal or raw, will be visible as usual in the SCMI
traces (even though, currently, NOT distinguishable by type raw/normal)

So...after this other too much long mail (:P)...let me know what you
think about al of this (including the possibility of revive the runtime
dynamic switch too...)

Thanks,
Cristian

---->8-----

From 8613438d4171866088339e030959cb1de8e88c6a Mon Sep 17 00:00:00 2001
From: Cristian Marussi <cristian.marussi@arm.com>
Date: Sun, 21 Aug 2022 19:09:39 +0100
Subject: [PATCH] [DEBUG] firmware: arm_scmi: Add SCMI Raw mode COEXISTENCE
 support

When Raw mode support is configured in coexistence mode, normal SCMI
drivers are allowed to register and work as usual with the SCMI core.
Normal and raw SCMI message transactions will remain anyway segregated from
each other, it is just that any SCMI test suite using the Raw mode
access could report unreliable results due to possible interferences
from the regular drivers access to shared SCMI resources.

Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
---
 drivers/firmware/arm_scmi/Kconfig     | 10 ++++++++++
 drivers/firmware/arm_scmi/driver.c    | 21 +++++++++++++++------
 drivers/firmware/arm_scmi/protocols.h |  2 ++
 drivers/firmware/arm_scmi/raw_mode.c  |  2 +-
 4 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/drivers/firmware/arm_scmi/Kconfig b/drivers/firmware/arm_scmi/Kconfig
index ab726a92ac2f..743f53fbe2f8 100644
--- a/drivers/firmware/arm_scmi/Kconfig
+++ b/drivers/firmware/arm_scmi/Kconfig
@@ -36,6 +36,16 @@ config ARM_SCMI_RAW_MODE_SUPPORT
 	  order to avoid unexpected interactions with the SCMI Raw message
 	  flow. If unsure say N.

+config ARM_SCMI_RAW_MODE_SUPPORT_COEX
+	bool "Allow SCMI Raw mode coexistence with normal SCMI stack"
+	depends on ARM_SCMI_RAW_MODE_SUPPORT
+	help
+	  Allow SCMI Raw transmission mode to coexist with normal SCMI stack.
+
+	  This will allow regular SCMI drivers to register with the core and
+	  operate normally, thing which could make an SCMI test suite using the
+	  SCMI Raw mode support unreliable. If unsure, say N.
+
 config ARM_SCMI_HAVE_TRANSPORT
 	bool
 	help
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index 32374fdba997..f0b06b6e8dc2 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -449,9 +449,14 @@ static struct scmi_xfer *scmi_xfer_get(const struct scmi_handle *handle,
  */
 struct scmi_xfer *scmi_xfer_raw_get(const struct scmi_handle *handle)
 {
+	struct scmi_xfer *xfer;
 	struct scmi_info *info = handle_to_scmi_info(handle);

-	return scmi_xfer_get(handle, &info->tx_minfo);
+	xfer = scmi_xfer_get(handle, &info->tx_minfo);
+	if (!IS_ERR(xfer))
+		xfer->is_raw = true;
+
+	return xfer;
 }

 /**
@@ -531,6 +536,7 @@ void scmi_xfer_raw_put(const struct scmi_handle *handle, struct scmi_xfer *xfer)
 {
 	struct scmi_info *info = handle_to_scmi_info(handle);

+	xfer->is_raw = false;
 	return __scmi_xfer_put(&info->tx_minfo, xfer);
 }

@@ -2401,7 +2407,8 @@ int scmi_protocol_device_request(const struct scmi_device_id *id_table)
 	pr_debug("Requesting SCMI device (%s) for protocol %x\n",
 		 id_table->name, id_table->protocol_id);

-	if (IS_ENABLED(CONFIG_ARM_SCMI_RAW_MODE_SUPPORT)) {
+	if (IS_ENABLED(CONFIG_ARM_SCMI_RAW_MODE_SUPPORT) &&
+	    !IS_ENABLED(CONFIG_ARM_SCMI_RAW_MODE_SUPPORT_COEX)) {
 		pr_warn("SCMI Raw mode active. Rejecting '%s'/0x%02X\n",
 			id_table->name, id_table->protocol_id);
 		return -EINVAL;
@@ -2634,11 +2641,13 @@ static int scmi_probe(struct platform_device *pdev)
 					       info->tx_minfo.max_msg);
 		if (!IS_ERR(info->raw)) {
 			dev_info(dev, "SCMI RAW Mode initialized.\n");
-			return 0;
+			if (!IS_ENABLED(CONFIG_ARM_SCMI_RAW_MODE_SUPPORT_COEX))
+				return 0;
+			dev_info(dev, "SCMI RAW Mode COEX enabled !\n");
+		} else {
+			dev_err(dev, "Failed to initialize SCMI RAW Mode !\n");
+			info->raw = NULL;
 		}
-
-		dev_err(dev, "Failed to initialize SCMI RAW Mode !\n");
-		info->raw = NULL;
 	}

 	if (scmi_notification_init(handle))
diff --git a/drivers/firmware/arm_scmi/protocols.h b/drivers/firmware/arm_scmi/protocols.h
index 2f3bf691db7c..70a48adcc320 100644
--- a/drivers/firmware/arm_scmi/protocols.h
+++ b/drivers/firmware/arm_scmi/protocols.h
@@ -88,6 +88,7 @@ struct scmi_msg_hdr {
 /**
  * struct scmi_xfer - Structure representing a message flow
  *
+ * @is_raw: Flag to state if this xfer has been generated by RAW mode
  * @transfer_id: Unique ID for debug & profiling purpose
  * @hdr: Transmit message header
  * @tx: Transmit message
@@ -119,6 +120,7 @@ struct scmi_msg_hdr {
  * @priv: A pointer for transport private usage.
  */
 struct scmi_xfer {
+	bool is_raw;
 	int transfer_id;
 	struct scmi_msg_hdr hdr;
 	struct scmi_msg tx;
diff --git a/drivers/firmware/arm_scmi/raw_mode.c b/drivers/firmware/arm_scmi/raw_mode.c
index 3fdfc0564286..0edaeb405267 100644
--- a/drivers/firmware/arm_scmi/raw_mode.c
+++ b/drivers/firmware/arm_scmi/raw_mode.c
@@ -1154,7 +1154,7 @@ void scmi_raw_message_report(void *r, struct scmi_xfer *xfer, unsigned int idx)
 	struct device *dev;
 	struct scmi_raw_mode_info *raw = r;

-	if (!raw)
+	if (!raw || (idx == SCMI_RAW_REPLY_QUEUE && !xfer->is_raw))
 		return;

 	dev = raw->handle->dev;

Vincent Guittot Nov. 2, 2022, 8:54 a.m. UTC | #9

On Fri, 28 Oct 2022 at 18:58, Cristian Marussi <cristian.marussi@arm.com> wrote:
>
> On Fri, Oct 28, 2022 at 06:18:52PM +0200, Vincent Guittot wrote:
> > On Fri, 28 Oct 2022 at 17:04, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > >
> > > On Fri, Oct 28, 2022 at 04:40:02PM +0200, Vincent Guittot wrote:
> > > > On Wed, 19 Oct 2022 at 22:46, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > >
> > > Hi Vincent,
> > >
> > > > > This series aims to introduce a new SCMI unified userspace interface meant
> > > > > to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
> > > > > from the perspective of the OSPM agent (non-secure world only ...)
> > > > >
> >
> > [ snip]
> >
> > > > Hi Cristian,
> > > >
> > > > I have tested your series with an optee message transport layer and
> > > > been able to send raw messages to the scmi server PTA
> > > >
> > > > FWIW
> > > >
> > > > Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > >
> > >
> > > Thanks a lot for your test and feedback !
> > >
> > > ... but I was going to reply to this saying that I spotted another issue
> > > with the current SCMI Raw implementation (NOT related to optee/smc) so
> > > that I'll post a V5 next week :P
> > >
> > > Anyway, the issue is much related to the debugfs root used by SCMI Raw,
> > > i.e.:
> > >
> > >         /sys/kernel/debug/scmi_raw/
> > >
> > > ..this works fine unless you run it on a system sporting multiple DT-defined
> > > server instances ...that is not officially supported but....ehm...a little
> > > bird told these system with multiple servers do exists :D
> >
> > ;-)
> >
> > >
> > > In such a case the SCMI core stack is probed multiuple times and so it
> > > will try to register multiple debugfs Raw roots: there is no chanche to
> > > root the SCMI Raw entries at the same point clearly ... and then anyway
> > > there is the issue of recognizing which server is rooted where ... with
> > > the additional pain that a server CANNOT be recognized by querying...cause
> > > there is only one by teh spec with agentID ZERO ... in theory :D...
> > >
> > > So my tentaive solution for V5 would be:
> > >
> > > - change the Raw root debugfs as:
> > >
> > >         /sys/kernel/debug/scmi_raw/0/... (first server defined)
> > >
> > >         /sys/kernel/debug/scmi_raw/1/... (possible additional server(s)..)
> > >
> > > - expose the DT scmi-server root-node name of the server somewhere under
> > >   that debugfs root like:
> > >
> > >         ..../scmi_raw/0/transport_name -> 'scmi-mbx'
> > >
> > >         ..../scmi_raw/1/transport_name -> 'scmi-virtio'
> >
> > I was about to say that you would display the server name but that
> > means that you have send a request to the server which probably
> > defeats the purpose of the raw mode
> >
> > >
> > >   so that if you know HOW you have configured your own system in the DT
> > >   (maybe multiple servers with different kind of transports ?), you can
> > >   easily select programmatically which one is which, and so decide
> > >   which Raw debugfs fs to use...
> > >
> > > ... I plan to leave the SCMI ACS suite use by default the first system
> > > rooted at /sys/kernel/debug/scmi_raw/0/...maybe adding a commandline
> > > option to choose an alternative path for SCMI Raw.
> > >
> > > Does all of this sound reasonable ?
> >
> > Yes, adding an index looks good to me.
>
> Ok, I'll rework accordingly.
>
> >
> > As we are there, should we consider adding a per channel entry in the
> > tree when there are several channels shared with the same server ?
> >
>
> So, I was thinking about this and, even though, it seems not strictly
> needed for Compliance testing (as discussed offline) I do think that
> could be a sensible option to have as an additional mean to stress the
> server transport implementation (as you wish).

Thanks

>
> Having said that, this week, I was reasoning about an alternative
> interface to do this, i.e. to avoid to add even more debugfs entries
> for this chosen-channel config or possibly in the future to ask for
> transport polling mode (if supported by the underlying transport)
>
> My idea (not thought fully through as of now eh..) would be as follows:
>
> since the current Raw implementation enforces a minimum size of 4 bytes
> for the injected message (more on this later down below in NOTE), I was
> thinking about using less-than-4-bytes-sized messages to sort of
> pre-configure the Raw stack.
>
> IOW, instead of having a number of new additional entries like
>
>         ../message_ch10
>         ../message_ch13
>         ../message_poll
>
> we could design a sort of API (in the API :D) that defines how
> 3-bytes message payload are to be interpreted, so that in normal usage
> everything will go on as it is now, while if a 3-bytes message is
> injected by a specially crafted testcase, it would be used to configure
> the behaviour stack for the subsequent set of Raw transactions
> (i.e. for the currently opened fd...) like:
>
> - open message fd
>
> - send a configure message:
>
>         | proto_chan_#  |     flags (polling..)  |
>         ------------------------------------------
>         0               7                       21
>
> - send/receive your test messages
>
> - repeat or close (then the config will vanish...)
>
> This would mean adding some sort entry under scmi_raw to expose the
> configured available channels on the system though.
>
> [maybe the flags above could also include an async flag and avoid
>  also to add the message_async entries...]
>
> I discarded the idea to run the above config process via IOCTLs since
> it seemed to me even more frowned upon to use IOCTLs on a debugfs entry
> :P...but I maybe wrong ah...
>
> All of this is still to be explored anyway, any thoughts ? or evident
> drawbacks ? (beside having to clearly define an API for these message
> config machinery)

TBH, I'm not a fan of adding a protocol on top of the SCMI one. This
interface aims to test the SCMI servers and their channels so we
should focus on this and make it simple to use. IMHO, adding some
special bytes before the real scmi message is prone to create
complexity and error in the use of this debug interface.

>
> Anyway, whatever direction we'll choose (additional entries vs 3-bytes
> config msg), I would prefer to add this per-channel (or polling)
> capabilities with separate series to post on top of this in teh next
> cycle.

Ok

>
> ..too many words even this time :P

Thanks
Vincent

>
> Thanks,
> Cristian
>
>
> P.S: NOTE min_injected_msg_size:
> --------------------------------
> Thinking about all of the above, at first, I was a bit dubious if
> instead I should not allow, in Raw mode, the injection of shorter than
> 4 bytes messages (i.e. shorter than a SCMI header) for the purpose of
> fuzzing: then I realized that even though I should allow the injection
> of smaller messages, the underlying transports, as they are defined, both
> sides (platform and agent) will anyway carry out a 4bytes transaction,
> it's just that all the other non-provided bytes will be zeroed in the
> memory layout; this is just how the transports itself (shmem or msg
> based) are designed to work both sides. (and again would be transport
> layer testing more than SCMI spec verification..)
>
> So at the end I thought this kind of less-than-4-bytes transmissions
> gave no benefit and I came up with the above trick to use such tiny
> message for configuration.
>

Cristian Marussi Nov. 3, 2022, 9:21 a.m. UTC | #10

On Wed, Nov 02, 2022 at 09:54:50AM +0100, Vincent Guittot wrote:
> On Fri, 28 Oct 2022 at 18:58, Cristian Marussi <cristian.marussi@arm.com> wrote:
> >
> > On Fri, Oct 28, 2022 at 06:18:52PM +0200, Vincent Guittot wrote:
> > > On Fri, 28 Oct 2022 at 17:04, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > > >
> > > > On Fri, Oct 28, 2022 at 04:40:02PM +0200, Vincent Guittot wrote:
> > > > > On Wed, 19 Oct 2022 at 22:46, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > >
> > > > Hi Vincent,
> > > >
> > > > > > This series aims to introduce a new SCMI unified userspace interface meant
> > > > > > to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
> > > > > > from the perspective of the OSPM agent (non-secure world only ...)
> > > > > >
> > >
> > > [ snip]
> > >
> > > > > Hi Cristian,
> > > > >
> > > > > I have tested your series with an optee message transport layer and
> > > > > been able to send raw messages to the scmi server PTA
> > > > >
> > > > > FWIW
> > > > >
> > > > > Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > > >
> > > >
> > > > Thanks a lot for your test and feedback !
> > > >
> > > > ... but I was going to reply to this saying that I spotted another issue
> > > > with the current SCMI Raw implementation (NOT related to optee/smc) so
> > > > that I'll post a V5 next week :P
> > > >
> > > > Anyway, the issue is much related to the debugfs root used by SCMI Raw,
> > > > i.e.:
> > > >
> > > >         /sys/kernel/debug/scmi_raw/
> > > >
> > > > ..this works fine unless you run it on a system sporting multiple DT-defined
> > > > server instances ...that is not officially supported but....ehm...a little
> > > > bird told these system with multiple servers do exists :D
> > >
> > > ;-)
> > >
> > > >
> > > > In such a case the SCMI core stack is probed multiuple times and so it
> > > > will try to register multiple debugfs Raw roots: there is no chanche to
> > > > root the SCMI Raw entries at the same point clearly ... and then anyway
> > > > there is the issue of recognizing which server is rooted where ... with
> > > > the additional pain that a server CANNOT be recognized by querying...cause
> > > > there is only one by teh spec with agentID ZERO ... in theory :D...
> > > >
> > > > So my tentaive solution for V5 would be:
> > > >
> > > > - change the Raw root debugfs as:
> > > >
> > > >         /sys/kernel/debug/scmi_raw/0/... (first server defined)
> > > >
> > > >         /sys/kernel/debug/scmi_raw/1/... (possible additional server(s)..)
> > > >
> > > > - expose the DT scmi-server root-node name of the server somewhere under
> > > >   that debugfs root like:
> > > >
> > > >         ..../scmi_raw/0/transport_name -> 'scmi-mbx'
> > > >
> > > >         ..../scmi_raw/1/transport_name -> 'scmi-virtio'
> > >
> > > I was about to say that you would display the server name but that
> > > means that you have send a request to the server which probably
> > > defeats the purpose of the raw mode
> > >
> > > >
> > > >   so that if you know HOW you have configured your own system in the DT
> > > >   (maybe multiple servers with different kind of transports ?), you can
> > > >   easily select programmatically which one is which, and so decide
> > > >   which Raw debugfs fs to use...
> > > >
> > > > ... I plan to leave the SCMI ACS suite use by default the first system
> > > > rooted at /sys/kernel/debug/scmi_raw/0/...maybe adding a commandline
> > > > option to choose an alternative path for SCMI Raw.
> > > >
> > > > Does all of this sound reasonable ?
> > >
> > > Yes, adding an index looks good to me.
> >
> > Ok, I'll rework accordingly.
> >
> > >
> > > As we are there, should we consider adding a per channel entry in the
> > > tree when there are several channels shared with the same server ?
> > >
> >
> > So, I was thinking about this and, even though, it seems not strictly
> > needed for Compliance testing (as discussed offline) I do think that
> > could be a sensible option to have as an additional mean to stress the
> > server transport implementation (as you wish).
> 
> Thanks
> 
> >
> > Having said that, this week, I was reasoning about an alternative
> > interface to do this, i.e. to avoid to add even more debugfs entries
> > for this chosen-channel config or possibly in the future to ask for
> > transport polling mode (if supported by the underlying transport)
> >
> > My idea (not thought fully through as of now eh..) would be as follows:
> >
> > since the current Raw implementation enforces a minimum size of 4 bytes
> > for the injected message (more on this later down below in NOTE), I was
> > thinking about using less-than-4-bytes-sized messages to sort of
> > pre-configure the Raw stack.
> >
> > IOW, instead of having a number of new additional entries like
> >
> >         ../message_ch10
> >         ../message_ch13
> >         ../message_poll
> >
> > we could design a sort of API (in the API :D) that defines how
> > 3-bytes message payload are to be interpreted, so that in normal usage
> > everything will go on as it is now, while if a 3-bytes message is
> > injected by a specially crafted testcase, it would be used to configure
> > the behaviour stack for the subsequent set of Raw transactions
> > (i.e. for the currently opened fd...) like:
> >
> > - open message fd
> >
> > - send a configure message:
> >
> >         | proto_chan_#  |     flags (polling..)  |
> >         ------------------------------------------
> >         0               7                       21
> >
> > - send/receive your test messages
> >
> > - repeat or close (then the config will vanish...)
> >
> > This would mean adding some sort entry under scmi_raw to expose the
> > configured available channels on the system though.
> >
> > [maybe the flags above could also include an async flag and avoid
> >  also to add the message_async entries...]
> >
> > I discarded the idea to run the above config process via IOCTLs since
> > it seemed to me even more frowned upon to use IOCTLs on a debugfs entry
> > :P...but I maybe wrong ah...
> >
> > All of this is still to be explored anyway, any thoughts ? or evident
> > drawbacks ? (beside having to clearly define an API for these message
> > config machinery)
> 
> TBH, I'm not a fan of adding a protocol on top of the SCMI one. This
> interface aims to test the SCMI servers and their channels so we
> should focus on this and make it simple to use. IMHO, adding some
> special bytes before the real scmi message is prone to create
> complexity and error in the use of this debug interface.
> 

Indeed, even if only for transport-related tests, the risk is to make
more complicate to use the interface.

Agreed, just wanted to have some feedback. I'll revert to some based on
debugfs trying to minimize entries and improper usage...maybe something
like grouping on per-channel subdirs when different channels are
available.

E.g. on a system with a dedicated PERF channel I could have


	../scmi_raw/0/message             <<< usual auto channel selection
		     /message_async 
		     ...

		     /chan_0x10/message   <<< use default channel (base proto)
		     	       /message_async
			       ....

		     /chan_0x13/message  <<< use PERF channel
		     	       /message_async
			       ....

The alternative would be to have a common single entry to configure the usage
of the a single batch of message/message_async entries BUT that seems to
me more prone to error, e.g. if you dont clear the special config at the
end of your special testcase you could endup using custom channels conf also
with any following regular test which expects to benefit from the
automatic channel selection (which I still think should be default way of
running these tests..)

Thoughts ?

Thanks for the feedback.
Cristian

Vincent Guittot Nov. 3, 2022, 10:58 a.m. UTC | #11

On Thu, 3 Nov 2022 at 10:21, Cristian Marussi <cristian.marussi@arm.com> wrote:
>
> On Wed, Nov 02, 2022 at 09:54:50AM +0100, Vincent Guittot wrote:
> > On Fri, 28 Oct 2022 at 18:58, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > >
> > > On Fri, Oct 28, 2022 at 06:18:52PM +0200, Vincent Guittot wrote:
> > > > On Fri, 28 Oct 2022 at 17:04, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > > > >
> > > > > On Fri, Oct 28, 2022 at 04:40:02PM +0200, Vincent Guittot wrote:
> > > > > > On Wed, 19 Oct 2022 at 22:46, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > >
> > > > > Hi Vincent,
> > > > >
> > > > > > > This series aims to introduce a new SCMI unified userspace interface meant
> > > > > > > to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
> > > > > > > from the perspective of the OSPM agent (non-secure world only ...)
> > > > > > >
> > > >
> > > > [ snip]
> > > >
> > > > > > Hi Cristian,
> > > > > >
> > > > > > I have tested your series with an optee message transport layer and
> > > > > > been able to send raw messages to the scmi server PTA
> > > > > >
> > > > > > FWIW
> > > > > >
> > > > > > Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > > > >
> > > > >
> > > > > Thanks a lot for your test and feedback !
> > > > >
> > > > > ... but I was going to reply to this saying that I spotted another issue
> > > > > with the current SCMI Raw implementation (NOT related to optee/smc) so
> > > > > that I'll post a V5 next week :P
> > > > >
> > > > > Anyway, the issue is much related to the debugfs root used by SCMI Raw,
> > > > > i.e.:
> > > > >
> > > > >         /sys/kernel/debug/scmi_raw/
> > > > >
> > > > > ..this works fine unless you run it on a system sporting multiple DT-defined
> > > > > server instances ...that is not officially supported but....ehm...a little
> > > > > bird told these system with multiple servers do exists :D
> > > >
> > > > ;-)
> > > >
> > > > >
> > > > > In such a case the SCMI core stack is probed multiuple times and so it
> > > > > will try to register multiple debugfs Raw roots: there is no chanche to
> > > > > root the SCMI Raw entries at the same point clearly ... and then anyway
> > > > > there is the issue of recognizing which server is rooted where ... with
> > > > > the additional pain that a server CANNOT be recognized by querying...cause
> > > > > there is only one by teh spec with agentID ZERO ... in theory :D...
> > > > >
> > > > > So my tentaive solution for V5 would be:
> > > > >
> > > > > - change the Raw root debugfs as:
> > > > >
> > > > >         /sys/kernel/debug/scmi_raw/0/... (first server defined)
> > > > >
> > > > >         /sys/kernel/debug/scmi_raw/1/... (possible additional server(s)..)
> > > > >
> > > > > - expose the DT scmi-server root-node name of the server somewhere under
> > > > >   that debugfs root like:
> > > > >
> > > > >         ..../scmi_raw/0/transport_name -> 'scmi-mbx'
> > > > >
> > > > >         ..../scmi_raw/1/transport_name -> 'scmi-virtio'
> > > >
> > > > I was about to say that you would display the server name but that
> > > > means that you have send a request to the server which probably
> > > > defeats the purpose of the raw mode
> > > >
> > > > >
> > > > >   so that if you know HOW you have configured your own system in the DT
> > > > >   (maybe multiple servers with different kind of transports ?), you can
> > > > >   easily select programmatically which one is which, and so decide
> > > > >   which Raw debugfs fs to use...
> > > > >
> > > > > ... I plan to leave the SCMI ACS suite use by default the first system
> > > > > rooted at /sys/kernel/debug/scmi_raw/0/...maybe adding a commandline
> > > > > option to choose an alternative path for SCMI Raw.
> > > > >
> > > > > Does all of this sound reasonable ?
> > > >
> > > > Yes, adding an index looks good to me.
> > >
> > > Ok, I'll rework accordingly.
> > >
> > > >
> > > > As we are there, should we consider adding a per channel entry in the
> > > > tree when there are several channels shared with the same server ?
> > > >
> > >
> > > So, I was thinking about this and, even though, it seems not strictly
> > > needed for Compliance testing (as discussed offline) I do think that
> > > could be a sensible option to have as an additional mean to stress the
> > > server transport implementation (as you wish).
> >
> > Thanks
> >
> > >
> > > Having said that, this week, I was reasoning about an alternative
> > > interface to do this, i.e. to avoid to add even more debugfs entries
> > > for this chosen-channel config or possibly in the future to ask for
> > > transport polling mode (if supported by the underlying transport)
> > >
> > > My idea (not thought fully through as of now eh..) would be as follows:
> > >
> > > since the current Raw implementation enforces a minimum size of 4 bytes
> > > for the injected message (more on this later down below in NOTE), I was
> > > thinking about using less-than-4-bytes-sized messages to sort of
> > > pre-configure the Raw stack.
> > >
> > > IOW, instead of having a number of new additional entries like
> > >
> > >         ../message_ch10
> > >         ../message_ch13
> > >         ../message_poll
> > >
> > > we could design a sort of API (in the API :D) that defines how
> > > 3-bytes message payload are to be interpreted, so that in normal usage
> > > everything will go on as it is now, while if a 3-bytes message is
> > > injected by a specially crafted testcase, it would be used to configure
> > > the behaviour stack for the subsequent set of Raw transactions
> > > (i.e. for the currently opened fd...) like:
> > >
> > > - open message fd
> > >
> > > - send a configure message:
> > >
> > >         | proto_chan_#  |     flags (polling..)  |
> > >         ------------------------------------------
> > >         0               7                       21
> > >
> > > - send/receive your test messages
> > >
> > > - repeat or close (then the config will vanish...)
> > >
> > > This would mean adding some sort entry under scmi_raw to expose the
> > > configured available channels on the system though.
> > >
> > > [maybe the flags above could also include an async flag and avoid
> > >  also to add the message_async entries...]
> > >
> > > I discarded the idea to run the above config process via IOCTLs since
> > > it seemed to me even more frowned upon to use IOCTLs on a debugfs entry
> > > :P...but I maybe wrong ah...
> > >
> > > All of this is still to be explored anyway, any thoughts ? or evident
> > > drawbacks ? (beside having to clearly define an API for these message
> > > config machinery)
> >
> > TBH, I'm not a fan of adding a protocol on top of the SCMI one. This
> > interface aims to test the SCMI servers and their channels so we
> > should focus on this and make it simple to use. IMHO, adding some
> > special bytes before the real scmi message is prone to create
> > complexity and error in the use of this debug interface.
> >
>
> Indeed, even if only for transport-related tests, the risk is to make
> more complicate to use the interface.
>
> Agreed, just wanted to have some feedback. I'll revert to some based on
> debugfs trying to minimize entries and improper usage...maybe something
> like grouping on per-channel subdirs when different channels are
> available.
>
> E.g. on a system with a dedicated PERF channel I could have
>
>
>         ../scmi_raw/0/message             <<< usual auto channel selection
>                      /message_async
>                      ...
>
>                      /chan_0x10/message   <<< use default channel (base proto)
>                                /message_async
>                                ....
>
>                      /chan_0x13/message  <<< use PERF channel
>                                /message_async
>                                ....


The proposal above looks good for me

Thanks

>
> The alternative would be to have a common single entry to configure the usage
> of the a single batch of message/message_async entries BUT that seems to
> me more prone to error, e.g. if you dont clear the special config at the
> end of your special testcase you could endup using custom channels conf also
> with any following regular test which expects to benefit from the
> automatic channel selection (which I still think should be default way of
> running these tests..)
>
> Thoughts ?
>
> Thanks for the feedback.
> Cristian

Sudeep Holla Nov. 3, 2022, 11:21 a.m. UTC | #12

On Fri, Oct 28, 2022 at 07:38:25PM -0700, Florian Fainelli wrote:
> Hi Christian,
> 
> On 10/19/2022 1:46 PM, Cristian Marussi wrote:
> [snip]
> 
> > In V2 the runtime enable/disable switching capability has been removed
> > (for now) since still not deemed to be stable/reliable enough: as a
> > consequence when SCMI Raw support is compiled in, the regular SCMI stack
> > drivers are now inhibited permanently for that Kernel.
> 
> For our platforms (ARCH_BRCMSTB) we would need to have the ability to start
> with the regular SCMI stack to satisfy if nothing else, all clock consumers
> otherwise it makes it fairly challenging for us to boot to a prompt as we
> purposely turn off all unnecessary peripherals to conserve power. We could
> introduce a "full on" mode to remove the clock provider dependency, but I
> suspect others on "real" silicon may suffer from the same short comings.
>

Fair enough. But if we are doing SCMI firmware testing or conformance via
the $subject proposed way, can these drivers survive if the userspace do
a random or a torture test changing the clock configurations ? Not sure
how to deal with that as the intention here is to do the testing from the
user-space and anything can happen. How do we avoid bring the entire system
down while doing this testing. Can we unbind all the drivers using scmi on
your platform ? I guess no. Let me know.

> Once user-space is reached, I suppose we could find a way to unbind from all
> SCMI consumers, and/or ensure that runtime PM is disabled, cpufreq is in a
> governor that won't do any active frequency switching etc.
>
> What do you think?

Yes, Cristian always wanted to support that but I am the one trying to
convince him not to unless there is a strong requirement for it. You seem
to suggest that you have such a requirement, but that just opens loads of
questions and how to we deal with that. Few of them are as stated above, I
need to recall all the conversations I had with Cristian around that and why
handling it may be bit complex.

--
Regards,
Sudeep

Cristian Marussi Nov. 3, 2022, 12:06 p.m. UTC | #13

On Thu, Nov 03, 2022 at 11:21:47AM +0000, Sudeep Holla wrote:
> On Fri, Oct 28, 2022 at 07:38:25PM -0700, Florian Fainelli wrote:
> > Hi Christian,
> > 
> > On 10/19/2022 1:46 PM, Cristian Marussi wrote:
> > [snip]
> > 
> > > In V2 the runtime enable/disable switching capability has been removed
> > > (for now) since still not deemed to be stable/reliable enough: as a
> > > consequence when SCMI Raw support is compiled in, the regular SCMI stack
> > > drivers are now inhibited permanently for that Kernel.
> > 
> > For our platforms (ARCH_BRCMSTB) we would need to have the ability to start
> > with the regular SCMI stack to satisfy if nothing else, all clock consumers
> > otherwise it makes it fairly challenging for us to boot to a prompt as we
> > purposely turn off all unnecessary peripherals to conserve power. We could
> > introduce a "full on" mode to remove the clock provider dependency, but I
> > suspect others on "real" silicon may suffer from the same short comings.
> >
> 
> Fair enough. But if we are doing SCMI firmware testing or conformance via
> the $subject proposed way, can these drivers survive if the userspace do
> a random or a torture test changing the clock configurations ? Not sure
> how to deal with that as the intention here is to do the testing from the
> user-space and anything can happen. How do we avoid bring the entire system
> down while doing this testing. Can we unbind all the drivers using scmi on
> your platform ? I guess no. Let me know.
> 
> > Once user-space is reached, I suppose we could find a way to unbind from all
> > SCMI consumers, and/or ensure that runtime PM is disabled, cpufreq is in a
> > governor that won't do any active frequency switching etc.
> >
> > What do you think?
> 
> Yes, Cristian always wanted to support that but I am the one trying to
> convince him not to unless there is a strong requirement for it. You seem
> to suggest that you have such a requirement, but that just opens loads of
> questions and how to we deal with that. Few of them are as stated above, I
> need to recall all the conversations I had with Cristian around that and why
> handling it may be bit complex.

:D ... I really even more like the idea of enabling on demand full coexistence
so that I completely delegate to the users to manually deal with possible
interferences at runtime and drop any liabilities if someone shoots himself
in the foot :P

... jokes apart I'll post today a V5 with a few fixes and and an optional
coexistence mode so that Florian can experiment and see how much is feasible
to operate in this way by manually unbinding/re-configuring SCMI behaviour
at runtime before starting tests and not kill the system on something
like ARCH_BRCMSTB platforms.

Thanks,
Cristian