spi: spi-mem: add statistics support to ->exec_op() calls

Message ID 20240209-spi-mem-stats-v1-1-dd1a422fc015@bootlin.com
State New
Headers
Series spi: spi-mem: add statistics support to ->exec_op() calls |

Commit Message

Théo Lebrun Feb. 9, 2024, 1:51 p.m. UTC
  Current behavior is that spi-mem operations do not increment statistics,
neither per-controller nor per-device, if ->exec_op() is used. For
operations that do NOT use ->exec_op(), stats are increased as the
usual spi_sync() is called.

The newly implemented spi_mem_add_op_stats() function is strongly
inspired by spi_statistics_add_transfer_stats(); locking logic and
l2len computation comes from there.

Statistics that are being filled: bytes{,_rx,_tx}, messages, transfers,
errors, timedout, transfer_bytes_histo_*.

Note about messages & transfers counters: in the fallback to spi_sync()
case, there are from 1 to 4 transfers per message. We only register one
big transfer in the ->exec_op() case as that is closer to reality.

This patch is NOT touching:
 - spi_async, spi_sync, spi_sync_immediate: those counters describe
   precise function calls, incrementing them would be lying. I believe
   comparing the messages counter to spi_async+spi_sync is a good way
   to detect ->exec_op() calls, but I might be missing edge cases
   knowledge.
 - transfers_split_maxsize: splitting cannot happen if ->exec_op() is
   provided.

Testing this patch:

   $ cd /sys/devices/platform/soc
   $ find . -type d -path "*spi*" -name statistics
   ./2100000.spi/spi_master/spi0/statistics
   ./2100000.spi/spi_master/spi0/spi0.0/statistics
   $ cd ./2100000.spi/spi_master/spi0/statistics

   $ for f in *; do printf "%s\t" $f; cat $f; done | \
         grep -v transfer_bytes_histo | column -t
   bytes                    240745444
   bytes_rx                 240170907
   bytes_tx                 126320
   errors                   0
   messages                 97354
   spi_async                0
   spi_sync                 0
   spi_sync_immediate       0
   timedout                 0
   transfers                97354
   transfers_split_maxsize  0

Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
---
 drivers/spi/spi-mem.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)


---
base-commit: 19b50f80b3a4865bd477aa5c026dd234d39a50d2
change-id: 20240209-spi-mem-stats-ff9bf91c0f7e

Best regards,
  

Comments

Dhruva Gole Feb. 12, 2024, 11:13 a.m. UTC | #1
Hi!

On Feb 09, 2024 at 14:51:23 +0100, Théo Lebrun wrote:
> Current behavior is that spi-mem operations do not increment statistics,
> neither per-controller nor per-device, if ->exec_op() is used. For
> operations that do NOT use ->exec_op(), stats are increased as the
> usual spi_sync() is called.
> 
> The newly implemented spi_mem_add_op_stats() function is strongly
> inspired by spi_statistics_add_transfer_stats(); locking logic and
> l2len computation comes from there.
> 
> Statistics that are being filled: bytes{,_rx,_tx}, messages, transfers,
> errors, timedout, transfer_bytes_histo_*.
> 
> Note about messages & transfers counters: in the fallback to spi_sync()
> case, there are from 1 to 4 transfers per message. We only register one
> big transfer in the ->exec_op() case as that is closer to reality.

Can you please elaborate on this point a bit? To me it feels then that
the reported stats in this case will be less than the true value then?

> 
> This patch is NOT touching:
>  - spi_async, spi_sync, spi_sync_immediate: those counters describe
>    precise function calls, incrementing them would be lying. I believe
>    comparing the messages counter to spi_async+spi_sync is a good way
>    to detect ->exec_op() calls, but I might be missing edge cases
>    knowledge.
>  - transfers_split_maxsize: splitting cannot happen if ->exec_op() is
>    provided.

Credit where it's due - This is a very well written and verbose commit
message.

Just my personal opinion maybe but all this data about testing can go
below the tear line in the description?

Or somewhere in the kernel docs would also be just fine. (I know we
kernel developers consider git log as the best source of documentation
:) ) but still.. if you feel like adding ;)

No strong opinions there though.

> 
> Testing this patch:
> 
>    $ cd /sys/devices/platform/soc
>    $ find . -type d -path "*spi*" -name statistics
>    ./2100000.spi/spi_master/spi0/statistics
>    ./2100000.spi/spi_master/spi0/spi0.0/statistics
>    $ cd ./2100000.spi/spi_master/spi0/statistics
> 
>    $ for f in *; do printf "%s\t" $f; cat $f; done | \
>          grep -v transfer_bytes_histo | column -t
>    bytes                    240745444
>    bytes_rx                 240170907
>    bytes_tx                 126320
>    errors                   0
>    messages                 97354
>    spi_async                0
>    spi_sync                 0
>    spi_sync_immediate       0
>    timedout                 0
>    transfers                97354
>    transfers_split_maxsize  0
> 
> Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
> ---
>  drivers/spi/spi-mem.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
> index 2dc8ceb85374..171fe6b1c247 100644
> --- a/drivers/spi/spi-mem.c
> +++ b/drivers/spi/spi-mem.c
> @@ -297,6 +297,50 @@ static void spi_mem_access_end(struct spi_mem *mem)
>  		pm_runtime_put(ctlr->dev.parent);
>  }
>  
> +static void spi_mem_add_op_stats(struct spi_statistics __percpu *pcpu_stats,
> +				 const struct spi_mem_op *op, int exec_op_ret)
> +{
> +	struct spi_statistics *stats;
> +	int len, l2len;
> +
> +	get_cpu();
> +	stats = this_cpu_ptr(pcpu_stats);
> +	u64_stats_update_begin(&stats->syncp);
> +
> +	/*
> +	 * We do not have the concept of messages or transfers. Let's consider
> +	 * that one operation is equivalent to one message and one transfer.

Why 1 message _and_ 1 xfer and not simply 1 xfer?
Even in the example of testing that you showed above the values for
message and xfer are anyway going to be same, then why have these 2
members in the first place? Can we not do away with one of these?

Sorry but I am not too familiar with u64_stats_inc so my q. may sound
silly.

Seems to be more of a concept widely used in networking side of the
kernel.

> +	 */
> +	u64_stats_inc(&stats->messages);
> +	u64_stats_inc(&stats->transfers);
> +
> +	/* Use the sum of all lengths as bytes count and histogram value. */
> +	len = (int)op->cmd.nbytes + (int)op->addr.nbytes;
> +	len += (int)op->dummy.nbytes + (int)op->data.nbytes;
> +	u64_stats_add(&stats->bytes, len);
> +	l2len = min(fls(len), SPI_STATISTICS_HISTO_SIZE) - 1;
> +	l2len = max(l2len, 0);
> +	u64_stats_inc(&stats->transfer_bytes_histo[l2len]);
> +
> +	/* Only account for data bytes as xferred bytes. */
> +	if (op->data.nbytes && op->data.dir == SPI_MEM_DATA_OUT)
> +		u64_stats_add(&stats->bytes_tx, op->data.nbytes);
> +	if (op->data.nbytes && op->data.dir == SPI_MEM_DATA_IN)
> +		u64_stats_add(&stats->bytes_rx, op->data.nbytes);
> +
> +	/*
> +	 * A timeout is not an error, following the same behavior as
> +	 * spi_transfer_one_message().
> +	 */
> +	if (exec_op_ret == -ETIMEDOUT)
> +		u64_stats_inc(&stats->timedout);
> +	else if (exec_op_ret)
> +		u64_stats_inc(&stats->errors);
> +
> +	u64_stats_update_end(&stats->syncp);
> +	put_cpu();
> +}
> +
>  /**
>   * spi_mem_exec_op() - Execute a memory operation
>   * @mem: the SPI memory
> @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
>  		 * read path) and expect the core to use the regular SPI
>  		 * interface in other cases.
>  		 */
> -		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP)
> +		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) {
> +			spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret);
> +			spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret);
> +

Just curious, how much does this impact performance? Have you been able
to do some before / after profiling with/out this patch?

For eg. for every single spimem op I'm constantly going to incur the
penalty of these calls right?

Just wondering if we can / should make this optional to have the
op_stats. If there is a perf penalty, like if my ospi operations start
being impacted by these calls then I may not be okay with this patch.

But if you have tested and not found it to be the case I am okay with
these changes.

If I find some time later, I'll try to test but I'm caught up with some
other work. For now I'll leave my R-by with the above conditions
addressed / answered.

Mostly LGTM,

Reviewed-by: Dhruva Gole <d-gole@ti.com>
  
Mark Brown Feb. 12, 2024, 1:35 p.m. UTC | #2
On Mon, Feb 12, 2024 at 04:43:55PM +0530, Dhruva Gole wrote:
> On Feb 09, 2024 at 14:51:23 +0100, Théo Lebrun wrote:

> > +	/*
> > +	 * We do not have the concept of messages or transfers. Let's consider
> > +	 * that one operation is equivalent to one message and one transfer.

> Why 1 message _and_ 1 xfer and not simply 1 xfer?
> Even in the example of testing that you showed above the values for
> message and xfer are anyway going to be same, then why have these 2
> members in the first place? Can we not do away with one of these?

If the device supports regular SPI operations as well as spi-mem
operations then this will ensure that the spi-mem stats fit in with the
other operations.  If it only supports spi-mem operations it would not
matter so much.
  
Théo Lebrun Feb. 12, 2024, 2:22 p.m. UTC | #3
Hello,

On Mon Feb 12, 2024 at 12:13 PM CET, Dhruva Gole wrote:
> Hi!
>
> On Feb 09, 2024 at 14:51:23 +0100, Théo Lebrun wrote:
> > Current behavior is that spi-mem operations do not increment statistics,
> > neither per-controller nor per-device, if ->exec_op() is used. For
> > operations that do NOT use ->exec_op(), stats are increased as the
> > usual spi_sync() is called.
> > 
> > The newly implemented spi_mem_add_op_stats() function is strongly
> > inspired by spi_statistics_add_transfer_stats(); locking logic and
> > l2len computation comes from there.
> > 
> > Statistics that are being filled: bytes{,_rx,_tx}, messages, transfers,
> > errors, timedout, transfer_bytes_histo_*.
> > 
> > Note about messages & transfers counters: in the fallback to spi_sync()
> > case, there are from 1 to 4 transfers per message. We only register one
> > big transfer in the ->exec_op() case as that is closer to reality.
>
> Can you please elaborate on this point a bit? To me it feels then that
> the reported stats in this case will be less than the true value then?

To me, a transfer is one transaction with the SPI controller. In most
implementations of ->exec_op(), the controller gets configured once for
the full transfer to take place. This contrasts with the fallback case
that does from 1 to 4 transfers (cmd, addr, dummy & data, with the last
three being optional).

One transfer feels closer to what happens from my point-of-view. What
would be your definition of a transfer? Or the "official" one that the
sysfs entry represents?

> > This patch is NOT touching:
> >  - spi_async, spi_sync, spi_sync_immediate: those counters describe
> >    precise function calls, incrementing them would be lying. I believe
> >    comparing the messages counter to spi_async+spi_sync is a good way
> >    to detect ->exec_op() calls, but I might be missing edge cases
> >    knowledge.
> >  - transfers_split_maxsize: splitting cannot happen if ->exec_op() is
> >    provided.
>
> Credit where it's due - This is a very well written and verbose commit
> message.

Thanks!

> Just my personal opinion maybe but all this data about testing can go
> below the tear line in the description?

I see where you are coming from. I'll do so on the next revision (if
there is one).

> Or somewhere in the kernel docs would also be just fine. (I know we
> kernel developers consider git log as the best source of documentation
> :) ) but still.. if you feel like adding ;)

A first step would be to have the sysfs SPI statistics API be described
inside Documentation/. That is outside the scope of this patch
though. :-)

> No strong opinions there though.

Same.

[...]

> > +static void spi_mem_add_op_stats(struct spi_statistics __percpu *pcpu_stats,
> > +				 const struct spi_mem_op *op, int exec_op_ret)
> > +{
> > +	struct spi_statistics *stats;
> > +	int len, l2len;
> > +
> > +	get_cpu();
> > +	stats = this_cpu_ptr(pcpu_stats);
> > +	u64_stats_update_begin(&stats->syncp);
> > +
> > +	/*
> > +	 * We do not have the concept of messages or transfers. Let's consider
> > +	 * that one operation is equivalent to one message and one transfer.
>
> Why 1 message _and_ 1 xfer and not simply 1 xfer?
> Even in the example of testing that you showed above the values for
> message and xfer are anyway going to be same, then why have these 2
> members in the first place? Can we not do away with one of these?

Mark Brown gave an answer to this. Indeed, with regular SPI operations,
one message doesn't map to one transfer.

[...]

> >  /**
> >   * spi_mem_exec_op() - Execute a memory operation
> >   * @mem: the SPI memory
> > @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
> >  		 * read path) and expect the core to use the regular SPI
> >  		 * interface in other cases.
> >  		 */
> > -		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP)
> > +		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) {
> > +			spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret);
> > +			spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret);
> > +
>
> Just curious, how much does this impact performance? Have you been able
> to do some before / after profiling with/out this patch?
>
> For eg. for every single spimem op I'm constantly going to incur the
> penalty of these calls right?

I have indeed done some benchmarking. I was not able to measure anything
significant. Neither doing timings of end-to-end testing by reading
loads of data over UBIFS, nor by using ftrace's function_graph.

> Just wondering if we can / should make this optional to have the
> op_stats. If there is a perf penalty, like if my ospi operations start
> being impacted by these calls then I may not be okay with this patch.

I've asked myself the same question. It is being done unconditionally on
regular SPI ops, so I guess the question has been answered previously:
no need to make this conditional.

See spi_statistics_add_transfer_stats() in drivers/spi/spi.c.

> But if you have tested and not found it to be the case I am okay with
> these changes.
>
> If I find some time later, I'll try to test but I'm caught up with some
> other work. For now I'll leave my R-by with the above conditions
> addressed / answered.
>
> Mostly LGTM,
>
> Reviewed-by: Dhruva Gole <d-gole@ti.com>

Thanks for your review! I don't regret adding you to the Cc list.

Regards,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
  
Dhruva Gole Feb. 13, 2024, 4:54 a.m. UTC | #4
Hi,

On Feb 12, 2024 at 15:22:41 +0100, Théo Lebrun wrote:
> Hello,
> 
> On Mon Feb 12, 2024 at 12:13 PM CET, Dhruva Gole wrote:
> > Hi!
> >
> > On Feb 09, 2024 at 14:51:23 +0100, Théo Lebrun wrote:
> > > Current behavior is that spi-mem operations do not increment statistics,
> > > neither per-controller nor per-device, if ->exec_op() is used. For
> > > operations that do NOT use ->exec_op(), stats are increased as the
> > > usual spi_sync() is called.
> > > 
> > > The newly implemented spi_mem_add_op_stats() function is strongly
> > > inspired by spi_statistics_add_transfer_stats(); locking logic and
> > > l2len computation comes from there.
> > > 
> > > Statistics that are being filled: bytes{,_rx,_tx}, messages, transfers,
> > > errors, timedout, transfer_bytes_histo_*.
> > > 
> > > Note about messages & transfers counters: in the fallback to spi_sync()
> > > case, there are from 1 to 4 transfers per message. We only register one
> > > big transfer in the ->exec_op() case as that is closer to reality.
> >
> > Can you please elaborate on this point a bit? To me it feels then that
> > the reported stats in this case will be less than the true value then?
> 
> To me, a transfer is one transaction with the SPI controller. In most
> implementations of ->exec_op(), the controller gets configured once for
> the full transfer to take place. This contrasts with the fallback case
> that does from 1 to 4 transfers (cmd, addr, dummy & data, with the last
> three being optional).
> 
> One transfer feels closer to what happens from my point-of-view. What
> would be your definition of a transfer? Or the "official" one that the
> sysfs entry represents?

Yeah I understand your point, this is something I'd also call as a transaction

> 
> > > This patch is NOT touching:
> > >  - spi_async, spi_sync, spi_sync_immediate: those counters describe
> > >    precise function calls, incrementing them would be lying. I believe
> > >    comparing the messages counter to spi_async+spi_sync is a good way
> > >    to detect ->exec_op() calls, but I might be missing edge cases
> > >    knowledge.
> > >  - transfers_split_maxsize: splitting cannot happen if ->exec_op() is
> > >    provided.
> >
> > Credit where it's due - This is a very well written and verbose commit
> > message.
> 
> Thanks!
> 
> > Just my personal opinion maybe but all this data about testing can go
> > below the tear line in the description?
> 
> I see where you are coming from. I'll do so on the next revision (if
> there is one).

cool!

> 
> > Or somewhere in the kernel docs would also be just fine. (I know we
> > kernel developers consider git log as the best source of documentation
> > :) ) but still.. if you feel like adding ;)
> 
> A first step would be to have the sysfs SPI statistics API be described
> inside Documentation/. That is outside the scope of this patch
> though. :-)
> 
> > No strong opinions there though.
> 
> Same.
> 
> [...]
> 
> > > +static void spi_mem_add_op_stats(struct spi_statistics __percpu *pcpu_stats,
> > > +				 const struct spi_mem_op *op, int exec_op_ret)
> > > +{
> > > +	struct spi_statistics *stats;
> > > +	int len, l2len;
> > > +
> > > +	get_cpu();
> > > +	stats = this_cpu_ptr(pcpu_stats);
> > > +	u64_stats_update_begin(&stats->syncp);
> > > +
> > > +	/*
> > > +	 * We do not have the concept of messages or transfers. Let's consider
> > > +	 * that one operation is equivalent to one message and one transfer.
> >
> > Why 1 message _and_ 1 xfer and not simply 1 xfer?
> > Even in the example of testing that you showed above the values for
> > message and xfer are anyway going to be same, then why have these 2
> > members in the first place? Can we not do away with one of these?
> 
> Mark Brown gave an answer to this. Indeed, with regular SPI operations,
> one message doesn't map to one transfer.

Thanks for explaining Mark, understood.

> 
> [...]
> 
> > >  /**
> > >   * spi_mem_exec_op() - Execute a memory operation
> > >   * @mem: the SPI memory
> > > @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
> > >  		 * read path) and expect the core to use the regular SPI
> > >  		 * interface in other cases.
> > >  		 */
> > > -		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP)
> > > +		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) {
> > > +			spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret);
> > > +			spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret);
> > > +
> >
> > Just curious, how much does this impact performance? Have you been able
> > to do some before / after profiling with/out this patch?
> >
> > For eg. for every single spimem op I'm constantly going to incur the
> > penalty of these calls right?
> 
> I have indeed done some benchmarking. I was not able to measure anything
> significant. Neither doing timings of end-to-end testing by reading
> loads of data over UBIFS, nor by using ftrace's function_graph.

Awesome.

> 
> > Just wondering if we can / should make this optional to have the
> > op_stats. If there is a perf penalty, like if my ospi operations start
> > being impacted by these calls then I may not be okay with this patch.
> 
> I've asked myself the same question. It is being done unconditionally on
> regular SPI ops, so I guess the question has been answered previously:
> no need to make this conditional.
> 
> See spi_statistics_add_transfer_stats() in drivers/spi/spi.c.

Yeah I did see that, but maybe it didn't occur then whether we should
make it optional.

Anyway, I'm ok with this too. Let's worry about optional in future if
required.

> 
> > But if you have tested and not found it to be the case I am okay with
> > these changes.
> >
> > If I find some time later, I'll try to test but I'm caught up with some
> > other work. For now I'll leave my R-by with the above conditions
> > addressed / answered.
> >
> > Mostly LGTM,
> >
> > Reviewed-by: Dhruva Gole <d-gole@ti.com>
> 
> Thanks for your review! I don't regret adding you to the Cc list.
> 

Cheers!
  
Tudor Ambarus Feb. 13, 2024, 12:39 p.m. UTC | #5
On 2/9/24 13:51, Théo Lebrun wrote:
> Current behavior is that spi-mem operations do not increment statistics,
> neither per-controller nor per-device, if ->exec_op() is used. For
> operations that do NOT use ->exec_op(), stats are increased as the
> usual spi_sync() is called.
> 
> The newly implemented spi_mem_add_op_stats() function is strongly
> inspired by spi_statistics_add_transfer_stats(); locking logic and
> l2len computation comes from there.
> 
> Statistics that are being filled: bytes{,_rx,_tx}, messages, transfers,
> errors, timedout, transfer_bytes_histo_*.
> 
> Note about messages & transfers counters: in the fallback to spi_sync()
> case, there are from 1 to 4 transfers per message. We only register one
> big transfer in the ->exec_op() case as that is closer to reality.
> 
> This patch is NOT touching:
>  - spi_async, spi_sync, spi_sync_immediate: those counters describe
>    precise function calls, incrementing them would be lying. I believe
>    comparing the messages counter to spi_async+spi_sync is a good way
>    to detect ->exec_op() calls, but I might be missing edge cases
>    knowledge.
>  - transfers_split_maxsize: splitting cannot happen if ->exec_op() is
>    provided.
> 
> Testing this patch:
> 
>    $ cd /sys/devices/platform/soc
>    $ find . -type d -path "*spi*" -name statistics
>    ./2100000.spi/spi_master/spi0/statistics
>    ./2100000.spi/spi_master/spi0/spi0.0/statistics
>    $ cd ./2100000.spi/spi_master/spi0/statistics
> 
>    $ for f in *; do printf "%s\t" $f; cat $f; done | \
>          grep -v transfer_bytes_histo | column -t
>    bytes                    240745444
>    bytes_rx                 240170907
>    bytes_tx                 126320
>    errors                   0
>    messages                 97354
>    spi_async                0
>    spi_sync                 0
>    spi_sync_immediate       0
>    timedout                 0
>    transfers                97354
>    transfers_split_maxsize  0
> 
> Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
> ---
>  drivers/spi/spi-mem.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
> index 2dc8ceb85374..171fe6b1c247 100644
> --- a/drivers/spi/spi-mem.c
> +++ b/drivers/spi/spi-mem.c
> @@ -297,6 +297,50 @@ static void spi_mem_access_end(struct spi_mem *mem)
>  		pm_runtime_put(ctlr->dev.parent);
>  }
>  
> +static void spi_mem_add_op_stats(struct spi_statistics __percpu *pcpu_stats,
> +				 const struct spi_mem_op *op, int exec_op_ret)
> +{
> +	struct spi_statistics *stats;
> +	int len, l2len;
> +
> +	get_cpu();
> +	stats = this_cpu_ptr(pcpu_stats);
> +	u64_stats_update_begin(&stats->syncp);
> +
> +	/*
> +	 * We do not have the concept of messages or transfers. Let's consider
> +	 * that one operation is equivalent to one message and one transfer.
> +	 */
> +	u64_stats_inc(&stats->messages);
> +	u64_stats_inc(&stats->transfers);
> +
> +	/* Use the sum of all lengths as bytes count and histogram value. */
> +	len = (int)op->cmd.nbytes + (int)op->addr.nbytes;
> +	len += (int)op->dummy.nbytes + (int)op->data.nbytes;

spi_mem_check_op() makes sure that op->cmd.nbytes != 0, otherwise it
returns -EINVAL ...

> +	u64_stats_add(&stats->bytes, len);
> +	l2len = min(fls(len), SPI_STATISTICS_HISTO_SIZE) - 1;

.. thus l2len can never be negative. You can declare len and l2len as
u64. The casts from above shall disappear.

> +	l2len = max(l2len, 0);
> +	u64_stats_inc(&stats->transfer_bytes_histo[l2len]);
> +
> +	/* Only account for data bytes as xferred bytes. */

s/xferred/transferred?

> +	if (op->data.nbytes && op->data.dir == SPI_MEM_DATA_OUT)
> +		u64_stats_add(&stats->bytes_tx, op->data.nbytes);
> +	if (op->data.nbytes && op->data.dir == SPI_MEM_DATA_IN)
> +		u64_stats_add(&stats->bytes_rx, op->data.nbytes);
> +
> +	/*
> +	 * A timeout is not an error, following the same behavior as
> +	 * spi_transfer_one_message().
> +	 */
> +	if (exec_op_ret == -ETIMEDOUT)
> +		u64_stats_inc(&stats->timedout);
> +	else if (exec_op_ret)
> +		u64_stats_inc(&stats->errors);
> +
> +	u64_stats_update_end(&stats->syncp);
> +	put_cpu();
> +}
> +
>  /**
>   * spi_mem_exec_op() - Execute a memory operation
>   * @mem: the SPI memory
> @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
>  		 * read path) and expect the core to use the regular SPI
>  		 * interface in other cases.
>  		 */
> -		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP)
> +		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) {
> +			spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret);
> +			spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret);
> +

Would be good to be able to opt out the statistics if one wants it.

SPI NORs can write with a single write op maximum page_size bytes, which
is typically 256 bytes. And since there are SPI NORs that can run at 400
MHz, I guess some performance penalty shouldn't be excluded.

>  			return ret;
> +		}
>  	}
>  
>  	tmpbufsize = op->cmd.nbytes + op->addr.nbytes + op->dummy.nbytes;
> 
> ---
> base-commit: 19b50f80b3a4865bd477aa5c026dd234d39a50d2
> change-id: 20240209-spi-mem-stats-ff9bf91c0f7e
> 
> Best regards,
  
Théo Lebrun Feb. 13, 2024, 3 p.m. UTC | #6
Hello Tudor,

On Tue Feb 13, 2024 at 1:39 PM CET, Tudor Ambarus wrote:
> >  /**
> >   * spi_mem_exec_op() - Execute a memory operation
> >   * @mem: the SPI memory
> > @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
> >  		 * read path) and expect the core to use the regular SPI
> >  		 * interface in other cases.
> >  		 */
> > -		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP)
> > +		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) {
> > +			spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret);
> > +			spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret);
> > +
>
> Would be good to be able to opt out the statistics if one wants it.
>
> SPI NORs can write with a single write op maximum page_size bytes, which
> is typically 256 bytes. And since there are SPI NORs that can run at 400
> MHz, I guess some performance penalty shouldn't be excluded.

I did my testing on a 40 MHz octal SPI NOR with most reads being much
bigger than 256 bytes, so I probably didn't have the fastest setup
indeed.

What shape would that take? A spi-mem DT prop? New field in the SPI
statistics sysfs directory?

Other remarks have been taken into account, thanks!

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
  
Mark Brown Feb. 13, 2024, 3:28 p.m. UTC | #7
On Tue, Feb 13, 2024 at 12:39:02PM +0000, Tudor Ambarus wrote:
> On 2/9/24 13:51, Théo Lebrun wrote:

> > +		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) {
> > +			spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret);
> > +			spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret);
> > +

> Would be good to be able to opt out the statistics if one wants it.

> SPI NORs can write with a single write op maximum page_size bytes, which
> is typically 256 bytes. And since there are SPI NORs that can run at 400
> MHz, I guess some performance penalty shouldn't be excluded.

If we can cope with this sort of statistics collection in the networking
fast path we can probably cope with it for SPI too, the immediate
recording is all per CPU so I'd like to see some numbers showing that
it's a problem before worrying about it too much.  Even the people doing
things like saturating CAN buses haven't been raising it as a concern
for the regular SPI data path.  We could add a Kconfig if it's an issue.
  
Tudor Ambarus Feb. 14, 2024, 7:42 a.m. UTC | #8
On 2/13/24 15:28, Mark Brown wrote:
> On Tue, Feb 13, 2024 at 12:39:02PM +0000, Tudor Ambarus wrote:
>> On 2/9/24 13:51, Théo Lebrun wrote:
> 
>>> +		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) {
>>> +			spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret);
>>> +			spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret);
>>> +
> 
>> Would be good to be able to opt out the statistics if one wants it.
> 
>> SPI NORs can write with a single write op maximum page_size bytes, which
>> is typically 256 bytes. And since there are SPI NORs that can run at 400
>> MHz, I guess some performance penalty shouldn't be excluded.
> 
> If we can cope with this sort of statistics collection in the networking
> fast path we can probably cope with it for SPI too, the immediate
> recording is all per CPU so I'd like to see some numbers showing that
> it's a problem before worrying about it too much.  Even the people doing
> things like saturating CAN buses haven't been raising it as a concern
> for the regular SPI data path.  We could add a Kconfig if it's an issue.

Ok, we can deal with it afterwards if it'll become an issue.
  
Tudor Ambarus Feb. 14, 2024, 8 a.m. UTC | #9
On 2/13/24 15:00, Théo Lebrun wrote:
> Hello Tudor,

Hi!

> 
> On Tue Feb 13, 2024 at 1:39 PM CET, Tudor Ambarus wrote:
>>>  /**
>>>   * spi_mem_exec_op() - Execute a memory operation
>>>   * @mem: the SPI memory
>>> @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
>>>  		 * read path) and expect the core to use the regular SPI
>>>  		 * interface in other cases.
>>>  		 */
>>> -		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP)
>>> +		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) {
>>> +			spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret);
>>> +			spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret);
>>> +
>>
>> Would be good to be able to opt out the statistics if one wants it.
>>
>> SPI NORs can write with a single write op maximum page_size bytes, which
>> is typically 256 bytes. And since there are SPI NORs that can run at 400
>> MHz, I guess some performance penalty shouldn't be excluded.
> 
> I did my testing on a 40 MHz octal SPI NOR with most reads being much
> bigger than 256 bytes, so I probably didn't have the fastest setup
> indeed.

yeah, reads are bigger, the entire flash can be read with a single read op.

> 
> What shape would that take? A spi-mem DT prop? New field in the SPI
> statistics sysfs directory?
> 

I think I'd go with a sysfs entry, it provides flexibility. But I guess
we can worry about this if we have some numbers, and I don't have, so
you're fine even without the opt-out option.

> Other remarks have been taken into account, thanks!
> 

Ok, thanks.
Cheers,
ta
  
Théo Lebrun Feb. 14, 2024, 8:51 a.m. UTC | #10
Hello,

On Wed Feb 14, 2024 at 9:00 AM CET, Tudor Ambarus wrote:
> On 2/13/24 15:00, Théo Lebrun wrote:
> > On Tue Feb 13, 2024 at 1:39 PM CET, Tudor Ambarus wrote:
> >>>  /**
> >>>   * spi_mem_exec_op() - Execute a memory operation
> >>>   * @mem: the SPI memory
> >>> @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
> >>>  		 * read path) and expect the core to use the regular SPI
> >>>  		 * interface in other cases.
> >>>  		 */
> >>> -		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP)
> >>> +		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) {
> >>> +			spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret);
> >>> +			spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret);
> >>> +
> >>
> >> Would be good to be able to opt out the statistics if one wants it.
> >>
> >> SPI NORs can write with a single write op maximum page_size bytes, which
> >> is typically 256 bytes. And since there are SPI NORs that can run at 400
> >> MHz, I guess some performance penalty shouldn't be excluded.
> > 
> > I did my testing on a 40 MHz octal SPI NOR with most reads being much
> > bigger than 256 bytes, so I probably didn't have the fastest setup
> > indeed.
>
> yeah, reads are bigger, the entire flash can be read with a single read op.
>
> > 
> > What shape would that take? A spi-mem DT prop? New field in the SPI
> > statistics sysfs directory?
> > 
>
> I think I'd go with a sysfs entry, it provides flexibility. But I guess
> we can worry about this if we have some numbers, and I don't have, so
> you're fine even without the opt-out option.

Some ftrace numbers:
- 48002 calls to spi_mem_add_op_stats();
- min 1.053000µs;
- avg 1.175652µs;
- max 16.272000µs.

Platform is Mobileye EyeQ5. Cores are Imagine Technologies I6500-F. I
don't know the precision of our timer but we might be getting close to
what is measurable.

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
  
Tudor Ambarus Feb. 14, 2024, 9:29 a.m. UTC | #11
On 2/14/24 08:51, Théo Lebrun wrote:
> Hello,

Hi, Théo,
> 
> On Wed Feb 14, 2024 at 9:00 AM CET, Tudor Ambarus wrote:
>> On 2/13/24 15:00, Théo Lebrun wrote:
>>> On Tue Feb 13, 2024 at 1:39 PM CET, Tudor Ambarus wrote:
>>>>>  /**
>>>>>   * spi_mem_exec_op() - Execute a memory operation
>>>>>   * @mem: the SPI memory
>>>>> @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
>>>>>  		 * read path) and expect the core to use the regular SPI
>>>>>  		 * interface in other cases.
>>>>>  		 */
>>>>> -		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP)
>>>>> +		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) {
>>>>> +			spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret);
>>>>> +			spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret);
>>>>> +
>>>>
>>>> Would be good to be able to opt out the statistics if one wants it.
>>>>
>>>> SPI NORs can write with a single write op maximum page_size bytes, which
>>>> is typically 256 bytes. And since there are SPI NORs that can run at 400
>>>> MHz, I guess some performance penalty shouldn't be excluded.
>>>
>>> I did my testing on a 40 MHz octal SPI NOR with most reads being much
>>> bigger than 256 bytes, so I probably didn't have the fastest setup
>>> indeed.
>>
>> yeah, reads are bigger, the entire flash can be read with a single read op.
>>
>>>
>>> What shape would that take? A spi-mem DT prop? New field in the SPI
>>> statistics sysfs directory?
>>>
>>
>> I think I'd go with a sysfs entry, it provides flexibility. But I guess
>> we can worry about this if we have some numbers, and I don't have, so
>> you're fine even without the opt-out option.
> 
> Some ftrace numbers:
> - 48002 calls to spi_mem_add_op_stats();
> - min 1.053000µs;
> - avg 1.175652µs;
> - max 16.272000µs.
> 
> Platform is Mobileye EyeQ5. Cores are Imagine Technologies I6500-F. I
> don't know the precision of our timer but we might be getting close to
> what is measurable.
> 
Thanks.

I took a random SPI NOR flash [1], its page program typical time is 64µs
according to its SFDP data. We'll have to add here the delay the
software handling takes.

If you want to play a bit more, you can write the entire flash then
compare the ftrace numbers of spi_mem_add_op_stats() with spi_nor_write().

Cheers,
ta

[1] https://ww1.microchip.com/downloads/en/DeviceDoc/20005119G.pdf
  
Théo Lebrun Feb. 14, 2024, 10:59 a.m. UTC | #12
On Wed Feb 14, 2024 at 10:29 AM CET, Tudor Ambarus wrote:
> On 2/14/24 08:51, Théo Lebrun wrote:
> > On Wed Feb 14, 2024 at 9:00 AM CET, Tudor Ambarus wrote:
> >> On 2/13/24 15:00, Théo Lebrun wrote:
> >>> On Tue Feb 13, 2024 at 1:39 PM CET, Tudor Ambarus wrote:
> >>>>>  /**
> >>>>>   * spi_mem_exec_op() - Execute a memory operation
> >>>>>   * @mem: the SPI memory
> >>>>> @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
> >>>>>  		 * read path) and expect the core to use the regular SPI
> >>>>>  		 * interface in other cases.
> >>>>>  		 */
> >>>>> -		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP)
> >>>>> +		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) {
> >>>>> +			spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret);
> >>>>> +			spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret);
> >>>>> +
> >>>>
> >>>> Would be good to be able to opt out the statistics if one wants it.
> >>>>
> >>>> SPI NORs can write with a single write op maximum page_size bytes, which
> >>>> is typically 256 bytes. And since there are SPI NORs that can run at 400
> >>>> MHz, I guess some performance penalty shouldn't be excluded.
> >>>
> >>> I did my testing on a 40 MHz octal SPI NOR with most reads being much
> >>> bigger than 256 bytes, so I probably didn't have the fastest setup
> >>> indeed.
> >>
> >> yeah, reads are bigger, the entire flash can be read with a single read op.
> >>
> >>>
> >>> What shape would that take? A spi-mem DT prop? New field in the SPI
> >>> statistics sysfs directory?
> >>>
> >>
> >> I think I'd go with a sysfs entry, it provides flexibility. But I guess
> >> we can worry about this if we have some numbers, and I don't have, so
> >> you're fine even without the opt-out option.
> > 
> > Some ftrace numbers:
> > - 48002 calls to spi_mem_add_op_stats();
> > - min 1.053000µs;
> > - avg 1.175652µs;
> > - max 16.272000µs.
> > 
> > Platform is Mobileye EyeQ5. Cores are Imagine Technologies I6500-F. I
> > don't know the precision of our timer but we might be getting close to
> > what is measurable.
> > 
> Thanks.
>
> I took a random SPI NOR flash [1], its page program typical time is 64µs
> according to its SFDP data. We'll have to add here the delay the
> software handling takes.
>
> If you want to play a bit more, you can write the entire flash then
> compare the ftrace numbers of spi_mem_add_op_stats() with spi_nor_write().

It is unclear to me why you are focusing on writes? Won't reads be much
faster in the common case, and therefore where stats overhead would
show the most? For cadence-qspi, only issuing command reads (reads below
8 bytes) would be a sort of pathological case.

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
  
Tudor Ambarus Feb. 14, 2024, 12:59 p.m. UTC | #13
On 2/14/24 10:59, Théo Lebrun wrote:
> On Wed Feb 14, 2024 at 10:29 AM CET, Tudor Ambarus wrote:
>> On 2/14/24 08:51, Théo Lebrun wrote:
>>> On Wed Feb 14, 2024 at 9:00 AM CET, Tudor Ambarus wrote:
>>>> On 2/13/24 15:00, Théo Lebrun wrote:
>>>>> On Tue Feb 13, 2024 at 1:39 PM CET, Tudor Ambarus wrote:
>>>>>>>  /**
>>>>>>>   * spi_mem_exec_op() - Execute a memory operation
>>>>>>>   * @mem: the SPI memory
>>>>>>> @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
>>>>>>>  		 * read path) and expect the core to use the regular SPI
>>>>>>>  		 * interface in other cases.
>>>>>>>  		 */
>>>>>>> -		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP)
>>>>>>> +		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) {
>>>>>>> +			spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret);
>>>>>>> +			spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret);
>>>>>>> +
>>>>>>
>>>>>> Would be good to be able to opt out the statistics if one wants it.
>>>>>>
>>>>>> SPI NORs can write with a single write op maximum page_size bytes, which
>>>>>> is typically 256 bytes. And since there are SPI NORs that can run at 400
>>>>>> MHz, I guess some performance penalty shouldn't be excluded.
>>>>>
>>>>> I did my testing on a 40 MHz octal SPI NOR with most reads being much
>>>>> bigger than 256 bytes, so I probably didn't have the fastest setup
>>>>> indeed.
>>>>
>>>> yeah, reads are bigger, the entire flash can be read with a single read op.
>>>>
>>>>>
>>>>> What shape would that take? A spi-mem DT prop? New field in the SPI
>>>>> statistics sysfs directory?
>>>>>
>>>>
>>>> I think I'd go with a sysfs entry, it provides flexibility. But I guess
>>>> we can worry about this if we have some numbers, and I don't have, so
>>>> you're fine even without the opt-out option.
>>>
>>> Some ftrace numbers:
>>> - 48002 calls to spi_mem_add_op_stats();
>>> - min 1.053000µs;
>>> - avg 1.175652µs;
>>> - max 16.272000µs.
>>>
>>> Platform is Mobileye EyeQ5. Cores are Imagine Technologies I6500-F. I
>>> don't know the precision of our timer but we might be getting close to
>>> what is measurable.
>>>
>> Thanks.
>>
>> I took a random SPI NOR flash [1], its page program typical time is 64µs
>> according to its SFDP data. We'll have to add here the delay the
>> software handling takes.
>>
>> If you want to play a bit more, you can write the entire flash then
>> compare the ftrace numbers of spi_mem_add_op_stats() with spi_nor_write().
> 
> It is unclear to me why you are focusing on writes? Won't reads be much

It's easier to test as the SPI NOR core will issue a new page program
operation, thus a new exec_op() call, for each page size.

> faster in the common case, and therefore where stats overhead would
> show the most? For cadence-qspi, only issuing command reads (reads below
> 8 bytes) would be a sort of pathological case.

If you can serialize the reads and do small 8 bytes requests, then yes,
small reads are the critical case. Not sure how common it is and how to
test it.

Again, opting out is not a hard requirement from my side, you're fine
with how the patch is now. But if you want to measure the impact, you
can either compare with small reads, as you suggested, and with full
flash write, where the exec_op() calls will be split by the core on a
page_size basis.

Cheers,
ta
  

Patch

diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
index 2dc8ceb85374..171fe6b1c247 100644
--- a/drivers/spi/spi-mem.c
+++ b/drivers/spi/spi-mem.c
@@ -297,6 +297,50 @@  static void spi_mem_access_end(struct spi_mem *mem)
 		pm_runtime_put(ctlr->dev.parent);
 }
 
+static void spi_mem_add_op_stats(struct spi_statistics __percpu *pcpu_stats,
+				 const struct spi_mem_op *op, int exec_op_ret)
+{
+	struct spi_statistics *stats;
+	int len, l2len;
+
+	get_cpu();
+	stats = this_cpu_ptr(pcpu_stats);
+	u64_stats_update_begin(&stats->syncp);
+
+	/*
+	 * We do not have the concept of messages or transfers. Let's consider
+	 * that one operation is equivalent to one message and one transfer.
+	 */
+	u64_stats_inc(&stats->messages);
+	u64_stats_inc(&stats->transfers);
+
+	/* Use the sum of all lengths as bytes count and histogram value. */
+	len = (int)op->cmd.nbytes + (int)op->addr.nbytes;
+	len += (int)op->dummy.nbytes + (int)op->data.nbytes;
+	u64_stats_add(&stats->bytes, len);
+	l2len = min(fls(len), SPI_STATISTICS_HISTO_SIZE) - 1;
+	l2len = max(l2len, 0);
+	u64_stats_inc(&stats->transfer_bytes_histo[l2len]);
+
+	/* Only account for data bytes as xferred bytes. */
+	if (op->data.nbytes && op->data.dir == SPI_MEM_DATA_OUT)
+		u64_stats_add(&stats->bytes_tx, op->data.nbytes);
+	if (op->data.nbytes && op->data.dir == SPI_MEM_DATA_IN)
+		u64_stats_add(&stats->bytes_rx, op->data.nbytes);
+
+	/*
+	 * A timeout is not an error, following the same behavior as
+	 * spi_transfer_one_message().
+	 */
+	if (exec_op_ret == -ETIMEDOUT)
+		u64_stats_inc(&stats->timedout);
+	else if (exec_op_ret)
+		u64_stats_inc(&stats->errors);
+
+	u64_stats_update_end(&stats->syncp);
+	put_cpu();
+}
+
 /**
  * spi_mem_exec_op() - Execute a memory operation
  * @mem: the SPI memory
@@ -339,8 +383,12 @@  int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op)
 		 * read path) and expect the core to use the regular SPI
 		 * interface in other cases.
 		 */
-		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP)
+		if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) {
+			spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret);
+			spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret);
+
 			return ret;
+		}
 	}
 
 	tmpbufsize = op->cmd.nbytes + op->addr.nbytes + op->dummy.nbytes;