Message ID | 20240209-spi-mem-stats-v1-1-dd1a422fc015@bootlin.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-59387-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:50ea:b0:106:860b:bbdd with SMTP id r10csp863571dyd; Fri, 9 Feb 2024 05:52:13 -0800 (PST) X-Google-Smtp-Source: AGHT+IEYM+u+vg8u3dnxPJ/uv0adPAJGGoOSr/cPjipn8GzI0VrMi2iTcjqSNRHpdWnEdNFYpUjZ X-Received: by 2002:a17:90b:3d8c:b0:296:643:bd00 with SMTP id pq12-20020a17090b3d8c00b002960643bd00mr1444231pjb.23.1707486733173; Fri, 09 Feb 2024 05:52:13 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707486733; cv=pass; d=google.com; s=arc-20160816; b=sEtb2ruR65qS2TUDLA1dxb3oywpg7OUERbEeT5oEDeFIc3VhP67RE7GuyhWnwPJYs+ 7c7k9JfLoPL6ne9bwsNq1lbNpvQI2FE3T3WvH2ybhYjPC4fQ5xHRojeeOk/iLpyo+ZZ/ Z4wy8iBt3MPm1EyebR8fa75uTRaPRcT3inCALNblRJhCPIn9RXG12klsSwpzOzVg2dRc jfuglXtJMAAAIBsB9E8wr8hz37YSf3cfxL+bl6jA51/ik0DGiR0qbs6aLXHz0QcYQ3xn JH7nrnrRe6IJ+UHjllUAnK2k0zKNsUdHmkViRrs2h+xygDKE49VFjuGqKzQUtMzqeUIG 4RlA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:message-id:content-transfer-encoding:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:subject:date :from:dkim-signature; bh=AUYRGdnXj82lH9bL33wM3PM4HJIHlriE6TD1Z3Nt0sI=; fh=QWD9BciDOqbPDYBeMF2wVkzR3ZoPWMBPBwMhlgYvPf4=; b=MnozTl/hx/4FF0Uz/gq8kb3B6/eTtrtbFOCKvL/PB2aKSn9twsUYvpm0mevVn2UMfC 3+zWTEQDBXASy4XpJQ3WjGAlVchCRH/4AjkHJi0U6F+X06ZqXNUtUJPuWe5je6+U2hWz PKU9n8BwIpnJV4X034zk8JTnDUJwvLlCik426bXCnAlI5Qdv25ckxid3JB+UcNtcoNNQ bTwWjhaiMXBF6Se1Rlpuzfmjmg8QTGyTFpEuZGL4bA4CVJXFvZED0opusuVzB3rpUBLy wiQ9CsNYNvD0AlWVgpWzorkUwD7EcmikoRAwG02u/gfplDSlQMORizucjGUopEZAs3Wk nYgg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@bootlin.com header.s=gm1 header.b=ETBDqY2w; arc=pass (i=1 spf=pass spfdomain=bootlin.com dkim=pass dkdomain=bootlin.com dmarc=pass fromdomain=bootlin.com); spf=pass (google.com: domain of linux-kernel+bounces-59387-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-59387-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=bootlin.com X-Forwarded-Encrypted: i=2; AJvYcCU6bqhvtf7fJS4gXhtI+UIjUbRijMHiWnXkPdZznc2uDhEbu2KJDbIQe+TBWR5Qetk44CxKvAFQo71HEysOsR6HmyaMxQ== Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id pc4-20020a17090b3b8400b00296e23005efsi1743049pjb.20.2024.02.09.05.52.13 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 09 Feb 2024 05:52:13 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-59387-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@bootlin.com header.s=gm1 header.b=ETBDqY2w; arc=pass (i=1 spf=pass spfdomain=bootlin.com dkim=pass dkdomain=bootlin.com dmarc=pass fromdomain=bootlin.com); spf=pass (google.com: domain of linux-kernel+bounces-59387-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-59387-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=bootlin.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id E99E1281FE3 for <ouuuleilei@gmail.com>; Fri, 9 Feb 2024 13:52:12 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id E6E404D11F; Fri, 9 Feb 2024 13:51:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bootlin.com header.i=@bootlin.com header.b="ETBDqY2w" Received: from mslow1.mail.gandi.net (mslow1.mail.gandi.net [217.70.178.240]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 838CC4CB50; Fri, 9 Feb 2024 13:51:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.70.178.240 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707486716; cv=none; b=a9hkqvkqvvBrTrVUHDOHnY5uEAV6MRixS3OVjSxuwleO3+fpLlK4H96GyEcNfqOM7YOiExXo1gCuS3jZlaadddZ00vS9mPn/ktt5jK/Z17NlNR+674jHSgjGs7owBzhYk6CTEP7wjCWSrkGTjt//9Hu+wBXy2A8awk+b0Vs8ssA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707486716; c=relaxed/simple; bh=0bD1Yt7GaMWvgagUDRUd142oKOAha2Uum7epnkYDwYM=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:To:Cc; b=ESGfy75wSIMA286Yq0/RhFBqyrScWuNFSXpS6raLKCCQU0aBpvCev4GN0ZxBjgj86MyvY5OkdwHc/hMD0vRdU7oNPrFwzFgM0tupmCGsrpcY+5/QDZ0OdOpcE9JraVe+oyEPPonN5GrAJqzI8nCtyijCLaK/bQT56Pij0tnlSaY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=bootlin.com; spf=pass smtp.mailfrom=bootlin.com; dkim=pass (2048-bit key) header.d=bootlin.com header.i=@bootlin.com header.b=ETBDqY2w; arc=none smtp.client-ip=217.70.178.240 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=bootlin.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bootlin.com Received: from relay6-d.mail.gandi.net (unknown [IPv6:2001:4b98:dc4:8::226]) by mslow1.mail.gandi.net (Postfix) with ESMTP id 4F523C386D; Fri, 9 Feb 2024 13:51:46 +0000 (UTC) Received: by mail.gandi.net (Postfix) with ESMTPSA id 0E403C0006; Fri, 9 Feb 2024 13:51:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bootlin.com; s=gm1; t=1707486697; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=AUYRGdnXj82lH9bL33wM3PM4HJIHlriE6TD1Z3Nt0sI=; b=ETBDqY2wPkcGTEGRxzebBkunFGckdAZ804L/mgWOjGLEwhXdfrfZsC1Lo4bpwlQEtpenTc SQbuIBUO1JJ1Oktdw+aRH0j8OYDhXSQMhGu/SlXI+oENT0tA7jl0pn739yIqJgr2Svw/mB EKJNmaCqj2wnnxGN69xlpwa4NSAgaW50GacDj0IYhM455+2IdpIEHfq4F1LVRB1MUVwE7W UNryUU/48r46YzAJFc4sY+/bLMmIVpSWVR5J+FSQbZQsmVfr+uCcLM4gtNAhu02O6ep605 YJeriIHtnOAYVCVYFMiyI7V63hJiYBB8KC2P2B/rR2dO/YdN2LyN8qyJjMsZfg== From: =?utf-8?q?Th=C3=A9o_Lebrun?= <theo.lebrun@bootlin.com> Date: Fri, 09 Feb 2024 14:51:23 +0100 Subject: [PATCH] spi: spi-mem: add statistics support to ->exec_op() calls Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit Message-Id: <20240209-spi-mem-stats-v1-1-dd1a422fc015@bootlin.com> X-B4-Tracking: v=1; b=H4sIANotxmUC/x3MMQqAMAxA0auUzAZqEaReRRy0JprBWhoRQXp3i +Mb/n9BKQspDOaFTLeonLGibQyEfY4boazV4KzrrLMeNQkedKBe86XI7Bf2bbDcE9QmZWJ5/t8 4lfIB4wjtUV8AAAA= To: Mark Brown <broonie@kernel.org> Cc: linux-spi@vger.kernel.org, linux-kernel@vger.kernel.org, Dhruva Gole <d-gole@ti.com>, Gregory CLEMENT <gregory.clement@bootlin.com>, Vladimir Kondratiev <vladimir.kondratiev@mobileye.com>, Thomas Petazzoni <thomas.petazzoni@bootlin.com>, Tawfik Bayouk <tawfik.bayouk@mobileye.com>, =?utf-8?q?Th=C3=A9o_Lebrun?= <theo.lebrun@bootlin.com> X-Mailer: b4 0.12.4 X-GND-Sasl: theo.lebrun@bootlin.com X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790429608674164216 X-GMAIL-MSGID: 1790429608674164216 |
Series |
spi: spi-mem: add statistics support to ->exec_op() calls
|
|
Commit Message
Théo Lebrun
Feb. 9, 2024, 1:51 p.m. UTC
Current behavior is that spi-mem operations do not increment statistics,
neither per-controller nor per-device, if ->exec_op() is used. For
operations that do NOT use ->exec_op(), stats are increased as the
usual spi_sync() is called.
The newly implemented spi_mem_add_op_stats() function is strongly
inspired by spi_statistics_add_transfer_stats(); locking logic and
l2len computation comes from there.
Statistics that are being filled: bytes{,_rx,_tx}, messages, transfers,
errors, timedout, transfer_bytes_histo_*.
Note about messages & transfers counters: in the fallback to spi_sync()
case, there are from 1 to 4 transfers per message. We only register one
big transfer in the ->exec_op() case as that is closer to reality.
This patch is NOT touching:
- spi_async, spi_sync, spi_sync_immediate: those counters describe
precise function calls, incrementing them would be lying. I believe
comparing the messages counter to spi_async+spi_sync is a good way
to detect ->exec_op() calls, but I might be missing edge cases
knowledge.
- transfers_split_maxsize: splitting cannot happen if ->exec_op() is
provided.
Testing this patch:
$ cd /sys/devices/platform/soc
$ find . -type d -path "*spi*" -name statistics
./2100000.spi/spi_master/spi0/statistics
./2100000.spi/spi_master/spi0/spi0.0/statistics
$ cd ./2100000.spi/spi_master/spi0/statistics
$ for f in *; do printf "%s\t" $f; cat $f; done | \
grep -v transfer_bytes_histo | column -t
bytes 240745444
bytes_rx 240170907
bytes_tx 126320
errors 0
messages 97354
spi_async 0
spi_sync 0
spi_sync_immediate 0
timedout 0
transfers 97354
transfers_split_maxsize 0
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
---
drivers/spi/spi-mem.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 49 insertions(+), 1 deletion(-)
---
base-commit: 19b50f80b3a4865bd477aa5c026dd234d39a50d2
change-id: 20240209-spi-mem-stats-ff9bf91c0f7e
Best regards,
Comments
Hi! On Feb 09, 2024 at 14:51:23 +0100, Théo Lebrun wrote: > Current behavior is that spi-mem operations do not increment statistics, > neither per-controller nor per-device, if ->exec_op() is used. For > operations that do NOT use ->exec_op(), stats are increased as the > usual spi_sync() is called. > > The newly implemented spi_mem_add_op_stats() function is strongly > inspired by spi_statistics_add_transfer_stats(); locking logic and > l2len computation comes from there. > > Statistics that are being filled: bytes{,_rx,_tx}, messages, transfers, > errors, timedout, transfer_bytes_histo_*. > > Note about messages & transfers counters: in the fallback to spi_sync() > case, there are from 1 to 4 transfers per message. We only register one > big transfer in the ->exec_op() case as that is closer to reality. Can you please elaborate on this point a bit? To me it feels then that the reported stats in this case will be less than the true value then? > > This patch is NOT touching: > - spi_async, spi_sync, spi_sync_immediate: those counters describe > precise function calls, incrementing them would be lying. I believe > comparing the messages counter to spi_async+spi_sync is a good way > to detect ->exec_op() calls, but I might be missing edge cases > knowledge. > - transfers_split_maxsize: splitting cannot happen if ->exec_op() is > provided. Credit where it's due - This is a very well written and verbose commit message. Just my personal opinion maybe but all this data about testing can go below the tear line in the description? Or somewhere in the kernel docs would also be just fine. (I know we kernel developers consider git log as the best source of documentation :) ) but still.. if you feel like adding ;) No strong opinions there though. > > Testing this patch: > > $ cd /sys/devices/platform/soc > $ find . -type d -path "*spi*" -name statistics > ./2100000.spi/spi_master/spi0/statistics > ./2100000.spi/spi_master/spi0/spi0.0/statistics > $ cd ./2100000.spi/spi_master/spi0/statistics > > $ for f in *; do printf "%s\t" $f; cat $f; done | \ > grep -v transfer_bytes_histo | column -t > bytes 240745444 > bytes_rx 240170907 > bytes_tx 126320 > errors 0 > messages 97354 > spi_async 0 > spi_sync 0 > spi_sync_immediate 0 > timedout 0 > transfers 97354 > transfers_split_maxsize 0 > > Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> > --- > drivers/spi/spi-mem.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 49 insertions(+), 1 deletion(-) > > diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c > index 2dc8ceb85374..171fe6b1c247 100644 > --- a/drivers/spi/spi-mem.c > +++ b/drivers/spi/spi-mem.c > @@ -297,6 +297,50 @@ static void spi_mem_access_end(struct spi_mem *mem) > pm_runtime_put(ctlr->dev.parent); > } > > +static void spi_mem_add_op_stats(struct spi_statistics __percpu *pcpu_stats, > + const struct spi_mem_op *op, int exec_op_ret) > +{ > + struct spi_statistics *stats; > + int len, l2len; > + > + get_cpu(); > + stats = this_cpu_ptr(pcpu_stats); > + u64_stats_update_begin(&stats->syncp); > + > + /* > + * We do not have the concept of messages or transfers. Let's consider > + * that one operation is equivalent to one message and one transfer. Why 1 message _and_ 1 xfer and not simply 1 xfer? Even in the example of testing that you showed above the values for message and xfer are anyway going to be same, then why have these 2 members in the first place? Can we not do away with one of these? Sorry but I am not too familiar with u64_stats_inc so my q. may sound silly. Seems to be more of a concept widely used in networking side of the kernel. > + */ > + u64_stats_inc(&stats->messages); > + u64_stats_inc(&stats->transfers); > + > + /* Use the sum of all lengths as bytes count and histogram value. */ > + len = (int)op->cmd.nbytes + (int)op->addr.nbytes; > + len += (int)op->dummy.nbytes + (int)op->data.nbytes; > + u64_stats_add(&stats->bytes, len); > + l2len = min(fls(len), SPI_STATISTICS_HISTO_SIZE) - 1; > + l2len = max(l2len, 0); > + u64_stats_inc(&stats->transfer_bytes_histo[l2len]); > + > + /* Only account for data bytes as xferred bytes. */ > + if (op->data.nbytes && op->data.dir == SPI_MEM_DATA_OUT) > + u64_stats_add(&stats->bytes_tx, op->data.nbytes); > + if (op->data.nbytes && op->data.dir == SPI_MEM_DATA_IN) > + u64_stats_add(&stats->bytes_rx, op->data.nbytes); > + > + /* > + * A timeout is not an error, following the same behavior as > + * spi_transfer_one_message(). > + */ > + if (exec_op_ret == -ETIMEDOUT) > + u64_stats_inc(&stats->timedout); > + else if (exec_op_ret) > + u64_stats_inc(&stats->errors); > + > + u64_stats_update_end(&stats->syncp); > + put_cpu(); > +} > + > /** > * spi_mem_exec_op() - Execute a memory operation > * @mem: the SPI memory > @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) > * read path) and expect the core to use the regular SPI > * interface in other cases. > */ > - if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) > + if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) { > + spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret); > + spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret); > + Just curious, how much does this impact performance? Have you been able to do some before / after profiling with/out this patch? For eg. for every single spimem op I'm constantly going to incur the penalty of these calls right? Just wondering if we can / should make this optional to have the op_stats. If there is a perf penalty, like if my ospi operations start being impacted by these calls then I may not be okay with this patch. But if you have tested and not found it to be the case I am okay with these changes. If I find some time later, I'll try to test but I'm caught up with some other work. For now I'll leave my R-by with the above conditions addressed / answered. Mostly LGTM, Reviewed-by: Dhruva Gole <d-gole@ti.com>
On Mon, Feb 12, 2024 at 04:43:55PM +0530, Dhruva Gole wrote: > On Feb 09, 2024 at 14:51:23 +0100, Théo Lebrun wrote: > > + /* > > + * We do not have the concept of messages or transfers. Let's consider > > + * that one operation is equivalent to one message and one transfer. > Why 1 message _and_ 1 xfer and not simply 1 xfer? > Even in the example of testing that you showed above the values for > message and xfer are anyway going to be same, then why have these 2 > members in the first place? Can we not do away with one of these? If the device supports regular SPI operations as well as spi-mem operations then this will ensure that the spi-mem stats fit in with the other operations. If it only supports spi-mem operations it would not matter so much.
Hello, On Mon Feb 12, 2024 at 12:13 PM CET, Dhruva Gole wrote: > Hi! > > On Feb 09, 2024 at 14:51:23 +0100, Théo Lebrun wrote: > > Current behavior is that spi-mem operations do not increment statistics, > > neither per-controller nor per-device, if ->exec_op() is used. For > > operations that do NOT use ->exec_op(), stats are increased as the > > usual spi_sync() is called. > > > > The newly implemented spi_mem_add_op_stats() function is strongly > > inspired by spi_statistics_add_transfer_stats(); locking logic and > > l2len computation comes from there. > > > > Statistics that are being filled: bytes{,_rx,_tx}, messages, transfers, > > errors, timedout, transfer_bytes_histo_*. > > > > Note about messages & transfers counters: in the fallback to spi_sync() > > case, there are from 1 to 4 transfers per message. We only register one > > big transfer in the ->exec_op() case as that is closer to reality. > > Can you please elaborate on this point a bit? To me it feels then that > the reported stats in this case will be less than the true value then? To me, a transfer is one transaction with the SPI controller. In most implementations of ->exec_op(), the controller gets configured once for the full transfer to take place. This contrasts with the fallback case that does from 1 to 4 transfers (cmd, addr, dummy & data, with the last three being optional). One transfer feels closer to what happens from my point-of-view. What would be your definition of a transfer? Or the "official" one that the sysfs entry represents? > > This patch is NOT touching: > > - spi_async, spi_sync, spi_sync_immediate: those counters describe > > precise function calls, incrementing them would be lying. I believe > > comparing the messages counter to spi_async+spi_sync is a good way > > to detect ->exec_op() calls, but I might be missing edge cases > > knowledge. > > - transfers_split_maxsize: splitting cannot happen if ->exec_op() is > > provided. > > Credit where it's due - This is a very well written and verbose commit > message. Thanks! > Just my personal opinion maybe but all this data about testing can go > below the tear line in the description? I see where you are coming from. I'll do so on the next revision (if there is one). > Or somewhere in the kernel docs would also be just fine. (I know we > kernel developers consider git log as the best source of documentation > :) ) but still.. if you feel like adding ;) A first step would be to have the sysfs SPI statistics API be described inside Documentation/. That is outside the scope of this patch though. :-) > No strong opinions there though. Same. [...] > > +static void spi_mem_add_op_stats(struct spi_statistics __percpu *pcpu_stats, > > + const struct spi_mem_op *op, int exec_op_ret) > > +{ > > + struct spi_statistics *stats; > > + int len, l2len; > > + > > + get_cpu(); > > + stats = this_cpu_ptr(pcpu_stats); > > + u64_stats_update_begin(&stats->syncp); > > + > > + /* > > + * We do not have the concept of messages or transfers. Let's consider > > + * that one operation is equivalent to one message and one transfer. > > Why 1 message _and_ 1 xfer and not simply 1 xfer? > Even in the example of testing that you showed above the values for > message and xfer are anyway going to be same, then why have these 2 > members in the first place? Can we not do away with one of these? Mark Brown gave an answer to this. Indeed, with regular SPI operations, one message doesn't map to one transfer. [...] > > /** > > * spi_mem_exec_op() - Execute a memory operation > > * @mem: the SPI memory > > @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) > > * read path) and expect the core to use the regular SPI > > * interface in other cases. > > */ > > - if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) > > + if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) { > > + spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret); > > + spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret); > > + > > Just curious, how much does this impact performance? Have you been able > to do some before / after profiling with/out this patch? > > For eg. for every single spimem op I'm constantly going to incur the > penalty of these calls right? I have indeed done some benchmarking. I was not able to measure anything significant. Neither doing timings of end-to-end testing by reading loads of data over UBIFS, nor by using ftrace's function_graph. > Just wondering if we can / should make this optional to have the > op_stats. If there is a perf penalty, like if my ospi operations start > being impacted by these calls then I may not be okay with this patch. I've asked myself the same question. It is being done unconditionally on regular SPI ops, so I guess the question has been answered previously: no need to make this conditional. See spi_statistics_add_transfer_stats() in drivers/spi/spi.c. > But if you have tested and not found it to be the case I am okay with > these changes. > > If I find some time later, I'll try to test but I'm caught up with some > other work. For now I'll leave my R-by with the above conditions > addressed / answered. > > Mostly LGTM, > > Reviewed-by: Dhruva Gole <d-gole@ti.com> Thanks for your review! I don't regret adding you to the Cc list. Regards, -- Théo Lebrun, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
Hi, On Feb 12, 2024 at 15:22:41 +0100, Théo Lebrun wrote: > Hello, > > On Mon Feb 12, 2024 at 12:13 PM CET, Dhruva Gole wrote: > > Hi! > > > > On Feb 09, 2024 at 14:51:23 +0100, Théo Lebrun wrote: > > > Current behavior is that spi-mem operations do not increment statistics, > > > neither per-controller nor per-device, if ->exec_op() is used. For > > > operations that do NOT use ->exec_op(), stats are increased as the > > > usual spi_sync() is called. > > > > > > The newly implemented spi_mem_add_op_stats() function is strongly > > > inspired by spi_statistics_add_transfer_stats(); locking logic and > > > l2len computation comes from there. > > > > > > Statistics that are being filled: bytes{,_rx,_tx}, messages, transfers, > > > errors, timedout, transfer_bytes_histo_*. > > > > > > Note about messages & transfers counters: in the fallback to spi_sync() > > > case, there are from 1 to 4 transfers per message. We only register one > > > big transfer in the ->exec_op() case as that is closer to reality. > > > > Can you please elaborate on this point a bit? To me it feels then that > > the reported stats in this case will be less than the true value then? > > To me, a transfer is one transaction with the SPI controller. In most > implementations of ->exec_op(), the controller gets configured once for > the full transfer to take place. This contrasts with the fallback case > that does from 1 to 4 transfers (cmd, addr, dummy & data, with the last > three being optional). > > One transfer feels closer to what happens from my point-of-view. What > would be your definition of a transfer? Or the "official" one that the > sysfs entry represents? Yeah I understand your point, this is something I'd also call as a transaction > > > > This patch is NOT touching: > > > - spi_async, spi_sync, spi_sync_immediate: those counters describe > > > precise function calls, incrementing them would be lying. I believe > > > comparing the messages counter to spi_async+spi_sync is a good way > > > to detect ->exec_op() calls, but I might be missing edge cases > > > knowledge. > > > - transfers_split_maxsize: splitting cannot happen if ->exec_op() is > > > provided. > > > > Credit where it's due - This is a very well written and verbose commit > > message. > > Thanks! > > > Just my personal opinion maybe but all this data about testing can go > > below the tear line in the description? > > I see where you are coming from. I'll do so on the next revision (if > there is one). cool! > > > Or somewhere in the kernel docs would also be just fine. (I know we > > kernel developers consider git log as the best source of documentation > > :) ) but still.. if you feel like adding ;) > > A first step would be to have the sysfs SPI statistics API be described > inside Documentation/. That is outside the scope of this patch > though. :-) > > > No strong opinions there though. > > Same. > > [...] > > > > +static void spi_mem_add_op_stats(struct spi_statistics __percpu *pcpu_stats, > > > + const struct spi_mem_op *op, int exec_op_ret) > > > +{ > > > + struct spi_statistics *stats; > > > + int len, l2len; > > > + > > > + get_cpu(); > > > + stats = this_cpu_ptr(pcpu_stats); > > > + u64_stats_update_begin(&stats->syncp); > > > + > > > + /* > > > + * We do not have the concept of messages or transfers. Let's consider > > > + * that one operation is equivalent to one message and one transfer. > > > > Why 1 message _and_ 1 xfer and not simply 1 xfer? > > Even in the example of testing that you showed above the values for > > message and xfer are anyway going to be same, then why have these 2 > > members in the first place? Can we not do away with one of these? > > Mark Brown gave an answer to this. Indeed, with regular SPI operations, > one message doesn't map to one transfer. Thanks for explaining Mark, understood. > > [...] > > > > /** > > > * spi_mem_exec_op() - Execute a memory operation > > > * @mem: the SPI memory > > > @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) > > > * read path) and expect the core to use the regular SPI > > > * interface in other cases. > > > */ > > > - if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) > > > + if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) { > > > + spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret); > > > + spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret); > > > + > > > > Just curious, how much does this impact performance? Have you been able > > to do some before / after profiling with/out this patch? > > > > For eg. for every single spimem op I'm constantly going to incur the > > penalty of these calls right? > > I have indeed done some benchmarking. I was not able to measure anything > significant. Neither doing timings of end-to-end testing by reading > loads of data over UBIFS, nor by using ftrace's function_graph. Awesome. > > > Just wondering if we can / should make this optional to have the > > op_stats. If there is a perf penalty, like if my ospi operations start > > being impacted by these calls then I may not be okay with this patch. > > I've asked myself the same question. It is being done unconditionally on > regular SPI ops, so I guess the question has been answered previously: > no need to make this conditional. > > See spi_statistics_add_transfer_stats() in drivers/spi/spi.c. Yeah I did see that, but maybe it didn't occur then whether we should make it optional. Anyway, I'm ok with this too. Let's worry about optional in future if required. > > > But if you have tested and not found it to be the case I am okay with > > these changes. > > > > If I find some time later, I'll try to test but I'm caught up with some > > other work. For now I'll leave my R-by with the above conditions > > addressed / answered. > > > > Mostly LGTM, > > > > Reviewed-by: Dhruva Gole <d-gole@ti.com> > > Thanks for your review! I don't regret adding you to the Cc list. > Cheers!
On 2/9/24 13:51, Théo Lebrun wrote: > Current behavior is that spi-mem operations do not increment statistics, > neither per-controller nor per-device, if ->exec_op() is used. For > operations that do NOT use ->exec_op(), stats are increased as the > usual spi_sync() is called. > > The newly implemented spi_mem_add_op_stats() function is strongly > inspired by spi_statistics_add_transfer_stats(); locking logic and > l2len computation comes from there. > > Statistics that are being filled: bytes{,_rx,_tx}, messages, transfers, > errors, timedout, transfer_bytes_histo_*. > > Note about messages & transfers counters: in the fallback to spi_sync() > case, there are from 1 to 4 transfers per message. We only register one > big transfer in the ->exec_op() case as that is closer to reality. > > This patch is NOT touching: > - spi_async, spi_sync, spi_sync_immediate: those counters describe > precise function calls, incrementing them would be lying. I believe > comparing the messages counter to spi_async+spi_sync is a good way > to detect ->exec_op() calls, but I might be missing edge cases > knowledge. > - transfers_split_maxsize: splitting cannot happen if ->exec_op() is > provided. > > Testing this patch: > > $ cd /sys/devices/platform/soc > $ find . -type d -path "*spi*" -name statistics > ./2100000.spi/spi_master/spi0/statistics > ./2100000.spi/spi_master/spi0/spi0.0/statistics > $ cd ./2100000.spi/spi_master/spi0/statistics > > $ for f in *; do printf "%s\t" $f; cat $f; done | \ > grep -v transfer_bytes_histo | column -t > bytes 240745444 > bytes_rx 240170907 > bytes_tx 126320 > errors 0 > messages 97354 > spi_async 0 > spi_sync 0 > spi_sync_immediate 0 > timedout 0 > transfers 97354 > transfers_split_maxsize 0 > > Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> > --- > drivers/spi/spi-mem.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 49 insertions(+), 1 deletion(-) > > diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c > index 2dc8ceb85374..171fe6b1c247 100644 > --- a/drivers/spi/spi-mem.c > +++ b/drivers/spi/spi-mem.c > @@ -297,6 +297,50 @@ static void spi_mem_access_end(struct spi_mem *mem) > pm_runtime_put(ctlr->dev.parent); > } > > +static void spi_mem_add_op_stats(struct spi_statistics __percpu *pcpu_stats, > + const struct spi_mem_op *op, int exec_op_ret) > +{ > + struct spi_statistics *stats; > + int len, l2len; > + > + get_cpu(); > + stats = this_cpu_ptr(pcpu_stats); > + u64_stats_update_begin(&stats->syncp); > + > + /* > + * We do not have the concept of messages or transfers. Let's consider > + * that one operation is equivalent to one message and one transfer. > + */ > + u64_stats_inc(&stats->messages); > + u64_stats_inc(&stats->transfers); > + > + /* Use the sum of all lengths as bytes count and histogram value. */ > + len = (int)op->cmd.nbytes + (int)op->addr.nbytes; > + len += (int)op->dummy.nbytes + (int)op->data.nbytes; spi_mem_check_op() makes sure that op->cmd.nbytes != 0, otherwise it returns -EINVAL ... > + u64_stats_add(&stats->bytes, len); > + l2len = min(fls(len), SPI_STATISTICS_HISTO_SIZE) - 1; .. thus l2len can never be negative. You can declare len and l2len as u64. The casts from above shall disappear. > + l2len = max(l2len, 0); > + u64_stats_inc(&stats->transfer_bytes_histo[l2len]); > + > + /* Only account for data bytes as xferred bytes. */ s/xferred/transferred? > + if (op->data.nbytes && op->data.dir == SPI_MEM_DATA_OUT) > + u64_stats_add(&stats->bytes_tx, op->data.nbytes); > + if (op->data.nbytes && op->data.dir == SPI_MEM_DATA_IN) > + u64_stats_add(&stats->bytes_rx, op->data.nbytes); > + > + /* > + * A timeout is not an error, following the same behavior as > + * spi_transfer_one_message(). > + */ > + if (exec_op_ret == -ETIMEDOUT) > + u64_stats_inc(&stats->timedout); > + else if (exec_op_ret) > + u64_stats_inc(&stats->errors); > + > + u64_stats_update_end(&stats->syncp); > + put_cpu(); > +} > + > /** > * spi_mem_exec_op() - Execute a memory operation > * @mem: the SPI memory > @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) > * read path) and expect the core to use the regular SPI > * interface in other cases. > */ > - if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) > + if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) { > + spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret); > + spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret); > + Would be good to be able to opt out the statistics if one wants it. SPI NORs can write with a single write op maximum page_size bytes, which is typically 256 bytes. And since there are SPI NORs that can run at 400 MHz, I guess some performance penalty shouldn't be excluded. > return ret; > + } > } > > tmpbufsize = op->cmd.nbytes + op->addr.nbytes + op->dummy.nbytes; > > --- > base-commit: 19b50f80b3a4865bd477aa5c026dd234d39a50d2 > change-id: 20240209-spi-mem-stats-ff9bf91c0f7e > > Best regards,
Hello Tudor, On Tue Feb 13, 2024 at 1:39 PM CET, Tudor Ambarus wrote: > > /** > > * spi_mem_exec_op() - Execute a memory operation > > * @mem: the SPI memory > > @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) > > * read path) and expect the core to use the regular SPI > > * interface in other cases. > > */ > > - if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) > > + if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) { > > + spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret); > > + spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret); > > + > > Would be good to be able to opt out the statistics if one wants it. > > SPI NORs can write with a single write op maximum page_size bytes, which > is typically 256 bytes. And since there are SPI NORs that can run at 400 > MHz, I guess some performance penalty shouldn't be excluded. I did my testing on a 40 MHz octal SPI NOR with most reads being much bigger than 256 bytes, so I probably didn't have the fastest setup indeed. What shape would that take? A spi-mem DT prop? New field in the SPI statistics sysfs directory? Other remarks have been taken into account, thanks! -- Théo Lebrun, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
On Tue, Feb 13, 2024 at 12:39:02PM +0000, Tudor Ambarus wrote: > On 2/9/24 13:51, Théo Lebrun wrote: > > + if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) { > > + spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret); > > + spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret); > > + > Would be good to be able to opt out the statistics if one wants it. > SPI NORs can write with a single write op maximum page_size bytes, which > is typically 256 bytes. And since there are SPI NORs that can run at 400 > MHz, I guess some performance penalty shouldn't be excluded. If we can cope with this sort of statistics collection in the networking fast path we can probably cope with it for SPI too, the immediate recording is all per CPU so I'd like to see some numbers showing that it's a problem before worrying about it too much. Even the people doing things like saturating CAN buses haven't been raising it as a concern for the regular SPI data path. We could add a Kconfig if it's an issue.
On 2/13/24 15:28, Mark Brown wrote: > On Tue, Feb 13, 2024 at 12:39:02PM +0000, Tudor Ambarus wrote: >> On 2/9/24 13:51, Théo Lebrun wrote: > >>> + if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) { >>> + spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret); >>> + spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret); >>> + > >> Would be good to be able to opt out the statistics if one wants it. > >> SPI NORs can write with a single write op maximum page_size bytes, which >> is typically 256 bytes. And since there are SPI NORs that can run at 400 >> MHz, I guess some performance penalty shouldn't be excluded. > > If we can cope with this sort of statistics collection in the networking > fast path we can probably cope with it for SPI too, the immediate > recording is all per CPU so I'd like to see some numbers showing that > it's a problem before worrying about it too much. Even the people doing > things like saturating CAN buses haven't been raising it as a concern > for the regular SPI data path. We could add a Kconfig if it's an issue. Ok, we can deal with it afterwards if it'll become an issue.
On 2/13/24 15:00, Théo Lebrun wrote: > Hello Tudor, Hi! > > On Tue Feb 13, 2024 at 1:39 PM CET, Tudor Ambarus wrote: >>> /** >>> * spi_mem_exec_op() - Execute a memory operation >>> * @mem: the SPI memory >>> @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) >>> * read path) and expect the core to use the regular SPI >>> * interface in other cases. >>> */ >>> - if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) >>> + if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) { >>> + spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret); >>> + spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret); >>> + >> >> Would be good to be able to opt out the statistics if one wants it. >> >> SPI NORs can write with a single write op maximum page_size bytes, which >> is typically 256 bytes. And since there are SPI NORs that can run at 400 >> MHz, I guess some performance penalty shouldn't be excluded. > > I did my testing on a 40 MHz octal SPI NOR with most reads being much > bigger than 256 bytes, so I probably didn't have the fastest setup > indeed. yeah, reads are bigger, the entire flash can be read with a single read op. > > What shape would that take? A spi-mem DT prop? New field in the SPI > statistics sysfs directory? > I think I'd go with a sysfs entry, it provides flexibility. But I guess we can worry about this if we have some numbers, and I don't have, so you're fine even without the opt-out option. > Other remarks have been taken into account, thanks! > Ok, thanks. Cheers, ta
Hello, On Wed Feb 14, 2024 at 9:00 AM CET, Tudor Ambarus wrote: > On 2/13/24 15:00, Théo Lebrun wrote: > > On Tue Feb 13, 2024 at 1:39 PM CET, Tudor Ambarus wrote: > >>> /** > >>> * spi_mem_exec_op() - Execute a memory operation > >>> * @mem: the SPI memory > >>> @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) > >>> * read path) and expect the core to use the regular SPI > >>> * interface in other cases. > >>> */ > >>> - if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) > >>> + if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) { > >>> + spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret); > >>> + spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret); > >>> + > >> > >> Would be good to be able to opt out the statistics if one wants it. > >> > >> SPI NORs can write with a single write op maximum page_size bytes, which > >> is typically 256 bytes. And since there are SPI NORs that can run at 400 > >> MHz, I guess some performance penalty shouldn't be excluded. > > > > I did my testing on a 40 MHz octal SPI NOR with most reads being much > > bigger than 256 bytes, so I probably didn't have the fastest setup > > indeed. > > yeah, reads are bigger, the entire flash can be read with a single read op. > > > > > What shape would that take? A spi-mem DT prop? New field in the SPI > > statistics sysfs directory? > > > > I think I'd go with a sysfs entry, it provides flexibility. But I guess > we can worry about this if we have some numbers, and I don't have, so > you're fine even without the opt-out option. Some ftrace numbers: - 48002 calls to spi_mem_add_op_stats(); - min 1.053000µs; - avg 1.175652µs; - max 16.272000µs. Platform is Mobileye EyeQ5. Cores are Imagine Technologies I6500-F. I don't know the precision of our timer but we might be getting close to what is measurable. Thanks, -- Théo Lebrun, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
On 2/14/24 08:51, Théo Lebrun wrote: > Hello, Hi, Théo, > > On Wed Feb 14, 2024 at 9:00 AM CET, Tudor Ambarus wrote: >> On 2/13/24 15:00, Théo Lebrun wrote: >>> On Tue Feb 13, 2024 at 1:39 PM CET, Tudor Ambarus wrote: >>>>> /** >>>>> * spi_mem_exec_op() - Execute a memory operation >>>>> * @mem: the SPI memory >>>>> @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) >>>>> * read path) and expect the core to use the regular SPI >>>>> * interface in other cases. >>>>> */ >>>>> - if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) >>>>> + if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) { >>>>> + spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret); >>>>> + spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret); >>>>> + >>>> >>>> Would be good to be able to opt out the statistics if one wants it. >>>> >>>> SPI NORs can write with a single write op maximum page_size bytes, which >>>> is typically 256 bytes. And since there are SPI NORs that can run at 400 >>>> MHz, I guess some performance penalty shouldn't be excluded. >>> >>> I did my testing on a 40 MHz octal SPI NOR with most reads being much >>> bigger than 256 bytes, so I probably didn't have the fastest setup >>> indeed. >> >> yeah, reads are bigger, the entire flash can be read with a single read op. >> >>> >>> What shape would that take? A spi-mem DT prop? New field in the SPI >>> statistics sysfs directory? >>> >> >> I think I'd go with a sysfs entry, it provides flexibility. But I guess >> we can worry about this if we have some numbers, and I don't have, so >> you're fine even without the opt-out option. > > Some ftrace numbers: > - 48002 calls to spi_mem_add_op_stats(); > - min 1.053000µs; > - avg 1.175652µs; > - max 16.272000µs. > > Platform is Mobileye EyeQ5. Cores are Imagine Technologies I6500-F. I > don't know the precision of our timer but we might be getting close to > what is measurable. > Thanks. I took a random SPI NOR flash [1], its page program typical time is 64µs according to its SFDP data. We'll have to add here the delay the software handling takes. If you want to play a bit more, you can write the entire flash then compare the ftrace numbers of spi_mem_add_op_stats() with spi_nor_write(). Cheers, ta [1] https://ww1.microchip.com/downloads/en/DeviceDoc/20005119G.pdf
On Wed Feb 14, 2024 at 10:29 AM CET, Tudor Ambarus wrote: > On 2/14/24 08:51, Théo Lebrun wrote: > > On Wed Feb 14, 2024 at 9:00 AM CET, Tudor Ambarus wrote: > >> On 2/13/24 15:00, Théo Lebrun wrote: > >>> On Tue Feb 13, 2024 at 1:39 PM CET, Tudor Ambarus wrote: > >>>>> /** > >>>>> * spi_mem_exec_op() - Execute a memory operation > >>>>> * @mem: the SPI memory > >>>>> @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) > >>>>> * read path) and expect the core to use the regular SPI > >>>>> * interface in other cases. > >>>>> */ > >>>>> - if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) > >>>>> + if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) { > >>>>> + spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret); > >>>>> + spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret); > >>>>> + > >>>> > >>>> Would be good to be able to opt out the statistics if one wants it. > >>>> > >>>> SPI NORs can write with a single write op maximum page_size bytes, which > >>>> is typically 256 bytes. And since there are SPI NORs that can run at 400 > >>>> MHz, I guess some performance penalty shouldn't be excluded. > >>> > >>> I did my testing on a 40 MHz octal SPI NOR with most reads being much > >>> bigger than 256 bytes, so I probably didn't have the fastest setup > >>> indeed. > >> > >> yeah, reads are bigger, the entire flash can be read with a single read op. > >> > >>> > >>> What shape would that take? A spi-mem DT prop? New field in the SPI > >>> statistics sysfs directory? > >>> > >> > >> I think I'd go with a sysfs entry, it provides flexibility. But I guess > >> we can worry about this if we have some numbers, and I don't have, so > >> you're fine even without the opt-out option. > > > > Some ftrace numbers: > > - 48002 calls to spi_mem_add_op_stats(); > > - min 1.053000µs; > > - avg 1.175652µs; > > - max 16.272000µs. > > > > Platform is Mobileye EyeQ5. Cores are Imagine Technologies I6500-F. I > > don't know the precision of our timer but we might be getting close to > > what is measurable. > > > Thanks. > > I took a random SPI NOR flash [1], its page program typical time is 64µs > according to its SFDP data. We'll have to add here the delay the > software handling takes. > > If you want to play a bit more, you can write the entire flash then > compare the ftrace numbers of spi_mem_add_op_stats() with spi_nor_write(). It is unclear to me why you are focusing on writes? Won't reads be much faster in the common case, and therefore where stats overhead would show the most? For cadence-qspi, only issuing command reads (reads below 8 bytes) would be a sort of pathological case. Thanks, -- Théo Lebrun, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
On 2/14/24 10:59, Théo Lebrun wrote: > On Wed Feb 14, 2024 at 10:29 AM CET, Tudor Ambarus wrote: >> On 2/14/24 08:51, Théo Lebrun wrote: >>> On Wed Feb 14, 2024 at 9:00 AM CET, Tudor Ambarus wrote: >>>> On 2/13/24 15:00, Théo Lebrun wrote: >>>>> On Tue Feb 13, 2024 at 1:39 PM CET, Tudor Ambarus wrote: >>>>>>> /** >>>>>>> * spi_mem_exec_op() - Execute a memory operation >>>>>>> * @mem: the SPI memory >>>>>>> @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) >>>>>>> * read path) and expect the core to use the regular SPI >>>>>>> * interface in other cases. >>>>>>> */ >>>>>>> - if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) >>>>>>> + if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) { >>>>>>> + spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret); >>>>>>> + spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret); >>>>>>> + >>>>>> >>>>>> Would be good to be able to opt out the statistics if one wants it. >>>>>> >>>>>> SPI NORs can write with a single write op maximum page_size bytes, which >>>>>> is typically 256 bytes. And since there are SPI NORs that can run at 400 >>>>>> MHz, I guess some performance penalty shouldn't be excluded. >>>>> >>>>> I did my testing on a 40 MHz octal SPI NOR with most reads being much >>>>> bigger than 256 bytes, so I probably didn't have the fastest setup >>>>> indeed. >>>> >>>> yeah, reads are bigger, the entire flash can be read with a single read op. >>>> >>>>> >>>>> What shape would that take? A spi-mem DT prop? New field in the SPI >>>>> statistics sysfs directory? >>>>> >>>> >>>> I think I'd go with a sysfs entry, it provides flexibility. But I guess >>>> we can worry about this if we have some numbers, and I don't have, so >>>> you're fine even without the opt-out option. >>> >>> Some ftrace numbers: >>> - 48002 calls to spi_mem_add_op_stats(); >>> - min 1.053000µs; >>> - avg 1.175652µs; >>> - max 16.272000µs. >>> >>> Platform is Mobileye EyeQ5. Cores are Imagine Technologies I6500-F. I >>> don't know the precision of our timer but we might be getting close to >>> what is measurable. >>> >> Thanks. >> >> I took a random SPI NOR flash [1], its page program typical time is 64µs >> according to its SFDP data. We'll have to add here the delay the >> software handling takes. >> >> If you want to play a bit more, you can write the entire flash then >> compare the ftrace numbers of spi_mem_add_op_stats() with spi_nor_write(). > > It is unclear to me why you are focusing on writes? Won't reads be much It's easier to test as the SPI NOR core will issue a new page program operation, thus a new exec_op() call, for each page size. > faster in the common case, and therefore where stats overhead would > show the most? For cadence-qspi, only issuing command reads (reads below > 8 bytes) would be a sort of pathological case. If you can serialize the reads and do small 8 bytes requests, then yes, small reads are the critical case. Not sure how common it is and how to test it. Again, opting out is not a hard requirement from my side, you're fine with how the patch is now. But if you want to measure the impact, you can either compare with small reads, as you suggested, and with full flash write, where the exec_op() calls will be split by the core on a page_size basis. Cheers, ta
diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c index 2dc8ceb85374..171fe6b1c247 100644 --- a/drivers/spi/spi-mem.c +++ b/drivers/spi/spi-mem.c @@ -297,6 +297,50 @@ static void spi_mem_access_end(struct spi_mem *mem) pm_runtime_put(ctlr->dev.parent); } +static void spi_mem_add_op_stats(struct spi_statistics __percpu *pcpu_stats, + const struct spi_mem_op *op, int exec_op_ret) +{ + struct spi_statistics *stats; + int len, l2len; + + get_cpu(); + stats = this_cpu_ptr(pcpu_stats); + u64_stats_update_begin(&stats->syncp); + + /* + * We do not have the concept of messages or transfers. Let's consider + * that one operation is equivalent to one message and one transfer. + */ + u64_stats_inc(&stats->messages); + u64_stats_inc(&stats->transfers); + + /* Use the sum of all lengths as bytes count and histogram value. */ + len = (int)op->cmd.nbytes + (int)op->addr.nbytes; + len += (int)op->dummy.nbytes + (int)op->data.nbytes; + u64_stats_add(&stats->bytes, len); + l2len = min(fls(len), SPI_STATISTICS_HISTO_SIZE) - 1; + l2len = max(l2len, 0); + u64_stats_inc(&stats->transfer_bytes_histo[l2len]); + + /* Only account for data bytes as xferred bytes. */ + if (op->data.nbytes && op->data.dir == SPI_MEM_DATA_OUT) + u64_stats_add(&stats->bytes_tx, op->data.nbytes); + if (op->data.nbytes && op->data.dir == SPI_MEM_DATA_IN) + u64_stats_add(&stats->bytes_rx, op->data.nbytes); + + /* + * A timeout is not an error, following the same behavior as + * spi_transfer_one_message(). + */ + if (exec_op_ret == -ETIMEDOUT) + u64_stats_inc(&stats->timedout); + else if (exec_op_ret) + u64_stats_inc(&stats->errors); + + u64_stats_update_end(&stats->syncp); + put_cpu(); +} + /** * spi_mem_exec_op() - Execute a memory operation * @mem: the SPI memory @@ -339,8 +383,12 @@ int spi_mem_exec_op(struct spi_mem *mem, const struct spi_mem_op *op) * read path) and expect the core to use the regular SPI * interface in other cases. */ - if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) + if (!ret || ret != -ENOTSUPP || ret != -EOPNOTSUPP) { + spi_mem_add_op_stats(ctlr->pcpu_statistics, op, ret); + spi_mem_add_op_stats(mem->spi->pcpu_statistics, op, ret); + return ret; + } } tmpbufsize = op->cmd.nbytes + op->addr.nbytes + op->dummy.nbytes;