[07/16] spi: bcm63xx-hsspi: Add polling mode support

Message ID 20230106200809.330769-8-william.zhang@broadcom.com
State New
Headers
Series spi: bcm63xx-hsspi: driver and doc updates |

Commit Message

William Zhang Jan. 6, 2023, 8:07 p.m. UTC
  Polling mode provides better throughput in general by avoiding the
interrupt overhead as the maximum data size one interrupt can handle is
only 512 bytes.

When interrupt is not defined in the HSSPI dts node, driver switches to
polling mode for controller SPI message processing.  Also add driver
banner message when the driver is loaded successfully.

When test on a Broadcom BCM47622(ARM A7 dual core) reference board with
WINBOND W25N01GV SPI NAND chip at 100MHz SPI clock using the MTD speed
test suite, it shows about 15% improvement on the write and 30% on
the read:
** Interrupt mode **
  mtd_speedtest: MTD device: 0    count: 16
  mtd_speedtest: MTD device size 134217728, eraseblock size 131072, page
size 2048, count of eraseblocks 1024, pages per eraseblock 64, OOB size
64
  mtd_test: scanning for bad eraseblocks
  mtd_test: scanned 16 eraseblocks, 0 are bad
  mtd_speedtest: testing eraseblock write speed
  mtd_speedtest: eraseblock write speed is 3072 KiB/s
  mtd_speedtest: testing eraseblock read speed
  mtd_speedtest: eraseblock read speed is 6690 KiB/s
  mtd_speedtest: testing page write speed
  mtd_speedtest: page write speed is 3066 KiB/s
  mtd_speedtest: testing page read speed
  mtd_speedtest: page read speed is 6762 KiB/s
  mtd_speedtest: testing 2 page write speed
  mtd_speedtest: 2 page write speed is 3071 KiB/s
  mtd_speedtest: testing 2 page read speed
  mtd_speedtest: 2 page read speed is 6772 KiB/s
** Polling mode **
  mtd_speedtest: MTD device: 0    count: 16
  mtd_speedtest: MTD device size 134217728, eraseblock size 131072, page
size 2048, count of eraseblocks 1024, pages per eraseblock 64, OOB size
64
  mtd_test: scanning for bad eraseblocks
  mtd_test: scanned 16 eraseblocks, 0 are bad
  mtd_speedtest: testing eraseblock write speed
  mtd_speedtest: eraseblock write speed is 3542 KiB/s
  mtd_speedtest: testing eraseblock read speed
  mtd_speedtest: eraseblock read speed is 8825 KiB/s
  mtd_speedtest: testing page write speed
  mtd_speedtest: page write speed is 3563 KiB/s
  mtd_speedtest: testing page read speed
  mtd_speedtest: page read speed is 8787 KiB/s
  mtd_speedtest: testing 2 page write speed
  mtd_speedtest: 2 page write speed is 3572 KiB/s
  mtd_speedtest: testing 2 page read speed
  mtd_speedtest: 2 page read speed is 8806 KiB/s

Signed-off-by: William Zhang <william.zhang@broadcom.com>
---

 drivers/spi/spi-bcm63xx-hsspi.c | 49 +++++++++++++++++++++++++--------
 1 file changed, 38 insertions(+), 11 deletions(-)
  

Comments

Mark Brown Jan. 6, 2023, 9:47 p.m. UTC | #1
On Fri, Jan 06, 2023 at 12:07:59PM -0800, William Zhang wrote:

> Polling mode provides better throughput in general by avoiding the
> interrupt overhead as the maximum data size one interrupt can handle is
> only 512 bytes.

> When interrupt is not defined in the HSSPI dts node, driver switches to
> polling mode for controller SPI message processing.  Also add driver
> banner message when the driver is loaded successfully.

This should not be something the user selects via the DT, if the polling
mode is better then the driver should just use it regardless of there
being an interrupt wired up.  Generally there's some point at which the
benefits of polling become minimal (and the costs more impactful) but if
the DMA setup is as bad as it sounds then the driver should just ignore
the interrupt.
  
William Zhang Jan. 7, 2023, 3:35 a.m. UTC | #2
On 01/06/2023 01:47 PM, Mark Brown wrote:
> On Fri, Jan 06, 2023 at 12:07:59PM -0800, William Zhang wrote:
> 
>> Polling mode provides better throughput in general by avoiding the
>> interrupt overhead as the maximum data size one interrupt can handle is
>> only 512 bytes.
> 
>> When interrupt is not defined in the HSSPI dts node, driver switches to
>> polling mode for controller SPI message processing.  Also add driver
>> banner message when the driver is loaded successfully.
> 
> This should not be something the user selects via the DT, if the polling
> mode is better then the driver should just use it regardless of there
> being an interrupt wired up.  Generally there's some point at which the
> benefits of polling become minimal (and the costs more impactful) but if
> the DMA setup is as bad as it sounds then the driver should just ignore
> the interrupt.
> 
I agree but this is more for backward compatibility as the original 
driver uses interrupt to work with MIPS based SoC and I do not want to 
change that behavior.  For newer devices I added, they work in polling 
mode by default with the dts I supplied.  IMHO it is also good to have 
this fallback option to use interrupt in case user is sensitive to cpu 
usage.
  
Mark Brown Jan. 9, 2023, 7:06 p.m. UTC | #3
On Fri, Jan 06, 2023 at 07:35:59PM -0800, William Zhang wrote:
> On 01/06/2023 01:47 PM, Mark Brown wrote:
> > On Fri, Jan 06, 2023 at 12:07:59PM -0800, William Zhang wrote:

> > > When interrupt is not defined in the HSSPI dts node, driver switches to
> > > polling mode for controller SPI message processing.  Also add driver
> > > banner message when the driver is loaded successfully.

> > This should not be something the user selects via the DT, if the polling
> > mode is better then the driver should just use it regardless of there
> > being an interrupt wired up.  Generally there's some point at which the
> > benefits of polling become minimal (and the costs more impactful) but if
> > the DMA setup is as bad as it sounds then the driver should just ignore
> > the interrupt.

> I agree but this is more for backward compatibility as the original driver
> uses interrupt to work with MIPS based SoC and I do not want to change that
> behavior.  For newer devices I added, they work in polling mode by default
> with the dts I supplied.  IMHO it is also good to have this fallback option
> to use interrupt in case user is sensitive to cpu usage.

You can put whatever logic is needed in the code - for something like
this an architecture based define isn't ideal but is probably good
enough if need be (though I'd not be surprised if it turned out that
there was also some performance benefit for the MIPS systems too, at
least for smaller transfers).
  
William Zhang Jan. 9, 2023, 8:10 p.m. UTC | #4
On 01/09/2023 11:06 AM, Mark Brown wrote:
> On Fri, Jan 06, 2023 at 07:35:59PM -0800, William Zhang wrote:
>> On 01/06/2023 01:47 PM, Mark Brown wrote:
>>> On Fri, Jan 06, 2023 at 12:07:59PM -0800, William Zhang wrote:
> 
>>>> When interrupt is not defined in the HSSPI dts node, driver switches to
>>>> polling mode for controller SPI message processing.  Also add driver
>>>> banner message when the driver is loaded successfully.
> 
>>> This should not be something the user selects via the DT, if the polling
>>> mode is better then the driver should just use it regardless of there
>>> being an interrupt wired up.  Generally there's some point at which the
>>> benefits of polling become minimal (and the costs more impactful) but if
>>> the DMA setup is as bad as it sounds then the driver should just ignore
>>> the interrupt.
> 
>> I agree but this is more for backward compatibility as the original driver
>> uses interrupt to work with MIPS based SoC and I do not want to change that
>> behavior.  For newer devices I added, they work in polling mode by default
>> with the dts I supplied.  IMHO it is also good to have this fallback option
>> to use interrupt in case user is sensitive to cpu usage.
> 
> You can put whatever logic is needed in the code - for something like
> this an architecture based define isn't ideal but is probably good
> enough if need be (though I'd not be surprised if it turned out that
> there was also some performance benefit for the MIPS systems too, at
> least for smaller transfers).
> 
I just don't know what other logic I can put in the driver to select 
interrupt or polling mode.  Only the end user know if performance or cpu 
usage is more important to their application.
  
Mark Brown Jan. 10, 2023, 10:49 p.m. UTC | #5
On Mon, Jan 09, 2023 at 12:10:30PM -0800, William Zhang wrote:
> On 01/09/2023 11:06 AM, Mark Brown wrote:

> > You can put whatever logic is needed in the code - for something like
> > this an architecture based define isn't ideal but is probably good
> > enough if need be (though I'd not be surprised if it turned out that
> > there was also some performance benefit for the MIPS systems too, at
> > least for smaller transfers).

> I just don't know what other logic I can put in the driver to select
> interrupt or polling mode.  Only the end user know if performance or cpu
> usage is more important to their application.

Usually you can take a reasonable guess as to what would be a good point
to start switching, typically for short enough transfers the overhead of
setting up DMA, waiting for interrupts and tearing things down is very
much larger than the cost of just doing PIO - a bunch of other drivers
have pick a number logic of some kind, often things like FIFO sizes are
a good key for where to look.  A lot of the time this is good enough,
and it means that users have much better facilities for making tradeoffs
if they have a range of transfer sizes available - it's not an either/or
thing but based on some features of the individual message/transfer.

It is true that for people with heavy SPI traffic or otherwise very
demanding requirements for a specific system and software stack
additional tuning might produce better results, exposing some sysfs
knobs to allow tuning of parameters at runtime would be helpful for them
and I'd certainly be happy to see that added.
  
William Zhang Jan. 11, 2023, 8:13 p.m. UTC | #6
On 01/10/2023 02:49 PM, Mark Brown wrote:
> On Mon, Jan 09, 2023 at 12:10:30PM -0800, William Zhang wrote:
>> On 01/09/2023 11:06 AM, Mark Brown wrote:
> 
>>> You can put whatever logic is needed in the code - for something like
>>> this an architecture based define isn't ideal but is probably good
>>> enough if need be (though I'd not be surprised if it turned out that
>>> there was also some performance benefit for the MIPS systems too, at
>>> least for smaller transfers).
> 
>> I just don't know what other logic I can put in the driver to select
>> interrupt or polling mode.  Only the end user know if performance or cpu
>> usage is more important to their application.
> 
> Usually you can take a reasonable guess as to what would be a good point
> to start switching, typically for short enough transfers the overhead of
> setting up DMA, waiting for interrupts and tearing things down is very
> much larger than the cost of just doing PIO - a bunch of other drivers
> have pick a number logic of some kind, often things like FIFO sizes are
> a good key for where to look.  A lot of the time this is good enough,
> and it means that users have much better facilities for making tradeoffs
> if they have a range of transfer sizes available - it's not an either/or
> thing but based on some features of the individual message/transfer.
> 
> It is true that for people with heavy SPI traffic or otherwise very
> demanding requirements for a specific system and software stack
> additional tuning might produce better results, exposing some sysfs
> knobs to allow tuning of parameters at runtime would be helpful for them
> and I'd certainly be happy to see that added.
> 
Thanks for the explanation. I saw the spi-uniphier.c and spi-bcm2835.c 
doing the trick you mentioned(thanks Kursad for pointing out).  In our 
case, even the maximum fifo size usage(512bytes), the polling still have 
better performance than interrupt. The MTD test result included in this 
patch is based on maximum fifo usage. So there is no benefit to switch 
to interrupt based on transfer size.

Does the spi framework has any requirement on how much time that the 
driver's transfer_one function can spend?  I can see the polling 
function might take quite some time in busy loop if the clock is low, 
for example, at 100Hz(slowest clock this controller can go), it takes 
512x8/100Hz ~= 41ms to complete.  If this is something in concern,  I 
can do the interrupt switch based on a time limit value if interrupt is 
available.
  
Mark Brown Jan. 11, 2023, 10:41 p.m. UTC | #7
On Wed, Jan 11, 2023 at 12:13:43PM -0800, William Zhang wrote:

> Thanks for the explanation. I saw the spi-uniphier.c and spi-bcm2835.c doing
> the trick you mentioned(thanks Kursad for pointing out).  In our case, even
> the maximum fifo size usage(512bytes), the polling still have better
> performance than interrupt. The MTD test result included in this patch is
> based on maximum fifo usage. So there is no benefit to switch to interrupt
> based on transfer size.

> Does the spi framework has any requirement on how much time that the
> driver's transfer_one function can spend?  I can see the polling function
> might take quite some time in busy loop if the clock is low, for example, at
> 100Hz(slowest clock this controller can go), it takes 512x8/100Hz ~= 41ms to
> complete.  If this is something in concern,  I can do the interrupt switch
> based on a time limit value if interrupt is available.

There's no fixed limit, some hardware just doesn't have DMA.  Some
doesn't even have interrupts which is even better.  If there's always a
clear benefit for using PIO then just do that, perhaps creating a sysfs
hook to allow people to switch to DMA if they need it (rather than
requiring people to update their DT, this is really a runtime
performance tradeoff rather than a description of the hardware).  If
there's a point at which the performance is very similar then it's
probably worth switching to DMA there to lower the CPU usage, but if
it's always faster to use PIO then just defaulting to PIO seems like the
best option.
  
William Zhang Jan. 11, 2023, 10:57 p.m. UTC | #8
On 01/11/2023 02:41 PM, Mark Brown wrote:
> On Wed, Jan 11, 2023 at 12:13:43PM -0800, William Zhang wrote:
> 
>> Thanks for the explanation. I saw the spi-uniphier.c and spi-bcm2835.c doing
>> the trick you mentioned(thanks Kursad for pointing out).  In our case, even
>> the maximum fifo size usage(512bytes), the polling still have better
>> performance than interrupt. The MTD test result included in this patch is
>> based on maximum fifo usage. So there is no benefit to switch to interrupt
>> based on transfer size.
> 
>> Does the spi framework has any requirement on how much time that the
>> driver's transfer_one function can spend?  I can see the polling function
>> might take quite some time in busy loop if the clock is low, for example, at
>> 100Hz(slowest clock this controller can go), it takes 512x8/100Hz ~= 41ms to
>> complete.  If this is something in concern,  I can do the interrupt switch
>> based on a time limit value if interrupt is available.
> 
> There's no fixed limit, some hardware just doesn't have DMA.  Some
> doesn't even have interrupts which is even better.  If there's always a
> clear benefit for using PIO then just do that, perhaps creating a sysfs
> hook to allow people to switch to DMA if they need it (rather than
> requiring people to update their DT, this is really a runtime
> performance tradeoff rather than a description of the hardware).  If
> there's a point at which the performance is very similar then it's
> probably worth switching to DMA there to lower the CPU usage, but if
> it's always faster to use PIO then just defaulting to PIO seems like the
> best option.
> 
Sure. I will add the sysfs option. Interrupt node is now required in dts 
but by default it will still runs at polling mode until sysfs option is 
set to use interrupt.
  

Patch

diff --git a/drivers/spi/spi-bcm63xx-hsspi.c b/drivers/spi/spi-bcm63xx-hsspi.c
index 63682c8ff955..2b4cdf7e7002 100644
--- a/drivers/spi/spi-bcm63xx-hsspi.c
+++ b/drivers/spi/spi-bcm63xx-hsspi.c
@@ -57,6 +57,7 @@ 
 #define PINGPONG_CMD_SS_SHIFT			12
 
 #define HSSPI_PINGPONG_STATUS_REG(x)		(0x84 + (x) * 0x40)
+#define HSSPI_PINGPONG_STATUS_SRC_BUSY		BIT(1)
 
 #define HSSPI_PROFILE_CLK_CTRL_REG(x)		(0x100 + (x) * 0x20)
 #define CLK_CTRL_FREQ_CTRL_MASK			0x0000ffff
@@ -96,6 +97,7 @@ 
 
 #define HSSPI_SPI_MAX_CS			8
 #define HSSPI_BUS_NUM				1 /* 0 is legacy SPI */
+#define HSSPI_POLL_STATUS_TIMEOUT_MS	100
 
 struct bcm63xx_hsspi {
 	struct completion done;
@@ -109,6 +111,7 @@  struct bcm63xx_hsspi {
 
 	u32 speed_hz;
 	u8 cs_polarity;
+	int irq;
 };
 
 static void bcm63xx_hsspi_set_cs(struct bcm63xx_hsspi *bs, unsigned int cs,
@@ -163,6 +166,8 @@  static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t)
 	int step_size = HSSPI_BUFFER_LEN;
 	const u8 *tx = t->tx_buf;
 	u8 *rx = t->rx_buf;
+	u32 val;
+	unsigned long limit;
 
 	bcm63xx_hsspi_set_clk(bs, spi, t->speed_hz);
 	bcm63xx_hsspi_set_cs(bs, spi->chip_select, true);
@@ -197,8 +202,9 @@  static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t)
 		__raw_writew(cpu_to_be16(opcode | curr_step), bs->fifo);
 
 		/* enable interrupt */
-		__raw_writel(HSSPI_PINGx_CMD_DONE(0),
-			     bs->regs + HSSPI_INT_MASK_REG);
+		if (bs->irq > 0)
+			__raw_writel(HSSPI_PINGx_CMD_DONE(0),
+				     bs->regs + HSSPI_INT_MASK_REG);
 
 		/* start the transfer */
 		__raw_writel(!chip_select << PINGPONG_CMD_SS_SHIFT |
@@ -206,9 +212,21 @@  static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t)
 			     PINGPONG_COMMAND_START_NOW,
 			     bs->regs + HSSPI_PINGPONG_COMMAND_REG(0));
 
-		if (wait_for_completion_timeout(&bs->done, HZ) == 0) {
-			dev_err(&bs->pdev->dev, "transfer timed out!\n");
-			return -ETIMEDOUT;
+		if (bs->irq > 0) {
+			if (wait_for_completion_timeout(&bs->done, HZ) == 0)
+				goto err_timeout;
+		} else {
+			/* polling mode checks for status busy bit */
+			limit = jiffies + msecs_to_jiffies(HSSPI_POLL_STATUS_TIMEOUT_MS);
+			while (!time_after(jiffies, limit)) {
+				val = __raw_readl(bs->regs + HSSPI_PINGPONG_STATUS_REG(0));
+				if (val & HSSPI_PINGPONG_STATUS_SRC_BUSY)
+					cpu_relax();
+				else
+					break;
+			}
+			if (val & HSSPI_PINGPONG_STATUS_SRC_BUSY)
+				goto err_timeout;
 		}
 
 		if (rx) {
@@ -220,6 +238,10 @@  static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t)
 	}
 
 	return 0;
+
+err_timeout:
+	dev_err(&bs->pdev->dev, "transfer timed out!\n");
+	return -ETIMEDOUT;
 }
 
 static int bcm63xx_hsspi_setup(struct spi_device *spi)
@@ -338,8 +360,8 @@  static int bcm63xx_hsspi_probe(struct platform_device *pdev)
 	u32 reg, rate, num_cs = HSSPI_SPI_MAX_CS;
 	struct reset_control *reset;
 
-	irq = platform_get_irq(pdev, 0);
-	if (irq < 0)
+	irq = platform_get_irq_optional(pdev, 0);
+	if (irq < 0 && irq != -ENXIO)
 		return irq;
 
 	regs = devm_platform_ioremap_resource(pdev, 0);
@@ -398,6 +420,7 @@  static int bcm63xx_hsspi_probe(struct platform_device *pdev)
 	bs->regs = regs;
 	bs->speed_hz = rate;
 	bs->fifo = (u8 __iomem *)(bs->regs + HSSPI_FIFO_REG(0));
+	bs->irq = irq;
 
 	mutex_init(&bs->bus_mutex);
 	init_completion(&bs->done);
@@ -434,11 +457,13 @@  static int bcm63xx_hsspi_probe(struct platform_device *pdev)
 	__raw_writel(reg | GLOBAL_CTRL_CLK_GATE_SSOFF,
 		     bs->regs + HSSPI_GLOBAL_CTRL_REG);
 
-	ret = devm_request_irq(dev, irq, bcm63xx_hsspi_interrupt, IRQF_SHARED,
-			       pdev->name, bs);
+	if (bs->irq > 0) {
+		ret = devm_request_irq(dev, irq, bcm63xx_hsspi_interrupt, IRQF_SHARED,
+				       pdev->name, bs);
 
-	if (ret)
-		goto out_put_master;
+		if (ret)
+			goto out_put_master;
+	}
 
 	pm_runtime_enable(&pdev->dev);
 
@@ -447,6 +472,8 @@  static int bcm63xx_hsspi_probe(struct platform_device *pdev)
 	if (ret)
 		goto out_pm_disable;
 
+	dev_info(dev, "Broadcom 63XX High Speed SPI Controller driver");
+
 	return 0;
 
 out_pm_disable: