[10/16] spi: bcm63xx-hsspi: Make dummy cs workaround as an option

Message ID 20230106200809.330769-11-william.zhang@broadcom.com
State New
Headers
Series spi: bcm63xx-hsspi: driver and doc updates |

Commit Message

William Zhang Jan. 6, 2023, 8:08 p.m. UTC
  Due to the controller limitation to keep the chip select low during the
bus idle time between the transfer, a dummy cs workaround was used when
this driver was first upstreamed to the kernel.

The workaround picks dummy_cs as !actual_cs and usually cs is 0 and
dummy cs is 1. But this does not work out in all the situations as
customer reported issues for their board design. Due to SPI block design
constrain, the controller works in different mode internally depending
on the clock. When clock is 30MHz or above, controller works in faster
mode and cs 1 signal is used internally as feedback from the pad. So cs
1 pin must be brought out and configured as chip select function in
pinmux. When clock is below 30MHz, only cs 0 pin is used. In summary
when clock is below 30MHz, the workaround always works as only cs 0 is
involved. For clock faster than 30MHz, it require cs1 pin used in the
board design and configured as chip selection function.

In a typical usage of SPI flash on cs 0 that normally runs at 100MHz,
cs 1 pin must be used in the board design but this is not always the
case for many customer boards.

For voice daughtercard usage,  the SoC alway uses internal cs 7 for
voice card SPI control. Then it requires cs 0 pin as the dummy cs but
board with voice design may not use cs 0 pin at all.

The controller actually has a prepend feature that can combine multiple
SPI transfers in a SPI message into one single transfer when the
transfers meet certain requirements. So there is no need for keeping cs
low when the message only has one transfer. Most of the SPI devices
including SPI NOR, SPI NAND flash, Broadcom voice card and etc can use
this feature without the dummy cs workaround.

This patch makes the dummy cs workaround as an option based on the
dts flag brcm,use-cs-workaround. By default dummy cs workaround is
hard coded to enable. We will use the prepend feature and disable this
workaround as default in the next patch of this series unless this flag
is set in dts.

Signed-off-by: William Zhang <william.zhang@broadcom.com>
---

 drivers/spi/spi-bcm63xx-hsspi.c | 55 +++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 19 deletions(-)
  

Comments

Mark Brown Jan. 12, 2023, 6:08 p.m. UTC | #1
On Fri, Jan 06, 2023 at 12:08:02PM -0800, William Zhang wrote:

> Due to the controller limitation to keep the chip select low during the
> bus idle time between the transfer, a dummy cs workaround was used when
> this driver was first upstreamed to the kernel.
> 
> The workaround picks dummy_cs as !actual_cs and usually cs is 0 and
> dummy cs is 1. But this does not work out in all the situations as
> customer reported issues for their board design. Due to SPI block design
> constrain, the controller works in different mode internally depending
> on the clock. When clock is 30MHz or above, controller works in faster
> mode and cs 1 signal is used internally as feedback from the pad. So cs
> 1 pin must be brought out and configured as chip select function in
> pinmux. When clock is below 30MHz, only cs 0 pin is used. In summary
> when clock is below 30MHz, the workaround always works as only cs 0 is
> involved. For clock faster than 30MHz, it require cs1 pin used in the
> board design and configured as chip selection function.

> In a typical usage of SPI flash on cs 0 that normally runs at 100MHz,
> cs 1 pin must be used in the board design but this is not always the
> case for many customer boards.

> For voice daughtercard usage,  the SoC alway uses internal cs 7 for
> voice card SPI control. Then it requires cs 0 pin as the dummy cs but
> board with voice design may not use cs 0 pin at all.

So I think what this is trying to say is that operation over 30MHz with
the existing workaround requires that the board be designed to bring out
the chip select used as the dummy as a physical chip select (or
potentially it has to be the actual chip select for the device?) but
that likely won't have been done?  And potentially this only works if CS
1 is the one brought out?  I'm unclear if CS 1 is just the most common
dummy chip select or if there's something else going on here.

> The controller actually has a prepend feature that can combine multiple
> SPI transfers in a SPI message into one single transfer when the
> transfers meet certain requirements. So there is no need for keeping cs
> low when the message only has one transfer. Most of the SPI devices
> including SPI NOR, SPI NAND flash, Broadcom voice card and etc can use
> this feature without the dummy cs workaround.

> This patch makes the dummy cs workaround as an option based on the
> dts flag brcm,use-cs-workaround. By default dummy cs workaround is
> hard coded to enable. We will use the prepend feature and disable this
> workaround as default in the next patch of this series unless this flag
> is set in dts.

...and based on your other comments I gather it's difficult to disable
the workaround per message?  I'm also guessing that the overhead from
always doing full duplex transfers is noticable so it's better to try
the workaround.

I wonder if we can't do this by selecting the workaround based on the
configured device speed.  If the device is configured to use more than
30MHz then disable the workaround, if it's set to lower then use it.  In
practice most devices don't change their speed after setup, and the
driver could check for and handle that case I guess (eg, limit the speed
configured if the workaround has been activated or rewrite the message
to a single transfer if neded).  That would be less error prone for
users, there wouldn't be any possibility of them setting this flag
incorrectly.
  
William Zhang Jan. 18, 2023, 11:09 p.m. UTC | #2
Hi Mark,

Sorry for the late reply. Please see my comments below.

On 01/12/2023 10:08 AM, Mark Brown wrote:
> On Fri, Jan 06, 2023 at 12:08:02PM -0800, William Zhang wrote:
> 
>> Due to the controller limitation to keep the chip select low during the
>> bus idle time between the transfer, a dummy cs workaround was used when
>> this driver was first upstreamed to the kernel.
>>
>> The workaround picks dummy_cs as !actual_cs and usually cs is 0 and
>> dummy cs is 1. But this does not work out in all the situations as
>> customer reported issues for their board design. Due to SPI block design
>> constrain, the controller works in different mode internally depending
>> on the clock. When clock is 30MHz or above, controller works in faster
>> mode and cs 1 signal is used internally as feedback from the pad. So cs
>> 1 pin must be brought out and configured as chip select function in
>> pinmux. When clock is below 30MHz, only cs 0 pin is used. In summary
>> when clock is below 30MHz, the workaround always works as only cs 0 is
>> involved. For clock faster than 30MHz, it require cs1 pin used in the
>> board design and configured as chip selection function.
> 
>> In a typical usage of SPI flash on cs 0 that normally runs at 100MHz,
>> cs 1 pin must be used in the board design but this is not always the
>> case for many customer boards.
> 
>> For voice daughtercard usage,  the SoC alway uses internal cs 7 for
>> voice card SPI control. Then it requires cs 0 pin as the dummy cs but
>> board with voice design may not use cs 0 pin at all.
> 
> So I think what this is trying to say is that operation over 30MHz with
> the existing workaround requires that the board be designed to bring out
> the chip select used as the dummy as a physical chip select (or
> potentially it has to be the actual chip select for the device?) 
Yes as the physical chip select and its pinmux must set as spi slave 
chip select function.  If that pin is configured as GPIO function, this 
workaround won't work. For the old SoCs, this dummy cs 1 pin are default 
to cs pinmux function so the issue was not revealed. For all new SoC, 
the cs 1 pin is default to GPIO function so it stops working.

> but that likely won't have been done?
Correct. This was never a requirement from Broadcom for any board design 
because the Broadcom in-house SPI bus driver in its initial software 
release was essentially based on prepend mode so no such hardware design 
requirement and it does not make sense to reserve the dummy cs1 pin 
while customer only have one spi device at cs0. So no customer board 
would do that unless they have two devices. We decided to switch to the 
upstream  driver with prepend mode addition as default in our late 
software release couple years ago. It has been running pretty smooth in 
the field.

> And potentially this only works if CS
> 1 is the one brought out?  I'm unclear if CS 1 is just the most common
> dummy chip select or if there's something else going on here.
> 
Yes with pinmux set correctly. It can be other cs but the code set 
dummy_cs = !actual_cs. So it can be either 1 or 0. In most common case, 
board only have one spi device on cs 0. So dummy is mostly cs1.

>> The controller actually has a prepend feature that can combine multiple
>> SPI transfers in a SPI message into one single transfer when the
>> transfers meet certain requirements. So there is no need for keeping cs
>> low when the message only has one transfer. Most of the SPI devices
>> including SPI NOR, SPI NAND flash, Broadcom voice card and etc can use
>> this feature without the dummy cs workaround.
> 
>> This patch makes the dummy cs workaround as an option based on the
>> dts flag brcm,use-cs-workaround. By default dummy cs workaround is
>> hard coded to enable. We will use the prepend feature and disable this
>> workaround as default in the next patch of this series unless this flag
>> is set in dts.
> 
> ...and based on your other comments I gather it's difficult to disable
> the workaround per message?  I'm also guessing that the overhead from
> always doing full duplex transfers is noticable so it's better to try
> the workaround.
> 
The main issue the max_message/transfer_size setup during the driver 
probe function. prepend mode has a max size of 512 bytes while dummy cs 
does not. I don't think it is safe to update these function ptr at 
runtime and it will be too late for the driver to update.

But I think we can always set the max size as in prepend mode. It does 
looses some performance for dummy cs (about 15% slower on throughput 
when test on 100MHz SPI NAND) because the size limit. The prepend buffer 
copy and other handling is insignificant. But since dummy cs does not 
work on 100MHz in general, it is still much better throughput when doing 
prepend mode at 100Mhz with size limit comparing dummy cs at 30MHz.

> I wonder if we can't do this by selecting the workaround based on the
> configured device speed.  If the device is configured to use more than
> 30MHz then disable the workaround, if it's set to lower then use it.  In
> practice most devices don't change their speed after setup, and the
> driver could check for and handle that case I guess (eg, limit the speed
> configured if the workaround has been activated or rewrite the message
> to a single transfer if neded).  That would be less error prone for
> users, there wouldn't be any possibility of them setting this flag
> incorrectly.
> 
I agree.  We can get rid of the cs workaround dts flag and make run time 
decision based on the device speed. And for the cut off frequency,  we 
want to make it bit more conservative.  According to our VSLI designer, 
the cut off freq depends on the chip floor plan - the distance from the 
SPI peripheral block to the SPI pins. So for some chips, the freq is 
25Mhz, for some is 30Mhz.  So we think 25MHz is good number to use.

As our SDK release uses prepend mode as default and it has been field 
tested with different and multiple SPI devices simultaneously in our 
large customer bases, I don't feel like to change the default mode to 
dummy cs mode and I am more comfortable to stick with prepend mode.

So my preferred choice is to use prepend mode as default. In the 
unlikely event that SPI transfers can not be merged for prepend mode, 
we switch to dummy cs workaround mode and force the transfer to run at 
25MHz. This make sure everything works under the hood automatically and 
no dts setting is required. I can add sysfs knobs to override this 
behavior and force to use either mode just in case. Does this sounds 
good to you?
  
Mark Brown Jan. 19, 2023, 1:09 p.m. UTC | #3
On Wed, Jan 18, 2023 at 03:09:39PM -0800, William Zhang wrote:

> So my preferred choice is to use prepend mode as default. In the unlikely
> event that SPI transfers can not be merged for prepend mode, we switch to
> dummy cs workaround mode and force the transfer to run at 25MHz. This make
> sure everything works under the hood automatically and no dts setting is
> required. I can add sysfs knobs to override this behavior and force to use
> either mode just in case. Does this sounds good to you?

I think so.
  

Patch

diff --git a/drivers/spi/spi-bcm63xx-hsspi.c b/drivers/spi/spi-bcm63xx-hsspi.c
index 398c412dcc3e..b5043251edec 100644
--- a/drivers/spi/spi-bcm63xx-hsspi.c
+++ b/drivers/spi/spi-bcm63xx-hsspi.c
@@ -111,6 +111,7 @@  struct bcm63xx_hsspi {
 
 	u32 speed_hz;
 	u8 cs_polarity;
+	bool use_cs_workaround;
 	int irq;
 };
 
@@ -161,16 +162,18 @@  static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t)
 {
 	struct bcm63xx_hsspi *bs = spi_master_get_devdata(spi->master);
 	unsigned int chip_select = spi->chip_select;
+	unsigned int hw_cs;
 	u16 opcode = 0;
 	int pending = t->len;
 	int step_size = HSSPI_BUFFER_LEN;
 	const u8 *tx = t->tx_buf;
 	u8 *rx = t->rx_buf;
-	u32 val = 0;
 	unsigned long limit;
+	u32 reg = 0;
 
 	bcm63xx_hsspi_set_clk(bs, spi, t->speed_hz);
-	bcm63xx_hsspi_set_cs(bs, spi->chip_select, true);
+	if (bs->use_cs_workaround)
+		bcm63xx_hsspi_set_cs(bs, spi->chip_select, true);
 
 	if (tx && rx)
 		opcode = HSSPI_OP_READ_WRITE;
@@ -187,14 +190,15 @@  static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t)
 		opcode |= HSSPI_OP_MULTIBIT;
 
 		if (t->rx_nbits == SPI_NBITS_DUAL)
-			val |= 1 << MODE_CTRL_MULTIDATA_RD_SIZE_SHIFT;
+			reg |= 1 << MODE_CTRL_MULTIDATA_RD_SIZE_SHIFT;
 		if (t->tx_nbits == SPI_NBITS_DUAL)
-			val |= 1 << MODE_CTRL_MULTIDATA_WR_SIZE_SHIFT;
+			reg |= 1 << MODE_CTRL_MULTIDATA_WR_SIZE_SHIFT;
 	}
 
-	__raw_writel(val | 0xff,
+	__raw_writel(reg | 0xff,
 		     bs->regs + HSSPI_PROFILE_MODE_CTRL_REG(chip_select));
 
+	hw_cs = bs->use_cs_workaround ? !chip_select : chip_select;
 	while (pending > 0) {
 		int curr_step = min_t(int, step_size, pending);
 
@@ -211,11 +215,10 @@  static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t)
 			__raw_writel(HSSPI_PINGx_CMD_DONE(0),
 				     bs->regs + HSSPI_INT_MASK_REG);
 
-		/* start the transfer */
-		__raw_writel(!chip_select << PINGPONG_CMD_SS_SHIFT |
-			     chip_select << PINGPONG_CMD_PROFILE_SHIFT |
-			     PINGPONG_COMMAND_START_NOW,
-			     bs->regs + HSSPI_PINGPONG_COMMAND_REG(0));
+		reg =  hw_cs << PINGPONG_CMD_SS_SHIFT |
+			    chip_select << PINGPONG_CMD_PROFILE_SHIFT |
+			    PINGPONG_COMMAND_START_NOW;
+		__raw_writel(reg, bs->regs + HSSPI_PINGPONG_COMMAND_REG(0));
 
 		if (bs->irq > 0) {
 			if (wait_for_completion_timeout(&bs->done, HZ) == 0)
@@ -223,14 +226,15 @@  static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t)
 		} else {
 			/* polling mode checks for status busy bit */
 			limit = jiffies + msecs_to_jiffies(HSSPI_POLL_STATUS_TIMEOUT_MS);
+
 			while (!time_after(jiffies, limit)) {
-				val = __raw_readl(bs->regs + HSSPI_PINGPONG_STATUS_REG(0));
-				if (val & HSSPI_PINGPONG_STATUS_SRC_BUSY)
+				reg = __raw_readl(bs->regs + HSSPI_PINGPONG_STATUS_REG(0));
+				if (reg & HSSPI_PINGPONG_STATUS_SRC_BUSY)
 					cpu_relax();
 				else
 					break;
 			}
-			if (val & HSSPI_PINGPONG_STATUS_SRC_BUSY)
+			if (reg & HSSPI_PINGPONG_STATUS_SRC_BUSY)
 				goto err_timeout;
 		}
 
@@ -311,8 +315,10 @@  static int bcm63xx_hsspi_transfer_one(struct spi_master *master,
 	 * e. At the end restore the polarities again to their default values.
 	 */
 
-	dummy_cs = !spi->chip_select;
-	bcm63xx_hsspi_set_cs(bs, dummy_cs, true);
+	if (bs->use_cs_workaround) {
+		dummy_cs = !spi->chip_select;
+		bcm63xx_hsspi_set_cs(bs, dummy_cs, true);
+	}
 
 	list_for_each_entry(t, &msg->transfers, transfer_list) {
 		status = bcm63xx_hsspi_do_txrx(spi, t);
@@ -343,9 +349,11 @@  static int bcm63xx_hsspi_transfer_one(struct spi_master *master,
 		restore_polarity = !t->cs_change;
 	}
 
-	bcm63xx_hsspi_set_cs(bs, dummy_cs, false);
-	if (restore_polarity)
-		bcm63xx_hsspi_set_cs(bs, spi->chip_select, false);
+	if (bs->use_cs_workaround) {
+		bcm63xx_hsspi_set_cs(bs, dummy_cs, false);
+		if (restore_polarity)
+			bcm63xx_hsspi_set_cs(bs, spi->chip_select, false);
+	}
 
 	msg->status = status;
 	spi_finalize_current_message(master);
@@ -445,8 +453,17 @@  static int bcm63xx_hsspi_probe(struct platform_device *pdev)
 	init_completion(&bs->done);
 
 	master->dev.of_node = dev->of_node;
-	if (!dev->of_node)
+	if (!dev->of_node) {
 		master->bus_num = HSSPI_BUS_NUM;
+		/* use dummy cs workaround on old mips dev with no dts support */
+		bs->use_cs_workaround = true;
+	} else {
+		/* check if dummy cs workaround is needed from device tree */
+		bs->use_cs_workaround = of_property_read_bool(
+				    dev->of_node, "brcm,use-cs-workaround");
+	}
+	/* tmp hack. hard code to use cs workaround before prepend mode is added */
+	bs->use_cs_workaround = true;
 
 	of_property_read_u32(dev->of_node, "num-cs", &num_cs);
 	if (num_cs > 8) {