diff mbox series

[11/16] spi: bcm63xx-hsspi: Add prepend feature support

Message ID	20230106200809.330769-12-william.zhang@broadcom.com
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; From: William Zhang <william.zhang@broadcom.com> To: Linux SPI List <linux-spi@vger.kernel.org>, Broadcom Kernel List <bcm-kernel-feedback-list@broadcom.com> Cc: anand.gore@broadcom.com, tomer.yacoby@broadcom.com, dan.beygelman@broadcom.com, joel.peshkin@broadcom.com, f.fainelli@gmail.com, jonas.gorski@gmail.com, kursad.oney@broadcom.com, dregan@mail.com, William Zhang <william.zhang@broadcom.com>, Mark Brown <broonie@kernel.org>, linux-kernel@vger.kernel.org Subject: [PATCH 11/16] spi: bcm63xx-hsspi: Add prepend feature support Date: Fri, 6 Jan 2023 12:08:03 -0800 Message-Id: <20230106200809.330769-12-william.zhang@broadcom.com> In-Reply-To: <20230106200809.330769-1-william.zhang@broadcom.com> References: <20230106200809.330769-1-william.zhang@broadcom.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	spi: bcm63xx-hsspi: driver and doc updates \| [00/16] spi: bcm63xx-hsspi: driver and doc updates [01/16] dt-bindings: spi: Convert bcm63xx-hsspi bindings to json-schema [02/16] dt-bindings: spi: Add bcmbca-hsspi controller support [03/16] dt-bindings: spi: Add spi peripheral specific property [04/16] ARM: dts: broadcom: bcmbca: Add spi controller node [05/16] arm64: dts: broadcom: bcmbca: Add spi controller node [06/16] spi: bcm63xx-hsspi: Endianness fix for ARM based SoC [07/16] spi: bcm63xx-hsspi: Add polling mode support [08/16] spi: bcm63xx-hsspi: Handle cs_change correctly [09/16] spi: bcm63xx-hsspi: Fix multi-bit mode setting [10/16] spi: bcm63xx-hsspi: Make dummy cs workaround as an option [11/16] spi: bcm63xx-hsspi: Add prepend feature support [12/16] spi: bcm63xx-hsspi: Add clock gate disable option support [13/16] spi: spi-mem: Allow controller supporting mem_ops without exec_op [14/16] spi: bcm63xx-hsspi: prepend: Disable spi mem dual io read op support [15/16] spi: bcmbca-hsspi: Add driver for newer HSSPI controller [16/16] MAINTAINERS: Add entry for Broadcom Broadband SoC HS SPI drivers

Commit Message

William Zhang Jan. 6, 2023, 8:08 p.m. UTC

  Multiple transfers within a SPI message may be combined into one
transfer to the controller using its prepend feature. A SPI message is
prependable only if the following are all true:
  * One or more half duplex write transfer
  * Optional full duplex read/write at the end
  * No delay and cs_change between transfers

Most of the SPI device meets this requirements such as SPI NOR,
SPI NAND flash, Broadcom SPI voice card and etc. So this patch
makes use of the prepend feature as the default mode as it has no board
design requirement as the dummy cs workaround needs.

For any SPI device that does not meet the above requirement, dummy cs
mode can be used as long as the board design meets its cs pin usage
requirement or runs below 30MHz clock speed.

Signed-off-by: William Zhang <william.zhang@broadcom.com>
---

 drivers/spi/spi-bcm63xx-hsspi.c | 246 ++++++++++++++++++++++++++++----
 1 file changed, 221 insertions(+), 25 deletions(-)

Comments

Mark Brown Jan. 6, 2023, 10 p.m. UTC | #1

On Fri, Jan 06, 2023 at 12:08:03PM -0800, William Zhang wrote:
> Multiple transfers within a SPI message may be combined into one
> transfer to the controller using its prepend feature. A SPI message is
> prependable only if the following are all true:
>   * One or more half duplex write transfer
>   * Optional full duplex read/write at the end
>   * No delay and cs_change between transfers

There is nothing driver specific here, this should be implemented in the
core - we have existing logic to rewrite messages to match driver
constraints, this could be added there possibly with flags to allow
drivers to disable or enable the merging if they've got special
requirements.

William Zhang Jan. 7, 2023, 3:52 a.m. UTC | #2

On 01/06/2023 02:00 PM, Mark Brown wrote:
> On Fri, Jan 06, 2023 at 12:08:03PM -0800, William Zhang wrote:
>> Multiple transfers within a SPI message may be combined into one
>> transfer to the controller using its prepend feature. A SPI message is
>> prependable only if the following are all true:
>>    * One or more half duplex write transfer
>>    * Optional full duplex read/write at the end
>>    * No delay and cs_change between transfers
> 
> There is nothing driver specific here, this should be implemented in the
> core - we have existing logic to rewrite messages to match driver
> constraints, this could be added there possibly with flags to allow
> drivers to disable or enable the merging if they've got special
> requirements.
> 
My understanding of combining the spi transfer in the core level does 
not quite work out to our controller.  For example, for a spi message 
with three transfers, tx, tx and rx. We can possibly combine them in 
single duplex tx/rx transfer in the core. But this will be treated as 
duplex transaction in our controller level which require tx and rx data 
happens at the same time. Obviously this won't work when rx depends on 
tx happening first. We can not differentiate this combined duplex 
transfer from the true duplex transfer unless there is some flag to 
indicate that. Also there is limit of max tx length as the prepend 
buffer so maybe another parameter.  And another reason to be done in the 
driver level is this prepend mode has dependency on dummy cs workaround 
which is driver level parameter currently.  I am not sure how practical 
and useful this is to factor them out to the core level?  Maybe I didn't 
fully understand your comments and appreciate if you can elaborate or 
point me to the related core code.

Mark Brown Jan. 9, 2023, 7:31 p.m. UTC | #3

On Fri, Jan 06, 2023 at 07:52:35PM -0800, William Zhang wrote:
> On 01/06/2023 02:00 PM, Mark Brown wrote:
> > On Fri, Jan 06, 2023 at 12:08:03PM -0800, William Zhang wrote:

> > > Multiple transfers within a SPI message may be combined into one
> > > transfer to the controller using its prepend feature. A SPI message is
> > > prependable only if the following are all true:
> > >    * One or more half duplex write transfer
> > >    * Optional full duplex read/write at the end
> > >    * No delay and cs_change between transfers

> > There is nothing driver specific here, this should be implemented in the
> > core - we have existing logic to rewrite messages to match driver
> > constraints, this could be added there possibly with flags to allow
> > drivers to disable or enable the merging if they've got special
> > requirements.

> My understanding of combining the spi transfer in the core level does not
> quite work out to our controller.  For example, for a spi message with three
> transfers, tx, tx and rx. We can possibly combine them in single duplex
> tx/rx transfer in the core. But this will be treated as duplex transaction
> in our controller level which require tx and rx data happens at the same
> time. Obviously this won't work when rx depends on tx happening first. We

I'm saying that if this logic is useful then implement in the core
rather than in the driver.

> can not differentiate this combined duplex transfer from the true duplex
> transfer unless there is some flag to indicate that. Also there is limit of
> max tx length as the prepend buffer so maybe another parameter.  And another
> reason to be done in the driver level is this prepend mode has dependency on
> dummy cs workaround which is driver level parameter currently.  I am not
> sure how practical and useful this is to factor them out to the core level?

If this relies on software control of the chip select (which is what I
*think* your dummy CS workaround thing is about, the other patch about
that is really hard to understand) then I'm confused about what the
advantage is?

William Zhang Jan. 9, 2023, 8:43 p.m. UTC | #4

On 01/09/2023 11:31 AM, Mark Brown wrote:
> On Fri, Jan 06, 2023 at 07:52:35PM -0800, William Zhang wrote:
>> On 01/06/2023 02:00 PM, Mark Brown wrote:
>>> On Fri, Jan 06, 2023 at 12:08:03PM -0800, William Zhang wrote:
> 
>>>> Multiple transfers within a SPI message may be combined into one
>>>> transfer to the controller using its prepend feature. A SPI message is
>>>> prependable only if the following are all true:
>>>>     * One or more half duplex write transfer
>>>>     * Optional full duplex read/write at the end
>>>>     * No delay and cs_change between transfers
> 
>>> There is nothing driver specific here, this should be implemented in the
>>> core - we have existing logic to rewrite messages to match driver
>>> constraints, this could be added there possibly with flags to allow
>>> drivers to disable or enable the merging if they've got special
>>> requirements.
> 
>> My understanding of combining the spi transfer in the core level does not
>> quite work out to our controller.  For example, for a spi message with three
>> transfers, tx, tx and rx. We can possibly combine them in single duplex
>> tx/rx transfer in the core. But this will be treated as duplex transaction
>> in our controller level which require tx and rx data happens at the same
>> time. Obviously this won't work when rx depends on tx happening first. We
> 
> I'm saying that if this logic is useful then implement in the core
> rather than in the driver.
> 
>> can not differentiate this combined duplex transfer from the true duplex
>> transfer unless there is some flag to indicate that. Also there is limit of
>> max tx length as the prepend buffer so maybe another parameter.  And another
>> reason to be done in the driver level is this prepend mode has dependency on
>> dummy cs workaround which is driver level parameter currently.  I am not
>> sure how practical and useful this is to factor them out to the core level?
> 
> If this relies on software control of the chip select (which is what I
> *think* your dummy CS workaround thing is about, the other patch about
> that is really hard to understand) then I'm confused about what the
> advantage is?
Dummy CS workaround is implemented by Jonas when he first upstream the 
driver. It does not work on all the board designs so prepend mode is 
introduced. I have some detail explanation on this in [PATCH 10/16] spi: 
bcm63xx-hsspi: Make dummy cs workaround as an option.

The controller only work in one mode and that's why driver code has some 
dependency between these two modes. The advantage of the premode is it 
works on all hw design however it does not support all types mem_ops 
operation. That is why you see the patch 14 to disable the dual io mem 
op. But dummy cs workaround can support this and in case there is such 
pattern from non mem op spi transaction, dummy cs workaround can be used 
as long as it does not have the board design limitation.   So neither 
one is perfect but hopefully with both options available, we can cover 
all the cases.

You mentioned there is some existing logic to rewrite messages to match 
driver constraints in the core driver.  I didn't see it when I did a 
quick search on spi.c. I will take a deep look into the file. But if you 
can point me where this logic is so I can be sure that I am looking at 
the right place and will double check if this can be done or not in the 
core level.  Thanks!


>

Mark Brown Jan. 10, 2023, 9:18 p.m. UTC | #5

On Mon, Jan 09, 2023 at 12:43:53PM -0800, William Zhang wrote:
> On 01/09/2023 11:31 AM, Mark Brown wrote:

> > If this relies on software control of the chip select (which is what I
> > *think* your dummy CS workaround thing is about, the other patch about
> > that is really hard to understand) then I'm confused about what the
> > advantage is?

> Dummy CS workaround is implemented by Jonas when he first upstream the
> driver. It does not work on all the board designs so prepend mode is
> introduced. I have some detail explanation on this in [PATCH 10/16] spi:
> bcm63xx-hsspi: Make dummy cs workaround as an option.

Yes, it is the description in patch 10 that I was having a lot of
trouble following.

> The controller only work in one mode and that's why driver code has some
> dependency between these two modes. The advantage of the premode is it works
> on all hw design however it does not support all types mem_ops operation.
> That is why you see the patch 14 to disable the dual io mem op. But dummy cs
> workaround can support this and in case there is such pattern from non mem
> op spi transaction, dummy cs workaround can be used as long as it does not
> have the board design limitation.   So neither one is perfect but hopefully
> with both options available, we can cover all the cases.

We can't switch modes per message?

> You mentioned there is some existing logic to rewrite messages to match
> driver constraints in the core driver.  I didn't see it when I did a quick
> search on spi.c. I will take a deep look into the file. But if you can point
> me where this logic is so I can be sure that I am looking at the right place
> and will double check if this can be done or not in the core level.  Thanks!

spi_replace_transfers().

William Zhang Jan. 11, 2023, 7:42 p.m. UTC | #6

On 01/10/2023 01:18 PM, Mark Brown wrote:
> On Mon, Jan 09, 2023 at 12:43:53PM -0800, William Zhang wrote:
>> On 01/09/2023 11:31 AM, Mark Brown wrote:
> 
>>> If this relies on software control of the chip select (which is what I
>>> *think* your dummy CS workaround thing is about, the other patch about
>>> that is really hard to understand) then I'm confused about what the
>>> advantage is?
> 
>> Dummy CS workaround is implemented by Jonas when he first upstream the
>> driver. It does not work on all the board designs so prepend mode is
>> introduced. I have some detail explanation on this in [PATCH 10/16] spi:
>> bcm63xx-hsspi: Make dummy cs workaround as an option.
> 
> Yes, it is the description in patch 10 that I was having a lot of
> trouble following.
> 
Sorry that my description is not clear...  I can certainly improve it if 
you can let me know what is not clear.

>> The controller only work in one mode and that's why driver code has some
>> dependency between these two modes. The advantage of the premode is it works
>> on all hw design however it does not support all types mem_ops operation.
>> That is why you see the patch 14 to disable the dual io mem op. But dummy cs
>> workaround can support this and in case there is such pattern from non mem
>> op spi transaction, dummy cs workaround can be used as long as it does not
>> have the board design limitation.   So neither one is perfect but hopefully
>> with both options available, we can cover all the cases.
> 
> We can't switch modes per message?
> 
Technically yes. If the code finds the message is not prependable,  it 
can try to use dummy cs workaround to transfer the message but it may 
also fail if the board design does not work with this workaround. I can 
add this if you think this is good to have.

>> You mentioned there is some existing logic to rewrite messages to match
>> driver constraints in the core driver.  I didn't see it when I did a quick
>> search on spi.c. I will take a deep look into the file. But if you can point
>> me where this logic is so I can be sure that I am looking at the right place
>> and will double check if this can be done or not in the core level.  Thanks!
> 
> spi_replace_transfers().
> 
Okay I saw this function is used by spi_split_transfers_maxsize which a 
few drivers use to limit the transfer size and it make sense.  I can 
come up something like spi_merge_transfers to be used by my driver's 
prepend function.  But it has the same issue I mentioned early as the 
these tx, rx transfers have the dependency on the order they present in 
the original transfer list for my prepend function to work.  And for the 
same reason, it won't be generally useful for other drivers.

Mark Brown Jan. 12, 2023, 4:57 p.m. UTC | #7

On Wed, Jan 11, 2023 at 11:42:57AM -0800, William Zhang wrote:
> On 01/10/2023 01:18 PM, Mark Brown wrote:

> > spi_replace_transfers().

> Okay I saw this function is used by spi_split_transfers_maxsize which a few
> drivers use to limit the transfer size and it make sense.  I can come up
> something like spi_merge_transfers to be used by my driver's prepend
> function.  But it has the same issue I mentioned early as the these tx, rx
> transfers have the dependency on the order they present in the original
> transfer list for my prepend function to work.  And for the same reason, it
> won't be generally useful for other drivers.

I wouldn't be surprised if something else turned up which had similar
constraints, SPI isn't the most complex thing ever so there's a lot of
patterns in controlers that get reproduced.

diff mbox series

Patch

diff --git a/drivers/spi/spi-bcm63xx-hsspi.c b/drivers/spi/spi-bcm63xx-hsspi.c
index b5043251edec..58f2b495c13c 100644
--- a/drivers/spi/spi-bcm63xx-hsspi.c
+++ b/drivers/spi/spi-bcm63xx-hsspi.c
@@ -113,8 +113,208 @@  struct bcm63xx_hsspi {
 	u8 cs_polarity;
 	bool use_cs_workaround;
 	int irq;
+	u32 prepend_cnt;
+	u8 *prepend_buf;
 };
 
+static void bcm63xx_hsspi_set_clk(struct bcm63xx_hsspi *bs,
+				  struct spi_device *spi, int hz);
+
+static size_t bcm63xx_hsspi_max_message_size(struct spi_device *spi)
+{
+	return HSSPI_BUFFER_LEN - HSSPI_OPCODE_LEN;
+}
+
+static int bcm63xx_hsspi_wait_cmd(struct bcm63xx_hsspi *bs)
+{
+	unsigned long limit;
+	u32 reg = 0;
+	int rc = 0;
+
+	if (bs->irq > 0) {
+		if (wait_for_completion_timeout(&bs->done, HZ) == 0)
+			rc = 1;
+	} else {
+		/* polling mode checks for status busy bit */
+		limit = jiffies + msecs_to_jiffies(HSSPI_POLL_STATUS_TIMEOUT_MS);
+
+		while (!time_after(jiffies, limit)) {
+			reg = __raw_readl(bs->regs + HSSPI_PINGPONG_STATUS_REG(0));
+			if (reg & HSSPI_PINGPONG_STATUS_SRC_BUSY)
+				cpu_relax();
+			else
+				break;
+		}
+		if (reg & HSSPI_PINGPONG_STATUS_SRC_BUSY)
+			rc = 1;
+	}
+
+	if (rc)
+		dev_err(&bs->pdev->dev, "transfer timed out!\n");
+
+	return rc;
+}
+
+static bool bcm63xx_check_msg_prependable(struct spi_master *master,
+					  struct spi_message *msg,
+					  struct spi_transfer *t_prepend)
+{
+
+	struct bcm63xx_hsspi *bs = spi_master_get_devdata(master);
+	bool prepend = false, tx_only = false;
+	struct spi_transfer *t;
+
+	/* If cs dummy workaround used, no need to prepend message */
+	if (bs->use_cs_workaround)
+		goto check_done;
+
+	/*
+	 * Multiple transfers within a message may be combined into one transfer
+	 * to the controller using its prepend feature. A SPI message is prependable
+	 * only if the following are all true:
+	 *   1. One or more half duplex write transfer
+	 *   2. Optional full duplex read/write at the end
+	 *   3. No delay and cs_change between transfers
+	 */
+	bs->prepend_cnt = 0;
+	list_for_each_entry(t, &msg->transfers, transfer_list) {
+		if ((spi_delay_to_ns(&t->delay, t) > 0) || t->cs_change) {
+			dev_warn(&bs->pdev->dev,
+				 "Delay or cs change not supported in prepend mode!\n");
+			break;
+		}
+
+		tx_only = false;
+		if (t->tx_buf && !t->rx_buf) {
+			tx_only = true;
+			if (bs->prepend_cnt + t->len >
+			    (HSSPI_BUFFER_LEN - HSSPI_OPCODE_LEN)) {
+				dev_warn(&bs->pdev->dev,
+					 "exceed max buf len, abort prepending transfers!\n");
+				break;
+			}
+			memcpy(bs->prepend_buf + bs->prepend_cnt, t->tx_buf,
+			       t->len);
+			bs->prepend_cnt += t->len;
+		} else {
+			if (!list_is_last(&t->transfer_list, &msg->transfers)) {
+				dev_warn(&bs->pdev->dev,
+					 "rx/tx_rx transfer not supported when it is not last one!\n");
+				break;
+			}
+		}
+
+		if (list_is_last(&t->transfer_list, &msg->transfers)) {
+			memcpy(t_prepend, t, sizeof(struct spi_transfer));
+
+			if (tx_only) {
+				/*
+				 * if the last one is also a tx only transfer, merge all
+				 * them into one single tx transfer
+				 */
+				t_prepend->len = bs->prepend_cnt;
+				t_prepend->tx_buf = bs->prepend_buf;
+				bs->prepend_cnt = 0;
+			} else {
+				/*
+				 * if the last one is not a tx only transfer, all the previous
+				 * transfers are sent through prepend bytes and make sure it does
+				 * not exceed the max prepend len
+				 */
+				if (bs->prepend_cnt > HSSPI_MAX_PREPEND_LEN) {
+					dev_warn(&bs->pdev->dev,
+						"exceed max prepend len, abort prepending transfers!\n");
+					break;
+				}
+			}
+			prepend = true;
+		}
+	}
+
+check_done:
+	if (!bs->use_cs_workaround && !prepend)
+		dev_warn(&bs->pdev->dev,
+		    "Msg not prependable and cs war not used. Transfer may fail!\n");
+
+	return prepend;
+}
+
+static int bcm63xx_hsspi_do_prepend_txrx(struct spi_device *spi,
+					 struct spi_transfer *t)
+{
+	struct bcm63xx_hsspi *bs = spi_master_get_devdata(spi->master);
+	unsigned int chip_select = spi->chip_select;
+	u16 opcode = 0;
+	const u8 *tx = t->tx_buf;
+	u8 *rx = t->rx_buf;
+	u32 reg = 0;
+
+	/*
+	 * shouldn't happen as we set the max_message_size in the probe.
+	 * but check it again in case some driver does not honor the max size
+	 */
+	if (t->len + bs->prepend_cnt > (HSSPI_BUFFER_LEN - HSSPI_OPCODE_LEN)) {
+		dev_warn(&bs->pdev->dev,
+			 "Prepend message large than fifo size len %d prepend %d\n",
+			 t->len, bs->prepend_cnt);
+		return -EINVAL;
+	}
+
+	bcm63xx_hsspi_set_clk(bs, spi, t->speed_hz);
+
+	if (tx && rx)
+		opcode = HSSPI_OP_READ_WRITE;
+	else if (tx)
+		opcode = HSSPI_OP_WRITE;
+	else if (rx)
+		opcode = HSSPI_OP_READ;
+
+	if ((opcode == HSSPI_OP_READ && t->rx_nbits == SPI_NBITS_DUAL) ||
+	    (opcode == HSSPI_OP_WRITE && t->tx_nbits == SPI_NBITS_DUAL)) {
+		opcode |= HSSPI_OP_MULTIBIT;
+
+		if (t->rx_nbits == SPI_NBITS_DUAL) {
+			reg |= 1 << MODE_CTRL_MULTIDATA_RD_SIZE_SHIFT;
+			reg |= bs->prepend_cnt << MODE_CTRL_MULTIDATA_RD_STRT_SHIFT;
+		}
+		if (t->tx_nbits == SPI_NBITS_DUAL) {
+			reg |= 1 << MODE_CTRL_MULTIDATA_WR_SIZE_SHIFT;
+			reg |= bs->prepend_cnt << MODE_CTRL_MULTIDATA_WR_STRT_SHIFT;
+		}
+	}
+
+	reg |= bs->prepend_cnt << MODE_CTRL_PREPENDBYTE_CNT_SHIFT;
+	__raw_writel(reg | 0xff,
+		     bs->regs + HSSPI_PROFILE_MODE_CTRL_REG(chip_select));
+
+	reinit_completion(&bs->done);
+	if (bs->prepend_cnt)
+		memcpy_toio(bs->fifo + HSSPI_OPCODE_LEN, bs->prepend_buf,
+			    bs->prepend_cnt);
+	if (tx)
+		memcpy_toio(bs->fifo + HSSPI_OPCODE_LEN + bs->prepend_cnt, tx,
+			    t->len);
+
+	__raw_writew(cpu_to_be16(opcode | t->len), bs->fifo);
+	/* enable interrupt */
+	if (bs->irq > 0)
+		__raw_writel(HSSPI_PINGx_CMD_DONE(0), bs->regs + HSSPI_INT_MASK_REG);
+
+	/* start the transfer */
+	reg = chip_select << PINGPONG_CMD_SS_SHIFT |
+	    chip_select << PINGPONG_CMD_PROFILE_SHIFT |
+	    PINGPONG_COMMAND_START_NOW;
+	__raw_writel(reg, bs->regs + HSSPI_PINGPONG_COMMAND_REG(0));
+
+	if (bcm63xx_hsspi_wait_cmd(bs))
+		return -ETIMEDOUT;
+
+	if (rx)
+		memcpy_fromio(rx, bs->fifo, t->len);
+
+	return 0;
+}
+
 static void bcm63xx_hsspi_set_cs(struct bcm63xx_hsspi *bs, unsigned int cs,
 				 bool active)
 {
@@ -168,7 +368,6 @@  static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t)
 	int step_size = HSSPI_BUFFER_LEN;
 	const u8 *tx = t->tx_buf;
 	u8 *rx = t->rx_buf;
-	unsigned long limit;
 	u32 reg = 0;
 
 	bcm63xx_hsspi_set_clk(bs, spi, t->speed_hz);
@@ -220,23 +419,8 @@  static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t)
 			    PINGPONG_COMMAND_START_NOW;
 		__raw_writel(reg, bs->regs + HSSPI_PINGPONG_COMMAND_REG(0));
 
-		if (bs->irq > 0) {
-			if (wait_for_completion_timeout(&bs->done, HZ) == 0)
-				goto err_timeout;
-		} else {
-			/* polling mode checks for status busy bit */
-			limit = jiffies + msecs_to_jiffies(HSSPI_POLL_STATUS_TIMEOUT_MS);
-
-			while (!time_after(jiffies, limit)) {
-				reg = __raw_readl(bs->regs + HSSPI_PINGPONG_STATUS_REG(0));
-				if (reg & HSSPI_PINGPONG_STATUS_SRC_BUSY)
-					cpu_relax();
-				else
-					break;
-			}
-			if (reg & HSSPI_PINGPONG_STATUS_SRC_BUSY)
-				goto err_timeout;
-		}
+		if (bcm63xx_hsspi_wait_cmd(bs))
+			return -ETIMEDOUT;
 
 		if (rx) {
 			memcpy_fromio(rx, bs->fifo, curr_step);
@@ -247,10 +431,6 @@  static int bcm63xx_hsspi_do_txrx(struct spi_device *spi, struct spi_transfer *t)
 	}
 
 	return 0;
-
-err_timeout:
-	dev_err(&bs->pdev->dev, "transfer timed out!\n");
-	return -ETIMEDOUT;
 }
 
 static int bcm63xx_hsspi_setup(struct spi_device *spi)
@@ -300,8 +480,16 @@  static int bcm63xx_hsspi_transfer_one(struct spi_master *master,
 	int dummy_cs;
 	u32 reg;
 	bool restore_polarity = true;
+	struct spi_transfer t_prepend;
+
+	if (bcm63xx_check_msg_prependable(master, msg, &t_prepend)) {
+		status = bcm63xx_hsspi_do_prepend_txrx(spi, &t_prepend);
+		msg->actual_length += (t_prepend.len + bs->prepend_cnt);
+		goto msg_done;
+	}
 
-	/* This controller does not support keeping CS active during idle.
+	/*
+	 * This controller does not support keeping CS active during idle.
 	 * To work around this, we use the following ugly hack:
 	 *
 	 * a. Invert the target chip select's polarity so it will be active.
@@ -355,6 +543,7 @@  static int bcm63xx_hsspi_transfer_one(struct spi_master *master,
 			bcm63xx_hsspi_set_cs(bs, spi->chip_select, false);
 	}
 
+msg_done:
 	msg->status = status;
 	spi_finalize_current_message(master);
 
@@ -448,6 +637,11 @@  static int bcm63xx_hsspi_probe(struct platform_device *pdev)
 	bs->speed_hz = rate;
 	bs->fifo = (u8 __iomem *)(bs->regs + HSSPI_FIFO_REG(0));
 	bs->irq = irq;
+	bs->prepend_buf = devm_kzalloc(dev, HSSPI_BUFFER_LEN, GFP_KERNEL);
+	if (!bs->prepend_buf) {
+		ret = -ENOMEM;
+		goto out_put_master;
+	}
 
 	mutex_init(&bs->bus_mutex);
 	init_completion(&bs->done);
@@ -462,8 +656,6 @@  static int bcm63xx_hsspi_probe(struct platform_device *pdev)
 		bs->use_cs_workaround = of_property_read_bool(
 				    dev->of_node, "brcm,use-cs-workaround");
 	}
-	/* tmp hack. hard code to use cs workaround before prepend mode is added */
-	bs->use_cs_workaround = true;
 
 	of_property_read_u32(dev->of_node, "num-cs", &num_cs);
 	if (num_cs > 8) {
@@ -474,6 +666,10 @@  static int bcm63xx_hsspi_probe(struct platform_device *pdev)
 	master->num_chipselect = num_cs;
 	master->setup = bcm63xx_hsspi_setup;
 	master->transfer_one_message = bcm63xx_hsspi_transfer_one;
+	if (!bs->use_cs_workaround) {
+		master->max_transfer_size = bcm63xx_hsspi_max_message_size;
+		master->max_message_size = bcm63xx_hsspi_max_message_size;
+	}
 	master->mode_bits = SPI_CPOL | SPI_CPHA | SPI_CS_HIGH |
 			    SPI_RX_DUAL | SPI_TX_DUAL;
 	master->bits_per_word_mask = SPI_BPW_MASK(8);