[0/2] net: stmmac: add DT parameter to keep RX_CLK running in LPI state

Message ID 20230123133747.18896-1-andrey.konovalov@linaro.org
Series: net: stmmac: add DT parameter to keep RX_CLK running in LPI state

Message

Andrey Konovalov Jan. 23, 2023, 1:37 p.m. UTC
On my qcs404-based board, the Ethernet MAC has issues with handling
Rx LPI exit / Rx LPI entry interrupts.

When a "refresh transmission" is received in LPI mode, the driver may
see both the "Rx LPI exit" and "Rx LPI entry" bits set in a single read of
the GMAC4_LPI_CTRL_STATUS register (rather than "Rx LPI exit" first and
"Rx LPI entry" in a later read). In this case an interrupt storm happens:
the LPI interrupt is triggered every few microseconds, with all the status
bits in the GMAC4_LPI_CTRL_STATUS register reading as zero. The storm
continues until a normal non-zero status is read from the
GMAC4_LPI_CTRL_STATUS register (a single "Rx LPI exit" or "Tx LPI exit").

The cause seems to be that the hardware cannot properly clear
the "Rx LPI exit" interrupt if the GMAC4_LPI_CTRL_STATUS register is read
after Rx LPI mode has been entered again.
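
As an illustration of the three kinds of status reads described above, here is a minimal, self-contained sketch. It is not driver code: the bit positions and every name in it (`LPI_ST_*`, `classify_lpi_status`) are assumptions made for the example and should be checked against the driver headers and the IP databook.

```c
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define LPI_ST_TLPIEN (1u << 0) /* Tx LPI entry */
#define LPI_ST_TLPIEX (1u << 1) /* Tx LPI exit */
#define LPI_ST_RLPIEN (1u << 2) /* Rx LPI entry */
#define LPI_ST_RLPIEX (1u << 3) /* Rx LPI exit */

enum lpi_read_kind {
	LPI_READ_EMPTY,       /* no status bits set, as seen during the storm */
	LPI_READ_RX_COMBINED, /* Rx exit and Rx entry reported by the same read */
	LPI_READ_NORMAL,      /* any other non-zero status */
};

/* Classify one value read from the LPI control/status register. */
static enum lpi_read_kind classify_lpi_status(uint32_t status)
{
	const uint32_t rx_both = LPI_ST_RLPIEX | LPI_ST_RLPIEN;
	const uint32_t all = LPI_ST_TLPIEN | LPI_ST_TLPIEX | rx_both;

	status &= all; /* ignore control bits that share the register */
	if (!status)
		return LPI_READ_EMPTY;
	if ((status & rx_both) == rx_both)
		return LPI_READ_RX_COMBINED;
	return LPI_READ_NORMAL;
}
```

A helper like this could be used in debugging to count how often the combined read precedes a run of empty reads.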

The current driver unconditionally sets the "Clock-stop enable" bit
(bit 10 in PHY's PCS Control 1 register) when calling phy_init_eee().
Not setting this bit - so that the PHY continues to provide RX_CLK
to the ethernet controller during Rx LPI state - prevents the LPI
interrupt storm.

This patch set adds a new parameter to the stmmac DT:
snps,rx-clk-runs-in-lpi.
If this parameter is present in the device tree, the driver configures
the PHY not to stop RX_CLK after entering Rx LPI state.
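
The intended behaviour can be sketched as follows. This is a simplified model, not the patch itself: `struct plat_cfg` and `eee_clk_stop_enable()` are hypothetical names, and the only assumption about the real driver is that the result would be passed as the clock-stop argument of phy_init_eee() instead of a constant.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the platform data filled in from the device tree. */
struct plat_cfg {
	bool rx_clk_runs_in_lpi; /* true when snps,rx-clk-runs-in-lpi is present */
};

/*
 * Value for the "Clock-stop enable" request made to the PHY:
 * allow the PHY to stop RX_CLK in Rx LPI unless the DT parameter
 * asks for the clock to keep running.
 */
static bool eee_clk_stop_enable(const struct plat_cfg *plat)
{
	return !plat->rx_clk_runs_in_lpi;
}
```

With the parameter absent the behaviour stays as it is today (clock stop requested), so existing device trees are not affected.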

Andrey Konovalov (2):
  dt-bindings: net: snps,dwmac: add snps,rx-clk-runs-in-lpi parameter
  net: stmmac: consider snps,rx-clk-runs-in-lpi DT parameter

 Documentation/devicetree/bindings/net/snps,dwmac.yaml | 5 +++++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c     | 3 ++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 3 +++
 include/linux/stmmac.h                                | 1 +
 4 files changed, 11 insertions(+), 1 deletion(-)
  

Comments

Andrew Lunn Jan. 24, 2023, 1:04 a.m. UTC | #1
On Mon, Jan 23, 2023 at 04:37:45PM +0300, Andrey Konovalov wrote:
> On my qcs404 based board the ethernet MAC has issues with handling
> Rx LPI exit / Rx LPI entry interrupts.
> 
> When in LPI mode the "refresh transmission" is received, the driver may
> see both "Rx LPI exit", and "Rx LPI entry" bits set in the single read from
> GMAC4_LPI_CTRL_STATUS register (vs "Rx LPI exit" first, and "Rx LPI entry"
> then). In this case an interrupt storm happens: the LPI interrupt is
> triggered every few microseconds - with all the status bits in the
> GMAC4_LPI_CTRL_STATUS register being read as zeros. This interrupt storm
> continues until a normal non-zero status is read from GMAC4_LPI_CTRL_STATUS
> register (single "Rx LPI exit", or "Tx LPI exit").
> 
> The reason seems to be in the hardware not being able to properly clear
> the "Rx LPI exit" interrupt if GMAC4_LPI_CTRL_STATUS register is read
> after Rx LPI mode is entered again.
> 
> The current driver unconditionally sets the "Clock-stop enable" bit
> (bit 10 in PHY's PCS Control 1 register) when calling phy_init_eee().
> Not setting this bit - so that the PHY continues to provide RX_CLK
> to the ethernet controller during Rx LPI state - prevents the LPI
> interrupt storm.
> 
> This patch set adds a new parameter to the stmmac DT:
> snps,rx-clk-runs-in-lpi.
> If this parameter is present in the device tree, the driver configures
> the PHY not to stop RX_CLK after entering Rx LPI state.

Do we really need yet another device tree parameter? Could
dwmac-qcom-ethqos.c just do this unconditionally? Is the interrupt
controller part of the licensed IP, or is it from QCOM? If it is part
of the licensed IP, it is probably broken for other devices as well,
so maybe it should be a quirk for all devices of a particular version
of the IP?

   Andrew
  
Andrey Konovalov Jan. 24, 2023, 8:49 a.m. UTC | #2
Hi Andrew,

On 24.01.2023 04:04, Andrew Lunn wrote:
> On Mon, Jan 23, 2023 at 04:37:45PM +0300, Andrey Konovalov wrote:
>> On my qcs404 based board the ethernet MAC has issues with handling
>> Rx LPI exit / Rx LPI entry interrupts.
>>
>> When in LPI mode the "refresh transmission" is received, the driver may
>> see both "Rx LPI exit", and "Rx LPI entry" bits set in the single read from
>> GMAC4_LPI_CTRL_STATUS register (vs "Rx LPI exit" first, and "Rx LPI entry"
>> then). In this case an interrupt storm happens: the LPI interrupt is
>> triggered every few microseconds - with all the status bits in the
>> GMAC4_LPI_CTRL_STATUS register being read as zeros. This interrupt storm
>> continues until a normal non-zero status is read from GMAC4_LPI_CTRL_STATUS
>> register (single "Rx LPI exit", or "Tx LPI exit").
>>
>> The reason seems to be in the hardware not being able to properly clear
>> the "Rx LPI exit" interrupt if GMAC4_LPI_CTRL_STATUS register is read
>> after Rx LPI mode is entered again.
>>
>> The current driver unconditionally sets the "Clock-stop enable" bit
>> (bit 10 in PHY's PCS Control 1 register) when calling phy_init_eee().
>> Not setting this bit - so that the PHY continues to provide RX_CLK
>> to the ethernet controller during Rx LPI state - prevents the LPI
>> interrupt storm.
>>
>> This patch set adds a new parameter to the stmmac DT:
>> snps,rx-clk-runs-in-lpi.
>> If this parameter is present in the device tree, the driver configures
>> the PHY not to stop RX_CLK after entering Rx LPI state.
> 
> Do we really need yet another device tree parameter?

Indeed, there are quite a lot of them already (as this is a complex and
highly configurable device).

> Could
> dwmac-qcom-ethqos.c just do this unconditionally?

Never stopping RX_CLK in Rx LPI state would always work, but the power 
consumption would somewhat increase (in Rx LPI state). Some people do 
care about it.

> Is the interrupt
> controller part of the licensed IP, or is it from QCOM? If it is part
> of the licensed IP, it is probably broken for other devices as well,
> so maybe it should be a quirk for all devices of a particular version
> of the IP?

Most probably this is part of the Ethernet MAC IP, and it is quite
possible that the issue is specific to particular versions of the IP.
Unfortunately, I don't have the documentation related to this particular
issue, and I don't have test results for different IP versions. It
looks like testing Energy Efficient Ethernet (EEE) support isn't very
common (yes, it is enabled by default in the stmmac driver, but if the
Ethernet switch the device is connected to doesn't support EEE, then the
issue wouldn't show up).

Thanks,
Andrey

>     Andrew
  
Andrew Lunn Jan. 24, 2023, 2:09 p.m. UTC | #3
> > Could
> > dwmac-qcom-ethqos.c just do this unconditionally?
> 
> Never stopping RX_CLK in Rx LPI state would always work, but the power
> consumption would somewhat increase (in Rx LPI state). Some people do care
> about it.
>
> > Is the interrupt
> > controller part of the licensed IP, or is it from QCOM? If it is part
> > of the licensed IP, it is probably broken for other devices as well,
> > so maybe it should be a quirk for all devices of a particular version
> > of the IP?
> 
> Most probably this is the part of the ethernet MAC IP. And this is quite
> possible that the issue is specific for particular versions of the IP.
> Unfortunately I don't have the documentation related to this particular
> issue.

Please could you ask around. Do you have contacts in Qualcomm?
Contacts at Synopsys?

Ideally it would be nice to fix it for everybody, not just one SoC.

As for power consumption, EEE is negotiated. You could look at the
results of autoneg, and only enable this workaround if EEE is actually
part of the resolved results. And maybe look into the clock source,
and only enable this workaround if the PHY is the clock source.

	Andrew
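
The gating suggested above could look roughly like the sketch below. It is only a model of the decision: `struct link_state`, its fields and `keep_rx_clk_running()` are hypothetical, and how the driver would actually learn the autoneg result or the clock source is left open.

```c
#include <stdbool.h>

/* Hypothetical summary of the link after autonegotiation. */
struct link_state {
	bool eee_negotiated;       /* EEE is part of the resolved autoneg result */
	bool phy_is_rx_clk_source; /* RX_CLK is supplied by the PHY */
};

/*
 * Keep RX_CLK running in Rx LPI only when the workaround can matter:
 * the MAC is affected, EEE is actually in use, and the PHY drives RX_CLK.
 */
static bool keep_rx_clk_running(bool mac_has_lpi_irq_issue,
				const struct link_state *ls)
{
	return mac_has_lpi_irq_issue &&
	       ls->eee_negotiated &&
	       ls->phy_is_rx_clk_source;
}
```

In every other case the PHY would still be allowed to stop the clock, which limits the extra power consumption to links where the issue can occur.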
  
Rob Herring Jan. 25, 2023, 7:14 p.m. UTC | #4
On Tue, Jan 24, 2023 at 03:09:50PM +0100, Andrew Lunn wrote:
> > > Could
> > > dwmac-qcom-ethqos.c just do this unconditionally?
> > 
> > Never stopping RX_CLK in Rx LPI state would always work, but the power
> > consumption would somewhat increase (in Rx LPI state). Some people do care
> > about it.
> >
> > > Is the interrupt
> > > controller part of the licensed IP, or is it from QCOM? If it is part
> > > of the licensed IP, it is probably broken for other devices as well,
> > > so maybe it should be a quirk for all devices of a particular version
> > > of the IP?
> > 
> > Most probably this is the part of the ethernet MAC IP. And this is quite
> > possible that the issue is specific for particular versions of the IP.
> > Unfortunately I don't have the documentation related to this particular
> > issue.
> 
> Please could you ask around. Do you have contacts in Qualcomm?
> Contacts at Synopsys?
> 
> Ideally it would be nice to fix it for everybody, not just one SoC.

Yes, but to fix it for just one SoC, use the SoC-specific compatible to
imply the need for this. Then only a kernel update is needed for the fix,
not a kernel and dtb update.

Rob
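
A minimal sketch of deriving the quirk from the compatible string rather than from a new property is shown below. The table, the function name and the compatible string "qcom,qcs404-ethqos" are assumptions for illustration; in the kernel the check would go through the OF match helpers rather than a plain string comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Compatibles assumed to need RX_CLK running in Rx LPI (illustrative list). */
static const char *const lpi_quirk_compatibles[] = {
	"qcom,qcs404-ethqos",
};

/* Return true if the given compatible string implies the workaround. */
static bool compatible_needs_rx_clk_in_lpi(const char *compatible)
{
	size_t n = sizeof(lpi_quirk_compatibles) / sizeof(lpi_quirk_compatibles[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(compatible, lpi_quirk_compatibles[i]) == 0)
			return true;
	}
	return false;
}
```

Because the decision lives in the driver, adding another affected SoC later means extending the table, with no change to any dtb.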
  
Andrey Konovalov Jan. 26, 2023, 9:51 p.m. UTC | #5
On 25.01.2023 22:14, Rob Herring wrote:
> On Tue, Jan 24, 2023 at 03:09:50PM +0100, Andrew Lunn wrote:
>>>> Could
>>>> dwmac-qcom-ethqos.c just do this unconditionally?
>>>
>>> Never stopping RX_CLK in Rx LPI state would always work, but the power
>>> consumption would somewhat increase (in Rx LPI state). Some people do care
>>> about it.
>>>
>>>> Is the interrupt
>>>> controller part of the licensed IP, or is it from QCOM? If it is part
>>>> of the licensed IP, it is probably broken for other devices as well,
>>>> so maybe it should be a quirk for all devices of a particular version
>>>> of the IP?
>>>
>>> Most probably this is the part of the ethernet MAC IP. And this is quite
>>> possible that the issue is specific for particular versions of the IP.
>>> Unfortunately I don't have the documentation related to this particular
>>> issue.
>>
>> Please could you ask around.

I am on it, but it will take time.

>> Do you have contacts in Qualcomm?
>> Contacts at Synopsys?

Only at Qualcomm, I am afraid.

>> Ideally it would be nice to fix it for everybody, not just one SoC.
> 
> Yes, but to fix for just 1 SoC use the SoC specific compatible to imply
> the need for this. Then only a kernel update is needed to fix, not a
> kernel and dtb update.

That's a good point! Thanks!

I've just posted such a single-SoC version, in case this is the more
appropriate way to go:
https://lore.kernel.org/lkml/20230126213539.166298-1-andrey.konovalov@linaro.org/T/#t

> Rob

Thanks,
Andrey