[2/2] net: fec: Create device link between phy dev and mac dev

Message ID 20221116144305.2317573-3-xiaolei.wang@windriver.com
State New
Headers
Series Add link between phy dev and mac dev |

Commit Message

xiaolei wang Nov. 16, 2022, 2:43 p.m. UTC
  On imx6sx, there are two fec interfaces, but the external
phys can only be configured by fec0 mii_bus. That means
the fec1 can't work independently, it only work when the
fec0 is active. It is alright in the normal boot since the
fec0 will be probed first. But then the fec0 maybe moved
behind of fec1 in the dpm_list due to various device link.
So in system suspend and resume, we would get the following
warning when configuring the external phy of fec1 via the
fec0 mii_bus due to the inactive of fec0. In order to fix
this issue, we create a device link between phy dev and fec0.
This will make sure that fec0 is always active when fec1
is in active mode.

  WARNING: CPU: 0 PID: 24 at drivers/net/phy/phy.c:983 phy_error+0x20/0x68
  Modules linked in:
  CPU: 0 PID: 24 Comm: kworker/0:2 Not tainted 6.1.0-rc3-00011-g5aaef24b5c6d-dirty #34
  Hardware name: Freescale i.MX6 SoloX (Device Tree)
  Workqueue: events_power_efficient phy_state_machine
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x68/0x90
  dump_stack_lvl from __warn+0xb4/0x24c
  __warn from warn_slowpath_fmt+0x5c/0xd8
  warn_slowpath_fmt from phy_error+0x20/0x68
  phy_error from phy_state_machine+0x22c/0x23c
  phy_state_machine from process_one_work+0x288/0x744
  process_one_work from worker_thread+0x3c/0x500
  worker_thread from kthread+0xf0/0x114
  kthread from ret_from_fork+0x14/0x28
  Exception stack(0xf0951fb0 to 0xf0951ff8)

Fixes: 2f664823a470 ("net: phy: at803x: add device tree binding")
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Andrew Lunn Nov. 16, 2022, 3:07 p.m. UTC | #1
On Wed, Nov 16, 2022 at 10:43:05PM +0800, Xiaolei Wang wrote:
> On imx6sx, there are two fec interfaces, but the external
> phys can only be configured by fec0 mii_bus. That means
> the fec1 can't work independently, it only work when the
> fec0 is active. It is alright in the normal boot since the
> fec0 will be probed first. But then the fec0 maybe moved
> behind of fec1 in the dpm_list due to various device link.
> So in system suspend and resume, we would get the following
> warning when configuring the external phy of fec1 via the
> fec0 mii_bus due to the inactive of fec0. In order to fix
> this issue, we create a device link between phy dev and fec0.
> This will make sure that fec0 is always active when fec1
> is in active mode.
> 
>   WARNING: CPU: 0 PID: 24 at drivers/net/phy/phy.c:983 phy_error+0x20/0x68
>   Modules linked in:
>   CPU: 0 PID: 24 Comm: kworker/0:2 Not tainted 6.1.0-rc3-00011-g5aaef24b5c6d-dirty #34
>   Hardware name: Freescale i.MX6 SoloX (Device Tree)
>   Workqueue: events_power_efficient phy_state_machine
>   unwind_backtrace from show_stack+0x10/0x14
>   show_stack from dump_stack_lvl+0x68/0x90
>   dump_stack_lvl from __warn+0xb4/0x24c
>   __warn from warn_slowpath_fmt+0x5c/0xd8
>   warn_slowpath_fmt from phy_error+0x20/0x68
>   phy_error from phy_state_machine+0x22c/0x23c
>   phy_state_machine from process_one_work+0x288/0x744
>   process_one_work from worker_thread+0x3c/0x500
>   worker_thread from kthread+0xf0/0x114
>   kthread from ret_from_fork+0x14/0x28
>   Exception stack(0xf0951fb0 to 0xf0951ff8)

Please add an explanation why you only do this for the FEC? Is this
not a problem for any board which has the PHY on an MDIO bus not
direct child of the MAC?

       Andrew
  
Florian Fainelli Nov. 16, 2022, 11:27 p.m. UTC | #2
On 11/16/22 07:07, Andrew Lunn wrote:
> On Wed, Nov 16, 2022 at 10:43:05PM +0800, Xiaolei Wang wrote:
>> On imx6sx, there are two fec interfaces, but the external
>> phys can only be configured by fec0 mii_bus. That means
>> the fec1 can't work independently, it only work when the
>> fec0 is active. It is alright in the normal boot since the
>> fec0 will be probed first. But then the fec0 maybe moved
>> behind of fec1 in the dpm_list due to various device link.

Humm, but if FEC1 depends upon its PHY to be available by the FEC0 MDIO 
bus provider, then surely we will need to make sure that FEC0's MDIO bus 
is always functional, and that includes surviving re-ordering as well as 
any sort of run-time power management that can occur.

>> So in system suspend and resume, we would get the following
>> warning when configuring the external phy of fec1 via the
>> fec0 mii_bus due to the inactive of fec0. In order to fix
>> this issue, we create a device link between phy dev and fec0.
>> This will make sure that fec0 is always active when fec1
>> is in active mode.

Still not clear to me how the proposed fix works, let alone how it does 
not leak device links since there is no device_link_del(), also you are 
going to be creating guaranteed regressions by putting that change in 
the PHY library.

It seems to me that you need to address a more fundamental issue within 
the FEC driver and how it registers its internal MDIO busses.

The device link between the PHY and MAC does have the MDIO bus in 
between as a device.

Have you verified your patch is still needed even with 
557d5dc83f6831b4e54d141e9b121850406f9a60 ("net: fec: use mac-managed PHY 
PM")?
  
Andrew Lunn Nov. 16, 2022, 11:57 p.m. UTC | #3
On Wed, Nov 16, 2022 at 03:27:39PM -0800, Florian Fainelli wrote:
> On 11/16/22 07:07, Andrew Lunn wrote:
> > On Wed, Nov 16, 2022 at 10:43:05PM +0800, Xiaolei Wang wrote:
> > > On imx6sx, there are two fec interfaces, but the external
> > > phys can only be configured by fec0 mii_bus. That means
> > > the fec1 can't work independently, it only work when the
> > > fec0 is active. It is alright in the normal boot since the
> > > fec0 will be probed first. But then the fec0 maybe moved
> > > behind of fec1 in the dpm_list due to various device link.
> 
> Humm, but if FEC1 depends upon its PHY to be available by the FEC0 MDIO bus
> provider, then surely we will need to make sure that FEC0's MDIO bus is
> always functional, and that includes surviving re-ordering as well as any
> sort of run-time power management that can occur.

Runtime PM is solved for the FECs MDIO bus, because there are switches
hanging off it, which have their own life cycle independent of the
MAC. This is something i had to fix many moons ago, when the FEC would
power off the MDIO bus when the interface is admin down, stopping
access to the switch. The mdio read and write functions now do run
time pm get and put as needed.

I've never done suspend/resume with a switch, it is not something
needed in the use cases i've covered.

> > > So in system suspend and resume, we would get the following
> > > warning when configuring the external phy of fec1 via the
> > > fec0 mii_bus due to the inactive of fec0. In order to fix
> > > this issue, we create a device link between phy dev and fec0.
> > > This will make sure that fec0 is always active when fec1
> > > is in active mode.
> 
> Still not clear to me how the proposed fix works, let alone how it does not
> leak device links since there is no device_link_del(), also you are going to
> be creating guaranteed regressions by putting that change in the PHY
> library.

The reference leak is bad, but i think phylib is the correct place to
fix this general issue. It is not specific to the FEC. There are other
boards with dual MAC SoCs and they save a couple of pins by putting
both PHYs on one MDIO bus. Having the link should help better
represent the device tree so that suspend/resume can do stuff in the
right order.

      Andrew
  
Florian Fainelli Nov. 17, 2022, 12:21 a.m. UTC | #4
On 11/16/22 15:57, Andrew Lunn wrote:
> On Wed, Nov 16, 2022 at 03:27:39PM -0800, Florian Fainelli wrote:
>> On 11/16/22 07:07, Andrew Lunn wrote:
>>> On Wed, Nov 16, 2022 at 10:43:05PM +0800, Xiaolei Wang wrote:
>>>> On imx6sx, there are two fec interfaces, but the external
>>>> phys can only be configured by fec0 mii_bus. That means
>>>> the fec1 can't work independently, it only work when the
>>>> fec0 is active. It is alright in the normal boot since the
>>>> fec0 will be probed first. But then the fec0 maybe moved
>>>> behind of fec1 in the dpm_list due to various device link.
>>
>> Humm, but if FEC1 depends upon its PHY to be available by the FEC0 MDIO bus
>> provider, then surely we will need to make sure that FEC0's MDIO bus is
>> always functional, and that includes surviving re-ordering as well as any
>> sort of run-time power management that can occur.
> 
> Runtime PM is solved for the FECs MDIO bus, because there are switches
> hanging off it, which have their own life cycle independent of the
> MAC. This is something i had to fix many moons ago, when the FEC would
> power off the MDIO bus when the interface is admin down, stopping
> access to the switch. The mdio read and write functions now do run
> time pm get and put as needed.
> 
> I've never done suspend/resume with a switch, it is not something
> needed in the use cases i've covered.

All of the systems with integrated I worked on had to support 
suspend/resume both with HW maintaining the state, and with HW losing 
the state because of being powered off. The whole thing is IMHO still 
not quite well supported upstream if you have applied some configuration 
more complicated than a bridge or standalone ports. Anyway, this is a 
topic for another 10 years to come :)

> 
>>>> So in system suspend and resume, we would get the following
>>>> warning when configuring the external phy of fec1 via the
>>>> fec0 mii_bus due to the inactive of fec0. In order to fix
>>>> this issue, we create a device link between phy dev and fec0.
>>>> This will make sure that fec0 is always active when fec1
>>>> is in active mode.
>>
>> Still not clear to me how the proposed fix works, let alone how it does not
>> leak device links since there is no device_link_del(), also you are going to
>> be creating guaranteed regressions by putting that change in the PHY
>> library.
> 
> The reference leak is bad, but i think phylib is the correct place to
> fix this general issue. It is not specific to the FEC. There are other
> boards with dual MAC SoCs and they save a couple of pins by putting
> both PHYs on one MDIO bus. Having the link should help better
> represent the device tree so that suspend/resume can do stuff in the
> right order.

My concern is that we already have had a hard time solving the proper 
suspend/resume sequence whether the MAC suspends the PHY or the MDIO bus 
suspends the PHY and throwing device links will either change the 
ordering in subtle ways, or hopefully just provide the same piece of 
information we already have via mac_managed_pm.

It seems like in Xiaolei's case, the MDIO bus should suspend the PHY and 
that ought to take care of all dependencies, one would think.
  
xiaolei wang Nov. 17, 2022, 4:40 a.m. UTC | #5
On 11/16/2022 11:07 PM, Andrew Lunn wrote:
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
> On Wed, Nov 16, 2022 at 10:43:05PM +0800, Xiaolei Wang wrote:
>> On imx6sx, there are two fec interfaces, but the external
>> phys can only be configured by fec0 mii_bus. That means
>> the fec1 can't work independently, it only work when the
>> fec0 is active. It is alright in the normal boot since the
>> fec0 will be probed first. But then the fec0 maybe moved
>> behind of fec1 in the dpm_list due to various device link.
>> So in system suspend and resume, we would get the following
>> warning when configuring the external phy of fec1 via the
>> fec0 mii_bus due to the inactive of fec0. In order to fix
>> this issue, we create a device link between phy dev and fec0.
>> This will make sure that fec0 is always active when fec1
>> is in active mode.
>>
>>    WARNING: CPU: 0 PID: 24 at drivers/net/phy/phy.c:983 phy_error+0x20/0x68
>>    Modules linked in:
>>    CPU: 0 PID: 24 Comm: kworker/0:2 Not tainted 6.1.0-rc3-00011-g5aaef24b5c6d-dirty #34
>>    Hardware name: Freescale i.MX6 SoloX (Device Tree)
>>    Workqueue: events_power_efficient phy_state_machine
>>    unwind_backtrace from show_stack+0x10/0x14
>>    show_stack from dump_stack_lvl+0x68/0x90
>>    dump_stack_lvl from __warn+0xb4/0x24c
>>    __warn from warn_slowpath_fmt+0x5c/0xd8
>>    warn_slowpath_fmt from phy_error+0x20/0x68
>>    phy_error from phy_state_machine+0x22c/0x23c
>>    phy_state_machine from process_one_work+0x288/0x744
>>    process_one_work from worker_thread+0x3c/0x500
>>    worker_thread from kthread+0xf0/0x114
>>    kthread from ret_from_fork+0x14/0x28
>>    Exception stack(0xf0951fb0 to 0xf0951ff8)
> Please add an explanation why you only do this for the FEC? Is this
> not a problem for any board which has the PHY on an MDIO bus not
> direct child of the MAC?

Hi

Oh, my initial idea is to provide such an interface, we can add links 
manually in such cases, if it is to solve this type of problem, my idea 
is to create a link in phy_connect_direct, or is there any better 
suggestion?

thanks

xiaolei

>
>         Andrew
  
xiaolei wang Nov. 17, 2022, 4:40 a.m. UTC | #6
On 11/17/2022 8:21 AM, Florian Fainelli wrote:
> CAUTION: This email comes from a non Wind River email account!
> Do not click links or open attachments unless you recognize the sender 
> and know the content is safe.
>
> On 11/16/22 15:57, Andrew Lunn wrote:
>> On Wed, Nov 16, 2022 at 03:27:39PM -0800, Florian Fainelli wrote:
>>> On 11/16/22 07:07, Andrew Lunn wrote:
>>>> On Wed, Nov 16, 2022 at 10:43:05PM +0800, Xiaolei Wang wrote:
>>>>> On imx6sx, there are two fec interfaces, but the external
>>>>> phys can only be configured by fec0 mii_bus. That means
>>>>> the fec1 can't work independently, it only work when the
>>>>> fec0 is active. It is alright in the normal boot since the
>>>>> fec0 will be probed first. But then the fec0 maybe moved
>>>>> behind of fec1 in the dpm_list due to various device link.
>>>
>>> Humm, but if FEC1 depends upon its PHY to be available by the FEC0 
>>> MDIO bus
>>> provider, then surely we will need to make sure that FEC0's MDIO bus is
>>> always functional, and that includes surviving re-ordering as well 
>>> as any
>>> sort of run-time power management that can occur.
>>
>> Runtime PM is solved for the FECs MDIO bus, because there are switches
>> hanging off it, which have their own life cycle independent of the
>> MAC. This is something i had to fix many moons ago, when the FEC would
>> power off the MDIO bus when the interface is admin down, stopping
>> access to the switch. The mdio read and write functions now do run
>> time pm get and put as needed.
>>
>> I've never done suspend/resume with a switch, it is not something
>> needed in the use cases i've covered.
>
> All of the systems with integrated I worked on had to support
> suspend/resume both with HW maintaining the state, and with HW losing
> the state because of being powered off. The whole thing is IMHO still
> not quite well supported upstream if you have applied some configuration
> more complicated than a bridge or standalone ports. Anyway, this is a
> topic for another 10 years to come :)
>
>>
>>>>> So in system suspend and resume, we would get the following
>>>>> warning when configuring the external phy of fec1 via the
>>>>> fec0 mii_bus due to the inactive of fec0. In order to fix
>>>>> this issue, we create a device link between phy dev and fec0.
>>>>> This will make sure that fec0 is always active when fec1
>>>>> is in active mode.
>>>
>>> Still not clear to me how the proposed fix works, let alone how it 
>>> does not
>>> leak device links since there is no device_link_del(), also you are 
>>> going to
>>> be creating guaranteed regressions by putting that change in the PHY
>>> library.
>>
>> The reference leak is bad, but i think phylib is the correct place to
>> fix this general issue. It is not specific to the FEC. There are other
>> boards with dual MAC SoCs and they save a couple of pins by putting
>> both PHYs on one MDIO bus. Having the link should help better
>> represent the device tree so that suspend/resume can do stuff in the
>> right order.
>
> My concern is that we already have had a hard time solving the proper
> suspend/resume sequence whether the MAC suspends the PHY or the MDIO bus
> suspends the PHY and throwing device links will either change the
> ordering in subtle ways, or hopefully just provide the same piece of
> information we already have via mac_managed_pm.
>
> It seems like in Xiaolei's case, the MDIO bus should suspend the PHY and
> that ought to take care of all dependencies, one would think.

Hi

mac_managed_pm solves the soft reset triggered during aeg. If you modify 
it back to MDIO bus to suspend phy, you still need to solve the problem 
of auto-negotiation,

thanks

xiaolei

> -- 
> Florian
>
  

Patch

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index f623c12eaf95..036e1bbafdd2 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3963,6 +3963,9 @@  fec_probe(struct platform_device *pdev)
 		goto failed_stop_mode;
 
 	phy_node = of_parse_phandle(np, "phy-handle", 0);
+	if (phy_node) {
+		phy_mac_link_add(phy_node, ndev);
+	}
 	if (!phy_node && of_phy_is_fixed_link(np)) {
 		ret = of_phy_register_fixed_link(np);
 		if (ret < 0) {