phy: aquantia: Configure SERDES mode by default

Message ID 20221114210740.3332937-1-sean.anderson@seco.com
State New
Headers
Series phy: aquantia: Configure SERDES mode by default |

Commit Message

Sean Anderson Nov. 14, 2022, 9:07 p.m. UTC
  When autonegotiation completes, the phy interface will be set based on
the global config register for that speed. If the SERDES mode is set to
something which the MAC does not support, then the link will not come
up. The register reference says that the SERDES mode should default to
XFI, but for some phys lower speeds default to XFI/2 (5G XFI). To ensure
the link comes up correctly, configure the SERDES mode.

We use the same configuration for all interfaces. We don't advertise
any speeds faster than the interface mode, so they won't be selected.
We default to pause-based rate adaptation, but enable USXGMII rate
adaptation for USXGMII. I'm not sure if this is correct for
SGMII; it might need USXGMII adaptation instead.

This effectively disables switching interface mode depending on the
speed, in favor of using rate adaptation. If this is not desired, we
would need some kind of API to configure things.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
---

 drivers/net/phy/aquantia_main.c | 65 +++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)
  

Comments

Vladimir Oltean Nov. 15, 2022, 10:37 p.m. UTC | #1
On Mon, Nov 14, 2022 at 04:07:39PM -0500, Sean Anderson wrote:
> When autonegotiation completes, the phy interface will be set based on
> the global config register for that speed. If the SERDES mode is set to
> something which the MAC does not support, then the link will not come
> up. The register reference says that the SERDES mode should default to
> XFI, but for some phys lower speeds default to XFI/2 (5G XFI). To ensure
> the link comes up correctly, configure the SERDES mode.
> 
> We use the same configuration for all interfaces. We don't advertise
> any speeds faster than the interface mode, so they won't be selected.
> We default to pause-based rate adaptation, but enable USXGMII rate
> adaptation for USXGMII. I'm not sure if this is correct for
> SGMII; it might need USXGMII adaptation instead.
> 
> This effectively disables switching interface mode depending on the
> speed, in favor of using rate adaptation. If this is not desired, we
> would need some kind of API to configure things.
> 
> Signed-off-by: Sean Anderson <sean.anderson@seco.com>
> ---

Was this patch tested and confirmed to do something sane on any platform
at all?
  
Sean Anderson Nov. 15, 2022, 10:46 p.m. UTC | #2
On 11/15/22 17:37, Vladimir Oltean wrote:
> On Mon, Nov 14, 2022 at 04:07:39PM -0500, Sean Anderson wrote:
>> When autonegotiation completes, the phy interface will be set based on
>> the global config register for that speed. If the SERDES mode is set to
>> something which the MAC does not support, then the link will not come
>> up. The register reference says that the SERDES mode should default to
>> XFI, but for some phys lower speeds default to XFI/2 (5G XFI). To ensure
>> the link comes up correctly, configure the SERDES mode.
>> 
>> We use the same configuration for all interfaces. We don't advertise
>> any speeds faster than the interface mode, so they won't be selected.
>> We default to pause-based rate adaptation, but enable USXGMII rate
>> adaptation for USXGMII. I'm not sure if this is correct for
>> SGMII; it might need USXGMII adaptation instead.
>> 
>> This effectively disables switching interface mode depending on the
>> speed, in favor of using rate adaptation. If this is not desired, we
>> would need some kind of API to configure things.
>> 
>> Signed-off-by: Sean Anderson <sean.anderson@seco.com>
>> ---
> 
> Was this patch tested and confirmed to do something sane on any platform
> at all?

This was mainly intended for Tim to test and see if it fixed his problem.

--Sean
  
Vladimir Oltean Nov. 15, 2022, 11:02 p.m. UTC | #3
On Tue, Nov 15, 2022 at 05:46:54PM -0500, Sean Anderson wrote:
> On 11/15/22 17:37, Vladimir Oltean wrote:
> > Was this patch tested and confirmed to do something sane on any platform
> > at all?
> 
> This was mainly intended for Tim to test and see if it fixed his problem.

And that is stated where? Does Tim know he should test it?
If you don't have the certainty that it works, do maintainers know not
to apply it, as many times unfortunately happens when there is no review
comment and the change looks innocuous?

Even if the change works, why would it be a good idea to overwrite some
random registers which are supposed to be configured correctly by the
firmware provided for the board? If the Linux fixup works for one board
with one firmware, how do we know it also works for another board with
the same PHY, but different firmware? Are you willing to take the risk
to break someone's system to find out?

As long as the Aquantia PHY driver doesn't contain all the necessary
steps for bringing the PHY up from a clean slate, but works on top of
what the firmware has done, changes like this make me very uncomfortable
to add any PHY ID to the Aquantia driver. I'd rather leave them with the
Generic C45 driver, even if that means I'll lose interrupt support, rate
matching and things like that.
  
Sean Anderson Nov. 17, 2022, 11:40 p.m. UTC | #4
On 11/15/22 18:02, Vladimir Oltean wrote:
> On Tue, Nov 15, 2022 at 05:46:54PM -0500, Sean Anderson wrote:
>> On 11/15/22 17:37, Vladimir Oltean wrote:
>> > Was this patch tested and confirmed to do something sane on any platform
>> > at all?
>> 
>> This was mainly intended for Tim to test and see if it fixed his problem.
> 
> And that is stated where? Does Tim know he should test it?
> If you don't have the certainty that it works, do maintainers know not
> to apply it, as many times unfortunately happens when there is no review
> comment and the change looks innocuous?

Sorry, I should have done a better job communicating this (probably by
marking it RFC).

> Even if the change works, why would it be a good idea to overwrite some
> random registers which are supposed to be configured correctly by the
> firmware provided for the board?

They're not random registers. They happen to be exactly the same registers
we use to determine if rate adaptation is enabled.

> If the Linux fixup works for one board
> with one firmware, how do we know it also works for another board with
> the same PHY, but different firmware?

How do we know if a fix on one board for any hardware works on another board?

> Are you willing to take the risk to break someone's system to find out?

I hope it doesn't come to that. I would much rather get some feedback/testing
so I can be more confident in whatever we end up doing (or not).

Well, part of my goal in sending out this patch is to get some feedback
on the right thing to do here. As I see it, there are three ways of
configuring this phy:

- Always rate adapt to whatever the initial phy interface mode is
- Switch phy interfaces depending on the link speed
- Do whatever the firmware sets up

On my system, the last option happens to be the same as the first.
However, on Tim's system it's not. I had originally considered doing
this kind of configuration in my initial rate adaptation patch. However,
I deferred it since nothing needed to be configured for me.

The problem here is that if we advertise like we are in the first mode,
but we are not actually, then we can end up negotiating a link mode
which we don't support.

I think there are a few ways to address this:

- Always enable rate adaptation, since that's what we tell phylink we
  do. This is what this patch does. It's a bit risky (since it departs
  from "do whatever the firmware does"). It's also a bit rigid (what if 
- We can check all the registers to ensure we are actually going to rate
  adapt. If we aren't, we tell phylink we don't support it. This is the
  least risky, but we can end up not bringing up the link even in
  circumstances where we could if we configured things properly. And we
  generally know the right way to configure things.
- Add a configuration option (devicetree? ethtool?) on which option
  above to pick. This is probably what we will want to do in the long
  term, but I feel like we have enough information to determine the
  right thing to do most of the time (without needing manual
  intervention).

> As long as the Aquantia PHY driver doesn't contain all the necessary
> steps for bringing the PHY up from a clean slate, but works on top of
> what the firmware has done, changes like this make me very uncomfortable
> to add any PHY ID to the Aquantia driver. I'd rather leave them with the
> Generic C45 driver, even if that means I'll lose interrupt support, rate
> matching and things like that.

I think these registers should be viewed as configuration for the phy as
a whole, rather than as guts which should be configure by firmware. At
least for the fields we're working with, it seems clear to me what's
going on.

--Sean
  
Andrew Lunn Nov. 18, 2022, 12:02 a.m. UTC | #5
> Well, part of my goal in sending out this patch is to get some feedback
> on the right thing to do here. As I see it, there are three ways of
> configuring this phy:
> 
> - Always rate adapt to whatever the initial phy interface mode is
> - Switch phy interfaces depending on the link speed
> - Do whatever the firmware sets up

My understanding of the aQuantia firmware is that it is split into two
parts. The first is the actual firmware that runs on the PHY. The
second is provisioning, which seems to be a bunch of instructions to
put value X in register Y. It seems like aQuantia, now Marvell, give
different provisioning to different customers.

What this means is, you cannot really trust any register contains what
you want, that your devices does the same as somebody elses' device in
its reset state.

So i would say, "Do whatever the firmware sets up" is the worst
choice. Assume nothing, set every register which is important to the
correct value.

	Andrew
  
Vladimir Oltean Nov. 18, 2022, 4:49 p.m. UTC | #6
On Thu, Nov 17, 2022 at 06:40:02PM -0500, Sean Anderson wrote:
> > Even if the change works, why would it be a good idea to overwrite some
> > random registers which are supposed to be configured correctly by the
> > firmware provided for the board?
> 
> They're not random registers. They happen to be exactly the same registers
> we use to determine if rate adaptation is enabled.

As far as I'm concerned, this is just poking in places where there is no
guarantee that the end result will be a known state.

FWIW, for internal testing of multiple SERDES modes all with the same
Aquantia firmware, the NXP SDK also has a quick-and-dirty patch to
change the SERDES protocol on the Aquantia PHY based on device tree:
https://source.codeaurora.org/external/qoriq/qoriq-components/linux/tree/drivers/net/phy/aquantia_main.c?h=lf-5.15.y#n288

but we decided to not upstream such a thing, specifically because
it might react in exotic ways with firmware images shipped by Aquantia
to some of their other customers. I don't work for Aquantia, I am not a
fan of their model of customizing firmware for everyone, and I don't
want to have to support the ensuing breakage, I wouldn't have time for
basically anything else. If you do, I'm not going to stop you. Just be
prepared to help me too ;)

> > If the Linux fixup works for one board
> > with one firmware, how do we know it also works for another board with
> > the same PHY, but different firmware?
> 
> How do we know if a fix on one board for any hardware works on another board?

If both boards start from the same state X and make the same transition
T, they end in the same state Y, no? Aquantia PHYs don't all start from
the same state. Not sure what you'd like me to say.

> Well, part of my goal in sending out this patch is to get some feedback
> on the right thing to do here. As I see it, there are three ways of
> configuring this phy:
> 
> - Always rate adapt to whatever the initial phy interface mode is
> - Switch phy interfaces depending on the link speed
> - Do whatever the firmware sets up

"Do whatever the firmware sets up", which means either bullet 1, or
bullet 2, or a combination of both (unlikely but AFAIU possible).

> 
> On my system, the last option happens to be the same as the first.
> However, on Tim's system it's not. I had originally considered doing
> this kind of configuration in my initial rate adaptation patch. However,
> I deferred it since nothing needed to be configured for me.
> 
> The problem here is that if we advertise like we are in the first mode,
> but we are not actually, then we can end up negotiating a link mode
> which we don't support.

I didn't quite understand in your patch set why there exists a
phydev->rate_matching as well as a phy_get_rate_matching() procedure.
It seems like that's at the root of all issues here? Couldn't
phy_get_rate_matching() be made to look at the hardware registers for
the given interface?

> I think there are a few ways to address this:
> 
> - Always enable rate adaptation, since that's what we tell phylink we
>   do. This is what this patch does. It's a bit risky (since it departs
>   from "do whatever the firmware does"). It's also a bit rigid (what if 

I think the mistake is that we tell phylink we support rate matching
when the firmware provisioning doesn't agree.

> - We can check all the registers to ensure we are actually going to rate
>   adapt. If we aren't, we tell phylink we don't support it. This is the
>   least risky, but we can end up not bringing up the link even in
>   circumstances where we could if we configured things properly. And we
>   generally know the right way to configure things.

Like when?

> - Add a configuration option (devicetree? ethtool?) on which option
>   above to pick. This is probably what we will want to do in the long
>   term, but I feel like we have enough information to determine the
>   right thing to do most of the time (without needing manual
>   intervention).

Not sure I see the need, when long-term there is no volunteer to make
the Linux driver bring Aquantia PHYs to a known state regardless of
vendor provisioning. Until then, there is just no reason to even attempt
this.

> > As long as the Aquantia PHY driver doesn't contain all the necessary
> > steps for bringing the PHY up from a clean slate, but works on top of
> > what the firmware has done, changes like this make me very uncomfortable
> > to add any PHY ID to the Aquantia driver. I'd rather leave them with the
> > Generic C45 driver, even if that means I'll lose interrupt support, rate
> > matching and things like that.
> 
> I think these registers should be viewed as configuration for the phy as
> a whole, rather than as guts which should be configure by firmware. At
> least for the fields we're working with, it seems clear to me what's
> going on.

Until you look at the procedure in the NXP SDK and see that things are a
bit more complicated to get right, like put the PHY in low power mode,
sleep for a while. I think a large part of that was determined experimentally,
out of laziness to change PHY firmware on some riser cards more than anything.
We still expect the production boards to have a good firmware, and Linux
to read what that does and adapt accordingly.
  
Sean Anderson Nov. 18, 2022, 5:11 p.m. UTC | #7
On 11/18/22 11:49, Vladimir Oltean wrote:
> On Thu, Nov 17, 2022 at 06:40:02PM -0500, Sean Anderson wrote:
>> > Even if the change works, why would it be a good idea to overwrite some
>> > random registers which are supposed to be configured correctly by the
>> > firmware provided for the board?
>> 
>> They're not random registers. They happen to be exactly the same registers
>> we use to determine if rate adaptation is enabled.
> 
> As far as I'm concerned, this is just poking in places where there is no
> guarantee that the end result will be a known state.
> 
> FWIW, for internal testing of multiple SERDES modes all with the same
> Aquantia firmware, the NXP SDK also has a quick-and-dirty patch to
> change the SERDES protocol on the Aquantia PHY based on device tree:
> https://source.codeaurora.org/external/qoriq/qoriq-components/linux/tree/drivers/net/phy/aquantia_main.c?h=lf-5.15.y#n288
> 
> but we decided to not upstream such a thing, specifically because
> it might react in exotic ways with firmware images shipped by Aquantia
> to some of their other customers. I don't work for Aquantia, I am not a
> fan of their model of customizing firmware for everyone, and I don't
> want to have to support the ensuing breakage, I wouldn't have time for
> basically anything else. If you do, I'm not going to stop you. Just be
> prepared to help me too ;)
> 
>> > If the Linux fixup works for one board
>> > with one firmware, how do we know it also works for another board with
>> > the same PHY, but different firmware?
>> 
>> How do we know if a fix on one board for any hardware works on another board?
> 
> If both boards start from the same state X and make the same transition
> T, they end in the same state Y, no? Aquantia PHYs don't all start from
> the same state. Not sure what you'd like me to say.
> 
>> Well, part of my goal in sending out this patch is to get some feedback
>> on the right thing to do here. As I see it, there are three ways of
>> configuring this phy:
>> 
>> - Always rate adapt to whatever the initial phy interface mode is
>> - Switch phy interfaces depending on the link speed
>> - Do whatever the firmware sets up
> 
> "Do whatever the firmware sets up", which means either bullet 1, or
> bullet 2, or a combination of both (unlikely but AFAIU possible).

Happened to Tim.

>> 
>> On my system, the last option happens to be the same as the first.
>> However, on Tim's system it's not. I had originally considered doing
>> this kind of configuration in my initial rate adaptation patch. However,
>> I deferred it since nothing needed to be configured for me.
>> 
>> The problem here is that if we advertise like we are in the first mode,
>> but we are not actually, then we can end up negotiating a link mode
>> which we don't support.
> 
> I didn't quite understand in your patch set why there exists a
> phydev->rate_matching as well as a phy_get_rate_matching() procedure.
> It seems like that's at the root of all issues here? Couldn't
> phy_get_rate_matching() be made to look at the hardware registers for
> the given interface?

This is what I propose below as strategy 2. I didn't do this originally,
because it was more complex and I didn't have evidence that we would need it.

>> I think there are a few ways to address this:
>> 
>> - Always enable rate adaptation, since that's what we tell phylink we
>>   do. This is what this patch does. It's a bit risky (since it departs
>>   from "do whatever the firmware does"). It's also a bit rigid (what if 
> 
> I think the mistake is that we tell phylink we support rate matching
> when the firmware provisioning doesn't agree.
> 
>> - We can check all the registers to ensure we are actually going to rate
>>   adapt. If we aren't, we tell phylink we don't support it. This is the
>>   least risky, but we can end up not bringing up the link even in
>>   circumstances where we could if we configured things properly. And we
>>   generally know the right way to configure things.
> 
> Like when?

Well, like whenever the phy says "Please do XFI/2" or some other mode we
don't have a phy interface mode for. We will never be able to tell the MAC
"Please do XFI/2" (until we add an interface mode for it), so that's
obviously wrong.

>> - Add a configuration option (devicetree? ethtool?) on which option
>>   above to pick. This is probably what we will want to do in the long
>>   term, but I feel like we have enough information to determine the
>>   right thing to do most of the time (without needing manual
>>   intervention).
> 
> Not sure I see the need, when long-term there is no volunteer to make
> the Linux driver bring Aquantia PHYs to a known state regardless of
> vendor provisioning. Until then, there is just no reason to even attempt
> this.

I mean a config for option 1 vs 2 above.

>> > As long as the Aquantia PHY driver doesn't contain all the necessary
>> > steps for bringing the PHY up from a clean slate, but works on top of
>> > what the firmware has done, changes like this make me very uncomfortable
>> > to add any PHY ID to the Aquantia driver. I'd rather leave them with the
>> > Generic C45 driver, even if that means I'll lose interrupt support, rate
>> > matching and things like that.
>> 
>> I think these registers should be viewed as configuration for the phy as
>> a whole, rather than as guts which should be configure by firmware. At
>> least for the fields we're working with, it seems clear to me what's
>> going on.
> 
> Until you look at the procedure in the NXP SDK and see that things are a
> bit more complicated to get right, like put the PHY in low power mode,
> sleep for a while. I think a large part of that was determined experimentally,
> out of laziness to change PHY firmware on some riser cards more than anything.
> We still expect the production boards to have a good firmware, and Linux
> to read what that does and adapt accordingly.

Alas, if only Marvell put stuff like this in a manual... All I have is a spec
sheet and the register reference, and my company has an NDA...

We aren't even using this phy on our board, so I am fine disabling rate adaptation
for funky firmwares.

--Sean
  
Vladimir Oltean Nov. 18, 2022, 5:16 p.m. UTC | #8
On Fri, Nov 18, 2022 at 01:02:29AM +0100, Andrew Lunn wrote:
> > Well, part of my goal in sending out this patch is to get some feedback
> > on the right thing to do here. As I see it, there are three ways of
> > configuring this phy:
> > 
> > - Always rate adapt to whatever the initial phy interface mode is
> > - Switch phy interfaces depending on the link speed
> > - Do whatever the firmware sets up
> 
> My understanding of the aQuantia firmware is that it is split into two
> parts. The first is the actual firmware that runs on the PHY. The
> second is provisioning, which seems to be a bunch of instructions to
> put value X in register Y. It seems like aQuantia, now Marvell, give
> different provisioning to different customers.
> 
> What this means is, you cannot really trust any register contains what
> you want, that your devices does the same as somebody elses' device in
> its reset state.
> 
> So i would say, "Do whatever the firmware sets up" is the worst
> choice. Assume nothing, set every register which is important to the
> correct value.

If "do whatever the firmware sets up" is the worst choice, it means you
think it's worse than "doing whatever the firmware sets up, except a few
fixups here and there which worked on my board". Whereas I think _that's_
actually even worse.

What might be an even bigger offence than giving different provisioning
to different customers is giving different documentation to different
customers. In the Aquantia Register Specification for Gen4 PHYs given
to NXP, the SerDes mode field in register 1E.31C cannot even _take_ the
value of 6. They're all documented only from 0 to 5. I only learned that
6 (XFI/2) was a thing from the discussion between Sean and Tim.
  
Vladimir Oltean Nov. 18, 2022, 5:30 p.m. UTC | #9
On Fri, Nov 18, 2022 at 12:11:30PM -0500, Sean Anderson wrote:
> >> - We can check all the registers to ensure we are actually going to rate
> >>   adapt. If we aren't, we tell phylink we don't support it. This is the
> >>   least risky, but we can end up not bringing up the link even in
> >>   circumstances where we could if we configured things properly. And we
> >>   generally know the right way to configure things.
> > 
> > Like when?
> 
> Well, like whenever the phy says "Please do XFI/2" or some other mode we
> don't have a phy interface mode for. We will never be able to tell the MAC
> "Please do XFI/2" (until we add an interface mode for it), so that's
> obviously wrong.

Add an interface mode for it then... But note that I have absolutely no
clue what XFI/2 is. Apparently Aquantia doesn't want NXP to know....

> >> - Add a configuration option (devicetree? ethtool?) on which option
> >>   above to pick. This is probably what we will want to do in the long
> >>   term, but I feel like we have enough information to determine the
> >>   right thing to do most of the time (without needing manual
> >>   intervention).
> > 
> > Not sure I see the need, when long-term there is no volunteer to make
> > the Linux driver bring Aquantia PHYs to a known state regardless of
> > vendor provisioning. Until then, there is just no reason to even attempt
> > this.
> 
> I mean a config for option 1 vs 2 above.

How would this interact with Marek's proposal for phy-mode to be an
array, and some middle entity (phylink?) selects the SERDES protocol and
rate matching algorithm to use for each medium side link speed?
https://patchwork.kernel.org/project/netdevbpf/cover/20211123164027.15618-1-kabel@kernel.org/

> > Until you look at the procedure in the NXP SDK and see that things are a
> > bit more complicated to get right, like put the PHY in low power mode,
> > sleep for a while. I think a large part of that was determined experimentally,
> > out of laziness to change PHY firmware on some riser cards more than anything.
> > We still expect the production boards to have a good firmware, and Linux
> > to read what that does and adapt accordingly.
> 
> Alas, if only Marvell put stuff like this in a manual... All I have is a spec
> sheet and the register reference, and my company has an NDA...

Can't help with much more than providing this hint, sorry. All I can say
is that SERDES protocol override from Linux is possible with care, at
least on some systems. But it may be riddled with landmines.

> We aren't even using this phy on our board, so I am fine disabling rate adaptation
> for funky firmwares.

Disabling rate adaptation is one thing. But there's also the unresolved
XFI/2 issue?
  
Sean Anderson Nov. 18, 2022, 6:01 p.m. UTC | #10
On 11/18/22 12:30, Vladimir Oltean wrote:
> On Fri, Nov 18, 2022 at 12:11:30PM -0500, Sean Anderson wrote:
>> >> - We can check all the registers to ensure we are actually going to rate
>> >>   adapt. If we aren't, we tell phylink we don't support it. This is the
>> >>   least risky, but we can end up not bringing up the link even in
>> >>   circumstances where we could if we configured things properly. And we
>> >>   generally know the right way to configure things.
>> > 
>> > Like when?
>> 
>> Well, like whenever the phy says "Please do XFI/2" or some other mode we
>> don't have a phy interface mode for. We will never be able to tell the MAC
>> "Please do XFI/2" (until we add an interface mode for it), so that's
>> obviously wrong.
> 
> Add an interface mode for it then...

> But note that I have absolutely no clue what XFI/2 is. Apparently
> Aquantia doesn't want NXP to know....

SERDES Mode [2:0]
  0 = XFI
  1 = Reserved
  2 = Reserved
  3 = SGMII
  4 = OCSGMII
  5 = Low Power
  6 = XFI/2 (i.e. XFI 5G)
  7 = XFI*2 (i.e. XFI 20G)

This is about it (aside from a mention in the PHY XS System Interface
Connection Status register). I assume it's over/underclocked XFI, much
like how 2500BASE-X is over/underclocked "SGMII."

I got my manual from Marvell's customer portal (okta). My document is
dated March 10, 2022.

>> >> - Add a configuration option (devicetree? ethtool?) on which option
>> >>   above to pick. This is probably what we will want to do in the long
>> >>   term, but I feel like we have enough information to determine the
>> >>   right thing to do most of the time (without needing manual
>> >>   intervention).
>> > 
>> > Not sure I see the need, when long-term there is no volunteer to make
>> > the Linux driver bring Aquantia PHYs to a known state regardless of
>> > vendor provisioning. Until then, there is just no reason to even attempt
>> > this.
>> 
>> I mean a config for option 1 vs 2 above.
> 
> How would this interact with Marek's proposal for phy-mode to be an
> array, and some middle entity (phylink?) selects the SERDES protocol and
> rate matching algorithm to use for each medium side link speed?
> https://patchwork.kernel.org/project/netdevbpf/cover/20211123164027.15618-1-kabel@kernel.org/

Yeah, this is what I was referring to in the other thread [1]. In order to
implement this properly, we'd need to know what interfaces are supported,
electrically, by the board.

[1] https://lore.kernel.org/netdev/ea320070-a949-c737-22c4-14fd199fdc23@seco.com/

>> > Until you look at the procedure in the NXP SDK and see that things are a
>> > bit more complicated to get right, like put the PHY in low power mode,
>> > sleep for a while. I think a large part of that was determined experimentally,
>> > out of laziness to change PHY firmware on some riser cards more than anything.
>> > We still expect the production boards to have a good firmware, and Linux
>> > to read what that does and adapt accordingly.
>> 
>> Alas, if only Marvell put stuff like this in a manual... All I have is a spec
>> sheet and the register reference, and my company has an NDA...
> 
> Can't help with much more than providing this hint, sorry. All I can say
> is that SERDES protocol override from Linux is possible with care, at
> least on some systems. But it may be riddled with landmines.
> 
>> We aren't even using this phy on our board, so I am fine disabling rate adaptation
>> for funky firmwares.
> 
> Disabling rate adaptation is one thing. But there's also the unresolved
> XFI/2 issue?

I had in mind something like

diff --git a/drivers/net/phy/aquantia_main.c b/drivers/net/phy/aquantia_main.c
index 47a76df36b74..18dfc09e80ef 100644
--- a/drivers/net/phy/aquantia_main.c
+++ b/drivers/net/phy/aquantia_main.c
@@ -109,6 +109,12 @@
  #define VEND1_GLOBAL_CFG_RATE_ADAPT_NONE	0
  #define VEND1_GLOBAL_CFG_RATE_ADAPT_USX		1
  #define VEND1_GLOBAL_CFG_RATE_ADAPT_PAUSE	2
+#define VEND1_GLOBAL_CFG_SERDES_MODE		GENMASK(2, 0)
+#define VEND1_GLOBAL_CFG_SERDES_MODE_XFI	0
+#define VEND1_GLOBAL_CFG_SERDES_MODE_SGMII	3
+#define VEND1_GLOBAL_CFG_SERDES_MODE_OCSGMII	4
+#define VEND1_GLOBAL_CFG_SERDES_MODE_XFI5G	6
+#define VEND1_GLOBAL_CFG_SERDES_MODE_XFI20G	7

  #define VEND1_GLOBAL_RSVD_STAT1			0xc885
  #define VEND1_GLOBAL_RSVD_STAT1_FW_BUILD_ID	GENMASK(7, 4)
@@ -675,14 +681,69 @@ static int aqr107_wait_processor_intensive_op(struct phy_device *phydev)
  	return 0;
  }

+static int aqr107_global_config_serdes_speed(int global_config)
+{
+	switch (FIELD_GET(VEND1_GLOBAL_CFG_SERDES_MODE, global_config)) {
+	case VEND1_GLOBAL_CFG_SERDES_MODE_XFI:
+		return SPEED_10000;
+	case VEND1_GLOBAL_CFG_SERDES_MODE_SGMII:
+		return SPEED_1000;
+	case VEND1_GLOBAL_CFG_SERDES_MODE_OCSGMII:
+		return SPEED_2500;
+	case VEND1_GLOBAL_CFG_SERDES_MODE_XFI5G:
+		return SPEED_5000;
+	case VEND1_GLOBAL_CFG_SERDES_MODE_XFI20G:
+		return SPEED_20000;
+	default:
+		return SPEED_UNKNOWN;
+	}
+}
+
+static bool aqr107_rate_adapt_ok(struct phy_device *phydev, u16 reg, int speed)
+{
+	int val;
+
+	val = phy_read_mmd(phydev, MDIO_MMD_VEND1, reg);
+	if (val < 0) {
+		phydev_warn(phydev, "could not read register %x:%x (err = %d)\n",
+			    MDIO_MMD_VEND1, reg, val);
+		return false;
+	}
+
+	if (FIELD_GET(VEND1_GLOBAL_CFG_RATE_ADAPT, val) !=
+		VEND1_GLOBAL_CFG_RATE_ADAPT_PAUSE)
+		return false;
+
+	if (aqr107_global_config_serdes_speed(val) != speed)
+		return false;
+
+	return true;
+}
+
  static int aqr107_get_rate_matching(struct phy_device *phydev,
  				    phy_interface_t iface)
  {
-	if (iface == PHY_INTERFACE_MODE_10GBASER ||
-	    iface == PHY_INTERFACE_MODE_2500BASEX ||
-	    iface == PHY_INTERFACE_MODE_NA)
+	int speed = phy_interface_max_speed(iface);
+
+	switch (speed) {
+	case SPEED_10000:
+		if (!aqr107_rate_adapt_ok(phydev, VEND1_GLOBAL_CFG_10G, speed) ||
+		    !aqr107_rate_adapt_ok(phydev, VEND1_GLOBAL_CFG_5G, speed))
+			return RATE_MATCH_NONE;
+		fallthrough;
+	case SPEED_2500:
+		if (!aqr107_rate_adapt_ok(phydev, VEND1_GLOBAL_CFG_2_5G, speed))
+			return RATE_MATCH_NONE;
+		fallthrough;
+	case SPEED_1000:
+		if (!aqr107_rate_adapt_ok(phydev, VEND1_GLOBAL_CFG_1G, speed) ||
+		    !aqr107_rate_adapt_ok(phydev, VEND1_GLOBAL_CFG_100M, speed) ||
+		    !aqr107_rate_adapt_ok(phydev, VEND1_GLOBAL_CFG_10M, speed))
+			return RATE_MATCH_NONE;
  		return RATE_MATCH_PAUSE;
-	return RATE_MATCH_NONE;
+	default:
+		return RATE_MATCH_NONE;
+	};
  }

  static int aqr107_suspend(struct phy_device *phydev)
  
Andrew Lunn Nov. 18, 2022, 6:56 p.m. UTC | #11
> What might be an even bigger offence than giving different provisioning
> to different customers is giving different documentation to different
> customers. In the Aquantia Register Specification for Gen4 PHYs given
> to NXP, the SerDes mode field in register 1E.31C cannot even _take_ the
> value of 6. They're all documented only from 0 to 5. I only learned that
> 6 (XFI/2) was a thing from the discussion between Sean and Tim.

At some point we just have to declare the hardware unsupportable in
mainline, too many landmines. We simply stop any further development
on it. If it works for you, great. Otherwise use the vendor crap
driver and complain loudly to the vendor.

The way out could be Marvell puts firmware in linux-firmware with no
provisioning, or a known provisioning. We load that firmware at probe
time, and we only support that.

But do Marvell actually care about mainline? I guess not.

Microchip seem like the better vendor if you care about mainline.
Open datasheets, engagement with the community, etc.

    Andrew
  
Tim Harvey Nov. 29, 2022, 12:20 a.m. UTC | #12
On Thu, Nov 17, 2022 at 4:03 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > Well, part of my goal in sending out this patch is to get some feedback
> > on the right thing to do here. As I see it, there are three ways of
> > configuring this phy:
> >
> > - Always rate adapt to whatever the initial phy interface mode is
> > - Switch phy interfaces depending on the link speed
> > - Do whatever the firmware sets up
>
> My understanding of the aQuantia firmware is that it is split into two
> parts. The first is the actual firmware that runs on the PHY. The
> second is provisioning, which seems to be a bunch of instructions to
> put value X in register Y. It seems like aQuantia, now Marvell, give
> different provisioning to different customers.
>
> What this means is, you cannot really trust any register contains what
> you want, that your devices does the same as somebody elses' device in
> its reset state.
>
> So i would say, "Do whatever the firmware sets up" is the worst
> choice. Assume nothing, set every register which is important to the
> correct value.
>
>         Andrew

Andrew,

Sorry for the late reply!

That's exactly what the firmware is doing. According to a Marvell FAE
they add 'provisioning' of registers to the firmware to ease their
support effort. That didn't at all ease things for me because I was
pointed to the wrong firmware by another FAE which led to all of this.

In my case my device-tree is setting the interface to xfi (10g) yet
the firmware has provisioned the link for xfi/2 (5g) - a warning would
have probably helped us all understand the issue but again I was just
pointed to the wrong firmware without an explanation of how their
firmware works.

Tim
  

Patch

diff --git a/drivers/net/phy/aquantia_main.c b/drivers/net/phy/aquantia_main.c
index 47a76df36b74..88a3defb632c 100644
--- a/drivers/net/phy/aquantia_main.c
+++ b/drivers/net/phy/aquantia_main.c
@@ -109,6 +109,10 @@ 
 #define VEND1_GLOBAL_CFG_RATE_ADAPT_NONE	0
 #define VEND1_GLOBAL_CFG_RATE_ADAPT_USX		1
 #define VEND1_GLOBAL_CFG_RATE_ADAPT_PAUSE	2
+#define VEND1_GLOBAL_CFG_SERDES_MODE		GENMASK(2, 0)
+#define VEND1_GLOBAL_CFG_SERDES_MODE_XFI	0
+#define VEND1_GLOBAL_CFG_SERDES_MODE_SGMII	3
+#define VEND1_GLOBAL_CFG_SERDES_MODE_OCSGMII	4
 
 #define VEND1_GLOBAL_RSVD_STAT1			0xc885
 #define VEND1_GLOBAL_RSVD_STAT1_FW_BUILD_ID	GENMASK(7, 4)
@@ -558,6 +562,63 @@  static void aqr107_chip_info(struct phy_device *phydev)
 		   fw_major, fw_minor, build_id, prov_id);
 }
 
+static int aqr107_global_config_init(struct phy_device *phydev)
+{
+	u16 mask = VEND1_GLOBAL_CFG_RATE_ADAPT | VEND1_GLOBAL_CFG_SERDES_MODE;
+	u16 val, serdes_mode, rate_adapt = VEND1_GLOBAL_CFG_RATE_ADAPT_PAUSE;
+	u16 config_regs[] = {
+		VEND1_GLOBAL_CFG_10M,
+		VEND1_GLOBAL_CFG_100M,
+		VEND1_GLOBAL_CFG_1G,
+		VEND1_GLOBAL_CFG_2_5G,
+		VEND1_GLOBAL_CFG_5G,
+		VEND1_GLOBAL_CFG_10G,
+	};
+	int i, ret;
+
+	switch (phydev->interface) {
+	case PHY_INTERFACE_MODE_SGMII:
+	case PHY_INTERFACE_MODE_1000BASEKX:
+		rate_adapt = VEND1_GLOBAL_CFG_RATE_ADAPT_USX;
+		serdes_mode = VEND1_GLOBAL_CFG_SERDES_MODE_SGMII;
+		break;
+
+	case PHY_INTERFACE_MODE_2500BASEX:
+		serdes_mode = VEND1_GLOBAL_CFG_SERDES_MODE_OCSGMII;
+		break;
+
+	case PHY_INTERFACE_MODE_USXGMII:
+		rate_adapt = VEND1_GLOBAL_CFG_RATE_ADAPT_USX;
+		fallthrough;
+	case PHY_INTERFACE_MODE_XGMII:
+	case PHY_INTERFACE_MODE_10GKR:
+	case PHY_INTERFACE_MODE_10GBASER:
+	case PHY_INTERFACE_MODE_XAUI:
+	case PHY_INTERFACE_MODE_RXAUI:
+		serdes_mode = VEND1_GLOBAL_CFG_SERDES_MODE_XFI;
+		break;
+
+	default:
+		WARN_ON_ONCE(1);
+		return -EINVAL;
+	}
+
+	val = FIELD_PREP(VEND1_GLOBAL_CFG_RATE_ADAPT, rate_adapt);
+	val |= FIELD_PREP(VEND1_GLOBAL_CFG_SERDES_MODE, serdes_mode);
+
+	for (i = 0; i < ARRAY_SIZE(config_regs); i++) {
+		ret = phy_modify_mmd(phydev, MDIO_MMD_VEND1, config_regs[i],
+				     mask, val);
+		if (ret) {
+			phydev_err(phydev, "could not initialize register %x\n",
+				   config_regs[i]);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
 static int aqr107_config_init(struct phy_device *phydev)
 {
 	int ret;
@@ -581,6 +642,10 @@  static int aqr107_config_init(struct phy_device *phydev)
 	if (!ret)
 		aqr107_chip_info(phydev);
 
+	ret = aqr107_global_config_init(phydev);
+	if (ret)
+		return ret;
+
 	return aqr107_set_downshift(phydev, MDIO_AN_VEND_PROV_DOWNSHIFT_DFLT);
 }