[net-next,v7,15/16] net: ethtool: ts: Let the active time stamping layer be selectable

Message ID 20231114-feature_ptp_netnext-v7-15-472e77951e40@bootlin.com
State New
Headers
Series net: Make timestamping selectable |

Commit Message

Köry Maincent Nov. 14, 2023, 11:28 a.m. UTC
  Now that the current timestamp is saved in a variable lets add the
ETHTOOL_MSG_TS_SET ethtool netlink socket to make it selectable.

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
---

Changes in v2:
- Move selected_timestamping_layer introduction in this patch.
- Replace strmcmp by sysfs_streq.
- Use the PHY timestamp only if available.

Changes in v3:
- Add a devicetree binding to select the preferred timestamp
- Replace the way to select timestamp through ethtool instead of sysfs
You can test it with the ethtool source on branch feature_ptp of:
https://github.com/kmaincent/ethtool

Changes in v4:
- Change the API to select MAC default time stamping instead of the PHY.
- Add a whitelist to no break the old timestamp PHY default preferences
  for current PHY.
- Replace the ethtool ioctl by netlink.
- Add a netdev notifier to allow network device to create trap on PTP
  packets. Not tested as it need to change the lan966x driver that
  implement packet traps. I will do after the hwtstamp management change
  to NDOs.

Changes in v5:
- Remove the netdev notifier added in v4.
- Extract the default timestamp API change in another patch.

Changes in v6:
- Update the error message.
- Put ndo_hwtstamp_set check first as it is most likely what most drivers
  currently do not support.
- Follow timestamping layer naming update.
- Update the timestamp layer check.
- Update timestamp set between MAC and software.

Changes in v7:
- Fix commit title typo.
---
 Documentation/networking/ethtool-netlink.rst | 17 +++++
 include/uapi/linux/ethtool_netlink.h         |  1 +
 net/ethtool/netlink.c                        |  8 +++
 net/ethtool/netlink.h                        |  1 +
 net/ethtool/ts.c                             | 99 ++++++++++++++++++++++++++++
 5 files changed, 126 insertions(+)
  

Comments

Jakub Kicinski Nov. 19, 2023, 2:34 a.m. UTC | #1
On Tue, 14 Nov 2023 12:28:43 +0100 Kory Maincent wrote:
> +	if (!tb[ETHTOOL_A_TS_LAYER])
> +		return 0;

GENL_REQ_ATTR_CHECK(), not sure why anyone would issue this command
without any useful attr.

> +	/* Disable time stamping in the current layer. */
> +	if (netif_device_present(dev) &&
> +	    (dev->ts_layer == PHY_TIMESTAMPING ||
> +	    dev->ts_layer == MAC_TIMESTAMPING)) {
> +		ret = dev_set_hwtstamp_phylib(dev, &config, info->extack);
> +		if (ret < 0)
> +			return ret;

So you only support PHYLIB?

The semantics need to be better documented :(
  
Köry Maincent Nov. 20, 2023, 9:44 a.m. UTC | #2
On Sat, 18 Nov 2023 18:34:33 -0800
Jakub Kicinski <kuba@kernel.org> wrote:

> On Tue, 14 Nov 2023 12:28:43 +0100 Kory Maincent wrote:
> > +	if (!tb[ETHTOOL_A_TS_LAYER])
> > +		return 0;  
> 
> GENL_REQ_ATTR_CHECK(), not sure why anyone would issue this command
> without any useful attr.
> 
> > +	/* Disable time stamping in the current layer. */
> > +	if (netif_device_present(dev) &&
> > +	    (dev->ts_layer == PHY_TIMESTAMPING ||
> > +	    dev->ts_layer == MAC_TIMESTAMPING)) {
> > +		ret = dev_set_hwtstamp_phylib(dev, &config, info->extack);
> > +		if (ret < 0)
> > +			return ret;  
> 
> So you only support PHYLIB?
> 
> The semantics need to be better documented :(

Yes as we don't really know how each MAC deal with the timestamping
before ndo_hwstamp_get/set. Using phylib only allows us to be sure these NDO are
implemented and the management of timestamping is coherent in the MAC. Also It
will push people to move on to these NDOs.

Ok I will add documentation.
  
Vladimir Oltean Nov. 20, 2023, 10:52 a.m. UTC | #3
Hi Köry,

On Mon, Nov 20, 2023 at 10:44:39AM +0100, Köry Maincent wrote:
> On Sat, 18 Nov 2023 18:34:33 -0800
> Jakub Kicinski <kuba@kernel.org> wrote:
> 
> > On Tue, 14 Nov 2023 12:28:43 +0100 Kory Maincent wrote:
> > > +	if (!tb[ETHTOOL_A_TS_LAYER])
> > > +		return 0;  
> > 
> > GENL_REQ_ATTR_CHECK(), not sure why anyone would issue this command
> > without any useful attr.
> > 
> > > +	/* Disable time stamping in the current layer. */
> > > +	if (netif_device_present(dev) &&
> > > +	    (dev->ts_layer == PHY_TIMESTAMPING ||
> > > +	    dev->ts_layer == MAC_TIMESTAMPING)) {
> > > +		ret = dev_set_hwtstamp_phylib(dev, &config, info->extack);
> > > +		if (ret < 0)
> > > +			return ret;  
> > 
> > So you only support PHYLIB?
> > 
> > The semantics need to be better documented :(
> 
> Yes as we don't really know how each MAC deal with the timestamping
> before ndo_hwstamp_get/set. Using phylib only allows us to be sure these NDO are
> implemented and the management of timestamping is coherent in the MAC. Also It
> will push people to move on to these NDOs.
> 
> Ok I will add documentation.
> 
> -- 
> Köry Maincent, Bootlin
> Embedded Linux and kernel engineering
> https://bootlin.com/

When Jakub says "the semantics need to be better documented", I'm also
thinking of a different direction.

From what I understand, Maxime is working on representing multiple
phylib PHYs in the UAPI:
https://patchwork.kernel.org/project/netdevbpf/cover/20231117162323.626979-1-maxime.chevallier@bootlin.com/

Does your UAPI proposal make it possible in any way to select
timestamping in phylib PHY A rather than PHY B? Or do you think it is
extensible to support that, somehow?
  
Köry Maincent Nov. 20, 2023, 11:14 a.m. UTC | #4
On Mon, 20 Nov 2023 12:52:55 +0200
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> > > So you only support PHYLIB?
> > > 
> > > The semantics need to be better documented :(  
> > 
> > Yes as we don't really know how each MAC deal with the timestamping
> > before ndo_hwstamp_get/set. Using phylib only allows us to be sure these
> > NDO are implemented and the management of timestamping is coherent in the
> > MAC. Also It will push people to move on to these NDOs.
> > 
> > Ok I will add documentation.
> 
> When Jakub says "the semantics need to be better documented", I'm also
> thinking of a different direction.
> 
> From what I understand, Maxime is working on representing multiple
> phylib PHYs in the UAPI:
> https://patchwork.kernel.org/project/netdevbpf/cover/20231117162323.626979-1-maxime.chevallier@bootlin.com/

Yes I am also following his patch series. 
 
> Does your UAPI proposal make it possible in any way to select
> timestamping in phylib PHY A rather than PHY B? Or do you think it is
> extensible to support that, somehow?

It does not support it for now.
I didn't want to base my work on his series as it could work without it for now
and I didn't want to wait to have his series accepted. It is more a future
possible support as I don't have anything to test it and I don't know if such
hardware exists right now.
I think it will be extensible to support that, my thinking was to create this
struct in net_device struct:

struct {
	enum layer;
	u32 id;
} ts;

With id saving the phy_index of the PHY X used when the layer PHY is selected.
This id could also be used to store the timestamp point in case of several
timestamp in a MAC.

Regards,
  
Vladimir Oltean Nov. 20, 2023, 12:06 p.m. UTC | #5
On Mon, Nov 20, 2023 at 12:14:40PM +0100, Köry Maincent wrote:
> > Does your UAPI proposal make it possible in any way to select
> > timestamping in phylib PHY A rather than PHY B? Or do you think it is
> > extensible to support that, somehow?
> 
> It does not support it for now.
> I didn't want to base my work on his series as it could work without it for now
> and I didn't want to wait to have his series accepted. It is more a future
> possible support as I don't have anything to test it and I don't know if such
> hardware exists right now.
> I think it will be extensible to support that, my thinking was to create this
> struct in net_device struct:
> 
> struct {
> 	enum layer;
> 	u32 id;
> } ts;
> 
> With id saving the phy_index of the PHY X used when the layer PHY is selected.
> This id could also be used to store the timestamp point in case of several
> timestamp in a MAC.

Ok, and I suppose the "u32 id" would be numerically the same as the
ETHTOOL_A_HEADER_PHY_INDEX nlattr that Maxime is proposing?

The next question would be: if a driver performs PHY management in
firmware, and does not use phylib, how should user space interact with it?
What timestamping layer and upon what should the ID be chosen?

Finally (and unrelated to the question above), why is SOFTWARE_TIMESTAMPING
even a layer exposed in the UAPI? My understanding of this patch set is
that it is meant to select the source of hardware timestamps that are
given to a socket. What gap in the UAPI does the introduction of a
SOFTWARE_TIMESTAMPING hwtstamping layer cover?
  
Köry Maincent Nov. 20, 2023, 1:49 p.m. UTC | #6
On Mon, 20 Nov 2023 14:06:01 +0200
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> On Mon, Nov 20, 2023 at 12:14:40PM +0100, Köry Maincent wrote:
> > > Does your UAPI proposal make it possible in any way to select
> > > timestamping in phylib PHY A rather than PHY B? Or do you think it is
> > > extensible to support that, somehow?  
> > 
> > It does not support it for now.
> > I didn't want to base my work on his series as it could work without it for
> > now and I didn't want to wait to have his series accepted. It is more a
> > future possible support as I don't have anything to test it and I don't
> > know if such hardware exists right now.
> > I think it will be extensible to support that, my thinking was to create
> > this struct in net_device struct:
> > 
> > struct {
> > 	enum layer;
> > 	u32 id;
> > } ts;
> > 
> > With id saving the phy_index of the PHY X used when the layer PHY is
> > selected. This id could also be used to store the timestamp point in case
> > of several timestamp in a MAC.  
> 
> Ok, and I suppose the "u32 id" would be numerically the same as the
> ETHTOOL_A_HEADER_PHY_INDEX nlattr that Maxime is proposing?

Yes.

> The next question would be: if a driver performs PHY management in
> firmware, and does not use phylib, how should user space interact with it?
> What timestamping layer and upon what should the ID be chosen?

In that case it could be the second options I refereed to.
Using the id to select the right timestamp within the NIC driver.
It indeed won't be called PHY timestamping as it is managed by the NIC firmware
but as it is managed by only one firmware and driver using the id to separate
the available timestamp seems a good idea.

Another solution would be to create another value in the layer enumeration.
PHY_NIC_TIMESTAMPING? Better idea? I am not good at naming.
 
> Finally (and unrelated to the question above), why is SOFTWARE_TIMESTAMPING
> even a layer exposed in the UAPI? My understanding of this patch set is
> that it is meant to select the source of hardware timestamps that are
> given to a socket. What gap in the UAPI does the introduction of a
> SOFTWARE_TIMESTAMPING hwtstamping layer cover?

As I explained to Jakub:
The software timestamping comes from the MAC driver capabilities and I decided
to separate software and MAC timestamping. If we select PHY timestamping we
can't use software timestamping and for an user, selecting the MAC as
timestamping seems not logical to use software timestamping (I got confused
myself when I first dig into it long time ago). Be able to select
directly Software timestamping seems appropriate and won't bring any harm. What
do you think?

Regards,
  
Vladimir Oltean Nov. 20, 2023, 2:23 p.m. UTC | #7
On Mon, Nov 20, 2023 at 02:49:29PM +0100, Köry Maincent wrote:
> > The next question would be: if a driver performs PHY management in
> > firmware, and does not use phylib, how should user space interact with it?
> > What timestamping layer and upon what should the ID be chosen?
> 
> In that case it could be the second options I refereed to.
> Using the id to select the right timestamp within the NIC driver.
> It indeed won't be called PHY timestamping as it is managed by the NIC firmware
> but as it is managed by only one firmware and driver using the id to separate
> the available timestamp seems a good idea.
> 
> Another solution would be to create another value in the layer enumeration.
> PHY_NIC_TIMESTAMPING? Better idea? I am not good at naming.

The point I was trying to make is that your current choice of exposing
PHY_TIMESTAMPING in UAPI, when it really only refers to phylib PHYs,
would lead exactly to this sort of UAPI balkanization where everyone
wants to add more timestamping layers, and to define IDs to be specific
to their own invented layer. Maybe the concept of timestamping layers is
not what user space should see at all.

In previous email discussions, I was proposing to Jakub and you "what if
we didn't let user space select a specific layer like PHY_TIMESTAMPING
or MAC_TIMESTAMPING at all, but just select a specific phc_index as the
provider of hardware timestamps"?

The limitation we're trying to lift is that currently, there can be only
a single provider of hardware timestamps. We make that provider customizable.
There is already a good understanding from user space that, if "ethtool -T"
on an interface says there is no PHC, then there are going to be no
hardware timestamps. So I thought it would be much more intuitive if the
timestamping layer could be selected by the user merely by an unified
phc_index (provided by a phylib phy or firmware based driver or whatever),
and everything else would just be an implementation detail of the kernel.
No one should care that it's a phylib phy, and shouldn't use a different
procedure to identify its ID based on whether it's a phylib or firmware
PHY.

It's a bit hard to align my expectation of what this series should offer
with yours. I think we're talking past each other, which unfortunately
makes me lose track and interest. I wish you could have answered my
earlier question about this alternative proposal.
https://lore.kernel.org/netdev/20231013170903.p3ycicebnfrsmoks@skbuf/

> > Finally (and unrelated to the question above), why is SOFTWARE_TIMESTAMPING
> > even a layer exposed in the UAPI? My understanding of this patch set is
> > that it is meant to select the source of hardware timestamps that are
> > given to a socket. What gap in the UAPI does the introduction of a
> > SOFTWARE_TIMESTAMPING hwtstamping layer cover?
> 
> As I explained to Jakub:
> The software timestamping comes from the MAC driver capabilities and I decided
> to separate software and MAC timestamping.

Why? What was the problem? This confuses me because I don't understand
what is the problem that the solution is trying to address, and whether
the solution is orthogonal to all the other UAPI that exists for
software and hardware timestamping at the socket layer - which AFAIK can
happily coexist.

> If we select PHY timestamping we can't use software timestamping and
> for an user, selecting the MAC as timestamping seems not logical to
> use software timestamping (I got confused myself when I first dig into
> it long time ago). Be able to select directly Software timestamping
> seems appropriate and won't bring any harm. What do you think?

Hmm, can you please explain what is the reason why software timestamping
can't coexist with PHY timestamping? It is a genuine question to which I
don't have an answer - I haven't used PHY timestamping. It must be
something specific to that, since I do know that MAC + software
timestamping work simultaneously just fine.
  
Köry Maincent Nov. 20, 2023, 2:53 p.m. UTC | #8
On Mon, 20 Nov 2023 16:23:16 +0200
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> On Mon, Nov 20, 2023 at 02:49:29PM +0100, Köry Maincent wrote:
> > > The next question would be: if a driver performs PHY management in
> > > firmware, and does not use phylib, how should user space interact with it?
> > > What timestamping layer and upon what should the ID be chosen?  
> > 
> > In that case it could be the second options I refereed to.
> > Using the id to select the right timestamp within the NIC driver.
> > It indeed won't be called PHY timestamping as it is managed by the NIC
> > firmware but as it is managed by only one firmware and driver using the id
> > to separate the available timestamp seems a good idea.
> > 
> > Another solution would be to create another value in the layer enumeration.
> > PHY_NIC_TIMESTAMPING? Better idea? I am not good at naming.  
> 
> The point I was trying to make is that your current choice of exposing
> PHY_TIMESTAMPING in UAPI, when it really only refers to phylib PHYs,
> would lead exactly to this sort of UAPI balkanization where everyone
> wants to add more timestamping layers, and to define IDs to be specific
> to their own invented layer. Maybe the concept of timestamping layers is
> not what user space should see at all.
> 
> In previous email discussions, I was proposing to Jakub and you "what if
> we didn't let user space select a specific layer like PHY_TIMESTAMPING
> or MAC_TIMESTAMPING at all, but just select a specific phc_index as the
> provider of hardware timestamps"?
> 
> The limitation we're trying to lift is that currently, there can be only
> a single provider of hardware timestamps. We make that provider customizable.
> There is already a good understanding from user space that, if "ethtool -T"
> on an interface says there is no PHC, then there are going to be no
> hardware timestamps. So I thought it would be much more intuitive if the
> timestamping layer could be selected by the user merely by an unified
> phc_index (provided by a phylib phy or firmware based driver or whatever),
> and everything else would just be an implementation detail of the kernel.
> No one should care that it's a phylib phy, and shouldn't use a different
> procedure to identify its ID based on whether it's a phylib or firmware
> PHY.
> 
> It's a bit hard to align my expectation of what this series should offer
> with yours. I think we're talking past each other, which unfortunately
> makes me lose track and interest. I wish you could have answered my
> earlier question about this alternative proposal.
> https://lore.kernel.org/netdev/20231013170903.p3ycicebnfrsmoks@skbuf/

I did thought about it but I got stuck by the case of hardware timestamping
without PHC. Richard explained the reason of its existence here:
https://lore.kernel.org/netdev/ZS3MKWlnPqTe8gkq@hoboy.vegasvil.org/#t

Maybe I got a bit stuck in my implementation and should investigate more your
proposition and how to deal with this case. Do you have an idea on how to
solve it?

> > > Finally (and unrelated to the question above), why is
> > > SOFTWARE_TIMESTAMPING even a layer exposed in the UAPI? My understanding
> > > of this patch set is that it is meant to select the source of hardware
> > > timestamps that are given to a socket. What gap in the UAPI does the
> > > introduction of a SOFTWARE_TIMESTAMPING hwtstamping layer cover?  
> > 
> > As I explained to Jakub:
> > The software timestamping comes from the MAC driver capabilities and I
> > decided to separate software and MAC timestamping.  
> 
> Why? What was the problem? This confuses me because I don't understand
> what is the problem that the solution is trying to address, and whether
> the solution is orthogonal to all the other UAPI that exists for
> software and hardware timestamping at the socket layer - which AFAIK can
> happily coexist.
> 
> > If we select PHY timestamping we can't use software timestamping and
> > for an user, selecting the MAC as timestamping seems not logical to
> > use software timestamping (I got confused myself when I first dig into
> > it long time ago). Be able to select directly Software timestamping
> > seems appropriate and won't bring any harm. What do you think?  
> 
> Hmm, can you please explain what is the reason why software timestamping
> can't coexist with PHY timestamping? It is a genuine question to which I
> don't have an answer - I haven't used PHY timestamping. It must be
> something specific to that, since I do know that MAC + software
> timestamping work simultaneously just fine.

The software timestamp is managed through the MAC driver calling
skb_tx_timestamp() function. The PHY driver does not call it, that's why there
is no software timestamping in PHY driver capabilities. Also the PHY driver
doesn't know if the MAC driver support it so it currently can not coexist with
PHY timestamping.

Regards,
  
Vladimir Oltean Nov. 20, 2023, 4:10 p.m. UTC | #9
On Mon, Nov 20, 2023 at 03:53:44PM +0100, Köry Maincent wrote:
> I did thought about it but I got stuck by the case of hardware timestamping
> without PHC. Richard explained the reason of its existence here:
> https://lore.kernel.org/netdev/ZS3MKWlnPqTe8gkq@hoboy.vegasvil.org/#t
> 
> Maybe I got a bit stuck in my implementation and should investigate more your
> proposition and how to deal with this case. Do you have an idea on how to
> solve it?

I would take what Richard said with a grain of salt, and interpret as
"there exists hardware with hwts but w/o PHC, and that may work for
marginal use cases", not that "we should design having that as a
first-class citizen in mind".

If there is any need for me to point out that such kind of driver isn't
a first class citizen, then here's an experiment:

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c b/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
index e993ed04ab57..5755f54197b9 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
@@ -842,13 +842,7 @@ static int enetc_get_ts_info(struct net_device *ndev,
 {
 	int *phc_idx;
 	
-	phc_idx = symbol_get(enetc_phc_index);
-	if (phc_idx) {
-		info->phc_index = *phc_idx;
-		symbol_put(enetc_phc_index);
-	} else {
-		info->phc_index = -1;
-	}
+	info->phc_index = -1;
 
 #ifdef CONFIG_FSL_ENETC_PTP_CLOCK
 	info->so_timestamping = SOF_TIMESTAMPING_TX_HARDWARE |

When elected as master, it kinda works, and does synchronize with a
slave, even if ptp4l gets confused about the phc_index being -1.

But when elected as a slave by the BMCA, ptp4l gets confused and thinks
that phc_index == -1 means that it's supposed to use software timestamping.

$ ptp4l -i eno0 -2 -P -m -s
ptp4l[1185.594]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[1185.598]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[1186.887]: port 1: new foreign master 00049f.fffe.05f628-1
ptp4l[1190.889]: selected best master clock 00049f.fffe.05f628
ptp4l[1190.890]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[1191.891]: master offset 37000003850 s0 freq +100000000 path delay       823
ptp4l[1192.896]: master offset 3[ 1235.693485] systemd-journald[125]: Time jumped backwards, rotating.
7000003848 s1 freq +99999998 path delay       822
ptp4l[1193.603]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[1193.892]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[1193.893]: master offset 37000003852 s0 freq +99999998 path delay       822
ptp4l[1194.342]: clockcheck: clock jumped forward or running faster than expected!
ptp4l[1194.604]: clockcheck: clock jumped forward or running faster than expected!

So, I guess the only thing we need to do to this kind of setup is not do
too much harm to it.

We break nothing if we make the phc_index the central aspect of hwts
layer selection - except for the fact that such a MAC won't be able to
change its timestamping layer to be a PHY.

If we wanted to add such a capability to that MAC driver, the obvious
way to solve the lack of a PHC is to create a PHC that returns
-EOPNOTSUPP for all of its ptp_clock_info operations (gettime, settime
etc). It may possibly be that, in the worst case, ptp4l needs to probe
for each syscall on the NIC's PHC being operational before deciding what
can be done with it. But that's already an improvement over the current
handling to make it more graceful, it's not to keep things on par.

> > Hmm, can you please explain what is the reason why software timestamping
> > can't coexist with PHY timestamping? It is a genuine question to which I
> > don't have an answer - I haven't used PHY timestamping. It must be
> > something specific to that, since I do know that MAC + software
> > timestamping work simultaneously just fine.
> 
> The software timestamp is managed through the MAC driver calling
> skb_tx_timestamp() function. The PHY driver does not call it, that's why there
> is no software timestamping in PHY driver capabilities. Also the PHY driver
> doesn't know if the MAC driver support it so it currently can not coexist with
> PHY timestamping.

I don't understand. Documentation/networking/timestamping.rst says that
skb_tx_timestamp() is one of the actual _mechanisms_ through which phylib
timestamping works. It's called by the parent MAC driver, and
mii_ts->txtstamp() hooks onto that. So in some situations, the PHY
timestamping core _piggybacks_ on top of software timestamping support
in the MAC.

Where I agree is that the PHY driver has no business in deciding whether
the interface should report the socket option flags for software timestamping
or not. But I still don't see a proof that they can't coexist. What you need
to explain is what makes said software timestamping unusable in the
presence of PHY timestamping - to justify this separate software
timestamping layer in your UAPI proposal.
  
Köry Maincent Nov. 20, 2023, 5:17 p.m. UTC | #10
On Mon, 20 Nov 2023 18:10:04 +0200
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> On Mon, Nov 20, 2023 at 03:53:44PM +0100, Köry Maincent wrote:
> > I did thought about it but I got stuck by the case of hardware timestamping
> > without PHC. Richard explained the reason of its existence here:
> > https://lore.kernel.org/netdev/ZS3MKWlnPqTe8gkq@hoboy.vegasvil.org/#t
> > 
> > Maybe I got a bit stuck in my implementation and should investigate more
> > your proposition and how to deal with this case. Do you have an idea on how
> > to solve it?  
> 
> I would take what Richard said with a grain of salt, and interpret as
> "there exists hardware with hwts but w/o PHC, and that may work for
> marginal use cases", not that "we should design having that as a
> first-class citizen in mind".

> When elected as master, it kinda works, and does synchronize with a
> slave, even if ptp4l gets confused about the phc_index being -1.
> 
> But when elected as a slave by the BMCA, ptp4l gets confused and thinks
> that phc_index == -1 means that it's supposed to use software timestamping.
> 
> So, I guess the only thing we need to do to this kind of setup is not do
> too much harm to it.
> 
> We break nothing if we make the phc_index the central aspect of hwts
> layer selection - except for the fact that such a MAC won't be able to
> change its timestamping layer to be a PHY.
> 
> If we wanted to add such a capability to that MAC driver, the obvious
> way to solve the lack of a PHC is to create a PHC that returns
> -EOPNOTSUPP for all of its ptp_clock_info operations (gettime, settime
> etc). It may possibly be that, in the worst case, ptp4l needs to probe
> for each syscall on the NIC's PHC being operational before deciding what
> can be done with it. But that's already an improvement over the current
> handling to make it more graceful, it's not to keep things on par.

Ok, you convinced me.
Thanks for your arguments and spending time explaining it.


Regards,
  
Jakub Kicinski Nov. 20, 2023, 5:37 p.m. UTC | #11
On Mon, 20 Nov 2023 16:23:16 +0200 Vladimir Oltean wrote:
> In previous email discussions, I was proposing to Jakub and you "what if
> we didn't let user space select a specific layer like PHY_TIMESTAMPING
> or MAC_TIMESTAMPING at all, but just select a specific phc_index as the
> provider of hardware timestamps"?

What about my use case of having a NIC which can stamp at a low rate
at the PHY (for PTP) and high rate at the DMA block (for congestion
control)? Both stamp points have the same PHC index.
  
Andrew Lunn Nov. 20, 2023, 6:39 p.m. UTC | #12
> What about my use case of having a NIC which can stamp at a low rate
> at the PHY (for PTP) and high rate at the DMA block (for congestion
> control)? Both stamp points have the same PHC index.

How theoretical is that? To me, it seems more likely you have two PHC.

The PHY stamping tends to be slow because of the MDIO bus. If the MAC
has fast access to the PHC, it means its not on the MDIO bus. It
probably means you have the PHY integrated into the MAC/SoC, so the
MAC can access it. But if its integrated like this, i don't see why
PHY stamping should be particularly slow. So you can probably use it
for congestion control. And then you don't need DMA stamping.

Do you know of real hardware with a MAC and a PHY sharing a PHC?

   Andrew
  
Jakub Kicinski Nov. 20, 2023, 6:51 p.m. UTC | #13
On Mon, 20 Nov 2023 19:39:35 +0100 Andrew Lunn wrote:
> > What about my use case of having a NIC which can stamp at a low rate
> > at the PHY (for PTP) and high rate at the DMA block (for congestion
> > control)? Both stamp points have the same PHC index.  
> 
> How theoretical is that? To me, it seems more likely you have two PHC.

Very practical. mlx5 does this today, based on guessing and private
ethtool flags.

> The PHY stamping tends to be slow because of the MDIO bus. If the MAC
> has fast access to the PHC, it means its not on the MDIO bus. It
> probably means you have the PHY integrated into the MAC/SoC, so the
> MAC can access it. But if its integrated like this, i don't see why
> PHY stamping should be particularly slow. So you can probably use it
> for congestion control. And then you don't need DMA stamping.

Tx stamps are harder to carry back to the host all the way from the PHY
than from the DMA block when DMA completion is signaled. Rx stamps seem
much easier to carry down the pipeline but apparently some vendors are
incapable of doing that as well.

> Do you know of real hardware with a MAC and a PHY sharing a PHC?

mlx5 for sure, but other designs, too. PHY, NIC pipeline and PCIe PTM
may all need to time stamp from a single time counter.
  
Vladimir Oltean Nov. 20, 2023, 7 p.m. UTC | #14
On Mon, Nov 20, 2023 at 09:37:23AM -0800, Jakub Kicinski wrote:
> On Mon, 20 Nov 2023 16:23:16 +0200 Vladimir Oltean wrote:
> > In previous email discussions, I was proposing to Jakub and you "what if
> > we didn't let user space select a specific layer like PHY_TIMESTAMPING
> > or MAC_TIMESTAMPING at all, but just select a specific phc_index as the
> > provider of hardware timestamps"?
> 
> What about my use case of having a NIC which can stamp at a low rate
> at the PHY (for PTP) and high rate at the DMA block (for congestion
> control)? Both stamp points have the same PHC index.

Well, first of all, given my understanding of the "laws of physics",
I think something has to give in your use case description. I can't
see how on RX, the NIC can decide in advance whether to provide low
rate MAC timestamps for packets going to a socket and high rate DMA
timestamps for packets going to another socket. It can either provide
MAC timestamps, or DMA timestamps, or an unreliable, unpresentable to
user space, mix.

But maybe I'm wrong and there are NICs which can do that filtering.
If such NIC exists, then I guess a SOF_TIMESTAMPING_RX_DMA flag should
be added to the socket layer, and the NIC driver provides timestamps
according to the skb->sk->sk_tsflags, and that problem is completely out
of scope for Köry's patch set - and implicitly compatible with it, since
as you say, the device-wide timestamping layer - PHC index - does not
really change.

If I'm not wrong and the MAC-or-DMA timestamp selection is NIC-wide
(which diverges from your problem description), then neither Köry's work
nor my "everything is a phc_index" proposal will bring your use case to
fruition without further work. Here I would avoid speculating, because a
lot will depend upon the details which you haven't really given.

One question will be whether, in the case of "NIC-wide DMA timestamps",
DMA timestamps should be presented as hardware timestamps - struct
scm_timestamping[2] from CMSG_DATA() - or as their own thing, that user
space needs explicit support for - by parsing a new cmsg level/type.
If DMA timestamps won't look to user space like hardware timestamps,
then the use case is again out of scope for Köry's work, as far as I see
it.

Another simple question is - if NICs do this today - probably by giving
the "unrepresentable mix" to user space in an implicit, hardcoded and
very fine tuned way such that nobody bats an eye - then what is there
more to support? Are you looking at extra UAPI as a way to legitimize
hacks, or do you feel there is extra control that applications can gain?
  
Russell King (Oracle) Nov. 20, 2023, 7:09 p.m. UTC | #15
On Mon, Nov 20, 2023 at 07:39:35PM +0100, Andrew Lunn wrote:
> > What about my use case of having a NIC which can stamp at a low rate
> > at the PHY (for PTP) and high rate at the DMA block (for congestion
> > control)? Both stamp points have the same PHC index.
> 
> How theoretical is that? To me, it seems more likely you have two PHC.
> 
> The PHY stamping tends to be slow because of the MDIO bus. If the MAC
> has fast access to the PHC, it means its not on the MDIO bus. It
> probably means you have the PHY integrated into the MAC/SoC, so the
> MAC can access it. But if its integrated like this, i don't see why
> PHY stamping should be particularly slow. So you can probably use it
> for congestion control. And then you don't need DMA stamping.
> 
> Do you know of real hardware with a MAC and a PHY sharing a PHC?

Not so much a MAC and PHY sharing a PHC, but I notice in USXGMII-M,
there is the ability for the PHY to pass the timestamp back to the MAC
through the USXGMII-M connection - which would eliminate the problems
of accessing the PHY to get that the timestamps.
  
Jakub Kicinski Nov. 20, 2023, 7:58 p.m. UTC | #16
On Mon, 20 Nov 2023 21:00:23 +0200 Vladimir Oltean wrote:
> Well, first of all, given my understanding of the "laws of physics",
> I think something has to give in your use case description. I can't
> see how on RX, the NIC can decide in advance whether to provide low
> rate MAC timestamps for packets going to a socket and high rate DMA
> timestamps for packets going to another socket. It can either provide
> MAC timestamps, or DMA timestamps, or an unreliable, unpresentable to
> user space, mix.

Rx time stamping is configured by filters. Is there a problem with user
specifying that they want "true" timestamps for PTP/NTP packets, and
"dma" timestamps for all the rest?

Maybe we can extend struct scm_timestamping to carry an indication
which stamp ended up in ts[2] but that's less important to me than
the ability to configure the thing. Right now, as I said, mlx5 uses
an ethtool priv flag :(

> But maybe I'm wrong and there are NICs which can do that filtering.
> If such NIC exists, then I guess a SOF_TIMESTAMPING_RX_DMA flag should
> be added to the socket layer, and the NIC driver provides timestamps
> according to the skb->sk->sk_tsflags, and that problem is completely out
> of scope for Köry's patch set - and implicitly compatible with it, since
> as you say, the device-wide timestamping layer - PHC index - does not
> really change.

IDK. Maybe the sniffles I picked up at LPC are clouding my judgment
but to me this patch set is shaped too much by current implementation
and not enough by what it's modeling. It basically exposes to user
space the "mux" for choosing NETDEV vs PHYLIB.

There are multiple time stamping points as the packet moves thru 
the pipeline. Expose them so that SIOC[GS]HWTSTAMP can target each
on individually.

> If I'm not wrong and the MAC-or-DMA timestamp selection is NIC-wide
> (which diverges from your problem description),

Nope.

> then neither Köry's work
> nor my "everything is a phc_index" proposal will bring your use case to
> fruition without further work. Here I would avoid speculating, because a
> lot will depend upon the details which you haven't really given.

What are the details you'd like? PTP gets stamped at the PHY/MAC, 
the rest gets stamped at DMA. mlx5 achieves this by splitting the
PTP traffic to a separate queue pair, and configuring that qp to
capture PHY/MAC stamps, AFAIU.

> One question will be whether, in the case of "NIC-wide DMA timestamps",
> DMA timestamps should be presented as hardware timestamps - struct
> scm_timestamping[2] from CMSG_DATA() - or as their own thing, that user
> space needs explicit support for - by parsing a new cmsg level/type.
> If DMA timestamps won't look to user space like hardware timestamps,
> then the use case is again out of scope for Köry's work, as far as I see
> it.
> 
> Another simple question is - if NICs do this today - probably by giving
> the "unrepresentable mix" to user space in an implicit, hardcoded and
> very fine tuned way such that nobody bats an eye - then what is there
> more to support? Are you looking at extra UAPI as a way to legitimize
> hacks, or do you feel there is extra control that applications can gain?

I don't understand what you're asking me.

DMA timestamping is becoming increasingly important. Ready any
congestion control paper from the last 5 years and chances are
it will be using delay as a signal. If we're extending uAPI
for Hw stamping we should make sure to cater to CC use cases.
  
Vladimir Oltean Nov. 20, 2023, 7:58 p.m. UTC | #17
On Mon, Nov 20, 2023 at 10:51:48AM -0800, Jakub Kicinski wrote:
> On Mon, 20 Nov 2023 19:39:35 +0100 Andrew Lunn wrote:
> > Do you know of real hardware with a MAC and a PHY sharing a PHC?
> 
> mlx5 for sure, but other designs, too. PHY, NIC pipeline and PCIe PTM
> may all need to time stamp from a single time counter.

I'm still waiting for you to fully clarify the "per socket vs global"
aspect, but independently of that, at least I understand why this is a
counter-argument to my proposal. I need to tune it a bit (ASSUMING that
we want DMA timestamps to "look like" hwtimestamps, and not like their
own thing, to user space), because the PHC index would no longer fully
identify a hwtstamp provider, so we need something more.

I imagine both ETHTOOL_MSG_TSINFO_GET and ETHTOOL_MSG_TSINFO_SET to
support a new (nest) nlattr called ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER.

This would contain (u32) ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER_PHC_INDEX
and (u32) ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER_QUALIFIER. It could be
extensible in the future, but this is the baseline and forms the key.

The latter takes values from an:

enum ethtool_hwstamp_provider_qualifier {
	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_MAC,
	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_PHY,
	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_DMA,
};

with Jakub's comments about the various types providing various
qualities of timestamps, given here:
https://lore.kernel.org/netdev/20230511150902.57d9a437@kernel.org/

 - PHY - per spec, at the RS layer
 - MAC - "close to the wire" in the MAC, specifically the pipeline
   delay (PHY stamp vs MAC stamp) should be constant for all packets;
   there must be no variable-time buffering and (for Tx) the time
   stamping must be past the stage of the pipeline affected by pause
   frames
 - DMA - worst quality, variable delay timestamp, usually taken when
   packets DMA descriptors (Rx or completion) are being written


It _sounds_ like we've all been talking about the same thing for ages,
but we weren't.


So, a PHC could offer multiple hwtstamp providers, as many as there are
qualifiers to uniquely describe them. Each hwstamp provider is
represented by a single ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER nested nlattr.
In TSINFO_GET requests, there are as many ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER
nests as there are hwtstamp providers for the NIC.

In the "normal" case of one single hwtstamp provider per PHC, it would
be the responsibility of the driver to set its qualifier to the right
thing: phylib to ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_PHY, and "normal" MAC
drivers to ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_MAC. Here, the qualifier
isn't more than an extra (partially redundant) mechanism for user space
to know what it's juggling with.

As opposed to Köry's proposal (where "dev->ts_layer == PHY_TIMESTAMPING"
means actual phylib), ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_PHY describes
the PHY-like timestamp quality of any hwtstamp provider, be it provided
by a phylib PHY or a firmware-based PHY. It doesn't describe "the layer"
itself.

Does this tick all boxes?
  
Vladimir Oltean Nov. 20, 2023, 9:17 p.m. UTC | #18
On Mon, Nov 20, 2023 at 11:58:39AM -0800, Jakub Kicinski wrote:
> Rx time stamping is configured by filters. Is there a problem with user
> specifying that they want "true" timestamps for PTP/NTP packets, and
> "dma" timestamps for all the rest?

There is, because enum hwtstamp_rx_filters is NIC-wide, and there is
only one of those - corresponding to the single hwtstamp (ts[2]) provider.
There were never talks in this patch set, AFAIR, about multiple hwtstamp
providers active simultaneously (for different traffic streams) and
configuring them independently, with separate RX filters.

> Maybe we can extend struct scm_timestamping to carry an indication
> which stamp ended up in ts[2] but that's less important to me than
> the ability to configure the thing. Right now, as I said, mlx5 uses
> an ethtool priv flag :(

No, you misunderstood me. I didn't suggest (at least not here) to add an
indication to struct scm_timestamping of "what's the source of ts[2]
(the hwtstamp)".

I was just _asking_ (collecting data) whether it's ultimately desirable
for DMA timestamps to be visible in ts[2] (indistinguishable from a
better hwtstamp, as they currently are, I guess) rather than their
own thing. Like for example, in a congestion control algorithm, where
does TCP really get them from.

If they'd rather be their own thing in a fully developed API, then the
whole discussion is rather off-topic to Köry, because here, we're
beating the dead horse of "where does ts[2] come from" - _still_ a
single source, just selectable, that is.

> > But maybe I'm wrong and there are NICs which can do that filtering.
> > If such NIC exists, then I guess a SOF_TIMESTAMPING_RX_DMA flag should
> > be added to the socket layer, and the NIC driver provides timestamps
> > according to the skb->sk->sk_tsflags, and that problem is completely out
> > of scope for Köry's patch set - and implicitly compatible with it, since
> > as you say, the device-wide timestamping layer - PHC index - does not
> > really change.
> 
> IDK. Maybe the sniffles I picked up at LPC are clouding my judgment
> but to me this patch set is shaped too much by current implementation
> and not enough by what it's modeling. It basically exposes to user
> space the "mux" for choosing NETDEV vs PHYLIB.

The last sentence I agree with.

> There are multiple time stamping points as the packet moves thru 
> the pipeline. Expose them so that SIOC[GS]HWTSTAMP can target each
> on individually.

Ok, that is rather vague and complex.

You will forever have to contend with the fact that struct scm_timestamping
can contain a single hwtstamp per packet: ts[2]. So you need complex
control path logic to ensure that the sum of RX filters for all
timestamping points in the packet data path doesn't actually request
more than one ts[2] for any skb.

I understand how that could scrath your itch, but here, it sounds
off-topic?

This is actually starting to get close, in a way, to my feedback to
Richard to allow multiple hwtstamp sources for an skb, and to just give
an indication in the cmsg of what's their source, leaving user space to
figure out the rest.
https://lore.kernel.org/netdev/20220120164832.xdebp5vykib6h6dp@skbuf/

But his response was "There was a fair amount of discussion, and it
seemed to me that everyone wanted a pony."
https://lore.kernel.org/netdev/Y%2F0Idkhy27TObawi@hoboy.vegasvil.org/

I mean, IDK, maybe it's not off-topic, but it's a round-about way of
achieving what they think they can achieve in a more straightforward way.

Rephrased in my own words, you're saying:

Forget the concept of an active hwtstamp provider, just open up the
knobs of _all_ possible hwtstamp providers for a NIC. Simultaneously!
To make one active and all the others inactive, just use
HWTSTAMP_FILTER_NONE/HWTSTAMP_TX_OFF for all except one, and the desired
enum hwtstamp_rx_filters / enum hwtstamp_tx_types for the active one.
Live with this expanded configuration model for a while, just restricted
for a single active timestamping layer, and then, once user space is
ready for an enhanced struct scm_timestamping which supports potentially
multiple cmsgs with distinct hwtstamps, remove the restriction and let
it all rip! Everybody gets their pony!

Additionally, SIOCSHWTSTAMP is kinda rusty, has a fixed binary format,
and is not extensible to target a specific hwtstamp provider. So a
netlink conversion of that, as a first step, would of course be great.

Is it an accurate summary?

If it is, I'll let Köry comment on the feasibility :)

> > If I'm not wrong and the MAC-or-DMA timestamp selection is NIC-wide
> > (which diverges from your problem description),
> 
> Nope.
> 
> > then neither Köry's work
> > nor my "everything is a phc_index" proposal will bring your use case to
> > fruition without further work. Here I would avoid speculating, because a
> > lot will depend upon the details which you haven't really given.
> 
> What are the details you'd like? PTP gets stamped at the PHY/MAC, 
> the rest gets stamped at DMA. mlx5 achieves this by splitting the
> PTP traffic to a separate queue pair, and configuring that qp to
> capture PHY/MAC stamps, AFAIU.

That's enough, thanks.

> > One question will be whether, in the case of "NIC-wide DMA timestamps",
> > DMA timestamps should be presented as hardware timestamps - struct
> > scm_timestamping[2] from CMSG_DATA() - or as their own thing, that user
> > space needs explicit support for - by parsing a new cmsg level/type.
> > If DMA timestamps won't look to user space like hardware timestamps,
> > then the use case is again out of scope for Köry's work, as far as I see
> > it.
> > 
> > Another simple question is - if NICs do this today - probably by giving
> > the "unrepresentable mix" to user space in an implicit, hardcoded and
> > very fine tuned way such that nobody bats an eye - then what is there
> > more to support? Are you looking at extra UAPI as a way to legitimize
> > hacks, or do you feel there is extra control that applications can gain?
> 
> I don't understand what you're asking me.

You've partially answered above. The mix of timestamps coming from the
PHY/MAC and those coming from the DMA is unrepresentable in today's
UAPI, and is just fine-tuned to work for the existing use case of "PTP
gets PHY/MAC, everything else gets DMA".

Still not 100% clear what would the proper UAPI (separate user-controllable
RX filters for PHY, MAC and DMA) gain, in addition to what exists in mlx5.

> DMA timestamping is becoming increasingly important. Ready any
> congestion control paper from the last 5 years and chances are
> it will be using delay as a signal. If we're extending uAPI
> for Hw stamping we should make sure to cater to CC use cases.

I'll stop commenting here for a while and go read some of those papers and
RFCs, in the hope that I find some info about the way in which TCP expects
hwtstamps from a NIC. It's quite evident to me that I don't have enough
information to help this discussion reach a conclusion.
  
Jakub Kicinski Nov. 20, 2023, 9:37 p.m. UTC | #19
On Mon, 20 Nov 2023 23:17:59 +0200 Vladimir Oltean wrote:
> Forget the concept of an active hwtstamp provider, just open up the
> knobs of _all_ possible hwtstamp providers for a NIC. Simultaneously!
> To make one active and all the others inactive, just use
> HWTSTAMP_FILTER_NONE/HWTSTAMP_TX_OFF for all except one, and the desired
> enum hwtstamp_rx_filters / enum hwtstamp_tx_types for the active one.
> Live with this expanded configuration model for a while, just restricted
> for a single active timestamping layer, and then, once user space is
> ready for an enhanced struct scm_timestamping which supports potentially
> multiple cmsgs with distinct hwtstamps, remove the restriction and let
> it all rip! Everybody gets their pony!
> 
> Additionally, SIOCSHWTSTAMP is kinda rusty, has a fixed binary format,
> and is not extensible to target a specific hwtstamp provider. So a
> netlink conversion of that, as a first step, would of course be great.
> 
> Is it an accurate summary?

Yes.

For now we can impose the requirement that only one can be active 
easily at the kernel level. But the uAPI should allow expressing more.

> You've partially answered above. The mix of timestamps coming from the
> PHY/MAC and those coming from the DMA is unrepresentable in today's
> UAPI, and is just fine-tuned to work for the existing use case of "PTP
> gets PHY/MAC, everything else gets DMA".
> 
> Still not 100% clear what would the proper UAPI (separate user-controllable
> RX filters for PHY, MAC and DMA) gain, in addition to what exists in mlx5.

Too late for mlx5 but I'm anticipating that more vendors will start
needing such configuration in the future. At which point it will be
good to have an API in place.
  
Jakub Kicinski Nov. 20, 2023, 9:45 p.m. UTC | #20
On Mon, 20 Nov 2023 21:58:58 +0200 Vladimir Oltean wrote:
> I'm still waiting for you to fully clarify the "per socket vs global"
> aspect, but independently of that, at least I understand why this is a
> counter-argument to my proposal. I need to tune it a bit (ASSUMING that
> we want DMA timestamps to "look like" hwtimestamps, and not like their
> own thing, to user space), because the PHC index would no longer fully
> identify a hwtstamp provider, so we need something more.
> 
> I imagine both ETHTOOL_MSG_TSINFO_GET and ETHTOOL_MSG_TSINFO_SET to
> support a new (nest) nlattr called ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER.
> 
> This would contain (u32) ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER_PHC_INDEX
> and (u32) ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER_QUALIFIER. It could be
> extensible in the future, but this is the baseline and forms the key.
> 
> The latter takes values from an:
> 
> enum ethtool_hwstamp_provider_qualifier {
> 	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_MAC,
> 	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_PHY,
> 	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_DMA,
> };

Sounds reasonable. Having more attributes than just PHC index works.
Given the lack of distinction between MAC and PHY for integrated NICs
I'd lean towards ditching the "layers" completely and exposing 
an "approximate" vs "precise" boolean. Approximate being the DMA point
for NICs, but more generically a point that is separated from the wire
by buffering or other variable length delay. Precise == IEEE 1588
quality.
  
Vladimir Oltean Nov. 20, 2023, 10:05 p.m. UTC | #21
On Mon, Nov 20, 2023 at 01:37:37PM -0800, Jakub Kicinski wrote:
> > Is it an accurate summary?
> 
> Yes.
> 
> For now we can impose the requirement that only one can be active 
> easily at the kernel level. But the uAPI should allow expressing more.

I see. That's quite something to think about for Köry. In its defense,
I also agree that this idea seems the most orthogonal to everything else
that we have or may want to add in the future, and is not likely to
become obsoleted by some other mechanism that can achieve the same
thing, but in a more flexible way. It's just that it's quite the task.

I sense it may be time to dust off and submit the rest of my
ndo_hwtstamp_get()/ ndo_hwtstamp_set() conversions before a netlink
conversion of SIOCGHWTSTAMP/SIOCSHWTSTAMP could even take place...
https://github.com/vladimiroltean/linux/commits/ndo-hwtstamp-v9
  
Köry Maincent Nov. 21, 2023, 5:31 p.m. UTC | #22
On Tue, 21 Nov 2023 00:05:49 +0200
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> On Mon, Nov 20, 2023 at 01:37:37PM -0800, Jakub Kicinski wrote:
> > > Is it an accurate summary?  
> > 
> > Yes.
> > 
> > For now we can impose the requirement that only one can be active 
> > easily at the kernel level. But the uAPI should allow expressing more.  
> 
> I see. That's quite something to think about for Köry. In its defense,
> I also agree that this idea seems the most orthogonal to everything else
> that we have or may want to add in the future, and is not likely to
> become obsoleted by some other mechanism that can achieve the same
> thing, but in a more flexible way. It's just that it's quite the task.
> 
> I sense it may be time to dust off and submit the rest of my
> ndo_hwtstamp_get()/ ndo_hwtstamp_set() conversions before a netlink
> conversion of SIOCGHWTSTAMP/SIOCSHWTSTAMP could even take place...
> https://github.com/vladimiroltean/linux/commits/ndo-hwtstamp-v9

Ok I kind of got an idea of what is your prerequisites.

If I summarize, a solution could be this:

- Expand struct hwtstamp_config with a phc_index member for the SIOCG/SHWTSTAMP
  commands.
  To keep backward compatibility if phc_index is not set in the hwtstamp_config
  data from userspace use the default hwtstamp (the default being selected as
  done in my patch series).
  Is this possible, would it breaks things?

- In netlink part, send one netlink tsinfo skb for each phc_index.

Could be done in a later patch series:
- Expand netlink TSINFO with ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER_QUALIFIER.
  Describing this struct:
enum ethtool_hwstamp_provider_qualifier {
 	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_PRECISE,
 	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_APPROX,
}; 

  Set the desired qualifier through TSINFO_SET or through SIOCSHWTSTAMP by
  expanding again the struct hwtstamp_config.

Do you think this is feasible?
I might miss some core stuff.

Regards,
  
Jakub Kicinski Nov. 21, 2023, 5:43 p.m. UTC | #23
On Tue, 21 Nov 2023 18:31:14 +0100 Köry Maincent wrote:
> - Expand struct hwtstamp_config with a phc_index member for the SIOCG/SHWTSTAMP
>   commands.
>   To keep backward compatibility if phc_index is not set in the hwtstamp_config
>   data from userspace use the default hwtstamp (the default being selected as
>   done in my patch series).
>   Is this possible, would it breaks things?

I'd skip this bit, and focus on the ETHTOOL_TSINFO. Keep the ioctl as
"legacy" and do all the extensions in ethtool. TSINFO_GET can serve
as GET, to avoid adding 3rd command for the same thing. TSINFO_SET
would be new (as you indicate below).

> - In netlink part, send one netlink tsinfo skb for each phc_index.

phc_index and netdev combination. A DO command can only generate one
answer (or rather, it should generate only one answer, there are few
hard rules in netlink). So we need to move that functionality to DUMP.
We can filter the DUMP based on user-provided ifindex and/or phc_index.

> Could be done in a later patch series:
> - Expand netlink TSINFO with ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER_QUALIFIER.
>   Describing this struct:
> enum ethtool_hwstamp_provider_qualifier {
>  	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_PRECISE,
>  	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_APPROX,
> }; 
> 
>   Set the desired qualifier through TSINFO_SET or through SIOCSHWTSTAMP by
>   expanding again the struct hwtstamp_config.
> 
> Do you think this is feasible?
  
Richard Cochran Nov. 21, 2023, 11:40 p.m. UTC | #24
On Tue, Nov 21, 2023 at 06:31:14PM +0100, Köry Maincent wrote:
> If I summarize, a solution could be this:
> 
> - Expand struct hwtstamp_config with a phc_index member for the SIOCG/SHWTSTAMP
>   commands.
>   To keep backward compatibility if phc_index is not set in the hwtstamp_config
>   data from userspace use the default hwtstamp (the default being selected as
>   done in my patch series).
>   Is this possible, would it breaks things?

You can't "expand" hwtstamp_config because it is an established ABI.

(You could introduce SIOC[GS]HWTSTAMP_2 with an expanded layout)

Thanks,
Richard
  
Köry Maincent Nov. 22, 2023, 1:44 p.m. UTC | #25
On Tue, 21 Nov 2023 09:43:54 -0800
Jakub Kicinski <kuba@kernel.org> wrote:

> On Tue, 21 Nov 2023 18:31:14 +0100 Köry Maincent wrote:
> > - Expand struct hwtstamp_config with a phc_index member for the
> > SIOCG/SHWTSTAMP commands.
> >   To keep backward compatibility if phc_index is not set in the
> > hwtstamp_config data from userspace use the default hwtstamp (the default
> > being selected as done in my patch series).
> >   Is this possible, would it breaks things?  
> 
> I'd skip this bit, and focus on the ETHTOOL_TSINFO. Keep the ioctl as
> "legacy" and do all the extensions in ethtool. TSINFO_GET can serve
> as GET, to avoid adding 3rd command for the same thing. TSINFO_SET
> would be new (as you indicate below).

You say this patch series should simply add TSINFO_SET command to set the
current phc_index?

It won't solve your requirement of having simultaneous hwtimestamp and
enabling/disabling them through rx_filter and tx_types.
You want to do this in another patch series alongside a new SIOCG/SHWTSTAMP_2
ABI?

> > - In netlink part, send one netlink tsinfo skb for each phc_index.  
> 
> phc_index and netdev combination. A DO command can only generate one
> answer (or rather, it should generate only one answer, there are few
> hard rules in netlink). So we need to move that functionality to DUMP.
> We can filter the DUMP based on user-provided ifindex and/or phc_index.

Currently, the dumpit function is assigned to ethnl_default_dumpit. Wouldn't
the behavior change of the dumpit callback break the ABI?


> > Could be done in a later patch series:
> > - Expand netlink TSINFO with ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER_QUALIFIER.
> >   Describing this struct:
> > enum ethtool_hwstamp_provider_qualifier {
> >  	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_PRECISE,
> >  	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_APPROX,
> > }; 
> > 
> >   Set the desired qualifier through TSINFO_SET or through SIOCSHWTSTAMP by
> >   expanding again the struct hwtstamp_config.

Just wondering to have a insight of future support, in the case of several
provider qualifier and the SIOCG/SHWTSTAMP_2 layout containing the phc_index.
Will we be able to talk to the two providers qualifiers simultaneously or is it
not possible. To know if the SIOCG/SHWTSTAMP_2 layout would contain the
description of the qualifier provider.
If I understand well your mail in the thread it will be the case right?

Regards,
  
Vladimir Oltean Nov. 22, 2023, 2:08 p.m. UTC | #26
On Wed, Nov 22, 2023 at 02:44:53PM +0100, Köry Maincent wrote:
> On Tue, 21 Nov 2023 09:43:54 -0800
> Jakub Kicinski <kuba@kernel.org> wrote:
> 
> > On Tue, 21 Nov 2023 18:31:14 +0100 Köry Maincent wrote:
> > > - Expand struct hwtstamp_config with a phc_index member for the
> > > SIOCG/SHWTSTAMP commands.
> > >   To keep backward compatibility if phc_index is not set in the
> > > hwtstamp_config data from userspace use the default hwtstamp (the default
> > > being selected as done in my patch series).
> > >   Is this possible, would it breaks things?  
> > 
> > I'd skip this bit, and focus on the ETHTOOL_TSINFO. Keep the ioctl as
> > "legacy" and do all the extensions in ethtool. TSINFO_GET can serve
> > as GET, to avoid adding 3rd command for the same thing. TSINFO_SET
> > would be new (as you indicate below).
> 
> You say this patch series should simply add TSINFO_SET command to set the
> current phc_index?
> 
> It won't solve your requirement of having simultaneous hwtimestamp and
> enabling/disabling them through rx_filter and tx_types.
> You want to do this in another patch series alongside a new SIOCG/SHWTSTAMP_2
> ABI?
> 
> > > - In netlink part, send one netlink tsinfo skb for each phc_index.  
> > 
> > phc_index and netdev combination. A DO command can only generate one
> > answer (or rather, it should generate only one answer, there are few
> > hard rules in netlink). So we need to move that functionality to DUMP.
> > We can filter the DUMP based on user-provided ifindex and/or phc_index.
> 
> Currently, the dumpit function is assigned to ethnl_default_dumpit. Wouldn't
> the behavior change of the dumpit callback break the ABI?
> 
> 
> > > Could be done in a later patch series:
> > > - Expand netlink TSINFO with ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER_QUALIFIER.
> > >   Describing this struct:
> > > enum ethtool_hwstamp_provider_qualifier {
> > >  	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_PRECISE,
> > >  	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_APPROX,
> > > }; 
> > > 
> > >   Set the desired qualifier through TSINFO_SET or through SIOCSHWTSTAMP by
> > >   expanding again the struct hwtstamp_config.
> 
> Just wondering to have a insight of future support, in the case of several
> provider qualifier and the SIOCG/SHWTSTAMP_2 layout containing the phc_index.
> Will we be able to talk to the two providers qualifiers simultaneously or is it
> not possible. To know if the SIOCG/SHWTSTAMP_2 layout would contain the
> description of the qualifier provider.
> If I understand well your mail in the thread it will be the case right?
> 
> Regards,
> -- 
> Köry Maincent, Bootlin
> Embedded Linux and kernel engineering
> https://bootlin.com/

My understanding of Jakub's email was that he wants to see the functionality
offered by SIOCGHWTSTAMP and SIOCSHWTSTAMP converted to netlink. I don't
think that ethtool is the correct netlink family for that, given that
these aren't ethtool ioctls to begin with. Maybe the new netdev netlink
family. The conversion in its basic form would offer exactly the same
functionality. The extended netlink messages would have extra attributes
to identify the targeted hwtstamp provider. In the lack of those
attributes, the default hwtstamp provider is targeted. The definition of
the default hwtstamp provider should be as per your current patch set
(netdev, with a whitelist for current phylib PHYs).

The _listing_ of hwtstamp providers is what could be done through ethtool
netlink, similar but not identical to the way in which you are proposing
today (you are presenting blanket "layers" which correspond to netdev and
phylib, rather than individual providers).

The concept of an "active phc_index" would not explicitly exist in the
UAPI. Thus I'm not sure what's with this TSINFO_SET being floated around.
The only thing would exist is a configurable rx_filter and tx_type per
hwtstamp provider (aka "{phc_index, qualifier}"). User space will have
to learn to select the hwtstamp provider it wants to configure through
netlink, and use for its class of traffic.

This is why I mentioned by ndo_hwtstamp_set() conversion, because
suddenly it is a prerequisite for any further progress to be done.
You can't convert SIOCSHWTSTAMP to netlink if there are some driver
implementations which still use ndo_eth_ioctl(). They need to be
UAPI-agnostic.

I'm not sure what's with Richard's mention of the "_2" variants of the
ioctls. Probably a low-effort suggestion which was a bit out of context.
His main point, that you cannot extend struct hwtstamp_config as that
has a fixed binary format, is perfectly valid though. This is why
netlink is preferable, because if done correctly (meaning not with
NLA_BINARY attributes), then it is much more extensible because all
attributes are TLVs. Use NLA_BINARY, and you will run into the exact
extensibility issues that the ioctl interface has.
  
Vladimir Oltean Nov. 22, 2023, 2:19 p.m. UTC | #27
On Wed, Nov 22, 2023 at 04:08:50PM +0200, Vladimir Oltean wrote:
> The concept of an "active phc_index" would not explicitly exist in the
> UAPI. Thus I'm not sure what's with this TSINFO_SET being floated around.
> The only thing would exist is a configurable rx_filter and tx_type per
> hwtstamp provider (aka "{phc_index, qualifier}"). User space will have
> to learn to select the hwtstamp provider it wants to configure through
> netlink, and use for its class of traffic.

One clarification: the extended timestamping filters are PER NETDEV
(in addition to being per one of the hwtstamp providers listed for that
netdev). This was understated from the fact that the netlink interface
itself targets a netdev, but I didn't say it explicitly.
  
Vladimir Oltean Nov. 22, 2023, 2:36 p.m. UTC | #28
On Wed, Nov 22, 2023 at 04:08:50PM +0200, Vladimir Oltean wrote:
> The concept of an "active phc_index" would not explicitly exist in the
> UAPI. Thus I'm not sure what's with this TSINFO_SET being floated around.
> The only thing would exist is a configurable rx_filter and tx_type per
> hwtstamp provider (aka "{phc_index, qualifier}"). User space will have
> to learn to select the hwtstamp provider it wants to configure through
> netlink, and use for its class of traffic.

@Jakub, for your long-term "MAC timestamps for PTP, DMA for everything else".
How do you see this? I guess we need some sort of priority function in
the UAPI between hwtstamp providers.

And even with that, I think the enums that we currently have for filters
are not specific enough. The most we could expose is:

                      MAC provider                      DMA provider

hwtstamp_rx_filters   HWTSTAMP_FILTER_PTP_V2_EVENT      HWTSTAMP_FILTER_ALL
tx_type               HWTSTAMP_TX_ON                    HWTSTAMP_TX_ON

but it isn't clear: for PTP, does the DMA provider give you an RX
timestamp too? What about a TX timestamp?
  
Köry Maincent Nov. 22, 2023, 2:57 p.m. UTC | #29
On Wed, 22 Nov 2023 16:08:50 +0200
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> On Wed, Nov 22, 2023 at 02:44:53PM +0100, Köry Maincent wrote:
> > On Tue, 21 Nov 2023 09:43:54 -0800
> > Jakub Kicinski <kuba@kernel.org> wrote:
> >   
> > > On Tue, 21 Nov 2023 18:31:14 +0100 Köry Maincent wrote:  
> > > > - Expand struct hwtstamp_config with a phc_index member for the
> > > > SIOCG/SHWTSTAMP commands.
> > > >   To keep backward compatibility if phc_index is not set in the
> > > > hwtstamp_config data from userspace use the default hwtstamp (the
> > > > default being selected as done in my patch series).
> > > >   Is this possible, would it breaks things?    
> > > 
> > > I'd skip this bit, and focus on the ETHTOOL_TSINFO. Keep the ioctl as
> > > "legacy" and do all the extensions in ethtool. TSINFO_GET can serve
> > > as GET, to avoid adding 3rd command for the same thing. TSINFO_SET
> > > would be new (as you indicate below).  
> > 
> > You say this patch series should simply add TSINFO_SET command to set the
> > current phc_index?
> > 
> > It won't solve your requirement of having simultaneous hwtimestamp and
> > enabling/disabling them through rx_filter and tx_types.
> > You want to do this in another patch series alongside a new
> > SIOCG/SHWTSTAMP_2 ABI?
> >   
> > > > - In netlink part, send one netlink tsinfo skb for each phc_index.    
> > > 
> > > phc_index and netdev combination. A DO command can only generate one
> > > answer (or rather, it should generate only one answer, there are few
> > > hard rules in netlink). So we need to move that functionality to DUMP.
> > > We can filter the DUMP based on user-provided ifindex and/or phc_index.  
> > 
> > Currently, the dumpit function is assigned to ethnl_default_dumpit. Wouldn't
> > the behavior change of the dumpit callback break the ABI?
> > 
> >   
> > > > Could be done in a later patch series:
> > > > - Expand netlink TSINFO with
> > > > ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER_QUALIFIER. Describing this struct:
> > > > enum ethtool_hwstamp_provider_qualifier {
> > > >  	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_PRECISE,
> > > >  	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_APPROX,
> > > > }; 
> > > > 
> > > >   Set the desired qualifier through TSINFO_SET or through SIOCSHWTSTAMP
> > > > by expanding again the struct hwtstamp_config.  
> > 
> > Just wondering to have a insight of future support, in the case of several
> > provider qualifier and the SIOCG/SHWTSTAMP_2 layout containing the
> > phc_index. Will we be able to talk to the two providers qualifiers
> > simultaneously or is it not possible. To know if the SIOCG/SHWTSTAMP_2
> > layout would contain the description of the qualifier provider.
> > If I understand well your mail in the thread it will be the case right?

> My understanding of Jakub's email was that he wants to see the functionality
> offered by SIOCGHWTSTAMP and SIOCSHWTSTAMP converted to netlink. I don't
> think that ethtool is the correct netlink family for that, given that
> these aren't ethtool ioctls to begin with. Maybe the new netdev netlink
> family. The conversion in its basic form would offer exactly the same
> functionality. The extended netlink messages would have extra attributes
> to identify the targeted hwtstamp provider. In the lack of those
> attributes, the default hwtstamp provider is targeted. The definition of
> the default hwtstamp provider should be as per your current patch set
> (netdev, with a whitelist for current phylib PHYs).
> 
> The _listing_ of hwtstamp providers is what could be done through ethtool
> netlink, similar but not identical to the way in which you are proposing
> today (you are presenting blanket "layers" which correspond to netdev and
> phylib, rather than individual providers).
> 
> The concept of an "active phc_index" would not explicitly exist in the
> UAPI. Thus I'm not sure what's with this TSINFO_SET being floated around.
> The only thing would exist is a configurable rx_filter and tx_type per
> hwtstamp provider (aka "{phc_index, qualifier}"). User space will have
> to learn to select the hwtstamp provider it wants to configure through
> netlink, and use for its class of traffic.
> 
> This is why I mentioned by ndo_hwtstamp_set() conversion, because
> suddenly it is a prerequisite for any further progress to be done.
> You can't convert SIOCSHWTSTAMP to netlink if there are some driver
> implementations which still use ndo_eth_ioctl(). They need to be
> UAPI-agnostic.
> 
> I'm not sure what's with Richard's mention of the "_2" variants of the
> ioctls. Probably a low-effort suggestion which was a bit out of context.
> His main point, that you cannot extend struct hwtstamp_config as that
> has a fixed binary format, is perfectly valid though. This is why
> netlink is preferable, because if done correctly (meaning not with
> NLA_BINARY attributes), then it is much more extensible because all
> attributes are TLVs. Use NLA_BINARY, and you will run into the exact
> extensibility issues that the ioctl interface has.

I appreciate the clarification, it now makes more sense to me.

Regards,
  
Jakub Kicinski Nov. 22, 2023, 4:50 p.m. UTC | #30
On Wed, 22 Nov 2023 16:08:50 +0200 Vladimir Oltean wrote:
> My understanding of Jakub's email was that he wants to see the functionality
> offered by SIOCGHWTSTAMP and SIOCSHWTSTAMP converted to netlink. I don't
> think that ethtool is the correct netlink family for that, given that
> these aren't ethtool ioctls to begin with. Maybe the new netdev netlink
> family. The conversion in its basic form would offer exactly the same
> functionality.

Well, ethtool has been the catch all for a lot of random things
for the longest time. The question is whether we want to extend
ETHTOOL_GET_TS_INFO or add a third API somewhere else. And if we
do - do we also duplicate the functionality of ETHTOOL_GET_TS_INFO
(i.e. getting capabilities)?

My vote is that keeping it in ethtool is less bad than 3rd API.

> The _listing_ of hwtstamp providers is what could be done through ethtool
> netlink, similar but not identical to the way in which you are proposing
> today (you are presenting blanket "layers" which correspond to netdev and
> phylib, rather than individual providers).
> 
> The concept of an "active phc_index" would not explicitly exist in the
> UAPI. Thus I'm not sure what's with this TSINFO_SET being floated around.
> The only thing would exist is a configurable rx_filter and tx_type per
> hwtstamp provider (aka "{phc_index, qualifier}"). User space will have
> to learn to select the hwtstamp provider it wants to configure through
> netlink, and use for its class of traffic.

"Active provider" is the one that has TX_ON, rx != FILTER_NONE, right?

> This is why I mentioned by ndo_hwtstamp_set() conversion, because
> suddenly it is a prerequisite for any further progress to be done.
> You can't convert SIOCSHWTSTAMP to netlink if there are some driver
> implementations which still use ndo_eth_ioctl(). They need to be
> UAPI-agnostic.

Right, definitely.

> I'm not sure what's with Richard's mention of the "_2" variants of the
> ioctls. Probably a low-effort suggestion which was a bit out of context.
> His main point, that you cannot extend struct hwtstamp_config as that
> has a fixed binary format, is perfectly valid though. This is why
> netlink is preferable, because if done correctly (meaning not with
> NLA_BINARY attributes), then it is much more extensible because all
> attributes are TLVs. Use NLA_BINARY, and you will run into the exact
> extensibility issues that the ioctl interface has.
  
Jakub Kicinski Nov. 22, 2023, 4:54 p.m. UTC | #31
On Wed, 22 Nov 2023 16:36:18 +0200 Vladimir Oltean wrote:
> @Jakub, for your long-term "MAC timestamps for PTP, DMA for everything else".
> How do you see this? I guess we need some sort of priority function in
> the UAPI between hwtstamp providers.
> 
> And even with that, I think the enums that we currently have for filters
> are not specific enough. The most we could expose is:
> 
>                       MAC provider                      DMA provider
> 
> hwtstamp_rx_filters   HWTSTAMP_FILTER_PTP_V2_EVENT      HWTSTAMP_FILTER_ALL
> tx_type               HWTSTAMP_TX_ON                    HWTSTAMP_TX_ON
> 
> but it isn't clear: for PTP, does the DMA provider give you an RX
> timestamp too?

If we phrase it as "precise / approximate" rather than "MAC / DMA" - it
seems fairly intuitive to give the best timestamp available for a given
packet, no?

> What about a TX timestamp?

I was thinking - socket flag to make packets for a given socket request
precise timestamps.
  
Vladimir Oltean Nov. 22, 2023, 4:55 p.m. UTC | #32
On Wed, Nov 22, 2023 at 08:50:00AM -0800, Jakub Kicinski wrote:
> On Wed, 22 Nov 2023 16:08:50 +0200 Vladimir Oltean wrote:
> > My understanding of Jakub's email was that he wants to see the functionality
> > offered by SIOCGHWTSTAMP and SIOCSHWTSTAMP converted to netlink. I don't
> > think that ethtool is the correct netlink family for that, given that
> > these aren't ethtool ioctls to begin with. Maybe the new netdev netlink
> > family. The conversion in its basic form would offer exactly the same
> > functionality.
> 
> Well, ethtool has been the catch all for a lot of random things
> for the longest time. The question is whether we want to extend
> ETHTOOL_GET_TS_INFO or add a third API somewhere else. And if we
> do - do we also duplicate the functionality of ETHTOOL_GET_TS_INFO
> (i.e. getting capabilities)?
> 
> My vote is that keeping it in ethtool is less bad than 3rd API.

With SIOCSHWTSTAMP also implemented by CAN (and presumably also by
wireless in the future), I do wonder whether ethtool is the right place
for the netlink conversion.

I wouldn't suggest duplicating ETHTOOL_GET_TS_INFO towards the netdev
netlink family.

> > The concept of an "active phc_index" would not explicitly exist in the
> > UAPI. Thus I'm not sure what's with this TSINFO_SET being floated around.
> > The only thing would exist is a configurable rx_filter and tx_type per
> > hwtstamp provider (aka "{phc_index, qualifier}"). User space will have
> > to learn to select the hwtstamp provider it wants to configure through
> > netlink, and use for its class of traffic.
> 
> "Active provider" is the one that has TX_ON, rx != FILTER_NONE, right?

In the "implicit" definition of an "active hwtstamp provider", yes.
  
Vladimir Oltean Nov. 22, 2023, 4:59 p.m. UTC | #33
On Wed, Nov 22, 2023 at 08:54:59AM -0800, Jakub Kicinski wrote:
> On Wed, 22 Nov 2023 16:36:18 +0200 Vladimir Oltean wrote:
> > @Jakub, for your long-term "MAC timestamps for PTP, DMA for everything else".
> > How do you see this? I guess we need some sort of priority function in
> > the UAPI between hwtstamp providers.
> > 
> > And even with that, I think the enums that we currently have for filters
> > are not specific enough. The most we could expose is:
> > 
> >                       MAC provider                      DMA provider
> > 
> > hwtstamp_rx_filters   HWTSTAMP_FILTER_PTP_V2_EVENT      HWTSTAMP_FILTER_ALL
> > tx_type               HWTSTAMP_TX_ON                    HWTSTAMP_TX_ON
> > 
> > but it isn't clear: for PTP, does the DMA provider give you an RX
> > timestamp too?
> 
> If we phrase it as "precise / approximate" rather than "MAC / DMA" - it
> seems fairly intuitive to give the best timestamp available for a given
> packet, no?

I wouldn't be so sure. The alternative interpretation "for PTP, give me
timestamps from both sources" also sounds reasonable for the distant
future where that will be possible (with proper cmsg identification).
But I don't see how to distinguish the two - the filters, expressed in
these terms, would be the same.

> > What about a TX timestamp?
> 
> I was thinking - socket flag to make packets for a given socket request
> precise timestamps.

So the ptp4l source code would have to be modified to still work with
the same precision as before? I'm not seeing this through.
  
Jakub Kicinski Nov. 22, 2023, 5:55 p.m. UTC | #34
On Wed, 22 Nov 2023 18:59:55 +0200 Vladimir Oltean wrote:
> I wouldn't be so sure. The alternative interpretation "for PTP, give me
> timestamps from both sources" also sounds reasonable for the distant
> future where that will be possible (with proper cmsg identification).
> But I don't see how to distinguish the two - the filters, expressed in
> these terms, would be the same.

We can add an attribute that explicitly says that the configuration
is only requesting one stamp. But feels like jumping the gun at this
stage, given we have no other option to express there.

> So the ptp4l source code would have to be modified to still work with
> the same precision as before? I'm not seeing this through.

We can do the opposite and add a socket flag which says "DMA is okay".
  
Jakub Kicinski Nov. 22, 2023, 6:01 p.m. UTC | #35
On Wed, 22 Nov 2023 18:55:17 +0200 Vladimir Oltean wrote:
> > Well, ethtool has been the catch all for a lot of random things
> > for the longest time. The question is whether we want to extend
> > ETHTOOL_GET_TS_INFO or add a third API somewhere else. And if we
> > do - do we also duplicate the functionality of ETHTOOL_GET_TS_INFO
> > (i.e. getting capabilities)?
> > 
> > My vote is that keeping it in ethtool is less bad than 3rd API.  
> 
> With SIOCSHWTSTAMP also implemented by CAN (and presumably also by
> wireless in the future), I do wonder whether ethtool is the right place
> for the netlink conversion.

ethtool currently provides the only way we have to configure ring
length, ring count, RSS, UDP tunnels etc.

It's a matter of taste, IMO ethtool is a bit of a lost cause already
and keeping things together (ethtool already has TS_INFO) is cleaner
than spreading them around.

> I wouldn't suggest duplicating ETHTOOL_GET_TS_INFO towards the netdev
> netlink family.

FTR so far the netdev family is all about SW configuration. We should
probably keep it that way, so it doesn't become ginormous. It's easy
enough to create a new family, if needed.
  
Willem de Bruijn Nov. 22, 2023, 6:11 p.m. UTC | #36
On Wed, Nov 22, 2023 at 12:55 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 22 Nov 2023 18:59:55 +0200 Vladimir Oltean wrote:
> > I wouldn't be so sure. The alternative interpretation "for PTP, give me
> > timestamps from both sources" also sounds reasonable for the distant
> > future where that will be possible (with proper cmsg identification).
> > But I don't see how to distinguish the two - the filters, expressed in
> > these terms, would be the same.
>
> We can add an attribute that explicitly says that the configuration
> is only requesting one stamp. But feels like jumping the gun at this
> stage, given we have no other option to express there.
>
> > So the ptp4l source code would have to be modified to still work with
> > the same precision as before? I'm not seeing this through.
>
> We can do the opposite and add a socket flag which says "DMA is okay".

There already is a disconnect between configuring hardware timestamp
generation. Through the ioctl, which is a global admin-only interface.
And requesting timestamps with SO_TIMESTAMPING.

Today the user of ptp4l already has to know that the admin has
configured the right RX and TX filters. That is no different if
multiple filters can be installed? (PHY for PTP, DMA for everything
else).

If attribution becomes important, we could add another cmsg alongside
the timestamp. On TX this already happens with
IP_RECVERR/IPV6_RECVERR/PACKET_TX_TIMESTAMP. Maybe the
sock_extended_err struct even still has a field that can be (ab)used
for this purpose.

Being able to pass multiple timestamps up to userspace eventually will
be interesting. A large blocker is where to store these values in the
sk_buff on the path between the driver and the socket (skb_ext?). At
Google we already have this scenario, where the local TCP stack and
userspace both want converted hardware timestamps -- but converted
from raw to different timebases.
  
Köry Maincent Nov. 23, 2023, 3 p.m. UTC | #37
On Wed, 22 Nov 2023 10:01:42 -0800
Jakub Kicinski <kuba@kernel.org> wrote:

> On Wed, 22 Nov 2023 18:55:17 +0200 Vladimir Oltean wrote:
> > > Well, ethtool has been the catch all for a lot of random things
> > > for the longest time. The question is whether we want to extend
> > > ETHTOOL_GET_TS_INFO or add a third API somewhere else. And if we
> > > do - do we also duplicate the functionality of ETHTOOL_GET_TS_INFO
> > > (i.e. getting capabilities)?
> > > 
> > > My vote is that keeping it in ethtool is less bad than 3rd API.    
> > 
> > With SIOCSHWTSTAMP also implemented by CAN (and presumably also by
> > wireless in the future), I do wonder whether ethtool is the right place
> > for the netlink conversion.  
> 
> ethtool currently provides the only way we have to configure ring
> length, ring count, RSS, UDP tunnels etc.
> 
> It's a matter of taste, IMO ethtool is a bit of a lost cause already
> and keeping things together (ethtool already has TS_INFO) is cleaner
> than spreading them around.
> 
> > I wouldn't suggest duplicating ETHTOOL_GET_TS_INFO towards the netdev
> > netlink family.  
> 
> FTR so far the netdev family is all about SW configuration. We should
> probably keep it that way, so it doesn't become ginormous. It's easy
> enough to create a new family, if needed.

So, do we have a consensus? Vlad, do you agree on putting all under ethtool?

ETHTOOL_GET_TS_INFO will be in charge of replacing the SIOCGHWSTAMP
implementation. Need to add ETHTOOL_A_TSINFO_PHC_INDEX
ETHTOOL_A_TSINFO_QUALIFIER to the request.

ETHTOOL_GET_TS_INFO will list all the hwtstamp provider (aka "{phc_index,
qualifier}") through the dumpit callback. I will add a filter to be able to
list only the hwtstamp provider of one netdev.

ETHTOOL_SET_TS_INFO will be in charge of replacing the SIOCSHWSTAMP
implementation.

Is that ok for you?

Regards,
  
Jakub Kicinski Nov. 23, 2023, 5:32 p.m. UTC | #38
On Thu, 23 Nov 2023 16:00:56 +0100 Köry Maincent wrote:
> > FTR so far the netdev family is all about SW configuration. We should
> > probably keep it that way, so it doesn't become ginormous. It's easy
> > enough to create a new family, if needed.  
> 
> So, do we have a consensus? Vlad, do you agree on putting all under ethtool?

If not we can do a vote/poll? Maybe others don't find the configuration
of timestamping as confusing as me.
  
Vladimir Oltean Nov. 24, 2023, 3:43 p.m. UTC | #39
On Thu, Nov 23, 2023 at 09:32:05AM -0800, Jakub Kicinski wrote:
> On Thu, Nov 23, 2023 at 04:00:56PM +0100, Köry Maincent wrote:
> > So, do we have a consensus? Vlad, do you agree on putting all under ethtool?
> > 
> > ETHTOOL_GET_TS_INFO will be in charge of replacing the SIOCGHWSTAMP
> > implementation. Need to add ETHTOOL_A_TSINFO_PHC_INDEX
> > ETHTOOL_A_TSINFO_QUALIFIER to the request.
> > 
> > ETHTOOL_GET_TS_INFO will list all the hwtstamp provider (aka "{phc_index,
> > qualifier}") through the dumpit callback. I will add a filter to be able to
> > list only the hwtstamp provider of one netdev.
> > 
> > ETHTOOL_SET_TS_INFO will be in charge of replacing the SIOCSHWSTAMP
> > implementation.
> 
> If not we can do a vote/poll? Maybe others don't find the configuration
> of timestamping as confusing as me.

If you mean the ETHTOOL_MSG_TSINFO_GET netlink message (ETHTOOL_GET_TS_INFO
is an ioctl), you're saying that you want to move the entire contents of
SIOCGHWSTAMP there, by making the kernel call ndo_hwtstamp_get() in
addition to the existing __ethtool_get_ts_info()?

Yeah, I don't know, I don't have a real objection, I guess it's fine.

What will be a bit of an "?!" moment for users is when ethtool gains
support for the SIOCGHWSTAMP/SIOCSHWSTAMP netlink replacements, but not
for the original ioctls. So hwstamp_ctl will be able to change timestamping
configuration, but ethtool wouldn't - all on the same system. Unless
ethtool gains an ioctl fallback for a ioctl that was never down its alley.

But by all means, still hold a poll if you want to. I would vote for
ethtool netlink, not because it's great, just because I don't have a
better alternative to propose.
  
Vladimir Oltean Nov. 24, 2023, 5:27 p.m. UTC | #40
Hi Willem,

On Wed, Nov 22, 2023 at 01:11:02PM -0500, Willem de Bruijn wrote:
> There already is a disconnect between configuring hardware timestamp
> generation. Through the ioctl, which is a global admin-only interface.
> And requesting timestamps with SO_TIMESTAMPING.
> 
> Today the user of ptp4l already has to know that the admin has
> configured the right RX and TX filters. That is no different if
> multiple filters can be installed? (PHY for PTP, DMA for everything
> else).

Are you saying that ptp4l doesn't configure the RX and TX filters by
itself, just the admin had to do that? Because it does.
https://github.com/richardcochran/linuxptp/blob/master/sk.c#L59

I'm not seeing the disconnect. SO_TIMESTAMPING is for the socket,
SIOCSHWTSTAMP is for the configuration at the device level.

It _is_ different if multiple filters can be installed, because either
we let things be (and ptp4l issues the same ioctl which affects the
default hwtstamp provider, which may or may not coincide with what we
intend), or we teach ptp4l to deal with the multitude of providers that
a port may have.
  
Köry Maincent Nov. 24, 2023, 5:34 p.m. UTC | #41
On Fri, 24 Nov 2023 17:43:43 +0200
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> On Thu, Nov 23, 2023 at 09:32:05AM -0800, Jakub Kicinski wrote:
> > On Thu, Nov 23, 2023 at 04:00:56PM +0100, Köry Maincent wrote:  
> > > So, do we have a consensus? Vlad, do you agree on putting all under
> > > ethtool?
> > > 
> > > ETHTOOL_GET_TS_INFO will be in charge of replacing the SIOCGHWSTAMP
> > > implementation. Need to add ETHTOOL_A_TSINFO_PHC_INDEX
> > > ETHTOOL_A_TSINFO_QUALIFIER to the request.
> > > 
> > > ETHTOOL_GET_TS_INFO will list all the hwtstamp provider (aka "{phc_index,
> > > qualifier}") through the dumpit callback. I will add a filter to be able
> > > to list only the hwtstamp provider of one netdev.
> > > 
> > > ETHTOOL_SET_TS_INFO will be in charge of replacing the SIOCSHWSTAMP
> > > implementation.  
> > 
> > If not we can do a vote/poll? Maybe others don't find the configuration
> > of timestamping as confusing as me.  
> 
> If you mean the ETHTOOL_MSG_TSINFO_GET netlink message (ETHTOOL_GET_TS_INFO
> is an ioctl), you're saying that you want to move the entire contents of
> SIOCGHWSTAMP there, by making the kernel call ndo_hwtstamp_get() in
> addition to the existing __ethtool_get_ts_info()?

Yes.
 
> Yeah, I don't know, I don't have a real objection, I guess it's fine.
> 
> What will be a bit of an "?!" moment for users is when ethtool gains
> support for the SIOCGHWSTAMP/SIOCSHWSTAMP netlink replacements, but not
> for the original ioctls. So hwstamp_ctl will be able to change timestamping
> configuration, but ethtool wouldn't - all on the same system. Unless
> ethtool gains an ioctl fallback for a ioctl that was never down its alley.

Yes indeed. Would it break things if both ioctls and netlink can get and set
the hwtstamps configuration? It is only configuration. Both happen under
rtnl_lock it should be alright.

The question is which hwtstamp provider will the original ioctls be able to
change? Maybe the default one (MAC with phy whitelist) and only this one.

> But by all means, still hold a poll if you want to. I would vote for
> ethtool netlink, not because it's great, just because I don't have a
> better alternative to propose.

If you agree on that choice, let's go. Jakub and your are the most proactive
reviewers in this patch series. Willem you are the timestamping maintainer do
you also agree on this? 
If anyone have another proposition let them speak now, or forever remain
silent! ;)

Regards,
  
Willem de Bruijn Nov. 24, 2023, 7:45 p.m. UTC | #42
On Fri, Nov 24, 2023 at 12:28 PM Vladimir Oltean
<vladimir.oltean@nxp.com> wrote:
>
> Hi Willem,
>
> On Wed, Nov 22, 2023 at 01:11:02PM -0500, Willem de Bruijn wrote:
> > There already is a disconnect between configuring hardware timestamp
> > generation. Through the ioctl, which is a global admin-only interface.
> > And requesting timestamps with SO_TIMESTAMPING.
> >
> > Today the user of ptp4l already has to know that the admin has
> > configured the right RX and TX filters. That is no different if
> > multiple filters can be installed? (PHY for PTP, DMA for everything
> > else).
>
> Are you saying that ptp4l doesn't configure the RX and TX filters by
> itself, just the admin had to do that? Because it does.
> https://github.com/richardcochran/linuxptp/blob/master/sk.c#L59
>
> I'm not seeing the disconnect. SO_TIMESTAMPING is for the socket,
> SIOCSHWTSTAMP is for the configuration at the device level.
>
> It _is_ different if multiple filters can be installed, because either
> we let things be (and ptp4l issues the same ioctl which affects the
> default hwtstamp provider, which may or may not coincide with what we
> intend), or we teach ptp4l to deal with the multitude of providers that
> a port may have.

I see. By disconnect, I meant that the socket option is unprivileged and
can be set by many processes, while the ioctl is a global privileged
setting, so must be under control of a single admin.

But I did not know that ptp4l can take on both those roles for PTP.

Perhaps multiple SIOCSHWTSTAMP rules can coexist, up to
one per level:

HWTSTAMP_FILTER_PTP_V2_EVENT, level=PHY
HWTSTAMP_FILTER_ALL, level, level=DMA

Then ptp4l can manage all levels except the DMA level. And DMA
timestamps can be configured independently by another admin.

If only one timestamp can be communicated to the host, the earliest
match must takes precedence. Jakub pointed out how one device
handles this by having a separate queue for PHY  timestamped
packets.

This does not address the issue that packets with different
precision skb_shinfo(skb)->hwtstamps->hwtstamp may now exist
in the system. All packets reaching ptp4l sockets must have a high
resolution source, but there is no explicit annotation to ensure or
check this. This is fully based on trusting the HWSTAMP_FILTER.
Expanding the skb infra and cmsg might be follow-on work.
  
Vladimir Oltean Nov. 24, 2023, 7:57 p.m. UTC | #43
On Fri, Nov 24, 2023 at 06:34:31PM +0100, Köry Maincent wrote:
> Would it break things if both ioctls and netlink can get and set the
> hwtstamps configuration?

Uhm, obviously? It would break things if ioctl and netlink were _not_
freely interchangeable, and you couldn't see in a ioctl GET what got set
through a netlink SET.

> It is only configuration. Both happen under rtnl_lock it should be
> alright.

Yeah, but you always need to keep the API interchangeability in mind
during the implementation.

> The question is which hwtstamp provider will the original ioctls be able to
> change? Maybe the default one (MAC with phy whitelist) and only this one.

TL;DR: yeah.

Remember one single rule and go from there: new development should not
change established setups. So SIOCSHWSTAMPs should continue to behave
"as before".

This is also the exact reason why I asked for the phy whitelist. The
introduction of CONFIG_NETWORK_PHY_TIMESTAMPING introduced exactly that:
a breaking change in the mode in which deployed setups operate.

> > But by all means, still hold a poll if you want to. I would vote for
> > ethtool netlink, not because it's great, just because I don't have a
> > better alternative to propose.
> 
> If you agree on that choice, let's go. Jakub and your are the most proactive
> reviewers in this patch series. Willem you are the timestamping maintainer do
> you also agree on this? 
> If anyone have another proposition let them speak now, or forever remain
> silent! ;)

Hmm, proactive means doing stuff in anticipation of being requested to
do it. I'd use the work "active" at most...
  
Willem de Bruijn Nov. 24, 2023, 8:47 p.m. UTC | #44
On Fri, Nov 24, 2023 at 12:34 PM Köry Maincent
<kory.maincent@bootlin.com> wrote:
>
> On Fri, 24 Nov 2023 17:43:43 +0200
> Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>
> > On Thu, Nov 23, 2023 at 09:32:05AM -0800, Jakub Kicinski wrote:
> > > On Thu, Nov 23, 2023 at 04:00:56PM +0100, Köry Maincent wrote:
> > > > So, do we have a consensus? Vlad, do you agree on putting all under
> > > > ethtool?
> > > >
> > > > ETHTOOL_GET_TS_INFO will be in charge of replacing the SIOCGHWSTAMP
> > > > implementation. Need to add ETHTOOL_A_TSINFO_PHC_INDEX
> > > > ETHTOOL_A_TSINFO_QUALIFIER to the request.
> > > >
> > > > ETHTOOL_GET_TS_INFO will list all the hwtstamp provider (aka "{phc_index,
> > > > qualifier}") through the dumpit callback. I will add a filter to be able
> > > > to list only the hwtstamp provider of one netdev.
> > > >
> > > > ETHTOOL_SET_TS_INFO will be in charge of replacing the SIOCSHWSTAMP
> > > > implementation.
> > >
> > > If not we can do a vote/poll? Maybe others don't find the configuration
> > > of timestamping as confusing as me.
> >
> > If you mean the ETHTOOL_MSG_TSINFO_GET netlink message (ETHTOOL_GET_TS_INFO
> > is an ioctl), you're saying that you want to move the entire contents of
> > SIOCGHWSTAMP there, by making the kernel call ndo_hwtstamp_get() in
> > addition to the existing __ethtool_get_ts_info()?
>
> Yes.
>
> > Yeah, I don't know, I don't have a real objection, I guess it's fine.
> >
> > What will be a bit of an "?!" moment for users is when ethtool gains
> > support for the SIOCGHWSTAMP/SIOCSHWSTAMP netlink replacements, but not
> > for the original ioctls. So hwstamp_ctl will be able to change timestamping
> > configuration, but ethtool wouldn't - all on the same system. Unless
> > ethtool gains an ioctl fallback for a ioctl that was never down its alley.
>
> Yes indeed. Would it break things if both ioctls and netlink can get and set
> the hwtstamps configuration? It is only configuration. Both happen under
> rtnl_lock it should be alright.
>
> The question is which hwtstamp provider will the original ioctls be able to
> change? Maybe the default one (MAC with phy whitelist) and only this one.
>
> > But by all means, still hold a poll if you want to. I would vote for
> > ethtool netlink, not because it's great, just because I don't have a
> > better alternative to propose.
>
> If you agree on that choice, let's go. Jakub and your are the most proactive
> reviewers in this patch series. Willem you are the timestamping maintainer do
> you also agree on this?

I don't have a strong opinion. Ethtool netlink SGTM.

For new network configuration we are moving away from ioctl towards
netlink in general.

Ethtool itself made this move, where the old ioctl way of things
continues to work, but will no longer be extended.

Since one of the APIs we use already uses ethtool, converting the
other two there makes sense to me.

I'm not familiar enough with configuring CAN or wireless to know
whether it would pose a problem for these mentioned cases.

> If anyone have another proposition let them speak now, or forever remain
> silent! ;)
  
Richard Cochran Nov. 25, 2023, 7:41 p.m. UTC | #45
On Fri, Nov 24, 2023 at 02:45:46PM -0500, Willem de Bruijn wrote:

> Expanding the skb infra and cmsg might be follow-on work.

Yes, and would I suggest limiting the scope of the present series just
to allow changing the global, device-level time stamping layer
administratively.

Trying to support multiple time stamps from different layers is a much
larger development project, and it will require new kernel/user
interfaces.

(Of course it would be grand if the series is forward looking to the
day when time stamp reporting expands beyond the current hw/sw cmsg.)

Thanks,
Richard
  
Köry Maincent Nov. 29, 2023, 8:09 p.m. UTC | #46
On Mon, 20 Nov 2023 13:45:51 -0800
Jakub Kicinski <kuba@kernel.org> wrote:

> On Mon, 20 Nov 2023 21:58:58 +0200 Vladimir Oltean wrote:
> > I'm still waiting for you to fully clarify the "per socket vs global"
> > aspect, but independently of that, at least I understand why this is a
> > counter-argument to my proposal. I need to tune it a bit (ASSUMING that
> > we want DMA timestamps to "look like" hwtimestamps, and not like their
> > own thing, to user space), because the PHC index would no longer fully
> > identify a hwtstamp provider, so we need something more.
> > 
> > I imagine both ETHTOOL_MSG_TSINFO_GET and ETHTOOL_MSG_TSINFO_SET to
> > support a new (nest) nlattr called ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER.
> > 
> > This would contain (u32) ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER_PHC_INDEX
> > and (u32) ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER_QUALIFIER. It could be
> > extensible in the future, but this is the baseline and forms the key.
> > 
> > The latter takes values from an:
> > 
> > enum ethtool_hwstamp_provider_qualifier {
> > 	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_MAC,
> > 	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_PHY,
> > 	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_DMA,
> > };  
> 
> Sounds reasonable. Having more attributes than just PHC index works.
> Given the lack of distinction between MAC and PHY for integrated NICs
> I'd lean towards ditching the "layers" completely and exposing 
> an "approximate" vs "precise" boolean. Approximate being the DMA point
> for NICs, but more generically a point that is separated from the wire
> by buffering or other variable length delay. Precise == IEEE 1588
> quality.

Hello Jakub, just wondering.
I can add this hwtstamp provider qualifier in the next series version but it
won't be used as it is set and used at the driver level and no driver is using
it for now. It would not be accepted if I use something that is not used, right?
Do you still think I should add this in v8?

Regards,
  
Vladimir Oltean Nov. 29, 2023, 8:37 p.m. UTC | #47
On Wed, Nov 29, 2023 at 09:09:59PM +0100, Köry Maincent wrote:
> On Mon, 20 Nov 2023 13:45:51 -0800
> Jakub Kicinski <kuba@kernel.org> wrote:
> 
> > On Mon, 20 Nov 2023 21:58:58 +0200 Vladimir Oltean wrote:
> > > I'm still waiting for you to fully clarify the "per socket vs global"
> > > aspect, but independently of that, at least I understand why this is a
> > > counter-argument to my proposal. I need to tune it a bit (ASSUMING that
> > > we want DMA timestamps to "look like" hwtimestamps, and not like their
> > > own thing, to user space), because the PHC index would no longer fully
> > > identify a hwtstamp provider, so we need something more.
> > > 
> > > I imagine both ETHTOOL_MSG_TSINFO_GET and ETHTOOL_MSG_TSINFO_SET to
> > > support a new (nest) nlattr called ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER.
> > > 
> > > This would contain (u32) ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER_PHC_INDEX
> > > and (u32) ETHTOOL_A_TSINFO_HWSTAMP_PROVIDER_QUALIFIER. It could be
> > > extensible in the future, but this is the baseline and forms the key.
> > > 
> > > The latter takes values from an:
> > > 
> > > enum ethtool_hwstamp_provider_qualifier {
> > > 	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_MAC,
> > > 	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_PHY,
> > > 	ETHTOOL_HWSTAMP_PROVIDER_QUALIFIER_DMA,
> > > };  
> > 
> > Sounds reasonable. Having more attributes than just PHC index works.
> > Given the lack of distinction between MAC and PHY for integrated NICs
> > I'd lean towards ditching the "layers" completely and exposing 
> > an "approximate" vs "precise" boolean. Approximate being the DMA point
> > for NICs, but more generically a point that is separated from the wire
> > by buffering or other variable length delay. Precise == IEEE 1588
> > quality.
> 
> Hello Jakub, just wondering.
> I can add this hwtstamp provider qualifier in the next series version but it
> won't be used as it is set and used at the driver level and no driver is using
> it for now. It would not be accepted if I use something that is not used, right?
> Do you still think I should add this in v8?

Not sure why you say "not used", though. Are you not planning to expose
the qualifier as an attribute to the listing of hwtstamp providers
offered to user space by ETHTOOL_MSG_TSINFO_GET?

Personally, I worry that if the qualifier gets added later (not now) to
the UAPI, we will end up having user space software (written now) that
iterates through the provider listing thinking that there may only ever
be one provider offered by one PHC, and will stop at the first such
provider found, whichever that may be.

With the added qualifier, there's a higher chance that user space
searches will be for a {phc, qualifier} pair (even if there will only be
1 possible qualifier type), and the future addition of a new hwtstamp
provider will keep existing software working in the same way as before,
i.e. user space won't select the DMA provider by mistake, by ignoring
the qualifier attribute altogether.

Generally I'm against adding things upfront that can only be in a certain
way, but in this case I believe that it is necessary in order for the
future extensions that were discussed to be possible. The qualifier is
part of the user space search key and thus pretty important.

My 2 cents, Jakub can absolutely disagree.
  
Köry Maincent Nov. 29, 2023, 10 p.m. UTC | #48
On Wed, 29 Nov 2023 22:37:00 +0200
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> On Wed, Nov 29, 2023 at 09:09:59PM +0100, Köry Maincent wrote:
> > On Mon, 20 Nov 2023 13:45:51 -0800
> > Jakub Kicinski <kuba@kernel.org> wrote:

> > > Sounds reasonable. Having more attributes than just PHC index works.
> > > Given the lack of distinction between MAC and PHY for integrated NICs
> > > I'd lean towards ditching the "layers" completely and exposing 
> > > an "approximate" vs "precise" boolean. Approximate being the DMA point
> > > for NICs, but more generically a point that is separated from the wire
> > > by buffering or other variable length delay. Precise == IEEE 1588
> > > quality.  
> > 
> > Hello Jakub, just wondering.
> > I can add this hwtstamp provider qualifier in the next series version but it
> > won't be used as it is set and used at the driver level and no driver is
> > using it for now. It would not be accepted if I use something that is not
> > used, right? Do you still think I should add this in v8?  
> 
> Not sure why you say "not used", though. Are you not planning to expose
> the qualifier as an attribute to the listing of hwtstamp providers
> offered to user space by ETHTOOL_MSG_TSINFO_GET?

Yes I will, I was just saying that all the PHC would be set as precise for now.
Approximate timestamp quality won't be used because IIUC there are no NIC driver
supporting it yet.

> Personally, I worry that if the qualifier gets added later (not now) to
> the UAPI, we will end up having user space software (written now) that
> iterates through the provider listing thinking that there may only ever
> be one provider offered by one PHC, and will stop at the first such
> provider found, whichever that may be.
> 
> With the added qualifier, there's a higher chance that user space
> searches will be for a {phc, qualifier} pair (even if there will only be
> 1 possible qualifier type), and the future addition of a new hwtstamp
> provider will keep existing software working in the same way as before,
> i.e. user space won't select the DMA provider by mistake, by ignoring
> the qualifier attribute altogether.
> 
> Generally I'm against adding things upfront that can only be in a certain
> way, but in this case I believe that it is necessary in order for the
> future extensions that were discussed to be possible. The qualifier is
> part of the user space search key and thus pretty important.
> 
> My 2 cents, Jakub can absolutely disagree.

Alright, this seems relevant.

Regards,
  
Jakub Kicinski Nov. 29, 2023, 11:56 p.m. UTC | #49
On Wed, 29 Nov 2023 23:00:34 +0100 Köry Maincent wrote:
> > Not sure why you say "not used", though. Are you not planning to expose
> > the qualifier as an attribute to the listing of hwtstamp providers
> > offered to user space by ETHTOOL_MSG_TSINFO_GET?  
> 
> Yes I will, I was just saying that all the PHC would be set as precise for now.
> Approximate timestamp quality won't be used because IIUC there are no NIC driver
> supporting it yet.

Agreed that we should add the attr from the start.

Maybe we can ask/work with Rahul <rrameshbabu@nvidia.com>
to implement the right thing in mlx5?

Failing that we can mark mlx5 as imprecise, until its sorted out.
So that we have both types in the tree.
  
Rahul Rameshbabu Nov. 30, 2023, 12:06 a.m. UTC | #50
On Wed, 29 Nov, 2023 15:56:13 -0800 Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 29 Nov 2023 23:00:34 +0100 Köry Maincent wrote:
>> > Not sure why you say "not used", though. Are you not planning to expose
>> > the qualifier as an attribute to the listing of hwtstamp providers
>> > offered to user space by ETHTOOL_MSG_TSINFO_GET?  
>> 
>> Yes I will, I was just saying that all the PHC would be set as precise for now.
>> Approximate timestamp quality won't be used because IIUC there are no NIC driver
>> supporting it yet.
>
> Agreed that we should add the attr from the start.
>
> Maybe we can ask/work with Rahul <rrameshbabu@nvidia.com>
> to implement the right thing in mlx5?

Thanks for looping me in. We were already looking at this patch series
out of interest. I saw your suggestion to rephrase "MAC / DMA" as
"precise / approximate", which we really like for mlx5 devices because
our "approximate" timestamping logic is not exactly a "MAC" timestamp
but its not a port timestamp that has the greater precision we use. I
have a task already for implementing support for this ethtool attribute.
If folks here are open to it, I can add mlx5 support for both modes in
this patch series for the next revision that will entail the discussed
changes.

>
> Failing that we can mark mlx5 as imprecise, until its sorted out.
> So that we have both types in the tree.

--
Thanks,

Rahul Rameshbabu
  

Patch

diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index b8d00676ed82..530c1775e5f4 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -227,6 +227,7 @@  Userspace to kernel:
   ``ETHTOOL_MSG_MM_SET``                set MAC merge layer parameters
   ``ETHTOOL_MSG_TS_GET``                get current timestamping
   ``ETHTOOL_MSG_TS_LIST_GET``           list available timestampings
+  ``ETHTOOL_MSG_TS_SET``                set current timestamping
   ===================================== =================================
 
 Kernel to userspace:
@@ -2038,6 +2039,21 @@  Kernel response contents:
 
 This command lists all the possible timestamp layer available.
 
+TS_SET
+======
+
+Modify the selected timestamping.
+
+Request contents:
+
+  =======================  ======  ===================
+  ``ETHTOOL_A_TS_HEADER``  nested  reply header
+  ``ETHTOOL_A_TS_LAYER``   u32     timestamping
+  =======================  ======  ===================
+
+This command set the timestamping with one that should be listed by the
+TSLIST_GET command.
+
 Request translation
 ===================
 
@@ -2146,4 +2162,5 @@  are netlink only.
   n/a                                 ``ETHTOOL_MSG_MM_SET``
   n/a                                 ``ETHTOOL_MSG_TS_GET``
   n/a                                 ``ETHTOOL_MSG_TS_LIST_GET``
+  n/a                                 ``ETHTOOL_MSG_TS_SET``
   =================================== =====================================
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 62b885d44d06..df6c4fcc62c1 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -59,6 +59,7 @@  enum {
 	ETHTOOL_MSG_MM_SET,
 	ETHTOOL_MSG_TS_GET,
 	ETHTOOL_MSG_TS_LIST_GET,
+	ETHTOOL_MSG_TS_SET,
 
 	/* add new constants above here */
 	__ETHTOOL_MSG_USER_CNT,
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 842c9db1531f..8322bf71f80d 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -308,6 +308,7 @@  ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = {
 	[ETHTOOL_MSG_MM_SET]		= &ethnl_mm_request_ops,
 	[ETHTOOL_MSG_TS_GET]		= &ethnl_ts_request_ops,
 	[ETHTOOL_MSG_TS_LIST_GET]	= &ethnl_ts_list_request_ops,
+	[ETHTOOL_MSG_TS_SET]		= &ethnl_ts_request_ops,
 };
 
 static struct ethnl_dump_ctx *ethnl_dump_context(struct netlink_callback *cb)
@@ -1148,6 +1149,13 @@  static const struct genl_ops ethtool_genl_ops[] = {
 		.policy = ethnl_ts_get_policy,
 		.maxattr = ARRAY_SIZE(ethnl_ts_get_policy) - 1,
 	},
+	{
+		.cmd	= ETHTOOL_MSG_TS_SET,
+		.flags	= GENL_UNS_ADMIN_PERM,
+		.doit	= ethnl_default_set_doit,
+		.policy = ethnl_ts_set_policy,
+		.maxattr = ARRAY_SIZE(ethnl_ts_set_policy) - 1,
+	},
 };
 
 static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index ea8c312db3af..8fedf234b824 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -444,6 +444,7 @@  extern const struct nla_policy ethnl_plca_get_status_policy[ETHTOOL_A_PLCA_HEADE
 extern const struct nla_policy ethnl_mm_get_policy[ETHTOOL_A_MM_HEADER + 1];
 extern const struct nla_policy ethnl_mm_set_policy[ETHTOOL_A_MM_MAX + 1];
 extern const struct nla_policy ethnl_ts_get_policy[ETHTOOL_A_TS_HEADER + 1];
+extern const struct nla_policy ethnl_ts_set_policy[ETHTOOL_A_TS_MAX + 1];
 
 int ethnl_set_features(struct sk_buff *skb, struct genl_info *info);
 int ethnl_act_cable_test(struct sk_buff *skb, struct genl_info *info);
diff --git a/net/ethtool/ts.c b/net/ethtool/ts.c
index bd219512b8de..357265e74e08 100644
--- a/net/ethtool/ts.c
+++ b/net/ethtool/ts.c
@@ -59,6 +59,102 @@  static int ts_fill_reply(struct sk_buff *skb,
 	return nla_put_u32(skb, ETHTOOL_A_TS_LAYER, data->ts_layer);
 }
 
+/* TS_SET */
+const struct nla_policy ethnl_ts_set_policy[] = {
+	[ETHTOOL_A_TS_HEADER]	= NLA_POLICY_NESTED(ethnl_header_policy),
+	[ETHTOOL_A_TS_LAYER]	= NLA_POLICY_RANGE(NLA_U32, 0,
+						   __TIMESTAMPING_COUNT - 1)
+};
+
+static int ethnl_set_ts_validate(struct ethnl_req_info *req_info,
+				 struct genl_info *info)
+{
+	struct nlattr **tb = info->attrs;
+	const struct net_device_ops *ops = req_info->dev->netdev_ops;
+
+	if (!ops->ndo_hwtstamp_set)
+		return -EOPNOTSUPP;
+
+	if (!tb[ETHTOOL_A_TS_LAYER])
+		return 0;
+
+	return 1;
+}
+
+static int ethnl_set_ts(struct ethnl_req_info *req_info, struct genl_info *info)
+{
+	struct net_device *dev = req_info->dev;
+	const struct ethtool_ops *ops = dev->ethtool_ops;
+	struct kernel_hwtstamp_config config = {0};
+	struct nlattr **tb = info->attrs;
+	enum timestamping_layer ts_layer;
+	bool mod = false;
+	int ret;
+
+	ts_layer = dev->ts_layer;
+	ethnl_update_u32(&ts_layer, tb[ETHTOOL_A_TS_LAYER], &mod);
+
+	if (!mod)
+		return 0;
+
+	if (ts_layer == SOFTWARE_TIMESTAMPING) {
+		struct ethtool_ts_info ts_info = {0};
+
+		if (!ops->get_ts_info) {
+			NL_SET_ERR_MSG_ATTR(info->extack,
+					    tb[ETHTOOL_A_TS_LAYER],
+					    "this net device cannot support timestamping");
+			return -EINVAL;
+		}
+
+		ops->get_ts_info(dev, &ts_info);
+		if ((ts_info.so_timestamping &
+		    SOF_TIMESTAMPING_SOFTWARE_MASK) !=
+		    SOF_TIMESTAMPING_SOFTWARE_MASK) {
+			NL_SET_ERR_MSG_ATTR(info->extack,
+					    tb[ETHTOOL_A_TS_LAYER],
+					    "this net device cannot support software timestamping");
+			return -EINVAL;
+		}
+	} else if (ts_layer == MAC_TIMESTAMPING) {
+		struct ethtool_ts_info ts_info = {0};
+
+		if (!ops->get_ts_info) {
+			NL_SET_ERR_MSG_ATTR(info->extack,
+					    tb[ETHTOOL_A_TS_LAYER],
+					    "this net device cannot support timestamping");
+			return -EINVAL;
+		}
+
+		ops->get_ts_info(dev, &ts_info);
+		if ((ts_info.so_timestamping &
+		    SOF_TIMESTAMPING_HARDWARE_MASK) !=
+		    SOF_TIMESTAMPING_HARDWARE_MASK) {
+			NL_SET_ERR_MSG_ATTR(info->extack,
+					    tb[ETHTOOL_A_TS_LAYER],
+					    "this net device cannot support hardware timestamping");
+			return -EINVAL;
+		}
+	} else if (ts_layer == PHY_TIMESTAMPING && !phy_has_tsinfo(dev->phydev)) {
+		NL_SET_ERR_MSG_ATTR(info->extack, tb[ETHTOOL_A_TS_LAYER],
+				    "this phy device cannot support timestamping");
+		return -EINVAL;
+	}
+
+	/* Disable time stamping in the current layer. */
+	if (netif_device_present(dev) &&
+	    (dev->ts_layer == PHY_TIMESTAMPING ||
+	    dev->ts_layer == MAC_TIMESTAMPING)) {
+		ret = dev_set_hwtstamp_phylib(dev, &config, info->extack);
+		if (ret < 0)
+			return ret;
+	}
+
+	dev->ts_layer = ts_layer;
+
+	return 1;
+}
+
 const struct ethnl_request_ops ethnl_ts_request_ops = {
 	.request_cmd		= ETHTOOL_MSG_TS_GET,
 	.reply_cmd		= ETHTOOL_MSG_TS_GET_REPLY,
@@ -69,6 +165,9 @@  const struct ethnl_request_ops ethnl_ts_request_ops = {
 	.prepare_data		= ts_prepare_data,
 	.reply_size		= ts_reply_size,
 	.fill_reply		= ts_fill_reply,
+
+	.set_validate		= ethnl_set_ts_validate,
+	.set			= ethnl_set_ts,
 };
 
 /* TS_LIST_GET */