[net,v2] octeontx2-af: Move validation of ptp pointer before its usage

Message ID 20230609115806.2625564-1-saikrishnag@marvell.com
State New
Headers
Series [net,v2] octeontx2-af: Move validation of ptp pointer before its usage |

Commit Message

Sai Krishna Gajula June 9, 2023, 11:58 a.m. UTC
  Moved PTP pointer validation before its use to avoid smatch warning.
Also used kzalloc/kfree instead of devm_kzalloc/devm_kfree.

Fixes: 2ef4e45d99b1 ("octeontx2-af: Add PTP PPS Errata workaround on CN10K silicon")
Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
---
v2:
    - Addressed review comments given by Maciej Fijalkowski
	1. Modified patch title, commit message
	2. Used kzalloc/kfree instead of devm_kzalloc/devm_kfree

 drivers/net/ethernet/marvell/octeontx2/af/ptp.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
  

Comments

Dan Carpenter June 9, 2023, 1:18 p.m. UTC | #1
On Fri, Jun 09, 2023 at 05:28:06PM +0530, Sai Krishna wrote:
> @@ -428,7 +427,7 @@ static int ptp_probe(struct pci_dev *pdev,
>  	return 0;
>  
>  error_free:
> -	devm_kfree(dev, ptp);
> +	kfree(ptp);

Yeah.  It's strange any time we call devm_kfree()...  So there is
something here which I have not understood.

>  
>  error:
>  	/* For `ptp_get()` we need to differentiate between the case

This probe function is super weird how it returns success on the failure
path.  One concern, I had initially was that if anything returns
-EPROBE_DEFER then we cannot recover.  That's not possible in the
current code, but it makes me itch...  But here is a different crash.

drivers/net/ethernet/marvell/octeontx2/af/ptp.c
   432  error:
   433          /* For `ptp_get()` we need to differentiate between the case
   434           * when the core has not tried to probe this device and the case when
   435           * the probe failed.  In the later case we pretend that the
   436           * initialization was successful and keep the error in
   437           * `dev->driver_data`.
   438           */
   439          pci_set_drvdata(pdev, ERR_PTR(err));
   440          if (!first_ptp_block)
   441                  first_ptp_block = ERR_PTR(err);

first_ptp_block is NULL for unprobed, an error pointer for probe
failure, or valid pointer.

   442  
   443          return 0;
   444  }

drivers/net/ethernet/marvell/octeontx2/af/ptp.c
   201  struct ptp *ptp_get(void)
   202  {
   203          struct ptp *ptp = first_ptp_block;
                            ^^^^^^^^^^^^^^^^^^^^^^

   204  
   205          /* Check PTP block is present in hardware */
   206          if (!pci_dev_present(ptp_id_table))
   207                  return ERR_PTR(-ENODEV);
   208          /* Check driver is bound to PTP block */
   209          if (!ptp)
   210                  ptp = ERR_PTR(-EPROBE_DEFER);
   211          else
   212                  pci_dev_get(ptp->pdev);
                                    ^^^^^^^^^
if first_ptp_block is an error pointer this will Oops.

   213  
   214          return ptp;
   215  }

regards,
dan carpenter
  
Sai Krishna Gajula June 23, 2023, 11:28 a.m. UTC | #2
> -----Original Message-----
> From: Dan Carpenter <dan.carpenter@linaro.org>
> Sent: Friday, June 9, 2023 6:48 PM
> To: Sai Krishna Gajula <saikrishnag@marvell.com>
> Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> Sunil Kovvuri Goutham <sgoutham@marvell.com>;
> maciej.fijalkowski@intel.com; Naveen Mamindlapalli
> <naveenm@marvell.com>
> Subject: Re: [net PATCH v2] octeontx2-af: Move validation of ptp
> pointer before its usage
> 
> On Fri, Jun 09, 2023 at 05:28:06PM +0530, Sai Krishna wrote:
> > @@ -428,7 +427,7 @@ static int ptp_probe(struct pci_dev *pdev,
> >  	return 0;
> >
> >  error_free:
> > -	devm_kfree(dev, ptp);
> > +	kfree(ptp);
> 
> Yeah.  It's strange any time we call devm_kfree()...  So there is something
> here which I have not understood.
We moved from devm_kzalloc/devm_kfree  to kzalloc/kfree as per review comments from Maciej.
> 
> >
> >  error:
> >  	/* For `ptp_get()` we need to differentiate between the case
> 
> This probe function is super weird how it returns success on the failure path.
> One concern, I had initially was that if anything returns -EPROBE_DEFER then
> we cannot recover.  That's not possible in the current code, but it makes me
> itch...  But here is a different crash.
> 

In few circumstances, the PTP device is probed before the AF device in the driver. In such instance, -EPROBE_DEFER is used.
-- EDEFER_PROBE is useful when probe order changes. Ex: AF driver probes before PTP.

> drivers/net/ethernet/marvell/octeontx2/af/ptp.c
>    432  error:
>    433          /* For `ptp_get()` we need to differentiate between the case
>    434           * when the core has not tried to probe this device and the case
> when
>    435           * the probe failed.  In the later case we pretend that the
>    436           * initialization was successful and keep the error in
>    437           * `dev->driver_data`.
>    438           */
>    439          pci_set_drvdata(pdev, ERR_PTR(err));
>    440          if (!first_ptp_block)
>    441                  first_ptp_block = ERR_PTR(err);
> 
> first_ptp_block is NULL for unprobed, an error pointer for probe failure, or
> valid pointer.

This is correct.
> 
>    442
>    443          return 0;
>    444  }
> 
> drivers/net/ethernet/marvell/octeontx2/af/ptp.c
>    201  struct ptp *ptp_get(void)
>    202  {
>    203          struct ptp *ptp = first_ptp_block;
>                             ^^^^^^^^^^^^^^^^^^^^^^
> 
>    204
>    205          /* Check PTP block is present in hardware */
>    206          if (!pci_dev_present(ptp_id_table))
>    207                  return ERR_PTR(-ENODEV);
>    208          /* Check driver is bound to PTP block */
>    209          if (!ptp)
>    210                  ptp = ERR_PTR(-EPROBE_DEFER);
>    211          else
>    212                  pci_dev_get(ptp->pdev);
>                                     ^^^^^^^^^ if first_ptp_block is an error pointer this will
> Oops.

Will fix this in v3 patch.

Thanks,
Sai
> 
>    213
>    214          return ptp;
>    215  }
> 
> regards,
> dan carpenter
  
Dan Carpenter June 23, 2023, 11:44 a.m. UTC | #3
On Fri, Jun 23, 2023 at 11:28:19AM +0000, Sai Krishna Gajula wrote:
> > This probe function is super weird how it returns success on the failure path.
> > One concern, I had initially was that if anything returns -EPROBE_DEFER then
> > we cannot recover.  That's not possible in the current code, but it makes me
> > itch...  But here is a different crash.
> > 
> 
> In few circumstances, the PTP device is probed before the AF device in
> the driver. In such instance, -EPROBE_DEFER is used.
> -- EDEFER_PROBE is useful when probe order changes. Ex: AF driver probes before PTP.
> 

You're describing how -EPROBE_DEFER is *supposed* to work.  But that's
not what this driver does.

If the AF driver is probed before the PTP driver then ptp_probe() should
return -EPROBE_DEFER and that would allow the kernel to automatically
retry ptp_probe() later.  But instead of that, ptp_probe() returns
success.  So I guess the user would have to manually rmmod it and insmod
it again?  So, what I'm saying I don't understand why we can't do this
in the normal way.

The other thing I'm saying is that the weird return success on error
stuff hasn't been tested or we would have discovered the crash through
testing.

regards,
dan carpenter
  
Sai Krishna Gajula June 30, 2023, 5:19 a.m. UTC | #4
> -----Original Message-----
> From: Dan Carpenter <dan.carpenter@linaro.org>
> Sent: Friday, June 23, 2023 5:14 PM
> To: Sai Krishna Gajula <saikrishnag@marvell.com>
> Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Sunil Kovvuri Goutham <sgoutham@marvell.com>;
> maciej.fijalkowski@intel.com; Naveen Mamindlapalli
> <naveenm@marvell.com>
> Subject: Re: [net PATCH v2] octeontx2-af: Move validation of ptp
> pointer before its usage
> 
> On Fri, Jun 23, 2023 at 11:28:19AM +0000, Sai Krishna Gajula wrote:
> > > This probe function is super weird how it returns success on the failure
> path.
> > > One concern, I had initially was that if anything returns
> > > -EPROBE_DEFER then we cannot recover.  That's not possible in the
> > > current code, but it makes me itch...  But here is a different crash.
> > >
> >
> > In few circumstances, the PTP device is probed before the AF device in
> > the driver. In such instance, -EPROBE_DEFER is used.
> > -- EDEFER_PROBE is useful when probe order changes. Ex: AF driver probes
> before PTP.
> >
> 
> You're describing how -EPROBE_DEFER is *supposed* to work.  But that's not
> what this driver does.
> 
> If the AF driver is probed before the PTP driver then ptp_probe() should
> return -EPROBE_DEFER and that would allow the kernel to automatically retry
> ptp_probe() later.  But instead of that, ptp_probe() returns success.  So I
> guess the user would have to manually rmmod it and insmod it again?  So,
> what I'm saying I don't understand why we can't do this in the normal way.
> 
> The other thing I'm saying is that the weird return success on error stuff
> hasn't been tested or we would have discovered the crash through testing.
> 
> regards,
> dan carpenter

As suggested, we will return error in ptp_probe in case of any failure conditions. In this case AF driver continues without PTP support.
When the AF driver is probed before PTP driver , we will defer the AF probe. Hope these changes are inline with your view.
I will send a v3 patch with these changes. 

Regards,
Sai
  
Dan Carpenter June 30, 2023, 5:45 a.m. UTC | #5
On Fri, Jun 30, 2023 at 05:19:27AM +0000, Sai Krishna Gajula wrote:
> 
> > -----Original Message-----
> > From: Dan Carpenter <dan.carpenter@linaro.org>
> > Sent: Friday, June 23, 2023 5:14 PM
> > To: Sai Krishna Gajula <saikrishnag@marvell.com>
> > Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> > pabeni@redhat.com; netdev@vger.kernel.org; linux-
> > kernel@vger.kernel.org; Sunil Kovvuri Goutham <sgoutham@marvell.com>;
> > maciej.fijalkowski@intel.com; Naveen Mamindlapalli
> > <naveenm@marvell.com>
> > Subject: Re: [net PATCH v2] octeontx2-af: Move validation of ptp
> > pointer before its usage
> > 
> > On Fri, Jun 23, 2023 at 11:28:19AM +0000, Sai Krishna Gajula wrote:
> > > > This probe function is super weird how it returns success on the failure
> > path.
> > > > One concern, I had initially was that if anything returns
> > > > -EPROBE_DEFER then we cannot recover.  That's not possible in the
> > > > current code, but it makes me itch...  But here is a different crash.
> > > >
> > >
> > > In few circumstances, the PTP device is probed before the AF device in
> > > the driver. In such instance, -EPROBE_DEFER is used.
> > > -- EDEFER_PROBE is useful when probe order changes. Ex: AF driver probes
> > before PTP.
> > >
> > 
> > You're describing how -EPROBE_DEFER is *supposed* to work.  But that's not
> > what this driver does.
> > 
> > If the AF driver is probed before the PTP driver then ptp_probe() should
> > return -EPROBE_DEFER and that would allow the kernel to automatically retry
> > ptp_probe() later.  But instead of that, ptp_probe() returns success.  So I
> > guess the user would have to manually rmmod it and insmod it again?  So,
> > what I'm saying I don't understand why we can't do this in the normal way.
> > 
> > The other thing I'm saying is that the weird return success on error stuff
> > hasn't been tested or we would have discovered the crash through testing.
> > 
> > regards,
> > dan carpenter
> 
> As suggested, we will return error in ptp_probe in case of any
> failure conditions. In this case AF driver continues without PTP support.

I'm concerned about the "AF driver continues without PTP support".

> When the AF driver is probed before PTP driver , we will defer the AF
> probe. Hope these changes are inline with your view.
> I will send a v3 patch with these changes. 
> 

I don't really understand the situation.  You have two drivers.
Normally, the relationship is very simple where you have to load one
before you can load the other.  But here it sounds like the relationships
are very complicated and you are loading one in a degraded state for
some reason...

When drivers are loaded that happens in drivers/base/dd.c.  We start
with a list of drivers to probe.  Then if any of them return
-EPROBE_DEFER, we put them on deferred_probe_pending_list.  Then as soon
as we manage to probe another driver successfully we put the drivers on
deferred_probe_pending_list onto another list and start trying to probe
them again.

That process continues until we've gone through the list of drivers and
nothing succeeds.

regards,
dan carpenter
  
Sunil Kovvuri June 30, 2023, 6:46 a.m. UTC | #6
On Fri, Jun 30, 2023 at 11:16 AM Dan Carpenter <dan.carpenter@linaro.org> wrote:
>
> On Fri, Jun 30, 2023 at 05:19:27AM +0000, Sai Krishna Gajula wrote:
> >
> > > -----Original Message-----

> >
> > As suggested, we will return error in ptp_probe in case of any
> > failure conditions. In this case AF driver continues without PTP support.
>
> I'm concerned about the "AF driver continues without PTP support".
>

Yes, it doesn't make sense to proceed with AF driver if PTP driver
probe has failed.
PTP driver probe will fail upon memory alloc or ioremap failures, such failures
will most likely be encountered by AF driver as well. So better not
continue with AF driver probe.

> > When the AF driver is probed before PTP driver , we will defer the AF
> > probe. Hope these changes are inline with your view.
> > I will send a v3 patch with these changes.
> >
>
> I don't really understand the situation.  You have two drivers.
> Normally, the relationship is very simple where you have to load one
> before you can load the other.  But here it sounds like the relationships
> are very complicated and you are loading one in a degraded state for
> some reason...
>

No, the relationship is simple. Idea is to defer AF driver probe until
PTP driver is loaded.
Once the above is fixed there won't be any degraded state.

Thanks,
Sunil.
  

Patch

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/ptp.c b/drivers/net/ethernet/marvell/octeontx2/af/ptp.c
index 3411e2e47d46..248316c4a53f 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/ptp.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/ptp.c
@@ -388,11 +388,10 @@  static int ptp_extts_on(struct ptp *ptp, int on)
 static int ptp_probe(struct pci_dev *pdev,
 		     const struct pci_device_id *ent)
 {
-	struct device *dev = &pdev->dev;
 	struct ptp *ptp;
 	int err;
 
-	ptp = devm_kzalloc(dev, sizeof(*ptp), GFP_KERNEL);
+	ptp = kzalloc(sizeof(struct ptp), GFP_KERNEL);
 	if (!ptp) {
 		err = -ENOMEM;
 		goto error;
@@ -428,7 +427,7 @@  static int ptp_probe(struct pci_dev *pdev,
 	return 0;
 
 error_free:
-	devm_kfree(dev, ptp);
+	kfree(ptp);
 
 error:
 	/* For `ptp_get()` we need to differentiate between the case
@@ -449,16 +448,17 @@  static void ptp_remove(struct pci_dev *pdev)
 	struct ptp *ptp = pci_get_drvdata(pdev);
 	u64 clock_cfg;
 
-	if (cn10k_ptp_errata(ptp) && hrtimer_active(&ptp->hrtimer))
-		hrtimer_cancel(&ptp->hrtimer);
-
 	if (IS_ERR_OR_NULL(ptp))
 		return;
 
+	if (cn10k_ptp_errata(ptp) && hrtimer_active(&ptp->hrtimer))
+		hrtimer_cancel(&ptp->hrtimer);
+
 	/* Disable PTP clock */
 	clock_cfg = readq(ptp->reg_base + PTP_CLOCK_CFG);
 	clock_cfg &= ~PTP_CLOCK_CFG_PTP_EN;
 	writeq(clock_cfg, ptp->reg_base + PTP_CLOCK_CFG);
+	kfree(ptp);
 }
 
 static const struct pci_device_id ptp_id_table[] = {