[V2,04/11] cxl/mem: Clear events on driver load

Message ID 20221201002719.2596558-5-ira.weiny@intel.com
State New
Series CXL: Process event logs

Commit Message

Ira Weiny Dec. 1, 2022, 12:27 a.m. UTC
  From: Ira Weiny <ira.weiny@intel.com>

The information contained in the events prior to the driver loading can
be queried at any time through other mailbox commands.

Ensure a clean slate of events by reading and clearing the events.  The
events are sent to the trace buffer but it is not anticipated to have
anyone listening to it at driver load time.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 drivers/cxl/pci.c            | 2 ++
 tools/testing/cxl/test/mem.c | 2 ++
 2 files changed, 4 insertions(+)
  

Comments

Jonathan Cameron Dec. 1, 2022, 1:30 p.m. UTC | #1
On Wed, 30 Nov 2022 16:27:12 -0800
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> The information contained in the events prior to the driver loading can
> be queried at any time through other mailbox commands.
> 
> Ensure a clean slate of events by reading and clearing the events.  The
> events are sent to the trace buffer but it is not anticipated to have
> anyone listening to it at driver load time.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Probably not worth addressing but there is a corner case where this might fail
if some broken software already messed with reading out the events.

Imagine it read the first mailbox sized chunk, but didn't clear them...

If that happens, then we'd end up seeing the whole list, but in non-temporal
order, and hence trying to clear the events out of order, with predictable
failures.

Maybe this is the category of things we 'fix' if we ever hear of it actually
happening.

So with that caveat called out so I can say 'I told you so' :), fine to keep my tag on this.

Thanks,

Jonathan


> ---
>  drivers/cxl/pci.c            | 2 ++
>  tools/testing/cxl/test/mem.c | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 8f86f85d89c7..11e95a95195a 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -521,6 +521,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (IS_ERR(cxlmd))
>  		return PTR_ERR(cxlmd);
>  
> +	cxl_mem_get_event_records(cxlds);
> +
>  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
>  		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
>  
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index aa2df3a15051..e2f5445d24ff 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -285,6 +285,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>  	if (IS_ERR(cxlmd))
>  		return PTR_ERR(cxlmd);
>  
> +	cxl_mem_get_event_records(cxlds);
> +
>  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
>  		rc = devm_cxl_add_nvdimm(dev, cxlmd);
>
  
Ira Weiny Dec. 1, 2022, 5:02 p.m. UTC | #2
On Thu, Dec 01, 2022 at 01:30:33PM +0000, Jonathan Cameron wrote:
> On Wed, 30 Nov 2022 16:27:12 -0800
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > The information contained in the events prior to the driver loading can
> > be queried at any time through other mailbox commands.
> > 
> > Ensure a clean slate of events by reading and clearing the events.  The
> > events are sent to the trace buffer but it is not anticipated to have
> > anyone listening to it at driver load time.
> > 
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> Probably not worth addressing but there is a corner case where this might fail
> if some broken software already messed with reading out the events.

Yea they can keep the pieces if they have done that.

> 
> Imagine it read the first mailbox sized chunk, but didn't clear them...
> 
> If that happens, then we'd end up seeing the whole list, but in non
> temporal order and hence trying to clear them out of order with predictable
> fails.
> 
> Maybe this is the category of things we 'fix' if we ever hear of it actually
> happening.
> 
> So with that caveat called out so I can say 'I told you so' :), fine to keep my tag on this.

Sure!  We probably owe you this T-Shirt already!

https://www.amazon.com/Big-Bang-Theory-Informed-Thusly/dp/B06XYCSQRF

:-D

Ira

> 
> Thanks,
> 
> Jonathan
> 
> 
> > ---
> >  drivers/cxl/pci.c            | 2 ++
> >  tools/testing/cxl/test/mem.c | 2 ++
> >  2 files changed, 4 insertions(+)
> > 
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index 8f86f85d89c7..11e95a95195a 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -521,6 +521,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >  	if (IS_ERR(cxlmd))
> >  		return PTR_ERR(cxlmd);
> >  
> > +	cxl_mem_get_event_records(cxlds);
> > +
> >  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
> >  		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
> >  
> > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> > index aa2df3a15051..e2f5445d24ff 100644
> > --- a/tools/testing/cxl/test/mem.c
> > +++ b/tools/testing/cxl/test/mem.c
> > @@ -285,6 +285,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> >  	if (IS_ERR(cxlmd))
> >  		return PTR_ERR(cxlmd);
> >  
> > +	cxl_mem_get_event_records(cxlds);
> > +
> >  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
> >  		rc = devm_cxl_add_nvdimm(dev, cxlmd);
> >  
>
  
Dan Williams Dec. 2, 2022, 2:48 a.m. UTC | #3
cxl/mem is cxl_mem.ko; this is cxl/pci.

ira.weiny@ wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> The information contained in the events prior to the driver loading can
> be queried at any time through other mailbox commands.
> 
> Ensure a clean slate of events by reading and clearing the events.  The
> events are sent to the trace buffer but it is not anticipated to have
> anyone listening to it at driver load time.

This is easy to guarantee with modprobe policy, so I am not sure it is
worth stating.

This breakdown feels odd. I would split the trace event definitions into
their own lead-in patch since that is a pile of definitions that can be
merged on their own. Then squash get, clear, and this patch into one
patch as they don't have much reason to go in separately.

> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  drivers/cxl/pci.c            | 2 ++
>  tools/testing/cxl/test/mem.c | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 8f86f85d89c7..11e95a95195a 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -521,6 +521,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (IS_ERR(cxlmd))
>  		return PTR_ERR(cxlmd);
>  
> +	cxl_mem_get_event_records(cxlds);
> +
>  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
>  		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
>  
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index aa2df3a15051..e2f5445d24ff 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -285,6 +285,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>  	if (IS_ERR(cxlmd))
>  		return PTR_ERR(cxlmd);
>  
> +	cxl_mem_get_event_records(cxlds);
> +

This hunk likely goes with the first patch that actually implements some
mocked events.

>  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
>  		rc = devm_cxl_add_nvdimm(dev, cxlmd);
>  
> -- 
> 2.37.2
>
  
Ira Weiny Dec. 2, 2022, 4:34 p.m. UTC | #4
On Thu, Dec 01, 2022 at 06:48:12PM -0800, Dan Williams wrote:
> cxl/mem is cxl_mem.ko, This is cxl/pci.
> 
> ira.weiny@ wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > The information contained in the events prior to the driver loading can
> > be queried at any time through other mailbox commands.
> > 
> > Ensure a clean slate of events by reading and clearing the events.  The
> > events are sent to the trace buffer but it is not anticipated to have
> > anyone listening to it at driver load time.
> 
> This is easy to guarantee with modprobe policy, so I am not sure it is
> worth stating.

Fair enough.  But there was some discussion early on regarding why reading and
clearing on startup was a good thing.  This showed that we chose to do that and
why we don't care.  I'll remove it.

> 
> This breakdown feels odd. I would split the trace event definitions into
> its own lead in patch since that is a pile of definitions that can be
> merged on their own. Then squash get, clear, and this patch into one
> patch as they don't have much reason to go in separately.

I agree that splitting Get, Clear, and this patch was odd.  However,
splitting Get/Clear made the discussion on those operations easier IMO.

As a result this did not really belong in either of those patches on their own.

It is also very clearly a do-one-thing-per-patch situation.

> 
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > ---
> >  drivers/cxl/pci.c            | 2 ++
> >  tools/testing/cxl/test/mem.c | 2 ++
> >  2 files changed, 4 insertions(+)
> > 
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index 8f86f85d89c7..11e95a95195a 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -521,6 +521,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >  	if (IS_ERR(cxlmd))
> >  		return PTR_ERR(cxlmd);
> >  
> > +	cxl_mem_get_event_records(cxlds);
> > +
> >  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
> >  		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
> >  
> > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> > index aa2df3a15051..e2f5445d24ff 100644
> > --- a/tools/testing/cxl/test/mem.c
> > +++ b/tools/testing/cxl/test/mem.c
> > @@ -285,6 +285,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> >  	if (IS_ERR(cxlmd))
> >  		return PTR_ERR(cxlmd);
> >  
> > +	cxl_mem_get_event_records(cxlds);
> > +
> 
> This hunk likely goes with the first patch that actually implements some
> mocked events.

If this patch were squashed into the other patches, yes.  But as a patch which
does exactly 1 thing, "Clear events on driver load", it works IMO.  I could just
as well have put this patch at the very end.

Now that the Get/Clear operations are more settled I'll split this out and
squash it as you suggest.  Jonathan suggested squashing Get/Clear too but again
I really prefer the 1 thing/patch and each of those operations seemed like a
good breakdown.

Ira

> 
> >  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
> >  		rc = devm_cxl_add_nvdimm(dev, cxlmd);
> >  
> > -- 
> > 2.37.2
> > 
> 
>
  
Dan Williams Dec. 2, 2022, 11:34 p.m. UTC | #5
Ira Weiny wrote:
> On Thu, Dec 01, 2022 at 06:48:12PM -0800, Dan Williams wrote:
> > cxl/mem is cxl_mem.ko, This is cxl/pci.
> > 
> > ira.weiny@ wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > The information contained in the events prior to the driver loading can
> > > be queried at any time through other mailbox commands.
> > > 
> > > Ensure a clean slate of events by reading and clearing the events.  The
> > > events are sent to the trace buffer but it is not anticipated to have
> > > anyone listening to it at driver load time.
> > 
> > This is easy to guarantee with modprobe policy, so I am not sure it is
> > worth stating.
> 
> Fair enough.  But there was some discussion early on regarding why reading and
> clearing on startup was a good thing.  This showed that we chose to do that and
> why we don't care.  I'll remove it.
> 
> > 
> > This breakdown feels odd. I would split the trace event definitions into
> > its own lead in patch since that is a pile of definitions that can be
> > merged on their own. Then squash get, clear, and this patch into one
> > patch as they don't have much reason to go in separately.
> 
> I agree that splitting the Get/Clear/and this patch was odd.  However,
> splitting Get/Clear made the discussion on those operations easier IMO.
> 
> As a result this did not really belong in either of those patches on their own.
> 
> It is also very clearly a do one thing per patch situation.
> 
> > 
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > ---
> > >  drivers/cxl/pci.c            | 2 ++
> > >  tools/testing/cxl/test/mem.c | 2 ++
> > >  2 files changed, 4 insertions(+)
> > > 
> > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > index 8f86f85d89c7..11e95a95195a 100644
> > > --- a/drivers/cxl/pci.c
> > > +++ b/drivers/cxl/pci.c
> > > @@ -521,6 +521,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > >  	if (IS_ERR(cxlmd))
> > >  		return PTR_ERR(cxlmd);
> > >  
> > > +	cxl_mem_get_event_records(cxlds);
> > > +
> > >  	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
> > >  		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
> > >  
> > > diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> > > index aa2df3a15051..e2f5445d24ff 100644
> > > --- a/tools/testing/cxl/test/mem.c
> > > +++ b/tools/testing/cxl/test/mem.c
> > > @@ -285,6 +285,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> > >  	if (IS_ERR(cxlmd))
> > >  		return PTR_ERR(cxlmd);
> > >  
> > > +	cxl_mem_get_event_records(cxlds);
> > > +
> > 
> > This hunk likely goes with the first patch that actually implements some
> > mocked events.
> 
> If this patch was squashed into the other patches yes.  But as a patch which
> does exactly 1 thing "Clear events on driver load" it works IMO.  I could just
> have well put this patch at the very end.
> 
> Now that the Get/Clear operations are more settled I'll split this out and
> squash it as you suggest.  Jonathan suggested squashing Get/Clear too but again
> I really prefer the 1 thing/patch and each of those operations seemed like a
> good breakdown.
> 

I'll preface this by saying if you ask 3 kernel developers how to split
a patch series you'll get 5 answers. For me though, a patch should be a
bisectable full thought, such that at each step of a series the kernel is
incrementally better in a way that makes sense. The kernel that gets Get
Events likely needs to clear them too to complete 1 full thought about
enabling Event handling. Otherwise a kernel that just retrieves some
events until they overflow feels like a POC.
  
Ira Weiny Dec. 3, 2022, 9 p.m. UTC | #6
On Fri, Dec 02, 2022 at 03:34:20PM -0800, Dan Williams wrote:
> Ira Weiny wrote:
> > On Thu, Dec 01, 2022 at 06:48:12PM -0800, Dan Williams wrote:
> > > cxl/mem is cxl_mem.ko, This is cxl/pci.

[snip]

> > > > +	cxl_mem_get_event_records(cxlds);
> > > > +
> > > 
> > > This hunk likely goes with the first patch that actually implements some
> > > mocked events.
> > 
> > If this patch was squashed into the other patches yes.  But as a patch which
> > does exactly 1 thing "Clear events on driver load" it works IMO.  I could just
> > have well put this patch at the very end.
> > 
> > Now that the Get/Clear operations are more settled I'll split this out and
> > squash it as you suggest.  Jonathan suggested squashing Get/Clear too but again
> > I really prefer the 1 thing/patch and each of those operations seemed like a
> > good breakdown.
> > 
> 
> I'll preface this by saying if you ask 3 kernel developers how to split
> a patch series you'll get 5 answers.

Indeed.

> For me though, a patch should be a
> bisectable full-thought. That at each step of a series the kernel is
> incrementally better in a way that makes sense. The kernel that gets Get
> Events likely needs to clear them too to complete 1 full thought about
> enabling Event handling. Otherwise a kernel that just retrieves some
> events until they overflow feels like a POC.

I've squashed it.

Ira
  

Patch

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 8f86f85d89c7..11e95a95195a 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -521,6 +521,8 @@  static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
+	cxl_mem_get_event_records(cxlds);
+
 	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
 		rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
 
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index aa2df3a15051..e2f5445d24ff 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -285,6 +285,8 @@  static int cxl_mock_mem_probe(struct platform_device *pdev)
 	if (IS_ERR(cxlmd))
 		return PTR_ERR(cxlmd);
 
+	cxl_mem_get_event_records(cxlds);
+
 	if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
 		rc = devm_cxl_add_nvdimm(dev, cxlmd);
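
The corner case raised in comment #1 can be illustrated with a small
user-space model.  This is only a sketch under stated assumptions, not the
implementation of cxl_mem_get_event_records().  The names toy_log, toy_fill,
toy_get, toy_clear, toy_drain, LOG_CAP and CHUNK are hypothetical and do not
exist in the kernel.  The model assumes the device keeps a read cursor that
advances on every Get, and that Clear only accepts the oldest records, in
temporal order.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LOG_CAP 8 /* toy log capacity (hypothetical) */
#define CHUNK   3 /* records per "mailbox-sized" read (hypothetical) */

struct toy_log {
	unsigned short handle[LOG_CAP]; /* oldest record first */
	int count;                      /* records still held by the device */
	int cursor;                     /* next record a Get will return */
};

/* Fill the log with n records whose handles are 1..n. */
static void toy_fill(struct toy_log *log, int n)
{
	assert(n >= 0 && n <= LOG_CAP);
	for (int i = 0; i < n; i++)
		log->handle[i] = (unsigned short)(i + 1);
	log->count = n;
	log->cursor = 0;
}

/*
 * Assumed Get behaviour: return up to CHUNK handles starting at the
 * cursor, report whether more remain, and rewind once the end is reached.
 */
static int toy_get(struct toy_log *log, unsigned short *out, bool *more)
{
	int n = 0;

	while (n < CHUNK && log->cursor < log->count)
		out[n++] = log->handle[log->cursor++];
	*more = log->cursor < log->count;
	if (!*more)
		log->cursor = 0;
	return n;
}

/* Assumed Clear behaviour: only the oldest records, in order, may go. */
static int toy_clear(struct toy_log *log, const unsigned short *h, int n)
{
	if (n > log->count)
		return -1;
	for (int i = 0; i < n; i++)
		if (log->handle[i] != h[i])
			return -1; /* out-of-order clear is rejected */
	memmove(log->handle, log->handle + n,
		(size_t)(log->count - n) * sizeof(log->handle[0]));
	log->count -= n;
	log->cursor = log->cursor > n ? log->cursor - n : 0;
	return 0;
}

/* Read-and-clear loop, roughly what "clean slate on load" implies. */
static int toy_drain(struct toy_log *log)
{
	unsigned short buf[CHUNK];
	bool more;

	do {
		int n = toy_get(log, buf, &more);

		if (n && toy_clear(log, buf, n))
			return -1;
	} while (more);
	return 0;
}
```

Under these assumptions, toy_drain() on a freshly filled seven-record log
clears every record.  If toy_get() is called once beforehand without a
matching toy_clear(), the first toy_clear() inside toy_drain() is handed
handles that are not the oldest ones and fails, leaving all seven records in
the log.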