[V2,03/11] cxl/mem: Implement Clear Event Records command

Message ID	20221201002719.2596558-4-ira.weiny@intel.com
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
	From: ira.weiny@intel.com
	To: Dan Williams <dan.j.williams@intel.com>
	Cc: Ira Weiny <ira.weiny@intel.com>, Alison Schofield <alison.schofield@intel.com>, Vishal Verma <vishal.l.verma@intel.com>, Ben Widawsky <bwidawsk@kernel.org>, Steven Rostedt <rostedt@goodmis.org>, Jonathan Cameron <Jonathan.Cameron@huawei.com>, Davidlohr Bueso <dave@stgolabs.net>, Dave Jiang <dave.jiang@intel.com>, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org
	Subject: [PATCH V2 03/11] cxl/mem: Implement Clear Event Records command
	Date: Wed, 30 Nov 2022 16:27:11 -0800
	Message-Id: <20221201002719.2596558-4-ira.weiny@intel.com>
	In-Reply-To: <20221201002719.2596558-1-ira.weiny@intel.com>
	References: <20221201002719.2596558-1-ira.weiny@intel.com>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Precedence: bulk
Series	CXL: Process event logs
	[V2,00/11] CXL: Process event logs
	[V2,01/11] cxl/pci: Add generic MSI-X/MSI irq support
	[V2,02/11] cxl/mem: Implement Get Event Records command
	[V2,03/11] cxl/mem: Implement Clear Event Records command
	[V2,04/11] cxl/mem: Clear events on driver load
	[V2,05/11] cxl/mem: Trace General Media Event Record
	[V2,06/11] cxl/mem: Trace DRAM Event Record
	[V2,07/11] cxl/mem: Trace Memory Module Event Record
	[V2,08/11] cxl/mem: Wire up event interrupts
	[V2,09/11] cxl/test: Add generic mock events
	[V2,10/11] cxl/test: Add specific events
	[V2,11/11] cxl/test: Simulate event log overflow

Commit Message

Ira Weiny Dec. 1, 2022, 12:27 a.m. UTC

  From: Ira Weiny <ira.weiny@intel.com>

CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
command.  After an event record is read it needs to be cleared from the
event log.

Implement cxl_clear_event_record() to clear all records retrieved from
the device.

Each record is cleared explicitly.  A clear all bit is specified but
events could arrive between a get and any final clear all operation.
This means events would be missed.
Therefore each event is cleared specifically.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V1:
	Clear Event Record allows for u8 handles while Get Event Record
	allows for u16 records to be returned.  Based on Jonathan's
	feedback; allow for all event records to be handled in this
	clear.  Which means a double loop with potentially multiple
	Clear Event payloads being sent to clear all events sent.

Changes from RFC:
	Jonathan
		Clean up init of payload and use return code.
		Also report any error to clear the event.
		s/v3.0/rev 3.0
---
 drivers/cxl/core/mbox.c      | 61 +++++++++++++++++++++++++++++++-----
 drivers/cxl/cxlmem.h         | 14 +++++++++
 include/uapi/linux/cxl_mem.h |  1 +
 3 files changed, 69 insertions(+), 7 deletions(-)

Comments

Jonathan Cameron Dec. 1, 2022, 1:26 p.m. UTC | #1

On Wed, 30 Nov 2022 16:27:11 -0800
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> command.  After an event record is read it needs to be cleared from the
> event log.
> 
> Implement cxl_clear_event_record() to clear all records retrieved from
> the device.
> 
> Each record is cleared explicitly.  A clear all bit is specified but
> events could arrive between a get and any final clear all operation.
> This means events would be missed.
> Therefore each event is cleared specifically.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
I think there is a type issue in the min_t() calculation; with that addressed,
this looks good to me.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> 
> ---
> Changes from V1:
> 	Clear Event Record allows for u8 handles while Get Event Record
> 	allows for u16 records to be returned.  Based on Jonathan's
> 	feedback; allow for all event records to be handled in this
> 	clear.  Which means a double loop with potentially multiple
> 	Clear Event payloads being sent to clear all events sent.
> 
> Changes from RFC:
> 	Jonathan
> 		Clean up init of payload and use return code.
> 		Also report any error to clear the event.
> 		s/v3.0/rev 3.0
> ---
>  drivers/cxl/core/mbox.c      | 61 +++++++++++++++++++++++++++++++-----
>  drivers/cxl/cxlmem.h         | 14 +++++++++
>  include/uapi/linux/cxl_mem.h |  1 +
>  3 files changed, 69 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 70b681027a3d..076a3df0ba38 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -52,6 +52,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>  #endif
>  	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
>  	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> +	CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
>  	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
>  	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
>  	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> @@ -708,6 +709,42 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>  
> +static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> +				  enum cxl_event_log_type log,
> +				  struct cxl_get_event_payload *get_pl,
> +				  u16 total)
> +{
> +	struct cxl_mbox_clear_event_payload payload = {
> +		.event_log = log,
> +	};
> +	int cnt;
> +
> +	/*
> +	 * Clear Event Records uses u8 for the handle cnt while Get Event
> +	 * Record can return up to 0xffff records.
> +	 */
> +	for (cnt = 0; cnt < total; /* cnt incremented internally */) {
> +		u8 nr_recs = min_t(u8, (total - cnt),
> +				   CXL_CLEAR_EVENT_MAX_HANDLES);

I might be half asleep but isn't this assuming that (total - cnt)
fits in a u8?  Shouldn't this be min_t(u16, ...)?
Also, maybe u16 cnt would be simpler.

Hmm.  This is safe but only because of how you call it alongside
handling of a particular Get event records response (which must
have fitted in the mailbox and has a longer header).

Looking at this function in isolation, I think the mailbox could be
small enough that we might not fit 255 records + the header.
Perhaps we need a comment to say that, or at minimum a check and error
return if it won't fit?
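
To make the truncation concern above concrete, here is a minimal userspace
sketch, not the kernel code.  The min_t() macro is a simplified stand-in for
the kernel's, and batch_size_u8() / batch_size_u16() are hypothetical helper
names.  It shows that casting (total - cnt) to u8 before comparing wraps any
value above 255.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace stand-in for the kernel's min_t(): cast, then compare. */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

#define CXL_CLEAR_EVENT_MAX_HANDLES 0xff

/* Batch size as computed in the patch: (total - cnt) is truncated to u8 first. */
static uint8_t batch_size_u8(uint16_t total, int cnt)
{
	assert(cnt >= 0 && cnt <= total);
	return min_t(uint8_t, total - cnt, CXL_CLEAR_EVENT_MAX_HANDLES);
}

/* Batch size with the comparison done in u16, as suggested in the review. */
static uint8_t batch_size_u16(uint16_t total, int cnt)
{
	assert(cnt >= 0 && cnt <= total);
	return min_t(uint16_t, total - cnt, CXL_CLEAR_EVENT_MAX_HANDLES);
}
```

With total = 256 and cnt = 0 the u8 variant yields 0 while the u16 variant
yields 255.  A batch size of 0 would leave cnt unchanged in the outer loop of
the patch as posted.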

> +		int i, rc;
> +
> +		for (i = 0; i < nr_recs; i++, cnt++) {
> +			payload.handle[i] = get_pl->records[cnt].hdr.handle;
> +			dev_dbg(cxlds->dev, "Event log '%s': Clearning %u\n",
> +				cxl_event_log_type_str(log),
> +				le16_to_cpu(payload.handle[i]));
> +		}
> +		payload.nr_recs = nr_recs;
> +
> +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
> +				       &payload, sizeof(payload), NULL, 0);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	return 0;
> +}
> +
>  static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>  				    enum cxl_event_log_type type)
>  {
> @@ -732,13 +769,22 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
This feels misnamed now, but I can't immediately think of a better name, so
it is fine to leave it as is if you don't have a better idea!

>  		}
>  
>  		nr_rec = le16_to_cpu(payload->record_count);
> -		if (trace_cxl_generic_event_enabled()) {
> +		if (nr_rec > 0) {
>  			int i;
>  
> -			for (i = 0; i < nr_rec; i++)
> -				trace_cxl_generic_event(dev_name(cxlds->dev),
> -							type,
> -							&payload->records[i]);
> +			if (trace_cxl_generic_event_enabled()) {
> +				for (i = 0; i < nr_rec; i++)
> +					trace_cxl_generic_event(dev_name(cxlds->dev),
> +								type,
> +								&payload->records[i]);
> +			}
> +
> +			rc = cxl_clear_event_record(cxlds, type, payload, nr_rec);
> +			if (rc) {
> +				dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
> +					cxl_event_log_type_str(type), rc);
> +				return;
> +			}
>  		}

Ira Weiny Dec. 1, 2022, 3:30 p.m. UTC | #2

On Thu, Dec 01, 2022 at 01:26:18PM +0000, Jonathan Cameron wrote:
> On Wed, 30 Nov 2022 16:27:11 -0800
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> > command.  After an event record is read it needs to be cleared from the
> > event log.
> > 
> > Implement cxl_clear_event_record() to clear all records retrieved from
> > the device.
> > 
> > Each record is cleared explicitly.  A clear all bit is specified but
> > events could arrive between a get and any final clear all operation.
> > This means events would be missed.
> > Therefore each event is cleared specifically.
> > 
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> I think there is a type issue in the min_t() calculation; with that addressed,
> this looks good to me.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> > 
> > ---
> > Changes from V1:
> > 	Clear Event Record allows for u8 handles while Get Event Record
> > 	allows for u16 records to be returned.  Based on Jonathan's
> > 	feedback; allow for all event records to be handled in this
> > 	clear.  Which means a double loop with potentially multiple
> > 	Clear Event payloads being sent to clear all events sent.
> > 
> > Changes from RFC:
> > 	Jonathan
> > 		Clean up init of payload and use return code.
> > 		Also report any error to clear the event.
> > 		s/v3.0/rev 3.0
> > ---
> >  drivers/cxl/core/mbox.c      | 61 +++++++++++++++++++++++++++++++-----
> >  drivers/cxl/cxlmem.h         | 14 +++++++++
> >  include/uapi/linux/cxl_mem.h |  1 +
> >  3 files changed, 69 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 70b681027a3d..076a3df0ba38 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -52,6 +52,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> >  #endif
> >  	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> >  	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> > +	CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
> >  	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
> >  	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
> >  	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> > @@ -708,6 +709,42 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
> >  
> > +static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> > +				  enum cxl_event_log_type log,
> > +				  struct cxl_get_event_payload *get_pl,
> > +				  u16 total)
> > +{
> > +	struct cxl_mbox_clear_event_payload payload = {
> > +		.event_log = log,
> > +	};
> > +	int cnt;
> > +
> > +	/*
> > +	 * Clear Event Records uses u8 for the handle cnt while Get Event
> > +	 * Record can return up to 0xffff records.
> > +	 */
> > +	for (cnt = 0; cnt < total; /* cnt incremented internally */) {
> > +		u8 nr_recs = min_t(u8, (total - cnt),
> > +				   CXL_CLEAR_EVENT_MAX_HANDLES);
> 
> I might be half asleep but isn't this assuming that (total - cnt)
> fits in a u8?  Shouldn't this be min_t(u16, ...)?

This cast will ensure the value is never out of range for nr_recs which needs
to be u8 and (total - cnt) will never be negative.

But now you have me second-guessing myself.

> Also, maybe u16 cnt would be simpler.
> 
> Hmm.  This is safe but only because of how you call it alongside
> handling of a particular Get event records response (which must
> have fitted in the mailbox and has a longer header).
> 
> Looking at this function in isolation, I think the mailbox could be
> small enough that we might not fit 255 records + the header.
> Perhaps we need a comment to say that, or at minimum a check and error
> return if it won't fit?

I did not realize that Payload Size applied to input payloads as well.  :-/
There is no check in the send command for that ATM.  Looking at the spec I
think you are right.

I'll further limit the payload size here too.

And with this I might get rid of the min_t() and just cap based on that value.
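
As a rough illustration of capping the batch by the mailbox payload size, here
is a small userspace sketch.  The helper name clear_event_handles_per_cmd() and
its payload_size parameter are assumptions made for this example; the 6-byte
header is taken from the struct in this patch (event_log, clear_flags, nr_recs,
reserved[3]), followed by 2-byte handles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CXL_CLEAR_EVENT_MAX_HANDLES 0xff

/* event_log + clear_flags + nr_recs + reserved[3], per the struct in this patch */
#define CLEAR_EVENT_HDR_SIZE 6

/*
 * Hypothetical helper: number of 2-byte handles that fit in one Clear Event
 * Records input payload of @payload_size bytes, capped at the u8 limit.
 * Returns 0 if not even one handle fits, so the caller can return an error.
 */
static unsigned int clear_event_handles_per_cmd(size_t payload_size)
{
	size_t fit;

	if (payload_size <= CLEAR_EVENT_HDR_SIZE)
		return 0;	/* no room for even one handle */

	fit = (payload_size - CLEAR_EVENT_HDR_SIZE) / sizeof(uint16_t);
	if (fit > CXL_CLEAR_EVENT_MAX_HANDLES)
		fit = CXL_CLEAR_EVENT_MAX_HANDLES;

	assert(fit <= UINT8_MAX);	/* result must fit the u8 nr_recs field */
	return (unsigned int)fit;
}
```

For a 256-byte payload this gives (256 - 6) / 2 = 125 handles per command, so
the 0xff cap only matters once the payload is at least 516 bytes.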

> 
> > +		int i, rc;
> > +
> > +		for (i = 0; i < nr_recs; i++, cnt++) {
> > +			payload.handle[i] = get_pl->records[cnt].hdr.handle;
> > +			dev_dbg(cxlds->dev, "Event log '%s': Clearning %u\n",
> > +				cxl_event_log_type_str(log),
> > +				le16_to_cpu(payload.handle[i]));
> > +		}
> > +		payload.nr_recs = nr_recs;
> > +
> > +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
> > +				       &payload, sizeof(payload), NULL, 0);
> > +		if (rc)
> > +			return rc;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> >  				    enum cxl_event_log_type type)
> >  {
> > @@ -732,13 +769,22 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> This feels misnamed now, but I can't immediately think of a better name, so
> it is fine to leave it as is if you don't have a better idea!

So we leave it.  Naming is hard!  :-D

Thanks for the quick review, V3 coming ASAP.
Ira

Dan Williams Dec. 2, 2022, 2:29 a.m. UTC | #3

ira.weiny@ wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> command.  After an event record is read it needs to be cleared from the
> event log.
> 
> Implement cxl_clear_event_record() to clear all records retrieved from
> the device.
> 
> Each record is cleared explicitly.  A clear all bit is specified but
> events could arrive between a get and any final clear all operation.
> This means events would be missed.
> Therefore each event is cleared specifically.

Note that the spec has a better reason for why Clear All has limited
usage:

"Clear All Events is only allowed when the Event Log has overflowed;
otherwise, the device shall return Invalid Input."

Will need to wait and see if we need that to keep pace with a device
with a high event frequency.

> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from V1:
> 	Clear Event Record allows for u8 handles while Get Event Record
> 	allows for u16 records to be returned.  Based on Jonathan's
> 	feedback; allow for all event records to be handled in this
> 	clear.  Which means a double loop with potentially multiple
> 	Clear Event payloads being sent to clear all events sent.
> 
> Changes from RFC:
> 	Jonathan
> 		Clean up init of payload and use return code.
> 		Also report any error to clear the event.
> 		s/v3.0/rev 3.0
> ---
>  drivers/cxl/core/mbox.c      | 61 +++++++++++++++++++++++++++++++-----
>  drivers/cxl/cxlmem.h         | 14 +++++++++
>  include/uapi/linux/cxl_mem.h |  1 +
>  3 files changed, 69 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 70b681027a3d..076a3df0ba38 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -52,6 +52,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>  #endif
>  	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
>  	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> +	CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
>  	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
>  	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
>  	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> @@ -708,6 +709,42 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>  
> +static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> +				  enum cxl_event_log_type log,
> +				  struct cxl_get_event_payload *get_pl,
> +				  u16 total)
> +{
> +	struct cxl_mbox_clear_event_payload payload = {
> +		.event_log = log,
> +	};
> +	int cnt;
> +
> +	/*
> +	 * Clear Event Records uses u8 for the handle cnt while Get Event
> +	 * Record can return up to 0xffff records.
> +	 */
> +	for (cnt = 0; cnt < total; /* cnt incremented internally */) {
> +		u8 nr_recs = min_t(u8, (total - cnt),
> +				   CXL_CLEAR_EVENT_MAX_HANDLES);

This seems overly complicated. @total is a duplicate of
@get_pl->record_count, and the 2 loops feel like it could be cut
down to one.

> +		int i, rc;
> +
> +		for (i = 0; i < nr_recs; i++, cnt++) {
> +			payload.handle[i] = get_pl->records[cnt].hdr.handle;
> +			dev_dbg(cxlds->dev, "Event log '%s': Clearning %u\n",

While I do think this operation is a mix of clearing and cleaning event
records, I don't think "Clearning" is a word.

> +				cxl_event_log_type_str(log),
> +				le16_to_cpu(payload.handle[i]));
> +		}
> +		payload.nr_recs = nr_recs;
> +
> +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
> +				       &payload, sizeof(payload), NULL, 0);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	return 0;
> +}
> +
>  static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>  				    enum cxl_event_log_type type)
>  {
> @@ -732,13 +769,22 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>  		}
>  
>  		nr_rec = le16_to_cpu(payload->record_count);
> -		if (trace_cxl_generic_event_enabled()) {
> +		if (nr_rec > 0) {
>  			int i;
>  
> -			for (i = 0; i < nr_rec; i++)
> -				trace_cxl_generic_event(dev_name(cxlds->dev),
> -							type,
> -							&payload->records[i]);
> +			if (trace_cxl_generic_event_enabled()) {

Again, trace_cxl_generic_event_enabled() injects some awkward
formatting here to micro-optimize looping. Any performance benefit this
code might offer is likely offset by the extra human effort to read it.

> +				for (i = 0; i < nr_rec; i++)
> +					trace_cxl_generic_event(dev_name(cxlds->dev),
> +								type,
> +								&payload->records[i]);
> +			}
> +
> +			rc = cxl_clear_event_record(cxlds, type, payload, nr_rec);
> +			if (rc) {
> +				dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
> +					cxl_event_log_type_str(type), rc);
> +				return;
> +			}
>  		}
>  
>  		if (trace_cxl_overflow_enabled() &&
> @@ -780,10 +826,11 @@ static struct cxl_get_event_payload *alloc_event_buf(struct cxl_dev_state *cxlds
>   * cxl_mem_get_event_records - Get Event Records from the device
>   * @cxlds: The device data for the operation
>   *
> - * Retrieve all event records available on the device and report them as trace
> - * events.
> + * Retrieve all event records available on the device, report them as trace
> + * events, and clear them.
>   *
>   * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
> + * See CXL rev 3.0 @8.2.9.2.3 Clear Event Records
>   */
>  void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
>  {
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 55d57f5a64bc..1ae9962c5a06 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -261,6 +261,7 @@ enum cxl_opcode {
>  	CXL_MBOX_OP_INVALID		= 0x0000,
>  	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
>  	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
> +	CXL_MBOX_OP_CLEAR_EVENT_RECORD	= 0x0101,
>  	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
>  	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
>  	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> @@ -396,6 +397,19 @@ static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
>  	return "<unknown>";
>  }
>  
> +/*
> + * Clear Event Records input payload
> + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> + */
> +#define CXL_CLEAR_EVENT_MAX_HANDLES (0xff)
> +struct cxl_mbox_clear_event_payload {
> +	u8 event_log;		/* enum cxl_event_log_type */
> +	u8 clear_flags;
> +	u8 nr_recs;
> +	u8 reserved[3];
> +	__le16 handle[CXL_CLEAR_EVENT_MAX_HANDLES];
> +};
> +
>  struct cxl_mbox_get_partition_info {
>  	__le64 active_volatile_cap;
>  	__le64 active_persistent_cap;
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index 70459be5bdd4..7c1ad8062792 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -25,6 +25,7 @@
>  	___C(RAW, "Raw device command"),                                  \
>  	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
>  	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
> +	___C(CLEAR_EVENT_RECORD, "Clear Event Record"),                   \
>  	___C(GET_FW_INFO, "Get FW Info"),                                 \
>  	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
>  	___C(GET_LSA, "Get Label Storage Area"),                          \

Same, "yikes" / "must be at the end of the enum" feedback.

Jonathan Cameron Dec. 2, 2022, 1:18 p.m. UTC | #4

> > +static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> > +				  enum cxl_event_log_type log,
> > +				  struct cxl_get_event_payload *get_pl,
> > +				  u16 total)
> > +{
> > +	struct cxl_mbox_clear_event_payload payload = {
> > +		.event_log = log,
> > +	};
> > +	int cnt;
> > +
> > +	/*
> > +	 * Clear Event Records uses u8 for the handle cnt while Get Event
> > +	 * Record can return up to 0xffff records.
> > +	 */
> > +	for (cnt = 0; cnt < total; /* cnt incremented internally */) {
> > +		u8 nr_recs = min_t(u8, (total - cnt),
> > +				   CXL_CLEAR_EVENT_MAX_HANDLES);  
> 
> This seems overly complicated. @total is a duplicate of
> @get_pl->record_count, and the 2 loops feel like it could be cut
> down to one.


You could do something nasty like
	for (i = 0; i < total; i++) {

		...
		payload.handle[i % CXL_CLEAR_EVENT_MAX_HANDLES] = ...
		if (i % CXL_CLEAR_EVENT_MAX_HANDLES == CXL_CLEAR_EVENT_MAX_HANDLES - 1) {
			send command.
		}
	}

but that looks worse to me than the double loop.

Making outer loop
	for (j = 0; j <= total / CXL_CLEAR_EVENT_MAX_HANDLES; j++)
might be clearer but then you'd have to do
records[j * CXL_CLEAR_EVENT_MAX_HANDLES + i] which isn't nice.

Ah well, Ira gets to try and find a happy compromise.
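
For comparison, one possible single-loop shape is sketched below in userspace
C.  It is only an illustration under stated assumptions: send_clear() and
struct send_log are hypothetical stand-ins for the mailbox send, and
MAX_HANDLES mirrors CXL_CLEAR_EVENT_MAX_HANDLES.  The batch is flushed when it
is full or when the last handle has been added, which also covers a final
partial batch.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_HANDLES 0xff	/* mirrors CXL_CLEAR_EVENT_MAX_HANDLES */

/* Hypothetical stand-in for the mailbox: records what was "sent". */
struct send_log {
	unsigned int nr_cmds;		/* commands issued so far */
	unsigned int last_nr_recs;	/* nr_recs of the most recent command */
	unsigned int total_handles;	/* handles sent across all commands */
};

static int send_clear(struct send_log *log, const uint16_t *handles,
		      uint8_t nr_recs)
{
	(void)handles;
	log->nr_cmds++;
	log->last_nr_recs = nr_recs;
	log->total_handles += nr_recs;
	return 0;
}

/* Single loop: fill the batch, flush when full or on the last handle. */
static int clear_all(struct send_log *log, const uint16_t *handles,
		     uint16_t total)
{
	uint16_t batch[MAX_HANDLES];
	unsigned int i, n = 0;
	int rc;

	assert(handles || total == 0);

	for (i = 0; i < total; i++) {
		batch[n++] = handles[i];
		if (n == MAX_HANDLES || i == total - 1u) {
			rc = send_clear(log, batch, (uint8_t)n);
			if (rc)
				return rc;
			n = 0;
		}
	}
	return 0;
}
```

With 600 handles this issues three commands carrying 255, 255 and 90 handles.
Whether this reads better than the double loop is a matter of taste.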


...

> > diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> > index 70459be5bdd4..7c1ad8062792 100644
> > --- a/include/uapi/linux/cxl_mem.h
> > +++ b/include/uapi/linux/cxl_mem.h
> > @@ -25,6 +25,7 @@
> >  	___C(RAW, "Raw device command"),                                  \
> >  	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
> >  	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
> > +	___C(CLEAR_EVENT_RECORD, "Clear Event Record"),                   \
> >  	___C(GET_FW_INFO, "Get FW Info"),                                 \
> >  	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
> >  	___C(GET_LSA, "Get Label Storage Area"),                          \  
> 
> Same, "yikes" / "must be at the end of the enum" feedback.

Macro magic makes that non-obvious. Not that I've ever said I thought this trick
was a bad idea ;)

Steven Rostedt Dec. 2, 2022, 1:34 p.m. UTC | #5

On Thu, 1 Dec 2022 18:29:20 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> >  static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> >  				    enum cxl_event_log_type type)
> >  {
> > @@ -732,13 +769,22 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> >  		}
> >  
> >  		nr_rec = le16_to_cpu(payload->record_count);
> > -		if (trace_cxl_generic_event_enabled()) {
> > +		if (nr_rec > 0) {
> >  			int i;
> >  
> > -			for (i = 0; i < nr_rec; i++)
> > -				trace_cxl_generic_event(dev_name(cxlds->dev),
> > -							type,
> > -							&payload->records[i]);
> > +			if (trace_cxl_generic_event_enabled()) {  
> 
> Again, trace_cxl_generic_event_enabled() injects some awkward
> formatting here to micro-optimize looping. Any performance benefit this
> code might offer is likely offset by the extra human effort to read it.

This is commonly used throughout the kernel, and highly suggested for use to
encapsulate any work being done only for tracing, when tracing is disabled.
It uses static_branches/jump_labels which makes the loop into a 'nop' when
tracing is off. That is, there is zero overhead for the for loop below (and
there's not even a branch to skip it!)

But sure, if you really don't care as it's not a fast path, then keep it
out. I like people to keep the habit of doing this, because otherwise it
tends to creep into the fast paths.

-- Steve

> 
> > +				for (i = 0; i < nr_rec; i++)
> > +					trace_cxl_generic_event(dev_name(cxlds->dev),
> > +								type,
> > +								&payload->records[i]);
> > +			}
> > +
> > +			rc = cxl_clear_event_record(cxlds, type, payload, nr_rec);
> > +			if (rc) {
> > +				dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
> > +					cxl_event_log_type_str(type), rc);
> > +				return;
> > +			}
> >  		}
> >

Dan Williams Dec. 2, 2022, 7:27 p.m. UTC | #6

Steven Rostedt wrote:
> On Thu, 1 Dec 2022 18:29:20 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > >  static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > >  				    enum cxl_event_log_type type)
> > >  {
> > > @@ -732,13 +769,22 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > >  		}
> > >  
> > >  		nr_rec = le16_to_cpu(payload->record_count);
> > > -		if (trace_cxl_generic_event_enabled()) {
> > > +		if (nr_rec > 0) {
> > >  			int i;
> > >  
> > > -			for (i = 0; i < nr_rec; i++)
> > > -				trace_cxl_generic_event(dev_name(cxlds->dev),
> > > -							type,
> > > -							&payload->records[i]);
> > > +			if (trace_cxl_generic_event_enabled()) {  
> > 
> > Again, trace_cxl_generic_event_enabled() injects some awkward
> > formatting here to micro-optimize looping. Any performance benefit this
> > code might offer is likely offset by the extra human effort to read it.
> 
> This is commonly used throughout the kernel, and highly suggested for use to
> encapsulate any work being done only for tracing, when tracing is disabled.
> It uses static_branches/jump_labels which makes the loop into a 'nop' when
> tracing is off. That is, there is zero overhead for the for loop below (and
> there's not even a branch to skip it!)
> 
> But sure, if you really don't care as it's not a fast path, then keep it
> out. I like people to keep the habit of doing this, because otherwise it
> tends to creep into the fast paths.

Duly noted. It makes a lot of sense when you are tracing in a fast path
to skip any and all preamble code. In this case we are doing it after
doing a whole series of uncached PCI mmio reads with all the stalling
and serialization that implies. 

Speaking of which, this probably wants a cond_resched() after each loop
iteration.

I'll note it is also a tracepoint that is likely to be enabled most of
the time in production.

Ira Weiny Dec. 2, 2022, 9:28 p.m. UTC | #7

On Fri, Dec 02, 2022 at 11:27:07AM -0800, Dan Williams wrote:
> Steven Rostedt wrote:
> > On Thu, 1 Dec 2022 18:29:20 -0800
> > Dan Williams <dan.j.williams@intel.com> wrote:
> > 
> > > >  static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > > >  				    enum cxl_event_log_type type)
> > > >  {
> > > > @@ -732,13 +769,22 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > > >  		}
> > > >  
> > > >  		nr_rec = le16_to_cpu(payload->record_count);
> > > > -		if (trace_cxl_generic_event_enabled()) {
> > > > +		if (nr_rec > 0) {
> > > >  			int i;
> > > >  
> > > > -			for (i = 0; i < nr_rec; i++)
> > > > -				trace_cxl_generic_event(dev_name(cxlds->dev),
> > > > -							type,
> > > > -							&payload->records[i]);
> > > > +			if (trace_cxl_generic_event_enabled()) {  
> > > 
> > > Again, trace_cxl_generic_event_enabled() injects some awkward
> > > formatting here to micro-optimize looping. Any performance benefit this
> > > code might offer is likely offset by the extra human effort to read it.
> > 
> > This is commonly used throughout the kernel, and highly suggested for use to
> > encapsulate any work being done only for tracing, when tracing is disabled.
> > It uses static_branches/jump_labels which makes the loop into a 'nop' when
> > tracing is off. That is, there is zero overhead for the for loop below (and
> > there's not even a branch to skip it!)
> > 
> > But sure, if you really don't care as it's not a fast path, then keep it
> > out. I like people to keep the habit of doing this, because otherwise it
> > tends to creep into the fast paths.

Thanks for chiming in here Steven.  I should have pushed back on this.

> 
> Duly noted. It makes a lot of sense when you are tracing in a fast path
> to skip any and all preamble code. In this case we are doing it after
> doing a whole series of uncached PCI mmio reads with all the stalling
> and serialization that implies. 
> 
> Speaking of which, this probably wants a cond_resched() after each loop
> iteration.
> 
> I'll note it is also a tracepoint that is likely to be enabled most of
> the time in production.

Ok I did not have any of these in there originally and I will remove them now.

Thanks!
Ira

Ira Weiny Dec. 2, 2022, 11:49 p.m. UTC | #8

On Thu, Dec 01, 2022 at 06:29:20PM -0800, Dan Williams wrote:
> ira.weiny@ wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> > command.  After an event record is read it needs to be cleared from the
> > event log.
> > 
> > Implement cxl_clear_event_record() to clear all records retrieved from
> > the device.
> > 
> > Each record is cleared explicitly.  A clear all bit is specified but
> > events could arrive between a get and any final clear all operation.
> > This means events would be missed.
> > Therefore each event is cleared specifically.
> 
> Note that the spec has a better reason for why Clear All has limited
> usage:
> 
> "Clear All Events is only allowed when the Event Log has overflowed;
> otherwise, the device shall return Invalid Input."
> 
> Will need to wait and see if we need that to keep pace with a device
> with a high event frequency.

Perhaps.  But yea I would wait and see.

[snip]

> > +static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> > +				  enum cxl_event_log_type log,
> > +				  struct cxl_get_event_payload *get_pl,
> > +				  u16 total)
> > +{
> > +	struct cxl_mbox_clear_event_payload payload = {
> > +		.event_log = log,
> > +	};
> > +	int cnt;
> > +
> > +	/*
> > +	 * Clear Event Records uses u8 for the handle cnt while Get Event
> > +	 * Record can return up to 0xffff records.
> > +	 */
> > +	for (cnt = 0; cnt < total; /* cnt incremented internally */) {
> > +		u8 nr_recs = min_t(u8, (total - cnt),
> > +				   CXL_CLEAR_EVENT_MAX_HANDLES);
> 
> This seems overly complicated. @total is a duplicate of
> @get_pl->record_count, and the 2 loops feel like it could be cut
> down to one.

Sure, total is redundant to pass to the function.

However, 2 loops is IMO not at all overly complicated.  Note that the 2 loops
do not do the same thing.  The inner loop is filling in the payload for the
Clear command.  There is really no way around doing this.

Now that I've had time to think about it:

	Are you suggesting we issue a single mailbox command for every handle?

That would be a single loop.  But a lot more mailbox commands.
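
The trade-off can be put in numbers with a small sketch.  The helper name
clear_cmds_needed() is hypothetical; it just computes how many mailbox
commands are needed to clear a given number of handles at a given batch size.

```c
#include <assert.h>

/* Number of Clear Event Records commands needed: ceil(total / per_cmd). */
static unsigned int clear_cmds_needed(unsigned int total, unsigned int per_cmd)
{
	assert(per_cmd > 0);
	return (total + per_cmd - 1) / per_cmd;
}
```

For 600 records, batching 255 handles per command needs 3 commands, while one
handle per command needs 600.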

> 
> > +		int i, rc;
> > +
> > +		for (i = 0; i < nr_recs; i++, cnt++) {
> > +			payload.handle[i] = get_pl->records[cnt].hdr.handle;
> > +			dev_dbg(cxlds->dev, "Event log '%s': Clearning %u\n",
> 
> While I do think this operation is a mix of clearing and cleaning event
> records, I don't think "Clearning" is a word.

LOL...  I'll fix it.  :-D

> 
> > +				cxl_event_log_type_str(log),
> > +				le16_to_cpu(payload.handle[i]));
> > +		}
> > +		payload.nr_recs = nr_recs;
> > +
> > +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
> > +				       &payload, sizeof(payload), NULL, 0);
> > +		if (rc)
> > +			return rc;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> >  				    enum cxl_event_log_type type)
> >  {
> > @@ -732,13 +769,22 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> >  		}
> >  
> >  		nr_rec = le16_to_cpu(payload->record_count);
> > -		if (trace_cxl_generic_event_enabled()) {
> > +		if (nr_rec > 0) {
> >  			int i;
> >  
> > -			for (i = 0; i < nr_rec; i++)
> > -				trace_cxl_generic_event(dev_name(cxlds->dev),
> > -							type,
> > -							&payload->records[i]);
> > +			if (trace_cxl_generic_event_enabled()) {
> 
> Again, trace_cxl_generic_event_enabled() injects some awkward
> formatting here to micro-optimize looping. Any performance benefit this
> code might offer is likely offset by the extra human effort to read it.

Agreed.  Gone.

> 
> > +				for (i = 0; i < nr_rec; i++)
> > +					trace_cxl_generic_event(dev_name(cxlds->dev),
> > +								type,
> > +								&payload->records[i]);
> > +			}
> > +
> > +			rc = cxl_clear_event_record(cxlds, type, payload, nr_rec);
> > +			if (rc) {
> > +				dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
> > +					cxl_event_log_type_str(type), rc);
> > +				return;
> > +			}
> >  		}
> >  
> >  		if (trace_cxl_overflow_enabled() &&
> > @@ -780,10 +826,11 @@ static struct cxl_get_event_payload *alloc_event_buf(struct cxl_dev_state *cxlds
> >   * cxl_mem_get_event_records - Get Event Records from the device
> >   * @cxlds: The device data for the operation
> >   *
> > - * Retrieve all event records available on the device and report them as trace
> > - * events.
> > + * Retrieve all event records available on the device, report them as trace
> > + * events, and clear them.
> >   *
> >   * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
> > + * See CXL rev 3.0 @8.2.9.2.3 Clear Event Records
> >   */
> >  void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> >  {
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 55d57f5a64bc..1ae9962c5a06 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -261,6 +261,7 @@ enum cxl_opcode {
> >  	CXL_MBOX_OP_INVALID		= 0x0000,
> >  	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
> >  	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
> > +	CXL_MBOX_OP_CLEAR_EVENT_RECORD	= 0x0101,
> >  	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
> >  	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
> >  	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> > @@ -396,6 +397,19 @@ static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
> >  	return "<unknown>";
> >  }
> >  
> > +/*
> > + * Clear Event Records input payload
> > + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> > + */
> > +#define CXL_CLEAR_EVENT_MAX_HANDLES (0xff)
> > +struct cxl_mbox_clear_event_payload {
> > +	u8 event_log;		/* enum cxl_event_log_type */
> > +	u8 clear_flags;
> > +	u8 nr_recs;
> > +	u8 reserved[3];
> > +	__le16 handle[CXL_CLEAR_EVENT_MAX_HANDLES];
> > +};
> > +
> >  struct cxl_mbox_get_partition_info {
> >  	__le64 active_volatile_cap;
> >  	__le64 active_persistent_cap;
> > diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> > index 70459be5bdd4..7c1ad8062792 100644
> > --- a/include/uapi/linux/cxl_mem.h
> > +++ b/include/uapi/linux/cxl_mem.h
> > @@ -25,6 +25,7 @@
> >  	___C(RAW, "Raw device command"),                                  \
> >  	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
> >  	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
> > +	___C(CLEAR_EVENT_RECORD, "Clear Event Record"),                   \
> >  	___C(GET_FW_INFO, "Get FW Info"),                                 \
> >  	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
> >  	___C(GET_LSA, "Get Label Storage Area"),                          \
> 
> Same, "yikes" / "must be at the end of the enum" feedback.

Yep,
Ira

Dan Williams Dec. 3, 2022, 1:14 a.m. UTC | #9

Ira Weiny wrote:
> On Thu, Dec 01, 2022 at 06:29:20PM -0800, Dan Williams wrote:
> > ira.weiny@ wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> > > CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> > > command.  After an event record is read it needs to be cleared from the
> > > event log.
> > > 
> > > Implement cxl_clear_event_record() to clear all record retrieved from
> > > the device.
> > > 
> > > Each record is cleared explicitly.  A clear all bit is specified but
> > > events could arrive between a get and any final clear all operation.
> > > This means events would be missed.
> > > Therefore each event is cleared specifically.
> > 
> > Note that the spec has a better reason for why Clear All has limited
> > usage:
> > 
> > "Clear All Events is only allowed when the Event Log has overflowed;
> > otherwise, the device shall return Invalid Input."
> > 
> > Will need to wait and see if we need that to keep pace with a device
> > with a high event frequency.
> 
> Perhaps.  But yea I would wait and see.
> 
> [snip]
> 
> > > +static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> > > +				  enum cxl_event_log_type log,
> > > +				  struct cxl_get_event_payload *get_pl,
> > > +				  u16 total)
> > > +{
> > > +	struct cxl_mbox_clear_event_payload payload = {
> > > +		.event_log = log,
> > > +	};
> > > +	int cnt;
> > > +
> > > +	/*
> > > +	 * Clear Event Records uses u8 for the handle cnt while Get Event
> > > +	 * Record can return up to 0xffff records.
> > > +	 */
> > > +	for (cnt = 0; cnt < total; /* cnt incremented internally */) {
> > > +		u8 nr_recs = min_t(u8, (total - cnt),
> > > +				   CXL_CLEAR_EVENT_MAX_HANDLES);
> > 
> > This seems overly complicated. @total is a duplicate of
> > @get_pl->record_count, and the 2 loops feel like it could be cut
> > down to one.
> 
> Sure, total is redundant to pass to the function.
> 
> However, 2 loops is IMO not at all overly complicated.  Note that the 2 loops
> do not do the same thing.  The inner loop is filling in the payload for the
> Clear command.  There is really no way around doing this.
> 
> Now that I've had time to think about it:
> 
> 	Are you suggesting we issue a single mailbox command for every handle?
> 
> That would be a single loop.  But a lot more mailbox commands.

I was thinking something like this pseudo code

int tosend = le16_to_cpu(get_pl->record_count);
int added = 0;
int i;

for (i = 0; i < tosend; i++) {
	add_to_clear(added++);
	if (added == MAX) {
		send_mailbox();
		added = 0;
	}
}

if (added)
	send_mailbox();

...where it batches and sends every MAX handles (0xff with this patch's
CXL_CLEAR_EVENT_MAX_HANDLES) and one more send afterwards for any stragglers.
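
A compilable version of that single-loop idea might look like the sketch below. It is only an illustration under stated assumptions: `struct clear_payload` is a trimmed stand-in for `struct cxl_mbox_clear_event_payload`, and `send_fn`, `clear_handles()`, `struct send_stats`, and `count_send()` are hypothetical names, with the callback standing in for `cxl_mbox_send_cmd()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HANDLES 0xff	/* mirrors CXL_CLEAR_EVENT_MAX_HANDLES */

static_assert(MAX_HANDLES <= UINT8_MAX, "nr_recs is a u8");

/* Trimmed stand-in for struct cxl_mbox_clear_event_payload. */
struct clear_payload {
	uint8_t nr_recs;
	uint16_t handle[MAX_HANDLES];
};

/* Hypothetical stand-in for cxl_mbox_send_cmd(); returns 0 on success. */
typedef int (*send_fn)(const struct clear_payload *pl, void *ctx);

static int clear_handles(const uint16_t *handles, size_t total,
			 send_fn send, void *ctx)
{
	struct clear_payload pl = { 0 };
	size_t i;
	int rc;

	for (i = 0; i < total; i++) {
		pl.handle[pl.nr_recs++] = handles[i];
		if (pl.nr_recs == MAX_HANDLES) {
			/* Batch is full: send it and start a new one. */
			rc = send(&pl, ctx);
			if (rc)
				return rc;
			pl.nr_recs = 0;
		}
	}

	/* Flush the stragglers that did not fill a whole batch. */
	if (pl.nr_recs)
		return send(&pl, ctx);

	return 0;
}

/* Mock sender for illustration: tallies commands and handles seen. */
struct send_stats {
	size_t cmds;
	size_t handles;
};

static int count_send(const struct clear_payload *pl, void *ctx)
{
	struct send_stats *st = ctx;

	st->cmds++;
	st->handles += pl->nr_recs;
	return 0;
}
```

With the mock sender, clearing 600 handles results in three sends (255, 255, and 90 handles), and a count that is an exact multiple of the batch size never triggers an empty trailing send.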

Ira Weiny Dec. 6, 2022, 7:35 a.m. UTC | #10

On Fri, Dec 02, 2022 at 05:14:27PM -0800, Dan Williams wrote:
> Ira Weiny wrote:
> > On Thu, Dec 01, 2022 at 06:29:20PM -0800, Dan Williams wrote:
> > > ira.weiny@ wrote:
> > > > From: Ira Weiny <ira.weiny@intel.com>
> > > > 
> > > > CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> > > > command.  After an event record is read it needs to be cleared from the
> > > > event log.
> > > > 
> > > > Implement cxl_clear_event_record() to clear all record retrieved from
> > > > the device.
> > > > 
> > > > Each record is cleared explicitly.  A clear all bit is specified but
> > > > events could arrive between a get and any final clear all operation.
> > > > This means events would be missed.
> > > > Therefore each event is cleared specifically.
> > > 
> > > Note that the spec has a better reason for why Clear All has limited
> > > usage:
> > > 
> > > "Clear All Events is only allowed when the Event Log has overflowed;
> > > otherwise, the device shall return Invalid Input."
> > > 
> > > Will need to wait and see if we need that to keep pace with a device
> > > with a high event frequency.
> > 
> > Perhaps.  But yea I would wait and see.
> > 
> > [snip]
> > 
> > > > +static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> > > > +				  enum cxl_event_log_type log,
> > > > +				  struct cxl_get_event_payload *get_pl,
> > > > +				  u16 total)
> > > > +{
> > > > +	struct cxl_mbox_clear_event_payload payload = {
> > > > +		.event_log = log,
> > > > +	};
> > > > +	int cnt;
> > > > +
> > > > +	/*
> > > > +	 * Clear Event Records uses u8 for the handle cnt while Get Event
> > > > +	 * Record can return up to 0xffff records.
> > > > +	 */
> > > > +	for (cnt = 0; cnt < total; /* cnt incremented internally */) {
> > > > +		u8 nr_recs = min_t(u8, (total - cnt),
> > > > +				   CXL_CLEAR_EVENT_MAX_HANDLES);
> > > 
> > > This seems overly complicated. @total is a duplicate of
> > > @get_pl->record_count, and the 2 loops feel like it could be cut
> > > down to one.
> > 
> > Sure, total is redundant to pass to the function.
> > 
> > However, 2 loops is IMO not at all overly complicated.  Note that the 2 loops
> > do not do the same thing.  The inner loop is filling in the payload for the
> > Clear command.  There is really no way around doing this.
> > 
> > Now that I've had time to think about it:
> > 
> > 	Are you suggesting we issue a single mailbox command for every handle?
> > 
> > That would be a single loop.  But a lot more mailbox commands.
> 
> I was thinking something like this pseudo code
> 
> int tosend = le16_to_cpu(get_pl->record_count);
> int added = 0;
> int i;
> 
> for (i = 0; i < tosend; i++) {
> 	add_to_clear(added++);
> 	if (added == MAX) {
> 		send_mailbox();
> 		added = 0;
> 	}
> }
> 
> if (added)
> 	send_mailbox();
> 
> ...where it batches and sends every MAX handles (0xff with this patch's
> CXL_CLEAR_EVENT_MAX_HANDLES) and one more send afterwards for any stragglers.

Ok I'm not convinced it makes that much difference but I don't have the
fortitude to try and look at the assembly to argue...  ;-)

Done.

Ira

Patch

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 70b681027a3d..076a3df0ba38 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -52,6 +52,7 @@  static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
 #endif
 	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
 	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
+	CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
 	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
 	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
 	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
@@ -708,6 +709,42 @@  int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
 
+static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
+				  enum cxl_event_log_type log,
+				  struct cxl_get_event_payload *get_pl,
+				  u16 total)
+{
+	struct cxl_mbox_clear_event_payload payload = {
+		.event_log = log,
+	};
+	int cnt;
+
+	/*
+	 * Clear Event Records uses u8 for the handle cnt while Get Event
+	 * Record can return up to 0xffff records.
+	 */
+	for (cnt = 0; cnt < total; /* cnt incremented internally */) {
+		u8 nr_recs = min_t(u8, (total - cnt),
+				   CXL_CLEAR_EVENT_MAX_HANDLES);
+		int i, rc;
+
+		for (i = 0; i < nr_recs; i++, cnt++) {
+			payload.handle[i] = get_pl->records[cnt].hdr.handle;
+			dev_dbg(cxlds->dev, "Event log '%s': Clearning %u\n",
+				cxl_event_log_type_str(log),
+				le16_to_cpu(payload.handle[i]));
+		}
+		payload.nr_recs = nr_recs;
+
+		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
+				       &payload, sizeof(payload), NULL, 0);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
 static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
 				    enum cxl_event_log_type type)
 {
@@ -732,13 +769,22 @@  static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
 		}
 
 		nr_rec = le16_to_cpu(payload->record_count);
-		if (trace_cxl_generic_event_enabled()) {
+		if (nr_rec > 0) {
 			int i;
 
-			for (i = 0; i < nr_rec; i++)
-				trace_cxl_generic_event(dev_name(cxlds->dev),
-							type,
-							&payload->records[i]);
+			if (trace_cxl_generic_event_enabled()) {
+				for (i = 0; i < nr_rec; i++)
+					trace_cxl_generic_event(dev_name(cxlds->dev),
+								type,
+								&payload->records[i]);
+			}
+
+			rc = cxl_clear_event_record(cxlds, type, payload, nr_rec);
+			if (rc) {
+				dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
+					cxl_event_log_type_str(type), rc);
+				return;
+			}
 		}
 
 		if (trace_cxl_overflow_enabled() &&
@@ -780,10 +826,11 @@  static struct cxl_get_event_payload *alloc_event_buf(struct cxl_dev_state *cxlds
  * cxl_mem_get_event_records - Get Event Records from the device
  * @cxlds: The device data for the operation
  *
- * Retrieve all event records available on the device and report them as trace
- * events.
+ * Retrieve all event records available on the device, report them as trace
+ * events, and clear them.
  *
  * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
+ * See CXL rev 3.0 @8.2.9.2.3 Clear Event Records
  */
 void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
 {
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 55d57f5a64bc..1ae9962c5a06 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -261,6 +261,7 @@  enum cxl_opcode {
 	CXL_MBOX_OP_INVALID		= 0x0000,
 	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
 	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
+	CXL_MBOX_OP_CLEAR_EVENT_RECORD	= 0x0101,
 	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
 	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
 	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
@@ -396,6 +397,19 @@  static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
 	return "<unknown>";
 }
 
+/*
+ * Clear Event Records input payload
+ * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
+ */
+#define CXL_CLEAR_EVENT_MAX_HANDLES (0xff)
+struct cxl_mbox_clear_event_payload {
+	u8 event_log;		/* enum cxl_event_log_type */
+	u8 clear_flags;
+	u8 nr_recs;
+	u8 reserved[3];
+	__le16 handle[CXL_CLEAR_EVENT_MAX_HANDLES];
+};
+
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index 70459be5bdd4..7c1ad8062792 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -25,6 +25,7 @@ 
 	___C(RAW, "Raw device command"),                                  \
 	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
 	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
+	___C(CLEAR_EVENT_RECORD, "Clear Event Record"),                   \
 	___C(GET_FW_INFO, "Get FW Info"),                                 \
 	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
 	___C(GET_LSA, "Get Label Storage Area"),                          \
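
For reference, the layout of the Clear Event Records input payload defined in the cxlmem.h hunk can be checked with a small userspace sketch. This is an illustration only: the kernel types are replaced with `<stdint.h>` equivalents (`uint16_t` in place of `__le16`), and `clear_payload_used()` is a hypothetical helper that does not exist in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLEAR_EVENT_MAX_HANDLES 0xff

/* Userspace mirror of struct cxl_mbox_clear_event_payload. */
struct clear_event_payload {
	uint8_t event_log;
	uint8_t clear_flags;
	uint8_t nr_recs;
	uint8_t reserved[3];
	uint16_t handle[CLEAR_EVENT_MAX_HANDLES];	/* __le16 in the kernel */
};

/* 6-byte header followed by 255 two-byte handles. */
static_assert(sizeof(struct clear_event_payload) == 516,
	      "unexpected padding in payload");

/* Hypothetical helper: bytes occupied by the header plus nr handles. */
static size_t clear_payload_used(uint8_t nr)
{
	return offsetof(struct clear_event_payload, handle) +
	       (size_t)nr * sizeof(uint16_t);
}
```

The patch as posted passes `sizeof(payload)` to `cxl_mbox_send_cmd()` for every command, which is the full 516 bytes in this sketch; the helper only shows how much of that is occupied for a given handle count.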