[V2,02/11] cxl/mem: Implement Get Event Records command

Message ID 20221201002719.2596558-3-ira.weiny@intel.com
State New
Headers
Series CXL: Process event logs |

Commit Message

Ira Weiny Dec. 1, 2022, 12:27 a.m. UTC
  From: Ira Weiny <ira.weiny@intel.com>

CXL devices have multiple event logs which can be queried for CXL event
records.  Devices are required to support the storage of at least one
event record in each event log type.

Devices track event log overflow by incrementing a counter and tracking
the time of the first and last overflow event seen.

Software queries events via the Get Event Record mailbox command; CXL
rev 3.0 section 8.2.9.2.2.

Issue the Get Event Record mailbox command on driver load.  Trace each
record found with a generic record trace.  Trace any overflow
conditions.

The device can return up to 1MB worth of event records per query.
Allocate a shared large buffer to handle the max number of records based
on the mailbox payload size.

This patch traces a raw event record only and leaves the specific event
record types to subsequent patches.

Macros are created to use for tracing the common CXL Event header
fields.

Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Change from V1:
	Ignore useless More Event Flag
	defer DCD support
	Jonathan
		delete extra blank line
		Use all caps for flags
	Jonathan/Dan/Ira
		Allocate event MB buffer on start up.
	Alison
		s/pl_nr/nr_pl

Change from RFC v2:
	Support reading 3 events at once.
	Reverse Jonathan's suggestion and check for positive number of
		records.  Because the record count may have been
		returned as something > 3 based on what the device
		thinks it can send back even though the core Linux mbox
		processing truncates the data.
	Alison and Dave Jiang
		Change header uuid type to uuid_t for better user space
		processing
	Smita
		Check status reg before reading log.
	Steven
		Prefix all trace points with 'cxl_'
		Use static branch <trace>_enabled() calls
	Jonathan
		s/CXL_EVENT_TYPE_INFO/0
		s/{first,last}/{first,last}_ts
		Remove Reserved field from header
		Fix header issue for cxl_event_log_type_str()

Change from RFC:
	Remove redundant error message in get event records loop
	s/EVENT_RECORD_DATA_LENGTH/CXL_EVENT_RECORD_DATA_LENGTH
	Use hdr_uuid for the header UUID field
	Use cxl_event_log_type_str() for the trace events
	Create macros for the header fields and common entries of each event
	Add reserved buffer output dump
	Report error if event query fails
	Remove unused record_cnt variable
	Steven - reorder overflow record
		Remove NOTE about checkpatch
	Jonathan
		check for exactly 1 record
		s/v3.0/rev 3.0
		Use 3 byte fields for 24bit fields
		Add 3.0 Maintenance Operation Class
		Add Dynamic Capacity log type
		Fix spelling
	Dave Jiang/Dan/Alison
		s/cxl-event/cxl
		trace/events/cxl-events => trace/events/cxl.h
		s/cxl_event_overflow/overflow
		s/cxl_event/generic_event
---
 MAINTAINERS                  |   1 +
 drivers/cxl/core/mbox.c      | 105 +++++++++++++++++++++++++++++
 drivers/cxl/cxl.h            |   7 ++
 drivers/cxl/cxlmem.h         |  72 ++++++++++++++++++++
 include/trace/events/cxl.h   | 126 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/cxl_mem.h |   1 +
 6 files changed, 312 insertions(+)
 create mode 100644 include/trace/events/cxl.h
  

Comments

Jonathan Cameron Dec. 1, 2022, 1:06 p.m. UTC | #1
On Wed, 30 Nov 2022 16:27:10 -0800
ira.weiny@intel.com wrote:

> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL devices have multiple event logs which can be queried for CXL event
> records.  Devices are required to support the storage of at least one
> event record in each event log type.
> 
> Devices track event log overflow by incrementing a counter and tracking
> the time of the first and last overflow event seen.
> 
> Software queries events via the Get Event Record mailbox command; CXL
> rev 3.0 section 8.2.9.2.2.
> 
> Issue the Get Event Record mailbox command on driver load.  Trace each
> record found with a generic record trace.  Trace any overflow
> conditions.
> 
> The device can return up to 1MB worth of event records per query.
> Allocate a shared large buffer to handle the max number of records based
> on the mailbox payload size.
> 
> This patch traces a raw event record only and leaves the specific event
> record types to subsequent patches.
> 
> Macros are created to use for tracing the common CXL Event header
> fields.
> 
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>

Hi Ira,

Looks good to me.  A few trivial suggestions inline. Either way,

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 16176b9278b4..70b681027a3d 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -7,6 +7,9 @@

...

> +
> +static void cxl_mem_free_event_buffer(void *data)
> +{
> +	struct cxl_dev_state *cxlds = data;
> +
> +	kvfree(cxlds->event_buf);

Trivial, but why not just pass in the event_buf?

> +}
> +
> +/*
> + * There is a single buffer for reading event logs from the mailbox.  All logs
> + * share this buffer protected by the cxlds->event_buf_lock.
> + */
> +static struct cxl_get_event_payload *alloc_event_buf(struct cxl_dev_state *cxlds)
> +{
> +	struct cxl_get_event_payload *buf;
> +
> +	dev_dbg(cxlds->dev, "Allocating event buffer size %zu\n",
> +		cxlds->payload_size);
> +
> +	buf = kvmalloc(cxlds->payload_size, GFP_KERNEL);

huh. I assumed there would be a devm_kvmalloc() but apparently not..  Ah well
- whilst it might makes sense to add one, let's not tie that up with this series.

> +	if (buf && devm_add_action_or_reset(cxlds->dev,
> +			cxl_mem_free_event_buffer, cxlds))
> +		return NULL;

Trivial, but I'd go for a more wordy but more conventional pattern of
	if (!buf)
		return NULL;

	if (devm_add_action_or_reset())
		return NULL
	
	return buff;
	
> +	return buf;
> +}
> +

...

> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index cd35f43fedd4..55d57f5a64bc 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -4,6 +4,7 @@
>  #define __CXL_MEM_H__
>  #include <uapi/linux/cxl_mem.h>
>  #include <linux/cdev.h>
> +#include <linux/uuid.h>
>  #include "cxl.h"
>  
>  /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> @@ -250,12 +251,16 @@ struct cxl_dev_state {
>  
>  	bool msi_enabled;
>  
> +	struct cxl_get_event_payload *event_buf;
Whilst it is obvious (and document at point of allocation),
I think one of the static checkers still warns that all locks must
have comments.  Probably easier to add one now than wait for the
inevitable warning report.

> +	struct mutex event_buf_lock;
> +
>  	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
>  };
>
  
Ira Weiny Dec. 1, 2022, 3:10 p.m. UTC | #2
On Thu, Dec 01, 2022 at 01:06:50PM +0000, Jonathan Cameron wrote:
> On Wed, 30 Nov 2022 16:27:10 -0800
> ira.weiny@intel.com wrote:
> 
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > CXL devices have multiple event logs which can be queried for CXL event
> > records.  Devices are required to support the storage of at least one
> > event record in each event log type.
> > 
> > Devices track event log overflow by incrementing a counter and tracking
> > the time of the first and last overflow event seen.
> > 
> > Software queries events via the Get Event Record mailbox command; CXL
> > rev 3.0 section 8.2.9.2.2.
> > 
> > Issue the Get Event Record mailbox command on driver load.  Trace each
> > record found with a generic record trace.  Trace any overflow
> > conditions.
> > 
> > The device can return up to 1MB worth of event records per query.
> > Allocate a shared large buffer to handle the max number of records based
> > on the mailbox payload size.
> > 
> > This patch traces a raw event record only and leaves the specific event
> > record types to subsequent patches.
> > 
> > Macros are created to use for tracing the common CXL Event header
> > fields.
> > 
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> Hi Ira,
> 
> Looks good to me.  A few trivial suggestions inline. Either way,
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> 
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 16176b9278b4..70b681027a3d 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -7,6 +7,9 @@
> 
> ...
> 
> > +
> > +static void cxl_mem_free_event_buffer(void *data)
> > +{
> > +	struct cxl_dev_state *cxlds = data;
> > +
> > +	kvfree(cxlds->event_buf);
> 
> Trivial, but why not just pass in the event_buf?

Just following the pattern that 'cxl_mem_*' functions take a cxlds parameter.
<shrug>

I'm going to leave this because it is tested.

> 
> > +}
> > +
> > +/*
> > + * There is a single buffer for reading event logs from the mailbox.  All logs
> > + * share this buffer protected by the cxlds->event_buf_lock.
> > + */
> > +static struct cxl_get_event_payload *alloc_event_buf(struct cxl_dev_state *cxlds)
> > +{
> > +	struct cxl_get_event_payload *buf;
> > +
> > +	dev_dbg(cxlds->dev, "Allocating event buffer size %zu\n",
> > +		cxlds->payload_size);
> > +
> > +	buf = kvmalloc(cxlds->payload_size, GFP_KERNEL);
> 
> huh. I assumed there would be a devm_kvmalloc() but apparently not..  Ah well

Nope I've learned my lesson and checked first!

> - whilst it might makes sense to add one, let's not tie that up with this series.

Yep I did not want to hold this up for something like that.

> 
> > +	if (buf && devm_add_action_or_reset(cxlds->dev,
> > +			cxl_mem_free_event_buffer, cxlds))
> > +		return NULL;
> 
> Trivial, but I'd go for a more wordy but more conventional pattern of
> 	if (!buf)
> 		return NULL;
> 
> 	if (devm_add_action_or_reset())
> 		return NULL

I've been beat up in the past for not combining statements before.  So I've a
bad habit sometimes.

This pattern is a bit more clear.  Since I'm adding the comment below I'll
change it.

> 	
> 	return buff;
> 	
> > +	return buf;
> > +}
> > +
> 
> ...
> 
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index cd35f43fedd4..55d57f5a64bc 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -4,6 +4,7 @@
> >  #define __CXL_MEM_H__
> >  #include <uapi/linux/cxl_mem.h>
> >  #include <linux/cdev.h>
> > +#include <linux/uuid.h>
> >  #include "cxl.h"
> >  
> >  /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> > @@ -250,12 +251,16 @@ struct cxl_dev_state {
> >  
> >  	bool msi_enabled;
> >  
> > +	struct cxl_get_event_payload *event_buf;
> Whilst it is obvious (and document at point of allocation),
> I think one of the static checkers still warns that all locks must
> have comments.  Probably easier to add one now than wait for the
> inevitable warning report.

Well 0-day did not complain.  :-/  But I know there are other checkers out
there; better to add now, thanks.

Thanks for the review,
Ira

> 
> > +	struct mutex event_buf_lock;
> > +
> >  	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> >  };
> >  
> 
>
  
Steven Rostedt Dec. 1, 2022, 5:38 p.m. UTC | #3
On Wed, 30 Nov 2022 16:27:10 -0800
ira.weiny@intel.com wrote:

>  CONEXANT ACCESSRUNNER USB DRIVER
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 16176b9278b4..70b681027a3d 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -7,6 +7,9 @@
>  #include <cxlmem.h>
>  #include <cxl.h>
>  
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/cxl.h>
> +
>  #include "core.h"
>  
>  static bool cxl_raw_allow_all;
> @@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>  	CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
>  #endif
>  	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> +	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
>  	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
>  	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
>  	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> @@ -704,6 +708,106 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>  
> +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> +				    enum cxl_event_log_type type)
> +{
> +	struct cxl_get_event_payload *payload;
> +	u16 nr_rec;
> +
> +	mutex_lock(&cxlds->event_buf_lock);
> +
> +	payload = cxlds->event_buf;
> +
> +	do {
> +		u8 log_type = type;
> +		int rc;
> +
> +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> +				       &log_type, sizeof(log_type),
> +				       payload, cxlds->payload_size);
> +		if (rc) {
> +			dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> +				cxl_event_log_type_str(type), rc);
> +			goto unlock_buffer;
> +		}
> +
> +		nr_rec = le16_to_cpu(payload->record_count);
> +		if (trace_cxl_generic_event_enabled()) {
> +			int i;
> +
> +			for (i = 0; i < nr_rec; i++)
> +				trace_cxl_generic_event(dev_name(cxlds->dev),
> +							type,
> +							&payload->records[i]);
> +		}
> +
> +		if (trace_cxl_overflow_enabled() &&
> +		    (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> +			trace_cxl_overflow(dev_name(cxlds->dev), type, payload);
> +
> +	} while (nr_rec);
> +
> +unlock_buffer:
> +	mutex_unlock(&cxlds->event_buf_lock);
> +}
> +
> +static void cxl_mem_free_event_buffer(void *data)
> +{
> +	struct cxl_dev_state *cxlds = data;
> +
> +	kvfree(cxlds->event_buf);
> +}
> +
> +/*
> + * There is a single buffer for reading event logs from the mailbox.  All logs
> + * share this buffer protected by the cxlds->event_buf_lock.
> + */
> +static struct cxl_get_event_payload *alloc_event_buf(struct cxl_dev_state *cxlds)
> +{
> +	struct cxl_get_event_payload *buf;
> +
> +	dev_dbg(cxlds->dev, "Allocating event buffer size %zu\n",
> +		cxlds->payload_size);
> +
> +	buf = kvmalloc(cxlds->payload_size, GFP_KERNEL);
> +	if (buf && devm_add_action_or_reset(cxlds->dev,
> +			cxl_mem_free_event_buffer, cxlds))
> +		return NULL;
> +	return buf;
> +}
> +
> +/**
> + * cxl_mem_get_event_records - Get Event Records from the device
> + * @cxlds: The device data for the operation
> + *
> + * Retrieve all event records available on the device and report them as trace
> + * events.
> + *
> + * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
> + */
> +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> +{
> +	u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> +
> +	dev_dbg(cxlds->dev, "Reading event logs: %x\n", status);
> +
> +	if (!cxlds->event_buf) {
> +		cxlds->event_buf = alloc_event_buf(cxlds);
> +		if (WARN_ON_ONCE(!cxlds->event_buf))
> +			return;
> +	}
> +
> +	if (status & CXLDEV_EVENT_STATUS_INFO)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
> +	if (status & CXLDEV_EVENT_STATUS_WARN)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
> +	if (status & CXLDEV_EVENT_STATUS_FAIL)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
> +	if (status & CXLDEV_EVENT_STATUS_FATAL)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
> +
>  /**
>   * cxl_mem_get_partition_info - Get partition info
>   * @cxlds: The device data for the operation
> @@ -846,6 +950,7 @@ struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
>  	}
>  
>  	mutex_init(&cxlds->mbox_mutex);
> +	mutex_init(&cxlds->event_buf_lock);
>  	cxlds->dev = dev;
>  
>  	return cxlds;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index f680450f0b16..d4baae74cd97 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -132,6 +132,13 @@ static inline int ways_to_cxl(unsigned int ways, u8 *iw)
>  #define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3
>  #define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000
>  
> +/* CXL 3.0 8.2.8.3.1 Event Status Register */
> +#define CXLDEV_DEV_EVENT_STATUS_OFFSET		0x00
> +#define CXLDEV_EVENT_STATUS_INFO		BIT(0)
> +#define CXLDEV_EVENT_STATUS_WARN		BIT(1)
> +#define CXLDEV_EVENT_STATUS_FAIL		BIT(2)
> +#define CXLDEV_EVENT_STATUS_FATAL		BIT(3)
> +
>  /* CXL 2.0 8.2.8.4 Mailbox Registers */
>  #define CXLDEV_MBOX_CAPS_OFFSET 0x00
>  #define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index cd35f43fedd4..55d57f5a64bc 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -4,6 +4,7 @@
>  #define __CXL_MEM_H__
>  #include <uapi/linux/cxl_mem.h>
>  #include <linux/cdev.h>
> +#include <linux/uuid.h>
>  #include "cxl.h"
>  
>  /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> @@ -250,12 +251,16 @@ struct cxl_dev_state {
>  
>  	bool msi_enabled;
>  
> +	struct cxl_get_event_payload *event_buf;
> +	struct mutex event_buf_lock;
> +
>  	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
>  };
>  
>  enum cxl_opcode {
>  	CXL_MBOX_OP_INVALID		= 0x0000,
>  	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
> +	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
>  	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
>  	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
>  	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> @@ -325,6 +330,72 @@ struct cxl_mbox_identify {
>  	u8 qos_telemetry_caps;
>  } __packed;
>  
> +/*
> + * Common Event Record Format
> + * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> + */
> +struct cxl_event_record_hdr {
> +	uuid_t id;
> +	u8 length;
> +	u8 flags[3];
> +	__le16 handle;
> +	__le16 related_handle;
> +	__le64 timestamp;
> +	u8 maint_op_class;
> +	u8 reserved[0xf];
> +} __packed;
> +
> +#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
> +struct cxl_event_record_raw {
> +	struct cxl_event_record_hdr hdr;
> +	u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
> +} __packed;
> +
> +/*
> + * Get Event Records output payload
> + * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
> + */
> +#define CXL_GET_EVENT_FLAG_OVERFLOW		BIT(0)
> +#define CXL_GET_EVENT_FLAG_MORE_RECORDS		BIT(1)
> +struct cxl_get_event_payload {
> +	u8 flags;
> +	u8 reserved1;
> +	__le16 overflow_err_count;
> +	__le64 first_overflow_timestamp;
> +	__le64 last_overflow_timestamp;
> +	__le16 record_count;
> +	u8 reserved2[0xa];
> +	struct cxl_event_record_raw records[];
> +} __packed;
> +
> +/*
> + * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
> + */
> +enum cxl_event_log_type {
> +	CXL_EVENT_TYPE_INFO = 0x00,
> +	CXL_EVENT_TYPE_WARN,
> +	CXL_EVENT_TYPE_FAIL,
> +	CXL_EVENT_TYPE_FATAL,
> +	CXL_EVENT_TYPE_MAX
> +};
> +
> +static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
> +{
> +	switch (type) {
> +	case CXL_EVENT_TYPE_INFO:
> +		return "Informational";
> +	case CXL_EVENT_TYPE_WARN:
> +		return "Warning";
> +	case CXL_EVENT_TYPE_FAIL:
> +		return "Failure";
> +	case CXL_EVENT_TYPE_FATAL:
> +		return "Fatal";
> +	default:
> +		break;
> +	}
> +	return "<unknown>";
> +}

So you are using this in a TP_printk() section, which means perf and
trace-cmd have no idea how to parse it. Can I recommend instead having:

#define cxl_event_log_type_str(type)				\
	__print_symbolic(type,					\
		{ CXL_EVENT_TYPE_INFO, "Informational" },	\
		{ CXL_EVENT_TYPE_WARN, "Warning" },		\
		{ CXL_EVENT_TYPE_FAIL, "Failure" },		\
		{ CXL_EVENT_TYPE_FATAL, "Fatal" })

#ifndef CREATE_TRACE_POINTS
static inline const char *__cxl_event_log_type_str(enum cxl_event_log_type type,
			struct trace_print_flags *symbols)
{
	for (; symbols->mask >= 0; symbols++) {
		if (type == symbols->mask)
			return symbols->name;
	}
	return "<unknown>";
}
#define __print_symbolic(value, symbol_array...)			\
	({								\
		static const struct trace_print_flags symbols[] =	\
			{ symbol_array, { -1, NULL }};			\
		__cxl_event_log_type_str(value, symbols);		\
	})
#endif		

Note, I did not even try to compile the above. But it should be close to
working.

This way, the cxl_event_log_type_str() for trace events will be converted
into the __print_symbolic() which can be parsed by perf and trace-cmd. For
all other use cases, it is converted into the function above to return the
string.

-- Steve

> +
>  struct cxl_mbox_get_partition_info {
>  	__le64 active_volatile_cap;
>  	__le64 active_persistent_cap;
> @@ -384,6 +455,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
>  struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
>  void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
>  void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
>  #ifdef CONFIG_CXL_SUSPEND
>  void cxl_mem_active_inc(void);
>  void cxl_mem_active_dec(void);
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> new file mode 100644
> index 000000000000..c03a1a894af8
> --- /dev/null
> +++ b/include/trace/events/cxl.h
> @@ -0,0 +1,126 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM cxl
> +
> +#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
> +#define _CXL_TRACE_EVENTS_H
> +
> +#include <asm-generic/unaligned.h>
> +#include <linux/tracepoint.h>
> +#include <cxlmem.h>
> +
> +TRACE_EVENT(cxl_overflow,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_get_event_payload *payload),
> +
> +	TP_ARGS(dev_name, log, payload),
> +
> +	TP_STRUCT__entry(
> +		__string(dev_name, dev_name)
> +		__field(int, log)
> +		__field(u64, first_ts)
> +		__field(u64, last_ts)
> +		__field(u16, count)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		__entry->log = log;
> +		__entry->count = le16_to_cpu(payload->overflow_err_count);
> +		__entry->first_ts = le64_to_cpu(payload->first_overflow_timestamp);
> +		__entry->last_ts = le64_to_cpu(payload->last_overflow_timestamp);
> +	),
> +
> +	TP_printk("%s: EVENT LOG OVERFLOW log=%s : %u records from %llu to %llu",
> +		__get_str(dev_name), cxl_event_log_type_str(__entry->log),
> +		__entry->count, __entry->first_ts, __entry->last_ts)
> +
> +);
> +
> +/*
> + * Common Event Record Format
> + * CXL 3.0 section 8.2.9.2.1; Table 8-42
> + */
> +#define CXL_EVENT_RECORD_FLAG_PERMANENT		BIT(2)
> +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED	BIT(3)
> +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED	BIT(4)
> +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE	BIT(5)
> +#define show_hdr_flags(flags)	__print_flags(flags, " | ",			   \
> +	{ CXL_EVENT_RECORD_FLAG_PERMANENT,	"PERMANENT_CONDITION"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,	"MAINTENANCE_NEEDED"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"PERFORMANCE_DEGRADED"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"HARDWARE_REPLACEMENT_NEEDED"	}  \
> +)
> +
> +/*
> + * Define macros for the common header of each CXL event.
> + *
> + * Tracepoints using these macros must do 3 things:
> + *
> + *	1) Add CXL_EVT_TP_entry to TP_STRUCT__entry
> + *	2) Use CXL_EVT_TP_fast_assign within TP_fast_assign;
> + *	   pass the dev_name, log, and CXL event header
> + *	3) Use CXL_EVT_TP_printk() instead of TP_printk()
> + *
> + * See the generic_event tracepoint as an example.
> + */
> +#define CXL_EVT_TP_entry					\
> +	__string(dev_name, dev_name)				\
> +	__field(int, log)					\
> +	__field_struct(uuid_t, hdr_uuid)			\
> +	__field(u32, hdr_flags)					\
> +	__field(u16, hdr_handle)				\
> +	__field(u16, hdr_related_handle)			\
> +	__field(u64, hdr_timestamp)				\
> +	__field(u8, hdr_length)					\
> +	__field(u8, hdr_maint_op_class)
> +
> +#define CXL_EVT_TP_fast_assign(dname, l, hdr)					\
> +	__assign_str(dev_name, (dname));					\
> +	__entry->log = (l);							\
> +	memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t));			\
> +	__entry->hdr_length = (hdr).length;					\
> +	__entry->hdr_flags = get_unaligned_le24((hdr).flags);			\
> +	__entry->hdr_handle = le16_to_cpu((hdr).handle);			\
> +	__entry->hdr_related_handle = le16_to_cpu((hdr).related_handle);	\
> +	__entry->hdr_timestamp = le64_to_cpu((hdr).timestamp);			\
> +	__entry->hdr_maint_op_class = (hdr).maint_op_class
> +
> +#define CXL_EVT_TP_printk(fmt, ...) \
> +	TP_printk("%s log=%s : time=%llu uuid=%pUb len=%d flags='%s' "		\
> +		"handle=%x related_handle=%x maint_op_class=%u"			\
> +		" : " fmt,							\
> +		__get_str(dev_name), cxl_event_log_type_str(__entry->log),	\
> +		__entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
> +		show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle,	\
> +		__entry->hdr_related_handle, __entry->hdr_maint_op_class,	\
> +		##__VA_ARGS__)
> +
> +TRACE_EVENT(cxl_generic_event,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_event_record_raw *rec),
> +
> +	TP_ARGS(dev_name, log, rec),
> +
> +	TP_STRUCT__entry(
> +		CXL_EVT_TP_entry
> +		__array(u8, data, CXL_EVENT_RECORD_DATA_LENGTH)
> +	),
> +
> +	TP_fast_assign(
> +		CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
> +		memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
> +	),
> +
> +	CXL_EVT_TP_printk("%s",
> +		__print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
> +);
> +
> +#endif /* _CXL_TRACE_EVENTS_H */
> +
> +/* This part must be outside protection */
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE cxl
> +#include <trace/define_trace.h>
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index c71021a2a9ed..70459be5bdd4 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -24,6 +24,7 @@
>  	___C(IDENTIFY, "Identify Command"),                               \
>  	___C(RAW, "Raw device command"),                                  \
>  	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
> +	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
>  	___C(GET_FW_INFO, "Get FW Info"),                                 \
>  	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
>  	___C(GET_LSA, "Get Label Storage Area"),                          \
  
Ira Weiny Dec. 2, 2022, 12:09 a.m. UTC | #4
On Thu, Dec 01, 2022 at 12:38:49PM -0500, Steven Rostedt wrote:
> On Wed, 30 Nov 2022 16:27:10 -0800
> ira.weiny@intel.com wrote:
> 

[snip]

> > +
> > +/*
> > + * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
> > + */
> > +enum cxl_event_log_type {
> > +	CXL_EVENT_TYPE_INFO = 0x00,
> > +	CXL_EVENT_TYPE_WARN,
> > +	CXL_EVENT_TYPE_FAIL,
> > +	CXL_EVENT_TYPE_FATAL,
> > +	CXL_EVENT_TYPE_MAX
> > +};
> > +
> > +static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
> > +{
> > +	switch (type) {
> > +	case CXL_EVENT_TYPE_INFO:
> > +		return "Informational";
> > +	case CXL_EVENT_TYPE_WARN:
> > +		return "Warning";
> > +	case CXL_EVENT_TYPE_FAIL:
> > +		return "Failure";
> > +	case CXL_EVENT_TYPE_FATAL:
> > +		return "Fatal";
> > +	default:
> > +		break;
> > +	}
> > +	return "<unknown>";
> > +}
> 
> So you are using this in a TP_printk() section, which means perf and
> trace-cmd have no idea how to parse it. Can I recommend instead having:
> 
> #define cxl_event_log_type_str(type)				\
> 	__print_symbolic(type,					\
> 		{ CXL_EVENT_TYPE_INFO, "Informational" },	\
> 		{ CXL_EVENT_TYPE_WARN, "Warning" },		\
> 		{ CXL_EVENT_TYPE_FAIL, "Failure" },		\
> 		{ CXL_EVENT_TYPE_FATAL, "Fatal" })
> 
> #ifndef CREATE_TRACE_POINTS
> static inline const char *__cxl_event_log_type_str(enum cxl_event_log_type type,
> 			struct trace_print_flags *symbols)
> {
> 	for (; symbols->mask >= 0; symbols++) {
> 		if (type == symbols->mask)
> 			return symbols->name;
> 	}
> 	return "<unknown>";
> }
> #define __print_symbolic(value, symbol_array...)			\
> 	({								\
> 		static const struct trace_print_flags symbols[] =	\
> 			{ symbol_array, { -1, NULL }};			\
> 		__cxl_event_log_type_str(value, symbols);		\
> 	})
> #endif		
> 
> Note, I did not even try to compile the above. But it should be close to
> working.

Dropping that into cxlmem.h does not compile.  I've given it another go but
because I use cxl_event_log_type_str() in a file where trace points are used
CREATE_TRACE_POINTS is defined and I get the following error.

|| drivers/cxl/core/mbox.c: In function ‘cxl_mem_get_records_log’:
drivers/cxl/cxlmem.h|386 col 7| error: implicit declaration of function ‘__print_symbolic’; did you mean ‘sprint_symbol’?  [-Werror=implicit-function-declaration]                        
||   386 |       __print_symbolic(type,                            \
||       |       ^~~~~~~~~~~~~~~~

I got it to work with the patch below on top of this one.[3]  But it is kind of
ugly.  The only way I could get __print_symbolic() to be defined was to
redefine it in mbox.c.[1]  Then throw it in it's own header as in [3]

NOTE that patch [2] which I think _should_ work on top of patch [1] does not.
I can't understand why.

> 
> This way, the cxl_event_log_type_str() for trace events will be converted
> into the __print_symbolic() which can be parsed by perf and trace-cmd. For
> all other use cases, it is converted into the function above to return the
> string.

I would like to have this support.  I really tried to share this code a while
back.  What you have is seems nicer than what I remember coming up with but it
is still a bit hacky IMO.  And I'm afraid of how fragile this seems right now.

At this point I'm just going to define cxl_event_log_type_str() separate from
the __print_symbolic() in the trace code.  Comment that they should both be
updated if changed and move forward.

Thanks for the suggestion but I think it is going to be more complicated than
it is worth.  At least for mere mortals such as myself.

Ira

[1]

For mbox.c I have to have the special redefinition of __print_symbolic() in the
c file itself.  Code including cxlmem.h without CREATE_TRACE_POINTS defined
(like pci.c) works just fine with what you had.


commit 43a30047962312be2e532dff542d47a132949c08
Author: Ira Weiny <ira.weiny@intel.com>
Date:   Thu Dec 1 13:50:14 2022 -0800

    squash: Redefine __print_symbolic to have a central define of the log string print function.

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 63ee0fd5f4c2..55938d530a21 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -4,12 +4,30 @@
 #include <linux/security.h>
 #include <linux/debugfs.h>
 #include <linux/mutex.h>
-#include <cxlmem.h>
 #include <cxl.h>
 
 #define CREATE_TRACE_POINTS
+/* Must be after CREATE_TRACE_POINTS */
+#include <cxlmem.h>
 #include <trace/events/cxl.h>
 
+static inline const char *__cxl_event_log_type_str(enum cxl_event_log_type type,
+                      const struct trace_print_flags *symbols)
+{
+      for (; symbols->mask >= 0; symbols++) {
+              if (type == symbols->mask)
+                      return symbols->name;
+      }
+      return "<unknown>";
+}
+
+#define __print_symbolic(value, symbol_array...)			\
+      ({								\
+              static const struct trace_print_flags symbols[] =		\
+                      { symbol_array, { -1, NULL }};			\
+              __cxl_event_log_type_str(value, symbols);			\
+      })
+
 #include "core.h"
 
 static bool cxl_raw_allow_all;
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index a701a2e9bcba..34e7d8ae6cfd 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -381,23 +381,32 @@ enum cxl_event_log_type {
 	CXL_EVENT_TYPE_MAX
 };
 
-static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
+#define cxl_event_log_type_str(type)			\
+      __print_symbolic(type,				\
+              { CXL_EVENT_TYPE_INFO, "Informational" },	\
+              { CXL_EVENT_TYPE_WARN, "Warning" },	\
+              { CXL_EVENT_TYPE_FAIL, "Failure" },	\
+              { CXL_EVENT_TYPE_FATAL, "Fatal" })
+
+#ifndef CREATE_TRACE_POINTS
+static inline const char *__cxl_event_log_type_str(enum cxl_event_log_type type,
+                      const struct trace_print_flags *symbols)
 {
-	switch (type) {
-	case CXL_EVENT_TYPE_INFO:
-		return "Informational";
-	case CXL_EVENT_TYPE_WARN:
-		return "Warning";
-	case CXL_EVENT_TYPE_FAIL:
-		return "Failure";
-	case CXL_EVENT_TYPE_FATAL:
-		return "Fatal";
-	default:
-		break;
-	}
-	return "<unknown>";
+      for (; symbols->mask >= 0; symbols++) {
+              if (type == symbols->mask)
+                      return symbols->name;
+      }
+      return "<unknown>";
 }
 
+#define __print_symbolic(value, symbol_array...)			\
+      ({								\
+              static const struct trace_print_flags symbols[] =		\
+                      { symbol_array, { -1, NULL }};			\
+              __cxl_event_log_type_str(value, symbols);			\
+      })
+#endif
+
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;



[2]

Why can't this work?  Why does the undef of CREATE_TRACE_POINTS not work?

I've also tried to include __cxl_event_log_type_str() and this redefintion of
__print_symbolic() in trace/events/cxl.h and that does not work.

I'm also worried this somehow breaks the support you want.  But I'm still not
sure how the trace headers and multiple passes work.

commit ad08110d2432fb24d4513fe9e75bb9be94870e6f
Author: Ira Weiny <ira.weiny@intel.com>
Date:   Thu Dec 1 15:01:05 2022 -0800

    squash: broken!

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 55938d530a21..04117afe9fbf 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -7,26 +7,10 @@
 #include <cxl.h>
 
 #define CREATE_TRACE_POINTS
-/* Must be after CREATE_TRACE_POINTS */
-#include <cxlmem.h>
 #include <trace/events/cxl.h>
 
-static inline const char *__cxl_event_log_type_str(enum cxl_event_log_type type,
-                      const struct trace_print_flags *symbols)
-{
-      for (; symbols->mask >= 0; symbols++) {
-              if (type == symbols->mask)
-                      return symbols->name;
-      }
-      return "<unknown>";
-}
-
-#define __print_symbolic(value, symbol_array...)			\
-      ({								\
-              static const struct trace_print_flags symbols[] =		\
-                      { symbol_array, { -1, NULL }};			\
-              __cxl_event_log_type_str(value, symbols);			\
-      })
+#undef CREATE_TRACE_POINTS
+#include <cxlmem.h>
 
 #include "core.h"
 



[3]

This seems to be the cleanest thing I have gotten to work.  Work == compiles
and trace points still print the strings.


commit c04f87639164737605c9ff503f8060b901c1b83a
Author: Ira Weiny <ira.weiny@intel.com>
Date:   Thu Dec 1 13:50:14 2022 -0800

    squash: Redefine __print_symbolic to have a central define of the log string print function.

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 63ee0fd5f4c2..31a65106e93c 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -4,12 +4,19 @@
 #include <linux/security.h>
 #include <linux/debugfs.h>
 #include <linux/mutex.h>
-#include <cxlmem.h>
 #include <cxl.h>
 
 #define CREATE_TRACE_POINTS
+/* Must be after CREATE_TRACE_POINTS */
+#include <cxlmem.h>
 #include <trace/events/cxl.h>
 
+/*
+ * Must be included explicitly after trace header
+ * because CREATE_TRACE_POINTS can't be undefined for cxlmem.h???
+ */
+#include <cxl_event_log.h>
+
 #include "core.h"
 
 static bool cxl_raw_allow_all;
diff --git a/drivers/cxl/cxl_event_log.h b/drivers/cxl/cxl_event_log.h
new file mode 100644
index 000000000000..e8357bfeecdf
--- /dev/null
+++ b/drivers/cxl/cxl_event_log.h
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+
+/*
+ * Redefine __print_symbolic() from the trace code to be used in regular C code
+ * This compliments the define of cxl_event_log_type_str() in cxlmem.h
+ */
+static inline const char *__cxl_event_log_type_str(enum cxl_event_log_type type,
+                      const struct trace_print_flags *symbols)
+{
+      for (; symbols->mask >= 0; symbols++) {
+              if (type == symbols->mask)
+                      return symbols->name;
+      }
+      return "<unknown>";
+}
+
+#define __print_symbolic(value, symbol_array...)			\
+      ({								\
+              static const struct trace_print_flags symbols[] =		\
+                      { symbol_array, { -1, NULL }};			\
+              __cxl_event_log_type_str(value, symbols);			\
+      })
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index a701a2e9bcba..15222a0ceb3f 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -381,22 +381,16 @@ enum cxl_event_log_type {
 	CXL_EVENT_TYPE_MAX
 };
 
-static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
-{
-	switch (type) {
-	case CXL_EVENT_TYPE_INFO:
-		return "Informational";
-	case CXL_EVENT_TYPE_WARN:
-		return "Warning";
-	case CXL_EVENT_TYPE_FAIL:
-		return "Failure";
-	case CXL_EVENT_TYPE_FATAL:
-		return "Fatal";
-	default:
-		break;
-	}
-	return "<unknown>";
-}
+#define cxl_event_log_type_str(type)			\
+      __print_symbolic(type,				\
+              { CXL_EVENT_TYPE_INFO, "Informational" },	\
+              { CXL_EVENT_TYPE_WARN, "Warning" },	\
+              { CXL_EVENT_TYPE_FAIL, "Failure" },	\
+              { CXL_EVENT_TYPE_FATAL, "Fatal" })
+
+#ifndef CREATE_TRACE_POINTS
+#include "cxl_event_log.h"
+#endif
 
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
  
Dan Williams Dec. 2, 2022, 1:39 a.m. UTC | #5
ira.weiny@ wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> CXL devices have multiple event logs which can be queried for CXL event
> records.  Devices are required to support the storage of at least one
> event record in each event log type.
> 
> Devices track event log overflow by incrementing a counter and tracking
> the time of the first and last overflow event seen.
> 
> Software queries events via the Get Event Record mailbox command; CXL
> rev 3.0 section 8.2.9.2.2.
> 
> Issue the Get Event Record mailbox command on driver load.  Trace each
> record found with a generic record trace.  Trace any overflow
> conditions.
> 
> The device can return up to 1MB worth of event records per query.
> Allocate a shared large buffer to handle the max number of records based
> on the mailbox payload size.
> 
> This patch traces a raw event record only and leaves the specific event
> record types to subsequent patches.
> 
> Macros are created to use for tracing the common CXL Event header
> fields.
> 
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Change from V1:
> 	Ignore useless More Event Flag
> 	defer DCD support
> 	Jonathan
> 		delete extra blank line
> 		Use all caps for flags
> 	Jonathan/Dan/Ira
> 		Allocate event MB buffer on start up.
> 	Alison
> 		s/pl_nr/nr_pl
> 
> Change from RFC v2:
> 	Support reading 3 events at once.
> 	Reverse Jonathan's suggestion and check for positive number of
> 		records.  Because the record count may have been
> 		returned as something > 3 based on what the device
> 		thinks it can send back even though the core Linux mbox
> 		processing truncates the data.
> 	Alison and Dave Jiang
> 		Change header uuid type to uuid_t for better user space
> 		processing
> 	Smita
> 		Check status reg before reading log.
> 	Steven
> 		Prefix all trace points with 'cxl_'
> 		Use static branch <trace>_enabled() calls
> 	Jonathan
> 		s/CXL_EVENT_TYPE_INFO/0
> 		s/{first,last}/{first,last}_ts
> 		Remove Reserved field from header
> 		Fix header issue for cxl_event_log_type_str()
> 
> Change from RFC:
> 	Remove redundant error message in get event records loop
> 	s/EVENT_RECORD_DATA_LENGTH/CXL_EVENT_RECORD_DATA_LENGTH
> 	Use hdr_uuid for the header UUID field
> 	Use cxl_event_log_type_str() for the trace events
> 	Create macros for the header fields and common entries of each event
> 	Add reserved buffer output dump
> 	Report error if event query fails
> 	Remove unused record_cnt variable
> 	Steven - reorder overflow record
> 		Remove NOTE about checkpatch
> 	Jonathan
> 		check for exactly 1 record
> 		s/v3.0/rev 3.0
> 		Use 3 byte fields for 24bit fields
> 		Add 3.0 Maintenance Operation Class
> 		Add Dynamic Capacity log type
> 		Fix spelling
> 	Dave Jiang/Dan/Alison
> 		s/cxl-event/cxl
> 		trace/events/cxl-events => trace/events/cxl.h
> 		s/cxl_event_overflow/overflow
> 		s/cxl_event/generic_event
> ---
>  MAINTAINERS                  |   1 +
>  drivers/cxl/core/mbox.c      | 105 +++++++++++++++++++++++++++++
>  drivers/cxl/cxl.h            |   7 ++
>  drivers/cxl/cxlmem.h         |  72 ++++++++++++++++++++
>  include/trace/events/cxl.h   | 126 +++++++++++++++++++++++++++++++++++
>  include/uapi/linux/cxl_mem.h |   1 +
>  6 files changed, 312 insertions(+)
>  create mode 100644 include/trace/events/cxl.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ca063a504026..4b7c6e3055c6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5223,6 +5223,7 @@ M:	Dan Williams <dan.j.williams@intel.com>
>  L:	linux-cxl@vger.kernel.org
>  S:	Maintained
>  F:	drivers/cxl/
> +F:	include/trace/events/cxl.h
>  F:	include/uapi/linux/cxl_mem.h
>  
>  CONEXANT ACCESSRUNNER USB DRIVER
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 16176b9278b4..70b681027a3d 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -7,6 +7,9 @@
>  #include <cxlmem.h>
>  #include <cxl.h>
>  
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/cxl.h>
> +
>  #include "core.h"
>  
>  static bool cxl_raw_allow_all;
> @@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
>  	CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
>  #endif
>  	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> +	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
>  	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
>  	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
>  	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),

Similar to this patch:

https://lore.kernel.org/linux-cxl/166993221008.1995348.11651567302609703175.stgit@dwillia2-xfh.jf.intel.com/

CXL_MEM_COMMAND_ID_GET_EVENT_RECORD, should be added to the "always
kernel" / cxlds->exclusive_cmds mask.

> @@ -704,6 +708,106 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>  
> +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> +				    enum cxl_event_log_type type)
> +{
> +	struct cxl_get_event_payload *payload;
> +	u16 nr_rec;
> +
> +	mutex_lock(&cxlds->event_buf_lock);
> +
> +	payload = cxlds->event_buf;
> +
> +	do {
> +		u8 log_type = type;
> +		int rc;
> +
> +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> +				       &log_type, sizeof(log_type),
> +				       payload, cxlds->payload_size);
> +		if (rc) {
> +			dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> +				cxl_event_log_type_str(type), rc);
> +			goto unlock_buffer;
> +		}
> +
> +		nr_rec = le16_to_cpu(payload->record_count);
> +		if (trace_cxl_generic_event_enabled()) {

This feels like a premature micro-optimization as none of this code is
fast path and it is dwarfed by the cost of executing the mailbox
command. I started with trying to reduce the 80 column collision
pressure, but then stepped back even further and asked, why?

> +			int i;
> +
> +			for (i = 0; i < nr_rec; i++)
> +				trace_cxl_generic_event(dev_name(cxlds->dev),
> +							type,
> +							&payload->records[i]);

As far as I can tell trace_cxl_generic_event() always expects a
device-name as its first argument. So why not enforce that with
type-safety?  I.e. I think trace_cxl_generic_event() should take a
"struct device *", not a string unless it is really the case that any
old string will do as the first argument to the trace event. Otherwise
the trace point can do "__string(dev_name, dev_name(dev))", and mandate
that callers pass devices.

> +		}
> +
> +		if (trace_cxl_overflow_enabled() &&
> +		    (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> +			trace_cxl_overflow(dev_name(cxlds->dev), type, payload);
> +
> +	} while (nr_rec);
> +
> +unlock_buffer:
> +	mutex_unlock(&cxlds->event_buf_lock);
> +}
> +
> +static void cxl_mem_free_event_buffer(void *data)
> +{
> +	struct cxl_dev_state *cxlds = data;
> +
> +	kvfree(cxlds->event_buf);
> +}
> +
> +/*
> + * There is a single buffer for reading event logs from the mailbox.  All logs
> + * share this buffer protected by the cxlds->event_buf_lock.
> + */
> +static struct cxl_get_event_payload *alloc_event_buf(struct cxl_dev_state *cxlds)
> +{
> +	struct cxl_get_event_payload *buf;
> +
> +	dev_dbg(cxlds->dev, "Allocating event buffer size %zu\n",
> +		cxlds->payload_size);
> +
> +	buf = kvmalloc(cxlds->payload_size, GFP_KERNEL);
> +	if (buf && devm_add_action_or_reset(cxlds->dev,
> +			cxl_mem_free_event_buffer, cxlds))
> +		return NULL;
> +	return buf;
> +}
> +
> +/**
> + * cxl_mem_get_event_records - Get Event Records from the device
> + * @cxlds: The device data for the operation
> + *
> + * Retrieve all event records available on the device and report them as trace
> + * events.
> + *
> + * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
> + */
> +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> +{
> +	u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> +
> +	dev_dbg(cxlds->dev, "Reading event logs: %x\n", status);
> +
> +	if (!cxlds->event_buf) {
> +		cxlds->event_buf = alloc_event_buf(cxlds);
> +		if (WARN_ON_ONCE(!cxlds->event_buf))
> +			return;
> +	}

What's the point of having an event_buf_lock if event_buf is reallocated
every call?

Just allocate event_buf once at the beginning of time during init if the
device supports event log retrieval, and fail the driver load if that
allocation fails. No runtime WARN() for memory allocation.

I notice this patch does not clear events, I trust that comes later in
the series, but I think it belongs here to make this patch a complete
standalone thought.

> +	if (status & CXLDEV_EVENT_STATUS_INFO)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
> +	if (status & CXLDEV_EVENT_STATUS_WARN)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
> +	if (status & CXLDEV_EVENT_STATUS_FAIL)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
> +	if (status & CXLDEV_EVENT_STATUS_FATAL)
> +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);

This retrieval order should be flipped. If there is a FATAL pending I
expect a monitor wants that first and would be happy to parse the INFO
later. I would go so far as to say that if the INFO logger is looping
and a FATAL comes in the driver should get that out first before going
back for more INFO logs. That would mean executing Clear Events and
looping through the logs by priority until all the status bits fall
silent inside cxl_mem_get_records_log().

> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
> +
>  /**
>   * cxl_mem_get_partition_info - Get partition info
>   * @cxlds: The device data for the operation
> @@ -846,6 +950,7 @@ struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
>  	}
>  
>  	mutex_init(&cxlds->mbox_mutex);
> +	mutex_init(&cxlds->event_buf_lock);
>  	cxlds->dev = dev;
>  
>  	return cxlds;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index f680450f0b16..d4baae74cd97 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -132,6 +132,13 @@ static inline int ways_to_cxl(unsigned int ways, u8 *iw)
>  #define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3
>  #define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000
>  
> +/* CXL 3.0 8.2.8.3.1 Event Status Register */
> +#define CXLDEV_DEV_EVENT_STATUS_OFFSET		0x00
> +#define CXLDEV_EVENT_STATUS_INFO		BIT(0)
> +#define CXLDEV_EVENT_STATUS_WARN		BIT(1)
> +#define CXLDEV_EVENT_STATUS_FAIL		BIT(2)
> +#define CXLDEV_EVENT_STATUS_FATAL		BIT(3)
> +
>  /* CXL 2.0 8.2.8.4 Mailbox Registers */
>  #define CXLDEV_MBOX_CAPS_OFFSET 0x00
>  #define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index cd35f43fedd4..55d57f5a64bc 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -4,6 +4,7 @@
>  #define __CXL_MEM_H__
>  #include <uapi/linux/cxl_mem.h>
>  #include <linux/cdev.h>
> +#include <linux/uuid.h>
>  #include "cxl.h"
>  
>  /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> @@ -250,12 +251,16 @@ struct cxl_dev_state {
>  
>  	bool msi_enabled;
>  
> +	struct cxl_get_event_payload *event_buf;
> +	struct mutex event_buf_lock;
> +

Missing kdoc.

>  	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
>  };
>  
>  enum cxl_opcode {
>  	CXL_MBOX_OP_INVALID		= 0x0000,
>  	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
> +	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
>  	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
>  	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
>  	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> @@ -325,6 +330,72 @@ struct cxl_mbox_identify {
>  	u8 qos_telemetry_caps;
>  } __packed;
>  
> +/*
> + * Common Event Record Format
> + * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> + */
> +struct cxl_event_record_hdr {
> +	uuid_t id;
> +	u8 length;
> +	u8 flags[3];
> +	__le16 handle;
> +	__le16 related_handle;
> +	__le64 timestamp;
> +	u8 maint_op_class;
> +	u8 reserved[0xf];

Nit, but lets not copy the spec here and just make all the field sizes
decimal. It even saves a character to write 15 instead of 0xf, and @flags
is also decimal.

> +} __packed;
> +
> +#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
> +struct cxl_event_record_raw {
> +	struct cxl_event_record_hdr hdr;
> +	u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
> +} __packed;
> +
> +/*
> + * Get Event Records output payload
> + * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
> + */
> +#define CXL_GET_EVENT_FLAG_OVERFLOW		BIT(0)
> +#define CXL_GET_EVENT_FLAG_MORE_RECORDS		BIT(1)
> +struct cxl_get_event_payload {
> +	u8 flags;
> +	u8 reserved1;
> +	__le16 overflow_err_count;
> +	__le64 first_overflow_timestamp;
> +	__le64 last_overflow_timestamp;
> +	__le16 record_count;
> +	u8 reserved2[0xa];

Same nit.

> +	struct cxl_event_record_raw records[];
> +} __packed;
> +
> +/*
> + * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
> + */
> +enum cxl_event_log_type {
> +	CXL_EVENT_TYPE_INFO = 0x00,
> +	CXL_EVENT_TYPE_WARN,
> +	CXL_EVENT_TYPE_FAIL,
> +	CXL_EVENT_TYPE_FATAL,
> +	CXL_EVENT_TYPE_MAX
> +};
> +
> +static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
> +{
> +	switch (type) {
> +	case CXL_EVENT_TYPE_INFO:
> +		return "Informational";
> +	case CXL_EVENT_TYPE_WARN:
> +		return "Warning";
> +	case CXL_EVENT_TYPE_FAIL:
> +		return "Failure";
> +	case CXL_EVENT_TYPE_FATAL:
> +		return "Fatal";
> +	default:
> +		break;
> +	}
> +	return "<unknown>";
> +}
> +
>  struct cxl_mbox_get_partition_info {
>  	__le64 active_volatile_cap;
>  	__le64 active_persistent_cap;
> @@ -384,6 +455,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
>  struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
>  void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
>  void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
>  #ifdef CONFIG_CXL_SUSPEND
>  void cxl_mem_active_inc(void);
>  void cxl_mem_active_dec(void);
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> new file mode 100644
> index 000000000000..c03a1a894af8
> --- /dev/null
> +++ b/include/trace/events/cxl.h
> @@ -0,0 +1,126 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM cxl
> +
> +#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
> +#define _CXL_TRACE_EVENTS_H
> +
> +#include <asm-generic/unaligned.h>
> +#include <linux/tracepoint.h>
> +#include <cxlmem.h>
> +
> +TRACE_EVENT(cxl_overflow,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_get_event_payload *payload),
> +
> +	TP_ARGS(dev_name, log, payload),
> +
> +	TP_STRUCT__entry(
> +		__string(dev_name, dev_name)
> +		__field(int, log)
> +		__field(u64, first_ts)
> +		__field(u64, last_ts)
> +		__field(u16, count)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		__entry->log = log;
> +		__entry->count = le16_to_cpu(payload->overflow_err_count);
> +		__entry->first_ts = le64_to_cpu(payload->first_overflow_timestamp);
> +		__entry->last_ts = le64_to_cpu(payload->last_overflow_timestamp);
> +	),
> +
> +	TP_printk("%s: EVENT LOG OVERFLOW log=%s : %u records from %llu to %llu",
> +		__get_str(dev_name), cxl_event_log_type_str(__entry->log),
> +		__entry->count, __entry->first_ts, __entry->last_ts)
> +
> +);
> +
> +/*
> + * Common Event Record Format
> + * CXL 3.0 section 8.2.9.2.1; Table 8-42
> + */
> +#define CXL_EVENT_RECORD_FLAG_PERMANENT		BIT(2)
> +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED	BIT(3)
> +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED	BIT(4)
> +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE	BIT(5)
> +#define show_hdr_flags(flags)	__print_flags(flags, " | ",			   \
> +	{ CXL_EVENT_RECORD_FLAG_PERMANENT,	"PERMANENT_CONDITION"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,	"MAINTENANCE_NEEDED"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"PERFORMANCE_DEGRADED"		}, \
> +	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"HARDWARE_REPLACEMENT_NEEDED"	}  \
> +)
> +
> +/*
> + * Define macros for the common header of each CXL event.
> + *
> + * Tracepoints using these macros must do 3 things:
> + *
> + *	1) Add CXL_EVT_TP_entry to TP_STRUCT__entry
> + *	2) Use CXL_EVT_TP_fast_assign within TP_fast_assign;
> + *	   pass the dev_name, log, and CXL event header
> + *	3) Use CXL_EVT_TP_printk() instead of TP_printk()
> + *
> + * See the generic_event tracepoint as an example.
> + */
> +#define CXL_EVT_TP_entry					\
> +	__string(dev_name, dev_name)				\
> +	__field(int, log)					\
> +	__field_struct(uuid_t, hdr_uuid)			\
> +	__field(u32, hdr_flags)					\
> +	__field(u16, hdr_handle)				\
> +	__field(u16, hdr_related_handle)			\
> +	__field(u64, hdr_timestamp)				\
> +	__field(u8, hdr_length)					\
> +	__field(u8, hdr_maint_op_class)
> +
> +#define CXL_EVT_TP_fast_assign(dname, l, hdr)					\
> +	__assign_str(dev_name, (dname));					\
> +	__entry->log = (l);							\
> +	memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t));			\
> +	__entry->hdr_length = (hdr).length;					\
> +	__entry->hdr_flags = get_unaligned_le24((hdr).flags);			\
> +	__entry->hdr_handle = le16_to_cpu((hdr).handle);			\
> +	__entry->hdr_related_handle = le16_to_cpu((hdr).related_handle);	\
> +	__entry->hdr_timestamp = le64_to_cpu((hdr).timestamp);			\
> +	__entry->hdr_maint_op_class = (hdr).maint_op_class
> +
> +#define CXL_EVT_TP_printk(fmt, ...) \
> +	TP_printk("%s log=%s : time=%llu uuid=%pUb len=%d flags='%s' "		\
> +		"handle=%x related_handle=%x maint_op_class=%u"			\
> +		" : " fmt,							\
> +		__get_str(dev_name), cxl_event_log_type_str(__entry->log),	\
> +		__entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
> +		show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle,	\
> +		__entry->hdr_related_handle, __entry->hdr_maint_op_class,	\
> +		##__VA_ARGS__)
> +
> +TRACE_EVENT(cxl_generic_event,
> +
> +	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> +		 struct cxl_event_record_raw *rec),
> +
> +	TP_ARGS(dev_name, log, rec),
> +
> +	TP_STRUCT__entry(
> +		CXL_EVT_TP_entry
> +		__array(u8, data, CXL_EVENT_RECORD_DATA_LENGTH)
> +	),
> +
> +	TP_fast_assign(
> +		CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
> +		memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
> +	),
> +
> +	CXL_EVT_TP_printk("%s",
> +		__print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
> +);
> +
> +#endif /* _CXL_TRACE_EVENTS_H */
> +
> +/* This part must be outside protection */
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE cxl
> +#include <trace/define_trace.h>
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index c71021a2a9ed..70459be5bdd4 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -24,6 +24,7 @@
>  	___C(IDENTIFY, "Identify Command"),                               \
>  	___C(RAW, "Raw device command"),                                  \
>  	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
> +	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
>  	___C(GET_FW_INFO, "Get FW Info"),                                 \
>  	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
>  	___C(GET_LSA, "Get Label Storage Area"),                          \

Yikes, no, this is an enum. New commands need to come at the end
otherwise different kernels will disagree about the command numbering.
Likely needs a comment here alerting future devs about the ABI breakage
danger here.
  
Steven Rostedt Dec. 2, 2022, 4:40 a.m. UTC | #6
On Thu, 1 Dec 2022 16:09:17 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> Dropping that into cxlmem.h does not compile.  I've given it another go but
> because I use cxl_event_log_type_str() in a file where trace points are used
> CREATE_TRACE_POINTS is defined and I get the following error.
> 
> || drivers/cxl/core/mbox.c: In function ‘cxl_mem_get_records_log’:
> drivers/cxl/cxlmem.h|386 col 7| error: implicit declaration of function ‘__print_symbolic’; did you mean ‘sprint_symbol’?  [-Werror=implicit-function-declaration]                        
> ||   386 |       __print_symbolic(type,                            \
> ||       |       ^~~~~~~~~~~~~~~~
> 
> I got it to work with the patch below on top of this one.[3]  But it is kind of
> ugly.  The only way I could get __print_symbolic() to be defined was to
> redefine it in mbox.c.[1]  Then throw it in it's own header as in [3]

I played around a bit, and with the below patch, you can just have:


#define cxl_event_log_type_str(type)				\
	__print_symbolic(type,					\
		{ CXL_EVENT_TYPE_INFO, "Informational" },	\
		{ CXL_EVENT_TYPE_WARN, "Warning" },		\
		{ CXL_EVENT_TYPE_FAIL, "Failure" },		\
		{ CXL_EVENT_TYPE_FATAL, "Fatal" })

And everything else should "just work" :-)

I can work on a more formal patch if this works for you. And thinking about
this, perhaps we could add this throughout the kernel!

-- Steve

diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index 00723935dcc7..ee41057674a2 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -132,4 +132,25 @@
 /* We may be processing more files */
 #define CREATE_TRACE_POINTS
 
+#ifndef __DEFINE_PRINT_SYMBOLIC_STR
+#define __DEFINE_PRINT_SYMBOLIC_STR
+static inline const char *
+__print_symbolic_str(int type, struct trace_print_flags *symbols)
+{
+	for (; symbols->name != NULL; symbols++) {
+		if (type == symbols->mask)
+			return symbols->name;
+	}
+	return "<invalid>";
+}
+#endif
+
+#undef __print_symbolic
+#define __print_symbolic(value, symbol_array...)			\
+	({								\
+		static const struct trace_print_flags symbols[] =	\
+			{ symbol_array, { -1, NULL }};			\
+		__print_symbolic_str(value, symbols);			\
+	})
+
 #endif /* CREATE_TRACE_POINTS */
diff --git a/include/trace/stages/stage7_class_define.h b/include/trace/stages/stage7_class_define.h
index 8a7ec24c246d..6fe83397f65d 100644
--- a/include/trace/stages/stage7_class_define.h
+++ b/include/trace/stages/stage7_class_define.h
@@ -6,7 +6,6 @@
 #define __entry REC
 
 #undef __print_flags
-#undef __print_symbolic
 #undef __print_hex
 #undef __print_hex_str
 #undef __get_dynamic_array
  
Steven Rostedt Dec. 2, 2022, 5 a.m. UTC | #7
On Thu, 1 Dec 2022 23:40:52 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> +#undef __print_symbolic
> +#define __print_symbolic(value, symbol_array...)			\
> +	({								\
> +		static const struct trace_print_flags symbols[] =	\
> +			{ symbol_array, { -1, NULL }};			\
> +		__print_symbolic_str(value, symbols);			\
> +	})
> +
>  #endif /* CREATE_TRACE_POINTS */

Bah, I want this outside that #ifdef

> diff --git a/include/trace/stages/stage7_class_define.h b/include/trace/stages/stage7_class_define.h
> index 8a7ec24c246d..6fe83397f65d 100644
> --- a/include/trace/stages/stage7_class_define.h
> +++ b/include/trace/stages/stage7_class_define.h
> @@ -6,7 +6,6 @@

I also don't think I need to touch stage7.

New patch:

diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index 00723935dcc7..9d665f634614 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -133,3 +133,24 @@
 #define CREATE_TRACE_POINTS
 
 #endif /* CREATE_TRACE_POINTS */
+
+#ifndef __DEFINE_PRINT_SYMBOLIC_STR
+#define __DEFINE_PRINT_SYMBOLIC_STR
+static inline const char *
+__print_symbolic_str(int type, struct trace_print_flags *symbols)
+{
+	for (; symbols->name != NULL; symbols++) {
+		if (type == symbols->mask)
+			return symbols->name;
+	}
+	return "<invalid>";
+}
+#endif
+
+#undef __print_symbolic
+#define __print_symbolic(value, symbol_array...)			\
+	({								\
+		static const struct trace_print_flags symbols[] =	\
+			{ symbol_array, { -1, NULL }};			\
+		__print_symbolic_str(value, symbols);			\
+	})
  
Ira Weiny Dec. 2, 2022, 9:31 p.m. UTC | #8
On Fri, Dec 02, 2022 at 12:00:59AM -0500, Steven Rostedt wrote:
> On Thu, 1 Dec 2022 23:40:52 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > +#undef __print_symbolic
> > +#define __print_symbolic(value, symbol_array...)			\
> > +	({								\
> > +		static const struct trace_print_flags symbols[] =	\
> > +			{ symbol_array, { -1, NULL }};			\
> > +		__print_symbolic_str(value, symbols);			\
> > +	})
> > +
> >  #endif /* CREATE_TRACE_POINTS */
> 
> Bah, I want this outside that #ifdef
> 
> > diff --git a/include/trace/stages/stage7_class_define.h b/include/trace/stages/stage7_class_define.h
> > index 8a7ec24c246d..6fe83397f65d 100644
> > --- a/include/trace/stages/stage7_class_define.h
> > +++ b/include/trace/stages/stage7_class_define.h
> > @@ -6,7 +6,6 @@
> 
> I also don't think I need to touch stage7.
> 
> New patch:

I'm still going to defer this but will include a follow up patch to add this
back in after the bulk of the series gets in.

Thanks for helping here.  I still want to understand this all better but I have
to focus on the main features ATM.

Thanks again!
Ira

> 
> diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
> index 00723935dcc7..9d665f634614 100644
> --- a/include/trace/define_trace.h
> +++ b/include/trace/define_trace.h
> @@ -133,3 +133,24 @@
>  #define CREATE_TRACE_POINTS
>  
>  #endif /* CREATE_TRACE_POINTS */
> +
> +#ifndef __DEFINE_PRINT_SYMBOLIC_STR
> +#define __DEFINE_PRINT_SYMBOLIC_STR
> +static inline const char *
> +__print_symbolic_str(int type, struct trace_print_flags *symbols)
> +{
> +	for (; symbols->name != NULL; symbols++) {
> +		if (type == symbols->mask)
> +			return symbols->name;
> +	}
> +	return "<invalid>";
> +}
> +#endif
> +
> +#undef __print_symbolic
> +#define __print_symbolic(value, symbol_array...)			\
> +	({								\
> +		static const struct trace_print_flags symbols[] =	\
> +			{ symbol_array, { -1, NULL }};			\
> +		__print_symbolic_str(value, symbols);			\
> +	})
  
Ira Weiny Dec. 2, 2022, 9:47 p.m. UTC | #9
On Thu, Dec 01, 2022 at 05:39:12PM -0800, Dan Williams wrote:
> ira.weiny@ wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 

[snip]

> >  
> > +#define CREATE_TRACE_POINTS
> > +#include <trace/events/cxl.h>
> > +
> >  #include "core.h"
> >  
> >  static bool cxl_raw_allow_all;
> > @@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> >  	CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
> >  #endif
> >  	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> > +	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> >  	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
> >  	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
> >  	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> 
> Similar to this patch:
> 
> https://lore.kernel.org/linux-cxl/166993221008.1995348.11651567302609703175.stgit@dwillia2-xfh.jf.intel.com/
> 
> CXL_MEM_COMMAND_ID_GET_EVENT_RECORD, should be added to the "always
> kernel" / cxlds->exclusive_cmds mask.

Done for all the commands.  I'll rebase as well before sending this out.

> 
> > @@ -704,6 +708,106 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
> >  
> > +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > +				    enum cxl_event_log_type type)
> > +{
> > +	struct cxl_get_event_payload *payload;
> > +	u16 nr_rec;
> > +
> > +	mutex_lock(&cxlds->event_buf_lock);
> > +
> > +	payload = cxlds->event_buf;
> > +
> > +	do {
> > +		u8 log_type = type;
> > +		int rc;
> > +
> > +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> > +				       &log_type, sizeof(log_type),
> > +				       payload, cxlds->payload_size);
> > +		if (rc) {
> > +			dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> > +				cxl_event_log_type_str(type), rc);
> > +			goto unlock_buffer;
> > +		}
> > +
> > +		nr_rec = le16_to_cpu(payload->record_count);
> > +		if (trace_cxl_generic_event_enabled()) {
> 
> This feels like a premature micro-optimization as none of this code is
> fast path and it is dwarfed by the cost of executing the mailbox
> command. I started with trying to reduce the 80 column collision
> pressure, but then stepped back even further and asked, why?

Because Steven told me to.  :-(  I should have been smarter than that.

> 
> > +			int i;
> > +
> > +			for (i = 0; i < nr_rec; i++)
> > +				trace_cxl_generic_event(dev_name(cxlds->dev),
> > +							type,
> > +							&payload->records[i]);
> 
> As far as I can tell trace_cxl_generic_event() always expects a
> device-name as its first argument. So why not enforce that with
> type-safety?  I.e. I think trace_cxl_generic_event() should take a
> "struct device *", not a string unless it is really the case that any
> old string will do as the first argument to the trace event. Otherwise
> the trace point can do "__string(dev_name, dev_name(dev))", and mandate
> that callers pass devices.

From a trace point view 'any old string' will do.  There was nothing else the
trace needed from struct device so I skipped it.

[snip]

> > +
> > +/**
> > + * cxl_mem_get_event_records - Get Event Records from the device
> > + * @cxlds: The device data for the operation
> > + *
> > + * Retrieve all event records available on the device and report them as trace
> > + * events.
> > + *
> > + * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
> > + */
> > +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> > +{
> > +	u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> > +
> > +	dev_dbg(cxlds->dev, "Reading event logs: %x\n", status);
> > +
> > +	if (!cxlds->event_buf) {
> > +		cxlds->event_buf = alloc_event_buf(cxlds);
> > +		if (WARN_ON_ONCE(!cxlds->event_buf))
> > +			return;
> > +	}
> 
> What's the point of having an event_buf_lock if event_buf is reallocated
> every call?

This is only called on start up.

> 
> Just allocate event_buf once at the beginning of time during init if the
> device supports event log retrieval, and fail the driver load if that
> allocation fails. No runtime WARN() for memory allocation.

It was.  I'll make that more clear in the next series.

> 
> I notice this patch does not clear events, I trust that comes later in
> the series, but I think it belongs here to make this patch a complete
> standalone thought.

Squashed.  But it does make for a large patch.  Which I'm not a fan of for
review.  Lucky that now we have a lot of review on the parts.

> 
> > +	if (status & CXLDEV_EVENT_STATUS_INFO)
> > +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
> > +	if (status & CXLDEV_EVENT_STATUS_WARN)
> > +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
> > +	if (status & CXLDEV_EVENT_STATUS_FAIL)
> > +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
> > +	if (status & CXLDEV_EVENT_STATUS_FATAL)
> > +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
> 
> This retrieval order should be flipped. If there is a FATAL pending I
> expect a monitor wants that first and would be happy to parse the INFO
> later. I would go so far as to say that if the INFO logger is looping
> and a FATAL comes in the driver should get that out first before going
> back for more INFO logs. That would mean executing Clear Events and
> looping through the logs by priority until all the status bits fall
> silent inside cxl_mem_get_records_log().

I'll flip them.  And determine if this is really what we want to do for the
irq.

The issue with the irq handling calling a single function which checks all
status is that we may end up with some odd interrupts doing nothing depending
on racing etc.

A buffer per log would eliminate this to some extent if the message numbers are
not shared.  I don't doubt that vendors are unlikely to burn more than 1
message number so the irq may indeed be shared and serialized anyway.

For simplicity I'll throw them all together.

> 
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
> > +
> >  /**
> >   * cxl_mem_get_partition_info - Get partition info
> >   * @cxlds: The device data for the operation
> > @@ -846,6 +950,7 @@ struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
> >  	}
> >  
> >  	mutex_init(&cxlds->mbox_mutex);
> > +	mutex_init(&cxlds->event_buf_lock);
> >  	cxlds->dev = dev;
> >  
> >  	return cxlds;
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index f680450f0b16..d4baae74cd97 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -132,6 +132,13 @@ static inline int ways_to_cxl(unsigned int ways, u8 *iw)
> >  #define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3
> >  #define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000
> >  
> > +/* CXL 3.0 8.2.8.3.1 Event Status Register */
> > +#define CXLDEV_DEV_EVENT_STATUS_OFFSET		0x00
> > +#define CXLDEV_EVENT_STATUS_INFO		BIT(0)
> > +#define CXLDEV_EVENT_STATUS_WARN		BIT(1)
> > +#define CXLDEV_EVENT_STATUS_FAIL		BIT(2)
> > +#define CXLDEV_EVENT_STATUS_FATAL		BIT(3)
> > +
> >  /* CXL 2.0 8.2.8.4 Mailbox Registers */
> >  #define CXLDEV_MBOX_CAPS_OFFSET 0x00
> >  #define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index cd35f43fedd4..55d57f5a64bc 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -4,6 +4,7 @@
> >  #define __CXL_MEM_H__
> >  #include <uapi/linux/cxl_mem.h>
> >  #include <linux/cdev.h>
> > +#include <linux/uuid.h>
> >  #include "cxl.h"
> >  
> >  /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> > @@ -250,12 +251,16 @@ struct cxl_dev_state {
> >  
> >  	bool msi_enabled;
> >  
> > +	struct cxl_get_event_payload *event_buf;
> > +	struct mutex event_buf_lock;
> > +
> 
> Missing kdoc.

Got it from Jonathan.

> 
> >  	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> >  };
> >  
> >  enum cxl_opcode {
> >  	CXL_MBOX_OP_INVALID		= 0x0000,
> >  	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
> > +	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
> >  	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
> >  	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
> >  	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
> > @@ -325,6 +330,72 @@ struct cxl_mbox_identify {
> >  	u8 qos_telemetry_caps;
> >  } __packed;
> >  
> > +/*
> > + * Common Event Record Format
> > + * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> > + */
> > +struct cxl_event_record_hdr {
> > +	uuid_t id;
> > +	u8 length;
> > +	u8 flags[3];
> > +	__le16 handle;
> > +	__le16 related_handle;
> > +	__le64 timestamp;
> > +	u8 maint_op_class;
> > +	u8 reserved[0xf];
> 
> Nit, but lets not copy the spec here and just make all the field sizes
> decimal. It even saves a character to write 15 instead of 0xf, and @flags
> is also decimal.

Ok.

> 
> > +} __packed;
> > +
> > +#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
> > +struct cxl_event_record_raw {
> > +	struct cxl_event_record_hdr hdr;
> > +	u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
> > +} __packed;
> > +
> > +/*
> > + * Get Event Records output payload
> > + * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
> > + */
> > +#define CXL_GET_EVENT_FLAG_OVERFLOW		BIT(0)
> > +#define CXL_GET_EVENT_FLAG_MORE_RECORDS		BIT(1)
> > +struct cxl_get_event_payload {
> > +	u8 flags;
> > +	u8 reserved1;
> > +	__le16 overflow_err_count;
> > +	__le64 first_overflow_timestamp;
> > +	__le64 last_overflow_timestamp;
> > +	__le16 record_count;
> > +	u8 reserved2[0xa];
> 
> Same nit.

Done.

[snip]

> > +
> > +/* This part must be outside protection */
> > +#undef TRACE_INCLUDE_FILE
> > +#define TRACE_INCLUDE_FILE cxl
> > +#include <trace/define_trace.h>
> > diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> > index c71021a2a9ed..70459be5bdd4 100644
> > --- a/include/uapi/linux/cxl_mem.h
> > +++ b/include/uapi/linux/cxl_mem.h
> > @@ -24,6 +24,7 @@
> >  	___C(IDENTIFY, "Identify Command"),                               \
> >  	___C(RAW, "Raw device command"),                                  \
> >  	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
> > +	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
> >  	___C(GET_FW_INFO, "Get FW Info"),                                 \
> >  	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
> >  	___C(GET_LSA, "Get Label Storage Area"),                          \
> 
> Yikes, no, this is an enum. New commands need to come at the end
> otherwise different kernels will disagree about the command numbering.

Ooops sorry about that.  I somehow thought these were tied to the command
values.

Thanks, Changed for all the commands,

> Likely needs a comment here alerting future devs about the ABI breakage
> danger here.

Additional patch added.

Ira
  
Dan Williams Dec. 3, 2022, 9:33 p.m. UTC | #10
Ira Weiny wrote:
> On Thu, Dec 01, 2022 at 05:39:12PM -0800, Dan Williams wrote:
> > ira.weiny@ wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > > 
> 
> [snip]
> 
> > >  
> > > +#define CREATE_TRACE_POINTS
> > > +#include <trace/events/cxl.h>
> > > +
> > >  #include "core.h"
> > >  
> > >  static bool cxl_raw_allow_all;
> > > @@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> > >  	CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
> > >  #endif
> > >  	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> > > +	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> > >  	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
> > >  	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
> > >  	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> > 
> > Similar to this patch:
> > 
> > https://lore.kernel.org/linux-cxl/166993221008.1995348.11651567302609703175.stgit@dwillia2-xfh.jf.intel.com/
> > 
> > CXL_MEM_COMMAND_ID_GET_EVENT_RECORD, should be added to the "always
> > kernel" / cxlds->exclusive_cmds mask.
> 
> Done for all the commands.  I'll rebase as well before sending this out.
> 
> > 
> > > @@ -704,6 +708,106 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
> > >  }
> > >  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
> > >  
> > > +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > > +				    enum cxl_event_log_type type)
> > > +{
> > > +	struct cxl_get_event_payload *payload;
> > > +	u16 nr_rec;
> > > +
> > > +	mutex_lock(&cxlds->event_buf_lock);
> > > +
> > > +	payload = cxlds->event_buf;
> > > +
> > > +	do {
> > > +		u8 log_type = type;
> > > +		int rc;
> > > +
> > > +		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> > > +				       &log_type, sizeof(log_type),
> > > +				       payload, cxlds->payload_size);
> > > +		if (rc) {
> > > +			dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> > > +				cxl_event_log_type_str(type), rc);
> > > +			goto unlock_buffer;
> > > +		}
> > > +
> > > +		nr_rec = le16_to_cpu(payload->record_count);
> > > +		if (trace_cxl_generic_event_enabled()) {
> > 
> > This feels like a premature micro-optimization as none of this code is
> > fast path and it is dwarfed by the cost of executing the mailbox
> > command. I started with trying to reduce the 80 column collision
> > pressure, but then stepped back even further and asked, why?
> 
> Because Steven told me to.  :-(  I should have been smarter than that.

You did the right thing, I failed to jump in sooner on this set.

> 
> > 
> > > +			int i;
> > > +
> > > +			for (i = 0; i < nr_rec; i++)
> > > +				trace_cxl_generic_event(dev_name(cxlds->dev),
> > > +							type,
> > > +							&payload->records[i]);
> > 
> > As far as I can tell trace_cxl_generic_event() always expects a
> > device-name as its first argument. So why not enforce that with
> > type-safety?  I.e. I think trace_cxl_generic_event() should take a
> > "struct device *", not a string unless it is really the case that any
> > old string will do as the first argument to the trace event. Otherwise
> > the trace point can do "__string(dev_name, dev_name(dev))", and mandate
> > that callers pass devices.
> 
> From a trace point view 'any old string' will do.  There was nothing else the
> trace needed from struct device so I skipped it.

I'd prefer more fine-grained type safety wherever possible.

> 
> [snip]
> 
> > > +
> > > +/**
> > > + * cxl_mem_get_event_records - Get Event Records from the device
> > > + * @cxlds: The device data for the operation
> > > + *
> > > + * Retrieve all event records available on the device and report them as trace
> > > + * events.
> > > + *
> > > + * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
> > > + */
> > > +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> > > +{
> > > +	u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> > > +
> > > +	dev_dbg(cxlds->dev, "Reading event logs: %x\n", status);
> > > +
> > > +	if (!cxlds->event_buf) {
> > > +		cxlds->event_buf = alloc_event_buf(cxlds);
> > > +		if (WARN_ON_ONCE(!cxlds->event_buf))
> > > +			return;
> > > +	}
> > 
> > What's the point of having an event_buf_lock if event_buf is reallocated
> > every call?
> 
> This is only called on start up.

cxl_mem_get_event_records() is called all the time. The place to
allocate buffers attached to 'struct cxl_dev_state' at start up is
cxl_dev_state_create(), or sometime after cxl_enumerate_cmds() if you
want to wait and see if the device supports events and CXL _OSC says the
driver can drive events.

> > Just allocate event_buf once at the beginning of time during init if the
> > device supports event log retrieval, and fail the driver load if that
> > allocation fails. No runtime WARN() for memory allocation.
> 
> It was.  I'll make that more clear in the next series.
> 
> > 
> > I notice this patch does not clear events, I trust that comes later in
> > the series, but I think it belongs here to make this patch a complete
> > standalone thought.
> 
> Squashed.  But it does make for a large patch.  Which I'm not a fan of for
> review.  Lucky that now we have a lot of review on the parts.
> 
> > 
> > > +	if (status & CXLDEV_EVENT_STATUS_INFO)
> > > +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
> > > +	if (status & CXLDEV_EVENT_STATUS_WARN)
> > > +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
> > > +	if (status & CXLDEV_EVENT_STATUS_FAIL)
> > > +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
> > > +	if (status & CXLDEV_EVENT_STATUS_FATAL)
> > > +		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
> > 
> > This retrieval order should be flipped. If there is a FATAL pending I
> > expect a monitor wants that first and would be happy to parse the INFO
> > later. I would go so far as to say that if the INFO logger is looping
> > and a FATAL comes in the driver should get that out first before going
> > back for more INFO logs. That would mean executing Clear Events and
> > looping through the logs by priority until all the status bits fall
> > silent inside cxl_mem_get_records_log().
> 
> I'll flip them.  And determine if this is really what we want to do for the
> irq.
> 
> The issue with the irq handling calling a single function which checks all
> status is that we may end up with some odd interrupts doing nothing depending
> on racing etc.

If an event handler wakes and reads 0-status bits because another
handler did it then return IRQ_HANDLED. You'll have this problem whether
you have a central function or not, because there's only one status
register for multiple sources.
  

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index ca063a504026..4b7c6e3055c6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5223,6 +5223,7 @@  M:	Dan Williams <dan.j.williams@intel.com>
 L:	linux-cxl@vger.kernel.org
 S:	Maintained
 F:	drivers/cxl/
+F:	include/trace/events/cxl.h
 F:	include/uapi/linux/cxl_mem.h
 
 CONEXANT ACCESSRUNNER USB DRIVER
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 16176b9278b4..70b681027a3d 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -7,6 +7,9 @@ 
 #include <cxlmem.h>
 #include <cxl.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/cxl.h>
+
 #include "core.h"
 
 static bool cxl_raw_allow_all;
@@ -48,6 +51,7 @@  static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
 	CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
 #endif
 	CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
+	CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
 	CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
 	CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
 	CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
@@ -704,6 +708,106 @@  int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
 
+static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
+				    enum cxl_event_log_type type)
+{
+	struct cxl_get_event_payload *payload;
+	u16 nr_rec;
+
+	mutex_lock(&cxlds->event_buf_lock);
+
+	payload = cxlds->event_buf;
+
+	do {
+		u8 log_type = type;
+		int rc;
+
+		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
+				       &log_type, sizeof(log_type),
+				       payload, cxlds->payload_size);
+		if (rc) {
+			dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
+				cxl_event_log_type_str(type), rc);
+			goto unlock_buffer;
+		}
+
+		nr_rec = le16_to_cpu(payload->record_count);
+		if (trace_cxl_generic_event_enabled()) {
+			int i;
+
+			for (i = 0; i < nr_rec; i++)
+				trace_cxl_generic_event(dev_name(cxlds->dev),
+							type,
+							&payload->records[i]);
+		}
+
+		if (trace_cxl_overflow_enabled() &&
+		    (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW))
+			trace_cxl_overflow(dev_name(cxlds->dev), type, payload);
+
+	} while (nr_rec);
+
+unlock_buffer:
+	mutex_unlock(&cxlds->event_buf_lock);
+}
+
+static void cxl_mem_free_event_buffer(void *data)
+{
+	struct cxl_dev_state *cxlds = data;
+
+	kvfree(cxlds->event_buf);
+}
+
+/*
+ * There is a single buffer for reading event logs from the mailbox.  All logs
+ * share this buffer protected by the cxlds->event_buf_lock.
+ */
+static struct cxl_get_event_payload *alloc_event_buf(struct cxl_dev_state *cxlds)
+{
+	struct cxl_get_event_payload *buf;
+
+	dev_dbg(cxlds->dev, "Allocating event buffer size %zu\n",
+		cxlds->payload_size);
+
+	buf = kvmalloc(cxlds->payload_size, GFP_KERNEL);
+	if (buf && devm_add_action_or_reset(cxlds->dev,
+			cxl_mem_free_event_buffer, cxlds))
+		return NULL;
+	return buf;
+}
+
+/**
+ * cxl_mem_get_event_records - Get Event Records from the device
+ * @cxlds: The device data for the operation
+ *
+ * Retrieve all event records available on the device and report them as trace
+ * events.
+ *
+ * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
+ */
+void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
+{
+	u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
+
+	dev_dbg(cxlds->dev, "Reading event logs: %x\n", status);
+
+	if (!cxlds->event_buf) {
+		cxlds->event_buf = alloc_event_buf(cxlds);
+		if (WARN_ON_ONCE(!cxlds->event_buf))
+			return;
+	}
+
+	if (status & CXLDEV_EVENT_STATUS_INFO)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
+	if (status & CXLDEV_EVENT_STATUS_WARN)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
+	if (status & CXLDEV_EVENT_STATUS_FAIL)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
+	if (status & CXLDEV_EVENT_STATUS_FATAL)
+		cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
+
 /**
  * cxl_mem_get_partition_info - Get partition info
  * @cxlds: The device data for the operation
@@ -846,6 +950,7 @@  struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
 	}
 
 	mutex_init(&cxlds->mbox_mutex);
+	mutex_init(&cxlds->event_buf_lock);
 	cxlds->dev = dev;
 
 	return cxlds;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index f680450f0b16..d4baae74cd97 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -132,6 +132,13 @@  static inline int ways_to_cxl(unsigned int ways, u8 *iw)
 #define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3
 #define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000
 
+/* CXL 3.0 8.2.8.3.1 Event Status Register */
+#define CXLDEV_DEV_EVENT_STATUS_OFFSET		0x00
+#define CXLDEV_EVENT_STATUS_INFO		BIT(0)
+#define CXLDEV_EVENT_STATUS_WARN		BIT(1)
+#define CXLDEV_EVENT_STATUS_FAIL		BIT(2)
+#define CXLDEV_EVENT_STATUS_FATAL		BIT(3)
+
 /* CXL 2.0 8.2.8.4 Mailbox Registers */
 #define CXLDEV_MBOX_CAPS_OFFSET 0x00
 #define   CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index cd35f43fedd4..55d57f5a64bc 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -4,6 +4,7 @@ 
 #define __CXL_MEM_H__
 #include <uapi/linux/cxl_mem.h>
 #include <linux/cdev.h>
+#include <linux/uuid.h>
 #include "cxl.h"
 
 /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -250,12 +251,16 @@  struct cxl_dev_state {
 
 	bool msi_enabled;
 
+	struct cxl_get_event_payload *event_buf;
+	struct mutex event_buf_lock;
+
 	int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
 };
 
 enum cxl_opcode {
 	CXL_MBOX_OP_INVALID		= 0x0000,
 	CXL_MBOX_OP_RAW			= CXL_MBOX_OP_INVALID,
+	CXL_MBOX_OP_GET_EVENT_RECORD	= 0x0100,
 	CXL_MBOX_OP_GET_FW_INFO		= 0x0200,
 	CXL_MBOX_OP_ACTIVATE_FW		= 0x0202,
 	CXL_MBOX_OP_GET_SUPPORTED_LOGS	= 0x0400,
@@ -325,6 +330,72 @@  struct cxl_mbox_identify {
 	u8 qos_telemetry_caps;
 } __packed;
 
+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_hdr {
+	uuid_t id;
+	u8 length;
+	u8 flags[3];
+	__le16 handle;
+	__le16 related_handle;
+	__le64 timestamp;
+	u8 maint_op_class;
+	u8 reserved[0xf];
+} __packed;
+
+#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
+struct cxl_event_record_raw {
+	struct cxl_event_record_hdr hdr;
+	u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
+} __packed;
+
+/*
+ * Get Event Records output payload
+ * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
+ */
+#define CXL_GET_EVENT_FLAG_OVERFLOW		BIT(0)
+#define CXL_GET_EVENT_FLAG_MORE_RECORDS		BIT(1)
+struct cxl_get_event_payload {
+	u8 flags;
+	u8 reserved1;
+	__le16 overflow_err_count;
+	__le64 first_overflow_timestamp;
+	__le64 last_overflow_timestamp;
+	__le16 record_count;
+	u8 reserved2[0xa];
+	struct cxl_event_record_raw records[];
+} __packed;
+
+/*
+ * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
+ */
+enum cxl_event_log_type {
+	CXL_EVENT_TYPE_INFO = 0x00,
+	CXL_EVENT_TYPE_WARN,
+	CXL_EVENT_TYPE_FAIL,
+	CXL_EVENT_TYPE_FATAL,
+	CXL_EVENT_TYPE_MAX
+};
+
+static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
+{
+	switch (type) {
+	case CXL_EVENT_TYPE_INFO:
+		return "Informational";
+	case CXL_EVENT_TYPE_WARN:
+		return "Warning";
+	case CXL_EVENT_TYPE_FAIL:
+		return "Failure";
+	case CXL_EVENT_TYPE_FATAL:
+		return "Fatal";
+	default:
+		break;
+	}
+	return "<unknown>";
+}
+
 struct cxl_mbox_get_partition_info {
 	__le64 active_volatile_cap;
 	__le64 active_persistent_cap;
@@ -384,6 +455,7 @@  int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
 struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
 void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
 void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
+void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
 #ifdef CONFIG_CXL_SUSPEND
 void cxl_mem_active_inc(void);
 void cxl_mem_active_dec(void);
diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
new file mode 100644
index 000000000000..c03a1a894af8
--- /dev/null
+++ b/include/trace/events/cxl.h
@@ -0,0 +1,126 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM cxl
+
+#if !defined(_CXL_TRACE_EVENTS_H) ||  defined(TRACE_HEADER_MULTI_READ)
+#define _CXL_TRACE_EVENTS_H
+
+#include <asm-generic/unaligned.h>
+#include <linux/tracepoint.h>
+#include <cxlmem.h>
+
+TRACE_EVENT(cxl_overflow,
+
+	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+		 struct cxl_get_event_payload *payload),
+
+	TP_ARGS(dev_name, log, payload),
+
+	TP_STRUCT__entry(
+		__string(dev_name, dev_name)
+		__field(int, log)
+		__field(u64, first_ts)
+		__field(u64, last_ts)
+		__field(u16, count)
+	),
+
+	TP_fast_assign(
+		__assign_str(dev_name, dev_name);
+		__entry->log = log;
+		__entry->count = le16_to_cpu(payload->overflow_err_count);
+		__entry->first_ts = le64_to_cpu(payload->first_overflow_timestamp);
+		__entry->last_ts = le64_to_cpu(payload->last_overflow_timestamp);
+	),
+
+	TP_printk("%s: EVENT LOG OVERFLOW log=%s : %u records from %llu to %llu",
+		__get_str(dev_name), cxl_event_log_type_str(__entry->log),
+		__entry->count, __entry->first_ts, __entry->last_ts)
+
+);
+
+/*
+ * Common Event Record Format
+ * CXL 3.0 section 8.2.9.2.1; Table 8-42
+ */
+#define CXL_EVENT_RECORD_FLAG_PERMANENT		BIT(2)
+#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED	BIT(3)
+#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED	BIT(4)
+#define CXL_EVENT_RECORD_FLAG_HW_REPLACE	BIT(5)
+#define show_hdr_flags(flags)	__print_flags(flags, " | ",			   \
+	{ CXL_EVENT_RECORD_FLAG_PERMANENT,	"PERMANENT_CONDITION"		}, \
+	{ CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,	"MAINTENANCE_NEEDED"		}, \
+	{ CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,	"PERFORMANCE_DEGRADED"		}, \
+	{ CXL_EVENT_RECORD_FLAG_HW_REPLACE,	"HARDWARE_REPLACEMENT_NEEDED"	}  \
+)
+
+/*
+ * Define macros for the common header of each CXL event.
+ *
+ * Tracepoints using these macros must do 3 things:
+ *
+ *	1) Add CXL_EVT_TP_entry to TP_STRUCT__entry
+ *	2) Use CXL_EVT_TP_fast_assign within TP_fast_assign;
+ *	   pass the dev_name, log, and CXL event header
+ *	3) Use CXL_EVT_TP_printk() instead of TP_printk()
+ *
+ * See the generic_event tracepoint as an example.
+ */
+#define CXL_EVT_TP_entry					\
+	__string(dev_name, dev_name)				\
+	__field(int, log)					\
+	__field_struct(uuid_t, hdr_uuid)			\
+	__field(u32, hdr_flags)					\
+	__field(u16, hdr_handle)				\
+	__field(u16, hdr_related_handle)			\
+	__field(u64, hdr_timestamp)				\
+	__field(u8, hdr_length)					\
+	__field(u8, hdr_maint_op_class)
+
+#define CXL_EVT_TP_fast_assign(dname, l, hdr)					\
+	__assign_str(dev_name, (dname));					\
+	__entry->log = (l);							\
+	memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t));			\
+	__entry->hdr_length = (hdr).length;					\
+	__entry->hdr_flags = get_unaligned_le24((hdr).flags);			\
+	__entry->hdr_handle = le16_to_cpu((hdr).handle);			\
+	__entry->hdr_related_handle = le16_to_cpu((hdr).related_handle);	\
+	__entry->hdr_timestamp = le64_to_cpu((hdr).timestamp);			\
+	__entry->hdr_maint_op_class = (hdr).maint_op_class
+
+#define CXL_EVT_TP_printk(fmt, ...) \
+	TP_printk("%s log=%s : time=%llu uuid=%pUb len=%d flags='%s' "		\
+		"handle=%x related_handle=%x maint_op_class=%u"			\
+		" : " fmt,							\
+		__get_str(dev_name), cxl_event_log_type_str(__entry->log),	\
+		__entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
+		show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle,	\
+		__entry->hdr_related_handle, __entry->hdr_maint_op_class,	\
+		##__VA_ARGS__)
+
+TRACE_EVENT(cxl_generic_event,
+
+	TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+		 struct cxl_event_record_raw *rec),
+
+	TP_ARGS(dev_name, log, rec),
+
+	TP_STRUCT__entry(
+		CXL_EVT_TP_entry
+		__array(u8, data, CXL_EVENT_RECORD_DATA_LENGTH)
+	),
+
+	TP_fast_assign(
+		CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
+		memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
+	),
+
+	CXL_EVT_TP_printk("%s",
+		__print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
+);
+
+#endif /* _CXL_TRACE_EVENTS_H */
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE cxl
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index c71021a2a9ed..70459be5bdd4 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -24,6 +24,7 @@ 
 	___C(IDENTIFY, "Identify Command"),                               \
 	___C(RAW, "Raw device command"),                                  \
 	___C(GET_SUPPORTED_LOGS, "Get Supported Logs"),                   \
+	___C(GET_EVENT_RECORD, "Get Event Record"),                       \
 	___C(GET_FW_INFO, "Get FW Info"),                                 \
 	___C(GET_PARTITION_INFO, "Get Partition Information"),            \
 	___C(GET_LSA, "Get Label Storage Area"),                          \