[GIT,PULL] Compute Express Link (CXL) Fixes for 6.8-rc6

Message ID 65da4a4392d41_2bce929471@dwillia2-mobl3.amr.corp.intel.com.notmuch
State New

Message

Dan Williams Feb. 24, 2024, 7:57 p.m. UTC
  Hi Linus, please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl tags/cxl-fixes-6.8-rc6

...to receive a collection of significant fixes for the CXL subsystem.
The largest change in this set, which bordered on "new development", is
the fix for the fact that the location of the new qos_class attribute
did not match the Documentation. The fix ends up deleting more code than
it adds, and it has a new unit test to backstop basic errors in this
interface going forward. So the "red-diff" and unit test saved it from
the "rip it out and try again" response.

In contrast, the new notification path for firmware reported CXL errors
(CXL CPER notifications) has a locking context bug that cannot be fixed
with a red-diff. Given where the release cycle stands, I am not
comfortable squeezing that fix in during these waning days. So, that
receives the "back it out and try again later" treatment.

There is a regression fix in the code that establishes memory NUMA nodes
for platform CXL regions. That has an ack from x86 folks. There are a
couple more fixups for Linux to understand (reassemble) CXL regions
instantiated by platform firmware. The policy around platforms that do
not match host-physical-address with system-physical-address (i.e.
systems that have an address translation mechanism between the address
range reported in the ACPI CEDT.CFMWS and endpoint decoders) has been
softened to abort driver load rather than tear down the memory range
(which can cause system hangs). Lastly, there is a robustness /
regression fix for cases where the driver would previously continue in
the face of an error, and a fixup for PCI error notification handling.
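
As an illustration of the kind of code the two x86/numa commits in this
pull touch (an address overlap check and a sort compare function), here
is a minimal userspace sketch in C. It is not the kernel patch: struct
blk, blk_overlaps() and blk_ptr_cmp() are hypothetical names invented
for this example, and the half-open [start, end) range convention is an
assumption. It only shows two pitfalls that such helpers are commonly
prone to.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memory block: half-open range [start, end). */
struct blk {
	uint64_t start;
	uint64_t end;
};

/*
 * Two half-open ranges overlap only if each one starts before the other
 * ends. Both conditions must hold, so they are joined with && and not ||.
 */
static bool blk_overlaps(const struct blk *b, uint64_t start, uint64_t end)
{
	return b->start < end && start < b->end;
}

/*
 * qsort() comparator for an array of *pointers* to blocks. qsort() passes
 * pointers to the array elements, so each argument is a pointer to a
 * pointer and needs one extra dereference. The (a > b) - (a < b) form
 * yields -1, 0 or 1 without squeezing a 64-bit difference into an int.
 */
static int blk_ptr_cmp(const void *a, const void *b)
{
	const struct blk *ba = *(const struct blk * const *)a;
	const struct blk *bb = *(const struct blk * const *)b;

	return (ba->start > bb->start) - (ba->start < bb->start);
}
```

For example, with block starts of 0x0 and 0x100000000, a comparator that
returned the 64-bit difference converted to int would typically see 0
and treat the two blocks as equal, while the form above orders them by
start address.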

This has a build success notification from the kbuild-robot. Stephen
noticed that I needed to rebase the cxl/next branch, which pushed out
this pull request by a week, but there have been no other linux-next
reports since.

---

The following changes since commit b401b621758e46812da61fa58a67c3fd8d91de0d:

  Linux 6.8-rc5 (2024-02-18 12:56:25 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl tags/cxl-fixes-6.8-rc6

for you to fetch changes up to 5c6224bfabbf7f3e491c51ab50fd2c6f92ba1141:

  cxl/acpi: Fix load failures due to single window creation failure (2024-02-20 22:58:05 -0800)

----------------------------------------------------------------
cxl fixes for 6.8-rc6

- Fix NUMA initialization from ACPI CEDT.CFMWS

- Fix region assembly failures due to async init order

- Fix / simplify export of qos_class information

- Fix cxl_acpi initialization vs single-window-init failures

- Fix handling of repeated 'pci_channel_io_frozen' notifications

- Work around platforms that violate host-physical-address ==
  system-physical-address assumptions

- Defer CXL CPER notification handling to v6.9

----------------------------------------------------------------
Alison Schofield (4):
      x86/numa: Fix the address overlap check in numa_fill_memblks()
      x86/numa: Fix the sort compare func used in numa_fill_memblks()
      cxl/region: Handle endpoint decoders in cxl_region_find_decoder()
      cxl/region: Allow out of order assembly of autodiscovered regions

Dan Williams (3):
      acpi/ghes: Remove CXL CPER notifications
      Merge branch 'for-6.8/cxl-cper' into for-6.8/cxl
      cxl/acpi: Fix load failures due to single window creation failure

Dave Jiang (4):
      cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct cxl_dpa_perf'
      cxl: Remove unnecessary type cast in cxl_qos_class_verify()
      cxl: Fix sysfs export of qos_class for memdev
      cxl/test: Add support for qos_class checking

Li Ming (1):
      cxl/pci: Skip to handle RAS errors if CXL.mem device is detached

Robert Richter (1):
      cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window

 arch/x86/mm/numa.c            | 21 ++++-------
 drivers/acpi/apei/ghes.c      | 63 -------------------------------
 drivers/cxl/acpi.c            | 46 ++++++++++++++---------
 drivers/cxl/core/cdat.c       | 86 +++++++++++++------------------------------
 drivers/cxl/core/mbox.c       |  4 +-
 drivers/cxl/core/memdev.c     | 63 +++++++++++++++++++++++++++++++
 drivers/cxl/core/pci.c        | 49 ++++++++++++++++--------
 drivers/cxl/core/region.c     | 62 +++++++++++++++++++++++--------
 drivers/cxl/cxl.h             |  2 +
 drivers/cxl/cxlmem.h          | 10 ++---
 drivers/cxl/mem.c             | 56 ----------------------------
 drivers/cxl/pci.c             | 57 +---------------------------
 include/linux/cxl-event.h     | 18 ---------
 include/linux/memblock.h      |  2 +
 mm/memblock.c                 |  5 ++-
 tools/testing/cxl/Kbuild      |  1 +
 tools/testing/cxl/test/cxl.c  | 63 ++++++++++++++++++++++++++-----
 tools/testing/cxl/test/mock.c | 14 +++++++
 tools/testing/cxl/test/mock.h |  1 +
 19 files changed, 289 insertions(+), 334 deletions(-)
  

Comments

pr-tracker-bot@kernel.org Feb. 25, 2024, 12:08 a.m. UTC | #1
The pull request you sent on Sat, 24 Feb 2024 11:57:55 -0800:

> git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl tags/cxl-fixes-6.8-rc6

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/ac389bc0ca56e1a2f92b2a17e58298390a3879a8

Thank you!