[RFC,0/3] Fixing i915 PMU use after free after driver unbind

Message ID 20240115170120.662220-1-tvrtko.ursulin@linux.intel.com
Headers
Series Fixing i915 PMU use after free after driver unbind |

Message

Tvrtko Ursulin Jan. 15, 2024, 5:01 p.m. UTC
  From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Hi Peter, all,

This is an early RFC to outline a newly discovered problem in the current
handling of driver unbind with active perf fds.

The sequence is basically this:

1. Open a perf fd
2. Unbind a driver
3. Close the dangling fd

Or a slightly more evil variant:

1. Open a perf fd
2. Unbind a driver
3. Bind the driver again
4. Close the dangling fd

I thought we had this covered by recording the unbound status (pmu->closed in
i915_pmu.c) and making sure the struct i915_pmu (and struct perf_pmu) remain
active until the last event is closed (via internal reference counting). But
what I missed until now are two things:

1)
core.c: _free_event() will dereference event->pmu _after_ event->destroy().

KASAN catches this easily and patches 1 & 2 are the attempt to fix that.

2)
A more evil case where pmu->cpu_pmu_context per-cpu allocation gets re-used
_before_ the old perf fd is closed.

There things can nicely explode on the list_del_init inside event_sched_out on
list_del_init(&event->active_list); (with list debugging turned on of course).

Most easily reproducible by simply re-binding i915, which happens to grab the
same per-cpu block and then the new perf_pmu_register zaps the list_head which
the old event will try to unlink itself from.

This is what the third patch attempts to deal with. It is a bit incomplete
though, as I was unsure what is the best approach to fix and so thought to send
it out early for some guidance.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>

Tvrtko Ursulin (3):
  perf: Add new late event free callback
  drm/i915/pmu: Move i915 reference drop to new event->free()
  perf: Reference count struct perf_cpu_pmu_context to fix driver unbind

 drivers/gpu/drm/i915/i915_pmu.c |  4 ++--
 include/linux/perf_event.h      |  2 ++
 kernel/events/core.c            | 34 ++++++++++++++++++++++++---------
 3 files changed, 29 insertions(+), 11 deletions(-)