[RFC,00/17] DRM_USE_DYNAMIC_DEBUG regression

Message ID 20221206003424.592078-1-jim.cromie@gmail.com
Series: DRM_USE_DYNAMIC_DEBUG regression

Message

Jim Cromie Dec. 6, 2022, 12:34 a.m. UTC
Hi everyone,

DRM_USE_DYNAMIC_DEBUG=y has a regression on rc-* kernels.

The regression is due to a chicken-and-egg problem in module loading:
on `modprobe i915`, drm is loaded first, and drm.debug is set.  When
drm_debug_enabled() tested __drm_debug at runtime, that just worked.

But with DRM_USE_DYNAMIC_DEBUG=y, the runtime test is replaced with a
post-load enablement of drm_dbg/dyndbg callsites (static-keys), via
dyndbg's callback on __drm_debug.  Since all drm drivers need drm.ko,
it is loaded first, then drm.debug=X is applied, and only then do the
drivers load.  Their drm_dbg callsites did not exist yet when the
callback ran, so they are never enabled.
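A minimal userspace model of this ordering problem may help.  It
assumes, for simplicity, that any nonzero drm.debug value enables
every callsite (real classmaps select per-class bits), and all names
(`load_module`, `set_debug_param`, `enabled_in`) are hypothetical
stand-ins rather than the dyndbg API.  What it shows is that a param
callback can only touch the callsites that already exist when it runs:

```c
/*
 * Userspace model of the drm.debug load-order problem.
 * Hypothetical names throughout; this is not the dyndbg API.
 * Simplification: any nonzero debug value enables every callsite.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SITES 16

struct callsite {
	const char *module;	/* owning module */
	bool enabled;		/* models the static-key state */
};

static struct callsite sites[MAX_SITES];
static size_t nsites;
static unsigned long debug_state;	/* models __drm_debug */

/* A module load adds its callsites, disabled by default. */
static void load_module(const char *name, size_t ncallsites)
{
	for (size_t i = 0; i < ncallsites && nsites < MAX_SITES; i++)
		sites[nsites++] = (struct callsite){ .module = name, .enabled = false };
}

/*
 * Models the kparam callback on drm.debug: it stores the value and
 * updates only the callsites that already exist when it runs.
 */
static void set_debug_param(unsigned long val)
{
	debug_state = val;
	for (size_t i = 0; i < nsites; i++)
		sites[i].enabled = (val != 0);
}

/* Count the enabled callsites that belong to one module. */
static size_t enabled_in(const char *name)
{
	size_t n = 0;

	for (size_t i = 0; i < nsites; i++)
		if (sites[i].enabled && strcmp(sites[i].module, name) == 0)
			n++;
	return n;
}
```

Replaying the `modprobe i915` order (`load_module("drm", 2)`, then
`set_debug_param(1)`, then `load_module("i915", 3)`) leaves drm's two
callsites enabled and all three of i915's disabled, even though
`debug_state` still holds 1.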

STATUS

For configs where drm, i915, and amdgpu are all loadable modules, it
almost works: the code goes through the motions of propagating
drm.debug to dependent modules, but the setting doesn't actually get
applied.  This is not the problem I want to chase here.

The more basic trouble is:

For builtin drm + helpers, things break early, at the beginning of
dynamic_debug_init().  As the ddebug_sanity() commit message describes
in some detail, the records added by _USE fail to reference the struct
ddebug_class_map created and exported by _DEFINE; instead they hold
separate addresses pointing to "other" data, which segfaults when used
as the expected pointer.  FWIW, the pointer value starts with "revi".

OVERVIEW

DECLARE_DYNDBG_CLASSMAP is broken: it is one-size-fits-all-poorly.
It muddles the distinction between a (single) definition and multiple
references.  A single exported definition should suffice.

The core of this patchset splits it into:

DYNDBG_CLASSMAP_DEFINE	used once per subsystem to define each classmap
DYNDBG_CLASSMAP_USE	declare dependence on a DEFINEd classmap

This makes the weird coordinated-changes-by-identical-classmaps
"feature" unnecessary: the DEFINE exports the classmap variable, and
each USE refers to that exported variable.
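A plain-C sketch of the DEFINE/USE split, assuming a simplified
stand-in for struct ddebug_class_map; the real struct layout and macro
signatures are not shown in this letter, and `class_map`, `class_ref`,
and `drm_classes` are hypothetical names.  The definition exists
exactly once, and every user stores a pointer to it, so there are no
identical copies to keep in sync:

```c
/*
 * Plain-C model of DYNDBG_CLASSMAP_DEFINE / DYNDBG_CLASSMAP_USE.
 * struct class_map is a hypothetical stand-in for ddebug_class_map.
 */
#include <assert.h>

struct class_map {
	const char *const *class_names;
	int length;	/* number of classes */
	int base;	/* class id of class_names[0] */
};

static const char *const drm_class_names[] = {
	"DRM_UT_CORE", "DRM_UT_DRIVER", "DRM_UT_KMS",
};

/* DEFINE: the single definition, which drm.ko would export. */
const struct class_map drm_classes = {
	.class_names = drm_class_names,
	.length = 3,
	.base = 0,
};

/* USE: a dependent module records a reference, not a copy. */
struct class_ref {
	const char *user;		/* module that uses the map */
	const struct class_map *map;	/* points at the one definition */
};

static const struct class_ref i915_ref = { "i915", &drm_classes };
static const struct class_ref amdgpu_ref = { "amdgpu", &drm_classes };
```

Because both references hold the same address, checking whether two
users share a classmap becomes a pointer comparison rather than a
comparison of copied contents.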

So this patchset adds another section: __dyndbg_class_refs.

It works like __dyndbg_classes: it is scanned during
ddebug_add_module() and attached to each module's ddebug_table.  Once
attached, it can be used like the classes to validate and apply
`class FOO` queries written to >control.
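The lookup a `class FOO` query needs can be sketched as follows,
assuming a hypothetical per-module table that holds both the module's
own classmaps (from __dyndbg_classes) and its referenced ones (from
__dyndbg_class_refs).  The struct layout and function names are
illustrative only, not the dyndbg implementation:

```c
/* Hypothetical model of class-name validation across maps and refs. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct class_map {
	const char *const *class_names;
	int length;	/* number of classes */
	int base;	/* class id of class_names[0] */
};

struct module_table {
	const char *mod_name;
	const struct class_map *const *maps;	/* own maps (__dyndbg_classes) */
	size_t num_maps;
	const struct class_map *const *refs;	/* used maps (__dyndbg_class_refs) */
	size_t num_refs;
};

/* Search a list of classmaps for NAME; return its class id or -1. */
static int find_in(const struct class_map *const *maps, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		for (int j = 0; j < maps[i]->length; j++)
			if (strcmp(maps[i]->class_names[j], name) == 0)
				return maps[i]->base + j;
	return -1;
}

/* Class id for "class NAME" in this module, or -1 if it is unknown here. */
static int validate_class_query(const struct module_table *mt, const char *name)
{
	int id = find_in(mt->maps, mt->num_maps, name);

	if (id < 0)
		id = find_in(mt->refs, mt->num_refs, name);
	return id;
}

static const char *const drm_names[] = { "DRM_UT_CORE", "DRM_UT_DRIVER", "DRM_UT_KMS" };
static const struct class_map drm_map = { drm_names, 3, 0 };
static const struct class_map *const i915_refs[] = { &drm_map };

/* A driver that defines no classes of its own, only USEs drm's map. */
static const struct module_table i915_table = {
	.mod_name = "i915",
	.maps = NULL, .num_maps = 0,
	.refs = i915_refs, .num_refs = 1,
};
```

For a driver that only USEs drm's map,
`validate_class_query(&i915_table, "DRM_UT_KMS")` resolves through the
reference to class id 2, while an unknown name returns -1 and the
query can be rejected.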

It also maps each class user to its definer explicitly, so that when a
module is loaded, the section scan can find the kernel param that is
wired to dyndbg's kparam callback and apply its state variable (for
example, __drm_debug) to the just-loaded helper/driver module.
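The load-time step can be sketched like this, assuming the classmap
carries a pointer to the definer's state variable (a hypothetical
`state` field standing in for the kparam lookup the patchset performs)
and that bit N of that variable enables class base+N.  Names are
illustrative, not the patchset's code:

```c
/* Hypothetical model: apply the definer's state to a just-loaded module. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct class_map {
	int length;			/* number of classes */
	int base;			/* first class id */
	const unsigned long *state;	/* definer's kparam state var */
};

static unsigned long drm_debug_state;	/* stands in for __drm_debug */
static const struct class_map drm_map = { .length = 3, .base = 0, .state = &drm_debug_state };

struct callsite {
	int class_id;
	bool enabled;
};

/*
 * Called when a module loads: for each classmap it references, read
 * the definer's current bits and set the matching callsites.
 */
static void apply_refs_on_load(const struct class_map *const *refs, size_t nrefs,
			       struct callsite *sites, size_t nsites)
{
	for (size_t r = 0; r < nrefs; r++) {
		const struct class_map *map = refs[r];
		unsigned long bits = *map->state;

		for (size_t i = 0; i < nsites; i++) {
			int bit = sites[i].class_id - map->base;

			if (bit >= 0 && bit < map->length)
				sites[i].enabled = (bits >> bit) & 1UL;
		}
	}
}
```

With `drm_debug_state = 0x5`, a freshly loaded module whose callsites
have class ids 0, 1, and 2 ends up with classes 0 and 2 enabled and
class 1 disabled, without waiting for the param to be written again.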

There's plenty to address, I'm sure.

Jim Cromie (17):
  test-dyndbg: fixup CLASSMAP usage error
  test-dyndbg: show that DEBUG enables prdbgs at compiletime
  dyndbg: fix readback value on LEVEL_NAMES interfaces
  dyndbg: replace classmap list with a vector
  dyndbg: make ddebug_apply_class_bitmap more selective
  dyndbg: dynamic_debug_init - use pointer inequality, not strcmp
  dyndbg: drop NUM_TYPE_ARRAY
  dyndbg: reduce verbose/debug clutter
  dyndbg-API: replace DECLARE_DYNDBG_CLASSMAP with
    DYNDBG_CLASSMAP(_DEFINE|_USE)
  dyndbg-API: specialize DYNDBG_CLASSMAP_(DEFINE|USE)
  dyndbg-API: DYNDBG_CLASSMAP_USE drop extra args
  dyndbg-API: DYNDBG_CLASSMAP_DEFINE() improvements
  drm_print: fix stale macro-name in comment
  dyndbg: unwrap __ddebug_add_module inner function NOTYET
  dyndbg: ddebug_sanity()
  dyndbg: mess-w-dep-class
  dyndbg: miss-on HACK

 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  14 +-
 drivers/gpu/drm/display/drm_dp_helper.c |  14 +-
 drivers/gpu/drm/drm_crtc_helper.c       |  14 +-
 drivers/gpu/drm/drm_print.c             |  22 +--
 drivers/gpu/drm/i915/i915_params.c      |  14 +-
 drivers/gpu/drm/nouveau/nouveau_drm.c   |  14 +-
 include/asm-generic/vmlinux.lds.h       |   3 +
 include/drm/drm_print.h                 |   6 +-
 include/linux/dynamic_debug.h           |  57 ++++--
 include/linux/map.h                     |  54 ++++++
 kernel/module/main.c                    |   2 +
 lib/dynamic_debug.c                     | 240 +++++++++++++++++++-----
 lib/test_dynamic_debug.c                |  47 ++---
 13 files changed, 344 insertions(+), 157 deletions(-)
 create mode 100644 include/linux/map.h
  

Comments

Daniel Vetter Jan. 11, 2023, 11:09 p.m. UTC | #1
On Mon, Dec 05, 2022 at 05:34:07PM -0700, Jim Cromie wrote:
> [...]
>
> The more basic trouble is:
> 
> For builtin drm + helpers, things break early, at the beginning of
> dynamic_debug_init().  As the ddebug_sanity() commit message describes
> in some detail, the records added by _USE fail to reference the struct
> ddebug_class_map created and exported by _DEFINE; instead they hold
> separate addresses pointing to "other" data, which segfaults when used
> as the expected pointer.  FWIW, the pointer value starts with "revi".

So I honestly have no idea here; linker stuff is way beyond where I
have a clue. So what's the way forward here?

The DEFINE/USE split does look like the right thing to do, at least
from the "how it's used in drivers" point of view. But if we're just
running in circles without quite getting there, I dunno :-/
-Daniel

  
Jim Cromie Jan. 13, 2023, 6:29 p.m. UTC | #2
On Wed, Jan 11, 2023 at 4:09 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Mon, Dec 05, 2022 at 05:34:07PM -0700, Jim Cromie wrote:
> > [...]
>
> So I honestly have no idea here; linker stuff is way beyond where I
> have a clue. So what's the way forward here?
>

I've fixed this aspect.
Unsurprisingly, it wasn't the linker :-}

> The DEFINE/USE split does look like the right thing to do, at least
> from the "how it's used in drivers" point of view. But if we're just
> running in circles without quite getting there, I dunno :-/
> -Daniel
>

Sending a new revision next.
I think it's getting close.
  
Daniel Vetter Jan. 13, 2023, 6:48 p.m. UTC | #3
On Fri, Jan 13, 2023 at 11:29:57AM -0700, jim.cromie@gmail.com wrote:
> On Wed, Jan 11, 2023 at 4:09 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Mon, Dec 05, 2022 at 05:34:07PM -0700, Jim Cromie wrote:
> > > [...]
> >
> > So I honestly have no idea here; linker stuff is way beyond where I
> > have a clue. So what's the way forward here?
> >
> 
> I've fixed this aspect.
> Unsurprisingly, it wasn't the linker :-}

Awesome!

> > The DEFINE/USE split does look like the right thing to do, at least
> > from the "how it's used in drivers" point of view. But if we're just
> > running in circles without quite getting there, I dunno :-/
> > -Daniel
> >
> 
> Sending a new revision next.
> I think it's getting close.

Thanks a lot for keeping on pushing this.
-Daniel