[git,pull] drm fixes for 6.1-rc1

Message ID CAPM=9ty3DGWa8vnfumgSrpSgWnixWjikb6C0Zk_5bW+deKLVQw@mail.gmail.com
State New
Series [git,pull] drm fixes for 6.1-rc1

Pull-request

git://anongit.freedesktop.org/drm/drm tags/drm-next-2022-10-14

Message

Dave Airlie Oct. 14, 2022, 12:29 a.m. UTC
Hi Linus,

Round of fixes for the merge window stuff, bunch of amdgpu and i915
changes, this should have the gcc11 warning fix, amongst other
changes.

Dave.

drm-next-2022-10-14:
drm fixes for 6.1-rc1

amdgpu:
- DC mutex fix
- DC SubVP fixes
- DCN 3.2.x fixes
- DCN 3.1.x fixes
- SDMA 6.x fixes
- Enable DPIA for 3.1.4
- VRR fixes
- VRAM BO swapping fix
- Revert dirty fb helper change
- SR-IOV suspend/resume fixes
- Work around GCC array bounds check fail warning
- UMC 8.10 fixes
- Misc fixes and cleanups

i915:
- Round to closest in g4x+ HDMI clock readout
- Update MOCS table for EHL
- Fix PSR_IMR/IIR field handling
- Fix watermark calculations for gen12+/DG2 modifiers
- Reject excessive dotclocks early
- Fix revocation of non-persistent contexts
- Handle migration for dpt
- Fix display problems after resume
- Allow control over the flags when migrating
- Consider DG2_RC_CCS_CC when migrating buffers

The following changes since commit bafaf67c42f4b547bf4fb329ac6dcb28b05de15e:

  Revert "drm/sched: Use parent fence instead of finished" (2022-10-07 12:58:39 +1000)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm tags/drm-next-2022-10-14

for you to fetch changes up to fc3523a833c9c109e68209f1ecdd15864373e66a:

  Merge tag 'amd-drm-fixes-6.1-2022-10-12' of https://gitlab.freedesktop.org/agd5f/linux into drm-next (2022-10-14 07:47:25 +1000)

----------------------------------------------------------------
drm fixes for 6.1-rc1

amdgpu:
- DC mutex fix
- DC SubVP fixes
- DCN 3.2.x fixes
- DCN 3.1.x fixes
- SDMA 6.x fixes
- Enable DPIA for 3.1.4
- VRR fixes
- VRAM BO swapping fix
- Revert dirty fb helper change
- SR-IOV suspend/resume fixes
- Work around GCC array bounds check fail warning
- UMC 8.10 fixes
- Misc fixes and cleanups

i915:
- Round to closest in g4x+ HDMI clock readout
- Update MOCS table for EHL
- Fix PSR_IMR/IIR field handling
- Fix watermark calculations for gen12+/DG2 modifiers
- Reject excessive dotclocks early
- Fix revocation of non-persistent contexts
- Handle migration for dpt
- Fix display problems after resume
- Allow control over the flags when migrating
- Consider DG2_RC_CCS_CC when migrating buffers

----------------------------------------------------------------
Alex Deucher (7):
      drm/amdgpu: switch sdma buffer function tear down to a helper
      drm/amdgpu: fix SDMA suspend/resume on SR-IOV
      drm/amd/display: make dcn32_split_stream_for_mpc_or_odm static
      drm/amd/display: fix indentation in dc.c
      drm/amd/display: make virtual_disable_link_output static
      drm/amd/display: add a license to cursor_reg_cache.h
      drm/amd/display: fix transfer function passed to build_coefficients()

Alexey Kodanev (2):
      drm/amd/pm: vega10_hwmgr: fix potential off-by-one overflow in 'performance_levels'
      drm/amd/pm: smu7_hwmgr: fix potential off-by-one overflow in 'performance_levels'

Alvin Lee (5):
      drm/amd/display: Only commit SubVP state after pipe programming
      drm/amd/display: Block SubVP if rotation being used
      drm/amd/display: Disable GSL when enabling phantom pipe
      drm/amd/display: For SubVP pipe split case use min transition into MPO
      drm/amd/display: Fix watermark calculation

Aric Cyr (4):
      Revert "drm/amd/display: correct hostvm flag"
      drm/amd/display: Fix vupdate and vline position calculation
      drm/amd/display: 3.2.206
      drm/amd/display: 3.2.207

Arunpravin Paneer Selvam (1):
      drm/amdgpu: Fix VRAM BO swap issue

Aurabindo Pillai (2):
      drm/amd/display: Do not trigger timing sync for phantom pipes
      drm/amd/display: Add HUBP surface flip interrupt handler

Bokun Zhang (1):
      drm/amdgpu: Fix SDMA engine resume issue under SRIOV

Candice Li (2):
      drm/amdgpu: Update umc v8_10_0 headers
      drm/amdgpu: Add poison mode query for umc v8_10_0

Charlene Liu (1):
      drm/amd/display: prevent S4 test from failing

Daniel Gomez (1):
      drm/amd/display: Fix mutex lock in dcn10

Dave Airlie (3):
      Merge tag 'drm-intel-next-fixes-2022-10-06-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
      Merge tag 'drm-intel-next-fixes-2022-10-13' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
      Merge tag 'amd-drm-fixes-6.1-2022-10-12' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

Dillon Varone (8):
      drm/amd/display: Program SubVP in dc_commit_state_no_check
      drm/amd/display: Reorder FCLK P-state switch sequence for DCN32
      drm/amd/display: Increase compbuf size prior to updating clocks
      drm/amd/display: Fix merging dynamic ODM+MPO configs on DCN32
      Revert "drm/amd/display: skip commit minimal transition state"
      drm/amd/display: Use correct pixel clock to program DTBCLK DTO's
      drm/amd/display: Acquire FCLK DPM levels on DCN32
      drm/amd/display: Fix bug preventing FCLK Pstate allow message being sent

Dmytro Laktyushkin (3):
      drm/amd/display: fix dcn315 dml detile overestimation
      drm/amd/display: add dummy pstate workaround to dcn315
      drm/amd/display: always allow pstate change when no dpps are active on dcn315

Dong Chenchen (1):
      drm/amd/display: Removed unused variable 'sdp_stream_enable'

Eric Bernstein (1):
      drm/amd/display: Fix disable DSC logic in the DIO code

Fangzhi Zuo (1):
      drm/amd/display: Validate DSC After Enable All New CRTCs

George Shen (1):
      drm/amd/display: Add missing SDP registers to DCN32 reglist

Guenter Roeck (1):
      drm/amd/display: fix array-bounds error in dc_stream_remove_writeback() [take 2]

Hamza Mahfooz (1):
      Revert "drm/amdgpu: use dirty framebuffer helper"

Ian Chen (1):
      drm/amd/display: Refactor edp ILR caps codes

Iswara Nagulendran (1):
      drm/amd/display: Allow PSR exit when panel is disconnected

Josip Pavic (1):
      drm/amd/display: do not compare integers of different widths

Jouni Högander (1):
      drm/i915/psr: Fix PSR_IMR/IIR field handling

Jun Lei (1):
      drm/amd/display: Add a helper to map ODM/MPC/Multi-Plane resources

Leo (Hanghong) Ma (1):
      drm/amd/display: AUX tracing cleanup

Leo Chen (1):
      drm/amd/display: Add log for LTTPR

Lewis Huang (1):
      drm/amd/display: Keep OTG on when Z10 is disable

Li Zhong (1):
      drivers/amd/pm: check the return value of amdgpu_bo_kmap

Martin Leung (3):
      drm/amd/display: block odd h_total timings from halving pixel rate
      drm/amd/display: unblock mcm_luts
      drm/amd/display: zeromem mypipe heap struct before using it

Matthew Auld (3):
      drm/i915/display: handle migration for dpt
      drm/i915: allow control over the flags when migrating
      drm/i915/display: consider DG2_RC_CCS_CC when migrating buffers

Max Tseng (1):
      drm/amd/display: Use the same cursor info across features

Meenakshikumar Somasundaram (1):
      drm/amd/display: Display does not light up after S4 resume

Nicholas Kazlauskas (1):
      drm/amd/display: Update PMFW z-state interface for DCN314

Philip Yang (2):
      drm/amdgpu: Set vmbo destroy after pt bo is created
      drm/amdgpu: Correct amdgpu_amdkfd_total_mem_size calculation

Randy Dunlap (1):
      drm/amd/display: clean up dcn32_fpu.c kernel-doc

Rodrigo Siqueira (14):
      drm/amd/display: Drop unused code for DCN32/321
      drm/amd/display: Update DCN321 hook that deals with pipe aquire
      drm/amd/display: Fix SubVP control flow in the MPO context
      drm/amd/display: Remove OPTC lock check
      drm/amd/display: Adding missing HDMI ACP SEND register
      drm/amd/display: Add PState change high hook for DCN32
      drm/amd/display: Enable 2 to 1 ODM policy if supported
      drm/amd/display: Disconnect DSC for unused pipes during ODM transition
      drm/amd/display: update DSC for DCN32
      drm/amd/display: Minor code style change
      drm/amd/display: Add a missing hook to DCN20
      drm/amd/display: Use set_vtotal_min_max to configure OTG VTOTAL
      drm/amd/display: Drop uncessary OTG lock check
      drm/amd/display: Clean some DCN32 macros

Roman Li (1):
      drm/amd/display: Enable dpia support for dcn314

Ruili Ji (1):
      drm/amdgpu: Enable F32_WPTR_POLL_ENABLE in mqd

Shirish S (1):
      drm/amd/display: explicitly disable psr_feature_enable appropriately

Sonny Jiang (1):
      drm/amdgpu: Enable VCN PG on GC11_0_1

Tao Zhou (4):
      drm/amdgpu: remove check for CE in RAS error address query
      drm/amdgpu: define RAS convert_error_address API
      drm/amdgpu: define convert_error_address for umc v8.7
      drm/amdgpu: fix coding style issue for mca notifier

Tejas Upadhyay (1):
      drm/i915/ehl: Update MOCS table for EHL

Thomas Hellström (1):
      drm/i915: Fix display problems after resume

Tvrtko Ursulin (1):
      drm/i915/guc: Fix revocation of non-persistent contexts

Ville Syrjälä (7):
      drm/i915: Round to closest in g4x+ HDMI clock readout
      drm/i915: Fix watermark calculations for gen12+ RC CCS modifier
      drm/i915: Fix watermark calculations for gen12+ MC CCS modifier
      drm/i915: Fix watermark calculations for gen12+ CCS+CC modifier
      drm/i915: Fix watermark calculations for DG2 CCS modifiers
      drm/i915: Fix watermark calculations for DG2 CCS+CC modifier
      drm/i915: Reject excessive dotclocks early

Vladimir Stempen (2):
      drm/amd/display: properly configure DCFCLK when enable/disable Freesync
      drm/amd/display: increase hardware status wait time

Wenjing Liu (3):
      drm/amd/display: fix integer overflow during MSA V_Freq calculation
      drm/amd/display: write all 4 bytes of FFE_PRESET dpcd value
      drm/amd/display: Add missing mask sh for SYM32_TP_SQ_PULSE register

Yang Li (3):
      drm/amd/display: clean up one inconsistent indenting
      drm/amd/display: clean up one inconsistent indenting
      drm/amd/display: Simplify bool conversion

Yang Yingliang (3):
      drm/amd/display: change to enc314_stream_encoder_dp_blank static
      drm/amdgpu/sdma: add missing release_firmware() in amdgpu_sdma_init_microcode()
      drm/amd/display: fix build error on arm64

Yuan Can (1):
      drm/amd/display: Remove unused struct i2c_id_config_access

Yunxiang Li (1):
      drm/amd/display: Fix vblank refcount in vrr transition

Zhikai Zhai (1):
      drm/amd/display: skip commit minimal transition state

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c         |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c        |  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c         |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c            |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c           |  29 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h           |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c            |  17 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h            |   7 +-
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c              |   6 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c             |   6 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c             |   6 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c             |  29 +--
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c             |  11 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c             |  15 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c             |  17 +-
 drivers/gpu/drm/amd/amdgpu/si_dma.c                |   5 +-
 drivers/gpu/drm/amd/amdgpu/soc21.c                 |   1 +
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c              |  10 +-
 drivers/gpu/drm/amd/amdgpu/umc_v6_7.c              | 165 ++++++--------
 drivers/gpu/drm/amd/amdgpu/umc_v8_10.c             |  78 ++++---
 drivers/gpu/drm/amd/amdgpu/umc_v8_7.c              |  63 +++---
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c   |   3 +-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  |  71 +++---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c  |   8 +-
 drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c |   7 -
 .../amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c   |   4 +-
 .../drm/amd/display/dc/clk_mgr/dcn314/dcn314_smu.c |  11 +-
 .../amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c   |  85 +++++---
 drivers/gpu/drm/amd/display/dc/core/dc.c           | 105 ++++++++-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c      |  11 +-
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c   |  70 +++---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c  |  53 ++++-
 drivers/gpu/drm/amd/display/dc/core/dc_stream.c    |   8 +-
 drivers/gpu/drm/amd/display/dc/dc.h                |   8 +-
 drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c       | 147 ++++++++++++-
 drivers/gpu/drm/amd/display/dc/dc_dmub_srv.h       |   1 +
 drivers/gpu/drm/amd/display/dc/dc_link.h           |   4 +
 drivers/gpu/drm/amd/display/dc/dce/dce_aux.c       |  13 +-
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.c   |   1 +
 .../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c  | 239 +++++----------------
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c  |  40 +---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.h  |   1 -
 .../gpu/drm/amd/display/dc/dcn10/dcn10_resource.c  |  66 +++++-
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c  |  30 +++
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c |  25 +--
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_optc.c  |   1 +
 .../gpu/drm/amd/display/dc/dcn21/dcn21_hubbub.c    |   8 +-
 .../gpu/drm/amd/display/dc/dcn21/dcn21_resource.c  |  13 +-
 drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c   |   4 +
 drivers/gpu/drm/amd/display/dc/dcn30/dcn30_optc.c  |   3 +-
 .../gpu/drm/amd/display/dc/dcn30/dcn30_resource.c  |   4 +
 .../drm/amd/display/dc/dcn301/dcn301_resource.c    |   2 +-
 .../display/dc/dcn31/dcn31_hpo_dp_stream_encoder.c |  20 +-
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_optc.c  |   2 -
 .../gpu/drm/amd/display/dc/dcn31/dcn31_resource.c  |  15 +-
 .../display/dc/dcn314/dcn314_dio_stream_encoder.c  |   2 +-
 .../drm/amd/display/dc/dcn314/dcn314_resource.c    |  16 +-
 .../drm/amd/display/dc/dcn315/dcn315_resource.c    |  15 +-
 .../drm/amd/display/dc/dcn316/dcn316_resource.c    |  13 +-
 .../amd/display/dc/dcn32/dcn32_dio_link_encoder.c  |   7 -
 .../amd/display/dc/dcn32/dcn32_dio_link_encoder.h  |   4 -
 .../display/dc/dcn32/dcn32_dio_stream_encoder.c    |  57 +++--
 .../display/dc/dcn32/dcn32_dio_stream_encoder.h    |  14 +-
 .../display/dc/dcn32/dcn32_hpo_dp_link_encoder.h   |   1 +
 .../gpu/drm/amd/display/dc/dcn32/dcn32_hubbub.c    |   1 +
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hubp.c  |   6 +-
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c |  42 ++--
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_optc.c  |   2 +-
 .../gpu/drm/amd/display/dc/dcn32/dcn32_resource.c  |  31 +++
 .../gpu/drm/amd/display/dc/dcn32/dcn32_resource.h  |  22 ++
 .../amd/display/dc/dcn32/dcn32_resource_helpers.c  |  88 ++++++++
 .../display/dc/dcn321/dcn321_dio_link_encoder.c    |   1 -
 .../drm/amd/display/dc/dcn321/dcn321_resource.c    |   6 +-
 .../gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.c   | 118 +++++-----
 .../gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.c   |  96 +++------
 .../gpu/drm/amd/display/dc/dml/dcn31/dcn31_fpu.h   |   1 +
 .../amd/display/dc/dml/dcn31/display_mode_vba_31.c |  15 ++
 .../gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c   | 131 ++++++-----
 .../amd/display/dc/dml/dcn32/display_mode_vba_32.c |  21 +-
 .../gpu/drm/amd/display/dc/dml/display_mode_lib.c  |   1 +
 .../gpu/drm/amd/display/dc/dml/display_mode_lib.h  |   1 +
 drivers/gpu/drm/amd/display/dc/inc/core_types.h    |   6 +-
 drivers/gpu/drm/amd/display/dc/inc/dcn_calcs.h     |  19 +-
 drivers/gpu/drm/amd/display/dc/inc/hw/clk_mgr.h    |  15 +-
 .../drm/amd/display/dc/inc/hw/cursor_reg_cache.h   |  99 +++++++++
 drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h        |   4 +
 drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h       |   5 +
 .../drm/amd/display/dc/inc/hw/timing_generator.h   |   1 -
 drivers/gpu/drm/amd/display/dc/inc/resource.h      |   6 +
 .../gpu/drm/amd/display/dc/link/link_hwss_hpo_dp.c |   2 +-
 .../drm/amd/display/dc/virtual/virtual_link_hwss.c |   2 +-
 drivers/gpu/drm/amd/display/dmub/dmub_srv.h        |   1 +
 drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h    | 140 ++++++++++--
 drivers/gpu/drm/amd/display/dmub/src/dmub_dcn31.c  |   1 +
 .../drm/amd/display/modules/color/color_gamma.c    |   2 +-
 .../amd/include/asic_reg/umc/umc_8_10_0_offset.h   |   2 +
 .../amd/include/asic_reg/umc/umc_8_10_0_sh_mask.h  |   3 +
 drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c         |   5 +-
 .../gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c    |   2 +-
 .../gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c  |   2 +-
 drivers/gpu/drm/i915/display/g4x_hdmi.c            |   2 +-
 drivers/gpu/drm/i915/display/intel_display.c       |  18 ++
 drivers/gpu/drm/i915/display/intel_fb_pin.c        |  62 ++++--
 drivers/gpu/drm/i915/display/intel_psr.c           |  78 ++++---
 drivers/gpu/drm/i915/display/skl_watermark.c       |  16 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c        |   8 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.c         |  37 +++-
 drivers/gpu/drm/i915/gem/i915_gem_object.h         |   4 +
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h   |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c            |   5 +-
 drivers/gpu/drm/i915/gt/intel_context.c            |   5 +-
 drivers/gpu/drm/i915/gt/intel_context.h            |   3 +-
 drivers/gpu/drm/i915/gt/intel_ggtt.c               |   8 +-
 drivers/gpu/drm/i915/gt/intel_mocs.c               |   8 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c  |  26 +--
 drivers/gpu/drm/i915/i915_reg.h                    |  16 +-
 116 files changed, 1830 insertions(+), 1081 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/display/dc/inc/hw/cursor_reg_cache.h
  

Comments

Linus Torvalds Oct. 14, 2022, 5:04 a.m. UTC | #1
On Thu, Oct 13, 2022 at 5:29 PM Dave Airlie <airlied@gmail.com> wrote:
>
> Round of fixes for the merge window stuff, bunch of amdgpu and i915
> changes, this should have the gcc11 warning fix, amongst other
> changes.

Some of those amd changes aren't "fixes". They are some major code changes.

We're still in the merge window, so I'm letting it slide, but calling
them "fixes" really stretches things. They are fixes exactly the same
way completely new development can "fix" things.

                      Linus
  
pr-tracker-bot@kernel.org Oct. 14, 2022, 5:07 a.m. UTC | #2
The pull request you sent on Fri, 14 Oct 2022 10:29:19 +1000:

> git://anongit.freedesktop.org/drm/drm tags/drm-next-2022-10-14

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/9c9155a3509a2ebdb06d77c7a621e9685c802eac

Thank you!
  
Arthur Marsh Oct. 16, 2022, 8:08 a.m. UTC | #3
From: Arthur Marsh <arthur.marsh@internode.on.net>

Hi, the "drm fixes for 6.1-rc1" commit caused the amdgpu module to fail
with my Cape Verde radeonsi card.

I haven't been able to bisect the problem to an individual commit, but
attach a dmesg extract below.

I'm happy to supply any other configuration information and test patches.

Arthur.

 Linux version 6.0.0+ (root@am64) (gcc-12 (Debian 12.2.0-5) 12.2.0, GNU ld (GNU Binutils for Debian) 2.39) #5179 SMP PREEMPT_DYNAMIC Fri Oct 14 17:00:40 ACDT 2022
 Command line: BOOT_IMAGE=/vmlinuz-6.0.0+ root=UUID=39706f53-7c27-4310-b22a-36c7b042d1a1 ro single amdgpu.audio=1 amdgpu.si_support=1 radeon.si_support=0 page_owner=on amdgpu.gpu_recovery=1
...

 [drm] amdgpu kernel modesetting enabled.
 amdgpu 0000:01:00.0: vgaarb: deactivate vga console
 Console: switching to colour dummy device 80x25
 [drm] initializing kernel modesetting (VERDE 0x1002:0x682B 0x1458:0x22CA 0x87).
 [drm] register mmio base: 0xFE8C0000
 [drm] register mmio size: 262144
 [drm] add ip block number 0 <si_common>
 [drm] add ip block number 1 <gmc_v6_0>
 [drm] add ip block number 2 <si_ih>
 [drm] add ip block number 3 <gfx_v6_0>
 [drm] add ip block number 4 <si_dma>
 [drm] add ip block number 5 <si_dpm>
 [drm] add ip block number 6 <dce_v6_0>
 [drm] add ip block number 7 <uvd_v3_1>
 [drm] BIOS signature incorrect 5b 7
 resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000dffff window]
 caller pci_map_rom+0x68/0x1b0 mapping multiple BARs
 amdgpu 0000:01:00.0: No more image in the PCI ROM
 amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
 amdgpu: ATOM BIOS: xxx-xxx-xxx
 amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
 amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
 [drm] PCIE gen 2 link speeds already enabled
 [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
 RTL8211B Gigabit Ethernet r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
 r8169 0000:03:00.0 eth0: Link is Down
 amdgpu 0000:01:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
 amdgpu 0000:01:00.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
 [drm] Detected VRAM RAM=2048M, BAR=256M
 [drm] RAM width 128bits DDR3
 [drm] amdgpu: 2048M of VRAM memory ready
 [drm] amdgpu: 3979M of GTT memory ready.
 [drm] GART: num cpu pages 262144, num gpu pages 262144
 amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F400A00000).
 [drm] Internal thermal controller with fan control
 [drm] amdgpu: dpm initialized
 [drm] AMDGPU Display Connectors
 [drm] Connector 0:
 [drm]   HDMI-A-1
 [drm]   HPD1
 [drm]   DDC: 0x194c 0x194c 0x194d 0x194d 0x194e 0x194e 0x194f 0x194f
 [drm]   Encoders:
 [drm]     DFP1: INTERNAL_UNIPHY
 [drm] Connector 1:
 [drm]   DVI-D-1
 [drm]   HPD2
 [drm]   DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
 [drm]   Encoders:
 [drm]     DFP2: INTERNAL_UNIPHY
 [drm] Connector 2:
 [drm]   VGA-1
 [drm]   DDC: 0x1970 0x1970 0x1971 0x1971 0x1972 0x1972 0x1973 0x1973
 [drm]   Encoders:
 [drm]     CRT1: INTERNAL_KLDSCP_DAC1
 [drm] Found UVD firmware Version: 64.0 Family ID: 13
 amdgpu: Move buffer fallback to memcpy unavailable
 [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* sw_init of IP block <uvd_v3_1> failed -19
 amdgpu 0000:01:00.0: amdgpu: amdgpu_device_ip_init failed
 amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init
 amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.
 BUG: kernel NULL pointer dereference, address: 0000000000000090
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD 0 P4D 0 
 Oops: 0002 [#1] PREEMPT SMP NOPTI
 CPU: 3 PID: 447 Comm: udevd Not tainted 6.0.0+ #5179
 Hardware name: System manufacturer System Product Name/M3A78 PRO, BIOS 1701    01/27/2011
 RIP: 0010:drm_sched_fini+0x80/0xa0 [gpu_sched]
 Code: 76 83 0e c4 c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d c3 cc cc cc cc 4c 8d 63 f0 4c 89 e7 e8 08 99 8e c4 48 8b 03 48 39 d8 74 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 c9 99 8e
 RSP: 0018:ffffbeb3c06bfbb8 EFLAGS: 00010213
 RAX: 0000000000000000 RBX: ffff99bae8269a98 RCX: ffff99bab703afc0
 RDX: 0000000000000001 RSI: ffff99bab703afe8 RDI: 0000000000000000
 RBP: ffff99bae82699f0 R08: ffffffff85cd0bc2 R09: 0000000000000010
 R10: 0000000000000035 R11: ffff99bb594806c0 R12: ffff99bae8269a88
 R13: ffff99bae82699f8 R14: ffff99bae82665e8 R15: 0000000000000000
 FS:  00007fd81fcd9840(0000) GS:ffff99bb67cc0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000090 CR3: 0000000111822000 CR4: 00000000000006e0
 Call Trace:
  <TASK>
  amdgpu_fence_driver_sw_fini+0xc2/0xd0 [amdgpu]
  amdgpu_device_fini_sw+0x17/0x3c0 [amdgpu]
  amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
  devm_drm_dev_init_release+0x4a/0x70 [drm]
  release_nodes+0x40/0xb0
  devres_release_all+0x89/0xc0
  device_unbind_cleanup+0xe/0x70
  really_probe+0x245/0x3a0
  ? pm_runtime_barrier+0x61/0xb0
  __driver_probe_device+0x78/0x170
  driver_probe_device+0x2d/0xb0
  __driver_attach+0xdc/0x1d0
  ? __device_attach_driver+0x100/0x100
  bus_for_each_dev+0x69/0xa0
  bus_add_driver+0x1d4/0x230
  ? _raw_spin_unlock+0x15/0x40
  driver_register+0x89/0xe0
  ? 0xffffffffc0c3b000
  do_one_initcall+0x44/0x200
  ? __kmem_cache_alloc_node+0x90/0x360
  ? kmalloc_trace+0x38/0xc0
  do_init_module+0x4a/0x1e0
  __do_sys_finit_module+0xb5/0x130
  do_syscall_64+0x3a/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
 RIP: 0033:0x7fd81ff5b1b9
 Code: 08 44 89 e0 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 1c 0d 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffc5b37cbb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
 RAX: ffffffffffffffda RBX: 000055e5f2f6a140 RCX: 00007fd81ff5b1b9
 RDX: 0000000000000000 RSI: 000055e5f2f67e30 RDI: 0000000000000017
 RBP: 000055e5f2f67e30 R08: 0000000000000000 R09: 000055e5f2f46700
 R10: 0000000000000017 R11: 0000000000000246 R12: 0000000000020000
 R13: 0000000000000000 R14: 000055e5f2f65b00 R15: 0000000000000024
  </TASK>
 Modules linked in: amdgpu(+) snd_emu10k1_synth snd_emux_synth snd_seq_midi_emul snd_seq_virmidi snd_seq_midi snd_seq_midi_event snd_seq wmi_bmof snd_emu10k1 edac_mce_amd gpu_sched drm_buddy video kvm_amd drm_ttm_helper ttm snd_util_mem drm_display_helper snd_ac97_codec ccp drm_kms_helper snd_hda_codec_hdmi rng_core ac97_bus snd_rawmidi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_seq_device drm kvm snd_hwdep snd_pcm_oss snd_mixer_oss evdev serio_raw snd_pcm irqbypass i2c_algo_bit fb_sys_fops syscopyarea sysfillrect emu10k1_gp pcspkr gameport k10temp snd_timer sysimgblt snd acpi_cpufreq wmi soundcore button sp5100_tco asus_atk0110 ext4 crc16 mbcache jbd2 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic uas usb_storage sg sd_mod hid_generic t10_pi usbhid hid sr_mod cdrom crc64_rocksoft crc64 ata_generic ahci pata_atiixp libahci ohci_pci firewire_ohci libata firewire_core crc_itu_t xhci_pci scsi_mod ohci_hcd r8169 ehci_pci xhci_hcd
  realtek ehci_hcd mdio_devres i2c_piix4 scsi_common usbcore libphy usb_common
 CR2: 0000000000000090
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:drm_sched_fini+0x80/0xa0 [gpu_sched]
 Code: 76 83 0e c4 c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d c3 cc cc cc cc 4c 8d 63 f0 4c 89 e7 e8 08 99 8e c4 48 8b 03 48 39 d8 74 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 c9 99 8e
 RSP: 0018:ffffbeb3c06bfbb8 EFLAGS: 00010213
 RAX: 0000000000000000 RBX: ffff99bae8269a98 RCX: ffff99bab703afc0
 RDX: 0000000000000001 RSI: ffff99bab703afe8 RDI: 0000000000000000
 RBP: ffff99bae82699f0 R08: ffffffff85cd0bc2 R09: 0000000000000010
 R10: 0000000000000035 R11: ffff99bb594806c0 R12: ffff99bae8269a88
 R13: ffff99bae82699f8 R14: ffff99bae82665e8 R15: 0000000000000000
 FS:  00007fd81fcd9840(0000) GS:ffff99bb67cc0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000090 CR3: 0000000111822000 CR4: 00000000000006e0
 note: udevd[447] exited with preempt_count 1
 udevd[433]: worker [447] terminated by signal 9 (Killed)
 udevd[433]: worker [447] failed while handling '/devices/pci0000:00/0000:00:02.0/0000:01:00.0'
 r8169 0000:03:00.0 eth0: Link is Up - 1Gbps/Full - flow control off
 IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
 Adding 4194300k swap on /dev/sda4.  Priority:-2 extents:1 across:4194300k FS
 EXT4-fs (sda5): re-mounted. Quota mode: none.
 lp: driver loaded but no devices found
 ppdev: user-space parallel port driver
 it87: Found IT8716F chip at 0xe80, revision 3
 ACPI Warning: SystemIO range 0x0000000000000E85-0x0000000000000E86 conflicts with OpRegion 0x0000000000000E85-0x0000000000000E86 (\_SB.PCI0.SBRG.ASOC.HWRE) (20220331/utaddress-204)
 ACPI: OSL: Resource conflict; ACPI support missing from driver?
 BUG: unable to handle page fault for address: 00000000000065c0
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0 
 Oops: 0000 [#2] PREEMPT SMP NOPTI
 CPU: 2 PID: 55 Comm: kworker/2:1 Tainted: G      D            6.0.0+ #5179
 Hardware name: System manufacturer System Product Name/M3A78 PRO, BIOS 1701    01/27/2011
 Workqueue: events output_poll_execute [drm_kms_helper]
 RIP: 0010:amdgpu_device_rreg.part.0+0x39/0x100 [amdgpu]
 Code: 6c 24 08 48 89 fb 4c 89 64 24 10 44 8d 24 b5 00 00 00 00 4c 3b a7 88 08 00 00 89 f5 73 70 83 e2 02 74 2f 4c 03 a3 90 08 00 00 <45> 8b 24 24 48 8b 43 08 0f b7 70 3e 66 90 44 89 e0 48 8b 1c 24 48
 RSP: 0018:ffffbeb3c0717c48 EFLAGS: 00010206
 RAX: 0000000000000000 RBX: ffff99bae8260000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000001970 RDI: ffff99bae8260000
 RBP: 0000000000001970 R08: ffffbeb3c0717e08 R09: 0000000000000000
 R10: 0000000000000018 R11: fefefefefefefeff R12: 00000000000065c0
 R13: ffffbeb3c0717d70 R14: 0000000000000000 R15: 000000010005e340
 FS:  0000000000000000(0000) GS:ffff99bb67c80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000065c0 CR3: 000000008980a000 CR4: 00000000000006e0
 Call Trace:
  <TASK>
  amdgpu_i2c_pre_xfer+0x163/0x180 [amdgpu]
  bit_xfer+0x36/0x530 [i2c_algo_bit]
  __i2c_transfer+0x185/0x550
  i2c_transfer+0xa2/0x110
  amdgpu_display_ddc_probe+0xbd/0x100 [amdgpu]
  amdgpu_connector_vga_detect+0x8e/0x200 [amdgpu]
  drm_helper_probe_detect_ctx+0x7b/0xd0 [drm_kms_helper]
  output_poll_execute+0x152/0x220 [drm_kms_helper]
  process_one_work+0x1ae/0x370
  worker_thread+0x4d/0x3b0
  ? rescuer_thread+0x380/0x380
  kthread+0xe3/0x110
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x22/0x30
  </TASK>
 Modules linked in: max6650 hwmon_vid parport_pc ppdev lp parport amdgpu(+) snd_emu10k1_synth snd_emux_synth snd_seq_midi_emul snd_seq_virmidi snd_seq_midi snd_seq_midi_event snd_seq wmi_bmof snd_emu10k1 edac_mce_amd gpu_sched drm_buddy video kvm_amd drm_ttm_helper ttm snd_util_mem drm_display_helper snd_ac97_codec ccp drm_kms_helper snd_hda_codec_hdmi rng_core ac97_bus snd_rawmidi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_seq_device drm kvm snd_hwdep snd_pcm_oss snd_mixer_oss evdev serio_raw snd_pcm irqbypass i2c_algo_bit fb_sys_fops syscopyarea sysfillrect emu10k1_gp pcspkr gameport k10temp snd_timer sysimgblt snd acpi_cpufreq wmi soundcore button sp5100_tco asus_atk0110 ext4 crc16 mbcache jbd2 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic uas usb_storage sg sd_mod hid_generic t10_pi usbhid hid sr_mod cdrom crc64_rocksoft crc64 ata_generic ahci pata_atiixp libahci ohci_pci firewire_ohci libata firewire_core crc_itu_t xhci_pci
  scsi_mod ohci_hcd r8169 ehci_pci xhci_hcd realtek ehci_hcd mdio_devres i2c_piix4 scsi_common usbcore libphy usb_common
 CR2: 00000000000065c0
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:drm_sched_fini+0x80/0xa0 [gpu_sched]
 Code: 76 83 0e c4 c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d c3 cc cc cc cc 4c 8d 63 f0 4c 89 e7 e8 08 99 8e c4 48 8b 03 48 39 d8 74 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 c9 99 8e
 RSP: 0018:ffffbeb3c06bfbb8 EFLAGS: 00010213
 RAX: 0000000000000000 RBX: ffff99bae8269a98 RCX: ffff99bab703afc0
 RDX: 0000000000000001 RSI: ffff99bab703afe8 RDI: 0000000000000000
 RBP: ffff99bae82699f0 R08: ffffffff85cd0bc2 R09: 0000000000000010
 R10: 0000000000000035 R11: ffff99bb594806c0 R12: ffff99bae8269a88
 R13: ffff99bae82699f8 R14: ffff99bae82665e8 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff99bb67c80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000065c0 CR3: 000000008980a000 CR4: 00000000000006e0
  
Dave Airlie Oct. 16, 2022, 9:44 p.m. UTC | #4
On Sun, 16 Oct 2022 at 18:09, Arthur Marsh
<arthur.marsh@internode.on.net> wrote:
>
> From: Arthur Marsh <arthur.marsh@internode.on.net>
>
> Hi, the "drm fixes for 6.1-rc1" commit caused the amdgpu module to fail
> with my Cape Verde radeonsi card.
>
> I haven't been able to bisect the problem to an individual commit, but
> attach a dmesg extract below.
>
> I'm happy to supply any other configuration information and test patches.
>

Can you try reverting the commit below? It's the only thing I can spot that might
affect a card that old, since most changes in that request were for
display hw you don't have.

commit 312b4dc11d4f74bfe03ea25ffe04c1f2fdd13cb9
Author: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Date:   Tue Oct 4 07:33:39 2022 -0700

    drm/amdgpu: Fix VRAM BO swap issue

    DRM buddy manager allocates the contiguous memory requests in
    a single block or multiple blocks. So for the ttm move operation
    (incase of low vram memory) we should consider all the blocks to
    compute the total memory size which compared with the struct
    ttm_resource num_pages in order to verify that the blocks are
    contiguous for the eviction process.

    v2: Added a Fixes tag
    v3: Rewrite the code to save a bit of calculations and
        variables (Christian)

    Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
    Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Thanks,
Dave.

> Arthur.
  
Arthur Marsh Oct. 17, 2022, 1:13 a.m. UTC | #5
Thanks Dave, I reverted patch 312b4dc11d4f74bfe03ea25ffe04c1f2fdd13cb9 against 6.1-rc1 and the resulting kernel loaded amdgpu fine on my PC with a Cape Verde GPU.

Regards,

Arthur. 

On 17 October 2022 8:14:18 am ACDT, Dave Airlie <airlied@gmail.com> wrote:
>On Sun, 16 Oct 2022 at 18:09, Arthur Marsh
><arthur.marsh@internode.on.net> wrote:
>>
>> From: Arthur Marsh <arthur.marsh@internode.on.net>
>>
>> Hi, the "drm fixes for 6.1-rc1" commit caused the amdgpu module to fail
>> with my Cape Verde radeonsi card.
>>
>> I haven't been able to bisect the problem to an individual commit, but
>> attach a dmesg extract below.
>>
>> I'm happy to supply any other configuration information and test patches.
>>
>
>Can you try reverting the commit below? It's the only thing I can spot that might
>affect a card that old, since most changes in that request were for
>display hw you don't have.
>
>commit 312b4dc11d4f74bfe03ea25ffe04c1f2fdd13cb9
>Author: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
>Date:   Tue Oct 4 07:33:39 2022 -0700
>
>    drm/amdgpu: Fix VRAM BO swap issue
>
>    DRM buddy manager allocates the contiguous memory requests in
>    a single block or multiple blocks. So for the ttm move operation
>    (incase of low vram memory) we should consider all the blocks to
>    compute the total memory size which compared with the struct
>    ttm_resource num_pages in order to verify that the blocks are
>    contiguous for the eviction process.
>
>    v2: Added a Fixes tag
>    v3: Rewrite the code to save a bit of calculations and
>        variables (Christian)
>
>    Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
>    Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
>    Reviewed-by: Christian König <christian.koenig@amd.com>
>    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>
>
>Thanks,
>Dave.
>
>> Arthur.
  
Christian König Oct. 17, 2022, 6:20 a.m. UTC | #6
Arun, please take a look at this ASAP.

Thanks,
Christian.

Am 17.10.22 um 03:13 schrieb Arthur Marsh:
> Thanks Dave, I reverted patch 312b4dc11d4f74bfe03ea25ffe04c1f2fdd13cb9 against 6.1-rc1 and the resulting kernel loaded amdgpu fine on my PC with a Cape Verde GPU.
>
> Regards,
>
> Arthur.
>
> On 17 October 2022 8:14:18 am ACDT, Dave Airlie <airlied@gmail.com> wrote:
>> On Sun, 16 Oct 2022 at 18:09, Arthur Marsh
>> <arthur.marsh@internode.on.net> wrote:
>>> From: Arthur Marsh <arthur.marsh@internode.on.net>
>>>
>>> Hi, the "drm fixes for 6.1-rc1" commit caused the amdgpu module to fail
>>> with my Cape Verde radeonsi card.
>>>
>>> I haven't been able to bisect the problem to an individual commit, but
>>> attach a dmesg extract below.
>>>
>>> I'm happy to supply any other configuration information and test patches.
>>>
>> Can you try reverting the commit below? It's the only thing I can spot that might
>> affect a card that old, since most changes in that request were for
>> display hw you don't have.
>>
>> commit 312b4dc11d4f74bfe03ea25ffe04c1f2fdd13cb9
>> Author: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
>> Date:   Tue Oct 4 07:33:39 2022 -0700
>>
>>     drm/amdgpu: Fix VRAM BO swap issue
>>
>>     DRM buddy manager allocates the contiguous memory requests in
>>     a single block or multiple blocks. So for the ttm move operation
>>     (incase of low vram memory) we should consider all the blocks to
>>     compute the total memory size which compared with the struct
>>     ttm_resource num_pages in order to verify that the blocks are
>>     contiguous for the eviction process.
>>
>>     v2: Added a Fixes tag
>>     v3: Rewrite the code to save a bit of calculations and
>>         variables (Christian)
>>
>>     Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
>>     Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
>>     Reviewed-by: Christian König <christian.koenig@amd.com>
>>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>
>>
>> Thanks,
>> Dave.
>>
>>> Arthur.
>>>
>>>   Linux version 6.0.0+ (root@am64) (gcc-12 (Debian 12.2.0-5) 12.2.0, GNU ld (GNU Binutils for Debian) 2.39) #5179 SMP PREEMPT_DYNAMIC Fri Oct 14 17:00:40 ACDT 2022
>>>   Command line: BOOT_IMAGE=/vmlinuz-6.0.0+ root=UUID=39706f53-7c27-4310-b22a-36c7b042d1a1 ro single amdgpu.audio=1 amdgpu.si_support=1 radeon.si_support=0 page_owner=on amdgpu.gpu_recovery=1
>>> ...
>>>
>>>   [drm] amdgpu kernel modesetting enabled.
>>>   amdgpu 0000:01:00.0: vgaarb: deactivate vga console
>>>   Console: switching to colour dummy device 80x25
>>>   [drm] initializing kernel modesetting (VERDE 0x1002:0x682B 0x1458:0x22CA 0x87).
>>>   [drm] register mmio base: 0xFE8C0000
>>>   [drm] register mmio size: 262144
>>>   [drm] add ip block number 0 <si_common>
>>>   [drm] add ip block number 1 <gmc_v6_0>
>>>   [drm] add ip block number 2 <si_ih>
>>>   [drm] add ip block number 3 <gfx_v6_0>
>>>   [drm] add ip block number 4 <si_dma>
>>>   [drm] add ip block number 5 <si_dpm>
>>>   [drm] add ip block number 6 <dce_v6_0>
>>>   [drm] add ip block number 7 <uvd_v3_1>
>>>   [drm] BIOS signature incorrect 5b 7
>>>   resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000dffff window]
>>>   caller pci_map_rom+0x68/0x1b0 mapping multiple BARs
>>>   amdgpu 0000:01:00.0: No more image in the PCI ROM
>>>   amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
>>>   amdgpu: ATOM BIOS: xxx-xxx-xxx
>>>   amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
>>>   amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
>>>   [drm] PCIE gen 2 link speeds already enabled
>>>   [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
>>>   RTL8211B Gigabit Ethernet r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
>>>   r8169 0000:03:00.0 eth0: Link is Down
>>>   amdgpu 0000:01:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
>>>   amdgpu 0000:01:00.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
>>>   [drm] Detected VRAM RAM=2048M, BAR=256M
>>>   [drm] RAM width 128bits DDR3
>>>   [drm] amdgpu: 2048M of VRAM memory ready
>>>   [drm] amdgpu: 3979M of GTT memory ready.
>>>   [drm] GART: num cpu pages 262144, num gpu pages 262144
>>>   amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F400A00000).
>>>   [drm] Internal thermal controller with fan control
>>>   [drm] amdgpu: dpm initialized
>>>   [drm] AMDGPU Display Connectors
>>>   [drm] Connector 0:
>>>   [drm]   HDMI-A-1
>>>   [drm]   HPD1
>>>   [drm]   DDC: 0x194c 0x194c 0x194d 0x194d 0x194e 0x194e 0x194f 0x194f
>>>   [drm]   Encoders:
>>>   [drm]     DFP1: INTERNAL_UNIPHY
>>>   [drm] Connector 1:
>>>   [drm]   DVI-D-1
>>>   [drm]   HPD2
>>>   [drm]   DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
>>>   [drm]   Encoders:
>>>   [drm]     DFP2: INTERNAL_UNIPHY
>>>   [drm] Connector 2:
>>>   [drm]   VGA-1
>>>   [drm]   DDC: 0x1970 0x1970 0x1971 0x1971 0x1972 0x1972 0x1973 0x1973
>>>   [drm]   Encoders:
>>>   [drm]     CRT1: INTERNAL_KLDSCP_DAC1
>>>   [drm] Found UVD firmware Version: 64.0 Family ID: 13
>>>   amdgpu: Move buffer fallback to memcpy unavailable
>>>   [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* sw_init of IP block <uvd_v3_1> failed -19
>>>   amdgpu 0000:01:00.0: amdgpu: amdgpu_device_ip_init failed
>>>   amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init
>>>   amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.
>>>   BUG: kernel NULL pointer dereference, address: 0000000000000090
>>>   #PF: supervisor write access in kernel mode
>>>   #PF: error_code(0x0002) - not-present page
>>>   PGD 0 P4D 0
>>>   Oops: 0002 [#1] PREEMPT SMP NOPTI
>>>   CPU: 3 PID: 447 Comm: udevd Not tainted 6.0.0+ #5179
>>>   Hardware name: System manufacturer System Product Name/M3A78 PRO, BIOS 1701    01/27/2011
>>>   RIP: 0010:drm_sched_fini+0x80/0xa0 [gpu_sched]
>>>   Code: 76 83 0e c4 c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d c3 cc cc cc cc 4c 8d 63 f0 4c 89 e7 e8 08 99 8e c4 48 8b 03 48 39 d8 74 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 c9 99 8e
>>>   RSP: 0018:ffffbeb3c06bfbb8 EFLAGS: 00010213
>>>   RAX: 0000000000000000 RBX: ffff99bae8269a98 RCX: ffff99bab703afc0
>>>   RDX: 0000000000000001 RSI: ffff99bab703afe8 RDI: 0000000000000000
>>>   RBP: ffff99bae82699f0 R08: ffffffff85cd0bc2 R09: 0000000000000010
>>>   R10: 0000000000000035 R11: ffff99bb594806c0 R12: ffff99bae8269a88
>>>   R13: ffff99bae82699f8 R14: ffff99bae82665e8 R15: 0000000000000000
>>>   FS:  00007fd81fcd9840(0000) GS:ffff99bb67cc0000(0000) knlGS:0000000000000000
>>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>   CR2: 0000000000000090 CR3: 0000000111822000 CR4: 00000000000006e0
>>>   Call Trace:
>>>    <TASK>
>>>    amdgpu_fence_driver_sw_fini+0xc2/0xd0 [amdgpu]
>>>    amdgpu_device_fini_sw+0x17/0x3c0 [amdgpu]
>>>    amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
>>>    devm_drm_dev_init_release+0x4a/0x70 [drm]
>>>    release_nodes+0x40/0xb0
>>>    devres_release_all+0x89/0xc0
>>>    device_unbind_cleanup+0xe/0x70
>>>    really_probe+0x245/0x3a0
>>>    ? pm_runtime_barrier+0x61/0xb0
>>>    __driver_probe_device+0x78/0x170
>>>    driver_probe_device+0x2d/0xb0
>>>    __driver_attach+0xdc/0x1d0
>>>    ? __device_attach_driver+0x100/0x100
>>>    bus_for_each_dev+0x69/0xa0
>>>    bus_add_driver+0x1d4/0x230
>>>    ? _raw_spin_unlock+0x15/0x40
>>>    driver_register+0x89/0xe0
>>>    ? 0xffffffffc0c3b000
>>>    do_one_initcall+0x44/0x200
>>>    ? __kmem_cache_alloc_node+0x90/0x360
>>>    ? kmalloc_trace+0x38/0xc0
>>>    do_init_module+0x4a/0x1e0
>>>    __do_sys_finit_module+0xb5/0x130
>>>    do_syscall_64+0x3a/0x90
>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>   RIP: 0033:0x7fd81ff5b1b9
>>>   Code: 08 44 89 e0 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 1c 0d 00 f7 d8 64 89 01 48
>>>   RSP: 002b:00007ffc5b37cbb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>   RAX: ffffffffffffffda RBX: 000055e5f2f6a140 RCX: 00007fd81ff5b1b9
>>>   RDX: 0000000000000000 RSI: 000055e5f2f67e30 RDI: 0000000000000017
>>>   RBP: 000055e5f2f67e30 R08: 0000000000000000 R09: 000055e5f2f46700
>>>   R10: 0000000000000017 R11: 0000000000000246 R12: 0000000000020000
>>>   R13: 0000000000000000 R14: 000055e5f2f65b00 R15: 0000000000000024
>>>    </TASK>
>>>   Modules linked in: amdgpu(+) snd_emu10k1_synth snd_emux_synth snd_seq_midi_emul snd_seq_virmidi snd_seq_midi snd_seq_midi_event snd_seq wmi_bmof snd_emu10k1 edac_mce_amd gpu_sched drm_buddy video kvm_amd drm_ttm_helper ttm snd_util_mem drm_display_helper snd_ac97_codec ccp drm_kms_helper snd_hda_codec_hdmi rng_core ac97_bus snd_rawmidi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_seq_device drm kvm snd_hwdep snd_pcm_oss snd_mixer_oss evdev serio_raw snd_pcm irqbypass i2c_algo_bit fb_sys_fops syscopyarea sysfillrect emu10k1_gp pcspkr gameport k10temp snd_timer sysimgblt snd acpi_cpufreq wmi soundcore button sp5100_tco asus_atk0110 ext4 crc16 mbcache jbd2 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic uas usb_storage sg sd_mod hid_generic t10_pi usbhid hid sr_mod cdrom crc64_rocksoft crc64 ata_generic ahci pata_atiixp libahci ohci_pci firewire_ohci libata firewire_core crc_itu_t xhci_pci scsi_mod ohci_hcd r8169 ehci_pci xhci_hcd
>>>    realtek ehci_hcd mdio_devres i2c_piix4 scsi_common usbcore libphy usb_common
>>>   CR2: 0000000000000090
>>>   ---[ end trace 0000000000000000 ]---
>>>   RIP: 0010:drm_sched_fini+0x80/0xa0 [gpu_sched]
>>>   Code: 76 83 0e c4 c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d c3 cc cc cc cc 4c 8d 63 f0 4c 89 e7 e8 08 99 8e c4 48 8b 03 48 39 d8 74 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 c9 99 8e
>>>   RSP: 0018:ffffbeb3c06bfbb8 EFLAGS: 00010213
>>>   RAX: 0000000000000000 RBX: ffff99bae8269a98 RCX: ffff99bab703afc0
>>>   RDX: 0000000000000001 RSI: ffff99bab703afe8 RDI: 0000000000000000
>>>   RBP: ffff99bae82699f0 R08: ffffffff85cd0bc2 R09: 0000000000000010
>>>   R10: 0000000000000035 R11: ffff99bb594806c0 R12: ffff99bae8269a88
>>>   R13: ffff99bae82699f8 R14: ffff99bae82665e8 R15: 0000000000000000
>>>   FS:  00007fd81fcd9840(0000) GS:ffff99bb67cc0000(0000) knlGS:0000000000000000
>>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>   CR2: 0000000000000090 CR3: 0000000111822000 CR4: 00000000000006e0
>>>   note: udevd[447] exited with preempt_count 1
>>>   udevd[433]: worker [447] terminated by signal 9 (Killed)
>>>   udevd[433]: worker [447] failed while handling '/devices/pci0000:00/0000:00:02.0/0000:01:00.0'
>>>   r8169 0000:03:00.0 eth0: Link is Up - 1Gbps/Full - flow control off
>>>   IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>   Adding 4194300k swap on /dev/sda4.  Priority:-2 extents:1 across:4194300k FS
>>>   EXT4-fs (sda5): re-mounted. Quota mode: none.
>>>   lp: driver loaded but no devices found
>>>   ppdev: user-space parallel port driver
>>>   it87: Found IT8716F chip at 0xe80, revision 3
>>>   ACPI Warning: SystemIO range 0x0000000000000E85-0x0000000000000E86 conflicts with OpRegion 0x0000000000000E85-0x0000000000000E86 (\_SB.PCI0.SBRG.ASOC.HWRE) (20220331/utaddress-204)
>>>   ACPI: OSL: Resource conflict; ACPI support missing from driver?
>>>   BUG: unable to handle page fault for address: 00000000000065c0
>>>   #PF: supervisor read access in kernel mode
>>>   #PF: error_code(0x0000) - not-present page
>>>   PGD 0 P4D 0
>>>   Oops: 0000 [#2] PREEMPT SMP NOPTI
>>>   CPU: 2 PID: 55 Comm: kworker/2:1 Tainted: G      D            6.0.0+ #5179
>>>   Hardware name: System manufacturer System Product Name/M3A78 PRO, BIOS 1701    01/27/2011
>>>   Workqueue: events output_poll_execute [drm_kms_helper]
>>>   RIP: 0010:amdgpu_device_rreg.part.0+0x39/0x100 [amdgpu]
>>>   Code: 6c 24 08 48 89 fb 4c 89 64 24 10 44 8d 24 b5 00 00 00 00 4c 3b a7 88 08 00 00 89 f5 73 70 83 e2 02 74 2f 4c 03 a3 90 08 00 00 <45> 8b 24 24 48 8b 43 08 0f b7 70 3e 66 90 44 89 e0 48 8b 1c 24 48
>>>   RSP: 0018:ffffbeb3c0717c48 EFLAGS: 00010206
>>>   RAX: 0000000000000000 RBX: ffff99bae8260000 RCX: 0000000000000000
>>>   RDX: 0000000000000000 RSI: 0000000000001970 RDI: ffff99bae8260000
>>>   RBP: 0000000000001970 R08: ffffbeb3c0717e08 R09: 0000000000000000
>>>   R10: 0000000000000018 R11: fefefefefefefeff R12: 00000000000065c0
>>>   R13: ffffbeb3c0717d70 R14: 0000000000000000 R15: 000000010005e340
>>>   FS:  0000000000000000(0000) GS:ffff99bb67c80000(0000) knlGS:0000000000000000
>>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>   CR2: 00000000000065c0 CR3: 000000008980a000 CR4: 00000000000006e0
>>>   Call Trace:
>>>    <TASK>
>>>    amdgpu_i2c_pre_xfer+0x163/0x180 [amdgpu]
>>>    bit_xfer+0x36/0x530 [i2c_algo_bit]
>>>    __i2c_transfer+0x185/0x550
>>>    i2c_transfer+0xa2/0x110
>>>    amdgpu_display_ddc_probe+0xbd/0x100 [amdgpu]
>>>    amdgpu_connector_vga_detect+0x8e/0x200 [amdgpu]
>>>    drm_helper_probe_detect_ctx+0x7b/0xd0 [drm_kms_helper]
>>>    output_poll_execute+0x152/0x220 [drm_kms_helper]
>>>    process_one_work+0x1ae/0x370
>>>    worker_thread+0x4d/0x3b0
>>>    ? rescuer_thread+0x380/0x380
>>>    kthread+0xe3/0x110
>>>    ? kthread_complete_and_exit+0x20/0x20
>>>    ret_from_fork+0x22/0x30
>>>    </TASK>
>>>   Modules linked in: max6650 hwmon_vid parport_pc ppdev lp parport amdgpu(+) snd_emu10k1_synth snd_emux_synth snd_seq_midi_emul snd_seq_virmidi snd_seq_midi snd_seq_midi_event snd_seq wmi_bmof snd_emu10k1 edac_mce_amd gpu_sched drm_buddy video kvm_amd drm_ttm_helper ttm snd_util_mem drm_display_helper snd_ac97_codec ccp drm_kms_helper snd_hda_codec_hdmi rng_core ac97_bus snd_rawmidi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_seq_device drm kvm snd_hwdep snd_pcm_oss snd_mixer_oss evdev serio_raw snd_pcm irqbypass i2c_algo_bit fb_sys_fops syscopyarea sysfillrect emu10k1_gp pcspkr gameport k10temp snd_timer sysimgblt snd acpi_cpufreq wmi soundcore button sp5100_tco asus_atk0110 ext4 crc16 mbcache jbd2 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic uas usb_storage sg sd_mod hid_generic t10_pi usbhid hid sr_mod cdrom crc64_rocksoft crc64 ata_generic ahci pata_atiixp libahci ohci_pci firewire_ohci libata firewire_core crc_itu_t xhci_pci
>>>    scsi_mod ohci_hcd r8169 ehci_pci xhci_hcd realtek ehci_hcd mdio_devres i2c_piix4 scsi_common usbcore libphy usb_common
>>>   CR2: 00000000000065c0
>>>   ---[ end trace 0000000000000000 ]---
>>>   RIP: 0010:drm_sched_fini+0x80/0xa0 [gpu_sched]
>>>   Code: 76 83 0e c4 c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d c3 cc cc cc cc 4c 8d 63 f0 4c 89 e7 e8 08 99 8e c4 48 8b 03 48 39 d8 74 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 c9 99 8e
>>>   RSP: 0018:ffffbeb3c06bfbb8 EFLAGS: 00010213
>>>   RAX: 0000000000000000 RBX: ffff99bae8269a98 RCX: ffff99bab703afc0
>>>   RDX: 0000000000000001 RSI: ffff99bab703afe8 RDI: 0000000000000000
>>>   RBP: ffff99bae82699f0 R08: ffffffff85cd0bc2 R09: 0000000000000010
>>>   R10: 0000000000000035 R11: ffff99bb594806c0 R12: ffff99bae8269a88
>>>   R13: ffff99bae82699f8 R14: ffff99bae82665e8 R15: 0000000000000000
>>>   FS:  0000000000000000(0000) GS:ffff99bb67c80000(0000) knlGS:0000000000000000
>>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>   CR2: 00000000000065c0 CR3: 000000008980a000 CR4: 00000000000006e0
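The commit message quoted above says the DRM buddy manager may satisfy a contiguous request with a single block or with several, so the TTM move/eviction path has to examine every block before treating the buffer as contiguous. What follows is a minimal sketch of that idea in plain C, not the patch itself. `struct vram_block` and `blocks_are_contiguous()` are hypothetical stand-ins for the driver's real resource-cursor and buddy-block types, and sizes are in bytes rather than pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one block handed out by a buddy allocator:
 * a byte offset into VRAM and a size in bytes. */
struct vram_block {
	uint64_t start;
	uint64_t size;
};

/*
 * Returns true only if the blocks, taken in order, form one gap-free range
 * whose total size equals expected_size (for example num_pages << PAGE_SHIFT).
 */
bool blocks_are_contiguous(const struct vram_block *blocks, size_t nblocks,
			   uint64_t expected_size)
{
	uint64_t end;
	size_t i;

	assert(blocks != NULL || nblocks == 0);
	if (nblocks == 0)
		return false;

	end = blocks[0].start + blocks[0].size;
	for (i = 1; i < nblocks; i++) {
		/* Each block must begin exactly where the previous one ended. */
		if (blocks[i].start != end)
			return false;
		end += blocks[i].size;
	}
	/* Every block counts toward the total, not just the first one. */
	return end - blocks[0].start == expected_size;
}
```

Checking adjacency as well as the total matters: two blocks can add up to the expected size while leaving a hole between them, and such a buffer cannot be accessed as one linear range.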
  
Arunpravin Paneer Selvam Oct. 17, 2022, 6:54 a.m. UTC | #7
Hi Arthur,

Is this an old Radeon card?

Thanks,
Arun

On 10/17/2022 11:50 AM, Christian König wrote:
> Arun, please take a look into this ASAP.
>
> Thanks,
> Christian.
>
> Am 17.10.22 um 03:13 schrieb Arthur Marsh:
>> Thanks Dave, I reverted patch 
>> 312b4dc11d4f74bfe03ea25ffe04c1f2fdd13cb9 against 6.1-rc1 and the 
>> resulting kernel loaded amdgpu fine on my pc with Cape Verde GPU.
>>
>> Regards,
>>
>> Arthur.
>>
>> On 17 October 2022 8:14:18 am ACDT, Dave Airlie <airlied@gmail.com> wrote:
>>> On Sun, 16 Oct 2022 at 18:09, Arthur Marsh
>>> <arthur.marsh@internode.on.net> wrote:
>>>> From: Arthur Marsh <arthur.marsh@internode.on.net>
>>>>
>>>> Hi, the "drm fixes for 6.1-rc1" commit caused the amdgpu module to fail
>>>> with my Cape Verde radeonsi card.
>>>>
>>>> I haven't been able to bisect the problem to an individual commit, but
>>>> attach a dmesg extract below.
>>>>
>>>> I'm happy to supply any other configuration information and test patches.
>>>>
>>> Can you try reverting: it's the only thing I can spot that might
>>> affect a card that old since most changes in that request were for
>>> display hw you don't have.
>>>
>>> commit 312b4dc11d4f74bfe03ea25ffe04c1f2fdd13cb9
>>> Author: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
>>> Date:   Tue Oct 4 07:33:39 2022 -0700
>>>
>>>     drm/amdgpu: Fix VRAM BO swap issue
>>>
>>>     DRM buddy manager allocates the contiguous memory requests in
>>>     a single block or multiple blocks. So for the ttm move operation
>>>     (in case of low vram memory) we should consider all the blocks to
>>>     compute the total memory size which is compared with the struct
>>>     ttm_resource num_pages in order to verify that the blocks are
>>>     contiguous for the eviction process.
>>>
>>>     v2: Added a Fixes tag
>>>     v3: Rewrite the code to save a bit of calculations and
>>>         variables (Christian)
>>>
>>>     Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
>>>     Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
>>>     Reviewed-by: Christian König <christian.koenig@amd.com>
>>>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>
>>>
>>> Thanks,
>>> Dave.
>>>
>>>> Arthur.
>>>>
>
  
Christian König Oct. 17, 2022, 7:07 a.m. UTC | #8
Hi Arun,

the hw generation doesn't matter. This error message here:

amdgpu: Move buffer fallback to memcpy unavailable

indicates that the detection of linear buffers still doesn't work as 
expected or that we have a bug somewhere else.

Maybe the limiting that applies when SDMA moves are not available isn't
working correctly?
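To make the failure mode concrete: the message roughly indicates that a buffer had to be moved without the GPU copy path, and the CPU memcpy fallback was refused because one side of the move was judged not CPU-accessible. The sketch below models that decision under simplified, assumed rules. `struct placement`, `enum move_path` and `choose_move_path()` are hypothetical and do not mirror amdgpu's actual TTM move code. If the contiguity check misreports a linear VRAM buffer as split, the same inputs flip from `MOVE_MEMCPY` to `MOVE_FAIL`, which is one way the error above could appear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical summary of where one side of a buffer move lives. */
struct placement {
	bool in_vram;     /* false: system memory/GTT, assumed CPU-accessible */
	bool contiguous;  /* VRAM backing forms one linear range */
	bool cpu_visible; /* VRAM range lies inside the CPU-visible BAR */
};

enum move_path { MOVE_DMA, MOVE_MEMCPY, MOVE_FAIL };

static bool cpu_can_access(const struct placement *p)
{
	if (!p->in_vram)
		return true;
	/* In this model, a CPU copy needs one linear, BAR-visible VRAM range. */
	return p->contiguous && p->cpu_visible;
}

/* Prefer the GPU copy engine; otherwise fall back to a CPU memcpy if both
 * sides are CPU-accessible; otherwise give up. */
enum move_path choose_move_path(bool dma_engine_ready,
				const struct placement *from,
				const struct placement *to)
{
	assert(from != NULL && to != NULL);

	if (dma_engine_ready)
		return MOVE_DMA;
	if (!cpu_can_access(from) || !cpu_can_access(to)) {
		fprintf(stderr, "move: memcpy fallback unavailable\n");
		return MOVE_FAIL;
	}
	return MOVE_MEMCPY;
}
```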

Regards,
Christian.

Am 17.10.22 um 08:54 schrieb Arunpravin Paneer Selvam:
> Hi Arthur,
>
> Is this an old Radeon card?
>
> Thanks,
> Arun
>
> On 10/17/2022 11:50 AM, Christian König wrote:
>> Arun, please take a look into this ASAP.
>>
>> Thanks,
>> Christian.
>>
>> Am 17.10.22 um 03:13 schrieb Arthur Marsh:
>>> Thanks Dave, I reverted patch 
>>> 312b4dc11d4f74bfe03ea25ffe04c1f2fdd13cb9 against 6.1-rc1 and the 
>>> resulting kernel loaded amdgpu fine on my pc with Cape Verde GPU.
>>>
>>> Regards,
>>>
>>> Arthur.
>>>
>>> On 17 October 2022 8:14:18 am ACDT, Dave Airlie <airlied@gmail.com> wrote:
>>>> On Sun, 16 Oct 2022 at 18:09, Arthur Marsh
>>>> <arthur.marsh@internode.on.net> wrote:
>>>>> From: Arthur Marsh <arthur.marsh@internode.on.net>
>>>>>
>>>>> Hi, the "drm fixes for 6.1-rc1" commit caused the amdgpu module to fail
>>>>> with my Cape Verde radeonsi card.
>>>>>
>>>>> I haven't been able to bisect the problem to an individual commit, but
>>>>> attach a dmesg extract below.
>>>>>
>>>>> I'm happy to supply any other configuration information and test patches.
>>>>>
>>>> Can you try reverting: it's the only thing I can spot that might
>>>> affect a card that old since most changes in that request were for
>>>> display hw you don't have.
>>>>
>>>> commit 312b4dc11d4f74bfe03ea25ffe04c1f2fdd13cb9
>>>> Author: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
>>>> Date:   Tue Oct 4 07:33:39 2022 -0700
>>>>
>>>>     drm/amdgpu: Fix VRAM BO swap issue
>>>>
>>>>     DRM buddy manager allocates the contiguous memory requests in
>>>>     a single block or multiple blocks. So for the ttm move operation
>>>>     (in case of low vram memory) we should consider all the blocks to
>>>>     compute the total memory size which is compared with the struct
>>>>     ttm_resource num_pages in order to verify that the blocks are
>>>>     contiguous for the eviction process.
>>>>
>>>>     v2: Added a Fixes tag
>>>>     v3: Rewrite the code to save a bit of calculations and
>>>>         variables (Christian)
>>>>
>>>>     Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
>>>>     Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
>>>>     Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>>
>>>>
>>>> Thanks,
>>>> Dave.
>>>>
>>>>> Arthur.
>>>>>
>>>>>   Linux version 6.0.0+ (root@am64) (gcc-12 (Debian 12.2.0-5) 12.2.0, GNU ld (GNU Binutils for Debian) 2.39) #5179 SMP PREEMPT_DYNAMIC Fri Oct 14 17:00:40 ACDT 2022
>>>>>   Command line: BOOT_IMAGE=/vmlinuz-6.0.0+ root=UUID=39706f53-7c27-4310-b22a-36c7b042d1a1 ro single amdgpu.audio=1 amdgpu.si_support=1 radeon.si_support=0 page_owner=on amdgpu.gpu_recovery=1
>>>>> ...
>>>>>
>>>>>   [drm] amdgpu kernel modesetting enabled.
>>>>>   amdgpu 0000:01:00.0: vgaarb: deactivate vga console
>>>>>   Console: switching to colour dummy device 80x25
>>>>>   [drm] initializing kernel modesetting (VERDE 0x1002:0x682B 0x1458:0x22CA 0x87).
>>>>>   [drm] register mmio base: 0xFE8C0000
>>>>>   [drm] register mmio size: 262144
>>>>>   [drm] add ip block number 0 <si_common>
>>>>>   [drm] add ip block number 1 <gmc_v6_0>
>>>>>   [drm] add ip block number 2 <si_ih>
>>>>>   [drm] add ip block number 3 <gfx_v6_0>
>>>>>   [drm] add ip block number 4 <si_dma>
>>>>>   [drm] add ip block number 5 <si_dpm>
>>>>>   [drm] add ip block number 6 <dce_v6_0>
>>>>>   [drm] add ip block number 7 <uvd_v3_1>
>>>>>   [drm] BIOS signature incorrect 5b 7
>>>>>   resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000dffff window]
>>>>>   caller pci_map_rom+0x68/0x1b0 mapping multiple BARs
>>>>>   amdgpu 0000:01:00.0: No more image in the PCI ROM
>>>>>   amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
>>>>>   amdgpu: ATOM BIOS: xxx-xxx-xxx
>>>>>   amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
>>>>>   amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
>>>>>   [drm] PCIE gen 2 link speeds already enabled
>>>>>   [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
>>>>>   RTL8211B Gigabit Ethernet r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
>>>>>   r8169 0000:03:00.0 eth0: Link is Down
>>>>>   amdgpu 0000:01:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
>>>>>   amdgpu 0000:01:00.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
>>>>>   [drm] Detected VRAM RAM=2048M, BAR=256M
>>>>>   [drm] RAM width 128bits DDR3
>>>>>   [drm] amdgpu: 2048M of VRAM memory ready
>>>>>   [drm] amdgpu: 3979M of GTT memory ready.
>>>>>   [drm] GART: num cpu pages 262144, num gpu pages 262144
>>>>>   amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F400A00000).
>>>>>   [drm] Internal thermal controller with fan control
>>>>>   [drm] amdgpu: dpm initialized
>>>>>   [drm] AMDGPU Display Connectors
>>>>>   [drm] Connector 0:
>>>>>   [drm]   HDMI-A-1
>>>>>   [drm]   HPD1
>>>>>   [drm]   DDC: 0x194c 0x194c 0x194d 0x194d 0x194e 0x194e 0x194f 0x194f
>>>>>   [drm]   Encoders:
>>>>>   [drm]     DFP1: INTERNAL_UNIPHY
>>>>>   [drm] Connector 1:
>>>>>   [drm]   DVI-D-1
>>>>>   [drm]   HPD2
>>>>>   [drm]   DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
>>>>>   [drm]   Encoders:
>>>>>   [drm]     DFP2: INTERNAL_UNIPHY
>>>>>   [drm] Connector 2:
>>>>>   [drm]   VGA-1
>>>>>   [drm]   DDC: 0x1970 0x1970 0x1971 0x1971 0x1972 0x1972 0x1973 0x1973
>>>>>   [drm]   Encoders:
>>>>>   [drm]     CRT1: INTERNAL_KLDSCP_DAC1
>>>>>   [drm] Found UVD firmware Version: 64.0 Family ID: 13
>>>>>   amdgpu: Move buffer fallback to memcpy unavailable
>>>>>   [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* sw_init of IP block <uvd_v3_1> failed -19
>>>>>   amdgpu 0000:01:00.0: amdgpu: amdgpu_device_ip_init failed
>>>>>   amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init
>>>>>   amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.
>>>>>   BUG: kernel NULL pointer dereference, address: 0000000000000090
>>>>>   #PF: supervisor write access in kernel mode
>>>>>   #PF: error_code(0x0002) - not-present page
>>>>>   PGD 0 P4D 0
>>>>>   Oops: 0002 [#1] PREEMPT SMP NOPTI
>>>>>   CPU: 3 PID: 447 Comm: udevd Not tainted 6.0.0+ #5179
>>>>>   Hardware name: System manufacturer System Product Name/M3A78 PRO, BIOS 1701    01/27/2011
>>>>>   RIP: 0010:drm_sched_fini+0x80/0xa0 [gpu_sched]
>>>>>   Code: 76 83 0e c4 c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d c3 cc 
>>>>> cc cc cc 4c 8d 63 f0 4c 89 e7 e8 08 99 8e c4 48 8b 03 48 39 d8 74 
>>>>> 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 c9 
>>>>> 99 8e
>>>>>   RSP: 0018:ffffbeb3c06bfbb8 EFLAGS: 00010213
>>>>>   RAX: 0000000000000000 RBX: ffff99bae8269a98 RCX: ffff99bab703afc0
>>>>>   RDX: 0000000000000001 RSI: ffff99bab703afe8 RDI: 0000000000000000
>>>>>   RBP: ffff99bae82699f0 R08: ffffffff85cd0bc2 R09: 0000000000000010
>>>>>   R10: 0000000000000035 R11: ffff99bb594806c0 R12: ffff99bae8269a88
>>>>>   R13: ffff99bae82699f8 R14: ffff99bae82665e8 R15: 0000000000000000
>>>>>   FS:  00007fd81fcd9840(0000) GS:ffff99bb67cc0000(0000) knlGS:0000000000000000
>>>>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>   CR2: 0000000000000090 CR3: 0000000111822000 CR4: 00000000000006e0
>>>>>   Call Trace:
>>>>>    <TASK>
>>>>>    amdgpu_fence_driver_sw_fini+0xc2/0xd0 [amdgpu]
>>>>>    amdgpu_device_fini_sw+0x17/0x3c0 [amdgpu]
>>>>>    amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
>>>>>    devm_drm_dev_init_release+0x4a/0x70 [drm]
>>>>>    release_nodes+0x40/0xb0
>>>>>    devres_release_all+0x89/0xc0
>>>>>    device_unbind_cleanup+0xe/0x70
>>>>>    really_probe+0x245/0x3a0
>>>>>    ? pm_runtime_barrier+0x61/0xb0
>>>>>    __driver_probe_device+0x78/0x170
>>>>>    driver_probe_device+0x2d/0xb0
>>>>>    __driver_attach+0xdc/0x1d0
>>>>>    ? __device_attach_driver+0x100/0x100
>>>>>    bus_for_each_dev+0x69/0xa0
>>>>>    bus_add_driver+0x1d4/0x230
>>>>>    ? _raw_spin_unlock+0x15/0x40
>>>>>    driver_register+0x89/0xe0
>>>>>    ? 0xffffffffc0c3b000
>>>>>    do_one_initcall+0x44/0x200
>>>>>    ? __kmem_cache_alloc_node+0x90/0x360
>>>>>    ? kmalloc_trace+0x38/0xc0
>>>>>    do_init_module+0x4a/0x1e0
>>>>>    __do_sys_finit_module+0xb5/0x130
>>>>>    do_syscall_64+0x3a/0x90
>>>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>   RIP: 0033:0x7fd81ff5b1b9
>>>>>   Code: 08 44 89 e0 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 48 89 
>>>>> f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 
>>>>> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 1c 0d 00 f7 d8 64 89 
>>>>> 01 48
>>>>>   RSP: 002b:00007ffc5b37cbb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>>>   RAX: ffffffffffffffda RBX: 000055e5f2f6a140 RCX: 00007fd81ff5b1b9
>>>>>   RDX: 0000000000000000 RSI: 000055e5f2f67e30 RDI: 0000000000000017
>>>>>   RBP: 000055e5f2f67e30 R08: 0000000000000000 R09: 000055e5f2f46700
>>>>>   R10: 0000000000000017 R11: 0000000000000246 R12: 0000000000020000
>>>>>   R13: 0000000000000000 R14: 000055e5f2f65b00 R15: 0000000000000024
>>>>>    </TASK>
>>>>>   Modules linked in: amdgpu(+) snd_emu10k1_synth snd_emux_synth 
>>>>> snd_seq_midi_emul snd_seq_virmidi snd_seq_midi snd_seq_midi_event 
>>>>> snd_seq wmi_bmof snd_emu10k1 edac_mce_amd gpu_sched drm_buddy 
>>>>> video kvm_amd drm_ttm_helper ttm snd_util_mem drm_display_helper 
>>>>> snd_ac97_codec ccp drm_kms_helper snd_hda_codec_hdmi rng_core 
>>>>> ac97_bus snd_rawmidi snd_hda_intel snd_intel_dspcfg snd_hda_codec 
>>>>> snd_hda_core snd_seq_device drm kvm snd_hwdep snd_pcm_oss 
>>>>> snd_mixer_oss evdev serio_raw snd_pcm irqbypass i2c_algo_bit 
>>>>> fb_sys_fops syscopyarea sysfillrect emu10k1_gp pcspkr gameport 
>>>>> k10temp snd_timer sysimgblt snd acpi_cpufreq wmi soundcore button 
>>>>> sp5100_tco asus_atk0110 ext4 crc16 mbcache jbd2 btrfs 
>>>>> blake2b_generic xor raid6_pq zstd_compress libcrc32c 
>>>>> crc32c_generic uas usb_storage sg sd_mod hid_generic t10_pi usbhid 
>>>>> hid sr_mod cdrom crc64_rocksoft crc64 ata_generic ahci pata_atiixp 
>>>>> libahci ohci_pci firewire_ohci libata firewire_core crc_itu_t 
>>>>> xhci_pci scsi_mod ohci_hcd r8169 ehci_pci xhci_hcd
>>>>>    realtek ehci_hcd mdio_devres i2c_piix4 scsi_common usbcore 
>>>>> libphy usb_common
>>>>>   CR2: 0000000000000090
>>>>>   ---[ end trace 0000000000000000 ]---
>>>>>   RIP: 0010:drm_sched_fini+0x80/0xa0 [gpu_sched]
>>>>>   Code: 76 83 0e c4 c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d c3 cc 
>>>>> cc cc cc 4c 8d 63 f0 4c 89 e7 e8 08 99 8e c4 48 8b 03 48 39 d8 74 
>>>>> 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 c9 
>>>>> 99 8e
>>>>>   RSP: 0018:ffffbeb3c06bfbb8 EFLAGS: 00010213
>>>>>   RAX: 0000000000000000 RBX: ffff99bae8269a98 RCX: ffff99bab703afc0
>>>>>   RDX: 0000000000000001 RSI: ffff99bab703afe8 RDI: 0000000000000000
>>>>>   RBP: ffff99bae82699f0 R08: ffffffff85cd0bc2 R09: 0000000000000010
>>>>>   R10: 0000000000000035 R11: ffff99bb594806c0 R12: ffff99bae8269a88
>>>>>   R13: ffff99bae82699f8 R14: ffff99bae82665e8 R15: 0000000000000000
>>>>>   FS:  00007fd81fcd9840(0000) GS:ffff99bb67cc0000(0000) knlGS:0000000000000000
>>>>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>   CR2: 0000000000000090 CR3: 0000000111822000 CR4: 00000000000006e0
>>>>>   note: udevd[447] exited with preempt_count 1
>>>>>   udevd[433]: worker [447] terminated by signal 9 (Killed)
>>>>>   udevd[433]: worker [447] failed while handling '/devices/pci0000:00/0000:00:02.0/0000:01:00.0'
>>>>>   r8169 0000:03:00.0 eth0: Link is Up - 1Gbps/Full - flow control off
>>>>>   IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>>>   Adding 4194300k swap on /dev/sda4.  Priority:-2 extents:1 across:4194300k FS
>>>>>   EXT4-fs (sda5): re-mounted. Quota mode: none.
>>>>>   lp: driver loaded but no devices found
>>>>>   ppdev: user-space parallel port driver
>>>>>   it87: Found IT8716F chip at 0xe80, revision 3
>>>>>   ACPI Warning: SystemIO range 0x0000000000000E85-0x0000000000000E86 conflicts with OpRegion 0x0000000000000E85-0x0000000000000E86 (\_SB.PCI0.SBRG.ASOC.HWRE) (20220331/utaddress-204)
>>>>>   ACPI: OSL: Resource conflict; ACPI support missing from driver?
>>>>>   BUG: unable to handle page fault for address: 00000000000065c0
>>>>>   #PF: supervisor read access in kernel mode
>>>>>   #PF: error_code(0x0000) - not-present page
>>>>>   PGD 0 P4D 0
>>>>>   Oops: 0000 [#2] PREEMPT SMP NOPTI
>>>>>   CPU: 2 PID: 55 Comm: kworker/2:1 Tainted: G D 6.0.0+ #5179
>>>>>   Hardware name: System manufacturer System Product Name/M3A78 PRO, BIOS 1701    01/27/2011
>>>>>   Workqueue: events output_poll_execute [drm_kms_helper]
>>>>>   RIP: 0010:amdgpu_device_rreg.part.0+0x39/0x100 [amdgpu]
>>>>>   Code: 6c 24 08 48 89 fb 4c 89 64 24 10 44 8d 24 b5 00 00 00 00 
>>>>> 4c 3b a7 88 08 00 00 89 f5 73 70 83 e2 02 74 2f 4c 03 a3 90 08 00 
>>>>> 00 <45> 8b 24 24 48 8b 43 08 0f b7 70 3e 66 90 44 89 e0 48 8b 1c 
>>>>> 24 48
>>>>>   RSP: 0018:ffffbeb3c0717c48 EFLAGS: 00010206
>>>>>   RAX: 0000000000000000 RBX: ffff99bae8260000 RCX: 0000000000000000
>>>>>   RDX: 0000000000000000 RSI: 0000000000001970 RDI: ffff99bae8260000
>>>>>   RBP: 0000000000001970 R08: ffffbeb3c0717e08 R09: 0000000000000000
>>>>>   R10: 0000000000000018 R11: fefefefefefefeff R12: 00000000000065c0
>>>>>   R13: ffffbeb3c0717d70 R14: 0000000000000000 R15: 000000010005e340
>>>>>   FS:  0000000000000000(0000) GS:ffff99bb67c80000(0000) knlGS:0000000000000000
>>>>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>   CR2: 00000000000065c0 CR3: 000000008980a000 CR4: 00000000000006e0
>>>>>   Call Trace:
>>>>>    <TASK>
>>>>>    amdgpu_i2c_pre_xfer+0x163/0x180 [amdgpu]
>>>>>    bit_xfer+0x36/0x530 [i2c_algo_bit]
>>>>>    __i2c_transfer+0x185/0x550
>>>>>    i2c_transfer+0xa2/0x110
>>>>>    amdgpu_display_ddc_probe+0xbd/0x100 [amdgpu]
>>>>>    amdgpu_connector_vga_detect+0x8e/0x200 [amdgpu]
>>>>>    drm_helper_probe_detect_ctx+0x7b/0xd0 [drm_kms_helper]
>>>>>    output_poll_execute+0x152/0x220 [drm_kms_helper]
>>>>>    process_one_work+0x1ae/0x370
>>>>>    worker_thread+0x4d/0x3b0
>>>>>    ? rescuer_thread+0x380/0x380
>>>>>    kthread+0xe3/0x110
>>>>>    ? kthread_complete_and_exit+0x20/0x20
>>>>>    ret_from_fork+0x22/0x30
>>>>>    </TASK>
>>>>>   Modules linked in: max6650 hwmon_vid parport_pc ppdev lp parport 
>>>>> amdgpu(+) snd_emu10k1_synth snd_emux_synth snd_seq_midi_emul 
>>>>> snd_seq_virmidi snd_seq_midi snd_seq_midi_event snd_seq wmi_bmof 
>>>>> snd_emu10k1 edac_mce_amd gpu_sched drm_buddy video kvm_amd 
>>>>> drm_ttm_helper ttm snd_util_mem drm_display_helper snd_ac97_codec 
>>>>> ccp drm_kms_helper snd_hda_codec_hdmi rng_core ac97_bus 
>>>>> snd_rawmidi snd_hda_intel snd_intel_dspcfg snd_hda_codec 
>>>>> snd_hda_core snd_seq_device drm kvm snd_hwdep snd_pcm_oss 
>>>>> snd_mixer_oss evdev serio_raw snd_pcm irqbypass i2c_algo_bit 
>>>>> fb_sys_fops syscopyarea sysfillrect emu10k1_gp pcspkr gameport 
>>>>> k10temp snd_timer sysimgblt snd acpi_cpufreq wmi soundcore button 
>>>>> sp5100_tco asus_atk0110 ext4 crc16 mbcache jbd2 btrfs 
>>>>> blake2b_generic xor raid6_pq zstd_compress libcrc32c 
>>>>> crc32c_generic uas usb_storage sg sd_mod hid_generic t10_pi usbhid 
>>>>> hid sr_mod cdrom crc64_rocksoft crc64 ata_generic ahci pata_atiixp 
>>>>> libahci ohci_pci firewire_ohci libata firewire_core crc_itu_t 
>>>>> xhci_pci
>>>>>    scsi_mod ohci_hcd r8169 ehci_pci xhci_hcd realtek ehci_hcd 
>>>>> mdio_devres i2c_piix4 scsi_common usbcore libphy usb_common
>>>>>   CR2: 00000000000065c0
>>>>>   ---[ end trace 0000000000000000 ]---
>>>>>   RIP: 0010:drm_sched_fini+0x80/0xa0 [gpu_sched]
>>>>>   Code: 76 83 0e c4 c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d c3 cc 
>>>>> cc cc cc 4c 8d 63 f0 4c 89 e7 e8 08 99 8e c4 48 8b 03 48 39 d8 74 
>>>>> 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 c9 
>>>>> 99 8e
>>>>>   RSP: 0018:ffffbeb3c06bfbb8 EFLAGS: 00010213
>>>>>   RAX: 0000000000000000 RBX: ffff99bae8269a98 RCX: ffff99bab703afc0
>>>>>   RDX: 0000000000000001 RSI: ffff99bab703afe8 RDI: 0000000000000000
>>>>>   RBP: ffff99bae82699f0 R08: ffffffff85cd0bc2 R09: 0000000000000010
>>>>>   R10: 0000000000000035 R11: ffff99bb594806c0 R12: ffff99bae8269a88
>>>>>   R13: ffff99bae82699f8 R14: ffff99bae82665e8 R15: 0000000000000000
>>>>>   FS:  0000000000000000(0000) GS:ffff99bb67c80000(0000) knlGS:0000000000000000
>>>>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>   CR2: 00000000000065c0 CR3: 000000008980a000 CR4: 00000000000006e0
>>
>
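The contiguity check described in the quoted "Fix VRAM BO swap issue" commit message (add up the sizes of the buddy blocks and compare the total with the ttm_resource num_pages) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the amdgpu code: `struct blk`, `blocks_contiguous()` and the 4 KiB page size are hypothetical, and only blocks that follow one another without a gap are counted, so the total reaches the resource size only when the memory forms one contiguous range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for one buddy-allocator block: [start, start + size). */
struct blk {
	uint64_t start; /* byte offset in VRAM */
	uint64_t size;  /* length in bytes */
};

/*
 * Hypothetical helper: sum the sizes of the blocks that each begin exactly
 * where the previous one ended, then compare that total with the resource
 * size (num_pages pages). A gap stops the count, so the comparison succeeds
 * only for a single contiguous range.
 */
static bool blocks_contiguous(const struct blk *b, size_t n, uint64_t num_pages)
{
	uint64_t expected = num_pages << SKETCH_PAGE_SHIFT;
	uint64_t total;
	size_t i;

	if (n == 0)
		return false;

	total = b[0].size;
	for (i = 1; i < n; i++) {
		if (b[i].start != b[i - 1].start + b[i - 1].size)
			break; /* gap found: stop counting the contiguous run */
		total += b[i].size;
	}
	return total == expected;
}
```

For example, two blocks `{0, 4096}` and `{4096, 4096}` describing a 2-page resource pass the check, while `{0, 4096}` and `{8192, 4096}` do not, because the second block does not start where the first one ends.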
  
Dave Airlie Oct. 17, 2022, 8:01 a.m. UTC | #9
On Mon, 17 Oct 2022 at 17:07, Christian König <christian.koenig@amd.com> wrote:
>
> Hi Arun,
>
> the hw generation doesn't matter. This error message here:
>
> amdgpu: Move buffer fallback to memcpy unavailable
>
> indicates that the detection of linear buffers still doesn't work as
> expected or that we have a bug somewhere else.
>
> Maybe the limiting when SDMA moves are not available isn't working
> correctly?

It is a CAPE_VERDE, so maybe something with the SI UVD memory limitations?

Dave.
  
Christian König Oct. 17, 2022, 8:09 a.m. UTC | #10
Am 17.10.22 um 10:01 schrieb Dave Airlie:
> On Mon, 17 Oct 2022 at 17:07, Christian König <christian.koenig@amd.com> wrote:
>> Hi Arun,
>>
>> the hw generation doesn't matter. This error message here:
>>
>> amdgpu: Move buffer fallback to memcpy unavailable
>>
>> indicates that the detection of linear buffers still doesn't work as
>> expected or that we have a bug somewhere else.
>>
>> Maybe the limiting when SDMA moves are not available isn't working
>> correctly?
> It is a CAPE_VERDE, so maybe something with the SI UVD memory limitations?

Yeah, good point. Could be that we try to move something into the UVD 
memory window and that something isn't allocated linearly.

Arun can you trace the allocation and make sure that all kernel 
allocations have the CONTIGUOUS flag set?

Thanks,
Christian.

>
> Dave.
  
Arunpravin Paneer Selvam Oct. 17, 2022, 8:40 p.m. UTC | #11
Hi Christian,

Looks like we have to exit the loop if there are no blocks to compare.
Maybe that's why the function returns false.

@Arthur Marsh Could you please test the attached patch.

Thanks,
Arun

On 10/17/2022 1:39 PM, Christian König wrote:
> Am 17.10.22 um 10:01 schrieb Dave Airlie:
>> On Mon, 17 Oct 2022 at 17:07, Christian König 
>> <christian.koenig@amd.com> wrote:
>>> Hi Arun,
>>>
>>> the hw generation doesn't matter. This error message here:
>>>
>>> amdgpu: Move buffer fallback to memcpy unavailable
>>>
>>> indicates that the detection of linear buffers still doesn't work as
>>> expected or that we have a bug somewhere else.
>>>
>>> Maybe the limiting when SDMA moves are not available isn't working
>>> correctly?
>> It is a CAPE_VERDE, so maybe something with the SI UVD memory 
>> limitations?
>
> Yeah, good point. Could be that we try to move something into the UVD 
> memory window and that something isn't allocated linearly.
>
> Arun can you trace the allocation and make sure that all kernel 
> allocations have the CONTIGUOUS flag set?
>
> Thanks,
> Christian.
>
>>
>> Dave.
>
From 132ce83f893eaea743fb7f41a9dc72afea52cdaa Mon Sep 17 00:00:00 2001
From: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Date: Mon, 17 Oct 2022 13:15:21 -0700
Subject: [PATCH] drm/amdgpu: Fix for BO move issue

If there are no more blocks to compare, exit
the loop.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index dc262d2c2925..57277b1cf183 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -439,6 +439,9 @@ static bool amdgpu_mem_visible(struct amdgpu_device *adev,
 	while (cursor.remaining) {
 		amdgpu_res_next(&cursor, cursor.size);
 
+		if (!cursor.remaining)
+			break;
+
 		/* ttm_resource_ioremap only supports contiguous memory */
 		if (end != cursor.start)
 			return false;
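The reason the loop needs the extra exit can be seen in a small userspace model of the walk in amdgpu_mem_visible(). The model assumes, as the patch implies, that once the cursor advance consumes the last block it returns without updating cursor.start. The comparison `end != cursor.start` then fails even for a single contiguous block, which is consistent with the "Move buffer fallback to memcpy unavailable" message in Arthur's log. `struct blk`, `struct cursor`, `res_first()`, `res_next()` and `is_contiguous()` are hypothetical simplifications, and the real function also checks the range against the CPU-visible VRAM size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one VRAM block: [start, start + size). */
struct blk {
	uint64_t start;
	uint64_t size;
};

/* Simplified model of a resource cursor walking a block list. */
struct cursor {
	const struct blk *blocks;
	size_t idx;         /* index of the current block */
	uint64_t start;     /* start of the current chunk */
	uint64_t size;      /* bytes left in the current chunk */
	uint64_t remaining; /* bytes left in the whole resource */
};

static void res_first(const struct blk *b, size_t n, struct cursor *c)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += b[i].size;
	c->blocks = b;
	c->idx = 0;
	c->start = b[0].start;
	c->size = b[0].size;
	c->remaining = total;
}

/* Assumed behaviour: when nothing remains, return without touching start/size. */
static void res_next(struct cursor *c, uint64_t size)
{
	c->remaining -= size;
	if (!c->remaining)
		return;
	c->size -= size;
	if (c->size) {
		c->start += size;
		return;
	}
	c->idx++;
	c->start = c->blocks[c->idx].start;
	c->size = c->blocks[c->idx].size;
}

/* True if the blocks form one range; `fixed` enables the early break from the patch. */
static bool is_contiguous(const struct blk *b, size_t n, bool fixed)
{
	struct cursor c;
	uint64_t end;

	if (n == 0)
		return false;
	res_first(b, n, &c);
	end = c.start + c.size;
	while (c.remaining) {
		res_next(&c, c.size);
		if (fixed && !c.remaining)
			break; /* no further block to compare against */
		if (end != c.start)
			return false;
		end = c.start + c.size;
	}
	return true;
}
```

With `fixed` set to false, even a single block `{0, 4096}` is reported as non-contiguous, because the stale cursor.start (0) never equals `end` (4096). With the early break it is accepted, while a layout with a real gap is still rejected.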
  
Arthur Marsh Oct. 18, 2022, 1:28 a.m. UTC | #12
Thanks, Arunpravin. With your patch applied to the 6.1-rc1 code, I built a kernel that loaded the amdgpu module on my PC with the Cape Verde GPU card with no problems.

Regards,

Arthur. 

On 18 October 2022 7:10:45 am ACDT, Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com> wrote:
>Hi Christian,
>
>Looks like we have to exit the loop if there are no blocks to compare.
>Maybe that's why the function returns false.
>
>@Arthur Marsh Could you please test the attached patch.
>
>Thanks,
>Arun
>
>On 10/17/2022 1:39 PM, Christian König wrote:
>> Am 17.10.22 um 10:01 schrieb Dave Airlie:
>>> On Mon, 17 Oct 2022 at 17:07, Christian König <christian.koenig@amd.com> wrote:
>>>> Hi Arun,
>>>> 
>>>> the hw generation doesn't matter. This error message here:
>>>> 
>>>> amdgpu: Move buffer fallback to memcpy unavailable
>>>> 
>>>> indicates that the detection of linear buffers still doesn't work as
>>>> expected or that we have a bug somewhere else.
>>>> 
>>>> Maybe the limiting when SDMA moves are not available isn't working
>>>> correctly?
>>> It is a CAPE_VERDE, so maybe something with the SI UVD memory limitations?
>> 
>> Yeah, good point. Could be that we try to move something into the UVD memory window and that something isn't allocated linearly.
>> 
>> Arun can you trace the allocation and make sure that all kernel allocations have the CONTIGUOUS flag set?
>> 
>> Thanks,
>> Christian.
>> 
>>> 
>>> Dave.
>>