[0/4] Add LVTS support for mt8192

Message ID 20230307163413.143334-1-bchihi@baylibre.com

Message

Balsam CHIHI March 7, 2023, 4:34 p.m. UTC
  From: Balsam CHIHI <bchihi@baylibre.com>

Add full LVTS support (MCU thermal domain + AP thermal domain) to MediaTek MT8192 SoC.

This series is a continuation of the previous series "Add LVTS Thermal Architecture" v14 :
    https://patchwork.kernel.org/project/linux-pm/cover/20230209105628.50294-1-bchihi@baylibre.com/
and "Add LVTS's AP thermal domain support for mt8195" :
    https://patchwork.kernel.org/project/linux-pm/cover/20230307154524.118541-1-bchihi@baylibre.com/

Based on top of thermal/linux-next :
    base-commit: 6828e402d06f7c574430b61c05db784cd847b19f

Depends on these patches, as they are not yet applied to the thermal/linux-next branch :
    [1/4] dt-bindings: thermal: mediatek: Add AP domain to LVTS thermal controllers for mt8195
    https://patchwork.kernel.org/project/linux-pm/patch/20230307154524.118541-2-bchihi@baylibre.com/
    [2/4] thermal/drivers/mediatek/lvts_thermal: Add AP domain for mt8195
    https://patchwork.kernel.org/project/linux-pm/patch/20230307154524.118541-3-bchihi@baylibre.com/

Balsam CHIHI (4):
  dt-bindings: thermal: mediatek: Add LVTS thermal controller definition
    for mt8192
  thermal/drivers/mediatek/lvts_thermal: Add mt8192 support
  arm64: dts: mediatek: mt8192: Add thermal zones and thermal nodes
  arm64: dts: mediatek: mt8192: Add temperature mitigation threshold

 arch/arm64/boot/dts/mediatek/mt8192.dtsi      | 454 ++++++++++++++++++
 drivers/thermal/mediatek/lvts_thermal.c       | 106 +++-
 .../thermal/mediatek,lvts-thermal.h           |  19 +
 3 files changed, 577 insertions(+), 2 deletions(-)


base-commit: 6828e402d06f7c574430b61c05db784cd847b19f
prerequisite-patch-id: 73be949bd16979769e5b94905b244dcee4a8f687
prerequisite-patch-id: 9076e9b3bd3cc411b7b80344211364db5f0cca17
prerequisite-patch-id: e220d6ae26786f524c249588433f02e5f5f906ad
prerequisite-patch-id: 58e295ae36ad4784f3eb3830412f35dad31bb8b6
prerequisite-patch-id: d23d83a946e5b876ef01a717fd51b07df1fa08dd
prerequisite-patch-id: d67f2455eef1c4a9ecc460dbf3c2e3ad47d213ec
prerequisite-patch-id: b407d2998e57678952128b3a4bac92a379132b09
prerequisite-patch-id: fbb9212ce8c3530da17d213f56fa334ce4fa1b2b
prerequisite-patch-id: 5db9eed2659028cf4419f2de3d093af7df6c2dad
prerequisite-patch-id: a83c00c628605d1c8fbe1d97074f9f28efb1bcfc
  

Comments

Chen-Yu Tsai March 9, 2023, 5:04 a.m. UTC | #1
On Wed, Mar 8, 2023 at 12:34 AM <bchihi@baylibre.com> wrote:
>
> From: Balsam CHIHI <bchihi@baylibre.com>
>
> Add full LVTS support (MCU thermal domain + AP thermal domain) to MediaTek MT8192 SoC.
>
> This series is a continuation of the previous series "Add LVTS Thermal Architecture" v14 :
>     https://patchwork.kernel.org/project/linux-pm/cover/20230209105628.50294-1-bchihi@baylibre.com/
> and "Add LVTS's AP thermal domain support for mt8195" :
>     https://patchwork.kernel.org/project/linux-pm/cover/20230307154524.118541-1-bchihi@baylibre.com/
>
> Based on top of thermal/linux-next :
>     base-commit: 6828e402d06f7c574430b61c05db784cd847b19f
>
> Depends on these patches as they are not yet applied to thermal/linux-next branch :
>     [1/4] dt-bindings: thermal: mediatek: Add AP domain to LVTS thermal controllers for mt8195
>     https://patchwork.kernel.org/project/linux-pm/patch/20230307154524.118541-2-bchihi@baylibre.com/
>     [2/4] thermal/drivers/mediatek/lvts_thermal: Add AP domain for mt8195
>     https://patchwork.kernel.org/project/linux-pm/patch/20230307154524.118541-3-bchihi@baylibre.com/
>
> Balsam CHIHI (4):
>   dt-bindings: thermal: mediatek: Add LVTS thermal controller definition
>     for mt8192
>   thermal/drivers/mediatek/lvts_thermal: Add mt8192 support
>   arm64: dts: mediatek: mt8192: Add thermal zones and thermal nodes
>   arm64: dts: mediatek: mt8192: Add temperature mitigation threshold

I tried this on my Hayato. As soon as lvts_ap probes and its thermal zones
are registered, a "critical temperature reached" warning is immediately
triggered for all the zones and a reboot is forced. A NULL pointer dereference
is also triggered somewhere. Here is the trace, with all the interspersed
"critical temperature" messages filtered out:

[    2.943847] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000600
[    2.958818] Mem abort info:
[    2.965996]   ESR = 0x0000000096000005
[    2.973765] SMCCC: SOC_ID: ID = jep106:0426:8192 Revision = 0x00000000
[    2.975442]   EC = 0x25: DABT (current EL), IL = 32 bits
[    2.987305]   SET = 0, FnV = 0
[    2.995521]   EA = 0, S1PTW = 0
[    3.004265]   FSC = 0x05: level 1 translation fault
[    3.014365] Data abort info:
[    3.017344]   ISV = 0, ISS = 0x00000005
[    3.021279]   CM = 0, WnR = 0
[    3.022124] GACT probability NOT on
[    3.024277] [0000000000000600] user address but active_mm is swapper
[    3.034190] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[    3.044738] Modules linked in:
[    3.044745] CPU: 0 PID: 97 Comm: irq/273-1100b00 Not tainted 6.3.0-rc1-next-20230308-01996-g3c0b9a61a3e5-dirty #575 c7b94096b594a95f18217c2ad4a2bd6d2c431108
[    3.044751] Hardware name: Google Hayato rev1 (DT)
[    3.044755] pstate: 60000009 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    3.052055] pc : __mutex_lock+0x60/0x438
[    3.052066] lr : __mutex_lock+0x54/0x438
[    3.052070] sp : ffffffc008883c60
[    3.070822] x29: ffffffc008883c60 x28: ffffff80c281a880 x27: 000881f00009001f
[    3.070830] x26: 1fc0000000247c00 x25: ffffff80c281a900 x24: 0000000000000000
[    3.070837] x23: 0000000000000000 x22: ffffffe5ae5d45f4 x21: 0000000000000002
[    3.086211] x20: 0000000000000000 x19: 00000000000005a0 x18: ffffffffffffffff
[    3.086218] x17: 6568636165722065 x16: 727574617265706d x15: 0000000000000028
[    3.097773] x14: 0000000000000000 x13: 0000000000003395 x12: ffffffe5af7f6ff0
[    3.097780] x11: 65706d655428206e x10: 0000000000000000 x9 : ffffffe5adcf4b08
[    3.097787] x8 : ffffffe5afe03230 x7 : 00000000000261b0 x6 : ffffff80c2b86600
[    3.105609] x5 : 0000000000000000 x4 : ffffff80c2b86600 x3 : 0000000000000000
[    3.112565] x2 : ffffff9b505f6000 x1 : 0000000000000000 x0 : 0000000000000000
[    3.127593] Call trace:
[    3.127595]  __mutex_lock+0x60/0x438
[    3.127600]  mutex_lock_nested+0x34/0x48
[    3.141844]  thermal_zone_device_update+0x34/0x80
[    3.152879]  lvts_irq_handler+0xbc/0x158
[    3.152886]  irq_thread_fn+0x34/0xb8
[    3.161489]  irq_thread+0x19c/0x298
[    3.161494]  kthread+0x11c/0x128
[    3.175152]  ret_from_fork+0x10/0x20
[    3.175163] Code: 97ccbb7c 9000bea0 b9411400 35000080 (f9403260)
[    3.189402] ---[ end trace 0000000000000000 ]---
[    3.193417] Kernel panic - not syncing: Oops: Fatal exception
[    3.201255] Kernel Offset: 0x25a5c00000 from 0xffffffc008000000
[    3.201257] PHYS_OFFSET: 0x40000000
[    3.201259] CPU features: 0x600000,01700506,3200720b
[    3.201263] Memory Limit: none
[    3.376838] Rebooting in 30 seconds..
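
As a reading aid for the trace above: the bracketed word on the "Code:" line,
f9403260, is the instruction that faulted. The sketch below (helper names are
illustrative, not taken from the kernel) decodes it as an AArch64 64-bit
"LDR Xt, [Xn, #imm]". It comes out as "ldr x0, [x19, #0x60]", and with
x19 = 0x5a0 from the register dump the load address is 0x600, which matches
the reported fault address. A base register that small is consistent with
thermal_zone_device_update() having been handed a NULL zone pointer, but the
trace alone does not say for which sensor.

```c
#include <stdint.h>

/* Fields of an AArch64 "LDR Xt, [Xn, #imm]" (64-bit, unsigned offset). */
struct ldr_x_uimm {
	int valid;            /* 1 if the word matches this encoding */
	unsigned int rt;      /* destination register number */
	unsigned int rn;      /* base register number */
	uint64_t byte_offset; /* imm12 scaled by the 8-byte access size */
};

static struct ldr_x_uimm decode_ldr_x_uimm(uint32_t insn)
{
	struct ldr_x_uimm d = { 0 };

	/* Bits [31:22] are fixed to 0b1111100101 for this encoding. */
	if ((insn & 0xffc00000u) != 0xf9400000u)
		return d;

	d.valid = 1;
	d.rt = insn & 0x1f;
	d.rn = (insn >> 5) & 0x1f;
	d.byte_offset = (uint64_t)((insn >> 10) & 0xfff) * 8;
	return d;
}

/* Address the load touches, given the value held in the base register. */
static uint64_t ldr_fault_address(struct ldr_x_uimm d, uint64_t base_value)
{
	return base_value + d.byte_offset;
}
```

Feeding it 0xf9403260 and the x19 value 0x5a0 reproduces the faulting
address from the first line of the oops.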


  
Balsam CHIHI March 9, 2023, 10:47 a.m. UTC | #2
Hi Chen-Yu,

On Thu, Mar 9, 2023 at 6:04 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
>
> On Wed, Mar 8, 2023 at 12:34 AM <bchihi@baylibre.com> wrote:
> >
> > From: Balsam CHIHI <bchihi@baylibre.com>
> >
> > Add full LVTS support (MCU thermal domain + AP thermal domain) to MediaTek MT8192 SoC.
> >
> > This series is a continuation of the previous series "Add LVTS Thermal Architecture" v14 :
> >     https://patchwork.kernel.org/project/linux-pm/cover/20230209105628.50294-1-bchihi@baylibre.com/
> > and "Add LVTS's AP thermal domain support for mt8195" :
> >     https://patchwork.kernel.org/project/linux-pm/cover/20230307154524.118541-1-bchihi@baylibre.com/
> >
> > Based on top of thermal/linux-next :
> >     base-commit: 6828e402d06f7c574430b61c05db784cd847b19f
> >
> > Depends on these patches as they are not yet applied to thermal/linux-next branch :
> >     [1/4] dt-bindings: thermal: mediatek: Add AP domain to LVTS thermal controllers for mt8195
> >     https://patchwork.kernel.org/project/linux-pm/patch/20230307154524.118541-2-bchihi@baylibre.com/
> >     [2/4] thermal/drivers/mediatek/lvts_thermal: Add AP domain for mt8195
> >     https://patchwork.kernel.org/project/linux-pm/patch/20230307154524.118541-3-bchihi@baylibre.com/
> >
> > Balsam CHIHI (4):
> >   dt-bindings: thermal: mediatek: Add LVTS thermal controller definition
> >     for mt8192
> >   thermal/drivers/mediatek/lvts_thermal: Add mt8192 support
> >   arm64: dts: mediatek: mt8192: Add thermal zones and thermal nodes
> >   arm64: dts: mediatek: mt8192: Add temperature mitigation threshold
>
> I tried this on my Hayato. As soon as lvts_ap probes and its thermal zones
> are registered, a "critical temperature reached" warning is immediately
> triggered for all the zones, a reboot is forced. A NULL pointer dereference
> is also triggered somewhere. I filtered out all the interspersed "critical
> temperature" messages:
>

Thank you very much for testing!
It seems like interrupts on mt8192 and mt8195 do not behave the same way.
I am investigating the issues.

> [    2.943847] Unable to handle kernel NULL pointer dereference at
> virtual address 0000000000000600
> [    2.958818] Mem abort info:
> [    2.965996]   ESR = 0x0000000096000005
> [    2.973765] SMCCC: SOC_ID: ID = jep106:0426:8192 Revision = 0x00000000
> [    2.975442]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    2.987305]   SET = 0, FnV = 0
> [    2.995521]   EA = 0, S1PTW = 0
> [    3.004265]   FSC = 0x05: level 1 translation fault
> [    3.014365] Data abort info:
> [    3.017344]   ISV = 0, ISS = 0x00000005
> [    3.021279]   CM = 0, WnR = 0
> [    3.022124] GACT probability NOT on
> [    3.024277] [0000000000000600] user address but active_mm is swapper
> [    3.034190] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
> [    3.044738] Modules linked in:
> [    3.044745] CPU: 0 PID: 97 Comm: irq/273-1100b00 Not tainted
> 6.3.0-rc1-next-20230308-01996-g3c0b9a61a3e5-dirty #575
> c7b94096b594a95f18217c2ad4a2bd6d2c431108
> [    3.044751] Hardware name: Google Hayato rev1 (DT)
> [    3.044755] pstate: 60000009 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    3.052055] pc : __mutex_lock+0x60/0x438
> [    3.052066] lr : __mutex_lock+0x54/0x438
> [    3.052070] sp : ffffffc008883c60
> [    3.070822] x29: ffffffc008883c60 x28: ffffff80c281a880 x27: 000881f00009001f
> [    3.070830] x26: 1fc0000000247c00 x25: ffffff80c281a900 x24: 0000000000000000
> [    3.070837] x23: 0000000000000000 x22: ffffffe5ae5d45f4 x21: 0000000000000002
> [    3.086211] x20: 0000000000000000 x19: 00000000000005a0 x18: ffffffffffffffff
> [    3.086218] x17: 6568636165722065 x16: 727574617265706d x15: 0000000000000028
> [    3.097773] x14: 0000000000000000 x13: 0000000000003395 x12: ffffffe5af7f6ff0
> [    3.097780] x11: 65706d655428206e x10: 0000000000000000 x9 : ffffffe5adcf4b08
> [    3.097787] x8 : ffffffe5afe03230 x7 : 00000000000261b0 x6 : ffffff80c2b86600
> [    3.105609] x5 : 0000000000000000 x4 : ffffff80c2b86600 x3 : 0000000000000000
> [    3.112565] x2 : ffffff9b505f6000 x1 : 0000000000000000 x0 : 0000000000000000
> [    3.127593] Call trace:
> [    3.127595]  __mutex_lock+0x60/0x438
> [    3.127600]  mutex_lock_nested+0x34/0x48
> [    3.141844]  thermal_zone_device_update+0x34/0x80
> [    3.152879]  lvts_irq_handler+0xbc/0x158
> [    3.152886]  irq_thread_fn+0x34/0xb8
> [    3.161489]  irq_thread+0x19c/0x298
> [    3.161494]  kthread+0x11c/0x128
> [    3.175152]  ret_from_fork+0x10/0x20
> [    3.175163] Code: 97ccbb7c 9000bea0 b9411400 35000080 (f9403260)
> [    3.189402] ---[ end trace 0000000000000000 ]---
> [    3.193417] Kernel panic - not syncing: Oops: Fatal exception
> [    3.201255] Kernel Offset: 0x25a5c00000 from 0xffffffc008000000
> [    3.201257] PHYS_OFFSET: 0x40000000
> [    3.201259] CPU features: 0x600000,01700506,3200720b
> [    3.201263] Memory Limit: none
> [    3.376838] Rebooting in 30 seconds..
>
>
[...]

Best regards,
Balsam
  
Balsam CHIHI March 22, 2023, 12:48 p.m. UTC | #3
Hi Chen-Yu,

I suspect the bug comes from incorrect calibration data offsets for the AP
domain, since you confirmed that the MCU domain probes without issues.
Could you test something to help us confirm this theory, when you have the
time of course? (I don't have an mt8192 board on hand right now.)
We would like to replace the AP domain's calibration data offsets with a
known-working one, for example :

 static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
                {
-               .cal_offset = { 0x25, 0x28 },
+               .cal_offset = { 0x04, 0x04 },
                .lvts_sensor = {
                        { .dt_id = MT8192_AP_VPU0 },
                        { .dt_id = MT8192_AP_VPU1 }
@@ -1336,7 +1336,7 @@ static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
                .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
        },
        {
-               .cal_offset = { 0x2e, 0x31 },
+               .cal_offset = { 0x04, 0x04 },
                .lvts_sensor = {
                        { .dt_id = MT8192_AP_GPU0 },
                        { .dt_id = MT8192_AP_GPU1 }
@@ -1346,7 +1346,7 @@ static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
                .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
        },
        {
-               .cal_offset = { 0x37, 0x3a },
+               .cal_offset = { 0x04, 0x04 },
                .lvts_sensor = {
                        { .dt_id = MT8192_AP_INFRA },
                        { .dt_id = MT8192_AP_CAM },
@@ -1356,7 +1356,7 @@ static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
                .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
        },
        {
-               .cal_offset = { 0x40, 0x43, 0x46 },
+               .cal_offset = { 0x04, 0x04, 0x04 },
                .lvts_sensor = {
                        { .dt_id = MT8192_AP_MD0 },
                        { .dt_id = MT8192_AP_MD1 },

This example has been tested and works on mt8195
(all sensors use the same calibration data offset for testing purposes).

Thank you in advance for your help.

Best regards,
Balsam
  
Chen-Yu Tsai March 25, 2023, 4:33 a.m. UTC | #4
On Wed, Mar 22, 2023 at 8:48 PM Balsam CHIHI <bchihi@baylibre.com> wrote:
>
> Hi Chen-Yu,
>
> I suspect the bug comes from incorrect calibration data offsets for AP
> Domain because you confirm that MCU Domain probe runs without issues.
> Is it possible to test something for us to confirm this theory (i
> don't have an mt8192 board on hand now), when you have the time of
> course?
> We would like to test AP Domain's calibration data offsets with a
> working one, for example :
>
>  static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
>                 {
> -               .cal_offset = { 0x25, 0x28 },
> +               .cal_offset = { 0x04, 0x04 },
>                 .lvts_sensor = {
>                         { .dt_id = MT8192_AP_VPU0 },
>                         { .dt_id = MT8192_AP_VPU1 }
> @@ -1336,7 +1336,7 @@ static const struct lvts_ctrl_data
> mt8192_lvts_ap_data_ctrl[] = {
>                 .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
>         },
>         {
> -               .cal_offset = { 0x2e, 0x31 },
> +               .cal_offset = { 0x04, 0x04 },
>                 .lvts_sensor = {
>                         { .dt_id = MT8192_AP_GPU0 },
>                         { .dt_id = MT8192_AP_GPU1 }
> @@ -1346,7 +1346,7 @@ static const struct lvts_ctrl_data
> mt8192_lvts_ap_data_ctrl[] = {
>                 .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
>         },
>         {
> -               .cal_offset = { 0x37, 0x3a },
> +               .cal_offset = { 0x04, 0x04 },
>                 .lvts_sensor = {
>                         { .dt_id = MT8192_AP_INFRA },
>                         { .dt_id = MT8192_AP_CAM },
> @@ -1356,7 +1356,7 @@ static const struct lvts_ctrl_data
> mt8192_lvts_ap_data_ctrl[] = {
>                 .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
>         },
>         {
> -               .cal_offset = { 0x40, 0x43, 0x46 },
> +               .cal_offset = { 0x04, 0x04, 0x04 },
>                 .lvts_sensor = {
>                         { .dt_id = MT8192_AP_MD0 },
>                         { .dt_id = MT8192_AP_MD1 },
>
> This example is tested and works for mt8195,
> (all sensors use the same calibration data offset for testing purposes).
>
> Thank you in advance for your help.

The MCU ones are still tripping though. If I change all of them to 0x04,
then nothing trips. There's also a bug in the interrupt handling code
that needs to be dealt with.

AFAICT the calibration data is stored differently. If you look at ChromeOS's
downstream v5.10 driver, you'll see mt6873_efuse_to_cal_data() for MT8192,
and mt8195_efuse_to_cal_data() for MT8195. The difference boils down to this:
MT8195 has all data stored sequentially, while MT8192 has most data stored
in the lower 24 bits of each 32-bit word, and the highest 8 bits are then used
to pack data for the remaining sensors.
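
To make the layout difference concrete, here is a rough sketch. The 24/8 bit
split comes from the downstream description above. The helper names, the
little-endian byte order, and the 3-byte stride for the sequential case
(inferred from offsets such as 0x25 and 0x28 in the test diff) are
assumptions, not code from either driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word from an efuse byte buffer. */
static uint32_t efuse_word(const uint8_t *efuse, size_t word_index)
{
	const uint8_t *p = efuse + 4 * word_index;

	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* MT8192-style: the main value sits in the lower 24 bits of a word. */
static uint32_t word_low24(uint32_t word)
{
	return word & 0x00ffffffu;
}

/* MT8192-style: the top byte carries a fragment of some other value. */
static uint8_t word_high8(uint32_t word)
{
	return (uint8_t)(word >> 24);
}

/* Sequential style: 24-bit values packed back to back, 3 bytes apiece. */
static uint32_t packed24(const uint8_t *efuse, size_t value_index)
{
	const uint8_t *p = efuse + 3 * value_index;

	return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16;
}
```

Reading the second value with packed24() from a buffer that is really laid
out in 32-bit words pulls in the top byte of the first word, which is the
kind of misread that could yield bogus temperatures.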

Regards
ChenYu
  
Balsam CHIHI March 28, 2023, 12:20 a.m. UTC | #5
On Sat, Mar 25, 2023 at 5:33 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
>
> On Wed, Mar 22, 2023 at 8:48 PM Balsam CHIHI <bchihi@baylibre.com> wrote:
> >
> > Hi Chen-Yu,
> >
> > I suspect the bug comes from incorrect calibration data offsets for AP
> > Domain because you confirm that MCU Domain probe runs without issues.
> > Is it possible to test something for us to confirm this theory (i
> > don't have an mt8192 board on hand now), when you have the time of
> > course?
> > We would like to test AP Domain's calibration data offsets with a
> > working one, for example :
> >
> >  static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
> >                 {
> > -               .cal_offset = { 0x25, 0x28 },
> > +               .cal_offset = { 0x04, 0x04 },
> >                 .lvts_sensor = {
> >                         { .dt_id = MT8192_AP_VPU0 },
> >                         { .dt_id = MT8192_AP_VPU1 }
> > @@ -1336,7 +1336,7 @@ static const struct lvts_ctrl_data
> > mt8192_lvts_ap_data_ctrl[] = {
> >                 .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
> >         },
> >         {
> > -               .cal_offset = { 0x2e, 0x31 },
> > +               .cal_offset = { 0x04, 0x04 },
> >                 .lvts_sensor = {
> >                         { .dt_id = MT8192_AP_GPU0 },
> >                         { .dt_id = MT8192_AP_GPU1 }
> > @@ -1346,7 +1346,7 @@ static const struct lvts_ctrl_data
> > mt8192_lvts_ap_data_ctrl[] = {
> >                 .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
> >         },
> >         {
> > -               .cal_offset = { 0x37, 0x3a },
> > +               .cal_offset = { 0x04, 0x04 },
> >                 .lvts_sensor = {
> >                         { .dt_id = MT8192_AP_INFRA },
> >                         { .dt_id = MT8192_AP_CAM },
> > @@ -1356,7 +1356,7 @@ static const struct lvts_ctrl_data
> > mt8192_lvts_ap_data_ctrl[] = {
> >                 .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
> >         },
> >         {
> > -               .cal_offset = { 0x40, 0x43, 0x46 },
> > +               .cal_offset = { 0x04, 0x04, 0x04 },
> >                 .lvts_sensor = {
> >                         { .dt_id = MT8192_AP_MD0 },
> >                         { .dt_id = MT8192_AP_MD1 },
> >
> > This example is tested and works for mt8195,
> > (all sensors use the same calibration data offset for testing purposes).
> >
> > Thank you in advance for your help.
>
> The MCU ones are still tripping though. If I change all of them to 0x04,
> then nothing trips. There's also a bug in the interrupt handling code
> that needs to be dealt with.
>
> AFAICT the calibration data is stored differently. If you look at ChromeOS's
> downstream v5.10 driver, you'll see mt6873_efuse_to_cal_data() for MT8192,
> and mt8195_efuse_to_cal_data() for MT8195. The difference sums up to:
> MT8195 has all data sequentially stored, while MT8192 has most data stored
> in lower 24 bits of each 32-bit word, and the highest 8 bits are then used
> to pack data for the remaining sensors.
>
> Regards
> ChenYu

Hi Chen-Yu Tsai,

Thank you very much for helping me test this suggestion.

Indeed, calibration data is stored differently in the mt8192 compared to mt8195.
So, the mt8192's support will be delayed for now, to allow further debugging.

In the meantime, we will continue upstreaming only the remaining
mt8195 source code, so that it gets full LVTS support.
A new series will be submitted soon.

Could you please point me to the bug in the interrupt handling code?

Best regards,
Balsam
  
Chen-Yu Tsai March 28, 2023, 3:12 a.m. UTC | #6
On Tue, Mar 28, 2023 at 8:21 AM Balsam CHIHI <bchihi@baylibre.com> wrote:
>
> On Sat, Mar 25, 2023 at 5:33 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
> >
> > On Wed, Mar 22, 2023 at 8:48 PM Balsam CHIHI <bchihi@baylibre.com> wrote:
> > >
> > > Hi Chen-Yu,
> > >
> > > I suspect the bug comes from incorrect calibration data offsets for AP
> > > Domain because you confirm that MCU Domain probe runs without issues.
> > > Is it possible to test something for us to confirm this theory (i
> > > don't have an mt8192 board on hand now), when you have the time of
> > > course?
> > > We would like to test AP Domain's calibration data offsets with a
> > > working one, for example :
> > >
> > >  static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
> > >                 {
> > > -               .cal_offset = { 0x25, 0x28 },
> > > +               .cal_offset = { 0x04, 0x04 },
> > >                 .lvts_sensor = {
> > >                         { .dt_id = MT8192_AP_VPU0 },
> > >                         { .dt_id = MT8192_AP_VPU1 }
> > > @@ -1336,7 +1336,7 @@ static const struct lvts_ctrl_data
> > > mt8192_lvts_ap_data_ctrl[] = {
> > >                 .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
> > >         },
> > >         {
> > > -               .cal_offset = { 0x2e, 0x31 },
> > > +               .cal_offset = { 0x04, 0x04 },
> > >                 .lvts_sensor = {
> > >                         { .dt_id = MT8192_AP_GPU0 },
> > >                         { .dt_id = MT8192_AP_GPU1 }
> > > @@ -1346,7 +1346,7 @@ static const struct lvts_ctrl_data
> > > mt8192_lvts_ap_data_ctrl[] = {
> > >                 .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
> > >         },
> > >         {
> > > -               .cal_offset = { 0x37, 0x3a },
> > > +               .cal_offset = { 0x04, 0x04 },
> > >                 .lvts_sensor = {
> > >                         { .dt_id = MT8192_AP_INFRA },
> > >                         { .dt_id = MT8192_AP_CAM },
> > > @@ -1356,7 +1356,7 @@ static const struct lvts_ctrl_data
> > > mt8192_lvts_ap_data_ctrl[] = {
> > >                 .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
> > >         },
> > >         {
> > > -               .cal_offset = { 0x40, 0x43, 0x46 },
> > > +               .cal_offset = { 0x04, 0x04, 0x04 },
> > >                 .lvts_sensor = {
> > >                         { .dt_id = MT8192_AP_MD0 },
> > >                         { .dt_id = MT8192_AP_MD1 },
> > >
> > > This example is tested and works for mt8195,
> > > (all sensors use the same calibration data offset for testing purposes).
> > >
> > > Thank you in advance for your help.
> >
> > The MCU ones are still tripping though. If I change all of them to 0x04,
> > then nothing trips. There's also a bug in the interrupt handling code
> > that needs to be dealt with.
> >
> > AFAICT the calibration data is stored differently. If you look at ChromeOS's
> > downstream v5.10 driver, you'll see mt6873_efuse_to_cal_data() for MT8192,
> > and mt8195_efuse_to_cal_data() for MT8195. The difference sums up to:
> > MT8195 has all data sequentially stored, while MT8192 has most data stored
> > in lower 24 bits of each 32-bit word, and the highest 8 bits are then used
> > to pack data for the remaining sensors.
> >
> > Regards
> > ChenYu
>
> Hi Chen-Yu Tsai,
>
> Thank you very much for helping me testing this suggestion.
>
> Indeed, calibration data is stored differently in the mt8192 compared to mt8195.
> So, the mt8192's support will be delayed for now, to allow further debugging.
>
> In the mean time, we will only continue to upstream the remaining
> mt8195's source code, so it will get full LVTS support.
> A new series will be submitted soon.
>
> Would you please point me out to the bug in interrupt handling code?

I just sent out two patches and CC-ed you on them. They are here just in case:

https://lore.kernel.org/linux-pm/20230328031037.1361048-1-wenst@chromium.org/
https://lore.kernel.org/linux-pm/20230328031017.1360976-1-wenst@chromium.org/

ChenYu
  
Balsam CHIHI March 29, 2023, 8:05 a.m. UTC | #7
Hi Chen-Yu,

On Tue, Mar 28, 2023 at 5:12 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
>
> On Tue, Mar 28, 2023 at 8:21 AM Balsam CHIHI <bchihi@baylibre.com> wrote:
> >
> > On Sat, Mar 25, 2023 at 5:33 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
> > >
> > > On Wed, Mar 22, 2023 at 8:48 PM Balsam CHIHI <bchihi@baylibre.com> wrote:
> > > >
> > > > Hi Chen-Yu,
> > > >
> > > > I suspect the bug comes from incorrect calibration data offsets for AP
> > > > Domain because you confirm that MCU Domain probe runs without issues.
> > > > Is it possible to test something for us to confirm this theory (i
> > > > don't have an mt8192 board on hand now), when you have the time of
> > > > course?
> > > > We would like to test AP Domain's calibration data offsets with a
> > > > working one, for example :
> > > >
> > > >  static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
> > > >                 {
> > > > -               .cal_offset = { 0x25, 0x28 },
> > > > +               .cal_offset = { 0x04, 0x04 },
> > > >                 .lvts_sensor = {
> > > >                         { .dt_id = MT8192_AP_VPU0 },
> > > >                         { .dt_id = MT8192_AP_VPU1 }
> > > > @@ -1336,7 +1336,7 @@ static const struct lvts_ctrl_data
> > > > mt8192_lvts_ap_data_ctrl[] = {
> > > >                 .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
> > > >         },
> > > >         {
> > > > -               .cal_offset = { 0x2e, 0x31 },
> > > > +               .cal_offset = { 0x04, 0x04 },
> > > >                 .lvts_sensor = {
> > > >                         { .dt_id = MT8192_AP_GPU0 },
> > > >                         { .dt_id = MT8192_AP_GPU1 }
> > > > @@ -1346,7 +1346,7 @@ static const struct lvts_ctrl_data
> > > > mt8192_lvts_ap_data_ctrl[] = {
> > > >                 .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
> > > >         },
> > > >         {
> > > > -               .cal_offset = { 0x37, 0x3a },
> > > > +               .cal_offset = { 0x04, 0x04 },
> > > >                 .lvts_sensor = {
> > > >                         { .dt_id = MT8192_AP_INFRA },
> > > >                         { .dt_id = MT8192_AP_CAM },
> > > > @@ -1356,7 +1356,7 @@ static const struct lvts_ctrl_data
> > > > mt8192_lvts_ap_data_ctrl[] = {
> > > >                 .hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
> > > >         },
> > > >         {
> > > > -               .cal_offset = { 0x40, 0x43, 0x46 },
> > > > +               .cal_offset = { 0x04, 0x04, 0x04 },
> > > >                 .lvts_sensor = {
> > > >                         { .dt_id = MT8192_AP_MD0 },
> > > >                         { .dt_id = MT8192_AP_MD1 },
> > > >
> > > > This example is tested and works for mt8195,
> > > > (all sensors use the same calibration data offset for testing purposes).
> > > >
> > > > Thank you in advance for your help.
> > >
> > > The MCU ones are still tripping though. If I change all of them to 0x04,
> > > then nothing trips. There's also a bug in the interrupt handling code
> > > that needs to be dealt with.
> > >
> > > AFAICT the calibration data is stored differently. If you look at ChromeOS's
> > > downstream v5.10 driver, you'll see mt6873_efuse_to_cal_data() for MT8192,
> > > and mt8195_efuse_to_cal_data() for MT8195. The difference sums up to:
> > > MT8195 has all data sequentially stored, while MT8192 has most data stored
> > > in lower 24 bits of each 32-bit word, and the highest 8 bits are then used
> > > to pack data for the remaining sensors.
> > >
> > > Regards
> > > ChenYu
> >
> > Hi Chen-Yu Tsai,
> >
> > Thank you very much for helping me testing this suggestion.
> >
> > Indeed, calibration data is stored differently in the mt8192 compared to mt8195.
> > So, the mt8192's support will be delayed for now, to allow further debugging.
> >
> > In the mean time, we will only continue to upstream the remaining
> > mt8195's source code, so it will get full LVTS support.
> > A new series will be submitted soon.
> >
> > Would you please point me out to the bug in interrupt handling code?
>
> I just sent out two patches and CC-ed you on them. They are here just in case:
>
> https://lore.kernel.org/linux-pm/20230328031037.1361048-1-wenst@chromium.org/
> https://lore.kernel.org/linux-pm/20230328031017.1360976-1-wenst@chromium.org/

Well received. I'm testing them.
Thanks!

Best regards,
Balsam

>
> ChenYu
  
Nícolas F. R. A. Prado April 24, 2023, 10:21 p.m. UTC | #8
On Tue, Mar 28, 2023 at 02:20:24AM +0200, Balsam CHIHI wrote:
> On Sat, Mar 25, 2023 at 5:33 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
> >
> > On Wed, Mar 22, 2023 at 8:48 PM Balsam CHIHI <bchihi@baylibre.com> wrote:
> > >
> > > Hi Chen-Yu,
> > >
> > > I suspect the bug comes from incorrect calibration data offsets for AP
> > > Domain because you confirm that MCU Domain probe runs without issues.
> > > Is it possible to test something for us to confirm this theory (i
> > > don't have an mt8192 board on hand now), when you have the time of
> > > course?
> > > We would like to test AP Domain's calibration data offsets with a
> > > working one, for example :
> > >
> > >  static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
> > >                 {
> > > -               .cal_offset = { 0x25, 0x28 },
> > > +               .cal_offset = { 0x04, 0x04 },
> > >                 .lvts_sensor = {
> > >                         { .dt_id = MT8192_AP_VPU0 },
> > >                         { .dt_id = MT8192_AP_VPU1 }
> > > @@ -1336,7 +1336,7 @@ static const struct lvts_ctrl_data
[..]
> > >
> > > This example is tested and works for mt8195,
> > > (all sensors use the same calibration data offset for testing purposes).
> > >
> > > Thank you in advance for your help.
> >
> > The MCU ones are still tripping though. If I change all of them to 0x04,
> > then nothing trips. There's also a bug in the interrupt handling code
> > that needs to be dealt with.
> >
> > AFAICT the calibration data is stored differently. If you look at ChromeOS's
> > downstream v5.10 driver, you'll see mt6873_efuse_to_cal_data() for MT8192,
> > and mt8195_efuse_to_cal_data() for MT8195. The difference sums up to:
> > MT8195 has all data sequentially stored, while MT8192 has most data stored
> > in lower 24 bits of each 32-bit word, and the highest 8 bits are then used
> > to pack data for the remaining sensors.
> >
> > Regards
> > ChenYu
> 
> Hi Chen-Yu Tsai,
> 
> Thank you very much for helping me testing this suggestion.
> 
> Indeed, calibration data is stored differently in the mt8192 compared to mt8195.
> So, the mt8192's support will be delayed for now, to allow further debugging.
> 
> In the mean time, we will only continue to upstream the remaining
> mt8195's source code, so it will get full LVTS support.
> A new series will be submitted soon.

Hi Balsam,

Like Chen-Yu mentioned, the calibration data is stored with 4-byte alignment for
MT8192, but the data that is split between non-contiguous bytes is for the
thermal controllers (called Resistor-Capacitor Calibration downstream), not the
sensors. The controller calibration isn't currently handled in this driver (and
downstream it also isn't used, since a current value is read from the controller
instead), so we can just ignore those.

The patch below adjusts the addresses for the sensors and gives me reasonable
reads, so the machine no longer reboots. Can you integrate it into your series?
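
As a rough illustration of the layout described above, the corrected offsets in
the patch below all fall on a 4-byte stride starting at 0x04. The sketch below
is not driver code: the macro and function names are hypothetical, the stride
and base are inferred only from the offsets in the patch, and the little-endian
byte order of the efuse buffer is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants, inferred only from the offsets in the patch. */
#define MT8192_CAL_FIRST_OFFSET	0x04u	/* offset of the first sensor's word */
#define MT8192_CAL_STRIDE	4u	/* one 32-bit word per sensor */

/* Byte offset of the calibration word for the n-th sensor (0-based). */
static inline uint32_t mt8192_sensor_cal_offset(unsigned int n)
{
	return MT8192_CAL_FIRST_OFFSET + MT8192_CAL_STRIDE * n;
}

/*
 * Return the lower 24 bits of the 32-bit word at 'offset'.
 * Assumption: the efuse buffer is little-endian, so the lower 24 bits
 * are the first three bytes of the word. The top byte is ignored here.
 */
static inline uint32_t mt8192_sensor_cal_value(const uint8_t *efuse,
					       size_t offset)
{
	return (uint32_t)efuse[offset] |
	       ((uint32_t)efuse[offset + 1] << 8) |
	       ((uint32_t)efuse[offset + 2] << 16);
}
```

With these helpers, sensors 0 through 16 map to 0x04, 0x08, ... 0x44, which is
the same set of values the patch puts in the .cal_offset tables.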

Thanks,
Nícolas

From 4506f03b806f3eeb89887bac2c1c86d61da97281 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?N=C3=ADcolas=20F=2E=20R=2E=20A=2E=20Prado?=
 <nfraprado@collabora.com>
Date: Mon, 24 Apr 2023 17:42:42 -0400
Subject: [PATCH] thermal/drivers/mediatek/lvts_thermal: Fix calibration
 offsets for MT8192
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
---
 drivers/thermal/mediatek/lvts_thermal.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c
index b6956c89d557..f8afbc2ac190 100644
--- a/drivers/thermal/mediatek/lvts_thermal.c
+++ b/drivers/thermal/mediatek/lvts_thermal.c
@@ -1261,7 +1261,7 @@ static const struct lvts_ctrl_data mt8195_lvts_ap_data_ctrl[] = {
 
 static const struct lvts_ctrl_data mt8192_lvts_mcu_data_ctrl[] = {
 	{
-		.cal_offset = { 0x04, 0x07 },
+		.cal_offset = { 0x04, 0x08 },
 		.lvts_sensor = {
 			{ .dt_id = MT8192_MCU_BIG_CPU0 },
 			{ .dt_id = MT8192_MCU_BIG_CPU1 }
@@ -1271,7 +1271,7 @@ static const struct lvts_ctrl_data mt8192_lvts_mcu_data_ctrl[] = {
 		.hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
 	},
 	{
-		.cal_offset = { 0x0d, 0x10 },
+		.cal_offset = { 0x0c, 0x10 },
 		.lvts_sensor = {
 			{ .dt_id = MT8192_MCU_BIG_CPU2 },
 			{ .dt_id = MT8192_MCU_BIG_CPU3 }
@@ -1281,7 +1281,7 @@ static const struct lvts_ctrl_data mt8192_lvts_mcu_data_ctrl[] = {
 		.hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
 	},
 	{
-		.cal_offset = { 0x16, 0x19, 0x1c, 0x1f },
+		.cal_offset = { 0x14, 0x18, 0x1c, 0x20 },
 		.lvts_sensor = {
 			{ .dt_id = MT8192_MCU_LITTLE_CPU0 },
 			{ .dt_id = MT8192_MCU_LITTLE_CPU1 },
@@ -1296,7 +1296,7 @@ static const struct lvts_ctrl_data mt8192_lvts_mcu_data_ctrl[] = {
 
 static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
 		{
-		.cal_offset = { 0x25, 0x28 },
+		.cal_offset = { 0x24, 0x28 },
 		.lvts_sensor = {
 			{ .dt_id = MT8192_AP_VPU0 },
 			{ .dt_id = MT8192_AP_VPU1 }
@@ -1306,7 +1306,7 @@ static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
 		.hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
 	},
 	{
-		.cal_offset = { 0x2e, 0x31 },
+		.cal_offset = { 0x2c, 0x30 },
 		.lvts_sensor = {
 			{ .dt_id = MT8192_AP_GPU0 },
 			{ .dt_id = MT8192_AP_GPU1 }
@@ -1316,7 +1316,7 @@ static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
 		.hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
 	},
 	{
-		.cal_offset = { 0x37, 0x3a },
+		.cal_offset = { 0x34, 0x38 },
 		.lvts_sensor = {
 			{ .dt_id = MT8192_AP_INFRA },
 			{ .dt_id = MT8192_AP_CAM },
@@ -1326,7 +1326,7 @@ static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
 		.hw_tshut_temp = LVTS_HW_SHUTDOWN_MT8192,
 	},
 	{
-		.cal_offset = { 0x40, 0x43, 0x46 },
+		.cal_offset = { 0x3c, 0x40, 0x44 },
 		.lvts_sensor = {
 			{ .dt_id = MT8192_AP_MD0 },
 			{ .dt_id = MT8192_AP_MD1 },
  
Balsam CHIHI April 25, 2023, 8:36 a.m. UTC | #9
On Tue, Apr 25, 2023 at 12:21 AM Nícolas F. R. A. Prado
<nfraprado@collabora.com> wrote:
>
> On Tue, Mar 28, 2023 at 02:20:24AM +0200, Balsam CHIHI wrote:
> > On Sat, Mar 25, 2023 at 5:33 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
> > >
> > > On Wed, Mar 22, 2023 at 8:48 PM Balsam CHIHI <bchihi@baylibre.com> wrote:
> > > >
> > > > Hi Chen-Yu,
> > > >
> > > > I suspect the bug comes from incorrect calibration data offsets for AP
> > > > Domain because you confirm that MCU Domain probe runs without issues.
> > > > Is it possible to test something for us to confirm this theory (i
> > > > don't have an mt8192 board on hand now), when you have the time of
> > > > course?
> > > > We would like to test AP Domain's calibration data offsets with a
> > > > working one, for example :
> > > >
> > > >  static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
> > > >                 {
> > > > -               .cal_offset = { 0x25, 0x28 },
> > > > +               .cal_offset = { 0x04, 0x04 },
> > > >                 .lvts_sensor = {
> > > >                         { .dt_id = MT8192_AP_VPU0 },
> > > >                         { .dt_id = MT8192_AP_VPU1 }
> > > > @@ -1336,7 +1336,7 @@ static const struct lvts_ctrl_data
> [..]
> > > >
> > > > This example is tested and works for mt8195,
> > > > (all sensors use the same calibration data offset for testing purposes).
> > > >
> > > > Thank you in advance for your help.
> > >
> > > The MCU ones are still tripping though. If I change all of them to 0x04,
> > > then nothing trips. There's also a bug in the interrupt handling code
> > > that needs to be dealt with.
> > >
> > > AFAICT the calibration data is stored differently. If you look at ChromeOS's
> > > downstream v5.10 driver, you'll see mt6873_efuse_to_cal_data() for MT8192,
> > > and mt8195_efuse_to_cal_data() for MT8195. The difference sums up to:
> > > MT8195 has all data sequentially stored, while MT8192 has most data stored
> > > in lower 24 bits of each 32-bit word, and the highest 8 bits are then used
> > > to pack data for the remaining sensors.
> > >
> > > Regards
> > > ChenYu
> >
> > Hi Chen-Yu Tsai,
> >
> > Thank you very much for helping me testing this suggestion.
> >
> > Indeed, calibration data is stored differently in the mt8192 compared to mt8195.
> > So, the mt8192's support will be delayed for now, to allow further debugging.
> >
> > In the mean time, we will only continue to upstream the remaining
> > mt8195's source code, so it will get full LVTS support.
> > A new series will be submitted soon.
>
> Hi Balsam,
>
> like Chen-Yu mentioned, the calibration data is stored with 4 byte alignment for
> MT8192, but the data that is split between non-contiguous bytes is for the
> thermal controllers (called Resistor-Capacitor Calibration downstream) not the
> sensors. The controller calibration isn't currently handled in this driver (and
> downstream it also isn't used, since a current value is read from the controller
> instead), so we can just ignore those.
>
> The patch below adjusts the addresses for the sensors and gives me reasonable
> reads, so the machine no longer reboots. Can you integrate it into your series?
>
> Thanks,
> Nícolas
>

Hello Nícolas,

Thank you very much for your help!
I really appreciate it.
Yes, of course I will integrate your fix to the series immediately.

Best regards,
Balsam

  
Chen-Yu Tsai April 25, 2023, 9:59 a.m. UTC | #10
On Tue, Apr 25, 2023 at 6:21 AM Nícolas F. R. A. Prado
<nfraprado@collabora.com> wrote:
>
> On Tue, Mar 28, 2023 at 02:20:24AM +0200, Balsam CHIHI wrote:
> > On Sat, Mar 25, 2023 at 5:33 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
> > >
> > > On Wed, Mar 22, 2023 at 8:48 PM Balsam CHIHI <bchihi@baylibre.com> wrote:
> > > >
> > > > Hi Chen-Yu,
> > > >
> > > > I suspect the bug comes from incorrect calibration data offsets for AP
> > > > Domain because you confirm that MCU Domain probe runs without issues.
> > > > Is it possible to test something for us to confirm this theory (i
> > > > don't have an mt8192 board on hand now), when you have the time of
> > > > course?
> > > > We would like to test AP Domain's calibration data offsets with a
> > > > working one, for example :
> > > >
> > > >  static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
> > > >                 {
> > > > -               .cal_offset = { 0x25, 0x28 },
> > > > +               .cal_offset = { 0x04, 0x04 },
> > > >                 .lvts_sensor = {
> > > >                         { .dt_id = MT8192_AP_VPU0 },
> > > >                         { .dt_id = MT8192_AP_VPU1 }
> > > > @@ -1336,7 +1336,7 @@ static const struct lvts_ctrl_data
> [..]
> > > >
> > > > This example is tested and works for mt8195,
> > > > (all sensors use the same calibration data offset for testing purposes).
> > > >
> > > > Thank you in advance for your help.
> > >
> > > The MCU ones are still tripping though. If I change all of them to 0x04,
> > > then nothing trips. There's also a bug in the interrupt handling code
> > > that needs to be dealt with.
> > >
> > > AFAICT the calibration data is stored differently. If you look at ChromeOS's
> > > downstream v5.10 driver, you'll see mt6873_efuse_to_cal_data() for MT8192,
> > > and mt8195_efuse_to_cal_data() for MT8195. The difference sums up to:
> > > MT8195 has all data sequentially stored, while MT8192 has most data stored
> > > in lower 24 bits of each 32-bit word, and the highest 8 bits are then used
> > > to pack data for the remaining sensors.
> > >
> > > Regards
> > > ChenYu
> >
> > Hi Chen-Yu Tsai,
> >
> > Thank you very much for helping me testing this suggestion.
> >
> > Indeed, calibration data is stored differently in the mt8192 compared to mt8195.
> > So, the mt8192's support will be delayed for now, to allow further debugging.
> >
> > In the mean time, we will only continue to upstream the remaining
> > mt8195's source code, so it will get full LVTS support.
> > A new series will be submitted soon.
>
> Hi Balsam,
>
> like Chen-Yu mentioned, the calibration data is stored with 4 byte alignment for
> MT8192, but the data that is split between non-contiguous bytes is for the
> thermal controllers (called Resistor-Capacitor Calibration downstream) not the
> sensors. The controller calibration isn't currently handled in this driver (and
> downstream it also isn't used, since a current value is read from the controller
> instead), so we can just ignore those.
>
> The patch below adjusts the addresses for the sensors and gives me reasonable
> reads, so the machine no longer reboots. Can you integrate it into your series?

Not sure what I got wrong, but on my machine the VPU0 and VPU1 zone interrupts
are still tripping excessively. The readings seem normal though. Specifically,
it's bits 16 and 17 that are tripping.

> Thanks,
> Nícolas
>
  
Balsam CHIHI April 25, 2023, 11:28 a.m. UTC | #11
On Tue, Apr 25, 2023 at 12:00 PM Chen-Yu Tsai <wenst@chromium.org> wrote:
>
> On Tue, Apr 25, 2023 at 6:21 AM Nícolas F. R. A. Prado
> <nfraprado@collabora.com> wrote:
> >
> > On Tue, Mar 28, 2023 at 02:20:24AM +0200, Balsam CHIHI wrote:
> > > On Sat, Mar 25, 2023 at 5:33 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
> > > >
> > > > On Wed, Mar 22, 2023 at 8:48 PM Balsam CHIHI <bchihi@baylibre.com> wrote:
> > > > >
> > > > > Hi Chen-Yu,
> > > > >
> > > > > I suspect the bug comes from incorrect calibration data offsets for AP
> > > > > Domain because you confirm that MCU Domain probe runs without issues.
> > > > > Is it possible to test something for us to confirm this theory (i
> > > > > don't have an mt8192 board on hand now), when you have the time of
> > > > > course?
> > > > > We would like to test AP Domain's calibration data offsets with a
> > > > > working one, for example :
> > > > >
> > > > >  static const struct lvts_ctrl_data mt8192_lvts_ap_data_ctrl[] = {
> > > > >                 {
> > > > > -               .cal_offset = { 0x25, 0x28 },
> > > > > +               .cal_offset = { 0x04, 0x04 },
> > > > >                 .lvts_sensor = {
> > > > >                         { .dt_id = MT8192_AP_VPU0 },
> > > > >                         { .dt_id = MT8192_AP_VPU1 }
> > > > > @@ -1336,7 +1336,7 @@ static const struct lvts_ctrl_data
> > [..]
> > > > >
> > > > > This example is tested and works for mt8195,
> > > > > (all sensors use the same calibration data offset for testing purposes).
> > > > >
> > > > > Thank you in advance for your help.
> > > >
> > > > The MCU ones are still tripping though. If I change all of them to 0x04,
> > > > then nothing trips. There's also a bug in the interrupt handling code
> > > > that needs to be dealt with.
> > > >
> > > > AFAICT the calibration data is stored differently. If you look at ChromeOS's
> > > > downstream v5.10 driver, you'll see mt6873_efuse_to_cal_data() for MT8192,
> > > > and mt8195_efuse_to_cal_data() for MT8195. The difference sums up to:
> > > > MT8195 has all data sequentially stored, while MT8192 has most data stored
> > > > in lower 24 bits of each 32-bit word, and the highest 8 bits are then used
> > > > to pack data for the remaining sensors.
> > > >
> > > > Regards
> > > > ChenYu
> > >
> > > Hi Chen-Yu Tsai,
> > >
> > > Thank you very much for helping me testing this suggestion.
> > >
> > > Indeed, calibration data is stored differently in the mt8192 compared to mt8195.
> > > So, the mt8192's support will be delayed for now, to allow further debugging.
> > >
> > > In the mean time, we will only continue to upstream the remaining
> > > mt8195's source code, so it will get full LVTS support.
> > > A new series will be submitted soon.
> >
> > Hi Balsam,
> >
> > like Chen-Yu mentioned, the calibration data is stored with 4 byte alignment for
> > MT8192, but the data that is split between non-contiguous bytes is for the
> > thermal controllers (called Resistor-Capacitor Calibration downstream) not the
> > sensors. The controller calibration isn't currently handled in this driver (and
> > downstream it also isn't used, since a current value is read from the controller
> > instead), so we can just ignore those.
> >
> > The patch below adjusts the addresses for the sensors and gives me reasonable
> > reads, so the machine no longer reboots. Can you integrate it into your series?
>
> Not sure what I got wrong, but on my machine the VPU0 and VPU1 zone interrupts
> are still tripping excessively. The readings seem normal though. Specifically,
> it's bits 16 and 17 that are tripping.
>

Hi Chen-Yu,

Thank you for testing!

As the readings are normal, that proves that the calibration data offsets
are correct.
Would you like me to send v2 of the series to add mt8192 support?
Then we could deal with the interrupts later in a separate fix,
because the interrupt code is common to both SoCs (mt8192 and mt8195).

Does Nícolas also have tripping interrupts?
On my side, I've got no interrupts tripping on mt8195.

Any other suggestions (a question for everyone)?

Best regards,
Balsam

> > Thanks,
> > Nícolas
> >
  
Nícolas F. R. A. Prado April 26, 2023, 11:20 p.m. UTC | #12
On Tue, Apr 25, 2023 at 01:28:39PM +0200, Balsam CHIHI wrote:
> On Tue, Apr 25, 2023 at 12:00 PM Chen-Yu Tsai <wenst@chromium.org> wrote:
> >
> > On Tue, Apr 25, 2023 at 6:21 AM Nícolas F. R. A. Prado
> > <nfraprado@collabora.com> wrote:
> > >
> > > On Tue, Mar 28, 2023 at 02:20:24AM +0200, Balsam CHIHI wrote:
> > > > On Sat, Mar 25, 2023 at 5:33 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
> > > > >
> > > > > On Wed, Mar 22, 2023 at 8:48 PM Balsam CHIHI <bchihi@baylibre.com> wrote:
[..]
> > > >
> > > > Hi Chen-Yu Tsai,
> > > >
> > > > Thank you very much for helping me testing this suggestion.
> > > >
> > > > Indeed, calibration data is stored differently in the mt8192 compared to mt8195.
> > > > So, the mt8192's support will be delayed for now, to allow further debugging.
> > > >
> > > > In the mean time, we will only continue to upstream the remaining
> > > > mt8195's source code, so it will get full LVTS support.
> > > > A new series will be submitted soon.
> > >
> > > Hi Balsam,
> > >
> > > like Chen-Yu mentioned, the calibration data is stored with 4 byte alignment for
> > > MT8192, but the data that is split between non-contiguous bytes is for the
> > > thermal controllers (called Resistor-Capacitor Calibration downstream) not the
> > > sensors. The controller calibration isn't currently handled in this driver (and
> > > downstream it also isn't used, since a current value is read from the controller
> > > instead), so we can just ignore those.
> > >
> > > The patch below adjusts the addresses for the sensors and gives me reasonable
> > > reads, so the machine no longer reboots. Can you integrate it into your series?
> >
> > Not sure what I got wrong, but on my machine the VPU0 and VPU1 zone interrupts
> > are still tripping excessively. The readings seem normal though. Specifically,
> > it's bits 16 and 17 that are tripping.
> >
> 
> Hi Chen-Yu,
> 
> Thank you for testing!
> 
> As the readings are normal, that proves that the calibration data offsets
> are correct.
> Would you like me to send v2 of the series to add mt8192 support?
> Then we could deal with the interrupts later in a separate fix,
> because the interrupt code is common to both SoCs (mt8192 and mt8195).
> 
> Does Nícolas also have tripping interrupts?
> On my side, I've got no interrupts tripping on mt8195.
> 
> Any other suggestions (a question for everyone)?

Hi,

Sorry for the delay.

Indeed, the interrupts are constantly tripping on mt8192 here as well.

However, I do not see the same bits that Chen-Yu mentioned. I see

LVTS_MONINTSTS = 0x08070000

which corresponds to

	hot threshold on sense point 3
	high to normal offset on sense point 2
	high offset on sense point 2
	low offset on sense point 2

and it's the same on all controllers and domains here, which is weird. I noticed
we have offset interrupts enabled even though we don't configure the values for
those, but even after disabling them and clearing the status register, the
interrupts keep triggering and the status is the same, so for some reason
LVTS_MONINT doesn't seem to be honored.
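
For reference, a status word like the one above can be broken into its set bit
positions with a few lines of C. This is only a minimal sketch with
hypothetical helper names; it does not use any register definitions from the
driver, and it makes no claim about which bit maps to which interrupt source.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Store the positions of the bits set in 'status' into 'bits' (in
 * ascending order) and return how many were found. 'bits' must have
 * room for 32 entries.
 */
static size_t lvts_status_set_bits(uint32_t status, unsigned int *bits)
{
	size_t count = 0;

	for (unsigned int i = 0; i < 32; i++) {
		if (status & (UINT32_C(1) << i))
			bits[count++] = i;
	}

	return count;
}

/* Return nonzero if every bit in 'mask' is also set in 'status'. */
static int lvts_status_has_all(uint32_t status, uint32_t mask)
{
	return (status & mask) == mask;
}
```

Applied to 0x08070000, this reports bits 16, 17, 18 and 27, so the bits 16 and
17 reported earlier in the thread are among the ones set here.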

I also tried using the filtered mode instead of immediate mode for the sensors.
That, together with disabling the extra interrupts, got me a zeroed
LVTS_MONINTSTS. However, no interrupts seem to be triggered at all (nor is
LVTS_MONINTSTS updated) when the temperature goes over the one configured in
LVTS_HTHRE.

I tried the driver on mt8195 (Tomato chromebook) as well, and it has the same
LVTS_MONINTSTS = 0x08070000
even though the interrupts aren't being triggered, but in fact I don't see them
triggering over the threshold either, so I suspect the irq number might be
incorrectly described in the DT there.

Do either of you have it working correctly on mt8195?

Anyway, I'll keep digging and reply here when I find a solution.

Thanks,
Nícolas
  
Balsam CHIHI April 27, 2023, 2:08 p.m. UTC | #13
On Thu, Apr 27, 2023 at 1:20 AM Nícolas F. R. A. Prado
<nfraprado@collabora.com> wrote:
>
> On Tue, Apr 25, 2023 at 01:28:39PM +0200, Balsam CHIHI wrote:
> > On Tue, Apr 25, 2023 at 12:00 PM Chen-Yu Tsai <wenst@chromium.org> wrote:
> > >
> > > On Tue, Apr 25, 2023 at 6:21 AM Nícolas F. R. A. Prado
> > > <nfraprado@collabora.com> wrote:
> > > >
> > > > On Tue, Mar 28, 2023 at 02:20:24AM +0200, Balsam CHIHI wrote:
> > > > > On Sat, Mar 25, 2023 at 5:33 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
> > > > > >
> > > > > > On Wed, Mar 22, 2023 at 8:48 PM Balsam CHIHI <bchihi@baylibre.com> wrote:
> [..]
> > > > >
> > > > > Hi Chen-Yu Tsai,
> > > > >
> > > > > Thank you very much for helping me testing this suggestion.
> > > > >
> > > > > Indeed, calibration data is stored differently in the mt8192 compared to mt8195.
> > > > > So, the mt8192's support will be delayed for now, to allow further debugging.
> > > > >
> > > > > In the mean time, we will only continue to upstream the remaining
> > > > > mt8195's source code, so it will get full LVTS support.
> > > > > A new series will be submitted soon.
> > > >
> > > > Hi Balsam,
> > > >
> > > > like Chen-Yu mentioned, the calibration data is stored with 4 byte alignment for
> > > > MT8192, but the data that is split between non-contiguous bytes is for the
> > > > thermal controllers (called Resistor-Capacitor Calibration downstream) not the
> > > > sensors. The controller calibration isn't currently handled in this driver (and
> > > > downstream it also isn't used, since a current value is read from the controller
> > > > instead), so we can just ignore those.
> > > >
> > > > The patch below adjusts the addresses for the sensors and gives me reasonable
> > > > reads, so the machine no longer reboots. Can you integrate it into your series?
> > >
> > > Not sure what I got wrong, but on my machine the VPU0 and VPU1 zone interrupts
> > > are still tripping excessively. The readings seem normal though. Specifically,
> > > it's bits 16 and 17 that are tripping.
> > >
> >
> > Hi Chen-Yu,
> >
> > Thank you for testing!
> >
> > As the readings are normal, that proves that the calibration data offsets
> > are correct.
> > Would you like me to send the v2 of the series to add mt8192 support?
> > Then we could deal with the interrupts later in a separate fix,
> > since the interrupt code is common to both SoCs (mt8192 and mt8195).
> >
> > Does Nícolas also have tripping interrupts?
> > On my side, I've got no interrupts tripping on mt8195.
> >
> > Any other suggestions (a question for everyone)?
>
> Hi,
>
> sorry for the delay.
>
> Indeed the interrupts are constantly tripping on mt8192 here as well.
>
> I do not see the same bits as Chen-Yu mentioned however, I see
>
> LVTS_MONINTSTS = 0x08070000
>
> which corresponds to
>
>         Hot threshold on sense point 3
>         high to normal offset on sense point 2
>         high offset on sense point 2
>         low offset on sense point 2
>
> and it's the same on all controllers and domains here, which is weird. I noticed
> we have offset interrupts enabled even though we don't configure the values for
> those, but even after disabling them and clearing the status register, the
> interrupts keep triggering and the status is the same, so for some reason
> LVTS_MONINT doesn't seem to be honored.
>
> I also tried using the filtered mode instead of immediate for the sensors, and
> that together with disabling the extra interrupts, got me a zeroed
> LVTS_MONINTSTS. However no interrupts seem to be triggered at all (nor
> LVTS_MONINTSTS updated) when the temperature goes over the configured one in
> LVTS_HTHRE.
>
> I tried the driver on mt8195 (Tomato chromebook) as well, and it has the same
> LVTS_MONINTSTS = 0x08070000
> even though the interrupts aren't being triggered, but in fact I don't see them
> triggering over the threshold either, so I suspect the irq number might be
> incorrectly described in the DT there.
>
> Do either of you have it working correctly on mt8195?
>
> Anyway, I'll keep digging and reply here when I find a solution.

Hi Nícolas,

Thank you for your time testing and investigating the interrupt issues!

I only have an mt8195 based board (i1200-demo), and I could not
trigger any interrupt on it.
I wish MediaTek could reply to this thread to give us more
information (I am avoiding disclosing MediaTek's internal information).
And now it's clear that mt8192 interrupts do work, at least (though not
properly; maybe we could fix that at the driver level).

It's been a couple of days since I sent a v2 of the series that adds
LVTS support for mt8192 SoC (+ Suspend and Resume, + Doc update):
"https://lore.kernel.org/all/20230425133052.199767-1-bchihi@baylibre.com/".
I hope it will be applied very soon; then we could patch the core driver.

My colleagues "Alexandre Mergnat (amergnat@baylibre.com)" and
"Alexandre Bailon (abailon@baylibre.com)" are now part of this
project.
Please keep them informed of any future developments.

Thanks again for suggesting solutions!

Best regards,
Balsam

>
> Thanks,
> Nícolas
  
Nícolas F. R. A. Prado April 28, 2023, 8 p.m. UTC | #14
On Thu, Apr 27, 2023 at 04:08:13PM +0200, Balsam CHIHI wrote:
> On Thu, Apr 27, 2023 at 1:20 AM Nícolas F. R. A. Prado
> <nfraprado@collabora.com> wrote:
> >
> > On Tue, Apr 25, 2023 at 01:28:39PM +0200, Balsam CHIHI wrote:
> > > On Tue, Apr 25, 2023 at 12:00 PM Chen-Yu Tsai <wenst@chromium.org> wrote:
> > > >
> > > > On Tue, Apr 25, 2023 at 6:21 AM Nícolas F. R. A. Prado
> > > > <nfraprado@collabora.com> wrote:
> > > > >
> > > > > On Tue, Mar 28, 2023 at 02:20:24AM +0200, Balsam CHIHI wrote:
> > > > > > On Sat, Mar 25, 2023 at 5:33 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
> > > > > > >
> > > > > > > On Wed, Mar 22, 2023 at 8:48 PM Balsam CHIHI <bchihi@baylibre.com> wrote:
> > [..]
> > > > > >
> > > > > > Hi Chen-Yu Tsai,
> > > > > >
> > > > > > Thank you very much for helping me test this suggestion.
> > > > > >
> > > > > > Indeed, calibration data is stored differently in the mt8192 compared to mt8195.
> > > > > > So, the mt8192's support will be delayed for now, to allow further debugging.
> > > > > >
> > > > > > In the meantime, we will only continue to upstream the remaining
> > > > > > mt8195's source code, so it will get full LVTS support.
> > > > > > A new series will be submitted soon.
> > > > >
> > > > > Hi Balsam,
> > > > >
> > > > > like Chen-Yu mentioned, the calibration data is stored with 4 byte alignment for
> > > > > MT8192, but the data that is split between non-contiguous bytes is for the
> > > > > thermal controllers (called Resistor-Capacitor Calibration downstream) not the
> > > > > sensors. The controller calibration isn't currently handled in this driver (and
> > > > > downstream it also isn't used, since a current value is read from the controller
> > > > > instead), so we can just ignore those.
> > > > >
> > > > > The patch below adjusts the addresses for the sensors and gives me reasonable
> > > > > reads, so the machine no longer reboots. Can you integrate it into your series?
> > > >
> > > > Not sure what I got wrong, but on my machine the VPU0 and VPU1 zone interrupts
> > > > are still tripping excessively. The readings seem normal though. Specifically,
> > > > it's bits 16 and 17 that are tripping.
> > > >
> > >
> > > Hi Chen-Yu,
> > >
> > > Thank you for testing!
> > >
> > > As the readings are normal, that proves that the calibration data offsets
> > > are correct.
> > > Would you like me to send the v2 of the series to add mt8192 support?
> > > Then we could deal with the interrupts later in a separate fix,
> > > since the interrupt code is common to both SoCs (mt8192 and mt8195).
> > >
> > > Does Nícolas also have tripping interrupts?
> > > On my side, I've got no interrupts tripping on mt8195.
> > >
> > > Any other suggestions (a question for everyone)?
> >
> > Hi,
> >
> > sorry for the delay.
> >
> > Indeed the interrupts are constantly tripping on mt8192 here as well.
> >
> > I do not see the same bits as Chen-Yu mentioned however, I see
> >
> > LVTS_MONINTSTS = 0x08070000
> >
> > which corresponds to
> >
> >         Hot threshold on sense point 3
> >         high to normal offset on sense point 2
> >         high offset on sense point 2
> >         low offset on sense point 2
> >
> > and it's the same on all controllers and domains here, which is weird. I noticed
> > we have offset interrupts enabled even though we don't configure the values for
> > those, but even after disabling them and clearing the status register, the
> > interrupts keep triggering and the status is the same, so for some reason
> > LVTS_MONINT doesn't seem to be honored.
> >
> > I also tried using the filtered mode instead of immediate for the sensors, and
> > that together with disabling the extra interrupts, got me a zeroed
> > LVTS_MONINTSTS. However no interrupts seem to be triggered at all (nor
> > LVTS_MONINTSTS updated) when the temperature goes over the configured one in
> > LVTS_HTHRE.
> >
> > I tried the driver on mt8195 (Tomato chromebook) as well, and it has the same
> > LVTS_MONINTSTS = 0x08070000
> > even though the interrupts aren't being triggered, but in fact I don't see them
> > triggering over the threshold either, so I suspect the irq number might be
> > incorrectly described in the DT there.
> >
> > Do either of you have it working correctly on mt8195?
> >
> > Anyway, I'll keep digging and reply here when I find a solution.
> 
> Hi Nícolas,
> 
> Thank you for your time testing and investigating the interrupt issues!
> 
> I only have an mt8195 based board (i1200-demo), and I could not
> trigger any interrupt on it.
> I wish MediaTek could reply to this thread to give us more
> information (I am avoiding disclosing MediaTek's internal information).
> And now it's clear that mt8192 interrupts do work, at least (though not
> properly; maybe we could fix that at the driver level).
> 
> It's been a couple of days since I sent a v2 of the series that adds
> LVTS support for mt8192 SoC (+ Suspend and Resume, + Doc update):
> "https://lore.kernel.org/all/20230425133052.199767-1-bchihi@baylibre.com/".
> I hope it will be applied very soon; then we could patch the core driver.
> 
> My colleagues "Alexandre Mergnat (amergnat@baylibre.com)" and
> "Alexandre Bailon (abailon@baylibre.com)" are now part of this
> project.
> Please keep them informed of any future developments.
> 
> Thanks again for suggesting solutions!

Hi,

finally managed to fix the issues. I had misread the interrupt status bits,
which made things a whole lot more confusing...
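
For reference, a minimal sketch (not taken from the driver) of how the status
word quoted earlier in the thread, 0x08070000, breaks down into individual bit
positions. The helper name status_set_bits() is hypothetical, and the sketch
deliberately attaches no meaning to the bits, since that mapping is defined by
the hardware documentation and the driver:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store the positions of the bits set in a 32-bit
 * status word into out[] (lowest bit first) and return how many are set.
 * out[] must have room for 32 entries.
 */
static int status_set_bits(uint32_t status, int out[32])
{
	int n = 0;

	assert(out != NULL);
	for (int bit = 0; bit < 32; bit++) {
		if (status & (UINT32_C(1) << bit))
			out[n++] = bit;
	}
	return n;
}
```

Called with 0x08070000, it reports bits 16, 17, 18 and 27.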

I CC'ed you on the series, but for the archive this is it:
https://lore.kernel.org/all/20230428195347.3832687-1-nfraprado@collabora.com/

Please review/test it if you have the time.

I have one extra comment regarding the mt8192 support, but I'll write it on the
v2 of this series.

Thanks,
Nícolas