zswap: don't warn if none swapcache folio is passed to zswap_load

Message ID 20230810095652.3905184-1-fengwei.yin@intel.com
State New
Series zswap: don't warn if none swapcache folio is passed to zswap_load

Commit Message

Yin Fengwei Aug. 10, 2023, 9:56 a.m. UTC
  With the mm-unstable branch, triggering swap activity can produce the
following warning:
[  178.093511][  T651] WARNING: CPU: 2 PID: 651 at mm/zswap.c:1387 zswap_load+0x67/0x570
[  178.095155][  T651] Modules linked in:
[  178.096103][  T651] CPU: 2 PID: 651 Comm: gmain Not tainted 6.5.0-rc4-00492-gad3232df3e41 #148
[  178.098372][  T651] Hardware name: QEMU Standard PC (i440FX + PIIX,1996), BIOS 1.14.0-2 04/01/2014
[  178.101114][  T651] RIP: 0010:zswap_load+0x67/0x570
[  178.102359][  T651] Code: a0 78 4b 85 e8 ea db ff ff 48 8b 00 a8 01 0f 84 84 04 00 00 48 89 df e8 d7 db ff ff 48 8b 00 a9 00 00 08 00 0f 85 c4
[  178.106376][  T651] RSP: 0018:ffffc900011b3760 EFLAGS: 00010246
[  178.107675][  T651] RAX: 0017ffffc0080001 RBX: ffffea0004a991c0 RCX:ffffc900011b37dc
[  178.109242][  T651] RDX: 0000000000000000 RSI: 0000000000000001 RDI:ffffea0004a991c0
[  178.110916][  T651] RBP: ffffea0004a991c0 R08: 0000000000000243 R09:00000000c9a1aafc
[  178.112377][  T651] R10: 00000000c9657db3 R11: 000000003c9657db R12:0000000000014b9c
[  178.113698][  T651] R13: ffff88813501e710 R14: ffff88810d591000 R15:0000000000000000
[  178.115008][  T651] FS:  00007fb21a9ff700(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
[  178.116423][  T651] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  178.117421][  T651] CR2: 00005632cbfc81f6 CR3: 0000000131450002 CR4:0000000000370ee0
[  178.118683][  T651] DR0: 0000000000000000 DR1: 0000000000000000 DR2:0000000000000000
[  178.119894][  T651] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400
[  178.121087][  T651] Call Trace:
[  178.121654][  T651]  <TASK>
[  178.122109][  T651]  ? zswap_load+0x67/0x570
[  178.122658][  T651]  ? __warn+0x81/0x170
[  178.123119][  T651]  ? zswap_load+0x67/0x570
[  178.123608][  T651]  ? report_bug+0x167/0x190
[  178.124150][  T651]  ? handle_bug+0x3c/0x70
[  178.124615][  T651]  ? exc_invalid_op+0x13/0x60
[  178.125192][  T651]  ? asm_exc_invalid_op+0x16/0x20
[  178.125753][  T651]  ? zswap_load+0x67/0x570
[  178.126231][  T651]  ? lock_acquire+0xbb/0x290
[  178.126745][  T651]  ? folio_add_lru+0x40/0x1c0
[  178.127261][  T651]  ? find_held_lock+0x2b/0x80
[  178.127776][  T651]  swap_readpage+0xc7/0x5c0
[  178.128273][  T651]  do_swap_page+0x86d/0xf50
[  178.128770][  T651]  ? __pte_offset_map+0x3e/0x290
[  178.129321][  T651]  ? __pte_offset_map+0x1c4/0x290
[  178.129883][  T651]  __handle_mm_fault+0x6ad/0xca0
[  178.130419][  T651]  handle_mm_fault+0x18b/0x410
[  178.130992][  T651]  do_user_addr_fault+0x1f1/0x820
[  178.132076][  T651]  exc_page_fault+0x63/0x1a0
[  178.132599][  T651]  asm_exc_page_fault+0x22/0x30

do_swap_page() can call swap_readpage() with a folio that is not in the
swap cache, which triggers this warning. So zswap_load() should not assume
that it is always passed a swapcache folio.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/zswap.c | 1 -
 1 file changed, 1 deletion(-)
  

Comments

Yu Zhao Aug. 10, 2023, 6:44 p.m. UTC | #1
On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
>
> With mm-unstable branch, if trigger swap activity and it's possible
> see following warning:
> [...]
>
> It's possible that swap_readpage() is called with none swapcache folio
> in do_swap_page() and trigger this warning. So we shouldn't assume
> zswap_load() always takes swapcache folio.

Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
like a bug to me.
  
Yin Fengwei Aug. 10, 2023, 11:09 p.m. UTC | #2
On 8/11/2023 2:44 AM, Yu Zhao wrote:
> On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
>>
>> With mm-unstable branch, if trigger swap activity and it's possible
>> see following warning:
>> [...]
>>
>> It's possible that swap_readpage() is called with none swapcache folio
>> in do_swap_page() and trigger this warning. So we shouldn't assume
>> zswap_load() always takes swapcache folio.
> 
> Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
> like a bug to me.
I hit this warning with zram, which has QUEUE_FLAG_SYNCHRONOUS set. Thanks.


Regards
Yin, Fengwei
  
Yu Zhao Aug. 10, 2023, 11:13 p.m. UTC | #3
On Thu, Aug 10, 2023 at 5:09 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>
>
>
> On 8/11/2023 2:44 AM, Yu Zhao wrote:
> > On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
> >>
> >> With mm-unstable branch, if trigger swap activity and it's possible
> >> see following warning:
> >> [...]
> >>
> >> It's possible that swap_readpage() is called with none swapcache folio
> >> in do_swap_page() and trigger this warning. So we shouldn't assume
> >> zswap_load() always takes swapcache folio.
> >
> > Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
> > like a bug to me.
>
> I hit this warning with zram which has QUEUE_FLAG_SYNCHRONOUS set. Thanks.

Reviewed-by: Yu Zhao <yuzhao@google.com>
  
Yosry Ahmed Aug. 10, 2023, 11:15 p.m. UTC | #4
On Thu, Aug 10, 2023 at 4:09 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>
>
>
> On 8/11/2023 2:44 AM, Yu Zhao wrote:
> > On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
> >>
> >> With mm-unstable branch, if trigger swap activity and it's possible
> >> see following warning:
> >> [...]
> >>
> >> It's possible that swap_readpage() is called with none swapcache folio
> >> in do_swap_page() and trigger this warning. So we shouldn't assume
> >> zswap_load() always takes swapcache folio.
> >
> > Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
> > like a bug to me.
> I hit this warning with zram which has QUEUE_FLAG_SYNCHRONOUS set. Thanks.

Does it make sense to keep the warning and instead change it to check
SWP_SYNCHRONOUS_IO as well? Something like:

VM_WARN_ON_ONCE(!folio_test_swapcache(folio) &&
		!(swap_type_to_swap_info(type)->flags & SWP_SYNCHRONOUS_IO));

Of course this is too ugly, so perhaps we want a helper to check if a
swapfile is synchronous.

  
Yin Fengwei Aug. 10, 2023, 11:30 p.m. UTC | #5
On 8/11/2023 7:15 AM, Yosry Ahmed wrote:
> On Thu, Aug 10, 2023 at 4:09 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>
>>
>>
>> On 8/11/2023 2:44 AM, Yu Zhao wrote:
>>> On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
>>>>
>>>> With mm-unstable branch, if trigger swap activity and it's possible
>>>> see following warning:
>>>> [...]
>>>>
>>>> It's possible that swap_readpage() is called with none swapcache folio
>>>> in do_swap_page() and trigger this warning. So we shouldn't assume
>>>> zswap_load() always takes swapcache folio.
>>>
>>> Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
>>> like a bug to me.
>> I hit this warning with zram which has QUEUE_FLAG_SYNCHRONOUS set. Thanks.
> 
> Does it make sense to keep the warning and instead change it to check
> SWP_SYNCHRONOUS_IO as well? Something like:
> 
> VM_WARN_ON_ONCE(!folio_test_swapcache(folio) &&
> !swap_type_to_swap_info(type)->flags && SWP_SYNCHRONOUS_IO);
> 
> Of course this is too ugly, so perhaps we want a helper to check if a
> swapfile is synchronous.
My understanding was that the WARN here means zswap_load() doesn't expect
a folio that is not in the swap cache. With zram, swap_readpage() must
accept a folio that is not in the swap cache, so this WARN should not be
there.

But your comment makes more sense to me. I will update the patch to keep
this WARN instead of removing it. Thanks.

Regards
Yin, Fengwei

  
Yosry Ahmed Aug. 10, 2023, 11:32 p.m. UTC | #6
On Thu, Aug 10, 2023 at 4:31 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>
>
>
> On 8/11/2023 7:15 AM, Yosry Ahmed wrote:
> > On Thu, Aug 10, 2023 at 4:09 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> >>
> >>
> >>
> >> On 8/11/2023 2:44 AM, Yu Zhao wrote:
> >>> On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
> >>>>
> >>>> With mm-unstable branch, if trigger swap activity and it's possible
> >>>> see following warning:
> >>>> [...]
> >>>>
> >>>> It's possible that swap_readpage() is called with none swapcache folio
> >>>> in do_swap_page() and trigger this warning. So we shouldn't assume
> >>>> zswap_load() always takes swapcache folio.
> >>>
> >>> Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
> >>> like a bug to me.
> >> I hit this warning with zram which has QUEUE_FLAG_SYNCHRONOUS set. Thanks.
> >
> > Does it make sense to keep the warning and instead change it to check
> > SWP_SYNCHRONOUS_IO as well? Something like:
> >
> > VM_WARN_ON_ONCE(!folio_test_swapcache(folio) &&
> > !swap_type_to_swap_info(type)->flags && SWP_SYNCHRONOUS_IO);
> >
> > Of course this is too ugly, so perhaps we want a helper to check if a
> > swapfile is synchronous.
> My understanding was that the WARN here is zswap_load() doesn't expect
> a folio not in swapcache. With zram, swap_readpage() must accept the
> folio not in swapcache. So this warn should not be there.
>
> But your comment make more sense to me. I will update the patch not
> to remove this WARN. Thanks.

Thanks. What I have in mind is that usually zram & zswap are not used
together (which is probably why no one reported this warning before),
so in the common case this warning is valuable.

  
Yu Zhao Aug. 10, 2023, 11:43 p.m. UTC | #7
On Thu, Aug 10, 2023 at 5:31 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>
>
>
> On 8/11/2023 7:15 AM, Yosry Ahmed wrote:
> > On Thu, Aug 10, 2023 at 4:09 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> >>
> >>
> >>
> >> On 8/11/2023 2:44 AM, Yu Zhao wrote:
> >>> On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
> >>>>
> >>>> With mm-unstable branch, if trigger swap activity and it's possible
> >>>> see following warning:
> >>>> [...]
> >>>>
> >>>> It's possible that swap_readpage() is called with none swapcache folio
> >>>> in do_swap_page() and trigger this warning. So we shouldn't assume
> >>>> zswap_load() always takes swapcache folio.
> >>>
> >>> Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
> >>> like a bug to me.
> >> I hit this warning with zram which has QUEUE_FLAG_SYNCHRONOUS set. Thanks.
> >
> > Does it make sense to keep the warning and instead change it to check
> > SWP_SYNCHRONOUS_IO as well? Something like:
> >
> > VM_WARN_ON_ONCE(!folio_test_swapcache(folio) &&
> > !swap_type_to_swap_info(type)->flags && SWP_SYNCHRONOUS_IO);
> >
> > Of course this is too ugly, so perhaps we want a helper to check if a
> > swapfile is synchronous.
> My understanding was that the WARN here is zswap_load() doesn't expect
> a folio not in swapcache. With zram, swap_readpage() must accept the
> folio not in swapcache. So this warn should not be there.
>
> But your comment make more sense to me. I will update the patch not
> to remove this WARN. Thanks.

That can cause another warning.

Please don't overengineer.
  
Yosry Ahmed Aug. 10, 2023, 11:45 p.m. UTC | #8
On Thu, Aug 10, 2023 at 4:44 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Thu, Aug 10, 2023 at 5:31 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> >
> >
> >
> > On 8/11/2023 7:15 AM, Yosry Ahmed wrote:
> > > On Thu, Aug 10, 2023 at 4:09 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> > >>
> > >>
> > >>
> > >> On 8/11/2023 2:44 AM, Yu Zhao wrote:
> > >>> On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
> > >>>>
> > >>>> With mm-unstable branch, if trigger swap activity and it's possible
> > >>>> see following warning:
> > >>>> [...]
> > >>>>
> > >>>> do_swap_page() can call swap_readpage() with a non-swapcache folio and
> > >>>> trigger this warning, so we shouldn't assume that zswap_load() always
> > >>>> takes a swapcache folio.
> > >>>
> > >>> Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
> > >>> like a bug to me.
> > >> I hit this warning with zram, which has QUEUE_FLAG_SYNCHRONOUS set. Thanks.
> > >
> > > Does it make sense to keep the warning and instead change it to check
> > > SWP_SYNCHRONOUS_IO as well? Something like:
> > >
> > > VM_WARN_ON_ONCE(!folio_test_swapcache(folio) &&
> > > !(swap_type_to_swap_info(type)->flags & SWP_SYNCHRONOUS_IO));
> > >
> > > Of course this is too ugly, so perhaps we want a helper to check if a
> > > swapfile is synchronous.
> > My understanding was that the WARN here means zswap_load() doesn't expect
> > a folio that is not in the swapcache. With zram, swap_readpage() must accept
> > a folio that is not in the swapcache, so this WARN should not be there.
> >
> > But your comment makes more sense to me. I will update the patch so
> > that it keeps this WARN. Thanks.
>
> That can cause another warning.
>
> Please don't overengineer.

How so?

Using zswap with zram is a weird combination; if anything, I would
prefer leaving the warning as-is rather than removing it, to be honest.
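The suggested condition only does what is intended when the flag test is a bitwise `&` inside the negation. A minimal userspace sketch, assuming illustrative bit values rather than the kernel's real ones, compares that form with the logical reading `!flags && SWP_SYNCHRONOUS_IO`, which is true only when the whole flags word is zero. On an active swap device the flags word is normally non-zero (SWP_USED is set, for instance), so the logical form would effectively never warn.

```c
#include <stdbool.h>

/* Illustrative flag bits only; the kernel's swap flags enum defines the real values. */
#define SWP_USED           (1u << 0)
#define SWP_SYNCHRONOUS_IO (1u << 12)

/* Logical reading: '!' negates the whole flags word and '&&' treats the
 * non-zero constant as "true", so this is true only when flags == 0. */
bool warn_logical_and(bool in_swapcache, unsigned long flags)
{
	return !in_swapcache && !flags && SWP_SYNCHRONOUS_IO;
}

/* Intended reading: warn when the folio is outside the swapcache and the
 * device does NOT have SWP_SYNCHRONOUS_IO set. */
bool warn_bitwise_and(bool in_swapcache, unsigned long flags)
{
	return !in_swapcache && !(flags & SWP_SYNCHRONOUS_IO);
}
```

For a non-synchronous device whose flags word holds just SWP_USED, warn_logical_and(false, SWP_USED) returns false while warn_bitwise_and(false, SWP_USED) returns true, which is the case the WARN is supposed to catch.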
  
Yin Fengwei Aug. 11, 2023, 12:36 a.m. UTC | #9
On 8/11/2023 7:43 AM, Yu Zhao wrote:
> On Thu, Aug 10, 2023 at 5:31 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>
>>
>>
>> On 8/11/2023 7:15 AM, Yosry Ahmed wrote:
>>> On Thu, Aug 10, 2023 at 4:09 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>
>>>>
>>>>
>>>> On 8/11/2023 2:44 AM, Yu Zhao wrote:
>>>>> On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
>>>>>>
>>>>>> With the mm-unstable branch, triggering swap activity can produce the
>>>>>> following warning:
>>>>>> [...]
>>>>>>
>>>>>> do_swap_page() can call swap_readpage() with a non-swapcache folio and
>>>>>> trigger this warning, so we shouldn't assume that zswap_load() always
>>>>>> takes a swapcache folio.
>>>>>
>>>>> Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
>>>>> like a bug to me.
>>>> I hit this warning with zram, which has QUEUE_FLAG_SYNCHRONOUS set. Thanks.
>>>
>>> Does it make sense to keep the warning and instead change it to check
>>> SWP_SYNCHRONOUS_IO as well? Something like:
>>>
>>> VM_WARN_ON_ONCE(!folio_test_swapcache(folio) &&
>>> !(swap_type_to_swap_info(type)->flags & SWP_SYNCHRONOUS_IO));
>>>
>>> Of course this is too ugly, so perhaps we want a helper to check if a
>>> swapfile is synchronous.
>> My understanding was that the WARN here means zswap_load() doesn't expect
>> a folio that is not in the swapcache. With zram, swap_readpage() must accept
>> a folio that is not in the swapcache, so this WARN should not be there.
>>
>> But your comment makes more sense to me. I will update the patch so
>> that it keeps this WARN. Thanks.
> 
> That can cause another warning.
My understanding is that the zswap code may want this WARN.

> 
> Please don't overengineer.
  
Yu Zhao Aug. 11, 2023, 3:02 a.m. UTC | #10
On Thu, Aug 10, 2023 at 6:37 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>
>
>
> On 8/11/2023 7:43 AM, Yu Zhao wrote:
> > On Thu, Aug 10, 2023 at 5:31 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> >>
> >>
> >>
> >> On 8/11/2023 7:15 AM, Yosry Ahmed wrote:
> >>> On Thu, Aug 10, 2023 at 4:09 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 8/11/2023 2:44 AM, Yu Zhao wrote:
> >>>>> On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
> >>>>>>
> >>>>>> With the mm-unstable branch, triggering swap activity can produce the
> >>>>>> following warning:
> >>>>>> [...]
> >>>>>>
> >>>>>> do_swap_page() can call swap_readpage() with a non-swapcache folio and
> >>>>>> trigger this warning, so we shouldn't assume that zswap_load() always
> >>>>>> takes a swapcache folio.
> >>>>>
> >>>>> Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
> >>>>> like a bug to me.
> >>>> I hit this warning with zram, which has QUEUE_FLAG_SYNCHRONOUS set. Thanks.
> >>>
> >>> Does it make sense to keep the warning and instead change it to check
> >>> SWP_SYNCHRONOUS_IO as well? Something like:
> >>>
> >>> VM_WARN_ON_ONCE(!folio_test_swapcache(folio) &&
> >>> !(swap_type_to_swap_info(type)->flags & SWP_SYNCHRONOUS_IO));
> >>>
> >>> Of course this is too ugly, so perhaps we want a helper to check if a
> >>> swapfile is synchronous.
> >> My understanding was that the WARN here means zswap_load() doesn't expect
> >> a folio that is not in the swapcache. With zram, swap_readpage() must accept
> >> a folio that is not in the swapcache, so this WARN should not be there.
> >>
> >> But your comment makes more sense to me. I will update the patch so
> >> that it keeps this WARN. Thanks.
> >
> > That can cause another warning.
> My understanding is that the zswap code may want this WARN.
>
> >
> > Please don't overengineer.

The original patch looks good to me. What Yosry suggested seems not
only overengineered but also liable to cause a new KCSAN warning.
  
Yosry Ahmed Aug. 11, 2023, 3:08 a.m. UTC | #11
On Thu, Aug 10, 2023 at 8:03 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Thu, Aug 10, 2023 at 6:37 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> >
> >
> >
> > On 8/11/2023 7:43 AM, Yu Zhao wrote:
> > > On Thu, Aug 10, 2023 at 5:31 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> > >>
> > >>
> > >>
> > >> On 8/11/2023 7:15 AM, Yosry Ahmed wrote:
> > >>> On Thu, Aug 10, 2023 at 4:09 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> On 8/11/2023 2:44 AM, Yu Zhao wrote:
> > >>>>> On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
> > >>>>>>
> > >>>>>> With the mm-unstable branch, triggering swap activity can produce the
> > >>>>>> following warning:
> > >>>>>> [...]
> > >>>>>>
> > >>>>>> do_swap_page() can call swap_readpage() with a non-swapcache folio and
> > >>>>>> trigger this warning, so we shouldn't assume that zswap_load() always
> > >>>>>> takes a swapcache folio.
> > >>>>>
> > >>>>> Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
> > >>>>> like a bug to me.
> > >>>> I hit this warning with zram, which has QUEUE_FLAG_SYNCHRONOUS set. Thanks.
> > >>>
> > >>> Does it make sense to keep the warning and instead change it to check
> > >>> SWP_SYNCHRONOUS_IO as well? Something like:
> > >>>
> > >>> VM_WARN_ON_ONCE(!folio_test_swapcache(folio) &&
> > >>> !(swap_type_to_swap_info(type)->flags & SWP_SYNCHRONOUS_IO));
> > >>>
> > >>> Of course this is too ugly, so perhaps we want a helper to check if a
> > >>> swapfile is synchronous.
> > >> My understanding was that the WARN here means zswap_load() doesn't expect
> > >> a folio that is not in the swapcache. With zram, swap_readpage() must accept
> > >> a folio that is not in the swapcache, so this WARN should not be there.
> > >>
> > >> But your comment makes more sense to me. I will update the patch so
> > >> that it keeps this WARN. Thanks.
> > >
> > > That can cause another warning.
> > My understanding is that the zswap code may want this WARN.
> >
> > >
> > > Please don't overengineer.
>
> The original patch looks good to me. What Yosry suggested seems not
> only overengineered but also liable to cause a new KCSAN warning.

I suppose that can be easily mitigated with data_race(), similar to
do_swap_page().

Anyway, I don't feel strongly about it; if you do, then we can go with
the current patch :)

It just feels odd to me to drop a warning from zswap due to an
interaction with zram, which should not be happening in practice.
  
Yu Zhao Aug. 11, 2023, 3:11 a.m. UTC | #12
On Thu, Aug 10, 2023 at 5:46 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Thu, Aug 10, 2023 at 4:44 PM Yu Zhao <yuzhao@google.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 5:31 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> > >
> > >
> > >
> > > On 8/11/2023 7:15 AM, Yosry Ahmed wrote:
> > > > On Thu, Aug 10, 2023 at 4:09 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> > > >>
> > > >>
> > > >>
> > > >> On 8/11/2023 2:44 AM, Yu Zhao wrote:
> > > >>> On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
> > > >>>>
> > > >>>> With the mm-unstable branch, triggering swap activity can produce the
> > > >>>> following warning:
> > > >>>> [...]
> > > >>>>
> > > >>>> do_swap_page() can call swap_readpage() with a non-swapcache folio and
> > > >>>> trigger this warning, so we shouldn't assume that zswap_load() always
> > > >>>> takes a swapcache folio.
> > > >>>
> > > >>> Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
> > > >>> like a bug to me.
> > > >> I hit this warning with zram, which has QUEUE_FLAG_SYNCHRONOUS set. Thanks.
> > > >
> > > > Does it make sense to keep the warning and instead change it to check
> > > > SWP_SYNCHRONOUS_IO as well? Something like:
> > > >
> > > > VM_WARN_ON_ONCE(!folio_test_swapcache(folio) &&
> > > > !(swap_type_to_swap_info(type)->flags & SWP_SYNCHRONOUS_IO));
> > > >
> > > > Of course this is too ugly, so perhaps we want a helper to check if a
> > > > swapfile is synchronous.
> > > My understanding was that the WARN here means zswap_load() doesn't expect
> > > a folio that is not in the swapcache. With zram, swap_readpage() must accept
> > > a folio that is not in the swapcache, so this WARN should not be there.
> > >
> > > But your comment makes more sense to me. I will update the patch so
> > > that it keeps this WARN. Thanks.
> >
> > That can cause another warning.
> >
> > Please don't overengineer.
>
> How so?
>
> Using zswap with zram is a weird combination

Not at all -- it can achieve tiering between different compressors:
fast but low compression ratio for zswap but the opposite for zram.

> if anything, I would
> prefer leaving the warning as-is rather than removing it, to be honest.
  
Yosry Ahmed Aug. 11, 2023, 3:21 a.m. UTC | #13
On Thu, Aug 10, 2023 at 8:12 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Thu, Aug 10, 2023 at 5:46 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Thu, Aug 10, 2023 at 4:44 PM Yu Zhao <yuzhao@google.com> wrote:
> > >
> > > On Thu, Aug 10, 2023 at 5:31 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> > > >
> > > >
> > > >
> > > > On 8/11/2023 7:15 AM, Yosry Ahmed wrote:
> > > > > On Thu, Aug 10, 2023 at 4:09 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> > > > >>
> > > > >>
> > > > >>
> > > > >> On 8/11/2023 2:44 AM, Yu Zhao wrote:
> > > > >>> On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
> > > > >>>>
> > > > >>>> With the mm-unstable branch, triggering swap activity can produce the
> > > > >>>> following warning:
> > > > >>>> [...]
> > > > >>>>
> > > > >>>> do_swap_page() can call swap_readpage() with a non-swapcache folio and
> > > > >>>> trigger this warning, so we shouldn't assume that zswap_load() always
> > > > >>>> takes a swapcache folio.
> > > > >>>
> > > > >>> Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
> > > > >>> like a bug to me.
> > > > >> I hit this warning with zram, which has QUEUE_FLAG_SYNCHRONOUS set. Thanks.
> > > > >
> > > > > Does it make sense to keep the warning and instead change it to check
> > > > > SWP_SYNCHRONOUS_IO as well? Something like:
> > > > >
> > > > > VM_WARN_ON_ONCE(!folio_test_swapcache(folio) &&
> > > > > !(swap_type_to_swap_info(type)->flags & SWP_SYNCHRONOUS_IO));
> > > > >
> > > > > Of course this is too ugly, so perhaps we want a helper to check if a
> > > > > swapfile is synchronous.
> > > > My understanding was that the WARN here means zswap_load() doesn't expect
> > > > a folio that is not in the swapcache. With zram, swap_readpage() must accept
> > > > a folio that is not in the swapcache, so this WARN should not be there.
> > > >
> > > > But your comment makes more sense to me. I will update the patch so
> > > > that it keeps this WARN. Thanks.
> > >
> > > That can cause another warning.
> > >
> > > Please don't overengineer.
> >
> > How so?
> >
> > Using zswap with zram is a weird combination
>
> Not at all -- it can achieve tiering between different compressors:
> fast but low compression ratio for zswap but the opposite for zram.

That's definitely an interesting use case, thanks for pointing this out.

I would prefer creating a helper and using it in both do_swap_page()
and zswap_load() in the WARN_ON (with data_race()), but I am not
against just removing the WARN_ON either. I will leave it up to you
and Yin :)
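The helper idea can be sketched as one shared predicate used by both the fault path and the zswap check, so the two conditions cannot drift apart. This is a simplified userspace model, not kernel code: swap_is_synchronous(), struct swap_info_model and the bit value are hypothetical, and the bypass condition paraphrases the "SWP_SYNCHRONOUS_IO and a single swap user" test that do_swap_page() used around v6.5. In the kernel, the flags read would be wrapped in data_race(), as suggested above, to tell KCSAN the unlocked access is intentional.

```c
#include <stdbool.h>

/* Illustrative flag bit only; the kernel defines its own value. */
#define SWP_SYNCHRONOUS_IO (1u << 12)

/* Hypothetical stand-in for struct swap_info_struct, reduced to the
 * one field this sketch needs. */
struct swap_info_model {
	unsigned long flags;
};

/* Hypothetical helper. Kernel code would read si->flags via data_race(),
 * because the flags word can change without a lock being held. */
static bool swap_is_synchronous(const struct swap_info_model *si)
{
	return si->flags & SWP_SYNCHRONOUS_IO;
}

/* Simplified model of do_swap_page(): skip the swapcache only for a
 * synchronous device and a swap entry with exactly one user. */
bool fault_bypasses_swapcache(const struct swap_info_model *si, int swap_count)
{
	return swap_is_synchronous(si) && swap_count == 1;
}

/* Model of the check zswap_load() could make with the same helper:
 * a folio outside the swapcache is only expected on synchronous devices. */
bool zswap_load_would_warn(const struct swap_info_model *si, bool in_swapcache)
{
	return !in_swapcache && !swap_is_synchronous(si);
}
```

Because both functions go through the same predicate, any folio that the fault path reads without adding it to the swapcache comes from a device for which zswap_load_would_warn(si, false) returns false.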

>
> > if anything, I would
> > prefer leaving the warning as-is rather than removing it, to be honest.
  
Yin Fengwei Aug. 11, 2023, 5:21 a.m. UTC | #14
On 8/11/2023 11:21 AM, Yosry Ahmed wrote:
> On Thu, Aug 10, 2023 at 8:12 PM Yu Zhao <yuzhao@google.com> wrote:
>>
>> On Thu, Aug 10, 2023 at 5:46 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>>>
>>> On Thu, Aug 10, 2023 at 4:44 PM Yu Zhao <yuzhao@google.com> wrote:
>>>>
>>>> On Thu, Aug 10, 2023 at 5:31 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 8/11/2023 7:15 AM, Yosry Ahmed wrote:
>>>>>> On Thu, Aug 10, 2023 at 4:09 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 8/11/2023 2:44 AM, Yu Zhao wrote:
>>>>>>>> On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
>>>>>>>>>
>>>>>>>>> With the mm-unstable branch, triggering swap activity can produce the
>>>>>>>>> following warning:
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>> swap_readpage() can be called from do_swap_page() with a folio that is
>>>>>>>>> not in the swap cache, which triggers this warning. So we shouldn't
>>>>>>>>> assume zswap_load() always receives a swapcache folio.
>>>>>>>>
>>>>>>>> Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
>>>>>>>> like a bug to me.
>>>>>>> I hit this warning with zram which has QUEUE_FLAG_SYNCHRONOUS set. Thanks.
>>>>>>
>>>>>> Does it make sense to keep the warning and instead change it to check
>>>>>> SWP_SYNCHRONOUS_IO as well? Something like:
>>>>>>
>>>>>> VM_WARN_ON_ONCE(!folio_test_swapcache(folio) &&
>>>>>> !(swap_type_to_swap_info(type)->flags & SWP_SYNCHRONOUS_IO));
>>>>>>
>>>>>> Of course this is too ugly, so perhaps we want a helper to check if a
>>>>>> swapfile is synchronous.
>>>>> My understanding was that the WARN here means zswap_load() doesn't expect
>>>>> a folio that is not in the swap cache. With zram, swap_readpage() must
>>>>> accept such a folio, so this WARN should not be there.
>>>>>
>>>>> But your comment makes more sense to me. I will update the patch to
>>>>> keep this WARN. Thanks.
>>>>
>>>> That can cause another warning.
>>>>
>>>> Please don't overengineer.
>>>
>>> How so?
>>>
>>> Using zswap with zram is a weird combination
>>
>> Not at all -- it can achieve tiering between different compressors:
>> fast but low compression ratio for zswap but the opposite for zram.
> 
> That's definitely an interesting use case, thanks for pointing this out.
> 
> I would prefer creating a helper and using it in both do_swap_page()
> and zswap_load() in the WARN_ON (with data_race()), but I am not
> against just removing the WARN_ON either. I will leave it up to you
> and Yin :)
OK. I will stick to the current patch.

Regards
Yin, Fengwei

> 
>>
>>> if anything I would
>>> prefer leaving the warning as-is than removing it to be honest.
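A minimal userspace sketch of why the folio in the trace above can be outside the swap cache. It assumes, roughly following do_swap_page() in kernels of this period, that a fault on a SWP_SYNCHRONOUS_IO device such as zram, for a swap entry with a single reference, allocates a fresh folio, skips the swap cache, and calls swap_readpage() directly. The `model_*` types, functions, and flag value are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>  /* used when exercising the model */
#include <stdbool.h>

/* Arbitrary bit for this model; not the kernel's actual SWP_SYNCHRONOUS_IO value. */
#define MODEL_SWP_SYNCHRONOUS_IO (1UL << 0)

/* Simplified stand-in for struct swap_info_struct. */
struct model_swap_info {
	unsigned long flags;
};

/* Simplified stand-in for the folio state zswap_load() asserts on. */
struct model_folio {
	bool locked;
	bool swapcache;
};

/*
 * Rough model of the do_swap_page() choice (an assumption, simplified):
 * on a synchronous device, an entry referenced only once is read into a
 * new folio that is never added to the swap cache.
 */
static bool model_skip_swapcache(const struct model_swap_info *si, int swap_count)
{
	return (si->flags & MODEL_SWP_SYNCHRONOUS_IO) && swap_count == 1;
}

/* Folio that would be handed to swap_readpage(), and hence zswap_load(). */
static struct model_folio model_fault_folio(const struct model_swap_info *si,
					    int swap_count)
{
	struct model_folio folio = {
		.locked = true,	/* zswap_load() still requires a locked folio */
		.swapcache = !model_skip_swapcache(si, swap_count),
	};
	return folio;
}

/* The assertion removed by the patch: fires for any non-swapcache folio. */
static bool model_old_swapcache_warn(const struct model_folio *folio)
{
	return !folio->swapcache;
}
```

Under these assumptions, a zram fault on a singly referenced entry yields a locked folio outside the swap cache, so `VM_WARN_ON_ONCE(!folio_test_swapcache(folio))` fires even though nothing is wrong; the same fault on a non-synchronous device goes through the swap cache and stays silent.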
  
Chris Li Aug. 15, 2023, 2:21 p.m. UTC
Hi Yin,

On Fri, Aug 11, 2023 at 01:21:21PM +0800, Yin, Fengwei wrote:
> OK. I will stick to the current patch.

I think removing that warning is fine.

Feel free to add:

Reviewed-by: Chris Li (Google) <chrisl@kernel.org>

Chris
  

Patch

diff --git a/mm/zswap.c b/mm/zswap.c
index 1e17f11a7896..7300b98d4a03 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1384,7 +1384,6 @@  bool zswap_load(struct folio *folio)
 	bool ret;
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
-	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
 
 	/* find */
 	spin_lock(&tree->lock);
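For comparison, the alternative raised in the thread (keep the swapcache assertion but exempt synchronous swap devices via a helper) can be set against the applied patch in a small toy model. Each predicate reports whether zswap_load()'s entry assertions, taken together, would fire. `model_swap_is_synchronous()` is a hypothetical helper, not an existing kernel function, and the flag value is arbitrary.

```c
#include <assert.h>  /* used when exercising the model */
#include <stdbool.h>

/* Arbitrary bit for this model; not the kernel's actual value. */
#define MODEL_SWP_SYNCHRONOUS_IO (1UL << 0)

struct model_swap_info {
	unsigned long flags;
};

struct model_folio {
	bool locked;
	bool swapcache;
};

/* Hypothetical helper of the kind suggested in the thread. */
static bool model_swap_is_synchronous(const struct model_swap_info *si)
{
	/* Mask the flag bit explicitly; "!flags && BIT" would test something else. */
	return (si->flags & MODEL_SWP_SYNCHRONOUS_IO) != 0;
}

/* Alternative: a non-swapcache folio is only unexpected on a non-synchronous device. */
static bool model_warn_with_helper(const struct model_folio *folio,
				   const struct model_swap_info *si)
{
	return !folio->locked ||
	       (!folio->swapcache && !model_swap_is_synchronous(si));
}

/* After the patch: only the lock assertion remains. */
static bool model_warn_after_patch(const struct model_folio *folio)
{
	return !folio->locked;
}
```

Both variants stay silent for the zram bypass case. The helper-based one would still flag a non-swapcache folio arriving from a non-synchronous device, at the cost of reading swap_info flags from zswap_load() (which the thread suggests wrapping in data_race()); the patch gives up that extra coverage in exchange for simplicity.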