[linux-next,v3] swap_state: update shadow_nodes for anonymous page
Commit Message
From: Yang Yang <yang.yang29@zte.com.cn>
shadow_nodes is used by the workingset code for reclaiming shadow
nodes. It is only updated on page cache insertion and deletion,
since for a long time workingset detection supported only the page
cache. But when workingset detection was extended to anonymous
pages, updating shadow_nodes for them was missed. As a result,
shadow nodes of anonymous pages are never reclaimed by
scan_shadow_nodes(), even when they occupy a lot of memory and the
system is under memory pressure.
So update shadow_nodes when swap cache entries are added or deleted,
by calling xas_set_update(.., workingset_update_node).
Fixes: aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
---
Changes for v3
- Reword the patch description in the imperative mood. Thanks to
  Bagas Sanjaya.
Changes for v2
- Include a description of the user-visible effect. Add a Fixes tag.
  Update comments. Also call workingset_update_node() in
  clear_shadow_from_swap_cache(). Thanks to Matthew Wilcox.
---
include/linux/xarray.h | 3 ++-
mm/swap_state.c | 6 ++++++
2 files changed, 8 insertions(+), 1 deletion(-)
Comments
On Fri, Jan 13, 2023 at 05:36:45PM +0800, yang.yang29@zte.com.cn wrote:
> From: Yang Yang <yang.yang29@zte.com.cn>
>
> shadow_nodes is used by the workingset code for reclaiming shadow
> nodes. It is only updated on page cache insertion and deletion,
> since for a long time workingset detection supported only the page
> cache. But when workingset detection was extended to anonymous
> pages, updating shadow_nodes for them was missed. As a result,
> shadow nodes of anonymous pages are never reclaimed by
> scan_shadow_nodes(), even when they occupy a lot of memory and the
> system is under memory pressure.
>
> So update shadow_nodes when swap cache entries are added or deleted,
> by calling xas_set_update(.., workingset_update_node).
What testing did you do of this? I have this crash in today's testing:
04304 BUG: kernel NULL pointer dereference, address: 0000000000000080
04304 #PF: supervisor read access in kernel mode
04304 #PF: error_code(0x0000) - not-present page
04304 PGD 0 P4D 0
04304 Oops: 0000 [#1] PREEMPT SMP NOPTI
04304 CPU: 4 PID: 3219629 Comm: sh Kdump: loaded Not tainted 6.2.0-rc4-next-20230116-00016-gd289d3de8ce5-dirty #69
04304 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
04304 RIP: 0010:_raw_spin_trylock+0x12/0x50
04304 Code: e0 41 5c 5d c3 89 c6 48 89 df e8 89 06 00 00 4c 89 e0 5b 41 5c 5d c3 90 55 48 89 e5 53 48 89 fb bf 01 00 00 00 e8 be 5b 71 ff <8b> 03 85 c0 75 16 ba 01 00 00 00 f0 0f b1 13 b8 01 00 00 00 75 06
04304 RSP: 0018:ffff888059afbbb8 EFLAGS: 00010093
04304 RAX: 0000000000000003 RBX: 0000000000000080 RCX: 0000000000000000
04304 RDX: 0000000000000000 RSI: ffff8880033e24c8 RDI: 0000000000000001
04304 RBP: ffff888059afbbc0 R08: 0000000000000000 R09: ffff888059afbd68
04304 R10: ffff88807d9db868 R11: 0000000000000000 R12: ffff8880033e24c0
04304 R13: ffff88800a1d8008 R14: ffff8880033e24c8 R15: ffff8880033e24c0
04304 FS: 00007feeeabc6740(0000) GS:ffff88807d900000(0000) knlGS:0000000000000000
04304 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
04304 CR2: 0000000000000080 CR3: 0000000059830003 CR4: 0000000000770ea0
04304 PKRU: 55555554
04304 Call Trace:
04304 <TASK>
04304 shadow_lru_isolate+0x3a/0x120
04304 __list_lru_walk_one+0xa3/0x190
04304 ? memcg_list_lru_alloc+0x330/0x330
04304 ? memcg_list_lru_alloc+0x330/0x330
04304 list_lru_walk_one_irq+0x59/0x80
04304 scan_shadow_nodes+0x27/0x30
04304 do_shrink_slab+0x13b/0x2e0
04304 shrink_slab+0x92/0x250
04304 drop_slab+0x41/0x90
04304 drop_caches_sysctl_handler+0x70/0x80
04304 proc_sys_call_handler+0x162/0x210
04304 proc_sys_write+0xe/0x10
04304 vfs_write+0x1c7/0x3a0
04304 ksys_write+0x57/0xd0
04304 __x64_sys_write+0x14/0x20
04304 do_syscall_64+0x34/0x80
04304 entry_SYSCALL_64_after_hwframe+0x63/0xcd
04304 RIP: 0033:0x7feeeacc1190
Decoding it, shadow_lru_isolate+0x3a/0x120 maps back to this line:
if (!spin_trylock(&mapping->host->i_lock)) {
i_lock is at offset 128 of struct inode, so that matches the dump.
I believe that swapper_spaces never have ->host set, so I don't
believe you've tested this patch since 51b8c1fe250d went in
back in 2021.
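The surrounding code in shadow_lru_isolate() looks roughly like this
(abridged paraphrase of mm/workingset.c, not a verbatim quote):

	mapping = container_of(node->array, struct address_space, i_pages);

	/*
	 * Taking i_lock here was added by 51b8c1fe250d; it assumes
	 * mapping->host points to a valid inode.
	 */
	if (!spin_trylock(&mapping->host->i_lock)) {
		...
	}

For a swap cache xarray, node->array is one of the swapper_spaces
address_spaces, whose ->host is NULL, so &mapping->host->i_lock
evaluates to offset 0x80 from a NULL pointer, matching the faulting
address in the oops above.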
> What testing did you do of this? I have this crash in today's testing:
My test was this:
1. Configure zram as swap.
2. Run some programs that malloc and access a large amount of memory,
   making sure they cause swapping (e.g. a trivial hog like the sketch
   below).
3. Watch count_shadow_nodes() and shadow_lru_isolate() by adding
   printk(), to make sure shadow_nodes is really shrinking.
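Something like the following trivial hog is what step 2 amounts to
(a hypothetical helper written for illustration, not part of the patch;
the 4 GiB default size is arbitrary):

	/* hog.c: keep dirtying more anonymous memory than fits in RAM */
	#include <stdlib.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		size_t mib = argc > 1 ? (size_t)atol(argv[1]) : 4096;
		size_t total = mib << 20;
		char *buf = malloc(total);

		if (!buf)
			return 1;
		for (;;) {
			for (size_t off = 0; off < total; off += 4096)
				buf[off] = (char)off;	/* touch every page */
			sleep(1);
		}
	}

Running a few of these against a small zram swap device keeps anonymous
pages cycling through the swap cache, so shadow entries accumulate in
the swap cache xarrays.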
Really sorry for the inadequate testing; I will try more tests,
including drop_caches via sysctl.
> i_lock is at offset 128 of struct inode, so that matches the dump.
> I believe that swapper_spaces never have ->host set, so I don't
> believe you've tested this patch since 51b8c1fe250d went in
> back in 2021.
You are totally right. I reproduced the panic on linux-next and fixed
it in patch v4. I should have been more careful; I tested the patch
on Linux 5.14, which was a mistake.
My apologies for the time wasted.
Thanks.
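For reference, all three hunks below follow the same pattern: register
workingset_update_node as the node-update callback on the XA_STATE
before the swap cache xarray is modified, roughly:

	XA_STATE(xas, &address_space->i_pages, idx);

	/* let workingset accounting track xarray node allocation/freeing */
	xas_set_update(&xas, workingset_update_node);

This mirrors what the page cache already does, so xarray nodes holding
only shadow entries of the swap cache end up on the shadow_nodes
list_lru and can be seen by scan_shadow_nodes().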
@@ -1643,7 +1643,8 @@ static inline void xas_set_order(struct xa_state *xas, unsigned long index,
* @update: Function to call when updating a node.
*
* The XArray can notify a caller after it has updated an xa_node.
- * This is advanced functionality and is only needed by the page cache.
+ * This is advanced functionality and is only needed by the page cache
+ * and swap cache.
*/
static inline void xas_set_update(struct xa_state *xas, xa_update_node_t update)
{
@@ -94,6 +94,8 @@ int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
unsigned long i, nr = folio_nr_pages(folio);
void *old;
+ xas_set_update(&xas, workingset_update_node);
+
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
VM_BUG_ON_FOLIO(!folio_test_swapbacked(folio), folio);
@@ -145,6 +147,8 @@ void __delete_from_swap_cache(struct folio *folio,
pgoff_t idx = swp_offset(entry);
XA_STATE(xas, &address_space->i_pages, idx);
+ xas_set_update(&xas, workingset_update_node);
+
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
VM_BUG_ON_FOLIO(!folio_test_swapcache(folio), folio);
VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);
@@ -252,6 +256,8 @@ void clear_shadow_from_swap_cache(int type, unsigned long begin,
struct address_space *address_space = swap_address_space(entry);
XA_STATE(xas, &address_space->i_pages, curr);
+ xas_set_update(&xas, workingset_update_node);
+
xa_lock_irq(&address_space->i_pages);
xas_for_each(&xas, old, end) {
if (!xa_is_value(old))