Message ID | a24a86fbae09711e61dc4424aa7aebff718e9995.1678703534.git.baolin.wang@linux.alibaba.com |
---|---|
State | New |
Headers |
From: Baolin Wang <baolin.wang@linux.alibaba.com> To: akpm@linux-foundation.org Cc: mgorman@techsingularity.net, osalvador@suse.de, vbabka@suse.cz, william.lam@bytedance.com, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH 2/2] mm: compaction: fix the possible deadlock when isolating hugetlb pages Date: Mon, 13 Mar 2023 18:37:17 +0800 Message-Id: <a24a86fbae09711e61dc4424aa7aebff718e9995.1678703534.git.baolin.wang@linux.alibaba.com> In-Reply-To: <1bc1c955b03603c4e14f56dfbbef9f637f18dbbd.1678703534.git.baolin.wang@linux.alibaba.com> References: <1bc1c955b03603c4e14f56dfbbef9f637f18dbbd.1678703534.git.baolin.wang@linux.alibaba.com> |
Series |
[1/2] mm: compaction: consider the number of scanning compound pages in isolate fail path
|
|
Commit Message
Baolin Wang
March 13, 2023, 10:37 a.m. UTC
When isolating a migratable pageblock, the pageblock can contain several
normal pages as well as several hugetlb pages (e.g. CONT-PTE 64K hugetlb
on arm64). That means we may still hold the lru lock taken for a normal
page when we go on to isolate the next hugetlb page with
isolate_or_dissolve_huge_page() in the same migratable pageblock.

However, isolate_or_dissolve_huge_page() may allocate a new hugetlb page
and dissolve the old one via alloc_and_dissolve_hugetlb_folio() if the
hugetlb page's refcount is zero. That means we can enter the direct
compaction path to allocate a new hugetlb page while still holding the
current lru lock, which may cause a deadlock.

To avoid this possible deadlock, release the lru lock before trying to
isolate a hugetlb page. Moreover, it does not make sense to hold the lru
lock to isolate a hugetlb page, which is not on the lru list.

Fixes: 369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/compaction.c | 5 +++++
1 file changed, 5 insertions(+)
Comments
On 03/13/23 18:37, Baolin Wang wrote:
> When trying to isolate a migratable pageblock, it can contain several
> normal pages or several hugetlb pages (e.g. CONT-PTE 64K hugetlb on arm64)
> in a pageblock. That means we may hold the lru lock of a normal page to
> continue to isolate the next hugetlb page by isolate_or_dissolve_huge_page()
> in the same migratable pageblock.
>
> However in the isolate_or_dissolve_huge_page(), it may allocate a new hugetlb
> page and dissolve the old one by alloc_and_dissolve_hugetlb_folio() if the
> hugetlb's refcount is zero. That means we can still enter the direct compaction
> path to allocate a new hugetlb page under the current lru lock, which
> may cause possible deadlock.
>
> To avoid this possible deadlock, we should release the lru lock when trying
> to isolate a hugetbl page. Moreover it does not make sense to take the lru
> lock to isolate a hugetlb, which is not in the lru list.
>
> Fixes: 369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/compaction.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c9d9ad958e2a..ac8ff152421a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c

Thanks!

I suspect holding the lru lock when calling isolate_or_dissolve_huge_page was
not considered. However, I wonder if this can really happen in practice?

Before the code below, there is this:

		/*
		 * Periodically drop the lock (if held) regardless of its
		 * contention, to give chance to IRQs. Abort completely if
		 * a fatal signal is pending.
		 */
		if (!(low_pfn % COMPACT_CLUSTER_MAX)) {
			if (locked) {
				unlock_page_lruvec_irqrestore(locked, flags);
				locked = NULL;
			}
			...
		}

It would seem that the pfn of a hugetlb page would always be a multiple of
COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
that is ALWAYS true and would prefer something like the code you suggested.

Did you actually see this deadlock in practice?

On Mon, 13 Mar 2023 10:08:38 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> I suspect holding the lru lock when calling isolate_or_dissolve_huge_page was
> not considered. However, I wonder if this can really happen in practice?
>
> Before the code below, there is this:
>
> 		/*
> 		 * Periodically drop the lock (if held) regardless of its
> 		 * contention, to give chance to IRQs. Abort completely if
> 		 * a fatal signal is pending.
> 		 */
> 		if (!(low_pfn % COMPACT_CLUSTER_MAX)) {
> 			if (locked) {
> 				unlock_page_lruvec_irqrestore(locked, flags);
> 				locked = NULL;
> 			}
> 			...
> 		}
>
> It would seem that the pfn of a hugetlb page would always be a multiple of
> COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
> that is ALWAYS true and would prefer something like the code you suggested.
>
> Did you actually see this deadlock in practice?

Presumably the lack of lockdep reports about this tells us something?

On 3/14/2023 1:08 AM, Mike Kravetz wrote:
> On 03/13/23 18:37, Baolin Wang wrote:
>> When trying to isolate a migratable pageblock, it can contain several
>> normal pages or several hugetlb pages (e.g. CONT-PTE 64K hugetlb on arm64)
>> in a pageblock. That means we may hold the lru lock of a normal page to
>> continue to isolate the next hugetlb page by isolate_or_dissolve_huge_page()
>> in the same migratable pageblock.
>>
>> However in the isolate_or_dissolve_huge_page(), it may allocate a new hugetlb
>> page and dissolve the old one by alloc_and_dissolve_hugetlb_folio() if the
>> hugetlb's refcount is zero. That means we can still enter the direct compaction
>> path to allocate a new hugetlb page under the current lru lock, which
>> may cause possible deadlock.
>>
>> To avoid this possible deadlock, we should release the lru lock when trying
>> to isolate a hugetbl page. Moreover it does not make sense to take the lru
>> lock to isolate a hugetlb, which is not in the lru list.
>>
>> Fixes: 369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>  mm/compaction.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index c9d9ad958e2a..ac8ff152421a 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>
> Thanks!
>
> I suspect holding the lru lock when calling isolate_or_dissolve_huge_page was
> not considered. However, I wonder if this can really happen in practice?
>
> Before the code below, there is this:
>
> 		/*
> 		 * Periodically drop the lock (if held) regardless of its
> 		 * contention, to give chance to IRQs. Abort completely if
> 		 * a fatal signal is pending.
> 		 */
> 		if (!(low_pfn % COMPACT_CLUSTER_MAX)) {
> 			if (locked) {
> 				unlock_page_lruvec_irqrestore(locked, flags);
> 				locked = NULL;
> 			}
> 			...
> 		}
>
> It would seem that the pfn of a hugetlb page would always be a multiple of
> COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
> that is ALWAYS true and would prefer something like the code you suggested.

Well, this is not always true, suppose the CONT-PTE hugetlb on ARM arch,
which contains 16 contiguous normal pages.

> Did you actually see this deadlock in practice?

I did not see this issue in practice until now, but I think it can be
triggered from code inspection if trying to isolate a CONT-PTE hugetlb.

On 03/14/23 12:11, Baolin Wang wrote:
> On 3/14/2023 1:08 AM, Mike Kravetz wrote:
> > On 03/13/23 18:37, Baolin Wang wrote:
> >
> > It would seem that the pfn of a hugetlb page would always be a multiple of
> > COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
> > that is ALWAYS true and would prefer something like the code you suggested.
>
> Well, this is not always true, suppose the CONT-PTE hugetlb on ARM arch,
> which contains 16 contiguous normal pages.
>

Right. I keep forgetting about the CONT-* page sizes on arm :(

In any case, I think explicitly dropping the lock as you have done is a
good idea.

Feel free to add,

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

On 3/15/2023 1:27 AM, Mike Kravetz wrote:
> On 03/14/23 12:11, Baolin Wang wrote:
>> On 3/14/2023 1:08 AM, Mike Kravetz wrote:
>>> On 03/13/23 18:37, Baolin Wang wrote:
>>>
>>> It would seem that the pfn of a hugetlb page would always be a multiple of
>>> COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
>>> that is ALWAYS true and would prefer something like the code you suggested.
>>
>> Well, this is not always true, suppose the CONT-PTE hugetlb on ARM arch,
>> which contains 16 contiguous normal pages.
>>
>
> Right. I keep forgetting about the CONT-* page sizes on arm :(
>
> In any case, I think explicitly dropping the lock as you have done is a
> good idea.
>
> Feel free to add,
>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Thanks for reviewing.

On 3/13/23 11:37, Baolin Wang wrote:
> When trying to isolate a migratable pageblock, it can contain several
> normal pages or several hugetlb pages (e.g. CONT-PTE 64K hugetlb on arm64)
> in a pageblock. That means we may hold the lru lock of a normal page to
> continue to isolate the next hugetlb page by isolate_or_dissolve_huge_page()
> in the same migratable pageblock.
>
> However in the isolate_or_dissolve_huge_page(), it may allocate a new hugetlb
> page and dissolve the old one by alloc_and_dissolve_hugetlb_folio() if the
> hugetlb's refcount is zero. That means we can still enter the direct compaction
> path to allocate a new hugetlb page under the current lru lock, which
> may cause possible deadlock.
>
> To avoid this possible deadlock, we should release the lru lock when trying
> to isolate a hugetbl page. Moreover it does not make sense to take the lru
> lock to isolate a hugetlb, which is not in the lru list.
>
> Fixes: 369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!

> ---
>  mm/compaction.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c9d9ad958e2a..ac8ff152421a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -893,6 +893,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		}
>
>  		if (PageHuge(page) && cc->alloc_contig) {
> +			if (locked) {
> +				unlock_page_lruvec_irqrestore(locked, flags);
> +				locked = NULL;
> +			}
> +
>  			ret = isolate_or_dissolve_huge_page(page, &cc->migratepages);
>
>  			/*

diff --git a/mm/compaction.c b/mm/compaction.c
index c9d9ad958e2a..ac8ff152421a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -893,6 +893,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		}
 
 		if (PageHuge(page) && cc->alloc_contig) {
+			if (locked) {
+				unlock_page_lruvec_irqrestore(locked, flags);
+				locked = NULL;
+			}
+
 			ret = isolate_or_dissolve_huge_page(page, &cc->migratepages);
 
 			/*