Message ID | 20230809100754.3094517-2-shikemeng@huaweicloud.com |
---|---|
State | New |
Headers |
From: Kemeng Shi <shikemeng@huaweicloud.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, baolin.wang@linux.alibaba.com, mgorman@techsingularity.net, david@redhat.com, willy@infradead.org
Cc: shikemeng@huaweicloud.com
Subject: [PATCH 1/2] mm/page_alloc: remove track of active PCP lists range in bulk free
Date: Wed, 9 Aug 2023 18:07:53 +0800
Message-Id: <20230809100754.3094517-2-shikemeng@huaweicloud.com>
In-Reply-To: <20230809100754.3094517-1-shikemeng@huaweicloud.com>
References: <20230809100754.3094517-1-shikemeng@huaweicloud.com> |
Series | Two minor cleanups for pcp list in page_alloc |
Commit Message
Kemeng Shi
Aug. 9, 2023, 10:07 a.m. UTC
After commit fd56eef258a17 ("mm/page_alloc: simplify how many pages are
selected per pcp list during bulk free"), we drain all pages in the
selected pcp list, and the passed count is guaranteed to be < pcp->count.
The search therefore finishes before any wrap-around, so the tracking of
the active PCP lists range, which was only needed for the wrap-around
case, is no longer necessary.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/page_alloc.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
Comments
Hi Kemeng,

Can you confirm this patch has no intended functional change?

I have a patch sitting in my tree for a while related to this count vs
pcp->count. The BPF function hook can potentially change pcp->count and
make count out of sync with pcp->count, which causes a dead loop. Maybe
I can send mine out alongside yours for discussion? I don't mind my
patch being combined with yours.

Your change looks fine to me. There is more that can be done on the
cleanup.

Chris

On Wed, Aug 09, 2023 at 06:07:53PM +0800, Kemeng Shi wrote:
> After commit fd56eef258a17 ("mm/page_alloc: simplify how many pages are
> selected per pcp list during bulk free"), we will drain all pages in
> selected pcp list. And we ensured passed count is < pcp->count. Then,
> the search will finish before wrap-around and track of active PCP lists
> range intended for wrap-around case is no longer needed.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  mm/page_alloc.c | 15 +++------------
>  1 file changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 96b7c1a7d1f2..1ddcb2707d05 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1207,8 +1207,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  			      int pindex)
>  {
>  	unsigned long flags;
> -	int min_pindex = 0;
> -	int max_pindex = NR_PCP_LISTS - 1;
>  	unsigned int order;
>  	bool isolated_pageblocks;
>  	struct page *page;
> @@ -1231,17 +1229,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>
>  	/* Remove pages from lists in a round-robin fashion. */
>  	do {
> -		if (++pindex > max_pindex)
> -			pindex = min_pindex;
> +		if (++pindex > NR_PCP_LISTS - 1)
> +			pindex = 0;
>  		list = &pcp->lists[pindex];
> -		if (!list_empty(list))
> -			break;
> -
> -		if (pindex == max_pindex)
> -			max_pindex--;
> -		if (pindex == min_pindex)
> -			min_pindex++;
> -	} while (1);
> +	} while (list_empty(list));
>
>  	order = pindex_to_order(pindex);
>  	nr_pages = 1 << order;
> --
> 2.30.0
>
on 8/16/2023 1:45 AM, Chris Li wrote:
> Hi Kemeng,
>
> Can you confirm this patch has no intended functional change?
>
Hi Chris, there is no functional change intended in this patch. As I
mentioned in the changelog, there is no wrap-around in the list
iteration, so the active PCP lists range will never be used.
> I have a patch sitting in my tree for a while related to this
> count vs pcp->count. The BPF function hook can potentially change
> pcp->count and make count out of sync with pcp->count which causes
> a dead loop.
>
I guess pcp->count is set bigger than it should be. In that case, we
will keep trying to get pages while all pages in the pcp list have
already been taken off, and a dead loop will happen. The dead loop will
happen with or without this patch, as the root cause is that we try to
get more pages than the pcp list owns.
> Maybe I can send my out alone side with yours for discussion?
> I don't mind my patch combined with yours.
>
Either way is acceptable to me, just feel free to choose the one you
like, and I'd like to see if there is more we could do here.
> Your change looks fine to me. There is more can be done
> on the clean up.
>
Thanks for the feedback, and more cleanup is welcome.
> Chris
>
> On Wed, Aug 09, 2023 at 06:07:53PM +0800, Kemeng Shi wrote:
>> After commit fd56eef258a17 ("mm/page_alloc: simplify how many pages are
>> selected per pcp list during bulk free"), we will drain all pages in
>> selected pcp list. And we ensured passed count is < pcp->count. Then,
>> the search will finish before wrap-around and track of active PCP lists
>> range intended for wrap-around case is no longer needed.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>  mm/page_alloc.c | 15 +++------------
>>  1 file changed, 3 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 96b7c1a7d1f2..1ddcb2707d05 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1207,8 +1207,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>>  			      int pindex)
>>  {
>>  	unsigned long flags;
>> -	int min_pindex = 0;
>> -	int max_pindex = NR_PCP_LISTS - 1;
>>  	unsigned int order;
>>  	bool isolated_pageblocks;
>>  	struct page *page;
>> @@ -1231,17 +1229,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>>
>>  	/* Remove pages from lists in a round-robin fashion. */
>>  	do {
>> -		if (++pindex > max_pindex)
>> -			pindex = min_pindex;
>> +		if (++pindex > NR_PCP_LISTS - 1)
>> +			pindex = 0;
>>  		list = &pcp->lists[pindex];
>> -		if (!list_empty(list))
>> -			break;
>> -
>> -		if (pindex == max_pindex)
>> -			max_pindex--;
>> -		if (pindex == min_pindex)
>> -			min_pindex++;
>> -	} while (1);
>> +	} while (list_empty(list));
>>
>>  	order = pindex_to_order(pindex);
>>  	nr_pages = 1 << order;
>> --
>> 2.30.0
>>
Hi Kemeng,

On Wed, Aug 16, 2023 at 7:22 PM Kemeng Shi <shikemeng@huaweicloud.com> wrote:
> Hi Chris, there is no functional change intended in this patch. As I
> mentioned in the changelog, there is no wrap-around in the list
> iteration, so the active PCP lists range will never be used.
>
> > I have a patch sitting in my tree for a while related to this
> > count vs pcp->count. The BPF function hook can potentially change
> > pcp->count and make count out of sync with pcp->count which causes
> > a dead loop.

In this case the BPF hook allocates a page inside the spin_lock, so
"pcp->count" becomes smaller than "count". The loop condition only
checks "count > 0", but all pages on pcp->lists have already been
freed: pcp->count is 0 while "count" is still 1. After a few
wrap-arounds, max_pindex becomes smaller than min_pindex and eventually
reaches -1, causing an invalid page fault.

> I guess pcp->count is set bigger than it should be. In that case, we
> will keep trying to get pages while all pages in the pcp list have
> already been taken off, and a dead loop will happen. The dead loop
> will happen with or without this patch, as the root cause is that we
> try to get more pages than the pcp list owns.
>
> > Maybe I can send my out alone side with yours for discussion?

My patch is split into two parts: the first is a functional change to
allow pcp->count to drop below "count". The other is just a cleanup and
should be functionally equivalent. Sure, I will send it out and CC you
for discussion.

> > I don't mind my patch combined with yours.
>
> Either way is acceptable to me, just feel free to choose the one you
> like, and I'd like to see if there is more we could do here.

Thanks

Chris
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 96b7c1a7d1f2..1ddcb2707d05 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1207,8 +1207,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			      int pindex)
 {
 	unsigned long flags;
-	int min_pindex = 0;
-	int max_pindex = NR_PCP_LISTS - 1;
 	unsigned int order;
 	bool isolated_pageblocks;
 	struct page *page;
@@ -1231,17 +1229,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 
 	/* Remove pages from lists in a round-robin fashion. */
 	do {
-		if (++pindex > max_pindex)
-			pindex = min_pindex;
+		if (++pindex > NR_PCP_LISTS - 1)
+			pindex = 0;
 		list = &pcp->lists[pindex];
-		if (!list_empty(list))
-			break;
-
-		if (pindex == max_pindex)
-			max_pindex--;
-		if (pindex == min_pindex)
-			min_pindex++;
-	} while (1);
+	} while (list_empty(list));
 
 	order = pindex_to_order(pindex);
 	nr_pages = 1 << order;