Message ID | 20230125015738.912924-1-zokeefe@google.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp43234wrn; Tue, 24 Jan 2023 17:58:47 -0800 (PST) X-Google-Smtp-Source: AMrXdXuue9GEjm9+/iv86hUmYcuOxlBS9A5OZCBW7oy1lyLuvMdiy3x1a1AxaGSm9l1uynHTV1MC X-Received: by 2002:a17:902:9002:b0:194:9c02:7619 with SMTP id a2-20020a170902900200b001949c027619mr31445223plp.29.1674611926871; Tue, 24 Jan 2023 17:58:46 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674611926; cv=none; d=google.com; s=arc-20160816; b=oQeDarYb3BlgHKr/wzKzVsXAX7IRrKpi/+8OxKmAxa0PHlWRl7/h0nXTZhBfVeWdQ6 NvX+zbNuFe0UypEayv+v2NFcdbHxOarxPbYPImkpK2yQP2u0E37D8fY9zkW6JAJc+7o+ 4IxMwB4JjBQ8JsmWxf+ornRu8FQnykuDMpTaAvZ4uJDAuhbQlJQ7vND3GFgaT+ETYj+v nEcUZH5Yd8BjCAkaWpihMjtZwvjgUeGZmmb01L+QuGtFRcgsxQQUvNSLXyWLOE5IQI19 iwhUuX11WqDEYEZh2tTBcjaGdXYXAuyMSJk7kcmwTPBadL401CzDzdQOSmYEapj03Ubk Jj4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:mime-version:date :dkim-signature; bh=fp9ccs/R/HqGAoRU6R/FJYwx892zglssKTnZXb6TNhs=; b=k2e563GvsDBnWPhMNuk1IZV3JO4v8I0ke3qrWoMVVNSLF3tywVDgIC5SwYO6DkrMTf mOBsuPJ0T9uvuCcFTlXLTsqN/oWiqyyA+pojXatNINYKCAJek4bq58aID8Fts3wqcFYx x+KYy2rba+cSsX/Ygpi7efY/NRLYcWD8MY8vm1wWnVfr4kfhNW0lwERkct39o3tmmfM9 PWnB8PRTD0qE4IwRxLBN7+Acy3p0bqbDaDueb8O3WFJPf+LqkJFcbksWc080Mjr4eVNw P6+OJpvSXtYL5pd7lRU+I00B488yp0WI1OrvCQN4cq0c6/tK0HJCprBGkzSI/smLai5S 2EiQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=WAqnNhti; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 13-20020a170902c24d00b00194968874f1si3996552plg.453.2023.01.24.17.58.35; Tue, 24 Jan 2023 17:58:46 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=WAqnNhti; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230160AbjAYB6Q (ORCPT <rfc822;rust.linux@gmail.com> + 99 others); Tue, 24 Jan 2023 20:58:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45202 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231286AbjAYB6O (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 24 Jan 2023 20:58:14 -0500 Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com [IPv6:2607:f8b0:4864:20::104a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AE54BCD for <linux-kernel@vger.kernel.org>; Tue, 24 Jan 2023 17:58:11 -0800 (PST) Received: by mail-pj1-x104a.google.com with SMTP id lk5-20020a17090b33c500b00228cb369d7aso280771pjb.5 for <linux-kernel@vger.kernel.org>; Tue, 24 Jan 2023 17:58:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:mime-version:date:from:to:cc:subject :date:message-id:reply-to; bh=fp9ccs/R/HqGAoRU6R/FJYwx892zglssKTnZXb6TNhs=; b=WAqnNhtiG8WjUY8O+bk3uMlpTRdXAO1398g06DJIjO5tpLENLCDEFEWfl5MKQQ2cS4 FJmL4Hx1MgPkUvaWuxuEKoXfRmKWL2q9d5N75BjvVzLdMDt/6/LMiboSf2ZuostsQRoT 5oOPTgTiEDS0FKff/+yrSTTKn/j66fTfrr+Er9d8jkJiPj/NjiPkpckv7Sfa8Z4H5BlN +qW9JTz3wAKujy3DdbdwCaEfka7lpLk/bYQvXMvMgNfLHIjIFV2LvQDDSZdYttuhSOO5 tVfxl+1M1BxCZvN4M8o48BWAxGGoiHGmTA+PLRW0WNl344BBT0PzlqW+jQqTi6sTu+My HpVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:mime-version:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=fp9ccs/R/HqGAoRU6R/FJYwx892zglssKTnZXb6TNhs=; b=3QujvQ+fz0XBqNSE0t9UJ34Y5Ss1ztLQY2PAjGKwJI6jtFKW0I5BE/QagwhSsqfuGI pp9Yx7Wy/83B2X36cleUechny506RuoSairPzGfjibCq4ljac2k7N4AHRr9CrTwDQ6zU mKCQOX9lToGXrmN9at0C+zESN3ypYFLdhdFNoCusVo+Lwg2uOlbduvIeu3UiP6ChYy// HaUV7/cMcLmAxPQG9669Z9Is7hfw9ObXw/yCYnbsgkWPpW80fM+iR4hUl1gU9/r2XCw1 NGt2H30//at7etQlPdPbjqiGBmFROr/L/2O9gppYX963DKyOtMh4a8fd8fdFp25pr5is M9IQ== X-Gm-Message-State: AFqh2koCf1bFjEHfofhtmGrQhLOp4WHLqAN8DgNBm6VHWIFHD75EcOj/ 8gztPSHDWMZRBNqyRikXXj0PnAjK5AeN X-Received: from zokeefe3.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:1b6]) (user=zokeefe job=sendgmr) by 2002:a05:6a00:22ca:b0:58b:3d7f:746a with SMTP id f10-20020a056a0022ca00b0058b3d7f746amr3665212pfj.77.1674611891064; Tue, 24 Jan 2023 17:58:11 -0800 (PST) Date: Tue, 24 Jan 2023 17:57:37 -0800 Mime-Version: 1.0 X-Mailer: git-send-email 2.39.1.405.gd4c25cc71f-goog Message-ID: <20230125015738.912924-1-zokeefe@google.com> Subject: [PATCH 1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount From: "Zach O'Keefe" <zokeefe@google.com> To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>, Hugh Dickins <hughd@google.com>, Yang Shi <shy828301@gmail.com>, "Zach O'Keefe" <zokeefe@google.com> Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755957875970271137?= X-GMAIL-MSGID: =?utf-8?q?1755957875970271137?= |
Series |
[1/2] mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
|
|
Commit Message
Zach O'Keefe
Jan. 25, 2023, 1:57 a.m. UTC
During collapse, in a few places we check to see if a given small page
has any unaccounted references. If the refcount on the page doesn't
match our expectations, it must be there is an unknown user concurrently
interested in the page, and so it's not safe to move the contents
elsewhere. However, the unaccounted pins are likely an ephemeral state.
In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that
collapse may succeed on retry.
Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
mm/khugepaged.c | 1 +
1 file changed, 1 insertion(+)
Comments
On Tue, Jan 24, 2023 at 5:58 PM Zach O'Keefe <zokeefe@google.com> wrote: > > During collapse, in a few places we check to see if a given small page > has any unaccounted references. If the refcount on the page doesn't > match our expectations, it must be there is an unknown user concurrently > interested in the page, and so it's not safe to move the contents > elsewhere. However, the unaccounted pins are likely an ephemeral state. > > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that > collapse may succeed on retry. The page may be DMA pinned (for example, pin_user_pages()), it is not worth retrying for such pages. But it may also not be worth optimizing for this case at this point. So the patch looks good to me. Reviewed-by: Yang Shi <shy828301@gmail.com> > > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") > Reported-by: Hugh Dickins <hughd@google.com> > Signed-off-by: Zach O'Keefe <zokeefe@google.com> > > --- > mm/khugepaged.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index e23619bfecc4..fa38cae240b9 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r) > case SCAN_CGROUP_CHARGE_FAIL: > return -EBUSY; > /* Resource temporary unavailable - trying again might succeed */ > + case SCAN_PAGE_COUNT: > case SCAN_PAGE_LOCK: > case SCAN_PAGE_LRU: > case SCAN_DEL_PAGE_LRU: > -- > 2.39.1.405.gd4c25cc71f-goog >
On Wed, Jan 25, 2023 at 10:06 AM Yang Shi <shy828301@gmail.com> wrote: > > On Tue, Jan 24, 2023 at 5:58 PM Zach O'Keefe <zokeefe@google.com> wrote: > > > > During collapse, in a few places we check to see if a given small page > > has any unaccounted references. If the refcount on the page doesn't > > match our expectations, it must be there is an unknown user concurrently > > interested in the page, and so it's not safe to move the contents > > elsewhere. However, the unaccounted pins are likely an ephemeral state. > > > > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that > > collapse may succeed on retry. > > The page may be DMA pinned (for example, pin_user_pages()), it is not > worth retrying for such pages. But it may also not be worth optimizing > for this case at this point. > > So the patch looks good to me. Reviewed-by: Yang Shi <shy828301@gmail.com> Thanks as always, Yang, and good point about DMA pinning. As you mentioned, I don't know if it's worth considering that too much right now, as it's unlikely these two uses (MADV_COLLAPSE and DMA pining) would be used together. We can revisit if necessary later if it's an issue, but for now, I think it's a win that MADV_COLLAPSE (+ a bounded userspace retry loop based off erno) is more likely to succeed.
On Tue, 24 Jan 2023, Zach O'Keefe wrote: > During collapse, in a few places we check to see if a given small page > has any unaccounted references. If the refcount on the page doesn't > match our expectations, it must be there is an unknown user concurrently > interested in the page, and so it's not safe to move the contents > elsewhere. However, the unaccounted pins are likely an ephemeral state. > > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that > collapse may succeed on retry. > > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") > Reported-by: Hugh Dickins <hughd@google.com> > Signed-off-by: Zach O'Keefe <zokeefe@google.com> This was Reviewed-by: Yang Shi <shy828301@gmail.com> and now I'll give it a nudge with Acked-by: Hugh Dickins <hughd@google.com> since it hasn't appeared in mm-unstable or linux-next yet: I think its Cc:stable sibling 2/2, already in 6.2-rc, got all the attention. Thanks! Hugh > > --- > mm/khugepaged.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index e23619bfecc4..fa38cae240b9 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r) > case SCAN_CGROUP_CHARGE_FAIL: > return -EBUSY; > /* Resource temporary unavailable - trying again might succeed */ > + case SCAN_PAGE_COUNT: > case SCAN_PAGE_LOCK: > case SCAN_PAGE_LRU: > case SCAN_DEL_PAGE_LRU: > -- > 2.39.1.405.gd4c25cc71f-goog
On Wed, 8 Feb 2023 21:09:04 -0800 (PST) Hugh Dickins <hughd@google.com> wrote: > On Tue, 24 Jan 2023, Zach O'Keefe wrote: > > > During collapse, in a few places we check to see if a given small page > > has any unaccounted references. If the refcount on the page doesn't > > match our expectations, it must be there is an unknown user concurrently > > interested in the page, and so it's not safe to move the contents > > elsewhere. However, the unaccounted pins are likely an ephemeral state. > > > > In such a situation, make MADV_COLLAPSE set EAGAIN errno, indicating that > > collapse may succeed on retry. > > > > Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") > > Reported-by: Hugh Dickins <hughd@google.com> > > Signed-off-by: Zach O'Keefe <zokeefe@google.com> > > This was > Reviewed-by: Yang Shi <shy828301@gmail.com> > and now I'll give it a nudge with > Acked-by: Hugh Dickins <hughd@google.com> > since it hasn't appeared in mm-unstable or linux-next yet: Buildbot failed on [2/2] so I skipped the whole series in expectation of a v2 series, which didn't happen. Instead, Zach trickily sent what was [2/2] as a standalone patch. So [1/2] got lost. Sigh, poor me. Thanks, I'll merge [1/2] into mm-hotfixes. > I think its Cc:stable sibling 2/2, already in 6.2-rc, got all the attention. I'm not seeing anything in the [1/2] changelog which indicates that a backport is needed. IOW, # cat .signature When fixing a bug, please describe the end-user visible effects of that bug.
On Thu, 9 Feb 2023, Andrew Morton wrote: > > Thanks, I'll merge [1/2] into mm-hotfixes. Great, thanks. > > I'm not seeing anything in the [1/2] changelog which indicates that a > backport is needed. IOW, Correct: it's just changing the errno for some racy cases from "you're wrong, don't bother me again" to "it might be worth having another go": not fixing an instability, as 2/2 was. > > # cat .signature > When fixing a bug, please describe the end-user visible effects of that bug. If whatever's being run by the end-user is coded to try again on -EAGAIN, then the end-user will less often see occasional unexplained failures. Hugh
On Thu, 9 Feb 2023 13:50:30 -0800 (PST) Hugh Dickins <hughd@google.com> wrote: > > > > > I'm not seeing anything in the [1/2] changelog which indicates that a > > backport is needed. IOW, > > Correct: it's just changing the errno for some racy cases from "you're > wrong, don't bother me again" to "it might be worth having another go": > not fixing an instability, as 2/2 was. > > > > > # cat .signature > > When fixing a bug, please describe the end-user visible effects of that bug. > > If whatever's being run by the end-user is coded to try again on -EAGAIN, > then the end-user will less often see occasional unexplained failures. > OK, thanks. I redid the changelog's final paragraph thusly: : In this situation, MADV_COLLAPSE returns -EINVAL when it should return : -EAGAIN. This could cause userspace to conclude that the syscall failed, : when it in fact could succeed by retrying.
On Thu, Feb 9, 2023 at 2:12 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Thu, 9 Feb 2023 13:50:30 -0800 (PST) Hugh Dickins <hughd@google.com> wrote: > > > > > > > > > I'm not seeing anything in the [1/2] changelog which indicates that a > > > backport is needed. IOW, > > > > Correct: it's just changing the errno for some racy cases from "you're > > wrong, don't bother me again" to "it might be worth having another go": > > not fixing an instability, as 2/2 was. > > > > > > > > # cat .signature > > > When fixing a bug, please describe the end-user visible effects of that bug. > > > > If whatever's being run by the end-user is coded to try again on -EAGAIN, > > then the end-user will less often see occasional unexplained failures. > > > > OK, thanks. I redid the changelog's final paragraph thusly: > > : In this situation, MADV_COLLAPSE returns -EINVAL when it should return > : -EAGAIN. This could cause userspace to conclude that the syscall failed, > : when it in fact could succeed by retrying. > This looks good to me. Thanks Andrew! Also thanks Hugh for being on the lookout for this patch -- I hastily read through my emails regarding which patches were merged where and had assumed this merged with 2/2. Also, apologies about the confusing v1 [1/2] and v2 [2/2] fiasco; in hindsight that probably wasn't the most decipherable thing to do :) Best, Zach
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index e23619bfecc4..fa38cae240b9 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2712,6 +2712,7 @@ static int madvise_collapse_errno(enum scan_result r) case SCAN_CGROUP_CHARGE_FAIL: return -EBUSY; /* Resource temporary unavailable - trying again might succeed */ + case SCAN_PAGE_COUNT: case SCAN_PAGE_LOCK: case SCAN_PAGE_LRU: case SCAN_DEL_PAGE_LRU: