Message ID | 20230414175444.1837474-1-surenb@google.com |
---|---|
State | New |
Headers |
Date: Fri, 14 Apr 2023 10:54:44 -0700
Message-ID: <20230414175444.1837474-1-surenb@google.com>
Subject: [PATCH 1/1] mm: do not increment pgfault stats when page fault handler retries
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, michel@lespinasse.org, liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz, minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com, lstoakes@gmail.com, surenb@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com
X-Mailer: git-send-email 2.40.0.634.g4ca3ef3211-goog
List-ID: <linux-kernel.vger.kernel.org> |
Series |
[1/1] mm: do not increment pgfault stats when page fault handler retries
|
|
Commit Message
Suren Baghdasaryan
April 14, 2023, 5:54 p.m. UTC
If the page fault handler requests a retry, we will count the fault
multiple times. This is a relatively harmless problem as the retry paths
are not often requested, and the only user-visible problem is that the
fault counter will be slightly higher than it should be. Nevertheless,
userspace only took one fault, and should not see the fact that the
kernel had to retry the fault multiple times.

Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
Patch applies cleanly over linux-next and mm-unstable

 mm/memory.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)
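For readers who want to see the double counting in isolation, here is a minimal userspace sketch in C. It is not kernel code: `SIM_FAULT_RETRY`, `struct sim_stats`, `sim_handle_fault()` and `sim_take_fault()` are hypothetical names made up for this illustration, and the retry loop only assumes that the caller re-invokes the handler until it stops asking for a retry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's VM_FAULT_RETRY bit. */
#define SIM_FAULT_RETRY	0x1u

struct sim_stats {
	unsigned long pgfault;	/* models the PGFAULT event counter */
};

/*
 * One pass through a simulated fault handler. *retries_left is how many
 * more times the handler will ask for a retry before it completes.
 * With count_up_front set, the counter is bumped on entry (the behaviour
 * the patch removes); otherwise it is bumped only when the pass does not
 * end in a retry (the behaviour the patch introduces).
 */
static unsigned int sim_handle_fault(struct sim_stats *stats,
				     unsigned int *retries_left,
				     bool count_up_front)
{
	unsigned int ret = 0;

	assert(stats && retries_left);

	if (count_up_front)
		stats->pgfault++;

	if (*retries_left > 0) {
		(*retries_left)--;
		ret = SIM_FAULT_RETRY;
	}

	if (!count_up_front && !(ret & SIM_FAULT_RETRY))
		stats->pgfault++;

	return ret;
}

/* Drive a single userspace-visible fault to completion, retrying as asked. */
static unsigned long sim_take_fault(unsigned int retries, bool count_up_front)
{
	struct sim_stats stats = { 0 };

	while (sim_handle_fault(&stats, &retries, count_up_front) & SIM_FAULT_RETRY)
		;

	return stats.pgfault;
}
```

In this model, one fault that needs two retries is reported as three faults when counting up front and as one fault when counting is deferred to the final pass.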
Comments
On Fri, Apr 14, 2023 at 10:54:44AM -0700, Suren Baghdasaryan wrote:
> If the page fault handler requests a retry, we will count the fault
> multiple times. This is a relatively harmless problem as the retry paths
> are not often requested, and the only user-visible problem is that the
> fault counter will be slightly higher than it should be. Nevertheless,
> userspace only took one fault, and should not see the fact that the
> kernel had to retry the fault multiple times.
>
> Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")

I know I suggested this fixes line, but I think it's actually been
here much longer, perhaps since

Fixes: d065bd810b6d ("mm: retry page fault when blocking on disk transfer")

Michel, what do you think?

> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> Patch applies cleanly over linux-next and mm-unstable
>
>  mm/memory.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 1c5b231fe6e3..d88f370eacd1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5212,17 +5212,16 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>
>  	__set_current_state(TASK_RUNNING);
>
> -	count_vm_event(PGFAULT);
> -	count_memcg_event_mm(vma->vm_mm, PGFAULT);
> -
>  	ret = sanitize_fault_flags(vma, &flags);
>  	if (ret)
> -		return ret;
> +		goto out;
>
>  	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
>  					    flags & FAULT_FLAG_INSTRUCTION,
> -					    flags & FAULT_FLAG_REMOTE))
> -		return VM_FAULT_SIGSEGV;
> +					    flags & FAULT_FLAG_REMOTE)) {
> +		ret = VM_FAULT_SIGSEGV;
> +		goto out;
> +	}
>
>  	/*
>  	 * Enable the memcg OOM handling for faults triggered in user
> @@ -5253,6 +5252,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  	}
>
>  	mm_account_fault(regs, address, flags, ret);
> +out:
> +	if (!(ret & VM_FAULT_RETRY)) {
> +		count_vm_event(PGFAULT);
> +		count_memcg_event_mm(vma->vm_mm, PGFAULT);
> +	}
>
>  	return ret;
>  }
> --
> 2.40.0.634.g4ca3ef3211-goog
>
Hi, Suren,

On Fri, Apr 14, 2023 at 10:54:44AM -0700, Suren Baghdasaryan wrote:
> If the page fault handler requests a retry, we will count the fault
> multiple times. This is a relatively harmless problem as the retry paths
> are not often requested, and the only user-visible problem is that the
> fault counter will be slightly higher than it should be. Nevertheless,
> userspace only took one fault, and should not see the fact that the
> kernel had to retry the fault multiple times.
>
> Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> Patch applies cleanly over linux-next and mm-unstable
>
>  mm/memory.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 1c5b231fe6e3..d88f370eacd1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5212,17 +5212,16 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>
>  	__set_current_state(TASK_RUNNING);
>
> -	count_vm_event(PGFAULT);
> -	count_memcg_event_mm(vma->vm_mm, PGFAULT);
> -
>  	ret = sanitize_fault_flags(vma, &flags);
>  	if (ret)
> -		return ret;
> +		goto out;
>
>  	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
>  					    flags & FAULT_FLAG_INSTRUCTION,
> -					    flags & FAULT_FLAG_REMOTE))
> -		return VM_FAULT_SIGSEGV;
> +					    flags & FAULT_FLAG_REMOTE)) {
> +		ret = VM_FAULT_SIGSEGV;
> +		goto out;
> +	}
>
>  	/*
>  	 * Enable the memcg OOM handling for faults triggered in user
> @@ -5253,6 +5252,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  	}
>
>  	mm_account_fault(regs, address, flags, ret);

Here is the mm_account_fault() function taking care of some other
accountings. Perhaps good to put things into it? It also already ignores
invalid faults:

        if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
                return;

I see that you may also want to account for sigbus, however I really don't
know why. Explanations would be great when it would matter. So far it
makes sense to me if we skip both RETRY or ERROR cases.

> +out:
> +	if (!(ret & VM_FAULT_RETRY)) {
> +		count_vm_event(PGFAULT);
> +		count_memcg_event_mm(vma->vm_mm, PGFAULT);

There is one thing worth noticing is here vma may or may not be valid
depending on the retval of the fault.

RETRY is exactly one of the cases that accessing vma may be unsafe due to
releasing of mmap read lock. The other one is the recently added
VM_FAULT_COMPLETE. So if we want to move this chunk (or any vma reference)
to be later we need to consider a valid vma / mm being there first, or
we're prone to accessing a vma that has already been released, I think.

> +	}
>
>  	return ret;
>  }
> --
> 2.40.0.634.g4ca3ef3211-goog
>

Thanks,
On Fri, Apr 14, 2023 at 2:47 PM Peter Xu <peterx@redhat.com> wrote:
>
> Hi, Suren,

Hi Peter,

>
> On Fri, Apr 14, 2023 at 10:54:44AM -0700, Suren Baghdasaryan wrote:
> > If the page fault handler requests a retry, we will count the fault
> > multiple times. This is a relatively harmless problem as the retry paths
> > are not often requested, and the only user-visible problem is that the
> > fault counter will be slightly higher than it should be. Nevertheless,
> > userspace only took one fault, and should not see the fact that the
> > kernel had to retry the fault multiple times.
> >
> > Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> > Patch applies cleanly over linux-next and mm-unstable
> >
> >  mm/memory.c | 16 ++++++++++------
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 1c5b231fe6e3..d88f370eacd1 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5212,17 +5212,16 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> >
> >  	__set_current_state(TASK_RUNNING);
> >
> > -	count_vm_event(PGFAULT);
> > -	count_memcg_event_mm(vma->vm_mm, PGFAULT);
> > -
> >  	ret = sanitize_fault_flags(vma, &flags);
> >  	if (ret)
> > -		return ret;
> > +		goto out;
> >
> >  	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
> >  					    flags & FAULT_FLAG_INSTRUCTION,
> > -					    flags & FAULT_FLAG_REMOTE))
> > -		return VM_FAULT_SIGSEGV;
> > +					    flags & FAULT_FLAG_REMOTE)) {
> > +		ret = VM_FAULT_SIGSEGV;
> > +		goto out;
> > +	}
> >
> >  	/*
> >  	 * Enable the memcg OOM handling for faults triggered in user
> > @@ -5253,6 +5252,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> >  	}
> >
> >  	mm_account_fault(regs, address, flags, ret);
>
> Here is the mm_account_fault() function taking care of some other
> accountings. Perhaps good to put things into it?

That seems appropriate. Let me take a closer look.

>
> It also already ignores invalid faults:
>
>         if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
>                 return;

Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
we need to retry but no errors happened? If so then this condition
would double-count pagefaults in such cases. If such return code is
impossible then it's the same as checking for VM_FAULT_RETRY.

>
> I see that you may also want to account for sigbus, however I really don't
> know why. Explanations would be great when it would matter. So far it
> makes sense to me if we skip both RETRY or ERROR cases.

Accounting in case of a sigbus is not affected by this patch I think.
We account for sigbus or any other error cases because there was a
pagefault and we need to account for it. Whether we failed to handle
it or not should not affect the count. We skip the retry case because
we know the same fault will be retried. If we don't skip then we will
double-count this fault.

>
> > +out:
> > +	if (!(ret & VM_FAULT_RETRY)) {
> > +		count_vm_event(PGFAULT);
> > +		count_memcg_event_mm(vma->vm_mm, PGFAULT);
>
> There is one thing worth noticing is here vma may or may not be valid
> depending on the retval of the fault.
>
> RETRY is exactly one of the cases that accessing vma may be unsafe due to
> releasing of mmap read lock. The other one is the recently added
> VM_FAULT_COMPLETE. So if we want to move this chunk (or any vma reference)
> to be later we need to consider a valid vma / mm being there first, or
> we're prone to accessing a vma that has already been released, I think.

Good catch! I think you are right and I should have stored vma->vm_mm
in the beginning and used it when calling count_memcg_event_mm().
I'll prepare a new patch which handles this correctly.
Thanks,
Suren.

>
> > +	}
> >
> >  	return ret;
> >  }
> > --
> > 2.40.0.634.g4ca3ef3211-goog
> >
> >
>
> Thanks,
>
> --
> Peter Xu
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
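The approach mentioned in this reply (read `vma->vm_mm` while the vma is still known to be valid, and use only that saved pointer afterwards) can be sketched in plain userspace C as below. This is an illustration under assumptions, not the posted v2: `struct sim_mm`, `struct sim_vma`, `sim_do_fault()`, `sim_handle_mm_fault()` and the `SIM_FAULT_*` values are all hypothetical, and setting the local vma pointer to NULL merely models "the lock was dropped, so the vma must not be touched".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's values. */
#define SIM_FAULT_RETRY		0x1u
#define SIM_FAULT_COMPLETED	0x2u

struct sim_mm {
	unsigned long pgfault;	/* models the per-memcg PGFAULT count */
};

struct sim_vma {
	struct sim_mm *mm;
};

/*
 * Models the lower-level handler. For the two outcomes named in the
 * thread, the caller's vma pointer is cleared to represent that the
 * vma may already be gone once the lock has been released.
 */
static unsigned int sim_do_fault(struct sim_vma **vmap, unsigned int outcome)
{
	if (outcome & (SIM_FAULT_RETRY | SIM_FAULT_COMPLETED))
		*vmap = NULL;

	return outcome;
}

static unsigned int sim_handle_mm_fault(struct sim_vma *vma,
					unsigned int outcome)
{
	struct sim_mm *mm;
	unsigned int ret;

	assert(vma && vma->mm);
	mm = vma->mm;		/* save the mm while vma is still valid */

	ret = sim_do_fault(&vma, outcome);

	/* From here on only the saved mm is used, never vma. */
	if (!(ret & SIM_FAULT_RETRY))
		mm->pgfault++;

	return ret;
}
```

The point of the sketch is the ordering: the dereference of the vma happens before the call that may invalidate it, so the late accounting works for the completed case as well as for ordinary success.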
On Fri, Apr 14, 2023 at 3:14 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Fri, Apr 14, 2023 at 2:47 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Hi, Suren,
>
> Hi Peter,
>
> >
> > On Fri, Apr 14, 2023 at 10:54:44AM -0700, Suren Baghdasaryan wrote:
> > > If the page fault handler requests a retry, we will count the fault
> > > multiple times. This is a relatively harmless problem as the retry paths
> > > are not often requested, and the only user-visible problem is that the
> > > fault counter will be slightly higher than it should be. Nevertheless,
> > > userspace only took one fault, and should not see the fact that the
> > > kernel had to retry the fault multiple times.
> > >
> > > Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > ---
> > > Patch applies cleanly over linux-next and mm-unstable
> > >
> > >  mm/memory.c | 16 ++++++++++------
> > >  1 file changed, 10 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 1c5b231fe6e3..d88f370eacd1 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -5212,17 +5212,16 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > >
> > >  	__set_current_state(TASK_RUNNING);
> > >
> > > -	count_vm_event(PGFAULT);
> > > -	count_memcg_event_mm(vma->vm_mm, PGFAULT);
> > > -
> > >  	ret = sanitize_fault_flags(vma, &flags);
> > >  	if (ret)
> > > -		return ret;
> > > +		goto out;
> > >
> > >  	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
> > >  					    flags & FAULT_FLAG_INSTRUCTION,
> > > -					    flags & FAULT_FLAG_REMOTE))
> > > -		return VM_FAULT_SIGSEGV;
> > > +					    flags & FAULT_FLAG_REMOTE)) {
> > > +		ret = VM_FAULT_SIGSEGV;
> > > +		goto out;
> > > +	}
> > >
> > >  	/*
> > >  	 * Enable the memcg OOM handling for faults triggered in user
> > > @@ -5253,6 +5252,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > >  	}
> > >
> > >  	mm_account_fault(regs, address, flags, ret);
> >
> > Here is the mm_account_fault() function taking care of some other
> > accountings. Perhaps good to put things into it?
>
> That seems appropriate. Let me take a closer look.
>
> >
> > It also already ignores invalid faults:
> >
> >         if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> >                 return;
>
> Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
> we need to retry but no errors happened? If so then this condition
> would double-count pagefaults in such cases. If such return code is
> impossible then it's the same as checking for VM_FAULT_RETRY.
>
> >
> > I see that you may also want to account for sigbus, however I really don't
> > know why. Explanations would be great when it would matter. So far it
> > makes sense to me if we skip both RETRY or ERROR cases.
>
> Accounting in case of a sigbus is not affected by this patch I think.
> We account for sigbus or any other error cases because there was a
> pagefault and we need to account for it. Whether we failed to handle
> it or not should not affect the count. We skip the retry case because
> we know the same fault will be retried. If we don't skip then we will
> double-count this fault.

mm_account_fault() has a nice comment explaining why it skips errors and
that now makes sense to me. Let me move the accounting there and see if
others agree that's the right place.

>
> >
> > > +out:
> > > +	if (!(ret & VM_FAULT_RETRY)) {
> > > +		count_vm_event(PGFAULT);
> > > +		count_memcg_event_mm(vma->vm_mm, PGFAULT);
> >
> > There is one thing worth noticing is here vma may or may not be valid
> > depending on the retval of the fault.
> >
> > RETRY is exactly one of the cases that accessing vma may be unsafe due to
> > releasing of mmap read lock. The other one is the recently added
> > VM_FAULT_COMPLETE. So if we want to move this chunk (or any vma reference)
> > to be later we need to consider a valid vma / mm being there first, or
> > we're prone to accessing a vma that has already been released, I think.
>
> Good catch! I think you are right and I should have stored vma->vm_mm
> in the beginning and used it when calling count_memcg_event_mm().
> I'll prepare a new patch which handles this correctly.
> Thanks,
> Suren.
>
> >
> > > +	}
> > >
> > >  	return ret;
> > >  }
> > > --
> > > 2.40.0.634.g4ca3ef3211-goog
> > >
> > >
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >
Hi, Suren,

On Fri, Apr 14, 2023 at 03:14:23PM -0700, Suren Baghdasaryan wrote:
> > It also already ignores invalid faults:
> >
> >         if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> >                 return;
>
> Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
> we need to retry but no errors happened? If so then this condition
> would double-count pagefaults in such cases.

If ret==VM_FAULT_RETRY it should return here already, so I assume
mm_account_fault() itself is fine regarding fault retries?

Note that I think "ret & (VM_FAULT_ERROR | VM_FAULT_RETRY)" above means
"either ERROR or RETRY we'll skip the accounting".

IMHO we should have 3 cases here:

  - ERROR && !RETRY
    error triggered of any kind

  - RETRY && !ERROR
    we need to try one more time

  - !RETRY && !ERROR
    we finished the fault

I don't think ERROR & RETRY can even be set at the same time so I assume
there's no option 4) - a RETRY should imply no ERROR already, even though
it's still incomplete so need another attempt.

Thanks,
On Fri, Apr 14, 2023 at 3:35 PM Peter Xu <peterx@redhat.com> wrote:
>
> Hi, Suren,
>
> On Fri, Apr 14, 2023 at 03:14:23PM -0700, Suren Baghdasaryan wrote:
> > > It also already ignores invalid faults:
> > >
> > >         if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> > >                 return;
> >
> > Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
> > we need to retry but no errors happened? If so then this condition
> > would double-count pagefaults in such cases.
>
> If ret==VM_FAULT_RETRY it should return here already, so I assume
> mm_account_fault() itself is fine regarding fault retries?
>
> Note that I think "ret & (VM_FAULT_ERROR | VM_FAULT_RETRY)" above means
> "either ERROR or RETRY we'll skip the accounting".
>
> IMHO we should have 3 cases here:
>
>   - ERROR && !RETRY
>     error triggered of any kind
>
>   - RETRY && !ERROR
>     we need to try one more time
>
>   - !RETRY && !ERROR
>     we finished the fault

After looking some more into mm_account_fault(), I think it would be
fine to count the faults which produced errors. IIUC these counters
represent the total number of faults, not the number of valid and
successful faults. If so then I think simply using VM_FAULT_RETRY
should be ok without considering all possible combinations. WDYT?

>
> I don't think ERROR & RETRY can even be set at the same time so I assume
> there's no option 4) - a RETRY should imply no ERROR already, even though
> it's still incomplete so need another attempt.
>
> Thanks,
>
> --
> Peter Xu
>
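To make the two policies discussed above easier to compare, here is a small self-contained C sketch. Everything in it is hypothetical illustration: the `SIM_FAULT_*` bits do not match the kernel's values, `SIM_FAULT_ERROR` collapses the whole VM_FAULT_ERROR mask into one bit, and the assertion in `sim_classify()` encodes the assumption stated in the thread that ERROR and RETRY are not set together.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real kernel values differ. */
#define SIM_FAULT_RETRY	0x1u
#define SIM_FAULT_ERROR	0x4u	/* models the VM_FAULT_ERROR mask as one bit */

enum sim_outcome { SIM_FINISHED, SIM_ERRORED, SIM_RETRYING };

/* Map a return value onto the three cases listed in the thread. */
static enum sim_outcome sim_classify(unsigned int ret)
{
	/* Assumption from the discussion: both bits are never set at once. */
	assert(!((ret & SIM_FAULT_RETRY) && (ret & SIM_FAULT_ERROR)));

	if (ret & SIM_FAULT_RETRY)
		return SIM_RETRYING;
	if (ret & SIM_FAULT_ERROR)
		return SIM_ERRORED;
	return SIM_FINISHED;
}

/* The early-return test quoted from mm_account_fault(): ERROR or RETRY. */
static bool sim_skipped_by_account_fault(unsigned int ret)
{
	return (ret & (SIM_FAULT_ERROR | SIM_FAULT_RETRY)) != 0;
}

/* The policy proposed for PGFAULT: skip only RETRY, still count errors. */
static bool sim_counts_as_pgfault(unsigned int ret)
{
	return !(ret & SIM_FAULT_RETRY);
}
```

The only outcome on which the two predicates disagree is the error case: the quoted early return skips it, while the RETRY-only check still counts it as a fault that happened.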
On Fri, Apr 14, 2023 at 4:49 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Fri, Apr 14, 2023 at 3:35 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Hi, Suren,
> >
> > On Fri, Apr 14, 2023 at 03:14:23PM -0700, Suren Baghdasaryan wrote:
> > > > It also already ignores invalid faults:
> > > >
> > > >         if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> > > >                 return;
> > >
> > > Can there be a case of (!VM_FAULT_ERROR && VM_FAULT_RETRY) - basically
> > > we need to retry but no errors happened? If so then this condition
> > > would double-count pagefaults in such cases.
> >
> > If ret==VM_FAULT_RETRY it should return here already, so I assume
> > mm_account_fault() itself is fine regarding fault retries?
> >
> > Note that I think "ret & (VM_FAULT_ERROR | VM_FAULT_RETRY)" above means
> > "either ERROR or RETRY we'll skip the accounting".
> >
> > IMHO we should have 3 cases here:
> >
> >   - ERROR && !RETRY
> >     error triggered of any kind
> >
> >   - RETRY && !ERROR
> >     we need to try one more time
> >
> >   - !RETRY && !ERROR
> >     we finished the fault
>
> After looking some more into mm_account_fault(), I think it would be
> fine to count the faults which produced errors. IIUC these counters
> represent the total number of faults, not the number of valid and
> successful faults. If so then I think simply using VM_FAULT_RETRY
> should be ok without considering all possible combinations. WDYT?

I posted v2 at
https://lore.kernel.org/all/20230415000818.1955007-1-surenb@google.com/
Hopefully it's closer to what we want it to be.

>
> >
> > I don't think ERROR & RETRY can even be set at the same time so I assume
> > there's no option 4) - a RETRY should imply no ERROR already, even though
> > it's still incomplete so need another attempt.
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >
diff --git a/mm/memory.c b/mm/memory.c
index 1c5b231fe6e3..d88f370eacd1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5212,17 +5212,16 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 
 	__set_current_state(TASK_RUNNING);
 
-	count_vm_event(PGFAULT);
-	count_memcg_event_mm(vma->vm_mm, PGFAULT);
-
 	ret = sanitize_fault_flags(vma, &flags);
 	if (ret)
-		return ret;
+		goto out;
 
 	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
 					    flags & FAULT_FLAG_INSTRUCTION,
-					    flags & FAULT_FLAG_REMOTE))
-		return VM_FAULT_SIGSEGV;
+					    flags & FAULT_FLAG_REMOTE)) {
+		ret = VM_FAULT_SIGSEGV;
+		goto out;
+	}
 
 	/*
 	 * Enable the memcg OOM handling for faults triggered in user
@@ -5253,6 +5252,11 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 	}
 
 	mm_account_fault(regs, address, flags, ret);
+out:
+	if (!(ret & VM_FAULT_RETRY)) {
+		count_vm_event(PGFAULT);
+		count_memcg_event_mm(vma->vm_mm, PGFAULT);
+	}
 
 	return ret;
 }