Message ID | 20230227173632.3292573-30-surenb@google.com |
---|---|
State | New |
Headers |
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Suren Baghdasaryan <surenb@google.com>
Date: Mon, 27 Feb 2023 09:36:28 -0800
Subject: [PATCH v4 29/33] x86/mm: try VMA lock-based page fault handling first
Message-ID: <20230227173632.3292573-30-surenb@google.com>
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
References: <20230227173632.3292573-1-surenb@google.com>
X-Mailer: git-send-email 2.39.2.722.g9855ee24e9-goog
List-ID: <linux-kernel.vger.kernel.org> |
Series | Per-VMA locks |
Commit Message
Suren Baghdasaryan
Feb. 27, 2023, 5:36 p.m. UTC
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
arch/x86/Kconfig | 1 +
arch/x86/mm/fault.c | 36 ++++++++++++++++++++++++++++++++++++
2 files changed, 37 insertions(+)
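The fallback described in the commit message can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: `struct fb_mm`, `fb_init()`, `fb_handle()` and `fb_fault()` are hypothetical names, one POSIX read-write lock stands in for the per-VMA lock and another for `mmap_lock`, and `pthread_rwlock_tryrdlock()` plays the role of the non-blocking VMA read-lock attempt. The fast path never blocks; if the VMA lock is unavailable, or the handler asks for a retry (the analogue of `VM_FAULT_RETRY`), the fault is redone under the mm-wide lock.

```c
#define _DEFAULT_SOURCE		/* pthread rwlocks under strict -std modes */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of "VMA lock first, mmap_lock as fallback"; not kernel code. */
struct fb_mm {
	pthread_rwlock_t mmap_lock;	/* stands in for mm->mmap_lock */
	pthread_rwlock_t vma_lock;	/* stands in for one VMA's lock */
	int fast_path;			/* faults completed under vma_lock */
	int slow_path;			/* faults redone under mmap_lock */
};

static void fb_init(struct fb_mm *mm)
{
	int rc = pthread_rwlock_init(&mm->mmap_lock, NULL);

	rc |= pthread_rwlock_init(&mm->vma_lock, NULL);
	assert(rc == 0);
	(void)rc;
	mm->fast_path = 0;
	mm->slow_path = 0;
}

/* Stand-in for the fault handler; returning true means "retry" (cf. VM_FAULT_RETRY). */
static bool fb_handle(bool wants_retry)
{
	return wants_retry;
}

static void fb_fault(struct fb_mm *mm, bool wants_retry)
{
	/* Fast path: try to read-lock only the VMA, never blocking. */
	if (pthread_rwlock_tryrdlock(&mm->vma_lock) == 0) {
		bool retry = fb_handle(wants_retry);

		pthread_rwlock_unlock(&mm->vma_lock);
		if (!retry) {
			mm->fast_path++;
			return;
		}
	}
	/* Slow path: the mm-wide lock, which may block behind a writer.
	 * In this model the retried fault always completes here. */
	pthread_rwlock_rdlock(&mm->mmap_lock);
	fb_handle(false);
	pthread_rwlock_unlock(&mm->mmap_lock);
	mm->slow_path++;
}
```

Calling `fb_fault()` while `vma_lock` is write-locked makes the try-lock fail and routes the fault to the slow path, which is the same shape as the patch jumping to `lock_mmap` when `lock_vma_under_rcu()` returns NULL.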
Comments
Hi,

On 27. 02. 23, 18:36, Suren Baghdasaryan wrote:
> Attempt VMA lock-based page fault handling first, and fall back to the
> existing mmap_lock-based handling if that fails.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  arch/x86/Kconfig    |  1 +
>  arch/x86/mm/fault.c | 36 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index a825bf031f49..df21fba77db1 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -27,6 +27,7 @@ config X86_64
>  	# Options that are inherently 64-bit kernel only:
>  	select ARCH_HAS_GIGANTIC_PAGE
>  	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
> +	select ARCH_SUPPORTS_PER_VMA_LOCK
>  	select ARCH_USE_CMPXCHG_LOCKREF
>  	select HAVE_ARCH_SOFT_DIRTY
>  	select MODULES_USE_ELF_RELA
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index a498ae1fbe66..e4399983c50c 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -19,6 +19,7 @@
>  #include <linux/uaccess.h> /* faulthandler_disabled() */
>  #include <linux/efi.h> /* efi_crash_gracefully_on_page_fault()*/
>  #include <linux/mm_types.h>
> +#include <linux/mm.h> /* find_and_lock_vma() */
>
>  #include <asm/cpufeature.h> /* boot_cpu_has, ... */
>  #include <asm/traps.h> /* dotraplinkage, ... */
> @@ -1333,6 +1334,38 @@ void do_user_addr_fault(struct pt_regs *regs,
>  	}
>  #endif
>
> +#ifdef CONFIG_PER_VMA_LOCK
> +	if (!(flags & FAULT_FLAG_USER))
> +		goto lock_mmap;
> +
> +	vma = lock_vma_under_rcu(mm, address);
> +	if (!vma)
> +		goto lock_mmap;
> +
> +	if (unlikely(access_error(error_code, vma))) {
> +		vma_end_read(vma);
> +		goto lock_mmap;
> +	}
> +	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
> +	vma_end_read(vma);
> +
> +	if (!(fault & VM_FAULT_RETRY)) {
> +		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
> +		goto done;
> +	}
> +	count_vm_vma_lock_event(VMA_LOCK_RETRY);

This is apparently not strong enough as it causes go build failures like:

[ 409s] strconv
[ 409s] releasep: m=0x579e2000 m->p=0x5781c600 p->m=0x0 p->status=2
[ 409s] fatal error: releasep: invalid p state
[ 409s]

[ 325s] hash/adler32
[ 325s] hash/crc32
[ 325s] cmd/internal/codesign
[ 336s] fatal error: runtime: out of memory

There are many kinds of similar errors. It happens in 1-3 out of 20
builds only.

If I revert the commit on top of 6.4, they all dismiss. Any idea?

The downstream report:
https://bugzilla.suse.com/show_bug.cgi?id=1212775

> +
> +	/* Quick path to respond to signals */
> +	if (fault_signal_pending(fault, regs)) {
> +		if (!user_mode(regs))
> +			kernelmode_fixup_or_oops(regs, error_code, address,
> +						 SIGBUS, BUS_ADRERR,
> +						 ARCH_DEFAULT_PKEY);
> +		return;
> +	}
> +lock_mmap:
> +#endif /* CONFIG_PER_VMA_LOCK */
> +
>  	/*
>  	 * Kernel-mode access to the user address space should only occur
>  	 * on well-defined single instructions listed in the exception
> @@ -1433,6 +1466,9 @@ void do_user_addr_fault(struct pt_regs *regs,
>  	}
>
>  	mmap_read_unlock(mm);
> +#ifdef CONFIG_PER_VMA_LOCK
> +done:
> +#endif
>  	if (likely(!(fault & VM_FAULT_ERROR)))
>  		return;
>

thanks,
On Thu, Jun 29, 2023 at 7:40 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>
> Hi,
>
> On 27. 02. 23, 18:36, Suren Baghdasaryan wrote:
> > Attempt VMA lock-based page fault handling first, and fall back to the
> > existing mmap_lock-based handling if that fails.
> > [...]
> > +	count_vm_vma_lock_event(VMA_LOCK_RETRY);
>
> This is apparently not strong enough as it causes go build failures like:
>
> [ 409s] strconv
> [ 409s] releasep: m=0x579e2000 m->p=0x5781c600 p->m=0x0 p->status=2
> [ 409s] fatal error: releasep: invalid p state
> [ 409s]
>
> [ 325s] hash/adler32
> [ 325s] hash/crc32
> [ 325s] cmd/internal/codesign
> [ 336s] fatal error: runtime: out of memory

Hi Jiri,
Thanks for reporting! I'm not familiar with go builds. Could you
please explain the error to me or point me to some documentation to
decipher that error?
Thanks,
Suren.

>
> There are many kinds of similar errors. It happens in 1-3 out of 20
> builds only.
>
> If I revert the commit on top of 6.4, they all dismiss. Any idea?
>
> The downstream report:
> https://bugzilla.suse.com/show_bug.cgi?id=1212775
>
> [...]
>
> thanks,
> --
> js
> suse labs
>
Linux regression tracking (Thorsten Leemhuis)
June 29, 2023, 5:06 p.m. UTC | #3
[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

On 29.06.23 16:40, Jiri Slaby wrote:
>
> On 27. 02. 23, 18:36, Suren Baghdasaryan wrote:
>> Attempt VMA lock-based page fault handling first, and fall back to the
>> existing mmap_lock-based handling if that fails.
> [...]
>> +	fault = handle_mm_fault(vma, address, flags |
>> FAULT_FLAG_VMA_LOCK, regs);
>> +	vma_end_read(vma);
>> +
>> +	if (!(fault & VM_FAULT_RETRY)) {
>> +		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
>> +		goto done;
>> +	}
>> +	count_vm_vma_lock_event(VMA_LOCK_RETRY);
>
> This is apparently not strong enough as it causes go build failures like:
>
> [ 409s] strconv
> [ 409s] releasep: m=0x579e2000 m->p=0x5781c600 p->m=0x0 p->status=2
> [ 409s] fatal error: releasep: invalid p state
> [ 409s]
>
> [ 325s] hash/adler32
> [ 325s] hash/crc32
> [ 325s] cmd/internal/codesign
> [ 336s] fatal error: runtime: out of memory
>
> There are many kinds of similar errors. It happens in 1-3 out of 20
> builds only.
>
> If I revert the commit on top of 6.4, they all dismiss. Any idea?
>
> The downstream report:
> https://bugzilla.suse.com/show_bug.cgi?id=1212775
> [...]

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 0bff0aaea03e2a3ed6bfa3021 https://bugzilla.suse.com/show_bug.cgi?id=1212775
#regzbot title mm: failures when building go in 1-3 out of 20 builds
#regzbot ignore-activity

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
On 29. 06. 23, 17:30, Suren Baghdasaryan wrote:
> On Thu, Jun 29, 2023 at 7:40 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>>
>> Hi,
>>
>> On 27. 02. 23, 18:36, Suren Baghdasaryan wrote:
>>> Attempt VMA lock-based page fault handling first, and fall back to the
>>> existing mmap_lock-based handling if that fails.
>>> [...]
>>> +	count_vm_vma_lock_event(VMA_LOCK_RETRY);
>>
>> This is apparently not strong enough as it causes go build failures like:
>>
>> [ 409s] strconv
>> [ 409s] releasep: m=0x579e2000 m->p=0x5781c600 p->m=0x0 p->status=2
>> [ 409s] fatal error: releasep: invalid p state
>> [ 409s]
>>
>> [ 325s] hash/adler32
>> [ 325s] hash/crc32
>> [ 325s] cmd/internal/codesign
>> [ 336s] fatal error: runtime: out of memory
>
> Hi Jiri,
> Thanks for reporting! I'm not familiar with go builds. Could you
> please explain the error to me or point me to some documentation to
> decipher that error?

Sorry, we are on the same boat -- me neither. It only popped up in our
(openSUSE) build system and I only tracked it down by bisection. Let me
know if I can try something (like a patch or gathering some debug info).

thanks,
On 30. 06. 23, 8:35, Jiri Slaby wrote:
> On 29. 06. 23, 17:30, Suren Baghdasaryan wrote:
>> On Thu, Jun 29, 2023 at 7:40 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>>>
>>> Hi,
>>>
>>> On 27. 02. 23, 18:36, Suren Baghdasaryan wrote:
>>>> Attempt VMA lock-based page fault handling first, and fall back to the
>>>> existing mmap_lock-based handling if that fails.
>>>> [...]
>>>
>>> This is apparently not strong enough as it causes go build failures
>>> like:
>>> [...]
>>
>> Hi Jiri,
>> Thanks for reporting! I'm not familiar with go builds. Could you
>> please explain the error to me or point me to some documentation to
>> decipher that error?
>
> Sorry, we are on the same boat -- me neither. It only popped up in our
> (openSUSE) build system and I only tracked it down by bisection. Let me
> know if I can try something (like a patch or gathering some debug info).

FWIW, a failed build log:
https://decibel.fi.muni.cz/~xslaby/n/vma/log.txt

and a strace for it:
https://decibel.fi.muni.cz/~xslaby/n/vma/strace.txt

An excerpt from the log:

[ 55s] runtime: marked free object in span 0x7fca6824bec8, elemsize=192 freeindex=0 (bad use of unsafe.Pointer? try -d=checkptr)
[ 55s] 0xc0002f2000 alloc marked
[ 55s] 0xc0002f20c0 alloc marked
[ 55s] 0xc0002f2180 alloc marked
[ 55s] 0xc0002f2240 free unmarked
[ 55s] 0xc0002f2300 alloc marked
[ 55s] 0xc0002f23c0 alloc marked
[ 55s] 0xc0002f2480 alloc marked
[ 55s] 0xc0002f2540 alloc marked
[ 55s] 0xc0002f2600 alloc marked
[ 55s] 0xc0002f26c0 alloc marked
[ 55s] 0xc0002f2780 alloc marked
[ 55s] 0xc0002f2840 alloc marked
[ 55s] 0xc0002f2900 alloc marked
[ 55s] 0xc0002f29c0 free unmarked
[ 55s] 0xc0002f2a80 alloc marked
[ 55s] 0xc0002f2b40 alloc marked
[ 55s] 0xc0002f2c00 alloc marked
[ 55s] 0xc0002f2cc0 alloc marked
[ 55s] 0xc0002f2d80 alloc marked
[ 55s] 0xc0002f2e40 alloc marked
[ 55s] 0xc0002f2f00 alloc marked
[ 55s] 0xc0002f2fc0 alloc marked
[ 55s] 0xc0002f3080 alloc marked
[ 55s] 0xc0002f3140 alloc marked
[ 55s] 0xc0002f3200 alloc marked
[ 55s] 0xc0002f32c0 alloc marked
[ 55s] 0xc0002f3380 free unmarked
[ 55s] 0xc0002f3440 free marked zombie

An excerpt from strace:
> 2348 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa6a1b990, parent_tid=0x7fcaa6a1b990, exit_signal=0, stack=0x7fcaa621b000, stack_size=0x7ffe00, tls=0x7fcaa6a1b6c0} => {parent_tid=[2350]}, 88) = 2350
> 2348 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa5882990, parent_tid=0x7fcaa5882990, exit_signal=0, stack=0x7fcaa5082000, stack_size=0x7ffe00, tls=0x7fcaa58826c0} => {parent_tid=[2351]}, 88) = 2351
> 2350 <... clone3 resumed> => {parent_tid=[2372]}, 88) = 2372
> 2351 <... clone3 resumed> => {parent_tid=[2354]}, 88) = 2354
> 2351 <... clone3 resumed> => {parent_tid=[2357]}, 88) = 2357
> 2354 <... clone3 resumed> => {parent_tid=[2355]}, 88) = 2355
> 2355 <... clone3 resumed> => {parent_tid=[2370]}, 88) = 2370
> 2370 mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> 2370 <... mmap resumed>) = 0x7fca68249000
> 2372 <... clone3 resumed> => {parent_tid=[2384]}, 88) = 2384
> 2384 <... clone3 resumed> => {parent_tid=[2388]}, 88) = 2388
> 2388 <... clone3 resumed> => {parent_tid=[2392]}, 88) = 2392
> 2392 <... clone3 resumed> => {parent_tid=[2395]}, 88) = 2395
> 2395 write(2, "runtime: marked free object in s"..., 36 <unfinished ...>

I.e. IIUC, all are threads (CLONE_VM) and thread 2370 mapped ANON
0x7fca68249000 - 0x7fca6827ffff and go in thread 2395 thinks for some
reason 0x7fca6824bec8 in that region is "bad".

> thanks,
--
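The addresses in the excerpts above can be checked with a little arithmetic. The mmap() call in the strace requests 262144 bytes (0x40000), so a mapping that starts at 0x7fca68249000 ends at byte 0x7fca68288fff (the message above gives the end as 0x7fca6827ffff); the span address 0x7fca6824bec8 printed by the Go runtime lies inside the mapping either way. The object list in the log is spaced by elemsize=192 (0xc0), so, assuming the first address listed (0xc0002f2000) is the start of the span, the entry tagged "free marked zombie" is object index 27. The helper names below are hypothetical and exist only for this check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for checking the addresses quoted in the thread. */

/* Last byte covered by a mapping of len bytes starting at start. */
static uint64_t map_last_byte(uint64_t start, uint64_t len)
{
	assert(len > 0);
	return start + len - 1;
}

/* True if addr lies within [start, start + len). */
static bool in_mapping(uint64_t addr, uint64_t start, uint64_t len)
{
	return addr >= start && addr <= map_last_byte(start, len);
}

/* Index of the fixed-size object containing addr, counted from base. */
static uint64_t object_index(uint64_t addr, uint64_t base, uint64_t elemsize)
{
	assert(elemsize > 0 && addr >= base);
	return (addr - base) / elemsize;
}
```

For example, `object_index(0xc0002f2240, 0xc0002f2000, 192)` is 3, matching the first "free unmarked" line in the log.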
On 30. 06. 23, 10:28, Jiri Slaby wrote:
> [...]
> > 2370 mmap(NULL, 262144, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > 2370 <... mmap resumed>) = 0x7fca68249000
> [...]
> > 2395 write(2, "runtime: marked free object in s"..., 36 <unfinished
> ...>
>
> I.e. IIUC, all are threads (CLONE_VM) and thread 2370 mapped ANON
> 0x7fca68249000 - 0x7fca6827ffff and go in thread 2395 thinks for some
> reason 0x7fca6824bec8 in that region is "bad".

As I was noticed, this might be as well be a fail of the go's
inter-thread communication (or alike) too. It might now be only more
exposed with vma-based locks as we can do more parallelism now.

There are older hard to reproduce bugs in go with similar symptoms (we
see this error sometimes now too):
https://github.com/golang/go/issues/15246

Or this 2016 bug is a red herring. Hard to tell...

>> thanks,
On Fri, Jun 30, 2023 at 1:43 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>
> On 30. 06. 23, 10:28, Jiri Slaby wrote:
> > [...]
> > > 2370 mmap(NULL, 262144, PROT_READ|PROT_WRITE,
> > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > > 2370 <... mmap resumed>) = 0x7fca68249000
> > [...]
> > > 2395 write(2, "runtime: marked free object in s"..., 36 <unfinished
> > ...>
> >
> > I.e. IIUC, all are threads (CLONE_VM) and thread 2370 mapped ANON
> > 0x7fca68249000 - 0x7fca6827ffff and go in thread 2395 thinks for some
> > reason 0x7fca6824bec8 in that region is "bad".

Thanks for the analysis Jiri.
Is it possible from these logs to identify whether 2370 finished the
mmap operation before 2395 tried to access 0x7fca6824bec8? That access
has to happen only after mmap finishes mapping the region.

>
> As I was noticed, this might be as well be a fail of the go's
> inter-thread communication (or alike) too. It might now be only more
> exposed with vma-based locks as we can do more parallelism now.

Yes, with multithreaded processes like these where threads are mapping
and accessing memory areas, per-VMA locks should allow for greater
parallelism. So, if there is a race like the one I asked above, it
might become more pronounced with per-VMA locks.

I'll double check the code, but from Kernel's POV mmap would take the
mmap_lock for write then will lock the VMA lock for write. That should
prevent any page fault handlers from accessing this VMA in parallel
until writers release the locks. Page fault path would try to find the
VMA without any lock and then will try to read-lock that VMA. If it
fails it will fall back to mmap_lock. So, if the writer started first
and obtained the VMA lock, the reader will fall back to mmap_lock and
will block until the writer releases the mmap_lock. If the reader got
VMA read lock first then the writer will block while obtaining the
VMA's write lock. However for your scenario, the reader (page fault)
might be getting here before the writer (mmap) and upon not finding
the VMA it is looking for, it will fail.

Please let me know if you can verify this scenario.
Thanks,
Suren.

>
> There are older hard to reproduce bugs in go with similar symptoms (we
> see this error sometimes now too):
> https://github.com/golang/go/issues/15246
>
> Or this 2016 bug is a red herring. Hard to tell...
>
> >> thanks,
> --
> js
> suse labs
>
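The locking order described in the reply above can be modeled in userspace to make the cases concrete: a fault on an address that no VMA covers yet, a fault handled under the VMA read lock, and a fault that finds the VMA write-locked and falls back to `mmap_lock`. This is a simplified sketch, not kernel code: the single-slot "VMA tree", the `model_*` names and the `FAULT_*` results are hypothetical, and the lockless lookup is reduced to one acquire load rather than the RCU-based lookup that `lock_vma_under_rcu()` performs.

```c
#define _DEFAULT_SOURCE		/* pthread rwlocks under strict -std modes */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the locking described above; not kernel code. */
struct model_vma {
	uintptr_t start, end;		/* covers [start, end) */
	pthread_rwlock_t lock;		/* stands in for the per-VMA lock */
};

struct model_mm {
	pthread_rwlock_t mmap_lock;		/* stands in for mm->mmap_lock */
	_Atomic(struct model_vma *) vma;	/* single-slot "VMA tree" */
};

enum fault_path {
	FAULT_VMA_LOCK,		/* handled under the VMA read lock */
	FAULT_MMAP_LOCK,	/* fell back and handled under mmap_lock */
	FAULT_SEGV,		/* no VMA covers the address even under mmap_lock */
};

static void model_mm_init(struct model_mm *mm)
{
	int rc = pthread_rwlock_init(&mm->mmap_lock, NULL);

	assert(rc == 0);
	(void)rc;
	atomic_init(&mm->vma, NULL);
}

static struct model_vma *model_find_vma(struct model_mm *mm, uintptr_t addr)
{
	struct model_vma *v = atomic_load_explicit(&mm->vma, memory_order_acquire);

	return (v && addr >= v->start && addr < v->end) ? v : NULL;
}

/* Writer (mmap): mmap_lock for write, then the new VMA's lock for write. */
static void model_mmap(struct model_mm *mm, struct model_vma *v,
		       uintptr_t start, uintptr_t end)
{
	pthread_rwlock_wrlock(&mm->mmap_lock);
	v->start = start;
	v->end = end;
	pthread_rwlock_init(&v->lock, NULL);
	pthread_rwlock_wrlock(&v->lock);
	atomic_store_explicit(&mm->vma, v, memory_order_release);
	/* Drop the locks in reverse order of acquisition. */
	pthread_rwlock_unlock(&v->lock);
	pthread_rwlock_unlock(&mm->mmap_lock);
}

/* Reader (page fault): lockless lookup, then try to read-lock the VMA. */
static enum fault_path model_fault(struct model_mm *mm, uintptr_t addr)
{
	struct model_vma *v = model_find_vma(mm, addr);

	if (v && pthread_rwlock_tryrdlock(&v->lock) == 0) {
		/* ...the fault would be handled here... */
		pthread_rwlock_unlock(&v->lock);
		return FAULT_VMA_LOCK;
	}
	/* Fallback: blocks while a writer holds mmap_lock, then looks again. */
	pthread_rwlock_rdlock(&mm->mmap_lock);
	v = model_find_vma(mm, addr);
	pthread_rwlock_unlock(&mm->mmap_lock);
	return v ? FAULT_MMAP_LOCK : FAULT_SEGV;
}
```

In this model, a fault issued before `model_mmap()` returns `FAULT_SEGV` because the fallback lookup under `mmap_lock` still finds nothing, which corresponds to the "reader gets here before the writer" scenario.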
Linux regression tracking (Thorsten Leemhuis)
July 3, 2023, 9:58 a.m. UTC | #8
On 29.06.23 16:40, Jiri Slaby wrote:
> On 27. 02. 23, 18:36, Suren Baghdasaryan wrote:
>> Attempt VMA lock-based page fault handling first, and fall back to the
>> existing mmap_lock-based handling if that fails.
>> [...]
>> +	count_vm_vma_lock_event(VMA_LOCK_RETRY);
>
> This is apparently not strong enough as it causes go build failures like:

TWIMC & for the record: there is another report about trouble caused by
this change; for details see
https://bugzilla.kernel.org/show_bug.cgi?id=217624

And a "forward to devs and lists" thread about that report:
https://lore.kernel.org/all/facbfec3-837a-51ed-85fa-31021c17d6ef@gmail.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> [ 409s] strconv
> [ 409s] releasep: m=0x579e2000 m->p=0x5781c600 p->m=0x0 p->status=2
> [ 409s] fatal error: releasep: invalid p state
> [ 409s]
>
> [ 325s] hash/adler32
> [ 325s] hash/crc32
> [ 325s] cmd/internal/codesign
> [ 336s] fatal error: runtime: out of memory
>
> There are many kinds of similar errors. It happens in 1-3 out of 20
> builds only.
>
> If I revert the commit on top of 6.4, they all dismiss. Any idea?
>
> The downstream report:
> https://bugzilla.suse.com/show_bug.cgi?id=1212775
>
>> [...]
>
> thanks,
Cc Jacob Young (from kernel bugzilla)

On 30. 06. 23, 19:40, Suren Baghdasaryan wrote:
> On Fri, Jun 30, 2023 at 1:43 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>>
>> On 30. 06. 23, 10:28, Jiri Slaby wrote:
>>>  > 2348  clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa5882990, parent_tid=0x7fcaa5882990, exit_signal=0, stack=0x7fcaa5082000, stack_size=0x7ffe00, tls=0x7fcaa58826c0} => {parent_tid=[2351]}, 88) = 2351
>>>  > 2350  <... clone3 resumed> => {parent_tid=[2372]}, 88) = 2372
>>>  > 2351  <... clone3 resumed> => {parent_tid=[2354]}, 88) = 2354
>>>  > 2351  <... clone3 resumed> => {parent_tid=[2357]}, 88) = 2357
>>>  > 2354  <... clone3 resumed> => {parent_tid=[2355]}, 88) = 2355
>>>  > 2355  <... clone3 resumed> => {parent_tid=[2370]}, 88) = 2370
>>>  > 2370  mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
>>>  > 2370  <... mmap resumed>) = 0x7fca68249000
>>>  > 2372  <... clone3 resumed> => {parent_tid=[2384]}, 88) = 2384
>>>  > 2384  <... clone3 resumed> => {parent_tid=[2388]}, 88) = 2388
>>>  > 2388  <... clone3 resumed> => {parent_tid=[2392]}, 88) = 2392
>>>  > 2392  <... clone3 resumed> => {parent_tid=[2395]}, 88) = 2395
>>>  > 2395  write(2, "runtime: marked free object in s"..., 36 <unfinished ...>
>>>
>>> I.e. IIUC, all are threads (CLONE_VM) and thread 2370 mapped ANON
>>> 0x7fca68249000 - 0x7fca6827ffff and go in thread 2395 thinks for some
>>> reason 0x7fca6824bec8 in that region is "bad".
>
> Thanks for the analysis Jiri.
> Is it possible from these logs to identify whether 2370 finished the
> mmap operation before 2395 tried to access 0x7fca6824bec8? That access
> has to happen only after mmap finishes mapping the region.

Hi,

it's hard to tell, but I assume so.

For now, forget about this overly complicated, hard-to-reproduce Go case
and concentrate on the very nice reduced testcase in:
https://bugzilla.kernel.org/show_bug.cgi?id=217624
;)

FWIW, I can reproduce using the test case too.

thanks,
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a825bf031f49..df21fba77db1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,6 +27,7 @@ config X86_64
 	# Options that are inherently 64-bit kernel only:
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
+	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select HAVE_ARCH_SOFT_DIRTY
 	select MODULES_USE_ELF_RELA
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a498ae1fbe66..e4399983c50c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -19,6 +19,7 @@
 #include <linux/uaccess.h> /* faulthandler_disabled() */
 #include <linux/efi.h> /* efi_crash_gracefully_on_page_fault()*/
 #include <linux/mm_types.h>
+#include <linux/mm.h> /* find_and_lock_vma() */
 
 #include <asm/cpufeature.h> /* boot_cpu_has, ... */
 #include <asm/traps.h> /* dotraplinkage, ... */
@@ -1333,6 +1334,38 @@ void do_user_addr_fault(struct pt_regs *regs,
 	}
 #endif
 
+#ifdef CONFIG_PER_VMA_LOCK
+	if (!(flags & FAULT_FLAG_USER))
+		goto lock_mmap;
+
+	vma = lock_vma_under_rcu(mm, address);
+	if (!vma)
+		goto lock_mmap;
+
+	if (unlikely(access_error(error_code, vma))) {
+		vma_end_read(vma);
+		goto lock_mmap;
+	}
+	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
+	vma_end_read(vma);
+
+	if (!(fault & VM_FAULT_RETRY)) {
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		goto done;
+	}
+	count_vm_vma_lock_event(VMA_LOCK_RETRY);
+
+	/* Quick path to respond to signals */
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			kernelmode_fixup_or_oops(regs, error_code, address,
+						 SIGBUS, BUS_ADRERR,
+						 ARCH_DEFAULT_PKEY);
+		return;
+	}
+lock_mmap:
+#endif /* CONFIG_PER_VMA_LOCK */
+
 	/*
 	 * Kernel-mode access to the user address space should only occur
 	 * on well-defined single instructions listed in the exception
@@ -1433,6 +1466,9 @@ void do_user_addr_fault(struct pt_regs *regs,
 	}
 
 	mmap_read_unlock(mm);
+#ifdef CONFIG_PER_VMA_LOCK
+done:
+#endif
 	if (likely(!(fault & VM_FAULT_ERROR)))
 		return;