Message ID | df548a6ae3fa135eec3b446eb3dae8eb4227da97.1682885809.git.lstoakes@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2315770vqo; Sun, 30 Apr 2023 13:32:33 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4+Bv02wFccFF5GIRwY3KpNwYCUuincjHuHeI8QrA1reg1OrOfC8/xaWjfsEo54eyWRzsfX X-Received: by 2002:a05:6a21:150b:b0:f2:9161:d4de with SMTP id nq11-20020a056a21150b00b000f29161d4demr12065545pzb.49.1682886753348; Sun, 30 Apr 2023 13:32:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1682886753; cv=none; d=google.com; s=arc-20160816; b=N/kr6ZxbqZowBabmIIPzL82n06TGiFG44dO/Os7EPOoEZzVdJpCa/NyQ7N4ebEL5lq SMLAz6zSCs2D5QIRYHfYk30q/9wc8Ec9oIYBNRf4YX4a4wQprhVPdfpnld+ijvfHBUkw rQPfzX88SuHbfkllWBh/TiS1z8hsNq6yfZk0izjKmNU0ExrfZumOJ8b16xTNT0TY10HI nsbTorgkE2iMnrx69nqzHrWDS2kN6XeziMPnU423We2rB6WW98ZuHNt6FVdheMxxlQSU NBbOAd+XvCnGZkyudcEG3hImkaBsRwaVti2pS/Xw5pB80e14qyaFCfYI77q/44YucBbc WH1A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=b23kKCtHH6bptYMnbkiFzswByhO4ugXGC7lReBIJxcE=; b=kfYD1bM2DVfZ+NQu7NkUVu7tMd0TaAWefquFnO+LD4FXmQV1T020wdQb/SLr6pUwqM aoXESr+gQcLqK9423G/mBwR+tgnV+TTu6JHYHi6JFyZB3y3579TFJTS6LGVCpc22EgB0 21XG0Tr0eL1jcLBPoLFE+lmUM8YjEtuzmfhkuumohe5y5NeZjQ+ajX1SLHBOpUTc8y5D jTXu0TMWRPfkDXuZ/BBjxpCMJ/Bf1sf3vsnMDHs8t5/eJquJx9ITV0Hw/Ou8Im1JakK9 o+j1mXBpbywo82AoQAiWT+tFxvTzldMFVeE/ChVh+ToNqGeVrBBx3KCMCXDfvRxYAO6B 4nNw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=RgSAcUcA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w11-20020a63c10b000000b0050f74bf6b3asi26323452pgf.700.2023.04.30.13.32.17; Sun, 30 Apr 2023 13:32:33 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=RgSAcUcA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232064AbjD3UTm (ORCPT <rfc822;rbbytesnap@gmail.com> + 99 others); Sun, 30 Apr 2023 16:19:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57332 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231223AbjD3UTk (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sun, 30 Apr 2023 16:19:40 -0400 Received: from mail-wm1-x32e.google.com (mail-wm1-x32e.google.com [IPv6:2a00:1450:4864:20::32e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5EDB3E4C for <linux-kernel@vger.kernel.org>; Sun, 30 Apr 2023 13:19:39 -0700 (PDT) Received: by mail-wm1-x32e.google.com with SMTP id 5b1f17b1804b1-3f1950f5628so17354165e9.3 for <linux-kernel@vger.kernel.org>; Sun, 30 Apr 2023 13:19:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1682885978; x=1685477978; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=b23kKCtHH6bptYMnbkiFzswByhO4ugXGC7lReBIJxcE=; b=RgSAcUcAQnMir0yk7+P9NWQKpZGc724wNfiCtLYUJMRweuO2G+rKaOundt+kRebwgE 2sA6flt2erNsCueOprvBW5SVElIoXeSqxHKlTYRo8Cw0fuD0ciyAHEh4AqzOJ79knliE tI+mXlKr22bkfbqqxwogYvuX9gJzfcwlkj31+Hx0lnvhaZQ45bKVZ8ygnOqmAN2t/m0F AFck/JQXi3Wnk9y6Qs5sEAcTgxRgzt4BNEXcCuW8IMPnZTSe9D4D29ih0EszWGpOvkqT 0C6kd4cypf2y4ntF4CURxvHijaSw2BKIRmpcHs6CbSoOcbZ0sgOtmzAuvGyCNFQOUABm 6p/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682885978; x=1685477978; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=b23kKCtHH6bptYMnbkiFzswByhO4ugXGC7lReBIJxcE=; b=lt4l4pRzcXUFluLoeXyjaxsUnK45WU7Ffis8m32dFU5EhWtzvhcja40C2Aiqz1QpQQ zotfQV8iCuKOxO7z90i/kZByKlnEvrVip1Yphyxbrvcc0PdDrhqnCaWnF2iKDnQL7JKr myj84I1F0MP0mfWrx5o3Q1pUHUBW1GiIQC4Xt4oN42OkBuRzva3Ub3qy0BVmcqa8WP7v nmR4htCBGu2LH1fzmkI3Wszcv3QCycaBvLKtCZKDXv/E43CTJAmFoj33+O8cXaeJvkup Y6TrruJoZQwF71Mx5mycS8x9rKHxXF1z7UoGmcSbnwoIhON2nO6YvTjhdhZK+NKrpieq xu2w== X-Gm-Message-State: AC+VfDxWZJANvZqYFl3mwCBL1RMeHXUOGUtUhB+XDjpvgW08C49gd1g2 9hoVqQM35TLkgiuqwb5byBc= X-Received: by 2002:a7b:c84b:0:b0:3f1:75a9:5c0d with SMTP id c11-20020a7bc84b000000b003f175a95c0dmr9248332wml.26.1682885977604; Sun, 30 Apr 2023 13:19:37 -0700 (PDT) Received: from lucifer.home ([2a00:23c5:dc8c:8701:1663:9a35:5a7b:1d76]) by smtp.googlemail.com with ESMTPSA id j32-20020a05600c1c2000b003f173987ec2sm34328903wms.22.2023.04.30.13.19.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 30 Apr 2023 13:19:36 -0700 (PDT) From: Lorenzo Stoakes <lstoakes@gmail.com> To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org> Cc: "Liam R . Howlett" <Liam.Howlett@oracle.com>, Vlastimil Babka <vbabka@suse.cz>, Lorenzo Stoakes <lstoakes@gmail.com> Subject: [PATCH] mm/mmap/vma_merge: always check invariants Date: Sun, 30 Apr 2023 21:19:17 +0100 Message-Id: <df548a6ae3fa135eec3b446eb3dae8eb4227da97.1682885809.git.lstoakes@gmail.com> X-Mailer: git-send-email 2.40.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1764634660454309985?= X-GMAIL-MSGID: =?utf-8?q?1764634660454309985?= |
Series |
mm/mmap/vma_merge: always check invariants
|
|
Commit Message
Lorenzo Stoakes
April 30, 2023, 8:19 p.m. UTC
We may still have inconsistent input parameters even if we choose not to
merge and the vma_merge() invariant checks are useful for checking this
with no production runtime cost (these are only relevant when
CONFIG_DEBUG_VM is specified).
Therefore, perform these checks regardless of whether we merge.
This is relevant, as a recent issue (addressed in commit "mm/mempolicy:
Correctly update prev when policy is equal on mbind") in the mbind logic
was only picked up in the 6.2.y stable branch where these assertions are
performed prior to determining mergeability.
Had this remained the same in mainline this issue may have been picked up
faster, so moving forward let's always check them.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
mm/mmap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
Comments
On 4/30/23 22:19, Lorenzo Stoakes wrote: > We may still have inconsistent input parameters even if we choose not to > merge and the vma_merge() invariant checks are useful for checking this > with no production runtime cost (these are only relevant when > CONFIG_DEBUG_VM is specified). > > Therefore, perform these checks regardless of whether we merge. > > This is relevant, as a recent issue (addressed in commit "mm/mempolicy: > Correctly update prev when policy is equal on mbind") in the mbind logic > was only picked up in the 6.2.y stable branch where these assertions are > performed prior to determining mergeability. > > Had this remained the same in mainline this issue may have been picked up > faster, so moving forward let's always check them. > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> > --- > mm/mmap.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/mm/mmap.c b/mm/mmap.c > index 5522130ae606..13678edaa22c 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -960,17 +960,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > merge_next = true; > } > > + /* Verify some invariant that must be enforced by the caller. */ > + VM_WARN_ON(prev && addr <= prev->vm_start); > + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > + VM_WARN_ON(addr >= end); > + > if (!merge_prev && !merge_next) > return NULL; /* Not mergeable. */ > > res = vma = prev; > remove = remove2 = adjust = NULL; > > - /* Verify some invariant that must be enforced by the caller. */ > - VM_WARN_ON(prev && addr <= prev->vm_start); > - VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > - VM_WARN_ON(addr >= end); > - > /* Can we merge both the predecessor and the successor? */ > if (merge_prev && merge_next && > is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) {
On 30.04.23 22:19, Lorenzo Stoakes wrote: > We may still have inconsistent input parameters even if we choose not to > merge and the vma_merge() invariant checks are useful for checking this > with no production runtime cost (these are only relevant when > CONFIG_DEBUG_VM is specified). > > Therefore, perform these checks regardless of whether we merge. > > This is relevant, as a recent issue (addressed in commit "mm/mempolicy: > Correctly update prev when policy is equal on mbind") in the mbind logic > was only picked up in the 6.2.y stable branch where these assertions are > performed prior to determining mergeability. > > Had this remained the same in mainline this issue may have been picked up > faster, so moving forward let's always check them. > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> > --- > mm/mmap.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/mm/mmap.c b/mm/mmap.c > index 5522130ae606..13678edaa22c 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -960,17 +960,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > merge_next = true; > } > > + /* Verify some invariant that must be enforced by the caller. */ > + VM_WARN_ON(prev && addr <= prev->vm_start); > + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > + VM_WARN_ON(addr >= end); > + > if (!merge_prev && !merge_next) > return NULL; /* Not mergeable. */ > > res = vma = prev; > remove = remove2 = adjust = NULL; > > - /* Verify some invariant that must be enforced by the caller. */ > - VM_WARN_ON(prev && addr <= prev->vm_start); > - VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > - VM_WARN_ON(addr >= end); > - > /* Can we merge both the predecessor and the successor? */ > if (merge_prev && merge_next && > is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) { Reviewed-by: David Hildenbrand <david@redhat.com>
* Lorenzo Stoakes <lstoakes@gmail.com> [230430 16:19]: > We may still have inconsistent input parameters even if we choose not to > merge and the vma_merge() invariant checks are useful for checking this > with no production runtime cost (these are only relevant when > CONFIG_DEBUG_VM is specified). > > Therefore, perform these checks regardless of whether we merge. > > This is relevant, as a recent issue (addressed in commit "mm/mempolicy: > Correctly update prev when policy is equal on mbind") in the mbind logic > was only picked up in the 6.2.y stable branch where these assertions are > performed prior to determining mergeability. > > Had this remained the same in mainline this issue may have been picked up > faster, so moving forward let's always check them. > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> > --- > mm/mmap.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/mm/mmap.c b/mm/mmap.c > index 5522130ae606..13678edaa22c 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -960,17 +960,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > merge_next = true; > } > > + /* Verify some invariant that must be enforced by the caller. */ > + VM_WARN_ON(prev && addr <= prev->vm_start); > + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > + VM_WARN_ON(addr >= end); > + > if (!merge_prev && !merge_next) > return NULL; /* Not mergeable. */ > > res = vma = prev; > remove = remove2 = adjust = NULL; > > - /* Verify some invariant that must be enforced by the caller. */ > - VM_WARN_ON(prev && addr <= prev->vm_start); > - VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > - VM_WARN_ON(addr >= end); > - > /* Can we merge both the predecessor and the successor? */ > if (merge_prev && merge_next && > is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) { > -- > 2.40.1 >
On Sun, 30 Apr 2023 21:19:17 +0100 Lorenzo Stoakes <lstoakes@gmail.com> wrote: > We may still have inconsistent input parameters even if we choose not to > merge and the vma_merge() invariant checks are useful for checking this > with no production runtime cost (these are only relevant when > CONFIG_DEBUG_VM is specified). > > Therefore, perform these checks regardless of whether we merge. > > This is relevant, as a recent issue (addressed in commit "mm/mempolicy: > Correctly update prev when policy is equal on mbind") in the mbind logic > was only picked up in the 6.2.y stable branch where these assertions are > performed prior to determining mergeability. > > Had this remained the same in mainline this issue may have been picked up > faster, so moving forward let's always check them. > I'll scoot this into 6.4-rc, given the recent problems in this area. > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -960,17 +960,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > merge_next = true; > } > > + /* Verify some invariant that must be enforced by the caller. */ > + VM_WARN_ON(prev && addr <= prev->vm_start); > + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > + VM_WARN_ON(addr >= end); Maybe converting to VM_WARN_ON_ONCE() would be kinder.
Hi, On Sun, Apr 30, 2023 at 09:19:17PM +0100, Lorenzo Stoakes wrote: > We may still have inconsistent input parameters even if we choose not to > merge and the vma_merge() invariant checks are useful for checking this > with no production runtime cost (these are only relevant when > CONFIG_DEBUG_VM is specified). > > Therefore, perform these checks regardless of whether we merge. > > This is relevant, as a recent issue (addressed in commit "mm/mempolicy: > Correctly update prev when policy is equal on mbind") in the mbind logic > was only picked up in the 6.2.y stable branch where these assertions are > performed prior to determining mergeability. > > Had this remained the same in mainline this issue may have been picked up > faster, so moving forward let's always check them. > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> > --- > mm/mmap.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/mm/mmap.c b/mm/mmap.c > index 5522130ae606..13678edaa22c 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -960,17 +960,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > merge_next = true; > } > > + /* Verify some invariant that must be enforced by the caller. */ > + VM_WARN_ON(prev && addr <= prev->vm_start); > + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > + VM_WARN_ON(addr >= end); > + I'm seeing this fire a lot when fuzzing v6.4-rc1 on arm64 using Syzkaller. The splat looks like: | Syzkaller hit 'WARNING in vma_merge' bug. | | ------------[ cut here ]------------ | WARNING: CPU: 0 PID: 193 at mm/mmap.c:965 vma_merge+0x21c/0x1158 mm/mmap.c:965 | CPU: 0 PID: 193 Comm: syz-executor105 Not tainted 6.4.0-rc1-00001-g7d54d3135001 #1 | Hardware name: linux,dummy-virt (DT) | pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : vma_merge+0x21c/0x1158 mm/mmap.c:965 | lr : vma_merge+0x21c/0x1158 mm/mmap.c:965 | sp : ffff800018ec7970 | x29: ffff800018ec7970 x28: 0000000020000000 x27: 0000000000000000 | x26: 0000000000000000 x25: 1ffff000031d8f42 x24: ffff000010d58000 | x23: 0000000000000000 x22: ffff000017acc9b0 x21: 0000000020ffd000 | x20: 0000000020ffb000 x19: ffff000017acc8b8 x18: 0000000000000005 | x17: 0000000000000000 x16: 0000000000000000 x15: 1fffe00002f27494 | x14: 0000000000000000 x13: 000000009a8feb3a x12: ffff700002ddc77d | x11: 1ffff00002ddc77c x10: ffff700002ddc77c x9 : dfff800000000000 | x8 : ffff800016ee3be3 x7 : 0000000000000000 x6 : 0000000000000000 | x5 : ffff000017939b00 x4 : ffff800010c4a000 x3 : ffff800008000000 | x2 : 0000000000000000 x1 : ffff000017939b00 x0 : 0000000000000000 | Call trace: | vma_merge+0x21c/0x1158 mm/mmap.c:965 | userfaultfd_register fs/userfaultfd.c:1485 [inline] | userfaultfd_ioctl+0x378c/0x4240 fs/userfaultfd.c:2050 | vfs_ioctl fs/ioctl.c:51 [inline] | __do_sys_ioctl fs/ioctl.c:870 [inline] | __se_sys_ioctl fs/ioctl.c:856 [inline] | __arm64_sys_ioctl+0x184/0x218 fs/ioctl.c:856 | __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] | invoke_syscall+0x8c/0x2d8 arch/arm64/kernel/syscall.c:52 | el0_svc_common.constprop.0+0xf4/0x300 arch/arm64/kernel/syscall.c:142 | do_el0_svc+0x6c/0x180 arch/arm64/kernel/syscall.c:193 | el0_svc+0x4c/0x110 arch/arm64/kernel/entry-common.c:637 | el0t_64_sync_handler+0xf4/0x120 arch/arm64/kernel/entry-common.c:655 | el0t_64_sync+0x190/0x198 arch/arm64/kernel/entry.S:591 | irq event stamp: 2212 | hardirqs last enabled at (2211): [<ffff80000805d03c>] local_daif_restore arch/arm64/include/asm/daifflags.h:75 [inline] | hardirqs last enabled at (2211): [<ffff80000805d03c>] el0_svc_common.constprop.0+0xac/0x300 arch/arm64/kernel/syscall.c:107 | hardirqs last disabled at (2212): [<ffff80000e67c40c>] el1_dbg+0x24/0xa0 arch/arm64/kernel/entry-common.c:405 | softirqs last enabled at (2190): [<ffff800008021718>] softirq_handle_end kernel/softirq.c:414 [inline] | softirqs last enabled at (2190): [<ffff800008021718>] __do_softirq+0x8e8/0xe50 kernel/softirq.c:600 | softirqs last disabled at (2183): [<ffff80000802ad9c>] ____do_softirq+0x1c/0x30 arch/arm64/kernel/irq.c:80 | ---[ end trace 0000000000000000 ]--- I can reproduce that reliably with the below: | #include <linux/userfaultfd.h> | #include <sys/ioctl.h> | #include <sys/mman.h> | #include <sys/syscall.h> | #include <sys/types.h> | #include <unistd.h> | | int main(int argc, char *argv[]) | { | int uffd; | void *addr; | | struct uffdio_api uffdio_api; | struct uffdio_register uffdio_register; | | uffd = syscall(__NR_userfaultfd, 0x801ul); | | uffdio_api.api = UFFD_API; | uffdio_api.features = 0; | ioctl(uffd, UFFDIO_API, &uffdio_api); | | addr = mmap(NULL, 0x1000000ul, PROT_READ | PROT_WRITE, | MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); | | uffdio_register.range.start = (unsigned long)addr + 0x10000; | uffdio_register.range.len = 0x2000; | uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING; | ioctl(uffd, UFFDIO_REGISTER, &uffdio_register); | | return 0; | } ... which is cleaned up from the orginial Syzkaller reproducer: | Syzkaller reproducer: | # {Threaded:false Repeat:false RepeatTimes:0 Procs:1 Slowdown:1 Sandbox: SandboxArg:0 Leak:false NetInjection:false NetDevices:false NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false IEEE802154:false Sysctl:false UseTmpDir:false HandleSegv:false Repro:false Trace:false LegacyOptions:{Collide:false Fault:false FaultCall:0 FaultNth:0}} | r0 = userfaultfd(0x801) | r1 = dup(r0) | ioctl$UFFDIO_API(r1, 0xc018aa3f, &(0x7f0000000000)) | ioctl$UFFDIO_REGISTER(r1, 0xc020aa00, &(0x7f00000001c0)={{&(0x7f0000ffb000/0x2000)=nil, 0x2000}, 0x1}) | | | C reproducer: | // autogenerated by syzkaller (https://github.com/google/syzkaller) | | #define _GNU_SOURCE | | #include <endian.h> | #include <stdint.h> | #include <stdio.h> | #include <stdlib.h> | #include <string.h> | #include <sys/syscall.h> | #include <sys/types.h> | #include <unistd.h> | | #ifndef __NR_dup | #define __NR_dup 23 | #endif | #ifndef __NR_ioctl | #define __NR_ioctl 29 | #endif | #ifndef __NR_mmap | #define __NR_mmap 222 | #endif | #ifndef __NR_userfaultfd | #define __NR_userfaultfd 282 | #endif | | uint64_t r[2] = {0xffffffffffffffff, 0xffffffffffffffff}; | | int main(void) | { | syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul); | syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul); | syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul); | intptr_t res = 0; | res = syscall(__NR_userfaultfd, 0x801ul); | if (res != -1) | r[0] = res; | res = syscall(__NR_dup, r[0]); | if (res != -1) | r[1] = res; | *(uint64_t*)0x20000000 = 0xaa; | *(uint64_t*)0x20000008 = 0; | *(uint64_t*)0x20000010 = 0; | syscall(__NR_ioctl, r[1], 0xc018aa3f, 0x20000000ul); | *(uint64_t*)0x200001c0 = 0x20ffb000; | *(uint64_t*)0x200001c8 = 0x2000; | *(uint64_t*)0x200001d0 = 1; | *(uint64_t*)0x200001d8 = 0; | syscall(__NR_ioctl, r[1], 0xc020aa00, 0x200001c0ul); | return 0; | } Thanks, Mark.
On Wed, May 10, 2023 at 03:15:51PM +0100, Mark Rutland wrote: > Hi, > > On Sun, Apr 30, 2023 at 09:19:17PM +0100, Lorenzo Stoakes wrote: > > We may still have inconsistent input parameters even if we choose not to > > merge and the vma_merge() invariant checks are useful for checking this > > with no production runtime cost (these are only relevant when > > CONFIG_DEBUG_VM is specified). > > > > Therefore, perform these checks regardless of whether we merge. > > > > This is relevant, as a recent issue (addressed in commit "mm/mempolicy: > > Correctly update prev when policy is equal on mbind") in the mbind logic > > was only picked up in the 6.2.y stable branch where these assertions are > > performed prior to determining mergeability. > > > > Had this remained the same in mainline this issue may have been picked up > > faster, so moving forward let's always check them. > > > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> > > --- > > mm/mmap.c | 10 +++++----- > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > index 5522130ae606..13678edaa22c 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -960,17 +960,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > > merge_next = true; > > } > > > > + /* Verify some invariant that must be enforced by the caller. */ > > + VM_WARN_ON(prev && addr <= prev->vm_start); > > + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > > + VM_WARN_ON(addr >= end); > > + > > I'm seeing this fire a lot when fuzzing v6.4-rc1 on arm64 using Syzkaller. > Thanks, from the line I suspect addr != curr->vm_start, but need to look into the repro, at lsf/mm so a bit time lagged :) Am looking into it! > The splat looks like: > > | Syzkaller hit 'WARNING in vma_merge' bug. > | > | ------------[ cut here ]------------ > | WARNING: CPU: 0 PID: 193 at mm/mmap.c:965 vma_merge+0x21c/0x1158 mm/mmap.c:965 > | CPU: 0 PID: 193 Comm: syz-executor105 Not tainted 6.4.0-rc1-00001-g7d54d3135001 #1 > | Hardware name: linux,dummy-virt (DT) > | pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > | pc : vma_merge+0x21c/0x1158 mm/mmap.c:965 > | lr : vma_merge+0x21c/0x1158 mm/mmap.c:965 > | sp : ffff800018ec7970 > | x29: ffff800018ec7970 x28: 0000000020000000 x27: 0000000000000000 > | x26: 0000000000000000 x25: 1ffff000031d8f42 x24: ffff000010d58000 > | x23: 0000000000000000 x22: ffff000017acc9b0 x21: 0000000020ffd000 > | x20: 0000000020ffb000 x19: ffff000017acc8b8 x18: 0000000000000005 > | x17: 0000000000000000 x16: 0000000000000000 x15: 1fffe00002f27494 > | x14: 0000000000000000 x13: 000000009a8feb3a x12: ffff700002ddc77d > | x11: 1ffff00002ddc77c x10: ffff700002ddc77c x9 : dfff800000000000 > | x8 : ffff800016ee3be3 x7 : 0000000000000000 x6 : 0000000000000000 > | x5 : ffff000017939b00 x4 : ffff800010c4a000 x3 : ffff800008000000 > | x2 : 0000000000000000 x1 : ffff000017939b00 x0 : 0000000000000000 > | Call trace: > | vma_merge+0x21c/0x1158 mm/mmap.c:965 > | userfaultfd_register fs/userfaultfd.c:1485 [inline] > | userfaultfd_ioctl+0x378c/0x4240 fs/userfaultfd.c:2050 > | vfs_ioctl fs/ioctl.c:51 [inline] > | __do_sys_ioctl fs/ioctl.c:870 [inline] > | __se_sys_ioctl fs/ioctl.c:856 [inline] > | __arm64_sys_ioctl+0x184/0x218 fs/ioctl.c:856 > | __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] > | invoke_syscall+0x8c/0x2d8 arch/arm64/kernel/syscall.c:52 > | el0_svc_common.constprop.0+0xf4/0x300 arch/arm64/kernel/syscall.c:142 > | do_el0_svc+0x6c/0x180 arch/arm64/kernel/syscall.c:193 > | el0_svc+0x4c/0x110 arch/arm64/kernel/entry-common.c:637 > | el0t_64_sync_handler+0xf4/0x120 arch/arm64/kernel/entry-common.c:655 > | el0t_64_sync+0x190/0x198 arch/arm64/kernel/entry.S:591 > | irq event stamp: 2212 > | hardirqs last enabled at (2211): [<ffff80000805d03c>] local_daif_restore arch/arm64/include/asm/daifflags.h:75 [inline] > | hardirqs last enabled at (2211): [<ffff80000805d03c>] el0_svc_common.constprop.0+0xac/0x300 arch/arm64/kernel/syscall.c:107 > | hardirqs last disabled at (2212): [<ffff80000e67c40c>] el1_dbg+0x24/0xa0 arch/arm64/kernel/entry-common.c:405 > | softirqs last enabled at (2190): [<ffff800008021718>] softirq_handle_end kernel/softirq.c:414 [inline] > | softirqs last enabled at (2190): [<ffff800008021718>] __do_softirq+0x8e8/0xe50 kernel/softirq.c:600 > | softirqs last disabled at (2183): [<ffff80000802ad9c>] ____do_softirq+0x1c/0x30 arch/arm64/kernel/irq.c:80 > | ---[ end trace 0000000000000000 ]--- > > I can reproduce that reliably with the below: > > | #include <linux/userfaultfd.h> > | #include <sys/ioctl.h> > | #include <sys/mman.h> > | #include <sys/syscall.h> > | #include <sys/types.h> > | #include <unistd.h> > | > | int main(int argc, char *argv[]) > | { > | int uffd; > | void *addr; > | > | struct uffdio_api uffdio_api; > | struct uffdio_register uffdio_register; > | > | uffd = syscall(__NR_userfaultfd, 0x801ul); > | > | uffdio_api.api = UFFD_API; > | uffdio_api.features = 0; > | ioctl(uffd, UFFDIO_API, &uffdio_api); > | > | addr = mmap(NULL, 0x1000000ul, PROT_READ | PROT_WRITE, > | MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); > | > | uffdio_register.range.start = (unsigned long)addr + 0x10000; > | uffdio_register.range.len = 0x2000; > | uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING; > | ioctl(uffd, UFFDIO_REGISTER, &uffdio_register); > | > | return 0; > | } > > ... which is cleaned up from the orginial Syzkaller reproducer: > > | Syzkaller reproducer: > | # {Threaded:false Repeat:false RepeatTimes:0 Procs:1 Slowdown:1 Sandbox: SandboxArg:0 Leak:false NetInjection:false NetDevices:false NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false IEEE802154:false Sysctl:false UseTmpDir:false HandleSegv:false Repro:false Trace:false LegacyOptions:{Collide:false Fault:false FaultCall:0 FaultNth:0}} > | r0 = userfaultfd(0x801) > | r1 = dup(r0) > | ioctl$UFFDIO_API(r1, 0xc018aa3f, &(0x7f0000000000)) > | ioctl$UFFDIO_REGISTER(r1, 0xc020aa00, &(0x7f00000001c0)={{&(0x7f0000ffb000/0x2000)=nil, 0x2000}, 0x1}) > | > | > | C reproducer: > | // autogenerated by syzkaller (https://github.com/google/syzkaller) > | > | #define _GNU_SOURCE > | > | #include <endian.h> > | #include <stdint.h> > | #include <stdio.h> > | #include <stdlib.h> > | #include <string.h> > | #include <sys/syscall.h> > | #include <sys/types.h> > | #include <unistd.h> > | > | #ifndef __NR_dup > | #define __NR_dup 23 > | #endif > | #ifndef __NR_ioctl > | #define __NR_ioctl 29 > | #endif > | #ifndef __NR_mmap > | #define __NR_mmap 222 > | #endif > | #ifndef __NR_userfaultfd > | #define __NR_userfaultfd 282 > | #endif > | > | uint64_t r[2] = {0xffffffffffffffff, 0xffffffffffffffff}; > | > | int main(void) > | { > | syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul); > | syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul); > | syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul); > | intptr_t res = 0; > | res = syscall(__NR_userfaultfd, 0x801ul); > | if (res != -1) > | r[0] = res; > | res = syscall(__NR_dup, r[0]); > | if (res != -1) > | r[1] = res; > | *(uint64_t*)0x20000000 = 0xaa; > | *(uint64_t*)0x20000008 = 0; > | *(uint64_t*)0x20000010 = 0; > | syscall(__NR_ioctl, r[1], 0xc018aa3f, 0x20000000ul); > | *(uint64_t*)0x200001c0 = 0x20ffb000; > | *(uint64_t*)0x200001c8 = 0x2000; > | *(uint64_t*)0x200001d0 = 1; > | *(uint64_t*)0x200001d8 = 0; > | syscall(__NR_ioctl, r[1], 0xc020aa00, 0x200001c0ul); > | return 0; > | } > > Thanks, > Mark.
On Wed, May 10, 2023 at 09:04:44AM -0700, Lorenzo Stoakes wrote: > On Wed, May 10, 2023 at 03:15:51PM +0100, Mark Rutland wrote: > > Hi, > > > > On Sun, Apr 30, 2023 at 09:19:17PM +0100, Lorenzo Stoakes wrote: > > > We may still have inconsistent input parameters even if we choose not to > > > merge and the vma_merge() invariant checks are useful for checking this > > > with no production runtime cost (these are only relevant when > > > CONFIG_DEBUG_VM is specified). > > > > > > Therefore, perform these checks regardless of whether we merge. > > > > > > This is relevant, as a recent issue (addressed in commit "mm/mempolicy: > > > Correctly update prev when policy is equal on mbind") in the mbind logic > > > was only picked up in the 6.2.y stable branch where these assertions are > > > performed prior to determining mergeability. > > > > > > Had this remained the same in mainline this issue may have been picked up > > > faster, so moving forward let's always check them. > > > > > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> > > > --- > > > mm/mmap.c | 10 +++++----- > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > index 5522130ae606..13678edaa22c 100644 > > > --- a/mm/mmap.c > > > +++ b/mm/mmap.c > > > @@ -960,17 +960,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > > > merge_next = true; > > > } > > > > > > + /* Verify some invariant that must be enforced by the caller. */ > > > + VM_WARN_ON(prev && addr <= prev->vm_start); > > > + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > > > + VM_WARN_ON(addr >= end); > > > + > > > > I'm seeing this fire a lot when fuzzing v6.4-rc1 on arm64 using Syzkaller. > > > > Thanks, from the line I suspect addr != curr->vm_start, but need to look > into the repro, at lsf/mm so a bit time lagged :) No problem; FWIW I can confirm your theory, the reproducer is causing: addr > curr->vm_start ... confirmed the the following hack, log below. | diff --git a/mm/mmap.c b/mm/mmap.c | index 13678edaa22c..2cdebba15719 100644 | --- a/mm/mmap.c | +++ b/mm/mmap.c | @@ -961,9 +961,21 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, | } | | /* Verify some invariant that must be enforced by the caller. */ | - VM_WARN_ON(prev && addr <= prev->vm_start); | - VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); | - VM_WARN_ON(addr >= end); | + VM_WARN(prev && addr <= prev->vm_start, | + "addr = 0x%016lx, prev->vm_start = 0x%016lx\n", | + addr, prev->vm_start); | + | + VM_WARN(curr && addr != curr->vm_start, | + "addr = 0x%016lx, curr->vm_start = 0x%016lx\n", | + addr, curr->vm_start); | + | + VM_WARN(curr && addr > curr->vm_end, | + "addr = 0x%016lx, curr->vm_end = 0x%016lx\n", | + addr, curr->vm_end); | + | + VM_WARN(addr >= end, | + "addr = 0x%016lx, end = 0x%016lx\n", | + addr, end); | | if (!merge_prev && !merge_next) | return NULL; /* Not mergeable. */ ... with that applied, running the reproducer results in: | addr = 0x0000ffff99dc2000, curr->vm_start = 0x0000ffff99db2000 | WARNING: CPU: 0 PID: 163 at mm/mmap.c:968 vma_merge+0x3d4/0x1260 ... i.e. addr > curr->vm_start Thanks, Mark.
On Wed, May 10, 2023 at 05:17:49PM +0100, Mark Rutland wrote: > On Wed, May 10, 2023 at 09:04:44AM -0700, Lorenzo Stoakes wrote: > > On Wed, May 10, 2023 at 03:15:51PM +0100, Mark Rutland wrote: > > > Hi, > > > > > > On Sun, Apr 30, 2023 at 09:19:17PM +0100, Lorenzo Stoakes wrote: > > > > We may still have inconsistent input parameters even if we choose not to > > > > merge and the vma_merge() invariant checks are useful for checking this > > > > with no production runtime cost (these are only relevant when > > > > CONFIG_DEBUG_VM is specified). > > > > > > > > Therefore, perform these checks regardless of whether we merge. > > > > > > > > This is relevant, as a recent issue (addressed in commit "mm/mempolicy: > > > > Correctly update prev when policy is equal on mbind") in the mbind logic > > > > was only picked up in the 6.2.y stable branch where these assertions are > > > > performed prior to determining mergeability. > > > > > > > > Had this remained the same in mainline this issue may have been picked up > > > > faster, so moving forward let's always check them. > > > > > > > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> > > > > --- > > > > mm/mmap.c | 10 +++++----- > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > > index 5522130ae606..13678edaa22c 100644 > > > > --- a/mm/mmap.c > > > > +++ b/mm/mmap.c > > > > @@ -960,17 +960,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > > > > merge_next = true; > > > > } > > > > > > > > + /* Verify some invariant that must be enforced by the caller. */ > > > > + VM_WARN_ON(prev && addr <= prev->vm_start); > > > > + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > > > > + VM_WARN_ON(addr >= end); > > > > + > > > > > > I'm seeing this fire a lot when fuzzing v6.4-rc1 on arm64 using Syzkaller. > > > > > > > Thanks, from the line I suspect addr != curr->vm_start, but need to look > > into the repro, at lsf/mm so a bit time lagged :) > > No problem; FWIW I can confirm your theory, the reproducer is causing: > > addr > curr->vm_start > > ... confirmed the the following hack, log below. Awesome thanks for that! Just been firing up qemu to do this. Cases 5-8 should really have addr == curr->vm_start, I wonder if it's another case but curr is being set incorrectly, it should in theory not be the case. (See [1] for a visualisation of merge cases as a handy reference) Of course userfaultfd might be the offender here and might be relying on no merge case arising but passing dodgy parameters. [1]:https://ljs.io/merge_cases.png > > | diff --git a/mm/mmap.c b/mm/mmap.c > | index 13678edaa22c..2cdebba15719 100644 > | --- a/mm/mmap.c > | +++ b/mm/mmap.c > | @@ -961,9 +961,21 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > | } > | > | /* Verify some invariant that must be enforced by the caller. */ > | - VM_WARN_ON(prev && addr <= prev->vm_start); > | - VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > | - VM_WARN_ON(addr >= end); > | + VM_WARN(prev && addr <= prev->vm_start, > | + "addr = 0x%016lx, prev->vm_start = 0x%016lx\n", > | + addr, prev->vm_start); > | + > | + VM_WARN(curr && addr != curr->vm_start, > | + "addr = 0x%016lx, curr->vm_start = 0x%016lx\n", > | + addr, curr->vm_start); > | + > | + VM_WARN(curr && addr > curr->vm_end, > | + "addr = 0x%016lx, curr->vm_end = 0x%016lx\n", > | + addr, curr->vm_end); > | + > | + VM_WARN(addr >= end, > | + "addr = 0x%016lx, end = 0x%016lx\n", > | + addr, end); > | > | if (!merge_prev && !merge_next) > | return NULL; /* Not mergeable. */ > > ... with that applied, running the reproducer results in: > > | addr = 0x0000ffff99dc2000, curr->vm_start = 0x0000ffff99db2000 > | WARNING: CPU: 0 PID: 163 at mm/mmap.c:968 vma_merge+0x3d4/0x1260 > > ... i.e. addr > curr->vm_start > > Thanks, > Mark.
On Wed, May 10, 2023 at 09:26:10AM -0700, Lorenzo Stoakes wrote: > On Wed, May 10, 2023 at 05:17:49PM +0100, Mark Rutland wrote: > > On Wed, May 10, 2023 at 09:04:44AM -0700, Lorenzo Stoakes wrote: > > > On Wed, May 10, 2023 at 03:15:51PM +0100, Mark Rutland wrote: > > > > Hi, > > > > > > > > On Sun, Apr 30, 2023 at 09:19:17PM +0100, Lorenzo Stoakes wrote: > > > > > We may still have inconsistent input parameters even if we choose not to > > > > > merge and the vma_merge() invariant checks are useful for checking this > > > > > with no production runtime cost (these are only relevant when > > > > > CONFIG_DEBUG_VM is specified). > > > > > > > > > > Therefore, perform these checks regardless of whether we merge. > > > > > > > > > > This is relevant, as a recent issue (addressed in commit "mm/mempolicy: > > > > > Correctly update prev when policy is equal on mbind") in the mbind logic > > > > > was only picked up in the 6.2.y stable branch where these assertions are > > > > > performed prior to determining mergeability. > > > > > > > > > > Had this remained the same in mainline this issue may have been picked up > > > > > faster, so moving forward let's always check them. > > > > > > > > > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> > > > > > --- > > > > > mm/mmap.c | 10 +++++----- > > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > > > index 5522130ae606..13678edaa22c 100644 > > > > > --- a/mm/mmap.c > > > > > +++ b/mm/mmap.c > > > > > @@ -960,17 +960,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > > > > > merge_next = true; > > > > > } > > > > > > > > > > + /* Verify some invariant that must be enforced by the caller. */ > > > > > + VM_WARN_ON(prev && addr <= prev->vm_start); > > > > > + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > > > > > + VM_WARN_ON(addr >= end); > > > > > + > > > > > > > > I'm seeing this fire a lot when fuzzing v6.4-rc1 on arm64 using Syzkaller. > > > > > > > > > > Thanks, from the line I suspect addr != curr->vm_start, but need to look > > > into the repro, at lsf/mm so a bit time lagged :) > > > > No problem; FWIW I can confirm your theory, the reproducer is causing: > > > > addr > curr->vm_start > > > > ... confirmed the the following hack, log below. > > Awesome thanks for that! Just been firing up qemu to do this. > > Cases 5-8 should really have addr == curr->vm_start, I wonder if it's > another case but curr is being set incorrectly, it should in theory not be > the case. > > (See [1] for a visualisation of merge cases as a handy reference) > > Of course userfaultfd might be the offender here and might be relying on no > merge case arising but passing dodgy parameters. > > [1]:https://ljs.io/merge_cases.png > > > > > | diff --git a/mm/mmap.c b/mm/mmap.c > > | index 13678edaa22c..2cdebba15719 100644 > > | --- a/mm/mmap.c > > | +++ b/mm/mmap.c > > | @@ -961,9 +961,21 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > > | } > > | > > | /* Verify some invariant that must be enforced by the caller. */ > > | - VM_WARN_ON(prev && addr <= prev->vm_start); > > | - VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > > | - VM_WARN_ON(addr >= end); > > | + VM_WARN(prev && addr <= prev->vm_start, > > | + "addr = 0x%016lx, prev->vm_start = 0x%016lx\n", > > | + addr, prev->vm_start); > > | + > > | + VM_WARN(curr && addr != curr->vm_start, > > | + "addr = 0x%016lx, curr->vm_start = 0x%016lx\n", > > | + addr, curr->vm_start); > > | + > > | + VM_WARN(curr && addr > curr->vm_end, > > | + "addr = 0x%016lx, curr->vm_end = 0x%016lx\n", > > | + addr, curr->vm_end); > > | + > > | + VM_WARN(addr >= end, > > | + "addr = 0x%016lx, end = 0x%016lx\n", > > | + addr, end); > > | > > | if (!merge_prev && !merge_next) > > | return NULL; /* Not mergeable. */ > > > > ... with that applied, running the reproducer results in: > > > > | addr = 0x0000ffff99dc2000, curr->vm_start = 0x0000ffff99db2000 > > | WARNING: CPU: 0 PID: 163 at mm/mmap.c:968 vma_merge+0x3d4/0x1260 > > > > ... i.e. addr > curr->vm_start > > > > Thanks, > > Mark. It looks like userfaultfd_register() is indeed using vma_merge() to determine whether potentially broken input is mergeable, e.g.:- if (vma->vm_start < start) { ret = split_vma(&vmi, vma, start, 1); if (ret) break; } So it is proactively using this to determine whether the VMA needs splitting which is... interesting :) I need to look at this in detail, but I do suspect uffd could do this check _before_ the vma_merge() (I'd need to double-check this though), or perhaps vma_merge could un-warn it. What's worrying here is that having addr > curr->vm_start but end == next->vm_start could result in a broken, overlapped merged. In any case this needs some post-jet lag analysis (happy for you or Liam or whoever to step in too if you fancy ;) THe most proximal solution is to just drop the addr != curr->vm_start check or at least return NULL if that check is failed, but I think it'd be better to take a step back and examing the uffd code in detail because the approach seems like it might have a broken edge case. I think this check has more so brought out an issue rather than caused one, but until I can give a post-lsf/mm, post-jet lag analysis I can't be conclusively coherent on this. Instinct says we should probably change uffd but might be wrong!
(adding Peter) On Wed, May 10, 2023 at 09:26:10AM -0700, Lorenzo Stoakes wrote: > On Wed, May 10, 2023 at 05:17:49PM +0100, Mark Rutland wrote: > > On Wed, May 10, 2023 at 09:04:44AM -0700, Lorenzo Stoakes wrote: > > > On Wed, May 10, 2023 at 03:15:51PM +0100, Mark Rutland wrote: > > > > Hi, > > > > > > > > On Sun, Apr 30, 2023 at 09:19:17PM +0100, Lorenzo Stoakes wrote: > > > > > We may still have inconsistent input parameters even if we choose not to > > > > > merge and the vma_merge() invariant checks are useful for checking this > > > > > with no production runtime cost (these are only relevant when > > > > > CONFIG_DEBUG_VM is specified). > > > > > > > > > > Therefore, perform these checks regardless of whether we merge. > > > > > > > > > > This is relevant, as a recent issue (addressed in commit "mm/mempolicy: > > > > > Correctly update prev when policy is equal on mbind") in the mbind logic > > > > > was only picked up in the 6.2.y stable branch where these assertions are > > > > > performed prior to determining mergeability. > > > > > > > > > > Had this remained the same in mainline this issue may have been picked up > > > > > faster, so moving forward let's always check them. > > > > > > > > > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> > > > > > --- > > > > > mm/mmap.c | 10 +++++----- > > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > > > index 5522130ae606..13678edaa22c 100644 > > > > > --- a/mm/mmap.c > > > > > +++ b/mm/mmap.c > > > > > @@ -960,17 +960,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > > > > > merge_next = true; > > > > > } > > > > > > > > > > + /* Verify some invariant that must be enforced by the caller. */ > > > > > + VM_WARN_ON(prev && addr <= prev->vm_start); > > > > > + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > > > > > + VM_WARN_ON(addr >= end); > > > > > + > > > > > > > > I'm seeing this fire a lot when fuzzing v6.4-rc1 on arm64 using Syzkaller. > > > > > > > > > > Thanks, from the line I suspect addr != curr->vm_start, but need to look > > > into the repro, at lsf/mm so a bit time lagged :) > > > > No problem; FWIW I can confirm your theory, the reproducer is causing: > > > > addr > curr->vm_start > > > > ... confirmed the the following hack, log below. > > Awesome thanks for that! Just been firing up qemu to do this. > > Cases 5-8 should really have addr == curr->vm_start, I wonder if it's > another case but curr is being set incorrectly, it should in theory not be > the case. AFAIU, it's a case of "adjust vma, but don't merge, because prev is not compatible". Looks like uffd first attempts to merge compatible the newly registered range with adjacent vmas relying on that there won't be no merge when addr != curr->vm_start and only after the merge attempt it splits the edges. I think that moving the split in fs/userfaultfd.c:1495 (as of v6.4-rc1) before vma_merge() will be the right fix. > (See [1] for a visualisation of merge cases as a handy reference) > > Of course userfaultfd might be the offender here and might be relying on no > merge case arising but passing dodgy parameters. > > [1]:https://ljs.io/merge_cases.png You really should put it into Documentation/mm ;-) > > > > | diff --git a/mm/mmap.c b/mm/mmap.c > > | index 13678edaa22c..2cdebba15719 100644 > > | --- a/mm/mmap.c > > | +++ b/mm/mmap.c > > | @@ -961,9 +961,21 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > > | } > > | > > | /* Verify some invariant that must be enforced by the caller. */ > > | - VM_WARN_ON(prev && addr <= prev->vm_start); > > | - VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > > | - VM_WARN_ON(addr >= end); > > | + VM_WARN(prev && addr <= prev->vm_start, > > | + "addr = 0x%016lx, prev->vm_start = 0x%016lx\n", > > | + addr, prev->vm_start); > > | + > > | + VM_WARN(curr && addr != curr->vm_start, > > | + "addr = 0x%016lx, curr->vm_start = 0x%016lx\n", > > | + addr, curr->vm_start); > > | + > > | + VM_WARN(curr && addr > curr->vm_end, > > | + "addr = 0x%016lx, curr->vm_end = 0x%016lx\n", > > | + addr, curr->vm_end); > > | + > > | + VM_WARN(addr >= end, > > | + "addr = 0x%016lx, end = 0x%016lx\n", > > | + addr, end); > > | > > | if (!merge_prev && !merge_next) > > | return NULL; /* Not mergeable. */ > > > > ... with that applied, running the reproducer results in: > > > > | addr = 0x0000ffff99dc2000, curr->vm_start = 0x0000ffff99db2000 > > | WARNING: CPU: 0 PID: 163 at mm/mmap.c:968 vma_merge+0x3d4/0x1260 > > > > ... i.e. addr > curr->vm_start > > > > Thanks, > > Mark. >
On Thu, May 11, 2023 at 11:08:41AM -0700, Mike Rapoport wrote: > (adding Peter) > > On Wed, May 10, 2023 at 09:26:10AM -0700, Lorenzo Stoakes wrote: > > On Wed, May 10, 2023 at 05:17:49PM +0100, Mark Rutland wrote: > > > On Wed, May 10, 2023 at 09:04:44AM -0700, Lorenzo Stoakes wrote: > > > > On Wed, May 10, 2023 at 03:15:51PM +0100, Mark Rutland wrote: > > > > > Hi, > > > > > > > > > > On Sun, Apr 30, 2023 at 09:19:17PM +0100, Lorenzo Stoakes wrote: > > > > > > We may still have inconsistent input parameters even if we choose not to > > > > > > merge and the vma_merge() invariant checks are useful for checking this > > > > > > with no production runtime cost (these are only relevant when > > > > > > CONFIG_DEBUG_VM is specified). > > > > > > > > > > > > Therefore, perform these checks regardless of whether we merge. > > > > > > > > > > > > This is relevant, as a recent issue (addressed in commit "mm/mempolicy: > > > > > > Correctly update prev when policy is equal on mbind") in the mbind logic > > > > > > was only picked up in the 6.2.y stable branch where these assertions are > > > > > > performed prior to determining mergeability. > > > > > > > > > > > > Had this remained the same in mainline this issue may have been picked up > > > > > > faster, so moving forward let's always check them. > > > > > > > > > > > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> > > > > > > --- > > > > > > mm/mmap.c | 10 +++++----- > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > > > > index 5522130ae606..13678edaa22c 100644 > > > > > > --- a/mm/mmap.c > > > > > > +++ b/mm/mmap.c > > > > > > @@ -960,17 +960,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > > > > > > merge_next = true; > > > > > > } > > > > > > > > > > > > + /* Verify some invariant that must be enforced by the caller. */ > > > > > > + VM_WARN_ON(prev && addr <= prev->vm_start); > > > > > > + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > > > > > > + VM_WARN_ON(addr >= end); > > > > > > + > > > > > > > > > > I'm seeing this fire a lot when fuzzing v6.4-rc1 on arm64 using Syzkaller. > > > > > > > > > > > > > Thanks, from the line I suspect addr != curr->vm_start, but need to look > > > > into the repro, at lsf/mm so a bit time lagged :) > > > > > > No problem; FWIW I can confirm your theory, the reproducer is causing: > > > > > > addr > curr->vm_start > > > > > > ... confirmed the the following hack, log below. > > > > Awesome thanks for that! Just been firing up qemu to do this. > > > > Cases 5-8 should really have addr == curr->vm_start, I wonder if it's > > another case but curr is being set incorrectly, it should in theory not be > > the case. > > AFAIU, it's a case of "adjust vma, but don't merge, because prev is not > compatible". Looks like uffd first attempts to merge compatible the newly > registered range with adjacent vmas relying on that there won't be no merge > when addr != curr->vm_start and only after the merge attempt it splits the > edges. > > I think that moving the split in fs/userfaultfd.c:1495 (as of v6.4-rc1) > before vma_merge() will be the right fix. > Ack this was my strong suspicion, just want to get back to the UK and de-lagged before I dig in properly. > > (See [1] for a visualisation of merge cases as a handy reference) > > > > Of course userfaultfd might be the offender here and might be relying on no > > merge case arising but passing dodgy parameters. > > > > [1]:https://ljs.io/merge_cases.png > > You really should put it into Documentation/mm ;-) Well... will reply to your lsf/mm docs thread on this topic ;) > > > > > > > | diff --git a/mm/mmap.c b/mm/mmap.c > > > | index 13678edaa22c..2cdebba15719 100644 > > > | --- a/mm/mmap.c > > > | +++ b/mm/mmap.c > > > | @@ -961,9 +961,21 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > > > | } > > > | > > > | /* Verify some invariant that must be enforced by the caller. */ > > > | - VM_WARN_ON(prev && addr <= prev->vm_start); > > > | - VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > > > | - VM_WARN_ON(addr >= end); > > > | + VM_WARN(prev && addr <= prev->vm_start, > > > | + "addr = 0x%016lx, prev->vm_start = 0x%016lx\n", > > > | + addr, prev->vm_start); > > > | + > > > | + VM_WARN(curr && addr != curr->vm_start, > > > | + "addr = 0x%016lx, curr->vm_start = 0x%016lx\n", > > > | + addr, curr->vm_start); > > > | + > > > | + VM_WARN(curr && addr > curr->vm_end, > > > | + "addr = 0x%016lx, curr->vm_end = 0x%016lx\n", > > > | + addr, curr->vm_end); > > > | + > > > | + VM_WARN(addr >= end, > > > | + "addr = 0x%016lx, end = 0x%016lx\n", > > > | + addr, end); > > > | > > > | if (!merge_prev && !merge_next) > > > | return NULL; /* Not mergeable. */ > > > > > > ... with that applied, running the reproducer results in: > > > > > > | addr = 0x0000ffff99dc2000, curr->vm_start = 0x0000ffff99db2000 > > > | WARNING: CPU: 0 PID: 163 at mm/mmap.c:968 vma_merge+0x3d4/0x1260 > > > > > > ... i.e. addr > curr->vm_start > > > > > > Thanks, > > > Mark. > >
On Thu, May 11, 2023 at 11:08:41AM -0700, Mike Rapoport wrote: > (adding Peter) > > On Wed, May 10, 2023 at 09:26:10AM -0700, Lorenzo Stoakes wrote: > > On Wed, May 10, 2023 at 05:17:49PM +0100, Mark Rutland wrote: > > > On Wed, May 10, 2023 at 09:04:44AM -0700, Lorenzo Stoakes wrote: > > > > On Wed, May 10, 2023 at 03:15:51PM +0100, Mark Rutland wrote: > > > > > Hi, > > > > > > > > > > On Sun, Apr 30, 2023 at 09:19:17PM +0100, Lorenzo Stoakes wrote: > > > > > > We may still have inconsistent input parameters even if we choose not to > > > > > > merge and the vma_merge() invariant checks are useful for checking this > > > > > > with no production runtime cost (these are only relevant when > > > > > > CONFIG_DEBUG_VM is specified). > > > > > > > > > > > > Therefore, perform these checks regardless of whether we merge. > > > > > > > > > > > > This is relevant, as a recent issue (addressed in commit "mm/mempolicy: > > > > > > Correctly update prev when policy is equal on mbind") in the mbind logic > > > > > > was only picked up in the 6.2.y stable branch where these assertions are > > > > > > performed prior to determining mergeability. > > > > > > > > > > > > Had this remained the same in mainline this issue may have been picked up > > > > > > faster, so moving forward let's always check them. > > > > > > > > > > > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> > > > > > > --- > > > > > > mm/mmap.c | 10 +++++----- > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > > > > index 5522130ae606..13678edaa22c 100644 > > > > > > --- a/mm/mmap.c > > > > > > +++ b/mm/mmap.c > > > > > > @@ -960,17 +960,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, > > > > > > merge_next = true; > > > > > > } > > > > > > > > > > > > + /* Verify some invariant that must be enforced by the caller. */ > > > > > > + VM_WARN_ON(prev && addr <= prev->vm_start); > > > > > > + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); > > > > > > + VM_WARN_ON(addr >= end); > > > > > > + > > > > > > > > > > I'm seeing this fire a lot when fuzzing v6.4-rc1 on arm64 using Syzkaller. > > > > > > > > > > > > > Thanks, from the line I suspect addr != curr->vm_start, but need to look > > > > into the repro, at lsf/mm so a bit time lagged :) > > > > > > No problem; FWIW I can confirm your theory, the reproducer is causing: > > > > > > addr > curr->vm_start > > > > > > ... confirmed the the following hack, log below. > > > > Awesome thanks for that! Just been firing up qemu to do this. > > > > Cases 5-8 should really have addr == curr->vm_start, I wonder if it's > > another case but curr is being set incorrectly, it should in theory not be > > the case. > > AFAIU, it's a case of "adjust vma, but don't merge, because prev is not > compatible". Looks like uffd first attempts to merge compatible the newly > registered range with adjacent vmas relying on that there won't be no merge > when addr != curr->vm_start and only after the merge attempt it splits the > edges. > > I think that moving the split in fs/userfaultfd.c:1495 (as of v6.4-rc1) > before vma_merge() will be the right fix. > Indeed it was, patch at https://lore.kernel.org/all/20230514172731.134188-1-lstoakes@gmail.com/ [snip]
diff --git a/mm/mmap.c b/mm/mmap.c index 5522130ae606..13678edaa22c 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -960,17 +960,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, merge_next = true; } + /* Verify some invariant that must be enforced by the caller. */ + VM_WARN_ON(prev && addr <= prev->vm_start); + VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); + VM_WARN_ON(addr >= end); + if (!merge_prev && !merge_next) return NULL; /* Not mergeable. */ res = vma = prev; remove = remove2 = adjust = NULL; - /* Verify some invariant that must be enforced by the caller. */ - VM_WARN_ON(prev && addr <= prev->vm_start); - VM_WARN_ON(curr && (addr != curr->vm_start || end > curr->vm_end)); - VM_WARN_ON(addr >= end); - /* Can we merge both the predecessor and the successor? */ if (merge_prev && merge_next && is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) {