Message ID | 20230129082856.22113-1-tiwai@suse.de |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp1664824wrn; Sun, 29 Jan 2023 00:43:01 -0800 (PST) X-Google-Smtp-Source: AK7set/DxnjZzuWhSrn1FR+BEKozQq+ZysQrsCtENz6rL+/CKpyX4lCo1MuAvM+wPHMEcvyUS/fl X-Received: by 2002:a17:90b:4f46:b0:22c:1211:89c0 with SMTP id pj6-20020a17090b4f4600b0022c121189c0mr15593032pjb.30.1674981781386; Sun, 29 Jan 2023 00:43:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674981781; cv=none; d=google.com; s=arc-20160816; b=bUuHcA9j30FuV7Vrz0KsVUX4dRb/nEEny9oXdrKsfDmRCZ/1D8AZkWUZb//Ze2X1jT 3x8N5FqIvtnVGi20z/2KFNsj8tjcl4ilnwCFUO9FU+TOT9UtKL3tY3P/Ve8VqqxaF/Qn X6YfrdYpfNi5SDaEDddSbt52Owsoixrw8WQZCE0mFYvIXH3qJzD7AvX5oedkFm++E8kE /3DugvoR42qvY2ltY+CWzwhDWO4RT7gMUi/9pLZPIx+7TAHvicGnfkUhrEMVfTx+OMsM 3Br4SpBzGQKNYwNEys0DLm9PbxzwKKgQpqjnNtMyw/JI2Phz8S5L6ve3+3ZiD5xwNIRY vsgg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature:dkim-signature; bh=L4YFbahCifwwNGpSJHPlW/1QYyuaOZX6D/eqp4s/nSE=; b=nw2c5VIvzw3eRS+lErr2AAt2OQeh64K4oyoABAwmegD+0ciImF0NS4GshqSwPTmFC9 K8xRclo2GhUBa2czuYdSJHipN89iez9E6CAniHXxBxRDIhFlFnSy2k6CgkUb8czadW/Y pXrQ8zK9YpgPofWcj45zBKXzdzXiOxFjIq5id4BhW2PMXMUZYPCOjR6oPTa5wmUXvRM7 JJmC0aqTr346mysLHcFeYat1OfpdFzRsH5mKlGdMexEZRGQJjOztucxc8mOrBh97MXLx 9nNawWQV7xl/nXkdwd2UhQVueYdmL+2NUzLgW1qEswmQ3LC4CVZoVEbalf56xcltDDmb WZFQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.de header.s=susede2_rsa header.b=wG9LqxUv; dkim=neutral (no key) header.i=@suse.de header.s=susede2_ed25519 header.b=aJBBaFkc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=suse.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n28-20020a638f1c000000b0049f478df3b8si5991704pgd.702.2023.01.29.00.42.45; Sun, 29 Jan 2023 00:43:01 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.de header.s=susede2_rsa header.b=wG9LqxUv; dkim=neutral (no key) header.i=@suse.de header.s=susede2_ed25519 header.b=aJBBaFkc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=suse.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230223AbjA2I3F (ORCPT <rfc822;n2h9z4@gmail.com> + 99 others); Sun, 29 Jan 2023 03:29:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34416 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229519AbjA2I3E (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sun, 29 Jan 2023 03:29:04 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 368AE21A36; Sun, 29 Jan 2023 00:29:03 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id C930D1FF23; Sun, 29 Jan 2023 08:29:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1674980941; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding; bh=L4YFbahCifwwNGpSJHPlW/1QYyuaOZX6D/eqp4s/nSE=; b=wG9LqxUvn0B/3NHh1jHP/DoBhhKWeNXg0z3EtQdDzFKIJ6udcKBLL9F453m2TvdUm5UkhA vi7asR++R59lfDNP1p7wEq2iqKQBETkedxDmFGhVdWeEKWbq3ORhRT0IZKbEwsAT936s3M k6lVJiF4qfSQe09Gqq7Y5amEmIs6NZA= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1674980941; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding; bh=L4YFbahCifwwNGpSJHPlW/1QYyuaOZX6D/eqp4s/nSE=; b=aJBBaFkcPkCwOvnJtKzzHHF0OO4DfuMnGxxAePMlyPH8O4ShKDis4hsX0+3HG3cknLFRgR mNdTtl6HwqapEhBw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id A34E813918; Sun, 29 Jan 2023 08:29:01 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id IV42J00u1mO3SQAAMHmgww (envelope-from <tiwai@suse.de>); Sun, 29 Jan 2023 08:29:01 +0000 From: Takashi Iwai <tiwai@suse.de> To: Helge Deller <deller@gmx.de> Cc: linux-fbdev@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, Patrik Jakobsson <pjakobsson@suse.de>, Thomas Zimmermann <tzimmermann@suse.de> Subject: [PATCH v2] fbdev: Fix invalid page access after closing deferred I/O devices Date: Sun, 29 Jan 2023 09:28:56 +0100 Message-Id: <20230129082856.22113-1-tiwai@suse.de> X-Mailer: git-send-email 2.35.3 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1756345696446968461?= X-GMAIL-MSGID: =?utf-8?q?1756345696446968461?= |
Series |
[v2] fbdev: Fix invalid page access after closing deferred I/O devices
|
|
Commit Message
Takashi Iwai
Jan. 29, 2023, 8:28 a.m. UTC
When a fbdev with deferred I/O is once opened and closed, the dirty pages still remain queued in the pageref list, and eventually later those may be processed in the delayed work. This may lead to a corruption of pages, hitting an Oops. This patch makes sure to cancel the delayed work and clean up the pageref list at closing the device for addressing the bug. A part of the cleanup code is factored out as a new helper function that is called from the common fb_release(). Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> --- v1->v2: Fix build error without CONFIG_FB_DEFERRED_IO drivers/video/fbdev/core/fb_defio.c | 10 +++++++++- drivers/video/fbdev/core/fbmem.c | 4 ++++ include/linux/fb.h | 1 + 3 files changed, 14 insertions(+), 1 deletion(-)
Comments
On Sun, 2023-01-29 at 09:28 +0100, Takashi Iwai wrote: > When a fbdev with deferred I/O is once opened and closed, the dirty > pages still remain queued in the pageref list, and eventually later > those may be processed in the delayed work. This may lead to a > corruption of pages, hitting an Oops. > > This patch makes sure to cancel the delayed work and clean up the > pageref list at closing the device for addressing the bug. A part of > the cleanup code is factored out as a new helper function that is > called from the common fb_release(). > > Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> > Cc: <stable@vger.kernel.org> > Signed-off-by: Takashi Iwai <tiwai@suse.de> > --- > v1->v2: Fix build error without CONFIG_FB_DEFERRED_IO > > drivers/video/fbdev/core/fb_defio.c | 10 +++++++++- > drivers/video/fbdev/core/fbmem.c | 4 ++++ > include/linux/fb.h | 1 + > 3 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/video/fbdev/core/fb_defio.c > b/drivers/video/fbdev/core/fb_defio.c > index c730253ab85c..583cbcf09446 100644 > --- a/drivers/video/fbdev/core/fb_defio.c > +++ b/drivers/video/fbdev/core/fb_defio.c > @@ -313,7 +313,7 @@ void fb_deferred_io_open(struct fb_info *info, > } > EXPORT_SYMBOL_GPL(fb_deferred_io_open); > > -void fb_deferred_io_cleanup(struct fb_info *info) > +void fb_deferred_io_release(struct fb_info *info) > { > struct fb_deferred_io *fbdefio = info->fbdefio; > struct page *page; > @@ -327,6 +327,14 @@ void fb_deferred_io_cleanup(struct fb_info > *info) > page = fb_deferred_io_page(info, i); > page->mapping = NULL; > } > +} > +EXPORT_SYMBOL_GPL(fb_deferred_io_release); > + > +void fb_deferred_io_cleanup(struct fb_info *info) > +{ > + struct fb_deferred_io *fbdefio = info->fbdefio; > + > + fb_deferred_io_release(info); > > kvfree(info->pagerefs); > mutex_destroy(&fbdefio->lock); > diff --git a/drivers/video/fbdev/core/fbmem.c > b/drivers/video/fbdev/core/fbmem.c > index 3a6c8458eb8d..ab3545a00abc 100644 > --- a/drivers/video/fbdev/core/fbmem.c > +++ b/drivers/video/fbdev/core/fbmem.c > @@ -1454,6 +1454,10 @@ __releases(&info->lock) > struct fb_info * const info = file->private_data; > > lock_fb_info(info); > +#if IS_ENABLED(CONFIG_FB_DEFERRED_IO) > + if (info->fbdefio) > + fb_deferred_io_release(info); > +#endif > if (info->fbops->fb_release) > info->fbops->fb_release(info,1); > module_put(info->fbops->owner); > diff --git a/include/linux/fb.h b/include/linux/fb.h > index 96b96323e9cb..73eb1f85ea8e 100644 > --- a/include/linux/fb.h > +++ b/include/linux/fb.h > @@ -662,6 +662,7 @@ extern int fb_deferred_io_init(struct fb_info > *info); > extern void fb_deferred_io_open(struct fb_info *info, > struct inode *inode, > struct file *file); > +extern void fb_deferred_io_release(struct fb_info *info); > extern void fb_deferred_io_cleanup(struct fb_info *info); > extern int fb_deferred_io_fsync(struct file *file, loff_t start, > loff_t end, int datasync); Tested-by: Miko Larsson <mikoxyzzz@gmail.com>
Hi Am 29.01.23 um 09:28 schrieb Takashi Iwai: > When a fbdev with deferred I/O is once opened and closed, the dirty > pages still remain queued in the pageref list, and eventually later > those may be processed in the delayed work. This may lead to a > corruption of pages, hitting an Oops. Do you have more information on this problem? The mmap'ed buffer of the fbdev device comes from a vmalloc call. That memory's location never changes; even across pairs of open/close on the device file. I'm surprised that a page entry becomes invalid. In drm_fbdev_cleanup(), we first remove the fbdefio at [1] and then vfree() the shadow buffer. So the memory should still be around until fbdevio is gone. [1] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_fb_helper.c#L2146 > > This patch makes sure to cancel the delayed work and clean up the > pageref list at closing the device for addressing the bug. A part of > the cleanup code is factored out as a new helper function that is > called from the common fb_release(). The delayed work is required to copy the framebuffer to the device output. So if it's just canceled, could this result in missing updates? There's a call to cancel_delayed_work_sync() in the new helper fb_deferred_io_release(). Is this the right function? Maybe flush_delayed_work() is a better choice. > > Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> > Cc: <stable@vger.kernel.org> > Signed-off-by: Takashi Iwai <tiwai@suse.de> This could use a Fixes tag. It's not exactly clear to me when this problem got originally introduced, but the recent refactoring seems a candidate. Fixes: 56c134f7f1b5 ("fbdev: Track deferred-I/O pages in pageref struct") Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Zack Rusin <zackr@vmware.com> Cc: VMware Graphics Reviewers <linux-graphics-maintainer@vmware.com> Cc: Jaya Kumar <jayalk@intworks.biz> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Dexuan Cui <decui@microsoft.com> Cc: Steve Glendinning <steve.glendinning@shawell.net> Cc: Bernie Thompson <bernie@plugable.com> Cc: Helge Deller <deller@gmx.de> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Stephen Kitt <steve@sk2.org> Cc: Peter Suti <peter.suti@streamunlimited.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: ye xingchen <ye.xingchen@zte.com.cn> Cc: Petr Mladek <pmladek@suse.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Tom Rix <trix@redhat.com> Cc: dri-devel@lists.freedesktop.org Cc: linux-fbdev@vger.kernel.org Cc: linux-hyperv@vger.kernel.org Cc: <stable@vger.kernel.org> # v5.19+ > --- > v1->v2: Fix build error without CONFIG_FB_DEFERRED_IO > > drivers/video/fbdev/core/fb_defio.c | 10 +++++++++- > drivers/video/fbdev/core/fbmem.c | 4 ++++ > include/linux/fb.h | 1 + > 3 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c > index c730253ab85c..583cbcf09446 100644 > --- a/drivers/video/fbdev/core/fb_defio.c > +++ b/drivers/video/fbdev/core/fb_defio.c > @@ -313,7 +313,7 @@ void fb_deferred_io_open(struct fb_info *info, > } > EXPORT_SYMBOL_GPL(fb_deferred_io_open); > > -void fb_deferred_io_cleanup(struct fb_info *info) > +void fb_deferred_io_release(struct fb_info *info) > { > struct fb_deferred_io *fbdefio = info->fbdefio; > struct page *page; > @@ -327,6 +327,14 @@ void fb_deferred_io_cleanup(struct fb_info *info) > page = fb_deferred_io_page(info, i); > page->mapping = NULL; > } > +} > +EXPORT_SYMBOL_GPL(fb_deferred_io_release); It's all in the same module. No need to export this symbol. Best regards Thomas > + > +void fb_deferred_io_cleanup(struct fb_info *info) > +{ > + struct fb_deferred_io *fbdefio = info->fbdefio; > + > + fb_deferred_io_release(info); > > kvfree(info->pagerefs); > mutex_destroy(&fbdefio->lock); > diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c > index 3a6c8458eb8d..ab3545a00abc 100644 > --- a/drivers/video/fbdev/core/fbmem.c > +++ b/drivers/video/fbdev/core/fbmem.c > @@ -1454,6 +1454,10 @@ __releases(&info->lock) > struct fb_info * const info = file->private_data; > > lock_fb_info(info); > +#if IS_ENABLED(CONFIG_FB_DEFERRED_IO) > + if (info->fbdefio) > + fb_deferred_io_release(info); > +#endif > if (info->fbops->fb_release) > info->fbops->fb_release(info,1); > module_put(info->fbops->owner); > diff --git a/include/linux/fb.h b/include/linux/fb.h > index 96b96323e9cb..73eb1f85ea8e 100644 > --- a/include/linux/fb.h > +++ b/include/linux/fb.h > @@ -662,6 +662,7 @@ extern int fb_deferred_io_init(struct fb_info *info); > extern void fb_deferred_io_open(struct fb_info *info, > struct inode *inode, > struct file *file); > +extern void fb_deferred_io_release(struct fb_info *info); > extern void fb_deferred_io_cleanup(struct fb_info *info); > extern int fb_deferred_io_fsync(struct file *file, loff_t start, > loff_t end, int datasync); -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev
On Mon, 30 Jan 2023 09:28:36 +0100, Thomas Zimmermann wrote: > > Hi > > Am 29.01.23 um 09:28 schrieb Takashi Iwai: > > When a fbdev with deferred I/O is once opened and closed, the dirty > > pages still remain queued in the pageref list, and eventually later > > those may be processed in the delayed work. This may lead to a > > corruption of pages, hitting an Oops. > > Do you have more information on this problem? The details are in SUSE bugzilla, but that's an internal bug entry (and you know the number :) It happens at the following at least: - A VM is started with VGA console, no fb, on the installer - VM is switched to bochs drm - Start fbiterm on VT1, switching to the graphics mode on VT - Exit fbiterm, going back to the text mode on VT; at this moment, it gets Oops like: [ 42.338319][ T122] BUG: unable to handle page fault for address: ffffe570c1000030 [ 42.340063][ T122] #PF: supervisor read access in kernel mode [ 42.340519][ T122] #PF: error_code(0x0000) - not-present page [ 42.340979][ T122] PGD 34c38067 P4D 34c38067 PUD 34c37067 PMD 0 [ 42.341456][ T122] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 42.341853][ T122] CPU: 1 PID: 122 Comm: kworker/1:2 Not tainted 5.14.21-150500.5.g2ad24ee-default #1 SLE15-SP5 (unreleased) b7a28d028376a517e888a7ff28c5e5dede93267c [ 42.343000][ T122] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014 [ 42.343929][ T122] Workqueue: events fb_deferred_io_work [ 42.344355][ T122] RIP: 0010:page_mapped+0x5e/0x90 [ 42.344743][ T122] Code: a8 01 75 d7 8b 47 30 f7 d0 c1 e8 1f c3 cc cc cc cc 48 89 df e8 33 9c 05 00 89 c1 31 c0 85 c9 74 13 eb d3 48 c1 e2 06 48 01 da <8b> 42 30 85 c0 79 c0 83 c1 01 48 8b 33 48 63 d1 b8 01 00 00 00 f7 [ 42.346285][ T122] RSP: 0018:ffffb68640207e08 EFLAGS: 00010286 [ 42.346749][ T122] RAX: 00000000b3aea8f0 RBX: ffffe570c0f00000 RCX: 0000000000004000 [ 42.347355][ T122] RDX: ffffe570c1000000 RSI: 000fffffc0010009 RDI: ffffe570c0f00000 [ 42.347960][ T122] RBP: ffffffffc0503050 R08: 0000000000000000 R09: 0000000000000001 [ 42.348568][ T122] R10: 0000000000000000 R11: ffffb68640207c88 R12: ffffffffc0503020 [ 42.349180][ T122] R13: ffff921281dcdc00 R14: ffff9212bcf08000 R15: ffffe570c0f00000 [ 42.349789][ T122] FS: 0000000000000000(0000) GS:ffff9212b3b00000(0000) knlGS:0000000000000000 [ 42.350471][ T122] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 42.350975][ T122] CR2: ffffe570c1000030 CR3: 000000001b810000 CR4: 00000000000006e0 [ 42.351588][ T122] Call Trace: [ 42.351845][ T122] <TASK> [ 42.352069][ T122] page_mkclean+0x6e/0xc0 [ 42.352400][ T122] ? page_referenced_one+0x190/0x190 [ 42.353714][ T122] ? pmdp_collapse_flush+0x60/0x60 [ 42.354106][ T122] fb_deferred_io_work+0x13d/0x190 [ 42.354496][ T122] process_one_work+0x267/0x440 [ 42.354866][ T122] ? process_one_work+0x440/0x440 [ 42.355247][ T122] worker_thread+0x2d/0x3d0 [ 42.355590][ T122] ? process_one_work+0x440/0x440 [ 42.355972][ T122] kthread+0x156/0x180 [ 42.356281][ T122] ? set_kthread_struct+0x50/0x50 [ 42.356662][ T122] ret_from_fork+0x22/0x30 [ 42.357006][ T122] </TASK> The page info shows that it's a compound page but it's somehow broken. On VM, it's triggered reliably with the scenario above, always at the same position. FWIW, the Oops is hit even if there is no rewrite on the screen. That is, another procedure is: - Start VM, run fbiterm on VT1 - Switch to VT2, text mode - On VT2, kill fbiterm; the crash still happens even if no screen change is performed > The mmap'ed buffer of the fbdev device comes from a vmalloc call. That > memory's location never changes; even across pairs of open/close on > the device file. I'm surprised that a page entry becomes invalid. > > In drm_fbdev_cleanup(), we first remove the fbdefio at [1] and then > vfree() the shadow buffer. So the memory should still be around until > fbdevio is gone. Yes, that's the puzzling part, too. Also, another thing is that the bug couldn't be triggered easily when the fb is started in a different way. e.g. when you run fbiterm & exit on the VM that had efifb, it didn't hit. So, overall, it might be that I'm scratching a wrong surface. But at least it "fixes" the problem above apparently, and the deferred io base code itself has certainly the potential problem in general as my patch suggests. > [1] > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_fb_helper.c#L2146 > > > > > This patch makes sure to cancel the delayed work and clean up the > > pageref list at closing the device for addressing the bug. A part of > > the cleanup code is factored out as a new helper function that is > > called from the common fb_release(). > > The delayed work is required to copy the framebuffer to the device > output. So if it's just canceled, could this result in missing > updates? > > There's a call to cancel_delayed_work_sync() in the new helper > fb_deferred_io_release(). Is this the right function? Maybe > flush_delayed_work() is a better choice. I thought of that, but took a shorter path. OK, let's check whether this keeps working with that change. > > Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> > > Cc: <stable@vger.kernel.org> > > Signed-off-by: Takashi Iwai <tiwai@suse.de> > > This could use a Fixes tag. It's not exactly clear to me when this > problem got originally introduced, but the recent refactoring seems a > candidate. > > Fixes: 56c134f7f1b5 ("fbdev: Track deferred-I/O pages in pageref struct") Hrm, this might be. Maybe Patrik can test with the revert of this? > Cc: Thomas Zimmermann <tzimmermann@suse.de> > Cc: Javier Martinez Canillas <javierm@redhat.com> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > Cc: Maxime Ripard <mripard@kernel.org> > Cc: Zack Rusin <zackr@vmware.com> > Cc: VMware Graphics Reviewers <linux-graphics-maintainer@vmware.com> > Cc: Jaya Kumar <jayalk@intworks.biz> > Cc: Daniel Vetter <daniel@ffwll.ch> > Cc: "K. Y. Srinivasan" <kys@microsoft.com> > Cc: Haiyang Zhang <haiyangz@microsoft.com> > Cc: Wei Liu <wei.liu@kernel.org> > Cc: Dexuan Cui <decui@microsoft.com> > Cc: Steve Glendinning <steve.glendinning@shawell.net> > Cc: Bernie Thompson <bernie@plugable.com> > Cc: Helge Deller <deller@gmx.de> > Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: Stephen Kitt <steve@sk2.org> > Cc: Peter Suti <peter.suti@streamunlimited.com> > Cc: Sam Ravnborg <sam@ravnborg.org> > Cc: Geert Uytterhoeven <geert+renesas@glider.be> > Cc: ye xingchen <ye.xingchen@zte.com.cn> > Cc: Petr Mladek <pmladek@suse.com> > Cc: John Ogness <john.ogness@linutronix.de> > Cc: Tom Rix <trix@redhat.com> > Cc: dri-devel@lists.freedesktop.org > Cc: linux-fbdev@vger.kernel.org > Cc: linux-hyperv@vger.kernel.org > Cc: <stable@vger.kernel.org> # v5.19+ Nah, please don't. Too many Cc's, literally a spam. > > --- > > v1->v2: Fix build error without CONFIG_FB_DEFERRED_IO > > > > drivers/video/fbdev/core/fb_defio.c | 10 +++++++++- > > drivers/video/fbdev/core/fbmem.c | 4 ++++ > > include/linux/fb.h | 1 + > > 3 files changed, 14 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c > > index c730253ab85c..583cbcf09446 100644 > > --- a/drivers/video/fbdev/core/fb_defio.c > > +++ b/drivers/video/fbdev/core/fb_defio.c > > @@ -313,7 +313,7 @@ void fb_deferred_io_open(struct fb_info *info, > > } > > EXPORT_SYMBOL_GPL(fb_deferred_io_open); > > -void fb_deferred_io_cleanup(struct fb_info *info) > > +void fb_deferred_io_release(struct fb_info *info) > > { > > struct fb_deferred_io *fbdefio = info->fbdefio; > > struct page *page; > > @@ -327,6 +327,14 @@ void fb_deferred_io_cleanup(struct fb_info *info) > > page = fb_deferred_io_page(info, i); > > page->mapping = NULL; > > } > > +} > > +EXPORT_SYMBOL_GPL(fb_deferred_io_release); > > It's all in the same module. No need to export this symbol. I noticed it, too, but just keep the same style as other functions :) That said, the other exported symbols are also useless. I can prepare another patch to clean it up. thanks, Takashi
Hi Am 30.01.23 um 09:52 schrieb Takashi Iwai: > On Mon, 30 Jan 2023 09:28:36 +0100, > Thomas Zimmermann wrote: >> >> Hi >> >> Am 29.01.23 um 09:28 schrieb Takashi Iwai: >>> When a fbdev with deferred I/O is once opened and closed, the dirty >>> pages still remain queued in the pageref list, and eventually later >>> those may be processed in the delayed work. This may lead to a >>> corruption of pages, hitting an Oops. >> >> Do you have more information on this problem? > > The details are in SUSE bugzilla, but that's an internal bug entry > (and you know the number :) It happens at the following at least: > > - A VM is started with VGA console, no fb, on the installer > - VM is switched to bochs drm > - Start fbiterm on VT1, switching to the graphics mode on VT > - Exit fbiterm, going back to the text mode on VT; > at this moment, it gets Oops like: > > [ 42.338319][ T122] BUG: unable to handle page fault for address: > ffffe570c1000030 > [ 42.340063][ T122] #PF: supervisor read access in kernel mode > [ 42.340519][ T122] #PF: error_code(0x0000) - not-present page > [ 42.340979][ T122] PGD 34c38067 P4D 34c38067 PUD 34c37067 PMD 0 > [ 42.341456][ T122] Oops: 0000 [#1] PREEMPT SMP NOPTI > [ 42.341853][ T122] CPU: 1 PID: 122 Comm: kworker/1:2 Not tainted > 5.14.21-150500.5.g2ad24ee-default #1 SLE15-SP5 (unreleased) > b7a28d028376a517e888a7ff28c5e5dede93267c > [ 42.343000][ T122] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014 > [ 42.343929][ T122] Workqueue: events fb_deferred_io_work > [ 42.344355][ T122] RIP: 0010:page_mapped+0x5e/0x90 > [ 42.344743][ T122] Code: a8 01 75 d7 8b 47 30 f7 d0 c1 e8 1f c3 cc cc cc cc > 48 89 df e8 33 9c 05 00 89 c1 31 c0 85 c9 74 13 eb d3 48 c1 e2 06 48 01 da <8b> > 42 30 85 c0 79 c0 83 c1 01 48 8b 33 48 63 d1 b8 01 00 00 00 f7 > [ 42.346285][ T122] RSP: 0018:ffffb68640207e08 EFLAGS: 00010286 > [ 42.346749][ T122] RAX: 00000000b3aea8f0 RBX: ffffe570c0f00000 RCX: > 0000000000004000 > [ 42.347355][ T122] RDX: ffffe570c1000000 RSI: 000fffffc0010009 RDI: > ffffe570c0f00000 > [ 42.347960][ T122] RBP: ffffffffc0503050 R08: 0000000000000000 R09: > 0000000000000001 > [ 42.348568][ T122] R10: 0000000000000000 R11: ffffb68640207c88 R12: > ffffffffc0503020 > [ 42.349180][ T122] R13: ffff921281dcdc00 R14: ffff9212bcf08000 R15: > ffffe570c0f00000 > [ 42.349789][ T122] FS: 0000000000000000(0000) GS:ffff9212b3b00000(0000) > knlGS:0000000000000000 > [ 42.350471][ T122] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 42.350975][ T122] CR2: ffffe570c1000030 CR3: 000000001b810000 CR4: > 00000000000006e0 > [ 42.351588][ T122] Call Trace: > [ 42.351845][ T122] <TASK> > [ 42.352069][ T122] page_mkclean+0x6e/0xc0 > [ 42.352400][ T122] ? page_referenced_one+0x190/0x190 > [ 42.353714][ T122] ? pmdp_collapse_flush+0x60/0x60 > [ 42.354106][ T122] fb_deferred_io_work+0x13d/0x190 > [ 42.354496][ T122] process_one_work+0x267/0x440 > [ 42.354866][ T122] ? process_one_work+0x440/0x440 > [ 42.355247][ T122] worker_thread+0x2d/0x3d0 > [ 42.355590][ T122] ? process_one_work+0x440/0x440 > [ 42.355972][ T122] kthread+0x156/0x180 > [ 42.356281][ T122] ? set_kthread_struct+0x50/0x50 > [ 42.356662][ T122] ret_from_fork+0x22/0x30 > [ 42.357006][ T122] </TASK> > > The page info shows that it's a compound page but it's somehow > broken. On VM, it's triggered reliably with the scenario above, > always at the same position. > > FWIW, the Oops is hit even if there is no rewrite on the screen. > That is, another procedure is: > - Start VM, run fbiterm on VT1 > - Switch to VT2, text mode > - On VT2, kill fbiterm; the crash still happens even if no screen > change is performed > >> The mmap'ed buffer of the fbdev device comes from a vmalloc call. That >> memory's location never changes; even across pairs of open/close on >> the device file. I'm surprised that a page entry becomes invalid. >> >> In drm_fbdev_cleanup(), we first remove the fbdefio at [1] and then >> vfree() the shadow buffer. So the memory should still be around until >> fbdevio is gone. > > Yes, that's the puzzling part, too. Also, another thing is that the > bug couldn't be triggered easily when the fb is started in a different > way. e.g. when you run fbiterm & exit on the VM that had efifb, it > didn't hit. > > So, overall, it might be that I'm scratching a wrong surface. But at > least it "fixes" the problem above apparently, and the deferred io > base code itself has certainly the potential problem in general as my > patch suggests. Are there multiple graphics devices? There's just recently been a bugfix where graphics devices accidentally shared the same list of deferred pages. See https://lore.kernel.org/dri-devel/20230121192418.2814955-4-javierm@redhat.com/ > >> [1] >> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_fb_helper.c#L2146 >> >>> >>> This patch makes sure to cancel the delayed work and clean up the >>> pageref list at closing the device for addressing the bug. A part of >>> the cleanup code is factored out as a new helper function that is >>> called from the common fb_release(). >> >> The delayed work is required to copy the framebuffer to the device >> output. So if it's just canceled, could this result in missing >> updates? >> >> There's a call to cancel_delayed_work_sync() in the new helper >> fb_deferred_io_release(). Is this the right function? Maybe >> flush_delayed_work() is a better choice. > > I thought of that, but took a shorter path. > OK, let's check whether this keeps working with that change. I read that cancel_() is not enough and needs to be followed by a flush_() to ensure quiescence. So maybe we should call that flush_ unconditionally. > >>> Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> >>> Cc: <stable@vger.kernel.org> >>> Signed-off-by: Takashi Iwai <tiwai@suse.de> >> >> This could use a Fixes tag. It's not exactly clear to me when this >> problem got originally introduced, but the recent refactoring seems a >> candidate. >> >> Fixes: 56c134f7f1b5 ("fbdev: Track deferred-I/O pages in pageref struct") > > Hrm, this might be. Maybe Patrik can test with the revert of this? That's not easily revertable. > >> Cc: Thomas Zimmermann <tzimmermann@suse.de> >> Cc: Javier Martinez Canillas <javierm@redhat.com> >> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> >> Cc: Maxime Ripard <mripard@kernel.org> >> Cc: Zack Rusin <zackr@vmware.com> >> Cc: VMware Graphics Reviewers <linux-graphics-maintainer@vmware.com> >> Cc: Jaya Kumar <jayalk@intworks.biz> >> Cc: Daniel Vetter <daniel@ffwll.ch> >> Cc: "K. Y. Srinivasan" <kys@microsoft.com> >> Cc: Haiyang Zhang <haiyangz@microsoft.com> >> Cc: Wei Liu <wei.liu@kernel.org> >> Cc: Dexuan Cui <decui@microsoft.com> >> Cc: Steve Glendinning <steve.glendinning@shawell.net> >> Cc: Bernie Thompson <bernie@plugable.com> >> Cc: Helge Deller <deller@gmx.de> >> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> >> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> >> Cc: Stephen Kitt <steve@sk2.org> >> Cc: Peter Suti <peter.suti@streamunlimited.com> >> Cc: Sam Ravnborg <sam@ravnborg.org> >> Cc: Geert Uytterhoeven <geert+renesas@glider.be> >> Cc: ye xingchen <ye.xingchen@zte.com.cn> >> Cc: Petr Mladek <pmladek@suse.com> >> Cc: John Ogness <john.ogness@linutronix.de> >> Cc: Tom Rix <trix@redhat.com> >> Cc: dri-devel@lists.freedesktop.org >> Cc: linux-fbdev@vger.kernel.org >> Cc: linux-hyperv@vger.kernel.org >> Cc: <stable@vger.kernel.org> # v5.19+ > > Nah, please don't. Too many Cc's, literally a spam. Ok. > >>> --- >>> v1->v2: Fix build error without CONFIG_FB_DEFERRED_IO >>> >>> drivers/video/fbdev/core/fb_defio.c | 10 +++++++++- >>> drivers/video/fbdev/core/fbmem.c | 4 ++++ >>> include/linux/fb.h | 1 + >>> 3 files changed, 14 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c >>> index c730253ab85c..583cbcf09446 100644 >>> --- a/drivers/video/fbdev/core/fb_defio.c >>> +++ b/drivers/video/fbdev/core/fb_defio.c >>> @@ -313,7 +313,7 @@ void fb_deferred_io_open(struct fb_info *info, >>> } >>> EXPORT_SYMBOL_GPL(fb_deferred_io_open); >>> -void fb_deferred_io_cleanup(struct fb_info *info) >>> +void fb_deferred_io_release(struct fb_info *info) >>> { >>> struct fb_deferred_io *fbdefio = info->fbdefio; >>> struct page *page; >>> @@ -327,6 +327,14 @@ void fb_deferred_io_cleanup(struct fb_info *info) >>> page = fb_deferred_io_page(info, i); >>> page->mapping = NULL; >>> } >>> +} >>> +EXPORT_SYMBOL_GPL(fb_deferred_io_release); >> >> It's all in the same module. No need to export this symbol. > > I noticed it, too, but just keep the same style as other functions :) > That said, the other exported symbols are also useless. I can prepare > another patch to clean it up. Your choice, but appreciated. Best regards Thomas > > > thanks, > > Takashi -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev
On Mon, 30 Jan 2023 10:28:16 +0100, Thomas Zimmermann wrote: > > Hi > > Am 30.01.23 um 09:52 schrieb Takashi Iwai: > > On Mon, 30 Jan 2023 09:28:36 +0100, > > Thomas Zimmermann wrote: > >> > >> Hi > >> > >> Am 29.01.23 um 09:28 schrieb Takashi Iwai: > >>> When a fbdev with deferred I/O is once opened and closed, the dirty > >>> pages still remain queued in the pageref list, and eventually later > >>> those may be processed in the delayed work. This may lead to a > >>> corruption of pages, hitting an Oops. > >> > >> Do you have more information on this problem? > > > > The details are in SUSE bugzilla, but that's an internal bug entry > > (and you know the number :) It happens at the following at least: > > > > - A VM is started with VGA console, no fb, on the installer > > - VM is switched to bochs drm > > - Start fbiterm on VT1, switching to the graphics mode on VT > > - Exit fbiterm, going back to the text mode on VT; > > at this moment, it gets Oops like: > > > > [ 42.338319][ T122] BUG: unable to handle page fault for address: > > ffffe570c1000030 > > [ 42.340063][ T122] #PF: supervisor read access in kernel mode > > [ 42.340519][ T122] #PF: error_code(0x0000) - not-present page > > [ 42.340979][ T122] PGD 34c38067 P4D 34c38067 PUD 34c37067 PMD 0 > > [ 42.341456][ T122] Oops: 0000 [#1] PREEMPT SMP NOPTI > > [ 42.341853][ T122] CPU: 1 PID: 122 Comm: kworker/1:2 Not tainted > > 5.14.21-150500.5.g2ad24ee-default #1 SLE15-SP5 (unreleased) > > b7a28d028376a517e888a7ff28c5e5dede93267c > > [ 42.343000][ T122] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > > BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014 > > [ 42.343929][ T122] Workqueue: events fb_deferred_io_work > > [ 42.344355][ T122] RIP: 0010:page_mapped+0x5e/0x90 > > [ 42.344743][ T122] Code: a8 01 75 d7 8b 47 30 f7 d0 c1 e8 1f c3 cc cc cc cc > > 48 89 df e8 33 9c 05 00 89 c1 31 c0 85 c9 74 13 eb d3 48 c1 e2 06 48 01 da <8b> > > 42 30 85 c0 79 c0 83 c1 01 48 8b 33 48 63 d1 b8 01 00 00 00 f7 > > [ 42.346285][ T122] RSP: 0018:ffffb68640207e08 EFLAGS: 00010286 > > [ 42.346749][ T122] RAX: 00000000b3aea8f0 RBX: ffffe570c0f00000 RCX: > > 0000000000004000 > > [ 42.347355][ T122] RDX: ffffe570c1000000 RSI: 000fffffc0010009 RDI: > > ffffe570c0f00000 > > [ 42.347960][ T122] RBP: ffffffffc0503050 R08: 0000000000000000 R09: > > 0000000000000001 > > [ 42.348568][ T122] R10: 0000000000000000 R11: ffffb68640207c88 R12: > > ffffffffc0503020 > > [ 42.349180][ T122] R13: ffff921281dcdc00 R14: ffff9212bcf08000 R15: > > ffffe570c0f00000 > > [ 42.349789][ T122] FS: 0000000000000000(0000) GS:ffff9212b3b00000(0000) > > knlGS:0000000000000000 > > [ 42.350471][ T122] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 42.350975][ T122] CR2: ffffe570c1000030 CR3: 000000001b810000 CR4: > > 00000000000006e0 > > [ 42.351588][ T122] Call Trace: > > [ 42.351845][ T122] <TASK> > > [ 42.352069][ T122] page_mkclean+0x6e/0xc0 > > [ 42.352400][ T122] ? page_referenced_one+0x190/0x190 > > [ 42.353714][ T122] ? pmdp_collapse_flush+0x60/0x60 > > [ 42.354106][ T122] fb_deferred_io_work+0x13d/0x190 > > [ 42.354496][ T122] process_one_work+0x267/0x440 > > [ 42.354866][ T122] ? process_one_work+0x440/0x440 > > [ 42.355247][ T122] worker_thread+0x2d/0x3d0 > > [ 42.355590][ T122] ? process_one_work+0x440/0x440 > > [ 42.355972][ T122] kthread+0x156/0x180 > > [ 42.356281][ T122] ? set_kthread_struct+0x50/0x50 > > [ 42.356662][ T122] ret_from_fork+0x22/0x30 > > [ 42.357006][ T122] </TASK> > > > > The page info shows that it's a compound page but it's somehow > > broken. On VM, it's triggered reliably with the scenario above, > > always at the same position. > > > > FWIW, the Oops is hit even if there is no rewrite on the screen. > > That is, another procedure is: > > - Start VM, run fbiterm on VT1 > > - Switch to VT2, text mode > > - On VT2, kill fbiterm; the crash still happens even if no screen > > change is performed > > > >> The mmap'ed buffer of the fbdev device comes from a vmalloc call. That > >> memory's location never changes; even across pairs of open/close on > >> the device file. I'm surprised that a page entry becomes invalid. > >> > >> In drm_fbdev_cleanup(), we first remove the fbdefio at [1] and then > >> vfree() the shadow buffer. So the memory should still be around until > >> fbdevio is gone. > > > > Yes, that's the puzzling part, too. Also, another thing is that the > > bug couldn't be triggered easily when the fb is started in a different > > way. e.g. when you run fbiterm & exit on the VM that had efifb, it > > didn't hit. > > > > So, overall, it might be that I'm scratching a wrong surface. But at > > least it "fixes" the problem above apparently, and the deferred io > > base code itself has certainly the potential problem in general as my > > patch suggests. > > Are there multiple graphics devices? There's just recently been a > bugfix where graphics devices accidentally shared the same list of > deferred pages. See > > > https://lore.kernel.org/dri-devel/20230121192418.2814955-4-javierm@redhat.com/ No, it's a KVM with a single VGA. > >> [1] > >> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_fb_helper.c#L2146 > >> > >>> > >>> This patch makes sure to cancel the delayed work and clean up the > >>> pageref list at closing the device for addressing the bug. A part of > >>> the cleanup code is factored out as a new helper function that is > >>> called from the common fb_release(). > >> > >> The delayed work is required to copy the framebuffer to the device > >> output. So if it's just canceled, could this result in missing > >> updates? > >> > >> There's a call to cancel_delayed_work_sync() in the new helper > >> fb_deferred_io_release(). Is this the right function? Maybe > >> flush_delayed_work() is a better choice. > > > > I thought of that, but took a shorter path. > > OK, let's check whether this keeps working with that change. > > I read that cancel_() is not enough and needs to be followed by a > flush_() to ensure quiescence. That's not the case, I believe. As long as the code after that cleans up the pending pages, the cancel_sync alone must be fine. (Otherwise so many other code paths would hit the problem.) > So maybe we should call that flush_ > unconditionally. > > >>> Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> > >>> Cc: <stable@vger.kernel.org> > >>> Signed-off-by: Takashi Iwai <tiwai@suse.de> > >> > >> This could use a Fixes tag. It's not exactly clear to me when this > >> problem got originally introduced, but the recent refactoring seems a > >> candidate. > >> > >> Fixes: 56c134f7f1b5 ("fbdev: Track deferred-I/O pages in pageref struct") > > > > Hrm, this might be. Maybe Patrik can test with the revert of this? > > That's not easily revertable. Indeed. > >> Cc: Thomas Zimmermann <tzimmermann@suse.de> > >> Cc: Javier Martinez Canillas <javierm@redhat.com> > >> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > >> Cc: Maxime Ripard <mripard@kernel.org> > >> Cc: Zack Rusin <zackr@vmware.com> > >> Cc: VMware Graphics Reviewers <linux-graphics-maintainer@vmware.com> > >> Cc: Jaya Kumar <jayalk@intworks.biz> > >> Cc: Daniel Vetter <daniel@ffwll.ch> > >> Cc: "K. Y. Srinivasan" <kys@microsoft.com> > >> Cc: Haiyang Zhang <haiyangz@microsoft.com> > >> Cc: Wei Liu <wei.liu@kernel.org> > >> Cc: Dexuan Cui <decui@microsoft.com> > >> Cc: Steve Glendinning <steve.glendinning@shawell.net> > >> Cc: Bernie Thompson <bernie@plugable.com> > >> Cc: Helge Deller <deller@gmx.de> > >> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > >> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > >> Cc: Stephen Kitt <steve@sk2.org> > >> Cc: Peter Suti <peter.suti@streamunlimited.com> > >> Cc: Sam Ravnborg <sam@ravnborg.org> > >> Cc: Geert Uytterhoeven <geert+renesas@glider.be> > >> Cc: ye xingchen <ye.xingchen@zte.com.cn> > >> Cc: Petr Mladek <pmladek@suse.com> > >> Cc: John Ogness <john.ogness@linutronix.de> > >> Cc: Tom Rix <trix@redhat.com> > >> Cc: dri-devel@lists.freedesktop.org > >> Cc: linux-fbdev@vger.kernel.org > >> Cc: linux-hyperv@vger.kernel.org > >> Cc: <stable@vger.kernel.org> # v5.19+ > > > > Nah, please don't. Too many Cc's, literally a spam. > > Ok. > > > > >>> --- > >>> v1->v2: Fix build error without CONFIG_FB_DEFERRED_IO > >>> > >>> drivers/video/fbdev/core/fb_defio.c | 10 +++++++++- > >>> drivers/video/fbdev/core/fbmem.c | 4 ++++ > >>> include/linux/fb.h | 1 + > >>> 3 files changed, 14 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c > >>> index c730253ab85c..583cbcf09446 100644 > >>> --- a/drivers/video/fbdev/core/fb_defio.c > >>> +++ b/drivers/video/fbdev/core/fb_defio.c > >>> @@ -313,7 +313,7 @@ void fb_deferred_io_open(struct fb_info *info, > >>> } > >>> EXPORT_SYMBOL_GPL(fb_deferred_io_open); > >>> -void fb_deferred_io_cleanup(struct fb_info *info) > >>> +void fb_deferred_io_release(struct fb_info *info) > >>> { > >>> struct fb_deferred_io *fbdefio = info->fbdefio; > >>> struct page *page; > >>> @@ -327,6 +327,14 @@ void fb_deferred_io_cleanup(struct fb_info *info) > >>> page = fb_deferred_io_page(info, i); > >>> page->mapping = NULL; > >>> } > >>> +} > >>> +EXPORT_SYMBOL_GPL(fb_deferred_io_release); > >> > >> It's all in the same module. No need to export this symbol. > > > > I noticed it, too, but just keep the same style as other functions :) > > That said, the other exported symbols are also useless. I can prepare > > another patch to clean it up. > > Your choice, but appreciated. Rather a choice by the maintainer, I'd say. thanks, Takashi
On Mon, 30 Jan 2023 09:52:43 +0100, Takashi Iwai wrote: > > > There's a call to cancel_delayed_work_sync() in the new helper > > fb_deferred_io_release(). Is this the right function? Maybe > > flush_delayed_work() is a better choice. > > I thought of that, but took a shorter path. > OK, let's check whether this keeps working with that change. Interestingly, this still makes the bug happening at the very same place. (The tested patch is below.) So, more looking and testing, more confusing the problem becomes :-< Takashi -- 8< -- --- a/drivers/video/fbdev/core/fb_defio.c +++ b/drivers/video/fbdev/core/fb_defio.c @@ -313,20 +313,31 @@ void fb_deferred_io_open(struct fb_info *info, } EXPORT_SYMBOL_GPL(fb_deferred_io_open); -void fb_deferred_io_cleanup(struct fb_info *info) +/* clear out the mapping that we setup */ +static void fb_deferred_io_clear_mapping(struct fb_info *info) { - struct fb_deferred_io *fbdefio = info->fbdefio; struct page *page; int i; - BUG_ON(!fbdefio); - cancel_delayed_work_sync(&info->deferred_work); - - /* clear out the mapping that we setup */ for (i = 0 ; i < info->fix.smem_len; i += PAGE_SIZE) { page = fb_deferred_io_page(info, i); page->mapping = NULL; } +} + +void fb_deferred_io_release(struct fb_info *info) +{ + flush_delayed_work(&info->deferred_work); + fb_deferred_io_clear_mapping(info); +} + +void fb_deferred_io_cleanup(struct fb_info *info) +{ + struct fb_deferred_io *fbdefio = info->fbdefio; + + BUG_ON(!fbdefio); + cancel_delayed_work_sync(&info->deferred_work); + fb_deferred_io_clear_mapping(info); kvfree(info->pagerefs); mutex_destroy(&fbdefio->lock); --- a/drivers/video/fbdev/core/fbmem.c +++ b/drivers/video/fbdev/core/fbmem.c @@ -1454,6 +1454,10 @@ __releases(&info->lock) struct fb_info * const info = file->private_data; lock_fb_info(info); +#if IS_ENABLED(CONFIG_FB_DEFERRED_IO) + if (info->fbdefio) + fb_deferred_io_release(info); +#endif if (info->fbops->fb_release) info->fbops->fb_release(info,1); module_put(info->fbops->owner); --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -662,6 +662,7 @@ extern int fb_deferred_io_init(struct fb_info *info); extern void fb_deferred_io_open(struct fb_info *info, struct inode *inode, struct file *file); +extern void fb_deferred_io_release(struct fb_info *info); extern void fb_deferred_io_cleanup(struct fb_info *info); extern int fb_deferred_io_fsync(struct file *file, loff_t start, loff_t end, int datasync);
Hi, this bug could be a symptom of the problem reported at [1]. Best regards Thomas [1] https://lore.kernel.org/dri-devel/CAM0jSHOcvZoyv-y6bnvFaUybRQsDx_mfOydL1uaNM4T4PgcA=A@mail.gmail.com/T/#mfbef4df9b49fc5fda9bcfa26db5ca13cdaef6d7e Am 29.01.23 um 09:28 schrieb Takashi Iwai: > When a fbdev with deferred I/O is once opened and closed, the dirty > pages still remain queued in the pageref list, and eventually later > those may be processed in the delayed work. This may lead to a > corruption of pages, hitting an Oops. > > This patch makes sure to cancel the delayed work and clean up the > pageref list at closing the device for addressing the bug. A part of > the cleanup code is factored out as a new helper function that is > called from the common fb_release(). > > Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> > Cc: <stable@vger.kernel.org> > Signed-off-by: Takashi Iwai <tiwai@suse.de> > --- > v1->v2: Fix build error without CONFIG_FB_DEFERRED_IO > > drivers/video/fbdev/core/fb_defio.c | 10 +++++++++- > drivers/video/fbdev/core/fbmem.c | 4 ++++ > include/linux/fb.h | 1 + > 3 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c > index c730253ab85c..583cbcf09446 100644 > --- a/drivers/video/fbdev/core/fb_defio.c > +++ b/drivers/video/fbdev/core/fb_defio.c > @@ -313,7 +313,7 @@ void fb_deferred_io_open(struct fb_info *info, > } > EXPORT_SYMBOL_GPL(fb_deferred_io_open); > > -void fb_deferred_io_cleanup(struct fb_info *info) > +void fb_deferred_io_release(struct fb_info *info) > { > struct fb_deferred_io *fbdefio = info->fbdefio; > struct page *page; > @@ -327,6 +327,14 @@ void fb_deferred_io_cleanup(struct fb_info *info) > page = fb_deferred_io_page(info, i); > page->mapping = NULL; > } > +} > +EXPORT_SYMBOL_GPL(fb_deferred_io_release); > + > +void fb_deferred_io_cleanup(struct fb_info *info) > +{ > + struct fb_deferred_io *fbdefio = info->fbdefio; > + > + fb_deferred_io_release(info); > > kvfree(info->pagerefs); > mutex_destroy(&fbdefio->lock); > diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c > index 3a6c8458eb8d..ab3545a00abc 100644 > --- a/drivers/video/fbdev/core/fbmem.c > +++ b/drivers/video/fbdev/core/fbmem.c > @@ -1454,6 +1454,10 @@ __releases(&info->lock) > struct fb_info * const info = file->private_data; > > lock_fb_info(info); > +#if IS_ENABLED(CONFIG_FB_DEFERRED_IO) > + if (info->fbdefio) > + fb_deferred_io_release(info); > +#endif > if (info->fbops->fb_release) > info->fbops->fb_release(info,1); > module_put(info->fbops->owner); > diff --git a/include/linux/fb.h b/include/linux/fb.h > index 96b96323e9cb..73eb1f85ea8e 100644 > --- a/include/linux/fb.h > +++ b/include/linux/fb.h > @@ -662,6 +662,7 @@ extern int fb_deferred_io_init(struct fb_info *info); > extern void fb_deferred_io_open(struct fb_info *info, > struct inode *inode, > struct file *file); > +extern void fb_deferred_io_release(struct fb_info *info); > extern void fb_deferred_io_cleanup(struct fb_info *info); > extern int fb_deferred_io_fsync(struct file *file, loff_t start, > loff_t end, int datasync); -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev
Am 29.01.23 um 09:28 schrieb Takashi Iwai: > When a fbdev with deferred I/O is once opened and closed, the dirty > pages still remain queued in the pageref list, and eventually later > those may be processed in the delayed work. This may lead to a > corruption of pages, hitting an Oops. > > This patch makes sure to cancel the delayed work and clean up the > pageref list at closing the device for addressing the bug. A part of > the cleanup code is factored out as a new helper function that is > called from the common fb_release(). > > Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> > Cc: <stable@vger.kernel.org> > Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> > --- > v1->v2: Fix build error without CONFIG_FB_DEFERRED_IO > > drivers/video/fbdev/core/fb_defio.c | 10 +++++++++- > drivers/video/fbdev/core/fbmem.c | 4 ++++ > include/linux/fb.h | 1 + > 3 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c > index c730253ab85c..583cbcf09446 100644 > --- a/drivers/video/fbdev/core/fb_defio.c > +++ b/drivers/video/fbdev/core/fb_defio.c > @@ -313,7 +313,7 @@ void fb_deferred_io_open(struct fb_info *info, > } > EXPORT_SYMBOL_GPL(fb_deferred_io_open); > > -void fb_deferred_io_cleanup(struct fb_info *info) > +void fb_deferred_io_release(struct fb_info *info) > { > struct fb_deferred_io *fbdefio = info->fbdefio; > struct page *page; > @@ -327,6 +327,14 @@ void fb_deferred_io_cleanup(struct fb_info *info) > page = fb_deferred_io_page(info, i); > page->mapping = NULL; > } > +} > +EXPORT_SYMBOL_GPL(fb_deferred_io_release); > + > +void fb_deferred_io_cleanup(struct fb_info *info) > +{ > + struct fb_deferred_io *fbdefio = info->fbdefio; > + > + fb_deferred_io_release(info); > > kvfree(info->pagerefs); > mutex_destroy(&fbdefio->lock); > diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c > index 3a6c8458eb8d..ab3545a00abc 100644 > --- a/drivers/video/fbdev/core/fbmem.c > +++ b/drivers/video/fbdev/core/fbmem.c > @@ -1454,6 +1454,10 @@ __releases(&info->lock) > struct fb_info * const info = file->private_data; > > lock_fb_info(info); > +#if IS_ENABLED(CONFIG_FB_DEFERRED_IO) > + if (info->fbdefio) > + fb_deferred_io_release(info); > +#endif > if (info->fbops->fb_release) > info->fbops->fb_release(info,1); > module_put(info->fbops->owner); > diff --git a/include/linux/fb.h b/include/linux/fb.h > index 96b96323e9cb..73eb1f85ea8e 100644 > --- a/include/linux/fb.h > +++ b/include/linux/fb.h > @@ -662,6 +662,7 @@ extern int fb_deferred_io_init(struct fb_info *info); > extern void fb_deferred_io_open(struct fb_info *info, > struct inode *inode, > struct file *file); > +extern void fb_deferred_io_release(struct fb_info *info); > extern void fb_deferred_io_cleanup(struct fb_info *info); > extern int fb_deferred_io_fsync(struct file *file, loff_t start, > loff_t end, int datasync); -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev
Added to drm-misc-fixes. Am 29.01.23 um 09:28 schrieb Takashi Iwai: > When a fbdev with deferred I/O is once opened and closed, the dirty > pages still remain queued in the pageref list, and eventually later > those may be processed in the delayed work. This may lead to a > corruption of pages, hitting an Oops. > > This patch makes sure to cancel the delayed work and clean up the > pageref list at closing the device for addressing the bug. A part of > the cleanup code is factored out as a new helper function that is > called from the common fb_release(). > > Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> > Cc: <stable@vger.kernel.org> > Signed-off-by: Takashi Iwai <tiwai@suse.de> > --- > v1->v2: Fix build error without CONFIG_FB_DEFERRED_IO > > drivers/video/fbdev/core/fb_defio.c | 10 +++++++++- > drivers/video/fbdev/core/fbmem.c | 4 ++++ > include/linux/fb.h | 1 + > 3 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c > index c730253ab85c..583cbcf09446 100644 > --- a/drivers/video/fbdev/core/fb_defio.c > +++ b/drivers/video/fbdev/core/fb_defio.c > @@ -313,7 +313,7 @@ void fb_deferred_io_open(struct fb_info *info, > } > EXPORT_SYMBOL_GPL(fb_deferred_io_open); > > -void fb_deferred_io_cleanup(struct fb_info *info) > +void fb_deferred_io_release(struct fb_info *info) > { > struct fb_deferred_io *fbdefio = info->fbdefio; > struct page *page; > @@ -327,6 +327,14 @@ void fb_deferred_io_cleanup(struct fb_info *info) > page = fb_deferred_io_page(info, i); > page->mapping = NULL; > } > +} > +EXPORT_SYMBOL_GPL(fb_deferred_io_release); > + > +void fb_deferred_io_cleanup(struct fb_info *info) > +{ > + struct fb_deferred_io *fbdefio = info->fbdefio; > + > + fb_deferred_io_release(info); > > kvfree(info->pagerefs); > mutex_destroy(&fbdefio->lock); > diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c > index 3a6c8458eb8d..ab3545a00abc 100644 > --- a/drivers/video/fbdev/core/fbmem.c > +++ b/drivers/video/fbdev/core/fbmem.c > @@ -1454,6 +1454,10 @@ __releases(&info->lock) > struct fb_info * const info = file->private_data; > > lock_fb_info(info); > +#if IS_ENABLED(CONFIG_FB_DEFERRED_IO) > + if (info->fbdefio) > + fb_deferred_io_release(info); > +#endif > if (info->fbops->fb_release) > info->fbops->fb_release(info,1); > module_put(info->fbops->owner); > diff --git a/include/linux/fb.h b/include/linux/fb.h > index 96b96323e9cb..73eb1f85ea8e 100644 > --- a/include/linux/fb.h > +++ b/include/linux/fb.h > @@ -662,6 +662,7 @@ extern int fb_deferred_io_init(struct fb_info *info); > extern void fb_deferred_io_open(struct fb_info *info, > struct inode *inode, > struct file *file); > +extern void fb_deferred_io_release(struct fb_info *info); > extern void fb_deferred_io_cleanup(struct fb_info *info); > extern int fb_deferred_io_fsync(struct file *file, loff_t start, > loff_t end, int datasync); -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev
diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c index c730253ab85c..583cbcf09446 100644 --- a/drivers/video/fbdev/core/fb_defio.c +++ b/drivers/video/fbdev/core/fb_defio.c @@ -313,7 +313,7 @@ void fb_deferred_io_open(struct fb_info *info, } EXPORT_SYMBOL_GPL(fb_deferred_io_open); -void fb_deferred_io_cleanup(struct fb_info *info) +void fb_deferred_io_release(struct fb_info *info) { struct fb_deferred_io *fbdefio = info->fbdefio; struct page *page; @@ -327,6 +327,14 @@ void fb_deferred_io_cleanup(struct fb_info *info) page = fb_deferred_io_page(info, i); page->mapping = NULL; } +} +EXPORT_SYMBOL_GPL(fb_deferred_io_release); + +void fb_deferred_io_cleanup(struct fb_info *info) +{ + struct fb_deferred_io *fbdefio = info->fbdefio; + + fb_deferred_io_release(info); kvfree(info->pagerefs); mutex_destroy(&fbdefio->lock); diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c index 3a6c8458eb8d..ab3545a00abc 100644 --- a/drivers/video/fbdev/core/fbmem.c +++ b/drivers/video/fbdev/core/fbmem.c @@ -1454,6 +1454,10 @@ __releases(&info->lock) struct fb_info * const info = file->private_data; lock_fb_info(info); +#if IS_ENABLED(CONFIG_FB_DEFERRED_IO) + if (info->fbdefio) + fb_deferred_io_release(info); +#endif if (info->fbops->fb_release) info->fbops->fb_release(info,1); module_put(info->fbops->owner); diff --git a/include/linux/fb.h b/include/linux/fb.h index 96b96323e9cb..73eb1f85ea8e 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -662,6 +662,7 @@ extern int fb_deferred_io_init(struct fb_info *info); extern void fb_deferred_io_open(struct fb_info *info, struct inode *inode, struct file *file); +extern void fb_deferred_io_release(struct fb_info *info); extern void fb_deferred_io_cleanup(struct fb_info *info); extern int fb_deferred_io_fsync(struct file *file, loff_t start, loff_t end, int datasync);