Message ID | 20221024043305.1491403-1-ira.weiny@intel.com |
---|---|
State | New |
Headers |
From: ira.weiny@intel.com
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Ira Weiny <ira.weiny@intel.com>, Randy Dunlap <rdunlap@infradead.org>,
    Peter Xu <peterx@redhat.com>, Andrea Arcangeli <aarcange@redhat.com>,
    Matthew Wilcox <willy@infradead.org>, kernel test robot <yujie.liu@intel.com>,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH for rc] mm/shmem: Ensure proper fallback if page faults
Date: Sun, 23 Oct 2022 21:33:05 -0700
Message-Id: <20221024043305.1491403-1-ira.weiny@intel.com>
X-Mailer: git-send-email 2.37.2
MIME-Version: 1.0 |
Series | [for,rc] mm/shmem: Ensure proper fallback if page faults |
Commit Message
Ira Weiny
Oct. 24, 2022, 4:33 a.m. UTC
From: Ira Weiny <ira.weiny@intel.com>

The kernel test robot flagged a recursive lock as a result of a
conversion from kmap_atomic() to kmap_local_folio()[Link]

The cause was due to the code depending on the kmap_atomic() side effect
of disabling page faults. In that case the code expects the fault to
fail and take the fallback case.

git archaeology implied that the recursion may not be an actual bug.[1]
However, the mmap_lock needed in the fault may be the one held.[2]

Add an explicit pagefault_disable() and a big comment to explain this
for future souls looking at this code.

[1] https://lore.kernel.org/all/Y1MymJ%2FINb45AdaY@iweiny-desk3/
[2] https://lore.kernel.org/all/Y1M2p9OtBGnKwGUE@x1n/

Fixes: 7a7256d5f512 ("shmem: convert shmem_mfill_atomic_pte() to use a folio")
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/r/202210211215.9dc6efb5-yujie.liu@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Thanks to Matt and Andrew for initial diagnosis.
Thanks to Randy for pointing out C code needs ';' :-D
Thanks to Andrew for suggesting an elaborate comment
Thanks to Peter for pointing out that the mm's may be the same.
---
 mm/shmem.c | 7 +++++++
 1 file changed, 7 insertions(+)
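For readers unfamiliar with the highmem API change at the root of this:
kmap_atomic() disables page faults (and preemption) as a side effect for the
lifetime of the mapping, while kmap_local_folio() does not. A minimal
before/after sketch of the pattern involved (illustrative only, not taken
from the patch):

	/*
	 * Old pattern: kmap_atomic() implicitly called pagefault_disable(),
	 * so a copy_from_user() under the mapping failed fast (short copy)
	 * instead of entering the fault handler and taking mmap_lock.
	 */
	page_kaddr = kmap_atomic(page);
	ret = copy_from_user(page_kaddr,
			     (const void __user *)src_addr, PAGE_SIZE);
	kunmap_atomic(page_kaddr);

	/*
	 * New pattern: kmap_local_folio() leaves faults enabled, so the
	 * fail-fast behavior has to be requested explicitly -- which is
	 * what the patch below adds.
	 */
	page_kaddr = kmap_local_folio(folio, 0);
	pagefault_disable();
	ret = copy_from_user(page_kaddr,
			     (const void __user *)src_addr, PAGE_SIZE);
	pagefault_enable();
	kunmap_local(page_kaddr);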
Comments
On Sun, Oct 23, 2022 at 09:33:05PM -0700, Ira wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> The kernel test robot flagged a recursive lock as a result of a
> conversion from kmap_atomic() to kmap_local_folio()[Link]
> 
> The cause was due to the code depending on the kmap_atomic() side effect
> of disabling page faults. In that case the code expects the fault to
> fail and take the fallback case.
> 
> git archaeology implied that the recursion may not be an actual bug.[1]
> However, the mmap_lock needed in the fault may be the one held.[2]
> 
> Add an explicit pagefault_disable() and a big comment to explain this
> for future souls looking at this code.
> 
> [1] https://lore.kernel.org/all/Y1MymJ%2FINb45AdaY@iweiny-desk3/
> [2] https://lore.kernel.org/all/Y1M2p9OtBGnKwGUE@x1n/
> 
> Fixes: 7a7256d5f512 ("shmem: convert shmem_mfill_atomic_pte() to use a folio")
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reported-by: kernel test robot <yujie.liu@intel.com>
> Link: https://lore.kernel.org/r/202210211215.9dc6efb5-yujie.liu@intel.com
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Thanks to Matt and Andrew for initial diagnosis.
> Thanks to Randy for pointing out C code needs ';' :-D
> Thanks to Andrew for suggesting an elaborate comment
> Thanks to Peter for pointing out that the mm's may be the same.
> ---
>  mm/shmem.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 8280a5cb48df..c1bca31cd485 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2424,9 +2424,16 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
> 
>  	if (!zeropage) {	/* COPY */
>  		page_kaddr = kmap_local_folio(folio, 0);
> +		/*
> +		 * The mmap_lock is held here. Disable page faults to
> +		 * prevent deadlock should copy_from_user() fault. The
> +		 * copy will be retried outside the mmap_lock.
> +		 */

Offline Dave Hansen and I were discussing this and he was concerned that this
comment implies that a deadlock would always occur rather than might occur.

I was not clear on this as I was thinking the read mmap_lock was
non-recursive.

So I think we have 3 cases only 1 of which will actually deadlock and is, as
Dave puts it, currently theoretical.

1) Different mm's are in play (no issue)
2) Readlock implementation is recursive and same mm is in play (no issue)
3) Readlock implementation is _not_ recursive (issue)

In both 1 and 2 lockdep is incorrectly flagging the issue but 3 is a problem
and I think this is what Andrea was thinking.

Is that the case?

If so the above comment is incorrectly worded and I should update it.

Ira

> +		pagefault_disable();
>  		ret = copy_from_user(page_kaddr,
>  				     (const void __user *)src_addr,
>  				     PAGE_SIZE);
> +		pagefault_enable();
>  		kunmap_local(page_kaddr);
> 
>  		/* fallback to copy_from_user outside mmap_lock */
> --
> 2.37.2
> 
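For reference, pagefault_disable() is cheap and purely task-local; a
simplified sketch of it (condensed from include/linux/uaccess.h -- the real
code goes through pagefault_disabled_inc()/dec() helpers):

static inline void pagefault_disable(void)
{
	/*
	 * Per-task counter: while it is non-zero, a faulting user access
	 * does not enter the full fault handler (so it never takes
	 * mmap_lock); the copy simply returns short instead.
	 */
	current->pagefault_disabled++;
	barrier();	/* order the increment before the user access */
}

static inline void pagefault_enable(void)
{
	barrier();	/* order the user access before the decrement */
	current->pagefault_disabled--;
}

With the counter raised, none of the three cases above reaches the lock at
all; they only differ in what would happen if the fault were allowed to
proceed.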
On Mon, Oct 24, 2022 at 09:54:30AM -0700, Ira Weiny wrote:
> On Sun, Oct 23, 2022 at 09:33:05PM -0700, Ira wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > The kernel test robot flagged a recursive lock as a result of a
> > conversion from kmap_atomic() to kmap_local_folio()[Link]
> > 
> > The cause was due to the code depending on the kmap_atomic() side effect
> > of disabling page faults. In that case the code expects the fault to
> > fail and take the fallback case.
> > 
> > git archaeology implied that the recursion may not be an actual bug.[1]
> > However, the mmap_lock needed in the fault may be the one held.[2]
> > 
> > Add an explicit pagefault_disable() and a big comment to explain this
> > for future souls looking at this code.
> > 
> > [1] https://lore.kernel.org/all/Y1MymJ%2FINb45AdaY@iweiny-desk3/
> > [2] https://lore.kernel.org/all/Y1M2p9OtBGnKwGUE@x1n/
> > 
> > Fixes: 7a7256d5f512 ("shmem: convert shmem_mfill_atomic_pte() to use a folio")
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Reported-by: kernel test robot <yujie.liu@intel.com>
> > Link: https://lore.kernel.org/r/202210211215.9dc6efb5-yujie.liu@intel.com
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > Thanks to Matt and Andrew for initial diagnosis.
> > Thanks to Randy for pointing out C code needs ';' :-D
> > Thanks to Andrew for suggesting an elaborate comment
> > Thanks to Peter for pointing out that the mm's may be the same.
> > ---
> >  mm/shmem.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 8280a5cb48df..c1bca31cd485 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -2424,9 +2424,16 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
> > 
> >  	if (!zeropage) {	/* COPY */
> >  		page_kaddr = kmap_local_folio(folio, 0);
> > +		/*
> > +		 * The mmap_lock is held here. Disable page faults to
> > +		 * prevent deadlock should copy_from_user() fault. The
> > +		 * copy will be retried outside the mmap_lock.
> > +		 */
> 
> Offline Dave Hansen and I were discussing this and he was concerned that this
> comment implies that a deadlock would always occur rather than might occur.
> 
> I was not clear on this as I was thinking the read mmap_lock was
> non-recursive.
> 
> So I think we have 3 cases only 1 of which will actually deadlock and is, as
> Dave puts it, currently theoretical.
> 
> 1) Different mm's are in play (no issue)
> 2) Readlock implementation is recursive and same mm is in play (no issue)
> 3) Readlock implementation is _not_ recursive (issue)
> 
> In both 1 and 2 lockdep is incorrectly flagging the issue but 3 is a problem
> and I think this is what Andrea was thinking.

The readlock implementation is only recursive if nobody else has taken a
write lock.  AIUI, no other process can take a write lock on the mmap_lock
(other processes can take read locks by examining /proc/$pid/maps, for
example), although maybe ptrace can take the mmap_lock for write?

But if you have a multithreaded process, one of the other threads can call
mmap() and that will prevent recursion (due to fairness).  Even if it's a
different process that you're trying to acquire the mmap read lock on, you
can still get into a deadly embrace.  eg:

process A thread 1 takes read lock on own mmap_lock
process A thread 2 calls mmap, blocks taking write lock
process B thread 1 takes page fault, read lock on own mmap lock
process B thread 2 calls mmap, blocks taking write lock
process A thread 1 blocks taking read lock on process B
process B thread 1 blocks taking read lock on process A

Now all four threads are blocked waiting for each other.
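The fairness hazard described here can be reproduced in userspace with a
writer-fair rwlock: a second ("recursive") read acquisition blocks as soon
as a writer is queued. A hypothetical demo program, not kernel code (glibc
only behaves this way when the writer-preference lock kind is selected,
whereas the kernel's rwsem queues writers fairly by design):

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t lock;

static void *writer(void *arg)
{
	pthread_rwlock_wrlock(&lock);	/* queues behind main's read lock */
	pthread_rwlock_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_rwlockattr_t attr;
	pthread_t t;

	pthread_rwlockattr_init(&attr);
	/* Writer-fair, like the kernel rwsem: readers queue behind writers */
	pthread_rwlockattr_setkind_np(&attr,
			PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
	pthread_rwlock_init(&lock, &attr);

	pthread_rwlock_rdlock(&lock);		/* first read lock succeeds */
	pthread_create(&t, NULL, writer, NULL);
	sleep(1);				/* let the writer queue up */

	/*
	 * Deadlock: this read waits for the queued writer, and the writer
	 * waits for the read lock we already hold.
	 */
	pthread_rwlock_rdlock(&lock);
	printf("never reached with a writer-fair lock\n");
	return 0;
}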
On Mon, Oct 24, 2022 at 09:54:30AM -0700, Ira Weiny wrote:
> On Sun, Oct 23, 2022 at 09:33:05PM -0700, Ira wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > The kernel test robot flagged a recursive lock as a result of a
> > conversion from kmap_atomic() to kmap_local_folio()[Link]
> > 
> > The cause was due to the code depending on the kmap_atomic() side effect
> > of disabling page faults. In that case the code expects the fault to
> > fail and take the fallback case.
> > 
> > git archaeology implied that the recursion may not be an actual bug.[1]
> > However, the mmap_lock needed in the fault may be the one held.[2]
> > 
> > Add an explicit pagefault_disable() and a big comment to explain this
> > for future souls looking at this code.
> > 
> > [1] https://lore.kernel.org/all/Y1MymJ%2FINb45AdaY@iweiny-desk3/
> > [2] https://lore.kernel.org/all/Y1M2p9OtBGnKwGUE@x1n/
> > 
> > Fixes: 7a7256d5f512 ("shmem: convert shmem_mfill_atomic_pte() to use a folio")
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Reported-by: kernel test robot <yujie.liu@intel.com>
> > Link: https://lore.kernel.org/r/202210211215.9dc6efb5-yujie.liu@intel.com
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > 
> > ---
> > Thanks to Matt and Andrew for initial diagnosis.
> > Thanks to Randy for pointing out C code needs ';' :-D
> > Thanks to Andrew for suggesting an elaborate comment
> > Thanks to Peter for pointing out that the mm's may be the same.
> > ---
> >  mm/shmem.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 8280a5cb48df..c1bca31cd485 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -2424,9 +2424,16 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
> > 
> >  	if (!zeropage) {	/* COPY */
> >  		page_kaddr = kmap_local_folio(folio, 0);
> > +		/*
> > +		 * The mmap_lock is held here. Disable page faults to
> > +		 * prevent deadlock should copy_from_user() fault. The
> > +		 * copy will be retried outside the mmap_lock.
> > +		 */
> 
> Offline Dave Hansen and I were discussing this and he was concerned that this
> comment implies that a deadlock would always occur rather than might occur.

Agreed, "prevent deadlock" might be too strong in this context.

> 
> I was not clear on this as I was thinking the read mmap_lock was
> non-recursive.
> 
> So I think we have 3 cases only 1 of which will actually deadlock and is, as
> Dave puts it, currently theoretical.
> 
> 1) Different mm's are in play (no issue)
> 2) Readlock implementation is recursive and same mm is in play (no issue)
> 3) Readlock implementation is _not_ recursive (issue)
> 
> In both 1 and 2 lockdep is incorrectly flagging the issue but 3 is a problem
> and I think this is what Andrea was thinking.
> 
> Is that the case?

IMHO it would be good enough to just mention lockdep (as it can definitely
trigger) or just quote similarly as Andrea's original comment somehow:

  If the rwsem starves writers it wasn't strictly a bug but lockdep
  doesn't like it and this avoids depending on lowlevel implementation
  details of the lock.

IIUC no deadlock could really trigger at that time or Andrea should have
written it in some other way.

It also has actually summarized the goal that then we won't rely on rwsem
impl but just make it always work.

Thanks,

> 
> If so the above comment is incorrectly worded and I should update it.
> 
> Ira
> 
> > +		pagefault_disable();
> >  		ret = copy_from_user(page_kaddr,
> >  				     (const void __user *)src_addr,
> >  				     PAGE_SIZE);
> > +		pagefault_enable();
> >  		kunmap_local(page_kaddr);
> > 
> >  		/* fallback to copy_from_user outside mmap_lock */
> > --
> > 2.37.2
> > 
diff --git a/mm/shmem.c b/mm/shmem.c
index 8280a5cb48df..c1bca31cd485 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2424,9 +2424,16 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 
 	if (!zeropage) {	/* COPY */
 		page_kaddr = kmap_local_folio(folio, 0);
+		/*
+		 * The mmap_lock is held here. Disable page faults to
+		 * prevent deadlock should copy_from_user() fault. The
+		 * copy will be retried outside the mmap_lock.
+		 */
+		pagefault_disable();
 		ret = copy_from_user(page_kaddr,
 				     (const void __user *)src_addr,
 				     PAGE_SIZE);
+		pagefault_enable();
 		kunmap_local(page_kaddr);
 
 		/* fallback to copy_from_user outside mmap_lock */
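The "/* fallback to copy_from_user outside mmap_lock */" comment refers to
the caller-side retry. A simplified sketch of that flow, loosely based on
the __mcopy_atomic() loop in mm/userfaultfd.c of this era (names trimmed and
error handling elided; treat it as an outline, not verbatim source):

	ret = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
				     dst_addr, src_addr, zeropage, &page);
	if (unlikely(ret == -ENOENT)) {
		/*
		 * The fast copy under mmap_lock came up short; the folio
		 * was stashed in *pagep.  Drop the lock, do the copy with
		 * faults enabled, then retry the pte installation.
		 */
		void *page_kaddr;

		mmap_read_unlock(dst_mm);

		page_kaddr = kmap(page);
		ret = copy_from_user(page_kaddr,
				     (const void __user *)src_addr,
				     PAGE_SIZE);
		kunmap(page);
		if (unlikely(ret))
			goto out;	/* a genuine -EFAULT */

		mmap_read_lock(dst_mm);
		goto retry;		/* redo with the now-filled page */
	}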