Message ID | 20230519093953.10972-1-arnd@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp1106270vqo; Fri, 19 May 2023 02:44:58 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7NhB0zjnKBEyn/cocfs2Dn6WQmb+cRUuamGbGQ6ezrspyB9abPyaK7NuUJVLi26GFyxyyg X-Received: by 2002:a17:902:c944:b0:1ab:675:3e0c with SMTP id i4-20020a170902c94400b001ab06753e0cmr2804731pla.33.1684489498438; Fri, 19 May 2023 02:44:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684489498; cv=none; d=google.com; s=arc-20160816; b=Nx+JrgAUwhmm/Pc665xy1lEO55UjJ5KDBVagcOxpOr8RvaH6aWFtizyWoHIG/Amon8 UrwTsXNefrJs1x3kx+JslL/WEbA1ycVD3eDwqvqA3/0jga4juJUTRpFO7wSuVfJkXhzx sOwuWUaMOnndDtvXqVxH3pSXn6pyrgqcV3eC2DupQ6V2NUyYqz5B6QOGdOU1Ppdouw0/ e9Srdu12BO0SP8/jSLVy76ORBCMxUUw6W2+1QYjQCbaMSM0SJOzGgooCCmUP/1mTGeA2 vXD9/+ysxYK0OhX0Wppbv4Mn2N7+kfNOhTggK89+2c708X7tE02Y6kJ4OJwxKeKAgCuW u3mA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=ezp62D7PEgdKiYtNUvBAWKopeE5PIJ1hfgWBtclaTd8=; b=E9n1pyd+GkHe9QZCzOjKkIuhh+zju9jvuyKQtxKj7nWs/NWn/K1bUmipksJl7sw4zj AdFDP33rUF+skI2WPrbW8gzypP63Wf7KUxuy91MSb8DwNhoQstRRl8v5jVPh8HHiNtWL mD25YCvB06KznRWyKU7XyTBtRYtwqtmBTY2YC6kjTwihR1nXewz+KCa7tqFdvV7Pn3Fs SGmjlSKZvMhO7FjV1tjuGEex1lwFVQvQAb5tZW69rHDyoDTOlUwpVepG9oizW9QsARJQ lTRAazx2h5Oq7o9rw7Rz0WKK454w9fRAhTttt8Cz+U+YsNLq9oa/UjW+L3bO78TS1Jmm K1Lg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=r5ynEHQ4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a2-20020a170902900200b001ae4d1c1288si3243085plp.279.2023.05.19.02.44.41; Fri, 19 May 2023 02:44:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=r5ynEHQ4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231165AbjESJn2 (ORCPT <rfc822;wlfightup@gmail.com> + 99 others); Fri, 19 May 2023 05:43:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47548 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231720AbjESJnK (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 19 May 2023 05:43:10 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0D00D2D79; Fri, 19 May 2023 02:41:18 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 1A3AF61167; Fri, 19 May 2023 09:40:05 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 00C86C433EF; Fri, 19 May 2023 09:40:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1684489204; bh=62m/R5D6FGhqWG6KoI9PHPEn7TdMdQqLFV3wk0LBtDc=; h=From:To:Cc:Subject:Date:From; b=r5ynEHQ4VnYlo+YQjrf4A6aNXFbJk6j3c64Y7p0QM+DrWvOKJbZh68x46Tqydb38O MzJEmqOAOCv/UjBxt/7hYmy5B5C8xXNk+Pk4slcPAxTzm5btMMHC8Tq4NLUcW99nsw EHkGmjo/iZ9JE/YY1bu16vysaEMjywfEh8dG2cv4T4U7F1sYQWJ137x0R7n1LKD+ug 5o7PlE0O4QN5ybvdtJkliLy1JQG4V3dwzKwfGg4D+x90k9W+F1nhotMLMP+WPAY/so mQGAkaUOWqty6IqB1mzWmq2/FnYe5i3AgFTYmLaxD46SMuJQ7Cr0Ckz7+VFoXphUWy fEs8pclnp2+HQ== From: Arnd Bergmann <arnd@kernel.org> To: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Arnaldo Carvalho de Melo <acme@kernel.org>, Mark Rutland <mark.rutland@arm.com>, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>, Ian Rogers <irogers@google.com>, Adrian Hunter <adrian.hunter@intel.com>, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-perf-users@vger.kernel.org Subject: [PATCH] [suggestion] mm/gup: avoid IS_ERR_OR_NULL Date: Fri, 19 May 2023 11:39:13 +0200 Message-Id: <20230519093953.10972-1-arnd@kernel.org> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1766315260182089491?= X-GMAIL-MSGID: =?utf-8?q?1766315260182089491?= |
Series |
[suggestion] mm/gup: avoid IS_ERR_OR_NULL
|
|
Commit Message
Arnd Bergmann
May 19, 2023, 9:39 a.m. UTC
From: Arnd Bergmann <arnd@arndb.de> While looking at an unused-variable warning, I noticed a new interface coming in that requires the use of IS_ERR_OR_NULL(), which tends to indicate bad interface design and is usually surprising to users. Change get_user_page_vma_remote() to return -EIO when no pages were found and adapt the callers to match. Fixes: eca1a00155df ("mm/gup: remove vmas parameter from get_user_pages_remote()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- I see the real bug is already fixed, but this seemed worth pointing out still. Not sure if this is the best way to handle the return types here, but the version in linux-next doesn't look great either. --- arch/arm64/kernel/mte.c | 4 ++-- include/linux/mm.h | 2 +- kernel/events/uprobes.c | 5 ++++- mm/memory.c | 2 +- 4 files changed, 8 insertions(+), 5 deletions(-)
Comments
Given you are sharply criticising the code I authored here, is it too much to ask for you to cc- me, the author on commentaries like this? Thanks. On Fri, May 19, 2023 at 11:39:13AM +0200, Arnd Bergmann wrote: > From: Arnd Bergmann <arnd@arndb.de> > > While looking at an unused-variable warning, I noticed a new interface coming > in that requires the use of IS_ERR_OR_NULL(), which tends to indicate bad > interface design and is usually surprising to users. I am not sure I understand your reasoning, why does it 'tend to indicate bad interface design'? You say that as if it is an obvious truth. Not obvious to me at all. There are 3 possible outcomes from the function - an error, the function failing to pin a page, or it succeeding in doing so. For some of the callers that results in an error, for others it is not an error. Overloading EIO on the assumption that gup will never, ever return this indicating an error seems to me a worse solution. > > Change get_user_page_vma_remote() to return -EIO when no pages were > found and adapt the callers to match. > > Fixes: eca1a00155df ("mm/gup: remove vmas parameter from get_user_pages_remote()") > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > I see the real bug is already fixed, but this seemed worth pointing out still. > Not sure if this is the best way to handle the return types here, but the version > in linux-next doesn't look great either. > --- > arch/arm64/kernel/mte.c | 4 ++-- > include/linux/mm.h | 2 +- > kernel/events/uprobes.c | 5 ++++- > mm/memory.c | 2 +- > 4 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c > index 4c5ef9b20065..6983ba35ce16 100644 > --- a/arch/arm64/kernel/mte.c > +++ b/arch/arm64/kernel/mte.c > @@ -434,8 +434,8 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr, > struct page *page = get_user_page_vma_remote(mm, addr, > gup_flags, &vma); > > - if (IS_ERR_OR_NULL(page)) { > - err = page == NULL ? -EIO : PTR_ERR(page); > + if (IS_ERR(page)) { > + err = PTR_ERR(page); > break; > } > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 42ff3e04c006..4bb172e4818c 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -2397,7 +2397,7 @@ static inline struct page *get_user_page_vma_remote(struct mm_struct *mm, > if (got < 0) > return ERR_PTR(got); > if (got == 0) > - return NULL; > + return ERR_PTR(-EIO); > > vma = vma_lookup(mm, addr); > if (WARN_ON_ONCE(!vma)) { > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > index cac3aef7c6f7..9cf2d4ba760e 100644 > --- a/kernel/events/uprobes.c > +++ b/kernel/events/uprobes.c > @@ -474,7 +474,10 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, > gup_flags |= FOLL_SPLIT_PMD; > /* Read the page with vaddr into memory */ > old_page = get_user_page_vma_remote(mm, vaddr, gup_flags, &vma); > - if (IS_ERR_OR_NULL(old_page)) > + if (old_page == ERR_PTR(-EIO)) > + return 0; > + > + if (IS_ERR(old_page)) > return PTR_ERR(old_page); I hate this, you're now using an error to indicate a non-error state. Also you have no idea whether get_user_page_vma_remote() has encountered an error condition returning -EIO rather than not pinning anything so this could also be broken. > > ret = verify_opcode(old_page, vaddr, &opcode); > diff --git a/mm/memory.c b/mm/memory.c > index 8358f3b853f2..f9a81278e76d 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -5604,7 +5604,7 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf, > struct page *page = get_user_page_vma_remote(mm, addr, > gup_flags, &vma); > > - if (IS_ERR_OR_NULL(page)) { > + if (IS_ERR(page)) { > #ifndef CONFIG_HAVE_IOREMAP_PROT > break; > #else > -- > 2.39.2 > Not a fan at all of this patch, it doesn't achieve anything useful, is in service of some theoretical improvement, and actually introduces a new class of bug (differentiating EIO and failing to pin).
On Fri, May 19, 2023, at 16:51, Lorenzo Stoakes wrote: > Given you are sharply criticising the code I authored here, is it too much > to ask for you to cc- me, the author on commentaries like this? Thanks. My mistake, I expected this to get added automatically based on the "Fixes:" tag, I probably dropped you by accident in the end. > On Fri, May 19, 2023 at 11:39:13AM +0200, Arnd Bergmann wrote: >> From: Arnd Bergmann <arnd@arndb.de> >> >> While looking at an unused-variable warning, I noticed a new interface coming >> in that requires the use of IS_ERR_OR_NULL(), which tends to indicate bad >> interface design and is usually surprising to users. > > I am not sure I understand your reasoning, why does it 'tend to indicate > bad interface design'? You say that as if it is an obvious truth. Not > obvious to me at all. > > There are 3 possible outcomes from the function - an error, the function > failing to pin a page, or it succeeding in doing so. For some of the > callers that results in an error, for others it is not an error. > > Overloading EIO on the assumption that gup will never, ever return this > indicating an error seems to me a worse solution. The problem is that we have inconsistent error handling in functions that return an object, about half of them use NULL to indicate an error, and the other half use ERR_PTR(), and users frequently get those wrong by picking the wrong one. Functions that can return both make this worse because whichever of the two normal ways a user expects, they still get it wrong. > Not a fan at all of this patch, it doesn't achieve anything useful, is in > service of some theoretical improvement, and actually introduces a new > class of bug (differentiating EIO and failing to pin). Having another -EIO return code is a problem, so I agree that my patch wouldn't be good either. Maybe separating the error return from the page pointer by passing a 'struct page **p' argument that gets filled would help? Arnd
On Fri, May 19, 2023 at 05:09:35PM +0200, Arnd Bergmann wrote: > On Fri, May 19, 2023, at 16:51, Lorenzo Stoakes wrote: > > Given you are sharply criticising the code I authored here, is it too much > > to ask for you to cc- me, the author on commentaries like this? Thanks. > > My mistake, I expected this to get added automatically based on > the "Fixes:" tag, I probably dropped you by accident in the end. > OK no worries, it's often the way that something is purely accidental but seems ruder than intended (or even rude at all) because text is a terrible format :) > > On Fri, May 19, 2023 at 11:39:13AM +0200, Arnd Bergmann wrote: > >> From: Arnd Bergmann <arnd@arndb.de> > >> > >> While looking at an unused-variable warning, I noticed a new interface coming > >> in that requires the use of IS_ERR_OR_NULL(), which tends to indicate bad > >> interface design and is usually surprising to users. > > > > I am not sure I understand your reasoning, why does it 'tend to indicate > > bad interface design'? You say that as if it is an obvious truth. Not > > obvious to me at all. > > > > There are 3 possible outcomes from the function - an error, the function > > failing to pin a page, or it succeeding in doing so. For some of the > > callers that results in an error, for others it is not an error. > > > > Overloading EIO on the assumption that gup will never, ever return this > > indicating an error seems to me a worse solution. > > The problem is that we have inconsistent error handling in functions > that return an object, about half of them use NULL to indicate an error, > and the other half use ERR_PTR(), and users frequently get those > wrong by picking the wrong one. Functions that can return both make > this worse because whichever of the two normal ways a user expects, > they still get it wrong. > > > Not a fan at all of this patch, it doesn't achieve anything useful, is in > > service of some theoretical improvement, and actually introduces a new > > class of bug (differentiating EIO and failing to pin). > > Having another -EIO return code is a problem, so I agree that > my patch wouldn't be good either. Maybe separating the error return > from the page pointer by passing a 'struct page **p' argument that > gets filled would help? Yeah I see your point, in the majority of cases failing to pin is an error, I just wonder if something like adding another parameter wouldn't just add more noise/confusion here than it saves? Sadly I think aspects of this are C sucking at dealing with multiple return values sanely, and there probably isn't a totally nice way of dealing with this. > > Arnd
On Fri, May 19, 2023 at 03:51:51PM +0100, Lorenzo Stoakes wrote: > Given you are sharply criticising the code I authored here, is it too much > to ask for you to cc- me, the author on commentaries like this? Thanks. > > On Fri, May 19, 2023 at 11:39:13AM +0200, Arnd Bergmann wrote: > > From: Arnd Bergmann <arnd@arndb.de> > > > > While looking at an unused-variable warning, I noticed a new interface coming > > in that requires the use of IS_ERR_OR_NULL(), which tends to indicate bad > > interface design and is usually surprising to users. > > I am not sure I understand your reasoning, why does it 'tend to indicate > bad interface design'? You say that as if it is an obvious truth. Not > obvious to me at all. > > There are 3 possible outcomes from the function - an error, the function > failing to pin a page, or it succeeding in doing so. For some of the > callers that results in an error, for others it is not an error. No, there really isn't. Either it pins the page or it doesn't. Returning "NULL" to mean a specific kind of failure was encountered is crazy.. Especially if we don't document what that specific failure even was. IIRC if you look really closely the only time get_user_pages() actually returns 0 is if the input argument validation fails, which I think is a bug that should be fixed. get_user_pages() never returns 0, so get_user_page_vma_remote() never returns NULL. Until we get there collapsing the 0 to EIO is perfectly fine. Jason
On Fri, May 19, 2023 at 07:17:41PM -0300, Jason Gunthorpe wrote: > On Fri, May 19, 2023 at 03:51:51PM +0100, Lorenzo Stoakes wrote: > > Given you are sharply criticising the code I authored here, is it too much > > to ask for you to cc- me, the author on commentaries like this? Thanks. > > > > On Fri, May 19, 2023 at 11:39:13AM +0200, Arnd Bergmann wrote: > > > From: Arnd Bergmann <arnd@arndb.de> > > > > > > While looking at an unused-variable warning, I noticed a new interface coming > > > in that requires the use of IS_ERR_OR_NULL(), which tends to indicate bad > > > interface design and is usually surprising to users. > > > > I am not sure I understand your reasoning, why does it 'tend to indicate > > bad interface design'? You say that as if it is an obvious truth. Not > > obvious to me at all. > > > > There are 3 possible outcomes from the function - an error, the function > > failing to pin a page, or it succeeding in doing so. For some of the > > callers that results in an error, for others it is not an error. > > No, there really isn't. > > Either it pins the page or it doesn't. Returning "NULL" to mean a > specific kind of failure was encountered is crazy.. Especially if we > don't document what that specific failure even was. > It's not a specific kind of failure, it's literally "I didn't pin any pages" which a caller may or may not choose to interpret as a failure. > IIRC if you look really closely the only time get_user_pages() > actually returns 0 is if the input argument validation fails, which I > think is a bug that should be fixed. That can be a reason for gup returning 0 but also if it you look at the main loop in __get_user_pages_locked(), if it can't find the VMA it will bail early, OR if the VMA flags are not as expected it'll bail early. > > get_user_pages() never returns 0, so get_user_page_vma_remote() never > returns NULL. Until we get there collapsing the 0 to EIO is perfectly > fine. Well no, as shown above actually there is a distinct third state, i.e. couldn't pin, which if you see there is at least one case where the caller differentiates between an error and not being able to pin - uprobe_write_opcode() - which treats failure to pin as a non-error state. Also if we decided at some point to return -EIO as an error suddenly we would be treating an error state as not an error state in the proposed code which sounds like a foot gun. > > Jason
On Sat, May 20, 2023 at 06:19:37AM +0100, Lorenzo Stoakes wrote: > On Fri, May 19, 2023 at 07:17:41PM -0300, Jason Gunthorpe wrote: > > On Fri, May 19, 2023 at 03:51:51PM +0100, Lorenzo Stoakes wrote: > > > Given you are sharply criticising the code I authored here, is it too much > > > to ask for you to cc- me, the author on commentaries like this? Thanks. > > > > > > On Fri, May 19, 2023 at 11:39:13AM +0200, Arnd Bergmann wrote: > > > > From: Arnd Bergmann <arnd@arndb.de> > > > > > > > > While looking at an unused-variable warning, I noticed a new interface coming > > > > in that requires the use of IS_ERR_OR_NULL(), which tends to indicate bad > > > > interface design and is usually surprising to users. > > > > > > I am not sure I understand your reasoning, why does it 'tend to indicate > > > bad interface design'? You say that as if it is an obvious truth. Not > > > obvious to me at all. > > > > > > There are 3 possible outcomes from the function - an error, the function > > > failing to pin a page, or it succeeding in doing so. For some of the > > > callers that results in an error, for others it is not an error. > > > > No, there really isn't. > > > > Either it pins the page or it doesn't. Returning "NULL" to mean a > > specific kind of failure was encountered is crazy.. Especially if we > > don't document what that specific failure even was. > > > > It's not a specific kind of failure, it's literally "I didn't pin any > pages" which a caller may or may not choose to interpret as a failure. Any time gup fails it didn't pin any pages, that is the whole point. All that is happening is some ill defined subset of gup errors are returning 0 instead of an error code. If we want to enable callers to ignore certain errors then we need to return error codes with well defined meanings, have documentation what errors are included and actually make it sane. > That can be a reason for gup returning 0 but also if it you look at the > main loop in __get_user_pages_locked(), if it can't find the VMA it will > bail early, OR if the VMA flags are not as expected it'll bail early. And how does that make any sense? Missing VMA should be EFAULT. > caller differentiates between an error and not being able to pin - > uprobe_write_opcode() - which treats failure to pin as a non-error state. That looks like a bug since the function returns 0 on success but it clearly didn't succeed. > Also if we decided at some point to return -EIO as an error suddenly we > would be treating an error state as not an error state in the proposed code > which sounds like a foot gun. No, this returning 0 on failure is a foot gun. Failing to pin a single page is always an error, the only question is what sub reason caused the error to happen. There is no third case where it is not an error. Jason
On Sat, May 20, 2023 at 05:25:52AM -0300, Jason Gunthorpe wrote: > On Sat, May 20, 2023 at 06:19:37AM +0100, Lorenzo Stoakes wrote: > > On Fri, May 19, 2023 at 07:17:41PM -0300, Jason Gunthorpe wrote: > > > On Fri, May 19, 2023 at 03:51:51PM +0100, Lorenzo Stoakes wrote: > > > > Given you are sharply criticising the code I authored here, is it too much > > > > to ask for you to cc- me, the author on commentaries like this? Thanks. > > > > > > > > On Fri, May 19, 2023 at 11:39:13AM +0200, Arnd Bergmann wrote: > > > > > From: Arnd Bergmann <arnd@arndb.de> > > > > > > > > > > While looking at an unused-variable warning, I noticed a new interface coming > > > > > in that requires the use of IS_ERR_OR_NULL(), which tends to indicate bad > > > > > interface design and is usually surprising to users. > > > > > > > > I am not sure I understand your reasoning, why does it 'tend to indicate > > > > bad interface design'? You say that as if it is an obvious truth. Not > > > > obvious to me at all. > > > > > > > > There are 3 possible outcomes from the function - an error, the function > > > > failing to pin a page, or it succeeding in doing so. For some of the > > > > callers that results in an error, for others it is not an error. > > > > > > No, there really isn't. > > > > > > Either it pins the page or it doesn't. Returning "NULL" to mean a > > > specific kind of failure was encountered is crazy.. Especially if we > > > don't document what that specific failure even was. > > > > > > > It's not a specific kind of failure, it's literally "I didn't pin any > > pages" which a caller may or may not choose to interpret as a failure. > > Any time gup fails it didn't pin any pages, that is the whole > point. All that is happening is some ill defined subset of gup errors > are returning 0 instead of an error code. > > If we want to enable callers to ignore certain errors then we need to > return error codes with well defined meanings, have documentation what > errors are included and actually make it sane. Yeah I agree it's not exactly great that a failure to pin can be considered an ordinary case, but I don't think a wrapper function is where we should be trying to fix this. In fact this patch isn't even fixing it, it's treating EIO as a success case for the (possibly broken) uprobe case. I think we are at the wrong level of abstraction here, basically. > > > That can be a reason for gup returning 0 but also if it you look at the > > main loop in __get_user_pages_locked(), if it can't find the VMA it will > > bail early, OR if the VMA flags are not as expected it'll bail early. > > And how does that make any sense? Missing VMA should be EFAULT. Yeah missing VMA doesn't really make sense since we invoke the function with the mmap lock held (it _could_ make sense if you were calling it via one of the unlocked functions, speculatively, though how sensible doing that is another discussion...) > > > caller differentiates between an error and not being able to pin - > > uprobe_write_opcode() - which treats failure to pin as a non-error state. > > That looks like a bug since the function returns 0 on success but it > clearly didn't succeed. The code is specifically handling a failure-to-pin separately - set_swbp() -> uprobe_write_opcode() -> install_breakpoint() explicitly does the following:- ret = set_swbp(&uprobe->arch, mm, vaddr); if (!ret) clear_bit(MMF_RECALC_UPROBES, &mm->flags); So even if this is... questionable, the code literally does want to differentiate between an error and a failure to pin. Presumably this is because of the flag check, but yeah we should be differentiating between sub-cases. > > > Also if we decided at some point to return -EIO as an error suddenly we > > would be treating an error state as not an error state in the proposed code > > which sounds like a foot gun. > > No, this returning 0 on failure is a foot gun. Failing to pin a single > page is always an error, the only question is what sub reason caused > the error to happen. There is no third case where it is not an error. > > Jason The uprobe path thinks otherwise, but maybe the answer is that we just need to -EFAULT on missing VMA and -EPERM on invalid flags. I could look into a patch that tries to undo this convention, and then we could revisit this for the wrapper function too.
On Sat, May 20, 2023 at 10:12:40AM +0100, Lorenzo Stoakes wrote: > > No, this returning 0 on failure is a foot gun. Failing to pin a single > > page is always an error, the only question is what sub reason caused > > the error to happen. There is no third case where it is not an error. > > The uprobe path thinks otherwise, but maybe the answer is that we just need > to -EFAULT on missing VMA and -EPERM on invalid flags. I think uprobe is just broken to think there is a third outcome. Let's just fix it instead of trying to pretend it makes sense. Jason
On Sat, May 27, 2023 at 06:52:01AM -0300, Jason Gunthorpe wrote: > On Sat, May 20, 2023 at 10:12:40AM +0100, Lorenzo Stoakes wrote: > > > No, this returning 0 on failure is a foot gun. Failing to pin a single > > > page is always an error, the only question is what sub reason caused > > > the error to happen. There is no third case where it is not an error. > > > > The uprobe path thinks otherwise, but maybe the answer is that we just need > > to -EFAULT on missing VMA and -EPERM on invalid flags. > > I think uprobe is just broken to think there is a third outcome. Let's > just fix it instead of trying to pretend it makes sense. Sure, will take a look at that if I get a chance. We can at the very least adjust get_user_page_vma_remote() with this fixed. Do you feel that a partially successful pinning for other GUP callers should equally be treated as an error (and pages unpinned -> return error code)? In that instance we'd need to audit things somewhat. I have a couple more GUP cleanups saved up, so could add that to my queue of things to look at between book work :) > > Jason > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Sun, May 28, 2023 at 04:13:44PM +0100, Lorenzo Stoakes wrote: > On Sat, May 27, 2023 at 06:52:01AM -0300, Jason Gunthorpe wrote: > > On Sat, May 20, 2023 at 10:12:40AM +0100, Lorenzo Stoakes wrote: > > > > No, this returning 0 on failure is a foot gun. Failing to pin a single > > > > page is always an error, the only question is what sub reason caused > > > > the error to happen. There is no third case where it is not an error. > > > > > > The uprobe path thinks otherwise, but maybe the answer is that we just need > > > to -EFAULT on missing VMA and -EPERM on invalid flags. > > > > I think uprobe is just broken to think there is a third outcome. Let's > > just fix it instead of trying to pretend it makes sense. > > Sure, will take a look at that if I get a chance. We can at the very least > adjust get_user_page_vma_remote() with this fixed. > > Do you feel that a partially successful pinning for other GUP callers > should equally be treated as an error (and pages unpinned -> return error > code)? In that instance we'd need to audit things somewhat. That seems more deeply ingrained at least, I'm not as keen to change it as to get rid of the 0 return result. Jason
On 5/28/23 09:22, Jason Gunthorpe wrote: > On Sun, May 28, 2023 at 04:13:44PM +0100, Lorenzo Stoakes wrote: >> On Sat, May 27, 2023 at 06:52:01AM -0300, Jason Gunthorpe wrote: >>> On Sat, May 20, 2023 at 10:12:40AM +0100, Lorenzo Stoakes wrote: >>>>> No, this returning 0 on failure is a foot gun. Failing to pin a single >>>>> page is always an error, the only question is what sub reason caused >>>>> the error to happen. There is no third case where it is not an error. >>>> >>>> The uprobe path thinks otherwise, but maybe the answer is that we just need >>>> to -EFAULT on missing VMA and -EPERM on invalid flags. >>> >>> I think uprobe is just broken to think there is a third outcome. Let's >>> just fix it instead of trying to pretend it makes sense. >> >> Sure, will take a look at that if I get a chance. We can at the very least >> adjust get_user_page_vma_remote() with this fixed. Great! We've had previous discussions about getting rid of this pseudo-tristate errno in gup, so I just wanted to mention that I'm also glad to see the movement toward, "return some pages, or else a -errno". That's progress. >> >> Do you feel that a partially successful pinning for other GUP callers >> should equally be treated as an error (and pages unpinned -> return error >> code)? In that instance we'd need to audit things somewhat. > > That seems more deeply ingrained at least, I'm not as keen to change > it as to get rid of the 0 return result. > Yes. It's not just "audit things somewhat", it's more like, "change quite a few call sites, some of which actually gather sets of partial results in a retry loop". So some actual coding changes there. thanks,
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c index 4c5ef9b20065..6983ba35ce16 100644 --- a/arch/arm64/kernel/mte.c +++ b/arch/arm64/kernel/mte.c @@ -434,8 +434,8 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr, struct page *page = get_user_page_vma_remote(mm, addr, gup_flags, &vma); - if (IS_ERR_OR_NULL(page)) { - err = page == NULL ? -EIO : PTR_ERR(page); + if (IS_ERR(page)) { + err = PTR_ERR(page); break; } diff --git a/include/linux/mm.h b/include/linux/mm.h index 42ff3e04c006..4bb172e4818c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2397,7 +2397,7 @@ static inline struct page *get_user_page_vma_remote(struct mm_struct *mm, if (got < 0) return ERR_PTR(got); if (got == 0) - return NULL; + return ERR_PTR(-EIO); vma = vma_lookup(mm, addr); if (WARN_ON_ONCE(!vma)) { diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index cac3aef7c6f7..9cf2d4ba760e 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -474,7 +474,10 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, gup_flags |= FOLL_SPLIT_PMD; /* Read the page with vaddr into memory */ old_page = get_user_page_vma_remote(mm, vaddr, gup_flags, &vma); - if (IS_ERR_OR_NULL(old_page)) + if (old_page == ERR_PTR(-EIO)) + return 0; + + if (IS_ERR(old_page)) return PTR_ERR(old_page); ret = verify_opcode(old_page, vaddr, &opcode); diff --git a/mm/memory.c b/mm/memory.c index 8358f3b853f2..f9a81278e76d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -5604,7 +5604,7 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf, struct page *page = get_user_page_vma_remote(mm, addr, gup_flags, &vma); - if (IS_ERR_OR_NULL(page)) { + if (IS_ERR(page)) { #ifndef CONFIG_HAVE_IOREMAP_PROT break; #else