Message ID | 20240227070418.62292-1-ioworker0@gmail.com |
---|---|
State | New |
Headers |
From: Lance Yang <ioworker0@gmail.com>
To: akpm@linux-foundation.org
Cc: ryan.roberts@arm.com, 21cnbao@gmail.com, david@redhat.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 1/1] mm/memory: Fix boundary check for next PFN in folio_pte_batch()
Date: Tue, 27 Feb 2024 15:04:18 +0800
Message-Id: <20240227070418.62292-1-ioworker0@gmail.com> |
Series |
[1/1] mm/memory: Fix boundary check for next PFN in folio_pte_batch()
Commit Message
Lance Yang
Feb. 27, 2024, 7:04 a.m. UTC
Previously, in folio_pte_batch(), only the upper boundary of the
folio was checked, using '>=' for comparison. This led to
incorrect behavior when the next PFN fell below the lower boundary
of the folio, especially in corner cases where the next PFN might
fall into a different folio.
Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
mm/memory.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
Comments
On 27.02.24 08:04, Lance Yang wrote:
> Previously, in folio_pte_batch(), only the upper boundary of the
> folio was checked using '>=' for comparison. This led to
> incorrect behavior when the next PFN exceeded the lower boundary
> of the folio, especially in corner cases where the next PFN might
> fall into a different folio.

Which commit does this fix?

The introducing commit (f8d937761d65c87e9987b88ea7beb7bddc333a0e) is
already in mm-stable, so we would need a Fixes: tag. Unless, Ryan's
changes introduced a problem.

BUT

I don't see what is broken. :)

Can you please give an example/reproducer?

We know that the first PTE maps the folio. By incrementing the PFN using
pte_next_pfn/pte_advance_pfn, we cannot suddenly get a lower PFN.

So how would pte_advance_pfn(folio_start_pfn + X) suddenly give us a PFN
lower than folio_start_pfn?

Note that we are not really concerned about any kind of
pte_advance_pfn() overflow that could generate PFN=0. I convinced myself
that that is something we don't have to worry about.

[I also thought about getting rid of the pte_pfn(pte) >= folio_end_pfn
check and instead limiting end_ptep. But that requires more work before
the loop and feels more like a micro-optimization.]
>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
>  mm/memory.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 642b4f2be523..e5291d1e8c37 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -986,12 +986,15 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
>  		pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
>  		bool *any_writable)
>  {
> -	unsigned long folio_end_pfn = folio_pfn(folio) + folio_nr_pages(folio);
> +	unsigned long folio_start_pfn, folio_end_pfn;
>  	const pte_t *end_ptep = start_ptep + max_nr;
>  	pte_t expected_pte, *ptep;
>  	bool writable;
>  	int nr;
>
> +	folio_start_pfn = folio_pfn(folio);
> +	folio_end_pfn = folio_start_pfn + folio_nr_pages(folio);
> +
>  	if (any_writable)
>  		*any_writable = false;
>
> @@ -1015,7 +1018,7 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
>  		 * corner cases the next PFN might fall into a different
>  		 * folio.
>  		 */
> -		if (pte_pfn(pte) >= folio_end_pfn)
> +		if (pte_pfn(pte) >= folio_end_pfn || pte_pfn(pte) < folio_start_pfn)
>  			break;
>
>  		if (any_writable)
Hey David,

Thanks for taking time to review!

On Tue, Feb 27, 2024 at 3:30 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 27.02.24 08:04, Lance Yang wrote:
> > Previously, in folio_pte_batch(), only the upper boundary of the
> > folio was checked using '>=' for comparison. This led to
> > incorrect behavior when the next PFN exceeded the lower boundary
> > of the folio, especially in corner cases where the next PFN might
> > fall into a different folio.
>
> Which commit does this fix?
>
> The introducing commit (f8d937761d65c87e9987b88ea7beb7bddc333a0e) is
> already in mm-stable, so we would need a Fixes: tag. Unless, Ryan's
> changes introduced a problem.
>
> BUT
>
> I don't see what is broken. :)
>
> Can you please give an example/reproducer?

For example1:

PTE0 is present for large folio1.
PTE1 is present for large folio1.
PTE2 is present for large folio1.
PTE3 is present for large folio1.

folio_nr_pages(folio1) is 4.
folio_nr_pages(folio2) is 4.

pte = *start_ptep = PTE0;
max_nr = folio_nr_pages(folio2);

If folio_pfn(folio1) < folio_pfn(folio2), the return value of
folio_pte_batch(folio2, start_ptep, pte, max_nr) will be 4 (actually it
should be 0).

For example2:

PTE0 is present for large folio2.
PTE1 is present for large folio1.
PTE2 is present for large folio1.
PTE3 is present for large folio1.

folio_nr_pages(folio1) is 4.
folio_nr_pages(folio2) is 4.

pte = *start_ptep = PTE0;
max_nr = folio_nr_pages(folio1);

If max_nr = 4, the return value of folio_pte_batch(folio1, start_ptep,
pte, max_nr) will be 1 (actually it should be 0).

folio_pte_batch() will soon be exported, and IMO, these corner cases may
need to be handled.

Thanks,
Lance

> We know that the first PTE maps the folio. By incrementing the PFN using
> pte_next_pfn/pte_advance_pfn, we cannot suddenly get a lower PFN.
>
> So how would pte_advance_pfn(folio_start_pfn + X) suddenly give us a PFN
> lower than folio_start_pfn?
>
> Note that we are not really concerned about any kind of
> pte_advance_pfn() overflow that could generate PFN=0. I convinces myself
> that that that is something we don't have to worry about.
>
> [I also thought about getting rid of the pte_pfn(pte) >= folio_end_pfn
> and instead limiting end_ptep. But that requires more work before the
> loop and feels more like a micro-optimization.]
>
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
> >  mm/memory.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 642b4f2be523..e5291d1e8c37 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -986,12 +986,15 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
> >  		pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
> >  		bool *any_writable)
> >  {
> > -	unsigned long folio_end_pfn = folio_pfn(folio) + folio_nr_pages(folio);
> > +	unsigned long folio_start_pfn, folio_end_pfn;
> >  	const pte_t *end_ptep = start_ptep + max_nr;
> >  	pte_t expected_pte, *ptep;
> >  	bool writable;
> >  	int nr;
> >
> > +	folio_start_pfn = folio_pfn(folio);
> > +	folio_end_pfn = folio_start_pfn + folio_nr_pages(folio);
> > +
> >  	if (any_writable)
> >  		*any_writable = false;
> >
> > @@ -1015,7 +1018,7 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
> >  		 * corner cases the next PFN might fall into a different
> >  		 * folio.
> >  		 */
> > -		if (pte_pfn(pte) >= folio_end_pfn)
> > +		if (pte_pfn(pte) >= folio_end_pfn || pte_pfn(pte) < folio_start_pfn)
> >  			break;
> >
> >  		if (any_writable)
>
> --
> Cheers,
>
> David / dhildenb
>
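Example1 above can be reproduced with a small user-space model of the walk. Everything here is a hypothetical simplification for illustration -- a "PTE" is reduced to its PFN, and `batch_upper_only` stands in for the pre-patch folio_pte_batch() with only the upper-bound check -- this is not kernel code:

```c
#include <assert.h>

/*
 * Sketch of the pre-patch walk: a "PTE" is modeled as just its PFN, and
 * only the upper folio boundary is checked, mirroring
 *     if (pte_pfn(pte) >= folio_end_pfn)
 *         break;
 */
static int batch_upper_only(const unsigned long *pte_pfns, int max_nr,
                            unsigned long folio_start_pfn,
                            unsigned long folio_nr_pages)
{
    unsigned long folio_end_pfn = folio_start_pfn + folio_nr_pages;
    unsigned long expected_pfn = pte_pfns[0] + 1;
    int nr = 1; /* the first PTE is trusted, never boundary-checked */

    while (nr < max_nr && pte_pfns[nr] == expected_pfn &&
           pte_pfns[nr] < folio_end_pfn) {
        expected_pfn++;
        nr++;
    }
    return nr;
}
```

With PTE0..PTE3 mapping folio1 at PFN 0..3 but folio2's bounds (say PFN 100..103) passed in, the upper-bound check never fires, all four contiguous PTEs are batched, and the model returns the 4 that example1 predicts.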
On 27.02.24 09:23, Lance Yang wrote:
> Hey David,
>
> Thanks for taking time to review!
>
> On Tue, Feb 27, 2024 at 3:30 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 27.02.24 08:04, Lance Yang wrote:
>>> Previously, in folio_pte_batch(), only the upper boundary of the
>>> folio was checked using '>=' for comparison. This led to
>>> incorrect behavior when the next PFN exceeded the lower boundary
>>> of the folio, especially in corner cases where the next PFN might
>>> fall into a different folio.
>>
>> Which commit does this fix?
>>
>> The introducing commit (f8d937761d65c87e9987b88ea7beb7bddc333a0e) is
>> already in mm-stable, so we would need a Fixes: tag. Unless, Ryan's
>> changes introduced a problem.
>>
>> BUT
>>
>> I don't see what is broken. :)
>>
>> Can you please give an example/reproducer?
>
> For example1:
>
> PTE0 is present for large folio1.
> PTE1 is present for large folio1.
> PTE2 is present for large folio1.
> PTE3 is present for large folio1.
>
> folio_nr_pages(folio1) is 4.
> folio_nr_pages(folio2) is 4.
>
> pte = *start_ptep = PTE0;
> max_nr = folio_nr_pages(folio2);
>
> If folio_pfn(folio1) < folio_pfn(folio2),
> the return value of folio_pte_batch(folio2, start_ptep, pte, max_nr)
> will be 4 (actually it should be 0).
>
> For example2:
>
> PTE0 is present for large folio2.
> PTE1 is present for large folio1.
> PTE2 is present for large folio1.
> PTE3 is present for large folio1.
>
> folio_nr_pages(folio1) is 4.
> folio_nr_pages(folio2) is 4.
>
> pte = *start_ptep = PTE0;
> max_nr = folio_nr_pages(folio1);
>

In both cases, start_ptep does not map the folio.

It's a BUG in your caller unless I am missing something important.

> If max_nr=4, the return value of folio_pte_batch(folio1, start_ptep,
> pte, max_nr)
> will be 1 (actually it should be 0).
>
> folio_pte_batch() will soon be exported, and IMO, these corner cases may need
> to be handled.

No, you should fix your caller.

The function cannot possibly do something reasonable if start_ptep does
not map the folio.

	nr = pte_batch_hint(start_ptep, pte);
	...
	ptep = start_ptep + nr;                /* nr is >= 1 */
	...
	return min(ptep - start_ptep, max_nr); /* will return something > 0 */

Which would return > 0 for something that does not map that folio.

I was trying to avoid official kernel docs for this internal helper,
maybe we have to improve it now.
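The return path quoted above can be checked in isolation. The model below is hypothetical and deliberately minimal: `pte_batch_hint()` is reduced to the property that matters here (every implementation, including the generic fallback, reports at least 1), and only the final `min()` is kept:

```c
#include <assert.h>

/*
 * Model of the quoted return path: nr = pte_batch_hint(...) is >= 1,
 * ptep starts at start_ptep + nr, so min(ptep - start_ptep, max_nr)
 * with max_nr >= 1 can never yield 0.
 */
static int batch_return_model(int hint_nr, int max_nr)
{
    int nr = hint_nr >= 1 ? hint_nr : 1; /* pte_batch_hint() >= 1 */
    int walked = nr;                     /* ptep - start_ptep >= nr */

    return walked < max_nr ? walked : max_nr; /* min(..., max_nr) */
}
```

So even when start_ptep does not map the folio at all, the helper reports a batch of at least one PTE -- which is why the caller, not an extra boundary check, has to guarantee that the first PTE maps the folio.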
On Tue, Feb 27, 2024 at 4:33 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 27.02.24 09:23, Lance Yang wrote:
> > Hey David,
> >
> > Thanks for taking time to review!
> >
> > On Tue, Feb 27, 2024 at 3:30 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 27.02.24 08:04, Lance Yang wrote:
> >>> Previously, in folio_pte_batch(), only the upper boundary of the
> >>> folio was checked using '>=' for comparison. This led to
> >>> incorrect behavior when the next PFN exceeded the lower boundary
> >>> of the folio, especially in corner cases where the next PFN might
> >>> fall into a different folio.
> >>
> >> Which commit does this fix?
> >>
> >> The introducing commit (f8d937761d65c87e9987b88ea7beb7bddc333a0e) is
> >> already in mm-stable, so we would need a Fixes: tag. Unless, Ryan's
> >> changes introduced a problem.
> >>
> >> BUT
> >>
> >> I don't see what is broken. :)
> >>
> >> Can you please give an example/reproducer?
> >
> > For example1:
> >
> > PTE0 is present for large folio1.
> > PTE1 is present for large folio1.
> > PTE2 is present for large folio1.
> > PTE3 is present for large folio1.
> >
> > folio_nr_pages(folio1) is 4.
> > folio_nr_pages(folio2) is 4.
> >
> > pte = *start_ptep = PTE0;
> > max_nr = folio_nr_pages(folio2);
> >
> > If folio_pfn(folio1) < folio_pfn(folio2),
> > the return value of folio_pte_batch(folio2, start_ptep, pte, max_nr)
> > will be 4 (actually it should be 0).
> >
> > For example2:
> >
> > PTE0 is present for large folio2.
> > PTE1 is present for large folio1.
> > PTE2 is present for large folio1.
> > PTE3 is present for large folio1.
> >
> > folio_nr_pages(folio1) is 4.
> > folio_nr_pages(folio2) is 4.
> >
> > pte = *start_ptep = PTE0;
> > max_nr = folio_nr_pages(folio1);
> >
>
> In both cases, start_ptep does not map the folio.
>
> It's a BUG in your caller unless I am missing something important.

Sorry, I understood.

Thanks for your clarification!

Lance

> > If max_nr=4, the return value of folio_pte_batch(folio1, start_ptep,
> > pte, max_nr)
> > will be 1 (actually it should be 0).
> >
> > folio_pte_batch() will soon be exported, and IMO, these corner cases may need
> > to be handled.
>
> No, you should fix your caller.
>
> The function cannot possibly do something reasonable if start_ptep does
> not map the folio.
>
>         nr = pte_batch_hint(start_ptep, pte);
>         ...
>         ptep = start_ptep + nr;                /* nr is >= 1 */
>         ...
>         return min(ptep - start_ptep, max_nr); /* will return something > 0 */
>
> Which would return > 0 for something that does not map that folio.
>
> I was trying to avoid official kernel docs for this internal helper,
> maybe we have to improve it now.
>
> --
> Cheers,
>
> David / dhildenb
>
On 27.02.24 09:45, Lance Yang wrote:
> On Tue, Feb 27, 2024 at 4:33 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 27.02.24 09:23, Lance Yang wrote:
>>> Hey David,
>>>
>>> Thanks for taking time to review!
>>>
>>> On Tue, Feb 27, 2024 at 3:30 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 27.02.24 08:04, Lance Yang wrote:
>>>>> Previously, in folio_pte_batch(), only the upper boundary of the
>>>>> folio was checked using '>=' for comparison. This led to
>>>>> incorrect behavior when the next PFN exceeded the lower boundary
>>>>> of the folio, especially in corner cases where the next PFN might
>>>>> fall into a different folio.
>>>>
>>>> Which commit does this fix?
>>>>
>>>> The introducing commit (f8d937761d65c87e9987b88ea7beb7bddc333a0e) is
>>>> already in mm-stable, so we would need a Fixes: tag. Unless, Ryan's
>>>> changes introduced a problem.
>>>>
>>>> BUT
>>>>
>>>> I don't see what is broken. :)
>>>>
>>>> Can you please give an example/reproducer?
>>>
>>> For example1:
>>>
>>> PTE0 is present for large folio1.
>>> PTE1 is present for large folio1.
>>> PTE2 is present for large folio1.
>>> PTE3 is present for large folio1.
>>>
>>> folio_nr_pages(folio1) is 4.
>>> folio_nr_pages(folio2) is 4.
>>>
>>> pte = *start_ptep = PTE0;
>>> max_nr = folio_nr_pages(folio2);
>>>
>>> If folio_pfn(folio1) < folio_pfn(folio2),
>>> the return value of folio_pte_batch(folio2, start_ptep, pte, max_nr)
>>> will be 4 (actually it should be 0).
>>>
>>> For example2:
>>>
>>> PTE0 is present for large folio2.
>>> PTE1 is present for large folio1.
>>> PTE2 is present for large folio1.
>>> PTE3 is present for large folio1.
>>>
>>> folio_nr_pages(folio1) is 4.
>>> folio_nr_pages(folio2) is 4.
>>>
>>> pte = *start_ptep = PTE0;
>>> max_nr = folio_nr_pages(folio1);
>>>
>>
>> In both cases, start_ptep does not map the folio.
>>
>> It's a BUG in your caller unless I am missing something important.
>
> Sorry, I understood.
>
> Thanks for your clarification!

I'll post some kernel doc as reply to Barry's export patch to clarify
that.
diff --git a/mm/memory.c b/mm/memory.c
index 642b4f2be523..e5291d1e8c37 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -986,12 +986,15 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
 		pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
 		bool *any_writable)
 {
-	unsigned long folio_end_pfn = folio_pfn(folio) + folio_nr_pages(folio);
+	unsigned long folio_start_pfn, folio_end_pfn;
 	const pte_t *end_ptep = start_ptep + max_nr;
 	pte_t expected_pte, *ptep;
 	bool writable;
 	int nr;
 
+	folio_start_pfn = folio_pfn(folio);
+	folio_end_pfn = folio_start_pfn + folio_nr_pages(folio);
+
 	if (any_writable)
 		*any_writable = false;
 
@@ -1015,7 +1018,7 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
 		 * corner cases the next PFN might fall into a different
 		 * folio.
 		 */
-		if (pte_pfn(pte) >= folio_end_pfn)
+		if (pte_pfn(pte) >= folio_end_pfn || pte_pfn(pte) < folio_start_pfn)
 			break;
 
 		if (any_writable)
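For comparison, the patched walk with both boundary checks can be modeled the same way as a hypothetical user-space sketch (a "PTE" again reduced to its PFN; not kernel code):

```c
#include <assert.h>

/*
 * Sketch of the patched walk: both bounds are checked, mirroring
 *     if (pte_pfn(pte) >= folio_end_pfn || pte_pfn(pte) < folio_start_pfn)
 *         break;
 * The first PTE is still trusted, so a mismatched caller gets 1, not 0.
 */
static int batch_both_bounds(const unsigned long *pte_pfns, int max_nr,
                             unsigned long folio_start_pfn,
                             unsigned long folio_nr_pages)
{
    unsigned long folio_end_pfn = folio_start_pfn + folio_nr_pages;
    unsigned long expected_pfn = pte_pfns[0] + 1;
    int nr = 1; /* first PTE is assumed to map the folio */

    while (nr < max_nr && pte_pfns[nr] == expected_pfn &&
           pte_pfns[nr] >= folio_start_pfn &&
           pte_pfns[nr] < folio_end_pfn) {
        expected_pfn++;
        nr++;
    }
    return nr;
}
```

A well-formed caller (PTEs mapping the folio whose bounds are passed) still batches all four PTEs. Example1's mismatched caller now stops at 1 instead of 4 -- smaller, but still not the 0 the patch description expected, which matches the review conclusion that such callers must be fixed rather than worked around in the boundary check.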