Message ID | 20240212104448.2589568-8-kirill.shutemov@linux.intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-61357-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:50ea:b0:106:860b:bbdd with SMTP id r10csp2381630dyd; Mon, 12 Feb 2024 04:04:26 -0800 (PST) X-Google-Smtp-Source: AGHT+IF0xn7AXTpOGBIumm5uHkdeIE1Fb3cqm4LKMrQc74eTr0EaQcuo7sN7JFdAtMQVLVMNGsgw X-Received: by 2002:a17:902:d501:b0:1da:2c01:fef5 with SMTP id b1-20020a170902d50100b001da2c01fef5mr3461029plg.56.1707739466161; Mon, 12 Feb 2024 04:04:26 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707739466; cv=pass; d=google.com; s=arc-20160816; b=dRIMCmRvGwRv00mVR/rNnaaiGZaizZt3/qvb/hlXU57a7twImIttHMmAOr7hQr/OJY 3Qn8/S3JydSGUO1vMy8SXoJLZO2QiVjuRqOr9EmTtIZZ1CSyukeLePx1EZ/7e2JKKpt2 vshVRwrVfgGYZ+mZgexC/rH9Y9w4OTbGZHlJhJzVzgxnLYs3UvT+kXn/OpTlVOPr9RGP uF7wX9xoyH3deF5xDuuMpYaJCzz13D0E0+Vb8izx2obONJfU2t/Z/BDqbC7WXOsAfIC1 WLiLI8+REHruwb4MU0k9oTq7KkRcTBdH0Xs6fr+aAkL+Zen69YNDiIUg0OZx5SIgQKyy rIvQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=kxw9uVRF0uklRfRh6iTwvS8jLXK1PeQrAkZPaXxZPCw=; fh=6rLdNDfjENId5Me7Uud0sAEp/o76D4mz3U8D22/LUIk=; b=oQDpseQJxp9vgwpklahxxmMwcbr4zwbXRBhtcTBCmIvRu11wtNHRFD9r/lG4MU74rY eaBqhp5oQ2uOxcvJlKmzbnqu9Vy4s4b1yNuwisbeml5v2AnO3BwNgM9dXZ8JhI9BRzt8 M4Ppnil+dzaYtpRN4qEibRJiQCNr0SgvlePYZ7CxBfozDcxcWJRH19JW7FUP9cqgRT4e I99vbYufqDFdwF9CbZUOX0H3Hf789xzgr1GT9nZWQlO9j950gjTyI46XEUExvnsShlB+ 7xFJAvZKT8BsvmbGUqqjJ0pGu8YF7iCdP3Voqx0DKGBC4x7fyX8sL4Iy3XXtOCfMToai mJXg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=RGgpLoD2; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-61357-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-61357-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Forwarded-Encrypted: i=2; AJvYcCVTWTbUJo3VZpqS3xmoO1sx9LeGyPEmZTiAFuRthwVnbhPI2ERjTBd8jFHkVLGlYmvPxUfxetXwD0DOYW94UaxvtzquQg== Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id s21-20020a656915000000b005cde42fcb11si141471pgq.707.2024.02.12.04.04.26 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 12 Feb 2024 04:04:26 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-61357-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=RGgpLoD2; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-61357-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-61357-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 70ED6285C69 for <ouuuleilei@gmail.com>; Mon, 12 Feb 2024 10:48:36 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B43363C070; Mon, 12 Feb 2024 10:45:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="RGgpLoD2" Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4406B3A8D5 for <linux-kernel@vger.kernel.org>; Mon, 12 Feb 2024 10:45:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707734708; cv=none; b=JENKDga0DLLeACPA8tewhBoOEToWoftolWvVdpozHWUxsAcyog6gWlWShPzJinF5+DCROMTQrc5UmYG4sFKTCb1tpTicFy3GHhfIDHrCET15XN+82sOpQMi/qn1PwRXMWIUMSOhuQ5Izr3Eh2B+8xh2TxqJQ7NPfySyIS6oKOcc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707734708; c=relaxed/simple; bh=SxAJq8PySGkIldIS+Dq+6BevKxjnV/qVUXBP60oIUiw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=rY8M6HCaN1Y4CAKHnrrxDen3YPmktBcA0irOjBb/2FUTaB6CY8C5+Nro5obi1hG05jZi0PlpDUbcuEi+TgBrphd2K3OXFJ/8an0/kgoYMlxZvZkosyvqh54gWI0fbtWkgVTTn38CcRXUepAbgrFTl+8p3SqDYIoXu2ne+xb95es= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.helo=mgamail.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=RGgpLoD2; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.helo=mgamail.intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1707734705; x=1739270705; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SxAJq8PySGkIldIS+Dq+6BevKxjnV/qVUXBP60oIUiw=; b=RGgpLoD23pZM1KAvseQfdqsnUo9zf+Vh25P/SEJo81e5kHVkjDY625Ti 13a5h2HGe3e6gXCoW7ApB1DIirVB2QGbbtJ72xxmEqxT3s39sJU66KhMJ TvHSc3mK2hgVxGpgq6HujSHT10tcSBVlk1efJA7vYV8ytTGf28SJEP5Pj hkWPZJ6ze6kUN3gY4NmP4WtCGvma5Xg82FUWgdjQ3m/qmYdK6MCmp4rp3 ufhqPVyjDks87adoEfDfYCoVP1yZU7tYgfomZx1P/HqVf0PQLsuqixvx5 GxN4bA+LYQMeJuBdg3AAOzDp/REq4IX4vvZl3q+T6eMl6sQyrE+qSIbgA Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10981"; a="1585047" X-IronPort-AV: E=Sophos;i="6.05,263,1701158400"; d="scan'208";a="1585047" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Feb 2024 02:45:04 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10981"; a="935035597" X-IronPort-AV: E=Sophos;i="6.05,262,1701158400"; d="scan'208";a="935035597" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga001.fm.intel.com with ESMTP; 12 Feb 2024 02:45:00 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 483AE53A; Mon, 12 Feb 2024 12:44:53 +0200 (EET) From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> To: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, x86@kernel.org Cc: "Rafael J. Wysocki" <rafael@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Adrian Hunter <adrian.hunter@intel.com>, Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>, Elena Reshetova <elena.reshetova@intel.com>, Jun Nakajima <jun.nakajima@intel.com>, Rick Edgecombe <rick.p.edgecombe@intel.com>, Tom Lendacky <thomas.lendacky@amd.com>, "Kalra, Ashish" <ashish.kalra@amd.com>, Sean Christopherson <seanjc@google.com>, "Huang, Kai" <kai.huang@intel.com>, Baoquan He <bhe@redhat.com>, kexec@lists.infradead.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Subject: [PATCHv7 07/16] x86/mm: Return correct level from lookup_address() if pte is none Date: Mon, 12 Feb 2024 12:44:39 +0200 Message-ID: <20240212104448.2589568-8-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240212104448.2589568-1-kirill.shutemov@linux.intel.com> References: <20240212104448.2589568-1-kirill.shutemov@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790694618230407042 X-GMAIL-MSGID: 1790694618230407042 |
Series |
x86/tdx: Add kexec support
|
|
Commit Message
Kirill A. Shutemov
Feb. 12, 2024, 10:44 a.m. UTC
lookup_address() only returns correct page table level for the entry if the entry is not none. Make the helper to always return correct 'level'. It allows to implement iterator over kernel page tables using lookup_address(). Add one more entry into enum pg_level to indicate size of VA covered by one PGD entry in 5-level paging mode. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> --- arch/x86/include/asm/pgtable_types.h | 1 + arch/x86/mm/pat/set_memory.c | 8 ++++---- 2 files changed, 5 insertions(+), 4 deletions(-)
Comments
On 02/12/24 at 12:44pm, Kirill A. Shutemov wrote: > lookup_address() only returns correct page table level for the entry if > the entry is not none. > > Make the helper to always return correct 'level'. It allows to implement > iterator over kernel page tables using lookup_address(). > > Add one more entry into enum pg_level to indicate size of VA covered by > one PGD entry in 5-level paging mode. > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > --- > arch/x86/include/asm/pgtable_types.h | 1 + > arch/x86/mm/pat/set_memory.c | 8 ++++---- > 2 files changed, 5 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h > index 0b748ee16b3d..3f648ffdfbe5 100644 > --- a/arch/x86/include/asm/pgtable_types.h > +++ b/arch/x86/include/asm/pgtable_types.h > @@ -548,6 +548,7 @@ enum pg_level { > PG_LEVEL_2M, > PG_LEVEL_1G, > PG_LEVEL_512G, > + PG_LEVEL_256T, > PG_LEVEL_NUM > }; > > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c > index f92da8c9a86d..3612e3167147 100644 > --- a/arch/x86/mm/pat/set_memory.c > +++ b/arch/x86/mm/pat/set_memory.c > @@ -666,32 +666,32 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, LGTM, Reviewed-by: Baoquan He <bhe@redhat.com> By the way, we may need update the code comment above function lookup_address_in_pgd() and function lookup_address() since they don't reflect the latest behaviour of them. > pud_t *pud; > pmd_t *pmd; > > - *level = PG_LEVEL_NONE; > + *level = PG_LEVEL_256T; > > if (pgd_none(*pgd)) > return NULL; > > + *level = PG_LEVEL_512G; > p4d = p4d_offset(pgd, address); > if (p4d_none(*p4d)) > return NULL; > > - *level = PG_LEVEL_512G; > if (p4d_large(*p4d) || !p4d_present(*p4d)) > return (pte_t *)p4d; > > + *level = PG_LEVEL_1G; > pud = pud_offset(p4d, address); > if (pud_none(*pud)) > return NULL; > > - *level = PG_LEVEL_1G; > if (pud_large(*pud) || !pud_present(*pud)) > return (pte_t *)pud; > > + *level = PG_LEVEL_2M; > pmd = pmd_offset(pud, address); > if (pmd_none(*pmd)) > return NULL; > > - *level = PG_LEVEL_2M; > if (pmd_large(*pmd) || !pmd_present(*pmd)) > return (pte_t *)pmd; > > -- > 2.43.0 >
On Mon, Feb 19, 2024 at 01:12:32PM +0800, Baoquan He wrote: > On 02/12/24 at 12:44pm, Kirill A. Shutemov wrote: > > lookup_address() only returns correct page table level for the entry if > > the entry is not none. > > > > Make the helper to always return correct 'level'. It allows to implement > > iterator over kernel page tables using lookup_address(). > > > > Add one more entry into enum pg_level to indicate size of VA covered by > > one PGD entry in 5-level paging mode. > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > > Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > > --- > > arch/x86/include/asm/pgtable_types.h | 1 + > > arch/x86/mm/pat/set_memory.c | 8 ++++---- > > 2 files changed, 5 insertions(+), 4 deletions(-) > > > > diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h > > index 0b748ee16b3d..3f648ffdfbe5 100644 > > --- a/arch/x86/include/asm/pgtable_types.h > > +++ b/arch/x86/include/asm/pgtable_types.h > > @@ -548,6 +548,7 @@ enum pg_level { > > PG_LEVEL_2M, > > PG_LEVEL_1G, > > PG_LEVEL_512G, > > + PG_LEVEL_256T, > > PG_LEVEL_NUM > > }; > > > > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c > > index f92da8c9a86d..3612e3167147 100644 > > --- a/arch/x86/mm/pat/set_memory.c > > +++ b/arch/x86/mm/pat/set_memory.c > > @@ -666,32 +666,32 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, > > LGTM, > > Reviewed-by: Baoquan He <bhe@redhat.com> > > By the way, we may need update the code comment above function > lookup_address_in_pgd() and function lookup_address() since they don't > reflect the latest behaviour of them. I am not sure what part of the comment you see doesn't reflect the behaviour. From my PoV, changed code matches the comment closer that original. Hm?
On 02/19/24 at 03:52pm, Kirill A. Shutemov wrote: > On Mon, Feb 19, 2024 at 01:12:32PM +0800, Baoquan He wrote: > > On 02/12/24 at 12:44pm, Kirill A. Shutemov wrote: > > > lookup_address() only returns correct page table level for the entry if > > > the entry is not none. > > > > > > Make the helper to always return correct 'level'. It allows to implement > > > iterator over kernel page tables using lookup_address(). > > > > > > Add one more entry into enum pg_level to indicate size of VA covered by > > > one PGD entry in 5-level paging mode. > > > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > > > Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > > > --- > > > arch/x86/include/asm/pgtable_types.h | 1 + > > > arch/x86/mm/pat/set_memory.c | 8 ++++---- > > > 2 files changed, 5 insertions(+), 4 deletions(-) > > > > > > diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h > > > index 0b748ee16b3d..3f648ffdfbe5 100644 > > > --- a/arch/x86/include/asm/pgtable_types.h > > > +++ b/arch/x86/include/asm/pgtable_types.h > > > @@ -548,6 +548,7 @@ enum pg_level { > > > PG_LEVEL_2M, > > > PG_LEVEL_1G, > > > PG_LEVEL_512G, > > > + PG_LEVEL_256T, > > > PG_LEVEL_NUM > > > }; > > > > > > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c > > > index f92da8c9a86d..3612e3167147 100644 > > > --- a/arch/x86/mm/pat/set_memory.c > > > +++ b/arch/x86/mm/pat/set_memory.c > > > @@ -666,32 +666,32 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, > > > > LGTM, > > > > Reviewed-by: Baoquan He <bhe@redhat.com> > > > > By the way, we may need update the code comment above function > > lookup_address_in_pgd() and function lookup_address() since they don't > > reflect the latest behaviour of them. > > I am not sure what part of the comment you see doesn't reflect the > behaviour. From my PoV, changed code matches the comment closer that > original. Oh, I didn't make it clear. I mean update the code comment for lookup_address(), and add code comment for lookup_address_in_pgd() to mention the level thing. Maybe it's a chance to do that. ===1> * * Lookup the page table entry for a virtual address. Return a pointer * to the entry and the level of the mapping. * * Note: We return pud and pmd either when the entry is marked large ~~~~~~~~~~~ seems we return p4d too * or when the present bit is not set. Otherwise we would return a * pointer to a nonexisting mapping. ~~~~~~~~~~~~~~~ NULL? */ pte_t *lookup_address(unsigned long address, unsigned int *level) { return lookup_address_in_pgd(pgd_offset_k(address), address, level); } EXPORT_SYMBOL_GPL(lookup_address); === ===2> /* * Lookup the page table entry for a virtual address in a specific pgd. * Return a pointer to the entry and the level of the mapping. ~~ also could return NULL if it's none entry. And do we need to mention the level thing? */ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, unsigned int *level) .. }
On Tue, Feb 20, 2024 at 06:25:43PM +0800, Baoquan He wrote: > > I am not sure what part of the comment you see doesn't reflect the > > behaviour. From my PoV, changed code matches the comment closer that > > original. > > Oh, I didn't make it clear. I mean update the code comment for > lookup_address(), and add code comment for lookup_address_in_pgd() to > mention the level thing. Maybe it's a chance to do that. > > ===1> > * > * Lookup the page table entry for a virtual address. Return a pointer > * to the entry and the level of the mapping. > * > * Note: We return pud and pmd either when the entry is marked large > ~~~~~~~~~~~ seems we return p4d too > * or when the present bit is not set. Otherwise we would return a > * pointer to a nonexisting mapping. > ~~~~~~~~~~~~~~~ NULL? > */ > pte_t *lookup_address(unsigned long address, unsigned int *level) > { > return lookup_address_in_pgd(pgd_offset_k(address), address, level); > } > EXPORT_SYMBOL_GPL(lookup_address); > === > > ===2> > /* > * Lookup the page table entry for a virtual address in a specific pgd. > * Return a pointer to the entry and the level of the mapping. > ~~ also could return NULL if it's none entry. And do we need to > mention the level thing? > */ > pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, > unsigned int *level) > ... > } > What about this fixup: diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 3612e3167147..425ff6e192e6 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -657,7 +657,8 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star /* * Lookup the page table entry for a virtual address in a specific pgd. - * Return a pointer to the entry and the level of the mapping. + * Return a pointer to the entry and the level of the mapping (or NULL if + * the entry is none) and level of the entry. */ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, unsigned int *level) @@ -704,9 +705,8 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, * Lookup the page table entry for a virtual address. Return a pointer * to the entry and the level of the mapping. * - * Note: We return pud and pmd either when the entry is marked large - * or when the present bit is not set. Otherwise we would return a - * pointer to a nonexisting mapping. + * Note: the function returns p4d, pud and pmd either when the entry is marked + * large or when the present bit is not set. Otherwise it returns NULL. */ pte_t *lookup_address(unsigned long address, unsigned int *level) {
On 02/20/24 at 02:36pm, Kirill A. Shutemov wrote: > On Tue, Feb 20, 2024 at 06:25:43PM +0800, Baoquan He wrote: > > > I am not sure what part of the comment you see doesn't reflect the > > > behaviour. From my PoV, changed code matches the comment closer that > > > original. > > > > Oh, I didn't make it clear. I mean update the code comment for > > lookup_address(), and add code comment for lookup_address_in_pgd() to > > mention the level thing. Maybe it's a chance to do that. > > > > ===1> > > * > > * Lookup the page table entry for a virtual address. Return a pointer > > * to the entry and the level of the mapping. > > * > > * Note: We return pud and pmd either when the entry is marked large > > ~~~~~~~~~~~ seems we return p4d too > > * or when the present bit is not set. Otherwise we would return a > > * pointer to a nonexisting mapping. > > ~~~~~~~~~~~~~~~ NULL? > > */ > > pte_t *lookup_address(unsigned long address, unsigned int *level) > > { > > return lookup_address_in_pgd(pgd_offset_k(address), address, level); > > } > > EXPORT_SYMBOL_GPL(lookup_address); > > === > > > > ===2> > > /* > > * Lookup the page table entry for a virtual address in a specific pgd. > > * Return a pointer to the entry and the level of the mapping. > > ~~ also could return NULL if it's none entry. And do we need to > > mention the level thing? > > */ > > pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, > > unsigned int *level) > > ... > > } > > > > What about this fixup: Some nitpicks. > > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c > index 3612e3167147..425ff6e192e6 100644 > --- a/arch/x86/mm/pat/set_memory.c > +++ b/arch/x86/mm/pat/set_memory.c > @@ -657,7 +657,8 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star > > /* > * Lookup the page table entry for a virtual address in a specific pgd. > - * Return a pointer to the entry and the level of the mapping. > + * Return a pointer to the entry and the level of the mapping (or NULL if > + * the entry is none) and level of the entry. ^ this right parenthesis may need be moved to the end. ======= * Return a pointer to the entry and the level of the mapping (or NULL if * the entry is none and level of the entry). ======= > */ > pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, > unsigned int *level) > @@ -704,9 +705,8 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, > * Lookup the page table entry for a virtual address. Return a pointer > * to the entry and the level of the mapping. > * > - * Note: We return pud and pmd either when the entry is marked large > - * or when the present bit is not set. Otherwise we would return a > - * pointer to a nonexisting mapping. > + * Note: the function returns p4d, pud and pmd either when the entry is marked ~~~ ^ s/and/or/ > + * large or when the present bit is not set. Otherwise it returns NULL. > */ > pte_t *lookup_address(unsigned long address, unsigned int *level) > { > -- > Kiryl Shutsemau / Kirill A. Shutemov >
On Wed, Feb 21, 2024 at 10:37:29AM +0800, Baoquan He wrote: > > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c > > index 3612e3167147..425ff6e192e6 100644 > > --- a/arch/x86/mm/pat/set_memory.c > > +++ b/arch/x86/mm/pat/set_memory.c > > @@ -657,7 +657,8 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star > > > > /* > > * Lookup the page table entry for a virtual address in a specific pgd. > > - * Return a pointer to the entry and the level of the mapping. > > + * Return a pointer to the entry and the level of the mapping (or NULL if > > + * the entry is none) and level of the entry. > ^ this right parenthesis may need be moved to the end. > > > ======= > * Return a pointer to the entry and the level of the mapping (or NULL if > * the entry is none and level of the entry). > ======= Emm.. I like my variant more. We return level regardless if the entry none or not. I don't see a reason to repeat it twice. > > */ > > pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, > > unsigned int *level) > > @@ -704,9 +705,8 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, > > * Lookup the page table entry for a virtual address. Return a pointer > > * to the entry and the level of the mapping. > > * > > - * Note: We return pud and pmd either when the entry is marked large > > - * or when the present bit is not set. Otherwise we would return a > > - * pointer to a nonexisting mapping. > > + * Note: the function returns p4d, pud and pmd either when the entry is marked > ~~~ > ^ s/and/or/ Fair enough.
On 02/21/24 at 04:15pm, Kirill A. Shutemov wrote: > On Wed, Feb 21, 2024 at 10:37:29AM +0800, Baoquan He wrote: > > > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c > > > index 3612e3167147..425ff6e192e6 100644 > > > --- a/arch/x86/mm/pat/set_memory.c > > > +++ b/arch/x86/mm/pat/set_memory.c > > > @@ -657,7 +657,8 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star > > > > > > /* > > > * Lookup the page table entry for a virtual address in a specific pgd. > > > - * Return a pointer to the entry and the level of the mapping. > > > + * Return a pointer to the entry and the level of the mapping (or NULL if > > > + * the entry is none) and level of the entry. > > ^ this right parenthesis may need be moved to the end. > > > > > > ======= > > * Return a pointer to the entry and the level of the mapping (or NULL if > > * the entry is none and level of the entry). > > ======= > > Emm.. I like my variant more. We return level regardless if the entry none > or not. I don't see a reason to repeat it twice. * Lookup the page table entry for a virtual address in a specific pgd. * Return a pointer to the entry and the level of the mapping (or NULL if * the entry is none) and level of the entry. Hmm, I am confused. Why do we need to stress the level of the mapping and level of the entry? Wondering what is the difference. I must miss something.
On Thu, Feb 22, 2024 at 07:01:41PM +0800, Baoquan He wrote: > On 02/21/24 at 04:15pm, Kirill A. Shutemov wrote: > > On Wed, Feb 21, 2024 at 10:37:29AM +0800, Baoquan He wrote: > > > > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c > > > > index 3612e3167147..425ff6e192e6 100644 > > > > --- a/arch/x86/mm/pat/set_memory.c > > > > +++ b/arch/x86/mm/pat/set_memory.c > > > > @@ -657,7 +657,8 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star > > > > > > > > /* > > > > * Lookup the page table entry for a virtual address in a specific pgd. > > > > - * Return a pointer to the entry and the level of the mapping. > > > > + * Return a pointer to the entry and the level of the mapping (or NULL if > > > > + * the entry is none) and level of the entry. > > > ^ this right parenthesis may need be moved to the end. > > > > > > > > > ======= > > > * Return a pointer to the entry and the level of the mapping (or NULL if > > > * the entry is none and level of the entry). > > > ======= > > > > Emm.. I like my variant more. We return level regardless if the entry none > > or not. I don't see a reason to repeat it twice. > > > * Lookup the page table entry for a virtual address in a specific pgd. > * Return a pointer to the entry and the level of the mapping (or NULL if > * the entry is none) and level of the entry. > > Hmm, I am confused. Why do we need to stress the level of the mapping > and level of the entry? Wondering what is the difference. I must miss > something. My bad. This is way I meant to write: * Lookup the page table entry for a virtual address in a specific pgd. * Return a pointer to the entry (or NULL if the entry does not exist) and * the level of the entry.
On 02/22/24 at 04:04pm, Kirill A. Shutemov wrote: > On Thu, Feb 22, 2024 at 07:01:41PM +0800, Baoquan He wrote: > > On 02/21/24 at 04:15pm, Kirill A. Shutemov wrote: > > > On Wed, Feb 21, 2024 at 10:37:29AM +0800, Baoquan He wrote: > > > > > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c > > > > > index 3612e3167147..425ff6e192e6 100644 > > > > > --- a/arch/x86/mm/pat/set_memory.c > > > > > +++ b/arch/x86/mm/pat/set_memory.c > > > > > @@ -657,7 +657,8 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star > > > > > > > > > > /* > > > > > * Lookup the page table entry for a virtual address in a specific pgd. > > > > > - * Return a pointer to the entry and the level of the mapping. > > > > > + * Return a pointer to the entry and the level of the mapping (or NULL if > > > > > + * the entry is none) and level of the entry. > > > > ^ this right parenthesis may need be moved to the end. > > > > > > > > > > > > ======= > > > > * Return a pointer to the entry and the level of the mapping (or NULL if > > > > * the entry is none and level of the entry). > > > > ======= > > > > > > Emm.. I like my variant more. We return level regardless if the entry none > > > or not. I don't see a reason to repeat it twice. > > > > > > * Lookup the page table entry for a virtual address in a specific pgd. > > * Return a pointer to the entry and the level of the mapping (or NULL if > > * the entry is none) and level of the entry. > > > > Hmm, I am confused. Why do we need to stress the level of the mapping > > and level of the entry? Wondering what is the difference. I must miss > > something. > > My bad. This is way I meant to write: > > * Lookup the page table entry for a virtual address in a specific pgd. > * Return a pointer to the entry (or NULL if the entry does not exist) and > * the level of the entry. ACK. Thanks.
On 2/12/24 02:44, Kirill A. Shutemov wrote: > lookup_address() only returns correct page table level for the entry if > the entry is not none. Currently, lookup_address() returns two things: 1. A "pte_t" (which might be a p[g4um]d_t) 2. The 'level' of the page tables where the "pte_t" was found (returned via a pointer) If no pte_t is found, 'level' is essentially garbage. > Make the helper to always return correct 'level'. It allows to implement > iterator over kernel page tables using lookup_address(). One nit with this description: What's "correct" isn't immediately obvious to me. It wasn't exactly incorrect before. I think it would be better to say: Always fill out the level. For NULL "pte_t"s, fill in the level where the p*d_none() entry was found mirroring the "found" behavior. Always filling out the level allows using lookup_address() to iterate over kernel page tables. > Add one more entry into enum pg_level to indicate size of VA covered by > one PGD entry in 5-level paging mode. Needs some 'the's: Add one more entry into enum pg_level to indicate the size of the VA covered by one PGD entry in 5-level paging mode. With that fixed: Reviewed-by: Dave Hansen <dave.hansen@intel.com>
On 2/23/24 10:45, Dave Hansen wrote: > Always filling out the level allows using lookup_address() to > iterate over kernel page tables. This doesn't parse very well. How about this instead: Always filling out the level allows using lookup_address() to precisely skip over holes when walking kernel page tables. I think that more accurately captures what you're doing with it in the next patch.
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 0b748ee16b3d..3f648ffdfbe5 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -548,6 +548,7 @@ enum pg_level { PG_LEVEL_2M, PG_LEVEL_1G, PG_LEVEL_512G, + PG_LEVEL_256T, PG_LEVEL_NUM }; diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index f92da8c9a86d..3612e3167147 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -666,32 +666,32 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address, pud_t *pud; pmd_t *pmd; - *level = PG_LEVEL_NONE; + *level = PG_LEVEL_256T; if (pgd_none(*pgd)) return NULL; + *level = PG_LEVEL_512G; p4d = p4d_offset(pgd, address); if (p4d_none(*p4d)) return NULL; - *level = PG_LEVEL_512G; if (p4d_large(*p4d) || !p4d_present(*p4d)) return (pte_t *)p4d; + *level = PG_LEVEL_1G; pud = pud_offset(p4d, address); if (pud_none(*pud)) return NULL; - *level = PG_LEVEL_1G; if (pud_large(*pud) || !pud_present(*pud)) return (pte_t *)pud; + *level = PG_LEVEL_2M; pmd = pmd_offset(pud, address); if (pmd_none(*pmd)) return NULL; - *level = PG_LEVEL_2M; if (pmd_large(*pmd) || !pmd_present(*pmd)) return (pte_t *)pmd;