Message ID | 20240103084424.20014-1-yan.y.zhao@intel.com |
---|---|
State | New |
Headers |
From: Yan Zhao <yan.y.zhao@intel.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, stevensd@chromium.org, Yan Zhao <yan.y.zhao@intel.com>
Subject: [RFC PATCH v2 1/3] KVM: allow mapping of compound tail pages for IO or PFNMAP mapping
Date: Wed, 3 Jan 2024 16:44:24 +0800
Message-Id: <20240103084424.20014-1-yan.y.zhao@intel.com>
In-Reply-To: <20240103084327.19955-1-yan.y.zhao@intel.com>
References: <20240103084327.19955-1-yan.y.zhao@intel.com> |
Series |
KVM: allow mapping of compound tail pages for IO or PFNMAP mapping
Commit Message
Yan Zhao
Jan. 3, 2024, 8:44 a.m. UTC
Allow mapping of tail pages of compound pages for IO or PFNMAP mappings
by trying to get a ref count on the head page instead.

For IO or PFNMAP mappings, the memory is sometimes backed by compound
pages. KVM currently returns an error when mapping a tail page of such a
compound page, because the ref count of a tail page is always 0.

So, rather than checking and incrementing the ref count of the tail page
itself, check and increment the ref count of its folio (head page) to
allow mapping of compound tail pages.

This does not break the original intention to disallow mapping of tail
pages of non-compound higher-order allocations, as the folio of a
non-compound tail page is the same as the page itself.

On the put side, put_page() already converts the page to its folio
before dropping the page ref.
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
virt/kvm/kvm_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On Wed, Jan 03, 2024, Yan Zhao wrote:
> Allow mapping of tail pages of compound pages for IO or PFNMAP mapping
> by trying and getting ref count of its head page.
>
> For IO or PFNMAP mapping, sometimes it's backed by compound pages.
> KVM will just return error on mapping of tail pages of the compound pages,
> as ref count of the tail pages are always 0.
>
> So, rather than check and add ref count of a tail page, check and add ref
> count of its folio (head page) to allow mapping of the compound tail pages.

Can you add a blurb to call out that this is effectively what gup() does in
try_get_folio()? That knowledge gives me a _lot_ more confidence that this
is correct (I didn't think too deeply about what this patch was doing when
I looked at v1).

> This will not break the origial intention to disallow mapping of tail pages
> of non-compound higher order allocations as the folio of a non-compound
> tail page is the same as the page itself.
>
> On the other side, put_page() has already converted page to folio before
> putting page ref.
>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  virt/kvm/kvm_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index acd67fb40183..f53b58446ac7 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2892,7 +2892,7 @@ static int kvm_try_get_pfn(kvm_pfn_t pfn)
>  	if (!page)
>  		return 1;
>
> -	return get_page_unless_zero(page);
> +	return folio_try_get(page_folio(page));

This seems like it needs retry logic, a la try_get_folio(), to guard
against a race with the folio being split. From page_folio():

  If the caller does not hold a reference, this call may race with a folio
  split, so it should re-check the folio still contains this page after
  gaining a reference on the folio.

I assume that splitting one of these folios is extremely unlikely, but I
don't see any harm in being paranoid (unless this really truly cannot race).
On Mon, Feb 12, 2024 at 07:17:21PM -0800, Sean Christopherson wrote:
> On Wed, Jan 03, 2024, Yan Zhao wrote:
> > Allow mapping of tail pages of compound pages for IO or PFNMAP mapping
> > by trying and getting ref count of its head page.
> >
> > For IO or PFNMAP mapping, sometimes it's backed by compound pages.
> > KVM will just return error on mapping of tail pages of the compound pages,
> > as ref count of the tail pages are always 0.
> >
> > So, rather than check and add ref count of a tail page, check and add ref
> > count of its folio (head page) to allow mapping of the compound tail pages.
>
> Can you add a blurb to call out that this is effectively what gup() does in
> try_get_folio()? That knowledge gives me a _lot_ more confidence that this
> is correct (I didn't think too deeply about what this patch was doing when
> I looked at v1).
Sure.

> > This will not break the origial intention to disallow mapping of tail pages
> > of non-compound higher order allocations as the folio of a non-compound
> > tail page is the same as the page itself.
> >
> > On the other side, put_page() has already converted page to folio before
> > putting page ref.
> >
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  virt/kvm/kvm_main.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index acd67fb40183..f53b58446ac7 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2892,7 +2892,7 @@ static int kvm_try_get_pfn(kvm_pfn_t pfn)
> >  	if (!page)
> >  		return 1;
> >
> > -	return get_page_unless_zero(page);
> > +	return folio_try_get(page_folio(page));
>
> This seems like it needs retry logic, a la try_get_folio(), to guard
> against a race with the folio being split. From page_folio():
>
>   If the caller does not hold a reference, this call may race with a folio
>   split, so it should re-check the folio still contains this page after
>   gaining a reference on the folio.
>
> I assume that splitting one of these folios is extremely unlikely, but I
> don't see any harm in being paranoid (unless this really truly cannot race).
Yes, you are right! Will do the retry. Thanks!
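The re-check that page_folio()'s documentation asks for can be sketched as a small loop, shown here against a toy page model rather than real kernel code (the `toy_*` names and struct layout are invented for illustration): take a ref on the folio, then verify the page still resolves to that folio; if a split raced in between, drop the ref and retry. In kernel terms this would mean folio_try_get(), a re-check of page_folio(page), and folio_put() plus retry on mismatch, along the lines of what try_get_folio() does in mm/gup.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not the kernel's data structures): the
 * head page carries the refcount; tail pages point at their head. */
struct toy_page {
	int refcount;
	struct toy_page *head;  /* self for head/non-compound pages */
};

static bool toy_folio_try_get(struct toy_page *f)
{
	if (f->refcount == 0)
		return false;
	f->refcount++;
	return true;
}

static void toy_folio_put(struct toy_page *f)
{
	f->refcount--;
}

/* try_get_folio()-style retry: after gaining a reference on the
 * folio, re-check that the page still belongs to it; if the folio
 * was split underneath us, drop the ref and try again. */
static bool toy_try_get_page(struct toy_page *p)
{
	for (;;) {
		struct toy_page *folio = p->head;   /* page_folio() */
		if (!toy_folio_try_get(folio))
			return false;
		if (p->head == folio)               /* still the same folio */
			return true;
		toy_folio_put(folio);               /* raced with a split */
	}
}
```

In a single-threaded run the loop exits on the first pass; the re-check only matters when a concurrent split changes `p->head` between gaining the reference and the comparison.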
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index acd67fb40183..f53b58446ac7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2892,7 +2892,7 @@ static int kvm_try_get_pfn(kvm_pfn_t pfn)
 	if (!page)
 		return 1;
 
-	return get_page_unless_zero(page);
+	return folio_try_get(page_folio(page));
 }
 
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
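The effect of the one-line change above can be modeled in a few lines of plain C. The `toy_*` names and struct layout below are invented for illustration only; the real `struct page`/`struct folio` are far more involved. The point is just that a compound tail page's own refcount is always 0, while its head page (folio) carries the real count:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: only the head page of a compound page
 * carries a reference count; tail pages stay at 0 and point at
 * their head.  A non-compound page is its own head. */
struct toy_page {
	int refcount;           /* always 0 for tail pages */
	struct toy_page *head;  /* self for head/non-compound pages */
};

/* Analogue of page_folio(): resolve a page to its folio (head page). */
static struct toy_page *toy_page_folio(struct toy_page *p)
{
	return p->head;
}

/* Analogue of get_page_unless_zero(): take a ref only if the
 * refcount is already nonzero. */
static bool toy_get_page_unless_zero(struct toy_page *p)
{
	if (p->refcount == 0)
		return false;
	p->refcount++;
	return true;
}
```

With a head page at refcount 1 and a tail page at 0, `toy_get_page_unless_zero(&tail)` fails, which is the old behavior, while `toy_get_page_unless_zero(toy_page_folio(&tail))` succeeds and bumps the head's count, which is the new behavior. A non-compound page is its own folio, so an unreferenced non-compound page is still rejected, matching the intent stated in the commit message.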