From patchwork Mon Feb 26 08:29:15 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Isaku Yamahata <isaku.yamahata@intel.com>
X-Patchwork-Id: 206480
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini,
	erdemaktas@google.com, Sean Christopherson, Sagi Shahar, Kai Huang,
	chen.bo@intel.com, hang.yuan@intel.com, tina.zhang@intel.com
Subject: [PATCH v8 01/14] KVM: Add transparent hugepage support for dedicated guest memory
Date: Mon, 26 Feb 2024 00:29:15 -0800
Message-Id: <6fdc566ffb45eeaa653ec21c0a539723b8ee056d.1708933624.git.isaku.yamahata@intel.com>
X-Mailer: git-send-email 2.25.1
MIME-Version: 1.0

From: Sean Christopherson <seanjc@google.com>

Extended guest_memfd to allow backing guest memory with transparent
hugepages. Require userspace to opt-in via a flag even though there's no
known/anticipated use case for forcing small pages as THP is optional,
i.e. to avoid ending up in a situation where userspace is unaware that
KVM can't provide hugepages.

For simplicity, require the guest_memfd size to be a multiple of the
hugepage size, e.g. so that KVM doesn't need to do bounds checking when
deciding whether or not to allocate a huge folio.

When reporting the max order when KVM gets a pfn from guest_memfd, force
order-0 pages if the hugepage is not fully contained by the memslot
binding, e.g. if userspace requested hugepages but punches a hole in the
memslot bindings in order to emulate x86's VGA hole.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20231027182217.3615211-18-seanjc@google.com
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
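Note: as a usage illustration only (not part of this patch), creating a
hugepage-backed guest_memfd from userspace looks roughly like the sketch
below. It assumes a vm_fd obtained via KVM_CREATE_VM, a <linux/kvm.h>
with this series applied, and x86's 2MiB PMD-sized THP; the GMEM_SIZE
constant and create_thp_gmem() helper are hypothetical names.

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* 512 MiB: a multiple of the 2 MiB THP size, as this patch requires. */
  #define GMEM_SIZE	(512ull << 20)

  static int create_thp_gmem(int vm_fd)
  {
  	struct kvm_create_guest_memfd gmem = {
  		.size  = GMEM_SIZE,
  		.flags = KVM_GUEST_MEMFD_ALLOW_HUGEPAGE,
  	};

  	/* Returns a new guest_memfd file descriptor, or -1 with errno set. */
  	return ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
  }

With CONFIG_TRANSPARENT_HUGEPAGE off, the flag is not in valid_flags and
the ioctl fails with -EINVAL; likewise for a size that is not aligned to
HPAGE_PMD_SIZE.
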
 Documentation/virt/kvm/api.rst |  7 ++++
 include/uapi/linux/kvm.h       |  2 +
 virt/kvm/guest_memfd.c         | 73 ++++++++++++++++++++++++++++++----
 3 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 4b70d2b43532..213738a38b07 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6312,6 +6312,8 @@ and cannot be resized (guest_memfd files do however support PUNCH_HOLE).
 	__u64 reserved[6];
   };
 
+  #define KVM_GUEST_MEMFD_ALLOW_HUGEPAGE		(1ULL << 0)
+
 Conceptually, the inode backing a guest_memfd file represents physical memory,
 i.e. is coupled to the virtual machine as a thing, not to a "struct kvm". The
 file itself, which is bound to a "struct kvm", is that instance's view of the
@@ -6328,6 +6330,11 @@ most one mapping per page, i.e. binding multiple memory regions to a single
 guest_memfd range is not allowed (any number of memory regions can be bound to
 a single guest_memfd file, but the bound ranges must not overlap).
 
+If KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is set in flags, KVM will attempt to allocate
+and map hugepages for the guest_memfd file. This is currently best effort. If
+KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is set, the size must be aligned to the maximum
+transparent hugepage size supported by the kernel
+
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
 4.143 KVM_MEMORY_MAPPING

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a7aa804ef021..47faaf71799f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -2317,6 +2317,8 @@ struct kvm_create_guest_memfd {
 	__u64 reserved[6];
 };
 
+#define KVM_GUEST_MEMFD_ALLOW_HUGEPAGE		(1ULL << 0)
+
 #define KVM_MEMORY_MAPPING	_IOWR(KVMIO, 0xd5, struct kvm_memory_mapping)
 
 struct kvm_memory_mapping {

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 3830d50b9b67..236443c3d8dc 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -13,14 +13,47 @@ struct kvm_gmem {
 	struct list_head entry;
 };
 
-static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
+static struct folio *kvm_gmem_get_huge_folio(struct inode *inode, pgoff_t index)
 {
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	unsigned long huge_index = round_down(index, HPAGE_PMD_NR);
+	unsigned long flags = (unsigned long)inode->i_private;
+	struct address_space *mapping = inode->i_mapping;
+	gfp_t gfp = mapping_gfp_mask(mapping);
 	struct folio *folio;
 
-	/* TODO: Support huge pages. */
-	folio = filemap_grab_folio(inode->i_mapping, index);
-	if (IS_ERR_OR_NULL(folio))
+	if (!(flags & KVM_GUEST_MEMFD_ALLOW_HUGEPAGE))
+		return NULL;
+
+	if (filemap_range_has_page(mapping, huge_index << PAGE_SHIFT,
+				   (huge_index + HPAGE_PMD_NR - 1) << PAGE_SHIFT))
+		return NULL;
+
+	folio = filemap_alloc_folio(gfp, HPAGE_PMD_ORDER);
+	if (!folio)
+		return NULL;
+
+	if (filemap_add_folio(mapping, folio, huge_index, gfp)) {
+		folio_put(folio);
 		return NULL;
+	}
+
+	return folio;
+#else
+	return NULL;
+#endif
+}
+
+static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
+{
+	struct folio *folio;
+
+	folio = kvm_gmem_get_huge_folio(inode, index);
+	if (!folio) {
+		folio = filemap_grab_folio(inode->i_mapping, index);
+		if (IS_ERR_OR_NULL(folio))
+			return NULL;
+	}
 
 	/*
 	 * Use the up-to-date flag to track whether or not the memory has been
@@ -363,6 +396,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	inode->i_mode |= S_IFREG;
 	inode->i_size = size;
 	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
+	mapping_set_large_folios(inode->i_mapping);
 	mapping_set_unmovable(inode->i_mapping);
 	/* Unmovable mappings are supposed to be marked unevictable as well. */
 	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
@@ -388,12 +422,21 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	u64 flags = args->flags;
 	u64 valid_flags = 0;
 
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		valid_flags |= KVM_GUEST_MEMFD_ALLOW_HUGEPAGE;
+
 	if (flags & ~valid_flags)
 		return -EINVAL;
 
 	if (size <= 0 || !PAGE_ALIGNED(size))
 		return -EINVAL;
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	if ((flags & KVM_GUEST_MEMFD_ALLOW_HUGEPAGE) &&
+	    !IS_ALIGNED(size, HPAGE_PMD_SIZE))
+		return -EINVAL;
+#endif
+
 	return __kvm_gmem_create(kvm, size, flags);
 }
 
@@ -488,7 +531,7 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
 {
-	pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
+	pgoff_t index, huge_index;
 	struct kvm_gmem *gmem;
 	struct folio *folio;
 	struct page *page;
@@ -501,6 +544,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	gmem = file->private_data;
 
+	index = gfn - slot->base_gfn + slot->gmem.pgoff;
 	if (WARN_ON_ONCE(xa_load(&gmem->bindings, index) != slot)) {
 		r = -EIO;
 		goto out_fput;
 	}
@@ -520,9 +564,24 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	page = folio_file_page(folio, index);
 
 	*pfn = page_to_pfn(page);
-	if (max_order)
-		*max_order = 0;
+	if (!max_order)
+		goto success;
+
+	*max_order = compound_order(compound_head(page));
+	if (!*max_order)
+		goto success;
 
+	/*
+	 * The folio can be mapped with a hugepage if and only if the folio is
+	 * fully contained by the range the memslot is bound to. Note, the
+	 * caller is responsible for handling gfn alignment, this only deals
+	 * with the file binding.
+	 */
+	huge_index = ALIGN(index, 1ull << *max_order);
+	if (huge_index < ALIGN(slot->gmem.pgoff, 1ull << *max_order) ||
+	    huge_index + (1ull << *max_order) > slot->gmem.pgoff + slot->npages)
+		*max_order = 0;
+success:
 	r = 0;
 
 out_unlock:
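
As a further illustration (again, not part of the patch), the max_order
containment check in kvm_gmem_get_pfn() above can be restated as a
standalone helper so the index arithmetic is easy to check outside the
kernel. align_up(), clamp_max_order(), and the example numbers below are
hypothetical; pgoff and npages stand in for slot->gmem.pgoff and
slot->npages.

  #include <stdint.h>

  /* Open-coded ALIGN(): round x up to a multiple of a (a power of two). */
  static uint64_t align_up(uint64_t x, uint64_t a)
  {
  	return (x + a - 1) & ~(a - 1);
  }

  /*
   * Mirror of the hunk above: force order 0 unless the huge folio is
   * fully contained by the range the memslot is bound to.
   */
  static int clamp_max_order(int max_order, uint64_t index,
  			   uint64_t pgoff, uint64_t npages)
  {
  	uint64_t nr = 1ull << max_order;
  	uint64_t huge_index = align_up(index, nr);

  	if (huge_index < align_up(pgoff, nr) ||
  	    huge_index + nr > pgoff + npages)
  		return 0;

  	return max_order;
  }

For example, with max_order = 9 (a 2MiB folio of 512 pages), pgoff = 0 and
npages = 0x2c0, index 0 keeps order 9, while index 0x200 lands in a folio
that overruns the binding and is clamped to order 0, matching the VGA-hole
case described in the changelog.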