From patchwork Mon Feb 5 12:01:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 196772 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:168b:b0:106:860b:bbdd with SMTP id ma11csp825545dyb; Mon, 5 Feb 2024 04:03:42 -0800 (PST) X-Google-Smtp-Source: AGHT+IHBh+4CH5zIIFcAFk58i+a9m0XsxcYSfCaqYMZw2gdbkW9p8TEcbcFk6a8sKyj4bzDFHLvY X-Received: by 2002:aa7:c445:0:b0:55f:89d7:132e with SMTP id n5-20020aa7c445000000b0055f89d7132emr4952009edr.39.1707134622720; Mon, 05 Feb 2024 04:03:42 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707134622; cv=pass; d=google.com; s=arc-20160816; b=rPUxpwc9aXmm18I7noLdZOpd0w2pMxr8EyBvsIlyEocgP2U5zFepdm9Z++mOjWHOMZ UWCgBRhczneHipXOHT/TPw9+cKECfSMPjrxlur1bDoUItm+9jKGak9vWSRUsvDvuTDHx fkbj9ENh04SEgnu8WXk2THzwN/cEAY5++hcH+7Mk9YMZL2JlDjrtgLBw1tlBpcy+lYGl WcSTbw19Q5rCHevgQPFfywsb9/TCxLpoUn1z258at25hR1clXSWTdfk0J+MTLQASt0Xm Ttw6f+TFW7x4AZWC2d/Hg4ohfkErErXTltJjl6WQXA+4mBO4RQPAHrlVEKU8c7DVXQ14 nxDw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=ulQJ3D4Me8ScxlcztMMDIgEvnR/mGPMynfspToLkF2M=; fh=E3i5FxqiYooV0m/tp5j91cs+Z6Bi94Pgrt5rWJqe6NU=; b=o2Zn946DPKRsh2v9RVyk0uaFA/0uAX4uhuFts0XmyxIrsm8W7ANjUr67h54Fl6XdrO yJGLaa6Nqe8q5fQwVbP13F0o7mOBA9K0b4yNNIHyboNAfUXDuHospx5YdB5zD1NFU4GS d2jgaicT9xwBGkVRvPDDOf/vAwluBuG/my/L9NykF7AQVATnA+0+BBma3ylvL+XsQMO2 q03rsaKXr1NZkvpFRA09SHktgchFzIDQeABDBPV5qfhVaTgPfjxbyQp+Dck4rg2kw/0j 5FKZamfXzfFYBjBihFEFbK/Hm2Gvk4ARPSSJbN887KQm8748KlzK00kHbFnQtAaJb7Q3 8hpw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=b3mmLUM8; arc=pass (i=1 spf=pass spfdomain=amazon.com dkim=pass dkdomain=amazon.com dmarc=pass 
fromdomain=amazon.com); spf=pass (google.com: domain of linux-kernel+bounces-52542-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-52542-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com X-Forwarded-Encrypted: i=1; AJvYcCWcc+rUfCJrWVs1WMX4uLwt5GHUhDl9a7HuhRDie+7NWmtV7AKue4nvOCvYHDGNP2IbPOzu1yDrY8y9mWhjkQ4EkBA4Mg== Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id a8-20020a509b48000000b0055fc426b511si3943255edj.641.2024.02.05.04.03.42 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Feb 2024 04:03:42 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-52542-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=b3mmLUM8; arc=pass (i=1 spf=pass spfdomain=amazon.com dkim=pass dkdomain=amazon.com dmarc=pass fromdomain=amazon.com); spf=pass (google.com: domain of linux-kernel+bounces-52542-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-52542-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id EF6AB1F221ED for ; Mon, 5 Feb 2024 12:03:41 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 042241B943; Mon, 5 Feb 2024 12:02:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com 
header.i=@amazon.com header.b="b3mmLUM8" Received: from smtp-fw-9106.amazon.com (smtp-fw-9106.amazon.com [207.171.188.206]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E87BF1AACF; Mon, 5 Feb 2024 12:02:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=207.171.188.206 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707134549; cv=none; b=tN9N11+/bs67/kIY+g53EPbcEEeflSkH1zs3kkYKw50DgqCZk+alRRPFTW+KAOGdhlMgRqs+u9xj9iM0/46tdoApKMqW2dN4/sDwtgNajAeXc7fLFkvkVKUpKOc25Rj4bSU2fpCHJrEE1Q/ORMxkQfMNFV2pzcG2ZgxKpXfn0Mw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707134549; c=relaxed/simple; bh=RV+Ytw8ThVExKlkqCCz/3DDoByyrvDLhxUnYt76lnO8=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=HTg/jKuyBiCsP4PFdL2Phu3DBq0wFvOGyBG5mqOSiHR07vZxr3l6B9+KWf3M8Wf2GaJrdmZCtnu38Lb2dgWoLk4zWyuWj9iEy8jdm9mWEDGjBGnmLtwsd7c7Jwp/DVcTgpNaY9rEheBFHYx2Gaj2C5M/+HVfXKmeplAtySGOWaE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=b3mmLUM8; arc=none smtp.client-ip=207.171.188.206 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1707134549; x=1738670549; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ulQJ3D4Me8ScxlcztMMDIgEvnR/mGPMynfspToLkF2M=; b=b3mmLUM8uxUdbQnSkV2tBea6B5synisslurnQMLhT1Y48dqsPRoeoRTI 6zWc/GGxW9avbDPVCdKxH58P9Dr0k5iHKpkCzx+wqjbVJk03MVe6yB7e7 
EYdCy2Rd/TN/Ur0HgkAZqbavOeRmsOHJuvj7iR+0Yjh9S/ByJIIE4Kmac g=; X-IronPort-AV: E=Sophos;i="6.05,245,1701129600"; d="scan'208";a="702145833" Received: from pdx4-co-svc-p1-lb2-vlan2.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.210]) by smtp-border-fw-9106.sea19.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2024 12:02:22 +0000 Received: from EX19MTAEUB002.ant.amazon.com [10.0.43.254:59802] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.28.192:2525] with esmtp (Farcaster) id c85ecf83-e3a4-4c42-963b-1a5c7099c0b7; Mon, 5 Feb 2024 12:02:20 +0000 (UTC) X-Farcaster-Flow-ID: c85ecf83-e3a4-4c42-963b-1a5c7099c0b7 Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUB002.ant.amazon.com (10.252.51.59) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:02:20 +0000 Received: from dev-dsk-jgowans-1a-a3faec1f.eu-west-1.amazon.com (172.19.112.191) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:02:14 +0000 From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . 
Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 01/18] pkernfs: Introduce filesystem skeleton Date: Mon, 5 Feb 2024 12:01:46 +0000 Message-ID: <20240205120203.60312-2-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D045UWA002.ant.amazon.com (10.13.139.12) To EX19D014EUC004.ant.amazon.com (10.252.51.182) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790060394093341453 X-GMAIL-MSGID: 1790060394093341453 Add an in-memory filesystem: pkernfs. Memory is donated to pkernfs by carving it out of the normal System RAM range with the memmap= cmdline parameter and then giving that same physical range to pkernfs with the pkernfs= cmdline parameter. A new filesystem is added; so far it doesn't do much except persist a super block at the start of the donated memory and allow itself to be mounted.
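For readers unfamiliar with the syntax, the pkernfs= argument takes a size and a base address in the same nn[KMG]!ss[KMG] form as memmap=. The following is a minimal userspace sketch of that parsing, not the kernel code. parse_size() is a hypothetical stand-in for the kernel's memparse() (only the K/M/G suffixes are handled here), struct extent and parse_extent() are illustrative names, and the rejection of malformed input is an assumption added for the sketch; the parser in this patch skips the separator without checking it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for one parsed "size!base" pair. */
struct extent {
	uint64_t size;
	uint64_t base;
};

/*
 * Userspace stand-in for memparse(): a number in any base strtoull()
 * accepts, followed by an optional K, M or G suffix.
 */
uint64_t parse_size(const char *s, const char **end)
{
	char *p;
	uint64_t value = strtoull(s, &p, 0);

	switch (*p) {
	case 'G': case 'g':
		value <<= 10;
		/* fall through */
	case 'M': case 'm':
		value <<= 10;
		/* fall through */
	case 'K': case 'k':
		value <<= 10;
		p++;
		break;
	default:
		break;
	}
	*end = p;
	return value;
}

/* Returns 0 on success, -1 if arg is not of the form size!base. */
int parse_extent(const char *arg, struct extent *out)
{
	const char *p;
	const char *base_str;

	assert(arg && out);

	out->size = parse_size(arg, &p);
	if (p == arg || *p != '!')
		return -1;	/* no size, or separator missing */

	base_str = p + 1;
	out->base = parse_size(base_str, &p);
	if (p == base_str || *p != '\0')
		return -1;	/* no base, or trailing characters */

	return 0;
}
```

With this sketch, parse_extent("2G!4G", &e) describes a 2 GiB region starting at physical address 4 GiB; per the description above, the same range would also be given to memmap= so that it is carved out of normal System RAM.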
--- fs/Kconfig | 1 + fs/Makefile | 3 ++ fs/pkernfs/Kconfig | 9 ++++ fs/pkernfs/Makefile | 6 +++ fs/pkernfs/pkernfs.c | 99 ++++++++++++++++++++++++++++++++++++++++++++ fs/pkernfs/pkernfs.h | 6 +++ 6 files changed, 124 insertions(+) create mode 100644 fs/pkernfs/Kconfig create mode 100644 fs/pkernfs/Makefile create mode 100644 fs/pkernfs/pkernfs.c create mode 100644 fs/pkernfs/pkernfs.h diff --git a/fs/Kconfig b/fs/Kconfig index aa7e03cc1941..33a9770ae657 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -331,6 +331,7 @@ source "fs/sysv/Kconfig" source "fs/ufs/Kconfig" source "fs/erofs/Kconfig" source "fs/vboxsf/Kconfig" +source "fs/pkernfs/Kconfig" endif # MISC_FILESYSTEMS diff --git a/fs/Makefile b/fs/Makefile index f9541f40be4e..1af35b494b5d 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -19,6 +19,9 @@ obj-y := open.o read_write.o file_table.o super.o \ obj-$(CONFIG_BUFFER_HEAD) += buffer.o mpage.o obj-$(CONFIG_PROC_FS) += proc_namespace.o + +obj-y += pkernfs/ + obj-$(CONFIG_LEGACY_DIRECT_IO) += direct-io.o obj-y += notify/ obj-$(CONFIG_EPOLL) += eventpoll.o diff --git a/fs/pkernfs/Kconfig b/fs/pkernfs/Kconfig new file mode 100644 index 000000000000..59621a1d9aef --- /dev/null +++ b/fs/pkernfs/Kconfig @@ -0,0 +1,9 @@ +# SPDX-License-Identifier: GPL-2.0-only + +config PKERNFS_FS + bool "Persistent Kernel filesystem (pkernfs)" + help + An in-memory filesystem on top of reserved memory specified via + pkernfs= cmdline argument. Used for storing kernel state and + userspace memory which is preserved across kexec to support + live update. 
diff --git a/fs/pkernfs/Makefile b/fs/pkernfs/Makefile new file mode 100644 index 000000000000..17258cb77f58 --- /dev/null +++ b/fs/pkernfs/Makefile @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0-only +# +# Makefile for persistent kernel filesystem +# + +obj-$(CONFIG_PKERNFS_FS) += pkernfs.o diff --git a/fs/pkernfs/pkernfs.c b/fs/pkernfs/pkernfs.c new file mode 100644 index 000000000000..4c476ddc35b6 --- /dev/null +++ b/fs/pkernfs/pkernfs.c @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "pkernfs.h" +#include +#include +#include +#include +#include + +static phys_addr_t pkernfs_base, pkernfs_size; +static void *pkernfs_mem; +static const struct super_operations pkernfs_super_ops = { }; + +static int pkernfs_fill_super(struct super_block *sb, struct fs_context *fc) +{ + struct inode *inode; + struct dentry *dentry; + struct pkernfs_sb *psb; + + pkernfs_mem = memremap(pkernfs_base, pkernfs_size, MEMREMAP_WB); + psb = (struct pkernfs_sb *) pkernfs_mem; + + if (psb->magic_number == PKERNFS_MAGIC_NUMBER) { + pr_info("pkernfs: Restoring from super block\n"); + } else { + pr_info("pkernfs: Clean super block; initialising\n"); + psb->magic_number = PKERNFS_MAGIC_NUMBER; + } + + sb->s_op = &pkernfs_super_ops; + + inode = new_inode(sb); + if (!inode) + return -ENOMEM; + + inode->i_ino = 1; + inode->i_mode = S_IFDIR; + inode->i_op = &simple_dir_inode_operations; + inode->i_fop = &simple_dir_operations; + inode->i_atime = inode->i_mtime = current_time(inode); + inode_set_ctime_current(inode); + /* directory inodes start off with i_nlink == 2 (for "." 
entry) */ + inc_nlink(inode); + + dentry = d_make_root(inode); + if (!dentry) + return -ENOMEM; + sb->s_root = dentry; + + return 0; +} + +static int pkernfs_get_tree(struct fs_context *fc) +{ + return get_tree_nodev(fc, pkernfs_fill_super); +} + +static const struct fs_context_operations pkernfs_context_ops = { + .get_tree = pkernfs_get_tree, +}; + +static int pkernfs_init_fs_context(struct fs_context *const fc) +{ + fc->ops = &pkernfs_context_ops; + return 0; +} + +static struct file_system_type pkernfs_fs_type = { + .owner = THIS_MODULE, + .name = "pkernfs", + .init_fs_context = pkernfs_init_fs_context, + .kill_sb = kill_litter_super, + .fs_flags = FS_USERNS_MOUNT, +}; + +static int __init pkernfs_init(void) +{ + int ret; + + ret = register_filesystem(&pkernfs_fs_type); + return ret; +} + +/** + * Format: pkernfs=: + * Just like: memmap=nn[KMG]!ss[KMG] + */ +static int __init parse_pkernfs_extents(char *p) +{ + pkernfs_size = memparse(p, &p); + p++; /* Skip over ! char */ + pkernfs_base = memparse(p, &p); + return 0; +} + +early_param("pkernfs", parse_pkernfs_extents); + +MODULE_ALIAS_FS("pkernfs"); +module_init(pkernfs_init); diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h new file mode 100644 index 000000000000..bd1e2a6fd336 --- /dev/null +++ b/fs/pkernfs/pkernfs.h @@ -0,0 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#define PKERNFS_MAGIC_NUMBER 0x706b65726e6673 +struct pkernfs_sb { + unsigned long magic_number; +};
From patchwork Mon Feb 5 12:01:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 196775 From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 02/18] pkernfs: Add persistent inodes hooked into directies Date: Mon, 5 Feb 2024 12:01:47 +0000 Message-ID: <20240205120203.60312-3-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org
Add the ability to create inodes for files and directories inside directories. Inodes are persistent in the in-memory filesystem; the second 2 MiB is used as an "inode store."
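As an illustration of the inode store described in this commit message, here is a small userspace sketch. It assumes a simplified inode record and a heap buffer standing in for the second 2 MiB of donated memory. All demo_* names are hypothetical and this is not the kernel code; it only shows 1-based indexing into the array and a free list threaded through sibling_ino, with the head of that list kept where the super block would hold it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_STORE_BYTES (2UL << 20)	/* one 2 MiB region */

/* Simplified, hypothetical inode record (not the kernel layout). */
struct demo_inode {
	unsigned long sibling_ino;	/* next inode in the list, 0 ends it */
	unsigned long child_ino;	/* first inode in a directory */
	char filename[255];
};

struct demo_fs {
	unsigned long next_free_ino;	/* head of the free list */
	struct demo_inode *store;	/* stands in for the inode store */
};

/* Inode numbers start at 1, so subtract 1 to get the array index. */
struct demo_inode *demo_get(struct demo_fs *fs, unsigned long ino)
{
	assert(ino >= 1);
	return &fs->store[ino - 1];
}

/* Zero the store, chain every inode to the next one, reserve inode 1. */
int demo_init(struct demo_fs *fs)
{
	unsigned long count = DEMO_STORE_BYTES / sizeof(struct demo_inode);
	unsigned long ino;

	fs->store = calloc(count, sizeof(struct demo_inode));
	if (!fs->store)
		return -1;

	for (ino = 1; ino < count; ino++)
		demo_get(fs, ino)->sibling_ino = ino + 1;

	/* Inode 1 plays the root directory; the free list starts at 2. */
	demo_get(fs, 1)->sibling_ino = 0;
	fs->next_free_ino = 2;
	return 0;
}

/* Pop the head of the free list; 0 means the store is exhausted. */
unsigned long demo_alloc(struct demo_fs *fs)
{
	unsigned long ino = fs->next_free_ino;

	if (ino)
		fs->next_free_ino = demo_get(fs, ino)->sibling_ino;
	return ino;
}

/* Zero the inode and push it back as the new head of the free list. */
void demo_free(struct demo_fs *fs, unsigned long ino)
{
	struct demo_inode *inode = demo_get(fs, ino);

	memset(inode, 0, sizeof(*inode));
	inode->sibling_ino = fs->next_free_ino;
	fs->next_free_ino = ino;
}
```

Allocation pops the head of the free list and freeing pushes the inode back, so in this sketch a freed inode number is the next one to be reused.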
The inode store is one big array of struct pkernfs_inodes and they use a linked list to point to the next sibling inode or, in the case of a directory, the child inode, which is the first inode in that directory. Free inodes are similarly maintained in a linked list with the first free inode being pointed to by the super block. Directory file_operations are added to support iterating through the content of a directory. Similarly, inode operations are added to support creating a file inside a directory. This allocates the next free inode and makes it the head of the "child inode" linked list for the directory. Unlink is implemented to remove an inode from the linked list. This is a bit finicky as it is done differently depending on whether the inode is the first child of a directory or somewhere later in the linked list.
--- fs/pkernfs/Makefile | 2 +- fs/pkernfs/dir.c | 43 +++++++++++++ fs/pkernfs/inode.c | 148 +++++++++++++++++++++++++++++++++++++++++++ fs/pkernfs/pkernfs.c | 13 ++-- fs/pkernfs/pkernfs.h | 34 ++++++++++ 5 files changed, 234 insertions(+), 6 deletions(-) create mode 100644 fs/pkernfs/dir.c create mode 100644 fs/pkernfs/inode.c diff --git a/fs/pkernfs/Makefile b/fs/pkernfs/Makefile index 17258cb77f58..0a66e98bda07 100644 --- a/fs/pkernfs/Makefile +++ b/fs/pkernfs/Makefile @@ -3,4 +3,4 @@ # Makefile for persistent kernel filesystem # -obj-$(CONFIG_PKERNFS_FS) += pkernfs.o +obj-$(CONFIG_PKERNFS_FS) += pkernfs.o inode.o dir.o diff --git a/fs/pkernfs/dir.c b/fs/pkernfs/dir.c new file mode 100644 index 000000000000..b10ce745f19d --- /dev/null +++ b/fs/pkernfs/dir.c @@ -0,0 +1,43 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "pkernfs.h" + +static int pkernfs_dir_iterate(struct file *dir, struct dir_context *ctx) +{ + struct pkernfs_inode *pkernfs_inode; + struct super_block *sb = dir->f_inode->i_sb; + + /* Indication from previous invoke that there's no more to iterate. 
*/ + if (ctx->pos == -1) + return 0; + + if (!dir_emit_dots(dir, ctx)) + return 0; + + /* + * Just emitted this dir; go to dir contents. Use pos to smuggle + * the next inode number to emit across iterations. + * -1 indicates no valid inode. Can't use 0 because first loop has pos=0 + */ + if (ctx->pos == 2) { + ctx->pos = pkernfs_get_persisted_inode(sb, dir->f_inode->i_ino)->child_ino; + /* Empty dir case. */ + if (ctx->pos == 0) + ctx->pos = -1; + } + + while (ctx->pos > 1) { + pkernfs_inode = pkernfs_get_persisted_inode(sb, ctx->pos); + dir_emit(ctx, pkernfs_inode->filename, PKERNFS_FILENAME_LEN, + ctx->pos, DT_UNKNOWN); + ctx->pos = pkernfs_inode->sibling_ino; + if (!ctx->pos) + ctx->pos = -1; + } + return 0; +} + +const struct file_operations pkernfs_dir_fops = { + .owner = THIS_MODULE, + .iterate_shared = pkernfs_dir_iterate, +}; diff --git a/fs/pkernfs/inode.c b/fs/pkernfs/inode.c new file mode 100644 index 000000000000..f6584c8b8804 --- /dev/null +++ b/fs/pkernfs/inode.c @@ -0,0 +1,148 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "pkernfs.h" +#include + +const struct inode_operations pkernfs_dir_inode_operations; + +struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int ino) +{ + /* + * Inode index starts at 1, so -1 to get memory index. 
+ */ + return ((struct pkernfs_inode *) (pkernfs_mem + PMD_SIZE)) + ino - 1; +} + +struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino) +{ + struct inode *inode = iget_locked(sb, ino); + + /* If this inode is cached it is already populated; just return */ + if (!(inode->i_state & I_NEW)) + return inode; + inode->i_op = &pkernfs_dir_inode_operations; + inode->i_sb = sb; + inode->i_mode = S_IFREG; + unlock_new_inode(inode); + return inode; +} + +static unsigned long pkernfs_allocate_inode(struct super_block *sb) +{ + + unsigned long next_free_ino; + struct pkernfs_sb *psb = (struct pkernfs_sb *) pkernfs_mem; + + next_free_ino = psb->next_free_ino; + if (!next_free_ino) + return 0; /* 0 is never a valid inode number; the caller maps it to -ENOMEM */ + psb->next_free_ino = + pkernfs_get_persisted_inode(sb, next_free_ino)->sibling_ino; + return next_free_ino; +} + +/* + * Zeroes the inode and makes it the head of the free list. + */ +static void pkernfs_free_inode(struct super_block *sb, unsigned long ino) +{ + struct pkernfs_sb *psb = (struct pkernfs_sb *) pkernfs_mem; + struct pkernfs_inode *inode = pkernfs_get_persisted_inode(sb, ino); + + memset(inode, 0, sizeof(struct pkernfs_inode)); + inode->sibling_ino = psb->next_free_ino; + psb->next_free_ino = ino; +} + +void pkernfs_initialise_inode_store(struct super_block *sb) +{ + /* Inode store is a PMD sized (ie: 2 MiB) page */ + memset(pkernfs_get_persisted_inode(sb, 1), 0, PMD_SIZE); + /* Point each inode to the next one; linked-list initialisation. 
*/ + for (unsigned long ino = 2; ino * sizeof(struct pkernfs_inode) < PMD_SIZE; ino++) + pkernfs_get_persisted_inode(sb, ino - 1)->sibling_ino = ino; +} + +static int pkernfs_create(struct mnt_idmap *id, struct inode *dir, + struct dentry *dentry, umode_t mode, bool excl) +{ + unsigned long free_inode; + struct pkernfs_inode *pkernfs_inode; + struct inode *vfs_inode; + + free_inode = pkernfs_allocate_inode(dir->i_sb); + if (free_inode <= 0) + return -ENOMEM; + + pkernfs_inode = pkernfs_get_persisted_inode(dir->i_sb, free_inode); + pkernfs_inode->sibling_ino = pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino; + pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino = free_inode; + strscpy(pkernfs_inode->filename, dentry->d_name.name, PKERNFS_FILENAME_LEN); + pkernfs_inode->flags = PKERNFS_INODE_FLAG_FILE; + + vfs_inode = pkernfs_inode_get(dir->i_sb, free_inode); + d_instantiate(dentry, vfs_inode); + return 0; +} + +static struct dentry *pkernfs_lookup(struct inode *dir, + struct dentry *dentry, + unsigned int flags) +{ + struct pkernfs_inode *pkernfs_inode; + unsigned long ino; + + pkernfs_inode = pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino); + ino = pkernfs_inode->child_ino; + while (ino) { + pkernfs_inode = pkernfs_get_persisted_inode(dir->i_sb, ino); + if (!strncmp(pkernfs_inode->filename, dentry->d_name.name, PKERNFS_FILENAME_LEN)) { + d_add(dentry, pkernfs_inode_get(dir->i_sb, ino)); + break; + } + ino = pkernfs_inode->sibling_ino; + } + return NULL; +} + +static int pkernfs_unlink(struct inode *dir, struct dentry *dentry) +{ + unsigned long ino; + struct pkernfs_inode *inode; + + ino = pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino; + + /* Special case for first file in dir */ + if (ino == dentry->d_inode->i_ino) { + pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino = + pkernfs_get_persisted_inode(dir->i_sb, dentry->d_inode->i_ino)->sibling_ino; + pkernfs_free_inode(dir->i_sb, ino); + return 0; + } + + /* + 
* Although we know exactly the inode to free, because we maintain only + * a singly linked list we need to scan for it to find the previous + * element so its "next" pointer can be updated. + */ + while (ino) { + inode = pkernfs_get_persisted_inode(dir->i_sb, ino); + /* We've found the one pointing to the one we want to delete */ + if (inode->sibling_ino == dentry->d_inode->i_ino) { + inode->sibling_ino = + pkernfs_get_persisted_inode(dir->i_sb, + dentry->d_inode->i_ino)->sibling_ino; + pkernfs_free_inode(dir->i_sb, dentry->d_inode->i_ino); + break; + } + ino = pkernfs_get_persisted_inode(dir->i_sb, ino)->sibling_ino; + } + + return 0; +} + +const struct inode_operations pkernfs_dir_inode_operations = { + .create = pkernfs_create, + .lookup = pkernfs_lookup, + .unlink = pkernfs_unlink, +}; diff --git a/fs/pkernfs/pkernfs.c b/fs/pkernfs/pkernfs.c index 4c476ddc35b6..518c610e3877 100644 --- a/fs/pkernfs/pkernfs.c +++ b/fs/pkernfs/pkernfs.c @@ -8,7 +8,7 @@ #include static phys_addr_t pkernfs_base, pkernfs_size; -static void *pkernfs_mem; +void *pkernfs_mem; static const struct super_operations pkernfs_super_ops = { }; static int pkernfs_fill_super(struct super_block *sb, struct fs_context *fc) @@ -24,23 +24,26 @@ static int pkernfs_fill_super(struct super_block *sb, struct fs_context *fc) pr_info("pkernfs: Restoring from super block\n"); } else { pr_info("pkernfs: Clean super block; initialising\n"); + pkernfs_initialise_inode_store(sb); psb->magic_number = PKERNFS_MAGIC_NUMBER; + pkernfs_get_persisted_inode(sb, 1)->flags = PKERNFS_INODE_FLAG_DIR; + strscpy(pkernfs_get_persisted_inode(sb, 1)->filename, ".", PKERNFS_FILENAME_LEN); + psb->next_free_ino = 2; } sb->s_op = &pkernfs_super_ops; - inode = new_inode(sb); + inode = pkernfs_inode_get(sb, 1); if (!inode) return -ENOMEM; - inode->i_ino = 1; inode->i_mode = S_IFDIR; - inode->i_op = &simple_dir_inode_operations; - inode->i_fop = &simple_dir_operations; + inode->i_fop = &pkernfs_dir_fops; inode->i_atime = 
inode->i_mtime = current_time(inode); inode_set_ctime_current(inode); /* directory inodes start off with i_nlink == 2 (for "." entry) */ inc_nlink(inode); + inode_init_owner(&nop_mnt_idmap, inode, NULL, inode->i_mode); dentry = d_make_root(inode); if (!dentry) diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h index bd1e2a6fd336..192e089b3151 100644 --- a/fs/pkernfs/pkernfs.h +++ b/fs/pkernfs/pkernfs.h @@ -1,6 +1,40 @@ /* SPDX-License-Identifier: GPL-2.0-only */ +#include + #define PKERNFS_MAGIC_NUMBER 0x706b65726e6673 +#define PKERNFS_FILENAME_LEN 255 + +extern void *pkernfs_mem; + struct pkernfs_sb { unsigned long magic_number; + /* Inode number */ + unsigned long next_free_ino; }; + +// If neither of these are set the inode is not in use. +#define PKERNFS_INODE_FLAG_FILE (1 << 0) +#define PKERNFS_INODE_FLAG_DIR (1 << 1) +struct pkernfs_inode { + int flags; + /* + * Points to next inode in the same directory, or + * 0 if last file in directory. + */ + unsigned long sibling_ino; + /* + * If this inode is a directory, this points to the + * first inode *in* that directory. 
+ */ + unsigned long child_ino; + char filename[PKERNFS_FILENAME_LEN]; + int mappings_block; + int num_mappings; +}; + +void pkernfs_initialise_inode_store(struct super_block *sb); +struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino); +struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int ino); + +extern const struct file_operations pkernfs_dir_fops; From patchwork Mon Feb 5 12:01:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 196773
From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . 
Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 03/18] pkernfs: Define an allocator for persistent pages Date: Mon, 5 Feb 2024 12:01:48 +0000 Message-ID: <20240205120203.60312-4-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> X-Mailing-List: linux-kernel@vger.kernel.org This introduces the concept of a bitmap allocator for pages from the pkernfs filesystem. The allocation bitmap is stored in the second half of the first page. This imposes an artificial limit on the maximum size of the filesystem; this needs to be made extensible. So far the allocations can only be zeroed. The next commit will add the ability to allocate blocks and use them. --- fs/pkernfs/Makefile | 2 +- fs/pkernfs/allocator.c | 27 +++++++++++++++++++++++++++ fs/pkernfs/pkernfs.c | 1 + fs/pkernfs/pkernfs.h | 1 + 4 files changed, 30 insertions(+), 1 deletion(-) create mode 100644 fs/pkernfs/allocator.c diff --git a/fs/pkernfs/Makefile b/fs/pkernfs/Makefile index 0a66e98bda07..d8b92a74fbc6 100644 --- a/fs/pkernfs/Makefile +++ b/fs/pkernfs/Makefile @@ -3,4 +3,4 @@ # Makefile for persistent kernel filesystem # -obj-$(CONFIG_PKERNFS_FS) += pkernfs.o inode.o dir.o +obj-$(CONFIG_PKERNFS_FS) += pkernfs.o inode.o allocator.o dir.o diff --git a/fs/pkernfs/allocator.c b/fs/pkernfs/allocator.c new file mode 100644 index 000000000000..1d4aac9c4545 --- /dev/null +++ b/fs/pkernfs/allocator.c @@ -0,0 +1,27 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "pkernfs.h" + +/** + * For allocating blocks from the pkernfs filesystem.
+ * The first two blocks are special: + * - the first block is persistent filesystem metadata and + * a bitmap of allocated blocks + * - the second block is an array of persisted inodes; the + * inode store. + */ + +void *pkernfs_allocations_bitmap(struct super_block *sb) +{ + /* Allocations is 2nd half of first block */ + return pkernfs_mem + (1 << 20); +} + +void pkernfs_zero_allocations(struct super_block *sb) +{ + memset(pkernfs_allocations_bitmap(sb), 0, (1 << 20)); + /* First page is persisted super block and allocator bitmap */ + set_bit(0, pkernfs_allocations_bitmap(sb)); + /* Second page is inode store */ + set_bit(1, pkernfs_allocations_bitmap(sb)); +} diff --git a/fs/pkernfs/pkernfs.c b/fs/pkernfs/pkernfs.c index 518c610e3877..199c2c648bca 100644 --- a/fs/pkernfs/pkernfs.c +++ b/fs/pkernfs/pkernfs.c @@ -25,6 +25,7 @@ static int pkernfs_fill_super(struct super_block *sb, struct fs_context *fc) } else { pr_info("pkernfs: Clean super block; initialising\n"); pkernfs_initialise_inode_store(sb); + pkernfs_zero_allocations(sb); psb->magic_number = PKERNFS_MAGIC_NUMBER; pkernfs_get_persisted_inode(sb, 1)->flags = PKERNFS_INODE_FLAG_DIR; strscpy(pkernfs_get_persisted_inode(sb, 1)->filename, ".", PKERNFS_FILENAME_LEN); diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h index 192e089b3151..4655780f31f2 100644 --- a/fs/pkernfs/pkernfs.h +++ b/fs/pkernfs/pkernfs.h @@ -34,6 +34,7 @@ struct pkernfs_inode { }; void pkernfs_initialise_inode_store(struct super_block *sb); +void pkernfs_zero_allocations(struct super_block *sb); struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino); struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int ino); From patchwork Mon Feb 5 12:01:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 196777
From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 04/18] pkernfs: support file truncation Date: Mon, 5 Feb 2024 12:01:49 +0000 Message-ID: <20240205120203.60312-5-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> X-Mailing-List: linux-kernel@vger.kernel.org In the previous commit a block allocator was added.
Now use that block allocator to allocate blocks for files when ftruncate is run on them. To do that, an inode_operations is added on the file inodes with a setattr callback handling the ATTR_SIZE attribute. When this is invoked, pages are allocated and their indexes are put into a mappings block. The mappings block is an array with the index being the file offset block and the value at that index being the pkernfs block backing that file offset. --- fs/pkernfs/Makefile | 2 +- fs/pkernfs/allocator.c | 24 +++++++++++++++++++ fs/pkernfs/file.c | 53 ++++++++++++++++++++++++++++++++++++++++++ fs/pkernfs/inode.c | 27 ++++++++++++++++++--- fs/pkernfs/pkernfs.h | 7 ++++++ 5 files changed, 109 insertions(+), 4 deletions(-) create mode 100644 fs/pkernfs/file.c diff --git a/fs/pkernfs/Makefile b/fs/pkernfs/Makefile index d8b92a74fbc6..e41f06cc490f 100644 --- a/fs/pkernfs/Makefile +++ b/fs/pkernfs/Makefile @@ -3,4 +3,4 @@ # Makefile for persistent kernel filesystem # -obj-$(CONFIG_PKERNFS_FS) += pkernfs.o inode.o allocator.o dir.o +obj-$(CONFIG_PKERNFS_FS) += pkernfs.o inode.o allocator.o dir.o file.o diff --git a/fs/pkernfs/allocator.c b/fs/pkernfs/allocator.c index 1d4aac9c4545..3905ce92b4a9 100644 --- a/fs/pkernfs/allocator.c +++ b/fs/pkernfs/allocator.c @@ -25,3 +25,27 @@ void pkernfs_zero_allocations(struct super_block *sb) /* Second page is inode store */ set_bit(1, pkernfs_allocations_bitmap(sb)); } + +/* + * Allocs one 2 MiB block, and returns the block index. + * Index is 2 MiB chunk index. + */ +unsigned long pkernfs_alloc_block(struct super_block *sb) +{ + unsigned long free_bit; + + /* Allocations is 2nd half of first page */ + void *allocations_mem = pkernfs_allocations_bitmap(sb); + free_bit = bitmap_find_next_zero_area(allocations_mem, + PMD_SIZE / 2, /* Size */ + 0, /* Start */ + 1, /* Number of zeroed bits to look for */ + 0); /* Alignment mask - none required.
*/ + bitmap_set(allocations_mem, free_bit, 1); + return free_bit; +} + +void *pkernfs_addr_for_block(struct super_block *sb, int block_idx) +{ + return pkernfs_mem + (block_idx * PMD_SIZE); +} diff --git a/fs/pkernfs/file.c b/fs/pkernfs/file.c new file mode 100644 index 000000000000..27a637423178 --- /dev/null +++ b/fs/pkernfs/file.c @@ -0,0 +1,53 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "pkernfs.h" + +static int truncate(struct inode *inode, loff_t newsize) +{ + unsigned long free_block; + struct pkernfs_inode *pkernfs_inode; + unsigned long *mappings; + + pkernfs_inode = pkernfs_get_persisted_inode(inode->i_sb, inode->i_ino); + mappings = (unsigned long *)pkernfs_addr_for_block(inode->i_sb, + pkernfs_inode->mappings_block); + i_size_write(inode, newsize); + for (int block_idx = 0; block_idx * PMD_SIZE < newsize; ++block_idx) { + free_block = pkernfs_alloc_block(inode->i_sb); + if (free_block <= 0) + /* TODO: roll back allocations. */ + return -ENOMEM; + *(mappings + block_idx) = free_block; + ++pkernfs_inode->num_mappings; + } + return 0; +} + +static int inode_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *iattr) +{ + struct inode *inode = dentry->d_inode; + int error; + + error = setattr_prepare(idmap, dentry, iattr); + if (error) + return error; + + if (iattr->ia_valid & ATTR_SIZE) { + error = truncate(inode, iattr->ia_size); + if (error) + return error; + } + setattr_copy(idmap, inode, iattr); + mark_inode_dirty(inode); + return 0; +} + +const struct inode_operations pkernfs_file_inode_operations = { + .setattr = inode_setattr, + .getattr = simple_getattr, +}; + +const struct file_operations pkernfs_file_fops = { + .owner = THIS_MODULE, + .iterate_shared = NULL, +}; diff --git a/fs/pkernfs/inode.c b/fs/pkernfs/inode.c index f6584c8b8804..7fe4e7b220cc 100644 --- a/fs/pkernfs/inode.c +++ b/fs/pkernfs/inode.c @@ -15,14 +15,28 @@ struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int in struct inode 
*pkernfs_inode_get(struct super_block *sb, unsigned long ino) { + struct pkernfs_inode *pkernfs_inode; struct inode *inode = iget_locked(sb, ino); /* If this inode is cached it is already populated; just return */ if (!(inode->i_state & I_NEW)) return inode; - inode->i_op = &pkernfs_dir_inode_operations; + pkernfs_inode = pkernfs_get_persisted_inode(sb, ino); inode->i_sb = sb; - inode->i_mode = S_IFREG; + if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_DIR) { + inode->i_op = &pkernfs_dir_inode_operations; + inode->i_mode = S_IFDIR; + } else { + inode->i_op = &pkernfs_file_inode_operations; + inode->i_mode = S_IFREG; + inode->i_fop = &pkernfs_file_fops; + } + + inode->i_atime = inode->i_mtime = current_time(inode); + inode_set_ctime_current(inode); + set_nlink(inode, 1); + + /* Switch based on file type */ unlock_new_inode(inode); return inode; } @@ -79,6 +93,8 @@ static int pkernfs_create(struct mnt_idmap *id, struct inode *dir, pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino = free_inode; strscpy(pkernfs_inode->filename, dentry->d_name.name, PKERNFS_FILENAME_LEN); pkernfs_inode->flags = PKERNFS_INODE_FLAG_FILE; + pkernfs_inode->mappings_block = pkernfs_alloc_block(dir->i_sb); + memset(pkernfs_addr_for_block(dir->i_sb, pkernfs_inode->mappings_block), 0, (2 << 20)); vfs_inode = pkernfs_inode_get(dir->i_sb, free_inode); d_instantiate(dentry, vfs_inode); @@ -90,6 +106,7 @@ static struct dentry *pkernfs_lookup(struct inode *dir, unsigned int flags) { struct pkernfs_inode *pkernfs_inode; + struct inode *vfs_inode; unsigned long ino; pkernfs_inode = pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino); @@ -97,7 +114,10 @@ static struct dentry *pkernfs_lookup(struct inode *dir, while (ino) { pkernfs_inode = pkernfs_get_persisted_inode(dir->i_sb, ino); if (!strncmp(pkernfs_inode->filename, dentry->d_name.name, PKERNFS_FILENAME_LEN)) { - d_add(dentry, pkernfs_inode_get(dir->i_sb, ino)); + vfs_inode = pkernfs_inode_get(dir->i_sb, ino); + mark_inode_dirty(dir); + 
dir->i_atime = current_time(dir); + d_add(dentry, vfs_inode); break; } ino = pkernfs_inode->sibling_ino; @@ -146,3 +166,4 @@ const struct inode_operations pkernfs_dir_inode_operations = { .lookup = pkernfs_lookup, .unlink = pkernfs_unlink, }; + diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h index 4655780f31f2..8b4fee8c5b2e 100644 --- a/fs/pkernfs/pkernfs.h +++ b/fs/pkernfs/pkernfs.h @@ -34,8 +34,15 @@ struct pkernfs_inode { }; void pkernfs_initialise_inode_store(struct super_block *sb); + void pkernfs_zero_allocations(struct super_block *sb); +unsigned long pkernfs_alloc_block(struct super_block *sb); struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino); +void *pkernfs_addr_for_block(struct super_block *sb, int block_idx); + struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int ino); + extern const struct file_operations pkernfs_dir_fops; +extern const struct file_operations pkernfs_file_fops; +extern const struct inode_operations pkernfs_file_inode_operations; From patchwork Mon Feb 5 12:01:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 196776
From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 06/18] init: Add liveupdate cmdline param Date: Mon, 5 Feb 2024 12:01:51 +0000 Message-ID: <20240205120203.60312-7-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> X-Mailing-List: linux-kernel@vger.kernel.org This will allow other subsystems to know when we're doing a live update (LU) and hence when they should be restoring rather than reinitialising state.
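To illustrate the restore-versus-reinitialise decision described above, here is a minimal userspace sketch in C. It is not kernel code and not part of this patch: `cmdline_has_param`, `struct example_state` and `example_subsys_init` are hypothetical names introduced only for illustration, and the token matching is a simplification that is not claimed to mirror how the kernel parses `early_param` entries. Only the `liveupdate` flag name comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the global flag this patch adds in init/main.c. */
bool liveupdate;

/*
 * Hypothetical helper (not a kernel API): report whether `param` appears on
 * `cmdline` as a whole space-separated token, optionally followed by "=value".
 * Assumes `param` itself contains no spaces.
 */
bool cmdline_has_param(const char *cmdline, const char *param)
{
	size_t len = strlen(param);
	const char *p = cmdline;

	assert(len > 0);
	while ((p = strstr(p, param)) != NULL) {
		bool at_start = (p == cmdline) || (p[-1] == ' ');
		bool at_end = (p[len] == '\0') || (p[len] == ' ') ||
			      (p[len] == '=');

		if (at_start && at_end)
			return true;
		p++;	/* keep scanning after a partial match */
	}
	return false;
}

/* Hypothetical subsystem state that should survive a live update. */
struct example_state {
	int generation;
};

/*
 * Hypothetical init path: restore from `persisted` when a live update is in
 * progress and persisted state exists, otherwise start from scratch.
 */
const char *example_subsys_init(struct example_state *st,
				const struct example_state *persisted)
{
	assert(st != NULL);
	if (liveupdate && persisted != NULL) {
		*st = *persisted;
		return "restored";
	}
	st->generation = 0;
	return "initialised";
}
```

In this sketch a caller would set the flag once, for example `liveupdate = cmdline_has_param(args, "liveupdate");` with `args` being whatever command-line string is available, before any subsystem init path consults it.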
--- include/linux/init.h | 1 + init/main.c | 10 ++++++++++ 2 files changed, 11 insertions(+) diff --git a/include/linux/init.h b/include/linux/init.h index 266c3e1640d4..d7c68c7bfaf0 100644 --- a/include/linux/init.h +++ b/include/linux/init.h @@ -146,6 +146,7 @@ extern int do_one_initcall(initcall_t fn); extern char __initdata boot_command_line[]; extern char *saved_command_line; extern unsigned int saved_command_line_len; +extern bool liveupdate; extern unsigned int reset_devices; /* used by init/main.c */ diff --git a/init/main.c b/init/main.c index e24b0780fdff..7807a56c3473 100644 --- a/init/main.c +++ b/init/main.c @@ -165,6 +165,16 @@ static char *ramdisk_execute_command = "/init"; bool static_key_initialized __read_mostly; EXPORT_SYMBOL_GPL(static_key_initialized); +bool liveupdate __read_mostly; +EXPORT_SYMBOL(liveupdate); + +static int __init set_liveupdate(char *param) +{ + liveupdate = true; + return 0; +} +early_param("liveupdate", set_liveupdate); + /* * If set, this is an indication to the drivers that reset the underlying * device before going ahead with the initialization otherwise driver might From patchwork Mon Feb 5 12:01:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 196799 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:168b:b0:106:860b:bbdd with SMTP id ma11csp841123dyb; Mon, 5 Feb 2024 04:32:03 -0800 (PST) X-Google-Smtp-Source: AGHT+IElkk9jmBjTakdYUZUl4GLWyoYo8W3iOguTQc07I3WMt9CK+gBsCsQZ5xrVl5C7JOmLAzHY X-Received: by 2002:a05:6402:3c1:b0:55f:fbaa:5d99 with SMTP id t1-20020a05640203c100b0055ffbaa5d99mr4326676edw.42.1707136323696; Mon, 05 Feb 2024 04:32:03 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707136323; cv=pass; d=google.com; s=arc-20160816; b=MO4nx5zHPFsNgBjseRKZMwKZRRhI+cKSSN+QjEHyCzPbv9W+T13xIYwXeBSu9w3+Aa QjjOu1hxEkbH70z6m6htARBJbXmyTFm+BjhjaIGoOcPF7Wlb5d+NsQYmu5Yd4pCQ65gb 
E3J6rrPixLdIAVNgGChWOs5oWdxNIQdHH9zagyRPxNLnSVa3HHDyjpv82eFhYGqS1Lqg u47R5XESI4mP7SBokONZ4jWX6Oge7l2Ncxe0YJyiYDG4Bw8NaugM/qHjP3wDYorZxJft AnfBQctzPOaEDNNtEU+48nUe/naWA0qhQ6mPskV/p5jl/fSYLK20Jyc1dHyPWpisq4zM LBZA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=TzDhoCjRqYjwB6Ud3BxBgq+3tlmLodP3ZHF9TRx+87E=; fh=dl6rtxuvJFZMYIjJMiIBxZAlbBo78oPK8azl5YNVrxI=; b=tYJAW8YeXp2QeuC8zW5mkuXEPDrPtKgNAszJR45FJf1oUk5BZwoCcY8WhyEo6oyvUC rIjcQZrjQXw6/o68a+Td5w+oqKK/pVUSFXuqRlKgjlEdLZrLWSRsayiMxwH8Lq6pucSh fvO9n5f0zuGsP43ppE6lvssqiOlD4Jla1QozXv2SCyiIRb3m1RSkSYzSgoSAMT9jUNES TuFF4o7NKUFDDTujGxuXuIvmsQheIsT5vOdL+4WnjiFR3Bg3mMXvU6FCt4ZneuYJxkFo oCCmepQIy9DMVERkiuGtw/hITh62jvlEkKY64tXcpPH0EGGdo6UfVafGkRg5RHFiNLEW okHw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=oVzsA6nA; arc=pass (i=1 spf=pass spfdomain=amazon.com dkim=pass dkdomain=amazon.com dmarc=pass fromdomain=amazon.com); spf=pass (google.com: domain of linux-kernel+bounces-52550-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-52550-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com X-Forwarded-Encrypted: i=1; AJvYcCVOh1eAHt6S0ABNnhl7dbIDn7QzFhYPsJbCdm5jd9isQhVZHsdi3+4aLgYsefiT2mrkUuharK6p7zsDQ5ZXSJkdL40g1Q== Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. 
[147.75.80.249]) by mx.google.com with ESMTPS id z4-20020aa7cf84000000b0055ef9a173fdsi3796003edx.546.2024.02.05.04.32.03 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Feb 2024 04:32:03 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-52550-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=oVzsA6nA; arc=pass (i=1 spf=pass spfdomain=amazon.com dkim=pass dkdomain=amazon.com dmarc=pass fromdomain=amazon.com); spf=pass (google.com: domain of linux-kernel+bounces-52550-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-52550-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 588A91F24D92 for ; Mon, 5 Feb 2024 12:06:54 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 351B11CA91; Mon, 5 Feb 2024 12:04:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="oVzsA6nA" Received: from smtp-fw-9102.amazon.com (smtp-fw-9102.amazon.com [207.171.184.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2AE291C68D; Mon, 5 Feb 2024 12:03:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=207.171.184.29 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707134639; cv=none; 
b=NVgyJDGkp3qdvbci56kmhgp3kQQTNmE1SlYeUhPQ6An0nw1OiJUfajyI+LuuWRs4seyHPOT1hPaK4cku+FoYyfjvfx0g1ekI9xpty28UsZvrdo/npkDUnaHE1h7jSTIhUYKOFdJKpO0N9ue9BLeLIy7RwjpsQKhS2IIH8HwhEZw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707134639; c=relaxed/simple; bh=KJj1B5pipHCNG5+0xARVhv9lBaZP4+Uag/y5HQlG81E=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Ku3DHDTq6tsE9Me3BLFiCXMiicfTsn/bbbVh372Zv7aeiPgMa9T2ukqZSUk0HSIPaWF1wyzEchBWZn4+m1V7HRs8kqQjSxAlp9CX2ZCpRLpLyOGqbCBUn982+rTVmsAOrSQP4/rk9DgUYecEcoO+dqaVMQR9YQgSzYR0YLMtm9c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=oVzsA6nA; arc=none smtp.client-ip=207.171.184.29 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1707134638; x=1738670638; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=TzDhoCjRqYjwB6Ud3BxBgq+3tlmLodP3ZHF9TRx+87E=; b=oVzsA6nAcfsX2N9xvk+a6q/VX6J3iUMxPcWj8ZReekVCI1QcTwQpPgY1 7pufPRraCq/vpoLSpm9GWvVjHX/gwtpQk3b260sTs+RCcqKuFfnY5SWJX KPF0eYZXHHULCJq1p4kiXiEsmaqrWgvxagCOCHboX6GAzuQ7jC2dxTgOG Q=; X-IronPort-AV: E=Sophos;i="6.05,245,1701129600"; d="scan'208";a="394883262" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.214]) by smtp-border-fw-9102.sea19.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2024 12:03:52 +0000 Received: from EX19MTAEUC002.ant.amazon.com [10.0.17.79:26775] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.8.155:2525] with 
esmtp (Farcaster) id 37d56f4d-adef-4cf1-bb52-948bac8bacd3; Mon, 5 Feb 2024 12:03:50 +0000 (UTC) X-Farcaster-Flow-ID: 37d56f4d-adef-4cf1-bb52-948bac8bacd3 Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUC002.ant.amazon.com (10.252.51.181) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:03:46 +0000 Received: from dev-dsk-jgowans-1a-a3faec1f.eu-west-1.amazon.com (172.19.112.191) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:03:40 +0000 From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 07/18] pkernfs: Add file type for IOMMU root pgtables Date: Mon, 5 Feb 2024 12:01:52 +0000 Message-ID: <20240205120203.60312-8-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D033UWC001.ant.amazon.com (10.13.139.218) To EX19D014EUC004.ant.amazon.com (10.252.51.182) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790062177644538282 X-GMAIL-MSGID: 1790062177644538282

So far pkernfs is able to hold regular files for userspace to mmap and in which to store persisted data.

Now begin the IOMMU integration for persistent IOMMU pgtables. A new type of inode is created for an IOMMU data directory. A new type of inode is also created for a file which holds the IOMMU root pgtables. The inode types are specified by flags on the inodes.
Different inode ops are also registered on the IOMMU pgtables file to ensure that userspace can't access it. These IOMMU directory and data inodes are created lazily: pkernfs_alloc_iommu_root_pgtables() scans for them and returns them if they already exist (i.e. after kexec) or creates them if they don't (i.e. on cold boot).

The data in the IOMMU root pgtables file needs to be accessible early in system boot: before filesystems are initialised and before anything is mounted. To support this, the pkernfs initialisation code is split out into a pkernfs_init() function which is responsible for making the pkernfs memory available.

Here the filesystem abstraction starts to creak: the pkernfs functions responsible for the IOMMU pgtables files manipulate persisted inodes directly. It may be preferable to somehow get pkernfs mounted early in system boot, before it's needed by the IOMMU, so that filesystem paths can be used, but it is unclear if that's possible.

The need for super blocks in the pkernfs functions has been limited so far: super blocks are barely used because the pkernfs extents are stored as global variables in pkernfs.c. Now NULLs are actually supplied to functions which take a super block. This is also not pretty; this code should probably rather plumb through some sort of wrapper around the persisted super block, which would allow supporting multiple mount points.

Additionally, the memory backing the IOMMU root pgtable file is mapped into the direct map by registering it as a device. This is needed because the IOMMU driver does phys_to_virt in a few places when traversing the pgtables, so the direct map virtual address should be populated. The alternative would be to replace all of the phys_to_virt calls in the IOMMU driver with wrappers which understand if the phys_addr is part of a pkernfs file.

The next commit will use this pkernfs file for root pgtables.
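
As a rough illustration of the lazy find-or-create scan described above, here is a self-contained userspace sketch. It is a simplified model, not the patch's code: `struct model_inode`, `store`, `find_child()` and `find_or_create_child()` are hypothetical names, the sizes are arbitrary, and a static array stands in for the persisted memory that survives kexec in the real design.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 32           /* illustrative; the patch uses PKERNFS_FILENAME_LEN */
#define NUM_INODES 16         /* illustrative store size */
#define FLAG_IOMMU_DIR (1 << 2)

/*
 * Simplified model of a persisted inode: the children of a directory form
 * a singly linked list through sibling_ino; inode number 0 means "none".
 */
struct model_inode {
	int flags;
	unsigned long sibling_ino;
	unsigned long child_ino;
	char filename[NAME_LEN];
};

/* Stand-in for the persisted inode store. */
static struct model_inode store[NUM_INODES];
static unsigned long next_free_ino = 2;	/* inode 1 is the root directory */

/* Walk a directory's child list looking for `name`; returns 0 if absent. */
static unsigned long find_child(unsigned long dir_ino, const char *name)
{
	unsigned long ino = store[dir_ino].child_ino;

	while (ino) {
		if (!strncmp(store[ino].filename, name, NAME_LEN))
			return ino;
		ino = store[ino].sibling_ino;
	}
	return 0;
}

/* Return the existing child, or create it at the head of the child list. */
static unsigned long find_or_create_child(unsigned long dir_ino,
					  const char *name, int flags)
{
	unsigned long ino = find_child(dir_ino, name);

	if (ino)
		return ino;		/* already persisted: the "after kexec" case */

	assert(next_free_ino < NUM_INODES);
	ino = next_free_ino++;		/* not found: the "cold boot" case */
	strncpy(store[ino].filename, name, NAME_LEN - 1);
	store[ino].filename[NAME_LEN - 1] = '\0';
	store[ino].flags = flags;
	store[ino].sibling_ino = store[dir_ino].child_ino;
	store[dir_ino].child_ino = ino;
	return ino;
}
```

Calling find_or_create_child(1, "iommu", FLAG_IOMMU_DIR) twice returns the same inode number both times, which is the property the restore path relies on.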
--- fs/pkernfs/Makefile | 2 +- fs/pkernfs/inode.c | 17 +++++-- fs/pkernfs/iommu.c | 98 +++++++++++++++++++++++++++++++++++++++++ fs/pkernfs/pkernfs.c | 38 ++++++++++------ fs/pkernfs/pkernfs.h | 7 +++ include/linux/pkernfs.h | 36 +++++++++++++++ 6 files changed, 181 insertions(+), 17 deletions(-) create mode 100644 fs/pkernfs/iommu.c create mode 100644 include/linux/pkernfs.h diff --git a/fs/pkernfs/Makefile b/fs/pkernfs/Makefile index e41f06cc490f..7f0f7a4cd3a1 100644 --- a/fs/pkernfs/Makefile +++ b/fs/pkernfs/Makefile @@ -3,4 +3,4 @@ # Makefile for persistent kernel filesystem # -obj-$(CONFIG_PKERNFS_FS) += pkernfs.o inode.o allocator.o dir.o file.o +obj-$(CONFIG_PKERNFS_FS) += pkernfs.o inode.o allocator.o dir.o file.o iommu.o diff --git a/fs/pkernfs/inode.c b/fs/pkernfs/inode.c index 7fe4e7b220cc..1d712e0a82a1 100644 --- a/fs/pkernfs/inode.c +++ b/fs/pkernfs/inode.c @@ -25,11 +25,18 @@ struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino) inode->i_sb = sb; if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_DIR) { inode->i_op = &pkernfs_dir_inode_operations; + inode->i_fop = &pkernfs_dir_fops; inode->i_mode = S_IFDIR; - } else { + } else if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_FILE) { inode->i_op = &pkernfs_file_inode_operations; - inode->i_mode = S_IFREG; inode->i_fop = &pkernfs_file_fops; + inode->i_mode = S_IFREG; + } else if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_IOMMU_DIR) { + inode->i_op = &pkernfs_iommu_dir_inode_operations; + inode->i_fop = &pkernfs_dir_fops; + inode->i_mode = S_IFDIR; + } else if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES) { + inode->i_mode = S_IFREG; } inode->i_atime = inode->i_mtime = current_time(inode); @@ -41,7 +48,7 @@ struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino) return inode; } -static unsigned long pkernfs_allocate_inode(struct super_block *sb) +unsigned long pkernfs_allocate_inode(struct super_block *sb) { unsigned long next_free_ino; @@ -167,3 +174,7
@@ const struct inode_operations pkernfs_dir_inode_operations = { .unlink = pkernfs_unlink, }; +const struct inode_operations pkernfs_iommu_dir_inode_operations = { + .lookup = pkernfs_lookup, +}; + diff --git a/fs/pkernfs/iommu.c b/fs/pkernfs/iommu.c new file mode 100644 index 000000000000..5bce8146d7bb --- /dev/null +++ b/fs/pkernfs/iommu.c @@ -0,0 +1,98 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "pkernfs.h" +#include + + +void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region) +{ + unsigned long *mappings_block_vaddr; + unsigned long inode_idx; + struct pkernfs_inode *iommu_pgtables, *iommu_dir = NULL; + int rc; + + pkernfs_init(); + + /* Try find a 'iommu' directory */ + inode_idx = pkernfs_get_persisted_inode(NULL, 1)->child_ino; + while (inode_idx) { + if (!strncmp(pkernfs_get_persisted_inode(NULL, inode_idx)->filename, + "iommu", PKERNFS_FILENAME_LEN)) { + iommu_dir = pkernfs_get_persisted_inode(NULL, inode_idx); + break; + } + inode_idx = pkernfs_get_persisted_inode(NULL, inode_idx)->sibling_ino; + } + + if (!iommu_dir) { + unsigned long root_pgtables_ino = 0; + unsigned long iommu_dir_ino = pkernfs_allocate_inode(NULL); + + iommu_dir = pkernfs_get_persisted_inode(NULL, iommu_dir_ino); + strscpy(iommu_dir->filename, "iommu", PKERNFS_FILENAME_LEN); + iommu_dir->flags = PKERNFS_INODE_FLAG_IOMMU_DIR; + + /* Make this the head of the list. */ + iommu_dir->sibling_ino = pkernfs_get_persisted_inode(NULL, 1)->child_ino; + pkernfs_get_persisted_inode(NULL, 1)->child_ino = iommu_dir_ino; + + /* Add a child file for pgtables. 
*/ + root_pgtables_ino = pkernfs_allocate_inode(NULL); + iommu_pgtables = pkernfs_get_persisted_inode(NULL, root_pgtables_ino); + strscpy(iommu_pgtables->filename, "root-pgtables", PKERNFS_FILENAME_LEN); + iommu_pgtables->sibling_ino = iommu_dir->child_ino; + iommu_dir->child_ino = root_pgtables_ino; + iommu_pgtables->flags = PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES; + iommu_pgtables->mappings_block = pkernfs_alloc_block(NULL); + /* TODO: make alloc zero. */ + memset(pkernfs_addr_for_block(NULL, iommu_pgtables->mappings_block), 0, (2 << 20)); + } else { + inode_idx = iommu_dir->child_ino; + while (inode_idx) { + if (!strncmp(pkernfs_get_persisted_inode(NULL, inode_idx)->filename, + "root-pgtables", PKERNFS_FILENAME_LEN)) { + iommu_pgtables = pkernfs_get_persisted_inode(NULL, inode_idx); + break; + } + inode_idx = pkernfs_get_persisted_inode(NULL, inode_idx)->sibling_ino; + } + } + + /* + * For a pkernfs region block, the "mappings_block" field is still + * just a block index, but that block doesn't actually contain mappings + * it contains the pkernfs_region data + */ + + mappings_block_vaddr = (unsigned long *)pkernfs_addr_for_block(NULL, + iommu_pgtables->mappings_block); + set_bit(0, mappings_block_vaddr); + pkernfs_region->vaddr = mappings_block_vaddr; + pkernfs_region->paddr = pkernfs_base + (iommu_pgtables->mappings_block * PMD_SIZE); + pkernfs_region->bytes = PMD_SIZE; + + dev_set_name(&pkernfs_region->dev, "iommu_root_pgtables"); + rc = device_register(&pkernfs_region->dev); + if (rc) + pr_err("device_register failed: %i\n", rc); + + pkernfs_region->pgmap.range.start = pkernfs_base + + (iommu_pgtables->mappings_block * PMD_SIZE); + pkernfs_region->pgmap.range.end = + pkernfs_region->pgmap.range.start + PMD_SIZE - 1; + pkernfs_region->pgmap.nr_range = 1; + pkernfs_region->pgmap.type = MEMORY_DEVICE_GENERIC; + pkernfs_region->vaddr = + devm_memremap_pages(&pkernfs_region->dev, &pkernfs_region->pgmap); + pkernfs_region->paddr = pkernfs_base + + 
(iommu_pgtables->mappings_block * PMD_SIZE); +} + +void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long paddr) +{ + if (WARN_ON(paddr >= region->paddr + region->bytes)) + return NULL; + if (WARN_ON(paddr < region->paddr)) + return NULL; + return region->vaddr + (paddr - region->paddr); +} diff --git a/fs/pkernfs/pkernfs.c b/fs/pkernfs/pkernfs.c index f010c2d76c76..2e8c4b0a5807 100644 --- a/fs/pkernfs/pkernfs.c +++ b/fs/pkernfs/pkernfs.c @@ -11,12 +11,14 @@ phys_addr_t pkernfs_base, pkernfs_size; void *pkernfs_mem; static const struct super_operations pkernfs_super_ops = { }; -static int pkernfs_fill_super(struct super_block *sb, struct fs_context *fc) +void pkernfs_init(void) { - struct inode *inode; - struct dentry *dentry; + static int inited; struct pkernfs_sb *psb; + if (inited++) + return; + pkernfs_mem = memremap(pkernfs_base, pkernfs_size, MEMREMAP_WB); psb = (struct pkernfs_sb *) pkernfs_mem; @@ -24,13 +26,21 @@ static int pkernfs_fill_super(struct super_block *sb, struct fs_context *fc) pr_info("pkernfs: Restoring from super block\n"); } else { pr_info("pkernfs: Clean super block; initialising\n"); - pkernfs_initialise_inode_store(sb); - pkernfs_zero_allocations(sb); + pkernfs_initialise_inode_store(NULL); + pkernfs_zero_allocations(NULL); psb->magic_number = PKERNFS_MAGIC_NUMBER; - pkernfs_get_persisted_inode(sb, 1)->flags = PKERNFS_INODE_FLAG_DIR; - strscpy(pkernfs_get_persisted_inode(sb, 1)->filename, ".", PKERNFS_FILENAME_LEN); + pkernfs_get_persisted_inode(NULL, 1)->flags = PKERNFS_INODE_FLAG_DIR; + strscpy(pkernfs_get_persisted_inode(NULL, 1)->filename, ".", PKERNFS_FILENAME_LEN); psb->next_free_ino = 2; } +} + +static int pkernfs_fill_super(struct super_block *sb, struct fs_context *fc) +{ + struct inode *inode; + struct dentry *dentry; + + pkernfs_init(); sb->s_op = &pkernfs_super_ops; @@ -77,12 +87,9 @@ static struct file_system_type pkernfs_fs_type = { .fs_flags = FS_USERNS_MOUNT, }; -static int __init 
pkernfs_init(void) +static int __init pkernfs_fs_init(void) { - int ret; - - ret = register_filesystem(&pkernfs_fs_type); - return ret; + return register_filesystem(&pkernfs_fs_type); } /** @@ -97,7 +104,12 @@ static int __init parse_pkernfs_extents(char *p) return 0; } +bool pkernfs_enabled(void) +{ + return !!pkernfs_base; +} + early_param("pkernfs", parse_pkernfs_extents); MODULE_ALIAS_FS("pkernfs"); -module_init(pkernfs_init); +module_init(pkernfs_fs_init); diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h index 1a7aa783a9be..e1b7ae3fe7f1 100644 --- a/fs/pkernfs/pkernfs.h +++ b/fs/pkernfs/pkernfs.h @@ -1,6 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0-only */ #include +#include #define PKERNFS_MAGIC_NUMBER 0x706b65726e6673 #define PKERNFS_FILENAME_LEN 255 @@ -18,6 +19,8 @@ struct pkernfs_sb { // If neither of these are set the inode is not in use. #define PKERNFS_INODE_FLAG_FILE (1 << 0) #define PKERNFS_INODE_FLAG_DIR (1 << 1) +#define PKERNFS_INODE_FLAG_IOMMU_DIR (1 << 2) +#define PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES (1 << 3) struct pkernfs_inode { int flags; /* @@ -31,20 +34,24 @@ struct pkernfs_inode { */ unsigned long child_ino; char filename[PKERNFS_FILENAME_LEN]; + /* Block index for where the mappings live. 
*/ int mappings_block; int num_mappings; }; void pkernfs_initialise_inode_store(struct super_block *sb); +void pkernfs_init(void); void pkernfs_zero_allocations(struct super_block *sb); unsigned long pkernfs_alloc_block(struct super_block *sb); struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino); void *pkernfs_addr_for_block(struct super_block *sb, int block_idx); +unsigned long pkernfs_allocate_inode(struct super_block *sb); struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int ino); extern const struct file_operations pkernfs_dir_fops; extern const struct file_operations pkernfs_file_fops; extern const struct inode_operations pkernfs_file_inode_operations; +extern const struct inode_operations pkernfs_iommu_dir_inode_operations; diff --git a/include/linux/pkernfs.h b/include/linux/pkernfs.h new file mode 100644 index 000000000000..0110e4784109 --- /dev/null +++ b/include/linux/pkernfs.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: MIT */ + +#ifndef _LINUX_PKERNFS_H +#define _LINUX_PKERNFS_H + +#include +#include + +#ifdef CONFIG_PKERNFS_FS +extern bool pkernfs_enabled(void); +#else +static inline bool pkernfs_enabled(void) +{ + return false; +} +#endif + +/* + * This is a light wrapper around the data behind a pkernfs + * file. Really it should be a file but the filesystem comes + * up too late: IOMMU needs root pgtables before fs is up. 
+ */ +struct pkernfs_region { + void *vaddr; + unsigned long paddr; + unsigned long bytes; + struct dev_pagemap pgmap; + struct device dev; +}; + +void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region); +void pkernfs_alloc_page_from_region(struct pkernfs_region *pkernfs_region, + void **vaddr, unsigned long *paddr); +void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long paddr); + +#endif /* _LINUX_PKERNFS_H */ From patchwork Mon Feb 5 12:01:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 196778 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:168b:b0:106:860b:bbdd with SMTP id ma11csp827382dyb; Mon, 5 Feb 2024 04:06:49 -0800 (PST) X-Google-Smtp-Source: AGHT+IGxcYw2ToBBBTV89UtJy8LdM2m6Aq5Z8kJOE0FTiXUgQ0fZO5D/P47BIkY50GJWpmshOT0o X-Received: by 2002:a05:6870:d38d:b0:219:44bf:a0ba with SMTP id k13-20020a056870d38d00b0021944bfa0bamr6205146oag.10.1707134809473; Mon, 05 Feb 2024 04:06:49 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707134809; cv=pass; d=google.com; s=arc-20160816; b=m7xnpvC/eUZJjrE6vCQppNIii/cfFYQuwVX7sjlMkOAjVKTiY+z/jsS9T7cUY9CudW YgqosMRrX2HMKXM9yOitSGySv9MyEk4PdtyFwjdWtcIWy47Bb+VMdxlBVZkFZR3Rtsx7 0TX35CxSJqQ9KHqoqJxb4++SwwlVybNJpYSe9i/5qWs6KPFPfS2oQRBMjVdBhKu0C6HO +AGLl9LTGRvq6dR+ajuxrVEIOT7RLpIcd2tbJrDb8CmgaeNrJ50Z3IYlkrA/aZHJ8+lj 4TLEKdJbjbmSyxgIw17xzj7ES7t/ZWwtukAeA1JvcX5AV1pt6ObQdpa0RIw8VXxZJsnx jpTg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=iY0J0IZ5qTqLG1CI5oIteOfffrC6hlfw7IE0MCx8M3k=; fh=YQvSdUma7ZldWqdQXJMZ/SnieH9R9UvY78nNn94zdAE=; b=KVHZuyc4uVH+2qphE7uNp2oaG6pRlGThvRaa9XEp8DoqP27s8g2sT0e2vBH7l6larL 
GU5E3qFiW1atKv50AT1TOi6hOcjs4A01v3Op5NdTeosPPDSQO/gpR8pPCldCfMr5hAek Z/noFN4/1RL/xb+fSev7WcSMnE9sZL9E4wp7z3CvwKYJ1qIxG8xoytrdq8U57E8NqReR 04lVOp7ssWDLIfj+CpZLcMzAQpMOQQgMMK0n3GJmZbfegd+NivmNR/igpdShxU7F7cR3 VxRILpnTXwvs02FwN3tIrWV6OSvagCTWu+Oaloxx/yGyMvOBeRTQg0DYfJq8mAN2aXw9 ufHw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=JSkDIGyr; arc=pass (i=1 spf=pass spfdomain=amazon.com dkim=pass dkdomain=amazon.com dmarc=pass fromdomain=amazon.com); spf=pass (google.com: domain of linux-kernel+bounces-52549-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-52549-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com X-Forwarded-Encrypted: i=1; AJvYcCUHpK/mC+FBbD7l7CSFJwxJDiQVYgwD0uQbadlOAdXaod3mds40FtwFCjGFnZP+DoXVIPnqBhkwhZ2Jng9bRr/ep5o+Bw== Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. 
[147.75.199.223]) by mx.google.com with ESMTPS id u12-20020a0cf88c000000b0068c668f1557si8188503qvn.594.2024.02.05.04.06.49 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Feb 2024 04:06:49 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-52549-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=JSkDIGyr; arc=pass (i=1 spf=pass spfdomain=amazon.com dkim=pass dkdomain=amazon.com dmarc=pass fromdomain=amazon.com); spf=pass (google.com: domain of linux-kernel+bounces-52549-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-52549-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 76E931C22B2F for ; Mon, 5 Feb 2024 12:06:42 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 4D6941CA88; Mon, 5 Feb 2024 12:04:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="JSkDIGyr" Received: from smtp-fw-80009.amazon.com (smtp-fw-80009.amazon.com [99.78.197.220]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AC28D1C686; Mon, 5 Feb 2024 12:03:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=99.78.197.220 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707134638; cv=none; 
b=qYAVKrOnyIG7xicsCDxV3ckmQfZ1w/Uor2B2GKDMINbzXDCNgZIhk1KdmJI1SecoWgF3p0gHg3Q8yCLID+RSmD9DkwbEruq75np1yEcmA0anutU0DU1jgQXaQk0aQ0wCJ+450tiREjp3jMwgoEme77PH3EnDAD4DE69PlIfBFX4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707134638; c=relaxed/simple; bh=HphmXgXmCKa5mqdnPL9wBxqcBYTxkbzFB0ny6/qkCB4=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=HZdlIRaBysRkb9SkBJxIXoeqsdwmtPqTHHEREieG2RT6z/jlGM+AknRvsEXMkUDW/hOPGsk2pU64TbJExN4wfmSSpI33qnLtfvZlqNuCe4Xm0On/FIR6SmC14l3FQ6roh1RIp5W8E419GPYkaaUgb9CUOLPS5Y5ngpFsyO4pR4E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=JSkDIGyr; arc=none smtp.client-ip=99.78.197.220 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1707134637; x=1738670637; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=iY0J0IZ5qTqLG1CI5oIteOfffrC6hlfw7IE0MCx8M3k=; b=JSkDIGyrOfNkdmuDUcA1wlHoiTEd4BH8dtO97qK0NJ2Pte20i/a0OOCm C0j4dmJ+xT+dqfxYjkzyAaHx0JvaB1O29vk4fcn3dFp+/HSh1BHdFnrJJ 7nrvM4D8CHbVq1DsiDeLNIbnc82oZ+jmY+bgefcK6RMaqaeUsJlwrjaNP Y=; X-IronPort-AV: E=Sophos;i="6.05,245,1701129600"; d="scan'208";a="63724432" Received: from pdx4-co-svc-p1-lb2-vlan2.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.210]) by smtp-border-fw-80009.pdx80.corp.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2024 12:03:55 +0000 Received: from EX19MTAEUC001.ant.amazon.com [10.0.10.100:51867] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.28.192:2525] 
with esmtp (Farcaster) id 0b48539d-2334-4d72-bccb-1f938dcbdb04; Mon, 5 Feb 2024 12:03:53 +0000 (UTC) X-Farcaster-Flow-ID: 0b48539d-2334-4d72-bccb-1f938dcbdb04 Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUC001.ant.amazon.com (10.252.51.155) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:03:53 +0000 Received: from dev-dsk-jgowans-1a-a3faec1f.eu-west-1.amazon.com (172.19.112.191) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:03:46 +0000 From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 08/18] iommu: Add allocator for pgtables from persistent region Date: Mon, 5 Feb 2024 12:01:53 +0000 Message-ID: <20240205120203.60312-9-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D033UWC001.ant.amazon.com (10.13.139.218) To EX19D014EUC004.ant.amazon.com (10.252.51.182) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790060589800142841 X-GMAIL-MSGID: 1790060589800142841

The specific IOMMU drivers will need the ability to allocate pages from a pkernfs IOMMU pgtable file for their pgtables. The IOMMU drivers will also need the ability to consistently get the same page for the root PGD page, so add a specific function to get this PGD "root" page.
This is different to allocating regular pgtable pages because the exact
same page needs to be *restored* after kexec into the pgd pointer on the
IOMMU domain struct. To support this sort of allocation the pkernfs
region is treated as an array of 512 4 KiB pages, the first of which is
an allocation bitmap.
---
 drivers/iommu/Makefile        |  1 +
 drivers/iommu/pgtable_alloc.c | 36 +++++++++++++++++++++++++++++++++++
 drivers/iommu/pgtable_alloc.h |  9 +++++++++
 3 files changed, 46 insertions(+)
 create mode 100644 drivers/iommu/pgtable_alloc.c
 create mode 100644 drivers/iommu/pgtable_alloc.h

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 769e43d780ce..cadebabe9581 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-y += amd/ intel/ arm/ iommufd/
+obj-y += pgtable_alloc.o
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
diff --git a/drivers/iommu/pgtable_alloc.c b/drivers/iommu/pgtable_alloc.c
new file mode 100644
index 000000000000..f0c2e12f8a8b
--- /dev/null
+++ b/drivers/iommu/pgtable_alloc.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "pgtable_alloc.h"
+#include
+
+/*
+ * The first 4 KiB is the bitmap - set the first bit in the bitmap.
+ * Scan bitmap to find next free bits - it's next free page.
+ */
+
+void iommu_alloc_page_from_region(struct pkernfs_region *region, void **vaddr, unsigned long *paddr)
+{
+	int page_idx;
+
+	page_idx = bitmap_find_free_region(region->vaddr, 512, 0);
+	*vaddr = region->vaddr + (page_idx << PAGE_SHIFT);
+	if (paddr)
+		*paddr = region->paddr + (page_idx << PAGE_SHIFT);
+}
+
+
+void *pgtable_get_root_page(struct pkernfs_region *region, bool liveupdate)
+{
+	/*
+	 * The page immediately after the bitmap is the root page.
+	 * It would be wrong for the page to be allocated if we're
+	 * NOT doing a liveupdate, or for a liveupdate to happen
+	 * with no allocated page. Detect this mismatch.
+	 */
+	if (test_bit(1, region->vaddr) ^ liveupdate) {
+		pr_err("%sdoing a liveupdate but root pg bit incorrect",
+			liveupdate ? "" : "NOT ");
+	}
+	set_bit(1, region->vaddr);
+	return region->vaddr + PAGE_SIZE;
+}
diff --git a/drivers/iommu/pgtable_alloc.h b/drivers/iommu/pgtable_alloc.h
new file mode 100644
index 000000000000..c1666a7be3d3
--- /dev/null
+++ b/drivers/iommu/pgtable_alloc.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include
+#include
+
+void iommu_alloc_page_from_region(struct pkernfs_region *region,
+				   void **vaddr, unsigned long *paddr);
+
+void *pgtable_get_root_page(struct pkernfs_region *region, bool liveupdate);

From patchwork Mon Feb 5 12:01:54 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 196779
From: James Gowans
CC: Eric Biederman, Joerg Roedel, Will Deacon, Alexander Viro,
    Christian Brauner, Paolo Bonzini, Sean Christopherson, Andrew Morton,
    Alexander Graf, David Woodhouse, Jan H. Schoenherr, Usama Arif,
    Anthony Yznaga, Stanislav Kinsburskii
Subject: [RFC 09/18] intel-iommu: Use pkernfs for root/context pgtable pages
Date: Mon, 5 Feb 2024 12:01:54 +0000
Message-ID: <20240205120203.60312-10-jgowans@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

The previous commits were preparation for using pkernfs memory for IOMMU
pgtables: a file in the filesystem is available and an allocator to
allocate 4-KiB pages from that file is available. Now use them to back
the root and context pgtable pages with pkernfs memory.
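
As a rough illustration of the allocator this series relies on, the
sketch below models a region of 512 pages of 4 KiB whose first page holds
the allocation bitmap. It is a userspace model, not the kernel code:
`struct region`, `region_alloc_page` and the helper names are
hypothetical, and starting the scan at index 2 (bitmap page plus root
page) is an assumption. Unlike the RFC helper, it reports an exhausted
region instead of using a negative index.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define PAGE_SHIFT    12
#define PAGE_SIZE     (1UL << PAGE_SHIFT)
#define REGION_PAGES  512
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for struct pkernfs_region. */
struct region {
	unsigned char *vaddr;	/* start of the mapped region */
	unsigned long paddr;	/* physical address of the region */
};

static int test_page_bit(const unsigned long *map, unsigned int idx)
{
	return (int)((map[idx / BITS_PER_WORD] >> (idx % BITS_PER_WORD)) & 1UL);
}

static void set_page_bit(unsigned long *map, unsigned int idx)
{
	map[idx / BITS_PER_WORD] |= 1UL << (idx % BITS_PER_WORD);
}

/*
 * Allocate one page from the region. Page 0 holds the bitmap and page 1
 * is assumed to be reserved for the root page, so the scan starts at 2.
 * Returns the page index, or -1 (with *vaddr set to NULL) when full.
 */
static int region_alloc_page(struct region *r, void **vaddr,
			     unsigned long *paddr)
{
	unsigned long *map = (unsigned long *)r->vaddr;
	unsigned int idx;

	assert(vaddr != NULL);
	for (idx = 2; idx < REGION_PAGES; idx++) {
		if (test_page_bit(map, idx))
			continue;
		set_page_bit(map, idx);
		*vaddr = r->vaddr + ((size_t)idx << PAGE_SHIFT);
		if (paddr)
			*paddr = r->paddr + ((unsigned long)idx << PAGE_SHIFT);
		return (int)idx;
	}
	*vaddr = NULL;
	return -1;
}
```

Because the index maps directly to an offset in the region, the same
page can be located again after kexec from the bitmap alone, which is
the property the series needs.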

If pkernfs is enabled then a "region" (physical and virtual memory
chunk) is fetched from pkernfs and used to drive the allocator. Should
this rather just be a pointer to a pkernfs inode? That abstraction seems
leaky but without having the ability to store struct files at this point
it's probably the more accurate option.

The freeing still needs to be hooked into the allocator...
---
 drivers/iommu/intel/iommu.c | 24 ++++++++++++++++++++----
 drivers/iommu/intel/iommu.h |  2 ++
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 744e4e6b8d72..2dd3f055dbce 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -28,6 +29,7 @@
 #include "../dma-iommu.h"
 #include "../irq_remapping.h"
 #include "../iommu-sva.h"
+#include "../pgtable_alloc.h"
 #include "pasid.h"
 #include "cap_audit.h"
 #include "perfmon.h"
@@ -617,7 +619,12 @@ struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
 		if (!alloc)
 			return NULL;
 
-		context = alloc_pgtable_page(iommu->node, GFP_ATOMIC);
+		if (pkernfs_enabled())
+			iommu_alloc_page_from_region(
+					&iommu->pkernfs_region,
+					(void **) &context, NULL);
+		else
+			context = alloc_pgtable_page(iommu->node, GFP_ATOMIC);
 		if (!context)
 			return NULL;
 
@@ -1190,7 +1197,15 @@ static int iommu_alloc_root_entry(struct intel_iommu *iommu)
 {
 	struct root_entry *root;
 
-	root = alloc_pgtable_page(iommu->node, GFP_ATOMIC);
+	if (pkernfs_enabled()) {
+		pkernfs_alloc_iommu_root_pgtables(&iommu->pkernfs_region);
+		root = pgtable_get_root_page(
+				&iommu->pkernfs_region,
+				liveupdate);
+	} else {
+		root = alloc_pgtable_page(iommu->node, GFP_ATOMIC);
+	}
+
 	if (!root) {
 		pr_err("Allocating root entry for %s failed\n",
 			iommu->name);
@@ -2790,7 +2805,7 @@ static int __init init_dmars(void)
 
 		init_translation_status(iommu);
 
-		if (translation_pre_enabled(iommu) && !is_kdump_kernel()) {
+		if (translation_pre_enabled(iommu) && !is_kdump_kernel() && !liveupdate) {
 			iommu_disable_translation(iommu);
 			clear_translation_pre_enabled(iommu);
 			pr_warn("Translation was enabled for %s but we are not in kdump mode\n",
@@ -2806,7 +2821,8 @@ static int __init init_dmars(void)
 		if (ret)
 			goto free_iommu;
 
-		if (translation_pre_enabled(iommu)) {
+		/* For the live update case restore pgtables, don't copy */
+		if (translation_pre_enabled(iommu) && !liveupdate) {
 			pr_info("Translation already enabled - trying to copy translation structures\n");
 
 			ret = copy_translation_tables(iommu);
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index e6a3e7065616..a2338e398ba3 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -672,6 +673,7 @@ struct intel_iommu {
 	unsigned long *copied_tables; /* bitmap of copied tables */
 	spinlock_t	lock; /* protect context, domain ids */
 	struct root_entry *root_entry; /* virtual address */
+	struct pkernfs_region pkernfs_region;
 
 	struct iommu_flush flush;
 #endif

From patchwork Mon Feb 5 12:01:55 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 196794
From: James Gowans
CC: Eric Biederman, Joerg Roedel, Will Deacon, Alexander Viro,
    Christian Brauner, Paolo Bonzini, Sean Christopherson, Andrew Morton,
    Alexander Graf, David Woodhouse, Jan H. Schoenherr, Usama Arif,
    Anthony Yznaga, Stanislav Kinsburskii
Subject: [RFC 10/18] iommu/intel: zap context table entries on kexec
Date: Mon, 5 Feb 2024 12:01:55 +0000
Message-ID: <20240205120203.60312-11-jgowans@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

In the next commit the IOMMU shutdown function will be modified to not
actually shut down the IOMMU when doing a kexec. To prevent leaving DMA
mappings for non-persistent devices around during kexec we add a
function to the kexec flow which iterates through all IOMMU domains and
zaps the context entries for the devices belonging to those domains.
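
The ordering needed to zap a single context entry can be modelled in
userspace as below. This is only a sketch under assumptions:
`struct ctx_entry`, `struct flush_req`, `pci_source_id` and `zap_entry`
are hypothetical stand-ins, not the VT-d driver's types or API. It shows
the two details the flow depends on: the domain ID has to be read before
the entry is cleared, and the source ID handed to the context-cache
flush is built from the device's own bus and devfn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified context entry; real entries are packed bits. */
struct ctx_entry {
	bool present;
	uint16_t domain_id;
};

/* Records what would be handed to the context-cache and IOTLB flushes. */
struct flush_req {
	uint16_t domain_id;
	uint16_t source_id;
};

/* PCI source ID: bus number in the high byte, devfn in the low byte. */
static uint16_t pci_source_id(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(((uint16_t)bus << 8) | devfn);
}

/*
 * Clear one context entry. Returns true and fills *req if the entry was
 * present, false if there was nothing to zap.
 */
static bool zap_entry(struct ctx_entry *e, uint8_t bus, uint8_t devfn,
		      struct flush_req *req)
{
	assert(req != NULL);
	if (!e || !e->present)
		return false;

	/* Save the domain ID first: clearing the entry destroys it. */
	req->domain_id = e->domain_id;
	req->source_id = pci_source_id(bus, devfn);

	memset(e, 0, sizeof(*e));
	return true;
}
```

In the driver the saved values would then feed the context-cache and
IOTLB invalidations that follow the clear.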

A list of domains for the IOMMU is added and maintained.
---
 drivers/iommu/intel/dmar.c  |  1 +
 drivers/iommu/intel/iommu.c | 34 ++++++++++++++++++++++++++++++----
 drivers/iommu/intel/iommu.h |  2 ++
 3 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 23cb80d62a9a..00f69f40a4ac 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1097,6 +1097,7 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 	iommu->segment = drhd->segment;
 
 	iommu->node = NUMA_NO_NODE;
+	INIT_LIST_HEAD(&iommu->domains);
 
 	ver = readl(iommu->reg + DMAR_VER_REG);
 	pr_info("%s: reg_base_addr %llx ver %d:%d cap %llx ecap %llx\n",
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2dd3f055dbce..315c6b7f901c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1831,6 +1831,7 @@ static int domain_attach_iommu(struct dmar_domain *domain,
 		goto err_clear;
 	}
 	domain_update_iommu_cap(domain);
+	list_add(&domain->domains, &iommu->domains);
 
 	spin_unlock(&iommu->lock);
 	return 0;
@@ -3608,6 +3609,33 @@ static void intel_disable_iommus(void)
 		iommu_disable_translation(iommu);
 }
 
+void zap_context_table_entries(struct intel_iommu *iommu)
+{
+	struct context_entry *context;
+	struct dmar_domain *domain;
+	struct device_domain_info *device;
+	int bus, devfn;
+	u16 did_old;
+
+	list_for_each_entry(domain, &iommu->domains, domains) {
+		list_for_each_entry(device, &domain->devices, link) {
+			context = iommu_context_addr(iommu, device->bus, device->devfn, 0);
+			if (!context || !context_present(context))
+				continue;
+			context_domain_id(context);
+			context_clear_entry(context);
+			__iommu_flush_cache(iommu, context, sizeof(*context));
+			iommu->flush.flush_context(iommu,
+						   did_old,
+						   (((u16)bus) << 8) | devfn,
+						   DMA_CCMD_MASK_NOBIT,
+						   DMA_CCMD_DEVICE_INVL);
+			iommu->flush.flush_iotlb(iommu, did_old, 0, 0,
+						 DMA_TLB_DSI_FLUSH);
+		}
+	}
+}
+
 void intel_iommu_shutdown(void)
 {
 	struct dmar_drhd_unit *drhd;
@@ -3620,10 +3648,8 @@ void intel_iommu_shutdown(void)
 
 	/* Disable PMRs explicitly here. */
 	for_each_iommu(iommu, drhd)
-		iommu_disable_protect_mem_regions(iommu);
-
-	/* Make sure the IOMMUs are switched off */
-	intel_disable_iommus();
+		zap_context_table_entries(iommu);
+	return
 
 	up_write(&dmar_global_lock);
 }
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index a2338e398ba3..4a2f163a86f3 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -600,6 +600,7 @@ struct dmar_domain {
 	spinlock_t lock;		/* Protect device tracking lists */
 	struct list_head devices;	/* all devices' list */
 	struct list_head dev_pasids;	/* all attached pasids */
+	struct list_head domains;	/* all struct dmar_domains on this IOMMU */
 
 	struct dma_pte	*pgd;		/* virtual address */
 	int		gaw;		/* max guest address width */
@@ -700,6 +701,7 @@ struct intel_iommu {
 	void *perf_statistic;
 
 	struct iommu_pmu *pmu;
+	struct list_head domains;	/* all struct dmar_domains on this IOMMU */
 };
 
 /* PCI domain-device relationship */

From patchwork Mon Feb 5 12:01:56 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 196780
From: James Gowans
CC: Eric Biederman, Joerg Roedel, Will Deacon, Alexander Viro,
    Christian Brauner, Paolo Bonzini, Sean Christopherson, Andrew Morton,
    Alexander Graf, David Woodhouse, Jan H. Schoenherr, Usama Arif,
    Anthony Yznaga, Stanislav Kinsburskii
Subject: [RFC 11/18] dma-iommu: Always enable deferred attaches for liveupdate
Date: Mon, 5 Feb 2024 12:01:56 +0000
Message-ID: <20240205120203.60312-12-jgowans@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Seeing as translations are pre-enabled, all devices will be set for
deferred attach. The deferred attach actually has to be done when doing
DMA mapping for devices to work. There may be a better way to do this
by, for example, consulting the context entry table and only deferring
attach if there is a persisted context table entry for this device.
---
 drivers/iommu/dma-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index e5d087bd6da1..76f916848f48 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1750,7 +1750,7 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)

 static int iommu_dma_init(void)
 {
-	if (is_kdump_kernel())
+	if (is_kdump_kernel() || liveupdate)
 		static_branch_enable(&iommu_deferred_attach_enabled);

 	return iova_cache_get();

From patchwork Mon Feb 5 12:01:57 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 196796
From: James Gowans
CC: Eric Biederman, Joerg Roedel, Will Deacon, Alexander Viro, Christian Brauner, Paolo Bonzini, Sean Christopherson, Andrew Morton, Alexander Graf, David Woodhouse, Jan H. Schoenherr, Usama Arif, Anthony Yznaga, Stanislav Kinsburskii
Subject: [RFC 12/18] pkernfs: Add IOMMU domain pgtables file
Date: Mon, 5 Feb 2024 12:01:57 +0000
Message-ID: <20240205120203.60312-13-jgowans@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Similar to the IOMMU root pgtables file which was added in a previous
commit, now support a file type for IOMMU domain pgtables in the IOMMU
directory. These domain pgtable files only need to be useable after the
system has booted up, for example by QEMU creating one of these files
and using it to back the IOMMU pgtables for a persistent VM.
As such the filesystem abstraction can be better maintained here as the
kernel code doesn't need to reach "behind" the filesystem abstraction
like it does for the root pgtables.

A new inode type is created for domain pgtable files, and the IOMMU
directory gets inode_operation callbacks to support creating and
deleting these files in it.

Note: there is a use-after-free risk here too: if the domain pgtable
file is truncated while it's in use for IOMMU pgtables then freed memory
could still be mapped into the IOMMU. To mitigate this there should be a
mechanism to "freeze" the files once they've been given to the IOMMU.
---
 fs/pkernfs/inode.c      |  9 +++++--
 fs/pkernfs/iommu.c      | 55 +++++++++++++++++++++++++++++++++++++++--
 fs/pkernfs/pkernfs.h    |  4 +++
 include/linux/pkernfs.h |  1 +
 4 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/fs/pkernfs/inode.c b/fs/pkernfs/inode.c
index 1d712e0a82a1..35842cd61002 100644
--- a/fs/pkernfs/inode.c
+++ b/fs/pkernfs/inode.c
@@ -35,7 +35,11 @@ struct inode *pkernfs_inode_get(struct super_block *sb, unsigned long ino)
 		inode->i_op = &pkernfs_iommu_dir_inode_operations;
 		inode->i_fop = &pkernfs_dir_fops;
 		inode->i_mode = S_IFDIR;
-	} else if (pkernfs_inode->flags | PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES) {
+	} else if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES) {
+		inode->i_fop = &pkernfs_file_fops;
+		inode->i_mode = S_IFREG;
+	} else if (pkernfs_inode->flags & PKERNFS_INODE_FLAG_IOMMU_DOMAIN_PGTABLES) {
+		inode->i_fop = &pkernfs_file_fops;
 		inode->i_mode = S_IFREG;
 	}

@@ -175,6 +179,7 @@ const struct inode_operations pkernfs_dir_inode_operations = {
 };

 const struct inode_operations pkernfs_iommu_dir_inode_operations = {
+	.create = pkernfs_create_iommu_pgtables,
 	.lookup = pkernfs_lookup,
+	.unlink = pkernfs_unlink,
 };
-
diff --git a/fs/pkernfs/iommu.c b/fs/pkernfs/iommu.c
index 5bce8146d7bb..f14e76013e85 100644
--- a/fs/pkernfs/iommu.c
+++ b/fs/pkernfs/iommu.c
@@ -4,6 +4,27 @@
 #include

+void
pkernfs_alloc_iommu_domain_pgtables(struct file *ppts, struct pkernfs_region *pkernfs_region)
+{
+	struct pkernfs_inode *pkernfs_inode;
+	unsigned long *mappings_block_vaddr;
+	unsigned long inode_idx;
+
+	/*
+	 * For a pkernfs region block, the "mappings_block" field is still
+	 * just a block index, but that block doesn't actually contain mappings
+	 * it contains the pkernfs_region data
+	 */
+
+	inode_idx = ppts->f_inode->i_ino;
+	pkernfs_inode = pkernfs_get_persisted_inode(NULL, inode_idx);
+
+	mappings_block_vaddr = (unsigned long *)pkernfs_addr_for_block(NULL,
+			pkernfs_inode->mappings_block);
+	set_bit(0, mappings_block_vaddr);
+	pkernfs_region->vaddr = mappings_block_vaddr;
+	pkernfs_region->paddr = pkernfs_base + (pkernfs_inode->mappings_block * (2 << 20));
+}
 void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region)
 {
 	unsigned long *mappings_block_vaddr;
@@ -63,9 +84,8 @@ void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region)
 	 * just a block index, but that block doesn't actually contain mappings
 	 * it contains the pkernfs_region data
 	 */
-
 	mappings_block_vaddr = (unsigned long *)pkernfs_addr_for_block(NULL,
-		iommu_pgtables->mappings_block);
+			iommu_pgtables->mappings_block);
 	set_bit(0, mappings_block_vaddr);
 	pkernfs_region->vaddr = mappings_block_vaddr;
 	pkernfs_region->paddr = pkernfs_base + (iommu_pgtables->mappings_block * PMD_SIZE);
@@ -88,6 +108,29 @@ void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region)
 		(iommu_pgtables->mappings_block * PMD_SIZE);
 }

+int pkernfs_create_iommu_pgtables(struct mnt_idmap *id, struct inode *dir,
+				  struct dentry *dentry, umode_t mode, bool excl)
+{
+	unsigned long free_inode;
+	struct pkernfs_inode *pkernfs_inode;
+	struct inode *vfs_inode;
+
+	free_inode = pkernfs_allocate_inode(dir->i_sb);
+	if (free_inode <= 0)
+		return -ENOMEM;
+
+	pkernfs_inode = pkernfs_get_persisted_inode(dir->i_sb, free_inode);
+	pkernfs_inode->sibling_ino =
pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino;
+	pkernfs_get_persisted_inode(dir->i_sb, dir->i_ino)->child_ino = free_inode;
+	strscpy(pkernfs_inode->filename, dentry->d_name.name, PKERNFS_FILENAME_LEN);
+	pkernfs_inode->flags = PKERNFS_INODE_FLAG_IOMMU_DOMAIN_PGTABLES;
+	pkernfs_inode->mappings_block = pkernfs_alloc_block(dir->i_sb);
+	memset(pkernfs_addr_for_block(dir->i_sb, pkernfs_inode->mappings_block), 0, (2 << 20));
+	vfs_inode = pkernfs_inode_get(dir->i_sb, free_inode);
+	d_add(dentry, vfs_inode);
+	return 0;
+}
+
 void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long paddr)
 {
 	if (WARN_ON(paddr >= region->paddr + region->bytes))
@@ -96,3 +139,11 @@ void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long
 		return NULL;
 	return region->vaddr + (paddr - region->paddr);
 }
+
+bool pkernfs_is_iommu_domain_pgtables(struct file *f)
+{
+	return f &&
+		pkernfs_get_persisted_inode(f->f_inode->i_sb, f->f_inode->i_ino)->flags &
+		PKERNFS_INODE_FLAG_IOMMU_DOMAIN_PGTABLES;
+}
+
diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h
index e1b7ae3fe7f1..9bea827f8b40 100644
--- a/fs/pkernfs/pkernfs.h
+++ b/fs/pkernfs/pkernfs.h
@@ -21,6 +21,7 @@ struct pkernfs_sb {
 #define PKERNFS_INODE_FLAG_DIR (1 << 1)
 #define PKERNFS_INODE_FLAG_IOMMU_DIR (1 << 2)
 #define PKERNFS_INODE_FLAG_IOMMU_ROOT_PGTABLES (1 << 3)
+#define PKERNFS_INODE_FLAG_IOMMU_DOMAIN_PGTABLES (1 << 4)
 struct pkernfs_inode {
 	int flags;
 	/*
@@ -50,8 +51,11 @@ void *pkernfs_addr_for_block(struct super_block *sb, int block_idx);
 unsigned long pkernfs_allocate_inode(struct super_block *sb);
 struct pkernfs_inode *pkernfs_get_persisted_inode(struct super_block *sb, int ino);

+int pkernfs_create_iommu_pgtables(struct mnt_idmap *id, struct inode *dir,
+				  struct dentry *dentry, umode_t mode, bool excl);

 extern const struct file_operations pkernfs_dir_fops;
 extern const struct file_operations pkernfs_file_fops;
 extern const struct inode_operations
pkernfs_file_inode_operations;
 extern const struct inode_operations pkernfs_iommu_dir_inode_operations;
+extern const struct inode_operations pkernfs_iommu_domain_pgtables_inode_operations;
diff --git a/include/linux/pkernfs.h b/include/linux/pkernfs.h
index 0110e4784109..4ca923ee0d82 100644
--- a/include/linux/pkernfs.h
+++ b/include/linux/pkernfs.h
@@ -33,4 +33,5 @@ void pkernfs_alloc_page_from_region(struct pkernfs_region *pkernfs_region,
 				    void **vaddr, unsigned long *paddr);
 void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long paddr);

+bool pkernfs_is_iommu_domain_pgtables(struct file *f);
 #endif /* _LINUX_PKERNFS_H */

From patchwork Mon Feb 5 12:01:58 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 196781
From: James Gowans
CC: Eric Biederman, Joerg Roedel, Will Deacon, Alexander Viro, Christian Brauner, Paolo Bonzini, Sean Christopherson, Andrew Morton, Alexander Graf, David Woodhouse, Jan H. Schoenherr, Usama Arif, Anthony Yznaga, Stanislav Kinsburskii
Subject: [RFC 13/18] vfio: add ioctl to define persistent pgtables on container
Date: Mon, 5 Feb 2024 12:01:58 +0000
Message-ID: <20240205120203.60312-14-jgowans@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

The previous commits added a file type in pkernfs for IOMMU persistent
page tables. Now support actually setting persistent page tables on an
IOMMU domain. This is done via a VFIO ioctl on a VFIO container.

Userspace needs to create and open an IOMMU persistent page tables file
and then supply that fd to the new
VFIO_CONTAINER_SET_PERSISTENT_PGTABLES ioctl.
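The userspace side of that flow might look roughly like the sketch below. It is only an illustration under stated assumptions: the struct and ioctl number are copied locally from the uapi hunk of this RFC (they are not in mainline headers), the helper name vfio_container_set_ppts() is hypothetical, and the pgtables path is whatever location pkernfs happens to be mounted at.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Local copies of the uapi additions from this RFC. They are not part of
 * mainline <linux/vfio.h>, so they are redefined here for illustration.
 */
#ifndef VFIO_TYPE
#define VFIO_TYPE (';')
#endif
#ifndef VFIO_BASE
#define VFIO_BASE 100
#endif

struct vfio_set_persistent_pgtables {
	uint32_t persistent_pgtables_fd;
};
#define VFIO_CONTAINER_SET_PERSISTENT_PGTABLES _IO(VFIO_TYPE, VFIO_BASE + 21)

/*
 * Hypothetical helper: open (or create) a pkernfs domain pgtables file and
 * hand its fd to the VFIO container. Returns the pgtables fd on success,
 * or a negative errno value on failure.
 */
int vfio_container_set_ppts(int container_fd, const char *pgtables_path)
{
	struct vfio_set_persistent_pgtables arg;
	int ppts_fd, err;

	/* Creating the file in the pkernfs IOMMU directory allocates it. */
	ppts_fd = open(pgtables_path, O_RDWR | O_CREAT, 0600);
	if (ppts_fd < 0)
		return -errno;

	arg.persistent_pgtables_fd = (uint32_t)ppts_fd;
	if (ioctl(container_fd, VFIO_CONTAINER_SET_PERSISTENT_PGTABLES, &arg) < 0) {
		err = errno;
		close(ppts_fd);
		return -err;
	}
	return ppts_fd;
}
```

Judging by the container.c hunk in this patch, the file is handed to the IOMMU driver from within VFIO_SET_IOMMU, so a caller would presumably invoke this helper on the container fd before VFIO_SET_IOMMU, and again with the same path after kexec.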
That ioctl sets the supplied struct file on the struct vfio_container.
Later, when the IOMMU domain is allocated by VFIO, VFIO will check
whether persistent page tables have been defined and, if they have, will
use the iommu_domain_alloc_persistent API, which was introduced in the
previous commit, to pass the struct file down to the IOMMU, which will
actually use it for page tables.

After kexec, userspace needs to open the same IOMMU page table file and
set it again via the same ioctl so that the IOMMU continues to use the
same memory region for its page tables for that domain.
---
 drivers/vfio/container.c        | 27 +++++++++++++++++++++++++++
 drivers/vfio/vfio.h             |  2 ++
 drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++++++++++++++++--
 include/uapi/linux/vfio.h       |  9 +++++++++
 4 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index d53d08f16973..b60fcbf7bad0 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include

 #include "vfio.h"
@@ -21,6 +22,7 @@ struct vfio_container {
 	struct rw_semaphore group_lock;
 	struct vfio_iommu_driver *iommu_driver;
 	void *iommu_data;
+	struct file *persistent_pgtables;
 	bool noiommu;
 };

@@ -306,6 +308,8 @@ static long vfio_ioctl_set_iommu(struct vfio_container *container,
 			continue;
 		}

+		driver->ops->set_persistent_pgtables(data, container->persistent_pgtables);
+
 		ret = __vfio_container_attach_groups(container, driver, data);
 		if (ret) {
 			driver->ops->release(data);
@@ -324,6 +328,26 @@ static long vfio_ioctl_set_iommu(struct vfio_container *container,
 	return ret;
 }

+static int vfio_ioctl_set_persistent_pgtables(struct vfio_container *container,
+					      unsigned long arg)
+{
+	struct vfio_set_persistent_pgtables set_ppts;
+	struct file *ppts;
+
+	if (copy_from_user(&set_ppts, (void __user *)arg, sizeof(set_ppts)))
+		return -EFAULT;
+
+	ppts = fget(set_ppts.persistent_pgtables_fd);
+	if (!ppts)
+		return -EBADF;
+	if
(!pkernfs_is_iommu_domain_pgtables(ppts)) {
+		fput(ppts);
+		return -EBADF;
+	}
+	container->persistent_pgtables = ppts;
+	return 0;
+}
+
 static long vfio_fops_unl_ioctl(struct file *filep,
 				unsigned int cmd, unsigned long arg)
 {
@@ -345,6 +369,9 @@ static long vfio_fops_unl_ioctl(struct file *filep,
 	case VFIO_SET_IOMMU:
 		ret = vfio_ioctl_set_iommu(container, arg);
 		break;
+	case VFIO_CONTAINER_SET_PERSISTENT_PGTABLES:
+		ret = vfio_ioctl_set_persistent_pgtables(container, arg);
+		break;
 	default:
 		driver = container->iommu_driver;
 		data = container->iommu_data;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 307e3f29b527..6fa301bf6474 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -226,6 +226,8 @@ struct vfio_iommu_driver_ops {
 				  void *data, size_t count, bool write);
 	struct iommu_domain *(*group_iommu_domain)(void *iommu_data,
 						   struct iommu_group *group);
+	int (*set_persistent_pgtables)(void *iommu_data,
+				       struct file *ppts);
 };

 struct vfio_iommu_driver {
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index eacd6ec04de5..b36edfc5c9ef 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -75,6 +75,7 @@ struct vfio_iommu {
 	bool nesting;
 	bool dirty_page_tracking;
 	struct list_head emulated_iommu_groups;
+	struct file *persistent_pgtables;
 };

 struct vfio_domain {
@@ -2143,9 +2144,14 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,

 static int vfio_iommu_domain_alloc(struct device *dev, void *data)
 {
+	/* data is an in pointer to PPTs, and an out to the new domain.
*/
+	struct file *ppts = *(struct file **) data;
 	struct iommu_domain **domain = data;

-	*domain = iommu_domain_alloc(dev->bus);
+	if (ppts)
+		*domain = iommu_domain_alloc_persistent(dev->bus, ppts);
+	else
+		*domain = iommu_domain_alloc(dev->bus);
 	return 1; /* Don't iterate */
 }

@@ -2156,6 +2162,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	struct vfio_iommu_group *group;
 	struct vfio_domain *domain, *d;
 	bool resv_msi;
+	/* In/out ptr to iommu_domain_alloc. */
+	void *domain_alloc_data;
 	phys_addr_t resv_msi_base = 0;
 	struct iommu_domain_geometry *geo;
 	LIST_HEAD(iova_copy);
@@ -2203,8 +2211,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	 * want to iterate beyond the first device (if any).
 	 */
 	ret = -EIO;
-	iommu_group_for_each_dev(iommu_group, &domain->domain,
+	/* Smuggle the PPTs in the data field; it will be clobbered with the new domain */
+	domain_alloc_data = iommu->persistent_pgtables;
+	iommu_group_for_each_dev(iommu_group, &domain_alloc_data,
 				 vfio_iommu_domain_alloc);
+	domain->domain = domain_alloc_data;
+
 	if (!domain->domain)
 		goto out_free_domain;

@@ -3165,6 +3177,16 @@ vfio_iommu_type1_group_iommu_domain(void *iommu_data,
 	return domain;
 }

+int vfio_iommu_type1_set_persistent_pgtables(void *iommu_data,
+					     struct file *ppts)
+{
+
+	struct vfio_iommu *iommu = iommu_data;
+
+	iommu->persistent_pgtables = ppts;
+	return 0;
+}
+
 static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 	.name = "vfio-iommu-type1",
 	.owner = THIS_MODULE,
@@ -3179,6 +3201,7 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 	.unregister_device = vfio_iommu_type1_unregister_device,
 	.dma_rw = vfio_iommu_type1_dma_rw,
 	.group_iommu_domain = vfio_iommu_type1_group_iommu_domain,
+	.set_persistent_pgtables = vfio_iommu_type1_set_persistent_pgtables,
 };

 static int __init vfio_iommu_type1_init(void)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index afc1369216d9..fa9676bb4b26 100644
---
a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1797,6 +1797,15 @@ struct vfio_iommu_spapr_tce_remove {
 };
 #define VFIO_IOMMU_SPAPR_TCE_REMOVE _IO(VFIO_TYPE, VFIO_BASE + 20)

+struct vfio_set_persistent_pgtables {
+	/*
+	 * File descriptor for a pkernfs IOMMU pgtables
+	 * file to be used for persistence.
+	 */
+	__u32 persistent_pgtables_fd;
+};
+#define VFIO_CONTAINER_SET_PERSISTENT_PGTABLES _IO(VFIO_TYPE, VFIO_BASE + 21)
+
 /* ***************************************************************** */

 #endif /* _UAPIVFIO_H */

From patchwork Mon Feb 5 12:01:59 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 196782
smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.14.233:2525] with esmtp (Farcaster) id 27f4f8c6-d18d-45f7-aee0-af42d924df8a; Mon, 5 Feb 2024 12:05:19 +0000 (UTC) X-Farcaster-Flow-ID: 27f4f8c6-d18d-45f7-aee0-af42d924df8a Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUC001.ant.amazon.com (10.252.51.155) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:05:19 +0000 Received: from dev-dsk-jgowans-1a-a3faec1f.eu-west-1.amazon.com (172.19.112.191) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:05:13 +0000 From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 14/18] intel-iommu: Allocate domain pgtable pages from pkernfs Date: Mon, 5 Feb 2024 12:01:59 +0000 Message-ID: <20240205120203.60312-15-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D046UWA004.ant.amazon.com (10.13.139.76) To EX19D014EUC004.ant.amazon.com (10.252.51.182) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790060810356544870 X-GMAIL-MSGID: 1790060810356544870 In the previous commit VFIO was updated to be able to define persistent pgtables on a container. Now the IOMMU driver is updated to accept the file for persistent pgtables when the domain is allocated and use that file as the source of pages for the pgtables. 
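As the comment in drivers/iommu/pgtable_alloc.c in this patch notes, the region backing the file is laid out with its first 4 KiB page as a bitmap, and allocation scans that bitmap for the next free page. A minimal user-space model of that scheme might look like the sketch below. The `sketch_` names are hypothetical and are not the series' API, the region is assumed to be zeroed before first use, and plain bit masks stand in for kernel bitops such as set_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the series' struct pkernfs_region. */
struct sketch_region {
	void *vaddr;         /* start of the region; page 0 holds the bitmap */
	unsigned long paddr; /* physical address of the region start */
	size_t nr_pages;     /* pages in the region, bitmap page included */
};

/* Mark page 0 (the bitmap itself) as in use. */
void sketch_region_init(struct sketch_region *r)
{
	unsigned long *bitmap = r->vaddr;

	/* One 4 KiB bitmap page can track at most 32768 pages. */
	assert(r->nr_pages >= 1 && r->nr_pages <= SKETCH_PAGE_SIZE * CHAR_BIT);
	bitmap[0] |= 1UL;
}

/*
 * Claim the first page whose bit is clear.
 * Returns 0 on success, -1 when the region is full.
 */
int sketch_alloc_page(struct sketch_region *r, void **vaddr,
		      unsigned long *paddr)
{
	unsigned long *bitmap = r->vaddr;
	size_t i;

	for (i = 0; i < r->nr_pages; i++) {
		unsigned long *word = &bitmap[i / SKETCH_BITS_PER_WORD];
		unsigned long mask = 1UL << (i % SKETCH_BITS_PER_WORD);

		if (*word & mask)
			continue;
		*word |= mask;
		*vaddr = (char *)r->vaddr + i * SKETCH_PAGE_SIZE;
		if (paddr)
			*paddr = r->paddr + i * SKETCH_PAGE_SIZE;
		return 0;
	}
	*vaddr = NULL;
	return -1;
}
```

In this model the first allocation after init returns page 1, which matches the slot the patch reserves for the pgd; a free path would clear the corresponding bit.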
The iommu_ops.domain_alloc callback is extended to pass a struct file for the pkernfs domain pgtables file. Most call sites are updated to supply NULL here, indicating no persistent pgtables. VFIO's caller is updated to plumb the pkernfs file through. When this file is supplied the md_domain_init() function converts the file into a pkernfs region and uses that region for pgtables. Similarly to the root pgtables there are use-after-free issues with this that need sorting out, and the free() functions also need to be updated to free from the pkernfs region. It may be better to store the struct file on the dmar_domain and map file offset to addr every time rather than using a pkernfs region for this. --- drivers/iommu/intel/iommu.c | 35 +++++++++++++++++++++++++++-------- drivers/iommu/intel/iommu.h | 1 + drivers/iommu/iommu.c | 22 ++++++++++++++-------- drivers/iommu/pgtable_alloc.c | 7 +++++++ drivers/iommu/pgtable_alloc.h | 1 + fs/pkernfs/iommu.c | 2 +- include/linux/iommu.h | 6 +++++- include/linux/pkernfs.h | 1 + 8 files changed, 57 insertions(+), 18 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 315c6b7f901c..809ca9e93992 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -946,7 +946,13 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain, if (!dma_pte_present(pte)) { uint64_t pteval; - tmp_page = alloc_pgtable_page(domain->nid, gfp); + if (domain->pgtables_allocator.vaddr) + iommu_alloc_page_from_region( + &domain->pgtables_allocator, + &tmp_page, + NULL); + else + tmp_page = alloc_pgtable_page(domain->nid, gfp); if (!tmp_page) return NULL; @@ -2399,7 +2405,7 @@ static int iommu_domain_identity_map(struct dmar_domain *domain, DMA_PTE_READ|DMA_PTE_WRITE, GFP_KERNEL); } -static int md_domain_init(struct dmar_domain *domain, int guest_width); +static int md_domain_init(struct dmar_domain *domain, int guest_width, struct file *ppts); static int __init si_domain_init(int hw) { @@
-2411,7 +2417,7 @@ static int __init si_domain_init(int hw) if (!si_domain) return -EFAULT; - if (md_domain_init(si_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) { + if (md_domain_init(si_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH, NULL)) { domain_exit(si_domain); si_domain = NULL; return -EFAULT; @@ -4029,7 +4035,7 @@ static void device_block_translation(struct device *dev) info->domain = NULL; } -static int md_domain_init(struct dmar_domain *domain, int guest_width) +static int md_domain_init(struct dmar_domain *domain, int guest_width, struct file *ppts) { int adjust_width; @@ -4042,8 +4048,21 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width) domain->iommu_superpage = 0; domain->max_addr = 0; - /* always allocate the top pgd */ - domain->pgd = alloc_pgtable_page(domain->nid, GFP_ATOMIC); + if (ppts) { + unsigned long pgd_phy; + + pkernfs_get_region_for_ppts( + ppts, + &domain->pgtables_allocator); + iommu_get_pgd_page( + &domain->pgtables_allocator, + (void **) &domain->pgd, + &pgd_phy); + + } else { + /* always allocate the top pgd */ + domain->pgd = alloc_pgtable_page(domain->nid, GFP_ATOMIC); + } if (!domain->pgd) return -ENOMEM; domain_flush_cache(domain, domain->pgd, PAGE_SIZE); @@ -4064,7 +4083,7 @@ static struct iommu_domain blocking_domain = { } }; -static struct iommu_domain *intel_iommu_domain_alloc(unsigned type) +static struct iommu_domain *intel_iommu_domain_alloc(unsigned int type, struct file *ppts) { struct dmar_domain *dmar_domain; struct iommu_domain *domain; @@ -4079,7 +4098,7 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type) pr_err("Can't allocate dmar_domain\n"); return NULL; } - if (md_domain_init(dmar_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) { + if (md_domain_init(dmar_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH, ppts)) { pr_err("Domain initialization failed\n"); domain_exit(dmar_domain); return NULL; diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index 4a2f163a86f3..f772fdcf3828 100644 --- 
a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -602,6 +602,7 @@ struct dmar_domain { struct list_head dev_pasids; /* all attached pasids */ struct list_head domains; /* all struct dmar_domains on this IOMMU */ + struct pkernfs_region pgtables_allocator; struct dma_pte *pgd; /* virtual address */ int gaw; /* max guest address width */ diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 3a67e636287a..f26e83d5b159 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -97,7 +97,7 @@ static int iommu_bus_notifier(struct notifier_block *nb, unsigned long action, void *data); static void iommu_release_device(struct device *dev); static struct iommu_domain *__iommu_domain_alloc(const struct bus_type *bus, - unsigned type); + unsigned int type, struct file *ppts); static int __iommu_attach_device(struct iommu_domain *domain, struct device *dev); static int __iommu_attach_group(struct iommu_domain *domain, @@ -1734,7 +1734,7 @@ __iommu_group_alloc_default_domain(const struct bus_type *bus, { if (group->default_domain && group->default_domain->type == req_type) return group->default_domain; - return __iommu_domain_alloc(bus, req_type); + return __iommu_domain_alloc(bus, req_type, NULL); } /* @@ -1971,7 +1971,7 @@ void iommu_set_fault_handler(struct iommu_domain *domain, EXPORT_SYMBOL_GPL(iommu_set_fault_handler); static struct iommu_domain *__iommu_domain_alloc(const struct bus_type *bus, - unsigned type) + unsigned int type, struct file *ppts) { struct iommu_domain *domain; unsigned int alloc_type = type & IOMMU_DOMAIN_ALLOC_FLAGS; @@ -1979,7 +1979,7 @@ static struct iommu_domain *__iommu_domain_alloc(const struct bus_type *bus, if (bus == NULL || bus->iommu_ops == NULL) return NULL; - domain = bus->iommu_ops->domain_alloc(alloc_type); + domain = bus->iommu_ops->domain_alloc(alloc_type, ppts); if (!domain) return NULL; @@ -2001,9 +2001,15 @@ static struct iommu_domain *__iommu_domain_alloc(const struct bus_type *bus, return 
domain; } +struct iommu_domain *iommu_domain_alloc_persistent(const struct bus_type *bus, struct file *ppts) +{ + return __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED, ppts); +} +EXPORT_SYMBOL_GPL(iommu_domain_alloc_persistent); + struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus) { - return __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED); + return __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED, NULL); } EXPORT_SYMBOL_GPL(iommu_domain_alloc); @@ -3198,14 +3204,14 @@ static int __iommu_group_alloc_blocking_domain(struct iommu_group *group) return 0; group->blocking_domain = - __iommu_domain_alloc(dev->dev->bus, IOMMU_DOMAIN_BLOCKED); + __iommu_domain_alloc(dev->dev->bus, IOMMU_DOMAIN_BLOCKED, NULL); if (!group->blocking_domain) { /* * For drivers that do not yet understand IOMMU_DOMAIN_BLOCKED * create an empty domain instead. */ group->blocking_domain = __iommu_domain_alloc( - dev->dev->bus, IOMMU_DOMAIN_UNMANAGED); + dev->dev->bus, IOMMU_DOMAIN_UNMANAGED, NULL); if (!group->blocking_domain) return -EINVAL; } @@ -3500,7 +3506,7 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev, const struct iommu_ops *ops = dev_iommu_ops(dev); struct iommu_domain *domain; - domain = ops->domain_alloc(IOMMU_DOMAIN_SVA); + domain = ops->domain_alloc(IOMMU_DOMAIN_SVA, NULL); if (!domain) return NULL; diff --git a/drivers/iommu/pgtable_alloc.c b/drivers/iommu/pgtable_alloc.c index f0c2e12f8a8b..276db15932cc 100644 --- a/drivers/iommu/pgtable_alloc.c +++ b/drivers/iommu/pgtable_alloc.c @@ -7,6 +7,13 @@ * The first 4 KiB is the bitmap - set the first bit in the bitmap. * Scan bitmap to find next free bits - it's next free page.
*/ +void iommu_get_pgd_page(struct pkernfs_region *region, void **vaddr, unsigned long *paddr) +{ + set_bit(1, region->vaddr); + *vaddr = region->vaddr + (1 << PAGE_SHIFT); + if (paddr) + *paddr = region->paddr + (1 << PAGE_SHIFT); +} void iommu_alloc_page_from_region(struct pkernfs_region *region, void **vaddr, unsigned long *paddr) { diff --git a/drivers/iommu/pgtable_alloc.h b/drivers/iommu/pgtable_alloc.h index c1666a7be3d3..50c3abba922b 100644 --- a/drivers/iommu/pgtable_alloc.h +++ b/drivers/iommu/pgtable_alloc.h @@ -3,6 +3,7 @@ #include #include +void iommu_get_pgd_page(struct pkernfs_region *region, void **vaddr, unsigned long *paddr); void iommu_alloc_page_from_region(struct pkernfs_region *region, void **vaddr, unsigned long *paddr); diff --git a/fs/pkernfs/iommu.c b/fs/pkernfs/iommu.c index f14e76013e85..5d0b256e7dd8 100644 --- a/fs/pkernfs/iommu.c +++ b/fs/pkernfs/iommu.c @@ -4,7 +4,7 @@ #include -void pkernfs_alloc_iommu_domain_pgtables(struct file *ppts, struct pkernfs_region *pkernfs_region) +void pkernfs_get_region_for_ppts(struct file *ppts, struct pkernfs_region *pkernfs_region) { struct pkernfs_inode *pkernfs_inode; unsigned long *mappings_block_vaddr; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 0225cf7445de..01bb89246ef7 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -101,6 +101,7 @@ struct iommu_domain { enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault, void *data); void *fault_data; + struct file *persistent_pgtables; union { struct { iommu_fault_handler_t handler; @@ -266,7 +267,8 @@ struct iommu_ops { void *(*hw_info)(struct device *dev, u32 *length, u32 *type); /* Domain allocation and freeing by the iommu driver */ - struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); + /* If ppts is not null it is a persistent domain; null is non-persistent */ + struct iommu_domain *(*domain_alloc)(unsigned int iommu_domain_type, struct file *ppts); struct iommu_device
*(*probe_device)(struct device *dev); void (*release_device)(struct device *dev); @@ -466,6 +468,8 @@ extern bool iommu_present(const struct bus_type *bus); extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap); extern bool iommu_group_has_isolated_msi(struct iommu_group *group); extern struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus); +extern struct iommu_domain *iommu_domain_alloc_persistent(const struct bus_type *bus, + struct file *ppts); extern void iommu_domain_free(struct iommu_domain *domain); extern int iommu_attach_device(struct iommu_domain *domain, struct device *dev); diff --git a/include/linux/pkernfs.h b/include/linux/pkernfs.h index 4ca923ee0d82..8aa69ef5a2d8 100644 --- a/include/linux/pkernfs.h +++ b/include/linux/pkernfs.h @@ -31,6 +31,7 @@ struct pkernfs_region { void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region); void pkernfs_alloc_page_from_region(struct pkernfs_region *pkernfs_region, void **vaddr, unsigned long *paddr); +void pkernfs_get_region_for_ppts(struct file *ppts, struct pkernfs_region *pkernfs_region); void *pkernfs_region_paddr_to_vaddr(struct pkernfs_region *region, unsigned long paddr); bool pkernfs_is_iommu_domain_pgtables(struct file *f);
From patchwork Mon Feb 5 12:02:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 196784 From: James Gowans CC: Eric Biederman, "Joerg Roedel", Will Deacon, Alexander Viro, "Christian Brauner", Paolo Bonzini, Sean Christopherson, Andrew Morton, Alexander Graf, David Woodhouse, "Jan H . Schoenherr", Usama Arif, Anthony Yznaga, Stanislav Kinsburskii Subject: [RFC 15/18] pkernfs: register device memory for IOMMU domain pgtables Date: Mon, 5 Feb 2024 12:02:00 +0000 Message-ID: <20240205120203.60312-16-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org
Similarly to the root/context pgtables, the IOMMU driver also does phys_to_virt when walking the domain pgtables. To make this work properly the physical memory needs to be mapped in at the correct place in the direct map. Register a memory device to support this.
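The range registered for the memory device covers exactly one pkernfs block, and on x86-64 the direct map translates a physical address to a virtual one by adding a constant offset, so phys_to_virt only yields a usable pointer if the block is mapped at that matching address. A small user-space sketch of the arithmetic follows. It assumes a 2 MiB PMD_SIZE, and the `sketch_` names and any example addresses passed to them are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2 MiB PMD, as on x86-64 with 4 KiB base pages. */
#define SKETCH_PMD_SIZE (UINT64_C(2) << 20)

struct sketch_range {
	uint64_t start; /* first byte of the block */
	uint64_t end;   /* last byte of the block (inclusive) */
};

/* Physical range of block number `block` in a filesystem starting at `base`. */
struct sketch_range sketch_block_range(uint64_t base, uint64_t block)
{
	struct sketch_range r;

	/* Guard against wrap-around in this model. */
	assert(block < (UINT64_MAX - base) / SKETCH_PMD_SIZE);
	r.start = base + block * SKETCH_PMD_SIZE;
	r.end = r.start + SKETCH_PMD_SIZE - 1;
	return r;
}

/* Direct-map style translation: virtual = direct map base + physical. */
uint64_t sketch_phys_to_virt(uint64_t paddr, uint64_t direct_map_base)
{
	return direct_map_base + paddr;
}
```

The inclusive end mirrors how the patch fills in pgmap.range.end as start + PMD_SIZE - 1.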
The alternative would be to wrap all of the phys_to_virt functions in something which is pkernfs aware. --- fs/pkernfs/iommu.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/pkernfs/iommu.c b/fs/pkernfs/iommu.c index 5d0b256e7dd8..073b9dd48237 100644 --- a/fs/pkernfs/iommu.c +++ b/fs/pkernfs/iommu.c @@ -9,6 +9,7 @@ void pkernfs_get_region_for_ppts(struct file *ppts, struct pkernfs_region *pkern struct pkernfs_inode *pkernfs_inode; unsigned long *mappings_block_vaddr; unsigned long inode_idx; + int rc; /* * For a pkernfs region block, the "mappings_block" field is still @@ -22,7 +23,20 @@ void pkernfs_get_region_for_ppts(struct file *ppts, struct pkernfs_region *pkern mappings_block_vaddr = (unsigned long *)pkernfs_addr_for_block(NULL, pkernfs_inode->mappings_block); set_bit(0, mappings_block_vaddr); - pkernfs_region->vaddr = mappings_block_vaddr; + + dev_set_name(&pkernfs_region->dev, "vfio-ppt-%s", pkernfs_inode->filename); + rc = device_register(&pkernfs_region->dev); + if (rc) + pr_err("device_register failed: %i\n", rc); + + pkernfs_region->pgmap.range.start = pkernfs_base + + (pkernfs_inode->mappings_block * PMD_SIZE); + pkernfs_region->pgmap.range.end = + pkernfs_region->pgmap.range.start + PMD_SIZE - 1; + pkernfs_region->pgmap.nr_range = 1; + pkernfs_region->pgmap.type = MEMORY_DEVICE_GENERIC; + pkernfs_region->vaddr = + devm_memremap_pages(&pkernfs_region->dev, &pkernfs_region->pgmap); pkernfs_region->paddr = pkernfs_base + (pkernfs_inode->mappings_block * (2 << 20)); } void pkernfs_alloc_iommu_root_pgtables(struct pkernfs_region *pkernfs_region)
From patchwork Mon Feb 5 12:02:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Gowans, James" X-Patchwork-Id: 196793 From: James Gowans CC: Eric Biederman, "Joerg Roedel", Will Deacon, Alexander Viro, "Christian Brauner", Paolo Bonzini, Sean Christopherson, Andrew Morton, Alexander Graf, David Woodhouse, "Jan H . Schoenherr", Usama Arif, Anthony Yznaga, Stanislav Kinsburskii Subject: [RFC 16/18] vfio: support not mapping IOMMU pgtables on live-update Date: Mon, 5 Feb 2024 12:02:01 +0000 Message-ID: <20240205120203.60312-17-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org
When restoring VMs after live update kexec, the IOVAs for the guest VM are already present in the persisted page tables.
It is unnecessary to clobber the existing pgtable entries and it may introduce races if pgtable modifications happen concurrently with DMA.

Provide a new VFIO MAP_DMA flag which userspace can supply to inform VFIO that the IOVAs are already mapped. In this case VFIO will skip over the call to the IOMMU driver to do the mapping. VFIO still needs the MAP_DMA ioctl to set up its internal data structures about the mapping.

It would probably be better to move the persistence one layer up and persist the VFIO container in pkernfs. That way the whole container could be picked up and re-used without needing to do any MAP_DMA ioctls after kexec.
---
 drivers/vfio/vfio_iommu_type1.c | 24 +++++++++++++-----------
 include/uapi/linux/vfio.h       |  1 +
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b36edfc5c9ef..dc2682fbda2e 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1456,7 +1456,7 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
 }

 static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
-			    size_t map_size)
+			    size_t map_size, unsigned int flags)
 {
 	dma_addr_t iova = dma->iova;
 	unsigned long vaddr = dma->vaddr;
@@ -1479,14 +1479,16 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			break;
 		}

-		/* Map it! */
-		ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage,
-				     dma->prot);
-		if (ret) {
-			vfio_unpin_pages_remote(dma, iova + dma->size, pfn,
-						npage, true);
-			vfio_batch_unpin(&batch, dma);
-			break;
+		if (!(flags & VFIO_DMA_MAP_FLAG_LIVE_UPDATE)) {
+			/* Map it! */
+			ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage,
+					     dma->prot);
+			if (ret) {
+				vfio_unpin_pages_remote(dma, iova + dma->size, pfn,
+							npage, true);
+				vfio_batch_unpin(&batch, dma);
+				break;
+			}
 		}

 		size -= npage << PAGE_SHIFT;
@@ -1662,7 +1664,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	if (list_empty(&iommu->domain_list))
 		dma->size = size;
 	else
-		ret = vfio_pin_map_dma(iommu, dma, size);
+		ret = vfio_pin_map_dma(iommu, dma, size, map->flags);

 	if (!ret && iommu->dirty_page_tracking) {
 		ret = vfio_dma_bitmap_alloc(dma, pgsize);
@@ -2836,7 +2838,7 @@ static int vfio_iommu_type1_map_dma(struct vfio_iommu *iommu,
 	struct vfio_iommu_type1_dma_map map;
 	unsigned long minsz;
 	uint32_t mask = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE |
-			VFIO_DMA_MAP_FLAG_VADDR;
+			VFIO_DMA_MAP_FLAG_VADDR | VFIO_DMA_MAP_FLAG_LIVE_UPDATE;

 	minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index fa9676bb4b26..d04d28e52110 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1536,6 +1536,7 @@ struct vfio_iommu_type1_dma_map {
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
 #define VFIO_DMA_MAP_FLAG_VADDR (1 << 2)
+#define VFIO_DMA_MAP_FLAG_LIVE_UPDATE (1 << 3)	/* IOVAs already mapped in IOMMU before LU */
 	__u64	vaddr;				/* Process virtual address */
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */

From patchwork Mon Feb 5 12:02:02 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 196783
Return-Path:
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a05:7301:168b:b0:106:860b:bbdd with SMTP id ma11csp830204dyb; Mon, 5 Feb 2024 04:12:12 -0800 (PST)
X-Google-Smtp-Source: AGHT+IH49wLrB441sGp5crhghY7XAhBQr+QCxQbWcXIo30WKrhTJJj7hm61cb3KAgkgm2mucqSRc
X-Received:
by 2002:a17:906:cc9:b0:a37:3acc:4ff4 with SMTP id l9-20020a1709060cc900b00a373acc4ff4mr4899813ejh.75.1707135131845; Mon, 05 Feb 2024 04:12:11 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707135131; cv=pass; d=google.com; s=arc-20160816; b=mqDEiFfkHsQqzqq4cz37vvjcA/2MM847ZOyZA5fj2NpZN3PuXdwo0msSVsyOD5MuXB fibkAV+RmX+inIhy8hzMVAJ4M0Fxdm8S30TeDghRnARNGdMkVASvQA44QVrX62XsUK6h zOMvPeMmx/LQAeLd7wO7lFUnJjfdtt0CaZ4P1isyAQVtCobzwB8CitRtdxvTg63vNjSM /WiPxLKJy5w8AZgBIVff+02zqtn7rNe2o9rYPG21tshcO6NEURyS6A+TzQsqrSv6njOV 4QxrS2HK8j6jnZLygvKXVI82BXsQQX9ftd3DpSsPOqnBHeYfcjZonJFfk6fKoiczP2UK wUkQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=PZz9j1l+xmnCAt/gSMnG+c7uMbg+JDxTUK6mJnmY1gY=; fh=uVFH/0hUdCxUSWhUZcCn7uYjlKw33C/OCGmPvNxlnzU=; b=J3gX+CP7jpE8HWLoSlN417lVNf8z9ejd8dW06HxU7fCVyas4PlvU5THzYp8+l60c5d 9lpQmQ9Wc93VjzW5AxcGhaxZ+pftYL+vdMfSXUL6bB2qUI85+JKzQO3Y4cVirTwgMQ4P RWEjAtWDg6J8QvN/+a5mebTKJtq2s6doJCf2OeNEeH+ItOF93n9M129jPB27AXJKqCBP XDmczZF4SboipnWYTWDzyV1clEsHYvHaQlHrJIXQSpgdcW1MO+JP3MgRYSA9+VHrGFlg Kbg7R3qzrDvvPvI0HVSxK7xuhl6E+bNdBZp/2TaRfvuxTRgdRX799PCqKFe8dgYaw5YO cwXQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=EywkZ7fI; arc=pass (i=1 spf=pass spfdomain=amazon.com dkim=pass dkdomain=amazon.com dmarc=pass fromdomain=amazon.com); spf=pass (google.com: domain of linux-kernel+bounces-52560-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-52560-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com X-Forwarded-Encrypted: i=1; AJvYcCWhAphBnnh6Q3wDvaJY6jS7IPLF6+fN/KCXvGdyPU+W2hYYeqTEWsP7APEypGwtkURo8SEaImVeOfhcYy3QUcbdpWSe8Q== Received: from 
am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id f7-20020a170906084700b00a3742c41337si2830618ejd.53.2024.02.05.04.12.11 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Feb 2024 04:12:11 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-52560-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=EywkZ7fI; arc=pass (i=1 spf=pass spfdomain=amazon.com dkim=pass dkdomain=amazon.com dmarc=pass fromdomain=amazon.com); spf=pass (google.com: domain of linux-kernel+bounces-52560-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-52560-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 75F151F27678 for ; Mon, 5 Feb 2024 12:12:11 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 138FF20DCB; Mon, 5 Feb 2024 12:06:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="EywkZ7fI" Received: from smtp-fw-52004.amazon.com (smtp-fw-52004.amazon.com [52.119.213.154]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6C695200AB; Mon, 5 Feb 2024 12:06:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.154 ARC-Seal: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1707134766; cv=none; b=TrtEn0EFDa6q9kni+9Q50ayTsVHLS4ncgdEjk873gKHUiKA5t1j+QCA5y9763X6E+GTi/MzyIJO4Iba2So1D4UHWLN5xbqHz49hnOwgB/AVxTSTJpsEPKPSjLpBvOBh2EUSxS4Ls0TPPWtzo8XCt3Y4qLrBXmxGdNkJUV8w9HQI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707134766; c=relaxed/simple; bh=ixoM0Sd7pCRhUQdiB6CxvVkou2FFR6qDgDmLsYAURmU=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=qpHDBZGQ72WunF5mvfKj6q5e3Va1EZt/sb8gVejzt7VPRFmurPWM4HwuHlp75hJb3BVi9IRnxNbX/UhgTH6Hzx/9skDavMQWe38fqbvdPLaoGiy/Kd3g2722azHReXPR0boHqwR6vCmRZvEadUBkIa+LJrKwksmwUuB23ASrTfg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=EywkZ7fI; arc=none smtp.client-ip=52.119.213.154 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1707134765; x=1738670765; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=PZz9j1l+xmnCAt/gSMnG+c7uMbg+JDxTUK6mJnmY1gY=; b=EywkZ7fINIMUY/2tN9n7MFAb0T18hEP31HO8IxhveOGhJH3sa05dWqFh coiXD8Zckk5WX3vBORsjQwIrfWx4Nb5ub6x1lhj8NciceWTpP2y4JQDtQ qRFLvb/M/pLYQnRXw/BxamCSR6Essvn+Ne6oZRuzI6InFuzSlqd17JaDi o=; X-IronPort-AV: E=Sophos;i="6.05,245,1701129600"; d="scan'208";a="182633597" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO smtpout.prod.us-east-1.prod.farcaster.email.amazon.dev) ([10.43.8.2]) by smtp-border-fw-52004.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2024 12:06:04 +0000 Received: from EX19MTAEUB002.ant.amazon.com [10.0.17.79:57869] by 
smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.28.144:2525] with esmtp (Farcaster) id 1dbb430b-a9a2-4718-957f-9c92058f05d9; Mon, 5 Feb 2024 12:06:02 +0000 (UTC) X-Farcaster-Flow-ID: 1dbb430b-a9a2-4718-957f-9c92058f05d9 Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUB002.ant.amazon.com (10.252.51.59) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:06:02 +0000 Received: from dev-dsk-jgowans-1a-a3faec1f.eu-west-1.amazon.com (172.19.112.191) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:05:56 +0000 From: James Gowans To: CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , , Subject: [RFC 17/18] pci: Don't clear bus master is persistence enabled Date: Mon, 5 Feb 2024 12:02:02 +0000 Message-ID: <20240205120203.60312-18-jgowans@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com> References: <20240205120203.60312-1-jgowans@amazon.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D042UWB002.ant.amazon.com (10.13.139.175) To EX19D014EUC004.ant.amazon.com (10.252.51.182) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790060928329578688 X-GMAIL-MSGID: 1790060928329578688 In order for persistent devices to continue to DMA during kexec the bus mastering capability needs to remain on. Do not disable bus mastering if pkernfs is enabled, indicating that persistent devices are enabled. 
Only persistent devices should have bus mastering left on during kexec but this serves as a rough approximation of the functionality needed for this pkernfs RFC.
---
 drivers/pci/pci-driver.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 51ec9e7e784f..131127967811 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -519,7 +520,8 @@ static void pci_device_shutdown(struct device *dev)
 	 * If it is not a kexec reboot, firmware will hit the PCI
 	 * devices with big hammer and stop their DMA any way.
 	 */
-	if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
+	if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot)
+		&& !pkernfs_enabled())
 		pci_clear_master(pci_dev);
 }

From patchwork Mon Feb 5 12:02:03 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Gowans, James"
X-Patchwork-Id: 196801
Return-Path:
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a05:7301:168b:b0:106:860b:bbdd with SMTP id ma11csp842532dyb; Mon, 5 Feb 2024 04:34:46 -0800 (PST)
X-Google-Smtp-Source: AGHT+IFvfVT8MBZsWTJHVibQCuQcb65yertLMqyvKKceBCgzTI/I2neHfxtCkyyN+XcFFeIXFQwc
X-Received: by 2002:a92:b708:0:b0:363:bfc7:74b1 with SMTP id k8-20020a92b708000000b00363bfc774b1mr5929342ili.32.1707136486741; Mon, 05 Feb 2024 04:34:46 -0800 (PST)
ARC-Seal: i=2; a=rsa-sha256; t=1707136486; cv=pass; d=google.com; s=arc-20160816; b=n2CeQteaRadUHQvZIYFeWSvejMV040Mi9sTN44TY2eEDjzqVpxPH03HvkSzp6kiK0t +94XfK5hxljXLncAHZe4LokthUisi9yjcYHbH4EkizzHElpMZ5O3CykTo3TKcivJFnI+ 49CTpZtR+Ed0WfcWy1/SIyVV0yKok/z7h004bynfJhiWi6bqk/D736i45hbhqvvwm9he SMvY7ETx8mF602zHD0hOjdUOaoVfo6kNoR4PmsdUj5dRRupaoroL80jWHI/3dkp5uMOx +hgFxGNY0dtB3h8HP/SH9e+AOAWEVY6PrCuFcpXpMgHctvvVO2GFqZiWoh1K6V7Lwgit S0KA==
ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=FiBUV1fiPqoy2U4uMejLoWWd/EiydMvzfGCqIJIuYJY=; fh=2n0mOOkL1xedtKzS9J/dtpgInDU++q11+ptG8+4wvVk=; b=e6PSixRQgzKA2953qy1iEuikH+/BMb8hOZyxXhrOw1DJy+4QLvj8LnlSPT5h86EStA 8ZaoNxqFrcWa2StSymqdwlIdBnuk6tmpMBQzxyt+H86lpKJ6mSI8akwTR9NYHRNUGTk0 bsMuSBnOm4D2a77zUYIs2bB1SKXyMMoC9nq5NjiBPmmFkOiRnWBBBdp+gwGgSmsN26yD Sqc/ewVRJE9Avvpy5yQlt5DvbJeZtvfMThhWIGBxtwukZ1l7nQFlea+BaJnniLJPFaMx Zay59D42opE7IXwT05YAAiQAMLStkJ5p2OViVvIqWbpgzoWvQIVy9zT/IdkfQTox6WZC 8P/w==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=KS4I5EHo; arc=pass (i=1 spf=pass spfdomain=amazon.com dkim=pass dkdomain=amazon.com dmarc=pass fromdomain=amazon.com); spf=pass (google.com: domain of linux-kernel+bounces-52565-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-52565-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com X-Forwarded-Encrypted: i=1; AJvYcCW1Rq4cy3MabgECuGwF7QLXtSHV5jiqwsTKSdfNeZv/s+i9jl6rv/vv0GcNwMrxNdsnD1YigbwBJxy3Xp+UiqUI+SlIfg== Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. 
[2604:1380:40f1:3f00::1]) by mx.google.com with ESMTPS id d8-20020a634f08000000b005cf9e59472bsi6028551pgb.85.2024.02.05.04.34.46 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Feb 2024 04:34:46 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-52565-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) client-ip=2604:1380:40f1:3f00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=KS4I5EHo; arc=pass (i=1 spf=pass spfdomain=amazon.com dkim=pass dkdomain=amazon.com dmarc=pass fromdomain=amazon.com); spf=pass (google.com: domain of linux-kernel+bounces-52565-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-52565-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id D79C7B299BA for ; Mon, 5 Feb 2024 12:13:29 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 6B9172377D; Mon, 5 Feb 2024 12:06:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="KS4I5EHo" Received: from smtp-fw-2101.amazon.com (smtp-fw-2101.amazon.com [72.21.196.25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B873A21112; Mon, 5 Feb 2024 12:06:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=72.21.196.25 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1707134800; cv=none; b=YxJ6wdOQS0TZPvzRInTa1S5d99lTgL89aVwjyXgEE8L+sbQbVWR8cdMY2BdLvhUVOg6G6XZYkehkkd4vHlDHBpqXzir5wcaFGfmOzjd71JSOnLZ1YxjUC6qUCkVkWWbEmPM9Mgd3r5xcn/RgTW7MwLVRoVXFM125TYYXkp7Skw0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707134800; c=relaxed/simple; bh=yhYdhvkUj66URI+HxR9xU2mALUiQmweSZAQjAWuVHXw=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=LzgPUhIbuNVAexs0wKTrO0nqm1P+HU2wRm8r/0YBT9wphTVZRnWkGGDrDn49NgI3RmH+IjFj/Cy5iG/uOy9fyEWEYR1pfNR5hB61TcirbB86pRe/1qI50OGpObfLNteAx0wtke33E8dTPYFx6OlzzTFjG6bBjourBaKUL5ZRyiA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.com; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=KS4I5EHo; arc=none smtp.client-ip=72.21.196.25 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1707134799; x=1738670799; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=FiBUV1fiPqoy2U4uMejLoWWd/EiydMvzfGCqIJIuYJY=; b=KS4I5EHoPb/GLevhTJXhrqZGA19Gu0J1gr1dLxgNL42X9GGMElWCxcV6 TMkSKYaGibsuGrDtJ/BVIk12A5B2F6AeD63lMV41v1myZ+ZGI6khCfgDA KtYbjAX5a0RIQlGh00w6P5OeRBJ1H9xpdzO4LHNt/rm5sA/HwFOxllbTC U=; X-IronPort-AV: E=Sophos;i="6.05,245,1701129600"; d="scan'208";a="378967633" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-2101.iad2.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2024 12:06:35 +0000 Received: from EX19MTAEUC002.ant.amazon.com [10.0.17.79:8094] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev 
[10.0.45.85:2525] with esmtp (Farcaster) id 26f74936-a3bc-4ca0-9d65-59fbb5471a6c; Mon, 5 Feb 2024 12:06:32 +0000 (UTC)
X-Farcaster-Flow-ID: 26f74936-a3bc-4ca0-9d65-59fbb5471a6c
Received: from EX19D014EUC004.ant.amazon.com (10.252.51.182) by EX19MTAEUC002.ant.amazon.com (10.252.51.181) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:06:32 +0000
Received: from dev-dsk-jgowans-1a-a3faec1f.eu-west-1.amazon.com (172.19.112.191) by EX19D014EUC004.ant.amazon.com (10.252.51.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Mon, 5 Feb 2024 12:06:26 +0000
From: James Gowans
To:
CC: Eric Biederman , , "Joerg Roedel" , Will Deacon , , Alexander Viro , "Christian Brauner" , , Paolo Bonzini , Sean Christopherson , , Andrew Morton , , Alexander Graf , David Woodhouse , "Jan H . Schoenherr" , Usama Arif , Anthony Yznaga , Stanislav Kinsburskii , , ,
Subject: [RFC 18/18] vfio-pci: Assume device working after liveupdate
Date: Mon, 5 Feb 2024 12:02:03 +0000
Message-ID: <20240205120203.60312-19-jgowans@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
References: <20240205120203.60312-1-jgowans@amazon.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-ClientProxiedBy: EX19D044UWB004.ant.amazon.com (10.13.139.134) To EX19D014EUC004.ant.amazon.com (10.252.51.182)
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1790062348611728056
X-GMAIL-MSGID: 1790062348611728056

When re-creating a VFIO device after liveupdate, no destructive actions should be taken on it, to avoid interrupting any ongoing DMA. Specifically, bus mastering should not be cleared and the device should not be reset. Assume that reset works properly, and skip both the bus master clear and the function reset.
Ideally this would only be done for persistent devices, but in this rough RFC there is currently no mechanism to easily tell whether a device is persisted.
---
 drivers/vfio/pci/vfio_pci_core.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1929103ee59a..a7f56d43e0a4 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -480,19 +480,25 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
 		return ret;
 	}

-	/* Don't allow our initial saved state to include busmaster */
-	pci_clear_master(pdev);
+	if (!liveupdate) {
+		/* Don't allow our initial saved state to include busmaster */
+		pci_clear_master(pdev);
+	}

 	ret = pci_enable_device(pdev);
 	if (ret)
 		goto out_power;

-	/* If reset fails because of the device lock, fail this path entirely */
-	ret = pci_try_reset_function(pdev);
-	if (ret == -EAGAIN)
-		goto out_disable_device;
+	if (!liveupdate) {
+		/* If reset fails because of the device lock, fail this path entirely */
+		ret = pci_try_reset_function(pdev);
+		if (ret == -EAGAIN)
+			goto out_disable_device;

-	vdev->reset_works = !ret;
+		vdev->reset_works = !ret;
+	} else {
+		vdev->reset_works = 1;
+	}
 	pci_save_state(pdev);
 	vdev->pci_saved_state = pci_store_saved_state(pdev);
 	if (!vdev->pci_saved_state)