Message ID | 20231211-vv-dax_abi-v3-2-acf6cc1bde9f@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp7383988vqy; Mon, 11 Dec 2023 14:52:49 -0800 (PST) X-Google-Smtp-Source: AGHT+IG9M4HrnS1r3DBOoD3H+mJqEjU+64j/nu2gVeK//CvWGUCo1F2cYr9KcI0gAVBWmCjYh1pp X-Received: by 2002:a17:90a:9f05:b0:285:b7b9:dcd4 with SMTP id n5-20020a17090a9f0500b00285b7b9dcd4mr2478575pjp.16.1702335168713; Mon, 11 Dec 2023 14:52:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702335168; cv=none; d=google.com; s=arc-20160816; b=zU/AY6iLy+rbgAAueks16MQr06jWzbdepQdIrSBXGIgdnowT94jTq/Hsjah7lRZm34 XppJS5S25pjIPnizBbc0gXlcnAcrmjqxPAYL038T3D/EU2dFN+1CGEo7gV8zlA2E98a0 bjBrIVA6RVRO2EW576fy9s3xROWfC5VtKreHJyrO+GrWPAL9p1gCsO19DZq6OUIBTKQ/ J7hVkJ9LTIV0IeGxNvyXA8VHpfeYWZI3GxVOwThXDNEkDWTSBWbveaszjAVMrbr9W4UR eV0Z6Tr9y2XptH4RaIbWYbWfES9xUKP2GCiJPKWqV06VHo6TkKUq9gafZ+2IMgdmj0Ak iqOg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:in-reply-to:references:message-id :content-transfer-encoding:mime-version:subject:date:from :dkim-signature; bh=2XEq/LlPNt6JECrMuiKKhY08i91dDkA1AW6e0w7rjK0=; fh=JpvfM3e+aJyUWlVhWivfe3Bpai8TmGDwtLUms+5GK1w=; b=JXcEBLoF69VSfVhWzdNiU7Trk/7dHwFOmIyK7HRR/0Bvlp/9X+Wi/3dWKEt6b95Z2R dTSFhNSC3ngnCCUvTkxEWjZmQd2392H+kQrE+sIAL/RH+QN9dBG0Plcx0wl65vfIW/bs P/B7f5nxu52kVamHGvEb22UaH/KcJj0KUA8wpXJEreJ8cOwNAQLVs3nPYEkhN6w4ar15 fPcC5fhrgUcCs7Ps9V9hZFqDCJPyVqSgHJRbNZEBApZDEK+U6Pxt6LN6U6A2ivUXQp5F gdZ/fjvWbpJJQpiWA8FOEIWoekVbNLaTkXINi9Nw+L7xOd73bten9gSumwnbLvxoCwdM vxzg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=mnl9ktrZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from snail.vger.email (snail.vger.email. [2620:137:e000::3:7]) by mx.google.com with ESMTPS id lj6-20020a17090b344600b00286dd5ae1cbsi6807408pjb.125.2023.12.11.14.52.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 11 Dec 2023 14:52:48 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=mnl9ktrZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id BC88E804F607; Mon, 11 Dec 2023 14:52:47 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230307AbjLKWwc (ORCPT <rfc822;dexuan.linux@gmail.com> + 99 others); Mon, 11 Dec 2023 17:52:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50750 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230101AbjLKWw2 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 11 Dec 2023 17:52:28 -0500 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7941BEA; Mon, 11 Dec 2023 14:52:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1702335154; x=1733871154; h=from:date:subject:mime-version:content-transfer-encoding: message-id:references:in-reply-to:to:cc; bh=ehImgNx/k9AKKVzaSKbOLo1Mb4OGFvOlrdKezvX51Qs=; b=mnl9ktrZGwft0/ztMbnaEwHkUOfGIrXRU4xt86tHV4frmXtPYovdRLoZ IMSC4a8pWqUwRubFfhtYBz0bb/MXhtcvfNmUObtHHoRHk8m643AYvw52S d6Fkkcywz7i3afFoqumBy/h2G8/61m2Nva+0KJ3OsUfeBkC8ZFy0p21HR 6CZx4H08XPXSAtBBnVGHc45SRLlA5KKKVNo56k85wB/HslnQCsOz/SPW1 UO3HX0gAqDxNREqo9mDzWSeqldoIrpzcwoJb795ZFfaBdHUPT//TiSDlH +HIEZnqaJE+m5z9fa7ra2d4q7F/uuGz0D9G2xf0mK6DXa3xCXhZVPzMrh Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10921"; a="8083760" X-IronPort-AV: E=Sophos;i="6.04,268,1695711600"; d="scan'208";a="8083760" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2023 14:52:31 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10921"; a="946511338" X-IronPort-AV: E=Sophos;i="6.04,268,1695711600"; d="scan'208";a="946511338" Received: from tlyon-mobl2.amr.corp.intel.com (HELO [192.168.1.200]) ([10.212.89.19]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Dec 2023 14:52:29 -0800 From: Vishal Verma <vishal.l.verma@intel.com> Date: Mon, 11 Dec 2023 15:52:18 -0700 Subject: [PATCH v3 2/2] dax: add a sysfs knob to control memmap_on_memory behavior MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20231211-vv-dax_abi-v3-2-acf6cc1bde9f@intel.com> References: <20231211-vv-dax_abi-v3-0-acf6cc1bde9f@intel.com> In-Reply-To: <20231211-vv-dax_abi-v3-0-acf6cc1bde9f@intel.com> To: Dan Williams <dan.j.williams@intel.com>, Vishal Verma <vishal.l.verma@intel.com>, Dave Jiang <dave.jiang@intel.com> Cc: linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org, David Hildenbrand <david@redhat.com>, Dave Hansen <dave.hansen@linux.intel.com>, Huang Ying <ying.huang@intel.com>, Li Zhijian <lizhijian@fujitsu.com>, Jonathan Cameron <Jonathan.Cameron@huawei.com> X-Mailer: b4 0.13-dev-433a8 X-Developer-Signature: v=1; a=openpgp-sha256; l=4067; i=vishal.l.verma@intel.com; h=from:subject:message-id; bh=ehImgNx/k9AKKVzaSKbOLo1Mb4OGFvOlrdKezvX51Qs=; b=owGbwMvMwCXGf25diOft7jLG02pJDKnlk9bs0Q08u36yeE9D5odPpy9eCq97f3DXZimjGV/ut wuXmyXv6ChlYRDjYpAVU2T5u+cj4zG57fk8gQmOMHNYmUCGMHBxCsBEWsUY/sp/05f8vH7V+2LJ dUduyLd3/SlOK1JcbvvFQTuv8drym+kMPxlNnmheye0rqIxzYlwU819lzccdx/O5BCPvnUnwMTD zZAMA X-Developer-Key: i=vishal.l.verma@intel.com; a=openpgp; fpr=F8682BE134C67A12332A2ED07AFA61BEA3B84DFF X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE, SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Mon, 11 Dec 2023 14:52:47 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785027801680809112 X-GMAIL-MSGID: 1785027801680809112 |
Series |
Add DAX ABI for memmap_on_memory
|
|
Commit Message
Verma, Vishal L
Dec. 11, 2023, 10:52 p.m. UTC
Add a sysfs knob for dax devices to control the memmap_on_memory setting if the dax device were to be hotplugged as system memory. The default memmap_on_memory setting for dax devices originating via pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to preserve legacy behavior. For dax devices via CXL, the default is on. The sysfs control allows the administrator to override the above defaults if needed. Cc: David Hildenbrand <david@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> --- drivers/dax/bus.c | 47 +++++++++++++++++++++++++++++++++ Documentation/ABI/testing/sysfs-bus-dax | 17 ++++++++++++ 2 files changed, 64 insertions(+)
Comments
Vishal Verma <vishal.l.verma@intel.com> writes: > Add a sysfs knob for dax devices to control the memmap_on_memory setting > if the dax device were to be hotplugged as system memory. > > The default memmap_on_memory setting for dax devices originating via > pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to > preserve legacy behavior. For dax devices via CXL, the default is on. > The sysfs control allows the administrator to override the above > defaults if needed. > > Cc: David Hildenbrand <david@redhat.com> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Dave Jiang <dave.jiang@intel.com> > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: Huang Ying <ying.huang@intel.com> > Tested-by: Li Zhijian <lizhijian@fujitsu.com> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Reviewed-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> > --- > drivers/dax/bus.c | 47 +++++++++++++++++++++++++++++++++ > Documentation/ABI/testing/sysfs-bus-dax | 17 ++++++++++++ > 2 files changed, 64 insertions(+) > > diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c > index 1ff1ab5fa105..2871e5188f0d 100644 > --- a/drivers/dax/bus.c > +++ b/drivers/dax/bus.c > @@ -1270,6 +1270,52 @@ static ssize_t numa_node_show(struct device *dev, > } > static DEVICE_ATTR_RO(numa_node); > > +static ssize_t memmap_on_memory_show(struct device *dev, > + struct device_attribute *attr, char *buf) > +{ > + struct dev_dax *dev_dax = to_dev_dax(dev); > + > + return sprintf(buf, "%d\n", dev_dax->memmap_on_memory); > +} > + > +static ssize_t memmap_on_memory_store(struct device *dev, > + struct device_attribute *attr, > + const char *buf, size_t len) > +{ > + struct device_driver *drv = dev->driver; > + struct dev_dax *dev_dax = to_dev_dax(dev); > + struct dax_region *dax_region = dev_dax->region; > + struct dax_device_driver *dax_drv = to_dax_drv(drv); > + ssize_t rc; > + bool val; > + > + rc = kstrtobool(buf, &val); > + if (rc) > + return rc; > + > + if (dev_dax->memmap_on_memory == val) > + return len; > + > + device_lock(dax_region->dev); > + if (!dax_region->dev->driver) { > + device_unlock(dax_region->dev); > + return -ENXIO; > + } I think that it should be OK to write to "memmap_on_memory" if no driver is bound to the device. We just need to avoid to write to it when kmem driver is bound. -- Best Regards, Huang, Ying > + > + if (dax_drv->type == DAXDRV_KMEM_TYPE) { > + device_unlock(dax_region->dev); > + return -EBUSY; > + } > + > + device_lock(dev); > + dev_dax->memmap_on_memory = val; > + device_unlock(dev); > + > + device_unlock(dax_region->dev); > + return len; > +} > +static DEVICE_ATTR_RW(memmap_on_memory); > + > static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n) > { > struct device *dev = container_of(kobj, struct device, kobj); > @@ -1296,6 +1342,7 @@ static struct attribute *dev_dax_attributes[] = { > &dev_attr_align.attr, > &dev_attr_resource.attr, > &dev_attr_numa_node.attr, > + &dev_attr_memmap_on_memory.attr, > NULL, > }; > > diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax > index a61a7b186017..b1fd8bf8a7de 100644 > --- a/Documentation/ABI/testing/sysfs-bus-dax > +++ b/Documentation/ABI/testing/sysfs-bus-dax > @@ -149,3 +149,20 @@ KernelVersion: v5.1 > Contact: nvdimm@lists.linux.dev > Description: > (RO) The id attribute indicates the region id of a dax region. > + > +What: /sys/bus/dax/devices/daxX.Y/memmap_on_memory > +Date: October, 2023 > +KernelVersion: v6.8 > +Contact: nvdimm@lists.linux.dev > +Description: > + (RW) Control the memmap_on_memory setting if the dax device > + were to be hotplugged as system memory. This determines whether > + the 'altmap' for the hotplugged memory will be placed on the > + device being hotplugged (memmap_on_memory=1) or if it will be > + placed on regular memory (memmap_on_memory=0). This attribute > + must be set before the device is handed over to the 'kmem' > + driver (i.e. hotplugged into system-ram). Additionally, this > + depends on CONFIG_MHP_MEMMAP_ON_MEMORY, and a globally enabled > + memmap_on_memory parameter for memory_hotplug. This is > + typically set on the kernel command line - > + memory_hotplug.memmap_on_memory set to 'true' or 'force'."
On Tue, 2023-12-12 at 08:30 +0800, Huang, Ying wrote: > Vishal Verma <vishal.l.verma@intel.com> writes: > > > Add a sysfs knob for dax devices to control the memmap_on_memory setting > > if the dax device were to be hotplugged as system memory. > > > > The default memmap_on_memory setting for dax devices originating via > > pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to > > preserve legacy behavior. For dax devices via CXL, the default is on. > > The sysfs control allows the administrator to override the above > > defaults if needed. > > > > Cc: David Hildenbrand <david@redhat.com> > > Cc: Dan Williams <dan.j.williams@intel.com> > > Cc: Dave Jiang <dave.jiang@intel.com> > > Cc: Dave Hansen <dave.hansen@linux.intel.com> > > Cc: Huang Ying <ying.huang@intel.com> > > Tested-by: Li Zhijian <lizhijian@fujitsu.com> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > Reviewed-by: David Hildenbrand <david@redhat.com> > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> > > --- > > drivers/dax/bus.c | 47 +++++++++++++++++++++++++++++++++ > > Documentation/ABI/testing/sysfs-bus-dax | 17 ++++++++++++ > > 2 files changed, 64 insertions(+) > > > > diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c > > index 1ff1ab5fa105..2871e5188f0d 100644 > > --- a/drivers/dax/bus.c > > +++ b/drivers/dax/bus.c > > @@ -1270,6 +1270,52 @@ static ssize_t numa_node_show(struct device *dev, > > } > > static DEVICE_ATTR_RO(numa_node); > > > > +static ssize_t memmap_on_memory_show(struct device *dev, > > + struct device_attribute *attr, char *buf) > > +{ > > + struct dev_dax *dev_dax = to_dev_dax(dev); > > + > > + return sprintf(buf, "%d\n", dev_dax->memmap_on_memory); > > +} > > + > > +static ssize_t memmap_on_memory_store(struct device *dev, > > + struct device_attribute *attr, > > + const char *buf, size_t len) > > +{ > > + struct device_driver *drv = dev->driver; > > + struct dev_dax *dev_dax = to_dev_dax(dev); > > + struct dax_region *dax_region = dev_dax->region; > > + struct dax_device_driver *dax_drv = to_dax_drv(drv); > > + ssize_t rc; > > + bool val; > > + > > + rc = kstrtobool(buf, &val); > > + if (rc) > > + return rc; > > + > > + if (dev_dax->memmap_on_memory == val) > > + return len; > > + > > + device_lock(dax_region->dev); > > + if (!dax_region->dev->driver) { > > + device_unlock(dax_region->dev); > > + return -ENXIO; > > + } > > I think that it should be OK to write to "memmap_on_memory" if no driver > is bound to the device. We just need to avoid to write to it when kmem > driver is bound. Oh this is just a check on the region driver, not for a dax driver being bound to the device. It's the same as what things like align_store(), size_store() etc. do for dax device reconfiguration. That said, it might be okay to remove this check, as this operation doesn't change any attributes of the dax region (the other interfaces I mentioned above can affect regions, so we want to lock the region device). If removing the check, we'd drop the region lock acquisition as well. Dan, any thoughts on this?
"Verma, Vishal L" <vishal.l.verma@intel.com> writes: > On Tue, 2023-12-12 at 08:30 +0800, Huang, Ying wrote: >> Vishal Verma <vishal.l.verma@intel.com> writes: >> >> > Add a sysfs knob for dax devices to control the memmap_on_memory setting >> > if the dax device were to be hotplugged as system memory. >> > >> > The default memmap_on_memory setting for dax devices originating via >> > pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to >> > preserve legacy behavior. For dax devices via CXL, the default is on. >> > The sysfs control allows the administrator to override the above >> > defaults if needed. >> > >> > Cc: David Hildenbrand <david@redhat.com> >> > Cc: Dan Williams <dan.j.williams@intel.com> >> > Cc: Dave Jiang <dave.jiang@intel.com> >> > Cc: Dave Hansen <dave.hansen@linux.intel.com> >> > Cc: Huang Ying <ying.huang@intel.com> >> > Tested-by: Li Zhijian <lizhijian@fujitsu.com> >> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >> > Reviewed-by: David Hildenbrand <david@redhat.com> >> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> >> > --- >> > drivers/dax/bus.c | 47 +++++++++++++++++++++++++++++++++ >> > Documentation/ABI/testing/sysfs-bus-dax | 17 ++++++++++++ >> > 2 files changed, 64 insertions(+) >> > >> > diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c >> > index 1ff1ab5fa105..2871e5188f0d 100644 >> > --- a/drivers/dax/bus.c >> > +++ b/drivers/dax/bus.c >> > @@ -1270,6 +1270,52 @@ static ssize_t numa_node_show(struct device *dev, >> > } >> > static DEVICE_ATTR_RO(numa_node); >> > >> > +static ssize_t memmap_on_memory_show(struct device *dev, >> > + struct device_attribute *attr, char *buf) >> > +{ >> > + struct dev_dax *dev_dax = to_dev_dax(dev); >> > + >> > + return sprintf(buf, "%d\n", dev_dax->memmap_on_memory); >> > +} >> > + >> > +static ssize_t memmap_on_memory_store(struct device *dev, >> > + struct device_attribute *attr, >> > + const char *buf, size_t len) >> > +{ >> > + struct device_driver *drv = dev->driver; >> > + struct dev_dax *dev_dax = to_dev_dax(dev); >> > + struct dax_region *dax_region = dev_dax->region; >> > + struct dax_device_driver *dax_drv = to_dax_drv(drv); >> > + ssize_t rc; >> > + bool val; >> > + >> > + rc = kstrtobool(buf, &val); >> > + if (rc) >> > + return rc; >> > + >> > + if (dev_dax->memmap_on_memory == val) >> > + return len; >> > + >> > + device_lock(dax_region->dev); >> > + if (!dax_region->dev->driver) { >> > + device_unlock(dax_region->dev); >> > + return -ENXIO; >> > + } >> >> I think that it should be OK to write to "memmap_on_memory" if no driver >> is bound to the device. We just need to avoid to write to it when kmem >> driver is bound. > > Oh this is just a check on the region driver, not for a dax driver > being bound to the device. It's the same as what things like > align_store(), size_store() etc. do for dax device reconfiguration. Sorry, I misunderstood it. > That said, it might be okay to remove this check, as this operation > doesn't change any attributes of the dax region (the other interfaces I > mentioned above can affect regions, so we want to lock the region > device). If removing the check, we'd drop the region lock acquisition > as well. This sounds good to me. And is it necessary to check driver type with device_lock()? Can driver be changed between checking and lock? -- Best Regards, Huang, Ying
Verma, Vishal L wrote: > On Tue, 2023-12-12 at 08:30 +0800, Huang, Ying wrote: > > Vishal Verma <vishal.l.verma@intel.com> writes: > > > > > Add a sysfs knob for dax devices to control the memmap_on_memory setting > > > if the dax device were to be hotplugged as system memory. > > > > > > The default memmap_on_memory setting for dax devices originating via > > > pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to > > > preserve legacy behavior. For dax devices via CXL, the default is on. > > > The sysfs control allows the administrator to override the above > > > defaults if needed. > > > > > > Cc: David Hildenbrand <david@redhat.com> > > > Cc: Dan Williams <dan.j.williams@intel.com> > > > Cc: Dave Jiang <dave.jiang@intel.com> > > > Cc: Dave Hansen <dave.hansen@linux.intel.com> > > > Cc: Huang Ying <ying.huang@intel.com> > > > Tested-by: Li Zhijian <lizhijian@fujitsu.com> > > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > > Reviewed-by: David Hildenbrand <david@redhat.com> > > > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> > > > --- > > > drivers/dax/bus.c | 47 +++++++++++++++++++++++++++++++++ > > > Documentation/ABI/testing/sysfs-bus-dax | 17 ++++++++++++ > > > 2 files changed, 64 insertions(+) > > > > > > diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c > > > index 1ff1ab5fa105..2871e5188f0d 100644 > > > --- a/drivers/dax/bus.c > > > +++ b/drivers/dax/bus.c > > > @@ -1270,6 +1270,52 @@ static ssize_t numa_node_show(struct device *dev, > > > } > > > static DEVICE_ATTR_RO(numa_node); > > > > > > +static ssize_t memmap_on_memory_show(struct device *dev, > > > + struct device_attribute *attr, char *buf) > > > +{ > > > + struct dev_dax *dev_dax = to_dev_dax(dev); > > > + > > > + return sprintf(buf, "%d\n", dev_dax->memmap_on_memory); > > > +} > > > + > > > +static ssize_t memmap_on_memory_store(struct device *dev, > > > + struct device_attribute *attr, > > > + const char *buf, size_t len) > > > +{ > > > + struct device_driver *drv = dev->driver; > > > + struct dev_dax *dev_dax = to_dev_dax(dev); > > > + struct dax_region *dax_region = dev_dax->region; > > > + struct dax_device_driver *dax_drv = to_dax_drv(drv); > > > + ssize_t rc; > > > + bool val; > > > + > > > + rc = kstrtobool(buf, &val); > > > + if (rc) > > > + return rc; > > > + > > > + if (dev_dax->memmap_on_memory == val) > > > + return len; > > > + > > > + device_lock(dax_region->dev); > > > + if (!dax_region->dev->driver) { > > > + device_unlock(dax_region->dev); > > > + return -ENXIO; > > > + } > > > > I think that it should be OK to write to "memmap_on_memory" if no driver > > is bound to the device. We just need to avoid to write to it when kmem > > driver is bound. > > Oh this is just a check on the region driver, not for a dax driver > being bound to the device. It's the same as what things like > align_store(), size_store() etc. do for dax device reconfiguration. > > That said, it might be okay to remove this check, as this operation > doesn't change any attributes of the dax region (the other interfaces I > mentioned above can affect regions, so we want to lock the region > device). If removing the check, we'd drop the region lock acquisition > as well. > > Dan, any thoughts on this? Since this is a dev_dax attribute then this would have already been synchronously shutdown when dax_region->dev->driver transitioned to NULL. I.e. region unbind causes dev_dax deletion. However, there's a different issue here as dev->driver was referenced without the device_lock(). Additionally, I think this function also makes it clear that device lock flavor of guard() would be useful: DEFINE_GUARD(dev, struct device *, device_lock(_T), device_unlock(_T)) ...then I would expect something like: guard(dev)(dev); if (dev_dax->memmap_on_memory != val && dev->driver && to_dax_drv(dev->driver)->type == DAXDRV_KMEM_TYPE) return -EBUSY; dev_dax->memmap_on_memory = val; return len; ...maybe with some temp variables to reduce the derefence chain, buy you get the idea. Only prevent changes while the device is active under kmem.
On Tue, 2023-12-12 at 08:56 +0800, Huang, Ying wrote: > "Verma, Vishal L" <vishal.l.verma@intel.com> writes: > > > On Tue, 2023-12-12 at 08:30 +0800, Huang, Ying wrote: > > > > > > > + > > > > +static ssize_t memmap_on_memory_store(struct device *dev, > > > > + struct device_attribute *attr, > > > > + const char *buf, size_t len) > > > > +{ > > > > + struct device_driver *drv = dev->driver; > > > > + struct dev_dax *dev_dax = to_dev_dax(dev); > > > > + struct dax_region *dax_region = dev_dax->region; > > > > + struct dax_device_driver *dax_drv = to_dax_drv(drv); > > > > + ssize_t rc; > > > > + bool val; > > > > + > > > > + rc = kstrtobool(buf, &val); > > > > + if (rc) > > > > + return rc; > > > > + > > > > + if (dev_dax->memmap_on_memory == val) > > > > + return len; > > > > + > > > > + device_lock(dax_region->dev); > > > > + if (!dax_region->dev->driver) { > > > > + device_unlock(dax_region->dev); > > > > + return -ENXIO; > > > > + } > > > > > > I think that it should be OK to write to "memmap_on_memory" if no driver > > > is bound to the device. We just need to avoid to write to it when kmem > > > driver is bound. > > > > Oh this is just a check on the region driver, not for a dax driver > > being bound to the device. It's the same as what things like > > align_store(), size_store() etc. do for dax device reconfiguration. > > Sorry, I misunderstood it. > > > That said, it might be okay to remove this check, as this operation > > doesn't change any attributes of the dax region (the other interfaces I > > mentioned above can affect regions, so we want to lock the region > > device). If removing the check, we'd drop the region lock acquisition > > as well. > > This sounds good to me. > > And is it necessary to check driver type with device_lock()? Can driver > be changed between checking and lock? > Oh, good point, the type check should happen with the device lock held. I'll make that change.
On 11.12.23 23:52, Vishal Verma wrote: > Add a sysfs knob for dax devices to control the memmap_on_memory setting > if the dax device were to be hotplugged as system memory. > > The default memmap_on_memory setting for dax devices originating via > pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to > preserve legacy behavior. For dax devices via CXL, the default is on. > The sysfs control allows the administrator to override the above > defaults if needed. > > Cc: David Hildenbrand <david@redhat.com> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Dave Jiang <dave.jiang@intel.com> > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: Huang Ying <ying.huang@intel.com> > Tested-by: Li Zhijian <lizhijian@fujitsu.com> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Reviewed-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> > --- > drivers/dax/bus.c | 47 +++++++++++++++++++++++++++++++++ > Documentation/ABI/testing/sysfs-bus-dax | 17 ++++++++++++ > 2 files changed, 64 insertions(+) > > diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c > index 1ff1ab5fa105..2871e5188f0d 100644 > --- a/drivers/dax/bus.c > +++ b/drivers/dax/bus.c > @@ -1270,6 +1270,52 @@ static ssize_t numa_node_show(struct device *dev, > } > static DEVICE_ATTR_RO(numa_node); > > +static ssize_t memmap_on_memory_show(struct device *dev, > + struct device_attribute *attr, char *buf) > +{ > + struct dev_dax *dev_dax = to_dev_dax(dev); > + > + return sprintf(buf, "%d\n", dev_dax->memmap_on_memory); > +} > + > +static ssize_t memmap_on_memory_store(struct device *dev, > + struct device_attribute *attr, > + const char *buf, size_t len) > +{ > + struct device_driver *drv = dev->driver; > + struct dev_dax *dev_dax = to_dev_dax(dev); > + struct dax_region *dax_region = dev_dax->region; > + struct dax_device_driver *dax_drv = to_dax_drv(drv); > + ssize_t rc; > + bool val; > + > + rc = kstrtobool(buf, &val); > + if (rc) > + return rc; > + > + if (dev_dax->memmap_on_memory == val) > + return len; > + > + device_lock(dax_region->dev); > + if (!dax_region->dev->driver) { > + device_unlock(dax_region->dev); > + return -ENXIO; > + } > + > + if (dax_drv->type == DAXDRV_KMEM_TYPE) { > + device_unlock(dax_region->dev); > + return -EBUSY; > + } > + > + device_lock(dev); > + dev_dax->memmap_on_memory = val; > + device_unlock(dev); > + > + device_unlock(dax_region->dev); > + return len; > +} > +static DEVICE_ATTR_RW(memmap_on_memory); > + > static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n) > { > struct device *dev = container_of(kobj, struct device, kobj); > @@ -1296,6 +1342,7 @@ static struct attribute *dev_dax_attributes[] = { > &dev_attr_align.attr, > &dev_attr_resource.attr, > &dev_attr_numa_node.attr, > + &dev_attr_memmap_on_memory.attr, > NULL, > }; > > diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax > index a61a7b186017..b1fd8bf8a7de 100644 > --- a/Documentation/ABI/testing/sysfs-bus-dax > +++ b/Documentation/ABI/testing/sysfs-bus-dax > @@ -149,3 +149,20 @@ KernelVersion: v5.1 > Contact: nvdimm@lists.linux.dev > Description: > (RO) The id attribute indicates the region id of a dax region. > + > +What: /sys/bus/dax/devices/daxX.Y/memmap_on_memory > +Date: October, 2023 > +KernelVersion: v6.8 > +Contact: nvdimm@lists.linux.dev > +Description: > + (RW) Control the memmap_on_memory setting if the dax device > + were to be hotplugged as system memory. This determines whether > + the 'altmap' for the hotplugged memory will be placed on the > + device being hotplugged (memmap_on_memory=1) or if it will be > + placed on regular memory (memmap_on_memory=0). This attribute > + must be set before the device is handed over to the 'kmem' > + driver (i.e. hotplugged into system-ram). Additionally, this > + depends on CONFIG_MHP_MEMMAP_ON_MEMORY, and a globally enabled > + memmap_on_memory parameter for memory_hotplug. This is > + typically set on the kernel command line - > + memory_hotplug.memmap_on_memory set to 'true' or 'force'." > Thinking about it, I wonder if we could disallow setting that property to "true" if the current configuration does not allow it. That is: 1) Removing the "size" parameter from mhp_supports_memmap_on_memory(), it doesn't make any sense anymore. 2) Exporting mhp_supports_memmap_on_memory() to modules. 3) When setting memmap_on_memory, check whether mhp_supports_memmap_on_memory() == true. Then, the user really gets an error when trying to set it to "true".
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 1ff1ab5fa105..2871e5188f0d 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -1270,6 +1270,52 @@ static ssize_t numa_node_show(struct device *dev, } static DEVICE_ATTR_RO(numa_node); +static ssize_t memmap_on_memory_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct dev_dax *dev_dax = to_dev_dax(dev); + + return sprintf(buf, "%d\n", dev_dax->memmap_on_memory); +} + +static ssize_t memmap_on_memory_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t len) +{ + struct device_driver *drv = dev->driver; + struct dev_dax *dev_dax = to_dev_dax(dev); + struct dax_region *dax_region = dev_dax->region; + struct dax_device_driver *dax_drv = to_dax_drv(drv); + ssize_t rc; + bool val; + + rc = kstrtobool(buf, &val); + if (rc) + return rc; + + if (dev_dax->memmap_on_memory == val) + return len; + + device_lock(dax_region->dev); + if (!dax_region->dev->driver) { + device_unlock(dax_region->dev); + return -ENXIO; + } + + if (dax_drv->type == DAXDRV_KMEM_TYPE) { + device_unlock(dax_region->dev); + return -EBUSY; + } + + device_lock(dev); + dev_dax->memmap_on_memory = val; + device_unlock(dev); + + device_unlock(dax_region->dev); + return len; +} +static DEVICE_ATTR_RW(memmap_on_memory); + static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n) { struct device *dev = container_of(kobj, struct device, kobj); @@ -1296,6 +1342,7 @@ static struct attribute *dev_dax_attributes[] = { &dev_attr_align.attr, &dev_attr_resource.attr, &dev_attr_numa_node.attr, + &dev_attr_memmap_on_memory.attr, NULL, }; diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax index a61a7b186017..b1fd8bf8a7de 100644 --- a/Documentation/ABI/testing/sysfs-bus-dax +++ b/Documentation/ABI/testing/sysfs-bus-dax @@ -149,3 +149,20 @@ KernelVersion: v5.1 Contact: nvdimm@lists.linux.dev Description: (RO) The id attribute indicates the region id of a dax region. + +What: /sys/bus/dax/devices/daxX.Y/memmap_on_memory +Date: October, 2023 +KernelVersion: v6.8 +Contact: nvdimm@lists.linux.dev +Description: + (RW) Control the memmap_on_memory setting if the dax device + were to be hotplugged as system memory. This determines whether + the 'altmap' for the hotplugged memory will be placed on the + device being hotplugged (memmap_on_memory=1) or if it will be + placed on regular memory (memmap_on_memory=0). This attribute + must be set before the device is handed over to the 'kmem' + driver (i.e. hotplugged into system-ram). Additionally, this + depends on CONFIG_MHP_MEMMAP_ON_MEMORY, and a globally enabled + memmap_on_memory parameter for memory_hotplug. This is + typically set on the kernel command line - + memory_hotplug.memmap_on_memory set to 'true' or 'force'."