[RFC] ACPI: container: Add power domain control methods
Commit Message
Platform devices that support power control often need to be powered
off and on together with the other devices in the same power domain.
However, there is no generic driver that supports the power control
logic for these devices.
The ACPI container seems to be a good place to hold this control logic.
If the platform devices of one power domain are placed in an ACPI
container, we can easily get the locality information of these devices
and monitor their power together.
This patch provides three userspace control interfaces to control the
power of the devices in the container together:
- on: power up the devices in the container; onlining these devices is
  then triggered by the BIOS.
- off: offline and eject the child devices in the container that are
  ejectable.
- pxms: show the PXMs (proximity domains) of the devices that are
  present in the container.
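As a rough illustration of how userspace might drive the "on" and "off" attributes described above, here is a minimal sketch. The container's sysfs directory is passed in by the caller because its real location is not stated in this posting; `write_attr()` and `container_set_power()` are hypothetical helper names, not part of the patch. The only behavior taken from the patch is that both attributes accept a write whose first character is '1'.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a string to a sysfs-style attribute file.
 * Returns 0 on success or a negative errno value.
 */
static int write_attr(const char *path, const char *val)
{
	size_t len = strlen(val);
	ssize_t n;
	int fd, err = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, val, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;

	if (close(fd) < 0 && !err)
		err = -errno;

	return err;
}

/*
 * Hypothetical helper: "dir" is the sysfs directory of the container
 * device (an assumption; the patch does not name the path). Writes "1"
 * to the "on" or "off" attribute inside that directory.
 */
static int container_set_power(const char *dir, int on)
{
	char path[4096];
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", dir, on ? "on" : "off");
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	return write_attr(path, "1");
}
```

A caller would pass the container's device directory, for example `container_set_power(dir, 0)` to request a power-off of the whole domain.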
In our scenario, we need to control the power of HBM memory devices,
which can be power hungry and are used only in specialized scenarios
such as HPC. The HBM memory devices in a socket are in the same power
domain and should be powered off and on together. We considered putting
this power control logic in a specialized driver, but the ACPI container
seems to be a more generic place for it.
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
---
drivers/acpi/Kconfig | 12 +++++
drivers/acpi/container.c | 112 +++++++++++++++++++++++++++++++++++++++
2 files changed, 124 insertions(+)
Comments
On Tue, Oct 25, 2022 at 8:17 AM Zhang Zekun <zhangzekun11@huawei.com> wrote:
>
> Platform devices which supports power control are often required to be
> power off/on together with the devices in the same power domain. However,
> there isn't a generic driver that support the power control logic of
> these devices.
Not true.
There is the ACPI power resources interface designed to represent
power domains that is well supported and used in the industry.
If it doesn't work for you, explain why.
Hi, Rafael J
This patch wants to put some generic control logic in the container, and
this logic can cover a batch of scenarios similar to ours. The ACPI power
resources interface does not conflict with this patch and can be used
inside the container for more complicated scenarios.
In our scenario, we need to control the power of some HBM memory devices,
each of which is configured as a PNP0C80 device. The HBM devices in one
socket are in the same power domain and need to be powered on and off
together. Every HBM memory device represents a NUMA node that has no
CPUs. The topology of one socket can be simplified and represented as:
        +---------+
        |  node0  |
        |  CPUs   |
        |  DRAM   |
        +---------+
             |
     +-------+-------+
     |               |
+---------+     +---------+
|  node1  |     |  node2  |
| no-cpu  |     | no-cpu  |
|   HBM   |     |   HBM   |
+---------+     +---------+
To use the ACPI power domain management interface, we would need to
develop a specialized driver that maintains the relationship between
socket IDs and NUMA nodes, so that userspace can tell which socket a
NUMA node belongs to. Note that the NUMA nodes in the same socket are
powered on and off together.
The socket ID of a memory device can be reported by the BIOS via the
DSDT or other ACPI tables, but we can skip this step by putting all of
the devices that belong to the same socket in one container. We can then
call each child device's "_PXM" method to expose the NUMA nodes of the
HBM devices to userspace.
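To make the userspace side of this concrete: the patch's pxms attribute prints the node mask with the kernel's "%*pbl" format, which produces a range list such as "1-2" for the two HBM nodes in the diagram above. The following is only a sketch of how a tool might parse that string; `parse_node_list()` is a hypothetical name, and the 64-node limit is an assumption made to keep the example small (a real system may have more nodes).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a kernel-style range list such as "1-2,5\n" into a 64-bit mask
 * with one bit per node ID. An empty list yields an empty mask.
 * Returns 0 on success or -EINVAL on malformed input or IDs >= 64.
 */
static int parse_node_list(const char *s, uint64_t *mask)
{
	*mask = 0;

	while (*s && *s != '\n') {
		unsigned long lo, hi, n;
		char *end;

		if (*s < '0' || *s > '9')
			return -EINVAL;
		lo = strtoul(s, &end, 10);
		hi = lo;

		/* Optional "-hi" part of a range. */
		if (*end == '-') {
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -EINVAL;
			hi = strtoul(s, &end, 10);
		}

		if (hi < lo || hi >= 64)
			return -EINVAL;
		for (n = lo; n <= hi; n++)
			*mask |= UINT64_C(1) << n;

		/* A comma must be followed by another entry. */
		if (*end == ',') {
			end++;
			if (*end < '0' || *end > '9')
				return -EINVAL;
		} else if (*end && *end != '\n') {
			return -EINVAL;
		}
		s = end;
	}

	return 0;
}
```

With the topology above, reading "1-2\n" from pxms would set bits 1 and 2, which tells userspace that node1 and node2 share the container and therefore the power domain.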
Besides, to power off the devices we first need to offline these ACPI
devices and then call the ACPI method "_EJ0" to finally remove them.
This is also generic logic that can be used to remove ejectable devices.
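The ordering described here (offline first, eject second, and only for ejectable children) can be modeled in a few lines. This is a plain userspace model and not kernel code: `struct child_dev`, its fields, and both functions are hypothetical stand-ins, and the "busy" flag is an invented way to show an offline step that fails.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a child device; not a kernel structure. */
struct child_dev {
	bool ejectable;	/* device advertises an eject method */
	bool present;	/* device is still attached */
	bool online;	/* device is in use by the OS */
	bool busy;	/* models an offline request that cannot succeed */
};

/* Offline the device first; eject it only if offlining succeeded. */
static int offline_and_eject(struct child_dev *c)
{
	if (!c->present)
		return -ENODEV;
	if (c->busy)
		return -EBUSY;		/* still in use: do not eject */

	c->online = false;		/* step 1: offline */
	c->present = false;		/* step 2: eject (models _EJ0) */
	return 0;
}

/*
 * Walk all children, skipping those that are not ejectable or already
 * gone. Returns the number of children that were removed.
 */
static int power_off_children(struct child_dev *kids, size_t n)
{
	int removed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!kids[i].ejectable || !kids[i].present)
			continue;
		if (offline_and_eject(&kids[i]) == 0)
			removed++;
	}

	return removed;
}
```

In this model a non-ejectable child is skipped rather than treated as an error, so the remaining children of the container are still processed.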
What we really need is a place to support this generic control logic,
rather than the interfaces to implement our requirements.
Best Regards,
Zekun, Zhang
On 2022/10/29 1:07, Rafael J. Wysocki wrote:
> On Tue, Oct 25, 2022 at 8:17 AM Zhang Zekun <zhangzekun11@huawei.com> wrote:
>> Platform devices which supports power control are often required to be
>> power off/on together with the devices in the same power domain. However,
>> there isn't a generic driver that support the power control logic of
>> these devices.
> Not true.
>
> There is the ACPI power resources interface designed to represent
> power domains that is well supported and used in the industry.
>
> If it doesn't work for you, explain why.
>
Kindly ping.
On Thu, Nov 10, 2022 at 1:13 PM zhangzekun (A) <zhangzekun11@huawei.com> wrote:
>
> Kindly ping.
I'm not going to apply this patch if that's what you're asking about.
Please have a look at LPI which is the ACPI way of doing what you want.
If you need to extend the support for it in the kernel, please do so.
If you need to extend the definition of LPI in the ACPI specification,
there is also a way to do that.
What you are trying to do would require extending the container device
definition in the specification anyway.
Hi, Rafael J
Thanks a lot for your advice! I will look into LPI and find a better way
to do what I want.
Best Regards,
Zekun, Zhang
On 2022/11/10 21:05, Rafael J. Wysocki wrote:
> On Thu, Nov 10, 2022 at 1:13 PM zhangzekun (A) <zhangzekun11@huawei.com> wrote:
>> Kindly ping.
> I'm not going to apply this patch if that's what you're asking about.
>
> Please have a look at LPI which is the ACPI way of doing what you want.
>
> If you need to extend the support for it in the kernel, please do so.
>
> If you need to extend the definition of LPI in the ACPI specification,
> there is also a way to do that.
>
> What you are trying to do would require extending the container device
> definition in the specification anyway.
>
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -584,6 +584,18 @@ config ACPI_PRMT
 	  substantially increase computational overhead related to the
 	  initialization of some server systems.
 
+config ACPI_POWER_DOMAIN_CTL
+	bool "acpi container power domain control support"
+	depends on ACPI_CONTAINER
+	default n
+	help
+	  Add userspace power control interfaces in container which can be used
+	  for manipulating the power of child devices in the same power domain.
+
+	  To use this feature you need to put devices in the same power domain
+	  in a container. Enable this feature if you want to control the power
+	  of these devices together.
+
 endif # ACPI
 
 config X86_PM_TIMER
--- a/drivers/acpi/container.c
+++ b/drivers/acpi/container.c
@@ -42,6 +42,115 @@ static void acpi_container_release(struct device *dev)
 	kfree(to_container_dev(dev));
 }
 
+#ifdef CONFIG_ACPI_POWER_DOMAIN_CTL
+
+static int get_pxm(struct acpi_device *acpi_device, void *arg)
+{
+	int nid;
+	unsigned long long sta;
+	acpi_handle handle;
+	nodemask_t *mask;
+	acpi_status status;
+
+	mask = arg;
+	handle = acpi_device->handle;
+
+	status = acpi_evaluate_integer(handle, "_STA", NULL, &sta);
+	if (ACPI_SUCCESS(status) && (sta & ACPI_STA_DEVICE_ENABLED)) {
+		nid = acpi_get_node(handle);
+		if (nid >= 0)
+			node_set(nid, *mask);
+	}
+
+	return 0;
+}
+
+static ssize_t pxms_show(struct device *dev,
+			 struct device_attribute *attr,
+			 char *buf)
+{
+	nodemask_t mask;
+	acpi_status status;
+	struct acpi_device *adev;
+
+	adev = to_acpi_device(dev);
+	nodes_clear(mask);
+
+	status = acpi_dev_for_each_child(adev, get_pxm, &mask);
+
+	return sysfs_emit(buf, "%*pbl\n", nodemask_pr_args(&mask));
+}
+DEVICE_ATTR_RO(pxms);
+
+static ssize_t on_store(struct device *d, struct device_attribute *attr,
+			const char *buf, size_t count)
+{
+	acpi_status status;
+	acpi_handle handle;
+	struct acpi_device *adev;
+
+	if (!count || buf[0] != '1')
+		return -EINVAL;
+
+	adev = to_acpi_device(d);
+	handle = adev->handle;
+	status = acpi_evaluate_object(handle, "_ON", NULL, NULL);
+	if (status == AE_NOT_FOUND)
+		acpi_handle_warn(handle, "No power on support for the container\n");
+	else if (ACPI_FAILURE(status))
+		acpi_handle_warn(handle, "Power on the device failed (0x%x)\n", status);
+
+	return count;
+}
+DEVICE_ATTR_WO(on);
+
+static int eject_device(struct acpi_device *acpi_device, void *not_used)
+{
+	acpi_object_type unused;
+	acpi_status status;
+
+	status = acpi_get_type(acpi_device->handle, &unused);
+	if (ACPI_FAILURE(status) || !acpi_device->flags.ejectable)
+		return -ENODEV;
+
+	acpi_dev_get(acpi_device);
+	status = acpi_hotplug_schedule(acpi_device, ACPI_OST_EC_OSPM_EJECT);
+	if (ACPI_SUCCESS(status))
+		return status;
+
+	acpi_dev_put(acpi_device);
+	acpi_evaluate_ost(acpi_device->handle, ACPI_OST_EC_OSPM_EJECT,
+			  ACPI_OST_SC_NON_SPECIFIC_FAILURE, NULL);
+
+	return status == AE_NO_MEMORY ? -ENOMEM : -EAGAIN;
+}
+
+static ssize_t off_store(struct device *d, struct device_attribute *attr,
+			 const char *buf, size_t count)
+{
+	struct acpi_device *adev;
+	acpi_status status;
+
+	if (!count || buf[0] != '1')
+		return -EINVAL;
+
+	adev = to_acpi_device(d);
+	status = acpi_dev_for_each_child(adev, eject_device, NULL);
+	if (ACPI_SUCCESS(status))
+		return count;
+
+	return status;
+}
+DEVICE_ATTR_WO(off);
+
+static void create_sysfs(struct device *dev)
+{
+	device_create_file(dev, &dev_attr_on);
+	device_create_file(dev, &dev_attr_off);
+	device_create_file(dev, &dev_attr_pxms);
+}
+#endif
+
 static int container_device_attach(struct acpi_device *adev,
 				   const struct acpi_device_id *not_used)
 {
@@ -68,6 +177,9 @@ static int container_device_attach(struct acpi_device *adev,
 		return ret;
 	}
 	adev->driver_data = dev;
+#ifdef CONFIG_ACPI_POWER_DOMAIN_CTL
+	create_sysfs(&adev->dev);
+#endif
 	return 1;
 }