[v3] driver core: shut down devices asynchronously
Commit Message
Shut down devices asynchronously, ensuring that each device is shut down
before its parents.
This can dramatically reduce system shutdown/reboot time on systems that
have devices that take many seconds to shut down, such as some NVMe drives.
On one system tested, the shutdown time went from 11 minutes without this
patch to 55 seconds with the patch.
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
---
v1->v2: rewritten using kernel async code (suggested by Lukas Wunner)
v2->v3: removed recursive functions to schedule children to be shutdown
before parents, since existing device_shutdown loop will already
do this
---
drivers/base/base.h | 3 ++
drivers/base/core.c | 102 ++++++++++++++++++++++++++----------------
include/linux/async.h | 6 +++
3 files changed, 73 insertions(+), 38 deletions(-)
Comments
Hi Stuart,
kernel test robot noticed the following build warnings:
[auto build test WARNING on driver-core/driver-core-testing]
[also build test WARNING on driver-core/driver-core-next driver-core/driver-core-linus linus/master v6.6-rc2 next-20230921]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Stuart-Hayes/driver-core-shut-down-devices-asynchronously/20230921-030141
base: driver-core/driver-core-testing
patch link: https://lore.kernel.org/r/20230920185923.3422-1-stuart.w.hayes%40gmail.com
patch subject: [PATCH v3] driver core: shut down devices asynchronously
config: um-allnoconfig (https://download.01.org/0day-ci/archive/20230921/202309212351.tXxMovlT-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230921/202309212351.tXxMovlT-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309212351.tXxMovlT-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from drivers/base/core.c:25:
In file included from include/linux/blkdev.h:9:
In file included from include/linux/blk_types.h:10:
In file included from include/linux/bvec.h:10:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from arch/um/include/asm/hardirq.h:5:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/um/include/asm/io.h:24:
include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
547 | val = __raw_readb(PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
560 | val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
| ~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
37 | #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
| ^
In file included from drivers/base/core.c:25:
In file included from include/linux/blkdev.h:9:
In file included from include/linux/blk_types.h:10:
In file included from include/linux/bvec.h:10:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from arch/um/include/asm/hardirq.h:5:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/um/include/asm/io.h:24:
include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
573 | val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
| ~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
| ^
In file included from drivers/base/core.c:25:
In file included from include/linux/blkdev.h:9:
In file included from include/linux/blk_types.h:10:
In file included from include/linux/bvec.h:10:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from arch/um/include/asm/hardirq.h:5:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/um/include/asm/io.h:24:
include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
584 | __raw_writeb(value, PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
594 | __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
604 | __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
692 | readsb(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
700 | readsw(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
708 | readsl(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
717 | writesb(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
726 | writesw(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
735 | writesl(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
>> drivers/base/core.c:4811:19: warning: variable 'parent' is uninitialized when used here [-Wuninitialized]
4811 | dev->parent ? &parent->p->child_domain : &top_domain);
| ^~~~~~
drivers/base/core.c:4785:29: note: initialize the variable 'parent' to silence this warning
4785 | struct device *dev, *parent;
| ^
| = NULL
13 warnings generated.
vim +/parent +4811 drivers/base/core.c
4778
4779 /**
4780 * device_shutdown - call ->shutdown() on each device to shutdown.
4781 */
4782 void device_shutdown(void)
4783 {
4784 ASYNC_DOMAIN_EXCLUSIVE(top_domain);
4785 struct device *dev, *parent;
4786
4787 wait_for_device_probe();
4788 device_block_probing();
4789
4790 cpufreq_suspend();
4791
4792 spin_lock(&devices_kset->list_lock);
4793 /*
4794 * Walk the devices list backward, scheduling shutdown of each in
4795 * turn. Beware that device unplug events may also start pulling
4796 * devices offline, even as the system is shutting down.
4797 */
4798 while (!list_empty(&devices_kset->list)) {
4799 dev = list_entry(devices_kset->list.prev, struct device,
4800 kobj.entry);
4801
4802 get_device(dev);
4803 /*
4804 * Make sure the device is off the kset list, in the
4805 * event that dev->*->shutdown() doesn't remove it.
4806 */
4807 list_del_init(&dev->kobj.entry);
4808 spin_unlock(&devices_kset->list_lock);
4809
4810 async_schedule_domain(shutdown_device, dev,
> 4811 dev->parent ? &parent->p->child_domain : &top_domain);
4812
4813 spin_lock(&devices_kset->list_lock);
4814 }
4815 spin_unlock(&devices_kset->list_lock);
4816 async_synchronize_full_domain(&top_domain);
4817 }
4818
@@ -11,6 +11,7 @@
*
*/
#include <linux/notifier.h>
+#include <linux/async.h>
/**
* struct subsys_private - structure to hold the private to the driver core portions of the bus_type/class structure.
@@ -97,6 +98,7 @@ struct driver_private {
* the device; typically because it depends on another driver getting
* probed first.
* @async_driver - pointer to device driver awaiting probe via async_probe
+ * @child_domain - domain for async shutdown work of children
* @device - pointer back to the struct device that this structure is
* associated with.
* @dead - This device is currently either in the process of or has been
@@ -114,6 +116,7 @@ struct device_private {
struct list_head deferred_probe;
struct device_driver *async_driver;
char *deferred_probe_reason;
+ struct async_domain child_domain;
struct device *device;
u8 dead:1;
};
@@ -9,6 +9,7 @@
*/
#include <linux/acpi.h>
+#include <linux/async.h>
#include <linux/cpufreq.h>
#include <linux/device.h>
#include <linux/err.h>
@@ -3473,6 +3474,7 @@ static int device_private_init(struct device *dev)
klist_init(&dev->p->klist_children, klist_children_get,
klist_children_put);
INIT_LIST_HEAD(&dev->p->deferred_probe);
+ INIT_ASYNC_DOMAIN_EXCLUSIVE(&dev->p->child_domain);
return 0;
}
@@ -4718,11 +4720,68 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
}
EXPORT_SYMBOL_GPL(device_change_owner);
+/**
+ * shutdown_device - shutdown one device
+ * @data: the pointer to the struct device to be shutdown
+ * @cookie: not used
+ *
+ * This shuts down one device, after waiting for children to finish
+ * shutdown. This should be scheduled for any children first.
+ */
+static void shutdown_device(void *data, async_cookie_t cookie)
+{
+ struct device *dev = data;
+
+ /*
+ * wait for shutdown work of children to finish
+ */
+ async_synchronize_full_domain(&dev->p->child_domain);
+
+ /*
+ * Make sure the device is off the kset list, in the
+ * event that dev->*->shutdown() doesn't remove it.
+ */
+ spin_lock(&devices_kset->list_lock);
+ list_del_init(&dev->kobj.entry);
+ spin_unlock(&devices_kset->list_lock);
+
+ /* hold lock to avoid race with probe/release */
+ if (dev->parent)
+ device_lock(dev->parent);
+ device_lock(dev);
+
+ /* Don't allow any more runtime suspends */
+ pm_runtime_get_noresume(dev);
+ pm_runtime_barrier(dev);
+
+ if (dev->class && dev->class->shutdown_pre) {
+ if (initcall_debug)
+ dev_info(dev, "shutdown_pre\n");
+ dev->class->shutdown_pre(dev);
+ }
+ if (dev->bus && dev->bus->shutdown) {
+ if (initcall_debug)
+ dev_info(dev, "shutdown\n");
+ dev->bus->shutdown(dev);
+ } else if (dev->driver && dev->driver->shutdown) {
+ if (initcall_debug)
+ dev_info(dev, "shutdown\n");
+ dev->driver->shutdown(dev);
+ }
+
+ device_unlock(dev);
+ if (dev->parent)
+ device_unlock(dev->parent);
+
+ put_device(dev);
+}
+
/**
* device_shutdown - call ->shutdown() on each device to shutdown.
*/
void device_shutdown(void)
{
+ ASYNC_DOMAIN_EXCLUSIVE(top_domain);
struct device *dev, *parent;
wait_for_device_probe();
@@ -4732,20 +4791,14 @@ void device_shutdown(void)
spin_lock(&devices_kset->list_lock);
/*
- * Walk the devices list backward, shutting down each in turn.
- * Beware that device unplug events may also start pulling
+ * Walk the devices list backward, scheduling shutdown of each in
+ * turn. Beware that device unplug events may also start pulling
* devices offline, even as the system is shutting down.
*/
while (!list_empty(&devices_kset->list)) {
dev = list_entry(devices_kset->list.prev, struct device,
kobj.entry);
- /*
- * hold reference count of device's parent to
- * prevent it from being freed because parent's
- * lock is to be held
- */
- parent = get_device(dev->parent);
get_device(dev);
/*
* Make sure the device is off the kset list, in the
@@ -4754,40 +4807,13 @@ void device_shutdown(void)
list_del_init(&dev->kobj.entry);
spin_unlock(&devices_kset->list_lock);
- /* hold lock to avoid race with probe/release */
- if (parent)
- device_lock(parent);
- device_lock(dev);
-
- /* Don't allow any more runtime suspends */
- pm_runtime_get_noresume(dev);
- pm_runtime_barrier(dev);
-
- if (dev->class && dev->class->shutdown_pre) {
- if (initcall_debug)
- dev_info(dev, "shutdown_pre\n");
- dev->class->shutdown_pre(dev);
- }
- if (dev->bus && dev->bus->shutdown) {
- if (initcall_debug)
- dev_info(dev, "shutdown\n");
- dev->bus->shutdown(dev);
- } else if (dev->driver && dev->driver->shutdown) {
- if (initcall_debug)
- dev_info(dev, "shutdown\n");
- dev->driver->shutdown(dev);
- }
-
- device_unlock(dev);
- if (parent)
- device_unlock(parent);
-
- put_device(dev);
- put_device(parent);
+ async_schedule_domain(shutdown_device, dev,
+ dev->parent ? &parent->p->child_domain : &top_domain);
spin_lock(&devices_kset->list_lock);
}
spin_unlock(&devices_kset->list_lock);
+ async_synchronize_full_domain(&top_domain);
}
/*
@@ -35,6 +35,12 @@ struct async_domain {
struct async_domain _name = { .pending = LIST_HEAD_INIT(_name.pending), \
.registered = 0 }
+static inline void INIT_ASYNC_DOMAIN_EXCLUSIVE(struct async_domain *domain)
+{
+ INIT_LIST_HEAD(&domain->pending);
+ domain->registered = 0;
+}
+
async_cookie_t async_schedule_node(async_func_t func, void *data,
int node);
async_cookie_t async_schedule_node_domain(async_func_t func, void *data,