mmc: allow mmc to block wait_for_device_probe()

Message ID 20230328223740.69446-1-dennis@kernel.org
State New
Series mmc: allow mmc to block wait_for_device_probe()

Commit Message

Dennis Zhou March 28, 2023, 10:37 p.m. UTC
I've been hitting a failed data device lookup when using dm-verity with a
root device on an eMMC partition. This is because of a race: dm-verity
looks for its data device before the partitions on the eMMC device have
been probed.

Initially I looked at solving this by changing devt_from_devname() to
look for partitions, but it seems there are legacy reasons and issues due
to dm.

MMC uses 2 levels of probing. The first initializes the host and the
second iterates over attached devices. The second is done by a workqueue
item. However, this paradigm makes wait_for_device_probe() useless as a
barrier for when we can assume attached devices have been probed.

This patch fixes this by exposing 2 functions, inc_probe_count() and
dec_probe_count(), that allow device drivers that do asynchronous probing
to delay waiters on wait_for_device_probe() so that when they are
released, they can assume attached devices have been probed.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 drivers/base/dd.c        | 17 +++++++++++++++--
 drivers/mmc/core/core.c  | 25 +++++++++++++++++++++++--
 include/linux/device.h   |  7 +++++++
 include/linux/mmc/host.h |  1 +
 4 files changed, 46 insertions(+), 4 deletions(-)
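
The counting scheme described in the commit message can be sketched in
userspace as follows. This is only an illustration, not kernel code: it
uses C11 atomics and pthreads as stand-ins for the kernel's atomic_t and
wait queue, and every sketch_* name is hypothetical. The "host probe"
takes a reference before handing the scan to a worker, the worker drops it
when the scan finishes, and the waiter is released only once the count
reaches zero.

```c
/* Userspace sketch only; all sketch_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int probe_count;		/* stand-in for probe_count in dd.c */
static atomic_bool card_scanned;	/* set once the async scan has run */
static pthread_mutex_t probe_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t probe_waitqueue = PTHREAD_COND_INITIALIZER;

static void sketch_probe_inc(void)
{
	atomic_fetch_add(&probe_count, 1);
}

static void sketch_probe_dec(void)
{
	int prev = atomic_fetch_sub(&probe_count, 1);

	assert(prev > 0);	/* every dec must pair with an earlier inc */
	if (prev == 1) {
		/* last outstanding probe finished: release all waiters */
		pthread_mutex_lock(&probe_lock);
		pthread_cond_broadcast(&probe_waitqueue);
		pthread_mutex_unlock(&probe_lock);
	}
}

/* Plays the role of wait_for_device_probe(): block until the count is 0. */
static void sketch_wait_for_probe(void)
{
	pthread_mutex_lock(&probe_lock);
	while (atomic_load(&probe_count) > 0)
		pthread_cond_wait(&probe_waitqueue, &probe_lock);
	pthread_mutex_unlock(&probe_lock);
}

/* Plays the role of the deferred rescan work item. */
static void *sketch_rescan(void *arg)
{
	(void)arg;
	atomic_store(&card_scanned, true);	/* "partitions are visible" */
	sketch_probe_dec();			/* pairs with the inc below */
	return NULL;
}

/* Returns true if the waiter saw the scan as complete when released. */
static bool sketch_run(void)
{
	pthread_t worker;
	bool seen;

	atomic_store(&card_scanned, false);
	sketch_probe_inc();	/* taken before the work is queued */
	if (pthread_create(&worker, NULL, sketch_rescan, NULL) != 0) {
		sketch_probe_dec();
		return false;
	}
	sketch_wait_for_probe();
	seen = atomic_load(&card_scanned);
	pthread_join(worker, NULL);
	return seen;
}
```

Because the reference is taken before the worker starts, the waiter cannot
slip through in the window between host setup and the scan, which is the
race the commit message describes.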
  

Comments

Greg KH March 29, 2023, 4:54 a.m. UTC | #1
On Tue, Mar 28, 2023 at 03:37:40PM -0700, Dennis Zhou wrote:
> I've been hitting a failed data device lookup when using dm-verity with a
> root device on an eMMC partition. This is because of a race: dm-verity
> looks for its data device before the partitions on the eMMC device have
> been probed.
> 
> Initially I looked at solving this by changing devt_from_devname() to
> look for partitions, but it seems there are legacy reasons and issues due
> to dm.
> 
> MMC uses 2 levels of probing. The first initializes the host and the
> second iterates over attached devices. The second is done by a workqueue
> item. However, this paradigm makes wait_for_device_probe() useless as a
> barrier for when we can assume attached devices have been probed.
> 
> This patch fixes this by exposing 2 functions, inc_probe_count() and
> dec_probe_count(), that allow device drivers that do asynchronous probing
> to delay waiters on wait_for_device_probe() so that when they are
> released, they can assume attached devices have been probed.

Please no.  For 2 reasons:
  - the API names you picked here do not make much sense from a global
    namespace standpoint.  Always try to do "noun/verb" as well, so if
    we really wanted to do this it would be "driver_probe_increment()"
    or something like that.
  - drivers and subsystems should not be messing around with the probe
    count as it's a hack in the first place to get around other issues.
    Please let's not make it worse and make a formal API for it and allow
    anyone to mess with it.

Why can't you just use normal deferred probing for this?

thanks,

greg k-h
  
Dennis Zhou March 29, 2023, 8:29 p.m. UTC | #2
On Wed, Mar 29, 2023 at 06:54:11AM +0200, Greg Kroah-Hartman wrote:

Thanks for the quick reply.

> Please no.  For 2 reasons:
>   - the API names you picked here do not make much sense from a global
>     namespace standpoint.  Always try to do "noun/verb" as well, so if
>     we really wanted to do this it would be "driver_probe_increment()"
>     or something like that.

Yeah that is a bit of a blunder on my part...

>   - drivers and subsystems should not be messing around with the probe
>     count as it's a hack in the first place to get around other issues.
>     Please let's not make it worse and make a formal API for it and allow
>     anyone to mess with it.
> 

That's fair.

> Why can't you just use normal deferred probing for this?
> 

I'm not familiar with why mmc is written the way it is, but probing
creates a notion of the host whereas the devices attached are probed
later via a work item.

Examining it a bit closer, inlining the first discovery call
avoids all of this mess. I sent that out just now in [1]. Hopefully
that'll be fine.

[1] https://lore.kernel.org/lkml/20230329202148.71107-1-dennis@kernel.org/T/#u

> thanks,
> 
> greg k-h

Thanks,
Dennis
  
Greg KH March 31, 2023, 7:30 a.m. UTC | #3
On Wed, Mar 29, 2023 at 01:29:52PM -0700, Dennis Zhou wrote:
> On Wed, Mar 29, 2023 at 06:54:11AM +0200, Greg Kroah-Hartman wrote:
> 
> Thanks for the quick reply.
> 
> > Please no.  For 2 reasons:
> >   - the API names you picked here do not make much sense from a global
> >     namespace standpoint.  Always try to do "noun/verb" as well, so if
> >     we really wanted to do this it would be "driver_probe_increment()"
> >     or something like that.
> 
> Yeah that is a bit of a blunder on my part...
> 
> >   - drivers and subsystems should not be messing around with the probe
> >     count as it's a hack in the first place to get around other issues.
> >     Please let's not make it worse and make a formal API for it and allow
> >     anyone to mess with it.
> > 
> 
> That's fair.
> 
> > Why can't you just use normal deferred probing for this?
> > 
> 
> I'm not familiar with why mmc is written the way it is, but probing
> creates a notion of the host whereas the devices attached are probed
> later via a work item.
> 
> Examining it a bit closer, inlining the first discovery call
> avoids all of this mess. I sent that out just now in [1]. Hopefully
> that'll be fine.
> 
> [1] https://lore.kernel.org/lkml/20230329202148.71107-1-dennis@kernel.org/T/#u

Looks much better, except for the kernel test bot issues...

thanks,

greg k-h
  

Patch

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 95ae347df137..c0117476e1d6 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -494,6 +494,19 @@  EXPORT_SYMBOL_GPL(device_bind_driver);
 static atomic_t probe_count = ATOMIC_INIT(0);
 static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
 
+void inc_probe_count(void)
+{
+	atomic_inc(&probe_count);
+}
+EXPORT_SYMBOL_GPL(inc_probe_count);
+
+void dec_probe_count(void)
+{
+	if (atomic_dec_return(&probe_count) == 0)
+		wake_up_all(&probe_waitqueue);
+}
+EXPORT_SYMBOL_GPL(dec_probe_count);
+
 static ssize_t state_synced_show(struct device *dev,
 				 struct device_attribute *attr, char *buf)
 {
@@ -793,8 +806,8 @@  static int driver_probe_device(struct device_driver *drv, struct device *dev)
 		    !defer_all_probes)
 			driver_deferred_probe_trigger();
 	}
-	atomic_dec(&probe_count);
-	wake_up_all(&probe_waitqueue);
+	if (atomic_dec_return(&probe_count) == 0)
+		wake_up_all(&probe_waitqueue);
 	return ret;
 }
 
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 368f10405e13..92690984dac2 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2192,11 +2192,11 @@  void mmc_rescan(struct work_struct *work)
 	int i;
 
 	if (host->rescan_disable)
-		return;
+		goto out_probe;
 
 	/* If there is a non-removable card registered, only scan once */
 	if (!mmc_card_is_removable(host) && host->rescan_entered)
-		return;
+		goto out_probe;
 	host->rescan_entered = 1;
 
 	if (host->trigger_card_event && host->ops->card_event) {
@@ -2247,6 +2247,13 @@  void mmc_rescan(struct work_struct *work)
  out:
 	if (host->caps & MMC_CAP_NEEDS_POLL)
 		mmc_schedule_delayed_work(&host->detect, HZ);
+
+out_probe:
+	if (host->start_probe) {
+		/* matches inc_probe_count() in mmc_start_host() */
+		dec_probe_count();
+		host->start_probe = 0;
+	}
 }
 
 void mmc_start_host(struct mmc_host *host)
@@ -2261,6 +2268,15 @@  void mmc_start_host(struct mmc_host *host)
 	}
 
 	mmc_gpiod_request_cd_irq(host);
+
+	/*
+	 * MMC uses 2 levels of probing. The first to handle initializing the
+	 * host and the second to iterate attached devices. However, this
+	 * paradigm breaks wait_for_device_probe(). Fix this here by
+	 * incrementing the probe_count and decrementing after the scan.
+	 */
+	host->start_probe = 1;
+	inc_probe_count();
 	_mmc_detect_change(host, 0, false);
 }
 
@@ -2273,6 +2289,11 @@  void __mmc_stop_host(struct mmc_host *host)
 
 	host->rescan_disable = 1;
 	cancel_delayed_work_sync(&host->detect);
+	/* start_probe is protected by the cancel_delayed_work_sync() */
+	if (host->start_probe) {
+		dec_probe_count();
+		host->start_probe = 0;
+	}
 }
 
 void mmc_stop_host(struct mmc_host *host)
diff --git a/include/linux/device.h b/include/linux/device.h
index e270cb740b9e..d09bdc33d1cf 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -891,6 +891,13 @@  int __must_check device_reprobe(struct device *dev);
 
 bool device_is_bound(struct device *dev);
 
+/*
+ * Functions that inc/dec probe_count to allow device drivers that finish
+ * probing asynchronously to delay wait_for_device_probe() appropriately.
+ */
+void inc_probe_count(void);
+void dec_probe_count(void);
+
 /*
  * Easy functions for dynamically creating devices on the fly
  */
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 0c0c9a0fdf57..ea7b9158f052 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -428,6 +428,7 @@  struct mmc_host {
 	unsigned int		retune_paused:1; /* re-tuning is temporarily disabled */
 	unsigned int		retune_crc_disable:1; /* don't trigger retune upon crc */
 	unsigned int		can_dma_map_merge:1; /* merging can be used */
+	unsigned int		start_probe:1; /* if this is our first scan */
 
 	int			rescan_disable;	/* disable card detection */
 	int			rescan_entered;	/* used with nonremovable devices */