mmc: allow mmc to block wait_for_device_probe()
Commit Message
I've been hitting a failed data device lookup when using dm-verity and a
root device on an emmc partition. This is because there is a race where
dm-verity is looking for a data device, but the partitions on the emmc
device haven't been probed yet.
Initially I looked at solving this by changing devt_from_devname() to
look for partitions, but it seems there are legacy reasons and issues
due to dm.
MMC uses two levels of probing: the first initializes the host, and the
second iterates over the attached devices. The second is done by a
workqueue item, so it can still be pending after the host's probe has
returned. This makes wait_for_device_probe() useless as a barrier for
when we can assume attached devices have been probed.
This patch fixes this by exposing two functions, inc_probe_count() and
dec_probe_count(), which let device drivers that probe asynchronously
delay waiters on wait_for_device_probe(), so that when the waiters are
released they can assume attached devices have been probed.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
drivers/base/dd.c | 17 +++++++++++++++--
drivers/mmc/core/core.c | 25 +++++++++++++++++++++++--
include/linux/device.h | 7 +++++++
include/linux/mmc/host.h | 1 +
4 files changed, 46 insertions(+), 4 deletions(-)
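The barrier described in the commit message can be modeled in userspace.
What follows is only a sketch under stated assumptions, not the kernel
code: a pthread mutex and condition variable stand in for the atomic
probe_count and probe_waitqueue, a thread stands in for the MMC rescan
work item, and every function name (probe_count_inc(), probe_count_dec(),
wait_for_probe(), rescan_work(), host_start_and_wait()) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Userspace stand-ins for probe_count and probe_waitqueue (assumption). */
static pthread_mutex_t probe_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t probe_done = PTHREAD_COND_INITIALIZER;
static int probe_count;
static bool card_registered;

static void probe_count_inc(void)
{
	pthread_mutex_lock(&probe_lock);
	probe_count++;
	pthread_mutex_unlock(&probe_lock);
}

static void probe_count_dec(void)
{
	pthread_mutex_lock(&probe_lock);
	assert(probe_count > 0);	/* every dec must match an earlier inc */
	if (--probe_count == 0)
		pthread_cond_broadcast(&probe_done);	/* wake all waiters */
	pthread_mutex_unlock(&probe_lock);
}

/* Model of wait_for_device_probe(): block until no probe is in flight. */
static void wait_for_probe(void)
{
	pthread_mutex_lock(&probe_lock);
	while (probe_count > 0)
		pthread_cond_wait(&probe_done, &probe_lock);
	pthread_mutex_unlock(&probe_lock);
}

/* Second probing level: the asynchronous scan for attached devices. */
static void *rescan_work(void *arg)
{
	(void)arg;
	usleep(10000);	/* pretend card discovery takes a while */
	pthread_mutex_lock(&probe_lock);
	card_registered = true;
	pthread_mutex_unlock(&probe_lock);
	probe_count_dec();	/* matches probe_count_inc() in host_start_and_wait() */
	return NULL;
}

/* First probing level: take a count, queue the scan, then act as a waiter. */
static bool host_start_and_wait(void)
{
	pthread_t worker;
	bool seen;

	probe_count_inc();
	if (pthread_create(&worker, NULL, rescan_work, NULL) != 0) {
		probe_count_dec();
		return false;
	}
	wait_for_probe();	/* released only after rescan_work() drops its count */
	pthread_mutex_lock(&probe_lock);
	seen = card_registered;
	pthread_mutex_unlock(&probe_lock);
	pthread_join(worker, NULL);
	return seen;
}
```

Without the probe_count_inc() call, wait_for_probe() would return
immediately and the waiter could observe card_registered still false,
which is the race the commit message describes.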
Comments
On Tue, Mar 28, 2023 at 03:37:40PM -0700, Dennis Zhou wrote:
> I've been hitting a failed data device lookup when using dm-verity and a
> root device on an emmc partition. This is because there is a race where
> dm-verity is looking for a data device, but the partitions on the emmc
> device haven't been probed yet.
>
> Initially I looked at solving this by changing devt_from_devname() to
> look for partitions, but it seems there are legacy reasons and issues due
> to dm.
>
> MMC uses 2 levels of probing. The first to handle initializing the
> host and the second to iterate attached devices. The second is done by
> a workqueue item. However, this paradigm makes wait_for_device_probe()
> useless as a barrier for when we can assume attached devices have been
> probed.
>
> This patch fixes this by exposing 2 methods inc/dec_probe_count() to
> allow device drivers that do asynchronous probing to delay waiters on
> wait_for_device_probe() so that when they are released, they can assume
> attached devices have been probed.
Please no. For 2 reasons:
- the api names you picked here do not make much sense from a global
  namespace standpoint. Always try to do "noun/verb" as well, so if
  we really wanted to do this it would be "driver_probe_increment()"
  or something like that.
- drivers and subsystems should not be messing around with the probe
  count as it's a hack in the first place to get around other issues.
  Please let's not make it worse and make a formal api for it and allow
  anyone to mess with it.
Why can't you just use normal deferred probing for this?
thanks,
greg k-h
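For readers unfamiliar with the deferred probing mentioned above, here
is a small userspace model of the idea: a probe routine that finds its
dependency missing returns -EPROBE_DEFER, and the core retries it later.
This is a sketch under assumptions, not kernel code. The struct
fake_driver type, the provider/consumer probe functions, and probe_all()
are hypothetical, and the sketch does not settle whether this mechanism
fits the MMC and dm-verity case discussed in the thread. Only the
EPROBE_DEFER value (517) is taken from the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

#define EPROBE_DEFER 517	/* same value as in the kernel's include/linux/errno.h */

/* Hypothetical stand-in for a driver bound by the core. */
struct fake_driver {
	const char *name;
	int (*probe)(void);
	bool bound;
};

static bool data_device_ready;

/* Consumer: cannot bind until the device it depends on exists. */
static int consumer_probe(void)
{
	return data_device_ready ? 0 : -EPROBE_DEFER;
}

/* Provider: binding it makes the data device available. */
static int provider_probe(void)
{
	data_device_ready = true;
	return 0;
}

/*
 * Probe every unbound driver, repeating for as long as a pass makes
 * progress. Returns the number of passes, including the final pass
 * that finds nothing left to bind.
 */
static int probe_all(struct fake_driver *drv, size_t n)
{
	int passes = 0;
	bool progress = true;

	while (progress) {
		progress = false;
		passes++;
		for (size_t i = 0; i < n; i++) {
			if (drv[i].bound)
				continue;
			if (drv[i].probe() == 0) {
				drv[i].bound = true;
				progress = true;
			}
			/* on -EPROBE_DEFER: stay unbound, retry on the next pass */
		}
	}
	return passes;
}

/* Register the consumer first so that its first probe has to defer. */
static int demo_deferred_probe(void)
{
	struct fake_driver drivers[] = {
		{ "consumer", consumer_probe, false },
		{ "provider", provider_probe, false },
	};
	int passes;

	data_device_ready = false;
	passes = probe_all(drivers, 2);
	return (drivers[0].bound && drivers[1].bound) ? passes : -1;
}
```

In this model the consumer defers on the first pass, binds on the second
once the provider has run, and a third pass confirms there is nothing
left to do.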
On Wed, Mar 29, 2023 at 06:54:11AM +0200, Greg Kroah-Hartman wrote:
> On Tue, Mar 28, 2023 at 03:37:40PM -0700, Dennis Zhou wrote:
> > I've been hitting a failed data device lookup when using dm-verity and a
> > root device on an emmc partition. This is because there is a race where
> > dm-verity is looking for a data device, but the partitions on the emmc
> > device haven't been probed yet.
> >
> > Initially I looked at solving this by changing devt_from_devname() to
> > look for partitions, but it seems there are legacy reasons and issues due
> > to dm.
> >
> > MMC uses 2 levels of probing. The first to handle initializing the
> > host and the second to iterate attached devices. The second is done by
> > a workqueue item. However, this paradigm makes wait_for_device_probe()
> > useless as a barrier for when we can assume attached devices have been
> > probed.
> >
> > This patch fixes this by exposing 2 methods inc/dec_probe_count() to
> > allow device drivers that do asynchronous probing to delay waiters on
> > wait_for_device_probe() so that when they are released, they can assume
> > attached devices have been probed.
>
Thanks for the quick reply.
> Please no. For 2 reasons:
> - the api names you picked here do not make much sense from a global
> namespace standpoint. Always try to do "noun/verb" as well, so if
> we really wanted to do this it would be "driver_probe_increment()"
> or something like that.
Yeah that is a bit of a blunder on my part...
> - drivers and subsystems should not be messing around with the probe
> count as it's a hack in the first place to get around other issues.
> Please let's not make it worse and make a formal api for it and allow
> anyone to mess with it.
>
That's fair.
> Why can't you just use normal deferred probing for this?
>
I'm not familiar with why mmc is written the way it is, but probing
creates a notion of the host whereas the devices attached are probed
later via a work item.
Examining it a bit closer, inlining the first discovery call
avoids all of this mess. I sent that out just now in [1]. Hopefully
that'll be fine.
[1] https://lore.kernel.org/lkml/20230329202148.71107-1-dennis@kernel.org/T/#u
> thanks,
>
> greg k-h
Thanks,
Dennis
On Wed, Mar 29, 2023 at 01:29:52PM -0700, Dennis Zhou wrote:
> On Wed, Mar 29, 2023 at 06:54:11AM +0200, Greg Kroah-Hartman wrote:
> > On Tue, Mar 28, 2023 at 03:37:40PM -0700, Dennis Zhou wrote:
> > > I've been hitting a failed data device lookup when using dm-verity and a
> > > root device on an emmc partition. This is because there is a race where
> > > dm-verity is looking for a data device, but the partitions on the emmc
> > > device haven't been probed yet.
> > >
> > > Initially I looked at solving this by changing devt_from_devname() to
> > > look for partitions, but it seems there are legacy reasons and issues due
> > > to dm.
> > >
> > > MMC uses 2 levels of probing. The first to handle initializing the
> > > host and the second to iterate attached devices. The second is done by
> > > a workqueue item. However, this paradigm makes wait_for_device_probe()
> > > useless as a barrier for when we can assume attached devices have been
> > > probed.
> > >
> > > This patch fixes this by exposing 2 methods inc/dec_probe_count() to
> > > allow device drivers that do asynchronous probing to delay waiters on
> > > wait_for_device_probe() so that when they are released, they can assume
> > > attached devices have been probed.
> >
>
> Thanks for the quick reply.
>
> > Please no. For 2 reasons:
> > - the api names you picked here do not make much sense from a global
> > namespace standpoint. Always try to do "noun/verb" as well, so if
> > we really wanted to do this it would be "driver_probe_increment()"
> > or something like that.
>
> Yeah that is a bit of a blunder on my part...
>
> > - drivers and subsystems should not be messing around with the probe
> > count as it's a hack in the first place to get around other issues.
> > Please let's not make it worse and make a formal api for it and allow
> > anyone to mess with it.
> >
>
> That's fair.
>
> > Why can't you just use normal deferred probing for this?
> >
>
> I'm not familiar with why mmc is written the way it is, but probing
> creates a notion of the host whereas the devices attached are probed
> later via a work item.
>
> Examining it a bit closer, inlining the first discovery call
> avoids all of this mess. I sent that out just now in [1]. Hopefully
> that'll be fine.
>
> [1] https://lore.kernel.org/lkml/20230329202148.71107-1-dennis@kernel.org/T/#u
Looks much better, except for the kernel test bot issues...
thanks,
greg k-h
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -494,6 +494,19 @@ EXPORT_SYMBOL_GPL(device_bind_driver);
 static atomic_t probe_count = ATOMIC_INIT(0);
 static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
 
+void inc_probe_count(void)
+{
+	atomic_inc(&probe_count);
+}
+EXPORT_SYMBOL_GPL(inc_probe_count);
+
+void dec_probe_count(void)
+{
+	if (atomic_dec_return(&probe_count) == 0)
+		wake_up_all(&probe_waitqueue);
+}
+EXPORT_SYMBOL_GPL(dec_probe_count);
+
 static ssize_t state_synced_show(struct device *dev,
 				 struct device_attribute *attr, char *buf)
 {
@@ -793,8 +806,8 @@ static int driver_probe_device(struct device_driver *drv, struct device *dev)
 		    !defer_all_probes)
 			driver_deferred_probe_trigger();
 	}
-	atomic_dec(&probe_count);
-	wake_up_all(&probe_waitqueue);
+	if (atomic_dec_return(&probe_count) == 0)
+		wake_up_all(&probe_waitqueue);
 	return ret;
 }
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2192,11 +2192,11 @@ void mmc_rescan(struct work_struct *work)
 	int i;
 
 	if (host->rescan_disable)
-		return;
+		goto out_probe;
 
 	/* If there is a non-removable card registered, only scan once */
 	if (!mmc_card_is_removable(host) && host->rescan_entered)
-		return;
+		goto out_probe;
 	host->rescan_entered = 1;
 
 	if (host->trigger_card_event && host->ops->card_event) {
@@ -2247,6 +2247,13 @@ void mmc_rescan(struct work_struct *work)
  out:
 	if (host->caps & MMC_CAP_NEEDS_POLL)
 		mmc_schedule_delayed_work(&host->detect, HZ);
+
+out_probe:
+	if (host->start_probe) {
+		/* matches inc_probe_count() in mmc_start_host() */
+		dec_probe_count();
+		host->start_probe = 0;
+	}
 }
 
 void mmc_start_host(struct mmc_host *host)
@@ -2261,6 +2268,15 @@ void mmc_start_host(struct mmc_host *host)
 	}
 
 	mmc_gpiod_request_cd_irq(host);
+
+	/*
+	 * MMC uses 2 levels of probing. The first to handle initializing the
+	 * host and the second to iterate attached devices. However, this
+	 * paradigm breaks wait_for_device_probe(). Fix this here by
+	 * incrementing the probe_count and decrementing after the scan.
+	 */
+	host->start_probe = 1;
+	inc_probe_count();
 	_mmc_detect_change(host, 0, false);
 }
 
@@ -2273,6 +2289,11 @@ void __mmc_stop_host(struct mmc_host *host)
 
 	host->rescan_disable = 1;
 	cancel_delayed_work_sync(&host->detect);
+	/* start_probe is protected by the cancel_delayed_work_sync() */
+	if (host->start_probe) {
+		dec_probe_count();
+		host->start_probe = 0;
+	}
 }
 
 void mmc_stop_host(struct mmc_host *host)
diff --git a/include/linux/device.h b/include/linux/device.h
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -891,6 +891,13 @@ int __must_check device_reprobe(struct device *dev);
 
 bool device_is_bound(struct device *dev);
 
+/*
+ * Functions that inc/dec probe_count to allow device drivers that finish
+ * probing asynchronously to delay wait_for_device_probe() appropriately.
+ */
+void inc_probe_count(void);
+void dec_probe_count(void);
+
 /*
  * Easy functions for dynamically creating devices on the fly
  */
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -428,6 +428,7 @@ struct mmc_host {
 	unsigned int		retune_paused:1; /* re-tuning is temporarily disabled */
 	unsigned int		retune_crc_disable:1; /* don't trigger retune upon crc */
 	unsigned int		can_dma_map_merge:1; /* merging can be used */
+	unsigned int		start_probe:1;	/* if this is our first scan */
 
 	int			rescan_disable;	/* disable card detection */
 	int			rescan_entered;	/* used with nonremovable devices */