[v5,1/8] drivers: kunit: Generic helpers for test device creation

Message ID bad670ee135391eb902bd34b8bcbe777afabc7fd.1679474247.git.mazziesaccount@gmail.com
State New
Headers
Series Support ROHM BU27034 ALS sensor |

Commit Message

Matti Vaittinen March 22, 2023, 9:05 a.m. UTC
  The creation of a dummy device in order to test managed interfaces is a
generally useful test feature. The drm test helpers
drm_kunit_helper_alloc_device() and drm_kunit_helper_free_device()
are doing exactly this. It makes no sense that each and every component
which intends to be testing managed interfaces will create similar
helpers so stole these for more generic use.

Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>

---
Changes:
v4 => v5:
- Add accidentally dropped header and email recipients
- do not rename + move helpers from DRM but add temporary dublicates to
  simplify merging. (This patch does not touch DRM and can be merged
  separately. DRM patch and IIO test patch still depend on this one).

Please note that there's something similar ongoing in the CCF:
https://lore.kernel.org/all/20230302013822.1808711-1-sboyd@kernel.org/

I do like the simplicity of these DRM-originated helpers so I think
they're worth. I do equally like the Stephen's idea of having the
"dummy platform device" related helpers under drivers/base and the
header being in include/kunit/platform_device.h which is similar to real
platform device under include/linux/platform_device.h
---
 drivers/base/test/Kconfig             |  5 ++
 drivers/base/test/Makefile            |  2 +
 drivers/base/test/test_kunit_device.c | 83 +++++++++++++++++++++++++++
 include/kunit/platform_device.h       | 13 +++++
 4 files changed, 103 insertions(+)
 create mode 100644 drivers/base/test/test_kunit_device.c
 create mode 100644 include/kunit/platform_device.h
  

Comments

Greg KH March 22, 2023, 12:04 p.m. UTC | #1
On Wed, Mar 22, 2023 at 11:05:55AM +0200, Matti Vaittinen wrote:
> --- /dev/null
> +++ b/include/kunit/platform_device.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __KUNIT_PLATFORM_DEVICE__
> +#define __KUNIT_PLATFORM_DEVICE__
> +
> +#include <kunit/test.h>
> +
> +struct device;
> +
> +struct device *test_kunit_helper_alloc_device(struct kunit *test);
> +void test_kunit_helper_free_device(struct kunit *test, struct device *dev);

Why are you calling this a "platform_device" when it isn't a platform
device at all?

Why not just say "device.h" here?

thanks,

greg k-h
  
Greg KH March 22, 2023, 12:07 p.m. UTC | #2
On Wed, Mar 22, 2023 at 11:05:55AM +0200, Matti Vaittinen wrote:
> --- /dev/null
> +++ b/drivers/base/test/test_kunit_device.c
> @@ -0,0 +1,83 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * These helpers have been extracted from drm test code at
> + * drm_kunit_helpers.c which was authored by
> + * Maxime Ripard <maxime@cerno.tech>
> + */
> +
> +#include <linux/device.h>
> +#include <linux/platform_device.h>
> +
> +#include <kunit/platform_device.h>
> +
> +#define KUNIT_DEVICE_NAME	"test-kunit-mock-device"
> +
> +static int fake_probe(struct platform_device *pdev)

Please do not abuse platform devices and drivers for things that are not
actually platform devices and drivers.

> +{
> +	return 0;
> +}
> +
> +static int fake_remove(struct platform_device *pdev)
> +{
> +	return 0;
> +}
> +
> +static struct platform_driver fake_platform_driver = {
> +	.probe	= fake_probe,
> +	.remove	= fake_remove,
> +	.driver = {
> +		.name	= KUNIT_DEVICE_NAME,
> +	},
> +};

Why do you need this fake platform driver at all?

Why not just use a virtual device?

> +
> +/**
> + * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
> + * @test: The test context object
> + *
> + * This allocates a fake struct &device to create a mock for a KUnit
> + * test. The device will also be bound to a fake driver. It will thus be
> + * able to leverage the usual infrastructure and most notably the
> + * device-managed resources just like a "real" device.

What specific "usual infrastructure" are you wanting to access here?

And again, if you want a fake device, make a virtual one, by just
calling device_create().

Or are you wanting to do "more" with that device pointer than
device_create() can give you?

Again, please do not abuse the platform device infrastructure for things
it was never ment to do (i.e. create fake devices that are not really a
platform device.)

thanks,

greg k-h
  
Greg KH March 22, 2023, 12:08 p.m. UTC | #3
On Wed, Mar 22, 2023 at 11:05:55AM +0200, Matti Vaittinen wrote:
> The creation of a dummy device in order to test managed interfaces is a
> generally useful test feature. The drm test helpers
> drm_kunit_helper_alloc_device() and drm_kunit_helper_free_device()
> are doing exactly this. It makes no sense that each and every component
> which intends to be testing managed interfaces will create similar
> helpers so stole these for more generic use.
> 
> Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
> 
> ---
> Changes:
> v4 => v5:
> - Add accidentally dropped header and email recipients
> - do not rename + move helpers from DRM but add temporary dublicates to
>   simplify merging. (This patch does not touch DRM and can be merged
>   separately. DRM patch and IIO test patch still depend on this one).
> 
> Please note that there's something similar ongoing in the CCF:
> https://lore.kernel.org/all/20230302013822.1808711-1-sboyd@kernel.org/
> 
> I do like the simplicity of these DRM-originated helpers so I think
> they're worth. I do equally like the Stephen's idea of having the
> "dummy platform device" related helpers under drivers/base and the
> header being in include/kunit/platform_device.h which is similar to real
> platform device under include/linux/platform_device.h
> ---
>  drivers/base/test/Kconfig             |  5 ++
>  drivers/base/test/Makefile            |  2 +
>  drivers/base/test/test_kunit_device.c | 83 +++++++++++++++++++++++++++

By putting files in this directory, you are asking me to maintain this
code, and right now, I can't agree with that, sorry.  See my review
comments on it for why.

thanks,

greg k-h
  
Matti Vaittinen March 22, 2023, 1:48 p.m. UTC | #4
Hi Greg,

Thanks for looking at this.

On 3/22/23 14:07, Greg Kroah-Hartman wrote:
> On Wed, Mar 22, 2023 at 11:05:55AM +0200, Matti Vaittinen wrote:
>> --- /dev/null
>> +++ b/drivers/base/test/test_kunit_device.c
>> @@ -0,0 +1,83 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * These helpers have been extracted from drm test code at
>> + * drm_kunit_helpers.c which was authored by
>> + * Maxime Ripard <maxime@cerno.tech>
>> + */
>> +
>> +#include <linux/device.h>
>> +#include <linux/platform_device.h>
>> +
>> +#include <kunit/platform_device.h>
>> +
>> +#define KUNIT_DEVICE_NAME	"test-kunit-mock-device"
>> +
>> +static int fake_probe(struct platform_device *pdev)
> 
> Please do not abuse platform devices and drivers for things that are not
> actually platform devices and drivers.
> 
>> +{
>> +	return 0;
>> +}
>> +
>> +static int fake_remove(struct platform_device *pdev)
>> +{
>> +	return 0;
>> +}
>> +
>> +static struct platform_driver fake_platform_driver = {
>> +	.probe	= fake_probe,
>> +	.remove	= fake_remove,
>> +	.driver = {
>> +		.name	= KUNIT_DEVICE_NAME,
>> +	},
>> +};
> 
> Why do you need this fake platform driver at all?
> 
> Why not just use a virtual device?

I can only answer on my behalf. In my case the answer to why I used 
platform_devices is practicality. I wanted to test devm_ APIs using 
KUnit tests and I was pointed to an existing implementation in DRM (seen 
in these patches). It didn't seem to make any sense to re-invent the 
wheel by writing another implementation for the existing in-tree 
functionality.

Maybe Maxime had a better reason to go with the platform devices.

>> +/**
>> + * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
>> + * @test: The test context object
>> + *
>> + * This allocates a fake struct &device to create a mock for a KUnit
>> + * test. The device will also be bound to a fake driver. It will thus be
>> + * able to leverage the usual infrastructure and most notably the
>> + * device-managed resources just like a "real" device.
> 
> What specific "usual infrastructure" are you wanting to access here?
> 
> And again, if you want a fake device, make a virtual one, by just
> calling device_create().
> 
> Or are you wanting to do "more" with that device pointer than
> device_create() can give you?

Personally, I was (am) only interested in devm_ unwinding. I guess the
device_create(), device_add(), device_remove()... (didn't study this 
sequence in details so sorry if there is errors) could've been 
sufficient for me. I haven't looked how much of the code that there is 
for 'platform devices' should be duplicated to support that sequence for 
testability purposes.

The biggest thing for me is that I don't like the idea of creating own 
'test device' in <add subsystem here> while we already have some in DRM 
(or others). Thus, I do see value in adding generic helpers for 
supporting running KUnit tests on devm_* APIs. Hence it'd be good to 
have _some_ support for it. And having them in drivers/base/test seemed 
like a correct place to me. What I really don't know is if there are 
legitimate use-cases for using platform_devices in DRM tests. Perhaps 
Maxime can shed light on that.

Yours,
	-- Matti
  
Greg KH March 22, 2023, 6:57 p.m. UTC | #5
On Wed, Mar 22, 2023 at 03:48:00PM +0200, Matti Vaittinen wrote:
> Hi Greg,
> 
> Thanks for looking at this.
> 
> On 3/22/23 14:07, Greg Kroah-Hartman wrote:
> > On Wed, Mar 22, 2023 at 11:05:55AM +0200, Matti Vaittinen wrote:
> > > --- /dev/null
> > > +++ b/drivers/base/test/test_kunit_device.c
> > > @@ -0,0 +1,83 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * These helpers have been extracted from drm test code at
> > > + * drm_kunit_helpers.c which was authored by
> > > + * Maxime Ripard <maxime@cerno.tech>
> > > + */
> > > +
> > > +#include <linux/device.h>
> > > +#include <linux/platform_device.h>
> > > +
> > > +#include <kunit/platform_device.h>
> > > +
> > > +#define KUNIT_DEVICE_NAME	"test-kunit-mock-device"
> > > +
> > > +static int fake_probe(struct platform_device *pdev)
> > 
> > Please do not abuse platform devices and drivers for things that are not
> > actually platform devices and drivers.
> > 
> > > +{
> > > +	return 0;
> > > +}
> > > +
> > > +static int fake_remove(struct platform_device *pdev)
> > > +{
> > > +	return 0;
> > > +}
> > > +
> > > +static struct platform_driver fake_platform_driver = {
> > > +	.probe	= fake_probe,
> > > +	.remove	= fake_remove,
> > > +	.driver = {
> > > +		.name	= KUNIT_DEVICE_NAME,
> > > +	},
> > > +};
> > 
> > Why do you need this fake platform driver at all?
> > 
> > Why not just use a virtual device?
> 
> I can only answer on my behalf. In my case the answer to why I used
> platform_devices is practicality. I wanted to test devm_ APIs using KUnit
> tests and I was pointed to an existing implementation in DRM (seen in these
> patches). It didn't seem to make any sense to re-invent the wheel by writing
> another implementation for the existing in-tree functionality.

That's fine, but please, let's do this right if it's going to be in the
driver core, that way we can actually test the driver core code as well.

> > > +/**
> > > + * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
> > > + * @test: The test context object
> > > + *
> > > + * This allocates a fake struct &device to create a mock for a KUnit
> > > + * test. The device will also be bound to a fake driver. It will thus be
> > > + * able to leverage the usual infrastructure and most notably the
> > > + * device-managed resources just like a "real" device.
> > 
> > What specific "usual infrastructure" are you wanting to access here?
> > 
> > And again, if you want a fake device, make a virtual one, by just
> > calling device_create().
> > 
> > Or are you wanting to do "more" with that device pointer than
> > device_create() can give you?
> 
> Personally, I was (am) only interested in devm_ unwinding. I guess the
> device_create(), device_add(), device_remove()... (didn't study this
> sequence in details so sorry if there is errors) could've been sufficient
> for me. I haven't looked how much of the code that there is for 'platform
> devices' should be duplicated to support that sequence for testability
> purposes.

Any device can access devm_ code, there's no need for it to be a
platform device at all.

> The biggest thing for me is that I don't like the idea of creating own 'test
> device' in <add subsystem here> while we already have some in DRM (or
> others). Thus, I do see value in adding generic helpers for supporting
> running KUnit tests on devm_* APIs. Hence it'd be good to have _some_
> support for it.

I agree, let's use a virtual device and a virtual bus (you can use the
auxbus code for this as that's all there for this type of thing) and
then you can test the devm_* code, _AND_ you can test the driver core at
the same time.

> And having them in drivers/base/test seemed like a correct
> place to me. What I really don't know is if there are legitimate use-cases
> for using platform_devices in DRM tests. Perhaps Maxime can shed light on
> that.

I agree that this could be in drivers/base/test/ but then let's test the
driver core, not just provide a dummy platform device.

If you want to test the platform driver/device api, that would be great
too, that can be plaform device/driver specific, but don't use one for
some other random driver core functionality please.

thanks,

greg k-h
  
Matti Vaittinen March 23, 2023, 7:17 a.m. UTC | #6
On 3/22/23 20:57, Greg Kroah-Hartman wrote:
> On Wed, Mar 22, 2023 at 03:48:00PM +0200, Matti Vaittinen wrote:
>> Hi Greg,
>>
>> Thanks for looking at this.
>>
>> On 3/22/23 14:07, Greg Kroah-Hartman wrote:
>>> On Wed, Mar 22, 2023 at 11:05:55AM +0200, Matti Vaittinen wrote:
>>>> +/**
>>>> + * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
>>>> + * @test: The test context object
>>>> + *
>>>> + * This allocates a fake struct &device to create a mock for a KUnit
>>>> + * test. The device will also be bound to a fake driver. It will thus be
>>>> + * able to leverage the usual infrastructure and most notably the
>>>> + * device-managed resources just like a "real" device.
>>>
>>> What specific "usual infrastructure" are you wanting to access here?
>>>
>>> And again, if you want a fake device, make a virtual one, by just
>>> calling device_create().
>>>
>>> Or are you wanting to do "more" with that device pointer than
>>> device_create() can give you?
>>
>> Personally, I was (am) only interested in devm_ unwinding. I guess the
>> device_create(), device_add(), device_remove()... (didn't study this
>> sequence in details so sorry if there is errors) could've been sufficient
>> for me. I haven't looked how much of the code that there is for 'platform
>> devices' should be duplicated to support that sequence for testability
>> purposes.
> 
> Any device can access devm_ code, there's no need for it to be a
> platform device at all.
> 
>> The biggest thing for me is that I don't like the idea of creating own 'test
>> device' in <add subsystem here> while we already have some in DRM (or
>> others). Thus, I do see value in adding generic helpers for supporting
>> running KUnit tests on devm_* APIs. Hence it'd be good to have _some_
>> support for it.
> 
> I agree, let's use a virtual device and a virtual bus (you can use the
> auxbus code for this as that's all there for this type of thing)

Hm. The auxiliary_devices require parent. What would be the best way to 
deal with that in KUnit tests?

> then you can test the devm_* code, _AND_ you can test the driver core at
> the same time.
> 
>> And having them in drivers/base/test seemed like a correct
>> place to me. What I really don't know is if there are legitimate use-cases
>> for using platform_devices in DRM tests. Perhaps Maxime can shed light on
>> that.
> 
> I agree that this could be in drivers/base/test/ but then let's test the
> driver core, not just provide a dummy platform device.
> 
> If you want to test the platform driver/device api, that would be great
> too, that can be plaform device/driver specific, but don't use one for
> some other random driver core functionality please.

I am very conservative what comes to adding unit tests due to the huge 
inertia they add to any further development. I usually only add tests to 
APIs which I know won't require changing (I don't know such in-kernel 
APIs) - or to functions which I think are error-prone. So, I am probably 
one of the last persons adding UTs to code I don't know :)

Yours,
	-- Matti

-- 
Matti Vaittinen
Linux kernel developer at ROHM Semiconductors
Oulu Finland

~~ When things go utterly wrong vim users can always type :help! ~~
  
David Gow March 23, 2023, 7:30 a.m. UTC | #7
On Wed, 22 Mar 2023 at 17:06, Matti Vaittinen <mazziesaccount@gmail.com> wrote:
>
> The creation of a dummy device in order to test managed interfaces is a
> generally useful test feature. The drm test helpers
> drm_kunit_helper_alloc_device() and drm_kunit_helper_free_device()
> are doing exactly this. It makes no sense that each and every component
> which intends to be testing managed interfaces will create similar
> helpers so stole these for more generic use.
>
> Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
>
> ---
> Changes:
> v4 => v5:
> - Add accidentally dropped header and email recipients
> - do not rename + move helpers from DRM but add temporary dublicates to
>   simplify merging. (This patch does not touch DRM and can be merged
>   separately. DRM patch and IIO test patch still depend on this one).
>
> Please note that there's something similar ongoing in the CCF:
> https://lore.kernel.org/all/20230302013822.1808711-1-sboyd@kernel.org/
>
> I do like the simplicity of these DRM-originated helpers so I think
> they're worth. I do equally like the Stephen's idea of having the
> "dummy platform device" related helpers under drivers/base and the
> header being in include/kunit/platform_device.h which is similar to real
> platform device under include/linux/platform_device.h
> ---

Thanks for sending this my way.

It's clear we need some way of creating "fake" devices for KUnit
tests. Given that there are now (at least) three different drivers
looking to use this, we'll ultimately need something which works for
everyone.

I think the questions we therefore need to answer are:
- Do we specifically need a platform_device (or, any other specific
device struct), or would any old struct device work? I can see why we
would need a platform device for cases where we're testing things like
device-tree properties (or, in the future, having e.g. USB-specific
helpers which create usb_device). Most tests just use
root_device_register() thus far, though.
- What helpers do we need to make creating, using, and cleaning up
these devices as simple as possible.

I think that in this particular case, we don't actually need a struct
platform_device. Replacing these helpers with simple calls to
root_device_register() and root_device_unregister() seems to work fine
with this patch series for me. (It does break the
drm_test_managed_run_action test, though.) So I don't think having
these helpers actually help this series at the moment.

That being said, if they used the KUnit resource system to
automatically clean up the device when the test is finished (or
otherwise exits), that would seem like a real advantage. The clk and
drm examples both do this, and I'm hoping to add an API to make it
even simpler going forward. (With the work-in-progress API described
here[1], the whole thing should boil down to "struct device *dev =
root_device_register(name); kunit_defer(root_device_unregister,
dev);".)

So, I guess we have three cases we need to look at:
- A test just needs any old struct device. Tests thus far have
hardcoded, or had their own wrappers around root_device_register() for
this.
- A test needs a device attached to a bus (but doesn't care which
bus). Thus far, people have used struct platform_device for this (see
the DRM helpers, which use a platform device for managed resource
tests[2]). Maybe the right solution here is something like a struct
kunit_device?
- A test needs a device attached to a specific bus. We'll probably
need some more detailed faking of that bus. This covers cases like
having fake USB devices, devicetree integration, etc.

I'd suggest that, for the majority of cases which only care about the
first case, let's just use root_device_register() directly, or have a
thin wrapper like the old root_device-based version of the DRM
helpers[3]. This will probable serve us well enough while we work out
how to handle the other two cases properly (which is already being
looked at for the CLK/DeviceTree patches and the DRM stuff).

If the resulting helpers are generally useful enough, they can
probably sit in either drivers/base or lib/kunit. I'd rather not have
code that's really specific to certain busses sitting in lib/kunit
rather than alongside the device/bus code in drivers/base or some
other subsystem/driver path, but I can tolerate it for the very
generic struct device.

Regardless, I've left a few notes on the patch itself below.

Cheers,
-- David

[1]: https://kunit-review.googlesource.com/c/linux/+/5434/3/include/kunit/resource.h
[2]: https://lore.kernel.org/all/20221123-rpi-kunit-tests-v3-8-4615a663a84a@cerno.tech/
[3]: https://elixir.bootlin.com/linux/v6.2/source/drivers/gpu/drm/tests/drm_kunit_helpers.c#L39
>  drivers/base/test/Kconfig             |  5 ++
>  drivers/base/test/Makefile            |  2 +
>  drivers/base/test/test_kunit_device.c | 83 +++++++++++++++++++++++++++
>  include/kunit/platform_device.h       | 13 +++++
>  4 files changed, 103 insertions(+)
>  create mode 100644 drivers/base/test/test_kunit_device.c
>  create mode 100644 include/kunit/platform_device.h
>
> diff --git a/drivers/base/test/Kconfig b/drivers/base/test/Kconfig
> index 610a1ba7a467..642b5b183c10 100644
> --- a/drivers/base/test/Kconfig
> +++ b/drivers/base/test/Kconfig
> @@ -1,4 +1,9 @@
>  # SPDX-License-Identifier: GPL-2.0
> +
> +config TEST_KUNIT_DEVICE_HELPERS
> +       depends on KUNIT
> +       tristate
> +
>  config TEST_ASYNC_DRIVER_PROBE
>         tristate "Build kernel module to test asynchronous driver probing"
>         depends on m
> diff --git a/drivers/base/test/Makefile b/drivers/base/test/Makefile
> index 7f76fee6f989..49926524ec6f 100644
> --- a/drivers/base/test/Makefile
> +++ b/drivers/base/test/Makefile
> @@ -1,5 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-$(CONFIG_TEST_ASYNC_DRIVER_PROBE)  += test_async_driver_probe.o
>
> +obj-$(CONFIG_TEST_KUNIT_DEVICE_HELPERS) += test_kunit_device.o
> +
>  obj-$(CONFIG_DRIVER_PE_KUNIT_TEST) += property-entry-test.o
>  CFLAGS_property-entry-test.o += $(DISABLE_STRUCTLEAK_PLUGIN)
> diff --git a/drivers/base/test/test_kunit_device.c b/drivers/base/test/test_kunit_device.c
> new file mode 100644
> index 000000000000..75790e15b85c
> --- /dev/null
> +++ b/drivers/base/test/test_kunit_device.c
> @@ -0,0 +1,83 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * These helpers have been extracted from drm test code at
> + * drm_kunit_helpers.c which was authored by
> + * Maxime Ripard <maxime@cerno.tech>
> + */
> +
> +#include <linux/device.h>
> +#include <linux/platform_device.h>
> +
> +#include <kunit/platform_device.h>
> +
> +#define KUNIT_DEVICE_NAME      "test-kunit-mock-device"

Personally, I'd really rather this be a name passed in by the test.
What if a test needs to create multiple distinct devices?

> +
> +static int fake_probe(struct platform_device *pdev)
> +{
> +       return 0;
> +}
> +
> +static int fake_remove(struct platform_device *pdev)
> +{
> +       return 0;
> +}
> +
> +static struct platform_driver fake_platform_driver = {
> +       .probe  = fake_probe,
> +       .remove = fake_remove,
> +       .driver = {
> +               .name   = KUNIT_DEVICE_NAME,
> +       },
> +};
> +
> +/**
> + * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
> + * @test: The test context object
> + *
> + * This allocates a fake struct &device to create a mock for a KUnit
> + * test. The device will also be bound to a fake driver. It will thus be
> + * able to leverage the usual infrastructure and most notably the
> + * device-managed resources just like a "real" device.
> + *
> + * Callers need to make sure test_kunit_helper_free_device() on the
> + * device when done.
> + *
> + * Returns:
> + * A pointer to the new device, or an ERR_PTR() otherwise.
> + */
> +struct device *test_kunit_helper_alloc_device(struct kunit *test)
> +{
> +       struct platform_device *pdev;
> +       int ret;
> +
> +       ret = platform_driver_register(&fake_platform_driver);
> +       KUNIT_ASSERT_EQ(test, ret, 0);
> +
> +       pdev = platform_device_alloc(KUNIT_DEVICE_NAME, PLATFORM_DEVID_NONE);
> +       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, pdev);
> +
> +       ret = platform_device_add(pdev);
> +       KUNIT_ASSERT_EQ(test, ret, 0);
> +
> +       return &pdev->dev;
> +}
> +EXPORT_SYMBOL_GPL(test_kunit_helper_alloc_device);
> +
> +/**
> + * test_kunit_helper_free_device - Frees a mock device
> + * @test: The test context object
> + * @dev: The device to free
> + *
> + * Frees a device allocated with test_kunit_helper_alloc_device().
> + */

This really should be automatically called when the test exits,
probably using kunit reources. Ideally, there'd also be a function to
free it earlier, which can be done by calling kunit_remove_resource()
to lower the refcount.

> +void test_kunit_helper_free_device(struct kunit *test, struct device *dev)
> +{
> +       struct platform_device *pdev = to_platform_device(dev);
> +
> +       platform_device_unregister(pdev);
> +       platform_driver_unregister(&fake_platform_driver);
> +}
> +EXPORT_SYMBOL_GPL(test_kunit_helper_free_device);
> +
> +MODULE_AUTHOR("Matti Vaittinen <mazziesaccount@gmail.com>");
> +MODULE_LICENSE("GPL");
> diff --git a/include/kunit/platform_device.h b/include/kunit/platform_device.h
> new file mode 100644
> index 000000000000..2a9c7bdd75eb
> --- /dev/null
> +++ b/include/kunit/platform_device.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __KUNIT_PLATFORM_DEVICE__
> +#define __KUNIT_PLATFORM_DEVICE__
> +
> +#include <kunit/test.h>
> +
> +struct device;
> +
> +struct device *test_kunit_helper_alloc_device(struct kunit *test);
> +void test_kunit_helper_free_device(struct kunit *test, struct device *dev);

If these helpers are supposed to guarantee that the resulting device
is a platform device, let's reflect that in their names. Otherwise,
let's not put this in a platform_device.h header, but maybe something
more general, like kunit/device.h.

> +
> +#endif
> --
> 2.39.2
>
>
> --
> Matti Vaittinen, Linux device drivers
> ROHM Semiconductors, Finland SWDC
> Kiviharjunlenkki 1E
> 90220 OULU
> FINLAND
>
> ~~~ "I don't think so," said Rene Descartes. Just then he vanished ~~~
> Simon says - in Latin please.
> ~~~ "non cogito me" dixit Rene Descarte, deinde evanescavit ~~~
> Thanks to Simon Glass for the translation =]
  
Matti Vaittinen March 23, 2023, 8:35 a.m. UTC | #8
Hi David, all,

On 3/23/23 09:30, David Gow wrote:
> On Wed, 22 Mar 2023 at 17:06, Matti Vaittinen <mazziesaccount@gmail.com> wrote:
>>
>> The creation of a dummy device in order to test managed interfaces is a
>> generally useful test feature. The drm test helpers
>> drm_kunit_helper_alloc_device() and drm_kunit_helper_free_device()
>> are doing exactly this. It makes no sense that each and every component
>> which intends to be testing managed interfaces will create similar
>> helpers so stole these for more generic use.
>>
>> Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
>>
>> ---
>> Changes:
>> v4 => v5:
>> - Add accidentally dropped header and email recipients
>> - do not rename + move helpers from DRM but add temporary dublicates to
>>    simplify merging. (This patch does not touch DRM and can be merged
>>    separately. DRM patch and IIO test patch still depend on this one).
>>
>> Please note that there's something similar ongoing in the CCF:
>> https://lore.kernel.org/all/20230302013822.1808711-1-sboyd@kernel.org/
>>
>> I do like the simplicity of these DRM-originated helpers so I think
>> they're worth. I do equally like the Stephen's idea of having the
>> "dummy platform device" related helpers under drivers/base and the
>> header being in include/kunit/platform_device.h which is similar to real
>> platform device under include/linux/platform_device.h
>> ---
> 
> Thanks for sending this my way.
> 
> It's clear we need some way of creating "fake" devices for KUnit
> tests. Given that there are now (at least) three different drivers
> looking to use this, we'll ultimately need something which works for
> everyone.
> 
> I think the questions we therefore need to answer are:
> - Do we specifically need a platform_device (or, any other specific
> device struct), or would any old struct device work? I can see why we
> would need a platform device for cases where we're testing things like
> device-tree properties (or, in the future, having e.g. USB-specific
> helpers which create usb_device). Most tests just use
> root_device_register() thus far, though.

Funny timing. I just found the root_device_register() while wondering 
the parent for auxiliary_devices.

I think the root_device_[un]register() meets my current needs.

> - What helpers do we need to make creating, using, and cleaning up
> these devices as simple as possible.
> 
> I think that in this particular case, we don't actually need a struct
> platform_device. Replacing these helpers with simple calls to
> root_device_register() and root_device_unregister() seems to work fine
> with this patch series for me. (It does break the
> drm_test_managed_run_action test, though.) So I don't think having
> these helpers actually help this series at the moment.

I am afraid that further work with these helpers is out of the scope for 
me (at least for now). I'll drop the DRM and the helper patches from 
this series && go with the root_device_register(), 
root_device_unregister() in the IIO tests added in this series.

> That being said, if they used the KUnit resource system to
> automatically clean up the device when the test is finished (or
> otherwise exits),

My 10 cents to this is that yes, automatic unwinding at test exit would 
be simple - and also correct for most of the time. However, I think 
there might also be tests that would like to verify the unwinding 
process has really managed to do what it was intended to do. That, I 
think would require being able to manually drop the device in the middle 
of the test.

> So, I guess we have three cases we need to look at:
> - A test just needs any old struct device. Tests thus far have
> hardcoded, or had their own wrappers around root_device_register() for
> this.

As said above, my case fits this category.

> - A test needs a device attached to a bus (but doesn't care which
> bus). Thus far, people have used struct platform_device for this (see
> the DRM helpers, which use a platform device for managed resource
> tests[2]). Maybe the right solution here is something like a struct
> kunit_device?

This sounds like, how to put it, "architecturally correct". But...

> - A test needs a device attached to a specific bus. We'll probably
> need some more detailed faking of that bus. This covers cases like
> having fake USB devices, devicetree integration, etc.

...if we in any case need this, wouldn't the kunit_device just be 
unnecessary bloat because if the test does not care which bus it is 
sitting in, then it could probably use a bus-specific device as well?

> I'd suggest that, for the majority of cases which only care about the
> first case, let's just use root_device_register() directly,

After finding the root_device_register() - I agree.

> or have a
> thin wrapper like the old root_device-based version of the DRM
> helpers[3]. This will probable serve us well enough while we work out
> how to handle the other two cases properly (which is already being
> looked at for the CLK/DeviceTree patches and the DRM stuff).
> 
> If the resulting helpers are generally useful enough, they can
> probably sit in either drivers/base or lib/kunit. I'd rather not have
> code that's really specific to certain busses sitting in lib/kunit
> rather than alongside the device/bus code in drivers/base or some
> other subsystem/driver path, but I can tolerate it for the very
> generic struct device.
> 
> Regardless, I've left a few notes on the patch itself below.

Thanks but I guess I'll just drop this one :)

> 
> Cheers,
> -- David
> 
> [1]: https://kunit-review.googlesource.com/c/linux/+/5434/3/include/kunit/resource.h
> [2]: https://lore.kernel.org/all/20221123-rpi-kunit-tests-v3-8-4615a663a84a@cerno.tech/
> [3]: https://elixir.bootlin.com/linux/v6.2/source/drivers/gpu/drm/tests/drm_kunit_helpers.c#L39

Thanks for the thorough analysis and these links! This was enlightening :)

Yours,
	-- Matti
  
Greg KH March 23, 2023, 8:58 a.m. UTC | #9
On Thu, Mar 23, 2023 at 07:17:40AM +0000, Vaittinen, Matti wrote:
> On 3/22/23 20:57, Greg Kroah-Hartman wrote:
> > On Wed, Mar 22, 2023 at 03:48:00PM +0200, Matti Vaittinen wrote:
> >> Hi Greg,
> >>
> >> Thanks for looking at this.
> >>
> >> On 3/22/23 14:07, Greg Kroah-Hartman wrote:
> >>> On Wed, Mar 22, 2023 at 11:05:55AM +0200, Matti Vaittinen wrote:
> >>>> +/**
> >>>> + * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
> >>>> + * @test: The test context object
> >>>> + *
> >>>> + * This allocates a fake struct &device to create a mock for a KUnit
> >>>> + * test. The device will also be bound to a fake driver. It will thus be
> >>>> + * able to leverage the usual infrastructure and most notably the
> >>>> + * device-managed resources just like a "real" device.
> >>>
> >>> What specific "usual infrastructure" are you wanting to access here?
> >>>
> >>> And again, if you want a fake device, make a virtual one, by just
> >>> calling device_create().
> >>>
> >>> Or are you wanting to do "more" with that device pointer than
> >>> device_create() can give you?
> >>
> >> Personally, I was (am) only interested in devm_ unwinding. I guess the
> >> device_create(), device_add(), device_remove()... (didn't study this
> >> sequence in details so sorry if there is errors) could've been sufficient
> >> for me. I haven't looked how much of the code that there is for 'platform
> >> devices' should be duplicated to support that sequence for testability
> >> purposes.
> > 
> > Any device can access devm_ code, there's no need for it to be a
> > platform device at all.
> > 
> >> The biggest thing for me is that I don't like the idea of creating own 'test
> >> device' in <add subsystem here> while we already have some in DRM (or
> >> others). Thus, I do see value in adding generic helpers for supporting
> >> running KUnit tests on devm_* APIs. Hence it'd be good to have _some_
> >> support for it.
> > 
> > I agree, let's use a virtual device and a virtual bus (you can use the
> > auxbus code for this as that's all there for this type of thing)
> 
> Hm. The auxiliary_devices require parent. What would be the best way to 
> deal with that in KUnit tests?

If you use NULL as the parent, it goes into the root.

> > then you can test the devm_* code, _AND_ you can test the driver core at
> > the same time.
> > 
> >> And having them in drivers/base/test seemed like a correct
> >> place to me. What I really don't know is if there are legitimate use-cases
> >> for using platform_devices in DRM tests. Perhaps Maxime can shed light on
> >> that.
> > 
> > I agree that this could be in drivers/base/test/ but then let's test the
> > driver core, not just provide a dummy platform device.
> > 
> > If you want to test the platform driver/device api, that would be great
> > too, that can be plaform device/driver specific, but don't use one for
> > some other random driver core functionality please.
> 
> I am very conservative what comes to adding unit tests due to the huge 
> inertia they add to any further development. I usually only add tests to 
> APIs which I know won't require changing (I don't know such in-kernel 
> APIs)

So anything that is changing doesn't get a test?  If you only test
things that don't change then no tests fail, and so, why have the test
at all?

On the contrary, tests should be used to verify things that are changing
all the time, to ensure that we don't break things.  That's why we need
them, not to just validate that old code still is going ok.

The driver core is changing, and so, I would love to see tests for it to
ensure that I don't break anything over time.  That should NOT slow down
development but rather, speed it up as it ensures that things still work
properly.

> - or to functions which I think are error-prone. So, I am probably 
> one of the last persons adding UTs to code I don't know :)

That's fine, you don't have to add test code for stuff you don't know.

But again, do NOT abuse a platform device for this, that's not ok, and
the in-kernel code that does do this should be fixed up.

thanks,

greg k-h
  
Greg KH March 23, 2023, 9:02 a.m. UTC | #10
On Thu, Mar 23, 2023 at 03:30:34PM +0800, David Gow wrote:
> On Wed, 22 Mar 2023 at 17:06, Matti Vaittinen <mazziesaccount@gmail.com> wrote:
> >
> > The creation of a dummy device in order to test managed interfaces is a
> > generally useful test feature. The drm test helpers
> > drm_kunit_helper_alloc_device() and drm_kunit_helper_free_device()
> > are doing exactly this. It makes no sense that each and every component
> > which intends to be testing managed interfaces will create similar
> > helpers so stole these for more generic use.
> >
> > Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com>
> >
> > ---
> > Changes:
> > v4 => v5:
> > - Add accidentally dropped header and email recipients
> > - do not rename + move helpers from DRM but add temporary dublicates to
> >   simplify merging. (This patch does not touch DRM and can be merged
> >   separately. DRM patch and IIO test patch still depend on this one).
> >
> > Please note that there's something similar ongoing in the CCF:
> > https://lore.kernel.org/all/20230302013822.1808711-1-sboyd@kernel.org/
> >
> > I do like the simplicity of these DRM-originated helpers so I think
> > they're worth. I do equally like the Stephen's idea of having the
> > "dummy platform device" related helpers under drivers/base and the
> > header being in include/kunit/platform_device.h which is similar to real
> > platform device under include/linux/platform_device.h
> > ---
> 
> Thanks for sending this my way.
> 
> It's clear we need some way of creating "fake" devices for KUnit
> tests. Given that there are now (at least) three different drivers
> looking to use this, we'll ultimately need something which works for
> everyone.
> 
> I think the questions we therefore need to answer are:
> - Do we specifically need a platform_device (or, any other specific
> device struct), or would any old struct device work? I can see why we
> would need a platform device for cases where we're testing things like
> device-tree properties (or, in the future, having e.g. USB-specific
> helpers which create usb_device). Most tests just use
> root_device_register() thus far, though.

Any struct device would work, so please do NOT use a platform device.

Use aux devices, or better yet, just a normal virtual device by calling
device_create().

> - What helpers do we need to make creating, using, and cleaning up
> these devices as simple as possible.

Becides what the driver core already provides you?  What exactly are you
wanting to do here?

> I think that in this particular case, we don't actually need a struct
> platform_device. Replacing these helpers with simple calls to
> root_device_register() and root_device_unregister() seems to work fine
> with this patch series for me. (It does break the
> drm_test_managed_run_action test, though.) So I don't think having
> these helpers actually help this series at the moment.

Why abuse root_device_*() for something that really is not a root device
in the system?  Why not use a virtual device?  Or better yet, a kunit
bus with devices on it that are just for testing?  That way you can
properly test the bus and driver and device code fully, which is what
you are implying is needed here.

> That being said, if they used the KUnit resource system to
> automatically clean up the device when the test is finished (or
> otherwise exits), that would seem like a real advantage. The clk and
> drm examples both do this, and I'm hoping to add an API to make it
> even simpler going forward. (With the work-in-progress API described
> here[1], the whole thing should boil down to "struct device *dev =
> root_device_register(name); kunit_defer(root_device_unregister,
> dev);".)
> 
> So, I guess we have three cases we need to look at:
> - A test just needs any old struct device. Tests thus far have
> hardcoded, or had their own wrappers around root_device_register() for
> this.

Again, make a kunit bus and devices, that might be "simplest" overall.

> - A test needs a device attached to a bus (but doesn't care which
> bus). Thus far, people have used struct platform_device for this (see
> the DRM helpers, which use a platform device for managed resource
> tests[2]). Maybe the right solution here is something like a struct
> kunit_device?

Yes.

> - A test needs a device attached to a specific bus. We'll probably
> need some more detailed faking of that bus. This covers cases like
> having fake USB devices, devicetree integration, etc.

Have your own bus.  No need to mess with USB or any real bus UNLESS you
want to actually test those busses, and if so, just create fake USB or
clk or whatever devices so that you can test them.

Or just rely on the testing that some busses have for their devices (USB
has this today and syzbot knows how to test it), as busses for hardware
can be complex and usually require a "real" driver to be written for
them to test things.

thanks,

gre gk-h
  
Matti Vaittinen March 23, 2023, 9:20 a.m. UTC | #11
On 3/23/23 10:58, Greg Kroah-Hartman wrote:
> On Thu, Mar 23, 2023 at 07:17:40AM +0000, Vaittinen, Matti wrote:
>> On 3/22/23 20:57, Greg Kroah-Hartman wrote:
>>> On Wed, Mar 22, 2023 at 03:48:00PM +0200, Matti Vaittinen wrote:
>>>> Hi Greg,
>>>>
>>>> Thanks for looking at this.
>>>>
>>>> On 3/22/23 14:07, Greg Kroah-Hartman wrote:
>>>>> On Wed, Mar 22, 2023 at 11:05:55AM +0200, Matti Vaittinen wrote:

>> I am very conservative what comes to adding unit tests due to the huge
>> inertia they add to any further development. I usually only add tests to
>> APIs which I know won't require changing (I don't know such in-kernel
>> APIs)
> 
> So anything that is changing doesn't get a test? 

No. I think you misread me. I didn't say I don't like adding tests to 
code which changes. I said, I don't like adding tests to APIs which change.

  If you only test
> things that don't change then no tests fail, and so, why have the test
> at all?

Because implementation cascading into functions below an API may change 
even if the API stays unchanged.

> On the contrary, tests should be used to verify things that are changing
> all the time, to ensure that we don't break things.

This is only true when your test code stays valid. Problem with 
excessive amount of tests is that more we have callers for an API, 
harder changing that API becomes. I've seen a point where people stop 
fixing "unimportant" things just because the amount of work fixing all 
impacted UT-cases would take. I know that many things went wrong before 
that project ended up to the point - but what I picked up with me is 
that carelessly added UTs do really hinder further development.

  That's why we need
> them, not to just validate that old code still is going ok.
> 
> The driver core is changing, and so, I would love to see tests for it to
> ensure that I don't break anything over time.  That should NOT slow down
> development but rather, speed it up as it ensures that things still work
> properly.

I agree that there are cases where UTs are very handy and can add 
confidence that things work as intended. Still, my strong opinion is 
that people should consider what parts of code are really worth testing 
- and how to do the tests so that the amount of maintenance required by 
the tests stays low. It's definitely _not fun_ to do refactoring for 
minor improvement when 400+ unit-test cases break. It's a point when 
many developers start seeing fixing this minor culprit much less 
important... And when people stop fixing minor things ... major things 
start to be just around the corner.

> 
>> - or to functions which I think are error-prone. So, I am probably
>> one of the last persons adding UTs to code I don't know :)
> 
> That's fine, you don't have to add test code for stuff you don't know.
> 
> But again, do NOT abuse a platform device for this, that's not ok, and
> the in-kernel code that does do this should be fixed up.

As suggested by David in another mail - I'll go with the 
root_device_[un]register(). I'll drop this patch entirely. Thanks for 
help, this was once again very educating :)

Yours,
	-- Matti
  
Matti Vaittinen March 23, 2023, 10:01 a.m. UTC | #12
On 3/23/23 10:58, Greg Kroah-Hartman wrote:
> On Thu, Mar 23, 2023 at 07:17:40AM +0000, Vaittinen, Matti wrote:
>> On 3/22/23 20:57, Greg Kroah-Hartman wrote:
>>> On Wed, Mar 22, 2023 at 03:48:00PM +0200, Matti Vaittinen wrote:
>>>> Hi Greg,
>>>>
>>>> Thanks for looking at this.
>>>>
>>>> On 3/22/23 14:07, Greg Kroah-Hartman wrote:
>>>>> On Wed, Mar 22, 2023 at 11:05:55AM +0200, Matti Vaittinen wrote:

>>>> The biggest thing for me is that I don't like the idea of creating own 'test
>>>> device' in <add subsystem here> while we already have some in DRM (or
>>>> others). Thus, I do see value in adding generic helpers for supporting
>>>> running KUnit tests on devm_* APIs. Hence it'd be good to have _some_
>>>> support for it.
>>>
>>> I agree, let's use a virtual device and a virtual bus (you can use the
>>> auxbus code for this as that's all there for this type of thing)
>>
>> Hm. The auxiliary_devices require parent. What would be the best way to
>> deal with that in KUnit tests?
> 
> If you use NULL as the parent, it goes into the root.

As far as I read this is not the case with auxiliary devices. Judging 
the docs they were intended to be representing some part of a (parent) 
device. I see the auxiliary_device_init() has explicit check for parent 
being populated:

int auxiliary_device_init(struct auxiliary_device *auxdev)
{
         struct device *dev = &auxdev->dev;

         if (!dev->parent) {
                 pr_err("auxiliary_device has a NULL dev->parent\n");
                 return -EINVAL;
         }

As I wrote in another mail, I thought of using a root_device for this 
IIO test as was suggested by David. To tell the truth, implementing a 
kunit bus device is starting to feel a bit overwhelming... I started 
just adding a driver for a light sensor, ended up adding a helper for 
IIO gain-time-scale conversions and I am slightly reluctant to going the 
extra-extra mile of adding some UT infrastructure in the context of this 
driver work...

Well, let's see. Maybe I change my mind after a good night's sleep :)

Yours,
	-- Matti
  
Maxime Ripard March 23, 2023, 10:07 a.m. UTC | #13
Hi David,

On Thu, Mar 23, 2023 at 03:30:34PM +0800, David Gow wrote:
> I think the questions we therefore need to answer are:
> - Do we specifically need a platform_device (or, any other specific
> device struct), or would any old struct device work? I can see why we
> would need a platform device for cases where we're testing things like
> device-tree properties (or, in the future, having e.g. USB-specific
> helpers which create usb_device). Most tests just use
> root_device_register() thus far, though.
> - What helpers do we need to make creating, using, and cleaning up
> these devices as simple as possible.
> 
> I think that in this particular case, we don't actually need a struct
> platform_device. Replacing these helpers with simple calls to
> root_device_register() and root_device_unregister() seems to work fine
> with this patch series for me. (It does break the
> drm_test_managed_run_action test, though.) So I don't think having
> these helpers actually help this series at the moment.

I'm not sure that a root_device is a good option, see below.

> That being said, if they used the KUnit resource system to
> automatically clean up the device when the test is finished (or
> otherwise exits), that would seem like a real advantage. The clk and
> drm examples both do this, and I'm hoping to add an API to make it
> even simpler going forward. (With the work-in-progress API described
> here[1], the whole thing should boil down to "struct device *dev =
> root_device_register(name); kunit_defer(root_device_unregister,
> dev);".)

Oh, yes, please make it happen :)

The current API is a bit of a pain compared to other managed APIs like
devm or drmm.

> So, I guess we have three cases we need to look at:
> - A test just needs any old struct device. Tests thus far have
> hardcoded, or had their own wrappers around root_device_register() for
> this.
> - A test needs a device attached to a bus (but doesn't care which
> bus). Thus far, people have used struct platform_device for this (see
> the DRM helpers, which use a platform device for managed resource
> tests[2]). Maybe the right solution here is something like a struct
> kunit_device?
> - A test needs a device attached to a specific bus. We'll probably
> need some more detailed faking of that bus. This covers cases like
> having fake USB devices, devicetree integration, etc.

Yeah, from the current discussion on the IIO and clk patchsets, and what
we've done in DRM, I guess there's two major use cases:

  * You test an (isolated) function that takes a device as an argument.
    Here you probably don't really care about what the device is as long
    as you can provide one. This is what is being done for the DRM
    helpers at the moment, even though it's not really isolated. The
    device is essentially mocked. This could be your points 1 and 2.

  * You want to probe the driver with a fake context. The device and
    drivers are very much real, but the device tree (or whatever) is
    mocked. This is what the clocks patches from Stephen are doing. This
    could be covered by your point 3.

> I'd suggest that, for the majority of cases which only care about the
> first case, let's just use root_device_register() directly, or have a
> thin wrapper like the old root_device-based version of the DRM
> helpers[3]. This will probable serve us well enough while we work out
> how to handle the other two cases properly (which is already being
> looked at for the CLK/DeviceTree patches and the DRM stuff).

I disagree, and I think your cases 1 and 2 should be merged. We were
initially using a root_device in DRM. We had to switch to
platform_device (but actually any device attached to a bus) because a
root device isn't attached to a bus and thus the devm resources are
never freed.

When a function takes a device as an argument, there's a good chance
that it will use devm nowadays, so we ended up exhausting resources
pools (like IDs iirc) when running a lot of tests in sequence.

So yeah, I think you can't just assume that a root device will do even
for a true unit test that would test an isolated function. It either
needs to be tied to a bus, or we need to force the devm resource release
when the root device is removed somehow.

Maxime
  
Maxime Ripard March 23, 2023, 10:12 a.m. UTC | #14
On Wed, Mar 22, 2023 at 07:57:10PM +0100, Greg Kroah-Hartman wrote:
> > > > +/**
> > > > + * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
> > > > + * @test: The test context object
> > > > + *
> > > > + * This allocates a fake struct &device to create a mock for a KUnit
> > > > + * test. The device will also be bound to a fake driver. It will thus be
> > > > + * able to leverage the usual infrastructure and most notably the
> > > > + * device-managed resources just like a "real" device.
> > > 
> > > What specific "usual infrastructure" are you wanting to access here?
> > > 
> > > And again, if you want a fake device, make a virtual one, by just
> > > calling device_create().
> > > 
> > > Or are you wanting to do "more" with that device pointer than
> > > device_create() can give you?
> > 
> > Personally, I was (am) only interested in devm_ unwinding. I guess the
> > device_create(), device_add(), device_remove()... (didn't study this
> > sequence in details so sorry if there is errors) could've been sufficient
> > for me. I haven't looked how much of the code that there is for 'platform
> > devices' should be duplicated to support that sequence for testability
> > purposes.
> 
> Any device can access devm_ code, there's no need for it to be a
> platform device at all.

Sure but the resources are only released if the device is part of a bus,
so it can't be a root_device (or bare device) either

Maxime
  
Greg KH March 23, 2023, 10:21 a.m. UTC | #15
On Thu, Mar 23, 2023 at 11:12:16AM +0100, Maxime Ripard wrote:
> On Wed, Mar 22, 2023 at 07:57:10PM +0100, Greg Kroah-Hartman wrote:
> > > > > +/**
> > > > > + * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
> > > > > + * @test: The test context object
> > > > > + *
> > > > > + * This allocates a fake struct &device to create a mock for a KUnit
> > > > > + * test. The device will also be bound to a fake driver. It will thus be
> > > > > + * able to leverage the usual infrastructure and most notably the
> > > > > + * device-managed resources just like a "real" device.
> > > > 
> > > > What specific "usual infrastructure" are you wanting to access here?
> > > > 
> > > > And again, if you want a fake device, make a virtual one, by just
> > > > calling device_create().
> > > > 
> > > > Or are you wanting to do "more" with that device pointer than
> > > > device_create() can give you?
> > > 
> > > Personally, I was (am) only interested in devm_ unwinding. I guess the
> > > device_create(), device_add(), device_remove()... (didn't study this
> > > sequence in details so sorry if there is errors) could've been sufficient
> > > for me. I haven't looked how much of the code that there is for 'platform
> > > devices' should be duplicated to support that sequence for testability
> > > purposes.
> > 
> > Any device can access devm_ code, there's no need for it to be a
> > platform device at all.
> 
> Sure but the resources are only released if the device is part of a bus,
> so it can't be a root_device (or bare device) either

The resources are not cleaned up when the device is freed no matter if
it's on a bus or not?  If so, then that's a bug that needs to be fixed,
and tested :)

thanks,

greg k-h
  
Greg KH March 23, 2023, 10:25 a.m. UTC | #16
On Thu, Mar 23, 2023 at 11:20:33AM +0200, Matti Vaittinen wrote:
> On 3/23/23 10:58, Greg Kroah-Hartman wrote:
> > On Thu, Mar 23, 2023 at 07:17:40AM +0000, Vaittinen, Matti wrote:
> > > On 3/22/23 20:57, Greg Kroah-Hartman wrote:
> > > > On Wed, Mar 22, 2023 at 03:48:00PM +0200, Matti Vaittinen wrote:
> > > > > Hi Greg,
> > > > > 
> > > > > Thanks for looking at this.
> > > > > 
> > > > > On 3/22/23 14:07, Greg Kroah-Hartman wrote:
> > > > > > On Wed, Mar 22, 2023 at 11:05:55AM +0200, Matti Vaittinen wrote:
> 
> > > I am very conservative what comes to adding unit tests due to the huge
> > > inertia they add to any further development. I usually only add tests to
> > > APIs which I know won't require changing (I don't know such in-kernel
> > > APIs)
> > 
> > So anything that is changing doesn't get a test?
> 
> No. I think you misread me. I didn't say I don't like adding tests to code
> which changes. I said, I don't like adding tests to APIs which change.

Then you should not be writing any in-kernel tests as all of our APIs
change all the time.

>  If you only test
> > things that don't change then no tests fail, and so, why have the test
> > at all?
> 
> Because implementation cascading into functions below an API may change even
> if the API stays unchanged.

Then it needs to be fixed.

> > On the contrary, tests should be used to verify things that are changing
> > all the time, to ensure that we don't break things.
> 
> This is only true when your test code stays valid. Problem with excessive
> amount of tests is that more we have callers for an API, harder changing
> that API becomes. I've seen a point where people stop fixing "unimportant"
> things just because the amount of work fixing all impacted UT-cases would
> take. I know that many things went wrong before that project ended up to the
> point - but what I picked up with me is that carelessly added UTs do really
> hinder further development.

Again, in-kernel apis change at any moment.  I just changed one that was
over 15 years old.  Don't get stuck into thinking that you can only
write tests for stuff that is "stable" as nothing in the kernel is
"stable" and can change at any point in time.  You fix up all the
in-kernel users of the api, and the tests, and all is good.  That's how
kernel development works.

>  That's why we need
> > them, not to just validate that old code still is going ok.
> > 
> > The driver core is changing, and so, I would love to see tests for it to
> > ensure that I don't break anything over time.  That should NOT slow down
> > development but rather, speed it up as it ensures that things still work
> > properly.
> 
> I agree that there are cases where UTs are very handy and can add confidence
> that things work as intended. Still, my strong opinion is that people should
> consider what parts of code are really worth testing - and how to do the
> tests so that the amount of maintenance required by the tests stays low.
> It's definitely _not fun_ to do refactoring for minor improvement when 400+
> unit-test cases break. It's a point when many developers start seeing fixing
> this minor culprit much less important... And when people stop fixing minor
> things ... major things start to be just around the corner.

If people stop fixing minor things then the kernel development process
is dead.  Based on all the changes that go into it right now, we are far
from having that problem.

So write valid tests, if we get to the point where we have too much of a
problem fixing up the tests than the real users of apis, then we can
revisit it.  But for now, that's not an issue.

And again, remember, and api can, and will, change at any moment in
time, you can never know what will be "stable" as we do not have such a
thing.

thanks,

greg k-h
  
Greg KH March 23, 2023, 10:27 a.m. UTC | #17
On Thu, Mar 23, 2023 at 12:01:15PM +0200, Matti Vaittinen wrote:
> On 3/23/23 10:58, Greg Kroah-Hartman wrote:
> > On Thu, Mar 23, 2023 at 07:17:40AM +0000, Vaittinen, Matti wrote:
> > > On 3/22/23 20:57, Greg Kroah-Hartman wrote:
> > > > On Wed, Mar 22, 2023 at 03:48:00PM +0200, Matti Vaittinen wrote:
> > > > > Hi Greg,
> > > > > 
> > > > > Thanks for looking at this.
> > > > > 
> > > > > On 3/22/23 14:07, Greg Kroah-Hartman wrote:
> > > > > > On Wed, Mar 22, 2023 at 11:05:55AM +0200, Matti Vaittinen wrote:
> 
> > > > > The biggest thing for me is that I don't like the idea of creating own 'test
> > > > > device' in <add subsystem here> while we already have some in DRM (or
> > > > > others). Thus, I do see value in adding generic helpers for supporting
> > > > > running KUnit tests on devm_* APIs. Hence it'd be good to have _some_
> > > > > support for it.
> > > > 
> > > > I agree, let's use a virtual device and a virtual bus (you can use the
> > > > auxbus code for this as that's all there for this type of thing)
> > > 
> > > Hm. The auxiliary_devices require parent. What would be the best way to
> > > deal with that in KUnit tests?
> > 
> > If you use NULL as the parent, it goes into the root.
> 
> As far as I read this is not the case with auxiliary devices. Judging the
> docs they were intended to be representing some part of a (parent) device. I
> see the auxiliary_device_init() has explicit check for parent being
> populated:
> 
> int auxiliary_device_init(struct auxiliary_device *auxdev)
> {
>         struct device *dev = &auxdev->dev;
> 
>         if (!dev->parent) {
>                 pr_err("auxiliary_device has a NULL dev->parent\n");
>                 return -EINVAL;
>         }

Yes as it wants to "split" a device up into smaller devices.  So make a
real device that it can hang off of.

> As I wrote in another mail, I thought of using a root_device for this IIO
> test as was suggested by David. To tell the truth, implementing a kunit bus
> device is starting to feel a bit overwhelming... I started just adding a
> driver for a light sensor, ended up adding a helper for IIO gain-time-scale
> conversions and I am slightly reluctant to going the extra-extra mile of
> adding some UT infrastructure in the context of this driver work...

I think it is worth it as the driver core has no tests.  So it obviously
must be correct, right?  :)

thanks,

greg k-h
  
Matti Vaittinen March 23, 2023, 10:43 a.m. UTC | #18
This is a low priority babbling - feel free to skip if busy.

On 3/23/23 12:25, Greg Kroah-Hartman wrote:
> On Thu, Mar 23, 2023 at 11:20:33AM +0200, Matti Vaittinen wrote:
>> On 3/23/23 10:58, Greg Kroah-Hartman wrote:
>>> On Thu, Mar 23, 2023 at 07:17:40AM +0000, Vaittinen, Matti wrote:
>>>> On 3/22/23 20:57, Greg Kroah-Hartman wrote:
>>>>> On Wed, Mar 22, 2023 at 03:48:00PM +0200, Matti Vaittinen wrote:
>>>>>> Hi Greg,
>>>>>>
>>>>>> Thanks for looking at this.
>>>>>>
>>>>>> On 3/22/23 14:07, Greg Kroah-Hartman wrote:
>>>>>>> On Wed, Mar 22, 2023 at 11:05:55AM +0200, Matti Vaittinen wrote:
>>
>>>> I am very conservative what comes to adding unit tests due to the huge
>>>> inertia they add to any further development. I usually only add tests to
>>>> APIs which I know won't require changing (I don't know such in-kernel
>>>> APIs)
>>>
>>> So anything that is changing doesn't get a test?
>>
>> No. I think you misread me. I didn't say I don't like adding tests to code
>> which changes. I said, I don't like adding tests to APIs which change.
> 
> Then you should not be writing any in-kernel tests as all of our APIs
> change all the time.
> 
>>   If you only test
>>> things that don't change then no tests fail, and so, why have the test
>>> at all?
>>
>> Because implementation cascading into functions below an API may change even
>> if the API stays unchanged.
> 
> Then it needs to be fixed.
> 
>>> On the contrary, tests should be used to verify things that are changing
>>> all the time, to ensure that we don't break things.
>>
>> This is only true when your test code stays valid. Problem with excessive
>> amount of tests is that more we have callers for an API, harder changing
>> that API becomes. I've seen a point where people stop fixing "unimportant"
>> things just because the amount of work fixing all impacted UT-cases would
>> take. I know that many things went wrong before that project ended up to the
>> point - but what I picked up with me is that carelessly added UTs do really
>> hinder further development.
> 
> Again, in-kernel apis change at any moment.

I agree. This is why I initially wrote:

 >>>> APIs which I know won't require changing (I don't know such in-kernel
 >>>> APIs)

> Don't get stuck into thinking that you can only
> write tests for stuff that is "stable" as nothing in the kernel is
> "stable" and can change at any point in time.

I don't. But I don't either think that UTs come with no cost. Thus I do 
only write tests when I see a _real need_ for one. If the APIs would be 
guaranteed not to change, then I would understand writing the tests for 
each and every "thing" without much of thinking if "the thing" is worth 
the test.

>  You fix up all the
> in-kernel users of the api, and the tests, and all is good.  That's how
> kernel development works.

Sure. This is how it works and how I think it should work. But I also 
have seen how this 'UT work overhead' has made people to decide not to 
touch things. Not in kernel but in other project. This is a real thing 
which can happen - many engineers like me are lazy bastards :)

>>   That's why we need
>>> them, not to just validate that old code still is going ok.
>>>
>>> The driver core is changing, and so, I would love to see tests for it to
>>> ensure that I don't break anything over time.  That should NOT slow down
>>> development but rather, speed it up as it ensures that things still work
>>> properly.
>>
>> I agree that there are cases where UTs are very handy and can add confidence
>> that things work as intended. Still, my strong opinion is that people should
>> consider what parts of code are really worth testing - and how to do the
>> tests so that the amount of maintenance required by the tests stays low.
>> It's definitely _not fun_ to do refactoring for minor improvement when 400+
>> unit-test cases break. It's a point when many developers start seeing fixing
>> this minor culprit much less important... And when people stop fixing minor
>> things ... major things start to be just around the corner.
> 
> If people stop fixing minor things then the kernel development process
> is dead.  Based on all the changes that go into it right now, we are far
> from having that problem.

And I am so happy for that. Kernel/drivers are still fun to work with. 
My personal preference is to keep it that way :)

> So write valid tests, if we get to the point where we have too much of a
> problem fixing up the tests than the real users of apis, then we can
> revisit it.  But for now, that's not an issue.

The beginning of your sentence hits the point. Write valid tests. I just 
encourage people to occasionally ask if the test they write is really a 
valid one. :)

> And again, remember, and api can, and will, change at any moment in
> time, you can never know what will be "stable" as we do not have such a
> thing.

We agree on this.
  
Matti Vaittinen March 23, 2023, 11 a.m. UTC | #19
On 3/23/23 12:27, Greg Kroah-Hartman wrote:
> On Thu, Mar 23, 2023 at 12:01:15PM +0200, Matti Vaittinen wrote:
>> On 3/23/23 10:58, Greg Kroah-Hartman wrote:
>>> On Thu, Mar 23, 2023 at 07:17:40AM +0000, Vaittinen, Matti wrote:
>>>> On 3/22/23 20:57, Greg Kroah-Hartman wrote:
>>>>> On Wed, Mar 22, 2023 at 03:48:00PM +0200, Matti Vaittinen wrote:
>>>>>> Hi Greg,
>>>>>>
>>>>>> Thanks for looking at this.
>>>>>>
>>>>>> On 3/22/23 14:07, Greg Kroah-Hartman wrote:
>>>>>>> On Wed, Mar 22, 2023 at 11:05:55AM +0200, Matti Vaittinen wrote:
>>
>>>>>> The biggest thing for me is that I don't like the idea of creating own 'test
>>>>>> device' in <add subsystem here> while we already have some in DRM (or
>>>>>> others). Thus, I do see value in adding generic helpers for supporting
>>>>>> running KUnit tests on devm_* APIs. Hence it'd be good to have _some_
>>>>>> support for it.
>>>>>
>>>>> I agree, let's use a virtual device and a virtual bus (you can use the
>>>>> auxbus code for this as that's all there for this type of thing)
>>>>
>>>> Hm. The auxiliary_devices require parent. What would be the best way to
>>>> deal with that in KUnit tests?
>>>
>>> If you use NULL as the parent, it goes into the root.
>>
>> As far as I read this is not the case with auxiliary devices. Judging the
>> docs they were intended to be representing some part of a (parent) device. I
>> see the auxiliary_device_init() has explicit check for parent being
>> populated:
>>
>> int auxiliary_device_init(struct auxiliary_device *auxdev)
>> {
>>          struct device *dev = &auxdev->dev;
>>
>>          if (!dev->parent) {
>>                  pr_err("auxiliary_device has a NULL dev->parent\n");
>>                  return -EINVAL;
>>          }
> 
> Yes as it wants to "split" a device up into smaller devices.  So make a
> real device that it can hang off of.

Yep. This is what led me to the root_device_register()... :rolleyes: And 
seein the root-device alone could do what I need - adding auxiliary 
device on top of it just for the sake of adding one seems a bit of an 
over-engineering to me :)

>> As I wrote in another mail, I thought of using a root_device for this IIO
>> test as was suggested by David. To tell the truth, implementing a kunit bus
>> device is starting to feel a bit overwhelming... I started just adding a
>> driver for a light sensor, ended up adding a helper for IIO gain-time-scale
>> conversions and I am slightly reluctant to going the extra-extra mile of
>> adding some UT infrastructure in the context of this driver work...
> 
> I think it is worth it as the driver core has no tests.  So it obviously
> must be correct, right?  :)

Doh. Greg, I hate you :) How could one argue with something like this? I 
think I will submit the v6 with the root_device_register() due to the 
aux-device requiring it in any case. I know that will end up to your 
table still as IIO is going through your hands anyways.

I will however take a look at what Maxime said about devm unwinding not 
being done w/o a bus because I think I saw the unwinding done in these 
IIO tests even when using the root_device_register() 
root_device_unregister(). If the unwinding really is not done, then I 
will come back to this auxiliary device rehearsal
  
Matti Vaittinen March 23, 2023, 12:16 p.m. UTC | #20
Hi Maxime, all

On 3/23/23 12:21, Greg Kroah-Hartman wrote:
> On Thu, Mar 23, 2023 at 11:12:16AM +0100, Maxime Ripard wrote:
>> On Wed, Mar 22, 2023 at 07:57:10PM +0100, Greg Kroah-Hartman wrote:
>>>>>> +/**
>>>>>> + * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
>>>>>> + * @test: The test context object
>>>>>> + *
>>>>>> + * This allocates a fake struct &device to create a mock for a KUnit
>>>>>> + * test. The device will also be bound to a fake driver. It will thus be
>>>>>> + * able to leverage the usual infrastructure and most notably the
>>>>>> + * device-managed resources just like a "real" device.
>>>>>
>>>>> What specific "usual infrastructure" are you wanting to access here?
>>>>>
>>>>> And again, if you want a fake device, make a virtual one, by just
>>>>> calling device_create().
>>>>>
>>>>> Or are you wanting to do "more" with that device pointer than
>>>>> device_create() can give you?
>>>>
>>>> Personally, I was (am) only interested in devm_ unwinding. I guess the
>>>> device_create(), device_add(), device_remove()... (didn't study this
>>>> sequence in details so sorry if there is errors) could've been sufficient
>>>> for me. I haven't looked how much of the code that there is for 'platform
>>>> devices' should be duplicated to support that sequence for testability
>>>> purposes.
>>>
>>> Any device can access devm_ code, there's no need for it to be a
>>> platform device at all.
>>
>> Sure but the resources are only released if the device is part of a bus,
>> so it can't be a root_device (or bare device) either
> 
> The resources are not cleaned up when the device is freed no matter if
> it's on a bus or not?  If so, then that's a bug that needs to be fixed,
> and tested :)

This is strange. I just ran a test on a beaglebone black using Linux 
6.3.0-rc2 + the IIO patches we se here (but the IIO test patch modified 
to use the root_device_register() and root_device_unregister().

I passed the device pointer from root_device_register() to the 
devm_iio_init_iio_gts()

// snip
         dev = root_device_register(IIO_GTS_TEST_DEV);
         KUNIT_EXPECT_NOT_ERR_OR_NULL(test, dev);
         if (IS_ERR_OR_NULL(dev))
                 return NULL;

         ret = devm_iio_init_iio_gts(dev, TEST_SCALE_1X, 0, g_table, num_g,
                                     i_table, num_i, gts);

  - and saw the tables for available scales allocated:

  if (gts.num_avail_all_scales)
                 pr_info("GTS: table allocation succeeded\n");
         else
                 pr_info("GTS: table allocation failed\n");

         pr_info("gts: num_avail_all_scales %d\n", 
gts.num_avail_all_scales);

(this printed:
[   52.132966]     # Subtest: iio-gain-time-scale
[   52.132982]     1..7
[   52.157455] GTS: table allocation succeeded
[   52.164077] gts: num_avail_all_scales 16

Next I unregister the root-device and check if the unwinding code which 
frees the tables and zeroes the scale count was ran:

         root_device_unregister(dev);
         pr_info("gts: num_avail_all_scales %d\n", 
gts.num_avail_all_scales);

         if (gts.num_avail_all_scales)
                 pr_info("devm unwinding not done\n");
         else
                 pr_info("devm unwinding succeeded\n");

Which printed:
[   52.168101] gts: num_avail_all_scales 0
[   52.171957] devm unwinding succeeded

I can send patch(es) just for testing this on other machines if someone 
want's to see if the lack of devm unwinding is somehow architecture 
specific (which sounds very strange to me) - although using this IIO 
series just for checking the unwinding is a bit of an overkill. I just 
happened to have these tests at my hands / in my tree for testing.

In any case, devm unwinding using root_device_[un]register() seems to 
"work on my machine".

Naxime, what was the environment where you observed lack of unwinding? 
(Huh, I am so afraid of sending this post out - I've experienced too 
many "Oh, boy - how I didn't notice THAT" moments in the past and maybe 
I am again overlooking something...)

Yours,
	-- Matti
  
Maxime Ripard March 23, 2023, 12:29 p.m. UTC | #21
On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
> Hi Maxime, all
> 
> On 3/23/23 12:21, Greg Kroah-Hartman wrote:
> > On Thu, Mar 23, 2023 at 11:12:16AM +0100, Maxime Ripard wrote:
> > > On Wed, Mar 22, 2023 at 07:57:10PM +0100, Greg Kroah-Hartman wrote:
> > > > > > > +/**
> > > > > > > + * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
> > > > > > > + * @test: The test context object
> > > > > > > + *
> > > > > > > + * This allocates a fake struct &device to create a mock for a KUnit
> > > > > > > + * test. The device will also be bound to a fake driver. It will thus be
> > > > > > > + * able to leverage the usual infrastructure and most notably the
> > > > > > > + * device-managed resources just like a "real" device.
> > > > > > 
> > > > > > What specific "usual infrastructure" are you wanting to access here?
> > > > > > 
> > > > > > And again, if you want a fake device, make a virtual one, by just
> > > > > > calling device_create().
> > > > > > 
> > > > > > Or are you wanting to do "more" with that device pointer than
> > > > > > device_create() can give you?
> > > > > 
> > > > > Personally, I was (am) only interested in devm_ unwinding. I guess the
> > > > > device_create(), device_add(), device_remove()... (didn't study this
> > > > > sequence in details so sorry if there is errors) could've been sufficient
> > > > > for me. I haven't looked how much of the code that there is for 'platform
> > > > > devices' should be duplicated to support that sequence for testability
> > > > > purposes.
> > > > 
> > > > Any device can access devm_ code, there's no need for it to be a
> > > > platform device at all.
> > > 
> > > Sure but the resources are only released if the device is part of a bus,
> > > so it can't be a root_device (or bare device) either
> > 
> > The resources are not cleaned up when the device is freed no matter if
> > it's on a bus or not?  If so, then that's a bug that needs to be fixed,
> > and tested :)
> 
> This is strange. I just ran a test on a beaglebone black using Linux
> 6.3.0-rc2 + the IIO patches we se here (but the IIO test patch modified to
> use the root_device_register() and root_device_unregister().
> 
> I passed the device pointer from root_device_register() to the
> devm_iio_init_iio_gts()
> 
> // snip
>         dev = root_device_register(IIO_GTS_TEST_DEV);
>         KUNIT_EXPECT_NOT_ERR_OR_NULL(test, dev);
>         if (IS_ERR_OR_NULL(dev))
>                 return NULL;
> 
>         ret = devm_iio_init_iio_gts(dev, TEST_SCALE_1X, 0, g_table, num_g,
>                                     i_table, num_i, gts);
> 
>  - and saw the tables for available scales allocated:
> 
>  if (gts.num_avail_all_scales)
>                 pr_info("GTS: table allocation succeeded\n");
>         else
>                 pr_info("GTS: table allocation failed\n");
> 
>         pr_info("gts: num_avail_all_scales %d\n", gts.num_avail_all_scales);
> 
> (this printed:
> [   52.132966]     # Subtest: iio-gain-time-scale
> [   52.132982]     1..7
> [   52.157455] GTS: table allocation succeeded
> [   52.164077] gts: num_avail_all_scales 16
> 
> Next I unregister the root-device and check if the unwinding code which
> frees the tables and zeroes the scale count was ran:
> 
>         root_device_unregister(dev);
>         pr_info("gts: num_avail_all_scales %d\n", gts.num_avail_all_scales);
> 
>         if (gts.num_avail_all_scales)
>                 pr_info("devm unwinding not done\n");
>         else
>                 pr_info("devm unwinding succeeded\n");
> 
> Which printed:
> [   52.168101] gts: num_avail_all_scales 0
> [   52.171957] devm unwinding succeeded
> 
> I can send patch(es) just for testing this on other machines if someone
> want's to see if the lack of devm unwinding is somehow architecture specific
> (which sounds very strange to me) - although using this IIO series just for
> checking the unwinding is a bit of an overkill. I just happened to have
> these tests at my hands / in my tree for testing.
> 
> In any case, devm unwinding using root_device_[un]register() seems to "work
> on my machine".
> 
> Naxime, what was the environment where you observed lack of unwinding? (Huh,
> I am so afraid of sending this post out - I've experienced too many "Oh, boy
> - how I didn't notice THAT" moments in the past and maybe I am again
> overlooking something...)

This is the description of what was happening:
https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/

Maxime
  
Matti Vaittinen March 23, 2023, 1:02 p.m. UTC | #22
On 3/23/23 14:29, Maxime Ripard wrote:
> On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
> 
> This is the description of what was happening:
> https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/
Thanks Maxime. Do I read this correcty. The devm_ unwinding not being 
done when root_device_register() is used is not because 
root_device_unregister() would not trigger the unwinding - but rather 
because DRM code on top of this device keeps the refcount increased?

If this is the case, then it sounds like a DRM specific issue to me. 
Whether it is a feature or bug is beyond my knowledge. Still, I would 
not say using the root_device_[un]register() in generic code is not 
feasible - unless all other subsytems have similar refcount handling.

Sure thing using root_device_register() root_device_unregister() in DRM 
does not work as such. This, however, does not mean the generic kunit 
helpers should use platform_devices to force unwinding?

Well, It's almost the best season for ice-fishing in Finland so opening 
a can of worms is not that bad, right? :)

Thanks for the education people! I did learn a thing or two Today.

Yours,
	-- Matti
  
Maxime Ripard March 23, 2023, 4:36 p.m. UTC | #23
On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:
> On 3/23/23 14:29, Maxime Ripard wrote:
> > On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
> > 
> > This is the description of what was happening:
> > https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/
>
> Thanks Maxime. Do I read this correcty. The devm_ unwinding not being done
> when root_device_register() is used is not because root_device_unregister()
> would not trigger the unwinding - but rather because DRM code on top of this
> device keeps the refcount increased?

There's a difference of behaviour between a root_device and any device
with a bus: the root_device will only release the devm resources when
it's freed (in device_release), but a bus device will also do it in
device_del (through bus_remove_device() -> device_release_driver() ->
device_release_driver_internal() -> __device_release_driver() ->
device_unbind_cleanup(), which are skipped (in multiple places) if
there's no bus and no driver attached to the device).

It does affect DRM, but I'm pretty sure it will affect any framework
that deals with device hotplugging by deferring the framework structure
until the last (userspace) user closes its file descriptor. So I'd
assume that v4l2 and cec at least are also affected, and most likely
others.

> If this is the case, then it sounds like a DRM specific issue to me.

I mean, I guess. One could also argue that it's because IIO doesn't
properly deal with hotplugging. I'm not sure how that helps. Those are
common helpers which should accommodate every framework, and your second
patch breaks the kunit tests for DRM anyway.

> Whether it is a feature or bug is beyond my knowledge. Still, I would
> not say using the root_device_[un]register() in generic code is not
> feasible - unless all other subsytems have similar refcount handling.
> 
> Sure thing using root_device_register() root_device_unregister() in DRM does
> not work as such. This, however, does not mean the generic kunit helpers
> should use platform_devices to force unwinding?

platform_devices were a quick way to get a device that would have a bus
and a driver bound to fall into the right patch above. We probably
shouldn't use platform_devices and a kunit_device sounds like the best
idea, but the test linked in the original mail I pointed you to should
work with whatever we come up with. It works with multiple (platform,
PCI, USB, etc) buses, so the mock we create should behave like their
real world equivalents.

Maxime
  
Matti Vaittinen March 24, 2023, 6:11 a.m. UTC | #24
On 3/23/23 18:36, Maxime Ripard wrote:
> On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:
>> On 3/23/23 14:29, Maxime Ripard wrote:
>>> On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
>>>
>>> This is the description of what was happening:
>>> https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/
>>
>> Thanks Maxime. Do I read this correcty. The devm_ unwinding not being done
>> when root_device_register() is used is not because root_device_unregister()
>> would not trigger the unwinding - but rather because DRM code on top of this
>> device keeps the refcount increased?
> 
> There's a difference of behaviour between a root_device and any device
> with a bus: the root_device will only release the devm resources when
> it's freed (in device_release), but a bus device will also do it in
> device_del (through bus_remove_device() -> device_release_driver() ->
> device_release_driver_internal() -> __device_release_driver() ->
> device_unbind_cleanup(), which are skipped (in multiple places) if
> there's no bus and no driver attached to the device).
> 
> It does affect DRM, but I'm pretty sure it will affect any framework
> that deals with device hotplugging by deferring the framework structure
> until the last (userspace) user closes its file descriptor. So I'd
> assume that v4l2 and cec at least are also affected, and most likely
> others.

Thanks for the explanation and patience :)

> 
>> If this is the case, then it sounds like a DRM specific issue to me.
> 
> I mean, I guess. One could also argue that it's because IIO doesn't
> properly deal with hotplugging.

I must say I haven't been testing the IIO registration API. I've only 
tested the helper API which is not backed up by any "IIO device". (This 
is fine for the helper because it must by design be cleaned-up only 
after the IIO-deregistration).

After your explanation here, I am not convinced IIO wouldn't see the 
same issue if I was testing the devm_iio_device_alloc() & co.

> I'm not sure how that helps. Those are
> common helpers which should accommodate every framework,

Ok. Fair enough. Besides, if the root-device was sufficient - then I 
would actually not see the need for a helper. People could in that case 
directly use the root_device_register(). So, if helpers are provided 
they should be backed up by a device with a bus then.

> and your second
> patch breaks the kunit tests for DRM anyway.

Oh, I must have made an error there. It was supposed to be just a 
refactoring with no functional changes. Sorry about that. Anyways, that 
patch can be forgotten as Greg opposes using the platform devices in 
generic helpers.

>> Whether it is a feature or bug is beyond my knowledge. Still, I would
>> not say using the root_device_[un]register() in generic code is not
>> feasible - unless all other subsytems have similar refcount handling.
>>
>> Sure thing using root_device_register() root_device_unregister() in DRM does
>> not work as such. This, however, does not mean the generic kunit helpers
>> should use platform_devices to force unwinding?
> 
> platform_devices were a quick way to get a device that would have a bus
> and a driver bound to fall into the right patch above. We probably
> shouldn't use platform_devices and a kunit_device sounds like the best
> idea, but the test linked in the original mail I pointed you to should
> work with whatever we come up with. It works with multiple (platform,
> PCI, USB, etc) buses, so the mock we create should behave like their
> real world equivalents.
Thanks for the patience and the explanation. Now I understand a generic 
test device needs to sit on a bus.

As I said, in my very specific IIO related test the test device does not 
need a bus. Hence I'll drop the 'generic helpers' from this series.

Yours,
	-- Matti
  
David Gow March 24, 2023, 6:34 a.m. UTC | #25
On Fri, 24 Mar 2023 at 14:11, Matti Vaittinen <mazziesaccount@gmail.com> wrote:
>
> On 3/23/23 18:36, Maxime Ripard wrote:
> > On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:
> >> On 3/23/23 14:29, Maxime Ripard wrote:
> >>> On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
> >>>
> >>> This is the description of what was happening:
> >>> https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/
> >>
> >> Thanks Maxime. Do I read this correcty. The devm_ unwinding not being done
> >> when root_device_register() is used is not because root_device_unregister()
> >> would not trigger the unwinding - but rather because DRM code on top of this
> >> device keeps the refcount increased?
> >
> > There's a difference of behaviour between a root_device and any device
> > with a bus: the root_device will only release the devm resources when
> > it's freed (in device_release), but a bus device will also do it in
> > device_del (through bus_remove_device() -> device_release_driver() ->
> > device_release_driver_internal() -> __device_release_driver() ->
> > device_unbind_cleanup(), which are skipped (in multiple places) if
> > there's no bus and no driver attached to the device).
> >
> > It does affect DRM, but I'm pretty sure it will affect any framework
> > that deals with device hotplugging by deferring the framework structure
> > until the last (userspace) user closes its file descriptor. So I'd
> > assume that v4l2 and cec at least are also affected, and most likely
> > others.
>
> Thanks for the explanation and patience :)
>

Thanks from me as well -- this has certainly helped me understand some
of the details of the driver model that had slipped past me.

> >
> >> If this is the case, then it sounds like a DRM specific issue to me.
> >
> > I mean, I guess. One could also argue that it's because IIO doesn't
> > properly deal with hotplugging.
>
> I must say I haven't been testing the IIO registration API. I've only
> tested the helper API which is not backed up by any "IIO device". (This
> is fine for the helper because it must by design be cleaned-up only
> after the IIO-deregistration).
>
> After your explanation here, I am not convinced IIO wouldn't see the
> same issue if I was testing the devm_iio_device_alloc() & co.
>
> > I'm not sure how that helps. Those are
> > common helpers which should accommodate every framework,
>
> Ok. Fair enough. Besides, if the root-device was sufficient - then I
> would actually not see the need for a helper. People could in that case
> directly use the root_device_register(). So, if helpers are provided
> they should be backed up by a device with a bus then.
>

I think there is _some_ value in helpers even without a bus, but it's
much more limited:
- It's less confusing if KUnit test devices are using kunit labelled
structs and functions.
- Helpers could use KUnit's resource management API to ensure any
device created is properly unregistered and removed when the test
exits (particularly if it exits early due to, e.g., an assertion).

I've played around implementing those with a proper struct
kunit_device and the automatic cleanup on test failure, and thus far
it -- like root_device_register -- works for all of the tests except
the drm-test-managed one.

So if we really wanted to, we could use KUnit-specific helpers for
just those tests which currently work with root_device_register(), but
if we're going to try to implement a KUnit bus -- which I think is at
least worth investigating -- I'd rather not either hold up otherwise
good tests on helper development, or rush a helper out only to have to
change it a lot when we see exactly what the bus implementation would
look like.

> > and your second
> > patch breaks the kunit tests for DRM anyway.
>
> Oh, I must have made an error there. It was supposed to be just a
> refactoring with no functional changes. Sorry about that. Anyways, that
> patch can be forgotten as Greg opposes using the platform devices in
> generic helpers.
>
> >> Whether it is a feature or bug is beyond my knowledge. Still, I would
> >> not say using the root_device_[un]register() in generic code is not
> >> feasible - unless all other subsytems have similar refcount handling.
> >>
> >> Sure thing using root_device_register() root_device_unregister() in DRM does
> >> not work as such. This, however, does not mean the generic kunit helpers
> >> should use platform_devices to force unwinding?
> >
> > platform_devices were a quick way to get a device that would have a bus
> > and a driver bound to fall into the right patch above. We probably
> > shouldn't use platform_devices and a kunit_device sounds like the best
> > idea, but the test linked in the original mail I pointed you to should
> > work with whatever we come up with. It works with multiple (platform,
> > PCI, USB, etc) buses, so the mock we create should behave like their
> > real world equivalents.
> Thanks for the patience and the explanation. Now I understand a generic
> test device needs to sit on a bus.
>
> As I said, in my very specific IIO related test the test device does not
> need a bus. Hence I'll drop the 'generic helpers' from this series.
>

I think that sounds like a good strategy for now, and we can work on a
set of 'generic helpers' which have an associated bus and struct
kunit_device in the meantime. If we can continue to use
root_device_register until those are ready, that'd be very convenient.
Certainly, the helpers I'm playing with at the moment have a few other
dependencies I'd want to land first, so I suspect they wouldn't be
done in time for 6.4. It'd also make sense to see if we can make sure
any such helpers could either work well with (or at least not conflict
with) tests which use devicetree.

Regardless, thanks very much for putting all of the effort in to
working this out. I think we have a much better idea of the
requirements for this sort of thing now.

Cheers,
-- David
  
Matti Vaittinen March 24, 2023, 6:51 a.m. UTC | #26
On 3/24/23 08:34, David Gow wrote:
> On Fri, 24 Mar 2023 at 14:11, Matti Vaittinen <mazziesaccount@gmail.com> wrote:
>>
>> On 3/23/23 18:36, Maxime Ripard wrote:
>>> On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:
>>>> On 3/23/23 14:29, Maxime Ripard wrote:
>>>>> On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:

>> Ok. Fair enough. Besides, if the root-device was sufficient - then I
>> would actually not see the need for a helper. People could in that case
>> directly use the root_device_register(). So, if helpers are provided
>> they should be backed up by a device with a bus then.
> 
> I think there is _some_ value in helpers even without a bus, but it's
> much more limited:
> - It's less confusing if KUnit test devices are using kunit labelled
> structs and functions.
> - Helpers could use KUnit's resource management API to ensure any
> device created is properly unregistered and removed when the test
> exits (particularly if it exits early due to, e.g., an assertion).

Ah. That's true. Being able to abort the test on error w/o being forced 
to do a clean-up dance for the dummy device would be convenient.

> I've played around implementing those with a proper struct
> kunit_device and the automatic cleanup on test failure, and thus far
> it -- like root_device_register -- works for all of the tests except
> the drm-test-managed one.
> 
> So if we really wanted to, we could use KUnit-specific helpers for
> just those tests which currently work with root_device_register(), but
> if we're going to try to implement a KUnit bus -- which I think is at
> least worth investigating -- I'd rather not either hold up otherwise
> good tests on helper development, or rush a helper out only to have to
> change it a lot when we see exactly what the bus implementation would
> look like.

It's easy for me to agree.

>> As I said, in my very specific IIO related test the test device does not
>> need a bus. Hence I'll drop the 'generic helpers' from this series.
>>
> 
> I think that sounds like a good strategy for now, and we can work on a
> set of 'generic helpers' which have an associated bus and struct
> kunit_device in the meantime. If we can continue to use
> root_device_register until those are ready, that'd be very convenient.

Would it be a tiny bit more acceptable if we did add a very simple:

#define kunit_root_device_register(name) root_device_register(name)
#define kunit_root_device_unregister(dev) root_device_unregister(dev)

to include/kunit/device.h (or somesuch)

This should help us later to at least spot the places where 
root_device_[un]register() is abused and (potentially mass-)covert them 
to use the proper helpers when they're available.

Yours,
	-- Matti
  
David Gow March 24, 2023, 9:52 a.m. UTC | #27
On Fri, 24 Mar 2023 at 14:51, Matti Vaittinen <mazziesaccount@gmail.com> wrote:
>
> On 3/24/23 08:34, David Gow wrote:
> > On Fri, 24 Mar 2023 at 14:11, Matti Vaittinen <mazziesaccount@gmail.com> wrote:
> >>
> >> On 3/23/23 18:36, Maxime Ripard wrote:
> >>> On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:
> >>>> On 3/23/23 14:29, Maxime Ripard wrote:
> >>>>> On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
>
> >> Ok. Fair enough. Besides, if the root-device was sufficient - then I
> >> would actually not see the need for a helper. People could in that case
> >> directly use the root_device_register(). So, if helpers are provided
> >> they should be backed up by a device with a bus then.
> >
> > I think there is _some_ value in helpers even without a bus, but it's
> > much more limited:
> > - It's less confusing if KUnit test devices are using kunit labelled
> > structs and functions.
> > - Helpers could use KUnit's resource management API to ensure any
> > device created is properly unregistered and removed when the test
> > exits (particularly if it exits early due to, e.g., an assertion).
>
> Ah. That's true. Being able to abort the test on error w/o being forced
> to do a clean-up dance for the dummy device would be convenient.
>
> > I've played around implementing those with a proper struct
> > kunit_device and the automatic cleanup on test failure, and thus far
> > it -- like root_device_register -- works for all of the tests except
> > the drm-test-managed one.
> >
> > So if we really wanted to, we could use KUnit-specific helpers for
> > just those tests which currently work with root_device_register(), but
> > if we're going to try to implement a KUnit bus -- which I think is at
> > least worth investigating -- I'd rather not either hold up otherwise
> > good tests on helper development, or rush a helper out only to have to
> > change it a lot when we see exactly what the bus implementation would
> > look like.
>
> It's easy for me to agree.
>
> >> As I said, in my very specific IIO related test the test device does not
> >> need a bus. Hence I'll drop the 'generic helpers' from this series.
> >>
> >
> > I think that sounds like a good strategy for now, and we can work on a
> > set of 'generic helpers' which have an associated bus and struct
> > kunit_device in the meantime. If we can continue to use
> > root_device_register until those are ready, that'd be very convenient.
>
> Would it be a tiny bit more acceptable if we did add a very simple:
>
> #define kunit_root_device_register(name) root_device_register(name)
> #define kunit_root_device_unregister(dev) root_device_unregister(dev)
>
> to include/kunit/device.h (or somesuch)
>
> This should help us later to at least spot the places where
> root_device_[un]register() is abused and (potentially mass-)covert them
> to use the proper helpers when they're available.
>

Great idea.

The code I've been playing with has the following in include/kunit/device.h:

/* Register a new device against a KUnit test. */
struct device *kunit_device_register(struct kunit *test, const char *name);
/* Unregister a device created by kunit_device_register() early (i.e.,
before test cleanup). */
void kunit_device_unregister(struct kunit *test, struct device *dev);

If we used the same names, and just forwarded them to
root_device_register() and root_device_unregister() for now
(discarding the struct kunit pointer), then I expect we could just
swap out the implementation to gain the extra functionality.

It's a little less explicit, though, so I could see the value in using
macros with "root_device" in the name to make the current
implementation clearer, and the eventual change more obvious.

Cheers,
-- David
  
Matti Vaittinen March 24, 2023, 10:05 a.m. UTC | #28
On 3/24/23 11:52, David Gow wrote:
> On Fri, 24 Mar 2023 at 14:51, Matti Vaittinen <mazziesaccount@gmail.com> wrote:
>>
>> On 3/24/23 08:34, David Gow wrote:
>>> On Fri, 24 Mar 2023 at 14:11, Matti Vaittinen <mazziesaccount@gmail.com> wrote:

>>> I think that sounds like a good strategy for now, and we can work on a
>>> set of 'generic helpers' which have an associated bus and struct
>>> kunit_device in the meantime. If we can continue to use
>>> root_device_register until those are ready, that'd be very convenient.
>>
>> Would it be a tiny bit more acceptable if we did add a very simple:
>>
>> #define kunit_root_device_register(name) root_device_register(name)
>> #define kunit_root_device_unregister(dev) root_device_unregister(dev)
>>
>> to include/kunit/device.h (or somesuch)
>>
>> This should help us later to at least spot the places where
>> root_device_[un]register() is abused and (potentially mass-)covert them
>> to use the proper helpers when they're available.
>>
> 
> Great idea.
> 
> The code I've been playing with has the following in include/kunit/device.h:
> 
> /* Register a new device against a KUnit test. */
> struct device *kunit_device_register(struct kunit *test, const char *name);
> /* Unregister a device created by kunit_device_register() early (i.e.,
> before test cleanup). */
> void kunit_device_unregister(struct kunit *test, struct device *dev);
> 
> If we used the same names, and just forwarded them to
> root_device_register() and root_device_unregister() for now
> (discarding the struct kunit pointer), then I expect we could just
> swap out the implementation to gain the extra functionality.
> 
> It's a little less explicit, though, so I could see the value in using
> macros with "root_device" in the name to make the current
> implementation clearer, and the eventual change more obvious.

I think it makes sense to avoid the mass-conversions if the signatures 
for kunit_device_register() and kunit_device_unregister() are expected 
to be known by now. If there is no objections I'll add those wrappers + 
the include/kunit/device.h to v6 of this series.

I think I'll try to hold back sending the v6 until next Monday - unless 
I get really bored during the weekend ;)

Yours,
	-- Matti
  
Matti Vaittinen March 24, 2023, 10:17 a.m. UTC | #29
On 3/24/23 12:05, Matti Vaittinen wrote:
> On 3/24/23 11:52, David Gow wrote:
>> On Fri, 24 Mar 2023 at 14:51, Matti Vaittinen 
>> <mazziesaccount@gmail.com> wrote:
>>>
>>> On 3/24/23 08:34, David Gow wrote:
>>>> On Fri, 24 Mar 2023 at 14:11, Matti Vaittinen 
>>>> <mazziesaccount@gmail.com> wrote:
> 
>>>> I think that sounds like a good strategy for now, and we can work on a
>>>> set of 'generic helpers' which have an associated bus and struct
>>>> kunit_device in the meantime. If we can continue to use
>>>> root_device_register until those are ready, that'd be very convenient.
>>>
>>> Would it be a tiny bit more acceptable if we did add a very simple:
>>>
>>> #define kunit_root_device_register(name) root_device_register(name)
>>> #define kunit_root_device_unregister(dev) root_device_unregister(dev)
>>>
>>> to include/kunit/device.h (or somesuch)
>>>
>>> This should help us later to at least spot the places where
>>> root_device_[un]register() is abused and (potentially mass-)covert them
>>> to use the proper helpers when they're available.
>>>
>>
>> Great idea.
>>
>> The code I've been playing with has the following in 
>> include/kunit/device.h:
>>
>> /* Register a new device against a KUnit test. */
>> struct device *kunit_device_register(struct kunit *test, const char 
>> *name);
>> /* Unregister a device created by kunit_device_register() early (i.e.,
>> before test cleanup). */
>> void kunit_device_unregister(struct kunit *test, struct device *dev);
>>
>> If we used the same names, and just forwarded them to
>> root_device_register() and root_device_unregister() for now
>> (discarding the struct kunit pointer), then I expect we could just
>> swap out the implementation to gain the extra functionality.

There's one thing though. If the goal is to do a direct replacement and 
if automatic device deletion upon test completion / test abort is 
planned - then it should be there also for these initial wrappers.

If these wrappers don't yet include the automatic device clean-up - then 
it probably makes more sense to just do the kunit_root_device_* defines 
because the tests are likely to need removing the explicit device 
clean-ups when proper APIs are finished.

Yours,
	-- Matti
  
Maxime Ripard March 24, 2023, 12:31 p.m. UTC | #30
On Fri, Mar 24, 2023 at 08:11:52AM +0200, Matti Vaittinen wrote:
> On 3/23/23 18:36, Maxime Ripard wrote:
> > On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:
> > > On 3/23/23 14:29, Maxime Ripard wrote:
> > > > On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
> > > > 
> > > > This is the description of what was happening:
> > > > https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/
> > > 
> > > Thanks Maxime. Do I read this correcty. The devm_ unwinding not being done
> > > when root_device_register() is used is not because root_device_unregister()
> > > would not trigger the unwinding - but rather because DRM code on top of this
> > > device keeps the refcount increased?
> > 
> > There's a difference of behaviour between a root_device and any device
> > with a bus: the root_device will only release the devm resources when
> > it's freed (in device_release), but a bus device will also do it in
> > device_del (through bus_remove_device() -> device_release_driver() ->
> > device_release_driver_internal() -> __device_release_driver() ->
> > device_unbind_cleanup(), which are skipped (in multiple places) if
> > there's no bus and no driver attached to the device).
> > 
> > It does affect DRM, but I'm pretty sure it will affect any framework
> > that deals with device hotplugging by deferring the framework structure
> > until the last (userspace) user closes its file descriptor. So I'd
> > assume that v4l2 and cec at least are also affected, and most likely
> > others.
> 
> Thanks for the explanation and patience :)
> 
> > 
> > > If this is the case, then it sounds like a DRM specific issue to me.
> > 
> > I mean, I guess. One could also argue that it's because IIO doesn't
> > properly deal with hotplugging.
> 
> I must say I haven't been testing the IIO registration API. I've only tested
> the helper API which is not backed up by any "IIO device". (This is fine for
> the helper because it must by design be cleaned-up only after the
> IIO-deregistration).
> 
> After your explanation here, I am not convinced IIO wouldn't see the same
> issue if I was testing the devm_iio_device_alloc() & co.

It depends really. The issue DRM is trying to solve is that, when a
device is gone, some application might still have an open FD and could
still poke into the kernel, while all the resources would have been
free'd if it was using devm.

So everything is kept around until the last fd is closed, so you still
have a reference to the device (even though it's been removed from its
bus) until that time.

It could be possible that IIO just doesn't handle that case at all. I
guess most of the devices aren't hotpluggable, and there's not much to
interact with from a userspace PoV iirc, so it might be why.

> > I'm not sure how that helps. Those are
> > common helpers which should accommodate every framework,
> 
> Ok. Fair enough. Besides, if the root-device was sufficient - then I would
> actually not see the need for a helper. People could in that case directly
> use the root_device_register(). So, if helpers are provided they should be
> backed up by a device with a bus then.
> 
> > and your second
> > patch breaks the kunit tests for DRM anyway.
> 
> Oh, I must have made an error there. It was supposed to be just a
> refactoring with no functional changes. Sorry about that. Anyways, that
> patch can be forgotten as Greg opposes using the platform devices in generic
> helpers.
> 
> > > Whether it is a feature or bug is beyond my knowledge. Still, I would
> > > not say using the root_device_[un]register() in generic code is not
> > > feasible - unless all other subsytems have similar refcount handling.
> > > 
> > > Sure thing using root_device_register() root_device_unregister() in DRM does
> > > not work as such. This, however, does not mean the generic kunit helpers
> > > should use platform_devices to force unwinding?
> > 
> > platform_devices were a quick way to get a device that would have a bus
> > and a driver bound to fall into the right patch above. We probably
> > shouldn't use platform_devices and a kunit_device sounds like the best
> > idea, but the test linked in the original mail I pointed you to should
> > work with whatever we come up with. It works with multiple (platform,
> > PCI, USB, etc) buses, so the mock we create should behave like their
> > real world equivalents.
>
> Thanks for the patience and the explanation. Now I understand a generic test
> device needs to sit on a bus.
> 
> As I said, in my very specific IIO related test the test device does not
> need a bus. Hence I'll drop the 'generic helpers' from this series.

So, I went around and created a bunch of kunit tests that shows the
problem without DRM being involved at all.

It does three things:

 - It registers a device, attaches a devm action, unregisters the device
   and then checks that the action has ran.

 - It registers a device, gets a reference to it, attaches a devm
   action, puts back the reference, unregisters the device and then
   checks that the action has ran.

 - It registers a device, gets a reference to it, attaches a devm action
   that will put back the reference, unregisters the device and then
   checks that the action has ran.

And in three cases: first with a root_device, then platform_device, then
a platform_device that has been bound to a driver.

Once you've applied that patch, you can run it using:

./tools/testing/kunit/kunit.py run --kunitconfig=drivers/base/test/ devm-inconsistencies

You'll see that only the last case passes all the tests, even though the
code itself is exactly the same.

Maxime

-- >8 --
diff --git a/drivers/base/test/.kunitconfig b/drivers/base/test/.kunitconfig
new file mode 100644
index 000000000000..473923f0998b
--- /dev/null
+++ b/drivers/base/test/.kunitconfig
@@ -0,0 +1,2 @@
+CONFIG_KUNIT=y
+CONFIG_DM_KUNIT_TEST=y
diff --git a/drivers/base/test/Kconfig b/drivers/base/test/Kconfig
index 610a1ba7a467..9d42051f8f8e 100644
--- a/drivers/base/test/Kconfig
+++ b/drivers/base/test/Kconfig
@@ -9,6 +9,10 @@ config TEST_ASYNC_DRIVER_PROBE

 	  If unsure say N.

+config DM_KUNIT_TEST
+	tristate "KUnit Tests for the device model" if !KUNIT_ALL_TESTS
+	depends on KUNIT
+
 config DRIVER_PE_KUNIT_TEST
 	bool "KUnit Tests for property entry API" if !KUNIT_ALL_TESTS
 	depends on KUNIT=y
diff --git a/drivers/base/test/Makefile b/drivers/base/test/Makefile
index 7f76fee6f989..31bcaa743e94 100644
--- a/drivers/base/test/Makefile
+++ b/drivers/base/test/Makefile
@@ -1,5 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_TEST_ASYNC_DRIVER_PROBE)	+= test_async_driver_probe.o

+obj-$(CONFIG_DM_KUNIT_TEST)	+= test-devm-cleanup.o
+
 obj-$(CONFIG_DRIVER_PE_KUNIT_TEST) += property-entry-test.o
 CFLAGS_property-entry-test.o += $(DISABLE_STRUCTLEAK_PLUGIN)
diff --git a/drivers/base/test/test-devm-cleanup.c b/drivers/base/test/test-devm-cleanup.c
new file mode 100644
index 000000000000..bad3cb1385b1
--- /dev/null
+++ b/drivers/base/test/test-devm-cleanup.c
@@ -0,0 +1,353 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <kunit/resource.h>
+
+#include <linux/device.h>
+#include <linux/platform_device.h>
+
+#define DEVICE_NAME "test"
+
+struct test_priv {
+	bool probe_done;
+	bool release_done;
+	wait_queue_head_t release_wq;
+	struct device *dev;
+};
+
+static void devm_device_action(void *ptr)
+{
+	struct test_priv *priv = ptr;
+
+	priv->release_done = true;
+	wake_up_interruptible(&priv->release_wq);
+}
+
+static void devm_put_device_action(void *ptr)
+{
+	struct test_priv *priv = ptr;
+
+	put_device(priv->dev);
+	priv->release_done = true;
+	wake_up_interruptible(&priv->release_wq);
+}
+
+#define RELEASE_TIMEOUT_MS	500
+
+static void root_device_register_unregister_test(struct kunit *test)
+{
+	struct test_priv *priv;
+	int ret;
+
+	priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv);
+	init_waitqueue_head(&priv->release_wq);
+
+	priv->dev = root_device_register(DEVICE_NAME);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv->dev);
+
+	ret = devm_add_action_or_reset(priv->dev, devm_device_action, priv);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	root_device_unregister(priv->dev);
+
+	ret = wait_event_interruptible_timeout(priv->release_wq, priv->release_done,
+					       msecs_to_jiffies(RELEASE_TIMEOUT_MS));
+	KUNIT_EXPECT_GT(test, ret, 0);
+}
+
+static void root_device_register_get_put_unregister_test(struct kunit *test)
+{
+	struct test_priv *priv;
+	int ret;
+
+	priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv);
+	init_waitqueue_head(&priv->release_wq);
+
+	priv->dev = root_device_register(DEVICE_NAME);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv->dev);
+
+	get_device(priv->dev);
+
+	ret = devm_add_action_or_reset(priv->dev, devm_device_action, priv);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	put_device(priv->dev);
+
+	root_device_unregister(priv->dev);
+
+	ret = wait_event_interruptible_timeout(priv->release_wq, priv->release_done,
+					       msecs_to_jiffies(RELEASE_TIMEOUT_MS));
+	KUNIT_EXPECT_GT(test, ret, 0);
+}
+
+static void root_device_register_get_unregister_with_devm_test(struct kunit *test)
+{
+	struct test_priv *priv;
+	int ret;
+
+	priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv);
+	init_waitqueue_head(&priv->release_wq);
+
+	priv->dev = root_device_register(DEVICE_NAME);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv->dev);
+
+	get_device(priv->dev);
+
+	ret = devm_add_action_or_reset(priv->dev, devm_put_device_action, priv);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	root_device_unregister(priv->dev);
+
+	ret = wait_event_interruptible_timeout(priv->release_wq, priv->release_done,
+					       msecs_to_jiffies(RELEASE_TIMEOUT_MS));
+	KUNIT_EXPECT_GT(test, ret, 0);
+}
+
+static void platform_device_register_unregister_test(struct kunit *test)
+{
+	struct platform_device *pdev;
+	struct test_priv *priv;
+	int ret;
+
+	priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv);
+	init_waitqueue_head(&priv->release_wq);
+
+	pdev = platform_device_alloc(DEVICE_NAME, PLATFORM_DEVID_NONE);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, pdev);
+
+	ret = platform_device_add(pdev);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	priv->dev = &pdev->dev;
+
+	ret = devm_add_action_or_reset(priv->dev, devm_device_action, priv);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	platform_device_unregister(pdev);
+
+	ret = wait_event_interruptible_timeout(priv->release_wq, priv->release_done,
+					       msecs_to_jiffies(RELEASE_TIMEOUT_MS));
+	KUNIT_EXPECT_GT(test, ret, 0);
+}
+
+static void platform_device_register_get_put_unregister_test(struct kunit *test)
+{
+	struct platform_device *pdev;
+	struct test_priv *priv;
+	int ret;
+
+	priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv);
+	init_waitqueue_head(&priv->release_wq);
+
+	pdev = platform_device_alloc(DEVICE_NAME, PLATFORM_DEVID_NONE);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, pdev);
+
+	ret = platform_device_add(pdev);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	priv->dev = &pdev->dev;
+
+	get_device(priv->dev);
+
+	ret = devm_add_action_or_reset(priv->dev, devm_device_action, priv);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	put_device(priv->dev);
+
+	platform_device_unregister(pdev);
+
+	ret = wait_event_interruptible_timeout(priv->release_wq, priv->release_done,
+					       msecs_to_jiffies(RELEASE_TIMEOUT_MS));
+	KUNIT_EXPECT_GT(test, ret, 0);
+}
+
+static void platform_device_register_get_unregister_with_devm_test(struct kunit *test)
+{
+	struct platform_device *pdev;
+	struct test_priv *priv;
+	int ret;
+
+	priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv);
+	init_waitqueue_head(&priv->release_wq);
+
+	pdev = platform_device_alloc(DEVICE_NAME, PLATFORM_DEVID_NONE);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, pdev);
+
+	ret = platform_device_add(pdev);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	priv->dev = &pdev->dev;
+
+	get_device(priv->dev);
+
+	ret = devm_add_action_or_reset(priv->dev, devm_put_device_action, priv);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	platform_device_unregister(pdev);
+
+	ret = wait_event_interruptible_timeout(priv->release_wq, priv->release_done,
+					       msecs_to_jiffies(RELEASE_TIMEOUT_MS));
+	KUNIT_EXPECT_GT(test, ret, 0);
+}
+
+static int fake_probe(struct platform_device *pdev)
+{
+	struct test_priv *priv = platform_get_drvdata(pdev);
+
+	priv->probe_done = true;
+	wake_up_interruptible(&priv->release_wq);
+
+	return 0;
+}
+
+static struct platform_driver fake_driver = {
+	.probe	= fake_probe,
+	.driver = {
+		.name = DEVICE_NAME,
+	},
+};
+
+static void probed_platform_device_register_unregister_test(struct kunit *test)
+{
+	struct platform_device *pdev;
+	struct test_priv *priv;
+	int ret;
+
+	priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv);
+	init_waitqueue_head(&priv->release_wq);
+
+	ret = platform_driver_register(&fake_driver);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	pdev = platform_device_alloc(DEVICE_NAME, PLATFORM_DEVID_NONE);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, pdev);
+
+	priv->dev = &pdev->dev;
+	platform_set_drvdata(pdev, priv);
+
+	ret = platform_device_add(pdev);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	ret = wait_event_interruptible_timeout(priv->release_wq, priv->probe_done,
+					       msecs_to_jiffies(RELEASE_TIMEOUT_MS));
+	KUNIT_ASSERT_GT(test, ret, 0);
+
+	ret = devm_add_action_or_reset(priv->dev, devm_device_action, priv);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	platform_device_unregister(pdev);
+
+	ret = wait_event_interruptible_timeout(priv->release_wq, priv->release_done,
+					       msecs_to_jiffies(RELEASE_TIMEOUT_MS));
+	KUNIT_EXPECT_GT(test, ret, 0);
+
+	platform_driver_unregister(&fake_driver);
+}
+
+static void probed_platform_device_register_get_put_unregister_test(struct kunit *test)
+{
+	struct platform_device *pdev;
+	struct test_priv *priv;
+	int ret;
+
+	priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv);
+	init_waitqueue_head(&priv->release_wq);
+
+	ret = platform_driver_register(&fake_driver);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	pdev = platform_device_alloc(DEVICE_NAME, PLATFORM_DEVID_NONE);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, pdev);
+
+	priv->dev = &pdev->dev;
+	platform_set_drvdata(pdev, priv);
+
+	ret = platform_device_add(pdev);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	ret = wait_event_interruptible_timeout(priv->release_wq, priv->probe_done,
+					       msecs_to_jiffies(RELEASE_TIMEOUT_MS));
+	KUNIT_ASSERT_GT(test, ret, 0);
+
+	get_device(priv->dev);
+
+	ret = devm_add_action_or_reset(priv->dev, devm_device_action, priv);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	put_device(priv->dev);
+
+	platform_device_unregister(pdev);
+
+	ret = wait_event_interruptible_timeout(priv->release_wq, priv->release_done,
+					       msecs_to_jiffies(RELEASE_TIMEOUT_MS));
+	KUNIT_EXPECT_GT(test, ret, 0);
+
+	platform_driver_unregister(&fake_driver);
+}
+
+static void probed_platform_device_register_get_unregister_with_devm_test(struct kunit *test)
+{
+	struct platform_device *pdev;
+	struct test_priv *priv;
+	int ret;
+
+	priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv);
+	init_waitqueue_head(&priv->release_wq);
+
+	ret = platform_driver_register(&fake_driver);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	pdev = platform_device_alloc(DEVICE_NAME, PLATFORM_DEVID_NONE);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, pdev);
+
+	priv->dev = &pdev->dev;
+	platform_set_drvdata(pdev, priv);
+
+	ret = platform_device_add(pdev);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	ret = wait_event_interruptible_timeout(priv->release_wq, priv->probe_done,
+					       msecs_to_jiffies(RELEASE_TIMEOUT_MS));
+	KUNIT_ASSERT_GT(test, ret, 0);
+
+	get_device(priv->dev);
+
+	ret = devm_add_action_or_reset(priv->dev, devm_put_device_action, priv);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	platform_device_unregister(pdev);
+
+	ret = wait_event_interruptible_timeout(priv->release_wq, priv->release_done,
+					       msecs_to_jiffies(RELEASE_TIMEOUT_MS));
+	KUNIT_EXPECT_GT(test, ret, 0);
+
+	platform_driver_unregister(&fake_driver);
+}
+
+static struct kunit_case devm_inconsistencies_tests[] = {
+	KUNIT_CASE(root_device_register_unregister_test),
+	KUNIT_CASE(root_device_register_get_put_unregister_test),
+	KUNIT_CASE(root_device_register_get_unregister_with_devm_test),
+	KUNIT_CASE(platform_device_register_unregister_test),
+	KUNIT_CASE(platform_device_register_get_put_unregister_test),
+	KUNIT_CASE(platform_device_register_get_unregister_with_devm_test),
+	KUNIT_CASE(probed_platform_device_register_unregister_test),
+	KUNIT_CASE(probed_platform_device_register_get_put_unregister_test),
+	KUNIT_CASE(probed_platform_device_register_get_unregister_with_devm_test),
+	{}
+};
+
+static struct kunit_suite devm_inconsistencies_test_suite = {
+	.name = "devm-inconsistencies",
+	.test_cases = devm_inconsistencies_tests,
+};
+
+kunit_test_suite(devm_inconsistencies_test_suite);
-- >8 --
  
Maxime Ripard March 24, 2023, 12:36 p.m. UTC | #31
On Thu, Mar 23, 2023 at 11:21:58AM +0100, Greg Kroah-Hartman wrote:
> On Thu, Mar 23, 2023 at 11:12:16AM +0100, Maxime Ripard wrote:
> > On Wed, Mar 22, 2023 at 07:57:10PM +0100, Greg Kroah-Hartman wrote:
> > > > > > +/**
> > > > > > + * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
> > > > > > + * @test: The test context object
> > > > > > + *
> > > > > > + * This allocates a fake struct &device to create a mock for a KUnit
> > > > > > + * test. The device will also be bound to a fake driver. It will thus be
> > > > > > + * able to leverage the usual infrastructure and most notably the
> > > > > > + * device-managed resources just like a "real" device.
> > > > > 
> > > > > What specific "usual infrastructure" are you wanting to access here?
> > > > > 
> > > > > And again, if you want a fake device, make a virtual one, by just
> > > > > calling device_create().
> > > > > 
> > > > > Or are you wanting to do "more" with that device pointer than
> > > > > device_create() can give you?
> > > > 
> > > > Personally, I was (am) only interested in devm_ unwinding. I guess the
> > > > device_create(), device_add(), device_remove()... (didn't study this
> > > > sequence in details so sorry if there is errors) could've been sufficient
> > > > for me. I haven't looked how much of the code that there is for 'platform
> > > > devices' should be duplicated to support that sequence for testability
> > > > purposes.
> > > 
> > > Any device can access devm_ code, there's no need for it to be a
> > > platform device at all.
> > 
> > Sure but the resources are only released if the device is part of a bus,
> > so it can't be a root_device (or bare device) either
> 
> The resources are not cleaned up when the device is freed no matter if
> it's on a bus or not?  If so, then that's a bug that needs to be fixed,
> and tested :)

Please have a look at:
https://lore.kernel.org/linux-kselftest/20230324123157.bbwvfq4gsxnlnfwb@houat/

I couldn't get an answer on whether it was considered a bug or not last
time, but as you can see there's a clear difference between a root
device and a platform device that has probed when it comes to resource
cleanup.

Maxime
  
Greg KH March 24, 2023, 12:43 p.m. UTC | #32
On Fri, Mar 24, 2023 at 01:36:32PM +0100, Maxime Ripard wrote:
> On Thu, Mar 23, 2023 at 11:21:58AM +0100, Greg Kroah-Hartman wrote:
> > On Thu, Mar 23, 2023 at 11:12:16AM +0100, Maxime Ripard wrote:
> > > On Wed, Mar 22, 2023 at 07:57:10PM +0100, Greg Kroah-Hartman wrote:
> > > > > > > +/**
> > > > > > > + * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
> > > > > > > + * @test: The test context object
> > > > > > > + *
> > > > > > > + * This allocates a fake struct &device to create a mock for a KUnit
> > > > > > > + * test. The device will also be bound to a fake driver. It will thus be
> > > > > > > + * able to leverage the usual infrastructure and most notably the
> > > > > > > + * device-managed resources just like a "real" device.
> > > > > > 
> > > > > > What specific "usual infrastructure" are you wanting to access here?
> > > > > > 
> > > > > > And again, if you want a fake device, make a virtual one, by just
> > > > > > calling device_create().
> > > > > > 
> > > > > > Or are you wanting to do "more" with that device pointer than
> > > > > > device_create() can give you?
> > > > > 
> > > > > Personally, I was (am) only interested in devm_ unwinding. I guess the
> > > > > device_create(), device_add(), device_remove()... (didn't study this
> > > > > sequence in details so sorry if there is errors) could've been sufficient
> > > > > for me. I haven't looked how much of the code that there is for 'platform
> > > > > devices' should be duplicated to support that sequence for testability
> > > > > purposes.
> > > > 
> > > > Any device can access devm_ code, there's no need for it to be a
> > > > platform device at all.
> > > 
> > > Sure but the resources are only released if the device is part of a bus,
> > > so it can't be a root_device (or bare device) either
> > 
> > The resources are not cleaned up when the device is freed no matter if
> > it's on a bus or not?  If so, then that's a bug that needs to be fixed,
> > and tested :)
> 
> Please have a look at:
> https://lore.kernel.org/linux-kselftest/20230324123157.bbwvfq4gsxnlnfwb@houat/
> 
> I couldn't get an answer on whether it was considered a bug or not last
> time, but as you can see there's a clear difference between a root
> device and a platform device that has probed when it comes to resource
> cleanup.

Great, testing shows there are bugs!  :)

That's a great start of a test, how about submitting that in a way that
I can test it and we can go from there?

thanks,

greg k-h
  
Maxime Ripard March 24, 2023, 12:46 p.m. UTC | #33
On Fri, Mar 24, 2023 at 02:34:19PM +0800, David Gow wrote:
> On Fri, 24 Mar 2023 at 14:11, Matti Vaittinen <mazziesaccount@gmail.com> wrote:
> >
> > On 3/23/23 18:36, Maxime Ripard wrote:
> > > On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:
> > >> On 3/23/23 14:29, Maxime Ripard wrote:
> > >>> On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
> > >>>
> > >>> This is the description of what was happening:
> > >>> https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/
> > >>
> > >> Thanks Maxime. Do I read this correcty. The devm_ unwinding not being done
> > >> when root_device_register() is used is not because root_device_unregister()
> > >> would not trigger the unwinding - but rather because DRM code on top of this
> > >> device keeps the refcount increased?
> > >
> > > There's a difference of behaviour between a root_device and any device
> > > with a bus: the root_device will only release the devm resources when
> > > it's freed (in device_release), but a bus device will also do it in
> > > device_del (through bus_remove_device() -> device_release_driver() ->
> > > device_release_driver_internal() -> __device_release_driver() ->
> > > device_unbind_cleanup(), which are skipped (in multiple places) if
> > > there's no bus and no driver attached to the device).
> > >
> > > It does affect DRM, but I'm pretty sure it will affect any framework
> > > that deals with device hotplugging by deferring the framework structure
> > > until the last (userspace) user closes its file descriptor. So I'd
> > > assume that v4l2 and cec at least are also affected, and most likely
> > > others.
> >
> > Thanks for the explanation and patience :)
> >
> 
> Thanks from me as well -- this has certainly helped me understand some
> of the details of the driver model that had slipped past me.
> 
> > >
> > >> If this is the case, then it sounds like a DRM specific issue to me.
> > >
> > > I mean, I guess. One could also argue that it's because IIO doesn't
> > > properly deal with hotplugging.
> >
> > I must say I haven't been testing the IIO registration API. I've only
> > tested the helper API which is not backed up by any "IIO device". (This
> > is fine for the helper because it must by design be cleaned-up only
> > after the IIO-deregistration).
> >
> > After your explanation here, I am not convinced IIO wouldn't see the
> > same issue if I was testing the devm_iio_device_alloc() & co.
> >
> > > I'm not sure how that helps. Those are
> > > common helpers which should accommodate every framework,
> >
> > Ok. Fair enough. Besides, if the root-device was sufficient - then I
> > would actually not see the need for a helper. People could in that case
> > directly use the root_device_register(). So, if helpers are provided
> > they should be backed up by a device with a bus then.
> >
> 
> I think there is _some_ value in helpers even without a bus, but it's
> much more limited:
> - It's less confusing if KUnit test devices are using kunit labelled
> structs and functions.
> - Helpers could use KUnit's resource management API to ensure any
> device created is properly unregistered and removed when the test
> exits (particularly if it exits early due to, e.g., an assertion).
> 
> I've played around implementing those with a proper struct
> kunit_device and the automatic cleanup on test failure, and thus far
> it -- like root_device_register -- works for all of the tests except
> the drm-test-managed one.

Yeah, like I said you need a device that has been bound to a driver for
it to work at the moment.

I guess for driver mocks we could move to a setup where we get
kunit-specific drivers like what Stephen has been implementing for the
clocks but I guess we would need to register the kunit tests in the
driver probe which doesn't look like it's possible at the moment?

Maxime
  
Maxime Ripard March 24, 2023, 1:02 p.m. UTC | #34
On Fri, Mar 24, 2023 at 01:43:07PM +0100, Greg Kroah-Hartman wrote:
> On Fri, Mar 24, 2023 at 01:36:32PM +0100, Maxime Ripard wrote:
> > On Thu, Mar 23, 2023 at 11:21:58AM +0100, Greg Kroah-Hartman wrote:
> > > On Thu, Mar 23, 2023 at 11:12:16AM +0100, Maxime Ripard wrote:
> > > > On Wed, Mar 22, 2023 at 07:57:10PM +0100, Greg Kroah-Hartman wrote:
> > > > > > > > +/**
> > > > > > > > + * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
> > > > > > > > + * @test: The test context object
> > > > > > > > + *
> > > > > > > > + * This allocates a fake struct &device to create a mock for a KUnit
> > > > > > > > + * test. The device will also be bound to a fake driver. It will thus be
> > > > > > > > + * able to leverage the usual infrastructure and most notably the
> > > > > > > > + * device-managed resources just like a "real" device.
> > > > > > > 
> > > > > > > What specific "usual infrastructure" are you wanting to access here?
> > > > > > > 
> > > > > > > And again, if you want a fake device, make a virtual one, by just
> > > > > > > calling device_create().
> > > > > > > 
> > > > > > > Or are you wanting to do "more" with that device pointer than
> > > > > > > device_create() can give you?
> > > > > > 
> > > > > > Personally, I was (am) only interested in devm_ unwinding. I guess the
> > > > > > device_create(), device_add(), device_remove()... (didn't study this
> > > > > > sequence in details so sorry if there is errors) could've been sufficient
> > > > > > for me. I haven't looked how much of the code that there is for 'platform
> > > > > > devices' should be duplicated to support that sequence for testability
> > > > > > purposes.
> > > > > 
> > > > > Any device can access devm_ code, there's no need for it to be a
> > > > > platform device at all.
> > > > 
> > > > Sure but the resources are only released if the device is part of a bus,
> > > > so it can't be a root_device (or bare device) either
> > > 
> > > The resources are not cleaned up when the device is freed no matter if
> > > it's on a bus or not?  If so, then that's a bug that needs to be fixed,
> > > and tested :)
> > 
> > Please have a look at:
> > https://lore.kernel.org/linux-kselftest/20230324123157.bbwvfq4gsxnlnfwb@houat/
> > 
> > I couldn't get an answer on whether it was considered a bug or not last
> > time, but as you can see there's a clear difference between a root
> > device and a platform device that has probed when it comes to resource
> > cleanup.
> 
> Great, testing shows there are bugs!  :)

I mean, it wasn't clear to me that it was indeed a bug or the intent
behind devm was that it would only work when probed. Both seemed
reasonable.

> That's a great start of a test, how about submitting that in a way that
> I can test it and we can go from there?

Ack.

I guess I'd need to arrange them somewhat differently for it to be
useful and merge-able.

How would you prefer them to be submitted, in two different files
testing both the root devices and platform devices?

Maxime
  
Greg KH March 24, 2023, 1:42 p.m. UTC | #35
On Fri, Mar 24, 2023 at 02:02:06PM +0100, Maxime Ripard wrote:
> On Fri, Mar 24, 2023 at 01:43:07PM +0100, Greg Kroah-Hartman wrote:
> > On Fri, Mar 24, 2023 at 01:36:32PM +0100, Maxime Ripard wrote:
> > > On Thu, Mar 23, 2023 at 11:21:58AM +0100, Greg Kroah-Hartman wrote:
> > > > On Thu, Mar 23, 2023 at 11:12:16AM +0100, Maxime Ripard wrote:
> > > > > On Wed, Mar 22, 2023 at 07:57:10PM +0100, Greg Kroah-Hartman wrote:
> > > > > > > > > +/**
> > > > > > > > > + * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
> > > > > > > > > + * @test: The test context object
> > > > > > > > > + *
> > > > > > > > > + * This allocates a fake struct &device to create a mock for a KUnit
> > > > > > > > > + * test. The device will also be bound to a fake driver. It will thus be
> > > > > > > > > + * able to leverage the usual infrastructure and most notably the
> > > > > > > > > + * device-managed resources just like a "real" device.
> > > > > > > > 
> > > > > > > > What specific "usual infrastructure" are you wanting to access here?
> > > > > > > > 
> > > > > > > > And again, if you want a fake device, make a virtual one, by just
> > > > > > > > calling device_create().
> > > > > > > > 
> > > > > > > > Or are you wanting to do "more" with that device pointer than
> > > > > > > > device_create() can give you?
> > > > > > > 
> > > > > > > Personally, I was (am) only interested in devm_ unwinding. I guess the
> > > > > > > device_create(), device_add(), device_remove()... (didn't study this
> > > > > > > sequence in details so sorry if there is errors) could've been sufficient
> > > > > > > for me. I haven't looked how much of the code that there is for 'platform
> > > > > > > devices' should be duplicated to support that sequence for testability
> > > > > > > purposes.
> > > > > > 
> > > > > > Any device can access devm_ code, there's no need for it to be a
> > > > > > platform device at all.
> > > > > 
> > > > > Sure but the resources are only released if the device is part of a bus,
> > > > > so it can't be a root_device (or bare device) either
> > > > 
> > > > The resources are not cleaned up when the device is freed no matter if
> > > > it's on a bus or not?  If so, then that's a bug that needs to be fixed,
> > > > and tested :)
> > > 
> > > Please have a look at:
> > > https://lore.kernel.org/linux-kselftest/20230324123157.bbwvfq4gsxnlnfwb@houat/
> > > 
> > > I couldn't get an answer on whether it was considered a bug or not last
> > > time, but as you can see there's a clear difference between a root
> > > device and a platform device that has probed when it comes to resource
> > > cleanup.
> > 
> > Great, testing shows there are bugs!  :)
> 
> I mean, it wasn't clear to me that it was indeed a bug or the intent
> behind devm was that it would only work when probed. Both seemed
> reasonable.
> 
> > That's a great start of a test, how about submitting that in a way that
> > I can test it and we can go from there?
> 
> Ack.
> 
> I guess I'd need to arrange them somewhat differently for it to be
> useful and merge-able.
> 
> How would you prefer them to be submitted, in two different files
> testing both the root devices and platform devices?

root devices are rare, but yes, one for each would be good, thanks!

greg k-h
  
David Gow March 25, 2023, 4:35 a.m. UTC | #36
On Fri, 24 Mar 2023 at 18:17, Matti Vaittinen <mazziesaccount@gmail.com> wrote:
>
> On 3/24/23 12:05, Matti Vaittinen wrote:
> > On 3/24/23 11:52, David Gow wrote:
> >> On Fri, 24 Mar 2023 at 14:51, Matti Vaittinen
> >> <mazziesaccount@gmail.com> wrote:
> >>>
> >>> On 3/24/23 08:34, David Gow wrote:
> >>>> On Fri, 24 Mar 2023 at 14:11, Matti Vaittinen
> >>>> <mazziesaccount@gmail.com> wrote:
> >
> >>>> I think that sounds like a good strategy for now, and we can work on a
> >>>> set of 'generic helpers' which have an associated bus and struct
> >>>> kunit_device in the meantime. If we can continue to use
> >>>> root_device_register until those are ready, that'd be very convenient.
> >>>
> >>> Would it be a tiny bit more acceptable if we did add a very simple:
> >>>
> >>> #define kunit_root_device_register(name) root_device_register(name)
> >>> #define kunit_root_device_unregister(dev) root_device_unregister(dev)
> >>>
> >>> to include/kunit/device.h (or somesuch)
> >>>
> >>> This should help us later to at least spot the places where
> >>> root_device_[un]register() is abused and (potentially mass-)covert them
> >>> to use the proper helpers when they're available.
> >>>
> >>
> >> Great idea.
> >>
> >> The code I've been playing with has the following in
> >> include/kunit/device.h:
> >>
> >> /* Register a new device against a KUnit test. */
> >> struct device *kunit_device_register(struct kunit *test, const char
> >> *name);
> >> /* Unregister a device created by kunit_device_register() early (i.e.,
> >> before test cleanup). */
> >> void kunit_device_unregister(struct kunit *test, struct device *dev);
> >>
> >> If we used the same names, and just forwarded them to
> >> root_device_register() and root_device_unregister() for now
> >> (discarding the struct kunit pointer), then I expect we could just
> >> swap out the implementation to gain the extra functionality.
>
> There's one thing though. If the goal is to do a direct replacement and
> if automatic device deletion upon test completion / test abort is
> planned - then it should be there also for these initial wrappers.
>

Yeah, that's an excellent point. It's a pretty subtle change in
behaviour to suddenly introduce that, so changing it behind the scenes
is probably unwise.

> If these wrappers don't yet include the automatic device clean-up - then
> it probably makes more sense to just do the kunit_root_device_* defines
> because the tests are likely to need removing the explicit device
> clean-ups when proper APIs are finished.
>

I sent out my prototype implementation of this here, which does do the
automatic cleanup:
https://lore.kernel.org/linux-kselftest/20230325043104.3761770-1-davidgow@google.com/T/#mf797239a8bce11630875fdf60aab9ed627add1f0

It's probably overkill to squeeze into your patch series, though,
given it also adds and uses a whole new kunit_defer() API.

Cheers,
-- David
  
David Gow March 25, 2023, 5:40 a.m. UTC | #37
On Fri, 24 Mar 2023 at 20:32, Maxime Ripard <maxime@cerno.tech> wrote:
>
> On Fri, Mar 24, 2023 at 08:11:52AM +0200, Matti Vaittinen wrote:
> > On 3/23/23 18:36, Maxime Ripard wrote:
> > > On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:
> > > > On 3/23/23 14:29, Maxime Ripard wrote:
> > > > > On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
> > > > >
> > > > > This is the description of what was happening:
> > > > > https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/
> > > >
> > > > Thanks Maxime. Do I read this correcty. The devm_ unwinding not being done
> > > > when root_device_register() is used is not because root_device_unregister()
> > > > would not trigger the unwinding - but rather because DRM code on top of this
> > > > device keeps the refcount increased?
> > >
> > > There's a difference of behaviour between a root_device and any device
> > > with a bus: the root_device will only release the devm resources when
> > > it's freed (in device_release), but a bus device will also do it in
> > > device_del (through bus_remove_device() -> device_release_driver() ->
> > > device_release_driver_internal() -> __device_release_driver() ->
> > > device_unbind_cleanup(), which are skipped (in multiple places) if
> > > there's no bus and no driver attached to the device).
> > >
> > > It does affect DRM, but I'm pretty sure it will affect any framework
> > > that deals with device hotplugging by deferring the framework structure
> > > until the last (userspace) user closes its file descriptor. So I'd
> > > assume that v4l2 and cec at least are also affected, and most likely
> > > others.
> >
> > Thanks for the explanation and patience :)
> >
> > >
> > > > If this is the case, then it sounds like a DRM specific issue to me.
> > >
> > > I mean, I guess. One could also argue that it's because IIO doesn't
> > > properly deal with hotplugging.
> >
> > I must say I haven't been testing the IIO registration API. I've only tested
> > the helper API which is not backed up by any "IIO device". (This is fine for
> > the helper because it must by design be cleaned-up only after the
> > IIO-deregistration).
> >
> > After your explanation here, I am not convinced IIO wouldn't see the same
> > issue if I was testing the devm_iio_device_alloc() & co.
>
> It depends really. The issue DRM is trying to solve is that, when a
> device is gone, some application might still have an open FD and could
> still poke into the kernel, while all the resources would have been
> free'd if it was using devm.
>
> So everything is kept around until the last fd is closed, so you still
> have a reference to the device (even though it's been removed from its
> bus) until that time.
>
> It could be possible that IIO just doesn't handle that case at all. I
> guess most of the devices aren't hotpluggable, and there's not much to
> interact with from a userspace PoV iirc, so it might be why.
>
> > > I'm not sure how that helps. Those are
> > > common helpers which should accommodate every framework,
> >
> > Ok. Fair enough. Besides, if the root-device was sufficient - then I would
> > actually not see the need for a helper. People could in that case directly
> > use the root_device_register(). So, if helpers are provided they should be
> > backed up by a device with a bus then.
> >
> > > and your second
> > > patch breaks the kunit tests for DRM anyway.
> >
> > Oh, I must have made an error there. It was supposed to be just a
> > refactoring with no functional changes. Sorry about that. Anyways, that
> > patch can be forgotten as Greg opposes using the platform devices in generic
> > helpers.
> >
> > > > Whether it is a feature or bug is beyond my knowledge. Still, I would
> > > > not say using the root_device_[un]register() in generic code is not
> > > > feasible - unless all other subsytems have similar refcount handling.
> > > >
> > > > Sure thing using root_device_register() root_device_unregister() in DRM does
> > > > not work as such. This, however, does not mean the generic kunit helpers
> > > > should use platform_devices to force unwinding?
> > >
> > > platform_devices were a quick way to get a device that would have a bus
> > > and a driver bound to fall into the right patch above. We probably
> > > shouldn't use platform_devices and a kunit_device sounds like the best
> > > idea, but the test linked in the original mail I pointed you to should
> > > work with whatever we come up with. It works with multiple (platform,
> > > PCI, USB, etc) buses, so the mock we create should behave like their
> > > real world equivalents.
> >
> > Thanks for the patience and the explanation. Now I understand a generic test
> > device needs to sit on a bus.
> >
> > As I said, in my very specific IIO related test the test device does not
> > need a bus. Hence I'll drop the 'generic helpers' from this series.
>
> So, I went around and created a bunch of kunit tests that shows the
> problem without DRM being involved at all.
>
> It does three things:
>
>  - It registers a device, attaches a devm action, unregisters the device
>    and then checks that the action has ran.
>
>  - It registers a device, gets a reference to it, attaches a devm
>    action, puts back the reference, unregisters the device and then
>    checks that the action has ran.
>
>  - It registers a device, gets a reference to it, attaches a devm action
>    that will put back the reference, unregisters the device and then
>    checks that the action has ran.
>
> And in three cases: first with a root_device, then platform_device, then
> a platform_device that has been bound to a driver.
>
> Once you've applied that patch, you can run it using:
>
> ./tools/testing/kunit/kunit.py run --kunitconfig=drivers/base/test/ devm-inconsistencies
>
> You'll see that only the last case passes all the tests, even though the
> code itself is exactly the same.
>

This illustrates the problem very nicely, thanks.

I played around a bit with this, and I'm definitely leaning towards
this being a bug, rather than intentional behaviour. There's actually
an explicit call to devres_release_all() in device_release() which
seems to suggest that this should work:
/*
* Some platform devices are driven without driver attached
* and managed resources may have been acquired.  Make sure
* all resources are released.
*
* Drivers still can add resources into device after device
* is deleted but alive, so release devres here to avoid
* possible memory leak.
*/

My "solution" is just to call devres_release_all() in device_del() as
well, which fixes your tests (and the drm-test-managed one when ported
to use root_device_register() or my kunit_device_register() API[1]).

--8<--
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 6878dfcbf0d6..adfec6185014 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3778,6 +3778,17 @@ void device_del(struct device *dev)
       device_platform_notify_remove(dev);
       device_links_purge(dev);

+       /*
+        * If a device does not have a driver attached, we need to clean up any
+        * managed resources. We do this in device_release(), but it's never
+        * called (and we leak the device) if a managed resource holds a
+        * reference to the device. So release all managed resources here,
+        * like we do in driver_detach(). We still need to do so again in
+        * device_release() in case someone adds a new resource after this
+        * point, though.
+        */
+       devres_release_all(dev);
+
       bus_notify(dev, BUS_NOTIFY_REMOVED_DEVICE);
       kobject_uevent(&dev->kobj, KOBJ_REMOVE);
       glue_dir = get_glue_dir(dev);

-->8--

It doesn't _seem_ to break anything else, and I've managed to convince
myself that it's probably the correct fix. (Albeit, as someone with a
limited knowledge of this part of the code, who also hasn't had quite
enough sleep, so take that with some salt.)

Still, I'd agree with Greg that it'd be great to have versions of your
tests upstream before making any such radical changes.

Cheers,
-- David
  
Matti Vaittinen March 25, 2023, 7:26 a.m. UTC | #38
On 3/25/23 06:35, David Gow wrote:
> On Fri, 24 Mar 2023 at 18:17, Matti Vaittinen <mazziesaccount@gmail.com> wrote:
>>
>> On 3/24/23 12:05, Matti Vaittinen wrote:
>>> On 3/24/23 11:52, David Gow wrote:
>>>> On Fri, 24 Mar 2023 at 14:51, Matti Vaittinen
>>>> <mazziesaccount@gmail.com> wrote:
>>>>>
>>>>> On 3/24/23 08:34, David Gow wrote:
>>>>>> On Fri, 24 Mar 2023 at 14:11, Matti Vaittinen
>>>>>> <mazziesaccount@gmail.com> wrote:
>>>
>>>>>> I think that sounds like a good strategy for now, and we can work on a
>>>>>> set of 'generic helpers' which have an associated bus and struct
>>>>>> kunit_device in the meantime. If we can continue to use
>>>>>> root_device_register until those are ready, that'd be very convenient.
>>>>>
>>>>> Would it be a tiny bit more acceptable if we did add a very simple:
>>>>>
>>>>> #define kunit_root_device_register(name) root_device_register(name)
>>>>> #define kunit_root_device_unregister(dev) root_device_unregister(dev)
>>>>>
>>>>> to include/kunit/device.h (or somesuch)
>>>>>
>>>>> This should help us later to at least spot the places where
>>>>> root_device_[un]register() is abused and (potentially mass-)covert them
>>>>> to use the proper helpers when they're available.
>>>>>
>>>>
>>>> Great idea.
>>>>
>>>> The code I've been playing with has the following in
>>>> include/kunit/device.h:
>>>>
>>>> /* Register a new device against a KUnit test. */
>>>> struct device *kunit_device_register(struct kunit *test, const char
>>>> *name);
>>>> /* Unregister a device created by kunit_device_register() early (i.e.,
>>>> before test cleanup). */
>>>> void kunit_device_unregister(struct kunit *test, struct device *dev);
>>>>
>>>> If we used the same names, and just forwarded them to
>>>> root_device_register() and root_device_unregister() for now
>>>> (discarding the struct kunit pointer), then I expect we could just
>>>> swap out the implementation to gain the extra functionality.
>>
>> There's one thing though. If the goal is to do a direct replacement and
>> if automatic device deletion upon test completion / test abort is
>> planned - then it should be there also for these initial wrappers.
>>
> 
> Yeah, that's an excellent point. It's a pretty subtle change in
> behaviour to suddenly introduce that, so changing it behind the scenes
> is probably unwise.
> 
>> If these wrappers don't yet include the automatic device clean-up - then
>> it probably makes more sense to just do the kunit_root_device_* defines
>> because the tests are likely to need removing the explicit device
>> clean-ups when proper APIs are finished.
>>
> 
> I sent out my prototype implementation of this here, which does do the
> automatic cleanup:
> https://lore.kernel.org/linux-kselftest/20230325043104.3761770-1-davidgow@google.com/T/#mf797239a8bce11630875fdf60aab9ed627add1f0
> 
> It's probably overkill to squeeze into your patch series, though,
> given it also adds and uses a whole new kunit_defer() API.

Thanks for letting me know. I did also prepare this commit yesterday:
https://github.com/M-Vaittinen/linux/commit/b784a90f8cc64ff83e802ec818e662fae1d0c264

It does use the existing kunit resources for clean-up. I am not sure if 
it is worth a shot or should I just drop it and use the root-device API 
for now. Any educated opinions on that? :)

Yours,
	-- Matti
  
Jonathan Cameron March 25, 2023, 5:50 p.m. UTC | #39
On Fri, 24 Mar 2023 13:31:57 +0100
Maxime Ripard <maxime@cerno.tech> wrote:

> On Fri, Mar 24, 2023 at 08:11:52AM +0200, Matti Vaittinen wrote:
> > On 3/23/23 18:36, Maxime Ripard wrote:  
> > > On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:  
> > > > On 3/23/23 14:29, Maxime Ripard wrote:  
> > > > > On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
> > > > > 
> > > > > This is the description of what was happening:
> > > > > https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/  
> > > > 
> > > > Thanks Maxime. Do I read this correcty. The devm_ unwinding not being done
> > > > when root_device_register() is used is not because root_device_unregister()
> > > > would not trigger the unwinding - but rather because DRM code on top of this
> > > > device keeps the refcount increased?  
> > > 
> > > There's a difference of behaviour between a root_device and any device
> > > with a bus: the root_device will only release the devm resources when
> > > it's freed (in device_release), but a bus device will also do it in
> > > device_del (through bus_remove_device() -> device_release_driver() ->
> > > device_release_driver_internal() -> __device_release_driver() ->
> > > device_unbind_cleanup(), which are skipped (in multiple places) if
> > > there's no bus and no driver attached to the device).
> > > 
> > > It does affect DRM, but I'm pretty sure it will affect any framework
> > > that deals with device hotplugging by deferring the framework structure
> > > until the last (userspace) user closes its file descriptor. So I'd
> > > assume that v4l2 and cec at least are also affected, and most likely
> > > others.  
> > 
> > Thanks for the explanation and patience :)
> >   
> > >   
> > > > If this is the case, then it sounds like a DRM specific issue to me.  
> > > 
> > > I mean, I guess. One could also argue that it's because IIO doesn't
> > > properly deal with hotplugging.  
> > 
> > I must say I haven't been testing the IIO registration API. I've only tested
> > the helper API which is not backed up by any "IIO device". (This is fine for
> > the helper because it must by design be cleaned-up only after the
> > IIO-deregistration).
> > 
> > After your explanation here, I am not convinced IIO wouldn't see the same
> > issue if I was testing the devm_iio_device_alloc() & co.  
> 
> It depends really. The issue DRM is trying to solve is that, when a
> device is gone, some application might still have an open FD and could
> still poke into the kernel, while all the resources would have been
> free'd if it was using devm.
> 
> So everything is kept around until the last fd is closed, so you still
> have a reference to the device (even though it's been removed from its
> bus) until that time.
> 
> It could be possible that IIO just doesn't handle that case at all. I
> guess most of the devices aren't hotpluggable, and there's not much to
> interact with from a userspace PoV iirc, so it might be why.

Lars-Peter Clausen (IIRC) fixed up the IIO handling of the similar cases a
long time ago now. It's simpler that for some other subsystems as we don't
have as many interdependencies as occur in DRM etc.

I 'think' we are fine in general with the IIO approach to this (I think we
did have one report of a theoretical race condition in the remove path that
was never fully addressed).

For IIO we also have fds that can be open but all accesses to them are proxied
through the IIO core and one of the things iio_device_unregister() or the devm
equivalent does is to set indio_dev->info = NULL  (+ wake up anyone waiting on
data etc). Alongside removing the callbacks, that is also used as a flag
to indicate the device has gone.

Note that we keep a reference to the struct indio_dev->dev (rather that the
underlying device) so that is not freed until the last fd is closed.
Thus, although devm unwinding has occurred that doesn't mean all the data
that was allocated with devm_xx calls is cleared up immediately.
  
Lars-Peter Clausen March 26, 2023, 5:16 p.m. UTC | #40
On 3/25/23 10:50, Jonathan Cameron wrote:
> On Fri, 24 Mar 2023 13:31:57 +0100
> Maxime Ripard <maxime@cerno.tech> wrote:
>
>> On Fri, Mar 24, 2023 at 08:11:52AM +0200, Matti Vaittinen wrote:
>>> On 3/23/23 18:36, Maxime Ripard wrote:
>>>> On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:
>>>>> On 3/23/23 14:29, Maxime Ripard wrote:
>>>>>> On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
>>>>>>
>>>>>> This is the description of what was happening:
>>>>>> https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/
>>>>> Thanks Maxime. Do I read this correcty. The devm_ unwinding not being done
>>>>> when root_device_register() is used is not because root_device_unregister()
>>>>> would not trigger the unwinding - but rather because DRM code on top of this
>>>>> device keeps the refcount increased?
>>>> There's a difference of behaviour between a root_device and any device
>>>> with a bus: the root_device will only release the devm resources when
>>>> it's freed (in device_release), but a bus device will also do it in
>>>> device_del (through bus_remove_device() -> device_release_driver() ->
>>>> device_release_driver_internal() -> __device_release_driver() ->
>>>> device_unbind_cleanup(), which are skipped (in multiple places) if
>>>> there's no bus and no driver attached to the device).
>>>>
>>>> It does affect DRM, but I'm pretty sure it will affect any framework
>>>> that deals with device hotplugging by deferring the framework structure
>>>> until the last (userspace) user closes its file descriptor. So I'd
>>>> assume that v4l2 and cec at least are also affected, and most likely
>>>> others.
>>> Thanks for the explanation and patience :)
>>>    
>>>>    
>>>>> If this is the case, then it sounds like a DRM specific issue to me.
>>>> I mean, I guess. One could also argue that it's because IIO doesn't
>>>> properly deal with hotplugging.
>>> I must say I haven't been testing the IIO registration API. I've only tested
>>> the helper API which is not backed up by any "IIO device". (This is fine for
>>> the helper because it must by design be cleaned-up only after the
>>> IIO-deregistration).
>>>
>>> After your explanation here, I am not convinced IIO wouldn't see the same
>>> issue if I was testing the devm_iio_device_alloc() & co.
>> It depends really. The issue DRM is trying to solve is that, when a
>> device is gone, some application might still have an open FD and could
>> still poke into the kernel, while all the resources would have been
>> free'd if it was using devm.
>>
>> So everything is kept around until the last fd is closed, so you still
>> have a reference to the device (even though it's been removed from its
>> bus) until that time.
>>
>> It could be possible that IIO just doesn't handle that case at all. I
>> guess most of the devices aren't hotpluggable, and there's not much to
>> interact with from a userspace PoV iirc, so it might be why.
> Lars-Peter Clausen (IIRC) fixed up the IIO handling of the similar cases a
> long time ago now. It's simpler that for some other subsystems as we don't
> have as many interdependencies as occur in DRM etc.
>
> I 'think' we are fine in general with the IIO approach to this (I think we
> did have one report of a theoretical race condition in the remove path that
> was never fully addressed).
>
> For IIO we also have fds that can be open but all accesses to them are proxied
> through the IIO core and one of the things iio_device_unregister() or the devm
> equivalent does is to set indio_dev->info = NULL  (+ wake up anyone waiting on
> data etc). Alongside removing the callbacks, that is also used as a flag
> to indicate the device has gone.
>
> Note that we keep a reference to the struct indio_dev->dev (rather that the
> underlying device) so that is not freed until the last fd is closed.
> Thus, although devm unwinding has occurred that doesn't mean all the data
> that was allocated with devm_xx calls is cleared up immediately.

IIO is fully hot-plug and hot-unplug capable. And it will have the same 
issue. When using managed device registration that establishes a parent 
child relationship between the devices and in combination with a device 
where the managed unwinding does not happen on unbind, but rather on in 
the release callback you create a cyclic reference dependency. The child 
device holds a reference to the parent, but the reference is only 
released in the parents release callback. And since that release 
callback is not called until the last reference is dropped you end up 
with a resource leak.

There are even some other places where IIO drivers run into this. E.g. 
any driver that does `devm_iio_trigger_register(&indio_dev->dev, ...)` 
creates a resource leak on the trigger and the IIO device. The indio_dev 
is not a bus device, hence no unbind and the trigger holds a reference 
so the release callback will never be called either.
  
Maxime Ripard March 29, 2023, 7:43 p.m. UTC | #41
Hi David,

On Sat, Mar 25, 2023 at 01:40:01PM +0800, David Gow wrote:
> On Fri, 24 Mar 2023 at 20:32, Maxime Ripard <maxime@cerno.tech> wrote:
> >
> > On Fri, Mar 24, 2023 at 08:11:52AM +0200, Matti Vaittinen wrote:
> > > On 3/23/23 18:36, Maxime Ripard wrote:
> > > > On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:
> > > > > On 3/23/23 14:29, Maxime Ripard wrote:
> > > > > > On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
> > > > > >
> > > > > > This is the description of what was happening:
> > > > > > https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/
> > > > >
> > > > > Thanks Maxime. Do I read this correcty. The devm_ unwinding not being done
> > > > > when root_device_register() is used is not because root_device_unregister()
> > > > > would not trigger the unwinding - but rather because DRM code on top of this
> > > > > device keeps the refcount increased?
> > > >
> > > > There's a difference of behaviour between a root_device and any device
> > > > with a bus: the root_device will only release the devm resources when
> > > > it's freed (in device_release), but a bus device will also do it in
> > > > device_del (through bus_remove_device() -> device_release_driver() ->
> > > > device_release_driver_internal() -> __device_release_driver() ->
> > > > device_unbind_cleanup(), which are skipped (in multiple places) if
> > > > there's no bus and no driver attached to the device).
> > > >
> > > > It does affect DRM, but I'm pretty sure it will affect any framework
> > > > that deals with device hotplugging by deferring the framework structure
> > > > until the last (userspace) user closes its file descriptor. So I'd
> > > > assume that v4l2 and cec at least are also affected, and most likely
> > > > others.
> > >
> > > Thanks for the explanation and patience :)
> > >
> > > >
> > > > > If this is the case, then it sounds like a DRM specific issue to me.
> > > >
> > > > I mean, I guess. One could also argue that it's because IIO doesn't
> > > > properly deal with hotplugging.
> > >
> > > I must say I haven't been testing the IIO registration API. I've only tested
> > > the helper API which is not backed up by any "IIO device". (This is fine for
> > > the helper because it must by design be cleaned-up only after the
> > > IIO-deregistration).
> > >
> > > After your explanation here, I am not convinced IIO wouldn't see the same
> > > issue if I was testing the devm_iio_device_alloc() & co.
> >
> > It depends really. The issue DRM is trying to solve is that, when a
> > device is gone, some application might still have an open FD and could
> > still poke into the kernel, while all the resources would have been
> > free'd if it was using devm.
> >
> > So everything is kept around until the last fd is closed, so you still
> > have a reference to the device (even though it's been removed from its
> > bus) until that time.
> >
> > It could be possible that IIO just doesn't handle that case at all. I
> > guess most of the devices aren't hotpluggable, and there's not much to
> > interact with from a userspace PoV iirc, so it might be why.
> >
> > > > I'm not sure how that helps. Those are
> > > > common helpers which should accommodate every framework,
> > >
> > > Ok. Fair enough. Besides, if the root-device was sufficient - then I would
> > > actually not see the need for a helper. People could in that case directly
> > > use the root_device_register(). So, if helpers are provided they should be
> > > backed up by a device with a bus then.
> > >
> > > > and your second
> > > > patch breaks the kunit tests for DRM anyway.
> > >
> > > Oh, I must have made an error there. It was supposed to be just a
> > > refactoring with no functional changes. Sorry about that. Anyways, that
> > > patch can be forgotten as Greg opposes using the platform devices in generic
> > > helpers.
> > >
> > > > > Whether it is a feature or bug is beyond my knowledge. Still, I would
> > > > > not say using the root_device_[un]register() in generic code is not
> > > > > feasible - unless all other subsytems have similar refcount handling.
> > > > >
> > > > > Sure thing using root_device_register() root_device_unregister() in DRM does
> > > > > not work as such. This, however, does not mean the generic kunit helpers
> > > > > should use platform_devices to force unwinding?
> > > >
> > > > platform_devices were a quick way to get a device that would have a bus
> > > > and a driver bound to fall into the right patch above. We probably
> > > > shouldn't use platform_devices and a kunit_device sounds like the best
> > > > idea, but the test linked in the original mail I pointed you to should
> > > > work with whatever we come up with. It works with multiple (platform,
> > > > PCI, USB, etc) buses, so the mock we create should behave like their
> > > > real world equivalents.
> > >
> > > Thanks for the patience and the explanation. Now I understand a generic test
> > > device needs to sit on a bus.
> > >
> > > As I said, in my very specific IIO related test the test device does not
> > > need a bus. Hence I'll drop the 'generic helpers' from this series.
> >
> > So, I went around and created a bunch of kunit tests that shows the
> > problem without DRM being involved at all.
> >
> > It does three things:
> >
> >  - It registers a device, attaches a devm action, unregisters the device
> >    and then checks that the action has ran.
> >
> >  - It registers a device, gets a reference to it, attaches a devm
> >    action, puts back the reference, unregisters the device and then
> >    checks that the action has ran.
> >
> >  - It registers a device, gets a reference to it, attaches a devm action
> >    that will put back the reference, unregisters the device and then
> >    checks that the action has ran.
> >
> > And in three cases: first with a root_device, then platform_device, then
> > a platform_device that has been bound to a driver.
> >
> > Once you've applied that patch, you can run it using:
> >
> > ./tools/testing/kunit/kunit.py run --kunitconfig=drivers/base/test/ devm-inconsistencies
> >
> > You'll see that only the last case passes all the tests, even though the
> > code itself is exactly the same.
> >
> 
> This illustrates the problem very nicely, thanks.
> 
> I played around a bit with this, and I'm definitely leaning towards
> this being a bug, rather than intentional behaviour. There's actually
> an explicit call to devres_release_all() in device_release() which
> seems to suggest that this should work:
> /*
> * Some platform devices are driven without driver attached
> * and managed resources may have been acquired.  Make sure
> * all resources are released.
> *
> * Drivers still can add resources into device after device
> * is deleted but alive, so release devres here to avoid
> * possible memory leak.
> */
> 
> My "solution" is just to call devres_release_all() in device_del() as
> well, which fixes your tests (and the drm-test-managed one when ported
> to use root_device_register() or my kunit_device_register() API[1]).
> 
> --8<--
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 6878dfcbf0d6..adfec6185014 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3778,6 +3778,17 @@ void device_del(struct device *dev)
>        device_platform_notify_remove(dev);
>        device_links_purge(dev);
> 
> +       /*
> +        * If a device does not have a driver attached, we need to clean up any
> +        * managed resources. We do this in device_release(), but it's never
> +        * called (and we leak the device) if a managed resource holds a
> +        * reference to the device. So release all managed resources here,
> +        * like we do in driver_detach(). We still need to do so again in
> +        * device_release() in case someone adds a new resource after this
> +        * point, though.
> +        */
> +       devres_release_all(dev);
> +
>        bus_notify(dev, BUS_NOTIFY_REMOVED_DEVICE);
>        kobject_uevent(&dev->kobj, KOBJ_REMOVE);
>        glue_dir = get_glue_dir(dev);
> 
> -->8--
> 
> It doesn't _seem_ to break anything else, and I've managed to convince
> myself that it's probably the correct fix.

Yeah, as an outsider, I came to the same conclusion last time...

> (Albeit, as someone with a limited knowledge of this part of the code,
> who also hasn't had quite enough sleep, so take that with some salt.)

... but as someone that also have a limited knowledge of that part of
the code, I certainly wasn't sure it was the proper thing to do :)

> Still, I'd agree with Greg that it'd be great to have versions of your
> tests upstream before making any such radical changes.

I just submitted them.

Thanks!
Maxime
  
Maxime Ripard March 29, 2023, 7:46 p.m. UTC | #42
On Sat, Mar 25, 2023 at 05:50:44PM +0000, Jonathan Cameron wrote:
> On Fri, 24 Mar 2023 13:31:57 +0100
> Maxime Ripard <maxime@cerno.tech> wrote:
> 
> > On Fri, Mar 24, 2023 at 08:11:52AM +0200, Matti Vaittinen wrote:
> > > On 3/23/23 18:36, Maxime Ripard wrote:  
> > > > On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:  
> > > > > On 3/23/23 14:29, Maxime Ripard wrote:  
> > > > > > On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
> > > > > > 
> > > > > > This is the description of what was happening:
> > > > > > https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/  
> > > > > 
> > > > > Thanks Maxime. Do I read this correcty. The devm_ unwinding not being done
> > > > > when root_device_register() is used is not because root_device_unregister()
> > > > > would not trigger the unwinding - but rather because DRM code on top of this
> > > > > device keeps the refcount increased?  
> > > > 
> > > > There's a difference of behaviour between a root_device and any device
> > > > with a bus: the root_device will only release the devm resources when
> > > > it's freed (in device_release), but a bus device will also do it in
> > > > device_del (through bus_remove_device() -> device_release_driver() ->
> > > > device_release_driver_internal() -> __device_release_driver() ->
> > > > device_unbind_cleanup(), which are skipped (in multiple places) if
> > > > there's no bus and no driver attached to the device).
> > > > 
> > > > It does affect DRM, but I'm pretty sure it will affect any framework
> > > > that deals with device hotplugging by deferring the framework structure
> > > > until the last (userspace) user closes its file descriptor. So I'd
> > > > assume that v4l2 and cec at least are also affected, and most likely
> > > > others.  
> > > 
> > > Thanks for the explanation and patience :)
> > >   
> > > >   
> > > > > If this is the case, then it sounds like a DRM specific issue to me.  
> > > > 
> > > > I mean, I guess. One could also argue that it's because IIO doesn't
> > > > properly deal with hotplugging.  
> > > 
> > > I must say I haven't been testing the IIO registration API. I've only tested
> > > the helper API which is not backed up by any "IIO device". (This is fine for
> > > the helper because it must by design be cleaned-up only after the
> > > IIO-deregistration).
> > > 
> > > After your explanation here, I am not convinced IIO wouldn't see the same
> > > issue if I was testing the devm_iio_device_alloc() & co.  
> > 
> > It depends really. The issue DRM is trying to solve is that, when a
> > device is gone, some application might still have an open FD and could
> > still poke into the kernel, while all the resources would have been
> > free'd if it was using devm.
> > 
> > So everything is kept around until the last fd is closed, so you still
> > have a reference to the device (even though it's been removed from its
> > bus) until that time.
> > 
> > It could be possible that IIO just doesn't handle that case at all. I
> > guess most of the devices aren't hotpluggable, and there's not much to
> > interact with from a userspace PoV iirc, so it might be why.
> 
> Lars-Peter Clausen (IIRC) fixed up the IIO handling of the similar cases a
> long time ago now. It's simpler that for some other subsystems as we don't
> have as many interdependencies as occur in DRM etc.
> 
> I 'think' we are fine in general with the IIO approach to this (I think we
> did have one report of a theoretical race condition in the remove path that
> was never fully addressed).
> 
> For IIO we also have fds that can be open but all accesses to them are proxied
> through the IIO core and one of the things iio_device_unregister() or the devm
> equivalent does is to set indio_dev->info = NULL  (+ wake up anyone waiting on
> data etc). Alongside removing the callbacks, that is also used as a flag
> to indicate the device has gone.

Sorry if it came as trying to put IIO under a bad light, it certainly
wasn't my intention. I was trying to come up with possible explanations
as to why IIO's design was simpler than DRM is :)

> Note that we keep a reference to the struct indio_dev->dev (rather that the
> underlying device) so that is not freed until the last fd is closed.
> Thus, although devm unwinding has occurred that doesn't mean all the data
> that was allocated with devm_xx calls is cleared up immediately.

I'm not sure I get that part though. devm unwinding can happen even if the refcount is > 1

Maxime
  
Jonathan Cameron April 1, 2023, 3:30 p.m. UTC | #43
On Sun, 26 Mar 2023 10:16:54 -0700
Lars-Peter Clausen <lars@metafoo.de> wrote:

> On 3/25/23 10:50, Jonathan Cameron wrote:
> > On Fri, 24 Mar 2023 13:31:57 +0100
> > Maxime Ripard <maxime@cerno.tech> wrote:
> >  
> >> On Fri, Mar 24, 2023 at 08:11:52AM +0200, Matti Vaittinen wrote:  
> >>> On 3/23/23 18:36, Maxime Ripard wrote:  
> >>>> On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:  
> >>>>> On 3/23/23 14:29, Maxime Ripard wrote:  
> >>>>>> On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
> >>>>>>
> >>>>>> This is the description of what was happening:
> >>>>>> https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/  
> >>>>> Thanks Maxime. Do I read this correcty. The devm_ unwinding not being done
> >>>>> when root_device_register() is used is not because root_device_unregister()
> >>>>> would not trigger the unwinding - but rather because DRM code on top of this
> >>>>> device keeps the refcount increased?  
> >>>> There's a difference of behaviour between a root_device and any device
> >>>> with a bus: the root_device will only release the devm resources when
> >>>> it's freed (in device_release), but a bus device will also do it in
> >>>> device_del (through bus_remove_device() -> device_release_driver() ->
> >>>> device_release_driver_internal() -> __device_release_driver() ->
> >>>> device_unbind_cleanup(), which are skipped (in multiple places) if
> >>>> there's no bus and no driver attached to the device).
> >>>>
> >>>> It does affect DRM, but I'm pretty sure it will affect any framework
> >>>> that deals with device hotplugging by deferring the framework structure
> >>>> until the last (userspace) user closes its file descriptor. So I'd
> >>>> assume that v4l2 and cec at least are also affected, and most likely
> >>>> others.  
> >>> Thanks for the explanation and patience :)
> >>>      
> >>>>      
> >>>>> If this is the case, then it sounds like a DRM specific issue to me.  
> >>>> I mean, I guess. One could also argue that it's because IIO doesn't
> >>>> properly deal with hotplugging.  
> >>> I must say I haven't been testing the IIO registration API. I've only tested
> >>> the helper API which is not backed up by any "IIO device". (This is fine for
> >>> the helper because it must by design be cleaned-up only after the
> >>> IIO-deregistration).
> >>>
> >>> After your explanation here, I am not convinced IIO wouldn't see the same
> >>> issue if I was testing the devm_iio_device_alloc() & co.  
> >> It depends really. The issue DRM is trying to solve is that, when a
> >> device is gone, some application might still have an open FD and could
> >> still poke into the kernel, while all the resources would have been
> >> free'd if it was using devm.
> >>
> >> So everything is kept around until the last fd is closed, so you still
> >> have a reference to the device (even though it's been removed from its
> >> bus) until that time.
> >>
> >> It could be possible that IIO just doesn't handle that case at all. I
> >> guess most of the devices aren't hotpluggable, and there's not much to
> >> interact with from a userspace PoV iirc, so it might be why.  
> > Lars-Peter Clausen (IIRC) fixed up the IIO handling of the similar cases a
> > long time ago now. It's simpler that for some other subsystems as we don't
> > have as many interdependencies as occur in DRM etc.
> >
> > I 'think' we are fine in general with the IIO approach to this (I think we
> > did have one report of a theoretical race condition in the remove path that
> > was never fully addressed).
> >
> > For IIO we also have fds that can be open but all accesses to them are proxied
> > through the IIO core and one of the things iio_device_unregister() or the devm
> > equivalent does is to set indio_dev->info = NULL  (+ wake up anyone waiting on
> > data etc). Alongside removing the callbacks, that is also used as a flag
> > to indicate the device has gone.
> >
> > Note that we keep a reference to the struct indio_dev->dev (rather that the
> > underlying device) so that is not freed until the last fd is closed.
> > Thus, although devm unwinding has occurred that doesn't mean all the data
> > that was allocated with devm_xx calls is cleared up immediately.  
> 
> IIO is fully hot-plug and hot-unplug capable. And it will have the same 
> issue. When using managed device registration that establishes a parent 
> child relationship between the devices and in combination with a device 
> where the managed unwinding does not happen on unbind, but rather on in 
> the release callback you create a cyclic reference dependency. The child 
> device holds a reference to the parent, but the reference is only 
> released in the parents release callback. And since that release 
> callback is not called until the last reference is dropped you end up 
> with a resource leak.
> 
> There are even some other places where IIO drivers run into this. E.g. 
> any driver that does `devm_iio_trigger_register(&indio_dev->dev, ...)` 

A driver should should not do that.
Should be devm_iio_trigger_registers(parent device, ...)

There is a corner where you need to detach the trigger from userspace
before release which is odd if it was attached by default. There are some
left over corners there that I think still cause trouble.

> creates a resource leak on the trigger and the IIO device. The indio_dev 
> is not a bus device, hence no unbind and the trigger holds a reference 
> so the release callback will never be called either.
>
  
Jonathan Cameron April 1, 2023, 3:36 p.m. UTC | #44
On Wed, 29 Mar 2023 21:46:09 +0200
Maxime Ripard <maxime@cerno.tech> wrote:

> On Sat, Mar 25, 2023 at 05:50:44PM +0000, Jonathan Cameron wrote:
> > On Fri, 24 Mar 2023 13:31:57 +0100
> > Maxime Ripard <maxime@cerno.tech> wrote:
> >   
> > > On Fri, Mar 24, 2023 at 08:11:52AM +0200, Matti Vaittinen wrote:  
> > > > On 3/23/23 18:36, Maxime Ripard wrote:    
> > > > > On Thu, Mar 23, 2023 at 03:02:03PM +0200, Matti Vaittinen wrote:    
> > > > > > On 3/23/23 14:29, Maxime Ripard wrote:    
> > > > > > > On Thu, Mar 23, 2023 at 02:16:52PM +0200, Matti Vaittinen wrote:
> > > > > > > 
> > > > > > > This is the description of what was happening:
> > > > > > > https://lore.kernel.org/dri-devel/20221117165311.vovrc7usy4efiytl@houat/    
> > > > > > 
> > > > > > Thanks Maxime. Do I read this correcty. The devm_ unwinding not being done
> > > > > > when root_device_register() is used is not because root_device_unregister()
> > > > > > would not trigger the unwinding - but rather because DRM code on top of this
> > > > > > device keeps the refcount increased?    
> > > > > 
> > > > > There's a difference of behaviour between a root_device and any device
> > > > > with a bus: the root_device will only release the devm resources when
> > > > > it's freed (in device_release), but a bus device will also do it in
> > > > > device_del (through bus_remove_device() -> device_release_driver() ->
> > > > > device_release_driver_internal() -> __device_release_driver() ->
> > > > > device_unbind_cleanup(), which are skipped (in multiple places) if
> > > > > there's no bus and no driver attached to the device).
> > > > > 
> > > > > It does affect DRM, but I'm pretty sure it will affect any framework
> > > > > that deals with device hotplugging by deferring the framework structure
> > > > > until the last (userspace) user closes its file descriptor. So I'd
> > > > > assume that v4l2 and cec at least are also affected, and most likely
> > > > > others.    
> > > > 
> > > > Thanks for the explanation and patience :)
> > > >     
> > > > >     
> > > > > > If this is the case, then it sounds like a DRM specific issue to me.    
> > > > > 
> > > > > I mean, I guess. One could also argue that it's because IIO doesn't
> > > > > properly deal with hotplugging.    
> > > > 
> > > > I must say I haven't been testing the IIO registration API. I've only tested
> > > > the helper API which is not backed up by any "IIO device". (This is fine for
> > > > the helper because it must by design be cleaned-up only after the
> > > > IIO-deregistration).
> > > > 
> > > > After your explanation here, I am not convinced IIO wouldn't see the same
> > > > issue if I was testing the devm_iio_device_alloc() & co.    
> > > 
> > > It depends really. The issue DRM is trying to solve is that, when a
> > > device is gone, some application might still have an open FD and could
> > > still poke into the kernel, while all the resources would have been
> > > free'd if it was using devm.
> > > 
> > > So everything is kept around until the last fd is closed, so you still
> > > have a reference to the device (even though it's been removed from its
> > > bus) until that time.
> > > 
> > > It could be possible that IIO just doesn't handle that case at all. I
> > > guess most of the devices aren't hotpluggable, and there's not much to
> > > interact with from a userspace PoV iirc, so it might be why.  
> > 
> > Lars-Peter Clausen (IIRC) fixed up the IIO handling of the similar cases a
> > long time ago now. It's simpler that for some other subsystems as we don't
> > have as many interdependencies as occur in DRM etc.
> > 
> > I 'think' we are fine in general with the IIO approach to this (I think we
> > did have one report of a theoretical race condition in the remove path that
> > was never fully addressed).
> > 
> > For IIO we also have fds that can be open but all accesses to them are proxied
> > through the IIO core and one of the things iio_device_unregister() or the devm
> > equivalent does is to set indio_dev->info = NULL  (+ wake up anyone waiting on
> > data etc). Alongside removing the callbacks, that is also used as a flag
> > to indicate the device has gone.  
> 
> Sorry if it came as trying to put IIO under a bad light, it certainly
> wasn't my intention. I was trying to come up with possible explanations
> as to why IIO's design was simpler than DRM is :)

No problem :) I'm sure there are gremlins hiding there.
Part of the problem is that nothing prevents drivers doing 'wrong things'
other than us noticing when it happens.

> 
> > Note that we keep a reference to the struct indio_dev->dev (rather that the
> > underlying device) so that is not freed until the last fd is closed.
> > Thus, although devm unwinding has occurred that doesn't mean all the data
> > that was allocated with devm_xx calls is cleared up immediately.  
> 
> I'm not sure I get that part though. devm unwinding can happen even if the refcount is > 1

No IIO driver should be using devm on the indio_dev->dev, they should be doing it on the
parent device.  When the devm_iio_device_free() gets called, that doesn't actually free
the device. Just decrements a reference count (earlier on we already ensured that
it is just acting as a stub that provides no access to the underlying device for
open FDs.).

There are probably more problems hiding though!

Jonathan



> 
> Maxime
  

Patch

diff --git a/drivers/base/test/Kconfig b/drivers/base/test/Kconfig
index 610a1ba7a467..642b5b183c10 100644
--- a/drivers/base/test/Kconfig
+++ b/drivers/base/test/Kconfig
@@ -1,4 +1,9 @@ 
 # SPDX-License-Identifier: GPL-2.0
+
+config TEST_KUNIT_DEVICE_HELPERS
+	depends on KUNIT
+	tristate
+
 config TEST_ASYNC_DRIVER_PROBE
 	tristate "Build kernel module to test asynchronous driver probing"
 	depends on m
diff --git a/drivers/base/test/Makefile b/drivers/base/test/Makefile
index 7f76fee6f989..49926524ec6f 100644
--- a/drivers/base/test/Makefile
+++ b/drivers/base/test/Makefile
@@ -1,5 +1,7 @@ 
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_TEST_ASYNC_DRIVER_PROBE)	+= test_async_driver_probe.o
 
+obj-$(CONFIG_TEST_KUNIT_DEVICE_HELPERS) += test_kunit_device.o
+
 obj-$(CONFIG_DRIVER_PE_KUNIT_TEST) += property-entry-test.o
 CFLAGS_property-entry-test.o += $(DISABLE_STRUCTLEAK_PLUGIN)
diff --git a/drivers/base/test/test_kunit_device.c b/drivers/base/test/test_kunit_device.c
new file mode 100644
index 000000000000..75790e15b85c
--- /dev/null
+++ b/drivers/base/test/test_kunit_device.c
@@ -0,0 +1,83 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * These helpers have been extracted from drm test code at
+ * drm_kunit_helpers.c which was authored by
+ * Maxime Ripard <maxime@cerno.tech>
+ */
+
+#include <linux/device.h>
+#include <linux/platform_device.h>
+
+#include <kunit/platform_device.h>
+
+#define KUNIT_DEVICE_NAME	"test-kunit-mock-device"
+
+static int fake_probe(struct platform_device *pdev)
+{
+	return 0;
+}
+
+static int fake_remove(struct platform_device *pdev)
+{
+	return 0;
+}
+
+static struct platform_driver fake_platform_driver = {
+	.probe	= fake_probe,
+	.remove	= fake_remove,
+	.driver = {
+		.name	= KUNIT_DEVICE_NAME,
+	},
+};
+
+/**
+ * test_kunit_helper_alloc_device - Allocate a mock device for a KUnit test
+ * @test: The test context object
+ *
+ * This allocates a fake struct &device to create a mock for a KUnit
+ * test. The device will also be bound to a fake driver. It will thus be
+ * able to leverage the usual infrastructure and most notably the
+ * device-managed resources just like a "real" device.
+ *
+ * Callers need to make sure test_kunit_helper_free_device() on the
+ * device when done.
+ *
+ * Returns:
+ * A pointer to the new device, or an ERR_PTR() otherwise.
+ */
+struct device *test_kunit_helper_alloc_device(struct kunit *test)
+{
+	struct platform_device *pdev;
+	int ret;
+
+	ret = platform_driver_register(&fake_platform_driver);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	pdev = platform_device_alloc(KUNIT_DEVICE_NAME, PLATFORM_DEVID_NONE);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, pdev);
+
+	ret = platform_device_add(pdev);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	return &pdev->dev;
+}
+EXPORT_SYMBOL_GPL(test_kunit_helper_alloc_device);
+
+/**
+ * test_kunit_helper_free_device - Frees a mock device
+ * @test: The test context object
+ * @dev: The device to free
+ *
+ * Frees a device allocated with test_kunit_helper_alloc_device().
+ */
+void test_kunit_helper_free_device(struct kunit *test, struct device *dev)
+{
+	struct platform_device *pdev = to_platform_device(dev);
+
+	platform_device_unregister(pdev);
+	platform_driver_unregister(&fake_platform_driver);
+}
+EXPORT_SYMBOL_GPL(test_kunit_helper_free_device);
+
+MODULE_AUTHOR("Matti Vaittinen <mazziesaccount@gmail.com>");
+MODULE_LICENSE("GPL");
diff --git a/include/kunit/platform_device.h b/include/kunit/platform_device.h
new file mode 100644
index 000000000000..2a9c7bdd75eb
--- /dev/null
+++ b/include/kunit/platform_device.h
@@ -0,0 +1,13 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __KUNIT_PLATFORM_DEVICE__
+#define __KUNIT_PLATFORM_DEVICE__
+
+#include <kunit/test.h>
+
+struct device;
+
+struct device *test_kunit_helper_alloc_device(struct kunit *test);
+void test_kunit_helper_free_device(struct kunit *test, struct device *dev);
+
+#endif