[v4,05/45] drm/connector: Check drm_connector_init pointers arguments

Message ID 20231128-kms-hdmi-connector-state-v4-5-c7602158306e@kernel.org
State New
Headers
Series drm/connector: Create HDMI Connector infrastructure |

Commit Message

Maxime Ripard Nov. 28, 2023, 10:24 a.m. UTC
  All the drm_connector_init variants take at least a pointer to the
device, connector and hooks implementation.

However, none of them check their value before dereferencing those
pointers which can lead to a NULL-pointer dereference if the author
isn't careful.

Let's test those pointers instead and error out if any is NULL.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
---
 drivers/gpu/drm/drm_connector.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Jani Nikula Nov. 28, 2023, 12:54 p.m. UTC | #1
On Tue, 28 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
> All the drm_connector_init variants take at least a pointer to the
> device, connector and hooks implementation.
>
> However, none of them check their value before dereferencing those
> pointers which can lead to a NULL-pointer dereference if the author
> isn't careful.

Arguably oopsing on the spot is preferrable when this can't be caused by
user input. It's always a mistake that should be caught early during
development.

Not everyone checks the return value of drm_connector_init and friends,
so those cases will lead to more mysterious bugs later. And probably
oopses as well.


BR,
Jani.


>
> Let's test those pointers instead and error out if any is NULL.
>
> Signed-off-by: Maxime Ripard <mripard@kernel.org>
> ---
>  drivers/gpu/drm/drm_connector.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
> index b0516505f7ae..2f60755dccdd 100644
> --- a/drivers/gpu/drm/drm_connector.c
> +++ b/drivers/gpu/drm/drm_connector.c
> @@ -229,6 +229,9 @@ static int __drm_connector_init(struct drm_device *dev,
>  	struct ida *connector_ida =
>  		&drm_connector_enum_list[connector_type].ida;
>  
> +	if (!dev || !connector || !funcs)
> +		return -EINVAL;
> +
>  	WARN_ON(drm_drv_uses_atomic_modeset(dev) &&
>  		(!funcs->atomic_destroy_state ||
>  		 !funcs->atomic_duplicate_state));
  
Maxime Ripard Nov. 28, 2023, 1:29 p.m. UTC | #2
Hi Jani,

On Tue, Nov 28, 2023 at 02:54:02PM +0200, Jani Nikula wrote:
> On Tue, 28 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
> > All the drm_connector_init variants take at least a pointer to the
> > device, connector and hooks implementation.
> >
> > However, none of them check their value before dereferencing those
> > pointers which can lead to a NULL-pointer dereference if the author
> > isn't careful.
> 
> Arguably oopsing on the spot is preferrable when this can't be caused by
> user input. It's always a mistake that should be caught early during
> development.
> 
> Not everyone checks the return value of drm_connector_init and friends,
> so those cases will lead to more mysterious bugs later. And probably
> oopses as well.

So maybe we can do both then, with something like

if (WARN_ON(!dev))
   return -EINVAL

if (drm_WARN_ON(dev, !connector || !funcs))
   return -EINVAL;

I'd still like to check for this, so we can have proper testing, and we
already check for those pointers in some places (like funcs in
drm_connector_init), so if we don't cover everything we're inconsistent.

Maxime
  
Ville Syrjälä Nov. 28, 2023, 1:49 p.m. UTC | #3
On Tue, Nov 28, 2023 at 02:29:40PM +0100, Maxime Ripard wrote:
> Hi Jani,
> 
> On Tue, Nov 28, 2023 at 02:54:02PM +0200, Jani Nikula wrote:
> > On Tue, 28 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
> > > All the drm_connector_init variants take at least a pointer to the
> > > device, connector and hooks implementation.
> > >
> > > However, none of them check their value before dereferencing those
> > > pointers which can lead to a NULL-pointer dereference if the author
> > > isn't careful.
> > 
> > Arguably oopsing on the spot is preferrable when this can't be caused by
> > user input. It's always a mistake that should be caught early during
> > development.
> > 
> > Not everyone checks the return value of drm_connector_init and friends,
> > so those cases will lead to more mysterious bugs later. And probably
> > oopses as well.
> 
> So maybe we can do both then, with something like
> 
> if (WARN_ON(!dev))
>    return -EINVAL
> 
> if (drm_WARN_ON(dev, !connector || !funcs))
>    return -EINVAL;
> 
> I'd still like to check for this, so we can have proper testing, and we
> already check for those pointers in some places (like funcs in
> drm_connector_init), so if we don't cover everything we're inconsistent.

People will invariably cargo-cult this kind of stuff absolutely
everywhere and then all your functions will have tons of dead
code to check their arguments. I'd prefer not to go there
usually.

Should we perhaps start to use the (arguably hideous)
 - void f(struct foo *bar)
 + void f(struct foo bar[static 1])
syntax to tell the compiler we don't accept NULL pointers?

Hmm. Apparently that has the same problem as using any
other kind of array syntax in the prototype. That is,
the compiler demands to know the definition of 'struct foo'
even though we're passing in effectively a pointer. Sigh.
  
Maxime Ripard Nov. 29, 2023, 9:11 a.m. UTC | #4
Hi Ville,

On Tue, Nov 28, 2023 at 03:49:08PM +0200, Ville Syrjälä wrote:
> On Tue, Nov 28, 2023 at 02:29:40PM +0100, Maxime Ripard wrote:
> > On Tue, Nov 28, 2023 at 02:54:02PM +0200, Jani Nikula wrote:
> > > On Tue, 28 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
> > > > All the drm_connector_init variants take at least a pointer to the
> > > > device, connector and hooks implementation.
> > > >
> > > > However, none of them check their value before dereferencing those
> > > > pointers which can lead to a NULL-pointer dereference if the author
> > > > isn't careful.
> > > 
> > > Arguably oopsing on the spot is preferrable when this can't be caused by
> > > user input. It's always a mistake that should be caught early during
> > > development.
> > > 
> > > Not everyone checks the return value of drm_connector_init and friends,
> > > so those cases will lead to more mysterious bugs later. And probably
> > > oopses as well.
> > 
> > So maybe we can do both then, with something like
> > 
> > if (WARN_ON(!dev))
> >    return -EINVAL
> > 
> > if (drm_WARN_ON(dev, !connector || !funcs))
> >    return -EINVAL;
> > 
> > I'd still like to check for this, so we can have proper testing, and we
> > already check for those pointers in some places (like funcs in
> > drm_connector_init), so if we don't cover everything we're inconsistent.
> 
> People will invariably cargo-cult this kind of stuff absolutely
> everywhere and then all your functions will have tons of dead
> code to check their arguments.

And that's a bad thing because... ?

Also, are you really saying that checking that your arguments make sense
is cargo-cult?

We're already doing it in some parts of KMS, so we have to be
consistent, and the answer to "most drivers don't check the error"
cannot be "let's just give on error checking then".

> I'd prefer not to go there usually.
> 
> Should we perhaps start to use the (arguably hideous)
>  - void f(struct foo *bar)
>  + void f(struct foo bar[static 1])
> syntax to tell the compiler we don't accept NULL pointers?
> 
> Hmm. Apparently that has the same problem as using any
> other kind of array syntax in the prototype. That is,
> the compiler demands to know the definition of 'struct foo'
> even though we're passing in effectively a pointer. Sigh.

Honestly, I don't care as long as it's something we can unit-test to
make sure we make it consistent. We can't unit test a complete kernel
crash.
  
Ville Syrjälä Nov. 29, 2023, 9:18 a.m. UTC | #5
On Wed, Nov 29, 2023 at 10:11:26AM +0100, Maxime Ripard wrote:
> Hi Ville,
> 
> On Tue, Nov 28, 2023 at 03:49:08PM +0200, Ville Syrjälä wrote:
> > On Tue, Nov 28, 2023 at 02:29:40PM +0100, Maxime Ripard wrote:
> > > On Tue, Nov 28, 2023 at 02:54:02PM +0200, Jani Nikula wrote:
> > > > On Tue, 28 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
> > > > > All the drm_connector_init variants take at least a pointer to the
> > > > > device, connector and hooks implementation.
> > > > >
> > > > > However, none of them check their value before dereferencing those
> > > > > pointers which can lead to a NULL-pointer dereference if the author
> > > > > isn't careful.
> > > > 
> > > > Arguably oopsing on the spot is preferrable when this can't be caused by
> > > > user input. It's always a mistake that should be caught early during
> > > > development.
> > > > 
> > > > Not everyone checks the return value of drm_connector_init and friends,
> > > > so those cases will lead to more mysterious bugs later. And probably
> > > > oopses as well.
> > > 
> > > So maybe we can do both then, with something like
> > > 
> > > if (WARN_ON(!dev))
> > >    return -EINVAL
> > > 
> > > if (drm_WARN_ON(dev, !connector || !funcs))
> > >    return -EINVAL;
> > > 
> > > I'd still like to check for this, so we can have proper testing, and we
> > > already check for those pointers in some places (like funcs in
> > > drm_connector_init), so if we don't cover everything we're inconsistent.
> > 
> > People will invariably cargo-cult this kind of stuff absolutely
> > everywhere and then all your functions will have tons of dead
> > code to check their arguments.
> 
> And that's a bad thing because... ?
> 
> Also, are you really saying that checking that your arguments make sense
> is cargo-cult?
> 
> We're already doing it in some parts of KMS, so we have to be
> consistent, and the answer to "most drivers don't check the error"
> cannot be "let's just give on error checking then".
> 
> > I'd prefer not to go there usually.
> > 
> > Should we perhaps start to use the (arguably hideous)
> >  - void f(struct foo *bar)
> >  + void f(struct foo bar[static 1])
> > syntax to tell the compiler we don't accept NULL pointers?
> > 
> > Hmm. Apparently that has the same problem as using any
> > other kind of array syntax in the prototype. That is,
> > the compiler demands to know the definition of 'struct foo'
> > even though we're passing in effectively a pointer. Sigh.
> 
> Honestly, I don't care as long as it's something we can unit-test to
> make sure we make it consistent. We can't unit test a complete kernel
> crash.

Why do you want to put utterly broken code into a unit test?
  
Jani Nikula Nov. 29, 2023, 9:38 a.m. UTC | #6
On Wed, 29 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
> Hi Ville,
>
> On Tue, Nov 28, 2023 at 03:49:08PM +0200, Ville Syrjälä wrote:
>> On Tue, Nov 28, 2023 at 02:29:40PM +0100, Maxime Ripard wrote:
>> > On Tue, Nov 28, 2023 at 02:54:02PM +0200, Jani Nikula wrote:
>> > > On Tue, 28 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
>> > > > All the drm_connector_init variants take at least a pointer to the
>> > > > device, connector and hooks implementation.
>> > > >
>> > > > However, none of them check their value before dereferencing those
>> > > > pointers which can lead to a NULL-pointer dereference if the author
>> > > > isn't careful.
>> > > 
>> > > Arguably oopsing on the spot is preferrable when this can't be caused by
>> > > user input. It's always a mistake that should be caught early during
>> > > development.
>> > > 
>> > > Not everyone checks the return value of drm_connector_init and friends,
>> > > so those cases will lead to more mysterious bugs later. And probably
>> > > oopses as well.
>> > 
>> > So maybe we can do both then, with something like
>> > 
>> > if (WARN_ON(!dev))
>> >    return -EINVAL
>> > 
>> > if (drm_WARN_ON(dev, !connector || !funcs))
>> >    return -EINVAL;
>> > 
>> > I'd still like to check for this, so we can have proper testing, and we
>> > already check for those pointers in some places (like funcs in
>> > drm_connector_init), so if we don't cover everything we're inconsistent.
>> 
>> People will invariably cargo-cult this kind of stuff absolutely
>> everywhere and then all your functions will have tons of dead
>> code to check their arguments.
>
> And that's a bad thing because... ?
>
> Also, are you really saying that checking that your arguments make sense
> is cargo-cult?

It's a powerful thing to be able to assume a NULL argument is always a
fatal programming error on the caller's side, and should oops and get
caught immediately. It's an assertion.

We're not talking about user input or anything like that here.

If you start checking for things that can't happen, and return errors
for them, you start gracefully handling things that don't have anything
graceful about them.

Having such checks in place trains people to think they *may* happen.

While it should fail fast and loud at the developer's first smoke test,
and get fixed then and there.


BR,
Jani.


>
> We're already doing it in some parts of KMS, so we have to be
> consistent, and the answer to "most drivers don't check the error"
> cannot be "let's just give on error checking then".
>
>> I'd prefer not to go there usually.
>> 
>> Should we perhaps start to use the (arguably hideous)
>>  - void f(struct foo *bar)
>>  + void f(struct foo bar[static 1])
>> syntax to tell the compiler we don't accept NULL pointers?
>> 
>> Hmm. Apparently that has the same problem as using any
>> other kind of array syntax in the prototype. That is,
>> the compiler demands to know the definition of 'struct foo'
>> even though we're passing in effectively a pointer. Sigh.
>
> Honestly, I don't care as long as it's something we can unit-test to
> make sure we make it consistent. We can't unit test a complete kernel
> crash.
  
Pekka Paalanen Nov. 29, 2023, 10:12 a.m. UTC | #7
On Tue, 28 Nov 2023 15:49:08 +0200
Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:

> Should we perhaps start to use the (arguably hideous)
>  - void f(struct foo *bar)
>  + void f(struct foo bar[static 1])
> syntax to tell the compiler we don't accept NULL pointers?
> 
> Hmm. Apparently that has the same problem as using any
> other kind of array syntax in the prototype. That is,
> the compiler demands to know the definition of 'struct foo'
> even though we're passing in effectively a pointer. Sigh.


__attribute__((nonnull)) ?


Thanks,
pq
  
Ville Syrjälä Nov. 29, 2023, 10:25 a.m. UTC | #8
On Wed, Nov 29, 2023 at 12:12:59PM +0200, Pekka Paalanen wrote:
> On Tue, 28 Nov 2023 15:49:08 +0200
> Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> 
> > Should we perhaps start to use the (arguably hideous)
> >  - void f(struct foo *bar)
> >  + void f(struct foo bar[static 1])
> > syntax to tell the compiler we don't accept NULL pointers?
> > 
> > Hmm. Apparently that has the same problem as using any
> > other kind of array syntax in the prototype. That is,
> > the compiler demands to know the definition of 'struct foo'
> > even though we're passing in effectively a pointer. Sigh.
> 
> 
> __attribute__((nonnull)) ?

I guess that would work, though the syntax is horrible when
you need to flag specific arguments.
  
Maxime Ripard Nov. 29, 2023, 11:10 a.m. UTC | #9
On Wed, Nov 29, 2023 at 11:38:42AM +0200, Jani Nikula wrote:
> On Wed, 29 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
> > Hi Ville,
> >
> > On Tue, Nov 28, 2023 at 03:49:08PM +0200, Ville Syrjälä wrote:
> >> On Tue, Nov 28, 2023 at 02:29:40PM +0100, Maxime Ripard wrote:
> >> > On Tue, Nov 28, 2023 at 02:54:02PM +0200, Jani Nikula wrote:
> >> > > On Tue, 28 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
> >> > > > All the drm_connector_init variants take at least a pointer to the
> >> > > > device, connector and hooks implementation.
> >> > > >
> >> > > > However, none of them check their value before dereferencing those
> >> > > > pointers which can lead to a NULL-pointer dereference if the author
> >> > > > isn't careful.
> >> > > 
> >> > > Arguably oopsing on the spot is preferrable when this can't be caused by
> >> > > user input. It's always a mistake that should be caught early during
> >> > > development.
> >> > > 
> >> > > Not everyone checks the return value of drm_connector_init and friends,
> >> > > so those cases will lead to more mysterious bugs later. And probably
> >> > > oopses as well.
> >> > 
> >> > So maybe we can do both then, with something like
> >> > 
> >> > if (WARN_ON(!dev))
> >> >    return -EINVAL
> >> > 
> >> > if (drm_WARN_ON(dev, !connector || !funcs))
> >> >    return -EINVAL;
> >> > 
> >> > I'd still like to check for this, so we can have proper testing, and we
> >> > already check for those pointers in some places (like funcs in
> >> > drm_connector_init), so if we don't cover everything we're inconsistent.
> >> 
> >> People will invariably cargo-cult this kind of stuff absolutely
> >> everywhere and then all your functions will have tons of dead
> >> code to check their arguments.
> >
> > And that's a bad thing because... ?
> >
> > Also, are you really saying that checking that your arguments make sense
> > is cargo-cult?
> 
> It's a powerful thing to be able to assume a NULL argument is always a
> fatal programming error on the caller's side, and should oops and get
> caught immediately. It's an assertion.

Yeah, but we're not really doing that either. We have no explicit
assertion anywhere. We take a pointer in, and just hope that it will be
dereferenced later on and that the kernel will crash. The pointer to the
functions especially is only deferenced very later on.

And assertions might be powerful, but being able to notice errors and
debug them is too. A panic takes away basically any remote access to
debug. If you don't have a console, you're done.

> We're not talking about user input or anything like that here.
> 
> If you start checking for things that can't happen, and return errors
> for them, you start gracefully handling things that don't have anything
> graceful about them.

But there's nothing graceful to do here: you just return from your probe
function that you couldn't probe and that's it. Just like you do when
you can't map your registers, or get your interrupt, or register into
any framework (including drm_dev_register that pretty much every driver
handles properly if it returns an error, without being graceful about
it).

> Having such checks in place trains people to think they *may* happen.

In most cases, kmalloc can't fail. We seem to have a very different
policy towards it.

> While it should fail fast and loud at the developer's first smoke test,
> and get fixed then and there.

Returning an error + a warning also qualifies for "fail fast and loud".
But keeps the system alive for someone to notice in any case.

Maxime
  
Jani Nikula Nov. 29, 2023, 11:40 a.m. UTC | #10
On Wed, 29 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
> On Wed, Nov 29, 2023 at 11:38:42AM +0200, Jani Nikula wrote:
>> On Wed, 29 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
>> > Hi Ville,
>> >
>> > On Tue, Nov 28, 2023 at 03:49:08PM +0200, Ville Syrjälä wrote:
>> >> On Tue, Nov 28, 2023 at 02:29:40PM +0100, Maxime Ripard wrote:
>> >> > On Tue, Nov 28, 2023 at 02:54:02PM +0200, Jani Nikula wrote:
>> >> > > On Tue, 28 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
>> >> > > > All the drm_connector_init variants take at least a pointer to the
>> >> > > > device, connector and hooks implementation.
>> >> > > >
>> >> > > > However, none of them check their value before dereferencing those
>> >> > > > pointers which can lead to a NULL-pointer dereference if the author
>> >> > > > isn't careful.
>> >> > > 
>> >> > > Arguably oopsing on the spot is preferrable when this can't be caused by
>> >> > > user input. It's always a mistake that should be caught early during
>> >> > > development.
>> >> > > 
>> >> > > Not everyone checks the return value of drm_connector_init and friends,
>> >> > > so those cases will lead to more mysterious bugs later. And probably
>> >> > > oopses as well.
>> >> > 
>> >> > So maybe we can do both then, with something like
>> >> > 
>> >> > if (WARN_ON(!dev))
>> >> >    return -EINVAL
>> >> > 
>> >> > if (drm_WARN_ON(dev, !connector || !funcs))
>> >> >    return -EINVAL;
>> >> > 
>> >> > I'd still like to check for this, so we can have proper testing, and we
>> >> > already check for those pointers in some places (like funcs in
>> >> > drm_connector_init), so if we don't cover everything we're inconsistent.
>> >> 
>> >> People will invariably cargo-cult this kind of stuff absolutely
>> >> everywhere and then all your functions will have tons of dead
>> >> code to check their arguments.
>> >
>> > And that's a bad thing because... ?
>> >
>> > Also, are you really saying that checking that your arguments make sense
>> > is cargo-cult?
>> 
>> It's a powerful thing to be able to assume a NULL argument is always a
>> fatal programming error on the caller's side, and should oops and get
>> caught immediately. It's an assertion.
>
> Yeah, but we're not really doing that either. We have no explicit
> assertion anywhere. We take a pointer in, and just hope that it will be
> dereferenced later on and that the kernel will crash. The pointer to the
> functions especially is only deferenced very later on.
>
> And assertions might be powerful, but being able to notice errors and
> debug them is too. A panic takes away basically any remote access to
> debug. If you don't have a console, you're done.
>
>> We're not talking about user input or anything like that here.
>> 
>> If you start checking for things that can't happen, and return errors
>> for them, you start gracefully handling things that don't have anything
>> graceful about them.
>
> But there's nothing graceful to do here: you just return from your probe
> function that you couldn't probe and that's it. Just like you do when
> you can't map your registers, or get your interrupt, or register into
> any framework (including drm_dev_register that pretty much every driver
> handles properly if it returns an error, without being graceful about
> it).

Those are all dynamic things that can fail.

Quite different from passing NULL dev, connector, or funcs to
drm_connector_init() and friends.

I think it's wrong to set the example that everything needs to be
checked, everything needs to return an error, every call needs to check
for error return, all the time, everywhere. People absolutely will cargo
cult that, and that's what Ville is referring to.

If you pass NULL dev, connector, or funcs to drm_connector_init() I
think you absolutely deserve to get an oops.

For dev, you could possibly not have reached the function with NULL
dev. (And __drm_connector_init() has dev->mode_config before the check,
so you'll get a static analyzer warning about dereference before the
check.) If you have NULL connector, you didn't check for allocation
failure earlier. If you have NULL funcs, you just passed NULL, because
it's generally supposed to be a pointer to a static const struct.

>> Having such checks in place trains people to think they *may* happen.
>
> In most cases, kmalloc can't fail. We seem to have a very different
> policy towards it.

Again, dynamic in nature and can fail.

>> While it should fail fast and loud at the developer's first smoke test,
>> and get fixed then and there.
>
> Returning an error + a warning also qualifies for "fail fast and loud".
> But keeps the system alive for someone to notice in any case.

But where do you draw the line? If we keep adding these checks to things
that actually can't happen, we teach developers we need to check for
impossible things. And we teach them not to trust anything.

I scroll down the file and reach
drm_connector_attach_edid_property(). Should we NULL check connector?
Should we change the function to int and return a value? Should the
caller check the value? Then there's drm_connector_attach_encoder(). And
drm_connector_has_possible_encoder(). And so on and so forth.

Where do you draw the line?


BR,
Jani.
  
Maxime Ripard Nov. 29, 2023, 1:26 p.m. UTC | #11
On Wed, Nov 29, 2023 at 01:40:38PM +0200, Jani Nikula wrote:
> On Wed, 29 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
> > On Wed, Nov 29, 2023 at 11:38:42AM +0200, Jani Nikula wrote:
> >> On Wed, 29 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
> >> > Hi Ville,
> >> >
> >> > On Tue, Nov 28, 2023 at 03:49:08PM +0200, Ville Syrjälä wrote:
> >> >> On Tue, Nov 28, 2023 at 02:29:40PM +0100, Maxime Ripard wrote:
> >> >> > On Tue, Nov 28, 2023 at 02:54:02PM +0200, Jani Nikula wrote:
> >> >> > > On Tue, 28 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
> >> >> > > > All the drm_connector_init variants take at least a pointer to the
> >> >> > > > device, connector and hooks implementation.
> >> >> > > >
> >> >> > > > However, none of them check their value before dereferencing those
> >> >> > > > pointers which can lead to a NULL-pointer dereference if the author
> >> >> > > > isn't careful.
> >> >> > > 
> >> >> > > Arguably oopsing on the spot is preferrable when this can't be caused by
> >> >> > > user input. It's always a mistake that should be caught early during
> >> >> > > development.
> >> >> > > 
> >> >> > > Not everyone checks the return value of drm_connector_init and friends,
> >> >> > > so those cases will lead to more mysterious bugs later. And probably
> >> >> > > oopses as well.
> >> >> > 
> >> >> > So maybe we can do both then, with something like
> >> >> > 
> >> >> > if (WARN_ON(!dev))
> >> >> >    return -EINVAL
> >> >> > 
> >> >> > if (drm_WARN_ON(dev, !connector || !funcs))
> >> >> >    return -EINVAL;
> >> >> > 
> >> >> > I'd still like to check for this, so we can have proper testing, and we
> >> >> > already check for those pointers in some places (like funcs in
> >> >> > drm_connector_init), so if we don't cover everything we're inconsistent.
> >> >> 
> >> >> People will invariably cargo-cult this kind of stuff absolutely
> >> >> everywhere and then all your functions will have tons of dead
> >> >> code to check their arguments.
> >> >
> >> > And that's a bad thing because... ?
> >> >
> >> > Also, are you really saying that checking that your arguments make sense
> >> > is cargo-cult?
> >> 
> >> It's a powerful thing to be able to assume a NULL argument is always a
> >> fatal programming error on the caller's side, and should oops and get
> >> caught immediately. It's an assertion.
> >
> > Yeah, but we're not really doing that either. We have no explicit
> > assertion anywhere. We take a pointer in, and just hope that it will be
> > dereferenced later on and that the kernel will crash. The pointer to the
> > functions especially is only deferenced very later on.
> >
> > And assertions might be powerful, but being able to notice errors and
> > debug them is too. A panic takes away basically any remote access to
> > debug. If you don't have a console, you're done.
> >
> >> We're not talking about user input or anything like that here.
> >> 
> >> If you start checking for things that can't happen, and return errors
> >> for them, you start gracefully handling things that don't have anything
> >> graceful about them.
> >
> > But there's nothing graceful to do here: you just return from your probe
> > function that you couldn't probe and that's it. Just like you do when
> > you can't map your registers, or get your interrupt, or register into
> > any framework (including drm_dev_register that pretty much every driver
> > handles properly if it returns an error, without being graceful about
> > it).
> 
> Those are all dynamic things that can fail.
> 
> Quite different from passing NULL dev, connector, or funcs to
> drm_connector_init() and friends.
> 
> I think it's wrong to set the example that everything needs to be
> checked, everything needs to return an error, every call needs to check
> for error return, all the time, everywhere. People absolutely will cargo
> cult that, and that's what Ville is referring to.
> 
> If you pass NULL dev, connector, or funcs to drm_connector_init() I
> think you absolutely deserve to get an oops.
> 
> For dev, you could possibly not have reached the function with NULL
> dev. (And __drm_connector_init() has dev->mode_config before the check,
> so you'll get a static analyzer warning about dereference before the
> check.) If you have NULL connector, you didn't check for allocation
> failure earlier. If you have NULL funcs, you just passed NULL, because
> it's generally supposed to be a pointer to a static const struct.
> 
> >> Having such checks in place trains people to think they *may* happen.
> >
> > In most cases, kmalloc can't fail. We seem to have a very different
> > policy towards it.
> 
> Again, dynamic in nature and can fail.
> 
> >> While it should fail fast and loud at the developer's first smoke test,
> >> and get fixed then and there.
> >
> > Returning an error + a warning also qualifies for "fail fast and loud".
> > But keeps the system alive for someone to notice in any case.
> 
> But where do you draw the line?

This also applies to static things then.
drm_connector_attach_scaling_mode_property() or
drm_mode_create_colorspace_property() (or plenty of others) will check
on the value of the supported scaling modes colorspaces, even though
they are static.

It looks like we have that policy of "just assert and roll with it" for
pointers, but not for other static values passed to those initialization
functions.

> If we keep adding these checks to things that actually can't happen,
> we teach developers we need to check for impossible things. And we
> teach them not to trust anything.

Well, I certainly don't trust drivers to get things right.

> I scroll down the file and reach drm_connector_attach_edid_property().
> Should we NULL check connector? Should we change the function to int
> and return a value? Should the caller check the value? Then there's
> drm_connector_attach_encoder(). And
> drm_connector_has_possible_encoder(). And so on and so forth.
> 
> Where do you draw the line?

If things can fail, we should expect the caller to handle the failure
somehow. The documentation of drm_connector_attach_encoder() states that
it can fail, so we should expect it.
drm_connector_has_possible_encoder() doesn't so we can assume it can't
fail.

If the function can fail but wasn't designed or documented as such, then
it's on the function. If it was but the caller didn't handle the error
case, then that's on the caller.

Maxime
  
Maxime Ripard Dec. 1, 2023, 9:01 a.m. UTC | #12
On Wed, Nov 29, 2023 at 11:18:24AM +0200, Ville Syrjälä wrote:
> On Wed, Nov 29, 2023 at 10:11:26AM +0100, Maxime Ripard wrote:
> > Hi Ville,
> > 
> > On Tue, Nov 28, 2023 at 03:49:08PM +0200, Ville Syrjälä wrote:
> > > On Tue, Nov 28, 2023 at 02:29:40PM +0100, Maxime Ripard wrote:
> > > > On Tue, Nov 28, 2023 at 02:54:02PM +0200, Jani Nikula wrote:
> > > > > On Tue, 28 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
> > > > > > All the drm_connector_init variants take at least a pointer to the
> > > > > > device, connector and hooks implementation.
> > > > > >
> > > > > > However, none of them check their value before dereferencing those
> > > > > > pointers which can lead to a NULL-pointer dereference if the author
> > > > > > isn't careful.
> > > > > 
> > > > > Arguably oopsing on the spot is preferrable when this can't be caused by
> > > > > user input. It's always a mistake that should be caught early during
> > > > > development.
> > > > > 
> > > > > Not everyone checks the return value of drm_connector_init and friends,
> > > > > so those cases will lead to more mysterious bugs later. And probably
> > > > > oopses as well.
> > > > 
> > > > So maybe we can do both then, with something like
> > > > 
> > > > if (WARN_ON(!dev))
> > > >    return -EINVAL
> > > > 
> > > > if (drm_WARN_ON(dev, !connector || !funcs))
> > > >    return -EINVAL;
> > > > 
> > > > I'd still like to check for this, so we can have proper testing, and we
> > > > already check for those pointers in some places (like funcs in
> > > > drm_connector_init), so if we don't cover everything we're inconsistent.
> > > 
> > > People will invariably cargo-cult this kind of stuff absolutely
> > > everywhere and then all your functions will have tons of dead
> > > code to check their arguments.
> > 
> > And that's a bad thing because... ?
> > 
> > Also, are you really saying that checking that your arguments make sense
> > is cargo-cult?
> > 
> > We're already doing it in some parts of KMS, so we have to be
> > consistent, and the answer to "most drivers don't check the error"
> > cannot be "let's just give on error checking then".
> > 
> > > I'd prefer not to go there usually.
> > > 
> > > Should we perhaps start to use the (arguably hideous)
> > >  - void f(struct foo *bar)
> > >  + void f(struct foo bar[static 1])
> > > syntax to tell the compiler we don't accept NULL pointers?
> > > 
> > > Hmm. Apparently that has the same problem as using any
> > > other kind of array syntax in the prototype. That is,
> > > the compiler demands to know the definition of 'struct foo'
> > > even though we're passing in effectively a pointer. Sigh.
> > 
> > Honestly, I don't care as long as it's something we can unit-test to
> > make sure we make it consistent. We can't unit test a complete kernel
> > crash.
> 
> Why do you want to put utterly broken code into a unit test?

Utterly broken code happens. It probably shouldn't, but here we are.

Anyway, you mostly missed the consistent part.

The current state with it is:

  - planes:
    - drm_universal_plane_init warns if funcs->destroy NULL
    - drm_universal_plane_alloc errors out if funcs is NULL
    - drmm_universal_plane_alloc warns and errors out if funcs or funcs->destroy are NULL

  - CRTC:
    - drm_crtc_init_with_planes warns if funcs->destroy NULL
    - drmm_crtc_init_with_planes warns if funcs or funcs->destroy are NULL
    - drmm_crtc_alloc_with_planes warns and errors out if funcs or funcs->destroy are NULL

  - encoder:
    - drm_encoder_init warns if funcs->destroy NULL
    - drmm_encoder_init warns and errors out if funcs or funcs->destroy are NULL
    - drmm_encoder_alloc warns and errors out if funcs or funcs->destroy are NULL

  - connectors:
    - drm_connector_init warns and errors out if funcs or funcs->destroy are NULL
    - drm_connector_init_with_ddc warns and errors out if funcs or funcs->destroy are NULL
    - drmm_connector_init warns and errors out if funcs or funcs->destroy are NULL

I think that just proves that your opinion is just not as clear cut as
you'd like it to be, and it's far from being the policy you claim it is.

Plus, we're not even remotely consistent there, and we're not
documenting that anywhere.

And we have plenty of other examples of static stuff being checked
because it just makes sense. All variants of drm_crtc_init_with_planes
will for example check that the correct plane type is associated to the
primary and cursor planes.

We should fix that.

Maxime
  
Ville Syrjälä Dec. 1, 2023, 3:15 p.m. UTC | #13
On Fri, Dec 01, 2023 at 10:01:49AM +0100, Maxime Ripard wrote:
> On Wed, Nov 29, 2023 at 11:18:24AM +0200, Ville Syrjälä wrote:
> > On Wed, Nov 29, 2023 at 10:11:26AM +0100, Maxime Ripard wrote:
> > > Hi Ville,
> > > 
> > > On Tue, Nov 28, 2023 at 03:49:08PM +0200, Ville Syrjälä wrote:
> > > > On Tue, Nov 28, 2023 at 02:29:40PM +0100, Maxime Ripard wrote:
> > > > > On Tue, Nov 28, 2023 at 02:54:02PM +0200, Jani Nikula wrote:
> > > > > > On Tue, 28 Nov 2023, Maxime Ripard <mripard@kernel.org> wrote:
> > > > > > > All the drm_connector_init variants take at least a pointer to the
> > > > > > > device, connector and hooks implementation.
> > > > > > >
> > > > > > > However, none of them check their value before dereferencing those
> > > > > > > pointers which can lead to a NULL-pointer dereference if the author
> > > > > > > isn't careful.
> > > > > > 
> > > > > > Arguably oopsing on the spot is preferrable when this can't be caused by
> > > > > > user input. It's always a mistake that should be caught early during
> > > > > > development.
> > > > > > 
> > > > > > Not everyone checks the return value of drm_connector_init and friends,
> > > > > > so those cases will lead to more mysterious bugs later. And probably
> > > > > > oopses as well.
> > > > > 
> > > > > So maybe we can do both then, with something like
> > > > > 
> > > > > if (WARN_ON(!dev))
> > > > >    return -EINVAL
> > > > > 
> > > > > if (drm_WARN_ON(dev, !connector || !funcs))
> > > > >    return -EINVAL;
> > > > > 
> > > > > I'd still like to check for this, so we can have proper testing, and we
> > > > > already check for those pointers in some places (like funcs in
> > > > > drm_connector_init), so if we don't cover everything we're inconsistent.
> > > > 
> > > > People will invariably cargo-cult this kind of stuff absolutely
> > > > everywhere and then all your functions will have tons of dead
> > > > code to check their arguments.
> > > 
> > > And that's a bad thing because... ?
> > > 
> > > Also, are you really saying that checking that your arguments make sense
> > > is cargo-cult?
> > > 
> > > We're already doing it in some parts of KMS, so we have to be
> > > consistent, and the answer to "most drivers don't check the error"
> > > cannot be "let's just give on error checking then".
> > > 
> > > > I'd prefer not to go there usually.
> > > > 
> > > > Should we perhaps start to use the (arguably hideous)
> > > >  - void f(struct foo *bar)
> > > >  + void f(struct foo bar[static 1])
> > > > syntax to tell the compiler we don't accept NULL pointers?
> > > > 
> > > > Hmm. Apparently that has the same problem as using any
> > > > other kind of array syntax in the prototype. That is,
> > > > the compiler demands to know the definition of 'struct foo'
> > > > even though we're passing in effectively a pointer. Sigh.
> > > 
> > > Honestly, I don't care as long as it's something we can unit-test to
> > > make sure we make it consistent. We can't unit test a complete kernel
> > > crash.
> > 
> > Why do you want to put utterly broken code into a unit test?
> 
> Utterly broken code happens. It probably shouldn't, but here we are.
> 
> Anyway, you mostly missed the consistent part.
> 
> The current state with it is:
> 
>   - planes:
>     - drm_universal_plane_init warns if funcs->destroy NULL
>     - drm_universal_plane_alloc errors out if funcs is NULL
>     - drmm_universal_plane_alloc warns and errors out if funcs or funcs->destroy are NULL
> 
>   - CRTC:
>     - drm_crtc_init_with_planes warns if funcs->destroy NULL
>     - drmm_crtc_init_with_planes warns if funcs or funcs->destroy are NULL
>     - drmm_crtc_alloc_with_planes warns and errors out if funcs or funcs->destroy are NULL
> 
>   - encoder:
>     - drm_encoder_init warns if funcs->destroy NULL
>     - drmm_encoder_init warns and errors out if funcs or funcs->destroy are NULL
>     - drmm_encoder_alloc warns and errors out if funcs or funcs->destroy are NULL
> 
>   - connectors:
>     - drm_connector_init warns and errors out if funcs or funcs->destroy are NULL
>     - drm_connector_init_with_ddc warns and errors out if funcs or funcs->destroy are NULL
>     - drmm_connector_init warns and errors out if funcs or funcs->destroy are NULL

Those are perhaps fine, as things may not oops immediately if you get
that wrong. But NULL checking things that will anyway oops immediately,
(and there's not chance the thing will do anything useful if you prevent
the oops with a NULL check) doesn't make any sense to me. It just makes
you doubt the entire universe when you see code like that.

> 
> I think that just proves that your opinion is just not as clear cut as
> you'd like it to be, and it's far from being the policy you claim it is.
> 
> Plus, we're not even remotely consistent there, and we're not
> documenting that anywhere.
> 
> And we have plenty of other examples of static stuff being checked
> because it just makes sense. All variants of drm_crtc_init_with_planes
> will for example check that the correct plane type is associated to the
> primary and cursor planes.
> 
> We should fix that.

Ideally the compiler should catch all of this. But given we are
implementing this in C (and even C23 constexpr was neutered beyond
uselessness) that's not really possible in any nice way :(
  
Ville Syrjälä Dec. 1, 2023, 3:17 p.m. UTC | #14
On Wed, Nov 29, 2023 at 12:25:37PM +0200, Ville Syrjälä wrote:
> On Wed, Nov 29, 2023 at 12:12:59PM +0200, Pekka Paalanen wrote:
> > On Tue, 28 Nov 2023 15:49:08 +0200
> > Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> > 
> > > Should we perhaps start to use the (arguably hideous)
> > >  - void f(struct foo *bar)
> > >  + void f(struct foo bar[static 1])
> > > syntax to tell the compiler we don't accept NULL pointers?
> > > 
> > > Hmm. Apparently that has the same problem as using any
> > > other kind of array syntax in the prototype. That is,
> > > the compiler demands to know the definition of 'struct foo'
> > > even though we're passing in effectively a pointer. Sigh.
> > 
> > 
> > __attribute__((nonnull)) ?
> 
> I guess that would work, though the syntax is horrible when
> you need to flag specific arguments.

I played around with this a bit (blindly cocci'd tons of
drm and i915 function declarations with the nonnull attribute)
and it's somewhat underwhelming unfortunately.

It will trip only if the compiler is 100% sure you're passing
in a NULL. There is no way to eg. tell the compiler that a
function can return a NULL and thus anything coming from it
should be checked by the caller before passing it on to
something with the nonnull attribute. And I suppose error
pointers would also screw that idea over anyway.

Additionally the NULL device checks being being done in 
the drm_err/dbg macros trip this up left right and center.
And hiding that check inside a function (instead of having
it in the macro) is also ruined by the fact that we apparently
pass different types of pointers to these macros :( Generics
could be used to sort out that type mess I suppose, or the
code that passes the wrong type (DSI code at least) should
just be changed to not do that. But not sure if there's enough
benefit to warrant the work.
  

Patch

diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
index b0516505f7ae..2f60755dccdd 100644
--- a/drivers/gpu/drm/drm_connector.c
+++ b/drivers/gpu/drm/drm_connector.c
@@ -229,6 +229,9 @@  static int __drm_connector_init(struct drm_device *dev,
 	struct ida *connector_ida =
 		&drm_connector_enum_list[connector_type].ida;
 
+	if (!dev || !connector || !funcs)
+		return -EINVAL;
+
 	WARN_ON(drm_drv_uses_atomic_modeset(dev) &&
 		(!funcs->atomic_destroy_state ||
 		 !funcs->atomic_duplicate_state));