HID: logitech-hidpp: rework one more time the retries attempts

Message ID 20230621-logitech-fixes-v1-1-32e70933c0b0@redhat.com
State New
Series HID: logitech-hidpp: rework one more time the retries attempts

Commit Message

Benjamin Tissoires June 21, 2023, 9:42 a.m. UTC
Make the code look less like Pascal.

Extract the internal code inside a helper function, fix the
initialization of the parameters used in the helper function
(`hidpp->answer_available` was not reset and `*response` wasn't either),
and use a `do {...} while();` loop.

Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
---
as requested by https://lore.kernel.org/all/CAHk-=wiMbF38KCNhPFiargenpSBoecSXTLQACKS2UMyo_Vu2ww@mail.gmail.com/
This is a rewrite of that particular piece of code.
---
 drivers/hid/hid-logitech-hidpp.c | 102 +++++++++++++++++++++++----------------
 1 file changed, 61 insertions(+), 41 deletions(-)
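
As an illustration of the initialization fix described in the commit
message, here is a minimal userspace sketch, not the driver code. It
assumes a fake transport that stores its reply synchronously; every
`demo_*` name and the `silent` fault flag are hypothetical. The point is
that each attempt resets the "answer available" flag and the response
buffer before sending, so a flag left over from an earlier attempt cannot
be mistaken for a fresh reply.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical report layout, for illustration only. */
struct demo_report {
	unsigned char id;
	unsigned char payload[3];
};

struct demo_dev {
	bool answer_available;        /* set when a reply has been stored */
	struct demo_report *recv_buf; /* where the reply is stored */
	bool silent;                  /* injected fault: device never replies */
};

/* Pretend transport: stores a reply unless the device is silent. */
static void demo_transport(struct demo_dev *dev, const struct demo_report *msg)
{
	if (dev->silent)
		return;
	dev->recv_buf->id = msg->id;
	memset(dev->recv_buf->payload, 0xAA, sizeof(dev->recv_buf->payload));
	dev->answer_available = true;
}

/* One attempt: reset the per-attempt state first, then send and check. */
static int demo_attempt(struct demo_dev *dev, const struct demo_report *msg,
			struct demo_report *resp)
{
	dev->recv_buf = resp;
	dev->answer_available = false; /* a stale "true" would fake a reply */
	*resp = *msg;                  /* start from a known response */

	demo_transport(dev, msg);
	if (!dev->answer_available) {
		/* No reply: hand back a zeroed response, as the patch does. */
		memset(resp, 0, sizeof(*resp));
		return -ETIMEDOUT;
	}
	return 0;
}
```

Calling `demo_attempt()` on a device whose flag is still set from an
earlier exchange, and which then stays silent, reports the timeout rather
than returning the old answer.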


---
base-commit: b98ec211af5508457e2b1c4cc99373630a83fa81
change-id: 20230621-logitech-fixes-a4c0e66ea2ad

Best regards,
  

Comments

Greg KH June 21, 2023, 10:50 a.m. UTC | #1
On Wed, Jun 21, 2023 at 11:42:30AM +0200, Benjamin Tissoires wrote:
> Make the code looks less like Pascal.
> 
> Extract the internal code inside a helper function, fix the
> initialization of the parameters used in the helper function
> (`hidpp->answer_available` was not reset and `*response` wasn't too),
> and use a `do {...} while();` loop.
> 
> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
> Cc: stable@vger.kernel.org
> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> ---
> as requested by https://lore.kernel.org/all/CAHk-=wiMbF38KCNhPFiargenpSBoecSXTLQACKS2UMyo_Vu2ww@mail.gmail.com/
> This is a rewrite of that particular piece of code.
> ---
>  drivers/hid/hid-logitech-hidpp.c | 102 +++++++++++++++++++++++----------------
>  1 file changed, 61 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c
> index dfe8e09a18de..3d1ffe199f08 100644
> --- a/drivers/hid/hid-logitech-hidpp.c
> +++ b/drivers/hid/hid-logitech-hidpp.c
> @@ -275,21 +275,20 @@ static int __hidpp_send_report(struct hid_device *hdev,
>  }
>  
>  /*
> - * hidpp_send_message_sync() returns 0 in case of success, and something else
> - * in case of a failure.
> - * - If ' something else' is positive, that means that an error has been raised
> - *   by the protocol itself.
> - * - If ' something else' is negative, that means that we had a classic error
> - *   (-ENOMEM, -EPIPE, etc...)
> + * Effectively send the message to the device, waiting for its answer.
> + *
> + * Must be called with hidpp->send_mutex locked
> + *
> + * Same return protocol than hidpp_send_message_sync():
> + * - success on 0
> + * - negative error means transport error
> + * - positive value means protocol error
>   */
> -static int hidpp_send_message_sync(struct hidpp_device *hidpp,
> +static int __do_hidpp_send_message_sync(struct hidpp_device *hidpp,
>  	struct hidpp_report *message,
>  	struct hidpp_report *response)

__must_hold(&hidpp->send_mutex)  ?
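
For readers unfamiliar with the annotation, here is a minimal userspace
sketch of what `__must_hold()` expresses. It assumes the kernel's
convention that the macro expands to a sparse context attribute under
`__CHECKER__` and to nothing otherwise; the `demo_*` lock and device types
are hypothetical stand-ins, and the run-time `assert()` is an addition for
the sketch rather than something the annotation does by itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-in for the kernel annotation: only sparse (__CHECKER__) sees a
 * context attribute; a regular compiler sees an empty macro.
 */
#ifdef __CHECKER__
# define __must_hold(x) __attribute__((context(x, 1, 1)))
#else
# define __must_hold(x)
#endif

/* Hypothetical lock and device, for illustration only. */
struct demo_mutex {
	bool locked;
};

struct demo_device {
	struct demo_mutex send_mutex;
	int sent;
};

static void demo_lock(struct demo_mutex *m)
{
	m->locked = true;
}

static void demo_unlock(struct demo_mutex *m)
{
	m->locked = false;
}

/* Helper: documents (and checks at run time) that the caller holds the lock. */
static int demo_send_locked(struct demo_device *dev)
	__must_hold(&dev->send_mutex)
{
	assert(dev->send_mutex.locked);
	dev->sent++;
	return 0;
}

/* Public entry point: takes the lock, calls the helper, drops the lock. */
static int demo_send(struct demo_device *dev)
{
	int ret;

	demo_lock(&dev->send_mutex);
	ret = demo_send_locked(dev);
	demo_unlock(&dev->send_mutex);
	return ret;
}
```

The split mirrors the patch: the locked helper does the work, and the
wrapper owns the lock for the whole call.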
  
Benjamin Tissoires June 23, 2023, 8:37 a.m. UTC | #2
On Jun 21 2023, Greg KH wrote:
> 
> On Wed, Jun 21, 2023 at 11:42:30AM +0200, Benjamin Tissoires wrote:
> > Make the code looks less like Pascal.
> > 
> > Extract the internal code inside a helper function, fix the
> > initialization of the parameters used in the helper function
> > (`hidpp->answer_available` was not reset and `*response` wasn't too),
> > and use a `do {...} while();` loop.
> > 
> > Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> > ---
> > as requested by https://lore.kernel.org/all/CAHk-=wiMbF38KCNhPFiargenpSBoecSXTLQACKS2UMyo_Vu2ww@mail.gmail.com/
> > This is a rewrite of that particular piece of code.
> > ---
> >  drivers/hid/hid-logitech-hidpp.c | 102 +++++++++++++++++++++++----------------
> >  1 file changed, 61 insertions(+), 41 deletions(-)
> > 
> > diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c
> > index dfe8e09a18de..3d1ffe199f08 100644
> > --- a/drivers/hid/hid-logitech-hidpp.c
> > +++ b/drivers/hid/hid-logitech-hidpp.c
> > @@ -275,21 +275,20 @@ static int __hidpp_send_report(struct hid_device *hdev,
> >  }
> >  
> >  /*
> > - * hidpp_send_message_sync() returns 0 in case of success, and something else
> > - * in case of a failure.
> > - * - If ' something else' is positive, that means that an error has been raised
> > - *   by the protocol itself.
> > - * - If ' something else' is negative, that means that we had a classic error
> > - *   (-ENOMEM, -EPIPE, etc...)
> > + * Effectively send the message to the device, waiting for its answer.
> > + *
> > + * Must be called with hidpp->send_mutex locked
> > + *
> > + * Same return protocol than hidpp_send_message_sync():
> > + * - success on 0
> > + * - negative error means transport error
> > + * - positive value means protocol error
> >   */
> > -static int hidpp_send_message_sync(struct hidpp_device *hidpp,
> > +static int __do_hidpp_send_message_sync(struct hidpp_device *hidpp,
> >  	struct hidpp_report *message,
> >  	struct hidpp_report *response)
> 
> __must_hold(&hidpp->send_mutex)  ?
> 

Good point. I'll add this in v2.

I'm still waiting for some feedback from the people who participated in
the original BZ, but the new bug is harder to reproduce. Anyway, there
is no rush IMO.

Cheers,
Benjamin
  
Bastien Nocera June 25, 2023, 8:29 a.m. UTC | #3
On Wed, 2023-06-21 at 11:42 +0200, Benjamin Tissoires wrote:
> Make the code looks less like Pascal.

Honestly, while writing this in jest in an email is fine, putting
it in the commit message is quite insulting.

The "retry" patch tried to fix real-world problems by making minimal
code changes, e.g. avoiding the review problem that the present patch
has, and even then, all of us missed the logic bug.

I also haven't written any Pascal code since 1996.

> Extract the internal code inside a helper function, fix the
> initialization of the parameters used in the helper function
> (`hidpp->answer_available` was not reset and `*response` wasn't too),

"wasn't either".

> and use a `do {...} while();` loop.
> 
> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device
> is busy")
> Cc: stable@vger.kernel.org
> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> ---
> as requested by
> https://lore.kernel.org/all/CAHk-=wiMbF38KCNhPFiargenpSBoecSXTLQACKS2UMyo_Vu2ww@mail.gmail.com/
> This is a rewrite of that particular piece of code.
> ---
>  drivers/hid/hid-logitech-hidpp.c | 102 +++++++++++++++++++++++------
> ----------
>  1 file changed, 61 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-
> logitech-hidpp.c
> index dfe8e09a18de..3d1ffe199f08 100644
> --- a/drivers/hid/hid-logitech-hidpp.c
> +++ b/drivers/hid/hid-logitech-hidpp.c
> @@ -275,21 +275,20 @@ static int __hidpp_send_report(struct
> hid_device *hdev,
>  }
>  
>  /*
> - * hidpp_send_message_sync() returns 0 in case of success, and
> something else
> - * in case of a failure.
> - * - If ' something else' is positive, that means that an error has
> been raised
> - *   by the protocol itself.
> - * - If ' something else' is negative, that means that we had a
> classic error
> - *   (-ENOMEM, -EPIPE, etc...)
> + * Effectively send the message to the device, waiting for its
> answer.
> + *
> + * Must be called with hidpp->send_mutex locked
> + *
> + * Same return protocol than hidpp_send_message_sync():
> + * - success on 0
> + * - negative error means transport error
> + * - positive value means protocol error
>   */
> -static int hidpp_send_message_sync(struct hidpp_device *hidpp,
> +static int __do_hidpp_send_message_sync(struct hidpp_device *hidpp,
>         struct hidpp_report *message,
>         struct hidpp_report *response)
>  {
> -       int ret = -1;
> -       int max_retries = 3;
> -
> -       mutex_lock(&hidpp->send_mutex);
> +       int ret;
>  
>         hidpp->send_receive_buf = response;
>         hidpp->answer_available = false;
> @@ -300,41 +299,62 @@ static int hidpp_send_message_sync(struct
> hidpp_device *hidpp,
>          */
>         *response = *message;
>  
> -       for (; max_retries != 0 && ret; max_retries--) {
> -               ret = __hidpp_send_report(hidpp->hid_dev, message);
> +       ret = __hidpp_send_report(hidpp->hid_dev, message);
> +       if (ret) {
> +               dbg_hid("__hidpp_send_report returned err: %d\n",
> ret);
> +               memset(response, 0, sizeof(struct hidpp_report));
> +               return ret;
> +       }
>  
> -               if (ret) {
> -                       dbg_hid("__hidpp_send_report returned err:
> %d\n", ret);
> -                       memset(response, 0, sizeof(struct
> hidpp_report));
> -                       break;
> -               }
> +       if (!wait_event_timeout(hidpp->wait, hidpp->answer_available,
> +                               5*HZ)) {
> +               dbg_hid("%s:timeout waiting for response\n",
> __func__);
> +               memset(response, 0, sizeof(struct hidpp_report));
> +               return -ETIMEDOUT;
> +       }
>  
> -               if (!wait_event_timeout(hidpp->wait, hidpp-
> >answer_available,
> -                                       5*HZ)) {
> -                       dbg_hid("%s:timeout waiting for response\n",
> __func__);
> -                       memset(response, 0, sizeof(struct
> hidpp_report));
> -                       ret = -ETIMEDOUT;
> -                       break;
> -               }
> +       if (response->report_id == REPORT_ID_HIDPP_SHORT &&
> +           response->rap.sub_id == HIDPP_ERROR) {
> +               ret = response->rap.params[1];
> +               dbg_hid("%s:got hidpp error %02X\n", __func__, ret);
> +               return ret;
> +       }
>  
> -               if (response->report_id == REPORT_ID_HIDPP_SHORT &&
> -                   response->rap.sub_id == HIDPP_ERROR) {
> -                       ret = response->rap.params[1];
> -                       dbg_hid("%s:got hidpp error %02X\n",
> __func__, ret);
> +       if ((response->report_id == REPORT_ID_HIDPP_LONG ||
> +            response->report_id == REPORT_ID_HIDPP_VERY_LONG) &&
> +           response->fap.feature_index == HIDPP20_ERROR) {
> +               ret = response->fap.params[1];
> +               dbg_hid("%s:got hidpp 2.0 error %02X\n", __func__,
> ret);
> +               return ret;
> +       }
> +
> +       return 0;
> +}
> +
> +/*
> + * hidpp_send_message_sync() returns 0 in case of success, and
> something else
> + * in case of a failure.
> + * - If ' something else' is positive, that means that an error has
> been raised
> + *   by the protocol itself.
> + * - If ' something else' is negative, that means that we had a
> classic error
> + *   (-ENOMEM, -EPIPE, etc...)

Do we really need to re-explain the possible return values that were
already explained above __do_hidpp_send_message_sync()?

If we do, why don't we also do it for hidpp_send_fap_command_sync() and
hidpp_send_rap_command_sync(), or their callers?

If it's absolutely necessary, a "see __do_hidpp_send_message_sync()"
should be enough.

I've double-checked that none of the existing callers expected a
partially filled in "response" struct on error.

Reviewed-by: Bastien Nocera <hadess@hadess.net>

> + */
> +static int hidpp_send_message_sync(struct hidpp_device *hidpp,
> +       struct hidpp_report *message,
> +       struct hidpp_report *response)
> +{
> +       int ret;
> +       int max_retries = 3;
> +
> +       mutex_lock(&hidpp->send_mutex);
> +
> +       do {
> +               ret = __do_hidpp_send_message_sync(hidpp, message,
> response);
> +               if (ret != HIDPP20_ERROR_BUSY)
>                         break;
> -               }
>  
> -               if ((response->report_id == REPORT_ID_HIDPP_LONG ||
> -                    response->report_id ==
> REPORT_ID_HIDPP_VERY_LONG) &&
> -                   response->fap.feature_index == HIDPP20_ERROR) {
> -                       ret = response->fap.params[1];
> -                       if (ret != HIDPP20_ERROR_BUSY) {
> -                               dbg_hid("%s:got hidpp 2.0 error
> %02X\n", __func__, ret);
> -                               break;
> -                       }
> -                       dbg_hid("%s:got busy hidpp 2.0 error %02X,
> retrying\n", __func__, ret);
> -               }
> -       }
> +               dbg_hid("%s:got busy hidpp 2.0 error %02X,
> retrying\n", __func__, ret);
> +       } while (--max_retries);
>  
>         mutex_unlock(&hidpp->send_mutex);
>         return ret;
> 
> ---
> base-commit: b98ec211af5508457e2b1c4cc99373630a83fa81
> change-id: 20230621-logitech-fixes-a4c0e66ea2ad
> 
> Best regards,
  
Bastien Nocera June 25, 2023, 8:29 a.m. UTC | #4
On Fri, 2023-06-23 at 10:37 +0200, Benjamin Tissoires wrote:
> 
> On Jun 21 2023, Greg KH wrote:
> > 
> > On Wed, Jun 21, 2023 at 11:42:30AM +0200, Benjamin Tissoires wrote:
> > > Make the code looks less like Pascal.
> > > 
> > > Extract the internal code inside a helper function, fix the
> > > initialization of the parameters used in the helper function
> > > (`hidpp->answer_available` was not reset and `*response` wasn't
> > > too),
> > > and use a `do {...} while();` loop.
> > > 
> > > Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when
> > > device is busy")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> > > ---
> > > as requested by
> > > https://lore.kernel.org/all/CAHk-=wiMbF38KCNhPFiargenpSBoecSXTLQACKS2UMyo_Vu2ww@mail.gmail.com/
> > > This is a rewrite of that particular piece of code.
> > > ---
> > >  drivers/hid/hid-logitech-hidpp.c | 102 +++++++++++++++++++++++--
> > > --------------
> > >  1 file changed, 61 insertions(+), 41 deletions(-)
> > > 
> > > diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-
> > > logitech-hidpp.c
> > > index dfe8e09a18de..3d1ffe199f08 100644
> > > --- a/drivers/hid/hid-logitech-hidpp.c
> > > +++ b/drivers/hid/hid-logitech-hidpp.c
> > > @@ -275,21 +275,20 @@ static int __hidpp_send_report(struct
> > > hid_device *hdev,
> > >  }
> > >  
> > >  /*
> > > - * hidpp_send_message_sync() returns 0 in case of success, and
> > > something else
> > > - * in case of a failure.
> > > - * - If ' something else' is positive, that means that an error
> > > has been raised
> > > - *   by the protocol itself.
> > > - * - If ' something else' is negative, that means that we had a
> > > classic error
> > > - *   (-ENOMEM, -EPIPE, etc...)
> > > + * Effectively send the message to the device, waiting for its
> > > answer.
> > > + *
> > > + * Must be called with hidpp->send_mutex locked
> > > + *
> > > + * Same return protocol than hidpp_send_message_sync():
> > > + * - success on 0
> > > + * - negative error means transport error
> > > + * - positive value means protocol error
> > >   */
> > > -static int hidpp_send_message_sync(struct hidpp_device *hidpp,
> > > +static int __do_hidpp_send_message_sync(struct hidpp_device
> > > *hidpp,
> > >         struct hidpp_report *message,
> > >         struct hidpp_report *response)
> > 
> > __must_hold(&hidpp->send_mutex)  ?
> > 
> 
> Good point. I'll add this in v2.
> 
> I'm still waiting for some feedback from the people who particpated
> in
> the original BZ, but the new bug is harder to reproduce. Anyway,
> there
> is no rush IMO.

The problem is only ever going to show up in very limited circumstances
after the logic fix was applied.

You need a hardware problem (such as the controller being too busy to
answer) to trigger the problems fixed by this patch. I don't see a way
to reliably reproduce it unless you inject that hardware error.
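
A rough sketch of what injecting that error could look like in a userspace
model, not in the driver itself. The fake device answers "busy" a
configurable number of times, and the wrapper uses the same `do {...}
while (--max_retries)` shape as the patch. All `fake_*` names are
hypothetical, and `DEMO_ERROR_BUSY` is an illustrative stand-in for
`HIDPP20_ERROR_BUSY`.

```c
/* Illustrative stand-in for HIDPP20_ERROR_BUSY; the value is an assumption. */
#define DEMO_ERROR_BUSY 0x08

/* Fake device: answers "busy" for the first busy_replies sends. */
struct fake_device {
	int busy_replies; /* injected fault: number of busy answers left */
	int sends;        /* how many times a message was sent */
};

/* One send/receive attempt: 0 on success, positive protocol error otherwise. */
static int fake_send_once(struct fake_device *dev)
{
	dev->sends++;
	if (dev->busy_replies > 0) {
		dev->busy_replies--;
		return DEMO_ERROR_BUSY;
	}
	return 0;
}

/* Retry only on "busy", with at most three attempts in total. */
static int fake_send_sync(struct fake_device *dev)
{
	int max_retries = 3;
	int ret;

	do {
		ret = fake_send_once(dev);
		if (ret != DEMO_ERROR_BUSY)
			break;
	} while (--max_retries);

	return ret;
}
```

With two injected busy answers the third attempt succeeds; with three or
more, the wrapper gives up after the third attempt and returns the busy
code to the caller.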
  
Benjamin Tissoires June 26, 2023, 2:01 p.m. UTC | #5
On Sun, Jun 25, 2023 at 10:30 AM Bastien Nocera <hadess@hadess.net> wrote:
>
> On Wed, 2023-06-21 at 11:42 +0200, Benjamin Tissoires wrote:
> > Make the code looks less like Pascal.
>
> Honestly, while this was written in jest in an email is fine, putting
> this in the commit message is quite insulting.
>
> The "retry" patch tried to fix real world problems by making minimal
> code changes, eg. avoiding the review problem that the present patch
> has, and even then, all of us missed the logic bug.
>
> I also haven't written any Pascal code since 1996.

Apologies for that. I honestly took Linus' remark as aimed at me only,
because I was fixing your fix on my original code.
And while initially fixing your for loop, I should have realized that
it was very hard to follow, because of the "for (sth; sth < 1 && foo
&& bar; sth+=1)" construct.

I'll amend this in v2.

>
> > Extract the internal code inside a helper function, fix the
> > initialization of the parameters used in the helper function
> > (`hidpp->answer_available` was not reset and `*response` wasn't too),
>
> "wasn't either".
>
> > and use a `do {...} while();` loop.
> >
> > Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device
> > is busy")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> > ---
> > as requested by
> > https://lore.kernel.org/all/CAHk-=wiMbF38KCNhPFiargenpSBoecSXTLQACKS2UMyo_Vu2ww@mail.gmail.com/
> > This is a rewrite of that particular piece of code.
> > ---
> >  drivers/hid/hid-logitech-hidpp.c | 102 +++++++++++++++++++++++------
> > ----------
> >  1 file changed, 61 insertions(+), 41 deletions(-)
> >
> > diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-
> > logitech-hidpp.c
> > index dfe8e09a18de..3d1ffe199f08 100644
> > --- a/drivers/hid/hid-logitech-hidpp.c
> > +++ b/drivers/hid/hid-logitech-hidpp.c
> > @@ -275,21 +275,20 @@ static int __hidpp_send_report(struct
> > hid_device *hdev,
> >  }
> >
> >  /*
> > - * hidpp_send_message_sync() returns 0 in case of success, and
> > something else
> > - * in case of a failure.
> > - * - If ' something else' is positive, that means that an error has
> > been raised
> > - *   by the protocol itself.
> > - * - If ' something else' is negative, that means that we had a
> > classic error
> > - *   (-ENOMEM, -EPIPE, etc...)
> > + * Effectively send the message to the device, waiting for its
> > answer.
> > + *
> > + * Must be called with hidpp->send_mutex locked
> > + *
> > + * Same return protocol than hidpp_send_message_sync():
> > + * - success on 0
> > + * - negative error means transport error
> > + * - positive value means protocol error
> >   */
> > -static int hidpp_send_message_sync(struct hidpp_device *hidpp,
> > +static int __do_hidpp_send_message_sync(struct hidpp_device *hidpp,
> >         struct hidpp_report *message,
> >         struct hidpp_report *response)
> >  {
> > -       int ret = -1;
> > -       int max_retries = 3;
> > -
> > -       mutex_lock(&hidpp->send_mutex);
> > +       int ret;
> >
> >         hidpp->send_receive_buf = response;
> >         hidpp->answer_available = false;
> > @@ -300,41 +299,62 @@ static int hidpp_send_message_sync(struct
> > hidpp_device *hidpp,
> >          */
> >         *response = *message;
> >
> > -       for (; max_retries != 0 && ret; max_retries--) {
> > -               ret = __hidpp_send_report(hidpp->hid_dev, message);
> > +       ret = __hidpp_send_report(hidpp->hid_dev, message);
> > +       if (ret) {
> > +               dbg_hid("__hidpp_send_report returned err: %d\n",
> > ret);
> > +               memset(response, 0, sizeof(struct hidpp_report));
> > +               return ret;
> > +       }
> >
> > -               if (ret) {
> > -                       dbg_hid("__hidpp_send_report returned err:
> > %d\n", ret);
> > -                       memset(response, 0, sizeof(struct
> > hidpp_report));
> > -                       break;
> > -               }
> > +       if (!wait_event_timeout(hidpp->wait, hidpp->answer_available,
> > +                               5*HZ)) {
> > +               dbg_hid("%s:timeout waiting for response\n",
> > __func__);
> > +               memset(response, 0, sizeof(struct hidpp_report));
> > +               return -ETIMEDOUT;
> > +       }
> >
> > -               if (!wait_event_timeout(hidpp->wait, hidpp-
> > >answer_available,
> > -                                       5*HZ)) {
> > -                       dbg_hid("%s:timeout waiting for response\n",
> > __func__);
> > -                       memset(response, 0, sizeof(struct
> > hidpp_report));
> > -                       ret = -ETIMEDOUT;
> > -                       break;
> > -               }
> > +       if (response->report_id == REPORT_ID_HIDPP_SHORT &&
> > +           response->rap.sub_id == HIDPP_ERROR) {
> > +               ret = response->rap.params[1];
> > +               dbg_hid("%s:got hidpp error %02X\n", __func__, ret);
> > +               return ret;
> > +       }
> >
> > -               if (response->report_id == REPORT_ID_HIDPP_SHORT &&
> > -                   response->rap.sub_id == HIDPP_ERROR) {
> > -                       ret = response->rap.params[1];
> > -                       dbg_hid("%s:got hidpp error %02X\n",
> > __func__, ret);
> > +       if ((response->report_id == REPORT_ID_HIDPP_LONG ||
> > +            response->report_id == REPORT_ID_HIDPP_VERY_LONG) &&
> > +           response->fap.feature_index == HIDPP20_ERROR) {
> > +               ret = response->fap.params[1];
> > +               dbg_hid("%s:got hidpp 2.0 error %02X\n", __func__,
> > ret);
> > +               return ret;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +/*
> > + * hidpp_send_message_sync() returns 0 in case of success, and
> > something else
> > + * in case of a failure.
> > + * - If ' something else' is positive, that means that an error has
> > been raised
> > + *   by the protocol itself.
> > + * - If ' something else' is negative, that means that we had a
> > classic error
> > + *   (-ENOMEM, -EPIPE, etc...)
>
> Do we really need to re-explain the possible return values that were
> already explained above __do_hidpp_send_message_sync()?

Right, maybe we don't need to duplicate the comment after all.

>
> If we do, why don't also do it for hidpp_send_fap_command_sync() and
> hidpp_send_rap_command_sync(), or their callers?

In a way it would make sense to do so, because this return convention
is non-standard.
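
To make the convention concrete, a small sketch of how a caller might
tell the three cases apart. The `demo_*` names are hypothetical, and
collapsing a positive protocol error into `-EPROTO` is an assumption made
for the example, not something this patch prescribes.

```c
#include <errno.h>

enum demo_result {
	DEMO_OK,              /* ret == 0 */
	DEMO_TRANSPORT_ERROR, /* ret < 0: a regular negative errno */
	DEMO_PROTOCOL_ERROR,  /* ret > 0: error code reported by the device */
};

/* Classify a return value that follows the 0 / negative / positive rule. */
static enum demo_result demo_classify(int ret)
{
	if (ret == 0)
		return DEMO_OK;
	if (ret < 0)
		return DEMO_TRANSPORT_ERROR;
	return DEMO_PROTOCOL_ERROR;
}

/* Collapse the result into a plain errno for callers that want only that. */
static int demo_to_errno(int ret)
{
	if (demo_classify(ret) == DEMO_PROTOCOL_ERROR)
		return -EPROTO; /* assumed mapping, for illustration */
	return ret;
}
```

A caller that cares about the device's own error code checks for a
positive value before converting; everyone else can pass the result
through `demo_to_errno()`.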

>
> If it's absolutely necessary, a "see __do_hidpp_send_message_sync()"
> should be enough.

Good point.

>
> I've double-checked that none of the existing callers expected a
> partially filled in "response" struct on error.
>
> Reviewed-by: Bastien Nocera <hadess@hadess.net>

Thanks!

Cheers,
Benjamin

>
> > + */
> > +static int hidpp_send_message_sync(struct hidpp_device *hidpp,
> > +       struct hidpp_report *message,
> > +       struct hidpp_report *response)
> > +{
> > +       int ret;
> > +       int max_retries = 3;
> > +
> > +       mutex_lock(&hidpp->send_mutex);
> > +
> > +       do {
> > +               ret = __do_hidpp_send_message_sync(hidpp, message,
> > response);
> > +               if (ret != HIDPP20_ERROR_BUSY)
> >                         break;
> > -               }
> >
> > -               if ((response->report_id == REPORT_ID_HIDPP_LONG ||
> > -                    response->report_id ==
> > REPORT_ID_HIDPP_VERY_LONG) &&
> > -                   response->fap.feature_index == HIDPP20_ERROR) {
> > -                       ret = response->fap.params[1];
> > -                       if (ret != HIDPP20_ERROR_BUSY) {
> > -                               dbg_hid("%s:got hidpp 2.0 error
> > %02X\n", __func__, ret);
> > -                               break;
> > -                       }
> > -                       dbg_hid("%s:got busy hidpp 2.0 error %02X,
> > retrying\n", __func__, ret);
> > -               }
> > -       }
> > +               dbg_hid("%s:got busy hidpp 2.0 error %02X,
> > retrying\n", __func__, ret);
> > +       } while (--max_retries);
> >
> >         mutex_unlock(&hidpp->send_mutex);
> >         return ret;
> >
> > ---
> > base-commit: b98ec211af5508457e2b1c4cc99373630a83fa81
> > change-id: 20230621-logitech-fixes-a4c0e66ea2ad
> >
> > Best regards,
>
  
Benjamin Tissoires June 26, 2023, 2:02 p.m. UTC | #6
On Sun, Jun 25, 2023 at 10:30 AM Bastien Nocera <hadess@hadess.net> wrote:
>
> On Fri, 2023-06-23 at 10:37 +0200, Benjamin Tissoires wrote:
> >
> > On Jun 21 2023, Greg KH wrote:
> > >
> > > On Wed, Jun 21, 2023 at 11:42:30AM +0200, Benjamin Tissoires wrote:
> > > > Make the code looks less like Pascal.
> > > >
> > > > Extract the internal code inside a helper function, fix the
> > > > initialization of the parameters used in the helper function
> > > > (`hidpp->answer_available` was not reset and `*response` wasn't
> > > > too),
> > > > and use a `do {...} while();` loop.
> > > >
> > > > Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when
> > > > device is busy")
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> > > > ---
> > > > as requested by
> > > > https://lore.kernel.org/all/CAHk-=wiMbF38KCNhPFiargenpSBoecSXTLQACKS2UMyo_Vu2ww@mail.gmail.com/
> > > > This is a rewrite of that particular piece of code.
> > > > ---
> > > >  drivers/hid/hid-logitech-hidpp.c | 102 +++++++++++++++++++++++--
> > > > --------------
> > > >  1 file changed, 61 insertions(+), 41 deletions(-)
> > > >
> > > > diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-
> > > > logitech-hidpp.c
> > > > index dfe8e09a18de..3d1ffe199f08 100644
> > > > --- a/drivers/hid/hid-logitech-hidpp.c
> > > > +++ b/drivers/hid/hid-logitech-hidpp.c
> > > > @@ -275,21 +275,20 @@ static int __hidpp_send_report(struct
> > > > hid_device *hdev,
> > > >  }
> > > >
> > > >  /*
> > > > - * hidpp_send_message_sync() returns 0 in case of success, and
> > > > something else
> > > > - * in case of a failure.
> > > > - * - If ' something else' is positive, that means that an error
> > > > has been raised
> > > > - *   by the protocol itself.
> > > > - * - If ' something else' is negative, that means that we had a
> > > > classic error
> > > > - *   (-ENOMEM, -EPIPE, etc...)
> > > > + * Effectively send the message to the device, waiting for its
> > > > answer.
> > > > + *
> > > > + * Must be called with hidpp->send_mutex locked
> > > > + *
> > > > + * Same return protocol than hidpp_send_message_sync():
> > > > + * - success on 0
> > > > + * - negative error means transport error
> > > > + * - positive value means protocol error
> > > >   */
> > > > -static int hidpp_send_message_sync(struct hidpp_device *hidpp,
> > > > +static int __do_hidpp_send_message_sync(struct hidpp_device
> > > > *hidpp,
> > > >         struct hidpp_report *message,
> > > >         struct hidpp_report *response)
> > >
> > > __must_hold(&hidpp->send_mutex)  ?
> > >
> >
> > Good point. I'll add this in v2.
> >
> > I'm still waiting for some feedback from the people who particpated
> > in
> > the original BZ, but the new bug is harder to reproduce. Anyway,
> > there
> > is no rush IMO.
>
> The problem is only ever going to show up in very limited circumstances
> after the logic fix was applied.
>
> You need a hardware problem (such as the controller being too busy to
> answer) to trigger the problems fixed by this patch. I don't see a way
> to reliably reproduce it unless you inject that hardware error.
>

Some people on the BZ were able to reproduce it with multiple reboots.
But it's not as urgent as previously, and we were close to the 6.4
final when I sent it. I'll make sure this goes into 6.5 and gets
proper stable backports FWIW.

Cheers,
Benjamin
  
Linux regression tracking (Thorsten Leemhuis) July 11, 2023, 1:09 p.m. UTC | #7
On 26.06.23 16:02, Benjamin Tissoires wrote:
> On Sun, Jun 25, 2023 at 10:30 AM Bastien Nocera <hadess@hadess.net> wrote:
>> On Fri, 2023-06-23 at 10:37 +0200, Benjamin Tissoires wrote:
>>> On Jun 21 2023, Greg KH wrote:
>>>> On Wed, Jun 21, 2023 at 11:42:30AM +0200, Benjamin Tissoires wrote:
>>>>> Make the code looks less like Pascal.
>>>>>
>>>>> Extract the internal code inside a helper function, fix the
>>>>> initialization of the parameters used in the helper function
>>>>> (`hidpp->answer_available` was not reset and `*response` wasn't
>>>>> too),
>>>>> and use a `do {...} while();` loop.
>>>>>
>>>>> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when
>>>>> device is busy")
>>>>> Cc: stable@vger.kernel.org
>>>>> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
>>>>> ---
>>>>> as requested by
>>>>> https://lore.kernel.org/all/CAHk-=wiMbF38KCNhPFiargenpSBoecSXTLQACKS2UMyo_Vu2ww@mail.gmail.com/
>>>>> This is a rewrite of that particular piece of code.
>>>>> ---
>>>>>  drivers/hid/hid-logitech-hidpp.c | 102 +++++++++++++++++++++++--
>>>>> --------------
>>>>>  1 file changed, 61 insertions(+), 41 deletions(-)
> [...]
> 
> Some people on the Bz were able to reproduce with multiple reboots.
> But it's not as urgent as previously, and we were close to the 6.4
> final when I sent it. I'll make sure this goes into 6.5 and gets
> proper stable backports FWIW.

Did that happen? Doesn't look like it from here, but maybe I'm missing
something. Were there maybe other changes to resolve the remaining
problems some users encounter sporadically since the urgent fixes went in?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
  
Benjamin Tissoires July 11, 2023, 1:40 p.m. UTC | #8
On Tue, Jul 11, 2023 at 3:10 PM Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> On 26.06.23 16:02, Benjamin Tissoires wrote:
> > On Sun, Jun 25, 2023 at 10:30 AM Bastien Nocera <hadess@hadess.net> wrote:
> >> On Fri, 2023-06-23 at 10:37 +0200, Benjamin Tissoires wrote:
> >>> On Jun 21 2023, Greg KH wrote:
> >>>> On Wed, Jun 21, 2023 at 11:42:30AM +0200, Benjamin Tissoires wrote:
> >>>>> Make the code looks less like Pascal.
> >>>>>
> >>>>> Extract the internal code inside a helper function, fix the
> >>>>> initialization of the parameters used in the helper function
> >>>>> (`hidpp->answer_available` was not reset and `*response` wasn't
> >>>>> too),
> >>>>> and use a `do {...} while();` loop.
> >>>>>
> >>>>> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when
> >>>>> device is busy")
> >>>>> Cc: stable@vger.kernel.org
> >>>>> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> >>>>> ---
> >>>>> as requested by
> >>>>> https://lore.kernel.org/all/CAHk-=wiMbF38KCNhPFiargenpSBoecSXTLQACKS2UMyo_Vu2ww@mail.gmail.com/
> >>>>> This is a rewrite of that particular piece of code.
> >>>>> ---
> >>>>>  drivers/hid/hid-logitech-hidpp.c | 102 +++++++++++++++++++++++--
> >>>>> --------------
> >>>>>  1 file changed, 61 insertions(+), 41 deletions(-)
> > [...]
> >
> > Some people on the Bz were able to reproduce with multiple reboots.
> > But it's not as urgent as previously, and we were close to the 6.4
> > final when I sent it. I'll make sure this goes into 6.5 and gets
> > proper stable backports FWIW.
>
> Did that happen? Doesn't look like it from here, but maybe I'm missing
> something. Where there maybe other changes to resolve the remaining
> problems some users encounter sporadically since the urgent fixes went in?

No, there were no other changes that could have solved this. I guess
the randomness of the problem makes it way harder to detect and to
reproduce.

I'll send a v2 of that patch with the reviews today or tomorrow and we
can probably get it through the current 6.5 cycle.

Cheers,
Benjamin

>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
  
Linux regression tracking (Thorsten Leemhuis) July 11, 2023, 1:56 p.m. UTC | #9
On 11.07.23 15:40, Benjamin Tissoires wrote:
> On Tue, Jul 11, 2023 at 3:10 PM Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
>>
>> On 26.06.23 16:02, Benjamin Tissoires wrote:
>>> On Sun, Jun 25, 2023 at 10:30 AM Bastien Nocera <hadess@hadess.net> wrote:
>>>> On Fri, 2023-06-23 at 10:37 +0200, Benjamin Tissoires wrote:
>>>>> On Jun 21 2023, Greg KH wrote:
>>>>>> On Wed, Jun 21, 2023 at 11:42:30AM +0200, Benjamin Tissoires wrote:
>>>>>>> Make the code looks less like Pascal.
>>>>>>>
>>>>>>> Extract the internal code inside a helper function, fix the
>>>>>>> initialization of the parameters used in the helper function
>>>>>>> (`hidpp->answer_available` was not reset and `*response` wasn't too),
>>>>>>> and use a `do {...} while();` loop.
>>>>>>>
>>>>>>> Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
>>>>>>> Cc: stable@vger.kernel.org
>>>>>>> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
>>>>>>> ---
>>>>>>> as requested by https://lore.kernel.org/all/CAHk-=wiMbF38KCNhPFiargenpSBoecSXTLQACKS2UMyo_Vu2ww@mail.gmail.com/
>>>>>>> This is a rewrite of that particular piece of code.
>>>>>>> ---
>>>>>>>  drivers/hid/hid-logitech-hidpp.c | 102 +++++++++++++++++++++++----------------
>>>>>>>  1 file changed, 61 insertions(+), 41 deletions(-)
>>> [...]
>>>
>>> Some people on the Bz were able to reproduce with multiple reboots.
>>> But it's not as urgent as previously, and we were close to the 6.4
>>> final when I sent it. I'll make sure this goes into 6.5 and gets
>>> proper stable backports FWIW.
>>
>> Did that happen? Doesn't look like it from here, but maybe I'm missing
>> something. Were there maybe other changes to resolve the remaining
>> problems some users encounter sporadically since the urgent fixes went in?
> 
> No, there were no other changes that could have solved this. I guess
> the randomness of the problem makes it way harder to detect and to
> reproduce.
> 
> I'll send a v2 of that patch with the reviews today or tomorrow and we
> can probably get it through the current 6.5 cycle.

Great, many thx!

Ciao, Thorsten
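
For readers following the thread without the driver source at hand, here
is a minimal userspace sketch of the retry shape the commit message
describes: a helper that performs a single attempt, wrapped in a bounded
`do {...} while();` loop that retries only on a "busy" answer. It is not
the driver code. The names demo_device, demo_send_once, demo_send_sync and
DEMO_ERROR_BUSY are hypothetical, and the value 0x08 is an assumption made
for illustration only. The return convention mirrors the one documented in
the patch: 0 on success, negative for a transport error, positive for a
protocol error.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a "device busy" protocol error (value assumed). */
#define DEMO_ERROR_BUSY 0x08

/* Hypothetical fake device: answers "busy" for the first busy_replies sends. */
struct demo_device {
	int busy_replies;	/* how many more times the device will say "busy" */
	int sends;		/* number of attempts made so far */
};

/*
 * One send/receive attempt.
 * Returns 0 on success, a negative value for a transport error,
 * or a positive value for a protocol error.
 */
static int demo_send_once(struct demo_device *dev)
{
	dev->sends++;
	if (dev->busy_replies > 0) {
		dev->busy_replies--;
		return DEMO_ERROR_BUSY;
	}
	return 0;
}

/*
 * Retry only while the device answers "busy", for at most max_retries
 * attempts. Any other result (success or a different error) ends the loop
 * immediately and is returned to the caller unchanged.
 */
static int demo_send_sync(struct demo_device *dev, int max_retries)
{
	int ret;

	assert(max_retries > 0);

	do {
		ret = demo_send_once(dev);
		if (ret != DEMO_ERROR_BUSY)
			break;

		fprintf(stderr, "device busy, %d attempt(s) left\n",
			max_retries - 1);
	} while (--max_retries);

	return ret;
}
```

With a fake device that answers "busy" twice, demo_send_sync(&dev, 3)
succeeds on the third attempt. With one that stays busy, the call gives up
after three attempts and returns the busy code to the caller. Keeping the
single attempt in its own function means every retry starts from the same
freshly initialized state, which is the point the commit message makes
about resetting `hidpp->answer_available` and `*response`.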
  

Patch

diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c
index dfe8e09a18de..3d1ffe199f08 100644
--- a/drivers/hid/hid-logitech-hidpp.c
+++ b/drivers/hid/hid-logitech-hidpp.c
@@ -275,21 +275,20 @@  static int __hidpp_send_report(struct hid_device *hdev,
 }
 
 /*
- * hidpp_send_message_sync() returns 0 in case of success, and something else
- * in case of a failure.
- * - If ' something else' is positive, that means that an error has been raised
- *   by the protocol itself.
- * - If ' something else' is negative, that means that we had a classic error
- *   (-ENOMEM, -EPIPE, etc...)
+ * Effectively send the message to the device, waiting for its answer.
+ *
+ * Must be called with hidpp->send_mutex locked
+ *
+ * Same return protocol as hidpp_send_message_sync():
+ * - success on 0
+ * - negative error means transport error
+ * - positive value means protocol error
  */
-static int hidpp_send_message_sync(struct hidpp_device *hidpp,
+static int __do_hidpp_send_message_sync(struct hidpp_device *hidpp,
 	struct hidpp_report *message,
 	struct hidpp_report *response)
 {
-	int ret = -1;
-	int max_retries = 3;
-
-	mutex_lock(&hidpp->send_mutex);
+	int ret;
 
 	hidpp->send_receive_buf = response;
 	hidpp->answer_available = false;
@@ -300,41 +299,62 @@  static int hidpp_send_message_sync(struct hidpp_device *hidpp,
 	 */
 	*response = *message;
 
-	for (; max_retries != 0 && ret; max_retries--) {
-		ret = __hidpp_send_report(hidpp->hid_dev, message);
+	ret = __hidpp_send_report(hidpp->hid_dev, message);
+	if (ret) {
+		dbg_hid("__hidpp_send_report returned err: %d\n", ret);
+		memset(response, 0, sizeof(struct hidpp_report));
+		return ret;
+	}
 
-		if (ret) {
-			dbg_hid("__hidpp_send_report returned err: %d\n", ret);
-			memset(response, 0, sizeof(struct hidpp_report));
-			break;
-		}
+	if (!wait_event_timeout(hidpp->wait, hidpp->answer_available,
+				5*HZ)) {
+		dbg_hid("%s:timeout waiting for response\n", __func__);
+		memset(response, 0, sizeof(struct hidpp_report));
+		return -ETIMEDOUT;
+	}
 
-		if (!wait_event_timeout(hidpp->wait, hidpp->answer_available,
-					5*HZ)) {
-			dbg_hid("%s:timeout waiting for response\n", __func__);
-			memset(response, 0, sizeof(struct hidpp_report));
-			ret = -ETIMEDOUT;
-			break;
-		}
+	if (response->report_id == REPORT_ID_HIDPP_SHORT &&
+	    response->rap.sub_id == HIDPP_ERROR) {
+		ret = response->rap.params[1];
+		dbg_hid("%s:got hidpp error %02X\n", __func__, ret);
+		return ret;
+	}
 
-		if (response->report_id == REPORT_ID_HIDPP_SHORT &&
-		    response->rap.sub_id == HIDPP_ERROR) {
-			ret = response->rap.params[1];
-			dbg_hid("%s:got hidpp error %02X\n", __func__, ret);
+	if ((response->report_id == REPORT_ID_HIDPP_LONG ||
+	     response->report_id == REPORT_ID_HIDPP_VERY_LONG) &&
+	    response->fap.feature_index == HIDPP20_ERROR) {
+		ret = response->fap.params[1];
+		dbg_hid("%s:got hidpp 2.0 error %02X\n", __func__, ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+/*
+ * hidpp_send_message_sync() returns 0 in case of success, and something else
+ * in case of a failure.
+ * - If ' something else' is positive, that means that an error has been raised
+ *   by the protocol itself.
+ * - If ' something else' is negative, that means that we had a classic error
+ *   (-ENOMEM, -EPIPE, etc...)
+ */
+static int hidpp_send_message_sync(struct hidpp_device *hidpp,
+	struct hidpp_report *message,
+	struct hidpp_report *response)
+{
+	int ret;
+	int max_retries = 3;
+
+	mutex_lock(&hidpp->send_mutex);
+
+	do {
+		ret = __do_hidpp_send_message_sync(hidpp, message, response);
+		if (ret != HIDPP20_ERROR_BUSY)
 			break;
-		}
 
-		if ((response->report_id == REPORT_ID_HIDPP_LONG ||
-		     response->report_id == REPORT_ID_HIDPP_VERY_LONG) &&
-		    response->fap.feature_index == HIDPP20_ERROR) {
-			ret = response->fap.params[1];
-			if (ret != HIDPP20_ERROR_BUSY) {
-				dbg_hid("%s:got hidpp 2.0 error %02X\n", __func__, ret);
-				break;
-			}
-			dbg_hid("%s:got busy hidpp 2.0 error %02X, retrying\n", __func__, ret);
-		}
-	}
+		dbg_hid("%s:got busy hidpp 2.0 error %02X, retrying\n", __func__, ret);
+	} while (--max_retries);
 
 	mutex_unlock(&hidpp->send_mutex);
 	return ret;