[3/5] Bluetooth: hci_event: Remove limit of 2 reconnection attempts

Message ID 20240102185933.64179-4-verdre@v0yd.nl
State New
Headers
Series Bluetooth: Improve retrying of connection attempts |

Commit Message

Jonas Dreßler Jan. 2, 2024, 6:59 p.m. UTC
  Since commit 4c67bc74f016b0d360b8573e18969c0ff7926974, we retry connecting
later when we get a "Command Disallowed" error returned by "Create
Connection".

In this commit the intention was to retry only once, and give up if we see
"Command Disallowed" again on the second try.

This made sense back then when the retry was initiated *only* from the
"Connect Complete" event. If we received that event, we knew that now the
card now must have a "free slot" for a new connection request again. These
days we call hci_conn_check_pending() from a few more places though, and
in these places we can't really be sure that there's a "free slot" on the
card, so the second try to "Create Connection" might fail again.

Deal with this by being less strict about these retries and try again
every time we get "Command Disallowed" errors, removing the limitation to
only two attempts.

Since this can potentially cause us to enter an endless cycle of
reconnection attempts, we'll add some guarding against that with the next
commit.

Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
---
 net/bluetooth/hci_event.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
  

Comments

Luiz Augusto von Dentz Jan. 3, 2024, 4:05 p.m. UTC | #1
Hi Jonas,

On Tue, Jan 2, 2024 at 1:59 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
>
> Since commit 4c67bc74f016b0d360b8573e18969c0ff7926974, we retry connecting
> later when we get a "Command Disallowed" error returned by "Create
> Connection".
>
> In this commit the intention was to retry only once, and give up if we see
> "Command Disallowed" again on the second try.
>
> This made sense back then when the retry was initiated *only* from the
> "Connect Complete" event. If we received that event, we knew that now the
> card now must have a "free slot" for a new connection request again. These
> days we call hci_conn_check_pending() from a few more places though, and
> in these places we can't really be sure that there's a "free slot" on the
> card, so the second try to "Create Connection" might fail again.
>
> Deal with this by being less strict about these retries and try again
> every time we get "Command Disallowed" errors, removing the limitation to
> only two attempts.
>
> Since this can potentially cause us to enter an endless cycle of
> reconnection attempts, we'll add some guarding against that with the next
> commit.

Don't see where you are doing such guarding, besides you seem to
assume HCI_ERROR_COMMAND_DISALLOWED would always means the controller
is busy, or something like that, but it could perform the connection
later, but that may not always be the case, thus why I think
reconnecting just a few number of times is better, if you really need
to keep retrying then this needs to be controlled by a policy in
userspace not hardcoded in the kernel, well I can even argument that
perhaps the initial number of reconnection shall be configurable so
one don't have to recompile the kernel if that needs changing.

> Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
> ---
>  net/bluetooth/hci_event.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index e8b4a0126..e1f5b6f90 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -2323,12 +2323,13 @@ static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status)
>
>         if (status) {
>                 if (conn && conn->state == BT_CONNECT) {
> -                       if (status != HCI_ERROR_COMMAND_DISALLOWED || conn->attempt > 2) {
> +                       if (status == HCI_ERROR_COMMAND_DISALLOWED) {
> +                               conn->state = BT_CONNECT2;
> +                       } else {
>                                 conn->state = BT_CLOSED;
>                                 hci_connect_cfm(conn, status);
>                                 hci_conn_del(conn);
> -                       } else
> -                               conn->state = BT_CONNECT2;
> +                       }
>                 }
>         } else {
>                 if (!conn) {
> --
> 2.43.0
>
  
Jonas Dreßler Jan. 5, 2024, 3:54 p.m. UTC | #2
Hi Luiz,

On 1/3/24 17:05, Luiz Augusto von Dentz wrote:
> Hi Jonas,
> 
> On Tue, Jan 2, 2024 at 1:59 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
>>
>> Since commit 4c67bc74f016b0d360b8573e18969c0ff7926974, we retry connecting
>> later when we get a "Command Disallowed" error returned by "Create
>> Connection".
>>
>> In this commit the intention was to retry only once, and give up if we see
>> "Command Disallowed" again on the second try.
>>
>> This made sense back then when the retry was initiated *only* from the
>> "Connect Complete" event. If we received that event, we knew that now the
>> card now must have a "free slot" for a new connection request again. These
>> days we call hci_conn_check_pending() from a few more places though, and
>> in these places we can't really be sure that there's a "free slot" on the
>> card, so the second try to "Create Connection" might fail again.
>>
>> Deal with this by being less strict about these retries and try again
>> every time we get "Command Disallowed" errors, removing the limitation to
>> only two attempts.
>>
>> Since this can potentially cause us to enter an endless cycle of
>> reconnection attempts, we'll add some guarding against that with the next
>> commit.
> 
> Don't see where you are doing such guarding, besides you seem to
> assume HCI_ERROR_COMMAND_DISALLOWED would always means the controller
> is busy, or something like that, but it could perform the connection
> later, but that may not always be the case, thus why I think
> reconnecting just a few number of times is better, if you really need
> to keep retrying then this needs to be controlled by a policy in
> userspace not hardcoded in the kernel, well I can even argument that
> perhaps the initial number of reconnection shall be configurable so
> one don't have to recompile the kernel if that needs changing.

Yes, fair enough, the next commit assumes that COMMAND_DISALLOWED always 
means busy. The guarding is that we stop retrying as soon as there's no 
(competing) ongoing connection attempt nor an active inquiry, which 
should eventually be the case no matter what, no?

I agree it's probably still better to not rely on this fairly complex 
sanity check and keep the checking of attempts nonetheless.

I think we could keep doing that if we check for 
!hci_conn_hash_lookup_state(hdev, ACL_LINK, BT_CONNECT) && 
!test_bit(HCI_INQUIRY, &hdev->flags) in hci_conn_check_pending() before 
we actually retry, to make sure the retry counter doesn't get 
incremented wrongly. I'll give that a try.

> 
>> Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
>> ---
>>   net/bluetooth/hci_event.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
>> index e8b4a0126..e1f5b6f90 100644
>> --- a/net/bluetooth/hci_event.c
>> +++ b/net/bluetooth/hci_event.c
>> @@ -2323,12 +2323,13 @@ static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status)
>>
>>          if (status) {
>>                  if (conn && conn->state == BT_CONNECT) {
>> -                       if (status != HCI_ERROR_COMMAND_DISALLOWED || conn->attempt > 2) {
>> +                       if (status == HCI_ERROR_COMMAND_DISALLOWED) {
>> +                               conn->state = BT_CONNECT2;
>> +                       } else {
>>                                  conn->state = BT_CLOSED;
>>                                  hci_connect_cfm(conn, status);
>>                                  hci_conn_del(conn);
>> -                       } else
>> -                               conn->state = BT_CONNECT2;
>> +                       }
>>                  }
>>          } else {
>>                  if (!conn) {
>> --
>> 2.43.0
>>
> 
> 

Cheers,
Jonas
  
Luiz Augusto von Dentz Jan. 5, 2024, 4:05 p.m. UTC | #3
Hi Jonas,

On Fri, Jan 5, 2024 at 10:54 AM Jonas Dreßler <verdre@v0yd.nl> wrote:
>
> Hi Luiz,
>
> On 1/3/24 17:05, Luiz Augusto von Dentz wrote:
> > Hi Jonas,
> >
> > On Tue, Jan 2, 2024 at 1:59 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
> >>
> >> Since commit 4c67bc74f016b0d360b8573e18969c0ff7926974, we retry connecting
> >> later when we get a "Command Disallowed" error returned by "Create
> >> Connection".
> >>
> >> In this commit the intention was to retry only once, and give up if we see
> >> "Command Disallowed" again on the second try.
> >>
> >> This made sense back then when the retry was initiated *only* from the
> >> "Connect Complete" event. If we received that event, we knew that now the
> >> card now must have a "free slot" for a new connection request again. These
> >> days we call hci_conn_check_pending() from a few more places though, and
> >> in these places we can't really be sure that there's a "free slot" on the
> >> card, so the second try to "Create Connection" might fail again.
> >>
> >> Deal with this by being less strict about these retries and try again
> >> every time we get "Command Disallowed" errors, removing the limitation to
> >> only two attempts.
> >>
> >> Since this can potentially cause us to enter an endless cycle of
> >> reconnection attempts, we'll add some guarding against that with the next
> >> commit.
> >
> > Don't see where you are doing such guarding, besides you seem to
> > assume HCI_ERROR_COMMAND_DISALLOWED would always means the controller
> > is busy, or something like that, but it could perform the connection
> > later, but that may not always be the case, thus why I think
> > reconnecting just a few number of times is better, if you really need
> > to keep retrying then this needs to be controlled by a policy in
> > userspace not hardcoded in the kernel, well I can even argument that
> > perhaps the initial number of reconnection shall be configurable so
> > one don't have to recompile the kernel if that needs changing.
>
> Yes, fair enough, the next commit assumes that COMMAND_DISALLOWED always
> means busy. The guarding is that we stop retrying as soon as there's no
> (competing) ongoing connection attempt nor an active inquiry, which
> should eventually be the case no matter what, no?
>
> I agree it's probably still better to not rely on this fairly complex
> sanity check and keep the checking of attempts nonetheless.
>
> I think we could keep doing that if we check for
> !hci_conn_hash_lookup_state(hdev, ACL_LINK, BT_CONNECT) &&
> !test_bit(HCI_INQUIRY, &hdev->flags) in hci_conn_check_pending() before
> we actually retry, to make sure the retry counter doesn't get
> incremented wrongly. I'll give that a try.

Perhaps I'm missing something, but it should be possible to block
concurrent access to HCI while a command is pending with use of
hci_cmd_sync, at least on LE we do that by waiting the connection
complete event so connection attempts are serialized but we don't seem
to be doing the same for BR/EDR.


> >
> >> Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
> >> ---
> >>   net/bluetooth/hci_event.c | 7 ++++---
> >>   1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> >> index e8b4a0126..e1f5b6f90 100644
> >> --- a/net/bluetooth/hci_event.c
> >> +++ b/net/bluetooth/hci_event.c
> >> @@ -2323,12 +2323,13 @@ static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status)
> >>
> >>          if (status) {
> >>                  if (conn && conn->state == BT_CONNECT) {
> >> -                       if (status != HCI_ERROR_COMMAND_DISALLOWED || conn->attempt > 2) {
> >> +                       if (status == HCI_ERROR_COMMAND_DISALLOWED) {
> >> +                               conn->state = BT_CONNECT2;
> >> +                       } else {
> >>                                  conn->state = BT_CLOSED;
> >>                                  hci_connect_cfm(conn, status);
> >>                                  hci_conn_del(conn);
> >> -                       } else
> >> -                               conn->state = BT_CONNECT2;
> >> +                       }
> >>                  }
> >>          } else {
> >>                  if (!conn) {
> >> --
> >> 2.43.0
> >>
> >
> >
>
> Cheers,
> Jonas



--
Luiz Augusto von Dentz
  
Jonas Dreßler Jan. 7, 2024, 10:20 p.m. UTC | #4
Hi Luiz,

On 1/5/24 17:05, Luiz Augusto von Dentz wrote:
> Hi Jonas,
> 
> On Fri, Jan 5, 2024 at 10:54 AM Jonas Dreßler <verdre@v0yd.nl> wrote:
>>
>> Hi Luiz,
>>
>> On 1/3/24 17:05, Luiz Augusto von Dentz wrote:
>>> Hi Jonas,
>>>
>>> On Tue, Jan 2, 2024 at 1:59 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
>>>>
>>>> Since commit 4c67bc74f016b0d360b8573e18969c0ff7926974, we retry connecting
>>>> later when we get a "Command Disallowed" error returned by "Create
>>>> Connection".
>>>>
>>>> In this commit the intention was to retry only once, and give up if we see
>>>> "Command Disallowed" again on the second try.
>>>>
>>>> This made sense back then when the retry was initiated *only* from the
>>>> "Connect Complete" event. If we received that event, we knew that now the
>>>> card now must have a "free slot" for a new connection request again. These
>>>> days we call hci_conn_check_pending() from a few more places though, and
>>>> in these places we can't really be sure that there's a "free slot" on the
>>>> card, so the second try to "Create Connection" might fail again.
>>>>
>>>> Deal with this by being less strict about these retries and try again
>>>> every time we get "Command Disallowed" errors, removing the limitation to
>>>> only two attempts.
>>>>
>>>> Since this can potentially cause us to enter an endless cycle of
>>>> reconnection attempts, we'll add some guarding against that with the next
>>>> commit.
>>>
>>> Don't see where you are doing such guarding, besides you seem to
>>> assume HCI_ERROR_COMMAND_DISALLOWED would always means the controller
>>> is busy, or something like that, but it could perform the connection
>>> later, but that may not always be the case, thus why I think
>>> reconnecting just a few number of times is better, if you really need
>>> to keep retrying then this needs to be controlled by a policy in
>>> userspace not hardcoded in the kernel, well I can even argument that
>>> perhaps the initial number of reconnection shall be configurable so
>>> one don't have to recompile the kernel if that needs changing.
>>
>> Yes, fair enough, the next commit assumes that COMMAND_DISALLOWED always
>> means busy. The guarding is that we stop retrying as soon as there's no
>> (competing) ongoing connection attempt nor an active inquiry, which
>> should eventually be the case no matter what, no?
>>
>> I agree it's probably still better to not rely on this fairly complex
>> sanity check and keep the checking of attempts nonetheless.
>>
>> I think we could keep doing that if we check for
>> !hci_conn_hash_lookup_state(hdev, ACL_LINK, BT_CONNECT) &&
>> !test_bit(HCI_INQUIRY, &hdev->flags) in hci_conn_check_pending() before
>> we actually retry, to make sure the retry counter doesn't get
>> incremented wrongly. I'll give that a try.
> 
> Perhaps I'm missing something, but it should be possible to block
> concurrent access to HCI while a command is pending with use of
> hci_cmd_sync, at least on LE we do that by waiting the connection
> complete event so connection attempts are serialized but we don't seem
> to be doing the same for BR/EDR.
> 

That's a good point, it might even allow for a nice little cleanup 
because we can then cancel inquiries in hci_acl_create_connection() 
synchronously, too.

Question is just whether there's any devices out there that actually do 
support paging with more than a single device at a time and if we want 
to support that?

> 
>>>
>>>> Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
>>>> ---
>>>>    net/bluetooth/hci_event.c | 7 ++++---
>>>>    1 file changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
>>>> index e8b4a0126..e1f5b6f90 100644
>>>> --- a/net/bluetooth/hci_event.c
>>>> +++ b/net/bluetooth/hci_event.c
>>>> @@ -2323,12 +2323,13 @@ static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status)
>>>>
>>>>           if (status) {
>>>>                   if (conn && conn->state == BT_CONNECT) {
>>>> -                       if (status != HCI_ERROR_COMMAND_DISALLOWED || conn->attempt > 2) {
>>>> +                       if (status == HCI_ERROR_COMMAND_DISALLOWED) {
>>>> +                               conn->state = BT_CONNECT2;
>>>> +                       } else {
>>>>                                   conn->state = BT_CLOSED;
>>>>                                   hci_connect_cfm(conn, status);
>>>>                                   hci_conn_del(conn);
>>>> -                       } else
>>>> -                               conn->state = BT_CONNECT2;
>>>> +                       }
>>>>                   }
>>>>           } else {
>>>>                   if (!conn) {
>>>> --
>>>> 2.43.0
>>>>
>>>
>>>
>>
>> Cheers,
>> Jonas
> 
> 
> 
> --
> Luiz Augusto von Dentz
  
Luiz Augusto von Dentz Jan. 7, 2024, 11:53 p.m. UTC | #5
Hi Jonas,

On Sun, Jan 7, 2024 at 5:20 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
>
> Hi Luiz,
>
> On 1/5/24 17:05, Luiz Augusto von Dentz wrote:
> > Hi Jonas,
> >
> > On Fri, Jan 5, 2024 at 10:54 AM Jonas Dreßler <verdre@v0yd.nl> wrote:
> >>
> >> Hi Luiz,
> >>
> >> On 1/3/24 17:05, Luiz Augusto von Dentz wrote:
> >>> Hi Jonas,
> >>>
> >>> On Tue, Jan 2, 2024 at 1:59 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
> >>>>
> >>>> Since commit 4c67bc74f016b0d360b8573e18969c0ff7926974, we retry connecting
> >>>> later when we get a "Command Disallowed" error returned by "Create
> >>>> Connection".
> >>>>
> >>>> In this commit the intention was to retry only once, and give up if we see
> >>>> "Command Disallowed" again on the second try.
> >>>>
> >>>> This made sense back then when the retry was initiated *only* from the
> >>>> "Connect Complete" event. If we received that event, we knew that now the
> >>>> card now must have a "free slot" for a new connection request again. These
> >>>> days we call hci_conn_check_pending() from a few more places though, and
> >>>> in these places we can't really be sure that there's a "free slot" on the
> >>>> card, so the second try to "Create Connection" might fail again.
> >>>>
> >>>> Deal with this by being less strict about these retries and try again
> >>>> every time we get "Command Disallowed" errors, removing the limitation to
> >>>> only two attempts.
> >>>>
> >>>> Since this can potentially cause us to enter an endless cycle of
> >>>> reconnection attempts, we'll add some guarding against that with the next
> >>>> commit.
> >>>
> >>> Don't see where you are doing such guarding, besides you seem to
> >>> assume HCI_ERROR_COMMAND_DISALLOWED would always means the controller
> >>> is busy, or something like that, but it could perform the connection
> >>> later, but that may not always be the case, thus why I think
> >>> reconnecting just a few number of times is better, if you really need
> >>> to keep retrying then this needs to be controlled by a policy in
> >>> userspace not hardcoded in the kernel, well I can even argument that
> >>> perhaps the initial number of reconnection shall be configurable so
> >>> one don't have to recompile the kernel if that needs changing.
> >>
> >> Yes, fair enough, the next commit assumes that COMMAND_DISALLOWED always
> >> means busy. The guarding is that we stop retrying as soon as there's no
> >> (competing) ongoing connection attempt nor an active inquiry, which
> >> should eventually be the case no matter what, no?
> >>
> >> I agree it's probably still better to not rely on this fairly complex
> >> sanity check and keep the checking of attempts nonetheless.
> >>
> >> I think we could keep doing that if we check for
> >> !hci_conn_hash_lookup_state(hdev, ACL_LINK, BT_CONNECT) &&
> >> !test_bit(HCI_INQUIRY, &hdev->flags) in hci_conn_check_pending() before
> >> we actually retry, to make sure the retry counter doesn't get
> >> incremented wrongly. I'll give that a try.
> >
> > Perhaps I'm missing something, but it should be possible to block
> > concurrent access to HCI while a command is pending with use of
> > hci_cmd_sync, at least on LE we do that by waiting the connection
> > complete event so connection attempts are serialized but we don't seem
> > to be doing the same for BR/EDR.
> >
>
> That's a good point, it might even allow for a nice little cleanup
> because we can then cancel inquiries in hci_acl_create_connection()
> synchronously, too.
>
> Question is just whether there's any devices out there that actually do
> support paging with more than a single device at a time and if we want
> to support that?

I think the controller would serialize the paging anyway, so the fact
that we would serialize it on the host side might actually means that
we have a common behavior across controllers rather than having to
handle errors when the controller is not capable of serializing
connections by themselves.

> >
> >>>
> >>>> Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
> >>>> ---
> >>>>    net/bluetooth/hci_event.c | 7 ++++---
> >>>>    1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> >>>> index e8b4a0126..e1f5b6f90 100644
> >>>> --- a/net/bluetooth/hci_event.c
> >>>> +++ b/net/bluetooth/hci_event.c
> >>>> @@ -2323,12 +2323,13 @@ static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status)
> >>>>
> >>>>           if (status) {
> >>>>                   if (conn && conn->state == BT_CONNECT) {
> >>>> -                       if (status != HCI_ERROR_COMMAND_DISALLOWED || conn->attempt > 2) {
> >>>> +                       if (status == HCI_ERROR_COMMAND_DISALLOWED) {
> >>>> +                               conn->state = BT_CONNECT2;
> >>>> +                       } else {
> >>>>                                   conn->state = BT_CLOSED;
> >>>>                                   hci_connect_cfm(conn, status);
> >>>>                                   hci_conn_del(conn);
> >>>> -                       } else
> >>>> -                               conn->state = BT_CONNECT2;
> >>>> +                       }
> >>>>                   }
> >>>>           } else {
> >>>>                   if (!conn) {
> >>>> --
> >>>> 2.43.0
> >>>>
> >>>
> >>>
> >>
> >> Cheers,
> >> Jonas
> >
> >
> >
> > --
> > Luiz Augusto von Dentz
  

Patch

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index e8b4a0126..e1f5b6f90 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -2323,12 +2323,13 @@  static void hci_cs_create_conn(struct hci_dev *hdev, __u8 status)
 
 	if (status) {
 		if (conn && conn->state == BT_CONNECT) {
-			if (status != HCI_ERROR_COMMAND_DISALLOWED || conn->attempt > 2) {
+			if (status == HCI_ERROR_COMMAND_DISALLOWED) {
+				conn->state = BT_CONNECT2;
+			} else {
 				conn->state = BT_CLOSED;
 				hci_connect_cfm(conn, status);
 				hci_conn_del(conn);
-			} else
-				conn->state = BT_CONNECT2;
+			}
 		}
 	} else {
 		if (!conn) {