[BlueZ] Bluetooth: hci_core: Skip hci_cmd_work if hci_request is pending

Message ID 20240214012328.BlueZ.1.I10eda6f375bc8dfedf4eef3a8cf6572c65803afc@changeid
State New
Headers
Series [BlueZ] Bluetooth: hci_core: Skip hci_cmd_work if hci_request is pending |

Commit Message

Hsin-chen Chuang Feb. 13, 2024, 5:23 p.m. UTC
  hci_cmd_work overwrites the hdev->sent_cmd which contains the required
info for a hci_request to work. In the real world, it's observed that
a request from hci_le_ext_create_conn_sync could be interrupted by
the authentication (hci_conn_auth) caused by rfcomm_sock_connect. When
it happends, hci_le_ext_create_conn_sync hangs until timeout; If the
LE connection is triggered by MGMT, it freezes the whole MGMT interface.

Signed-off-by: Hsin-chen Chuang <chharry@chromium.org>
---

 net/bluetooth/hci_core.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
  

Comments

Luiz Augusto von Dentz Feb. 15, 2024, 9:37 p.m. UTC | #1
Hi Hsin-chen,

On Tue, Feb 13, 2024 at 12:24 PM Hsin-chen Chuang <chharry@chromiumorg> wrote:
>
> hci_cmd_work overwrites the hdev->sent_cmd which contains the required
> info for a hci_request to work. In the real world, it's observed that
> a request from hci_le_ext_create_conn_sync could be interrupted by
> the authentication (hci_conn_auth) caused by rfcomm_sock_connect. When
> it happends, hci_le_ext_create_conn_sync hangs until timeout; If the
> LE connection is triggered by MGMT, it freezes the whole MGMT interface.
>
> Signed-off-by: Hsin-chen Chuang <chharry@chromium.org>
> ---
>
>  net/bluetooth/hci_core.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 34c8dca2069f..e3706889976d 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -4213,8 +4213,11 @@ static void hci_cmd_work(struct work_struct *work)
>         BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name,
>                atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q));
>
> -       /* Send queued commands */
> -       if (atomic_read(&hdev->cmd_cnt)) {
> +       /* Send queued commands. Don't send the command when there is a pending
> +        * hci_request because the request callbacks would be overwritten.
> +        */
> +       if (atomic_read(&hdev->cmd_cnt) &&
> +           !hci_dev_test_flag(hdev, HCI_CMD_PENDING)) {
>                 skb = skb_dequeue(&hdev->cmd_q);
>                 if (!skb)
>                         return;
> --
> 2.43.0.687.g38aa6559b0-goog


This seems to be causing some mgmt-tester failures:

Pair Device - Sec Mode 3 Success 1                   Timed out   22.753 seconds
Pair Device - Sec Mode 3 Reject 1                    Timed out   22.533 seconds
Pair Device - Sec Mode 3 Reject 2                    Timed out   22.526 seconds

I think this is because we need to respond to an event with a command like:

< HCI Command: Create Conn.. (0x01|0x0005) plen 13  #241 [hci0] 16:25:38.699066
        Address: 00:AA:01:01:00:00 (Intel Corporation)
        Packet type: 0x0018
          DM1 may be used
          DH1 may be used
        Page scan repetition mode: R2 (0x02)
        Page scan mode: Mandatory (0x00)
        Clock offset: 0x0000
        Role switch: Allow peripheral (0x01)
> HCI Event: Command Status (0x0f) plen 4           #242 [hci0] 16:25:38.701881
      Create Connection (0x01|0x0005) ncmd 1
        Status: Success (0x00)
> HCI Event: Link Key Request (0x17) plen 6         #243 [hci0] 16:25:38.702375
        Address: 00:AA:01:01:00:00 (Intel Corporation)

But because Create Connection is pending we cannot respond to Link Key
Request, so it is actually a design problem if we cannot send commands
because something is pending so perhaps we need to redesign how we
store cmd_sent so we can have multiple outstanding commands rather
than just one.
  
Luiz Augusto von Dentz Feb. 16, 2024, 3:54 p.m. UTC | #2
Hi Hsin-chen,

On Thu, Feb 15, 2024 at 11:21 PM Hsin-chen Chuang <chharry@google.com> wrote:
>
> +Some Googlers who would be interested in
>
> Hi Luiz,
>
> How about moving the hci_req-related data out from sent_cmd? This allows sending HCI commands while hci_req data would not be overwritten.

I have something like the following in the works:

https://gist.github.com/Vudentz/251275bb688fac32585f90ac0076c407

It is not stable yet, but I think we can get away with it since it
just means we can keep the pending request stored in the req_skb, that
said we might need to overhaul this design since it is not very clean
in my opinion.

> On Fri, Feb 16, 2024 at 5:37 AM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote:
>>
>> Hi Hsin-chen,
>>
>> On Tue, Feb 13, 2024 at 12:24 PM Hsin-chen Chuang <chharry@chromium.org> wrote:
>> >
>> > hci_cmd_work overwrites the hdev->sent_cmd which contains the required
>> > info for a hci_request to work. In the real world, it's observed that
>> > a request from hci_le_ext_create_conn_sync could be interrupted by
>> > the authentication (hci_conn_auth) caused by rfcomm_sock_connect. When
>> > it happends, hci_le_ext_create_conn_sync hangs until timeout; If the
>> > LE connection is triggered by MGMT, it freezes the whole MGMT interface.
>> >
>> > Signed-off-by: Hsin-chen Chuang <chharry@chromium.org>
>> > ---
>> >
>> >  net/bluetooth/hci_core.c | 7 +++++--
>> >  1 file changed, 5 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> > index 34c8dca2069f..e3706889976d 100644
>> > --- a/net/bluetooth/hci_core.c
>> > +++ b/net/bluetooth/hci_core.c
>> > @@ -4213,8 +4213,11 @@ static void hci_cmd_work(struct work_struct *work)
>> >         BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name,
>> >                atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q));
>> >
>> > -       /* Send queued commands */
>> > -       if (atomic_read(&hdev->cmd_cnt)) {
>> > +       /* Send queued commands. Don't send the command when there is a pending
>> > +        * hci_request because the request callbacks would be overwritten.
>> > +        */
>> > +       if (atomic_read(&hdev->cmd_cnt) &&
>> > +           !hci_dev_test_flag(hdev, HCI_CMD_PENDING)) {
>> >                 skb = skb_dequeue(&hdev->cmd_q);
>> >                 if (!skb)
>> >                         return;
>> > --
>> > 2.43.0.687.g38aa6559b0-goog
>>
>>
>> This seems to be causing some mgmt-tester failures:
>>
>> Pair Device - Sec Mode 3 Success 1                   Timed out   22.753 seconds
>> Pair Device - Sec Mode 3 Reject 1                    Timed out   22.533 seconds
>> Pair Device - Sec Mode 3 Reject 2                    Timed out   22.526 seconds
>>
>> I think this is because we need to respond to an event with a command like:
>>
>> < HCI Command: Create Conn.. (0x01|0x0005) plen 13  #241 [hci0] 16:25:38699066
>>         Address: 00:AA:01:01:00:00 (Intel Corporation)
>>         Packet type: 0x0018
>>           DM1 may be used
>>           DH1 may be used
>>         Page scan repetition mode: R2 (0x02)
>>         Page scan mode: Mandatory (0x00)
>>         Clock offset: 0x0000
>>         Role switch: Allow peripheral (0x01)
>> > HCI Event: Command Status (0x0f) plen 4           #242 [hci0] 16:25:38701881
>>       Create Connection (0x01|0x0005) ncmd 1
>>         Status: Success (0x00)
>> > HCI Event: Link Key Request (0x17) plen 6         #243 [hci0] 16:25:38702375
>>         Address: 00:AA:01:01:00:00 (Intel Corporation)
>>
>> But because Create Connection is pending we cannot respond to Link Key
>> Request, so it is actually a design problem if we cannot send commands
>> because something is pending so perhaps we need to redesign how we
>> store cmd_sent so we can have multiple outstanding commands rather
>> than just one.
>>
>> --
>> Luiz Augusto von Dentz
>
>
>
> --
> Best Regards,
> Hsin-chen
  

Patch

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 34c8dca2069f..e3706889976d 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -4213,8 +4213,11 @@  static void hci_cmd_work(struct work_struct *work)
 	BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name,
 	       atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q));
 
-	/* Send queued commands */
-	if (atomic_read(&hdev->cmd_cnt)) {
+	/* Send queued commands. Don't send the command when there is a pending
+	 * hci_request because the request callbacks would be overwritten.
+	 */
+	if (atomic_read(&hdev->cmd_cnt) &&
+	    !hci_dev_test_flag(hdev, HCI_CMD_PENDING)) {
 		skb = skb_dequeue(&hdev->cmd_q);
 		if (!skb)
 			return;