net: qrtr: Do not do DEL_SERVER broadcast after DEL_CLIENT

Message ID 1680095250-21032-1-git-send-email-quic_srichara@quicinc.com
State New
Series net: qrtr: Do not do DEL_SERVER broadcast after DEL_CLIENT

Commit Message

Sricharan Ramabadhran March 29, 2023, 1:07 p.m. UTC
When the qrtr socket is released, qrtr_port_remove() gets called, which
broadcasts a DEL_CLIENT. After this, a DEL_SERVER is also broadcast,
which becomes a no-op but triggers the error message below:

"failed while handling packet from 2:-2"

The remote node has already acted on the DEL_CLIENT, so when it then
receives the DEL_SERVER for the same port, it returns -ENOENT.

Fix it by not sending a DEL_SERVER to the remote when a DEL_CLIENT
was sent for that port.

Signed-off-by: Ram Kumar D <quic_ramd@quicinc.com>
Signed-off-by: Sricharan R <quic_srichara@quicinc.com>
---
Note: Functionally tested on 5.4 kernel and compile tested on 6.3 TOT

 net/qrtr/ns.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)
  

Comments

Jakub Kicinski March 30, 2023, 4:32 a.m. UTC | #1
On Wed, 29 Mar 2023 18:37:30 +0530 Sricharan R wrote:
> When the qrtr socket is released, qrtr_port_remove gets called, which
> broadcasts a DEL_CLIENT. After this DEL_SERVER is also additionally
> broadcasted, which becomes NOP, but triggers the below error msg.
> 
> "failed while handling packet from 2:-2", since remote node already
> acted upon on receiving the DEL_CLIENT, once again when it receives
> the DEL_SERVER, it returns -ENOENT.
> 
> Fixing it by not sending a 'DEL_SERVER' to remote when a 'DEL_CLIENT'
> was sent for that port.

You use the word "fix" so please add a Fixes tag.

> Signed-off-by: Ram Kumar D <quic_ramd@quicinc.com>
> Signed-off-by: Sricharan R <quic_srichara@quicinc.com>

Spell out full names, please.
  
Manivannan Sadhasivam March 30, 2023, 6:24 a.m. UTC | #2
On Wed, Mar 29, 2023 at 06:37:30PM +0530, Sricharan R wrote:
> When the qrtr socket is released, qrtr_port_remove gets called, which
> broadcasts a DEL_CLIENT. After this DEL_SERVER is also additionally
> broadcasted, which becomes NOP, but triggers the below error msg.
> 
> "failed while handling packet from 2:-2", since remote node already
> acted upon on receiving the DEL_CLIENT, once again when it receives
> the DEL_SERVER, it returns -ENOENT.
> 
> Fixing it by not sending a 'DEL_SERVER' to remote when a 'DEL_CLIENT'
> was sent for that port.
> 

Can you share the qrtr trace when this happens to help me understand the flow?

- Mani

> Signed-off-by: Ram Kumar D <quic_ramd@quicinc.com>
> Signed-off-by: Sricharan R <quic_srichara@quicinc.com>
> ---
> Note: Functionally tested on 5.4 kernel and compile tested on 6.3 TOT
> 
>  net/qrtr/ns.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c
> index 722936f..6fbb195 100644
> --- a/net/qrtr/ns.c
> +++ b/net/qrtr/ns.c
> @@ -274,7 +274,7 @@ static struct qrtr_server *server_add(unsigned int service,
>  	return NULL;
>  }
>  
> -static int server_del(struct qrtr_node *node, unsigned int port)
> +static int server_del(struct qrtr_node *node, unsigned int port, bool del_server)
>  {
>  	struct qrtr_lookup *lookup;
>  	struct qrtr_server *srv;
> @@ -287,7 +287,7 @@ static int server_del(struct qrtr_node *node, unsigned int port)
>  	radix_tree_delete(&node->servers, port);
>  
>  	/* Broadcast the removal of local servers */
> -	if (srv->node == qrtr_ns.local_node)
> +	if (srv->node == qrtr_ns.local_node && del_server)
>  		service_announce_del(&qrtr_ns.bcast_sq, srv);
>  
>  	/* Announce the service's disappearance to observers */
> @@ -373,7 +373,7 @@ static int ctrl_cmd_bye(struct sockaddr_qrtr *from)
>  		}
>  		slot = radix_tree_iter_resume(slot, &iter);
>  		rcu_read_unlock();
> -		server_del(node, srv->port);
> +		server_del(node, srv->port, true);
>  		rcu_read_lock();
>  	}
>  	rcu_read_unlock();
> @@ -459,10 +459,14 @@ static int ctrl_cmd_del_client(struct sockaddr_qrtr *from,
>  		kfree(lookup);
>  	}
>  
> -	/* Remove the server belonging to this port */
> +	/* Remove the server belonging to this port
> +	 * Given that DEL_CLIENT is already broadcasted
> +	 * by port_remove, no need to send DEL_SERVER for
> +	 * the same port to remote
> +	 */
>  	node = node_get(node_id);
>  	if (node)
> -		server_del(node, port);
> +		server_del(node, port, false);
>  
>  	/* Advertise the removal of this client to all local servers */
>  	local_node = node_get(qrtr_ns.local_node);
> @@ -567,7 +571,7 @@ static int ctrl_cmd_del_server(struct sockaddr_qrtr *from,
>  	if (!node)
>  		return -ENOENT;
>  
> -	return server_del(node, port);
> +	return server_del(node, port, true);
>  }
>  
>  static int ctrl_cmd_new_lookup(struct sockaddr_qrtr *from,
> -- 
> 2.7.4
>
  
Sricharan Ramabadhran March 30, 2023, 9:48 a.m. UTC | #3
On 3/30/2023 11:54 AM, Manivannan Sadhasivam wrote:
> On Wed, Mar 29, 2023 at 06:37:30PM +0530, Sricharan R wrote:
>> When the qrtr socket is released, qrtr_port_remove gets called, which
>> broadcasts a DEL_CLIENT. After this DEL_SERVER is also additionally
>> broadcasted, which becomes NOP, but triggers the below error msg.
>>
>> "failed while handling packet from 2:-2", since remote node already
>> acted upon on receiving the DEL_CLIENT, once again when it receives
>> the DEL_SERVER, it returns -ENOENT.
>>
>> Fixing it by not sending a 'DEL_SERVER' to remote when a 'DEL_CLIENT'
>> was sent for that port.
>>
> 
> Can you share the qrtr trace when this happens to help me understand the flow?

    Flow is like this.

     IPQ                                        SDX
     ---                                        ---
                                                qrtr_release
                                                qrtr_port_remove
                                                qrtr_send_del_client
                                                     |
                                                     |
    RX CTRL: cmd:0x6 addr[0x2:0x40d4] <--------------|
     (qrtr_send_client broadcasts it to              |
      the remote,                                    |
      IPQ cleans up the port)                        |
                                                     |
                                                ctrl_cmd_del_client
                                                (send_del_client
                                                 also forwards the
                                                 DEL_CLIENT to the
                                                 internal ns.c,
                                                 which then again
                                                 sends DEL_SERVER
                                                 for the same port
                                                 to the remote)
                                                     |
                                                     |
    RX CTRL: cmd:0x5 SVC[0x1389:0x1]                 |
      addr[0x2:0x40d4] <-----------------------------|
      (IPQ on receiving the DEL_SERVER on
       same port throws the message
       "failed while handling packet from 2:-2")


  Regards,
    Sricharan
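
The flow above can be reproduced in a small userspace sketch. This is not the kernel code: the `sim_*` names and the fixed-size table are hypothetical stand-ins for the name service's per-node server tree, and only the ordering issue is modelled. It shows that once DEL_CLIENT has removed the server for a node:port, a later DEL_SERVER for the same node:port finds nothing and returns -ENOENT.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_SERVERS 8

/* Hypothetical stand-in for one server entry known to the host NS. */
struct sim_server {
	unsigned int node;
	unsigned int port;
	bool in_use;
};

static struct sim_server sim_table[SIM_MAX_SERVERS];

/* Register a server for node:port; returns 0 or -ENOSPC if the table is full. */
int sim_server_add(unsigned int node, unsigned int port)
{
	for (size_t i = 0; i < SIM_MAX_SERVERS; i++) {
		if (!sim_table[i].in_use) {
			sim_table[i] = (struct sim_server){ node, port, true };
			return 0;
		}
	}
	return -ENOSPC;
}

/* Remove the server for node:port; returns -ENOENT if there is none. */
int sim_server_del(unsigned int node, unsigned int port)
{
	for (size_t i = 0; i < SIM_MAX_SERVERS; i++) {
		if (sim_table[i].in_use && sim_table[i].node == node &&
		    sim_table[i].port == port) {
			sim_table[i].in_use = false;
			return 0;
		}
	}
	return -ENOENT;
}

/* DEL_CLIENT: drop the server owned by this port; a missing server is fine. */
int sim_handle_del_client(unsigned int node, unsigned int port)
{
	(void)sim_server_del(node, port);
	return 0;
}

/* DEL_SERVER: the lookup result is propagated, so a repeat delete fails. */
int sim_handle_del_server(unsigned int node, unsigned int port)
{
	return sim_server_del(node, port);
}
```

With a server registered for 2:0x40d4, calling sim_handle_del_client(2, 0x40d4) followed by sim_handle_del_server(2, 0x40d4) yields 0 and then -ENOENT, which corresponds to the two RX CTRL packets in the diagram.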
  
Sricharan Ramabadhran March 30, 2023, 9:58 a.m. UTC | #4
On 3/30/2023 10:02 AM, Jakub Kicinski wrote:
> On Wed, 29 Mar 2023 18:37:30 +0530 Sricharan R wrote:
>> When the qrtr socket is released, qrtr_port_remove gets called, which
>> broadcasts a DEL_CLIENT. After this DEL_SERVER is also additionally
>> broadcasted, which becomes NOP, but triggers the below error msg.
>>
>> "failed while handling packet from 2:-2", since remote node already
>> acted upon on receiving the DEL_CLIENT, once again when it receives
>> the DEL_SERVER, it returns -ENOENT.
>>
>> Fixing it by not sending a 'DEL_SERVER' to remote when a 'DEL_CLIENT'
>> was sent for that port.
> 
> You use the word "fix" so please add a Fixes tag.
> 

  ok

>> Signed-off-by: Ram Kumar D <quic_ramd@quicinc.com>
>> Signed-off-by: Sricharan R <quic_srichara@quicinc.com>
> 
> Spell out full names, please.

  ok

Regards,
  Sricharan
  
Manivannan Sadhasivam March 30, 2023, 12:39 p.m. UTC | #5
On Wed, Mar 29, 2023 at 06:37:30PM +0530, Sricharan R wrote:
> When the qrtr socket is released, qrtr_port_remove gets called, which
> broadcasts a DEL_CLIENT. After this DEL_SERVER is also additionally
> broadcasted, which becomes NOP, but triggers the below error msg.
> 
> "failed while handling packet from 2:-2", since remote node already
> acted upon on receiving the DEL_CLIENT, once again when it receives
> the DEL_SERVER, it returns -ENOENT.
> 
> Fixing it by not sending a 'DEL_SERVER' to remote when a 'DEL_CLIENT'
> was sent for that port.
> 

How about:

"On the remote side, when QRTR socket is removed, af_qrtr will call
qrtr_port_remove() which broadcasts the DEL_CLIENT packet to all neighbours
including local NS. NS upon receiving the DEL_CLIENT packet, will remove
the lookups associated with the node:port and broadcasts the DEL_SERVER
packet.

But on the host side, due to the arrival of the DEL_CLIENT packet, the NS
would've already deleted the server belonging to that port. So when the
remote's NS again broadcasts the DEL_SERVER for that port, it throws below
error message on the host:

"failed while handling packet from 2:-2"

So fix this error by not broadcasting the DEL_SERVER packet when the
DEL_CLIENT packet gets processed."

> Signed-off-by: Ram Kumar D <quic_ramd@quicinc.com>
> Signed-off-by: Sricharan R <quic_srichara@quicinc.com>
> ---
> Note: Functionally tested on 5.4 kernel and compile tested on 6.3 TOT
> 
>  net/qrtr/ns.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c
> index 722936f..6fbb195 100644
> --- a/net/qrtr/ns.c
> +++ b/net/qrtr/ns.c
> @@ -274,7 +274,7 @@ static struct qrtr_server *server_add(unsigned int service,
>  	return NULL;
>  }
>  
> -static int server_del(struct qrtr_node *node, unsigned int port)
> +static int server_del(struct qrtr_node *node, unsigned int port, bool del_server)

s/bool del_server/bool bcast/g

>  {
>  	struct qrtr_lookup *lookup;
>  	struct qrtr_server *srv;
> @@ -287,7 +287,7 @@ static int server_del(struct qrtr_node *node, unsigned int port)
>  	radix_tree_delete(&node->servers, port);
>  
>  	/* Broadcast the removal of local servers */
> -	if (srv->node == qrtr_ns.local_node)
> +	if (srv->node == qrtr_ns.local_node && del_server)
>  		service_announce_del(&qrtr_ns.bcast_sq, srv);
>  
>  	/* Announce the service's disappearance to observers */
> @@ -373,7 +373,7 @@ static int ctrl_cmd_bye(struct sockaddr_qrtr *from)
>  		}
>  		slot = radix_tree_iter_resume(slot, &iter);
>  		rcu_read_unlock();
> -		server_del(node, srv->port);
> +		server_del(node, srv->port, true);
>  		rcu_read_lock();
>  	}
>  	rcu_read_unlock();
> @@ -459,10 +459,14 @@ static int ctrl_cmd_del_client(struct sockaddr_qrtr *from,
>  		kfree(lookup);
>  	}
>  
> -	/* Remove the server belonging to this port */
> +	/* Remove the server belonging to this port
> +	 * Given that DEL_CLIENT is already broadcasted
> +	 * by port_remove, no need to send DEL_SERVER for
> +	 * the same port to remote
> +	 */

	/*
	 * Remove the server belonging to this port but don't broadcast
	 * DEL_SERVER. Neighbours would've already removed the server belonging
	 * to this port due to the DEL_CLIENT broadcast from qrtr_port_remove().
	 */
- Mani

>  	node = node_get(node_id);
>  	if (node)
> -		server_del(node, port);
> +		server_del(node, port, false);
>  
>  	/* Advertise the removal of this client to all local servers */
>  	local_node = node_get(qrtr_ns.local_node);
> @@ -567,7 +571,7 @@ static int ctrl_cmd_del_server(struct sockaddr_qrtr *from,
>  	if (!node)
>  		return -ENOENT;
>  
> -	return server_del(node, port);
> +	return server_del(node, port, true);
>  }
>  
>  static int ctrl_cmd_new_lookup(struct sockaddr_qrtr *from,
> -- 
> 2.7.4
>
  
Sricharan Ramabadhran March 30, 2023, 7:43 p.m. UTC | #6
On 3/30/2023 6:09 PM, Manivannan Sadhasivam wrote:
> On Wed, Mar 29, 2023 at 06:37:30PM +0530, Sricharan R wrote:
>> When the qrtr socket is released, qrtr_port_remove gets called, which
>> broadcasts a DEL_CLIENT. After this DEL_SERVER is also additionally
>> broadcasted, which becomes NOP, but triggers the below error msg.
>>
>> "failed while handling packet from 2:-2", since remote node already
>> acted upon on receiving the DEL_CLIENT, once again when it receives
>> the DEL_SERVER, it returns -ENOENT.
>>
>> Fixing it by not sending a 'DEL_SERVER' to remote when a 'DEL_CLIENT'
>> was sent for that port.
>>
> 
> How about:
> 
> "On the remote side, when QRTR socket is removed, af_qrtr will call
> qrtr_port_remove() which broadcasts the DEL_CLIENT packet to all neighbours
> including local NS. NS upon receiving the DEL_CLIENT packet, will remove
> the lookups associated with the node:port and broadcasts the DEL_SERVER
> packet.
> 
> But on the host side, due to the arrival of the DEL_CLIENT packet, the NS
> would've already deleted the server belonging to that port. So when the
> remote's NS again broadcasts the DEL_SERVER for that port, it throws below
> error message on the host:
> 
> "failed while handling packet from 2:-2"
> 
> So fix this error by not broadcasting the DEL_SERVER packet when the
> DEL_CLIENT packet gets processed."
> 

   Sure, sounds good. Will change this up and send V2.

>> Signed-off-by: Ram Kumar D <quic_ramd@quicinc.com>
>> Signed-off-by: Sricharan R <quic_srichara@quicinc.com>
>> ---
>> Note: Functionally tested on 5.4 kernel and compile tested on 6.3 TOT
>>

  <...>

>>   
>> -	/* Remove the server belonging to this port */
>> +	/* Remove the server belonging to this port
>> +	 * Given that DEL_CLIENT is already broadcasted
>> +	 * by port_remove, no need to send DEL_SERVER for
>> +	 * the same port to remote
>> +	 */
> 
> 	/*
>   	 * Remove the server belonging to this port but don't broadcast
> 	 * DEL_SERVER. Neighbours would've already removed the server belonging
> 	 * to this port due to the DEL_CLIENT broadcast from qrtr_port_remove().
> 	 */

    Sure, would reword it like above in V2. Thanks.

Regards,
  Sricharan
  

Patch

diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c
index 722936f..6fbb195 100644
--- a/net/qrtr/ns.c
+++ b/net/qrtr/ns.c
@@ -274,7 +274,7 @@  static struct qrtr_server *server_add(unsigned int service,
 	return NULL;
 }
 
-static int server_del(struct qrtr_node *node, unsigned int port)
+static int server_del(struct qrtr_node *node, unsigned int port, bool del_server)
 {
 	struct qrtr_lookup *lookup;
 	struct qrtr_server *srv;
@@ -287,7 +287,7 @@  static int server_del(struct qrtr_node *node, unsigned int port)
 	radix_tree_delete(&node->servers, port);
 
 	/* Broadcast the removal of local servers */
-	if (srv->node == qrtr_ns.local_node)
+	if (srv->node == qrtr_ns.local_node && del_server)
 		service_announce_del(&qrtr_ns.bcast_sq, srv);
 
 	/* Announce the service's disappearance to observers */
@@ -373,7 +373,7 @@  static int ctrl_cmd_bye(struct sockaddr_qrtr *from)
 		}
 		slot = radix_tree_iter_resume(slot, &iter);
 		rcu_read_unlock();
-		server_del(node, srv->port);
+		server_del(node, srv->port, true);
 		rcu_read_lock();
 	}
 	rcu_read_unlock();
@@ -459,10 +459,14 @@  static int ctrl_cmd_del_client(struct sockaddr_qrtr *from,
 		kfree(lookup);
 	}
 
-	/* Remove the server belonging to this port */
+	/* Remove the server belonging to this port
+	 * Given that DEL_CLIENT is already broadcasted
+	 * by port_remove, no need to send DEL_SERVER for
+	 * the same port to remote
+	 */
 	node = node_get(node_id);
 	if (node)
-		server_del(node, port);
+		server_del(node, port, false);
 
 	/* Advertise the removal of this client to all local servers */
 	local_node = node_get(qrtr_ns.local_node);
@@ -567,7 +571,7 @@  static int ctrl_cmd_del_server(struct sockaddr_qrtr *from,
 	if (!node)
 		return -ENOENT;
 
-	return server_del(node, port);
+	return server_del(node, port, true);
 }
 
 static int ctrl_cmd_new_lookup(struct sockaddr_qrtr *from,
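
The effect of the new flag can be sketched outside the kernel as follows. This is a simplified model under stated assumptions, not the code in net/qrtr/ns.c: every `sketch_*` name is hypothetical, a counter stands in for service_announce_del(), and the parameter is called `bcast` as suggested in the review rather than `del_server` as in this version of the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed local node ID, chosen to match the "2:" in the logged message. */
#define SKETCH_LOCAL_NODE 2u

/* Hypothetical stand-in for a registered server. */
struct sketch_server {
	unsigned int node;
	unsigned int port;
	bool registered;
};

/* Counts how many DEL_SERVER broadcasts would have been sent. */
static unsigned int sketch_del_server_broadcasts;

/* Remove a server; broadcast its removal only if it is local and bcast is set. */
int sketch_server_del(struct sketch_server *srv, bool bcast)
{
	if (!srv->registered)
		return -ENOENT;

	srv->registered = false;

	if (srv->node == SKETCH_LOCAL_NODE && bcast)
		sketch_del_server_broadcasts++;

	return 0;
}

/* DEL_CLIENT path: the DEL_CLIENT was already broadcast, so stay quiet. */
int sketch_ctrl_cmd_del_client(struct sketch_server *srv)
{
	(void)sketch_server_del(srv, false);
	return 0;
}

/* DEL_SERVER path: neighbours have not been told yet, so broadcast. */
int sketch_ctrl_cmd_del_server(struct sketch_server *srv)
{
	return sketch_server_del(srv, true);
}

unsigned int sketch_broadcast_count(void)
{
	return sketch_del_server_broadcasts;
}
```

In this model a server removed through sketch_ctrl_cmd_del_client() leaves the broadcast count unchanged, while one removed through sketch_ctrl_cmd_del_server() increments it once, which is the behaviour the patch aims for.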