[for-next,v3,2/2] RDMA/rxe: Fix mr leak in RESPST_ERR_RNR

Message ID: 20221024052049.20577-1-lizhijian@fujitsu.com
State: New
Series: [for-next,v3,1/2] RDMA/rxe: Remove unnecessary mr testing

Commit Message

Zhijian Li (Fujitsu) Oct. 24, 2022, 5:20 a.m. UTC
rxe_recheck_mr() increases the mr's ref_cnt, so we should call rxe_put(mr)
to drop that reference on the RESPST_ERR_RNR path to avoid the warning below:
[  633.447883] WARNING: CPU: 0 PID: 4156 at drivers/infiniband/sw/rxe/rxe_pool.c:259 __rxe_cleanup+0x1df/0x240 [rdma_rxe]
...
[  633.509482] Call Trace:
[  633.510246]  <TASK>
[  633.510962]  rxe_dereg_mr+0x4c/0x60 [rdma_rxe]
[  633.512123]  ib_dereg_mr_user+0xa8/0x200 [ib_core]
[  633.513444]  ib_mr_pool_destroy+0x77/0xb0 [ib_core]
[  633.514763]  nvme_rdma_destroy_queue_ib+0x89/0x240 [nvme_rdma]
[  633.516230]  nvme_rdma_free_queue+0x40/0x50 [nvme_rdma]
[  633.517577]  nvme_rdma_teardown_io_queues.part.0+0xc3/0x120 [nvme_rdma]
[  633.519204]  nvme_rdma_error_recovery_work+0x4d/0xf0 [nvme_rdma]
[  633.520695]  process_one_work+0x582/0xa40
[  633.522987]  ? pwq_dec_nr_in_flight+0x100/0x100
[  633.524227]  ? rwlock_bug.part.0+0x60/0x60
[  633.525372]  worker_thread+0x2a9/0x700
[  633.526437]  ? process_one_work+0xa40/0xa40
[  633.527589]  kthread+0x168/0x1a0
[  633.528518]  ? kthread_complete_and_exit+0x20/0x20
[  633.529792]  ret_from_fork+0x22/0x30

CC: Bob Pearson <rpearsonhpe@gmail.com>
Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
V2: remove mr testing
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
  

Comments

Leon Romanovsky Oct. 24, 2022, 11:59 a.m. UTC | #1
On Mon, Oct 24, 2022 at 01:20:49PM +0800, Li Zhijian wrote:
> rxe_recheck_mr() will increase mr's ref_cnt, so we should call rxe_put(mr)
> to drop mr's ref_cnt in RESPST_ERR_RNR to avoid below warning:
> [  633.447883] WARNING: CPU: 0 PID: 4156 at drivers/infiniband/sw/rxe/rxe_pool.c:259 __rxe_cleanup+0x1df/0x240 [rdma_rxe]
> ...
> 
> CC: Bob Pearson <rpearsonhpe@gmail.com>
> Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources")
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> V2: remove mr testing

This should be after ---

> ---
>  drivers/infiniband/sw/rxe/rxe_resp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
  
Bob Pearson Oct. 24, 2022, 2:15 p.m. UTC | #2
On 10/24/22 00:20, Li Zhijian wrote:
> rxe_recheck_mr() will increase mr's ref_cnt, so we should call rxe_put(mr)
> to drop mr's ref_cnt in RESPST_ERR_RNR to avoid below warning:
> [  633.447883] WARNING: CPU: 0 PID: 4156 at drivers/infiniband/sw/rxe/rxe_pool.c:259 __rxe_cleanup+0x1df/0x240 [rdma_rxe]
> ...
> 
> CC: Bob Pearson <rpearsonhpe@gmail.com>
> Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources")
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> V2: remove mr testing
> ---
>  drivers/infiniband/sw/rxe/rxe_resp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index b02639cf8cba..41250154a478 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -806,8 +806,10 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>  
>  	skb = prepare_ack_packet(qp, &ack_pkt, opcode, payload,
>  				 res->cur_psn, AETH_ACK_UNLIMITED);
> -	if (!skb)
> +	if (!skb) {
> +		rxe_put(mr);
>  		return RESPST_ERR_RNR;
> +	}
>  
>  	rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>  		    payload, RXE_FROM_MR_OBJ);

This is correct. Good catch. It needs cleanup per Leon; otherwise it's good.

Bob
  
Jason Gunthorpe Oct. 24, 2022, 5:13 p.m. UTC | #3
On Mon, Oct 24, 2022 at 01:20:49PM +0800, Li Zhijian wrote:
> rxe_recheck_mr() will increase mr's ref_cnt, so we should call rxe_put(mr)
> to drop mr's ref_cnt in RESPST_ERR_RNR to avoid below warning:
> [  633.447883] WARNING: CPU: 0 PID: 4156 at drivers/infiniband/sw/rxe/rxe_pool.c:259 __rxe_cleanup+0x1df/0x240 [rdma_rxe]
> ...
> 
> CC: Bob Pearson <rpearsonhpe@gmail.com>
> Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources")
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> V2: remove mr testing
> ---
>  drivers/infiniband/sw/rxe/rxe_resp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Sigh, please try to avoid sending patches in a way that causes
patchwork to become confused. I updated things to remove the if, as in
this v2.

Jason
  
Zhijian Li (Fujitsu) Oct. 25, 2022, 1:05 a.m. UTC | #4
On 25/10/2022 01:13, Jason Gunthorpe wrote:
> On Mon, Oct 24, 2022 at 01:20:49PM +0800, Li Zhijian wrote:
>> rxe_recheck_mr() will increase mr's ref_cnt, so we should call rxe_put(mr)
>> to drop mr's ref_cnt in RESPST_ERR_RNR to avoid below warning:
>> [  633.447883] WARNING: CPU: 0 PID: 4156 at drivers/infiniband/sw/rxe/rxe_pool.c:259 __rxe_cleanup+0x1df/0x240 [rdma_rxe]
>> ...
>>
>> CC: Bob Pearson <rpearsonhpe@gmail.com>
>> Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources")
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> V2: remove mr testing
>> ---
>>   drivers/infiniband/sw/rxe/rxe_resp.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
> Sigh, please try to avoid sending patches in a way that causes
> patchworks to become confused.

Understood
> I updated things to remove the if as in
> this v2.

thanks a lot.

>
> Jason
  

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index b02639cf8cba..41250154a478 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -806,8 +806,10 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 
 	skb = prepare_ack_packet(qp, &ack_pkt, opcode, payload,
 				 res->cur_psn, AETH_ACK_UNLIMITED);
-	if (!skb)
+	if (!skb) {
+		rxe_put(mr);
 		return RESPST_ERR_RNR;
+	}
 
 	rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
 		    payload, RXE_FROM_MR_OBJ);