nfc: virtual_ncidev: Add variable to check if ndev is running

Message ID 20231119164705.1991375-1-phind.uet@gmail.com
State New
Headers
Series nfc: virtual_ncidev: Add variable to check if ndev is running |

Commit Message

Nguyen Dinh Phi Nov. 19, 2023, 4:47 p.m. UTC
  syzbot reported an memory leak that happens when an skb is add to
send_buff after virtual nci closed.
This patch adds a variable to track if the ndev is running before
handling new skb in send function.

Reported-by: syzbot+6eb09d75211863f15e3e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/00000000000075472b06007df4fb@google.com
Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
---
 drivers/nfc/virtual_ncidev.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
  

Comments

Bongsu Jeon Nov. 20, 2023, 4:47 a.m. UTC | #1
On 20/11/2023 01:47, Nguyen Dinh Phi wrote:

> syzbot reported an memory leak that happens when an skb is add to
> send_buff after virtual nci closed.
> This patch adds a variable to track if the ndev is running before
> handling new skb in send function.
> 
> Reported-by: syzbot+6eb09d75211863f15e3e@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/lkml/00000000000075472b06007df4fb@google.com
> Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
> ---
>  drivers/nfc/virtual_ncidev.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nfc/virtual_ncidev.c b/drivers/nfc/virtual_ncidev.c
> index b027be0b0b6f..ac8226db54e2 100644
> --- a/drivers/nfc/virtual_ncidev.c
> +++ b/drivers/nfc/virtual_ncidev.c
> @@ -20,26 +20,31 @@
>                                   NFC_PROTO_ISO14443_MASK | \
>                                   NFC_PROTO_ISO14443_B_MASK | \
>                                   NFC_PROTO_ISO15693_MASK)
> +#define NCIDEV_RUNNING 0
This define isn't used.

>  
>  struct virtual_nci_dev {
>          struct nci_dev *ndev;
>          struct mutex mtx;
>          struct sk_buff *send_buff;
>          struct wait_queue_head wq;
> +        bool running;
>  };
>  
>  static int virtual_nci_open(struct nci_dev *ndev)
>  {
> +        struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
> +
> +        vdev->running = true;
>          return 0;
>  }
>  
>  static int virtual_nci_close(struct nci_dev *ndev)
>  {
>          struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
> -
>          mutex_lock(&vdev->mtx);
>          kfree_skb(vdev->send_buff);
>          vdev->send_buff = NULL;
> +        vdev->running = false;
>          mutex_unlock(&vdev->mtx);
>  
>          return 0;
> @@ -50,7 +55,7 @@ static int virtual_nci_send(struct nci_dev *ndev, struct sk_buff *skb)
>          struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
>  
>          mutex_lock(&vdev->mtx);
> -        if (vdev->send_buff) {
> +        if (vdev->send_buff || !vdev->running) {

Dear Krzysztof,

I agree this defensive code.
But i think NFC submodule has to avoid this situation.(calling send function of closed nci_dev)
Could you check this?

Best regards,
Bongsu
  
Krzysztof Kozlowski Nov. 20, 2023, 9:06 a.m. UTC | #2
On 20/11/2023 05:47, Bongsu Jeon wrote:
> 
> On 20/11/2023 01:47, Nguyen Dinh Phi wrote:
> 
>> syzbot reported an memory leak that happens when an skb is add to
>> send_buff after virtual nci closed.
>> This patch adds a variable to track if the ndev is running before
>> handling new skb in send function.
>>
>> Reported-by: syzbot+6eb09d75211863f15e3e@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/lkml/00000000000075472b06007df4fb@google.com
>> Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
>> ---
>>  drivers/nfc/virtual_ncidev.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nfc/virtual_ncidev.c b/drivers/nfc/virtual_ncidev.c
>> index b027be0b0b6f..ac8226db54e2 100644
>> --- a/drivers/nfc/virtual_ncidev.c
>> +++ b/drivers/nfc/virtual_ncidev.c
>> @@ -20,26 +20,31 @@
>>                                   NFC_PROTO_ISO14443_MASK | \
>>                                   NFC_PROTO_ISO14443_B_MASK | \
>>                                   NFC_PROTO_ISO15693_MASK)
>> +#define NCIDEV_RUNNING 0
> This define isn't used.
> 
>>  
>>  struct virtual_nci_dev {
>>          struct nci_dev *ndev;
>>          struct mutex mtx;
>>          struct sk_buff *send_buff;
>>          struct wait_queue_head wq;
>> +        bool running;
>>  };
>>  
>>  static int virtual_nci_open(struct nci_dev *ndev)
>>  {
>> +        struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
>> +
>> +        vdev->running = true;
>>          return 0;
>>  }
>>  
>>  static int virtual_nci_close(struct nci_dev *ndev)
>>  {
>>          struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
>> -
>>          mutex_lock(&vdev->mtx);
>>          kfree_skb(vdev->send_buff);
>>          vdev->send_buff = NULL;
>> +        vdev->running = false;
>>          mutex_unlock(&vdev->mtx);
>>  
>>          return 0;
>> @@ -50,7 +55,7 @@ static int virtual_nci_send(struct nci_dev *ndev, struct sk_buff *skb)
>>          struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
>>  
>>          mutex_lock(&vdev->mtx);
>> -        if (vdev->send_buff) {
>> +        if (vdev->send_buff || !vdev->running) {
> 
> Dear Krzysztof,
> 
> I agree this defensive code.
> But i think NFC submodule has to avoid this situation.(calling send function of closed nci_dev)
> Could you check this?

This code looks not effective. At this point vdev->send_buff is always
false, so the additional check would not bring any value.

I don't see this fixing anything. Syzbot also does not seem to agree.

Nguyen, please test your patches against syzbot *before* sending them.
If you claim this fixes the report, please provide me the link to syzbot
test results confirming it is fixed.

I looked at syzbot dashboard and do not see this issue fixed with this
patch.

Best regards,
Krzysztof
  
Nguyen Dinh Phi Nov. 20, 2023, 10:39 a.m. UTC | #3
On 20/11/23 5:06 pm, Krzysztof Kozlowski wrote:
> On 20/11/2023 05:47, Bongsu Jeon wrote:
>>
>> On 20/11/2023 01:47, Nguyen Dinh Phi wrote:
>>
>>> syzbot reported an memory leak that happens when an skb is add to
>>> send_buff after virtual nci closed.
>>> This patch adds a variable to track if the ndev is running before
>>> handling new skb in send function.
>>>
>>> Reported-by: syzbot+6eb09d75211863f15e3e@syzkaller.appspotmail.com
>>> Closes: https://lore.kernel.org/lkml/00000000000075472b06007df4fb@google.com
>>> Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
>>> ---
>>>   drivers/nfc/virtual_ncidev.c | 9 +++++++--
>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/nfc/virtual_ncidev.c b/drivers/nfc/virtual_ncidev.c
>>> index b027be0b0b6f..ac8226db54e2 100644
>>> --- a/drivers/nfc/virtual_ncidev.c
>>> +++ b/drivers/nfc/virtual_ncidev.c
>>> @@ -20,26 +20,31 @@
>>>                                    NFC_PROTO_ISO14443_MASK | \
>>>                                    NFC_PROTO_ISO14443_B_MASK | \
>>>                                    NFC_PROTO_ISO15693_MASK)
>>> +#define NCIDEV_RUNNING 0
>> This define isn't used.
>>
>>>   
>>>   struct virtual_nci_dev {
>>>           struct nci_dev *ndev;
>>>           struct mutex mtx;
>>>           struct sk_buff *send_buff;
>>>           struct wait_queue_head wq;
>>> +        bool running;
>>>   };
>>>   
>>>   static int virtual_nci_open(struct nci_dev *ndev)
>>>   {
>>> +        struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
>>> +
>>> +        vdev->running = true;
>>>           return 0;
>>>   }
>>>   
>>>   static int virtual_nci_close(struct nci_dev *ndev)
>>>   {
>>>           struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
>>> -
>>>           mutex_lock(&vdev->mtx);
>>>           kfree_skb(vdev->send_buff);
>>>           vdev->send_buff = NULL;
>>> +        vdev->running = false;
>>>           mutex_unlock(&vdev->mtx);
>>>   
>>>           return 0;
>>> @@ -50,7 +55,7 @@ static int virtual_nci_send(struct nci_dev *ndev, struct sk_buff *skb)
>>>           struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
>>>   
>>>           mutex_lock(&vdev->mtx);
>>> -        if (vdev->send_buff) {
>>> +        if (vdev->send_buff || !vdev->running) {
>>
>> Dear Krzysztof,
>>
>> I agree this defensive code.
>> But i think NFC submodule has to avoid this situation.(calling send function of closed nci_dev)
>> Could you check this?
> 
> This code looks not effective. At this point vdev->send_buff is always
> false, so the additional check would not bring any value.
> 
> I don't see this fixing anything. Syzbot also does not seem to agree.
> 
> Nguyen, please test your patches against syzbot *before* sending them.
> If you claim this fixes the report, please provide me the link to syzbot
> test results confirming it is fixed.
> 
> I looked at syzbot dashboard and do not see this issue fixed with this
> patch.
> 
> Best regards,
> Krzysztof
> 

Hi Krzysztof,

I've submitted it to syzbot, it is the test request that created at 
[2023/11/20 09:39] in dashboard link 
https://syzkaller.appspot.com/bug?extid=6eb09d75211863f15e3e

Best regards,
Phi
  
Krzysztof Kozlowski Nov. 20, 2023, 10:45 a.m. UTC | #4
On 20/11/2023 11:39, Nguyen Dinh Phi wrote:
>>>>           mutex_lock(&vdev->mtx);
>>>>           kfree_skb(vdev->send_buff);
>>>>           vdev->send_buff = NULL;
>>>> +        vdev->running = false;
>>>>           mutex_unlock(&vdev->mtx);
>>>>   
>>>>           return 0;
>>>> @@ -50,7 +55,7 @@ static int virtual_nci_send(struct nci_dev *ndev, struct sk_buff *skb)
>>>>           struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
>>>>   
>>>>           mutex_lock(&vdev->mtx);
>>>> -        if (vdev->send_buff) {
>>>> +        if (vdev->send_buff || !vdev->running) {
>>>
>>> Dear Krzysztof,
>>>
>>> I agree this defensive code.
>>> But i think NFC submodule has to avoid this situation.(calling send function of closed nci_dev)
>>> Could you check this?
>>
>> This code looks not effective. At this point vdev->send_buff is always
>> false, so the additional check would not bring any value.
>>
>> I don't see this fixing anything. Syzbot also does not seem to agree.
>>
>> Nguyen, please test your patches against syzbot *before* sending them.
>> If you claim this fixes the report, please provide me the link to syzbot
>> test results confirming it is fixed.
>>
>> I looked at syzbot dashboard and do not see this issue fixed with this
>> patch.
>>
>> Best regards,
>> Krzysztof
>>
> 
> Hi Krzysztof,
> 
> I've submitted it to syzbot, it is the test request that created at 
> [2023/11/20 09:39] in dashboard link 
> https://syzkaller.appspot.com/bug?extid=6eb09d75211863f15e3e

...and I see there two errors.

I don't know, maybe I miss something obvious (our brains like to do it
sometimes), but please explain me how this could fix anything?

Best regards,
Krzysztof
  
Nguyen Dinh Phi Nov. 20, 2023, 6:23 p.m. UTC | #5
On 11/20/2023 6:45 PM, Krzysztof Kozlowski wrote:
> On 20/11/2023 11:39, Nguyen Dinh Phi wrote:
>>>>>            mutex_lock(&vdev->mtx);
>>>>>            kfree_skb(vdev->send_buff);
>>>>>            vdev->send_buff = NULL;
>>>>> +        vdev->running = false;
>>>>>            mutex_unlock(&vdev->mtx);
>>>>>    
>>>>>            return 0;
>>>>> @@ -50,7 +55,7 @@ static int virtual_nci_send(struct nci_dev *ndev, struct sk_buff *skb)
>>>>>            struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
>>>>>    
>>>>>            mutex_lock(&vdev->mtx);
>>>>> -        if (vdev->send_buff) {
>>>>> +        if (vdev->send_buff || !vdev->running) {
>>>>
>>>> Dear Krzysztof,
>>>>
>>>> I agree this defensive code.
>>>> But i think NFC submodule has to avoid this situation.(calling send function of closed nci_dev)
>>>> Could you check this?
>>>
>>> This code looks not effective. At this point vdev->send_buff is always
>>> false, so the additional check would not bring any value.
>>>
>>> I don't see this fixing anything. Syzbot also does not seem to agree.
>>>
>>> Nguyen, please test your patches against syzbot *before* sending them.
>>> If you claim this fixes the report, please provide me the link to syzbot
>>> test results confirming it is fixed.
>>>
>>> I looked at syzbot dashboard and do not see this issue fixed with this
>>> patch.
>>>
>>> Best regards,
>>> Krzysztof
>>>
>>
>> Hi Krzysztof,
>>
>> I've submitted it to syzbot, it is the test request that created at
>> [2023/11/20 09:39] in dashboard link
>> https://syzkaller.appspot.com/bug?extid=6eb09d75211863f15e3e
> 
> ...and I see there two errors.
> 
These are because I sent email wrongly and syzbot truncates the patch 
and can not compile

> I don't know, maybe I miss something obvious (our brains like to do it
> sometimes), but please explain me how this could fix anything?
> 
> Best regards,
> Krzysztof
> 

The issue arises when an skb is added to the send_buff after invoking 
ndev->ops->close() but before unregistering the device. In such cases, 
the virtual device will generate a copy of skb, but with no consumer 
thereafter. Consequently, this object persists indefinitely.

This problem seems to stem from the existence of time gaps between 
ops->close() and the destruction of the workqueue. During this interval, 
incoming requests continue to trigger the send function.

best regards,
Phi
  
Krzysztof Kozlowski Nov. 20, 2023, 6:29 p.m. UTC | #6
On 20/11/2023 19:23, Phi Nguyen wrote:
> On 11/20/2023 6:45 PM, Krzysztof Kozlowski wrote:
>> On 20/11/2023 11:39, Nguyen Dinh Phi wrote:
>>>>>>            mutex_lock(&vdev->mtx);
>>>>>>            kfree_skb(vdev->send_buff);
>>>>>>            vdev->send_buff = NULL;
>>>>>> +        vdev->running = false;
>>>>>>            mutex_unlock(&vdev->mtx);
>>>>>>    
>>>>>>            return 0;
>>>>>> @@ -50,7 +55,7 @@ static int virtual_nci_send(struct nci_dev *ndev, struct sk_buff *skb)
>>>>>>            struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
>>>>>>    
>>>>>>            mutex_lock(&vdev->mtx);
>>>>>> -        if (vdev->send_buff) {
>>>>>> +        if (vdev->send_buff || !vdev->running) {
>>>>>
>>>>> Dear Krzysztof,
>>>>>
>>>>> I agree this defensive code.
>>>>> But i think NFC submodule has to avoid this situation.(calling send function of closed nci_dev)
>>>>> Could you check this?
>>>>
>>>> This code looks not effective. At this point vdev->send_buff is always
>>>> false, so the additional check would not bring any value.
>>>>
>>>> I don't see this fixing anything. Syzbot also does not seem to agree.
>>>>
>>>> Nguyen, please test your patches against syzbot *before* sending them.
>>>> If you claim this fixes the report, please provide me the link to syzbot
>>>> test results confirming it is fixed.
>>>>
>>>> I looked at syzbot dashboard and do not see this issue fixed with this
>>>> patch.
>>>>
>>>> Best regards,
>>>> Krzysztof
>>>>
>>>
>>> Hi Krzysztof,
>>>
>>> I've submitted it to syzbot, it is the test request that created at
>>> [2023/11/20 09:39] in dashboard link
>>> https://syzkaller.appspot.com/bug?extid=6eb09d75211863f15e3e
>>
>> ...and I see there two errors.
>>
> These are because I sent email wrongly and syzbot truncates the patch 
> and can not compile
> 
>> I don't know, maybe I miss something obvious (our brains like to do it
>> sometimes), but please explain me how this could fix anything?
>>
>> Best regards,
>> Krzysztof
>>
> 
> The issue arises when an skb is added to the send_buff after invoking 
> ndev->ops->close() but before unregistering the device. In such cases, 
> the virtual device will generate a copy of skb, but with no consumer 
> thereafter. Consequently, this object persists indefinitely.
> 
> This problem seems to stem from the existence of time gaps between 
> ops->close() and the destruction of the workqueue. During this interval, 
> incoming requests continue to trigger the send function.

I asked how this could fix anything. Can you respond to my original comment?

Look:

>>>> This code looks not effective. At this point vdev->send_buff is always
>>>> false, so the additional check would not bring any value.

Best regards,
Krzysztof
  
Krzysztof Kozlowski Nov. 20, 2023, 6:32 p.m. UTC | #7
On 20/11/2023 19:29, Krzysztof Kozlowski wrote:
> On 20/11/2023 19:23, Phi Nguyen wrote:
>> On 11/20/2023 6:45 PM, Krzysztof Kozlowski wrote:
>>> On 20/11/2023 11:39, Nguyen Dinh Phi wrote:
>>>>>>>            mutex_lock(&vdev->mtx);
>>>>>>>            kfree_skb(vdev->send_buff);
>>>>>>>            vdev->send_buff = NULL;
>>>>>>> +        vdev->running = false;
>>>>>>>            mutex_unlock(&vdev->mtx);
>>>>>>>    
>>>>>>>            return 0;
>>>>>>> @@ -50,7 +55,7 @@ static int virtual_nci_send(struct nci_dev *ndev, struct sk_buff *skb)
>>>>>>>            struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
>>>>>>>    
>>>>>>>            mutex_lock(&vdev->mtx);
>>>>>>> -        if (vdev->send_buff) {
>>>>>>> +        if (vdev->send_buff || !vdev->running) {
>>>>>>
>>>>>> Dear Krzysztof,
>>>>>>
>>>>>> I agree this defensive code.
>>>>>> But i think NFC submodule has to avoid this situation.(calling send function of closed nci_dev)
>>>>>> Could you check this?
>>>>>
>>>>> This code looks not effective. At this point vdev->send_buff is always
>>>>> false, so the additional check would not bring any value.
>>>>>
>>>>> I don't see this fixing anything. Syzbot also does not seem to agree.
>>>>>
>>>>> Nguyen, please test your patches against syzbot *before* sending them.
>>>>> If you claim this fixes the report, please provide me the link to syzbot
>>>>> test results confirming it is fixed.
>>>>>
>>>>> I looked at syzbot dashboard and do not see this issue fixed with this
>>>>> patch.
>>>>>
>>>>> Best regards,
>>>>> Krzysztof
>>>>>
>>>>
>>>> Hi Krzysztof,
>>>>
>>>> I've submitted it to syzbot, it is the test request that created at
>>>> [2023/11/20 09:39] in dashboard link
>>>> https://syzkaller.appspot.com/bug?extid=6eb09d75211863f15e3e
>>>
>>> ...and I see there two errors.
>>>
>> These are because I sent email wrongly and syzbot truncates the patch 
>> and can not compile
>>
>>> I don't know, maybe I miss something obvious (our brains like to do it
>>> sometimes), but please explain me how this could fix anything?
>>>
>>> Best regards,
>>> Krzysztof
>>>
>>
>> The issue arises when an skb is added to the send_buff after invoking 
>> ndev->ops->close() but before unregistering the device. In such cases, 
>> the virtual device will generate a copy of skb, but with no consumer 
>> thereafter. Consequently, this object persists indefinitely.
>>
>> This problem seems to stem from the existence of time gaps between 
>> ops->close() and the destruction of the workqueue. During this interval, 
>> incoming requests continue to trigger the send function.
> 
> I asked how this could fix anything. Can you respond to my original comment?
> 
> Look:
> 
>>>>> This code looks not effective. At this point vdev->send_buff is always
>>>>> false, so the additional check would not bring any value.

Uh, now I see, I missed that's the if here is for send_buff != NULL, so
quite different case than virtual_nci_close().

Best regards,
Krzysztof
  
Krzysztof Kozlowski Nov. 20, 2023, 6:44 p.m. UTC | #8
On 19/11/2023 17:47, Nguyen Dinh Phi wrote:
> syzbot reported an memory leak that happens when an skb is add to
> send_buff after virtual nci closed.
> This patch adds a variable to track if the ndev is running before
> handling new skb in send function.
> 
> Reported-by: syzbot+6eb09d75211863f15e3e@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/lkml/00000000000075472b06007df4fb@google.com
> Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
> ---
>  drivers/nfc/virtual_ncidev.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nfc/virtual_ncidev.c b/drivers/nfc/virtual_ncidev.c
> index b027be0b0b6f..ac8226db54e2 100644
> --- a/drivers/nfc/virtual_ncidev.c
> +++ b/drivers/nfc/virtual_ncidev.c
> @@ -20,26 +20,31 @@
>  				 NFC_PROTO_ISO14443_MASK | \
>  				 NFC_PROTO_ISO14443_B_MASK | \
>  				 NFC_PROTO_ISO15693_MASK)
> +#define NCIDEV_RUNNING 0

As pointed out, drop.

>  
>  struct virtual_nci_dev {
>  	struct nci_dev *ndev;
>  	struct mutex mtx;
>  	struct sk_buff *send_buff;
>  	struct wait_queue_head wq;
> +	bool running;
>  };
>  
>  static int virtual_nci_open(struct nci_dev *ndev)
>  {
> +	struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
> +
> +	vdev->running = true;
>  	return 0;
>  }
>  
>  static int virtual_nci_close(struct nci_dev *ndev)
>  {
>  	struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
> -

Drop this change.

>  	mutex_lock(&vdev->mtx);
>  	kfree_skb(vdev->send_buff);
>  	vdev->send_buff = NULL;
> +	vdev->running = false;
>  	mutex_unlock(&vdev->mtx);
>  
>  	return 0;
> @@ -50,7 +55,7 @@ static int virtual_nci_send(struct nci_dev *ndev, struct sk_buff *skb)
>  	struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
>  
>  	mutex_lock(&vdev->mtx);
> -	if (vdev->send_buff) {
> +	if (vdev->send_buff || !vdev->running) {

Rest looks good, thank you. I think the driver is still not safe for
closing the file descriptor, but that's separate issue.

Please send v2 fixing above points, with a changelog under ---.

Best regards,
Krzysztof
  
Bongsu Jeon Nov. 21, 2023, 1:31 a.m. UTC | #9
On 20/11/2023 19:23, Phi Nguyen wrote:

> The issue arises when an skb is added to the send_buff after invoking 
> ndev->ops->close() but before unregistering the device. In such cases, 
> the virtual device will generate a copy of skb, but with no consumer 
> thereafter. Consequently, this object persists indefinitely.
> 
> This problem seems to stem from the existence of time gaps between 
> ops->close() and the destruction of the workqueue. During this interval, 
> incoming requests continue to trigger the send function.

Dear Krzysztof,

Even though i agree on this patch, i think that NFC subsystem could handle this scenario not to trigger the send function after close.
Do you think it would be better that each nci driver has the responsibility to handle this scenario?

Best regards,
Bongsu
  
Krzysztof Kozlowski Nov. 21, 2023, 7:05 a.m. UTC | #10
On 21/11/2023 02:31, Bongsu Jeon wrote:
> On 20/11/2023 19:23, Phi Nguyen wrote:
> 
>> The issue arises when an skb is added to the send_buff after invoking 
>> ndev->ops->close() but before unregistering the device. In such cases, 
>> the virtual device will generate a copy of skb, but with no consumer 
>> thereafter. Consequently, this object persists indefinitely.
>>
>> This problem seems to stem from the existence of time gaps between 
>> ops->close() and the destruction of the workqueue. During this interval, 
>> incoming requests continue to trigger the send function.
> 
> Dear Krzysztof,
> 
> Even though i agree on this patch, i think that NFC subsystem could handle this scenario not to trigger the send function after close.
> Do you think it would be better that each nci driver has the responsibility to handle this scenario?

It's better if core does it, but so far each driver was taking care of
it. Send a patch if you have some better solution.

Best regards,
Krzysztof
  

Patch

diff --git a/drivers/nfc/virtual_ncidev.c b/drivers/nfc/virtual_ncidev.c
index b027be0b0b6f..ac8226db54e2 100644
--- a/drivers/nfc/virtual_ncidev.c
+++ b/drivers/nfc/virtual_ncidev.c
@@ -20,26 +20,31 @@ 
 				 NFC_PROTO_ISO14443_MASK | \
 				 NFC_PROTO_ISO14443_B_MASK | \
 				 NFC_PROTO_ISO15693_MASK)
+#define NCIDEV_RUNNING 0
 
 struct virtual_nci_dev {
 	struct nci_dev *ndev;
 	struct mutex mtx;
 	struct sk_buff *send_buff;
 	struct wait_queue_head wq;
+	bool running;
 };
 
 static int virtual_nci_open(struct nci_dev *ndev)
 {
+	struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
+
+	vdev->running = true;
 	return 0;
 }
 
 static int virtual_nci_close(struct nci_dev *ndev)
 {
 	struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
-
 	mutex_lock(&vdev->mtx);
 	kfree_skb(vdev->send_buff);
 	vdev->send_buff = NULL;
+	vdev->running = false;
 	mutex_unlock(&vdev->mtx);
 
 	return 0;
@@ -50,7 +55,7 @@  static int virtual_nci_send(struct nci_dev *ndev, struct sk_buff *skb)
 	struct virtual_nci_dev *vdev = nci_get_drvdata(ndev);
 
 	mutex_lock(&vdev->mtx);
-	if (vdev->send_buff) {
+	if (vdev->send_buff || !vdev->running) {
 		mutex_unlock(&vdev->mtx);
 		kfree_skb(skb);
 		return -1;