[v3] mm: Make vmalloc_dump_obj() call in clean context

Message ID 20221118003441.3980437-1-qiang1.zhang@intel.com
State New
Series [v3] mm: Make vmalloc_dump_obj() call in clean context

Commit Message

Zqiang Nov. 18, 2022, 12:34 a.m. UTC
  Currently, mem_dump_obj() is invoked from call_rcu(), and
call_rcu() may be invoked from a non-preemptible code segment.
For an object allocated from vmalloc(), the following scenario
may occur:

        CPU 0
tasks context
   spin_lock(&vmap_area_lock)
       Interrupt context
           call_rcu()
             mem_dump_obj
               vmalloc_dump_obj
                 spin_lock(&vmap_area_lock) <--deadlock

In addition, for a PREEMPT_RT kernel the spinlock is converted to a
sleepable lock, so vmap_area_lock must not be acquired in a
non-preemptible code segment at all. Therefore, this commit makes
vmalloc_dump_obj() bail out unless it is called in a clean context.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
---
 v1->v2:
 add IS_ENABLED(CONFIG_PREEMPT_RT) check.
 v2->v3:
 change commit message and add some comment.

 mm/util.c    |  4 +++-
 mm/vmalloc.c | 25 +++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)
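
For reference, the context check the patch adds can be read as a
standalone predicate. The sketch below is hypothetical (the helper name
does not exist in the patch); its logic mirrors the two new checks in
vmalloc_dump_obj():

static bool vmalloc_dump_ctx_ok(void)
{
	/*
	 * On PREEMPT_RT, vmap_area_lock is a sleepable lock, so any
	 * non-preemptible context (hardirq, softirq, preemption or
	 * interrupts disabled) must not try to acquire it.
	 */
	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible())
		return false;

	/*
	 * On !PREEMPT_RT, the interrupted task may already hold
	 * vmap_area_lock, so acquiring it again from interrupt
	 * context could self-deadlock.
	 */
	if (in_interrupt())
		return false;

	return true;
}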
  

Comments

Zqiang Nov. 22, 2022, 11:05 p.m. UTC | #1
Gently ping  😊

Thanks
Zqiang

>Currently, mem_dump_obj() is invoked from call_rcu(), and
>call_rcu() may be invoked from a non-preemptible code segment.
>For an object allocated from vmalloc(), the following scenario
>may occur:
>
>        CPU 0
>tasks context
>   spin_lock(&vmap_area_lock)
>       Interrupt context
>           call_rcu()
>             mem_dump_obj
>               vmalloc_dump_obj
>                 spin_lock(&vmap_area_lock) <--deadlock
>
>In addition, for a PREEMPT_RT kernel the spinlock is converted to a
>sleepable lock, so vmap_area_lock must not be acquired in a
>non-preemptible code segment at all. Therefore, this commit makes
>vmalloc_dump_obj() bail out unless it is called in a clean context.
>
>Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>---
>v1->v2:
> add IS_ENABLED(CONFIG_PREEMPT_RT) check.
> v2->v3:
> change commit message and add some comment.
>
> mm/util.c    |  4 +++-
> mm/vmalloc.c | 25 +++++++++++++++++++++++++
> 2 files changed, 28 insertions(+), 1 deletion(-)
>
>diff --git a/mm/util.c b/mm/util.c
>index 12984e76767e..2b0222a728cc 100644
>--- a/mm/util.c
>+++ b/mm/util.c
>@@ -1128,7 +1128,9 @@ void mem_dump_obj(void *object)
> 		return;
> 
> 	if (virt_addr_valid(object))
>-		type = "non-slab/vmalloc memory";
>+		type = "non-slab memory";
>+	else if (is_vmalloc_addr(object))
>+		type = "vmalloc memory";
> 	else if (object == NULL)
> 		type = "NULL pointer";
> 	else if (object == ZERO_SIZE_PTR)
>diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>index ccaa461998f3..4351eafbe7ab 100644
>--- a/mm/vmalloc.c
>+++ b/mm/vmalloc.c
>@@ -4034,6 +4034,31 @@ bool vmalloc_dump_obj(void *object)
> 	struct vm_struct *vm;
> 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> 
>+	/* For a non-vmalloc address, return directly. */
>+	if (!is_vmalloc_addr(objp))
>+		return false;
>+
>+	/*
>+	 * On a PREEMPT_RT kernel, the vmap_area_lock spinlock is
>+	 * converted to a sleepable lock, so it must not be acquired
>+	 * while preemption or interrupts are disabled; bail out of
>+	 * any non-preemptible context. On a non-PREEMPT_RT kernel
>+	 * this check is compiled away and the in_interrupt() check
>+	 * below is sufficient to avoid deadlock.
>+	 */
>+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible())
>+		return false;
>+
>+	/*
>+	 * Reaching here on a PREEMPT_RT kernel means the context is
>+	 * preemptible, so in_interrupt() is necessarily false. On a
>+	 * non-PREEMPT_RT kernel, checking in_interrupt() is all that
>+	 * is needed to avoid deadlocking on vmap_area_lock, which the
>+	 * interrupted task may already hold.
>+	 */
>+	if (in_interrupt())
>+		return false;
>+
> 	vm = find_vm_area(objp);
> 	if (!vm)
> 		return false;
>-- 
>2.25.1
  
Zhen Lei Nov. 28, 2022, 7:59 a.m. UTC | #2
On 2022/11/23 7:05, Zhang, Qiang1 wrote:
> 
> Gently ping  😊
> 
> Thanks
> Zqiang
> 
>> Currently, mem_dump_obj() is invoked from call_rcu(), and
>> call_rcu() may be invoked from a non-preemptible code segment.
>> For an object allocated from vmalloc(), the following scenario
>> may occur:
>>
>>        CPU 0
>> tasks context
>>   spin_lock(&vmap_area_lock)
>>       Interrupt context
>>           call_rcu()
>>             mem_dump_obj
>>               vmalloc_dump_obj
>>                 spin_lock(&vmap_area_lock) <--deadlock
>>
>> In addition, for a PREEMPT_RT kernel the spinlock is converted to a
>> sleepable lock, so vmap_area_lock must not be acquired in a
>> non-preemptible code segment at all. Therefore, this commit makes
>> vmalloc_dump_obj() bail out unless it is called in a clean context.
>>
>> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>> ---
>> v1->v2:
>> add IS_ENABLED(CONFIG_PREEMPT_RT) check.
>> v2->v3:
>> change commit message and add some comment.
>>
>> mm/util.c    |  4 +++-
>> mm/vmalloc.c | 25 +++++++++++++++++++++++++
>> 2 files changed, 28 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/util.c b/mm/util.c
>> index 12984e76767e..2b0222a728cc 100644
>> --- a/mm/util.c
>> +++ b/mm/util.c
>> @@ -1128,7 +1128,9 @@ void mem_dump_obj(void *object)
>> 		return;
>>
>> 	if (virt_addr_valid(object))
>> -		type = "non-slab/vmalloc memory";
>> +		type = "non-slab memory";
>> +	else if (is_vmalloc_addr(object))
>> +		type = "vmalloc memory";
>> 	else if (object == NULL)
>> 		type = "NULL pointer";
>> 	else if (object == ZERO_SIZE_PTR)
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index ccaa461998f3..4351eafbe7ab 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -4034,6 +4034,31 @@ bool vmalloc_dump_obj(void *object)
>> 	struct vm_struct *vm;
>> 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
>>
>> +	/* For a non-vmalloc address, return directly. */
>> +	if (!is_vmalloc_addr(objp))
>> +		return false;
>> +
>> +	/*
>> +	 * On a PREEMPT_RT kernel, the vmap_area_lock spinlock is
>> +	 * converted to a sleepable lock, so it must not be acquired
>> +	 * while preemption or interrupts are disabled; bail out of
>> +	 * any non-preemptible context. On a non-PREEMPT_RT kernel
>> +	 * this check is compiled away and the in_interrupt() check
>> +	 * below is sufficient to avoid deadlock.
>> +	 */
>> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible())
>> +		return false;
>> +
>> +	/*
>> +	 * Reaching here on a PREEMPT_RT kernel means the context is
>> +	 * preemptible, so in_interrupt() is necessarily false. On a
>> +	 * non-PREEMPT_RT kernel, checking in_interrupt() is all that
>> +	 * is needed to avoid deadlocking on vmap_area_lock, which the
>> +	 * interrupted task may already hold.
>> +	 */
>> +	if (in_interrupt())
>> +		return false;

We want mem_dump_obj() to work properly in interrupt context, but with
this if statement it never can.

Here's my test case:
#include <linux/vmalloc.h>
#include <linux/irq_work.h>
#include <linux/printk.h>
#include <linux/smp.h>

/* tst_lock()/tst_unlock()/tst_is_locked() are test helpers elided from this mail. */
void *tst_p;

void my_irqwork_handler(struct irq_work *work)
{
        void *p = tst_p;

        printk("enter my_irqwork_handler: CPU=%d, locked=%d\n", smp_processor_id(), tst_is_locked());
        mem_dump_obj(p);
        vfree(p);
}

static void test_mem_dump(void)
{
        struct irq_work work = IRQ_WORK_INIT_HARD(my_irqwork_handler);

        tst_p = vmalloc(PAGE_SIZE);
        if (!tst_p) {
                printk("vmalloc failed\n");
                return;
        }
        printk("enter test_mem_dump: CPU=%d\n", smp_processor_id());

        //tst_lock();
        irq_work_queue(&work);
        //tst_unlock();

        printk("leave test_mem_dump: CPU=%d\n", smp_processor_id());
}

Test result:
[   45.212941] enter test_mem_dump: CPU=0
[   45.213280] enter my_irqwork_handler: CPU=0, locked=0
[   45.213546]  vmalloc memory
[   45.213996] leave test_mem_dump: CPU=0

>> +
>> 	vm = find_vm_area(objp);
>> 	if (!vm)
>> 		return false;
>> -- 
>> 2.25.1
>
  
Zqiang Nov. 28, 2022, 8:33 a.m. UTC | #3
On 2022/11/23 7:05, Zhang, Qiang1 wrote:
> 
> Gently ping  😊
> 
> Thanks
> Zqiang
> 
>> Currently, mem_dump_obj() is invoked from call_rcu(), and
>> call_rcu() may be invoked from a non-preemptible code segment. For
>> an object allocated from vmalloc(), the following scenario may occur:
>>
>>        CPU 0
>> tasks context
>>   spin_lock(&vmap_area_lock)
>>       Interrupt context
>>           call_rcu()
>>             mem_dump_obj
>>               vmalloc_dump_obj
>>                 spin_lock(&vmap_area_lock) <--deadlock
>>
>> In addition, for a PREEMPT_RT kernel the spinlock is converted to a
>> sleepable lock, so vmap_area_lock must not be acquired in a
>> non-preemptible code segment at all. Therefore, this commit makes
>> vmalloc_dump_obj() bail out unless it is called in a clean context.
>>
>> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>> ---
>> v1->v2:
>> add IS_ENABLED(CONFIG_PREEMPT_RT) check.
>> v2->v3:
>> change commit message and add some comment.
>>
>> mm/util.c    |  4 +++-
>> mm/vmalloc.c | 25 +++++++++++++++++++++++++
>> 2 files changed, 28 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/util.c b/mm/util.c
>> index 12984e76767e..2b0222a728cc 100644
>> --- a/mm/util.c
>> +++ b/mm/util.c
>> @@ -1128,7 +1128,9 @@ void mem_dump_obj(void *object)
>> 		return;
>>
>> 	if (virt_addr_valid(object))
>> -		type = "non-slab/vmalloc memory";
>> +		type = "non-slab memory";
>> +	else if (is_vmalloc_addr(object))
>> +		type = "vmalloc memory";
>> 	else if (object == NULL)
>> 		type = "NULL pointer";
>> 	else if (object == ZERO_SIZE_PTR)
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index ccaa461998f3..4351eafbe7ab 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -4034,6 +4034,31 @@ bool vmalloc_dump_obj(void *object)
>> 	struct vm_struct *vm;
>> 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
>>
>> +	/* For a non-vmalloc address, return directly. */
>> +	if (!is_vmalloc_addr(objp))
>> +		return false;
>> +
>> +	/*
>> +	 * On a PREEMPT_RT kernel, the vmap_area_lock spinlock is
>> +	 * converted to a sleepable lock, so it must not be acquired
>> +	 * while preemption or interrupts are disabled; bail out of
>> +	 * any non-preemptible context. On a non-PREEMPT_RT kernel
>> +	 * this check is compiled away and the in_interrupt() check
>> +	 * below is sufficient to avoid deadlock.
>> +	 */
>> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible())
>> +		return false;
>> +
>> +	/*
>> +	 * Reaching here on a PREEMPT_RT kernel means the context is
>> +	 * preemptible, so in_interrupt() is necessarily false. On a
>> +	 * non-PREEMPT_RT kernel, checking in_interrupt() is all that
>> +	 * is needed to avoid deadlocking on vmap_area_lock, which the
>> +	 * interrupted task may already hold.
>> +	 */
>> +	if (in_interrupt())
>> +		return false;
>
>
>We want mem_dump_obj() to work properly in interrupt context, but with this if statement it never can.

This is to avoid the following scenario: because call_rcu() can be invoked in hardirq or
softirq context, mem_dump_obj() has to skip dumping the detailed info there.

CPU 0
tasks context
   spin_lock(&vmap_area_lock)
       Interrupt  or softirq context
           call_rcu()
             mem_dump_obj
              vmalloc_dump_obj
                 spin_lock(&vmap_area_lock) <--deadlock

Because mem_dump_obj() is only used by RCU, I'm not sure whether this
modification is appropriate; we need to hear from Paul.
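
For context, a simplified sketch of that RCU path (not verbatim kernel
code; the exact structure varies by tree) showing how call_rcu() reaches
mem_dump_obj() when it detects a double call_rcu():

void call_rcu(struct rcu_head *head, rcu_callback_t func)
{
	if (debug_rcu_head_queue(head)) {
		/*
		 * Probable double call_rcu(): dump the callback object to
		 * help locate the earlier queuing. This can run in hardirq
		 * or softirq context, which is why vmalloc_dump_obj() must
		 * check its context before taking vmap_area_lock.
		 */
		mem_dump_obj(head);
		WRITE_ONCE(head->func, rcu_leak_callback);
		return;
	}
	/* ... normal callback queuing ... */
}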

Thanks
Zqiang


>
>Here's my test case:
>void *tst_p;
>
>void my_irqwork_handler(struct irq_work *work)
>{
>        void *p = tst_p;
>
>        printk("enter my_irqwork_handler: CPU=%d, locked=%d\n", smp_processor_id(), tst_is_locked());
>        mem_dump_obj(p);
>        vfree(p);
>}
>
>static void test_mem_dump(void)
>{
>        struct irq_work work = IRQ_WORK_INIT_HARD(my_irqwork_handler);
>
>        tst_p = vmalloc(PAGE_SIZE);
>        if (!tst_p) {
>                printk("vmalloc failed\n");
>                return;
>        }
>        printk("enter test_mem_dump: CPU=%d\n", smp_processor_id());
>
>        //tst_lock();
>        irq_work_queue(&work);
>        //tst_unlock();
>
>        printk("leave test_mem_dump: CPU=%d\n", smp_processor_id()); }
>
>Test result:
>[   45.212941] enter test_mem_dump: CPU=0
>[   45.213280] enter my_irqwork_handler: CPU=0, locked=0
>[   45.213546]  vmalloc memory
>[   45.213996] leave test_mem_dump: CPU=0
>
>> +
>> 	vm = find_vm_area(objp);
>> 	if (!vm)
>> 		return false;
>> --
>> 2.25.1
> 
>
>-- 
>Regards,
>  Zhen Lei
  
Zhen Lei Nov. 28, 2022, 9:13 a.m. UTC | #4
On 2022/11/28 16:33, Zhang, Qiang1 wrote:
> On 2022/11/23 7:05, Zhang, Qiang1 wrote:
>>
>> Gently ping  😊
>>
>> Thanks
>> Zqiang
>>
>>> Currently, mem_dump_obj() is invoked from call_rcu(), and
>>> call_rcu() may be invoked from a non-preemptible code segment. For
>>> an object allocated from vmalloc(), the following scenario may occur:
>>>
>>>        CPU 0
>>> tasks context
>>>   spin_lock(&vmap_area_lock)
>>>       Interrupt context
>>>           call_rcu()
>>>             mem_dump_obj
>>>               vmalloc_dump_obj
>>>                 spin_lock(&vmap_area_lock) <--deadlock
>>>
>>> In addition, for a PREEMPT_RT kernel the spinlock is converted to a
>>> sleepable lock, so vmap_area_lock must not be acquired in a
>>> non-preemptible code segment at all. Therefore, this commit makes
>>> vmalloc_dump_obj() bail out unless it is called in a clean context.
>>>
>>> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>>> ---
>>> v1->v2:
>>> add IS_ENABLED(CONFIG_PREEMPT_RT) check.
>>> v2->v3:
>>> change commit message and add some comment.
>>>
>>> mm/util.c    |  4 +++-
>>> mm/vmalloc.c | 25 +++++++++++++++++++++++++
>>> 2 files changed, 28 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/util.c b/mm/util.c
>>> index 12984e76767e..2b0222a728cc 100644
>>> --- a/mm/util.c
>>> +++ b/mm/util.c
>>> @@ -1128,7 +1128,9 @@ void mem_dump_obj(void *object)
>>> 		return;
>>>
>>> 	if (virt_addr_valid(object))
>>> -		type = "non-slab/vmalloc memory";
>>> +		type = "non-slab memory";
>>> +	else if (is_vmalloc_addr(object))
>>> +		type = "vmalloc memory";
>>> 	else if (object == NULL)
>>> 		type = "NULL pointer";
>>> 	else if (object == ZERO_SIZE_PTR)
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index ccaa461998f3..4351eafbe7ab 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -4034,6 +4034,31 @@ bool vmalloc_dump_obj(void *object)
>>> 	struct vm_struct *vm;
>>> 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
>>>
>>> +	/* For a non-vmalloc address, return directly. */
>>> +	if (!is_vmalloc_addr(objp))
>>> +		return false;
>>> +
>>> +	/*
>>> +	 * On a PREEMPT_RT kernel, the vmap_area_lock spinlock is
>>> +	 * converted to a sleepable lock, so it must not be acquired
>>> +	 * while preemption or interrupts are disabled; bail out of
>>> +	 * any non-preemptible context. On a non-PREEMPT_RT kernel
>>> +	 * this check is compiled away and the in_interrupt() check
>>> +	 * below is sufficient to avoid deadlock.
>>> +	 */
>>> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible())
>>> +		return false;
>>> +
>>> +	/*
>>> +	 * Reaching here on a PREEMPT_RT kernel means the context is
>>> +	 * preemptible, so in_interrupt() is necessarily false. On a
>>> +	 * non-PREEMPT_RT kernel, checking in_interrupt() is all that
>>> +	 * is needed to avoid deadlocking on vmap_area_lock, which the
>>> +	 * interrupted task may already hold.
>>> +	 */
>>> +	if (in_interrupt())
>>> +		return false;
>>
>>
>> We want mem_dump_obj() to work properly in interrupt context, but with this if statement it never can.
> 
> This is to avoid the following scenario: because call_rcu() can be invoked in hardirq or
> softirq context, mem_dump_obj() has to skip dumping the detailed info there.

OK. Sorry, I was confusing your issue with what I'm working on right now.

https://lkml.org/lkml/2022/11/16/913

I need "if (in_interrupt() && spin_is_locked(&vmap_area_lock))". So
mem_dump_obj() can work well in interrupt, except the task was
interrupted in the critical section of vmap_area_lock.
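
In code, that alternative guard might look like the sketch below (not
the code from the linked series; note that spin_is_locked() is also true
when another CPU holds the lock, in which case the dump is skipped
unnecessarily but safely):

	/*
	 * Bail out only when running in interrupt context while
	 * vmap_area_lock is (possibly) held; interrupt-context dumps
	 * keep working in the common unlocked case.
	 */
	if (in_interrupt() && spin_is_locked(&vmap_area_lock))
		return false;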


> 
> CPU 0
> tasks context
>    spin_lock(&vmap_area_lock)
>        Interrupt  or softirq context
>            call_rcu()
>              mem_dump_obj
>               vmalloc_dump_obj
>                  spin_lock(&vmap_area_lock) <--deadlock
> 
> because mem_dump_obj() only used by RCU,   I'm not sure if this modification is appropriate, 
> need to hear from Paul.
> 
> Thanks
> Zqiang
> 
> 
>>
>> Here's my test case:
>> void *tst_p;
>>
>> void my_irqwork_handler(struct irq_work *work)
>> {
>>        void *p = tst_p;
>>
>>        printk("enter my_irqwork_handler: CPU=%d, locked=%d\n", smp_processor_id(), tst_is_locked());
>>        mem_dump_obj(p);
>>        vfree(p);
>> }
>>
>> static void test_mem_dump(void)
>> {
>>        struct irq_work work = IRQ_WORK_INIT_HARD(my_irqwork_handler);
>>
>>        tst_p = vmalloc(PAGE_SIZE);
>>        if (!tst_p) {
>>                printk("vmalloc failed\n");
>>                return;
>>        }
>>        printk("enter test_mem_dump: CPU=%d\n", smp_processor_id());
>>
>>        //tst_lock();
>>        irq_work_queue(&work);
>>        //tst_unlock();
>>
>>        printk("leave test_mem_dump: CPU=%d\n", smp_processor_id());
>> }
>>
>> Test result:
>> [   45.212941] enter test_mem_dump: CPU=0
>> [   45.213280] enter my_irqwork_handler: CPU=0, locked=0
>> [   45.213546]  vmalloc memory
>> [   45.213996] leave test_mem_dump: CPU=0
>>
>>> +
>>> 	vm = find_vm_area(objp);
>>> 	if (!vm)
>>> 		return false;
>>> --
>>> 2.25.1
>>
>>
>> -- 
>> Regards,
>>  Zhen Lei
  

Patch

diff --git a/mm/util.c b/mm/util.c
index 12984e76767e..2b0222a728cc 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1128,7 +1128,9 @@  void mem_dump_obj(void *object)
 		return;
 
 	if (virt_addr_valid(object))
-		type = "non-slab/vmalloc memory";
+		type = "non-slab memory";
+	else if (is_vmalloc_addr(object))
+		type = "vmalloc memory";
 	else if (object == NULL)
 		type = "NULL pointer";
 	else if (object == ZERO_SIZE_PTR)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ccaa461998f3..4351eafbe7ab 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4034,6 +4034,31 @@  bool vmalloc_dump_obj(void *object)
 	struct vm_struct *vm;
 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
 
+	/* For a non-vmalloc address, return directly. */
+	if (!is_vmalloc_addr(objp))
+		return false;
+
+	/*
+	 * On a PREEMPT_RT kernel, the vmap_area_lock spinlock is
+	 * converted to a sleepable lock, so it must not be acquired
+	 * while preemption or interrupts are disabled; bail out of
+	 * any non-preemptible context. On a non-PREEMPT_RT kernel
+	 * this check is compiled away and the in_interrupt() check
+	 * below is sufficient to avoid deadlock.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible())
+		return false;
+
+	/*
+	 * Reaching here on a PREEMPT_RT kernel means the context is
+	 * preemptible, so in_interrupt() is necessarily false. On a
+	 * non-PREEMPT_RT kernel, checking in_interrupt() is all that
+	 * is needed to avoid deadlocking on vmap_area_lock, which the
+	 * interrupted task may already hold.
+	 */
+	if (in_interrupt())
+		return false;
+
 	vm = find_vm_area(objp);
 	if (!vm)
 		return false;
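
For illustration, a hypothetical caller with this patch applied (a
sketch, not part of the patch):

	void *p = vmalloc(PAGE_SIZE);

	/*
	 * Task context: vmalloc_dump_obj() takes vmap_area_lock and
	 * dumps the area details.
	 */
	mem_dump_obj(p);

	/*
	 * From interrupt context (or any non-preemptible context on
	 * PREEMPT_RT), vmalloc_dump_obj() now returns false and
	 * mem_dump_obj() falls back to printing just "vmalloc memory",
	 * matching the test output shown in comment #2.
	 */

	vfree(p);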