[1/4] hwtracing: hisi_ptt: Make cpumask only present online CPUs

Message ID 20230315094316.26772-2-yangyicong@huawei.com
State New
Series Improve PTT filter interface and some fixes

Commit Message

Yicong Yang March 15, 2023, 9:43 a.m. UTC
  From: Yicong Yang <yangyicong@hisilicon.com>

perf tries to start a PTT trace on every CPU present in the cpumask sysfs
attribute, and it fails to start on offline CPUs (see the comments in
perf_event_open()). But the driver uses cpumask_of_node() to export
the available cpumask, which may include offline CPUs and may cause perf
to fail unintentionally. Fix this by exporting only the online CPUs of the node.

Fixes: ff0de066b463 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
 drivers/hwtracing/ptt/hisi_ptt.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)
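
For readers who want to check the symptom from userspace, the following is a minimal sketch (not part of the patch) that parses the list format emitted by cpumap_print_to_pagebuf(true, ...), for example "0-3,8", and reports which CPUs in a device mask are missing from an online mask. The helper names parse_cpulist() and offline_cpus() are hypothetical, and the sketch assumes at most 64 CPUs so that a mask fits in a uint64_t.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a cpulist string such as "0-3,8" (optionally newline-terminated)
 * into a 64-bit mask. Returns 0 on success, -1 on malformed input or on
 * any CPU number >= 64. Hypothetical helper, for illustration only.
 */
static int parse_cpulist(const char *s, uint64_t *mask)
{
	*mask = 0;
	while (*s && *s != '\n') {
		char *end;
		unsigned long lo, hi;

		if (*s < '0' || *s > '9')
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		if (*end == '-') {
			/* Range "lo-hi": parse the upper bound. */
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
		}
		if (hi < lo || hi >= 64)
			return -1;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			*mask |= UINT64_C(1) << cpu;
		s = end;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}
	return 0;
}

/* CPUs that appear in the device mask but are not online. */
static uint64_t offline_cpus(uint64_t device_mask, uint64_t online_mask)
{
	return device_mask & ~online_mask;
}
```

A caller would read the device's cpumask attribute and /sys/devices/system/cpu/online into two strings, parse both, and treat a non-zero offline_cpus() result as the situation this patch tries to avoid.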
  

Comments

Jonathan Cameron March 28, 2023, 4:24 p.m. UTC | #1
On Wed, 15 Mar 2023 17:43:13 +0800
Yicong Yang <yangyicong@huawei.com> wrote:

> From: Yicong Yang <yangyicong@hisilicon.com>
> 
> perf tries to start a PTT trace on every CPU present in the cpumask sysfs
> attribute, and it fails to start on offline CPUs (see the comments in
> perf_event_open()). But the driver uses cpumask_of_node() to export
> the available cpumask, which may include offline CPUs and may cause perf
> to fail unintentionally. Fix this by exporting only the online CPUs of the node.

I can't find clear documentation for cpumask_of_node(), but chasing
through on arm64 (which is what we care about for this driver), it's
maintained via numa_add_cpu() and numa_remove_cpu(). Those are called in
arch/arm64/kernel/smp.c in locations that are closely coupled with
set_cpu_online(cpu, XXX):
https://elixir.bootlin.com/linux/v6.3-rc4/source/arch/arm64/kernel/smp.c#L246
https://elixir.bootlin.com/linux/v6.3-rc4/source/arch/arm64/kernel/smp.c#L303
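
To make that coupling concrete, here is a toy userspace model (not kernel code). It assumes the node mask is only ever updated alongside the online mask, as the two call sites above suggest; every toy_* name is hypothetical, and the ordering inside each function is illustrative rather than a statement about the exact kernel sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 8

/* Toy stand-ins for cpu_online_mask and one node's cpumask (bit n = CPU n). */
static uint32_t toy_online_mask;
static uint32_t toy_node_mask;

/* Bring-up: node membership is added next to marking the CPU online. */
static void toy_cpu_up(unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	toy_node_mask |= UINT32_C(1) << cpu;   /* stands in for numa_add_cpu() */
	toy_online_mask |= UINT32_C(1) << cpu; /* stands in for set_cpu_online(cpu, true) */
}

/* Hot-unplug: node membership is removed next to marking the CPU offline. */
static void toy_cpu_down(unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	toy_node_mask &= ~(UINT32_C(1) << cpu);   /* stands in for numa_remove_cpu() */
	toy_online_mask &= ~(UINT32_C(1) << cpu); /* stands in for set_cpu_online(cpu, false) */
}

/* True when every CPU in the node mask is also online. */
static bool toy_node_subset_of_online(void)
{
	return (toy_node_mask & ~toy_online_mask) == 0;
}

/* What the patch computes: the node mask ANDed with the online mask. */
static uint32_t toy_filtered_mask(void)
{
	return toy_node_mask & toy_online_mask;
}
```

In this model, after any sequence of toy_cpu_up() and toy_cpu_down() calls, toy_filtered_mask() equals toy_node_mask, so the extra AND is a no-op unless something else modifies one mask without the other.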

Now, there are races in which the two might not be in sync, but in this
case we are just exposing the result to userspace, so the chance of a race
after this sysfs attribute has been read seems much higher to me, and
I don't think we can do anything about that.

Is there another path that I'm missing where online and node masks are out
of sync?

Jonathan


> 
> Fixes: ff0de066b463 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device")
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>

> ---
>  drivers/hwtracing/ptt/hisi_ptt.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
> index 30f1525639b5..0a10c7ec46ad 100644
> --- a/drivers/hwtracing/ptt/hisi_ptt.c
> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
> @@ -487,9 +487,18 @@ static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
>  			    char *buf)
>  {
>  	struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev));
> -	const cpumask_t *cpumask = cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev));
> +	cpumask_var_t mask;
> +	ssize_t n;
>  
> -	return cpumap_print_to_pagebuf(true, buf, cpumask);
> +	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> +		return 0;
> +
> +	cpumask_and(mask, cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)),
> +		    cpu_online_mask);
> +	n = cpumap_print_to_pagebuf(true, buf, mask);
> +	free_cpumask_var(mask);
> +
> +	return n;
>  }
>  static DEVICE_ATTR_RO(cpumask);
>
  
Yicong Yang March 30, 2023, 3:53 a.m. UTC | #2
On 2023/3/29 0:24, Jonathan Cameron wrote:
> On Wed, 15 Mar 2023 17:43:13 +0800
> Yicong Yang <yangyicong@huawei.com> wrote:
> 
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> perf tries to start a PTT trace on every CPU present in the cpumask sysfs
>> attribute, and it fails to start on offline CPUs (see the comments in
>> perf_event_open()). But the driver uses cpumask_of_node() to export
>> the available cpumask, which may include offline CPUs and may cause perf
>> to fail unintentionally. Fix this by exporting only the online CPUs of the node.
> 
> I can't find clear documentation for cpumask_of_node(), but chasing
> through on arm64 (which is what we care about for this driver), it's
> maintained via numa_add_cpu() and numa_remove_cpu(). Those are called in
> arch/arm64/kernel/smp.c in locations that are closely coupled with
> set_cpu_online(cpu, XXX):
> https://elixir.bootlin.com/linux/v6.3-rc4/source/arch/arm64/kernel/smp.c#L246
> https://elixir.bootlin.com/linux/v6.3-rc4/source/arch/arm64/kernel/smp.c#L303
> 
> Now, there are races in which the two might not be in sync, but in this
> case we are just exposing the result to userspace, so the chance of a race
> after this sysfs attribute has been read seems much higher to me, and
> I don't think we can do anything about that.
> 
> Is there another path that I'm missing where online and node masks are out
> of sync?
> 

Maybe not. This patch may be incorrect and I need to investigate more, so
let me drop it from the series. I tested and everything seems fine now.

When I found this problem I referred to commit 064f0e9302af ("mm: only display online cpus of the numa node"),
which might address the same problem. But the change seems unnecessary,
since cpumask_of_node() already includes online CPUs only.

Thanks.

> Jonathan
> 
> 
>>
>> Fixes: ff0de066b463 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device")
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> 
>> ---
>>  drivers/hwtracing/ptt/hisi_ptt.c | 13 +++++++++++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
>> index 30f1525639b5..0a10c7ec46ad 100644
>> --- a/drivers/hwtracing/ptt/hisi_ptt.c
>> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
>> @@ -487,9 +487,18 @@ static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
>>  			    char *buf)
>>  {
>>  	struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev));
>> -	const cpumask_t *cpumask = cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev));
>> +	cpumask_var_t mask;
>> +	ssize_t n;
>>  
>> -	return cpumap_print_to_pagebuf(true, buf, cpumask);
>> +	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
>> +		return 0;
>> +
>> +	cpumask_and(mask, cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)),
>> +		    cpu_online_mask);
>> +	n = cpumap_print_to_pagebuf(true, buf, mask);
>> +	free_cpumask_var(mask);
>> +
>> +	return n;
>>  }
>>  static DEVICE_ATTR_RO(cpumask);
>>  
> 
> .
>
  
Jonathan Cameron March 30, 2023, 8:34 a.m. UTC | #3
On Thu, 30 Mar 2023 11:53:14 +0800
Yicong Yang <yangyicong@huawei.com> wrote:

> On 2023/3/29 0:24, Jonathan Cameron wrote:
> > On Wed, 15 Mar 2023 17:43:13 +0800
> > Yicong Yang <yangyicong@huawei.com> wrote:
> >   
> >> From: Yicong Yang <yangyicong@hisilicon.com>
> >>
> >> perf tries to start a PTT trace on every CPU present in the cpumask sysfs
> >> attribute, and it fails to start on offline CPUs (see the comments in
> >> perf_event_open()). But the driver uses cpumask_of_node() to export
> >> the available cpumask, which may include offline CPUs and may cause perf
> >> to fail unintentionally. Fix this by exporting only the online CPUs of the node.
> > 
> > I can't find clear documentation for cpumask_of_node(), but chasing
> > through on arm64 (which is what we care about for this driver), it's
> > maintained via numa_add_cpu() and numa_remove_cpu(). Those are called in
> > arch/arm64/kernel/smp.c in locations that are closely coupled with
> > set_cpu_online(cpu, XXX):
> > https://elixir.bootlin.com/linux/v6.3-rc4/source/arch/arm64/kernel/smp.c#L246
> > https://elixir.bootlin.com/linux/v6.3-rc4/source/arch/arm64/kernel/smp.c#L303
> > 
> > Now, there are races in which the two might not be in sync, but in this
> > case we are just exposing the result to userspace, so the chance of a race
> > after this sysfs attribute has been read seems much higher to me, and
> > I don't think we can do anything about that.
> > 
> > Is there another path that I'm missing where online and node masks are out
> > of sync?
> >   
> 
> Maybe not. This patch may be incorrect and I need to investigate more, so
> let me drop it from the series. I tested and everything seems fine now.
> 
> When I found this problem I referred to commit 064f0e9302af ("mm: only display online cpus of the numa node"),
> which might address the same problem. But the change seems unnecessary,
> since cpumask_of_node() already includes online CPUs only.

It seems this was fixed up for arm64 in
7f954aa1a ("arm64: smp: remove cpu and numa topology information when hotplugging out CPU")

If we could audit all the other architectures, it would be great to document
the properties of this cpumask and possibly simplify the code in the
path you highlight above (assuming no race conditions etc.).

Jonathan
 
> 
> Thanks.
> 
> > Jonathan
> > 
> >   
> >>
> >> Fixes: ff0de066b463 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device")
> >> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>  
> >   
> >> ---
> >>  drivers/hwtracing/ptt/hisi_ptt.c | 13 +++++++++++--
> >>  1 file changed, 11 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
> >> index 30f1525639b5..0a10c7ec46ad 100644
> >> --- a/drivers/hwtracing/ptt/hisi_ptt.c
> >> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
> >> @@ -487,9 +487,18 @@ static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
> >>  			    char *buf)
> >>  {
> >>  	struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev));
> >> -	const cpumask_t *cpumask = cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev));
> >> +	cpumask_var_t mask;
> >> +	ssize_t n;
> >>  
> >> -	return cpumap_print_to_pagebuf(true, buf, cpumask);
> >> +	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> >> +		return 0;
> >> +
> >> +	cpumask_and(mask, cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)),
> >> +		    cpu_online_mask);
> >> +	n = cpumap_print_to_pagebuf(true, buf, mask);
> >> +	free_cpumask_var(mask);
> >> +
> >> +	return n;
> >>  }
> >>  static DEVICE_ATTR_RO(cpumask);
> >>    
> > 
> > .
> >
  

Patch

diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c
index 30f1525639b5..0a10c7ec46ad 100644
--- a/drivers/hwtracing/ptt/hisi_ptt.c
+++ b/drivers/hwtracing/ptt/hisi_ptt.c
@@ -487,9 +487,18 @@  static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
 			    char *buf)
 {
 	struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev));
-	const cpumask_t *cpumask = cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev));
+	cpumask_var_t mask;
+	ssize_t n;
 
-	return cpumap_print_to_pagebuf(true, buf, cpumask);
+	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+		return 0;
+
+	cpumask_and(mask, cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)),
+		    cpu_online_mask);
+	n = cpumap_print_to_pagebuf(true, buf, mask);
+	free_cpumask_var(mask);
+
+	return n;
 }
 static DEVICE_ATTR_RO(cpumask);