Message ID | 20230816141008.535450-1-suzuki.poulose@arm.com |
---|---|
State | New |
Headers | From: Suzuki K Poulose <suzuki.poulose@arm.com>; To: hejunhao3@huawei.com; Cc: coresight@lists.linaro.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, jonathan.cameron@huawei.com, leo.yan@linaro.org, mike.leach@linaro.org, james.clark@arm.com, linuxarm@huawei.com, yangyicong@huawei.com, prime.zeng@hisilicon.com, Suzuki K Poulose <suzuki.poulose@arm.com>; Subject: [PATCH v2 1/2] coresight: trbe: Fix TRBE potential sleep in atomic context; Date: Wed, 16 Aug 2023 15:10:07 +0100; Message-Id: <20230816141008.535450-1-suzuki.poulose@arm.com>; In-Reply-To: <20230814093813.19152-1-hejunhao3@huawei.com>; References: <20230814093813.19152-1-hejunhao3@huawei.com> |
Series | [v2,1/2] coresight: trbe: Fix TRBE potential sleep in atomic context |
Commit Message
Suzuki K Poulose
Aug. 16, 2023, 2:10 p.m. UTC
From: Junhao He <hejunhao3@huawei.com>

smp_call_function_single() will allocate an IPI interrupt vector to
the target processor and send a function call request to the interrupt
vector. After the target processor receives the IPI interrupt, it will
execute the arm_trbe_remove_coresight_cpu() call request in the interrupt
handler.

According to the device_unregister() stack information, if another process
is using the device, down_write() may sleep and trigger deadlocks
or unexpected errors.

  arm_trbe_remove_coresight_cpu
    coresight_unregister
      device_unregister
        device_del
          kobject_del
            __kobject_del
              sysfs_remove_dir
                kernfs_remove
                  down_write ---------> it may sleep

Add a helper arm_trbe_disable_cpu() to disable the TRBE percpu irq and
reset the per-CPU TRBE.
Simply call arm_trbe_remove_coresight_cpu() directly without using
smp_call_function_single(), which is the same as registering the TRBE
coresight device.

Fixes: 3fbf7f011f24 ("coresight: sink: Add TRBE driver")
Signed-off-by: Junhao He <hejunhao3@huawei.com>
Link: https://lore.kernel.org/r/20230814093813.19152-2-hejunhao3@huawei.com
[ Remove duplicate cpumask checks during removal ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 33 +++++++++++---------
 1 file changed, 18 insertions(+), 15 deletions(-)
Comments
Hello Junhao,

On 8/16/23 19:40, Suzuki K Poulose wrote:
> From: Junhao He <hejunhao3@huawei.com>
>
> smp_call_function_single() will allocate an IPI interrupt vector to
> the target processor and send a function call request to the interrupt
> vector. After the target processor receives the IPI interrupt, it will
> execute the arm_trbe_remove_coresight_cpu() call request in the interrupt
> handler.
>
> According to the device_unregister() stack information, if another process
> is using the device, down_write() may sleep and trigger deadlocks
> or unexpected errors.
>
>   arm_trbe_remove_coresight_cpu
>     coresight_unregister
>       device_unregister
>         device_del
>           kobject_del
>             __kobject_del
>               sysfs_remove_dir
>                 kernfs_remove
>                   down_write ---------> it may sleep

But how did you really detect this problem? Does this show up as a warning when
you enable lockdep debug? Or did it really happen during a real workload execution
followed by a TRBE module unload? Although the problem seems plausible (and needs
fixing), I am just wondering how we triggered this.

> Add a helper arm_trbe_disable_cpu() to disable the TRBE percpu irq and
> reset the per-CPU TRBE.
> Simply call arm_trbe_remove_coresight_cpu() directly without using
> smp_call_function_single(), which is the same as registering the TRBE
> coresight device.

[...]

> @@ -1406,12 +1413,8 @@ static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
>  {
>  	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
>
> -	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
> -		struct trbe_cpudata *cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
> -
> -		disable_percpu_irq(drvdata->irq);
> -		trbe_reset_local(cpudata);
> -	}
> +	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus))
> +		arm_trbe_disable_cpu(drvdata);

This code hunk seems unrelated to the context here, other than finding another use case
for arm_trbe_disable_cpu(). The problem is that arm_trbe_disable_cpu() resets cpudata->drvdata,
which might not get re-initialized in arm_trbe_cpu_startup(), as there will still be a
per-CPU sink associated, as confirmed with coresight_get_percpu_sink(). I guess it might be
better to drop this change and keep everything limited to the SMP IPI callback rework in
arm_trbe_remove_coresight().

>  	return 0;
>  }
Hi Anshuman Khandual,

On 2023/8/17 15:13, Anshuman Khandual wrote:
> Hello Junhao,
>
> On 8/16/23 19:40, Suzuki K Poulose wrote:
>> According to the device_unregister() stack information, if another process
>> is using the device, down_write() may sleep and trigger deadlocks
>> or unexpected errors.

[...]

> But how did you really detect this problem? Does this show up as a warning when
> you enable lockdep debug? Or did it really happen during a real workload execution
> followed by a TRBE module unload? Although the problem seems plausible (and needs
> fixing), I am just wondering how we triggered this.

Yes, it really happened during a real workload, when the TRBE driver is
loaded and unloaded in a loop. The test script is as follows:

  for ((i=0;i<99999;i++))
  do
      insmod coresight-trbe.ko;
      rmmod coresight-trbe.ko;
      echo "loop $i";
  done

The kernel will report a panic.

[...]

>> -	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
>> -		struct trbe_cpudata *cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
>> -
>> -		disable_percpu_irq(drvdata->irq);
>> -		trbe_reset_local(cpudata);
>> -	}
>> +	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus))
>> +		arm_trbe_disable_cpu(drvdata);
> This code hunk seems unrelated to the context here, other than finding another use case
> for arm_trbe_disable_cpu(). The problem is that arm_trbe_disable_cpu() resets cpudata->drvdata,
> which might not get re-initialized in arm_trbe_cpu_startup(), as there will still be a
> per-CPU sink associated, as confirmed with coresight_get_percpu_sink(). I guess it might be
> better to drop this change and keep everything limited to the SMP IPI callback rework in
> arm_trbe_remove_coresight().

OK, will fix it. The change is just meant to simplify the cpu_teardown code.
Maybe we can consider whether we need to set "cpudata->drvdata = NULL" in
arm_trbe_disable_cpu() at all? If it is not necessary, this hunk can be kept,
and the clearing of cpudata->drvdata dropped from arm_trbe_disable_cpu().

Best regards,
Junhao.
On 17/08/2023 09:41, hejunhao wrote:
> Hi Anshuman Khandual,
>
> On 2023/8/17 15:13, Anshuman Khandual wrote:
>> But how did you really detect this problem? Does this show up as a
>> warning when you enable lockdep debug? Or did it really happen during
>> a real workload execution followed by a TRBE module unload?

[...]

> Yes, it really happened during a real workload, when the TRBE driver is
> loaded and unloaded in a loop. The test script is as follows:
>
>   for ((i=0;i<99999;i++))
>   do
>       insmod coresight-trbe.ko;
>       rmmod coresight-trbe.ko;
>       echo "loop $i";
>   done
>
> The kernel will report a panic.
>

I wonder how easy it would be to add a kselftest to do this with all of
the Coresight modules, because we also had a problem with bad reference
counting preventing an unload of the CTI module. Although that did
require starting a perf session, which might complicate the test.

[...]
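The kselftest idea raised above could be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing selftest: the function names cycle_module and run_cycles, the LOAD and UNLOAD override variables, and the module names in the usage note are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: repeatedly load and unload kernel modules and
# report the first failure. LOAD and UNLOAD are assumptions that can be
# overridden, for example with "true" for a dry run.
LOAD=${LOAD:-modprobe}
UNLOAD=${UNLOAD:-modprobe -r}

# cycle_module <module> <iterations>
# Returns 0 if every load/unload pair succeeded, 1 on the first failure.
cycle_module() {
	_mod=$1
	_count=$2
	_i=0
	while [ "$_i" -lt "$_count" ]; do
		$LOAD "$_mod" || { echo "load of $_mod failed at loop $_i"; return 1; }
		$UNLOAD "$_mod" || { echo "unload of $_mod failed at loop $_i"; return 1; }
		_i=$((_i + 1))
	done
	echo "ok: $_mod cycled $_count times"
}

# run_cycles <iterations> <module>...
# Cycles each listed module; returns 1 if any of them failed.
run_cycles() {
	_n=$1
	shift
	_rc=0
	for _m in "$@"; do
		cycle_module "$_m" "$_n" || _rc=1
	done
	return "$_rc"
}
```

For instance, `run_cycles 100 coresight-trbe coresight-cti` would cycle both modules as root (the module list here is an assumption). A case that needs an active perf session, such as the CTI reference-counting problem mentioned above, would need extra setup that this sketch does not attempt.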
On 17/08/2023 08:13, Anshuman Khandual wrote:
> Hello Junhao,
>
> On 8/16/23 19:40, Suzuki K Poulose wrote:
>> According to the device_unregister() stack information, if another process
>> is using the device, down_write() may sleep and trigger deadlocks
>> or unexpected errors.

[...]

> But how did you really detect this problem? Does this show up as a warning when
> you enable lockdep debug? Or did it really happen during a real workload execution
> followed by a TRBE module unload? Although the problem seems plausible (and needs
> fixing), I am just wondering how we triggered this.

I was able to trigger this with just:

  modprobe coresight-trbe; modprobe -r coresight-trbe;

with all the bells and whistles enabled in the kernel.

Suzuki
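As a hedged illustration of how a single load/unload pair like this can be checked: the thread does not say which debug options were enabled, so the sketch below assumes a kernel built with CONFIG_DEBUG_ATOMIC_SLEEP, which reports a sleeping call made from atomic context with a "BUG: sleeping function called from invalid context" line in the kernel log. The function name has_atomic_sleep_splat is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and succeeds if it
# contains the sleep-in-atomic report (assumes CONFIG_DEBUG_ATOMIC_SLEEP).
has_atomic_sleep_splat() {
	grep -q "BUG: sleeping function called from invalid context"
}
```

After the modprobe pair, something like `dmesg | has_atomic_sleep_splat && echo "sleep in atomic context detected"` would flag the problem without waiting for a panic.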
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 7720619909d6..025f70adee47 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -1225,6 +1225,17 @@ static void arm_trbe_enable_cpu(void *info)
 	enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE);
 }
 
+static void arm_trbe_disable_cpu(void *info)
+{
+	struct trbe_drvdata *drvdata = info;
+	struct trbe_cpudata *cpudata = this_cpu_ptr(drvdata->cpudata);
+
+	disable_percpu_irq(drvdata->irq);
+	trbe_reset_local(cpudata);
+	cpudata->drvdata = NULL;
+}
+
+
 static void arm_trbe_register_coresight_cpu(struct trbe_drvdata *drvdata, int cpu)
 {
 	struct trbe_cpudata *cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
@@ -1326,18 +1337,12 @@ static void arm_trbe_probe_cpu(void *info)
 	cpumask_clear_cpu(cpu, &drvdata->supported_cpus);
 }
 
-static void arm_trbe_remove_coresight_cpu(void *info)
+static void arm_trbe_remove_coresight_cpu(struct trbe_drvdata *drvdata, int cpu)
 {
-	int cpu = smp_processor_id();
-	struct trbe_drvdata *drvdata = info;
-	struct trbe_cpudata *cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
 	struct coresight_device *trbe_csdev = coresight_get_percpu_sink(cpu);
 
-	disable_percpu_irq(drvdata->irq);
-	trbe_reset_local(cpudata);
 	if (trbe_csdev) {
 		coresight_unregister(trbe_csdev);
-		cpudata->drvdata = NULL;
 		coresight_set_percpu_sink(cpu, NULL);
 	}
 }
@@ -1366,8 +1371,10 @@ static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata)
 {
 	int cpu;
 
-	for_each_cpu(cpu, &drvdata->supported_cpus)
-		smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
+	for_each_cpu(cpu, &drvdata->supported_cpus) {
+		smp_call_function_single(cpu, arm_trbe_disable_cpu, drvdata, 1);
+		arm_trbe_remove_coresight_cpu(drvdata, cpu);
+	}
 	free_percpu(drvdata->cpudata);
 	return 0;
 }
@@ -1406,12 +1413,8 @@ static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
 {
 	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
 
-	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
-		struct trbe_cpudata *cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
-
-		disable_percpu_irq(drvdata->irq);
-		trbe_reset_local(cpudata);
-	}
+	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus))
+		arm_trbe_disable_cpu(drvdata);
 	return 0;
 }