[v5,04/11] PM / QoS: Decouple request alloc from dev_pm_qos_mtx

Message ID 20230822180208.95556-5-robdclark@gmail.com
State New

Commit Message

Rob Clark Aug. 22, 2023, 6:01 p.m. UTC
  From: Rob Clark <robdclark@chromium.org>

Similar to the previous patch, move the allocation out from under
dev_pm_qos_mtx by doing the allocation speculatively, then handle
any race after acquiring dev_pm_qos_mtx by freeing the redundant
allocation.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/base/power/qos.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)
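
The pattern described in the commit message can be sketched in userspace C. This is only an illustrative model, not the kernel code: `struct request`, `struct constraints`, `qos_mtx` and `update_latency()` are hypothetical stand-ins, `calloc`/`free` take the place of `kzalloc`/`kfree`, and a pthread mutex takes the place of dev_pm_qos_mtx.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel objects; not the real PM QoS types. */
struct request { int value; };
struct constraints { struct request *latency_req; };

static pthread_mutex_t qos_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0 on success, -1 if a request was needed but allocation failed. */
static int update_latency(struct constraints *c, int val)
{
	struct request *req = NULL;
	int ret = 0;

	/*
	 * Unlocked peek: only a hint, so the decision is repeated under the
	 * lock. (A concurrent userspace program would want an atomic load.)
	 */
	if (!c->latency_req)
		req = calloc(1, sizeof(*req));

	pthread_mutex_lock(&qos_mtx);
	if (!c->latency_req) {
		if (!req) {
			ret = -1;
			goto out;
		}
		req->value = val;
		c->latency_req = req;
		req = NULL;	/* ownership moved to c; skip the free below */
	} else {
		/* Lost the race or already set up: reuse the existing request. */
		c->latency_req->value = val;
	}
out:
	pthread_mutex_unlock(&qos_mtx);
	free(req);	/* frees a redundant allocation; no-op when NULL */
	return ret;
}
```

Calling `update_latency()` twice on the same `struct constraints` allocates on the first call and reuses the installed request on the second. Clearing `req` after the hand-off is what lets a single `free()` on the exit path cover both the error case and the lost-race case.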
  

Comments

kernel test robot Sept. 22, 2023, 7:14 a.m. UTC | #1
Hello,

kernel test robot noticed "canonical_address#:#[##]" on:

commit: d308a440bdf329cfa70cc5d35c565939d81ae73f ("[PATCH v5 04/11] PM / QoS: Decouple request alloc from dev_pm_qos_mtx")
url: https://github.com/intel-lab-lkp/linux/commits/Rob-Clark/PM-devfreq-Drop-unneed-locking-to-appease-lockdep/20230823-020443
base: git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link: https://lore.kernel.org/all/20230822180208.95556-5-robdclark@gmail.com/
patch subject: [PATCH v5 04/11] PM / QoS: Decouple request alloc from dev_pm_qos_mtx

in testcase: blktests
version: blktests-x86_64-e0bb3dc-1_20230912
with following parameters:

	disk: 1SSD
	test: nvme-group-01
	nvme_trtype: rdma



compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) with 256G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202309221426.fb0fe750-oliver.sang@intel.com



[   79.616893][ T2311]
[   79.634663][ T3447] run blktests nvme/032 at 2023-09-19 15:50:52
[   83.369231][ T2313] /lkp/lkp/src/monitors/kmemleak: 19: echo: echo: I/O error
[   83.369240][ T2313]
[   85.082264][ T1434] nvme nvme0: 128/0/0 default/read/poll queues
[   88.926272][ T3447] general protection fault, probably for non-canonical address 0xdffffc0000000024: 0000 [#1] PREEMPT SMP KASAN NOPTI
[   88.941100][ T3447] KASAN: null-ptr-deref in range [0x0000000000000120-0x0000000000000127]
[   88.951583][ T3447] CPU: 95 PID: 3447 Comm: check Tainted: G S                 6.5.0-rc2-00514-gd308a440bdf3 #1
[   88.964091][ T3447] Hardware name: Intel Corporation D50DNP1SBB/D50DNP1SBB, BIOS SE5C7411.86B.8118.D04.2206151341 06/15/2022
[ 88.977880][ T3447] RIP: 0010:dev_pm_qos_update_user_latency_tolerance (kbuild/src/consumer/drivers/base/power/qos.c:936) 
[ 88.987504][ T3447] Code: 02 00 00 48 8b bb 08 02 00 00 e8 79 ea ff ff 48 8d b8 20 01 00 00 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 3a 02 00 00 45 31 f6 48 83 bd 20 01 00 00 00 0f
All code
========
   0:	02 00                	add    (%rax),%al
   2:	00 48 8b             	add    %cl,-0x75(%rax)
   5:	bb 08 02 00 00       	mov    $0x208,%ebx
   a:	e8 79 ea ff ff       	callq  0xffffffffffffea88
   f:	48 8d b8 20 01 00 00 	lea    0x120(%rax),%rdi
  16:	48 89 c5             	mov    %rax,%rbp
  19:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
  20:	fc ff df 
  23:	48 89 fa             	mov    %rdi,%rdx
  26:	48 c1 ea 03          	shr    $0x3,%rdx
  2a:*	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1)		<-- trapping instruction
  2e:	0f 85 3a 02 00 00    	jne    0x26e
  34:	45 31 f6             	xor    %r14d,%r14d
  37:	48 83 bd 20 01 00 00 	cmpq   $0x0,0x120(%rbp)
  3e:	00 
  3f:	0f                   	.byte 0xf

Code starting with the faulting instruction
===========================================
   0:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1)
   4:	0f 85 3a 02 00 00    	jne    0x244
   a:	45 31 f6             	xor    %r14d,%r14d
   d:	48 83 bd 20 01 00 00 	cmpq   $0x0,0x120(%rbp)
  14:	00 
  15:	0f                   	.byte 0xf
[   89.010647][ T3447] RSP: 0018:ffa0000017fe7b70 EFLAGS: 00010206
[   89.018574][ T3447] RAX: dffffc0000000000 RBX: ff1100209b614298 RCX: 0000000000000000
[   89.028658][ T3447] RDX: 0000000000000024 RSI: 00000000ffffffff RDI: 0000000000000120
[   89.038735][ T3447] RBP: 0000000000000000 R08: 0000000000000000 R09: fff3fc0002ffcf64
[   89.048812][ T3447] R10: 0000000000000003 R11: ff1100208a8624b0 R12: ff1100209b6144a0
[   89.058895][ T3447] R13: 00000000ffffffff R14: ffffffffc08e3468 R15: ff110001273f4138
[   89.068957][ T3447] FS:  00007fc6d8027740(0000) GS:ff11003fd3180000(0000) knlGS:0000000000000000
[   89.080098][ T3447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   89.088618][ T3447] CR2: 00007f5be5eeb120 CR3: 0000000263306002 CR4: 0000000000f71ee0
[   89.098720][ T3447] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   89.108812][ T3447] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[   89.118899][ T3447] PKRU: 55555554
[   89.123997][ T3447] Call Trace:
[   89.128804][ T3447]  <TASK>
[ 89.133218][ T3447] ? die_addr (kbuild/src/consumer/arch/x86/kernel/dumpstack.c:421 kbuild/src/consumer/arch/x86/kernel/dumpstack.c:460) 
[ 89.139003][ T3447] ? exc_general_protection (kbuild/src/consumer/arch/x86/kernel/traps.c:786 kbuild/src/consumer/arch/x86/kernel/traps.c:728) 
[ 89.146323][ T3447] ? asm_exc_general_protection (kbuild/src/consumer/arch/x86/include/asm/idtentry.h:564) 
[ 89.153849][ T3447] ? dev_pm_qos_update_user_latency_tolerance (kbuild/src/consumer/drivers/base/power/qos.c:936) 
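
The faulting address can be reproduced from the register dump. The decoded instructions shift RDI right by 3 and add the constant loaded into RAX before the trapping `cmpb`, which is the KASAN shadow-byte lookup. A small sketch of that arithmetic, using only constants visible in the trace (the name `kasan_shadow_of()` is made up for illustration):

```c
#include <stdint.h>

/* Constant loaded by the movabs in the decoded code (RAX in the dump). */
#define SHADOW_OFFSET 0xdffffc0000000000ULL
/* The decoded code shifts by 3: one shadow byte per 8 bytes of memory. */
#define SHADOW_SCALE_SHIFT 3

/* Map an address to the shadow byte checked before the access. */
static uint64_t kasan_shadow_of(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}
```

With RDI = 0x120 this gives 0xdffffc0000000024, the non-canonical address in the fault message, and every address in 0x120-0x127 maps to that same shadow byte, matching the range KASAN prints. RDI was formed as 0x120(%rax) from the return value of the preceding call, and RBP (a copy of that return value) is 0, so the access is to offset 0x120 of a NULL pointer. Assuming the robot's tree matches the line numbering in the patch's first hunk, line 936 of the patched qos.c is the added `if (!qos->latency_tolerance_req)` test.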


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230922/202309221426.fb0fe750-oliver.sang@intel.com
  

Patch

diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
index 7e95760d16dc..09834f3354d7 100644
--- a/drivers/base/power/qos.c
+++ b/drivers/base/power/qos.c
@@ -930,8 +930,12 @@  s32 dev_pm_qos_get_user_latency_tolerance(struct device *dev)
 int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
 {
 	struct dev_pm_qos *qos = dev_pm_qos_constraints_allocate(dev);
+	struct dev_pm_qos_request *req = NULL;
 	int ret = 0;
 
+	if (!qos->latency_tolerance_req)
+		req = kzalloc(sizeof(*req), GFP_KERNEL);
+
 	mutex_lock(&dev_pm_qos_mtx);
 
 	dev_pm_qos_constraints_set(dev, qos);
@@ -945,8 +949,6 @@  int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
 		goto out;
 
 	if (!dev->power.qos->latency_tolerance_req) {
-		struct dev_pm_qos_request *req;
-
 		if (val < 0) {
 			if (val == PM_QOS_LATENCY_TOLERANCE_NO_CONSTRAINT)
 				ret = 0;
@@ -954,17 +956,15 @@  int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
 				ret = -EINVAL;
 			goto out;
 		}
-		req = kzalloc(sizeof(*req), GFP_KERNEL);
 		if (!req) {
 			ret = -ENOMEM;
 			goto out;
 		}
 		ret = __dev_pm_qos_add_request(dev, req, DEV_PM_QOS_LATENCY_TOLERANCE, val);
-		if (ret < 0) {
-			kfree(req);
+		if (ret < 0)
 			goto out;
-		}
 		dev->power.qos->latency_tolerance_req = req;
+		req = NULL;
 	} else {
 		if (val < 0) {
 			__dev_pm_qos_drop_user_request(dev, DEV_PM_QOS_LATENCY_TOLERANCE);
@@ -976,6 +976,7 @@  int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
 
  out:
 	mutex_unlock(&dev_pm_qos_mtx);
+	kfree(req);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(dev_pm_qos_update_user_latency_tolerance);