From patchwork Wed Nov 23 14:57:56 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 25036
From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: Tomer Tayar
Subject: [PATCH 3/8] habanalabs: print context refcount value if hard reset fails
Date: Wed, 23 Nov 2022 16:57:56 +0200
Message-Id: <20221123145801.542029-3-ogabbay@kernel.org>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20221123145801.542029-1-ogabbay@kernel.org>
References: <20221123145801.542029-1-ogabbay@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Tomer Tayar

Failing to kill a user process during a hard reset can be due to a
reference to the user context which isn't released.
To make it easier to understand if this is the reason for the failure
and not something else, add a print of the context refcount value.

Signed-off-by: Tomer Tayar
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
---
 drivers/misc/habanalabs/common/device.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index f5864893237c..926f230def56 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -696,10 +696,22 @@ static void device_hard_reset_pending(struct work_struct *work)
 	flags = device_reset_work->flags | HL_DRV_RESET_FROM_RESET_THR;
 
 	rc = hl_device_reset(hdev, flags);
+
 	if ((rc == -EBUSY) && !hdev->device_fini_pending) {
-		dev_info(hdev->dev,
-			"Could not reset device. will try again in %u seconds",
-			HL_PENDING_RESET_PER_SEC);
+		struct hl_ctx *ctx = hl_get_compute_ctx(hdev);
+
+		if (ctx) {
+			/* The read refcount value should be subtracted by one, because the read is
+			 * protected with hl_get_compute_ctx().
+			 */
+			dev_info(hdev->dev,
+				"Could not reset device (compute_ctx refcount %u). will try again in %u seconds",
+				kref_read(&ctx->refcount) - 1, HL_PENDING_RESET_PER_SEC);
+			hl_ctx_put(ctx);
+		} else {
+			dev_info(hdev->dev, "Could not reset device. will try again in %u seconds",
+				HL_PENDING_RESET_PER_SEC);
+		}
 
 		queue_delayed_work(hdev->reset_wq, &device_reset_work->reset_work,
 					msecs_to_jiffies(HL_PENDING_RESET_PER_SEC * 1000));
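
The "minus one" in the new print can be illustrated outside the kernel.
The following is only a userspace sketch, not driver code: struct
demo_ctx, struct demo_dev and the demo_*() helpers are hypothetical
stand-ins for struct hl_ctx, the device, hl_get_compute_ctx() and
hl_ctx_put(), and a C11 atomic counter is assumed in place of a kref.
It shows why a reader that first takes its own reference has to
subtract that reference to report what other holders still own.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted context (struct hl_ctx in the patch). */
struct demo_ctx {
	atomic_uint refcount;
};

/* Hypothetical stand-in for the device, which points at the compute context. */
struct demo_dev {
	struct demo_ctx *compute_ctx;
};

/* Take a temporary reference; plays the role of hl_get_compute_ctx(). */
static struct demo_ctx *demo_get_compute_ctx(struct demo_dev *dev)
{
	struct demo_ctx *ctx = dev->compute_ctx;

	if (ctx)
		atomic_fetch_add(&ctx->refcount, 1);
	return ctx;
}

/* Drop a reference; plays the role of hl_ctx_put(). */
static void demo_ctx_put(struct demo_ctx *ctx)
{
	unsigned int old = atomic_fetch_sub(&ctx->refcount, 1);

	/* Dropping a reference that was never taken would be a bug. */
	assert(old > 0);
	(void)old;
}

/*
 * Return the number of references held by others, or -1 if there is no
 * compute context. The counter is read while our own temporary reference
 * is held, so that reference is subtracted from the value read.
 */
static int demo_outstanding_refs(struct demo_dev *dev)
{
	struct demo_ctx *ctx = demo_get_compute_ctx(dev);
	int refs;

	if (!ctx)
		return -1;

	refs = (int)atomic_load(&ctx->refcount) - 1;
	demo_ctx_put(ctx);
	return refs;
}
```

A caller would initialize the counter with atomic_init() and pass the
device to demo_outstanding_refs(). With three references held elsewhere
the helper reports 3, not 4, and the counter is back at 3 afterwards.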