From patchwork Wed Nov 23 14:57:58 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 25040
From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: Tomer Tayar
Subject: [PATCH 5/8] habanalabs: clear non-released encapsulated signals
Date: Wed, 23 Nov 2022 16:57:58 +0200
Message-Id: <20221123145801.542029-5-ogabbay@kernel.org>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20221123145801.542029-1-ogabbay@kernel.org>
References: <20221123145801.542029-1-ogabbay@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Tomer Tayar

Reserved encapsulated signals which were not released hold the context
refcount, leading to a failure when killing the user process on device
reset or device fini.
Add the release of these left signals in the CS roll-back process.

Signed-off-by: Tomer Tayar
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
---
 .../habanalabs/common/command_submission.c    | 46 ++++++++++++----
 drivers/misc/habanalabs/common/context.c      | 53 +++++++++++--------
 drivers/misc/habanalabs/common/habanalabs.h   |  3 +-
 3 files changed, 71 insertions(+), 31 deletions(-)

diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index f1c69c8ed74a..ea0e5101c10e 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -742,13 +742,11 @@ static void cs_do_release(struct kref *ref)
 		 */
 		if (hl_cs_cmpl->encaps_signals)
 			kref_put(&hl_cs_cmpl->encaps_sig_hdl->refcount,
-						hl_encaps_handle_do_release);
+					hl_encaps_release_handle_and_put_ctx);
 	}
 
-	if ((cs->type == CS_TYPE_WAIT || cs->type == CS_TYPE_COLLECTIVE_WAIT)
-			&& cs->encaps_signals)
-		kref_put(&cs->encaps_sig_hdl->refcount,
-					hl_encaps_handle_do_release);
+	if ((cs->type == CS_TYPE_WAIT || cs->type == CS_TYPE_COLLECTIVE_WAIT) && cs->encaps_signals)
+		kref_put(&cs->encaps_sig_hdl->refcount, hl_encaps_release_handle_and_put_ctx);
 
 out:
 	/* Must be called before hl_ctx_put because inside we use ctx to get
@@ -1011,6 +1009,34 @@ static void cs_rollback(struct hl_device *hdev, struct hl_cs *cs)
 		hl_complete_job(hdev, job);
 }
 
+/*
+ * release_reserved_encaps_signals() - release reserved encapsulated signals.
+ * @hdev: pointer to habanalabs device structure
+ *
+ * Release reserved encapsulated signals which weren't un-reserved, or for which a CS with
+ * encapsulated signals wasn't submitted and thus weren't released as part of CS roll-back.
+ * For these signals need also to put the refcount of the H/W SOB which was taken at the
+ * reservation.
+ */
+static void release_reserved_encaps_signals(struct hl_device *hdev)
+{
+	struct hl_ctx *ctx = hl_get_compute_ctx(hdev);
+	struct hl_cs_encaps_sig_handle *handle;
+	struct hl_encaps_signals_mgr *mgr;
+	u32 id;
+
+	if (!ctx)
+		return;
+
+	mgr = &ctx->sig_mgr;
+
+	idr_for_each_entry(&mgr->handles, handle, id)
+		if (handle->cs_seq == ULLONG_MAX)
+			kref_put(&handle->refcount, hl_encaps_release_handle_and_put_sob_ctx);
+
+	hl_ctx_put(ctx);
+}
+
 void hl_cs_rollback_all(struct hl_device *hdev, bool skip_wq_flush)
 {
 	int i;
@@ -1039,6 +1065,8 @@ void hl_cs_rollback_all(struct hl_device *hdev, bool skip_wq_flush)
 	}
 
 	force_complete_multi_cs(hdev);
+
+	release_reserved_encaps_signals(hdev);
 }
 
 static void
@@ -2001,6 +2029,8 @@ static int cs_ioctl_reserve_signals(struct hl_fpriv *hpriv,
 	 */
 	handle->pre_sob_val = prop->next_sob_val - handle->count;
 
+	handle->cs_seq = ULLONG_MAX;
+
 	*signals_count = prop->next_sob_val;
 	hdev->asic_funcs->hw_queues_unlock(hdev);
 
@@ -2350,10 +2380,8 @@ static int cs_ioctl_signal_wait(struct hl_fpriv *hpriv, enum hl_cs_type cs_type,
 	/* We finished with the CS in this function, so put the ref */
 	cs_put(cs);
 free_cs_chunk_array:
-	if (!wait_cs_submitted && cs_encaps_signals && handle_found &&
-							is_wait_cs)
-		kref_put(&encaps_sig_hdl->refcount,
-				hl_encaps_handle_do_release);
+	if (!wait_cs_submitted && cs_encaps_signals && handle_found && is_wait_cs)
+		kref_put(&encaps_sig_hdl->refcount, hl_encaps_release_handle_and_put_ctx);
 	kfree(cs_chunk_array);
 out:
 	return rc;
diff --git a/drivers/misc/habanalabs/common/context.c b/drivers/misc/habanalabs/common/context.c
index ba6675960203..9c8b1b37b510 100644
--- a/drivers/misc/habanalabs/common/context.c
+++ b/drivers/misc/habanalabs/common/context.c
@@ -9,37 +9,46 @@
 
 #include
 
-void hl_encaps_handle_do_release(struct kref *ref)
+static void encaps_handle_do_release(struct hl_cs_encaps_sig_handle *handle, bool put_hw_sob,
+					bool put_ctx)
 {
-	struct hl_cs_encaps_sig_handle *handle =
-		container_of(ref, struct hl_cs_encaps_sig_handle, refcount);
 	struct hl_encaps_signals_mgr *mgr = &handle->ctx->sig_mgr;
 
+	if (put_hw_sob)
+		hw_sob_put(handle->hw_sob);
+
 	spin_lock(&mgr->lock);
 	idr_remove(&mgr->handles, handle->id);
 	spin_unlock(&mgr->lock);
 
-	hl_ctx_put(handle->ctx);
+	if (put_ctx)
+		hl_ctx_put(handle->ctx);
+
 	kfree(handle);
 }
 
-static void hl_encaps_handle_do_release_sob(struct kref *ref)
+void hl_encaps_release_handle_and_put_ctx(struct kref *ref)
 {
 	struct hl_cs_encaps_sig_handle *handle =
-		container_of(ref, struct hl_cs_encaps_sig_handle, refcount);
-	struct hl_encaps_signals_mgr *mgr = &handle->ctx->sig_mgr;
+			container_of(ref, struct hl_cs_encaps_sig_handle, refcount);
 
-	/* if we're here, then there was a signals reservation but cs with
-	 * encaps signals wasn't submitted, so need to put refcount
-	 * to hw_sob taken at the reservation.
-	 */
-	hw_sob_put(handle->hw_sob);
+	encaps_handle_do_release(handle, false, true);
+}
 
-	spin_lock(&mgr->lock);
-	idr_remove(&mgr->handles, handle->id);
-	spin_unlock(&mgr->lock);
+static void hl_encaps_release_handle_and_put_sob(struct kref *ref)
+{
+	struct hl_cs_encaps_sig_handle *handle =
+			container_of(ref, struct hl_cs_encaps_sig_handle, refcount);
 
-	kfree(handle);
+	encaps_handle_do_release(handle, true, false);
+}
+
+void hl_encaps_release_handle_and_put_sob_ctx(struct kref *ref)
+{
+	struct hl_cs_encaps_sig_handle *handle =
+			container_of(ref, struct hl_cs_encaps_sig_handle, refcount);
+
+	encaps_handle_do_release(handle, true, true);
 }
 
 static void hl_encaps_sig_mgr_init(struct hl_encaps_signals_mgr *mgr)
@@ -48,8 +57,7 @@ static void hl_encaps_sig_mgr_init(struct hl_encaps_signals_mgr *mgr)
 	idr_init(&mgr->handles);
 }
 
-static void hl_encaps_sig_mgr_fini(struct hl_device *hdev,
-			struct hl_encaps_signals_mgr *mgr)
+static void hl_encaps_sig_mgr_fini(struct hl_device *hdev, struct hl_encaps_signals_mgr *mgr)
 {
 	struct hl_cs_encaps_sig_handle *handle;
 	struct idr *idp;
@@ -57,11 +65,14 @@ static void hl_encaps_sig_mgr_fini(struct hl_device *hdev,
 
 	idp = &mgr->handles;
 
+	/* The IDR is expected to be empty at this stage, because any left signal should have been
+	 * released as part of CS roll-back.
+	 */
 	if (!idr_is_empty(idp)) {
-		dev_warn(hdev->dev, "device released while some encaps signals handles are still allocated\n");
+		dev_warn(hdev->dev,
+			"device released while some encaps signals handles are still allocated\n");
 		idr_for_each_entry(idp, handle, id)
-			kref_put(&handle->refcount,
-					hl_encaps_handle_do_release_sob);
+			kref_put(&handle->refcount, hl_encaps_release_handle_and_put_sob);
 	}
 
 	idr_destroy(&mgr->handles);
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 0329a0980bb7..e2527d976ee0 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -3775,7 +3775,8 @@ void hl_sysfs_add_dev_vrm_attr(struct hl_device *hdev, struct attribute_group *d
 
 void hw_sob_get(struct hl_hw_sob *hw_sob);
 void hw_sob_put(struct hl_hw_sob *hw_sob);
-void hl_encaps_handle_do_release(struct kref *ref);
+void hl_encaps_release_handle_and_put_ctx(struct kref *ref);
+void hl_encaps_release_handle_and_put_sob_ctx(struct kref *ref);
 void hl_hw_queue_encaps_sig_set_sob_info(struct hl_device *hdev,
 			struct hl_cs *cs, struct hl_cs_job *job,
 			struct hl_cs_compl *cs_cmpl);
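
Note for readers: the patch routes three release callbacks through one helper that takes two flags, and uses cs_seq == ULLONG_MAX to mark a reservation that never reached a CS. The user-space sketch below models only that pattern and is not driver code. It assumes plain integer counters in place of kref, a fixed array in place of the IDR, and no locking; every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_HANDLES 8

/* Hypothetical stand-ins for the context and the H/W SOB: just counters. */
struct model_ctx { int refcount; };
struct model_sob { int refcount; };

struct model_handle {
	int refcount;
	uint64_t cs_seq;	/* UINT64_MAX marks "reserved, no CS submitted" */
	struct model_ctx *ctx;
	struct model_sob *sob;
};

/* Fixed array standing in for the IDR of reserved handles. */
struct model_mgr {
	struct model_handle *slots[MODEL_MAX_HANDLES];
};

/* A reservation takes one reference on the context and one on the SOB. */
static struct model_handle *model_reserve(struct model_mgr *mgr,
					  struct model_ctx *ctx,
					  struct model_sob *sob)
{
	for (size_t i = 0; i < MODEL_MAX_HANDLES; i++) {
		struct model_handle *h;

		if (mgr->slots[i])
			continue;
		h = calloc(1, sizeof(*h));
		if (!h)
			return NULL;
		h->refcount = 1;
		h->cs_seq = UINT64_MAX;
		h->ctx = ctx;
		h->sob = sob;
		ctx->refcount++;
		sob->refcount++;
		mgr->slots[i] = h;
		return h;
	}
	return NULL;
}

/* Shared release path: the flags select which extra references to drop. */
static void model_release(struct model_mgr *mgr, struct model_handle *h,
			  bool put_sob, bool put_ctx)
{
	assert(h->refcount == 0);
	if (put_sob)
		h->sob->refcount--;
	for (size_t i = 0; i < MODEL_MAX_HANDLES; i++)
		if (mgr->slots[i] == h)
			mgr->slots[i] = NULL;
	if (put_ctx)
		h->ctx->refcount--;
	free(h);
}

/* Drop one handle reference; run the release path on the last one. */
static void model_put(struct model_mgr *mgr, struct model_handle *h,
		      bool put_sob, bool put_ctx)
{
	if (--h->refcount == 0)
		model_release(mgr, h, put_sob, put_ctx);
}

/* Roll-back: put every handle that never got a CS. Returns how many. */
static int model_rollback(struct model_mgr *mgr)
{
	int released = 0;

	for (size_t i = 0; i < MODEL_MAX_HANDLES; i++) {
		struct model_handle *h = mgr->slots[i];

		if (h && h->cs_seq == UINT64_MAX) {
			model_put(mgr, h, true, true);
			released++;
		}
	}
	return released;
}
```

In this model, a handle whose cs_seq was overwritten by a submission is skipped by model_rollback() and is expected to be put later with put_sob false, which mirrors the split between the _put_ctx and _put_sob_ctx callbacks in the patch.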