From patchwork Thu Dec 8 15:13:25 2022
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 31389
From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: tal albo
Subject: [PATCH 01/26] habanalabs/gaudi2: fix BMON 3rd address range
Date: Thu, 8 Dec 2022 17:13:25 +0200
Message-Id: <20221208151350.1833823-1-ogabbay@kernel.org>

From: tal albo

Fix programming an incorrect value for the 3rd address range: the S3/E3
BMON registers were programmed with the addr2 values instead of addr3.

Signed-off-by: tal albo
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
---
 drivers/misc/habanalabs/gaudi2/gaudi2_coresight.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/misc/habanalabs/gaudi2/gaudi2_coresight.c b/drivers/misc/habanalabs/gaudi2/gaudi2_coresight.c
index 56c6ab692482..1df7a59e4101 100644
--- a/drivers/misc/habanalabs/gaudi2/gaudi2_coresight.c
+++ b/drivers/misc/habanalabs/gaudi2/gaudi2_coresight.c
@@ -2376,10 +2376,10 @@ static int gaudi2_config_bmon(struct hl_device *hdev, struct hl_debug_params *pa
 	WREG32(base_reg + mmBMON_ADDRH_S2_OFFSET, upper_32_bits(input->start_addr2));
 	WREG32(base_reg + mmBMON_ADDRL_E2_OFFSET, lower_32_bits(input->end_addr2));
 	WREG32(base_reg + mmBMON_ADDRH_E2_OFFSET, upper_32_bits(input->end_addr2));
-	WREG32(base_reg + mmBMON_ADDRL_S3_OFFSET, lower_32_bits(input->start_addr2));
-	WREG32(base_reg + mmBMON_ADDRH_S3_OFFSET, upper_32_bits(input->start_addr2));
-	WREG32(base_reg + mmBMON_ADDRL_E3_OFFSET, lower_32_bits(input->end_addr2));
-	WREG32(base_reg + mmBMON_ADDRH_E3_OFFSET, upper_32_bits(input->end_addr2));
+	WREG32(base_reg + mmBMON_ADDRL_S3_OFFSET, lower_32_bits(input->start_addr3));
+	WREG32(base_reg + mmBMON_ADDRH_S3_OFFSET, upper_32_bits(input->start_addr3));
+	WREG32(base_reg + mmBMON_ADDRL_E3_OFFSET, lower_32_bits(input->end_addr3));
+	WREG32(base_reg + mmBMON_ADDRH_E3_OFFSET, upper_32_bits(input->end_addr3));
 
 	WREG32(base_reg + mmBMON_IDL_OFFSET, 0x0);
 	WREG32(base_reg + mmBMON_IDH_OFFSET, 0x0);

From patchwork Thu Dec 8 15:13:26 2022
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 31400
From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: farah kassabri
Subject: [PATCH 02/26] habanalabs: read binning info from preboot
Date: Thu, 8 Dec 2022 17:13:26 +0200
Message-Id: <20221208151350.1833823-2-ogabbay@kernel.org>
In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org>

From: farah kassabri

Sometimes we need the binning info at a very early stage of driver
initialization. Therefore, support was added in preboot to provide the
binning info as part of the f/w descriptor, and the driver can now use it.
Signed-off-by: farah kassabri
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
---
 drivers/misc/habanalabs/common/firmware_if.c | 24 +++++++++++++-
 drivers/misc/habanalabs/common/habanalabs.h  |  5 +++
 .../habanalabs/include/common/hl_boot_if.h   | 31 +++++++++++++------
 3 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/drivers/misc/habanalabs/common/firmware_if.c b/drivers/misc/habanalabs/common/firmware_if.c
index 228b92278e48..4f364c3085fe 100644
--- a/drivers/misc/habanalabs/common/firmware_if.c
+++ b/drivers/misc/habanalabs/common/firmware_if.c
@@ -2560,13 +2560,35 @@ static int hl_fw_dynamic_init_cpu(struct hl_device *hdev,
 	}
 
 	if (!(hdev->fw_components & FW_TYPE_BOOT_CPU)) {
+		struct lkd_fw_binning_info *binning_info;
+
 		rc = hl_fw_dynamic_request_descriptor(hdev, fw_loader, 0);
 		if (rc)
 			goto protocol_err;
 
 		/* read preboot version */
-		return hl_fw_dynamic_read_device_fw_version(hdev, FW_COMP_PREBOOT,
+		rc = hl_fw_dynamic_read_device_fw_version(hdev, FW_COMP_PREBOOT,
 				fw_loader->dynamic_loader.comm_desc.cur_fw_ver);
+
+		if (rc)
+			goto out;
+
+		/* read binning info from preboot */
+		if (hdev->support_preboot_binning) {
+			binning_info = &fw_loader->dynamic_loader.comm_desc.binning_info;
+			hdev->tpc_binning = le64_to_cpu(binning_info->tpc_mask_l);
+			hdev->dram_binning = le32_to_cpu(binning_info->dram_mask);
+			hdev->edma_binning = le32_to_cpu(binning_info->edma_mask);
+			hdev->decoder_binning = le32_to_cpu(binning_info->dec_mask);
+			hdev->rotator_binning = le32_to_cpu(binning_info->rot_mask);
+
+			dev_dbg(hdev->dev,
+				"Read binning masks: tpc: 0x%llx, dram: 0x%llx, edma: 0x%x, dec: 0x%x, rot:0x%x\n",
+				hdev->tpc_binning, hdev->dram_binning, hdev->edma_binning,
+				hdev->decoder_binning, hdev->rotator_binning);
+		}
+out:
+		return rc;
 	}
 
 	/* load boot fit to FW */
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index e2527d976ee0..9e42d0e9ce33 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -3157,6 +3157,8 @@ struct hl_reset_info {
 * @edma_binning: contains mask of edma engines that is received from the f/w which
 *                indicates which edma engines are binned-out
 * @device_release_watchdog_timeout_sec: device release watchdog timeout value in seconds.
+ * @rotator_binning: contains mask of rotators engines that is received from the f/w
+ *                   which indicates which rotator engines are binned-out(Gaudi3 and above).
 * @id: device minor.
 * @id_control: minor of the control device.
 * @cdev_idx: char device index. Used for setting its name.
@@ -3214,6 +3216,7 @@ struct hl_reset_info {
 * @heartbeat: Controls if we want to enable the heartbeat mechanism vs. the f/w, which verifies
 *             that the f/w is always alive. Used only for testing.
 * @supports_ctx_switch: true if a ctx switch is required upon first submission.
+ * @support_preboot_binning: true if we support read binning info from preboot.
 */
 struct hl_device {
 	struct pci_dev *pdev;
@@ -3322,6 +3325,7 @@ struct hl_device {
 	u32 decoder_binning;
 	u32 edma_binning;
 	u32 device_release_watchdog_timeout_sec;
+	u32 rotator_binning;
 	u16 id;
 	u16 id_control;
 	u16 cdev_idx;
@@ -3355,6 +3359,7 @@ struct hl_device {
 	u8 supports_mmu_prefetch;
 	u8 reset_upon_device_release;
 	u8 supports_ctx_switch;
+	u8 support_preboot_binning;
 
 	/* Parameters for bring-up */
 	u64 nic_ports_mask;
diff --git a/drivers/misc/habanalabs/include/common/hl_boot_if.h b/drivers/misc/habanalabs/include/common/hl_boot_if.h
index e0ea51cc7475..fe034111360e 100644
--- a/drivers/misc/habanalabs/include/common/hl_boot_if.h
+++ b/drivers/misc/habanalabs/include/common/hl_boot_if.h
@@ -439,7 +439,7 @@ struct cpu_dyn_regs {
 /* TODO: remove the desc magic after the code is updated to use message */
 /* HCDM - Habana Communications Descriptor Magic */
 #define HL_COMMS_DESC_MAGIC	0x4843444D
-#define HL_COMMS_DESC_VER	1
+#define HL_COMMS_DESC_VER	3
 
 /* HCMv - Habana Communications Message + header version */
 #define HL_COMMS_MSG_MAGIC_VALUE	0x48434D00
@@ -450,8 +450,10 @@ struct cpu_dyn_regs {
 		((ver) & HL_COMMS_MSG_MAGIC_VER_MASK))
 #define HL_COMMS_MSG_MAGIC_V0	HL_COMMS_DESC_MAGIC
 #define HL_COMMS_MSG_MAGIC_V1	HL_COMMS_MSG_MAGIC_VER(1)
+#define HL_COMMS_MSG_MAGIC_V2	HL_COMMS_MSG_MAGIC_VER(2)
+#define HL_COMMS_MSG_MAGIC_V3	HL_COMMS_MSG_MAGIC_VER(3)
 
-#define HL_COMMS_MSG_MAGIC	HL_COMMS_MSG_MAGIC_V1
+#define HL_COMMS_MSG_MAGIC	HL_COMMS_MSG_MAGIC_V3
 
 #define HL_COMMS_MSG_MAGIC_VALIDATE_MAGIC(magic)	\
 	(((magic) & HL_COMMS_MSG_MAGIC_MASK) ==		\
@@ -474,22 +476,31 @@ enum comms_msg_type {
 
 /*
 * Binning information shared between LKD and FW
- * @tpc_mask - TPC binning information
+ * @tpc_mask_l - TPC binning information lower 64 bit
 * @dec_mask - Decoder binning information
- * @hbm_mask - HBM binning information
+ * @dram_mask - DRAM binning information
 * @edma_mask - EDMA binning information
 * @mme_mask_l - MME binning information lower 32
 * @mme_mask_h - MME binning information upper 32
- * @reserved - reserved field for 64 bit alignment
+ * @rot_mask - Rotator binning information
+ * @xbar_mask - xBAR binning information
+ * @reserved - reserved field for future binning info w/o ABI change
+ * @tpc_mask_h - TPC binning information upper 64 bit
+ * @nic_mask - NIC binning information
 */
 struct lkd_fw_binning_info {
-	__le64 tpc_mask;
+	__le64 tpc_mask_l;
 	__le32 dec_mask;
-	__le32 hbm_mask;
+	__le32 dram_mask;
 	__le32 edma_mask;
 	__le32 mme_mask_l;
 	__le32 mme_mask_h;
-	__le32 reserved;
+	__le32 rot_mask;
+	__le32 xbar_mask;
+	__le32 reserved0;
+	__le64 tpc_mask_h;
+	__le64 nic_mask;
+	__le32 reserved1[8];
 };
 
 /* TODO: remove this struct after the code is updated to use message */
@@ -521,6 +532,7 @@ struct lkd_fw_comms_desc {
 	/* can be used for 1 more version w/o ABI change */
 	char reserved0[VERSION_MAX_LEN];
 	__le64 img_addr; /* address for next FW component load */
+	struct lkd_fw_binning_info binning_info;
 };
 
 enum comms_reset_cause {
@@ -545,6 +557,7 @@ struct lkd_fw_comms_msg {
 			char reserved0[VERSION_MAX_LEN];
 			/* address for next FW component load */
 			__le64 img_addr;
+			struct lkd_fw_binning_info binning_info;
 		};
 		struct {
 			__u8 reset_cause;
@@ -552,7 +565,7 @@ struct lkd_fw_comms_msg {
 		struct {
 			__u8 fw_cfg_skip; /* 1 - skip, 0 - don't skip */
 		};
-		struct lkd_fw_binning_info binning_info;
+		struct lkd_fw_binning_info binning_conf;
 	};
 };
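[Editor's note: the following is a minimal, self-contained userspace sketch, not part of the patch. It illustrates the idea behind extending lkd_fw_binning_info: new fields are appended after the existing ones so the offsets that old LKD/FW consumers rely on never move, which is what keeps the ABI stable across descriptor versions. The struct mirrors the field names of the patch, but the offset values asserted here are an assumption made for illustration.]

/* build with: cc -std=c11 binning_layout.c */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

struct binning_info_v3 {
	uint64_t tpc_mask_l;
	uint32_t dec_mask;
	uint32_t dram_mask;
	uint32_t edma_mask;
	uint32_t mme_mask_l;
	uint32_t mme_mask_h;
	uint32_t rot_mask;     /* new: reuses the space of the old reserved field */
	uint32_t xbar_mask;    /* new */
	uint32_t reserved0;
	uint64_t tpc_mask_h;   /* new */
	uint64_t nic_mask;     /* new */
	uint32_t reserved1[8]; /* room for future fields w/o ABI change */
};

/* Old consumers only knew the leading fields; their offsets must not move. */
_Static_assert(offsetof(struct binning_info_v3, tpc_mask_l) == 0, "ABI break");
_Static_assert(offsetof(struct binning_info_v3, dec_mask) == 8, "ABI break");
_Static_assert(offsetof(struct binning_info_v3, mme_mask_h) == 24, "ABI break");

int main(void)
{
	printf("binning info size: %zu bytes\n", sizeof(struct binning_info_v3));
	return 0;
}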
From patchwork Thu Dec 8 15:13:27 2022
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 31395
From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: Tomer Tayar
Subject: [PATCH 03/26] habanalabs: remove releasing of user threads from device release
Date: Thu, 8 Dec 2022 17:13:27 +0200
Message-Id: <20221208151350.1833823-3-ogabbay@kernel.org>
In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org>

From: Tomer Tayar

The device file is not in use when hl_device_release() is called, and there
aren't any user threads that use IOCTLs to wait for interrupts. Therefore
there is no need to release them at this point.
Signed-off-by: Tomer Tayar
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
---
 drivers/misc/habanalabs/common/device.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index 87ab329e65d4..1453f2ec72d9 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -511,11 +511,6 @@ static int hl_device_release(struct inode *inode, struct file *filp)
 		return 0;
 	}
 
-	/* Each pending user interrupt holds the user's context, hence we
-	 * must release them all before calling hl_ctx_mgr_fini().
-	 */
-	hl_release_pending_user_interrupts(hpriv->hdev);
-
 	hl_ctx_mgr_fini(hdev, &hpriv->ctx_mgr);
 
 	hl_mem_mgr_fini(&hpriv->mem_mgr);

From patchwork Thu Dec 8 15:13:28 2022
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 31394
From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: Tomer Tayar
Subject: [PATCH 04/26] habanalabs: abort waiting user threads upon error
Date: Thu, 8 Dec 2022 17:13:28 +0200
Message-Id: <20221208151350.1833823-4-ogabbay@kernel.org>
In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org>

From: Tomer Tayar

The user should close the FD when being notified about an error, after which
a device reset takes place. However, if the user has pending threads that
wait for completions, the device release won't be called, and eventually the
watchdog timeout will expire, leading to a hard reset that kills the user
process. To avoid this, abort such waiting threads right after the error
notification, and block subsequent waiting operations.
Signed-off-by: Tomer Tayar
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
---
 .../habanalabs/common/command_submission.c  | 28 +++++++++++++++++--
 drivers/misc/habanalabs/common/device.c     |  2 ++
 drivers/misc/habanalabs/common/habanalabs.h |  1 +
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index ea0e5101c10e..cf3b82efc65c 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -1117,6 +1117,27 @@ void hl_release_pending_user_interrupts(struct hl_device *hdev)
 		wake_pending_user_interrupt_threads(interrupt);
 }
 
+static void force_complete_cs(struct hl_device *hdev)
+{
+	struct hl_cs *cs;
+
+	spin_lock(&hdev->cs_mirror_lock);
+
+	list_for_each_entry(cs, &hdev->cs_mirror_list, mirror_node) {
+		cs->fence->error = -EIO;
+		complete_all(&cs->fence->completion);
+	}
+
+	spin_unlock(&hdev->cs_mirror_lock);
+}
+
+void hl_abort_waitings_for_completion(struct hl_device *hdev)
+{
+	force_complete_cs(hdev);
+	force_complete_multi_cs(hdev);
+	hl_release_pending_user_interrupts(hdev);
+}
+
 static void job_wq_completion(struct work_struct *work)
 {
 	struct hl_cs_job *job = container_of(work, struct hl_cs_job,
@@ -3489,14 +3510,15 @@ static int hl_interrupt_wait_ioctl(struct hl_fpriv *hpriv, void *data)
 
 int hl_wait_ioctl(struct hl_fpriv *hpriv, void *data)
 {
+	struct hl_device *hdev = hpriv->hdev;
 	union hl_wait_cs_args *args = data;
 	u32 flags = args->in.flags;
 	int rc;
 
-	/* If the device is not operational, no point in waiting for any command submission or
-	 * user interrupt
+	/* If the device is not operational, or if an error has happened and user should release the
+	 * device, there is no point in waiting for any command submission or user interrupt.
 	 */
-	if (!hl_device_operational(hpriv->hdev, NULL))
+	if (!hl_device_operational(hpriv->hdev, NULL) || hdev->reset_info.watchdog_active)
 		return -EBUSY;
 
 	if (flags & HL_WAIT_CS_FLAGS_INTERRUPT)
diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index 1453f2ec72d9..92721111b652 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -1865,6 +1865,8 @@ int hl_device_cond_reset(struct hl_device *hdev, u32 flags, u64 event_mask)
 
 	hl_ctx_put(ctx);
 
+	hl_abort_waitings_for_completion(hdev);
+
 	return 0;
 
 device_reset:
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 9e42d0e9ce33..7fb45610ad0c 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -3791,6 +3791,7 @@ void hl_dec_fini(struct hl_device *hdev);
 void hl_dec_ctx_fini(struct hl_ctx *ctx);
 
 void hl_release_pending_user_interrupts(struct hl_device *hdev);
+void hl_abort_waitings_for_completion(struct hl_device *hdev);
 int hl_cs_signal_sob_wraparound_handler(struct hl_device *hdev, u32 q_idx,
 			struct hl_hw_sob **hw_sob, u32 count, bool encaps_sig);
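[Editor's note: the following is an illustrative userspace analogue only, not the driver code. It mimics the pattern the patch applies with complete_all(): mark every outstanding fence as completed with an error so that blocked waiters return immediately instead of sleeping until the watchdog expires. All names in the sketch are invented.]

/* build with: cc -std=c11 abort_waiters.c -lpthread */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

struct fake_fence {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
	int error;
};

static struct fake_fence fence = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0
};

/* Waiter thread: blocks until the fence is completed, then reads its error. */
static void *waiter(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&fence.lock);
	while (!fence.done)
		pthread_cond_wait(&fence.cond, &fence.lock);
	printf("waiter woke up, fence error = %d\n", fence.error);
	pthread_mutex_unlock(&fence.lock);
	return NULL;
}

/* "Abort" path: complete the fence with an error and wake every waiter. */
static void force_complete(struct fake_fence *f, int err)
{
	pthread_mutex_lock(&f->lock);
	f->error = err;
	f->done = 1;
	pthread_cond_broadcast(&f->cond);
	pthread_mutex_unlock(&f->lock);
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, waiter, NULL);
	force_complete(&fence, -EIO);
	pthread_join(t, NULL);
	return 0;
}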
From patchwork Thu Dec 8 15:13:29 2022
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 31390

From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: Ofir Bitton
Subject: [PATCH 05/26] habanalabs: don't notify user about clk throttling due to power
Date: Thu, 8 Dec 2022 17:13:29 +0200
Message-Id: <20221208151350.1833823-5-ogabbay@kernel.org>
In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org>

From: Ofir Bitton

As clock throttling due to high power consumption can happen very frequently,
and there is no real reason to notify the user about it, skip this
notification in all ASICs.
Signed-off-by: Ofir Bitton
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
---
 drivers/misc/habanalabs/gaudi/gaudi.c   | 7 ++++---
 drivers/misc/habanalabs/gaudi2/gaudi2.c | 7 ++++---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 9f5e208701ba..ae78f838f987 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -7584,7 +7584,7 @@ static int tpc_krn_event_to_tpc_id(u16 tpc_dec_event_type)
 	return (tpc_dec_event_type - GAUDI_EVENT_TPC0_KRN_ERR) / 6;
 }
 
-static void gaudi_print_clk_change_info(struct hl_device *hdev, u16 event_type)
+static void gaudi_print_clk_change_info(struct hl_device *hdev, u16 event_type, u64 *event_mask)
 {
 	ktime_t zero_time = ktime_set(0, 0);
 
@@ -7612,6 +7612,7 @@ static void gaudi_print_clk_change_info(struct hl_device *hdev, u16 event_type)
 		hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_THERMAL;
 		hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].start = ktime_get();
 		hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = zero_time;
+		*event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 		dev_info_ratelimited(hdev->dev, "Clock throttling due to overheating\n");
 		break;
 
@@ -7619,6 +7620,7 @@ static void gaudi_print_clk_change_info(struct hl_device *hdev, u16 event_type)
 	case GAUDI_EVENT_FIX_THERMAL_ENV_E:
 		hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_THERMAL;
 		hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = ktime_get();
+		*event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 		dev_info_ratelimited(hdev->dev, "Thermal envelop is safe, back to optimal clock\n");
 		break;
 
@@ -7887,8 +7889,7 @@ static void gaudi_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entr
 		break;
 
 	case GAUDI_EVENT_FIX_POWER_ENV_S ... GAUDI_EVENT_FIX_THERMAL_ENV_E:
-		event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
-		gaudi_print_clk_change_info(hdev, event_type);
+		gaudi_print_clk_change_info(hdev, event_type, &event_mask);
 		hl_fw_unmask_irq(hdev, event_type);
 		break;
 
diff --git a/drivers/misc/habanalabs/gaudi2/gaudi2.c b/drivers/misc/habanalabs/gaudi2/gaudi2.c
index e793fb2bdcbe..c14b3bb16f96 100644
--- a/drivers/misc/habanalabs/gaudi2/gaudi2.c
+++ b/drivers/misc/habanalabs/gaudi2/gaudi2.c
@@ -8603,7 +8603,7 @@ static void gaudi2_handle_hbm_mc_spi(struct hl_device *hdev, u64 intr_cause_data
 			hbm_mc_spi[i].cause);
 }
 
-static void gaudi2_print_clk_change_info(struct hl_device *hdev, u16 event_type)
+static void gaudi2_print_clk_change_info(struct hl_device *hdev, u16 event_type, u64 *event_mask)
 {
 	ktime_t zero_time = ktime_set(0, 0);
 
@@ -8629,12 +8629,14 @@ static void gaudi2_print_clk_change_info(struct hl_device *hdev, u16 event_type)
 		hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_THERMAL;
 		hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].start = ktime_get();
 		hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = zero_time;
+		*event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 		dev_info_ratelimited(hdev->dev, "Clock throttling due to overheating\n");
 		break;
 
 	case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_E:
 		hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_THERMAL;
 		hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_THERMAL].end = ktime_get();
+		*event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
 		dev_info_ratelimited(hdev->dev, "Thermal envelop is safe, back to optimal clock\n");
 		break;
 
@@ -9085,8 +9087,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent
 	case GAUDI2_EVENT_CPU_FIX_POWER_ENV_E:
 	case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_S:
 	case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_E:
-		gaudi2_print_clk_change_info(hdev, event_type);
-		event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR;
+		gaudi2_print_clk_change_info(hdev, event_type, &event_mask);
 		break;
 
 	case GAUDI2_EVENT_CPU_PKT_QUEUE_OUT_SYNC:
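[Editor's note: minimal standalone sketch, not the driver code, of the pattern the patch switches to. The clock-throttle handler receives a pointer to the caller's event mask and sets the "notify user" bit only for event types that warrant it (thermal), leaving power-envelope events silent. All names and values below are invented for illustration.]

/* build with: cc -std=c11 clk_throttle_mask.c */
#include <stdint.h>
#include <stdio.h>

#define NOTIFY_USER_ENGINE_ERR (1ULL << 0)

enum throttle_event { EV_POWER_START, EV_POWER_END, EV_THERMAL_START, EV_THERMAL_END };

static void handle_clk_throttle(enum throttle_event ev, uint64_t *event_mask)
{
	switch (ev) {
	case EV_THERMAL_START:
	case EV_THERMAL_END:
		/* thermal throttling is rare and actionable -> notify the user */
		*event_mask |= NOTIFY_USER_ENGINE_ERR;
		break;
	case EV_POWER_START:
	case EV_POWER_END:
		/* power throttling is frequent and expected -> log only */
		break;
	}
}

int main(void)
{
	uint64_t mask = 0;

	handle_clk_throttle(EV_POWER_START, &mask);
	printf("after power event, mask = %#llx\n", (unsigned long long)mask);
	handle_clk_throttle(EV_THERMAL_START, &mask);
	printf("after thermal event, mask = %#llx\n", (unsigned long long)mask);
	return 0;
}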
From patchwork Thu Dec 8 15:13:30 2022
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 31392

From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: Tomer Tayar
Subject: [PATCH 06/26] habanalabs: don't allow user to destroy CB handle more than once
Date: Thu, 8 Dec 2022 17:13:30 +0200
Message-Id: <20221208151350.1833823-6-ogabbay@kernel.org>
In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org>
From: Tomer Tayar

The refcount of a CB buffer is initialized when the user allocates a CB, and
is decreased when the user destroys the CB handle. If this refcount is also
increased from the kernel side, and the user sends more than one destroy
request for the handle, the buffer will be released/freed and later accessed
when the kernel-side reference is put. To avoid this, prevent the user from
destroying the handle more than once.

Signed-off-by: Tomer Tayar
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
---
 .../misc/habanalabs/common/command_buffer.c | 22 +++++++++++++++++++
 drivers/misc/habanalabs/common/device.c     |  2 +-
 drivers/misc/habanalabs/common/habanalabs.h |  6 ++++-
 .../misc/habanalabs/common/habanalabs_drv.c |  2 +-
 drivers/misc/habanalabs/common/memory_mgr.c |  4 +++-
 5 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/drivers/misc/habanalabs/common/command_buffer.c b/drivers/misc/habanalabs/common/command_buffer.c
index 2b332991ac6a..24100501f8ca 100644
--- a/drivers/misc/habanalabs/common/command_buffer.c
+++ b/drivers/misc/habanalabs/common/command_buffer.c
@@ -298,9 +298,31 @@ int hl_cb_create(struct hl_device *hdev, struct hl_mem_mgr *mmg,
 
 int hl_cb_destroy(struct hl_mem_mgr *mmg, u64 cb_handle)
 {
+	struct hl_cb *cb;
 	int rc;
 
+	/* Make sure that a CB handle isn't destroyed by user more than once */
+	if (!mmg->is_kernel_mem_mgr) {
+		cb = hl_cb_get(mmg, cb_handle);
+		if (!cb) {
+			dev_dbg(mmg->dev, "CB destroy failed, no CB was found for handle %#llx\n",
+				cb_handle);
+			rc = -EINVAL;
+			goto out;
+		}
+
+		rc = atomic_cmpxchg(&cb->is_handle_destroyed, 0, 1);
+		hl_cb_put(cb);
+		if (rc) {
+			dev_dbg(mmg->dev, "CB destroy failed, handle %#llx was already destroyed\n",
+				cb_handle);
+			rc = -EINVAL;
+			goto out;
+		}
+	}
+
 	rc = hl_mmap_mem_buf_put_handle(mmg, cb_handle);
+out:
 	if (rc < 0)
 		return rc; /* Invalid handle */
diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index 92721111b652..afd9d4d46574 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -853,7 +853,7 @@ static int device_early_init(struct hl_device *hdev)
 	if (rc)
 		goto free_chip_info;
 
-	hl_mem_mgr_init(hdev->dev, &hdev->kernel_mem_mgr);
+	hl_mem_mgr_init(hdev->dev, &hdev->kernel_mem_mgr, 1);
 
 	hdev->reset_wq = create_singlethread_workqueue("hl_device_reset");
 	if (!hdev->reset_wq) {
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 7fb45610ad0c..ecf7e5da8f1d 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -872,11 +872,13 @@ struct hl_mmap_mem_buf;
 * @dev: back pointer to the owning device
 * @lock: protects handles
 * @handles: an idr holding all active handles to the memory buffers in the system.
+ * @is_kernel_mem_mgr: indicate whether the memory manager is the per-device kernel memory manager
 */
 struct hl_mem_mgr {
 	struct device *dev;
 	spinlock_t lock;
 	struct idr handles;
+	u8 is_kernel_mem_mgr;
 };
 
 /**
@@ -935,6 +937,7 @@ struct hl_mmap_mem_buf {
 * @size: holds the CB's size.
 * @roundup_size: holds the cb size after roundup to page size.
 * @cs_cnt: holds number of CS that this CB participates in.
+ * @is_handle_destroyed: atomic boolean indicating whether or not the CB handle was destroyed.
 * @is_pool: true if CB was acquired from the pool, false otherwise.
 * @is_internal: internally allocated
 * @is_mmu_mapped: true if the CB is mapped to the device's MMU.
@@ -951,6 +954,7 @@ struct hl_cb {
 	u32 size;
 	u32 roundup_size;
 	atomic_t cs_cnt;
+	atomic_t is_handle_destroyed;
 	u8 is_pool;
 	u8 is_internal;
 	u8 is_mmu_mapped;
@@ -3805,7 +3809,7 @@ __printf(4, 5) int hl_snprintf_resize(char **buf, size_t *size, size_t *offset,
 char *hl_format_as_binary(char *buf, size_t buf_len, u32 n);
 const char *hl_sync_engine_to_string(enum hl_sync_engine_type engine_type);
 
-void hl_mem_mgr_init(struct device *dev, struct hl_mem_mgr *mmg);
+void hl_mem_mgr_init(struct device *dev, struct hl_mem_mgr *mmg, u8 is_kernel_mem_mgr);
 void hl_mem_mgr_fini(struct hl_mem_mgr *mmg);
 int hl_mem_mgr_mmap(struct hl_mem_mgr *mmg, struct vm_area_struct *vma,
 			void *args);
diff --git a/drivers/misc/habanalabs/common/habanalabs_drv.c b/drivers/misc/habanalabs/common/habanalabs_drv.c
index 7815c60df54e..a2983913d7c0 100644
--- a/drivers/misc/habanalabs/common/habanalabs_drv.c
+++ b/drivers/misc/habanalabs/common/habanalabs_drv.c
@@ -164,7 +164,7 @@ int hl_device_open(struct inode *inode, struct file *filp)
 	nonseekable_open(inode, filp);
 
 	hl_ctx_mgr_init(&hpriv->ctx_mgr);
-	hl_mem_mgr_init(hpriv->hdev->dev, &hpriv->mem_mgr);
+	hl_mem_mgr_init(hpriv->hdev->dev, &hpriv->mem_mgr, 0);
 
 	hpriv->taskpid = get_task_pid(current, PIDTYPE_PID);
 
diff --git a/drivers/misc/habanalabs/common/memory_mgr.c b/drivers/misc/habanalabs/common/memory_mgr.c
index 1936d653699e..e652db601f0e 100644
--- a/drivers/misc/habanalabs/common/memory_mgr.c
+++ b/drivers/misc/habanalabs/common/memory_mgr.c
@@ -309,14 +309,16 @@ int hl_mem_mgr_mmap(struct hl_mem_mgr *mmg, struct vm_area_struct *vma,
 *
 * @dev: owner device pointer
 * @mmg: structure to initialize
+ * @is_kernel_mem_mgr: indicate whether the memory manager is the per-device kernel memory manager
 *
 * Initialize an instance of unified memory manager
 */
-void hl_mem_mgr_init(struct device *dev, struct hl_mem_mgr *mmg)
+void hl_mem_mgr_init(struct device *dev, struct hl_mem_mgr *mmg, u8 is_kernel_mem_mgr)
 {
 	mmg->dev = dev;
 	spin_lock_init(&mmg->lock);
 	idr_init(&mmg->handles);
+	mmg->is_kernel_mem_mgr = is_kernel_mem_mgr;
 }
 
 /**
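[Editor's note: standalone illustration only, not the driver code. It shows the same "claim the destroy exactly once" idea the patch implements with atomic_cmpxchg() on cb->is_handle_destroyed: the first caller flips the flag from 0 to 1 and proceeds; every later caller sees 1 and is rejected, so the underlying buffer cannot be put twice. Names are invented.]

/* build with: cc -std=c11 destroy_once.c */
#include <stdatomic.h>
#include <stdio.h>

struct fake_cb {
	atomic_int is_handle_destroyed;
};

/* Returns 0 on the first destroy, -1 (already destroyed) afterwards. */
static int destroy_handle_once(struct fake_cb *cb)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&cb->is_handle_destroyed, &expected, 1))
		return -1;

	/* ... actual teardown / final put of the buffer would happen here ... */
	return 0;
}

int main(void)
{
	struct fake_cb cb = { .is_handle_destroyed = 0 };

	printf("first destroy:  %d\n", destroy_handle_once(&cb));  /* 0 */
	printf("second destroy: %d\n", destroy_handle_once(&cb));  /* -1 */
	return 0;
}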
From patchwork Thu Dec 8 15:13:31 2022
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 31393

From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: Tomer Tayar
Subject: [PATCH 07/26] habanalabs: use dev_dbg() when hl_mmap_mem_buf_get() fails
Date: Thu, 8 Dec 2022 17:13:31 +0200
Message-Id: <20221208151350.1833823-7-ogabbay@kernel.org>
In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org>
From: Tomer Tayar

As hl_mmap_mem_buf_get() is called also from IOCTLs, which can receive a bad
handle from the user, change the "no match to handle" print to dev_dbg().
Callers that do not depend on user input already print an error when the
function fails.

Signed-off-by: Tomer Tayar
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
---
 drivers/misc/habanalabs/common/memory_mgr.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/misc/habanalabs/common/memory_mgr.c b/drivers/misc/habanalabs/common/memory_mgr.c
index e652db601f0e..92d20ed465b4 100644
--- a/drivers/misc/habanalabs/common/memory_mgr.c
+++ b/drivers/misc/habanalabs/common/memory_mgr.c
@@ -25,8 +25,7 @@ struct hl_mmap_mem_buf *hl_mmap_mem_buf_get(struct hl_mem_mgr *mmg, u64 handle)
 	buf = idr_find(&mmg->handles, lower_32_bits(handle >> PAGE_SHIFT));
 	if (!buf) {
 		spin_unlock(&mmg->lock);
-		dev_warn(mmg->dev,
-			"Buff get failed, no match to handle %#llx\n", handle);
+		dev_dbg(mmg->dev, "Buff get failed, no match to handle %#llx\n", handle);
 		return NULL;
 	}
 	kref_get(&buf->refcount);
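For context, a rough caller-side sketch of the logging split this patch relies on: the lookup itself only emits a dev_dbg(), while a kernel-internal caller (hypothetical name and body below) escalates to dev_err() because its handles never come from user space:

/* Hypothetical kernel-internal caller: the handle here is generated by the
 * driver itself, so a failed lookup is a real bug and is reported loudly,
 * whereas hl_mmap_mem_buf_get() itself only emits a dev_dbg().
 */
static int use_internal_buf_example(struct hl_mem_mgr *mmg, u64 handle)
{
	struct hl_mmap_mem_buf *buf;

	buf = hl_mmap_mem_buf_get(mmg, handle);
	if (!buf) {
		dev_err(mmg->dev, "internal buffer %#llx disappeared\n", handle);
		return -EINVAL;
	}

	/* ... use buf ... */

	hl_mmap_mem_buf_put(buf);
	return 0;
}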
From patchwork Thu Dec 8 15:13:32 2022
From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: Ohad Sharabi
Subject: [PATCH 08/26] habanalabs: make set_dram_properties an ASIC function
Date: Thu, 8 Dec 2022 17:13:32 +0200
Message-Id: <20221208151350.1833823-8-ogabbay@kernel.org>
In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org>

From: Ohad Sharabi

As ASICs evolve, the DRAM properties may need to be updated at several points,
because different information can arrive from the f/w at different stages of
the initialization. This ASIC function is the foundation for that capability.
Signed-off-by: Ohad Sharabi
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
---
 drivers/misc/habanalabs/common/habanalabs.h | 1 +
 drivers/misc/habanalabs/gaudi/gaudi.c       | 6 ++++++
 drivers/misc/habanalabs/gaudi2/gaudi2.c     | 3 ++-
 drivers/misc/habanalabs/goya/goya.c         | 6 ++++++
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index ecf7e5da8f1d..893ebcba170b 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -1683,6 +1683,7 @@ struct hl_asic_funcs {
 	int (*set_engine_cores)(struct hl_device *hdev, u32 *core_ids,
 					u32 num_cores, u32 core_command);
 	int (*send_device_activity)(struct hl_device *hdev, bool open);
+	int (*set_dram_properties)(struct hl_device *hdev);
 };
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index ae78f838f987..1b701a87c6fe 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -9134,6 +9134,11 @@ static u32 *gaudi_get_stream_master_qid_arr(void)
 	return gaudi_stream_master;
 }
 
+static int gaudi_set_dram_properties(struct hl_device *hdev)
+{
+	return 0;
+}
+
 static void gaudi_check_if_razwi_happened(struct hl_device *hdev)
 {
 }
@@ -9260,6 +9265,7 @@ static const struct hl_asic_funcs gaudi_funcs = {
 	.access_dev_mem = hl_access_dev_mem,
 	.set_dram_bar_base = gaudi_set_hbm_bar_base,
 	.send_device_activity = gaudi_send_device_activity,
+	.set_dram_properties = gaudi_set_dram_properties,
 };
 
 /**
diff --git a/drivers/misc/habanalabs/gaudi2/gaudi2.c b/drivers/misc/habanalabs/gaudi2/gaudi2.c
index c14b3bb16f96..10c017b8ddfa 100644
--- a/drivers/misc/habanalabs/gaudi2/gaudi2.c
+++ b/drivers/misc/habanalabs/gaudi2/gaudi2.c
@@ -2485,7 +2485,7 @@ static int gaudi2_cpucp_info_get(struct hl_device *hdev)
 	 * at this point the DRAM parameters need to be updated according to data obtained
 	 * from the FW
 	 */
-	rc = gaudi2_set_dram_properties(hdev);
+	rc = hdev->asic_funcs->set_dram_properties(hdev);
 	if (rc)
 		return rc;
 
@@ -10467,6 +10467,7 @@ static const struct hl_asic_funcs gaudi2_funcs = {
 	.set_dram_bar_base = gaudi2_set_hbm_bar_base,
 	.set_engine_cores = gaudi2_set_engine_cores,
 	.send_device_activity = gaudi2_send_device_activity,
+	.set_dram_properties = gaudi2_set_dram_properties,
 };
 
 void gaudi2_set_asic_funcs(struct hl_device *hdev)
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 0f083fcf81a6..ee0c7db16270 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -5420,6 +5420,11 @@ static int goya_scrub_device_dram(struct hl_device *hdev, u64 val)
 	return -EOPNOTSUPP;
 }
 
+static int goya_set_dram_properties(struct hl_device *hdev)
+{
+	return 0;
+}
+
 static int goya_send_device_activity(struct hl_device *hdev, bool open)
 {
 	return 0;
@@ -5518,6 +5523,7 @@ static const struct hl_asic_funcs goya_funcs = {
 	.access_dev_mem = hl_access_dev_mem,
 	.set_dram_bar_base = goya_set_ddr_bar_base,
 	.send_device_activity = goya_send_device_activity,
+	.set_dram_properties = goya_set_dram_properties,
 };
 
 /*
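A rough sketch of what a non-stub implementation of the new hook could look like on an ASIC that actually needs it. The EXAMPLE_* constants and the exact derivation are illustrative assumptions, not the driver's code; only the hook signature and the general idea (shrink usable DRAM by the binned HBM devices) come from the patches in this series:

#define EXAMPLE_NUM_HBM		6	/* illustrative HBM device count */
#define EXAMPLE_HBM_SIZE	SZ_16G	/* illustrative size per HBM device */

static int example_set_dram_properties(struct hl_device *hdev)
{
	struct asic_fixed_properties *prop = &hdev->asic_prop;
	u32 binned_hbm = hweight64(hdev->dram_binning);
	u32 enabled_hbm;

	if (binned_hbm > EXAMPLE_NUM_HBM)
		return -EINVAL;

	enabled_hbm = EXAMPLE_NUM_HBM - binned_hbm;

	/* usable DRAM shrinks with every binned (disabled) HBM device */
	prop->dram_size = (u64)enabled_hbm * EXAMPLE_HBM_SIZE;
	prop->dram_end_address = prop->dram_base_address + prop->dram_size;

	return 0;
}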
From patchwork Thu Dec 8 15:13:33 2022
From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: Marco Pagani
Subject: [PATCH 09/26] habanalabs: fix double assignment in MMU V1
Date: Thu, 8 Dec 2022 17:13:33 +0200
Message-Id: <20221208151350.1833823-9-ogabbay@kernel.org>
In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org>

From: Marco Pagani

Remove the duplicate assignment of the hop2_pte_addr variable in
dram_default_mapping_fini(). The dead store was reported by clang-analyzer.

Signed-off-by: Marco Pagani
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
---
 drivers/misc/habanalabs/common/mmu/mmu_v1.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/misc/habanalabs/common/mmu/mmu_v1.c b/drivers/misc/habanalabs/common/mmu/mmu_v1.c
index 8a40de4a4761..d925dc4dd097 100644
--- a/drivers/misc/habanalabs/common/mmu/mmu_v1.c
+++ b/drivers/misc/habanalabs/common/mmu/mmu_v1.c
@@ -344,7 +344,6 @@ static void dram_default_mapping_fini(struct hl_ctx *ctx)
 		}
 	}
 
-	hop2_pte_addr = hop2_addr;
 	hop2_pte_addr = hop2_addr;
 	for (i = 0 ; i < num_of_hop3 ; i++) {
 		clear_pte(ctx, hop2_pte_addr);
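For readers unfamiliar with the report class, a tiny self-contained illustration of a dead store, i.e. the pattern clang-analyzer flagged here (names are made up for the example):

/* The first assignment is never read before being overwritten, so it is a
 * dead store and can be dropped without changing behavior.
 */
static void dead_store_example(void)
{
	int x;

	x = 1;	/* dead store: value never used */
	x = 2;	/* only this assignment matters */

	(void)x;
}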
From patchwork Thu Dec 8 15:13:34 2022
From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: Ohad Sharabi
Subject: [PATCH 10/26] habanalabs: update DRAM props according to preboot data
Date: Thu, 8 Dec 2022 17:13:34 +0200
Message-Id: <20221208151350.1833823-10-ogabbay@kernel.org>
In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org>

From: Ohad Sharabi
If the f/w reports the binning masks already at the preboot stage, the driver
must align its DRAM properties with that new information.

Signed-off-by: Ohad Sharabi
Reviewed-by: Oded Gabbay
Signed-off-by: Oded Gabbay
---
 drivers/misc/habanalabs/common/firmware_if.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/misc/habanalabs/common/firmware_if.c b/drivers/misc/habanalabs/common/firmware_if.c
index 4f364c3085fe..046866c673e2 100644
--- a/drivers/misc/habanalabs/common/firmware_if.c
+++ b/drivers/misc/habanalabs/common/firmware_if.c
@@ -2582,6 +2582,10 @@ static int hl_fw_dynamic_init_cpu(struct hl_device *hdev,
 	hdev->decoder_binning = le32_to_cpu(binning_info->dec_mask);
 	hdev->rotator_binning = le32_to_cpu(binning_info->rot_mask);
 
+	rc = hdev->asic_funcs->set_dram_properties(hdev);
+	if (rc)
+		goto out;
+
 	dev_dbg(hdev->dev,
 		"Read binning masks: tpc: 0x%llx, dram: 0x%llx, edma: 0x%x, dec: 0x%x, rot:0x%x\n",
 		hdev->tpc_binning, hdev->dram_binning, hdev->edma_binning,
From patchwork Thu Dec 8 15:13:35 2022
From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: Ofir Bitton
Subject: [PATCH 11/26] habanalabs/gaudi2: count interrupt causes
Date: Thu, 8 Dec 2022 17:13:35 +0200
Message-Id: <20221208151350.1833823-11-ogabbay@kernel.org>
In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org>

From: Ofir Bitton

During event handling we extract the interrupt cause and count it. If no
cause can be found for an event, a proper error should be reported.
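As a rough sketch of the pattern the diff below converts the handlers to (simplified, made-up names; the real cause tables and the dispatcher in gaudi2_handle_eqe() are much larger): each handler returns how many causes it actually reported, and the dispatcher treats a zero count as an error of its own.

#define EXAMPLE_NUM_OF_ERR_CAUSE 3	/* illustrative; the real tables are larger */

static const char * const example_error_cause[EXAMPLE_NUM_OF_ERR_CAUSE] = {
	"AXI read error", "AXI write error", "timeout",
};

/* Each handler now returns the number of causes it reported. */
static int example_handle_err(struct hl_device *hdev, u64 intr_cause_data)
{
	u32 error_count = 0;
	int i;

	for (i = 0 ; i < EXAMPLE_NUM_OF_ERR_CAUSE ; i++)
		if (intr_cause_data & BIT(i)) {
			dev_err_ratelimited(hdev->dev, "err cause: %s\n",
					    example_error_cause[i]);
			error_count++;
		}

	return error_count;
}

/* The dispatcher treats a zero count as "no cause found" and reports it. */
static void example_dispatch(struct hl_device *hdev, u16 event_type, u64 intr_cause_data)
{
	u32 error_count = example_handle_err(hdev, intr_cause_data);

	if (!error_count)
		dev_err_ratelimited(hdev->dev,
				    "No error cause reported for event %u\n", event_type);
}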
Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/gaudi2/gaudi2.c | 361 +++++++++++++++++------- 1 file changed, 252 insertions(+), 109 deletions(-) diff --git a/drivers/misc/habanalabs/gaudi2/gaudi2.c b/drivers/misc/habanalabs/gaudi2/gaudi2.c index 10c017b8ddfa..b8da2aa024ca 100644 --- a/drivers/misc/habanalabs/gaudi2/gaudi2.c +++ b/drivers/misc/habanalabs/gaudi2/gaudi2.c @@ -53,6 +53,7 @@ #define GAUDI2_HIF_HMMU_FULL_MASK 0xFFFF #define GAUDI2_DECODER_FULL_MASK 0x3FF +#define GAUDI2_NA_EVENT_CAUSE 0xFF #define GAUDI2_NUM_OF_QM_ERR_CAUSE 18 #define GAUDI2_NUM_OF_QM_LCP_ERR_CAUSE 25 #define GAUDI2_NUM_OF_QM_ARB_ERR_CAUSE 3 @@ -6987,10 +6988,10 @@ static void print_qman_data_on_err(struct hl_device *hdev, u32 qid_base, u32 str gaudi2_print_last_pqes_on_err(hdev, qid_base, i, qman_base, false); } -static void gaudi2_handle_qman_err_generic(struct hl_device *hdev, const char *qm_name, - u64 qman_base, u32 qid_base) +static int gaudi2_handle_qman_err_generic(struct hl_device *hdev, const char *qm_name, + u64 qman_base, u32 qid_base) { - u32 i, j, glbl_sts_val, arb_err_val, num_error_causes; + u32 i, j, glbl_sts_val, arb_err_val, num_error_causes, error_count = 0; u64 glbl_sts_addr, arb_err_addr; char reg_desc[32]; @@ -7013,12 +7014,14 @@ static void gaudi2_handle_qman_err_generic(struct hl_device *hdev, const char *q } for (j = 0 ; j < num_error_causes ; j++) - if (glbl_sts_val & BIT(j)) + if (glbl_sts_val & BIT(j)) { dev_err_ratelimited(hdev->dev, "%s %s. err cause: %s\n", qm_name, reg_desc, i == QMAN_STREAMS ? gaudi2_qman_lower_cp_error_cause[j] : gaudi2_qman_error_cause[j]); + error_count++; + } print_qman_data_on_err(hdev, qid_base, i, qman_base); } @@ -7026,13 +7029,18 @@ static void gaudi2_handle_qman_err_generic(struct hl_device *hdev, const char *q arb_err_val = RREG32(arb_err_addr); if (!arb_err_val) - return; + goto out; for (j = 0 ; j < GAUDI2_NUM_OF_QM_ARB_ERR_CAUSE ; j++) { - if (arb_err_val & BIT(j)) + if (arb_err_val & BIT(j)) { dev_err_ratelimited(hdev->dev, "%s ARB_ERR. 
err cause: %s\n", qm_name, gaudi2_qman_arb_error_cause[j]); + error_count++; + } } + +out: + return error_count; } static void gaudi2_razwi_rr_hbw_shared_printf_info(struct hl_device *hdev, @@ -7675,17 +7683,17 @@ static void gaudi2_razwi_unmapped_addr_lbw_printf_info(struct hl_device *hdev, u } /* PSOC RAZWI interrupt occurs only when trying to access a bad address */ -static void gaudi2_ack_psoc_razwi_event_handler(struct hl_device *hdev, u64 *event_mask) +static int gaudi2_ack_psoc_razwi_event_handler(struct hl_device *hdev, u64 *event_mask) { u32 hbw_aw_set, hbw_ar_set, lbw_aw_set, lbw_ar_set, rtr_id, dcore_id, dcore_rtr_id, xy, - razwi_mask_info, razwi_intr = 0; + razwi_mask_info, razwi_intr = 0, error_count = 0; int rtr_map_arr_len = NUM_OF_RTR_PER_DCORE * NUM_OF_DCORES; u64 rtr_ctrl_base_addr; if (hdev->pldm || !(hdev->fw_components & FW_TYPE_LINUX)) { razwi_intr = RREG32(mmPSOC_GLOBAL_CONF_RAZWI_INTERRUPT); if (!razwi_intr) - return; + return 0; } razwi_mask_info = RREG32(mmPSOC_GLOBAL_CONF_RAZWI_MASK_INFO); @@ -7743,15 +7751,19 @@ static void gaudi2_ack_psoc_razwi_event_handler(struct hl_device *hdev, u64 *eve gaudi2_razwi_unmapped_addr_lbw_printf_info(hdev, rtr_id, rtr_ctrl_base_addr, false, event_mask); + error_count++; + clear: /* Clear Interrupts only on pldm or if f/w doesn't handle interrupts */ if (hdev->pldm || !(hdev->fw_components & FW_TYPE_LINUX)) WREG32(mmPSOC_GLOBAL_CONF_RAZWI_INTERRUPT, razwi_intr); + + return error_count; } -static void _gaudi2_handle_qm_sei_err(struct hl_device *hdev, u64 qman_base) +static int _gaudi2_handle_qm_sei_err(struct hl_device *hdev, u64 qman_base) { - u32 i, sts_val, sts_clr_val = 0; + u32 i, sts_val, sts_clr_val = 0, error_count = 0; sts_val = RREG32(qman_base + QM_SEI_STATUS_OFFSET); @@ -7760,16 +7772,20 @@ static void _gaudi2_handle_qm_sei_err(struct hl_device *hdev, u64 qman_base) dev_err_ratelimited(hdev->dev, "QM SEI. 
err cause: %s\n", gaudi2_qm_sei_error_cause[i]); sts_clr_val |= BIT(i); + error_count++; } } WREG32(qman_base + QM_SEI_STATUS_OFFSET, sts_clr_val); + + return error_count; } -static void gaudi2_handle_qm_sei_err(struct hl_device *hdev, u16 event_type, +static int gaudi2_handle_qm_sei_err(struct hl_device *hdev, u16 event_type, struct hl_eq_razwi_info *razwi_info, u64 *event_mask) { enum razwi_event_sources module; + u32 error_count = 0; u64 qman_base; u8 index; @@ -7808,24 +7824,27 @@ static void gaudi2_handle_qm_sei_err(struct hl_device *hdev, u16 event_type, module = RAZWI_ROT; break; default: - return; + return 0; } - _gaudi2_handle_qm_sei_err(hdev, qman_base); + error_count = _gaudi2_handle_qm_sei_err(hdev, qman_base); /* There is a single event per NIC macro, so should check its both QMAN blocks */ if (event_type >= GAUDI2_EVENT_NIC0_AXI_ERROR_RESPONSE && event_type <= GAUDI2_EVENT_NIC11_AXI_ERROR_RESPONSE) - _gaudi2_handle_qm_sei_err(hdev, qman_base + NIC_QM_OFFSET); + error_count += _gaudi2_handle_qm_sei_err(hdev, + qman_base + NIC_QM_OFFSET); /* check if RAZWI happened */ if (razwi_info) gaudi2_ack_module_razwi_event_handler(hdev, module, 0, 0, razwi_info, event_mask); + + return error_count; } -static void gaudi2_handle_qman_err(struct hl_device *hdev, u16 event_type) +static int gaudi2_handle_qman_err(struct hl_device *hdev, u16 event_type) { - u32 qid_base; + u32 qid_base, error_count = 0; u64 qman_base; char desc[32]; u8 index; @@ -7941,19 +7960,21 @@ static void gaudi2_handle_qman_err(struct hl_device *hdev, u16 event_type) snprintf(desc, ARRAY_SIZE(desc), "ROTATOR1_QM"); break; default: - return; + return 0; } - gaudi2_handle_qman_err_generic(hdev, desc, qman_base, qid_base); + error_count = gaudi2_handle_qman_err_generic(hdev, desc, qman_base, qid_base); /* Handle EDMA QM SEI here because there is no AXI error response event for EDMA */ if (event_type >= GAUDI2_EVENT_HDMA2_QM && event_type <= GAUDI2_EVENT_HDMA5_QM) - _gaudi2_handle_qm_sei_err(hdev, qman_base); + error_count += _gaudi2_handle_qm_sei_err(hdev, qman_base); + + return error_count; } -static void gaudi2_handle_arc_farm_sei_err(struct hl_device *hdev) +static int gaudi2_handle_arc_farm_sei_err(struct hl_device *hdev) { - u32 i, sts_val, sts_clr_val = 0; + u32 i, sts_val, sts_clr_val = 0, error_count = 0; sts_val = RREG32(mmARC_FARM_ARC0_AUX_ARC_SEI_INTR_STS); @@ -7962,15 +7983,18 @@ static void gaudi2_handle_arc_farm_sei_err(struct hl_device *hdev) dev_err_ratelimited(hdev->dev, "ARC SEI. err cause: %s\n", gaudi2_arc_sei_error_cause[i]); sts_clr_val |= BIT(i); + error_count++; } } WREG32(mmARC_FARM_ARC0_AUX_ARC_SEI_INTR_CLR, sts_clr_val); + + return error_count; } -static void gaudi2_handle_cpu_sei_err(struct hl_device *hdev) +static int gaudi2_handle_cpu_sei_err(struct hl_device *hdev) { - u32 i, sts_val, sts_clr_val = 0; + u32 i, sts_val, sts_clr_val = 0, error_count = 0; sts_val = RREG32(mmCPU_IF_CPU_SEI_INTR_STS); @@ -7979,50 +8003,63 @@ static void gaudi2_handle_cpu_sei_err(struct hl_device *hdev) dev_err_ratelimited(hdev->dev, "CPU SEI. 
err cause: %s\n", gaudi2_cpu_sei_error_cause[i]); sts_clr_val |= BIT(i); + error_count++; } } WREG32(mmCPU_IF_CPU_SEI_INTR_CLR, sts_clr_val); + + return error_count; } -static void gaudi2_handle_rot_err(struct hl_device *hdev, u8 rot_index, +static int gaudi2_handle_rot_err(struct hl_device *hdev, u8 rot_index, struct hl_eq_razwi_with_intr_cause *razwi_with_intr_cause, u64 *event_mask) { u64 intr_cause_data = le64_to_cpu(razwi_with_intr_cause->intr_cause.intr_cause_data); + u32 error_count = 0; int i; for (i = 0 ; i < GAUDI2_NUM_OF_ROT_ERR_CAUSE ; i++) - if (intr_cause_data & BIT(i)) + if (intr_cause_data & BIT(i)) { dev_err_ratelimited(hdev->dev, "ROT%u. err cause: %s\n", rot_index, guadi2_rot_error_cause[i]); + error_count++; + } /* check if RAZWI happened */ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_ROT, rot_index, 0, &razwi_with_intr_cause->razwi_info, event_mask); + + return error_count; } -static void gaudi2_tpc_ack_interrupts(struct hl_device *hdev, u8 tpc_index, char *interrupt_name, +static int gaudi2_tpc_ack_interrupts(struct hl_device *hdev, u8 tpc_index, char *interrupt_name, struct hl_eq_razwi_with_intr_cause *razwi_with_intr_cause, u64 *event_mask) { u64 intr_cause_data = le64_to_cpu(razwi_with_intr_cause->intr_cause.intr_cause_data); + u32 error_count = 0; int i; for (i = 0 ; i < GAUDI2_NUM_OF_TPC_INTR_CAUSE ; i++) - if (intr_cause_data & BIT(i)) + if (intr_cause_data & BIT(i)) { dev_err_ratelimited(hdev->dev, "TPC%d_%s interrupt cause: %s\n", tpc_index, interrupt_name, gaudi2_tpc_interrupts_cause[i]); + error_count++; + } /* check if RAZWI happened */ gaudi2_ack_module_razwi_event_handler(hdev, RAZWI_TPC, tpc_index, 0, &razwi_with_intr_cause->razwi_info, event_mask); + + return error_count; } -static void gaudi2_handle_dec_err(struct hl_device *hdev, u8 dec_index, const char *interrupt_name, +static int gaudi2_handle_dec_err(struct hl_device *hdev, u8 dec_index, const char *interrupt_name, struct hl_eq_razwi_info *razwi_info, u64 *event_mask) { - u32 sts_addr, sts_val, sts_clr_val = 0; + u32 sts_addr, sts_val, sts_clr_val = 0, error_count = 0; int i; if (dec_index < NUM_OF_VDEC_PER_DCORE * NUM_OF_DCORES) @@ -8042,6 +8079,7 @@ static void gaudi2_handle_dec_err(struct hl_device *hdev, u8 dec_index, const ch dev_err_ratelimited(hdev->dev, "DEC%u_%s err cause: %s\n", dec_index, interrupt_name, gaudi2_dec_error_cause[i]); sts_clr_val |= BIT(i); + error_count++; } } @@ -8051,12 +8089,14 @@ static void gaudi2_handle_dec_err(struct hl_device *hdev, u8 dec_index, const ch /* Write 1 clear errors */ WREG32(sts_addr, sts_clr_val); + + return error_count; } -static void gaudi2_handle_mme_err(struct hl_device *hdev, u8 mme_index, const char *interrupt_name, +static int gaudi2_handle_mme_err(struct hl_device *hdev, u8 mme_index, const char *interrupt_name, struct hl_eq_razwi_info *razwi_info, u64 *event_mask) { - u32 sts_addr, sts_val, sts_clr_addr, sts_clr_val = 0; + u32 sts_addr, sts_val, sts_clr_addr, sts_clr_val = 0, error_count = 0; int i; sts_addr = mmDCORE0_MME_CTRL_LO_INTR_CAUSE + DCORE_OFFSET * mme_index; @@ -8069,6 +8109,7 @@ static void gaudi2_handle_mme_err(struct hl_device *hdev, u8 mme_index, const ch dev_err_ratelimited(hdev->dev, "MME%u_%s err cause: %s\n", mme_index, interrupt_name, guadi2_mme_error_cause[i]); sts_clr_val |= BIT(i); + error_count++; } } @@ -8078,23 +8119,29 @@ static void gaudi2_handle_mme_err(struct hl_device *hdev, u8 mme_index, const ch event_mask); WREG32(sts_clr_addr, sts_clr_val); + + return error_count; } -static void 
gaudi2_handle_mme_sbte_err(struct hl_device *hdev, u8 mme_index, u8 sbte_index, +static int gaudi2_handle_mme_sbte_err(struct hl_device *hdev, u8 mme_index, u8 sbte_index, u64 intr_cause_data) { - int i; + int i, error_count = 0; for (i = 0 ; i < GAUDI2_NUM_OF_MME_SBTE_ERR_CAUSE ; i++) - if (intr_cause_data & BIT(i)) + if (intr_cause_data & BIT(i)) { dev_err_ratelimited(hdev->dev, "MME%uSBTE%u_AXI_ERR_RSP err cause: %s\n", mme_index, sbte_index, guadi2_mme_sbte_error_cause[i]); + error_count++; + } + + return error_count; } -static void gaudi2_handle_mme_wap_err(struct hl_device *hdev, u8 mme_index, +static int gaudi2_handle_mme_wap_err(struct hl_device *hdev, u8 mme_index, struct hl_eq_razwi_info *razwi_info, u64 *event_mask) { - u32 sts_addr, sts_val, sts_clr_addr, sts_clr_val = 0; + u32 sts_addr, sts_val, sts_clr_addr, sts_clr_val = 0, error_count = 0; int i; sts_addr = mmDCORE0_MME_ACC_INTR_CAUSE + DCORE_OFFSET * mme_index; @@ -8108,6 +8155,7 @@ static void gaudi2_handle_mme_wap_err(struct hl_device *hdev, u8 mme_index, "MME%u_WAP_SOURCE_RESULT_INVALID err cause: %s\n", mme_index, guadi2_mme_wap_error_cause[i]); sts_clr_val |= BIT(i); + error_count++; } } @@ -8118,10 +8166,13 @@ static void gaudi2_handle_mme_wap_err(struct hl_device *hdev, u8 mme_index, event_mask); WREG32(sts_clr_addr, sts_clr_val); + + return error_count; } -static void gaudi2_handle_kdma_core_event(struct hl_device *hdev, u64 intr_cause_data) +static int gaudi2_handle_kdma_core_event(struct hl_device *hdev, u64 intr_cause_data) { + u32 error_count = 0; int i; /* If an AXI read or write error is received, an error is reported and @@ -8130,19 +8181,28 @@ static void gaudi2_handle_kdma_core_event(struct hl_device *hdev, u64 intr_cause * the actual error caused by a LBW KDMA transaction. 
*/ for (i = 0 ; i < GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE ; i++) - if (intr_cause_data & BIT(i)) + if (intr_cause_data & BIT(i)) { dev_err_ratelimited(hdev->dev, "kdma core err cause: %s\n", gaudi2_kdma_core_interrupts_cause[i]); + error_count++; + } + + return error_count; } -static void gaudi2_handle_dma_core_event(struct hl_device *hdev, u64 intr_cause_data) +static int gaudi2_handle_dma_core_event(struct hl_device *hdev, u64 intr_cause_data) { + u32 error_count = 0; int i; for (i = 0 ; i < GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE ; i++) - if (intr_cause_data & BIT(i)) + if (intr_cause_data & BIT(i)) { dev_err_ratelimited(hdev->dev, "dma core err cause: %s\n", gaudi2_dma_core_interrupts_cause[i]); + error_count++; + } + + return error_count; } static void gaudi2_print_pcie_mstr_rr_mstr_if_razwi_info(struct hl_device *hdev, u64 *event_mask) @@ -8178,9 +8238,10 @@ static void gaudi2_print_pcie_mstr_rr_mstr_if_razwi_info(struct hl_device *hdev, } } -static void gaudi2_print_pcie_addr_dec_info(struct hl_device *hdev, u64 intr_cause_data, +static int gaudi2_print_pcie_addr_dec_info(struct hl_device *hdev, u64 intr_cause_data, u64 *event_mask) { + u32 error_count = 0; int i; for (i = 0 ; i < GAUDI2_NUM_OF_PCIE_ADDR_DEC_ERR_CAUSE ; i++) { @@ -8189,6 +8250,7 @@ static void gaudi2_print_pcie_addr_dec_info(struct hl_device *hdev, u64 intr_cau dev_err_ratelimited(hdev->dev, "PCIE ADDR DEC Error: %s\n", gaudi2_pcie_addr_dec_error_cause[i]); + error_count++; switch (intr_cause_data & BIT_ULL(i)) { case PCIE_WRAP_PCIE_IC_SEI_INTR_IND_AXI_LBW_ERR_INTR_MASK: @@ -8198,33 +8260,44 @@ static void gaudi2_print_pcie_addr_dec_info(struct hl_device *hdev, u64 intr_cau break; } } + + return error_count; } -static void gaudi2_handle_pif_fatal(struct hl_device *hdev, u64 intr_cause_data) +static int gaudi2_handle_pif_fatal(struct hl_device *hdev, u64 intr_cause_data) { + u32 error_count = 0; int i; for (i = 0 ; i < GAUDI2_NUM_OF_PMMU_FATAL_ERR_CAUSE ; i++) { - if (intr_cause_data & BIT_ULL(i)) + if (intr_cause_data & BIT_ULL(i)) { dev_err_ratelimited(hdev->dev, "PMMU PIF err cause: %s\n", gaudi2_pmmu_fatal_interrupts_cause[i]); + error_count++; + } } + + return error_count; } -static void gaudi2_handle_hif_fatal(struct hl_device *hdev, u16 event_type, u64 intr_cause_data) +static int gaudi2_handle_hif_fatal(struct hl_device *hdev, u16 event_type, u64 intr_cause_data) { - u32 dcore_id, hif_id; + u32 dcore_id, hif_id, error_count = 0; int i; dcore_id = (event_type - GAUDI2_EVENT_HIF0_FATAL) / 4; hif_id = (event_type - GAUDI2_EVENT_HIF0_FATAL) % 4; for (i = 0 ; i < GAUDI2_NUM_OF_HIF_FATAL_ERR_CAUSE ; i++) { - if (intr_cause_data & BIT_ULL(i)) + if (intr_cause_data & BIT_ULL(i)) { dev_err_ratelimited(hdev->dev, "DCORE%u_HIF%u: %s\n", dcore_id, hif_id, gaudi2_hif_fatal_interrupts_cause[i]); + error_count++; + } } + + return error_count; } static void gaudi2_handle_page_error(struct hl_device *hdev, u64 mmu_base, bool is_pmmu, @@ -8270,10 +8343,10 @@ static void gaudi2_handle_access_error(struct hl_device *hdev, u64 mmu_base, boo WREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_ERROR_CAPTURE), 0); } -static void gaudi2_handle_mmu_spi_sei_generic(struct hl_device *hdev, const char *mmu_name, +static int gaudi2_handle_mmu_spi_sei_generic(struct hl_device *hdev, const char *mmu_name, u64 mmu_base, bool is_pmmu, u64 *event_mask) { - u32 spi_sei_cause, interrupt_clr = 0x0; + u32 spi_sei_cause, interrupt_clr = 0x0, error_count = 0; int i; spi_sei_cause = RREG32(mmu_base + MMU_SPI_SEI_CAUSE_OFFSET); @@ -8290,6 +8363,8 @@ static 
void gaudi2_handle_mmu_spi_sei_generic(struct hl_device *hdev, const char if (gaudi2_mmu_spi_sei[i].clear_bit >= 0) interrupt_clr |= BIT(gaudi2_mmu_spi_sei[i].clear_bit); + + error_count++; } } @@ -8298,12 +8373,14 @@ static void gaudi2_handle_mmu_spi_sei_generic(struct hl_device *hdev, const char /* Clear interrupt */ WREG32(mmu_base + MMU_INTERRUPT_CLR_OFFSET, interrupt_clr); + + return error_count; } -static void gaudi2_handle_sm_err(struct hl_device *hdev, u8 sm_index) +static int gaudi2_handle_sm_err(struct hl_device *hdev, u8 sm_index) { - u32 sei_cause_addr, sei_cause_val, sei_cause_cause, sei_cause_log; - u32 cq_intr_addr, cq_intr_val, cq_intr_queue_index; + u32 sei_cause_addr, sei_cause_val, sei_cause_cause, sei_cause_log, + cq_intr_addr, cq_intr_val, cq_intr_queue_index, error_count = 0; int i; sei_cause_addr = mmDCORE0_SYNC_MNGR_GLBL_SM_SEI_CAUSE + DCORE_OFFSET * sm_index; @@ -8328,6 +8405,7 @@ static void gaudi2_handle_sm_err(struct hl_device *hdev, u8 sm_index) gaudi2_sm_sei_cause[i].cause_name, gaudi2_sm_sei_cause[i].log_name, sei_cause_log & gaudi2_sm_sei_cause[i].log_mask); + error_count++; break; } @@ -8343,16 +8421,20 @@ static void gaudi2_handle_sm_err(struct hl_device *hdev, u8 sm_index) dev_err_ratelimited(hdev->dev, "SM%u err. err cause: CQ_INTR. queue index: %u\n", sm_index, cq_intr_queue_index); + error_count++; /* Clear CQ_INTR */ WREG32(cq_intr_addr, 0); } + + return error_count; } -static void gaudi2_handle_mmu_spi_sei_err(struct hl_device *hdev, u16 event_type, u64 *event_mask) +static int gaudi2_handle_mmu_spi_sei_err(struct hl_device *hdev, u16 event_type, u64 *event_mask) { bool is_pmmu = false; char desc[32]; + u32 error_count = 0; u64 mmu_base; u8 index; @@ -8404,10 +8486,12 @@ static void gaudi2_handle_mmu_spi_sei_err(struct hl_device *hdev, u16 event_type snprintf(desc, ARRAY_SIZE(desc), "PMMU"); break; default: - return; + return 0; } - gaudi2_handle_mmu_spi_sei_generic(hdev, desc, mmu_base, is_pmmu, event_mask); + error_count = gaudi2_handle_mmu_spi_sei_generic(hdev, desc, mmu_base, is_pmmu, event_mask); + + return error_count; } @@ -8586,21 +8670,30 @@ static bool gaudi2_handle_hbm_mc_sei_err(struct hl_device *hdev, u16 event_type, return require_hard_reset; } -static void gaudi2_handle_hbm_cattrip(struct hl_device *hdev, u64 intr_cause_data) +static int gaudi2_handle_hbm_cattrip(struct hl_device *hdev, u64 intr_cause_data) { - dev_err(hdev->dev, - "HBM catastrophic temperature error (CATTRIP) cause %#llx\n", - intr_cause_data); + if (intr_cause_data) { + dev_err(hdev->dev, + "HBM catastrophic temperature error (CATTRIP) cause %#llx\n", + intr_cause_data); + return 1; + } + + return 0; } -static void gaudi2_handle_hbm_mc_spi(struct hl_device *hdev, u64 intr_cause_data) +static int gaudi2_handle_hbm_mc_spi(struct hl_device *hdev, u64 intr_cause_data) { - u32 i; + u32 i, error_count = 0; for (i = 0 ; i < GAUDI2_NUM_OF_HBM_MC_SPI_CAUSE ; i++) - if (intr_cause_data & hbm_mc_spi[i].mask) + if (intr_cause_data & hbm_mc_spi[i].mask) { dev_dbg(hdev->dev, "HBM spi event: notification cause(%s)\n", hbm_mc_spi[i].cause); + error_count++; + } + + return error_count; } static void gaudi2_print_clk_change_info(struct hl_device *hdev, u16 event_type, u64 *event_mask) @@ -8657,9 +8750,9 @@ static void gaudi2_print_out_of_sync_info(struct hl_device *hdev, le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci)); } -static void gaudi2_handle_pcie_p2p_msix(struct hl_device *hdev) +static int gaudi2_handle_pcie_p2p_msix(struct hl_device *hdev) { - 
u32 p2p_intr, msix_gw_intr; + u32 p2p_intr, msix_gw_intr, error_count = 0; p2p_intr = RREG32(mmPCIE_WRAP_P2P_INTR); msix_gw_intr = RREG32(mmPCIE_WRAP_MSIX_GW_INTR); @@ -8670,6 +8763,7 @@ static void gaudi2_handle_pcie_p2p_msix(struct hl_device *hdev) RREG32(mmPCIE_WRAP_P2P_REQ_ID)); WREG32(mmPCIE_WRAP_P2P_INTR, 0x1); + error_count++; } if (msix_gw_intr) { @@ -8678,13 +8772,16 @@ static void gaudi2_handle_pcie_p2p_msix(struct hl_device *hdev) RREG32(mmPCIE_WRAP_MSIX_GW_VEC)); WREG32(mmPCIE_WRAP_MSIX_GW_INTR, 0x1); + error_count++; } + + return error_count; } -static void gaudi2_handle_pcie_drain(struct hl_device *hdev, +static int gaudi2_handle_pcie_drain(struct hl_device *hdev, struct hl_eq_pcie_drain_ind_data *drain_data) { - u64 lbw_rd, lbw_wr, hbw_rd, hbw_wr, cause; + u64 lbw_rd, lbw_wr, hbw_rd, hbw_wr, cause, error_count = 0; cause = le64_to_cpu(drain_data->intr_cause.intr_cause_data); lbw_rd = le64_to_cpu(drain_data->drain_rd_addr_lbw); @@ -8692,26 +8789,37 @@ static void gaudi2_handle_pcie_drain(struct hl_device *hdev, hbw_rd = le64_to_cpu(drain_data->drain_rd_addr_hbw); hbw_wr = le64_to_cpu(drain_data->drain_wr_addr_hbw); - if (cause & BIT_ULL(0)) + if (cause & BIT_ULL(0)) { dev_err_ratelimited(hdev->dev, "PCIE AXI drain LBW completed, read_err %u, write_err %u\n", !!lbw_rd, !!lbw_wr); + error_count++; + } - if (cause & BIT_ULL(1)) + if (cause & BIT_ULL(1)) { dev_err_ratelimited(hdev->dev, "PCIE AXI drain HBW completed, raddr %#llx, waddr %#llx\n", hbw_rd, hbw_wr); + error_count++; + } + + return error_count; } -static void gaudi2_handle_psoc_drain(struct hl_device *hdev, u64 intr_cause_data) +static int gaudi2_handle_psoc_drain(struct hl_device *hdev, u64 intr_cause_data) { + u32 error_count = 0; int i; for (i = 0 ; i < GAUDI2_NUM_OF_AXI_DRAIN_ERR_CAUSE ; i++) { - if (intr_cause_data & BIT_ULL(i)) + if (intr_cause_data & BIT_ULL(i)) { dev_err_ratelimited(hdev->dev, "PSOC %s completed\n", gaudi2_psoc_axi_drain_interrupts_cause[i]); + error_count++; + } } + + return error_count; } static void gaudi2_print_cpu_pkt_failure_info(struct hl_device *hdev, @@ -8724,8 +8832,7 @@ static void gaudi2_print_cpu_pkt_failure_info(struct hl_device *hdev, le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci)); } -static void hl_arc_event_handle(struct hl_device *hdev, - struct hl_eq_engine_arc_intr_data *data) +static int hl_arc_event_handle(struct hl_device *hdev, struct hl_eq_engine_arc_intr_data *data) { struct hl_engine_arc_dccm_queue_full_irq *q; u32 intr_type, engine_id; @@ -8742,9 +8849,10 @@ static void hl_arc_event_handle(struct hl_device *hdev, dev_err_ratelimited(hdev->dev, "ARC DCCM Full event: EngId: %u, Intr_type: %u, Qidx: %u\n", engine_id, intr_type, q->queue_index); - break; + return 1; default: dev_err_ratelimited(hdev->dev, "Unknown ARC event type\n"); + return 0; } } @@ -8752,7 +8860,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent { struct gaudi2_device *gaudi2 = hdev->asic_specific; bool reset_required = false, is_critical = false; - u32 ctl, reset_flags = HL_DRV_RESET_HARD; + u32 ctl, reset_flags = HL_DRV_RESET_HARD, error_count = 0; int index, sbte_index; u64 event_mask = 0; u16 event_type; @@ -8779,6 +8887,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; reset_required = gaudi2_handle_ecc_event(hdev, event_type, &eq_entry->ecc_data); is_critical = eq_entry->ecc_data.is_critical; + error_count++; break; case GAUDI2_EVENT_TPC0_QM 
... GAUDI2_EVENT_PDMA1_QM: @@ -8786,48 +8895,50 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent case GAUDI2_EVENT_ROTATOR0_ROT0_QM ... GAUDI2_EVENT_ROTATOR1_ROT1_QM: fallthrough; case GAUDI2_EVENT_NIC0_QM0 ... GAUDI2_EVENT_NIC11_QM1: - gaudi2_handle_qman_err(hdev, event_type); + error_count = gaudi2_handle_qman_err(hdev, event_type); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_ARC_AXI_ERROR_RESPONSE_0: reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; - gaudi2_handle_arc_farm_sei_err(hdev); + error_count = gaudi2_handle_arc_farm_sei_err(hdev); event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_CPU_AXI_ERR_RSP: - gaudi2_handle_cpu_sei_err(hdev); + error_count = gaudi2_handle_cpu_sei_err(hdev); event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_PDMA_CH0_AXI_ERR_RSP: case GAUDI2_EVENT_PDMA_CH1_AXI_ERR_RSP: reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; - gaudi2_handle_qm_sei_err(hdev, event_type, &eq_entry->razwi_info, &event_mask); + error_count = gaudi2_handle_qm_sei_err(hdev, event_type, + &eq_entry->razwi_info, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE: case GAUDI2_EVENT_ROTATOR1_AXI_ERROR_RESPONSE: index = event_type - GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE; - gaudi2_handle_rot_err(hdev, index, &eq_entry->razwi_with_intr_cause, &event_mask); - gaudi2_handle_qm_sei_err(hdev, event_type, NULL, &event_mask); + error_count = gaudi2_handle_rot_err(hdev, index, + &eq_entry->razwi_with_intr_cause, &event_mask); + error_count += gaudi2_handle_qm_sei_err(hdev, event_type, NULL, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_TPC0_AXI_ERR_RSP ... GAUDI2_EVENT_TPC24_AXI_ERR_RSP: index = event_type - GAUDI2_EVENT_TPC0_AXI_ERR_RSP; - gaudi2_tpc_ack_interrupts(hdev, index, "AXI_ERR_RSP", + error_count = gaudi2_tpc_ack_interrupts(hdev, index, "AXI_ERR_RSP", &eq_entry->razwi_with_intr_cause, &event_mask); - gaudi2_handle_qm_sei_err(hdev, event_type, NULL, &event_mask); + error_count += gaudi2_handle_qm_sei_err(hdev, event_type, NULL, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_DEC0_AXI_ERR_RSPONSE ... 
GAUDI2_EVENT_DEC9_AXI_ERR_RSPONSE: index = event_type - GAUDI2_EVENT_DEC0_AXI_ERR_RSPONSE; - gaudi2_handle_dec_err(hdev, index, "AXI_ERR_RESPONSE", &eq_entry->razwi_info, - &event_mask); + error_count = gaudi2_handle_dec_err(hdev, index, "AXI_ERR_RESPONSE", + &eq_entry->razwi_info, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; @@ -8858,8 +8969,8 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent case GAUDI2_EVENT_TPC24_KERNEL_ERR: index = (event_type - GAUDI2_EVENT_TPC0_KERNEL_ERR) / (GAUDI2_EVENT_TPC1_KERNEL_ERR - GAUDI2_EVENT_TPC0_KERNEL_ERR); - gaudi2_tpc_ack_interrupts(hdev, index, "KRN_ERR", &eq_entry->razwi_with_intr_cause, - &event_mask); + error_count = gaudi2_tpc_ack_interrupts(hdev, index, "KRN_ERR", + &eq_entry->razwi_with_intr_cause, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; @@ -8875,7 +8986,8 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent case GAUDI2_EVENT_DEC9_SPI: index = (event_type - GAUDI2_EVENT_DEC0_SPI) / (GAUDI2_EVENT_DEC1_SPI - GAUDI2_EVENT_DEC0_SPI); - gaudi2_handle_dec_err(hdev, index, "SPI", &eq_entry->razwi_info, &event_mask); + error_count = gaudi2_handle_dec_err(hdev, index, "SPI", + &eq_entry->razwi_info, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; @@ -8886,9 +8998,9 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent index = (event_type - GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE) / (GAUDI2_EVENT_MME1_CTRL_AXI_ERROR_RESPONSE - GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE); - gaudi2_handle_mme_err(hdev, index, + error_count = gaudi2_handle_mme_err(hdev, index, "CTRL_AXI_ERROR_RESPONSE", &eq_entry->razwi_info, &event_mask); - gaudi2_handle_qm_sei_err(hdev, event_type, NULL, &event_mask); + error_count += gaudi2_handle_qm_sei_err(hdev, event_type, NULL, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; @@ -8899,8 +9011,8 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent index = (event_type - GAUDI2_EVENT_MME0_QMAN_SW_ERROR) / (GAUDI2_EVENT_MME1_QMAN_SW_ERROR - GAUDI2_EVENT_MME0_QMAN_SW_ERROR); - gaudi2_handle_mme_err(hdev, index, "QMAN_SW_ERROR", &eq_entry->razwi_info, - &event_mask); + error_count = gaudi2_handle_mme_err(hdev, index, "QMAN_SW_ERROR", + &eq_entry->razwi_info, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; @@ -8911,25 +9023,26 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent index = (event_type - GAUDI2_EVENT_MME0_WAP_SOURCE_RESULT_INVALID) / (GAUDI2_EVENT_MME1_WAP_SOURCE_RESULT_INVALID - GAUDI2_EVENT_MME0_WAP_SOURCE_RESULT_INVALID); - gaudi2_handle_mme_wap_err(hdev, index, &eq_entry->razwi_info, &event_mask); + error_count = gaudi2_handle_mme_wap_err(hdev, index, + &eq_entry->razwi_info, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_KDMA_CH0_AXI_ERR_RSP: case GAUDI2_EVENT_KDMA0_CORE: - gaudi2_handle_kdma_core_event(hdev, + error_count = gaudi2_handle_kdma_core_event(hdev, le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_HDMA2_CORE ... 
GAUDI2_EVENT_PDMA1_CORE: - gaudi2_handle_dma_core_event(hdev, + error_count = gaudi2_handle_dma_core_event(hdev, le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_PCIE_ADDR_DEC_ERR: - gaudi2_print_pcie_addr_dec_info(hdev, + error_count = gaudi2_print_pcie_addr_dec_info(hdev, le64_to_cpu(eq_entry->intr_cause.intr_cause_data), &event_mask); reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; @@ -8939,27 +9052,27 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent case GAUDI2_EVENT_HMMU_0_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_12_AXI_ERR_RSP: case GAUDI2_EVENT_PMMU0_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_PMMU0_SECURITY_ERROR: case GAUDI2_EVENT_PMMU_AXI_ERR_RSP_0: - gaudi2_handle_mmu_spi_sei_err(hdev, event_type, &event_mask); + error_count = gaudi2_handle_mmu_spi_sei_err(hdev, event_type, &event_mask); reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_HIF0_FATAL ... GAUDI2_EVENT_HIF12_FATAL: - gaudi2_handle_hif_fatal(hdev, event_type, + error_count = gaudi2_handle_hif_fatal(hdev, event_type, le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_PMMU_FATAL_0: - gaudi2_handle_pif_fatal(hdev, + error_count = gaudi2_handle_pif_fatal(hdev, le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_PSOC63_RAZWI_OR_PID_MIN_MAX_INTERRUPT: - gaudi2_ack_psoc_razwi_event_handler(hdev, &event_mask); + error_count = gaudi2_ack_psoc_razwi_event_handler(hdev, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; @@ -8969,33 +9082,39 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; reset_required = true; } + error_count++; break; case GAUDI2_EVENT_HBM_CATTRIP_0 ... GAUDI2_EVENT_HBM_CATTRIP_5: - gaudi2_handle_hbm_cattrip(hdev, le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); + error_count = gaudi2_handle_hbm_cattrip(hdev, + le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_HBM0_MC0_SPI ... 
GAUDI2_EVENT_HBM5_MC1_SPI: - gaudi2_handle_hbm_mc_spi(hdev, le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); + error_count = gaudi2_handle_hbm_mc_spi(hdev, + le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_PCIE_DRAIN_COMPLETE: - gaudi2_handle_pcie_drain(hdev, &eq_entry->pcie_drain_ind_data); + error_count = gaudi2_handle_pcie_drain(hdev, &eq_entry->pcie_drain_ind_data); event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_PSOC59_RPM_ERROR_OR_DRAIN: - gaudi2_handle_psoc_drain(hdev, le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); + error_count = gaudi2_handle_psoc_drain(hdev, + le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_CPU_AXI_ECC: + error_count = GAUDI2_NA_EVENT_CAUSE; reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_CPU_L2_RAM_ECC: + error_count = GAUDI2_NA_EVENT_CAUSE; reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; @@ -9009,25 +9128,30 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent sbte_index = (event_type - GAUDI2_EVENT_MME0_SBTE0_AXI_ERR_RSP) % (GAUDI2_EVENT_MME1_SBTE0_AXI_ERR_RSP - GAUDI2_EVENT_MME0_SBTE0_AXI_ERR_RSP); - gaudi2_handle_mme_sbte_err(hdev, index, sbte_index, - le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); + error_count = gaudi2_handle_mme_sbte_err(hdev, index, sbte_index, + le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_VM0_ALARM_A ... GAUDI2_EVENT_VM3_ALARM_B: + error_count = GAUDI2_NA_EVENT_CAUSE; reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_PSOC_AXI_ERR_RSP: + error_count = GAUDI2_NA_EVENT_CAUSE; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_PSOC_PRSTN_FALL: + error_count = GAUDI2_NA_EVENT_CAUSE; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_PCIE_APB_TIMEOUT: + error_count = GAUDI2_NA_EVENT_CAUSE; reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_PCIE_FATAL_ERR: + error_count = GAUDI2_NA_EVENT_CAUSE; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_TPC0_BMON_SPMU: @@ -9080,6 +9204,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent case GAUDI2_EVENT_DEC8_BMON_SPMU: case GAUDI2_EVENT_DEC9_BMON_SPMU: case GAUDI2_EVENT_ROTATOR0_BMON_SPMU ... 
GAUDI2_EVENT_SM3_BMON_SPMU: + error_count = GAUDI2_NA_EVENT_CAUSE; event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; @@ -9088,65 +9213,83 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_S: case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_E: gaudi2_print_clk_change_info(hdev, event_type, &event_mask); + error_count = GAUDI2_NA_EVENT_CAUSE; break; case GAUDI2_EVENT_CPU_PKT_QUEUE_OUT_SYNC: gaudi2_print_out_of_sync_info(hdev, &eq_entry->pkt_sync_err); + error_count = GAUDI2_NA_EVENT_CAUSE; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_PCIE_FLR_REQUESTED: event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; + error_count = GAUDI2_NA_EVENT_CAUSE; /* Do nothing- FW will handle it */ break; case GAUDI2_EVENT_PCIE_P2P_MSIX: - gaudi2_handle_pcie_p2p_msix(hdev); + error_count = gaudi2_handle_pcie_p2p_msix(hdev); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_SM0_AXI_ERROR_RESPONSE ... GAUDI2_EVENT_SM3_AXI_ERROR_RESPONSE: index = event_type - GAUDI2_EVENT_SM0_AXI_ERROR_RESPONSE; - gaudi2_handle_sm_err(hdev, index); + error_count = gaudi2_handle_sm_err(hdev, index); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_PSOC_MME_PLL_LOCK_ERR ... GAUDI2_EVENT_DCORE2_HBM_PLL_LOCK_ERR: + error_count = GAUDI2_NA_EVENT_CAUSE; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_CPU_CPLD_SHUTDOWN_CAUSE: dev_info(hdev->dev, "CPLD shutdown cause, reset reason: 0x%llx\n", le64_to_cpu(eq_entry->data[0])); + error_count = GAUDI2_NA_EVENT_CAUSE; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_CPU_CPLD_SHUTDOWN_EVENT: dev_err(hdev->dev, "CPLD shutdown event, reset reason: 0x%llx\n", le64_to_cpu(eq_entry->data[0])); + error_count = GAUDI2_NA_EVENT_CAUSE; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_CPU_PKT_SANITY_FAILED: gaudi2_print_cpu_pkt_failure_info(hdev, &eq_entry->pkt_sync_err); + error_count = GAUDI2_NA_EVENT_CAUSE; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_ARC_DCCM_FULL: - hl_arc_event_handle(hdev, &eq_entry->arc_data); + error_count = hl_arc_event_handle(hdev, &eq_entry->arc_data); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_CPU_FP32_NOT_SUPPORTED: event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; + error_count = GAUDI2_NA_EVENT_CAUSE; is_critical = true; break; default: - if (gaudi2_irq_map_table[event_type].valid) + if (gaudi2_irq_map_table[event_type].valid) { dev_err_ratelimited(hdev->dev, "Cannot find handler for event %d\n", event_type); + error_count = GAUDI2_NA_EVENT_CAUSE; + } } + /* Make sure to dump an error in case no error cause was printed so far. + * Note that although we have counted the errors, we use this number as + * a boolean. 
+ */ + if (error_count == 0 && !is_info_event(event_type)) + dev_err_ratelimited(hdev->dev, + "No Error cause for H/W event %d\n", event_type); + if ((gaudi2_irq_map_table[event_type].reset || reset_required) && (hdev->hard_reset_on_fw_events || (hdev->asic_prop.fw_security_enabled && is_critical)))

From patchwork Thu Dec 8 15:13:36 2022
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 31401
From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: Ofir Bitton
Subject: [PATCH 12/26] habanalabs/gaudi2: remove duplicated event prints
Date: Thu, 8 Dec 2022 17:13:36 +0200
Message-Id: <20221208151350.1833823-12-ogabbay@kernel.org>
In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org>
References: <20221208151350.1833823-1-ogabbay@kernel.org>

From: Ofir Bitton

In order to reduce error log, we try to minimize the dumped rows while keeping all relevant error info. In addition we completely remove clock throttling debug logs.
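The consolidation relies on the kernel's struct va_format / %pV facility: one variadic helper prefixes every message with the event name, so callers stop repeating the block identity in each print. A minimal, self-contained user-space sketch of the same pattern follows (plain vsnprintf() instead of %pV; event_name() and print_event() are illustrative names, not the driver's API):

#include <stdarg.h>
#include <stdio.h>

/* Stand-in for the gaudi2_irq_map_table[event_type].name lookup. */
static const char *event_name(unsigned int event_type)
{
	return event_type == 42 ? "TPC0_QM" : "N/A Event";
}

/* One variadic helper prefixes every error line with the event name,
 * so callers no longer repeat the engine/block identity themselves.
 */
static void print_event(unsigned int event_type, const char *fmt, ...)
{
	char msg[128];
	va_list args;

	va_start(args, fmt);
	vsnprintf(msg, sizeof(msg), fmt, args);
	va_end(args);

	fprintf(stderr, "%s: %s\n", event_name(event_type), msg);
}

int main(void)
{
	print_event(42, "err cause: %s", "PQ AXI HBW error");
	return 0;
}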
Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/gaudi2/gaudi2.c | 339 +++++++++++------------- 1 file changed, 149 insertions(+), 190 deletions(-) diff --git a/drivers/misc/habanalabs/gaudi2/gaudi2.c b/drivers/misc/habanalabs/gaudi2/gaudi2.c index b8da2aa024ca..8373239ad1bc 100644 --- a/drivers/misc/habanalabs/gaudi2/gaudi2.c +++ b/drivers/misc/habanalabs/gaudi2/gaudi2.c @@ -6804,38 +6804,37 @@ static inline bool is_info_event(u32 event) switch (event) { case GAUDI2_EVENT_CPU_CPLD_SHUTDOWN_CAUSE: case GAUDI2_EVENT_CPU_FIX_POWER_ENV_S ... GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_E: + + /* return in case of NIC status event - these events are received periodically and not as + * an indication to an error. + */ + case GAUDI2_EVENT_CPU0_STATUS_NIC0_ENG0 ... GAUDI2_EVENT_CPU11_STATUS_NIC11_ENG1: return true; default: return false; } } -static void gaudi2_print_irq_info(struct hl_device *hdev, u16 event_type) +void gaudi2_print_event(struct hl_device *hdev, u16 event_type, + bool ratelimited, const char *fmt, ...) { - char desc[64] = ""; - bool event_valid = false; - - /* return in case of NIC status event - these events are received periodically and not as - * an indication to an error, thus not printed. - */ - if (event_type >= GAUDI2_EVENT_CPU0_STATUS_NIC0_ENG0 && - event_type <= GAUDI2_EVENT_CPU11_STATUS_NIC11_ENG1) - return; + struct va_format vaf; + va_list args; - if (gaudi2_irq_map_table[event_type].valid) { - snprintf(desc, sizeof(desc), gaudi2_irq_map_table[event_type].name); - event_valid = true; - } - - if (!event_valid) - snprintf(desc, sizeof(desc), "N/A"); + va_start(args, fmt); + vaf.fmt = fmt; + vaf.va = &args; - if (is_info_event(event_type)) - dev_info_ratelimited(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n", - event_type, desc); + if (ratelimited) + dev_err_ratelimited(hdev->dev, "%s: %pV\n", + gaudi2_irq_map_table[event_type].valid ? + gaudi2_irq_map_table[event_type].name : "N/A Event", &vaf); else - dev_err_ratelimited(hdev->dev, "Received H/W interrupt %d [\"%s\"]\n", - event_type, desc); + dev_err(hdev->dev, "%s: %pV\n", + gaudi2_irq_map_table[event_type].valid ? + gaudi2_irq_map_table[event_type].name : "N/A Event", &vaf); + + va_end(args); } static bool gaudi2_handle_ecc_event(struct hl_device *hdev, u16 event_type, @@ -6848,7 +6847,7 @@ static bool gaudi2_handle_ecc_event(struct hl_device *hdev, u16 event_type, ecc_syndrom = le64_to_cpu(ecc_data->ecc_syndrom); memory_wrapper_idx = ecc_data->memory_wrapper_idx; - dev_err(hdev->dev, + gaudi2_print_event(hdev, event_type, !ecc_data->is_critical, "ECC error detected. address: %#llx. Syndrom: %#llx. block id %u. critical %u.\n", ecc_address, ecc_syndrom, memory_wrapper_idx, ecc_data->is_critical); @@ -6988,7 +6987,7 @@ static void print_qman_data_on_err(struct hl_device *hdev, u32 qid_base, u32 str gaudi2_print_last_pqes_on_err(hdev, qid_base, i, qman_base, false); } -static int gaudi2_handle_qman_err_generic(struct hl_device *hdev, const char *qm_name, +static int gaudi2_handle_qman_err_generic(struct hl_device *hdev, u16 event_type, u64 qman_base, u32 qid_base) { u32 i, j, glbl_sts_val, arb_err_val, num_error_causes, error_count = 0; @@ -7015,11 +7014,11 @@ static int gaudi2_handle_qman_err_generic(struct hl_device *hdev, const char *qm for (j = 0 ; j < num_error_causes ; j++) if (glbl_sts_val & BIT(j)) { - dev_err_ratelimited(hdev->dev, "%s %s. err cause: %s\n", - qm_name, reg_desc, - i == QMAN_STREAMS ? 
- gaudi2_qman_lower_cp_error_cause[j] : - gaudi2_qman_error_cause[j]); + gaudi2_print_event(hdev, event_type, true, + "%s. err cause: %s", reg_desc, + i == QMAN_STREAMS ? + gaudi2_qman_lower_cp_error_cause[j] : + gaudi2_qman_error_cause[j]); error_count++; } @@ -7033,8 +7032,9 @@ static int gaudi2_handle_qman_err_generic(struct hl_device *hdev, const char *qm for (j = 0 ; j < GAUDI2_NUM_OF_QM_ARB_ERR_CAUSE ; j++) { if (arb_err_val & BIT(j)) { - dev_err_ratelimited(hdev->dev, "%s ARB_ERR. err cause: %s\n", - qm_name, gaudi2_qman_arb_error_cause[j]); + gaudi2_print_event(hdev, event_type, true, + "ARB_ERR. err cause: %s", + gaudi2_qman_arb_error_cause[j]); error_count++; } } @@ -7761,7 +7761,7 @@ static int gaudi2_ack_psoc_razwi_event_handler(struct hl_device *hdev, u64 *even return error_count; } -static int _gaudi2_handle_qm_sei_err(struct hl_device *hdev, u64 qman_base) +static int _gaudi2_handle_qm_sei_err(struct hl_device *hdev, u64 qman_base, u16 event_type) { u32 i, sts_val, sts_clr_val = 0, error_count = 0; @@ -7769,8 +7769,8 @@ static int _gaudi2_handle_qm_sei_err(struct hl_device *hdev, u64 qman_base) for (i = 0 ; i < GAUDI2_NUM_OF_QM_SEI_ERR_CAUSE ; i++) { if (sts_val & BIT(i)) { - dev_err_ratelimited(hdev->dev, "QM SEI. err cause: %s\n", - gaudi2_qm_sei_error_cause[i]); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", gaudi2_qm_sei_error_cause[i]); sts_clr_val |= BIT(i); error_count++; } @@ -7827,13 +7827,13 @@ static int gaudi2_handle_qm_sei_err(struct hl_device *hdev, u16 event_type, return 0; } - error_count = _gaudi2_handle_qm_sei_err(hdev, qman_base); + error_count = _gaudi2_handle_qm_sei_err(hdev, qman_base, event_type); /* There is a single event per NIC macro, so should check its both QMAN blocks */ if (event_type >= GAUDI2_EVENT_NIC0_AXI_ERROR_RESPONSE && event_type <= GAUDI2_EVENT_NIC11_AXI_ERROR_RESPONSE) error_count += _gaudi2_handle_qm_sei_err(hdev, - qman_base + NIC_QM_OFFSET); + qman_base + NIC_QM_OFFSET, event_type); /* check if RAZWI happened */ if (razwi_info) @@ -7846,7 +7846,6 @@ static int gaudi2_handle_qman_err(struct hl_device *hdev, u16 event_type) { u32 qid_base, error_count = 0; u64 qman_base; - char desc[32]; u8 index; switch (event_type) { @@ -7854,125 +7853,104 @@ static int gaudi2_handle_qman_err(struct hl_device *hdev, u16 event_type) index = event_type - GAUDI2_EVENT_TPC0_QM; qid_base = GAUDI2_QUEUE_ID_DCORE0_TPC_0_0 + index * QMAN_STREAMS; qman_base = mmDCORE0_TPC0_QM_BASE + index * DCORE_TPC_OFFSET; - snprintf(desc, ARRAY_SIZE(desc), "DCORE0_TPC%d_QM", index); break; case GAUDI2_EVENT_TPC6_QM ... GAUDI2_EVENT_TPC11_QM: index = event_type - GAUDI2_EVENT_TPC6_QM; qid_base = GAUDI2_QUEUE_ID_DCORE1_TPC_0_0 + index * QMAN_STREAMS; qman_base = mmDCORE1_TPC0_QM_BASE + index * DCORE_TPC_OFFSET; - snprintf(desc, ARRAY_SIZE(desc), "DCORE1_TPC%d_QM", index); break; case GAUDI2_EVENT_TPC12_QM ... GAUDI2_EVENT_TPC17_QM: index = event_type - GAUDI2_EVENT_TPC12_QM; qid_base = GAUDI2_QUEUE_ID_DCORE2_TPC_0_0 + index * QMAN_STREAMS; qman_base = mmDCORE2_TPC0_QM_BASE + index * DCORE_TPC_OFFSET; - snprintf(desc, ARRAY_SIZE(desc), "DCORE2_TPC%d_QM", index); break; case GAUDI2_EVENT_TPC18_QM ... 
GAUDI2_EVENT_TPC23_QM: index = event_type - GAUDI2_EVENT_TPC18_QM; qid_base = GAUDI2_QUEUE_ID_DCORE3_TPC_0_0 + index * QMAN_STREAMS; qman_base = mmDCORE3_TPC0_QM_BASE + index * DCORE_TPC_OFFSET; - snprintf(desc, ARRAY_SIZE(desc), "DCORE3_TPC%d_QM", index); break; case GAUDI2_EVENT_TPC24_QM: qid_base = GAUDI2_QUEUE_ID_DCORE0_TPC_6_0; qman_base = mmDCORE0_TPC6_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "DCORE0_TPC6_QM"); break; case GAUDI2_EVENT_MME0_QM: qid_base = GAUDI2_QUEUE_ID_DCORE0_MME_0_0; qman_base = mmDCORE0_MME_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "DCORE0_MME_QM"); break; case GAUDI2_EVENT_MME1_QM: qid_base = GAUDI2_QUEUE_ID_DCORE1_MME_0_0; qman_base = mmDCORE1_MME_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "DCORE1_MME_QM"); break; case GAUDI2_EVENT_MME2_QM: qid_base = GAUDI2_QUEUE_ID_DCORE2_MME_0_0; qman_base = mmDCORE2_MME_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "DCORE2_MME_QM"); break; case GAUDI2_EVENT_MME3_QM: qid_base = GAUDI2_QUEUE_ID_DCORE3_MME_0_0; qman_base = mmDCORE3_MME_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "DCORE3_MME_QM"); break; case GAUDI2_EVENT_HDMA0_QM: qid_base = GAUDI2_QUEUE_ID_DCORE0_EDMA_0_0; qman_base = mmDCORE0_EDMA0_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "DCORE0_EDMA0_QM"); break; case GAUDI2_EVENT_HDMA1_QM: qid_base = GAUDI2_QUEUE_ID_DCORE0_EDMA_1_0; qman_base = mmDCORE0_EDMA1_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "DCORE0_EDMA1_QM"); break; case GAUDI2_EVENT_HDMA2_QM: qid_base = GAUDI2_QUEUE_ID_DCORE1_EDMA_0_0; qman_base = mmDCORE1_EDMA0_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "DCORE1_EDMA0_QM"); break; case GAUDI2_EVENT_HDMA3_QM: qid_base = GAUDI2_QUEUE_ID_DCORE1_EDMA_1_0; qman_base = mmDCORE1_EDMA1_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "DCORE1_EDMA1_QM"); break; case GAUDI2_EVENT_HDMA4_QM: qid_base = GAUDI2_QUEUE_ID_DCORE2_EDMA_0_0; qman_base = mmDCORE2_EDMA0_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "DCORE2_EDMA0_QM"); break; case GAUDI2_EVENT_HDMA5_QM: qid_base = GAUDI2_QUEUE_ID_DCORE2_EDMA_1_0; qman_base = mmDCORE2_EDMA1_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "DCORE2_EDMA1_QM"); break; case GAUDI2_EVENT_HDMA6_QM: qid_base = GAUDI2_QUEUE_ID_DCORE3_EDMA_0_0; qman_base = mmDCORE3_EDMA0_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "DCORE3_EDMA0_QM"); break; case GAUDI2_EVENT_HDMA7_QM: qid_base = GAUDI2_QUEUE_ID_DCORE3_EDMA_1_0; qman_base = mmDCORE3_EDMA1_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "DCORE3_EDMA1_QM"); break; case GAUDI2_EVENT_PDMA0_QM: qid_base = GAUDI2_QUEUE_ID_PDMA_0_0; qman_base = mmPDMA0_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "PDMA0_QM"); break; case GAUDI2_EVENT_PDMA1_QM: qid_base = GAUDI2_QUEUE_ID_PDMA_1_0; qman_base = mmPDMA1_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "PDMA1_QM"); break; case GAUDI2_EVENT_ROTATOR0_ROT0_QM: qid_base = GAUDI2_QUEUE_ID_ROT_0_0; qman_base = mmROT0_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "ROTATOR0_QM"); break; case GAUDI2_EVENT_ROTATOR1_ROT1_QM: qid_base = GAUDI2_QUEUE_ID_ROT_1_0; qman_base = mmROT1_QM_BASE; - snprintf(desc, ARRAY_SIZE(desc), "ROTATOR1_QM"); break; default: return 0; } - error_count = gaudi2_handle_qman_err_generic(hdev, desc, qman_base, qid_base); + error_count = gaudi2_handle_qman_err_generic(hdev, event_type, qman_base, qid_base); /* Handle EDMA QM SEI here because there is no AXI error response event for EDMA */ if (event_type >= GAUDI2_EVENT_HDMA2_QM && event_type <= GAUDI2_EVENT_HDMA5_QM) - error_count += _gaudi2_handle_qm_sei_err(hdev, qman_base); + error_count += _gaudi2_handle_qm_sei_err(hdev, qman_base, event_type); return 
error_count; } -static int gaudi2_handle_arc_farm_sei_err(struct hl_device *hdev) +static int gaudi2_handle_arc_farm_sei_err(struct hl_device *hdev, u16 event_type) { u32 i, sts_val, sts_clr_val = 0, error_count = 0; @@ -7980,8 +7958,8 @@ static int gaudi2_handle_arc_farm_sei_err(struct hl_device *hdev) for (i = 0 ; i < GAUDI2_NUM_OF_ARC_SEI_ERR_CAUSE ; i++) { if (sts_val & BIT(i)) { - dev_err_ratelimited(hdev->dev, "ARC SEI. err cause: %s\n", - gaudi2_arc_sei_error_cause[i]); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", gaudi2_arc_sei_error_cause[i]); sts_clr_val |= BIT(i); error_count++; } @@ -7992,7 +7970,7 @@ static int gaudi2_handle_arc_farm_sei_err(struct hl_device *hdev) return error_count; } -static int gaudi2_handle_cpu_sei_err(struct hl_device *hdev) +static int gaudi2_handle_cpu_sei_err(struct hl_device *hdev, u16 event_type) { u32 i, sts_val, sts_clr_val = 0, error_count = 0; @@ -8000,8 +7978,8 @@ static int gaudi2_handle_cpu_sei_err(struct hl_device *hdev) for (i = 0 ; i < GAUDI2_NUM_OF_CPU_SEI_ERR_CAUSE ; i++) { if (sts_val & BIT(i)) { - dev_err_ratelimited(hdev->dev, "CPU SEI. err cause: %s\n", - gaudi2_cpu_sei_error_cause[i]); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", gaudi2_cpu_sei_error_cause[i]); sts_clr_val |= BIT(i); error_count++; } @@ -8012,7 +7990,7 @@ static int gaudi2_handle_cpu_sei_err(struct hl_device *hdev) return error_count; } -static int gaudi2_handle_rot_err(struct hl_device *hdev, u8 rot_index, +static int gaudi2_handle_rot_err(struct hl_device *hdev, u8 rot_index, u16 event_type, struct hl_eq_razwi_with_intr_cause *razwi_with_intr_cause, u64 *event_mask) { @@ -8022,8 +8000,8 @@ static int gaudi2_handle_rot_err(struct hl_device *hdev, u8 rot_index, for (i = 0 ; i < GAUDI2_NUM_OF_ROT_ERR_CAUSE ; i++) if (intr_cause_data & BIT(i)) { - dev_err_ratelimited(hdev->dev, "ROT%u. 
err cause: %s\n", - rot_index, guadi2_rot_error_cause[i]); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", guadi2_rot_error_cause[i]); error_count++; } @@ -8034,7 +8012,7 @@ static int gaudi2_handle_rot_err(struct hl_device *hdev, u8 rot_index, return error_count; } -static int gaudi2_tpc_ack_interrupts(struct hl_device *hdev, u8 tpc_index, char *interrupt_name, +static int gaudi2_tpc_ack_interrupts(struct hl_device *hdev, u8 tpc_index, u16 event_type, struct hl_eq_razwi_with_intr_cause *razwi_with_intr_cause, u64 *event_mask) { @@ -8044,8 +8022,8 @@ static int gaudi2_tpc_ack_interrupts(struct hl_device *hdev, u8 tpc_index, char for (i = 0 ; i < GAUDI2_NUM_OF_TPC_INTR_CAUSE ; i++) if (intr_cause_data & BIT(i)) { - dev_err_ratelimited(hdev->dev, "TPC%d_%s interrupt cause: %s\n", - tpc_index, interrupt_name, gaudi2_tpc_interrupts_cause[i]); + gaudi2_print_event(hdev, event_type, true, + "interrupt cause: %s", gaudi2_tpc_interrupts_cause[i]); error_count++; } @@ -8056,7 +8034,7 @@ static int gaudi2_tpc_ack_interrupts(struct hl_device *hdev, u8 tpc_index, char return error_count; } -static int gaudi2_handle_dec_err(struct hl_device *hdev, u8 dec_index, const char *interrupt_name, +static int gaudi2_handle_dec_err(struct hl_device *hdev, u8 dec_index, u16 event_type, struct hl_eq_razwi_info *razwi_info, u64 *event_mask) { u32 sts_addr, sts_val, sts_clr_val = 0, error_count = 0; @@ -8076,8 +8054,8 @@ static int gaudi2_handle_dec_err(struct hl_device *hdev, u8 dec_index, const cha for (i = 0 ; i < GAUDI2_NUM_OF_DEC_ERR_CAUSE ; i++) { if (sts_val & BIT(i)) { - dev_err_ratelimited(hdev->dev, "DEC%u_%s err cause: %s\n", - dec_index, interrupt_name, gaudi2_dec_error_cause[i]); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", gaudi2_dec_error_cause[i]); sts_clr_val |= BIT(i); error_count++; } @@ -8093,7 +8071,7 @@ static int gaudi2_handle_dec_err(struct hl_device *hdev, u8 dec_index, const cha return error_count; } -static int gaudi2_handle_mme_err(struct hl_device *hdev, u8 mme_index, const char *interrupt_name, +static int gaudi2_handle_mme_err(struct hl_device *hdev, u8 mme_index, u16 event_type, struct hl_eq_razwi_info *razwi_info, u64 *event_mask) { u32 sts_addr, sts_val, sts_clr_addr, sts_clr_val = 0, error_count = 0; @@ -8106,8 +8084,8 @@ static int gaudi2_handle_mme_err(struct hl_device *hdev, u8 mme_index, const cha for (i = 0 ; i < GAUDI2_NUM_OF_MME_ERR_CAUSE ; i++) { if (sts_val & BIT(i)) { - dev_err_ratelimited(hdev->dev, "MME%u_%s err cause: %s\n", - mme_index, interrupt_name, guadi2_mme_error_cause[i]); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", guadi2_mme_error_cause[i]); sts_clr_val |= BIT(i); error_count++; } @@ -8123,22 +8101,22 @@ static int gaudi2_handle_mme_err(struct hl_device *hdev, u8 mme_index, const cha return error_count; } -static int gaudi2_handle_mme_sbte_err(struct hl_device *hdev, u8 mme_index, u8 sbte_index, +static int gaudi2_handle_mme_sbte_err(struct hl_device *hdev, u16 event_type, u64 intr_cause_data) { int i, error_count = 0; for (i = 0 ; i < GAUDI2_NUM_OF_MME_SBTE_ERR_CAUSE ; i++) if (intr_cause_data & BIT(i)) { - dev_err_ratelimited(hdev->dev, "MME%uSBTE%u_AXI_ERR_RSP err cause: %s\n", - mme_index, sbte_index, guadi2_mme_sbte_error_cause[i]); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", guadi2_mme_sbte_error_cause[i]); error_count++; } return error_count; } -static int gaudi2_handle_mme_wap_err(struct hl_device *hdev, u8 mme_index, +static int gaudi2_handle_mme_wap_err(struct hl_device 
*hdev, u8 mme_index, u16 event_type, struct hl_eq_razwi_info *razwi_info, u64 *event_mask) { u32 sts_addr, sts_val, sts_clr_addr, sts_clr_val = 0, error_count = 0; @@ -8151,9 +8129,8 @@ static int gaudi2_handle_mme_wap_err(struct hl_device *hdev, u8 mme_index, for (i = 0 ; i < GAUDI2_NUM_OF_MME_WAP_ERR_CAUSE ; i++) { if (sts_val & BIT(i)) { - dev_err_ratelimited(hdev->dev, - "MME%u_WAP_SOURCE_RESULT_INVALID err cause: %s\n", - mme_index, guadi2_mme_wap_error_cause[i]); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", guadi2_mme_wap_error_cause[i]); sts_clr_val |= BIT(i); error_count++; } @@ -8170,7 +8147,8 @@ static int gaudi2_handle_mme_wap_err(struct hl_device *hdev, u8 mme_index, return error_count; } -static int gaudi2_handle_kdma_core_event(struct hl_device *hdev, u64 intr_cause_data) +static int gaudi2_handle_kdma_core_event(struct hl_device *hdev, u16 event_type, + u64 intr_cause_data) { u32 error_count = 0; int i; @@ -8182,23 +8160,24 @@ static int gaudi2_handle_kdma_core_event(struct hl_device *hdev, u64 intr_cause_ */ for (i = 0 ; i < GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE ; i++) if (intr_cause_data & BIT(i)) { - dev_err_ratelimited(hdev->dev, "kdma core err cause: %s\n", - gaudi2_kdma_core_interrupts_cause[i]); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", gaudi2_kdma_core_interrupts_cause[i]); error_count++; } return error_count; } -static int gaudi2_handle_dma_core_event(struct hl_device *hdev, u64 intr_cause_data) +static int gaudi2_handle_dma_core_event(struct hl_device *hdev, u16 event_type, + u64 intr_cause_data) { u32 error_count = 0; int i; for (i = 0 ; i < GAUDI2_NUM_OF_DMA_CORE_INTR_CAUSE ; i++) if (intr_cause_data & BIT(i)) { - dev_err_ratelimited(hdev->dev, "dma core err cause: %s\n", - gaudi2_dma_core_interrupts_cause[i]); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", gaudi2_dma_core_interrupts_cause[i]); error_count++; } @@ -8238,8 +8217,8 @@ static void gaudi2_print_pcie_mstr_rr_mstr_if_razwi_info(struct hl_device *hdev, } } -static int gaudi2_print_pcie_addr_dec_info(struct hl_device *hdev, u64 intr_cause_data, - u64 *event_mask) +static int gaudi2_print_pcie_addr_dec_info(struct hl_device *hdev, u16 event_type, + u64 intr_cause_data, u64 *event_mask) { u32 error_count = 0; int i; @@ -8248,8 +8227,8 @@ static int gaudi2_print_pcie_addr_dec_info(struct hl_device *hdev, u64 intr_caus if (!(intr_cause_data & BIT_ULL(i))) continue; - dev_err_ratelimited(hdev->dev, "PCIE ADDR DEC Error: %s\n", - gaudi2_pcie_addr_dec_error_cause[i]); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", gaudi2_pcie_addr_dec_error_cause[i]); error_count++; switch (intr_cause_data & BIT_ULL(i)) { @@ -8264,7 +8243,8 @@ static int gaudi2_print_pcie_addr_dec_info(struct hl_device *hdev, u64 intr_caus return error_count; } -static int gaudi2_handle_pif_fatal(struct hl_device *hdev, u64 intr_cause_data) +static int gaudi2_handle_pif_fatal(struct hl_device *hdev, u16 event_type, + u64 intr_cause_data) { u32 error_count = 0; @@ -8272,8 +8252,8 @@ static int gaudi2_handle_pif_fatal(struct hl_device *hdev, u64 intr_cause_data) for (i = 0 ; i < GAUDI2_NUM_OF_PMMU_FATAL_ERR_CAUSE ; i++) { if (intr_cause_data & BIT_ULL(i)) { - dev_err_ratelimited(hdev->dev, "PMMU PIF err cause: %s\n", - gaudi2_pmmu_fatal_interrupts_cause[i]); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", gaudi2_pmmu_fatal_interrupts_cause[i]); error_count++; } } @@ -8283,16 +8263,13 @@ static int gaudi2_handle_pif_fatal(struct hl_device *hdev, u64 
intr_cause_data) static int gaudi2_handle_hif_fatal(struct hl_device *hdev, u16 event_type, u64 intr_cause_data) { - u32 dcore_id, hif_id, error_count = 0; + u32 error_count = 0; int i; - dcore_id = (event_type - GAUDI2_EVENT_HIF0_FATAL) / 4; - hif_id = (event_type - GAUDI2_EVENT_HIF0_FATAL) % 4; - for (i = 0 ; i < GAUDI2_NUM_OF_HIF_FATAL_ERR_CAUSE ; i++) { if (intr_cause_data & BIT_ULL(i)) { - dev_err_ratelimited(hdev->dev, "DCORE%u_HIF%u: %s\n", dcore_id, hif_id, - gaudi2_hif_fatal_interrupts_cause[i]); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", gaudi2_hif_fatal_interrupts_cause[i]); error_count++; } } @@ -8343,7 +8320,7 @@ static void gaudi2_handle_access_error(struct hl_device *hdev, u64 mmu_base, boo WREG32(mmu_base + MMU_OFFSET(mmDCORE0_HMMU0_MMU_ACCESS_ERROR_CAPTURE), 0); } -static int gaudi2_handle_mmu_spi_sei_generic(struct hl_device *hdev, const char *mmu_name, +static int gaudi2_handle_mmu_spi_sei_generic(struct hl_device *hdev, u16 event_type, u64 mmu_base, bool is_pmmu, u64 *event_mask) { u32 spi_sei_cause, interrupt_clr = 0x0, error_count = 0; @@ -8353,8 +8330,8 @@ static int gaudi2_handle_mmu_spi_sei_generic(struct hl_device *hdev, const char for (i = 0 ; i < GAUDI2_NUM_OF_MMU_SPI_SEI_CAUSE ; i++) { if (spi_sei_cause & BIT(i)) { - dev_err_ratelimited(hdev->dev, "%s SPI_SEI ERR. err cause: %s\n", - mmu_name, gaudi2_mmu_spi_sei[i].cause); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", gaudi2_mmu_spi_sei[i].cause); if (i == 0) gaudi2_handle_page_error(hdev, mmu_base, is_pmmu, event_mask); @@ -8377,7 +8354,7 @@ static int gaudi2_handle_mmu_spi_sei_generic(struct hl_device *hdev, const char return error_count; } -static int gaudi2_handle_sm_err(struct hl_device *hdev, u8 sm_index) +static int gaudi2_handle_sm_err(struct hl_device *hdev, u16 event_type, u8 sm_index) { u32 sei_cause_addr, sei_cause_val, sei_cause_cause, sei_cause_log, cq_intr_addr, cq_intr_val, cq_intr_queue_index, error_count = 0; @@ -8400,11 +8377,11 @@ static int gaudi2_handle_sm_err(struct hl_device *hdev, u8 sm_index) if (!(sei_cause_cause & BIT(i))) continue; - dev_err_ratelimited(hdev->dev, "SM%u SEI ERR. err cause: %s. %s: 0x%X\n", - sm_index, - gaudi2_sm_sei_cause[i].cause_name, - gaudi2_sm_sei_cause[i].log_name, - sei_cause_log & gaudi2_sm_sei_cause[i].log_mask); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s. %s: 0x%X\n", + gaudi2_sm_sei_cause[i].cause_name, + gaudi2_sm_sei_cause[i].log_name, + sei_cause_log & gaudi2_sm_sei_cause[i].log_mask); error_count++; break; } @@ -8433,7 +8410,6 @@ static int gaudi2_handle_sm_err(struct hl_device *hdev, u8 sm_index) static int gaudi2_handle_mmu_spi_sei_err(struct hl_device *hdev, u16 event_type, u64 *event_mask) { bool is_pmmu = false; - char desc[32]; u32 error_count = 0; u64 mmu_base; u8 index; @@ -8442,54 +8418,46 @@ static int gaudi2_handle_mmu_spi_sei_err(struct hl_device *hdev, u16 event_type, case GAUDI2_EVENT_HMMU0_PAGE_FAULT_OR_WR_PERM ... GAUDI2_EVENT_HMMU3_SECURITY_ERROR: index = (event_type - GAUDI2_EVENT_HMMU0_PAGE_FAULT_OR_WR_PERM) / 3; mmu_base = mmDCORE0_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET; - snprintf(desc, ARRAY_SIZE(desc), "DCORE0_HMMU%d", index); break; case GAUDI2_EVENT_HMMU_0_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_3_AXI_ERR_RSP: index = (event_type - GAUDI2_EVENT_HMMU_0_AXI_ERR_RSP); mmu_base = mmDCORE0_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET; - snprintf(desc, ARRAY_SIZE(desc), "DCORE0_HMMU%d", index); break; case GAUDI2_EVENT_HMMU8_PAGE_FAULT_WR_PERM ... 
GAUDI2_EVENT_HMMU11_SECURITY_ERROR: index = (event_type - GAUDI2_EVENT_HMMU8_PAGE_FAULT_WR_PERM) / 3; mmu_base = mmDCORE1_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET; - snprintf(desc, ARRAY_SIZE(desc), "DCORE1_HMMU%d", index); break; case GAUDI2_EVENT_HMMU_8_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_11_AXI_ERR_RSP: index = (event_type - GAUDI2_EVENT_HMMU_8_AXI_ERR_RSP); mmu_base = mmDCORE1_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET; - snprintf(desc, ARRAY_SIZE(desc), "DCORE1_HMMU%d", index); break; case GAUDI2_EVENT_HMMU7_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_HMMU4_SECURITY_ERROR: index = (event_type - GAUDI2_EVENT_HMMU7_PAGE_FAULT_WR_PERM) / 3; mmu_base = mmDCORE2_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET; - snprintf(desc, ARRAY_SIZE(desc), "DCORE2_HMMU%d", index); break; case GAUDI2_EVENT_HMMU_7_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_4_AXI_ERR_RSP: index = (event_type - GAUDI2_EVENT_HMMU_7_AXI_ERR_RSP); mmu_base = mmDCORE2_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET; - snprintf(desc, ARRAY_SIZE(desc), "DCORE2_HMMU%d", index); break; case GAUDI2_EVENT_HMMU15_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_HMMU12_SECURITY_ERROR: index = (event_type - GAUDI2_EVENT_HMMU15_PAGE_FAULT_WR_PERM) / 3; mmu_base = mmDCORE3_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET; - snprintf(desc, ARRAY_SIZE(desc), "DCORE3_HMMU%d", index); break; case GAUDI2_EVENT_HMMU_15_AXI_ERR_RSP ... GAUDI2_EVENT_HMMU_12_AXI_ERR_RSP: index = (event_type - GAUDI2_EVENT_HMMU_15_AXI_ERR_RSP); mmu_base = mmDCORE3_HMMU0_MMU_BASE + index * DCORE_HMMU_OFFSET; - snprintf(desc, ARRAY_SIZE(desc), "DCORE3_HMMU%d", index); break; case GAUDI2_EVENT_PMMU0_PAGE_FAULT_WR_PERM ... GAUDI2_EVENT_PMMU0_SECURITY_ERROR: case GAUDI2_EVENT_PMMU_AXI_ERR_RSP_0: is_pmmu = true; mmu_base = mmPMMU_HBW_MMU_BASE; - snprintf(desc, ARRAY_SIZE(desc), "PMMU"); break; default: return 0; } - error_count = gaudi2_handle_mmu_spi_sei_generic(hdev, desc, mmu_base, is_pmmu, event_mask); + error_count = gaudi2_handle_mmu_spi_sei_generic(hdev, event_type, mmu_base, + is_pmmu, event_mask); return error_count; } @@ -8611,22 +8579,17 @@ static bool gaudi2_handle_hbm_mc_sei_err(struct hl_device *hdev, u16 event_type, cause_idx = sei_data->hdr.sei_cause; if (cause_idx > GAUDI2_NUM_OF_HBM_SEI_CAUSE - 1) { - dev_err_ratelimited(hdev->dev, "Invalid HBM SEI event cause (%d) provided by FW\n", - cause_idx); + gaudi2_print_event(hdev, event_type, true, + "err cause: %s", + "Invalid HBM SEI event cause (%d) provided by FW\n", cause_idx); return true; } - if (sei_data->hdr.is_critical) - dev_err(hdev->dev, - "System Critical Error Interrupt - HBM(%u) MC(%u) MC_CH(%u) MC_PC(%u). Error cause: %s\n", - hbm_id, mc_id, sei_data->hdr.mc_channel, sei_data->hdr.mc_pseudo_channel, - hbm_mc_sei_cause[cause_idx]); - - else - dev_err_ratelimited(hdev->dev, - "System Non-Critical Error Interrupt - HBM(%u) MC(%u) MC_CH(%u) MC_PC(%u). Error cause: %s\n", - hbm_id, mc_id, sei_data->hdr.mc_channel, sei_data->hdr.mc_pseudo_channel, - hbm_mc_sei_cause[cause_idx]); + gaudi2_print_event(hdev, event_type, !sei_data->hdr.is_critical, + "System %s Error Interrupt - HBM(%u) MC(%u) MC_CH(%u) MC_PC(%u). Error cause: %s\n", + sei_data->hdr.is_critical ? 
"Critical" : "Non-critical", + hbm_id, mc_id, sei_data->hdr.mc_channel, sei_data->hdr.mc_pseudo_channel, + hbm_mc_sei_cause[cause_idx]); /* Print error-specific info */ switch (cause_idx) { @@ -8670,12 +8633,12 @@ static bool gaudi2_handle_hbm_mc_sei_err(struct hl_device *hdev, u16 event_type, return require_hard_reset; } -static int gaudi2_handle_hbm_cattrip(struct hl_device *hdev, u64 intr_cause_data) +static int gaudi2_handle_hbm_cattrip(struct hl_device *hdev, u16 event_type, + u64 intr_cause_data) { if (intr_cause_data) { - dev_err(hdev->dev, - "HBM catastrophic temperature error (CATTRIP) cause %#llx\n", - intr_cause_data); + gaudi2_print_event(hdev, event_type, true, + "temperature error cause: %#llx", intr_cause_data); return 1; } @@ -8708,13 +8671,13 @@ static void gaudi2_print_clk_change_info(struct hl_device *hdev, u16 event_type, hdev->clk_throttling.aggregated_reason |= HL_CLK_THROTTLE_POWER; hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].start = ktime_get(); hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = zero_time; - dev_info_ratelimited(hdev->dev, "Clock throttling due to power consumption\n"); + dev_dbg_ratelimited(hdev->dev, "Clock throttling due to power consumption\n"); break; case GAUDI2_EVENT_CPU_FIX_POWER_ENV_E: hdev->clk_throttling.current_reason &= ~HL_CLK_THROTTLE_POWER; hdev->clk_throttling.timestamp[HL_CLK_THROTTLE_TYPE_POWER].end = ktime_get(); - dev_info_ratelimited(hdev->dev, "Power envelop is safe, back to optimal clock\n"); + dev_dbg_ratelimited(hdev->dev, "Power envelop is safe, back to optimal clock\n"); break; case GAUDI2_EVENT_CPU_FIX_THERMAL_ENV_S: @@ -8741,16 +8704,18 @@ static void gaudi2_print_clk_change_info(struct hl_device *hdev, u16 event_type, mutex_unlock(&hdev->clk_throttling.lock); } -static void gaudi2_print_out_of_sync_info(struct hl_device *hdev, +static void gaudi2_print_out_of_sync_info(struct hl_device *hdev, u16 event_type, struct cpucp_pkt_sync_err *sync_err) { struct hl_hw_queue *q = &hdev->kernel_queues[GAUDI2_QUEUE_ID_CPU_PQ]; - dev_err(hdev->dev, "Out of sync with FW, FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n", - le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci)); + gaudi2_print_event(hdev, event_type, false, + "FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n", + le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), + q->pi, atomic_read(&q->ci)); } -static int gaudi2_handle_pcie_p2p_msix(struct hl_device *hdev) +static int gaudi2_handle_pcie_p2p_msix(struct hl_device *hdev, u16 event_type) { u32 p2p_intr, msix_gw_intr, error_count = 0; @@ -8758,7 +8723,7 @@ static int gaudi2_handle_pcie_p2p_msix(struct hl_device *hdev) msix_gw_intr = RREG32(mmPCIE_WRAP_MSIX_GW_INTR); if (p2p_intr) { - dev_err_ratelimited(hdev->dev, + gaudi2_print_event(hdev, event_type, true, "pcie p2p transaction terminated due to security, req_id(0x%x)\n", RREG32(mmPCIE_WRAP_P2P_REQ_ID)); @@ -8767,7 +8732,7 @@ static int gaudi2_handle_pcie_p2p_msix(struct hl_device *hdev) } if (msix_gw_intr) { - dev_err_ratelimited(hdev->dev, + gaudi2_print_event(hdev, event_type, true, "pcie msi-x gen denied due to vector num check failure, vec(0x%X)\n", RREG32(mmPCIE_WRAP_MSIX_GW_VEC)); @@ -8822,17 +8787,18 @@ static int gaudi2_handle_psoc_drain(struct hl_device *hdev, u64 intr_cause_data) return error_count; } -static void gaudi2_print_cpu_pkt_failure_info(struct hl_device *hdev, +static void gaudi2_print_cpu_pkt_failure_info(struct hl_device *hdev, u16 event_type, struct cpucp_pkt_sync_err *sync_err) { struct hl_hw_queue *q = 
&hdev->kernel_queues[GAUDI2_QUEUE_ID_CPU_PQ]; - dev_warn(hdev->dev, + gaudi2_print_event(hdev, event_type, false, "FW reported sanity check failure, FW: pi=%u, ci=%u, LKD: pi=%u, ci=%d\n", le32_to_cpu(sync_err->pi), le32_to_cpu(sync_err->ci), q->pi, atomic_read(&q->ci)); } -static int hl_arc_event_handle(struct hl_device *hdev, struct hl_eq_engine_arc_intr_data *data) +static int hl_arc_event_handle(struct hl_device *hdev, u16 event_type, + struct hl_eq_engine_arc_intr_data *data) { struct hl_engine_arc_dccm_queue_full_irq *q; u32 intr_type, engine_id; @@ -8846,12 +8812,12 @@ static int hl_arc_event_handle(struct hl_device *hdev, struct hl_eq_engine_arc_i case ENGINE_ARC_DCCM_QUEUE_FULL_IRQ: q = (struct hl_engine_arc_dccm_queue_full_irq *) &payload; - dev_err_ratelimited(hdev->dev, + gaudi2_print_event(hdev, event_type, true, "ARC DCCM Full event: EngId: %u, Intr_type: %u, Qidx: %u\n", engine_id, intr_type, q->queue_index); return 1; default: - dev_err_ratelimited(hdev->dev, "Unknown ARC event type\n"); + gaudi2_print_event(hdev, event_type, true, "Unknown ARC event type\n"); return 0; } } @@ -8860,8 +8826,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent { struct gaudi2_device *gaudi2 = hdev->asic_specific; bool reset_required = false, is_critical = false; - u32 ctl, reset_flags = HL_DRV_RESET_HARD, error_count = 0; - int index, sbte_index; + u32 index, ctl, reset_flags = HL_DRV_RESET_HARD, error_count = 0; u64 event_mask = 0; u16 event_type; @@ -8877,8 +8842,6 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent gaudi2->events_stat[event_type]++; gaudi2->events_stat_aggregate[event_type]++; - gaudi2_print_irq_info(hdev, event_type); - switch (event_type) { case GAUDI2_EVENT_PCIE_CORE_SERR ... GAUDI2_EVENT_ARC0_ECC_DERR: fallthrough; @@ -8901,12 +8864,12 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent case GAUDI2_EVENT_ARC_AXI_ERROR_RESPONSE_0: reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; - error_count = gaudi2_handle_arc_farm_sei_err(hdev); + error_count = gaudi2_handle_arc_farm_sei_err(hdev, event_type); event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_CPU_AXI_ERR_RSP: - error_count = gaudi2_handle_cpu_sei_err(hdev); + error_count = gaudi2_handle_cpu_sei_err(hdev, event_type); event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; @@ -8921,7 +8884,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent case GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE: case GAUDI2_EVENT_ROTATOR1_AXI_ERROR_RESPONSE: index = event_type - GAUDI2_EVENT_ROTATOR0_AXI_ERROR_RESPONSE; - error_count = gaudi2_handle_rot_err(hdev, index, + error_count = gaudi2_handle_rot_err(hdev, index, event_type, &eq_entry->razwi_with_intr_cause, &event_mask); error_count += gaudi2_handle_qm_sei_err(hdev, event_type, NULL, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; @@ -8929,7 +8892,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent case GAUDI2_EVENT_TPC0_AXI_ERR_RSP ... 
GAUDI2_EVENT_TPC24_AXI_ERR_RSP: index = event_type - GAUDI2_EVENT_TPC0_AXI_ERR_RSP; - error_count = gaudi2_tpc_ack_interrupts(hdev, index, "AXI_ERR_RSP", + error_count = gaudi2_tpc_ack_interrupts(hdev, index, event_type, &eq_entry->razwi_with_intr_cause, &event_mask); error_count += gaudi2_handle_qm_sei_err(hdev, event_type, NULL, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; @@ -8937,8 +8900,8 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent case GAUDI2_EVENT_DEC0_AXI_ERR_RSPONSE ... GAUDI2_EVENT_DEC9_AXI_ERR_RSPONSE: index = event_type - GAUDI2_EVENT_DEC0_AXI_ERR_RSPONSE; - error_count = gaudi2_handle_dec_err(hdev, index, "AXI_ERR_RESPONSE", - &eq_entry->razwi_info, &event_mask); + error_count = gaudi2_handle_dec_err(hdev, index, event_type, + &eq_entry->razwi_info, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; @@ -8969,8 +8932,8 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent case GAUDI2_EVENT_TPC24_KERNEL_ERR: index = (event_type - GAUDI2_EVENT_TPC0_KERNEL_ERR) / (GAUDI2_EVENT_TPC1_KERNEL_ERR - GAUDI2_EVENT_TPC0_KERNEL_ERR); - error_count = gaudi2_tpc_ack_interrupts(hdev, index, "KRN_ERR", - &eq_entry->razwi_with_intr_cause, &event_mask); + error_count = gaudi2_tpc_ack_interrupts(hdev, index, event_type, + &eq_entry->razwi_with_intr_cause, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; @@ -8986,7 +8949,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent case GAUDI2_EVENT_DEC9_SPI: index = (event_type - GAUDI2_EVENT_DEC0_SPI) / (GAUDI2_EVENT_DEC1_SPI - GAUDI2_EVENT_DEC0_SPI); - error_count = gaudi2_handle_dec_err(hdev, index, "SPI", + error_count = gaudi2_handle_dec_err(hdev, index, event_type, &eq_entry->razwi_info, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; @@ -8998,8 +8961,8 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent index = (event_type - GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE) / (GAUDI2_EVENT_MME1_CTRL_AXI_ERROR_RESPONSE - GAUDI2_EVENT_MME0_CTRL_AXI_ERROR_RESPONSE); - error_count = gaudi2_handle_mme_err(hdev, index, - "CTRL_AXI_ERROR_RESPONSE", &eq_entry->razwi_info, &event_mask); + error_count = gaudi2_handle_mme_err(hdev, index, event_type, + &eq_entry->razwi_info, &event_mask); error_count += gaudi2_handle_qm_sei_err(hdev, event_type, NULL, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; @@ -9011,7 +8974,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent index = (event_type - GAUDI2_EVENT_MME0_QMAN_SW_ERROR) / (GAUDI2_EVENT_MME1_QMAN_SW_ERROR - GAUDI2_EVENT_MME0_QMAN_SW_ERROR); - error_count = gaudi2_handle_mme_err(hdev, index, "QMAN_SW_ERROR", + error_count = gaudi2_handle_mme_err(hdev, index, event_type, &eq_entry->razwi_info, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; @@ -9023,26 +8986,26 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent index = (event_type - GAUDI2_EVENT_MME0_WAP_SOURCE_RESULT_INVALID) / (GAUDI2_EVENT_MME1_WAP_SOURCE_RESULT_INVALID - GAUDI2_EVENT_MME0_WAP_SOURCE_RESULT_INVALID); - error_count = gaudi2_handle_mme_wap_err(hdev, index, + error_count = gaudi2_handle_mme_wap_err(hdev, index, event_type, &eq_entry->razwi_info, &event_mask); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_KDMA_CH0_AXI_ERR_RSP: case GAUDI2_EVENT_KDMA0_CORE: - error_count = 
gaudi2_handle_kdma_core_event(hdev, + error_count = gaudi2_handle_kdma_core_event(hdev, event_type, le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_HDMA2_CORE ... GAUDI2_EVENT_PDMA1_CORE: - error_count = gaudi2_handle_dma_core_event(hdev, + error_count = gaudi2_handle_dma_core_event(hdev, event_type, le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_PCIE_ADDR_DEC_ERR: - error_count = gaudi2_print_pcie_addr_dec_info(hdev, + error_count = gaudi2_print_pcie_addr_dec_info(hdev, event_type, le64_to_cpu(eq_entry->intr_cause.intr_cause_data), &event_mask); reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; @@ -9065,7 +9028,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent break; case GAUDI2_EVENT_PMMU_FATAL_0: - error_count = gaudi2_handle_pif_fatal(hdev, + error_count = gaudi2_handle_pif_fatal(hdev, event_type, le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); reset_flags |= HL_DRV_RESET_FW_FATAL_ERR; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; @@ -9086,7 +9049,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent break; case GAUDI2_EVENT_HBM_CATTRIP_0 ... GAUDI2_EVENT_HBM_CATTRIP_5: - error_count = gaudi2_handle_hbm_cattrip(hdev, + error_count = gaudi2_handle_hbm_cattrip(hdev, event_type, le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; @@ -9122,14 +9085,8 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent case GAUDI2_EVENT_MME1_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME1_SBTE4_AXI_ERR_RSP: case GAUDI2_EVENT_MME2_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME2_SBTE4_AXI_ERR_RSP: case GAUDI2_EVENT_MME3_SBTE0_AXI_ERR_RSP ... GAUDI2_EVENT_MME3_SBTE4_AXI_ERR_RSP: - index = (event_type - GAUDI2_EVENT_MME0_SBTE0_AXI_ERR_RSP) / - (GAUDI2_EVENT_MME1_SBTE0_AXI_ERR_RSP - - GAUDI2_EVENT_MME0_SBTE0_AXI_ERR_RSP); - sbte_index = (event_type - GAUDI2_EVENT_MME0_SBTE0_AXI_ERR_RSP) % - (GAUDI2_EVENT_MME1_SBTE0_AXI_ERR_RSP - - GAUDI2_EVENT_MME0_SBTE0_AXI_ERR_RSP); - error_count = gaudi2_handle_mme_sbte_err(hdev, index, sbte_index, - le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); + error_count = gaudi2_handle_mme_sbte_err(hdev, event_type, + le64_to_cpu(eq_entry->intr_cause.intr_cause_data)); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_VM0_ALARM_A ... GAUDI2_EVENT_VM3_ALARM_B: @@ -9217,7 +9174,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent break; case GAUDI2_EVENT_CPU_PKT_QUEUE_OUT_SYNC: - gaudi2_print_out_of_sync_info(hdev, &eq_entry->pkt_sync_err); + gaudi2_print_out_of_sync_info(hdev, event_type, &eq_entry->pkt_sync_err); error_count = GAUDI2_NA_EVENT_CAUSE; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; @@ -9229,13 +9186,13 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent break; case GAUDI2_EVENT_PCIE_P2P_MSIX: - error_count = gaudi2_handle_pcie_p2p_msix(hdev); + error_count = gaudi2_handle_pcie_p2p_msix(hdev, event_type); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; case GAUDI2_EVENT_SM0_AXI_ERROR_RESPONSE ... 
GAUDI2_EVENT_SM3_AXI_ERROR_RESPONSE: index = event_type - GAUDI2_EVENT_SM0_AXI_ERROR_RESPONSE; - error_count = gaudi2_handle_sm_err(hdev, index); + error_count = gaudi2_handle_sm_err(hdev, event_type, index); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; @@ -9258,13 +9215,13 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent break; case GAUDI2_EVENT_CPU_PKT_SANITY_FAILED: - gaudi2_print_cpu_pkt_failure_info(hdev, &eq_entry->pkt_sync_err); + gaudi2_print_cpu_pkt_failure_info(hdev, event_type, &eq_entry->pkt_sync_err); error_count = GAUDI2_NA_EVENT_CAUSE; event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; break; case GAUDI2_EVENT_ARC_DCCM_FULL: - error_count = hl_arc_event_handle(hdev, &eq_entry->arc_data); + error_count = hl_arc_event_handle(hdev, event_type, &eq_entry->arc_data); event_mask |= HL_NOTIFIER_EVENT_USER_ENGINE_ERR; break; @@ -9286,7 +9243,9 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent * Note that although we have counted the errors, we use this number as * a boolean. */ - if (error_count == 0 && !is_info_event(event_type)) + if (error_count == GAUDI2_NA_EVENT_CAUSE && !is_info_event(event_type)) + gaudi2_print_event(hdev, event_type, true, "%d", event_type); + else if (error_count == 0) dev_err_ratelimited(hdev->dev, "No Error cause for H/W event %d\n", event_type);

From patchwork Thu Dec 8 15:13:37 2022
X-Patchwork-Submitter: Oded Gabbay
X-Patchwork-Id: 31402
From: Oded Gabbay
To: linux-kernel@vger.kernel.org
Cc: Tamir Gilad-Raz
Subject: [PATCH 13/26] habanalabs: adjacent timestamps should be more accurate
Date: Thu, 8 Dec 2022 17:13:37 +0200
Message-Id: <20221208151350.1833823-13-ogabbay@kernel.org>
In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org>
References: <20221208151350.1833823-1-ogabbay@kernel.org>

From: Tamir Gilad-Raz

timestamp events that expire on the same interrupt will get the same timestamp value

Signed-off-by: Tamir Gilad-Raz Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/common/irq.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/misc/habanalabs/common/irq.c b/drivers/misc/habanalabs/common/irq.c index 94d537fd4fde..8bbcc223df91 100644 --- a/drivers/misc/habanalabs/common/irq.c +++
b/drivers/misc/habanalabs/common/irq.c @@ -228,7 +228,7 @@ static void hl_ts_free_objects(struct work_struct *work) * list to a dedicated workqueue to do the actual put. */ static int handle_registration_node(struct hl_device *hdev, struct hl_user_pending_interrupt *pend, - struct list_head **free_list) + struct list_head **free_list, ktime_t now) { struct timestamp_reg_free_node *free_node; u64 timestamp; @@ -246,7 +246,7 @@ static int handle_registration_node(struct hl_device *hdev, struct hl_user_pendi if (!free_node) return -ENOMEM; - timestamp = ktime_get_ns(); + timestamp = ktime_to_ns(now); *pend->ts_reg_info.timestamp_kernel_addr = timestamp; @@ -298,7 +298,7 @@ static void handle_user_interrupt(struct hl_device *hdev, struct hl_user_interru if (pend->ts_reg_info.buf) { if (!reg_node_handle_fail) { rc = handle_registration_node(hdev, pend, - &ts_reg_free_list_head); + &ts_reg_free_list_head, now); if (rc) reg_node_handle_fail = true; } From patchwork Thu Dec 8 15:13:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 31403 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp254208wrr; Thu, 8 Dec 2022 07:17:12 -0800 (PST) X-Google-Smtp-Source: AA0mqf6ADMnuM1tDPezfeiiMGvI3ZolerKMBGu23IkE9dE9MVNxJd+BC9u7lfcZslmphyBb3C85i X-Received: by 2002:a17:906:381:b0:78c:b8b0:9d35 with SMTP id b1-20020a170906038100b0078cb8b09d35mr72423236eja.586.1670512632729; Thu, 08 Dec 2022 07:17:12 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670512632; cv=none; d=google.com; s=arc-20160816; b=gQyAHagORTxR+nl+XTJaKnX0fxVochqrLG16PJ9i8UxS9HhW6+yXVDd2OJbc/EoWcQ WbIvShQ9UP0EbbyrKbBjSTkakm30+kVioPoOilg5hZIB0WiExqyBigk8ENYjn8W7WsAJ 3DctsuS1zABe7b89uQn+40jlf2tBw5L9IPdh5amw8clsne8v0FGYdvNxQkxzeHye7pgX /pYrMh1GVefzZzvSvKQkjMJ6xPOt1gz9ufS2h20frkd/rh4/14uTJU3hwZ9Yk+GAq4Gm 84NA4x7pJTFSs0VoQ8QW3BPO3zJDSMew3dWqH5Rjxswd4mT4lrrFI5Z7aMBpflL9hncw MHzg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=rLZXmMxNNVz89ammjolPbJ7lnM8d0mjUwDl7VaSJmzY=; b=DmNTuBiD6oDUQx75rc5VlLvEMf7GsUYHGt1VPe/xLBjOg7sA3zEOSEyIkKAWs+seSe bMLaF2fbtvCEK5Ip1T7VfHb2bQNEteqnVkJmqBY8IWJtjn7twoAj88h7cBfpbPvbDo2G 2iYN+kydQfWKoYyhP0KBYTQa16S4q9rrq/e57XvCUNDngzW55fVcSN92YEkjmFnM9r8U flwehJfCCZSmyWIXtKbLdh5u/2JUsixaU6+ryhEhrdXFheueQruFMJZ707jtwse6+EC+ Aur2YhIbuuDA+jfacdEHtQxe+MOd+5G4xSanKiUiYbmjKs5UT1AvymKyBt7Z9VZy1yFr TwLw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=sVG+VSq9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id hs3-20020a1709073e8300b007c09f4b0947si20023853ejc.1004.2022.12.08.07.16.48; Thu, 08 Dec 2022 07:17:12 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=sVG+VSq9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230258AbiLHPPF (ORCPT + 99 others); Thu, 8 Dec 2022 10:15:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58526 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230160AbiLHPOY (ORCPT ); Thu, 8 Dec 2022 10:14:24 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 777A8A13DB for ; Thu, 8 Dec 2022 07:14:16 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 2A06EB823DC for ; Thu, 8 Dec 2022 15:14:15 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 107FBC433B5; Thu, 8 Dec 2022 15:14:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670512453; bh=Axm7JkrI0AwBWI6qFRE8GtU+YHzlmrVmu4Yn38yTB8k=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=sVG+VSq9MaUHgiL3Q1EsrgdsNxt/qpyf4MRjkMPFKGxFU7I+/oIpUJjB9ByTOPsUG +cjBX+TdC28FhRxd5/crjyPbinPRuS472/GD3BNVB0Pf69CfEwjETSYrvqdv9WZG1W n8ItVQDBW6KPbldP+Cl9MxFkWENwfa4y1s9fXQ4ohV0E3rIO2dlFB7iyZz2jC+qucd ZJRSNXAWzXGC4hnHpYz0XoiyjenK/US+Xoks1hAqak1QDmbJdJLsAIl3jx9Q9Kq2MI yN8Ps7tjLHxX/hhwAuYSeXxUrEdcOHAxyaGauzKqcW45AeHzrAcl5wB/i/M84ZdkBT 3w1v0hHHLRGew== From: Oded Gabbay To: linux-kernel@vger.kernel.org Cc: Tomer Tayar Subject: [PATCH 14/26] habanalabs: skip device idle check in hpriv_release if in reset Date: Thu, 8 Dec 2022 17:13:38 +0200 Message-Id: <20221208151350.1833823-14-ogabbay@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org> References: <20221208151350.1833823-1-ogabbay@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751659454374994736?= X-GMAIL-MSGID: =?utf-8?q?1751659454374994736?= From: Tomer Tayar When user context is released and hpriv_release() is called, there is a device idle status check, to understand if user has left the device not idle and then a reset is required. However, if the user process is killed because of device hard reset, the device at this point would always be not idle, because the device engines were already forcefully halted. Modify hpriv_release() to skip the idle check if reset is in progress. 
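
To make the resulting decision flow easier to follow, here is a condensed sketch of the release-time logic; the names are taken from the diff below, and this is an illustration rather than a literal excerpt:

	/* Illustrative sketch of the hpriv_release() decision after this change. */
	reset_device = hdev->reset_upon_device_release || hdev->reset_info.watchdog_active;

	/* Poll the engines only when no reset is pending or already in progress. */
	if (!hdev->reset_info.in_reset && !reset_device && hdev->pdev && !hdev->pldm)
		device_is_idle = hdev->asic_funcs->is_device_idle(hdev, idle_mask,
					HL_BUSY_ENGINES_MASK_EXT_SIZE, NULL);

	/* The existing handling of a non-idle device below this check is unchanged. */
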
Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/common/device.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c index afd9d4d46574..71f958a2e91b 100644 --- a/drivers/misc/habanalabs/common/device.c +++ b/drivers/misc/habanalabs/common/device.c @@ -428,8 +428,10 @@ static void hpriv_release(struct kref *ref) */ reset_device = hdev->reset_upon_device_release || hdev->reset_info.watchdog_active; - /* Unless device is reset in any case, check idle status and reset if device is not idle */ - if (!reset_device && hdev->pdev && !hdev->pldm) + /* Check the device idle status and reset if not idle. + * Skip it if already in reset, or if device is going to be reset in any case. + */ + if (!hdev->reset_info.in_reset && !reset_device && hdev->pdev && !hdev->pldm) device_is_idle = hdev->asic_funcs->is_device_idle(hdev, idle_mask, HL_BUSY_ENGINES_MASK_EXT_SIZE, NULL); if (!device_is_idle) { From patchwork Thu Dec 8 15:13:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 31404 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp254495wrr; Thu, 8 Dec 2022 07:17:43 -0800 (PST) X-Google-Smtp-Source: AA0mqf5oMFNSSJ713I23FEHaX1vilElPpMm4kN7yOjzR1KnrBUZLoAZCyqx/r72+tMP3g1mH3Rs6 X-Received: by 2002:aa7:c0d0:0:b0:46d:8e85:170e with SMTP id j16-20020aa7c0d0000000b0046d8e85170emr2317695edp.422.1670512663429; Thu, 08 Dec 2022 07:17:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670512663; cv=none; d=google.com; s=arc-20160816; b=CXH7eDlTzgJy0KXuWx2Qh668DmszlIH4hXxuA3PZ7QEyk6q5b8G4tujc+rjYSgKDvs uyuLqXlhUMl/8GxvnYTu/4w80MI+MO6gPpddsEQzm05zkOFSI2oNdCDtqNgXQTGUk71F DTDcX6ygNaJKmgy44PPuV0FvOA0IJ30khd/b3WZqg4xeVeFnfZVURD0+83gZhGiPJ7rM 6iTTx5uWEEac1rjpuZ2Lg2IIlRy2xtkkWs73c1ifFHKgOdXgPYvgOxgmkHVs89SK3k3R aC+imj3/h5bFCplWjvFzgDtNxoRbBUWH7p7MuLOlZxiQxk8rrWU+th1gY0ckZASCrAI1 UJaA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=nZAqiaWRFbhQLkujHZHkxVHbJEgvqkxcyLhYadJmHpg=; b=JzXyyUScvG8Nu/wlEVBsDPkXgPZzfQjAl/BWAtB40Txb0lWYKHvpl/8ArJWiRkkEQH 8qfJuc5JCyNeB30Nx04p5IKXBExT9JtnaU7PTtNliHnxAyHRhRtHFSESjYVCYVYjT/HT 5Ueh1XHMoFAJPWxgdwwvYKEkqwDHIvbX5Koflvxlwl+Vldq7gsSovRFVKdfS/gXW8n7d q7/K8sPNWyoMSa+hm1CPnvD6aYf9yVVYfdNWaGZvai4FpbNpffhkx+4i6NsC3XfuPv6U koqO7jxujPSDe5ixhETfBnk2uKxpcKOwoROdOW2XB5BEP7a1tvb90P+wNTajLmPZT9di njSA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=igZeISUT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id l13-20020a056402254d00b00458d94f1a45si7605128edb.413.2022.12.08.07.17.19; Thu, 08 Dec 2022 07:17:43 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=igZeISUT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229847AbiLHPPN (ORCPT + 99 others); Thu, 8 Dec 2022 10:15:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58484 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229932AbiLHPOY (ORCPT ); Thu, 8 Dec 2022 10:14:24 -0500 Received: from sin.source.kernel.org (sin.source.kernel.org [145.40.73.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A86AAABA15 for ; Thu, 8 Dec 2022 07:14:18 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) with ESMTPS id 02351CE24C0 for ; Thu, 8 Dec 2022 15:14:17 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5EB12C433C1; Thu, 8 Dec 2022 15:14:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670512455; bh=kqnJh32428fEtQQnluf14oKM86g0ErJo5Wv73mAYI7I=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=igZeISUToq2SvC+PazVTJLrjcGauZWwZlAuV0owz8QbRIiO1/tssi7+zKg2qApZ40 lIAvImA8IkDniGVJVwFjHT3sKqw8f3Hyb6RPgmTJWWctCdEtxvNHJPH8itlm3+57f0 yAoMwgEow7fjb5vcp6pQ/kR/RXLQwJmyMcP0Dp4aNUfRilitsQDz1rntwyIOBLxbyN f+3fL2G8Tmsx0Hr2YGRmprpoHBaByDlbtvsKxtXj+axefhUWbDt95w3Xr4TV1ss7bo I4KWdQjqPTbZ3jaujyaI2N2xOjDdLJYAtP7epqRMohPUuHRoB2FSMfkCGxJlvBFUaN c6zR+RUhhOUOg== From: Oded Gabbay To: linux-kernel@vger.kernel.org Cc: Ofir Bitton Subject: [PATCH 15/26] habanalabs/gaudi2: support abrupt device reset event Date: Thu, 8 Dec 2022 17:13:39 +0200 Message-Id: <20221208151350.1833823-15-ogabbay@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org> References: <20221208151350.1833823-1-ogabbay@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751659486266546827?= X-GMAIL-MSGID: =?utf-8?q?1751659486266546827?= From: Ofir Bitton In certain scenarios, firmware might encounter a fatal event for which a device reset is required. Hence, a proper notification is needed for driver to be aware and initiate a reset sequence. In secured environments the reset will be performed by firmware without an explicit request from the driver. 
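
In the terms of gaudi2_handle_eqe(), the new notification is treated like other fatal firmware events; the following condensed sketch mirrors the diff below, with explanatory comments added:

	case GAUDI2_EVENT_CPU_FP32_NOT_SUPPORTED:
	case GAUDI2_EVENT_DEV_RESET_REQ:
		/* Fatal firmware condition: report a general H/W error to
		 * user space and mark the event as critical. There is no
		 * per-cause error counting for this event.
		 */
		event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
		error_count = GAUDI2_NA_EVENT_CAUSE;
		is_critical = true;
		break;
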
Signed-off-by: Ofir Bitton Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/gaudi2/gaudi2.c | 1 + drivers/misc/habanalabs/include/gaudi2/gaudi2_async_events.h | 1 + .../habanalabs/include/gaudi2/gaudi2_async_ids_map_extended.h | 2 ++ 3 files changed, 4 insertions(+) diff --git a/drivers/misc/habanalabs/gaudi2/gaudi2.c b/drivers/misc/habanalabs/gaudi2/gaudi2.c index 8373239ad1bc..ba3b0ae76ebf 100644 --- a/drivers/misc/habanalabs/gaudi2/gaudi2.c +++ b/drivers/misc/habanalabs/gaudi2/gaudi2.c @@ -9226,6 +9226,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent break; case GAUDI2_EVENT_CPU_FP32_NOT_SUPPORTED: + case GAUDI2_EVENT_DEV_RESET_REQ: event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR; error_count = GAUDI2_NA_EVENT_CAUSE; is_critical = true; diff --git a/drivers/misc/habanalabs/include/gaudi2/gaudi2_async_events.h b/drivers/misc/habanalabs/include/gaudi2/gaudi2_async_events.h index 305b576222e6..50852cc80373 100644 --- a/drivers/misc/habanalabs/include/gaudi2/gaudi2_async_events.h +++ b/drivers/misc/habanalabs/include/gaudi2/gaudi2_async_events.h @@ -958,6 +958,7 @@ enum gaudi2_async_event_id { GAUDI2_EVENT_CPU11_STATUS_NIC11_ENG1 = 1318, GAUDI2_EVENT_ARC_DCCM_FULL = 1319, GAUDI2_EVENT_CPU_FP32_NOT_SUPPORTED = 1320, + GAUDI2_EVENT_DEV_RESET_REQ = 1321, GAUDI2_EVENT_SIZE, }; diff --git a/drivers/misc/habanalabs/include/gaudi2/gaudi2_async_ids_map_extended.h b/drivers/misc/habanalabs/include/gaudi2/gaudi2_async_ids_map_extended.h index d510cb10c883..82be01bea98e 100644 --- a/drivers/misc/habanalabs/include/gaudi2/gaudi2_async_ids_map_extended.h +++ b/drivers/misc/habanalabs/include/gaudi2/gaudi2_async_ids_map_extended.h @@ -2665,6 +2665,8 @@ static struct gaudi2_async_events_ids_map gaudi2_irq_map_table[] = { .msg = 1, .reset = 0, .name = "ARC_DCCM_FULL" }, { .fc_id = 1320, .cpu_id = 626, .valid = 1, .msg = 1, .reset = 1, .name = "FP32_NOT_SUPPORTED" }, + { .fc_id = 1321, .cpu_id = 627, .valid = 1, + .msg = 1, .reset = 1, .name = "DEV_RESET_REQ" }, }; #endif /* __GAUDI2_ASYNC_IDS_MAP_EVENTS_EXT_H_ */ From patchwork Thu Dec 8 15:13:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 31406 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp254662wrr; Thu, 8 Dec 2022 07:18:01 -0800 (PST) X-Google-Smtp-Source: AA0mqf5hTF52KgQvRvkGn43FawsQEM//UCiShdeq09K00Fp5axFoYAUvsaHs7ZIQRSFYPdrpBKCo X-Received: by 2002:a17:902:f293:b0:189:9313:9e55 with SMTP id k19-20020a170902f29300b0018993139e55mr48013413plc.76.1670512681389; Thu, 08 Dec 2022 07:18:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670512681; cv=none; d=google.com; s=arc-20160816; b=QqOj3xmeJaHvDWn7qs3wSkYmykXlLA5HXQJDhUM6Ltr7RHXSHFRokUcWHorw+nLGMt hSROULcn+W+NmC45CjpCSetr8fLcpyWhfeQwfOXYs5IwZ5pqLHYxhQI0cQSQ2LmfUdFY S1ZwmuFPNpVR+fBHOXqr3hcqkkF9hEU0fmnvVkp5zR7jj2xPybmOL+n1P6WHzOk8sOgs W3txxPXmdLn4DC600qhQZDnH83cZQYAmxIPPWSQdCCBIItpK4iNB5bk2w+li/fmHSemj yThOjGPhuvJX2L2EhcFAxu6qJwNVnWO0IEeVlBb7km03zXVBmP9grjk+5lk/0MtBRLzs Rhqw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=nlrPJo+nIENAuEX+yR8fyPSPfK0pykGYntBxpuslct4=; b=HMXEl4CUN8ja3VOzBVcp2dLBqHECEvxA2vgOMpgc+2ckqOeauEkSsSqLO60nTlkGJg IrAW8STJjRGo8RzEaYU6+sk5exbWQB6E3omnZQwlK9biRiDU7UgUgj4UTUJWjCpuR7V+ 
xUt7RYOHDRyIbZRPrk99uGtho8LGY7/BzksShVyT3e5akCeyoxH889P6flavXzV/X+qH nLX/4ve1OO4sFopKT25w8huWGOrApWorm4qJrkdA21Mu+UPhoFoB4N+jJyzkZEz2KSW7 Z6VcIBgF6SMeI4vIdFgxj8/gpupRiWmiDKlDawOyNk1/k3rkb0Dhy4NTGNDNUjJqMUt+ E76g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=eEPzu9Nf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j36-20020a635964000000b004786230ec58si23652237pgm.169.2022.12.08.07.17.44; Thu, 08 Dec 2022 07:18:01 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=eEPzu9Nf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230269AbiLHPPK (ORCPT + 99 others); Thu, 8 Dec 2022 10:15:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58658 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230077AbiLHPOY (ORCPT ); Thu, 8 Dec 2022 10:14:24 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5F044AD300 for ; Thu, 8 Dec 2022 07:14:18 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id CED35B82438 for ; Thu, 8 Dec 2022 15:14:16 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id AD1D8C433D7; Thu, 8 Dec 2022 15:14:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670512456; bh=cAKcQcV7iFao3mOnMlI+eG//rxDwyNKzE9TaluP3uTw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=eEPzu9Nf19B4upbsDev+PrO9XTZElweiNQgTmQci4ZUiOZCsBUv2NEVUXk62bDvxv Fxe9Dxo4ISyT9HrbcqXddJfFTPSDTipvM/OkCZlW29l+pzQ7WbV2cKZEXIg6aH9e5M R6atgI6G33paYLZUERyXmpTHgRgOBtX5p8u8ry7lBZf5pQ0EWdHMUo4+ag3KX+0YQF +ZuBKjqtP9a0k7x77CdD2Wt5FyJmd18Z6E15gVTYspDy4X6Ks8i9BRTjmxneAJJa6L wSZ8K3kAt8zd+kkCg7fJWW0DO2Ht3SC+K1eKNfyjOW08eF+X34xzvyxO3wFLgDatM1 w+cSCnVwO0Vvg== From: Oded Gabbay To: linux-kernel@vger.kernel.org Cc: Ohad Sharabi Subject: [PATCH 16/26] habanalabs: define traces for COMMS protocol Date: Thu, 8 Dec 2022 17:13:40 +0200 Message-Id: <20221208151350.1833823-16-ogabbay@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org> References: <20221208151350.1833823-1-ogabbay@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751659505434094400?= X-GMAIL-MSGID: =?utf-8?q?1751659505434094400?= From: Ohad Sharabi As the COMMS protocol is being used more widely in our driver, an available debug tool for the handshake will be handy. This commit defines tracepoints to various key points of the COMMS protocol. Signed-off-by: Ohad Sharabi Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- include/trace/events/habanalabs.h | 36 +++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/include/trace/events/habanalabs.h b/include/trace/events/habanalabs.h index f05c5fa668a2..10233e13cee4 100644 --- a/include/trace/events/habanalabs.h +++ b/include/trace/events/habanalabs.h @@ -87,6 +87,42 @@ DEFINE_EVENT(habanalabs_dma_alloc_template, habanalabs_dma_free, TP_PROTO(struct device *dev, u64 cpu_addr, u64 dma_addr, size_t size, const char *caller), TP_ARGS(dev, cpu_addr, dma_addr, size, caller)); +DECLARE_EVENT_CLASS(habanalabs_comms_template, + TP_PROTO(struct device *dev, char *op_str), + + TP_ARGS(dev, op_str), + + TP_STRUCT__entry( + __string(dname, dev_name(dev)) + __field(char *, op_str) + ), + + TP_fast_assign( + __assign_str(dname, dev_name(dev)); + __entry->op_str = op_str; + ), + + TP_printk("%s: cms: %s", + __get_str(dname), + __entry->op_str) +); + +DEFINE_EVENT(habanalabs_comms_template, habanalabs_comms_protocol_cmd, + TP_PROTO(struct device *dev, char *op_str), + TP_ARGS(dev, op_str)); + +DEFINE_EVENT(habanalabs_comms_template, habanalabs_comms_send_cmd, + TP_PROTO(struct device *dev, char *op_str), + TP_ARGS(dev, op_str)); + +DEFINE_EVENT(habanalabs_comms_template, habanalabs_comms_wait_status, + TP_PROTO(struct device *dev, char *op_str), + TP_ARGS(dev, op_str)); + +DEFINE_EVENT(habanalabs_comms_template, habanalabs_comms_wait_status_done, + TP_PROTO(struct device *dev, char *op_str), + TP_ARGS(dev, op_str)); + #endif /* if !defined(_TRACE_HABANALABS_H) || defined(TRACE_HEADER_MULTI_READ) */ /* This part must be outside protection */ From patchwork Thu Dec 8 15:13:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 31405 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp254619wrr; Thu, 8 Dec 2022 07:17:57 -0800 (PST) X-Google-Smtp-Source: AA0mqf4PAnnsXYhU3g6SOWTY1wp0mI9Wbe9JoVR2nZlV/hE76NGK12xK1FwdyXst1RRJf9cJHSej X-Received: by 2002:a17:906:950e:b0:7c0:fd1a:afce with SMTP id u14-20020a170906950e00b007c0fd1aafcemr11819795ejx.48.1670512676943; Thu, 08 Dec 2022 07:17:56 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670512676; cv=none; d=google.com; s=arc-20160816; b=ZCSCGbXyWaiQnYbWLlk4WrnAXLzGtTAraardI42CwB4yIXkYCoIeto/g5Ag45269Cm fCInxQ2bhiUTmmEpdFTqfOKN9uyBRoVTVoE34CvWnAjY240DMDPdaY4feiXg3fZc7Xxb G1+/+3uEblW7I1khDuDiu+/3d2ibb14G+jH/kPERIzkRaiPIUR6rqKZxC3M1alCJpW15 /dV5H5R5zypbHlc9WUBVEgjeqIQ3DKeUuYwmzRW5T06Lv1xvZEm2u71MA8EI4nwoza3Q dOf/F7EnvkTO9ozb2weN+uUZBb3Tu49SyP3+vlvRnLRKLnI9l/cpiYFmr6cgpVvAKEYY RNrA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=/7N3mWtakzI86+JosIaCaQfLG1ihLPomw++vhjO6x+o=; b=s0KsihAhB73gQPwQgYwrLtrDCcX+TXy7xd24VfNE+T77ozYMPoqTycQBBSCz/0uv+i PUNj1pHE7ontf609do//iM3FKSxkswTEOV2N3LasZz77wi0Qj6Se+VRVqaNq6bXQyAiI 
NoCFcEh1KV7ytyYzOjjC/UHfvpjSHAmHn5cV0yhTA06nFg5EmT8i9oaRXhBMDZ/pRr1D bxchfwBHOqc7XFsMuxsP4cOVG8Wj/9zXSL3RnPz7lHp3/xeWOK6ThG2gkojN2wU75tFr EG93HZUnQgdhZGJJZ3IcRZ43mA/DN12W81tDJHYDFQUJ6wRKngXBnnGbNRiA914DJ9qi wWLQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=Ks5WIIex; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n14-20020a05640205ce00b0046af790c410si7664590edx.569.2022.12.08.07.17.30; Thu, 08 Dec 2022 07:17:56 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=Ks5WIIex; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230191AbiLHPPR (ORCPT + 99 others); Thu, 8 Dec 2022 10:15:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58662 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230084AbiLHPOY (ORCPT ); Thu, 8 Dec 2022 10:14:24 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 78B4FAD30A for ; Thu, 8 Dec 2022 07:14:18 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6A03861F79 for ; Thu, 8 Dec 2022 15:14:18 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 07EA1C433B5; Thu, 8 Dec 2022 15:14:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670512457; bh=QCFI8Bg0UNmXzc0lLft98hLai46arv5oXu2M9fxiNmQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Ks5WIIexjuGux+LQlCrHSfwsGEnYE9GxB6qP7/MD63O5fncATPyVYrT7WXw1U1grF z82yCCTHWVO2x6ijzSRD9sgKEpFrHt1eKv6rqKEdVfmAZqhNoHMFKWuz3lJ0NEQ6XR qeARM0lBZTEfW7oadR5tv8nA0lOgFcL2hklRlThwmkzOD0xuHHKNhbqeeljg+PwnVs e+4endn65oQRm6prYRZu16SclzFejf+kSskkXTpGKUpsaWAIwL5b08wJJAo8fw+VG0 u9k5gLINhxgvewurELgdTKW2fuI4hcB2k2N4Vf9QMNRNeMLhxWLS4cMLxfFcF801b5 uAMY1HedC+9og== From: Oded Gabbay To: linux-kernel@vger.kernel.org Cc: Ohad Sharabi Subject: [PATCH 17/26] habanalabs: trace COMMS protocol Date: Thu, 8 Dec 2022 17:13:41 +0200 Message-Id: <20221208151350.1833823-17-ogabbay@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org> References: <20221208151350.1833823-1-ogabbay@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: 
=?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751659501017626332?= X-GMAIL-MSGID: =?utf-8?q?1751659501017626332?= From: Ohad Sharabi Call COMMS tracepoints from within the dynamic CPU FW load. This can help debug failures or delays in the dynamic FW load flow. Signed-off-by: Ohad Sharabi Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/common/firmware_if.c | 31 ++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/drivers/misc/habanalabs/common/firmware_if.c b/drivers/misc/habanalabs/common/firmware_if.c index 046866c673e2..ebb81cf89f02 100644 --- a/drivers/misc/habanalabs/common/firmware_if.c +++ b/drivers/misc/habanalabs/common/firmware_if.c @@ -14,8 +14,32 @@ #include #include +#include + #define FW_FILE_MAX_SIZE 0x1400000 /* maximum size of 20MB */ +static char *comms_cmd_str_arr[COMMS_INVLD_LAST] = { + [COMMS_NOOP] = __stringify(COMMS_NOOP), + [COMMS_CLR_STS] = __stringify(COMMS_CLR_STS), + [COMMS_RST_STATE] = __stringify(COMMS_RST_STATE), + [COMMS_PREP_DESC] = __stringify(COMMS_PREP_DESC), + [COMMS_DATA_RDY] = __stringify(COMMS_DATA_RDY), + [COMMS_EXEC] = __stringify(COMMS_EXEC), + [COMMS_RST_DEV] = __stringify(COMMS_RST_DEV), + [COMMS_GOTO_WFE] = __stringify(COMMS_GOTO_WFE), + [COMMS_SKIP_BMC] = __stringify(COMMS_SKIP_BMC), + [COMMS_PREP_DESC_ELBI] = __stringify(COMMS_PREP_DESC_ELBI), +}; + +static char *comms_sts_str_arr[COMMS_STS_INVLD_LAST] = { + [COMMS_STS_NOOP] = __stringify(COMMS_STS_NOOP), + [COMMS_STS_ACK] = __stringify(COMMS_STS_ACK), + [COMMS_STS_OK] = __stringify(COMMS_STS_OK), + [COMMS_STS_ERR] = __stringify(COMMS_STS_ERR), + [COMMS_STS_VALID_ERR] = __stringify(COMMS_STS_VALID_ERR), + [COMMS_STS_TIMEOUT_ERR] = __stringify(COMMS_STS_TIMEOUT_ERR), +}; + static char *extract_fw_ver_from_str(const char *fw_str) { char *str, *fw_ver, *whitespace; @@ -1634,6 +1658,7 @@ static void hl_fw_dynamic_send_cmd(struct hl_device *hdev, val = FIELD_PREP(COMMS_COMMAND_CMD_MASK, cmd); val |= FIELD_PREP(COMMS_COMMAND_SIZE_MASK, size); + trace_habanalabs_comms_send_cmd(hdev->dev, comms_cmd_str_arr[cmd]); WREG32(le32_to_cpu(dyn_regs->kmd_msg_to_cpu), val); } @@ -1691,6 +1716,8 @@ static int hl_fw_dynamic_wait_for_status(struct hl_device *hdev, dyn_regs = &fw_loader->dynamic_loader.comm_desc.cpu_dyn_regs; + trace_habanalabs_comms_wait_status(hdev->dev, comms_sts_str_arr[expected_status]); + /* Wait for expected status */ rc = hl_poll_timeout( hdev, @@ -1706,6 +1733,8 @@ static int hl_fw_dynamic_wait_for_status(struct hl_device *hdev, return -EIO; } + trace_habanalabs_comms_wait_status_done(hdev->dev, comms_sts_str_arr[expected_status]); + /* * skip storing FW response for NOOP to preserve the actual desired * FW status @@ -1778,6 +1807,8 @@ int hl_fw_dynamic_send_protocol_cmd(struct hl_device *hdev, { int rc; + trace_habanalabs_comms_protocol_cmd(hdev->dev, comms_cmd_str_arr[cmd]); + /* first send clear command to clean former commands */ rc = hl_fw_dynamic_send_clear_cmd(hdev, fw_loader); if (rc) From patchwork Thu Dec 8 15:13:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 31407 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp254776wrr; Thu, 8 Dec 2022 07:18:12 -0800 (PST) X-Google-Smtp-Source: AA0mqf4bhDel2xD6rWCgb7zvA9zN1BFyWz9xJ7zWf6UPPMBVoE8w+lllc/GOcXz+UmYbLbzCMeDZ X-Received: by 2002:a17:907:7e86:b0:7af:bc9:5e8d with SMTP id 
qb6-20020a1709077e8600b007af0bc95e8dmr2540620ejc.3.1670512692152; Thu, 08 Dec 2022 07:18:12 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670512692; cv=none; d=google.com; s=arc-20160816; b=dhouXYSfPEvovGampld/mbDgHP2+OCpJgP0TbLVYsVUYUCbDe+5QyUoDPiTI5cDckT T6/YS4mrODjTAXDsuvcGBg3D05r5wVQK7ovkVglUvggfbeNaFegZNd/Y/ewo6/A09hcG 2BSOsU0NSTm107d4mxsNgFlwVLaTFF+BgUnHZBxOM1dDvXdXp7xYdoGtie4HaIEnW3/J cHuFBo0gIXQT3MtQf1088irxSOnYWml0nxNK9ZdWmpNPNN23r2M56Frs4DQGXwRNxlNO v+a5kA86T58iJBJP+Bp5G8szeYbo0qVNKk5rglLsd9Qz4TeqygdV+67fdDcCNiLhXi0y jBLQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Aqyqxk7EdgF10MI3nnh0pGd8mgJxiQ3ZpRgMYnlOQFQ=; b=b7IujXCP7AueFrNoUUAKiLsb3weKxqcD5Zn8kNIlfcfnkieP3oLQULt9JXEUz+6MOD RfJbPiJ9+ztiA0lRlklTClfvgjl6HQdtLZsMTyK7V+vB+CK9MZJjpwe8XBGu1ly1JNbL tSRwbHKXO5yG33DBwSh+45L1iTh3/AMdbDLxXyH/7a1xQTpGlxSAZu2y4N+p9Gb2cS7V 8B8Qb54y1rF56JnHTSFBDXbPV9xdpH1/wO45lpBF2x5rjKufm+HvxmBiFBhS3AaJCQ89 y9hkFnX3adhRs0qAZ74G/KTx5bF/Nuv+HEkBphy3zqGS86pl43m48O+ap5NET9whfr+P 8sKQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=fBaDrIsI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id be8-20020a1709070a4800b007825337afeesi18888301ejc.273.2022.12.08.07.17.44; Thu, 08 Dec 2022 07:18:12 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=fBaDrIsI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230196AbiLHPPV (ORCPT + 99 others); Thu, 8 Dec 2022 10:15:21 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58706 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230179AbiLHPO0 (ORCPT ); Thu, 8 Dec 2022 10:14:26 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2CC517B56C for ; Thu, 8 Dec 2022 07:14:20 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id B920E61F7D for ; Thu, 8 Dec 2022 15:14:19 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 57EA4C433C1; Thu, 8 Dec 2022 15:14:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670512459; bh=5/B/6xm6EZAJ5ul0EuRWGnEHFBhwPuBWdvvYzLWeBlU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=fBaDrIsIJwFl99jsLsKk7341Pms0EoeW62pYwYz7gCo2L9W4WW30F97abdgoXY9K/ dnhdo/3SCVJfUn/uQTCSOxhyQq0unZBX6ORU9SHNCJqcdl6aX06Lan/B3UdTQrzHwO SjanNY0AEvGUErvY+xpoW3vf9WRj/52D4D6/jOQ2baWOYE+fbgQAjJW2266qzxZ2LK 
sBnZifcmiVi6RbE07/su8DVdAtAFoKsOwW8WWzAB3CoxARVOVAgeIvvypyWcp970+V hMW0FzIR7hPIlU/20Fokemsyi+AkHQIHAMVvbUaxsTgeNFj/gSual6Du2FlMJw9JWa VXK3OfXWk3m6g== From: Oded Gabbay To: linux-kernel@vger.kernel.org Cc: farah kassabri Subject: [PATCH 18/26] habanalabs: set log level for descriptor validation to debug Date: Thu, 8 Dec 2022 17:13:42 +0200 Message-Id: <20221208151350.1833823-18-ogabbay@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org> References: <20221208151350.1833823-1-ogabbay@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751659516346036518?= X-GMAIL-MSGID: =?utf-8?q?1751659516346036518?= From: farah kassabri This warning doesn't have real consequences, and therefore can be printed in debug level. Signed-off-by: farah kassabri Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/common/firmware_if.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/misc/habanalabs/common/firmware_if.c b/drivers/misc/habanalabs/common/firmware_if.c index ebb81cf89f02..537b1ae3fcb7 100644 --- a/drivers/misc/habanalabs/common/firmware_if.c +++ b/drivers/misc/habanalabs/common/firmware_if.c @@ -1932,11 +1932,11 @@ static int hl_fw_dynamic_validate_descriptor(struct hl_device *hdev, int rc; if (le32_to_cpu(fw_desc->header.magic) != HL_COMMS_DESC_MAGIC) - dev_warn(hdev->dev, "Invalid magic for dynamic FW descriptor (%x)\n", + dev_dbg(hdev->dev, "Invalid magic for dynamic FW descriptor (%x)\n", fw_desc->header.magic); if (fw_desc->header.version != HL_COMMS_DESC_VER) - dev_warn(hdev->dev, "Invalid version for dynamic FW descriptor (%x)\n", + dev_dbg(hdev->dev, "Invalid version for dynamic FW descriptor (%x)\n", fw_desc->header.version); /* From patchwork Thu Dec 8 15:13:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 31408 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp254984wrr; Thu, 8 Dec 2022 07:18:33 -0800 (PST) X-Google-Smtp-Source: AA0mqf6XcfHX1DjbW5n8Ywmm87JS/856Qd3tQW0cWD0Ai2d7gs4CfMpUp1szaNZ1cOzx9YOCguja X-Received: by 2002:a05:6402:c88:b0:46c:aa8b:da52 with SMTP id cm8-20020a0564020c8800b0046caa8bda52mr15909535edb.262.1670512713346; Thu, 08 Dec 2022 07:18:33 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670512713; cv=none; d=google.com; s=arc-20160816; b=RmTyHvyGG5ETqkGGE2W3qU8j3F/W+WoYuqyjzRw8U6qkCyBQHWsIAR+MGUyd9/roBi U61sy7IfbNex8Fg2Ih+wuoYEUoyp1xf+Qdjx+1ZiWCTWd19OXdrgq+a71K1Kke34TtNZ t+hpglWbhHoi5C2JA48gkhQMNzo4acA4aIutHbznMaAeOWELym+2jWlXaBcJCpcoJ2+U VS8xRwnJsjllPgYKAoexdeHAQTrRWLw3VnUEAOt05pAULPgIc7/ZqZaG1ieJYG8Au5tK z/Ia48uDN9G9BP3TZOPGW/wLdkDwXfsDM8g106QwQE3UjEeujD/2sq2ZUf9+CXSi9uCF WX8A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=VSFL1943jGgUxM7R/FqvbB0KzOmwIKGHiX/K0hInfAM=; 
b=PKVpzKQVx284oF3Per4ZIRrHoF77AuJsU5dSmddPPuxh1STcSTulftjAxPTw4doM00 l51/KFZvhfj/Ep8iCIPQCOo/DPiSGlQAlOwUVaC7GUi2f2uF8P59Mo5pIoadrP6UubRf r5rsxXCbEbu6X1FObhcA3t5BBRngw/SN/r7qr1QiHuzaRblmROXyY2cFl8E8lwqFFoez AgudKuHc8Gb0s05pjSO50M8l6AzW5QlTbky9DZEK0+0RHztj7k4/4Elgx+ColB/kSVTN tUP/LvXdZC22dbfMzVoHOTt027yLIBJuLAIXwiOGBubg3cgOmFDkc0sYQ2Prw+jDUDHD gKIw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=TVfTPEjv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id o16-20020aa7c7d0000000b0046cfe42c10asi5595319eds.624.2022.12.08.07.18.08; Thu, 08 Dec 2022 07:18:33 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=TVfTPEjv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229849AbiLHPPZ (ORCPT + 99 others); Thu, 8 Dec 2022 10:15:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58708 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230180AbiLHPO0 (ORCPT ); Thu, 8 Dec 2022 10:14:26 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9897DAD31C for ; Thu, 8 Dec 2022 07:14:21 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 1794961F80 for ; Thu, 8 Dec 2022 15:14:21 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A74BFC433D6; Thu, 8 Dec 2022 15:14:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670512460; bh=nggGQMhecchLPKWMCGZApD7iuvqknklG7XXG/izo1E0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=TVfTPEjvPrAob54oD28xbqfxnEkExQ44o94SxaSzngIePiCmnOsdqFSWIOzatmYie XwEa1RBy9xmFPSs7Swhu8rOlCP4oWSPXhzhs2+Qbvy3mDq/+u9Ntay/AUNQeQ02juO Lyt2KPEyYm1rbG8aiApWu8Ck3N1BqMcNxYGfI5qeqQi1LuB8bKksfcfVwNB/q8zkcE UJD+IQRuVcey5oXkyx/12/RFzUeRaf4Y2ngJLDyQ06REh09pPET6BmBRusjnps1yT0 NCkNy/99/v5NI8Bapq2ZDG/NCQxlhu9KN6+p5qUb1n71Y2Z5WLn4Ms59k8JfRAZIep sWeHf7AIyVLIQ== From: Oded Gabbay To: linux-kernel@vger.kernel.org Cc: Ohad Sharabi Subject: [PATCH 19/26] habanalabs: remove support to export dmabuf from handle Date: Thu, 8 Dec 2022 17:13:43 +0200 Message-Id: <20221208151350.1833823-19-ogabbay@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org> References: <20221208151350.1833823-1-ogabbay@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 
3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751659539141549935?= X-GMAIL-MSGID: =?utf-8?q?1751659539141549935?= From: Ohad Sharabi The API to the user which allows exporting DMA buffer from handle is deprecated here. It was never used as it is relevant only for Gaudi2, and the user stack has yet to add support for dmabuf in Gaudi2. Looking forward, a modified API to export DMA buffer for ASICs that supports virtual memory will be added. Until the new API will be ready- exporting DMA buffer will not be supported for ASICs with virtual memory. Signed-off-by: Ohad Sharabi Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/common/habanalabs.h | 5 - drivers/misc/habanalabs/common/memory.c | 143 ++------------------ 2 files changed, 9 insertions(+), 139 deletions(-) diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h index 893ebcba170b..e68928b59c1e 100644 --- a/drivers/misc/habanalabs/common/habanalabs.h +++ b/drivers/misc/habanalabs/common/habanalabs.h @@ -1744,8 +1744,6 @@ struct hl_cs_counters_atomic { * struct hl_dmabuf_priv - a dma-buf private object. * @dmabuf: pointer to dma-buf object. * @ctx: pointer to the dma-buf owner's context. - * @phys_pg_pack: pointer to physical page pack if the dma-buf was exported for - * memory allocation handle. * @device_address: physical address of the device's memory. Relevant only * if phys_pg_pack is NULL (dma-buf was exported from address). * The total size can be taken from the dmabuf object. @@ -1753,7 +1751,6 @@ struct hl_cs_counters_atomic { struct hl_dmabuf_priv { struct dma_buf *dmabuf; struct hl_ctx *ctx; - struct hl_vm_phys_pg_pack *phys_pg_pack; uint64_t device_address; }; @@ -2117,7 +2114,6 @@ struct hl_vm_hw_block_list_node { * @node: used to attach to deletion list that is used when all the allocations are cleared * at the teardown of the context. * @mapping_cnt: number of shared mappings. - * @exporting_cnt: number of dma-buf exporting. * @asid: the context related to this list. * @page_size: size of each page in the pack. * @flags: HL_MEM_* flags related to this list. 
@@ -2133,7 +2129,6 @@ struct hl_vm_phys_pg_pack { u64 total_size; struct list_head node; atomic_t mapping_cnt; - u32 exporting_cnt; u32 asid; u32 page_size; u32 flags; diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c index 7c5c18be294a..864a8a1c6067 100644 --- a/drivers/misc/habanalabs/common/memory.c +++ b/drivers/misc/habanalabs/common/memory.c @@ -371,12 +371,6 @@ static int free_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args) return -EINVAL; } - if (phys_pg_pack->exporting_cnt) { - spin_unlock(&vm->idr_lock); - dev_dbg(hdev->dev, "handle %u is exported, cannot free\n", handle); - return -EINVAL; - } - /* must remove from idr before the freeing of the physical pages as the refcount of the pool * is also the trigger of the idr destroy */ @@ -1700,29 +1694,19 @@ static struct sg_table *hl_map_dmabuf(struct dma_buf_attachment *attachment, enum dma_data_direction dir) { struct dma_buf *dma_buf = attachment->dmabuf; - struct hl_vm_phys_pg_pack *phys_pg_pack; struct hl_dmabuf_priv *hl_dmabuf; struct hl_device *hdev; struct sg_table *sgt; hl_dmabuf = dma_buf->priv; hdev = hl_dmabuf->ctx->hdev; - phys_pg_pack = hl_dmabuf->phys_pg_pack; if (!attachment->peer2peer) { dev_dbg(hdev->dev, "Failed to map dmabuf because p2p is disabled\n"); return ERR_PTR(-EPERM); } - if (phys_pg_pack) - sgt = alloc_sgt_from_device_pages(hdev, - phys_pg_pack->pages, - phys_pg_pack->npages, - phys_pg_pack->page_size, - attachment->dev, - dir); - else - sgt = alloc_sgt_from_device_pages(hdev, + sgt = alloc_sgt_from_device_pages(hdev, &hl_dmabuf->device_address, 1, hl_dmabuf->dmabuf->size, @@ -1763,18 +1747,8 @@ static void hl_unmap_dmabuf(struct dma_buf_attachment *attachment, static void hl_release_dmabuf(struct dma_buf *dmabuf) { struct hl_dmabuf_priv *hl_dmabuf = dmabuf->priv; - struct hl_ctx *ctx = hl_dmabuf->ctx; - struct hl_device *hdev = ctx->hdev; - struct hl_vm *vm = &hdev->vm; - - if (hl_dmabuf->phys_pg_pack) { - spin_lock(&vm->idr_lock); - hl_dmabuf->phys_pg_pack->exporting_cnt--; - spin_unlock(&vm->idr_lock); - } hl_ctx_put(hl_dmabuf->ctx); - kfree(hl_dmabuf); } @@ -1785,7 +1759,7 @@ static const struct dma_buf_ops habanalabs_dmabuf_ops = { .release = hl_release_dmabuf, }; -static int export_dmabuf_common(struct hl_ctx *ctx, +static int export_dmabuf(struct hl_ctx *ctx, struct hl_dmabuf_priv *hl_dmabuf, u64 total_size, int flags, int *dmabuf_fd) { @@ -1849,6 +1823,11 @@ static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 device_addr, prop = &hdev->asic_prop; + if (prop->dram_supports_virtual_memory) { + dev_dbg(hdev->dev, "Export not supported for devices with virtual memory\n"); + return -EOPNOTSUPP; + } + if (!IS_ALIGNED(device_addr, PAGE_SIZE)) { dev_dbg(hdev->dev, "exported device memory address 0x%llx should be aligned to 0x%lx\n", @@ -1890,99 +1869,7 @@ static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 device_addr, hl_dmabuf->device_address = device_addr; - rc = export_dmabuf_common(ctx, hl_dmabuf, size, flags, dmabuf_fd); - if (rc) - goto err_free_dmabuf_wrapper; - - return 0; - -err_free_dmabuf_wrapper: - kfree(hl_dmabuf); - return rc; -} - -/** - * export_dmabuf_from_handle() - export a dma-buf object for the given memory - * handle. - * @ctx: pointer to the context structure. - * @handle: device memory allocation handle. - * @flags: DMA-BUF file/FD flags. - * @dmabuf_fd: pointer to result FD that represents the dma-buf object. 
- * - * Create and export a dma-buf object for an existing memory allocation inside - * the device memory, and return a FD which is associated with the dma-buf - * object. - * - * Return: 0 on success, non-zero for failure. - */ -static int export_dmabuf_from_handle(struct hl_ctx *ctx, u64 handle, int flags, - int *dmabuf_fd) -{ - struct hl_vm_phys_pg_pack *phys_pg_pack; - struct hl_dmabuf_priv *hl_dmabuf; - struct hl_device *hdev = ctx->hdev; - struct asic_fixed_properties *prop; - struct hl_vm *vm = &hdev->vm; - u64 bar_address; - int rc, i; - - prop = &hdev->asic_prop; - - if (upper_32_bits(handle)) { - dev_dbg(hdev->dev, "no match for handle 0x%llx\n", handle); - return -EINVAL; - } - - spin_lock(&vm->idr_lock); - - phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, (u32) handle); - if (!phys_pg_pack) { - spin_unlock(&vm->idr_lock); - dev_dbg(hdev->dev, "no match for handle 0x%x\n", (u32) handle); - return -EINVAL; - } - - /* increment now to avoid freeing device memory while exporting */ - phys_pg_pack->exporting_cnt++; - - spin_unlock(&vm->idr_lock); - - if (phys_pg_pack->vm_type != VM_TYPE_PHYS_PACK) { - dev_dbg(hdev->dev, "handle 0x%llx does not represent DRAM memory\n", handle); - rc = -EINVAL; - goto err_dec_exporting_cnt; - } - - for (i = 0 ; i < phys_pg_pack->npages ; i++) { - - bar_address = hdev->dram_pci_bar_start + - (phys_pg_pack->pages[i] - - prop->dram_base_address); - - if (bar_address + phys_pg_pack->page_size > - hdev->dram_pci_bar_start + prop->dram_pci_bar_size || - bar_address + phys_pg_pack->page_size < bar_address) { - - dev_dbg(hdev->dev, - "DRAM memory range 0x%llx (+0x%x) is outside of PCI BAR boundaries\n", - phys_pg_pack->pages[i], - phys_pg_pack->page_size); - - rc = -EINVAL; - goto err_dec_exporting_cnt; - } - } - - hl_dmabuf = kzalloc(sizeof(*hl_dmabuf), GFP_KERNEL); - if (!hl_dmabuf) { - rc = -ENOMEM; - goto err_dec_exporting_cnt; - } - - hl_dmabuf->phys_pg_pack = phys_pg_pack; - - rc = export_dmabuf_common(ctx, hl_dmabuf, phys_pg_pack->total_size, - flags, dmabuf_fd); + rc = export_dmabuf(ctx, hl_dmabuf, size, flags, dmabuf_fd); if (rc) goto err_free_dmabuf_wrapper; @@ -1990,12 +1877,6 @@ static int export_dmabuf_from_handle(struct hl_ctx *ctx, u64 handle, int flags, err_free_dmabuf_wrapper: kfree(hl_dmabuf); - -err_dec_exporting_cnt: - spin_lock(&vm->idr_lock); - phys_pg_pack->exporting_cnt--; - spin_unlock(&vm->idr_lock); - return rc; } @@ -2269,13 +2150,7 @@ int hl_mem_ioctl(struct hl_fpriv *hpriv, void *data) break; case HL_MEM_OP_EXPORT_DMABUF_FD: - if (hdev->asic_prop.dram_supports_virtual_memory) - rc = export_dmabuf_from_handle(ctx, - args->in.export_dmabuf_fd.handle, - args->in.flags, - &dmabuf_fd); - else - rc = export_dmabuf_from_addr(ctx, + rc = export_dmabuf_from_addr(ctx, args->in.export_dmabuf_fd.handle, args->in.export_dmabuf_fd.mem_size, args->in.flags, From patchwork Thu Dec 8 15:13:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 31411 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp255334wrr; Thu, 8 Dec 2022 07:19:09 -0800 (PST) X-Google-Smtp-Source: AA0mqf4GEPH2raNrivdVs/RZLaD6VKwxpmtKc9wxHovIBo8bGw9iRT40goF8ABC1BMKkNMA6I0Sw X-Received: by 2002:a17:906:3b92:b0:7c0:f5d4:94bb with SMTP id u18-20020a1709063b9200b007c0f5d494bbmr1881362ejf.442.1670512749632; Thu, 08 Dec 2022 07:19:09 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670512749; cv=none; d=google.com; s=arc-20160816; 
b=PDddz0+2IPzPlGEB0rNR7c03pzzLmJLmJ3+k5Q6iKl666yIhSwCPxITQz4mA3iOtJU YiCIasaHN7IWkLBVylocdXcbHyOjRhOQXASM0gfxhz0CWQHIFcoPm6NV4NT1vQ+UWrpG HEVYY0WB/wbBjjt0jt8tlBDJ0Zei93m2Fwsj4ZutjFxrO3YxtWcBBHWMikbPDBtvV7iX e2HYSisPy9v37K7ITQx41Q4kUgcBS85xbGQ5Nzt28P73sClaE7n7tvfl2qU/Yli7/hbx kQzj2goBeqIvr9Q/sSQ+AQqVHRdlR/6uAiZxk6zSvQoXnKaqqXkpsyRN2wby6CQI7Evn LYmg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=sa748EneAaIqygnbtSY6S52mArwa/XkzJEvMrI9aSTM=; b=AkPq6l220VR6T8oL02YO/A0Er2xROLhZoFmWZ4wPhQdjj6oZfVOjHYDjHdZ1HdVp/g SXECW7itRFiumyzhLjefXNltBgNpZGrl7R7tnkSPh4Iz5V9/IBTnBkcbTLyWG6E4B8ZR xENMwsf0uF9lTpF4RvUg0yrdaXjyEMM9c4Mmzn4R0IsOS6M1g9l+Tb9HDI1ryCC2Cmmu N/vM4g1HuWgLOBIS37dJOFy+zQNVj/iJi2FwKJVUtYWXJyEPCmVg2duFjEBQq/bSX4UX IwTBzcTuUQsx6mYPT5vNJtSyhzA788fqWubysx97IhSl9sT4hJbDiYPYwk5QyyJY5RvO FLig== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=GJ2oMSez; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id wu14-20020a170906eece00b007ba04e1398bsi18654055ejb.943.2022.12.08.07.18.45; Thu, 08 Dec 2022 07:19:09 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=GJ2oMSez; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230304AbiLHPP2 (ORCPT + 99 others); Thu, 8 Dec 2022 10:15:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58930 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230020AbiLHPOb (ORCPT ); Thu, 8 Dec 2022 10:14:31 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DD530AD322 for ; Thu, 8 Dec 2022 07:14:22 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 65A0061F6B for ; Thu, 8 Dec 2022 15:14:22 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 01708C433C1; Thu, 8 Dec 2022 15:14:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670512461; bh=2WxFtwBQLkQ7axHemveXVhRJxDTNuVv+k1voETl0tvQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=GJ2oMSez+bqd/xc+Z5Bdljg6g0YLWQx5lVGG0mKlgLd2VNZziEemhGLWZrLzelAOM UhbtlLtDV66S7qCiNHF4PV139NBMd8X/MuHuYQWf9Hqy5iMmtBv725XtOHxizhvoKp aVFL7ppuyR3xiAbH4t6/qkRgzSPRMgQ1uKhrH7FFFYNr09HpTsCSD7xqK0VgcX7cpe QrXHmEQQaeH/x4CvRRJSt6N6s3JNT12tQPGzRr+3j5doEvEMiWe097VEMbOKOYfpIW IqrQ1QsNL5lYyxyGdUFabhGKZXxj7F1vXsd1+EBX/IMHHFXV3Wd5WIJwFvYacSbE51 nH+M3ZheXBI7g== From: Oded Gabbay To: 
linux-kernel@vger.kernel.org Cc: Ohad Sharabi Subject: [PATCH 20/26] habanalabs: helper function to validate export params Date: Thu, 8 Dec 2022 17:13:44 +0200 Message-Id: <20221208151350.1833823-20-ogabbay@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org> References: <20221208151350.1833823-1-ogabbay@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751659577011780661?= X-GMAIL-MSGID: =?utf-8?q?1751659577011780661?= From: Ohad Sharabi Validate export parameters in a dedicated function instead of in the main export flow. This will be useful later when support to export dmabuf for devices with virtual memory will be added. Signed-off-by: Ohad Sharabi Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/common/memory.c | 79 ++++++++++++++----------- 1 file changed, 44 insertions(+), 35 deletions(-) diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c index 864a8a1c6067..e3b2e882b037 100644 --- a/drivers/misc/habanalabs/common/memory.c +++ b/drivers/misc/habanalabs/common/memory.c @@ -1797,36 +1797,10 @@ static int export_dmabuf(struct hl_ctx *ctx, return rc; } -/** - * export_dmabuf_from_addr() - export a dma-buf object for the given memory - * address and size. - * @ctx: pointer to the context structure. - * @device_addr: device memory physical address. - * @size: size of device memory. - * @flags: DMA-BUF file/FD flags. - * @dmabuf_fd: pointer to result FD that represents the dma-buf object. - * - * Create and export a dma-buf object for an existing memory allocation inside - * the device memory, and return a FD which is associated with the dma-buf - * object. - * - * Return: 0 on success, non-zero for failure. 
- */ -static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 device_addr, - u64 size, int flags, int *dmabuf_fd) +static int validate_export_params(struct hl_device *hdev, u64 device_addr, u64 size) { - struct hl_dmabuf_priv *hl_dmabuf; - struct hl_device *hdev = ctx->hdev; - struct asic_fixed_properties *prop; + struct asic_fixed_properties *prop = &hdev->asic_prop; u64 bar_address; - int rc; - - prop = &hdev->asic_prop; - - if (prop->dram_supports_virtual_memory) { - dev_dbg(hdev->dev, "Export not supported for devices with virtual memory\n"); - return -EOPNOTSUPP; - } if (!IS_ALIGNED(device_addr, PAGE_SIZE)) { dev_dbg(hdev->dev, @@ -1843,26 +1817,61 @@ static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 device_addr, } if (device_addr < prop->dram_user_base_address || - device_addr + size > prop->dram_end_address || - device_addr + size < device_addr) { + (device_addr + size) > prop->dram_end_address || + (device_addr + size) < device_addr) { dev_dbg(hdev->dev, "DRAM memory range 0x%llx (+0x%llx) is outside of DRAM boundaries\n", device_addr, size); return -EINVAL; } - bar_address = hdev->dram_pci_bar_start + - (device_addr - prop->dram_base_address); + bar_address = hdev->dram_pci_bar_start + (device_addr - prop->dram_base_address); - if (bar_address + size > - hdev->dram_pci_bar_start + prop->dram_pci_bar_size || - bar_address + size < bar_address) { + if ((bar_address + size) > (hdev->dram_pci_bar_start + prop->dram_pci_bar_size) || + (bar_address + size) < bar_address) { dev_dbg(hdev->dev, "DRAM memory range 0x%llx (+0x%llx) is outside of PCI BAR boundaries\n", device_addr, size); return -EINVAL; } + return 0; +} + +/** + * export_dmabuf_from_addr() - export a dma-buf object for the given memory + * address and size. + * @ctx: pointer to the context structure. + * @device_addr: device memory physical address. + * @size: size of device memory. + * @flags: DMA-BUF file/FD flags. + * @dmabuf_fd: pointer to result FD that represents the dma-buf object. + * + * Create and export a dma-buf object for an existing memory allocation inside + * the device memory, and return a FD which is associated with the dma-buf + * object. + * + * Return: 0 on success, non-zero for failure. 
+ */ +static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 device_addr, + u64 size, int flags, int *dmabuf_fd) +{ + struct hl_dmabuf_priv *hl_dmabuf; + struct hl_device *hdev = ctx->hdev; + struct asic_fixed_properties *prop; + int rc; + + prop = &hdev->asic_prop; + + if (prop->dram_supports_virtual_memory) { + dev_dbg(hdev->dev, "Export not supported for devices with virtual memory\n"); + return -EOPNOTSUPP; + } + + rc = validate_export_params(hdev, device_addr, size); + if (rc) + return rc; + hl_dmabuf = kzalloc(sizeof(*hl_dmabuf), GFP_KERNEL); if (!hl_dmabuf) return -ENOMEM; From patchwork Thu Dec 8 15:13:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 31410 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp255239wrr; Thu, 8 Dec 2022 07:19:00 -0800 (PST) X-Google-Smtp-Source: AA0mqf7Y9pZGuPF7Q2PlwWTSz+4wM0WOyDnJJmCenUY5lWuAhQMUoW9MSBpcuaTNmzUmjl5JmCT3 X-Received: by 2002:a63:525e:0:b0:477:bca8:1cd9 with SMTP id s30-20020a63525e000000b00477bca81cd9mr60306467pgl.581.1670512739976; Thu, 08 Dec 2022 07:18:59 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670512739; cv=none; d=google.com; s=arc-20160816; b=YVFTRwA2QZeDnm54LWekAExEFfM5rN8UyPuy+iOjVtZouyfEEGKdAGi6HicgR7ONZw PN2w4iR7SimETMSOw40+fxmAcMSM1THwwbQuf/5ffsqlAswGgxPqvjW7ZfLCwIDAN7Vp aLMJXQD4ZyxdJ1tAxYA8H95o2jQKe9RVBD5Bl8fp6B9Bg6z5M6G3wUdH5hHl9wZAc7Px 3i5pbpz33qGuMjbgqSFqteR8fpVnPXUbqTWIOtaf7yC8tFUZsh21kfb8HzNGRSD35rzZ qXViuLOf5+MA+E+nwO+y1rxtzOFUJvzG7xYrX8RdIIat34TX2U8aofNZzeKtFpPrqu0e Pj7Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=soipPfc5pC/weX2IDTXIB7qpCZz6ksmxTKF2TMZ5V0Q=; b=FYXW3KD/bGGnOa/hp+tVNVlIAofBgQUjhtgsodEl0w70/P1oHdhN0uZp3+aosg6Ix+ QKDE4ErISGOwkf2moFOt8SCgxX47IErFF+b9EG1hPyAunEYUNCuELhPe8p4HAtZ9+SbY ClrdPWESMNjnAu23lS39z6HyNKlDwracHLsIs/fDR9CKbMV4HNcMtoiJ39J4dBlKR5Tp 4wGaX5cMIY5yOZ9yDViYvus9E86B9eriEGI/5FUyPCTtFuhDf0NRiL+3wv1GWNelEVeg T9tFXrm5Heh0jjKyLKIRBH91vkV7cI7tPaMnSVMQ8sUl5A+O3aZfKZnF2kqiT6gm5Tz3 fJiQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=sx8oWSPV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id j36-20020a635964000000b004786230ec58si23652237pgm.169.2022.12.08.07.18.45; Thu, 08 Dec 2022 07:18:59 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=sx8oWSPV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230215AbiLHPPq (ORCPT + 99 others); Thu, 8 Dec 2022 10:15:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58530 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229524AbiLHPOe (ORCPT ); Thu, 8 Dec 2022 10:14:34 -0500 Received: from sin.source.kernel.org (sin.source.kernel.org [145.40.73.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA50DABA3A for ; Thu, 8 Dec 2022 07:14:26 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sin.source.kernel.org (Postfix) with ESMTPS id E6257CE24B1 for ; Thu, 8 Dec 2022 15:14:24 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4F9F6C433D6; Thu, 8 Dec 2022 15:14:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670512463; bh=dlRx9Apop/1V4GBplL5fbslH6PpSxBLjSz/W4nhyK/A=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=sx8oWSPVbymJEIxO33pkweRGTuUHn+Fy2p7Pr/grfSVEL7m0ZnbUssDPVnC+GW5I9 En0ASp45ky+QZvrCNoDURD2BJhqkN/P3vLl+ryV7/mnwICIUfbsnG6T6GJjV8HJxD6 82T88TazCDTgAohGiK8JvRsUQE7o6YETd5Z6vEA+od/pkGC7MtgMCZm+FwBnyxLIPL OHQZ/rO2Vj8YshAoMOe4MdoBfeFWs6koK8yFj5C5C92VwIw3jDySWHVBvperkipw34 YpFLRFtA7EWt9NWBMvueZxaneQjnolCY+v4zdeQ5goIlAic98ATePa13Qf4tzDoaJS CrLbCEQSFBqRA== From: Oded Gabbay To: linux-kernel@vger.kernel.org Cc: Ohad Sharabi Subject: [PATCH 21/26] habanalabs: modify export dmabuf API Date: Thu, 8 Dec 2022 17:13:45 +0200 Message-Id: <20221208151350.1833823-21-ogabbay@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org> References: <20221208151350.1833823-1-ogabbay@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751659566779821451?= X-GMAIL-MSGID: =?utf-8?q?1751659566779821451?= From: Ohad Sharabi A previous commit deprecated the option to export from handle, leaving the code with no support for devices with virtual memory. This commit modifies the export API in a way that unifies the uAPI to user address for both cases (i.e. with and without MMU support) and add the actual support for devices with virtual memory. 
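
For illustration, user space would drive the reworked interface roughly as in the sketch below. The op, the addr/mem_size/offset fields and the returned out.fd come from this patch; the HL_IOCTL_MEMORY macro, the hl_mem_args union, the header path and the O_RDWR|O_CLOEXEC flags are assumptions based on the existing uAPI header rather than anything shown here.

```c
/*
 * Hedged sketch of exporting a dma-buf FD with the reworked uAPI.
 * HL_IOCTL_MEMORY, union hl_mem_args and the header path are assumed
 * from the existing habanalabs uAPI; only the export_dmabuf_fd fields
 * (addr, mem_size, offset) and HL_MEM_OP_EXPORT_DMABUF_FD are from
 * this patch.
 */
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <misc/habanalabs.h>

static int export_dram_chunk(int dev_fd, uint64_t device_va,
			     uint64_t size, uint64_t offset)
{
	union hl_mem_args args = {0};

	args.in.op = HL_MEM_OP_EXPORT_DMABUF_FD;
	/* Device VA previously returned by HL_MEM_OP_MAP; on Gaudi1,
	 * which has no DRAM MMU, a physical DRAM address instead. */
	args.in.export_dmabuf_fd.addr = device_va;
	args.in.export_dmabuf_fd.mem_size = size;
	/* Offset into the mapped area; must be 0 on Gaudi1. */
	args.in.export_dmabuf_fd.offset = offset;
	args.in.flags = O_RDWR | O_CLOEXEC;

	if (ioctl(dev_fd, HL_IOCTL_MEMORY, &args))
		return -1;

	/* args.out.fd is the dma-buf FD to hand to the importer. */
	return args.out.fd;
}
```

Note that because the export bumps the hash node's export_cnt, an attempt to unmap the same device VA while the dma-buf FD is still open is expected to fail until the importer releases the buffer.
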
Signed-off-by: Ohad Sharabi Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/common/habanalabs.h | 9 + drivers/misc/habanalabs/common/memory.c | 219 +++++++++++++++++--- include/uapi/misc/habanalabs.h | 21 +- 3 files changed, 218 insertions(+), 31 deletions(-) diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h index e68928b59c1e..ef5a765f3313 100644 --- a/drivers/misc/habanalabs/common/habanalabs.h +++ b/drivers/misc/habanalabs/common/habanalabs.h @@ -1744,6 +1744,9 @@ struct hl_cs_counters_atomic { * struct hl_dmabuf_priv - a dma-buf private object. * @dmabuf: pointer to dma-buf object. * @ctx: pointer to the dma-buf owner's context. + * @phys_pg_pack: pointer to physical page pack if the dma-buf was exported + * where virtual memory is supported. + * @memhash_hnode: pointer to the memhash node. this object holds the export count. * @device_address: physical address of the device's memory. Relevant only * if phys_pg_pack is NULL (dma-buf was exported from address). * The total size can be taken from the dmabuf object. @@ -1751,6 +1754,8 @@ struct hl_cs_counters_atomic { struct hl_dmabuf_priv { struct dma_buf *dmabuf; struct hl_ctx *ctx; + struct hl_vm_phys_pg_pack *phys_pg_pack; + struct hl_vm_hash_node *memhash_hnode; uint64_t device_address; }; @@ -2078,12 +2083,16 @@ struct hl_cs_parser { * hl_userptr). * @node: node to hang on the hash table in context object. * @vaddr: key virtual address. + * @handle: memory handle for device memory allocation. * @ptr: value pointer (hl_vm_phys_pg_list or hl_userptr). + * @export_cnt: number of exports from within the VA block. */ struct hl_vm_hash_node { struct hlist_node node; u64 vaddr; + u64 handle; void *ptr; + int export_cnt; }; /** diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c index e3b2e882b037..c7c27ffa6309 100644 --- a/drivers/misc/habanalabs/common/memory.c +++ b/drivers/misc/habanalabs/common/memory.c @@ -19,7 +19,9 @@ MODULE_IMPORT_NS(DMA_BUF); #define HL_MMU_DEBUG 0 /* use small pages for supporting non-pow2 (32M/40M/48M) DRAM phys page sizes */ -#define DRAM_POOL_PAGE_SIZE SZ_8M +#define DRAM_POOL_PAGE_SIZE SZ_8M + +#define MEM_HANDLE_INVALID ULONG_MAX static int allocate_timestamps_buffers(struct hl_fpriv *hpriv, struct hl_mem_in *args, u64 *handle); @@ -1234,6 +1236,7 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args, u64 *device hnode->ptr = vm_type; hnode->vaddr = ret_vaddr; + hnode->handle = is_userptr ? 
MEM_HANDLE_INVALID : handle; mutex_lock(&ctx->mem_hash_lock); hash_add(ctx->mem_hash, &hnode->node, ret_vaddr); @@ -1307,6 +1310,12 @@ static int unmap_device_va(struct hl_ctx *ctx, struct hl_mem_in *args, return -EINVAL; } + if (hnode->export_cnt) { + mutex_unlock(&ctx->mem_hash_lock); + dev_err(hdev->dev, "failed to unmap %#llx, memory is exported\n", vaddr); + return -EINVAL; + } + hash_del(&hnode->node); mutex_unlock(&ctx->mem_hash_lock); @@ -1694,19 +1703,29 @@ static struct sg_table *hl_map_dmabuf(struct dma_buf_attachment *attachment, enum dma_data_direction dir) { struct dma_buf *dma_buf = attachment->dmabuf; + struct hl_vm_phys_pg_pack *phys_pg_pack; struct hl_dmabuf_priv *hl_dmabuf; struct hl_device *hdev; struct sg_table *sgt; hl_dmabuf = dma_buf->priv; hdev = hl_dmabuf->ctx->hdev; + phys_pg_pack = hl_dmabuf->phys_pg_pack; if (!attachment->peer2peer) { dev_dbg(hdev->dev, "Failed to map dmabuf because p2p is disabled\n"); return ERR_PTR(-EPERM); } - sgt = alloc_sgt_from_device_pages(hdev, + if (phys_pg_pack) + sgt = alloc_sgt_from_device_pages(hdev, + phys_pg_pack->pages, + phys_pg_pack->npages, + phys_pg_pack->page_size, + attachment->dev, + dir); + else + sgt = alloc_sgt_from_device_pages(hdev, &hl_dmabuf->device_address, 1, hl_dmabuf->dmabuf->size, @@ -1747,8 +1766,15 @@ static void hl_unmap_dmabuf(struct dma_buf_attachment *attachment, static void hl_release_dmabuf(struct dma_buf *dmabuf) { struct hl_dmabuf_priv *hl_dmabuf = dmabuf->priv; + struct hl_ctx *ctx = hl_dmabuf->ctx; + + if (hl_dmabuf->memhash_hnode) { + mutex_lock(&ctx->mem_hash_lock); + hl_dmabuf->memhash_hnode->export_cnt--; + mutex_unlock(&ctx->mem_hash_lock); + } - hl_ctx_put(hl_dmabuf->ctx); + hl_ctx_put(ctx); kfree(hl_dmabuf); } @@ -1797,11 +1823,8 @@ static int export_dmabuf(struct hl_ctx *ctx, return rc; } -static int validate_export_params(struct hl_device *hdev, u64 device_addr, u64 size) +static int validate_export_params_common(struct hl_device *hdev, u64 device_addr, u64 size) { - struct asic_fixed_properties *prop = &hdev->asic_prop; - u64 bar_address; - if (!IS_ALIGNED(device_addr, PAGE_SIZE)) { dev_dbg(hdev->dev, "exported device memory address 0x%llx should be aligned to 0x%lx\n", @@ -1816,6 +1839,19 @@ static int validate_export_params(struct hl_device *hdev, u64 device_addr, u64 s return -EINVAL; } + return 0; +} + +static int validate_export_params_no_mmu(struct hl_device *hdev, u64 device_addr, u64 size) +{ + struct asic_fixed_properties *prop = &hdev->asic_prop; + u64 bar_address; + int rc; + + rc = validate_export_params_common(hdev, device_addr, size); + if (rc) + return rc; + if (device_addr < prop->dram_user_base_address || (device_addr + size) > prop->dram_end_address || (device_addr + size) < device_addr) { @@ -1838,12 +1874,115 @@ static int validate_export_params(struct hl_device *hdev, u64 device_addr, u64 s return 0; } +static int validate_export_params(struct hl_device *hdev, u64 device_addr, u64 size, u64 offset, + struct hl_vm_phys_pg_pack *phys_pg_pack) +{ + struct asic_fixed_properties *prop = &hdev->asic_prop; + u64 bar_address; + int i, rc; + + rc = validate_export_params_common(hdev, device_addr, size); + if (rc) + return rc; + + if ((offset + size) > phys_pg_pack->total_size) { + dev_dbg(hdev->dev, "offset %#llx and size %#llx exceed total map size %#llx\n", + offset, size, phys_pg_pack->total_size); + return -EINVAL; + } + + for (i = 0 ; i < phys_pg_pack->npages ; i++) { + + bar_address = hdev->dram_pci_bar_start + + (phys_pg_pack->pages[i] - prop->dram_base_address); + + 
if ((bar_address + phys_pg_pack->page_size) > + (hdev->dram_pci_bar_start + prop->dram_pci_bar_size) || + (bar_address + phys_pg_pack->page_size) < bar_address) { + dev_dbg(hdev->dev, + "DRAM memory range 0x%llx (+0x%x) is outside of PCI BAR boundaries\n", + phys_pg_pack->pages[i], + phys_pg_pack->page_size); + + return -EINVAL; + } + } + + return 0; +} + +static struct hl_vm_hash_node *memhash_node_export_get(struct hl_ctx *ctx, u64 addr) +{ + struct hl_device *hdev = ctx->hdev; + struct hl_vm_hash_node *hnode; + + /* get the memory handle */ + mutex_lock(&ctx->mem_hash_lock); + hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)addr) + if (addr == hnode->vaddr) + break; + + if (!hnode) { + mutex_unlock(&ctx->mem_hash_lock); + dev_dbg(hdev->dev, "map address %#llx not found\n", addr); + return ERR_PTR(-EINVAL); + } + + if (upper_32_bits(hnode->handle)) { + mutex_unlock(&ctx->mem_hash_lock); + dev_dbg(hdev->dev, "invalid handle %#llx for map address %#llx\n", + hnode->handle, addr); + return ERR_PTR(-EINVAL); + } + + /* + * node found, increase export count so this memory cannot be unmapped + * and the hash node cannot be deleted. + */ + hnode->export_cnt++; + mutex_unlock(&ctx->mem_hash_lock); + + return hnode; +} + +static void memhash_node_export_put(struct hl_ctx *ctx, struct hl_vm_hash_node *hnode) +{ + mutex_lock(&ctx->mem_hash_lock); + hnode->export_cnt--; + mutex_unlock(&ctx->mem_hash_lock); +} + +static struct hl_vm_phys_pg_pack *get_phys_pg_pack_from_hash_node(struct hl_device *hdev, + struct hl_vm_hash_node *hnode) +{ + struct hl_vm_phys_pg_pack *phys_pg_pack; + struct hl_vm *vm = &hdev->vm; + + spin_lock(&vm->idr_lock); + phys_pg_pack = idr_find(&vm->phys_pg_pack_handles, (u32) hnode->handle); + if (!phys_pg_pack) { + spin_unlock(&vm->idr_lock); + dev_dbg(hdev->dev, "no match for handle 0x%x\n", (u32) hnode->handle); + return ERR_PTR(-EINVAL); + } + + spin_unlock(&vm->idr_lock); + + if (phys_pg_pack->vm_type != VM_TYPE_PHYS_PACK) { + dev_dbg(hdev->dev, "handle 0x%llx does not represent DRAM memory\n", hnode->handle); + return ERR_PTR(-EINVAL); + } + + return phys_pg_pack; +} + /** * export_dmabuf_from_addr() - export a dma-buf object for the given memory * address and size. * @ctx: pointer to the context structure. - * @device_addr: device memory physical address. - * @size: size of device memory. + * @addr: device address. + * @size: size of device memory to export. + * @offset: the offset into the buffer from which to start exporting * @flags: DMA-BUF file/FD flags. * @dmabuf_fd: pointer to result FD that represents the dma-buf object. * @@ -1853,37 +1992,66 @@ static int validate_export_params(struct hl_device *hdev, u64 device_addr, u64 s * * Return: 0 on success, non-zero for failure. 
*/ -static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 device_addr, - u64 size, int flags, int *dmabuf_fd) +static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 addr, u64 size, u64 offset, + int flags, int *dmabuf_fd) { - struct hl_dmabuf_priv *hl_dmabuf; - struct hl_device *hdev = ctx->hdev; + struct hl_vm_phys_pg_pack *phys_pg_pack = NULL; + struct hl_vm_hash_node *hnode = NULL; struct asic_fixed_properties *prop; + struct hl_dmabuf_priv *hl_dmabuf; + struct hl_device *hdev; + u64 export_addr; int rc; + hdev = ctx->hdev; prop = &hdev->asic_prop; - if (prop->dram_supports_virtual_memory) { - dev_dbg(hdev->dev, "Export not supported for devices with virtual memory\n"); - return -EOPNOTSUPP; + /* offset must be 0 in devices without virtual memory support */ + if (!prop->dram_supports_virtual_memory && offset) { + dev_dbg(hdev->dev, "offset is not allowed in device without virtual memory\n"); + return -EINVAL; } - rc = validate_export_params(hdev, device_addr, size); - if (rc) - return rc; + export_addr = addr + offset; hl_dmabuf = kzalloc(sizeof(*hl_dmabuf), GFP_KERNEL); if (!hl_dmabuf) return -ENOMEM; - hl_dmabuf->device_address = device_addr; + if (prop->dram_supports_virtual_memory) { + hnode = memhash_node_export_get(ctx, addr); + if (IS_ERR(hnode)) { + rc = PTR_ERR(hnode); + goto err_free_dmabuf_wrapper; + } + phys_pg_pack = get_phys_pg_pack_from_hash_node(hdev, hnode); + if (IS_ERR(phys_pg_pack)) { + rc = PTR_ERR(phys_pg_pack); + goto dec_memhash_export_cnt; + } + rc = validate_export_params(hdev, export_addr, size, offset, phys_pg_pack); + if (rc) + goto dec_memhash_export_cnt; + + hl_dmabuf->phys_pg_pack = phys_pg_pack; + hl_dmabuf->memhash_hnode = hnode; + } else { + rc = validate_export_params_no_mmu(hdev, export_addr, size); + if (rc) + goto err_free_dmabuf_wrapper; + } + + hl_dmabuf->device_address = export_addr; rc = export_dmabuf(ctx, hl_dmabuf, size, flags, dmabuf_fd); if (rc) - goto err_free_dmabuf_wrapper; + goto dec_memhash_export_cnt; return 0; +dec_memhash_export_cnt: + if (prop->dram_supports_virtual_memory) + memhash_node_export_put(ctx, hnode); err_free_dmabuf_wrapper: kfree(hl_dmabuf); return rc; @@ -2160,10 +2328,11 @@ int hl_mem_ioctl(struct hl_fpriv *hpriv, void *data) case HL_MEM_OP_EXPORT_DMABUF_FD: rc = export_dmabuf_from_addr(ctx, - args->in.export_dmabuf_fd.handle, - args->in.export_dmabuf_fd.mem_size, - args->in.flags, - &dmabuf_fd); + args->in.export_dmabuf_fd.addr, + args->in.export_dmabuf_fd.mem_size, + args->in.export_dmabuf_fd.offset, + args->in.flags, + &dmabuf_fd); memset(args, 0, sizeof(*args)); args->out.fd = dmabuf_fd; break; diff --git a/include/uapi/misc/habanalabs.h b/include/uapi/misc/habanalabs.h index 3b995e841eb8..c67d18901c1d 100644 --- a/include/uapi/misc/habanalabs.h +++ b/include/uapi/misc/habanalabs.h @@ -1851,15 +1851,24 @@ struct hl_mem_in { /** * structure for exporting DMABUF object (used with * the HL_MEM_OP_EXPORT_DMABUF_FD op) - * @handle: handle returned from HL_MEM_OP_ALLOC. - * in Gaudi, where we don't have MMU for the device memory, the - * driver expects a physical address (instead of a handle) in the - * device memory space. - * @mem_size: size of memory allocation. Relevant only for GAUDI + * @addr: for Gaudi1, the driver expects a physical address + * inside the device's DRAM. this is because in Gaudi1 + * we don't have MMU that covers the device's DRAM. 
+ * for all other ASICs, the driver expects a device + * virtual address that represents the start address of + * a mapped DRAM memory area inside the device. + * the address must be the same as was received from the + * driver during a previous HL_MEM_OP_MAP operation. + * @mem_size: size of memory to export. + * @offset: for Gaudi1, this value must be 0. For all other ASICs, + * the driver expects an offset inside of the memory area + * describe by addr. the offset represents the start + * address of that the exported dma-buf object describes. */ struct { - __u64 handle; + __u64 addr; __u64 mem_size; + __u64 offset; } export_dmabuf_fd; }; From patchwork Thu Dec 8 15:13:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 31409 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp255132wrr; Thu, 8 Dec 2022 07:18:47 -0800 (PST) X-Google-Smtp-Source: AA0mqf7Jh2KhXIwh8CU8x6Z4XbAo3SzgtflhEB0jGuiYnrGvvbthtZ/f82jMYqJtHm5qjBE0lSRW X-Received: by 2002:aa7:c415:0:b0:46c:4b56:8c06 with SMTP id j21-20020aa7c415000000b0046c4b568c06mr20476962edq.230.1670512727635; Thu, 08 Dec 2022 07:18:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670512727; cv=none; d=google.com; s=arc-20160816; b=I8+1fHQqGkoBdIYphwgVnQ5LsTjUIAhounYutsYnsOV3476BvthtQLioQtM3xz9iYi 62LsHIoEOdjd3V+ClUPJ1cInwYWoh9hxdSSmcSKY9HzvS4LlMv+8iRk35X4yq+jWAcna 58s3CLx5yqBlKu7gCVar1CaiegX8FA5SBXa46hsoehP+vuAvtCOyxWBe0MJUlOTxU+Iy 5VZttvO8raaNOFhwrrMOwGsc3gziuplCf35VM5TiPc2iUdVsylFPuincsB3pqlEh9K+p h+kA4t2uy949TWgh14Oy1LGacVdoEjjAw5qIeC1jtMMPck1V1nXwVkkaJBpf8HbGiudA yOAQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=DiDtz73o8YItgzJn3GVokP/h/QLLhY12mWkl0qRc5ks=; b=HRlGiNF0cDwJxDPjnsRLZuquuYJiRAG3dxAq/P7zyl1cDRIzPlf4Aguow+NIcDKmFb CP8Ku7d8vUXXlEihvvM15ns3gSWni/b1WWAR7iBcjGuvlX6jVSxGqTQ/d46dxij/6M4G LLUQKUL4Uh7Pupc5dhzCXseeC1PFXsR6sfRCnVkdzGuf2ILhAFl1IABF3F5OY2guoOKg QZHYaQlsF3D2LDyoSs9h2ujOcQZHxT+nXBd3J9Z5yrubSNpeIKlYmlM38eGy9zbJId4j LrVd8HGhyCgbGdwppoYzdvSO7OyGekMpWCBVbubIrf4UOuWGNu7zyRFwHBTfpi2fMbnq A8Mg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=d6mqwmoF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id s15-20020a056402014f00b0046b953601besi5893489edu.29.2022.12.08.07.18.23; Thu, 08 Dec 2022 07:18:47 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=d6mqwmoF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230207AbiLHPPj (ORCPT + 99 others); Thu, 8 Dec 2022 10:15:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59026 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230150AbiLHPOd (ORCPT ); Thu, 8 Dec 2022 10:14:33 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3CA0699F3E for ; Thu, 8 Dec 2022 07:14:25 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 0E74661F7D for ; Thu, 8 Dec 2022 15:14:25 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9D674C433B5; Thu, 8 Dec 2022 15:14:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670512464; bh=zTmncM0WxJ5kdI/AaCxtRQosZClPsUDSPGCXdWDYCjE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=d6mqwmoF20NyV8UTjI5S2Jf8S9ynQfH5OaTxtoukHI3DvSdML6c5L7XOawTdR9m8R 7bbXd8wDmKX+kQkRM1LzXVCLjfvuwhB28tgxZcqLudelIIk5l9xdicPr469YL/qSvM AQIFYLU/C11ZQB5eq8yCiERyMuTC3bzCSYn0DdIh+d4Sv2xZJkFieOSilpT4lfn6DV 451jKkzEs+uReKIc7Hw1C6AwDkgqxow/jJzFpHYxXM8kRdnpaH7VgTsx2DBStgDt1z SZ6ejS3++cuTwjNvNuH9PM43bQSf3BXGDutytacPSKsHtakx6v3xuZZh2cnC31KPt2 epH+SPzakJQIA== From: Oded Gabbay To: linux-kernel@vger.kernel.org Cc: Ohad Sharabi Subject: [PATCH 22/26] habanalabs: fix dmabuf to export only required size Date: Thu, 8 Dec 2022 17:13:46 +0200 Message-Id: <20221208151350.1833823-22-ogabbay@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org> References: <20221208151350.1833823-1-ogabbay@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751659553898797278?= X-GMAIL-MSGID: =?utf-8?q?1751659553898797278?= From: Ohad Sharabi This patch fixes a bug that was found in the dmabuf flow. Bug description as found on Gaudi2 device: 1. User allocates 4MB of device memory - Note that although the allocation size was 4MB the HMMU allocated a full page of 768MB to back the request. - The user gets a memory handle that points to a single page (768MB) - Mapping the handle, the user gets virtual address to the start of the page. 2. User exports the buffer 3. User registers the exported buffer in the importer. 
This flow has a callback to the exporter which in turn converts the phys_page_pack to an SG list for the importer. This SG list is of single entry of size 768MB. However, the size that was passed to the importer was only 4MB. The solution for this is to make sure the importer gets exposure only to the exported size. This will be done by fixing the SG created by the exporter to be of the total size of the actual exported memory requested by the user. Signed-off-by: Ohad Sharabi Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/common/habanalabs.h | 2 ++ drivers/misc/habanalabs/common/memory.c | 35 +++++++++++++++------ 2 files changed, 28 insertions(+), 9 deletions(-) diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h index ef5a765f3313..de715c91a87e 100644 --- a/drivers/misc/habanalabs/common/habanalabs.h +++ b/drivers/misc/habanalabs/common/habanalabs.h @@ -2120,6 +2120,7 @@ struct hl_vm_hw_block_list_node { * @pages: the physical page array. * @npages: num physical pages in the pack. * @total_size: total size of all the pages in this list. + * @exported_size: buffer exported size. * @node: used to attach to deletion list that is used when all the allocations are cleared * at the teardown of the context. * @mapping_cnt: number of shared mappings. @@ -2136,6 +2137,7 @@ struct hl_vm_phys_pg_pack { u64 *pages; u64 npages; u64 total_size; + u64 exported_size; struct list_head node; atomic_t mapping_cnt; u32 asid; diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c index c7c27ffa6309..3f05bb398c70 100644 --- a/drivers/misc/habanalabs/common/memory.c +++ b/drivers/misc/habanalabs/common/memory.c @@ -1548,10 +1548,10 @@ static int set_dma_sg(struct scatterlist *sg, u64 bar_address, u64 chunk_size, } static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64 *pages, u64 npages, - u64 page_size, struct device *dev, - enum dma_data_direction dir) + u64 page_size, u64 exported_size, + struct device *dev, enum dma_data_direction dir) { - u64 chunk_size, bar_address, dma_max_seg_size; + u64 chunk_size, bar_address, dma_max_seg_size, cur_size_to_export, cur_npages; struct asic_fixed_properties *prop; int rc, i, j, nents, cur_page; struct scatterlist *sg; @@ -1577,16 +1577,23 @@ static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64 if (!sgt) return ERR_PTR(-ENOMEM); + /* remove export size restrictions in case not explicitly defined */ + cur_size_to_export = exported_size ? 
exported_size : (npages * page_size); + /* If the size of each page is larger than the dma max segment size, * then we can't combine pages and the number of entries in the SGL * will just be the * * */ - if (page_size > dma_max_seg_size) - nents = npages * DIV_ROUND_UP_ULL(page_size, dma_max_seg_size); - else + if (page_size > dma_max_seg_size) { + /* we should limit number of pages according to the exported size */ + cur_npages = DIV_ROUND_UP_SECTOR_T(cur_size_to_export, page_size); + nents = cur_npages * DIV_ROUND_UP_SECTOR_T(page_size, dma_max_seg_size); + } else { + cur_npages = npages; + /* Get number of non-contiguous chunks */ - for (i = 1, nents = 1, chunk_size = page_size ; i < npages ; i++) { + for (i = 1, nents = 1, chunk_size = page_size ; i < cur_npages ; i++) { if (pages[i - 1] + page_size != pages[i] || chunk_size + page_size > dma_max_seg_size) { nents++; @@ -1596,6 +1603,7 @@ static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64 chunk_size += page_size; } + } rc = sg_alloc_table(sgt, nents, GFP_KERNEL | __GFP_ZERO); if (rc) @@ -1618,7 +1626,8 @@ static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64 else cur_device_address += dma_max_seg_size; - chunk_size = min(size_left, dma_max_seg_size); + /* make sure not to export over exported size */ + chunk_size = min3(size_left, dma_max_seg_size, cur_size_to_export); bar_address = hdev->dram_pci_bar_start + cur_device_address; @@ -1626,6 +1635,8 @@ static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64 if (rc) goto error_unmap; + cur_size_to_export -= chunk_size; + if (size_left > dma_max_seg_size) { size_left -= dma_max_seg_size; } else { @@ -1637,7 +1648,7 @@ static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64 /* Merge pages and put them into the scatterlist */ for_each_sgtable_dma_sg(sgt, sg, i) { chunk_size = page_size; - for (j = cur_page + 1 ; j < npages ; j++) { + for (j = cur_page + 1 ; j < cur_npages ; j++) { if (pages[j - 1] + page_size != pages[j] || chunk_size + page_size > dma_max_seg_size) break; @@ -1648,10 +1659,13 @@ static struct sg_table *alloc_sgt_from_device_pages(struct hl_device *hdev, u64 bar_address = hdev->dram_pci_bar_start + (pages[cur_page] - prop->dram_base_address); + /* make sure not to export over exported size */ + chunk_size = min(chunk_size, cur_size_to_export); rc = set_dma_sg(sg, bar_address, chunk_size, dev, dir); if (rc) goto error_unmap; + cur_size_to_export -= chunk_size; cur_page = j; } } @@ -1722,6 +1736,7 @@ static struct sg_table *hl_map_dmabuf(struct dma_buf_attachment *attachment, phys_pg_pack->pages, phys_pg_pack->npages, phys_pg_pack->page_size, + phys_pg_pack->exported_size, attachment->dev, dir); else @@ -1729,6 +1744,7 @@ static struct sg_table *hl_map_dmabuf(struct dma_buf_attachment *attachment, &hl_dmabuf->device_address, 1, hl_dmabuf->dmabuf->size, + 0, attachment->dev, dir); @@ -2033,6 +2049,7 @@ static int export_dmabuf_from_addr(struct hl_ctx *ctx, u64 addr, u64 size, u64 o if (rc) goto dec_memhash_export_cnt; + phys_pg_pack->exported_size = size; hl_dmabuf->phys_pg_pack = phys_pg_pack; hl_dmabuf->memhash_hnode = hnode; } else { From patchwork Thu Dec 8 15:13:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 31416 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp256648wrr; Thu, 8 Dec 2022 07:21:20 -0800 (PST) 
X-Google-Smtp-Source: AA0mqf73kXiPM9CE2iq54aeMr+jjYYqdU+Xy+bZTsptTAR9J9Y2CxjNyed7ahPpkkPoxVxLxKSY8 X-Received: by 2002:a17:907:cbc9:b0:7c0:8a2c:8886 with SMTP id vk9-20020a170907cbc900b007c08a2c8886mr20752822ejc.183.1670512880109; Thu, 08 Dec 2022 07:21:20 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670512880; cv=none; d=google.com; s=arc-20160816; b=gt4/B4A/oZqMUHBw3RQzC8ITGyfbwuHFBY4XMV34BQeo3LZB2G4Q0s1MujbqwJ7agm NbYz7WekwuljvTVE4YeEa2w2s+j28GqZj+m6URRs/3QHLFyESWxBIyOETlRGMICKynu7 kD+0rWiglh+dUmlU3QZxBLcsq5Vxb4CsD80M5lTGki4r9zKHQL7D+JCzb3FH83xFTueo KtqVLxfAT0PtMWboGTOUC5DN/V8jwkR/vaorDsa1ZzwH15YBlszlsJc+j2Hj8IBOgpMO 0qkH5Dq4HaAQ06knnUw4oYXCX5CJL3eShdBxvT1fkLr0SRqdH861XgjeYt2jh4sAY4Sg CRgA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=OdRFT1tlGPpJ+YhelN9ypxGpFB7xTu2hP/alDRGELgM=; b=Qjh/1u0XKivBaJVn0iCQCWF0YgLRazcgXYbnRdaxYJXpP56gQvU9+ysqggEssH/fmc uwEIivi/Y6KMJ4J2Whee0vQCQM1w6IU3GL8ZQ+wz9ush5eK8nUqg6jtifbFgOTt1JO1O hSSuhCDNjHAPP1fTxWQvwd5uT7xFozUY2PmULROluAxHTWW4Q1MfHu2Wx+oGNc1f5v/k W8sVTRlIciuY10g9W6UNvUQlRpx2mU29K19KajF/HwWpiqZyWSTL13GAzN53aw0XEm5C mAPBZpPw4lUA86deeGb3BgL/Yu5/ZE4gWmK8YnMN9ElnXJYm3gWMqJZwmCDyPBKgdSNP 3Nwg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=t+6qIkS3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id cf20-20020a170906b2d400b007c0d6b33bb8si5758719ejb.641.2022.12.08.07.20.56; Thu, 08 Dec 2022 07:21:20 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=t+6qIkS3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230324AbiLHPPn (ORCPT + 99 others); Thu, 8 Dec 2022 10:15:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58632 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230151AbiLHPOe (ORCPT ); Thu, 8 Dec 2022 10:14:34 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C4977ABA3B for ; Thu, 8 Dec 2022 07:14:26 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 5B88C61F7C for ; Thu, 8 Dec 2022 15:14:26 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id EB2EEC433C1; Thu, 8 Dec 2022 15:14:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670512465; bh=Efq5ukaDJ2ts0dWvrsIKmLu4t+yi7P/WceV654akUco=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; 
b=t+6qIkS34HzWkqwr1Qnuodm0vPwocNi7IE+dJqiZH0srusbkvR3o9BS/oSZdYyPYJ R0oCtxpU3t/TZqf6lhu+V8b4sx1IDCBe+qgRyBONFiBnvBgaCKIjNQu1HXnHHEAFjY c5JoHVn8Dm3jLsrRHqtzPnlXFYPREesRkzYdBIPVZh76pLubUFkp/gQuFx0o+ceKfN B4TQ6DX5o5pxQFp/cTosPvFlRtYPwqTM3SadQPIJAxFwAI+aFjoAU2RGIHvmW0iww8 c6bqQFCWC46Nf5O+SGAm+DEnj+w8pxhGJvsevoYANOFeS8rmLvxaGCH7zbl887lJ2+ 6FITrXOCs2fsg== From: Oded Gabbay To: linux-kernel@vger.kernel.org Cc: Tomer Tayar Subject: [PATCH 23/26] habanalabs: fix handling of wait CS for interrupting signals Date: Thu, 8 Dec 2022 17:13:47 +0200 Message-Id: <20221208151350.1833823-23-ogabbay@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org> References: <20221208151350.1833823-1-ogabbay@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751659713370025383?= X-GMAIL-MSGID: =?utf-8?q?1751659713370025383?= From: Tomer Tayar The -ERESTARTSYS return value is not handled correctly when a signal is received while waiting for CS completion. This can lead to bad output values to user when waiting for a single CS completion, and more severe, it can cause a non-stopping loop when waiting to multi-CS completion and until a CS timeout. Fix the handling and exit the waiting if this return value is received. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- .../misc/habanalabs/common/command_submission.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c index cf3b82efc65c..0ec8cdcbb1f5 100644 --- a/drivers/misc/habanalabs/common/command_submission.c +++ b/drivers/misc/habanalabs/common/command_submission.c @@ -2590,7 +2590,9 @@ static int hl_wait_for_fence(struct hl_ctx *ctx, u64 seq, struct hl_fence *fence *status = CS_WAIT_STATUS_BUSY; } - if (error == -ETIMEDOUT || error == -EIO) + if (completion_rc == -ERESTARTSYS) + rc = completion_rc; + else if (error == -ETIMEDOUT || error == -EIO) rc = error; return rc; @@ -2849,6 +2851,9 @@ static int hl_wait_multi_cs_completion(struct multi_cs_data *mcs_data, if (completion_rc > 0) mcs_data->timestamp = mcs_compl->timestamp; + if (completion_rc == -ERESTARTSYS) + return completion_rc; + mcs_data->wait_status = completion_rc; return 0; @@ -2994,15 +2999,15 @@ static int hl_multi_cs_wait_ioctl(struct hl_fpriv *hpriv, void *data) free_seq_arr: kfree(cs_seq_arr); - if (rc) - return rc; - - if (mcs_data.wait_status == -ERESTARTSYS) { + if (rc == -ERESTARTSYS) { dev_err_ratelimited(hdev->dev, "user process got signal while waiting for Multi-CS\n"); - return -EINTR; + rc = -EINTR; } + if (rc) + return rc; + /* update output args */ memset(args, 0, sizeof(*args)); From patchwork Thu Dec 8 15:13:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 31412 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp255400wrr; Thu, 8 Dec 2022 07:19:16 -0800 (PST) X-Google-Smtp-Source: 
AA0mqf6xZ/MtVgJYX8ciuNqf/Lvu7X+09MZqACezVco/VN8Su+/en1ygSMyzMgG9eMZB5EN4lq4k X-Received: by 2002:a05:6402:1f85:b0:462:2410:9720 with SMTP id c5-20020a0564021f8500b0046224109720mr7266670edc.84.1670512756459; Thu, 08 Dec 2022 07:19:16 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670512756; cv=none; d=google.com; s=arc-20160816; b=nx2EWgSJDd39QU/vGvFh9oUXunxuuQDoH79Vl2Eedr3n3psVGQfJ44sh+UiRF/0rdy /JD0A7+vjgihvxiHTKLZgEwAEukZaxGHNCx30ME+HGllBZVaY5kvlyr91jMP9NP2BHbV Y6PKOMU8vCbzHPgrCbM1wREexaLlnGU/poEmsmmAuD/rvK9i3QUGGejtHHsKT+RBYRrw MXZ92pNc6EpVPmZ+aoLdkaeEZ5nhhcI0BUw2IbFb/TWFIFmgPVhkUWHbq9fF8M4gU8lf ifzaUENy51jxwDAB5w665V0cLLZVXH+xs7Oz1tsEGjimJXy/6HUA9iBnjMpv1+0UcPiQ 4Dfw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=g/zvF6QFx5KTrM+BJWb2far9HNmHmu8/2MjVb30x8eY=; b=XBqBYR+dj4HW72lmEqT0tV7xsnFz2oW375j7Pa4VvBEtlz91AfvfRHEfptjhA8aXn1 O8gY4+CiMS9AgFKGYIY/Qm0b2pwQmcyu+KHmtVDE3GKAoTbCtBzTtpUK4GRcsSYDjFKX GpcpGCPw/fxPq6cxyaQyVFLtVNV0iwswq6fKRvluCeqidguAzSZGRlltWXGm+dDcLLcZ DUQ/KAmfImMIJYa5zEXUro4e9Zk91pCV3Qw7bXGquwwTQARiPDSnCE9MDH8+Abi9H5W4 ZIcRHWVO/PV2lBb2qrmni6lqz0/u49+rrjdr2yXQF//qf9S2pAIVMTFaZZEfHv9Az4ig PFsA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=aGFduoip; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b15-20020a056402350f00b0046311e80ebcsi7483048edd.151.2022.12.08.07.18.52; Thu, 08 Dec 2022 07:19:16 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=aGFduoip; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229894AbiLHPPt (ORCPT + 99 others); Thu, 8 Dec 2022 10:15:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59234 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229665AbiLHPOj (ORCPT ); Thu, 8 Dec 2022 10:14:39 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8B9F7AD339 for ; Thu, 8 Dec 2022 07:14:29 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 483A5B823DC for ; Thu, 8 Dec 2022 15:14:28 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 46167C43151; Thu, 8 Dec 2022 15:14:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670512467; bh=HoYALqjvqnCPCS7thHq8D4m998sGCS3WKDk+gLoFYak=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; 
b=aGFduoip26eis6oUkuGUmR4uyB1MPqQ4JOC1XmvUMBCcOrVk+GfuqsOawYVIquHkT ynubygIxLnbtbJjN1ZmIbBix/PJlY2bHJAhbK2DOirLT2Prz9NaCk3SCNxWPHZb6W4 XJH2yXA1AuTp3FsmbHdgcpuvdRATbK0TKbVpOtZlTcjf84ZGEKuQC8wutt1kl/vXFo 6dntuvl1rixXT86/d/SL7+asfWs9liyOcYbSAF4pTYHOF8RK367NQVrU/nlm34QE5e Vu8I6xqFRFb2MvTQ14CVCIqqKrDxcaWapmKvpeEYEKjvqJ/0qSUys0Kp1C2eDxjHX0 iXsW82VyOXQIQ== From: Oded Gabbay To: linux-kernel@vger.kernel.org Cc: Tomer Tayar Subject: [PATCH 24/26] habanalabs: put fences in case of unexpected wait status Date: Thu, 8 Dec 2022 17:13:48 +0200 Message-Id: <20221208151350.1833823-24-ogabbay@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org> References: <20221208151350.1833823-1-ogabbay@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751659584406555024?= X-GMAIL-MSGID: =?utf-8?q?1751659584406555024?= From: Tomer Tayar Need to put fences even if an unexpected status value is received while waiting for a fence. Signed-off-by: Tomer Tayar Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/common/command_submission.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c index 0ec8cdcbb1f5..1543ef993f8e 100644 --- a/drivers/misc/habanalabs/common/command_submission.c +++ b/drivers/misc/habanalabs/common/command_submission.c @@ -2722,7 +2722,8 @@ static int hl_cs_poll_fences(struct multi_cs_data *mcs_data, struct multi_cs_com break; default: dev_err(hdev->dev, "Invalid fence status\n"); - return -EINVAL; + rc = -EINVAL; + break; } } From patchwork Thu Dec 8 15:13:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 31413 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp256126wrr; Thu, 8 Dec 2022 07:20:29 -0800 (PST) X-Google-Smtp-Source: AA0mqf7/fJ/pbny2DO6Q2IotOkldrODBbQ2mdfLOdiT7Uzt9Ik3loX67XgSCna/vrqnrf6QXAwVm X-Received: by 2002:a63:a55:0:b0:478:ff89:feef with SMTP id z21-20020a630a55000000b00478ff89feefmr3864550pgk.205.1670512828728; Thu, 08 Dec 2022 07:20:28 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670512828; cv=none; d=google.com; s=arc-20160816; b=Nwvaz+CHfBQYpRvXDYuaoldpYCFHgqyGCcUEWq24qIxTs1vh5ZyohXuXMN4VUbtu1X IDpC1sy6PGbSXa3a0OViF3oxv6q0U7pLfN1SxSTPj7JcVN2jJwuct0Kcrw4HJxUHjZXh nPdHCG8m98kkQbxm1AvxuRiHQIwhI50mZtVU8UwPKBIRkAGMsjGtSVZbYgALSkOL/54c xAEYwRqS9pwhUF0jza1E3/wfii6vM2uTCbTrkBSKg3Mn8bljdjPVY5YqNjs9JIymb7x0 Ed7QAGX2+aZoABSLm+N4GCesv4Mx74z+NWZkAjfEF7cTH3TuqL/UtV1pJWBx+ZgWzeZR i37g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ZoNLTffFVXIQAxgOi23WDVMicZSVoO0LQ8+w5eQjGss=; b=SCD8xzhmxG2rIlpn1gGigOneq51KafYdpVJTlWhqjW9Y8vrbCayBnyfpzIB9vFP155 SsBuyrAq5uYqifcBuNmWLZdzFB5y27Sp4zWDJWDmdENOsPnAWZp6nr4/LEgnhU94I17q 
cgJb7K50vjmDjv/YQ0moBfvhZP32CEMxUIfVVsMZPt4Acp/1hajKy88lAqnO29k4gENx j4ga8iY7bWzNOEg8vrl5QvIc9ycsfJbgOtu5kXLJUANksSzuFtc0iqyV9QUxrgHIj3Im yxMcR2qB1nMfsTH2ppke/HqJ8wg4a6Nc3Qp195HWrw51s2LysLUuNNbS8FP8RroLeOf8 ojkQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=K2HkNy6s; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j36-20020a635964000000b004786230ec58si23652237pgm.169.2022.12.08.07.20.13; Thu, 08 Dec 2022 07:20:28 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=K2HkNy6s; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230334AbiLHPPy (ORCPT + 99 others); Thu, 8 Dec 2022 10:15:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59280 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230162AbiLHPOk (ORCPT ); Thu, 8 Dec 2022 10:14:40 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6F63CAD338 for ; Thu, 8 Dec 2022 07:14:29 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id F14D661F80 for ; Thu, 8 Dec 2022 15:14:28 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 94101C433D6; Thu, 8 Dec 2022 15:14:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670512468; bh=Mu/9dkl66q/9zHOO3N55f6Zqs7iCpKF6LOtWcXein+k=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=K2HkNy6s5QjxE2i+UKvNQcnPwv6kFyzkuYoIHGFr/Kua9NxLXiuFLSpqHdQRIOqo4 HEGt+AuJhGNGi8aQ8z0d6a24oUUOcW0sq9qOfH2kLcUQu9saVjFxeJ8O27BkaUAl1A IytSdbZ9X+NDPe2nqmgpm2wL2C13OWzqUvC9SNmoRmQ5xpHa0vT7wp2M5p2Y4xHP8f CWH7mmeNvxHsQKxBmmJmJb/B60iaLWTFmKnsDttrqSpFGkFZ/4BLxsUcik/o/Qfr91 v7m4ccq35rbQO6Hq1Gl3VsbQJJwNMUkvq3i7uuUxIF5WUg7csKca2CldSZicW5/jw5 vMo0sBJvh6ioA== From: Oded Gabbay To: linux-kernel@vger.kernel.org Cc: Ohad Sharabi Subject: [PATCH 25/26] habanalabs/gaudi2: wait for preboot ready if HW state is dirty Date: Thu, 8 Dec 2022 17:13:49 +0200 Message-Id: <20221208151350.1833823-25-ogabbay@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org> References: <20221208151350.1833823-1-ogabbay@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751659659769525308?= X-GMAIL-MSGID: =?utf-8?q?1751659659769525308?= From: Ohad Sharabi Instead of waiting for BTM indication we should wait for preboot ready. Consider the below scenario: 1. FW update is being triggered - setting the dirty bit 2. hard reset will be triggered due to the dirty bit 3. FW initiates the reset: - dirty bit cleared - BTM indication cleared - preboot ready indication cleared 4. during hard reset: - BTM indication will be set - BIST test performed and another reset triggered 5. only after this reset the preboot will set the preboot ready When polling on BTM indication alone we can lose sync with FW while trying to communicate with FW that is during reset. To overcome this we will always wait to preboot ready indication. Signed-off-by: Ohad Sharabi Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/common/firmware_if.c | 2 +- drivers/misc/habanalabs/common/habanalabs.h | 1 + drivers/misc/habanalabs/gaudi2/gaudi2.c | 26 +++++++++++++++++++- 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/drivers/misc/habanalabs/common/firmware_if.c b/drivers/misc/habanalabs/common/firmware_if.c index 537b1ae3fcb7..cda0bf3dbf1b 100644 --- a/drivers/misc/habanalabs/common/firmware_if.c +++ b/drivers/misc/habanalabs/common/firmware_if.c @@ -1352,7 +1352,7 @@ static void detect_cpu_boot_status(struct hl_device *hdev, u32 status) } } -static int hl_fw_wait_preboot_ready(struct hl_device *hdev) +int hl_fw_wait_preboot_ready(struct hl_device *hdev) { struct pre_fw_load_props *pre_fw_load = &hdev->fw_loader.pre_fw_load; u32 status; diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h index de715c91a87e..e5443bf7fe12 100644 --- a/drivers/misc/habanalabs/common/habanalabs.h +++ b/drivers/misc/habanalabs/common/habanalabs.h @@ -3745,6 +3745,7 @@ int hl_fw_cpucp_power_get(struct hl_device *hdev, u64 *power); void hl_fw_ask_hard_reset_without_linux(struct hl_device *hdev); void hl_fw_ask_halt_machine_without_linux(struct hl_device *hdev); int hl_fw_init_cpu(struct hl_device *hdev); +int hl_fw_wait_preboot_ready(struct hl_device *hdev); int hl_fw_read_preboot_status(struct hl_device *hdev); int hl_fw_dynamic_send_protocol_cmd(struct hl_device *hdev, struct fw_load_mgr *fw_loader, diff --git a/drivers/misc/habanalabs/gaudi2/gaudi2.c b/drivers/misc/habanalabs/gaudi2/gaudi2.c index ba3b0ae76ebf..5242b6f6bf95 100644 --- a/drivers/misc/habanalabs/gaudi2/gaudi2.c +++ b/drivers/misc/habanalabs/gaudi2/gaudi2.c @@ -5484,7 +5484,31 @@ static void gaudi2_hw_fini(struct hl_device *hdev, bool hard_reset, bool fw_rese skip_reset: if (driver_performs_reset || hard_reset) - gaudi2_poll_btm_indication(hdev, reset_sleep_ms, poll_timeout_us); + /* + * Instead of waiting for BTM indication we should wait for preboot ready: + * Consider the below scenario: + * 1. FW update is being triggered + * - setting the dirty bit + * 2. hard reset will be triggered due to the dirty bit + * 3. FW initiates the reset: + * - dirty bit cleared + * - BTM indication cleared + * - preboot ready indication cleared + * 4. during hard reset: + * - BTM indication will be set + * - BIST test performed and another reset triggered + * 5. only after this reset the preboot will set the preboot ready + * + * when polling on BTM indication alone we can lose sync with FW while trying to + * communicate with FW that is during reset. 
+ * to overcome this we will always wait to preboot ready indication + */ + if ((hdev->fw_components & FW_TYPE_PREBOOT_CPU)) { + msleep(reset_sleep_ms); + hl_fw_wait_preboot_ready(hdev); + } else { + gaudi2_poll_btm_indication(hdev, reset_sleep_ms, poll_timeout_us); + } else gaudi2_get_soft_rst_done_indication(hdev, poll_timeout_us); From patchwork Thu Dec 8 15:13:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oded Gabbay X-Patchwork-Id: 31414 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp256248wrr; Thu, 8 Dec 2022 07:20:41 -0800 (PST) X-Google-Smtp-Source: AA0mqf5IrFGiaSMxp3vryvKphFKosvfWRrXpJf+j5egrIVu0scRFbHi5Ft6G04FDXn83SH/GacZf X-Received: by 2002:a17:906:4a5a:b0:7c0:e6d7:dabc with SMTP id a26-20020a1709064a5a00b007c0e6d7dabcmr15983454ejv.227.1670512841067; Thu, 08 Dec 2022 07:20:41 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670512841; cv=none; d=google.com; s=arc-20160816; b=SCGQkt+faxkWaajiuoXBoZ9wp4vxbt1AdtsWaHeJ3EBD6dRPI6/3yldi5jKtxF6BFz C3NQjm9Gs4YHj5/2yo6/a3NJ25/j/DkwNN1+11hHN240BGSVBsQQxY9CvqA2mLpY2c+F sRuMtVLY/NOCvglGibHTY5kd3LmrAR5ayuf6Aj3aRJlnvmFePxZOox7B75FpIvwSCMl5 zECnPzOoKEfw3WZcHALLbt93Z/kS1XqibyVLvzjmF/HWaV3RbQNBIJ6hVpXDjtp7JeXL eawPozLm3U/9q+0SYXlq/LgiX7WjKtuvVfck3v2kmZVRXz1lTIB9cQK/OMDdWUeDHRVA 7xFw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=6HVfGw8Fp30CZkGs21Qts96598GEVF/4hNPySCaGemM=; b=DNEB+2HALxctTaMSVmqIEoOzQxiExl51cKuPHbMTO9EpvYJ6cl8Em2zn3nzKhzq9XV vNrFrEY8Ai0YIOzFFGgqXw+ZizLVJ3Iupxyam4ugdFnfwNRxgGElubptJ6WYBfWbRoKI BZ31Sd6/yLYgei5ERYfY9ZnlP1kE3UknDhEtWEB3q4EtshNNaLhzby8R4yHlWV34AS4U 0U76SPDriZEpg04pCI1T9eGYYaVftBnDajiggSQAdluVpR2dXMWbqhyjTN5IKfVW1T53 pvV6zjq8FPwsSfY4MzPa6Gc1n2LgXndXoOUdBOxCPTvW4QqzmiES0prxx7IKyGOD6YR5 IMwA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="jxo4D/pF"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id b10-20020aa7df8a000000b0046b2327bf88si5912709edy.76.2022.12.08.07.20.17; Thu, 08 Dec 2022 07:20:41 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="jxo4D/pF"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230337AbiLHPP4 (ORCPT + 99 others); Thu, 8 Dec 2022 10:15:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59416 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230221AbiLHPOp (ORCPT ); Thu, 8 Dec 2022 10:14:45 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B5CFFAD33F for ; Thu, 8 Dec 2022 07:14:30 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 4FFF161F6B for ; Thu, 8 Dec 2022 15:14:30 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E2899C433D7; Thu, 8 Dec 2022 15:14:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670512469; bh=fGmEfHJ5sJXZ4aGfedlfBXl5fLVhG6tOkoie688PUzU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=jxo4D/pFiATIio7WyLEC0/K6IM0pK+9Wv2tJFVntxAgrntpJQuNPC2coDRzAHfgny SfkHLcmb5Jx2yJI6HSUx6aF73hkavskXD38lMtQw1YALyvZPi72mZ6bIqMjG5EOzf2 gGY0yrEoqodUg7I0bB4jeEGDAUAE7JMbh0CmKYUlnbs32xbAe9jwv+s+XbRrk53xAR 7NRw4w5fQtYNVSISIOJVny3jNWYUQAzjjVEztFVcFKOlqjs6/Awu4vl1xPWEJcNPIm 75EZtsiBzRMx05wmEdd6EUGR4wWtReeDiP8K5jP/gHtzzEdKJ0Bnyk7L/H4Cl5bVkT 6Lz06slDlxkdw== From: Oded Gabbay To: linux-kernel@vger.kernel.org Cc: farah kassabri Subject: [PATCH 26/26] habanalabs: fix wrong variable type used for vzalloc Date: Thu, 8 Dec 2022 17:13:50 +0200 Message-Id: <20221208151350.1833823-26-ogabbay@kernel.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221208151350.1833823-1-ogabbay@kernel.org> References: <20221208151350.1833823-1-ogabbay@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751659672749730373?= X-GMAIL-MSGID: =?utf-8?q?1751659672749730373?= From: farah kassabri vzalloc expects void* and not void __iomem*. 
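
The sparse-clean pattern the fix converges on looks roughly like the sketch below; the function and variable names are illustrative, not taken from the driver. The I/O-mapped source keeps its __iomem annotation, the vzalloc()'d scratch buffer is a plain void *, and memcpy_fromio() performs the crossing between the two address spaces.

```c
/*
 * Illustrative sketch (not the driver's actual code): keep __iomem on
 * the mapped source, give vzalloc()/vfree() a plain kernel pointer,
 * and bridge the two with memcpy_fromio().
 */
#include <linux/errno.h>
#include <linux/io.h>
#include <linux/vmalloc.h>

static int copy_desc_from_device(void __iomem *src, size_t size)
{
	void *tmp;

	tmp = vzalloc(size);		/* plain pointer, sparse-clean */
	if (!tmp)
		return -ENOMEM;

	memcpy_fromio(tmp, src, size);	/* explicit __iomem -> RAM copy */

	/* ... validate the copied descriptor here ... */

	vfree(tmp);
	return 0;
}
```
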
Signed-off-by: farah kassabri Reviewed-by: Oded Gabbay Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/common/firmware_if.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/misc/habanalabs/common/firmware_if.c b/drivers/misc/habanalabs/common/firmware_if.c index cda0bf3dbf1b..ee4d1c5ca527 100644 --- a/drivers/misc/habanalabs/common/firmware_if.c +++ b/drivers/misc/habanalabs/common/firmware_if.c @@ -2019,9 +2019,10 @@ static int hl_fw_dynamic_read_and_validate_descriptor(struct hl_device *hdev, struct fw_load_mgr *fw_loader) { struct lkd_fw_comms_desc *fw_desc; - void __iomem *src, *temp_fw_desc; struct pci_mem_region *region; struct fw_response *response; + void *temp_fw_desc; + void __iomem *src; u16 fw_data_size; enum pci_region region_id; int rc;