From patchwork Sat Nov 19 07:48:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "yekai (A)" X-Patchwork-Id: 23227 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp603186wrr; Fri, 18 Nov 2022 23:58:52 -0800 (PST) X-Google-Smtp-Source: AA0mqf7LxyeoLGShXltSwLvTgAFWIRUC2fX3gSNR+XeMksKZ1KrCzMh/yDLS/2FR+FiqnaJvDFSx X-Received: by 2002:a62:1b4a:0:b0:573:20a7:d with SMTP id b71-20020a621b4a000000b0057320a7000dmr7568414pfb.65.1668844732510; Fri, 18 Nov 2022 23:58:52 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668844732; cv=none; d=google.com; s=arc-20160816; b=PC4mJNN8PdFyXNKLRUHt5vEdmnie0+h7UXOSUBs0kenGwgG8WJd+AfNMjtvPGK6H8t FMP/wPjLlZ/BwsB/tMSH5mrxRrKXPOL7Gh6qFbFi01TeHiAyfKdCYoT3LoURfvQl5cKo 8o3nAzuwX5av3L5LTl++rOCAWB25HX49xNpu+pEvvyVgvOVPwwS+q4gojFoO+kf9ucLZ DCHPnIcmyRMkSunVYUXTnyYWMlEvXJCUp4+e49Idhalxy11C5KB4sFefoJm3IzjrDN0I k6Xt3dPOiGkrq3f8vUODeNxQIqqQYZX30nQ9ITXXreN86CKbb6ZRs9upRfS60vRzLRPe WcYg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:in-reply-to:message-id :date:subject:cc:to:from; bh=HwFsCqbxecoVDBrGKEzGt3SsZ3uAAcwdkCeRksugWVw=; b=docahIXOtRzxmJkpC6sHud/kq37kMqauFGkNqxd2MXbk5APu91Z0zY42xXq6bn0pvQ XkcWDfz7f+23/drqaB298RjU/UoBKlrAsirDr3sFZ2QVbdS+dcopJ+U0GSDq7RB79eTj 6GK3O5Tpn7j9ip2j96XPZ/RAtmcM0IMQkwp1Eb2RwSbnJEfmCo9XvY8si/GogBLynJnC 1qwuKZvwghMxdmVqjgi0ooCK1YWx3AFOvY16AuKpd0AZ87d3B/AV3qJL3eqkWgybMPp3 ZvC/aB+TUB+b7Ij2JLZYngxV3IRwSSmvwue7T8NOkutOzA39OAn6RYLyU7JIWXsSJoD/ SnTw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id k9-20020a170902c40900b00186748fd1a9si7375440plk.170.2022.11.18.23.58.39; Fri, 18 Nov 2022 23:58:52 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233114AbiKSHyd (ORCPT + 99 others); Sat, 19 Nov 2022 02:54:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51964 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229500AbiKSHyb (ORCPT ); Sat, 19 Nov 2022 02:54:31 -0500 Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AFFB6167E6; Fri, 18 Nov 2022 23:54:29 -0800 (PST) Received: from dggpeml500025.china.huawei.com (unknown [172.30.72.55]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4NDmBN0jxxzRpBC; Sat, 19 Nov 2022 15:54:04 +0800 (CST) Received: from dggpeml100012.china.huawei.com (7.185.36.121) by dggpeml500025.china.huawei.com (7.185.36.35) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Sat, 19 Nov 2022 15:54:27 +0800 Received: from huawei.com (10.67.165.24) by dggpeml100012.china.huawei.com (7.185.36.121) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Sat, 19 Nov 2022 15:54:27 +0800 From: Kai Ye To: , CC: , , , , Subject: [PATCH v10 1/3] uacce: supports device isolation feature Date: Sat, 19 Nov 2022 07:48:15 +0000 Message-ID: <20221119074817.12063-2-yekai13@huawei.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221119074817.12063-1-yekai13@huawei.com> References: <20221119074817.12063-1-yekai13@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.67.165.24] X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To dggpeml100012.china.huawei.com (7.185.36.121) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1749910534639137525?= X-GMAIL-MSGID: =?utf-8?q?1749910534639137525?= UACCE adds the hardware error isolation feature. To improve service reliability, some uacce devices that frequently encounter hardware errors are isolated. Therefore, this feature is added. Users can configure the hardware error threshold by 'isolate_strategy' sysfs node. The user space can get the device isolated state by 'isolate' sysfs node. If the number of device errors exceeds the configured error threshold, the device will be isolated. It means the uacce device is unavailable. Signed-off-by: Kai Ye --- drivers/misc/uacce/uacce.c | 50 ++++++++++++++++++++++++++++++++++++++ include/linux/uacce.h | 12 +++++++++ 2 files changed, 62 insertions(+) diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c index b70a013139c7..946f7e4b364f 100644 --- a/drivers/misc/uacce/uacce.c +++ b/drivers/misc/uacce/uacce.c @@ -363,12 +363,52 @@ static ssize_t region_dus_size_show(struct device *dev, uacce->qf_pg_num[UACCE_QFRT_DUS] << PAGE_SHIFT); } +static ssize_t isolate_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct uacce_device *uacce = to_uacce_device(dev); + + return sysfs_emit(buf, "%d\n", uacce->ops->get_isolate_state(uacce)); +} + +static ssize_t isolate_strategy_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct uacce_device *uacce = to_uacce_device(dev); + u32 val; + + val = uacce->ops->isolate_err_threshold_read(uacce); + + return sysfs_emit(buf, "%u\n", val); +} + +static ssize_t isolate_strategy_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct uacce_device *uacce = to_uacce_device(dev); + unsigned long val; + int ret; + + if (kstrtoul(buf, 0, &val) < 0) + return -EINVAL; + + if (val > UACCE_MAX_ERR_THRESHOLD) + return -EINVAL; + + ret = uacce->ops->isolate_err_threshold_write(uacce, val); + if (ret) + return ret; + + return count; +} + static DEVICE_ATTR_RO(api); static DEVICE_ATTR_RO(flags); static DEVICE_ATTR_RO(available_instances); static DEVICE_ATTR_RO(algorithms); static DEVICE_ATTR_RO(region_mmio_size); static DEVICE_ATTR_RO(region_dus_size); +static DEVICE_ATTR_RO(isolate); +static DEVICE_ATTR_RW(isolate_strategy); static struct attribute *uacce_dev_attrs[] = { &dev_attr_api.attr, @@ -377,6 +417,8 @@ static struct attribute *uacce_dev_attrs[] = { &dev_attr_algorithms.attr, &dev_attr_region_mmio_size.attr, &dev_attr_region_dus_size.attr, + &dev_attr_isolate.attr, + &dev_attr_isolate_strategy.attr, NULL, }; @@ -392,6 +434,14 @@ static umode_t uacce_dev_is_visible(struct kobject *kobj, (!uacce->qf_pg_num[UACCE_QFRT_DUS]))) return 0; + if (attr == &dev_attr_isolate_strategy.attr && + (!uacce->ops->isolate_err_threshold_read && + !uacce->ops->isolate_err_threshold_write)) + return 0; + + if (attr == &dev_attr_isolate.attr && !uacce->ops->get_isolate_state) + return 0; + return attr->mode; } diff --git a/include/linux/uacce.h b/include/linux/uacce.h index 9ce88c28b0a8..0a81c3dfd26c 100644 --- a/include/linux/uacce.h +++ b/include/linux/uacce.h @@ -8,6 +8,7 @@ #define UACCE_NAME "uacce" #define UACCE_MAX_REGION 2 #define UACCE_MAX_NAME_SIZE 64 +#define UACCE_MAX_ERR_THRESHOLD 65535 struct uacce_queue; struct uacce_device; @@ -30,6 +31,9 @@ struct uacce_qfile_region { * @is_q_updated: check whether the task is finished * @mmap: mmap addresses of queue to user space * @ioctl: ioctl for user space users of the queue + * @get_isolate_state: get the device state after set the isolate strategy + * @isolate_err_threshold_write: stored the isolate error threshold to the device + * @isolate_err_threshold_read: read the isolate error threshold value from the device */ struct uacce_ops { int (*get_available_instances)(struct uacce_device *uacce); @@ -43,6 +47,9 @@ struct uacce_ops { struct uacce_qfile_region *qfr); long (*ioctl)(struct uacce_queue *q, unsigned int cmd, unsigned long arg); + enum uacce_dev_state (*get_isolate_state)(struct uacce_device *uacce); + int (*isolate_err_threshold_write)(struct uacce_device *uacce, u32 num); + u32 (*isolate_err_threshold_read)(struct uacce_device *uacce); }; /** @@ -57,6 +64,11 @@ struct uacce_interface { const struct uacce_ops *ops; }; +enum uacce_dev_state { + UACCE_DEV_NORMAL, + UACCE_DEV_ISOLATE, +}; + enum uacce_q_state { UACCE_Q_ZOMBIE = 0, UACCE_Q_INIT, From patchwork Sat Nov 19 07:48:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "yekai (A)" X-Patchwork-Id: 23228 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp603374wrr; Fri, 18 Nov 2022 23:59:39 -0800 (PST) X-Google-Smtp-Source: AA0mqf7J52VstLD9CVNC1ksCLSRxsmhP3Vdqu7+zezRb7WfbldkNpXNgC8HfKzb15exqENa6DIS1 X-Received: by 2002:aa7:d844:0:b0:458:fa8f:f82c with SMTP id f4-20020aa7d844000000b00458fa8ff82cmr8861735eds.246.1668844779164; Fri, 18 Nov 2022 23:59:39 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668844779; cv=none; d=google.com; s=arc-20160816; b=JP3pzPDy0FUDZmwIG6Zzh1BiVl2v4VZxcmTYk4vQ9cIqCh3Pqlb6t7Pd7c5NJ29fWy XZspTWeQWw+r74SnppUKRnz8BsQQ2h2QOWdZmtHjgSF96Ca9flt7hFdduFot8yT8pD59 7CYC6lBpsA5toAcbPRUFii1F4ReOwivuCUkiCDxnvWS9RnyaF6/fv2TWut2pngle6CvW A2klRlYevWDHl+lmKJN682dBZ8SXnlwHUFvl8NM31tvsiY2lTJg+yiMLQnTebqagR0Zp c8nuwaJ20Y7zvmb31O2y8+fZSGaq0HA2VbaThSfV+Z9e66fDTT9ovPIVaUc02as5tHSZ TGew== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:in-reply-to:message-id :date:subject:cc:to:from; bh=cr7+9w02ScSfmJA969h+bv1U/N88PXAivymK9Hx0ppg=; b=Sf6LDemSruYVEai26GqjcJF1fh6GdF2fNrfvTJxYwWIDtg92S8PDg56pN+C5kavM4h llb5wTond6hhtzhQr4FnpRRaZObh8tk+dyzm0h13kfy6DpuSryAzxvPksHjHNPTyM8/7 uHXLxXrNmIPBZ7Bw4qt9V6urEtjXG9LM2UzrwZK/UzvnYYud5LlFUyT7jonH3RixwxIv O8g9rOYlG/vahz07AODdlTuUB8PC5I8YO+65mHuSDt0flix5HF3h3EV/zMlxoFFbrCkN Op6Npn9Q5XMs1zjr3iztcM8UR0Nz7lXMAMjfathWTJ9FQIkpnq+Xo6Oa+l3zcLOKQNsn duYg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x12-20020a170906148c00b00791a3dd01b6si4254319ejc.864.2022.11.18.23.59.15; Fri, 18 Nov 2022 23:59:39 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233366AbiKSHyg (ORCPT + 99 others); Sat, 19 Nov 2022 02:54:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51960 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229515AbiKSHyb (ORCPT ); Sat, 19 Nov 2022 02:54:31 -0500 Received: from szxga08-in.huawei.com (szxga08-in.huawei.com [45.249.212.255]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D309C248DF; Fri, 18 Nov 2022 23:54:29 -0800 (PST) Received: from dggpeml500025.china.huawei.com (unknown [172.30.72.53]) by szxga08-in.huawei.com (SkyGuard) with ESMTP id 4NDmBL5jvtz15Mcm; Sat, 19 Nov 2022 15:54:02 +0800 (CST) Received: from dggpeml100012.china.huawei.com (7.185.36.121) by dggpeml500025.china.huawei.com (7.185.36.35) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Sat, 19 Nov 2022 15:54:28 +0800 Received: from huawei.com (10.67.165.24) by dggpeml100012.china.huawei.com (7.185.36.121) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Sat, 19 Nov 2022 15:54:27 +0800 From: Kai Ye To: , CC: , , , , Subject: [PATCH v10 2/3] Documentation: add the device isolation feature sysfs nodes for uacce Date: Sat, 19 Nov 2022 07:48:16 +0000 Message-ID: <20221119074817.12063-3-yekai13@huawei.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221119074817.12063-1-yekai13@huawei.com> References: <20221119074817.12063-1-yekai13@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.67.165.24] X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To dggpeml100012.china.huawei.com (7.185.36.121) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1749910583169321232?= X-GMAIL-MSGID: =?utf-8?q?1749910583169321232?= Update documentation describing sysfs node that could help to configure hardware error threshold for users in the user space. And describing sysfs node that could read the device isolated state. Signed-off-by: Kai Ye --- Documentation/ABI/testing/sysfs-driver-uacce | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce index 08f2591138af..d3f0b8f3c589 100644 --- a/Documentation/ABI/testing/sysfs-driver-uacce +++ b/Documentation/ABI/testing/sysfs-driver-uacce @@ -19,6 +19,24 @@ Contact: linux-accelerators@lists.ozlabs.org Description: Available instances left of the device Return -ENODEV if uacce_ops get_available_instances is not provided +What: /sys/class/uacce//isolate_strategy +Date: Nov 2022 +KernelVersion: 6.1 +Contact: linux-accelerators@lists.ozlabs.org +Description: (RW) A sysfs node that configure the error threshold for the hardware + isolation strategy. This size is a configured integer value, which is the + number of threshold for hardware errors occurred in one hour. The default is 0. + 0 means never isolate the device. The maximum value is 65535. You can write + a number of threshold based on your hardware. + +What: /sys/class/uacce//isolate +Date: Nov 2022 +KernelVersion: 6.1 +Contact: linux-accelerators@lists.ozlabs.org +Description: (R) A sysfs node that read the device isolated state. The value 1 + means the device is unavailable. The 0 means the device is + available. + What: /sys/class/uacce//algorithms Date: Feb 2020 KernelVersion: 5.7 From patchwork Sat Nov 19 07:48:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "yekai (A)" X-Patchwork-Id: 23229 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp606507wrr; Sat, 19 Nov 2022 00:07:36 -0800 (PST) X-Google-Smtp-Source: AA0mqf5VrAQtqGHBtk7Q8HjKUmWfeSCu3SeCHNgYdWi8EVB4VxBe1aItjlUS59CfGJP1bZn++RGS X-Received: by 2002:a17:902:ab4b:b0:186:85a0:550 with SMTP id ij11-20020a170902ab4b00b0018685a00550mr2919494plb.144.1668845256409; Sat, 19 Nov 2022 00:07:36 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668845256; cv=none; d=google.com; s=arc-20160816; b=L87OAOvAhhFnNBryvSKqocx7f9WtqjpHv1I/MeflOIEdu7YwCJp6tlHcsLnkZhLBsW jxuQOpLIs8Ut0lV/PaNLhjHyITE5L2cdVpHAQwn33bNfKC0oAg25/OJ0BoVjQ3V7fZGC eQrAPeu+g8OzgDJ1UFy6pE26znwjgC42IFp8MHyrd6coTqFi5K7s+Y5TNRvVlUm3vgEH exQfpeciQKUVvWfMVkU/Buac+IkluHa+Ki+qA0yHJSteFyerS85wodNYN1kZuhvr+a9y 3W3RB6xeUyeC4S3aNCYi0g+2os48N//RHtyjnnQCHZTgruA5HIT1ehaxBm8RCoYEa2vT mbAw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:in-reply-to:message-id :date:subject:cc:to:from; bh=HYK0LwOLgyI5UHLUYiBMnpwWcYom0ZkLcFW9hUAPre4=; b=bvwzEoIqTxwsPTob5g0iamjloNZWjm5nSf/ocpcqkwfpOteNVmCynR5QvVK0QEAHYn qSA7SFLgdrOLmibh1UFiDYXKgSz46nXmg2Ueh6CO6uzCbUrPYlNmJiiWFT9xOtdQyf0N 3BBvLomYpPkl+ih14sKIpiKBLxUBvX2GVwIUkl2ImAHyzov6P4U6145RxKzpxC/Jodwo fRbPKemsEuy3ft7f6gxnVyibBJtcoa9c9ZS/3Pl7J0KKVcqjcg/C4L3Uj60hZNmvPjy7 a8EpSKSPryDzs4gjN9yrsQh0WjMu9tMuAS7Uc0bXl5MG7K2fkWjyfvvO119UC7cth0K6 3gZQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b2-20020a63d802000000b00476980fbc85si6018588pgh.105.2022.11.19.00.07.23; Sat, 19 Nov 2022 00:07:36 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232426AbiKSHyo (ORCPT + 99 others); Sat, 19 Nov 2022 02:54:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51972 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231403AbiKSHyb (ORCPT ); Sat, 19 Nov 2022 02:54:31 -0500 Received: from szxga08-in.huawei.com (szxga08-in.huawei.com [45.249.212.255]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D2FDA23166; Fri, 18 Nov 2022 23:54:29 -0800 (PST) Received: from dggpeml500020.china.huawei.com (unknown [172.30.72.56]) by szxga08-in.huawei.com (SkyGuard) with ESMTP id 4NDmBL6wC5z15Mff; Sat, 19 Nov 2022 15:54:02 +0800 (CST) Received: from dggpeml100012.china.huawei.com (7.185.36.121) by dggpeml500020.china.huawei.com (7.185.36.88) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Sat, 19 Nov 2022 15:54:28 +0800 Received: from huawei.com (10.67.165.24) by dggpeml100012.china.huawei.com (7.185.36.121) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Sat, 19 Nov 2022 15:54:28 +0800 From: Kai Ye To: , CC: , , , , Subject: [PATCH v10 3/3] crypto: hisilicon/qm - define the device isolation strategy Date: Sat, 19 Nov 2022 07:48:17 +0000 Message-ID: <20221119074817.12063-4-yekai13@huawei.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221119074817.12063-1-yekai13@huawei.com> References: <20221119074817.12063-1-yekai13@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.67.165.24] X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To dggpeml100012.china.huawei.com (7.185.36.121) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1749911083609173832?= X-GMAIL-MSGID: =?utf-8?q?1749911083609173832?= Define the device isolation strategy by the device driver. The user configures a hardware error threshold value by uacce interface. If the number of hardware errors exceeds the value of setting error threshold in one hour. The device will not be available in user space. The VF device use the PF device isolation strategy. All the hardware errors are processed by PF driver. Signed-off-by: Kai Ye Acked-by: Herbert Xu --- drivers/crypto/hisilicon/qm.c | 169 +++++++++++++++++++++++++++++++--- include/linux/hisi_acc_qm.h | 15 +++ 2 files changed, 169 insertions(+), 15 deletions(-) diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c index 36d70b9f6117..0a0c984a4e7a 100644 --- a/drivers/crypto/hisilicon/qm.c +++ b/drivers/crypto/hisilicon/qm.c @@ -367,6 +367,16 @@ struct hisi_qm_resource { struct list_head list; }; +/** + * struct qm_hw_err - Structure describing the device errors + * @list: hardware error list + * @timestamp: timestamp when the error occurred + */ +struct qm_hw_err { + struct list_head list; + unsigned long long timestamp; +}; + struct hisi_qm_hw_ops { int (*get_vft)(struct hisi_qm *qm, u32 *base, u32 *number); void (*qm_db)(struct hisi_qm *qm, u16 qn, @@ -2469,6 +2479,113 @@ static long hisi_qm_uacce_ioctl(struct uacce_queue *q, unsigned int cmd, return -EINVAL; } +/** + * qm_hw_err_isolate() - Try to set the isolation status of the uacce device + * according to user's configuration of error threshold. + * @qm: the uacce device + */ +static int qm_hw_err_isolate(struct hisi_qm *qm) +{ + struct qm_hw_err *err, *tmp, *hw_err; + struct qm_err_isolate *isolate; + u32 count = 0; + + isolate = &qm->isolate_data; + +#define SECONDS_PER_HOUR 3600 + + /* All the hw errs are processed by PF driver */ + if (qm->uacce->is_vf || isolate->is_isolate || !isolate->err_threshold) + return 0; + + hw_err = kzalloc(sizeof(*hw_err), GFP_KERNEL); + if (!hw_err) + return -ENOMEM; + + /* + * Time-stamp every slot AER error. Then check the AER error log when the + * next device AER error occurred. if the device slot AER error count exceeds + * the setting error threshold in one hour, the isolated state will be set + * to true. And the AER error logs that exceed one hour will be cleared. + */ + mutex_lock(&isolate->isolate_lock); + hw_err->timestamp = jiffies; + list_for_each_entry_safe(err, tmp, &isolate->qm_hw_errs, list) { + if ((hw_err->timestamp - err->timestamp) / HZ > + SECONDS_PER_HOUR) { + list_del(&err->list); + kfree(err); + } else { + count++; + } + } + list_add(&hw_err->list, &isolate->qm_hw_errs); + mutex_unlock(&isolate->isolate_lock); + + if (count >= isolate->err_threshold) + isolate->is_isolate = true; + + return 0; +} + +static void qm_hw_err_destroy(struct hisi_qm *qm) +{ + struct qm_hw_err *err, *tmp; + + mutex_lock(&qm->isolate_data.isolate_lock); + list_for_each_entry_safe(err, tmp, &qm->isolate_data.qm_hw_errs, list) { + list_del(&err->list); + kfree(err); + } + mutex_unlock(&qm->isolate_data.isolate_lock); +} + +static enum uacce_dev_state hisi_qm_get_isolate_state(struct uacce_device *uacce) +{ + struct hisi_qm *qm = uacce->priv; + struct hisi_qm *pf_qm; + + if (uacce->is_vf) + pf_qm = pci_get_drvdata(pci_physfn(qm->pdev)); + else + pf_qm = qm; + + return pf_qm->isolate_data.is_isolate ? + UACCE_DEV_ISOLATE : UACCE_DEV_NORMAL; +} + +static int hisi_qm_isolate_threshold_write(struct uacce_device *uacce, u32 num) +{ + struct hisi_qm *qm = uacce->priv; + + /* Must be set by PF */ + if (uacce->is_vf) + return -EPERM; + + if (qm->isolate_data.is_isolate) + return -EPERM; + + qm->isolate_data.err_threshold = num; + + /* After the policy is updated, need to reset the hardware err list */ + qm_hw_err_destroy(qm); + + return 0; +} + +static u32 hisi_qm_isolate_threshold_read(struct uacce_device *uacce) +{ + struct hisi_qm *qm = uacce->priv; + struct hisi_qm *pf_qm; + + if (uacce->is_vf) { + pf_qm = pci_get_drvdata(pci_physfn(qm->pdev)); + return pf_qm->isolate_data.err_threshold; + } + + return qm->isolate_data.err_threshold; +} + static const struct uacce_ops uacce_qm_ops = { .get_available_instances = hisi_qm_get_available_instances, .get_queue = hisi_qm_uacce_get_queue, @@ -2478,8 +2595,22 @@ static const struct uacce_ops uacce_qm_ops = { .mmap = hisi_qm_uacce_mmap, .ioctl = hisi_qm_uacce_ioctl, .is_q_updated = hisi_qm_is_q_updated, + .get_isolate_state = hisi_qm_get_isolate_state, + .isolate_err_threshold_write = hisi_qm_isolate_threshold_write, + .isolate_err_threshold_read = hisi_qm_isolate_threshold_read, }; +static void qm_remove_uacce(struct hisi_qm *qm) +{ + struct uacce_device *uacce = qm->uacce; + + if (qm->use_sva) { + qm_hw_err_destroy(qm); + uacce_remove(uacce); + qm->uacce = NULL; + } +} + static int qm_alloc_uacce(struct hisi_qm *qm) { struct pci_dev *pdev = qm->pdev; @@ -2506,8 +2637,7 @@ static int qm_alloc_uacce(struct hisi_qm *qm) qm->use_sva = true; } else { /* only consider sva case */ - uacce_remove(uacce); - qm->uacce = NULL; + qm_remove_uacce(qm); return -EINVAL; } @@ -2540,6 +2670,8 @@ static int qm_alloc_uacce(struct hisi_qm *qm) uacce->qf_pg_num[UACCE_QFRT_DUS] = dus_page_nr; qm->uacce = uacce; + INIT_LIST_HEAD(&qm->isolate_data.qm_hw_errs); + mutex_init(&qm->isolate_data.isolate_lock); return 0; } @@ -4029,6 +4161,12 @@ static int qm_controller_reset_prepare(struct hisi_qm *qm) return ret; } + if (qm->use_sva) { + ret = qm_hw_err_isolate(qm); + if (ret) + pci_err(pdev, "failed to isolate hw err!\n"); + } + ret = qm_wait_vf_prepare_finish(qm); if (ret) pci_err(pdev, "failed to stop by vfs in soft reset!\n"); @@ -4336,21 +4474,25 @@ static int qm_controller_reset(struct hisi_qm *qm) qm->err_ini->show_last_dfx_regs(qm); ret = qm_soft_reset(qm); - if (ret) { - pci_err(pdev, "Controller reset failed (%d)\n", ret); - qm_reset_bit_clear(qm); - return ret; - } + if (ret) + goto err_reset; ret = qm_controller_reset_done(qm); - if (ret) { - qm_reset_bit_clear(qm); - return ret; - } + if (ret) + goto err_reset; pci_info(pdev, "Controller reset complete\n"); return 0; + +err_reset: + pci_err(pdev, "Controller reset failed (%d)\n", ret); + qm_reset_bit_clear(qm); + + /* if resetting fails, isolate the device */ + if (qm->use_sva) + qm->isolate_data.is_isolate = true; + return ret; } /** @@ -5271,10 +5413,7 @@ int hisi_qm_init(struct hisi_qm *qm) err_free_qm_memory: hisi_qm_memory_uninit(qm); err_alloc_uacce: - if (qm->use_sva) { - uacce_remove(qm->uacce); - qm->uacce = NULL; - } + qm_remove_uacce(qm); err_irq_register: qm_irqs_unregister(qm); err_pci_init: diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index 61f012d67f60..67f92f696784 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -272,6 +272,20 @@ struct hisi_qm_poll_data { u16 *qp_finish_id; }; +/** + * struct qm_err_isolate + * @isolate_lock: protects device error log + * @err_threshold: user config error threshold which triggers isolation + * @is_isolate: device isolation state + * @uacce_hw_errs: index into qm device error list + */ +struct qm_err_isolate { + struct mutex isolate_lock; + u32 err_threshold; + bool is_isolate; + struct list_head qm_hw_errs; +}; + struct hisi_qm { enum qm_hw_ver ver; enum qm_fun_type fun_type; @@ -341,6 +355,7 @@ struct hisi_qm { struct qm_shaper_factor *factor; u32 mb_qos; u32 type_rate; + struct qm_err_isolate isolate_data; }; struct hisi_qp_status {