Message ID | 20231130075247.3078931-1-quan@os.amperecomputing.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp218924vqy; Wed, 29 Nov 2023 23:53:20 -0800 (PST) X-Google-Smtp-Source: AGHT+IFqmyFUamO3Ee8sLnASxj4yx/jggurY4iDZ1iH/7Uz9p6eiTRImMp7bhedrKOYRRMfO417o X-Received: by 2002:a05:6870:b682:b0:1fa:16e3:7128 with SMTP id cy2-20020a056870b68200b001fa16e37128mr20903556oab.1.1701330800604; Wed, 29 Nov 2023 23:53:20 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1701330800; cv=pass; d=google.com; s=arc-20160816; b=zgaujVjHZl7iC3r04snXbRwycyaZIXW6/nMocOc5u9HM0ibQuMUPmNCOSSgQPSs3+l cEH2nNf2+eAdgBaBK5sJHYh74s4txsdN8LpmPtgNEhzwpKkcNZRVq4hpG/LKfQok/Wki k27aSfhng0VwVQ7n+aHgMy+/nwy08cY1J1Fh/9jmuS4rREE0rJre381cHkO7MxN8rd7b P08RdYXrDutoKbZXt9vRio/C2almS4MXjJy1e5ohvX0fXyubS9pyc5ddnNP8IZdznLEi zxMJm+pD7u6wTZ1M+V85iB5Su5t29KmO+Oh4HkB3+x1FDqUkLiJBL2rDxJ8Wp72g6PEz dv+w== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:content-transfer-encoding :message-id:date:subject:cc:to:from:dkim-signature; bh=asqmTou09C/3k66FLy5kyxq5QD0gk2YCPZVq0DeXgMI=; fh=WoYFUkiINZtHukqq7o95wOpfMz927+l3IoZ1the0pSo=; b=x1GVC9AxOaIw6+VXUTvz7di6xzLl2fpw/eG7VcENFY52lKRSDdpDSW9brV9Ev4Ylrr 7q0uBDIfkhM1gSkAyJ8Zet16AoyH8J9qU80ih31y3DjOn4pov9nBSm68aj5H1ImAOdNm Vec9+pATHDOFyR5YcSkJUPrI3RdsnukbAIXE6rlX3CPHkh+LAiD5lNZ1SY/mquVWPyev i8pQgPlNpwPFPF+V/BqbMvKyeUhUfAZT0pmnoLsiKVOowZfqV2YngHYHjIa27xxDnbkf pPJ+7c4NMkKUP/YOBK7f57T64kZR4y8zf4u1IZOkg7jD6LaLJMPL27LGvGrcOwZw1aGF qAcQ== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@os.amperecomputing.com header.s=selector2 header.b=GAoFdcgZ; arc=pass (i=1 spf=pass spfdomain=os.amperecomputing.com dkim=pass dkdomain=os.amperecomputing.com dmarc=pass fromdomain=os.amperecomputing.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amperecomputing.com Received: from snail.vger.email (snail.vger.email. [2620:137:e000::3:7]) by mx.google.com with ESMTPS id s22-20020a056a0008d600b006c6930e7540si717215pfu.121.2023.11.29.23.53.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 29 Nov 2023 23:53:20 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@os.amperecomputing.com header.s=selector2 header.b=GAoFdcgZ; arc=pass (i=1 spf=pass spfdomain=os.amperecomputing.com dkim=pass dkdomain=os.amperecomputing.com dmarc=pass fromdomain=os.amperecomputing.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amperecomputing.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id ED0BA802F857; Wed, 29 Nov 2023 23:53:18 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229971AbjK3HxK (ORCPT <rfc822;ruipengqi7@gmail.com> + 99 others); Thu, 30 Nov 2023 02:53:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54880 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229596AbjK3HxJ (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 30 Nov 2023 02:53:09 -0500 Received: from NAM10-DM6-obe.outbound.protection.outlook.com (mail-dm6nam10on2096.outbound.protection.outlook.com [40.107.93.96]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 66195A3; Wed, 29 Nov 2023 23:53:14 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=h+WbDS611KgkvhVYyrViY76mtvhZrZonQlW/ch3Zuo4mXN6bRmVLaimul1pNWY8naP+lX3MXpQA+xtq6WKGZZr72uoWCIFRdv0JvgatewRv/JB6pAxTJYLSfoHW6V4mukiBKC3iHeCt6GZ35kEtVvfRbuuTg9BTZE1h4QusEkRmkMJr+zltkJ2moadtrnqbQdL0OGwVY3PZvTGJlegdEnpD+glQaA50IfLHHLP8GBvCpzjSsAPTRIq4nkE3TN4LSzmu4F9sf4oSIIZarXo/Y2Y4NWbrtnfdhCAC1gEI0tSpih6X7sWgpebMpTeWwsRb7uCXb2jg73u4fzor+ehJU2w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=asqmTou09C/3k66FLy5kyxq5QD0gk2YCPZVq0DeXgMI=; b=P4BgUFGtHTLh0Lk7Rx6WUO9BckKfSSYCFoNFTM5bRRm+kELxR4D3JrSjYpwYyRMICPmE17g1dupXxRZ1z/G7ShFeXl2806uwCz/JUIMjRIOWoFVXkQOCRoOHb4hViluyfXXIJQbNKwT1TcWnWKHUEXH/RiDc4fDVHHRPdJyE/+/Ijq9Op8EGcD1q1kDeUbjksWX9aLqJ/DId1Q1ugph6ckqRQoP7sF5M6dIhELG7c+zYmCF/DDkthHLimjES3JY6SUeKOU+Bx2KtAnmEhlAZWS+v24ATvPghv/YAMI0lsddOtq08xsI69G7JM46nGoz7FTx0AS18XZL3co40fycTnw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=os.amperecomputing.com; dmarc=pass action=none header.from=os.amperecomputing.com; dkim=pass header.d=os.amperecomputing.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=os.amperecomputing.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=asqmTou09C/3k66FLy5kyxq5QD0gk2YCPZVq0DeXgMI=; b=GAoFdcgZE+1LSIQzgSyGScnyp3V3qOjGkDFTlMc/D5wi50/gj0thBe9ZV/0vVjO41WbkVKB+zOSHMMjE+tDPJRjGW7W1I8eACZEZn7yod/KsvuMH0b/2kM5wN7rfOgn04BtgkdbNL2+3DJAT9JcO41C9jd2G3QoKwl3Q/yrVPeU= Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=os.amperecomputing.com; Received: from SN4PR01MB7455.prod.exchangelabs.com (2603:10b6:806:202::11) by BN0PR01MB7037.prod.exchangelabs.com (2603:10b6:408:149::9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7046.23; Thu, 30 Nov 2023 07:53:10 +0000 Received: from SN4PR01MB7455.prod.exchangelabs.com ([fe80::5682:1d84:171a:1d68]) by SN4PR01MB7455.prod.exchangelabs.com ([fe80::5682:1d84:171a:1d68%3]) with mapi id 15.20.7025.022; Thu, 30 Nov 2023 07:53:10 +0000 From: Quan Nguyen <quan@os.amperecomputing.com> To: Jeremy Kerr <jk@codeconstruct.com.au>, Matt Johnston <matt@codeconstruct.com.au>, "David S . Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, openbmc@lists.ozlabs.org, Open Source Submission <patches@amperecomputing.com> Cc: Phong Vo <phong@os.amperecomputing.com>, Thang Nguyen <thang@os.amperecomputing.com>, Quan Nguyen <quan@os.amperecomputing.com>, Dung Cao <dung@os.amperecomputing.com> Subject: [PATCH] mctp i2c: Requeue the packet when arbitration is lost Date: Thu, 30 Nov 2023 14:52:47 +0700 Message-Id: <20231130075247.3078931-1-quan@os.amperecomputing.com> X-Mailer: git-send-email 2.35.1 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-ClientProxiedBy: SI2PR04CA0017.apcprd04.prod.outlook.com (2603:1096:4:197::15) To SN4PR01MB7455.prod.exchangelabs.com (2603:10b6:806:202::11) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: SN4PR01MB7455:EE_|BN0PR01MB7037:EE_ X-MS-Office365-Filtering-Correlation-Id: dec803bb-62be-4d3a-a3db-08dbf1796524 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 0RSgACKP6REJcVpP7GabG/UmPv5uEeQjfoWUZvQ2CA0cKzY7SalkzxtCbYgm1qxGmkGpDVtFPUgQLP+eC/oq8CYCKY//e4XQO0r+WtdDpTapzekIe6Ir2CLp+v/Z+eAH4Ts7m9ZhTa4jHTGo6UrW8ADi6pO7Y2VJCbaVN1k+0OHZaBvTBDtywGpeYrh8vgszHVKLD1gDT19jyGU+d+3V25mhZBHhqG1rHMVi4P4csbcYhuui58o/vNv0Kmw6IpSzaNoEuPWSSI0kr0zzAjAp1jK2E1CPskIpXvBFtSE+WRHDIFDDwZO2Er82zNjYLOqHp91uOuZ2awIsIMdYPVo8Fr3bBrjQrFllPoaCWWx6vGvG3LnH9oNBOz+ZD2UcPwHUa1uz9CJxrXulkHmAjruIGFvqem04fDHI30gb8ULQaW0Ihw3ykWtssabLnLJiB7MEszZZrxupPRcVSLYrHA5pJzYxAAOOQ06ahqMDQD6+HTMNCZMh72oyREdyzOZz6BQh6zHra/UxoMSaH347RG7NqDaeg1jQYaLfqyahj0DT3ADrY9/t/wkvYo43+WbeSs/PWWKfjPMq9Indz7lqfK+bH7xLgHaKIxUGOQz76P2tcOFE9IYknamuRHl92m9lqfmXS4E1JIG6kZBjoQ/jkWT26GqHg+C69yCfaWA/ARYH6zolN7z0dSfEZdbwRc48VeRrtJThbvcx8GHIpluJGdYbFcbitjpJQU3w34bU8DORxNA= X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:SN4PR01MB7455.prod.exchangelabs.com;PTR:;CAT:NONE;SFS:(13230031)(136003)(376002)(346002)(396003)(39850400004)(366004)(230922051799003)(186009)(1800799012)(64100799003)(451199024)(6506007)(6666004)(52116002)(38100700002)(8676002)(8936002)(4326008)(41300700001)(6512007)(478600001)(6486002)(110136005)(2906002)(921008)(107886003)(38350700005)(2616005)(1076003)(66946007)(316002)(5660300002)(26005)(54906003)(66476007)(66556008)(86362001)(202311291699003)(83380400001);DIR:OUT;SFP:1102; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: BTOFA95GMgy6W6Dkz70L9sah9uCiL5iyYcdU2M0IMTJ+EHkTexy3vpnqep63mo/M45TY7G0qbKrPfmy4rC/jsH53hv9MZgE3Ldz5+1C41Oj2SsOwn0o17qxJEwyyDw9NcVhvh8LbnIYlXNIhLNDNAj9ey20YmI1ttBNBjl20NahSXfwwro4rfSGP0yZ66GFbFEpc+3dPtRwX8sUiE4NMtA0bAEqRcbuZwxYedswhKc9K88DPS7AEKr5HgRYBsdKoP8xAnffpw/3EJf9DWUHa4i4qPqvk35ns/BjfU4CAOo7yDhDyahPsLqL6rMorV4u8Qu3C4ozA0uHh6N9ANEaQORNDgGrEe0yxXxY7VvufCnRmcbIOOOFDwapOj+KNEqJeU4II7Uvu56Es+zkwERxcmHrwNmROP76dIZdABXDAAdpGivu6OHMg+ozb3hhZne4QpsQROtI6hn6AC4KNkNqdgJyyW1ztvDGnXZbgnFYs+WK5ZMDItoPkUPXzc6GRUF2EpQUlwyjD5W1+JdN2E8wpbsxMLsholpuidcfUsaoXgKK3kL4egwyN2vn30F3/DbzZ4hUPx/Pq6POzbLz30AXk5aguPnxvDJQsLsdbmldGZupQf0RKbruU4C195a7rP3m3iNUDDWsXk8oPfQqB6q0YnRYRFmt9cmRaX0YByAViO1jpW5dOYpe40Z4eoEh9HuN+NoX5eQUyVrO0QSS0/CHjmXhNndgcyBrrkQ6KGCCOX+hxsbtN2m0iQDAtILqfhKhLtmQwEIHaD+dE/WPET/XJ73ZdrA/sWQ3hTfL+WxBsXiGigMmlr6tpyXci2vb+6tJwLl5tddgozXGooHeSqh6AakyT++dlKkrYo1Jdqh2kQUCeG17nf7FObjx98uyzVZ8dbfxHD84OUYuD+ucR49kLJ+uBUAvjxLvH0yc0d8adxwnS3UVilFHsKv1moyvdnORJUFcyvytdfVvUEiFue2Cft75g/lBSH68C+b5Pm0ExUlJesNeG/O/gcpA+2YqHGHHRVjhcv9NE90ITS82T81uYZHvXDYwQl7tBHefuf3fJX+LT5SMSp3Zc/PLkjyOR4soh2Juvd4dc3sUuFQk2k/XwsMKXaF7qt7Q39rAFz174f/7WmsiI5qC6Kcl7YcbhJyjmgW7GK7cHBVjyxIyRoJ2gvEZGPDjiPMpgTiU9DXCGgO1ETAExKXhN58va9osFy31q1cDGi8/40f3DIMvcao+FJ6jVF71mv6r2bxwJrUqKq43olKyL9+CyQ2Q2AqMm2SsD5mu3NtjbzlIsV2qOm2eq0mDEZDrCI32+BbBNagtncYCDJtYOXnU/V4B/cLpGMxkUBAuGhzrMeBCEgFmT0C1AX0ZDr43Ntj4h/lYkQ9umXeqC+02JbSIM4nybo22DXYMQOvODBoFFsDRsbzaehCyiSlyk3aQjn9Ka0+Hzv/Q4pTCUWj0oJGRgkNFKhngbNPTxuDPjxSFkL8YqgifOJBW9HWIuTiVh/fqYB3OyKuVb/pD254lYlgYEKh1gFvkzsUZxht9SduojfkJU98OYXI38OMO1wiqvvo7Uq4b7jakW+Spd9Pek94/YFqA+qc/4/QEzQ3Wuc36AcSN5LmvgjcOY46pGEmCRJnU5mTsM/ywJQvE= X-OriginatorOrg: os.amperecomputing.com X-MS-Exchange-CrossTenant-Network-Message-Id: dec803bb-62be-4d3a-a3db-08dbf1796524 X-MS-Exchange-CrossTenant-AuthSource: SN4PR01MB7455.prod.exchangelabs.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 Nov 2023 07:53:09.9429 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 3bc2b170-fd94-476d-b0ce-4229bdc904a7 X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: elJv3Q8NAPvHz53o4EtvuDVriztZEY/i3Ri+y+RlnLWfz9Cy3fz6sGXIx8EiU3IY6d8ebTzpSeVNS0SXzLK3iywdgVZaHIPTcr3Vy1ucP4g= X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN0PR01MB7037 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2, SPF_HELO_PASS,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Wed, 29 Nov 2023 23:53:19 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1783974645583137495 X-GMAIL-MSGID: 1783974645583137495 |
Series |
mctp i2c: Requeue the packet when arbitration is lost
|
|
Commit Message
Quan Nguyen
Nov. 30, 2023, 7:52 a.m. UTC
If arbitration is lost, __i2c_transfer() returns -EAGAIN and the packet should be resent. Requeue the packet and increase collisions count on this case. Co-developed-by: Dung Cao <dung@os.amperecomputing.com> Signed-off-by: Dung Cao <dung@os.amperecomputing.com> Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com> --- drivers/net/mctp/mctp-i2c.c | 35 +++++++++++++++++++++-------------- 1 file changed, 21 insertions(+), 14 deletions(-)
Comments
Hi Quan, > If arbitration is lost, __i2c_transfer() returns -EAGAIN and the > packet should be resent. > > Requeue the packet and increase collisions count on this case. Are you sure you want to re-queue the packet here? The i2c core would have already retried on arbitration loss: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/i2c-core-base.c#n2223 With this change, we would be disregarding the limits in adap->retries and/or adap->timeout. Cheers, Jeremy
On 30/11/2023 15:03, Jeremy Kerr wrote: > Hi Quan, > >> If arbitration is lost, __i2c_transfer() returns -EAGAIN and the >> packet should be resent. >> >> Requeue the packet and increase collisions count on this case. > > Are you sure you want to re-queue the packet here? The i2c core would > have already retried on arbitration loss: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/i2c-core-base.c#n2223 > > With this change, we would be disregarding the limits in adap->retries > and/or adap->timeout. > Yes, the __i2c_transfer() does try its best to send the packet but if it still fail with -EAGAIN, that means it still unable to win the arbitration. Without this patch we usually see the error below. The consequence is it stop and need the PLDM layer to retry when timed out, which takes time. "[ 61.616210] i2c i2c-3: __i2c_transfer failed -11" With this commit, we all see the packets go through peacefully just by requeueing the packets at MCTP layer and eliminated the need to retry in PLDM layer which would need more time. The result in our test is as below (Note: would needs some addition debug print code). I also think it is worth noting that in our setup there are multiple Masters. [73183.511487] i2c i2c-3: Arbitration loss, re-queue the packet [73192.550519] i2c i2c-3: __i2c_transfer failed -11 [73192.555734] i2c i2c-3: Arbitration loss, re-queue the packet [73207.250036] i2c i2c-3: __i2c_transfer failed -11 [73207.255247] i2c i2c-3: Arbitration loss, re-queue the packet [73429.499875] i2c i2c-3: __i2c_transfer failed -11 [73429.505116] i2c i2c-3: Arbitration loss, re-queue the packet [73504.672065] i2c i2c-3: __i2c_transfer failed -11 [73504.677317] i2c i2c-3: Arbitration loss, re-queue the packet [73540.762984] i2c i2c-3: __i2c_transfer failed -11 [73540.768242] i2c i2c-3: Arbitration loss, re-queue the packet [73631.040049] i2c i2c-3: __i2c_transfer failed -11 [73631.045333] i2c i2c-3: Arbitration loss, re-queue the packet [73648.538967] i2c i2c-3: __i2c_transfer failed -11 Thanks, - Quan
Hi Quan, > With this commit, we all see the packets go through peacefully just > by requeueing the packets at MCTP layer and eliminated the need to > retry in PLDM layer which would need more time. It's certainly possible that this tries harder to send MCTP packets, but it's just duplicating the retry mechanism already present in the i2c core. Why do we need another retry mechanism here? How is the i2c core retry mechanism not sufficient? It would seem that you can get the same behaviour by adjusting the retries/timeout limits in the core code, which has the advantage of applying to all arbitration loss events on the i2c bus, not just for MCTP tx. Cheers, Jeremy
On 30/11/2023 16:40, Jeremy Kerr wrote: > Hi Quan, > >> With this commit, we all see the packets go through peacefully just >> by requeueing the packets at MCTP layer and eliminated the need to >> retry in PLDM layer which would need more time. > > It's certainly possible that this tries harder to send MCTP packets, > but it's just duplicating the retry mechanism already present in the > i2c core. > > Why do we need another retry mechanism here? How is the i2c core retry > mechanism not sufficient? > > It would seem that you can get the same behaviour by adjusting the > retries/timeout limits in the core code, which has the advantage of > applying to all arbitration loss events on the i2c bus, not just for > MCTP tx. > Hi Jeremy, As per [1], __i2c_transfer() will retry for adap->retries times consecutively (without any wait) within the amount of time specified by adap->timeout. So as per my observation, once it loses the arbitration, the next retry is most likely still lost as another controller who win the arbitration may still be using the bus. Especially for upper layer protocol message like PLDM or SPDM, which size is far bigger than SMBus, usually ends up to queue several MCTP packets at a time. But if to requeue the packet, it must wait to acquire the lock before actually queueing that packet, and that is more likely to increase the chance to win the arbitration than to retry it right away as on the i2c core. Another reason is that, as i2c is widely used for many other applications, fixing the retry algorithm within the i2c core seems impossible. The other fact is that the initial default value of these two parameters depends on each type of controller; I'm not sure why yet. + i2c-aspeed: retries=0 timeout=1 sec [2] + i2c-cadence: retries=3 timeout=1 sec [3] + i2c-designware: retries=3 timeout=1 sec [4], [5] + i2c-emev2: retries=5 timeout=1 sec [6] + ... Unfortunately, in our case, we use i2c-aspeed, and there is only one try (see [2]), and that means we have only one single shot. I'm not sure why i2c-aspeed chose to set retries=0 by default, but I guess there must be a reason behind it. And yes, I agree, as per [7], these two parameters could be adjusted via ioctl() from userspace if the user wishes to change them. But, honestly, forcing users to change these parameters is not a good way, as I might have to say. To avoid that, requeueing the packet in the MCTP layer was kind of way better choice, and it was confirmed via our case. Thanks, - Quan [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/i2c-core-base.c#n2223 [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/busses/i2c-aspeed.c#n1030 [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/busses/i2c-cadence.c#n1322 [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/busses/i2c-designware-master.c#n1030 [5] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/busses/i2c-designware-slave.c#n258 [6] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/busses/i2c-emev2.c#n385 [7] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/i2c-dev.c#n478
Hi Quan, > As per [1], __i2c_transfer() will retry for adap->retries times > consecutively (without any wait) within the amount of time specified > by adap->timeout. > > So as per my observation, once it loses the arbitration, the next > retry is most likely still lost as another controller who win the > arbitration may still be using the bus. In general (and specifically with your hardware setup), the controller should be waiting for a bus-idle state before attempting the retransmission. You may well hit another arbitration loss after that, but it won't be from the same bus activity. > Especially for upper layer protocol message like PLDM or SPDM, which > size is far bigger than SMBus, usually ends up to queue several MCTP > packets at a time. But if to requeue the packet, it must wait to > acquire the lock ... you're relying on the delay of acquiring a spinlock? The only contention on that lock is from local packets being sent to the device (and, in heavy TX backlogs, the netif queue will be stopped, so that lock will be uncontended). That sounds fairly fragile, and somewhat disconnected from the goal of waiting for a bus idle state. > before actually queueing that packet, and that is > more likely to increase the chance to win the arbitration than to > retry it right away as on the i2c core. > > Another reason is that, as i2c is widely used for many other > applications, fixing the retry algorithm within the i2c core seems > impossible. What needs fixing there? You haven't identified any issue with it. > The other fact is that the initial default value of these two > parameters > depends on each type of controller; I'm not sure why yet. > > + i2c-aspeed: retries=0 timeout=1 sec [2] > + i2c-cadence: retries=3 timeout=1 sec [3] > + i2c-designware: retries=3 timeout=1 sec [4], [5] > + i2c-emev2: retries=5 timeout=1 sec [6] > + ... > > Unfortunately, in our case, we use i2c-aspeed, and there is only one > try (see [2]), and that means we have only one single shot. I'm not > sure why i2c-aspeed chose to set retries=0 by default, but I guess > there must be a reason behind it. I would suggest that the actual fix you want here is to increase that retry count, rather than working-around your "not sure" points above with a duplication of the common retry mechanism. > And yes, I agree, as per [7], these two parameters could be adjusted > via ioctl() from userspace if the user wishes to change them. But, > honestly, forcing users to change these parameters is not a good way, > as I might have to say. But now you're forcing users to use your infinite-retry mechanism instead. We already have a retry mechanism, which is user-configurable, and we can set per-controller defaults. If you believe the defaults (present in the aspeed driver) are not suitable, and it's too onerous for users to adjust, then I would suggest proposing a change to the default. Your requeue approach has a problem, in that there is no mechanism for recovery on repeated packet contention. In a scenario where a specific packet always causes contention (say, a faulty device on the bus attempts to respond to that packet too early), then the packet is never dequeued; other packets already queued will be blocked behind this one packet. The only way to make forward progress from there is to fill the TX queue completely. You could address that by limiting the retries and/or having a timeout, but then you may as well just use the existing retry mechanism that we already have, which already does that. > To avoid that, requeueing the packet in the MCTP layer was kind of > way better choice, and it was confirmed via our case. Your earlier examples showed a max of one retry was needed for recovery. I would suggest bumping the i2c-aspeed driver default to suit the other in-tree controllers would be a much more appropriate fix. Cheers, Jeremy
On 01/12/2023 15:38, Jeremy Kerr wrote: > Hi Quan, > >> As per [1], __i2c_transfer() will retry for adap->retries times >> consecutively (without any wait) within the amount of time specified >> by adap->timeout. >> >> So as per my observation, once it loses the arbitration, the next >> retry is most likely still lost as another controller who win the >> arbitration may still be using the bus. > > In general (and specifically with your hardware setup), the controller > should be waiting for a bus-idle state before attempting the > retransmission. You may well hit another arbitration loss after that, > but it won't be from the same bus activity. > Yes, that is the correct behaviour and I think that is why __i2c_transfer() return -EAGAIN when arbitration loss. One example I could find in kernel tree is here [a], and, it seems not to totally rely on the retry mechanism implemented in i2c core. [a] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/video/fbdev/omap2/omapfb/displays/connector-dvi.c#n157 Anyway, it's been great to discuss on this topic. To answer your questions I did have to dig around and have learn a lot. :-) My sincere thanks to you, - Quan
diff --git a/drivers/net/mctp/mctp-i2c.c b/drivers/net/mctp/mctp-i2c.c index b37a9e4bade4..132023306052 100644 --- a/drivers/net/mctp/mctp-i2c.c +++ b/drivers/net/mctp/mctp-i2c.c @@ -442,14 +442,14 @@ static void mctp_i2c_unlock_reset(struct mctp_i2c_dev *midev) i2c_unlock_bus(midev->adapter, I2C_LOCK_SEGMENT); } -static void mctp_i2c_xmit(struct mctp_i2c_dev *midev, struct sk_buff *skb) +static int mctp_i2c_xmit(struct mctp_i2c_dev *midev, struct sk_buff *skb) { struct net_device_stats *stats = &midev->ndev->stats; enum mctp_i2c_flow_state fs; struct mctp_i2c_hdr *hdr; struct i2c_msg msg = {0}; u8 *pecp; - int rc; + int rc = 0; fs = mctp_i2c_get_tx_flow_state(midev, skb); @@ -461,17 +461,11 @@ static void mctp_i2c_xmit(struct mctp_i2c_dev *midev, struct sk_buff *skb) dev_warn_ratelimited(&midev->adapter->dev, "Bad tx length %d vs skb %u\n", hdr->byte_count + 3, skb->len); - return; + return -EINVAL; } - if (skb_tailroom(skb) >= 1) { - /* Linear case with space, we can just append the PEC */ - skb_put(skb, 1); - } else { - /* Otherwise need to copy the buffer */ - skb_copy_bits(skb, 0, midev->tx_scratch, skb->len); - hdr = (void *)midev->tx_scratch; - } + skb_copy_bits(skb, 0, midev->tx_scratch, skb->len); + hdr = (void *)midev->tx_scratch; pecp = (void *)&hdr->source_slave + hdr->byte_count; *pecp = i2c_smbus_pec(0, (u8 *)hdr, hdr->byte_count + 3); @@ -503,17 +497,20 @@ static void mctp_i2c_xmit(struct mctp_i2c_dev *midev, struct sk_buff *skb) break; case MCTP_I2C_TX_FLOW_INVALID: - return; + return rc; } if (rc < 0) { dev_warn_ratelimited(&midev->adapter->dev, "__i2c_transfer failed %d\n", rc); stats->tx_errors++; + if (rc == -EAGAIN) + stats->collisions++; } else { stats->tx_bytes += skb->len; stats->tx_packets++; } + return rc; } static void mctp_i2c_flow_release(struct mctp_i2c_dev *midev) @@ -568,6 +565,7 @@ static int mctp_i2c_tx_thread(void *data) struct mctp_i2c_dev *midev = data; struct sk_buff *skb; unsigned long flags; + int rc; for (;;) { if (kthread_should_stop()) @@ -583,8 +581,17 @@ static int mctp_i2c_tx_thread(void *data) mctp_i2c_flow_release(midev); } else if (skb) { - mctp_i2c_xmit(midev, skb); - kfree_skb(skb); + rc = mctp_i2c_xmit(midev, skb); + if (rc == -EAGAIN) { + spin_lock_irqsave(&midev->tx_queue.lock, flags); + if (skb_queue_len(&midev->tx_queue) >= MCTP_I2C_TX_WORK_LEN) + kfree_skb(skb); + else + __skb_queue_head(&midev->tx_queue, skb); + spin_unlock_irqrestore(&midev->tx_queue.lock, flags); + } else { + kfree_skb(skb); + } } else { wait_event_idle(midev->tx_wq,