Message ID | ZVf/pqw5YcF7sldg@shikoro |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:9910:0:b0:403:3b70:6f57 with SMTP id i16csp895262vqn; Fri, 17 Nov 2023 16:05:21 -0800 (PST) X-Google-Smtp-Source: AGHT+IGX7lu4BG8u3EGyLIzTbNy+sQcMbg7CEUzatGqjO/EEuRkondjSjEcIQh3dlb0votXjmwyc X-Received: by 2002:a05:6808:23cd:b0:3ad:aee1:1b3e with SMTP id bq13-20020a05680823cd00b003adaee11b3emr1326545oib.8.1700265921113; Fri, 17 Nov 2023 16:05:21 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1700265921; cv=none; d=google.com; s=arc-20160816; b=CqTr5u6BWdSZt/OaPyunB6eDZF6SShTt0kpEKhRdDAKMdrzAhNSVy3Yrwr/u819njF m0gFdEY59M0mzmeu4irv2E1hK+/Qm7RJKIJNGeTSxQ8c4emChOg0ntCLGOeSK783Fyc6 N6i9wyTvr9D5TlG8HltaSlP8yDMQkYB7AGEBJaTYvVv3uJ6jq0HDUQ51xNPoWncq4qOQ GGywdbKhq8Y6ND28MyB4oBpKC4rYgZWYdN0/RgHZNWpvlBEoC0LX+2D0JgAUvvLHFvJq DTz3sSlKv6bwYyUh8o7Pk1GkTlc7viEuy400Ld+Js/25KSQNUxmlAqWzzMlrPuHwjXRe fX+w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-disposition:mime-version :mail-followup-to:message-id:subject:cc:to:from:date:dkim-signature; bh=zvliBSOKkv9z5UA3wuXK+P1EhDNBb4W3115G4bgKHx4=; fh=ce5vt2Olc1NJ5B+TLPDKobZuVX5zglFMuQaA1WfyR6M=; b=L/ouoKxusjbBVUfM0sXEz5+SDy3GLfaCCUTIxtbL4tb+oocIovheP38WMOEVkVcJWC Hf0942eQj1kqc+wDA4tqOhyRxi/nzAxNsSFT/9/wg9c9bZM/bH5yUN0o7YU10thCd4M3 pyqCsUhUD/xCzBGCbbh4sFvrRQcu0oIHzzQ44rXp4uXVoGtZMBMcxEXPn3pUuaFe89ru Seb3OnPWWsXcy8myhUWUUWqEx2Wp/1WI+O37qAtuNrMh4kBWFZUTepmnZvPS70CiIDL5 JmS5SbStduwae8ODmDg0itATQTBRFhYhg1q/ZqIxh8T3R7JU2k78sHWvjkEXmbhOCCx2 jiag== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=GerSS6DN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from lipwig.vger.email (lipwig.vger.email. [23.128.96.33]) by mx.google.com with ESMTPS id q8-20020a056830018800b006b8bc3a615dsi1040111ota.177.2023.11.17.16.05.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Nov 2023 16:05:21 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) client-ip=23.128.96.33; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=GerSS6DN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id 1747582698F8; Fri, 17 Nov 2023 16:05:19 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234284AbjKRAFE (ORCPT <rfc822;jaysivo@gmail.com> + 29 others); Fri, 17 Nov 2023 19:05:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44184 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231533AbjKRAFC (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 17 Nov 2023 19:05:02 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 77C74B4 for <linux-kernel@vger.kernel.org>; Fri, 17 Nov 2023 16:04:59 -0800 (PST) Received: by smtp.kernel.org (Postfix) with ESMTPSA id BE40FC433C8; Sat, 18 Nov 2023 00:04:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1700265899; bh=eU7pWfXkgBvwAmZrRDoiJ4H3YjL2nwJyHUyzgk2bKag=; h=Date:From:To:Cc:Subject:From; b=GerSS6DN//mPm6zym/8Rkwr8vndxo2+k0YtJP9JCGdlvfCkx+Jp2zhUlE/fd1nJ+s DaVo12Q97IMAxnCC01BnwIbFte2EqCTfdfq1CKrzdWIy3wdUjR6dytaCaRPl0kylrT 5mQTVKWSRnExUFU35Clq0IlbqK3b/sMkTJJpjFh9tro3heTrDW68cQjvWImsuNprv/ UAEqei/ku4THbf8BDrJrMdzDrviMPxL8pMBxK7dpYwA0eBv7XrUl3ZfYoSQn/DB68c DyiC/Z/Adj9cn4+2NAU2MlL7g5dYctimleqQ88g2XfqGGV03IjC3cydia8piK5RMYL X1xjXWrntJIhw== Date: Fri, 17 Nov 2023 19:04:54 -0500 From: Wolfram Sang <wsa@kernel.org> To: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-i2c@vger.kernel.org, linux-kernel@vger.kernel.org, Peter Rosin <peda@axentia.se>, Bartosz Golaszewski <brgl@bgdev.pl>, Andi Shyti <andi.shyti@kernel.org> Subject: [PULL REQUEST] i2c-for-6.7-rc2 Message-ID: <ZVf/pqw5YcF7sldg@shikoro> Mail-Followup-To: Wolfram Sang <wsa@kernel.org>, Linus Torvalds <torvalds@linux-foundation.org>, linux-i2c@vger.kernel.org, linux-kernel@vger.kernel.org, Peter Rosin <peda@axentia.se>, Bartosz Golaszewski <brgl@bgdev.pl>, Andi Shyti <andi.shyti@kernel.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="7M+Pchu/CDK0TST7" Content-Disposition: inline X-Spam-Status: No, score=-1.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Fri, 17 Nov 2023 16:05:19 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1782858038647279735 X-GMAIL-MSGID: 1782858038647279735 |
Series |
[PULL,REQUEST] i2c-for-6.7-rc2
|
|
Pull-request
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git tags/i2c-for-6.7-rc2Message
Wolfram Sang
Nov. 18, 2023, 12:04 a.m. UTC
The following changes since commit b85ea95d086471afb4ad062012a4d73cd328fa86: Linux 6.7-rc1 (2023-11-12 16:19:07 -0800) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git tags/i2c-for-6.7-rc2 for you to fetch changes up to 382561d16854a747e6df71034da08d20d6013dfe: i2c: ocores: Move system PM hooks to the NOIRQ phase (2023-11-13 12:43:42 -0500) ---------------------------------------------------------------- Revert a not-working conversion to generic recovery for PXA, use proper IO accessors for designware, and use proper PM level for ocores to allow accessing interrupt providers late. ---------------------------------------------------------------- Jan Bottorff (1): i2c: designware: Fix corrupted memory seen in the ISR Robert Marko (1): Revert "i2c: pxa: move to generic GPIO recovery" Samuel Holland (1): i2c: ocores: Move system PM hooks to the NOIRQ phase with much appreciated quality assurance from ---------------------------------------------------------------- Andi Shyti (1): (Rev.) i2c: ocores: Move system PM hooks to the NOIRQ phase Serge Semin (2): (Test) i2c: designware: Fix corrupted memory seen in the ISR (Rev.) i2c: designware: Fix corrupted memory seen in the ISR drivers/i2c/busses/i2c-designware-common.c | 16 +++---- drivers/i2c/busses/i2c-ocores.c | 4 +- drivers/i2c/busses/i2c-pxa.c | 76 ++++++++++++++++++++++++++---- 3 files changed, 78 insertions(+), 18 deletions(-)
Comments
On Fri, 17 Nov 2023 at 16:05, Wolfram Sang <wsa@kernel.org> wrote: > > Jan Bottorff (1): > i2c: designware: Fix corrupted memory seen in the ISR I have pulled this, but honestly, looking at the patch, I really get the feeling that there's some deeper problem going on. Either the designware driver doesn't do the right locking, or the relaxed IO accesses improperly are escaping the locks that do exist. Either way, just changing "writel_relaxed()" to "writel()" seems to be wrong. Of course, it is entirely possible that those accesses should never have been relaxed in the first place, and that the actual access ordering between two accesses in the same thread matters. For example, the code did *val = readw_relaxed(dev->base + reg) | (readw_relaxed(dev->base + reg + 2) << 16); and if the order of those two readw's mattered, then the "relaxed" was always entirely wrong. But the commit message seems to very much imply a multi-thread issue, and for *that* issue, doing "writel_relaxed" -> "writel" is very much wrong. The only thing fixing threading issues is proper locks (or _working_ locks). Removing the "relaxed" may *hide* the issue, but doesn't really fix it. For the arm64 people I brought in: this is now commit f726eaa787e9 ("i2c: designware: Fix corrupted memory seen in the ISR") upstream. I've done the pull, because even if this is purely a "hide the problem" fix, it's better than what the code did. I'm just asking that people look at this a bit more. Linus
The pull request you sent on Fri, 17 Nov 2023 19:04:54 -0500:
> git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git tags/i2c-for-6.7-rc2
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/23dfa043f6d5ac9339607f077f2c30cd50798e12
Thank you!
On Sat, Nov 18, 2023 at 09:56:59AM -0800, Linus Torvalds wrote: > On Fri, 17 Nov 2023 at 16:05, Wolfram Sang <wsa@kernel.org> wrote: > > > > Jan Bottorff (1): > > i2c: designware: Fix corrupted memory seen in the ISR > > I have pulled this, but honestly, looking at the patch, I really get > the feeling that there's some deeper problem going on. > > Either the designware driver doesn't do the right locking, or the > relaxed IO accesses improperly are escaping the locks that do exist. > > Either way, just changing "writel_relaxed()" to "writel()" seems to be wrong. > > Of course, it is entirely possible that those accesses should never > have been relaxed in the first place, and that the actual access > ordering between two accesses in the same thread matters. For example, > the code did > > *val = readw_relaxed(dev->base + reg) | > (readw_relaxed(dev->base + reg + 2) << 16); > > and if the order of those two readw's mattered, then the "relaxed" was > always entirely wrong. > > But the commit message seems to very much imply a multi-thread issue, > and for *that* issue, doing "writel_relaxed" -> "writel" is very much > wrong. The only thing fixing threading issues is proper locks (or > _working_ locks). > > Removing the "relaxed" may *hide* the issue, but doesn't really fix it. > > For the arm64 people I brought in: this is now commit f726eaa787e9 > ("i2c: designware: Fix corrupted memory seen in the ISR") upstream. > I've done the pull, because even if this is purely a "hide the > problem" fix, it's better than what the code did. I'm just asking that > people look at this a bit more. Thanks for putting me on CC. The original issue was discussed quite a bit over at: https://lore.kernel.org/all/20230913232938.420423-1-janb@os.amperecomputing.com/ and I think the high-level problem was something like: 1. CPU x writes some stuff to memory (I think one example was i2c_dw_xfer() setting 'dev->msg_read_idx' to 0) 2. CPU x writes to an I/O register on this I2C controller which generates an IRQ (end of i2c_dw_xfer_init()) 3. CPU y takes the IRQ 4. CPU y reads 'dev->msg_read_idx' and doesn't see the write from (1) (i2c folks: please chime in if I got this wrong) the issue being that the writes in (1) are not ordered before the I/O access in (2) if the relaxed accessor is used. Rather than upgrade only the register writes which can trigger an interrupt, the more conservative approach of upgrading everything to non-relaxed I/O accesses was taken (which is probably necessary to deal with spurious IRQs properly anyway because otherwise 'dev->msg_read_idx' could be read early in step (4)). Your point about locking is interesting. I don't see any obvious locks being taken in i2c_dw_isr(), so I don't think the issue is because relaxed accesses are escaping existing locks. An alternative would be putting steps (1) and (2) in a critical section and then taking the lock again in the interrupt handler. Or did you have something else in mind? Will
Hi Will, thanks a lot for this summary! > and I think the high-level problem was something like: > > 1. CPU x writes some stuff to memory (I think one example was i2c_dw_xfer() > setting 'dev->msg_read_idx' to 0) > 2. CPU x writes to an I/O register on this I2C controller which generates > an IRQ (end of i2c_dw_xfer_init()) > 3. CPU y takes the IRQ > 4. CPU y reads 'dev->msg_read_idx' and doesn't see the write from (1) > > (i2c folks: please chime in if I got this wrong) I admit that I didn't dive into this specific discussion. But we had this kind of re-ordering problem in the past in i2c, so avoiding the relaxed_* where possible came to be a good thing in my book. So, I recommended removing it for all writes, not only the one causing problems here. relaxed_* should only be used when really needed. So, this is why I applied the patch, plus I trust the people giving their tags after the in-depth discussion. But yeah, if somebody more experienced with this driver could double-check against the potential locking problem, this would be good.
On Mon, 20 Nov 2023 at 07:05, Will Deacon <will@kernel.org> wrote: > > and I think the high-level problem was something like: > > 1. CPU x writes some stuff to memory (I think one example was i2c_dw_xfer() > setting 'dev->msg_read_idx' to 0) > 2. CPU x writes to an I/O register on this I2C controller which generates > an IRQ (end of i2c_dw_xfer_init()) > 3. CPU y takes the IRQ > 4. CPU y reads 'dev->msg_read_idx' and doesn't see the write from (1) > > (i2c folks: please chime in if I got this wrong) > > the issue being that the writes in (1) are not ordered before the I/O > access in (2) if the relaxed accessor is used. Ok, then removing relaxed is indeed the right thing to do. Because yes, it's an actual ordering issue with the IO write, not some locking issue. Thanks for filling in the details, that patch looked iffy to me, but it does sound like everything is good. Linus