Message ID | 20231212094310.3633-1-antonio.borneo@foss.st.com |
---|---|
State | New |
Headers | From: Antonio Borneo <antonio.borneo@foss.st.com>; To: Andy Whitcroft <apw@canonical.com>, Joe Perches <joe@perches.com>, Dwaipayan Ray <dwaipayanray1@gmail.com>, Lukas Bulwahn <lukas.bulwahn@gmail.com>; CC: Antonio Borneo <antonio.borneo@foss.st.com>, <linux-kernel@vger.kernel.org>, Clément Léger <clement.leger@bootlin.com>, Clément Le Goffic <clement.legoffic@foss.st.com>, <linux-stm32@st-md-mailman.stormreply.com>; Subject: [PATCH] checkpatch: use utf-8 match for spell checking; Date: Tue, 12 Dec 2023 10:43:10 +0100; Message-ID: <20231212094310.3633-1-antonio.borneo@foss.st.com>; X-Mailer: git-send-email 2.42.0 |
Series | checkpatch: use utf-8 match for spell checking |
Commit Message
Antonio Borneo
Dec. 12, 2023, 9:43 a.m. UTC
The current spell-checking code tests, as part of a more complex
regex, whether $rawline matches [^\w]($misspellings)[^\w].
Because $rawline is a byte string, a single byte of a multi-byte
utf-8 character in $rawline can match the non-word-char class [^\w].
E.g.:
./scripts/checkpatch.pl --git 81c2f059ab9
WARNING: 'ment' may be misspelled - perhaps 'meant'?
#36: FILE: MAINTAINERS:14360:
+M: Clément Léger <clement.leger@bootlin.com>
^^^^
Use a utf-8 version of $rawline for spell checking.
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Reported-by: Clément Le Goffic <clement.legoffic@foss.st.com>
---
scripts/checkpatch.pl | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
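The false positive described in the commit message can be reproduced outside checkpatch. The following is a minimal sketch, not code from the patch: the one-word `$misspellings` value, the sample line, and the helper `first_typo` are assumptions made for illustration. Only the regex and the `decode("utf8", ...)` call are taken from the diff.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical one-entry stand-in for checkpatch's $misspellings alternation.
my $misspellings = "ment";

# Raw UTF-8 bytes, as a byte-string $rawline would hold them:
# "é" is the two bytes 0xC3 0xA9.
my $rawline = "+M:\tCl\xC3\xA9ment L\xC3\xA9ger <clement.leger\@bootlin.com>";

# Hypothetical helper: apply the spell-check regex from the patch and
# return the first captured typo, or undef when nothing matches.
sub first_typo {
    my ($text) = @_;
    return $text =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/i ? $1 : undef;
}

# On the byte string, the trailing byte 0xA9 is not a word character,
# so it can act as the left boundary of "ment".
my $byte_hit = first_typo($rawline);

# After decoding, "é" is a single word character and no longer a boundary.
my $char_hit = first_typo(decode("utf8", $rawline));

print "bytes:   ", defined $byte_hit ? "matched '$byte_hit'" : "no match", "\n";
print "decoded: ", defined $char_hit ? "matched '$char_hit'" : "no match", "\n";
```

With this input the byte string reports a match on 'ment', while the decoded string reports no match, which is the behavior the patch relies on.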
Comments
On Tue, 2023-12-12 at 10:43 +0100, Antonio Borneo wrote:
> The current code that checks for misspelling verifies, in a more
> complex regex, if $rawline matches [^\w]($misspellings)[^\w]
>
> Being $rawline a byte-string, a utf-8 character in $rawline can
> match the non-word-char [^\w].
> E.g.:
> ./script/checkpatch.pl --git 81c2f059ab9
> WARNING: 'ment' may be misspelled - perhaps 'meant'?
> #36: FILE: MAINTAINERS:14360:
> +M: Clément Léger <clement.leger@bootlin.com>
> ^^^^
>
> Use a utf-8 version of $rawline for spell checking.
>
> Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
> Reported-by: Clément Le Goffic <clement.legoffic@foss.st.com>

Seems sensible, thanks, but:

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> @@ -3477,7 +3477,8 @@ sub process {
> # Check for various typo / spelling mistakes
> if (defined($misspellings) &&
> ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
> - while ($rawline =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
> + my $rawline_utf8 = decode("utf8", $rawline);
> + while ($rawline_utf8 =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
> my $typo = $1;
> my $blank = copy_spacing($rawline);

Maybe this needs to use $rawline_utf8?

> my $ptr = substr($blank, 0, $-[1]) . "^" x length($typo);

And maybe now the $fix bit will not always work properly.
On Tue, 2023-12-12 at 11:07 -0800, Joe Perches wrote:
> On Tue, 2023-12-12 at 10:43 +0100, Antonio Borneo wrote:
> > The current code that checks for misspelling verifies, in a more
> > complex regex, if $rawline matches [^\w]($misspellings)[^\w]
> >
> > Being $rawline a byte-string, a utf-8 character in $rawline can
> > match the non-word-char [^\w].
> > E.g.:
> > ./script/checkpatch.pl --git 81c2f059ab9
> > WARNING: 'ment' may be misspelled - perhaps 'meant'?
> > #36: FILE: MAINTAINERS:14360:
> > +M: Clément Léger <clement.leger@bootlin.com>
> > ^^^^
> >
> > Use a utf-8 version of $rawline for spell checking.
> >
> > Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
> > Reported-by: Clément Le Goffic <clement.legoffic@foss.st.com>
>
> Seems sensible, thanks, but:
>
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> []
> > @@ -3477,7 +3477,8 @@ sub process {
> > # Check for various typo / spelling mistakes
> > if (defined($misspellings) &&
> > ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
> > - while ($rawline =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
> > + my $rawline_utf8 = decode("utf8", $rawline);
> > + while ($rawline_utf8 =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
> > my $typo = $1;
> > my $blank = copy_spacing($rawline);
>
> Maybe this needs to use $rawline_utf8?

Correct, I will send a v2!

>
> > my $ptr = substr($blank, 0, $-[1]) . "^" x length($typo);
>
> And maybe now the $fix bit will not always work properly.

I have run some tests and it looks OK with the current ASCII file
scripts/spelling.txt.
I have also tested adding some utf-8 strings to the spelling file, but
checkpatch reads it as ASCII, and extending it to utf-8 would require
further modifications in checkpatch, way beyond this simple fix.

Thanks for the review.
Antonio
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 25fdb7fda112..58646bd6ef56 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3477,7 +3477,8 @@ sub process {
 # Check for various typo / spelling mistakes
 		if (defined($misspellings) &&
 		    ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) {
-			while ($rawline =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
+			my $rawline_utf8 = decode("utf8", $rawline);
+			while ($rawline_utf8 =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) {
 				my $typo = $1;
 				my $blank = copy_spacing($rawline);
 				my $ptr = substr($blank, 0, $-[1]) . "^" x length($typo);
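
The review comment about `copy_spacing($rawline)` comes down to units: once the regex runs on the decoded string, `$-[1]` counts characters, while a blank line built from the byte string still counts bytes. The sketch below only illustrates that mismatch under stated assumptions. `blank_like` is a hypothetical stand-in for checkpatch's `copy_spacing()`, and the sample line and the one-word `$misspellings` value are invented for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for checkpatch's copy_spacing():
# keep tabs, turn every other character into a space.
sub blank_like {
    (my $res = shift) =~ tr/\t/ /c;
    return $res;
}

# Hypothetical one-entry replacement for the real $misspellings alternation.
my $misspellings = "teh";

# Raw UTF-8 bytes with a two-byte character ahead of the typo.
my $rawline      = "+Cl\xC3\xA9ment fixed teh bug";
my $rawline_utf8 = decode("utf8", $rawline);

# Same pattern as in the patch, compiled once for both strings.
my $re = qr/(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/i;

# $-[1] is the start offset of capture group 1 in the string just matched:
# a byte offset for $rawline, a character offset for $rawline_utf8.
my ($byte_off, $char_off);
$byte_off = $-[1] if $rawline      =~ $re;
$char_off = $-[1] if $rawline_utf8 =~ $re;

# The blank helper inherits the unit of whatever string it is given.
my $byte_len = length(blank_like($rawline));
my $char_len = length(blank_like($rawline_utf8));

print "typo offset: bytes=$byte_off chars=$char_off\n";
print "blank width: bytes=$byte_len chars=$char_len\n";
```

Each multi-byte character ahead of the typo shifts the byte offset by its extra bytes, so here the two offsets differ by one. Deriving both the offset and the blank line from the same decoded string, as agreed for v2, keeps the units consistent. Whether the mismatch is visible in checkpatch's output depends on the line being checked.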