Message ID | 20231214-get-maintainers-utf8-v2-1-b188dc7042a4@bang-olufsen.dk |
---|---|
State | New |
Headers | From: Alvin Šipraga <alvin@pqrs.dk>; Date: Thu, 14 Dec 2023 16:06:53 +0100; Subject: [PATCH v2] get_maintainer: correctly parse UTF-8 encoded names in files; To: Joe Perches <joe@perches.com>, Linus Torvalds <torvalds@linux-foundation.org>; Cc: Duje Mihanović <duje.mihanovic@skole.hr>, Konstantin Ryabitsev <konstantin@linuxfoundation.org>, linux-kernel@vger.kernel.org, Alvin Šipraga <alsi@bang-olufsen.dk>; List-ID: <linux-kernel.vger.kernel.org> |
Series |
[v2] get_maintainer: correctly parse UTF-8 encoded names in files
|
|
Commit Message
Alvin Šipraga
Dec. 14, 2023, 3:06 p.m. UTC
From: Alvin Šipraga <alsi@bang-olufsen.dk>

While the script correctly extracts UTF-8 encoded names from the
MAINTAINERS file, the regular expressions damage my name when parsing
from .yaml files. Fix this by replacing the Latin-1-compatible regular
expressions with the unicode property matcher \p{L}, which matches on
any letter according to the Unicode General Category of letters.

It's also necessary to instruct Perl to open all files with UTF-8 encoding.

The issue was also identified on the tools mailing list [1]. This should
solve the observed side effects there as well.

Link: https://lore.kernel.org/tools/20230726-gush-slouching-a5cd41@meerkat/ [1]
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
---
Changes in v2:
- use '\p{L}' rather than '\p{Latin}', so that matching is even more
  inclusive (i.e. match also Greek letters, CJK, etc.)
- fix commit message to refer to tools mailing list, not b4 mailing list
- Link to v1: https://lore.kernel.org/r/20231014-get-maintainers-utf8-v1-1-3af8c7aeb239@bang-olufsen.dk
---
 scripts/get_maintainer.pl | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

---
base-commit: 70f8c6f8f8800d970b10676cceae42bba51a4899
change-id: 20231014-get-maintainers-utf8-32c65c4d6f8a
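To illustrate the two claims in the commit message, here is a minimal sketch, not taken from get_maintainer.pl. It assumes the old class can be modelled as the code point range U+00C0..U+00FF on decoded text, and the helper trailing_name() is hypothetical. It shows that a Latin-1-only class stops at "Š" (U+0160), that \p{L} accepts it, and that \p{L} does not help if the input is still raw UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the longest run of "name" characters at the
# end of $text, using the character class passed in as a compiled regex.
sub trailing_name {
    my ($text, $class) = @_;
    my ($name) = $text =~ /($class+)\z/;
    return $name;
}

# Modelled on the pre-patch class: ASCII letters plus U+00C0..U+00FF.
my $latin1_class  = qr/[A-Za-z\x{C0}-\x{FF} ]/;
# Modelled on the patched class: any Unicode letter.
my $unicode_class = qr/[\p{L} ]/;

# Raw UTF-8 bytes, as read from a file opened without a decoding layer;
# "\xC5\xA0" is the UTF-8 encoding of U+0160.
my $bytes = "Alvin \xC5\xA0ipraga";
# The same text as characters, as read through :encoding(UTF-8).
my $chars = decode('UTF-8', $bytes);

binmode(STDOUT, ':encoding(UTF-8)');
print "Latin-1 class on characters: [", trailing_name($chars, $latin1_class), "]\n";
print "\\p{L} class on characters:   [", trailing_name($chars, $unicode_class), "]\n";
print "\\p{L} class on raw bytes:    [", trailing_name($bytes, $unicode_class), "]\n";
```

Only the second call returns the whole name. The other two stop after the non-matching character and return "ipraga".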
Comments
On Thu, 2023-12-14 at 16:06 +0100, Alvin Šipraga wrote:
> From: Alvin Šipraga <alsi@bang-olufsen.dk>
>
> While the script correctly extracts UTF-8 encoded names from the
> MAINTAINERS file, the regular expressions damage my name when parsing
> from .yaml files. Fix this by replacing the Latin-1-compatible regular
> expressions with the unicode property matcher \p{L}, which matches on
> any letter according to the Unicode General Category of letters.

OK

> It's also necessary to instruct Perl to open all files with UTF-8 encoding.

I doubt this.

> ---
> Changes in v2:
> - use '\p{L}' rather than '\p{Latin}', so that matching is even more
>   inclusive (i.e. match also Greek letters, CJK, etc.)
> - fix commit message to refer to tools mailing list, not b4 mailing list
> - Link to v1: https://lore.kernel.org/r/20231014-get-maintainers-utf8-v1-1-3af8c7aeb239@bang-olufsen.dk

OK

> diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
[]
> @@ -20,6 +20,7 @@ use Getopt::Long qw(:config no_auto_abbrev);
>  use Cwd;
>  use File::Find;
>  use File::Spec::Functions;
> +use open qw(:std :encoding(UTF-8));

I think this global use is unnecessary.

> @@ -442,7 +443,7 @@ sub maintainers_in_file {
>          my $text = do { local($/) ; <$f> };
>          close($f);
>
> -        my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> +        my @poss_addr = $text =~ m$[\p{L}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
>          push(@file_emails, clean_file_emails(@poss_addr));
>      }
>  }

Rather than open _all_ files in utf-8, perhaps the block
that opens a specific file to find maintainers

sub maintainers_in_file {
    my ($file) = @_;

    return if ($file =~ m@\bMAINTAINERS$@);

    if (-f $file && ($email_file_emails || $file =~ /\.yaml$/)) {
        open(my $f, '<', $file)
            or die "$P: Can't open $file: $!\n";
        my $text = do { local($/) ; <$f> };
        close($f);
        ...

should change the

        open(my $f...
to
        use open qw(:std :encoding(UTF-8));
        open(my $f...

And unrelated and secondarily, perhaps the
        $file =~ /\.yaml$/
test should be
        $file =~ /\.(?:yaml|dtsi?)$/

to also find any maintainer address in the dts* files

https://lore.kernel.org/lkml/20231028174656.GA3310672@bill-the-cat/T/
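As a quick check of the suggested file-name test, here is a small sketch that compares the two patterns. The 8250.yaml path appears later in the thread; the other paths are hypothetical and exist only to exercise the patterns.

```perl
use strict;
use warnings;

# One path from the thread plus hypothetical ones for illustration.
my @files = qw(
    Documentation/devicetree/bindings/serial/8250.yaml
    arch/arm/boot/dts/example-board.dts
    arch/arm/boot/dts/example-soc.dtsi
    drivers/net/dsa/realtek/example.c
);

# Current test: only .yaml files are scanned for addresses.
my @yaml_only = grep { /\.yaml$/ } @files;
# Suggested test: .yaml, .dts and .dtsi files are scanned.
my @yaml_dts  = grep { /\.(?:yaml|dtsi?)$/ } @files;

print "yaml only:    ", scalar(@yaml_only), " file(s)\n";
print "yaml or dts*: ", scalar(@yaml_dts), " file(s)\n";
print "  $_\n" for @yaml_dts;
```

The optional "i" covers both .dts and .dtsi. Any other suffix that starts with "dts" is still excluded, because the pattern is anchored at the end of the name.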
On Thu, Dec 14, 2023 at 07:57:54AM -0800, Joe Perches wrote:
> On Thu, 2023-12-14 at 16:06 +0100, Alvin Šipraga wrote:
> > @@ -442,7 +443,7 @@ sub maintainers_in_file {
> >          my $text = do { local($/) ; <$f> };
> >          close($f);
> >
> > -        my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > +        my @poss_addr = $text =~ m$[\p{L}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> >          push(@file_emails, clean_file_emails(@poss_addr));
> >      }
> >  }
>
> Rather than open _all_ files in utf-8, perhaps the block
> that opens a specific file to find maintainers
>
> sub maintainers_in_file {
>     my ($file) = @_;
>
>     return if ($file =~ m@\bMAINTAINERS$@);
>
>     if (-f $file && ($email_file_emails || $file =~ /\.yaml$/)) {
>         open(my $f, '<', $file)
>             or die "$P: Can't open $file: $!\n";
>         my $text = do { local($/) ; <$f> };
>         close($f);
>         ...
>
> should change the
>
>         open(my $f...
> to
>         use open qw(:std :encoding(UTF-8));
>         open(my $f...

Yes, this also works for parsing the name in an arbitrary file. But with the
change you suggest above, the script then corrupts my name when it is lifted
from MAINTAINERS (!?):

  $ ./scripts/get_maintainer.pl -f drivers/net/dsa/realtek/ | grep alsi
  "Alvin Å ipraga" <alsi@bang-olufsen.dk> (maintainer:REALTEK RTL83xx SMI DSA ROUTER CHIPS)

I'm not entirely sure why that happens, since my name doesn't get corrupted when
coming from MAINTAINERS with the upstream state of the script.

But anyway, with your suggestion I would then also have to add it here:

@@ -347,6 +346,7 @@ my @mfiles = ();
 my @self_test_info = ();

 sub read_maintainer_file {
+    use open qw(:std :encoding(UTF-8));
     my ($file) = @_;

     open (my $maint, '<', "$file")

... and I guess there might be other cases too. Rather than scattering it all
over, don't you think it's more robust to open all files in UTF-8? I tried to
show in one of my replies to v1 [1] that this should be compatible with
basically all of the source tree.

[1] https://lore.kernel.org/all/dzn6uco4c45oaa3ia4u37uo5mlt33obecv7gghj2l756fr4hdh@mt3cprft3tmq/

If you are still unconvinced then I will gladly send a v3 patching the two cases
we have discussed (read_maintainer_file() and maintainers_in_file()).

> And unrelated and secondarily, perhaps the
>         $file =~ /\.yaml$/
> test should be
>         $file =~ /\.(?:yaml|dtsi?)$/
>
> to also find any maintainer address in the dts* files
>
> https://lore.kernel.org/lkml/20231028174656.GA3310672@bill-the-cat/T/

Is this supposed to parse the "Copyright (c) 20xx John Doe <foo@bar.toto>" in
the .dts* files?

But sure, I can do a resend of Shawn's original patch separately if you like.

Kind regards,
Alvin
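A possible explanation for the corrupted output, offered as an assumption and not confirmed anywhere in this thread: the :std part of "use open" also puts the encoding layer on STDOUT, and that effect is not limited to the enclosing sub. MAINTAINERS would then still be read as raw bytes, and those bytes would be encoded a second time on output. The sketch below reproduces that double encoding with the Encode module only; it does not run get_maintainer.pl.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Return a string's elements as space-separated hex values.
sub hex_dump {
    my ($str) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $str;
}

# UTF-8 bytes of the name, as they would be read from a file opened
# WITHOUT a decoding layer (one string element per byte).
my $raw_bytes = encode('UTF-8', "Alvin \x{0160}ipraga");

# An :encoding(UTF-8) output layer treats each of those bytes as a
# Latin-1 code point and encodes it again.
my $double_encoded = encode('UTF-8', $raw_bytes);

# What a UTF-8 terminal would make of the result.
my $displayed = decode('UTF-8', $double_encoded);

print "raw bytes:      ", hex_dump($raw_bytes), "\n";
print "double encoded: ", hex_dump($double_encoded), "\n";
print "displayed as:   ", hex_dump($displayed), " (code points)\n";
```

The bytes C5 A0 become C3 85 C2 A0, which a terminal shows as "Å" followed by a no-break space. That matches the "Alvin Å ipraga" output quoted above.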
On Fri, 2023-12-15 at 10:30 +0000, Alvin Šipraga wrote:
> On Thu, Dec 14, 2023 at 07:57:54AM -0800, Joe Perches wrote:
> > On Thu, 2023-12-14 at 16:06 +0100, Alvin Šipraga wrote:
> > > @@ -442,7 +443,7 @@ sub maintainers_in_file {
> > >          my $text = do { local($/) ; <$f> };
> > >          close($f);
> > >
> > > -        my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > > +        my @poss_addr = $text =~ m$[\p{L}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > >          push(@file_emails, clean_file_emails(@poss_addr));

Hi again Alvin.

Separate issue, but on the one .yaml file I tried:

$ ./scripts/get_maintainer.pl Documentation/devicetree/bindings/serial/8250.yaml
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (supporter:TTY LAYER AND SERIAL DRIVERS)
Jiri Slaby <jirislaby@kernel.org> (supporter:TTY LAYER AND SERIAL DRIVERS)
Rob Herring <robh+dt@kernel.org> (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org> (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
Conor Dooley <conor+dt@kernel.org> (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
Lubomir Rintel <lkundrak@v3.sk> (in file)
- <devicetree@vger.kernel.org> (in file)
linux-kernel@vger.kernel.org (open list:TTY LAYER AND SERIAL DRIVERS)
linux-serial@vger.kernel.org (open list:TTY LAYER AND SERIAL DRIVERS)
devicetree@vger.kernel.org (open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)

Note the single '-' in the "name" portion of devicetree@vger.kernel.org

Maybe clean_file_emails needs some better name cleansing code.

> > Rather than open _all_ files in utf-8, perhaps the block
> > that opens a specific file to find maintainers
> >
> > sub maintainers_in_file {
> >     my ($file) = @_;
> >
> >     return if ($file =~ m@\bMAINTAINERS$@);
> >
> >     if (-f $file && ($email_file_emails || $file =~ /\.yaml$/)) {
> >         open(my $f, '<', $file)
> >             or die "$P: Can't open $file: $!\n";
> >         my $text = do { local($/) ; <$f> };
> >         close($f);
> >         ...
> >
> > should change the
> >
> >         open(my $f...
> > to
> >         use open qw(:std :encoding(UTF-8));
> >         open(my $f...
>
> Yes, this also works for parsing the name in an arbitrary file. But with the
> change you suggest above, the script then corrupts my name when it is lifted
> from MAINTAINERS (!?):
>
>   $ ./scripts/get_maintainer.pl -f drivers/net/dsa/realtek/ | grep alsi
>   "Alvin Å ipraga" <alsi@bang-olufsen.dk> (maintainer:REALTEK RTL83xx SMI DSA ROUTER CHIPS)

Curious. Let me see if I can figure out why that happens.

> If you are still unconvinced then I will gladly send a v3 patching the two cases
> we have discussed (read_maintainer_file() and maintainers_in_file()).

No rush.

> > And unrelated and secondarily, perhaps the
> >         $file =~ /\.yaml$/
> > test should be
> >         $file =~ /\.(?:yaml|dtsi?)$/
> >
> > to also find any maintainer address in the dts* files
> >
> > https://lore.kernel.org/lkml/20231028174656.GA3310672@bill-the-cat/T/
>
> Is this supposed to parse the "Copyright (c) 20xx John Doe <foo@bar.toto>" in
> the .dts* files?

Yes, just as it would and does for .yaml files.

$ git grep -P -i 'copy.*\<\w+\@\w+\.\w+\>' -- '*.yaml'
Documentation/devicetree/bindings/display/bridge/chrontel,ch7033.yaml:# Copyright (C) 2019,2020 Lubomir Rintel <lkundrak@v3.sk>
Documentation/devicetree/bindings/media/marvell,mmp2-ccic.yaml:# Copyright 2019,2020 Lubomir Rintel <lkundrak@v3.sk>
Documentation/devicetree/bindings/misc/olpc,xo1.75-ec.yaml:# Copyright (C) 2019,2020 Lubomir Rintel <lkundrak@v3.sk>
Documentation/devicetree/bindings/phy/allwinner,sun50i-h6-usb3-phy.yaml:# Copyright 2019 Ondrej Jirman <megous@megous.com>
Documentation/devicetree/bindings/phy/marvell,mmp3-hsic-phy.yaml:# Copyright 2019 Lubomir Rintel <lkundrak@v3.sk>
Documentation/devicetree/bindings/phy/marvell,mmp3-usb-phy.yaml:# Copyright 2019,2020 Lubomir Rintel <lkundrak@v3.sk>
Documentation/devicetree/bindings/reset/bitmain,bm1880-reset.yaml:# Copyright 2019 Manivannan Sadhasivam <mani@kernel.org>
Documentation/devicetree/bindings/reset/marvell,berlin2-reset.yaml:# Copyright 2015 Antoine Tenart <atenart@kernel.org>
Documentation/devicetree/bindings/reset/qca,ar7100-reset.yaml:# Copyright 2015 Alban Bedel <albeu@free.fr>
Documentation/devicetree/bindings/serial/8250.yaml:# Copyright 2020 Lubomir Rintel <lkundrak@v3.sk>
Documentation/devicetree/bindings/spi/marvell,mmp2-ssp.yaml:# Copyright 2019,2020 Lubomir Rintel <lkundrak@v3.sk>
Documentation/devicetree/bindings/usb/marvell,pxau2o-ehci.yaml:# Copyright 2019,2020 Lubomir Rintel <lkundrak@v3.sk>

> But sure, I can do a resend of Shawn's original patch
> separately if you like.

Yes please. Make sure to cc Andrew Morton.
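The stray '-' can be reproduced outside the script. The sketch below applies the patched expression from the diff (with '!' as the delimiter) to a hypothetical YAML fragment shaped like a maintainers list. Because the name class contains both the space and '-', the list marker becomes part of the match. The helper strip_list_marker() is purely illustrative and is not the cleanup that clean_file_emails performs.

```perl
use strict;
use warnings;

# Patched expression from the diff, unchanged apart from the delimiter.
my $addr_re = qr![\p{L}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}!;

# Hypothetical YAML fragment; only the list syntax matters here.
my $text = "maintainers:\n  - devicetree\@vger.kernel.org\n";

# No capture groups, so each element is one whole match.
my @poss_addr = $text =~ /$addr_re/g;

# Illustrative helper (an assumption, not from the script): drop leading
# whitespace and list dashes before the address is parsed further.
sub strip_list_marker {
    my ($candidate) = @_;
    $candidate =~ s/^[\s-]+//;
    return $candidate;
}

print "matched:  [$_]\n" for @poss_addr;
print "stripped: [", strip_list_marker($_), "]\n" for @poss_addr;
```

The single match is "  - devicetree@vger.kernel.org", so whatever splits name from address later sees "-" where a name would be.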
+Shawn, please have a look at the bottom of this mail about your patch we would
like to resend.

On Fri, Dec 15, 2023 at 10:30:52AM -0800, Joe Perches wrote:
> On Fri, 2023-12-15 at 10:30 +0000, Alvin Šipraga wrote:
> > On Thu, Dec 14, 2023 at 07:57:54AM -0800, Joe Perches wrote:
> > > On Thu, 2023-12-14 at 16:06 +0100, Alvin Šipraga wrote:
> > > > @@ -442,7 +443,7 @@ sub maintainers_in_file {
> > > >          my $text = do { local($/) ; <$f> };
> > > >          close($f);
> > > >
> > > > -        my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > > > +        my @poss_addr = $text =~ m$[\p{L}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
> > > >          push(@file_emails, clean_file_emails(@poss_addr));
>
> Hi again Alvin.
>
> Separate issue, but on the one .yaml file I tried:
>
> $ ./scripts/get_maintainer.pl Documentation/devicetree/bindings/serial/8250.yaml
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> (supporter:TTY LAYER AND SERIAL DRIVERS)
> Jiri Slaby <jirislaby@kernel.org> (supporter:TTY LAYER AND SERIAL DRIVERS)
> Rob Herring <robh+dt@kernel.org> (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
> Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org> (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
> Conor Dooley <conor+dt@kernel.org> (maintainer:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
> Lubomir Rintel <lkundrak@v3.sk> (in file)
> - <devicetree@vger.kernel.org> (in file)
> linux-kernel@vger.kernel.org (open list:TTY LAYER AND SERIAL DRIVERS)
> linux-serial@vger.kernel.org (open list:TTY LAYER AND SERIAL DRIVERS)
> devicetree@vger.kernel.org (open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS)
>
> Note the single '-' in the "name" portion of devicetree@vger.kernel.org
>
> Maybe clean_file_emails needs some better name cleansing code.

OK, I had a look and made a patch to fix this as well. Please see v3 which is on
its way to your inbox. Regarding the .dts* patch, I need some feedback below
before sending anything, so I did not include it in the series.

>
> > > Rather than open _all_ files in utf-8, perhaps the block
> > > that opens a specific file to find maintainers
> > >
> > > sub maintainers_in_file {
> > >     my ($file) = @_;
> > >
> > >     return if ($file =~ m@\bMAINTAINERS$@);
> > >
> > >     if (-f $file && ($email_file_emails || $file =~ /\.yaml$/)) {
> > >         open(my $f, '<', $file)
> > >             or die "$P: Can't open $file: $!\n";
> > >         my $text = do { local($/) ; <$f> };
> > >         close($f);
> > >         ...
> > >
> > > should change the
> > >
> > >         open(my $f...
> > > to
> > >         use open qw(:std :encoding(UTF-8));
> > >         open(my $f...
> >
> > Yes, this also works for parsing the name in an arbitrary file. But with the
> > change you suggest above, the script then corrupts my name when it is lifted
> > from MAINTAINERS (!?):
> >
> >   $ ./scripts/get_maintainer.pl -f drivers/net/dsa/realtek/ | grep alsi
> >   "Alvin Å ipraga" <alsi@bang-olufsen.dk> (maintainer:REALTEK RTL83xx SMI DSA ROUTER CHIPS)
>
> Curious. Let me see if I can figure out why that happens.
>
> > If you are still unconvinced then I will gladly send a v3 patching the two cases
> > we have discussed (read_maintainer_file() and maintainers_in_file()).
>
> No rush.
>
> > > And unrelated and secondarily, perhaps the
> > >         $file =~ /\.yaml$/
> > > test should be
> > >         $file =~ /\.(?:yaml|dtsi?)$/
> > >
> > > to also find any maintainer address in the dts* files
> > >
> > > https://lore.kernel.org/lkml/20231028174656.GA3310672@bill-the-cat/T/
> >
> > Is this supposed to parse the "Copyright (c) 20xx John Doe <foo@bar.toto>" in
> > the .dts* files?
>
> Yes, just as it would and does for .yaml files.
>
> $ git grep -P -i 'copy.*\<\w+\@\w+\.\w+\>' -- '*.yaml'
> Documentation/devicetree/bindings/display/bridge/chrontel,ch7033.yaml:# Copyright (C) 2019,2020 Lubomir Rintel <lkundrak@v3.sk>
> Documentation/devicetree/bindings/media/marvell,mmp2-ccic.yaml:# Copyright 2019,2020 Lubomir Rintel <lkundrak@v3.sk>
> Documentation/devicetree/bindings/misc/olpc,xo1.75-ec.yaml:# Copyright (C) 2019,2020 Lubomir Rintel <lkundrak@v3.sk>
> Documentation/devicetree/bindings/phy/allwinner,sun50i-h6-usb3-phy.yaml:# Copyright 2019 Ondrej Jirman <megous@megous.com>
> Documentation/devicetree/bindings/phy/marvell,mmp3-hsic-phy.yaml:# Copyright 2019 Lubomir Rintel <lkundrak@v3.sk>
> Documentation/devicetree/bindings/phy/marvell,mmp3-usb-phy.yaml:# Copyright 2019,2020 Lubomir Rintel <lkundrak@v3.sk>
> Documentation/devicetree/bindings/reset/bitmain,bm1880-reset.yaml:# Copyright 2019 Manivannan Sadhasivam <mani@kernel.org>
> Documentation/devicetree/bindings/reset/marvell,berlin2-reset.yaml:# Copyright 2015 Antoine Tenart <atenart@kernel.org>
> Documentation/devicetree/bindings/reset/qca,ar7100-reset.yaml:# Copyright 2015 Alban Bedel <albeu@free.fr>
> Documentation/devicetree/bindings/serial/8250.yaml:# Copyright 2020 Lubomir Rintel <lkundrak@v3.sk>
> Documentation/devicetree/bindings/spi/marvell,mmp2-ssp.yaml:# Copyright 2019,2020 Lubomir Rintel <lkundrak@v3.sk>
> Documentation/devicetree/bindings/usb/marvell,pxau2o-ehci.yaml:# Copyright 2019,2020 Lubomir Rintel <lkundrak@v3.sk>

Hmm, I ran this over the arch/ directory on all .dts* files and got some false
positives:

  1355276466-18295-1-git-send-email-arve@android.com

This one comes from a lore link with a Message-ID. So deleting URLs before
parsing should fix it.

  aaci_bitclk@12.288M
  led@10.0
  led@10.1
  led@10.2
  led@10.3
  led@10.4
  led@10.5
  led@10.6
  led@10.7
  led@18.0
  led@18.1
  led@18.2
  led@18.3
  led@18.4
  led@18.5
  led@18.6
  led@18.7
  mxtal@19.2M
  timclk@2.4M
  uartclk@14.74M
  xtal24.576@24.576M

These can also be avoided by strengthening the email parsing regex to more
strictly validate the TLD, which may not begin with a digit. So something like
this:

diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index faa96801897a..e253053da967 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -446,7 +446,10 @@ sub maintainers_in_file {
         my $text = do { local($/) ; <$f> };
         close($f);

-        my @poss_addr = $text =~ m$[\p{L}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
+        # Avoid mistaking URLs with Message-IDs as emails
+        $text =~ s/https?:[^\s]+//g;
+
+        my @poss_addr = $text =~ m$[\p{L}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z][A-Za-z0-9]+[\)\>\}]{0,1}$g;
         push(@file_emails, clean_file_emails(@poss_addr));
     }
 }

> > But sure, I can do a resend of Shawn's original patch
> > separately if you like.
>
> Yes please. Make sure to cc Andrew Morton.

Given that some changes are needed, I want your input on how to send:

1. Should I send the above diff as a patch preceding Shawn's patch?

2. Should I send the above diff as a patch following Shawn's patch? Here the
   justification is arguably more clear in the history.

3. Or should I roll it into Shawn's patch? If so, Shawn, may I have your
   Signed-off-by?

Kind regards,
Alvin
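To see the effect of the two proposed changes in isolation, here is a sketch that uses only the address part of the expression, before and after the change, on a small sample. The sample layout and the example.org URL are assumptions made for illustration; the unit address, the Message-ID and the copyright address are taken from the thread.

```perl
use strict;
use warnings;

# Address part of the expression before and after the proposed change;
# the only difference is that the last label must start with a letter.
my $old_addr = qr/[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+/;
my $new_addr = qr/[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z][A-Za-z0-9]+/;

# Illustrative sample: a device tree unit address, a URL containing a
# Message-ID (hypothetical host), and a genuine-looking address.
my $text = join "\n",
    'led@10.0 { };',
    '/* https://example.org/r/1355276466-18295-1-git-send-email-arve@android.com */',
    '/* Copyright (c) 20xx John Doe <foo@bar.toto> */',
    '';

# Before: every "x@y.z" shaped token is reported.
my @before = $text =~ /$old_addr/g;

# After: URLs are deleted first, then the stricter expression is applied.
(my $no_urls = $text) =~ s/https?:[^\s]+//g;
my @after = $no_urls =~ /$new_addr/g;

print "before: $_\n" for @before;
print "after:  $_\n" for @after;
```

The old expression reports three candidates, the new steps leave only foo@bar.toto. One side effect worth noting: the stricter final label needs at least two characters, so a one-letter final label would no longer match.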
diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index ab123b498fd9..344d0cda9854 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -20,6 +20,7 @@ use Getopt::Long qw(:config no_auto_abbrev);
 use Cwd;
 use File::Find;
 use File::Spec::Functions;
+use open qw(:std :encoding(UTF-8));
 
 my $cur_path = fastgetcwd() . '/';
 my $lk_path = "./";
@@ -442,7 +443,7 @@ sub maintainers_in_file {
         my $text = do { local($/) ; <$f> };
         close($f);
 
-        my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
+        my @poss_addr = $text =~ m$[\p{L}\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g;
         push(@file_emails, clean_file_emails(@poss_addr));
     }
 }
@@ -2460,13 +2461,13 @@ sub clean_file_emails {
             $name = "";
         }
 
-        my @nw = split(/[^A-Za-zÀ-ÿ\'\,\.\+-]/, $name);
+        my @nw = split(/[^\p{L}\'\,\.\+-]/, $name);
         if (@nw > 2) {
             my $first = $nw[@nw - 3];
             my $middle = $nw[@nw - 2];
             my $last = $nw[@nw - 1];
 
-            if (((length($first) == 1 && $first =~ m/[A-Za-z]/) ||
+            if (((length($first) == 1 && $first =~ m/\p{L}/) ||
                  (length($first) == 2 && substr($first, -1) eq ".")) ||
                 (length($middle) == 1 ||
                  (length($middle) == 2 && substr($middle, -1) eq "."))) {