From patchwork Tue Dec 19 01:25:15 2023
From: Alvin Šipraga
Date: Tue, 19 Dec 2023 02:25:15 +0100
Subject: [PATCH v3 2/2] get_maintainer: remove stray punctuation when cleaning file emails
Message-Id: <20231219-get-maintainers-utf8-v3-2-f85a39e2265a@bang-olufsen.dk>
References: <20231219-get-maintainers-utf8-v3-0-f85a39e2265a@bang-olufsen.dk>
In-Reply-To: <20231219-get-maintainers-utf8-v3-0-f85a39e2265a@bang-olufsen.dk>
To: Joe Perches, Linus Torvalds, Andrew Morton
Cc: Duje Mihanović, Konstantin Ryabitsev, linux-kernel@vger.kernel.org, Alvin Šipraga
X-Mailing-List: linux-kernel@vger.kernel.org
X-Patchwork-Submitter: Alvin Šipraga
X-Patchwork-Id: 180704

From: Alvin Šipraga

When parsing emails from .yaml files in particular, stray punctuation
such as a leading '-' can end up in the name.
For example, consider a common YAML section such as:

  maintainers:
    - devicetree@vger.kernel.org

This would previously be processed by get_maintainer.pl as:

  -

Make the logic in clean_file_emails more robust by deleting any
sub-names which consist of common single punctuation marks before
proceeding to the best-effort name extraction logic. The output is then
correct:

  devicetree@vger.kernel.org

Some additional comments are added to the function to make things
clearer to future readers.

Link: https://lore.kernel.org/all/0173e76a36b3a9b4e7f324dd3a36fd4a9757f302.camel@perches.com/
Suggested-by: Joe Perches
Signed-off-by: Alvin Šipraga
---
 scripts/get_maintainer.pl | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
index dac38c6e3b1c..ee1aed7e090c 100755
--- a/scripts/get_maintainer.pl
+++ b/scripts/get_maintainer.pl
@@ -2462,11 +2462,17 @@ sub clean_file_emails {
     foreach my $email (@file_emails) {
 	$email =~ s/[\(\<\{]{0,1}([A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+)[\)\>\}]{0,1}/\<$1\>/g;
 	my ($name, $address) = parse_email($email);
-	if ($name eq '"[,\.]"') {
-	    $name = "";
-	}
 
+	# Strip quotes for easier processing, format_email will add them back
+	$name =~ s/^"(.*)"$/$1/;
+
+	# Split into name-like parts and remove stray punctuation particles
 	my @nw = split(/[^\p{L}\'\,\.\+-]/, $name);
+	@nw = grep(!/^[\'\,\.\+-]$/, @nw);
+
+	# Make a best effort to extract the name, and only the name, by taking
+	# only the last two names, or in the case of obvious initials, the last
+	# three names.
 	if (@nw > 2) {
 	    my $first = $nw[@nw - 3];
 	    my $middle = $nw[@nw - 2];
@@ -2480,18 +2486,16 @@ sub clean_file_emails {
 	    } else {
 		$name = "$middle $last";
 	    }
+	} else {
+	    $name = "@nw";
 	}
 
 	if (substr($name, -1) =~ /[,\.]/) {
 	    $name = substr($name, 0, length($name) - 1);
-	} elsif (substr($name, -2) =~ /[,\.]"/) {
-	    $name = substr($name, 0, length($name) - 2) . '"';
 	}
 
 	if (substr($name, 0, 1) =~ /[,\.]/) {
 	    $name = substr($name, 1, length($name) - 1);
-	} elsif (substr($name, 0, 2) =~ /"[,\.]/) {
-	    $name = '"' . substr($name, 2, length($name) - 2);
 	}
 
 	my $fmt_email = format_email($name, $address, $email_usename);
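
For readers who want to try the new filtering step in isolation, here is
a minimal standalone sketch. It is not part of the patch: the helper
name clean_name_sketch is hypothetical, and it reproduces only the
quote-stripping, split and grep steps from the hunk above. It leaves out
the last-two/last-three name selection and the trailing/leading [,.]
trimming, and it does not call parse_email or format_email.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper for illustration only; it mirrors three statements
# from the patched clean_file_emails and nothing else.
sub clean_name_sketch {
    my ($name) = @_;

    # Strip surrounding quotes, as the patch does before splitting.
    $name =~ s/^"(.*)"$/$1/;

    # Split into name-like parts: letters plus ' , . + - are kept,
    # anything else (spaces, digits, brackets) acts as a separator.
    my @nw = split(/[^\p{L}\'\,\.\+-]/, $name);

    # Drop parts that are a single stray punctuation mark, e.g. the
    # leading '-' of a YAML list item.
    @nw = grep(!/^[\'\,\.\+-]$/, @nw);

    # Join the remaining parts with single spaces.
    return "@nw";
}

print "[", clean_name_sketch('-'), "]\n";
print "[", clean_name_sketch('- Jane Doe'), "]\n";
```

With a bare '-' as the name, every part is filtered out and the result
is the empty string, which is the case the commit message describes. A
name such as '- Jane Doe' loses only the leading dash.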