Message ID | 20221021191507.9026-1-antonio.borneo@foss.st.com |
---|---|
State | New |
Headers | From: Antonio Borneo <antonio.borneo@foss.st.com>; To: Andy Whitcroft <apw@canonical.com>, Joe Perches <joe@perches.com>, Dwaipayan Ray <dwaipayanray1@gmail.com>, Lukas Bulwahn <lukas.bulwahn@gmail.com>, <linux-kernel@vger.kernel.org>; CC: Antonio Borneo <antonio.borneo@foss.st.com>; Subject: [PATCH] checkpatch: handle utf8 while computing length of commit msg lines; Date: Fri, 21 Oct 2022 21:15:07 +0200; Message-ID: <20221021191507.9026-1-antonio.borneo@foss.st.com>; X-Mailer: git-send-email 2.38.0; MIME-Version: 1.0; Content-Transfer-Encoding: 8bit; Content-Type: text/plain; List-ID: <linux-kernel.vger.kernel.org> |
Series | checkpatch: handle utf8 while computing length of commit msg lines |
Commit Message
Antonio Borneo
Oct. 21, 2022, 7:15 p.m. UTC
The current check for the length of each line in the commit msg
uses length($line) that counts line's bytes.
If the line contains utf8 characters, the byte count can exceed
the cap even on quite short lines.
Count the utf8 characters for checking line length.
Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
---
Actually it's not fully clear to me if utf8 characters in the
commit msg are acceptable/tolerated or to be avoided.
In the commit msg of 15662b3e8644 ("checkpatch: add a --strict
check for utf-8 in commit logs") is stated:
Some find using utf-8 in commit logs inappropriate.
scripts/checkpatch.pl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
Comments
On Fri, 2022-10-21 at 21:15 +0200, Antonio Borneo wrote:
> The current check for the length of each line in the commit msg
> uses length($line) that counts line's bytes.
> If the line contains utf8 characters, the byte count can exceed
> the cap even on quite short lines.
>
> Count the utf8 characters for checking line length.
>
> Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
>
> ---
>
> Actually it's not fully clear to me if utf8 characters in the
> commit msg are acceptable/tolerated or to be avoided.

Nor is it to me, likely it's OK though as at least checkpatch has
an existing test/comment for nominally valid UTF-8 in commit messages.

	CHK("INVALID_UTF8",
	    "Invalid UTF-8, patch and commit message should be encoded in UTF-8\n" . $hereptr);

> In the commit msg of 15662b3e8644 ("checkpatch: add a --strict
> check for utf-8 in commit logs") is stated:
> 	Some find using utf-8 in commit logs inappropriate.

I don't particularly care one way or another.

Andrew? Linus?

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 1e5e66ae5a52..eaad5da50554 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3220,7 +3220,7 @@ sub process {
>
>  # Check for line lengths > 75 in commit log, warn once
>  		if ($in_commit_log && !$commit_log_long_line &&
> -		    length($line) > 75 &&
> +		    length(decode("utf8", $line)) > 75 &&
>  		    !($line =~ /^\s*[a-zA-Z0-9_\/\.]+\s+\|\s+\d+/ ||
>  					# file delta changes
>  		      $line =~ /^\s*(?:[\w\.\-\+]*\/)++[\w\.\-\+]+:/ ||
>
> base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
On Fri, Oct 21, 2022 at 10:48 PM Joe Perches <joe@perches.com> wrote:
>
> On Fri, 2022-10-21 at 21:15 +0200, Antonio Borneo wrote:
> >
> > Actually it's not fully clear to me if utf8 characters in the
> > commit msg are acceptable/tolerated or to be avoided.

utf8 is not just acceptable, but actively encouraged in commit
messages.

Not *gratuitous* use (please - no emojis) but there is absolutely
nothing wrong when using utf8 when appropriate.

And getting people's names right is not just appropriate, but
actually important. And depending on where in the world you are from,
utf8 is absolutely required, and no, we don't do Latin1 for that
subset of the world (any more - we have a dark history of Latin1 in
some corners).

That said, I'm not convinced the whole line length check really
matters, or is even appropriate. A lot of commit messages absolutely
should have long lines, regardless of any UTF8 issues.

Just as a recent example, see commit 71e2d666ef85 ("mm/huge_memory: do
not clobber swp_entry_t during THP split"), which has a 200+ character
line, and that's *exactly* what it should have. Splitting that line
would be actively wrong.

The same often goes for things like quoted compiler warnings etc.

I personally can't think of a case where we've actually had issues wrt
"line length in bytes vs line length in characters". And I'm not
convinced the length check is appropriate in the first place.

The only line that really shouldn't be overly long is the _first_ line
of the commit message, because that tends to be a "somebody wrote a
whole paragraph in line-wrapped mode".

And the first line of the commit message really is special, and should
not just be of a reasonable length (although 75 chars may be too
restrictive), but should have an empty line after it.

I didn't look into what the checkpatch.pl script does around that
code, maybe that's what it already does.

               Linus
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 1e5e66ae5a52..eaad5da50554 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3220,7 +3220,7 @@ sub process {
 
 # Check for line lengths > 75 in commit log, warn once
 		if ($in_commit_log && !$commit_log_long_line &&
-		    length($line) > 75 &&
+		    length(decode("utf8", $line)) > 75 &&
 		    !($line =~ /^\s*[a-zA-Z0-9_\/\.]+\s+\|\s+\d+/ ||
 					# file delta changes
 		      $line =~ /^\s*(?:[\w\.\-\+]*\/)++[\w\.\-\+]+:/ ||
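
For illustration only, the difference the patch relies on can be seen in a small standalone Perl script. This is a sketch, not part of the patch: the sample line and the variable names are hypothetical, and it passes the strict encoding name "UTF-8" to decode() where the patch passes the laxer "utf8". It assumes, as the commit message describes, that the line is held as undecoded bytes when length() is called.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample line: "Signed-off-by: José Ñandú", spelled with
# explicit UTF-8 byte escapes so the result does not depend on how this
# source file itself is encoded.
my $line = "Signed-off-by: Jos\xc3\xa9 \xc3\x91and\xc3\xba";

# On an undecoded string, length() counts bytes.
my $bytes = length($line);

# On the decoded string, length() counts characters (code points).
my $chars = length(decode("UTF-8", $line));

print "bytes=$bytes chars=$chars\n";
```

Each of the three accented letters takes two bytes in UTF-8, so the script prints bytes=28 chars=25. Note that a code point count is still not a display width: combining marks and double-width characters would need separate handling, which neither this sketch nor the one-line change above attempts.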