checkpatch: Also accept commit ids with 13-40 chars of sha1

Message ID 62f82b0308de05f5aab913392049af15d53c777d.1701804489.git.geert+renesas@glider.be
State New
Headers
Series checkpatch: Also accept commit ids with 13-40 chars of sha1 |

Commit Message

Geert Uytterhoeven Dec. 5, 2023, 7:34 p.m. UTC
  Documentation/dev-tools/checkpatch.rst says:

  **GIT_COMMIT_ID**
    The proper way to reference a commit id is:
    commit <12+ chars of sha1> ("<title line>")

However, scripts/checkpatch.pl has two different checks: one warning
check accepting 12 characters exactly:

    # Check Fixes: styles is correct
    Please use correct Fixes: style 'Fixes: <12 chars of sha1> (\"<title line>\")'

and a second error check accepting 12-40 characters:

    # Check for git id commit length and improperly formed commit descriptions
    # A correctly formed commit description is:
    #    commit <SHA-1 hash length 12+ chars> ("Complete commit subject")
    Please use git commit description style 'commit <12+ chars of sha1>

Hence patches containing commit ids with more than 12 characters are
flagged by checkpatch, and sometimes rejected by maintainers or
reviewers.

Fix this by aligning the first check with the second check, and with the
documentation.

Fixes: bd17e036b495bebb ("checkpatch: warn for non-standard fixes tag style")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
Perhaps the time is ripe to increase the minimum from 12 to 16 chars
(in a follow-up patch)?

Running git-unique-abbrev[1] on a tree containing v6.7-rc3 and all
stable releases gives:

    12000853 objects
     4: 12000853 / 65536
     5: 12000717 / 1048423
     6: 6130888 / 2703295
     7: 525025 / 260563
     8: 33736 / 16861
     9: 2106 / 1053
    10: 160 / 80
    11: 10 / 5
    12: 0 / 0
    21cf4d54d3c702ac20c6747fa6d4f64dee07dd11
    21cf4d54d3ced8a3e752030e483d72997721076d
    8a048bbf89528d45c604aed68f7e0f0ef957067d
    8a048bbf895b1359e4a33b779ea6d7386cfe4de2
    d3ac4e475103c4364ecb47a6a55c114d7c42a014
    d3ac4e47510ec0753ebe1e418a334ad202784aa8
    d597639e2036f04f0226761e2d818b31f2db7820
    d597639e203a100156501df8a0756fd09573e2de
    ef91b6e893a00d903400f8e1303efc4d52b710af
    ef91b6e893afc4c4ca488453ea9f19ced5fa5861

12000853 is still smaller than sqrt(16^12) = 16777216, but the safety
margin is getting smaller.  E.g. my main work tree already contains
almost 18M objects.  Hence the Birthday Paradox states that collisions
of 12 char sha1 values are imminent.

Note that we standardized on 12 chars in commit d311cd44545f2f69
("checkpatch: add test for commit id formatting style in commit log") in
v3.17.  For comparison, running git-unique-abbrev on a tree with all
(upstream + stable) releases from that era gives:

    4052307 objects
     4: 4052307 / 65536
     5: 3966948 / 940963
     6: 869691 / 417363
     7: 61208 / 30523
     8: 3979 / 1989
     9: 258 / 129
    10: 24 / 12
    11: 6 / 3
    12: 0 / 0
    21cf4d54d3c702ac20c6747fa6d4f64dee07dd11
    21cf4d54d3ced8a3e752030e483d72997721076d
    d597639e2036f04f0226761e2d818b31f2db7820
    d597639e203a100156501df8a0756fd09573e2de
    ef91b6e893a00d903400f8e1303efc4d52b710af
    ef91b6e893afc4c4ca488453ea9f19ced5fa5861

So the number of objects increased threefold during the last 9 years.

Thanks for your comments!

[1] https://blog.cuviper.com/2013/11/10/how-short-can-git-abbreviate/
---
 scripts/checkpatch.pl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Joe Perches Dec. 5, 2023, 8:05 p.m. UTC | #1
On Tue, 2023-12-05 at 20:34 +0100, Geert Uytterhoeven wrote:
> Documentation/dev-tools/checkpatch.rst says:
> 
>   **GIT_COMMIT_ID**
>     The proper way to reference a commit id is:
>     commit <12+ chars of sha1> ("<title line>")

It's not just checkpatch.

Documentation/process/submitting-patches.rst:``git bisect``, please use the 'Fixes:' tag with the first 12 characters of


So that would need to be updated as well.

And 12 still has quite some headroom.

$ git rev-list --all --abbrev=0 --abbrev-commit | \
  awk '{ a[length] += 1 } END { for (len in a) print len, a[len] }'
5 107
6 684276
7 505734
8 41769
9 2665
10 174
11 8
  
Geert Uytterhoeven Dec. 6, 2023, 7:45 a.m. UTC | #2
Hi Joe,

On Tue, Dec 5, 2023 at 9:05 PM Joe Perches <joe@perches.com> wrote:
> On Tue, 2023-12-05 at 20:34 +0100, Geert Uytterhoeven wrote:
> > Documentation/dev-tools/checkpatch.rst says:
> >
> >   **GIT_COMMIT_ID**
> >     The proper way to reference a commit id is:
> >     commit <12+ chars of sha1> ("<title line>")
>
> It's not just checkpatch.
>
> Documentation/process/submitting-patches.rst:``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
>
>
> So that would need to be updated as well.

And:

Documentation/process/maintainer-tip.rst: - Fixes: 12char-SHA1
("sub/sys: Original subject line")

but the example uses 15:

Documentation/process/maintainer-tip.rst:     Fixes: abcdef012345678
("x86/xxx: Replace foo with bar")

Documentation/process/researcher-guidelines.rst:  Fixes:
aaaabbbbccccdddd ("Introduce support for FooBar")

16

Documentation/process/submitting-patches.rst:   Commit
e21d2170f36602ae2708 ("video: remove unnecessary

20

> And 12 still has quite some headroom.
>
> $ git rev-list --all --abbrev=0 --abbrev-commit | \
>   awk '{ a[length] += 1 } END { for (len in a) print len, a[len] }'
> 5 107
> 6 684276
> 7 505734
> 8 41769
> 9 2665
> 10 174
> 11 8

How many collisions do you need? These will be dereferenced years
from now.

Gr{oetje,eeting}s,

                        Geert
  
Niklas Söderlund Dec. 10, 2023, 9:56 p.m. UTC | #3
Hi Geert,

Thanks for your patch.

On 2023-12-05 20:34:16 +0100, Geert Uytterhoeven wrote:
> Documentation/dev-tools/checkpatch.rst says:
> 
>   **GIT_COMMIT_ID**
>     The proper way to reference a commit id is:
>     commit <12+ chars of sha1> ("<title line>")
> 
> However, scripts/checkpatch.pl has two different checks: one warning
> check accepting 12 characters exactly:
> 
>     # Check Fixes: styles is correct
>     Please use correct Fixes: style 'Fixes: <12 chars of sha1> (\"<title line>\")'
> 
> and a second error check accepting 12-40 characters:
> 
>     # Check for git id commit length and improperly formed commit descriptions
>     # A correctly formed commit description is:
>     #    commit <SHA-1 hash length 12+ chars> ("Complete commit subject")
>     Please use git commit description style 'commit <12+ chars of sha1>
> 
> Hence patches containing commit ids with more than 12 characters are
> flagged by checkpatch, and sometimes rejected by maintainers or
> reviewers.

I agree, it's not nice that the two commit id checks do not agree on 
length and that this should likely be aligned.

To clarify, the two commit id checks in checkpatch.pl do not conflict 
with each other as one check Fixes tags while the other checks the 
format when referring to commits in general, right?

The intention when adding the check for Fixes tags was to conform with 
what is documented in [1]. And to enforce a minimum number of characters 
as there where issues where too few where used.

> 
> Fix this by aligning the first check with the second check, and with the
> documentation.

I think this change is a good idea, but the documentation should also be 
updated.

1. https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes

> 
> Fixes: bd17e036b495bebb ("checkpatch: warn for non-standard fixes tag style")
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
> Perhaps the time is ripe to increase the minimum from 12 to 16 chars
> (in a follow-up patch)?
> 
> Running git-unique-abbrev[1] on a tree containing v6.7-rc3 and all
> stable releases gives:
> 
>     12000853 objects
>      4: 12000853 / 65536
>      5: 12000717 / 1048423
>      6: 6130888 / 2703295
>      7: 525025 / 260563
>      8: 33736 / 16861
>      9: 2106 / 1053
>     10: 160 / 80
>     11: 10 / 5
>     12: 0 / 0
>     21cf4d54d3c702ac20c6747fa6d4f64dee07dd11
>     21cf4d54d3ced8a3e752030e483d72997721076d
>     8a048bbf89528d45c604aed68f7e0f0ef957067d
>     8a048bbf895b1359e4a33b779ea6d7386cfe4de2
>     d3ac4e475103c4364ecb47a6a55c114d7c42a014
>     d3ac4e47510ec0753ebe1e418a334ad202784aa8
>     d597639e2036f04f0226761e2d818b31f2db7820
>     d597639e203a100156501df8a0756fd09573e2de
>     ef91b6e893a00d903400f8e1303efc4d52b710af
>     ef91b6e893afc4c4ca488453ea9f19ced5fa5861
> 
> 12000853 is still smaller than sqrt(16^12) = 16777216, but the safety
> margin is getting smaller.  E.g. my main work tree already contains
> almost 18M objects.  Hence the Birthday Paradox states that collisions
> of 12 char sha1 values are imminent.
> 
> Note that we standardized on 12 chars in commit d311cd44545f2f69
> ("checkpatch: add test for commit id formatting style in commit log") in
> v3.17.  For comparison, running git-unique-abbrev on a tree with all
> (upstream + stable) releases from that era gives:
> 
>     4052307 objects
>      4: 4052307 / 65536
>      5: 3966948 / 940963
>      6: 869691 / 417363
>      7: 61208 / 30523
>      8: 3979 / 1989
>      9: 258 / 129
>     10: 24 / 12
>     11: 6 / 3
>     12: 0 / 0
>     21cf4d54d3c702ac20c6747fa6d4f64dee07dd11
>     21cf4d54d3ced8a3e752030e483d72997721076d
>     d597639e2036f04f0226761e2d818b31f2db7820
>     d597639e203a100156501df8a0756fd09573e2de
>     ef91b6e893a00d903400f8e1303efc4d52b710af
>     ef91b6e893afc4c4ca488453ea9f19ced5fa5861
> 
> So the number of objects increased threefold during the last 9 years.
> 
> Thanks for your comments!
> 
> [1] https://blog.cuviper.com/2013/11/10/how-short-can-git-abbreviate/
> ---
>  scripts/checkpatch.pl | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 25fdb7fda1128aa9..a4e178a68f6d1d5f 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3209,7 +3209,7 @@ sub process {
>  				$tag_case = 0 if $tag eq "Fixes:";
>  				$tag_space = 0 if ($line =~ /^fixes:? [0-9a-f]{5,} ($balanced_parens)/i);
>  
> -				$id_length = 0 if ($orig_commit =~ /^[0-9a-f]{12}$/i);
> +				$id_length = 0 if ($orig_commit =~ /^[0-9a-f]{12,40}$/i);
>  				$id_case = 0 if ($orig_commit !~ /[A-F]/);
>  
>  				# Always strip leading/trailing parens then double quotes if existing
> @@ -3226,7 +3226,7 @@ sub process {
>  			if ($ctitle ne $title || $tag_case || $tag_space ||
>  			    $id_length || $id_case || !$title_has_quotes) {
>  				if (WARN("BAD_FIXES_TAG",
> -				     "Please use correct Fixes: style 'Fixes: <12 chars of sha1> (\"<title line>\")' - ie: 'Fixes: $cid (\"$ctitle\")'\n" . $herecurr) &&
> +				     "Please use correct Fixes: style 'Fixes: <12+ chars of sha1> (\"<title line>\")' - ie: 'Fixes: $cid (\"$ctitle\")'\n" . $herecurr) &&
>  				    $fix) {
>  					$fixed[$fixlinenr] = "Fixes: $cid (\"$ctitle\")";
>  				}
> -- 
> 2.34.1
>
  

Patch

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 25fdb7fda1128aa9..a4e178a68f6d1d5f 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3209,7 +3209,7 @@  sub process {
 				$tag_case = 0 if $tag eq "Fixes:";
 				$tag_space = 0 if ($line =~ /^fixes:? [0-9a-f]{5,} ($balanced_parens)/i);
 
-				$id_length = 0 if ($orig_commit =~ /^[0-9a-f]{12}$/i);
+				$id_length = 0 if ($orig_commit =~ /^[0-9a-f]{12,40}$/i);
 				$id_case = 0 if ($orig_commit !~ /[A-F]/);
 
 				# Always strip leading/trailing parens then double quotes if existing
@@ -3226,7 +3226,7 @@  sub process {
 			if ($ctitle ne $title || $tag_case || $tag_space ||
 			    $id_length || $id_case || !$title_has_quotes) {
 				if (WARN("BAD_FIXES_TAG",
-				     "Please use correct Fixes: style 'Fixes: <12 chars of sha1> (\"<title line>\")' - ie: 'Fixes: $cid (\"$ctitle\")'\n" . $herecurr) &&
+				     "Please use correct Fixes: style 'Fixes: <12+ chars of sha1> (\"<title line>\")' - ie: 'Fixes: $cid (\"$ctitle\")'\n" . $herecurr) &&
 				    $fix) {
 					$fixed[$fixlinenr] = "Fixes: $cid (\"$ctitle\")";
 				}