string.c: test *cmp for all possible 1-character strings

Message ID 20221222140506.1961281-1-linux@rasmusvillemoes.dk
State New

Commit Message

Rasmus Villemoes Dec. 22, 2022, 2:05 p.m. UTC
The switch to -funsigned-char made a pre-existing bug on m68k more
apparent. That is now fixed (by removing m68k's private strcmp(), see
commit 7c0846125358), but we still have quite a few architectures that
provide one or more of strcmp(), strncmp() and memcmp().

They probably all work fine for the cases where the input is all
ASCII, and/or where the caller only wants to know about equality or
not (i.e. only checks whether the return value is 0 or not).

Let's check that all these implementations also behave correctly for
bytes with the high bit set, and provide the correct ordering -
independent of us now building with -funsigned-char, the C standard
says that these *cmp functions should consider the buffers as
consisting of unsigned chars.
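
To illustrate the ordering problem described above, here is a minimal
userspace sketch. It is not taken from any architecture's code:
cmp_signed() is a hypothetical comparison that subtracts bytes as signed
char, and cmp_unsigned() is the same loop reading the bytes as unsigned
char, as the C standard requires. Both names are made up for this
example.

```c
/*
 * Hypothetical buggy comparison: reads the bytes as signed char, so a
 * byte with the high bit set (e.g. 0x80 == -128) sorts before ASCII.
 */
static int cmp_signed(const char *a, const char *b)
{
	const signed char *s = (const signed char *)a;
	const signed char *t = (const signed char *)b;

	while (*s && *s == *t) {
		s++;
		t++;
	}
	return *s - *t;
}

/*
 * Same loop, but the bytes are read as unsigned char, so 0x80 (128)
 * sorts after 0x01.
 */
static int cmp_unsigned(const char *a, const char *b)
{
	const unsigned char *s = (const unsigned char *)a;
	const unsigned char *t = (const unsigned char *)b;

	while (*s && *s == *t) {
		s++;
		t++;
	}
	return *s - *t;
}
```

For the pair ("\x80", "\x01"), cmp_signed() returns a negative value
while cmp_unsigned() returns a positive one. For pure ASCII input, and
for equality checks, the two agree, which is why such a bug can stay
hidden for a long time.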

This is only intended to help find other latent bugs and can/should be
ripped out again before v6.2, or perhaps moved to test_string.c in
some form, but for now I think it's worth doing unconditionally.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 lib/string.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
  

Comments

Jason A. Donenfeld Dec. 22, 2022, 3:15 p.m. UTC | #1
On Thu, Dec 22, 2022 at 03:05:06PM +0100, Rasmus Villemoes wrote:
> The switch to -funsigned-char made a pre-existing bug on m68k more
> apparent. That is now fixed (by removing m68k's private strcmp(), see
> commit 7c0846125358), but we still have quite a few architectures that
> provide one or more of strcmp(), strncmp() and memcmp().
> 
> They probably all work fine for the cases where the input is all
> ASCII, and/or where the caller only wants to know about equality or
> not (i.e. only checks whether the return value is 0 or not).
> 
> Let's check that all these implementations also behave correctly for
> bytes with the high bit set, and provide the correct ordering -
> independent of us now building with -funsigned-char, the C standard
> says that these *cmp functions should consider the buffers as
> consisting of unsigned chars.
> 
> This is only intended to help find other latent bugs and can/should be
> ripped out again before v6.2, or perhaps moved to test_string.c in
> some form, but for now I think it's worth doing unconditionally.
> 
> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> ---
>  lib/string.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/lib/string.c b/lib/string.c
> index 4fb566ea610f..1718f96e8082 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -880,3 +880,30 @@ void *memchr_inv(const void *start, int c, size_t bytes)
>  	return check_bytes8(start, value, bytes % 8);
>  }
>  EXPORT_SYMBOL(memchr_inv);
> +
> +static int sign(int x)
> +{
> +	return (x > 0) - (x < 0);
> +}
> +
> +static int test_xxxcmp(void)
> +{
> +	char a[2], b[2];
> +	int i, j;
> +
> +	a[1] = b[1] = 0;
> +	for (i = 0; i < 256; ++i) {
> +		a[0] = i;
> +		for (j = 0; j < 256; ++j) {
> +			b[0] = j;
> +			WARN_ONCE(sign(strcmp(a, b)) != sign(i - j),
> +				  "strcmp() broken for (%2ph, %2ph)\n", a, b);
> +			WARN_ONCE(sign(memcmp(a, b, 2)) != sign(i - j),
> +				  "memcmp() broken for (%2ph, %2ph)\n", a, b);
> +			WARN_ONCE(sign(strncmp(a, b, 2)) != sign(i - j),
> +				  "strncmp() broken for (%2ph, %2ph)\n", a, b);
> +		}
> +	}
> +	return 0;
> +}
> +late_initcall(test_xxxcmp);

This probably belongs in some config-gated selftest file that can be
compiled out, rather than running unconditionally on every boot, right?

Jason
  
Rasmus Villemoes Dec. 23, 2022, 7:42 a.m. UTC | #2
On 22/12/2022 16.15, Jason A. Donenfeld wrote:
> On Thu, Dec 22, 2022 at 03:05:06PM +0100, Rasmus Villemoes wrote:

>> This is only intended to help find other latent bugs and can/should be
>> ripped out again before v6.2, or perhaps moved to test_string.c in
>> some form, but for now I think it's worth doing unconditionally.
>>
> This probably belongs in some config-gated selftest file that can be
> compiled out, rather than running unconditionally on every boot, right?

I believe this was already answered in the last paragraph of the commit log.

Rasmus
  
kernel test robot Dec. 23, 2022, 7:56 a.m. UTC | #3
Hi Rasmus,

I love your patch! Yet something to improve:

[auto build test ERROR on linux/master]
[also build test ERROR on linus/master v6.1 next-20221220]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Rasmus-Villemoes/string-c-test-cmp-for-all-possible-1-character-strings/20221222-220708
patch link:    https://lore.kernel.org/r/20221222140506.1961281-1-linux%40rasmusvillemoes.dk
patch subject: [PATCH] string.c: test *cmp for all possible 1-character strings
config: riscv-randconfig-r042-20221219
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 98b13979fb05f3ed288a900deb843e7b27589e58)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/0235c6544a848ef03332c7840c87b356c08a4b1d
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Rasmus-Villemoes/string-c-test-cmp-for-all-possible-1-character-strings/20221222-220708
        git checkout 0235c6544a848ef03332c7840c87b356c08a4b1d
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> ld.lld: error: undefined symbol: __warn_printk
   >>> referenced by ctype.c
   >>>               arch/riscv/purgatory/purgatory.ro:(test_xxxcmp)
   >>> referenced by ctype.c
   >>>               arch/riscv/purgatory/purgatory.ro:(test_xxxcmp)
   >>> referenced by ctype.c
   >>>               arch/riscv/purgatory/purgatory.ro:(test_xxxcmp)
  
kernel test robot Dec. 23, 2022, 10:34 p.m. UTC | #4
Hi Rasmus,

I love your patch! Yet something to improve:

[auto build test ERROR on linux/master]
[also build test ERROR on linus/master v6.1 next-20221220]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Rasmus-Villemoes/string-c-test-cmp-for-all-possible-1-character-strings/20221222-220708
patch link:    https://lore.kernel.org/r/20221222140506.1961281-1-linux%40rasmusvillemoes.dk
patch subject: [PATCH] string.c: test *cmp for all possible 1-character strings
config: riscv-allyesconfig
compiler: riscv64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/0235c6544a848ef03332c7840c87b356c08a4b1d
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Rasmus-Villemoes/string-c-test-cmp-for-all-possible-1-character-strings/20221222-220708
        git checkout 0235c6544a848ef03332c7840c87b356c08a4b1d
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   riscv64-linux-ld: arch/riscv/purgatory/purgatory.ro: in function `.L13':
>> string.c:(.text+0x1832): undefined reference to `__warn_printk'
   riscv64-linux-ld: arch/riscv/purgatory/purgatory.ro: in function `.L3':
   string.c:(.text+0x187a): undefined reference to `__warn_printk'
   riscv64-linux-ld: arch/riscv/purgatory/purgatory.ro: in function `.L6':
   string.c:(.text+0x18c4): undefined reference to `__warn_printk'
  

Patch

diff --git a/lib/string.c b/lib/string.c
index 4fb566ea610f..1718f96e8082 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -880,3 +880,30 @@  void *memchr_inv(const void *start, int c, size_t bytes)
 	return check_bytes8(start, value, bytes % 8);
 }
 EXPORT_SYMBOL(memchr_inv);
+
+static int sign(int x)
+{
+	return (x > 0) - (x < 0);
+}
+
+static int test_xxxcmp(void)
+{
+	char a[2], b[2];
+	int i, j;
+
+	a[1] = b[1] = 0;
+	for (i = 0; i < 256; ++i) {
+		a[0] = i;
+		for (j = 0; j < 256; ++j) {
+			b[0] = j;
+			WARN_ONCE(sign(strcmp(a, b)) != sign(i - j),
+				  "strcmp() broken for (%2ph, %2ph)\n", a, b);
+			WARN_ONCE(sign(memcmp(a, b, 2)) != sign(i - j),
+				  "memcmp() broken for (%2ph, %2ph)\n", a, b);
+			WARN_ONCE(sign(strncmp(a, b, 2)) != sign(i - j),
+				  "strncmp() broken for (%2ph, %2ph)\n", a, b);
+		}
+	}
+	return 0;
+}
+late_initcall(test_xxxcmp);
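
The same sweep can be tried outside the kernel. The following is a rough
userspace port of the loop in the patch, given only as a sketch: it
checks the C library's strcmp(), memcmp() and strncmp() rather than the
kernel's, and it counts mismatches instead of calling WARN_ONCE(). The
name count_cmp_mismatches() is an assumption made for this example; it
is not part of the patch.

```c
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1. */
static int sign(int x)
{
	return (x > 0) - (x < 0);
}

/*
 * Compare every pair of 1-character strings (including the empty
 * string, when the byte is 0) and count the calls whose result has the
 * wrong sign relative to the unsigned byte values.
 */
static int count_cmp_mismatches(void)
{
	char a[2], b[2];
	int i, j, bad = 0;

	a[1] = b[1] = 0;
	for (i = 0; i < 256; ++i) {
		a[0] = (char)i;
		for (j = 0; j < 256; ++j) {
			int want = sign(i - j);

			b[0] = (char)j;
			if (sign(strcmp(a, b)) != want)
				bad++;
			if (sign(memcmp(a, b, 2)) != want)
				bad++;
			if (sign(strncmp(a, b, 2)) != want)
				bad++;
		}
	}
	return bad;
}
```

With a conforming C library, count_cmp_mismatches() should return 0. A
non-zero result would point at the kind of ordering bug the patch is
meant to catch.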