[3/4] tools/nolibc: Fix strlcpy() return code and size usage

Message ID 20240129141516.198636-4-rodrigo@sdfg.com.ar
State New
Headers
Series tools/nolibc: Misc fixes for strlcpy() and strlcat() |

Commit Message

Rodrigo Campos Jan. 29, 2024, 2:15 p.m. UTC
  The return code should always be strlen(src), and we should copy at most
size-1 bytes.

While we are there, make sure to null-terminate the dst buffer.

Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
---
 tools/include/nolibc/string.h | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
  

Comments

Willy Tarreau Feb. 11, 2024, 11:08 a.m. UTC | #1
Hi Rodrigo,

On Mon, Jan 29, 2024 at 03:15:15PM +0100, Rodrigo Campos wrote:
> The return code should always be strlen(src), and we should copy at most
> size-1 bytes.
> 
> While we are there, make sure to null-terminate the dst buffer.
> 
> Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
> ---
>  tools/include/nolibc/string.h | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
> index b2149e1342a8..e4bc0302967d 100644
> --- a/tools/include/nolibc/string.h
> +++ b/tools/include/nolibc/string.h
> @@ -212,15 +212,16 @@ size_t strlcpy(char *dst, const char *src, size_t size)
>  	size_t len;
>  	char c;
>  
> -	for (len = 0;;) {
> +	for (len = 0; len < size; len++) {
>  		c = src[len];
> -		if (len < size)
> +		if (len < size - 1)
>  			dst[len] = c;
> +		if (len == size - 1)
> +			dst[len] = '\0';
>  		if (!c)
>  			break;
> -		len++;
>  	}
> -	return len;
> +	return strlen(src);
>  }

It's good, but for the same reason as the previous one, I'm getting
smaller code by doing less in the loop. Also calling strlen() here
looks expensive, I'm seeing that the compiler inlined it nevertheless
and did it in a dep-optimized way due to the asm statement. That
results in 67 bytes total while a simpler version gives 47.

If I explicitly mark strlen() __attribute__((noinline)) that prevents
it from doing so starting with gcc-10, where it correctly places a jump
from strlcpy() to strlen() and ends up with 50 bytes (vs 44 for the alt
one). The other one I can propose is directly derived from the other
strlcat() variant, which first performs the copy and starts to count:

size_t strlcpy(char *dst, const char *src, size_t size)
{
        size_t len;

        for (len = 0; len < size; len++) {
                if (!(dst[len] = src[len]))
                        return len;
        }

        /* end of src not found before size */
        if (size)
                dst[size - 1] = '\0';

        while (src[len])
                len++;

        return len;
}

Just let me know what you think. And I think we should explicitly mark
strlen() and the few other ones we're marking weak as noinline so that
the compiler perfers a call there to inlining. Well, registers clobbering
might not always be worth, but given that strlen() alone provides some
savings I think it's still interesting.

Thanks,
Willy
  
Willy Tarreau Feb. 11, 2024, 11:14 a.m. UTC | #2
On Sun, Feb 11, 2024 at 12:08:14PM +0100, Willy Tarreau wrote:
> And I think we should explicitly mark
> strlen() and the few other ones we're marking weak as noinline so that
> the compiler perfers a call there to inlining.

So actually the weak argument always prevents inlining from happening
so this is not needed (I didn't have it in my previous test).

Willy
  
Rodrigo Campos Feb. 14, 2024, 3:50 p.m. UTC | #3
On 2/11/24 12:08, Willy Tarreau wrote:
> Hi Rodrigo,
> 
> It's good, but for the same reason as the previous one, I'm getting
> smaller code by doing less in the loop. Also calling strlen() here
> looks expensive, I'm seeing that the compiler inlined it nevertheless
> and did it in a dep-optimized way due to the asm statement. That
> results in 67 bytes total while a simpler version gives 47.
> 
> If I explicitly mark strlen() __attribute__((noinline)) that prevents
> it from doing so starting with gcc-10, where it correctly places a jump
> from strlcpy() to strlen() and ends up with 50 bytes (vs 44 for the alt
> one). The other one I can propose is directly derived from the other
> strlcat() variant, which first performs the copy and starts to count:
> 
> size_t strlcpy(char *dst, const char *src, size_t size)
> {
>          size_t len;
> 
>          for (len = 0; len < size; len++) {
>                  if (!(dst[len] = src[len]))
>                          return len;
>          }
> 
>          /* end of src not found before size */
>          if (size)
>                  dst[size - 1] = '\0';
> 
>          while (src[len])
>                  len++;
> 
>          return len;
> }
> 
> Just let me know what you think. 

This is one is very nice, thanks!

Sorry I didn't think about the size at all when writing the functions :)

We can change the loop to be:

         for (len = 0; len < size; len++) {
                 dst[len] = src[len];
                 if (!dst[len])
                         break;
         }

That IMHO it is slightly more readable and makes it only 2 bytes longer 
here.

What do you think? I'm fine with both, of course.


If I resend, shall I add a suggested-by or directly you as the author?



Best,
Rodrigo
  
Willy Tarreau Feb. 14, 2024, 3:55 p.m. UTC | #4
On Wed, Feb 14, 2024 at 12:50:53PM -0300, Rodrigo Campos wrote:
> On 2/11/24 12:08, Willy Tarreau wrote:
> > Hi Rodrigo,
> > 
> > It's good, but for the same reason as the previous one, I'm getting
> > smaller code by doing less in the loop. Also calling strlen() here
> > looks expensive, I'm seeing that the compiler inlined it nevertheless
> > and did it in a dep-optimized way due to the asm statement. That
> > results in 67 bytes total while a simpler version gives 47.
> > 
> > If I explicitly mark strlen() __attribute__((noinline)) that prevents
> > it from doing so starting with gcc-10, where it correctly places a jump
> > from strlcpy() to strlen() and ends up with 50 bytes (vs 44 for the alt
> > one). The other one I can propose is directly derived from the other
> > strlcat() variant, which first performs the copy and starts to count:
> > 
> > size_t strlcpy(char *dst, const char *src, size_t size)
> > {
> >          size_t len;
> > 
> >          for (len = 0; len < size; len++) {
> >                  if (!(dst[len] = src[len]))
> >                          return len;
> >          }
> > 
> >          /* end of src not found before size */
> >          if (size)
> >                  dst[size - 1] = '\0';
> > 
> >          while (src[len])
> >                  len++;
> > 
> >          return len;
> > }
> > 
> > Just let me know what you think.
> 
> This is one is very nice, thanks!
> 
> Sorry I didn't think about the size at all when writing the functions :)

Never be sorry, low-level user code like this is never trivial and
that's the goal of the nolibc-test in the first place ;-)

> We can change the loop to be:
> 
>         for (len = 0; len < size; len++) {
>                 dst[len] = src[len];
>                 if (!dst[len])
>                         break;
>         }
> 
> That IMHO it is slightly more readable and makes it only 2 bytes longer
> here.

It's not exactly the same, it will always write a zero at dst[size-1]
due to the break statement. As much as I hate returns in the middle,
this one made sense for this case. A goto to the final return statement
is fine as well.

> What do you think? I'm fine with both, of course.

I'm fine with the more readable part (I also prefer it) but not the use
of break here.

> If I resend, shall I add a suggested-by or directly you as the author?

No need for either, it's your work, my part was just a review and an
addictive temptation to look at asm code ;-)

Cheers,
Willy
  

Patch

diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index b2149e1342a8..e4bc0302967d 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -212,15 +212,16 @@  size_t strlcpy(char *dst, const char *src, size_t size)
 	size_t len;
 	char c;
 
-	for (len = 0;;) {
+	for (len = 0; len < size; len++) {
 		c = src[len];
-		if (len < size)
+		if (len < size - 1)
 			dst[len] = c;
+		if (len == size - 1)
+			dst[len] = '\0';
 		if (!c)
 			break;
-		len++;
 	}
-	return len;
+	return strlen(src);
 }
 
 static __attribute__((unused))