[RFC,v2,1/1] kernfs: replace deprecated strlcpy() with strscpy()

Message ID 20231122212008.11790-1-mirsad.todorovac@alu.unizg.hr
State New
Headers
Series [RFC,v2,1/1] kernfs: replace deprecated strlcpy() with strscpy() |

Commit Message

Mirsad Todorovac Nov. 22, 2023, 9:20 p.m. UTC
  From: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>

According to strlcpy() being officially deprecated and the encouragement
to remove the remaining occurrences, this came as the intriguing example.

In the kernfs_name_locked() the behaviour of truncating the kn->name is
preserved, for it only used in the module for printing in the log and
declared static. It is only called from pr_cont_kernfs_name() via kernfs_name()
and returned result is ignored.

It is avoided to go past the allocated page and cause the internal equivalent
of SEGFAULT in the unlikely case kn->name is not null-terminated, which I
believe was the idea behind replacing strlcpy() with strscpy().

kernfs_path_from_node_locked() has "(null)" which certainly cannot overrun,
and a carefully calculated len and truncated path elsewhere.

Fixes: 17627157cda13 ("kernfs: handle null pointers while printing node name and path")
Fixes: 3eef34ad7dc36 ("kernfs: implement kernfs_get_parent(), kernfs_name/path() and friends")
Fixes: 9f6df573a4041 ("kernfs: Add API to generate relative kernfs path")
Fixes: 3abb1d90f5d93 ("kernfs: make kernfs_path*() behave in the style of strlcpy()")
Fixes: e56ed358afd81 ("kernfs: make kernfs_walk_ns() use kernfs_pr_cont_buf[]")
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Aditya Kali <adityakali@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-kernel@vger.kernel.org
Link: https://lwn.net/ml/ksummit-discuss/20231019054642.GF14346@lst.de/#t
Link: https://lwn.net/Articles/659214/
Signed-off-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
---
v2:
 Fixed a Cc: address bug.
 Fixing the patch according to the new definition of strscpy() in
    LWN article https://lwn.net/Articles/659214/ that makes strscpy_truncate()
    obsoleted.

 fs/kernfs/dir.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)
  

Comments

Tejun Heo Nov. 22, 2023, 9:31 p.m. UTC | #1
Hello,

On Wed, Nov 22, 2023 at 10:20:10PM +0100, Mirsad Goran Todorovac wrote:
> From: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
> 
> According to strlcpy() being officially deprecated and the encouragement
> to remove the remaining occurrences, this came as the intriguing example.
> 
> In the kernfs_name_locked() the behaviour of truncating the kn->name is
> preserved, for it only used in the module for printing in the log and
> declared static. It is only called from pr_cont_kernfs_name() via kernfs_name()
> and returned result is ignored.
> 
> It is avoided to go past the allocated page and cause the internal equivalent
> of SEGFAULT in the unlikely case kn->name is not null-terminated, which I
> believe was the idea behind replacing strlcpy() with strscpy().

I don't follow this line of thinking. Yeah, if the input arguments are
wrong, it can malfunction. strscpy() is gonna page fault too if the input
string crosses into an unmapped page before the destination buffer is
filled, right? But it'd be a stretch to claim that that's a problem with the
function. Maybe I'm missing something but I'm having a hard time seeing the
value in these conversions.

> kernfs_path_from_node_locked() has "(null)" which certainly cannot overrun,
> and a carefully calculated len and truncated path elsewhere.

Some of the functions you're modifying were returning the full length of the
input string before and it isn't clear from the patch or description whether
the conversion is safe. It'd help if you can elaborate more on why the
conversions are safe and how you verified them.

Thanks.
  
Mirsad Todorovac Nov. 22, 2023, 11:37 p.m. UTC | #2
Hi Mr. Tejun Heo,

On 11/22/23 22:31, Tejun Heo wrote:
> Hello,
> 
> On Wed, Nov 22, 2023 at 10:20:10PM +0100, Mirsad Goran Todorovac wrote:
>> From: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
>>
>> According to strlcpy() being officially deprecated and the encouragement
>> to remove the remaining occurrences, this came as the intriguing example.
>>
>> In the kernfs_name_locked() the behaviour of truncating the kn->name is
>> preserved, for it only used in the module for printing in the log and
>> declared static. It is only called from pr_cont_kernfs_name() via kernfs_name()
>> and returned result is ignored.
>>
>> It is avoided to go past the allocated page and cause the internal equivalent
>> of SEGFAULT in the unlikely case kn->name is not null-terminated, which I
>> believe was the idea behind replacing strlcpy() with strscpy().
> 
> I don't follow this line of thinking. Yeah, if the input arguments are
> wrong, it can malfunction. strscpy() is gonna page fault too if the input
> string crosses into an unmapped page before the destination buffer is
> filled, right? But it'd be a stretch to claim that that's a problem with the
> function. Maybe I'm missing something but I'm having a hard time seeing the
> value in these conversions.

INTRODUCTION

I will try to explain, it is not complicated, but we have to cover all cases
of use of kernfs_name_locked(), kernfs_name() and cgroup_name() which uses it.

We are lucky that there are only a couple of uses throughout kernel
(verified with grep and Bootlin).

But let's start from the beginning.

We have a semantical gap where strlcpy() returned the rather useless
value of strlen(src) on buffer overrun which was tried to be avoided at all
cost, for it can page fault into unmapped region.

When we use that value, we are in trouble when strlen(src) > buflen,
and we return in effect strlen(src). Calculating it is not only useless, but
possibly can cause page fault in the case src is not null-terminated.

(This is not my idea, I've read it on LWN, but I cannot find the reference
right now, I may be able to provide it later.)

It is also inefficient to scan all characters after the "count"
bytes until the end of src just to be able to say that the length overruns
the buffer, when it is obvious after the first character exceeding "count".

But we are happy that no part of kernel uses this value ATM, so
the function kernel_name_locked() could as well be void.

I will try to elaborate this into greater detail, and I tried to do it
in the full but shortest possible.

===

Allegedly, by the article here: [1] https://lwn.net/Articles/659214/

strscpy() now (or since 2015) copies string up to "count" bytes and
truncates it.

And from the source of strscpy():

[2] https://elixir.bootlin.com/linux/latest/source/lib/string.c#L122

	size_t max = count;

	while (max >= sizeof(unsigned long)) {
		unsigned long c, data;

		c = read_word_at_a_time(src+res);
		if (has_zero(c, &data, &constants)) {
			data = prep_zero_mask(c, data, &constants);
			data = create_zero_mask(data);
			*(unsigned long *)(dest+res) = c & zero_bytemask(data);
			return res + find_zero(data);
		}
		*(unsigned long *)(dest+res) = c;
		res += sizeof(unsigned long);
		count -= sizeof(unsigned long);
		max -= sizeof(unsigned long);
	}

	while (count) {
		char c;

		c = src[res];
		dest[res] = c;
		if (!c)
			return res;
		res++;
		count--;
	}

it is visible that strscpy() will not go ahead more than "count" bytes into
a theoretically unterminated string, certainly not more than the sizeof(unsigned long).

This should dramatically decrease chances of hitting a page boundary and page
fault if the string is erroneously or maliciously not null-terminated.

>> kernfs_path_from_node_locked() has "(null)" which certainly cannot overrun,
>> and a carefully calculated len and truncated path elsewhere.
> 
> Some of the functions you're modifying were returning the full length of the
> input string before and it isn't clear from the patch or description whether
> the conversion is safe. It'd help if you can elaborate more on why the
> conversions are safe and how you verified them.

Well, it is not so complicated, once you get into the Source:

The only place where the return value is modified is here:

   54 static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
   55 {
   56         size_t len;
   57
   58         if (!kn)
   59                 return strscpy(buf, "(null)", buflen);
   60
   61         len = strscpy(buf, kn->parent ? kn->name : "/", buflen);
   62
   63         if (unlikely(len == -E2BIG)) {
   64                 return buflen - 1;
   65         } else
   66                 return len;
   67 }

Just in this mail Linus has declined the possibility of using any variant of
strlen() to establish the length of kn->name:

[3] https://lore.kernel.org/lkml/CAHk-=wg7pwWiF4eWVTfFYkfAk_5YDHkmkgZ04cgXkNUO_9pR3A@mail.gmail.com/

kernfs_name_locked() is called from this function:

  201 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
  202 {
  203         unsigned long flags;
  204         int ret;
  205
  206         read_lock_irqsave(&kernfs_rename_lock, flags);
  207         ret = kernfs_name_locked(kn, buf, buflen);
  208         read_unlock_irqrestore(&kernfs_rename_lock, flags);
  209         return ret;
  210 }

The return value is passed to the caller. The kernfs_name() is called only here:

  247 void pr_cont_kernfs_name(struct kernfs_node *kn)
  248 {
  249         unsigned long flags;
  250
  251         spin_lock_irqsave(&kernfs_pr_cont_lock, flags);
  252
  253         kernfs_name(kn, kernfs_pr_cont_buf, sizeof(kernfs_pr_cont_buf));
  254         pr_cont("%s", kernfs_pr_cont_buf);
  255
  256         spin_unlock_irqrestore(&kernfs_pr_cont_lock, flags);
  257 }

and the return value is ignored.

Search shows that kernfs is defined and used here:

Defined in 1 files as a prototype:

     include/linux/kernfs.h, line 394 (as a prototype)

Defined in 2 files as a function:

     fs/kernfs/dir.c, line 194 (as a function)
     include/linux/kernfs.h, line 472 (as a function)

Documented in 1 files:

     fs/kernfs/dir.c, line 178

Referenced in 2 files:

     fs/kernfs/dir.c, line 246
     include/linux/cgroup.h, line 596

Let's go through each case:

A search shows that

user@host:~/linux/kernel/torvalds4 $ grep -n -E 'kernfs_name[(]' --include '*.c' --include '*.h' -r .
./fs/kernfs/dir.c:201:int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
./fs/kernfs/dir.c:247:void pr_cont_kernfs_name(struct kernfs_node *kn)
./fs/kernfs/dir.c:253:	kernfs_name(kn, kernfs_pr_cont_buf, sizeof(kernfs_pr_cont_buf));
./include/linux/cgroup.h:594:	return kernfs_name(cgrp->kn, buf, buflen);
./include/linux/cgroup.h:604:	pr_cont_kernfs_name(cgrp->kn);
./include/linux/kernfs.h:395:int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
./include/linux/kernfs.h:398:void pr_cont_kernfs_name(struct kernfs_node *kn);
./include/linux/kernfs.h:473:static inline int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
./include/linux/kernfs.h:481:static inline void pr_cont_kernfs_name(struct kernfs_node *kn) { }
./debian/linux-headers/usr/src/linux-headers-6.7.0-rc2-rtl-v0.2-nokcsan-00027-g9b71d857b080-dirty/include/linux/cgroup.h:594:	return kernfs_name(cgrp->kn, buf, buflen);
./debian/linux-headers/usr/src/linux-headers-6.7.0-rc2-rtl-v0.2-nokcsan-00027-g9b71d857b080-dirty/include/linux/cgroup.h:604:	pr_cont_kernfs_name(cgrp->kn);
./debian/linux-headers/usr/src/linux-headers-6.7.0-rc2-rtl-v0.2-nokcsan-00027-g9b71d857b080-dirty/include/linux/kernfs.h:395:int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);
./debian/linux-headers/usr/src/linux-headers-6.7.0-rc2-rtl-v0.2-nokcsan-00027-g9b71d857b080-dirty/include/linux/kernfs.h:398:void pr_cont_kernfs_name(struct kernfs_node *kn);
./debian/linux-headers/usr/src/linux-headers-6.7.0-rc2-rtl-v0.2-nokcsan-00027-g9b71d857b080-dirty/include/linux/kernfs.h:473:static inline int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
./debian/linux-headers/usr/src/linux-headers-6.7.0-rc2-rtl-v0.2-nokcsan-00027-g9b71d857b080-dirty/include/linux/kernfs.h:481:static inline void pr_cont_kernfs_name(struct kernfs_node *kn) { }

In include/linux/kernfs.h:395: kernfs_name() is defined as:

395 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen);

In include/linux/kernfs.h:473: kernfs_name() is defined as:

472
473 static inline int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
474 { return -ENOSYS; }
475

elsewhere it is used in include/linux/cgroup.h:594:

592 static inline int cgroup_name(struct cgroup *cgrp, char *buf, size_t buflen)
593 {
594         return kernfs_name(cgrp->kn, buf, buflen);
595 }

Seek cgroup_name() in the Source:

user@dhost:~/linux/kernel/torvalds4$ grep -n -E 'cgroup_name[(]' --include '*.c' --include '*.h' -r .
./include/linux/cgroup.h:592:static inline int cgroup_name(struct cgroup *cgrp, char *buf, size_t buflen)
./include/linux/cgroup.h:602:static inline void pr_cont_cgroup_name(struct cgroup *cgrp)
./kernel/cgroup/debug.c:103:		cgroup_name(c, name_buf, NAME_MAX + 1);
./kernel/cgroup/cpuset.c:4237:		pr_cont_cgroup_name(cs->css.cgroup);
./kernel/cgroup/cpuset.c:4878:	pr_cont_cgroup_name(cgrp);
./mm/page_owner.c:384:	cgroup_name(memcg->css.cgroup, name, sizeof(name));
./tools/perf/util/cgroup.c:216:static int add_cgroup_name(const char *fpath, const struct stat *sb __maybe_unused,
./tools/perf/util/cgroup.c:235:static int check_and_add_cgroup_name(const char *fpath)
./tools/perf/util/cgroup.c:245:	return add_cgroup_name(fpath, NULL, FTW_D, NULL);
./tools/perf/util/cgroup.c:278:			ret = check_and_add_cgroup_name(s);
./tools/perf/util/cgroup.c:283:			if (check_and_add_cgroup_name("/") < 0)
./debian/linux-headers/usr/src/linux-headers-6.7.0-rc2-rtl-v0.2-nokcsan-00027-g9b71d857b080-dirty/include/linux/cgroup.h:592:static inline int cgroup_name(struct cgroup *cgrp, char *buf, size_t buflen)
./debian/linux-headers/usr/src/linux-headers-6.7.0-rc2-rtl-v0.2-nokcsan-00027-g9b71d857b080-dirty/include/linux/cgroup.h:602:static inline void pr_cont_cgroup_name(struct cgroup *cgrp)
user@host:~/linux/kernel/torvalds4$

Here in the line 103 the return value of cgroup_name() is ignored:

kernel/cgroup/debug.c:
  87 static int current_css_set_cg_links_read(struct seq_file *seq, void *v)
  88 {
  89         struct cgrp_cset_link *link;
  90         struct css_set *cset;
  91         char *name_buf;
  92
  93         name_buf = kmalloc(NAME_MAX + 1, GFP_KERNEL);
  94         if (!name_buf)
  95                 return -ENOMEM;
  96
  97         spin_lock_irq(&css_set_lock);
  98         rcu_read_lock();
  99         cset = task_css_set(current);
100         list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
101                 struct cgroup *c = link->cgrp;
102
103                 cgroup_name(c, name_buf, NAME_MAX + 1);
104                 seq_printf(seq, "Root %d group %s\n",
105                            c->root->hierarchy_id, name_buf);
106         }
107         rcu_read_unlock();
108         spin_unlock_irq(&css_set_lock);
109         kfree(name_buf);
110         return 0;
111 }

mm/page_owner.c:384:

	cgroup_name(memcg->css.cgroup, name, sizeof(name));

Referenced in 8 files:

     kernel/cgroup/debug.c, line 103
     mm/page_owner.c, line 384

All the other references of cgroup_name are struct or struct members:

     tools/perf/util/cgroup.c
         line 211
         line 229
         line 242
         line 245
         line 255
         line 296
         line 339
         line 415
     tools/tracing/rtla/src/osnoise_hist.c
         line 571
         line 574
         line 866
     tools/tracing/rtla/src/osnoise_top.c
         line 415
         line 418
         line 699
     tools/tracing/rtla/src/timerlat_hist.c
         line 645
         line 648
         line 986
         line 1061
         line 1061
     tools/tracing/rtla/src/timerlat_top.c
         line 479
         line 482
         line 796
         line 880
         line 880
     tools/tracing/rtla/src/timerlat_u.c
         line 66
         line 67

And this is the only place where cgroup_name() is invoked.

Second place where the strlcpy() -> strscpy() is replaced is here:

  141 static int kernfs_path_from_node_locked(struct kernfs_node *kn_to,
  142                                         struct kernfs_node *kn_from,
  143                                         char *buf, size_t buflen)
  144 {
  145         struct kernfs_node *kn, *common;
  146         const char parent_str[] = "/..";
  147         size_t depth_from, depth_to, len = 0;
  148         int i, j;
  149
  150         if (!kn_to)
  151                 return strscpy(buf, "(null)", buflen);
  152
  153         if (!kn_from)
  154                 kn_from = kernfs_root(kn_to)->kn;
  155
  156         if (kn_from == kn_to)
  157                 return strscpy(buf, "/", buflen);
  158
  159         common = kernfs_common_ancestor(kn_from, kn_to);
  160         if (WARN_ON(!common))
  161                 return -EINVAL;
  162
  163         depth_to = kernfs_depth(common, kn_to);
  164         depth_from = kernfs_depth(common, kn_from);
  165
  166         buf[0] = '\0';
  167
  168         for (i = 0; i < depth_from; i++)
  169                 len += strscpy(buf + len, parent_str,
  170                                len < buflen ? buflen - len : 0);
  171
  172         /* Calculate how many bytes we need for the rest */
  173         for (i = depth_to - 1; i >= 0; i--) {
  174                 for (kn = kn_to, j = 0; j < i; j++)
  175                         kn = kn->parent;
  176                 len += strscpy(buf + len, "/",
  177                                len < buflen ? buflen - len : 0);
  178                 len += strscpy(buf + len, kn->name,
  179                                len < buflen ? buflen - len : 0);
  180         }
  181
  182         return len;
  183 }

This is safe, as in line 151 "(null)" cannot overrun buf of buflen which
is sizeof(kernfs_pr_cont_buf) which is PATH_MAX:

   29 static char kernfs_pr_cont_buf[PATH_MAX];

Line 157 cannot possibly overrun buf, and it is equivalent to strlcpy()
in all cases where there is no possibility of overrun.

Lines 169, 176 and 178 carefully check if (len < buflen) and copy buflen - len
or zero bytes.

This is safe, for we see that in case of count == 0 strscpy() just like
strlcpy() turns to a virtual NOP.

Finally here:

  856 static struct kernfs_node *kernfs_walk_ns(struct kernfs_node *parent,
  857                                           const unsigned char *path,
  858                                           const void *ns)
  859 {
  860         size_t len;
  861         char *p, *name;
  862
  863         lockdep_assert_held_read(&kernfs_root(parent)->kernfs_rwsem);
  864
  865         spin_lock_irq(&kernfs_pr_cont_lock);
  866
  867         len = strscpy(kernfs_pr_cont_buf, path, sizeof(kernfs_pr_cont_buf));
  868
  869         if (unlikely(len == -E2BIG)) {
  870                 spin_unlock_irq(&kernfs_pr_cont_lock);
  871                 return NULL;
  872         }
  873
  874         p = kernfs_pr_cont_buf;
  875
  876         while ((name = strsep(&p, "/")) && parent) {
  877                 if (*name == '\0')
  878                         continue;
  879                 parent = kernfs_find_ns(parent, name, ns);
  880         }
  881
  882         spin_unlock_irq(&kernfs_pr_cont_lock);
  883
  884         return parent;
  885 }

the previous test was

	if (len >= sizeof(kernfs_pr_cont_buf))

which is safely replaced by

	if (len == -E2BIG)

For both will happen if and only if the string overruns the buffer.

CONCLUSION

The only place where the src string can overrun the buffer is here:

   54 static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
   55 {
   56         size_t len;
   57
   58         if (!kn)
   59                 return strscpy(buf, "(null)", buflen);
   60
   61         len = strscpy(buf, kn->parent ? kn->name : "/", buflen);
   62
   63         if (unlikely(len == -E2BIG)) {
   64                 return buflen - 1;
   65         } else
   66                 return len;
   67 }

and this function is only called from

  201 int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
  202 {
  203         unsigned long flags;
  204         int ret;
  205
  206         read_lock_irqsave(&kernfs_rename_lock, flags);
  207         ret = kernfs_name_locked(kn, buf, buflen);
  208         read_unlock_irqrestore(&kernfs_rename_lock, flags);
  209         return ret;
  210 }

and

592 static inline int cgroup_name(struct cgroup *cgrp, char *buf, size_t buflen)
593 {
594         return kernfs_name(cgrp->kn, buf, buflen);
595 }

Both kernfs_name() and cgroup_name() are only called while ignoring their return
value in:

  247 void pr_cont_kernfs_name(struct kernfs_node *kn)
  248 {
  249         unsigned long flags;
  250
  251         spin_lock_irqsave(&kernfs_pr_cont_lock, flags);
  252
  253 →       kernfs_name(kn, kernfs_pr_cont_buf, sizeof(kernfs_pr_cont_buf));
  254         pr_cont("%s", kernfs_pr_cont_buf);
  255
  256         spin_unlock_irqrestore(&kernfs_pr_cont_lock, flags);
  257 }

and

kernel/cgroup/debug.c:
  87 static int current_css_set_cg_links_read(struct seq_file *seq, void *v)
  88 {
  89         struct cgrp_cset_link *link;
  90         struct css_set *cset;
  91         char *name_buf;
  92
  93         name_buf = kmalloc(NAME_MAX + 1, GFP_KERNEL);
  94         if (!name_buf)
  95                 return -ENOMEM;
  96
  97         spin_lock_irq(&css_set_lock);
  98         rcu_read_lock();
  99         cset = task_css_set(current);
100         list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
101                 struct cgroup *c = link->cgrp;
102
103 →               cgroup_name(c, name_buf, NAME_MAX + 1);
104                 seq_printf(seq, "Root %d group %s\n",
105                            c->root->hierarchy_id, name_buf);
106         }
107         rcu_read_unlock();
108         spin_unlock_irq(&css_set_lock);
109         kfree(name_buf);
110         return 0;
111 }

This indicates that the change of semanthics from returning unfortunate

  59 return strlcpy(buf, kn->parent ? kn->name : "/", buflen);

and in effect unwanted strlen(kn->name) does not affect the rest of kernel.
Which is more luck than expected.

Linus in [3] says [quote]:

	The reason to use strscpy() is to *avoid* doing the strlen() on the
	source, and limit things to the limited size.

	If you need to do the strlen(), then use strlcpy(). It's a broken
	interface, but creating this kind of horror wrapper that does the same
	thing as strlcpy() is worse than just using the regular version.

	So the strscpy() conversion should *only* be done if the caller
	doesn't care about the difference in return values (or done *together*
	with changing the caller to use the nicer strscpy() return value).

We are in this case lucky because none of the final callers in the stack trace
cares about the return value of kernfs_name_locked(), or the value in other
functions isn't changed at all, because buffer overrun is carefully avoided
by calculations ensuring len < buflen.

I am just running the kernel with this patch applied, but I don't know of the
border case selftests for the kernfs_name() and cgroup_name() overrun.

REFERENCES:

[1] https://lwn.net/Articles/659214/

[2] https://elixir.bootlin.com/linux/latest/source/lib/string.c#L122

[3] https://lore.kernel.org/lkml/CAHk-=wg7pwWiF4eWVTfFYkfAk_5YDHkmkgZ04cgXkNUO_9pR3A@mail.gmail.com/

Hope this helps.

Best regards,
Mirsad Todorovac


> Thanks.
>
  
Tejun Heo Nov. 25, 2023, 7:35 p.m. UTC | #3
Hello,

On Thu, Nov 23, 2023 at 12:37:03AM +0100, Mirsad Todorovac wrote:
...
>  141 static int kernfs_path_from_node_locked(struct kernfs_node *kn_to,
>  142                                         struct kernfs_node *kn_from,
>  143                                         char *buf, size_t buflen)
...
>  172         /* Calculate how many bytes we need for the rest */
>  173         for (i = depth_to - 1; i >= 0; i--) {
>  174                 for (kn = kn_to, j = 0; j < i; j++)
>  175                         kn = kn->parent;
>  176                 len += strscpy(buf + len, "/",
>  177                                len < buflen ? buflen - len : 0);
>  178                 len += strscpy(buf + len, kn->name,
>  179                                len < buflen ? buflen - len : 0);
>  180         }
>  181
>  182         return len;
>  183 }
...
> This is safe, for we see that in case of count == 0 strscpy() just like
> strlcpy() turns to a virtual NOP.

The conversion itself isn't dangerous but it changes the return value of the
function. The comment is not updated and the callers are still assuming that
the function returns full length when the buffer is too short. e.g. Take a
look at cgroup_show_path(). All those paths seem safe but the code is more
more confusing because the conversions are half-way. I'm not necessarily
against the conversion but the benefit to risk / churn ratio doesn't seem
too attractive. If you wanna push this through, please make the conversion
complete including the comments and the callers and include a short summary
of the changes and why they're safe in the commit message.

Thanks.
  
Mirsad Todorovac Nov. 25, 2023, 11:33 p.m. UTC | #4
Hello, Mr. Tejun Heo,

On 11/25/23 20:35, Tejun Heo wrote:
> Hello,
> 
> On Thu, Nov 23, 2023 at 12:37:03AM +0100, Mirsad Todorovac wrote:
> ...
>>   141 static int kernfs_path_from_node_locked(struct kernfs_node *kn_to,
>>   142                                         struct kernfs_node *kn_from,
>>   143                                         char *buf, size_t buflen)
> ...
>>   172         /* Calculate how many bytes we need for the rest */
>>   173         for (i = depth_to - 1; i >= 0; i--) {
>>   174                 for (kn = kn_to, j = 0; j < i; j++)
>>   175                         kn = kn->parent;
>>   176                 len += strscpy(buf + len, "/",
>>   177                                len < buflen ? buflen - len : 0);
>>   178                 len += strscpy(buf + len, kn->name,
>>   179                                len < buflen ? buflen - len : 0);
>>   180         }
>>   181
>>   182         return len;
>>   183 }
> ...
>> This is safe, for we see that in case of count == 0 strscpy() just like
>> strlcpy() turns to a virtual NOP.
> 
> The conversion itself isn't dangerous but it changes the return value of the
> function. The comment is not updated and the callers are still assuming that
> the function returns full length when the buffer is too short. e.g. Take a
> look at cgroup_show_path(). All those paths seem safe but the code is more
> more confusing because the conversions are half-way. I'm not necessarily
> against the conversion but the benefit to risk / churn ratio doesn't seem
> too attractive. If you wanna push this through, please make the conversion
> complete including the comments and the callers and include a short summary
> of the changes and why they're safe in the commit message.
> 
> Thanks.

Thank you kindly for your review of my lengthy analysis of the return value.

I apologise for another long email, but most of it is the code you already
understand or wrote, so it should be easy for you to read.

Apparently, what troubles you is:

1. Lack of comment that the return value of kernfs_name_locked() changed for
strlen(kn->node) >= buflen. This is noted and it is a legitimate concern.

     54 static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
     55 {
     56         size_t len;
     57
     58         if (!kn)
     59                 return strscpy(buf, "(null)", buflen);
     60
     61         len = strscpy(buf, kn->parent ? kn->name : "/", buflen);
     62
     63         if (unlikely(len == -E2BIG)) {
     64                 return buflen - 1;
     65         } else
     66                 return len;
     67 }

Returning strlen(kn->name) that is greater than buflen really does not make any
sense IMHO, as by the first character after buflen we already know we are in
overrun.

Maybe the comment should be like this:

/*
  * The function kernfs_name_locked() returns the name of kernfs_node *kn truncated
  * to buflen - 1 or a "/" if it is the root node.
  *
  * The resulting buffer buf is in any case zero-terminated, and the value returned
  * is the number of actual characters copied (excluding the termination NUL byte).
  */

Would this be clear enough in your opinion?

2. The example with cgroup_show_path() that is of your concern. I have carefully
examined it and there is nothing to worry about.

So, the stacktrace goes like this:

kernel/cgroup/cgroup.c:1879: cgroup_show_path(sf, kf_node, kf_root)
	1893: len = kernfs_path_from_node(kf_node, ns_cgroup->kn, buf, PATH_MAX);

fs/kernfs/dir.c:228: kernfs_path_from_node(to, from, buf, buflen)
	235:  ret = kernfs_path_from_node_locked(to, from, buf, buflen);

fs/kernfs/dir.c:141: kernfs_path_from_node_locked(kn_to, kn_from, buf, buflen)

Let us analyse the stacktrace:

kernel/cgroup/cgroup.c:
=======================
   1879 int cgroup_show_path(struct seq_file *sf, struct kernfs_node *kf_node,
   1880                      struct kernfs_root *kf_root)
   1881 {
   1882         int len = 0;
   1883         char *buf = NULL;
   1884         struct cgroup_root *kf_cgroot = cgroup_root_from_kf(kf_root);
   1885         struct cgroup *ns_cgroup;
   1886
   1887         buf = kmalloc(PATH_MAX, GFP_KERNEL);
   1888         if (!buf)
   1889                 return -ENOMEM;
   1890
   1891         spin_lock_irq(&css_set_lock);
   1892         ns_cgroup = current_cgns_cgroup_from_root(kf_cgroot);
→ 1893         len = kernfs_path_from_node(kf_node, ns_cgroup->kn, buf, PATH_MAX);
   1894         spin_unlock_irq(&css_set_lock);
   1895
   1896         if (len >= PATH_MAX)
   1897                 len = -ERANGE;
   1898         else if (len > 0) {
   1899                 seq_escape(sf, buf, " \t\n\\");
   1900                 len = 0;
   1901         }
   1902         kfree(buf);
   1903         return len;
   1904 }

fs/kernfs/dir.c:
================
    228 int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
    229                           char *buf, size_t buflen)
    230 {
    231         unsigned long flags;
    232         int ret;
    233
    234         read_lock_irqsave(&kernfs_rename_lock, flags);
→  235         ret = kernfs_path_from_node_locked(to, from, buf, buflen);
    236         read_unlock_irqrestore(&kernfs_rename_lock, flags);
    237         return ret;
    238 }
    239 EXPORT_SYMBOL_GPL(kernfs_path_from_node);

The problem seems to be this function:
======================================

    141 static int kernfs_path_from_node_locked(struct kernfs_node *kn_to,
    142                                         struct kernfs_node *kn_from,
    143                                         char *buf, size_t buflen)
    144 {
    145         struct kernfs_node *kn, *common;
    146         const char parent_str[] = "/..";
    147         size_t depth_from, depth_to, len = 0;
    148         int i, j;
    149
    150         if (!kn_to)
  → 151                 return strscpy(buf, "(null)", buflen);
    152
    153         if (!kn_from)
    154                 kn_from = kernfs_root(kn_to)->kn;
    155
    156         if (kn_from == kn_to)
  → 157                 return strscpy(buf, "/", buflen);
    158
    159         common = kernfs_common_ancestor(kn_from, kn_to);
    160         if (WARN_ON(!common))
    161                 return -EINVAL;
    162
    163         depth_to = kernfs_depth(common, kn_to);
    164         depth_from = kernfs_depth(common, kn_from);
    165
    166         buf[0] = '\0';
    167
    168         for (i = 0; i < depth_from; i++)
  → 169                 len += strscpy(buf + len, parent_str,
    170                                len < buflen ? buflen - len : 0);
    171
    172         /* Calculate how many bytes we need for the rest */
    173         for (i = depth_to - 1; i >= 0; i--) {
    174                 for (kn = kn_to, j = 0; j < i; j++)
    175                         kn = kn->parent;
  → 176                 len += strscpy(buf + len, "/",
    177                                len < buflen ? buflen - len : 0);
  → 178                 len += strscpy(buf + len, kn->name,
    179                                len < buflen ? buflen - len : 0);
    180         }
    181
    182         return len;
    183 }

We have five (5) uses of strlcpy() -> strscpy() conversion.

Line 151: strscpy(buf, "(null)", buflen);
		cannot overrun and immediately returns - return value just as if strlcpy() was used.

Line 157: strscpy(buf, "/", buflen);
		cannot overrun and immediately returns - return value just as if strlcpy() was used.

Line 169: len += strscpy(buf + len, parent_str, len < buflen ? buflen - len : 0);

Line 176: len += strscpy(buf + len, "/",        len < buflen ? buflen - len : 0);

Line 178: len += strscpy(buf + len, kn->name,   len < buflen ? buflen - len : 0);

As you will clearly see, all of those cases are already carefully and cleverly protected already against
overrun. This was apparently done by you and Mr. Aditya Kali. Nothing changes here with replacement
one-for-one strlcpy() -> strscpy(), as there is never a buffer overrun.

In case of overrun, the construct (len < buflen ? buflen - len : 0) will simply give a "count" of
zero, and this will cause an early exit from strscpy().

So, my conclusion is that for callers of kernfs_path_from_node_locked() and kernfs_path_from_node()
nothing changes in semantics or the return value.

For the kernfs_name_locked(), we get a truncated value of kn->name in buf as before, only return
value changes for this special case, but I have explained and I think proven that it is used
nowhere in code.

For the function:

fs/kernfs/dir.c
===============
    856 static struct kernfs_node *kernfs_walk_ns(struct kernfs_node *parent,
    857                                           const unsigned char *path,
    858                                           const void *ns)
    859 {
    860         size_t len;
    861         char *p, *name;
    862
    863         lockdep_assert_held_read(&kernfs_root(parent)->kernfs_rwsem);
    864
    865         spin_lock_irq(&kernfs_pr_cont_lock);
    866
    867         len = strscpy(kernfs_pr_cont_buf, path, sizeof(kernfs_pr_cont_buf));
    868
    869         if (unlikely(len == -E2BIG)) {
    870                 spin_unlock_irq(&kernfs_pr_cont_lock);
    871                 return NULL;
    872         }
    873
    874         p = kernfs_pr_cont_buf;
    875
    876         while ((name = strsep(&p, "/")) && parent) {
    877                 if (*name == '\0')
    878                         continue;
    879                 parent = kernfs_find_ns(parent, name, ns);
    880         }
    881
    882         spin_unlock_irq(&kernfs_pr_cont_lock);
    883
    884         return parent;
    885 }

Nothing changes. Neither the truncation of buffer and neither the return value (NULL or parent).

The previous check was:

    860         len = strlcpy(kernfs_pr_cont_buf, path, sizeof(kernfs_pr_cont_buf));
    861
    862         if (len >= sizeof(kernfs_pr_cont_buf)) {
    863                 spin_unlock_irq(&kernfs_pr_cont_lock);
    864                 return NULL;
    865         }

The string is truncated on overrun just as with strlcpy(), and the test

	if (len == -E2BIG)

is equivalent to

	if (len >= sizeof(kernfs_pr_cont_buf))

NOTE:

The forementioned situation in cgroup_show_path():

→ 1893         len = kernfs_path_from_node(kf_node, ns_cgroup->kn, buf, PATH_MAX);
   1894         spin_unlock_irq(&css_set_lock);
   1895
   1896         if (len >= PATH_MAX)
   1897                 len = -ERANGE;
   1898         else if (len > 0) {
   1899                 seq_escape(sf, buf, " \t\n\\");
   1900                 len = 0;
   1901         }

len > PATH_MAX can never happen because:

kernfs_path_from_node(kf_node, ns_cgroup->kn, buf, PATH_MAX);

calls

    234         read_lock_irqsave(&kernfs_rename_lock, flags);
→  235         ret = kernfs_path_from_node_locked(to, from, buf, buflen);
    236         read_unlock_irqrestore(&kernfs_rename_lock, flags);
    237         return ret;

which is evaluated to:

    234         read_lock_irqsave(&kernfs_rename_lock, flags);
→  235         ret = kernfs_path_from_node_locked(to, from, buf, PATH_MAX);
    236         read_unlock_irqrestore(&kernfs_rename_lock, flags);
    237         return ret;

and then to:

    141 static int kernfs_path_from_node_locked(struct kernfs_node *kn_to,
    142                                         struct kernfs_node *kn_from,
    143                                         char *buf, size_t buflen)
    144 {
    145         struct kernfs_node *kn, *common;
    146         const char parent_str[] = "/..";
    147         size_t depth_from, depth_to, len = 0;
    148         int i, j;
    149
    150         if (!kn_to)
  → 151                 return strscpy(buf, "(null)", PATH_MAX);
    152
    153         if (!kn_from)
    154                 kn_from = kernfs_root(kn_to)->kn;
    155
    156         if (kn_from == kn_to)
  → 157                 return strscpy(buf, "/", PATH_MAX);
    158
    159         common = kernfs_common_ancestor(kn_from, kn_to);
    160         if (WARN_ON(!common))
    161                 return -EINVAL;
    162
    163         depth_to = kernfs_depth(common, kn_to);
    164         depth_from = kernfs_depth(common, kn_from);
    165
    166         buf[0] = '\0';
    167
    168         for (i = 0; i < depth_from; i++)
  → 169                 len += strscpy(buf + len, parent_str,
    170                                len < PATH_MAX ? PATH_MAX - len : 0);
    171
    172         /* Calculate how many bytes we need for the rest */
    173         for (i = depth_to - 1; i >= 0; i--) {
    174                 for (kn = kn_to, j = 0; j < i; j++)
    175                         kn = kn->parent;
  → 176                 len += strscpy(buf + len, "/",
    177                                len < PATH_MAX ? PATH_MAX - len : 0);
  → 178                 len += strscpy(buf + len, kn->name,
    179                                len < PATH_MAX ? PATH_MAX - len : 0);
    180         }
    181
    182         return len;
    183 }

len will never exceed PATH_MAX, because it is increased by (PATH_MAX - len) or 0 (zero),
as strscpy will copy either PATH_MAX - len or 0 (zero) characters.

The second use of kernfs_path_from_node() appears safe:

   2350 int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen,
   2351                           struct cgroup_namespace *ns)
   2352 {
   2353         struct cgroup *root = cset_cgroup_from_root(ns->root_cset, cgrp->root);
   2354
→ 2355         return kernfs_path_from_node(cgrp->kn, root->kn, buf, buflen);
   2356 }

According to Bootlin, this covers all uses of the function kernfs_path_from_node():

Defined in 1 files as a prototype:

     include/linux/kernfs.h, line 396 (as a prototype)

Defined in 2 files as a function:

     fs/kernfs/dir.c, line 221 (as a function)
     include/linux/kernfs.h, line 476 (as a function)

Documented in 1 files:

     fs/kernfs/dir.c, line 205

Referenced in 3 files:

     fs/kernfs/dir.c
         line 232
         line 265
     include/linux/kernfs.h, line 596
     kernel/cgroup/cgroup.c
         line 1893
         line 2355

Hope this helps.

Best regards,
Mirsad Todorovac
  

Patch

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 8b2bd65d70e7..a6971a6756bc 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -53,10 +53,17 @@  static bool kernfs_lockdep(struct kernfs_node *kn)
 
 static int kernfs_name_locked(struct kernfs_node *kn, char *buf, size_t buflen)
 {
+	size_t len;
+
 	if (!kn)
-		return strlcpy(buf, "(null)", buflen);
+		return strscpy(buf, "(null)", buflen);
+
+	len = strscpy(buf, kn->parent ? kn->name : "/", buflen);
 
-	return strlcpy(buf, kn->parent ? kn->name : "/", buflen);
+	if (unlikely(len == -E2BIG)) {
+		return buflen - 1;
+	} else
+		return len;
 }
 
 /* kernfs_node_depth - compute depth from @from to @to */
@@ -141,13 +148,13 @@  static int kernfs_path_from_node_locked(struct kernfs_node *kn_to,
 	int i, j;
 
 	if (!kn_to)
-		return strlcpy(buf, "(null)", buflen);
+		return strscpy(buf, "(null)", buflen);
 
 	if (!kn_from)
 		kn_from = kernfs_root(kn_to)->kn;
 
 	if (kn_from == kn_to)
-		return strlcpy(buf, "/", buflen);
+		return strscpy(buf, "/", buflen);
 
 	common = kernfs_common_ancestor(kn_from, kn_to);
 	if (WARN_ON(!common))
@@ -159,16 +166,16 @@  static int kernfs_path_from_node_locked(struct kernfs_node *kn_to,
 	buf[0] = '\0';
 
 	for (i = 0; i < depth_from; i++)
-		len += strlcpy(buf + len, parent_str,
+		len += strscpy(buf + len, parent_str,
 			       len < buflen ? buflen - len : 0);
 
 	/* Calculate how many bytes we need for the rest */
 	for (i = depth_to - 1; i >= 0; i--) {
 		for (kn = kn_to, j = 0; j < i; j++)
 			kn = kn->parent;
-		len += strlcpy(buf + len, "/",
+		len += strscpy(buf + len, "/",
 			       len < buflen ? buflen - len : 0);
-		len += strlcpy(buf + len, kn->name,
+		len += strscpy(buf + len, kn->name,
 			       len < buflen ? buflen - len : 0);
 	}
 
@@ -857,9 +864,9 @@  static struct kernfs_node *kernfs_walk_ns(struct kernfs_node *parent,
 
 	spin_lock_irq(&kernfs_pr_cont_lock);
 
-	len = strlcpy(kernfs_pr_cont_buf, path, sizeof(kernfs_pr_cont_buf));
+	len = strscpy(kernfs_pr_cont_buf, path, sizeof(kernfs_pr_cont_buf));
 
-	if (len >= sizeof(kernfs_pr_cont_buf)) {
+	if (unlikely(len == -E2BIG)) {
 		spin_unlock_irq(&kernfs_pr_cont_lock);
 		return NULL;
 	}