Message ID | 20231018-strncpy-drivers-nvme-host-fabrics-c-v1-1-b6677df40a35@google.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:2010:b0:403:3b70:6f57 with SMTP id fe16csp39347vqb; Wed, 18 Oct 2023 15:49:19 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHeATVErd8Qe1Bd9sOtBdAveExvoKBZJr8OqjZaRcYuS23FidPQpd01tFbkykHU9apF9Tju X-Received: by 2002:a05:6870:8a25:b0:1e9:d158:7e85 with SMTP id p37-20020a0568708a2500b001e9d1587e85mr814662oaq.30.1697669359309; Wed, 18 Oct 2023 15:49:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1697669359; cv=none; d=google.com; s=arc-20160816; b=Qpn1ia7bPG3ju10hqj8Ydu8DPe4hdYM0O6gbPiFkP8ZO7/c/Atyni2niJSKYZcWnuR mPzwXOhgmKCc2aV52wGN20ee7a2/x6jjkT3Nf0CXX+tBsGwr9z7kd0qxAIxzc6GD6/0O qlLPacbQSZzMveWRqVPF0/auZvgo766CkHZsMmO6HKV93QK3UUT5JO6L6rOwa+ySGLBV 5k+SFeC2RdD015IbcAcD+0omJQI2r99kHFnW+7omTXu2fspfD3hpRVze+XkaOgpYfYbb 9Qu11b/WALzoYXCpi8Yl7prI0FuP7BoqyMmgWUwfvzYvyYAeDCOazKQ7+tHXYbDEBRwv lCzg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:mime-version:date :dkim-signature; bh=2bSQ7qSQqMVz/tkx7lIyl6/3Dlp40FjkhjBtPKLDoWc=; fh=a+b54BReUAz2NfXDm3aZA3nT0UFIh1YGI3Fjir6e4m8=; b=F8su9wUg1BXeoXE9t2SAqEdtnZMmrWJydBjhP+Li1hchYICfhYdY0/XxjOkgSNMU3h xELNW6FXwgcGG2QWy6gE936bIkGXWK+MLVXItUQkdxYHaJZFOeuFoAtbUjr8WzodxuCs 3uuYGgADbZXUsGuBllAt2xQc5reKb6o9hr6iA6x2smGKdkjwo/Fh/uEq8iKJQt3TYFcF NffFKZPak+oYGvH21JVKeqMKexN7ZiZ9ZQpGYmzJEV11GGFbvvAcpkFqRxiSVLsfC+yQ Jf59ceHByAvClIaRO7pQzmJWImRybGE+WM9mrHGK1jJx2ZJqrHzLFlBkFCf3VuScC1DZ 3bZw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=gInKwQPE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from agentk.vger.email (agentk.vger.email. [23.128.96.32]) by mx.google.com with ESMTPS id t71-20020a63814a000000b005ad6babc019si3035943pgd.479.2023.10.18.15.49.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Oct 2023 15:49:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) client-ip=23.128.96.32; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=gInKwQPE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id 578B3811544D; Wed, 18 Oct 2023 15:49:15 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229966AbjJRWs4 (ORCPT <rfc822;zwp10758@gmail.com> + 24 others); Wed, 18 Oct 2023 18:48:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43510 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229897AbjJRWsw (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 18 Oct 2023 18:48:52 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0DA19114 for <linux-kernel@vger.kernel.org>; Wed, 18 Oct 2023 15:48:51 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-5a7b3ae01c0so112044127b3.3 for <linux-kernel@vger.kernel.org>; Wed, 18 Oct 2023 15:48:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1697669330; x=1698274130; darn=vger.kernel.org; h=cc:to:from:subject:message-id:mime-version:date:from:to:cc:subject :date:message-id:reply-to; bh=2bSQ7qSQqMVz/tkx7lIyl6/3Dlp40FjkhjBtPKLDoWc=; b=gInKwQPEan85wo+wHCJe1s91K8I/+OYBZigEUPoAf21eDXGWzAIBMH2J3nf5kqd3g7 yKn49VFOaPzGPlZzrXVG6vT788Ngics3kvSkIiIdYxwDB/yD31OSAaMN21jGybJnD89x /938DYxr0UcAmdPfeSykAYcgByOFQDNDlS4M4LtZ3yc2Y9BMQ0Ri2WGeklOYf+Tj/wJP RMGLphqBHbTrS0IMk3tBXJjKVCF99bJ+h+2RIaGmrEo4kwLdMhj8IGajfKquEcJj7PS+ WFikqZ1PXePbzZMY0bGcMDyhyp+XZ3T0V9TfzP6CE3uoqOO6RF/WhvEDPEfM1iMUTE93 SAJA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697669330; x=1698274130; h=cc:to:from:subject:message-id:mime-version:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=2bSQ7qSQqMVz/tkx7lIyl6/3Dlp40FjkhjBtPKLDoWc=; b=qfF6vOv7FZD0FIgMPkYdh1ncMvukR6zSUtOqXKWRt5qo8vxY5mwAO13fIed2RvXKky onIB8wn5SlPG1e5sJTb3Fa9TazLaARBQNVGfIrp5IlXB1KyU6C3pcQYyOExfIuadCGl0 Ya9rSGkb7E2axRc9PG3kEHCWMZbVBVjerrM1czBj6fDza70HHwycswNdRHZD0SBV/Vtp ol0bO86ohuNqOVQna3ypexN86Yz6v7fGQtLjtSI2UztVaKOQm9owcke4pP12WFK+Ap8A zgzxLLLcjKhiF+WFh67HA2Rz55iKlFkUKVzZtow2NJ8AZ13KJ7dlNlt59tbYcp1pl0UL yMsQ== X-Gm-Message-State: AOJu0Yzeluwwf2e3ARH+zmmI4TvV5dRz4ng06wscX+VmAYMxbcpe4Opb q81BU24rHUI+4yvzhiKRVkSKkGSAEFY+ZKxFqw== X-Received: from jstitt-linux1.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:23b5]) (user=justinstitt job=sendgmr) by 2002:a81:5217:0:b0:57a:118a:f31 with SMTP id g23-20020a815217000000b0057a118a0f31mr15075ywb.7.1697669330350; Wed, 18 Oct 2023 15:48:50 -0700 (PDT) Date: Wed, 18 Oct 2023 22:48:49 +0000 Mime-Version: 1.0 X-B4-Tracking: v=1; b=H4sIANBgMGUC/x3NMQ6DMAxA0asgz7VEDFRpr1IxhGDAQwOyUdQKc Xcixrf8f4CxChu8qwOUs5isqcA9KohLSDOjjMVANTWudh5t1xS3P44qmdUw5S/jstqOUxhUomH E1j2p84Goe3kopU15kt99+fTneQF0i0XsdQAAAA== X-Developer-Key: i=justinstitt@google.com; a=ed25519; pk=tC3hNkJQTpNX/gLKxTNQKDmiQl6QjBNCGKJINqAdJsE= X-Developer-Signature: v=1; a=ed25519-sha256; t=1697669329; l=2668; i=justinstitt@google.com; s=20230717; h=from:subject:message-id; bh=eTZl8RLmSTRi1g6HBFXiHkDVcGwMrh/QN3Blnb4tlGM=; b=NqNL716tMxkz9di1bgPcw2w6eC4c776nurW+wFygYEOGZYbyCCVOYleoxDFKohV9u1XrQ7RvN UxB6tgf9Ik5CsmgFtVwMYknNxVzD3Jah1eF23yTnA5oMSFhciN+z9AP X-Mailer: b4 0.12.3 Message-ID: <20231018-strncpy-drivers-nvme-host-fabrics-c-v1-1-b6677df40a35@google.com> Subject: [PATCH] nvme-fabrics: replace deprecated strncpy with strscpy From: Justin Stitt <justinstitt@google.com> To: Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me> Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org, Justin Stitt <justinstitt@google.com> Content-Type: text/plain; charset="utf-8" X-Spam-Status: No, score=-8.4 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Wed, 18 Oct 2023 15:49:15 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1780135346193291712 X-GMAIL-MSGID: 1780135346193291712 |
Series |
nvme-fabrics: replace deprecated strncpy with strscpy
|
|
Commit Message
Justin Stitt
Oct. 18, 2023, 10:48 p.m. UTC
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
We expect both data->subsysnqn and data->hostnqn to be NUL-terminated
based on their usage with format specifier ("%s"):
fabrics.c:
322: dev_err(ctrl->device,
323: "%s, subsysnqn \"%s\"\n",
324: inv_data, data->subsysnqn);
...
349: dev_err(ctrl->device,
350: "Connect for subsystem %s is not allowed, hostnqn: %s\n",
351: data->subsysnqn, data->hostnqn);
Moreover, there's no need to NUL-pad since `data` is zero-allocated
already in fabrics.c:
383: data = kzalloc(sizeof(*data), GFP_KERNEL);
... therefore any further NUL-padding is rendered useless.
Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.
I opted not to switch NVMF_NQN_SIZE to sizeof(data->xyz) because the
size is defined as:
| /* NQN names in commands fields specified one size */
| #define NVMF_NQN_FIELD_LEN 256
... while NVMF_NQN_SIZE is defined as:
| /* However the max length of a qualified name is another size */
| #define NVMF_NQN_SIZE 223
Since 223 seems pretty magic, I'm not going to touch it.
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
---
Note: build-tested only.
Found with: $ rg "strncpy\("
---
drivers/nvme/host/fabrics.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
---
base-commit: 58720809f52779dc0f08e53e54b014209d13eebb
change-id: 20231018-strncpy-drivers-nvme-host-fabrics-c-416258a22598
Best regards,
--
Justin Stitt <justinstitt@google.com>
Comments
On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > strncpy() is deprecated for use on NUL-terminated destination strings > [1] and as such we should prefer more robust and less ambiguous string > interfaces. If we want that we need to stop pretendening direct manipulation of nul-terminate strings is a good idea. I suspect the churn of replacing one helper with another, maybe slightly better, one probably introduces more bugs than it fixes. If we want to attack the issue for real we need to use something better. lib/seq_buf.c is a good start for a lot of simple cases that just append to strings including creating complex ones. Kent had a bunch of good ideas on how to improve it, but couldn't be convinced to contribute to it instead of duplicating the functionality which is a bit sad, but I think we need to switch to something like seq_buf that actually has a counted string instead of all this messing around with the null-terminated strings.
On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > strncpy() is deprecated for use on NUL-terminated destination strings > [1] and as such we should prefer more robust and less ambiguous string > interfaces. > > We expect both data->subsysnqn and data->hostnqn to be NUL-terminated > based on their usage with format specifier ("%s"): > fabrics.c: > 322: dev_err(ctrl->device, > 323: "%s, subsysnqn \"%s\"\n", > 324: inv_data, data->subsysnqn); > ... > 349: dev_err(ctrl->device, > 350: "Connect for subsystem %s is not allowed, hostnqn: %s\n", > 351: data->subsysnqn, data->hostnqn); > > Moreover, there's no need to NUL-pad since `data` is zero-allocated > already in fabrics.c: > 383: data = kzalloc(sizeof(*data), GFP_KERNEL); > ... therefore any further NUL-padding is rendered useless. > > Considering the above, a suitable replacement is `strscpy` [2] due to > the fact that it guarantees NUL-termination on the destination buffer > without unnecessarily NUL-padding. > > I opted not to switch NVMF_NQN_SIZE to sizeof(data->xyz) because the > size is defined as: > | /* NQN names in commands fields specified one size */ > | #define NVMF_NQN_FIELD_LEN 256 > > ... while NVMF_NQN_SIZE is defined as: > | /* However the max length of a qualified name is another size */ > | #define NVMF_NQN_SIZE 223 > > Since 223 seems pretty magic, I'm not going to touch it. struct nvmf_connect_data { ... char subsysnqn[NVMF_NQN_FIELD_LEN]; char hostnqn[NVMF_NQN_FIELD_LEN]; Honestly, the use of NVMF_NQN_SIZE as the length arg looks like a bug. struct nvmf_ctrl_options { ... char *subsysnqn; ... struct nvmf_host *host; struct nvmf_host { ... char nqn[NVMF_NQN_SIZE]; ctrl->opts->host->nqn is sized as NVMF_NQN_SIZE, so this is like saying: strscpy(dest, src, sizeof(src)); Which can go wrong when dest is smaller than src, but that's not the case here. It seems ctrl->opts->host->nqn is expected to be a C string: drivers/nvme/host/fabrics.h: strcmp(opts->host->nqn, ctrl->opts->host->nqn) || And subsysnqn seems to be the same size: drivers/nvme/target/core.c: subsys->subsysnqn = kstrndup(subsysnqn, NVMF_NQN_SIZE, drivers/nvme/target/core.c- GFP_KERNEL); So these really look like bugs to me. Perhaps a follow-up patch to fix this, if nvme maintainers agree? Either way, strscpy is an improvement: Reviewed-by: Kees Cook <keescook@chromium.org> -Kees > > Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] > Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] > Link: https://github.com/KSPP/linux/issues/90 > Cc: linux-hardening@vger.kernel.org > Signed-off-by: Justin Stitt <justinstitt@google.com> > --- > Note: build-tested only. > > Found with: $ rg "strncpy\(" > --- > drivers/nvme/host/fabrics.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c > index 8175d49f2909..57aad3ce311a 100644 > --- a/drivers/nvme/host/fabrics.c > +++ b/drivers/nvme/host/fabrics.c > @@ -386,8 +386,8 @@ static struct nvmf_connect_data *nvmf_connect_data_prep(struct nvme_ctrl *ctrl, > > uuid_copy(&data->hostid, &ctrl->opts->host->id); > data->cntlid = cpu_to_le16(cntlid); > - strncpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE); > - strncpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE); > + strscpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE); > + strscpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE); > > return data; > } > > --- > base-commit: 58720809f52779dc0f08e53e54b014209d13eebb > change-id: 20231018-strncpy-drivers-nvme-host-fabrics-c-416258a22598 > > Best regards, > -- > Justin Stitt <justinstitt@google.com> > >
On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote: > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > > strncpy() is deprecated for use on NUL-terminated destination strings > > [1] and as such we should prefer more robust and less ambiguous string > > interfaces. > > If we want that we need to stop pretendening direct manipulation of > nul-terminate strings is a good idea. I suspect the churn of replacing > one helper with another, maybe slightly better, one probably > introduces more bugs than it fixes. > > If we want to attack the issue for real we need to use something > better. > > lib/seq_buf.c is a good start for a lot of simple cases that just > append to strings including creating complex ones. Kent had a bunch > of good ideas on how to improve it, but couldn't be convinced to > contribute to it instead of duplicating the functionality which > is a bit sad, but I think we need to switch to something like > seq_buf that actually has a counted string instead of all this messing > around with the null-terminated strings. When doing more complex string creation, I agree. I spent some time doing this while I was looking at removing strcat() and strlcat(); this is where seq_buf shines. (And seq_buf is actually both: it maintains its %NUL termination _and_ does the length counting.) The only thing clunky about it was initialization, but all the conversions I experimented with were way cleaner using seq_buf. I even added a comment to strlcat()'s kern-doc to aim folks at seq_buf. :) /** * strlcat - Append a string to an existing string ... * Do not use this function. While FORTIFY_SOURCE tries to avoid * read and write overflows, this is only possible when the sizes * of @p and @q are known to the compiler. Prefer building the * string with formatting, via scnprintf(), seq_buf, or similar. Almost all of the remaining strncpy() usage is just string to string copying, but the corner cases that are being spun out that aren't strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(), and memcpy(). Each of these are a clear improvement since they remove the ambiguity of the intended behavior. Using seq_buf ends up being way more overhead than is needed. -Kees
On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote: > > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > > > strncpy() is deprecated for use on NUL-terminated destination strings > > > [1] and as such we should prefer more robust and less ambiguous string > > > interfaces. > > > > If we want that we need to stop pretendening direct manipulation of > > nul-terminate strings is a good idea. I suspect the churn of replacing > > one helper with another, maybe slightly better, one probably > > introduces more bugs than it fixes. > > > > If we want to attack the issue for real we need to use something > > better. > > > > lib/seq_buf.c is a good start for a lot of simple cases that just > > append to strings including creating complex ones. Kent had a bunch > > of good ideas on how to improve it, but couldn't be convinced to > > contribute to it instead of duplicating the functionality which > > is a bit sad, but I think we need to switch to something like > > seq_buf that actually has a counted string instead of all this messing > > around with the null-terminated strings. > > When doing more complex string creation, I agree. I spent some time > doing this while I was looking at removing strcat() and strlcat(); this > is where seq_buf shines. (And seq_buf is actually both: it maintains its > %NUL termination _and_ does the length counting.) The only thing clunky > about it was initialization, but all the conversions I experimented with > were way cleaner using seq_buf. (...) I also agree. I'm using several other schemes based on pointer+length in other projects and despite not being complete in terms of API (due to the slow migration of old working code), over time it proves much easier to use and requires far less controls. With NUL-teminated strings you need to perform checks for each and every operation. When the length is known and controlled, most often you can get rid of many tests on intermediate operations and perform a check at the end, thus you end up with less "if" and "goto fail" in the code, because the checks are no longer for "not crashing nor introducing vulnerabilities", but just "returning a correct result", which can often be detected more easily. Another benefit I found by accident is that when you need to compare some tokens against multiple ones (say some keywords for example), it becomes much faster than strcmp()-based if/else series because in this case you start by comparing lengths instead of comparing contents. And when your macros allow you to constify string constants, the compiler will replace long "if" series with checks against constant values, and may even arrange them as a tree since all are constants, sometimes mixing with the first char as the discriminator. Typically on the test below I observe a 10x speedup at -O3 and ~5x at -O2 when I convert this: if (!strcmp(name, "host") || !strcmp(name, "content-length") || !strcmp(name, "connection") || !strcmp(name, "proxy-connection") || !strcmp(name, "keep-alive") || !strcmp(name, "upgrade") || !strcmp(name, "te") || !strcmp(name, "transfer-encoding")) return 1; to this: if (isteq(name, ist("host")) || isteq(name, ist("content-length")) || isteq(name, ist("connection")) || isteq(name, ist("proxy-connection")) || isteq(name, ist("keep-alive")) || isteq(name, ist("upgrade")) || isteq(name, ist("te")) || isteq(name, ist("transfer-encoding"))) return 1; The code is larger but when compiled at -Os, it instead becomes smaller. Another interesting property I'm using in the API above, that might or might not apply there is that for most archs we care about, functions can take a struct of two words passed as registers, and can return such a struct as a pair of registers as well. This allows to chain functions by passing one function's return as the argument to another one, which is what users often want to do to avoid intermediate variables. All this to say that length-based strings do offer quite a lot of benefits over the long term. Willy
On Thu, Oct 19, 2023 at 09:01:53AM +0200, Willy Tarreau wrote: > On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > > On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote: > > > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > > > > strncpy() is deprecated for use on NUL-terminated destination strings > > > > [1] and as such we should prefer more robust and less ambiguous string > > > > interfaces. > > > > > > If we want that we need to stop pretendening direct manipulation of > > > nul-terminate strings is a good idea. I suspect the churn of replacing > > > one helper with another, maybe slightly better, one probably > > > introduces more bugs than it fixes. > > > > > > If we want to attack the issue for real we need to use something > > > better. > > > > > > lib/seq_buf.c is a good start for a lot of simple cases that just > > > append to strings including creating complex ones. Kent had a bunch > > > of good ideas on how to improve it, but couldn't be convinced to > > > contribute to it instead of duplicating the functionality which > > > is a bit sad, but I think we need to switch to something like > > > seq_buf that actually has a counted string instead of all this messing > > > around with the null-terminated strings. > > > > When doing more complex string creation, I agree. I spent some time > > doing this while I was looking at removing strcat() and strlcat(); this > > is where seq_buf shines. (And seq_buf is actually both: it maintains its > > %NUL termination _and_ does the length counting.) The only thing clunky > > about it was initialization, but all the conversions I experimented with > > were way cleaner using seq_buf. > (...) > > I also agree. I'm using several other schemes based on pointer+length in > other projects and despite not being complete in terms of API (due to the > slow migration of old working code), over time it proves much easier to > use and requires far less controls. > > With NUL-teminated strings you need to perform checks for each and every > operation. When the length is known and controlled, most often you can > get rid of many tests on intermediate operations and perform a check at > the end, thus you end up with less "if" and "goto fail" in the code, > because the checks are no longer for "not crashing nor introducing > vulnerabilities", but just "returning a correct result", which can often > be detected more easily. > > Another benefit I found by accident is that when you need to compare some > tokens against multiple ones (say some keywords for example), it becomes > much faster than strcmp()-based if/else series because in this case you > start by comparing lengths instead of comparing contents. And when your > macros allow you to constify string constants, the compiler will replace > long "if" series with checks against constant values, and may even arrange > them as a tree since all are constants, sometimes mixing with the first > char as the discriminator. Typically on the test below I observe a 10x > speedup at -O3 and ~5x at -O2 when I convert this: > > if (!strcmp(name, "host") || > !strcmp(name, "content-length") || > !strcmp(name, "connection") || > !strcmp(name, "proxy-connection") || > !strcmp(name, "keep-alive") || > !strcmp(name, "upgrade") || > !strcmp(name, "te") || > !strcmp(name, "transfer-encoding")) > return 1; > > to this: > > if (isteq(name, ist("host")) || > isteq(name, ist("content-length")) || > isteq(name, ist("connection")) || > isteq(name, ist("proxy-connection")) || > isteq(name, ist("keep-alive")) || > isteq(name, ist("upgrade")) || > isteq(name, ist("te")) || > isteq(name, ist("transfer-encoding"))) > return 1; > > The code is larger but when compiled at -Os, it instead becomes smaller. > > Another interesting property I'm using in the API above, that might or > might not apply there is that for most archs we care about, functions > can take a struct of two words passed as registers, and can return > such a struct as a pair of registers as well. This allows to chain > functions by passing one function's return as the argument to another > one, which is what users often want to do to avoid intermediate > variables. Chaining should be nice cherry on top for very specific cases but certainly not promoted or advertised. Deleting intermediate variables promotes implementation-defined behaviour because of unspecified order of evaluation of function arguments. Second, debuggers still operate with lines in mind, so jumping to the next statement written like this f(g(), h()) can be problematic. Intermediate variables are much less of a problem now that -Wdeclaration-after-statement has been finally abolished! They don't consume LOC anymore. > All this to say that length-based strings do offer quite a lot of > benefits over the long term. As long as they are named kstring :-) Or std_string, he-he.
On Thu, Oct 19, 2023 at 02:40:52PM +0300, Alexey Dobriyan wrote: > On Thu, Oct 19, 2023 at 09:01:53AM +0200, Willy Tarreau wrote: > > On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > > > On Thu, Oct 19, 2023 at 07:46:42AM +0200, Christoph Hellwig wrote: > > > > On Wed, Oct 18, 2023 at 10:48:49PM +0000, Justin Stitt wrote: > > > > > strncpy() is deprecated for use on NUL-terminated destination strings > > > > > [1] and as such we should prefer more robust and less ambiguous string > > > > > interfaces. > > > > > > > > If we want that we need to stop pretendening direct manipulation of > > > > nul-terminate strings is a good idea. I suspect the churn of replacing > > > > one helper with another, maybe slightly better, one probably > > > > introduces more bugs than it fixes. > > > > > > > > If we want to attack the issue for real we need to use something > > > > better. > > > > > > > > lib/seq_buf.c is a good start for a lot of simple cases that just > > > > append to strings including creating complex ones. Kent had a bunch > > > > of good ideas on how to improve it, but couldn't be convinced to > > > > contribute to it instead of duplicating the functionality which > > > > is a bit sad, but I think we need to switch to something like > > > > seq_buf that actually has a counted string instead of all this messing > > > > around with the null-terminated strings. > > > > > > When doing more complex string creation, I agree. I spent some time > > > doing this while I was looking at removing strcat() and strlcat(); this > > > is where seq_buf shines. (And seq_buf is actually both: it maintains its > > > %NUL termination _and_ does the length counting.) The only thing clunky > > > about it was initialization, but all the conversions I experimented with > > > were way cleaner using seq_buf. > > (...) > > > > I also agree. I'm using several other schemes based on pointer+length in > > other projects and despite not being complete in terms of API (due to the > > slow migration of old working code), over time it proves much easier to > > use and requires far less controls. > > > > With NUL-teminated strings you need to perform checks for each and every > > operation. When the length is known and controlled, most often you can > > get rid of many tests on intermediate operations and perform a check at > > the end, thus you end up with less "if" and "goto fail" in the code, > > because the checks are no longer for "not crashing nor introducing > > vulnerabilities", but just "returning a correct result", which can often > > be detected more easily. > > > > Another benefit I found by accident is that when you need to compare some > > tokens against multiple ones (say some keywords for example), it becomes > > much faster than strcmp()-based if/else series because in this case you > > start by comparing lengths instead of comparing contents. And when your > > macros allow you to constify string constants, the compiler will replace > > long "if" series with checks against constant values, and may even arrange > > them as a tree since all are constants, sometimes mixing with the first > > char as the discriminator. Typically on the test below I observe a 10x > > speedup at -O3 and ~5x at -O2 when I convert this: > > > > if (!strcmp(name, "host") || > > !strcmp(name, "content-length") || > > !strcmp(name, "connection") || > > !strcmp(name, "proxy-connection") || > > !strcmp(name, "keep-alive") || > > !strcmp(name, "upgrade") || > > !strcmp(name, "te") || > > !strcmp(name, "transfer-encoding")) > > return 1; > > > > to this: > > > > if (isteq(name, ist("host")) || > > isteq(name, ist("content-length")) || > > isteq(name, ist("connection")) || > > isteq(name, ist("proxy-connection")) || > > isteq(name, ist("keep-alive")) || > > isteq(name, ist("upgrade")) || > > isteq(name, ist("te")) || > > isteq(name, ist("transfer-encoding"))) > > return 1; > > > > The code is larger but when compiled at -Os, it instead becomes smaller. > > > > Another interesting property I'm using in the API above, that might or > > might not apply there is that for most archs we care about, functions > > can take a struct of two words passed as registers, and can return > > such a struct as a pair of registers as well. This allows to chain > > functions by passing one function's return as the argument to another > > one, which is what users often want to do to avoid intermediate > > variables. > > Chaining should be nice cherry on top for very specific cases but certainly > not promoted or advertised. Deleting intermediate variables promotes > implementation-defined behaviour because of unspecified order of evaluation > of function arguments. Second, debuggers still operate with lines in mind, > so jumping to the next statement written like this > > f(g(), h()) > > can be problematic. It obviously depends what these functions do, but that remains true for lots of other use cases applying to a shared memory location, if that's the case. Also it happens that a lot of string functions that are used as arguments to other ones are in fact lookups, skip, trim etc which only manipulate the pointer and the length and not the contents. > Intermediate variables are much less of a problem now > that -Wdeclaration-after-statement has been finally abolished! They don't > consume LOC anymore. Intermediate variables declared after statements remain an abomination which turn a visual lookup from O(indent_levels) to O(lines) because normally you only have to quickly glance a the previous opening brace and if you don't find, you repeat, but with them you have to visually scan every single line. They're now allowed for macros and iterators which can make a good use of them but it's not a reason for abusing them in code supposed to be reviewable by humans. > > All this to say that length-based strings do offer quite a lot of > > benefits over the long term. > > As long as they are named kstring :-) Or std_string, he-he. That point is the last of my concerns ;-) Willy
On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > Almost all of the remaining strncpy() usage is just string to string > copying, but the corner cases that are being spun out that aren't > strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(), > and memcpy(). Each of these are a clear improvement since they remove > the ambiguity of the intended behavior. Using seq_buf ends up being way > more overhead than is needed. I'm really not sure strscpy is much of an improvement. In this particular case in most other places we simply use a snprintf for nqns, which seems useful here to if we don't want the full buf. But switching to a completely undocumented helper like strscpy seems not useful at all.
On Thu, Oct 19, 2023 at 9:46 PM Christoph Hellwig <hch@lst.de> wrote: > > On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > > Almost all of the remaining strncpy() usage is just string to string > > copying, but the corner cases that are being spun out that aren't > > strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(), > > and memcpy(). Each of these are a clear improvement since they remove > > the ambiguity of the intended behavior. Using seq_buf ends up being way > > more overhead than is needed. > > I'm really not sure strscpy is much of an improvement. In this particular > case in most other places we simply use a snprintf for nqns, which seems > useful here to if we don't want the full buf. > > But switching to a completely undocumented helper like strscpy seems not > useful at all. There's some docs at [1]. Perhaps there could be more? [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292
On Fri, 20 Oct 2023 at 10:40, Justin Stitt <justinstitt@google.com> wrote: > > There's some docs at [1]. Perhaps there could be more? > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 Note that we have so few 'strlcpy()' calls that we really should remove that horrid horrid interface. It's a buggy piece of sh*t. 'strlcpy()' is fundamentally unsafe BY DESIGN if you don't trust the source string - which is one of the alleged reasons to use it. That said, while 'strscpy()' fixes the fundamental conceptual bugs of strlcpy(), it's worth noting that it has *two* differences wrt strlcpy: - it doesn't have the broken return value - it copies things in word-size chunks to be more efficient And because of that word-size thing it will possibly insert more than one NUL character at the end of the result. To give an example, if you have dst[64] = "abcdefghijklmnopqrstuvwxyz"; src[64] = "123\0........"; strlcpy(dst, src, 64); then the strlcpy() will return 3 (the size of the copy), but the destination will end up looking like dst[64] = "123\0\0\0\0\0ijklmnopqrstuvwxyz..."; This odd "possibly word padding" is basically never relevant (and it's obviously always also limited by the size you gave it), but *if* you are doing something really odd, and you expect the end of the destination string to not be modified at all past the final NUL of the copy, you need to be aware of this. It does mean that if you used to have dst[4]; strlcpy(dst, "abc", 8); then that *used* to work (because it would copy four bytes: "abc\0" and that fits in 'dst[]'). But dst[4]; strscpy(dst, "abc", 8); will overflow dst[], because it will do a word-copy and you told 'strscpy()' that you had a 8-byte buffer, and it will try to write "abc\0\0\0\0\0" into the destination. The above is insane code, but it's an example of why a blind strlcpy->strscpy conversion might change semantics. Linus
On Fri, Oct 20, 2023 at 10:56:31AM -0700, Linus Torvalds wrote: > On Fri, 20 Oct 2023 at 10:40, Justin Stitt <justinstitt@google.com> wrote: > > > > There's some docs at [1]. Perhaps there could be more? > > > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > Note that we have so few 'strlcpy()' calls that we really should > remove that horrid horrid interface. It's a buggy piece of sh*t. Yup, that's on-going. There's just a few left; Azeem has been chipping away at strlcpy. > It does mean that if you used to have > > dst[4]; > strlcpy(dst, "abc", 8); > > then that *used* to work (because it would copy four bytes: "abc\0" > and that fits in 'dst[]'). But > > dst[4]; > strscpy(dst, "abc", 8); > > will overflow dst[], because it will do a word-copy and you told > 'strscpy()' that you had a 8-byte buffer, and it will try to write > "abc\0\0\0\0\0" into the destination. Luckily, we already have checks for these mismatched sizes at compile time (i.e. CONFIG_FORTIFY_SOURCE will already check for pathological cases like above where 8 > sizeof(dst)). > The above is insane code, but it's an example of why a blind > strlcpy->strscpy conversion might change semantics. Totally agreed. All of the recent string conversions have been paying close attention to the behavioral differences.
On Fri, Oct 20, 2023 at 10:40:12AM -0700, Justin Stitt wrote: > On Thu, Oct 19, 2023 at 9:46 PM Christoph Hellwig <hch@lst.de> wrote: > > > > On Wed, Oct 18, 2023 at 11:01:54PM -0700, Kees Cook wrote: > > > Almost all of the remaining strncpy() usage is just string to string > > > copying, but the corner cases that are being spun out that aren't > > > strscpy() or strscpy_pad() are covered by strtomem(), kmemdup_nul(), > > > and memcpy(). Each of these are a clear improvement since they remove > > > the ambiguity of the intended behavior. Using seq_buf ends up being way > > > more overhead than is needed. > > > > I'm really not sure strscpy is much of an improvement. In this particular > > case in most other places we simply use a snprintf for nqns, which seems > > useful here to if we don't want the full buf. > > > > But switching to a completely undocumented helper like strscpy seems not > > useful at all. I'm curious where you looked and didn't find documentation -- perhaps there is an improvement to be made to aim one to where the existing documentation lives? > > There's some docs at [1]. Perhaps there could be more? > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 Right, And it's even valid kern-doc, which gets rendered in the kernel API docs, along with all the other string functions: https://docs.kernel.org/core-api/kernel-api.html#c.strscpy
On Fri, Oct 20, 2023 at 11:30:49AM -0700, Kees Cook wrote: > I'm curious where you looked and didn't find documentation -- perhaps > there is an improvement to be made to aim one to where the existing > documentation lives? My order was the following: - look for kernel doc on the main function implementation in lib/string.c (as found by a grep for an EXPORT_SYMBOL for it) - after not finding it there, but seeing that it has an ifdef for an arch override, which turns out to be unused - then I grepped the Documentation/ directory for it, and while there are quite a few matches for strscpy, they are largely in examples, with the only text referring to strscpy being Documentation/process/deprecated.rst that tells you to use it instead of strcpy, but not how it actually works - after that I realized that some people put the kerneldoc on the declaration, so I looked at that in string.h, but couldn't find it. > > There's some docs at [1]. Perhaps there could be more? > > > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > Right, And it's even valid kern-doc, which gets rendered in the kernel > API docs, along with all the other string functions: > https://docs.kernel.org/core-api/kernel-api.html#c.strscpy Well, I never use the generated kerneldoc because it's much harder than just grepping the tree, but indeed it exists even if it's hidden in the most obsfucated way. But at least I know now!
On Thu, 2023-10-26 at 12:01 +0200, Christoph Hellwig wrote: > > > There's some docs at [1]. Perhaps there could be more? > > > > > > [1]: > > > https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > > > Right, And it's even valid kern-doc, which gets rendered in the > > kernel API docs, along with all the other string functions: > > https://docs.kernel.org/core-api/kernel-api.html#c.strscpy > > Well, I never use the generated kerneldoc because it's much harder > than just grepping the tree, but indeed it exists even if it's hidden > in the most obsfucated way. But at least I know now! This, honestly, is one of the really annoying problems of kerneldoc. When looking for structures or functions git grep "<function> -" usually finds it. However, I recently asked on linux-scsi if we could point to the doc about system_state and what it meant. However, either we all suck or there's no such documentation because no-one could find it. While it's nice in theory to have everything documented, it's not much use if no one can actually find the information ... James
> > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292
I found that https://elixir.bootlin.com/linux is the best way to find
Documentation for functions and structures. I would suggest try it
first, and only when what fails to start using grep.
Andrew
On Thu, Oct 26, 2023 at 03:44:22PM +0200, Andrew Lunn wrote: > > > > [1]: https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > I found that https://elixir.bootlin.com/linux is the best way to find > Documentation for functions and structures. I would suggest try it > first, and only when what fails to start using grep. It's painful to have to open the HTML documentation generated by the kernel build system when developing, and even more painful to have to use a particular website. If the documentation is difficult to locate in the source tree, we're doing something wrong.
On Thu, 26 Oct 2023 07:39:44 -0400 James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > While it's nice in theory to have everything documented, it's not much > use if no one can actually find the information ... Does kerneldoc provide an automated index? That is, if we had a single file that had every function in the kernel that is documented, with the path to the file that documents it, it would make finding documentation much simpler. Maybe it already does? Which would mean we need a way to find the index too! -- Steve
Hi Steven, On Thu, Oct 26, 2023 at 3:52 PM Steven Rostedt <rostedt@goodmis.org> wrote: > On Thu, 26 Oct 2023 07:39:44 -0400 > James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > > > While it's nice in theory to have everything documented, it's not much > > use if no one can actually find the information ... > > Does kerneldoc provide an automated index? That is, if we had a single file > that had every function in the kernel that is documented, with the path to > the file that documents it, it would make finding documentation much > simpler. > > Maybe it already does? Which would mean we need a way to find the index too! ctags? Although "git grep" is faster (assumed you use the "correct" search pattern, which can sometimes be challenging, indeed). Gr{oetje,eeting}s, Geert
Steven Rostedt <rostedt@goodmis.org> writes: > On Thu, 26 Oct 2023 07:39:44 -0400 > James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > >> While it's nice in theory to have everything documented, it's not much >> use if no one can actually find the information ... > > Does kerneldoc provide an automated index? If you go to https://www.kernel.org/doc/html/latest/ and type a symbol into the search box on the left, you'll get directed to the right place (if it exists). For James's system_state example, it makes it clear that there's only one reference - in the coding-style document, of all places... I've never looked into that index to see how hard it would be to access without a browser. jon
On Thu, 2023-10-26 at 15:44 +0200, Andrew Lunn wrote: > > > > [1]: > > > > https://elixir.bootlin.com/linux/v6.6-rc6/source/include/linux/fortify-string.h#L292 > > I found that https://elixir.bootlin.com/linux That's a 404, I think you mean https://elixir.bootlin.com/linux/latest/source > is the best way to find Documentation for functions and structures. > I would suggest try it first, and only when what fails to start using > grep. I just tried it with system_state and it doesn't even find the definition. I think it might be because it has annotations which confuse the searcher (it's in init/main.c as enum system_states system_state __read_mostly; ). If there's any meaningful doc about it, elixir also doesn't find it. James
Jonathan Corbet <corbet@lwn.net> writes: > Steven Rostedt <rostedt@goodmis.org> writes: > >> On Thu, 26 Oct 2023 07:39:44 -0400 >> James Bottomley <James.Bottomley@HansenPartnership.com> wrote: >> >>> While it's nice in theory to have everything documented, it's not much >>> use if no one can actually find the information ... >> >> Does kerneldoc provide an automated index? > > If you go to https://www.kernel.org/doc/html/latest/ BTW there's now a shorter URL for this whic is really nice: https://docs.kernel.org/
On Thu, Oct 26, 2023 at 03:59:28PM +0200, Geert Uytterhoeven wrote: > Hi Steven, > > On Thu, Oct 26, 2023 at 3:52 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 26 Oct 2023 07:39:44 -0400 > > James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > > > > > While it's nice in theory to have everything documented, it's not much > > > use if no one can actually find the information ... > > > > Does kerneldoc provide an automated index? That is, if we had a single file > > that had every function in the kernel that is documented, with the path to > > the file that documents it, it would make finding documentation much > > simpler. > > > > Maybe it already does? Which would mean we need a way to find the index too! > > ctags? ctags is a tool from previous century. It doesn't help that "make tags" is single-threaded. It needs constant babysitting (loop-like macros, ignore attibute annotations which masquerade as identifiers). I think "make tags" became much slower because ignore-list is one giant regexp which only grows bigger. > Although "git grep" is faster (assumed you use the "correct" search > pattern, which can sometimes be challenging, indeed). I tried QT Creator indexing at some point (which is parallel), it needs to be told that headers are C not C++. I didn't find a way to tell it that .c files are C too but F2 jumped to definitions quite well. Also hovering over identifier/name works (being IDE it understands popular doc styles). It can be made to work reasonably well provided that you did "make allmodconfig" and added few header locations. clangd parses like compiler, not like human and kernel uses a lot of CONFIG defines so some config must be chosen. But I need to recheck all this stuff now that new version was propagated to distros. It should be better (and less segfaulty :-)
On Wed, 18 Oct 2023 22:48:49 +0000, Justin Stitt wrote: > strncpy() is deprecated for use on NUL-terminated destination strings > [1] and as such we should prefer more robust and less ambiguous string > interfaces. > > We expect both data->subsysnqn and data->hostnqn to be NUL-terminated > based on their usage with format specifier ("%s"): > fabrics.c: > 322: dev_err(ctrl->device, > 323: "%s, subsysnqn \"%s\"\n", > 324: inv_data, data->subsysnqn); > ... > 349: dev_err(ctrl->device, > 350: "Connect for subsystem %s is not allowed, hostnqn: %s\n", > 351: data->subsysnqn, data->hostnqn); > > [...] Applied to for-next/hardening, thanks! [1/1] nvme-fabrics: replace deprecated strncpy with strscpy https://git.kernel.org/kees/c/5ef935fd5520 Take care,
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c index 8175d49f2909..57aad3ce311a 100644 --- a/drivers/nvme/host/fabrics.c +++ b/drivers/nvme/host/fabrics.c @@ -386,8 +386,8 @@ static struct nvmf_connect_data *nvmf_connect_data_prep(struct nvme_ctrl *ctrl, uuid_copy(&data->hostid, &ctrl->opts->host->id); data->cntlid = cpu_to_le16(cntlid); - strncpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE); - strncpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE); + strscpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE); + strscpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE); return data; }