Message ID | Y6SJDbKBk471KE4k@p183 |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:e747:0:0:0:0:0 with SMTP id c7csp52885wrn; Thu, 22 Dec 2022 08:45:43 -0800 (PST) X-Google-Smtp-Source: AMrXdXsDjZQ+PcolYhnmnQ+28kChMp0cXn0HwPhYBrAMBCeYIVVSk2UCP26KZDrUQdHlU72IWDRA X-Received: by 2002:a50:ed17:0:b0:46d:6f14:aec with SMTP id j23-20020a50ed17000000b0046d6f140aecmr5463588eds.0.1671727543247; Thu, 22 Dec 2022 08:45:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1671727543; cv=none; d=google.com; s=arc-20160816; b=V6mtGdDm5hh/oF2pdopqBxz4yPUajLGtLZyzFBcCyH8QqDPAEYS7k4SSpsAvuFZlRq SmMvpAjZZEBgZk26Vv5BUThekwWdgAJRWbjG7b0sQUbGC2wNvmiDJLCDtZkiq8zJH1CX 9kN7Uym9zvuMQH61age6ThHe1+SPT2LLoMl3MUFD2S0gL4CxIRE7eihhCJwwkD1vCnpU GCWvgGvWW8SNlOFxFAvEvdV/gWmNc67Mb9/kALHlTC+iAz46yRDKWFBKF+1tdXmiUHcZ TaX7H1f6QfADXtwhj0cWKAuVo8nayXtNOLyZwvHRLCDDpgJOvsLxTzAsDXaZ2lxzzusD MFsA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-disposition:mime-version:message-id :subject:cc:to:from:date:dkim-signature; bh=yZLAlOhCGilIwuTmtW0IVLJRanvaBYIelqxDGN9L3mA=; b=ONKQ0g48P+KQJiG3ebzN1ZId1i2RtaA4PJUj6fWLzzr/E21RUBx2HOmPcnPCwbl2Yx OrrDgHjyQAxPizBTdS0IMPsU6Jc8tEJ2DcqR6YE2rfmIZ4ksv6Thkw0hM2rRXi5Igb28 ycyKjrXhu/CPobfTdxWfhyIZCb6PdGYcnBH9zl4obyfflmCcaqUM0TNTpzrPC6UZi9ah ijoeym7qPpsRRMqdsmAkMqmS5GYv5bD8RzCPGUYfil7FjRn5JNAj6kk6GRLSn9M3+Sax h5wRlMHy5GSnybx0nBxyA7SqfII88jSeK5ljPAO8/z4BZXsyxqg7I/+0s3IKa5+DVJUH 2Afg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=ntlPTMV4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id o22-20020a170906975600b007ae8d01144dsi893474ejy.717.2022.12.22.08.45.19; Thu, 22 Dec 2022 08:45:43 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=ntlPTMV4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230178AbiLVQnB (ORCPT <rfc822;pacteraone@gmail.com> + 99 others); Thu, 22 Dec 2022 11:43:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51352 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229603AbiLVQm5 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 22 Dec 2022 11:42:57 -0500 Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DAC5227B0F; Thu, 22 Dec 2022 08:42:56 -0800 (PST) Received: by mail-wm1-x32d.google.com with SMTP id ja17so1830869wmb.3; Thu, 22 Dec 2022 08:42:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-disposition:mime-version:message-id:subject:cc:to:from:date :from:to:cc:subject:date:message-id:reply-to; bh=yZLAlOhCGilIwuTmtW0IVLJRanvaBYIelqxDGN9L3mA=; b=ntlPTMV4mBJYN7GB1xLqSC6qEFW7/c4RVIUl7rVnZDNgEyUnIFTrW7ehcuwSjA9MOT roXNazIIcGsS24FHx1L0hDGQS3Z8bXmRO0K8a7JuCtbpCxERvYemdLj3Ua0F+5P3X8Js 6+6krr8AjM5RIrp+OxufDGOxjY2Hieowj0//MgfhL0H9x7rknnumipH69NwM/Hw9X11p peaLgdBwcc1r1HCNdLU7G+QzpckOgMmxMzZq1aEn04PkaKZum8VZr5h/v5z0CMpwNa1w EefqCy5z9v4qYICYsn3Qf9hH91DI57OppL3On7C7ra5J+d/YImz3xvg16yTkqmmOGHxI 5q1A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-disposition:mime-version:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=yZLAlOhCGilIwuTmtW0IVLJRanvaBYIelqxDGN9L3mA=; b=Jwkc08Ugqo7++opMPsfC+9JfzLoqgu2ouvyVncv0Tju0zbLIjKAeItD/32IWSzClAz hCK6X/qetjYvGIo7JQe7zzzu8Hz7d/hOuzDLdTaypT5GE9oVfQ9ufxNfCkVnIskT5uYv mOcrs+pkT12kOsIaWn5MvI/UW+Z//kQA6Balb+NvVQqjH+SE7sVUO+FPwzPuK+IizkpK NcAepCfnCVuOPrWuAlVl2d6Oya9mqnUM5uq8F3vT5mnEDVE3psm+xFILhMha/aSDa/p/ J+RwpCvumVWvJaSNwolHDj/2BoBG0ixPNN8FgiBZzBq5NIElpS3INghVtizvRFxslmk6 nSbQ== X-Gm-Message-State: AFqh2kqbItKmIqwS2r3PG/DFksAVLfwSgLZRKbJahtUQBVGQF6EyL/L/ QoiqA9wZQA/6DKeXkKDu8Q== X-Received: by 2002:a05:600c:4d25:b0:3d2:27ba:dde0 with SMTP id u37-20020a05600c4d2500b003d227badde0mr4876976wmp.33.1671727375391; Thu, 22 Dec 2022 08:42:55 -0800 (PST) Received: from p183 ([46.53.253.211]) by smtp.gmail.com with ESMTPSA id t184-20020a1c46c1000000b003b4a699ce8esm6333221wma.6.2022.12.22.08.42.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 22 Dec 2022 08:42:54 -0800 (PST) Date: Thu, 22 Dec 2022 19:42:53 +0300 From: Alexey Dobriyan <adobriyan@gmail.com> To: Alejandro Colomar <alx.manpages@gmail.com>, Michael Kerrisk <mtk.manpages@gmail.com> Cc: linux-kernel@vger.kernel.org, linux-man@vger.kernel.org, oss-security@lists.openwall.com Subject: [patch] proc.5: tell how to parse /proc/*/stat correctly Message-ID: <Y6SJDbKBk471KE4k@p183> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1752933380012243406?= X-GMAIL-MSGID: =?utf-8?q?1752933380012243406?= |
Series |
proc.5: tell how to parse /proc/*/stat correctly
|
|
Commit Message
Alexey Dobriyan
Dec. 22, 2022, 4:42 p.m. UTC
/proc/*/stat can't be parsed with split() or split(" ") or split(' ')
or sscanf("%d (%s) ...") or equivalents because "comm" can contain
whitespace and parenthesis and is not escaped by the kernel.
BTW escaping would not help with naive split() anyway.
Mention strrchr(')') so people can at least stop adding new bugs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---
man5/proc.5 | 5 +++++
1 file changed, 5 insertions(+)
Comments
Alexey Dobriyan wrote on Thu, Dec 22, 2022 at 07:42:53PM +0300: > --- a/man5/proc.5 > +++ b/man5/proc.5 > @@ -2092,6 +2092,11 @@ Strings longer than > .B TASK_COMM_LEN > (16) characters (including the terminating null byte) are silently truncated. > This is visible whether or not the executable is swapped out. > + > +Note that \fIcomm\fP can contain space and closing parenthesis characters. > +Parsing /proc/${pid}/stat with split() or equivalent, or scanf(3) isn't > +reliable. The correct way is to locate closing parenthesis with strrchr(')') > +from the end of the buffer and parse integers from there. That's still not enough unless new lines are escaped, which they aren't: $ echo -n 'test) 0 0 0 ' > /proc/$$/comm $ cat /proc/$$/stat 71076 (test) 0 0 0 ) S 71075 71076 71076 34840 71192 4194304 6623 6824 0 0 10 3 2 7 20 0 1 0 36396573 15208448 2888 18446744073709551615 94173281726464 94173282650929 140734972513568 0 0 0 65536 3686404 1266761467 1 0 0 17 1 0 0 0 0 0 94173282892592 94173282940880 94173287231488 140734972522071 140734972522076 140734972522076 140734972526574 0 The silver lining here is that comm length is rather small (16) so we cannot emulate full lines and a very careful process could notice that there are not enough fields after the last parenthesis... So just look for the last closing parenthesis in the next line and try again? But, really, I just don't see how this can practically be said to be parsable...
On Fri, Dec 23, 2022 at 07:03:17AM +0900, Dominique Martinet wrote: > Alexey Dobriyan wrote on Thu, Dec 22, 2022 at 07:42:53PM +0300: > > --- a/man5/proc.5 > > +++ b/man5/proc.5 > > @@ -2092,6 +2092,11 @@ Strings longer than > > .B TASK_COMM_LEN > > (16) characters (including the terminating null byte) are silently truncated. > > This is visible whether or not the executable is swapped out. > > + > > +Note that \fIcomm\fP can contain space and closing parenthesis characters. > > +Parsing /proc/${pid}/stat with split() or equivalent, or scanf(3) isn't > > +reliable. The correct way is to locate closing parenthesis with strrchr(')') > > +from the end of the buffer and parse integers from there. > > That's still not enough unless new lines are escaped, which they aren't: > > $ echo -n 'test) 0 0 0 > ' > /proc/$$/comm > $ cat /proc/$$/stat > 71076 (test) 0 0 0 > ) S 71075 71076 71076 34840 71192 4194304 6623 6824 0 0 10 3 2 7 20 0 1 0 36396573 15208448 2888 18446744073709551615 94173281726464 94173282650929 140734972513568 0 0 0 65536 3686404 1266761467 1 0 0 17 1 0 0 0 0 0 94173282892592 94173282940880 94173287231488 140734972522071 140734972522076 140734972522076 140734972526574 0 > > The silver lining here is that comm length is rather small (16) so we > cannot emulate full lines and a very careful process could notice that > there are not enough fields after the last parenthesis... So just look > for the last closing parenthesis in the next line and try again? No, just don't treat this file's content as a line (nor as several lines) - treat it as a string that might contain new line characters. The ps command from procps-ng seems to manage, e.g. for your test "ps c" prints: 29394 pts/3 S 0:00 test) 0 0 0? where the question mark is what it substitutes for the non-printable character (the new line character). I didn't check whether the process name it prints comes from /proc/$$/stat or /proc/$$/status, though (per strace, it reads both). > But, really, I just don't see how this can practically be said to be parsable... This format certainly makes it easier to get a parser wrong than to get it right. I agree the above man page edit is not enough, and should also mention the caveat that this shouldn't be read in nor parsed as a line. Also, the Linux kernel does have problems with new lines in the comm field elsewhere, at least in the log messages it produces: https://github.com/lkrg-org/lkrg/issues/165 Here I looked into this in context of LKRG development, but with the kernel itself also producing messages with comm in them the point of only fixing LKRG's messages is moot. Alexander P.S. While this thread goes well so far, please note that in general CC'ing other lists on postings to oss-security (or vice versa) is discouraged. With such CC's, possible follow-ups from members of those other lists can be off-topic for oss-security - e.g., they might focus on non-security technicalities. Probably not this time when only a man page is to be patched, but proposed patches to the Linux kernel often result in lengthy discussions and multiple versions of the patch. In those cases, I think it's better to have separate threads and only post summary follow-up(s) to oss-security (e.g., one message stating that a patch was proposed and linking to the thread, and another after the final version is merged).
Solar Designer wrote on Fri, Dec 23, 2022 at 12:21:12AM +0100: > On Fri, Dec 23, 2022 at 07:03:17AM +0900, Dominique Martinet wrote: > > Alexey Dobriyan wrote on Thu, Dec 22, 2022 at 07:42:53PM +0300: > > > --- a/man5/proc.5 > > > +++ b/man5/proc.5 > > > @@ -2092,6 +2092,11 @@ Strings longer than > > > .B TASK_COMM_LEN > > > (16) characters (including the terminating null byte) are silently truncated. > > > This is visible whether or not the executable is swapped out. > > > + > > > +Note that \fIcomm\fP can contain space and closing parenthesis characters. > > > +Parsing /proc/${pid}/stat with split() or equivalent, or scanf(3) isn't > > > +reliable. The correct way is to locate closing parenthesis with strrchr(')') > > > +from the end of the buffer and parse integers from there. > > > > That's still not enough unless new lines are escaped, which they aren't: > > > > $ echo -n 'test) 0 0 0 > > ' > /proc/$$/comm > > $ cat /proc/$$/stat > > 71076 (test) 0 0 0 > > ) S 71075 71076 71076 34840 71192 4194304 6623 6824 0 0 10 3 2 7 20 0 1 0 36396573 15208448 2888 18446744073709551615 94173281726464 94173282650929 140734972513568 0 0 0 65536 3686404 1266761467 1 0 0 17 1 0 0 0 0 0 94173282892592 94173282940880 94173287231488 140734972522071 140734972522076 140734972522076 140734972526574 0 > > > > The silver lining here is that comm length is rather small (16) so we > > cannot emulate full lines and a very careful process could notice that > > there are not enough fields after the last parenthesis... So just look > > for the last closing parenthesis in the next line and try again? > > No, just don't treat this file's content as a line (nor as several > lines) - treat it as a string that might contain new line characters. Ah, this came just after the /proc/net/unix discussion in another thread[1] pointing to [2] with one line per entry, and I was still in that mode. For /proc/pid/stat with a single entry I agree treating it as a buffer and looking for the last closing parenthesis should be correct as per the man page suggestion -- sorry for the noise. [1] https://www.openwall.com/lists/oss-security/2022/12/21/8 [2] https://lore.kernel.org/all/8a87957e-4d33-9351-ae74-243441cb03cd@opteya.com/
On Thursday 2022-12-22 23:03, Dominique Martinet wrote: >> + >> +Note that \fIcomm\fP can contain space and closing parenthesis characters. >> +Parsing /proc/${pid}/stat with split() or equivalent, or scanf(3) isn't >> +reliable. The correct way is to locate closing parenthesis with strrchr(')') >> +from the end of the buffer and parse integers from there. > >That's still not enough unless new lines are escaped, which they aren't: strrchr does not concern itself with "lines". If your input buffer contains the complete content of /proc/X/stat (and not just a "line" thereof), the strrchr approach appears quite workable.
Dominique Martinet writes:
> But, really, I just don't see how this can practically be said to be parsable...
In its current form it never will be. The solution is to place
this variable-length field last. Then you can "cut -d ' ' -f 51-"
to get the command+args part (assuming I counted all those fields
correctly ...)
Of course, this breaks backwards compatability.
--lyndon
On 2022-12-28, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote: > Dominique Martinet writes: > >> But, really, I just don't see how this can practically be said to be parsable... > > In its current form it never will be. The solution is to place > this variable-length field last. Then you can "cut -d ' ' -f 51-" > to get the command+args part (assuming I counted all those fields > correctly ...) > > Of course, this breaks backwards compatability. > I think that cut command doesn't handle newlines, you probably need cut -z, which looks a bit hacky to me. There already is 'ps -q $$ -o comm=', of course, and that works fine - as well as libprocps. I don't know, I think just adding the strrchr note seems acceptable. Tavis.
On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote: > Dominique Martinet writes: > > > But, really, I just don't see how this can practically be said to be parsable... > > In its current form it never will be. The solution is to place > this variable-length field last. Then you can "cut -d ' ' -f 51-" > to get the command+args part (assuming I counted all those fields > correctly ...) > > Of course, this breaks backwards compatability. It would also break forwards compatibility in the case new fields needed to be added. The only solution would be a libxo-style feature wherein a machine-parseable format is exposed by virtue of a file extension. Examples: 1. /proc/pid/stats.json 2. /proc/pid/stats.xml 3. /proc/pid/stats.yaml_shouldnt_be_a_thing Thanks,
On Wed, Dec 28, 2022 at 10:24:58AM -0500, Shawn Webb wrote: > On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote: > > Dominique Martinet writes: > > > > > But, really, I just don't see how this can practically be said to be parsable... > > > > In its current form it never will be. The solution is to place > > this variable-length field last. Then you can "cut -d ' ' -f 51-" > > to get the command+args part (assuming I counted all those fields > > correctly ...) > > > > Of course, this breaks backwards compatability. > > It would also break forwards compatibility in the case new fields > needed to be added. > > The only solution would be a libxo-style feature wherein a > machine-parseable format is exposed by virtue of a file extension. > > Examples: > > 1. /proc/pid/stats.json > 2. /proc/pid/stats.xml > 3. /proc/pid/stats.yaml_shouldnt_be_a_thing To expand upon this idea, lets define an example json file: { "schemaver": "20221228001", "name": "cat", "state": { "raw": "R", "intval": 1, "Pretty": "(Running)", }, "tgid": 5452, "pid": 5452, "ppid": 743, "uid": { "real": 501, "effective": 501, "saved_set": 501, "fs": 501 } } And so on.
On Wed, Dec 28, 2022 at 10:24:58AM -0500, Shawn Webb wrote: > On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote: > > Dominique Martinet writes: > > > > > But, really, I just don't see how this can practically be said to be parsable... > > > > In its current form it never will be. The solution is to place > > this variable-length field last. Then you can "cut -d ' ' -f 51-" > > to get the command+args part (assuming I counted all those fields > > correctly ...) > > > > Of course, this breaks backwards compatability. > > It would also break forwards compatibility in the case new fields > needed to be added. > > The only solution would be a libxo-style feature wherein a > machine-parseable format is exposed by virtue of a file extension. > > Examples: > > 1. /proc/pid/stats.json > 2. /proc/pid/stats.xml > 3. /proc/pid/stats.yaml_shouldnt_be_a_thing A binary format would be even better. No risk of ambiguity.
On Wednesday 2022-12-28 17:47, Demi Marie Obenour wrote: >> Examples: >> >> 1. /proc/pid/stats.json >> 2. /proc/pid/stats.xml >> 3. /proc/pid/stats.yaml_shouldnt_be_a_thing > >A binary format would be even better. No risk of ambiguity. So like EBML?
On Wed, Dec 28, 2022 at 11:47:25AM -0500, Demi Marie Obenour wrote: > On Wed, Dec 28, 2022 at 10:24:58AM -0500, Shawn Webb wrote: > > On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote: > > > Dominique Martinet writes: > > > > > > > But, really, I just don't see how this can practically be said to be parsable... > > > > > > In its current form it never will be. The solution is to place > > > this variable-length field last. Then you can "cut -d ' ' -f 51-" > > > to get the command+args part (assuming I counted all those fields > > > correctly ...) > > > > > > Of course, this breaks backwards compatability. > > > > It would also break forwards compatibility in the case new fields > > needed to be added. > > > > The only solution would be a libxo-style feature wherein a > > machine-parseable format is exposed by virtue of a file extension. > > > > Examples: > > > > 1. /proc/pid/stats.json > > 2. /proc/pid/stats.xml > > 3. /proc/pid/stats.yaml_shouldnt_be_a_thing > > A binary format would be even better. No risk of ambiguity. I think the argument I'm trying to make is to be flexible in implementation, allowing for future needs and wants--that is "future proofing".
On Wed, Dec 28, 2022 at 12:25:17PM -0500, Shawn Webb wrote: > On Wed, Dec 28, 2022 at 11:47:25AM -0500, Demi Marie Obenour wrote: > > On Wed, Dec 28, 2022 at 10:24:58AM -0500, Shawn Webb wrote: > > > On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote: > > > > Dominique Martinet writes: > > > > > > > > > But, really, I just don't see how this can practically be said to be parsable... > > > > > > > > In its current form it never will be. The solution is to place > > > > this variable-length field last. Then you can "cut -d ' ' -f 51-" > > > > to get the command+args part (assuming I counted all those fields > > > > correctly ...) > > > > > > > > Of course, this breaks backwards compatability. > > > > > > It would also break forwards compatibility in the case new fields > > > needed to be added. > > > > > > The only solution would be a libxo-style feature wherein a > > > machine-parseable format is exposed by virtue of a file extension. > > > > > > Examples: > > > > > > 1. /proc/pid/stats.json > > > 2. /proc/pid/stats.xml > > > 3. /proc/pid/stats.yaml_shouldnt_be_a_thing > > > > A binary format would be even better. No risk of ambiguity. > > I think the argument I'm trying to make is to be flexible in > implementation, allowing for future needs and wants--that is "future > proofing". Linux should not have an XML, JSON, or YAML serializer. Linux already does way too much; let’s not add one more thing to the list.
On Wed, Dec 28, 2022 at 01:02:35PM -0500, Demi Marie Obenour wrote: > On Wed, Dec 28, 2022 at 12:25:17PM -0500, Shawn Webb wrote: > > On Wed, Dec 28, 2022 at 11:47:25AM -0500, Demi Marie Obenour wrote: > > > On Wed, Dec 28, 2022 at 10:24:58AM -0500, Shawn Webb wrote: > > > > On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote: > > > > > Dominique Martinet writes: > > > > > > > > > > > But, really, I just don't see how this can practically be said to be parsable... > > > > > > > > > > In its current form it never will be. The solution is to place > > > > > this variable-length field last. Then you can "cut -d ' ' -f 51-" > > > > > to get the command+args part (assuming I counted all those fields > > > > > correctly ...) > > > > > > > > > > Of course, this breaks backwards compatability. > > > > > > > > It would also break forwards compatibility in the case new fields > > > > needed to be added. > > > > > > > > The only solution would be a libxo-style feature wherein a > > > > machine-parseable format is exposed by virtue of a file extension. > > > > > > > > Examples: > > > > > > > > 1. /proc/pid/stats.json > > > > 2. /proc/pid/stats.xml > > > > 3. /proc/pid/stats.yaml_shouldnt_be_a_thing > > > > > > A binary format would be even better. No risk of ambiguity. > > > > I think the argument I'm trying to make is to be flexible in > > implementation, allowing for future needs and wants--that is "future > > proofing". > > Linux should not have an XML, JSON, or YAML serializer. Linux already > does way too much; let’s not add one more thing to the list. Handling a new binary format is not 'one more thing' added? > -- > Sincerely, > Demi Marie Obenour (she/her/hers) > Invisible Things Lab
On Wed, Dec 28, 2022 at 01:02:35PM -0500, Demi Marie Obenour wrote: > On Wed, Dec 28, 2022 at 12:25:17PM -0500, Shawn Webb wrote: > > On Wed, Dec 28, 2022 at 11:47:25AM -0500, Demi Marie Obenour wrote: > > > On Wed, Dec 28, 2022 at 10:24:58AM -0500, Shawn Webb wrote: > > > > On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote: > > > > > Dominique Martinet writes: > > > > > > > > > > > But, really, I just don't see how this can practically be said to be parsable... > > > > > > > > > > In its current form it never will be. The solution is to place > > > > > this variable-length field last. Then you can "cut -d ' ' -f 51-" > > > > > to get the command+args part (assuming I counted all those fields > > > > > correctly ...) > > > > > > > > > > Of course, this breaks backwards compatability. > > > > > > > > It would also break forwards compatibility in the case new fields > > > > needed to be added. > > > > > > > > The only solution would be a libxo-style feature wherein a > > > > machine-parseable format is exposed by virtue of a file extension. > > > > > > > > Examples: > > > > > > > > 1. /proc/pid/stats.json > > > > 2. /proc/pid/stats.xml > > > > 3. /proc/pid/stats.yaml_shouldnt_be_a_thing > > > > > > A binary format would be even better. No risk of ambiguity. > > > > I think the argument I'm trying to make is to be flexible in > > implementation, allowing for future needs and wants--that is "future > > proofing". > > Linux should not have an XML, JSON, or YAML serializer. Linux already > does way too much; let’s not add one more thing to the list. Somewhat agreed. I think formats like JSON provide a good balance between machine parseable and human readable. As I described earlier, though, when it comes to concepts like procfs and sysfs, I have a bias towards abandoning them in favor of sysctl. If sysctl nodes were to be used, no new serialization formats would need to be implemented--and developers would also use a safter method of system and process inspection and manipulation.
Hi all, On 12/28/22 20:24, Shawn Webb wrote: > On Wed, Dec 28, 2022 at 01:02:35PM -0500, Demi Marie Obenour wrote: >> On Wed, Dec 28, 2022 at 12:25:17PM -0500, Shawn Webb wrote: >>> On Wed, Dec 28, 2022 at 11:47:25AM -0500, Demi Marie Obenour wrote: >>>> On Wed, Dec 28, 2022 at 10:24:58AM -0500, Shawn Webb wrote: >>>>> On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote: >>>>>> Dominique Martinet writes: >>>>>> >>>>>>> But, really, I just don't see how this can practically be said to be parsable... >>>>>> >>>>>> In its current form it never will be. The solution is to place >>>>>> this variable-length field last. Then you can "cut -d ' ' -f 51-" >>>>>> to get the command+args part (assuming I counted all those fields >>>>>> correctly ...) >>>>>> >>>>>> Of course, this breaks backwards compatability. >>>>> >>>>> It would also break forwards compatibility in the case new fields >>>>> needed to be added. >>>>> >>>>> The only solution would be a libxo-style feature wherein a >>>>> machine-parseable format is exposed by virtue of a file extension. >>>>> >>>>> Examples: >>>>> >>>>> 1. /proc/pid/stats.json >>>>> 2. /proc/pid/stats.xml >>>>> 3. /proc/pid/stats.yaml_shouldnt_be_a_thing >>>> >>>> A binary format would be even better. No risk of ambiguity. >>> >>> I think the argument I'm trying to make is to be flexible in >>> implementation, allowing for future needs and wants--that is "future >>> proofing". >> >> Linux should not have an XML, JSON, or YAML serializer. Linux already >> does way too much; let’s not add one more thing to the list. > > Somewhat agreed. I think formats like JSON provide a good balance > between machine parseable and human readable. > a > As I described earlier, though, when it comes to concepts like procfs > and sysfs, I have a bias towards abandoning them in favor of sysctl. > If sysctl nodes were to be used, no new serialization formats would > need to be implemented--and developers would also use a safter method > of system and process inspection and manipulation. > Just a comment as someone who is reading without much understanding of the contents of /prod/pid/stat: If organization of the data in the file is a problem, and the format starts to matter, maybe it's a hint that there are too many different contents, and could be split into different files, each one with its own formatting rules. I'll suggest that maybe a set of files, maybe contained in a common directory stats.d, is what you're looking for? Binary format is not of my preference, since most user-space tools work with the standard interface, that is, text. Cheers, Alex -- <http://www.alejandro-colomar.es/>
On Wed, Dec 28, 2022 at 01:02:35PM -0500, Demi Marie Obenour wrote: > > I think the argument I'm trying to make is to be flexible in > > implementation, allowing for future needs and wants--that is "future > > proofing". > > Linux should not have an XML, JSON, or YAML serializer. Linux already > does way too much; let’s not add one more thing to the list. There's always Protobufs[1]! :-) And all of these are better than ASN.1, for which Google already has a limited parser (for x.509 certificates). :-) :-) :-) - Ted
On Wed, Dec 28, 2022 at 05:14:42PM -0500, Theodore Ts'o wrote: > On Wed, Dec 28, 2022 at 01:02:35PM -0500, Demi Marie Obenour wrote: > > > I think the argument I'm trying to make is to be flexible in > > > implementation, allowing for future needs and wants--that is "future > > > proofing". > > > > Linux should not have an XML, JSON, or YAML serializer. Linux already > > does way too much; let’s not add one more thing to the list. > > There's always Protobufs[1]! :-) And all of these are better than > ASN.1, for which Google already has a limited parser (for x.509 > certificates). :-) :-) :-) > > - Ted Cap’n Proto is better than Protobufs :-)
* Tavis Ormandy <taviso@gmail.com>, 2022-12-28 01:50: >>>But, really, I just don't see how this can practically be said to be >>>parsable... >> >>In its current form it never will be. The solution is to place this >>variable-length field last. Then you can "cut -d ' ' -f 51-" to get >>the command+args part (assuming I counted all those fields correctly >>...) >> >>Of course, this breaks backwards compatability. > >I think that cut command doesn't handle newlines, Indeed. >There already is 'ps -q $$ -o >comm=' FWIW, "ps ... -o comm=" doesn't just print the raw comm value: it replaces non-printable chars with punctuation characters, and it may append " <defunct>" if the process is a zombie. The easiest way to get unmangled comm is to read it from /proc/$PID/comm, then strip the trailing newline. (But I suspect most /proc/*/stat parsers don't care about the comm field at all; they just want to skip over it to get their hands on the following fields.)
From: Shawn Webb > Sent: 28 December 2022 15:25 > > On Tue, Dec 27, 2022 at 04:44:49PM -0800, Lyndon Nerenberg (VE7TFX/VE6BBM) wrote: > > Dominique Martinet writes: > > > > > But, really, I just don't see how this can practically be said to be parsable... > > > > In its current form it never will be. The solution is to place > > this variable-length field last. Then you can "cut -d ' ' -f 51-" > > to get the command+args part (assuming I counted all those fields > > correctly ...) > > > > Of course, this breaks backwards compatability. > > It would also break forwards compatibility in the case new fields > needed to be added. > > The only solution would be a libxo-style feature wherein a > machine-parseable format is exposed by virtue of a file extension. > > Examples: > > 1. /proc/pid/stats.json > 2. /proc/pid/stats.xml > 3. /proc/pid/stats.yaml_shouldnt_be_a_thing None of those are of any real use if you are trying to parse the data in something like a shell script. Multiple lines formatted as "tag:value" are probably the best bet. Provided something sane is done with embedded \n (and maybe \r). David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Hi all, Let's wind this oss-security thread down as it relates to brainstorming and commenting on totally new designs - no more of that, please. Many things were said, but realistically the interface isn't _that_ broken (this can be parsed correctly, and procps-ng manages to) and is (hopefully) not going to change much (in my opinion, and I know I'm not alone in this, most of the proposals would make things worse overall). Somewhat realistically, one possible change is replacing the most risky characters, such as braces and anything <= ASCII 32, perhaps with '?' to match what procps-ng is doing. Perhaps do this either on all updates of "comm" or in all places where "comm" is reported to userspace (including procfs and kernel messages, by calling a common function). "comm" isn't the full process name anyway - it's often truncated - so it can reasonably be made safer in other ways as well. As an option, the replacing of whitespace (ASCII 32) and braces could be limited to the "stat" file, but the control characters are (even more) problematic with other interfaces where "comm" is exposed, so replacing them should probably be global. Happy New Year! Alexander
--- a/man5/proc.5 +++ b/man5/proc.5 @@ -2092,6 +2092,11 @@ Strings longer than .B TASK_COMM_LEN (16) characters (including the terminating null byte) are silently truncated. This is visible whether or not the executable is swapped out. + +Note that \fIcomm\fP can contain space and closing parenthesis characters. +Parsing /proc/${pid}/stat with split() or equivalent, or scanf(3) isn't +reliable. The correct way is to locate closing parenthesis with strrchr(')') +from the end of the buffer and parse integers from there. .TP (3) \fIstate\fP \ %c One of the following characters, indicating process state: