[0/4] selftests/nolibc: add user-space 'efault' handler

Message ID

cover.1685443199.git.falcon@tinylab.org

Headers

Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
From: Zhangjin Wu <falcon@tinylab.org>
To: w@1wt.eu
Cc: falcon@tinylab.org, linux-kernel@vger.kernel.org,
        linux-kselftest@vger.kernel.org, linux-riscv@lists.infradead.org,
        thomas@t-8ch.de
Subject: [PATCH 0/4] selftests/nolibc: add user-space 'efault' handler
Date: Tue, 30 May 2023 18:47:38 +0800
Message-Id: <cover.1685443199.git.falcon@tinylab.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Feedback-ID: bizesmtp:tinylab.org:qybglogicsvrsz:qybglogicsvrsz3a-3
Precedence: bulk

Series

selftests/nolibc: add user-space 'efault' handler

Message

Zhangjin Wu May 30, 2023, 10:47 a.m. UTC

Hi Willy, Thomas,

This series is not really meant for merging; it is only demo code to
test whether it is possible to recover and continue with the next test
after a bad pointer access in user-space [1].

Besides, a new 'run' command is added to the 'NOLIBC_TEST' environment
variable (or the arguments) to control the number of iterations. It may
be used to test for reentrancy issues, but no failures have been found
so far ;-)

With glibc, it works as follows:

    $ ./nolibc-test run:2,syscall:28-30,stdlib:1
    Running iteration(s): 2

    Current iteration: 1

    Running test 'syscall', from 28 to 30
    28 dup3_m1 = -1 EBADF                                            [OK]
    29 efault_handler ! 11 SIGSEGV                                   [OK]
    30 execve_root = -1 EACCES                                       [OK]
    Errors during this test: 0

    Running test 'stdlib'
    1 getenv_blah = <(null)>                                         [OK]
    Errors during this test: 0

    Total number of errors in the 1 iteration(s): 0

    Current iteration: 2

    Running test 'syscall'
    28 dup3_m1 = -1 EBADF                                            [OK]
    29 efault_handler ! 11 SIGSEGV                                   [OK]
    30 execve_root = -1 EACCES                                       [OK]
    Errors during this test: 0

    Running test 'stdlib'
    1 getenv_blah = <(null)>                                         [OK]
    Errors during this test: 0

    Total number of errors in the 2 iteration(s): 0

With nolibc, the efault_handler test is skipped (run:2,syscall:28-30,stdlib:10):

    Running iteration(s): 2

    Current iteration: 1

    Running test 'syscall', from 28 to 30
    28 dup3_m1 = -1 EBADF                                            [OK]
    29 efault_handler                                               [SKIPPED]
    30 execve_root = -1 EACCES                                       [OK]
    Errors during this test: 0

    Running test 'stdlib', from 10 to 10
    10 strrchr_foobar_o = <obar>                                     [OK]
    Errors during this test: 0

    Total number of errors in the 1 iteration(s): 0

    Current iteration: 2

    Running test 'syscall', from 28 to 30
    28 dup3_m1 = -1 EBADF                                            [OK]
    29 efault_handler                                               [SKIPPED]
    30 execve_root = -1 EACCES                                       [OK]
    Errors during this test: 0

    Running test 'stdlib', from 10 to 10
    10 strrchr_foobar_o = <obar>                                     [OK]
    Errors during this test: 0

    Total number of errors in the 2 iteration(s): 0

Best regards,
Zhangjin
---

[1]: https://lore.kernel.org/linux-riscv/20230529113143.GB2762@1wt.eu/ 

Zhangjin Wu (4):
  selftests/nolibc: allow rerun with the same settings
  selftests/nolibc: add rerun support
  selftests/nolibc: add user space efault handler
  selftests/nolibc: add user-space efault restore test case

 tools/testing/selftests/nolibc/nolibc-test.c | 247 +++++++++++++++++--
 1 file changed, 221 insertions(+), 26 deletions(-)

Comments

Willy Tarreau June 4, 2023, 11:05 a.m. UTC | #1

Hi Zhangjin,

On Tue, May 30, 2023 at 06:47:38PM +0800, Zhangjin Wu wrote:
> Hi, Willy, Thomas
> 
> This is not really for merge, but only let it work as a demo code to
> test whether it is possible to restore the next test when there is a bad
> pointer access in user-space [1].
> 
> Besides, a new 'run' command is added to 'NOLIBC_TEST' environment
> variable or arguments to control the running iterations, this may be
> used to test the reentrancy issues, but no failures found currently ;-)

Since the tests we're running are essentially API tests, I'm having
a hard time seeing in which case it can be useful to repeat the tests.
I'm not necessarily against doing it, I'm used to repeating tests for
example in anything sensitive to timing or race conditions, it's just
that here I'm not seeing the benefit. And the fact you found no failure
is rather satisfying because the opposite would have surprised me.

Regarding the efault handler, I don't think it's a good idea until we
have signal+longjmp support in nolibc. Because running different tests
with different libcs kind of defeats the purpose of the test in the
first place. The reason why I wanted nolibc-test to be portable to at
least one other libc is to help the developer figure if a failure is in
the nolibc syscall they're implementing or in the test itself. Here if
we start to say that some parts cannot be tested similarly, the benefit
disappears.

I mentioned previously that I'm not particularly impatient to work on
signals and longjmp. But in parallel I understand how this can make the
life of some developers easier and even allow to widen the spectrum of
some tests. Thus, maybe in the end it could be beneficial to make progress
on this front and support these. We should make sure that this doesn't
inflate the code base however. I guess I'd be fine with ignoring libc-
based restarts on EINTR, alt stacks and so on and keeping this minimal
(i.e. catch a segfault/bus error/sigill in a test program, or a Ctrl-C
in a tiny shell).

Just let us know if you think that's something you could be interested
in exploring. There might be differences between architectures, I have
not checked.

Thanks,
Willy

Thomas Weißschuh June 4, 2023, 7:07 p.m. UTC | #2

On 2023-06-04 13:05:18+0200, Willy Tarreau wrote:
> Hi Zhangjin,
> 
> On Tue, May 30, 2023 at 06:47:38PM +0800, Zhangjin Wu wrote:
> > Hi, Willy, Thomas
> > 
> > This is not really for merge, but only let it work as a demo code to
> > test whether it is possible to restore the next test when there is a bad
> > pointer access in user-space [1].
> > 
> > Besides, a new 'run' command is added to 'NOLIBC_TEST' environment
> > variable or arguments to control the running iterations, this may be
> > used to test the reentrancy issues, but no failures found currently ;-)
> 
> Since the tests we're running are essentially API tests, I'm having
> a hard time seeing in which case it can be useful to repeat the tests.
> I'm not necessarily against doing it, I'm used to repeating tests for
> example in anything sensitive to timing or race conditions, it's just
> that here I'm not seeing the benefit. And the fact you found no failure
> is rather satisfying because the opposite would have surprised me.
> 
> Regarding the efault handler, I don't think it's a good idea until we
> have signal+longjmp support in nolibc. Because running different tests
> with different libcs kind of defeats the purpose of the test in the
> first place. The reason why I wanted nolibc-test to be portable to at
> least one other libc is to help the developer figure if a failure is in
> the nolibc syscall they're implementing or in the test itself. Here if
> we start to say that some parts cannot be tested similarly, the benefit
> disappears.
> 
> I mentioned previously that I'm not particularly impatient to work on
> signals and longjmp. But in parallel I understand how this can make the
> life of some developers easier and even allow to widen the spectrum of
> some tests. Thus, maybe in the end it could be beneficial to make progress
> on this front and support these. We should make sure that this doesn't
> inflate the code base however. I guess I'd be fine with ignoring libc-
> based restarts on EINTR, alt stacks and so on and keeping this minimal
> (i.e. catch a segfault/bus error/sigill in a test program, or a Ctrl-C
> in a tiny shell).
> 
> Just let us know if you think that's something you could be interested
> in exploring. There might be differences between architectures, I have
> not checked.

If the goal is to handle hard errors like segfaults more gracefully,
would it not be easier to run each testcase in a subprocess?

Then we can just check if the child exited successfully.

It should also be completely architecture agnostic.

Thomas

Willy Tarreau June 4, 2023, 7:14 p.m. UTC | #3

On Sun, Jun 04, 2023 at 09:07:25PM +0200, Thomas Weißschuh wrote:
> On 2023-06-04 13:05:18+0200, Willy Tarreau wrote:
> > Hi Zhangjin,
> > 
> > On Tue, May 30, 2023 at 06:47:38PM +0800, Zhangjin Wu wrote:
> > > Hi, Willy, Thomas
> > > 
> > > This is not really for merge, but only let it work as a demo code to
> > > test whether it is possible to restore the next test when there is a bad
> > > pointer access in user-space [1].
> > > 
> > > Besides, a new 'run' command is added to 'NOLIBC_TEST' environment
> > > variable or arguments to control the running iterations, this may be
> > > used to test the reentrancy issues, but no failures found currently ;-)
> > 
> > Since the tests we're running are essentially API tests, I'm having
> > a hard time seeing in which case it can be useful to repeat the tests.
> > I'm not necessarily against doing it, I'm used to repeating tests for
> > example in anything sensitive to timing or race conditions, it's just
> > that here I'm not seeing the benefit. And the fact you found no failure
> > is rather satisfying because the opposite would have surprised me.
> > 
> > Regarding the efault handler, I don't think it's a good idea until we
> > have signal+longjmp support in nolibc. Because running different tests
> > with different libcs kind of defeats the purpose of the test in the
> > first place. The reason why I wanted nolibc-test to be portable to at
> > least one other libc is to help the developer figure if a failure is in
> > the nolibc syscall they're implementing or in the test itself. Here if
> > we start to say that some parts cannot be tested similarly, the benefit
> > disappears.
> > 
> > I mentioned previously that I'm not particularly impatient to work on
> > signals and longjmp. But in parallel I understand how this can make the
> > life of some developers easier and even allow to widen the spectrum of
> > some tests. Thus, maybe in the end it could be beneficial to make progress
> > on this front and support these. We should make sure that this doesn't
> > inflate the code base however. I guess I'd be fine with ignoring libc-
> > based restarts on EINTR, alt stacks and so on and keeping this minimal
> > (i.e. catch a segfault/bus error/sigill in a test program, or a Ctrl-C
> > in a tiny shell).
> > 
> > Just let us know if you think that's something you could be interested
> > in exploring. There might be differences between architectures, I have
> > not checked.
> 
> If the goal is to handle hard errors like segfaults more gracefully,
> would it not be easier to run each testcase in a subprocess?
> 
> Then we can just check if the child exited successfully.
> 
> It should also be completely architecture agnostic.

Could be, indeed. However, it would complicate strace debugging a bit,
but yeah, that might be something to think about.

Willy

Zhangjin Wu June 6, 2023, 4:04 a.m. UTC | #4

> On 2023-06-04 13:05:18+0200, Willy Tarreau wrote:
> > Hi Zhangjin,
> > 
> > On Tue, May 30, 2023 at 06:47:38PM +0800, Zhangjin Wu wrote:
> > > Hi, Willy, Thomas
> > > 
> > 
> > Just let us know if you think that's something you could be interested
> > in exploring. There might be differences between architectures, I have
> > not checked.
> 
> If the goal is to handle hard errors like segfaults more gracefully,
> would it not be easier to run each testcase in a subprocess?
> 
> Then we can just check if the child exited successfully.
>

Yeah, it is easier. It may be possible to simply pass the test case to
something like test_fork() and let the child run it there.

I will give it a try, thanks very much.

> It should also be completely architecture agnostic.

It is, since we can reuse the test_fork() stuff.

Best regards,
Zhangjin

> 
> Thomas