Message ID | 20230717103152.202078-7-ryan.roberts@arm.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:c923:0:b0:3e4:2afc:c1 with SMTP id j3csp1026356vqt; Mon, 17 Jul 2023 03:39:56 -0700 (PDT) X-Google-Smtp-Source: APBJJlG5b/MvcbxW4zO82MWiq1QIwgSswQxbIg0Y4U8HwpOIDXntrXKMkjRp2OdSv+co1+amcOa1 X-Received: by 2002:a17:906:10cd:b0:98d:e696:de4f with SMTP id v13-20020a17090610cd00b0098de696de4fmr11595194ejv.26.1689590396007; Mon, 17 Jul 2023 03:39:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689590395; cv=none; d=google.com; s=arc-20160816; b=ALaf1vcBRBzzpqTuIQYmvrMC+ByL7+dHGsdm7av6v6XAk4O6ARqFO7z2gWVcmD9sf0 thwlJGqiW+L5cBBiLulXt7GQ64V1BdTziiv9a6H2DlIQl5LEXgFjtAd8luK1EJwv8t3L EMnzUbV8r7KQeiFy/fCEv6Lhfw4p3A2Vep+Ci/L1cxshq6jD6Gqk6HllIT28GqC8zlQy ylOzo1CmcS06GaYc3RSzCaaPyqnBnjyzx709+xAeIx2X3egrhDX3f6I+cssHIhB8ODpo iu6stbt9xFF79ADNATeCSmy0IMN60ue4+B6qFYE5OOIuR6l20PC8vGiTp74dDM4NOaTf vEJA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=rHtD/pnAHEzhlHxmKN5MIBCps4aYfm1NZFwHzfIO8rs=; fh=OUPfCZjK3Gm4uAxJmzTYkNGoALdOI9ZG4q/pkSTXiCE=; b=OGT9bt/OAqvJ1LLVv4gZy6mmDdosZFMq3z3fCqiPSF4qH1pExgtgYuGi+GL99YNMfR GNUYduNZdqE8pnXzZCX9uWL0XLRQXIue1pD4uVgy+QqLLmuFSpkt3Wwrhyu6QrtOoCtw pGaQhP14B6OYHe6EEfJIT7y7RIgpEme8UIHX7Ipip/Le61kzjT9JHWNcnwzSjswNJiuB vTTGwsFgAWgkd1RDxHQS19ychW+YlpversurYfW72ZKlBnv4Jiw548jx9FB5l/D5zW1+ u9kZTUfsgQ1ZQ51qwhI7EmJvlpuIxJw42+PW5HQ85ehrnbaPeBjOa1zoR+OKzh9JGgPl +Fmw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hb4-20020a170906b88400b0096f81ae0ac9si13377040ejb.34.2023.07.17.03.39.32; Mon, 17 Jul 2023 03:39:55 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230338AbjGQKc3 (ORCPT <rfc822;hadasmailinglist@gmail.com> + 99 others); Mon, 17 Jul 2023 06:32:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230213AbjGQKcY (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 17 Jul 2023 06:32:24 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 87CB110DA; Mon, 17 Jul 2023 03:32:18 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5DE7F11FB; Mon, 17 Jul 2023 03:33:01 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 639333F67D; Mon, 17 Jul 2023 03:32:16 -0700 (PDT) From: Ryan Roberts <ryan.roberts@arm.com> To: "Andrew Morton" <akpm@linux-foundation.org>, "Shuah Khan" <shuah@kernel.org>, =?utf-8?b?SsOpcsO0bWUgR2xpc3Nl?= <jglisse@redhat.com>, "David Hildenbrand" <david@redhat.com>, "Mark Brown" <broonie@kernel.org>, "John Hubbard" <jhubbard@nvidia.com>, "Florent Revest" <revest@chromium.org>, Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: [PATCH v2 6/8] selftests/mm: Make migration test robust to failure Date: Mon, 17 Jul 2023 11:31:50 +0100 Message-Id: <20230717103152.202078-7-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230717103152.202078-1-ryan.roberts@arm.com> References: <20230717103152.202078-1-ryan.roberts@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771663939060580155 X-GMAIL-MSGID: 1771663939060580155 |
Series |
selftests/mm fixes for arm64
|
|
Commit Message
Ryan Roberts
July 17, 2023, 10:31 a.m. UTC
The `migration` test currently has a number of robustness problems that
cause it to hang and leak resources.
Timeout: There are 3 tests, which each previously ran for 60 seconds.
However, the timeout in mm/settings for a single test binary was set to
45 seconds. So when run using run_kselftest.sh, the top level timeout
would trigger before the test binary was finished. Solve this by meeting
in the middle; each of the 3 tests now runs for 20 seconds (for a total
of 60), and the top level timeout is set to 90 seconds.
Leaking child processes: the `shared_anon` test fork()s some children
but then an ASSERT() fires before the test kills those children. The
assert causes immediate exit of the parent and leaking of the children.
Furthermore, if run using the run_kselftest.sh wrapper, the wrapper
would get stuck waiting for those children to exit, which never happens.
Solve this by deferring any asserts until after the children are killed.
The same pattern is used for the threaded tests for uniformity.
With these changes, the test binary now runs to completion on arm64,
with 2 tests passing and the `shared_anon` test failing.
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
tools/testing/selftests/mm/migration.c | 14 ++++++++++----
tools/testing/selftests/mm/settings | 2 +-
2 files changed, 11 insertions(+), 5 deletions(-)
Comments
On 17.07.23 12:31, Ryan Roberts wrote: > The `migration` test currently has a number of robustness problems that > cause it to hang and leak resources. > > Timeout: There are 3 tests, which each previously ran for 60 seconds. > However, the timeout in mm/settings for a single test binary was set to > 45 seconds. So when run using run_kselftest.sh, the top level timeout > would trigger before the test binary was finished. Solve this by meeting > in the middle; each of the 3 tests now runs for 20 seconds (for a total > of 60), and the top level timeout is set to 90 seconds. > > Leaking child processes: the `shared_anon` test fork()s some children > but then an ASSERT() fires before the test kills those children. The > assert causes immediate exit of the parent and leaking of the children. > Furthermore, if run using the run_kselftest.sh wrapper, the wrapper > would get stuck waiting for those children to exit, which never happens. > Solve this by deferring any asserts until after the children are killed. > The same pattern is used for the threaded tests for uniformity. > > With these changes, the test binary now runs to completion on arm64, > with 2 tests passing and the `shared_anon` test failing. > > Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> > --- > tools/testing/selftests/mm/migration.c | 14 ++++++++++---- > tools/testing/selftests/mm/settings | 2 +- > 2 files changed, 11 insertions(+), 5 deletions(-) > > diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c > index 379581567f27..189d7d9070e8 100644 > --- a/tools/testing/selftests/mm/migration.c > +++ b/tools/testing/selftests/mm/migration.c > @@ -15,7 +15,7 @@ > #include <time.h> > > #define TWOMEG (2<<20) > -#define RUNTIME (60) > +#define RUNTIME (20) > > #define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1))) > > @@ -118,6 +118,7 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) > { > uint64_t *ptr; > int i; > + int ret; > > if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) > SKIP(return, "Not enough threads or NUMA nodes available"); > @@ -131,9 +132,10 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) > if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) > perror("Couldn't create thread"); > > - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); > + ret = migrate(ptr, self->n1, self->n2); > for (i = 0; i < self->nthreads - 1; i++) > ASSERT_EQ(pthread_cancel(self->threads[i]), 0); > + ASSERT_EQ(ret, 0); Why is that required? This does not involve fork. > } > > /* > @@ -144,6 +146,7 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) > pid_t pid; > uint64_t *ptr; > int i; > + int ret; > > if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) > SKIP(return, "Not enough threads or NUMA nodes available"); > @@ -161,9 +164,10 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) > self->pids[i] = pid; > } > > - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); > + ret = migrate(ptr, self->n1, self->n2); > for (i = 0; i < self->nthreads - 1; i++) > ASSERT_EQ(kill(self->pids[i], SIGTERM), 0); > + ASSERT_EQ(ret, 0); Might be cleaner to also: diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c index 379581567f27..b3f12b9847ec 100644 --- a/tools/testing/selftests/mm/migration.c +++ b/tools/testing/selftests/mm/migration.c @@ -11,6 +11,7 @@ #include <numaif.h> #include <sys/mman.h> #include <sys/types.h> +#include <sys/prctl.h> #include <signal.h> #include <time.h> @@ -155,10 +156,12 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) memset(ptr, 0xde, TWOMEG); for (i = 0; i < self->nthreads - 1; i++) { pid = fork(); - if (!pid) + if (!pid) { + prctl(PR_SET_PDEATHSIG, SIGHUP); access_mem(ptr); - else + } else { self->pids[i] = pid; + } } ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); Then, whenever the parent dies, all child processes get zapped.
On 17/07/2023 18:40, David Hildenbrand wrote: > On 17.07.23 12:31, Ryan Roberts wrote: >> The `migration` test currently has a number of robustness problems that >> cause it to hang and leak resources. >> >> Timeout: There are 3 tests, which each previously ran for 60 seconds. >> However, the timeout in mm/settings for a single test binary was set to >> 45 seconds. So when run using run_kselftest.sh, the top level timeout >> would trigger before the test binary was finished. Solve this by meeting >> in the middle; each of the 3 tests now runs for 20 seconds (for a total >> of 60), and the top level timeout is set to 90 seconds. >> >> Leaking child processes: the `shared_anon` test fork()s some children >> but then an ASSERT() fires before the test kills those children. The >> assert causes immediate exit of the parent and leaking of the children. >> Furthermore, if run using the run_kselftest.sh wrapper, the wrapper >> would get stuck waiting for those children to exit, which never happens. >> Solve this by deferring any asserts until after the children are killed. >> The same pattern is used for the threaded tests for uniformity. >> >> With these changes, the test binary now runs to completion on arm64, >> with 2 tests passing and the `shared_anon` test failing. >> >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >> --- >> tools/testing/selftests/mm/migration.c | 14 ++++++++++---- >> tools/testing/selftests/mm/settings | 2 +- >> 2 files changed, 11 insertions(+), 5 deletions(-) >> >> diff --git a/tools/testing/selftests/mm/migration.c >> b/tools/testing/selftests/mm/migration.c >> index 379581567f27..189d7d9070e8 100644 >> --- a/tools/testing/selftests/mm/migration.c >> +++ b/tools/testing/selftests/mm/migration.c >> @@ -15,7 +15,7 @@ >> #include <time.h> >> #define TWOMEG (2<<20) >> -#define RUNTIME (60) >> +#define RUNTIME (20) >> #define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1))) >> @@ -118,6 +118,7 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) >> { >> uint64_t *ptr; >> int i; >> + int ret; >> if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) >> SKIP(return, "Not enough threads or NUMA nodes available"); >> @@ -131,9 +132,10 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) >> if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) >> perror("Couldn't create thread"); >> - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); >> + ret = migrate(ptr, self->n1, self->n2); >> for (i = 0; i < self->nthreads - 1; i++) >> ASSERT_EQ(pthread_cancel(self->threads[i]), 0); >> + ASSERT_EQ(ret, 0); > > Why is that required? This does not involve fork. It's not required. I was just trying to keep everything aligned to the same pattern. > >> } >> /* >> @@ -144,6 +146,7 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) >> pid_t pid; >> uint64_t *ptr; >> int i; >> + int ret; >> if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) >> SKIP(return, "Not enough threads or NUMA nodes available"); >> @@ -161,9 +164,10 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) >> self->pids[i] = pid; >> } >> - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); >> + ret = migrate(ptr, self->n1, self->n2); >> for (i = 0; i < self->nthreads - 1; i++) >> ASSERT_EQ(kill(self->pids[i], SIGTERM), 0); >> + ASSERT_EQ(ret, 0); > > > Might be cleaner to also: Or instead of? I agree this is neater, so will undo the moving of the ASSERT() and rely on this prctl. > > diff --git a/tools/testing/selftests/mm/migration.c > b/tools/testing/selftests/mm/migration.c > index 379581567f27..b3f12b9847ec 100644 > --- a/tools/testing/selftests/mm/migration.c > +++ b/tools/testing/selftests/mm/migration.c > @@ -11,6 +11,7 @@ > #include <numaif.h> > #include <sys/mman.h> > #include <sys/types.h> > +#include <sys/prctl.h> > #include <signal.h> > #include <time.h> > > @@ -155,10 +156,12 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) > memset(ptr, 0xde, TWOMEG); > for (i = 0; i < self->nthreads - 1; i++) { > pid = fork(); > - if (!pid) > + if (!pid) { > + prctl(PR_SET_PDEATHSIG, SIGHUP); > access_mem(ptr); > - else > + } else { > self->pids[i] = pid; > + } > } > > ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); > > > Then, whenever the parent dies, all child processes get zapped. > >
On 18.07.23 12:49, Ryan Roberts wrote: > On 17/07/2023 18:40, David Hildenbrand wrote: >> On 17.07.23 12:31, Ryan Roberts wrote: >>> The `migration` test currently has a number of robustness problems that >>> cause it to hang and leak resources. >>> >>> Timeout: There are 3 tests, which each previously ran for 60 seconds. >>> However, the timeout in mm/settings for a single test binary was set to >>> 45 seconds. So when run using run_kselftest.sh, the top level timeout >>> would trigger before the test binary was finished. Solve this by meeting >>> in the middle; each of the 3 tests now runs for 20 seconds (for a total >>> of 60), and the top level timeout is set to 90 seconds. >>> >>> Leaking child processes: the `shared_anon` test fork()s some children >>> but then an ASSERT() fires before the test kills those children. The >>> assert causes immediate exit of the parent and leaking of the children. >>> Furthermore, if run using the run_kselftest.sh wrapper, the wrapper >>> would get stuck waiting for those children to exit, which never happens. >>> Solve this by deferring any asserts until after the children are killed. >>> The same pattern is used for the threaded tests for uniformity. >>> >>> With these changes, the test binary now runs to completion on arm64, >>> with 2 tests passing and the `shared_anon` test failing. >>> >>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>> --- >>> tools/testing/selftests/mm/migration.c | 14 ++++++++++---- >>> tools/testing/selftests/mm/settings | 2 +- >>> 2 files changed, 11 insertions(+), 5 deletions(-) >>> >>> diff --git a/tools/testing/selftests/mm/migration.c >>> b/tools/testing/selftests/mm/migration.c >>> index 379581567f27..189d7d9070e8 100644 >>> --- a/tools/testing/selftests/mm/migration.c >>> +++ b/tools/testing/selftests/mm/migration.c >>> @@ -15,7 +15,7 @@ >>> #include <time.h> >>> #define TWOMEG (2<<20) >>> -#define RUNTIME (60) >>> +#define RUNTIME (20) >>> #define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1))) >>> @@ -118,6 +118,7 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) >>> { >>> uint64_t *ptr; >>> int i; >>> + int ret; >>> if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) >>> SKIP(return, "Not enough threads or NUMA nodes available"); >>> @@ -131,9 +132,10 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) >>> if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) >>> perror("Couldn't create thread"); >>> - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); >>> + ret = migrate(ptr, self->n1, self->n2); >>> for (i = 0; i < self->nthreads - 1; i++) >>> ASSERT_EQ(pthread_cancel(self->threads[i]), 0); >>> + ASSERT_EQ(ret, 0); >> >> Why is that required? This does not involve fork. > > It's not required. I was just trying to keep everything aligned to the same pattern. > >> >>> } >>> /* >>> @@ -144,6 +146,7 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) >>> pid_t pid; >>> uint64_t *ptr; >>> int i; >>> + int ret; >>> if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) >>> SKIP(return, "Not enough threads or NUMA nodes available"); >>> @@ -161,9 +164,10 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) >>> self->pids[i] = pid; >>> } >>> - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); >>> + ret = migrate(ptr, self->n1, self->n2); >>> for (i = 0; i < self->nthreads - 1; i++) >>> ASSERT_EQ(kill(self->pids[i], SIGTERM), 0); >>> + ASSERT_EQ(ret, 0); >> >> >> Might be cleaner to also: > > Or instead of? I agree this is neater, so will undo the moving of the ASSERT() > and rely on this prctl. I was thinking about possible races when our parent process already quits before our child managed to set the prctl. prctl() won't do anything in that case, hmmmm. But similarly, existing code might already trigger the migrate() + kill before the child processes even started to access_mem(). Racy :)
On 18.07.23 13:23, David Hildenbrand wrote: > On 18.07.23 12:49, Ryan Roberts wrote: >> On 17/07/2023 18:40, David Hildenbrand wrote: >>> On 17.07.23 12:31, Ryan Roberts wrote: >>>> The `migration` test currently has a number of robustness problems that >>>> cause it to hang and leak resources. >>>> >>>> Timeout: There are 3 tests, which each previously ran for 60 seconds. >>>> However, the timeout in mm/settings for a single test binary was set to >>>> 45 seconds. So when run using run_kselftest.sh, the top level timeout >>>> would trigger before the test binary was finished. Solve this by meeting >>>> in the middle; each of the 3 tests now runs for 20 seconds (for a total >>>> of 60), and the top level timeout is set to 90 seconds. >>>> >>>> Leaking child processes: the `shared_anon` test fork()s some children >>>> but then an ASSERT() fires before the test kills those children. The >>>> assert causes immediate exit of the parent and leaking of the children. >>>> Furthermore, if run using the run_kselftest.sh wrapper, the wrapper >>>> would get stuck waiting for those children to exit, which never happens. >>>> Solve this by deferring any asserts until after the children are killed. >>>> The same pattern is used for the threaded tests for uniformity. >>>> >>>> With these changes, the test binary now runs to completion on arm64, >>>> with 2 tests passing and the `shared_anon` test failing. >>>> >>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>>> --- >>>> tools/testing/selftests/mm/migration.c | 14 ++++++++++---- >>>> tools/testing/selftests/mm/settings | 2 +- >>>> 2 files changed, 11 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/tools/testing/selftests/mm/migration.c >>>> b/tools/testing/selftests/mm/migration.c >>>> index 379581567f27..189d7d9070e8 100644 >>>> --- a/tools/testing/selftests/mm/migration.c >>>> +++ b/tools/testing/selftests/mm/migration.c >>>> @@ -15,7 +15,7 @@ >>>> #include <time.h> >>>> #define TWOMEG (2<<20) >>>> -#define RUNTIME (60) >>>> +#define RUNTIME (20) >>>> #define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1))) >>>> @@ -118,6 +118,7 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) >>>> { >>>> uint64_t *ptr; >>>> int i; >>>> + int ret; >>>> if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) >>>> SKIP(return, "Not enough threads or NUMA nodes available"); >>>> @@ -131,9 +132,10 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) >>>> if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) >>>> perror("Couldn't create thread"); >>>> - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); >>>> + ret = migrate(ptr, self->n1, self->n2); >>>> for (i = 0; i < self->nthreads - 1; i++) >>>> ASSERT_EQ(pthread_cancel(self->threads[i]), 0); >>>> + ASSERT_EQ(ret, 0); >>> >>> Why is that required? This does not involve fork. >> >> It's not required. I was just trying to keep everything aligned to the same pattern. >> >>> >>>> } >>>> /* >>>> @@ -144,6 +146,7 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) >>>> pid_t pid; >>>> uint64_t *ptr; >>>> int i; >>>> + int ret; >>>> if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) >>>> SKIP(return, "Not enough threads or NUMA nodes available"); >>>> @@ -161,9 +164,10 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) >>>> self->pids[i] = pid; >>>> } >>>> - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); >>>> + ret = migrate(ptr, self->n1, self->n2); >>>> for (i = 0; i < self->nthreads - 1; i++) >>>> ASSERT_EQ(kill(self->pids[i], SIGTERM), 0); >>>> + ASSERT_EQ(ret, 0); >>> >>> >>> Might be cleaner to also: >> >> Or instead of? I agree this is neater, so will undo the moving of the ASSERT() >> and rely on this prctl. > > I was thinking about possible races when our parent process already > quits before our child managed to set the prctl. prctl() won't do > anything in that case, hmmmm. > > But similarly, existing code might already trigger the migrate() + kill > before the child processes even started to access_mem(). > > Racy :) > Maybe what would work, is checking after the prctl() in the child if the parent is already gone.
On 18/07/2023 12:24, David Hildenbrand wrote: > On 18.07.23 13:23, David Hildenbrand wrote: >> On 18.07.23 12:49, Ryan Roberts wrote: >>> On 17/07/2023 18:40, David Hildenbrand wrote: >>>> On 17.07.23 12:31, Ryan Roberts wrote: >>>>> The `migration` test currently has a number of robustness problems that >>>>> cause it to hang and leak resources. >>>>> >>>>> Timeout: There are 3 tests, which each previously ran for 60 seconds. >>>>> However, the timeout in mm/settings for a single test binary was set to >>>>> 45 seconds. So when run using run_kselftest.sh, the top level timeout >>>>> would trigger before the test binary was finished. Solve this by meeting >>>>> in the middle; each of the 3 tests now runs for 20 seconds (for a total >>>>> of 60), and the top level timeout is set to 90 seconds. >>>>> >>>>> Leaking child processes: the `shared_anon` test fork()s some children >>>>> but then an ASSERT() fires before the test kills those children. The >>>>> assert causes immediate exit of the parent and leaking of the children. >>>>> Furthermore, if run using the run_kselftest.sh wrapper, the wrapper >>>>> would get stuck waiting for those children to exit, which never happens. >>>>> Solve this by deferring any asserts until after the children are killed. >>>>> The same pattern is used for the threaded tests for uniformity. >>>>> >>>>> With these changes, the test binary now runs to completion on arm64, >>>>> with 2 tests passing and the `shared_anon` test failing. >>>>> >>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>>>> --- >>>>> tools/testing/selftests/mm/migration.c | 14 ++++++++++---- >>>>> tools/testing/selftests/mm/settings | 2 +- >>>>> 2 files changed, 11 insertions(+), 5 deletions(-) >>>>> >>>>> diff --git a/tools/testing/selftests/mm/migration.c >>>>> b/tools/testing/selftests/mm/migration.c >>>>> index 379581567f27..189d7d9070e8 100644 >>>>> --- a/tools/testing/selftests/mm/migration.c >>>>> +++ b/tools/testing/selftests/mm/migration.c >>>>> @@ -15,7 +15,7 @@ >>>>> #include <time.h> >>>>> #define TWOMEG (2<<20) >>>>> -#define RUNTIME (60) >>>>> +#define RUNTIME (20) >>>>> #define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1))) >>>>> @@ -118,6 +118,7 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) >>>>> { >>>>> uint64_t *ptr; >>>>> int i; >>>>> + int ret; >>>>> if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) >>>>> SKIP(return, "Not enough threads or NUMA nodes available"); >>>>> @@ -131,9 +132,10 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) >>>>> if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) >>>>> perror("Couldn't create thread"); >>>>> - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); >>>>> + ret = migrate(ptr, self->n1, self->n2); >>>>> for (i = 0; i < self->nthreads - 1; i++) >>>>> ASSERT_EQ(pthread_cancel(self->threads[i]), 0); >>>>> + ASSERT_EQ(ret, 0); >>>> >>>> Why is that required? This does not involve fork. >>> >>> It's not required. I was just trying to keep everything aligned to the same >>> pattern. >>> >>>> >>>>> } >>>>> /* >>>>> @@ -144,6 +146,7 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) >>>>> pid_t pid; >>>>> uint64_t *ptr; >>>>> int i; >>>>> + int ret; >>>>> if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) >>>>> SKIP(return, "Not enough threads or NUMA nodes available"); >>>>> @@ -161,9 +164,10 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) >>>>> self->pids[i] = pid; >>>>> } >>>>> - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); >>>>> + ret = migrate(ptr, self->n1, self->n2); >>>>> for (i = 0; i < self->nthreads - 1; i++) >>>>> ASSERT_EQ(kill(self->pids[i], SIGTERM), 0); >>>>> + ASSERT_EQ(ret, 0); >>>> >>>> >>>> Might be cleaner to also: >>> >>> Or instead of? I agree this is neater, so will undo the moving of the ASSERT() >>> and rely on this prctl. >> >> I was thinking about possible races when our parent process already >> quits before our child managed to set the prctl. prctl() won't do >> anything in that case, hmmmm. >> >> But similarly, existing code might already trigger the migrate() + kill >> before the child processes even started to access_mem(). >> >> Racy :) >> > > Maybe what would work, is checking after the prctl() in the child if the parent > is already gone. Like this? if (!pid) { prctl(PR_SET_PDEATHSIG, SIGHUP); /* Parent may have died before prctl so check now. */ if (getppid() == 1) kill(getpid(), SIGHUP); access_mem(ptr); } >
On 18.07.23 14:42, Ryan Roberts wrote: > On 18/07/2023 12:24, David Hildenbrand wrote: >> On 18.07.23 13:23, David Hildenbrand wrote: >>> On 18.07.23 12:49, Ryan Roberts wrote: >>>> On 17/07/2023 18:40, David Hildenbrand wrote: >>>>> On 17.07.23 12:31, Ryan Roberts wrote: >>>>>> The `migration` test currently has a number of robustness problems that >>>>>> cause it to hang and leak resources. >>>>>> >>>>>> Timeout: There are 3 tests, which each previously ran for 60 seconds. >>>>>> However, the timeout in mm/settings for a single test binary was set to >>>>>> 45 seconds. So when run using run_kselftest.sh, the top level timeout >>>>>> would trigger before the test binary was finished. Solve this by meeting >>>>>> in the middle; each of the 3 tests now runs for 20 seconds (for a total >>>>>> of 60), and the top level timeout is set to 90 seconds. >>>>>> >>>>>> Leaking child processes: the `shared_anon` test fork()s some children >>>>>> but then an ASSERT() fires before the test kills those children. The >>>>>> assert causes immediate exit of the parent and leaking of the children. >>>>>> Furthermore, if run using the run_kselftest.sh wrapper, the wrapper >>>>>> would get stuck waiting for those children to exit, which never happens. >>>>>> Solve this by deferring any asserts until after the children are killed. >>>>>> The same pattern is used for the threaded tests for uniformity. >>>>>> >>>>>> With these changes, the test binary now runs to completion on arm64, >>>>>> with 2 tests passing and the `shared_anon` test failing. >>>>>> >>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>>>>> --- >>>>>> tools/testing/selftests/mm/migration.c | 14 ++++++++++---- >>>>>> tools/testing/selftests/mm/settings | 2 +- >>>>>> 2 files changed, 11 insertions(+), 5 deletions(-) >>>>>> >>>>>> diff --git a/tools/testing/selftests/mm/migration.c >>>>>> b/tools/testing/selftests/mm/migration.c >>>>>> index 379581567f27..189d7d9070e8 100644 >>>>>> --- a/tools/testing/selftests/mm/migration.c >>>>>> +++ b/tools/testing/selftests/mm/migration.c >>>>>> @@ -15,7 +15,7 @@ >>>>>> #include <time.h> >>>>>> #define TWOMEG (2<<20) >>>>>> -#define RUNTIME (60) >>>>>> +#define RUNTIME (20) >>>>>> #define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1))) >>>>>> @@ -118,6 +118,7 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) >>>>>> { >>>>>> uint64_t *ptr; >>>>>> int i; >>>>>> + int ret; >>>>>> if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) >>>>>> SKIP(return, "Not enough threads or NUMA nodes available"); >>>>>> @@ -131,9 +132,10 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) >>>>>> if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) >>>>>> perror("Couldn't create thread"); >>>>>> - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); >>>>>> + ret = migrate(ptr, self->n1, self->n2); >>>>>> for (i = 0; i < self->nthreads - 1; i++) >>>>>> ASSERT_EQ(pthread_cancel(self->threads[i]), 0); >>>>>> + ASSERT_EQ(ret, 0); >>>>> >>>>> Why is that required? This does not involve fork. >>>> >>>> It's not required. I was just trying to keep everything aligned to the same >>>> pattern. >>>> >>>>> >>>>>> } >>>>>> /* >>>>>> @@ -144,6 +146,7 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) >>>>>> pid_t pid; >>>>>> uint64_t *ptr; >>>>>> int i; >>>>>> + int ret; >>>>>> if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) >>>>>> SKIP(return, "Not enough threads or NUMA nodes available"); >>>>>> @@ -161,9 +164,10 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) >>>>>> self->pids[i] = pid; >>>>>> } >>>>>> - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); >>>>>> + ret = migrate(ptr, self->n1, self->n2); >>>>>> for (i = 0; i < self->nthreads - 1; i++) >>>>>> ASSERT_EQ(kill(self->pids[i], SIGTERM), 0); >>>>>> + ASSERT_EQ(ret, 0); >>>>> >>>>> >>>>> Might be cleaner to also: >>>> >>>> Or instead of? I agree this is neater, so will undo the moving of the ASSERT() >>>> and rely on this prctl. >>> >>> I was thinking about possible races when our parent process already >>> quits before our child managed to set the prctl. prctl() won't do >>> anything in that case, hmmmm. >>> >>> But similarly, existing code might already trigger the migrate() + kill >>> before the child processes even started to access_mem(). >>> >>> Racy :) >>> >> >> Maybe what would work, is checking after the prctl() in the child if the parent >> is already gone. > > > Like this? > > if (!pid) { > prctl(PR_SET_PDEATHSIG, SIGHUP); > /* Parent may have died before prctl so check now. */ > if (getppid() == 1) > kill(getpid(), SIGHUP); > access_mem(ptr); > } Staring at forget_original_parent(), that order should work. I do wonder if there is a nicer way to handle that, but maybe that already is the "nice" way.
diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c index 379581567f27..189d7d9070e8 100644 --- a/tools/testing/selftests/mm/migration.c +++ b/tools/testing/selftests/mm/migration.c @@ -15,7 +15,7 @@ #include <time.h> #define TWOMEG (2<<20) -#define RUNTIME (60) +#define RUNTIME (20) #define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1))) @@ -118,6 +118,7 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) { uint64_t *ptr; int i; + int ret; if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) SKIP(return, "Not enough threads or NUMA nodes available"); @@ -131,9 +132,10 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) perror("Couldn't create thread"); - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); + ret = migrate(ptr, self->n1, self->n2); for (i = 0; i < self->nthreads - 1; i++) ASSERT_EQ(pthread_cancel(self->threads[i]), 0); + ASSERT_EQ(ret, 0); } /* @@ -144,6 +146,7 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) pid_t pid; uint64_t *ptr; int i; + int ret; if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) SKIP(return, "Not enough threads or NUMA nodes available"); @@ -161,9 +164,10 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) self->pids[i] = pid; } - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); + ret = migrate(ptr, self->n1, self->n2); for (i = 0; i < self->nthreads - 1; i++) ASSERT_EQ(kill(self->pids[i], SIGTERM), 0); + ASSERT_EQ(ret, 0); } /* @@ -173,6 +177,7 @@ TEST_F_TIMEOUT(migration, private_anon_thp, 2*RUNTIME) { uint64_t *ptr; int i; + int ret; if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) SKIP(return, "Not enough threads or NUMA nodes available"); @@ -188,9 +193,10 @@ TEST_F_TIMEOUT(migration, private_anon_thp, 2*RUNTIME) if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) perror("Couldn't create thread"); - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); + ret = migrate(ptr, self->n1, self->n2); for (i = 0; i < self->nthreads - 1; i++) ASSERT_EQ(pthread_cancel(self->threads[i]), 0); + ASSERT_EQ(ret, 0); } TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/mm/settings b/tools/testing/selftests/mm/settings index 9abfc60e9e6f..ba4d85f74cd6 100644 --- a/tools/testing/selftests/mm/settings +++ b/tools/testing/selftests/mm/settings @@ -1 +1 @@ -timeout=45 +timeout=90