[libstdc++] Use __gthread_join in jthread/95989

Message ID oredqou760.fsf@lxoliva.fsfla.org
State Accepted
Headers
Series [libstdc++] Use __gthread_join in jthread/95989 |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Alexandre Oliva Feb. 17, 2023, 6:39 a.m. UTC
  Ref: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570617.html

Bernd Edlinger <bernd.edlinger@hotmail.de> reported that the 95989.cc
test fails without pthread_join at the end of main, but pthread_join
is no good for a test that doesn't require pthreads.

This patch adds a __gthread_join call instead.

Regstrapped on x86_64-linux-gnu.
Tested on arm-vxworks7 (gcc-12) and arm-eabi (trunk).  Ok to install?

for  libstdc++-v3/ChangeLog

	* testsuite/30_threads/jthread/95989.cc (main): Call
	__gthread_join at the end.
---
 libstdc++-v3/testsuite/30_threads/jthread/95989.cc |    1 +
 1 file changed, 1 insertion(+)
  

Comments

Jonathan Wakely Feb. 17, 2023, 10:58 a.m. UTC | #1
On Fri, 17 Feb 2023, 06:40 Alexandre Oliva via Libstdc++, <
libstdc++@gcc.gnu.org> wrote:

>
> Ref: https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570617.html
>
> Bernd Edlinger <bernd.edlinger@hotmail.de> reported that the 95989.cc
> test fails without pthread_join at the end of main,


Yes, but that doesn't mean we want to do that.


but pthread_join
> is no good for a test that doesn't require pthreads.
>
> This patch adds a __gthread_join call instead.
>
> Regstrapped on x86_64-linux-gnu.
> Tested on arm-vxworks7 (gcc-12) and arm-eabi (trunk).  Ok to install?
>

Nope, the test is correct, if it needs that there's a bug somewhere else.
Papering over it in the test doesn't help.

Each of test01, test02, and test03 creates a thread, joins it, and returns,
which needs to work with no additional code. The change happens to "work"
because it causes a non-weak reference to pthread_join, which otherwise
doesn't get linked in from libpthread.a on ubuntu.

The proper fix is to ensure the program has a non-weak reference to
pthread_join without extra help (or use a recent glibc where it always
works).




> for  libstdc++-v3/ChangeLog
>
>         * testsuite/30_threads/jthread/95989.cc (main): Call
>         __gthread_join at the end.
> ---
>  libstdc++-v3/testsuite/30_threads/jthread/95989.cc |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libstdc++-v3/testsuite/30_threads/jthread/95989.cc
> b/libstdc++-v3/testsuite/30_threads/jthread/95989.cc
> index e98836d094531..407b52748438c 100644
> --- a/libstdc++-v3/testsuite/30_threads/jthread/95989.cc
> +++ b/libstdc++-v3/testsuite/30_threads/jthread/95989.cc
> @@ -52,4 +52,5 @@ main()
>    test01();
>    test02();
>    test03();
> +  __gthread_join(0, NULL);
>  }
>
> --
> Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
>    Free Software Activist                       GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about <https://stallmansupport.org>
>
  
Alexandre Oliva March 3, 2023, 8:18 a.m. UTC | #2
On Feb 17, 2023, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:

> The proper fix is to ensure the program has a non-weak reference to
> pthread_join without extra help (or use a recent glibc where it always
> works).

Indeed!  How about this?  Regstrapped on x86_64-linux-gnu, also tested
on arm-vx7r2 (gcc-12); verified that strong references are present in
95989.o, but not in libstdc++.a or .so.  Ok to install?

(nit: the static local in a ctor of a template class may make for
multiple copies.  Maybe a non-template always-inline function called by
all instantiations would be better.)


link pthread_join from std::thread ctor

Like pthread_create, pthread_join may fail to be statically linked in
absent strong uses, so add to user code strong references to both when
std::thread objects are created.


for  libstdc++-v3/ChangeLog

	* include/bits/std_thread.h (std::thread ctor): Add strong
	reference to pthread_join.

---
 libstdc++-v3/include/bits/std_thread.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/include/bits/std_thread.h b/libstdc++-v3/include/bits/std_thread.h
index adbd3928ff783..4bda350fa2c7b 100644
--- a/libstdc++-v3/include/bits/std_thread.h
+++ b/libstdc++-v3/include/bits/std_thread.h
@@ -145,6 +145,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef GTHR_ACTIVE_PROXY
 	// Create a reference to pthread_create, not just the gthr weak symbol.
 	auto __depend = reinterpret_cast<void(*)()>(&pthread_create);
+	static auto __attribute__((__used__)) __depend_join = &pthread_join;
 #else
 	auto __depend = nullptr;
 #endif
  
Jonathan Wakely March 3, 2023, 9:33 a.m. UTC | #3
On Fri, 3 Mar 2023 at 08:20, Alexandre Oliva via Libstdc++ <
libstdc++@gcc.gnu.org> wrote:

> On Feb 17, 2023, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>
> > The proper fix is to ensure the program has a non-weak reference to
> > pthread_join without extra help (or use a recent glibc where it always
> > works).
>
> Indeed!  How about this?  Regstrapped on x86_64-linux-gnu, also tested
> on arm-vx7r2 (gcc-12); verified that strong references are present in
> 95989.o, but not in libstdc++.a or .so.  Ok to install?
>
> (nit: the static local in a ctor of a template class may make for
> multiple copies.  Maybe a non-template always-inline function called by
> all instantiations would be better.)
>

Yeah, that does seem less than ideal.

Jakub previously suggested doing this for PR 61841, which was a similar
problem with pthread_create:

__asm ("" : : "r" (&pthread_create)); would not be optimized away.


That would avoid the multiple copies.

Alternatively we could get really creative and cast the addresses of both
&pthread_create and &pthread_join to uintptr_t and XOR them, and pass that
as the __depend argument (which is never actually dereferenced, it's only
there to create a link-time dependency).



>
>
> link pthread_join from std::thread ctor
>
> Like pthread_create, pthread_join may fail to be statically linked in
> absent strong uses, so add to user code strong references to both when
> std::thread objects are created.
>
>
> for  libstdc++-v3/ChangeLog
>
>         * include/bits/std_thread.h (std::thread ctor): Add strong
>         reference to pthread_join.
>
> ---
>  libstdc++-v3/include/bits/std_thread.h |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libstdc++-v3/include/bits/std_thread.h
> b/libstdc++-v3/include/bits/std_thread.h
> index adbd3928ff783..4bda350fa2c7b 100644
> --- a/libstdc++-v3/include/bits/std_thread.h
> +++ b/libstdc++-v3/include/bits/std_thread.h
> @@ -145,6 +145,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #ifdef GTHR_ACTIVE_PROXY
>         // Create a reference to pthread_create, not just the gthr weak
> symbol.
>         auto __depend = reinterpret_cast<void(*)()>(&pthread_create);
> +       static auto __attribute__((__used__)) __depend_join =
> &pthread_join;
>  #else
>         auto __depend = nullptr;
>  #endif
>
>
> --
> Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
>    Free Software Activist                       GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about <https://stallmansupport.org>
>
>
  
Jonathan Wakely March 3, 2023, 9:48 a.m. UTC | #4
On Fri, 3 Mar 2023 at 09:33, Jonathan Wakely <jwakely@redhat.com> wrote:

>
>
> On Fri, 3 Mar 2023 at 08:20, Alexandre Oliva via Libstdc++ <
> libstdc++@gcc.gnu.org> wrote:
>
>> On Feb 17, 2023, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>>
>> > The proper fix is to ensure the program has a non-weak reference to
>> > pthread_join without extra help (or use a recent glibc where it always
>> > works).
>>
>> Indeed!  How about this?  Regstrapped on x86_64-linux-gnu, also tested
>> on arm-vx7r2 (gcc-12); verified that strong references are present in
>> 95989.o, but not in libstdc++.a or .so.  Ok to install?
>>
>> (nit: the static local in a ctor of a template class may make for
>> multiple copies.  Maybe a non-template always-inline function called by
>> all instantiations would be better.)
>>
>
> Yeah, that does seem less than ideal.
>
> Jakub previously suggested doing this for PR 61841, which was a similar
> problem with pthread_create:
>
> __asm ("" : : "r" (&pthread_create)); would not be optimized away.
>
>
> That would avoid the multiple copies.
>

As Jakub pointed out, it adds a scheduling barrier, but a few cycles when
creating a new thread is probably not even measurable.


>
> Alternatively we could get really creative and cast the addresses of both
> &pthread_create and &pthread_join to uintptr_t and XOR them, and pass that
> as the __depend argument (which is never actually dereferenced, it's only
> there to create a link-time dependency).
>

I should be clear that I don't think the creative solution is a good idea.
And if we ever support building libstdc++ with LTO it will become visible
that the argument is unused, and we'd need something like the asm
dependency anyway.



>
>
>
>>
>>
>> link pthread_join from std::thread ctor
>>
>> Like pthread_create, pthread_join may fail to be statically linked in
>> absent strong uses, so add to user code strong references to both when
>> std::thread objects are created.
>>
>>
>> for  libstdc++-v3/ChangeLog
>>
>>         * include/bits/std_thread.h (std::thread ctor): Add strong
>>         reference to pthread_join.
>>
>> ---
>>  libstdc++-v3/include/bits/std_thread.h |    1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/libstdc++-v3/include/bits/std_thread.h
>> b/libstdc++-v3/include/bits/std_thread.h
>> index adbd3928ff783..4bda350fa2c7b 100644
>> --- a/libstdc++-v3/include/bits/std_thread.h
>> +++ b/libstdc++-v3/include/bits/std_thread.h
>> @@ -145,6 +145,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>  #ifdef GTHR_ACTIVE_PROXY
>>         // Create a reference to pthread_create, not just the gthr weak
>> symbol.
>>         auto __depend = reinterpret_cast<void(*)()>(&pthread_create);
>> +       static auto __attribute__((__used__)) __depend_join =
>> &pthread_join;
>>  #else
>>         auto __depend = nullptr;
>>  #endif
>>
>>
>> --
>> Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
>>    Free Software Activist                       GNU Toolchain Engineer
>> Disinformation flourishes because many people care deeply about injustice
>> but very few check the facts.  Ask me about <https://stallmansupport.org>
>>
>>
  
Alexandre Oliva March 3, 2023, 5:46 p.m. UTC | #5
On Mar  3, 2023, Jonathan Wakely <jwakely@redhat.com> wrote:

> On Fri, 3 Mar 2023 at 09:33, Jonathan Wakely <jwakely@redhat.com> wrote:
>> Jakub previously suggested doing this for PR 61841, which was a similar
>> problem with pthread_create:
>> 
>> __asm ("" : : "r" (&pthread_create)); would not be optimized away.
>> 
>> 
>> That would avoid the multiple copies.

Not really.  There would be multiple copies of the code that loads
pthread_create's address.  And we don't really need the address, a
single never-executed call would do.  I've explored these possibilities
a bit, and here's what I've come up with: a private static member
function that we output in units that instantiate the thread template
ctor, to pass its address to _M_start_thread.  Since it's never actually
called, we don't really need the hacks in some of the alternatives I
left in place, mainly for your enjoyment.

They all work equally well, just as efficient per-instantiation at
runtime, a little different space and loading overheads, but the last
one, that is enabled, is my favorite: only PLT relocations, that we'd
likely get anyway, no full-address resolution, and as-short-as-possible
calls, enough to get a relocation with a strong reference to pull the
symbol in when linking, but as short as possible call sequences, because
of the type cast.

As a bonus, I put in (in the last minute, after my test runs) something
to keep even LTO happy: the asm statements to prevent depend from being
optimized out in _M_start_thread.  In non-LTO, its impact should be
virtually zero.

How does this look?  (minus the #if 0/#elif 0/.../#else)


link pthread_join from std::thread ctor

Like pthread_create, pthread_join may fail to be statically linked in
absent strong uses, so add to user code strong references to both when
std::thread objects are created.


for  libstdc++-v3/ChangeLog

	* include/bits/std_thread.h (thread::_M_thread_deps): New
	static inline function.
	(std::thread template ctor): Pass it to _M_start_thread.
	* src/c++11/thread.cc (thread::_M_start_thread): Name depend
	parameter, force it live on entry.
---
 libstdc++-v3/include/bits/std_thread.h |   51 ++++++++++++++++++++++++++++----
 libstdc++-v3/src/c++11/thread.cc       |   10 +++++-
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/libstdc++-v3/include/bits/std_thread.h b/libstdc++-v3/include/bits/std_thread.h
index adbd3928ff783..3ffd2a823a698 100644
--- a/libstdc++-v3/include/bits/std_thread.h
+++ b/libstdc++-v3/include/bits/std_thread.h
@@ -132,6 +132,49 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     thread() noexcept = default;
 
 #ifdef _GLIBCXX_HAS_GTHREADS
+  private:
+    // This adds to user code that creates std:thread objects (because
+    // it is called by the template ctor below) strong references to
+    // pthread_create and pthread_join, which ensures they are both
+    // linked in even during static linking.  We can't depend on
+    // gthread calls to bring them in, because those may use weak
+    // references.
+    static void
+    _M_thread_deps_never_run() {
+#ifdef GTHR_ACTIVE_PROXY
+#if 0
+      static auto const __attribute__ ((__used__)) _M_create = pthread_create;
+      static auto const __attribute__ ((__used__)) _M_join = pthread_join;
+#elif 0
+      pthread_t thr;
+      pthread_create (&thr, nullptr, nullptr, nullptr);
+      pthread_join (thr, nullptr);
+#elif 0
+      asm goto ("" : : : : _M_never_run);
+      if (0)
+	{
+	_M_never_run:
+	  pthread_t thr;
+	  pthread_create (&thr, nullptr, nullptr, nullptr);
+	  pthread_join (thr, nullptr);
+	}
+#elif 0
+      bool _M_skip_always = false;
+      asm ("" : "+rm" (_M_skip_always));
+      if (__builtin_expect (_M_skip_always, false))
+	{
+	  pthread_t thr;
+	  pthread_create (&thr, nullptr, nullptr, nullptr);
+	  pthread_join (thr, nullptr);
+	}
+#else
+      reinterpret_cast<void (*)(void)>(&pthread_create)();
+      reinterpret_cast<void (*)(void)>(&pthread_join)();
+#endif
+#endif
+    }
+
+  public:
     template<typename _Callable, typename... _Args,
 	     typename = _Require<__not_same<_Callable>>>
       explicit
@@ -142,18 +185,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  "std::thread arguments must be invocable after conversion to rvalues"
 	  );
 
-#ifdef GTHR_ACTIVE_PROXY
-	// Create a reference to pthread_create, not just the gthr weak symbol.
-	auto __depend = reinterpret_cast<void(*)()>(&pthread_create);
-#else
-	auto __depend = nullptr;
-#endif
 	using _Wrapper = _Call_wrapper<_Callable, _Args...>;
 	// Create a call wrapper with DECAY_COPY(__f) as its target object
 	// and DECAY_COPY(__args)... as its bound argument entities.
 	_M_start_thread(_State_ptr(new _State_impl<_Wrapper>(
 	      std::forward<_Callable>(__f), std::forward<_Args>(__args)...)),
-	    __depend);
+	    _M_thread_deps_never_run);
       }
 #endif // _GLIBCXX_HAS_GTHREADS
 
diff --git a/libstdc++-v3/src/c++11/thread.cc b/libstdc++-v3/src/c++11/thread.cc
index 2d5ffaf678e97..c91f7b02e1f3f 100644
--- a/libstdc++-v3/src/c++11/thread.cc
+++ b/libstdc++-v3/src/c++11/thread.cc
@@ -154,8 +154,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   void
-  thread::_M_start_thread(_State_ptr state, void (*)())
+  thread::_M_start_thread(_State_ptr state, void (*depend)())
   {
+    // Make sure it's not optimized out, not even with LTO.
+    asm ("" : : "rm" (depend));
+
     if (!__gthread_active_p())
       {
 #if __cpp_exceptions
@@ -190,8 +193,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   void
-  thread::_M_start_thread(__shared_base_type __b, void (*)())
+  thread::_M_start_thread(__shared_base_type __b, void (*depend)())
   {
+    // Make sure it's not optimized out, not even with LTO.
+    asm ("" : : "rm" (depend));
+
     auto ptr = __b.get();
     // Create a reference cycle that will be broken in the new thread.
     ptr->_M_this_ptr = std::move(__b);
  
Jonathan Wakely March 3, 2023, 6:12 p.m. UTC | #6
On Fri, 3 Mar 2023 at 17:47, Alexandre Oliva <oliva@adacore.com> wrote:

> On Mar  3, 2023, Jonathan Wakely <jwakely@redhat.com> wrote:
>
> > On Fri, 3 Mar 2023 at 09:33, Jonathan Wakely <jwakely@redhat.com> wrote:
> >> Jakub previously suggested doing this for PR 61841, which was a similar
> >> problem with pthread_create:
> >>
> >> __asm ("" : : "r" (&pthread_create)); would not be optimized away.
> >>
> >>
> >> That would avoid the multiple copies.
>
> Not really.  There would be multiple copies of the code that loads
> pthread_create's address.  And we don't really need the address, a
> single never-executed call would do.  I've explored these possibilities
> a bit, and here's what I've come up with: a private static member
> function that we output in units that instantiate the thread template
> ctor, to pass its address to _M_start_thread.  Since it's never actually
> called, we don't really need the hacks in some of the alternatives I
> left in place, mainly for your enjoyment.
>
> They all work equally well, just as efficient per-instantiation at
> runtime, a little different space and loading overheads, but the last
> one, that is enabled, is my favorite: only PLT relocations, that we'd
> likely get anyway, no full-address resolution, and as-short-as-possible
> calls, enough to get a relocation with a strong reference to pull the
> symbol in when linking, but as short as possible call sequences, because
> of the type cast.
>

And those expressions aren't ever optimized away as unused?


>
> As a bonus, I put in (in the last minute, after my test runs) something
> to keep even LTO happy: the asm statements to prevent depend from being
> optimized out in _M_start_thread.  In non-LTO, its impact should be
> virtually zero.
>
> How does this look?  (minus the #if 0/#elif 0/.../#else)
>

Looks good, thanks for going the extra mile to check all the alternatives,
and the futureproofing it for LTO.

OK for trunk.
  
Jonathan Wakely March 3, 2023, 6:14 p.m. UTC | #7
On Fri, 3 Mar 2023 at 18:12, Jonathan Wakely <jwakely@redhat.com> wrote:

>
>
> On Fri, 3 Mar 2023 at 17:47, Alexandre Oliva <oliva@adacore.com> wrote:
>
>> On Mar  3, 2023, Jonathan Wakely <jwakely@redhat.com> wrote:
>>
>> > On Fri, 3 Mar 2023 at 09:33, Jonathan Wakely <jwakely@redhat.com>
>> wrote:
>> >> Jakub previously suggested doing this for PR 61841, which was a similar
>> >> problem with pthread_create:
>> >>
>> >> __asm ("" : : "r" (&pthread_create)); would not be optimized away.
>> >>
>> >>
>> >> That would avoid the multiple copies.
>>
>> Not really.  There would be multiple copies of the code that loads
>> pthread_create's address.  And we don't really need the address, a
>> single never-executed call would do.  I've explored these possibilities
>> a bit, and here's what I've come up with: a private static member
>> function that we output in units that instantiate the thread template
>> ctor, to pass its address to _M_start_thread.  Since it's never actually
>> called, we don't really need the hacks in some of the alternatives I
>> left in place, mainly for your enjoyment.
>>
>> They all work equally well, just as efficient per-instantiation at
>> runtime, a little different space and loading overheads, but the last
>> one, that is enabled, is my favorite: only PLT relocations, that we'd
>> likely get anyway, no full-address resolution, and as-short-as-possible
>> calls, enough to get a relocation with a strong reference to pull the
>> symbol in when linking, but as short as possible call sequences, because
>> of the type cast.
>>
>
> And those expressions aren't ever optimized away as unused?
>

Oh, I missed that they're called after casting them, I didn't notice the
trailing ().

That would be UB to call them through the wrong pointer type, so the
compiler could decide they're unreachable, but it seems to work for now.

Thanks!
  
Alexandre Oliva March 3, 2023, 7:55 p.m. UTC | #8
On Mar  3, 2023, Jonathan Wakely <jwakely@redhat.com> wrote:

> On Fri, 3 Mar 2023 at 18:12, Jonathan Wakely <jwakely@redhat.com> wrote:

>> And those expressions aren't ever optimized away as unused?

> Oh, I missed that they're called after casting them

*nod*

> That would be UB to call them through the wrong pointer type, so the
> compiler could decide they're unreachable

Ugh, indeed.  We can drop the cast and add the required parameters if it
ever becomes an issue.  Here's what I've just installed.


link pthread_join from std::thread ctor

Like pthread_create, pthread_join may fail to be statically linked in
absent strong uses, so add to user code strong references to both when
std::thread objects are created.


for  libstdc++-v3/ChangeLog

	PR libstdc++/104852
	PR libstdc++/95989
	PR libstdc++/52590
	* include/bits/std_thread.h (thread::_M_thread_deps): New
	static implicitly-inline member function.
	(std::thread template ctor): Pass it to _M_start_thread.
	* src/c++11/thread.cc (thread::_M_start_thread): Name depend
	parameter, force it live on entry.
---
 libstdc++-v3/include/bits/std_thread.h |   24 +++++++++++++++++-------
 libstdc++-v3/src/c++11/thread.cc       |   10 ++++++++--
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/libstdc++-v3/include/bits/std_thread.h b/libstdc++-v3/include/bits/std_thread.h
index adbd3928ff783..e88c07fbd4f0f 100644
--- a/libstdc++-v3/include/bits/std_thread.h
+++ b/libstdc++-v3/include/bits/std_thread.h
@@ -132,6 +132,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     thread() noexcept = default;
 
 #ifdef _GLIBCXX_HAS_GTHREADS
+  private:
+    // This adds to user code that creates std:thread objects (because
+    // it is called by the template ctor below) strong references to
+    // pthread_create and pthread_join, which ensures they are both
+    // linked in even during static linking.  We can't depend on
+    // gthread calls to bring them in, because those may use weak
+    // references.
+    static void
+    _M_thread_deps_never_run() {
+#ifdef GTHR_ACTIVE_PROXY
+      reinterpret_cast<void (*)(void)>(&pthread_create)();
+      reinterpret_cast<void (*)(void)>(&pthread_join)();
+#endif
+    }
+
+  public:
     template<typename _Callable, typename... _Args,
 	     typename = _Require<__not_same<_Callable>>>
       explicit
@@ -142,18 +158,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  "std::thread arguments must be invocable after conversion to rvalues"
 	  );
 
-#ifdef GTHR_ACTIVE_PROXY
-	// Create a reference to pthread_create, not just the gthr weak symbol.
-	auto __depend = reinterpret_cast<void(*)()>(&pthread_create);
-#else
-	auto __depend = nullptr;
-#endif
 	using _Wrapper = _Call_wrapper<_Callable, _Args...>;
 	// Create a call wrapper with DECAY_COPY(__f) as its target object
 	// and DECAY_COPY(__args)... as its bound argument entities.
 	_M_start_thread(_State_ptr(new _State_impl<_Wrapper>(
 	      std::forward<_Callable>(__f), std::forward<_Args>(__args)...)),
-	    __depend);
+	    _M_thread_deps_never_run);
       }
 #endif // _GLIBCXX_HAS_GTHREADS
 
diff --git a/libstdc++-v3/src/c++11/thread.cc b/libstdc++-v3/src/c++11/thread.cc
index 2d5ffaf678e97..c91f7b02e1f3f 100644
--- a/libstdc++-v3/src/c++11/thread.cc
+++ b/libstdc++-v3/src/c++11/thread.cc
@@ -154,8 +154,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   void
-  thread::_M_start_thread(_State_ptr state, void (*)())
+  thread::_M_start_thread(_State_ptr state, void (*depend)())
   {
+    // Make sure it's not optimized out, not even with LTO.
+    asm ("" : : "rm" (depend));
+
     if (!__gthread_active_p())
       {
 #if __cpp_exceptions
@@ -190,8 +193,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   void
-  thread::_M_start_thread(__shared_base_type __b, void (*)())
+  thread::_M_start_thread(__shared_base_type __b, void (*depend)())
   {
+    // Make sure it's not optimized out, not even with LTO.
+    asm ("" : : "rm" (depend));
+
     auto ptr = __b.get();
     // Create a reference cycle that will be broken in the new thread.
     ptr->_M_this_ptr = std::move(__b);
  
Florian Weimer March 3, 2023, 8:15 p.m. UTC | #9
* Alexandre Oliva via Gcc-patches:

> +    // Make sure it's not optimized out, not even with LTO.
> +    asm ("" : : "rm" (depend));

If the m constraint is used, this may never emit the symbol name and
thus not create a reference after all.
  
Alexandre Oliva March 3, 2023, 8:36 p.m. UTC | #10
Hello, Florian,

On Mar  3, 2023, Florian Weimer <fw@deneb.enyo.de> wrote:

> * Alexandre Oliva via Gcc-patches:
>> +    // Make sure it's not optimized out, not even with LTO.
>> +    asm ("" : : "rm" (depend));

> If the m constraint is used, this may never emit the symbol name and
> thus not create a reference after all.

But that is no longer the pthread symbol itself, it's the symbol of a
static member function with vague linkage that, because the compiler
believes the asm statement will reference it, will still be output, and
it's that function body that will refer to and thus pull in the symbols
we need.

Now, hmm, maybe with per-function sections, the compiler will emit it,
but with section gc, the linker may drop it, so we might lose the needed
function body.  When performing LTO with an LTO-enabled libstdc++.
  
Alexandre Oliva March 3, 2023, 8:57 p.m. UTC | #11
On Mar  3, 2023, Alexandre Oliva <oliva@adacore.com> wrote:

> Hello, Florian,
> On Mar  3, 2023, Florian Weimer <fw@deneb.enyo.de> wrote:

>> * Alexandre Oliva via Gcc-patches:
>>> +    // Make sure it's not optimized out, not even with LTO.
>>> +    asm ("" : : "rm" (depend));

>> If the m constraint is used, this may never emit the symbol name and
>> thus not create a reference after all.

> But that is no longer the pthread symbol itself, it's the symbol of a
> static member function with vague linkage that, because the compiler
> believes the asm statement will reference it, will still be output, and
> it's that function body that will refer to and thus pull in the symbols
> we need.

> Now, hmm, maybe with per-function sections, the compiler will emit it,
> but with section gc, the linker may drop it, so we might lose the needed
> function body.  When performing LTO with an LTO-enabled libstdc++.

Wait, no, you had me going but it's really fine.  What would be in
memory is a *pointer* to the function, so the function would still be
referenced by whatever initialized the pointer, so it wouldn't get GCed.
  

Patch

diff --git a/libstdc++-v3/testsuite/30_threads/jthread/95989.cc b/libstdc++-v3/testsuite/30_threads/jthread/95989.cc
index e98836d094531..407b52748438c 100644
--- a/libstdc++-v3/testsuite/30_threads/jthread/95989.cc
+++ b/libstdc++-v3/testsuite/30_threads/jthread/95989.cc
@@ -52,4 +52,5 @@  main()
   test01();
   test02();
   test03();
+  __gthread_join(0, NULL);
 }