[1/2] libstdc++: Atomic wait/notify ABI stabilization

Message ID 20231116134736.1287539-1-jwakely@redhat.com
State Accepted
Headers
Series [1/2] libstdc++: Atomic wait/notify ABI stabilization |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Jonathan Wakely Nov. 16, 2023, 1:45 p.m. UTC
  From: Thomas Rodgers <trodgers@redhat.com>

These two patches were written by Tom earlier this year, before he left
Red Hat. We should finish reviewing them for GCC 14 (and probably squash
them into one?)

Tom, you mentioned further work that changes the __platform_wait_t* to
uintptr_t, is that ready, or likely to be ready very soon?

Tested x86_64-linux, testing underway on powerpc-aix and sparc-solaris.


-- >8 --

This represents a major refactoring of the previous atomic::wait
and atomic::notify implementation detail. The aim of this change
is to simplify the implementation details and position the resulting
implementation so that much of the current header-only detail
can be moved into the shared library, while also accounting for
anticipated changes to wait/notify functionality for C++26.

The previous implementation implemented spin logic in terms of
the types __default_spin_policy, __timed_backoff_spin_policy, and
the free function __atomic_spin. These are replaced in favor of
two new free functions; __spin_impl and __spin_until_impl. These
currently inline free functions are expected to be moved into the
libstdc++ shared library in a future commit.

The previous implementation derived untimed and timed wait
implementation detail from __detail::__waiter_pool_base. This
is-a relationship is removed in the new version and the previous
implementation detail is renamed to reflect this change. The
static _S_for member has been renamed as well to indicate that it
returns the __waiter_pool_impl entry in the static 'side table'
for a given awaited address.

This new implementation replaces all of the non-templated waiting
detail of __waiter_base, __waiter_pool, __waiter, __enters_wait, and
__bare_wait with the __wait_impl free function, and the supporting
__wait_flags enum and __wait_args struct. This currenly inline free
function is expected to be moved into the libstdc++ shared library
in a future commit.

This new implementation replaces all of the non-templated notifying
detail of __waiter_base, __waiter_pool, and __waiter with the
__notify_impl free function. This currently inline free function
is expected to be moved into the libstdc++ shared library in a
future commit.

The __atomic_wait_address template function is updated to account
for the above changes and to support the expected C++26 change to
pass the most recent observed value to the caller supplied predicate.

A new non-templated __atomic_wait_address_v free function is added
that only works for atomic types that operate only on __platform_wait_t
and requires the caller to supply a memory order. This is intended
to be the simplest code path for such types.

The __atomic_wait_address_v template function is now implemented in
terms of new __atomic_wait_address template and continues to accept
a user supplied "value function" to retrieve the current value of
the atomic.

The __atomic_notify_address template function is updated to account
for the above changes.

The template __platform_wait_until_impl is renamed to
__wait_clock_t. The previous __platform_wait_until template is deleted
and the functionality previously provided is moved t the new tempalate
function __wait_until. A similar change is made to the
__cond_wait_until_impl/__cond_wait_until implementation.

This new implementation similarly replaces all of the non-templated
waiting detail of __timed_waiter_pool, __timed_waiter, etc. with
the new __wait_until_impl free function. This currently inline free
function is expected to be moved into the libstdc++ shared library
in a future commit.

This implementation replaces all templated waiting functions that
manage clock conversion as well as relative waiting (wait_for) with
the new template functions __wait_until and __wait_for.

Similarly the previous implementation detail for the various
__atomic_wait_address_Xxx templates is adjusted to account for the
implementation changes outlined above.

All of the "bare wait" versions of __atomic_wait_Xxx have been removed
and replaced with a defaulted boolean __bare_wait parameter on the
new version of these templates.

libstdc++-v3/ChangeLog:

	* include/bits/atomic_timed_wait.h:
	(__detail::__platform_wait_until_impl): Rename to
	__platform_wait_until.
	(__detail::__platform_wait_until): Remove previous
	definition.
	(__detail::__cond_wait_until_impl): Rename to
	__cond_wait_until.
	(__detail::__cond_wait_until): Remove previous
	definition.
	(__detail::__spin_until_impl): New function.
	(__detail::__wait_until_impl): New function.
	(__detail::__wait_until): New function.
	(__detail::__wait_for): New function.
	(__detail::__timed_waiter_pool): Remove type.
	(__detail::__timed_backoff_spin_policy): Remove type.
	(__detail::__timed_waiter): Remove type.
	(__detail::__enters_timed_wait): Remove type alias.
	(__detail::__bare_timed_wait): Remove type alias.
	(__atomic_wait_address_until): Adjust to new implementation
	detail.
	(__atomic_wait_address_until_v): Likewise.
	(__atomic_wait_address_bare): Remove.
	(__atomic_wait_address_for): Adjust to new implementation
	detail.
	(__atomic_wait_address_for_v): Likewise.
	(__atomic_wait_address_for_bare): Remove.
	* include/bits/atomic_wait.h: Include bits/stl_pair.h.
	(__detail::__default_spin_policy): Remove type.
	(__detail::__atomic_spin): Remove function.
	(__detail::__waiter_pool_base): Rename to __waiter_pool_impl.
	Remove _M_notify.  Rename _S_for to _S_impl_for.
	(__detail::__waiter_base): Remove type.
	(__detail::__waiter_pool): Remove type.
	(__detail::__waiter): Remove type.
	(__detail::__enters_wait): Remove type alias.
	(__detail::__bare_wait): Remove type alias.
	(__detail::__wait_flags): New enum.
	(__detail::__wait_args): New struct.
	(__detail::__wait_result_type): New type alias.
	(__detail::__spin_impl): New function.
	(__detail::__wait_impl): New function.
	(__atomic_wait_address): Adjust to new implementation detail.
	(__atomic_wait_address_v): Likewise.
	(__atomic_notify_address): Likewise.
	(__atomic_wait_address_bare): Delete.
	(__atomic_notify_address_bare): Likewise.
	* include/bits/semaphore_base.h: Adjust implementation to
	use new __atomic_wait_address_v contract.
	* include/std/barrier: Adjust implementation to use new
	__atomic_wait contract.
	* include/std/latch: Adjust implementation to use new
	__atomic_wait contract.
	* testsuite/29_atomics/atomic/wait_notify/100334.cc (main):
	Adjust to for __detail::__waiter_pool_base renaming.
---
 libstdc++-v3/include/bits/atomic_timed_wait.h | 549 ++++++++----------
 libstdc++-v3/include/bits/atomic_wait.h       | 486 ++++++++--------
 libstdc++-v3/include/bits/semaphore_base.h    |  53 +-
 libstdc++-v3/include/std/barrier              |   6 +-
 libstdc++-v3/include/std/latch                |   5 +-
 .../29_atomics/atomic/wait_notify/100334.cc   |   4 +-
 6 files changed, 514 insertions(+), 589 deletions(-)
  

Comments

Jonathan Wakely Nov. 16, 2023, 8:46 p.m. UTC | #1
On Thu, 16 Nov 2023 at 13:49, Jonathan Wakely <jwakely@redhat.com> wrote:
>
> From: Thomas Rodgers <trodgers@redhat.com>
>
> These two patches were written by Tom earlier this year, before he left
> Red Hat. We should finish reviewing them for GCC 14 (and probably squash
> them into one?)
>
> Tom, you mentioned further work that changes the __platform_wait_t* to
> uintptr_t, is that ready, or likely to be ready very soon?
>
> Tested x86_64-linux, testing underway on powerpc-aix and sparc-solaris.

I'm seeing this on AIX and Solaris:

WARNING: program timed out.
FAIL: 30_threads/semaphore/try_acquire.cc  -std=gnu++20 execution test
  

Patch

diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h
index ba21ab20a11..90427ef8b42 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -74,62 +74,32 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
 #define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
     // returns true if wait ended before timeout
-    template<typename _Dur>
-      bool
-      __platform_wait_until_impl(const __platform_wait_t* __addr,
-				 __platform_wait_t __old,
-				 const chrono::time_point<__wait_clock_t, _Dur>&
-				      __atime) noexcept
-      {
-	auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
-	auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
+    bool
+    __platform_wait_until(const __platform_wait_t* __addr,
+			  __platform_wait_t __old,
+			  const __wait_clock_t::time_point& __atime) noexcept
+    {
+      auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
+      auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
 
-	struct timespec __rt =
+      struct timespec __rt =
 	{
 	  static_cast<std::time_t>(__s.time_since_epoch().count()),
 	  static_cast<long>(__ns.count())
 	};
 
-	auto __e = syscall (SYS_futex, __addr,
-			    static_cast<int>(__futex_wait_flags::
-						__wait_bitset_private),
-			    __old, &__rt, nullptr,
-			    static_cast<int>(__futex_wait_flags::
-						__bitset_match_any));
-
-	if (__e)
-	  {
-	    if (errno == ETIMEDOUT)
-	      return false;
-	    if (errno != EINTR && errno != EAGAIN)
-	      __throw_system_error(errno);
-	  }
-	return true;
-      }
-
-    // returns true if wait ended before timeout
-    template<typename _Clock, typename _Dur>
-      bool
-      __platform_wait_until(const __platform_wait_t* __addr, __platform_wait_t __old,
-			    const chrono::time_point<_Clock, _Dur>& __atime)
-      {
-	if constexpr (is_same_v<__wait_clock_t, _Clock>)
-	  {
-	    return __platform_wait_until_impl(__addr, __old, __atime);
-	  }
-	else
-	  {
-	    if (!__platform_wait_until_impl(__addr, __old,
-					    __to_wait_clock(__atime)))
-	      {
-		// We got a timeout when measured against __clock_t but
-		// we need to check against the caller-supplied clock
-		// to tell whether we should return a timeout.
-		if (_Clock::now() < __atime)
-		  return true;
-	      }
+      auto __e = syscall (SYS_futex, __addr,
+			  static_cast<int>(__futex_wait_flags::__wait_bitset_private),
+			  __old, &__rt, nullptr,
+			  static_cast<int>(__futex_wait_flags::__bitset_match_any));
+      if (__e)
+	{
+	  if (errno == ETIMEDOUT)
 	    return false;
-	  }
+	  if (errno != EINTR && errno != EAGAIN)
+	    __throw_system_error(errno);
+	}
+	return true;
       }
 #else
 // define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement __platform_wait_until()
@@ -139,15 +109,10 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #ifdef _GLIBCXX_HAS_GTHREADS
     // Returns true if wait ended before timeout.
-    // _Clock must be either steady_clock or system_clock.
-    template<typename _Clock, typename _Dur>
-      bool
-      __cond_wait_until_impl(__condvar& __cv, mutex& __mx,
-			     const chrono::time_point<_Clock, _Dur>& __atime)
-      {
-	static_assert(std::__is_one_of<_Clock, chrono::steady_clock,
-					       chrono::system_clock>::value);
-
+    bool
+    __cond_wait_until(__condvar& __cv, mutex& __mx,
+		      const __wait_clock_t::time_point& __atime)
+    {
 	auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
 	auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
 
@@ -158,293 +123,261 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  };
 
 #ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
-	if constexpr (is_same_v<chrono::steady_clock, _Clock>)
+	if constexpr (is_same_v<chrono::steady_clock, __wait_clock_t>)
 	  __cv.wait_until(__mx, CLOCK_MONOTONIC, __ts);
 	else
 #endif
 	  __cv.wait_until(__mx, __ts);
-	return _Clock::now() < __atime;
-      }
-
-    // returns true if wait ended before timeout
-    template<typename _Clock, typename _Dur>
-      bool
-      __cond_wait_until(__condvar& __cv, mutex& __mx,
-	  const chrono::time_point<_Clock, _Dur>& __atime)
-      {
-#ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
-	if constexpr (is_same_v<_Clock, chrono::steady_clock>)
-	  return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
-	else
-#endif
-	if constexpr (is_same_v<_Clock, chrono::system_clock>)
-	  return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
-	else
-	  {
-	    if (__cond_wait_until_impl(__cv, __mx,
-				       __to_wait_clock(__atime)))
-	      {
-		// We got a timeout when measured against __clock_t but
-		// we need to check against the caller-supplied clock
-		// to tell whether we should return a timeout.
-		if (_Clock::now() < __atime)
-		  return true;
-	      }
-	    return false;
-	  }
+	return __wait_clock_t::now() < __atime;
       }
 #endif // _GLIBCXX_HAS_GTHREADS
 
-    struct __timed_waiter_pool : __waiter_pool_base
+    inline __wait_result_type
+    __spin_until_impl(const __platform_wait_t* __addr, __wait_args __args,
+		      const __wait_clock_t::time_point& __deadline)
     {
-      // returns true if wait ended before timeout
-      template<typename _Clock, typename _Dur>
-	bool
-	_M_do_wait_until(__platform_wait_t* __addr, __platform_wait_t __old,
-			 const chrono::time_point<_Clock, _Dur>& __atime)
+      auto __t0 = __wait_clock_t::now();
+      using namespace literals::chrono_literals;
+
+      __platform_wait_t __val;
+      auto __now = __wait_clock_t::now();
+      for (; __now < __deadline; __now = __wait_clock_t::now())
+      {
+	auto __elapsed = __now - __t0;
+#ifndef _GLIBCXX_NO_SLEEP
+	if (__elapsed > 128ms)
+	{
+	  this_thread::sleep_for(64ms);
+	}
+	else if (__elapsed > 64us)
+	{
+	  this_thread::sleep_for(__elapsed / 2);
+	}
+	else
+#endif
+	if (__elapsed > 4us)
+	{
+	  __thread_yield();
+	}
+	else
+	{
+	  auto __res = __detail::__spin_impl(__addr, __args);
+	  if (__res.first)
+	    return __res;
+	}
+
+	__atomic_load(__addr, &__val, __args._M_order);
+	if (__val != __args._M_old)
+	    return make_pair(true, __val);
+      }
+      return make_pair(false, __val);
+    }
+
+    inline __wait_result_type
+    __wait_until_impl(const __platform_wait_t* __addr, __wait_args __args,
+		      const __wait_clock_t::time_point& __atime)
+    {
+#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+      __waiter_pool_impl* __pool = nullptr;
+#else
+      // if we don't have __platform_wait, we always need the side-table
+      __waiter_pool_impl* __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+
+      __platform_wait_t* __wait_addr;
+      if (__args & __wait_flags::__proxy_wait)
 	{
 #ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
-	  return __platform_wait_until(__addr, __old, __atime);
-#else
-	  __platform_wait_t __val;
-	  __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
-	  if (__val == __old)
-	    {
-	      lock_guard<mutex> __l(_M_mtx);
-	      return __cond_wait_until(_M_cv, _M_mtx, __atime);
-	    }
-	  else
-	    return true;
-#endif // _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+	  __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+	  __wait_addr = &__pool->_M_ver;
+	  __atomic_load(__wait_addr, &__args._M_old, __args._M_order);
 	}
-    };
+      else
+	__wait_addr = const_cast<__platform_wait_t*>(__addr);
 
-    struct __timed_backoff_spin_policy
-    {
-      __wait_clock_t::time_point _M_deadline;
-      __wait_clock_t::time_point _M_t0;
+      if (__args & __wait_flags::__do_spin)
+	{
+	  auto __res = __detail::__spin_until_impl(__wait_addr, __args, __atime);
+	  if (__res.first)
+	    return __res;
+	  if (__args & __wait_flags::__spin_only)
+	    return __res;
+	}
 
-      template<typename _Clock, typename _Dur>
-	__timed_backoff_spin_policy(chrono::time_point<_Clock, _Dur>
-				      __deadline = _Clock::time_point::max(),
-				    chrono::time_point<_Clock, _Dur>
-				      __t0 = _Clock::now()) noexcept
-	  : _M_deadline(__to_wait_clock(__deadline))
-	  , _M_t0(__to_wait_clock(__t0))
-	{ }
-
-      bool
-      operator()() const noexcept
+      if (!(__args & __wait_flags::__track_contention))
       {
-	using namespace literals::chrono_literals;
-	auto __now = __wait_clock_t::now();
-	if (_M_deadline <= __now)
-	  return false;
-
-	// FIXME: this_thread::sleep_for not available #ifdef _GLIBCXX_NO_SLEEP
-
-	auto __elapsed = __now - _M_t0;
-	if (__elapsed > 128ms)
-	  {
-	    this_thread::sleep_for(64ms);
-	  }
-	else if (__elapsed > 64us)
-	  {
-	    this_thread::sleep_for(__elapsed / 2);
-	  }
-	else if (__elapsed > 4us)
-	  {
-	    __thread_yield();
-	  }
-	else
-	  return false;
-	return true;
+	// caller does not externally track contention
+#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+	__pool = (__pool == nullptr) ? &__waiter_pool_impl::_S_impl_for(__addr)
+				     : __pool;
+#endif
+	__pool->_M_enter_wait();
       }
-    };
 
-    template<typename _EntersWait>
-      struct __timed_waiter : __waiter_base<__timed_waiter_pool>
+      __wait_result_type __res;
+#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+      if (__platform_wait_until(__wait_addr, __args._M_old, __atime))
+	__res = make_pair(true, __args._M_old);
+      else
+	__res = make_pair(false, __args._M_old);
+#else
+      __platform_wait_t __val;
+      __atomic_load(__wait_addr, &__val, __args._M_order);
+      if (__val == __args._M_old)
+	{
+	   lock_guard<mutex> __l{ __pool->_M_mtx };
+	   __atomic_load(__wait_addr, &__val, __args._M_order);
+	   if (__val == __args._M_old &&
+	       __cond_wait_until(__pool->_M_cv, __pool->_M_mtx, __atime))
+	     __res = make_pair(true, __val);
+	}
+      else
+	__res = make_pair(false, __val);
+#endif
+
+      if (!(__args & __wait_flags::__track_contention))
+	  // caller does not externally track contention
+	  __pool->_M_leave_wait();
+      return __res;
+    }
+
+    template<typename _Clock, typename _Dur>
+      __wait_result_type
+      __wait_until(const __platform_wait_t* __addr, __wait_args __args,
+		   const chrono::time_point<_Clock, _Dur>& __atime) noexcept
       {
-	using __base_type = __waiter_base<__timed_waiter_pool>;
-
-	template<typename _Tp>
-	  __timed_waiter(const _Tp* __addr) noexcept
-	  : __base_type(__addr)
-	{
-	  if constexpr (_EntersWait::value)
-	    _M_w._M_enter_wait();
-	}
-
-	~__timed_waiter()
-	{
-	  if constexpr (_EntersWait::value)
-	    _M_w._M_leave_wait();
-	}
-
-	// returns true if wait ended before timeout
-	template<typename _Tp, typename _ValFn,
-		 typename _Clock, typename _Dur>
-	  bool
-	  _M_do_wait_until_v(_Tp __old, _ValFn __vfn,
-			     const chrono::time_point<_Clock, _Dur>&
-								__atime) noexcept
+	if constexpr (is_same_v<__wait_clock_t, _Clock>)
+	  return __detail::__wait_until_impl(__addr, __args, __atime);
+	else
 	  {
-	    __platform_wait_t __val;
-	    if (_M_do_spin(__old, std::move(__vfn), __val,
-			   __timed_backoff_spin_policy(__atime)))
-	      return true;
-	    return __base_type::_M_w._M_do_wait_until(__base_type::_M_addr, __val, __atime);
-	  }
+	     auto __res = __detail::__wait_until_impl(__addr, __args,
+					    __to_wait_clock(__atime));
+	     if (!__res.first)
+	       {
+		  // We got a timeout when measured against __clock_t but
+		  // we need to check against the caller-supplied clock
+		  // to tell whether we should return a timeout.
+		  if (_Clock::now() < __atime)
+		    return make_pair(true, __res.second);
+		}
+	      return __res;
+	    }
+      }
 
-	// returns true if wait ended before timeout
-	template<typename _Pred,
-		 typename _Clock, typename _Dur>
-	  bool
-	  _M_do_wait_until(_Pred __pred, __platform_wait_t __val,
-			  const chrono::time_point<_Clock, _Dur>&
-							      __atime) noexcept
-	  {
-	    for (auto __now = _Clock::now(); __now < __atime;
-		  __now = _Clock::now())
-	      {
-		if (__base_type::_M_w._M_do_wait_until(
-		      __base_type::_M_addr, __val, __atime)
-		    && __pred())
-		  return true;
-
-		if (__base_type::_M_do_spin(__pred, __val,
-			       __timed_backoff_spin_policy(__atime, __now)))
-		  return true;
-	      }
-	    return false;
-	  }
-
-	// returns true if wait ended before timeout
-	template<typename _Pred,
-		 typename _Clock, typename _Dur>
-	  bool
-	  _M_do_wait_until(_Pred __pred,
-			   const chrono::time_point<_Clock, _Dur>&
-								__atime) noexcept
-	  {
-	    __platform_wait_t __val;
-	    if (__base_type::_M_do_spin(__pred, __val,
-					__timed_backoff_spin_policy(__atime)))
-	      return true;
-	    return _M_do_wait_until(__pred, __val, __atime);
-	  }
-
-	template<typename _Tp, typename _ValFn,
-		 typename _Rep, typename _Period>
-	  bool
-	  _M_do_wait_for_v(_Tp __old, _ValFn __vfn,
-			   const chrono::duration<_Rep, _Period>&
-								__rtime) noexcept
-	  {
-	    __platform_wait_t __val;
-	    if (_M_do_spin_v(__old, std::move(__vfn), __val))
-	      return true;
-
-	    if (!__rtime.count())
-	      return false; // no rtime supplied, and spin did not acquire
-
-	    auto __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
-
-	    return __base_type::_M_w._M_do_wait_until(
-					  __base_type::_M_addr,
-					  __val,
-					  chrono::steady_clock::now() + __reltime);
-	  }
-
-	template<typename _Pred,
-		 typename _Rep, typename _Period>
-	  bool
-	  _M_do_wait_for(_Pred __pred,
-			 const chrono::duration<_Rep, _Period>& __rtime) noexcept
-	  {
-	    __platform_wait_t __val;
-	    if (__base_type::_M_do_spin(__pred, __val))
-	      return true;
-
-	    if (!__rtime.count())
-	      return false; // no rtime supplied, and spin did not acquire
-
-	    auto __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
-
-	    return _M_do_wait_until(__pred, __val,
-				    chrono::steady_clock::now() + __reltime);
-	  }
-      };
-
-    using __enters_timed_wait = __timed_waiter<std::true_type>;
-    using __bare_timed_wait = __timed_waiter<std::false_type>;
+    template<typename _Rep, typename _Period>
+      __wait_result_type
+      __wait_for(const __platform_wait_t* __addr, __wait_args __args,
+		 const chrono::duration<_Rep, _Period>& __rtime) noexcept
+    {
+      if (!__rtime.count())
+	// no rtime supplied, just spin a bit
+	return __detail::__wait_impl(__addr, __args | __wait_flags::__spin_only);
+      auto const __reltime = chrono::ceil<__wait_clock_t::duration>(__rtime);
+      auto const __atime = chrono::steady_clock::now() + __reltime;
+      return __detail::__wait_until(__addr, __args, __atime);
+    }
   } // namespace __detail
 
   // returns true if wait ended before timeout
+  template<typename _Tp,
+	   typename _Pred, typename _ValFn,
+	   typename _Clock, typename _Dur>
+    bool
+    __atomic_wait_address_until(const _Tp* __addr, _Pred&& __pred,
+				_ValFn&& __vfn,
+				const chrono::time_point<_Clock, _Dur>& __atime,
+				bool __bare_wait = false) noexcept
+    {
+       const auto __wait_addr =
+	   reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
+       __detail::__wait_args __args{ __addr, __bare_wait };
+       _Tp __val = __vfn();
+       while (!__pred(__val))
+	 {
+	   auto __res = __detail::__wait_until(__wait_addr, __args, __atime);
+	   if (!__res.first)
+	     // timed out
+	     return __res.first; // C++26 will also return last observed __val
+	   __val = __vfn();
+	 }
+       return true; // C++26 will also return last observed __val
+    }
+
+  template<typename _Clock, typename _Dur>
+    bool
+    __atomic_wait_address_until_v(const __detail::__platform_wait_t* __addr,
+				  __detail::__platform_wait_t __old,
+				  int __order,
+				  const chrono::time_point<_Clock, _Dur>& __atime,
+				  bool __bare_wait = false) noexcept
+    {
+      __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
+      auto __res = __detail::__wait_until(__addr, __args, __atime);
+      return __res.first; // C++26 will also return last observed __val
+    }
+
   template<typename _Tp, typename _ValFn,
 	   typename _Clock, typename _Dur>
     bool
     __atomic_wait_address_until_v(const _Tp* __addr, _Tp&& __old, _ValFn&& __vfn,
-			const chrono::time_point<_Clock, _Dur>&
-			    __atime) noexcept
+				  const chrono::time_point<_Clock, _Dur>& __atime,
+				  bool __bare_wait = false) noexcept
     {
-      __detail::__enters_timed_wait __w{__addr};
-      return __w._M_do_wait_until_v(__old, __vfn, __atime);
+       auto __pfn = [&](const _Tp& __val)
+	   { return !__detail::__atomic_compare(__old, __val); };
+       return __atomic_wait_address_until(__addr, __pfn, forward<_ValFn>(__vfn),
+					  __atime, __bare_wait);
     }
 
-  template<typename _Tp, typename _Pred,
-	   typename _Clock, typename _Dur>
-    bool
-    __atomic_wait_address_until(const _Tp* __addr, _Pred __pred,
-				const chrono::time_point<_Clock, _Dur>&
-							      __atime) noexcept
-    {
-      __detail::__enters_timed_wait __w{__addr};
-      return __w._M_do_wait_until(__pred, __atime);
-    }
+  template<typename _Tp,
+	   typename _Pred, typename _ValFn,
+	   typename _Rep, typename _Period>
+   bool
+   __atomic_wait_address_for(const _Tp* __addr, _Pred&& __pred,
+			     _ValFn&& __vfn,
+			     const chrono::duration<_Rep, _Period>& __rtime,
+			     bool __bare_wait = false) noexcept
+  {
+      const auto __wait_addr =
+	  reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
+      __detail::__wait_args __args{ __addr, __bare_wait };
+      _Tp __val = __vfn();
+      while (!__pred(__val))
+	{
+	  auto __res = __detail::__wait_for(__wait_addr, __args, __rtime);
+	  if (!__res.first)
+	    // timed out
+	    return __res.first; // C++26 will also return last observed __val
+	  __val = __vfn();
+	}
+      return true; // C++26 will also return last observed __val
+  }
 
-  template<typename _Pred,
-	   typename _Clock, typename _Dur>
+  template<typename _Rep, typename _Period>
     bool
-    __atomic_wait_address_until_bare(const __detail::__platform_wait_t* __addr,
-				_Pred __pred,
-				const chrono::time_point<_Clock, _Dur>&
-							      __atime) noexcept
-    {
-      __detail::__bare_timed_wait __w{__addr};
-      return __w._M_do_wait_until(__pred, __atime);
-    }
+    __atomic_wait_address_for_v(const __detail::__platform_wait_t* __addr,
+				__detail::__platform_wait_t __old,
+				int __order,
+				const chrono::time_point<_Rep, _Period>& __rtime,
+				bool __bare_wait = false) noexcept
+  {
+    __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
+    auto __res = __detail::__wait_for(__addr, __args, __rtime);
+    return __res.first; // C++26 will also return last observed __Val
+  }
 
   template<typename _Tp, typename _ValFn,
 	   typename _Rep, typename _Period>
     bool
     __atomic_wait_address_for_v(const _Tp* __addr, _Tp&& __old, _ValFn&& __vfn,
-		      const chrono::duration<_Rep, _Period>& __rtime) noexcept
+				const chrono::duration<_Rep, _Period>& __rtime,
+				bool __bare_wait = false) noexcept
     {
-      __detail::__enters_timed_wait __w{__addr};
-      return __w._M_do_wait_for_v(__old, __vfn, __rtime);
-    }
-
-  template<typename _Tp, typename _Pred,
-	   typename _Rep, typename _Period>
-    bool
-    __atomic_wait_address_for(const _Tp* __addr, _Pred __pred,
-		      const chrono::duration<_Rep, _Period>& __rtime) noexcept
-    {
-
-      __detail::__enters_timed_wait __w{__addr};
-      return __w._M_do_wait_for(__pred, __rtime);
-    }
-
-  template<typename _Pred,
-	   typename _Rep, typename _Period>
-    bool
-    __atomic_wait_address_for_bare(const __detail::__platform_wait_t* __addr,
-			_Pred __pred,
-			const chrono::duration<_Rep, _Period>& __rtime) noexcept
-    {
-      __detail::__bare_timed_wait __w{__addr};
-      return __w._M_do_wait_for(__pred, __rtime);
+      auto __pfn = [&](const _Tp& __val)
+	  { return !__detail::__atomic_compare(__old, __val); };
+      return __atomic_wait_address_for(__addr, __pfn, forward<_ValFn>(__vfn),
+				       __rtime, __bare_wait);
     }
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
index 506b7f0cc9d..63d132fc9a8 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -47,7 +47,8 @@ 
 # include <bits/functexcept.h>
 #endif
 
-# include <bits/std_mutex.h>  // std::mutex, std::__condvar
+#include <bits/stl_pair.h>
+#include <bits/std_mutex.h>  // std::mutex, std::__condvar
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -131,7 +132,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     __thread_yield() noexcept
     {
 #if defined _GLIBCXX_HAS_GTHREADS && defined _GLIBCXX_USE_SCHED_YIELD
-     __gthread_yield();
+      __gthread_yield();
 #endif
     }
 
@@ -148,38 +149,6 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     inline constexpr auto __atomic_spin_count_relax = 12;
     inline constexpr auto __atomic_spin_count = 16;
 
-    struct __default_spin_policy
-    {
-      bool
-      operator()() const noexcept
-      { return false; }
-    };
-
-    template<typename _Pred,
-	     typename _Spin = __default_spin_policy>
-      bool
-      __atomic_spin(_Pred& __pred, _Spin __spin = _Spin{ }) noexcept
-      {
-	for (auto __i = 0; __i < __atomic_spin_count; ++__i)
-	  {
-	    if (__pred())
-	      return true;
-
-	    if (__i < __atomic_spin_count_relax)
-	      __detail::__thread_relax();
-	    else
-	      __detail::__thread_yield();
-	  }
-
-	while (__spin())
-	  {
-	    if (__pred())
-	      return true;
-	  }
-
-	return false;
-      }
-
     // return true if equal
     template<typename _Tp>
       bool __atomic_compare(const _Tp& __a, const _Tp& __b)
@@ -188,7 +157,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0;
       }
 
-    struct __waiter_pool_base
+    struct __waiter_pool_impl
     {
       // Don't use std::hardware_destructive_interference_size here because we
       // don't want the layout of library types to depend on compiler options.
@@ -205,7 +174,7 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
       __condvar _M_cv;
 #endif
-      __waiter_pool_base() = default;
+      __waiter_pool_impl() = default;
 
       void
       _M_enter_wait() noexcept
@@ -223,256 +192,271 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return __res != 0;
       }
 
-      void
-      _M_notify(__platform_wait_t* __addr, [[maybe_unused]] bool __all,
-		bool __bare) noexcept
-      {
-#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-	if (__addr == &_M_ver)
-	  {
-	    __atomic_fetch_add(__addr, 1, __ATOMIC_SEQ_CST);
-	    __all = true;
-	  }
-
-	if (__bare || _M_waiting())
-	  __platform_notify(__addr, __all);
-#else
-	{
-	  lock_guard<mutex> __l(_M_mtx);
-	  __atomic_fetch_add(__addr, 1, __ATOMIC_RELAXED);
-	}
-	if (__bare || _M_waiting())
-	  _M_cv.notify_all();
-#endif
-      }
-
-      static __waiter_pool_base&
-      _S_for(const void* __addr) noexcept
+      static __waiter_pool_impl&
+      _S_impl_for(const void* __addr) noexcept
       {
 	constexpr uintptr_t __ct = 16;
-	static __waiter_pool_base __w[__ct];
+	static __waiter_pool_impl __w[__ct];
 	auto __key = (uintptr_t(__addr) >> 2) % __ct;
 	return __w[__key];
       }
     };
 
-    struct __waiter_pool : __waiter_pool_base
+    enum class __wait_flags : uint32_t
     {
-      void
-      _M_do_wait(const __platform_wait_t* __addr, __platform_wait_t __old) noexcept
-      {
-#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-	__platform_wait(__addr, __old);
-#else
-	__platform_wait_t __val;
-	__atomic_load(__addr, &__val, __ATOMIC_SEQ_CST);
-	if (__val == __old)
-	  {
-	    lock_guard<mutex> __l(_M_mtx);
-	    __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
-	    if (__val == __old)
-	      _M_cv.wait(_M_mtx);
-	  }
-#endif // __GLIBCXX_HAVE_PLATFORM_WAIT
-      }
+       __abi_version = 0,
+       __proxy_wait = 1,
+       __track_contention = 2,
+       __do_spin = 4,
+       __spin_only = 8 | __do_spin, // implies __do_spin
+       __abi_version_mask = 0xffff0000,
     };
 
-    template<typename _Tp>
-      struct __waiter_base
+    struct __wait_args
+    {
+      __platform_wait_t _M_old;
+      int _M_order = __ATOMIC_ACQUIRE;
+      __wait_flags _M_flags;
+
+      template<typename _Tp>
+	explicit __wait_args(const _Tp* __addr,
+			     bool __bare_wait = false) noexcept
+	    : _M_flags{ _S_flags_for(__addr, __bare_wait) }
+	{ }
+
+      __wait_args(const __platform_wait_t* __addr, __platform_wait_t __old,
+		  int __order, bool __bare_wait = false) noexcept
+	  : _M_old{ __old }
+	  , _M_order{ __order }
+	  , _M_flags{ _S_flags_for(__addr, __bare_wait) }
+	{ }
+
+      __wait_args(const __wait_args&) noexcept = default;
+      __wait_args&
+      operator=(const __wait_args&) noexcept = default;
+
+      bool
+      operator&(__wait_flags __flag) const noexcept
       {
-	using __waiter_type = _Tp;
+	 using __t = underlying_type_t<__wait_flags>;
+	 return static_cast<__t>(_M_flags)
+	     & static_cast<__t>(__flag);
+      }
 
-	__waiter_type& _M_w;
-	__platform_wait_t* _M_addr;
+      __wait_args
+      operator|(__wait_flags __flag) const noexcept
+      {
+	using __t = underlying_type_t<__wait_flags>;
+	__wait_args __res{ *this };
+	const auto __flags = static_cast<__t>(__res._M_flags)
+			     | static_cast<__t>(__flag);
+	__res._M_flags = __wait_flags{ __flags };
+	return __res;
+      }
 
-	template<typename _Up>
-	  static __platform_wait_t*
-	  _S_wait_addr(const _Up* __a, __platform_wait_t* __b)
-	  {
-	    if constexpr (__platform_wait_uses_type<_Up>)
-	      return reinterpret_cast<__platform_wait_t*>(const_cast<_Up*>(__a));
-	    else
-	      return __b;
-	  }
+    private:
+      static int
+      constexpr _S_default_flags() noexcept
+      {
+	using __t = underlying_type_t<__wait_flags>;
+	return static_cast<__t>(__wait_flags::__abi_version)
+		| static_cast<__t>(__wait_flags::__do_spin);
+      }
 
-	static __waiter_type&
-	_S_for(const void* __addr) noexcept
+      template<typename _Tp>
+	static int
+	constexpr _S_flags_for(const _Tp*, bool __bare_wait) noexcept
 	{
-	  static_assert(sizeof(__waiter_type) == sizeof(__waiter_pool_base));
-	  auto& res = __waiter_pool_base::_S_for(__addr);
-	  return reinterpret_cast<__waiter_type&>(res);
+	  auto __res = _S_default_flags();
+	  if (!__bare_wait)
+	    __res |= static_cast<int>(__wait_flags::__track_contention);
+	  if constexpr (!__platform_wait_uses_type<_Tp>)
+	    __res |= static_cast<int>(__wait_flags::__proxy_wait);
+	  return __res;
 	}
 
-	template<typename _Up>
-	  explicit __waiter_base(const _Up* __addr) noexcept
-	    : _M_w(_S_for(__addr))
-	    , _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
-	  { }
-
-	void
-	_M_notify(bool __all, bool __bare = false) noexcept
-	{ _M_w._M_notify(_M_addr, __all, __bare); }
-
-	template<typename _Up, typename _ValFn,
-		 typename _Spin = __default_spin_policy>
-	  static bool
-	  _S_do_spin_v(__platform_wait_t* __addr,
-		       const _Up& __old, _ValFn __vfn,
-		       __platform_wait_t& __val,
-		       _Spin __spin = _Spin{ })
-	  {
-	    auto const __pred = [=]
-	      { return !__detail::__atomic_compare(__old, __vfn()); };
-
-	    if constexpr (__platform_wait_uses_type<_Up>)
-	      {
-		__builtin_memcpy(&__val, &__old, sizeof(__val));
-	      }
-	    else
-	      {
-		__atomic_load(__addr, &__val, __ATOMIC_ACQUIRE);
-	      }
-	    return __atomic_spin(__pred, __spin);
-	  }
-
-	template<typename _Up, typename _ValFn,
-		 typename _Spin = __default_spin_policy>
-	  bool
-	  _M_do_spin_v(const _Up& __old, _ValFn __vfn,
-		       __platform_wait_t& __val,
-		       _Spin __spin = _Spin{ })
-	  { return _S_do_spin_v(_M_addr, __old, __vfn, __val, __spin); }
-
-	template<typename _Pred,
-		 typename _Spin = __default_spin_policy>
-	  static bool
-	  _S_do_spin(const __platform_wait_t* __addr,
-		     _Pred __pred,
-		     __platform_wait_t& __val,
-		     _Spin __spin = _Spin{ })
-	  {
-	    __atomic_load(__addr, &__val, __ATOMIC_ACQUIRE);
-	    return __atomic_spin(__pred, __spin);
-	  }
-
-	template<typename _Pred,
-		 typename _Spin = __default_spin_policy>
-	  bool
-	  _M_do_spin(_Pred __pred, __platform_wait_t& __val,
-		     _Spin __spin = _Spin{ })
-	  { return _S_do_spin(_M_addr, __pred, __val, __spin); }
-      };
-
-    template<typename _EntersWait>
-      struct __waiter : __waiter_base<__waiter_pool>
-      {
-	using __base_type = __waiter_base<__waiter_pool>;
-
-	template<typename _Tp>
-	  explicit __waiter(const _Tp* __addr) noexcept
-	    : __base_type(__addr)
-	  {
-	    if constexpr (_EntersWait::value)
-	      _M_w._M_enter_wait();
-	  }
-
-	~__waiter()
+      template<typename _Tp>
+	static int
+	_S_memory_order_for(const _Tp*, int __order) noexcept
 	{
-	  if constexpr (_EntersWait::value)
-	    _M_w._M_leave_wait();
+	  if constexpr (__platform_wait_uses_type<_Tp>)
+	    return __order;
+	  return __ATOMIC_ACQUIRE;
+	}
+    };
+
+    using __wait_result_type = pair<bool, __platform_wait_t>;
+    inline __wait_result_type
+    __spin_impl(const __platform_wait_t* __addr, __wait_args __args)
+    {
+      __platform_wait_t __val;
+      for (auto __i = 0; __i < __atomic_spin_count; ++__i)
+	{
+	  __atomic_load(__addr, &__val, __args._M_order);
+	  if (__val != __args._M_old)
+	    return make_pair(true, __val);
+	  if (__i < __atomic_spin_count_relax)
+	    __detail::__thread_relax();
+	  else
+	    __detail::__thread_yield();
+	}
+      return make_pair(false, __val);
+    }
+
+    inline __wait_result_type
+    __wait_impl(const __platform_wait_t* __addr, __wait_args __args)
+    {
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+      __waiter_pool_impl* __pool = nullptr;
+#else
+      // if we don't have __platform_wait, we always need the side-table
+      __waiter_pool_impl* __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+
+      __platform_wait_t* __wait_addr;
+      if (__args & __wait_flags::__proxy_wait)
+	{
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+	  __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+	  __wait_addr = &__pool->_M_ver;
+	  __atomic_load(__wait_addr, &__args._M_old, __args._M_order);
+	}
+      else
+	__wait_addr = const_cast<__platform_wait_t*>(__addr);
+
+      if (__args & __wait_flags::__do_spin)
+	{
+	  auto __res = __detail::__spin_impl(__wait_addr, __args);
+	  if (__res.first)
+	    return __res;
+	  if (__args & __wait_flags::__spin_only)
+	    return __res;
 	}
 
-	template<typename _Tp, typename _ValFn>
-	  void
-	  _M_do_wait_v(_Tp __old, _ValFn __vfn)
-	  {
-	    do
-	      {
-		__platform_wait_t __val;
-		if (__base_type::_M_do_spin_v(__old, __vfn, __val))
-		  return;
-		__base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
-	      }
-	    while (__detail::__atomic_compare(__old, __vfn()));
-	  }
+      if (!(__args & __wait_flags::__track_contention))
+	{
+	  // caller does not externally track contention
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+	  __pool = (__pool == nullptr) ? &__waiter_pool_impl::_S_impl_for(__addr)
+				       : __pool;
+#endif
+	  __pool->_M_enter_wait();
+	}
 
-	template<typename _Pred>
-	  void
-	  _M_do_wait(_Pred __pred) noexcept
-	  {
-	    do
-	      {
-		__platform_wait_t __val;
-		if (__base_type::_M_do_spin(__pred, __val))
-		  return;
-		__base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
-	      }
-	    while (!__pred());
-	  }
-      };
+      __wait_result_type __res;
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+      __platform_wait(__wait_addr, __args._M_old);
+      __res = make_pair(false, __args._M_old);
+#else
+      __platform_wait_t __val;
+      __atomic_load(__wait_addr, &__val, __args._M_order);
+      if (__val == __args._M_old)
+	{
+	    lock_guard<mutex> __l{ __pool->_M_mtx };
+	    __atomic_load(__wait_addr, &__val, __args._M_order);
+	    if (__val == __args._M_old)
+		__pool->_M_cv.wait(__pool->_M_mtx);
+	}
+      __res = make_pair(false, __val);
+#endif
 
-    using __enters_wait = __waiter<std::true_type>;
-    using __bare_wait = __waiter<std::false_type>;
+      if (!(__args & __wait_flags::__track_contention))
+	// caller does not externally track contention
+	__pool->_M_leave_wait();
+      return __res;
+    }
+
+    inline void
+    __notify_impl(const __platform_wait_t* __addr, [[maybe_unused]] bool __all,
+		  __wait_args __args)
+    {
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+      __waiter_pool_impl* __pool = nullptr;
+#else
+      // if we don't have __platform_notify, we always need the side-table
+      __waiter_pool_impl* __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+
+      if (!(__args & __wait_flags::__track_contention))
+	{
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+	  __pool = &__waiter_pool_impl::_S_impl_for(__addr);
+#endif
+	  if (!__pool->_M_waiting())
+	    return;
+	}
+
+      __platform_wait_t* __wait_addr;
+      if (__args & __wait_flags::__proxy_wait)
+	{
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+	   __pool = (__pool == nullptr) ? &__waiter_pool_impl::_S_impl_for(__addr)
+					: __pool;
+#endif
+	   __wait_addr = &__pool->_M_ver;
+	   __atomic_fetch_add(__wait_addr, 1, __ATOMIC_RELAXED);
+	   __all = true;
+	 }
+
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+      __platform_notify(__wait_addr, __all);
+#else
+      lock_guard<mutex> __l{ __pool->_M_mtx };
+      __pool->_M_cv.notify_all();
+#endif
+    }
   } // namespace __detail
 
+  template<typename _Tp,
+	   typename _Pred, typename _ValFn>
+    void
+    __atomic_wait_address(const _Tp* __addr,
+			  _Pred&& __pred, _ValFn&& __vfn,
+			  bool __bare_wait = false) noexcept
+    {
+      const auto __wait_addr =
+	  reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
+      __detail::__wait_args __args{ __addr, __bare_wait };
+      _Tp __val = __vfn();
+      while (!__pred(__val))
+	{
+	  __detail::__wait_impl(__wait_addr, __args);
+	  __val = __vfn();
+	}
+      // C++26 will return __val
+    }
+
+  inline void
+  __atomic_wait_address_v(const __detail::__platform_wait_t* __addr,
+			  __detail::__platform_wait_t __old,
+			  int __order)
+  {
+    __detail::__wait_args __args{ __addr, __old, __order };
+    // C++26 will not ignore the return value here
+    __detail::__wait_impl(__addr, __args);
+  }
+
   template<typename _Tp, typename _ValFn>
     void
     __atomic_wait_address_v(const _Tp* __addr, _Tp __old,
 			    _ValFn __vfn) noexcept
     {
-      __detail::__enters_wait __w(__addr);
-      __w._M_do_wait_v(__old, __vfn);
-    }
-
-  template<typename _Tp, typename _Pred>
-    void
-    __atomic_wait_address(const _Tp* __addr, _Pred __pred) noexcept
-    {
-      __detail::__enters_wait __w(__addr);
-      __w._M_do_wait(__pred);
-    }
-
-  // This call is to be used by atomic types which track contention externally
-  template<typename _Pred>
-    void
-    __atomic_wait_address_bare(const __detail::__platform_wait_t* __addr,
-			       _Pred __pred) noexcept
-    {
-#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-      do
-	{
-	  __detail::__platform_wait_t __val;
-	  if (__detail::__bare_wait::_S_do_spin(__addr, __pred, __val))
-	    return;
-	  __detail::__platform_wait(__addr, __val);
-	}
-      while (!__pred());
-#else // !_GLIBCXX_HAVE_PLATFORM_WAIT
-      __detail::__bare_wait __w(__addr);
-      __w._M_do_wait(__pred);
-#endif
+      auto __pfn = [&](const _Tp& __val)
+	  { return !__detail::__atomic_compare(__old, __val); };
+      __atomic_wait_address(__addr, __pfn, forward<_ValFn>(__vfn));
     }
 
   template<typename _Tp>
     void
-    __atomic_notify_address(const _Tp* __addr, bool __all) noexcept
+    __atomic_notify_address(const _Tp* __addr, bool __all,
+			    bool __bare_wait = false) noexcept
     {
-      __detail::__bare_wait __w(__addr);
-      __w._M_notify(__all);
+      const auto __wait_addr =
+	  reinterpret_cast<const __detail::__platform_wait_t*>(__addr);
+      __detail::__wait_args __args{ __addr, __bare_wait };
+      __detail::__notify_impl(__wait_addr, __all, __args);
     }
-
-  // This call is to be used by atomic types which track contention externally
-  inline void
-  __atomic_notify_address_bare(const __detail::__platform_wait_t* __addr,
-			       bool __all) noexcept
-  {
-#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-    __detail::__platform_notify(__addr, __all);
-#else
-    __detail::__bare_wait __w(__addr);
-    __w._M_notify(__all, true);
-#endif
-  }
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 #endif // __glibcxx_atomic_wait
diff --git a/libstdc++-v3/include/bits/semaphore_base.h b/libstdc++-v3/include/bits/semaphore_base.h
index 7e07b308cdc..f137c9df681 100644
--- a/libstdc++-v3/include/bits/semaphore_base.h
+++ b/libstdc++-v3/include/bits/semaphore_base.h
@@ -194,10 +194,16 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     __atomic_semaphore(const __atomic_semaphore&) = delete;
     __atomic_semaphore& operator=(const __atomic_semaphore&) = delete;
 
-    static _GLIBCXX_ALWAYS_INLINE bool
-    _S_do_try_acquire(__detail::__platform_wait_t* __counter) noexcept
+    static _GLIBCXX_ALWAYS_INLINE __detail::__platform_wait_t
+    _S_get_current(__detail::__platform_wait_t* __counter) noexcept
+    {
+      return __atomic_impl::load(__counter, memory_order::acquire);
+    }
+
+    static _GLIBCXX_ALWAYS_INLINE bool
+    _S_do_try_acquire(__detail::__platform_wait_t* __counter,
+		      __detail::__platform_wait_t __old) noexcept
     {
-      auto __old = __atomic_impl::load(__counter, memory_order::acquire);
       if (__old == 0)
 	return false;
 
@@ -210,17 +216,21 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _GLIBCXX_ALWAYS_INLINE void
     _M_acquire() noexcept
     {
-      auto const __pred =
-	[this] { return _S_do_try_acquire(&this->_M_counter); };
-      std::__atomic_wait_address_bare(&_M_counter, __pred);
+      auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
+      auto const __pred = [this](__detail::__platform_wait_t __cur)
+	{ return _S_do_try_acquire(&this->_M_counter, __cur); };
+      std::__atomic_wait_address(&_M_counter, __pred, __vfn, true);
     }
 
     bool
     _M_try_acquire() noexcept
     {
-      auto const __pred =
-	[this] { return _S_do_try_acquire(&this->_M_counter); };
-      return std::__detail::__atomic_spin(__pred);
+      auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
+      auto const __pred = [this](__detail::__platform_wait_t __cur)
+	{ return _S_do_try_acquire(&this->_M_counter, __cur); };
+      return __atomic_wait_address_for(&_M_counter, __pred, __vfn,
+					 __detail::__wait_clock_t::duration(),
+					 true);
     }
 
     template<typename _Clock, typename _Duration>
@@ -228,21 +238,22 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _M_try_acquire_until(const chrono::time_point<_Clock,
 			   _Duration>& __atime) noexcept
       {
-	auto const __pred =
-	  [this] { return _S_do_try_acquire(&this->_M_counter); };
-
-	return __atomic_wait_address_until_bare(&_M_counter, __pred, __atime);
+	auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
+	auto const __pred = [this](__detail::__platform_wait_t __cur)
+	 { return _S_do_try_acquire(&this->_M_counter, __cur); };
+	return std::__atomic_wait_address_until(&_M_counter,
+						__pred, __vfn, __atime, true);
       }
 
     template<typename _Rep, typename _Period>
       _GLIBCXX_ALWAYS_INLINE bool
-      _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime)
-	noexcept
+      _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime) noexcept
       {
-	auto const __pred =
-	  [this] { return _S_do_try_acquire(&this->_M_counter); };
-
-	return __atomic_wait_address_for_bare(&_M_counter, __pred, __rtime);
+	auto const __vfn = [this]{ return _S_get_current(&this->_M_counter); };
+	auto const __pred = [this](__detail::__platform_wait_t __cur)
+	 { return _S_do_try_acquire(&this->_M_counter, __cur); };
+	return std::__atomic_wait_address_for(&_M_counter,
+					      __pred, __vfn, __rtime, true);
       }
 
     _GLIBCXX_ALWAYS_INLINE void
@@ -251,9 +262,9 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
       if (0 < __atomic_impl::fetch_add(&_M_counter, __update, memory_order_release))
 	return;
       if (__update > 1)
-	__atomic_notify_address_bare(&_M_counter, true);
+	__atomic_notify_address(&_M_counter, true, true);
       else
-	__atomic_notify_address_bare(&_M_counter, true);
+	__atomic_notify_address(&_M_counter, true, true);
 // FIXME - Figure out why this does not wake a waiting thread
 //	__atomic_notify_address_bare(&_M_counter, false);
     }
diff --git a/libstdc++-v3/include/std/barrier b/libstdc++-v3/include/std/barrier
index 8e03a5820c6..08286fd95d2 100644
--- a/libstdc++-v3/include/std/barrier
+++ b/libstdc++-v3/include/std/barrier
@@ -192,11 +192,7 @@  It looks different from literature pseudocode for two main reasons:
       wait(arrival_token&& __old_phase) const
       {
 	__atomic_phase_const_ref_t __phase(_M_phase);
-	auto const __test_fn = [=]
-	  {
-	    return __phase.load(memory_order_acquire) != __old_phase;
-	  };
-	std::__atomic_wait_address(&_M_phase, __test_fn);
+	__phase.wait(__old_phase, memory_order_acquire);
       }
 
       void
diff --git a/libstdc++-v3/include/std/latch b/libstdc++-v3/include/std/latch
index 27cd80d7100..525647c1701 100644
--- a/libstdc++-v3/include/std/latch
+++ b/libstdc++-v3/include/std/latch
@@ -74,8 +74,9 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _GLIBCXX_ALWAYS_INLINE void
     wait() const noexcept
     {
-      auto const __pred = [this] { return this->try_wait(); };
-      std::__atomic_wait_address(&_M_a, __pred);
+        auto const __vfn = [this] { return this->try_wait(); };
+        auto const __pred = [this](bool __b) { return __b; };
+        std::__atomic_wait_address(&_M_a, __pred, __vfn);
     }
 
     _GLIBCXX_ALWAYS_INLINE void
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc
index 78e718e4597..aa1d57d1457 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc
@@ -54,8 +54,8 @@  main()
     atom->store(0);
   }
 
-  auto a = &std::__detail::__waiter_pool_base::_S_for(reinterpret_cast<char *>(atomics.a[0]));
-  auto b = &std::__detail::__waiter_pool_base::_S_for(reinterpret_cast<char *>(atomics.a[1]));
+  auto a = &std::__detail::__waiter_pool_impl::_S_impl_for(reinterpret_cast<char *>(atomics.a[0]));
+  auto b = &std::__detail::__waiter_pool_impl::_S_impl_for(reinterpret_cast<char *>(atomics.a[1]));
   VERIFY( a == b );
 
   auto fut0 = std::async(std::launch::async, [&] { atomics.a[0]->wait(0); });