On Fri, Aug 18, 2023 at 11:18:19AM +0800, Zhu, Lipeng wrote:
> From: Lipeng Zhu <lipeng.zhu@intel.com>
>
> This patch try to introduce the rwlock and split the read/write to
> unit_root tree and unit_cache with rwlock instead of the mutex to
> increase CPU efficiency. In the get_gfc_unit function, the percentage
> to step into the insert_unit function is around 30%, in most instances,
> we can get the unit in the phase of reading the unit_cache or unit_root
> tree. So split the read/write phase by rwlock would be an approach to
> make it more parallel.
>
> BTW, the IPC metrics can gain around 9x in our test
> server with 220 cores. The benchmark we used is
> https://github.com/rwesson/NEAT
>
> libgcc/ChangeLog:
>
> * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro
> (__gthrw): New function
> (__gthread_rwlock_rdlock): New function
> (__gthread_rwlock_tryrdlock): New function
> (__gthread_rwlock_wrlock): New function
> (__gthread_rwlock_trywrlock): New function
> (__gthread_rwlock_unlock): New function
>
> libgfortran/ChangeLog:
>
> * io/async.c (DEBUG_LINE): New
> * io/async.h (RWLOCK_DEBUG_ADD): New macro
> (CHECK_RDLOCK): New macro
> (CHECK_WRLOCK): New macro
> (TAIL_RWLOCK_DEBUG_QUEUE): New macro
> (IN_RWLOCK_DEBUG_QUEUE): New macro
> (RDLOCK): New macro
> (WRLOCK): New macro
> (RWUNLOCK): New macro
> (RD_TO_WRLOCK): New macro
> (INTERN_RDLOCK): New macro
> (INTERN_WRLOCK): New macro
> (INTERN_RWUNLOCK): New macro
> * io/io.h (internal_proto): Define unit_rwlock
> * io/transfer.c (st_read_done_worker): Relace unit_lock with unit_rwlock
> (st_write_done_worker): Relace unit_lock with unit_rwlock
> * io/unit.c (get_gfc_unit): Relace unit_lock with unit_rwlock
> (if): Relace unit_lock with unit_rwlock
> (close_unit_1): Relace unit_lock with unit_rwlock
> (close_units): Relace unit_lock with unit_rwlock
> (newunit_alloc): Relace unit_lock with unit_rwlock
> * io/unix.c (flush_all_units): Relace unit_lock with unit_rwlock
The changeLog entries are all incorrect.
1) they should be indented by a tab, not 4 spaces, and should end with
a dot
2) when several consecutive descriptions have the same text, especially
when it is long, it should use Likewise. for the 2nd and following
3) (internal_proto) is certainly not what you've changed, from what I can
see in io.h you've done:
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
(if) is certainly not what you've changed either, always find what
function or macro the change was in, or if you remove something, state
it, if you add something, state it.
4) all the
Replace unit_lock with unit_rwlock. descriptions only partially match
reality, you've also changed the operations on those variables.
> --- a/libgfortran/io/async.h
> +++ b/libgfortran/io/async.h
> @@ -207,9 +207,132 @@
> INTERN_LOCK (&debug_queue_lock); \
> MUTEX_DEBUG_ADD (mutex); \
> INTERN_UNLOCK (&debug_queue_lock); \
> - DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #mutex, mutex); \
> + DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #mutex, \
> + mutex); \
Why are you changing this at all?
> +#define RD_TO_WRLOCK(rwlock) \
> + RWUNLOCK (rwlock);\
At least a space before \ (or better tab
> +#define RD_TO_WRLOCK(rwlock) \
> + RWUNLOCK (rwlock);\
Likewise.
> + WRLOCK (rwlock);
> +#endif
> +#endif
> +
> +#ifndef __GTHREAD_RWLOCK_INIT
> +#define RDLOCK(rwlock) LOCK (rwlock)
> +#define WRLOCK(rwlock) LOCK (rwlock)
> +#define RWUNLOCK(rwlock) UNLOCK (rwlock)
> +#define RD_TO_WRLOCK(rwlock) {}
do {} while (0)
instead of {}
?
> #endif
>
> #define INTERN_LOCK(mutex) T_ERROR (__gthread_mutex_lock, mutex);
>
> #define INTERN_UNLOCK(mutex) T_ERROR (__gthread_mutex_unlock, mutex);
>
> +#define INTERN_RDLOCK(rwlock) T_ERROR (__gthread_rwlock_rdlock, rwlock);
> +#define INTERN_WRLOCK(rwlock) T_ERROR (__gthread_rwlock_wrlock, rwlock);
> +#define INTERN_RWUNLOCK(rwlock) T_ERROR (__gthread_rwlock_unlock, rwlock);
Admittedly preexisting issue, but I wonder why the ; at the end, all the
uses of RDLOCK etc. I've seen were followed by ;
> --- a/libgfortran/io/unit.c
> +++ b/libgfortran/io/unit.c
> @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
>
>
> /* IO locking rules:
> - UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> + UNIT_RWLOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
master rw lock
?
> +/* get_gfc_unit_from_root()-- Given an integer, return a pointer
> + to the unit structure. Returns NULL if the unit does not exist,
> + otherwise returns a locked unit. */
Formatting, to is indented differently from otherwise, both should
be 3 spaces, and there should be 2 spaces after ., rather than just
one (in both spots).
> +
> +static inline gfc_unit * get_gfc_unit_from_unit_root (int n)
Formatting, there shouldn't be space after *, but also function
name should be at the beginning of line, so in this case it should
be
static inline gfc_unit *
get_gfc_unit_from_unit_root (int n)
> +{
> + gfc_unit *p;
> + int c = 0;
> + p = unit_root;
This should be indented by 2 spaces rather than 4.
> + while (p != NULL)
> + {
> + c = compare (n, p->unit_number);
> + if (c < 0)
> + p = p->left;
> + if (c > 0)
> + p = p->right;
> + if (c == 0)
> + break;
> + }
> + return p;
And the above return p; line as well.
Jakub
On 2023/12/8 18:19, Jakub Jelinek wrote:
> On Fri, Aug 18, 2023 at 11:18:19AM +0800, Zhu, Lipeng wrote:
> > From: Lipeng Zhu <lipeng.zhu@intel.com>
> >
> > This patch try to introduce the rwlock and split the read/write to
> > unit_root tree and unit_cache with rwlock instead of the mutex to
> > increase CPU efficiency. In the get_gfc_unit function, the percentage
> > to step into the insert_unit function is around 30%, in most
> > instances, we can get the unit in the phase of reading the unit_cache
> > or unit_root tree. So split the read/write phase by rwlock would be an
> > approach to make it more parallel.
> >
> > BTW, the IPC metrics can gain around 9x in our test server with 220
> > cores. The benchmark we used is https://github.com/rwesson/NEAT
> >
> > libgcc/ChangeLog:
> >
> > * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro
> > (__gthrw): New function
> > (__gthread_rwlock_rdlock): New function
> > (__gthread_rwlock_tryrdlock): New function
> > (__gthread_rwlock_wrlock): New function
> > (__gthread_rwlock_trywrlock): New function
> > (__gthread_rwlock_unlock): New function
> >
> > libgfortran/ChangeLog:
> >
> > * io/async.c (DEBUG_LINE): New
> > * io/async.h (RWLOCK_DEBUG_ADD): New macro
> > (CHECK_RDLOCK): New macro
> > (CHECK_WRLOCK): New macro
> > (TAIL_RWLOCK_DEBUG_QUEUE): New macro
> > (IN_RWLOCK_DEBUG_QUEUE): New macro
> > (RDLOCK): New macro
> > (WRLOCK): New macro
> > (RWUNLOCK): New macro
> > (RD_TO_WRLOCK): New macro
> > (INTERN_RDLOCK): New macro
> > (INTERN_WRLOCK): New macro
> > (INTERN_RWUNLOCK): New macro
> > * io/io.h (internal_proto): Define unit_rwlock
> > * io/transfer.c (st_read_done_worker): Relace unit_lock with unit_rwlock
> > (st_write_done_worker): Relace unit_lock with unit_rwlock
> > * io/unit.c (get_gfc_unit): Relace unit_lock with unit_rwlock
> > (if): Relace unit_lock with unit_rwlock
> > (close_unit_1): Relace unit_lock with unit_rwlock
> > (close_units): Relace unit_lock with unit_rwlock
> > (newunit_alloc): Relace unit_lock with unit_rwlock
> > * io/unix.c (flush_all_units): Relace unit_lock with unit_rwlock
>
> The changeLog entries are all incorrect.
> 1) they should be indented by a tab, not 4 spaces, and should end with
> a dot
> 2) when several consecutive descriptions have the same text, especially
> when it is long, it should use Likewise. for the 2nd and following
> 3) (internal_proto) is certainly not what you've changed, from what I can
> see in io.h you've done:
> * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
> a comment.
> (unit_lock): Remove including associated internal_proto.
> (unit_rwlock): New declarations including associated internal_proto.
> (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
> instead of __gthread_mutex_lock and __gthread_mutex_unlock on
> unit_lock.
> (if) is certainly not what you've changed either, always find what
> function or macro the change was in, or if you remove something, state
> it, if you add something, state it.
> 4) all the
> Replace unit_lock with unit_rwlock. descriptions only partially match
> reality, you've also changed the operations on those variables.
>
Hi Jakub,
Thanks for your help, very appreciated! I just updated the patch according to
your comments. A new version [PATCH V7] is sent out for your review which
update the change log and formatting code according to your following
comments.
Lipeng Zhu
> > --- a/libgfortran/io/async.h
> > +++ b/libgfortran/io/async.h
> > @@ -207,9 +207,132 @@
> > INTERN_LOCK (&debug_queue_lock);
> \
> > MUTEX_DEBUG_ADD (mutex);
> \
> > INTERN_UNLOCK (&debug_queue_lock);
> \
> > - DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-
> 30s %78p\n", aio_prefix, #mutex, mutex); \
> > + DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-
> 30s %78p\n", aio_prefix, #mutex, \
> > + mutex); \
>
> Why are you changing this at all?
>
> > +#define RD_TO_WRLOCK(rwlock) \
> > + RWUNLOCK (rwlock);\
>
> At least a space before \ (or better tab
>
> > +#define RD_TO_WRLOCK(rwlock) \
> > + RWUNLOCK (rwlock);\
>
> Likewise.
>
> > + WRLOCK (rwlock);
> > +#endif
> > +#endif
> > +
> > +#ifndef __GTHREAD_RWLOCK_INIT
> > +#define RDLOCK(rwlock) LOCK (rwlock)
> > +#define WRLOCK(rwlock) LOCK (rwlock)
> > +#define RWUNLOCK(rwlock) UNLOCK (rwlock) #define
> RD_TO_WRLOCK(rwlock)
> > +{}
>
> do {} while (0)
> instead of {}
> ?
>
> > #endif
> >
> > #define INTERN_LOCK(mutex) T_ERROR (__gthread_mutex_lock, mutex);
> >
> > #define INTERN_UNLOCK(mutex) T_ERROR (__gthread_mutex_unlock,
> mutex);
> >
> > +#define INTERN_RDLOCK(rwlock) T_ERROR (__gthread_rwlock_rdlock,
> > +rwlock); #define INTERN_WRLOCK(rwlock) T_ERROR
> > +(__gthread_rwlock_wrlock, rwlock); #define INTERN_RWUNLOCK(rwlock)
> > +T_ERROR (__gthread_rwlock_unlock, rwlock);
>
> Admittedly preexisting issue, but I wonder why the ; at the end, all the uses of
> RDLOCK etc. I've seen were followed by ;
>
> > --- a/libgfortran/io/unit.c
> > +++ b/libgfortran/io/unit.c
> > @@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME
> > respectively. If not, see
> >
> >
> > /* IO locking rules:
> > - UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
> > + UNIT_RWLOCK is a master lock, protecting UNIT_ROOT tree and
> UNIT_CACHE.
>
> master rw lock
> ?
>
> > +/* get_gfc_unit_from_root()-- Given an integer, return a pointer
> > + to the unit structure. Returns NULL if the unit does not exist,
> > + otherwise returns a locked unit. */
>
> Formatting, to is indented differently from otherwise, both should be 3 spaces,
> and there should be 2 spaces after ., rather than just one (in both spots).
>
> > +
> > +static inline gfc_unit * get_gfc_unit_from_unit_root (int n)
>
> Formatting, there shouldn't be space after *, but also function name should be
> at the beginning of line, so in this case it should be static inline gfc_unit *
> get_gfc_unit_from_unit_root (int n)
>
> > +{
> > + gfc_unit *p;
> > + int c = 0;
> > + p = unit_root;
>
> This should be indented by 2 spaces rather than 4.
>
> > + while (p != NULL)
> > + {
> > + c = compare (n, p->unit_number);
> > + if (c < 0)
> > + p = p->left;
> > + if (c > 0)
> > + p = p->right;
> > + if (c == 0)
> > + break;
> > + }
> > + return p;
>
> And the above return p; line as well.
>
> Jakub
@@ -48,6 +48,9 @@ typedef pthread_t __gthread_t;
typedef pthread_key_t __gthread_key_t;
typedef pthread_once_t __gthread_once_t;
typedef pthread_mutex_t __gthread_mutex_t;
+#ifndef __cplusplus
+typedef pthread_rwlock_t __gthread_rwlock_t;
+#endif
typedef pthread_mutex_t __gthread_recursive_mutex_t;
typedef pthread_cond_t __gthread_cond_t;
typedef struct timespec __gthread_time_t;
@@ -58,6 +61,9 @@ typedef struct timespec __gthread_time_t;
#define __GTHREAD_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER
#define __GTHREAD_MUTEX_INIT_FUNCTION __gthread_mutex_init_function
+#ifndef __cplusplus
+#define __GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER
+#endif
#define __GTHREAD_ONCE_INIT PTHREAD_ONCE_INIT
#if defined(PTHREAD_RECURSIVE_MUTEX_INITIALIZER)
#define __GTHREAD_RECURSIVE_MUTEX_INIT PTHREAD_RECURSIVE_MUTEX_INITIALIZER
@@ -135,6 +141,13 @@ __gthrw(pthread_mutexattr_init)
__gthrw(pthread_mutexattr_settype)
__gthrw(pthread_mutexattr_destroy)
+#ifndef __cplusplus
+__gthrw(pthread_rwlock_rdlock)
+__gthrw(pthread_rwlock_tryrdlock)
+__gthrw(pthread_rwlock_wrlock)
+__gthrw(pthread_rwlock_trywrlock)
+__gthrw(pthread_rwlock_unlock)
+#endif
#if defined(_LIBOBJC) || defined(_LIBOBJC_WEAK)
/* Objective-C. */
@@ -885,6 +898,53 @@ __gthread_cond_destroy (__gthread_cond_t* __cond)
return __gthrw_(pthread_cond_destroy) (__cond);
}
+#ifndef __cplusplus
+static inline int
+__gthread_rwlock_rdlock (__gthread_rwlock_t *__rwlock)
+{
+ if (__gthread_active_p ())
+ return __gthrw_(pthread_rwlock_rdlock) (__rwlock);
+ else
+ return 0;
+}
+
+static inline int
+__gthread_rwlock_tryrdlock (__gthread_rwlock_t *__rwlock)
+{
+ if (__gthread_active_p ())
+ return __gthrw_(pthread_rwlock_tryrdlock) (__rwlock);
+ else
+ return 0;
+}
+
+static inline int
+__gthread_rwlock_wrlock (__gthread_rwlock_t *__rwlock)
+{
+ if (__gthread_active_p ())
+ return __gthrw_(pthread_rwlock_wrlock) (__rwlock);
+ else
+ return 0;
+}
+
+static inline int
+__gthread_rwlock_trywrlock (__gthread_rwlock_t *__rwlock)
+{
+ if (__gthread_active_p ())
+ return __gthrw_(pthread_rwlock_trywrlock) (__rwlock);
+ else
+ return 0;
+}
+
+static inline int
+__gthread_rwlock_unlock (__gthread_rwlock_t *__rwlock)
+{
+ if (__gthread_active_p ())
+ return __gthrw_(pthread_rwlock_unlock) (__rwlock);
+ else
+ return 0;
+}
+#endif
+
#endif /* _LIBOBJC */
#endif /* ! GCC_GTHR_POSIX_H */
@@ -42,6 +42,10 @@ DEBUG_LINE (__thread const char *aio_prefix = MPREFIX);
DEBUG_LINE (__gthread_mutex_t debug_queue_lock = __GTHREAD_MUTEX_INIT;)
DEBUG_LINE (aio_lock_debug *aio_debug_head = NULL;)
+#ifdef __GTHREAD_RWLOCK_INIT
+DEBUG_LINE (aio_rwlock_debug *aio_rwlock_debug_head = NULL;)
+DEBUG_LINE (__gthread_rwlock_t debug_queue_rwlock = __GTHREAD_RWLOCK_INIT;)
+#endif
/* Current unit for asynchronous I/O. Needed for error reporting. */
@@ -207,9 +207,132 @@
INTERN_LOCK (&debug_queue_lock); \
MUTEX_DEBUG_ADD (mutex); \
INTERN_UNLOCK (&debug_queue_lock); \
- DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #mutex, mutex); \
+ DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #mutex, \
+ mutex); \
} while (0)
+#ifdef __GTHREAD_RWLOCK_INIT
+#define RWLOCK_DEBUG_ADD(rwlock) do { \
+ aio_rwlock_debug *n; \
+ n = xmalloc (sizeof (aio_rwlock_debug)); \
+ n->prev = TAIL_RWLOCK_DEBUG_QUEUE; \
+ if (n->prev) \
+ n->prev->next = n; \
+ n->next = NULL; \
+ n->line = __LINE__; \
+ n->func = __FUNCTION__; \
+ n->rw = rwlock; \
+ if (!aio_rwlock_debug_head) { \
+ aio_rwlock_debug_head = n; \
+ } \
+ } while (0)
+
+#define CHECK_RDLOCK(rwlock, status) do { \
+ aio_rwlock_debug *curr; \
+ INTERN_WRLOCK (&debug_queue_rwlock); \
+ if (__gthread_rwlock_tryrdlock (rwlock)) { \
+ if ((curr = IN_RWLOCK_DEBUG_QUEUE (rwlock))) { \
+ sprintf (status, DEBUG_RED "%s():%d" DEBUG_NORM, curr->func, curr->line); \
+ } else \
+ sprintf (status, DEBUG_RED "unknown" DEBUG_NORM); \
+ } \
+ else { \
+ __gthread_rwlock_unlock (rwlock); \
+ sprintf (status, DEBUG_GREEN "rwunlocked" DEBUG_NORM); \
+ } \
+ INTERN_RWUNLOCK (&debug_queue_rwlock); \
+ }while (0)
+
+#define CHECK_WRLOCK(rwlock, status) do { \
+ aio_rwlock_debug *curr; \
+ INTERN_WRLOCK (&debug_queue_rwlock); \
+ if (__gthread_rwlock_trywrlock (rwlock)) { \
+ if ((curr = IN_RWLOCK_DEBUG_QUEUE (rwlock))) { \
+ sprintf (status, DEBUG_RED "%s():%d" DEBUG_NORM, curr->func, curr->line); \
+ } else \
+ sprintf (status, DEBUG_RED "unknown" DEBUG_NORM); \
+ } \
+ else { \
+ __gthread_rwlock_unlock (rwlock); \
+ sprintf (status, DEBUG_GREEN "rwunlocked" DEBUG_NORM); \
+ } \
+ INTERN_RWUNLOCK (&debug_queue_rwlock); \
+ }while (0)
+
+#define TAIL_RWLOCK_DEBUG_QUEUE ({ \
+ aio_rwlock_debug *curr = aio_rwlock_debug_head; \
+ while (curr && curr->next) { \
+ curr = curr->next; \
+ } \
+ curr; \
+ })
+
+#define IN_RWLOCK_DEBUG_QUEUE(rwlock) ({ \
+ __label__ end; \
+ aio_rwlock_debug *curr = aio_rwlock_debug_head; \
+ while (curr) { \
+ if (curr->rw == rwlock) { \
+ goto end; \
+ } \
+ curr = curr->next; \
+ } \
+ end:; \
+ curr; \
+ })
+
+#define RDLOCK(rwlock) do { \
+ char status[200]; \
+ CHECK_RDLOCK (rwlock, status); \
+ DEBUG_PRINTF ("%s%-42s prev: %-35s %20s():%-5d %18p\n", aio_prefix, \
+ DEBUG_RED "RDLOCK: " DEBUG_NORM #rwlock, status, __FUNCTION__, __LINE__, (void *) rwlock); \
+ INTERN_RDLOCK (rwlock); \
+ INTERN_WRLOCK (&debug_queue_rwlock); \
+ RWLOCK_DEBUG_ADD (rwlock); \
+ INTERN_RWUNLOCK (&debug_queue_rwlock); \
+ DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #rwlock, rwlock); \
+ } while (0)
+
+#define WRLOCK(rwlock) do { \
+ char status[200]; \
+ CHECK_WRLOCK (rwlock, status); \
+ DEBUG_PRINTF ("%s%-42s prev: %-35s %20s():%-5d %18p\n", aio_prefix, \
+ DEBUG_RED "WRLOCK: " DEBUG_NORM #rwlock, status, __FUNCTION__, __LINE__, (void *) rwlock); \
+ INTERN_WRLOCK (rwlock); \
+ INTERN_WRLOCK (&debug_queue_rwlock); \
+ RWLOCK_DEBUG_ADD (rwlock); \
+ INTERN_RWUNLOCK (&debug_queue_rwlock); \
+ DEBUG_PRINTF ("%s" DEBUG_RED "ACQ:" DEBUG_NORM " %-30s %78p\n", aio_prefix, #rwlock, rwlock); \
+ } while (0)
+
+#define RWUNLOCK(rwlock) do { \
+ aio_rwlock_debug *curr; \
+ DEBUG_PRINTF ("%s%-75s %20s():%-5d %18p\n", aio_prefix, DEBUG_GREEN "RWUNLOCK: " DEBUG_NORM #rwlock, \
+ __FUNCTION__, __LINE__, (void *) rwlock); \
+ INTERN_WRLOCK (&debug_queue_rwlock); \
+ curr = IN_RWLOCK_DEBUG_QUEUE (rwlock); \
+ if (curr) \
+ { \
+ if (curr->prev) \
+ curr->prev->next = curr->next; \
+ if (curr->next) { \
+ curr->next->prev = curr->prev; \
+ if (curr == aio_rwlock_debug_head) \
+ aio_rwlock_debug_head = curr->next; \
+ } else { \
+ if (curr == aio_rwlock_debug_head) \
+ aio_rwlock_debug_head = NULL; \
+ } \
+ free (curr); \
+ } \
+ INTERN_RWUNLOCK (&debug_queue_rwlock); \
+ INTERN_RWUNLOCK (rwlock); \
+ } while (0)
+
+#define RD_TO_WRLOCK(rwlock) \
+ RWUNLOCK (rwlock);\
+ WRLOCK (rwlock);
+#endif
+
#define DEBUG_LINE(...) __VA_ARGS__
#else
@@ -221,12 +344,31 @@
#define LOCK(mutex) INTERN_LOCK (mutex)
#define UNLOCK(mutex) INTERN_UNLOCK (mutex)
#define TRYLOCK(mutex) (__gthread_mutex_trylock (mutex))
+#ifdef __GTHREAD_RWLOCK_INIT
+#define RDLOCK(rwlock) INTERN_RDLOCK (rwlock)
+#define WRLOCK(rwlock) INTERN_WRLOCK (rwlock)
+#define RWUNLOCK(rwlock) INTERN_RWUNLOCK (rwlock)
+#define RD_TO_WRLOCK(rwlock) \
+ RWUNLOCK (rwlock);\
+ WRLOCK (rwlock);
+#endif
+#endif
+
+#ifndef __GTHREAD_RWLOCK_INIT
+#define RDLOCK(rwlock) LOCK (rwlock)
+#define WRLOCK(rwlock) LOCK (rwlock)
+#define RWUNLOCK(rwlock) UNLOCK (rwlock)
+#define RD_TO_WRLOCK(rwlock) {}
#endif
#define INTERN_LOCK(mutex) T_ERROR (__gthread_mutex_lock, mutex);
#define INTERN_UNLOCK(mutex) T_ERROR (__gthread_mutex_unlock, mutex);
+#define INTERN_RDLOCK(rwlock) T_ERROR (__gthread_rwlock_rdlock, rwlock);
+#define INTERN_WRLOCK(rwlock) T_ERROR (__gthread_rwlock_wrlock, rwlock);
+#define INTERN_RWUNLOCK(rwlock) T_ERROR (__gthread_rwlock_unlock, rwlock);
+
#if ASYNC_IO
/* au->lock has to be held when calling this macro. */
@@ -288,8 +430,18 @@ DEBUG_LINE (typedef struct aio_lock_debug{
struct aio_lock_debug *prev;
} aio_lock_debug;)
+DEBUG_LINE (typedef struct aio_rwlock_debug{
+ __gthread_rwlock_t *rw;
+ int line;
+ const char *func;
+ struct aio_rwlock_debug *next;
+ struct aio_rwlock_debug *prev;
+} aio_rwlock_debug;)
+
DEBUG_LINE (extern aio_lock_debug *aio_debug_head;)
DEBUG_LINE (extern __gthread_mutex_t debug_queue_lock;)
+DEBUG_LINE (extern aio_rwlock_debug *aio_rwlock_debug_head;)
+DEBUG_LINE (extern __gthread_rwlock_t debug_queue_rwlock;)
/* Thread - local storage of the current unit we are looking at. Needed for
error reporting. */
@@ -690,7 +690,7 @@ typedef struct gfc_unit
from the UNIT_ROOT tree, but doesn't free it and the
last of the waiting threads will do that.
This must be either atomically increased/decreased, or
- always guarded by UNIT_LOCK. */
+ always guarded by UNIT_RWLOCK. */
int waiting;
/* Flag set by close_unit if the unit as been closed.
Must be manipulated under unit's lock. */
@@ -769,8 +769,13 @@ internal_proto(default_recl);
extern gfc_unit *unit_root;
internal_proto(unit_root);
-extern __gthread_mutex_t unit_lock;
-internal_proto(unit_lock);
+#ifdef __GTHREAD_RWLOCK_INIT
+extern __gthread_rwlock_t unit_rwlock;
+internal_proto(unit_rwlock);
+#else
+extern __gthread_mutex_t unit_rwlock;
+internal_proto(unit_rwlock);
+#endif
extern int close_unit (gfc_unit *);
internal_proto(close_unit);
@@ -1015,9 +1020,9 @@ dec_waiting_unlocked (gfc_unit *u)
#ifdef HAVE_ATOMIC_FETCH_ADD
(void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
#else
- __gthread_mutex_lock (&unit_lock);
+ WRLOCK (&unit_rwlock);
u->waiting--;
- __gthread_mutex_unlock (&unit_lock);
+ RWUNLOCK (&unit_rwlock);
#endif
}
@@ -4539,9 +4539,9 @@ st_read_done_worker (st_parameter_dt *dtp, bool unlock)
if (free_newunit)
{
/* Avoid inverse lock issues by placing after unlock_unit. */
- LOCK (&unit_lock);
+ WRLOCK (&unit_rwlock);
newunit_free (dtp->common.unit);
- UNLOCK (&unit_lock);
+ RWUNLOCK (&unit_rwlock);
}
}
@@ -4636,9 +4636,9 @@ st_write_done_worker (st_parameter_dt *dtp, bool unlock)
if (free_newunit)
{
/* Avoid inverse lock issues by placing after unlock_unit. */
- LOCK (&unit_lock);
+ WRLOCK (&unit_rwlock);
newunit_free (dtp->common.unit);
- UNLOCK (&unit_lock);
+ RWUNLOCK (&unit_rwlock);
}
}
@@ -33,34 +33,36 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
/* IO locking rules:
- UNIT_LOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
+ UNIT_RWLOCK is a master lock, protecting UNIT_ROOT tree and UNIT_CACHE.
+ Using an rwlock improves efficiency by allowing us to separate readers
+ and writers of both UNIT_ROOT and UNIT_CACHE.
Concurrent use of different units should be supported, so
each unit has its own lock, LOCK.
Open should be atomic with its reopening of units and list_read.c
in several places needs find_unit another unit while holding stdin
- unit's lock, so it must be possible to acquire UNIT_LOCK while holding
+ unit's lock, so it must be possible to acquire UNIT_RWLOCK while holding
some unit's lock. Therefore to avoid deadlocks, it is forbidden
- to acquire unit's private locks while holding UNIT_LOCK, except
+ to acquire unit's private locks while holding UNIT_RWLOCK, except
for freshly created units (where no other thread can get at their
address yet) or when using just trylock rather than lock operation.
In addition to unit's private lock each unit has a WAITERS counter
and CLOSED flag. WAITERS counter must be either only
atomically incremented/decremented in all places (if atomic builtins
- are supported), or protected by UNIT_LOCK in all places (otherwise).
+ are supported), or protected by UNIT_RWLOCK in all places (otherwise).
CLOSED flag must be always protected by unit's LOCK.
- After finding a unit in UNIT_CACHE or UNIT_ROOT with UNIT_LOCK held,
+ After finding a unit in UNIT_CACHE or UNIT_ROOT with UNIT_RWLOCK held,
WAITERS must be incremented to avoid concurrent close from freeing
- the unit between unlocking UNIT_LOCK and acquiring unit's LOCK.
- Unit freeing is always done under UNIT_LOCK. If close_unit sees any
+ the unit between unlocking UNIT_RWLOCK and acquiring unit's LOCK.
+ Unit freeing is always done under UNIT_RWLOCK. If close_unit sees any
WAITERS, it doesn't free the unit but instead sets the CLOSED flag
and the thread that decrements WAITERS to zero while CLOSED flag is
- set is responsible for freeing it (while holding UNIT_LOCK).
+ set is responsible for freeing it (while holding UNIT_RWLOCK).
flush_all_units operation is iterating over the unit tree with
- increasing UNIT_NUMBER while holding UNIT_LOCK and attempting to
+ increasing UNIT_NUMBER while holding UNIT_RWLOCK and attempting to
flush each unit (and therefore needs the unit's LOCK held as well).
To avoid deadlocks, it just trylocks the LOCK and if unsuccessful,
- remembers the current unit's UNIT_NUMBER, unlocks UNIT_LOCK, acquires
- unit's LOCK and after flushing reacquires UNIT_LOCK and restarts with
+ remembers the current unit's UNIT_NUMBER, unlocks UNIT_RWLOCK, acquires
+ unit's LOCK and after flushing reacquires UNIT_RWLOCK and restarts with
the smallest UNIT_NUMBER above the last one flushed.
If find_unit/find_or_create_unit/find_file/get_unit routines return
@@ -101,10 +103,14 @@ gfc_offset max_offset;
gfc_offset default_recl;
gfc_unit *unit_root;
+#ifdef __GTHREAD_RWLOCK_INIT
+__gthread_rwlock_t unit_rwlock = __GTHREAD_RWLOCK_INIT;
+#else
#ifdef __GTHREAD_MUTEX_INIT
-__gthread_mutex_t unit_lock = __GTHREAD_MUTEX_INIT;
+__gthread_mutex_t unit_rwlock = __GTHREAD_MUTEX_INIT;
#else
-__gthread_mutex_t unit_lock;
+__gthread_mutex_t unit_rwlock;
+#endif
#endif
/* We use these filenames for error reporting. */
@@ -317,6 +323,27 @@ delete_unit (gfc_unit *old)
unit_root = delete_treap (old, unit_root);
}
+/* get_gfc_unit_from_root()-- Given an integer, return a pointer
+ to the unit structure. Returns NULL if the unit does not exist,
+ otherwise returns a locked unit. */
+
+static inline gfc_unit * get_gfc_unit_from_unit_root (int n)
+{
+ gfc_unit *p;
+ int c = 0;
+ p = unit_root;
+ while (p != NULL)
+ {
+ c = compare (n, p->unit_number);
+ if (c < 0)
+ p = p->left;
+ if (c > 0)
+ p = p->right;
+ if (c == 0)
+ break;
+ }
+ return p;
+}
/* get_gfc_unit()-- Given an integer, return a pointer to the unit
structure. Returns NULL if the unit does not exist,
@@ -329,7 +356,7 @@ get_gfc_unit (int n, int do_create)
int c, created = 0;
NOTE ("Unit n=%d, do_create = %d", n, do_create);
- LOCK (&unit_lock);
+ RDLOCK (&unit_rwlock);
retry:
for (c = 0; c < CACHE_SIZE; c++)
@@ -339,18 +366,25 @@ retry:
goto found;
}
- p = unit_root;
- while (p != NULL)
- {
- c = compare (n, p->unit_number);
- if (c < 0)
- p = p->left;
- if (c > 0)
- p = p->right;
- if (c == 0)
- break;
- }
-
+ p = get_gfc_unit_from_unit_root (n);
+
+ /* We did not find a unit in the cache nor in the unit list,
+ create a new (locked) unit and insert into the unit list and
+ cache. Manipulating either or both the unit list and the unit
+ cache requires to hold a write-lock [for obvious reasons]:
+ By separating the read/write lock, we will greatly reduce
+ the contention on the read part, while the write part is
+ unlikely once the unit hits the cache. */
+ RD_TO_WRLOCK (&unit_rwlock);
+
+ /* In the case of high concurrency, when multiple threads want
+ to find or create the same unit, the unit number may not
+ exist in cache nor in the unit list during read phase, then
+ threads will acquire the write-lock to insert the same unit
+ number to unit list. To avoid duplicate insert, we need to
+ find unit list once again to ensure that the unit number
+ not exist. */
+ p = get_gfc_unit_from_unit_root (n);
if (p == NULL && do_create)
{
p = insert_unit (n);
@@ -368,8 +402,8 @@ retry:
if (created)
{
/* Newly created units have their lock held already
- from insert_unit. Just unlock UNIT_LOCK and return. */
- UNLOCK (&unit_lock);
+ from insert_unit. Just unlock UNIT_RWLOCK and return. */
+ RWUNLOCK (&unit_rwlock);
return p;
}
@@ -380,7 +414,7 @@ found:
if (! TRYLOCK (&p->lock))
{
/* assert (p->closed == 0); */
- UNLOCK (&unit_lock);
+ RWUNLOCK (&unit_rwlock);
return p;
}
@@ -388,14 +422,14 @@ found:
}
- UNLOCK (&unit_lock);
+ RWUNLOCK (&unit_rwlock);
if (p != NULL && (p->child_dtio == 0))
{
LOCK (&p->lock);
if (p->closed)
{
- LOCK (&unit_lock);
+ WRLOCK (&unit_rwlock);
UNLOCK (&p->lock);
if (predec_waiting_locked (p) == 0)
destroy_unit_mutex (p);
@@ -593,8 +627,8 @@ init_units (void)
#endif
#endif
-#ifndef __GTHREAD_MUTEX_INIT
- __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock);
+#if (!defined(__GTHREAD_RWLOCK_INIT) && !defined(__GTHREAD_MUTEX_INIT))
+ __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock);
#endif
if (sizeof (max_offset) == 8)
@@ -731,7 +765,7 @@ close_unit_1 (gfc_unit *u, int locked)
u->closed = 1;
if (!locked)
- LOCK (&unit_lock);
+ WRLOCK (&unit_rwlock);
for (i = 0; i < CACHE_SIZE; i++)
if (unit_cache[i] == u)
@@ -758,7 +792,7 @@ close_unit_1 (gfc_unit *u, int locked)
destroy_unit_mutex (u);
if (!locked)
- UNLOCK (&unit_lock);
+ RWUNLOCK (&unit_rwlock);
return rc;
}
@@ -795,10 +829,10 @@ close_unit (gfc_unit *u)
void
close_units (void)
{
- LOCK (&unit_lock);
+ WRLOCK (&unit_rwlock);
while (unit_root != NULL)
close_unit_1 (unit_root, 1);
- UNLOCK (&unit_lock);
+ RWUNLOCK (&unit_rwlock);
free (newunits);
@@ -905,7 +939,7 @@ finish_last_advance_record (gfc_unit *u)
int
newunit_alloc (void)
{
- LOCK (&unit_lock);
+ WRLOCK (&unit_rwlock);
if (!newunits)
{
newunits = xcalloc (16, 1);
@@ -919,7 +953,7 @@ newunit_alloc (void)
{
newunits[ii] = true;
newunit_lwi = ii + 1;
- UNLOCK (&unit_lock);
+ RWUNLOCK (&unit_rwlock);
return -ii + NEWUNIT_START;
}
}
@@ -932,12 +966,12 @@ newunit_alloc (void)
memset (newunits + old_size, 0, old_size);
newunits[old_size] = true;
newunit_lwi = old_size + 1;
- UNLOCK (&unit_lock);
+ RWUNLOCK (&unit_rwlock);
return -old_size + NEWUNIT_START;
}
-/* Free a previously allocated newunit= unit number. unit_lock must
+/* Free a previously allocated newunit= unit number. unit_rwlock must
be held when calling. */
void
@@ -1774,7 +1774,7 @@ find_file (const char *file, gfc_charlen_type file_len)
id = id_from_path (path);
#endif
- LOCK (&unit_lock);
+ RDLOCK (&unit_rwlock);
retry:
u = find_file0 (unit_root, FIND_FILE0_ARGS);
if (u != NULL)
@@ -1783,19 +1783,19 @@ retry:
if (! __gthread_mutex_trylock (&u->lock))
{
/* assert (u->closed == 0); */
- UNLOCK (&unit_lock);
+ RWUNLOCK (&unit_rwlock);
goto done;
}
inc_waiting_locked (u);
}
- UNLOCK (&unit_lock);
+ RWUNLOCK (&unit_rwlock);
if (u != NULL)
{
LOCK (&u->lock);
if (u->closed)
{
- LOCK (&unit_lock);
+ RDLOCK (&unit_rwlock);
UNLOCK (&u->lock);
if (predec_waiting_locked (u) == 0)
free (u);
@@ -1839,13 +1839,13 @@ flush_all_units (void)
gfc_unit *u;
int min_unit = 0;
- LOCK (&unit_lock);
+ WRLOCK (&unit_rwlock);
do
{
u = flush_all_units_1 (unit_root, min_unit);
if (u != NULL)
inc_waiting_locked (u);
- UNLOCK (&unit_lock);
+ RWUNLOCK (&unit_rwlock);
if (u == NULL)
return;
@@ -1856,13 +1856,13 @@ flush_all_units (void)
if (u->closed == 0)
{
sflush (u->s);
- LOCK (&unit_lock);
+ WRLOCK (&unit_rwlock);
UNLOCK (&u->lock);
(void) predec_waiting_locked (u);
}
else
{
- LOCK (&unit_lock);
+ WRLOCK (&unit_rwlock);
UNLOCK (&u->lock);
if (predec_waiting_locked (u) == 0)
free (u);
new file mode 100644
@@ -0,0 +1,33 @@
+! { dg-do run }
+! Multiple threads call open/write/read/close in concurrency with different unit number,
+! threads can acquire read lock concurrently, to find unit from cache or unit list very frequently,
+! if not found, threads will acquire the write lock exclusively to insert unit to cache and unit list.
+! This test case is used to stress both the read and write lock when access unit cache and list.
+program main
+ use omp_lib
+ implicit none
+ integer:: unit_number, v1, v2, i
+ character(11) :: file_name
+ character(3) :: async = "no"
+ !$omp parallel private (unit_number, v1, v2, file_name, async, i)
+ do i = 0, 100
+ unit_number = 10 + omp_get_thread_num ()
+ write (file_name, "(I3, A)") unit_number, "_tst.dat"
+ file_name = adjustl(file_name)
+ open (unit_number, file=file_name, asynchronous="yes")
+ ! call inquire with file parameter to test find_file in unix.c
+ inquire (file=file_name, asynchronous=async)
+ if (async /= "YES") stop 1
+ write (unit_number, *, asynchronous="yes") unit_number
+ write (unit_number, *, asynchronous="yes") unit_number + 1
+ close(unit_number)
+
+ open (unit_number, file = file_name, asynchronous="yes")
+ read (unit_number, *, asynchronous="yes") v1
+ read (unit_number, *, asynchronous="yes") v2
+ wait (unit_number)
+ if ((v1 /= unit_number) .or. (v2 /= unit_number + 1)) stop 2
+ close(unit_number, status="delete")
+ end do
+ !$omp end parallel
+end program
new file mode 100644
@@ -0,0 +1,22 @@
+! { dg-do run }
+! Insert a unit into cache at the beginning, then start multiple
+! threads to access the same unit concurrency, unit will be found in unit cache during the read lock phase.
+! This test case is used to test the read lock when access unit cache and list.
+program main
+ use omp_lib
+ implicit none
+ integer:: thread_id, total_threads, i, j
+ total_threads = omp_get_max_threads ()
+ open (10, file='tst.dat', asynchronous="yes")
+ !$omp parallel private (thread_id, i, j)
+ do i = 1, 100
+ thread_id = omp_get_thread_num ()
+ do j = 1, 100
+ write (10, *, asynchronous="yes") thread_id, i
+ end do
+ end do
+ !$omp end parallel
+ ! call inquire with file parameter to test find_file in unix.c
+ call flush ()
+ close (10, status="delete")
+end program
new file mode 100644
@@ -0,0 +1,18 @@
+! { dg-do run }
+! Find or create the same unit number in concurrency,
+! at beginning, threads cannot find the unit in cache or unit list,
+! then threads will acquire the write lock to insert unit.
+! This test case is used to ensure that no duplicate unit number will be
+! inserted into cache nor unit list when same unit was accessed in concurrency.
+program main
+ use omp_lib
+ implicit none
+ integer:: i
+ !$omp parallel private (i)
+ do i = 1, 100
+ open (10, file='tst.dat', asynchronous="yes")
+ ! Delete the unit number from cache and unit list to stress write lock.
+ close (10, status="delete")
+ end do
+ !$omp end parallel
+end program