From patchwork Wed Aug 2 08:43:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Li, Pan2 via Gcc-patches"
X-Patchwork-Id: 129708
X-Patchwork-Original-From: "Zhang, Jun via Gcc-patches"
From: "Li, Pan2 via Gcc-patches"
Reply-To: "Zhang, Jun"
To: gcc-patches@gcc.gnu.org
Cc: ubizjak@gmail.com, jakub@redhat.com, hongtao.liu@intel.com, jun.zhang@intel.com
Subject: [PATCH] Enable tpause Exponential backoff and thread delay
Date: Wed, 2 Aug 2023 16:43:14 +0800
Message-Id: <20230802084314.965951-1-jun.zhang@intel.com>
X-Mailer: git-send-email 2.31.1
List-Id: Gcc-patches mailing list

There are two kinds of pause bottleneck: one in user space and one in
the kernel.  In user space, tpause with exponential backoff can reduce
the number of spin-loop iterations.  In the kernel, because tasks start
at the same time, they usually reach the critical section at the same
time, which decreases performance.  Starting the tasks one after
another, with a small per-thread delay, avoids this.

include/ChangeLog:

	* localfn.h: Define RUNLOCALFN.

libgomp/ChangeLog:

	* config/linux/wait.h: Split do_spin.
	* env.c (initialize_env): Set gomp_thread_delay_count default value.
	* libgomp.h: Add gomp_thread_delay_count.
	* team.c (gomp_thread_start): Add RUNLOCALFN.
	* config/linux/spin.h: New header file.
	* config/linux/x86/localfn.h: Implement thread delay.
	* config/linux/x86/mutex.c: Implement tpause backoff.
	* config/linux/x86/spin.h: New spin header file.
---
 include/localfn.h                  |  6 +++
 libgomp/config/linux/spin.h        | 12 ++++++
 libgomp/config/linux/wait.h        | 11 ++---
 libgomp/config/linux/x86/localfn.h | 19 +++++++++
 libgomp/config/linux/x86/mutex.c   | 66 ++++++++++++++++++++++++++++++
 libgomp/config/linux/x86/spin.h    |  5 +++
 libgomp/env.c                      |  4 ++
 libgomp/libgomp.h                  |  1 +
 libgomp/team.c                     |  8 ++--
 9 files changed, 121 insertions(+), 11 deletions(-)
 create mode 100644 include/localfn.h
 create mode 100644 libgomp/config/linux/spin.h
 create mode 100644 libgomp/config/linux/x86/localfn.h
 create mode 100644 libgomp/config/linux/x86/mutex.c
 create mode 100644 libgomp/config/linux/x86/spin.h

diff --git a/include/localfn.h b/include/localfn.h
new file mode 100644
index 00000000000..998e6554aec
--- /dev/null
+++ b/include/localfn.h
@@ -0,0 +1,6 @@
+#define RUNLOCALFN(a, b, c) \
+  do \
+    { \
+      a (b); \
+    } \
+  while (0)
diff --git a/libgomp/config/linux/spin.h b/libgomp/config/linux/spin.h
new file mode 100644
index 00000000000..ad8eba275ed
--- /dev/null
+++ b/libgomp/config/linux/spin.h
@@ -0,0 +1,12 @@
+static inline int
+do_spin_for_count (int *addr, int val, unsigned long long count)
+{
+  unsigned long long i;
+  for (i = 0; i < count; i++)
+    if (__builtin_expect (__atomic_load_n (addr, MEMMODEL_RELAXED) != val, 0))
+      return 0;
+    else
+      cpu_relax ();
+  return 1;
+}
+
diff --git a/libgomp/config/linux/wait.h b/libgomp/config/linux/wait.h
index 29d745f7141..17b7ef11c96 100644
--- a/libgomp/config/linux/wait.h
+++ b/libgomp/config/linux/wait.h
@@ -44,21 +44,16 @@
 extern int gomp_futex_wait, gomp_futex_wake;
 
 #include
-
+#include
 static inline int do_spin (int *addr, int val)
 {
-  unsigned long long i, count = gomp_spin_count_var;
+  unsigned long long count = gomp_spin_count_var;
 
   if (__builtin_expect (__atomic_load_n (&gomp_managed_threads,
                                          MEMMODEL_RELAXED)
                         > gomp_available_cpus, 0))
     count = gomp_throttled_spin_count_var;
-  for (i = 0; i < count; i++)
-    if (__builtin_expect (__atomic_load_n (addr, MEMMODEL_RELAXED) != val, 0))
-      return 0;
-    else
-      cpu_relax ();
-  return 1;
+  return do_spin_for_count (addr, val, count);
 }
 
 static inline void do_wait (int *addr, int val)
diff --git a/libgomp/config/linux/x86/localfn.h b/libgomp/config/linux/x86/localfn.h
new file mode 100644
index 00000000000..379aced99ee
--- /dev/null
+++ b/libgomp/config/linux/x86/localfn.h
@@ -0,0 +1,19 @@
+#ifdef __x86_64__
+static inline void
+gomp_thread_delay(unsigned int count)
+{
+  unsigned long long i;
+  for (i = 0; i < count * gomp_thread_delay_count; i++)
+    __builtin_ia32_pause ();
+}
+
+#define RUNLOCALFN(a, b, c) \
+  do \
+    { \
+      gomp_thread_delay(c); \
+      a (b); \
+    } \
+  while (0)
+#else
+# include "../../../../include/localfn.h"
+#endif
diff --git a/libgomp/config/linux/x86/mutex.c b/libgomp/config/linux/x86/mutex.c
new file mode 100644
index 00000000000..5a14efb522e
--- /dev/null
+++ b/libgomp/config/linux/x86/mutex.c
@@ -0,0 +1,66 @@
+#include "../mutex.c"
+
+#ifdef __x86_64__
+static inline int
+do_spin_for_count_generic (int *addr, int val, unsigned long long count)
+{
+  unsigned long long i;
+  for (i = 0; i < count; i++)
+    if (__builtin_expect (__atomic_load_n (addr, MEMMODEL_RELAXED) != val,
+                          0))
+      return 0;
+    else
+      cpu_relax ();
+  return 1;
+}
+
+#ifndef __WAITPKG__
+#pragma GCC push_options
+#pragma GCC target("waitpkg")
+#define __DISABLE_WAITPKG__
+#endif /* __WAITPKG__ */
+
+static inline unsigned long long __rdtsc(void)
+{
+  unsigned long long var;
+  unsigned int hi, lo;
+
+  __asm volatile ("rdtsc" : "=a" (lo), "=d" (hi));
+
+  var = ((unsigned long long)hi << 32) | lo;
+  return var;
+}
+
+#define PAUSE_TP 200
+static inline int
+do_spin_for_backoff_tpause (int *addr, int val, unsigned long long count)
+{
+  unsigned int ctrl = 1;
+  unsigned long long wait_time = 1;
+  unsigned long long mask = 1ULL << __builtin_ia32_bsrdi(count * PAUSE_TP);
+  do
+    {
+      __builtin_ia32_tpause (ctrl, wait_time + __rdtsc());
+      wait_time = (wait_time << 1) | 1;
+      if (__builtin_expect (__atomic_load_n (addr, MEMMODEL_RELAXED) != val,
+                            0))
+        return 0;
+    }
+  while ((wait_time & mask) == 0);
+  return 1;
+}
+
+#ifdef __DISABLE_WAITPKG__
+#undef __DISABLE_WAITPKG__
+#pragma GCC pop_options
+#endif /* __DISABLE_WAITPKG__ */
+
+int do_spin_for_count (int *addr, int val, unsigned long long count)
+{
+  if(__builtin_cpu_supports ("waitpkg"))
+    return do_spin_for_backoff_tpause(addr, val, count);
+  else
+    return do_spin_for_count_generic(addr, val, count);
+}
+
+#endif
diff --git a/libgomp/config/linux/x86/spin.h b/libgomp/config/linux/x86/spin.h
new file mode 100644
index 00000000000..fb8529af026
--- /dev/null
+++ b/libgomp/config/linux/x86/spin.h
@@ -0,0 +1,5 @@
+#ifdef __x86_64__
+extern int do_spin_for_count (int *, int, unsigned long long) ;
+#else
+# include "../spin.h"
+#endif
diff --git a/libgomp/env.c b/libgomp/env.c
index f24484d7f70..6a96a4b0df1 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -106,6 +106,7 @@ gomp_mutex_t gomp_managed_threads_lock;
 #endif
 unsigned long gomp_available_cpus = 1, gomp_managed_threads = 1;
 unsigned long long gomp_spin_count_var, gomp_throttled_spin_count_var;
+unsigned long long gomp_thread_delay_count;
 unsigned long *gomp_nthreads_var_list, gomp_nthreads_var_list_len;
 char *gomp_bind_var_list;
 unsigned long gomp_bind_var_list_len;
@@ -2419,6 +2420,9 @@ initialize_env (void)
   else if (all != NULL && gomp_get_icv_flag (all->flags, GOMP_ICV_WAIT_POLICY))
     wait_policy = all->icvs.wait_policy;
 
+  if (!parse_spincount ("GOMP_DELAYCOUNT", &gomp_thread_delay_count))
+    gomp_thread_delay_count = 300;
+
   if (!parse_spincount ("GOMP_SPINCOUNT", &gomp_spin_count_var))
     {
       /* Using a rough estimation of 100000 spins per msec,
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 4d2bfab4b71..c3ccf247f6c 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -596,6 +596,7 @@ extern bool gomp_cancel_var;
 extern enum gomp_target_offload_t gomp_target_offload_var;
 extern int gomp_max_task_priority_var;
 extern unsigned long long gomp_spin_count_var, gomp_throttled_spin_count_var;
+extern unsigned long long gomp_thread_delay_count;
 extern unsigned long gomp_available_cpus, gomp_managed_threads;
 extern unsigned long *gomp_nthreads_var_list, gomp_nthreads_var_list_len;
 extern char *gomp_bind_var_list;
diff --git a/libgomp/team.c b/libgomp/team.c
index 54dfca8080a..2a5aff72654 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -30,6 +30,7 @@
 #include "pool.h"
 #include
 #include
+#include "localfn.h"
 
 #ifdef LIBGOMP_USE_PTHREADS
 pthread_attr_t gomp_thread_attr;
@@ -62,7 +63,6 @@ struct gomp_thread_start_data
   pthread_t handle;
 };
 
-
 /* This function is a pthread_create entry point.  This contains the idle
    loop in which a thread waits to be called up to become part of a team.  */
 
@@ -111,7 +111,8 @@ gomp_thread_start (void *xdata)
 
       gomp_barrier_wait (&team->barrier);
 
-      local_fn (local_data);
+      RUNLOCALFN(local_fn, local_data, thr->ts.team_id);
+
       gomp_team_barrier_wait_final (&team->barrier);
       gomp_finish_task (task);
       gomp_barrier_wait_last (&team->barrier);
@@ -126,7 +127,8 @@ gomp_thread_start (void *xdata)
           struct gomp_team *team = thr->ts.team;
           struct gomp_task *task = thr->task;
 
-          local_fn (local_data);
+          RUNLOCALFN(local_fn, local_data, thr->ts.team_id);
+
           gomp_team_barrier_wait_final (&team->barrier);
           gomp_finish_task (task);
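
Illustrative note on the numbers involved: the sketch below replays, in
portable C, the wait_time sequence used by do_spin_for_backoff_tpause and
the per-thread iteration count used by gomp_thread_delay, so the effect of
GOMP_SPINCOUNT and GOMP_DELAYCOUNT can be estimated without WAITPKG
hardware.  The helpers highest_bit, backoff_schedule and
thread_delay_iterations and the struct backoff_stats are hypothetical names
introduced only for this illustration; they are not part of the patch.  The
sketch assumes the waited-on value never changes (worst case) and uses
__builtin_clzll as a stand-in for __builtin_ia32_bsrdi.

```c
#include <assert.h>

#define PAUSE_TP 200 /* same scaling constant as in the patch */

/* Index of the highest set bit of a non-zero value.  Stand-in for
   __builtin_ia32_bsrdi, using the GCC/Clang builtin __builtin_clzll.  */
static int
highest_bit (unsigned long long x)
{
  return 63 - __builtin_clzll (x);
}

struct backoff_stats
{
  unsigned int waits;        /* number of tpause calls issued */
  unsigned long long cycles; /* sum of the requested TSC deltas */
};

/* Hypothetical helper: replay the wait_time sequence of
   do_spin_for_backoff_tpause for the case where *addr never changes.  */
static struct backoff_stats
backoff_schedule (unsigned long long count)
{
  struct backoff_stats s = { 0, 0 };
  unsigned long long wait_time = 1;
  unsigned long long mask;

  assert (count != 0); /* the bit scan is undefined for zero */
  mask = 1ULL << highest_bit (count * PAUSE_TP);
  do
    {
      s.waits++;                        /* one tpause per iteration */
      s.cycles += wait_time;            /* deadline is rdtsc + wait_time */
      wait_time = (wait_time << 1) | 1; /* 1, 3, 7, 15, ... */
    }
  while ((wait_time & mask) == 0);
  return s;
}

/* Hypothetical helper: number of pause iterations gomp_thread_delay runs
   for a given team id and GOMP_DELAYCOUNT value.  */
static unsigned long long
thread_delay_iterations (unsigned int team_id, unsigned long long delay_count)
{
  return team_id * delay_count;
}
```

With count = 1 the schedule issues 7 waits requesting 247 cycles in total,
and with count = 1000 it issues 17 waits.  With the default delay count of
300, the thread with team id 3 runs 900 pause iterations before calling
local_fn, while team id 0 starts immediately.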