Message ID | 20231228122411.3189-1-maimon.sagi@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-12640-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:6f82:b0:100:9c79:88ff with SMTP id tb2csp1958099dyb; Thu, 28 Dec 2023 04:24:47 -0800 (PST) X-Google-Smtp-Source: AGHT+IHtDO7e1XOB9MPL6cxFcwIie9mm3fh1Pqlz/ddkrFp9m93jnIn33a7XotPxQkTJye/uktlP X-Received: by 2002:a0c:f9c6:0:b0:67f:b7db:dd7 with SMTP id j6-20020a0cf9c6000000b0067fb7db0dd7mr10226874qvo.123.1703766287199; Thu, 28 Dec 2023 04:24:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1703766287; cv=none; d=google.com; s=arc-20160816; b=dQ+/ZYytkJGaT71UzI+7lBVLh5wbGWlAbfiYC3ay1dpni8eLDmr52gD8su8ioOmKt6 hEufe7IJaYKI/b2glNHaXzMhfl4HpKmzdxBCAb0AkPrMk4lqqZ5FTVFhGFynWF0vzj6/ bpQyBrZMX+UHxJ8WuaKCMjiPiupmROBeTOF/Z7X2u/780kX/ATtg4wy7wYXG54/FUO4r twMpOz4dYR0YKBgSEWKikttV0mFaP+7TZbND5J/cDeJ5Vmaf6RRl3Ewm1wqf1xh5SFFR evpw7j509x4oriXsdnRvzsn26foo8pVacJ1bYsRe28n/dfnHUfUCE9P5rGir9BHKI50N tmRA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:date:subject:cc:to :from:dkim-signature; bh=cwzrGyHtg55O3i+HEC4AZpo/IYoZaxO+3+s9kU5V6zs=; fh=1+c19CDm2H+z3qOlg51/bxwkAVvK6OLVKZM6oZMAgsk=; b=uis/pYpybsRwKBKza8xb6Vd4ffl9Onnj3qIHQVEhwKE5vuYscORsSm5K64jV0RdhkM XcQRhOeYXdUpRxtEGhJZW8oBJZ+NlgIDXUgfAzhgh09rip/D81cN0CAqPRbtHOxA61ol H0qgAoLcknvqw5snHJuuaTZStMlbL2Xz/pl1eWe1+0D2PDdHwBBa8zYzWk5lVhZgkdV5 avBG/99DNG0D8a3sA3xhb2/LrfPHcFykZZDZdE7yTGCfwMceN66Ga15v1MdC2iNjGORO 3ExSsNRpCxpHKRbvMYUxdh7wWisHFZhmB+1FE6f0MrIAyesQifK32k6t7qivgCGNCdkX Mx2A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=hRRuEtiC; spf=pass (google.com: domain of linux-kernel+bounces-12640-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-12640-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id o13-20020a0ce40d000000b0067f3a061ad9si16801159qvl.609.2023.12.28.04.24.47 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Dec 2023 04:24:47 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-12640-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=hRRuEtiC; spf=pass (google.com: domain of linux-kernel+bounces-12640-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-12640-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id E96D41C21F18 for <ouuuleilei@gmail.com>; Thu, 28 Dec 2023 12:24:46 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 807CB8465; Thu, 28 Dec 2023 12:24:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="hRRuEtiC" X-Original-To: linux-kernel@vger.kernel.org Received: from mail-yw1-f170.google.com (mail-yw1-f170.google.com [209.85.128.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 692397483; Thu, 28 Dec 2023 12:24:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-yw1-f170.google.com with SMTP id 00721157ae682-5e76948cda7so40592957b3.3; Thu, 28 Dec 2023 04:24:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1703766260; x=1704371060; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=cwzrGyHtg55O3i+HEC4AZpo/IYoZaxO+3+s9kU5V6zs=; b=hRRuEtiClfgw8Nk6yd7wgH5ak9g4lrp428yZn1aAaAmW0FfIIh8u+vGokmO/8aLb21 1B+fcmgKbYmzm+1Lsk2T4axlPptEl4daJWqvmQMdJKfOeT0v/7OeQVeyw8lcmxdZt7+R DIh4MLY2cIen6kBvBs4cLRehgF6JZHI49fvE1u5CTy2LoP+fN0mCDsQQQhARqpClK4l1 FQErySXgHxg4bXv7vqmka7UKFldIs5DzLtR+u7VWTSWtCKsN+2wtzx3bzlE+Y3O0elxQ 2C+fVwTWVwNuqlzHt6QxXtmfVBeNEDP0no2/zFBzUXJ3JLg3CcwIJnHOFYsiQxntbwNz cn5A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1703766260; x=1704371060; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=cwzrGyHtg55O3i+HEC4AZpo/IYoZaxO+3+s9kU5V6zs=; b=UfRX/P/P1MeeBhAnHPKpBTRouFIGBoqRe4QN/DaEau9RpXiwd0F0618wi12xaH3JBE FesWZRhhigy+sH3sDfr1e0dh0v4WuPTyU2CmuvIzPFQq4FvNIF9H7uIFHsQFgS0XSweB YE5nd+1KfuDMWnyS5/JqmNncoP16DgC0cOO7fbinPv+F5Rm3Zo6YtApga/KpDpm+GTpQ q1D1sbDB0X65a4ERDmmw42KJrvYEmVSUMg3CM36DLAQeZxdjdGaOOQA2r74xcI0cV2os hknvH3py77Q30hfDQES5qo9rbZs/GKlVQsVktLQsWlG/yuAHN03A+MbTgmkNdpWOwN5K SXXg== X-Gm-Message-State: AOJu0YzrLi90uLwzlxYwhZLYYKgQ//lc90wuh2nzfVDce0EDVa8XiviI Jd49gJko58ZrTk/XLP4QusU= X-Received: by 2002:a0d:e642:0:b0:5e7:3e3a:1a4f with SMTP id p63-20020a0de642000000b005e73e3a1a4fmr5590594ywe.17.1703766260252; Thu, 28 Dec 2023 04:24:20 -0800 (PST) Received: from ran.advaoptical.com ([82.166.23.19]) by smtp.gmail.com with ESMTPSA id u80-20020a0deb53000000b005e82879d18esm7474661ywe.53.2023.12.28.04.24.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Dec 2023 04:24:19 -0800 (PST) From: Sagi Maimon <maimon.sagi@gmail.com> To: richardcochran@gmail.com, luto@kernel.org, datglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, arnd@arndb.de, geert@linux-m68k.org, peterz@infradead.org, hannes@cmpxchg.org, sohil.mehta@intel.com, rick.p.edgecombe@intel.com, nphamcs@gmail.com, palmer@sifive.com, maimon.sagi@gmail.com, keescook@chromium.org, legion@kernel.org, mark.rutland@arm.com Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-arch@vger.kernel.org, netdev@vger.kernel.org Subject: [PATCH v3] posix-timers: add multi_clock_gettime system call Date: Thu, 28 Dec 2023 14:24:11 +0200 Message-Id: <20231228122411.3189-1-maimon.sagi@gmail.com> X-Mailer: git-send-email 2.26.3 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1786528438161608706 X-GMAIL-MSGID: 1786528438161608706 |
Series |
[v3] posix-timers: add multi_clock_gettime system call
|
|
Commit Message
Sagi Maimon
Dec. 28, 2023, 12:24 p.m. UTC
Some user space applications need to read some clocks.
Each read requires moving from user space to kernel space.
The syscall overhead causes unpredictable delay between N clocks reads
Removing this delay causes better synchronization between N clocks.
Introduce a new system call multi_clock_gettime, which can be used to measure
the offset between multiple clocks, from variety of types: PHC, virtual PHC
and various system clocks (CLOCK_REALTIME, CLOCK_MONOTONIC, etc).
The offset includes the total time that the driver needs to read the clock
timestamp.
New system call allows the reading of a list of clocks - up to PTP_MAX_CLOCKS.
Supported clocks IDs: PHC, virtual PHC and various system clocks.
Up to PTP_MAX_SAMPLES times (per clock) in a single system call read.
The system call returns n_clocks timestamps for each measurement:
- clock 0 timestamp
- ...
- clock n timestamp
Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com>
---
Addressed comments from:
- Thomas Gleixner : https://www.spinics.net/lists/netdev/msg959514.html
Changes since version 2:
- Change multi PHC syscall to fit the Y2038 safe variant.
- Define the syscall data structure under uapi directory, so it will be known in user space.
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/syscalls.h | 2 +-
include/uapi/asm-generic/unistd.h | 4 +++-
include/uapi/linux/multi_clock.h | 21 ++++++++++++++++
kernel/time/posix-timers.c | 33 ++++++++++++++++++++++++++
5 files changed, 59 insertions(+), 2 deletions(-)
create mode 100644 include/uapi/linux/multi_clock.h
Comments
Hi Sagi,
kernel test robot noticed the following build errors:
[auto build test ERROR on tip/x86/asm]
[also build test ERROR on arnd-asm-generic/master tip/timers/core linus/master v6.7-rc7]
[cannot apply to next-20231222]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Sagi-Maimon/posix-timers-add-multi_clock_gettime-system-call/20231228-202632
base: tip/x86/asm
patch link: https://lore.kernel.org/r/20231228122411.3189-1-maimon.sagi%40gmail.com
patch subject: [PATCH v3] posix-timers: add multi_clock_gettime system call
config: x86_64-buildonly-randconfig-002-20231229 (https://download.01.org/0day-ci/archive/20231229/202312291148.zfhZ9vDJ-lkp@intel.com/config)
compiler: ClangBuiltLinux clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231229/202312291148.zfhZ9vDJ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312291148.zfhZ9vDJ-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from <built-in>:1:
>> ./usr/include/linux/multi_clock.h:14:2: error: unknown type name 'clockid_t'
14 | clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */
| ^
1 error generated.
Hi Sagi, kernel test robot noticed the following build errors: [auto build test ERROR on tip/x86/asm] [also build test ERROR on arnd-asm-generic/master tip/timers/core linus/master v6.7-rc7] [cannot apply to next-20231222] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Sagi-Maimon/posix-timers-add-multi_clock_gettime-system-call/20231228-202632 base: tip/x86/asm patch link: https://lore.kernel.org/r/20231228122411.3189-1-maimon.sagi%40gmail.com patch subject: [PATCH v3] posix-timers: add multi_clock_gettime system call config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20231229/202312291154.hCJdKLKM-lkp@intel.com/config) compiler: clang version 18.0.0git (https://github.com/llvm/llvm-project 8a4266a626914765c0c69839e8a51be383013c1a) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231229/202312291154.hCJdKLKM-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202312291154.hCJdKLKM-lkp@intel.com/ All error/warnings (new ones prefixed by >>): In file included from kernel/time/time.c:33: >> include/linux/syscalls.h:1164:48: warning: declaration of 'struct __ptp_multi_clock_get' will not be visible outside of this function [-Wvisibility] 1164 | asmlinkage long sys_multi_clock_gettime(struct __ptp_multi_clock_get __user * ptp_multi_clk_get); | ^ 1 warning generated. -- In file included from kernel/time/hrtimer.c:30: >> include/linux/syscalls.h:1164:48: warning: declaration of 'struct __ptp_multi_clock_get' will not be visible outside of this function [-Wvisibility] 1164 | asmlinkage long sys_multi_clock_gettime(struct __ptp_multi_clock_get __user * ptp_multi_clk_get); | ^ kernel/time/hrtimer.c:147:20: warning: unused function 'is_migration_base' [-Wunused-function] 147 | static inline bool is_migration_base(struct hrtimer_clock_base *base) | ^~~~~~~~~~~~~~~~~ kernel/time/hrtimer.c:1876:20: warning: unused function '__hrtimer_peek_ahead_timers' [-Wunused-function] 1876 | static inline void __hrtimer_peek_ahead_timers(void) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ 3 warnings generated. -- In file included from kernel/time/posix-timers.c:26: >> include/linux/syscalls.h:1164:48: warning: declaration of 'struct __ptp_multi_clock_get' will not be visible outside of this function [-Wvisibility] 1164 | asmlinkage long sys_multi_clock_gettime(struct __ptp_multi_clock_get __user * ptp_multi_clk_get); | ^ >> kernel/time/posix-timers.c:1430:1: error: conflicting types for 'sys_multi_clock_gettime' 1430 | SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get __user *, ptp_multi_clk_get) | ^ include/linux/syscalls.h:219:36: note: expanded from macro 'SYSCALL_DEFINE1' 219 | #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__) | ^ include/linux/syscalls.h:230:2: note: expanded from macro 'SYSCALL_DEFINEx' 230 | __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) | ^ include/linux/syscalls.h:244:18: note: expanded from macro '__SYSCALL_DEFINEx' 244 | asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ | ^ <scratch space>:135:1: note: expanded from here 135 | sys_multi_clock_gettime | ^ include/linux/syscalls.h:1164:17: note: previous declaration is here 1164 | asmlinkage long sys_multi_clock_gettime(struct __ptp_multi_clock_get __user * ptp_multi_clk_get); | ^ 1 warning and 1 error generated. vim +/sys_multi_clock_gettime +1430 kernel/time/posix-timers.c 1429 > 1430 SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get __user *, ptp_multi_clk_get) 1431 { 1432 const struct k_clock *kc; 1433 struct timespec64 kernel_tp; 1434 struct __ptp_multi_clock_get multi_clk_get; 1435 unsigned int i, j; 1436 int error; 1437 1438 if (copy_from_user(&multi_clk_get, ptp_multi_clk_get, sizeof(multi_clk_get))) 1439 return -EFAULT; 1440 1441 if (multi_clk_get.n_samples > MULTI_PTP_MAX_SAMPLES) 1442 return -EINVAL; 1443 if (multi_clk_get.n_clocks > MULTI_PTP_MAX_CLOCKS) 1444 return -EINVAL; 1445 1446 for (j = 0; j < multi_clk_get.n_samples; j++) { 1447 for (i = 0; i < multi_clk_get.n_clocks; i++) { 1448 kc = clockid_to_kclock(multi_clk_get.clkid_arr[i]); 1449 if (!kc) 1450 return -EINVAL; 1451 error = kc->clock_get_timespec(multi_clk_get.clkid_arr[i], &kernel_tp); 1452 if (!error && put_timespec64(&kernel_tp, (struct __kernel_timespec __user *) 1453 &ptp_multi_clk_get->ts[j][i])) 1454 error = -EFAULT; 1455 } 1456 } 1457 1458 return error; 1459 } 1460
On Thu, Dec 28, 2023, at 13:24, Sagi Maimon wrote: > Some user space applications need to read some clocks. > Each read requires moving from user space to kernel space. > The syscall overhead causes unpredictable delay between N clocks reads > Removing this delay causes better synchronization between N clocks. > > Introduce a new system call multi_clock_gettime, which can be used to measure > the offset between multiple clocks, from variety of types: PHC, virtual PHC > and various system clocks (CLOCK_REALTIME, CLOCK_MONOTONIC, etc). > The offset includes the total time that the driver needs to read the clock > timestamp. > > New system call allows the reading of a list of clocks - up to PTP_MAX_CLOCKS. > Supported clocks IDs: PHC, virtual PHC and various system clocks. > Up to PTP_MAX_SAMPLES times (per clock) in a single system call read. > The system call returns n_clocks timestamps for each measurement: > - clock 0 timestamp > - ... > - clock n timestamp > > Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com> Hi Sagi, Exposing an interface to read multiple clocks makes sense to me, but I wonder if the interface you use is too inflexible. > --- a/include/uapi/asm-generic/unistd.h > +++ b/include/uapi/asm-generic/unistd.h > @@ -828,9 +828,11 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake) > __SYSCALL(__NR_futex_wait, sys_futex_wait) > #define __NR_futex_requeue 456 > __SYSCALL(__NR_futex_requeue, sys_futex_requeue) > +#define __NR_multi_clock_gettime 457 > +__SYSCALL(__NR_multi_clock_gettime, sys_multi_clock_gettime) > > #undef __NR_syscalls > -#define __NR_syscalls 457 > +#define __NR_syscalls 458 Site note: hooking it up only here is sufficient for the code review but not for inclusion: once we have an agreement on the API, this should be added to all architectures at once. > +#define MULTI_PTP_MAX_CLOCKS 5 /* Max number of clocks */ > +#define MULTI_PTP_MAX_SAMPLES 10 /* Max allowed offset measurement samples. */ > + > +struct __ptp_multi_clock_get { > + unsigned int n_clocks; /* Desired number of clocks. */ > + unsigned int n_samples; /* Desired number of measurements per clock. */ > + clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */ > + /* > + * Array of list of n_clocks clocks time samples n_samples times. > + */ > + struct __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS]; > +}; The fixed size arrays here seem to be an unnecessary limitation, both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are small enough that one can come up with scenarios where you would want a higher number, but at the same time the structure is already 808 bytes long, which is more than you'd normally want to put on the kernel stack, and which may take a significant time to copy to and from userspace. Since n_clocks and n_samples are always inputs to the syscall, you can just pass them as register arguments and use a dynamically sized array instead. It's not clear to me what you gain from having the n_samples argument over just calling the syscall repeatedly. Does this offer a benefit for accuracy or is this just meant to avoid syscall overhead. > +SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get > __user *, ptp_multi_clk_get) > +{ > + const struct k_clock *kc; > + struct timespec64 kernel_tp; > + struct __ptp_multi_clock_get multi_clk_get; > + unsigned int i, j; > + int error; > + > + if (copy_from_user(&multi_clk_get, ptp_multi_clk_get, > sizeof(multi_clk_get))) > + return -EFAULT; Here you copy the entire structure from userspace, but I don't actually see the .ts[] array on the stack being accessed later as you just copy to the user pointer directly. > + for (i = 0; i < multi_clk_get.n_clocks; i++) { > + kc = clockid_to_kclock(multi_clk_get.clkid_arr[i]); > + if (!kc) > + return -EINVAL; > + error = kc->clock_get_timespec(multi_clk_get.clkid_arr[i], > &kernel_tp); > + if (!error && put_timespec64(&kernel_tp, (struct __kernel_timespec > __user *) > + &ptp_multi_clk_get->ts[j][i])) > + error = -EFAULT; > + } The put_timespec64() and possibly the clockid_to_kclock() have some overhead that may introduce jitter, so it may be better to pull that out of the loop and have a fixed-size array of timespec64 values on the stack and then copy them at the end. On the other hand, this will still give less accuracy than the getcrosststamp() callback with ioctl(PTP_SYS_OFFSET_PRECISE), so either the last bit of accuracy isn't all that important, or you need to refine the interface to actually be an improvement over the chardev. Arnd
On Fri, Dec 29, 2023 at 5:27 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Thu, Dec 28, 2023, at 13:24, Sagi Maimon wrote: > > Some user space applications need to read some clocks. > > Each read requires moving from user space to kernel space. > > The syscall overhead causes unpredictable delay between N clocks reads > > Removing this delay causes better synchronization between N clocks. > > > > Introduce a new system call multi_clock_gettime, which can be used to measure > > the offset between multiple clocks, from variety of types: PHC, virtual PHC > > and various system clocks (CLOCK_REALTIME, CLOCK_MONOTONIC, etc). > > The offset includes the total time that the driver needs to read the clock > > timestamp. > > > > New system call allows the reading of a list of clocks - up to PTP_MAX_CLOCKS. > > Supported clocks IDs: PHC, virtual PHC and various system clocks. > > Up to PTP_MAX_SAMPLES times (per clock) in a single system call read. > > The system call returns n_clocks timestamps for each measurement: > > - clock 0 timestamp > > - ... > > - clock n timestamp > > > > Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com> > > Hi Sagi, > > Exposing an interface to read multiple clocks makes sense to me, > but I wonder if the interface you use is too inflexible. > Arnd - thanks alot for your notes. > > --- a/include/uapi/asm-generic/unistd.h > > +++ b/include/uapi/asm-generic/unistd.h > > @@ -828,9 +828,11 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake) > > __SYSCALL(__NR_futex_wait, sys_futex_wait) > > #define __NR_futex_requeue 456 > > __SYSCALL(__NR_futex_requeue, sys_futex_requeue) > > +#define __NR_multi_clock_gettime 457 > > +__SYSCALL(__NR_multi_clock_gettime, sys_multi_clock_gettime) > > > > #undef __NR_syscalls > > -#define __NR_syscalls 457 > > +#define __NR_syscalls 458 > > Site note: hooking it up only here is sufficient for the > code review but not for inclusion: once we have an agreement > on the API, this should be added to all architectures at once. > > > +#define MULTI_PTP_MAX_CLOCKS 5 /* Max number of clocks */ > > +#define MULTI_PTP_MAX_SAMPLES 10 /* Max allowed offset measurement samples. */ > > + > > +struct __ptp_multi_clock_get { > > + unsigned int n_clocks; /* Desired number of clocks. */ > > + unsigned int n_samples; /* Desired number of measurements per clock. */ > > + clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */ > > + /* > > + * Array of list of n_clocks clocks time samples n_samples times. > > + */ > > + struct __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS]; > > +}; > > The fixed size arrays here seem to be an unnecessary limitation, > both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are small > enough that one can come up with scenarios where you would want > a higher number, but at the same time the structure is already > 808 bytes long, which is more than you'd normally want to put > on the kernel stack, and which may take a significant time to > copy to and from userspace. > > Since n_clocks and n_samples are always inputs to the syscall, > you can just pass them as register arguments and use a dynamically > sized array instead. > Both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are enough of any usage we can think of, But I think you are right, it is better to use a dynamically sized array for future use, plus to use less stack memory. On patch v4 a dynamically sized array will be used . I leaving both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS but increasing their values, since there should be some limitation. > It's not clear to me what you gain from having the n_samples > argument over just calling the syscall repeatedly. Does > this offer a benefit for accuracy or is this just meant to > avoid syscall overhead. It is mainly to avoid syscall overhead which also slightly improve the accuracy. > > +SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get > > __user *, ptp_multi_clk_get) > > +{ > > + const struct k_clock *kc; > > + struct timespec64 kernel_tp; > > + struct __ptp_multi_clock_get multi_clk_get; > > + unsigned int i, j; > > + int error; > > + > > + if (copy_from_user(&multi_clk_get, ptp_multi_clk_get, > > sizeof(multi_clk_get))) > > + return -EFAULT; > > Here you copy the entire structure from userspace, but > I don't actually see the .ts[] array on the stack being > accessed later as you just copy to the user pointer > directly. > you are right, will be fixed on patch v4. > > + for (i = 0; i < multi_clk_get.n_clocks; i++) { > > + kc = clockid_to_kclock(multi_clk_get.clkid_arr[i]); > > + if (!kc) > > + return -EINVAL; > > + error = kc->clock_get_timespec(multi_clk_get.clkid_arr[i], > > &kernel_tp); > > + if (!error && put_timespec64(&kernel_tp, (struct __kernel_timespec > > __user *) > > + &ptp_multi_clk_get->ts[j][i])) > > + error = -EFAULT; > > + } > > The put_timespec64() and possibly the clockid_to_kclock() have > some overhead that may introduce jitter, so it may be better to > pull that out of the loop and have a fixed-size array > of timespec64 values on the stack and then copy them > at the end. > This overhead may introduce marginal jitter in my opinion, still I like your idea since it is better to remove any overhead. This will be fixed in patch v4. I don't intend to use "a fixed-size array of timespec64 values on the stack" since it will cause stack memory overflow, Instead I will use a dynamically sized array (according to your previews advice). > On the other hand, this will still give less accuracy than the > getcrosststamp() callback with ioctl(PTP_SYS_OFFSET_PRECISE), > so either the last bit of accuracy isn't all that important, > or you need to refine the interface to actually be an > improvement over the chardev. > I don't understand this comment, please explain. The ioctl(PTP_SYS_OFFSET_PRECISE) is one specific case that can be done by multi_clock_gettime syscall (which cover many more cases) Plus the ioctl(PTP_SYS_OFFSET_PRECISE) works only on drivers that support this feature. > Arnd
On Sun, Dec 31, 2023, at 17:00, Sagi Maimon wrote: > On Fri, Dec 29, 2023 at 5:27 PM Arnd Bergmann <arnd@arndb.de> wrote: >> > +struct __ptp_multi_clock_get { >> > + unsigned int n_clocks; /* Desired number of clocks. */ >> > + unsigned int n_samples; /* Desired number of measurements per clock. */ >> > + clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */ >> > + /* >> > + * Array of list of n_clocks clocks time samples n_samples times. >> > + */ >> > + struct __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS]; >> > +}; >> >> The fixed size arrays here seem to be an unnecessary limitation, >> both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are small >> enough that one can come up with scenarios where you would want >> a higher number, but at the same time the structure is already >> 808 bytes long, which is more than you'd normally want to put >> on the kernel stack, and which may take a significant time to >> copy to and from userspace. >> >> Since n_clocks and n_samples are always inputs to the syscall, >> you can just pass them as register arguments and use a dynamically >> sized array instead. >> > Both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are enough of any > usage we can think of, > But I think you are right, it is better to use a dynamically sized > array for future use, plus to use less stack memory. > On patch v4 a dynamically sized array will be used . > I leaving both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS but > increasing their values, since there should be some limitation. I think having an implementation specific limit in the kernel is fine, but it would be nice to hardcode that limit in the API. If both clkidarr[] and ts[] are passed as pointer arguments in registers, they can be arbitrarily long in the API and still have a documented maximum that we can extend in the future without changing the interface. >> It's not clear to me what you gain from having the n_samples >> argument over just calling the syscall repeatedly. Does >> this offer a benefit for accuracy or is this just meant to >> avoid syscall overhead. > It is mainly to avoid syscall overhead which also slightly > improve the accuracy. This is not a big deal as far as I'm concerned, but it would be nice to back this up with some numbers if you think it's worthwhile, as my impression is that the effect is barely measurable: my guess would be that the syscall overhead is always much less than the cost for the hardware access. >> On the other hand, this will still give less accuracy than the >> getcrosststamp() callback with ioctl(PTP_SYS_OFFSET_PRECISE), >> so either the last bit of accuracy isn't all that important, >> or you need to refine the interface to actually be an >> improvement over the chardev. >> > I don't understand this comment, please explain. > The ioctl(PTP_SYS_OFFSET_PRECISE) is one specific case that can be > done by multi_clock_gettime syscall (which cover many more cases) > Plus the ioctl(PTP_SYS_OFFSET_PRECISE) works only on drivers that > support this feature. My point here is that on drivers that do support PTP_SYS_OFFSET_PRECISE, the extra accuracy should be maintained by the new interface, ideally in a way that does not have any other downsides. I think Andy's suggestion of exposing time offsets instead of absolute times would actually achieve that: If the interface is changed to return the offset against CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW or CLOCK_BOOTTIME (not sure what is best here), then the new syscall can use getcrosststamp() where supported for the best results or fall back to gettimex64() or gettime64() otherwise to provide a consistent user interface. Returning an offset would also allow easily calculating an average over multiple calls in the kernel, instead of returning a two-dimensional array. Arnd
On Tue, Jan 2, 2024 at 1:30 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Sun, Dec 31, 2023, at 17:00, Sagi Maimon wrote: > > On Fri, Dec 29, 2023 at 5:27 PM Arnd Bergmann <arnd@arndb.de> wrote: > > >> > +struct __ptp_multi_clock_get { > >> > + unsigned int n_clocks; /* Desired number of clocks. */ > >> > + unsigned int n_samples; /* Desired number of measurements per clock. */ > >> > + clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */ > >> > + /* > >> > + * Array of list of n_clocks clocks time samples n_samples times. > >> > + */ > >> > + struct __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS]; > >> > +}; > >> > >> The fixed size arrays here seem to be an unnecessary limitation, > >> both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are small > >> enough that one can come up with scenarios where you would want > >> a higher number, but at the same time the structure is already > >> 808 bytes long, which is more than you'd normally want to put > >> on the kernel stack, and which may take a significant time to > >> copy to and from userspace. > >> > >> Since n_clocks and n_samples are always inputs to the syscall, > >> you can just pass them as register arguments and use a dynamically > >> sized array instead. > >> > > Both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are enough of any > > usage we can think of, > > But I think you are right, it is better to use a dynamically sized > > array for future use, plus to use less stack memory. > > On patch v4 a dynamically sized array will be used . > > I leaving both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS but > > increasing their values, since there should be some limitation. > > I think having an implementation specific limit in the kernel is > fine, but it would be nice to hardcode that limit in the API. > > If both clkidarr[] and ts[] are passed as pointer arguments > in registers, they can be arbitrarily long in the API and > still have a documented maximum that we can extend in the > future without changing the interface. > > >> It's not clear to me what you gain from having the n_samples > >> argument over just calling the syscall repeatedly. Does > >> this offer a benefit for accuracy or is this just meant to > >> avoid syscall overhead. > > It is mainly to avoid syscall overhead which also slightly > > improve the accuracy. > > This is not a big deal as far as I'm concerned, but it > would be nice to back this up with some numbers if you > think it's worthwhile, as my impression is that the effect > is barely measurable: my guess would be that the syscall > overhead is always much less than the cost for the hardware > access. > > >> On the other hand, this will still give less accuracy than the > >> getcrosststamp() callback with ioctl(PTP_SYS_OFFSET_PRECISE), > >> so either the last bit of accuracy isn't all that important, > >> or you need to refine the interface to actually be an > >> improvement over the chardev. > >> > > I don't understand this comment, please explain. > > The ioctl(PTP_SYS_OFFSET_PRECISE) is one specific case that can be > > done by multi_clock_gettime syscall (which cover many more cases) > > Plus the ioctl(PTP_SYS_OFFSET_PRECISE) works only on drivers that > > support this feature. > > My point here is that on drivers that do support > PTP_SYS_OFFSET_PRECISE, the extra accuracy should be maintained > by the new interface, ideally in a way that does not have any > other downsides. > > I think Andy's suggestion of exposing time offsets instead > of absolute times would actually achieve that: If the > interface is changed to return the offset against > CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW or CLOCK_BOOTTIME > (not sure what is best here), then the new syscall can use > getcrosststamp() where supported for the best results or > fall back to gettimex64() or gettime64() otherwise to > provide a consistent user interface. > > Returning an offset would also allow easily calculating an > average over multiple calls in the kernel, instead of > returning a two-dimensional array. > PTP_SYS_OFFSET_PRECISE returns the systime and PHC time and not offset. But you are right , in the next patch I will use this IOCTL . > Arnd
On Tue, Jan 02, 2024 at 12:29:59PM +0100, Arnd Bergmann wrote: > I think Andy's suggestion of exposing time offsets instead > of absolute times would actually achieve that: If the > interface is changed to return the offset against > CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW or CLOCK_BOOTTIME > (not sure what is best here), then the new syscall can use > getcrosststamp() where supported for the best results or > fall back to gettimex64() or gettime64() otherwise to > provide a consistent user interface. Yes, it makes more sense to provide the offset, since that is what the user needs in the end. Can we change the name of the system call to "clock compare"? int clock_compare(clockid_t a, clockid_t b, int64_t *offset, int64_t *error); returns: zero or error code, offset = a - b error = maximum error due to asymmetry If clocks a and b are both System-V clocks, then *error=0 and *offset can be returned directly from the kernel's time keeping state. If getcrosststamp() is supported on a or b, then invoke it. otherwise do this: t1 = gettime(a) t2 = gettime(b) t3 - gettime(c) *offset = (t1 + t3)/2 - t2 *error = (t3 - t1)/2 There is no need for repeated measurement, since user space can call again when `error` is unacceptable. Thanks, Richard
On Thu, Jan 11, 2024 at 9:31 AM Richard Cochran <richardcochran@gmail.com> wrote: > > On Tue, Jan 02, 2024 at 12:29:59PM +0100, Arnd Bergmann wrote: > > > I think Andy's suggestion of exposing time offsets instead > > of absolute times would actually achieve that: If the > > interface is changed to return the offset against > > CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW or CLOCK_BOOTTIME > > (not sure what is best here), then the new syscall can use > > getcrosststamp() where supported for the best results or > > fall back to gettimex64() or gettime64() otherwise to > > provide a consistent user interface. > > Yes, it makes more sense to provide the offset, since that is what the > user needs in the end. > Make sense will be made on the next patch. > Can we change the name of the system call to "clock compare"? > > int clock_compare(clockid_t a, clockid_t b, > int64_t *offset, int64_t *error); > > returns: zero or error code, > offset = a - b > error = maximum error due to asymmetry > > If clocks a and b are both System-V clocks, then *error=0 and *offset > can be returned directly from the kernel's time keeping state. > > If getcrosststamp() is supported on a or b, then invoke it. > > otherwise do this: > > t1 = gettime(a) > t2 = gettime(b) > t3 - gettime(c) > > *offset = (t1 + t3)/2 - t2 > *error = (t3 - t1)/2 > > There is no need for repeated measurement, since user space can call > again when `error` is unacceptable. > Thanks for your notes, all of them will be done on the next patch (it will take some time due to work overload). The only question that I have is: why not implement it as an IOCTL? It makes more sense to me since it is close to another IOCTL, the "PTP_SYS_OFFSET" family. Does it make sense to you? > Thanks, > Richard > > >
On Mon, Jan 15, 2024 at 05:49:32PM +0200, Sagi Maimon wrote: > Thanks for your notes, all of them will be done on the next patch (it > will take some time due to work overload). No hurry, glad you are keeping this going... > The only question that I have is: why not implement it as an IOCTL? > It makes more sense to me since it is close to another IOCTL, the > "PTP_SYS_OFFSET" family. I've often needed other clock offsets, like CLOCK_REALTIME - CLOCK_MONOTONIC. Those don't have a character device, and so there is no way to call ioctl() on them. That is why I'd like to have a system call that handles any two clock_t instances, using the most accurate back end based on the kinds of the two clocks. Thanks, Richard
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 8cb8bf68721c..9cdeb0bf49db 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -378,6 +378,7 @@ 454 common futex_wake sys_futex_wake 455 common futex_wait sys_futex_wait 456 common futex_requeue sys_futex_requeue +457 common multi_clock_gettime sys_multi_clock_gettime # # Due to a historical design error, certain syscalls are numbered differently diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index fd9d12de7e92..b59fa776ab76 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1161,7 +1161,7 @@ asmlinkage long sys_mmap_pgoff(unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long fd, unsigned long pgoff); asmlinkage long sys_old_mmap(struct mmap_arg_struct __user *arg); - +asmlinkage long sys_multi_clock_gettime(struct __ptp_multi_clock_get __user * ptp_multi_clk_get); /* * Not a real system call, but a placeholder for syscalls which are diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 756b013fb832..beb3e0052d3c 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -828,9 +828,11 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake) __SYSCALL(__NR_futex_wait, sys_futex_wait) #define __NR_futex_requeue 456 __SYSCALL(__NR_futex_requeue, sys_futex_requeue) +#define __NR_multi_clock_gettime 457 +__SYSCALL(__NR_multi_clock_gettime, sys_multi_clock_gettime) #undef __NR_syscalls -#define __NR_syscalls 457 +#define __NR_syscalls 458 /* * 32 bit systems traditionally used different diff --git a/include/uapi/linux/multi_clock.h b/include/uapi/linux/multi_clock.h new file mode 100644 index 000000000000..07099045ec32 --- /dev/null +++ b/include/uapi/linux/multi_clock.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _UAPI_MULTI_CLOCK_H +#define _UAPI_MULTI_CLOCK_H + +#include <linux/types.h> +#include <linux/time_types.h> + +#define MULTI_PTP_MAX_CLOCKS 5 /* Max number of clocks */ +#define MULTI_PTP_MAX_SAMPLES 10 /* Max allowed offset measurement samples. */ + +struct __ptp_multi_clock_get { + unsigned int n_clocks; /* Desired number of clocks. */ + unsigned int n_samples; /* Desired number of measurements per clock. */ + clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */ + /* + * Array of list of n_clocks clocks time samples n_samples times. + */ + struct __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS]; +}; + +#endif /* _UAPI_MULTI_CLOCK_H */ diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c index b924f0f096fa..80cbce59d4f4 100644 --- a/kernel/time/posix-timers.c +++ b/kernel/time/posix-timers.c @@ -31,6 +31,7 @@ #include <linux/compat.h> #include <linux/nospec.h> #include <linux/time_namespace.h> +#include <linux/multi_clock.h> #include "timekeeping.h" #include "posix-timers.h" @@ -1426,6 +1427,38 @@ SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags, #endif +SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get __user *, ptp_multi_clk_get) +{ + const struct k_clock *kc; + struct timespec64 kernel_tp; + struct __ptp_multi_clock_get multi_clk_get; + unsigned int i, j; + int error; + + if (copy_from_user(&multi_clk_get, ptp_multi_clk_get, sizeof(multi_clk_get))) + return -EFAULT; + + if (multi_clk_get.n_samples > MULTI_PTP_MAX_SAMPLES) + return -EINVAL; + if (multi_clk_get.n_clocks > MULTI_PTP_MAX_CLOCKS) + return -EINVAL; + + for (j = 0; j < multi_clk_get.n_samples; j++) { + for (i = 0; i < multi_clk_get.n_clocks; i++) { + kc = clockid_to_kclock(multi_clk_get.clkid_arr[i]); + if (!kc) + return -EINVAL; + error = kc->clock_get_timespec(multi_clk_get.clkid_arr[i], &kernel_tp); + if (!error && put_timespec64(&kernel_tp, (struct __kernel_timespec __user *) + &ptp_multi_clk_get->ts[j][i])) + error = -EFAULT; + } + } + + return error; +} + + static const struct k_clock clock_realtime = { .clock_getres = posix_get_hrtimer_res, .clock_get_timespec = posix_get_realtime_timespec,