[v3] posix-timers: add multi_clock_gettime system call

Message ID 20231228122411.3189-1-maimon.sagi@gmail.com
State New
Headers
Series [v3] posix-timers: add multi_clock_gettime system call |

Commit Message

Sagi Maimon Dec. 28, 2023, 12:24 p.m. UTC
  Some user space applications need to read some clocks.
Each read requires moving from user space to kernel space.
The syscall overhead causes unpredictable delay between N clocks reads
Removing this delay causes better synchronization between N clocks.

Introduce a new system call multi_clock_gettime, which can be used to measure
the offset between multiple clocks, from variety of types: PHC, virtual PHC
and various system clocks (CLOCK_REALTIME, CLOCK_MONOTONIC, etc).
The offset includes the total time that the driver needs to read the clock
timestamp.

New system call allows the reading of a list of clocks - up to PTP_MAX_CLOCKS.
Supported clocks IDs: PHC, virtual PHC and various system clocks.
Up to PTP_MAX_SAMPLES times (per clock) in a single system call read.
The system call returns n_clocks timestamps for each measurement:
- clock 0 timestamp
- ...
- clock n timestamp

Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com>
---
 Addressed comments from:
 - Thomas Gleixner : https://www.spinics.net/lists/netdev/msg959514.html
          
 Changes since version 2:
 - Change multi PHC syscall to fit the Y2038 safe variant.
 - Define the syscall data structure under uapi directory, so it will be known in user space.
 
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 include/linux/syscalls.h               |  2 +-
 include/uapi/asm-generic/unistd.h      |  4 +++-
 include/uapi/linux/multi_clock.h       | 21 ++++++++++++++++
 kernel/time/posix-timers.c             | 33 ++++++++++++++++++++++++++
 5 files changed, 59 insertions(+), 2 deletions(-)
 create mode 100644 include/uapi/linux/multi_clock.h
  

Comments

kernel test robot Dec. 29, 2023, 3:21 a.m. UTC | #1
Hi Sagi,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/asm]
[also build test ERROR on arnd-asm-generic/master tip/timers/core linus/master v6.7-rc7]
[cannot apply to next-20231222]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Sagi-Maimon/posix-timers-add-multi_clock_gettime-system-call/20231228-202632
base:   tip/x86/asm
patch link:    https://lore.kernel.org/r/20231228122411.3189-1-maimon.sagi%40gmail.com
patch subject: [PATCH v3] posix-timers: add multi_clock_gettime system call
config: x86_64-buildonly-randconfig-002-20231229 (https://download.01.org/0day-ci/archive/20231229/202312291148.zfhZ9vDJ-lkp@intel.com/config)
compiler: ClangBuiltLinux clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231229/202312291148.zfhZ9vDJ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312291148.zfhZ9vDJ-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from <built-in>:1:
>> ./usr/include/linux/multi_clock.h:14:2: error: unknown type name 'clockid_t'
      14 |         clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */
         |         ^
   1 error generated.
  
kernel test robot Dec. 29, 2023, 4:03 a.m. UTC | #2
Hi Sagi,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/asm]
[also build test ERROR on arnd-asm-generic/master tip/timers/core linus/master v6.7-rc7]
[cannot apply to next-20231222]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Sagi-Maimon/posix-timers-add-multi_clock_gettime-system-call/20231228-202632
base:   tip/x86/asm
patch link:    https://lore.kernel.org/r/20231228122411.3189-1-maimon.sagi%40gmail.com
patch subject: [PATCH v3] posix-timers: add multi_clock_gettime system call
config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20231229/202312291154.hCJdKLKM-lkp@intel.com/config)
compiler: clang version 18.0.0git (https://github.com/llvm/llvm-project 8a4266a626914765c0c69839e8a51be383013c1a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231229/202312291154.hCJdKLKM-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312291154.hCJdKLKM-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from kernel/time/time.c:33:
>> include/linux/syscalls.h:1164:48: warning: declaration of 'struct __ptp_multi_clock_get' will not be visible outside of this function [-Wvisibility]
    1164 | asmlinkage long sys_multi_clock_gettime(struct __ptp_multi_clock_get __user * ptp_multi_clk_get);
         |                                                ^
   1 warning generated.
--
   In file included from kernel/time/hrtimer.c:30:
>> include/linux/syscalls.h:1164:48: warning: declaration of 'struct __ptp_multi_clock_get' will not be visible outside of this function [-Wvisibility]
    1164 | asmlinkage long sys_multi_clock_gettime(struct __ptp_multi_clock_get __user * ptp_multi_clk_get);
         |                                                ^
   kernel/time/hrtimer.c:147:20: warning: unused function 'is_migration_base' [-Wunused-function]
     147 | static inline bool is_migration_base(struct hrtimer_clock_base *base)
         |                    ^~~~~~~~~~~~~~~~~
   kernel/time/hrtimer.c:1876:20: warning: unused function '__hrtimer_peek_ahead_timers' [-Wunused-function]
    1876 | static inline void __hrtimer_peek_ahead_timers(void)
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   3 warnings generated.
--
   In file included from kernel/time/posix-timers.c:26:
>> include/linux/syscalls.h:1164:48: warning: declaration of 'struct __ptp_multi_clock_get' will not be visible outside of this function [-Wvisibility]
    1164 | asmlinkage long sys_multi_clock_gettime(struct __ptp_multi_clock_get __user * ptp_multi_clk_get);
         |                                                ^
>> kernel/time/posix-timers.c:1430:1: error: conflicting types for 'sys_multi_clock_gettime'
    1430 | SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get __user *, ptp_multi_clk_get)
         | ^
   include/linux/syscalls.h:219:36: note: expanded from macro 'SYSCALL_DEFINE1'
     219 | #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
         |                                    ^
   include/linux/syscalls.h:230:2: note: expanded from macro 'SYSCALL_DEFINEx'
     230 |         __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
         |         ^
   include/linux/syscalls.h:244:18: note: expanded from macro '__SYSCALL_DEFINEx'
     244 |         asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))       \
         |                         ^
   <scratch space>:135:1: note: expanded from here
     135 | sys_multi_clock_gettime
         | ^
   include/linux/syscalls.h:1164:17: note: previous declaration is here
    1164 | asmlinkage long sys_multi_clock_gettime(struct __ptp_multi_clock_get __user * ptp_multi_clk_get);
         |                 ^
   1 warning and 1 error generated.


vim +/sys_multi_clock_gettime +1430 kernel/time/posix-timers.c

  1429	
> 1430	SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get __user *, ptp_multi_clk_get)
  1431	{
  1432		const struct k_clock *kc;
  1433		struct timespec64 kernel_tp;
  1434		struct __ptp_multi_clock_get multi_clk_get;
  1435		unsigned int i, j;
  1436		int error;
  1437	
  1438		if (copy_from_user(&multi_clk_get, ptp_multi_clk_get, sizeof(multi_clk_get)))
  1439			return -EFAULT;
  1440	
  1441		if (multi_clk_get.n_samples > MULTI_PTP_MAX_SAMPLES)
  1442			return -EINVAL;
  1443		if (multi_clk_get.n_clocks > MULTI_PTP_MAX_CLOCKS)
  1444			return -EINVAL;
  1445	
  1446		for (j = 0; j < multi_clk_get.n_samples; j++) {
  1447			for (i = 0; i < multi_clk_get.n_clocks; i++) {
  1448				kc = clockid_to_kclock(multi_clk_get.clkid_arr[i]);
  1449				if (!kc)
  1450					return -EINVAL;
  1451				error = kc->clock_get_timespec(multi_clk_get.clkid_arr[i], &kernel_tp);
  1452				if (!error && put_timespec64(&kernel_tp, (struct __kernel_timespec __user *)
  1453							     &ptp_multi_clk_get->ts[j][i]))
  1454					error = -EFAULT;
  1455			}
  1456		}
  1457	
  1458		return error;
  1459	}
  1460
  
Arnd Bergmann Dec. 29, 2023, 3:26 p.m. UTC | #3
On Thu, Dec 28, 2023, at 13:24, Sagi Maimon wrote:
> Some user space applications need to read some clocks.
> Each read requires moving from user space to kernel space.
> The syscall overhead causes unpredictable delay between N clocks reads
> Removing this delay causes better synchronization between N clocks.
>
> Introduce a new system call multi_clock_gettime, which can be used to measure
> the offset between multiple clocks, from variety of types: PHC, virtual PHC
> and various system clocks (CLOCK_REALTIME, CLOCK_MONOTONIC, etc).
> The offset includes the total time that the driver needs to read the clock
> timestamp.
>
> New system call allows the reading of a list of clocks - up to PTP_MAX_CLOCKS.
> Supported clocks IDs: PHC, virtual PHC and various system clocks.
> Up to PTP_MAX_SAMPLES times (per clock) in a single system call read.
> The system call returns n_clocks timestamps for each measurement:
> - clock 0 timestamp
> - ...
> - clock n timestamp
>
> Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com>

Hi Sagi,

Exposing an interface to read multiple clocks makes sense to me,
but I wonder if the interface you use is too inflexible.

> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -828,9 +828,11 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake)
>  __SYSCALL(__NR_futex_wait, sys_futex_wait)
>  #define __NR_futex_requeue 456
>  __SYSCALL(__NR_futex_requeue, sys_futex_requeue)
> +#define __NR_multi_clock_gettime 457
> +__SYSCALL(__NR_multi_clock_gettime, sys_multi_clock_gettime)
> 
>  #undef __NR_syscalls
> -#define __NR_syscalls 457
> +#define __NR_syscalls 458

Site note: hooking it up only here is sufficient for the
code review but not for inclusion: once we have an agreement
on the API, this should be added to all architectures at once.

> +#define MULTI_PTP_MAX_CLOCKS 5 /* Max number of clocks */
> +#define MULTI_PTP_MAX_SAMPLES 10 /* Max allowed offset measurement samples. */
> +
> +struct __ptp_multi_clock_get {
> +	unsigned int n_clocks; /* Desired number of clocks. */
> +	unsigned int n_samples; /* Desired number of measurements per clock. */
> +	clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */
> +	/*
> +	 * Array of list of n_clocks clocks time samples n_samples times.
> +	 */
> +	struct  __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS];
> +};

The fixed size arrays here seem to be an unnecessary limitation,
both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are small
enough that one can come up with scenarios where you would want
a higher number, but at the same time the structure is already
808 bytes long, which is more than you'd normally want to put
on the kernel stack, and which may take a significant time to
copy to and from userspace.

Since n_clocks and n_samples are always inputs to the syscall,
you can just pass them as register arguments and use a dynamically
sized array instead.

It's not clear to me what you gain from having the n_samples
argument over just calling the syscall repeatedly. Does
this offer a benefit for accuracy or is this just meant to
avoid syscall overhead.
> +SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get 
> __user *, ptp_multi_clk_get)
> +{
> +	const struct k_clock *kc;
> +	struct timespec64 kernel_tp;
> +	struct __ptp_multi_clock_get multi_clk_get;
> +	unsigned int i, j;
> +	int error;
> +
> +	if (copy_from_user(&multi_clk_get, ptp_multi_clk_get, 
> sizeof(multi_clk_get)))
> +		return -EFAULT;

Here you copy the entire structure from userspace, but
I don't actually see the .ts[] array on the stack being
accessed later as you just copy to the user pointer
directly.

> +		for (i = 0; i < multi_clk_get.n_clocks; i++) {
> +			kc = clockid_to_kclock(multi_clk_get.clkid_arr[i]);
> +			if (!kc)
> +				return -EINVAL;
> +			error = kc->clock_get_timespec(multi_clk_get.clkid_arr[i], 
> &kernel_tp);
> +			if (!error && put_timespec64(&kernel_tp, (struct __kernel_timespec 
> __user *)
> +						     &ptp_multi_clk_get->ts[j][i]))
> +				error = -EFAULT;
> +		}

The put_timespec64() and possibly the clockid_to_kclock() have
some overhead that may introduce jitter, so it may be better to
pull that out of the loop and have a fixed-size array
of timespec64 values on the stack and then copy them
at the end.

On the other hand, this will still give less accuracy than the
getcrosststamp() callback with ioctl(PTP_SYS_OFFSET_PRECISE),
so either the last bit of accuracy isn't all that important,
or you need to refine the interface to actually be an
improvement over the chardev.

      Arnd
  
Sagi Maimon Dec. 31, 2023, 4 p.m. UTC | #4
On Fri, Dec 29, 2023 at 5:27 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Thu, Dec 28, 2023, at 13:24, Sagi Maimon wrote:
> > Some user space applications need to read some clocks.
> > Each read requires moving from user space to kernel space.
> > The syscall overhead causes unpredictable delay between N clocks reads
> > Removing this delay causes better synchronization between N clocks.
> >
> > Introduce a new system call multi_clock_gettime, which can be used to measure
> > the offset between multiple clocks, from variety of types: PHC, virtual PHC
> > and various system clocks (CLOCK_REALTIME, CLOCK_MONOTONIC, etc).
> > The offset includes the total time that the driver needs to read the clock
> > timestamp.
> >
> > New system call allows the reading of a list of clocks - up to PTP_MAX_CLOCKS.
> > Supported clocks IDs: PHC, virtual PHC and various system clocks.
> > Up to PTP_MAX_SAMPLES times (per clock) in a single system call read.
> > The system call returns n_clocks timestamps for each measurement:
> > - clock 0 timestamp
> > - ...
> > - clock n timestamp
> >
> > Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com>
>
> Hi Sagi,
>
> Exposing an interface to read multiple clocks makes sense to me,
> but I wonder if the interface you use is too inflexible.
>
Arnd - thanks alot for your notes.
> > --- a/include/uapi/asm-generic/unistd.h
> > +++ b/include/uapi/asm-generic/unistd.h
> > @@ -828,9 +828,11 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake)
> >  __SYSCALL(__NR_futex_wait, sys_futex_wait)
> >  #define __NR_futex_requeue 456
> >  __SYSCALL(__NR_futex_requeue, sys_futex_requeue)
> > +#define __NR_multi_clock_gettime 457
> > +__SYSCALL(__NR_multi_clock_gettime, sys_multi_clock_gettime)
> >
> >  #undef __NR_syscalls
> > -#define __NR_syscalls 457
> > +#define __NR_syscalls 458
>
> Site note: hooking it up only here is sufficient for the
> code review but not for inclusion: once we have an agreement
> on the API, this should be added to all architectures at once.
>
> > +#define MULTI_PTP_MAX_CLOCKS 5 /* Max number of clocks */
> > +#define MULTI_PTP_MAX_SAMPLES 10 /* Max allowed offset measurement samples. */
> > +
> > +struct __ptp_multi_clock_get {
> > +     unsigned int n_clocks; /* Desired number of clocks. */
> > +     unsigned int n_samples; /* Desired number of measurements per clock. */
> > +     clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */
> > +     /*
> > +      * Array of list of n_clocks clocks time samples n_samples times.
> > +      */
> > +     struct  __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS];
> > +};
>
> The fixed size arrays here seem to be an unnecessary limitation,
> both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are small
> enough that one can come up with scenarios where you would want
> a higher number, but at the same time the structure is already
> 808 bytes long, which is more than you'd normally want to put
> on the kernel stack, and which may take a significant time to
> copy to and from userspace.
>
> Since n_clocks and n_samples are always inputs to the syscall,
> you can just pass them as register arguments and use a dynamically
> sized array instead.
>
Both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are enough of any
usage we can think of,
But I think you are right, it is better to use a dynamically sized
array for future use, plus to use less stack memory.
On patch v4 a dynamically sized array will be used .
I leaving both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS but
increasing their values, since there should be some limitation.
> It's not clear to me what you gain from having the n_samples
> argument over just calling the syscall repeatedly. Does
> this offer a benefit for accuracy or is this just meant to
> avoid syscall overhead.
It is mainly to avoid syscall overhead which also slightly improve the accuracy.
> > +SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get
> > __user *, ptp_multi_clk_get)
> > +{
> > +     const struct k_clock *kc;
> > +     struct timespec64 kernel_tp;
> > +     struct __ptp_multi_clock_get multi_clk_get;
> > +     unsigned int i, j;
> > +     int error;
> > +
> > +     if (copy_from_user(&multi_clk_get, ptp_multi_clk_get,
> > sizeof(multi_clk_get)))
> > +             return -EFAULT;
>
> Here you copy the entire structure from userspace, but
> I don't actually see the .ts[] array on the stack being
> accessed later as you just copy to the user pointer
> directly.
>
you are right, will be fixed on patch v4.
> > +             for (i = 0; i < multi_clk_get.n_clocks; i++) {
> > +                     kc = clockid_to_kclock(multi_clk_get.clkid_arr[i]);
> > +                     if (!kc)
> > +                             return -EINVAL;
> > +                     error = kc->clock_get_timespec(multi_clk_get.clkid_arr[i],
> > &kernel_tp);
> > +                     if (!error && put_timespec64(&kernel_tp, (struct __kernel_timespec
> > __user *)
> > +                                                  &ptp_multi_clk_get->ts[j][i]))
> > +                             error = -EFAULT;
> > +             }
>
> The put_timespec64() and possibly the clockid_to_kclock() have
> some overhead that may introduce jitter, so it may be better to
> pull that out of the loop and have a fixed-size array
> of timespec64 values on the stack and then copy them
> at the end.
>
This overhead may introduce marginal jitter in my opinion,
still I like your idea since it is better to remove any overhead.
This will be fixed in patch v4.
I don't intend to use "a fixed-size array of timespec64 values on the
stack" since it will cause stack memory overflow,
Instead I will use a dynamically sized array (according to your
previews advice).
> On the other hand, this will still give less accuracy than the
> getcrosststamp() callback with ioctl(PTP_SYS_OFFSET_PRECISE),
> so either the last bit of accuracy isn't all that important,
> or you need to refine the interface to actually be an
> improvement over the chardev.
>
I don't understand this comment, please explain.
The ioctl(PTP_SYS_OFFSET_PRECISE) is one specific case that can be
done by multi_clock_gettime syscall (which cover many more cases)
Plus the ioctl(PTP_SYS_OFFSET_PRECISE) works only on drivers that
support this feature.
>       Arnd
  
Arnd Bergmann Jan. 2, 2024, 11:29 a.m. UTC | #5
On Sun, Dec 31, 2023, at 17:00, Sagi Maimon wrote:
> On Fri, Dec 29, 2023 at 5:27 PM Arnd Bergmann <arnd@arndb.de> wrote:

>> > +struct __ptp_multi_clock_get {
>> > +     unsigned int n_clocks; /* Desired number of clocks. */
>> > +     unsigned int n_samples; /* Desired number of measurements per clock. */
>> > +     clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */
>> > +     /*
>> > +      * Array of list of n_clocks clocks time samples n_samples times.
>> > +      */
>> > +     struct  __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS];
>> > +};
>>
>> The fixed size arrays here seem to be an unnecessary limitation,
>> both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are small
>> enough that one can come up with scenarios where you would want
>> a higher number, but at the same time the structure is already
>> 808 bytes long, which is more than you'd normally want to put
>> on the kernel stack, and which may take a significant time to
>> copy to and from userspace.
>>
>> Since n_clocks and n_samples are always inputs to the syscall,
>> you can just pass them as register arguments and use a dynamically
>> sized array instead.
>>
> Both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are enough of any
> usage we can think of,
> But I think you are right, it is better to use a dynamically sized
> array for future use, plus to use less stack memory.
> On patch v4 a dynamically sized array will be used .
> I leaving both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS but
> increasing their values, since there should be some limitation.

I think having an implementation specific limit in the kernel is
fine, but it would be nice to hardcode that limit in the API.

If both clkidarr[] and ts[] are passed as pointer arguments
in registers, they can be arbitrarily long in the API and
still have a documented maximum that we can extend in the
future without changing the interface.

>> It's not clear to me what you gain from having the n_samples
>> argument over just calling the syscall repeatedly. Does
>> this offer a benefit for accuracy or is this just meant to
>> avoid syscall overhead.
> It is mainly to avoid syscall overhead which also slightly
> improve the accuracy.

This is not a big deal as far as I'm concerned, but it
would be nice to back this up with some numbers if you
think it's worthwhile, as my impression is that the effect
is barely measurable: my guess would be that the syscall
overhead is always much less than the cost for the hardware
access.

>> On the other hand, this will still give less accuracy than the
>> getcrosststamp() callback with ioctl(PTP_SYS_OFFSET_PRECISE),
>> so either the last bit of accuracy isn't all that important,
>> or you need to refine the interface to actually be an
>> improvement over the chardev.
>>
> I don't understand this comment, please explain.
> The ioctl(PTP_SYS_OFFSET_PRECISE) is one specific case that can be
> done by multi_clock_gettime syscall (which cover many more cases)
> Plus the ioctl(PTP_SYS_OFFSET_PRECISE) works only on drivers that
> support this feature.

My point here is that on drivers that do support
PTP_SYS_OFFSET_PRECISE, the extra accuracy should be maintained
by the new interface, ideally in a way that does not have any
other downsides.

I think Andy's suggestion of exposing time offsets instead
of absolute times would actually achieve that: If the
interface is changed to return the offset against
CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW or CLOCK_BOOTTIME
(not sure what is best here), then the new syscall can use
getcrosststamp() where supported for the best results or
fall back to gettimex64() or gettime64() otherwise to
provide a consistent user interface.

Returning an offset would also allow easily calculating an
average over multiple calls in the kernel, instead of
returning a two-dimensional array.

    Arnd
  
Sagi Maimon Jan. 7, 2024, 2:05 p.m. UTC | #6
On Tue, Jan 2, 2024 at 1:30 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Sun, Dec 31, 2023, at 17:00, Sagi Maimon wrote:
> > On Fri, Dec 29, 2023 at 5:27 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> >> > +struct __ptp_multi_clock_get {
> >> > +     unsigned int n_clocks; /* Desired number of clocks. */
> >> > +     unsigned int n_samples; /* Desired number of measurements per clock. */
> >> > +     clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */
> >> > +     /*
> >> > +      * Array of list of n_clocks clocks time samples n_samples times.
> >> > +      */
> >> > +     struct  __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS];
> >> > +};
> >>
> >> The fixed size arrays here seem to be an unnecessary limitation,
> >> both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are small
> >> enough that one can come up with scenarios where you would want
> >> a higher number, but at the same time the structure is already
> >> 808 bytes long, which is more than you'd normally want to put
> >> on the kernel stack, and which may take a significant time to
> >> copy to and from userspace.
> >>
> >> Since n_clocks and n_samples are always inputs to the syscall,
> >> you can just pass them as register arguments and use a dynamically
> >> sized array instead.
> >>
> > Both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are enough of any
> > usage we can think of,
> > But I think you are right, it is better to use a dynamically sized
> > array for future use, plus to use less stack memory.
> > On patch v4 a dynamically sized array will be used .
> > I leaving both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS but
> > increasing their values, since there should be some limitation.
>
> I think having an implementation specific limit in the kernel is
> fine, but it would be nice to hardcode that limit in the API.
>
> If both clkidarr[] and ts[] are passed as pointer arguments
> in registers, they can be arbitrarily long in the API and
> still have a documented maximum that we can extend in the
> future without changing the interface.
>
> >> It's not clear to me what you gain from having the n_samples
> >> argument over just calling the syscall repeatedly. Does
> >> this offer a benefit for accuracy or is this just meant to
> >> avoid syscall overhead.
> > It is mainly to avoid syscall overhead which also slightly
> > improve the accuracy.
>
> This is not a big deal as far as I'm concerned, but it
> would be nice to back this up with some numbers if you
> think it's worthwhile, as my impression is that the effect
> is barely measurable: my guess would be that the syscall
> overhead is always much less than the cost for the hardware
> access.
>
> >> On the other hand, this will still give less accuracy than the
> >> getcrosststamp() callback with ioctl(PTP_SYS_OFFSET_PRECISE),
> >> so either the last bit of accuracy isn't all that important,
> >> or you need to refine the interface to actually be an
> >> improvement over the chardev.
> >>
> > I don't understand this comment, please explain.
> > The ioctl(PTP_SYS_OFFSET_PRECISE) is one specific case that can be
> > done by multi_clock_gettime syscall (which cover many more cases)
> > Plus the ioctl(PTP_SYS_OFFSET_PRECISE) works only on drivers that
> > support this feature.
>
> My point here is that on drivers that do support
> PTP_SYS_OFFSET_PRECISE, the extra accuracy should be maintained
> by the new interface, ideally in a way that does not have any
> other downsides.
>
> I think Andy's suggestion of exposing time offsets instead
> of absolute times would actually achieve that: If the
> interface is changed to return the offset against
> CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW or CLOCK_BOOTTIME
> (not sure what is best here), then the new syscall can use
> getcrosststamp() where supported for the best results or
> fall back to gettimex64() or gettime64() otherwise to
> provide a consistent user interface.
>
> Returning an offset would also allow easily calculating an
> average over multiple calls in the kernel, instead of
> returning a two-dimensional array.
>
PTP_SYS_OFFSET_PRECISE returns the systime and PHC time and not offset.
But you are right , in the next patch I will use this IOCTL .

>     Arnd
  
Richard Cochran Jan. 11, 2024, 7:31 a.m. UTC | #7
On Tue, Jan 02, 2024 at 12:29:59PM +0100, Arnd Bergmann wrote:

> I think Andy's suggestion of exposing time offsets instead
> of absolute times would actually achieve that: If the
> interface is changed to return the offset against
> CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW or CLOCK_BOOTTIME
> (not sure what is best here), then the new syscall can use
> getcrosststamp() where supported for the best results or
> fall back to gettimex64() or gettime64() otherwise to
> provide a consistent user interface.

Yes, it makes more sense to provide the offset, since that is what the
user needs in the end.

Can we change the name of the system call to "clock compare"?

int clock_compare(clockid_t a, clockid_t b,
		  int64_t *offset, int64_t *error);

returns: zero or error code,
 offset = a - b
 error  = maximum error due to asymmetry

If clocks a and b are both System-V clocks, then *error=0 and *offset
can be returned directly from the kernel's time keeping state.

If getcrosststamp() is supported on a or b, then invoke it.

otherwise do this:

   t1 = gettime(a)
   t2 = gettime(b)
   t3 - gettime(c)

   *offset = (t1 + t3)/2 - t2
   *error  = (t3 - t1)/2

There is no need for repeated measurement, since user space can call
again when `error` is unacceptable.

Thanks,
Richard
  
Sagi Maimon Jan. 15, 2024, 3:49 p.m. UTC | #8
On Thu, Jan 11, 2024 at 9:31 AM Richard Cochran
<richardcochran@gmail.com> wrote:
>
> On Tue, Jan 02, 2024 at 12:29:59PM +0100, Arnd Bergmann wrote:
>
> > I think Andy's suggestion of exposing time offsets instead
> > of absolute times would actually achieve that: If the
> > interface is changed to return the offset against
> > CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW or CLOCK_BOOTTIME
> > (not sure what is best here), then the new syscall can use
> > getcrosststamp() where supported for the best results or
> > fall back to gettimex64() or gettime64() otherwise to
> > provide a consistent user interface.
>
> Yes, it makes more sense to provide the offset, since that is what the
> user needs in the end.
>
Make sense will be made on the next patch.
> Can we change the name of the system call to "clock compare"?
>
> int clock_compare(clockid_t a, clockid_t b,
>                   int64_t *offset, int64_t *error);
>
> returns: zero or error code,
>  offset = a - b
>  error  = maximum error due to asymmetry
>
> If clocks a and b are both System-V clocks, then *error=0 and *offset
> can be returned directly from the kernel's time keeping state.
>
> If getcrosststamp() is supported on a or b, then invoke it.
>
> otherwise do this:
>
>    t1 = gettime(a)
>    t2 = gettime(b)
>    t3 - gettime(c)
>
>    *offset = (t1 + t3)/2 - t2
>    *error  = (t3 - t1)/2
>
> There is no need for repeated measurement, since user space can call
> again when `error` is unacceptable.
>
Thanks for your notes, all of them will be done on the next patch (it
will take some time due to work overload).
The only question that I have is: why not implement it as an IOCTL?
It makes more sense to me since it is close to another IOCTL, the
"PTP_SYS_OFFSET" family.
Does it make sense to you?

> Thanks,
> Richard
>
>
>
  
Richard Cochran Jan. 15, 2024, 11:41 p.m. UTC | #9
On Mon, Jan 15, 2024 at 05:49:32PM +0200, Sagi Maimon wrote:

> Thanks for your notes, all of them will be done on the next patch (it
> will take some time due to work overload).

No hurry, glad you are keeping this going...

> The only question that I have is: why not implement it as an IOCTL?
> It makes more sense to me since it is close to another IOCTL, the
> "PTP_SYS_OFFSET" family.

I've often needed other clock offsets, like CLOCK_REALTIME - CLOCK_MONOTONIC.

Those don't have a character device, and so there is no way to call
ioctl() on them.  That is why I'd like to have a system call that
handles any two clock_t instances, using the most accurate back end
based on the kinds of the two clocks.

Thanks,
Richard
  

Patch

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 8cb8bf68721c..9cdeb0bf49db 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -378,6 +378,7 @@ 
 454	common	futex_wake		sys_futex_wake
 455	common	futex_wait		sys_futex_wait
 456	common	futex_requeue		sys_futex_requeue
+457	common	multi_clock_gettime	sys_multi_clock_gettime
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index fd9d12de7e92..b59fa776ab76 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1161,7 +1161,7 @@  asmlinkage long sys_mmap_pgoff(unsigned long addr, unsigned long len,
 			unsigned long prot, unsigned long flags,
 			unsigned long fd, unsigned long pgoff);
 asmlinkage long sys_old_mmap(struct mmap_arg_struct __user *arg);
-
+asmlinkage long sys_multi_clock_gettime(struct __ptp_multi_clock_get __user * ptp_multi_clk_get);
 
 /*
  * Not a real system call, but a placeholder for syscalls which are
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 756b013fb832..beb3e0052d3c 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -828,9 +828,11 @@  __SYSCALL(__NR_futex_wake, sys_futex_wake)
 __SYSCALL(__NR_futex_wait, sys_futex_wait)
 #define __NR_futex_requeue 456
 __SYSCALL(__NR_futex_requeue, sys_futex_requeue)
+#define __NR_multi_clock_gettime 457
+__SYSCALL(__NR_multi_clock_gettime, sys_multi_clock_gettime)
 
 #undef __NR_syscalls
-#define __NR_syscalls 457
+#define __NR_syscalls 458
 
 /*
  * 32 bit systems traditionally used different
diff --git a/include/uapi/linux/multi_clock.h b/include/uapi/linux/multi_clock.h
new file mode 100644
index 000000000000..07099045ec32
--- /dev/null
+++ b/include/uapi/linux/multi_clock.h
@@ -0,0 +1,21 @@ 
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_MULTI_CLOCK_H
+#define _UAPI_MULTI_CLOCK_H
+
+#include <linux/types.h>
+#include <linux/time_types.h>
+
+#define MULTI_PTP_MAX_CLOCKS 5 /* Max number of clocks */
+#define MULTI_PTP_MAX_SAMPLES 10 /* Max allowed offset measurement samples. */
+
+struct __ptp_multi_clock_get {
+	unsigned int n_clocks; /* Desired number of clocks. */
+	unsigned int n_samples; /* Desired number of measurements per clock. */
+	clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */
+	/*
+	 * Array of list of n_clocks clocks time samples n_samples times.
+	 */
+	struct  __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS];
+};
+
+#endif /* _UAPI_MULTI_CLOCK_H */
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index b924f0f096fa..80cbce59d4f4 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -31,6 +31,7 @@ 
 #include <linux/compat.h>
 #include <linux/nospec.h>
 #include <linux/time_namespace.h>
+#include <linux/multi_clock.h>
 
 #include "timekeeping.h"
 #include "posix-timers.h"
@@ -1426,6 +1427,38 @@  SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags,
 
 #endif
 
+SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get __user *, ptp_multi_clk_get)
+{
+	const struct k_clock *kc;
+	struct timespec64 kernel_tp;
+	struct __ptp_multi_clock_get multi_clk_get;
+	unsigned int i, j;
+	int error;
+
+	if (copy_from_user(&multi_clk_get, ptp_multi_clk_get, sizeof(multi_clk_get)))
+		return -EFAULT;
+
+	if (multi_clk_get.n_samples > MULTI_PTP_MAX_SAMPLES)
+		return -EINVAL;
+	if (multi_clk_get.n_clocks > MULTI_PTP_MAX_CLOCKS)
+		return -EINVAL;
+
+	for (j = 0; j < multi_clk_get.n_samples; j++) {
+		for (i = 0; i < multi_clk_get.n_clocks; i++) {
+			kc = clockid_to_kclock(multi_clk_get.clkid_arr[i]);
+			if (!kc)
+				return -EINVAL;
+			error = kc->clock_get_timespec(multi_clk_get.clkid_arr[i], &kernel_tp);
+			if (!error && put_timespec64(&kernel_tp, (struct __kernel_timespec __user *)
+						     &ptp_multi_clk_get->ts[j][i]))
+				error = -EFAULT;
+		}
+	}
+
+	return error;
+}
+
+
 static const struct k_clock clock_realtime = {
 	.clock_getres		= posix_get_hrtimer_res,
 	.clock_get_timespec	= posix_get_realtime_timespec,