[net-next,v6,1/4] eventpoll: support busy poll per epoll instance

Message ID 20240205210453.11301-2-jdamato@fastly.com
State New
Series Per epoll context busy poll support

Commit Message

Joe Damato Feb. 5, 2024, 9:04 p.m. UTC
  Allow busy polling on a per-epoll context basis. The per-epoll context
usec timeout value is preferred, but the pre-existing system-wide sysctl
value is still supported if it is specified.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 fs/eventpoll.c | 49 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 45 insertions(+), 4 deletions(-)
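
For reference, the selection rule described above, distilled into a standalone
sketch (an illustration of the semantics only, not kernel code): a nonzero
per-epoll timeout wins, otherwise the system-wide sysctl value
(net.core.busy_poll) applies.

#include <stdio.h>

/* Illustration only: the per-instance microsecond timeout, when nonzero,
 * takes precedence over the system-wide sysctl value.
 */
static unsigned long effective_busy_poll_usecs(unsigned long per_epoll_usecs,
					       unsigned long sysctl_usecs)
{
	return per_epoll_usecs ? per_epoll_usecs : sysctl_usecs;
}

int main(void)
{
	printf("%lu\n", effective_busy_poll_usecs(0, 50));   /* 50: sysctl fallback */
	printf("%lu\n", effective_busy_poll_usecs(200, 50)); /* 200: per-epoll value wins */
	return 0;
}

Note that this patch only initializes ep->busy_poll_usecs to 0 in
do_epoll_create(), so behavior is unchanged until a later patch in the series
(patch 4/4, per the discussion below) adds a way for userspace to set the
per-epoll value.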
  

Comments

Jakub Kicinski Feb. 7, 2024, 7:04 p.m. UTC | #1
On Mon,  5 Feb 2024 21:04:46 +0000 Joe Damato wrote:
> Allow busy polling on a per-epoll context basis. The per-epoll context
> usec timeout value is preferred, but the pre-existing system wide sysctl
> value is still supported if it specified.

Why do we need u64 for usecs? I think u16 would do, and u32 would give
a very solid "engineering margin". If it was discussed in previous
versions I think it's worth explaining in the commit message.
  
Joe Damato Feb. 7, 2024, 7:14 p.m. UTC | #2
On Wed, Feb 07, 2024 at 11:04:13AM -0800, Jakub Kicinski wrote:
> On Mon,  5 Feb 2024 21:04:46 +0000 Joe Damato wrote:
> > Allow busy polling on a per-epoll context basis. The per-epoll context
> > usec timeout value is preferred, but the pre-existing system wide sysctl
> > value is still supported if it specified.
> 
> Why do we need u64 for usecs? I think u16 would do, and u32 would give
> a very solid "engineering margin". If it was discussed in previous
> versions I think it's worth explaining in the commit message.

In patch 4/4 the value is limited to U32_MAX, but if you prefer I use a u32
here instead, I can make that change.
  
Jakub Kicinski Feb. 7, 2024, 8:11 p.m. UTC | #3
On Wed, 7 Feb 2024 11:14:08 -0800 Joe Damato wrote:
> > Why do we need u64 for usecs? I think u16 would do, and u32 would give
> > a very solid "engineering margin". If it was discussed in previous
> > versions I think it's worth explaining in the commit message.  
> 
> In patch 4/4 the value is limited to U32_MAX, but if you prefer I use a u32
> here instead, I can make that change.

Unless you have a clear reason not to, I think using u32 would be more
natural? If my head math is right the range for u32 is 4096 sec,
slightly over an hour? I'd use u32 and limit it to S32_MAX.
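
For the record, the exact figures behind the head math above (a standalone
check, not part of the patch): 2^32 microseconds is roughly 4295 seconds,
about 71.6 minutes, and S32_MAX microseconds is about 35.8 minutes.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Range of a busy poll timeout stored as 32 bits of microseconds. */
	printf("U32_MAX usecs = %.1f sec (~%.1f min)\n",
	       UINT32_MAX / 1e6, UINT32_MAX / 1e6 / 60); /* ~4295.0 sec, ~71.6 min */
	printf("S32_MAX usecs = %.1f sec (~%.1f min)\n",
	       INT32_MAX / 1e6, INT32_MAX / 1e6 / 60);   /* ~2147.5 sec, ~35.8 min */
	return 0;
}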
  
Joe Damato Feb. 7, 2024, 8:23 p.m. UTC | #4
On Wed, Feb 07, 2024 at 12:11:24PM -0800, Jakub Kicinski wrote:
> On Wed, 7 Feb 2024 11:14:08 -0800 Joe Damato wrote:
> > > Why do we need u64 for usecs? I think u16 would do, and u32 would give
> > > a very solid "engineering margin". If it was discussed in previous
> > > versions I think it's worth explaining in the commit message.  
> > 
> > In patch 4/4 the value is limited to U32_MAX, but if you prefer I use a u32
> > here instead, I can make that change.
> 
> Unless you have a clear reason not to, I think using u32 would be more
> natural? If my head math is right the range for u32 is 4096 sec,
> slightly over an hour? I'd use u32 and limit it to S32_MAX.

OK, that seems fine. Sorry for the noob question, but since that represents
a functional change to patch 4/4, I believe I would need to drop Jiri's
Reviewed-by, is that right?
  
Jakub Kicinski Feb. 7, 2024, 8:56 p.m. UTC | #5
On Wed, 7 Feb 2024 12:23:23 -0800 Joe Damato wrote:
> > Unless you have a clear reason not to, I think using u32 would be more
> > natural? If my head math is right the range for u32 is 4096 sec,
> > slightly over an hour? I'd use u32 and limit it to S32_MAX.  
> 
> OK, that seems fine. Sorry for the noob question, but since that represents
> a fucntional change to patch 4/4, I believe I would need to drop Jiri's
> Reviewed-by, is that right?

I'd default to keeping it. But the review tag retention rules are one
of the more subjective things in kernel development.
  
Eric Dumazet Feb. 8, 2024, 5:46 p.m. UTC | #6
On Mon, Feb 5, 2024 at 10:05 PM Joe Damato <jdamato@fastly.com> wrote:
>
> Allow busy polling on a per-epoll context basis. The per-epoll context
> usec timeout value is preferred, but the pre-existing system wide sysctl
> value is still supported if it specified.
>
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>  fs/eventpoll.c | 49 +++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 45 insertions(+), 4 deletions(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 3534d36a1474..ce75189d46df 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -227,6 +227,8 @@ struct eventpoll {
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>         /* used to track busy poll napi_id */
>         unsigned int napi_id;
> +       /* busy poll timeout */
> +       u64 busy_poll_usecs;
>  #endif
>
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
> @@ -386,12 +388,44 @@ static inline int ep_events_available(struct eventpoll *ep)
>                 READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR;
>  }
>
> +/**
> + * busy_loop_ep_timeout - check if busy poll has timed out. The timeout value
> + * from the epoll instance ep is preferred, but if it is not set fallback to
> + * the system-wide global via busy_loop_timeout.
> + *
> + * @start_time: The start time used to compute the remaining time until timeout.
> + * @ep: Pointer to the eventpoll context.
> + *
> + * Return: true if the timeout has expired, false otherwise.
> + */
> +static inline bool busy_loop_ep_timeout(unsigned long start_time, struct eventpoll *ep)
> +{
> +#ifdef CONFIG_NET_RX_BUSY_POLL

It seems this local helper is only called from code compiled when
CONFIG_NET_RX_BUSY_POLL
is set.

Not sure why you need an #ifdef here.

> +       unsigned long bp_usec = READ_ONCE(ep->busy_poll_usecs);
> +
> +       if (bp_usec) {
> +               unsigned long end_time = start_time + bp_usec;
> +               unsigned long now = busy_loop_current_time();
> +
> +               return time_after(now, end_time);
> +       } else {
> +               return busy_loop_timeout(start_time);
> +       }
> +#endif
> +       return true;
> +}
> +
>  #ifdef CONFIG_NET_RX_BUSY_POLL
> +static bool ep_busy_loop_on(struct eventpoll *ep)
> +{
> +       return !!ep->busy_poll_usecs || net_busy_loop_on();
> +}
> +
>  static bool ep_busy_loop_end(void *p, unsigned long start_time)
>  {
>         struct eventpoll *ep = p;
>
> -       return ep_events_available(ep) || busy_loop_timeout(start_time);
> +       return ep_events_available(ep) || busy_loop_ep_timeout(start_time, ep);
>  }
>
>  /*
> @@ -404,7 +438,7 @@ static bool ep_busy_loop(struct eventpoll *ep, int nonblock)
>  {
>         unsigned int napi_id = READ_ONCE(ep->napi_id);
>
> -       if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on()) {
> +       if ((napi_id >= MIN_NAPI_ID) && ep_busy_loop_on(ep)) {
>                 napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep, false,
>                                BUSY_POLL_BUDGET);
>                 if (ep_events_available(ep))
> @@ -430,7 +464,8 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
>         struct socket *sock;
>         struct sock *sk;
>
> -       if (!net_busy_loop_on())
> +       ep = epi->ep;
> +       if (!ep_busy_loop_on(ep))
>                 return;
>
>         sock = sock_from_file(epi->ffd.file);
> @@ -442,7 +477,6 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
>                 return;
>
>         napi_id = READ_ONCE(sk->sk_napi_id);
> -       ep = epi->ep;
>
>         /* Non-NAPI IDs can be rejected
>          *      or
> @@ -466,6 +500,10 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
>  {
>  }
>
> +static inline bool ep_busy_loop_on(struct eventpoll *ep)
> +{
> +       return false;
> +}
>  #endif /* CONFIG_NET_RX_BUSY_POLL */
>
>  /*
> @@ -2058,6 +2096,9 @@ static int do_epoll_create(int flags)
>                 error = PTR_ERR(file);
>                 goto out_free_fd;
>         }
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +       ep->busy_poll_usecs = 0;
> +#endif
>         ep->file = file;
>         fd_install(fd, file);
>         return fd;
> --
> 2.25.1
>
  
Joe Damato Feb. 8, 2024, 6:06 p.m. UTC | #7
On Thu, Feb 08, 2024 at 06:46:25PM +0100, Eric Dumazet wrote:
> On Mon, Feb 5, 2024 at 10:05 PM Joe Damato <jdamato@fastly.com> wrote:
> >
> > Allow busy polling on a per-epoll context basis. The per-epoll context
> > usec timeout value is preferred, but the pre-existing system wide sysctl
> > value is still supported if it specified.
> >
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
> > ---
> >  fs/eventpoll.c | 49 +++++++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 45 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > index 3534d36a1474..ce75189d46df 100644
> > --- a/fs/eventpoll.c
> > +++ b/fs/eventpoll.c
> > @@ -227,6 +227,8 @@ struct eventpoll {
> >  #ifdef CONFIG_NET_RX_BUSY_POLL
> >         /* used to track busy poll napi_id */
> >         unsigned int napi_id;
> > +       /* busy poll timeout */
> > +       u64 busy_poll_usecs;
> >  #endif
> >
> >  #ifdef CONFIG_DEBUG_LOCK_ALLOC
> > @@ -386,12 +388,44 @@ static inline int ep_events_available(struct eventpoll *ep)
> >                 READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR;
> >  }
> >
> > +/**
> > + * busy_loop_ep_timeout - check if busy poll has timed out. The timeout value
> > + * from the epoll instance ep is preferred, but if it is not set fallback to
> > + * the system-wide global via busy_loop_timeout.
> > + *
> > + * @start_time: The start time used to compute the remaining time until timeout.
> > + * @ep: Pointer to the eventpoll context.
> > + *
> > + * Return: true if the timeout has expired, false otherwise.
> > + */
> > +static inline bool busy_loop_ep_timeout(unsigned long start_time, struct eventpoll *ep)
> > +{
> > +#ifdef CONFIG_NET_RX_BUSY_POLL
> 
> It seems this local helper is only called from code compiled when
> CONFIG_NET_RX_BUSY_POLL
> is set.
> 
> Not sure why you need an #ifdef here.

Thanks, you are right.

I'll move this down to be within CONFIG_NET_RX_BUSY_POLL and get rid of the
#ifdef for the v7.

Thanks for your review.
 
> > +       unsigned long bp_usec = READ_ONCE(ep->busy_poll_usecs);
> > +
> > +       if (bp_usec) {
> > +               unsigned long end_time = start_time + bp_usec;
> > +               unsigned long now = busy_loop_current_time();
> > +
> > +               return time_after(now, end_time);
> > +       } else {
> > +               return busy_loop_timeout(start_time);
> > +       }
> > +#endif
> > +       return true;
> > +}
> > +
> >  #ifdef CONFIG_NET_RX_BUSY_POLL
> > +static bool ep_busy_loop_on(struct eventpoll *ep)
> > +{
> > +       return !!ep->busy_poll_usecs || net_busy_loop_on();
> > +}
> > +
> >  static bool ep_busy_loop_end(void *p, unsigned long start_time)
> >  {
> >         struct eventpoll *ep = p;
> >
> > -       return ep_events_available(ep) || busy_loop_timeout(start_time);
> > +       return ep_events_available(ep) || busy_loop_ep_timeout(start_time, ep);
> >  }
> >
> >  /*
> > @@ -404,7 +438,7 @@ static bool ep_busy_loop(struct eventpoll *ep, int nonblock)
> >  {
> >         unsigned int napi_id = READ_ONCE(ep->napi_id);
> >
> > -       if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on()) {
> > +       if ((napi_id >= MIN_NAPI_ID) && ep_busy_loop_on(ep)) {
> >                 napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep, false,
> >                                BUSY_POLL_BUDGET);
> >                 if (ep_events_available(ep))
> > @@ -430,7 +464,8 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
> >         struct socket *sock;
> >         struct sock *sk;
> >
> > -       if (!net_busy_loop_on())
> > +       ep = epi->ep;
> > +       if (!ep_busy_loop_on(ep))
> >                 return;
> >
> >         sock = sock_from_file(epi->ffd.file);
> > @@ -442,7 +477,6 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
> >                 return;
> >
> >         napi_id = READ_ONCE(sk->sk_napi_id);
> > -       ep = epi->ep;
> >
> >         /* Non-NAPI IDs can be rejected
> >          *      or
> > @@ -466,6 +500,10 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
> >  {
> >  }
> >
> > +static inline bool ep_busy_loop_on(struct eventpoll *ep)
> > +{
> > +       return false;
> > +}
> >  #endif /* CONFIG_NET_RX_BUSY_POLL */
> >
> >  /*
> > @@ -2058,6 +2096,9 @@ static int do_epoll_create(int flags)
> >                 error = PTR_ERR(file);
> >                 goto out_free_fd;
> >         }
> > +#ifdef CONFIG_NET_RX_BUSY_POLL
> > +       ep->busy_poll_usecs = 0;
> > +#endif
> >         ep->file = file;
> >         fd_install(fd, file);
> >         return fd;
> > --
> > 2.25.1
> >
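
For reference, a sketch of the v7 rearrangement agreed above: the helper moves
inside the existing #ifdef CONFIG_NET_RX_BUSY_POLL block, so the inner #ifdef
and the trailing "return true" fallback can be dropped. This is an
illustration of the plan, not the actual v7 patch.

#ifdef CONFIG_NET_RX_BUSY_POLL
/* Per-epoll timeout preferred; fall back to the global sysctl otherwise. */
static bool busy_loop_ep_timeout(unsigned long start_time, struct eventpoll *ep)
{
	unsigned long bp_usec = READ_ONCE(ep->busy_poll_usecs);

	if (bp_usec) {
		unsigned long end_time = start_time + bp_usec;
		unsigned long now = busy_loop_current_time();

		return time_after(now, end_time);
	}
	return busy_loop_timeout(start_time);
}

static bool ep_busy_loop_on(struct eventpoll *ep)
{
	return !!ep->busy_poll_usecs || net_busy_loop_on();
}

/* ep_busy_loop_end(), ep_busy_loop(), etc. remain here as in this patch. */
#endif /* CONFIG_NET_RX_BUSY_POLL */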
  

Patch

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 3534d36a1474..ce75189d46df 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -227,6 +227,8 @@  struct eventpoll {
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	/* used to track busy poll napi_id */
 	unsigned int napi_id;
+	/* busy poll timeout */
+	u64 busy_poll_usecs;
 #endif
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -386,12 +388,44 @@  static inline int ep_events_available(struct eventpoll *ep)
 		READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR;
 }
 
+/**
+ * busy_loop_ep_timeout - check if busy poll has timed out. The timeout value
+ * from the epoll instance ep is preferred, but if it is not set fallback to
+ * the system-wide global via busy_loop_timeout.
+ *
+ * @start_time: The start time used to compute the remaining time until timeout.
+ * @ep: Pointer to the eventpoll context.
+ *
+ * Return: true if the timeout has expired, false otherwise.
+ */
+static inline bool busy_loop_ep_timeout(unsigned long start_time, struct eventpoll *ep)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	unsigned long bp_usec = READ_ONCE(ep->busy_poll_usecs);
+
+	if (bp_usec) {
+		unsigned long end_time = start_time + bp_usec;
+		unsigned long now = busy_loop_current_time();
+
+		return time_after(now, end_time);
+	} else {
+		return busy_loop_timeout(start_time);
+	}
+#endif
+	return true;
+}
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
+static bool ep_busy_loop_on(struct eventpoll *ep)
+{
+	return !!ep->busy_poll_usecs || net_busy_loop_on();
+}
+
 static bool ep_busy_loop_end(void *p, unsigned long start_time)
 {
 	struct eventpoll *ep = p;
 
-	return ep_events_available(ep) || busy_loop_timeout(start_time);
+	return ep_events_available(ep) || busy_loop_ep_timeout(start_time, ep);
 }
 
 /*
@@ -404,7 +438,7 @@  static bool ep_busy_loop(struct eventpoll *ep, int nonblock)
 {
 	unsigned int napi_id = READ_ONCE(ep->napi_id);
 
-	if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on()) {
+	if ((napi_id >= MIN_NAPI_ID) && ep_busy_loop_on(ep)) {
 		napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep, false,
 			       BUSY_POLL_BUDGET);
 		if (ep_events_available(ep))
@@ -430,7 +464,8 @@  static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
 	struct socket *sock;
 	struct sock *sk;
 
-	if (!net_busy_loop_on())
+	ep = epi->ep;
+	if (!ep_busy_loop_on(ep))
 		return;
 
 	sock = sock_from_file(epi->ffd.file);
@@ -442,7 +477,6 @@  static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
 		return;
 
 	napi_id = READ_ONCE(sk->sk_napi_id);
-	ep = epi->ep;
 
 	/* Non-NAPI IDs can be rejected
 	 *	or
@@ -466,6 +500,10 @@  static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
 {
 }
 
+static inline bool ep_busy_loop_on(struct eventpoll *ep)
+{
+	return false;
+}
 #endif /* CONFIG_NET_RX_BUSY_POLL */
 
 /*
@@ -2058,6 +2096,9 @@  static int do_epoll_create(int flags)
 		error = PTR_ERR(file);
 		goto out_free_fd;
 	}
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	ep->busy_poll_usecs = 0;
+#endif
 	ep->file = file;
 	fd_install(fd, file);
 	return fd;