[5.4,0/2] Fix epoll issue in 5.4 kernels

Message ID 20221124001123.3248571-1-risbhat@amazon.com

Rishabh Bhatnagar Nov. 24, 2022, 12:11 a.m. UTC
Hi Greg,
After upgrading to 5.4.211 we started seeing some nodes get stuck in
our Kubernetes cluster. All nodes are running this kernel version.
After taking a closer look it seems that the runc command was getting
stuck, with a thread blocked in epoll wait for a long time:
[<0>] do_syscall_64+0x48/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[<0>] ep_poll+0x48d/0x4e0
[<0>] do_epoll_wait+0xab/0xc0
[<0>] __x64_sys_epoll_pwait+0x4d/0xa0
[<0>] do_syscall_64+0x48/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[<0>] futex_wait_queue_me+0xb6/0x110
[<0>] futex_wait+0xe2/0x260
[<0>] do_futex+0x372/0x4f0
[<0>] __x64_sys_futex+0x134/0x180
[<0>] do_syscall_64+0x48/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
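
For reference, a minimal sketch of the kind of pattern that can hang on
the affected kernels (a hypothetical reproducer, not the actual runc
code path): an edge-triggered fd plus epoll_wait() with a timeout. If
epoll_wait() returns 0 even though an event fired, an edge-triggered
waiter never sees that event again:

/* Hypothetical reproducer sketch -- not the actual runc code path.
 * Build: gcc -O2 -pthread repro.c -o repro
 */
#include <stdint.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>

static int efd, epfd;

static void *writer(void *arg)
{
    uint64_t one = 1;

    usleep(10 * 1000);                  /* land near the waiter's timeout */
    write(efd, &one, sizeof(one));      /* make efd readable */
    return NULL;
}

int main(void)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET };
    pthread_t t;

    efd = eventfd(0, 0);
    epfd = epoll_create1(0);
    epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);
    pthread_create(&t, NULL, writer, NULL);

    /*
     * On a kernel with the bad backport, this can return 0 (timeout)
     * even though the eventfd became readable.  Because the fd is
     * edge-triggered, the event is then never reported again and the
     * second call blocks forever -- the hang seen in the stack above.
     */
    if (epoll_wait(epfd, &ev, 1, 10) == 0)
        epoll_wait(epfd, &ev, 1, -1);

    pthread_join(t, NULL);
    return 0;
}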

I noticed there are other ongoing discussions about this as well:
https://lore.kernel.org/all/Y1pY2n6E1Xa58MXv@kroah.com/
Reverting the below patch does fix the issue:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=cf2db24ec4b8e9d399005ececd6f6336916ab6fc
We don't see this issue in the latest upstream kernel or in the latest
5.10 stable tree. Looking at the patches that went into 5.10 stable,
one stands out that is missing from 5.4:
289caf5d8f6c61c6d2b7fd752a7f483cd153f182 ("epoll: check for events when
removing a timed out thread from the wait queue")

With this patch backported to 5.4 we no longer see the hangs. It looks
like this patch fixes timeout scenarios that can cause missed wakeups.
The other patch in the series fixes a related race and is needed for
the second patch to apply cleanly.
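
For reference, the core of that fix, paraphrased from the upstream
commit (a simplified fragment of the ep_poll() exit path, not the
literal 5.4 backport):

/* Simplified fragment of ep_poll() after the fix (paraphrased from
 * upstream 289caf5d8f6c; not the literal backported code). */
        if (!list_empty_careful(&wait.entry)) {
                write_lock_irq(&ep->lock);
                /*
                 * If we timed out but are no longer on the wait queue
                 * once we hold the lock, a wakeup raced with the
                 * timeout: an event is pending and must be harvested
                 * instead of returning 0, which is exactly the
                 * re-check the bad 5.4 backport dropped.
                 */
                if (timed_out)
                        eavail = list_empty(&wait.entry);
                __remove_wait_queue(&ep->wq, &wait);
                write_unlock_irq(&ep->lock);
        }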

Roman Penyaev (1):
  epoll: call final ep_events_available() check under the lock

Soheil Hassas Yeganeh (1):
  epoll: check for events when removing a timed out thread from the wait
    queue

 fs/eventpoll.c | 68 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 27 deletions(-)
  

Comments

Benjamin Segall Nov. 28, 2022, 9:05 p.m. UTC | #1
Rishabh Bhatnagar <risbhat@amazon.com> writes:

> [...]
>
> With this patch backported to 5.4 we no longer see the hangs. It looks
> like this patch fixes timeout scenarios that can cause missed wakeups.
> The other patch in the series fixes a related race and is needed for
> the second patch to apply cleanly.

Yes, this definitely makes sense to me; the aggressive removal was only
valid because the rest of the epoll machinery did plenty of extra
checking. And I didn't check the backports as carefully when I saw the
-stable emails.
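
To spell out the race in question, an illustrative timeline (a sketch
of one reading of the above, not actual kernel code):

  waiter A: epoll_wait(..., 5ms)       waiter B: epoll_wait(..., -1)
  ------------------------------       -----------------------------
  hrtimer fires, A starts waking
  with timed_out = 1
                         event arrives: the exclusive wakeup
                         is delivered to A, not to B
  A sees timed_out, breaks out of
  the loop and returns 0 without
  re-checking ep_events_available()
  => the wakeup is consumed but the
     event is never reported, and B
     sleeps forever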