[V5,0/5] Introduce daemon failover mechanism to recover from crashing

Message ID	20230329140155.53272-1-zhujia.zj@bytedance.com
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; From: Jia Zhu <zhujia.zj@bytedance.com> To: dhowells@redhat.com, linux-cachefs@redhat.com Cc: linux-erofs@lists.ozlabs.org, linux-kernel@vger.kernel.org, jefflexu@linux.alibaba.com, hsiangkao@linux.alibaba.com, yinxin.x@bytedance.com, Jia Zhu <zhujia.zj@bytedance.com> Subject: [PATCH V5 0/5] Introduce daemon failover mechanism to recover from crashing Date: Wed, 29 Mar 2023 22:01:50 +0800 Message-Id: <20230329140155.53272-1-zhujia.zj@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Introduce daemon failover mechanism to recover from crashing
	[V5,0/5] Introduce daemon failover mechanism to recover from crashing
	[V5,1/5] cachefiles: introduce object ondemand state
	[V5,2/5] cachefiles: extract ondemand info field from cachefiles_object
	[V5,3/5] cachefiles: resend an open request if the read request's object is closed
	[V5,4/5] cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode
	[V5,5/5] cachefiles: add restore command to recover inflight ondemand read requests

Message

Jia Zhu March 29, 2023, 2:01 p.m. UTC

  Changes since v3:
1. Make the values of enum cachefiles_object_state all-uppercase and optimize
   the implementation of CACHEFILES_OBJECT_STATE_FUNCS.
2. For struct cachefiles_object:
	1. Put the ondemand field inside "#ifdef CONFIG_CACHEFILES_ONDEMAND".
	2. Rename struct cachefiles_ondemand_info *private to *ondemand.
3. In ondemand_object_worker():
	1. Replace type casting with container_of().
	2. Remove useless "else".
4. In cachefiles_daemon_poll(), replace xa_(un)lock with rcu_read_(un)lock.

[Background]
============
In ondemand read mode, if the user daemon closes an anonymous fd (e.g. the
daemon crashes), subsequent read requests and inflight requests based on
that fd will return -EIO.
Even if this is tolerable for some individual users, when it happens in a
real cloud service production environment, such IO errors are passed on to
cloud service users and disrupt their running jobs, which badly hurts
cloud service stability.

[Design]
========
The main idea of daemon failover is to reopen the objects related to
inflight requests, so that the newly started daemon can process those
requests as usual.
To implement that, we need to:
	1. Store inflight requests during a daemon crash.
	2. Hold the handle of /dev/cachefiles (by container snapshotter/systemd).
BTW, if the user chooses not to keep the /dev/cachefiles fd, failover is not
enabled: inflight requests return an error, which is passed to the container
(same behavior as now).

[Flow Path]
===========
This patchset introduces three states for an ondemand object:
CLOSE: Object which has just been allocated or closed by the user daemon.
OPEN: Object whose related OPEN request has been processed correctly.
REOPENING: Object which has been closed, and is driven to reopen by a read
request.
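
To illustrate the three states above, here is a minimal userspace sketch of
the transitions. It is only a model under assumptions, not the kernel code
from this series: struct object, object_opened(), object_closed(),
object_restore() and object_read() are hypothetical names made up for the
example. Only the state names and the CLOSE -> REOPENING transition come
from this cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the three ondemand object states. */
enum object_state {
	OBJECT_STATE_CLOSE,	/* just allocated, or closed by the daemon */
	OBJECT_STATE_OPEN,	/* related OPEN request processed correctly */
	OBJECT_STATE_REOPENING,	/* closed, being driven to reopen */
};

/* Hypothetical stand-in for the per-object ondemand info. */
struct object {
	enum object_state state;
};

/* The daemon has processed the OPEN request for this object. */
void object_opened(struct object *obj)
{
	assert(obj != NULL);
	obj->state = OBJECT_STATE_OPEN;
}

/* The daemon closed the anonymous fd (e.g. it crashed). */
void object_closed(struct object *obj)
{
	assert(obj != NULL);
	obj->state = OBJECT_STATE_CLOSE;
}

/* Restore path: a closed object is marked as being reopened. */
void object_restore(struct object *obj)
{
	assert(obj != NULL);
	if (obj->state == OBJECT_STATE_CLOSE)
		obj->state = OBJECT_STATE_REOPENING;
}

/*
 * A read request arrives. Returns true if the object is open and the read
 * can be handed to the daemon directly, false if an open request has to be
 * (re)sent first. A closed object moves to REOPENING in that case.
 */
bool object_read(struct object *obj)
{
	assert(obj != NULL);
	if (obj->state == OBJECT_STATE_OPEN)
		return true;
	object_restore(obj);
	return false;
}
```

In this model a read on a closed object never fails outright; it only
reports that a reopen is needed, which mirrors the intent of resending an
open request instead of returning -EIO.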

1. The daemon uses UDS send/receive fd to keep and pass the fd reference of
   "/dev/cachefiles".
2. The user daemon crashes -> restarts and recovers the dev fd's reference.
3. The user daemon writes "restore" to the device.
   3.1 Reset the object's state from CLOSE to REOPENING.
   3.2 Init a work which reinits the object and add it to the wq (so the
       daemon can leave kernel space and handle that open request).
4. The user of upper filesystem won't notice that the daemon ever crashed
   since the inflight IO is restored and handled correctly.
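
The userspace side of steps 1-3 could be sketched roughly as below, using
SCM_RIGHTS ancillary data on a Unix domain socket. This is an illustration
under assumptions, not code from the test daemon: send_fd(), recv_fd() and
send_restore() are hypothetical helper names, the peer holding the fd (e.g.
a snapshotter or systemd) is assumed to be already connected on "sock", and
the command is assumed to be written as the bare string "restore".

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one fd. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Pass one fd to the peer of a connected UDS. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
	char dummy = 'F';	/* at least one byte of real data is sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL);	/* buffer is large enough by construction */
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd from the peer. Returns the new fd or -1. */
int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* After recovering the device fd, ask the kernel to restore requests. */
int send_restore(int devfd)
{
	static const char cmd[] = "restore";
	ssize_t len = (ssize_t)(sizeof(cmd) - 1);

	return write(devfd, cmd, sizeof(cmd) - 1) == len ? 0 : -1;
}
```

A restarted daemon would call recv_fd() on its socket to the fd holder and
then send_restore() on the returned fd; the holder would have received the
fd earlier through send_fd() from the original daemon.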

[Test]
======
There is a testcase for the above-mentioned scenario.
A user process reads a file through fscache ondemand reading.
At the same time, we kill the daemon repeatedly.
The expected result is that the file read by the user is consistent with the
original, and the user doesn't notice that the daemon has ever been killed.

https://github.com/userzj/demand-read-cachefilesd/commits/failover-test

[GitWeb]
========
https://github.com/userzj/linux/tree/fscache-failover-v5

RFC: https://lore.kernel.org/all/20220818135204.49878-1-zhujia.zj@bytedance.com/
V1: https://lore.kernel.org/all/20221011131552.23833-1-zhujia.zj@bytedance.com/
V2: https://lore.kernel.org/all/20221014030745.25748-1-zhujia.zj@bytedance.com/
V3: https://lore.kernel.org/all/20221014080559.42108-1-zhujia.zj@bytedance.com/
V4: https://lore.kernel.org/all/20230111052515.53941-1-zhujia.zj@bytedance.com/

Jia Zhu (5):
  cachefiles: introduce object ondemand state
  cachefiles: extract ondemand info field from cachefiles_object
  cachefiles: resend an open request if the read request's object is
    closed
  cachefiles: narrow the scope of triggering EPOLLIN events in ondemand
    mode
  cachefiles: add restore command to recover inflight ondemand read
    requests

 fs/cachefiles/daemon.c    |  16 +++-
 fs/cachefiles/interface.c |   7 +-
 fs/cachefiles/internal.h  |  59 +++++++++++++-
 fs/cachefiles/ondemand.c  | 166 ++++++++++++++++++++++++++++----------
 4 files changed, 202 insertions(+), 46 deletions(-)