From: Kinglong Mee Date: Tue, 2 Jun 2015 10:59:19 +0000 (+0800) Subject: nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying X-Git-Tag: firefly_0821_release~176^2~1536^2~13 X-Git-Url: http://demsky.eecs.uci.edu/git/?a=commitdiff_plain;h=4399396eecfc586a1d92e64fe49c3c899f080436;p=firefly-linux-kernel-4.4.55.git nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying nfsd enters a infinite loop and prints message every 10 seconds: May 31 18:33:52 test-server kernel: Error sending entire callback! May 31 18:34:01 test-server kernel: Error sending entire callback! This is caused by a cb_layoutreturn getting error -10008 (NFS4ERR_DELAY), the client crashing, and then nfsd entering the infinite loop: bc_sendto --> call_timeout --> nfsd4_cb_done --> nfsd4_cb_layout_done with error -10008 --> rpc_delay(task, HZ/100) --> bc_sendto ... Reproduced using xfstests 074 with nfs client's kdump on, CONFIG_DEFAULT_HUNG_TASK_TIMEOUT set, and client's blkmapd down: 1. nfs client's write operation will get the layout of file, and then send getdeviceinfo, 2. but layout segment is not recorded by client because blkmapd is down, 3. client writes data by sending WRITE to server, 4. nfs server recalls the layout of the file before WRITE, 5. network error causes the client reset the session and return NFS4ERR_DELAY, 6. so client's WRITE operation is waiting the reply. If the task hangs 120s, the client will crash. 7. so that, the next bc_sendto will fail with TIMEOUT, and cb_status is NFS4ERR_DELAY. Signed-off-by: Kinglong Mee Signed-off-by: J. Bruce Fields --- diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 5694cfb7a47b..8b1ac8d4f011 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -875,6 +875,7 @@ static void nfsd4_cb_prepare(struct rpc_task *task, void *calldata) u32 minorversion = clp->cl_minorversion; cb->cb_minorversion = minorversion; + cb->cb_status = 0; if (minorversion) { if (!nfsd41_cb_get_slot(clp, task)) return;