VFS: fix dio write returning EIO when try_to_release_page fails
authorHisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Tue, 2 Sep 2008 21:35:40 +0000 (14:35 -0700)
committerLinus Torvalds <torvalds@linux-foundation.org>
Wed, 3 Sep 2008 02:21:37 +0000 (19:21 -0700)
commit6ccfa806a9cfbbf1cd43d5b6aa47ef2c0eb518fd
treedd3f17e1aebc802b147627a4151add363f39b77c
parent344c790e3821dac37eb742ddd0b611a300f78b9a
VFS: fix dio write returning EIO when try_to_release_page fails

Dio write returns EIO when try_to_release_page fails because bh is
still referenced.

The patch

    commit 3f31fddfa26b7594b44ff2b34f9a04ba409e0f91
    Author: Mingming Cao <cmm@us.ibm.com>
    Date:   Fri Jul 25 01:46:22 2008 -0700

        jbd: fix race between free buffer and commit transaction

was merged into 2.6.27-rc1, but I noticed that this patch is not enough
to fix the race.

I did fsstress test heavily to 2.6.27-rc1, and found that dio write still
sometimes got EIO through this test.

The patch above fixed race between freeing buffer(dio) and committing
transaction(jbd) but I discovered that there is another race, freeing
buffer(dio) and ext3/4_ordered_writepage.

: background_writeout()
     ->write_cache_pages()
       ->ext3_ordered_writepage()
         walk_page_buffers() -> take a bh ref
     block_write_full_page() -> unlock_page
: <- end_page_writeback
                : <- race! (dio write->try_to_release_page fails)
          walk_page_buffers() ->release a bh ref

ext3_ordered_writepage holds bh ref and does unlock_page remaining
taking a bh ref, so this causes the race and failure of
try_to_release_page.

To fix this race, I used the approach of falling back to buffered
writes if try_to_release_page() fails on a page.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/filemap.c
mm/truncate.c