generic_file_direct_write() updates i_size if pos + written > i_size_read(inode)
We hit the bug when a direct write extends the inode and a truncate is then run on it </li></ul>
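The size update described above can be sketched in user space. This is a simplified model, not kernel code: `struct model_inode` and `model_direct_write_done()` are hypothetical names introduced only for illustration.

```c
/* Hypothetical user-space stand-in for the kernel inode; only the size is modeled. */
struct model_inode {
    long long i_size;
};

/*
 * Model of the size update after a direct write: `written` bytes landed at
 * `pos`, and i_size grows only when the write ended past the old end of file.
 */
static void model_direct_write_done(struct model_inode *inode,
                                    long long pos, long long written)
{
    if (written > 0 && pos + written > inode->i_size)
        inode->i_size = pos + written;
}
```

A write that stays inside the file leaves i_size alone; only an extending write changes it, which is why an extending direct write must not race with truncate.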
ocfs2_file_aio_write() <ul><li>Check whether the O_DIRECT flag is set
Can we do direct write? Not if end > i_size_read(inode)
down_read(&inode->i_alloc_sem) if doing a direct write </li><ul><li>To protect us from truncate on the same node </li></ul><li>Take only a PR lock on the inode's ocfs2_rw_lock when doing a direct write (we are not going to change metadata) </li><ul><li>To protect i_size against other nodes
ocfs2 uses the DLM to synchronize metadata across the cluster
Three lock levels are used: NL (null), PR (protected read) and EX (exclusive) </li></ul><li>Better performance when multiple nodes write to a file. What is Oracle famous for? ;-) </li></ul>
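The locking decision on this slide can be summarized as a small user-space sketch. All names here (`model_write_plan`, `model_plan_write()`, the `MODEL_LOCK_*` values) are hypothetical. The EX level for the non-direct path is an assumption taken from the buffered-write slide that follows, and the real function does more than this model shows.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two cluster lock levels relevant to writes. */
enum model_lock_level { MODEL_LOCK_PR, MODEL_LOCK_EX };

struct model_write_plan {
    bool direct;                         /* will we do a direct write? */
    bool take_i_alloc_sem;               /* down_read(&inode->i_alloc_sem)? */
    enum model_lock_level rw_lock_level; /* level taken on the rw lock */
};

/*
 * Model of the decision: a direct write is allowed only with O_DIRECT and
 * only when the write ends at or before the current i_size.
 */
static struct model_write_plan model_plan_write(bool o_direct, long long pos,
                                                long long count, long long i_size)
{
    struct model_write_plan plan;
    long long end = pos + count;

    plan.direct = o_direct && end <= i_size;
    /* i_alloc_sem is read-locked only on the direct path. */
    plan.take_i_alloc_sem = plan.direct;
    /* PR is enough when no metadata changes; otherwise take EX (assumed). */
    plan.rw_lock_level = plan.direct ? MODEL_LOCK_PR : MODEL_LOCK_EX;
    return plan;
}
```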
ocfs2_file_aio_write() (cont.) <ul><li>Call generic_file_direct_write directly if we can do a direct write (end <= i_size_read(inode))
Call __generic_file_aio_write otherwise </li><ul><li>Buffered write, no down_read on inode->i_alloc_sem
Take EX lock on the ocfs2_rw_lock of the inode </li></ul><li>__generic_file_aio_write checks whether the file has the O_DIRECT flag and tries generic_file_direct_write first, then falls back to buffered write
So even when end > i_size_read(inode), we still attempt a direct write
So what? __generic_file_aio_write will fall back to buffered write if direct write fails </li></ul>
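The control flow on this slide can be modeled in user space to show where the lock is missing. This is a sketch under assumptions: `model_result`, `model_generic_tries_direct()` and `model_ocfs2_aio_write()` are hypothetical names, and the model records only which path is attempted and whether i_alloc_sem is held, not the I/O itself.

```c
#include <stdbool.h>

struct model_result {
    bool direct_attempted;  /* was a direct write attempted? */
    bool held_i_alloc_sem;  /* was i_alloc_sem read-locked at that time? */
};

/* Stand-in for the generic helper: with O_DIRECT it tries direct I/O first. */
static bool model_generic_tries_direct(bool o_direct)
{
    return o_direct;
}

/* Model of the write path as described on the slide (before the fix). */
static struct model_result model_ocfs2_aio_write(bool o_direct, long long pos,
                                                 long long count, long long i_size)
{
    struct model_result r = { false, false };
    bool can_direct = o_direct && pos + count <= i_size;

    if (can_direct) {
        r.held_i_alloc_sem = true;   /* down_read taken on the direct path */
        r.direct_attempted = true;
        return r;
    }

    /*
     * Prepared for a buffered write, so i_alloc_sem is not taken, but the
     * generic helper still tries a direct write when O_DIRECT is set.
     */
    r.direct_attempted = model_generic_tries_direct(o_direct);
    return r;
}
```

In this model, an O_DIRECT write with pos + count > i_size ends with `direct_attempted` set and `held_i_alloc_sem` clear, which is the unprotected case the slides go on to examine.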
ocfs2_direct_IO() <ul>if (i_size_read(inode) <= offset) <ul>return 0; </ul><li>Only checking the offset is not enough
But wait, we have ocfs2_direct_IO_get_blocks() </li><ul><li>ret = blockdev_direct_IO_no_locking(rw, iocb, inode,
What if the inode has a partial block at the end, and we are writing up to the end of that block?
The write still passes, because at that level we are talking in blocks, not bytes </li></ul>
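A worked example makes the bytes-versus-blocks gap concrete. The sketch below is a user-space model with hypothetical helper names. The block-granular check is an assumption about the kind of test done at the get_blocks level, not a copy of the ocfs2 code, and it assumes i_size > 0 and len > 0.

```c
#include <stdbool.h>

/* Offset-only check from the slide: refused only when offset is at or past EOF. */
static bool model_offset_check_passes(long long offset, long long i_size)
{
    return offset < i_size;
}

/*
 * Block-granular check (assumed form): the last block touched by the write
 * must not lie beyond the last block that holds file data.
 */
static bool model_block_check_passes(long long offset, long long len,
                                     long long i_size, long long blksz)
{
    long long last_write_block = (offset + len - 1) / blksz;
    long long last_file_block = (i_size - 1) / blksz;

    return last_write_block <= last_file_block;
}

/* Byte-granular truth: does the write extend the file? */
static bool model_extends_i_size(long long offset, long long len, long long i_size)
{
    return offset + len > i_size;
}
```

With i_size = 5000 and 4096-byte blocks, a write of 4096 bytes at offset 4096 passes both checks (offset 4096 < 5000, and the write stays in block 1, which is also the last file block), yet it ends at byte 8192 and therefore extends i_size.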
The fix <ul><li>We could check if offset + length > i_size in ocfs2_direct_IO()
But ocfs2_file_aio_write() does not down_read(&inode->i_alloc_sem) when it decides a direct write is not possible.
So for a direct write that extends i_size, ocfs2_file_aio_write() prepares only for a buffered write, yet __generic_file_aio_write tries a direct write first, and that direct write runs without down_read on i_alloc_sem
That races with allocation changes such as truncate
So when ocfs2_file_aio_write() decides a direct write is not possible, we call generic_file_buffered_write() instead of __generic_file_aio_write </li></ul>
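The effect of the fix can be sketched with a small user-space model. The names `fixed_result` and `model_ocfs2_aio_write_fixed()` are hypothetical, and the model tracks only which path is taken and whether i_alloc_sem is held, so it illustrates the intended invariant rather than the actual patch.

```c
#include <stdbool.h>

struct fixed_result {
    bool direct_attempted;   /* was a direct write attempted? */
    bool buffered;           /* did we go straight to the buffered path? */
    bool held_i_alloc_sem;   /* was i_alloc_sem read-locked? */
};

/*
 * Model of the fixed write path: when a direct write is not possible, go
 * straight to the buffered write, so that no direct write is attempted
 * without i_alloc_sem.
 */
static struct fixed_result model_ocfs2_aio_write_fixed(bool o_direct, long long pos,
                                                       long long count, long long i_size)
{
    struct fixed_result r = { false, false, false };
    bool can_direct = o_direct && pos + count <= i_size;

    if (can_direct) {
        r.held_i_alloc_sem = true;  /* down_read before the direct write */
        r.direct_attempted = true;
    } else {
        r.buffered = true;          /* buffered only, no direct attempt */
    }
    return r;
}
```

The invariant to check is that `direct_attempted` never holds without `held_i_alloc_sem`, including for O_DIRECT writes that extend i_size.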