fsync is cheaper than you think

With all the talk about the new ext4’s interactions with gconfd, kde, firefox, and others, a lot of people have been assuming fsync is expensive.

If you’re living under a rock and don’t know what’s going on, here’s the short version: Lots of programs don’t write to files reliably because (a) writing to files reliably requires some contortions, and (b) writing to files reliably is slow.
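
For the record, the contortions people mean look something like this; the fsync() in the middle is also the step with the reputation for being slow:

    /* The textbook durable update: write a temp file, fsync() it,
     * then atomically rename() it over the original. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int save_file(const char *path, const char *tmp,
                  const char *buf, size_t len)
    {
        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (fd < 0) return -1;
        if (write(fd, buf, len) != (ssize_t)len) { close(fd); return -1; }
        if (fsync(fd) < 0) { close(fd); return -1; }  /* the slow part */
        if (close(fd) < 0) return -1;
        return rename(tmp, path);  /* atomic replacement */
    }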

Neither of these things is true, and I wrote libreliable to prove it.

libreliable is a simple, mostly portable way to get reliable file I/O without an fsync() penalty after every write.

That means you have no excuse for not updating your files reliably.

The included demo should show how it works. It’s fun to watch in strace/truss.

10 Comments

  1. Seun Osewa says:

    Thanks. How about a short how-to-use guide for libreliable?

  2. geocar says:

    Seun Osewa:

    The included demo.c should show how it works; basically, if you have a program that goes open(tmpfile), write(), fsync(), close(), rename() sixty times a minute, you can replace that fsync() and the rename() with a reliable_post(), which will queue your changes.
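
    In outline, the change looks like this (illustrative only; demo.c has the exact interface):

        /* Illustrative sketch -- header name and reliable_post()
         * signature are stand-ins; demo.c shows the real interface. */
        #include <fcntl.h>
        #include <unistd.h>
        #include "reliable.h"

        int save_pref(const char *path, const char *tmp,
                      const char *buf, size_t len)
        {
            int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0666);
            if (fd < 0) return -1;
            if (write(fd, buf, len) != (ssize_t)len) { close(fd); return -1; }
            if (close(fd) < 0) return -1;
            /* no fsync(), no rename(): reliable_post() queues the pair
             * and the helper process flushes queued updates in batches */
            return reliable_post(tmp, path);
        }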

    A future version of libreliable could even queue changes until some other disk activity occurs (by monitoring /proc/diskstats for example).
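
    Watching for that activity is cheap. Purely as a sketch:

        /* Sketch: read the writes-completed counter (column 8 of
         * /proc/diskstats) for a device, so the queue can be flushed
         * whenever some other write has already spun up the disk. */
        #include <stdio.h>
        #include <string.h>

        unsigned long long writes_completed(const char *dev)
        {
            unsigned maj, min;
            unsigned long long rd, rdm, rsec, rms, wr, found = 0;
            char name[32];
            int c;
            FILE *f = fopen("/proc/diskstats", "r");
            if (!f) return 0;
            while (fscanf(f, "%u %u %31s %llu %llu %llu %llu %llu",
                          &maj, &min, name, &rd, &rdm, &rsec, &rms, &wr) == 8) {
                while ((c = getc(f)) != '\n' && c != EOF)
                    ;                      /* skip the rest of the line */
                if (strcmp(name, dev) == 0) { found = wr; break; }
            }
            fclose(f);
            return found;
        }

    If that counter moves on its own between polls, the disk is already busy, and flushing the queue at that moment costs almost nothing extra.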

  3. Brendan says:

    It seems to me that a separate process, while it may be the best and simplest solution, is not a trivial one.

    My understanding is that Firefox was not using the file APIs; it was using SQLite. And your solution does not translate in a trivial way to that scenario.

  4. Luthor P. Wick says:

    Are you serious? Delayed fsync+move in a distinct subprocess? A whole library to do that, just to get safe fsync+move semantics?

    That’s bloody ridiculous — dude, stop drinking the koolaid. Ext4’s semantics are whacked, and you just demonstrated why.

  5. geocar says:

    @Luthor

    I don’t think you understand what’s going on. The library provides safe fsync+rename semantics when you do 60-100 updates per minute, *without spinning up the disk or forcing a filesystem flush for each update*.

    Apparently this is so complicated the gconfd people decided to just not use fsync() and instead lose data for ext3, jfs, and nfs users.

  6. geocar says:

    @Brendan

    You’re right, of course. Firefox would need something else.

  7. Luthor P. Wick says:

    I understand what is going on. I’m just saying that write+rename ordering should be provided by the file system without a whole library to implement exactly what the FS could and should be implementing itself.

  8. geocar says:

    @Luthor

    gconfd loses data on ext3 because it doesn’t call fsync; it should be calling fsync. It decided not to call fsync long before ext4 entered the picture.

    libreliable provides a way for gconfd to call fsync without spinning up the disk for every write, and without long delays, which solves the problem for both ext4 and ext3.
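
    The trick is only batching: the helper wakes up, flushes everything queued since its last pass, and renames the results into place, so the disk spins up once per batch instead of once per update. Schematically (not libreliable’s actual code, just the shape of it):

        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        struct update { const char *tmp; const char *dest; };

        /* Drain a batch of queued updates in one pass: flush each temp
         * file, then expose it with an atomic rename(). */
        void flush_batch(const struct update *q, int n)
        {
            for (int i = 0; i < n; i++) {
                int fd = open(q[i].tmp, O_RDONLY);  /* reopen just to flush */
                if (fd < 0)
                    continue;
                if (fsync(fd) == 0) {
                    close(fd);
                    rename(q[i].tmp, q[i].dest);
                } else {
                    close(fd);
                }
            }
        }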

  9. Luthor P. Wick says:

    It loses data because fsync+rename are not properly ordered by the filesystem. The file system will rename a file before writing out the data contents to disk.

    This behavior is within specifications, but it’s also ridiculously useless. The fact that you need an entire library to work around this behavior (by delaying fsync+rename for up to a minute) demonstrates exactly why.

    It’s a common use case; the file system should do the right thing, rather than optimizing for contrived benchmarks and requiring an enormous amount of complexity to optimize a relatively simple operation.

  10. geocar says:

    > It loses data because fsync+rename are not properly ordered by the filesystem.

    I understood it loses data because it’s not doing an fsync() at all. A quick survey of the gconf tarball confirms this for me; are you talking about something else?

    > This behavior is within specifications, but it’s also ridiculously useless. The fact that you need an entire library to work around this behavior (by delaying fsync+rename for up to a minute) demonstrates exactly why.

    I’m not sure that follows. One might say that X11 is broken for the same reason, because the protocol is so complex that layered libraries are required to interact with it (in contrast, Plan9’s rio is much easier to program directly).

    My understanding is that the ext4 changes improve throughput. That’s a good thing.

    My understanding is also that gconfd loses data. I haven’t used ext4 yet; gconfd is losing data on ext3, on jfs, and on nfs. I don’t find that acceptable, and as a result, fsync() is necessary.

    Why are people so resistant to fixing both problems?
