Saturday, March 31, 2012

Running a Command Whenever a File Changes

Edit: I actually had to republish this as I didn't understand the Blogger interface. Because of this, I added a new section where I combine this into a script that you can actually use. Enjoy.

Every once in a while you would probably like to have a command run every time a particular file changes. I know I did when I was writing some LaTeX-based presentations with Beamer and was using the latex, dvips, gs chain of commands to compile the document into a PDF file. I came up with a way to ease these steps by detecting changes in the document's DVI file that Emacs writes for me. I later thought of a couple of other methods and figured I would put them up here in case anybody ever needs something similar. I included all of them because you never know when you will need to put together a hack on some old or limited system, or on a system that you don't control and thus cannot install new software on.

I decided that the interface of my little script should be to specify a file to watch, the watch file, and a command to run when it changes, the on change command. The on change command should take one argument, the watch file.
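As a minimal sketch of that interface, the two arguments could be validated before any watching starts. The function name check_args and the wording of the messages are illustrative assumptions and do not appear in the scripts below.

```shell
# check_args WATCH_FILE ON_CHANGE_COMMAND
# Hypothetical helper: returns 0 if the arguments look usable, 1 otherwise.
check_args() {
    if [ "$#" -ne 2 ]; then
        echo "usage: on-file-change WATCH_FILE ON_CHANGE_COMMAND" >&2
        return 1
    fi
    if [ ! -e "$1" ]; then
        echo "on-file-change: no such file: $1" >&2
        return 1
    fi
    return 0
}
```

A script could call it as check_args "$@" || exit 1 before entering its loop.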

Watch File Modification Times

In this version we look at the watch file modification time and compare it to a reference time that is updated every time the watch file changes. This is definitely a cheap way to find these updates. The easiest way I could find to check the modification time of a file was to create a temporary file which acts as a time stamp. We just check which file is newer and touch the temporary file any time we want to update the time stamp.

#!/bin/bash

watched_file=$1
command=$2

# First we need a temporary file to test against
marker=$(mktemp /tmp/on-change-XXXXXX)

# We have to trap the EXIT signal as we want to clean up our tmp file
trap "rm -f $marker" EXIT

# Loop until killed
while true
do
    # If the watched_file is newer than the marker...
    if [ "$watched_file" -nt "$marker" ]
    then
        # Wait for a bit to make sure that the file is done being modified
        sleep 1
        # ... run the command (quoted so file names with spaces survive) ...
        $command "$watched_file"
        # ... and reset the marker
        touch "$marker"
    fi
    sleep 2
done

One thing to note is that this script, like many scripts, has a race condition in it. If it happens to detect a modification of the watch file before the modification is complete, it will trigger the on change command, probably get an error from it, and then detect another modification on the next iteration. This will continue until the program modifying the watch file has completed its work. This is bound to happen with short polling intervals (here we use two seconds) or with long-running programs that modify the watch file. This is the reason I added a one-second delay after the modification-time test: it tries to give anything that is currently underway time to complete. It helps, but it cannot help when the modification takes longer than the delay.

A possible problem here is that it is easy to trick this into doing more work than it should. The watch file might be repeatedly touched but never actually modified, which would lead to unnecessary execution of the on change command. The key is to note that the newer modification time is a necessary but not sufficient condition for the file to have changed. It leads to false positives, but is very cheap.

It should be noted that, in the case of compiling LaTeX documents, while this can happen, it is a pretty rare event. But if we are building a general purpose tool, it is something we should worry about.
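The false positive described above can be reproduced in a few lines. This is only a demonstration under assumed names: the temporary directory layout and the explicit touch -t timestamps are illustrative choices that make the result independent of timing.

```shell
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/watched"

# Give the watched file and the marker the same modification time
touch -t 202001010000 "$tmpdir/watched" "$tmpdir/marker"

# Keep a copy of the contents, then "touch" the watched file one minute
# later without altering what is in it
cp "$tmpdir/watched" "$tmpdir/before"
touch -t 202001010001 "$tmpdir/watched"

if [ "$tmpdir/watched" -nt "$tmpdir/marker" ]; then
    result="looks changed"
else
    result="looks unchanged"
fi

if cmp -s "$tmpdir/watched" "$tmpdir/before"; then
    content="identical"
else
    content="different"
fi

echo "mtime test: $result, contents: $content"
# prints "mtime test: looks changed, contents: identical"

rm -rf "$tmpdir"
```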

Watch File Hash

How can we eliminate the reprocessing of identical files? File hashes.

The next technique is to poll the file for actual content changes. We don't need to keep an old copy of the file or anything like that; we just keep track of the file's old hash value. Every two seconds (plus the time it takes to hash the file) we compute the file's hash and compare it to the stored hash. If they differ, we run the specified on change command and update the stored hash value.

#!/bin/bash

watched_file=$1
command=$2

# Compute the reference hash value
hash=$(md5sum "$watched_file" | cut -d ' ' -f 1)
while true
do
    # Grab the file's hash
    newhash=$(md5sum "$watched_file" | cut -d ' ' -f 1)
    # If it has changed...
    if [ "$newhash" != "$hash" ]
    then
        # ...run the command...
        $command "$watched_file"
        # ...and record the new hash
        hash=$newhash
    fi
    sleep 2
done

This has the advantage that it never needs to run the on change command unless there is an actual change in the file. This means that for an expensive on change command the method works very well. However, for very large watch files the hashing becomes unnecessarily expensive.

The race condition is still there, though, and this time it is worse. Whereas in the modification time test method we could try to give the modifying process extra time to complete, this time we don't have that option. We just have to blindly try again until we get it right.
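The hash computation in the script above could also be pulled into a small helper that still works on a limited system without md5sum. This is a sketch, not part of the original script: the name file_hash is hypothetical, and falling back to cksum (which POSIX specifies) is an assumption about what is acceptable as a change detector.

```shell
# file_hash FILE
# Prints a checksum of FILE's contents; returns 1 if FILE cannot be read.
# Assumption: cksum is a good enough stand-in where md5sum is missing.
file_hash() {
    [ -r "$1" ] || return 1
    if command -v md5sum > /dev/null 2>&1; then
        md5sum < "$1" | cut -d ' ' -f 1
    else
        cksum < "$1" | cut -d ' ' -f 1
    fi
}
```

The polling loop would then use hash=$(file_hash "$watched_file") wherever it currently calls md5sum directly.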

Use INotify

As is often the case, you find a better solution than what you have hacked together long after you have found a good enough solution to your problem. This is the case with me and INotify.

The INotify kernel facility was designed for just the problem I was attacking. It provides, via a kernel interface, a way to hook into the file system and receive notifications on events such as reading from, writing to, opening, and closing files. If you are on one of the mainstream distros and have been keeping things even remotely up to date, you probably have an INotify ready kernel, but might not have the shell tools installed. We will be using inotifywait, which blocks until a specified file system event is triggered. In this case, we are interested in file modification events, so we will pass the option -e modify to the program. Just a note, this is taken almost verbatim from the inotifywait man page.

#!/bin/bash

watched_file=$1
command=$2

while inotifywait -e modify "$watched_file"; do
    # Wait a bit, in case the modifier is still working
    sleep 1
    # Then run the command
    $command "$watched_file"
done

This is much simpler than the other methods. A real strength here is that it is short enough that it really doesn't need to be a script at all; you can just memorize it. This is definitely a pretty good version, but it still has that race condition. So, one last version, where we wait for the file to go unmodified for a full second before we run the command on it. We can do this because inotifywait accepts a timeout (-t) and exits with status 2 if the timeout expires without an event.

#!/bin/bash

watched_file=$1
command=$2

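while inotifywait -e modify "$watched_file";
do
    # Keep waiting until a whole second passes without a modification
    # (inotifywait exits with 2 when the -t timeout expires)
    while [ 2 != $(inotifywait -e modify -t 1 "$watched_file" \
                    1> /dev/null 2> /dev/null; echo $?) ]
    do
        echo waiting...
    done
    $command "$watched_file"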
done

This catches most of the race condition issues, I think, but the price you pay is that you have to wait at least one second after the file has been modified before the file can be processed by your script.
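The inner "wait until quiet" loop can also be written as a function that branches on the exit status directly. The following is a sketch under assumptions: the name wait_until_quiet and its optional second parameter are hypothetical, and unlike the loop above it gives up when inotifywait reports an error instead of retrying forever.

```shell
# wait_until_quiet FILE [QUIET_SECONDS]
# Returns 0 once FILE has gone QUIET_SECONDS (default 1) without a modify
# event, or inotifywait's own exit status if it reports an error.
wait_until_quiet() {
    local file="$1" quiet_seconds="${2:-1}" status
    while true; do
        # -qq silences inotifywait's normal output
        if inotifywait -qq -e modify -t "$quiet_seconds" "$file"; then
            continue    # modified again inside the window: keep waiting
        else
            status=$?
            if [ "$status" -eq 2 ]; then
                return 0    # timeout expired: the file has been quiet
            fi
            return "$status"
        fi
    done
}
```

In the script above it would be called as wait_until_quiet "$watched_file" just before running the command.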

Of course, without a proper synchronization mechanism, which needs to be agreed upon by both programs and thus is largely incompatible with the shell idea of small self-contained programs, you will never get rid of this race condition. It is firmly up to the user to ensure that two different programs are not accessing the same file at the same time. (Actually, I think we can go a long way towards solving this by using lsof and checking whether the watched file is still open. I don't have time to explore this, but lsof $watched_file seems promising.)
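The lsof idea could be sketched roughly as follows. This is an unexplored assumption rather than a tested solution: the function name wait_until_closed is hypothetical, lsof only reports processes it is allowed to inspect, and a program that writes the file in several separate open/close rounds would still slip through.

```shell
# wait_until_closed FILE
# Polls lsof once a second until no visible process has FILE open.
# lsof exits with 0 while it finds an open instance of FILE and with a
# non-zero status otherwise, so the loop also ends if lsof is missing.
wait_until_closed() {
    while lsof "$1" > /dev/null 2>&1; do
        sleep 1
    done
}
```

It would be called as wait_until_closed "$watched_file" right before the on change command is run.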

Combine The Methods

We can have the best of all of these methods by combining them into one monster script that checks for changes using INotify (with a modification time check as a fallback) and then confirms that the file has actually changed by comparing the file hashes. But I won't bother because this is already too long, both in words and in time spent typing and editing.

Update: here it is. I changed the interface, now you specify a file and the complete command you want to run on change.

#!/bin/bash

watched_file=$1
command="${@:2}"

# We start with a bogus hash
hash=null

if which inotifywait > /dev/null 2>&1
then
    function wait_till_unchanging()
    {
        while \
            [ $(inotifywait -e modify -t 1 "$watched_file" \
            1> /dev/null 2> /dev/null; echo $?) != 2 ];
        do
            echo waiting...
            sleep 1
        done
    }
else
    function wait_till_unchanging()
    {
        # Wait until there isn't a change for one second
        touch "$marker"
        cont=1
        while [ 1 == $cont ]
        do
            cont=0
            sleep 1
            # If the watched_file is newer than the marker...
            if [ "$watched_file" -nt "$marker" ]
            then
                # reset the marker
                touch "$marker"
                cont=1
            fi
        done
    }
fi

if which md5sum > /dev/null 2>&1
then
    function if_changed_run ()
    {
        # Grab the file's hash
        newhash=$(md5sum "$watched_file" | cut -d ' ' -f 1)

        # If it has changed (this is always run the first time as $hash is null)...
        if [ "$newhash" != "$hash" ]
        then
            # ...run the command...
            $command

            # ...and record the new hash
            hash=$newhash
        fi
    }
else
    function if_changed_run ()
    {
        # ...run the command...
        $command
    }
fi

if which inotifywait > /dev/null 2>&1
then
    while inotifywait -e modify "$watched_file";
    do
        wait_till_unchanging
        if_changed_run
    done
else
    # First we need a temporary file to test against
    marker=$(mktemp /tmp/on-change-XXXXXX)

    # We have to trap the EXIT signal as we want to clean up our tmp file
    trap "rm -f $marker" EXIT

    while true
    do
        # If the watched_file is newer than the marker...
        if [ "$watched_file" -nt "$marker" ]
        then
            wait_till_unchanging
            if_changed_run
        fi
        sleep 1
    done
fi
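One limitation of the script above is that the command is stored in a single string and expanded unquoted, so an argument that contains a space is split into two. A possible alternative, sketched here with hypothetical names, is to shift the watch file off the argument list and run the rest as "$@". The use of cksum instead of md5sum is only an assumption made to keep the sketch portable.

```shell
last_hash=

# run_if_changed FILE COMMAND [ARGS...]
# Runs COMMAND with its ARGS exactly as given, but only when FILE's checksum
# differs from the one recorded by the previous call.
run_if_changed() {
    local file="$1" new_hash
    shift
    new_hash=$(cksum < "$file") || return 1
    if [ "$new_hash" != "$last_hash" ]; then
        # "$@" keeps every remaining argument as one word
        "$@"
        last_hash=$new_hash
    fi
}
```

With this shape, the call inside the loop would look like run_if_changed presentation.dvi dvipdf presentation.dvi.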

Using It To Automate Latex Builds

I said I wanted this to make compiling LaTeX documents easier. In order to do that you just use something like:

on-file-change presentation.dvi dvipdf

Update: with the new interface, it looks like this:

on-file-change presentation.dvi dvipdf presentation.dvi

Of course the best method would be to figure out how to make Emacs run this post processing for me. However, I have so far failed to figure that out and this little tool is applicable to more than just this scenario anyway.

Update: I did find out how to do this the right way in Emacs. It turns out there are a lot of different ways and not all of them work. For instance, the first answer on that page doesn't work for me. I use AUCTeX, which means that you can temporarily switch the LaTeX compilation mode to pdflatex by typing "C-c C-t C-p" in the buffer, or set it permanently by adding (setq TeX-PDF-mode t) to your .emacs file.