Observing files: tangaroa

May 20, 2011 19:36

kowh informs me that Linux has inotify for this purpose, and BSD has kqueue. Since the novel idea in my post already exists and has been implemented for years, I've put the rest of the post below a cut.

Certain files change from time to time, and it could be useful for programs to know when they have changed.

Program A changes a file.
Procedure B would like to know when the file is changed.

Currently, one would rewrite Program A to run Procedure B as part of its write routine.

Ideally, one could write a standalone program that would be signaled whenever the file is written, without having to make any changes to Program A. The operating system's write routine would be modified to include notification logic.

On writing to a file: If there is a list of listeners for that file: Poke every listener to wake it up.
Permissions are easy; anything that can read a file can listen to it.
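
As a rough illustration of the write-then-poke idea, here is a minimal user-space sketch in Python. It is not how a kernel would do it; the class name NotifyingWriter and its methods are made up for this example, and listeners are plain callbacks keyed by the exact path string.

```python
import os
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A listener is any callable that accepts the path that was written.
Listener = Callable[[str], None]


class NotifyingWriter:
    """Hypothetical stand-in for a write routine with notification logic."""

    def __init__(self) -> None:
        # Maps a path to the listeners registered for it.
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def listen(self, path: str, callback: Listener) -> None:
        # Permission rule from the post: anything that can read can listen.
        if not os.access(path, os.R_OK):
            raise PermissionError(f"cannot read {path}")
        self._listeners[path].append(callback)

    def write(self, path: str, data: str) -> None:
        # Do the actual write (append mode, to keep the sketch simple).
        with open(path, "a", encoding="utf-8") as f:
            f.write(data)
        # If there is a list of listeners for that file, poke each one.
        for callback in self._listeners.get(path, []):
            callback(path)
```

Program A would call write() as usual, and Procedure B would register itself once with listen() without Program A knowing about it.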

The hard part is defining what is to be sent to the listeners. This is easy for an append, since the listener will only need the new data. It's more difficult for a write. Should the listener receive the new data in full or a diff? What about a case where the file is written in place on the disk with a low-level API, and the old state of the data is gone and irrecoverable?
Is the listener going to be a process that runs in the background and simply waits, or could it be a program that the OS will start up when the event occurs?
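
On the question of what to send, one possible answer can be sketched as follows: send only the new tail for an append, and a diff for anything else. The function name describe_change is invented for this example, and it assumes the old contents are still available, which is exactly what fails in the written-in-place case.

```python
import difflib
from typing import Tuple


def describe_change(old: str, new: str) -> Tuple[str, str]:
    """Return (kind, payload) describing how `old` became `new`."""
    if new == old:
        return ("unchanged", "")
    if new.startswith(old):
        # Pure append: the listener only needs the new data.
        return ("append", new[len(old):])
    # General rewrite: send a unified diff of the two versions.
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
    )
    return ("rewrite", "".join(diff))
```

A listener that would rather have the new data in full could ignore the payload and reread the file.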

Should reads also be listenable? Should information about the environment of the file access, such as date, time, user, and process ID, also be sent to the listener? Should the listener API work on a queryable set of files rather than one file at a time, which would also listen to any new file that matched the query string?
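
If the answer to those questions were yes, the event and the query-based subscription might look something like this sketch. The names AccessEvent and PatternRegistry are hypothetical, and shell-style wildcards via Python's fnmatch are just one assumed choice of query language. Because matching happens when an event is dispatched, a file created later is picked up as long as its path matches.

```python
import fnmatch
import os
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class AccessEvent:
    """Environment of one file access, as it might be sent to a listener."""
    path: str
    kind: str  # "read" or "write"
    user: str
    pid: int = field(default_factory=os.getpid)
    timestamp: float = field(default_factory=time.time)


class PatternRegistry:
    """Listeners subscribe to a wildcard pattern rather than a single file."""

    def __init__(self) -> None:
        self._subs: List[Tuple[str, Callable[[AccessEvent], None]]] = []

    def subscribe(self, pattern: str, callback: Callable[[AccessEvent], None]) -> None:
        self._subs.append((pattern, callback))

    def dispatch(self, event: AccessEvent) -> int:
        """Deliver the event to every matching listener; return the count."""
        delivered = 0
        for pattern, callback in self._subs:
            # Note: fnmatch's "*" also matches "/" characters.
            if fnmatch.fnmatchcase(event.path, pattern):
                callback(event)
                delivered += 1
        return delivered
```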

Then there are concurrency considerations. The file could easily be rewritten again between the time that notices are fired and the time the listeners finish taking action. Listener blocking could be "friendly", in which the listener quits and starts over if any other process edits the file; "greedy", in which the listener blocks the file until its routine is done; or "ignorant", in which multiple listener threads carry on their work and piss over each other's output if their output methods are not prepared for concurrency.
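
The "friendly" and "greedy" styles can be sketched with an in-memory stand-in for a file. VersionedFile, friendly_listener, and process_exclusively are names made up here, and the version counter is an assumption about how a listener could tell that someone else wrote in the meantime.

```python
import threading
from typing import Callable, Tuple


class VersionedFile:
    """In-memory stand-in for a file that counts its writes."""

    def __init__(self, content: str = "") -> None:
        self._lock = threading.Lock()
        self._content = content
        self._version = 0

    def write(self, content: str) -> None:
        with self._lock:
            self._content = content
            self._version += 1

    def snapshot(self) -> Tuple[int, str]:
        with self._lock:
            return self._version, self._content

    def process_exclusively(self, process: Callable[[str], str]) -> str:
        # "Greedy": writers are blocked until the routine is done.
        with self._lock:
            return process(self._content)


def friendly_listener(f: VersionedFile, process: Callable[[str], str],
                      max_attempts: int = 5) -> str:
    # "Friendly": discard the result and start over if the file changed.
    for _ in range(max_attempts):
        version, content = f.snapshot()
        result = process(content)
        if f.snapshot()[0] == version:
            return result
    raise RuntimeError("file kept changing; giving up")
```

The "ignorant" style is what you get by calling process on a snapshot and never checking the version again.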

There's also the problem of an infinite event loop. The dispatcher will need to track sets of which triggers have opened which processes and stop signaling if an event is later thrown on a file that has already been triggered by one of the processes in the chain.
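
One simplified way to picture the loop check: carry along the set of files that have already fired in the current chain, and refuse to signal a file that is already in it. This sketch tracks files rather than processes, which is a simplification of the bookkeeping described above, and ChainDispatcher is a hypothetical name.

```python
from typing import Callable, Dict, FrozenSet, List

# A handler gets the path that fired and a function for writing other
# files; that function remembers the chain that led to this handler.
Handler = Callable[[str, Callable[[str], None]], None]


class ChainDispatcher:
    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}
        self.suppressed: List[str] = []  # events dropped to break a loop

    def on_write(self, path: str, handler: Handler) -> None:
        self._handlers.setdefault(path, []).append(handler)

    def notify(self, path: str, chain: FrozenSet[str] = frozenset()) -> None:
        if path in chain:
            # This file already fired earlier in the chain: stop signaling.
            self.suppressed.append(path)
            return
        next_chain = chain | {path}
        for handler in self._handlers.get(path, []):
            handler(path, lambda target: self.notify(target, next_chain))
```

With a handler on a.txt that writes b.txt and a handler on b.txt that writes a.txt, the second event on a.txt is recorded in suppressed instead of starting the cycle again.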

Alternatives:

Have program B poll the file's timestamp for updates every few minutes and reread the whole file if it has changed. This is what most developers do, but it is wasteful.

Do whatever tail -f does. The fact that developers don't do this suggests that the method is too difficult to use, or at least more obscure than it should be.
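
For comparison, here is a sketch of the two alternatives in Python. I am assuming that a simple tail -f amounts to remembering an offset and reading whatever was appended past it; some implementations may use kernel notifications instead. The function names are invented for this example.

```python
import os
import time
from typing import Callable, Optional, Tuple


def has_changed(path: str, last_mtime_ns: int) -> Tuple[bool, int]:
    """Timestamp polling: report whether the mtime moved since last check."""
    mtime_ns = os.stat(path).st_mtime_ns
    return mtime_ns != last_mtime_ns, mtime_ns


def read_new_data(path: str, offset: int) -> Tuple[bytes, int]:
    """tail -f style: return bytes appended since `offset` and the new offset."""
    if os.stat(path).st_size < offset:
        offset = 0  # file was truncated or replaced; start from the top
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data, offset + len(data)


def follow(path: str, handle: Callable[[bytes], None],
           interval: float = 1.0, rounds: Optional[int] = None) -> None:
    """Poll for appended data; `rounds` limits the loop (None means forever)."""
    offset = os.stat(path).st_size  # like tail -f, skip what is already there
    done = 0
    while rounds is None or done < rounds:
        data, offset = read_new_data(path, offset)
        if data:
            handle(data)
        time.sleep(interval)
        done += 1
```

Both are still polling, so neither removes the waste; the second one just avoids rereading the whole file.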