The correctness and completeness of algebraic file synchronisers
I did some work on an algebraic approach to file synchronisation back in 2000. This basically means that if one has multiple copies of the same filesystem, and different users modify them differently, then we combine all changes (synchronise them) not by specifying rules based on what content we find in the filesystems, but based on the updates or commands that have been applied to them. This approach works very well as one can see whether two updates are in conflict just by looking at pairs of commands and a small set of rules.
I have recently become interested in this problem again, and in particular in offering rigorous proofs that the algorithm proposed to detect conflicting commands really works as intended, and cannot be improved in this framework. The statements to prove were intuitive, but actually proving them turned out to be a difficult task. One reason for this was that previously, all propositions about commands were of the form "these two commands behave exactly like that command on all filesystems" - that is, they had an implied universal quantifier (or an existential one if one negated the proposition). However, in order to prove, for example, that the commands returned by the synchroniser algorithm designed to modify the filesystems so that they would be as much like each other again as possible will not actually cause an error, one needed to deal with propositions of the form "if these commands do not cause an error, then those commands won't cause an error, either". In other words, the algebraic system needed to be extended to be able to describe implication, or some kind of conditionality.
This could be done based on partial functions and a partial order, and I am very excited about the result. The end product of this new research is a paper, now available on arXiv.
Algebraic File Synchronization: Adequacy and Completeness
Elod Pal Csirmaz
Abstract
With distributed computing and mobile applications, synchronizing diverging replicas of data structures is a more and more common problem. We use algebraic methods to reason about filesystem operations, and introduce a simplified definition of conflicting updates to filesystems. We also define algorithms for update detection and reconciliation and present rigorous proofs that they not only work as intended, but also cannot be improved on.
To achieve this, we introduce a novel, symmetric set of filesystem commands with higher information content, which removes edge cases and increases the predictive powers of our algebraic model. We also present a number of generally useful classes and properties of sequences of commands.
These results are often intuitive, but providing exact proofs for them is far from trivial. They contribute to our understanding of this special type of algebraic model, and toward building more complete algebras of filesystem trees and extending algebraic approaches to other data storage protocols. They also form a theoretical basis for specifying and guaranteeing the error-free operation of applications that implement an algebraic approach to synchronization.