Synchronizing Many Filesystems in Near Linear Time
The preprint and sample code are now available for the latest result in our Algebraic Theory of Filesystems.
Finding a provably correct subquadratic synchronization algorithm for many filesystems is one of the main theoretical problems in Operational Transformation (OT) and Conflict-free Replicated Data Types (CRDT) frameworks. Building on the Algebraic Theory of Filesystems, which incorporates non-commutative filesystem commands natively, we developed an algorithm that synchronizes an arbitrary number of replicas, and built a proof-of-concept implementation of it. The algorithm is provably correct and runs in linear time after an initial sorting phase. It works by identifying conflicting command pairs and requesting that one command of each pair be removed, and it can be guided to reach any of the theoretically possible synchronized states.
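To give a feel for the "sort first, then scan once" shape of such a procedure, here is a toy sketch. It is not the algorithm from the preprint: it assumes a deliberately simplified model in which every command just assigns a value to a single path, and it only detects conflicts between commands on the same path, ignoring conflicts between a path and its ancestors or descendants. The names `Command`, `resolve`, and `priority` are hypothetical and do not come from the sample code.

```python
from itertools import groupby
from typing import NamedTuple, Optional


class Command(NamedTuple):
    replica: str           # hypothetical replica identifier
    path: str              # path the command writes to
    value: Optional[str]   # new content at the path; None means "delete"


def resolve(commands, priority):
    """Toy conflict resolution: sort, then one linear pass.

    `priority` lists replica ids from most to least preferred; it is the
    (assumed) knob that steers which synchronized state is reached.
    Returns (kept, dropped): one kept command per path, plus the commands
    that disagree with it and would have to be removed.
    """
    rank = {replica: i for i, replica in enumerate(priority)}
    # Sorting phase, O(n log n): group by path, preferred replica first.
    ordered = sorted(commands, key=lambda c: (c.path, rank[c.replica]))
    kept, dropped = [], []
    # Scanning phase, O(n): every command is visited exactly once.
    for _, group in groupby(ordered, key=lambda c: c.path):
        winner = next(group)
        kept.append(winner)
        for other in group:
            # Commands that agree with the winner are duplicates, not conflicts.
            if other.value != winner.value:
                dropped.append(other)
    return kept, dropped
```

In this toy model, changing the order of `priority` changes which command of a conflicting pair is kept, which loosely mirrors the idea of guiding the outcome toward a chosen synchronized state.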
The algorithm supports asynchronous use. After a client sends a synchronization request, its local replica remains available for further modification. When the synchronization instructions arrive, they can be merged with the changes made since the request. The algorithm also works on filesystems whose path structure is a directed acyclic graph rather than the traditional tree. It can therefore be applied to filesystems with hard or soft links, as long as the links create no loops.
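The "no loops" precondition can be checked mechanically. The sketch below is only an illustration of that check, not part of the published implementation: it assumes the path structure is given as a hypothetical `edges` dictionary mapping each node to the nodes it contains or links to, and uses Kahn's topological sort to decide whether the structure is acyclic.

```python
from collections import deque


def is_acyclic(edges):
    """Return True if the containment/link graph has no loops.

    `edges` is an assumed representation: node -> list of nodes it
    contains or links to (directory entries, hard links, soft links).
    """
    # Collect every node, including ones that only appear as targets.
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)

    # Count incoming edges for each node.
    indegree = {node: 0 for node in nodes}
    for targets in edges.values():
        for target in targets:
            indegree[target] += 1

    # Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    queue = deque(node for node in nodes if indegree[node] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for target in edges.get(node, ()):
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)

    # Any node never removed lies on, or behind, a cycle.
    return visited == len(nodes)
```

A file reachable through two directories (as with a hard link) still passes the check, because it yields a DAG rather than a tree; a link pointing back to one of its own ancestors fails it.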