Certified Mergeable Replicated Data Types

Vimala Soundarapandian*, Adharsh Kamath, Kartik Nagar*, KC Sivaramakrishnan*‡
*IIT Madras   NITK Surathkal   Tarides
PLDI 2022 (ACM SIGPLAN Conference on Programming Language Design and Implementation)

Abstract

Replicated data types (RDTs) are data structures that permit concurrent modification of multiple, potentially geo-distributed, replicas without coordination between them. RDTs are designed so that conflicting operations are eventually deterministically reconciled ensuring convergence. However, constructing correct RDTs remains difficult due to the complexity of reasoning about independently evolving replica states, and existing approaches sacrifice efficiency for correctness.

We present Peepul, a pragmatic approach to building and verifying efficient RDTs. We cast RDTs as mergeable replicated data types (MRDTs) in the mould of distributed version control (like Git), equipping them with a three-way merge function. We develop replication-aware simulation relations to verify both convergence and functional correctness, and implement Peepul as an F* library that discharges proof obligations to an SMT solver. The verified RDTs are extracted as OCaml code and run on Irmin, a Git-like distributed database.

The Problem

Modern distributed software replicates data across multiple locations for fault tolerance and low latency. When replicas are modified concurrently, conflicts must be resolved automatically. Convergent Replicated Data Types (CRDTs) solve this but carry expensive causal metadata and use inefficient merge strategies.

Mergeable Replicated Data Types (MRDTs) offer a better alternative: ordinary purely functional data structures are promoted to RDTs by equipping them with a three-way merge function (using a lowest common ancestor, like Git). This avoids tombstones and metadata overhead. But two key challenges remain:

How Peepul Works

Peepul bridges the gap between high-level specifications and efficient implementations through three key ideas:

  1. Git-like Store Model: MRDTs operate over a store modelled after distributed version control — data is versioned in branches with explicit merges. The store provides the lowest common ancestor (LCA) for three-way merging, abstracting away causal metadata.
  2. Replication-Aware Simulation Relations: To prove an MRDT implementation correct, we define a simulation relation between the concrete state (the efficient data structure) and an abstract state (a high-level specification based on events and visibility). The relation must be maintained across all operations and merges, ensuring both functional correctness (operations return the right values) and convergence modulo observable behaviour (replicas with the same history behave identically).
  3. Automated Verification in F*: The simulation relation generates proof obligations that F* discharges to the Z3 SMT solver. For most MRDTs, the proofs are fully automatic. The verified implementations are extracted as OCaml code and run on Irmin.

Example: OR-Set

An observed-remove set (OR-set) where concurrent add and remove of the same element results in the add winning. Peepul provides three verified implementations with different space/time trade-offs:

// OR-set state: set of (element, timestamp) pairs
Sigma = P(N x N)
sigma_0 = {}

do(add(a), sigma, t) = (sigma ∪ {(a, t)}, ⊥)
do(remove(a), sigma, t) = ({e ∈ sigma | fst(e) ≠ a}, ⊥)

merge(sigma_lca, sigma_a, sigma_b) =
  (sigma_lca ∩ sigma_a ∩ sigma_b) ∪
  (sigma_a - sigma_lca) ∪
  (sigma_b - sigma_lca)

The merge takes the LCA and two divergent versions: it keeps elements present in all three, and adds elements that are new in either branch. The space-efficient variant (OR-set-space) deduplicates entries, while OR-set-spacetime uses a balanced BST for O(log n) operations.

Verified MRDTs

Peepul includes 9 verified MRDT implementations with varying complexity:

MRDT Lines of Code Lines of Proof Lemmas Verif. Time (s)
Increment-only counter 6 43 2 3.5
PN counter 8 43 2 23.2
Enable-wins flag 20 58–89 3–7 104–1074
LWW register 5 44 1 4.2
G-set / G-map 10 / 48 23–33 / 26 0–2 / 0 2–5 / 26
Mergeable log 39 95 2 36.6
OR-set (3 variants) 30–97 36–266 0–7 8.8–1854
Queue 32 1123 75 4753

The queue is the most complex MRDT — the first formally verified replicated queue with a complete specification capturing FIFO ordering, at-least-once dequeue semantics, and correct merge behaviour. Its O(n) merge is orders of magnitude faster than prior work.

Key Results

Orders-of-magnitude speedup over Quark. For queues, Quark's merge takes 10s for 1000 operations and 178s for 5000 operations due to costly relational reification. Peepul's linear-time merge takes less than a millisecond in all cases, operating directly on the concrete representation.
Space efficiency. Quark's OR-set grows unboundedly due to duplicates (reaching 10,000 elements after 100K operations), while Peepul's space-efficient OR-set stays bounded below 1000 elements for the same workload.
Compositionality. Complex MRDTs (like an IRC chat application) can be built by composing simpler verified MRDTs using a generic α-map. The correctness proof of the composite follows from the proofs of the components.
Convergence modulo observable behaviour. A new, weaker notion of convergence that allows replicas to have different internal states (e.g., differently balanced BSTs) as long as their observable behaviour is identical. This enables even more efficient implementations.

Contributions

BibTeX

@inproceedings{soundarapandian2022certified,
  title={Certified Mergeable Replicated Data Types},
  author={Soundarapandian, Vimala and Kamath, Adharsh and Nagar, Kartik and Sivaramakrishnan, KC},
  booktitle={Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI)},
  pages={332--347},
  doi={10.1145/3519939.3523735},
  year={2022}
}