Pangram verdict · v3.3
We believe that this document is fully human-written
AI likelihood · overall
HumanArticle text · 1,707 words · 5 segments analyzed
This post describes the considerations that came up in designing the document-change data structure and built-in collaborative editing feature in the upcoming version of CodeMirror (a code editor system). It is something of a followup to the Collaborative Editing in ProseMirror post.I won't introduce anything new or exciting here—the design I ended up with is a very boring non-distributed operational transformation. In a way, this post is publishing a negative result: I looked into a bunch of interesting alternative approaches, but found they didn't meet the requirements for this system.Since collaborative editing is a tricky field with a lot of different solutions, all of which have their awkward trade-offs, I think the path towards this boring end result might still provide a useful resource for people working on similar systems.Distributed versus coordinated collaborative editingThere's quite a disconnect between the scientific literature on collaborative editing and what most collaborative editors are doing. The literature is largely concerned with truly distributed collaboration, where a number of peers, taking equivalent roles, directly exchange updates among themselves and still somehow converge on the same document. A typical web system, on the other hand, has clients talking to a server, which orchestrates the exchange of updates.These problems are very different, and if you're aiming to implement the latter, about 95% of collaborative editing literature is discussing a problem you do not have. Working in a truly distributed fashion is very attractive in principle, of course. It is strictly more general, and has connotations of escaping the over-centralized modern web. But it does drag in a lot of problems, even outside of the document convergence—peers have to store the document along with, depending on the technique used, its entire history. They have to somehow discover each other, maintain connections, and so on.So for the core of a JavaScript-based library, I decided that support for a distributed model wasn't important enough to justify the additional complexity. I'll get back to what that complexity looks like later.Operational Transformation(I'm sorry, I'm going to explain operational transformation again, just like a hundred other blog posts.
Hang tight.)Operational transformation involves, in its simplest form, a transformation function that takes two changes A and B, which both apply to the same document, and produces a new pair Aᴮ (a version of A that applies to the document produced by B) and Bᴬ (B but applies to the document created by A), such that A + Bᴬ and B + Aᴮ (where + indicates change composition) produce the same document.This can be visualized as a diagram like this: Docˢ A ↙ ↘ B Docᴬ Docᴮ Bᴬ ↘ ↙ Aᴮ Docᴬᴮ You can roughly think of the transformation as moving a change through another change. Sometimes this just involves adjusting the positions it refers to (moving “insert A at 5” through “delete 0-1” yields “insert A at 4”), sometimes it it is more involved (two changes can't delete the same content, so in that case some of the deletions have to be dropped in the transformed change).For a plain-text document model, such a transformation function is not very hard to define.The other part of an operational transformation system is its control part, which determines how changes are shared and when they are transformed. A simple centralized setup can do something like this:
Keep a list of unsent local changes.
Periodically try to send those to the server, which will either accept them or, if another client sent a change first, reject them because your changes start in a base document that doesn't match the current central document.
When the server sends you changes, transform your unsent changes and the remote change over each other. Apply the transformed remote change locally, and replace your stored local changes with the transformed version, so that the next time you try to send changes you send those.
Because this makes everything proceed in lockstep, it is easy to see that this converges—the only difference in order in the way clients apply changes is when remote changes come while there are local changes pending. This is exactly the scenario that the transform function's convergence handles.
When there is more than one remote or local change involved, the situation is slightly more tricky. If the change representation that your transform function operates on allows series of changes to be composed into a single change, you can first do that and then perform a single transform.If not, you basically need to build up a bigger rectangle. If A and B are remote changes, and X and Y local changes, you can do something like this: Docˢ A↙ ↘X Docᴬ Docˣ B↙ ↘Xᴬ Aˣ↙ ↘Y Docᴬᴮ Docᴬˣ Docˣʸ Xᴬᴮ↘ ↙Bˣ Yᴬ↘ ↙Aˣʸ Docᴬᴮˣ Docᴬˣʸ Yᴬᴮ↘ ↙Bˣʸ Docᴬᴮˣʸ In each individual diamond in this monstrosity, the lower arrows are made up by transforming the upper arrows against each other. So the whole thing can be built up with O(N×M) transformations, where N and M are the number of changes on both sides. The guarantee that a single set of changes can be transformed over each other and still converge can thus be used to compute transformed versions of bigger groups of changes—the bottom sides of the big diamond provide the final transformed versions of the change sets.Then why is OT considered hard?So that's pretty easy. Still, the general consensus seems to be that OT is hard. There's an amusingly long list of papers in this space that have later been proven to be incorrect.If you've been programming for a while you've probably run into this kind of thing before: there's a hack that works simply and efficiently, but completely falls apart when you add more requirements. OT is such a hack.When the document structure is more complicated than plain text, and change structure is more complicated than just insertions and deletes, it becomes very hard to define a converging transformation function.
And when you don't have a centralized party to determine the order in which changes get applied, you need, as far as I understand the literature, to keep data beyond the current document (“tombstones” that describe where deletions happened), to guarantee convergence.The main problem is that this technique is hard to reason about. It has many practical advantages but, being a hack, doesn't provide the kind of mental framework that would allow you to confidently apply it in different situations.Conflict-free Replicated Data TypesThat is where CRDTs come in. They are another approach to convergence that, contrary to operational transformation, does provide a way for us mortals to reason about convergence.Basically, CRDTs complicate the way they represent data and changes to the point where you can apply changes in any order, and as long as you applied the same changes, you get the same result.For text-style data, CRDT approaches roughly work by assigning every character a unique ID. There's two general approaches to representing changes.The first keeps deleted characters in the document forever, just marking them as deleted. Character insertions provide reference IDs for the character before and/or after them, and those references, along with data (often a logical clock) embedded in the IDs allow each client to compute the same ordering for the characters.The second approach generates IDs in such a way that they are already ordered. When inserting something at a given position, you generate a new ID that sits between the existing IDs around that position. This means character IDs must be able to grow in size, so that you can always create a new ID between any given pair. You'll also need to include some client-specific data to make sure IDs are unique. But on the bright side, this approach doesn't require you to keep deleted content around.Given such a data type, collaborative editing becomes a matter of making sure everybody ends up receiving everybody else's changes, eventually.But I guess you can see why this isn't a no-brainer either. Compared to OT, where your document representation can just be a minimal representation of the text inside it, these techniques require an extra data structure to be stored for every character in the document. And with some of them, you don't get to delete those characters either.
There are approaches that exploit the way text is often inserted in a linear way (or loaded in bulk at the start of the editing session) to compress this data somewhat by generating sequential IDs for such spans of text, and storing them as a single element. But these introduce further complexity and don't really help provide an upper bound on the data needed—in a heavily edited document the representation may well degenerate to something close to the ID-per-character structure.For CodeMirror's core, data structures and complications like this conflict with the goal of supporting huge documents and having a pleasant programming interface. So I've decided not to introduce a CRDT in its internal data structures.It may be entirely reasonable to wire an instance of the library up to an external CRDT implementation, though, as has been done for ProseMirror with Yjs.In general, I agree with this article that the strict separation between OT and CRDT is not really justified. OT algorithms that support decentralized editing rather resemble CRDT algorithms. But the distinction between dumb-as-rocks classical OT and decentralization-capable approaches is useful.CodeMirror's approachSo the plan is to support centralized collaborative editing out of the box, and punt on the complexities of decentralized collaboration.I started the project with the idea that I'd just do what ProseMirror does, since that worked well before.To recap, ProseMirror uses something similar to OT, but without a converging transformation function. It can transform changes relative to each other, but not in a way that guarantees convergence. ProseMirror's document structure is a tree, and it supports various types of changes, including user-defined changes, that manipulate this tree. I didn't have the courage to try and define a converging transformation function there. When a client receives conflicting remote changes, it first undoes all its local pending changes, applies the remote changes, and then the transformed form of its local changes. This means that everybody ends up applying the changes in the same order, so convergence is trivial.In CodeMirror, the document model is a lot simpler, so though I got pretty far with the ProseMirror approach, it seemed needlessly awkward.