A simple way to build collaborative web apps

sambroner 2021-08-17 15:37:55 +0000 UTC [ - ]

I'm really glad to see an article like this. I've worked in the space for a while (Fluid Framework) and there's a growing number of libraries addressing realtime collab. One of the key things that many folks miss is that building a collaborative app with real time coauthoring is tricky. Setting up a websocket and hoping for the best won't work.

The libraries are also not functionally equivalent. Some use OT, some use CRDTs, some persist state, some are basically websocket wrappers, fairly different perf guarantees in both memory & latency etc. The very different capabilities make it complicated to evaluate all the tools at once.

Obviously I'm partial to the Fluid Framework, but not many realtime coauthoring libraries have made it as easy to get started as Replicache. Kudos to them!

A few solutions with notes...

  - Fluid Framework - My old work... service announced at Microsoft Build '21 and will be available on Azure
  - Yjs - CRDTs. Great integration with many open source projects (no service)
  - Automerge - CRDTs. Started by Martin Kleppmann, used by many at Ink & Switch (no service)
  - Replicache - Seen here, founder has done a great job with previous dev tools (service integration)
  - Codox.io - Written by Chengzheng Sun, who is super impressive and wrote one of my fav CRDT/OT papers
  - Chronofold - CRDTs. Oriented towards versioned text. I'm mostly unfamiliar
  - Convergence.io - Looks good, but I haven't dug in
  - Liveblocks.io - Seems to focus on live interactions without storing state
  - derbyjs - Somewhat defunct. Cool, early effort.
  - ShareJS/ShareDB - Somewhat defunct, but the code and thinking is very readable/understandable and there are good OSS integrations
  - Firebase - Not the typical model people think of for RTC, but frequently used nonetheless

I should add... I talk to many folks in the space. People are very welcoming and excited to help each other. Really fun space right now.

nikodunk 2021-08-17 21:43:23 +0000 UTC [ - ]

I've used YJS and can strongly recommend. https://github.com/yjs/yjs

Built a Google Docs like rich text collaborator for a client on Express/Psql and React. Worked like a charm. The hardest part was dealing with ports on AWS to be honest.

BackBlast 2021-08-17 22:54:19 +0000 UTC [ - ]

I'll have to look at some of these; I've reviewed some but not all. You are missing some I'm familiar with.

PouchDB+CouchDB work well out of the box with minimal fuss for open pieces you can just plug into this role. PouchDB handles state persistence and replication on the client; CouchDB is the reliable cloud service you can replicate to.

Meteor, at least their pre-Apollo stack, had realtime collab type features with their mini-mongo client and oplog tailing.

winrid 2021-08-18 03:10:08 +0000 UTC [ - ]

DerbyJS and ShareDB/Racer are meant to go together.

I know of one big company using it extensively, hundreds of millions of messages a day! :)

moralestapia 2021-08-18 03:39:44 +0000 UTC [ - ]

Could you share more info about the use case?

Like a Google Docs kind of thing or what?

winrid 2021-08-18 04:42:24 +0000 UTC [ - ]

In this case, the entire product uses it. Every button, field, list, is "live" with derby+sharedb and scaled by horizontally scaling redis and the DB.

vyrotek 2021-08-17 19:25:20 +0000 UTC [ - ]

Fluid Framework looks pretty cool! I somehow missed the Build announcement about this.

Maybe it's just me, but it has a SignalR + Orleans sort of vibe to it when I think about the types of problems it solves. I will definitely be digging into this a bit more.

moralestapia 2021-08-18 03:44:14 +0000 UTC [ - ]

There's also GUN [1], but I tried it and it's too opinionated for my taste. For those who like it, it seems to be quite good.

1: https://gun.eco/

craig_asp 2021-08-17 14:46:16 +0000 UTC [ - ]

We implemented all that manually, more or less in swift (and sqlite), then react+redux, and on the back end - postgres and python+flask. Works flawlessly so far. We do have the same setup more or less, with listeners triggering UI updates and push messages signalling the clients to fetch data from the server. Then, on the server, we have two dbs -> one where we store each update or create message, in a postgres-based queue, and another one, in a normalised format which we use for login (it's way faster than replaying all messages from the queue). There are complexities when you move beyond one or two tables, though - like maintaining relations, ensuring things get done in the correct order, that they get merged (we merge all attributes of each item - e.g. one client can change color, and if another changes the text content of the item these will get merged), etc.
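
A rough illustration of the attribute-level merge described above, not the actual mindpad code: the message shape `(item_id, {attribute: value})` and the name `apply_updates` are assumptions made up for this sketch. Replaying the queue in server order and overwriting only the attributes each message names is enough to keep a color change and a text change from clobbering each other.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical message shape: (item_id, {attribute: new_value}).
Update = Tuple[str, Dict[str, Any]]
Items = Dict[str, Dict[str, Any]]


def apply_updates(items: Items, queue: List[Update]) -> Items:
    """Replay queued updates in server order, merging attribute by attribute."""
    for item_id, changes in queue:
        # Only the attributes named in the message are overwritten, so
        # edits to different attributes of the same item both survive.
        items.setdefault(item_id, {}).update(changes)
    return items


queue: List[Update] = [
    ("note-1", {"text": "buy milk", "color": "yellow"}),  # create
    ("note-1", {"color": "green"}),                       # client A recolors
    ("note-1", {"text": "buy oat milk"}),                 # client B edits text
]
normalised = apply_updates({}, queue)
print(normalised["note-1"])
```

Two edits to the same attribute still fall back to "last message in the queue wins" here, which is where the ordering concerns mentioned above come in.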

We gave up on the websocket part and implemented basic polling, because they were not supported by App Engine at the time (things might have moved on since then, which is a couple of years ago). Yet, for a note/todo/habit tracking app, it simply doesn't need to be real-time from our experience.

Have a play at https://www.mindpad.io/app/. You can see how it works if you open up the web app in two incognito tabs, or on an iPhone and the web.

albertgoeswoof 2021-08-17 21:50:12 +0000 UTC [ - ]

This stack reminds me of Meteor, which came out nearly a decade ago(!). https://meteor.com

It never really took off in the mainstream - I think because it was before many developers really trusted JS on the server, and a "full stack" framework is quite a big commitment for a team to shift to. Also most CRUD apps don't need real time collab.

I remember being amazed when changes were instantly propagated between my phone and laptop browsers with almost zero lag. This was the demo that sold it for me https://www.youtube.com/watch?v=MGbmW9bwJh4

thezjy 2021-08-18 01:34:49 +0000 UTC [ - ]

Author here. Thanks for mentioning Meteor, which also impressed me a lot when it first came out. I think it didn't take off because it tries to do too much (frontend + backend + db). And one smart move by Replicache is that it tries to integrate nicely with the rest of your stack.

valzam 2021-08-18 02:17:54 +0000 UTC [ - ]

I built my first big software project in Meteor! It was great, really a shame that it didn't take off. As you said I think it tries to do too much. Hell, at some point they even introduced their own package manager. It might be good for solo developers but as soon as you have a bit more bandwidth I think you give up too much control.

eatonphil 2021-08-17 15:27:12 +0000 UTC [ - ]

I haven't yet done this but based on some research it seems to me like the core of any collaborative app today (that wants to avoid Firebase and the other hosted platforms like Replicache seems to be) is easiest served by picking some CRDT library.

There are a couple of open-source CRDT libraries that provide both clients and servers (yjs [0] and automerge [1] are two big ones for JavaScript I'm aware of).

My basic assumption is that as long as you put all your relevant data into one of these data structures and have the CRDT library hook into a server for storing the data, you're basically done.

This may be a simplistic view of the problem though. For example I've heard people mention that CRDTs can be space inefficient so you may want/have to do periodic compaction.

[0] https://github.com/yjs/yjs

[1] https://github.com/automerge/automerge
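
To make the "put your data into one of these data structures" idea a bit more concrete, here is a toy state-based CRDT (a grow-only counter). It is only a sketch of the general technique and has nothing to do with the actual Yjs or Automerge APIs; the class name `GCounter` and the replica ids are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged with element-wise max."""
    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever bumps its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Taking the max per slot makes merge commutative and idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    @property
    def value(self) -> int:
        return sum(self.counts.values())


laptop, phone = GCounter("laptop"), GCounter("phone")
laptop.increment(2)   # edits made on the laptop
phone.increment(3)    # concurrent edits on the phone

laptop.merge(phone)
phone.merge(laptop)
phone.merge(laptop)   # merging the same state twice changes nothing
print(laptop.value, phone.value)
```

Both replicas end up with the same value no matter which order the merges happen in. The space concern also shows up even in this toy: the state keeps one slot per replica that has ever written.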

feanaro 2021-08-17 18:52:35 +0000 UTC [ - ]

You probably don't want to use Automerge. See https://josephg.com/blog/crdts-go-brrr/ for a nice CRDT optimization story.

eatonphil 2021-08-17 18:58:18 +0000 UTC [ - ]

Interesting! I know there was a large performance refactor that was merged in May [0]. This post you link was written in June of this year. Unclear if the performance fix is related to the reported issues and unsure if it still exists or not.

At the very least, the automerge maintainers seem to be very actively tackling performance problems.

[0] https://github.com/automerge/automerge/pull/253

josephg 2021-08-17 23:15:02 +0000 UTC [ - ]

Yep, the version of automerge I benchmarked for that blog post includes automerge’s performance branch changes.

They’re working on it, but it’ll be a while before automerge gets close to Yjs in terms of performance.

aboodman 2021-08-18 00:13:49 +0000 UTC [ - ]

Replicache is not hosted - it's client-side only. You bring your own backend.

brunoqc 2021-08-17 16:15:28 +0000 UTC [ - ]

Would Chronofold work for this too?

eatonphil 2021-08-17 16:19:08 +0000 UTC [ - ]

If this [0] is what you're talking about, at the moment yjs and automerge are significantly more full-featured and used by many major companies.

[0] https://github.com/dkellner/chronofold

brunoqc 2021-08-17 16:32:29 +0000 UTC [ - ]

Thanks!

tabtab 2021-08-17 20:42:48 +0000 UTC [ - ]

> is easiest served by picking some CRDT library.

RDBMS A.C.I.D. and transactions are also capable of much of the same.

idontevengohere 2021-08-17 14:58:23 +0000 UTC [ - ]

Really interesting...you can build a similar (websocket/db backed) app with LiveView out of the box, no? Any idea how well that'd hold up against this solution?

valzam 2021-08-18 02:19:56 +0000 UTC [ - ]

The big difference is that with CRDTs you can make edits offline and they will get merged with other changes when you come back online. Websocket/db really only works when you are always online.

That being said you can totally implement collab without CRDTs and if you don't particularly need offline it should be easier.

paulgb 2021-08-17 18:58:55 +0000 UTC [ - ]

Does LiveView have any conflict resolution, or would it just be last-write-wins?

_virtu 2021-08-17 15:53:08 +0000 UTC [ - ]

This was my first thought as well.

nesarkvechnep 2021-08-17 15:58:05 +0000 UTC [ - ]

I'm interested how this stacks against Phoenix Channels + Presence.

deathtrader666 2021-08-17 19:16:02 +0000 UTC [ - ]

Yes, it would be great if someone with this experience can chime in, especially since Phoenix has CRDTs built-in.

thezjy 2021-08-18 01:48:19 +0000 UTC [ - ]

Replicache's creator Aaron has a pretty good Twitter thread explaining the difference among Replicache, WebSocket and (classic) CRDTs. I will summarize briefly here:

- WebSocket (and Phoenix Channel) is just a communication method. To maintain consistency and resolve conflicts, you need something like Replicache.

- CRDTs are more suitable for p2p scenarios while Replicache works better for client-server apps.

- Phoenix's Presence is built with CRDT but it's just a single feature, not a general CRDT toolkit.

The thread: https://twitter.com/aboodman/status/1410441402366922762

ec109685 2021-08-18 07:09:24 +0000 UTC [ - ]

Also, a bit of the underlying plumbing here: https://twitter.com/aboodman/status/1323352541887754240

Wowfunhappy 2021-08-17 19:57:08 +0000 UTC [ - ]

I remember listening to an episode of the Exponent podcast, in which Ben Thompson said something like (paraphrasing from memory):

> People who love "native apps" can complain about Electron all they want—but there's simply no replacement for the real-time collaboration offered by web-based apps like Figma!

As someone who's not exactly thrilled with Electron and its memory usage—is there a reason the two go together? Is there a reason we can't build collaborative apps in Cocoa and GTK? I think these systems are awesome, I just think they'd be even better if they weren't also running full web browsers!

SilverRed 2021-08-18 05:02:13 +0000 UTC [ - ]

>in Cocoa and GTK?

This is reason enough. Already you now have to build the UI twice because there is no GUI framework that actually looks good on all OSs. You see this all the time where apps made on linux that technically work on macos just work terribly or look super ugly on macos.

You also have to remember windows, ios and android. When you build something targeting web browsers you only have to worry about screen sizes rather than OSs.

BackBlast 2021-08-17 20:47:45 +0000 UTC [ - ]

It could totally be done natively. The obstacle is how much of the stack you have to write and maintain. There are js libraries that do most of this heavy lifting for you, and CRDTs are pretty new to most devs.

It's just much, much easier and more cost-effective to build a single code base and hit many target platforms with it.

Computing history has also shown that publishing efficient lean software doesn't help in the market. At least not over time to market, getting the key features right, and your ongoing costs.

rl3 2021-08-17 20:20:53 +0000 UTC [ - ]

Figma’s performance is excellent due in large part to the fact they compile a lot of native code to Wasm. Electron or not it’s still fast.

To answer your question, collaborative apps ideally need to target the widest possible audience. Barring a massive budget, the best way to accomplish this is to also have a singular compile/build target. In most cases, that’s the web platform.

Wowfunhappy 2021-08-17 21:22:01 +0000 UTC [ - ]

Figma's performance is impressive for an Electron app, but it does choke on very large files, which Sketch would have handled without a care. It's not great.

If Sketch had had Figma's collaboration features, we wouldn't have switched. But during the pandemic it was necessary.

rl3 2021-08-18 10:57:28 +0000 UTC [ - ]

Ah, that's a shame. Good to know, thanks.

timwis 2021-08-17 14:34:10 +0000 UTC [ - ]

What about conflict resolution? If two users update the same record/field around the same time? Isn’t that the trickiest part of real time?

tmikaeld 2021-08-17 14:37:12 +0000 UTC [ - ]

That's what Replicache[0] solves, it provides for Causal+ Consistency across the entire system.

"This means that transactions are guaranteed to be applied atomically, in the same order, across all clients. Further, all clients will see an order of transactions that is compatible with causal history. Basically: all clients will end up seeing the same thing, and you're not going to have any weirdly reordered or dropped messages."

[0] https://doc.replicache.dev/design

Note: There's more in their links, but the linked sites are down.

btown 2021-08-17 15:57:03 +0000 UTC [ - ]

It appears Replicache doesn't use CRDTs since it has a central source of truth: https://news.ycombinator.com/item?id=22175530

See also the commentary here: https://doc.replicache.dev/guide/local-mutations

This sounds a lot like Operational Transform but without the transform part - it assumes that locally applied mutations can be undone and rebased without user interaction. But I feel like the Google Wave team would have a lot of objections to the idea that this can just be ignored. If your state is just a group of key value stores where last write wins and everyone can agree on who's last, that's fine, but text/token streams require a notion of transformation that I'm worried Replicache simply glosses over.
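
For anyone wondering what the "transform part" buys you for text, here is a minimal sketch of transforming two concurrent inserts. It is an illustration of the general OT idea only, not Google Wave's or anyone else's algorithm; `Insert`, `transform` and the `site` tie-breaker are names chosen for this example, and deletes are left out entirely.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical client id, used only to break position ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after the concurrent `against`."""
    if (op.pos, op.site) < (against.pos, against.site):
        return op  # `op` lands before `against`, so its position is unaffected
    # Otherwise `against` pushed the target position to the right.
    return replace(op, pos=op.pos + len(against.text))


doc = "abc"
a = Insert(1, "X", site=1)  # client 1 wants "aXbc"
b = Insert(2, "Y", site=2)  # client 2 wants "abYc"

# Each client applies its own op first, then the transformed remote op.
client_1 = apply(apply(doc, a), transform(b, a))
client_2 = apply(apply(doc, b), transform(a, b))

# Without the transform, client 1 would put "Y" in the wrong place.
naive_1 = apply(apply(doc, a), b)
print(client_1, client_2, naive_1)
```

With the transform both clients converge on the same string with each insert where its author intended; the naive version keeps the raw index and lands one character off.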

Chris_Newton 2021-08-17 18:56:24 +0000 UTC [ - ]

Indeed, there can never be one universal solution to this, because the problem is one of specification rather than (only) implementation.

For example, suppose we have an edit/delete conflict, where two clients concurrently interact with the same entity in your data model. In a simple case, we can decide to “resurrect” the affected entity and apply the edit, which is the option that never results in significant data loss and so might be a reasonable behaviour if no user interaction is involved.

Now, what if there were other consequences of deleting that entity? Maybe the client that deleted the entity then created a new entity that would violate some uniqueness constraint if both existed simultaneously. Or maybe it wasn’t the originally deleted entity that would violate that constraint, but some related one that was also deleted implicitly because of a cascade. How should we reconcile these changes, if simply allowing either one to take precedence means discarding data from the other?

At least if all clients are communicating in close to real time, it’s unlikely that any one of them will diverge far from the others before they get resynchronised, so the scope for awkward conflicts is limited. But in general, we might also need to support offline working for extended periods, when multiple clients might come back with longer sequences of potentially conflicting operations, and there’s no general way to resolve that without the intervention of users who can make intelligent decisions about intent, or at least a set of automated rules that makes sense in the context of that specific application. And in the latter case, we’d still probably want to prove that our chosen rules were internally consistent and covered all possible situations, which might not be easy.
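
As a small illustration of the simple case only, the "resurrect on edit" rule might look like the sketch below. The function name `resolve_edit_delete` and the dict-based entity are assumptions for the example; it deliberately ignores the uniqueness and cascade problems described above, which is exactly where a rule this simple stops being enough.

```python
from typing import Dict, Optional

Entity = Dict[str, str]


def resolve_edit_delete(base: Entity, edit: Dict[str, str], deleted: bool) -> Optional[Entity]:
    """Hypothetical policy: a concurrent edit wins over a concurrent delete."""
    if edit:
        # Resurrect: keep the entity and apply the edit so no user input is lost.
        return {**base, **edit}
    # No competing edit, so the delete (if any) simply goes through.
    return None if deleted else base


base = {"title": "Draft", "owner": "ann"}
resurrected = resolve_edit_delete(base, {"title": "Final"}, deleted=True)
removed = resolve_edit_delete(base, {}, deleted=True)
print(resurrected, removed)
```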

aboodman 2021-08-18 08:06:23 +0000 UTC [ - ]

> How should we reconcile these changes, if simply allowing either one to take precedence means discarding data from the other?

Exactly. This is why Replicache expresses change as high-level operations, like createPost or deletePerson that are application-defined.

Replicache doesn't try to automatically merge the effects of concurrent mutations, it just replays the mutations in the same order on each client. It's up to the implementation of the mutation to decide what the correct result is, and that answer can and often does change when the mutation is replayed on top of different states.

Because Replicache mutations are atomic, applications can also enforce invariants such as uniqueness or even more complex app-level invariants.

Imagine, for example, a calendaring application. An application built with Replicache can enforce the invariant that a room is only booked by one event in one time slice even under concurrent edits, just using normal programmatic validation. It's hard to do this kind of thing with CRDTs or other approaches to automatic merging because the data model knows nothing about the application's constraints.

It's a pretty simple-minded system, actually, but our experience is that it is a nice way to think about these problems and provides good results for many types of data, in particular structured data.
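
A toy version of the replay idea, using the room-booking example: this is a sketch of the general approach as described in this comment, not Replicache's actual API, and `book_room`, `MUTATORS` and `replay` are names invented here. Each mutation is ordinary code that re-checks the invariant against whatever state it is replayed on.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[Tuple[str, int], str]        # (room, slot) -> event name
Mutation = Tuple[str, Dict[str, object]]  # (mutator name, keyword args)


def book_room(state: State, room: str, slot: int, event: str) -> None:
    # App-level invariant: at most one event per room per time slot.
    if (room, slot) in state:
        return  # slot already taken in this ordering, so the booking is a no-op
    state[(room, slot)] = event


MUTATORS: Dict[str, Callable[..., None]] = {"book_room": book_room}


def replay(mutations: List[Mutation]) -> State:
    """Re-run every mutation, in the agreed order, against a fresh state."""
    state: State = {}
    for name, args in mutations:
        MUTATORS[name](state, **args)
    return state


# Two clients book the same room concurrently; the server settles on one order.
canonical: List[Mutation] = [
    ("book_room", {"room": "A", "slot": 9, "event": "standup"}),
    ("book_room", {"room": "A", "slot": 9, "event": "interview"}),
]
# Client 2 optimistically ran only its own booking at first...
optimistic = replay([canonical[1]])
# ...then replayed on top of the server's order and sees what everyone sees.
rebased = replay(canonical)
print(optimistic, rebased)
```

The second booking "succeeds" optimistically but becomes a no-op once replayed in the canonical order, so the invariant holds on every client without any merge logic in the data model.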

soco 2021-08-17 20:42:06 +0000 UTC [ - ]

The good old CAP theorem hits again...

aboodman 2021-08-17 22:23:09 +0000 UTC [ - ]

I’m not sure if you are understanding that when Replicache rebases operations locally it actually re-executes code which can have arbitrary effects. This design yields a lot of flexibility to preserve intent: the function can look at current state of world and decide to do something different.

Now, it is true that OT is considered the gold standard for certain kinds of collaborative editing, in particular unstructured text. But CRDTs are quickly catching up and I believe that any CRDT should by definition be implementable on top of Replicache.

It's also quite a lot easier to implement a Replicache backend than an OT backend.

josephg 2021-08-17 23:21:58 +0000 UTC [ - ]

I don’t know enough to comment on replicache, but you can also do OT on top of an operation based CRDT. For diamond types we’re making it support both - so if you want to, applications can do OT (which is simple, small, and fast) to talk to a server (or local proxy process), and then that process can do p2p server to server replication using CRDTs.

The result is we need way less complexity in the browser, or in applications. And still get all the advantages crdts bring - namely, no need for a central server acting as the source of truth.

aboodman 2021-08-18 08:12:43 +0000 UTC [ - ]

Cool, I need to look into this more.

I think for many customers the authoritative server is an advantage. It's useful in SaaS apps for the server to be able to override the clients, for all kinds of reasons -- antiabuse, authorization, extra validation rules, or just fixing bugs.

josephg 2021-08-19 01:03:46 +0000 UTC [ - ]

Yes, I completely agree. And I think we want both:

- A fast and well written CRDT that works in p2p networks should also work great for server-to-server replication in a data center (or across data centers).

- OT algorithms designed to work with centralized servers are simple, efficient, easy to code up and easy to work with. And they provide a really nice API for local applications to do IPC. CRDT libraries can expose OT endpoints just fine.

I'm still not 100% sure about what the best approach is in the P2P case. Embedding (/ linking) a CRDT library into every application would also work fine, but it's complicated to get everything working across languages. And harder to update. The other option is running a single system / application wide CRDT-like service which manages credentials, that applications talk to like LSP / D-Bus. In that case, applications can just talk OT (which is much simpler).

Either approach would work.

tmikaeld 2021-08-17 16:44:47 +0000 UTC [ - ]

I'd rather it was configurable, since there's different use-cases for both and it can be in the same app. So you're definitely making a valid point.

tabtab 2021-08-17 20:46:42 +0000 UTC [ - ]

How one wants to see them could depend; that's why I recommend using an RDBMS. One can "play back" transactions using different orders and filters. If teams get confused or accidentally "step on each other's toes", then one may need to review different scenarios to see what was intended by two or more parties.

Zealotux 2021-08-17 14:37:29 +0000 UTC [ - ]

Figma's blog has a few valuable articles on that subject: https://www.figma.com/blog/how-figmas-multiplayer-technology...

Gabrielwxf 2021-08-17 14:37:19 +0000 UTC [ - ]

I suppose, as mentioned in the essay, it's handled by Replicache.

amelius 2021-08-17 21:34:27 +0000 UTC [ - ]

Trickiest part is probably adding fine-grained access control rules.

tommoor 2021-08-17 14:47:29 +0000 UTC [ - ]

I believe it uses a CRDT hosted by a third party service.

BackBlast 2021-08-17 20:40:24 +0000 UTC [ - ]

You could build this with couchdb multi master regional servers and pouchdb on the client and have full consistency with the replication both to clients and servers as well as conflict resolution (in case of collision) done for you.

This route seems like a lot of extra work for pretty similar functionality.

winrid 2021-08-18 03:23:30 +0000 UTC [ - ]

At FastComments we store every change as an event, which can either be pushed or polled. Clients subscribe, and poll on reconnect.

Also, integrations use polling: https://github.com/FastComments/fastcomments-integrations/tr...

The integrations work kind of like DB slave replication. They do an initial sync and then maintain state via the event stream.
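
The general shape of "store every change as an event, poll from a cursor on reconnect" can be sketched as below. This is not FastComments code; `EventLog`, `Client` and the `(seq, key, value)` event shape are assumptions for illustration, and a real system would persist the log and push events as well.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str, str]  # (sequence number, key, value)


class EventLog:
    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, key: str, value: str) -> int:
        seq = len(self._events) + 1  # sequence numbers start at 1
        self._events.append((seq, key, value))
        return seq

    def since(self, cursor: int) -> List[Event]:
        """Return every event with a sequence number greater than `cursor`."""
        return self._events[cursor:]


class Client:
    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.cursor = 0  # last sequence number applied

    def poll(self, log: EventLog) -> None:
        for seq, key, value in log.since(self.cursor):
            self.state[key] = value
            self.cursor = seq


log = EventLog()
client = Client()

log.append("comment-1", "hello")
client.poll(log)                      # initial sync

log.append("comment-1", "hello!")     # these happen while the client is offline
log.append("comment-2", "first")
client.poll(log)                      # poll on reconnect, resuming from the cursor
print(client.state, client.cursor)
```

Because the client only remembers a cursor, the initial sync and the catch-up after a disconnect are the same code path.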

theanirudh 2021-08-18 04:36:31 +0000 UTC [ - ]

Great article! Is there something similar to Replicache that is targeted towards simple multiplayer games? I'm building a multiplayer version of a clicker game like Universal Paperclips[0] and dealing with problems similar to the ones Replicache tries to solve.

[0] https://www.decisionproblem.com/paperclips/index2.html

Zealotux 2021-08-17 14:45:48 +0000 UTC [ - ]

So far I've managed to keep the state in my side-project in sync with Websockets and Redux, Replicache sounds like the kind of solution I'd love to use, but boy the pricing makes it impossible to even consider.

ZeroCool2u 2021-08-17 15:28:20 +0000 UTC [ - ]

I don't have any plans to use Replicache, but I went and looked at the pricing and I was kind of struck by your comment. Looking at it, it seems pretty fair to me? Especially under 10k MAC's. It seems like a flat rate / month is pretty nice too. Plus, it's free for all non-commercial use.

Am I wildly off base here? Is it just that middle tier jump to over 10k that is a no go?

Again, I don't have a horse in this race or even my own startup, just trying to understand if my own judgement is way off.

Zealotux 2021-08-17 15:45:32 +0000 UTC [ - ]

I would quickly be in the $500/mo tier and that would be a considerable cost to handle since I don't really make that kind of profit yet. But I have to agree anything beyond 10K is very reasonable given the features. I just kind of wish they had a more affordable bracket between 500 and 10K but they probably have reasons not to.

memco 2021-08-17 16:56:31 +0000 UTC [ - ]

Very nice writeup! However, the example did not fully work for me. I could perform CRUD on a single tab, but opening the list in multiple tabs did not replicate the list or actions. Seeing this in the console:

  [Error] Could not connect to the server.
  [Error] Fetch API cannot load https://damp-fire-554.fly.dev/replicache-pull?list_id=kx1I-gXPWwOxU9teRUJ_c due to access control checks.
  [Error] Failed to load resource: Could not connect to the server. (replicache-pull, line 0)

Safari 14.2 on macOS 10.15.7.

thezjy 2021-08-18 03:34:37 +0000 UTC [ - ]

Seems like a CORS related bug on Safari. I tested on Safari 15 and couldn't reproduce it.

https://stackoverflow.com/questions/63141448/safari-fetch-ap...

davedx 2021-08-17 17:06:53 +0000 UTC [ - ]

It's a nice summary of how to use these technologies, but considering it states avoiding vendor lock-in is a goal, I was surprised to see it using fly.io and a managed cockroachDB.

mrkurt 2021-08-17 17:28:05 +0000 UTC [ - ]

It didn't actually use CockroachDB, they ended up using Postgres + Read Replicas.

I work on Fly.io, but there's very little vendor lock in here. We can't afford to lock people in, we're too small. We need to make their existing stuff work with zero friction.

sirtimbly 2021-08-17 20:48:13 +0000 UTC [ - ]

A 225K gzipped .wasm file download for a client-side state management and persistence layer is not great. It is competitive with some similar solutions, but still a lot for any web app's performance budget.

SilverRed 2021-08-18 05:11:04 +0000 UTC [ - ]

0.2MB seems fine for a fully featured web app. Loading up the average website today typically loads tens of MB for an index page.

aboodman 2021-08-17 21:22:52 +0000 UTC [ - ]

The release build is 100k brotli I believe. It’s possible this site is using the dev binary.

thezjy 2021-08-18 01:28:56 +0000 UTC [ - ]

It uses the production binary, originally 604.59 KB and after brotli (Vercel uses brotli by default) 213.90 KB.

aboodman 2021-08-18 08:14:42 +0000 UTC [ - ]

Ah, when I brotli compress it locally, it's 188 (which is where I remembered 100 from) but I guess it uses different settings than the auto-brotli in Vercel.

Gabrielwxf 2021-08-17 14:48:37 +0000 UTC [ - ]

> Dealing with a global database brings in much complexity that is not essential to the subject matter of this article, which will wait for another piece.

Excellent write-up. It would be great to know why CockroachDB failed your needs.

thezjy 2021-08-18 01:29:37 +0000 UTC [ - ]

Thanks! Wait for my next article. Hope it won't be long.

ec109685 2021-08-18 04:20:33 +0000 UTC [ - ]

Thanks for the solid article.

Definitely interested in understanding the end user benefit of the distributed database, given that one of the purposes of the library is to hide write latency and there needs to be coordination for every write.

awinter-py 2021-08-18 00:53:01 +0000 UTC [ - ]

fascinating that they claim offline-first too

web apps are sorely lacking a core storage technology

whoever gets there first may not make a lot of money but they'll be more influential than react (because the schema design will penetrate native dev as well)
