Engineering
Offline-First Landscape
Isaac Hinman | Jan 11, 2025
Introduction
When we set out to build Marco, we knew we were committing to two very difficult requirements: (1) IMAP-based, not API-based, and (2) cross-platform – web, Mac, Windows, Android, iOS.
We had a handful of ancillary requirements as well. One of these was offline-first, and I can now say confidently that we drastically underestimated its complexity.
I mentioned in a previous blog post that Missive was my daily driver until recently. It lacks offline support, though, and this is one of the product's major shortcomings.
At Marco we believe that full offline support is crucial. "Managing your emails on an airplane with no wifi" is an example use case we frequently come back to. You should be able to read, delete, respond, and organise your emails with no internet connection. When you land and connect to wifi, everything should seamlessly sync.
That said, Marco is not a simple todo app. It is not an application that starts with zero data and grows gradually, as is the case with user-generated-content apps like Notion.
Marco is an application that deals with hundreds of MB of data, and hundreds of thousands (or millions) of rows/entities.
Essentially this means we are instantly jumping into the top 1% of heavy-duty use cases for offline-first implementations. Over time we realised that this actually rules out almost all available offline-first options.
Starting Point: WatermelonDB
In August 2024, I spent about a week deeply investigating the offline-first options available to us.
We (perhaps naively) had committed to the idea that our offline-first architecture should be database-agnostic – the offline-first logic should "end" at the API layer. We did not want to manage sync tables or schemas in Postgres – we wanted to write API endpoints, and manage our database ourselves.
Here's a rundown of the initial offline-first options we looked at:
WatermelonDB
- FOSS, self-hosted
- Database agnostic
- Been around for ages, used in many production applications

PowerSync
- Not quite FOSS, but has a free Self-Hosted Edition
- Requires Postgres-level integration
- Very complex architecture, requires both changes to Postgres _and_ a separate HA MongoDB deployment cluster
- Appears to have been around quite a while, but all their case studies are on demo/tiny/side projects

ElectricSQL
- Looked interesting, but was in the middle of a complete rewrite
- Requires Postgres-level integration
- New version only handles data sync one way – it does not handle mutations
There are many other options, including RxDB, MongoDB Atlas, Couchbase, and on and on. The three listed above are the options that we deeply investigated. As will become clear, we should have looked further at this stage.
We settled on WatermelonDB and built the initial alpha version of Marco on it. The backend implementation is rather simple: there is a "pull" endpoint to GET data, and a "push" endpoint to POST mutations.
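To make the shape of this concrete, here is a minimal sketch of the two endpoints, assuming an Express-style server. The route names and the `getChangesSince`/`applyChanges` helpers are hypothetical, not Marco's actual code, and the payload shape only loosely follows WatermelonDB's sync protocol:

```ts
import express from "express";

// Hypothetical helpers -- your own query and conflict-resolution logic.
declare function getChangesSince(ts: number): Promise<unknown>;
declare function applyChanges(changes: unknown, lastPulledAt: number): Promise<void>;

const app = express();
app.use(express.json());

// "pull": the client sends the timestamp of its last sync; the server
// responds with everything that changed since then, plus a new timestamp
// for the client to store for next time.
app.get("/sync/pull", async (req, res) => {
  const lastPulledAt = Number(req.query.last_pulled_at ?? 0);
  const changes = await getChangesSince(lastPulledAt);
  res.json({ changes, timestamp: Date.now() });
});

// "push": the client sends its locally-queued mutations; the server applies them.
app.post("/sync/push", async (req, res) => {
  const { changes, lastPulledAt } = req.body;
  await applyChanges(changes, lastPulledAt);
  res.sendStatus(200);
});
```

The appeal of this contract is that everything behind those two endpoints – Postgres, schemas, business logic – remains entirely ours.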
It is important here to note that although Marco is a native application in some targets, it also must run in a web browser. While we may have access to a filesystem or "true" SQLite in native targets, our common denominator is web, where persistent storage options are very limited.
On the (web) frontend, Watermelon uses IndexedDB (as do essentially all other options – even the WASM SQLite options are usually SQLite-on-top-of-IndexedDB). However, it turns out Watermelon faces a serious problem that all other relational frontend databases face – IndexedDB performance is terrible. To solve this, Watermelon uses a LokiJS adapter, which is literally just an in-memory database.
Yes, you heard that right. To get around IndexedDB performance issues, Watermelon uses LokiJS to... hold the entire database in memory. When your database size is 100MB+, this starts to become a serious problem.
Moreover, clients must pull data before they can push any new mutations, and a mutation can easily be clobbered if the row was updated on the backend before the frontend managed to push.
On top of this, WatermelonDB is not as actively maintained as it once was. Many issues and PRs are left without a response. For example, chunked initial syncing is not supported out of the box; we opened a PR for this in early December, but it still hasn't been merged.
We got quite far along with the Marco alpha build, and then had a bit of a panic in November. Our confidence in our WatermelonDB-based offline-first approach was decreasing steadily. We began to seriously question if this technology could actually support a rock-solid, modern user experience.
We decided we needed to find something better.
New Wave of Offline-First
This time around, we threw out any preconceptions we had about Postgres, separation of concerns at the API layer, etc. We had completely open minds and desperately wanted to find the "best" solution, no matter what that might look like. We were now extremely clear on the fact that we had a "tough" offline-first use case, and needed some serious help.
We discovered a host of "new wave" offline-first implementations. We talked with the founders and developers of these projects and found many extremely intelligent and talented people working on what is a very tough problem.
The leaders in this new wave are Triplit, InstantDB, and Convex.
While Triplit and InstantDB can be described as "full stack databases", Convex is instead an entire backend solution, including API endpoints, etc. For this reason we excluded Convex, as it seems like a huge leap and essentially 100% lock-in.
Problems with Triplit
Our first (of several) rewrites was from WatermelonDB to Triplit. Both Triplit and InstantDB use triples to represent data – entities are stored as "triples" of (<entity id>, <entity field>, <entity value>). Such a representation makes syncing a straightforward affair.
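For illustration, here is what that flattening looks like, using a hypothetical email entity rather than anything from Marco's actual schema:

```ts
// Illustrative only: how a single entity flattens into triples.
type Triple = [entityId: string, attribute: string, value: unknown];

const email = { id: "e123", subject: "Hello", isRead: false };

// The same entity as triples -- each field becomes an independent fact:
const triples: Triple[] = [
  ["e123", "subject", "Hello"],
  ["e123", "isRead", false],
];

// Sync happens per-triple: marking the email read touches exactly one fact,
// so only that one triple needs to travel over the wire.
const patch: Triple[] = [["e123", "isRead", true]];
```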
Triplit's API and DX are best-in-class. However, although we desperately wanted to love the product, we found it unusable for our use case. The server-side implementation was eating gigabytes of RAM while sitting idle and would regularly OOM and crash. The client-side triples implementation would balloon 5MB of JSON into 1GB of SQLite.
We believe Triplit is a fantastic choice for offline-first applications with relatively small storage needs. On top of this, we found the Triplit team to be incredibly talented and hard-working, and truly believe that by mid-2025 it will be a robust and capable product.
We absolutely love the developer experience with Triplit, and are rooting for the team to succeed!
But we need something which is reliable, highly performant, and battle-tested, now.
Problems with InstantDB
We next moved onto InstantDB, which can be considered a direct competitor to Triplit. Although both InstantDB and Triplit essentially solve the same problem, we found InstantDB to be a far worse implementation.
TypeScript types were non-existent. There was no sorting/ordering by fields. There was no support for $like operators, and certainly no full-text search. I believe some of these features have since been added – both teams are certainly scrambling to handle a million features and requests.
On the backend side of things, there are no webhooks, so it is impossible to respond to mutations in a scalable way. We would have had to build our own singleton subscriber microservice that then translated reactive queries into scalable webhooks. But reactive queries themselves are prone to dropped data, so we would need some sort of polling fallback... The backend story for InstantDB feels extremely incomplete.
Even looking past all of this, frontend queries that returned in 2-5ms with Watermelon+LokiJS were taking 200-500ms to return data with InstantDB. This is primarily because InstantDB is not optimistic, and was hitting network to fetch data with almost every request. There is no granular control as to what gets cached on the client side and what does not.
InstantDB is another promising product, and our understanding is that some teams are already building production applications on top of it. But it's simply too immature for our use case right now. There's essentially zero backend support, and the frontend UX felt like a massive downgrade coming from Watermelon.
Problems with PowerSync
Finally, to our great disappointment, we begrudgingly moved on to PowerSync. We were wary of PowerSync from the beginning, and our reluctance proved to be well-founded.
Although PowerSync is undoubtedly a mature product, and probably the most capable (on paper) of all options mentioned thus far, we hated every minute of working with it. The underlying tech might be the most production/enterprise friendly, but the DX is the worst by quite a margin.
There is a paid SaaS offering, but their pricing model would have made our use case prohibitively expensive. We therefore needed to self-host PowerSync, which is quite a complex and expensive task in itself. Not only does it require Postgres-level integration, it also needs an HA MongoDB cluster, a lot of arcane YAML configuration, etc. It also required us to completely denormalise our Postgres tables, as relations are not properly supported across the sync buckets.
On the frontend side of things, PowerSync runs SQLite in WASM, and although the DX is fairly good, we found horrifying problems: off-by-one bugs in their Drizzle ORM integration, queries returning data from the local database very slowly (100ms+), and long initialisation times (45s+ after logging in to Marco), with the UI not updating until the sync had fully completed.
Why So Many Problems?
There is a saying: if everyone around you is an asshole, maybe you're the asshole. Is the problem the 5+ offline-first tools we tried, or is it us?
Like with most things, the reality is "a bit of both". As mentioned, Marco is an incredibly data-intensive application, and from day one, we were pushing these offline-first tools to their absolute limits.
However, we also found the practical limits of these tools to be way lower than one would expect.
What is the underlying cause? In my estimation, all of these offline-first tools for web are essentially hacks. Because Marco's web deployment target is our common denominator, we must support offline-first in a web browser, and browsers only really give you KV storage, via IndexedDB.
All attempts to implement relational or graph databases within a web browser are essentially hacks. PowerSync itself is WASM SQLite... On top of IndexedDB. Binary SQLite chunks are literally stored in IndexedDB.
There are essentially three different variables:
- The underlying (true) data store – this will always be IndexedDB for web implementations
- How the data is represented for sync purposes
- How the data is presented to developers
The new wave of tools is attempting triples/graph implementations, but the story is the same. Browsers only give you a KV API, and anything on top of that must be built in userland, where it will suffer poor performance once you hit a certain scale. Although triples are easy to store and sync, when you try to jam a relational layer on top, the whole thing starts to fall apart.
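To make that concrete, this is more or less the entire persistence API a browser hands you – a minimal sketch, with illustrative database and key names:

```ts
// Open (or create) a database with a single bare object store.
const open = indexedDB.open("demo", 1);
open.onupgradeneeded = () => open.result.createObjectStore("kv");

open.onsuccess = () => {
  const store = open.result.transaction("kv", "readwrite").objectStore("kv");

  store.put({ subject: "Hello" }, "email/e123"); // "set"
  const req = store.get("email/e123");           // "get"
  req.onsuccess = () => console.log(req.result);

  // No WHERE, no JOIN, no ORDER BY -- anything resembling a query engine
  // has to be built in userland on top of these primitives.
};
```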
To be absolutely clear: these relational and graph implementations on top of IndexedDB do not start to show their cracks until you have 10s/100s of MB of data, or hundreds of thousands of rows/entities. But at a certain scale, performance grinds to a halt and they become unusable. For less demanding use cases, Triplit and InstantDB offer exceptional DX and velocity, with almost no drawbacks.
At this point, we were starting to pull our hair out, and were wondering if we needed to build our own sync engine. Like many others, we're highly impressed with Linear, but are also aware that their sync engine was a monumental engineering effort.
We're only a team of two, and we have a lot to work on besides offline-first itself.
Finally, A Solution
Sometime in early December, I came back across an option which I had glanced at before, but disregarded: Replicache.
I think we were initially put off by their strange pricing model and the fact that it's closed-source.
I am so glad we took another look.
In terms of backend implementation, Replicache is somewhat similar to WatermelonDB: you need to implement push and pull endpoints, and it is otherwise entirely database-agnostic.
The frontend is where the crucial difference lies – Replicache is just a KV store. It is a thin layer on top of IndexedDB that adds reactivity and some querying DX. That's it. You get raw `get` and `set` performance. Some performance benchmarks are outlined here. The perf is truly remarkable.
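As a rough sketch of what this looks like on the client – the mutator, keys, and endpoint paths here are illustrative, not Marco's actual schema, and details vary by Replicache version:

```ts
import { Replicache, type WriteTransaction } from "replicache";

type Email = { subject: string; isRead: boolean };

const rep = new Replicache({
  name: "user-42", // one local store per user
  pushURL: "/api/replicache/push",
  pullURL: "/api/replicache/pull",
  mutators: {
    // Mutators run optimistically against the local store, then replay
    // on the server via the push endpoint.
    async markRead(tx: WriteTransaction, id: string) {
      const email = (await tx.get(`email/${id}`)) as Email | undefined;
      if (!email) return;
      await tx.set(`email/${id}`, { ...email, isRead: true });
    },
  },
});

// Reads are reactive: onData re-fires whenever the underlying keys change.
rep.subscribe(async (tx) => tx.get("email/e123"), {
  onData: (email) => console.log("email changed:", email),
});

// Resolves instantly against local data; syncing happens in the background.
await rep.mutate.markRead("e123");
```

If the server state has moved on in the meantime, Replicache rebases the pending local mutations on top of it rather than clobbering either side.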
The drawback to this KV approach is that searching/sorting/ordering entities would require scanning through entire collections. This is obviously a non-starter. In other words, if we wanted to use Replicache, we would need to handle indexing and search on our own.
We'll post more detailed write-ups on our tech stack in the future, but a quick summary of where we landed on the frontend is: Replicache + Orama. This gives us sophisticated and battle-tested data sync with conflict resolution and rebasing, but also extremely flexible and powerful indexing, full-text search, and more.
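Here is a deliberately naive sketch of that pairing – not our actual code; the schema is hypothetical, and a real implementation would diff changes incrementally rather than rebuild the index on every change:

```ts
import type { Replicache } from "replicache";
import { create, insert, search } from "@orama/orama";

declare const rep: Replicache; // the client instance from the previous sketch

type Email = { id: string; subject: string; body: string };
const schema = { id: "string", subject: "string", body: "string" } as const;

let index = await create({ schema });

// Mirror Replicache's "email/" keyspace into an in-memory Orama index.
rep.subscribe(
  async (tx) => (await tx.scan({ prefix: "email/" }).toArray()) as Email[],
  {
    onData: async (emails) => {
      index = await create({ schema }); // naive full rebuild
      for (const email of emails) await insert(index, email);
    },
  },
);

// Full-text search then runs entirely locally, with no network round-trip:
const { hits } = await search(index, { term: "airplane wifi" });
console.log(hits.map((hit) => hit.document.subject));
```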
At the time of writing (January 2025), the Replicache team have just made it completely free and open source. This is because they've released Zero, which looks extraordinarily compelling and will likely take the #1 spot among available offline-first options.
We're eager to try Zero once it's a bit more stable, but for now will build our product on the extremely capable and robust piece of software that is Replicache.
Future of Offline-First
We embarked on a long and rambling journey through essentially all of the prior art in the offline-first world. It's a very hard problem to solve.
The good news is that there are many new teams and projects actively and energetically working on this problem.
Imagine a world where, as a fullstack developer, you can read and write data from an SDK in both your backend and your frontend, and they magically sync with each other. All your apps are instantly responsive. All your apps work offline out of the box.
This is already possible today with Triplit or InstantDB, if your use case is reasonable. And things are only improving.
I believe 2025 will be a year where HTTP/REST APIs will start to feel antiquated. Don't share endpoints – share databases.