Why Firestore, Part II: reasons to hate it

LeanCode
Nov 24, 2020

by Jakub Fijałkowski

Firestore (and Firebase) is a really great solution for many different use cases. As with everything that does so much, it gets complicated very, very quickly even if it looks simple on the surface.

Here are my personal favorites among the things that can bite you really badly, based on more than 10 mobile applications in Flutter and React Native and the code audits we have done at LeanCode.

This is a series of articles in which we comprehensively describe the pros and cons of using Firestore as the backend for your next mobile application. In this series, we will try to show you that making this decision is not a simple process and that you need to analyze your app from multiple perspectives.

Posts in the series:

  1. Why Firestore, Part I: reasons to love it
  2. Why Firestore, Part II: reasons to hate it [you are currently reading it].
  3. Why Firestore, Part III: 6 things you need to know before using Firestore
  4. Why Firestore, Part IV: how to escape it

Would you like to learn from our past experience what can go wrong with Firestore and Firebase, and what their biggest disadvantages and limitations are? Read on to find out!

Security & data validation

Being able to access the data directly from the end-user devices creates some non-trivial problems. Normally, your backend system would act as an intermediary that handles all the cross-cutting concerns. Here there isn’t one. Everything that would normally be done there is now the responsibility of Firestore.

And this is a bad idea.

First and foremost, backend systems do validation and authorization. Most of the time those are separate concerns. Firestore, on the other hand, treats them as equal and gives you the same tool for both: you basically have to write authorization rules that sometimes also do the validation. You have to do that in a custom language that somewhat resembles JavaScript but is not JavaScript. Normally I would say that using a custom language is a good idea. Not here. These things are too complex to be handled by a language that can only do simple comparisons, fetch related documents by ID and inspect the incoming query. And pretend to have functions. You can’t really express anything meaningful and stay sane. Adding validation to the mix doesn’t help either (so most of the time there is no validation at all!).
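For contrast, here is roughly how a conventional backend keeps the two concerns apart. This is a minimal TypeScript sketch, not code from any of our projects: `User`, `Expense`, `authorize`, `validate` and `handleWrite` are hypothetical names, and the ownership rule is only an example. Authorization answers “may this user write this document?”, validation answers “is this payload well-formed?”, and each can grow independently.

```typescript
// A minimal sketch of a conventional backend write handler in which
// authorization and validation are two separate steps.
// All types and names here are hypothetical, for illustration only.

interface User {
  id: string;
  isAdmin: boolean;
}

interface Expense {
  ownerId: string;
  amount: number;
  currency: string;
}

type WriteResult =
  | { ok: true; saved: Expense }
  | { ok: false; reason: string };

// Authorization: may this user write this document at all?
// Anyone may create; only the owner or an admin may update.
function authorize(user: User, existing: Expense | undefined): boolean {
  return existing === undefined || existing.ownerId === user.id || user.isAdmin;
}

// Validation: is the payload well-formed, no matter who sent it?
function validate(payload: unknown): string[] {
  if (typeof payload !== "object" || payload === null) {
    return ["payload must be an object"];
  }
  const { amount, currency } = payload as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof amount !== "number" || !Number.isFinite(amount) || amount <= 0) {
    errors.push("amount must be a positive number");
  }
  if (typeof currency !== "string" || !/^[A-Z]{3}$/.test(currency)) {
    errors.push("currency must be a 3-letter code");
  }
  return errors;
}

function handleWrite(user: User, existing: Expense | undefined, payload: unknown): WriteResult {
  if (!authorize(user, existing)) {
    return { ok: false, reason: "forbidden" };
  }
  const errors = validate(payload);
  if (errors.length > 0) {
    return { ok: false, reason: errors.join("; ") };
  }
  const { amount, currency } = payload as { amount: number; currency: string };
  // The backend, not the client, decides who owns the document.
  const ownerId = existing?.ownerId ?? user.id;
  return { ok: true, saved: { ownerId, amount, currency } };
}
```

Because the backend fills in `ownerId` itself, the client cannot spoof it; that is exactly the kind of check that ends up tangled into the rules when there is no backend.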

Of course, there are cases when that is not a problem. For example, you can “easily” shield yourself from bad actors by just ignoring/sanitizing malformed data on the client. There are cases when you don’t really need any authorization besides simple “these users can only write these documents and read from those ones”. I would even say that most projects fall under this category in the beginning. Then the complexity creeps in and you find yourself deep in the broken Firestore rules, with validation code everywhere in your app. And no security.
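Ignoring malformed data on the client usually boils down to a type guard applied to every document you read. The TypeScript sketch below assumes a hypothetical `Task` shape with made-up limits; its input would be the raw data of each snapshot document. Note that it only protects the reading client: the malformed documents still sit in the database, and reading them still costs money.

```typescript
// A minimal sketch of client-side sanitization: documents that don't match
// the expected shape are dropped instead of crashing the UI.
// `Task` and its limits are hypothetical.

interface Task {
  title: string;
  done: boolean;
  priority: number;
}

// Type guard: true only for data that has the Task shape.
function isTask(data: unknown): data is Task {
  if (typeof data !== "object" || data === null) return false;
  const { title, done, priority } = data as Record<string, unknown>;
  return (
    typeof title === "string" &&
    title.length > 0 &&
    title.length <= 200 &&
    typeof done === "boolean" &&
    Number.isInteger(priority)
  );
}

// `raw` would be, e.g., the data of every document in a query snapshot.
function sanitize(raw: unknown[]): Task[] {
  return raw.filter(isTask);
}
```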

Pricing, usage calculations & indices

Uh oh. I think this is the thing that made me (and well… my clients) scream in agony. Firestore pricing looks normal — you pay for ingress/egress, storage & operations. That’s understandable. What makes it hard is expense monitoring. Or to be more specific — lack of it.

Firestore doesn’t really give you a way to check how much you use. You can see how many operations you’ve already used, but when it comes to storage and ingress/egress, you’re pretty much on your own. Firestore doesn’t give you anything meaningful there. All you have is a single “storage used” line on your GCP bill. It doesn’t tell you how much data you really have or how much new data you’re creating; it only tells you how much you’ve been billed. Nothing else. You can try to derive the changes from it, but the result won’t be anywhere near accurate.

Theoretically, the documentation tells you how to calculate the storage that you use (or will use). You can calculate everything yourself but that requires you to download every single document in the database or do the calculation up-front when uploading the document for the first time. It’s also painfully complicated (for such a simply stated problem), terribly slow, and will cost you money just to calculate how much money you will pay.
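To give a feel for the calculation, here is a minimal TypeScript sketch of the storage-size rules as we read them in the Firestore documentation: strings cost their UTF-8 length plus 1 byte, numbers and dates 8 bytes, booleans and null 1 byte, a document name costs its path segments plus 16 bytes, and a document adds 32 bytes on top of its name, field names and values. Maps, references, geopoints and bytes are deliberately left out, and the rules may have changed since, so treat it as an approximation.

```typescript
// A minimal sketch of the storage-size rules from the Firestore documentation,
// as we read them. Maps, references, geopoints and bytes are left out.

type FieldValue = string | number | boolean | null | Date | FieldValue[];

// UTF-8 byte length, computed by hand to avoid platform-specific APIs.
function utf8Length(s: string): number {
  let bytes = 0;
  for (const ch of s) {
    const cp = ch.codePointAt(0) ?? 0;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

// Strings (IDs, field names, text values) cost their UTF-8 bytes + 1.
function stringSize(s: string): number {
  return utf8Length(s) + 1;
}

// Document name: every collection ID and document ID in the path, + 16 bytes.
function documentNameSize(path: string): number {
  let size = 16;
  for (const segment of path.split("/")) size += stringSize(segment);
  return size;
}

function valueSize(v: FieldValue): number {
  if (v === null || typeof v === "boolean") return 1;
  if (typeof v === "number") return 8; // integers and doubles are both 8 bytes
  if (typeof v === "string") return stringSize(v);
  if (v instanceof Date) return 8;
  let size = 0; // arrays: the sum of their elements
  for (const item of v) size += valueSize(item);
  return size;
}

// Document: name + field names + field values + 32 bytes.
function documentSize(path: string, fields: Record<string, FieldValue>): number {
  let size = documentNameSize(path) + 32;
  for (const [name, value] of Object.entries(fields)) {
    size += stringSize(name) + valueSize(value);
  }
  return size;
}
```

For `documentSize("users/jeff/tasks/my_task_id", { type: "Personal", done: false, priority: 1, description: "Learn Cloud Firestore" })` the sketch computes 147 bytes: 44 for the name, 71 for the fields and 32 of fixed overhead. Doing this for data that already exists means reading every document you have.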

What counts as “storage used”? Well, everything. Documents, collections (i.e. paths, as a collection isn’t really a thing when it comes to storage space), indices, you name it. You pay for every byte that you create and for every byte that Firestore creates for you. And it creates a lot.

By default, Firestore indexes all of the fields in your documents. All of them. You must explicitly disable indices, and you can only create 200 exemption rules (as of 21–09–2020). This makes it extremely important to model your data carefully, because one wrong index can result in a tremendous amount of unused data. I am guilty of overlooking this. In Activy, where we use Firestore to sync activities (and used to sync Rankings), we generated almost 24GB of indices for every 1GB of data. We didn’t use any of it.
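To see how quickly the automatic indexing adds up, here is a rough TypeScript sketch that counts the single-field index entries one document would get under the defaults as we understand them: an ascending and a descending entry for every non-array, non-map field (including fields nested in maps) and array-contains entries for arrays. The one-entry-per-distinct-element rule for arrays is our assumption, and exemptions and composite indexes are ignored.

```typescript
// A rough sketch that counts the entries automatic single-field indexing
// would create for one document, based on our reading of the documented
// defaults. Exemptions and composite indexes are ignored.

type Value = string | number | boolean | null | Value[] | { [key: string]: Value };

function countIndexEntries(fields: { [key: string]: Value }): number {
  let entries = 0;
  for (const value of Object.values(fields)) {
    if (Array.isArray(value)) {
      // Assumption: one array-contains entry per distinct element.
      entries += new Set(value.map((v) => JSON.stringify(v))).size;
    } else if (value !== null && typeof value === "object") {
      entries += countIndexEntries(value); // map: recurse into its subfields
    } else {
      entries += 2; // ascending + descending
    }
  }
  return entries;
}
```

For `{ amount: 10, note: "x", tags: ["a", "b", "a"], meta: { source: "app", flags: ["z"] } }` it counts 9 entries for five fields, and every entry is stored (and billed) in addition to the document itself.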

So, when it comes to pricing, you have to be really, really, really careful, even for simple cases. As I said, it only takes one bad actor to pollute your data.

Latency is tricky

Google does not make any promises regarding latency. That alone might be reason enough to reject Firestore as your database. Without any guarantees, you can’t design your product well. If the latency were high but known, you could work around it: for example, you could hide it by starting the request earlier in the process or by doing it completely in the background. This would add complexity (which Firestore tries to avoid) but would be doable. Without a known RTT (round-trip time, roughly twice the one-way latency), you can only measure and hope that it stays consistent.

And the measurements aren’t that good: over one second for a small query is a really long time. This mostly matches our benchmarks in Activy, which uses Firestore quite extensively. Activy’s flow works more or less the same as the one in the article:

  1. One client uploads a document,
  2. Other clients receive the notification and process the document,
  3. The processing client then sets the “processing completed” marker,
  4. The first client receives the “processing completed” marker along with a notification about the changed document.

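Just uploading the document (to a known path) takes more than 300ms in Activy. Waiting for the processing and for the notification to be sent takes another couple hundred ms (~200ms in our case). The flow makes that trip twice (once to deliver the document, once to return the marker), so the whole exchange takes about 1s at best.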
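Since there are no latency guarantees to design against, the practical option is to measure continuously and look at percentiles rather than averages. A minimal TypeScript sketch follows; `roundTrip` is a hypothetical callback you would implement with your SDK of choice (for the flow above: upload the document and resolve once the “processing completed” marker arrives), and the nearest-rank percentile is just one reasonable definition.

```typescript
// A minimal sketch for measuring and summarizing round-trip times.
// `roundTrip` is a hypothetical callback, e.g. one that uploads a document
// and resolves when the "processing completed" marker arrives.

async function timeRoundTrip(roundTrip: () => Promise<void>): Promise<number> {
  const start = Date.now();
  await roundTrip();
  return Date.now() - start;
}

// Nearest-rank percentile (p in 0..100) of the samples, in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

function summarize(samplesMs: number[]): { p50: number; p95: number; max: number } {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    max: Math.max(...samplesMs),
  };
}
```

The tail (p95, max) matters more than the median here: the slow round trips are the ones users notice.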

When comparing that to a simple WebSockets server running on the smallest GCP instance (as per the article), Firestore just looks bad. Even if you add message processing, some small database, and such, you won’t get more than, say, 500ms RTT on the smallest instances possible.

Accepting this kind of latency might be feasible for some applications, especially in their early stages, but using Firestore for near real-time communication is shooting yourself in the foot. You won’t be fast even if you do your best.

Querying is really simple

Firestore, even though it is somewhat powerful, is rather limited compared to traditional databases (be it an RDBMS or another document database). Combining the basic queries with the index-everything-by-default approach gives you a great starting point, but you need to model your data for searchability upfront. There are a number of query limitations that make using Firestore painful. Some of them (e.g. the limits on OR or array-contains/array-contains-any) are not that awkward, but the rule that you can do range queries on only a single field is terribly irritating. It’s quite common to ask for “all transactions from this date range valued at no less than X”, and this single rule disallows that. “Negated” queries (not-in and plain old !=) arrived only very recently, and they count as inequality filters, so they fall under the same single-field restriction.
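The usual workaround is to let Firestore apply the range on one field and filter on the second one in the app. The TypeScript sketch below simulates both halves in memory; `Txn`, `queryByDate` and `valuedAtLeast` are hypothetical, and in a real app the first half would be a Firestore query with two `where` clauses on `date`.

```typescript
// A minimal sketch of the workaround: Firestore applies the range filter on
// one field (here `date`), the app applies the second range on the client.
// `Txn` and its fields are hypothetical.

interface Txn {
  id: string;
  date: Date;
  value: number;
}

// Stand-in for the server-side part: a single range on `date`, [from, to).
function queryByDate(all: Txn[], from: Date, to: Date): Txn[] {
  return all.filter(
    (t) => t.date.getTime() >= from.getTime() && t.date.getTime() < to.getTime(),
  );
}

// Client-side part: the second range condition Firestore would not accept.
function valuedAtLeast(results: Txn[], minValue: number): Txn[] {
  return results.filter((t) => t.value >= minValue);
}
```

The catch is cost: every document in the date range is downloaded and billed as a read, including the ones the client throws away.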

Document databases tend to have limited processing capabilities: you can’t compute values from query results directly in the database as you would in SQL, and they don’t support joins. A common way to overcome this is MapReduce, a programming model popularized by Google, which somewhat mitigates the issue. MapReduce became the de facto standard for document databases (even MongoDB supports it!). Unfortunately, Firestore has nothing like that. You can implement it yourself using so-called aggregation queries, but that solution is far from perfect. You do have control over the process, but Firestore is unable to optimize any of it. You effectively have to run optimistically concurrent transactions just to update a collection that acts as a map-reduce index. This can work for simple cases where your source collections aren’t modified frequently, but under load your retries will eat all of your performance.
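The retry problem is easier to see in code. This is an in-memory TypeScript simulation of maintaining a running count and sum under optimistic concurrency, not Firestore’s API: `VersionedStore` is a hypothetical stand-in for a document with an update version, and `beforeCommit` exists only so the example can inject a competing writer.

```typescript
// A minimal, in-memory sketch of a read-modify-write aggregate under
// optimistic concurrency. `VersionedStore` is hypothetical, not a Firestore API.

interface Stats {
  count: number;
  sum: number;
}

class VersionedStore<T> {
  private docs = new Map<string, { version: number; data: T }>();

  read(path: string): { version: number; data: T | undefined } {
    const doc = this.docs.get(path);
    return doc ? { version: doc.version, data: doc.data } : { version: 0, data: undefined };
  }

  // Succeeds only if nobody has written the document since it was read.
  commit(path: string, expectedVersion: number, data: T): boolean {
    const current = this.docs.get(path)?.version ?? 0;
    if (current !== expectedVersion) return false;
    this.docs.set(path, { version: current + 1, data });
    return true;
  }
}

// Adds one value to a running count/sum; returns how many attempts it took.
// `beforeCommit` exists only to simulate a concurrent writer.
function addToAggregate(
  store: VersionedStore<Stats>,
  path: string,
  value: number,
  beforeCommit?: () => void,
  maxAttempts = 5,
): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { version, data } = store.read(path);
    const current = data ?? { count: 0, sum: 0 };
    const next = { count: current.count + 1, sum: current.sum + value };
    beforeCommit?.();
    if (store.commit(path, version, next)) return attempt;
  }
  throw new Error(`gave up after ${maxAttempts} attempts`);
}
```

With a single competing write the update needs two attempts; with many writers hitting the same aggregate document, attempts multiply until they hit the retry limit.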

Firestore makes normal data work painful

You develop version 1 of your mobile or web application with Firestore. Everything is great. Everything syncs correctly, everything works fast, the development was a pleasure. Your userbase is getting bigger, you become famous and money starts flowing. To make you even richer you start to think about new features. And you decide to implement them.

This is where the pain starts. You’ve developed your app, you’ve gained a userbase, all of your business logic is in the app, running directly on end-user devices, and now you need to migrate the data. Migrations are always tricky (especially in NoSQL databases), but here you’re in an even worse situation. Not only do you need to migrate some data, you also need to build your app to handle multiple versions of that data.

Consider a situation where you need to modify the model. It doesn’t really matter whether it is just a quick fix that adds a field or a complete revamp of the data, although additions are much easier to migrate. If you have to do it once, it is doable, but it needs to be accounted for upfront: the previous version of your app must already have been built so that it does not crash when the model changes slightly. The new version needs to handle the data from the old version (i.e. migrate it) and save new data in a way that won’t break the old app.
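One common way to meet both requirements is to store a schema version in every document, upgrade old documents when reading them, and keep writing the old fields for older app versions. The TypeScript sketch below assumes a hypothetical profile whose single `name` field is split into `firstName` and `lastName` in version 2; the old app only needs to ignore fields it doesn’t know.

```typescript
// A minimal sketch of versioned documents: the new app upgrades old documents
// when reading them and keeps writing the old field so older app versions
// still work. The profile shapes are hypothetical.

// v1: a single free-text name.
interface ProfileV1 {
  schemaVersion?: 1;
  name: string;
}

// v2: the name is split; `name` is still written for v1 clients.
interface ProfileV2 {
  schemaVersion: 2;
  firstName: string;
  lastName: string;
  name: string;
}

type StoredProfile = ProfileV1 | ProfileV2;

// Read path: accept any known version and return the newest shape.
function upgrade(doc: StoredProfile): ProfileV2 {
  if (doc.schemaVersion === 2) return doc;
  // Naive split on whitespace, good enough for the sketch.
  const [firstName, ...rest] = doc.name.trim().split(/\s+/);
  return { schemaVersion: 2, firstName, lastName: rest.join(" "), name: doc.name };
}

// Write path: always write the v2 fields *and* the v1 field.
function toStored(firstName: string, lastName: string): ProfileV2 {
  return { schemaVersion: 2, firstName, lastName, name: `${firstName} ${lastName}`.trim() };
}
```

Every further version adds another branch to `upgrade` and another set of legacy fields to `toStored`.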

Why not just abandon the old version, you might ask? Because you can’t force your users to upgrade the app. Some folks will plainly refuse to upgrade, and even if you can ignore them, the upgrade isn’t instantaneous and you can’t control it. Big upgrades aren’t easy with traditional backend systems either, but there at least you control everything, and it is up to you when the upgrade will (or will not) be finished.

It gets even trickier when you need to migrate data more frequently. Handling three or even four versions at once will result in much grief and many, many bugs. And if you deploy a rogue version of the app with a critical error, you can’t really take it back fast enough. It might do immense damage before you are able to revert it.

You will duplicate your data many times

Because Firestore querying is limited and there is no map-reduce, one data model won’t handle all the cases. You will either end up downloading much more data than you need and doing everything in memory, or you will duplicate the data. You can leverage Cloud Functions to make the duplication automatic, but you will need to handle all the CRUD actions yourself.

This also means that there will be some delay between adding or modifying a document and the change propagating to other collections. It is worth designing your app to handle eventual consistency well, but sometimes that is just overkill. Or worse, your business (or regulatory) requirements may forbid eventual consistency altogether.

If you go with data duplication (and do the mapping yourself), you will end up with multiple separate copies of the same data, just structured differently. This not only increases the storage cost but also greatly increases the complexity. You now have not just a single document to version, but multiple related documents that need to be upgraded with care (and possibly atomically, which might make the process even harder).
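The CRUD handling for duplicated data typically lives in a trigger that sees the document before and after the change (in Cloud Functions, an `onWrite` trigger provides both). The TypeScript sketch below only computes which writes to perform; the copies under `customers/{id}/orders` and `orderSummaries` are hypothetical, and committing the writes (ideally in one batch) is left to the caller.

```typescript
// A minimal sketch of the fan-out logic a trigger function would run when an
// order changes. Paths and shapes are hypothetical.

interface Order {
  id: string;
  customerId: string;
  total: number;
  status: string;
}

type Write =
  | { kind: "set"; path: string; data: Record<string, unknown> }
  | { kind: "delete"; path: string };

// `before`/`after` are the order before and after the change (undefined = absent),
// so create, update and delete all go through the same function.
function fanOut(before: Order | undefined, after: Order | undefined): Write[] {
  const writes: Write[] = [];
  // Remove the copy under the old customer on delete or when the order moved.
  if (before && (!after || before.customerId !== after.customerId)) {
    writes.push({ kind: "delete", path: `customers/${before.customerId}/orders/${before.id}` });
  }
  if (before && !after) {
    writes.push({ kind: "delete", path: `orderSummaries/${before.id}` });
  }
  if (after) {
    writes.push({
      kind: "set",
      path: `customers/${after.customerId}/orders/${after.id}`,
      data: { total: after.total, status: after.status },
    });
    writes.push({
      kind: "set",
      path: `orderSummaries/${after.id}`,
      data: { customerId: after.customerId, total: after.total },
    });
  }
  return writes;
}
```

The case that is easy to miss is an update that changes `customerId`: it has to delete the copy under the old customer, not just write a new one.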

Backups

This paragraph will be a short one.

There are no backups in Firestore.

You can export data to a GCS bucket, but that isn’t really a backup; it’s just an export. Firestore doesn’t ensure the export is consistent, exports are mostly manual (you can script them, but you have to write that yourself), and their timing isn’t really predictable.

It’s just not a backup.

Summary

What matters most is being aware of these limitations before you design your system. These problems are not always red flags, and there are business cases where using Firestore as your backend might still be a very reasonable decision. You have to weigh all the pros and cons and decide for yourself.

In the next article in our series, we will describe how to use the benefits of Firestore and Firebase (described in Part I) the right way, and which steps you should consider when you have already fallen into the traps described above.

Follow the link to read Why Firestore, Part III: 6 things you need to know before using Firestore.


Written by LeanCode

We’re a group of technology enthusiasts working together for our clients to create better solutions for their digital consumers. See more at https://leancode.co