How to deal with DBConnection.ConnectionError?

Inappropriate configuration

Or maybe your issues were caused by the wrong configuration. If you didn’t tweak the configuration at all, there is a chance that the traffic on your website simply needs more DB connections. In that case, increasing pool_size should be enough (as long as you still have resources on the application host). If not, you can spread the load across more instances or scale your host vertically, which works but will still limit you at some point.
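
As a rough sketch of the pool_size change, assuming a Phoenix-style config/runtime.exs, an application named :my_app with a repo called MyApp.Repo, and a POOL_SIZE environment variable (all of these names are placeholders, not something taken from a real project):

```elixir
# config/runtime.exs
import Config

# :my_app, MyApp.Repo and POOL_SIZE are assumed names; replace them with your own.
config :my_app, MyApp.Repo,
  url: System.get_env("DATABASE_URL"),
  # Read the pool size from the environment so it can be raised without a new build.
  # The fallback of "10" is only an example value.
  pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")
```

Keep in mind that every instance opens its own pool, so the database has to accept roughly pool_size times the number of instances connections.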

Heavy migrations

If the problems occur while applying migrations, I highly recommend reading Safe Ecto Migrations by David Bernheisel. It is a complete guide to using Ecto migrations safely, so there is no point in adding more here.

What can be done better?

If you learned about the issues from your users, your monitoring probably needs some work. In my experience, most of the problems can be detected earlier, so you won’t have to deal with DBConnection.ConnectionError in a hurry.

It is all about observability - one of the non-functional requirements that can be easily forgotten and which is crucial if you want to continuously deliver a stable and performant platform for your users.

If you collect metrics describing query times, or even just your response times, you can detect in advance that something is likely to cause issues in the near future. Just define limits and set alarms that fire when a metric goes beyond them. It is quite important to set up automatic alarms. Without them, there is a high chance that you will miss the moment to act before the fire, and all your collected metrics won’t pay off.

I really like the BEAM ecosystem’s focus on standardizing metrics. telemetry is a great foundation that allows libraries to export metrics and is easy to consume, which brings joy. So it should be quite easy to observe your Elixir applications.
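
To make the idea of limits on top of telemetry a bit more concrete, here is a minimal sketch of a handler that logs a warning when an Ecto query waits for a connection or runs longer than a chosen limit. It assumes a repo named MyApp.Repo with the default telemetry prefix (so the event is [:my_app, :repo, :query]); the module name MyApp.SlowQueryAlarm and both limits are hypothetical values picked for illustration. A real setup would more likely export these measurements to a metrics backend and alert there.

```elixir
defmodule MyApp.SlowQueryAlarm do
  @moduledoc "Hypothetical sketch: warns when an Ecto query waits or runs too long."
  require Logger

  # Assumed limits in milliseconds; tune them to your own baseline.
  @limits_ms %{queue_time: 50, query_time: 500}

  # Call once at application start. The event name assumes MyApp.Repo
  # with the default telemetry prefix.
  def attach do
    :telemetry.attach(
      "slow-query-alarm",
      [:my_app, :repo, :query],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Telemetry handler: logs one warning per measurement that exceeds its limit.
  def handle_event(_event, measurements, metadata, _config) do
    for {key, ms} <- breaches(measurements) do
      Logger.warning(
        "#{key} took #{ms} ms (limit #{@limits_ms[key]} ms), source=#{inspect(metadata[:source])}"
      )
    end

    :ok
  end

  # Returns the measurements above their limit, converted from native units to ms.
  # Measurements that are missing or nil are skipped.
  def breaches(measurements) do
    for {key, limit} <- @limits_ms,
        native = Map.get(measurements, key),
        is_integer(native),
        ms = System.convert_time_unit(native, :native, :millisecond),
        ms > limit,
        do: {key, ms}
  end
end
```

A growing queue_time is worth watching in particular, because it means queries are waiting for a free connection from the pool.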

Also, there is a lot of work on OpenTelemetry for BEAM, which can help you easily boost your observability to the next level, especially if you are working in a microservices environment.

Big kudos also go to fly.io for supporting metrics at no cost (at the time of writing), which can easily be used with Grafana Cloud Free. Thanks to that, metrics won't cost you a penny.

Summary

We can sum up dealing with DBConnection.ConnectionError in this 3-step process:

  1. Buying more time (mitigating issues)
    • Tune your settings
    • Scale database instance
    • Decrease non-critical load
  2. Finding the root cause
    • Long-lasting queries
    • Polluted transactions
    • Wrong model
    • Inappropriate configuration
    • Heavy migrations
  3. Learning from the incident
    • Postmortem

This basic 3-step pattern (mitigate, find the root cause, learn from the incident) will also help you deal with other issues in a production environment, not only DBConnection.ConnectionError. Feature flags and postmortems, which I introduced earlier, are generic tools that will help you in many cases. So if you don’t use them already, I recommend checking them out.

I hope that after reading this article, dealing with DBConnection.ConnectionError and other issues in production will be a bit easier for you and, more importantly, that you will be prepared for them or even predict and avoid them.