When should you add Redis to Postgres?

When should you add Redis to Postgres?

Redis is an in-memory key-value store. It’s often used as a cache, to augment a more persistent database.
The benefits you might expect from using it are:
  • The DB is in RAM, so the latency is a lot smaller.
  • It’s lighter and can handle many more requests per second, particularly for read-heavy workflows
Classically, you might think of it running on the same box as your server code. At that point there’s very little latency vs a hashmap in your code - and this feels almost like a cache library you’d add to your code. In fact - many cache libraries prefer to use Redis as a backend.
How well does this fare in a world where my Express server, Postgres DB and Redis DB are all managed by separate parties though?

I’m using a couple of my mainstays - fly.io for running the web server and supabase.com for Postgres. I’ve not used managed Redis before - so I’m testing out Upstash & Fly’s option (which is also managed by Upstash).
All of the providers offered US West server locations in California, so I picked that to prevent things from being too dominated by network latency.
I’m using two benchmarks that approximate workloads I care about -

Benchmark 1: API Key Check

I’m going to be validating an auth header on every request. How noticeable is the difference here? I’m not super worried about insert latency here - as the keys & values don’t change.
I’m also testing both Fly’s Redis and Upstash’s Redis to figure out how accurate Fly’s claims that their Redis is more optimized are.
Below is the code for one of the tests, but you can find the rest of them here (TODO: link):
app.get('/api-fly-redis/:key', async (request, response) => { const key = request.params.key; // Set up the test await flyRedisClient.set(`api-${key}`, "value"); const start = now(); await flyRedisClient.get(`api-${key}`); const time_elapsed = now() - start; console.log(`Fly Redis GET ${key} took ${time_elapsed}ms`); }
And, the results… aren’t super far apart:
----API Key Check benchmarks----- api-fly-redis - min/avg/max/stddev = 9.334/10.194/12.197/0.978 ms api-upstash-redis - min/avg/max/stddev = 6.162/6.922/7.576/0.590 ms api-pg - min/avg/max/stddev = 7.848/8.748/9.208/0.408 ms
The surprising findings are:
  • The numbers are both really close, and small enough that they don’t matter to me. I had assumed that PG would be ~2-5x slower - but it turns out it isn’t. My best guess is that pg caching is kicking in, and this would be much worse for a larger table.
  • fly-redis is in fact slower on this benchmark. It might just indicate an insufficient sample size - but it’s enough to convince me that I can just use Upstash for Redis, especially given that Fly’s Redis prices are much higher than Upstash’s.

Benchmark 2: Worker Queue

My web server is going to dispatch work to workers, and this work is on the user’s critical path. I’d like to minimize time until the work gets taken off the queue quite aggressively. The tasks are quite large, but workers are often idle.
You can do this in a pull-based manner (the worker polls/blocks on a queue), or a push-based manner (the server sends out a notification for when workers must check for new work). The 2x2 with PG and Redis, are then:
  • (PG, pull): Whenever a worker is free, they try to select one or more jobs using SKIP LOCKED
  • (PG, push): Workers LISTEN on a connection, and the server NOTIFY's. The workers then each try to select one or more jobs, using a SKIP LOCKED query.
  • (Redis, pull): Whenever a worker is free, it blocks on brpop , which pops an element off a list, blocking if the list is empty.
  • (Redis, push): Unclear if you’d use this, given that brpop already handles this well. But, you could presumably listen for a PubSub notification, and then brpop.
For now, I tested a subset of these:
  • For PG, the latency of a server-sent PG notify → the worker getting notified → the worker sending the server an ack message.
  • For Redis, the latency of a server-sent lpush → the worker unblocking on brpop → the worker sending the server an ack message.
Once again, the results are pretty close:
-----Worker Queue benchmarks----- queue-redis-brpop - min/avg/max/stddev = 7.429/9.536/21.995/4.399 ms queue-pg-notify - min/avg/max/stddev = 10.124/11.212/12.574/0.692 ms
But, I had to disable PGBouncer for to make the PG + NOTIFY approach work at all. We’ll look into this further in a second.

Conclusions: Pooling, Scale, Complexity

My most certain conclusions are:
  • If you’re using Redis w/ Fly, it’s fine to not use Fly’s Redis setup.
  • Postgres is about as fast as Redis for small databases when you have the ambient noise of network latency.
But one big gotcha was that I had to disable pooling via PGBouncer, because LISTEN does not work with PGBouncer enabled (the worker hands off the connection after calling LISTEN, and then it can’t actually listen on it). Sans PGBouncer’s connection pooling, Postgres has a much smaller connection limit, in the high double to low triple-digits. There’s a nice post about the trade-offs of PGBouncer here. Since connection pooling is more suited, or perhaps even necessary for the world where you have ephemeral DB clients, the tradeoff here is a little less clear.
So finally, Redis is probably fine. It’s pretty simple, so it doesn’t feel too expensive to add it to your stack. It feels nicer & cleaner to call in this context, and you might end up using it in other contexts.
Postgres is probably fine too. It does seem dubious that it looks like an all-purpose tool, but certain features might just break past some surprisingly scale. But actually yeah, for most apps, it’s 100% fine to hit your DB a few times on every request, and defer moving that to a cache much later.