How Core Flow AI achieved better performance at half the cost by migrating from Supabase to PlanetScale
This case study is cross-posted from the Core Flow AI blog.
Core Flow AI builds AI-powered entertainment apps. This involves everything from the web app to the infrastructure and the AI models themselves. We are responsible for a top-50 gen AI consumer app with over 500k daily users, and we have scaled incredibly fast, reaching over 10 million users in less than a year.
We've handled this growth with a team of fewer than 10 people. A critical part of our success is having a database that can handle 10x-ing every few months while minimizing developer overhead.
Pain points with Supabase
Like many startups, we started our project on Supabase. Supabase was incredibly easy to get started with and really helped us bootstrap our first few months. However, we quickly started running into database problems. Occasionally, a bad query would lock up the entire database, and deleting users became next to impossible once we had amassed 100M profiles in the table.
Most of these issues were self-inflicted (aka badly written queries). Supabase didn't provide much in terms of query-level observability to help us identify and improve those queries, and since we were growing so fast, we didn't have time to go through them with a fine-tooth comb. Instead, we had to vertically scale our database and add read replicas to temporarily delay the problem.
Eventually, we hit a point where we were on the largest self-serve database size, hitting multiple outages a month with little visibility into what was going wrong. Supabase's team was incredibly responsive, but without seeing our code, there was little they could do to help.
They did offer to spin up a larger size, but it would take weeks for them to provision it. Unfortunately, we didn't have weeks to spare.
Why PlanetScale
Around the same time, we saw that PlanetScale released Metal for Postgres. The benchmarks were incredible. They also had much larger self-serve instances than Supabase, so we thought we'd get a lot more runway by simply moving to them and leveraging more vertical scaling.
Although we moved purely for the sake of vertical scaling and performance, we were incredibly impressed by the functionality PlanetScale offers, such as Insights, high availability with read replica failover, and hands-on support from Sam (PlanetScale's CEO) himself.
Before we get into what the migration looked like, here's what moving to PlanetScale got us:
- 1.5x better performance at half the cost
- Query insights tooling that turned hours of debugging into minutes
- Hands-on support that helped us onboard quickly
- Self-serve upgrades without dealing with contracts or lengthy provisioning periods
Our 1-week migration to PlanetScale
Day 1
As a startup, we move fast. I gave myself ~1 week to move our production workloads live to PlanetScale. The great news is the PlanetScale team moves just as quickly. Before I knew it, a Slack channel was set up with Sam and the rest of the team sharing guides and quickly upgrading my account.
We use a few extensions, such as pgvector, and before even considering moving, we needed to make sure they were supported in PlanetScale. Gomez from the Engineering team was quick to answer all of my questions. He helped us with the initial migration setup, and before we knew it, we had a sync up and running, replicating our Supabase data into PlanetScale.
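For anyone doing a similar check, here's a minimal sketch (not our actual tooling; the connection string is a placeholder) of verifying that required extensions like pgvector are available on the target Postgres before kicking off a sync:

```typescript
// Minimal sketch: verify required extensions (e.g. pgvector) exist on the
// target Postgres before starting a sync. Connection string is a placeholder.
import { Client } from "pg";

async function checkExtensions(connectionString: string, required: string[]) {
  const client = new Client({ connectionString });
  await client.connect();
  try {
    // pg_available_extensions lists everything the server can install
    const { rows } = await client.query("SELECT name FROM pg_available_extensions");
    const available = new Set(rows.map((r) => r.name as string));
    const missing = required.filter((ext) => !available.has(ext));
    if (missing.length > 0) {
      throw new Error(`Missing extensions on target: ${missing.join(", ")}`);
    }
  } finally {
    await client.end();
  }
}

// e.g. await checkExtensions(process.env.PLANETSCALE_URL!, ["vector"]);
```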
Day 2
The sync completed, but there was a problem: the PlanetScale database was 2x the size of the Supabase database, and some tables were up to 10x bigger than others! We retried the sync a few times with over 10 TB of disk but had no luck. Unfortunately, each sync takes some time, so we didn't make much progress this day.
During some of this time, I also worked on decoupling our auth from the database, moving from Supabase database triggers to Supabase auth hooks.
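For illustration only (this isn't our actual code, and the payload shape and table names are assumptions): the idea is that instead of a trigger inside the Supabase database creating rows when a user signs up, Supabase Auth calls an HTTP hook endpoint we control, which writes to whichever database we point it at. Something along these lines as a Vercel route handler:

```typescript
// Hypothetical sketch of an HTTP auth hook endpoint (Next.js route handler on
// Vercel). The hook payload shape, table, and columns are illustrative
// assumptions; signature verification of the hook request is omitted.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 5 });

export async function POST(req: Request) {
  const payload = await req.json(); // shape depends on which hook is configured
  const userId = payload?.user_id;
  if (!userId) {
    return new Response("missing user_id", { status: 400 });
  }
  // Create the profile row in our own database rather than via a trigger
  await pool.query(
    "INSERT INTO profiles (id) VALUES ($1) ON CONFLICT (id) DO NOTHING",
    [userId]
  );
  return Response.json({});
}
```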
Day 3
After some deeper debugging, it turned out that the Supabase connection was timing out, leaving behind a lot of dead tuples. This caused the database storage size to explode without actually syncing real data. After bumping the timeout, we had a successful sync!
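For reference, this is roughly the kind of diagnostic we ran to spot the bloat (a sketch, not the exact commands): compare live vs. dead tuples and on-disk size per table straight from Postgres's statistics views.

```typescript
// Rough sketch: list the tables with the most dead tuples and their total
// on-disk size, to spot bloat left behind by interrupted syncs.
import { Client } from "pg";

async function reportBloat(connectionString: string) {
  const client = new Client({ connectionString });
  await client.connect();
  try {
    const { rows } = await client.query(`
      SELECT relname,
             n_live_tup,
             n_dead_tup,
             pg_size_pretty(pg_total_relation_size(relid)) AS total_size
      FROM pg_stat_user_tables
      ORDER BY n_dead_tup DESC
      LIMIT 20
    `);
    console.table(rows);
  } finally {
    await client.end();
  }
}
```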
The only hurdle after syncing was that PlanetScale's max_connections was ~600, but we run on Vercel, which results in a huge number of concurrent connections. (This was prior to the full release of PgBouncer for PlanetScale.) We asked for support and, as always, Gomez replied within a few minutes explaining how to bump the number.
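The underlying issue is a classic serverless one: every Vercel function instance that opens its own connections multiplies the total quickly. Without a pooler like PgBouncer in front, the usual workaround is a small, module-scoped pool per instance. A rough sketch (values and env var names are illustrative, not our production config):

```typescript
// Illustrative sketch: reuse one small pool per function instance instead of
// opening a new client per request. instances x max must stay under the
// server's max_connections.
import { Pool } from "pg";

declare global {
  // Stash the pool on globalThis so hot reloads / repeated imports reuse it
  var pgPool: Pool | undefined;
}

export const db =
  globalThis.pgPool ??
  (globalThis.pgPool = new Pool({
    connectionString: process.env.DATABASE_URL,
    max: 5,                    // per-instance cap (example value)
    idleTimeoutMillis: 10_000, // release idle connections quickly
  }));

// usage: const { rows } = await db.query("SELECT now()");
```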
With no fear, we moved all production read traffic to PlanetScale Postgres (this was prior to any enterprise agreements or contracts) to serve real users.
Day 4
Immediately, we got some results:
- We halved our cost from Supabase → PlanetScale while also reducing CPU usage.
- We estimated a 1.5x performance increase with a 2x price decrease simply by extrapolating our read workloads.
- Latency looked a lot better, but we lacked the visibility on Supabase's side to give an accurate number.
Within 3 days, we had all of our read traffic on PlanetScale.
Day 5
With metrics looking great and our auth now using auth hooks, we were ready to go to production.
As I was getting ready to migrate over, I noticed in the PlanetScale dashboard that one of my replicas had gone offline. This was incredibly alarming, so I hopped into Slack, and Gomez replied letting me know it was a routine rotation. Coming from Supabase, if any of my replicas went offline, I'd be getting paged immediately, but with PlanetScale's high availability, they can rotate the write/read replicas freely. This has since been improved with dedicated maintenance windows, so it's less alarming.
It was time for the switch. I put the app in maintenance mode for 15 minutes, switched out the URL, launched all of our services, and stared at Datadog dashboards. Gomez was there helping monitor the status, and we shut down all connections to the Supabase DB. Everything was surprisingly boring!
And the results we saw were… absurd. P99 of <10ms was amazing, especially when we barely spent any effort optimizing our queries.
Day 6
Within 1 week, we were running our production workloads for an app that gets hundreds of thousands of unique users daily on PlanetScale with a huge amount of support from the PlanetScale team. We can't thank them enough for their help with this migration.
Day 7 and beyond
One unexpected benefit we discovered after switching to PlanetScale was the PlanetScale Insights tool. It showed us exactly what queries were slow, how often they ran, and how many rows they returned. We took the top culprits, threw them into Claude, and asked it to optimize them.
Within a few hours, we were able to identify and fix queries that went from chewing through 20-30% of the CPU to using <5%. This visibility empowered us to move fast and focus only on the handful of queries that really mattered.
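To give a flavor of the kind of fix this surfaced (a hypothetical example, not one of our real queries): a hot query filtering and sorting on unindexed columns, confirmed with EXPLAIN ANALYZE and then covered with a composite index.

```typescript
// Hypothetical example: diagnose a slow query and add an index that matches
// its filter and sort order. Table and column names are made up.
import { Client } from "pg";

async function explainAndIndex(client: Client) {
  // 1. See why the culprit is slow
  const plan = await client.query(`
    EXPLAIN ANALYZE
    SELECT id, status, created_at
    FROM jobs
    WHERE user_id = 'some-user-id' AND status = 'pending'
    ORDER BY created_at DESC
    LIMIT 20
  `);
  console.log(plan.rows);

  // 2. If the plan shows a sequential scan plus a sort, an index matching the
  //    filter columns and sort order usually fixes it
  await client.query(`
    CREATE INDEX CONCURRENTLY IF NOT EXISTS jobs_user_status_created_idx
    ON jobs (user_id, status, created_at DESC)
  `);
}
```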
Write latencies were so low that instead of polling our Redis job queue for status updates, we just moved to polling the DB itself. The read replica's load barely moved, and it had a P99.9 of 0.2 ms. This simplified our source of truth for job status.
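The pattern is simple enough to sketch (hypothetical table and column names): clients poll the jobs table directly until the job reaches a terminal state, instead of keeping a separate status store in Redis in sync.

```typescript
// Rough sketch: poll the jobs table for a terminal status instead of a Redis
// status store. Table/column names and poll interval are illustrative.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function waitForJob(jobId: string, timeoutMs = 60_000): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const { rows } = await pool.query(
      "SELECT status FROM jobs WHERE id = $1",
      [jobId]
    );
    const status = rows[0]?.status;
    if (status === "completed" || status === "failed") {
      return status;
    }
    await new Promise((resolve) => setTimeout(resolve, 500)); // poll interval
  }
  throw new Error(`Timed out waiting for job ${jobId}`);
}
```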
Finally, despite Metal's insane performance, it doesn't help much when you ship an incredibly bad query or traffic shifts dramatically. The day came when I was paged and we saw our entire database become unresponsive. Normally, I'd be scrambling through application logs in Datadog and running psql commands to figure out which query was hanging.
However, this time I could immediately see on the PlanetScale Insights dashboard exactly which query had spiked in the last 5 minutes. This cut our time to diagnose the problem from hours to just minutes.
Overview of performance, support, and cost
Overall, we came to PlanetScale with the simple hope that performance would improve. What we actually got far exceeded that: incredible query insights tooling, outstanding customer support, and a much more reliable database.
The outcomes were:
- Performance increased by 1.5x, price decreased by 2x.
- Insights helped us reduce our CPU usage by 30% and root-cause outages in minutes instead of hours.
- High availability meant I felt much more confident simply restarting our database during a failure or when a query got stuck, since it would fail over to another replica.
- Self-serve upgrading to extremely large instances without contracts or lead time, which is incredibly important as we're growing rapidly.
- Incredible support and responsiveness for any issues we had.
With PlanetScale, we can move faster than ever, focusing on our core competencies and spending less developer time debugging and optimizing SQL queries.