Middleware saves 85% by replacing Apache Kafka with WarpStream to power AI observability platform

Feb 1, 2024
Richard Artoul


Middleware is a cloud observability company that uses AI to help customers identify, understand, and fix issues across their infrastructure. It’s probably not a surprise to you that their entire business relies on scalable streaming infrastructure: they’re ingesting and processing tens of TiBs of telemetry every day, and need to process it very efficiently.

When Middleware was first built, their engineering team used Apache Kafka® as a scalable queue between their event intake and downstream storage systems.

The initial iteration of this system worked well, but as they onboarded more customers and scaled the platform, they started to run into a number of problems with their Kafka infrastructure. 

Problems with Apache Kafka

Customer workloads were extremely unpredictable: sometimes customer logging volume would increase by 10x for a few minutes and then return to baseline. Adding new Kafka brokers required partition rebalancing, which could take hours, and downscaling afterward was even slower and more difficult than upscaling.
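To make the rebalancing pain concrete: adding a broker to a Kafka cluster doesn't automatically give it any data. Operators have to generate and execute a reassignment plan (for example with Kafka's `kafka-reassign-partitions.sh` tool), which physically copies partition replicas onto the new broker over the network. A plan looks roughly like the sketch below, where broker `4` is the newly added node; the topic name and broker IDs are illustrative, not Middleware's actual configuration:

```json
{
  "version": 1,
  "partitions": [
    { "topic": "telemetry-logs", "partition": 0, "replicas": [1, 2, 4] },
    { "topic": "telemetry-logs", "partition": 1, "replicas": [2, 3, 4] }
  ]
}
```

Every partition moved this way means re-copying its full on-disk data to the new broker, which is why rebalancing can take hours at tens-of-TiB scale, long after a short-lived traffic burst has already passed.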

Middleware soon realized that they couldn’t scale up Kafka fast enough to respond to these bursts. They mitigated this issue by running their Kafka clusters with large, over-provisioned broker VMs. This enabled them to absorb customer bursts, but drove up their cloud infrastructure costs significantly.

Another major contributor to Middleware’s costs was the fact they ran all of their critical infrastructure in three availability zones to ensure their product was reliable. However, Middleware quickly realized this was causing them to spend a huge amount of money on inter-zone networking fees for data replication. In fact, it turned out they were spending more on networking fees than hardware!
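A rough back-of-the-envelope model shows why inter-zone fees can exceed hardware costs. The numbers below are purely illustrative assumptions, not Middleware's actual figures: with replication factor 3 spread across three zones, every byte written to a partition leader is copied to two followers in the other two zones, and a producer or consumer in a random zone talks to a leader in a different zone two-thirds of the time on average.

```python
# Illustrative model of inter-zone transfer for a three-zone Kafka cluster.
# All inputs are hypothetical assumptions, not Middleware's real numbers.

GIB = 1024 ** 3
TIB = 1024 ** 4

ingest_per_day_bytes = 10 * TIB    # assumed daily produce volume
inter_zone_fee_per_gib = 0.02      # assumed $/GiB cross-zone rate (varies by cloud)

# Replication factor 3 across three zones: each byte written to the leader
# is replicated to two followers in the other two zones.
replication_crossings = 2.0

# Producer and consumer each sit in a different zone from the partition
# leader 2 out of 3 times on average.
producer_crossings = 2 / 3
consumer_crossings = 2 / 3

total_crossings = replication_crossings + producer_crossings + consumer_crossings
inter_zone_bytes_per_day = ingest_per_day_bytes * total_crossings
daily_cost = inter_zone_bytes_per_day / GIB * inter_zone_fee_per_gib

print(f"{total_crossings:.2f} inter-zone copies per ingested byte")
print(f"${daily_cost:,.0f}/day in inter-zone transfer fees")
```

Under these assumptions, every ingested byte crosses zone boundaries more than three times, so the transfer bill scales with a multiple of raw ingest volume rather than with ingest volume itself.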

Leveraging WarpStream

Meghraj Choudhary is responsible for data ingestion and storage at Middleware, and his team started using WarpStream to solve their cost and scaling problems with Kafka.

“I got the WarpStream POC running in less than a day. Once I saw the stateless containers auto-scaling in our Kubernetes cluster, I knew there was no going back to Kafka. We basically ran as fast as we could to migrate our production workloads.” – Meghraj Choudhary, Staff Engineer, Middleware
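Because WarpStream agents are stateless, they can be scaled like any ordinary Kubernetes Deployment, for instance with a standard HorizontalPodAutoscaler. The sketch below is a hypothetical illustration of that setup; the deployment name, replica bounds, and CPU target are assumptions, not Middleware's actual manifests:

```yaml
# Hypothetical HPA for a stateless WarpStream agent Deployment.
# Names and thresholds are illustrative, not Middleware's configuration.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: warpstream-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: warpstream-agent
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

The contrast with stateful brokers is the point: scaling a Deployment up or down moves no partition data, so responding to a 10x burst takes seconds rather than hours.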

Middleware saved more than 85% with WarpStream compared to their previous self-hosted Apache Kafka setup. They accomplished this by using instances that were 4x cheaper than before, letting their auto-scaler handle traffic spikes and organic growth, and leveraging WarpStream’s built-in zone awareness functionality to eliminate inter-zone networking fees entirely. In addition, Middleware’s traffic has quadrupled since they adopted WarpStream, but they haven’t had to perform any manual operations. This enabled them to spend more time developing their product instead of wasting countless engineering hours managing, scaling, and rebalancing their Kafka clusters.

“When Meghraj first advocated for WarpStream, I was apprehensive. Obviously I was frustrated with all the problems we were having with our self-hosted Kafka clusters, but we had evaluated other vendors before, and they were all prohibitively expensive. We pride ourselves on being one of the most cost-effective observability providers in the space, but using WarpStream actually saved us money compared to self-hosting, which is crazy when you think about it. Using WarpStream also freed up a bunch of our engineering time to focus on developing features that directly provide value to our customers.” – Laduram Vishnoi, CEO, Middleware
