The Original Sin of Cloud Infrastructure

Richard Artoul
March 14, 2024
Many of today's most widely adopted open source “big data” infrastructure projects, like Cassandra, Kafka, and Hadoop, follow a common story. A large company, startup or otherwise, faces a unique, high-scale infrastructure challenge that's poorly supported by existing tools. They create an internal solution for their specific needs, and then later (kindly) open source it for the greater community to use. Now, even smaller startups can benefit from the work and expertise of these seasoned engineering teams. Great, right?

Deterministic Simulation Testing for Our Entire SaaS

Richard Artoul
March 12, 2024
How we leverage Antithesis to deterministically simulate our entire SaaS platform and verify its correctness, all the way from signup to running entire Kafka workloads.

Kafka as a KV Store: deduplicating millions of keys with just 128 MiB of RAM

Manu Cupcic
March 4, 2024
A huge part of building a drop-in replacement for Apache Kafka® was implementing support for compacted topics. The primary difference between a “regular” topic in Kafka and a “compacted” topic is that Kafka will asynchronously delete records from compacted topics that are not the latest record for a specific key within a given partition.

Anatomy of a serverless usage based billing system

Richard Artoul
February 8, 2024
Serverless products and usage-based billing models go hand in hand, almost by definition. A product that is truly serverless effectively has to have usage-based pricing; otherwise it’s not really serverless!

S3 Express is All You Need

Richard Artoul
November 28, 2023
The future of modern data infrastructure is object storage.

Unlocking Idempotency with Retroactive Tombstones

Richard Artoul
November 18, 2023
How we separated data from metadata to build support for idempotent producers in our Apache Kafka protocol layer.

Minimizing S3 API Costs with Distributed mmap

Richard Artoul
October 9, 2023
We first introduced WarpStream in our blog post "Kafka is Dead, Long Live Kafka". To summarize: WarpStream is a Kafka protocol-compatible data streaming system built directly on top of object storage.

Hacking the Kafka PRoTocOL

Richard Artoul
September 18, 2023
How we built stateless load balancing into a protocol that was never designed for it.