Distributed systems built on object storage share a common problem: removing files that have been logically deleted, whether due to data expiry or compaction. We review the pros and cons of five ways to solve this problem.
Pprof is an amazing tool for debugging memory leaks, but what about when it's not enough? Read about how we used gcore and viewcore to hunt a particularly nasty memory leak in a large distributed system.
Today, we're excited to announce WarpStream Schema Linking, a tool to continuously migrate any Confluent-compatible schema registry into a WarpStream BYOC Schema Registry. WarpStream now has a comprehensive Data Governance suite to handle schema needs.
We’ve released Diagnostics, a new feature for WarpStream clusters! Diagnostics continuously analyzes your clusters to identify potential problems, cost inefficiencies, and opportunities for improvement. It evaluates the health and cost of your cluster and provides detailed explanations of how to fix issues and improve both.
In this blog post, we'll explain how transactions work in Kafka by comparing and contrasting how two different implementations of the Kafka protocol handle them: the official Apache Kafka project and WarpStream.
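For context, whichever backend implements them, Kafka transactions are driven through the same client-facing API. Below is a minimal sketch using the standard Apache Kafka Java producer; the broker address, transactional.id, and topic names are illustrative placeholders, not taken from the post.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address and transactional.id for illustration only.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "example-txn-id");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // Registers the transactional.id with the transaction coordinator and
        // fences any earlier producer instance using the same id.
        producer.initTransactions();
        try {
            producer.beginTransaction();
            // Writes to multiple topics commit or abort atomically; consumers
            // using isolation.level=read_committed never see partial writes.
            producer.send(new ProducerRecord<>("orders", "order-1", "created"));
            producer.send(new ProducerRecord<>("audit-log", "order-1", "created"));
            producer.commitTransaction();
        } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
            // Fatal errors: the producer cannot continue, so close it.
            producer.close();
        } catch (KafkaException e) {
            // Recoverable errors: abort the transaction and retry if desired.
            producer.abortTransaction();
        }
        producer.close();
    }
}
```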
In this post, we’ll look at what noisy neighbors are, the current ways to handle them (cluster quotas and mirroring clusters), and how WarpStream’s solution compares in terms of elasticity, operational simplicity, and cost efficiency.
WarpStream BYOC reimplements the Kafka protocol with a stateless, zero-disk, cloud-native architecture, replacing Kafka brokers with WarpStream Agents to simplify operations. But data streaming extends beyond Kafka clusters.
In this post, I’ll start off with a brief overview of “shared nothing” vs. “shared storage” architectures in general. This discussion will be a bit abstract and high-level, but the goal is to share with you some of the guiding philosophy that ultimately led to WarpStream’s architecture.