300,000 GPUs. Multiple regions. Tens of trillions of tokens. Four months of continuous training. North of two billion dollars in compute.

This is Leviathan, a theoretical frontier LLM at the bleeding edge of what current infrastructure can handle.

Every facet of Leviathan is monstrous: 100-terabyte data dumps from its web crawlers, ingested in a matter of hours. Every record validated before it touches the training corpus.
PII scrubbing and dedup running at terabytes per second. Cleaned data served to GPU data loaders at tens of GiB/s without interruption. GPU telemetry fanned out fast enough to catch a NaN gradient before it costs millions.

When Leviathan stalls because the data pipeline can't keep up, 300,000 GPUs sit idle. That's $600,000 per hour in stranded compute.

Broker-based streaming wasn’t built for Leviathan

Kafka ties partitions to disks, disks to brokers, and brokers to each other. Every scaling decision moves data.

Partition Ceiling

Each Kafka broker tops out at a few thousand partitions before metadata overhead, leader elections, and controller bottlenecks degrade the cluster. On AWS MSK, the recommended limit is 6,000 partitions per broker including replication. Leviathan needs 900,000+ partitions for parallelism across 300,000 GPUs. With RF=3, that means 2.7 million partition-replicas. At 6,000 per broker, you need 450 brokers just to hold the partition metadata, before a single byte flows. And each broker you add makes the next problem worse.
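The broker math above works out in a few lines (all numbers are taken directly from this section; the 6,000-per-broker figure is the stated MSK guidance, not a hard limit):

```python
# Back-of-the-envelope broker count for Leviathan on Kafka.
partitions = 900_000            # parallelism target across 300,000 GPUs
replication_factor = 3          # RF=3
replicas_per_broker = 6_000     # AWS MSK recommended ceiling, incl. replication

total_replicas = partitions * replication_factor        # 2,700,000 replicas
brokers_needed = total_replicas // replicas_per_broker  # 450 brokers minimum

print(f"{total_replicas:,} partition-replicas -> {brokers_needed} brokers")
```

And that 450 is just the floor to hold partition metadata; it says nothing about throughput, which would push the broker count higher still.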

Partition Rebalancing

Any broker scaling event triggers a rebalancing storm. At 900,000+ partitions, storms last 60 to 90 minutes. 300,000 GPUs idle the entire time. Adding more brokers makes the next storm worse.

Burst Elasticity

Scaling brokers means hours of data shuffling. A 100TB crawl dump can’t wait for a rebalance to finish. Every hour of pipeline delay is another $600K in GPUs doing nothing.

Disk I/O Ceiling

Fan-out scales linearly with consumer count. Five monitoring systems reading telemetry from the same topics means 5x the disk I/O. The broker’s physical IOPS becomes the ceiling. When alerting falls four hours behind, that’s $2.4M in wasted compute before anyone notices the training run diverged.
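The cost model here is simple enough to write down. A sketch, assuming a 10 GiB/s telemetry write rate purely for illustration (the function names are illustrative, not any real API; the $600K/hour figure is from earlier in this piece):

```python
# Broker-based fan-out: every consumer group re-reads the same bytes
# from the broker's disks, so read I/O scales with group count.
def disk_read_gib_s(write_gib_s: float, consumer_groups: int) -> float:
    """Total disk read throughput when each group reads from disk."""
    return write_gib_s * consumer_groups

# Alerting lag translates directly into idle-GPU dollars.
def idle_cost(hours_behind: float, cost_per_hour: float = 600_000) -> float:
    return hours_behind * cost_per_hour

print(disk_read_gib_s(10, 5))   # five monitoring systems -> 5x disk reads
print(f"${idle_cost(4):,.0f}")  # four hours behind -> $2,400,000 wasted
```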

Multi-Region Fragility


Separate clusters per region plus replication tooling equals offset drift, cascading failures, and operational complexity that scales faster than the cluster. A 3 AM failure in one region blocks the global pipeline, which idles GPUs 8,000 miles away.

The largest AI labs have already hit these walls.

Some built custom consumer proxies, in-house data loaders, and bespoke partition tooling. Others built proprietary systems from scratch. The models still get trained, but the engineering cost is enormous.

WarpStream removes the wall entirely

Same Kafka protocol. Same client libraries. Fundamentally different architecture.

No Partition Ceiling

Partitions are metadata pointers into object storage, not on-disk state tied to a broker. No per-broker limit. Customers are currently running 40,000+ partitions on a single WarpStream cluster. Leviathan's 900,000 partitions are just rows in the control plane.

No Rebalancing

Agents are stateless. No data lives on the agent, so there's nothing to rebalance. Scale from 100 to 500 agents in minutes via Kubernetes HPA. Scale back down when the burst is over. Zero coordination, zero data movement.
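Because agents are stateless, scaling them is an ordinary Kubernetes autoscaling problem. A minimal sketch of what such an HPA policy might look like, assuming the agents run as a Deployment named `warpstream-agent` (the resource names, metric, and thresholds here are illustrative assumptions, not WarpStream's published configuration):

```yaml
# Illustrative only: names and targets are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: warpstream-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: warpstream-agent
  minReplicas: 100
  maxReplicas: 500
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

With no partition leadership or on-disk state to hand off, scale-up is just pods starting and scale-down is just pods stopping.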

Elastic Burst Scaling

Five independent agent groups: ingestion, processing, data serving, telemetry, and evaluation. During a crawl dump burst, the ingestion group triples while the data serving group feeding 300,000 GPUs doesn't flinch. Same topics, same offsets, zero interference.

Fan-out Without Limits

Object storage chunks cached in agent memory, shared across all consumer groups. Five consumers or fifteen, same I/O cost. No disk multiplication. Alerting never falls behind because a sixth monitoring system got added.

Multi-Region Clusters

One logical cluster spanning multiple regions. No MirrorMaker, no offset drift, no separate clusters to manage. 99.999% uptime SLA. Zero data loss (RPO=0). Ripcord mode keeps writing even when the control plane is down. A 3 AM failure in one region doesn't block anything.

Theory Meets Practice

Production WarpStream clusters today sustain 20+ GiB/s burst throughput with automatic elastic scaling. No operator intervention, no rebalancing, no degraded windows. Individual tenants run 200+ agent nodes processing tens of TiB.

One production cluster runs tens of thousands of partitions, over ten thousand topics, and multiple petabytes of retained data without operator intervention. Partition count is metadata. Throughput drives scale.

Multiple frontier AI labs, AI-native dev tools companies, and large-scale ML platforms already run production workloads on WarpStream.
[Chart: autoscaling in action, throughput (GiB/s) and agent count over time]
Leviathan isn’t real just yet, but the constraints it exposes already are. As models grow, the limiting factor is no longer compute; it’s whether the systems feeding that compute can scale without becoming the bottleneck themselves. Kafka hits those limits quickly because its core assumptions stop holding at this scale.

By removing the architectural walls that keep systems like Leviathan theoretical, WarpStream extends the boundary of what's possible, today.