WarpStream Tableflow

Materialize Kafka Topics as Iceberg Tables

Tableflow is an Iceberg-native database that materializes tables from any Kafka topic and automates ingestion, compaction, and maintenance. A truly complete real-time data lake solution.

Source Topic
tableflow_json
Table UUID
7766aaf0-16f6-4a9b
Table Path
warpstream/tableflow/
Bucket URL
s3://bucketurl
airflow_dag.json
ERROR: orphan files detected
table_repair.hql
cleanup.yml
WARN: schema drift
manual_compaction.sh
WARN: compaction lagging behind
With Tableflow
Without Tableflow
Tableflow is the easiest, cheapest, and most flexible way to convert Kafka topic data into Iceberg tables with low latency, and keep them compacted.
Auto-Scaling
Built-In Dead-Letter Queue (DLQ)
Automatic Retention
Automatic compactions
Automatic table maintenance
Custom partitioning

Zero Ops,
Zero Access

WarpStream Tableflow follows the same Zero-Access BYOC design principles as WarpStream for Kafka, meaning that your raw data never leaves your environment.

Raw data is processed on your VMs and stored in your object storage buckets; it is never accessible to any third party (including us!).
Plug & Play

Attach to any existing Kafka cluster

WarpStream Tableflow works with any Kafka-compatible source (open-source Kafka, MSK, Confluent Cloud, WarpStream, etc.), and can run in any cloud or even on-premises. Ingest simultaneously from multiple different Kafka clusters to centralize your data in a single lake.
Compatible With
GCP BigQuery
AWS Athena
DuckDB
ClickHouse
AWS Glue
Trino
Give Us A Spec

We’ll Take Care
of the Rest

Define your Iceberg table configuration in a declarative YAML file, then sit back and relax as the WarpStream Agents connect to your Kafka cluster and start creating Iceberg tables to your exact specification.
tables:
  - source_cluster_name: "benchmark"
    source_topic: "example_json_logs_topic"
    source_format: "json"
    schema_mode: "inline"
    schema:
      fields:
        - { name: environment, type: string, id: 1 }
        - { name: service, type: string, id: 2 }
        - { name: status, type: string, id: 3 }
        - { name: message, type: string, id: 4 }
  - source_cluster_name: "benchmark"
    source_topic: "example_avro_events_topic"
    source_format: "avro"
    schema_mode: "inline"
    schema:
      fields:
        - { name: event_id, id: 1, type: string }
        - { name: user_id, id: 2, type: long }
        - { name: session_id, id: 3, type: string }
        - name: profile
          id: 4
          type: struct
          fields:
            - { name: country, id: 5, type: string }
            - { name: language, id: 6, type: string }

Iceberg-Native Database

Tableflow is not just a connector or “zero-copy” version of Kafka tiered storage. It’s a magic, auto-scaling, completely stateless, single-binary database that runs in your environments, connects to your Kafka clusters, and manufactures Iceberg tables to your specification using a declarative YAML configuration. Tableflow is to Iceberg-generating Spark pipelines what WarpStream is to Apache Kafka.
The Iceberg tables created by Tableflow are fully-managed, which means that ingestion, compaction, table maintenance, and all other operations are handled by WarpStream automatically. In addition, Tableflow allows you to configure custom sorting and partitioning schemes for your data, enabling faster queries and lower costs. 
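As a rough illustration of what a custom partitioning scheme could look like in the declarative format, here is a sketch that extends the table definition shown above. Note that the `partition_by` key and the transform syntax here are illustrative assumptions for the sake of the example, not confirmed Tableflow configuration options; Iceberg itself supports partition transforms such as `day(...)` and identity partitioning on a column.

```yaml
# Illustrative sketch only: "partition_by" and its transform syntax
# are assumed key names, not documented Tableflow options.
tables:
  - source_cluster_name: "benchmark"
    source_topic: "example_json_logs_topic"
    source_format: "json"
    schema_mode: "inline"
    schema:
      fields:
        - { name: environment, type: string, id: 1 }
        - { name: service, type: string, id: 2 }
        - { name: ts, type: timestamp, id: 3 }
    # Hypothetical: partition by day of the event timestamp, then by service,
    # so queries filtered on time ranges and service scan fewer files.
    partition_by:
      - "day(ts)"
      - "service"
```

In Iceberg terms, partitioning like this prunes data files at query planning time, which is where the "faster queries and lower costs" claim comes from.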
Comparison
Apache Spark
Built-in / “Zero Copy” / “Tiered Storage”
Connector-based solutions
WarpStream Tableflow
Auto-Scaling
Built-In DLQ
Automatically Enforces Retention
Automatically manages compactions and table maintenance
Custom partitioning
Compatible with any Kafka-compatible source
Ingest from multiple different Kafka clusters at the same time

FAQs

Don't see an answer to your question? Check our docs, or contact us directly.

Does WarpStream Tableflow require using WarpStream topics as the source?

No, Tableflow is vendor agnostic and can be used with any Kafka-compatible topic source (like open-source Kafka, MSK, Redpanda, Confluent, etc.). You do not need to use WarpStream as your source system to leverage Tableflow. In fact, you can use multiple source systems or clusters to ingest data into Tableflow.
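For example, ingesting from two different clusters is just two entries in the same `tables` list, reusing the table-definition keys from the configuration example earlier on this page. The cluster names, topic names, and fields below are placeholders:

```yaml
# Sketch: one Tableflow configuration fed by two different Kafka clusters.
# Cluster/topic names and schemas are placeholders for illustration.
tables:
  - source_cluster_name: "msk-prod"            # e.g. an AWS MSK cluster
    source_topic: "orders_json"
    source_format: "json"
    schema_mode: "inline"
    schema:
      fields:
        - { name: order_id, type: string, id: 1 }
        - { name: amount, type: long, id: 2 }
  - source_cluster_name: "confluent-analytics" # e.g. a Confluent Cloud cluster
    source_topic: "clickstream_avro"
    source_format: "avro"
    schema_mode: "inline"
    schema:
      fields:
        - { name: user_id, type: string, id: 1 }
        - { name: url, type: string, id: 2 }
```

Both topics land as Iceberg tables in the same lake, so downstream query engines see one catalog regardless of which cluster produced the data.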