v5.1 — Real-time ML Inference GA

Turn big data
into insight

Data pipelines, real-time dashboards, ML models: a unified big data platform. Kafka, Spark, dbt, and Snowflake in a single orchestration layer. Processes 8B+ events/day.

8.2B Events / day
340 Active pipelines
99.97% Pipeline SLA
pipeline.dataflow.io/jobs
Events / day: 8.2B, real-time
Pipelines: 340, 99.97% SLA
ML Models: 52 in production
-- DATAFLOW pipeline definition
CREATE PIPELINE churn_prediction AS
  SOURCE kafka.events
  TRANSFORM feature_store.enrich
  SINK ml.inference_v2;
→ Status: RUNNING · lag: 0.3s · rows: 12.4M/hr
ETL User events → Snowflake — healthy
ML Fraud scoring model — AUC 0.97
DASH Executive KPI board — live
Trusted by data-driven enterprises
Coupang Baemin Toss Naver Kakao Samsung LG U+ Woowa Brothers

Everything in the modern data stack

From ingestion to ML inference — end-to-end data platform for analytics & engineering teams.

🔄
Pipelines · Core

Unified Data Pipelines

Kafka, Spark, Flink, dbt — visual pipeline builder + SQL. Auto-scaling, schema evolution, dead letter queues.

📊
Dashboards

Real-time Dashboards

Sub-second query on billions of rows. Drag-and-drop charts, embedded analytics, scheduled reports.

🤖
ML

ML Model Serving

Feature store, training pipelines, A/B testing, real-time inference. Supports scikit-learn, XGBoost, and PyTorch.

🗄️
Lakehouse

Lakehouse Architecture

S3/ADLS + Delta Lake/Iceberg. ACID transactions, time travel, schema enforcement on raw data.

🔍
Observability

Data Observability

Pipeline health, data quality checks, anomaly detection, lineage tracking. Alert on schema drift.

🔐
Governance

Data Governance

Column-level access, PII masking, audit logs, data catalog. GDPR and Personal Information Protection Act (PIPA) compliance.

Build a modern data stack in 4 steps

Connect, pipeline, visualize, predict: average onboarding takes 1 week, with the first dashboard in 48 hours.

Connect Sources

Kafka, PostgreSQL, S3, APIs — 150+ connectors. CDC, batch, streaming ingest.

Build Pipelines

ETL/ELT with the visual builder or SQL. dbt integration, schema registry, data quality rules.

Create Dashboards

Real-time charts, KPI boards, embedded widgets. Slack/Email scheduled delivery.

Deploy ML

Feature engineering → training → inference. Model registry, A/B test, auto-retrain.

DATAFLOW data engineering
8.2B events processed daily

Data engineering,
10x faster

Founded in Seoul in 2019, DATAFLOW tackles data silos and pipeline complexity. Lakehouse + real-time + ML: a platform that lets data teams focus on insight instead of infrastructure.

Apache Kafka Apache Spark dbt Snowflake Delta Lake Airflow Feature Store SOC 2

Data impact our customers have achieved

E-commerce, fintech, logistics: companies that make data-driven decisions with DATAFLOW.

E-commerce · Enterprise

Real-time inventory 99.9% accuracy

An 8B events/day pipeline keeps inventory, orders, and deliveries in real-time sync. Stockouts down 73%.

99.9% Accuracy
73% Less stockout
8B Events/day
Fintech · Series B

Fraud detection AUC 0.97

Real-time ML inference on the transaction stream. False positives down 60%, detection in 200ms.

0.97 AUC score
200ms Detection
60% Less FP
Logistics · Growth

Route optimization ₩4.2B saved

GPS + weather + traffic ML pipeline. Delivery costs cut by 18%, ETA accuracy 94%.

₩4.2B Annual savings
18% Cost down
94% ETA accuracy

Data platform modules: deep dive

Pipelines, Dashboards, ML, Governance: the core features of each module.

Data Pipeline Orchestration

Visual and code-based pipeline builder with auto-scaling compute.

  • 150+ source connectors
  • Kafka/Spark/Flink native
  • dbt project integration
  • Schema registry & CDC
  • Dead letter queues
  • Auto-retry & alerting
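
As a rough illustration of how auto-retry and a dead letter queue fit together, here is a minimal Python sketch. It is not DATAFLOW's API: `RetryingStage`, `DeadLetter`, and `parse_amount` are hypothetical names invented for this example, and the in-memory list stands in for a real dead letter queue.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class DeadLetter:
    """A record that could not be processed, kept for later inspection."""
    record: Any
    error: str
    attempts: int


@dataclass
class RetryingStage:
    """Hypothetical pipeline stage: retry a transform, then dead-letter the record."""
    transform: Callable[[Any], Any]
    max_attempts: int = 3
    dead_letters: List[DeadLetter] = field(default_factory=list)

    def process(self, records):
        results = []
        for record in records:
            last_error = ""
            for _attempt in range(self.max_attempts):
                try:
                    results.append(self.transform(record))
                    break  # success: stop retrying this record
                except Exception as exc:
                    last_error = repr(exc)
            else:
                # Every attempt failed: park the record instead of failing the batch.
                self.dead_letters.append(
                    DeadLetter(record, last_error, self.max_attempts)
                )
        return results


def parse_amount(event):
    # Hypothetical transform: raises ValueError on malformed amounts.
    return {"user": event["user"], "amount": float(event["amount"])}


stage = RetryingStage(parse_amount, max_attempts=2)
good = stage.process([
    {"user": "a", "amount": "10.5"},
    {"user": "b", "amount": "not-a-number"},
])
```

In this sketch the healthy record lands in `good`, while the malformed one ends up in `stage.dead_letters` after two attempts, so one bad event does not block the rest of the batch. A production setup would typically add backoff between attempts and write dead letters to a durable topic or table.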

Real-time Analytics

Sub-second queries on petabyte-scale data with embedded analytics.

  • Drag-and-drop builder
  • SQL editor + saved queries
  • Embedded iframe widgets
  • Scheduled Slack/Email
  • Row-level security
  • Mobile-responsive

ML Platform

End-to-end ML lifecycle from feature engineering to production inference.

  • Feature store
  • Training pipelines
  • Model registry
  • Real-time inference
  • A/B testing framework
  • Auto-retrain triggers

Data Governance

Catalog, lineage, quality, and access control for enterprise compliance.

  • Data catalog & lineage
  • PII auto-detection
  • Column-level ACL
  • Quality scorecards
  • Audit logs
  • GDPR/PIPA (Personal Information Protection Act)
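
To make column-level ACL and PII masking more concrete, the following Python sketch shows one way the two could interact. It is an assumption-laden illustration, not the product's implementation: the `COLUMN_ACL` policy, the role names, and the `read_row` and `mask_email` helpers are all hypothetical.

```python
import re

# Simple email detector used as a stand-in for PII auto-detection.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical policy: which roles may read each column unmasked.
COLUMN_ACL = {
    "email": {"compliance"},
    "order_total": {"analyst", "compliance"},
}


def mask_email(value):
    """Keep the first character and the domain, hide the rest."""
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain


def read_row(row, role):
    """Return the row as the given role is allowed to see it."""
    visible = {}
    for column, value in row.items():
        if role in COLUMN_ACL.get(column, set()):
            visible[column] = value  # role has full access
        elif isinstance(value, str) and EMAIL_RE.fullmatch(value):
            visible[column] = mask_email(value)  # PII: show masked form
        else:
            visible[column] = None  # no access to this column
    return visible


row = {"email": "jane@example.com", "order_total": 42000}
analyst_view = read_row(row, "analyst")
compliance_view = read_row(row, "compliance")
guest_view = read_row(row, "guest")
```

Here an analyst sees the order total but only a masked email, the compliance role sees the row unchanged, and an unknown role sees the masked email and nothing else. Real PII detection covers far more than email addresses; the regular expression is only a placeholder.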

Connect directly to your data stack

150+ connectors, dbt, Airflow, REST API: plugs into your existing data infrastructure.

📨 Apache Kafka
Apache Spark
🔧 dbt
❄️ Snowflake
🐘 PostgreSQL
☁️ AWS S3
🔷 BigQuery
🌊 Apache Flink
💬 Slack
🐙 GitHub
📊 Tableau
🔐 Okta SSO

Lakehouse + real-time + ML

Unified data platform — ingestion, transformation, analytics, ML inference in one stack.

  • Stream Processing: Kafka + Flink native. 8B+ events/day, sub-second latency.
  • Lakehouse: Delta Lake/Iceberg on S3. ACID, time travel, schema evolution.
  • Feature Store: Online + offline features. Guarantees training/inference consistency.
  • Cost Optimizer: Auto-scaling compute, spot instances. Cuts cloud bills by 40%.

Compute-based data pricing

14-day free trial. Pay for compute + storage or committed enterprise.

Monthly · Annual (Save 20%)

Starter

Small data teams & PoC.

₩0/month

Free tier · 1M events/mo

Get Started
  • 3 pipelines
  • 5 dashboards
  • 1M events / month
  • 7-day retention
  • ML inference
  • SSO

Enterprise

Petabyte-scale with dedicated infra.

Custom

committed compute · annual

Contact Sales
  • Unlimited events
  • Dedicated cluster
  • Custom ML models
  • VPC / on-premise
  • 99.97% SLA + CSM
  • SOC 2 reports

What data teams say about DATAFLOW

Data engineers, analysts, ML engineers — 4.8/5 on G2 Data Tools.

★★★★★
"8B events/day on a single platform. The Kafka + Spark + dbt integration is a game changer."
JK
김지훈, Head of Data · ShopStream
★★★★★
"Fraud ML inference in 200ms. The feature store delivers flawless training/serving consistency."
SL
Sarah Lee, ML Lead · RiskGuard
★★★★★
"The pipeline builder is intuitive. Data engineer onboarding went from 1 week to 2 days."
PM
박민수, Data Eng · LogiPrime
★★★★★
"The cost optimizer cut our cloud bill by 40%. It's the FinOps team's favorite feature."
EC
Emily Chen, VP Data · MegaMart

Data Governance & Security

Lineage, access control, and compliance, all essential to an enterprise data platform, come built in.

📊
Data Lineage: end-to-end tracking across ETL → warehouse → dashboard
🔐
RBAC + ABAC: fine-grained access at the column and row level
🛡️
SOC 2 Type II: encryption at rest/in transit · audit logs
📋
GDPR / PIPA: PII masking · automated retention policies

Data catalog, quality scoring, anomaly alerts: manage the reliability and regulatory compliance of your big data pipelines in a single governance layer.

Frequently Asked Questions

Contact us at support@dataflow.io or through the Slack community.

150+ connectors, including Kafka, PostgreSQL, MySQL, MongoDB, S3, BigQuery, Snowflake, and REST APIs. CDC, batch, and streaming are all supported.
Flink-based sub-second latency. Customer deployments reach up to 8.2B events/day. Auto-scaling handles traffic spikes.
Feature store → training pipeline → model registry → real-time inference endpoint. Supports scikit-learn, XGBoost, PyTorch, and ONNX. A/B testing is built in.
dbt project import, Airflow DAG trigger, schema registry sync. Keep your existing workflows as they are and add the DATAFLOW orchestration layer on top.
Column-level ACL, PII auto-masking, audit logs, data lineage. SOC 2 Type II, GDPR, and PIPA compliance.
The Enterprise plan offers Kubernetes on-premise or a dedicated VPC cluster. Hybrid cloud (on-prem ingest + cloud analytics) is supported.

Build your modern data stack today

14-day free trial · 1M events free · No credit card

Start Free Trial →

Data Platform Consultation

Find out in a 30-minute meeting whether DATAFLOW fits your data infrastructure.

🎯
Architecture Review: analysis of your current data stack & migration plan
📅
PoC Pipeline: 1-week PoC with a dedicated data engineer assigned
🌏
Global Regions: Seoul · Tokyo · US-West data centers