How Data Index Works
Data Index is a read-only query service that captures and normalizes workflow execution events from Quarkus Flow applications.
High-Level Architecture
Data Index supports two storage backends, each with different characteristics:
Quarkus Flow App
↓ (structured logging → stdout)
FluentBit DaemonSet
↓ (tail container logs)
Storage Backend
├─ PostgreSQL (triggers) → < 1ms normalization
└─ Elasticsearch (transforms) → ~1s normalization
↓
GraphQL API
Storage Backends
Data Index provides two production-ready storage options:
| Backend | Best For | Architecture |
|---|---|---|
| PostgreSQL | < 50K workflows/day, ACID transactions, simple deployment | Triggers (< 1ms normalization) |
| Elasticsearch | 100K+ workflows/day, full-text search, analytics | Transforms (~1s normalization) |
Choose based on your requirements:
- Need ACID transactions? → PostgreSQL
- Need full-text search? → Elasticsearch
- High throughput (>50K/day)? → Elasticsearch
- Simple deployment? → PostgreSQL
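The checklist above can be expressed as a small helper. This is only an illustrative sketch: the function name `suggest_backend` and its parameters are hypothetical and not part of Data Index. It returns every backend the checklist points to, so a result with two entries signals conflicting requirements.

```python
def suggest_backend(needs_acid=False, needs_full_text=False,
                    workflows_per_day=0, simple_deployment=False):
    """Return the backends suggested by the checklist, sorted by name.

    Hypothetical helper: an empty list means no requirement was given,
    two entries mean the requirements pull in different directions.
    """
    suggestions = set()
    if needs_acid or simple_deployment:
        suggestions.add("PostgreSQL")
    # The checklist treats more than 50K workflows/day as high throughput.
    if needs_full_text or workflows_per_day > 50_000:
        suggestions.add("Elasticsearch")
    return sorted(suggestions)


acid_only = suggest_backend(needs_acid=True)
high_volume = suggest_backend(workflows_per_day=120_000)
conflicting = suggest_backend(needs_acid=True, needs_full_text=True)
```

When the result contains both backends, the trade-offs have to be weighed manually rather than decided by a rule.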
Key Components
1. Quarkus Flow Applications
Applications with structured logging enabled write JSON events to stdout:
{"instanceId":"01KQ...", "eventType":"io.serverlessworkflow.workflow.started.v1", "timestamp":1777298089.549604, ...}
Critical configuration:
- quarkus.flow.structured-logging.enabled=true
- quarkus.flow.structured-logging.timestamp-format=epoch-seconds
- Console handler for JSON output
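To make the event format concrete, the following sketch parses one event line and converts its epoch-seconds timestamp to a UTC datetime. The `instanceId` value is a made-up placeholder and `parse_event` is a hypothetical helper; only the three fields shown in the example above are assumed to exist.

```python
import json
from datetime import datetime, timezone

# Hypothetical event line: the instanceId is a placeholder, not a real ID.
line = (
    '{"instanceId":"example-instance-1",'
    '"eventType":"io.serverlessworkflow.workflow.started.v1",'
    '"timestamp":1777298089.549604}'
)


def parse_event(raw_line):
    """Parse one JSON event line into (instance_id, event_type, occurred_at)."""
    event = json.loads(raw_line)
    # epoch-seconds: seconds since 1970-01-01 UTC, with a fractional part
    # carrying the sub-second precision.
    occurred_at = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    return event["instanceId"], event["eventType"], occurred_at


instance_id, event_type, occurred_at = parse_event(line)
```

Keeping the timestamp as a plain number means downstream consumers do not have to agree on a date string format.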
2. FluentBit DaemonSet
FluentBit runs on each Kubernetes node and:
- Tails /var/log/containers/workflows.log (pods in the workflows namespace)
- Filters lines to extract only JSON events (not regular app logs)
- Parses JSON and extracts fields
- Forwards to storage backend (PostgreSQL or Elasticsearch)
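The filtering step can be pictured with a short Python sketch. It is not the FluentBit configuration itself, which this page does not show. It assumes a workflow event is a JSON object carrying `instanceId`, `eventType`, and `timestamp`; the helper name `extract_event` and the sample log lines are hypothetical.

```python
import json

# Assumption: these three keys identify a workflow event.
REQUIRED_KEYS = {"instanceId", "eventType", "timestamp"}


def extract_event(log_line):
    """Return the parsed event dict, or None for regular application logs."""
    text = log_line.strip()
    if not text.startswith("{"):
        return None
    try:
        candidate = json.loads(text)
    except json.JSONDecodeError:
        return None
    if isinstance(candidate, dict) and REQUIRED_KEYS.issubset(candidate):
        return candidate
    return None


lines = [
    "2026-01-01 12:00:00 INFO Starting application",
    '{"instanceId":"example-instance-1",'
    '"eventType":"io.serverlessworkflow.workflow.started.v1",'
    '"timestamp":1777298089.549604}',
    '{"level":"INFO","message":"JSON log line that is not a workflow event"}',
]
events = [event for event in map(extract_event, lines) if event is not None]
```

Checking for the required keys, rather than only for valid JSON, keeps JSON-formatted application logs out of the event stream.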
Configuration:
- fluent-bit.conf - Input, filter, output configuration
- flatten-event.lua - Lua script to flatten nested JSON
- Deployed via ConfigMap + DaemonSet
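The contents of flatten-event.lua are not reproduced here. As a rough illustration of what flattening nested JSON means, the Python sketch below joins nested keys with an underscore. The separator, the `flatten` helper, and the `workflow` field names are assumptions made for this example only.

```python
def flatten(value, prefix="", sep="_"):
    """Flatten nested dicts into one level, joining key paths with `sep`."""
    flat = {}
    for key, item in value.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(item, dict):
            flat.update(flatten(item, name, sep))
        else:
            flat[name] = item
    return flat


# Hypothetical nested event: the field names are illustrative only.
nested = {
    "instanceId": "example-instance-1",
    "workflow": {"name": "order", "version": "1.0"},
}
flat = flatten(nested)
```

A flat record maps directly onto table columns or index fields, which is why this step happens before the event reaches the storage backend.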
3. Storage Backend
PostgreSQL Mode
- Raw tables - Store complete events in JSON format
- Real-time normalization - Events normalized in less than 1ms
- Normalized tables - Optimized for querying
See PostgreSQL Mode Architecture for details.
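The raw-table-plus-trigger pattern can be tried locally with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so the SQL dialect differs from the real deployment. The table names, column names, and the `example.started` / `example.completed` event types are hypothetical; the actual schema is described in the PostgreSQL Mode Architecture page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_events (
    instance_id TEXT NOT NULL,
    event_type  TEXT NOT NULL,
    event_time  REAL NOT NULL,
    payload     TEXT NOT NULL          -- complete event as JSON text
);
CREATE TABLE workflow_instances (
    instance_id     TEXT PRIMARY KEY,
    last_event_type TEXT,
    last_event_time REAL
);
-- Runs as part of the same statement that inserts into raw_events.
CREATE TRIGGER normalize_event AFTER INSERT ON raw_events
BEGIN
    INSERT OR IGNORE INTO workflow_instances (instance_id)
    VALUES (NEW.instance_id);
    -- Only move forward in time, so late or repeated events are harmless.
    UPDATE workflow_instances
       SET last_event_type = NEW.event_type,
           last_event_time = NEW.event_time
     WHERE instance_id = NEW.instance_id
       AND (last_event_time IS NULL OR last_event_time <= NEW.event_time);
END;
""")


def store(instance_id, event_type, event_time, payload="{}"):
    """Insert one raw event; the trigger updates the normalized table."""
    with conn:
        conn.execute(
            "INSERT INTO raw_events VALUES (?, ?, ?, ?)",
            (instance_id, event_type, event_time, payload),
        )


store("example-instance-1", "example.completed", 20.0)
store("example-instance-1", "example.started", 10.0)    # arrives out of order
store("example-instance-1", "example.completed", 20.0)  # duplicate delivery

rows = conn.execute(
    "SELECT instance_id, last_event_type, last_event_time FROM workflow_instances"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
```

Because the trigger fires with the insert, the normalized row is consistent with the raw table as soon as the write commits, and no separate processor has to run.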
Elasticsearch Mode
- Raw indices - Store complete events
- Automated transforms - Extract fields and write to normalized indices (~1s)
- Normalized indices - Optimized for querying and full-text search
See Elasticsearch Mode Architecture for details.
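The asynchronous behaviour of this mode can be modelled in a few lines of plain Python. This is a timing model only: real transforms are configured inside Elasticsearch, and the in-memory `raw_index` / `normalized_index` structures, the `run_transform` function, and the `example.*` event types are all hypothetical.

```python
raw_index = []          # stand-in for the raw index
normalized_index = {}   # stand-in for the normalized index, keyed by instanceId


def index_raw(event):
    """Store the complete event; the normalized view is not touched yet."""
    raw_index.append(event)


def run_transform(checkpoint):
    """Process raw documents added since `checkpoint`; return the new checkpoint."""
    for event in raw_index[checkpoint:]:
        current = normalized_index.get(event["instanceId"])
        # Keep only the latest event per instance.
        if current is None or current["timestamp"] <= event["timestamp"]:
            normalized_index[event["instanceId"]] = {
                "instanceId": event["instanceId"],
                "lastEventType": event["eventType"],
                "timestamp": event["timestamp"],
            }
    return len(raw_index)


index_raw({"instanceId": "example-instance-1",
           "eventType": "example.started", "timestamp": 10.0})
visible_before = dict(normalized_index)   # nothing normalized yet
checkpoint = run_transform(0)

index_raw({"instanceId": "example-instance-1",
           "eventType": "example.completed", "timestamp": 20.0})
stale = normalized_index["example-instance-1"]["lastEventType"]
checkpoint = run_transform(checkpoint)
fresh = normalized_index["example-instance-1"]["lastEventType"]
```

Between `index_raw` and the next `run_transform` call, queries see the previous state, which is the eventual consistency this mode trades for throughput.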
Event Lifecycle
1. Workflow executes in Quarkus Flow app
2. Structured logging writes JSON event to stdout
3. Kubernetes captures stdout to /var/log/containers/POD_NAME.log
4. FluentBit tails log file, extracts JSON events
5. Storage backend receives and normalizes events:
   - PostgreSQL: Events normalized in real-time (< 1ms)
   - Elasticsearch: Events normalized asynchronously (~1s)
6. Data Index queries normalized data via storage adapter
7. GraphQL returns data to user
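The last step of the lifecycle is a GraphQL query from the client. The schema is not shown on this page, so the sketch below only illustrates the usual shape of a GraphQL-over-HTTP request body: the `workflowInstance` field, its `id` and `status` sub-fields, and the `build_graphql_request` helper are hypothetical names, not the actual Data Index API.

```python
import json


def build_graphql_request(instance_id):
    """Build the JSON body for a GraphQL POST request (hypothetical schema)."""
    query = """
    query ($id: String!) {
      workflowInstance(id: $id) {
        id
        status
      }
    }
    """
    # By convention, GraphQL over HTTP carries the query text and its
    # variables as separate members of one JSON object.
    return json.dumps({"query": query, "variables": {"id": instance_id}})


body = build_graphql_request("example-instance-1")
decoded = json.loads(body)
```

Passing the instance ID as a variable, rather than formatting it into the query string, avoids escaping problems.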
Event Processing Time
| Metric | PostgreSQL | Elasticsearch |
|---|---|---|
| Normalization | < 1ms | ~1s |
| End-to-end | 5-10 seconds | 5-10 seconds |
| Collection interval | 1 second | 1 second |
Key Design Features
Real-Time Processing
Events are normalized immediately as they arrive:
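- No separate event processor service required
- No polling or batch processing
- Normalization completes within about a second of an event reaching the storage backend (end-to-end latency, including log collection, is 5-10 seconds)
- Handles duplicates and out-of-order events automatically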
What Data Index Does NOT Do
Data Index is read-only.
Choosing a Storage Backend
| Requirement | PostgreSQL | Elasticsearch |
|---|---|---|
| ACID transactions | ✅ Yes | ❌ Eventual consistency |
| Real-time (<1ms) | ✅ Yes | ⚠️ ~1s |
| Full-text search | ⚠️ Limited | ✅ Excellent |
| Throughput | < 50K workflows/day | 100K+ workflows/day |
| Complexity | ⭐⭐ Medium | ⭐⭐⭐ Higher |
| Scaling | Vertical (single writer) | Horizontal (distributed) |
Next Steps
- PostgreSQL Mode Details - Real-time normalization
- Elasticsearch Mode Details - High-throughput processing
- Deployment Guide - Choose and deploy