How Data Index Works

Data Index is a read-only query service that captures and normalizes workflow execution events from Quarkus Flow applications.

High-Level Architecture

Events flow from the workflow application through log collection into one of two storage backends, and are then served through the GraphQL API:

Quarkus Flow App
    ↓ (structured logging → stdout)
FluentBit DaemonSet
    ↓ (tail container logs)
Storage Backend
    ├─ PostgreSQL (triggers) → < 1ms normalization
    └─ Elasticsearch (transforms) → ~1s normalization
    ↓
GraphQL API

Storage Backends

Data Index provides two production-ready storage options:

Backend         Best For                                                     Architecture
PostgreSQL      < 50K workflows/day, ACID transactions, simple deployment    PostgreSQL Mode Details
Elasticsearch   100K+ workflows/day, full-text search, analytics             Elasticsearch Mode Details

Choose based on your requirements:

  • Need ACID transactions? → PostgreSQL

  • Need full-text search? → Elasticsearch

  • High throughput (>50K/day)? → Elasticsearch

  • Simple deployment? → PostgreSQL
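
The checklist above can be written as a small decision helper. This is only an illustrative sketch: the function name, parameter names, and the "undecided" result for conflicting requirements are assumptions, not part of Data Index. The 50,000 workflows/day threshold comes from the checklist.

```python
def recommend_backend(needs_acid=False, needs_full_text=False,
                      workflows_per_day=0, wants_simple_deployment=False):
    """Return (backend, reasons) following the checklist; hypothetical helper."""
    postgres_reasons = []
    elastic_reasons = []
    if needs_acid:
        postgres_reasons.append("ACID transactions")
    if wants_simple_deployment:
        postgres_reasons.append("simple deployment")
    if needs_full_text:
        elastic_reasons.append("full-text search")
    if workflows_per_day > 50_000:
        elastic_reasons.append("high throughput")

    if postgres_reasons and not elastic_reasons:
        return "PostgreSQL", postgres_reasons
    if elastic_reasons and not postgres_reasons:
        return "Elasticsearch", elastic_reasons
    # Conflicting or missing requirements: the checklist alone cannot decide.
    return "undecided", postgres_reasons + elastic_reasons


print(recommend_backend(needs_acid=True))
print(recommend_backend(workflows_per_day=120_000))
print(recommend_backend(needs_acid=True, needs_full_text=True))
```

When requirements point in both directions (for example ACID transactions and full-text search), the sketch deliberately returns "undecided" rather than guessing.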

Key Components

1. Quarkus Flow Applications

Applications with structured logging enabled write JSON events to stdout:

{"instanceId":"01KQ...", "eventType":"io.serverlessworkflow.workflow.started.v1", "timestamp":1777298089.549604, ...}

Critical configuration:

  • quarkus.flow.structured-logging.enabled=true

  • quarkus.flow.structured-logging.timestamp-format=epoch-seconds

  • Console handler for JSON output
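
To show what a consumer sees with the epoch-seconds timestamp format, the following Python sketch parses one event line and converts the timestamp to a UTC datetime. The instanceId value is a placeholder and the event carries only the three fields visible in the sample above; real events contain more.

```python
import json
from datetime import datetime, timezone

# Sample line shaped like the event above; the instanceId is a placeholder.
line = (
    '{"instanceId": "example-instance-id", '
    '"eventType": "io.serverlessworkflow.workflow.started.v1", '
    '"timestamp": 1777298089.549604}'
)


def parse_event(raw: str) -> dict:
    """Parse one JSON event line and attach a timezone-aware datetime."""
    event = json.loads(raw)
    # epoch-seconds: seconds since 1970-01-01 UTC; the fraction is sub-second precision.
    event["time"] = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    return event


event = parse_event(line)
print(event["eventType"], event["time"].isoformat())
```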

2. FluentBit DaemonSet

FluentBit runs on each Kubernetes node and:

  • Tails /var/log/containers/*_workflows_*.log (pods in the workflows namespace)

  • Filters lines to extract only JSON events (not regular app logs)

  • Parses JSON and extracts fields

  • Forwards to storage backend (PostgreSQL or Elasticsearch)
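
The filtering step can be illustrated in Python. This is not the FluentBit configuration, only a sketch of the idea: keep lines that parse as JSON objects and carry the fields seen in the sample event, and drop everything else. The required-key set is an assumption based on that sample, and the sketch assumes any container-runtime prefix has already been stripped from each line.

```python
import json

# Assumption: these three keys, taken from the sample event, identify a workflow event.
REQUIRED_KEYS = {"instanceId", "eventType", "timestamp"}


def extract_events(lines):
    """Return only the lines that are JSON workflow events, parsed into dicts."""
    events = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # plain-text application log
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # looks like JSON but is malformed
        if isinstance(record, dict) and REQUIRED_KEYS.issubset(record):
            events.append(record)
    return events


sample_lines = [
    "2026-04-27 INFO Starting application",
    '{"instanceId": "example-instance-id", '
    '"eventType": "io.serverlessworkflow.workflow.started.v1", '
    '"timestamp": 1777298089.549604}',
    '{"level": "INFO", "message": "JSON app log, not a workflow event"}',
    "{broken json",
]
events = extract_events(sample_lines)
print(len(events), events[0]["eventType"])
```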

Configuration:

  • fluent-bit.conf - Input, filter, output configuration

  • flatten-event.lua - Lua script to flatten nested JSON

  • Deployed via ConfigMap + DaemonSet
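
Flattening nested JSON means turning nested objects into top-level keys so that each value maps to a single column or field. The sketch below shows the idea in Python rather than Lua. The underscore separator and the nested "workflow" object are assumptions for illustration; the actual flatten-event.lua may use different names and rules.

```python
def flatten(record, parent_key="", sep="_"):
    """Recursively flatten nested dicts: {"a": {"b": 1}} becomes {"a_b": 1}."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical nested event; only instanceId and eventType appear in the sample above.
nested_event = {
    "instanceId": "example-instance-id",
    "eventType": "io.serverlessworkflow.workflow.started.v1",
    "workflow": {"name": "order", "version": "1.0"},
}
flat_event = flatten(nested_event)
print(flat_event)
```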

3. Storage Backend

PostgreSQL Mode

  • Raw tables - Store complete events in JSON format

  • Real-time normalization - Events normalized in less than 1ms

  • Normalized tables - Optimized for querying
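
The raw-table-plus-trigger pattern can be tried locally with Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here, and the table names, column names, trigger body, and the "completed" event type are all assumptions for illustration; the real schema is described in the PostgreSQL mode documentation. The trigger keeps only the newest event per instance, so duplicate and out-of-order inserts leave the normalized row unchanged.

```python
import json
import sqlite3

STARTED = "io.serverlessworkflow.workflow.started.v1"
COMPLETED = "io.serverlessworkflow.workflow.completed.v1"  # assumed event type name

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_events (            -- hypothetical raw table
    id          INTEGER PRIMARY KEY,
    instance_id TEXT NOT NULL,
    event_type  TEXT NOT NULL,
    ts          REAL NOT NULL,
    payload     TEXT NOT NULL        -- complete event as JSON text
);
CREATE TABLE workflow_instances (    -- hypothetical normalized table
    instance_id     TEXT PRIMARY KEY,
    last_event_type TEXT,
    last_event_ts   REAL
);
CREATE TRIGGER normalize_event AFTER INSERT ON raw_events
BEGIN
    INSERT OR IGNORE INTO workflow_instances (instance_id)
    VALUES (NEW.instance_id);
    UPDATE workflow_instances
       SET last_event_type = NEW.event_type,
           last_event_ts   = NEW.ts
     WHERE instance_id = NEW.instance_id
       AND (last_event_ts IS NULL OR last_event_ts <= NEW.ts);
END;
""")


def store_raw(instance_id, event_type, ts):
    """Insert one raw event; the trigger updates the normalized table."""
    payload = json.dumps(
        {"instanceId": instance_id, "eventType": event_type, "timestamp": ts}
    )
    conn.execute(
        "INSERT INTO raw_events (instance_id, event_type, ts, payload) "
        "VALUES (?, ?, ?, ?)",
        (instance_id, event_type, ts, payload),
    )


# Events arrive out of order, and the started event arrives twice.
store_raw("example-instance-id", COMPLETED, 200.0)
store_raw("example-instance-id", STARTED, 100.0)
store_raw("example-instance-id", STARTED, 100.0)

raw_count = conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
normalized = conn.execute(
    "SELECT instance_id, last_event_type, last_event_ts FROM workflow_instances"
).fetchall()
print(raw_count, normalized)
```

All three raw events are preserved, while the normalized table holds a single row reflecting the latest event by timestamp.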

See PostgreSQL Mode Architecture for details.

Elasticsearch Mode

  • Raw indices - Store complete events

  • Automated transforms - Extract fields and write to normalized indices (~1s)

  • Normalized indices - Optimized for querying and full-text search

4. Data Index Service

Quarkus application providing:

  • GraphQL API - Query workflow instances and task executions

  • Storage adapter - JPA for PostgreSQL, Elasticsearch client for Elasticsearch

  • SmallRye GraphQL - GraphQL schema and resolvers

  • Health checks - Liveness and readiness probes

Event Lifecycle

  1. Workflow executes in Quarkus Flow app

  2. Structured logging writes JSON event to stdout

  3. Kubernetes captures stdout to /var/log/containers/POD_NAME.log

  4. FluentBit tails log file, extracts JSON events

  5. Storage backend receives and normalizes events:

    • PostgreSQL: Events normalized in real-time (< 1ms)

    • Elasticsearch: Events normalized asynchronously (~1s)

  6. Data Index queries normalized data via storage adapter

  7. GraphQL returns data to user

Event Processing Time

Metric                PostgreSQL      Elasticsearch
Normalization         < 1ms           ~1s
End-to-end            5-10 seconds    5-10 seconds
Collection interval   1 second        1 second

Key Design Features

Real-Time Processing

Events are normalized immediately as they arrive:

  • No separate event processor service required

  • No polling or batch processing

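  • Normalization takes about one second or less once an event reaches the storage backend (end-to-end latency, including log collection, is 5-10 seconds)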

  • Handles duplicates and out-of-order events automatically

Flexible Data Storage

Workflow input and output data are stored as JSON:

  • Flexible - Any workflow schema supported

  • Queryable - Can filter by JSON fields when needed

  • GraphQL - Exposed as JSON strings for client parsing
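
Because input and output arrive as JSON strings, a client decodes them after receiving the GraphQL response. The sketch below uses a hand-built response; the field names ("instances", "id", "input", "output") and the payload contents are hypothetical and do not reflect the actual Data Index schema.

```python
import json

# Hypothetical GraphQL response; field names are illustrative only.
response = {
    "data": {
        "instances": [
            {
                "id": "example-instance-id",
                "input": json.dumps({"orderId": 42}),  # JSON string, as exposed by the API
                "output": None,                        # workflow has not produced output yet
            }
        ]
    }
}


def decode_json_field(value):
    """Decode a JSON-string field, passing through missing values."""
    return None if value is None else json.loads(value)


instance = response["data"]["instances"][0]
workflow_input = decode_json_field(instance["input"])
workflow_output = decode_json_field(instance["output"])
print(workflow_input["orderId"], workflow_output)
```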

Two-Stage Storage

Events are stored in both raw and normalized formats:

  • Raw storage - Preserve original events for debugging and audit

  • Normalized storage - Optimized for fast querying

  • Replay capability - Can reprocess raw events if needed
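
Replay can be pictured as folding the raw events back into normalized state. The sketch below is a simplified in-memory illustration, not the Data Index implementation: the normalized shape (lastEventType, lastEventTime, eventCount), the de-duplication key, and the "completed" event type are assumptions. Sorting by timestamp and skipping repeated events makes the result independent of arrival order.

```python
def replay(raw_events):
    """Rebuild per-instance state from raw events, ignoring duplicates and order."""
    state = {}
    seen = set()
    for event in sorted(raw_events, key=lambda e: e["timestamp"]):
        # Assumed de-duplication key: same instance, type, and timestamp.
        key = (event["instanceId"], event["eventType"], event["timestamp"])
        if key in seen:
            continue
        seen.add(key)
        current = state.setdefault(event["instanceId"], {"eventCount": 0})
        current["lastEventType"] = event["eventType"]
        current["lastEventTime"] = event["timestamp"]
        current["eventCount"] += 1
    return state


started = {
    "instanceId": "example-instance-id",
    "eventType": "io.serverlessworkflow.workflow.started.v1",
    "timestamp": 100.0,
}
completed = {
    "instanceId": "example-instance-id",
    "eventType": "io.serverlessworkflow.workflow.completed.v1",  # assumed event type name
    "timestamp": 200.0,
}

in_order = replay([started, completed])
shuffled_with_duplicate = replay([completed, started, started])
print(in_order == shuffled_with_duplicate, in_order)
```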

What Data Index Does NOT Do

Data Index is read-only. It does NOT:

  • ❌ Execute workflows

  • ❌ Modify workflow state

  • ❌ Provide workflow management (start/stop/retry)

  • ❌ Store workflow definitions

  • ❌ Require a separate event processor service

Choosing a Storage Backend

Requirement          PostgreSQL                 Elasticsearch
ACID transactions    ✅ Yes                     ❌ Eventual consistency
Real-time (<1ms)     ✅ Yes                     ⚠️ ~1s
Full-text search     ⚠️ Limited                 ✅ Excellent
Throughput           < 50K workflows/day        100K+ workflows/day
Complexity           ⭐⭐ Medium                 ⭐⭐⭐ Higher
Scaling              Vertical (single writer)   Horizontal (distributed)

Next Steps