PostgreSQL Mode Architecture

Data Index with PostgreSQL storage uses trigger-based normalization for real-time event processing.

Architecture Diagram

Quarkus Flow App
    ↓ (structured logging → stdout)
FluentBit DaemonSet
    ↓ (tail /var/log/containers/)
PostgreSQL Raw Tables (JSONB)
    ↓ (BEFORE INSERT triggers)
PostgreSQL Normalized Tables
    ↓ (JPA/Hibernate)
GraphQL API

Data Flow

  1. Quarkus Flow emits events - JSON to stdout

  2. Kubernetes captures logs - /var/log/containers/POD_NAME.log

  3. FluentBit collects - Tails log files, filters JSON events

  4. INSERT to raw tables - workflow_events_raw, task_events_raw (JSONB columns)

  5. Triggers fire immediately - Extract fields from JSONB

  6. UPSERT to normalized tables - workflow_instances, task_instances

  7. GraphQL queries - Via JPA entities
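Steps 4-6 above can be sketched as a PostgreSQL trigger on the raw table. The table names (`workflow_events_raw`, `workflow_instances`) come from this document; the function name and the column names (`payload`, `instance_id`, `state`, `last_update`) are illustrative assumptions, not the actual schema:

```sql
-- Sketch only: normalize each raw event as it arrives.
-- Column/function names are assumptions; table names are from the doc.
CREATE OR REPLACE FUNCTION extract_workflow_event() RETURNS trigger AS $$
BEGIN
    INSERT INTO workflow_instances (instance_id, state, last_update)
    VALUES (NEW.payload->>'id',
            NEW.payload->>'state',
            to_timestamp((NEW.payload->>'timestamp')::bigint))
    ON CONFLICT (instance_id) DO UPDATE
        SET state       = EXCLUDED.state,
            last_update = EXCLUDED.last_update;
    RETURN NEW;  -- the raw row is still stored for debugging/replay
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER workflow_events_raw_normalize
    BEFORE INSERT ON workflow_events_raw
    FOR EACH ROW EXECUTE FUNCTION extract_workflow_event();
```

Because the trigger runs inside the same INSERT transaction, the raw write and the normalized upsert commit (or roll back) together, which is where the ACID guarantee comes from.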

Key Characteristics

Characteristic   Details
Latency          < 1ms for normalization, 5-10s end-to-end
Consistency      ACID transactions, guaranteed consistency
Throughput       < 50K workflows/day (PostgreSQL write limit)
Complexity       Simple - no separate event processor service
Search           Limited - basic filtering, no full-text search
Status           ✅ Production Ready

Real-Time Normalization

Events are normalized in real-time as they arrive:

  1. FluentBit writes raw events - Complete events stored as JSON for debugging

  2. Events normalized automatically - Fields extracted and stored in optimized tables

  3. Immediate availability - Data ready for querying in less than 1ms

Benefits:

  • Real-time processing (<1ms latency)

  • No separate event processor service

  • ACID transaction guarantees

  • Handles duplicates and out-of-order events automatically

Trade-offs:

  • PostgreSQL-specific implementation

  • Limited throughput vs. Elasticsearch mode (< 50K workflows/day)

  • Schema changes require database updates

Raw Event Storage

Raw events are preserved in their original JSON format:

  • workflow_events_raw - All workflow-related events

  • task_events_raw - All task-related events
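A raw table in this design needs little more than a JSONB column; the exact columns below are an illustrative sketch, not the real schema:

```sql
-- Sketch of a raw event table; actual column names may differ.
CREATE TABLE workflow_events_raw (
    id         BIGSERIAL    PRIMARY KEY,
    payload    JSONB        NOT NULL,        -- complete event as emitted
    created_at TIMESTAMPTZ  NOT NULL DEFAULT now()
);
-- task_events_raw follows the same shape.
```

Keeping the payload as untyped JSONB is what lets the table accept any event structure without schema changes.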

Benefits:

  • Debugging - Original events preserved for troubleshooting

  • Replay - Can reprocess if normalization logic changes

  • Audit - Complete event history maintained

  • Flexibility - Accepts any event structure without schema changes

Normalized Tables

Events are automatically normalized to optimized tables for querying:

  • workflow_instances - One row per workflow execution

  • task_instances - One row per task execution

How it works:

  • Fields extracted automatically from raw events

  • Duplicates handled transparently

  • Out-of-order events resolved correctly

  • Immutable fields (like start time) preserved from first event

  • Mutable fields (like status, output) updated from latest event
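The immutable/mutable split and the out-of-order handling described above map naturally onto a single UPSERT. The column names (`start_time`, `state`, `output`, `last_event_ts`) are assumptions used for illustration:

```sql
-- Sketch only: column names are assumptions.
-- start_time is set once (first event wins); state/output track the
-- latest event; the WHERE guard discards duplicates and stale events.
INSERT INTO workflow_instances (instance_id, start_time, state, output, last_event_ts)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (instance_id) DO UPDATE
    SET state         = EXCLUDED.state,
        output        = EXCLUDED.output,
        last_event_ts = EXCLUDED.last_event_ts
    WHERE workflow_instances.last_event_ts <= EXCLUDED.last_event_ts;
```

Note that `start_time` is deliberately absent from the `DO UPDATE` clause, so the value from the first event is preserved even if a later event carries a different one.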

Configuration

Quarkus Flow Application

# Structured logging
quarkus.flow.structured-logging.enabled=true
quarkus.flow.structured-logging.timestamp-format=epoch-seconds

FluentBit

[INPUT]
    Name              tail
    Path              /var/log/containers/*_workflows_*.log
    Parser            docker

[FILTER]
    Name              grep
    Match             *
    Regex             log \{.*"eventType".*\}

[OUTPUT]
    Name              pgsql
    Match             *
    Host              postgresql
    Database          dataindex
    Table             workflow_events_raw

Data Index Service

quarkus.datasource.db-kind=postgresql
quarkus.datasource.jdbc.url=jdbc:postgresql://postgresql:5432/dataindex
quarkus.hibernate-orm.database.generation=none

Schema initialization is performed manually in production using SQL migration scripts from the data-index-storage-migrations module. Development mode can optionally use Flyway for automatic migrations.
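If Flyway is used in dev mode, the opt-in might look like the following profile-scoped properties. This is a sketch assuming the migration scripts are on the classpath under the default `db/migration` location; adjust to wherever the data-index-storage-migrations scripts actually land:

```properties
# Dev mode only (illustrative): apply migrations automatically at startup.
%dev.quarkus.flyway.migrate-at-start=true
%dev.quarkus.flyway.locations=classpath:db/migration
```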

Scaling Considerations

PostgreSQL mode scales well for moderate workloads:

Vertical scaling:

  • Increase PostgreSQL instance size

  • More CPU/memory for trigger processing

  • SSD storage for write performance

Horizontal scaling:

  • Read replicas for GraphQL queries

  • Connection pooling in Data Index service

  • Multiple Data Index instances (stateless)

Limitations:

  • Single PostgreSQL writer (triggers can't be distributed)

  • ~50K workflows/day practical limit

  • For higher throughput, consider Elasticsearch mode