PostgreSQL Mode Architecture
Data Index with PostgreSQL storage uses trigger-based normalization for real-time event processing.
Architecture Diagram
```
Quarkus Flow App
↓ (structured logging → stdout → /var/log/containers/)
FluentBit DaemonSet (tail input)
↓ (INSERT via pgsql output)
PostgreSQL Raw Tables (JSONB)
↓ (BEFORE INSERT triggers)
PostgreSQL Normalized Tables
↓ (JPA/Hibernate)
GraphQL API
```
Data Flow
- Quarkus Flow emits events: structured JSON written to stdout
- Kubernetes captures logs: /var/log/containers/POD_NAME_NAMESPACE_CONTAINER-ID.log
- FluentBit collects: tails the log files and filters for JSON events
- INSERT to raw tables: workflow_events_raw, task_events_raw (JSONB columns)
- Triggers fire immediately: extract fields from the JSONB payload
- UPSERT to normalized tables: workflow_instances, task_instances
- GraphQL queries: served via JPA entities
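The insert, trigger, and upsert steps above can be pictured with a small in-memory model. This is a hypothetical Python sketch, not Data Index code: the RawTable class, its before_insert hook list, and the event keys (eventType, instanceId, status) are illustrative assumptions. What it shows is that normalization runs synchronously inside the raw insert, the way a BEFORE INSERT trigger does, so there is no separate consumer process to deploy.

```python
import json
from typing import Callable, Dict, List


class RawTable:
    """Hypothetical stand-in for a raw JSONB table with BEFORE INSERT triggers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: List[dict] = []
        self.before_insert: List[Callable[[dict], None]] = []

    def insert(self, payload: str) -> None:
        row = json.loads(payload)  # the JSONB column holds the parsed document
        # Every trigger runs before the row is stored, inside the same call,
        # so an exception in a trigger also prevents the raw insert.
        for trigger in self.before_insert:
            trigger(row)
        self.rows.append(row)


# Hypothetical normalized table: one entry per workflow instance.
workflow_instances: Dict[str, dict] = {}


def normalize_workflow(row: dict) -> None:
    # Upsert keyed by instance id: the latest status overwrites the previous one.
    instance = workflow_instances.setdefault(row["instanceId"], {"id": row["instanceId"]})
    instance["status"] = row["status"]


workflow_events_raw = RawTable("workflow_events_raw")
workflow_events_raw.before_insert.append(normalize_workflow)

# The log shipper inserts raw events one at a time.
workflow_events_raw.insert('{"eventType": "workflow.started", "instanceId": "wf-1", "status": "RUNNING"}')
workflow_events_raw.insert('{"eventType": "workflow.completed", "instanceId": "wf-1", "status": "COMPLETED"}')

# The query layer reads the normalized row directly.
print(workflow_instances["wf-1"])
```

Both raw events are kept, while the normalized table holds a single, up-to-date row per instance.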
Key Characteristics
| Characteristic | Details |
|---|---|
| Latency | < 1 ms for normalization, 5-10 s end-to-end |
| Consistency | ACID transactions, guaranteed consistency |
| Throughput | < 50K workflows/day (PostgreSQL write limit) |
| Complexity | Simple: no separate event processor service |
| Search | Limited: basic filtering, no full-text search |
| Status | ✅ Production Ready |
Real-Time Normalization
Events are normalized in real time as they arrive:
- FluentBit writes raw events: complete events are stored as JSON for debugging
- Events are normalized automatically: fields are extracted and stored in tables optimized for querying
- Immediate availability: data is ready for querying in less than 1 ms
Benefits:
- Real-time processing (< 1 ms normalization latency)
- No separate event processor service
- ACID transaction guarantees
- Duplicates and out-of-order events handled automatically
Trade-offs:
- PostgreSQL-specific implementation
- Lower throughput than Elasticsearch mode (< 50K workflows/day)
- Schema changes require database updates
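The extraction step of real-time normalization can be sketched in Python. This is not the trigger code, which is not shown here: the payload keys (instanceId, workflowName, status, startTime, endTime) and the output column names are hypothetical, and the sketch assumes event timestamps are epoch seconds, in line with the timestamp-format=epoch-seconds setting in the Configuration section.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def to_utc(epoch_seconds: Optional[float]) -> Optional[datetime]:
    # Assumption: timestamps arrive as epoch seconds (timestamp-format=epoch-seconds).
    if epoch_seconds is None:
        return None
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def extract_workflow_row(raw_json: str) -> dict:
    """Map one raw event (hypothetical keys) to a normalized workflow row."""
    event = json.loads(raw_json)
    return {
        "id": event["instanceId"],            # required: the upsert key
        "name": event.get("workflowName"),    # optional fields become None if absent
        "status": event.get("status"),
        "start_time": to_utc(event.get("startTime")),
        "end_time": to_utc(event.get("endTime")),
    }


row = extract_workflow_row(
    '{"instanceId": "wf-1", "workflowName": "order", "status": "RUNNING", "startTime": 1700000000}'
)
print(row["start_time"].isoformat())  # 2023-11-14T22:13:20+00:00
```

Treating missing keys as None rather than as errors is one way to accept partial events, such as a start event that carries no end time yet.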
Raw Event Storage
Raw events are preserved in their original JSON format:
- workflow_events_raw: all workflow-related events
- task_events_raw: all task-related events

Benefits:
- Debugging: original events are preserved for troubleshooting
- Replay: events can be reprocessed if the normalization logic changes
- Audit: a complete event history is maintained
- Flexibility: any event structure is accepted without schema changes
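Replay works because the raw rows are the source of truth and the normalized rows are derived from them. A minimal Python sketch, assuming raw events are replayed in insertion order; normalize_v1, normalize_v2, and the event keys are hypothetical stand-ins for two versions of the normalization logic.

```python
from typing import Callable, Dict, Iterable

Row = Dict[str, object]


def replay(raw_events: Iterable[Row], normalize: Callable[[Dict[str, Row], Row], None]) -> Dict[str, Row]:
    """Rebuild a normalized table from scratch by re-applying `normalize` to every raw event."""
    table: Dict[str, Row] = {}
    for event in raw_events:
        normalize(table, event)
    return table


# Hypothetical raw events, in the order they were inserted.
raw = [
    {"instanceId": "wf-1", "status": "RUNNING", "workflowName": "order"},
    {"instanceId": "wf-1", "status": "COMPLETED", "workflowName": "order"},
    {"instanceId": "wf-2", "status": "RUNNING", "workflowName": "billing"},
]


def normalize_v1(table: Dict[str, Row], event: Row) -> None:
    # Version 1 of the (hypothetical) logic: status only.
    table.setdefault(event["instanceId"], {})["status"] = event["status"]


def normalize_v2(table: Dict[str, Row], event: Row) -> None:
    # Version 2 also extracts the workflow name; replay backfills it for old events.
    row = table.setdefault(event["instanceId"], {})
    row["status"] = event["status"]
    row["name"] = event["workflowName"]


print(replay(raw, normalize_v1))
print(replay(raw, normalize_v2))
```

Rebuilding from an empty table, rather than patching existing rows, yields exactly what the new logic would have produced had it been in place from the start.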
Normalized Tables
Events are automatically normalized into tables optimized for querying:
- workflow_instances: one row per workflow execution
- task_instances: one row per task execution

How it works:
- Fields are extracted automatically from raw events
- Duplicates are handled transparently
- Out-of-order events are resolved correctly
- Immutable fields (such as start time) are preserved from the first event
- Mutable fields (such as status and output) are updated from the latest event
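The duplicate and out-of-order rules above can be expressed as a small merge function. This is a hedged Python sketch, not the actual trigger logic: it assumes every event carries its own timestamp, treats start_time as the immutable field and status/output as mutable ones, and uses a hypothetical last_event_ts column to decide whether an incoming event is newer than what is already stored.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, Optional


@dataclass(frozen=True)
class Event:
    instance_id: str
    timestamp: float                  # hypothetical event time in epoch seconds
    status: str
    start_time: Optional[float] = None
    output: Optional[str] = None


def upsert(table: Dict[str, dict], e: Event) -> None:
    row = table.get(e.instance_id)
    if row is None:
        # First event seen for this instance: insert a new row.
        table[e.instance_id] = {
            "start_time": e.start_time,
            "status": e.status,
            "output": e.output,
            "last_event_ts": e.timestamp,
        }
        return
    # Immutable field: keep the stored value; fill it only while it is still empty.
    if row["start_time"] is None:
        row["start_time"] = e.start_time
    # Mutable fields: apply only if this event is not older than the last one applied.
    # A duplicate carries the same timestamp and values, so re-applying it changes nothing.
    if e.timestamp >= row["last_event_ts"]:
        row["status"] = e.status
        row["output"] = e.output
        row["last_event_ts"] = e.timestamp


started = Event("wf-1", timestamp=1.0, status="RUNNING", start_time=1.0)
completed = Event("wf-1", timestamp=5.0, status="COMPLETED", output="done")

results = []
for order in permutations([started, completed, started]):  # includes a duplicate
    table = {}
    for event in order:
        upsert(table, event)
    results.append(table["wf-1"])

# Every arrival order converges to the same row.
assert all(r == results[0] for r in results)
print(results[0])
```

Two distinct events with identical timestamps would still be applied in arrival order; a real implementation would need some tie-breaker for that case.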
Configuration
Quarkus Flow Application
```properties
# Structured logging
quarkus.flow.structured-logging.enabled=true
quarkus.flow.structured-logging.timestamp-format=epoch-seconds
```
FluentBit
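```ini
[INPUT]
    # Container log files are named <pod>_<namespace>_<container>-<id>.log,
    # so this pattern selects containers running in the "workflows" namespace
    Name    tail
    Path    /var/log/containers/*_workflows_*.log
    Parser  docker

[FILTER]
    # Keep only records whose log field contains a JSON object mentioning eventType
    Name    grep
    Match   *
    Regex   log {".*eventType.*}

[OUTPUT]
    # Write the matching records to the raw events table
    Name      pgsql
    Match     *
    Host      postgresql
    Database  dataindex
    Table     workflow_events_raw
```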
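With Parser docker, each line in a container log file is expected to be a Docker json-file record, and the application's stdout line ends up in its log key; the grep filter then tests that key against the regular expression. The Python sketch below approximates that two-step check. It uses Python's re module in place of Fluent Bit's own regex engine, and the sample log lines are made up, so treat it as an illustration of the filter's intent rather than an exact reproduction.

```python
import json
import re

# Pattern copied from the [FILTER] section; Python's re treats the leading "{" as a literal.
EVENT_PATTERN = re.compile(r'{".*eventType.*}')


def passes_grep(docker_line: str) -> bool:
    """Approximate `Parser docker` followed by `Regex log ...` for one log-file line."""
    record = json.loads(docker_line)  # Docker json-file record
    return EVENT_PATTERN.search(record.get("log", "")) is not None


# Hypothetical lines: one structured event and one ordinary log message.
event_line = json.dumps({
    "log": '{"eventType": "workflow.started", "instanceId": "wf-1"}\n',
    "stream": "stdout",
    "time": "2024-01-01T00:00:00Z",
})
plain_line = json.dumps({
    "log": "INFO Starting Quarkus application\n",
    "stream": "stdout",
    "time": "2024-01-01T00:00:00Z",
})

print(passes_grep(event_line), passes_grep(plain_line))  # True False
```

Only the structured event reaches the pgsql output; ordinary application log lines are dropped by the filter.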
Data Index Service
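```properties
quarkus.datasource.db-kind=postgresql
quarkus.datasource.jdbc.url=jdbc:postgresql://postgresql:5432/dataindex
# Hibernate must not create or alter tables; the schema is managed outside the application
quarkus.hibernate-orm.database.generation=none
```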
Schema initialization is performed manually in production using SQL migration scripts from the …
Scaling Considerations
PostgreSQL mode scales well for moderate workloads:
Vertical scaling:
- Increase PostgreSQL instance size
- More CPU/memory for trigger processing
- SSD storage for write performance

Horizontal scaling:
- Read replicas for GraphQL queries
- Connection pooling in the Data Index service
- Multiple Data Index instances (stateless)

Limitations:
- Single PostgreSQL writer (trigger processing cannot be distributed across nodes)
- ~50K workflows/day practical limit
- For higher throughput, consider Elasticsearch mode
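For capacity planning it can help to convert the daily figure into a rate. The arithmetic below uses only the ~50K workflows/day limit stated above; E, the average number of raw events written per workflow, is a placeholder that depends on how many tasks each workflow runs. A daily average also hides peaks, which can be several times higher than the average rate.

```latex
\frac{50\,000\ \text{workflows/day}}{86\,400\ \text{s/day}} \approx 0.58\ \text{workflows/s},
\qquad
\text{raw inserts/s} \approx 0.58 \times E
```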