PostgreSQL Deployment
Deploy Data Index with PostgreSQL storage backend for production workloads.
Overview
PostgreSQL mode uses trigger-based normalization for real-time event processing with ACID guarantees.
Architecture:
Quarkus Flow Apps → Container Logs → FluentBit DaemonSet
↓
PostgreSQL (raw tables)
↓ (triggers < 1ms)
PostgreSQL (normalized tables)
↓
Data Index GraphQL API
Characteristics:
- Latency: < 1ms normalization, 5-10s end-to-end
- Throughput: < 50K workflows/day
- Consistency: ACID transactions
- Complexity: Medium (simpler than Elasticsearch)
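Conceptually, each insert into a raw events table fires a trigger that upserts the corresponding row in a normalized table. The following is an illustrative sketch only; the actual functions and columns are defined by the migration scripts in Step 2:

-- Illustrative sketch: real function and column names come from the migration scripts
CREATE OR REPLACE FUNCTION normalize_workflow_event() RETURNS trigger AS $$
BEGIN
  -- Assumed raw columns (workflow_id, workflow_name, status) for illustration
  INSERT INTO workflow_instances (id, name, status)
  VALUES (NEW.workflow_id, NEW.workflow_name, NEW.status)
  ON CONFLICT (id) DO UPDATE SET status = EXCLUDED.status;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER normalize_on_insert
  AFTER INSERT ON workflow_events_raw
  FOR EACH ROW EXECUTE FUNCTION normalize_workflow_event();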
Prerequisites
Infrastructure:
- Kubernetes cluster 1.24+
- PostgreSQL 13+ server or managed service
- Persistent storage for PostgreSQL data
- Helm 3.x (optional, for PostgreSQL installation)
Namespaces:
kubectl create namespace data-index
kubectl create namespace logging
kubectl create namespace workflows
kubectl create namespace postgresql # If deploying PostgreSQL to cluster
Step 1: Deploy PostgreSQL
Choose one of these options:
Option A: Managed PostgreSQL (Recommended for Production)
Use a managed PostgreSQL service:
- AWS: RDS for PostgreSQL
- GCP: Cloud SQL for PostgreSQL
- Azure: Azure Database for PostgreSQL
- Other: DigitalOcean, Heroku, Aiven, etc.
Requirements:
- PostgreSQL 13+
- Database: dataindex
- User with CREATE, INSERT, UPDATE, DELETE, SELECT permissions
- Network accessible from Kubernetes cluster
Create database:
CREATE DATABASE dataindex;
CREATE USER dataindex WITH PASSWORD 'your-secure-password';
GRANT ALL PRIVILEGES ON DATABASE dataindex TO dataindex;
Connection details:
Host: your-postgres-host.region.provider.com
Port: 5432
Database: dataindex
User: dataindex
Password: your-secure-password
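Optionally verify connectivity from inside the cluster with a throwaway client pod (host, password, and image tag below are placeholders):
kubectl run psql-client --rm -it --restart=Never \
  --image=postgres:16 \
  --env=PGPASSWORD=your-secure-password -- \
  psql -h your-postgres-host.region.provider.com -p 5432 -U dataindex -d dataindex \
  -c 'SELECT version();'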
Option B: PostgreSQL in Kubernetes (Development/Testing)
Deploy PostgreSQL using Bitnami Helm chart:
# Add Bitnami repository
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
# Install PostgreSQL
helm upgrade --install postgresql bitnami/postgresql \
--namespace postgresql \
--set auth.username=dataindex \
--set auth.password=dataindex123 \
--set auth.database=dataindex \
--set primary.persistence.size=10Gi \
--set primary.service.type=NodePort \
--set primary.service.nodePorts.postgresql=30432 \
--wait
Verify:
kubectl get pods -n postgresql
kubectl logs -n postgresql postgresql-0
# Test connection
kubectl exec -n postgresql postgresql-0 -- \
env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex -c '\l'
Connection details (from within cluster):
Host: postgresql.postgresql.svc.cluster.local
Port: 5432
Database: dataindex
User: dataindex
Password: dataindex123
Step 2: Initialize Database Schema
For production deployments, you must manually execute the database migration scripts.
Migration scripts location:
The SQL migration scripts are located in the data-index-storage-migrations module:
data-index/data-index-storage-migrations/src/main/resources/db/migration/
├── V1__initial_schema.sql
└── (future migration scripts...)
Execute migrations:
# Download migration scripts from the repository
# (or extract from data-index-storage-migrations JAR)
# Execute each migration in order (piping the local file into psql inside the pod)
kubectl exec -i -n postgresql postgresql-0 -- \
env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex \
< /path/to/V1__initial_schema.sql
What gets created:
- Raw tables: workflow_events_raw, task_events_raw
- Normalized tables: workflow_instances, task_instances
- Trigger functions: normalize_workflow_event(), normalize_task_event()
- Indexes for performance
Verify schema:
kubectl exec -n postgresql postgresql-0 -- \
env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex -c '\dt'
# Should show:
# - workflow_events_raw
# - task_events_raw
# - workflow_instances
# - task_instances
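You can also confirm the trigger functions listed above exist:
kubectl exec -n postgresql postgresql-0 -- \
env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex -c '\df normalize_*'
# Should list normalize_workflow_event and normalize_task_event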
Development mode only: when running locally in development mode, the schema migrations are applied automatically at startup; manual execution is only required for production deployments.
Step 3: Deploy Data Index Service
Create Configuration Secret
kubectl create secret generic data-index-config \
--namespace data-index \
--from-literal=jdbc-url='jdbc:postgresql://postgresql.postgresql.svc.cluster.local:5432/dataindex' \
--from-literal=username='dataindex' \
--from-literal=password='dataindex123'
For managed PostgreSQL, replace the JDBC URL with your external host.
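For example, a secret for a managed instance might look like this (hypothetical host; append ?sslmode=require only if your provider enforces TLS):
kubectl create secret generic data-index-config \
  --namespace data-index \
  --from-literal=jdbc-url='jdbc:postgresql://your-postgres-host.region.provider.com:5432/dataindex?sslmode=require' \
  --from-literal=username='dataindex' \
  --from-literal=password='your-secure-password'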
Deploy Data Index
Create data-index-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: data-index-service
namespace: data-index
labels:
app: data-index-service
spec:
replicas: 2 # For high availability
selector:
matchLabels:
app: data-index-service
template:
metadata:
labels:
app: data-index-service
spec:
containers:
- name: data-index
image: kubesmarts/data-index-service:1.0.0 # Replace with your version
ports:
- containerPort: 8080
name: http
env:
- name: QUARKUS_DATASOURCE_JDBC_URL
valueFrom:
secretKeyRef:
name: data-index-config
key: jdbc-url
- name: QUARKUS_DATASOURCE_USERNAME
valueFrom:
secretKeyRef:
name: data-index-config
key: username
- name: QUARKUS_DATASOURCE_PASSWORD
valueFrom:
secretKeyRef:
name: data-index-config
key: password
- name: QUARKUS_DATASOURCE_DB_KIND
value: "postgresql"
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "1Gi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /q/health/live
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /q/health/ready
port: 8080
initialDelaySeconds: 10
periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
name: data-index-service
namespace: data-index
spec:
selector:
app: data-index-service
ports:
- name: http
port: 80
targetPort: 8080
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: data-index
namespace: data-index
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
rules:
- host: data-index.your-domain.com # Replace with your domain
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: data-index-service
port:
number: 80
Apply:
kubectl apply -f data-index-deployment.yaml
Verify:
kubectl get pods -n data-index
kubectl logs -n data-index -l app=data-index-service
# Test GraphQL endpoint
kubectl port-forward -n data-index svc/data-index-service 8080:80
curl http://localhost:8080/graphql \
-H "Content-Type: application/json" \
-d '{"query":"{ __schema { types { name } } }"}'
Step 4: Deploy FluentBit
FluentBit collects workflow events from container logs and sends them to PostgreSQL.
Generate ConfigMap
Use the helper script:
cd data-index/scripts/fluentbit
./generate-configmap.sh
This generates postgresql/kubernetes/configmap.yaml from source files:
- fluent-bit.conf
- parsers.conf
- flatten-event.lua
Create PostgreSQL Credentials Secret
kubectl create secret generic fluentbit-postgres \
--namespace logging \
--from-literal=host='postgresql.postgresql.svc.cluster.local' \
--from-literal=port='5432' \
--from-literal=database='dataindex' \
--from-literal=user='dataindex' \
--from-literal=password='dataindex123'
For managed PostgreSQL, use your external host.
Deploy FluentBit
kubectl apply -f postgresql/kubernetes/configmap.yaml
kubectl apply -f postgresql/kubernetes/daemonset.yaml
Or use helper script:
./deploy-fluentbit.sh postgresql
Verify:
kubectl get pods -n logging
kubectl logs -n logging -l app=workflows-fluent-bit --tail=50
# Check FluentBit is tailing logs
kubectl logs -n logging -l app=workflows-fluent-bit | grep "inotify_fs_add"
# Check for errors
kubectl logs -n logging -l app=workflows-fluent-bit | grep -i error
Step 5: Deploy Quarkus Flow Applications
Deploy your Quarkus Flow applications with structured logging enabled.
Application Configuration
application.properties:
# Structured logging
quarkus.flow.structured-logging.enabled=true
quarkus.flow.structured-logging.timestamp-format=epoch-seconds
# Console handler for JSON output
quarkus.log.handler.console."FLOW_EVENTS_CONSOLE".enabled=true
quarkus.log.handler.console."FLOW_EVENTS_CONSOLE".format=%s%n
# Route structured logging to console
quarkus.log.category."io.quarkiverse.flow.structuredlogging".handlers=FLOW_EVENTS_CONSOLE
quarkus.log.category."io.quarkiverse.flow.structuredlogging".use-parent-handlers=false
quarkus.log.category."io.quarkiverse.flow.structuredlogging".level=INFO
Deploy to workflows Namespace
FluentBit is configured to tail logs from the workflows namespace. Deploy your applications to this namespace, or update FluentBit's input configuration to match.
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-workflow-app
namespace: workflows # Important: must match FluentBit config
spec:
template:
spec:
containers:
- name: app
image: my-workflow-app:1.0.0
# ... rest of config
Step 6: Testing
Execute a Workflow
# Port-forward your workflow app
kubectl port-forward -n workflows svc/my-workflow-app 8082:8080
# Execute workflow
curl -X POST http://localhost:8082/my-workflow \
-H "Content-Type: application/json" \
-d '{"input": "test"}'
Verify Event Flow
1. Check container logs have JSON events:
kubectl logs -n workflows <workflow-pod> | grep eventType
2. Check FluentBit collected events:
kubectl logs -n logging -l app=workflows-fluent-bit | grep -i "workflow"
3. Check raw events in PostgreSQL:
kubectl exec -n postgresql postgresql-0 -- \
env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex \
-c "SELECT COUNT(*) FROM workflow_events_raw;"
4. Check normalized workflows:
kubectl exec -n postgresql postgresql-0 -- \
env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex \
-c "SELECT id, name, status FROM workflow_instances LIMIT 5;"
5. Query GraphQL API:
kubectl port-forward -n data-index svc/data-index-service 8080:80
curl http://localhost:8080/graphql \
-H "Content-Type: application/json" \
-d '{
"query": "{ getWorkflowInstances(limit: 5) { id name status startDate } }"
}'
Production Considerations
High Availability
Data Index:
- Deploy multiple replicas (2-3)
- Use a Kubernetes HorizontalPodAutoscaler (see the sketch after the snippet below)
- All instances are stateless (query-only)
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 1
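A minimal HorizontalPodAutoscaler sketch for the Data Index deployment (target utilization and replica bounds are illustrative):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: data-index-service
  namespace: data-index
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: data-index-service
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75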
PostgreSQL:
- Use managed PostgreSQL with automatic failover
- Or deploy a PostgreSQL HA solution:
  - Patroni
  - Stolon
  - CloudNativePG operator
FluentBit:
- Runs as a DaemonSet (one pod per node)
- Enable filesystem buffering for reliability (see the sketch below)
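Filesystem buffering can be enabled in the FluentBit configuration so buffered chunks survive pod restarts (the path and limit below are illustrative; adjust the generated fluent-bit.conf accordingly):
[SERVICE]
    storage.path              /var/log/flb-storage/
    storage.sync              normal
    storage.backlog.mem_limit 16M
[INPUT]
    storage.type              filesystem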
Performance Tuning
PostgreSQL:
-- Increase maximum server connections (requires restart)
ALTER SYSTEM SET max_connections = 200;
-- Tune for write performance
ALTER SYSTEM SET shared_buffers = '2GB';
ALTER SYSTEM SET effective_cache_size = '6GB';
ALTER SYSTEM SET maintenance_work_mem = '512MB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET wal_buffers = '16MB';
ALTER SYSTEM SET default_statistics_target = 100;
-- Reload configuration (max_connections, shared_buffers, and wal_buffers
-- only take effect after a server restart)
SELECT pg_reload_conf();
Data Index:
Adjust JVM heap:
env:
- name: JAVA_OPTS
value: "-Xms512m -Xmx1024m"
FluentBit:
Increase buffer for high throughput:
[INPUT]
    Mem_Buf_Limit         50MB
[SERVICE]
    storage.max_chunks_up 512
Monitoring
Metrics to track:
- FluentBit input/output rates
- PostgreSQL connection pool usage
- Trigger execution time
- Raw vs normalized event counts (see the query below)
- GraphQL query latency
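For the raw-versus-normalized comparison, a quick ad-hoc query against the tables created in Step 2:
kubectl exec -n postgresql postgresql-0 -- \
env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex -c \
"SELECT (SELECT COUNT(*) FROM workflow_events_raw) AS raw_events,
        (SELECT COUNT(*) FROM workflow_instances)  AS workflow_instances;"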
Prometheus metrics:
# Data Index service exposes Prometheus metrics at /q/metrics
# (create a ServiceMonitor for it if you use the Prometheus Operator)
# FluentBit exposes metrics on port 2020
curl http://<fluentbit-pod>:2020/api/v1/metrics/prometheus
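If you use the Prometheus Operator, a minimal ServiceMonitor sketch could look like this (it assumes the Data Index Service carries the label app: data-index-service):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: data-index
  namespace: data-index
spec:
  selector:
    matchLabels:
      app: data-index-service   # add this label to the Service if it is not set
  endpoints:
    - port: http                # Service port name from Step 3
      path: /q/metrics
      interval: 30s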
Grafana dashboards:
- PostgreSQL dashboard (ID: 9628)
- FluentBit dashboard (custom)
- Data Index service (Quarkus metrics)
Security
Database credentials:
- Use Kubernetes Secrets
- Rotate credentials regularly
- Use minimal permissions for the runtime user (no DROP, ALTER); see the sketch below
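A sketch of a restricted runtime role (role name and password are illustrative; run the migrations with a separate, more privileged user):
-- Runtime role with data access only; no DDL (DROP/ALTER) rights
CREATE USER dataindex_app WITH PASSWORD 'runtime-password';
GRANT CONNECT ON DATABASE dataindex TO dataindex_app;
GRANT USAGE ON SCHEMA public TO dataindex_app;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO dataindex_app;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO dataindex_app;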
Network policies:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: data-index-network-policy
namespace: data-index
spec:
podSelector:
matchLabels:
app: data-index-service
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
ports:
- protocol: TCP
port: 8080
egress:
- to:
- namespaceSelector:
matchLabels:
name: postgresql
ports:
- protocol: TCP
port: 5432
TLS:
- Enable TLS for PostgreSQL connections (for example, append ?sslmode=require to the JDBC URL)
- Use TLS for Ingress (HTTPS); see the sketch below
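For the Ingress, TLS can be enabled by adding a tls block to the manifest from Step 3 (the certificate secret name is a placeholder, created manually or by cert-manager):
spec:
  tls:
    - hosts:
        - data-index.your-domain.com
      secretName: data-index-tls
  rules:
    # ... existing rule unchanged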
Backup and Recovery
PostgreSQL backups:
# Backup
kubectl exec -n postgresql postgresql-0 -- \
env PGPASSWORD=dataindex123 pg_dump -U dataindex dataindex > dataindex-backup-$(date +%Y%m%d).sql
# Restore
kubectl exec -i -n postgresql postgresql-0 -- \
env PGPASSWORD=dataindex123 psql -U dataindex dataindex < dataindex-backup-20260427.sql
Automated backups:
- Use managed PostgreSQL automated backups
- Or use tools like pgBackRest, Barman, or WAL-G (a simple CronJob sketch follows)
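For in-cluster PostgreSQL, a simple CronJob can schedule pg_dump (a minimal sketch; the schedule, image, and backups PVC are placeholders, and the password should come from a Secret):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dataindex-backup
  namespace: postgresql
spec:
  schedule: "0 2 * * *"              # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16
              env:
                - name: PGPASSWORD
                  value: dataindex123    # use a Secret in real deployments
              command: ["/bin/sh", "-c"]
              args:
                - >
                  pg_dump -h postgresql.postgresql.svc.cluster.local -U dataindex dataindex
                  > /backups/dataindex-$(date +%Y%m%d).sql
              volumeMounts:
                - name: backups
                  mountPath: /backups
          volumes:
            - name: backups
              persistentVolumeClaim:
                claimName: dataindex-backups   # hypothetical PVC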
Troubleshooting
See Troubleshooting Guide for common issues.
Quick checks:
# All pods running?
kubectl get pods -n data-index
kubectl get pods -n logging
kubectl get pods -n workflows
# Database accessible?
kubectl exec -n postgresql postgresql-0 -- \
env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex -c 'SELECT 1;'
# Events flowing?
kubectl exec -n postgresql postgresql-0 -- \
env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex \
-c 'SELECT COUNT(*), MAX(time) FROM workflow_events_raw;'
# GraphQL working?
curl http://localhost:8080/graphql \
-H "Content-Type: application/json" \
-d '{"query":"{ __typename }"}'