Getting Started with Data Index
This guide walks you through deploying Data Index and verifying it works.
Prerequisites
Before you begin, ensure you have:
- Kubernetes cluster running (or KIND for local development)
- kubectl configured to access your cluster
- curl and jq installed
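A quick way to confirm the tools above are installed is to look each one up on PATH. The helper below is a minimal sketch; `missing_tools` is a hypothetical name, not something shipped with Data Index.

```shell
# Print the names of any commands that are not on PATH.
# Returns non-zero if at least one is missing.
missing_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Drop the leading space before printing the list.
  echo "${missing# }"
  [ -z "$missing" ]
}

# Example (commented out so that sourcing this file has no side effects):
# missing_tools kubectl curl jq || echo "install the tools listed above" >&2
```

This only checks that the binaries exist. Whether kubectl can reach your cluster is a separate check, for example with `kubectl cluster-info`.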
Quick Start (Development Mode)
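The fastest way to start Data Index locally, without Kubernetes, is Quarkus dev mode. It needs Maven and a container runtime such as Docker or Podman, which Quarkus Dev Services uses to start the backend: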
PostgreSQL Backend
cd data-index/data-index-service/data-index-service-postgresql
mvn quarkus:dev
What happens:
- Quarkus Dev Services auto-starts a PostgreSQL container
- Flyway runs schema migrations automatically
- Service starts at localhost:8080
- GraphQL UI is available at localhost:8080/q/graphql-ui
Elasticsearch Backend
cd data-index/data-index-service/data-index-service-elasticsearch
mvn quarkus:dev
What happens:
- Quarkus Dev Services auto-starts an Elasticsearch container
- Schema initialization (ILM, templates, transforms) runs automatically
- Service starts at localhost:8080
- GraphQL UI is available at localhost:8080/q/graphql-ui
- Elasticsearch is available at localhost:9200
Note: Development mode doesn't include FluentBit. To test with real workflow events, use the KIND installation below.
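Dev mode can take a little while to come up, since the backend container has to start before the service does. As a sketch, a small retry loop can poll the GraphQL UI URL until it responds. `retry` is a hypothetical helper name, and the attempt count and delay in the example are arbitrary.

```shell
# Run a command until it succeeds, up to a maximum number of attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Example (commented out so that sourcing this file has no side effects):
# retry 30 2 curl -sf -o /dev/null http://localhost:8080/q/graphql-ui/
```

The `-f` flag makes curl return a non-zero exit code on HTTP errors, so the loop keeps going until the page is actually served.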
Quick Install (KIND)
Data Index supports three deployment modes: two storage backends fed by FluentBit, and Kafka-based ingestion. Choose one based on your needs:
- PostgreSQL (MODE 1) - Recommended for most users, simpler deployment
- Elasticsearch (MODE 2) - For high throughput or full-text search requirements
- Kafka (MODE 3) - For stream-based ingestion; requires Kafka infrastructure and uses the PostgreSQL backend
Option 1: PostgreSQL Backend (Recommended)
# 1. Setup KIND cluster and PostgreSQL
cd data-index/scripts/kind
./setup-cluster.sh
MODE=postgresql ./install-dependencies.sh
# 2. Deploy Data Index service
./deploy-data-index.sh postgresql
# 3. Deploy FluentBit DaemonSet (PostgreSQL mode)
cd ../fluentbit
./generate-configmap.sh postgresql postgresql/kubernetes/configmap.yaml
kubectl apply -f postgresql/kubernetes/configmap.yaml
kubectl apply -f postgresql/kubernetes/daemonset.yaml
Option 2: Elasticsearch Backend
# 1. Setup KIND cluster and Elasticsearch
cd data-index/scripts/kind
./setup-cluster.sh
MODE=elasticsearch ./install-dependencies.sh
# 2. Deploy Data Index service
./deploy-data-index.sh elasticsearch
# 3. Deploy FluentBit DaemonSet (Elasticsearch mode)
cd ../fluentbit
./generate-configmap.sh elasticsearch elasticsearch/kubernetes/configmap.yaml
kubectl apply -f elasticsearch/kubernetes/configmap.yaml
kubectl apply -f elasticsearch/kubernetes/daemonset.yaml
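Options 1 and 2 run the same sequence and differ only in the backend name. As a sketch, the dry-run helper below prints that sequence for a given backend without executing anything. `install_steps` is a hypothetical name; the commands themselves are copied from the two options above.

```shell
# Print the install commands for a FluentBit-based backend.
# This is a dry run: nothing is executed.
install_steps() {
  case "$1" in
    postgresql|elasticsearch) ;;
    *) echo "unsupported backend: $1" >&2; return 1 ;;
  esac
  cat <<EOF
cd data-index/scripts/kind
./setup-cluster.sh
MODE=$1 ./install-dependencies.sh
./deploy-data-index.sh $1
cd ../fluentbit
./generate-configmap.sh $1 $1/kubernetes/configmap.yaml
kubectl apply -f $1/kubernetes/configmap.yaml
kubectl apply -f $1/kubernetes/daemonset.yaml
EOF
}

# Example (commented out so that sourcing this file has no side effects):
# install_steps elasticsearch
```

Review the printed commands before running them. Kafka ingestion follows a different sequence and is deliberately rejected here.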
Option 3: Kafka Ingestion
# 1. Setup KIND cluster and infrastructure dependencies (Kafka, PostgreSQL)
cd data-index/scripts/kind
./setup-cluster.sh
MODE=kafka ./install-dependencies.sh
# 2. Deploy the data index query service (PostgreSQL backend)
./deploy-data-index.sh kafka
# 3. Initialize the database schema
./init-database-schema.sh
# 4. Deploy the Kafka ingestion service
./deploy-kafka-ingestion.sh
# 5. Deploy test workflow app with kafka profile
MODE=kafka ./deploy-workflow-app.sh
Verify Installation
Check Pods
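All components for your chosen mode should be running:
# Data Index service
kubectl get pods -n data-index
# FluentBit (Options 1 and 2)
kubectl get pods -n logging
# PostgreSQL (Options 1 and 3)
kubectl get pods -n postgresql
# Elasticsearch (Option 2)
kubectl get pods -n elasticsearch
# Kafka (Option 3)
kubectl get pods -n kafka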
Expected output (PostgreSQL mode, combined across namespaces):
NAMESPACE    NAME                             READY   STATUS
data-index   data-index-service-xxx           1/1     Running
logging      workflows-fluent-bit-mode1-xxx   1/1     Running
postgresql   postgresql-0                     1/1     Running
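Reading pod tables by eye gets tedious across several namespaces. As a rough sketch, the filter below reads the output of a single `kubectl get pods -n <namespace>` call on stdin and prints only the pods that are not fully ready. It assumes the default column order for a single namespace (NAME, READY, STATUS, with no NAMESPACE column). `not_ready_pods` is a hypothetical name.

```shell
# Read `kubectl get pods -n <namespace>` output on stdin and print every pod
# that is not Running with all of its containers ready.
# Returns non-zero if any such pod is found.
not_ready_pods() {
  awk 'NR > 1 {
         # READY looks like "1/1": ready containers / total containers.
         split($2, r, "/")
         if ($3 != "Running" || r[1] != r[2]) {
           print $1 " " $2 " " $3
           bad = 1
         }
       }
       END { exit bad + 0 }'
}

# Example (commented out so that sourcing this file has no side effects):
# kubectl get pods -n data-index | not_ready_pods
```

Pods from finished jobs show a status of Completed and would be reported too. If you only want to block until everything is ready, `kubectl wait --for=condition=Ready pod --all -n data-index --timeout=120s` is a built-in alternative.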
Test GraphQL API
Query the API to verify it’s responding:
# Port-forward to Data Index service
kubectl port-forward -n data-index svc/data-index-service 8080:8080 &
# Query GraphQL
curl -s http://localhost:8080/graphql \
-H "Content-Type: application/json" \
-d '{"query":"{ getWorkflowInstances { id name status } }"}' \
| jq .
Expected response (the list is empty if no workflows have executed yet):
{
"data": {
"getWorkflowInstances": []
}
}
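Hand-writing the JSON envelope becomes error-prone once a query contains double quotes or spans several lines. The helper below is a minimal sketch that wraps a query string for the `-d` argument. `graphql_payload` is a hypothetical name, and it escapes only backslashes, double quotes, and newlines.

```shell
# Wrap a GraphQL query string in the {"query": "..."} JSON envelope.
# Newlines become spaces; backslashes and double quotes are escaped.
graphql_payload() {
  escaped=$(printf '%s' "$1" | tr '\n' ' ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"query":"%s"}\n' "$escaped"
}

# Example (commented out so that sourcing this file has no side effects):
# curl -s http://localhost:8080/graphql \
#   -H "Content-Type: application/json" \
#   -d "$(graphql_payload '{ getWorkflowInstances { id name status } }')" \
#   | jq .
```

Since jq is already a prerequisite, `jq -n --arg q "$query" '{query: $q}'` builds the same envelope with complete JSON escaping and is the safer choice for anything beyond simple queries.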
Access GraphQL UI
Open the interactive GraphQL playground in your browser:
# Port-forward to Data Index service (if not already running)
kubectl port-forward -n data-index svc/data-index-service 8080:8080 &
# Open GraphQL UI in browser (macOS; on Linux use xdg-open)
open http://localhost:8080/q/graphql-ui/
The GraphQL UI provides:
- Interactive query builder - Autocomplete and syntax highlighting
- Schema explorer - Browse available queries and types
- Query history - Saved queries for reuse
- Documentation - Inline field descriptions
Tip: Use Ctrl+Space in the query editor for autocomplete suggestions.
Check Storage Backend
PostgreSQL:
kubectl exec -n postgresql postgresql-0 -- \
env PGPASSWORD=dataindex123 \
psql -U dataindex -d dataindex \
-c "\dt"
Expected tables:
workflow_instances
task_instances
workflow_events_raw
task_events_raw
flyway_schema_history
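To compare the `\dt` listing against the expected tables without reading it line by line, the output can be piped through a small checker. This is a sketch only; `missing_tables` is a hypothetical name, and it matches table names as whole words anywhere in the listing.

```shell
# Read a psql "\dt" listing on stdin and print each expected table
# (given as arguments) that does not appear in it.
# Returns non-zero if any table is missing.
missing_tables() {
  listing=$(cat)
  rc=0
  for table in "$@"; do
    if ! printf '%s\n' "$listing" | grep -qw -- "$table"; then
      echo "$table"
      rc=1
    fi
  done
  return $rc
}

# Example (commented out so that sourcing this file has no side effects):
# kubectl exec -n postgresql postgresql-0 -- \
#   env PGPASSWORD=dataindex123 psql -U dataindex -d dataindex -c "\dt" \
#   | missing_tables workflow_instances task_instances \
#       workflow_events_raw task_events_raw flyway_schema_history
```

No output and a zero exit code mean every expected table was found.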
Elasticsearch:
# Port-forward to Elasticsearch
kubectl port-forward -n elasticsearch svc/elasticsearch 9200:9200 &
# List all indices (URLs are quoted so the shell does not treat ? as a glob)
curl -s 'http://localhost:9200/_cat/indices?v'
# Check transforms
curl -s 'http://localhost:9200/_transform?pretty'
Expected indices:
workflow-instance-events-raw-YYYY.MM.DD
task-execution-events-raw-YYYY.MM.DD
workflow-instances
task-executions
Expected transforms:
workflow-instance-transform (started)
task-execution-transform (started)
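The raw event indices carry a date suffix, so their exact names change from day to day. As a sketch, the helper below counts how many index names on stdin match a given prefix followed by a `YYYY.MM.DD` suffix. `count_daily_indices` is a hypothetical name, and it assumes one index name per line, as produced by the `h=index` column selector of the `_cat/indices` API.

```shell
# Count index names on stdin of the form <prefix>-YYYY.MM.DD.
# Trailing whitespace on a line is tolerated.
count_daily_indices() {
  # grep -c exits non-zero when the count is 0, so normalize the status.
  grep -Ec -- "^$1-[0-9]{4}\.[0-9]{2}\.[0-9]{2}[[:space:]]*\$" || true
}

# Example (commented out so that sourcing this file has no side effects):
# curl -s 'http://localhost:9200/_cat/indices?h=index' \
#   | count_daily_indices workflow-instance-events-raw
```

A count of 0 usually just means no events have been ingested yet, since the daily raw indices only appear once data arrives.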
Troubleshooting
See the Troubleshooting Guide for common issues.