Getting Started with Data Index

This guide walks you through deploying Data Index and verifying it works.

Prerequisites

Before you begin, ensure you have:

  • A running Kubernetes cluster (or KIND for local development)

  • kubectl configured to access your cluster

  • curl and jq installed
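
A quick preflight check can confirm that the command-line tools listed above are on your PATH before you start. The following is a minimal sketch; check_tools is a hypothetical helper name and is not part of the Data Index scripts.

```shell
# Preflight sketch: report every required CLI tool that is not on PATH.
# check_tools is a hypothetical helper, not part of the Data Index scripts.
check_tools() (
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # Non-zero exit status when at least one tool was not found.
  return "$missing"
)
```

For example, run check_tools kubectl curl jq || exit 1 at the top of a setup script so it stops early with a list of what is missing.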

Quick Start (Development Mode)

The fastest way to start Data Index locally, without Kubernetes, is Quarkus dev mode:

PostgreSQL Backend

cd data-index/data-index-service/data-index-service-postgresql
mvn quarkus:dev

What happens:

Elasticsearch Backend

cd data-index/data-index-service/data-index-service-elasticsearch
mvn quarkus:dev

What happens:

Development mode doesn’t include FluentBit. To test with real workflow events, use the KIND installation below.

Quick Install (KIND)

Data Index supports two storage backends, plus a Kafka ingestion mode that uses the PostgreSQL backend. Choose one based on your needs:

  • PostgreSQL (MODE 1) - Recommended for most users, simpler deployment

  • Elasticsearch (MODE 2) - For high throughput or full-text search requirements

  • Kafka (MODE 3) - For stream-based ingestion, requires Kafka infrastructure

Option 1: PostgreSQL Backend

# 1. Set up KIND cluster and PostgreSQL
cd data-index/scripts/kind
./setup-cluster.sh
MODE=postgresql ./install-dependencies.sh

# 2. Deploy Data Index service
./deploy-data-index.sh postgresql

# 3. Deploy FluentBit DaemonSet (PostgreSQL mode)
cd ../fluentbit
./generate-configmap.sh postgresql postgresql/kubernetes/configmap.yaml
kubectl apply -f postgresql/kubernetes/configmap.yaml
kubectl apply -f postgresql/kubernetes/daemonset.yaml

Option 2: Elasticsearch Backend

# 1. Set up KIND cluster and Elasticsearch
cd data-index/scripts/kind
./setup-cluster.sh
MODE=elasticsearch ./install-dependencies.sh

# 2. Deploy Data Index service
./deploy-data-index.sh elasticsearch

# 3. Deploy FluentBit DaemonSet (Elasticsearch mode)
cd ../fluentbit
./generate-configmap.sh elasticsearch elasticsearch/kubernetes/configmap.yaml
kubectl apply -f elasticsearch/kubernetes/configmap.yaml
kubectl apply -f elasticsearch/kubernetes/daemonset.yaml

Option 3: Kafka Ingestion

# 1. Set up KIND cluster and infrastructure dependencies (Kafka, PostgreSQL)
cd data-index/scripts/kind
./setup-cluster.sh
MODE=kafka ./install-dependencies.sh

# 2. Deploy the data index query service (PostgreSQL backend)
./deploy-data-index.sh kafka

# 3. Initialize the database schema
./init-database-schema.sh

# 4. Deploy the Kafka ingestion service
./deploy-kafka-ingestion.sh

# 5. Deploy test workflow app with kafka profile
MODE=kafka ./deploy-workflow-app.sh

Verify Installation

Check Pods

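All components for your chosen mode should be running:

# Data Index service
kubectl get pods -n data-index

# FluentBit
kubectl get pods -n logging

# PostgreSQL (MODE 1 and MODE 3)
kubectl get pods -n postgresql

# Elasticsearch (MODE 2 only)
kubectl get pods -n elasticsearch

# Kafka (MODE 3 only)
kubectl get pods -n kafka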

Expected output (PostgreSQL mode, shown combined across namespaces):

NAMESPACE     NAME                                  READY   STATUS
data-index    data-index-service-xxx                1/1     Running
logging       workflows-fluent-bit-mode1-xxx        1/1     Running
postgresql    postgresql-0                          1/1     Running
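
Instead of re-running kubectl get pods until everything is up, you can block until the pods report Ready. The sketch below uses kubectl wait. The function name wait_for_namespaces and the 120-second default timeout are illustrative choices, not part of the Data Index scripts.

```shell
# Sketch: wait until every pod in each given namespace is Ready.
# wait_for_namespaces and the TIMEOUT default are illustrative choices.
wait_for_namespaces() (
  timeout="${TIMEOUT:-120s}"
  for ns in "$@"; do
    echo "waiting for pods in namespace: $ns"
    # Stop at the first namespace whose pods do not become Ready in time.
    kubectl wait --for=condition=Ready pods --all -n "$ns" \
      --timeout="$timeout" || return 1
  done
)
```

For PostgreSQL mode, this could be called as wait_for_namespaces data-index logging postgresql. Note that kubectl wait reports an error if a namespace contains no pods yet, so run it only after the deployment steps have completed.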

Test GraphQL API

Query the API to verify it’s responding:

# Port-forward to Data Index service
kubectl port-forward -n data-index svc/data-index-service 8080:8080 &

# Query GraphQL
curl -s http://localhost:8080/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{ getWorkflowInstances { id name status } }"}' \
  | jq .

Expected response (the list is empty if no workflows have executed yet):

{
  "data": {
    "getWorkflowInstances": []
  }
}
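
For scripted checks, the response can be validated rather than inspected by eye. The sketch below reads a GraphQL response on standard input and succeeds only if there is no top-level errors field and data.getWorkflowInstances is an array. The function name graphql_response_ok is illustrative, and the errors field is assumed to follow the usual GraphQL response convention.

```shell
# Sketch: validate a GraphQL response read from stdin.
# Succeeds when there is no top-level "errors" field and
# data.getWorkflowInstances is an array (possibly empty).
# graphql_response_ok is a hypothetical helper name.
graphql_response_ok() {
  jq -e '(has("errors") | not)
         and ((.data.getWorkflowInstances | type) == "array")' >/dev/null
}
```

Pipe the output of the curl command above into graphql_response_ok in place of jq . and test its exit status, for example with && echo "GraphQL API OK".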

Access GraphQL UI

Open the interactive GraphQL playground in your browser:

# Port-forward to Data Index service (if not already running)
kubectl port-forward -n data-index svc/data-index-service 8080:8080 &

# Open GraphQL UI in browser (macOS; on Linux use xdg-open)
open http://localhost:8080/q/graphql-ui/

The GraphQL UI provides:

  • Interactive query builder - Autocomplete and syntax highlighting

  • Schema explorer - Browse available queries and types

  • Query history - Saved queries for reuse

  • Documentation - Inline field descriptions

Use Ctrl+Space in the query editor for autocomplete suggestions.

Check Storage Backend

PostgreSQL:

kubectl exec -n postgresql postgresql-0 -- \
  env PGPASSWORD=dataindex123 \
  psql -U dataindex -d dataindex \
  -c "\dt"

Expected tables:

 workflow_instances
 task_instances
 workflow_events_raw
 task_events_raw
 flyway_schema_history
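
To check the table list automatically, the \dt output can be compared against the expected names. The following sketch reads the psql output on standard input and reports any table from the list above that does not appear. The function name missing_tables is illustrative.

```shell
# Sketch: print each expected table that is absent from the psql "\dt"
# output read on stdin; exit non-zero if any table is missing.
# missing_tables is a hypothetical helper name.
missing_tables() (
  listing=$(cat)
  status=0
  for table in workflow_instances task_instances workflow_events_raw \
               task_events_raw flyway_schema_history; do
    # -w matches the name as a whole word only.
    if ! printf '%s\n' "$listing" | grep -qw -- "$table"; then
      echo "missing table: $table"
      status=1
    fi
  done
  return "$status"
)
```

Append | missing_tables to the kubectl exec command above; no output and a zero exit status mean all five tables exist.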

Elasticsearch:

# Port-forward to Elasticsearch
kubectl port-forward -n elasticsearch svc/elasticsearch 9200:9200 &

# List all indices (URL quoted so the shell does not treat "?" as a glob)
curl -s "http://localhost:9200/_cat/indices?v"

# Check transforms
curl -s "http://localhost:9200/_transform?pretty"

Expected indices:

workflow-instance-events-raw-YYYY.MM.DD
task-execution-events-raw-YYYY.MM.DD
workflow-instances
task-executions

Expected transforms:

workflow-instance-transform (started)
task-execution-transform (started)
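
The transform state is reported by the Elasticsearch transform statistics endpoint (GET /_transform/_stats), not by the configuration listing. The sketch below reads that JSON on standard input, prints each transform with its state, and succeeds only if at least one transform exists and all of them are running. The function name transforms_healthy is illustrative, and treating "indexing" as healthy alongside "started" is an assumption about how a running transform reports itself while processing.

```shell
# Sketch: read the JSON returned by GET /_transform/_stats on stdin,
# print "<id> (<state>)" for each transform, and fail unless at least
# one transform exists and every state is "started" or "indexing".
# transforms_healthy is a hypothetical helper name.
transforms_healthy() (
  stats=$(cat)
  printf '%s\n' "$stats" \
    | jq -r '.transforms[] | "\(.id) (\(.state))"' || return 1
  printf '%s\n' "$stats" \
    | jq -e '(.transforms | length > 0)
             and all(.transforms[];
                     .state == "started" or .state == "indexing")' >/dev/null
)
```

With the port-forward above still running, this could be used as curl -s "http://localhost:9200/_transform/_stats" | transforms_healthy.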

Troubleshooting

See the Troubleshooting Guide for common issues.