Exercise Solution: Create Synthetic Data for your FHIR Server
HAPI FHIR Server with Synthea Exercise
This exercise shows how to run a local HAPI FHIR server and populate it with synthetic patient data from Synthea.
Reference: the Synthea repository and its README (synthetichealth/synthea on GitHub)
Prerequisites
- Docker Desktop (with Docker Compose)
- Java 11 or 17 (LTS recommended)
- Git
- curl
NOTE: If you are using the zip file containing the FHIR sample data, you can skip steps 4 and 5 and go directly to step 6. From inside the extracted folder, run the commands there to load the data into the HAPI FHIR server.
Setup Configuration Files
Step 1: Create Docker Compose Configuration
Create a docker-compose.yml file with the following content:
version: '3.7'
services:
  # HAPI FHIR server container
  fhir:
    container_name: fhir
    image: "hapiproject/hapi:v7.0.0"
    ports:
      - "8080:8080"                              # Expose FHIR API on localhost:8080
    configs:
      - source: hapi
        target: /app/config/application.yaml     # Mount custom config
    depends_on:
      - db                                       # Wait for database to start first
  # PostgreSQL database for FHIR data persistence
  db:
    image: postgres:14
    restart: always
    environment:
      POSTGRES_PASSWORD: admin
      POSTGRES_USER: admin
      POSTGRES_DB: hapi
    volumes:
      - ./hapi.postgres.data:/var/lib/postgresql/data   # Persist data locally
configs:
  hapi:
    file: ./hapi.application.yaml                # Reference to HAPI config file
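Optional: before moving on, you can have Docker Compose parse and validate this file, which catches indentation or key errors early. This quick check is not part of the exercise itself:
# Validate and print the resolved Compose configuration
# (prints an error instead if docker-compose.yml is malformed)
docker compose config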
Step 2: Create HAPI Application Configuration
Create a hapi.application.yaml file in the same directory with the following content:
spring:
  datasource:
    # PostgreSQL connection details (must match the docker-compose db service)
    url: 'jdbc:postgresql://db:5432/hapi'
    username: admin
    password: admin
    driverClassName: org.postgresql.Driver
  jpa:
    properties:
      # Use the HAPI-specific PostgreSQL dialect for optimal performance
      hibernate.dialect: ca.uhn.fhir.jpa.model.dialect.HapiFhirPostgres94Dialect
      # Disable search indexing for faster startup (optional)
      hibernate.search.enabled: false
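Optional: once the containers are running (after Step 3), you can confirm that this file was actually mounted into the FHIR container. A minimal check, assuming the fhir service name from the docker-compose.yml above:
# Print the configuration file as seen from inside the fhir container
docker compose exec fhir cat /app/config/application.yaml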
Running the Services
Step 3: Start HAPI FHIR Server and Database
From the project folder containing your configuration files:
# Start the containers in detached mode
docker compose up -d
# Wait 30-60 seconds for services to initialize, then verify the server is running
# This should return the FHIR CapabilityStatement, indicating the server is ready
curl -s http://localhost:8080/fhir/metadata | head -n 20
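If the metadata request fails or hangs, the server is most likely still starting up or cannot reach the database. A quick way to troubleshoot, assuming the service names from the docker-compose.yml above:
# Show container status; both fhir and db should be "Up"
docker compose ps
# Follow the HAPI FHIR server logs and watch for a "Started Application" line or database errors
docker compose logs -f fhir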
Optional: Test FHIR Server Connectivity
Verify that the server can accept FHIR resources (the command should return the created Patient resource with a server-assigned ID):
curl -sS -H 'Content-Type: application/fhir+json' -X POST \
http://localhost:8080/fhir/Patient \
--data-binary '{"resourceType":"Patient","name":[{"use":"official","family":"Test","given":["Upload"]}],"gender":"female","birthDate":"1980-01-01"}'
Generating Synthetic Data
Step 4: Clone and Build Synthea
Clone Synthea alongside this project and build it:
# Clone the Synthea repository
git clone https://github.com/synthetichealth/synthea.git
cd synthea
# Build Synthea (skip tests for faster build)
./gradlew build -x test
Step 5: Generate Synthetic Patient Data
Generate 100 synthetic patients as FHIR R4 transaction bundles:
# Generate 100 patients with FHIR R4 export enabled
./run_synthea \
-p 100 \
--exporter.fhir.export=true \
--exporter.fhir.transaction_bundle=true \
--exporter.fhir.upload=false
Note: Generated files will be under synthea/output/fhir/
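Before uploading anything, it can help to sanity-check what Synthea produced. A small sketch, assuming you are still in the synthea directory and have jq installed (jq is not listed in the prerequisites):
# Count the generated bundle files
ls output/fhir/*.json | wc -l
# Inspect one bundle: resource type, bundle type, and number of entries (requires jq)
first=$(ls output/fhir/*.json | head -n 1)
jq -r '"\(.resourceType) / \(.type) with \(.entry | length) entries"' "$first"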
Loading Data into HAPI FHIR
Step 6: Upload Organization and Practitioner
Synthea generates system-wide bundles that patient records reference. Upload these first to ensure all references resolve correctly:
# Navigate to the FHIR output directory
cd output/fhir
# Upload hospital/organization information first
# This contains Organization resources that patient encounters reference
for f in hospitalInformation*.json; do
  [ -e "$f" ] || continue
  echo "Uploading $f"
  curl -sS -H 'Accept: application/fhir+json' \
       -H 'Content-Type: application/fhir+json;charset=utf-8' \
       -X POST http://localhost:8080/fhir \
       --data-binary "@$f" -o /tmp/resp.json -w "HTTP %{http_code}\n"
  head -n 40 /tmp/resp.json
done
# Upload practitioner information
# This contains Practitioner resources that patient encounters reference
for f in practitionerInformation*.json; do
  [ -e "$f" ] || continue
  echo "Uploading $f"
  curl -sS -H 'Accept: application/fhir+json' \
       -H 'Content-Type: application/fhir+json;charset=utf-8' \
       -X POST http://localhost:8080/fhir \
       --data-binary "@$f" -o /tmp/resp.json -w "HTTP %{http_code}\n"
  head -n 40 /tmp/resp.json
done
Why this step is important: the patient bundles contain references to Organization and Practitioner resources. Loading those resources first ensures that every reference resolves when the patient data is uploaded.
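If you want to see these references for yourself, a quick grep over one patient bundle shows where Organization and Practitioner resources are referenced. A rough sketch (run from output/fhir); the exact reference format can vary between Synthea versions:
# List Organization/Practitioner references inside one patient bundle
f=$(ls *.json | grep -v -e '^hospitalInformation' -e '^practitionerInformation' | head -n 1)
grep -o '"reference": *"[^"]*"' "$f" | grep -E 'Organization|Practitioner' | sort | uniq -c | head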
Step 7: Upload Patient Bundles
Upload all remaining transaction bundles, excluding the metadata files uploaded above:
# Navigate to the FHIR output directory if you are not already there from Step 6
cd output/fhir   # adjust the path as needed
# Upload all patient bundle files, skipping metadata files
for f in *.json; do
  case "$f" in
    hospitalInformation*|practitionerInformation*) continue;;
  esac
  echo "Uploading $f"
  code=$(curl -sS -o /tmp/resp.json -w "%{http_code}" \
              -H 'Accept: application/fhir+json' \
              -H 'Content-Type: application/fhir+json;charset=utf-8' \
              -X POST http://localhost:8080/fhir \
              --data-binary "@$f")
  echo "HTTP $code"
  if [ "$code" -ge 400 ]; then
    echo "Error response:"; head -n 120 /tmp/resp.json; break
  fi
done
Step 8: Verify the Data Import
# Check total number of patients imported
curl -s 'http://localhost:8080/fhir/Patient?_summary=count'
# Fetch a sample of patient records to verify data structure
curl -s 'http://localhost:8080/fhir/Patient?_count=5' | head -n 120
# Optional: Check other resource types
curl -s 'http://localhost:8080/fhir/Observation?_summary=count'
curl -s 'http://localhost:8080/fhir/Encounter?_summary=count'
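To get a quick overview across several resource types at once, you can loop over the count queries. A small sketch that pulls the total out of the returned Bundle with grep; the list of resource types is just a sample of what Synthea typically generates:
# Print resource counts for a few common Synthea resource types
for rt in Patient Encounter Observation Condition Procedure MedicationRequest Immunization; do
  total=$(curl -s "http://localhost:8080/fhir/$rt?_summary=count" \
          | grep -o '"total"[^0-9]*[0-9]*' | grep -o '[0-9]*$')
  echo "$rt: ${total:-unknown}"
done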