Migrate vector collections from Qdrant or Milvus to Endee using a Dockerized producer-consumer pipeline with checkpoint resume support.
| Migration Type | Command |
|---|---|
| Qdrant (Dense) → Endee | qdrant-to-endee-dense |
| Qdrant (Hybrid: Dense + Sparse) → Endee | qdrant-to-endee-hybrid |
| Milvus (Dense) → Endee | milvus-to-endee-dense |
| Milvus (Hybrid: Dense + Sparse) → Endee | milvus-to-endee-hybrid |
```
.
├── cmd.sh           # Build and run script
├── .env             # Configuration (copy from .env.example)
├── data/
│   └── checkpoints/ # Checkpoint files (auto-created, mount as volume)
├── scripts/
│   ├── qdrant_to_endee_dense_migration.py
│   ├── qdrant_to_endee_hybrid_migration.py
│   ├── milvus_to_endee_dense_migration.py
│   └── milvus_to_endee_hybrid_migration.py
├── entrypoint.sh    # Docker entrypoint
└── Dockerfile
```
Copy and edit the environment file:
```bash
cp .env.example .env
```

Set your source database, target Endee credentials, and migration type. See the Configuration Reference below for all options.
Run cmd.sh:
```bash
bash cmd.sh
```

All settings can be provided via the `.env` file or as environment variables passed to Docker.
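When running the container directly instead of through `cmd.sh`, settings can be passed with `-e` flags or `--env-file`. A hypothetical invocation, assuming the `vector-migration:latest` image and `vector-net` network created by `cmd.sh` (every `-e` value below is a placeholder):

```shell
# Sketch only: overrides from -e flags take precedence over --env-file values.
docker run --rm \
  --network vector-net \
  -v "$(pwd)/data:/app/data" \
  --env-file .env \
  -e MIGRATION_TYPE=qdrant-to-endee-dense \
  vector-migration:latest
```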
```
# Choose one:
# qdrant-to-endee-dense
# qdrant-to-endee-hybrid
# milvus-to-endee-dense
# milvus-to-endee-hybrid
MIGRATION_TYPE=qdrant-to-endee-dense
```

Source (Qdrant):

```
SOURCE_URL=http://your-qdrant-host
SOURCE_PORT=6333
SOURCE_API_KEY=                   # Leave empty if no auth
SOURCE_COLLECTION=your_collection
USE_HTTPS=false
```

Source (Milvus):

```
SOURCE_URL=http://your-milvus-host
SOURCE_PORT=19530
SOURCE_API_KEY=your_milvus_token  # Leave empty if no auth
SOURCE_COLLECTION=your_collection
IS_MULTIVECTOR=false
```

Target (Endee):

```
TARGET_URL=http://your-endee-host:8080  # Omit for Endee Cloud
TARGET_API_KEY=your_endee_api_key
TARGET_COLLECTION=your_index_name
```

Performance:

```
BATCH_SIZE=1000      # Records fetched per batch from source
UPSERT_SIZE=1000     # Records upserted per chunk to Endee
MAX_QUEUE_SIZE=5     # Max batches buffered in memory between producer and consumer
```

Index parameters:

```
# These are read automatically from the Milvus collection schema.
# Override only if needed:
# SPACE_TYPE=cosine  # cosine | l2 | ip
# M=16
# EF_CONSTRUCT=128
```

Checkpoint:

```
CHECKPOINT_FILE=/app/data/checkpoints/migration.json
# CLEAR_CHECKPOINT=true  # Uncomment to start fresh
```

Filter fields:

```
# Comma-separated list of payload fields to use as Endee filter fields.
# All other fields go to meta.
# Endee filter fields must be scalar types (str, int, float, bool).
# Lists and dicts must go to meta — do not include them here.
FILTER_FIELDS=company,region,sector,document_type
```

Debug:

```
DEBUG=false  # Set true for verbose logging
```

A complete example `.env` for a Qdrant hybrid migration:

```
# ── Migration ────────────────────────────────────────────────────
MIGRATION_TYPE=qdrant-to-endee-hybrid

# ── Source (Qdrant) ──────────────────────────────────────────────
SOURCE_URL=
SOURCE_PORT=6333
SOURCE_API_KEY=
SOURCE_COLLECTION=
USE_HTTPS=false

# ── Target (Endee) ───────────────────────────────────────────────
TARGET_API_KEY=your_endee_api_key
TARGET_COLLECTION=my_endee_index

# ── Performance ──────────────────────────────────────────────────
BATCH_SIZE=1000
UPSERT_SIZE=1000
MAX_QUEUE_SIZE=5

# ── Filter fields ────────────────────────────────────────────────
FILTER_FIELDS=company,region,sector,document_type,page_number

# ── Checkpoint ───────────────────────────────────────────────────
CHECKPOINT_FILE=/app/data/checkpoints/migration.json

# ── Debug ────────────────────────────────────────────────────────
DEBUG=false
```

The build-and-run script `cmd.sh`:

```bash
#!/bin/bash
docker network create vector-net 2>/dev/null || true
docker build -t vector-migration:latest .
docker compose up --build
```

Migration progress is saved after every successfully upserted batch. If the migration is interrupted for any reason (network error, Ctrl+C, container restart), simply rerun the same command — it will resume from where it left off automatically.
```bash
# Resume from last checkpoint (default — just rerun):
bash cmd.sh

# Start fresh (discard checkpoint) — set in .env:
CLEAR_CHECKPOINT=true
```

The checkpoint file is stored at `CHECKPOINT_FILE` (default: `/app/data/checkpoints/migration.json`). Since `/app/data` is mounted as a volume, the checkpoint persists across container restarts.
Checkpoint file example:
```json
{
  "processed_count": 50000,
  "last_offset": "abc123-uuid-...",
  "batch_number": 50
}
```

Endee has two payload buckets per record:
| Bucket | Purpose | Allowed types |
|---|---|---|
| filter | Used for filtering search results | str, int, float, bool only |
| meta | Stored metadata, not filterable | Any type, including list, dict |
Important: If any field in FILTER_FIELDS contains a list or dict value, Endee will reject the record with MDBX_BAD_VALSIZE. Always use scalar values in filter fields.
```
# ✓ Safe — scalar fields
FILTER_FIELDS=company,region,sector,page_number

# ✗ Will fail — 'product' is a list in this dataset
FILTER_FIELDS=company,product
```

If FILTER_FIELDS is empty, all payload fields go to filter. Fields with non-scalar values should always be excluded from FILTER_FIELDS and will automatically land in meta.
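The routing into the two buckets can be sketched as follows (a hypothetical helper, not the tool's actual code):

```python
SCALAR_TYPES = (str, int, float, bool)

def split_payload(payload: dict, filter_fields: set[str]) -> tuple[dict, dict]:
    """Route scalar fields named in filter_fields to `filter`; everything else to `meta`."""
    filter_bucket, meta_bucket = {}, {}
    for key, value in payload.items():
        if key in filter_fields and isinstance(value, SCALAR_TYPES):
            filter_bucket[key] = value
        else:
            # Non-scalar or unlisted fields land in meta,
            # avoiding MDBX_BAD_VALSIZE rejections on upsert.
            meta_bucket[key] = value
    return filter_bucket, meta_bucket
```

Note the type check: even a field listed in FILTER_FIELDS is diverted to meta when its value turns out to be a list or dict.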
Each migration runs a producer-consumer pipeline inside asyncio:
```
migrate()                    ← sync setup: connect, detect schema, create index
└── asyncio.run(async_migrate())
    ├── async_producer()     ← fetches batches from source into bounded queue
    └── async_consumer()     ← reads from queue, upserts to Endee, saves checkpoint
```
- Bounded queue (`MAX_QUEUE_SIZE=5`) prevents memory overflow — the producer pauses when the queue is full.
- All blocking SDK calls (Qdrant scroll, Milvus query, Endee upsert) run in `loop.run_in_executor()` so the event loop is never frozen.
- Parallel upsert — chunks within a batch are upserted concurrently via `asyncio.gather()`.
- Exponential backoff retry — failed chunks are retried up to 3 times (1s, 2s, 4s).
- Graceful shutdown — `SIGINT`/`SIGTERM` (Ctrl+C or `docker stop`) saves the checkpoint and exits cleanly.
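A stripped-down sketch of this pattern (illustrative only; `fetch_batch` and `upsert_chunk` stand in for the real SDK calls, and the small `chunk_size` stands in for `UPSERT_SIZE`):

```python
import asyncio

async def retry(coro_factory, attempts=3):
    """Exponential backoff: wait 1s, 2s between attempts; re-raise after the last."""
    for i in range(attempts):
        try:
            return await coro_factory()
        except Exception:
            if i == attempts - 1:
                raise
            await asyncio.sleep(2 ** i)

async def producer(queue, fetch_batch):
    """Fetch batches from the source and feed the bounded queue."""
    loop = asyncio.get_running_loop()
    offset = None
    while True:
        # Blocking source read runs in a worker thread, not on the event loop.
        batch, offset = await loop.run_in_executor(None, fetch_batch, offset)
        if not batch:
            break
        await queue.put(batch)  # blocks when full -> natural backpressure
    await queue.put(None)  # sentinel: no more batches

async def consumer(queue, upsert_chunk, save_checkpoint, chunk_size=2):
    """Drain the queue, upserting each batch's chunks concurrently."""
    loop = asyncio.get_running_loop()
    while (batch := await queue.get()) is not None:
        chunks = [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]
        await asyncio.gather(*(
            retry(lambda c=c: loop.run_in_executor(None, upsert_chunk, c))
            for c in chunks
        ))
        save_checkpoint(len(batch))  # checkpoint only after a successful upsert

async def migrate(fetch_batch, upsert_chunk, save_checkpoint, max_queue=5):
    queue = asyncio.Queue(maxsize=max_queue)
    await asyncio.gather(
        producer(queue, fetch_batch),
        consumer(queue, upsert_chunk, save_checkpoint),
    )
```

The `maxsize` on the queue is what turns the producer's `put` into backpressure: when the consumer falls behind, the producer simply waits instead of buffering the whole source collection in memory.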
If the migration hangs, the consumer has likely failed while the queue is full — the producer is then blocked in `queue.put()`. Kill the container and rerun. The fix is to add a queue drain in the consumer's failure path (see the source code comments).
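As a sketch, that drain might look like this (hypothetical; a complete fix would also need to signal the producer to stop fetching):

```python
import asyncio

def drain(queue: asyncio.Queue) -> int:
    """Discard everything buffered so a producer blocked in queue.put() unblocks."""
    dropped = 0
    while True:
        try:
            queue.get_nowait()
        except asyncio.QueueEmpty:
            return dropped
        dropped += 1
```

Called from the consumer's `except` block, this frees a queue slot immediately, so the producer's pending `put()` completes instead of deadlocking the pipeline.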
To debug a hang:
```bash
# Inside the container
pip install py-spy
py-spy dump --pid 1
```

MDBX_BAD_VALSIZE means a filter field contains a non-scalar value (usually a list). Remove it from FILTER_FIELDS — it will go to meta instead.
```
# If 'product' is a list:
FILTER_FIELDS=company,region,sector  # ← remove 'product'
```

If the Endee server runs out of memory (too many indexes open), delete unused indexes on the Endee server before retrying.
```bash
free -h       # check available RAM on the Endee server
docker stats  # check container memory usage
```

```
UserWarning: Qdrant client version 1.16.2 is incompatible with server version 1.13.6
```
Downgrade the client to match your server version, or add check_compatibility=False to the QdrantClient constructor. Migration will still work in most cases despite the warning.
```
Failed to resolve 'your-host%20'
```
Check TARGET_URL or SOURCE_URL in .env for trailing whitespace.
- Docker
- Source database accessible from the Docker network (`--network vector-net` or host network)
- Endee instance running and accessible
- Sufficient disk space for the checkpoint file (tiny — JSON, a few KB)
- Sufficient RAM for `MAX_QUEUE_SIZE × BATCH_SIZE` records in memory at once
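As a rough worked example of that RAM requirement (the 768-dim float32 vectors and ~1 KB payload are assumptions; substitute your own numbers):

```python
# Back-of-envelope peak memory for records buffered in the bounded queue.
dim = 768                  # assumed dense vector dimensionality
vector_bytes = dim * 4     # float32
payload_bytes = 1024       # assumed average payload size per record
record_bytes = vector_bytes + payload_bytes  # 4096 bytes

batch_size = 1000          # BATCH_SIZE
max_queue = 5              # MAX_QUEUE_SIZE
peak_bytes = max_queue * batch_size * record_bytes

print(f"~{peak_bytes / 2**20:.0f} MiB buffered at peak")  # ~20 MiB
```

Under these assumptions the defaults buffer only about 20 MiB, so `BATCH_SIZE` and `MAX_QUEUE_SIZE` can usually be raised well before memory becomes the bottleneck.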