Merged
4 changes: 4 additions & 0 deletions .gitignore
@@ -10,6 +10,7 @@ venv/
data/
work/
results/
tmp/

# Other files and directories to ignore
.DS_Store
@@ -19,3 +20,6 @@ results/
.codex
.idea/
.vscode/
*.mdb
*.idb
*.lbdb
2 changes: 1 addition & 1 deletion AGENTS.md
@@ -29,7 +29,7 @@ Priorities, in order:
- Use Oxford commas in inline lists: "a, b, and c" not "a, b, c".
- Do not use em dashes. Restructure the sentence, or use a colon or semicolon instead.
- Avoid colorful adjectives and adverbs. Write "instruction decoder" not "elegant instruction decoder".
- Use noun phrases for checklist items, not imperative verbs. Write "opcode timing table" not "build the opcode timing table".
- Prefer using noun phrases for checklist items, not imperative verbs. Write "opcode timing table" not "build the opcode timing table".
- Headings in Markdown files must be in title case: "Build from Source" not "Build from source". Minor words (a, an, the, and, but, or, for, in, on,
at, to, by, of) stay lowercase unless they are the first word.

8 changes: 4 additions & 4 deletions Makefile
@@ -1,7 +1,7 @@
SCALE ?= 10000
SCALE ?= 100000
SEED ?= 0
ENGINES ?= issundb,ladybug,lance-graph,neo4j
SCALES ?= 1000,10000,100000
SCALES ?= 10000,100000,300000
MIN_ROUNDS ?= 20
TIME_BUDGET ?= 2.0
WARMUP ?= 3
@@ -10,10 +10,10 @@ WARMUP ?= 3

help:
@echo "graphbench targets:"
@echo " make gen SCALE=10000 # Generate a synthetic graph dataset (SCALE=number of Person nodes)"
@echo " make gen SCALE=100000 # Generate a synthetic graph dataset (SCALE=number of Person nodes)"
@echo " make engines # List which graph database engines are available"
@echo " make run [ENGINES=issundb,ladybug] # Run the benchmark for specified engines (default: all)"
@echo " make sweep [SCALES=1000,10000,100000] # Benchmark a series of scales and plot scaling curves"
@echo " make sweep [SCALES=10000,100000,300000] # Benchmark a series of scales and plot scaling curves"
@echo " make report # Generate a report from the results in the results/ directory"
@echo " make test # Run the unit tests"
@echo " make neo4j-up or neo4j-down # Start and stop the Neo4j container"
56 changes: 26 additions & 30 deletions README.md
@@ -17,6 +17,10 @@ against IssunDB.
| 3 | **Lance-graph** | [lancedb/lance-graph](https://github.com/lancedb/lance-graph) |
| 4 | **Neo4j** | [neo4j.com](https://neo4j.com/) |

> [!NOTE]
> Technically, `Lance-graph` is not a graph database but an in-memory graph query engine over Apache Arrow tables.
> In this repository, the terms `engine` and `graph engine` refer to `Lance-graph` and the graph databases in the table above.

### Schema and Queries

#### Benchmark Graph Dataset
@@ -57,36 +61,28 @@ See the [query definitions](graphbench/queries.py) for more details.

### Methodology

The benchmarks are created, so the published numbers are reproducible and hard to manipulate.
That's achieved by:

- **Engine-independent correctness oracle.** Every query is independently re-implemented in
[`graphbench/oracle.py`](graphbench/oracle.py) with polars over the raw Parquet dataset. Each engine's
result rows (over several parameter instantiations) are diffed against the oracle; no engine, including
IssunDB, is ever used as the reference. Mismatches are reported, never silently omitted from timing.
- **Process isolation.** Each engine is built and timed in its own worker process
([`graphbench/_worker.py`](graphbench/_worker.py)), so heap state, allocator fragmentation, and caches
never leak between engines, and the peak RSS reported per engine is attributable to that engine alone.
- **Statistics.** Per query: a cold run (first execution after build) is reported separately; timed rounds
run with the garbage collector disabled until both a minimum round count and a time budget are met; the
report shows median latency with a distribution-free 95% confidence interval for the median (the
order-statistic method, not a normal approximation on the mean), and the plot carries p25 to p75 whiskers.
- **Honest comparisons.** Engines are labeled by kind (embedded / in-memory / client-server) and ingestion
method; load times are never ranked across kinds, the client-server network round-trip caveat is stated
in every report, and Neo4j's server memory settings are captured from the live server into the results.
- **Indexing differences.** Index models differ by engine and cannot be fully equalized: IssunDB
auto-indexes every scalar property, Neo4j uses a uniqueness index on `id` plus an explicit range index
on the filtered column, Ladybug indexes only its primary key, and lance-graph holds no index. The report
spells this out so a filtered-query result is read as the engine's indexing model, not raw speed alone.
- **Determinism.** The dataset is generated from a single seed, byte-for-byte reproducible, with edge rows
shuffled so no engine gains a locality advantage from sorted insertion order. Hardware (CPU model, cores,
and RAM) is recorded in every result file.
- **Scaling.** `make sweep` benchmarks a series of dataset scales and plots median latency vs scale per
query, so results are never a single-scale snapshot.

Known limitations (deliberately out of scope so far): the suite measures single-threaded read-only latency;
no concurrent throughput and no write/update workloads. Engines may not all support every query
(e.g. variable-length patterns); unsupported queries show as `ERR` in the report rather than being dropped.
To ensure reproducible, objective, and comparable performance metrics, the benchmark suite follows these practices:

- **Correctness Oracle**: Every query is re-implemented in [`graphbench/oracle.py`](graphbench/oracle.py) using Polars. Engine result rows are diffed
against this oracle to verify correctness before timing, and mismatching queries fail validation.
- **Process Isolation**: Each engine executes queries in a dedicated worker process ([`graphbench/_worker.py`](graphbench/_worker.py)) to prevent
cache, allocator, and heap contamination.
- **Statistical Rigor**: Query timing runs with the garbage collector disabled until a minimum round count and a time budget are met. Reports display
the median latency, a distribution-free 95% confidence interval, and p25 to p75 error bars. Cold runs are measured and reported separately.
- **Categorization**: Engines are categorized by architecture (embedded, in-memory, or client-server) and ingestion method. Latency reports include
network round-trip caveats for client-server engines and log live server settings.
- **Index Disclosure**: Engine index models are documented (such as IssunDB auto-indexing, Neo4j range indexing, LadybugDB primary key indexing, and
Lance-graph no-indexing) to provide context for query latency differences.
- **Determinism**: Datasets are generated from a single seed, and edge rows are shuffled to eliminate insertion-order locality benefits. CPU, core
count, and RAM specifications are saved with every run.
- **Multi-Scale Scaling**: The suite measures scaling characteristics by running a sweep across dataset sizes rather than relying on a single-point
snapshot.
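
The Correctness Oracle practice above can be illustrated with a minimal sketch. It assumes rows are compared as order-insensitive multisets, which the text does not state. The helper name `diff_rows` and the sample rows are hypothetical. The real oracle in `graphbench/oracle.py` uses Polars over the Parquet dataset, while this sketch uses plain tuples to stay self-contained.

```python
from collections import Counter


def diff_rows(engine_rows, oracle_rows):
    """Compare two result sets as multisets of rows (hypothetical helper).

    Returns (missing, unexpected). Both are empty when the engine matches
    the oracle, regardless of row order.
    """
    engine = Counter(tuple(row) for row in engine_rows)
    oracle = Counter(tuple(row) for row in oracle_rows)
    missing = oracle - engine      # rows the oracle has that the engine lacks
    unexpected = engine - oracle   # rows the engine has that the oracle lacks
    return missing, unexpected


# Illustrative data only: same rows in a different order, with a duplicate.
oracle_rows = [(1, "ann"), (2, "bob"), (2, "bob")]
engine_rows = [(2, "bob"), (1, "ann"), (2, "bob")]

missing, unexpected = diff_rows(engine_rows, oracle_rows)
assert not missing and not unexpected
```

Counting rows rather than collecting them into a set keeps duplicate rows significant, so an engine that drops or repeats a row still fails validation.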

#### Scope and Limitations

The suite currently measures single-threaded read-only latency.
Concurrent throughput, write workloads, and update workloads are out of scope.
Unsupported queries are reported as errors rather than being omitted.
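
The Statistical Rigor practice in the Methodology list can be sketched as follows. This is an illustration under stated assumptions, not the suite's implementation. The function names `time_query` and `median_ci` are hypothetical. The defaults mirror the Makefile's `MIN_ROUNDS`, `TIME_BUDGET`, and `WARMUP` values, and the cold run is not modeled. The interval uses the standard order-statistic construction based on the Binomial(n, 1/2) distribution.

```python
import gc
import math
import statistics
import time


def time_query(run, min_rounds=20, time_budget=2.0, warmup=3):
    """Time `run` with the GC disabled until both stop conditions are met."""
    for _ in range(warmup):
        run()
    samples = []
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        start = time.perf_counter()
        # Keep going while either the round count or the time budget is unmet.
        while len(samples) < min_rounds or time.perf_counter() - start < time_budget:
            t0 = time.perf_counter()
            run()
            samples.append(time.perf_counter() - t0)
    finally:
        if gc_was_enabled:
            gc.enable()
    return samples


def median_ci(samples, confidence=0.95):
    """Distribution-free confidence interval for the median.

    Picks the largest rank k with P(B <= k - 1) <= alpha / 2 for
    B ~ Binomial(n, 1/2) and returns the k-th smallest and k-th largest values.
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    alpha = 1.0 - confidence
    k = 0
    cdf = 0.0
    for j in range(n // 2):
        cdf += math.comb(n, j) / 2 ** n
        if cdf > alpha / 2:
            break
        k = j + 1
    if k == 0:
        # Too few samples for the requested confidence: fall back to the
        # full range, whose coverage is below the nominal level.
        return xs[0], xs[-1]
    return xs[k - 1], xs[n - k]


# Tiny budget so the example finishes immediately.
samples = time_query(lambda: sum(range(1000)), min_rounds=20, time_budget=0.0)
low, high = median_ci(samples)
assert low <= statistics.median(samples) <= high
```

The binomial sum is recomputed with exact integers, which is simple but slow for very large round counts. A production version would likely use an incremental or approximate quantile.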

> [!IMPORTANT]
> Benchmarking different systems (with different design philosophies, architectures, feature sets, etc.) is not straightforward.
2 changes: 1 addition & 1 deletion graphbench/engines/issundb_engine.py
@@ -36,7 +36,7 @@ def __init__(self, schema: Schema, workdir: Path):
        self._db_path = workdir / "social.issundb"
        if self._db_path.exists():
            shutil.rmtree(self._db_path)
        self._db = IssunDB(str(self._db_path))
        self._db = IssunDB(str(self._db_path), map_size_gb=16)

    @classmethod
    def probe(cls) -> EngineInfo:
2 changes: 1 addition & 1 deletion graphbench/engines/neo4j_engine.py
@@ -72,7 +72,7 @@ def probe(cls) -> EngineInfo:
    def _reset(self) -> None:
        # Batched delete so a wipe at large scales does not exhaust the heap.
        self._session.run(
            "MATCH (n) CALL { WITH n DETACH DELETE n } IN TRANSACTIONS OF 50000 ROWS"
            "MATCH (n) CALL (n) { DETACH DELETE n } IN TRANSACTIONS OF 50000 ROWS"
        )
        for label in self.schema.nodes:
            self._session.run(
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -8,15 +8,15 @@ dependencies = [
"pyarrow>=17.0",
"matplotlib>=3.9",
"polars>=1.0",
"issundb>=0.1.0a8",
"issundb>=0.1.0a9",
"patchelf>=0.17.2",
]

[project.optional-dependencies]
ladybug = ["ladybug>=0.17"]
lance-graph = ["lance-graph>=0.5"]
neo4j = ["neo4j>=5.0"]
all = ["ladybug>=0.15", "lance-graph>=0.5", "neo4j>=5.0"]
all = ["ladybug>=0.17", "lance-graph>=0.5", "neo4j>=5.0"]

[dependency-groups]
dev = ["pytest>=8.0"]
26 changes: 13 additions & 13 deletions uv.lock

Generated file; diff not rendered by default.