2 changes: 2 additions & 0 deletions .gitignore
@@ -7,3 +7,5 @@ Desktop.ini
node_modules/
dist/
*.tgz
.metabase/
upload.log
78 changes: 55 additions & 23 deletions README.md
@@ -6,7 +6,7 @@ This repository contains the specification, examples, and a CLI that converts th

## Specification

The format is defined in **[core-spec/v1/spec.md](core-spec/v1/spec.md)** (v1.1.0). It covers entity keys, field types, folder structure, sampled field values, and the shape of each entity.
The format is defined in **[core-spec/v1/spec.md](core-spec/v1/spec.md)** (v1.0.3). It covers entity keys, field types, folder structure, sampled field values, and the shape of each entity.

Reference output for the Sample Database lives in **[examples/v1/](examples/v1/)** — both the raw `metadata.json` returned by the endpoint and the extracted YAML tree.

@@ -22,17 +22,39 @@ Reference output for the Sample Database lives in **[examples/v1/](examples/v1/)

Metadata is fetched on demand from a running Metabase instance via `GET /api/database/metadata`. The response is a flat JSON document with three arrays — `databases`, `tables`, and `fields` — streamed so that even warehouses with very large schemas can be exported without exhausting server memory.
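
At a glance, the streamed document has this rough shape. Only the three top-level arrays are guaranteed by the description above; the per-entity fields shown are illustrative placeholders, not part of the spec:

```ts
// Rough shape of the /api/database/metadata payload. The three top-level
// arrays are documented above; the entity fields listed here are
// illustrative only.
type MetadataExport = {
  databases: Array<{ id: number; name: string }>;
  tables: Array<{ id: number; db_id: number; name: string }>;
  fields: Array<{ id: number; table_id: number; name: string }>;
};
```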

Authenticate with either a session token (`X-Metabase-Session`) or an API key (`X-API-Key`):
Authenticate with an API key (`X-API-Key`) or session token (`X-Metabase-Session`).
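
A minimal authenticated request, sketched in TypeScript (swap the header for `X-Metabase-Session` if you have a session token instead):

```ts
// Minimal sketch of an authenticated metadata request; error handling
// beyond the status check is omitted.
const response = await fetch(
  `${process.env.METABASE_URL}/api/database/metadata`,
  { headers: { "X-API-Key": process.env.METABASE_API_KEY ?? "" } },
);
if (!response.ok) throw new Error(`Metadata fetch failed: ${response.status}`);
```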

### Downloading metadata

The CLI can fetch `metadata.json` and `field-values.json` and extract the YAML tree in one streaming pass:

```sh
curl "$METABASE_URL/api/database/metadata" \
-H "X-API-Key: $METABASE_API_KEY" \
-o metadata.json
export METABASE_API_KEY=...
bunx @metabase/database-metadata download-metadata "$METABASE_URL"
```

With no flags, the command writes:

- `.metabase/metadata.json`
- `.metabase/field-values.json`
- `.metabase/databases/` — extracted YAML tree

Flags override these defaults or opt out of individual steps:

| Flag | Default | Purpose |
|------|---------|---------|
| `--metadata <path>` | `.metabase/metadata.json` | Where to write the raw metadata JSON |
| `--field-values <path>` | `.metabase/field-values.json` | Where to write the raw field-values JSON |
| `--extract <folder>` | `.metabase/databases` | Where to extract the YAML tree |
| `--no-field-values` | — | Skip downloading field values |
| `--no-extract` | — | Skip YAML extraction |
| `--api-key <key>` | `METABASE_API_KEY` env var | API key |

Files are streamed to disk directly — responses are never fully buffered in memory, so multi-GB exports stay lean.
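
The technique, sketched with Bun's file API (an illustration of the approach, not the CLI's actual implementation):

```ts
// Sketch: pipe the HTTP body straight to disk. Bun.write accepts a Response
// and streams it, so the payload is never held in memory all at once.
const apiKey = process.env.METABASE_API_KEY ?? ""; // assumed to be set
const resp = await fetch(`${process.env.METABASE_URL}/api/database/metadata`, {
  headers: { "X-API-Key": apiKey },
});
await Bun.write(".metabase/metadata.json", resp);
```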

### Extracting metadata to YAML

The CLI turns that JSON into the human- and agent-friendly YAML tree described in the spec:
If you already have a `metadata.json` on disk (e.g. downloaded via `curl`), you can skip the download and extract directly:

```sh
bunx @metabase/database-metadata extract-metadata <input-file> <output-folder>
```
@@ -43,13 +65,9 @@ bunx @metabase/database-metadata extract-metadata <input-file> <output-folder>

### Extracting field values

Metabase keeps a sampled list of distinct values for each field that's low-cardinality enough to enumerate (the same list that powers filter dropdowns in the UI). Fetch it and extract it alongside the metadata:
Metabase keeps a sampled list of distinct values for each field that's low-cardinality enough to enumerate (the same list that powers filter dropdowns in the UI).

```sh
curl "$METABASE_URL/api/database/field-values" \
-H "X-API-Key: $METABASE_API_KEY" \
-o field-values.json

bunx @metabase/database-metadata extract-field-values <metadata-file> <field-values-file> <output-folder>
```

@@ -59,6 +77,28 @@ bunx @metabase/database-metadata extract-field-values <metadata-file> <field-val

One YAML file is written per field that has values. Fields with empty samples are skipped; field IDs not present in the metadata are reported as orphans and skipped. See the spec's [Field Values](core-spec/v1/spec.md#field-values) section for the on-disk shape and when agents should consult these files.
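
The orphan handling boils down to a set-membership check against the metadata's field IDs. A sketch of the idea, with assumed input shapes, not the extractor's actual code:

```ts
// Sketch: emit a values file only for non-empty samples whose field ID
// exists in the metadata; anything else is an orphan and gets skipped.
declare const metadata: { fields: Array<{ id: number }> }; // assumed shape
declare const fieldValues: Record<string, unknown[]>; // assumed shape

const knownFieldIds = new Set(metadata.fields.map((f) => f.id));
for (const [fieldId, values] of Object.entries(fieldValues)) {
  if (values.length === 0) continue; // empty sample: skip silently
  if (!knownFieldIds.has(Number(fieldId))) {
    console.warn(`Orphan field ${fieldId}: not in metadata, skipping`);
    continue;
  }
  // ...write one YAML file for this field...
}
```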

### Uploading metadata to a target instance

`upload-metadata` streams the JSON files previously written by `download-metadata` into a target Metabase instance, remapping numeric IDs across multiple NDJSON passes (see [metabase-api-contract.md](metabase-api-contract.md)):

```sh
export METABASE_API_KEY=...
bunx @metabase/database-metadata upload-metadata "$TARGET_METABASE_URL"
```
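
One remapping pass, sketched in TypeScript. The endpoint path and the `{old_id, new_id}` response lines match the mock server in this PR's tests; the variable names and surrounding plumbing are illustrative assumptions:

```ts
// Sketch: POST one entity type as NDJSON, then read back the server's
// {old_id, new_id} lines to build the ID map used to rewrite foreign keys
// in the next pass (tables point at databases, fields point at tables).
declare const targetUrl: string; // e.g. $TARGET_METABASE_URL
declare const apiKey: string;
declare const databases: Array<{ id: number }>;

const res = await fetch(`${targetUrl}/api/database/metadata/databases`, {
  method: "POST",
  headers: { "X-API-Key": apiKey, "Content-Type": "application/x-ndjson" },
  body: databases.map((d) => JSON.stringify(d)).join("\n"),
});
const idMap = new Map<number, number>();
for (const line of (await res.text()).split("\n").filter(Boolean)) {
  const parsed = JSON.parse(line) as { old_id: number; new_id: number };
  idMap.set(parsed.old_id, parsed.new_id);
}
```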

With no flags, it reads `.metabase/metadata.json` and `.metabase/field-values.json` — the same layout `download-metadata` writes by default.

| Flag | Default | Purpose |
|------|---------|---------|
| `--metadata <path>` | `.metabase/metadata.json` | Path to the metadata JSON to upload |
| `--field-values <path>` | `.metabase/field-values.json` | Path to the field-values JSON |
| `--no-field-values` | — | Skip uploading field values |
| `--api-key <key>` | `METABASE_API_KEY` env var | API key |

The source JSON files are streamed through `@streamparser/json-node` — they are never fully loaded into memory, so 100 GB+ exports upload fine. Rows are sent in batches of 2000 per HTTP POST (matching the server's per-transaction batch size) with HTTP keep-alive, so each request is one clean server-side transaction.
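
A batching sketch under the same assumptions (NDJSON body, one POST per 2000 rows, one acknowledgment line per row). It also illustrates the exit-code behavior described below; it is not the CLI's actual code:

```ts
// Sketch: accumulate streamed rows into batches of 2000 and flush each as a
// single NDJSON POST; a short acknowledgment count marks the run as failed.
declare const targetUrl: string;
declare const apiKey: string;
declare const rows: AsyncIterable<object>; // e.g. values from the stream parser

const BATCH_SIZE = 2000;
let batch: object[] = [];

async function flush(path: string): Promise<void> {
  if (batch.length === 0) return;
  const res = await fetch(`${targetUrl}${path}`, {
    method: "POST",
    headers: { "X-API-Key": apiKey, "Content-Type": "application/x-ndjson" },
    body: batch.map((row) => JSON.stringify(row)).join("\n"),
  });
  const acked = (await res.text()).split("\n").filter(Boolean).length;
  if (acked < batch.length) process.exitCode = 1; // partial import: fail CI
  batch = [];
}

for await (const row of rows) {
  batch.push(row);
  if (batch.length === BATCH_SIZE) await flush("/api/database/metadata/fields");
}
await flush("/api/database/metadata/fields"); // flush the final partial batch
```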

The command exits non-zero if any step reports row-level errors, or if the server acknowledges fewer rows than were sent in a batch (so CI can catch partial imports).

### Extracting the spec

The bundled spec can be extracted to any file — convenient for agents that need to read it locally:
@@ -112,25 +152,17 @@ cp .env.template .env

### 4. Fetch and extract on demand

With `.env` populated, the end-to-end flow is:
With `.env` populated, the end-to-end flow is a single command:

```sh
set -a; source .env; set +a

mkdir -p .metabase
curl -sf "$METABASE_URL/api/database/metadata" \
-H "X-API-Key: $METABASE_API_KEY" \
-o .metabase/metadata.json

curl -sf "$METABASE_URL/api/database/field-values" \
-H "X-API-Key: $METABASE_API_KEY" \
-o .metabase/field-values.json

rm -rf .metabase/databases
bunx @metabase/database-metadata extract-metadata .metabase/metadata.json .metabase/databases
bunx @metabase/database-metadata extract-field-values .metabase/metadata.json .metabase/field-values.json .metabase/databases
bunx @metabase/database-metadata download-metadata "$METABASE_URL"
```

That downloads `.metabase/metadata.json` and `.metabase/field-values.json`, and extracts the YAML tree into `.metabase/databases/`. Use `--no-field-values` or `--no-extract` to skip parts of the pipeline.

After this, tools and agents should read the YAML tree under `.metabase/databases/` — not `metadata.json` or `field-values.json`, which exist only as input to the extractors.

## Publishing to NPM
175 changes: 175 additions & 0 deletions bin/cli.test.ts
@@ -14,6 +14,8 @@ type RunResult = {
exitCode: number;
};

type UploadLine = { id: number };

function runCli(args: string[]): RunResult {
const proc = Bun.spawnSync({
cmd: ["bun", "run", CLI, ...args],
@@ -121,6 +123,179 @@ describe("cli", () => {
});
});

describe("upload-metadata", () => {
it("errors when <instance-url> is missing", () => {
const { stderr, exitCode } = runCli(["upload-metadata"]);
expect(exitCode).toBe(1);
expect(stderr).toContain("<instance-url>");
});

it("errors when no api key is set", () => {
const proc = Bun.spawnSync({
cmd: ["bun", "run", CLI, "upload-metadata", "http://127.0.0.1:1"],
cwd: REPO_ROOT,
env: { ...process.env, METABASE_API_KEY: "" },
});
expect(proc.exitCode).toBe(1);
expect(proc.stderr.toString()).toContain("API key is required");
});

it("uploads against a mock server end-to-end", async () => {
const server = Bun.serve({
port: 0,
async fetch(request) {
const url = new URL(request.url);
const body = await request.text();
const inLines = body
.split("\n")
.map((line) => line.trim())
.filter((line) => line.length > 0);

let response = "";
switch (url.pathname) {
case "/api/database/metadata/databases":
case "/api/database/metadata/tables":
case "/api/database/metadata/fields":
for (const line of inLines) {
const { id } = JSON.parse(line) as UploadLine;
response += JSON.stringify({ old_id: id, new_id: id }) + "\n";
}
break;
case "/api/database/metadata/fields/finalize":
for (const line of inLines) {
const { id } = JSON.parse(line) as UploadLine;
response += JSON.stringify({ id, ok: true }) + "\n";
}
break;
default:
return new Response("not found", { status: 404 });
}
return new Response(response, {
headers: { "Content-Type": "application/x-ndjson" },
});
},
});
try {
// NB: must use async Bun.spawn — spawnSync would block the parent
// event loop and deadlock with the in-process mock server.
const proc = Bun.spawn({
cmd: [
"bun",
"run",
CLI,
"upload-metadata",
`http://127.0.0.1:${server.port}`,
"--metadata",
EXAMPLE_INPUT,
"--no-field-values",
],
cwd: REPO_ROOT,
env: { ...process.env, METABASE_API_KEY: "ci-key" },
stdout: "pipe",
stderr: "pipe",
});
const [stdoutText, stderrText, exitCode] = await Promise.all([
new Response(proc.stdout).text(),
new Response(proc.stderr).text(),
proc.exited,
]);
expect(exitCode).toBe(0);
expect(stdoutText).toContain("Databases:");
expect(stdoutText).toContain("Finalized:");
expect(stdoutText).not.toContain("Values:");
expect(stderrText).toBe("");
} finally {
await server.stop();
}
});
});

describe("download-metadata", () => {
let workdir: string;

beforeEach(() => {
workdir = mkdtempSync(join(tmpdir(), "download-metadata-cli-"));
});

afterEach(() => {
rmSync(workdir, { recursive: true, force: true });
});

it("errors when <instance-url> is missing", () => {
const { stderr, exitCode } = runCli(["download-metadata"]);
expect(exitCode).toBe(1);
expect(stderr).toContain("<instance-url>");
});

// End-to-end streaming + path-override via a spawned CLI against a mock
// server. Defaults-in-cwd behaviour is covered by library tests in
// src/download-metadata.test.ts — attempting the same with cwd=tmpdir
// inside bun:test reliably hangs Bun.spawn (unrelated to CLI logic).
it("overrides output paths via flags and writes each file", async () => {
const EXAMPLE_METADATA_PATH = join(REPO_ROOT, EXAMPLE_INPUT);
const EXAMPLE_VALUES_PATH = join(REPO_ROOT, EXAMPLE_FIELD_VALUES);
const server = Bun.serve({
port: 0,
fetch(request) {
const url = new URL(request.url);
if (url.pathname === "/api/database/metadata") {
return new Response(Bun.file(EXAMPLE_METADATA_PATH));
}
if (url.pathname === "/api/database/field-values") {
return new Response(Bun.file(EXAMPLE_VALUES_PATH));
}
return new Response("not found", { status: 404 });
},
});
try {
const metadataFile = join(workdir, "custom-metadata.json");
const fieldValuesFile = join(workdir, "custom-values.json");
const extractFolder = join(workdir, "custom-databases");
const proc = Bun.spawn({
cmd: [
"bun",
"run",
CLI,
"download-metadata",
`http://127.0.0.1:${server.port}`,
"--metadata",
metadataFile,
"--field-values",
fieldValuesFile,
"--extract",
extractFolder,
],
cwd: REPO_ROOT,
env: { ...process.env, METABASE_API_KEY: "ci-key" },
stdout: "pipe",
stderr: "pipe",
});
const [stdout, stderr, exitCode] = await Promise.all([
new Response(proc.stdout).text(),
new Response(proc.stderr).text(),
proc.exited,
]);
expect(stderr).toBe("");
expect(exitCode).toBe(0);
expect(stdout).toContain("Metadata:");
expect(stdout).toContain("Field values:");
expect(stdout).toContain("Extracted to:");
expect(existsSync(metadataFile)).toBe(true);
expect(existsSync(fieldValuesFile)).toBe(true);
expect(
existsSync(
join(
extractFolder,
"Sample Database/schemas/PUBLIC/tables/ORDERS.yaml",
),
),
).toBe(true);
} finally {
await server.stop();
}
});
});

describe("extract-spec", () => {
let workdir: string;
