A TypeScript library for providing insights from git commit history.
- Actionable insights
- Fast: ~700ms for 100,000 commits (fetching the git log itself is the slow part)
- Follows file renames and removals
- Optimized for CI
- Percentile-based classification — self-calibrating thresholds that work across any codebase size
- Composite risk scoring — weighted multi-metric risk scores per file
- Integrated (very basic) code complexity engine
- Bring your own code complexity score
- Add custom metrics using full temporal history
Existing git analysis tools (code-maat, git-of-theseus, Hercules, etc.) are great for reports but feel heavy as a backend for dev-tools. This library is designed to be lightweight, fast, and embeddable.
Tip: Focus on recent history (6-9 months). While the library handles renames and long histories correctly, older data tends to add noise.
```shell
npm install git-forensics
```

```typescript
import { simpleGit } from 'simple-git';
import { computeForensics } from 'git-forensics';

const git = simpleGit('/path/to/repo');
const forensics = await computeForensics(git);

forensics.hotspots;         // Files changed most often
forensics.churn;            // Code volatility (lines added/deleted)
forensics.coupledPairs;     // Hidden dependencies
forensics.couplingRankings; // Architectural hubs
forensics.codeAge;          // Stale code detection
forensics.ownership;        // Knowledge silos
forensics.communication;    // Developer coordination needs
forensics.topContributors;  // Per-file contributor breakdown
```

Running `computeForensics` on a repository returns structured data across all metrics (a full example output appears at the end of this document).
Passing the result to `generateInsights` produces actionable alerts:

```jsonc
[
  {
    "file": "src/core/engine.ts",
    "type": "hotspot",
    "severity": "critical",
    "data": {
      "type": "hotspot",
      "revisions": 64,
      "rank": 2,
      "percentile": 95,
    },
    "fragments": {
      "title": "Hotspot",
      "finding": "64 revisions (P95), ranked #2 in repository",
      "risk": "Top-ranked churn file — prioritize for refactoring or test hardening",
      "suggestion": "Consider breaking into smaller modules or adding test coverage",
    },
  },
  {
    "file": "src/core/engine.ts",
    "type": "ownership-risk",
    "severity": "critical",
    "data": {
      "type": "ownership-risk",
      "fractalValue": 0.18,
      "authorCount": 7,
      "mainDev": "alice",
      "percentile": 92,
    },
    "fragments": {
      "title": "Fragmented Ownership",
      "finding": "7 contributors, fragmentation score 0.18 (P92)",
      "risk": "Diffuse ownership slows review cycles and increases merge conflicts",
      "suggestion": "Request review from alice (primary contributor)",
    },
  },
  // ... insights generated for each metric that exceeds thresholds
]
```

`generateInsights` transforms metrics into alerts with a severity (`warning`, `critical`) and human-readable fragments (`title`, `finding`, `risk`, `suggestion`).
Insights use percentile-based thresholds — a file is flagged based on where it ranks relative to other files in the same repository. This makes thresholds self-calibrating across codebases of any size.
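For intuition, percentile ranking can be sketched in a few lines. This is an illustrative midpoint formula, not necessarily the library's exact implementation (the exported `percentileRank` is shown further down):

```typescript
// Illustrative sketch of percentile ranking (not necessarily the
// library's exact algorithm): a value's rank is the share of values
// strictly below it, counting ties at half weight.
function percentileRankSketch(value: number, all: number[]): number {
  const below = all.filter((v) => v < value).length;
  const equal = all.filter((v) => v === value).length;
  return ((below + equal / 2) / all.length) * 100;
}

// A file with 64 revisions in a repo where most files have fewer
// lands in a high percentile regardless of repository size.
const sampleRevisions = [1, 2, 2, 3, 5, 8, 12, 20, 40, 64];
percentileRankSketch(64, sampleRevisions); // 95
```

Because the rank is relative to the repository's own distribution, the same cutoff (say P90) flags roughly the same fraction of files in a 100-file project and a 10,000-file monorepo.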
| Question | Metric | Insight triggers when |
|---|---|---|
| Where's the riskiest code? | `hotspots` | Revisions in P75+ (warning) or P90+ (critical) |
| What keeps getting rewritten? | `churn` | Churn in P75+ or P90+ |
| What hidden dependencies exist? | `coupledPairs` | ≥70% co-change rate (absolute, not percentile) |
| What has ripple effects? | `couplingRankings` | Coupling score in P75+ or P90+ |
| What's been forgotten? | `codeAge` | Age in P75+ or P90+ |
| Who owns what? Any knowledge silos? | `ownership` | ≥3 authors, fragmentation in P75+ or P90+ |
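As a hedged sketch of what a co-change rate means (the library's exact formula may differ), shared commits can be related to the revision count of the less frequently changed file:

```typescript
// Hedged sketch: one common definition of a co-change rate is shared
// commits divided by the smaller of the two files' revision counts.
// The library's exact couplingPercent formula may differ.
function coChangeRate(coChanges: number, rev1: number, rev2: number): number {
  return (coChanges / Math.min(rev1, rev2)) * 100;
}

// 34 shared commits between files with 87 and 41 revisions:
coChangeRate(34, 87, 41); // ~83 — above the 70% default threshold
```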
All thresholds are overridable — pass a partial thresholds object and only the values you specify will change:
```typescript
const insights = generateInsights(forensics, {
  thresholds: {
    hotspot: { warning: 80, critical: 95 }, // percentile cutoffs
    churn: { warning: 80 },
    staleCode: { warning: 60, critical: 85 },
    coupling: { minPercent: 80 }, // stays absolute — not percentile-based
    ownershipRisk: { warning: 70, critical: 90, minAuthors: 4 },
    couplingScore: { warning: 80, critical: 95 },
  },
});
```

The analysis pipeline has its own configurable thresholds that control what data is collected:
```typescript
const forensics = await computeForensics(git, {
  maxFilesPerCommit: 50,  // exclude large commits from coupling analysis (default: 50)
  minCoChanges: 3,        // minimum co-changes to report a coupled pair (default: 3)
  minCouplingPercent: 30, // minimum coupling % to report a pair (default: 30)
  minSharedEntities: 2,   // minimum shared files for communication pairs (default: 2)
});
```

These options are also available on `computeForensicsFromData()`.
`forensics.stats` contains the complete temporal history: every commit, by every author, for every file. Access `stats.fileStats[file].byAuthor`, `authorContributions`, `nameHistory`, etc. to build custom metrics such as temporal histograms, expertise scores, or handoff detection.
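For example, a minimal custom metric might compute the main developer's share of commits from a per-file author breakdown. This sketch assumes `byAuthor` maps author names to commit counts; check the actual `stats` types in your version:

```typescript
// Hypothetical sketch of a custom metric built from the temporal
// stats. Assumes byAuthor maps author name -> commit count.
type ByAuthor = Record<string, number>;

function mainDevShare(byAuthor: ByAuthor): { mainDev: string; share: number } {
  const entries = Object.entries(byAuthor);
  const total = entries.reduce((sum, [, n]) => sum + n, 0);
  // Pick the author with the most commits on this file.
  const [mainDev, commits] = entries.reduce((max, e) => (e[1] > max[1] ? e : max));
  return { mainDev, share: commits / total };
}

mainDevShare({ alice: 12, bob: 3, carol: 1 }); // { mainDev: 'alice', share: 0.75 }
```

A low share across many files for the same author is one possible signal for handoff or bus-factor analysis.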
`computeRiskScores` produces a single 0-100 risk score per file by combining percentile ranks across all metrics with configurable weights:

```typescript
import { computeRiskScores } from 'git-forensics';

const scores = computeRiskScores(forensics);
// [
//   { file: 'src/core/engine.ts', riskScore: 87.5, breakdown: { revisions: 22.5, churn: 25, ownershipRisk: 18, age: 12, couplingScore: 10 } },
//   { file: 'src/api/routes.ts', riskScore: 72.0, breakdown: { ... } },
//   ...
// ]
```

Default weights:
| Metric | Weight |
|---|---|
| Revisions | 0.25 |
| Churn | 0.25 |
| Ownership Risk | 0.20 |
| Age | 0.15 |
| Coupling Score | 0.15 |
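The combination itself is a plain weighted sum: each metric's percentile (0-100) is multiplied by its weight, and the parts add up to the composite score. A minimal sketch, assuming the weights sum to 1:

```typescript
// Sketch of the weighted combination behind a composite risk score:
// each per-metric percentile contributes weight * percentile.
const defaultWeights = { revisions: 0.25, churn: 0.25, ownershipRisk: 0.2, age: 0.15, couplingScore: 0.15 };

function compositeScore(percentiles: Record<keyof typeof defaultWeights, number>): number {
  return (Object.keys(defaultWeights) as (keyof typeof defaultWeights)[]).reduce(
    (sum, metric) => sum + defaultWeights[metric] * percentiles[metric],
    0
  );
}

compositeScore({ revisions: 90, churn: 100, ownershipRisk: 90, age: 80, couplingScore: 60 });
// ≈ 86.5 (22.5 + 25 + 18 + 12 + 9)
```

The `breakdown` field in the example above exposes exactly these per-metric contributions, so you can see which metric drives a file's score.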
Override weights to match your priorities:

```typescript
const scores = computeRiskScores(forensics, {
  revisions: 0.4,
  churn: 0.3,
  ownershipRisk: 0.1,
  age: 0.1,
  couplingScore: 0.1,
});
```

`extractFileMetrics` flattens forensics into per-file rows for storage. Pass `includePercentiles: true` to enrich each row with percentile ranks and a composite risk score:
```typescript
import { extractFileMetrics } from 'git-forensics';

const metrics = extractFileMetrics(forensics, { includePercentiles: true });
// Each entry includes:
// {
//   file, revisions, ageMonths, churn, fractalValue, ...
//   percentiles: { revisions: 90, churn: 75, ownershipRisk: 85, ageMonths: 60, couplingScore: 40 },
//   riskScore: 72.5,
// }
```

The underlying percentile functions are exported for building custom scoring:
```typescript
import {
  percentileRank,
  createPercentileRanker,
  createInvertedPercentileRanker,
} from 'git-forensics';

// One-off calculation
percentileRank(50, [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]); // 45

// Reusable ranker for repeated lookups
const rank = createPercentileRanker([10, 20, 30, 40, 50]);
rank(30); // 50
rank(50); // 90

// Inverted ranker (lower values = higher percentile)
const riskRank = createInvertedPercentileRanker([0.1, 0.3, 0.5, 0.7, 0.9]);
riskRank(0.1); // 90 (lowest value = highest risk)
```

git-forensics separates commit analysis from static code analysis. It provides optional complexity helpers for convenience (using indent-complexity).
It is recommended that you use a language-aware complexity scorer and pass the results to `computeForensics`.
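For intuition, an indentation-based proxy (similar in spirit to what indent-complexity measures; this sketch is not the library's implementation) can be as simple as averaging nesting depth:

```typescript
// Sketch of a whitespace-based complexity proxy (not the library's
// implementation): deeper average indentation suggests more nesting.
function indentComplexity(source: string, tabSize = 2): number {
  const lines = source.split("\n").filter((l) => l.trim().length > 0);
  if (lines.length === 0) return 0;
  const depths = lines.map((l) => {
    const indent = l.match(/^[ \t]*/)![0];
    // Count a tab as tabSize spaces, then convert to nesting levels.
    return [...indent].reduce((d, ch) => d + (ch === "\t" ? tabSize : 1), 0) / tabSize;
  });
  return depths.reduce((a, b) => a + b, 0) / depths.length; // mean nesting depth
}

indentComplexity("if (a) {\n  if (b) {\n    doIt();\n  }\n}"); // 0.8
```

A language-aware scorer (cyclomatic or cognitive complexity from a real parser) will be more accurate; this proxy only shows why indentation is a cheap, language-agnostic fallback.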
Loop over insights and build a PR comment or CI annotation:
```typescript
const insights = generateInsights(forensics, { minSeverity: 'warning' });

for (const insight of insights) {
  const prefix = insight.severity === 'critical' ? '[CRITICAL]' : '[WARNING]';
  console.log(`${prefix} ${insight.file} - ${insight.fragments.title}`);
  console.log(`  ${insight.fragments.finding}`);
  console.log(`  ${insight.fragments.suggestion}\n`);
}
```

For very large repos, store the `computeForensics` result between runs and rehydrate with `generateInsights` — no git scan needed:
```typescript
import { generateInsights, getChangedFiles } from 'git-forensics';

// Fetch pre-computed forensics from your server/cache
const forensics = await fetch('https://your-server/api/forensics?repo=org/repo').then((r) =>
  r.json()
);

// Generate insights only for files changed in the PR
const changedFiles = await getChangedFiles(git, 'origin/main');
const insights = generateInsights(forensics, { files: changedFiles, minSeverity: 'warning' });
```

For environments without direct git access, use `computeForensicsFromData()` with pre-fetched git data:
```typescript
import { computeForensicsFromData, gitLogDataSchema, validateGitLogData } from 'git-forensics';

// Data must match the following format
const data = {
  log: {
    all: [
      {
        hash: 'abc123',
        date: '2025-01-15T10:00:00Z',
        author_name: 'Alice',
        message: 'Add feature',
        diff: {
          files: [
            { file: 'src/app.ts', insertions: 50, deletions: 10 },
            { file: 'src/utils.ts', insertions: 20, deletions: 5 },
          ],
        },
      },
      // ... more commits
    ],
  },
  trackedFiles: 'src/app.ts\nsrc/utils.ts\nsrc/index.ts', // from `git ls-files`
};

// Print the JSON Schema if needed
console.log(gitLogDataSchema); // JSON Schema object

// Validate before processing
validateGitLogData(data); // throws if invalid

const forensics = computeForensicsFromData(data);
```

v2.0.0 replaces absolute thresholds with percentile-based classification. Key changes:
- `InsightThresholds` values are now percentile cutoffs (0-100), not raw metric values
- `InsightData` variants (except `coupling`) include a `percentile` field
- Stale-code severity changed from `info`/`warning` to `warning`/`critical`
- Finding strings now include `(Pxx)` percentile annotations
- Generator function signatures gained a `percentileRank` parameter (affects direct generator importers)
- New exports: `computeRiskScores`, `DEFAULT_RISK_WEIGHTS`, `percentileRank`, `createPercentileRanker`, `createInvertedPercentileRanker`
- New types: `PercentileThresholds`, `RiskWeights`, `FileRiskScore`, `ExtractFileMetricsOptions`
Based on concepts from Adam Tornhill's *Your Code as a Crime Scene* and *Software Design X-Rays*.
MIT
Example `computeForensics` output:

```jsonc
{
  "analyzedCommits": 842,
  "dateRange": { "from": "2024-03-10", "to": "2025-01-15" },
  "metadata": { "totalFilesAnalyzed": 134, "totalAuthors": 12 },
  "hotspots": [
    { "file": "src/api/routes.ts", "revisions": 87, "exists": true },
    { "file": "src/core/engine.ts", "revisions": 64, "exists": true },
  ],
  "coupledPairs": [
    {
      "file1": "src/api/routes.ts",
      "file2": "src/api/middleware.ts",
      "couplingPercent": 82,
      "coChanges": 34,
    },
  ],
  "ownership": [
    {
      "file": "src/core/engine.ts",
      "mainDev": "alice",
      "ownershipPercent": 34,
      "fractalValue": 0.18,
      "authorCount": 7,
    },
  ],
  // ... plus churn, codeAge, couplingRankings, communication, topContributors
}
```