Vulnhalla automates the complete security analysis pipeline:
- Fetching repositories of a given programming language from GitHub
- Downloading their corresponding CodeQL databases (if available)
- Running CodeQL queries on those databases to detect security or code-quality issues
- Post-processing the results with an LLM (ChatGPT, Gemini, etc.) to classify and filter issues
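As a rough mental model, the four stages chain together like this; the function names below are placeholders for illustration, not Vulnhalla's actual API (the real entry point is `src/pipeline.py`, shown later):

```python
# Placeholder sketch of the four-stage pipeline; illustrative only.
from typing import List

def fetch_repositories(language: str) -> List[str]:
    """Stage 1: list candidate GitHub repos for a language (placeholder)."""
    return ["redis/redis"]

def download_codeql_database(repo: str) -> str:
    """Stage 2: fetch the repo's CodeQL database, if one exists (placeholder)."""
    return f"output/databases/c/{repo}"

def run_codeql_queries(db_path: str) -> List[dict]:
    """Stage 3: run security/code-quality queries against the database (placeholder)."""
    return [{"issue": "example-issue", "db": db_path}]

def classify_with_llm(finding: dict) -> str:
    """Stage 4: ask the configured LLM to classify the finding (placeholder)."""
    return "True Positive"

for repo in fetch_repositories("c"):
    db = download_codeql_database(repo)
    for finding in run_codeql_queries(db):
        print(repo, finding["issue"], classify_with_llm(finding))
```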
Before starting, ensure you have:
- **Python 3.10 – 3.13** (Python 3.11 or 3.12 recommended)
  - Python 3.14+ is not supported (this tool depends on grpcio, which does not support Python 3.14+)
  - Download from python.org
- **CodeQL CLI**
  - Download from CodeQL CLI releases
  - Make sure `codeql` is in your PATH, or set the path in `.env` (see Step 2)
- **(Optional) GitHub API token**
  - For higher rate limits when downloading databases
  - Get one from GitHub Settings > Tokens
- **LLM API key**
  - OpenAI, Azure, or Gemini API key (depending on your provider)
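A quick way to sanity-check the first two prerequisites (a minimal sketch; it only verifies the interpreter version and that `codeql` resolves on PATH):

```python
# Quick prerequisite check: Python version window and codeql on PATH.
import shutil
import sys

if not ((3, 10) <= sys.version_info[:2] <= (3, 13)):
    print(f"Unsupported Python {sys.version.split()[0]}; use 3.10-3.13 (3.11/3.12 recommended)")

codeql = shutil.which("codeql")
print(f"codeql found at: {codeql}" if codeql else "codeql not on PATH; set CODEQL_PATH in .env")
```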
All configuration is in a single file: `.env`
1. Clone the repository:

   ```bash
   git clone https://github.com/cyberark/Vulnhalla
   cd Vulnhalla
   ```

2. Copy `.env.example` to `.env`:

   ```bash
   cp .env.example .env
   ```

3. Edit `.env` and fill in your values:
Example for OpenAI:

```bash
CODEQL_PATH=codeql
GITHUB_TOKEN=ghp_your_token_here
PROVIDER=openai
MODEL=gpt-4o
OPENAI_API_KEY=your-api-key-here
LLM_TEMPERATURE=0.2
LLM_TOP_P=0.2

# Optional: Logging Configuration
LOG_LEVEL=INFO              # DEBUG, INFO, WARNING, ERROR
LOG_FILE=                   # Optional: path to log file (e.g., logs/vulnhalla.log)
LOG_FORMAT=default          # default or json
# LOG_VERBOSE_CONSOLE=false # If true, WARNING/ERROR use full format (timestamp - logger - level - message)
```

📖 For a complete configuration reference, see Configuration Reference below, which covers all supported providers (OpenAI, Azure, Gemini), required/optional variables, and detailed examples.
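Vulnhalla uses `python-dotenv` (see Dependencies below), so these values are read from `.env` into the process environment at runtime. A minimal sketch of how such values load — not Vulnhalla's internal config code:

```python
# Minimal sketch: load .env and read a few of the variables above.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

provider = os.getenv("PROVIDER", "openai")
model = os.getenv("MODEL", "gpt-4o")
temperature = float(os.getenv("LLM_TEMPERATURE", "0.2"))
print(f"provider={provider} model={model} temperature={temperature}")
```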
Optional: Create a virtual environment:

```bash
# (Optional) Create virtual environment
python3 -m venv venv
venv\Scripts\activate                       # On Windows
# On macOS/Linux: source venv/bin/activate
```

Option 1: Automated Setup (Recommended)

```bash
python setup.py
```

Note: The virtual environment is optional. If `venv/` exists, setup will use it; otherwise, it installs into your current Python environment.
The setup script will:
- Install Python dependencies from `requirements.txt`
- Initialize CodeQL packs
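Conceptually, the automated setup runs the same commands as the manual path below; a rough sketch of the idea (illustrative, not the actual `setup.py`):

```python
# Rough sketch of the automated setup flow; illustrative, not the real setup.py.
import subprocess
import sys

# Install Python dependencies into the active environment.
subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], check=True)

# Initialize the CodeQL packs used by the queries.
for pack_dir in ("data/queries/cpp/tools", "data/queries/cpp/issues"):
    subprocess.run(["codeql", "pack", "install"], cwd=pack_dir, check=True)
```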
Option 2: Manual Setup

If you prefer to install manually:

```bash
pip install -r requirements.txt
cd data/queries/cpp/tools
codeql pack install
cd ../issues
codeql pack install
cd ../../../..
```

Option 1: Using the Unified Pipeline
Run the complete pipeline with a single command:
```bash
# Analyze a specific repository
python src/pipeline.py redis/redis

# Analyze top 100 repositories
python src/pipeline.py
```

This will automatically:
- Fetch CodeQL databases
- Run CodeQL queries on all downloaded databases
- Analyze results with LLM and save to `output/results/`
- Open the UI to browse results
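Because the pipeline is a plain CLI, analyzing several specific repositories is just a loop over subprocess calls; a minimal sketch:

```python
# Sketch: run the pipeline CLI for a fixed list of repositories.
import subprocess
import sys

repos = ["redis/redis", "videolan/vlc"]  # any org/repo pairs
for repo in repos:
    subprocess.run([sys.executable, "src/pipeline.py", repo], check=True)
```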
Option 2: Using the Example Script
Run the end-to-end example:
```bash
python examples/example.py
```

This will:
- Fetch CodeQL databases for `videolan/vlc` and `redis/redis`
- Run CodeQL queries on all downloaded databases
- Analyze results with LLM and save to `output/results/`
Vulnhalla includes a full-featured User Interface for browsing and exploring analysis results.
```bash
python src/ui/ui_app.py
# or
python examples/ui_example.py
```

The UI displays a two-panel top area with a controls bar at the bottom:
Top Area (side-by-side, resizable):
- Left Panel (Issues List):
  - DataTable showing: ID, Repo, Issue Name, File, LLM decision, Manual decision
  - Issues count and sort indicator
  - Search input box at the bottom; updates as you type (case-insensitive)
- Right Panel (Details):
  - LLM Decision Section: Shows the LLM's classification (True Positive, False Positive, or Needs More Data)
  - Metadata Section: Issue name, Repo, File, Line, Type, Function name
  - Code Section:
    - Initial Code Context (the first code snippet the LLM saw)
    - Additional Code (code the LLM requested during the conversation); only shown if additional code exists
    - Vulnerable line highlighted in red
  - Summary Section: The LLM's final answer/decision
  - Manual Decision Select: Dropdown at the bottom to set a manual verdict (True Positive, False Positive, Uncertain, or Not Set)
Bottom Controls Bar:
- Language: C (only language currently supported)
- Filter by LLM decision dropdown: All, True Positive, False Positive, Needs more Info to decide
- Action buttons: Refresh, Run Analysis
- Key bindings help text
- `↑`/`↓` - Navigate issue list (row-by-row)
- `Tab`/`Shift+Tab` - Switch focus between panels
- `Enter` - Show details for selected issue
- `/` - Focus search input box (in left panel)
- `Esc` - Clear search and return focus to issues table
- `r` - Reload results from disk
- `[`/`]` - Resize left/right panels (adjust split position)
- `q` - Quit application
- Click any column header to sort by that column
- Default sorting: by Repo (ascending), then by ID (ascending)
- Draggable divider between Issues List and Details panels
- Mouse: Click and drag the divider to resize
- Keyboard: Use `[` to move the divider left, `]` to move it right
- Split position is remembered during the session
After running the pipeline, results are organized in `output/results/<LANG>/<ISSUE_TYPE>/`:

```
output/results/c/Copy_function_using_source_size/
├── 1_raw.json     # Original CodeQL issue data
├── 1_final.json   # LLM conversation and classification
├── 2_raw.json
├── 2_final.json
└── ...
```
Each `*_final.json` contains:
- Full LLM conversation (system prompts, user messages, assistant responses, tool calls)
- Final status code (1337 = vulnerable, 1007 = secure, 7331/3713 = needs more info)
Each `*_raw.json` contains:
- Original CodeQL issue data
- Function context
- Database path (includes org/repo information: `output/databases/<LANG>/<ORG>/<REPO>`)
- Issue location
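Given this layout, pairing each raw finding with its LLM verdict file is a small `pathlib` walk; a minimal sketch (the JSON schemas are only described loosely above, so this just lists the pairs):

```python
# Sketch: pair N_raw.json / N_final.json files under output/results/.
from pathlib import Path

results = Path("output/results")
for raw in sorted(results.glob("*/*/*_raw.json")):
    final = raw.with_name(raw.name.replace("_raw.json", "_final.json"))
    status = "ok" if final.exists() else "missing final"
    print(f"{raw.parent.name}: {raw.name} -> {final.name} ({status})")
```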
- **CodeQL CLI not found:** Set `CODEQL_PATH` in your `.env` file to the full path of your CodeQL executable. On Windows, the path must end with `.cmd` (e.g., `C:\path\to\codeql\codeql.cmd`).
- **GitHub rate limits:** Set `GITHUB_TOKEN` in your `.env` file (get a token from https://github.com/settings/tokens).
- **LLM issues:** Check that the API keys in your `.env` file match your selected provider.
- **Import errors in UI:** Make sure you're running from the project root directory, or use `python examples/ui_example.py`, which handles path setup.
All configuration is managed through environment variables in your `.env` file. Here's a complete reference:
| Variable | Required For | Description |
|---|---|---|
| `CODEQL_PATH` | All | Path to CodeQL executable. Defaults to `codeql` if CodeQL is in PATH. Use the full path if not in PATH (e.g., `C:\path\to\codeql\codeql.cmd` on Windows) |
| `PROVIDER` | All | LLM provider: `openai`, `azure`, or `gemini` |
| `MODEL` | All | Model name (e.g., `gpt-4o`, `gpt-4-turbo`, `gemini-2.5-flash`) |
OpenAI:
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | Your OpenAI API key from platform.openai.com |
Azure OpenAI:
| Variable | Description |
|---|---|
| `AZURE_OPENAI_API_KEY` or `AZURE_API_KEY` | Your Azure OpenAI API key |
| `AZURE_OPENAI_ENDPOINT` or `AZURE_API_BASE` | Your Azure OpenAI endpoint URL (e.g., https://your-resource.openai.azure.com) |
| `AZURE_OPENAI_API_VERSION` or `AZURE_API_VERSION` | API version (default: `2024-08-01-preview`) |
Gemini (Google):
| Variable | Description |
|---|---|
| `GOOGLE_API_KEY` | Your Google API key from Google AI Studio |
| Variable | Default | Description |
|---|---|---|
| `GITHUB_TOKEN` | - | GitHub API token for higher rate limits. Get from GitHub Settings > Tokens |
| `LLM_TEMPERATURE` | `0.2` | LLM temperature (0.0-2.0). Lower = more deterministic. Recommended: keep at 0.2 |
| `LLM_TOP_P` | `0.2` | LLM top-p sampling (0.0-1.0). Lower = more focused. Recommended: keep at 0.2 |
| `LOG_LEVEL` | `INFO` | Logging level: DEBUG, INFO, WARNING, or ERROR. Controls verbosity of console output |
| `LOG_FILE` | - | Optional path to log file (e.g., `logs/vulnhalla.log`). If set, logs are written to both console and file. File logging uses DEBUG level for detailed output |
| `LOG_FORMAT` | `default` | Log format style: `default` (human-readable) or `json` (structured JSON) |
| `LOG_VERBOSE_CONSOLE` | `false` | If true, WARNING/ERROR/CRITICAL use the full format (timestamp - logger - level - message). By default, WARNING/ERROR use a simple format (LEVEL - message) and INFO is always minimal (message only) |
| `THIRD_PARTY_LOG_LEVEL` | `ERROR` | Log level for third-party libraries (LiteLLM, urllib3, requests). Options: DEBUG, INFO, WARNING, ERROR. The default suppresses most third-party noise |
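For orientation, this is roughly how such variables typically map onto Python's standard `logging` setup; an illustrative sketch, not Vulnhalla's actual logging code:

```python
# Illustrative sketch: wire LOG_LEVEL / LOG_FILE / LOG_FORMAT into logging.
import logging
import os

level = os.getenv("LOG_LEVEL", "INFO").upper()
use_json = os.getenv("LOG_FORMAT", "default") == "json"
fmt = '{"level": "%(levelname)s", "msg": "%(message)s"}' if use_json else "%(levelname)s - %(message)s"

handlers = [logging.StreamHandler()]
log_file = os.getenv("LOG_FILE")
if log_file:
    handlers.append(logging.FileHandler(log_file))  # per the table, file logging is more verbose

logging.basicConfig(level=level, format=fmt, handlers=handlers)

# Quiet down third-party libraries, as THIRD_PARTY_LOG_LEVEL describes.
for name in ("LiteLLM", "urllib3", "requests"):
    logging.getLogger(name).setLevel(os.getenv("THIRD_PARTY_LOG_LEVEL", "ERROR"))
```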
⚠️ Important: Do not increase `LLM_TEMPERATURE` or `LLM_TOP_P` unless you fully understand the impact. Lower values keep the model stable and deterministic, which is critical for security analysis. Higher values may make the model inconsistent, overly creative, or prone to hallucinating results.
📖 Note: For additional configuration examples, see the `.env.example` file in the project root.
Vulnhalla validates your configuration at startup. If required variables are missing or invalid, you'll see clear error messages indicating what needs to be fixed.
Common validation errors:
- Missing API key for selected provider
- Invalid provider name (must be `openai`, `azure`, or `gemini`)
- Missing Azure endpoint (required for the Azure provider)
- Invalid CodeQL path (if `CODEQL_PATH` is set but the file doesn't exist)
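A minimal sketch of what this kind of startup validation checks (illustrative; the real checks live inside Vulnhalla):

```python
# Illustrative sketch of startup validation for the errors listed above.
import os

errors = []
provider = os.getenv("PROVIDER", "")
if provider not in ("openai", "azure", "gemini"):
    errors.append(f"Invalid provider name: {provider!r} (must be openai, azure, or gemini)")

# API key variables per provider, per the Configuration Reference tables.
key_vars = {
    "openai": ("OPENAI_API_KEY",),
    "azure": ("AZURE_OPENAI_API_KEY", "AZURE_API_KEY"),
    "gemini": ("GOOGLE_API_KEY",),
}
if provider in key_vars and not any(os.getenv(v) for v in key_vars[provider]):
    errors.append(f"Missing API key for provider {provider!r}")

if provider == "azure" and not (os.getenv("AZURE_OPENAI_ENDPOINT") or os.getenv("AZURE_API_BASE")):
    errors.append("Missing Azure endpoint (AZURE_OPENAI_ENDPOINT or AZURE_API_BASE)")

codeql_path = os.getenv("CODEQL_PATH", "")
if codeql_path and os.path.sep in codeql_path and not os.path.exists(codeql_path):
    errors.append(f"CODEQL_PATH is set but the file doesn't exist: {codeql_path}")

for err in errors:
    print("config error:", err)
```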
The LLM uses the following status codes:
- `1337`: Security vulnerability found (True Positive)
- `1007`: Code is secure, no vulnerability (False Positive)
- `7331`: More code/information needed to validate security
- `3713`: Likely not a security problem, but more info needed (used with 7331)
The UI maps these to:
- `1337` → "True Positive"
- `1007` → "False Positive"
- `7331` or `3713` → "Needs More Data"
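As code, the mapping is one branch per status code; a sketch with a hypothetical helper name:

```python
# Hypothetical helper mirroring the UI's status-code mapping.
def decision_label(status_code: int) -> str:
    if status_code == 1337:
        return "True Positive"    # security vulnerability found
    if status_code == 1007:
        return "False Positive"   # code is secure
    if status_code in (7331, 3713):
        return "Needs More Data"  # more code/information required
    return "Unknown"              # fallback for unexpected codes (assumption)

assert decision_label(3713) == "Needs More Data"
```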
See `requirements.txt` for Python dependencies:
- `requests` - HTTP requests for the GitHub API
- `pySmartDL` - Smart download manager for CodeQL databases
- `litellm` - Unified LLM interface supporting multiple providers
- `python-dotenv` - Environment variable management
- `PyYAML` - YAML parsing for CodeQL pack files
- `textual` - Terminal UI framework
CodeQL queries are organized in `data/queries/<LANG>/`:
- `issues/` - Security issue detection queries
- `tools/` - Helper queries (function trees, classes, global variables, macros)

Each directory contains a `qlpack.yml` file defining the CodeQL pack.
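Since `PyYAML` is already a dependency for parsing pack files, inspecting the pack definitions is straightforward; a minimal sketch:

```python
# Sketch: read each CodeQL pack definition with PyYAML.
from pathlib import Path
import yaml

for qlpack in sorted(Path("data/queries").glob("*/*/qlpack.yml")):
    pack = yaml.safe_load(qlpack.read_text())
    # "name" and "version" are standard qlpack.yml keys.
    print(f"{qlpack.parent}: {pack.get('name')} v{pack.get('version')}")
```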
Copyright (c) 2025 CyberArk Software Ltd. All rights reserved.
This repository is licensed under the Apache License, Version 2.0 - see LICENSE.txt for more details.
We welcome contributions of all kinds to this repository. For instructions on how to get started and descriptions of our development workflows, please see our contributing guide.
Please read and follow our Code of Conduct. We are committed to providing a welcoming and inclusive environment for all contributors.
Feel free to contact us via GitHub issues if you have any feature requests or project issues.
