LocalLab provides a comprehensive command-line interface (CLI) that makes it easy to run AI models locally and interact with them. This guide covers all CLI features from basic server management to the powerful chat interface.
# Install LocalLab
pip install locallab locallab-client
# Configure your setup
locallab config
# Download a model (optional, for faster startup)
locallab models download microsoft/phi-2
# Start the server
locallab start
# Chat with your AI
locallab chat

- Installation
- Chat Interface ⭐ Most Popular
- Model Management 🤖 New Feature
- Server Management
- Interactive Configuration
- Command Reference
- Environment Variables
- Configuration Storage
- Google Colab Integration
- Recent Updates
The LocalLab CLI is automatically installed when you install the LocalLab package:
pip install locallab locallab-client

After installation, you can access the CLI using the locallab command:
locallab --help

The LocalLab Chat Interface is the easiest way to interact with your AI models. It provides a ChatGPT-like experience right in your terminal.
# Start your server
locallab start
# Open chat interface
locallab chat

- 🎯 Dynamic Mode Switching - Change generation mode per message with --stream, --chat, etc.
- 🔄 Real-time Streaming - See responses as they're generated
- 💬 Rich Terminal UI - Markdown rendering with syntax highlighting
- 📚 Conversation Management - History, saving, and loading
- 🌐 Remote Access - Connect to any LocalLab server
- 🛠️ Error Recovery - Automatic reconnection and graceful handling
# Connect to local server
locallab chat
# Connect to remote server
locallab chat --url https://your-ngrok-url.app
# Use specific generation mode
locallab chat --generate chat
# Custom parameters
locallab chat --max-tokens 200 --temperature 0.8

/help # Show all available commands
/history # View conversation history
/save # Save current conversation
/batch # Enter batch processing mode
/reset # Clear conversation history
/exit # Exit gracefully

Override the default generation mode for any message:
You: Write a story --stream # Use streaming mode
You: Remember my name --chat # Use chat mode with context
You: What's 2+2? --simple # Use simple mode
You: Process these --batch # Use batch mode

📖 Complete Guide: See the Chat Interface Documentation for detailed features and examples.
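The trailing `--stream`, `--chat`, `--simple`, and `--batch` flags are appended to the end of a message. A parser for that pattern can be sketched as follows; this is an illustrative re-implementation based only on the examples above, not LocalLab's actual chat code:

```python
# Illustrative sketch: split a trailing mode flag off a chat message.
# The flag names come from the examples above; the "--chat" default is
# an assumption for the demo.
MODE_FLAGS = {"--stream", "--chat", "--simple", "--batch"}

def split_mode_flag(message: str, default: str = "--chat") -> tuple[str, str]:
    """Return (message_text, mode), where mode is a trailing flag if present."""
    parts = message.rstrip().rsplit(maxsplit=1)
    if len(parts) == 2 and parts[1] in MODE_FLAGS:
        return parts[0], parts[1]
    return message, default

print(split_mode_flag("Write a story --stream"))  # ('Write a story', '--stream')
```

Because the flag is only recognized as the final token, messages that merely mention a flag mid-sentence are left untouched.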
LocalLab includes comprehensive model management capabilities to help you download, organize, and manage AI models locally.
- ⚡ Faster Startup - Pre-downloaded models load instantly
- 📱 Offline Usage - Use models without internet connection
- 💾 Disk Management - Monitor and clean up model cache
- 🔍 Model Discovery - Find and explore available models
# Discover models from registry and HuggingFace Hub
locallab models discover
# Search for specific types of models
locallab models discover --search "code generation"
# Filter by tags
locallab models discover --tags "conversational,chat"
# Download a model for faster startup
locallab models download microsoft/phi-2
# List your cached models
locallab models list
# Get detailed model information
locallab models info microsoft/phi-2
# Clean up disk space
locallab models clean

| Command | Description |
|---|---|
| `locallab models list` | List locally cached models |
| `locallab models download <model_id>` | Download a model locally |
| `locallab models remove <model_id>` | Remove a cached model |
| `locallab models discover` | Discover models from registry and HuggingFace Hub |
| `locallab models info <model_id>` | Show detailed model information |
| `locallab models clean` | Clean up orphaned cache files |
📖 Complete Guide: See the Model Management Documentation for detailed usage, examples, and advanced features.
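Downloaded models typically land in the Hugging Face cache, where repo IDs are encoded into directory names. The sketch below assumes the standard `models--org--name` layout used by huggingface_hub and the default cache location; both may differ on your system (e.g. if `HF_HOME` is set):

```python
from pathlib import Path

def dir_to_repo_id(dirname: str):
    """Map a cache directory name like 'models--microsoft--phi-2'
    back to the repo ID 'microsoft/phi-2' (assumed layout)."""
    prefix = "models--"
    if not dirname.startswith(prefix):
        return None
    return dirname[len(prefix):].replace("--", "/")

# Default huggingface_hub cache location; skipped if it doesn't exist.
cache = Path.home() / ".cache" / "huggingface" / "hub"
if cache.is_dir():
    for entry in sorted(cache.iterdir()):
        repo_id = dir_to_repo_id(entry.name)
        if repo_id:
            print(repo_id)
```

This is only a peek under the hood; `locallab models list` is the supported way to see cached models.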
# Start with interactive prompts for missing settings
locallab start
# Start with specific settings
locallab start --use-ngrok --port 8080 --model microsoft/phi-2

# Run the configuration wizard
locallab config

# Display system information
locallab info

When you run locallab start without all required settings, the CLI will prompt you for the missing information:
- Model Selection: Choose which model to load
- Port Selection: Specify which port to run on
- Ngrok Configuration: Enable public access via ngrok (if desired)
- Optimization Settings: Configure performance optimizations
Example interactive session:
🎮 GPU detected with 8192MB free of 16384MB total
💾 System memory: 12288MB free of 16384MB total
🚀 Welcome to LocalLab! Let's set up your server.
📦 Which model would you like to use? [microsoft/phi-2]:
🔌 Which port would you like to run on? [8000]:
🌐 Do you want to enable public access via ngrok? [y/N]: y
🔑 Please enter your ngrok auth token: ******************
⚡ Would you like to configure optimizations for better performance? [Y/n]:
📊 Enable quantization for reduced memory usage? [Y/n]:
📊 Quantization type [int8/int4]: int8
🔪 Enable attention slicing for reduced memory usage? [Y/n]:
⚡ Enable flash attention for faster inference? [Y/n]:
🔄 Enable BetterTransformer for optimized inference? [Y/n]:
🔧 Would you like to configure advanced options? [y/N]:
✅ Configuration complete!
Start the LocalLab server.
Options:
- --use-ngrok: Enable ngrok for public access
- --port: Port to run the server on
- --ngrok-auth-token: Ngrok authentication token
- --model: Model to load (e.g., microsoft/phi-2)
- --quantize: Enable quantization
- --quantize-type: Quantization type (int8 or int4)
- --attention-slicing: Enable attention slicing
- --flash-attention: Enable flash attention
- --better-transformer: Enable BetterTransformer
Example:
locallab start --model microsoft/phi-2 --quantize --quantize-type int8

Run the configuration wizard without starting the server. This command now shows your current configuration and allows you to modify it.
Example:
locallab config

Output:
📋 Current Configuration:
port: 8000
model_id: microsoft/phi-2
enable_quantization: true
quantization_type: int8
enable_attention_slicing: true
enable_flash_attention: false
enable_better_transformer: false
Would you like to reconfigure these settings? [Y/n]:
Display system information.
Example:
locallab info

Key environment variables:
- HUGGINGFACE_TOKEN: HuggingFace API token for accessing models (optional)
- HUGGINGFACE_MODEL: Model to load
- NGROK_AUTH_TOKEN: Ngrok authentication token
- LOCALLAB_ENABLE_QUANTIZATION: Enable/disable quantization
- LOCALLAB_QUANTIZATION_TYPE: Type of quantization (int8/int4)
- LOCALLAB_ENABLE_ATTENTION_SLICING: Enable/disable attention slicing
- LOCALLAB_ENABLE_FLASH_ATTENTION: Enable/disable Flash Attention
- LOCALLAB_ENABLE_BETTERTRANSFORMER: Enable/disable BetterTransformer
- LOCALLAB_ENABLE_CPU_OFFLOADING: Enable/disable CPU offloading
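For example, these variables can be set in the environment before launching the server. The exact truthy values LocalLab accepts are not specified here, so the `"true"` strings and the `env_flag` helper below are assumptions illustrating a common pattern:

```python
import os

# Set LocalLab-related variables before starting the server.
# The names come from the list above; "true"/"int8" string values are
# an assumption about how the flags are parsed.
os.environ["HUGGINGFACE_MODEL"] = "microsoft/phi-2"
os.environ["LOCALLAB_ENABLE_QUANTIZATION"] = "true"
os.environ["LOCALLAB_QUANTIZATION_TYPE"] = "int8"

def env_flag(name: str, default: bool = False) -> bool:
    """Typical truthy-string check, as many CLIs implement it."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}

print(env_flag("LOCALLAB_ENABLE_QUANTIZATION"))  # True
```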
The CLI stores your configuration in ~/.locallab/config.json for future use. This includes:
- HuggingFace token (if provided)
- Model settings
- Server configuration
- Optimization settings
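Since the file is plain JSON, it can also be read programmatically. A minimal sketch, assuming the key names shown in the sample configuration output earlier in this guide:

```python
import json
from pathlib import Path

config_path = Path.home() / ".locallab" / "config.json"

# Fall back to an empty config if the file doesn't exist yet.
config = json.loads(config_path.read_text()) if config_path.is_file() else {}

print(config.get("model_id", "not configured"))
```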
To view your stored configuration:
cat ~/.locallab/config.json

To reset your configuration, simply delete this file:
rm ~/.locallab/config.json

The CLI works seamlessly in Google Colab. When running in Colab, the CLI automatically detects the environment and provides appropriate defaults:
from locallab import start_server
# This will prompt for any missing required settings
start_server()
# Or with some settings provided
start_server(use_ngrok=True, port=8080)

The CLI will detect that it's running in Colab and prompt for any missing required settings, such as the ngrok authentication token if use_ngrok=True is specified.
Version 0.4.9 brings significant improvements to the configuration system:
- Fixed Configuration Persistence: The locallab config command now properly saves settings that are respected when running locallab start
- Configuration Display: The config command now shows your current configuration before prompting for changes
- Skip Unnecessary Prompts: The CLI now only prompts for settings that aren't already configured
- Clear Feedback: After saving configuration, the CLI shows what was saved and how to use it
# Step 1: Configure your settings once
locallab config
# Step 2: Start the server using your saved configuration
locallab start

With this improved workflow, you only need to configure your settings once, and they'll be remembered for future sessions.
$ locallab config
📋 Current Configuration:
port: 8000
model_id: microsoft/phi-2
enable_quantization: true
quantization_type: int8
enable_attention_slicing: true
Would you like to reconfigure these settings? [Y/n]: n
Configuration unchanged.
$ locallab start
🎮 GPU detected with 8192MB free of 16384MB total
💾 System memory: 12288MB free of 16384MB total
✅ Using saved configuration!
Version 0.4.8 brings significant improvements to the CLI:
- Lazy Loading: The CLI now uses lazy loading for imports, resulting in much faster startup times
- Optimized Initialization: Reduced unnecessary operations during CLI startup
- Faster Response: Commands like locallab info now respond almost instantly
- Robust Error Recovery: Better handling of common errors like missing dependencies
- Informative Messages: More helpful error messages that guide you to solutions
- Graceful Fallbacks: The CLI now gracefully handles missing or invalid configuration values
- Seamless Integration: CLI options, environment variables, and configuration files now work together harmoniously
- Consistent Behavior: No more conflicts between different ways of setting configuration values
- Clear Precedence: Environment variables take precedence over saved configuration, which takes precedence over defaults
- Detailed Hardware Info: The locallab info command now provides more detailed information about your system
- Better Memory Reporting: Improved memory usage reporting with proper unit conversion (GB instead of MB)
- GPU Details: More comprehensive GPU information when available
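The documented precedence (environment variable over saved configuration over built-in default) can be sketched as a simple lookup chain. This is an illustration of the stated order, not LocalLab's internal code, and the `"default-model"` fallback is a placeholder:

```python
import os

def resolve(env_name: str, saved: dict, key: str, default):
    """Resolve a setting: env var > saved configuration > default."""
    if env_name in os.environ:
        return os.environ[env_name]
    if key in saved:
        return saved[key]
    return default

saved_config = {"model_id": "microsoft/phi-2"}   # e.g. loaded from config.json

os.environ.pop("HUGGINGFACE_MODEL", None)        # env var unset for the demo
print(resolve("HUGGINGFACE_MODEL", saved_config, "model_id", "default-model"))
# microsoft/phi-2 (saved configuration wins when the env var is unset)

os.environ["HUGGINGFACE_MODEL"] = "distilgpt2"
print(resolve("HUGGINGFACE_MODEL", saved_config, "model_id", "default-model"))
# distilgpt2 (environment variable wins)
```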
# Start with interactive configuration - now much faster!
locallab start
# Use the improved system information command
locallab info
# Configure with specific options - now with better error handling
locallab start --model microsoft/phi-2 --quantize --quantize-type int8 --attention-slicing

You can also use the CLI functionality directly in your Python code:
from locallab.cli.interactive import prompt_for_config
from locallab.cli.config import save_config
# Run the interactive configuration
config = prompt_for_config()
# Save the configuration
save_config(config)
# Use the configuration
print(config)

This is useful if you want to build your own custom configuration flow on top of LocalLab's CLI.
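For instance, a custom flow might post-process the dict returned by prompt_for_config before saving it. The `apply_team_defaults` helper below is hypothetical, written only to illustrate the idea; in a real flow you would pass its result to save_config:

```python
# Hypothetical post-processing step for a custom configuration flow.
# In a real flow, `config` would come from
# locallab.cli.interactive.prompt_for_config() and the result would be
# persisted with locallab.cli.config.save_config(...).
def apply_team_defaults(config: dict) -> dict:
    # Start from project-wide defaults, then let interactive answers win.
    merged = {"port": 8000, "enable_quantization": True, **config}
    # int4 quantization only makes sense with quantization enabled.
    if merged.get("quantization_type") == "int4":
        merged["enable_quantization"] = True
    return merged

print(apply_team_defaults({"model_id": "microsoft/phi-2", "port": 8080}))
```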