LocalLab provides a comprehensive command-line interface (CLI) that makes it easy to run AI models locally and interact with them. This guide covers all CLI features from basic server management to the powerful chat interface.
# Install LocalLab
pip install locallab locallab-client
# Configure your setup
locallab config
# Download a model (optional, for faster startup)
locallab models download microsoft/phi-2
# Start the server
locallab start
# Chat with your AI
locallab chat

- Installation
- Chat Interface ⭐ Most Popular
- Model Management 🤖 New Feature
- Server Management
- Interactive Configuration
- Command Reference
- Environment Variables
- Configuration Storage
- Google Colab Integration
- Recent Updates
The LocalLab CLI is automatically installed when you install the LocalLab package:
pip install locallab locallab-client

After installation, you can access the CLI using the locallab command:
locallab --help

The LocalLab Chat Interface is the easiest way to interact with your AI models. It provides a ChatGPT-like experience right in your terminal.
# Start your server
locallab start
# Open chat interface
locallab chat

- 🎯 Dynamic Mode Switching - Change generation mode per message with --stream, --chat, etc.
- 🔄 Real-time Streaming - See responses as they're generated
- 💬 Rich Terminal UI - Markdown rendering with syntax highlighting
- 📚 Conversation Management - History, saving, and loading
- 🌐 Remote Access - Connect to any LocalLab server
- 🛠️ Error Recovery - Automatic reconnection and graceful handling
# Connect to local server
locallab chat
# Connect to remote server
locallab chat --url https://your-ngrok-url.app
# Use specific generation mode
locallab chat --generate chat
# Custom parameters
locallab chat --max-tokens 200 --temperature 0.8

/help # Show all available commands
/history # View conversation history
/save # Save current conversation
/batch # Enter batch processing mode
/reset # Clear conversation history
/exit # Exit gracefully

Override the default generation mode for any message:
You: Write a story --stream # Use streaming mode
You: Remember my name --chat # Use chat mode with context
You: What's 2+2? --simple # Use simple mode
You: Process these --batch # Use batch mode

📖 Complete Guide: See the Chat Interface Documentation for detailed features and examples.
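The trailing `--stream`, `--chat`, `--simple`, and `--batch` flags are appended to the end of a message. A parser for that pattern can be sketched as follows; this is an illustrative re-implementation based only on the examples above, not LocalLab's actual chat code:

```python
# Illustrative sketch: split a trailing mode flag off a chat message.
# The flag names come from the examples above; the "--chat" default is
# an assumption for the demo.
MODE_FLAGS = {"--stream", "--chat", "--simple", "--batch"}

def split_mode_flag(message: str, default: str = "--chat") -> tuple[str, str]:
    """Return (message_text, mode), where mode is a trailing flag if present."""
    parts = message.rstrip().rsplit(maxsplit=1)
    if len(parts) == 2 and parts[1] in MODE_FLAGS:
        return parts[0], parts[1]
    return message, default

print(split_mode_flag("Write a story --stream"))  # ('Write a story', '--stream')
```

Because the flag is only recognized as the final token, messages that merely mention a flag mid-sentence are left untouched.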
LocalLab includes comprehensive model management capabilities to help you download, organize, and manage AI models locally.
- ⚡ Faster Startup - Pre-downloaded models load instantly
- 📱 Offline Usage - Use models without internet connection
- 💾 Disk Management - Monitor and clean up model cache
- 🔍 Model Discovery - Find and explore available models
# Discover models from registry and HuggingFace Hub
locallab models discover
# Search for specific types of models
locallab models discover --search "code generation"
# Filter by tags
locallab models discover --tags "conversational,chat"
# Download a model for faster startup
locallab models download microsoft/phi-2
# List your cached models
locallab models list
# Get detailed model information
locallab models info microsoft/phi-2
# Clean up disk space
locallab models clean

| Command | Description |
|---|---|
| `locallab models list` | List locally cached models |
| `locallab models download <model_id>` | Download a model locally |
| `locallab models remove <model_id>` | Remove a cached model |
| `locallab models discover` | Discover models from registry and HuggingFace Hub |
| `locallab models info <model_id>` | Show detailed model information |
| `locallab models clean` | Clean up orphaned cache files |
📖 Complete Guide: See the Model Management Documentation for detailed usage, examples, and advanced features.
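Downloaded models typically land in the Hugging Face cache, where repo IDs are encoded into directory names. The sketch below assumes the standard `models--org--name` layout used by huggingface_hub and the default cache location; both may differ on your system (e.g. if `HF_HOME` is set):

```python
from pathlib import Path

def dir_to_repo_id(dirname: str):
    """Map a cache directory name like 'models--microsoft--phi-2'
    back to the repo ID 'microsoft/phi-2' (assumed layout)."""
    prefix = "models--"
    if not dirname.startswith(prefix):
        return None
    return dirname[len(prefix):].replace("--", "/")

# Default huggingface_hub cache location; skipped if it doesn't exist.
cache = Path.home() / ".cache" / "huggingface" / "hub"
if cache.is_dir():
    for entry in sorted(cache.iterdir()):
        repo_id = dir_to_repo_id(entry.name)
        if repo_id:
            print(repo_id)
```

This is only a peek under the hood; `locallab models list` is the supported way to see cached models.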
# Start with interactive prompts for missing settings
locallab start
# Start with specific settings
locallab start --use-ngrok --port 8080 --model microsoft/phi-2

# Run the configuration wizard
locallab config

# Display system information
locallab info

When you run locallab start without all required settings, the CLI will prompt you for the missing information:
- Model Selection: Choose which model to load
- Port Selection: Specify which port to run on
- Ngrok Configuration: Enable public access via ngrok (if desired)
- Optimization Settings: Configure performance optimizations
Example interactive session:
🎮 GPU detected with 8192MB free of 16384MB total
💾 System memory: 12288MB free of 16384MB total
🚀 Welcome to LocalLab! Let's set up your server.
📦 Which model would you like to use? [microsoft/phi-2]:
🔌 Which port would you like to run on? [8000]:
🌐 Do you want to enable public access via ngrok? [y/N]: y
🔑 Please enter your ngrok auth token: ******************
⚡ Would you like to configure optimizations for better performance? [Y/n]:
📊 Enable quantization for reduced memory usage? [Y/n]:
📊 Quantization type [int8/int4]: int8
🔪 Enable attention slicing for reduced memory usage? [Y/n]:
⚡ Enable flash attention for faster inference? [Y/n]:
🔄 Enable BetterTransformer for optimized inference? [Y/n]:
🔧 Would you like to configure advanced options? [y/N]:
✅ Configuration complete!
Start the LocalLab server.
Options:
- --use-ngrok: Enable ngrok for public access
- --port: Port to run the server on
- --ngrok-auth-token: Ngrok authentication token
- --model: Model to load (e.g., microsoft/phi-2)
- --quantize: Enable quantization
- --quantize-type: Quantization type (int8 or int4)
- --attention-slicing: Enable attention slicing
- --flash-attention: Enable flash attention
- --better-transformer: Enable BetterTransformer
Example:
locallab start --model microsoft/phi-2 --quantize --quantize-type int8

Run the configuration wizard without starting the server. This command now shows your current configuration and allows you to modify it.
Example:
locallab config

Output:
📋 Current Configuration:
port: 8000
model_id: microsoft/phi-2
enable_quantization: true
quantization_type: int8
enable_attention_slicing: true
enable_flash_attention: false
enable_better_transformer: false
Would you like to reconfigure these settings? [Y/n]:
Display system information.
Example:
locallab info

Key environment variables:
- HUGGINGFACE_TOKEN: HuggingFace API token for accessing models (optional)
- HUGGINGFACE_MODEL: Model to load
- NGROK_AUTH_TOKEN: Ngrok authentication token
- LOCALLAB_ENABLE_QUANTIZATION: Enable/disable quantization
- LOCALLAB_QUANTIZATION_TYPE: Type of quantization (int8/int4)
- LOCALLAB_ENABLE_ATTENTION_SLICING: Enable/disable attention slicing
- LOCALLAB_ENABLE_FLASH_ATTENTION: Enable/disable Flash Attention
- LOCALLAB_ENABLE_BETTERTRANSFORMER: Enable/disable BetterTransformer
- LOCALLAB_ENABLE_CPU_OFFLOADING: Enable/disable CPU offloading
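For example, these variables can be set in the environment before launching the server. The exact truthy values LocalLab accepts are not specified here, so the `"true"` strings and the `env_flag` helper below are assumptions illustrating a common pattern:

```python
import os

# Set LocalLab-related variables before starting the server.
# The names come from the list above; "true"/"int8" string values are
# an assumption about how the flags are parsed.
os.environ["HUGGINGFACE_MODEL"] = "microsoft/phi-2"
os.environ["LOCALLAB_ENABLE_QUANTIZATION"] = "true"
os.environ["LOCALLAB_QUANTIZATION_TYPE"] = "int8"

def env_flag(name: str, default: bool = False) -> bool:
    """Typical truthy-string check, as many CLIs implement it."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}

print(env_flag("LOCALLAB_ENABLE_QUANTIZATION"))  # True
```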
The CLI stores your configuration in ~/.locallab/config.json for future use. This includes:
- HuggingFace token (if provided)
- Model settings
- Server configuration
- Optimization settings
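Since the file is plain JSON, it can also be read programmatically. A minimal sketch, assuming the key names shown in the sample configuration output earlier in this guide:

```python
import json
from pathlib import Path

config_path = Path.home() / ".locallab" / "config.json"

# Fall back to an empty config if the file doesn't exist yet.
config = json.loads(config_path.read_text()) if config_path.is_file() else {}

print(config.get("model_id", "not configured"))
```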
To view your stored configuration:
cat ~/.locallab/config.json

To reset your configuration, simply delete this file:
rm ~/.locallab/config.json

The CLI works seamlessly in Google Colab. When running in Colab, the CLI automatically detects the environment and provides appropriate defaults:
from locallab import start_server
# This will prompt for any missing required settings
start_server()
# Or with some settings provided
start_server(use_ngrok=True, port=8080)

The CLI will detect that it's running in Colab and prompt for any missing required settings, such as the ngrok authentication token if use_ngrok=True is specified.
Version 0.4.9 brings significant improvements to the configuration system:
- Fixed Configuration Persistence: The locallab config command now properly saves settings that are respected when running locallab start
- Configuration Display: The config command now shows your current configuration before prompting for changes
- Skip Unnecessary Prompts: The CLI now only prompts for settings that aren't already configured
- Clear Feedback: After saving configuration, the CLI shows what was saved and how to use it
# Step 1: Configure your settings once
locallab config
# Step 2: Start the server using your saved configuration
locallab start

With this improved workflow, you only need to configure your settings once, and they'll be remembered for future sessions.
$ locallab config
📋 Current Configuration:
port: 8000
model_id: microsoft/phi-2
enable_quantization: true
quantization_type: int8
enable_attention_slicing: true
Would you like to reconfigure these settings? [Y/n]: n
Configuration unchanged.
$ locallab start
🎮 GPU detected with 8192MB free of 16384MB total
💾 System memory: 12288MB free of 16384MB total
✅ Using saved configuration!
Version 0.4.8 brings significant improvements to the CLI:
- Lazy Loading: The CLI now uses lazy loading for imports, resulting in much faster startup times
- Optimized Initialization: Reduced unnecessary operations during CLI startup
- Faster Response: Commands like locallab info now respond almost instantly
- Robust Error Recovery: Better handling of common errors like missing dependencies
- Informative Messages: More helpful error messages that guide you to solutions
- Graceful Fallbacks: The CLI now gracefully handles missing or invalid configuration values
- Seamless Integration: CLI options, environment variables, and configuration files now work together harmoniously
- Consistent Behavior: No more conflicts between different ways of setting configuration values
- Clear Precedence: Environment variables take precedence over saved configuration, which takes precedence over defaults
- Detailed Hardware Info: The locallab info command now provides more detailed information about your system
- Better Memory Reporting: Improved memory usage reporting with proper unit conversion (GB instead of MB)
- GPU Details: More comprehensive GPU information when available
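The documented precedence (environment variable over saved configuration over built-in default) can be sketched as a simple lookup chain. This is an illustration of the stated order, not LocalLab's internal code, and the `"default-model"` fallback is a placeholder:

```python
import os

def resolve(env_name: str, saved: dict, key: str, default):
    """Resolve a setting: env var > saved configuration > default."""
    if env_name in os.environ:
        return os.environ[env_name]
    if key in saved:
        return saved[key]
    return default

saved_config = {"model_id": "microsoft/phi-2"}   # e.g. loaded from config.json

os.environ.pop("HUGGINGFACE_MODEL", None)        # env var unset for the demo
print(resolve("HUGGINGFACE_MODEL", saved_config, "model_id", "default-model"))
# microsoft/phi-2 (saved configuration wins when the env var is unset)

os.environ["HUGGINGFACE_MODEL"] = "distilgpt2"
print(resolve("HUGGINGFACE_MODEL", saved_config, "model_id", "default-model"))
# distilgpt2 (environment variable wins)
```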
# Start with interactive configuration - now much faster!
locallab start
# Use the improved system information command
locallab info
# Configure with specific options - now with better error handling
locallab start --model microsoft/phi-2 --quantize --quantize-type int8 --attention-slicing

You can also use the CLI functionality directly in your Python code:
from locallab.cli.interactive import prompt_for_config
from locallab.cli.config import save_config
# Run the interactive configuration
config = prompt_for_config()
# Save the configuration
save_config(config)
# Use the configuration
print(config)

This is useful if you want to build your own custom configuration flow on top of LocalLab's CLI.
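For instance, a custom flow might post-process the dict returned by prompt_for_config before saving it. The `apply_team_defaults` helper below is hypothetical, written only to illustrate the idea; in a real flow you would pass its result to save_config:

```python
# Hypothetical post-processing step for a custom configuration flow.
# In a real flow, `config` would come from
# locallab.cli.interactive.prompt_for_config() and the result would be
# persisted with locallab.cli.config.save_config(...).
def apply_team_defaults(config: dict) -> dict:
    # Start from project-wide defaults, then let interactive answers win.
    merged = {"port": 8000, "enable_quantization": True, **config}
    # int4 quantization only makes sense with quantization enabled.
    if merged.get("quantization_type") == "int4":
        merged["enable_quantization"] = True
    return merged

print(apply_team_defaults({"model_id": "microsoft/phi-2", "port": 8080}))
```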