High-Performance Large Language Model Inference Framework for NVIDIA Edge Platforms
Overview | Quick Start | Documentation | Roadmap
TensorRT Edge-LLM is NVIDIA's high-performance C++ inference runtime for Large Language Models (LLMs) and Vision-Language Models (VLMs) on embedded platforms. It enables efficient deployment of state-of-the-art language models on resource-constrained devices such as NVIDIA Jetson and NVIDIA DRIVE platforms. TensorRT Edge-LLM provides convenient Python scripts to convert HuggingFace checkpoints to ONNX; engine building and end-to-end inference run entirely on edge platforms.
For supported platforms, models, and precisions, see the Overview. You can get started with TensorRT Edge-LLM in under 15 minutes; for complete installation and usage instructions, see the Quick Start Guide.
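The workflow described above (checkpoint-to-ONNX conversion on a host, then engine build and inference on the device) can be sketched roughly as follows. The script names, binaries, flags, and model name below are illustrative assumptions, not TensorRT Edge-LLM's actual CLI; see the Quick Start Guide for the real commands.

```
# Hypothetical workflow sketch -- all command names and flags are
# illustrative placeholders, not the framework's actual interface.

# 1. On the host: export a HuggingFace checkpoint to ONNX with the
#    provided Python scripts.
python export_onnx.py --model meta-llama/Llama-3.2-1B --output ./onnx_model

# 2. On the edge device (Jetson / DRIVE): build the TensorRT engine
#    from the exported ONNX model.
./engine_builder --onnx ./onnx_model --engine ./model.engine

# 3. On the edge device: run end-to-end inference with the C++ runtime.
./llm_runtime --engine ./model.engine --prompt "Hello, world"
```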
- Overview - What is TensorRT Edge-LLM and key features
- Supported Models - Complete model compatibility matrix
- Checkpoint-Based Model Loader - Recommended ONNX export pipeline
- Installation - Set up Python export pipeline and C++ runtime
- Quick Start Guide - Run your first inference in ~15 minutes
- Examples - End-to-end workflows
- Quantization - Create quantized checkpoints for llm_loader
- Experimental High-Level Python API and Server - vLLM-style API and OpenAI-compatible server
- Input Format Guide - Request format and specifications
- Chat Template Format - Chat template configuration
- Experimental Quantization Package Design - Quantization package architecture
- Legacy Python Export Pipeline - Compatibility export path; tensorrt_edgellm/ will be removed in 0.8.0 after experimental/quantization -> experimental/llm_loader reaches full feature parity for all models and features
- Engine Builder - Building TensorRT engines
- C++ Runtime Overview - Runtime system architecture
- Customization Guide - Customizing TensorRT Edge-LLM for your needs
- TensorRT Plugins - Custom plugin development
- Tests - Comprehensive test suite for contributors
🚗 Automotive
- In-vehicle AI assistants
- Voice-controlled interfaces
- Scene understanding
- Driver assistance systems
🤖 Robotics
- Natural language interaction
- Task planning and reasoning
- Visual question answering
- Human-robot collaboration
🏭 Industrial IoT
- Equipment monitoring with NLP
- Automated inspection
- Predictive maintenance
- Voice-controlled machinery
📱 Edge Devices
- On-device chatbots
- Offline language processing
- Privacy-preserving AI
- Low-latency inference
- TensorRT Edge-LLM Jetson AI Lab tutorial
- Maximizing Memory Efficiency to Run Bigger Models on NVIDIA Jetson
- Build Next-Gen Physical AI with Edge-First LLMs for Autonomous Vehicles and Robotics
- Accelerate AI Inference for Edge and Robotics with NVIDIA Jetson T4000 and NVIDIA JetPack 7.1
- Accelerating LLM and VLM Inference for Automotive and Robotics with NVIDIA TensorRT Edge-LLM
Follow our GitHub repository for the latest updates, releases, and announcements.
- Documentation: Full Documentation
- Quick Start: Quick Start Guide
- Roadmap: Developer Roadmap
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Forums: NVIDIA Developer Forums
We welcome contributions! Please see our Contributing Guidelines for details.