TensorRT Edge-LLM

High-Performance Large Language Model Inference Framework for NVIDIA Edge Platforms


Overview   |   Quick Start   |   Documentation   |   Roadmap


Overview

TensorRT Edge-LLM is NVIDIA's high-performance C++ inference runtime for Large Language Models (LLMs) and Vision-Language Models (VLMs) on embedded platforms. It enables efficient deployment of state-of-the-art language models on resource-constrained devices such as NVIDIA Jetson and NVIDIA DRIVE platforms. TensorRT Edge-LLM provides convenient Python scripts to convert HuggingFace checkpoints to ONNX; the engine build and end-to-end inference then run entirely on the edge platform.


Getting Started

For the supported platforms, models, and precisions, see the Overview. You can get started with TensorRT Edge-LLM in under 15 minutes; for complete installation and usage instructions, see the Quick Start Guide.


Documentation

Introduction

User Guide

Developer Guide

Software Design

Advanced Topics


Use Cases

🚗 Automotive

  • In-vehicle AI assistants
  • Voice-controlled interfaces
  • Scene understanding
  • Driver assistance systems

🤖 Robotics

  • Natural language interaction
  • Task planning and reasoning
  • Visual question answering
  • Human-robot collaboration

🏭 Industrial IoT

  • Equipment monitoring with NLP
  • Automated inspection
  • Predictive maintenance
  • Voice-controlled machinery

📱 Edge Devices

  • On-device chatbots
  • Offline language processing
  • Privacy-preserving AI
  • Low-latency inference

Stay Up to Date

Follow our GitHub repository for the latest updates, releases, and announcements.


Support


License

Apache License 2.0


Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

