LLM Edge Studio
In this post, I walk through LLM Edge Studio, an NXP launcher application for testing supported large language models (LLMs) running locally on i.MX platforms with Ara240 DNPU acceleration.
LLM Edge Studio provides a simple GUI to select a model, load it, enter prompts, and interact with an LLM directly at the edge. It communicates with the Ara240 Runtime SDK through the eIQ AAF Connector, using a REST-based interface for prompt submission and streaming token responses.
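To make the REST-based flow more concrete, here is a minimal sketch of submitting a prompt and printing the streamed response with curl. The endpoint URL and the "prompt" JSON field are placeholders chosen for illustration. They are assumptions, not the documented eIQ AAF Connector API, so check the connector documentation for the real route and payload. The helper names json_escape and stream_prompt are also hypothetical.

```shell
# Hypothetical sketch: the endpoint URL and the JSON field name "prompt" are
# assumptions, not the documented eIQ AAF Connector API.

# json_escape TEXT
# Escape backslashes and double quotes so TEXT can sit inside a JSON string.
# (Newlines and other control characters are not handled in this sketch.)
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# stream_prompt URL PROMPT
# POST the prompt as JSON and print the response as it arrives.
stream_prompt() {
    url=$1
    prompt=$2
    body=$(printf '{"prompt": "%s"}' "$(json_escape "$prompt")")
    # -N disables curl's output buffering so streamed tokens show up immediately;
    # -s hides the progress meter.
    curl -sN -X POST -H 'Content-Type: application/json' -d "$body" "$url"
}

# Example call (placeholders left unfilled on purpose):
#   stream_prompt "http://<ip_addr>:<port>/<route>" "Explain what a DNPU is"
```

The -N flag is the relevant part for streaming: without it, curl may hold output in its buffer instead of printing tokens as they arrive.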
Key Features
- Local LLM inference on supported i.MX platforms
- Ara240 DNPU acceleration
- GUI-based model selection and prompt input
- Streaming token output
- Integration with eIQ AAF Connector and Ara240 Runtime SDK
- Support for prebuilt Debian package installation or building from source
Supported Models
- Qwen2.5-coder-1.5B
- Qwen2.5-7B-Instruct
These models are provided as Ara240-compatible model.dvm files and are intended for local execution on the target platform.
Basic Installation
After making sure the Ara240 Runtime SDK is installed on the target board, copy the Debian package:
scp llm-edge-studio.deb root@<ip_addr>:
Install it with:
dpkg -i llm-edge-studio.deb
The installation may take a few minutes because the required models are downloaded during setup.
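As a quick sanity check after installation, the sketch below reads the package name from the .deb file and asks dpkg whether that package is fully installed. Reading the name from the archive avoids assuming it matches the file name. The helper names package_name and is_installed are illustrative, not part of the package.

```shell
# package_name DEB_FILE
# Print the "Package" field stored in the archive's control data.
package_name() {
    dpkg -f "$1" Package
}

# is_installed PACKAGE
# Succeed only if dpkg reports the package as fully installed.
is_installed() {
    dpkg -s "$1" 2>/dev/null | grep -q '^Status: install ok installed$'
}

# Example, run on the board in the directory holding the package:
#   pkg=$(package_name llm-edge-studio.deb) && is_installed "$pkg" && echo "$pkg is installed"
```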
Running LLM Edge Studio
Before launching, make sure the Ara240 runtime service is running:
systemctl status rt-sdk-ara2.service --no-pager -l
Then start the application with:
run_llm_edge_studio
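The service check and the launch can also be combined in a small wrapper, sketched below. It uses systemctl is-active --quiet, which reports the state through its exit code, and only starts the application when the runtime service is active. The function name launch_studio is hypothetical. The service and launcher names are the ones used in this post.

```shell
SERVICE=rt-sdk-ara2.service

# launch_studio
# Start LLM Edge Studio only if the Ara240 runtime service is active.
launch_studio() {
    if systemctl is-active --quiet "$SERVICE"; then
        run_llm_edge_studio
    else
        # Point the user at the detailed status output for troubleshooting.
        echo "$SERVICE is not active; inspect it with: systemctl status $SERVICE --no-pager -l" >&2
        return 1
    fi
}

# Example: call launch_studio from a terminal on the board.
```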
Once the GUI appears, click LOAD to load the selected model. After the model is ready, enter a prompt and submit it to start interacting with the LLM.
Walkthrough Video
In the attached video, I show how to launch LLM Edge Studio, load a supported model, submit a prompt, and view the generated response running locally on the i.MX platform with Ara240 DNPU acceleration.
Summary
LLM Edge Studio is a useful tool for quickly evaluating local LLM inference on NXP i.MX platforms using Ara240 DNPU acceleration. It provides a simple workflow for model loading, prompt testing, and observing token streaming directly at the edge.