VLM Edge Studio
This post is a quick walkthrough of VLM Edge Studio, an NXP launcher application for interacting with supported Vision-Language Models (VLMs) running locally on FRDM i.MX platforms with Ara240 DNPU acceleration.
VLM Edge Studio provides a Qt/QML-based GUI for model selection, prompt input, and visual interaction with locally running VLMs at the edge. It communicates with the Ara240 Runtime SDK through the eIQ AAF Connector using a REST-based interface and streaming token responses.
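The post does not document the connector's REST endpoints, so the following is only a generic sketch of how a streaming HTTP response can be consumed from a shell. The function name `stream_response`, the URL, and the JSON body are all assumptions supplied by the caller; none of them are the eIQ AAF Connector's actual API.

```shell
# Generic sketch: send a JSON body and print the response as it streams in.
# The URL and body are caller-supplied placeholders (assumptions), NOT the
# documented eIQ AAF Connector API.
stream_response() {
    url="$1"    # hypothetical endpoint; take the real one from the connector docs
    body="$2"   # hypothetical JSON request body
    # --no-buffer makes curl write each chunk as soon as it arrives,
    # which is what lets tokens show up incrementally.
    # --data sends the body as an HTTP POST.
    curl --silent --show-error --no-buffer \
        -H 'Content-Type: application/json' \
        --data "$body" \
        "$url"
}
```

It would be called as `stream_response "$URL" "$JSON"`, with both values taken from the connector's own documentation. In normal use the GUI handles this exchange for you.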
Key Features
- Local Vision-Language Model inference on supported i.MX platforms
- Ara240 DNPU acceleration
- GUI-based model selection and prompt input
- Streaming token output
- Integration with eIQ AAF Connector and Ara240 Runtime SDK
- Support for camera-based visual input using a USB-C HD camera
Supported Model
Qwen2.5-VL-7B-Instruct-Ara240
This model is provided as an Ara240-compatible model.dvm file and is intended for local execution on the target platform.
Basic Installation
Make sure the Ara240 Runtime SDK is already installed on the target board, then copy the Debian package from your host to the board:
scp vlm-edge-studio.deb root@<ip_addr>:
Then install it on the board:
dpkg -i vlm-edge-studio.deb
The installation may take a few minutes because the model needs to be extracted during setup.
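To confirm that the installation finished, one option is to ask dpkg for the package state. The sketch below reads the package name from the .deb file itself rather than guessing it. The helper name `verify_install` is hypothetical, and it assumes the .deb is still in the current directory on the board.

```shell
# Hypothetical helper: print the dpkg install status of the package
# contained in a given .deb file.
verify_install() {
    deb="${1:-vlm-edge-studio.deb}"
    # Read the Package field from the archive's control data.
    pkg=$(dpkg -f "$deb" Package) || return 1
    # For a successfully installed package, dpkg -s reports a line such as
    # "Status: install ok installed".
    dpkg -s "$pkg" | grep '^Status:'
}
```

Running `verify_install vlm-edge-studio.deb` on the board prints the status line, and returns a non-zero exit code if the package is not known to dpkg.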
Running VLM Edge Studio
Before launching, make sure the Ara240 runtime service is running:
systemctl status rt-sdk-ara2.service --no-pager -l
Then start the application with:
run_vlm_edge_studio
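The service check and the launch described above can be combined into a small wrapper, so the GUI only starts when the runtime is up. This is a minimal sketch. The function name `launch_if_ready` is hypothetical, while the service name and launch command are the ones used in this post.

```shell
# Hypothetical wrapper: start VLM Edge Studio only if the Ara240 runtime
# service is active.
SERVICE="rt-sdk-ara2.service"

launch_if_ready() {
    # is-active --quiet prints nothing and exits 0 only when the unit is active.
    if systemctl is-active --quiet "$SERVICE"; then
        run_vlm_edge_studio
    else
        echo "$SERVICE is not active; inspect it with:" >&2
        echo "  systemctl status $SERVICE --no-pager -l" >&2
        return 1
    fi
}
```

After defining the function (for example in a shell profile on the board), call `launch_if_ready` instead of running the two commands by hand.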
Once the GUI appears, click LOAD to load the model. After the model is ready, enter a prompt and submit it to interact with the VLM locally on the i.MX platform.
Walkthrough Video
In the attached video, I show how to launch VLM Edge Studio, load the supported Vision-Language Model, submit a prompt, and interact with the model running locally with Ara240 DNPU acceleration.
Summary
VLM Edge Studio is a useful tool for evaluating local Vision-Language Model inference on NXP i.MX platforms with Ara240 DNPU acceleration. It provides a simple workflow for loading the model, entering prompts, and interacting with a vision-language model directly at the edge.