DNPU Training Hub



Discussions

Building Voice-Enabled Smart Devices with Ara240 DNPU using Smart Device Gateway

In this post, I want to share a quick walkthrough of Smart Device Gateway, a FastAPI-based demo server that allows connected devices to use local GenAI capabilities accelerated by the Ara240 DNPU.

The idea behind this demo is to centralize AI intelligence in one gateway instead of adding powerful AI hardware to every device. A connected device only needs a microphone, a speaker, and a network connection to become a voice-enabled assistant.

What It Does

Smart Device Gateway enables devices such as appliances or embedded clients to send audio to a local server. The server processes the request using speech recognition, RAG, an LLM running on the Ara240 DNPU, and text-to-speech, then streams the spoken response back to the client.

The current demo showcases intelligent device assistants for a generic oven and a coffee machine (barista) use case, using device manuals as the knowledge base for contextual responses.

Key Features

- FastAPI server for local GenAI-powered device interaction
- OpenAI Realtime API-compatible WebSocket endpoint at /v1/realtime
- Bidirectional audio streaming for low-latency voice conversations
- Speech-to-text powered by Moonshine
- RAG-based contextual answers using device-specific knowledge bases
- Qwen2.5-7B-Instruct running on Ara240 DNPU through the eIQ AAF Connector
- Text-to-speech using a VITS-based model
- Host PC and edge-client options for interacting with the gateway

Architecture Overview

The Smart Device Gateway receives audio from a client over WebSocket, converts speech to text, retrieves relevant context from a device knowledge base, sends the prompt to the LLM through the eIQ AAF Connector, converts the generated response back to speech, and streams the audio response to the client.

At a high level, the flow is:

    Audio Input → STT → RAG → eIQ AAF Connector / LLM → TTS → Audio Output

The LLM runs on the Ara240 DNPU, while the server runs on the FRDM i.MX platform.

Basic Server Installation

Install the Debian package on the board:

    dpkg -i smart-device-gateway_1.0.0_all.deb

Start the server:

    run_server_only --host 0.0.0.0 --port 8080

The server expects the eIQ AAF Connector to already be running on 0.0.0.0:8000 with Qwen2.5-7B-Instruct properly configured.

Alternatively, the demo can start the server together with the connector:

    run_server --host 0.0.0.0 --port 8080

Host PC Client Example

The demo includes a push_to_talk client that can run on a host PC.
After copying the push_to_talk folder, run:

    python -m uv run push_to_talk.py --server_ip <BOARD_IP> --port <PORT> --device oven

You can also use:

    python -m uv run push_to_talk.py --server_ip <BOARD_IP> --port <PORT> --device barista

If no device name is provided, the RAG knowledge base is not used and the response is generated from the LLM’s general knowledge.

Edge Client Example

The Debian package also includes an edge client for i.MX 8M and i.MX 9 boards. After creating a config.toml file, run:

    run_client --config-file path/to/config.toml

The edge client supports wake-word interaction using “Hey NXP”, then sends the user query to the server and receives the streamed audio response.

Walkthrough Video

In the attached video, I show how to start the Smart Device Gateway server, connect a client, select a device profile such as oven or barista, ask a voice question, and receive a spoken response generated locally using Ara240 DNPU acceleration.

Notes and Considerations

- Microphone quality has a direct impact on speech-to-text accuracy.
- The STT model can be sensitive to accents and pronunciation variations.
- RAG responses work best when questions are related to the selected device manual or knowledge base.
- This is a Proof of Concept version 1.0.0, intended to demonstrate centralized AI intelligence for smart devices.

Summary

Smart Device Gateway demonstrates how everyday devices can become voice-enabled assistants by connecting to a local AI gateway. By combining STT, RAG, LLM inference on Ara240 DNPU, and TTS, the demo provides a practical reference for building local, privacy-focused GenAI experiences on NXP i.MX platforms.

Link

Smart Device Gateway repository: https://github.com/nxp-imx-support/smart-device-gateway
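
For readers who want to reason about the Smart Device Gateway flow (Audio Input → STT → RAG → LLM → TTS → Audio Output), here is a minimal Python sketch of how the stages could be chained. It is not the gateway's actual code: `GatewayPipeline`, `build_prompt`, and the four stage callables are hypothetical names chosen for illustration, and the prompt wording is an assumption. The only behavior taken from the post is the stage order and the rule that RAG is skipped when no device name is given.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


def build_prompt(query: str, snippets: List[str]) -> str:
    """Prepend retrieved manual snippets to the user query (illustrative format)."""
    if not snippets:
        return query
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Use this device manual context:\n{context}\n\nQuestion: {query}"


@dataclass
class GatewayPipeline:
    """Hypothetical composition of the four stages named in the post."""
    stt: Callable[[bytes], str]                 # audio -> transcript
    retrieve: Callable[[str, str], List[str]]   # (device, query) -> manual snippets
    llm: Callable[[str], str]                   # prompt -> answer text
    tts: Callable[[str], bytes]                 # answer text -> audio

    def handle(self, audio: bytes, device: Optional[str] = None) -> bytes:
        query = self.stt(audio)
        # Without a device profile, RAG is skipped and the LLM answers
        # from its general knowledge, as described in the post.
        snippets = self.retrieve(device, query) if device else []
        return self.tts(self.llm(build_prompt(query, snippets)))
```

Each stage is passed in as a plain callable so the chain can be exercised with stand-in functions on a host PC before any real STT, LLM, or TTS backend is connected.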

Serving Edge AI Models on Ara240 DNPU with eIQ AAF Connector

In this post, I want to share a quick walkthrough of eIQ AAF Connector, a REST-based server that enables LLM and VLM inference on NXP i.MX platforms using the Ara240 DNPU.

The connector provides a simple HTTP interface for client applications to send prompts and receive streaming token responses from models running locally on Ara240. It is also the communication layer used by applications such as LLM Edge Studio and VLM Edge Studio.

Supported Platforms

- FRDM i.MX 8M Plus
- FRDM i.MX 95

Key Features

- REST API server for Ara240-accelerated model inference
- Chat Completions-style HTTP endpoint
- Streaming token responses
- Support for text LLMs and Qwen2.5-VL models
- Model configuration through server_config.json
- Optional tool calling and guided generation support for compatible text models
- Optional semantic prompt caching for text models
- OpenAPI documentation available through the /docs endpoint

How It Works

The eIQ AAF Connector runs on the i.MX host and exposes a REST API. Client applications send prompts to the connector, which communicates with the Ara240 Runtime SDK and the loaded model.dvm running on the Ara240 DNPU. The response is returned as generated tokens, with support for streaming output.

Basic Setup

After installing the Debian package, activate the connector virtual environment:

    source /usr/share/eiq/aaf-connector/venv/bin/activate

Run the connector:

    connector

By default, the server starts on:

    127.0.0.1:8000

To allow access from another device, start it with:

    connector --host 0.0.0.0

Configuration

The connector uses a JSON configuration file named server_config.json to define server settings and available models.
This includes model paths, tokenizer paths, model type, prompt size, tool calling support, and whether the model should be loaded at startup.

Example configuration:

    {
      "log_level": "INFO",
      "model_config_path": "/usr/share/llm/{}/",
      "model_tokenizer_path": "/usr/share/llm/{}/tokenizer",
      "available_models": [
        {
          "name": "qwen2_5-7b",
          "description": "Qwen2.5 7B instance",
          "type": "text",
          "tool_calling": "native",
          "max_prompt_size": 2047,
          "enabled": true
        }
      ]
    }

Sending a Test Request

Once the server is running, a basic request can be sent to the chat completions endpoint:

    curl -H 'Content-Type: application/json' \
      -d '{
        "model": "Qwen2.5-7B-Instruct",
        "messages": [
          {
            "role": "user",
            "content": "Who are you?"
          }
        ]
      }' \
      -X POST 0.0.0.0:8000/v1/chat/completions

The API can also be tested from the OpenAPI UI at: http://0.0.0.0:8000/docs

Walkthrough Video

In the attached video, I show how to start the eIQ AAF Connector, verify the server is running, configure a model, and send a sample request to the /v1/chat/completions endpoint.

Notes and Limitations

- The Ara240 Runtime SDK must be installed on the board.
- The connector requires the target model and tokenizer paths to be correctly configured.
- Only one model should be enabled at a time unless the Ara240 device has enough memory for multiple enabled models.
- Some features depend on model type and configuration, such as tool calling, image input, video input, structured output, and semantic caching.

Summary

The eIQ AAF Connector provides the REST API layer for running edge AI models on NXP i.MX platforms with Ara240 DNPU acceleration. It allows applications to send prompts, receive generated responses, and integrate local LLM or VLM inference into demos, prototypes, and edge AI workflows.
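
The note above about enabling only one model at a time can be checked before starting the connector. The following is a minimal sketch, assuming server_config.json has the shape shown in the example configuration (an `available_models` list whose entries carry `name` and `enabled`). The helper names `enabled_model_names` and `check_single_enabled` are hypothetical and are not part of the connector.

```python
import json
from typing import List


def enabled_model_names(config_text: str) -> List[str]:
    """Return the names of all models marked "enabled": true in the config text."""
    config = json.loads(config_text)
    return [
        model["name"]
        for model in config.get("available_models", [])
        if model.get("enabled", False)
    ]


def check_single_enabled(config_text: str) -> str:
    """Summarize whether the config follows the one-enabled-model guideline."""
    names = enabled_model_names(config_text)
    if not names:
        return "no model is enabled"
    if len(names) > 1:
        # The post allows this only if the Ara240 device has enough memory.
        return "multiple models enabled (" + ", ".join(names) + "); check Ara240 memory"
    return "ok: " + names[0]
```

On the board, this could be called with the contents of the server_config.json file in use. The location of that file is not given in the post, so the path is left to the reader.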

Link

eIQ AAF Connector repository: https://github.com/nxp-imx-support/eiq-aaf-connector/
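
The curl test request from the eIQ AAF Connector post can also be issued from Python using only the standard library. This is a sketch, not an official client. The endpoint path, port, and model name are copied from the post. The helper names `build_chat_request` and `send`, the default host of 127.0.0.1, and the choice to return the raw response body are assumptions, since the post does not document the response schema.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       model: str = "Qwen2.5-7B-Instruct",
                       host: str = "127.0.0.1",
                       port: int = 8000) -> urllib.request.Request:
    """Build a POST request for the connector's /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the connector running, `print(send(build_chat_request("Who are you?")))` would mirror the curl example. When calling from another machine, pass the board's IP address as `host` and start the connector with `--host 0.0.0.0`.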

Running Vision-Language Models on i.MX with Ara240 DNPU using VLM Edge Studio

In this post, I want to share a quick walkthrough of VLM Edge Studio, an NXP launcher application designed to interact with supported Vision-Language Models running locally on FRDM i.MX platforms with Ara240 DNPU acceleration.

VLM Edge Studio provides a Qt/QML-based GUI for model selection, prompt input, and visual interaction with locally running VLMs at the edge. It communicates with the Ara240 Runtime SDK through the eIQ AAF Connector using a REST-based interface and streaming token responses.

Supported Platforms

- FRDM i.MX 8M Plus
- FRDM i.MX 95

Key Features

- Local Vision-Language Model inference on supported i.MX platforms
- Ara240 DNPU acceleration
- GUI-based model selection and prompt input
- Streaming token output
- Integration with eIQ AAF Connector and Ara240 Runtime SDK
- Support for camera-based visual input using a USB-C HD camera

Supported Model

The current version supports:

- Qwen2.5-VL-7B-Instruct-Ara240

This model is provided as an Ara240-compatible model.dvm file and is intended for local execution on the target platform.

Basic Installation

After making sure the Ara240 Runtime SDK is installed on the target board, copy the Debian package to the board:

    scp vlm-edge-studio.deb root@<ip_addr>:

Install it with:

    dpkg -i vlm-edge-studio.deb

The installation may take a few minutes because the model needs to be extracted during setup.

Running VLM Edge Studio

Before launching, make sure the Ara240 runtime service is running:

    systemctl status rt-sdk-ara2.service --no-pager -l

Then start the application with:

    run_vlm_edge_studio

Once the GUI appears, click LOAD to load the model.
After the model is ready, enter a prompt and submit it to interact with the VLM locally on the i.MX platform.

Walkthrough Video

In the attached video, I show how to launch VLM Edge Studio, load the supported Vision-Language Model, submit a prompt, and interact with the model running locally with Ara240 DNPU acceleration.

Notes and Limitations

- A single application instance can load only one model at a time.
- Multiple models cannot currently be assigned to different endpoints within the same instance.
- The UI is designed for 1920x1080 resolution; higher-resolution displays may show layout issues.
- A proper 5V/3A power supply is recommended to avoid instability or board resets during inference.

Summary

VLM Edge Studio is a useful tool for evaluating local Vision-Language Model inference on NXP i.MX platforms using Ara240 DNPU acceleration. It provides a simple workflow for loading the model, entering prompts, and interacting with visual-language AI directly at the edge.

Link

VLM Edge Studio repository: https://github.com/nxp-imx-support/vlm-edge-studio
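
The "check the runtime service, then launch" step from the VLM Edge Studio post can be scripted. The sketch below is one possible wrapper, not something shipped with the package. It uses `systemctl is-active --quiet`, a standard systemd query that exits with 0 when the unit is active, in place of the `systemctl status` call shown in the post. The function names are hypothetical, and the `run` parameter exists only so the logic can be tried without systemd.

```python
import subprocess
from typing import Callable

SERVICE = "rt-sdk-ara2.service"    # unit name taken from the post
LAUNCHER = "run_vlm_edge_studio"   # launcher command taken from the post


def service_is_active(unit: str = SERVICE, run: Callable = subprocess.run) -> bool:
    """Return True if `systemctl is-active --quiet <unit>` exits with 0."""
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0


def launch_if_ready(run: Callable = subprocess.run) -> bool:
    """Start the GUI only when the Ara240 runtime service is active."""
    if not service_is_active(run=run):
        print(f"{SERVICE} is not active; inspect it with: "
              f"systemctl status {SERVICE} --no-pager -l")
        return False
    run([LAUNCHER], check=True)
    return True
```

Calling `launch_if_ready()` on the board would run the real commands. The same pattern should apply to `run_llm_edge_studio` by changing `LAUNCHER`.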

Running Local LLMs on i.MX with Ara240 DNPU using LLM Edge Studio

In this post, I want to share a quick walkthrough of LLM Edge Studio, an NXP launcher application designed to test supported Large Language Models running locally on i.MX platforms with Ara240 DNPU acceleration.

LLM Edge Studio provides a simple GUI to select a model, load it, enter prompts, and interact with an LLM directly at the edge. It communicates with the Ara240 Runtime SDK through the eIQ AAF Connector, using a REST-based interface for prompt submission and streaming token responses.

Supported Platforms

- FRDM i.MX 8M Plus
- FRDM i.MX 95

Key Features

- Local LLM inference on supported i.MX platforms
- Ara240 DNPU acceleration
- GUI-based model selection and prompt input
- Streaming token output
- Integration with eIQ AAF Connector and Ara240 Runtime SDK
- Support for prebuilt .deb package installation or building from source

Supported Models

The current version supports the following Ara240-optimized models:

- Qwen2.5-coder-1.5B
- Qwen2.5-7B-Instruct

These models are provided as Ara240-compatible model.dvm files and are intended for local execution on the target platform.

Basic Installation

After making sure the Ara240 Runtime SDK is installed on the target board, copy the Debian package to the board:

    scp llm-edge-studio.deb root@<ip_addr>:

Install it with:

    dpkg -i llm-edge-studio.deb

The installation may take a few minutes because the required models are downloaded during setup.

Running LLM Edge Studio

Before launching, make sure the Ara240 runtime service is running:

    systemctl status rt-sdk-ara2.service --no-pager -l

Then start the application with:

    run_llm_edge_studio

Once the GUI appears, click LOAD to load the selected model. After the model is ready, enter a prompt and submit it to start interacting with the LLM.

Walkthrough Video

In the attached video, I show how to launch LLM Edge Studio, load a supported model, submit a prompt, and view the generated response running locally on the i.MX platform with Ara240 DNPU acceleration.

Notes and Limitations

- A single application instance can load only one model at a time.
- Multiple models cannot currently be assigned to different endpoints within the same instance.
- The UI is designed for 1920x1080 resolution; higher-resolution displays may show layout issues.
- A proper 5V/3A power supply is recommended to avoid instability or board resets during inference.

Summary

LLM Edge Studio is a useful tool for quickly evaluating local LLM inference on NXP i.MX platforms using Ara240 DNPU acceleration. It provides a simple workflow for model loading, prompt testing, and observing token streaming directly at the edge.

Link

LLM Edge Studio repository: https://github.com/nxp-imx-support/llm-edge-studio
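
The streaming token output mentioned in the LLM Edge Studio post can be illustrated with a small parser. The posts do not specify the wire format, so this sketch assumes the connector streams OpenAI-style server-sent events: `data: {json}` lines whose chunks carry `choices[].delta.content`, ending with `data: [DONE]`. If the connector uses a different format, only the parsing inside `iter_tokens` (a hypothetical helper name) would need to change.

```python
import json
from typing import Iterable, Iterator


def iter_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

A GUI or script could feed the decoded lines of the HTTP response into `iter_tokens` and append each fragment to the display as it arrives, which is the token-by-token behavior the post describes.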

Multi-Stream YOLOv8 Object Detection with Ara240 DNPU on i.MX

This post is a walkthrough of the ARA2 Vision Examples package and its multi-stream YOLOv8 object detection application.

The ara2-vision-examples package provides vision AI examples for NXP i.MX platforms using Ara240 DNPU acceleration. It demonstrates real-time video processing with AI/ML inference capabilities such as object detection, classification, pose estimation, and semantic segmentation.

This walkthrough focuses on the multistream_yolov8 application, which uses GStreamer to process up to eight simultaneous video streams, run YOLOv8 object detection on each stream, and display the results in a single mosaic view.

Supported Platforms

- FRDM i.MX 8M Plus
- FRDM i.MX 95

Key Features

- Multi-stream video processing from 1 to 8 streams
- YOLOv8 object detection accelerated by Ara240 DNPU
- Support for YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x models
- GStreamer-based video pipeline
- Mosaic display output with bounding boxes
- Runtime options for stream count, model selection, synchronization, and endpoint selection
- FPS and IPS performance overlay per stream

Running the Demo

Run the application with the default settings:

    multistream_yolov8

Run with a specific number of streams:

    multistream_yolov8 -s 4

Select a different YOLOv8 model:

    multistream_yolov8 -s 4 --model yolov8s

Run eight streams for maximum throughput:

    multistream_yolov8 -s 8 --sync false

Enable synchronized playback:

    multistream_yolov8 -s 4 --sync true

Walkthrough Video

The attached video shows how to launch the application, configure the number of streams, select different YOLOv8 models, and view the object detection results in the mosaic display.

Links

ARA2 Vision Examples repository: https://github.com/nxp-imx-support/ara2-vision-examples
Multi-stream YOLOv8 README: https://github.com/nxp-imx-support/ara2-vision-examples/blob/main/tasks/object-detection/yolov8n/multistream-gstreamer/README.md
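
When the multistream_yolov8 demo is driven from a test script, the command lines shown in the post can be assembled programmatically. The sketch below uses only the options that appear in the post (`-s`, `--model`, `--sync`) and checks the documented ranges (1 to 8 streams, the five YOLOv8 variants). The helper name `build_command` is hypothetical, and the lowercase spelling of the model names is inferred from the single `yolov8s` example.

```python
from typing import List, Optional

# Lowercase names inferred from the `--model yolov8s` example in the post.
MODELS = ("yolov8n", "yolov8s", "yolov8m", "yolov8l", "yolov8x")


def build_command(streams: Optional[int] = None,
                  model: Optional[str] = None,
                  sync: Optional[bool] = None) -> List[str]:
    """Build an argument list for multistream_yolov8; omitted options keep defaults."""
    cmd = ["multistream_yolov8"]
    if streams is not None:
        if not 1 <= streams <= 8:
            raise ValueError("streams must be between 1 and 8")
        cmd += ["-s", str(streams)]
    if model is not None:
        if model not in MODELS:
            raise ValueError(f"unsupported model: {model}")
        cmd += ["--model", model]
    if sync is not None:
        cmd += ["--sync", "true" if sync else "false"]
    return cmd
```

For example, `build_command(8, sync=False)` reproduces the "maximum throughput" invocation from the post, and the resulting list can be passed to `subprocess.run` on the board.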

The Runtime SDK for AI/ML acceleration using the Ara240 NPU on NXP i.MX SoCs provides the runtime environment for the Ara-2 NPU.

Main Purpose of the Package

1. Provide the runtime environment for the Ara-2 NPU

The Runtime SDK package installs everything needed for an i.MX system to communicate with and use the Ara240 NPU hardware, including:

- NPU drivers
- Low-level utilities (metrics, hardware bring-up, flash tools)
- Proxy services that interface applications with the NPU
- Firmware loaders and NPU configuration files

2. Allow users to run AI/ML inference models on the NPU

The Ara240 runtime environment includes tools for:

- Downloading pre-compiled AI/ML models
- Running performance tests on the NPU
- Running classification, detection, pose, and segmentation models
- Inspecting HW IPS (inferences per second) and real hardware performance

It provides scripts such as:

- fetch_models.sh
- ara_metrics.sh
- chip_info.sh
- program_flash.sh
- run_models_perf.sh

3. Automatically configure and optimize the i.MX system

Installation does the following automatically:

- Expands the system partition to handle large models (LLMs/VLMs)
- Sets up an 8GB swap for devices with limited RAM
- Prepares the runtime environment for AI workloads

4. Manage and update Ara-2 firmware

The package contains scripts to:

- Check the installed firmware version (chip_info.sh)
- Update firmware if needed (program_flash.sh)

5. Provide a systemd service for automatic startup

The SDK installs a systemd service: rt-sdk-ara2.service

Walkthrough Video

Below is the walkthrough video for this package.
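
Since the Runtime SDK installation is described as setting up an 8GB swap, one way to confirm the result afterwards is to read `SwapTotal` from /proc/meminfo. The sketch below relies only on the standard Linux /proc/meminfo layout (`Name:   value kB`). The helper names `parse_meminfo` and `swap_gib` are hypothetical and are not part of the SDK.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: number} (kB for sized fields)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_gib(meminfo_text: str) -> float:
    """Return the configured swap size in GiB (0.0 if SwapTotal is absent)."""
    return parse_meminfo(meminfo_text).get("SwapTotal", 0) / (1024 * 1024)
```

On the board, `swap_gib(open("/proc/meminfo").read())` should report a value close to 8 if the swap was created as described. The exact figure depends on how the installer sizes the swap, which the post does not detail.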