Serving Edge AI Models on Ara240 DNPU with eIQ AAF Connector
In this post, I want to share a quick walkthrough of eIQ AAF Connector, a REST-based server that enables LLM and VLM inference on NXP i.MX platforms using the Ara240 DNPU.
The connector provides a simple HTTP interface for client applications to send prompts and receive streaming token responses from models running locally on Ara240. It is also the communication layer used by applications such as LLM Edge Studio and VLM Edge Studio.
Supported Platforms
FRDM i.MX 8M Plus
FRDM i.MX 95
Key Features
REST API server for Ara240-accelerated model inference
Chat Completions-style HTTP endpoint
Streaming token responses
Support for text LLMs and Qwen2.5-VL models
Model configuration through server_config.json
Optional tool calling and guided generation support for compatible text models
Optional semantic prompt caching for text models
OpenAPI documentation available through the /docs endpoint
How It Works
The eIQ AAF Connector runs on the i.MX host and exposes a REST API. Client applications send prompts to the connector, which communicates with the Ara240 Runtime SDK and the loaded model.dvm running on the Ara240 DNPU. The response is returned as generated tokens, with support for streaming output.
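To make the streaming output more concrete, here is a minimal Python sketch of how a client might reassemble streamed tokens. It assumes the connector uses the OpenAI-style framing (lines of the form `data: {json}`, text under `choices[].delta.content`, and a `data: [DONE]` terminator). The post only says the endpoint is "Chat Completions-style", so treat that wire format as an assumption and check the /docs endpoint for the actual schema. The function name `iter_stream_tokens` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style 'data: {...}' stream lines.

    ASSUMPTION: the framing and field names below follow the OpenAI
    streaming convention; the connector's real format may differ.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

In practice the `lines` argument would be the decoded lines of the HTTP response body, so the tokens can be printed as they arrive instead of waiting for the full reply.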
Basic Setup
After installing the Debian package, activate the connector virtual environment:
source /usr/share/eiq/aaf-connector/venv/bin/activate
Run the connector:
connector
By default, the server starts on:
127.0.0.1:8000
To allow access from another device, start it with:
connector --host 0.0.0.0
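A quick way to confirm that the server is actually accepting connections is a plain TCP check. The sketch below uses only the Python standard library and makes no assumption about the connector's API: it just tries to open a socket to the given host and port. The helper name `is_listening` is hypothetical.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

On the board, `is_listening("127.0.0.1", 8000)` checks the default address. From another device, pass the board's IP address instead; that only succeeds when the connector was started with `--host 0.0.0.0`.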
Configuration
The connector uses a JSON configuration file named server_config.json to define server settings and available models. This includes model paths, tokenizer paths, model type, prompt size, tool calling support, and whether the model should be loaded at startup.
Example configuration:
{
  "log_level": "INFO",
  "model_config_path": "/usr/share/llm/{}/",
  "model_tokenizer_path": "/usr/share/llm/{}/tokenizer",
  "available_models": [
    {
      "name": "qwen2_5-7b",
      "description": "Qwen2.5 7B instance",
      "type": "text",
      "tool_calling": "native",
      "max_prompt_size": 2047,
      "enabled": true
    }
  ]
}
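Before starting the server, it can help to sanity-check the configuration. The Python sketch below is not part of the connector; it is a small standalone helper built on two assumptions: that the `{}` placeholder in the path settings is replaced by the model name, and that `name`, `type`, and `enabled` are the minimum keys each model entry needs (taken from the example above, not from a documented schema). The function names are hypothetical.

```python
import json

# ASSUMPTION: minimal key set inferred from the example configuration.
REQUIRED_MODEL_KEYS = ("name", "type", "enabled")


def check_config(config: dict) -> list:
    """Return a list of human-readable warnings for a parsed server_config.json."""
    warnings = []
    models = config.get("available_models", [])
    if not models:
        warnings.append("no models listed in available_models")
    for index, model in enumerate(models):
        for key in REQUIRED_MODEL_KEYS:
            if key not in model:
                warnings.append(f"model #{index} is missing '{key}'")
    enabled = [m.get("name", "?") for m in models if m.get("enabled")]
    if len(enabled) > 1:
        # Multiple enabled models only work if the device has enough memory.
        warnings.append("more than one model enabled: " + ", ".join(enabled))
    return warnings


def resolve_paths(config: dict, model_name: str) -> tuple:
    """Fill the {} placeholder with the model name (ASSUMED substitution rule)."""
    return (
        config["model_config_path"].format(model_name),
        config["model_tokenizer_path"].format(model_name),
    )
```

A typical use would be to load the file with `json.load`, print any warnings from `check_config`, and confirm that the directories returned by `resolve_paths` exist on the board.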
Sending a Test Request
Once the server is running, a basic request can be sent to the chat completions endpoint:
curl -H 'Content-Type: application/json' \
  -d '{
    "model": "Qwen2.5-7B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": "Who are you?"
      }
    ]
  }' \
  -X POST 0.0.0.0:8000/v1/chat/completions
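For client applications, the same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl command: the endpoint path and JSON body come from the example, the base URL is the default address mentioned earlier, and the helper names are hypothetical. It returns the raw response text and makes no assumption about the response schema.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default bind address of the connector


def build_chat_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(send_chat_request(build_chat_request("Qwen2.5-7B-Instruct", "Who are you?")))` prints whatever the connector returns. The generous timeout is a guess to leave room for model loading and generation.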
The API can also be tested from the OpenAPI UI at:
http://0.0.0.0:8000/docs
Walkthrough Video
In the attached video, I show how to start the eIQ AAF Connector, verify the server is running, configure a model, and send a sample request to the /v1/chat/completions endpoint.
Video:
Notes and Limitations
The Ara240 Runtime SDK must be installed on the board.
The connector requires the target model and tokenizer paths to be correctly configured.
Only one model should be enabled at a time unless the Ara240 device has enough memory for multiple enabled models.
Some features depend on model type and configuration, such as tool calling, image input, video input, structured output, and semantic caching.
Summary
The eIQ AAF Connector provides the REST API layer for running edge AI models on NXP i.MX platforms with Ara240 DNPU acceleration. It allows applications to send prompts, receive generated responses, and integrate local LLM or VLM inference into demos, prototypes, and edge AI workflows.
Link
eIQ AAF Connector repository: https://github.com/nxp-imx-support/eiq-aaf-connector/