Porting DeepSeek to the i.MX 8MP/93 EVK


This article introduces how to port DeepSeek to the i.MX8MP and i.MX93 EVK boards with the Yocto BSP, using llama.cpp.

The main test model used in this document is a Qwen model distilled from the DeepSeek model and then quantized. For other versions of the DeepSeek model, you can follow the same steps in this document and download a different model to test.

1. Set up the demo

ON PC

a. Prepare the cross-compiling environment.
See the i.MX Yocto Project User's Guide for detailed information on how to generate the Yocto SDK environment for cross-compiling. Get the User's Guide. To activate this Yocto SDK environment on your host machine, use this command:

 

:$ source <Yocto_SDK_install_folder>/environment-setup-cortexa53-crypto-poky-linux

 


b. Cross-compile the llama.cpp
eg: i.MX93

 

:$ git clone https://github.com/ggerganov/llama.cpp
:$ cd llama.cpp
:llama.cpp$ mkdir build_93
:llama.cpp$ cd build_93
:build_93$ cmake .. -DCMAKE_SYSTEM_NAME=Linux -DCMAKE_SYSTEM_PROCESSOR=aarch64 -DCMAKE_C_COMPILER=aarch64-poky-linux-gcc -DCMAKE_CXX_COMPILER=aarch64-poky-linux-g++
:build_93$ make -j8
:build_93$ scp bin/llama-cli root@<your i.MX93 board IP>:~/
:build_93$ scp bin/*.so root@<your i.MX93 board IP>:/usr/lib/
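The cross-compile settings passed on the cmake command line above can equivalently live in a CMake toolchain file, which is easier to reuse across builds. A minimal sketch (the file name is arbitrary; the compiler names assume the sourced Yocto SDK has put them on PATH):

```cmake
# toolchain-imx93.cmake -- same settings as the -D flags above
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)
set(CMAKE_C_COMPILER aarch64-poky-linux-gcc)
set(CMAKE_CXX_COMPILER aarch64-poky-linux-g++)
```

With the file saved in the llama.cpp source directory, configure with `cmake .. -DCMAKE_TOOLCHAIN_FILE=../toolchain-imx93.cmake` from inside build_93.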

 


c. Get the DeepSeek model from Hugging Face

e.g., download the DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf model.
Download the required DeepSeek model from Hugging Face.
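One way to fetch a GGUF build and copy it to the board is with the huggingface-cli tool. The repository id below is an assumption (a community GGUF upload); substitute whichever GGUF repository you actually select on Hugging Face:

```shell
# Install the Hugging Face CLI on the PC, then fetch the GGUF file.
# NOTE: the repo id is an assumption -- replace it with the repository
# you chose on Hugging Face.
pip install -U "huggingface_hub[cli]"
huggingface-cli download unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF \
    DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf --local-dir .
# Copy the model to the board's home directory
scp DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf root@<your i.MX93 board IP>:~/
```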


ON Board

a. Test DeepSeek on the i.MX93 board

 

:~/# ./llama-cli --model DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf
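With no extra options, llama-cli starts an interactive chat session. A few commonly used flags can make a quick benchmark more repeatable (the values below are illustrative, not tuned recommendations):

```shell
# -t: thread count (the i.MX93 has 2 Cortex-A55 cores)
# -n: cap on generated tokens, so the run ends on its own
# -p: one-shot prompt instead of interactive chat
./llama-cli --model DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf \
    -t 2 -n 128 -p "What is the capital of France?"
```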

 

 

b. Results are shown below:

[screenshot: Yuhao_Ning_7-1740641971221.png]

 

2. Results Analysis
The effects of running different models on different boards were tested. It should be noted that the biggest obstacle to running a model on a board is memory. The test results, including CPU and memory usage, are as follows:


a. i.MX8mp + DeepSeek-R1-Distill-Qwen-7B-IQ4_XS

[screenshots: Yuhao_Ning_8-1740641971235.png, Yuhao_Ning_9-1740641971250.png]


b. i.MX93 + DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M

[screenshots: Yuhao_Ning_10-1740641971265.png, Yuhao_Ning_11-1740641971283.png]

 

In testing, the i.MX8MP runs DeepSeek-R1-Distill-Qwen-7B-IQ4_XS at about 1 token per second, and the i.MX93 runs DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M at about 1.6 tokens per second. These generation-speed figures are rough measurements and are for reference only.
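The tokens-per-second figure can be read directly from the timing summary that llama-cli prints on exit. A small sketch of pulling the number out of a captured log line (the exact line format varies across llama.cpp versions, so the sample line here is illustrative):

```shell
# Sample timing line as printed by llama-cli at exit (format may vary):
LINE='llama_print_timings:  eval time = 10000.00 ms / 16 runs ( 625.00 ms per token, 1.60 tokens per second)'
# Extract the generation speed in tokens per second:
echo "$LINE" | grep -o '[0-9.]* tokens per second' | awk '{print $1}'
# -> 1.60
```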

The above screenshots show the CPU and memory usage of the i.MX boards while the DeepSeek model is running. It should be pointed out that CPU performance determines the speed of token generation, while the board's memory size determines whether a model can run on that development board at all. This is a trade-off between running speed and required memory: a larger, more accurate model, such as the 7B variant, results in a lower running speed.
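The memory constraint can be estimated before downloading anything: a GGUF model needs roughly parameters × bits-per-weight ÷ 8 bytes just for the weights. A back-of-envelope sketch (4.25 bits per weight is an approximation for 4-bit K-/IQ-quantizations; the KV cache and runtime buffers add more on top):

```shell
# Rough weight-memory estimate for a quantized GGUF model.
# Assumption: weights dominate; KV cache and buffers are extra.
PARAMS=7000000000   # 7B model
BITS=4.25           # approximate bits per weight for 4-bit quants
echo "$PARAMS $BITS" | awk '{printf "%.1f GB\n", $1*$2/8/1024/1024/1024}'
# -> 3.5 GB
```

By this estimate the 7B model needs about 3.5 GB plus overhead, while the 1.5B model needs roughly 1 GB, which matches why only the smaller model is practical on the i.MX93.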

Last updated: 02-27-2025 01:12 AM