topic Re: Deepseek R1 1.5B & Meta LLama 3.2 1B on NXP 8M Plus. in i.MX Processors

Deepseek R1 1.5B & Meta LLama 3.2 1B on NXP 8M Plus.

niravdesai — Wed, 21 May 2025 04:34:11 GMT

I want to run DeepSeek R1 1.5B (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) and Meta Llama 3.2 1B (https://huggingface.co/meta-llama/Llama-3.2-1B) on the NXP i.MX 8M Plus.

Since only TFLite can access the i.MX 8M Plus NPU, I understand that these models have to be converted to TFLite. Can you share the details of how to run them on the i.MX 8M Plus? Please include all the necessary steps.

Re: Deepseek R1 1.5B & Meta LLama 3.2 1B on NXP 8M Plus.

jamesbone — Wed, 21 May 2025 12:14:23 GMT

As you mention, the models need to be converted to TFLite. The usual steps to convert a model are:

1. Export the Model to ONNX (using PyTorch)

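import torch
from transformers import AutoModelForCausalLM

# Load the causal-LM variant so the exported graph ends in token logits
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
model.eval()
model.config.use_cache = False    # skip KV-cache outputs in the exported graph
model.config.return_dict = False  # return plain tuples, which the exporter can trace

# The model expects integer token IDs of shape (batch, sequence), not floats
dummy_input = torch.randint(0, model.config.vocab_size, (1, 128))  # Adjust input shape as needed
torch.onnx.export(model, (dummy_input,), "deepseek_r1.onnx", input_names=["input_ids"], output_names=["logits"])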

2. Convert ONNX to TensorFlow

from onnx_tf.backend import prepare
import onnx

onnx_model = onnx.load("deepseek_r1.onnx")
tf_model = prepare(onnx_model)
tf_model.export_graph("deepseek_r1_tf")

3. Convert TensorFlow Model to TFLite

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("deepseek_r1_tf")
tflite_model = converter.convert()

with open("deepseek_r1.tflite", "wb") as f:
    f.write(tflite_model)

Re: Deepseek R1 1.5B & Meta LLama 3.2 1B on NXP 8M Plus.

niravdesai — Fri, 23 May 2025 06:34:55 GMT

1. We are trying to convert these two models to TFLite (ONNX --> TensorFlow --> TFLite):
meta-llama/Llama-3.2-1B
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
We were able to convert the models to ONNX format, but while converting from ONNX to TensorFlow we get this error --> onnx.ModelProto exceeds maximum protobuf size of 2GB: 7109597546

2. Is this conversion even possible for models like DeepSeek? Can you share some references showing that it is feasible and that people have run DeepSeek models on i.MX 8M Plus hardware? I could only find smaller TensorFlow models such as BERT.

3. If not DeepSeek or Meta Llama, can you point to an LLM that can run on the i.MX 8M Plus using the NPU?

4. Can you also suggest hardware that is suitable for running DeepSeek and Meta Llama in TFLite format?

Re: Deepseek R1 1.5B & Meta LLama 3.2 1B on NXP 8M Plus.

jamesbone — Fri, 23 May 2025 15:59:06 GMT

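This happens because Protocol Buffers (protobuf), the serialization format used by both ONNX and TensorFlow, cannot hold a single message larger than 2 GB, and the exported model reported in the error (7,109,597,546 bytes, about 7.1 GB) far exceeds that.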
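To put the number from the error message in perspective, here is a small sketch of the arithmetic. It assumes that the "2GB" in the message is the protobuf maximum of 2**31 - 1 bytes and that the weights were exported as FP32 (4 bytes each). Both are assumptions, and the variable names are only illustrative.

```python
# Size reported in the error message, in bytes.
MODEL_PROTO_BYTES = 7_109_597_546
# Assumption: the "2GB" in the message is the protobuf limit of 2**31 - 1 bytes.
PROTOBUF_LIMIT_BYTES = 2**31 - 1
# Assumption: weights were exported as FP32, i.e. 4 bytes per parameter.
BYTES_PER_FP32 = 4

# How many times larger than the limit the serialized model is.
overshoot = MODEL_PROTO_BYTES / PROTOBUF_LIMIT_BYTES
# Rough parameter count implied by the file size (ignores graph metadata).
approx_params = MODEL_PROTO_BYTES / BYTES_PER_FP32

print(f"model size: {MODEL_PROTO_BYTES / 1e9:.2f} GB")
print(f"over the limit by a factor of {overshoot:.2f}")
print(f"implied parameter count at FP32: {approx_params / 1e9:.2f} billion")
```

Under these assumptions the model is a bit more than three times the limit, so small trims will not be enough and the size has to come down substantially.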

Possible Solutions:

Use tf.experimental.load_from_saved_model() Instead of Direct ONNX Import

Some versions of TensorFlow have experimental functions to handle large models more efficiently. Check that this function is available in your installed TensorFlow version before relying on it.

Optimize the ONNX Model Before Conversion

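Use onnx.utils.polish_model(model) to simplify the structure. This helper was part of older onnx releases, so check that your installed version still provides it.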
Apply quantization (onnxruntime.quantization.quantize_dynamic() or onnxruntime.quantization.quantize_static()) to reduce model size.
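As a rough sketch of the quantization step, the snippet below wraps onnxruntime's quantize_dynamic() to write an INT8 weight-quantized copy of the exported model. The function name quantize_onnx_weights and the file names are hypothetical. Setting use_external_data_format=True is an assumption made here because the model is larger than 2 GB; it has not been verified that this is enough to get this particular model through the later conversion steps.

```python
from pathlib import Path


def quantize_onnx_weights(src: str, dst: str) -> Path:
    """Write an INT8 weight-quantized copy of the ONNX model at `src` to `dst`.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    # Imported lazily so this file can be loaded where onnxruntime is missing.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(
        model_input=src,
        model_output=dst,
        weight_type=QuantType.QInt8,    # store weights as signed 8-bit integers
        use_external_data_format=True,  # assumption: needed because the model exceeds 2 GB
    )
    return Path(dst)


# Example call (file names are placeholders):
# quantize_onnx_weights("deepseek_r1.onnx", "deepseek_r1_int8.onnx")
```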

Break the Model into Smaller Segments

If possible, split the model into different layers or sub-models before conversion.
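One way to plan such a split is to group consecutive layers so that each group stays under the 2 GB limit. The sketch below does this greedily. The function plan_segments and the per-layer sizes are made up for illustration; real sizes would have to be read from the exported model.

```python
# Assumption: each sub-model must stay within the protobuf limit of 2**31 - 1 bytes.
PROTOBUF_LIMIT_BYTES = 2**31 - 1


def plan_segments(layer_sizes, limit=PROTOBUF_LIMIT_BYTES):
    """Group consecutive layer indices into segments whose total size is <= limit."""
    segments, current, current_size = [], [], 0
    for index, size in enumerate(layer_sizes):
        if size > limit:
            raise ValueError(f"layer {index} alone exceeds the limit")
        # Close the current segment if adding this layer would overflow it.
        if current and current_size + size > limit:
            segments.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        segments.append(current)
    return segments


# Hypothetical sizes in bytes: one large embedding followed by 28 equal blocks.
layer_sizes = [900_000_000] + [200_000_000] * 28
segments = plan_segments(layer_sizes)
for number, segment in enumerate(segments):
    total = sum(layer_sizes[i] for i in segment)
    print(f"segment {number}: layers {segment[0]}..{segment[-1]}, {total / 1e9:.1f} GB")
```

The plan only says where to cut. The cut itself still has to be made on the ONNX graph, for example with onnx.utils.extract_model() and the tensor names at each boundary.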

Convert in FP8 Instead of FP32

Reducing precision from FP32 (4 bytes per weight) to FP8 (float8, 1 byte per weight) cuts the weight storage to roughly a quarter. You can use onnxruntime-tools for this.
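The effect of precision on size can be estimated with a few lines of arithmetic. This sketch assumes a nominal 1.5 billion parameters and counts weights only, ignoring graph metadata. FP16 and INT8 are added here purely for comparison, and the helper name estimated_size_bytes is illustrative.

```python
# Bytes needed to store one weight in each format.
BYTES_PER_WEIGHT = {"FP32": 4, "FP16": 2, "FP8": 1, "INT8": 1}
# Assumption: the limit to stay under is the protobuf maximum of 2**31 - 1 bytes.
PROTOBUF_LIMIT_BYTES = 2**31 - 1
# Assumption: nominal parameter count of a "1.5B" model.
NUM_PARAMS = 1_500_000_000


def estimated_size_bytes(num_params: int, fmt: str) -> int:
    """Weights-only size estimate for the given storage format."""
    return num_params * BYTES_PER_WEIGHT[fmt]


for fmt in BYTES_PER_WEIGHT:
    size = estimated_size_bytes(NUM_PARAMS, fmt)
    verdict = "fits" if size <= PROTOBUF_LIMIT_BYTES else "too large"
    print(f"{fmt}: {size / 1e9:.1f} GB ({verdict})")
```

With these assumptions, halving the precision to FP16 would still leave the model above the limit, while a 1-byte format would bring it below.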

Use an Alternative Converter like onnx2tf

Some unofficial tools (onnx2tf) may handle large models better by converting layer by layer.
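If you want to try onnx2tf, a minimal sketch using its Python entry point could look like this. The wrapper name convert_with_onnx2tf and the output folder name are hypothetical, and whether onnx2tf copes with a model of this size is untested here.

```python
def convert_with_onnx2tf(onnx_path: str, output_dir: str) -> str:
    """Convert an ONNX file to a TensorFlow SavedModel folder with onnx2tf.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    # Imported lazily so this file can be loaded where onnx2tf is missing.
    import onnx2tf

    onnx2tf.convert(
        input_onnx_file_path=onnx_path,  # the exported ONNX model
        output_folder_path=output_dir,   # folder that receives the converted model
    )
    return output_dir


# Example call (paths are placeholders):
# convert_with_onnx2tf("deepseek_r1.onnx", "deepseek_r1_tf")
```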

If none of these work, another approach is to load the model in ONNX, strip unnecessary operations, and re-export a slimmer version before converting to TensorFlow.