DeepSeek R1 1.5B & Meta Llama 3.2 1B on NXP i.MX 8M Plus


2,726 Views
niravdesai
Contributor I

I want to run DeepSeek R1 1.5B (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) and Meta Llama 3.2 1B (https://huggingface.co/meta-llama/Llama-3.2-1B) on the NXP i.MX 8M Plus.

As I understand it, only TFLite can access the i.MX 8M Plus NPU, so these models would have to be converted to TFLite. Can you share details on how to run them on the i.MX 8M Plus? Please include all the necessary steps.

3 Replies

2,634 Views
jamesbone
NXP TechSupport

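This happens because protobuf, the serialization format behind both ONNX and TensorFlow graphs, cannot handle a single message larger than 2 GB, and the exported DeepSeek model (about 7.1 GB according to the error you posted) far exceeds that.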
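
As a rough check of the numbers, here is a small sketch that takes the byte count from the error message and estimates how large the weights would be at lower precisions. It assumes the file is almost entirely FP32 weights and ignores graph overhead, so treat the results as ballpark figures. The function names and the exact limit constant are illustrative assumptions, not part of any library.

```python
# Assumed protobuf limit: sizes are signed 32-bit, so just under 2 GiB.
PROTOBUF_LIMIT_BYTES = 2**31 - 1
# Byte count reported in the conversion error.
REPORTED_MODEL_BYTES = 7_109_597_546
BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "int8": 1}


def estimate_sizes(fp32_bytes):
    """Estimate weight storage per precision, assuming the file is all FP32 weights."""
    n_weights = fp32_bytes // BYTES_PER_WEIGHT["fp32"]
    return {name: n_weights * width for name, width in BYTES_PER_WEIGHT.items()}


def fits_in_protobuf(size_bytes):
    """Return True if a payload of this size stays under the assumed limit."""
    return size_bytes <= PROTOBUF_LIMIT_BYTES


for name, size in estimate_sizes(REPORTED_MODEL_BYTES).items():
    print(f"{name}: {size / 1e9:.2f} GB, fits in one protobuf: {fits_in_protobuf(size)}")
```

Under these assumptions only the INT8 estimate (roughly 1.78 GB) falls below the limit, which is why quantization or splitting the model comes up in the options below.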

Possible Solutions:

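  1. Use tf.experimental.load_from_saved_model() Instead of Direct ONNX Import
    • Some versions of TensorFlow have experimental functions to handle large models more efficiently. Experimental APIs change between releases, so check that this function exists in your installed version before relying on it.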
  2. Optimize the ONNX Model Before Conversion
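    • Use onnx.utils.polish_model(model) to simplify the structure. This helper is only available in older ONNX releases; with newer ones, a separate simplifier such as onnx-simplifier serves a similar purpose.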
    • Apply quantization (onnxruntime.quantization.quantize_dynamic() or onnxruntime.quantization.quantize_static()) to reduce model size.
  3. Break the Model into Smaller Segments
    • If possible, split the model into different layers or sub-models before conversion.
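  4. Convert at Lower Precision Instead of FP32
    • Reducing precision from FP32 to FP16 or INT8 can significantly lower model size. FP8 (float8) would shrink it further, but TFLite has no FP8 tensor type, so FP16 or INT8 is the practical target for this pipeline. You can use onnxruntime-tools for the FP16 conversion.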
  5. Use an Alternative Converter like onnx2tf
    • Some unofficial tools (onnx2tf) may handle large models better by converting layer by layer.

If none of these work, another approach is to load the model in ONNX, strip unnecessary operations, and re-export a slimmer version before converting to TensorFlow.
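
For option 2, a minimal sketch of weight-only INT8 quantization with onnxruntime might look like the following. It assumes onnxruntime is installed and that the exported file is called deepseek_r1.onnx as in the conversion steps; the helper names quantized_path and quantize_onnx_weights are made up for illustration. The use_external_data_format flag is set because the model is larger than 2 GB.

```python
from pathlib import Path


def quantized_path(src):
    """Build the output file name, e.g. model.onnx -> model_int8.onnx."""
    src = Path(src)
    return src.with_name(src.stem + "_int8" + src.suffix)


def quantize_onnx_weights(src):
    """Quantize the weights of an ONNX model to INT8 and return the new path."""
    # Imported here so the path helper above works without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    dst = quantized_path(src)
    quantize_dynamic(
        model_input=str(src),
        model_output=str(dst),
        weight_type=QuantType.QInt8,
        # Store tensors outside the protobuf, since the model exceeds 2 GB.
        use_external_data_format=True,
    )
    return dst
```

Calling quantize_onnx_weights("deepseek_r1.onnx") would write deepseek_r1_int8.onnx next to the input. Dynamic quantization only shrinks the weights; whether the result then converts cleanly to TensorFlow and TFLite still has to be tested.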


2,663 Views
niravdesai
Contributor I

1. We are trying to convert these two models to TFLite via ONNX --> TensorFlow --> TFLite:
  meta-llama/Llama-3.2-1B
  deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  We are able to convert the models to ONNX format, but while converting ONNX to TensorFlow we get this error: onnx.ModelProto exceeds maximum protobuf size of 2GB: 7109597546

2. Is this conversion even possible for models like DeepSeek? Can you share any references showing that it is usually possible and that people have run a DeepSeek model on i.MX 8M Plus hardware? I could only find smaller TensorFlow models such as BERT.

3. If not DeepSeek or Meta Llama, can you provide a reference for an LLM that can run on the i.MX 8M Plus using the NPU?

4. Can you also suggest hardware that is suitable for running DeepSeek and Meta Llama in TFLite format?


2,704 Views
jamesbone
NXP TechSupport

As you mention, the models need to be converted to TFLite. The usual steps to convert a model would be:

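1. Export the Model to ONNX (using PyTorch)

import torch
from transformers import AutoModel

# Load the model and switch it to inference mode
model = AutoModel.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
model.eval()
model.config.use_cache = False    # do not return key/value cache tensors
model.config.return_dict = False  # return plain tuples, which trace cleanly

# The model takes integer token IDs, not floats; adjust the (batch, sequence length) shape as needed
dummy_input = torch.randint(0, model.config.vocab_size, (1, 128))
torch.onnx.export(model, (dummy_input,), "deepseek_r1.onnx", input_names=["input_ids"])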

2. Convert ONNX to TensorFlow

import onnx
from onnx_tf.backend import prepare

# Load the exported ONNX model and write it out as a TensorFlow SavedModel
onnx_model = onnx.load("deepseek_r1.onnx")
tf_model = prepare(onnx_model)
tf_model.export_graph("deepseek_r1_tf")

3. Convert TensorFlow Model to TFLite

import tensorflow as tf

# Convert the SavedModel from step 2 and write the TFLite flatbuffer to disk
converter = tf.lite.TFLiteConverter.from_saved_model("deepseek_r1_tf")
tflite_model = converter.convert()

with open("deepseek_r1.tflite", "wb") as f:
    f.write(tflite_model)
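
Since the goal is to use the NPU, step 3 would in practice need quantization, and the resulting file has to be loaded through a delegate on the board. The sketch below shows one possible shape for both parts. Several things in it are assumptions rather than confirmed settings: the random calibration tokens stand in for real tokenized text, the int64 input dtype must match whatever the SavedModel actually expects, the delegate path /usr/lib/libvx_delegate.so is the usual location on NXP Yocto images but may differ on yours, and all function names are illustrative.

```python
import random
from pathlib import Path


def sample_token_batches(num_batches=8, seq_len=128, vocab_size=1000, seed=0):
    """Yield lists of random token IDs as placeholder calibration data."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        yield [rng.randrange(vocab_size) for _ in range(seq_len)]


def convert_quantized(saved_model_dir, out_path, seq_len=128, vocab_size=1000):
    """Convert a SavedModel to TFLite with post-training quantization."""
    # Heavy imports are kept local so the sampler above runs without them.
    import numpy as np
    import tensorflow as tf

    def representative_dataset():
        # One input tensor of shape (1, seq_len) per calibration step.
        for ids in sample_token_batches(seq_len=seq_len, vocab_size=vocab_size):
            yield [np.array([ids], dtype=np.int64)]

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    Path(out_path).write_bytes(converter.convert())


def load_with_npu_delegate(model_path, delegate_path="/usr/lib/libvx_delegate.so"):
    """Create a TFLite interpreter on the board that offloads to the delegate."""
    import tflite_runtime.interpreter as tflite

    delegate = tflite.load_delegate(delegate_path)
    interpreter = tflite.Interpreter(
        model_path=model_path, experimental_delegates=[delegate]
    )
    interpreter.allocate_tensors()
    return interpreter
```

On the host you would call convert_quantized("deepseek_r1_tf", "deepseek_r1.tflite"), then copy the file to the board and call load_with_npu_delegate("deepseek_r1.tflite"). Operators the delegate does not support fall back to the CPU, so how much of an LLM actually runs on the NPU has to be measured on the target.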