<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Deepseek R1 1.5B &amp; Meta LLama 3.2 1B  on NXP 8M Plus. in i.MX Processors</title>
    <link>https://community.nxp.com/t5/i-MX-Processors/Deepseek-R1-1-5B-amp-Meta-LLama-3-2-1B-on-NXP-8M-Plus/m-p/2103207#M237519</link>
    <description>&lt;P&gt;1. We are trying to convert these two models to TFLite via ONNX --&amp;gt; TensorFlow --&amp;gt; TFLite:&lt;BR /&gt;  meta-llama/Llama-3.2-1B&lt;BR /&gt;  deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B&lt;BR /&gt;We are able to convert the models to ONNX format, but while converting ONNX to TensorFlow we get this error: onnx.ModelProto exceeds maximum protobuf size of 2GB: 7109597546&lt;/P&gt;&lt;P&gt;2. Is this conversion even possible for models like DeepSeek? Can you share references showing that it is feasible and that people have run DeepSeek models on 8M Plus hardware? I could only find smaller TensorFlow models such as BERT.&lt;/P&gt;&lt;P&gt;3. If not DeepSeek and Meta Llama, can you point to an LLM that can run on the i.MX 8M Plus using the NPU?&lt;/P&gt;&lt;P&gt;4. Can you also suggest hardware that is suitable for running DeepSeek and Meta Llama in TFLite format?&lt;/P&gt;</description>
    <pubDate>Fri, 23 May 2025 06:34:55 GMT</pubDate>
    <dc:creator>niravdesai</dc:creator>
    <dc:date>2025-05-23T06:34:55Z</dc:date>
    <item>
      <title>Deepseek R1 1.5B &amp; Meta LLama 3.2 1B  on NXP 8M Plus.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/Deepseek-R1-1-5B-amp-Meta-LLama-3-2-1B-on-NXP-8M-Plus/m-p/2101161#M237420</link>
      <description>&lt;P&gt;I want to run DeepSeek R1 1.5B (&lt;A title="https://huggingface.co/deepseek-ai/deepseek-r1-distill-qwen-1.5b" href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" target="_blank" rel="noreferrer noopener"&gt;https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B&lt;/A&gt;) &amp;amp; Meta Llama 3.2 1B (&lt;A title="https://huggingface.co/meta-llama/llama-3.2-1b" href="https://huggingface.co/meta-llama/Llama-3.2-1B" target="_blank" rel="noreferrer noopener"&gt;https://huggingface.co/meta-llama/Llama-3.2-1B&lt;/A&gt;) on the NXP 8M Plus.&lt;/P&gt;&lt;P&gt;Since only TFLite can access the NXP 8M Plus NPU, I understand that these models would have to be converted to TFLite. Can you share details on how to run them on the NXP 8M Plus? Please include all the necessary steps.&lt;/P&gt;</description>
      <pubDate>Wed, 21 May 2025 04:34:11 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/Deepseek-R1-1-5B-amp-Meta-LLama-3-2-1B-on-NXP-8M-Plus/m-p/2101161#M237420</guid>
      <dc:creator>niravdesai</dc:creator>
      <dc:date>2025-05-21T04:34:11Z</dc:date>
    </item>
    <item>
      <title>Re: Deepseek R1 1.5B &amp; Meta LLama 3.2 1B  on NXP 8M Plus.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/Deepseek-R1-1-5B-amp-Meta-LLama-3-2-1B-on-NXP-8M-Plus/m-p/2101611#M237459</link>
      <description>&lt;P&gt;As you mention, the models need to be converted to TFLite. The usual steps to convert a model would be:&lt;/P&gt;
&lt;P&gt;1. &lt;STRONG&gt;Export the Model to ONNX (using PyTorch)&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;import torch&lt;BR /&gt;from transformers import AutoModel&lt;/P&gt;
&lt;P&gt;# AutoModel loads the base transformer without the language-modeling head&lt;BR /&gt;model = AutoModel.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")&lt;BR /&gt;model.eval()&lt;BR /&gt;# The model takes integer token IDs, so the dummy input must be integers, not floats&lt;BR /&gt;dummy_input = torch.randint(0, model.config.vocab_size, (1, 128)) # Adjust input shape as needed&lt;BR /&gt;torch.onnx.export(model, (dummy_input,), "deepseek_r1.onnx")&lt;/P&gt;
&lt;P&gt;2. &lt;STRONG&gt;Convert ONNX to TensorFlow&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;from onnx_tf.backend import prepare&lt;BR /&gt;import onnx&lt;/P&gt;
&lt;P&gt;onnx_model = onnx.load("deepseek_r1.onnx")&lt;BR /&gt;tf_model = prepare(onnx_model)&lt;BR /&gt;tf_model.export_graph("deepseek_r1_tf")&lt;/P&gt;
&lt;P&gt;3. &lt;STRONG&gt;Convert TensorFlow Model to TFLite&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;import tensorflow as tf&lt;/P&gt;
&lt;P&gt;converter = tf.lite.TFLiteConverter.from_saved_model("deepseek_r1_tf")&lt;BR /&gt;tflite_model = converter.convert()&lt;/P&gt;
&lt;P&gt;with open("deepseek_r1.tflite", "wb") as f:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;f.write(tflite_model)&lt;/P&gt;
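&lt;P&gt;After step 3 it can be worth confirming that the output really is a TFLite flatbuffer before copying it to the board. The sketch below is one hedged way to do that with only the Python standard library: it checks for the &lt;CODE&gt;TFL3&lt;/CODE&gt; file identifier at byte offset 4 and reports the file size. The helper names &lt;CODE&gt;looks_like_tflite&lt;/CODE&gt; and &lt;CODE&gt;report&lt;/CODE&gt; are made up for illustration; they are not part of TensorFlow.&lt;/P&gt;

```python
import os


def looks_like_tflite(path):
    """Check for the FlatBuffers file identifier that TFLite models carry."""
    with open(path, "rb") as f:
        header = f.read(8)
    # Bytes 0-3 hold the root table offset; bytes 4-7 hold the identifier.
    return header[4:8] == b"TFL3"


def report(path):
    """Return a one-line summary with the file size and the identifier check."""
    size_mb = os.path.getsize(path) / 1e6
    if looks_like_tflite(path):
        status = "TFLite identifier found"
    else:
        status = "not a TFLite file"
    return f"{os.path.basename(path)}: {size_mb:.1f} MB, {status}"


# Only runs when the file from step 3 is present in the working directory.
if os.path.exists("deepseek_r1.tflite"):
    print(report("deepseek_r1.tflite"))
```

&lt;P&gt;This only confirms the container format. It says nothing about whether the operators inside the model can be delegated to the NPU.&lt;/P&gt;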
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 21 May 2025 12:14:23 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/Deepseek-R1-1-5B-amp-Meta-LLama-3-2-1B-on-NXP-8M-Plus/m-p/2101611#M237459</guid>
      <dc:creator>jamesbone</dc:creator>
      <dc:date>2025-05-21T12:14:23Z</dc:date>
    </item>
    <item>
      <title>Re: Deepseek R1 1.5B &amp; Meta LLama 3.2 1B  on NXP 8M Plus.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/Deepseek-R1-1-5B-amp-Meta-LLama-3-2-1B-on-NXP-8M-Plus/m-p/2103207#M237519</link>
      <description>&lt;P&gt;1. We are trying to convert these two models to TFLite via ONNX --&amp;gt; TensorFlow --&amp;gt; TFLite:&lt;BR /&gt;  meta-llama/Llama-3.2-1B&lt;BR /&gt;  deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B&lt;BR /&gt;We are able to convert the models to ONNX format, but while converting ONNX to TensorFlow we get this error: onnx.ModelProto exceeds maximum protobuf size of 2GB: 7109597546&lt;/P&gt;&lt;P&gt;2. Is this conversion even possible for models like DeepSeek? Can you share references showing that it is feasible and that people have run DeepSeek models on 8M Plus hardware? I could only find smaller TensorFlow models such as BERT.&lt;/P&gt;&lt;P&gt;3. If not DeepSeek and Meta Llama, can you point to an LLM that can run on the i.MX 8M Plus using the NPU?&lt;/P&gt;&lt;P&gt;4. Can you also suggest hardware that is suitable for running DeepSeek and Meta Llama in TFLite format?&lt;/P&gt;</description>
      <pubDate>Fri, 23 May 2025 06:34:55 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/Deepseek-R1-1-5B-amp-Meta-LLama-3-2-1B-on-NXP-8M-Plus/m-p/2103207#M237519</guid>
      <dc:creator>niravdesai</dc:creator>
      <dc:date>2025-05-23T06:34:55Z</dc:date>
    </item>
    <item>
      <title>Re: Deepseek R1 1.5B &amp; Meta LLama 3.2 1B  on NXP 8M Plus.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/Deepseek-R1-1-5B-amp-Meta-LLama-3-2-1B-on-NXP-8M-Plus/m-p/2103668#M237545</link>
      <description>&lt;P&gt;This happens because a single serialized protobuf message is limited to 2 GB, and the exported DeepSeek model (7109597546 bytes, about 7.1 GB) far exceeds that.&lt;/P&gt;
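&lt;P&gt;The numbers in the error message make the gap concrete. As a rough sketch, assuming the export stored every weight as 4-byte FP32, ignoring graph metadata, and taking the protobuf cap as 2**31 - 1 bytes, the arithmetic looks like this (the function names are illustrative only):&lt;/P&gt;

```python
# Assumed cap on one serialized protobuf message (2 GB minus one byte).
PROTOBUF_LIMIT_BYTES = 2**31 - 1
# Size quoted in the error message from the previous post.
REPORTED_BYTES = 7109597546


def estimate_sizes(total_bytes, bytes_per_weight=4):
    """Estimate the weight count and the weight storage at other precisions."""
    weights = total_bytes // bytes_per_weight
    return {
        "weights": weights,
        "fp32": weights * 4,
        "fp16": weights * 2,
        "int8": weights * 1,
    }


def exceeds_limit(size_bytes, limit=PROTOBUF_LIMIT_BYTES):
    """True when a single-file protobuf of this size would be over the cap."""
    return size_bytes > limit


sizes = estimate_sizes(REPORTED_BYTES)
for name in ("fp32", "fp16", "int8"):
    verdict = "over" if exceeds_limit(sizes[name]) else "under"
    print(f"{name}: {sizes[name] / 1e9:.2f} GB, {verdict} the cap")
```

&lt;P&gt;Under those assumptions only the 8-bit estimate lands below the cap, and a real quantized file would be somewhat larger because it also stores scales and zero points.&lt;/P&gt;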
&lt;P&gt;Possible Solutions:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Use &lt;CODE&gt;tf.experimental.load_from_saved_model()&lt;/CODE&gt; Instead of Direct ONNX Import&lt;/STRONG&gt;&lt;/LI&gt;
&lt;UL&gt;
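&lt;LI&gt;Some versions of TensorFlow have experimental functions to handle large models more efficiently. Check that this function exists in your installed TensorFlow version before relying on it, since it does not appear in the current public API documentation.&lt;/LI&gt;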
&lt;/UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Optimize the ONNX Model Before Conversion&lt;/STRONG&gt;&lt;/LI&gt;
&lt;UL&gt;
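&lt;LI&gt;Use &lt;CODE&gt;onnx.utils.polish_model(model)&lt;/CODE&gt; to simplify the structure. This helper was dropped from newer &lt;CODE&gt;onnx&lt;/CODE&gt; releases, so it may require an older version.&lt;/LI&gt;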
&lt;LI&gt;Apply quantization (&lt;CODE&gt;onnxruntime.quantization.quantize_dynamic()&lt;/CODE&gt; or &lt;CODE&gt;onnxruntime.quantization.quantize_static()&lt;/CODE&gt;) to reduce model size.&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Break the Model into Smaller Segments&lt;/STRONG&gt;&lt;/LI&gt;
&lt;UL&gt;
&lt;LI&gt;If possible, split the model into different layers or sub-models before conversion.&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Convert in FP8 Instead of FP32&lt;/STRONG&gt;&lt;/LI&gt;
&lt;UL&gt;
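&lt;LI&gt;Reducing precision from FP32 to FP8 (&lt;CODE&gt;float8&lt;/CODE&gt;) can significantly lower model size. You can use &lt;CODE&gt;onnxruntime-tools&lt;/CODE&gt; for this. Note that TFLite has no FP8 tensor type, so the weights would need to be FP16 or INT8 by the time the model reaches TFLite.&lt;/LI&gt;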
&lt;/UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Use an Alternative Converter like &lt;CODE&gt;onnx2tf&lt;/CODE&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;UL&gt;
&lt;LI&gt;Some unofficial tools (&lt;CODE&gt;onnx2tf&lt;/CODE&gt;) may handle large models better by converting layer by layer.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/OL&gt;
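&lt;P&gt;For the "break the model into smaller segments" option, a first step is deciding where to cut. The following is a minimal planning sketch, not an ONNX API: it greedily groups consecutive layers so that each group stays under a byte budget. The layer sizes and the budget are hypothetical values chosen for illustration.&lt;/P&gt;

```python
def split_into_segments(layer_sizes, budget_bytes):
    """Group consecutive layer indices so each group fits within budget_bytes."""
    segments, current, current_size = [], [], 0
    for index, size in enumerate(layer_sizes):
        if size > budget_bytes:
            raise ValueError(f"layer {index} alone exceeds the budget")
        # Close the current group when adding this layer would overflow it.
        if current and current_size + size > budget_bytes:
            segments.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        segments.append(current)
    return segments


# Hypothetical example: 10 layers of 300 MB each, 1 GB budget per segment.
layer_sizes = [300_000_000] * 10
segments = split_into_segments(layer_sizes, 1_000_000_000)
print(segments)
```

&lt;P&gt;The actual cut would still have to be made with an ONNX tool; &lt;CODE&gt;onnx.utils.extract_model&lt;/CODE&gt; is one candidate, though it is not exercised here.&lt;/P&gt;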
&lt;P&gt;If none of these work, another approach is to load the model in ONNX, strip unnecessary operations, and re-export a slimmer version before converting to TensorFlow.&lt;/P&gt;</description>
      <pubDate>Fri, 23 May 2025 15:59:06 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/Deepseek-R1-1-5B-amp-Meta-LLama-3-2-1B-on-NXP-8M-Plus/m-p/2103668#M237545</guid>
      <dc:creator>jamesbone</dc:creator>
      <dc:date>2025-05-23T15:59:06Z</dc:date>
    </item>
  </channel>
</rss>

