Hello,
I am currently running the GoPoint Driver Monitoring System (DMS) demo on an i.MX95 evaluation kit using the lf-6.12.49_2.2.0 release.
The source code comments for the face detection model note that the demo uses Google's MediaPipe BlazeFace (short-range) model:
- Original Google Model: face_detection_short_range.tflite (MediaPipe Assets URL)
- Model Card: BlazeFace Model Card
- License: Apache-2.0
However, the demo's downloads.json points to an NXP-hosted, quantized variant, face_detection_ptq.tflite.
The Issue / Discrepancy:
When I load the original Google MediaPipe TFLite model and NXP's face_detection_ptq.tflite in netron.app, the two graphs are structurally different. The NXP model does not look like a simple 1:1 quantization of the same network topology.
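In case it helps anyone reproduce the comparison outside Netron, here is a minimal sketch that counts tensors per dtype in each file. It assumes both models are available locally (the paths are passed on the command line) and that TensorFlow is installed on the host. The helper names are my own, not part of the demo. It only reads static tensor metadata, so it does not show operator-level differences.

```python
import sys
from collections import Counter


def dtype_histogram(tensor_details):
    """Count tensors per dtype from Interpreter.get_tensor_details() output."""
    counts = Counter()
    for detail in tensor_details:
        dtype = detail["dtype"]
        # NumPy scalar types expose __name__ (e.g. "float32"); fall back to str().
        counts[getattr(dtype, "__name__", str(dtype))] += 1
    return dict(counts)


def diff_histograms(hist_a, hist_b):
    """Return {dtype: (count_a, count_b)} for every dtype whose count differs."""
    names = set(hist_a) | set(hist_b)
    return {
        name: (hist_a.get(name, 0), hist_b.get(name, 0))
        for name in names
        if hist_a.get(name, 0) != hist_b.get(name, 0)
    }


def load_tensor_details(model_path):
    """Read static tensor metadata from a .tflite file."""
    import tensorflow as tf  # imported lazily so the helpers above need no TF

    interpreter = tf.lite.Interpreter(model_path=model_path)
    # No allocate_tensors() call: only metadata is read, nothing is executed.
    return interpreter.get_tensor_details()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file locations are assumptions):
    #   python compare.py face_detection_short_range.tflite face_detection_ptq.tflite
    hist_a = dtype_histogram(load_tensor_details(sys.argv[1]))
    hist_b = dtype_histogram(load_tensor_details(sys.argv[2]))
    print("model A:", hist_a)
    print("model B:", hist_b)
    print("differences:", diff_histograms(hist_a, hist_b))
```

A difference in the total tensor count, and not only in dtypes, would suggest that the graph itself was changed and not just quantized.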
I have two questions regarding how NXP prepared this asset for the i.MX95 NPU / eIQ stack:
- Graph Discrepancies: Why are the model graphs structurally different in Netron? Did NXP modify the network architecture, strip custom MediaPipe TFLite operations (like custom anchors/detections), or substitute certain layers to optimize compatibility with the i.MX95 NPU / eIQ inference engine?
- PTQ Implementation Pipeline: If NXP optimized and converted the original Google model, how was Post-Training Quantization (PTQ) applied? Standard TensorFlow Lite converter pipelines (e.g., tf.lite.TFLiteConverter.from_saved_model) start from the original frozen graph (.pb), a SavedModel, or floating-point Keras/TF definitions, and run a calibration dataset through that model. Since Google distributes MediaPipe models directly as .tflite files, did NXP apply PTQ directly to a floating-point .tflite file (e.g., using eIQ tools), or was the model reconstructed from scratch?
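For context on the second question, this is roughly the standard full-integer PTQ flow I have in mind. It is only a sketch of the usual TensorFlow Lite converter path, not a claim about what NXP actually did. The SavedModel directory, the 1x128x128x3 input shape, the [-1, 1] input range, and the random calibration data are all assumptions on my side. A real calibration set would use representative face images.

```python
import numpy as np


def representative_dataset(num_samples=100, input_shape=(1, 128, 128, 3)):
    """Yield calibration batches for the converter.

    Random data is a placeholder (assumption); real calibration should use
    preprocessed images that match the model's expected input range.
    """
    rng = np.random.default_rng(0)
    for _ in range(num_samples):
        yield [rng.uniform(-1.0, 1.0, size=input_shape).astype(np.float32)]


def quantize_saved_model(saved_model_dir, output_path):
    """Convert a float SavedModel to a full-integer (int8) .tflite file."""
    import tensorflow as tf  # imported lazily; only needed for conversion

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # The converter calls this with no arguments and iterates the generator.
    converter.representative_dataset = representative_dataset
    # Restrict to int8 builtin kernels so no float ops remain in the graph.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    tflite_model = converter.convert()
    with open(output_path, "wb") as f:
        f.write(tflite_model)
```

The point is that this path needs a SavedModel (or an equivalent float source model) as input, which is exactly what does not seem to be published for the MediaPipe asset.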
Any insight into the exact optimization and quantization workflow used for this demo asset would be highly appreciated!
Thanks in advance.