Hi All,
I am struggling to find answers to the following questions.
First of all, I have an IMX8mquaddevk board, which has an i.MX8 M Quad Lite.
1) What is the GPU in the i.MX8 M Quad Lite? Several NXP documents say it is the GC7000L GPU. Please confirm whether this is true.
2) What is the peak performance in GFLOPS of the GC7000L? The Vivante website says it is 64 GFLOPS for medium precision and 32 GFLOPS for high precision. Since the GC7000L seems to have 16 processing elements, does that mean each processing element can perform 2 floating-point operations per cycle at high precision?
3) This document (Document Number: IMXGRAPHICUG) says the GPU has 16 processing elements.
But the other document (Document Number: IMX8MDQLQRM) says the GPU has 4 processing elements. Which one is true?
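As a rough sanity check on questions 2 and 3, the per-element rate can be worked out for both processing element counts. This is only a sketch: the 1 GHz shader clock is an assumption (neither document quoted here confirms the clock behind the advertised figures), and the function name is made up for illustration.

```python
# Back-of-envelope check: floating-point operations each processing
# element (PE) would need to complete per clock cycle to reach the
# advertised peak.
# ASSUMPTION: 1 GHz shader clock. This value is hypothetical and is not
# taken from IMXGRAPHICUG or IMX8MDQLQRM.

def flops_per_pe_per_cycle(peak_gflops, num_pes, clock_ghz):
    """Return FLOPs per PE per cycle implied by a peak GFLOPS figure."""
    return peak_gflops / (num_pes * clock_ghz)

ASSUMED_CLOCK_GHZ = 1.0  # assumed, see note above

# Advertised peaks from the Vivante website, per the question above.
for label, gflops in (("high precision", 32.0), ("medium precision", 64.0)):
    for pes in (16, 4):  # the two conflicting PE counts from the documents
        rate = flops_per_pe_per_cycle(gflops, pes, ASSUMED_CLOCK_GHZ)
        print(f"{label}, {pes} PEs: {rate:g} FLOPs per PE per cycle")
```

Under the assumed 1 GHz clock, 16 PEs would imply 2 operations per PE per cycle at high precision, while 4 PEs would imply 8. A rate of 2 would be consistent with one multiply-add per cycle counted as two operations, but that is an interpretation, not something the documents state.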
Thanks,
Jan.
Hi Oleg,
What is the precision (is it 16-bit or 32-bit) that you used to obtain 25 GFLOPS?
Thanks,
PS: I could not build the clpeak benchmark on my platform. Would you mind sharing some tips on how you ran the clpeak benchmark?
I built the Linux Yocto image from sources (Beta 2 for QuadPlus, but it did work for 8MQ) with the following extra parameters in local.conf:
EXTRA_IMAGE_FEATURES ?= "debug-tweaks dev-pkgs tools-sdk tools-debug tools-testapps package-management"
CORE_IMAGE_EXTRA_INSTALL += " git cmake rpm"
Then you will be able to build an image with all the tools needed to fetch clpeak with git and compile it on the board.
The 25 GFLOPS figure is for FP32. I was not able to get FP16 working. NXP pointed me to Vivante, and I have not received a solution from Vivante yet.
Even if FP16 is supported, ~50 GFLOPS is too weak for me.
Oleg,
I'm not working with OpenCL, but I have been working with OpenGL on the i.MX6qp and i.MX8M platforms and have learned some things. In the OpenGL case on the i.MX6qp, we found that the Vivante blob does not deliver the 2x GFLOPS advertised by Vivante when running medium precision shaders instead of high precision shaders. With the open-source GPU driver, however, we have been able to get the 2x GFLOPS performance by configuring the shader core to use the HW correctly when the precision is medium instead of high. (It seems that the proprietary Vivante blob forces high precision regardless of whether the application requests medium or high precision.)
With the i.MX8M, we have found that the proprietary Vivante blob does correctly leverage the HW when medium precision is requested instead of high (and performance benefits accordingly). We have not done any OpenCL testing yet, though, so it may be that the proprietary Vivante blob does not (yet) support using the HW in a way that benefits from the 2x GFLOPS when using FP16. If I were to guess, I'd say that Vivante/VeriSilicon has not yet gotten around to implementing the necessary software functionality.
Regarding 25 vs 32 GFLOPS or 50 vs 64 GFLOPS: my guess is that the delta arises because the NXP BSP for the i.MX8M runs the 3D GPU shader core at 800 MHz instead of the 1 GHz that VeriSilicon uses when quoting the 32/64 GFLOPS numbers. If this shader core frequency difference is taken into account, the measured result ends up within a few percent of the stated 32 GFLOPS.
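To make the arithmetic behind this guess concrete, here is a minimal sketch. The 800 MHz and 1 GHz clocks are the guesses stated above, not confirmed figures, and the function and variable names are made up for illustration.

```python
# Scale an advertised peak GFLOPS figure linearly with shader clock.
# ASSUMPTION: peak throughput is proportional to clock frequency, and the
# two clock values below are guesses from this thread, not confirmed specs.

def scale_peak(peak_gflops, ref_clock_mhz, actual_clock_mhz):
    """Return the peak expected at actual_clock_mhz, given a peak quoted at ref_clock_mhz."""
    return peak_gflops * actual_clock_mhz / ref_clock_mhz

ADVERTISED_FP32_GFLOPS = 32.0  # high precision, as quoted in this thread
ADVERTISED_FP16_GFLOPS = 64.0  # medium precision, as quoted in this thread
REF_CLOCK_MHZ = 1000.0         # guessed clock behind the advertised numbers
BSP_CLOCK_MHZ = 800.0          # guessed clock under the NXP BSP
MEASURED_FP32_GFLOPS = 25.0    # clpeak FP32 result reported in this thread

expected_fp32 = scale_peak(ADVERTISED_FP32_GFLOPS, REF_CLOCK_MHZ, BSP_CLOCK_MHZ)
expected_fp16 = scale_peak(ADVERTISED_FP16_GFLOPS, REF_CLOCK_MHZ, BSP_CLOCK_MHZ)

# Fraction by which the measurement falls short of the clock-scaled peak.
shortfall = (expected_fp32 - MEASURED_FP32_GFLOPS) / expected_fp32

print(f"Expected FP32 peak at BSP clock: {expected_fp32:.1f} GFLOPS")
print(f"Expected FP16 peak at BSP clock: {expected_fp16:.1f} GFLOPS")
print(f"Measured FP32 shortfall: {shortfall:.1%}")
```

Under these assumptions the scaled FP32 peak is 25.6 GFLOPS, so the measured 25 GFLOPS sits roughly 2% below it, and the scaled FP16 peak of 51.2 GFLOPS lines up with the ~50 GFLOPS estimate.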
Regards,
Chris
Hello,
The i.MX8M contains the GC7000L GPU IP.
Its implementation details are provided in the i.MX8M RM. Sorry, no more information
is publicly available.
Have a great day,
Yuri
Hi Yuri,
I believe you are referring to the Reference Manual (document number: IMX8MDQLQRM) as the i.MX8M RM.
This i.MX8M RM does not state the GFLOPS of
the GC7000L. Additionally, it contains conflicting information regarding the number of processing elements, as mentioned in my previous email.
Thanks,
Janarbek.
Hello,
Some preliminary data were provided to you under a Service Request.
Regards,
Yuri.