I have an 8MPLUSLPD4-EVK Evaluation Kit for the i.MX 8M Plus Applications Processor, which I am using to develop optimised OpenCL code for the GC7000UL GPU in the onboard NXP i.MX 8M Plus. Can you point me to any documentation for the GC7000UL GPU, please? The IMX_GRAPHICS_USERS_GUIDE.pdf contains some rudimentary information, but I'd like more detail on points such as:
- Do the "Floating Point Execution Unit" and the "Integer Execution Unit", which are shown as being separate in Figure 6 on page 58/182, operate in parallel? In other words, if I convert some of my integer arithmetic to use floating point and then interleave floating point and integer arithmetic, would I expect to see a speed-up relative to using purely integer operations (given that my kernel is compute-bound)?
- How much "private memory" does the GC7000UL have in terms of on-chip registers (as opposed to "System Memory" simulating private memory)?
- Does the GC7000UL support "threading"?
- Table 17 says that the GC7000UL is "Full Profile". How does this relate to section 5.3, which discusses "Optimization for OpenCL embedded profile"? Does that section still apply to a Full Profile device?
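To make the first question concrete, here is a minimal sketch of the kind of change I have in mind. The kernel names, the loop count and the arithmetic are made up for illustration and are not taken from my real code; the two variants also do not produce the same numbers, they only differ in instruction mix. I realise the compiler may reorder or simplify either loop, so this shows the idea rather than a reliable benchmark.

```c
// Variant A: two independent dependency chains, both integer.
__kernel void chains_int(__global const uint *in, __global uint *out)
{
    size_t i = get_global_id(0);
    uint acc0 = in[i];
    uint acc1 = in[i] + 1u;

    for (uint k = 0; k < 64u; ++k) {
        acc0 = acc0 * 3u + k;   // integer unit
        acc1 = acc1 * 5u + k;   // integer unit
    }
    out[i] = acc0 + acc1;
}

// Variant B: the second chain is moved to floating point, so integer and
// floating point operations are interleaved in the loop body.
__kernel void chains_mixed(__global const uint *in, __global uint *out)
{
    size_t i = get_global_id(0);
    uint  acc0 = in[i];
    float acc1 = (float)(in[i] & 0xFFu);  // small value, exactly representable

    for (uint k = 0; k < 64u; ++k) {
        acc0 = acc0 * 3u + k;             // integer unit
        acc1 = acc1 * 0.5f + 1.0f;        // floating point unit, stays bounded
    }
    out[i] = acc0 + (uint)acc1;
}
```

If the two execution units can issue in parallel, I would expect Variant B to run faster than Variant A for the same global work size; if they share issue slots, I would expect no difference.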
Any pointers/tips/hints/suggestions appreciated - thanks!