We are currently working on a GPU library to accelerate common image processing algorithms on the i.MX6. One of the things we need to do is transfer the processed data back to the CPU. According to the literature, a PBO (pixel buffer object) is the standard way to do this. However, when we tried it on the i.MX6 with the Vivante driver, the download time was no better than calling glReadPixels directly without a PBO, which is very slow. We currently have a workaround for the slow download, but we would like to know whether the slow PBO read is a bug in the driver implementation or is due to some other limitation that we are not aware of. Your help on this is appreciated.
Thanks,
Charles
Charles,
Vivante is working on the issue. I will let you know as soon as I get any information.
Regards,
Andre
Hi Charles,
Vivante reported that they couldn't build the sources you provided; it seems some files are missing. Can you re-send the package and also include the binary, so that they can still test it in case the build fails again?
thank you,
regards,
Andre
Can you send this new version to me too?
thanks,
Andre
Hi Charles and Andre,
I seem to have the same problem with slow PBO on i.MX6.
Did you get any response from Vivante on the issue? Did you manage to solve the problem?
Best regards,
Michał
Hi Michal,
The problem was solved in the newer BSP releases and GPU driver after this discussion. Which BSP release are you using?
cheers,
Andre
Andre,
The BSP I currently use is a mixture of versions, with important things cherry-picked from upstream. It seems that the Vivante driver in my setup (5.0.11.p4.5-hfp) is lagging behind the most recent one (5.0.11.p8.4-hfp). I'll check again after upgrading.
Thanks!
Michał
Andre,
I've tried the 5.0.11.p8.4-hfp Vivante driver with Linux 4.1.15+g77f6154. The PBO glReadPixels and glMapBufferRange operations now finish very quickly (<100 us). However, accessing the mapped memory is quite slow. Copying one FullHD RGBA frame from the mapped buffer to a user-space allocated area takes 63-77 ms, while, for comparison, copying between two user-space allocated buffers takes 18 ms.
Can I expect it to get any faster?
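To put the figures above in terms of bandwidth, here is a minimal sketch in C. It only covers the user-space side: it computes the size of one FullHD RGBA frame, converts a copy time into MB/s, and times a memcpy between two malloc'd buffers. All function names are hypothetical and chosen for illustration; the copy from the GL-mapped buffer is not reproduced here because it needs a GL context.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

enum { FRAME_WIDTH = 1920, FRAME_HEIGHT = 1080, BYTES_PER_PIXEL = 4 };

/* Size in bytes of one FullHD RGBA frame (8 bits per channel). */
size_t frame_bytes(void) {
    return (size_t)FRAME_WIDTH * FRAME_HEIGHT * BYTES_PER_PIXEL;
}

/* Convert a copy time in milliseconds into MB/s (1 MB = 10^6 bytes). */
double throughput_mb_per_s(size_t bytes, double ms) {
    assert(ms > 0.0);
    return ((double)bytes / 1e6) / (ms / 1e3);
}

/* Time a single memcpy of `bytes` from src to dst, in milliseconds. */
double time_copy_ms(void *dst, const void *src, size_t bytes) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, bytes);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}

/* Baseline: copy one frame between two user-space buffers.
 * Returns the copy time in ms, or -1.0 if allocation fails. */
double measure_user_copy_ms(void) {
    size_t n = frame_bytes();
    unsigned char *src = malloc(n);
    unsigned char *dst = malloc(n);
    double ms = -1.0;
    if (src && dst) {
        memset(src, 0x7f, n);  /* touch the pages before timing */
        memset(dst, 0x00, n);
        ms = time_copy_ms(dst, src, n);
    }
    free(src);
    free(dst);
    return ms;
}
```

One frame is 8,294,400 bytes, so 63-77 ms corresponds to roughly 108-132 MB/s from the mapped buffer, against roughly 461 MB/s for the 18 ms user-space copy. To compare on the same footing, the pointer returned by glMapBufferRange could be passed as `src` to `time_copy_ms`.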
Best regards,
Michał
Hi Michal,
The copy between user-space allocated areas is faster due to the system cache. However, copying between this memory and mapped memory is not supposed to be that slow. We are already working with our GPU vendor to fix the problem.
Regards,
Andre
Hello Andre,
Have you made any progress on that issue? A fix? Some new observations?
In the meantime I've worked around the problem by pipelining the processing, so that the memory copy happens simultaneously with other tasks. Of course, the slow memory access still contributes to the total latency of a frame, so it would still be worth fixing the issue.
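A small timing model in C may make the trade-off clearer. It is an idealised sketch, not my actual code: it assumes two stages (the copy and "other work") that overlap perfectly, and the function names and the 30 ms work time used below are hypothetical.

```c
#include <assert.h>

/* Per-frame period when the copy and the other work run back to back. */
double sequential_period_ms(double copy_ms, double work_ms) {
    assert(copy_ms >= 0.0 && work_ms >= 0.0);
    return copy_ms + work_ms;
}

/* Steady-state per-frame period of an ideal two-stage pipeline:
 * the slower stage sets the pace. */
double pipelined_period_ms(double copy_ms, double work_ms) {
    assert(copy_ms >= 0.0 && work_ms >= 0.0);
    return copy_ms > work_ms ? copy_ms : work_ms;
}

/* Latency of a single frame: it still passes through both stages,
 * so pipelining does not reduce it. */
double frame_latency_ms(double copy_ms, double work_ms) {
    assert(copy_ms >= 0.0 && work_ms >= 0.0);
    return copy_ms + work_ms;
}
```

With a 70 ms copy and a hypothetical 30 ms of other work, the period drops from 100 ms to 70 ms per frame, but each frame still takes 100 ms from start to finish, which is why a faster mapped-memory read would still help.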
Thanks,
Michał
Hi Michal, can you please open another thread for this specific issue so we can track it better? Also, please add more information about the use case and, if possible, a sample application that reproduces the issue.
Let me know when you have opened the new thread and we will continue the discussion there.
thanks,
Andre
Hi Michal,
We still don't have a fix for it. I'll let you know as soon as we receive any information.
thanks,
Andre
I will check that and let you know.
Regards,
Andre
Hi Charles,
We are still waiting for Vivante. I will let you know as soon as I get any response.
Regards,
Andre
Hi Charles,
I will forward this question to Vivante with your code attached.
As soon as I get any information, I will let you know.
regards,
Andre