I am getting poor performance when manipulating UMat matrices. I tested per-element multiplication and addition on 1024×706 matrices, once with UMat (which runs on OpenCL, CL_PLATFORM_VERSION: OpenCL 1.2 V6.2.4.p4.190076) and once with Mat (which runs on the CPU) on the i.MX8MQ. The CPU takes 800 µs while the GPU takes 1400 µs. I am running the multiply and add 500 times, so the cost is not the copy from CPU to GPU. Am I missing something in the configuration that would make the GPU perform better than the CPU?
Hello
Thanks for the reply. The code I am using is simple, and I do not see any cl_ configuration in it. Is it maybe a compilation flag?
Here is the code:
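#include <chrono>
#include <iostream>
#include <string>
#include <opencv2/core.hpp>

// Hypothetical stand-in for the printResult() helper, which was not posted.
static void printResult(std::chrono::steady_clock::time_point begin,
                        std::chrono::steady_clock::time_point end,
                        const std::string& label, const cv::Mat& result)
{
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count();
    std::cout << label << ": " << us << " us (" << result.cols << "x" << result.rows << ")" << std::endl;
}

int main()
{
    std::chrono::steady_clock::time_point begin;
    std::chrono::steady_clock::time_point end;
    cv::Mat testMat(768, 1024, CV_8UC1);
    cv::Mat testNuc(768, 1024, CV_8UC1);
    // Defining GPU matrices
    cv::UMat testMatGpu, testNucGpu;
    // Randomizing image (values 0..255)
    cv::randu(testMat, 0, 256);
    cv::randu(testNuc, 0, 256);
    testMat.copyTo(testMatGpu);
    testNuc.copyTo(testNucGpu);
    for (int i = 0; i < 500; i++)
    {
        // Performing GPU (OpenCL) per-element matrix multiplication
        begin = std::chrono::steady_clock::now();
        cv::multiply(testMatGpu, testNucGpu, testNucGpu);
        end = std::chrono::steady_clock::now();
        printResult(begin, end, "GPU per element matrix mul", testNucGpu.getMat(cv::ACCESS_READ));
    }
    return 0;
}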
Hi,
Yes, I did not see any cl_ configuration either. But OpenCV needs all of its files to be present, I mean the OpenCV header files. Probably your Yocto build is not configured for OpenCV.
Regards
My Yocto build is configured for OpenCV: I added it in conf/local.conf with IMAGE_INSTALL_append = " opencv ffmpeg "
From bitbake -s | grep opencv I get 4.0.1.imx+gitAUTOINC+737f8fad13_2522124473_32e315a5b1_34e4206aef_fccf7cd6a4_d29d003e00-r0
I can see its logs in the build/tmp/... directory.
My example was compiled with the SDK that I populated, and I have this line in CMakeCache.txt:
//Details about finding OpenCV
FIND_PACKAGE_MESSAGE_DETAILS_OpenCV:INTERNAL=[/home/elsec-linux/sdk-opt-sumo/sysroots/aarch64-poky-linux/usr][v4.0.1()]
When I run the code with UMat, I can see the application in gputop,
so I believe it is running on the GPU, but I suspect it is not configured correctly.
Thanks
Hello ran,
Can you provide the code so I can check it? You are probably using the cl_khr_fp16 extension, which does not work with the current driver of the Vivante GPU.
Regards
The code is simple. I added a test to see whether the flag cl_khr_fp16 is defined, and it reported false. Am I mistaken that it is a flag?
Here is the code:
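#include <chrono>
#include <iostream>
#include <string>
#include <opencv2/core.hpp>

// Hypothetical stand-in for the printResult() helper, which was not posted.
static void printResult(std::chrono::steady_clock::time_point begin,
                        std::chrono::steady_clock::time_point end,
                        const std::string& label, const cv::Mat& result)
{
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count();
    std::cout << label << ": " << us << " us (" << result.cols << "x" << result.rows << ")" << std::endl;
}

int main()
{
// Compile-time check on the host: true only if an included header defines this
// macro; it does not query what the GPU driver supports.
#ifdef cl_khr_fp16
    std::cout << " the flag is on" << std::endl;
#else
    std::cout << " the flag is off" << std::endl;
#endif
    std::chrono::steady_clock::time_point begin;
    std::chrono::steady_clock::time_point end;
    cv::Mat testMat(768, 1024, CV_8UC1);
    cv::Mat testNuc(768, 1024, CV_8UC1);
    // Defining GPU matrices
    cv::UMat testMatGpu, testNucGpu;
    // Randomizing image (values 0..255)
    cv::randu(testMat, 0, 256);
    cv::randu(testNuc, 0, 256);
    testMat.copyTo(testMatGpu);
    testNuc.copyTo(testNucGpu);
    for (int i = 0; i < 500; i++)
    {
        // Performing GPU (OpenCL) per-element matrix multiplication
        begin = std::chrono::steady_clock::now();
        cv::multiply(testMatGpu, testNucGpu, testNucGpu);
        end = std::chrono::steady_clock::now();
        printResult(begin, end, "GPU per element matrix mul", testNucGpu.getMat(cv::ACCESS_READ));
    }
    return 0;
}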