UMat (GPU) works slower than Mat (CPU)


greeranjunk
Contributor II

I am seeing poor performance with UMat operations. I tested element-wise multiplication and addition on 1024*706 matrices with UMat (which runs on OpenCL; CL_PLATFORM_VERSION: OpenCL 1.2 V6.2.4.p4.190076) and with Mat, which runs on the CPU, on an imx8mq. The CPU takes 800 microseconds while the GPU takes 1400 microseconds. I run the multiply and add 500 times, so the difference is not caused by the initial copy from CPU to GPU. Am I missing something in the configuration that would make the GPU work faster than the CPU?

5 Replies

greeranjunk
Contributor II

Hello,

Thanks for the reply. The code I am using is simple. I do not see any cl_ configuration. Is it perhaps a compilation flag?

Here is the code:

#include <chrono>
#include <opencv2/core.hpp>

// Note: printResult() is the poster's own helper; its definition was not included in the post.

int main()
{
    bool isImshow = true;
    std::chrono::steady_clock::time_point begin;
    std::chrono::steady_clock::time_point end;

    cv::Mat testMat(768, 1024, CV_8UC1);
    cv::Mat testNuc(768, 1024, CV_8UC1);

    // Defining GPU matrices
    cv::UMat testMatGpu, testNucGpu, testMatTarget;

    // Randomizing image (the upper bound of randu is exclusive, so values are 0..255)
    cv::randu(testMat, 0, 256);
    cv::randu(testNuc, 0, 256);

    // Copying the CPU matrices into the UMat (GPU) matrices
    testMat.copyTo(testMatGpu);
    testNuc.copyTo(testNucGpu);

    for (int i = 0; i < 500; i++)
    {
        // Performing GPU (UMat) per-element matrix multiplication
        begin = std::chrono::steady_clock::now();
        cv::multiply(testMatGpu, testNucGpu, testNucGpu);
        end = std::chrono::steady_clock::now();
        printResult(begin, end, "GPU per element matrix mul", testNucGpu.getMat(cv::ACCESS_READ));
    }
}
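
One thing worth checking in a loop like the one above is what the clock actually covers. OpenCL work issued through UMat may be queued asynchronously, and the first calls may include one-time costs such as kernel compilation. The sketch below is a minimal timing helper under those assumptions; `meanMicroseconds` and its parameters are hypothetical names, not part of OpenCV, and it uses only the standard library so that the synchronization call is supplied by the caller.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not an OpenCV API): mean wall-clock time per call of
// `op`, in microseconds.
// `sync` must block until all queued device work has finished. It is called
// once after the warm-up and once after the timed loop.
double meanMicroseconds(const std::function<void()>& op,
                        const std::function<void()>& sync,
                        int warmup, int iterations)
{
    if (iterations <= 0) return 0.0;

    // Untimed warm-up: the first calls may include one-time setup costs.
    for (int i = 0; i < warmup; ++i) op();
    sync();

    const auto begin = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) op();
    sync();  // wait for the device before stopping the clock
    const auto end = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::micro> elapsed = end - begin;
    return elapsed.count() / iterations;
}
```

With OpenCV this could be called, for example, as `meanMicroseconds([&]{ cv::multiply(a, b, dst); }, []{ cv::ocl::finish(); }, 10, 500);`, where `a`, `b`, and `dst` are placeholder UMat names. For the Mat (CPU) case an empty lambda can be passed as `sync`. Timing the whole loop once, rather than each call separately, also keeps the clock overhead out of the per-call figure.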

Bio_TICFSL
NXP TechSupport

Hi,

Yes, I did not see any cl_ configuration either, but what I do see is OpenCV. You need to have all the files, I mean the OpenCV header files. Probably your Yocto build is not configured for OpenCV.

Regards

greeranjunk
Contributor II

My Yocto build is configured for OpenCV, as I added it in conf/local.conf: IMAGE_INSTALL_append = " opencv ffmpeg "

From bitbake -s | grep opencv I get 4.0.1.imx+gitAUTOINC+737f8fad13_2522124473_32e315a5b1_34e4206aef_fccf7cd6a4_d29d003e00-r0

I see its logs in the build/tmp/... directory.

My example was compiled with the SDK that I populated, and I have this line in CMakeCache.txt:

//Details about finding OpenCV
FIND_PACKAGE_MESSAGE_DETAILS_OpenCV:INTERNAL=[/home/elsec-linux/sdk-opt-sumo/sysroots/aarch64-poky-linux/usr][v4.0.1()]

When I run the code with UMat, I see the application in gputop.

So I believe it is working, but I believe it is not configured correctly.

Thanks

Bio_TICFSL
NXP TechSupport

Hello ran,

Can you provide the code so I can check it? You are probably using the cl_khr_fp16 extension, which does not work with the current driver for the Vivante GPU.

Regards

greeranjunk
Contributor II

The code is simple, and I added a test to see whether there is a cl_khr_fp16 flag, which returned false. Am I mistaken that it is a flag?

Here is the code:

#include <chrono>
#include <iostream>
#include <opencv2/core.hpp>

// Note: printResult() is the poster's own helper; its definition was not included in the post.

int main()
{
#ifdef cl_khr_fp16
    std::cout << " the flag is on" << std::endl;
#else
    std::cout << " the flag is off" << std::endl;
#endif

    bool isImshow = true;
    std::chrono::steady_clock::time_point begin;
    std::chrono::steady_clock::time_point end;

    cv::Mat testMat(768, 1024, CV_8UC1);
    cv::Mat testNuc(768, 1024, CV_8UC1);

    // Defining GPU matrices
    cv::UMat testMatGpu, testNucGpu, testMatTarget;

    // Randomizing image (the upper bound of randu is exclusive, so values are 0..255)
    cv::randu(testMat, 0, 256);
    cv::randu(testNuc, 0, 256);

    // Copying the CPU matrices into the UMat (GPU) matrices
    testMat.copyTo(testMatGpu);
    testNuc.copyTo(testNucGpu);

    for (int i = 0; i < 500; i++)
    {
        // Performing GPU (UMat) per-element matrix multiplication
        begin = std::chrono::steady_clock::now();
        cv::multiply(testMatGpu, testNucGpu, testNucGpu);
        end = std::chrono::steady_clock::now();
        printResult(begin, end, "GPU per element matrix mul", testNucGpu.getMat(cv::ACCESS_READ));
    }
}
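
On the question of whether cl_khr_fp16 is a flag: an `#ifdef` in host C++ code is evaluated at compile time, so it cannot reflect what the GPU driver reports at run time. OpenCL devices report their supported extensions as a space-separated string (`CL_DEVICE_EXTENSIONS`), and that string can be searched instead. The sketch below shows only the search step with the standard library; `hasExtension` is a hypothetical helper name, and where the string comes from is left to the caller.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not an OpenCV or OpenCL API): returns true if `name`
// appears as a whole token in a space-separated OpenCL extension list.
bool hasExtension(const std::string& extensionList, const std::string& name)
{
    std::istringstream tokens(extensionList);
    std::string token;
    while (tokens >> token)
    {
        // Compare whole tokens so that a longer extension name sharing the
        // same prefix does not produce a false match.
        if (token == name) return true;
    }
    return false;
}
```

Assuming OpenCV 4.x with OpenCL enabled, the list could be obtained with `cv::ocl::Device::getDefault().extensions()` (declared in `opencv2/core/ocl.hpp`) and passed in as `hasExtension(list, "cl_khr_fp16")`. This only shows what the driver advertises; it does not by itself show whether OpenCV's kernels use the extension.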
