I am using OpenCL 1.2, and I do not use OpenCV.
I fixed the kernel API. When I run it with yolov3-tiny, it takes about 170 s.
Why is it so slow? Is the cause my code or my GPU?
By testing the kernels one by one, I found that my kernel function "softmax" has an error: clCreateKernel fails for it. If I delete lines such as "vstore16(localInput[i], 0, &x[i * 16]);", the kernel is created successfully. Could you give me some help?
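To narrow down the clCreateKernel failure, it may help to print the error code it returns by name. The following is only a minimal sketch: the helper name `cl_kernel_error_name` is hypothetical, and the numeric values are copied from the OpenCL 1.2 header `CL/cl.h` so that the snippet compiles without an OpenCL SDK.

```c
#include <stddef.h>

/* Hypothetical helper (not part of OpenCL): maps the error code that
   clCreateKernel writes to errcode_ret onto its symbolic name.
   The numeric values mirror the definitions in CL/cl.h (OpenCL 1.2). */
const char *cl_kernel_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -44: return "CL_INVALID_PROGRAM";
    case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
    case -46: return "CL_INVALID_KERNEL_NAME";
    case -47: return "CL_INVALID_KERNEL_DEFINITION";
    default:  return "UNKNOWN";
    }
}
```

If the code turns out to be CL_INVALID_PROGRAM_EXECUTABLE, the program probably did not build for the device, and the build log from clGetProgramBuildInfo with CL_PROGRAM_BUILD_LOG should show the compiler message for the vstore16 line.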
Thanks very much.
Looking forward to your reply!