I would like to use clFFT on the GPU core of my board (phyBOARD-Polaris i.MX 8M). I have followed the instructions at this link: https://github.com/clMathLibraries/clFFT. For a kernel size of 128, the computation takes more than a second. Is there a way to reduce the kernel computation time for larger kernel sizes?
I would greatly appreciate your help.