i.MX8QM GPU hang-up in L5.4.47_2.2.0

hsaito · ‎09-02-2022

Hello,

We are developing an application using the NNAPI of TensorFlow Lite 2.2.0 on the i.MX8QM (Linux 5.4.47_2.2.0).

Calling Invoke() sometimes causes the GPU to hang.

The hang occurs at random intervals, roughly once every few minutes, and a reboot is required to recover.

We suspect the hardware, because with identical software the problem appears only after the hardware is changed.

The circuit configuration is the same for both, but the boards are different.

What are the possible factors for this phenomenon? And are there any countermeasures?

hsaito · ‎09-07-2022

Additional information.

On a board where TensorFlow Lite with NNAPI runs properly, we observed that the inference process also freezes when the CPU temperature rises.

When the CPU temperature reaches 107°C, the GPU clock drops to 1/64 of its normal rate.
At this point, Invoke() is still running, although processing is slower.
Once the lower clock brings the CPU temperature back down, the GPU clock returns to normal.
At this point, Invoke() no longer returns.

From the above, it can be inferred that fluctuations in the GPU clock affect the inference process.

However, the same phenomenon occurs even when there is no temperature increase.
Is there a factor that causes the GPU clock to fluctuate?

artsiomstaliaro · ‎09-02-2022

Hi,

Did you check the errata for this issue?

Can you trace it down to find exactly which function causes the hang?

hsaito · ‎09-04-2022

Hi artsiomstaliaro,

I have checked the errata and there does not seem to be a corresponding problem.

When we check the operation with the following program, neither "Invoke End" nor the error message is displayed when the hang occurs.

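// endl flushes the stream, so "Invoke Start" is visible even if Invoke() never returns.
cout << "Invoke Start" << endl;
if (interpreter->Invoke() != kTfLiteOk) {
  cout << "Error invoking detection model" << endl;
}
cout << "Invoke End" << endl;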

When I monitor Invoke() with strace, it seems to be controlling /dev/galcore with ioctl().

ioctl(11, _IOC(_IOC_NONE, 0x75, 0x30, 0), 0xffffe8314fc8) = 0

Thus, it appears that the hang occurs while accessing /dev/galcore.

Once the hang occurs, running /unit_tests/GPU/gpu.sh gets no response either.

Apps that do not use the GPU continue to work after the hang.
