Problems in parallel execution of NPU and 3D GPU

christhi
Contributor II

Hello,

I am using ArmNN 21.02 on an i.MX8M Plus with the Yocto hardknott release 2.0 (https://github.com/varigit/variscite-bsp-platform/releases/tag/hardknott-fsl-5.10.52_2.1.0-mx8mp-v1....). The following problem occurs with the VsiNpu backend when it computes on the NPU. If I use the VsiNpu backend with the GPU for inference, everything works fine.
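
For context, my initialization follows the standard ArmNN 21.02 TfLite parser flow, roughly like this (a simplified sketch; the model path and exact variable names differ in my real code):

#include <armnn/ArmNN.hpp>
#include <armnnTfLiteParser/ITfLiteParser.hpp>

// Create the runtime and parse the TFLite model.
armnn::IRuntime::CreationOptions options;
armnn::IRuntimePtr runtime = armnn::IRuntime::Create(options);

armnnTfLiteParser::ITfLiteParserPtr parser = armnnTfLiteParser::ITfLiteParser::Create();
armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile("ssd_mobilenet_v1.tflite");

// Select the VsiNpu backend (CpuRef as fallback) and load the optimized network.
std::vector<armnn::BackendId> backends = { armnn::BackendId("VsiNpu"),
                                           armnn::BackendId("CpuRef") };
armnn::IOptimizedNetworkPtr optNet = armnn::Optimize(*network, backends,
                                                     runtime->GetDeviceSpec());

armnn::NetworkId netId;
runtime->LoadNetwork(netId, std::move(optNet));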

I have a problem executing the ssd_mobilenet_v1 model (https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/default/1) in my own application. In my main application I create an object of the ArmNNEngine class (see below). The initialization steps are fine. Then I run the model (via the ArmNNEngine::run method) in a separate thread (through std::thread, std::async, and QThread, all with the same problem). The output tensors look fine for the label IDs, scores, and number of detected objects. Only the coordinates of the boxes seem to be wrong and look like uninitialized memory (e.g. values like 3018, -3018, ..., -4.300e-29, ...). This problem only occurs when I run inference for the ssd_mobilenet_v1 model; for every other model everything looks fine. When I don't use threads, everything is also fine.
The input for the run method comes from a camera live stream. The output is drawn on the live frame, which is displayed using a QOpenGLWidget. If I use a plain QWidget instead of a QOpenGLWidget, the coordinates are correct. So I get wrong results only in this exact combination: running inference in a thread, displaying with QOpenGLWidget, using the ssd_mobilenet_v1 model, and running inference through the VsiNpu backend on the NPU.
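
To illustrate, the per-frame call looks roughly like this (a simplified sketch; the buffer sizes are illustrative for the model's 300x300 RGB input and 10-detection output, and ArmNNEngine is the class declared below):

#include <cstdint>
#include <future>
#include <vector>

// Sketch of one threaded inference call (engine is an already-constructed ArmNNEngine).
void detectAsync(ArmNNEngine& engine, const std::vector<uint8_t>& frame)
{
    // ssd_mobilenet_v1 outputs: 10*4 box coordinates, 10 classes, 10 scores, 1 count.
    std::vector<float> results(10 * 4 + 10 + 10 + 1);

    auto fut = std::async(std::launch::async, [&]() {
        engine.run(frame.data(), results.data());
    });
    fut.get(); // block until inference finishes before drawing

    // results[0..39] are the box coordinates that come back as garbage
    // in the threads + QOpenGLWidget + VsiNpu/NPU combination.
}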

Does anybody have an idea whether this is a known problem?

Thank you for your help!
Kind regards, Chris

#include <armnn/ArmNN.hpp>
#include <armnnTfLiteParser/ITfLiteParser.hpp>

#include <cstdint>
#include <string>
#include <utility>
#include <vector>

class ArmNNEngine
{
private:
    mutable armnn::OutputTensors                     m_aOutputTensors;      // bound output buffers
    mutable armnn::InputTensors                      m_aInputTensors;       // bound input buffers
    armnn::IRuntimePtr                               m_paRuntime;           // ArmNN runtime instance
    armnn::NetworkId                                 m_aNetID;              // ID of the loaded network
    armnnTfLiteParser::BindingPointInfo              m_aInputBindingInfo;   // input layer binding
    std::vector<armnnTfLiteParser::BindingPointInfo> m_vaOutputBindingInfo; // output layer bindings

    armnn::InputTensors  MakeInputTensors(const std::pair<armnn::LayerBindingId,
                                                          armnn::TensorInfo>& input,
                                          const void* pInputTensorData);

    armnn::OutputTensors MakeOutputTensors(const std::pair<armnn::LayerBindingId,
                                                           armnn::TensorInfo>& output);

    armnn::OutputTensors MakeOutputTensors(const std::vector<std::pair<armnn::LayerBindingId,
                                                                       armnn::TensorInfo>>& output);

public:
    // tModelType, tColorFormat, and tDataType are application-specific types.
    ArmNNEngine(const uint8_t &uBackendToUse,
                const std::string &sModelPath,
                const tModelType &stModelType,
                const uint16_t &uSourceFrameWidth,
                const uint16_t &uSourceFrameHeight,
                const tColorFormat &stSourceColorFormat,
                const tDataType &stDestDataType,
                const std::vector<uint16_t> &vuDestTensorSize);

    virtual ~ArmNNEngine();

    // Runs one inference: pSourceData is the preprocessed frame,
    // pDestData receives the output tensor contents.
    void run(const void* pSourceData,
             void* pDestData);
};
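
For completeness, run() essentially wraps the caller's buffers and enqueues the workload, roughly like this (a simplified sketch of my implementation; preprocessing and the copy into pDestData are omitted):

// Simplified sketch of ArmNNEngine::run (details omitted).
void ArmNNEngine::run(const void* pSourceData, void* pDestData)
{
    // Wrap the caller buffers in non-owning ArmNN tensors.
    m_aInputTensors  = MakeInputTensors(m_aInputBindingInfo, pSourceData);
    m_aOutputTensors = MakeOutputTensors(m_vaOutputBindingInfo);

    // Execute the network on the selected backend.
    m_paRuntime->EnqueueWorkload(m_aNetID, m_aInputTensors, m_aOutputTensors);

    // Copy the output tensors into pDestData (omitted here).
}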