How to use OpenVX extension for NPU/GPU to accelerate machine vision applications

5 Kudos

This guide is about how to use EVIS to create user nodes and kernels in OpenVX to implement image processing on NPU(i.MX8MP)/GPU(i.MX8QM). Take gaussian filter as an example. It is tested on i.MX8QM and i.MX8MP.

User Node Creation from User Kernel

1. Define a user node

Register a user kernel by its ID or name

For example,

#define VX_KERNEL_NAME_GAUSSIAN "com.nxp.extension.gaussian"
#define VX_KERNEL_ENUM_GAUSSIAN 100

Get the kernel reference by the ID or name

For example,

vx_kernel kernel = vxGetKernelByName(context, VX_KERNEL_NAME_GAUSSIAN);
vx_kernel kernel = vxGetKernelByEnum(context, VX_KERNEL_ENUM_GAUSSIAN );

Create a user node

vx_node node = vxCreateGenericNode(graph, kernel);

Set input/output node parameters

For example,

vx_status status = vxSetParameterByIndex(node, index++, (vx_reference)in_image);
status |= vxSetParameterByIndex(node, index++, (vx_reference)out_image);

2. Create InputValidator/OutputValidator functions for the node

The validators are only used for graph verification.

For example,

static vx_status VX_CALLBACK vxGaussianInputValidator(vx_node node, vx_uint32 index)
static vx_status VX_CALLBACK vxGaussianOutputValidator(vx_node node, vx_uint32 index, vx_meta_format metaObj)

ToDo:

a. InputValidator:

Get the reference to the parameter object

vx_parameter paramObj = NULL;
vx_image imgObj = NULL;
paramObj=vxGetParameterByIndex(node, index);
vxQueryParameter(paramObj, VX_PARAMETER_REF, &imgObj, sizeof(vx_image));

Check meta-data restriction

vxQueryImage(imgObj, VX_IMAGE_FORMAT, &imgFmt, sizeof(imgFmt));

Check consistency with other parameters

if (VX_DF_IMAGE_U8==imgFmt) status = VX_SUCCESS;
else status = VX_ERROR_INVALID_VALUE;

b. OutputValidator

Set the meta_format object with expected meta-data for the output

status |= vxSetMetaFormatAttribute(metaObj, VX_IMAGE_FORMAT, &imgFmt, sizeof(imgFmt));
status |= vxSetMetaFormatAttribute(metaObj, VX_IMAGE_WIDTH, &width, sizeof(width));
status |= vxSetMetaFormatAttribute(metaObj, VX_IMAGE_HEIGHT, &height, sizeof(height));

3. Create Initializer function for the node.

The initializer is used to specify workdim, global work size and local work size for the user kernel. These parameters are similiar to that in OpenCL.

For example,

/* workdim, globel offset, globel scale, local size, globel size */
vx_kernel_execution_parameters_t shaderParam = {2, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}};
vx_status VX_CALLBACK vxGaussianInitializer(vx_node nodObj, const vx_reference *paramObj, vx_uint32 paraNum)

Set attribute to the node

vxSetNodeAttribute(nodObj, VX_NODE_ATTRIBUTE_KERNEL_EXECUTION_PARAMETERS, &shaderParam, sizeof(vx_kernel_execution_parameters_t));

Note: The links below are guides about OpenCL on GPU, which are helpful to understand OpenVX implemented on GPU/NPU.

4. Create Deinitializer function for the node (Optional)

It is used to de-allocate memory allocated at initializer.

User Kernel on NPU/GPU Creation

1. Create description of a user kernel

For example,

vx_kernel_description_t vxGaussianKernelVXCInfo =
{
VX_KERNEL_ENUM_GAUSSIAN,
VX_KERNEL_NAME_GAUSSIAN,
nullptr,
vxGaussianKernelParam,
(sizeof(vxGaussianKernelParam)/sizeof(vxGaussianKernelParam[0])),
vxGaussianValidator,
nullptr,
nullptr,
vxGaussianInitializer,
nullptr
};

2. Register the new kernel

For example,

static vx_kernel_description_t* kernels[] =
{
&vxGaussianKernelVXCInfo,
};

3. Write kernel source implemented on NPU/GPU

For example,

char vxcKernelSource[] =
{
"#include \ \n\
\n\
\n\
__kernel void gaussian\n\
( \n\
__read_only image2d_t in_image, \n\
__write_only image2d_t out_image \n\
) \n\
{ \n\
int2 coord = (int2)(get_global_id(0), get_global_id(1)); \n\
int2 coord_out = coord; \n\
vxc_uchar16 lineA, lineB, lineC, out;\n\
int2 coord_in1 = coord + (int2)(-1, -1);\n\
VXC_OP4(img_load, lineA, in_image, coord_in1, 0, VXC_MODIFIER(0, 15, 0, VXC_RM_TowardZero, 0));\n\
int2 coord_in2 = coord + (int2)(-1, 0);\n\
VXC_OP4(img_load, lineB, in_image, coord_in2, 0, VXC_MODIFIER(0, 15, 0, VXC_RM_TowardZero, 0));\n\
int2 coord_in3 = coord + (int2)(-1, 1);\n\
VXC_OP4(img_load, lineC, in_image, coord_in3, 0, VXC_MODIFIER(0, 15, 0, VXC_RM_TowardZero, 0));\n\
int info = VXC_MODIFIER_FILTER(0, 13, 0, VXC_FM_Guassian, 0);\n\
VXC_OP4(filter, out, lineA, lineB, lineC, info); ;\n\
VXC_OP4_NoDest(img_store, out_image, coord_out, out, VXC_MODIFIER(0, 13, 0, VXC_RM_TowardZero, 0)); \n\
}\n\
"
};

Note: the source is written by EVIS instructions with less latency. But the EVIS instructions are limited. These fucntions defination can be found in "cl_viv_vx_ext.h" located at "/usr/include/CL/cl_viv_vx_ext.h".

Read back the processed data by GPU/NPU to check if the operations are correct.

For example,

status = vxCopyImagePatch(vx_out_image, &rect, 0, &addressing, data2, VX_READ_ONLY, VX_MEMORY_TYPE_HOST);

4. Build the NPU/GPU source code runtime

For example,

programObj = vxCreateProgramWithSource(ContextVX, 1, programSrc, &programLen);
vxBuildProgram(programObj, "-cl-viv-vx-extension");

5. Add kernel to the program

For example,

...
kernelObj = vxAddKernelInProgram(programObj,
kernels[i]->name,
kernels[i]->enumeration,
kernels[i]->numParams,
kernels[i]->validate,
kernels[i]->initialize,
kernels[i]->deinitialize
);
...
for(vx_uint32 j=0; j < kernels[i]->numParams; j++)
{
status = vxAddParameterToKernel(kernelObj,
j,
kernels[i]->parameters[j].direction,
kernels[i]->parameters[j].data_type,
kernels[i]->parameters[j].state
);

6. Finalize the kernel creation

For example,

status = vxFinalizeKernel(kernelObj);

Exercise

The example is attached. You can build and test it on i.MX8QM or i.MX8MP.

Results on i.MX8QM:

References:

Khronosdotorg/resources.md at master · KhronosGroup/Khronosdotorg · GitHub

How to use OpenVX extension for NPU/GPU to accelerate machine vision applications

How to use OpenVX extension for NPU/GPU to accelerate machine vision applications

How to use OpenVX extension for NPU/GPU to accelerate machine vision applications

i.MX 8 Family | i.MX 8QuadMax (8QM) | 8QuadPlus

i.MX 8M | i.MX 8M Mini | i.MX 8M Nano