Inference time is faster when using PyTorch Glow than eIQ Glow


jmlee
Contributor I

Hi,

I used the lenet_mnist.pb model from your guide.
I converted the model to a bundle with the model-compiler tool from eIQ Glow and built a project with the eIQ SDK.
Then I made a second project: I converted lenet_mnist.pb to a bundle with the model-compiler tool from PyTorch Glow, added that bundle to the project, and ran it.
Both projects use exactly the same code; only the bundle is different.

However, inference takes longer with the bundle built by eIQ Glow. I don't know why there is such a difference. Please let me know if you have any idea what might cause it.

I wrote this through a translator, so the wording may be awkward. I apologize.

The main.c code of the project using the eIQ bundle (I created the project from the lenet_mnist example in the SDK and only replaced the bundle):

 

/*
 * Copyright 2019-2020 NXP
 * All rights reserved.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

/**
 * @file    main.c
 * @brief   Application entry point.
 */
#include <stdio.h>
#include <string.h>
#include "board.h"
#include "peripherals.h"
#include "pin_mux.h"
#include "clock_config.h"
#include "fsl_debug_console.h"
#include "timer.h"

// ----------------------------- Bundle API -----------------------------
// Bundle includes.
#include "lenet_mnist.h"
#include "glow_bundle_utils.h"

// Statically allocate memory for constant weights (model weights) and initialize.
GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t constantWeight[LENET_MNIST_CONSTANT_MEM_SIZE] = {
#include "lenet_mnist.weights.txt"
};

// Statically allocate memory for mutable weights (model input/output data).
GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t mutableWeight[LENET_MNIST_MUTABLE_MEM_SIZE];

// Statically allocate memory for activations (model intermediate results).
GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t activations[LENET_MNIST_ACTIVATIONS_MEM_SIZE];

// Bundle input data absolute address.
uint8_t *inputAddr = GLOW_GET_ADDR(mutableWeight, LENET_MNIST_data);

// Bundle output data absolute address.
uint8_t *outputAddr = GLOW_GET_ADDR(mutableWeight, LENET_MNIST_softmax);

// ---------------------------- Application -----------------------------
// Lenet Mnist model input data size (bytes).
#define LENET_MNIST_INPUT_SIZE    (28 * 28 * sizeof(float))

// Lenet Mnist model number of output classes.
#define LENET_MNIST_OUTPUT_CLASS  10

// Allocate buffer for input data. This buffer contains the input image
// pre-processed and serialized as text to include here.
uint8_t imageData[LENET_MNIST_INPUT_SIZE] = {
#include "input_image.inc"
};

/*
 * @brief   Application entry point.
 */
int main(void) {

  // Initialize hardware.
  BOARD_InitBootPins();
  BOARD_InitBootClocks();
  BOARD_InitBootPeripherals();
  BOARD_InitDebugConsole();
  init_timer();

  // Timer variables.
  uint32_t start_time, stop_time;
  uint32_t duration_ms;

  // Produce input data for bundle.
  // Copy the pre-processed image data into the bundle input buffer.
  memcpy(inputAddr, imageData, sizeof(imageData));

  // Perform inference and compute inference time.
  start_time = get_time_in_us();
  lenet_mnist(constantWeight, mutableWeight, activations);
  stop_time = get_time_in_us();
  duration_ms = (stop_time - start_time) / 1000;

  // Get classification top1 result and confidence.
  float *out_data = (float*)(outputAddr);
  float max_val = 0.0;
  uint32_t max_idx = 0;
  for(int i = 0; i < LENET_MNIST_OUTPUT_CLASS; i++) {
    if (out_data[i] > max_val) {
      max_val = out_data[i];
      max_idx = i;
    }
  }

  // Print classification results.
  PRINTF("Top1 class = %lu\r\n", max_idx);
  PRINTF("Confidence = 0.%03u\r\n",(int)(max_val*1000));
  PRINTF("Inference time = %lu (ms)\r\n", duration_ms);

  return 0;
}

 

Result:
Top1 class = 9
Confidence = 0.946
Inference time = 54 (ms)

The main.c code of the project using the PyTorch Glow bundle (I created the project from the hello_world example in the SDK and added the bundle and the code):

 

/*
 * Copyright (c) 2013 - 2015, Freescale Semiconductor, Inc.
 * Copyright 2016-2017 NXP
 * All rights reserved.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

// #include "fsl_device_registers.h"

#include <stdio.h>
#include <string.h>
#include "board.h"
#include "pin_mux.h"
#include "clock_config.h"
#include "fsl_debug_console.h"
#include "timer.h"

// ----------------------------- Bundle API -----------------------------
// Bundle includes.
#include "lenet_mnist.h"

// Statically allocate memory for constant weights (model weights) and initialize.
GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t constantWeight[LENET_MNIST_CONSTANT_MEM_SIZE] = {
#include "lenet_mnist.weights.txt"
};

// Statically allocate memory for mutable weights (model input/output data).
GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t mutableWeight[LENET_MNIST_MUTABLE_MEM_SIZE];

// Statically allocate memory for activations (model intermediate results).
GLOW_MEM_ALIGN(LENET_MNIST_MEM_ALIGN)
uint8_t activations[LENET_MNIST_ACTIVATIONS_MEM_SIZE];

// Bundle input data absolute address.
uint8_t *inputAddr = GLOW_GET_ADDR(mutableWeight, LENET_MNIST_data);

// Bundle output data absolute address.
uint8_t *outputAddr = GLOW_GET_ADDR(mutableWeight, LENET_MNIST_softmax);

// ---------------------------- Application -----------------------------
// Lenet Mnist model input data size (bytes).
#define LENET_MNIST_INPUT_SIZE    (28 * 28 * sizeof(float))

// Lenet Mnist model number of output classes.
#define LENET_MNIST_OUTPUT_CLASS  10

// Allocate buffer for input data. This buffer contains the input image
// pre-processed and serialized as text to include here.
uint8_t imageData[LENET_MNIST_INPUT_SIZE] = {
#include "input_image.inc"
};
/*******************************************************************************
 * Definitions
 ******************************************************************************/


/*******************************************************************************
 * Prototypes
 ******************************************************************************/

/*******************************************************************************
 * Code
 ******************************************************************************/
/*!
 * @brief Main function
 */
int main(void){

    /* Init board hardware. */
    BOARD_ConfigMPU();
    BOARD_InitPins();
    BOARD_InitBootClocks();
    BOARD_InitDebugConsole();
    init_timer();

    // Timer variables.
    uint32_t start_time, stop_time;
    uint32_t duration_ms;

    // Produce input data for bundle.
    // Copy the pre-processed image data into the bundle input buffer.
    memcpy(inputAddr, imageData, sizeof(imageData));

    // Perform inference and compute inference time.
    start_time = get_time_in_us();
    lenet_mnist(constantWeight, mutableWeight, activations);
    stop_time = get_time_in_us();
    duration_ms = (stop_time - start_time) / 1000;

    // Get classification top1 result and confidence.
    float *out_data = (float*)(outputAddr);
    float max_val = 0.0;
    uint32_t max_idx = 0;
    for(int i = 0; i < LENET_MNIST_OUTPUT_CLASS; i++) {
        if (out_data[i] > max_val) {
            max_val = out_data[i];
            max_idx = i;
        }
    }

    // Print classification results.
    PRINTF("Top1 class = %lu\r\n", max_idx);
    PRINTF("Confidence = 0.%03u\r\n",(int)(max_val*1000));
    PRINTF("Inference time = %lu (ms)\r\n", duration_ms);

    return 0;
}

 

Result:
Top1 class = 9
Confidence = 0.946
Inference time = 42 (ms)
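
Both listings time a single inference, which can be sensitive to startup and cache effects. As a side note (not part of the original post), below is a minimal sketch of how the comparison could be made more robust by averaging several consecutive runs. It reuses the lenet_mnist() bundle entry point, the constantWeight/mutableWeight/activations buffers, and the get_time_in_us() helper from the listings above; the helper name measure_avg_inference_us and the NUM_RUNS constant are made up for illustration.

// Hypothetical helper: average the inference time over several runs
// instead of timing a single call. Assumes the bundle buffers, the
// lenet_mnist() entry point, and get_time_in_us() from the listings above.
#define NUM_RUNS 10

static uint32_t measure_avg_inference_us(void)
{
  uint32_t total_us = 0;

  for (int run = 0; run < NUM_RUNS; run++) {
    uint32_t start_time = get_time_in_us();
    lenet_mnist(constantWeight, mutableWeight, activations);
    uint32_t stop_time = get_time_in_us();
    total_us += (stop_time - start_time);
  }

  // Average duration of one inference, in microseconds.
  return total_us / NUM_RUNS;
}

Calling this from main() after the memcpy() of the input image and printing the returned value with PRINTF would give an averaged figure for each bundle instead of a single-shot measurement.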


Alexis_A
NXP TechSupport

Hello @jmlee,

Looking into some information about both frameworks, I found the following post, which explains the differences between using PyTorch and eIQ.

Best Regards,

Alexis Andalon

 


jmlee
Contributor I

Hi,

In an earlier question (https://community.nxp.com/t5/i-MX-RT/Inference-time-is-faster-when-using-the-pytorch-glow-than-the/t...), to see the difference between PyTorch and Glow, I was advised to take a look at a blog page (https://towardsdatascience.com/pytorch-vs-tensorflow-spotting-the-difference-25c75777377b). But that page mainly compares PyTorch with TensorFlow, which is different from what I want to know.
I built two bundles from the same model (lenet_mnist.pb from your guide) using two separate model-compilers: 1. eIQ Glow, 2. PyTorch Glow.
But my results show a significant performance difference between the two; PyTorch Glow seems to do better than eIQ Glow on this model.
I am wondering:
1. Did I make a mistake somewhere in this comparison between eIQ Glow and PyTorch Glow? From the benchmarks, I expected eIQ Glow to perform better than PyTorch Glow.
2. What causes this performance difference?
3. As far as I know, the eIQ Glow compiler is based on the PyTorch Glow compiler. Which parts or features were improved? In other words, what is the main difference between eIQ Glow and PyTorch Glow?
