What about resizing the images during retraining?


MarcinChelminsk
Contributor IV

Hi NXP Team!

I followed the lab called "eIQ Transfer Learning Lab - Without Camera.pdf".

Section "3. Retrain Existing Model" says that we will retrain the model for 128x128 pixel images.

However, the images in the example images folder have different dimensions (such as 320x232, 320x212, 500x332 pixels, and so on).

So what is the reason for resizing the images?

 

And later on, section "5. Run Demo" (point 21) says to change the image height and width in the code to 128. However, the C array representing the image (in the case of this lab, 21652746_cc379e0eea_m.bmp) contains much more data, because 21652746_cc379e0eea_m.bmp is 231x240.

So why is this step also important?

 

Any hints are more than welcome! Thanks in advance!


MarcinChelminsk
Contributor IV

Hi David,

it sheds some light on this challenge :) however...

The retraining process also requires 128x128 images as input, am I right?

So the script called retrain.py resizes the images to the proper 128x128 size, am I right?

Is this function below responsible for that?

def add_jpeg_decoding(input_width, input_height, input_depth, input_mean,
                      input_std):

When you said this:

When you look in the source code of the example, you can see that before inference is run, the input images are first resized to the 128x128 format because the model cannot correctly classify images in any other format.

you mean this part of the code in the label_image.cpp file?

int image_width = 128;
int image_height = 128;
int image_channels = 3;
uint8_t* in = read_bmp(daisy_bmp, daisy_bmp_len, &image_width, &image_height,
                           &image_channels, s);

david_piskula
NXP Employee

Hello Marcin,

The retraining process also requires 128x128 images as input, am I right?

Yes.

So the script called retrain.py resizes the images to the proper 128x128 size, am I right?

Yes.

Is this function below responsible for that?

def add_jpeg_decoding(input_width, input_height, input_depth, input_mean,
                      input_std):

I believe so. The script actually comes from Google as part of the Tensorflow for Poets tutorial and I recommend you go through some of the guides there as well, if you wish to learn more about TensorFlow and Machine Learning in general.
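
For reference, in the Tensorflow for Poets version of retrain.py, that function builds a small preprocessing graph that decodes each JPEG, resizes it to the model's input size, and normalizes the pixel values. A rough sketch of it (TF1-style, paraphrased from the tutorial, so the copy shipped with the lab may differ slightly):

import tensorflow as tf  # TF 1.x, as used by the Tensorflow for Poets tutorial

def add_jpeg_decoding(input_width, input_height, input_depth, input_mean,
                      input_std):
  # Placeholder fed with the raw bytes of a single training image
  jpeg_data = tf.placeholder(tf.string, name='DecodeJPGInput')
  decoded_image = tf.image.decode_jpeg(jpeg_data, channels=input_depth)
  decoded_as_float = tf.cast(decoded_image, dtype=tf.float32)
  decoded_4d = tf.expand_dims(decoded_as_float, 0)
  # Bilinear resize to the dimensions the chosen architecture expects,
  # e.g. 128x128 for mobilenet_0.25_128
  resize_shape = tf.stack([input_height, input_width])
  resized_image = tf.image.resize_bilinear(
      decoded_4d, tf.cast(resize_shape, dtype=tf.int32))
  # Normalize pixel values using the model's expected mean and std
  offset_image = tf.subtract(resized_image, input_mean)
  mul_image = tf.multiply(offset_image, 1.0 / input_std)
  return jpeg_data, mul_image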

you mean this part of the code in the label_image.cpp file?

int image_width = 128;
int image_height = 128;
int image_channels = 3;
uint8_t* in = read_bmp(daisy_bmp, daisy_bmp_len, &image_width, &image_height,
                           &image_channels, s);

Actually, these values are just initialization values and get changed to the actual width, height, and number of channels of the supplied BMP in the read_bmp() function.
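
To illustrate (this is a simplified, hypothetical helper, not the actual eIQ source): a BMP file stores its metadata at fixed little-endian byte offsets, so read_bmp can simply overwrite those initial values with whatever the header really says.

#include <cstdint>
#include <cstring>

// Sketch only: read a 32-bit little-endian value from the BMP header.
// Assumes a little-endian host, which holds for the i.MX RT targets.
static int32_t le32(const uint8_t* p) {
  int32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Hypothetical helper illustrating how read_bmp recovers the real
// dimensions: the initial 128/128/3 values are simply replaced.
void read_bmp_header(const uint8_t* bmp, int* width, int* height,
                     int* channels) {
  *width = le32(bmp + 18);   // biWidth
  *height = le32(bmp + 22);  // biHeight
  int16_t bpp;
  std::memcpy(&bpp, bmp + 28, sizeof bpp);  // biBitCount
  *channels = bpp / 8;       // e.g. 24-bit BMP -> 3 channels
}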

The code that takes care of the resizing is here:

  /* Get input dimension from the input tensor metadata
     assuming one input only */
  TfLiteIntArray* dims = interpreter->tensor(input)->dims;
  int wanted_height = dims->data[1];
  int wanted_width = dims->data[2];
  int wanted_channels = dims->data[3];

  switch (interpreter->tensor(input)->type) {
    case kTfLiteFloat32:
      s->input_floating = true;
      resize<float>(interpreter->typed_tensor<float>(input), in, image_height,
                    image_width, image_channels, wanted_height, wanted_width,
                    wanted_channels, s);
      break;
    case kTfLiteUInt8:
      resize<uint8_t>(interpreter->typed_tensor<uint8_t>(input), in,
                      image_height, image_width, image_channels, wanted_height,
                      wanted_width, wanted_channels, s);
      break;
    default:
      LOG(FATAL) << "cannot handle input type "
                 << interpreter->tensor(input)->type << " yet";
      exit(-1);
  }

Here, wanted_height, wanted_width and wanted_channels get loaded from the model's metadata (this is generalized and automated, so that if you decided to use a different model that requires a different input format, you wouldn't have to manually hard-code the values). Afterwards, the image inside in, along with its actual and required dimensions, gets passed to the resize function, which takes care of preparing the input and storing it inside interpreter->typed_tensor<float>(input) or interpreter->typed_tensor<uint8_t>(input).
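
If it helps to see the idea without the TFLite plumbing: conceptually, the resize step maps every pixel of the wanted_height x wanted_width output back to a source pixel and writes the result into the input tensor. Here is a minimal nearest-neighbor sketch of that idea (the real resize<T> in label_image does the same job with bilinear filtering, so treat this as an illustration rather than the actual implementation):

#include <cstdint>

// Illustration only: nearest-neighbor resize from an h x w x c source
// image into a wanted_h x wanted_w x c destination tensor. The actual
// resize<T> in label_image.cpp achieves the same with bilinear filtering.
template <class T>
void resize_nearest(T* dst, const uint8_t* src, int h, int w, int c,
                    int wanted_h, int wanted_w) {
  for (int y = 0; y < wanted_h; ++y) {
    const int sy = y * h / wanted_h;    // nearest source row
    for (int x = 0; x < wanted_w; ++x) {
      const int sx = x * w / wanted_w;  // nearest source column
      for (int k = 0; k < c; ++k) {
        dst[(y * wanted_w + x) * c + k] =
            static_cast<T>(src[(sy * w + sx) * c + k]);
      }
    }
  }
}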


MarcinChelminsk
Contributor IV

David, thank you very much for your support! It clarifies all my current doubts :) Thanks again!

david_piskula
NXP Employee

You're welcome, Marcin. I'm glad I could be of help. Don't hesitate to create new questions if you need anything else clarified in the future. Have fun with eIQ!


david_piskula
NXP Employee

Hello Marcin,

The sentence "We will retrain this model for 128x128 pixel images using a python script found inside the tutorial folder." does not actually mean that we will retrain some model to work on 128x128 images. In fact, it means that the model for 128x128 images will be retrained to recognize a different set of categories. The original model was trained to classify a much larger set of categories and we retrain it to classify only daisies, dandelions, roses, sunflowers and tulips.

As you can see in 

--architecture=mobilenet_0.25_128   (the particular type of Mobilenet model to use as a starting point)

the model architecture chosen for this demo is the mobilenet_0.25_128. The 128 at the end of the name means that the model is already pretrained for 128x128 images.

When you look in the source code of the example, you can see that before inference is run, the input images are first resized to the 128x128 format because the model cannot correctly classify images in any other format.

Hopefully this clears it up for you.
