
MMDC profiling on i.MX6 Quad yields wrong number of read bytes

Question asked by Christoph Freundl on Nov 24, 2016
Latest reply on Jan 3, 2017 by frankba

Hi,

 

during performance measurements on our custom i.MX6 Quad board (32-bit DDR3 RAM, running a Linux 3.14 kernel with RT patches; Update: the Yocto-based toolchain uses a gcc 4.9.2 compiler) I stumbled upon a strange result that I cannot explain: MMDC profiling apparently returns wrong values in the MMDC0_MADPSR4 register.

 

I used the following C program for my test:

 

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

/* Scale each float by 1000 and truncate it to a 16-bit unsigned integer. */
void convert(const float* input, uint16_t* output, const int size) {
    int i;
    for (i = 0; i < size; ++i) {
        output[i] = input[i] * 1000.0f;
    }
}

int main(int argc, char* argv[])
{
    const int size = (argc < 2) ? 32768 : atoi(argv[1]);
    const int niter = (argc < 3) ? 1 : atoi(argv[2]);
    int i;

    for (i = 0; i < niter; ++i) {
        /* Freshly allocated and deliberately left uninitialized (see below). */
        float* input = (float*) malloc(size * sizeof(float));
        uint16_t* output = (uint16_t*) malloc(size * sizeof(uint16_t));

        convert(input, output, size);

        free(input);
        free(output);
    }

    return 0;
}

Note that I freshly allocate the memory before each call to the convert function so that, together with a sufficiently large array size, I can rule out that the data is served from the cache. I deliberately do not initialize the memory in this test, so the whole input array should be read exactly once and the output array written exactly once during the execution.

 

For the MMDC profiling itself I use the version of the mmdc tool that can be found here: GitHub - FrankBau/mmdc: Memory Bandwidth Measurement Tool for i.MX6

When calling the mmdc tool, I have set the environment variable MMDC_SLEEPTIME to 3 (seconds).

 

Profiling of the "idle" system yields this result:

MMDC new Profiling results:
***********************
Measure time: 3000ms
Total cycles count: 1584049192
Busy cycles count: 9636657
Read accesses count: 230740
Write accesses count: 45605
Read bytes count: 6878948
Write bytes count: 1453322
Avg. Read burst size: 29
Avg. Write burst size: 31
Read: 2.19 MB/s /  Write: 0.46 MB/s  Total: 2.65 MB/s
Utilization: 5%
Bus Load: 0%
Bytes Access: 30

A single run of my test program with array size 100,000,000 and only one iteration should read 400,000,000 bytes and write 200,000,000 bytes, but instead it shows numbers like these:

MMDC new Profiling results:
***********************
Measure time: 3000ms
Total cycles count: 1584044016
Busy cycles count: 187396205
Read accesses count: 1308446
Write accesses count: 6621328
Read bytes count: 39956968
Write bytes count: 211873984
Avg. Read burst size: 30
Avg. Write burst size: 31
Read: 12.70 MB/s /  Write: 67.35 MB/s  Total: 80.05 MB/s
Utilization: 8%
Bus Load: 11%
Bytes Access: 31

The amount of bytes written is exactly what I expect, but the number of bytes read (roughly 40,000,000 bytes minus the idle load) is far below my expectation. You can also vary the array size and the number of iterations; the result is always the same. I ruled out any overflows in the MMDC0_MADPSR4 register, which counts the total number of bytes read, by constantly monitoring it during the profiling run.

 

My question is: can anybody reproduce this behavior on another i.MX6 Quad (or Dual/DualLite/Solo) system? And if it is not reproducible elsewhere, what could be wrong in my system that would explain these profiling numbers?

 

Update: I ran the same test on the same board but using a different kernel (3.11 without RT patches) and a different toolchain (Yocto-based, gcc 4.7.2 with the SoftFP ABI) for compilation. It yields the following numbers:

 

MMDC new Profiling results:
***********************
Measure time: 3000ms
Total cycles count: 1584038592
Busy cycles count: 442998552
Read accesses count: 7119647
Write accesses count: 12846785
Read bytes count: 226276278
Write bytes count: 411094862
Avg. Read burst size: 31
Avg. Write burst size: 31
Read: 71.93 MB/s /  Write: 130.68 MB/s  Total: 202.61 MB/s
Utilization: 8%
Bus Load: 27%
Bytes Access: 31

 

The number of bytes read is now roughly 200,000,000, which is significantly higher but still less than expected. Additionally, the number of bytes written has approximately doubled to about 400,000,000. I am aware that the compiler influences the result, and although I checked the generated code and did not see an additional write operation, I could somehow accept the reported number of bytes written. However, it still does not make sense that, according to the profiling data, fewer bytes were read than would be necessary to produce the correct result.

 

Am I making a wrong assumption anywhere? How can the differences in the profiling numbers between the two environments on the same board be explained? Any explanations or hints are welcome.

 

Update 2: Another test on a Nitrogen board (kernel 3.10) with the same SoftFP-built binaries as above yields:

MMDC new Profiling results:
***********************
Measure time: 3000ms
Total cycles count: 1584077648
Busy cycles count: 125561174
Read accesses count: 524494
Write accesses count: 6532135
Read bytes count: 16401940
Write bytes count: 209027072
Avg. Read burst size: 31
Avg. Write burst size: 31
Read: 5.21 MB/s /  Write: 66.45 MB/s  Total: 71.66 MB/s
Utilization: 11%
Bus Load: 7%
Bytes Access: 31

The numbers are very similar to those of the "original" environment: the number of bytes written is as expected but - again - the number of bytes read is too low.

 

Many regards,

Christoph
