strange temperature-related performance changes on i.MX6x

MOW
Contributor IV

Hi all

While testing several devices from a new batch of our most recent i.MX6 platform, I noticed some strange performance differences that depend on temperature and on the individual board. What puzzles me most is that some boards (or CPUs?) show different characteristics than other, identical boards.

I am wondering whether this is normal behavior (perhaps due to manufacturing tolerances of the PCBs, the i.MX6 SoC or other components) or an indicator of some kind of problem.

The test is a very simple time measurement (using an EPIT timer running at 500 kHz) of 100 executions of a 4 MB memset(0), repeated in an infinite loop on a "bare-metal" system, i.e. the code is started directly by the i.MX6 Boot ROM and runs immediately after DDR3 DRAM initialization:

  • no OS, no interrupts or DMA,
  • no GPU, IPU, Ethernet, USB or other components running,
  • on multi-core SoCs only one single ARM-core is running,
  • all clocks and voltages running at power-on default values (800 MHz ARM clock),
  • no power-management active.
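The post does not show get_tick() or diff_tick(), so the following is only a minimal sketch of how the tick difference could be computed. It assumes the EPIT runs as a free-running 32-bit down-counter that rolls over from 0 to 0xFFFFFFFF; the names diff_tick_down and ticks_to_us are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: elapsed ticks between two reads of a free-running
 * 32-bit down-counter (assumption: the EPIT counts down and rolls over
 * from 0 to 0xFFFFFFFF). Unsigned subtraction wraps modulo 2^32, so a
 * single rollover between the two reads is handled correctly. */
static uint32_t diff_tick_down(uint32_t start, uint32_t end)
{
    return start - end;
}

/* Convert ticks of a 500 kHz timer to microseconds (2 us per tick). */
static uint64_t ticks_to_us(uint32_t ticks)
{
    return (uint64_t)ticks * 2u;
}
```

At 500 kHz the timer resolution is 2 us, and a 32-bit counter rolls over only after roughly 8590 s, so one rollover per measurement is the worst case to handle.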

Some boards show "rock-solid" performance that hardly changes at all. Other (identical) boards vary by up to 1% from test cycle to test cycle. Some (also identical) boards "slow down" from their best performance at the start of the test (with a cold CPU) to measurements about 2% slower after about 15 minutes, while others "speed up" instead.
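To compare boards numerically rather than by eye, the spread of a series of tick counts can be expressed relative to the fastest cycle. This is a minimal sketch; the function name and the unit (hundredths of a percent) are my own choices, not something taken from the test code.

```c
#include <stddef.h>
#include <stdint.h>

/* Spread of a series of tick counts, in hundredths of a percent of the
 * fastest (smallest) measurement. Returns 0 for an empty series or if
 * the smallest measurement is 0. */
static uint32_t spread_centipercent(const uint32_t *ticks, size_t n)
{
    if (n == 0)
        return 0;

    uint32_t min = ticks[0];
    uint32_t max = ticks[0];
    for (size_t i = 1; i < n; i++) {
        if (ticks[i] < min)
            min = ticks[i];
        if (ticks[i] > max)
            max = ticks[i];
    }
    if (min == 0)
        return 0;

    /* 64-bit intermediate so (max - min) * 10000 cannot overflow. */
    return (uint32_t)(((uint64_t)(max - min) * 10000u) / min);
}
```

With hypothetical tick counts of 200000, 202000 and 204000, the result is 200, i.e. a spread of 2.00%.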

The performance characteristics of each board seem to be deterministic across multiple power cycles. Boards showing more variance in performance are also more sensitive to external temperature changes of the i.MX6 SoC: cooling down the SoC with cooling spray has a far greater effect on boards that already show larger performance variations. Cooling anything other than the SoC doesn't seem to make a difference, though.

After noticing this on one kind of i.MX6 platform, I tried the same test on different board designs with different i.MX6-SoC-types, as well, with similar results.

Let me show you some pictures of my measurement (on different board-designs with different SoC-type):

[Images: imx6-temp-perf-4.png, imx6-temp-perf-5.png, imx6-temp-perf-3.png, imx6-temp-perf-2.png, imx6-temp-perf-1.png]

Here is the code I am executing (on "bare-metal"):

num_runs = 100;
size = 4096*1024;     /* 4 MB per memset() */
while (1) {
     /* test the index first, so pBuffers[] is never read out of bounds */
     for (j = 0; j < NUM_BUFFERS && pBuffers[j]; j++)
     {
          ticksStart = get_tick();
          for (i = 0; i < num_runs; i++)
               memset((void *)pBuffers[j], 0, size);
          ticksEnd = get_tick();
          ticks = diff_tick(ticksStart, ticksEnd);
          /* 500 ticks = 1 ms at 500 kHz; KB/s is derived from whole milliseconds */
          printf("%u:C-memset 0-->#%03u...     %6u ticks (%6u KB/s)\n", cpu, j, ticks, (1000*num_runs*(size>>10))/(ticks/500));
     }
}

get_tick() just reads the counter value from an EPIT timer running at 500 kHz. printf() sends the output to a UART running at 115200 baud. Removing the printf(), toggling a GPIO pin instead and measuring the performance with an oscilloscope shows the same behavior.
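One detail of the printed KB/s figure: it divides the tick count by 500 first, so the elapsed time is truncated to whole milliseconds. If one cycle took around 400 ms (a hypothetical figure), that truncation alone could hide up to about 0.25%, which is not negligible next to 1% effects. The raw tick count in the same line is not affected. A minimal sketch of a variant with a 64-bit intermediate, using the hypothetical names kb_per_s and kb_per_s_truncated:

```c
#include <stdint.h>

#define TIMER_HZ 500000u   /* EPIT tick rate stated in the post */

/* Throughput in KB/s from bytes written and elapsed timer ticks.
 * The 64-bit intermediate avoids truncating the time to milliseconds. */
static uint32_t kb_per_s(uint64_t bytes, uint32_t ticks)
{
    if (ticks == 0)
        return 0;
    return (uint32_t)(((bytes >> 10) * TIMER_HZ) / ticks);
}

/* Same arithmetic as the printf() in the post, kept for comparison. */
static uint32_t kb_per_s_truncated(uint32_t num_runs, uint32_t size,
                                   uint32_t ticks)
{
    uint32_t ms = ticks / 500u;   /* whole milliseconds only */
    return ms ? (1000u * num_runs * (size >> 10)) / ms : 0;
}
```

For 100 runs of 4 MB, tick counts of 200000 and 200499 give the same truncated result, while the 64-bit variant distinguishes them.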

Has anybody else ever noticed such variations in performance? Can this be normal, maybe caused by the ARM, System and Peripheral PLLs of the SoC drifting against each other? Or can this be an indicator of a more serious problem?

(All of the board designs have been running for years already, without any other issues.)

Kind regards,

Marc

MOW
Contributor IV

Thanks for your hints, so far.

We hadn't noticed EB830 before and indeed found some potential for optimizing our hardware design in this regard, but we couldn't find any correlation between this and the measured performance behavior. Even stranger: looking at several different clocks with an oscilloscope on the CLKO and CLKO2 pins (PLL1-5, MMDC clock, ARM clock, IPG clock, AXI clock, AHB clock), we haven't been able to see the effect on any of the clocks themselves so far.

Running a similarly simple performance test that does not access the DRAM (only CPU integer instructions in a tight for-loop repeated lots of times, but still measuring the runtime via EPIT and outputting the results via UART, like above), we can't reproduce the different performance behavior, either. In this test all boards run at pretty much exactly the same speed without any significant changes in performance over time or temperature.
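The code of this second test is not shown in the thread. A minimal sketch of what such a CPU-only workload could look like, assuming the loop state stays in registers and the code runs from cache or on-chip memory (the function name and the constants are hypothetical):

```c
#include <stdint.h>

/* Hypothetical CPU-bound workload: integer arithmetic only, no buffer
 * accesses. Each step depends on the previous one, and the result is
 * returned, so the compiler cannot simply drop the loop. */
static uint32_t cpu_only_workload(uint32_t iterations)
{
    uint32_t acc = 1u;
    for (uint32_t i = 0; i < iterations; i++)
        acc = acc * 1664525u + 1013904223u;   /* linear congruential step */
    return acc;
}
```

The call would be wrapped in the same get_tick() / diff_tick() pair as the memset() test, and the returned value printed so that the computation stays observable.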

This makes me wonder whether this might indeed be somehow related to the DDR3 DRAM, the MMDC RAM controller or some internal bus behaving differently as the temperature changes. As required for DDR3 RAM, the ZQ calibration is, of course, running regularly to compensate the signal drivers etc. for temperature differences. We also do not use any hard-coded DDR3 calibration data generated by the "DDR3 Script Aid" Excel sheets, but instead use the automatic calibration features of the MMDC on each power cycle (for write leveling and for the DQ read and write delay-line calibrations). These run only once each time the boards reset, though, and even using hard-coded settings instead doesn't make any difference to the measured performance characteristics.

Is it possible that different SoCs just behave slightly differently due to process/manufacturing tolerances, with all the different internal bus arbiters, clock domains, FIFOs, etc., and that some SoCs are more and others less sensitive to temperature changes? At least the MMDC seems to have further internal temperature compensation mechanisms, judging by the MMDCx_MPMUR0 register.

Also note that we haven't seen any other kinds of instability related to this strange behavior, not even at temperature extremes. All boards are running perfectly fine even at ambient temperatures of -10°C or +70°C. As far as we can see up to now, even boards from one single production batch show slightly different performance characteristics: some boards show (almost) rock-solid identical performance over time, other (identical) boards get slower when "warming up", and some apparently even get faster... some boards show quite a lot of "jitter" in their performance, others hardly any at all... Strange...

Kind regards,

Marc

TomE
Specialist II

I am interested to see what you find. I agree that it looks like you have different PLLs "beating against each other" there.

Is it possible that the memory timing is changing automatically? Are there any "Automatic Calibrations" going on in the memory controller that could be adding wait states?

We had problems with the i.MX53 memory being unreliable on SOME boards only at the temperature extremes. We were measuring a sensitivity to temperature based on SERIAL NUMBER. The only possible explanation we could come up with was that the boards were made on panels, and there was a variation in board or layer thickness across the panel, resulting in a similar variation in track impedance in the memory array area. As the boards were then serialised across and down the panel, the variation showed up in serial number ranges.

Details and a graph here in my second response to the original question.

https://community.nxp.com/message/562550?commentID=562550#comment-562550

Tom

igorpadykov
NXP Employee

Hi Marc

One can output the clocks in use (arm_clk_root, osc_clk, EPIT source) via CCM_CCOSR and measure them to find which one affects the instability. One can also consider EB830, as it mentions that overdrive (which may change with temperature) can affect the operating frequency. Chapter 71, Crystal Oscillator (XTALOSC), of the i.MX6DQ RM describes the biased amplifiers used in the oscillator circuits; it seems the bias currents may also change with temperature, and improperly chosen crystal loading capacitors may contribute to this as well.

Note that the internal ring oscillator does not provide an accurate frequency and should not be used for measurements.
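As a rough sketch of the CCM_CCOSR setup: the register address and the CLKO1 field positions below are written down from memory of the i.MX6DQ reference manual and should be treated as assumptions to verify there. The source-select encoding is deliberately left as a parameter, and the helper name ccosr_clko1 is hypothetical.

```c
#include <stdint.h>

/* Assumed values, to be checked against the reference manual:
 * CCM base 0x020C4000, CCOSR at offset 0x60. */
#define CCM_CCOSR_ADDR      0x020C4060u
#define CCOSR_CLKO1_SEL(x)  (((uint32_t)(x) & 0xFu) << 0)  /* source select */
#define CCOSR_CLKO1_DIV(x)  (((uint32_t)(x) & 0x7u) << 4)  /* divide by x+1 */
#define CCOSR_CLKO1_EN      (1u << 7)                      /* output enable */

/* Compose the CLKO1 part of CCOSR. 'sel' must be looked up in the
 * reference manual for the clock of interest. */
static uint32_t ccosr_clko1(uint32_t sel, uint32_t div_minus_1)
{
    return CCOSR_CLKO1_SEL(sel) | CCOSR_CLKO1_DIV(div_minus_1) | CCOSR_CLKO1_EN;
}

/* On the target only (not runnable on a host), as a read-modify-write
 * so that the CLKO2 fields in the same register are preserved:
 *   volatile uint32_t *ccosr = (volatile uint32_t *)CCM_CCOSR_ADDR;
 *   *ccosr = (*ccosr & ~0xFFu) | ccosr_clko1(sel, 7);
 */
```

A large divider keeps the output frequency within what the pad and the probe can handle. The pad itself also has to be muxed to the CLKO function in the IOMUX, which is not shown here.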

Best regards
igor