Assumed Answered

i.MX53 i.RAM (OCRAM) seems very slow, test code included.

Question asked by TomE on May 8, 2015
Latest reply on Dec 4, 2017 by TomE

(Edit: Definitely Not Answered. What put that "Assumed" thing there?)

 

Starting with the questions.

 

Am I doing the right thing here? Am I missing anything?

 

Could there be something wrong with the way the Kernel sets up the CPU? With the way the Bootstrap left the CPU set up?

 

Can others please run this test and let me know what results they get?

 

I'm running mainline Linux 3.4 on an i.MX53 board.

 

I'll try to repeat these tests on an i.MX53 QSB on Monday.

 

I wrote some code in a FlexCAN driver to measure how long it took to read and write the FlexCAN device registers, and as expected, it took a L O N G time. When using the standard I/O Macros (which added a Memory Barrier instruction to each read) it took about 180ns. Removing the Barrier reduced it to 130ns. For an 800MHz CPU that's still over 100 CPU clocks!

 

I then wrote some code to measure the "speed" of reading the OCRAM from Linux user space. The OCRAM is meant to be capable of 1- or 2-clock access, admittedly at 133MHz (the ahb_clk_root clock), so each access should still only take a small multiple of 7.5ns.

 

Except I'm measuring 129ns. That's 17 clocks at 133MHz, or 103 CPU clocks.
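The clock arithmetic above can be checked with a short helper. This is only a sketch: `ns_to_clocks` and `print_clocks` are names made up for illustration (they are not part of memio), and the 133MHz and 800MHz figures are simply the ones quoted in this post.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a latency in nanoseconds into clock cycles at freq_mhz.
 * Hypothetical helper for illustration; not part of memio. */
double ns_to_clocks(double ns, double freq_mhz)
{
    assert(freq_mhz > 0.0);
    return ns * freq_mhz / 1000.0;   /* cycles = ns * (MHz / 1000) */
}

/* Print one latency in both AHB (133MHz) and CPU (800MHz) clocks. */
void print_clocks(double ns)
{
    printf("%.0f ns = %.1f AHB clocks = %.1f CPU clocks\n",
           ns, ns_to_clocks(ns, 133.0), ns_to_clocks(ns, 800.0));
}
```

Calling `print_clocks(129.0)` gives roughly 17 AHB clocks and 103 CPU clocks, which is where the figures above come from.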

 

Here are my measured results for reading Cached RAM, OCRAM, GPU RAM, raw DDR and the Boot ROM:

 

Memory Type        Address     Time (us)  MiW/s   ns/word
Normal Cached RAM  User             2730  366.00    2.55
i.RAM/OCRAM        0xf8000000     135493    7.38  129
GPU3D GMEM         0xf8020000     189736    5.27  181
NAND FLASH Buffer  0xf7ff0000     155717    6.42  139
CSD0 to raw DDR    0x70000000     178597    5.60  171
Boot ROM           0x00000000     157992    6.33  151

 

I've attached the source and executable (should run under Linux on i.MX53 or any i.MX6) as "memio.tar.gz".

 

Here's how to run the program. WARNING: it WRITES to the memory and reads it back (to check that I've got the right address and am addressing read-write memory), so if you run it on Peripheral Space or on the external DRAM the OS is running in, you might cause a crash.

 

root@triton1:/tmp# nice --20 ./memio 0xf8000000 4

Offset 0xf8000000, count 4, pagesize 4096

[0] = 0x00000000

[1] = 0x00000001

[2] = 0x00000002

[3] = 0x00000003

DUMMY 1MiB runs took 2730 usec, giving 6

REAL 1MiB runs took 135493 usec, giving 71937030

 

GPU3D GMEM at 0xf8020000:

 

root@triton1:/tmp# nice --20 ./memio 0xf8020000 4

Offset 0xf8020000, count 4, pagesize 4096

[0] = 0x00000000

[1] = 0x00000001

[2] = 0x00000002

[3] = 0x00000003

DUMMY 1MiB runs took 2676 usec, giving 6

REAL 1MiB runs took 189736 usec, giving -805021690

 

Unless I can't count and got my code wrong, "135493 usec" for 1 Mi (1048576) word reads is 129ns for a single word read.

As a sanity check, it is able to read cached RAM (meaning it is reading the cache) in 2.5ns, or at about 390MHz on an 800MHz CPU.
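The per-word figures and the MiW/s column can be reproduced from the "usec" values the program prints. A minimal sketch, assuming each timed loop performs 1024 x 256 x 4 = 1048576 word reads; `ns_per_word`, `miw_per_sec` and `WORDS_PER_RUN` are names invented here for illustration and do not appear in memio.

```c
#include <assert.h>
#include <stdint.h>

/* Reads per timed loop: 1024 passes x 256 iterations x 4 reads each. */
#define WORDS_PER_RUN (1024u * 256u * 4u)

/* Average time per 32-bit read in nanoseconds.
 * Hypothetical helper for illustration; not part of memio. */
double ns_per_word(uint32_t usec, uint32_t words)
{
    assert(words > 0);
    return (double)usec * 1000.0 / (double)words;
}

/* Throughput in MiW/s, i.e. units of 2^20 words per second. */
double miw_per_sec(uint32_t usec, uint32_t words)
{
    assert(usec > 0);
    return ((double)words / 1048576.0) / ((double)usec / 1e6);
}
```

For the OCRAM run, `ns_per_word(135493, WORDS_PER_RUN)` comes out at about 129.2 and `miw_per_sec(135493, WORDS_PER_RUN)` at about 7.38, in line with the table.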

 

I'll try to paste the code in here.

 

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <ctype.h>
#define __USE_BSD
#include <unistd.h>     /* for getpagesize() */
#undef __USE_BSD
#include <string.h>     /* for memset() */

int main(int argc, char *argv[])
{
    struct timeval now, then;

    if (argc < 3) {
        printf("Usage: %s <phys_addr> <count>\n", argv[0]);
        return 0;
    }

    off_t offset = strtoul(argv[1], NULL, 0);
    size_t count = strtoul(argv[2], NULL, 0);

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) {
        perror("open(/dev/mem) failed");
        return -1;
    }

    /* Map one page of physical memory starting at the given address. */
    int pagesize = getpagesize();
    unsigned char *mem = mmap(NULL, pagesize,
                              PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, offset);
    if (mem == MAP_FAILED) {    /* mmap() signals failure with MAP_FAILED, not NULL */
        perror("Can't map memory");
        return -1;
    }

    /* Only one page is mapped, so never write past the end of it. */
    if (count > (size_t)pagesize / 4)
        count = (size_t)pagesize / 4;

    printf("Offset 0x%8x, count %d, pagesize %d\n",
           (unsigned int)offset, (int)count, pagesize);

    int i, j;
    time_t usec;
    int acc = 0;
    int dummy[4096];
    uint32_t *p32 = (uint32_t *)(&mem[0]);

    memset(dummy, 0, sizeof(dummy));

    /* Write a test pattern and read it back to confirm the mapping is read-write. */
    for (i = 0; i < (int)count; ++i) {
        p32[i] = i;
    }
    for (i = 0; i < (int)count; ++i) {
        printf("[%d] = 0x%08x\n", i, p32[i]);
        acc += p32[i];
    }

    /* Reference run: 1024 * 256 * 4 = 1048576 word reads from a cached local array. */
    p32 = (uint32_t *)(&dummy[0]);
    gettimeofday(&now, NULL);
    for (j = 0; j < 1024; j++) {
        for (i = 0; i < pagesize/4/4; ++i) {
            acc += p32[i];
            acc += p32[i + 1];
            acc += p32[i + 2];
            acc += p32[i + 3];
        }
    }
    gettimeofday(&then, NULL);
    usec = 1000000 * (then.tv_sec - now.tv_sec) +
                     (then.tv_usec - now.tv_usec);
    printf("DUMMY 1MiB runs took %u usec, giving %d\n",
           (unsigned int)usec, acc);

    /* Real run: the same number of word reads from the mapped physical page. */
    p32 = (uint32_t *)(&mem[0]);
    gettimeofday(&now, NULL);
    for (j = 0; j < 1024; j++) {
        for (i = 0; i < pagesize/4/4; ++i) {
            acc += p32[i];
            acc += p32[i + 1];
            acc += p32[i + 2];
            acc += p32[i + 3];
        }
    }
    gettimeofday(&then, NULL);
    usec = 1000000 * (then.tv_sec - now.tv_sec) +
                     (then.tv_usec - now.tv_usec);
    printf("REAL 1MiB runs took %u usec, giving %d\n",
           (unsigned int)usec, acc);

    return 0;
}
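One thing that may be worth ruling out is the compiler merging or reordering the reads, since `p32` in the program above is not declared volatile. The following is a sketch of an alternative timing loop, not the code that produced the numbers in this post: `time_reads_ns` is a hypothetical helper, and it assumes `clock_gettime(CLOCK_MONOTONIC)` is available on the target (older glibc versions need `-lrt` at link time).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read `words` 32-bit words starting at p, `passes` times over, and return
 * the elapsed time in nanoseconds. The volatile qualifier forces one real
 * load per read. The sum of all values read is stored in *sum_out.
 * Hypothetical helper for illustration; not part of memio. */
uint64_t time_reads_ns(const volatile uint32_t *p, size_t words,
                       size_t passes, uint32_t *sum_out)
{
    struct timespec start, end;
    uint32_t sum = 0;

    assert(p != NULL && sum_out != NULL);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t j = 0; j < passes; j++) {
        for (size_t i = 0; i < words; i++) {
            sum += p[i];
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    *sum_out = sum;
    return (uint64_t)((int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
                      + (int64_t)(end.tv_nsec - start.tv_nsec));
}
```

On the board this would be called with the mapped page, for example `time_reads_ns((volatile uint32_t *)mem, pagesize / 4, 1024, &sum)`, which again performs 1048576 word reads but walks the whole page once per pass.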

 

Tom

 

Message was edited by: Tom Evans to dispute the "Assumed Answered".

Original Attachment has been moved to: memio.tar.gz
