
i.MX6Q PCIe performance testing

Question asked by Brian Lee on Oct 4, 2018
Latest reply on Oct 4, 2018 by igorpadykov

Hi All,


I have a custom board design based on the i.MX6Q Sabresd board (MCIMX6Q-SDB) running Android 8.0.


The board has a full-size PCIe 2.0 x1 connection. I have confirmed that this is working by plugging in a PCIe 2.0 to USB 3.0 card (SilverStone SST-EC04-P). In Android, via the command line, I can mount an external USB SSD (MX500 120GB), see the files, write files, etc.


My aim is to verify that the PCIe connection is working at the expected throughput (5GT/s).
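
Separately from throughput, the negotiated link rate can be read from the LnkSta line of `lspci -vv`. The following is only a sketch: it assumes `lspci` is present on the image (not a given on Android) and is run as root so the capability registers are readable. The helper name `link_status` is made up for illustration.

```shell
# Hypothetical helper: read `lspci -vv` output on stdin and print the
# negotiated speed and width from each LnkSta line, e.g. "5GT/s x1".
link_status() {
    sed -n 's/.*LnkSta:[[:space:]]*Speed \([^ ,]*\).*Width \(x[0-9]*\).*/\1 \2/p'
}

# Only call lspci if it is actually installed on this system.
if command -v lspci >/dev/null 2>&1; then
    lspci -vv 2>/dev/null | link_status
fi
```

A result of 5GT/s x1 would confirm that the link trained at Gen2, but it says nothing about sustained throughput.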

My initial tests were done by timing a dd operation to the USB SSD as follows:

Android (i.MX6Q Board):

sync ; time ( dd if=/dev/zero of=/data/media/usb1/tmp/zero bs=30m count=10; sync )
10+0 records in
10+0 records out
314572800 bytes transferred in 4.368 secs (72017582 bytes/sec)
    0m04.92s real     0m00.05s user     0m04.35s system

If I compare this to the USB 3.0 port on my laptop running Ubuntu I get the following:


Lenovo Ideapad 510 running Ubuntu 16.04:


sync ; time ( dd if=/dev/zero of=/media/brian/b4fab089-df5e-4c6e-bc6d-6274175849a6/tmp/zero bs=30M count=10; sync )
10+0 records in
10+0 records out
314572800 bytes (315 MB, 300 MiB) copied, 1.91471 s, 164 MB/s


real    0m2.468s
user    0m0.000s
sys    0m0.404s

So the transfer speed on the i.MX6Q board is roughly half that of the laptop.
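
The rate dd prints does not include the final sync, so a fairer comparison divides the byte count by the `real` time of the whole command. A rough sketch of that arithmetic with the numbers above follows. The function name is just for illustration, and it assumes a shell with 64-bit integer arithmetic such as bash on the laptop (Android's shell may be limited to 32 bits).

```shell
# Hypothetical helper: effective throughput in bytes/sec, given the bytes
# written and the elapsed wall-clock time in milliseconds (integer math).
effective_rate() {
    echo $(( $1 * 1000 / $2 ))
}

imx_rate=$(effective_rate 314572800 4920)      # i.MX6Q: 4.92 s real
laptop_rate=$(effective_rate 314572800 2468)   # laptop: 2.468 s real

echo "i.MX6Q: $imx_rate bytes/sec"
echo "laptop: $laptop_rate bytes/sec"
echo "i.MX6Q runs at $(( imx_rate * 100 / laptop_rate ))% of the laptop rate"
```

With the sync included this works out to about 64 MB/s against about 127 MB/s.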


Looking at the specs of the busses:


PCIe 2.0 speed = 5 GT/s => 500 MByte/s of actual data transfer per lane (after removing the 8b/10b encoding overhead).


USB 3.0 speed = 5Gbit/s => the specification considers it reasonable to achieve 3.2 Gbit/s (400 MB/s) or more in practice
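
For reference, the arithmetic behind those two figures can be sketched as below. The function name is hypothetical, and the script assumes a shell with 64-bit integer arithmetic (e.g. bash on the laptop).

```shell
# Payload bytes/sec on a link that uses 8b/10b encoding:
# each 10 bits on the wire carry 8 data bits, and 8 bits make a byte.
payload_rate_8b10b() {
    echo $(( $1 * 8 / 10 / 8 ))
}

# PCIe 2.0, single lane: 5 GT/s raw
pcie2_x1=$(payload_rate_8b10b 5000000000)
# USB 3.0 practical figure quoted above: 3.2 Gbit/s, converted to bytes/sec
usb3_practical=$(( 3200000000 / 8 ))

echo "PCIe 2.0 x1 payload ceiling: $pcie2_x1 bytes/sec"
echo "USB 3.0 practical figure:    $usb3_practical bytes/sec"
```

The PCIe number is a line-encoding ceiling only. Packet (TLP) header overhead reduces the usable figure further.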

Now, I realise there are tons of caveats here:


SSD drive write speed (mentioned)
USB overhead
USB CPU utilisation (90%+ on the i.MX6 board)


So maybe using a USB SSD is not the best method to verify the throughput of the PCIe bus, but I was hoping to get the same speed on both my laptop and the i.MX6 board: maybe not saturating the bus due to SSD write speeds, but at least the same throughput on both.


Now, I see that there are some discussions elsewhere on the board, namely here, where they have set up two boards to communicate via PCIe to measure throughput.


This has the following table:



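|                            | ARM core used as the bus master, and cache is disabled | ARM core used as the bus master, and cache is enabled | IPU used as the bus master (DMA) |
|----------------------------|--------------------------------------------------------|-------------------------------------------------------|----------------------------------|
| Data size in one write TLP | 8 bytes                                                | 32 bytes                                              | 64 bytes                         |
| Write speed                | ~109 MB/s                                              | ~298 MB/s                                             | ~344 MB/s                        |
| Data size in one read TLP  | 32 bytes                                               | 64 bytes                                              | 64 bytes                         |
| Read speed                 | ~29 MB/s                                               | ~100 MB/s                                             | ~211 MB/s                        |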

The data shows that the IPU DMA setup gives the fastest speeds, ~344 MByte/s for writes (still shy of 500 MByte/s). I am a little confused about the best course of action.

So my questions are as follows:



1) Is there a better method to confirm PCIe is running at the maximum transfer speed it can on my board? Preferably something simple like plugging in a simple PCIe card and confirming a transfer rate. Is the more complex RC/EP test here the best test?


2) It appears that the best speeds are achieved with the IPU as the DMA master. Is this applicable only to image processing, or would it also help with e.g. the PCIe to USB 3.0 speeds?


3) How do I know if I am running with the cache enabled to get the best non-DMA speeds? Is this a kernel option, or is it simply a matter of using the mem=768M boot argument?