SDHC and NAND flash speed benchmarks?

bowerymarc · 02-07-2013

hi all,

I'm looking at using a K20 (120 MHz) for a project that needs to stream media from either an SD card or NAND flash (at least a couple of GB stored), and I can't find any benchmarks for either. The only thing I found was the maximum SD speed based on a 25 MHz clock and a 4-bit bus, but that's not an overall system benchmark. I need to know the sustained throughput and CPU utilization for just that streaming. Does anyone have any numbers?
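For reference, that raw SD bus ceiling is simple arithmetic: clock frequency times bus width, divided by 8 bits per byte. The minimal sketch below (the function name is hypothetical) assumes one bit per data line per clock and ignores command, CRC, and card busy overhead, so real sustained throughput will be lower.

```c
#include <stdint.h>

/* Hypothetical helper: peak SD data-bus bandwidth in bytes per second.
 * Assumes one bit per data line per clock edge used for data (SDR) and
 * ignores command, CRC and card busy overhead. */
uint32_t sd_bus_peak_bytes_per_s(uint32_t clk_hz, uint32_t bus_width_bits)
{
    /* 64-bit intermediate avoids overflow for high clocks. */
    return (uint32_t)(((uint64_t)clk_hz * bus_width_bits) / 8u);
}
```

With a 25 MHz clock and 4 data lines this gives 12,500,000 B/s (12.5 MB/s); at 50 MHz it doubles to 25 MB/s.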

thanks!

Marc

Hui_Ma · 04-18-2013

SDHC module performance:

We measured SDHC performance with SD_CLK at 24 MHz and a 4-bit data width.

Reading 64 blocks (1 block = 512 bytes of data): the whole operation takes less than 6 ms.

Writing 64 blocks (1 block = 512 bytes of data): the whole operation takes less than 24 ms.

Note: The performance will vary depending on the brand of the SD card you are using.
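The timings above convert to throughput as bytes moved divided by elapsed time. Because the posted times are upper bounds ("less than"), the results are lower bounds on throughput. A minimal sketch (the function name is hypothetical):

```c
#include <stdint.h>

#define SD_BLOCK_SIZE 512u  /* bytes per SD block, as stated in the post */

/* Throughput in bytes per second for `blocks` blocks transferred in
 * `elapsed_us` microseconds. */
double sd_throughput_bytes_per_s(uint32_t blocks, double elapsed_us)
{
    return (double)blocks * SD_BLOCK_SIZE / (elapsed_us * 1e-6);
}
```

64 blocks in 6 ms is about 5.46 MB/s read; 64 blocks in 24 ms is about 1.37 MB/s write, matching the figures quoted in the reply below.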

Hope it helps.

bowerymarc · 04-18-2013

Hi Hui Ma,

What software were you using? It seems the PE component (SDHC_LDD) does not properly use the 4-bit mode.

Your read speed of 5.4 MB/s and write speed of 1.4 MB/s seem almost reasonable, though still a far cry from what the SD cards are capable of; you can benchmark the particular SD card you're using on a computer to get an idea.

Thanks for the info!

Best,

Marc

dachancellor · 04-22-2013

Hi Marc,

I've done some in-house speed tests with the SDHC and NFC, as our application will use both components.

I am using a K70 running at 150MHz, but I hope it will at least demonstrate what is possible.

The SD CLK is running at 50MHz.

The NFC clock is running at 25MHz.

All of my tests use 128 MB of data (the size of the NAND on our board), and I am using a UHS-I microSD card.

I'm still using the NFC LDD, as I was satisfied with the performance.

I am NOT using the SDHC LDD, as I was greatly disappointed with the performance (as you and many others have been).

I wrote my own SDHC driver using multiblock writes, 4-bit mode, and ADMA2.
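That driver isn't posted here. As an illustration of what the ADMA2 part involves, below is a minimal sketch that builds a descriptor table for one contiguous buffer. The bit layout follows the ADMA2 descriptor format in the SD Host Controller Simplified Specification (attributes in bits 15:0, length in bits 31:16, 32-bit address in the second word); the macro names, the function name, and the 0xFFFC per-descriptor cap are assumptions. Check descriptor format, alignment, and length limits against the Kinetis reference manual before relying on anything like this.

```c
#include <stddef.h>
#include <stdint.h>

/* ADMA2 descriptor (per SD Host Controller Simplified Spec):
 * word 0: bits 15:0 attributes, bits 31:16 length; word 1: address. */
typedef struct {
    uint32_t attr_len;
    uint32_t address;
} adma2_desc_t;

#define ADMA2_VALID     (1u << 0)
#define ADMA2_END       (1u << 1)
#define ADMA2_ACT_TRAN  (2u << 4)   /* Act2:Act1 = 10b -> transfer data */

/* Assumed conservative per-descriptor limit (multiple of 4 bytes). */
#define ADMA2_MAX_CHUNK 0xFFFCu

/* Fill `table` with descriptors covering `len` bytes starting at bus
 * address `addr`. Returns the number of descriptors used, or 0 if
 * `len` is 0 or the table is too small. */
size_t adma2_build_table(adma2_desc_t *table, size_t max_desc,
                         uint32_t addr, size_t len)
{
    size_t n = 0;

    if (len == 0)
        return 0;
    while (len > 0) {
        if (n == max_desc)
            return 0;                      /* table too small */
        uint32_t chunk = (len > ADMA2_MAX_CHUNK) ? ADMA2_MAX_CHUNK
                                                 : (uint32_t)len;
        table[n].attr_len = (chunk << 16) | ADMA2_ACT_TRAN | ADMA2_VALID;
        table[n].address  = addr;
        addr += chunk;
        len  -= chunk;
        n++;
    }
    table[n - 1].attr_len |= ADMA2_END;    /* mark the last descriptor */
    return n;
}
```

The point of ADMA2 is that one table can describe a whole multi-block transfer, so the controller moves data without per-block CPU involvement.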

For the NFC speed tests, I used a timer with a 1024 Hz tick, so those measurements are more precise (about 1 ms resolution).

When I did the SDHC tests, I was still using the RTC for timing, so those results only have 1-second resolution.

For NAND (128MB):

full chip erase = 0.793s (161 MB/s)

full chip read = 7.086s (18.064 MB/s)

full chip write = 18.148s (7.053 MB/s)

* I recall seeing faster writes at one point; it may have to do with my current ECC setting.
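The NAND figures above follow from dividing the data size by the elapsed time measured with the 1024 Hz timer. A minimal sketch, assuming the raw tick count is available (the function name and tick values are illustrative, not from the post):

```c
#include <stdint.h>

/* Throughput in MB/s for `size_mb` MB moved in `ticks` ticks of a timer
 * running at `tick_hz` Hz. MB is whatever unit size_mb is given in. */
double throughput_mb_per_s(uint32_t size_mb, uint32_t ticks, uint32_t tick_hz)
{
    double seconds = (double)ticks / (double)tick_hz;
    return (double)size_mb / seconds;
}
```

For example, 7256 ticks at 1024 Hz is about 7.086 s, giving about 18.06 MB/s for the 128 MB read. At these durations, one tick (about 0.98 ms) of uncertainty is negligible.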

For SDHC, throughput depended, up to a point, on the number of blocks written per multiple-block write command (CMD25).

I don't remember exactly, but it was along these lines.

>=256 blocks per command = ~8s (16 MB/s)

128 blocks per command = ~10s (12.8 MB/s)

64 blocks per command = ~12s (10.67 MB/s)
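The trend above is consistent with per-command overhead being amortized over more blocks. As a sketch of the command pattern (not the poster's driver), the loop below splits a transfer into one CMD25 (WRITE_MULTIPLE_BLOCK) plus CMD12 (STOP_TRANSMISSION) pair per chunk. The `sd_cmd_fn` hook and the counting stub are hypothetical stand-ins for a real low-level driver, and block addressing assumes a high-capacity (SDHC/SDXC) card. Some drivers instead pre-set the count with CMD23 (SET_BLOCK_COUNT).

```c
#include <stddef.h>
#include <stdint.h>

#define SD_CMD12_STOP_TRANSMISSION    12u
#define SD_CMD25_WRITE_MULTIPLE_BLOCK 25u

/* Hypothetical hook: issue one command (plus its data phase for CMD25),
 * return 0 on success. A real driver would program the SDHC here. */
typedef int (*sd_cmd_fn)(uint32_t cmd, uint32_t arg, uint32_t nblocks, void *ctx);

/* Write `total_blocks` from block address `start_block`, one CMD25+CMD12
 * pair per `blocks_per_cmd` blocks. Returns 0 on success. */
int sd_write_multi(sd_cmd_fn send, void *ctx, uint32_t start_block,
                   uint32_t total_blocks, uint32_t blocks_per_cmd)
{
    if (blocks_per_cmd == 0)
        return -1;
    while (total_blocks > 0) {
        uint32_t n = (total_blocks < blocks_per_cmd) ? total_blocks : blocks_per_cmd;
        if (send(SD_CMD25_WRITE_MULTIPLE_BLOCK, start_block, n, ctx) != 0)
            return -1;
        if (send(SD_CMD12_STOP_TRANSMISSION, 0u, 0u, ctx) != 0)
            return -1;
        start_block  += n;
        total_blocks -= n;
    }
    return 0;
}

/* Illustrative stub that only counts commands (for desk-checking). */
typedef struct { uint32_t cmd25, cmd12, blocks; } cmd_counter_t;

int counting_send(uint32_t cmd, uint32_t arg, uint32_t nblocks, void *ctx)
{
    cmd_counter_t *c = ctx;
    (void)arg;
    if (cmd == SD_CMD25_WRITE_MULTIPLE_BLOCK) { c->cmd25++; c->blocks += nblocks; }
    else if (cmd == SD_CMD12_STOP_TRANSMISSION) { c->cmd12++; }
    return 0;
}
```

Treating 128 MB as 2^27 bytes, that is 262144 blocks: 1024 command pairs at 256 blocks per command versus 4096 at 64.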

I benchmarked the card with CrystalDiskMark on my PC, and it showed 18.5 MB/s read and 17.1 MB/s write (sequential).
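Because the SDHC times were read from the RTC with 1-second resolution, the true duration of an "~8 s" run lies somewhere between 7 and 9 s. A minimal sketch of the resulting bounds (names are hypothetical):

```c
/* Throughput bounds when elapsed time comes from a clock with
 * `resolution_s` seconds of resolution: the true duration lies within
 * measured_s +/- resolution_s. Requires measured_s > resolution_s. */
typedef struct { double low, nominal, high; } rate_bounds_t;

rate_bounds_t throughput_bounds(double size_mb, double measured_s, double resolution_s)
{
    rate_bounds_t b;
    b.nominal = size_mb / measured_s;
    b.low     = size_mb / (measured_s + resolution_s);  /* run took longer */
    b.high    = size_mb / (measured_s - resolution_s);  /* run was shorter */
    return b;
}
```

For 128 MB in a measured 8 s, the nominal 16 MB/s could be anything from about 14.2 to 18.3 MB/s, a range that includes the card's 17.1 MB/s CrystalDiskMark sequential figure.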

bowerymarc · 04-22-2013

Very useful results, thanks! For your NAND benchmark, was the part x8 or x16?

Your SDHC performance is more in line with what I would expect (though still not the 30 MB/s reads I benchmarked on the i.MX31...), and it certainly shows that the SDHC_LDD component has a long way to go before it is useful for nearly anything.

My goal is 10 MB/s reads, so relying on SDHC looks like a big risk at this early (architecting) phase of my design... NFC looks fairly safe, though I'd be happier with 2x headroom. Your numbers are very helpful!
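The headroom question reduces to measured throughput divided by the target rate. A minimal sketch (function names are hypothetical) using the NAND full-chip read figure from the previous post:

```c
#include <stdbool.h>

/* Headroom factor: measured sustained throughput / required throughput. */
double headroom_factor(double measured_mb_s, double required_mb_s)
{
    return measured_mb_s / required_mb_s;
}

/* True if the measured rate leaves at least `wanted` x headroom. */
bool meets_headroom(double measured_mb_s, double required_mb_s, double wanted)
{
    return headroom_factor(measured_mb_s, required_mb_s) >= wanted;
}
```

18.064 MB/s against a 10 MB/s target is about 1.8x, just short of 2x. That figure was measured on a K70 at 150 MHz with a 25 MHz NFC clock, so a K20 at 120 MHz may differ.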

In your app, are you using the NAND as raw memory, or with a file system? I was looking at YAFFS2, as I think a filesystem will be helpful...

dachancellor · 04-24-2013

The NAND part was x8, and I am using the NAND as raw memory.

We have a FAT32 FS on the SD card.

When using the PE component(s) that utilized FatFS for writing to the SD card, the performance was a little higher than what I saw using SDHC_LDD. This makes me wonder whether I was using SDHC_LDD incorrectly.

However, since the speed is a common complaint and the basic driver I wrote gave me plenty of performance, I never investigated further.

anguel · 07-04-2013

dachancellor wrote:

When using the PE component(s) that utilized FatFS for writing to the SD card, the performance was a little higher than what I saw using SDHC_LDD.

Would you please clarify which PE component(s) that utilize FatFS you are referring to?

Are these your own PE components or are these available on the web?

Regards,

Anguel

bowerymarc · 07-08-2013

Check out ErichStyger/mcuoneclipse on GitHub.

anguel · 07-09-2013

Marc Lindahl wrote:

Check out ErichStyger/mcuoneclipse on GitHub.

Thank you Marc, I know about this site. I was just wondering how dachancellor achieved higher speeds with FatFS than with SDHC_LDD as this component uses SDHC_LDD at a lower level...

dachancellor · 07-15-2013

Hi Anguel,

When using Erich's example that utilized FatFS, I saw higher speeds than when I used the SD card example included in the CodeWarrior install.

That is what I meant by that.

Note that the higher speeds I reported from my own testing do NOT use FatFS, and the results cover only the raw data writes (though one could assume that the extra writes to update the FAT would not add much time).

anguel · 07-16-2013

Thank you for the feedback dachancellor!

bowerymarc · 07-10-2013

good question!

mr_robotto · 07-03-2013

Hi dachancellor,

I can confirm your previous results, SDHC_LDD performance is really disappointing.

BTW, any chance of sharing your SDHC implementation?

bowerymarc · 04-15-2013

FYI for others looking: the best I've been able to get so far (K60F120) is about 38 KB/s write and 44 KB/s read (with a SanDisk Ultra). Dismal! If you're hoping to use the Kinetis SD interface for streaming media (security cams, audio players, etc.), forget it!

-------

Custom Benchmark: open file, write 1048576 bytes in 4096 byte blocks, close file:

Deleting existing benchmark files...

Creating & writing benchmark file...

28590 ms total for creating file of which 27520 ms for actual write (38102 B/s)

Reading benchmark file...

23820 ms total for opening & reading file of which 23810 ms for actual read (44039 B/s)

Copy file...

53300 ms needed for copy file (19673 B/s)

done!
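A host-side analogue of the write half of that benchmark, using standard C stdio as a stand-in, is sketched below. On the board you would call FatFS's f_write and time with a hardware timer instead of timespec_get; the function name is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Write `total` bytes to `f` in `block`-sized chunks, mirroring the
 * "write 1048576 bytes in 4096 byte blocks" benchmark. Stores elapsed
 * wall-clock seconds in *seconds; returns the number of bytes written. */
size_t write_benchmark(FILE *f, size_t total, size_t block, double *seconds)
{
    unsigned char *buf;
    size_t written = 0;
    struct timespec t0, t1;

    *seconds = 0.0;
    if (block == 0 || (buf = malloc(block)) == NULL)
        return 0;
    memset(buf, 0xA5, block);               /* arbitrary fill pattern */

    timespec_get(&t0, TIME_UTC);            /* C11 wall-clock timestamp */
    while (written < total) {
        size_t n = (total - written < block) ? total - written : block;
        size_t w = fwrite(buf, 1, n, f);
        written += w;
        if (w != n)
            break;                          /* write error or medium full */
    }
    fflush(f);
    timespec_get(&t1, TIME_UTC);
    free(buf);

    *seconds = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return written;
}
```

Bytes per second is then `written / seconds`. For the 27520 ms write above, 1048576 / 27.52 gives the reported 38102 B/s.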

Hui_Ma · 04-18-2013

Please see the NFC module performance analysis below:

Core 96 MHz = 10.42 ns
Bus 48 MHz = 20.83 ns
Flash 26 MHz = 38.4 ns

Write a page:
- NAND device page program time: typical ~220 us, worst case ~500 us (tPROG in the Micron datasheet).
This time does not include moving data from the NFC buffer to the NAND device; it is only the time the NAND device needs to program the array (2K+64 bytes).

- Move from NFC buffer to NAND: 2112 bytes @ NFC flash clock
- ECC is computed on the fly, so encoding adds no time
- DMA data from CPU internal memory to the NFC buffer: 2048 + 64 bytes @ CPU clock (11 CPU clock cycles every 32 bytes, so 726 CPU clock cycles per 2112-byte page)
Timing for a page write from the CPU to the NAND device until the DONE bit is set:
<DMA time @ CPU clock> + <NAND write command> + <NAND address command> + <DATA transfer @ flash clock> + <NAND device write page time>
So, min time will be: <726 * 10.42 ns> + ~2us + ~2us + <2112 * 38.4 ns> + ~220us = ~313us
And worst time will be: <726 * 10.42 ns> + ~2us + ~2us + <2112 * 38.4 ns> + ~500us = ~593us
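The page-write formula above can be evaluated directly. This sketch sums the stages exactly as the formula does (no overlap between stages is assumed); the struct and function names are hypothetical.

```c
/* Parameters for the page-write estimate; values come from the post. */
typedef struct {
    double cpu_ns, flash_ns;   /* clock periods */
    unsigned dma_cycles;       /* CPU cycles to DMA the page into the NFC buffer */
    unsigned page_bytes;       /* 2048 data + 64 spare */
    double cmd_us, addr_us;    /* command and address phases (~2 us each) */
    double tprog_us;           /* NAND array program time (tPROG) */
} nfc_write_params_t;

/* Page write time in microseconds: all stages summed sequentially. */
double nfc_page_write_us(const nfc_write_params_t *p)
{
    double dma_us  = p->dma_cycles * p->cpu_ns / 1000.0;
    double xfer_us = p->page_bytes * p->flash_ns / 1000.0;
    return dma_us + p->cmd_us + p->addr_us + xfer_us + p->tprog_us;
}
```

With 726 DMA cycles at 10.42 ns, 2112 bytes at 38.4 ns, and tPROG of 220 us this gives about 312.7 us; with 500 us it gives about 592.7 us, matching the ~313 us and ~593 us above.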

Read a page:
- NAND device page read time: max ~25 us (tR in the Micron datasheet).
This time does not include moving data from the NAND device to the NFC buffer.
- Move from NAND device to NFC buffer: 2112 bytes @ NFC flash clock (38.4 ns per byte in this configuration)
- With 32-bit ECC: worst case is 7473 bus cycles, ~155 us (7473 / 48 MHz); best case is 612 bus cycles, ~12.75 us
- DMA data from the NFC buffer to CPU internal memory: 2048 + 64 bytes @ CPU clock (10 CPU clock cycles every 32 bytes, so 660 CPU clock cycles per 2112-byte page)
Timing for a page read from the NAND device to CPU RAM until the DONE bit is set:
<DMA time @ CPU clock> + <NAND read command> + <NAND address command> + <DATA transfer @ flash clock> + <NAND device read page time> + <ECC time>
Min time: <660 * 10.42 ns> + ~2us + ~2us + <2112 * 38.4 ns> + ~25us + <612 * 20.83 ns> = ~130us
Max time: <660 * 10.42 ns> + ~2us + ~2us + <2112 * 38.4 ns> + ~25us + <7473 * 20.83 ns> = ~273us
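Evaluating the page-read formula above and dividing the 2048 data bytes by the result gives a rough per-page payload throughput. Like the formula, this sketch assumes the stages run back to back with no pipelining across pages; the function names are hypothetical.

```c
/* Page read time in microseconds, summing the stages in the formula. */
double nfc_page_read_us(double cpu_ns, double flash_ns, double bus_ns,
                        unsigned dma_cycles, unsigned page_bytes,
                        double cmd_us, double addr_us, double tr_us,
                        unsigned ecc_bus_cycles)
{
    return dma_cycles * cpu_ns / 1000.0        /* DMA to CPU RAM          */
         + cmd_us + addr_us                    /* read command + address  */
         + page_bytes * flash_ns / 1000.0      /* NAND -> NFC buffer      */
         + tr_us                               /* tR array read           */
         + ecc_bus_cycles * bus_ns / 1000.0;   /* ECC correction          */
}

/* Payload throughput: bytes per microsecond equals MB/s (1 MB = 10^6 B). */
double payload_mb_per_s(unsigned data_bytes, double page_us)
{
    return data_bytes / page_us;
}
```

At these clocks, the best case (about 129.7 us per page) is about 15.8 MB/s of payload and the ECC worst case (about 272.6 us) is about 7.5 MB/s, before any overlap between pages.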

Hope it helps.

bowerymarc · 04-18-2013

Hi Hui Ma,

This seems like a good on-paper analysis, but what I'm interested in is the actual performance, using the software driver/components supplied by Freescale, etc... (in other words, real-world...)

thanks!
Marc


Kinetis K Series MCUs