Hello
It would be great if people with a Coldfire Linux system could kindly run the little test below for me:
http://www.apollo-core.com/sortbench.c
Please be so kind as to download the source, compile it with -O2,
execute it and post the result.
Many thanks, your help is very much appreciated.
Kind regards
Gunnar
A Bubble Sort! That's not really a good test of anything. Can you tell us why you chose it - what characteristics of the CPUs are you trying to test?
Bubble sort - Wikipedia, the free encyclopedia
http://www.cs.duke.edu/~ola/papers/bubble.pdf
"... bubble sort, which is merely the generic *bad* algorithm."
The 32 passes that "benchmark" runs are acting as a "cache-buster". It steps up through increasing data array sizes and then runs through sequential data. As soon as the array is bigger than the cache, the code gets slower. If this is what you want to test, you could use a far simpler benchmark.
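As a rough illustration of the kind of simpler benchmark meant here, a plain sequential read over a buffer exercises the cache in the same way without any sorting. This is only a sketch: `sweep_sum` and `sweep_usec` are made-up names, it assumes a hosted C environment with a working `clock()`, and it is not taken from sortbench.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read every word of buf once, in order. The volatile qualifier keeps
   the compiler from optimising the loads away. */
static uint32_t sweep_sum(const volatile uint32_t *buf, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Run `passes` sweeps over n words and return the elapsed CPU time in
   microseconds (integer arithmetic only). The combined checksum is
   written to *sink so the work has a visible result. */
static unsigned long long sweep_usec(const uint32_t *buf, size_t n,
                                     unsigned passes, uint32_t *sink)
{
    uint32_t acc = 0;
    clock_t start = clock();
    for (unsigned p = 0; p < passes; p++)
        acc += sweep_sum(buf, n);
    clock_t stop = clock();
    *sink = acc;
    return (unsigned long long)(stop - start) * 1000000ULL / CLOCKS_PER_SEC;
}
```

Calling `sweep_usec` for buffer sizes from well below to well above the cache size, and dividing the bytes read by the microseconds returned, should show the rate drop once the buffer no longer fits.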
Please post your results here. That is likely to get more responses.
The chip I'm using isn't running Linux and doesn't have the floating point used in the time calculations, so I converted it to run without either.
MCF5329 running at 240MHz (CF3 core, 16k combined cache):
The last column is MB/Sec courtesy of Excel:
Array = 1024 524799 bytes in 37857 usec 105.76 MB/s
Array = 2048 2098175 bytes in 150352 usec 106.47 MB/s
Array = 3072 4720127 bytes in 338087 usec 106.52 MB/s
Array = 4096 8390655 bytes in 610290 usec 104.89 MB/s
Array = 5120 13109759 bytes in 1006009 usec 99.42 MB/s
Array = 6144 18877439 bytes in 1545400 usec 93.19 MB/s
Array = 7168 25693695 bytes in 2221455 usec 88.24 MB/s
Array = 8192 33558527 bytes in 3024159 usec 84.66 MB/s
Array = 9216 42471935 bytes in 3950394 usec 82.03 MB/s
Array = 10240 52433919 bytes in 4995316 usec 80.08 MB/s
Array = 11264 63444479 bytes in 6160938 usec 78.57 MB/s
Array = 12288 75503615 bytes in 7444738 usec 77.38 MB/s
Array = 13312 88611327 bytes in 8846343 usec 76.42 MB/s
Array = 14336 102767615 bytes in 10367919 usec 75.62 MB/s
Array = 15360 117972479 bytes in 11999766 usec 75.01 MB/s
Array = 16384 134225919 bytes in 13753034 usec 74.46 MB/s
Array = 17408 151527935 bytes in 15616586 usec 74.03 MB/s
Array = 18432 169878527 bytes in 17603346 usec 73.63 MB/s
Array = 19456 189277695 bytes in 19700023 usec 73.30 MB/s
(Gave up waiting for it to complete the next one).
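For reference, the MB/s column can be reproduced from the first two numbers on each row. The sketch below assumes 8 bytes per counted item and 1 MB = 2^20 bytes; both assumptions are inferred from the posted figures rather than taken from the benchmark source, and `table_mb_per_sec` is just an illustrative name.

```c
/* Recompute the MB/s column of the table above.
   ASSUMPTIONS (inferred from the posted rows, not from sortbench.c):
   each counted item stands for 8 bytes, and 1 MB = 1048576 bytes. */
static double table_mb_per_sec(unsigned long count, unsigned long usec)
{
    double bytes = 8.0 * (double)count;     /* assumed 8 bytes per item */
    double seconds = (double)usec / 1e6;    /* microseconds to seconds  */
    return bytes / seconds / 1048576.0;     /* bytes/s to MB/s (2^20)   */
}
```

With the first row, `table_mb_per_sec(524799, 37857)` comes out at about 105.76, matching the table.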
Here's an 800MHz ARM Cortex A4 - its memory and cache management is so good that the benchmark doesn't get slower, even at the largest array size:
-------------------------------------------------------------
SORTBENCH 1.0
-------------------------------------------------------------
Array = 1024 MB/sec = 239.64
Array = 2048 MB/sec = 238.22
Array = 3072 MB/sec = 241.01
...
Array = 31744 MB/sec = 234.54
Array = 32768 MB/sec = 234.34
I think it would be better if people could email the results to you rather than adding entries to this post.
Please add some "Contact Details" to your web site to allow this.
Your web site lacks contact details. The first and second tabs on it don't work. The "Performance" page is full of grammar and spelling errors. That doesn't give a good impression of the product:
APOLLO - High Performance Processor
aspell list < apollo | sort | uniq
APPOLLO
convirmed
GZIB
immidiates
implement (should be plural)
kep
perfromance
prim (should be "prime")
rithmetic
shoew
todays (should be "today's")
Tom
Many thanks for running the test!
>A Bubble Sort! That's not really a good test of anything.
>Can you tell us why you chose it - what characteristics of the CPUs are you trying to test?
Yes, I can explain this very well.
1) The test does some real work - sorting memory.
Whether the algorithm is the best for sorting does not matter
as I want to test some CPU features and not do the quickest sorting.
2) In contrast to Dhrystone, the performance of the sort depends on the CPU
and not mainly on the string implementation.
3) While the test loop is small, it touches some important areas of the CPU.
- Per loop iteration it does the following:
1) One Load
2) One compare
3) One conditional swap of 2 ints
4) A pointer increment
5) A counter decrement and a loop
This means the test result depends on
1) Your IPC
2) Performance of the cache
3) Performance of branch prediction (the conditional code)
4) Loop performance / acceleration of it
5) It does a memory write and a load to the same cache line, which means it triggers memory hazard detection and the handling of it.
This means all the important parts of a CPU are involved in the test.
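To make the per-iteration work concrete, here is a minimal sketch of one pass written the way the list above describes it. It is not the code from sortbench.c; `bubble_pass` and `bubble_sort` are illustrative names, and a compiler at -O2 may arrange the loop differently.

```c
#include <stddef.h>

/* One bubble-sort pass over a[0..n-1]. Returns the number of swaps.
   The current element is kept in a local, so each iteration needs
   only one new load from memory. */
static size_t bubble_pass(int *a, size_t n)
{
    size_t swaps = 0;
    if (n < 2)
        return 0;

    int *p = a;
    int cur = *p;
    for (size_t count = n - 1; count != 0; count--) { /* counter decrement and loop */
        int next = p[1];        /* one load */
        if (cur > next) {       /* one compare, data-dependent branch */
            p[0] = next;        /* conditional swap of 2 ints: the stores hit */
            p[1] = cur;         /* the line the next iteration loads from     */
            swaps++;
        } else {
            cur = next;
        }
        p++;                    /* pointer increment */
    }
    return swaps;
}

/* Full sort: each pass moves the largest remaining value to the end. */
static void bubble_sort(int *a, size_t n)
{
    for (size_t len = n; len > 1; len--)
        bubble_pass(a, len);
}
```

The branch in the middle is taken or not depending on the data, which is what stresses the branch predictor, while the store to `p[1]` followed by a load of the neighbouring element is the write-then-load pattern from point 5.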
I picked the test because I was looking for a simple test to stress the branch prediction and the loop acceleration of our core.
Also I wanted a test which can stress the instruction fusing of our core.
Small performance tests like this one help me to review the pipeline behaviour.
This test in particular helped me to see a possible improvement of our instruction predecode - which improved performance.
Again many thanks for testing!
Do you by chance have a Coldfire V4 also for testing?
Did you do some changes / improvements to the code for embedded Linux for the Coldfire V3?
If yes could you please publish it?
Regarding the website, thanks for the review, but the website is a draft atm, a work in progress and not public yet.
The CPU and the website are going to be launched at the end of this year.
By then I hope to have the CPU fully polished first of all, and the website finished as well.
Cheers
Gunnar
> Do you by chance have a Coldfire V4 also for testing?
No, only MCF5329 and ARM Cortex that I can easily compile and run on. We also have MCF5235 in another project, but that's in the wrong direction for your tests and I don't work on them.
> Did you do some changes / improvements to the code for embedded Linux for the Coldfire V3?
We're not running Linux at all. We are running "interrupt driven polling loops" with some generic support functions. I just changed it to use our printing macros, our hardware timer and also to pat the hardware watchdog often enough to stop the CPU from resetting.
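The kind of change described above could look roughly like the sketch below. The hook names (`timer_usec`, `watchdog_pat`) and the `fake_*` stand-ins are placeholders invented for this example, not the real macros, and patting the watchdog once per pass is only an assumption about what "often enough" means on a given board.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform hooks; a real port would point these at the
   board's hardware timer and watchdog. */
struct port_hooks {
    uint32_t (*timer_usec)(void);   /* free-running microsecond counter */
    void (*watchdog_pat)(void);     /* restart the watchdog countdown   */
};

/* Bubble-sort a[0..n-1], patting the watchdog after every pass.
   Returns the elapsed time in microseconds; the unsigned subtraction
   stays correct across a single counter wrap. No floating point. */
static uint32_t timed_sort(int *a, size_t n, const struct port_hooks *hw)
{
    uint32_t start = hw->timer_usec();
    for (size_t len = n; len > 1; len--) {
        for (size_t i = 0; i + 1 < len; i++) {
            if (a[i] > a[i + 1]) {
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
        hw->watchdog_pat();
    }
    return hw->timer_usec() - start;
}

/* Host-side stand-ins so the sketch can be tried without hardware. */
static uint32_t fake_now;
static unsigned fake_pats;
static uint32_t fake_timer(void) { fake_now += 10; return fake_now; }
static void fake_pat(void) { fake_pats++; }
```

On a desktop the stand-ins can be passed in as `struct port_hooks hw = { fake_timer, fake_pat };` followed by `timed_sort(data, count, &hw)`.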
Tom