The performance of S08 MCU!

bluehacker · ‎12-12-2009

Hello, everyone!

I want to know what the performance of the HCS08 MCU is. Atmel says its AVR MCUs deliver 1 MIPS/MHz, but what about the HCS08/RS08 MCUs?

rocco · ‎12-13-2009

I have always thought that "mips-per-megahertz" was a rather silly criterion for comparison, unless you were concerned about power consumption. And then, you would also need to consider milliamps-per-megahertz.

Is a 1 MIPS/MHz part that runs at a maximum speed of 2 MHz better than a 0.25 MIPS/MHz part that runs at 20 MHz? I can buy both oscillators for the same price. But the 20 MHz part will draw more power.

Having used both 8-bit AVRs and HC08s, I can't say either one was much better than the other.

I like the instruction set better on the HC08, but the development tools are much better on the AVR, which is why we switched. Our firmware development costs dwarf all other costs combined.

If you are developing in C, then the quality of your compiler may have the biggest effect on your performance. As Carl said, you may need to benchmark with your actual application, if it is that important to you.

nonarKitten · ‎04-23-2011

Let me preface this with a simple statement of fact -- AVR lies. The ATmega8 is slower than the S12, clock for clock, and gets a real-world rating of about 0.26 MIPS/MHz. Companies are very misleading in this regard, but just because the AVR can execute one instruction per clock cycle doesn't mean it can achieve 1.0 MIPS/MHz. The ARM7TDMI, a fully 32-bit architecture, can only achieve 0.9 MIPS/MHz.

The S12, which is the "big brother" of all Freescale's 8-bitters gets about 0.31 MIPS/MHz. To be fair, it also runs up to 80MHz, which is four times the maximum clock rate of the AVR at 20MHz; so, overall, the S12 is almost five times faster than the AVR. But you didn't ask about the S12...

I haven't checked the S08, but I would guess it's between 20% and 30% slower than the S12. I think I'll port the Dhrystone benchmark to verify this assumption, but best-guess is that it's **about** 0.24 MIPS/MHz. Given that the S08 has about a 2.5:1 clock advantage over the AVR, I would say that any weakness in its instruction set is more than compensated for by raw speed. In fact, to perform the same with a clock difference of 48:20 MHz, the S08 would have to be as slow as 0.11 MIPS/MHz, and I doubt it's that slow.
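The arithmetic behind these comparisons can be written out as a small Python sketch. It assumes absolute throughput is simply the MIPS/MHz rating multiplied by the clock frequency, and it uses only the figures quoted in this post; the function names are made up for illustration.

```python
def throughput_mips(mips_per_mhz, clock_mhz):
    """Absolute throughput, assuming it scales linearly with clock."""
    return mips_per_mhz * clock_mhz


def break_even_mips_per_mhz(ref_mips_per_mhz, ref_clock_mhz, clock_mhz):
    """MIPS/MHz a part needs at clock_mhz to match the reference part."""
    return throughput_mips(ref_mips_per_mhz, ref_clock_mhz) / clock_mhz


if __name__ == "__main__":
    avr = throughput_mips(0.26, 20)  # ATmega8 figures quoted in the post
    s12 = throughput_mips(0.31, 80)  # S12 figures quoted in the post
    print(f"AVR {avr:.1f} MIPS, S12 {s12:.1f} MIPS, ratio {s12 / avr:.2f}")
    s08_needed = break_even_mips_per_mhz(0.26, 20, 48)
    print(f"S08 break-even at 48 MHz: {s08_needed:.2f} MIPS/MHz")
```

With those inputs the ratio comes out a little under five and the break-even figure rounds to 0.11 MIPS/MHz, matching the numbers in the post.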

CarlFST60L · ‎12-12-2009

My understanding is that there is more to it than MIPS/MHZ. You need to take a look at the architecture.

If you're that concerned with 'who's faster', you would really need to benchmark your code for your application. You will also need to be very aware of what instructions you use, as there are some very clever ways to make your code super fast.

Having only used an 8-bit RISC Atmel in production twice, I found that for our applications they are very similar in terms of work done versus clock. I did find that some things I could do 'faster' on the Atmel took more code space, though I am sure people can give examples that support both as being 'better'.

nonarKitten · ‎04-25-2011

Okay, I was way off:

MCU          MIPS/MHz*
Z80            0.01
65C02          0.02
HCS08          0.04
MC6809         0.07
8051           0.08
H8/300H        0.11
ATxmega64A1    0.21
ATmega8        0.22
HCS12          0.25

MSP430         0.40
PIC24          0.72
ARM7TDMI       0.74

At maximum clock, the ATmega8 (20MHz) would achieve about 4.3 MIPS, and the HCS08 (48MHz) would get approximately 1.9 MIPS -- a rather large difference, given that the HCS08 has a pretty large clock advantage on the ATmega8.

* Note: all measurements are based on the Dhrystone 1.1 test, which is known to be unreliable with smart compilers that can sometimes optimize code out of existence.
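For readers unfamiliar with how a Dhrystone run becomes a MIPS/MHz figure: the usual convention divides Dhrystones per second by 1757 (the score of the VAX 11/780 reference machine, nominally 1 MIPS) and then by the clock in MHz. A minimal Python sketch follows; the run numbers in the example are invented for illustration and are not measurements from this thread.

```python
VAX_DHRYSTONES_PER_SECOND = 1757  # conventional 1-MIPS reference score


def dmips_per_mhz(iterations, elapsed_seconds, clock_mhz):
    """Convert a timed Dhrystone run into Dhrystone MIPS per MHz."""
    dhrystones_per_second = iterations / elapsed_seconds
    dmips = dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND
    return dmips / clock_mhz


if __name__ == "__main__":
    # Hypothetical run: 10,000 iterations in 2.0 s on a 20 MHz part.
    print(f"{dmips_per_mhz(10_000, 2.0, 20):.2f} DMIPS/MHz")
```

Which frequency to plug in as `clock_mhz` (crystal, bus, or core clock) is itself a choice that changes the result.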

CarlFST60L · ‎04-25-2011

I started to quickly write up some examples in assembly, then realised it's going to take a long time to get this comparison done in any way that will show the pros and cons of both, and nonarKitten has pretty well covered it... In 10 years of development with both products and over 300 products using both processors (90% Freescale), we have never once considered MIPS/MHz a 'deal breaker'. Sure, we have interrupt code written in assembly and optimised for speed, along with some other modules, but that code is written in assembly with clock cycles counted when required. For our larger scale products MIPS/MHz is far more important, especially with RTOS-based applications, GUIs, (FEC) Ethernet etc., but in this low-end 8-bit range it's probably less than 1% of what we consider when choosing a specific processor for our applications.

Here is typically what we are looking for in a processor:

Power / Stop modes and their related current draw

Peripheral modules and their specific functions (ADC, I2C, PLL/FLL, CAN)

Interrupt sources

Timer modules

Flexibility and function in all modules

Tool chain

For these smaller processors, it seems to always come down to finding a processor with the right modules, pins, flash/ram size, power requirements etc.

The last time this became a problem for me was when we were doing tone generation for a siren. The poor little QD4 couldn't handle sending serial commands and generating siren tones at very high frequency; however, we optimised the output capture compare interrupt code that generates the tones, and it all worked in the end.

nonarKitten · ‎04-26-2011

I agree -- finding the right mix of peripherals can be the greatest challenge sometimes. I wish Freescale would just ditch all the peripheral options and put something like the TPU on there, so when we need eight I2C lines, we have them; when we need USB, it's there, and if we need five independent PWM channels, okay. None of this -- well, this processor is **close** so we'll just have to live with it and bit-bang the I2C slave (ugh).

MIPS/MHz is all about instruction expressiveness and compiler efficiency, and less about performance. Performance is more about the MHz and the simple (load, store and add) instruction times. The time it takes to perform a 32x32 bit MUL is less important on embedded processors than the interrupt latency, or the time to toggle a bit in a peripheral register. It's not until you start adding high-level stuff like floating or fixed point arithmetic, GUIs, and such, that a figure like "MIPS" becomes meaningful.

And if you want performance, the new Kinetis processors get 1.25MIPS/MHz -- at 50MHz, the slowest clock available, they're more than thirty times faster than the S08 processors at 48MHz. They also solve much of the peripheral problem, as all Kinetis MCUs have the exact same peripheral register set, and only vary by pinout. Still not PSoC level where any peripheral can be on any pin, but a huge step forward.

nonarKitten · ‎04-26-2011

Wheee. Moving all the variables I could to zero-page, and swapping the strcpy with a routine that uses the stack pointer as an index pointer boosted the HCS08 from 0.04 to 0.06 MIPS/MHz. It's not interrupt friendly, but if anyone's interested:

        PSHA            ;// save our abused registers
        PSHX
        PSHH
        CLI             ;// Interrupts need to be disabled when re-purposing the stack pointer
        TSX             ;// Transfer the stack into HX to save it
        STHX __Stack    ;// into our temporary spot
        LDHX __Src      ;// load in our source
        TXS             ;// and re-purpose out SP as an index pointer
        LDHX __Dest     ;// and load our destination
    loop:
        PULA            ;// [3] load A from our source index (pre-inc stack pointer)
        STA  ,X         ;// [2] store A into our dest index (non-inc index pointer)
        AIX  #1         ;// [2] manually increment H:X
        CBEQA #0,loop   ;// [4] and if A != 0, continue until we're done
        LDHX __Stack    ;// reload the saved stack pointer into HX
        TXS             ;// move it into the stack register
        SEI             ;// and safely re-enable interrupts
        PULH            ;// restore our registers
        PULX
        PULA
        RTS             ;// and exit the function

tonyp · ‎04-26-2011

Although I find little real value in such benchmarks, which I consider completely misleading if not used in a meaningful way, when you do them, at least do them correctly:

Your copy routine (except for the CLI/SEI reversal already mentioned) has a more serious problem. It only copies one byte (except for empty strings, in which case it usually copies two). This might explain the higher grade you got for your DMIPS.

The CBEQA #0,LOOP should have been BNE LOOP to even go past the first string byte.
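The effect of the wrong branch condition can be seen in a small Python model of the loop. This is only a behavioural sketch of the PULA / STA / AIX / branch sequence, not an HCS08 simulator; the function names are invented, and it assumes BNE tests the flags left by the STA.

```python
def copy_loop(src, branch_back):
    """Model the copy loop on a byte string that includes its 0 terminator.

    branch_back(a) returns True when the branch at the bottom of the loop
    is taken for the byte a just copied. Returns the bytes stored.
    """
    dest = bytearray()
    for a in src:
        dest.append(a)          # STA ,X followed by AIX #1
        if not branch_back(a):  # branch not taken: fall through to exit
            break
    return bytes(dest)


def cbeqa_zero(a):
    return a == 0  # CBEQA #0,loop branches when A equals 0


def bne(a):
    return a != 0  # BNE loop branches while the copied byte is non-zero


if __name__ == "__main__":
    print(copy_loop(b"abc\0", cbeqa_zero))  # stops after the first byte
    print(copy_loop(b"abc\0", bne))         # copies through the terminator
    print(copy_loop(b"\0xyz", cbeqa_zero))  # empty string: runs on into 'x'
```

On real hardware the CBEQA version would keep reading whatever follows an empty string in memory; the model simply stops when its input runs out.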

nonarKitten · ‎04-27-2011

Sigh... Good catch on the BEQ/BNE.

Anyway, with the fixed strcpy function, the Dhrystone MIPS drops back down to about 0.09 per MHz; and further down to about 0.06 MIPS/MHz with the default strcmp. Conversely, applying the same trick to strcmp, brings performance up to 0.12 MIPS/MHz, indicating that a large performance bottleneck stems from the limitation of having only one index register.

And while the tester's faculties can certainly be suspect at the moment, I don't want to diminish the importance of performance. While not the most critical characteristic, it's still important to know what you're getting into, and what limitations the MCU has, to better gauge which platform to start with. Few things are as infuriating as finding a processor with all the right bells and whistles only to discover that it can't run the application at the speed you need it to.

P.S. the compiler sort of freaks out using the stack-pointer this way.

tonyp · ‎04-27-2011

nonarKitten wrote:

Anyway, with the fixed strcpy function, the Dhrystone MIPS drops back down to about 0.09 per MHz; and further down to about 0.06 MIPS/MHz with the default strcmp. Conversely, applying the same trick to strcmp, brings performance up to 0.12 MIPS/MHz, indicating that a large performance bottleneck stems from the limitation of having only one index register.

And that just shows why benchmarks like this don't mean much if mis-applied. If the DMIPS can be so widely affected just by the behavior of only the string copy operation, for the HC08/9S08 (at least) it is a meaningless benchmark, since most CPU time (percentage-wise) spent on these MCUs -- for the majority of real-world applications for which these chips are intended -- is not related to copying strings.

bigmac · ‎04-29-2011

Hello,

This discussion reminds me of a definition of "MIPS" as a "Meaningless Indicator of Processor Speed". This would seem particularly so when the strcmp() and strcpy() "standard" library functions are specifically hand coded to achieve a perceived improvement of Dhrystone 1.1 performance, a long since obsolete test.

The test had much more to do with compiler efficiency than internal MCU core efficiency. As a more realistic benchmark, to directly compare the maximum speed of different MCUs, perhaps there should be a set of "standard" tasks, with well defined outcomes, to also include basic I/O and peripheral operations. Well optimized assembly code would need to be used for each MCU type. Perhaps this sort of thing has already been done?

The "MHz" part might have meant something in the past, when a crystal was connected to the MCU to directly generate the bus frequency. But the picture is now quite different with the use of FLL or PLL blocks that use a low frequency reference. What value of MHz does one choose? Perhaps not the crystal or reference frequency, but maybe the bus frequency for the MCU. Or perhaps some other totally different frequency.

I totally agree that "speed" ain't everything when comparing MCU products. In fact, for the majority of projects it is not even considered.

Regards,

Mac

nonarKitten · ‎06-08-2011

My first microcontroller project at the company I work with was using the RS08 to control two fans, a pump, an LCD and check if the user hits a button to cycle between performance modes. The main loop had to catch the interrupt as near to the PWM rising edge as possible to allow capturing the tach signal -- this interrupt ran at approximately 22.5kHz, the "recommended" frequency for PWM control of fans, according to Intel.

That interrupt alone used up nearly all the poor little RS08's CPU resources. A lot of effort was spent on making the interrupt as light-weight as possible so that the unit ran well, and it shipped still missing about 5% of the interrupts. In hindsight, a low-end S08 core would have been a much better choice, as it runs substantially faster (higher clock, and better ISA with multiply, divide, stacks and a real interrupt).

But we had no way of determining at the outset of the project that the RS08 was going to be inadequate, and by the time we did, we had invested too much in the engineering of the PCB to respin a new board with a new micro, when we were already months behind schedule.

Now, I'm not sure if published MIPS figures for the RS08 and S08 cores would have helped that much, but it might have, at least from a comparative point when approaching management with the proposition of upgrading to the pin-compatible S08 part. Management loves numbers, and being able to say it's precisely x-times faster and would avoid costly and unnecessary code-optimization cycles, is a huge plus.
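One number that can be estimated before committing to a part is the cycle budget per interrupt: bus cycles available in one interrupt period. The Python sketch below applies that to the 22.5 kHz rate mentioned above. The 10 MHz bus clock and the 300-cycle handler are assumptions picked purely for illustration; neither figure is given in the post.

```python
def cycles_per_interrupt(bus_hz, interrupt_hz):
    """Bus cycles available between two consecutive interrupts."""
    return bus_hz / interrupt_hz


def cpu_load(handler_cycles, bus_hz, interrupt_hz):
    """Fraction of CPU time consumed by the interrupt handler."""
    return handler_cycles / cycles_per_interrupt(bus_hz, interrupt_hz)


if __name__ == "__main__":
    BUS_HZ = 10_000_000    # assumed bus clock, not from the post
    IRQ_HZ = 22_500        # interrupt rate quoted in the post
    HANDLER_CYCLES = 300   # hypothetical handler cost
    print(f"budget: {cycles_per_interrupt(BUS_HZ, IRQ_HZ):.0f} cycles")
    print(f"load:   {cpu_load(HANDLER_CYCLES, BUS_HZ, IRQ_HZ):.1%}")
```

A load approaching 100% leaves nothing for the main loop, which is the situation the post describes.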

bigmac · ‎06-09-2011

Hello, I realize this is hindsight, and my comments will probably not be helpful in this instance.

If the choice of MCU was between RS08 and HCS08 devices, you don't really need comparative MIPS ratings, because of the close similarities between the two families. Simulating the operation of more relevant test code would give a more direct comparison, and this would support your already strong argument about the limitations of the RS08 device.

Since your application had an input capture requirement, best handled by interrupt processing, the choice of a device that does not have a proper interrupt and stack structure was probably risky at the outset.

When putting a case to management, IMHO it is better to initially select a device that you have a high degree of confidence can be made to work within the project. Once the coding process is substantially complete, then may be the time to consider the feasibility of down-grading to a marginally cheaper device. If successful, this would be a bonus.

Regards,

Mac

donw · ‎04-26-2011

Note: in the code above you have the CLI and SEI reversed! SEI disables interrupts...

I agree with the comments above. For me, when choosing a uP for an application, it nearly always comes down to the peripheral hardware and the code development system.

nonarKitten · ‎04-27-2011

AUGH! My bad -- thanks for the catch. Also found out that since AIX doesn't modify the CCR, another cycle can be shaved off by simply using BEQ instead of CBEQA, making it ten cycles per byte copied, plus 40-cycle overhead. In comparison, the built-in strcpy takes 43 cycles per byte, plus a 28 cycle overhead.

Not sure if memcpy could be likewise optimized, since it also needs a counter; but it's quite the performance boost having a second effective index pointer -- especially one which is autoincrementing. Wouldn't a LDA/STA ,X+ instruction be really, really handy? And what if we could swap X and Y and have a "backup" index pointer? Then we'd have an interruptible loop that's even faster (less overhead on the entry/exit code):

        LDA  ,X+        ;// [3] load from source pointer, post-inc
        SXY             ;// [1] swap X and Y
        STA  ,X+        ;// [2] store to destination pointer, post-inc
        SXY             ;// [1] swap X and Y
        BEQ  loop       ;// [3] and repeat until we LDA a 0

Also, I found a bug with the compiler (or the Dhrystone code), where if enums are ints (the default), one of the inner loops is executed 4800 times instead of only 100 times. Changing the enums to chars fixed it for some reason (maybe it's missing an initializer somewhere). Not sure what's up with that.

Any-who, current results show the HCS08 to actually be about 0.14 MIPS/MHz, a little closer to what I had originally expected.
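The cycle counts quoted earlier in this post (ten cycles per byte plus 40 overhead for the hand-coded routine, 43 per byte plus 28 for the built-in strcpy) can be compared for a given string length. The Python sketch below assumes the terminating zero counts as a copied byte; the 30-character length is chosen only as an example.

```python
def strcpy_cycles(length, per_byte, overhead):
    """Cycles to copy `length` characters plus the terminating zero."""
    return per_byte * (length + 1) + overhead


if __name__ == "__main__":
    n = 30  # example string length, chosen for illustration
    hand = strcpy_cycles(n, per_byte=10, overhead=40)
    builtin = strcpy_cycles(n, per_byte=43, overhead=28)
    print(f"hand-coded {hand}, built-in {builtin}, speedup {builtin / hand:.2f}x")
```

Under these figures the hand-coded routine is ahead even for an empty string (50 cycles against 71), so its larger fixed overhead never outweighs the per-byte saving.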

bluehacker · ‎12-14-2009

Thanks to you both. I am not really concerned about the performance of the HCS08. I posted the thread because I am now teaching Freescale's MCUs, including the HCS08 and ColdFire, and many novices ask me this question. For them, I can't give rigorous answers like rocco's and Carl's. They are already accustomed to the MIPS/MHz criterion, so I need a performance figure in MIPS/MHz.

I find that Atmel's documented claim of 1 MIPS/MHz for the AVR is very attractive to many novices. Some think the AVR is the best MCU in the world just because Atmel claims 1 MIPS/MHz. So I think that if Freescale gave a figure, even a non-rigorous one, it would help attract novices to learn the HCS08.
