MCF5328 running very slow

dn_engineer
Contributor I

We are facing a problem: our MCF5328 system runs very slowly, even slower than our old MCF5272 system.

I ran a simple while-loop test with long integer calculations, shown below, on both the MCF5272 system and the MCF5328 system.

One cycle takes 65ms on the MCF5272 system but 83ms on the MCF5328 system. That doesn't make sense: the MCF5272 has a V2 core running at 48MHz, while the MCF5328 has a V3 core running at 80MHz (bus) / 240MHz (CPU).

 

The SDR SDRAM we are using is the 48LC8M16A2-7E, running in 16-bit mode. I checked the configuration and everything is correct.

 

Does anybody know why?

 

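/* Declarations were not shown in the post; these types are assumed. */
long i;
long para_long;          /* test parameter (value not given in the post) */
long temp_long;          /* an optimizing compiler may drop the loop unless this is used */
char toggle_bit = 0;

while (1)
{
    for (i = 0; i < 0x7FFF; i++)
    {
        temp_long = para_long*i + para_long*para_long*i
                  + para_long*para_long*para_long*i
                  + para_long*para_long*para_long*para_long*i;
        temp_long *= temp_long;
    }

    *(volatile char *)0xFC0A4003 = toggle_bit;   /* toggle an I/O bit (volatile: memory-mapped register) */
    toggle_bit = 1 - toggle_bit;
}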

 

 

Thanks for your help.

TomE
Specialist II

Are you sure the CPU is running at the speed you think it is? Check this by programming a timer and printing something when it expires.

 

Try loading some code into the SRAM and run it from there. Compare that speed with your SDRAM.

 

Why 16-bit SDRAM? That's half the speed of 32-bit SDRAM or 16 bit DDR.

 

It looks like your CACHE isn't working. Check the code that writes the CACR and the ACRs.

 

Another post in this forum had someone running uCLinux configured for 32M and 64M of memory, and the latter was running "10 times slower". That looked to be a problem with the code that set up the ACRs to make the memory cacheable.

 

I've just been running some simple loop tests (memory copies, see my "memcpy" posts) on our MCF5329: benchmarks that have 6 instructions in a loop and should take 22 CPU clocks per iteration. I'm measuring about 25 clocks, which is close enough. Run something like that in SRAM and in SDRAM; if the cache is working they should run at the same rate.
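Turning a stopwatch measurement into "clocks per loop iteration" is just elapsed time times CPU clock divided by iteration count. A minimal sketch; the helper name is hypothetical and the usage figures below are illustrative, not measurements from this thread:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average CPU clocks per loop iteration, given the
 * measured elapsed time (microseconds), the iteration count and the CPU
 * clock (Hz). Compare the result with the clocks you expect the loop's
 * instructions to take. */
static double clocks_per_iteration(double elapsed_us, uint32_t iterations,
                                   double cpu_hz)
{
    assert(iterations > 0);
    return (elapsed_us * 1e-6 * cpu_hz) / (double)iterations;
}
```

For example, one million iterations taking 100 ms on a 240 MHz CPU works out to about 24 clocks per iteration. If the same loop gives very different numbers from SRAM and from SDRAM, the cache is not doing its job.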

 

Tom

 

dn_engineer
Contributor I

Hi,

 

Thanks for your reply. For this test, yes, the cache was not enabled, because once I turn on the cache, the USB interface and the touch panel stop working. I tried running the code from SRAM, and it is much faster than running it from SDRAM. I also found something very interesting when comparing the 5272 and 5328 systems; the table below shows the results. If we run only NOPs, the 5328 system is faster than the 5272, but not in the expected ratio (48MHz vs 240MHz). Once I add either an integer loop or a floating-point loop, the 5328 gets slower. Do you know why? I compared the assembly code and it looks very similar.

 

All run from SDRAM:

Test                                        MCF5328   MCF5272
100 NOPs (no for loop)                      8 us      10.4 us
100 NOPs (with for loop)                    40 us     60 us
Long integer loop test (see attached code)  83 ms     65 ms
Floating-point test (see attached code)     7.8 s     7.5 s

 

You asked why we use 16-bit SDR SDRAM. I'm not sure; that's how our hardware was designed. I guess it was copied from the evaluation board?

We use 16-bit SDRAM in SDR mode, and the SDRAM is connected to D16-D31. I checked the flex_clk output and it is 80MHz, and our UART (RS232) and timers all run correctly when their settings (timing, baud rate, etc.) are calculated from 80MHz. Does this mean the CPU is running at the right frequency?

Thank you very much for your help. I'm stuck.

dionexwilson
Contributor I

I am currently using the MCF5328 processor with an LCD display. If I enable the cache with the DNFB bit of the cache control register set to 0, the LCD display works fine. However, with the DNFB bit set, the LCD display stops working (the display jitters). The reason we set the DNFB bit is that it speeds up processing quite a lot! Does anyone know why setting the DNFB bit causes a problem with the LCD controller?

TomE
Specialist II

Default noncacheable fill buffer

 

"Fill buffer used to store noncacheable accesses. The fill buffer is used only for normal (TT = 0) instruction reads
of a noncacheable region. Instructions are loaded into the fill buffer by a burst access (same as a line fill)."

 

Where are you executing code from that is non-cacheable?

 

The whole point of the cache is to speed up code execution.

 

On our box we load all code from FLASH into SDRAM and run it from there. That allows us to run code and write to the FLASH without having to do anything special.

 

My guess is that you're executing directly from FLASH. You should mark that region as being cacheable when you're running, and perhaps only changing it to non-cacheable when running any code that has to write to the FLASH.

 

You should also be running "Cacheable write-through".

 

My guess as to your "jitter" problem is that enabling DNFB is allowing burst reads from your RAM that is increasing the CPU's load on the RAM, and that is making the LCDC FIFO run late and underrun.


Check the LCD_ISR (Status Register) and see if the "UDR" bit is being set.

 

If you're servicing vertical interrupts (to perform page swaps) then your service routine should throw some sort of debug indication if it ever sees this bit set. Or the ERR bit. Or any bits other than BOF and EOF.
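A debug check of that kind might look like the sketch below. The bit positions are an assumption borrowed from the i.MX-style LCDC layout that this controller resembles; verify them against the LCD_ISR description in the MCF5329 Reference Manual. The counters and function name are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED LCD_ISR bit positions (i.MX-style LCDC); verify before use. */
#define LCD_ISR_BOF  (1u << 0)   /* beginning of frame */
#define LCD_ISR_EOF  (1u << 1)   /* end of frame */
#define LCD_ISR_ERR  (1u << 2)   /* error response */
#define LCD_ISR_UDR  (1u << 3)   /* FIFO underrun */

/* Hypothetical debug counters, updated from the LCD interrupt handler. */
static uint32_t lcd_underrun_count;
static uint32_t lcd_other_error_count;

/* Classify one LCD_ISR snapshot (read by the caller from the register).
 * Returns 1 if any bit other than BOF/EOF was set, so the caller can raise
 * a debug indication (LED, log message, breakpoint). */
static int lcd_isr_check(uint32_t isr)
{
    uint32_t unexpected = isr & ~(LCD_ISR_BOF | LCD_ISR_EOF);

    if (isr & LCD_ISR_UDR)
        lcd_underrun_count++;
    if (unexpected & ~LCD_ISR_UDR)
        lcd_other_error_count++;
    return unexpected != 0;
}
```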

 

I'd recommend setting the DMA Control Register to FIXED length (BURST = 1) with HM set to 20 and TM set to 12.
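Building that register value from named fields makes the settings easier to audit. A sketch under an assumed layout (BURST in bit 31, HM in bits 20..16, TM in bits 4..0, as on the i.MX-style LCDC); the macro and function names are hypothetical, and the field positions and widths must be checked in the MCF5329 Reference Manual:

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED LCD DMA control register layout; verify against the manual. */
#define LCD_DCR_BURST     (1u << 31)                       /* 1 = fixed-length burst */
#define LCD_DCR_HM(x)     (((uint32_t)(x) & 0x1Fu) << 16)  /* DMA high mark */
#define LCD_DCR_TM(x)     ((uint32_t)(x) & 0x1Fu)          /* DMA trigger mark */

/* Compose the DMA control value with fixed-length bursts. */
static uint32_t lcd_dcr_value(unsigned hm, unsigned tm)
{
    assert(hm <= 0x1F && tm <= 0x1F);
    return LCD_DCR_BURST | LCD_DCR_HM(hm) | LCD_DCR_TM(tm);
}
```

With the suggested HM = 20 and TM = 12 this gives 0x8014000C under the assumed layout.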

 

Have you reprogrammed the Crossbar? You have to program the LCDC to a higher priority than the CPU on the SDRAM port. The default is for the CPU to have higher priority than the LCDC, and that will guarantee underruns. The CPU should have the LOWEST priority here (what were Freescale thinking?) :)

 

Make sure you've programmed the "V" bit in the RAMBAR register to give the CPU single-cycle access to the SRAM.

 

You should also have the CPU's stack in the SRAM if your OS allows it - this makes things faster too.

 

Tom

 

dionexwilson
Contributor I

Hello Tom,

Thank you so much for your help, I really appreciate it.

 

Our system is basically set up as you described. After booting, the program runs from RAM, which is cacheable.

You are absolutely correct about the underrun condition: setting the “DNFB” bit of CACR causes the UDR bit of LCD_ISR to be set, and setting “DNFB” to 0 eliminates the underrun.

Per your advice, we added the following:

(1) move the CPU stack to the SRAM

(2) set the “V” bit in RAMBAR

(3) set the LCDC to the highest priority in the crossbar

(4) configure the LCDC Graphic Window DMA Control Register to burst mode with GWHM set to 20 and GWTM set to 12.

But the underrun condition remains and the screen is still jittering. Can you tell me the SDRAM and cache configuration on your system? BTW, we use the CFINIT program to implement our settings.

TomE
Specialist II

> Our system is basically set up as you described. After program is

> booted the program is run from the RAM which is cacheble.

> You are absolutely correct about the under-run condition, by

> setting the “DNFB” bit of CACR causes the UDR bit of LCD_ISR to be set

 

The only effect of the DNFB bit is to burst-read instructions from NON_CACHEABLE storage.


Therefore, by your test results, you are NOT running from cacheable RAM. It is not running the way you think it is.

 

> BTW we use the CFINIT program to implement our settings.

 

That's a starting point, but you have to check that code against the SDRAM specifications, your board design and so on.

 

Make sure you have the latest one from http://www.microapl.co.uk/ (version 2.11.2, 2009) and not the COLDFIRE_INIT one from Freescale's site, which hasn't been updated in 11 years.

 

https://community.freescale.com/message/68422#68422

 

Check the "Snippets, Boot Code, Headers, Monitors, etc. (4)" on the "Software and Tools" tab on the MCF5329 page too.

 

Read AN3606. It claims 10 80MHz clocks per burst of four 32-bit words, but 11 is more likely.
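Those clock counts translate directly into peak line-fill bandwidth: 16 bytes per burst at 80 MHz. A sketch of the arithmetic (the helper name is hypothetical; MB here means 10^6 bytes, and this is raw burst throughput, not sustained copy speed):

```c
#include <assert.h>

/* Peak bandwidth in MB/s (10^6 bytes) for back-to-back bursts. */
static double burst_mb_per_s(double bus_hz, unsigned bytes_per_burst,
                             unsigned clocks_per_burst)
{
    assert(clocks_per_burst > 0);
    return bus_hz * bytes_per_burst / clocks_per_burst / 1e6;
}
```

At 10 clocks per 16-byte burst that is 128 MB/s; at 11 clocks it drops to about 116 MB/s.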

 

> can you tell me the configuration setting of SDRAM and cache on your system?

 

SDRAM has to match your chips, wiring and hardware. Your cache setup is currently wrong. Here's mine. NOTE that the ACR0 maps 16M of SDRAM at 0x40000000. If you have more SDRAM than this you'll have to change the mask bits in ACR0.

 

cacheSetup:
  /* ========================= Configure Cache ======================= */
  /* CACR Register:                                                    */
  /* | 31    29  28     27        24          10 9 8        5          */
  /* | EC 0 ESB DPI | HLCK 0 0 CINVA |||| 0 DNFB DCM | 0 0 DW 0 | 0000 */
 
  move.l #0x01000000,%d0
  movec %d0,%CACR /* invalidate cache */

  nop

  /* ACR0/1 Registers:                                                       */
  /* | 31        24   23        16  15 14   13               6         2     */
  /* | ADDRESS BASE | ADDRESS MASK | E S-FIELD 0 | 0000 | 0 CM 0 0 | 0 W 0 0 */

  /*
   * Address Base = 0x40
   * Address Mask = 0x00 (16 Meg)
   * E            = 1 Access control register enabled
   * S-FIELD      = 2 Ignore Supervisor mode when matching
   * CM           = 0 Cacheable, Writethrough
   * W            = 0 Read and Write Permitted
   */
  move.l #0x4000C000,%d0
  movec %d0,%ACR0
  nop

  move.l #0x00000000,%d0 /* Second control register disabled */
  movec %d0,%ACR1
  nop

  /*
   * EC   = 1 Enable Cache
   * ESB  = 1 Enable Store Buffer (essential for speed)
   * DPI  = 0 Cache line invalidated if pushed with CPUSHL
   * HLCK = 0 Full cache mode (no locks)
   * DNFB = 0 Cache buffer not used for non-cacheable accesses
   * DCM  = 2 Default is Cache Inhibited, Precise Mode
   * DW   = 0 Default Write Privilege - Read and Write Permitted
   */
  move.l #0xA0000200,%d0
  movec %d0,%CACR /* enable cache */
  nop
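The hex constants above can be cross-checked by building them from named fields. A minimal C sketch; the macro names are hypothetical, and the bit positions are taken from the comment blocks in the assembly (confirm them against the Reference Manual before reuse):

```c
#include <assert.h>
#include <stdint.h>

/* ACR fields, per the comment block above. */
#define ACR_BASE(a)   ((uint32_t)(a) << 24)   /* address base, bits 31..24 */
#define ACR_MASK(m)   ((uint32_t)(m) << 16)   /* address mask, bits 23..16 */
#define ACR_E         (1u << 15)              /* enable */
#define ACR_S(s)      ((uint32_t)(s) << 13)   /* 2 = ignore user/supervisor */
#define ACR_CM(c)     ((uint32_t)(c) << 5)    /* cache mode */
#define ACR_W         (1u << 2)               /* write protect */

/* CACR fields used above. */
#define CACR_EC       (1u << 31)              /* enable cache */
#define CACR_ESB      (1u << 29)              /* enable store buffer */
#define CACR_CINVA    (1u << 24)              /* invalidate all */
#define CACR_DNFB     (1u << 10)              /* default non-cacheable fill buffer */
#define CACR_DCM(c)   ((uint32_t)(c) << 8)    /* default cache mode */

enum { CM_WRITETHROUGH = 0, CM_COPYBACK = 1,
       CM_INHIBIT_PRECISE = 2, CM_INHIBIT_IMPRECISE = 3 };

/* 16MB of SDRAM at 0x40000000, cacheable write-through, any mode. */
static const uint32_t acr0 = ACR_BASE(0x40) | ACR_MASK(0x00) | ACR_E
                           | ACR_S(2) | ACR_CM(CM_WRITETHROUGH);
/* Cache and store buffer on; everything else defaults to inhibited. */
static const uint32_t cacr = CACR_EC | CACR_ESB | CACR_DCM(CM_INHIBIT_PRECISE);
```

Both expressions reproduce the literals in the assembly: 0x4000C000 and 0xA0000200.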

 Tom

 

dionexwilson
Contributor I

Hello Tom,

 

Thank you so much for your sound advice. By following your cache setup I was able to resolve the flickering issue. The problem was the “S” bits of the access control register: we had them set to 00, and after changing them to 10 the LCD display comes up much faster, which indicates that the cache is working. I really can't thank you enough for your kindness in helping us. My colleague and I were wondering if you are local to us (in the Bay Area); we would be more than happy to take you out to lunch to show our gratitude and to have the honor of meeting you in person.
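Why the S field mattered: an ACR applies to an access only if the address matches and the S field accepts the current privilege mode. Anything it does not match gets the CACR default mode, which is cache-inhibited in Tom's setup. With S = 00 the ACR matches user-mode accesses only, so code running in supervisor mode (as bare-metal ColdFire code usually does) was never cached. A simplified model, assuming the field layout from Tom's comment block; the function is hypothetical and ignores other qualifiers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does this ACR value apply to an access at addr in the given mode? */
static bool acr_matches(uint32_t acr, uint32_t addr, bool supervisor)
{
    uint32_t base = (acr >> 24) & 0xFFu;
    uint32_t mask = (acr >> 16) & 0xFFu;
    uint32_t s    = (acr >> 13) & 0x3u;

    if (!(acr & (1u << 15)))                         /* E = 0: disabled */
        return false;
    if ((((addr >> 24) ^ base) & ~mask & 0xFFu) != 0)
        return false;                                /* outside the region */
    switch (s) {
    case 0:  return !supervisor;                     /* 00: user mode only */
    case 1:  return supervisor;                      /* 01: supervisor only */
    default: return true;                            /* 1x: ignore mode */
    }
}
```

With S = 10 (0x4000C000) a supervisor access to 0x40123456 matches; with S = 00 (0x40008000) it does not.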

 

Wilson

TomE
Specialist II

> the LCD display comes up at much faster speed,

 

Are you trying to do any sort of animation or fast updating of the LCD display?

 

Are you drawing into the buffer while it is displaying, or are you using Double or Triple buffering to eliminate the flashes and tears that usually causes?

 

If you're using multiple buffers, how are you handling the "page flip" between them? There are all sorts of problems with the EOF and BOF indications and interrupts this chip gives that make that very difficult.

 

Tom

 

dionexwilson
Contributor I

Sorry for my late reply; May 30th was Memorial Day here in the US, so we had a long weekend.

> Are you drawing into the buffer while it is displaying, or are you using Double or Triple buffering to eliminate the flashes and tears that usually causes?

To prevent the display from tearing, the driver double-buffers the frame buffer, and there is no animation on our display. But our display does consist of many screens.

> if you're using multiple buffers, how are you handling the "page flip" between them. There's all sorts of problems with the EOF and BOF indications and interrupts this chip gives that makes that very difficult.

We did not write the display driver ourselves; we use PEG from Swell Software. The driver switches buffers whenever EOF is reached. BTW, does your system use USB? After we sped up the display, the USB that used to work stopped working, and if I switch back to the previous cache settings, the USB works but the display is much slower!

 

TomE
Specialist II

> The driver switches the buffer whenever EOF is reached.

 

But the switch doesn't happen until after the next interrupt. Does the display driver have code in there to count two interrupts before allowing writing to the "other buffer"?
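One way to enforce "wait for two interrupts" is a small counter shared between the EOF interrupt handler and the drawing code. A hypothetical sketch; nothing here comes from PEG or the Freescale headers, and the exact number of frames the LCDC takes to latch a new start address should be confirmed against the LCDC chapter:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical double-buffer state shared with the EOF interrupt. */
struct flip_state {
    volatile int pending_eofs;   /* EOF interrupts still to wait for; 0 = idle */
};

/* Call right after writing the new front-buffer address to the LCDC. */
static void flip_request(struct flip_state *fs)
{
    fs->pending_eofs = 2;
}

/* Call from the EOF interrupt handler. */
static void flip_on_eof(struct flip_state *fs)
{
    if (fs->pending_eofs > 0)
        fs->pending_eofs--;
}

/* The old front buffer may be drawn into only once both EOFs have passed. */
static bool back_buffer_writable(const struct flip_state *fs)
{
    return fs->pending_eofs == 0;
}
```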

 

> BTW, does your system uses USB?

 

Yes, and it works.

 

> Because after we speed up the display the USB that use to work stopped working,

> and if I switch back to the previous cache setting, the USB works but display is much slower!

 

If you look back in this forum you'll find I said over two weeks ago that:

 

> 2011-05-13 04:43 AM

> Setting the cache to WRITETHROUGH (instead of COPYBACK) avoids most of the problems.

> You should be able to write to the LCD panel and have it work. In this mode you don't have

> to flush writes to the hardware, but you do have to invalidate buffers you're reading that

> the hardware has just written.

>

> You'll either have to invalidate the cache before any reads of USB rings or buffers,

> or (a lot easier) put all the USB buffers in SRAM like we do. If you have to have

> data in SDRAM then copy it to and from a set of buffers in SRAM.

>

> If your USB drivers are properly written they should have "hooks" for this, or

> be written assuming SRAM buffers.

We bought our USB stack from SMX. It has a memory allocation function that gets all the memory required for the USB structures. We "hook" that to allocate all this memory from a block of the static RAM.
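A hook like that can be as simple as a bump allocator over an SRAM block. A hypothetical sketch; the pool size, names and the linker section in the comment are assumptions, and since nothing is ever freed it suits buffers allocated once at driver start-up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On the target this array would be placed in on-chip SRAM by the linker
 * script, e.g. with __attribute__((section(".sram_bss"))) (section name
 * assumed). */
static uint8_t sram_pool[8192];
static size_t  sram_used;

/* Allocate size bytes aligned to align (a power of two); NULL if full. */
static void *sram_alloc(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t base  = (uintptr_t)sram_pool;
    uintptr_t start = (base + sram_used + align - 1) & ~(uintptr_t)(align - 1);
    size_t    off   = (size_t)(start - base);

    if (off > sizeof sram_pool || size > sizeof sram_pool - off)
        return NULL;                    /* pool exhausted */
    sram_used = off + size;
    return (void *)start;
}
```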

 

So who wrote your USB drivers?

 

Tom

 

dionexwilson
Contributor I

We wrote our own USB stack; we started with Freescale’s example driver and expanded from there. It was quite a painful experience, to say the least!!!

I will try to relocate all the USB buffers to SRAM area and hope it will resolve our problem. Thanks.

Wilson

TomE
Specialist II

Don't forget everything else you may have using DMA. Are you using the Ethernet port? ADC using DMA?

 

Tom

 

mqxexamples
Contributor I

We had a working MCF5329 system. To increase bandwidth, we changed the hardware and replaced the 4M SDRAM with SRAM. For some reason we are now experiencing hang-ups: the identical program works with SDRAM but stops working on the system where the SDRAM has been replaced with SRAM.

TomE
Specialist II

> To increase the bandwidth, we changed the H/W and replaced the 4 M SDRAM with SRAM.

 

Did you actually measure much of a speed improvement? You would have to be using burst-mode SRAM or extremely fast (usually expensive and small) or it could easily be slower than SDRAM.

 

Did you have the cache enabled originally, or did you find (as the original poster to this forum found) that the 240MHz CPU runs at about 10MHz with the cache off?

 

What bandwidth were you measuring with the SDRAM? You should be able to get (as I've documented on this forum) 85MB/s read speed and 207MB/s write speed to SDRAM. You should be able to copy SDRAM to SDRAM at 56MB/s if you know what you're doing, at 30MB/s if you don't (and use the library memcpy() function), or at WAY less if you're using a really dumb copy (like a byte at a time).
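The gap between a byte-at-a-time copy and a good one comes mostly from moving data in 32-bit units and whole 16-byte cache lines. The sketch below illustrates that idea only; it is not Tom's copy routine (his "memcpy" posts are the reference), it assumes 4-byte-aligned buffers, and its actual speed depends on compiler output and cache mode:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes between 4-byte-aligned buffers: 16 bytes (one cache line)
 * per iteration using 32-bit loads and stores, then a byte tail. */
static void copy_words(void *dst, const void *src, size_t n)
{
    uint32_t *d = dst;
    const uint32_t *s = src;

    assert(((uintptr_t)dst & 3u) == 0 && ((uintptr_t)src & 3u) == 0);
    for (; n >= 16; n -= 16) {
        uint32_t w0 = s[0], w1 = s[1], w2 = s[2], w3 = s[3];  /* read the line */
        d[0] = w0; d[1] = w1; d[2] = w2; d[3] = w3;           /* then write it */
        s += 4;
        d += 4;
    }
    uint8_t *db = (uint8_t *)d;
    const uint8_t *sb = (const uint8_t *)s;
    while (n--)
        *db++ = *sb++;
}
```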

 

And what speed are you now getting with the SRAM?

 

> we are experiencing hang-up issues. 

 

Run a memory test. Relax the memory timing. Add wait-states and check the setup-and-hold timing. See if the problem goes away with relaxed timing, then change the memory controller parameters one at a time to see what it is sensitive to. Verify your timing against the data sheets with a multi-channel CRO.
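A memory test for this purpose usually starts with a data-bus check (walking ones at one address) and an address-bus check (distinct values at power-of-two offsets, so a stuck or shorted address line makes two locations alias). A minimal sketch of both, with hypothetical function names; on the target, base would point at the external RAM:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walking-ones data-bus test at one location.
 * Returns 0 on success, else the first pattern that failed to read back. */
static uint32_t test_data_bus(volatile uint32_t *addr)
{
    for (uint32_t pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0;
}

/* Address-bus test over n_words words (a power of two): write a distinct
 * value at offset 0 and at every power-of-two offset, then verify them all.
 * Returns 1 on success, 0 if any location was overwritten by an alias. */
static int test_address_bus(volatile uint32_t *base, size_t n_words)
{
    size_t off;

    assert(n_words != 0 && (n_words & (n_words - 1)) == 0);
    base[0] = 0xFFFFFFFFu;
    for (off = 1; off < n_words; off <<= 1)
        base[off] = (uint32_t)off;
    if (base[0] != 0xFFFFFFFFu)
        return 0;
    for (off = 1; off < n_words; off <<= 1)
        if (base[off] != (uint32_t)off)
            return 0;
    return 1;
}
```

Run it with the cache off and again with the cache on, since only the cached run exercises burst timing.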

 

People often have trouble if they run with the cache off and then use instructions or DMA (or turn the cache on), all of which perform burst accesses. Only then do they find out the memory controller isn't properly programmed for the burst timing.

 

Here's another big trap with this chip. Make sure you play with the "MCF_GPIO_MSCR_FLEXBUS" drive strength settings and set them "high" if you're using series termination (everywhere) and "low" otherwise. This gave us problems with the default SDRAM bus strength which was too high for our design. The chip defaults to "high" unless you change it, whereas previous chips (MCF5235 in our case) default to "Low". Refer to this section in the Reference Manual:

 

13.3.6 FlexBus Mode Select Control Register (MSCR_FLEXBUS)

 

Tom

 

p.s. Max, if you're reading this, check your Private Messages.

 

TomE
Specialist II

Could you let me know what speed you're getting? I'd like to know, and if you're not getting what you expect then I can help.

 

Send me a private message if you prefer.

 

Tom

 

TomE
Specialist II

> if you are local to us (in the bay area)

 

Port Phillip Bay area.

 

Melbourne. No, not Florida - Australia :)

 

Thanks for the invitation

 

Make sure you're running write-through rather than write-back. I found that makes LCD functions that write to the LCD faster, as you're writing data that you don't need to read back later. Flushing a write-back cache can take a long time too. Likewise, enable ESB (the store buffer).

 

Tom

 

TomE
Specialist II

>  If we run only NOPs, the 5328 system is running faster than 5272, but not in the right ratio (48MHz vs 240MHz),

 

That's the ratio of the memory clocks. The MCF5328 has an 80MHz memory bus and I'd guess the 5272 is running at 48MHz.

 

The CPU can issue instructions at 240MHz. But only from SRAM or the cache.

 

When the cache is enabled, it reads a whole line (16 bytes) of data from the SDRAM to the cache line.

 

With 32-bit SDRAM that takes 10 memory clocks. In your case (16 bits wide) that would take 14 memory clocks. That is 42 CPU clocks to get 8 16-bit instructions or 4 32-bit instructions. That is a lot slower than the CPU can execute them, but the whole idea is to get your loops read into the cache, so only the first time is slow - the rest of the times are fast.

 

Without the cache, the CPU has to read EVERY instruction from the SDRAM. That will take at least 7 or 8 memory clocks.


Which is 21 or 24 CPU clocks. And if that instruction needs to read some data from memory that'll take another 21-24 clocks.

 

So by not running from cache your 240MHz CPU is running at 10MHz. A 20 year old 68000-based system could outrun it.
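The arithmetic above can be written down as two small formulas: cache-line fill cost in CPU clocks, and the instruction rate when every fetch goes to SDRAM. A sketch with hypothetical helper names, using the figures from the post (80 MHz bus, CPU:bus ratio of 3, 7 or 8 memory clocks per uncached fetch):

```c
#include <assert.h>

/* CPU clocks to fill one 16-byte cache line, given memory clocks per fill
 * and the CPU:bus clock ratio (3 for 240MHz / 80MHz). */
static unsigned line_fill_cpu_clocks(unsigned mem_clocks, unsigned ratio)
{
    return mem_clocks * ratio;
}

/* Millions of instructions per second when every instruction is fetched
 * from SDRAM, ignoring any data accesses the instructions make. */
static double uncached_mips(double bus_hz, double mem_clocks_per_fetch)
{
    assert(mem_clocks_per_fetch > 0.0);
    return bus_hz / mem_clocks_per_fetch / 1e6;
}
```

A 16-bit-wide fill (14 memory clocks) costs 42 CPU clocks and a 32-bit-wide one (10 memory clocks) costs 30, while uncached fetching at 8 memory clocks gives about 10 MIPS, the "240MHz CPU running at 10MHz" figure.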

 

If you're happy with 20-30 year old performance, then leave the cache off. If you need it faster you MUST enable the cache.

 

Checking those figures, you're getting 100 NOPs in 8us. Do you know that a NOP takes 3 clocks and flushes the CPU execution pipes? The "proper NOP" for the MCF5328 is the "TPF" instruction. Check the manuals for this.

 

Anyway, 100 in 8us is 80ns per instruction, or 12.5 million per second. That's one instruction per 6.4 memory clocks (19.2 CPU clocks). Since a NOP is a 16-bit instruction, that means the instruction prefetcher is reading 32 bits every 12.8 memory clocks, which is about what I'd expect.
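The same derivation as a small function, so other rows of the benchmark table can be checked the same way. A sketch with hypothetical names; clock rates are passed in MHz:

```c
#include <assert.h>

struct insn_timing {
    double ns_per_insn;   /* nanoseconds per instruction */
    double mips;          /* millions of instructions per second */
    double mem_clocks;    /* memory-bus clocks per instruction */
    double cpu_clocks;    /* CPU clocks per instruction */
};

/* Derive per-instruction timing from "n_insns took total_us microseconds". */
static struct insn_timing measure_insn_timing(unsigned n_insns, double total_us,
                                              double bus_mhz, double cpu_mhz)
{
    struct insn_timing t;

    assert(n_insns > 0 && total_us > 0.0);
    t.ns_per_insn = total_us * 1000.0 / n_insns;
    t.mips        = n_insns / total_us;
    t.mem_clocks  = t.ns_per_insn * bus_mhz / 1000.0;
    t.cpu_clocks  = t.ns_per_insn * cpu_mhz / 1000.0;
    return t;
}
```

For 100 NOPs in 8 us at 80/240 MHz this reproduces 80 ns, 12.5 MIPS, 6.4 memory clocks and 19.2 CPU clocks.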

 

If you stepped up to the MCF5328 to get extra speed you could have got a speedup of at least 5 on your old CPU by enabling the cache on it. You still can. In fact as the MCF5272 cache only caches instructions (and not data) it should be easier to turn it on without causing DMA problems. Are you sure you don't have it turned on already?

 

> once I turn on the cache, USB interface and touch panel stop working.

 

Everybody else has been dealing with this since caches were invented, which would be about 1960. :)

 

You should set up the CACR with cache set to "cache inhibited, precise" and then set up one ACR to map the SDRAM set to "Cacheable, write-through". That means that only the SDRAM is cached and not the I/O registers or your FLASH.

 

Set the "Enable Store Buffer" bit in the CACR. It speeds up stores (especially to the LCD memory).

 

Setting the cache to WRITETHROUGH (instead of COPYBACK) avoids most of the problems. You should be able to write to the LCD panel and have it work. In this mode you don't have to flush writes to the hardware, but you do have to invalidate buffers you're reading that the hardware has just written.

 

You'll either have to invalidate the cache before any reads of USB rings or buffers, or (a lot easier) put all the USB buffers in SRAM like we do. If you have to have data in SDRAM then copy it to and from a set of buffers in SRAM.
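The "copy to and from a set of buffers in SRAM" approach looks roughly like this. A hypothetical sketch: the buffer size and names are assumptions, the buffer would be placed in SRAM by the linker script, and it relies on the SRAM not being covered by a cacheable ACR (true of the ACR0 setup shown earlier in this thread, which maps only SDRAM):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounce buffer in on-chip SRAM. The DMA engine only ever
 * touches this buffer, so cached SDRAM contents never need invalidating. */
#define BOUNCE_SIZE 512
static unsigned char usb_bounce[BOUNCE_SIZE];

/* Before a transmit DMA: copy the payload from SDRAM into SRAM and return
 * the address to hand to the DMA engine. */
static const void *bounce_for_tx(const void *sdram_src, size_t len)
{
    assert(len <= BOUNCE_SIZE);
    memcpy(usb_bounce, sdram_src, len);
    return usb_bounce;
}

/* After a receive DMA into usb_bounce has completed: copy out to SDRAM. */
static void bounce_after_rx(void *sdram_dst, size_t len)
{
    assert(len <= BOUNCE_SIZE);
    memcpy(sdram_dst, usb_bounce, len);
}
```

The cost is one extra CPU copy per transfer, traded for never having to get cache invalidation right.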

 

If your USB drivers are properly written they should have "hooks" for this, or be written assuming SRAM buffers.

 

Make sure you put the CPU's STACK in SRAM. That way all the function local variables are in zero-wait-state memory.

 

Tom

 

maxhexis
Contributor I

Hi,

 

I have another point to underline. If you do not use the internal cache, LCDC accesses to the framebuffer delay the CPU too. It depends on LCD size and frame rate, but it can have a strong impact on performance. If this is the case, try using 16-bit RGB, or 8-bit with a palette, to reduce bandwidth. Please remember that 18 bits per pixel means 32 bits per pixel, due to the way pixels are packed in framebuffer memory, so a slight reduction in color depth (just 2 bits) halves the memory accesses.
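The refresh load is width x height x stored bytes per pixel x frame rate. A sketch with hypothetical helper names; it follows the note above that 18 bpp pixels occupy 32 bits, and the QVGA/60 Hz figures in the usage note are illustrative only:

```c
#include <assert.h>

/* Bytes each pixel occupies in framebuffer memory. */
static unsigned storage_bytes_per_pixel(unsigned bpp)
{
    assert(bpp >= 8);
    if (bpp <= 8)
        return 1;          /* palettized */
    if (bpp <= 16)
        return 2;          /* e.g. RGB565 */
    return 4;              /* e.g. 18 bpp, padded to 32 bits */
}

/* LCDC read bandwidth in MB/s (10^6 bytes) needed just to refresh. */
static double fb_bandwidth_mb_s(unsigned width, unsigned height,
                                unsigned bpp, double fps)
{
    return (double)width * height * storage_bytes_per_pixel(bpp) * fps / 1e6;
}
```

For a 320x240 panel at 60 Hz, 16 bpp needs about 9.2 MB/s and 18 bpp about 18.4 MB/s, a large slice of the SDRAM read bandwidth quoted elsewhere in this thread.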

 

Hoping to be useful...

dn_engineer
Contributor I

Hi All,

 

Thanks for your help and all suggestions. We are changing to use 32-bit SDR SDRAM instead of 16-bit SDR. Hopefully this will improve the performance a lot, I'll come back if I still have problem.

 

Thanks again.  

TomE
Specialist II

> We are changing to use 32-bit SDR SDRAM instead of 16-bit SDR.

 

Bad idea. The MCF5329 doesn't reset properly with 32-bit SDRAM on a non-split bus. You'll have to do a lot of work with the clock and power supply to guarantee a reliable startup.

 

Make sure you read the details here:

 

https://community.freescale.com/message/69745#69745

 

Also read the details in the latest Errata document (MCF5329DE.pdf) and also the associated Engineering Bulletin (EB740.pdf).

 

The easiest way to avoid the above problem is to use a split bus, possibly with DDR (except the board design is tougher for DDR).

 

Doubling the SDRAM width will double the speed to approximately 20MHz. That's still only going to run at 1/12 of the speed the 240MHz CPU is designed to run at.

 

Have you tried turning the cache on? If not, why not?

 

If there's a real reason you don't want to run with the cache, then please provide us with details on why not. We may learn something more.

 

Have you tried WRITE-THROUGH mode? That doesn't cause the problems you may have seen when turning the cache on previously.

 

That's the real problem you need to solve next - running with the cache.

 

Tom

 
