USB speed on MCF51JM128


rapa
Contributor I

Hi guys.

I am using an MCF51JM128 microcontroller in my system. According to the datasheet, if I configure it to work as a Full-speed device (12 Mbit/s, i.e. 1.5 MByte/s), that is the speed I would expect from the USB bus. In practice I get only 550 KByte/s, which is not enough; I need at least 1 MByte/s. I am using ALL the free endpoints for output. I have only one endpoint for input, so changing the input endpoint to be altered on-the-go (OTG) could improve the throughput, but that is not a good enough solution, because I still cannot reach 1 MByte/s.

 

Has anyone faced this problem before? If so, how did you solve it?

 

Thank you,
Slava.

 

P.S. I am using the USB driver that I found in the Freescale CAN_USB bridge application. Maybe the problem is in this driver.

2 Replies

TomE
Specialist II

http://www.usb.org/developers/usbfaq#band1

 

From the above link, read Table 5.2 in the "Universal Serial Bus Specification Revision 2.0" document. As far as I can tell the maximum data packet size is 64 bytes with 45 bytes of protocol overhead, resulting in a maximum bandwidth of 832,000 bytes/second. That matches what I measured with HDTune.
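The 832,000 bytes/second figure can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes a 12 Mbit/s bus split into 1 ms frames (1,500 byte times per frame) and that only whole transfers fit into a frame. The function name is made up for illustration; the 64-byte payload and 45-byte overhead are the figures quoted above.

```c
#include <stdint.h>

/* Full-speed USB: 12 Mbit/s = 1,500,000 bytes/s, split into 1 ms frames. */
#define FS_BYTES_PER_FRAME 1500u
#define FRAMES_PER_SECOND  1000u

/* Hypothetical helper: upper bound on payload bytes per second when each
 * transfer carries `payload` data bytes plus `overhead` protocol bytes and
 * only whole transfers fit into a frame. */
uint32_t fs_max_payload_rate(uint32_t payload, uint32_t overhead)
{
    uint32_t transfers_per_frame = FS_BYTES_PER_FRAME / (payload + overhead);
    return transfers_per_frame * payload * FRAMES_PER_SECOND;
}
```

With a 64-byte payload and 45 bytes of overhead, 13 transfers fit into a frame, so `fs_max_payload_rate(64, 45)` gives 13 x 64 x 1000 = 832,000.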

 

http://en.wikipedia.org/wiki/Usb#Transfer_speeds_in_practice

 

"For USB 1.1, an average transfer speed of 880 KiB/s has been observed."

 

> P.S. I am using USB driver that i found in Freescale application CAN_USB bridge.

 

Find and read AN3631.pdf. It might refer to your stack.

 

 


TomE
Specialist II

I've just plugged a USB Stick into the USB1 hub that is part of the Keyboard on my desktop computer. I am running "hdtune" to measure the maximum read rate from the stick. The maximum I can get is 800 kbytes/second.

 

If I can't get to 1 MB/s on an 8-core multi-gigahertz desktop computer, then maybe you won't be able to get that speed on an embedded device either.

 

===

 

If you want to find out what it is doing, start measuring your execution timing.

 

First though, is your CPU running at the expected speeds? Is it running at the proper clock rate and is the Cache set up properly?

 

The easiest way to start measuring timing is to start a DMA timer (or equivalent) in a free-running mode. I'm using an MCF5329, and we have a 32-bit DMA timer free-running at 1MHz. At the start of the function whose execution time you want to measure, read the timer count register, and at the end of the function, read it again and subtract the two values. Print the result on a serial console or stash it away in a variable you can later inspect with the debugger. You can work through the code trying to find where it is stuck and maybe what it is waiting on. That will also tell you if the CPU is executing code at the expected rate.
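The read-and-subtract pattern could look something like the sketch below. `timer_read_fn` stands in for reading the real timer count register; that name and the function-pointer plumbing are assumptions so the sketch can also run on a host. The useful detail is that unsigned subtraction still gives the right answer when the 32-bit counter wraps between the two reads.

```c
#include <stdint.h>

/* Assumed interface: returns the current value of a free-running 32-bit
 * counter ticking at 1 MHz. On real hardware this would read the timer's
 * count register; here it is a parameter so the sketch runs anywhere. */
typedef uint32_t (*timer_read_fn)(void);

/* Elapsed ticks between two reads. Unsigned arithmetic wraps modulo 2^32,
 * so this stays correct across a single counter rollover. */
uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Run `fn` once and return how many timer ticks it took. */
uint32_t time_call(timer_read_fn read_timer, void (*fn)(void))
{
    uint32_t start = read_timer();
    fn();
    return elapsed_ticks(start, read_timer());
}
```

At 1 MHz the counter rolls over roughly every 71 minutes, so this is only valid for intervals shorter than that.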

 

You can also measure "how many times per second" you're running through the calls to the driver, or how often your idle loop is running. Are you calling the driver often enough?
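Counting calls per second needs little more than a counter and the same microsecond timer. In this sketch the struct and function names are invented for illustration, and `now_us` is assumed to come from a free-running 1 MHz counter.

```c
#include <stdint.h>

typedef struct {
    uint32_t window_start_us; /* timer value when the current window began */
    uint32_t calls;           /* calls counted in the current window       */
    uint32_t last_rate;       /* calls seen in the last completed window   */
} call_rate_t;

/* Call this from the driver entry point (or the idle loop). Once at least
 * one second has passed it latches the count and starts a new window. */
void call_rate_tick(call_rate_t *r, uint32_t now_us)
{
    r->calls++;
    if ((uint32_t)(now_us - r->window_start_us) >= 1000000u) {
        r->last_rate = r->calls;
        r->calls = 0;
        r->window_start_us = now_us;
    }
}
```

Reading `last_rate` from the debugger (or printing it once per second) shows whether the driver is being serviced as often as you think it is.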

 

Another thing that can help is to read and/or write bigger blocks of data through each call to the driver, if it supports this.

 

For the next level of instrumentation, and yes, you do really have to do this sometimes :-)

 

I've just added two profiling systems to the code I'm working on. Our code runs different threads at different interrupt levels, mainly IPL0, 1 and 3. I've got code that is called when the mainline at IPL0 comes out of idle and also by all interrupt service routines at their start and end. This captures the microsecond timer and logs the execution time of all of the different priority levels (correcting for higher levels interrupting lower ones). Once per second it prints out the total execution time for all threads for that second.
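A minimal sketch of that kind of per-level accounting follows. It is not the actual code; the names are hypothetical and `now_us` is assumed to be the free-running microsecond timer. Time is always charged to whichever level was running since the last switch, which is what corrects for higher levels interrupting lower ones.

```c
#include <stdint.h>

#define NUM_LEVELS 8u  /* interrupt priority levels 0..7 */

typedef struct {
    uint32_t busy_us[NUM_LEVELS]; /* accumulated run time per level  */
    uint32_t last_us;             /* timer value at the last switch  */
    uint8_t  current;             /* level that is running right now */
} ipl_profile_t;

/* Charge the time since the last switch to the level that was running,
 * then record `level` (assumed < NUM_LEVELS) as the one running from now
 * on. Returns the previous level so an ISR can restore it on exit. */
uint8_t ipl_switch(ipl_profile_t *p, uint8_t level, uint32_t now_us)
{
    uint8_t previous = p->current;
    p->busy_us[previous] += now_us - p->last_us;
    p->last_us = now_us;
    p->current = level;
    return previous;
}
```

An interrupt service routine would call `ipl_switch` with its own level on entry, keep the returned value, and call `ipl_switch` again with that value on exit. Printing and clearing `busy_us` once per second gives the per-level totals.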

 

The other profiling system allocates a big block of RAM the same size as the code, and sets up a timer to interrupt at IPL7 every 13us. It looks back on the stack to get the interrupted program counter, and uses it as an index into the block of RAM which is used as a histogram of where the code was at. After a 30 second run it dumps the histogram to the console port. That tells me exactly where the code has been. Combine that with the MAP file (or <nm -n code.elf | grep " [tT] ">) to get symbols, merge with the histograms, sort and that gives total execution time by function.

 

If you're using a CF chip with code in FLASH and a small amount of RAM then the latter probably isn't an option.
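The core of the sampling profiler described above is tiny. The sketch below shows only the histogram update; the names are hypothetical, and the layout (one 32-bit counter per 16-bit instruction word) is an assumption rather than a description of the real code. How the interrupted program counter is dug out of the stack frame is left out, because that depends on the CPU and compiler.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed layout: the profiled code starts at code_base, instructions are
 * 16-bit aligned, and there is one counter per possible instruction
 * address. */
typedef struct {
    uint32_t  code_base; /* address of the first instruction           */
    size_t    slots;     /* number of counters in `counts`             */
    uint32_t *counts;    /* histogram, one counter per 2 bytes of code */
} pc_histogram_t;

/* Called from the sampling timer interrupt with the interrupted PC.
 * Samples outside the profiled range are ignored. */
void pc_histogram_sample(pc_histogram_t *h, uint32_t pc)
{
    if (pc < h->code_base)
        return;
    size_t index = (size_t)(pc - h->code_base) / 2u;
    if (index < h->slots)
        h->counts[index]++;
}
```

Dumping `counts` as address and count pairs and merging them with the symbol table is then a job for a script on the host.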

 

I'm getting information like the following:

 

 

sh prof.sh ../../firmware.elf profile.txt | sort -n -k 2 | tail
advmaths                   79419   3.44%
memcpy_moveml              80368   3.48%
RllDecompress             107538   4.65%
pointCubic                200560   8.69%
CanvasCopy                364142  15.77%
main                      955712  41.41%

"main()" is the idle loop. The code is 59% busy and 41% free. CanvasCopy is the function taking the most time. Looking at the histogram for code within that function:

 

 

 

0x4012D5F2 = 1
0x4012D5F4 = 5
0x4012D696 = 340
0x4012D698 = 82
0x4012D69C = 425
0x4012D6A0 = 123
0x4012D6A4 = 16582
0x4012D6A8 = 17562
0x4012D6AA = 86433
0x4012D6AC = 8556
0x4012D6B0 = 4326
0x4012D6B2 = 3774
0x4012D6B6 = 7171
0x4012D6B8 = 4585

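Note that CanvasCopy registered 364,142 counts. One address above accounts for 86,433 of them, which is 24%, and this is inside a loop that has 40 or more instructions in it. So 24% of the execution time from 2.5% of the loop? The sampled program counter is presumably that of the next instruction to run, so the count at 0x4012D6AA belongs to the instruction just before it. Worth looking at: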

 

 

4012d6a0: 6000 0168   braw 4012d80a
                      src.u32 = *(srcPtr.p32++);
4012d6a4: 206e fff8   moveal %fp@(-8),%a0
4012d6a8: 2610        movel %a0@,%d3  #### THIS ONE ####
4012d6aa: 5888        addql #4,%a0
4012d6ac: 2d48 fff8   movel %a0,%fp@(-8)
                      if (a_eMode == xxxx)
4012d6b0: 4a8c        tstl %a4
4012d6b2: 6700 0128   beqw 4012d7dc <CanvasCopy+0x4f6>

 

 

The above is part of a function copying some picture data around. The instruction taking 24% of the whole execution time is stuck and waiting for the data to be read from main memory into the data cache. So it is being limited by the memory read time. If only the CPU had a "cache touch" instruction (it doesn't).

 

 

 

 

 
