USB DMA Engine Bulk Transfer in Custom Class


2,223 views
lpcware (NXP Employee)
Content originally posted in LPCWare by etunnel on Tue Sep 02 23:07:06 MST 2014
I want to develop a program on LPC43xx to move data between the host PC and the LPC device as fast as possible via USB. I settled on the bulk transfer mode and using the custom class.  I got the usbd_rom_bwtest working and obtained the transfer throughput data.  Now, it is time to write a working program that handles the flow of data correctly.

I am totally confused about how the DMA engine in the LPC43xx works. I have failed to set up the DMA transfer descriptors correctly to transfer data reliably, and it is difficult to trap events in the debugger to figure out the data flow using the test example. I haven't been able to find a document describing how the USB DMA engine works and how it is modeled in the USB ROM data structures and API. Where can I find a document, or an example, that actually shows how data is transferred correctly and continuously? (usbd_rom_bwtest doesn't care about the correctness of the data, especially when transferring data from the device to the USB host.)

Thanks a lot for any pointer.

4 Replies

1,655 views
lpcware (NXP Employee)
Content originally posted in LPCWare by pierre on Tue Sep 23 09:26:56 MST 2014
You're welcome ;)

The most annoying part for me was grasping the USB "mindset". Compared with Ethernet, it is the complete opposite: Ethernet is a link between two equal partners, and either side can talk at any time. USB really is master/slave down to the details. The device never takes the initiative; the only thing it can do is reply when the host initiates a transaction.

If you use a request/reply protocol to communicate with the PC, don't forget that when the device has something to say, it may only be polled at the next µframe, or later if there is other traffic. At 8000 µframes per second, 8000 round trips take at least one second. The data also has to make its way through the OS before it finally reaches your code. This is a bit like ping time on a network.

For example, if your protocol looks like:

PC- write 512 byte sector
Device - OK
PC- write 512 byte sector
Device - OK
etc

It is going to suck because of the round trips: you are going to get less than 4 MB/s if you're lucky, and half that if you're not.

So, in Bulk mode, send large chunks of data (the hardware will split them into the correct size and number of USB Bulk packets according to MaxPacketSize endpoint config register and handle all the flow control). If you have lots of small commands, pipeline them.

1,655 views
lpcware (NXP Employee)
Content originally posted in LPCWare by etunnel on Tue Sep 23 00:10:04 MST 2014
I very much appreciate your information.  This definitely complements the user manual in understanding the flow.

1,655 views
lpcware (NXP Employee)
Content originally posted in LPCWare by pierre on Sat Sep 20 07:54:09 MST 2014
Forget about USB rom drivers or lpcusblib, they have rather nasty bugs/race conditions, plus the code is impossible to read, it has globals accessed from everywhere, etc.

Everything you need is in the Huge User Manual. However, it is not explained in the clearest way possible... I had to read some bits several times.

Basically, you set up your device Queue Head (see the manual), where each endpoint has a pointer to a linked list of Transfer Descriptors (dTDs) which will be executed in order. Each dTD contains pointers to the buffers where the data will be stored or read from.
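As a sketch of what those structures look like, here is the dTD/queue-head layout as I read UM10503 ch. 24.9 (the struct and field names below are my own; check the field layout against the manual before relying on it):

```c
#include <stdint.h>

/* Sketch of the LPC43xx device DMA data structures per UM10503
 * ch. 24.9 (EHCI-style device controller). Names are ours. */

#define DTD_TERMINATE 0x1u  /* bit 0 of next pointer: end of dTD list */

typedef struct dtd {
    uint32_t next;       /* physical address of next dTD, or DTD_TERMINATE */
    uint32_t token;      /* [30:16] total bytes, [15] IOC, [7:0] status    */
    uint32_t buffer[5];  /* up to five 4 kB buffer page pointers           */
    uint32_t pad;        /* pad to the required 32-byte size               */
} __attribute__((aligned(32))) dtd_t;

typedef struct dqh {
    uint32_t capabilities;  /* max packet size, zero-length termination, mult */
    uint32_t current_dtd;   /* maintained by hardware                        */
    uint32_t next_dtd;      /* overlay area: hardware's working copy         */
    uint32_t token;         /*   of the active dTD                           */
    uint32_t buffer[5];
    uint32_t reserved0;
    uint32_t setup[2];      /* 8-byte SETUP packet (control endpoint only)   */
    uint32_t reserved1[4];  /* pad queue head to 64 bytes                    */
} __attribute__((aligned(64))) dqh_t;
```

The alignment attributes matter: the controller requires dTDs on 32-byte boundaries and queue heads on 64-byte boundaries, which the manual calls out.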

Once you prime the endpoint, it executes its dTD list in order and fires a Transfer Complete interrupt after each transfer. You then read the dTDs, retire the used ones, and add more. The last part is subtle, since you have to handle the case where the hardware is writing a finished dTD while the ISR is reading that same dTD, or where the hardware is stopping after its last transfer at the same moment the ISR is queuing a new one. Some registers have to be tickled in the right way, and the manual explains very well how to do that.
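The "retire the used ones" step boils down to inspecting the dTD token word. A hedged sketch, with the status bit positions as I read them from the UM10503 token description (helper names are my own):

```c
#include <stdbool.h>
#include <stdint.h>

/* dTD token status bits (token word bits [7:0]), per UM10503. */
#define DTD_STATUS_ACTIVE       (1u << 7)  /* hardware still owns the dTD */
#define DTD_STATUS_HALTED       (1u << 6)
#define DTD_STATUS_BUFFER_ERR   (1u << 5)
#define DTD_STATUS_TRANSACT_ERR (1u << 3)

/* A dTD may be retired once the hardware has cleared Active. */
static bool dtd_retired(uint32_t token)
{
    return (token & DTD_STATUS_ACTIVE) == 0;
}

static bool dtd_error(uint32_t token)
{
    return (token & (DTD_STATUS_HALTED | DTD_STATUS_BUFFER_ERR |
                     DTD_STATUS_TRANSACT_ERR)) != 0;
}

/* Hardware decrements the Total Bytes field [30:16] as it goes,
 * so what is left there is the count of bytes NOT transferred. */
static uint32_t dtd_bytes_remaining(uint32_t token)
{
    return (token >> 16) & 0x7FFFu;
}
```

In the ISR you would walk the list from the head, retiring every descriptor for which `dtd_retired()` is true, and stop at the first one still marked Active; that still leaves the hardware-vs-ISR races described above to be handled with the prime/flush registers.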

Regarding performance, you can get maximum USB bandwidth with very little CPU use on the LPC, since the USB controller can transfer up to 16 kB per dTD and handles all the low-level work. So you get an IRQ every 16 kB, or even less often if you queue more dTDs. That's a few percent CPU use at full bandwidth, unless the CPU copies all the data "manually", of course.

You got to remember that USB is a protocol for dumb devices.

Sending is simple: you queue a dTD and the hardware will send the data when the host requests it.

Receiving is a bit more subtle. You only know the host wants to send something when it actually tries to send it. If you have an endpoint primed at that moment, with its dTD and buffer ready, the data is transferred. Otherwise the endpoint will NAK, you'll get an interrupt, and the host will retry later; hopefully by then the endpoint will be primed.

You cannot ask the host to send, or tell it you are ready to receive, because USB is a protocol for dumb devices. The only way the device can talk is to reply to something. Even so-called "interrupt" transfers are really the host polling the device 1000 times a second and asking "any news?"...

So if you want high throughput, always keep your endpoint primed for reception, with several dTDs queued. If it runs out of dTDs, it will NAK and you'll wait out the host's retry delay.

The most painful thing is to write those damn descriptors.
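Writing them can at least be mechanized. Here is a hedged sketch of chaining several receive dTDs so the endpoint stays primed across transfers (field layout per UM10503; the struct and function names are my own, and on real hardware `next` and `buffer[]` must hold physical addresses of suitably aligned, DMA-visible memory):

```c
#include <stddef.h>
#include <stdint.h>

#define DTD_TERMINATE 0x1u        /* bit 0 of next pointer: end of list */
#define DTD_ACTIVE    (1u << 7)   /* token: hardware owns this dTD      */
#define DTD_IOC       (1u << 15)  /* token: interrupt on complete       */

typedef struct dtd {
    uint32_t next;
    uint32_t token;      /* [30:16] total bytes, [15] IOC, [7:0] status */
    uint32_t buffer[5];
    uint32_t pad;
} __attribute__((aligned(32))) dtd_t;

/* Link an array of dTDs into one chain, each with its own slice of
 * the receive buffer, so several transfers can complete before the
 * endpoint runs dry and starts NAKing. */
static void dtd_build_rx_chain(dtd_t *dtds, size_t count,
                               uint8_t *buf, uint32_t bytes_each)
{
    for (size_t i = 0; i < count; i++) {
        dtds[i].next = (i + 1 < count)
                     ? (uint32_t)(uintptr_t)&dtds[i + 1]
                     : DTD_TERMINATE;
        dtds[i].token = (bytes_each << 16) | DTD_IOC | DTD_ACTIVE;
        dtds[i].buffer[0] = (uint32_t)(uintptr_t)(buf + i * bytes_each);
        /* buffer[1..4] would cover further 4 kB pages for larger
         * transfers; unused here since bytes_each fits one page. */
    }
}
```

After building the chain you would point the endpoint's queue head at `dtds[0]` and prime the endpoint, per the procedure in the manual.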


1,655 views
lpcware (NXP Employee)
Content originally posted in LPCWare by etunnel on Fri Sep 19 00:28:51 MST 2014
After digging around, I finally found that the user manual is very helpful: UM10503 (http://www.nxp.com/documents/user_manual/UM10503.pdf). It explains how the DMA engine works, filling the gap in my understanding. It is a big document. Chapter 24.9, Device data structures, explains the descriptors. Now the usbd_rom_bwtest example makes more sense.

Understanding this better, I realize there is something else related to the USB_EVT_OUT event that I don't understand. Originally I thought the debugger was not trapping correctly; now I am convinced that I just don't understand it. I'll create a different post on that.