USB DMA Engine Bulk Transfer in Custom Class

I want to develop a program on the LPC43xx that moves data between the host PC and the LPC device as fast as possible over USB. I settled on bulk transfer mode with a custom class. I got the usbd_rom_bwtest example working and obtained throughput figures from it. Now it is time to write a working program that handles the flow of data correctly.

I am totally confused about how the DMA engine in the LPC43xx works. I have not managed to set up the DMA transfer descriptor so that data is transferred reliably. It is also difficult to trap events in the debugger to figure out the data flow using the test example. I haven't been able to find a document describing how the USB DMA engine works and how it is modeled in the USB ROM data structures and API. Where can I find such a document, or an example that actually shows how data is transferred correctly and continuously? (usbd_rom_bwtest doesn't care about the correctness of the data, especially when transferring data from the device to the USB host.)

Thanks a lot for any pointers.