FTDI GPIO operation on MPC5748G is slow

fog_hua
Contributor I

We use an MPC5748G with an FTDI chip to communicate with a PC over USB. The MCU receives and transmits frames to and from the FTDI chip in FIFO mode via GPIOs. The same design with an Infineon chip (160 MHz) and the FTDI chip reaches about 2 Mbps, while on the MPC5748G platform we get only about 300 kbps.

The test code is attached. Please give me some advice on how to improve the performance. Thank you.

lukaszadrapa
NXP TechSupport

Hi,

Even if a 160 MHz system clock is used, it does not mean that each instruction executes in one cycle (the mentioned 6.25 ns per clock). Make sure the cache is enabled; this may help a lot. If it is not enabled, additional time is needed to fetch instructions from flash because of wait states.

Then I would recommend accessing the I/O pins directly by writing to the GPDO registers and reading the GPDI registers in the SIU module. Calling SDK functions adds further delays.

Also this loop:

    if (isUSBDataAvailable()) {
        readUSBData();
    }

... can be written more efficiently if you don't call functions.

Regards,

Lukas
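
As a rough sketch of the advice above (direct register access, no per-byte function calls), the polling read could be folded into one inlined burst loop. The signal names RXF# and RD# follow the usual FT245-style FIFO interface and are an assumption about this board, as is the hypothetical `ftdi_pins_t` structure. On hardware the pointers would be set to the GPDI/GPDO addresses from the device header, and the RD# pulse width and data setup time would need checking against the FTDI datasheet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pin view (not from the SDK). Each pointer refers to a
 * byte-wide pad data register; passing them in keeps the loop free of
 * SDK calls and lets the sketch be exercised on a host, with plain
 * variables standing in for the registers. */
typedef struct {
    volatile const uint8_t *rxf_n;   /* input: RXF#, 0 = byte available (assumed) */
    volatile uint8_t       *rd_n;    /* output: RD#, 0 = FTDI drives the bus (assumed) */
    volatile const uint8_t *data_in; /* input: 8 data pins, assumed to share one
                                        byte of a parallel input register */
} ftdi_pins_t;

/* Read up to max bytes while the FIFO reports data; returns the count. */
static inline size_t ftdi_read_burst(const ftdi_pins_t *p, uint8_t *buf, size_t max)
{
    size_t n = 0;

    while ((n < max) && (*p->rxf_n == 0u)) {
        *p->rd_n = 0u;           /* assert RD# */
        buf[n++] = *p->data_in;  /* sample the data bus */
        *p->rd_n = 1u;           /* release RD# */
    }
    return n;
}
```

Reading several bytes per call amortizes the call overhead over the whole burst instead of paying it for every byte.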

fog_hua
Contributor I

Finally, I found a bottleneck. In FTDI FIFO mode, data (8 bits, one byte) is read and written over the same data pins, so before each read we need to switch the GPIOs to input-buffer mode, and before each write to output-buffer mode. Since there are 8 GPIOs, these switches take a lot of time. Is there an efficient way to switch the direction of multiple GPIOs at once?
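
One possible way to reduce the switching cost, sketched here under assumptions, is to leave the input buffers enabled permanently and precompute the two MSCR values per pin, so that each direction change is eight plain stores with no read-modify-write and no SDK call. The bit masks `MSCR_OBE` and `MSCR_IBE`, the hypothetical `bus_cfg_t` type, and the requirement that the eight data pins have consecutive MSCR registers are all assumptions to be checked against the device header and reference manual.

```c
#include <stdint.h>

/* Assumed SIUL2 MSCR bit masks; verify against the MPC5748G header. */
#define MSCR_OBE 0x02000000u /* output buffer enable */
#define MSCR_IBE 0x00080000u /* input buffer enable */
#define BUS_PINS 8u

/* Hypothetical cache of ready-to-write MSCR values for both directions. */
typedef struct {
    uint32_t in_cfg[BUS_PINS];
    uint32_t out_cfg[BUS_PINS];
} bus_cfg_t;

/* Capture the current pad settings once, at init time. */
static void bus_cfg_init(bus_cfg_t *c, const volatile uint32_t *mscr)
{
    for (uint32_t i = 0; i < BUS_PINS; ++i) {
        uint32_t base = mscr[i] | MSCR_IBE;  /* input buffer stays on */
        c->in_cfg[i]  = base & ~MSCR_OBE;    /* output driver off */
        c->out_cfg[i] = base | MSCR_OBE;     /* output driver on */
    }
}

/* Release the bus so the FTDI chip can drive it. */
static inline void bus_to_input(volatile uint32_t *mscr, const bus_cfg_t *c)
{
    for (uint32_t i = 0; i < BUS_PINS; ++i) {
        mscr[i] = c->in_cfg[i];
    }
}

/* Drive the bus from the MCU. */
static inline void bus_to_output(volatile uint32_t *mscr, const bus_cfg_t *c)
{
    for (uint32_t i = 0; i < BUS_PINS; ++i) {
        mscr[i] = c->out_cfg[i];
    }
}
```

Whether the fixed-length loops are unrolled depends on the compiler and optimization level, so the resulting timing would still need to be measured on the target.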

fog_hua
Contributor I

Thanks a lot for your response.

I have already enabled the I-cache and D-cache in my project (I use uC/OS as my OS; the cache-enabling asm code is from Micrium and works as expected), but I don't know how to configure the cache in my test code, as I am not familiar with asm. I searched the SDK and S32 examples but didn't find any reference code for enabling or disabling the cache. If I want to enable the I-cache in my test code, where can I find reference code?

I also tried direct GPDO/GPDI accesses, but saw no obvious improvement.

I also used a while loop to check CPU performance, as follows:

static void check_cpu_performance()
{
    do {
        uint32_t i = 0;
        uint32_t tick1, tick2;
        uint32_t delta_tick = 0;

        tick1 = OSIF_GetMilliseconds();
        while(i++ < 10000000);
        tick2 = OSIF_GetMilliseconds();
        delta_tick = tick2 - tick1;

        break;
    } while(0);
}

It takes about 625 ms, while on my Infineon platform it takes only about 450 ms. Have I got some CPU settings wrong? If you need any additional info, please let me know. Thanks.
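
To put the two measurements on a common scale, the elapsed time can be converted into core clocks per loop iteration. The helper below is only illustrative arithmetic: the function name is made up, and it assumes both parts run at the 160 MHz mentioned earlier and that the compiler did not optimize the empty loop away (a non-volatile counter allows that at higher optimization levels).

```c
#include <stdint.h>

/* Core clocks spent per loop iteration, derived from a millisecond
 * measurement. clock_hz is the assumed core clock frequency. */
static double cycles_per_iteration(uint32_t elapsed_ms, uint32_t iterations,
                                   double clock_hz)
{
    double seconds = (double)elapsed_ms / 1000.0;
    return seconds * clock_hz / (double)iterations;
}
```

With the figures from this post, 625 ms over 10,000,000 iterations at 160 MHz works out to 10 clocks per iteration, and 450 ms to 7.2 clocks.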

lukaszadrapa
NXP TechSupport

Hi,

The cache can be enabled easily by writing the value 0x85000001 to the PCCCR register.

From the reference manual:

[Image: pastedImage_1.png]

This should make a difference.

Regards,

Lukas

fog_hua
Contributor I

Hi Sir,

I have searched the reference manual (MPC5748GRM.pdf) and other documents, but didn't find any information about the I-cache and D-cache. Could you kindly give me a link to the right document?

lukaszadrapa
NXP TechSupport

Hi,

I'm sorry, I mixed up two similar questions. My answer above applies to S32K microcontrollers.

On the MPC5748G, the caches are described in the core reference manual:

https://www.nxp.com/docs/en/reference-manual/e200z4RM.pdf 

The caches are enabled in the startup code, which looks like this:

#ifdef ICACHE_ENABLE
;#****************** Invalidate and Enable the Instruction cache **************
__icache_cfg:
        e_lis    r7, 0x0000
        e_or2i    r7, 0x03FF        #create mask for lower 11 bits
        mfspr    r5, 516            #in lower 11 bits we have icache size
        and.    r7, r7, r5        #check if we have icache
        e_beq    _skip_i_cache    #branch if not

        e_li    r5, 0x2
        mtspr    1011,r5

        e_li    r7, 0x4
        e_li    r8, 0x2
        ;#e_lwi r11, 0xFFFFFFFB
        e_lis    r11,0xFFFF
        e_or2i    r11,0xFFFB

__icache_inv:
        mfspr    r9, 1011
        and.    r10, r7, r9
        e_beq    __icache_no_abort
        and.    r10, r11, r9
        mtspr    1011, r10
        e_b        __icache_cfg

__icache_no_abort:
        and.    r10, r8, r9
        e_bne    __icache_inv

        mfspr    r5, 1011
        e_ori    r5, r5, 0x0001
        se_isync
        ;#msync
        mtspr    1011, r5
_skip_i_cache:
#endif

#ifdef DCACHE_ENABLE
;#****************** Invalidate and Enable the Data cache **************
__dcache_cfg:
        e_lis    r7, 0x0000
        e_or2i    r7, 0x03FF        #create mask for lower 11 bits
        mfspr    r5, 515            #in lower 11 bits we have dcache size
        and.    r7, r7, r5        #check if we have dcache
        e_beq    _skip_d_cache    #branch if not

        e_li r5, 0x2
        mtspr 1010,r5

        e_li r7, 0x4
        e_li r8, 0x2
        e_lis    r11,0xFFFF
        e_or2i    r11,0xFFFB

__dcache_inv:
        mfspr r9, 1010
        and.  r10, r7, r9
        e_beq   __dcache_no_abort
        and.  r10, r11, r9
        mtspr 1010, r10
        e_b __dcache_cfg

__dcache_no_abort:
        and.  r10, r8, r9
        e_bne __dcache_inv

        mfspr r5, 1010
        e_ori   r5, r5, 0x0001
        se_isync
        msync
        mtspr 1010, r5
_skip_d_cache:
#endif

Regards,

Lukas

fog_hua
Contributor I

Thanks. This code already exists in the default S32 startup file for the MPC5748G; I only need to enable the related macros, which I have done in my project. It did improve the performance when I tested with the debug build running from flash.

But as I said in the first post, the USB read/write performance is still too slow for my requirements. Any other advice would be appreciated.

lukaszadrapa
NXP TechSupport

Hi,

There are not many options if you have already configured the clocks to maximum and the cache is enabled. The default configuration of the crossbar and peripheral bridge should be about right in this case. The last option is to optimize the code as much as possible, which means writing the fast loop in asm.

For example, if you use this piece of code to toggle a pin:

while(1){
    asm("e_stw %r4,0x0(%r3)");
    asm("e_stw %r5,0x0(%r3)");
};

...you can get a signal of about 6.6 MHz. (Note: the address of GPDO is prepared in r3, and the values 0 and 1 are in r4 and r5.)

So I'm sure you can reach higher performance, but it would be necessary to spend more time on code optimization.

Regards,

Lukas

fog_hua
Contributor I

Please kindly see my update and give me your advice. Thank you.

By the way, I also used an oscilloscope to check the CPU performance. The test case is as follows:

1. Add 100 '__asm volatile ("e_nop");' statements between two GPIO actions (pull low, then high), and record the time from GPIO low to high (T1);

2. Add 100 more '__asm volatile ("e_nop");' statements, and record the time from GPIO low to high (T2);

3. Compute (T2-T1)/100 to get the time of one clock, expected to be 6.25 ns.

A. When I debug the firmware with XXX_Debug_RAM, (T2-T1)/100 is about 18 ns;

B. When I debug the firmware with XXX_Debug without I_CACHE enabled, (T2-T1)/100 is about 18 ns;

C. When I debug the firmware with XXX_Debug with I_CACHE enabled, (T2-T1)/100 is about 6.4 ns;

Why do I get such a difference? I think A should be about 6.25 ns, and C should be less than that, right?
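
The differential measurement in steps 1 to 3 can be written out as a small calculation. This is an illustrative sketch only: the function names are invented, and the T1/T2 arguments are whatever the oscilloscope shows, not figures from this post.

```c
/* Time per added instruction, from two scope readings in nanoseconds.
 * Subtracting T1 cancels the fixed cost of the two GPIO writes. */
static double ns_per_instruction(double t1_ns, double t2_ns, unsigned added)
{
    return (t2_ns - t1_ns) / (double)added;
}

/* Convert a time in nanoseconds to core clocks at the given frequency. */
static double ns_to_clocks(double ns, double clock_hz)
{
    return ns * clock_hz / 1e9;
}
```

At 160 MHz, the reported 18 ns per e_nop corresponds to about 2.9 clocks, and 6.4 ns to about 1.02 clocks.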
