RT1170 trace using Lauterbach Microtrace


2,192 Views
mastupristi
Senior Contributor I

I have recently started using the Lauterbach Microtrace on MIMXRT1170-EVK.
I have an example project using FreeRTOS and running in ITCM.

I get a lot of 'fifofull' events, many of which seem to come from the FreeRTOS idle loop. It is a 5-instruction loop, and Lauterbach support explained to me that the core generates trace events faster than the trace port can output them (even with the trace clock set to 120 MHz), so the trace FIFO fills up quickly.

I tried adding lots of NOPs to the idle loop to give the FIFO time to empty. It helps, but some 'fifofull' events remain. Some of them are still related to FreeRTOS operations such as suspend/resume.

However, Lauterbach's support points to lowering the core clock frequency as the ultimate solution to make the 'fifofull' events disappear altogether.
I had to lower it to 100 MHz to make them disappear.

But that's such a low frequency that I thought I was doing something wrong.

Does such behavior also appear to you? Is there any optimization I can try to do? Any ETM/ITM device settings or anything like that?


best regards

Max

7 Replies

2,165 Views
EdwinHz
NXP TechSupport

Hi @mastupristi,

Seeing as Lauterbach Microtrace is not directly supported by us, I'm afraid there is not much I can recommend. Have you seen this issue when using our own tracing methods, described on the following AppNotes (AN13234 & AN12877)? 


2,157 Views
mastupristi
Senior Contributor I

Have you seen this issue when using our own tracing methods, described on the following AppNotes (AN13234 & AN12877)? 

I am using ETM for trace, not SWO, so I am following AN12877; however, it does not cover the RT1170-EVK.
Lauterbach has set up some examples specifically for this board, and I have been following those, along with direct support sessions.

They explained to me that the ETM uses a 4-bit bus and was designed for Cortex-M cores at a time when they rarely exceeded 200 MHz.

I see the 'fifofull' problem in tight loops of a few assembly instructions, such as the FreeRTOS idle loop, when the code runs from ITCM and the core runs at 200 MHz or more.

They explained to me that a trace event is generated at least for each branch, so for the 5-instruction loop in the example I would get an event every 10 ns or so, assuming 2 cycles per instruction and a 1 GHz core clock. These trace events are generated by the core and sent to the ETM, which puts them in a FIFO and then outputs them on the 4-bit bus. The bus in my case runs at up to 240 MHz (120 MHz DDR), so transferring 32 bits takes 8 transfers, about 33 ns. So even assuming that a trace event is only 32 bits, the ETM FIFO fills about 3 times faster than the bus can empty it.

In this example I made a lot of assumptions, and I don't know how realistic they are (cycles per instruction, trace event size, etc.). So it may be plausible that I have to lower the core frequency. What is much less plausible to me is that I have to drop to 100 MHz.

This made me suspect a configuration error on my side, or something like that.
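The estimate above can be reproduced with a few lines of C. This is only a sketch of the back-of-the-envelope model from this post: the struct and function names are made up for illustration, and every input (2 cycles per instruction, one 32-bit event per loop pass, 240 M transfers/s on a 4-bit port) is one of the assumptions stated above, not a measured value.

```c
#include <assert.h>

/* Inputs of the rough model; all values are assumptions from the post. */
typedef struct {
    double   core_hz;            /* core clock, e.g. 1e9 for 1 GHz           */
    double   cycles_per_instr;   /* assumed average, e.g. 2.0                */
    unsigned instrs_per_event;   /* loop length, one branch event per pass   */
    unsigned bits_per_event;     /* assumed trace event size, e.g. 32        */
    double   port_transfers_hz;  /* transfers per second, e.g. 240e6 (DDR)   */
    unsigned port_width_bits;    /* trace port width, e.g. 4                 */
} trace_estimate_t;

/* Time between two trace events generated by the core, in nanoseconds. */
double event_period_ns(const trace_estimate_t *e)
{
    assert(e->core_hz > 0.0);
    return 1e9 * e->instrs_per_event * e->cycles_per_instr / e->core_hz;
}

/* Time needed to push one event out of the trace port, in nanoseconds. */
double drain_time_ns(const trace_estimate_t *e)
{
    assert(e->port_width_bits > 0 && e->port_transfers_hz > 0.0);
    double transfers = (double)e->bits_per_event / e->port_width_bits;
    return 1e9 * transfers / e->port_transfers_hz;
}

/* Above 1.0 the FIFO fills faster than the port can empty it. */
double fill_to_drain_ratio(const trace_estimate_t *e)
{
    return drain_time_ns(e) / event_period_ns(e);
}
```

With `{1e9, 2.0, 5, 32, 240e6, 4}` the ratio comes out at roughly 3.3. Under the same assumptions the ratio only drops to 1.0 at a core clock of about 300 MHz, which is why having to go all the way down to 100 MHz looks suspicious in this simple model.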

Since I had already found AN12877, where it is evident that NXP has done tests with Lauterbach instruments, I was hoping that NXP had also done similar tests on the RT1170-EVK under conditions similar to mine (firmware with FreeRTOS running in ITCM, with data in DTCM, core running at 1 GHz), so that I could get confirmation that the problems exist, or an explanation of how to mitigate them.

If you have not done such tests, I think it would be useful for the community using the RT1170 if you did them and published an AN specifically for that platform.

best regards

Max

 

PS: the Lauterbach examples have far fewer 'fifofull' problems, but they run from flash, often with caches disabled, and with cores running a bit slower


2,069 Views
HM_UoP
Contributor II


Interestingly, we're facing a similar puzzle, despite my use of the RT1062. This riddle has kept me busy for a few days now.

Your comment, "The bus in my case runs up to 240 MHz", piqued my interest. Could you kindly share your method for setting this clock speed?

In my quest, I've attempted to adjust the Trace clock using the BOARD_BootClockRUN function. Unfortunately, enabling the following code interferes with TRACE32's ability to connect to the RT1062, leading me to reluctantly disable it.

/* Disable TRACE clock gate. */
CLOCK_DisableClock(kCLOCK_Trace);
/* Set TRACE_PODF. */
CLOCK_SetDiv(kCLOCK_TraceDiv, 3);
/* Set Trace clock source. */
CLOCK_SetMux(kCLOCK_TraceMux, 0);

Despite this hurdle, I examined the default divider settings via the code below. The Trace module seems to work at 132MHz, while the Core operates at a robust 600MHz.

CLOCK_GetDiv(kCLOCK_TraceDiv);
CLOCK_GetMux(kCLOCK_TraceMux);
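As a rough cross-check of the 132 MHz figure, the divider setting can be turned into a frequency by hand. The helper below is hypothetical and does not use the SDK. It assumes that the value passed to CLOCK_SetDiv (and returned by CLOCK_GetDiv) is the raw divider field, that the hardware divides by that field plus one, and that mux setting 0 selects a 528 MHz source. All three assumptions should be checked against the reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not an SDK call.
 * Assumption: the divider field encodes "divide by (field + 1)". */
uint32_t trace_root_hz(uint32_t source_hz, uint32_t podf_field)
{
    assert(podf_field != UINT32_MAX);   /* avoid wrap-around to divide-by-zero */
    return source_hz / (podf_field + 1u);
}
```

Under these assumptions, `trace_root_hz(528000000u, 3u)` gives 132000000, which matches the 132 MHz reported for the divider value 3 used in the code above.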

Yet, my logic analyser reveals a mere 50MHz frequency on the TraceCLK line.

In this scenario, I keep encountering the 'fifofull' issue you mentioned. Despite consulting the Trace32 help documentation and testing commands like ETM.DataTrace off, ETM.DataSuppress ON, and ETM.NoOverflow ON, I've yet to solve the 'fifofull' problem.

Your remark, 'Lauterbach examples have far fewer 'fifofull' instances', struck a chord. I suspect that's because these examples are bare metal with fewer branches.

Therefore, I have two main questions:

1. How did you set the 240 MHz you mentioned? Is this frequency set in the Trace32 software or does it require configuring the microcontroller's Trace module?
2. If you adjusted the default TRACE clock of MCU, how did you manage to link Trace32 and it?

Any insights would be greatly appreciated.

Best wishes,

HM


2,064 Views
mastupristi
Senior Contributor I

Hello HM,

How did you set the 240 MHz you mentioned? Is this frequency set in the Trace32 software or does it require configuring the microcontroller's Trace module?

I was referring to the trace clock frequency, which is set by the uTrace:

traceClock.png

In this image you see it highlighted in green. As I explained, the 239.9 MHz indication refers to the channel transferring data on both edges, so the TraceCLK actually runs at 120 MHz (seen and measured with an oscilloscope).

The 50 MHz you have on the TraceCLK is very low, and it makes your situation much worse with respect to 'fifofull' events.

 

If you adjusted the default TRACE clock of MCU, how did you manage to link Trace32 and it?

Actually, I didn't adjust it; I used the automatic setting proposed by Trace32.

 

Unfortunately, I found no solution or improvement. I spoke with Lauterbach several times, but the only plausible conclusion seems to be the one I explained in my previous post.

What I regret most is that NXP has not yet said anything about it.

 

I make an appeal to the NXP guys: do some tests; at least pretend you care. In short, don't make us feel abandoned to ourselves.

 

best regards

Max


2,051 Views
HM_UoP
Contributor II

Hi Max

Thank you for the screenshot you've shared. It appears that the frequency may be the optimal one auto-detected by uTrace's autofocus. I am currently using the MIMXRT1060-EVKB development board, which unfortunately does not have a Trace interface. Thus, I have had to manually wire one in.

Despite having this interface, it's clear that it wasn't designed according to high-speed wiring rules. This could potentially explain the slower speeds I'm experiencing, but that's only a guess on my part. I will certainly share any updates with you as I make progress.

I recall someone in the community mentioning that the RT1176 always uses a 50 MHz Trace Clock. You can find this here: https://community.nxp.com/t5/i-MX-RT/MIMXRT1170-EVK-How-to-enable-instruction-trace-on-the-RT1170/m-...

Segger also stated, "Arm instruction trace specifies that the maximum CPU clock speed should be twice as high as the trace clock speed." You can find that information here: https://wiki.segger.com/J-Trace_overflow_error. I hope this information is of some help to you.

My current confusion lies in the relationship between the TraceClock on the TPIU output interface and the Trace clock set within the MCU. It seems to me they aren't the same.

Best regards,

HM

 


2,039 Views
mastupristi
Senior Contributor I

Hi HM,

I don't know what the limit of the RT1062 is, but on the RT117x the theoretical limit for TraceCLK would be 70 MHz (equivalent to 140 MHz when data is transferred on both edges):

maxTraceClock.png

However, Lauterbach's autofocus detects that it can run at up to 119.9 MHz (equivalent to 239.9 MHz), and I have not imposed any constraints on it.

I have also tried lowering the core frequency, and at 200 MHz I still get many 'fifofull' events, so what Segger says (which should be derived from what ARM says) is not true, at least not in all cases. If the core runs at 100 MHz then I no longer get 'fifofull' events (I haven't tried 150 MHz). It's also true, though, that the M7 core is very efficient in many ways compared to an M4, and even more so compared to an M3. Perhaps Segger and ARM were referring to those cores when they talked about core frequency versus trace frequency.
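To make the comparison with the Segger guideline explicit, here is a minimal sketch in C. The function name is made up for illustration, and the check encodes only the quoted rule of thumb (core clock at most twice the trace clock), not anything taken from an ARM specification.

```c
#include <assert.h>
#include <stdbool.h>

/* Rule of thumb quoted from the Segger wiki: the core clock should be
 * at most twice the trace clock. Hypothetical helper for illustration. */
bool within_2x_guideline(double core_hz, double trace_clk_hz)
{
    assert(trace_clk_hz > 0.0);
    return core_hz <= 2.0 * trace_clk_hz;
}
```

With a 120 MHz TraceCLK, both `within_2x_guideline(200e6, 120e6)` and `within_2x_guideline(100e6, 120e6)` return true, yet only the 100 MHz case ran without 'fifofull' events in my tests. So the guideline appears to be necessary at best, not sufficient, for this setup.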

best regards

Max


1,731 Views
HM_UoP
Contributor II

Hi Max

Thank you for your reply. After numerous attempts, I successfully increased the TraceClock to 400MHz, which is sufficient for the IMXRT1060 operating at 600MHz. I am aware of the 70MHz frequency limitation mentioned in the datasheet, but my tests confirm that it can indeed stably operate at a 400MHz trace frequency, without encountering any of the frustrating FIFOFULL warnings.

Please refer to the following link for details on my setup, which may also be helpful for your case.

https://community.nxp.com/t5/i-MX-RT/Trace-clock-frequency-vs-Core-frequency/m-p/810130/highlight/fa... 

Best wishes,

 

HM
