I've been trying to use oprofile to profile the L1 cache hit rate, but I'm unable to get any results from opreport.
Has anyone been able to get any results from oprofile?
Attached is a how-to for setting up oprofile on i.MX53 with Ubuntu.
Yi Li, can I get an i.MX6-based guide?
Thanks for the tutorial. I've already seen this on a web page and have been analyzing it. There are a few points I'm not 100% sure about:
1) There are two possible configurations: timer mode and event mode. Does timer mode also support observing different performance counters (e.g. L1 cache hit/miss), or just the CPU cycles?
2) In event mode a JTAG probe needs to be connected to the target. Why exactly is this needed?
3) According to the i.MX53 errata (ENGcm10696) there is a silicon bug which could cause a performance counter to overflow without signalling the event, thus losing the counter value.
4) What is the sampling rate if timer mode is used?
Thanks, Marko
Yi Li said:
I believe you need event mode to access those advanced performance counters. You can define the sample rate via CPU_CYCLES; please refer to the oprofile manual for details.
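To illustrate how a sample rate follows from an event count: in event mode a sample is taken every time the chosen event has occurred a configured number of times (the count given to opcontrol, e.g. `--event=CPU_CYCLES:100000`). The helper below is only a sketch of that arithmetic. The function name `samples_per_second` is hypothetical, and the 1 GHz clock used in the note after it is an assumption, not a figure from this thread.

```c
#include <stdint.h>

/*
 * Samples per second when one sample is taken every `count` occurrences
 * of an event that fires `events_per_sec` times per second.
 * For CPU_CYCLES, `events_per_sec` is simply the core clock in Hz.
 * Returns 0.0 for a zero count, which would otherwise divide by zero.
 */
static inline double samples_per_second(double events_per_sec, uint64_t count)
{
    return count ? events_per_sec / (double)count : 0.0;
}
```

For example, with an assumed 1 GHz core clock and a count of 100000, `samples_per_second(1e9, 100000)` gives 10000 samples per second. A smaller count means more samples and more profiling overhead.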
So I've followed the tutorial, but all I can get is the CPU_CYCLES counter. I just can't enable other counters even in event mode.
Huh, basically I'm stuck. I hope someone out there has some experience with getting other events working.
I think I've found the root of the problem. To summarize, my problem was that oprofile was not working for me. After following the tutorial above to add the PMU (ARM Performance Monitor Unit) device to the Linux kernel, I still wasn't able to get any readings from the PMU counters. The only exception was the "CPU CYCLES" counter.
According to the Cortex-A8 Technical Reference Manual (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344k/Cegfhaac.html) the performance counters are enabled only if the DBGEN signal is set to 1 (see table 3.99). How this signal is driven is SoC-specific (Freescale's i.MX53 in this case).
I've searched the i.MX53 Reference Manual and on p.624 found the ARM_GPC register, which contains the DBGEN enable bit. I've set this bit in 'linux/arch/arm/mach-mx5/cpu.c':
#define CORTEXA8_PLAT_GPC 0x04

arm_plat_base = ioremap(MX53_BASE_ADDR(ARM_BASE_ADDR), SZ_4K);
reg = __raw_readl(arm_plat_base + CORTEXA8_PLAT_GPC);
printk(KERN_DEBUG "CORTEXA8_PLAT_GPC: %u\n", reg);
reg |= (1 << 16); //set DBGEN to 1
__raw_writel(reg, arm_plat_base + CORTEXA8_PLAT_GPC);
reg = __raw_readl(arm_plat_base + CORTEXA8_PLAT_GPC);
printk(KERN_DEBUG "CORTEXA8_PLAT_GPC: %u\n", reg);
However, after this modification I'm still unable to get any performance counter readings out of the PMC. To rule out oprofile itself (it could have a bug on this platform), I wrote a few lines to access the PMC manually. The result is always the same: I'm able to read the cycle counter, but I get no performance counter readings.
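For anyone who wants to repeat the manual test, here is a minimal sketch of what such direct PMU access can look like. It is not the exact code I ran. The helper names (`cntens_mask`, `l1d_hit_rate`, `pmu_start_counter`, `pmu_read_counter`) are made up for illustration. The register encodings are the ARMv7 CP15 performance monitor registers, and events 0x03 and 0x04 are the architectural L1 data cache refill and access events. The CP15 part only builds for ARMv7 targets and must run in privileged mode (e.g. from a kernel module) unless user-mode access has been enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of the PMU control register (PMNC on Cortex-A8). */
#define PMNC_ENABLE     (1u << 0)  /* E: enable all counters */
#define PMNC_RESET_EVT  (1u << 1)  /* P: reset the event counters */
#define PMNC_RESET_CCNT (1u << 2)  /* C: reset the cycle counter */

/* ARMv7 architectural event numbers. */
#define EVT_L1D_REFILL  0x03u      /* L1 data cache refill (miss) */
#define EVT_L1D_ACCESS  0x04u      /* L1 data cache access */

/* Count-enable mask: bits 0..3 are the four Cortex-A8 event counters,
 * bit 31 is the cycle counter. */
static inline uint32_t cntens_mask(unsigned counter, int with_cycle_counter)
{
    assert(counter < 4);
    uint32_t mask = 1u << counter;
    if (with_cycle_counter)
        mask |= 1u << 31;
    return mask;
}

/* Hit rate from raw counts; returns 0.0 when nothing was counted. */
static inline double l1d_hit_rate(uint32_t accesses, uint32_t refills)
{
    if (accesses == 0)
        return 0.0;
    return 1.0 - (double)refills / (double)accesses;
}

#if defined(__arm__) && defined(__GNUC__) && (__ARM_ARCH >= 7)
/* Select an event counter, program its event, then reset and enable. */
static inline void pmu_start_counter(unsigned counter, uint32_t event)
{
    uint32_t pmnc = PMNC_ENABLE | PMNC_RESET_EVT | PMNC_RESET_CCNT;
    uint32_t mask = cntens_mask(counter, 1);

    __asm__ volatile("mcr p15, 0, %0, c9, c12, 5" :: "r"(counter)); /* select counter */
    __asm__ volatile("isb");
    __asm__ volatile("mcr p15, 0, %0, c9, c13, 1" :: "r"(event));   /* event type */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(mask));    /* count enable set */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(pmnc));    /* control register */
}

/* Read back the currently accumulated value of one event counter. */
static inline uint32_t pmu_read_counter(unsigned counter)
{
    uint32_t value;

    __asm__ volatile("mcr p15, 0, %0, c9, c12, 5" :: "r"(counter)); /* select counter */
    __asm__ volatile("isb");
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 2" : "=r"(value));   /* counter value */
    return value;
}
#endif
```

The idea would be to program counter 0 with `EVT_L1D_ACCESS` and counter 1 with `EVT_L1D_REFILL`, run the workload, read both back and pass them to `l1d_hit_rate`. In my case the event counters do not produce readings, so only the cycle counter is usable.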
A similar bug was reported here:
Any ideas? Maybe someone from Freescale could take a look?