Hello everyone,
I'm working on replacing a run-time function call (which is working fine) and have a few questions about cache measurement on the e500mc core:
Are there any tools available to measure the L1 (D and I) and L2 cache miss rates of a user-level application?
- I have tried oprofile, but it is not helping much for my application, because I am trying to track a program that modifies its own code (kind of self-modifying). The code that gets modified at run time is not owned by the main process; it is loaded at run time into the process's program memory.
- I'm planning to try the 'perf' tool very soon (not sure how much it will help in my case).
- I'm not sure how much CodeWarrior Development Studio will help here, because I am not able to compile the user-space application against the toolchain provided as part of the CW suite.
Is there any way to read the I/D cache contents?
The performance monitor can be used to measure the cache miss rate; see section 9.11.6, Event Selection, in E500MCRM.pdf.
Cache contents can be viewed and modified in the CodeWarrior debugger during a debug session. See section 5.2.6, Viewing and modifying Cache Contents, in Targeting_PA_Processors.pdf; the document is in the CodeWarrior installation directory.
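As a rough sketch of what using the performance monitor from user space can look like on Linux, the C code below goes through the kernel's perf_event_open(2) interface (the same one the perf tool uses) rather than programming the PMRs directly. It counts L1 data-cache read accesses and misses for the calling thread, user space only, around a single function call. It assumes the kernel's e500mc PMU driver maps the generic PERF_TYPE_HW_CACHE events onto core events; if it does not, opening the counter fails and you would need PERF_TYPE_RAW with an event number taken from section 9.11.6. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so invoke the raw syscall. */
static long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: open a generic L1D read counter (ACCESS or MISS)
 * for the calling thread (pid 0, any CPU), excluding kernel and hypervisor.
 * group_fd == -1 creates a group leader, which starts disabled. */
static int open_l1d_read(unsigned result, int group_fd)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HW_CACHE;
    attr.config = PERF_COUNT_HW_CACHE_L1D |
                  (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                  ((uint64_t)result << 16);
    attr.disabled = (group_fd == -1);
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    attr.read_format = PERF_FORMAT_GROUP;
    return (int)sys_perf_event_open(&attr, 0, -1, group_fd, 0);
}

/* Hypothetical helper: run fn() and report the L1D read accesses and misses
 * counted while it ran. Returns 0 on success, -1 if the counters cannot be
 * opened or read (unsupported event, permissions, seccomp, ...). */
int measure_l1d(void (*fn)(void), uint64_t *accesses, uint64_t *misses)
{
    int leader = open_l1d_read(PERF_COUNT_HW_CACHE_RESULT_ACCESS, -1);
    if (leader < 0)
        return -1;
    int member = open_l1d_read(PERF_COUNT_HW_CACHE_RESULT_MISS, leader);
    if (member < 0) {
        close(leader);
        return -1;
    }

    /* Reset and enable both counters together, run the workload, stop. */
    ioctl(leader, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
    ioctl(leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
    fn();
    ioctl(leader, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

    /* With PERF_FORMAT_GROUP the leader yields { nr, value[0], value[1] },
     * values in the order the events joined the group. */
    uint64_t buf[3];
    ssize_t n = read(leader, buf, sizeof buf);
    close(member);
    close(leader);
    if (n != (ssize_t)sizeof buf || buf[0] != 2)
        return -1;
    *accesses = buf[1];
    *misses = buf[2];
    return 0;
}
```

Grouping the two events means they are scheduled onto the PMU together, so the access and miss counts cover exactly the same interval and misses / accesses is a meaningful miss rate. The L1 instruction cache can be measured the same way by swapping in PERF_COUNT_HW_CACHE_L1I, again assuming the driver exposes it.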
Hi Lunmin,
Thanks for your quick reply.
Maybe I need to think about writing a user-space profiler using the performance monitor registers (time consuming). For a quick check, if there is any register setting or tool that can dump the I- or D-cache contents, that would be sufficient for my application assessment.
Regarding the CW debugger, I'm not sure whether I need to buy a separate external connection interface. By the way, I tried the default USB TAP but was not able to establish a connection between the debugger and the development board.
Any further ideas on this would be helpful.
Thx.
Why do you think you need to write a new profiler? What are you trying to do that perf (or oprofile) can't do? Why does it matter that your application has self-modifying code?
As for reading cache *contents* rather than just determining the miss rate, that can't be done from software. You'll need to use an external debugger for that.
Why do you think you need to write a new profiler?
What are you trying to do that perf (or oprofile) can't do?
I'm not trying to do anything that perf or any other application profiler is unable to do. Basically, my application executes program code that is not owned by the application itself.
For example, here is a brief outline of the process flow:
Assume X is a Linux process and Y is program code (neither a Linux process nor an ELF executable).
- Launch process X (execute the main program).
- Process X parses the program code file (Y) to obtain several GB of instructions and stores them in its own memory.
- X then branches/jumps to the entry point of the Y instructions (program code).
Given the above flow, the existing profilers capture only the symbols and instruction execution belonging to the Linux process (X), not the program code (PPC instructions) that is loaded at run time.
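The load-and-branch flow above can be sketched in C on Linux roughly as follows. This is an illustration under assumptions, not the poster's actual code: load_program is a hypothetical name, and parsing file Y into raw instruction bytes is left out. What it shows is that the copied instructions end up in an anonymous mapping inside process X, so the kernel (and any profiler) simply sees X executing code at addresses that belong to no ELF file.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy `len` bytes of machine code into a fresh
 * anonymous mapping of the current process, make it executable, and
 * synchronise the caches. Returns the entry address, or NULL on failure. */
void *load_program(const unsigned char *code, size_t len)
{
    if (len == 0)
        return NULL;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, len);

    /* Drop write permission before allowing execution (W^X). */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }

    /* On PowerPC the I-cache is not kept coherent with data stores, so newly
     * written code must be pushed out of the D-cache and invalidated in the
     * I-cache before it runs; the GCC/Clang builtin emits that sequence
     * (a no-op on architectures with coherent I-caches). */
    __builtin___clear_cache((char *)mem, (char *)mem + len);
    return mem;
}
```

A caller would then do something like `void (*entry)(void) = (void (*)(void))load_program(buf, len); entry();`. Converting a data pointer to a function pointer is permitted by POSIX though not by ISO C; on 32-bit PowerPC Linux no function descriptor is involved, so the entry address can be called directly.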
Why does it matter that your application has self-modifying code?
For a generic profiler, it doesn't matter whether the code is self-modified or originally compiled. But in the flow explained above, the code to be tracked/profiled is program Y, not the code owned by the Linux process. In this case, my application needs its own profiler to assess the program code loaded at run time by process X.
As for reading cache *contents* rather than just determining the miss rate, that can't be done from software. You'll need to use an external debugger for that.
Is the external debugger referred to in the above statement just a USB TAP, or is it something else that I need to purchase?
Could you elaborate on why the existing profilers care about where the instructions came from? You won't get symbolic information, of course, but you should still be able to read out the counters for everything that happens in process X. There is no "process Y" as far as Linux (and thus its profilers) is concerned; you're just loading "program Y" into "process X" and executing it in the context of "process X". I suspect the problem is with whatever mechanism you're using to view the data, not with the data collection itself.
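One possible fix on the viewing side, assuming you use perf: perf's JIT interface convention lets a process describe code in anonymous executable memory by writing /tmp/perf-<pid>.map, one `START SIZE symbolname` line per region with START and SIZE in hex without a 0x prefix; perf report then uses those names instead of showing bare anonymous addresses. The sketch below writes such entries; the helper names and the symbol name are hypothetical, and it is worth checking that your perf version on the target honours the file.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Build the path perf checks for a process's JIT symbol map. */
void perf_map_path(char *buf, size_t n, pid_t pid)
{
    snprintf(buf, n, "/tmp/perf-%d.map", (int)pid);
}

/* Hypothetical helper: append one "START SIZE name" line (hex, no 0x).
 * Names containing a newline are rejected because the format is
 * line-oriented. Returns 0 on success, -1 on error. */
int perf_map_append(const char *path, uintptr_t start, size_t size,
                    const char *name)
{
    if (strchr(name, '\n') != NULL)
        return -1;
    FILE *f = fopen(path, "a");
    if (f == NULL)
        return -1;
    int rc = fprintf(f, "%jx %zx %s\n", (uintmax_t)start, size, name);
    if (fclose(f) != 0 || rc < 0)
        return -1;
    return 0;
}
```

For example, after loading program Y at address `base` with length `len`, process X could build the path with `perf_map_path(path, sizeof path, getpid())` and call `perf_map_append(path, (uintptr_t)base, len, "program_Y")`. If the layout of Y is known (function boundaries inside the loaded image), writing one entry per function gives per-function attribution instead of a single lump.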
Yes, a USB TAP should work for reading cache contents, in conjunction with CodeWarrior. If you're having trouble getting CodeWarrior and your USB TAP working, that should be raised as a separate question.
Could you elaborate on why the existing profilers care about where the instructions came from?
You won't get symbolic information, of course, but you should still be able to read out the counters for everything that happens in process X.
There is no "process Y" as far as Linux (and thus its profilers) is concerned; you're just loading "program Y" into "process X" and executing it in the context of "process X".
I understand that the existing profilers will not, and are not supposed to, capture/track the instructions of program Y as such.
I have changed some code that should give better cache performance (miss/hit), but with the new code changes I still see the same cache profile. Given your statement above, though, the counters are incremented for program Y (within the context of X). I'm planning to recollect the data samples and reconfirm the cache miss/hit rate.
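Because a single run can be noisy (interrupts, other processes, cold caches on the first execution), one way to reconfirm the rate is to measure the same workload several times before and after the code change and compare the median miss rate rather than single readings. A minimal sketch of that arithmetic; the function names are hypothetical and the counts would come from perf or perf_event_open.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Miss rate of one run: misses / accesses, or -1.0 if nothing was accessed. */
double miss_rate(uint64_t accesses, uint64_t misses)
{
    return accesses ? (double)misses / (double)accesses : -1.0;
}

/* qsort comparator for doubles (ascending). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n per-run miss rates; sorts rates[] in place.
 * Returns -1.0 for n == 0. Callers should drop runs whose rate is -1.0. */
double median_rate(double *rates, size_t n)
{
    if (n == 0)
        return -1.0;
    qsort(rates, n, sizeof rates[0], cmp_double);
    return (n % 2) ? rates[n / 2]
                   : (rates[n / 2 - 1] + rates[n / 2]) / 2.0;
}
```

The median is less sensitive than the mean to one outlier run (for instance the first run after loading program Y, when its instructions are not yet cached), which makes a before/after comparison easier to trust.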
I suspect the problem is with whatever mechanism you're using to view the data, not with the data collection itself.
Yes.
Yes, a USB TAP should work for reading cache contents, in conjunction with CodeWarrior. If you're having trouble getting CodeWarrior and your USB TAP working, that should be raised as a separate question.
Okay, thanks.