Coldfire MMU without Harvard TLB?

scottlarson · ‎03-25-2015

I have a question about the Coldfire v4 MMU.

In the bit descriptions of the MMUDR register, it states "If a Harvard TLB implementation is used..." Is there a way to use the MMU TLB *without* the Harvard implementation (i.e. *not* have separate sets of 32 data and 32 code entries)? If so, how? The documentation seems to hint at this both in the description of the MMUDR.R bit (the quote above) and of the MMUDR.X bit: "If separate ITLB and DTLBs are used...", which implies that there is an option to not have separate ITLB and DTLB.

My goal here is to be able to both read and execute a page of instruction memory and use *only* one entry in the table, rather than two TLB entries (one ITLB entry for execute and one DTLB entry for read). Is this possible?

Thanks,

Scott Larson

TomE · ‎03-25-2015

You didn't say which CF4 manual you are looking at. Looking at the MCF54455 Reference manual and searching for "Harvard" gets:

3.1.1 Overview mentions "Harvard".

4.1.2 Features says "(Harvard TLBs)" twice.

Table 4-9. MMUDR Field Descriptions says "If a Harvard TLB implementation is used" and "If separate ITLB and DTLBs are used,"

4.3.8 MMU Implementation says "The MMU implements a 64-entry full-associative Harvard TLB architecture".

4.3.8.3 TLB Locked Entries says "Figure 4-12 is a ColdFire MMU Harvard TLB block diagram".

Figure 4-12. Version 4 ColdFire MMU Harvard TLB.

I would guess the MMUDR description is describing a GENERIC MMU Module which could be reused in different CPUs, some with and others without a Harvard architecture.

In this case the CPU is certainly "a Harvard TLB implementation".

If you've ever read any ARM documentation you'll find it is all like this. The chip manual states something like "Contains a Cortex A8 core, go read the ARM documentation". Since the cores come with a lot of chip-manufacturer-selectable options (for optional features and different cache sizes and so on) the "generic" documentation doesn't detail things like that. Sometimes the "bus interconnect" from the core to the rest of the chip is selectable and separately documented.

> My goal here is to be able to both read and execute a page of instruction memory and use *only* one entry in the table, rather than two TLB entries

I can see why you would want to do this, but I don't think it is possible.
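To make the cost concrete, here is a minimal sketch in C of the two entries a single code page needs when it must be both executable and readable: one ITLB entry with X set and one DTLB entry with R set. The bit values are assumed to match Linux's arch/m68k/include/asm/mcfmmu.h and should be checked against the reference manual for your part. `struct tlb_write`, `make_entry` and `map_code_page_rx` are hypothetical names, the page is assumed to be 8 KB and identity-mapped, and the actual register writes are left out so the code runs on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings (as in Linux's mcfmmu.h); verify against your manual. */
#define MMUTR_V      0x00000001u  /* entry valid */
#define MMUTR_SG     0x00000002u  /* shared global: match any ASID */
#define MMUTR_VAMASK 0xfffffc00u  /* virtual address field */
#define MMUDR_LK     0x00000002u  /* lock entry against replacement */
#define MMUDR_X      0x00000004u  /* execute permission */
#define MMUDR_R      0x00000010u  /* read permission */
#define MMUDR_SZ_8K  0x00000200u  /* 8 KB page */
#define MMUDR_PAMASK 0xfffffc00u  /* physical address field */
#define MMUOR_UAA    0x00000001u  /* update allocation address */
#define MMUOR_ACC    0x00000002u  /* TLB access */
#define MMUOR_ITLB   0x00000010u  /* operate on the ITLB, else the DTLB */

/* Hypothetical container for the three values written per TLB entry. */
struct tlb_write {
    uint32_t mmutr;
    uint32_t mmudr;
    uint32_t mmuor;
};

/* Build one locked, identity-mapped 8 KB entry with the given permissions. */
static struct tlb_write make_entry(uint32_t addr, uint32_t perms, int is_itlb)
{
    struct tlb_write e;

    assert((addr & 0x1fffu) == 0);  /* page must be 8 KB aligned */
    e.mmutr = (addr & MMUTR_VAMASK) | MMUTR_SG | MMUTR_V;
    e.mmudr = (addr & MMUDR_PAMASK) | MMUDR_SZ_8K | perms | MMUDR_LK;
    e.mmuor = MMUOR_ACC | MMUOR_UAA | (is_itlb ? MMUOR_ITLB : 0u);
    return e;
}

/* A page that is both executed and read costs two entries:
 * out[0] goes to the ITLB (X), out[1] goes to the DTLB (R).
 * Returns the number of TLB entries consumed. */
static int map_code_page_rx(uint32_t addr, struct tlb_write out[2])
{
    out[0] = make_entry(addr, MMUDR_X, 1);
    out[1] = make_entry(addr, MMUDR_R, 0);
    return 2;
}
```

On target, each `struct tlb_write` would be written to MMUTR, MMUDR and then MMUOR, which is the order the Linux code linked below appears to use.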

You might find some example code here if you can work through the twisty maze that defines what gets built, linked, and connected:

Linux/arch/m68k/mm/mcfmmu.c - Linux Cross Reference - Free Electrons

Tom

scottlarson · ‎03-26-2015

Hi Tom,

I forgot to state that I'm using the MCF54418. I would think they would tailor the documentation to that specific family (MCF5441x), but I guess not.

Yeah, I've been using the Linux code as an example, but there's so many levels of abstraction and #defines and little functions everywhere that makes it challenging to follow.

Thanks for your response.

TomE · ‎03-26-2015

> there's so many levels of abstraction and #defines and little functions everywhere that makes it challenging to follow.

Yes, hard isn't it? My first level of attack is to see which sources generated ".o" files and ignore the rest. You could always enable "Kernel Hacking / Compile the kernel with debug info" which then lets "m68k-elf-objdump -S vmlinux" show you the sources in context in the kernel. That should help to unwind some of the levels of #defines.

You're not running Linux, so I'm interested in what you're using the MMU for. Are you going "full virtual" or do you just want to protect pages from "code accidents"? At least this CPU isn't as bad as many of the PPC ones (MPC860 specifically) where you can't enable the Instruction and Data Caches without setting up the TLBs and turning the MMU on. The CF4 core has ACR0-ACR3 allowing pretty good cache control.

Tom

scottlarson · ‎03-27-2015

I'm working on memory protected "downloadable application modules" for ThreadX, an RTOS my company makes (http://rtos.com/products/threadx/downloadable_application_modules).

We're not going "full virtual," just mapping the same virtual address to physical address. Each module gets its own protected region of code and data memory. I'm trying to keep it as simple as possible. After reading the 24 pages of documentation about the MMU numerous times, studying some of the Linux code, and doing some simple experiments, I am finally beginning to understand the MMU. How the heck is there no app note or example from Freescale on how to use this awful thing?!
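As an illustration of what identity-mapping a module region with a fixed number of entries involves, here is a rough sketch that covers a region with naturally aligned pages, largest first, and fails if it would need more entries than allowed. It is not the ThreadX implementation. `plan_region` and `struct page` are hypothetical names, and only the 1 MB, 8 KB and 4 KB page sizes are used because the fourth size code is assumed to differ between parts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One planned TLB entry: an identity-mapped page (virtual == physical). */
struct page {
    uint32_t base;
    uint32_t size;
};

/* Page sizes considered, largest first (assumed subset of what the MMU offers). */
static const uint32_t page_sizes[] = { 0x100000u, 0x2000u, 0x1000u };
#define NUM_PAGE_SIZES (sizeof page_sizes / sizeof page_sizes[0])

/* Cover [base, base + len) with naturally aligned pages.
 * Returns the number of entries used, or -1 if the region is not
 * 4 KB aligned or would need more than max entries. */
static int plan_region(uint32_t base, uint32_t len, struct page *out, int max)
{
    int n = 0;

    if ((base & 0xfffu) != 0 || (len & 0xfffu) != 0)
        return -1;

    while (len > 0) {
        size_t i;

        /* Pick the largest page that is aligned at base and fits in len. */
        for (i = 0; i < NUM_PAGE_SIZES; i++) {
            uint32_t sz = page_sizes[i];
            if ((base & (sz - 1u)) == 0 && len >= sz)
                break;
        }
        assert(i < NUM_PAGE_SIZES);  /* 4 KB always fits after the check above */

        if (n == max)
            return -1;               /* out of TLB entries */
        out[n].base = base;
        out[n].size = page_sizes[i];
        n++;
        base += page_sizes[i];
        len -= page_sizes[i];
    }
    return n;
}
```

For example, a 20 KB region starting at 0x40001000 comes out as one 4 KB page followed by two 8 KB pages, while a 512 KB region that is only 8 KB aligned needs 64 entries and does not fit in a 32-entry TLB.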

Anyway, I've got my implementation designed and it's mostly up and running now.

Cheers,

Scott

TomE · ‎03-29-2015

> I'm working on memory protected "downloadable application modules" for ThreadX

Very neat. Given there's "no limit" as to the number of pages or threads that a customer might use, do you need to support demand-loading of the TLBs? That's more complex than statically loading them.

> How the heck is there no app note or example from freescale on how to use this awful thing?!

There haven't been many App Notes like that (on the complex core stuff) since Freescale was Motorola. I guess that if you want to use the MMU you're expected to be using a standard OS like Linux, and the MMU support there is based on all the previous CPUs it ran on. So maybe you're meant to have first implemented your OS on the MC68010/MC68451 combination in the mid-1980s, then used the MC68020/MC68851, then MC68030, then all the Power chips starting with the ones that Apple used. Maybe there is an App Note that would have helped you, but it was probably written by IBM in the 1990s.

I worked on an MPC860 system years ago (1998 and 1999), and it took over a year to get all the bugs out of the MMU TLB Reload code as the Motorola example code back then was incomplete. One thing I did add to that system that was very useful was to allocate 1G of virtual space for "malloc()" and to give every call to "malloc()" its own MMU page. That found all the use-after-free bugs really quickly.

References here, showing how bad it was in 1999. Even the Motorola provided MPC860 Cache Flush code didn't work:

https://groups.google.com/forum/#!topic/comp.sys.powerpc.tech/gatLfnt5iE0
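The one-page-per-malloc() trick mentioned above can be sketched on a POSIX host with mmap() and mprotect(). This is not the MPC860 code, just an illustration of the idea under the assumption that address space is plentiful and never reused. `page_malloc` and `page_free` are hypothetical names, and the caller has to pass the size back on free.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round n up to a whole number of pages. */
static size_t round_up_pages(size_t n, size_t page)
{
    return (n + page - 1) / page * page;
}

/* Give every allocation its own page-aligned mapping. */
static void *page_malloc(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p;

    if (n == 0)
        n = 1;
    p = mmap(NULL, round_up_pages(n, page), PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "Free" by revoking all access but keeping the range reserved,
 * so any later use of the stale pointer faults immediately.
 * Returns 0 on success, -1 on failure (as mprotect does). */
static int page_free(void *p, size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    assert(((uintptr_t)p % page) == 0);  /* must come from page_malloc */
    if (n == 0)
        n = 1;
    return mprotect(p, round_up_pages(n, page), PROT_NONE);
}
```

Because freed ranges are never unmapped, this burns address space quickly, which is presumably why a large dedicated virtual region was set aside for it.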

Tom

scottlarson · ‎03-30-2015

Hey Tom,

I am statically loading the MMU tables, so there will be a hard limit on the size of each module. Reads of instruction memory will be dynamic though - I leave one DTLB entry for instruction reads.
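One way to read this design is that a data-read miss inside the module's code region reloads the single reserved DTLB entry, and anything else is a protection violation. The sketch below models only that decision on a host. It is a guess at the shape of such a handler, not the actual ThreadX code: `struct module`, `on_data_fault` and the 8 KB page size are assumptions, and the real TLB write is reduced to a comment.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x2000u  /* assumed 8 KB pages */

enum fault_action {
    FAULT_RELOAD_CODE_WINDOW,  /* remap the reserved DTLB entry and retry */
    FAULT_VIOLATION            /* access is outside what the module may touch */
};

/* Hypothetical per-module state. */
struct module {
    uint32_t code_base;    /* start of the identity-mapped code region */
    uint32_t code_size;    /* size of the code region in bytes */
    uint32_t window_page;  /* code page currently readable via the reserved DTLB entry */
};

/* Decide what to do with a data access fault at addr. */
static enum fault_action on_data_fault(struct module *m, uint32_t addr, int is_write)
{
    assert((m->code_base % PAGE_SIZE) == 0);

    /* Code is never writable; the unsigned subtraction also rejects
     * addresses below code_base because the result wraps to a large value. */
    if (is_write || addr - m->code_base >= m->code_size)
        return FAULT_VIOLATION;

    m->window_page = addr & ~(PAGE_SIZE - 1u);
    /* A real handler would now rewrite the reserved DTLB entry so that
     * window_page is mapped read-only, then return to retry the access. */
    return FAULT_RELOAD_CODE_WINDOW;
}
```

Only one code page is readable as data at a time in this model, so code that reads constants scattered across many pages would fault repeatedly.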

I remember back in college reading the programmer's guide for the 68k. I thought it was well written and didn't leave me in the dark.

That said, I still don't trust any vendor-supplied code.

Now to write a fault handler for the MMU!

Cheers,

Scott