Packing 12 bit data on MC9S12XET256

stevereeno · ‎07-02-2010

We are sampling some A2D data and storing it to external RAM. The data comes in threes. Initially, we created a structure like this:

typedef struct {
    uint16_t xData;
    uint16_t yData;
    uint16_t zData;
} dataStructureType;

(We use the MISRA C recommended data types.)

We'd like to pack the data, because we're storing 500 MBytes of data in external RAM and each 16 bit field holds only 12 bits, so 25% of the RAM goes unused. If we group the data into four sets of three 12 bit unsigned integers, we get 144 bits, which is divisible by 16. We thought about the following structures:

typedef struct {
    uint16_t xData:12;
    uint16_t yData:12;
    uint16_t zData:12;
} dataStructureType;

typedef struct {
    dataStructureType dataSet0;
    dataStructureType dataSet1;
    dataStructureType dataSet2;
    dataStructureType dataSet3;
} dataSetStructureType;

I'm not quite sure what we'll get with this though. Will it even pack the data? Is there a better way to structure it or should we just write routines to pack the data? I'd like to make the data explicit and that's why I'm avoiding routines to pack the data.

Steve

kef · ‎07-03-2010

S12X is able to address up to 8MB. To address 500MB external RAM, are you using some extra hardware to extend memory paging over existing limits?

The data inside the dataStructureType struct is packed, but the size of this struct in bits (3 * 12 = 36) isn't divisible by 8, so dataSetStructureType can't be packed. In the best case the size of dataSetStructureType is 5 bytes (size of dataStructureType) * 4 (dataStructureType items) = 20 bytes, or 160 bits. On every 20-byte dataSetStructureType item you will lose 2 bytes to padding bits, or 10%. That's better than 25%, but I would define a big array of chars and write routines that read and write packed data directly from and to that array.
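As a rough illustration of the "array of chars plus routines" idea, here is a minimal sketch in portable C. The names pack12_put and pack12_get are hypothetical, and the byte layout (two samples sharing three bytes, most significant bits first) is just one possible choice, not something prescribed in this thread. It uses a plain pointer, so on the real target the buffer would still need whatever far or paged access the external RAM requires.

```c
#include <stdint.h>

/* Hypothetical helper: store the n-th 12 bit sample in a byte buffer.
 * Two samples share three bytes. Even samples start on a byte boundary,
 * odd samples start in the low nibble of the shared middle byte.
 * n is 32 bits wide so that n * 3 cannot overflow on a 16 bit target. */
static void pack12_put(uint8_t *buf, uint32_t n, uint16_t value)
{
    uint32_t off = (n * 3u) / 2u;   /* byte holding the sample's top bits */

    value &= 0x0FFFu;               /* keep only the 12 data bits */

    if ((n & 1u) == 0u) {
        buf[off]     = (uint8_t)(value >> 4);
        buf[off + 1] = (uint8_t)((buf[off + 1] & 0x0Fu) | ((value & 0x0Fu) << 4));
    } else {
        buf[off]     = (uint8_t)((buf[off] & 0xF0u) | (value >> 8));
        buf[off + 1] = (uint8_t)(value & 0xFFu);
    }
}

/* Hypothetical helper: read the n-th 12 bit sample back. */
static uint16_t pack12_get(const uint8_t *buf, uint32_t n)
{
    uint32_t off = (n * 3u) / 2u;

    if ((n & 1u) == 0u) {
        return (uint16_t)(((uint16_t)buf[off] << 4) | (buf[off + 1] >> 4));
    }
    return (uint16_t)((((uint16_t)buf[off] & 0x0Fu) << 8) | buf[off + 1]);
}
```

With this layout, four x/y/z triples (12 samples) occupy exactly 18 bytes, and sample k of triple t is simply index 3 * t + k.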

stevereeno · ‎07-06-2010

Thanks for the reply. I think we're going to do something like you suggested. It'll take a little more documentation though.

My bad. I meant to say 500 kBytes. We broke the data into 64 kByte arrays. Actually they're a little bit smaller, because we saved some extra RAM for other data. Here are the commands that I used from the .prm file:

SENSOR_PAGE_00 = NO_INIT 0x200000'G TO 0x20EA5F'G;
SENSOR_PAGE_01 = NO_INIT 0x20EA60'G TO 0x21D4BF'G;
SENSOR_PAGE_02 = NO_INIT 0x21D4C0'G TO 0x22BF1F'G;
SENSOR_PAGE_03 = NO_INIT 0x22BF20'G TO 0x23A97F'G;
SENSOR_PAGE_04 = NO_INIT 0x23A980'G TO 0x2493DF'G;
SENSOR_PAGE_05 = NO_INIT 0x2493E0'G TO 0x257E3F'G;
SENSOR_PAGE_06 = NO_INIT 0x257E40'G TO 0x26689F'G;
SENSOR_PAGE_07 = NO_INIT 0x2668A0'G TO 0x2752FF'G;
SENSOR_PAGE_EXTRA = NO_INIT 0x275300'G TO 0x27CFFF'G;

SENSOR_RAM DISTRIBUTE_INTO SENSOR_PAGE_00,
                           SENSOR_PAGE_01,
                           SENSOR_PAGE_02,
                           SENSOR_PAGE_03,
                           SENSOR_PAGE_04,
                           SENSOR_PAGE_05,
                           SENSOR_PAGE_06,
                           SENSOR_PAGE_07;
SENSOR_RAM_EXTRA INTO SENSOR_PAGE_EXTRA;

We set the array size to the sensor page size and used software to switch between pages. We're using global addressing. We'll probably get rid of the SENSOR_RAM_EXTRA and put that into the internal RAM since we'll have plenty of internal RAM.

kef · ‎07-07-2010

I wonder why you are defining all those 60000-byte segments. Isn't it enough to define just a single external RAM 'G segment and specify placement into external RAM?

EXT_RAM = NO_INIT 0x200000'G TO 0x27FFFF'G;

It would make sense to define 64kB sized 'G segments to ensure that no single allocated object crosses a 64kB boundary. This allows the GPAGE setting to stay the same while accessing all parts of an object (array).

But 60000-byte segments, some of them crossing 64kB boundaries? A segment crossing a 64kB boundary means that if you allocate an object to it, part of the object can end up in one GPAGE and another part in another GPAGE. As far as I know, Codewarrior sets GPAGE once per object. For example, let's define a big array:

#pragma DATA_SEG __GPAGE_SEG EXTRAM
char sensordata[130000];
#pragma DATA_SEG DEFAULT

and disassemble this piece of code:

{

sensordata[12345] = 5;
sensordata[123450] = 5;

}

   15:     sensordata[12345] = 5;
0000 c605         [1]     LDAB #5
0002 8600         [1]     LDAA #GLOBAL_PAGE(sensordata)
0004 5a10         [2]     STAA /*GPAGE*/16
0006 187b0000     [4]     GSTAB sensordata:12345
   16:     sensordata[123450] = 5;
000a 187b0000     [4]     GSTAB sensordata:57914

As you can see, 1) GPAGE is set up once, and 2) access to an array element with index > 64k just chops the array index down to 16 bits (0x1E23A = 123450, 0xE23A = 57914). Of course that won't work.

The same bad things will happen if a small <64k object crosses a 64kB boundary. The compiler just doesn't know that parts of your object are in different GPAGEs; it assumes the object fits in one page.
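The truncation in the listing can be reproduced with ordinary integer arithmetic. This is only an illustration in portable C with hypothetical helper names, and it assumes the array starts on a 64kB boundary, so that the byte index splits cleanly into a page increment and an in-page offset.

```c
#include <stdint.h>

/* Hypothetical helper: how many 64kB pages past the array start an index lies. */
static uint8_t page_of(uint32_t idx)
{
    return (uint8_t)(idx >> 16);
}

/* Hypothetical helper: the 16 bit offset that survives when the index is chopped. */
static uint16_t offset_of(uint32_t idx)
{
    return (uint16_t)(idx & 0xFFFFu);
}
```

For index 123450 the offset is 57914 and the page increment is 1. The generated code keeps the offset but never applies the page increment, so the write lands 64kB too low.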

Defining a single big >64kB array would require you to switch GPAGE manually (it can be done in C) and to perform the array access in assembler (you need some of GSTAB, GLDAA and the other Gxxx instructions). If assembler is not an option for you, then you should revise your prm file so that the 'G segments don't cross 64kB GPAGE boundaries.

kef · ‎07-07-2010

Please disregard what I said about the need to use assembler to access big >64k arrays. There are __far24 pointers and you can use them.

#pragma DATA_SEG __GPAGE_SEG EXTRAM
char sensordata[130000];
#pragma DATA_SEG DEFAULT

this won't work:

{

sensordata[123450] = 5;

}

but this will work:

{

char * __far24 p;

p = (char * __far24)sensordata + 123450;
*p = 5;
}

However, __far24 won't help if the size of an array element is >1 and some Nth element crosses a 64k boundary. So if you are going to use an array of shorts, an array of longs or an array of structs, you should take care of proper alignment.
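One way to reason about that alignment concern is to check whether a given element straddles a 64kB boundary. The sketch below is portable C with a hypothetical function name. The base address 0x200000 in the note after it is taken from the prm file earlier in the thread, and the 6-byte element corresponds to the unpacked three-field struct.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical check: true if an object of `size` bytes (size >= 1)
 * starting at global address `addr` spans two 64kB pages. */
static bool crosses_64k(uint32_t addr, uint32_t size)
{
    return (addr >> 16) != ((addr + size - 1u) >> 16);
}
```

With 6-byte elements starting at 0x200000, element 10922 begins at offset 65532 and ends at 65537, so it straddles the boundary (65536 is not a multiple of 6). A 2-byte element at an even address never does.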

stevereeno · ‎07-12-2010

Thanks for the information. So it looks like the best approach for RAM utilization would be to initialize the whole memory segment as a large 8 bit array and pack the data using the 8 bit array elements. I'll need to look at the timing, because we're using very tight loops.

kef · ‎07-14-2010

__far24 pointers are slower than other methods because they use full 24 bit pointer arithmetic. But with a __far24 pointer you don't have to divide a big >64k array into smaller pieces. That's nice.

If __far24 turns out to be too slow for you, then you could define 64k sized memory segments in global memory and fill them with 64k sized, 64k aligned arrays. Then you would select the array to write to in software.

Since global addressing uses the slower Gxxx CPU instructions, you could also use the CS3 chip select to select external memory. CS3 will allow you to use RPAGE addressing, which might be a bit faster.

But to notice any bigger difference among these 3 ways, you would need to implement some double buffering scheme, or something else that reduces page switching to only the cases where it is really required. I mean that calculating the array number each time a new sample is taken is slow, but filling the page window until the end of the page is reached, and only then switching to a new RPAGE, would be much faster. Going further, I have doubts about which will be faster: slower but bigger 64k arrays, or faster but smaller 4k RPAGE arrays. Bigger arrays may require less page switching and possibly less CPU overhead. Also, calculating the array number is faster in the 64k case, because the long int idx / 65536 operation is as simple as taking the upper half of idx. The 4k case is slower. Compared to 64k, in the best case idx / 4096 takes at least 4 additional CPU cycles.

So I don't know what's best for you. I hope you have considered improving the performance of other parts of your software, and maybe moving some tasks from CPU12X to XGATE. If that's easier to improve, then maybe you don't need to bother about the performance of data storage. Regarding data packing, it would be fastest not to pack the data at all and lose those 25% of available RAM.
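The "fill the page until its end, then switch" idea can be sketched as a write cursor that keeps the page number and the in-page offset separately, so no division is needed per sample. This is a hedged illustration in portable C. The type and function names are hypothetical, it assumes 64kB pages, and actually writing the page register (GPAGE or RPAGE) is left to the caller because that part is target specific.

```c
#include <stdint.h>

/* Hypothetical write cursor: page number plus 16 bit offset within the page. */
typedef struct {
    uint8_t  page;    /* upper address bits, e.g. the value to load into GPAGE */
    uint16_t offset;  /* byte offset inside the current 64kB page */
} write_cursor;

/* Advance the cursor by one byte.
 * Returns 1 when the page changed (the caller must then reprogram
 * the page register), otherwise 0. */
static int cursor_advance(write_cursor *c)
{
    c->offset++;
    if (c->offset == 0u) {   /* offset wrapped past 0xFFFF */
        c->page++;
        return 1;
    }
    return 0;
}
```

In the common case the only per-byte cost is an increment and a compare; the page register is touched once every 65536 bytes.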
