3D rendering

lpcware
NXP Employee
Content originally posted in LPCWare by zp on Fri Sep 06 10:27:25 MST 2013
In DMIPS terms, the LPC43xx comes quite close to an Intel Pentium I running at over 120 MHz. So I decided to evaluate what the LPC43xx can do in 3D graphics, or, if you prefer, to test my own ability to get the best out of the LPC4357 in 3D.

You can have a look at this 11 MB, 800x600 video. I'm working on my own DIY board, programming in plain C with the Rowley CrossWorks toolchain.

LPC4357 3D rendering

This is not a 3D demo, just a visual test of the perspective-correct texture mapper. I'm at the very beginning.
Screen resolution is 640x480 at 16 bpp.

A free textured low-poly 3D model was taken from the psionic site.
Backface culling, frustum culling and the perspective mapper seem to work quite well, which means all the other interpolated values (such as vertex colors for blending and normals for Gouraud shading) will interpolate correctly as well. An optimized per-frame recalculation of the mesh normals also costs almost nothing with models like this on the LPC43xx.
No optimization of any kind has been applied yet, so the overall FPS is quite low.

In the top left corner:
white number - FPS (updated every second, counted over one second);
red number - previous frame time in milliseconds (updated every frame);
white number - the number of writes to the (float-typed) Z-buffer (updated every second, counted over one second);
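One possible way to keep the three on-screen counters listed above: the frame time is updated every frame, while FPS and Z-buffer writes are latched once per second. The struct and function names are hypothetical, and the millisecond tick is assumed to come from some free-running timer (for example SysTick).

```c
#include <stdint.h>

/* Hypothetical HUD statistics. */
typedef struct {
    uint32_t second_start_ms;   /* start of the current one-second window */
    uint32_t frames_in_window;
    uint32_t zwrites_in_window;
    uint32_t last_frame_ms;     /* red number: previous frame time */
    uint32_t fps;               /* white number: frames in the last full second */
    uint32_t zwrites_per_sec;   /* white number: Z-buffer writes in the last full second */
} hud_stats;

/* Call once per frame with the current tick, the tick at which this frame
   started and the number of Z-buffer writes made during this frame. */
static void hud_end_frame(hud_stats *s, uint32_t now_ms, uint32_t frame_start_ms, uint32_t zwrites)
{
    s->last_frame_ms = now_ms - frame_start_ms;   /* unsigned math survives tick wrap-around */
    s->frames_in_window++;
    s->zwrites_in_window += zwrites;
    if (now_ms - s->second_start_ms >= 1000u) {   /* one second elapsed: latch and restart */
        s->fps = s->frames_in_window;
        s->zwrites_per_sec = s->zwrites_in_window;
        s->frames_in_window = 0;
        s->zwrites_in_window = 0;
        s->second_start_ms = now_ms;
    }
}
```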

The bottom left shows the local composite matrix of the 3D object.

Compared with modern ARM 3D hardware accelerators (such as the SGX530 in TI Sitara or the Mali-400 in Freescale i.MX5/6), this test manages only about 20K textured triangles per second, against some 20M with hardware 3D :). By the way, all hardware accelerators offer free 4x anti-aliasing and many other things we can only dream of...

Although the low-poly geometry is processed quite fast, texturing and rasterization drag the overall result down. But if the Z-buffer is accessed up to 50K times per frame, the overall FPS is about 20 (i.e. without close-ups). This means we can keep at least several low-poly (1K-polygon) 3D objects in the scene. Optimizations of any kind are still far in the future. At the very least we have 640x350 resolution, palettized color modes or a WQVGA/QVGA LCD in reserve as a last bastion.

Obviously we need a pixel shader. Maybe a second LPC43xx without an LCD could become the "master", while the one with the LCD acts as a pixel shader with the help of its M0 core. (Some censored heretical thoughts about a Vybrid 6 go here.)

Content originally posted in LPCWare by zp on Tue Sep 17 14:42:36 MST 2013

Quote:
I can do 2D simple graphics

Me too :)
2D lets you turn VSYNC on. For example, in VGA mode with 200 32x32 sprites you can easily get 60 FPS. I also experimented with as many as 125,000 sprites (each sprite is a 32x32 bitmask with attributes such as color, position, speed, etc.). The LPC4357 gives me about 0.25 FPS in that case :). One thousand sprites, though, keep the frame rate above 30.
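A sketch of a sprite record in the spirit described above (a 32x32 bitmask plus color, position and speed) and a clipped blit into a 16 bpp frame buffer. The layout (one `uint32_t` per row, most significant bit = leftmost pixel, RGB565 color) is an assumption for illustration, not the author's format.

```c
#include <stdint.h>

/* Hypothetical sprite: 32x32 one-bit mask plus attributes. */
typedef struct {
    uint32_t mask[32];   /* row r: bit (31 - c) set => pixel (c, r) is drawn */
    uint16_t color;      /* RGB565 fill color */
    int16_t  x, y;       /* top-left position in pixels */
    int16_t  dx, dy;     /* speed in pixels per frame */
} sprite;

/* Draw one sprite into a w x h RGB565 frame buffer, clipping at the edges. */
static void sprite_draw(uint16_t *fb, int w, int h, const sprite *s)
{
    for (int r = 0; r < 32; ++r) {
        int py = s->y + r;
        if (py < 0 || py >= h) continue;
        uint32_t bits = s->mask[r];
        /* Test the top bit, then shift; stop early once the row is empty. */
        for (int c = 0; c < 32 && bits; ++c, bits <<= 1) {
            int px = s->x + c;
            if ((bits & 0x80000000u) && px >= 0 && px < w)
                fb[py * w + px] = s->color;
        }
    }
}

/* Advance by the sprite's speed and reverse direction at the screen edges. */
static void sprite_move(sprite *s, int w, int h)
{
    s->x += s->dx;
    s->y += s->dy;
    if (s->x < 0 || s->x > w - 32) s->dx = -s->dx;
    if (s->y < 0 || s->y > h - 32) s->dy = -s->dy;
}
```

With this format, empty rows and the empty right-hand tail of a row are skipped cheaply, which matters when thousands of sprites are drawn per frame.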


Quote:
you are using DRAM for the frame buffers and internal FLASH for code execution

Of course. The SDRAM holds all the graphics data, code runs from the MCU's flash, and the stack and system heap are both in SRAM. For my programs I use custom memory management: a mix of statically allocated buffers and dynamically allocated chunks. With 64 MB of 32-bit SDRAM on board, the first 8 MB, for example, are used for video buffers (a triple-buffered chain for video output) and a few other needs; next comes the area for dynamically allocated blocks (in 1 KB, 16 KB or 256 KB chunks). So wherever needed I can freely use my own zpmalloc, zpcalloc, zprealloc and zpfree functions instead of the standard malloc, calloc, realloc and free. The last 4 MB are for a RAM disk.
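The allocator is only described in outline, so the following is a hypothetical sketch of the same idea, not the author's zpmalloc/zpfree: the SDRAM map from the post expressed as offset constants, plus fixed-size chunk pools (1 KB, 16 KB and 256 KB on the target) with constant-time allocation and a free that finds the owning pool by address. Pools are assumed to be sorted by ascending chunk size, and chunk sizes are assumed to be multiples of the pointer alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64 MB SDRAM map (offsets from the SDRAM base):
   8 MB video + misc, the pooled area, and a 4 MB RAM disk at the top. */
#define SDRAM_SIZE     (64u << 20)
#define POOL_OFFSET    (8u << 20)
#define RAMDISK_SIZE   (4u << 20)
#define RAMDISK_OFFSET (SDRAM_SIZE - RAMDISK_SIZE)
#define POOL_BYTES     (RAMDISK_OFFSET - POOL_OFFSET)

typedef struct free_node { struct free_node *next; } free_node;

typedef struct {
    uintptr_t  base, end;    /* address range owned by this pool */
    size_t     chunk_size;   /* e.g. 1 KB, 16 KB or 256 KB */
    free_node *free_list;    /* free chunks, singly linked through the chunks themselves */
} chunk_pool;

static void pool_init(chunk_pool *p, void *base, size_t bytes, size_t chunk_size)
{
    assert(chunk_size >= sizeof(free_node));
    size_t n = bytes / chunk_size;
    uint8_t *b = (uint8_t *)base;
    p->base = (uintptr_t)b;
    p->end = (uintptr_t)(b + n * chunk_size);
    p->chunk_size = chunk_size;
    p->free_list = NULL;
    for (size_t i = n; i > 0; --i) {            /* lowest address ends up at the head */
        free_node *node = (free_node *)(b + (i - 1) * chunk_size);
        node->next = p->free_list;
        p->free_list = node;
    }
}

/* Take a chunk from the smallest size class that fits and still has one free. */
static void *chunk_alloc(chunk_pool *pools, int npools, size_t size)
{
    for (int i = 0; i < npools; ++i) {
        if (size <= pools[i].chunk_size && pools[i].free_list) {
            free_node *node = pools[i].free_list;
            pools[i].free_list = node->next;
            return node;
        }
    }
    return NULL;
}

/* Return a chunk to whichever pool owns its address; no size is needed. */
static void chunk_free(chunk_pool *pools, int npools, void *ptr)
{
    uintptr_t a = (uintptr_t)ptr;
    for (int i = 0; i < npools; ++i) {
        if (a >= pools[i].base && a < pools[i].end) {
            free_node *node = (free_node *)ptr;
            node->next = pools[i].free_list;
            pools[i].free_list = node;
            return;
        }
    }
}
```

On the target, each pool's `base` would point somewhere inside the SDRAM area starting at `POOL_OFFSET`; for testing on a PC any suitably aligned static array works.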

But the main performance bottleneck is not stack placement; it is the SDRAM speed and how its bandwidth is shared. By the way, it is a pity the LPC does not implement CCM/TCM memory like the STM or Freescale Cortex-M4 parts; it could really boost performance, especially in our case.

In every frame, for a single render target, you must:

a) restore/recreate the background
b) clear the Z-buffer and all similar common structures
c) process input and update the 3D world logic (AI)
d) apply frustum and backface culling
e) render the 3D world {a long list of what this means and in which order goes here}
f) gather some statistics about the frame and swap/shift the buffer chain

In real life these steps are not so strictly ordered.
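The per-frame steps a) to f) can be sketched as a single C function for one render target with a triple-buffered chain. Everything here is illustrative: the stage callbacks stand in for steps c) to e), the float Z-buffer is cleared to `FLT_MAX` on the assumption that smaller values are nearer, and handing the new front buffer to the LCD controller is left as a comment.

```c
#include <float.h>
#include <stddef.h>
#include <stdint.h>

#define NBUF 3   /* triple-buffered chain */

/* Hypothetical per-render-target state. */
typedef struct {
    int       w, h;
    uint16_t *fb[NBUF];   /* RGB565 frame buffers (in SDRAM on the target) */
    float    *zbuf;       /* float Z-buffer, one entry per pixel */
    int       shown;      /* index the LCD currently scans out */
    int       draw;       /* index being rendered into */
    uint32_t  frame;
} render_target;

typedef void (*stage_fn)(render_target *rt);

/* One frame following steps a) to f); stage callbacks may be NULL. */
static void run_frame(render_target *rt, uint16_t bg,
                      stage_fn update, stage_fn cull, stage_fn render)
{
    size_t n = (size_t)rt->w * (size_t)rt->h;
    uint16_t *fb = rt->fb[rt->draw];
    for (size_t i = 0; i < n; ++i) fb[i] = bg;              /* a) background */
    for (size_t i = 0; i < n; ++i) rt->zbuf[i] = FLT_MAX;   /* b) clear Z (smaller = nearer) */
    if (update) update(rt);                                 /* c) input + world logic */
    if (cull)   cull(rt);                                   /* d) frustum/backface culling */
    if (render) render(rt);                                 /* e) rasterize */
    /* f) statistics would be gathered here; then rotate the chain. On the
       LPC43xx the new 'shown' buffer would be handed to the LCD controller. */
    rt->shown = rt->draw;
    rt->draw = (rt->draw + 1) % NBUF;
    rt->frame++;
}
```

With three buffers, the buffer picked for drawing is always the one that was scanned out two frames ago, so rendering never has to wait for the display to release its buffer.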

The SDRAM is a shared resource: the LCD AHB master eats one third of its bandwidth (at VGA resolution, 16 bpp), and every additional master requesting the SDRAM further reduces the transfer speed for the rest. So no DMA or coprocessor can help when there are huge data streams to and from SDRAM, and there really are.
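To put the LCD master's share in perspective, a quick calculation: scanning out 640x480 at 16 bpp and 60 Hz needs 36,864,000 bytes/s (about 36.9 MB/s) before blanking is even considered, and steps a) and b) alone (assuming a full-screen background fill plus clearing a 4-byte float Z-buffer) write 1,843,200 bytes per frame, i.e. the same 36.9 MB/s again at 20 FPS. The one-third figure is the author's; the exact fraction depends on the SDRAM clock and EMC efficiency, which the post does not give. The helper names are hypothetical.

```c
#include <stdint.h>

/* Bytes per second the LCD master must read just to refresh the screen
   (blanking ignored, so this is a lower bound). */
static uint64_t scanout_bytes_per_sec(uint32_t w, uint32_t h, uint32_t bytes_per_px, uint32_t hz)
{
    return (uint64_t)w * h * bytes_per_px * hz;
}

/* Bytes written per frame by steps a) and b): a full-screen background fill
   plus clearing a float Z-buffer. */
static uint64_t clear_bytes_per_frame(uint32_t w, uint32_t h, uint32_t bytes_per_px)
{
    return (uint64_t)w * h * (bytes_per_px + (uint64_t)sizeof(float));
}
```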

There are some ways to improve the situation, from revising the overall concept to using non-SDRAM memory. The last free block, the 16 KB ETB SRAM, is too small for our needs. It is possible to connect an external SPI SRAM (e.g. a 128 KB 20 MHz Quad SPI part from Microchip, a 512 KB 40 MHz SPI part from Everspin or a 128K 40/104 MHz SPI part from Cypress; Everspin wins on cost per MB here) or to build a CPLD plus dynamic or (pseudo-)static RAM and connect it to the SPIFI or SGPIO interface.
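For the external SPI SRAM option, a sketch of how a transaction is usually framed on such serial memories: an opcode followed by a 24-bit address, after which data streams in or out. The READ (0x03) and WRITE (0x02) opcodes match common serial SRAM/MRAM parts such as Microchip's 23LC1024 and Everspin's MR25H40 in plain SPI mode, but check the datasheet of the actual chip. The `spi_xfer_fn` transport is hypothetical and would map onto the board's own SSP or SPIFI driver.

```c
#include <stddef.h>
#include <stdint.h>

#define SPI_MEM_WRITE 0x02u   /* common serial SRAM/MRAM write opcode */
#define SPI_MEM_READ  0x03u   /* common serial SRAM/MRAM read opcode */

/* Hypothetical board-level transport: with chip select held low, clock out
   tx_len bytes, then clock in rx_len bytes. */
typedef void (*spi_xfer_fn)(const uint8_t *tx, size_t tx_len, uint8_t *rx, size_t rx_len);

/* Opcode followed by a 24-bit big-endian address; returns the header length. */
static size_t spi_mem_header(uint8_t out[4], uint8_t opcode, uint32_t addr)
{
    out[0] = opcode;
    out[1] = (uint8_t)(addr >> 16);
    out[2] = (uint8_t)(addr >> 8);
    out[3] = (uint8_t)addr;
    return 4;
}

/* Burst read: assumes the part is in sequential mode, so a single command
   streams len bytes starting at addr. */
static void spi_mem_read(spi_xfer_fn xfer, uint32_t addr, uint8_t *dst, size_t len)
{
    uint8_t hdr[4];
    size_t hlen = spi_mem_header(hdr, SPI_MEM_READ, addr);
    xfer(hdr, hlen, dst, len);
}
```

The 4-byte header is pure overhead, so such memories pay off mainly for long sequential bursts (for example whole texture rows), not for scattered pixel access.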

Even better would be a standalone pixel pipeline, i.e. adding one more MCU. Where are the LPC437x MCUs with their third core dedicated to SGPIO transfers?

But we have to be a little more tolerant of the LPC43xx: these parts are not intended to be 3D multimedia processors. To push 3D performance up, you have to change platforms.

Blending is also very hard without hardware acceleration; with a pure software renderer the number of blended areas (layers) will be very limited. It will be very interesting to see how the 2D acceleration on the STM32F4x9 works. And I'll surely try the newest Vybrid 3 with its free multilevel blending. By the way, lightweight FTDI FT800 video chips (with touchscreen and audio) are already available and are the perfect add-on for any MCU without an internal LCD controller, such as the LPC8xx, LPC11xx, etc. The FT800 uses frame-bufferless technology (as the Vybrid does), creating all the graphics on the fly.
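As an illustration of why software blending is expensive, here is a well-known generic per-pixel RGB565 blend with a 5-bit alpha (0 to 32), not something taken from the post. It spreads the green channel into the upper half-word so that all three channels are scaled with a single multiply; even so, every blended pixel is a read-modify-write of the frame buffer in SDRAM, and every extra layer repeats that traffic.

```c
#include <stdint.h>

/* Blend src over dst; alpha 0..32 (32 = fully src). The 0x07E0F81F mask keeps
   red and blue in the low half-word and moves green to bits 21-26, leaving gap
   bits so per-channel products cannot carry into each other. */
static uint16_t blend565(uint16_t src, uint16_t dst, uint32_t alpha)
{
    uint32_t s = src, d = dst;
    s = (s | (s << 16)) & 0x07E0F81Fu;
    d = (d | (d << 16)) & 0x07E0F81Fu;
    uint32_t r = ((((s - d) * alpha) >> 5) + d) & 0x07E0F81Fu;
    return (uint16_t)(r | (r >> 16));
}

/* Blend a horizontal span of n source pixels over the destination. */
static void blend_span(uint16_t *dst, const uint16_t *src, int n, uint32_t alpha)
{
    for (int i = 0; i < n; ++i)
        dst[i] = blend565(src[i], dst[i], alpha);
}
```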


Quote:
Which display are you using with that?

A common standard VGA monitor. I just added the simplest buffered R-2R video DAC and a standard (slim) VGA D-sub connector to the LPC4357 board. For my taste it is very convenient to have such a robust setup with normal viewing angles and high detail. A standard 40-pin socket for an RGB panel is present as well, so there is no problem creating a binary demo for 480x272 output. Since I'm a victim of the poor design of the Embest LPC4357-EVB and still have this piece of [metal], maybe I'll choose it as the demo target. But there is no ETA yet.

Content originally posted in LPCWare by wells on Sun Sep 15 11:39:00 MST 2013
I think it's interesting, but it's a bit beyond me. I can do simple 2D graphics only :)

I take it you are using DRAM for the frame buffers and internal FLASH for code execution.
How about stacks, heap, and data (non-frame buffer) storage? Is this in DRAM too? You might be able to get a small boost in performance by placing that stuff in internal RAM if you haven't already.

Which display are you using with that? It might be interesting to put a binary demo up at some time if the display is easily obtainable...

Content originally posted in LPCWare by zp on Sat Sep 14 13:21:33 MST 2013
It looks like I am the only person here feeling nostalgic about last century's 3D graphics and software rendering...
At least my previous post achieved a huge (maybe the best?) dviewers/dtime ratio on this forum. That means people want bread and circuses, so the show must go on.

A small step forward: Gouraud shading is now implemented.

[img=640x480]http://www.zaurosoft.com/images/ZP_LPC4357DB/LPC4357_Gourad_shad.png[/img]

This is a copy of the back buffer. The scene contains about 4.3K triangles, and the rusty barrel in Suzi's head is there just as a Z-buffer visual test. Gouraud shading is (sometimes) funny enough (no self-shadowing, problems with some kinds of geometry, etc.) but is still much better than flat shading or no shading at all.
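A minimal sketch of Gouraud shading as described: a Lambert intensity (ambient plus diffuse) computed once per vertex from the recomputed normals, then interpolated along a span and used to scale RGB565 texels. The lighting model, parameter names and function names are assumptions; and unlike the author's renderer, the span interpolation here is affine rather than perspective-correct, to keep it short.

```c
#include <stdint.h>

typedef struct { float x, y, z; } vec3;

/* Lambert term at a vertex: ambient + diffuse * max(0, N.L), clamped to 1.
   N and L are assumed normalized; L points from the surface toward the light. */
static float vertex_intensity(vec3 n, vec3 l, float ambient, float diffuse)
{
    float ndotl = n.x * l.x + n.y * l.y + n.z * l.z;
    float i = ambient + diffuse * (ndotl > 0.0f ? ndotl : 0.0f);
    return i > 1.0f ? 1.0f : i;
}

/* Scale an RGB565 texel by intensity i, clamped to [0, 1]. */
static uint16_t shade565(uint16_t c, float i)
{
    if (i < 0.0f) i = 0.0f;
    if (i > 1.0f) i = 1.0f;
    uint32_t k = (uint32_t)(i * 256.0f);          /* 0..256, 8.8 fixed point */
    uint32_t r = (((c >> 11) & 0x1Fu) * k) >> 8;
    uint32_t g = (((c >> 5) & 0x3Fu) * k) >> 8;
    uint32_t b = ((c & 0x1Fu) * k) >> 8;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* One span: intensity goes linearly from i0 (first pixel) to i1 (last pixel). */
static void gouraud_span(uint16_t *dst, const uint16_t *texels, int n, float i0, float i1)
{
    float di = n > 1 ? (i1 - i0) / (float)(n - 1) : 0.0f;
    float i = i0;
    for (int x = 0; x < n; ++x, i += di)
        dst[x] = shade565(texels[x], i);
}
```

Because the lighting is evaluated only at vertices, highlights that fall inside a triangle are lost, which is one source of the odd-looking results on some geometry mentioned above.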

OK, there is still no optimization, and perspective correction is applied honestly to all interpolated variables, so every rendered pixel requires a reciprocal of the Z value, although there are ways to avoid that; the normals are updated every frame, etc.
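The per-pixel reciprocal comes from perspective-correct interpolation: u/z and 1/z are linear in screen space, so recovering u takes one divide per pixel. The sketch below shows that honest version, plus one common way to avoid most of the divides: compute exact values only every 16 pixels and interpolate affinely in between. The function names and the 16-pixel step are illustrative choices, and z is assumed to be the positive view-space depth at each span end.

```c
/* Exact perspective-correct value at screen-space parameter t in [0, 1],
   given u/z (a0, a1) and 1/z (q0, q1) at the span ends. */
static float persp_at(float a0, float a1, float q0, float q1, float t)
{
    return (a0 + (a1 - a0) * t) / (q0 + (q1 - q0) * t);   /* the per-pixel divide */
}

/* Honest version: one divide for every pixel of the span. */
static void perspective_span(float *out, int n, float u0, float z0, float u1, float z1)
{
    float a0 = u0 / z0, a1 = u1 / z1, q0 = 1.0f / z0, q1 = 1.0f / z1;
    float dt = n > 1 ? 1.0f / (float)(n - 1) : 0.0f;
    for (int x = 0; x < n; ++x)
        out[x] = persp_at(a0, a1, q0, q1, (float)x * dt);
}

#define PERSP_STEP 16   /* exact samples every 16 pixels (illustrative choice) */

/* Cheaper version: exact values only at sub-span ends, affine in between. */
static void perspective_span_sub(float *out, int n, float u0, float z0, float u1, float z1)
{
    float a0 = u0 / z0, a1 = u1 / z1, q0 = 1.0f / z0, q1 = 1.0f / z1;
    float dt = n > 1 ? 1.0f / (float)(n - 1) : 0.0f;
    for (int x = 0; x < n; x += PERSP_STEP) {
        int xe = (x + PERSP_STEP < n - 1) ? x + PERSP_STEP : n - 1;
        float ua = persp_at(a0, a1, q0, q1, (float)x * dt);
        float ub = persp_at(a0, a1, q0, q1, (float)xe * dt);
        int len = xe - x;
        int count = (xe == n - 1) ? len + 1 : len;   /* last sub-span includes its end */
        for (int k = 0; k < count; ++k)
            out[x + k] = len ? ua + (ub - ua) * (float)k / (float)len : ua;
    }
}
```

For example, with u going from 0 at z = 1 to 1 at z = 3, the screen-space midpoint gets u = 0.25, not 0.5. The subdivided variant trades a small error between samples for roughly 16 times fewer divides.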

The next targets are a simple particle engine, some FX and helpers (and possibly sound output), a primitives library and an animation hierarchy. Then comes basic AI such as pathfinding. After that I'll be ready for a real 3D demo on my LPC4357 board.