Square root algorithm

I need a square root function for my LPC1111 application that takes a uint32_t and returns a uint32_t. I want to avoid floats and the full math library to save flash. I have 2k flash and plenty of RAM available.

Does anybody have an idea for a simple algorithm or method?
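
For reference, one approach that seems to fit these constraints is the bit-by-bit (digit-by-digit) integer square root, which uses only shifts, additions, and comparisons. The following is a minimal sketch, not a tuned implementation: the function name isqrt32 is hypothetical, and it is assumed that the floor of the square root is the wanted result.

```c
#include <stdint.h>

/* Hypothetical helper: returns floor(sqrt(x)) without floats or division. */
uint32_t isqrt32(uint32_t x)
{
    uint32_t result = 0;
    uint32_t bit = UINT32_C(1) << 30;  /* highest power of four in 32 bits */

    /* Start at the highest power of four that does not exceed x. */
    while (bit > x) {
        bit >>= 2;
    }

    /* Decide one result bit per iteration, from most to least significant. */
    while (bit != 0) {
        if (x >= result + bit) {
            x -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return result;
}
```

The loop runs at most 16 times for a 32-bit input and needs no lookup table, so it should cost little flash. Rounding to nearest instead of rounding down would need an extra step that is not shown here.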