An OpenCL program derived from the Apple Developer sample "OpenCL Parallel Reduction Example" fails its accuracy check.
The program uses OpenCL to sum 1,048,576 floating-point numbers, and repeats the sum 1000 times.
The program was run on an i.MX6 Quad with the arguments "gpu" and "float".
The i.MX6 is running Windows Embedded Compact 7 (WEC7, also known as Windows CE 7).
Result = 524317.562500 != 524315.912500
Error: Incorrect results obtained! Max error = 1.750000
The same OpenCL program, modified slightly to run on a Windows 7 64-bit PC with an NVIDIA K600, ran with no accuracy errors.
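For scale: single-precision values between 524288 and 1048576 are spaced 0.0625 apart, so the reported max error of 1.750000 is 28 float steps, a relative error of roughly 3.3e-6. The following is a minimal host-side sketch in plain C, not taken from the attached sources, showing how the order of additions alone changes a float sum. It assumes inputs roughly uniform in [0, 1]; the helper names (fill_uniform, sum_sequential, sum_pairwise, sum_reference, relative_error) are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill data with pseudo-random values in [0, 1].
   Assumption: the real program's input distribution may differ. */
static void fill_uniform(float *data, size_t n, unsigned seed) {
    srand(seed);
    for (size_t i = 0; i < n; ++i) {
        data[i] = (float)(rand() / ((double)RAND_MAX + 1.0));
    }
}

/* Left-to-right sum in single precision. Rounding error can grow
   with n because small values are added to a large accumulator. */
static float sum_sequential(const float *data, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}

/* Pairwise (tree) sum in single precision. This mirrors the general
   shape of a parallel reduction, not the exact kernel in the attachments. */
static float sum_pairwise(const float *data, size_t n) {
    assert(n > 0);
    if (n == 1) {
        return data[0];
    }
    size_t half = n / 2;
    return sum_pairwise(data, half) + sum_pairwise(data + half, n - half);
}

/* Double-precision sum, used as the reference value. */
static double sum_reference(const float *data, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        acc += (double)data[i];
    }
    return acc;
}

/* Relative error of a float result against the double reference.
   Falls back to the absolute error when the reference is zero. */
static double relative_error(float result, double reference) {
    double diff = (double)result - reference;
    if (diff < 0.0) {
        diff = -diff;
    }
    double mag = reference < 0.0 ? -reference : reference;
    return mag > 0.0 ? diff / mag : diff;
}
```

Filling a 1,048,576-element buffer with fill_uniform and comparing relative_error for sum_sequential and sum_pairwise against sum_reference gives a feel for how much two correct float implementations can disagree. A check based on relative error, or on a count of float steps, may be easier to compare across GPUs than a fixed absolute threshold.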
See attachments for the original OpenCL program source and derivatives.
The program has not been run on Linux because my hardware does not run Linux.
Original Attachment has been moved to: PC_AddUsingReduction.zip
Original Attachment has been moved to: WEC7_Add_UsingReduction.zip
Original Attachment has been moved to: Original-OpenCL_Parallel_Reduction_Example.zip