i.MX Processors Knowledge Base
This work is the result of my daughter's idea; she finished it with my guidance. Cradle-1 is a palm-size mini-HPC, the world's first full-function heterogeneous mini-HPC. This is what it looks like:

1 Architecture
        Overall: CPU+GPU heterogeneous, 4 nodes, connected by a 100M Ethernet switch.
        Nodes: Freescale i.MX6 Quad mini-PCs, each with 4 ARM Cortex-A9 cores and 1 Vivante GC2000 GPU.

2 Software
        OS: Ubuntu 11.10 (Linaro)
        OpenCL driver: Vivante GC2000 OpenCL driver
        Compilers: C/C++: gcc 4.6.1; Fortran 90/95: gfortran 4.6.1
        MPI parallel computing: MPICH2 1.4-1
        NFS network file system: nfs-kernel-server 1.2.4
        SSH security: openssh 1:5.8

3 Hardware
        The hardware of all nodes is the same; only the software configurations differ slightly. One node is assigned as the master node, the others are slave nodes. They were originally TV sticks with Android 4.0 installed. Each node's hardware specification is:
        CPU: 4 Cortex-A9 cores at 1.2 GHz
        GPU: 1 Vivante GC2000 GPU
        RAM: 1 GB DDR
        Storage: 8 GB SD
        NIC: USB 2.0 100M Ethernet adapter (not part of the original TV stick; we added it)
        WiFi: 150M
        Display interface: HDMI
        Network switch: 5-port 100M Ethernet switch

4 Network
        Each node has one USB 2.0 NIC and one WiFi interface; WiFi is used as the backup for the NIC connection. Network configuration (baby1 - baby4 are the four computing nodes):
        baby1: 100M NIC 192.168.10.1, WiFi 192.168.0.111
        baby2: 100M NIC 192.168.10.2, WiFi 192.168.0.112
        baby3: 100M NIC 192.168.10.3, WiFi 192.168.0.113
        baby4: 100M NIC 192.168.10.4, WiFi 192.168.0.114

5 Performance
        Cradle-1 has 16 ARM Cortex-A9 cores at 1.2 GHz and 4 Vivante GC2000 GPUs; the total computing power of these 20 computing devices is more than 100 GFLOPS, more powerful than an ordinary desktop. The whole machine is only a little bigger than a palm, and total power consumption is less than 15 watts.
        The overall architecture of Cradle-1 is essentially the same as China's Tianhe-1A or Titan at the Oak Ridge lab: the same class of software stack, Linux + OpenCL + OpenMPI. Cradle-1 supports C/C++ and Fortran 90/95, and almost any parallel computing algorithm can run on it; the only difference is the scale.
        We wrote an MPI parallel program for large matrix multiplication with 4 processes; each process had 5 threads, four for the four CPU cores and one for GPU computing.

6 Appearance
        Front, back, top, left and right views. One node has three interfaces: on the right is the HDMI interface, upper-left is the wireless adapter for keyboard and mouse, lower-left is the power connection. One node is shown running Ubuntu 11.10, with a simple OpenCL program displaying the OpenCL driver information. On a notebook, remote desktop access is used to obtain node baby1's desktop; this is the sign-in desktop of baby1, which has the X11VNC server installed. After signing in to baby1 and opening a terminal, we ran an MPI test program to make sure all babies (baby1 - baby4) were working.

Any comments? Please mail to audrey.tao@hotmail.com
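As an aside on section 5 above: the original matrix-multiplication program is not included in the post. For illustration only, a minimal pure-MPI row-block version (without the per-node GPU thread) might look like the sketch below; the matrix size, file name and build commands are assumptions, not part of the original project.

/* Minimal sketch, not the original Cradle-1 code: each of 4 MPI ranks multiplies
 * its block of rows of A by B. Assumed build/run: mpicc mm_mpi.c -o mm_mpi
 * and mpirun -np 4 ./mm_mpi across the baby1-baby4 nodes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 512                                   /* illustrative matrix size */

int main(int argc, char **argv)
{
    int rank, nprocs, rows, i, j, k;
    double *a, *b, *c, local, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    rows = N / nprocs;                          /* rows of A and C owned by this rank */
    a = malloc(rows * N * sizeof(double));
    b = malloc(N * N * sizeof(double));
    c = calloc(rows * N, sizeof(double));

    for (i = 0; i < rows * N; i++) a[i] = 1.0;  /* dummy data */
    for (i = 0; i < N * N; i++)    b[i] = 2.0;

    for (i = 0; i < rows; i++)                  /* local block multiply */
        for (k = 0; k < N; k++)
            for (j = 0; j < N; j++)
                c[i * N + j] += a[i * N + k] * b[k * N + j];

    local = c[0];                               /* trivial cross-node check */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum of first elements over %d ranks = %f\n", nprocs, total);

    free(a); free(b); free(c);
    MPI_Finalize();
    return 0;
}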
This is a prototype solution for enabling a second display that shows different content on JB4.2.2 SabreSD. It makes use of the Presentation class provided by Android, embedded into the status bar. When the screen is unlocked, the Presentation is shown on the second display. Currently the solution requires one .mp4 video placed in the root of the SD card; of course, you may change it to show anything. The attached files are a layout XML file, a patch and a recorded video. The layout file should be put into the android/frameworks/base/packages/SystemUI/res/layout/ folder, and the patch should be applied to frameworks/base.git. The recorded video shows the dual-display demo as a reference.
Here are two patches to support BT656 and BT1120 output for i.MX6 ipuv3. With these patches, the i.MX6 can support CVBS output through a TV encoder, which is useful for a TV box. "L3.0.35_1.1.0_GA_bt656_output_patch.zip" is the patch for the Freescale L3.0.35_1.1.0_GA i.MX6DQ BSP. "r13.4.1_bt656_output_patch.zip" is the patch for the Freescale Android R13.4.1 BSP.

1. Features supported:
    1) BT656 (8-bit) and BT1120 (16-bit) interlaced output on the display port.
    2) Both RGB and YUV frame buffers for BT656/BT1120 output.
    3) PAL and NTSC modes.
    4) On-the-fly switching between PAL and NTSC modes.
    5) CVBS output based on the adv7391 TV encoder.

2. Hardware link between the i.MX6 and the adv7391 TV encoder chip:
    IPU1_DI0_DISP_CLK connected to the adv7391 CLKIN pin.
    IPU1_DISP0_DAT_23~DISP0_DAT_16 connected to the adv7391 P7~P0 pins.
    IPU1_DI0_PIN2 connected to the adv7391 HSYNC pin. (optional)
    IPU1_DI0_PIN4 connected to the adv7391 VSYNC pin. (optional)
    - Android R13.4.1 kernel.

3. How to use
-- Copy the two patch files to the kernel folder and apply them:
    $ git apply ./0001-Support-BT656-and-BT1120-output-for-iMX6-ipuv3.patch
    $ git apply ./0002-Support-adv739x-TV-encoder-for-BT656-output.patch
-- Select them in the kernel config and build the new kernel image:
                    Device Drivers  --->
                      Graphics support  --->
                          [*]   MXC BT656 and BT1120 output
                          [*]   ADV7390/7391 TV Output Encoder
-- U-Boot parameters for the video mode:
   Output BT656 NTSC data to the display port with a UYVY frame buffer:
      "video=mxcfb0:dev=bt656,BT656-NTSC,if=BT656,fbpix=UYVY16"
   Output BT656 NTSC data to the display port with an RGB565 frame buffer:
      "video=mxcfb0:dev=bt656,BT656-NTSC,if=BT656,fbpix=RGB565"
   Output BT656 PAL data to the display port with an RGB24 frame buffer:
      "video=mxcfb0:dev=bt656,BT656-PAL,if=BT656,fbpix=RGB24"
   Output a CVBS NTSC signal on the adv7391 with a UYVY frame buffer:
      "video=mxcfb0:dev=adv739x,BT656-NTSC,if=BT656,fbpix=UYVY16"
   Output a CVBS PAL signal on the adv7391 with an RGB565 frame buffer:
      "video=mxcfb0:dev=adv739x,BT656-PAL,if=BT656,fbpix=RGB565"
-- Switch between NTSC and PAL:
   $ echo D:720x480i-60 > /sys/class/graphics/fb0/mode
   $ echo D:720x576i-50 > /sys/class/graphics/fb0/mode

4. Notes
    1) For the 8-bit BT656 interface, the default data pins are DISP0_DAT_23~DISP0_DAT_16. Any other contiguous display data pins can also be used; for example, if DISP0_DAT_7~DISP0_DAT_0 are used, the macro "BT656_IF_DI_MSB" in "kernel_imx/drivers/mxc/ipu3/ipu_disp.c" should be changed from "23" to "7" (see the sketch after the update note below).
    2) For the 16-bit BT1120 interface, the default data pins are DISP0_DAT_23~DISP0_DAT_8. Any other contiguous display data pins can also be used; the macro "BT656_IF_DI_MSB" should be modified if the hardware pins are changed.
    3) When the BT656 interface is the second display of an IPU (the 1-layer fb; this can be checked with "$ cat /sys/class/graphics/fbx/fsl_disp_property"), the frame buffer can only be in YUV format. In this case the IPU DC channel is used for the BT656 display, and since it has no CSC function, RGB frame buffers are not supported.

2013-08-09 update: The new release package "L3.0.35_1.1.0_GA_bt656_output_patch_2013-08-09.zip" fixes the BT656 dual-display issue on i.MX6S/DL. The old release package has been removed.
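As referenced in note 1) above, remapping the BT656 data pins amounts to a one-line change in ipu_disp.c. The value 7 below is only an example and must match your board's wiring:

/* kernel_imx/drivers/mxc/ipu3/ipu_disp.c (sketch): MSB of the BT656 data bus.
 * Default is 23 for DISP0_DAT_23~DISP0_DAT_16; use 7 for DISP0_DAT_7~DISP0_DAT_0. */
#define BT656_IF_DI_MSB  7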
2013-09-04 update: The new release package "r13.4.1_bt656_output_patch_2013-09-04.zip" fixes the BT656 dual-display issue on i.MX6S/DL. By default, dual display was tested with HDMI + CVBS, with HDMI as the main display and the adv739x CVBS output as the second display. For the i.MX6DQ, which has two IPUs, please assign the two displays to the two IPUs: adv739x is fixed on IPU1 DI0 because the hardware pins used for it are fixed, so HDMI or LVDS can be assigned to the other IPU (IPU2). For the i.MX6S/DL, which has only one IPU, since adv739x uses IPU1 DI0, the other display should be on IPU1 DI1.

2013-09-30 update: Added a patch for the L3.0.35_4.1.0_GA BSP, file "L3.0.35_4.1.0_GA_bt656_output_patch_2013-09-30.zip".

2014-07-21 update: Added a patch for the L3.10.17_1.0.0_GA BSP, file "L3.10.17_1.0.0_GA_bt656_output_patch_2014-07-21.zip".

2015-01-26 update: Updated the IPU microcode for 1080i50 and 1080i60 BT1120 output. The parameter "N" of the BMA command is an 8-bit parameter, so its maximum value is 255; 1080i50 and 1080i60 output needs more blank data in each line, so "N" becomes bigger than 255. The updated IPU microcode removes this limitation. The updated file is "IPU_Microcode_Update_for_BT1120_1080i_20150126.zip". If you use BT1120 mode for 1080i display, you can apply the updated "DC_MCODE_BT656_xxx" macros and the _ipu_dc_setup_bt656_interlaced() function to the older patch. The verified 1080i display modes are:
{
   /* 1080I60 Interlaced output */
   "BT1120-1080I60", 30, 1920, 1080, 13468,
   20, 3,
   20, 2,
   280, 1,
   FB_SYNC_HOR_HIGH_ACT | FB_SYNC_VERT_HIGH_ACT,
   FB_VMODE_INTERLACED,
   FB_MODE_IS_DETAILED,
},
{
   /* 1080I50 Interlaced output */
   "BT1120-1080I50", 25, 1920, 1080, 13468,
   20, 3,
   20, 2,
   720, 1,
   FB_SYNC_HOR_HIGH_ACT | FB_SYNC_VERT_HIGH_ACT,
   FB_VMODE_INTERLACED,
   FB_MODE_IS_DETAILED,
},

2016-01-28 update: Updated the IPU microcode to align with the BT656.4 specification for NTSC output. For other BSP versions with NTSC format support, please refer to ipu_disp_update.c for the final microcode. File "L3.0.35_4.1.0_GA_bt656_output_patch_20160128.zip"; for details, please refer to the readme.txt file in the package.

2016-06-24 update: Added BT656 and BT1120 progressive mode support. File "L3.0.35_4.1.0_GA_bt656_output_patch_20160624.zip"; for details, please refer to the readme.txt file in the package. The patch for the 3.14.52 GA1.1.0 BSP will be released the following week.

2016-06-27 update: Added the BT656 and BT1120 display patch for the 3.14.52 BSP. File "L3.14.52_1.1.0_GA_bt656_output_patch_2016-06-27.zip"; for details, please refer to the readme.txt in the package.

2017-03-10 update: Fixed a hard-coded DC macro issue for progressive mode. Added patch "0008-Fixed-a-hard-coding-DC-macro-issue-for-progressive-m.patch" in L3.0.35_4.1.0_GA_bt656_output_patch_2017-03-10.zip. The code in the "L3.14.52_1.1.0_GA_bt656_output_patch_2016-06-27" patch is already correct.
Abstract

This is a small tutorial about running a simple OpenCL application on an i.MX6Q. It covers a very short introduction to OpenCL, an explanation of the code, and how to compile and run it.

Requirements

Any i.MX6Q board. A Linux BSP with the gpu-viv-bin-mx6q package (for instructions on how to build the BSP, check the BSP Users Guide).

OpenCL overview

OpenCL exposes the GPGPU (General-Purpose computing on Graphics Processing Units) features of the GC2000; that is, it lets any program use the i.MX6Q GPU's processing power. OpenCL uses kernels, which are functions that can be executed on the GPU. These functions must be written in C99-like code. On the current GPU there is no scheduling, so kernels execute in FIFO order. The i.MX6Q GPU is OpenCL 1.1 EP conformant.

The Code

The example provided here performs a simple addition of arrays on the GPU. The header needed to use OpenCL is cl.h, found under /usr/include/CL in your BSP rootfs when you install the gpu-viv-bin-mx6q package. The header is typically included like this: #include <CL/cl.h>. The libraries needed to link the program are libGAL.so and libOpenCL.so, located under /usr/lib in your BSP rootfs. For details on the OpenCL API check the Khronos page: http://www.khronos.org/opencl/

Our kernel source is as follows:

__kernel void VectorAdd(__global int* c, __global int* a, __global int* b)
{
     // Index of the elements to add
     unsigned int n = get_global_id(0);
     // Sum the nth element of vectors a and b and store in c
     c[n] = a[n] + b[n];
}

The kernel is declared with the signature __kernel void VectorAdd(__global int* c, __global int* a, __global int* b). It takes vectors a and b as arguments, adds them, and stores the result in vector c. It looks like a normal C99 function except for the keywords __kernel and __global: __kernel tells the compiler this function is a kernel, and __global tells the compiler these arguments are in the global address space. The built-in function get_global_id() tells us which index of the vector this work-item corresponds to, and in the last line the elements are added. Below is the full source code, commented.
//************************************************************
// Demo OpenCL application to compute a simple vector addition
// computation between 2 arrays on the GPU
// ************************************************************
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

// OpenCL source code
const char* OpenCLSource[] = {
     "__kernel void VectorAdd(__global int* c, __global int* a,__global int* b)",
     "{",
     " // Index of the elements to add \n",
     " unsigned int n = get_global_id(0);",
     " // Sum the nth element of vectors a and b and store in c \n",
     " c[n] = a[n] + b[n];",
     "}"
};

// Some interesting data for the vectors
int InitialData1[20] = {37,50,54,50,56,0,43,43,74,71,32,36,16,43,56,100,50,25,15,17};
int InitialData2[20] = {35,51,54,58,55,32,36,69,27,39,35,40,16,44,55,14,58,75,18,15};

// Number of elements in the vectors to be added
#define SIZE 100

// Main function
// ************************************************************
int main(int argc, char **argv)
{
     // Two integer source vectors in Host memory
     int HostVector1[SIZE], HostVector2[SIZE];
     // Output Vector
     int HostOutputVector[SIZE];

     // Initialize with some interesting repeating data
     for(int c = 0; c < SIZE; c++)
     {
          HostVector1[c] = InitialData1[c%20];
          HostVector2[c] = InitialData2[c%20];
          HostOutputVector[c] = 0;
     }

     // Get an OpenCL platform
     cl_platform_id cpPlatform;
     clGetPlatformIDs(1, &cpPlatform, NULL);

     // Get a GPU device
     cl_device_id cdDevice;
     clGetDeviceIDs(cpPlatform, CL_DEVICE_TYPE_GPU, 1, &cdDevice, NULL);
     char cBuffer[1024];
     clGetDeviceInfo(cdDevice, CL_DEVICE_NAME, sizeof(cBuffer), &cBuffer, NULL);
     printf("CL_DEVICE_NAME: %s\n", cBuffer);
     clGetDeviceInfo(cdDevice, CL_DRIVER_VERSION, sizeof(cBuffer), &cBuffer, NULL);
     printf("CL_DRIVER_VERSION: %s\n\n", cBuffer);

     // Create a context to run OpenCL on the GPU
     cl_context GPUContext = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

     // Create a command-queue on the GPU device
     cl_command_queue cqCommandQueue = clCreateCommandQueue(GPUContext, cdDevice, 0, NULL);

     // Allocate GPU memory for source vectors AND initialize from CPU memory
     cl_mem GPUVector1 = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
          sizeof(int) * SIZE, HostVector1, NULL);
     cl_mem GPUVector2 = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
          sizeof(int) * SIZE, HostVector2, NULL);

     // Allocate output memory on GPU
     cl_mem GPUOutputVector = clCreateBuffer(GPUContext, CL_MEM_WRITE_ONLY,
          sizeof(int) * SIZE, NULL, NULL);

     // Create OpenCL program with source code
     cl_program OpenCLProgram = clCreateProgramWithSource(GPUContext, 7, OpenCLSource, NULL, NULL);

     // Build the program (OpenCL JIT compilation)
     clBuildProgram(OpenCLProgram, 0, NULL, NULL, NULL, NULL);

     // Create a handle to the compiled OpenCL function (Kernel)
     cl_kernel OpenCLVectorAdd = clCreateKernel(OpenCLProgram, "VectorAdd", NULL);

     // In the next step we associate the GPU memory with the Kernel arguments
     clSetKernelArg(OpenCLVectorAdd, 0, sizeof(cl_mem), (void*)&GPUOutputVector);
     clSetKernelArg(OpenCLVectorAdd, 1, sizeof(cl_mem), (void*)&GPUVector1);
     clSetKernelArg(OpenCLVectorAdd, 2, sizeof(cl_mem), (void*)&GPUVector2);

     // Launch the Kernel on the GPU
     // This kernel only uses global data
     size_t WorkSize[1] = {SIZE}; // one dimensional range
     clEnqueueNDRangeKernel(cqCommandQueue, OpenCLVectorAdd, 1, NULL, WorkSize, NULL, 0, NULL, NULL);

     // Copy the output in GPU memory back to CPU memory
     clEnqueueReadBuffer(cqCommandQueue, GPUOutputVector, CL_TRUE, 0,
          SIZE * sizeof(int), HostOutputVector, 0, NULL, NULL);

     // Cleanup
     clReleaseKernel(OpenCLVectorAdd);
     clReleaseProgram(OpenCLProgram);
     clReleaseCommandQueue(cqCommandQueue);
     clReleaseContext(GPUContext);
     clReleaseMemObject(GPUVector1);
     clReleaseMemObject(GPUVector2);
     clReleaseMemObject(GPUOutputVector);

     for(int i = 0; i < SIZE; i++)
          printf("[%d + %d = %d]\n", HostVector1[i], HostVector2[i], HostOutputVector[i]);

     return 0;
}

How to compile on the Host

Go to your ltib folder and run:
$ ./ltib -m shell
This way you will be using the cross compiler that ltib uses, and the default include and lib directories will be the ones in your BSP. Then run:
LTIB> gcc cl_sample.c -lGAL -lOpenCL -o cl_sample

How to run on the i.MX6Q

Insert the GPU module:
root@freescale /home/user$ modprobe galcore
Copy the compiled CL program to the board and then run it:
root@freescale /home/user$ ./cl_sample

References

[1] http://www.khronos.org/opencl/
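One thing the sample above omits is error handling around clBuildProgram(); if you modify the kernel source and it fails to compile, nothing tells you why. A small addition along the following lines (a sketch using the standard OpenCL 1.1 call clGetProgramBuildInfo(), reusing the OpenCLProgram and cdDevice handles from the listing) prints the compiler log:

     // Sketch: query the build log when the OpenCL JIT compilation fails
     if (clBuildProgram(OpenCLProgram, 0, NULL, NULL, NULL, NULL) != CL_SUCCESS)
     {
          size_t LogSize = 0;
          clGetProgramBuildInfo(OpenCLProgram, cdDevice, CL_PROGRAM_BUILD_LOG, 0, NULL, &LogSize);
          char *BuildLog = (char *)malloc(LogSize + 1);
          clGetProgramBuildInfo(OpenCLProgram, cdDevice, CL_PROGRAM_BUILD_LOG, LogSize, BuildLog, NULL);
          BuildLog[LogSize] = '\0';
          printf("Kernel build log:\n%s\n", BuildLog);
          free(BuildLog);
     }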
File related to the following question: MX53 u-boot Splash Screen support
GStreamer has a simple feature to enable tracing, allowing the developer to do basic debugging. This can be done in two ways: adding the parameter --gst-debug=LIST to the pipeline (a pipeline is an executed gst-launch command), or prepending the environment variable GST_DEBUG=LIST. LIST is a comma-separated argument indicating the GStreamer elements to trace. For example, to trace the sink element:
     $ GST_DEBUG=*sink*:5 gst-launch playbin2 uri=file:///sample.avi
or
     $ gst-launch playbin2 uri=file:///sample.avi --gst-debug=*sink*:5
Both commands produce the same log. To trace more than one element, simply add another <element>:5, for example:
     $ GST_DEBUG=mfw_v4lsink:5,vpudec:5 gst-launch playbin2 uri=file:///sample.avi
The number 5 indicates the log level, where 5 is the highest (the most verbose log you can get) and 0 produces no output (5=LOG, 4=DEBUG, 3=INFO, 2=WARN, 1=ERROR). The log can be huge for each pipeline run. One way to filter it is with the grep command. Before grepping, redirect standard error to standard output (the GStreamer log always goes to stderr):
     $ GST_DEBUG=mfw_v4lsink:5,vpudec:5 gst-launch playbin2 uri=file:///sample.avi 2>&1 | grep <filter string>
In case the log needs to be shared, it is important to remove the 'color' escape codes from the log; again, just add the parameter --gst-debug-no-color or prepend the environment variable GST_DEBUG_NO_COLOR=1.
More shell variables that GStreamer reacts to can be found here: https://developer.gnome.org/gstreamer/0.10/gst-running.html
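For example, combining the options above, a color-free, element-filtered log can be written straight to a file (the file name vpudec.log is just an illustration; since the trace goes to stderr, only the GStreamer output ends up in the file):
     $ GST_DEBUG_NO_COLOR=1 GST_DEBUG=vpudec:5 gst-launch playbin2 uri=file:///sample.avi 2> vpudec.log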
There is no Freescale GStreamer element that does JPEG decoding, so we must rely on a standard one, like 'jpegdec'. If your Linux system was built using LTIB, follow these steps to have the jpegdec element included in gst-plugins-good:
In the LTIB menuconfig, make sure the following packages are selected: gstreamer-plugins-good, libjpeg, libpng.
Remove the configure parameters '--disable-libpng' and '--disable-jpeg' in the file './dist/lfs-5.1/gst-plugins-good/gst-plugins-good.spec'.
Rebuild and flash your board (or SD card) again.
Image display:
VSALPHA=1 gst-launch filesrc location=sample.jpeg ! jpegdec ! imagefreeze ! mfw_isink
Important: widths and heights that are not 8-pixel aligned are treated as an unsupported format by the isink plugin.
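Since the steps above also enable libpng, a PNG image can be shown the same way. The pipeline below is an untested sketch that assumes the standard pngdec element from gst-plugins-good; an ffmpegcolorspace element may be needed before mfw_isink if the formats do not negotiate:
VSALPHA=1 gst-launch filesrc location=sample.png ! pngdec ! imagefreeze ! mfw_isink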
Freescale does not have a specific GStreamer element to do JPEG encoding, so the standard 'jpegenc' should be used. Image Capture With a web camera gst-launch v4l2src num-buffers=1 ! jpegenc ! filesink location=sample.jpeg With an embedded camera gst-launch mfw_v4lsrc num-buffers=1 !  jpegenc ! filesink location=sample.jpeg More pipelines on GStreamer i.MX6 Pipelines
Multiple-Overlay (or Multi-Overlay) means several video playbacks on a single screen. In case multiple screens are needed, check the dual-display case GStreamer i.MX6 Multi-Display $ export VSALPHA=1 $ SAMPLE1=sample1.avi; SAMPLE2=sample2.avi; SAMPLE3=sample3.avi; SAMPLE4=sample4.avi; $ WIDTH=320; HEIGHT=240; SEP=20 Four displays (2x2) $gst-launch \ playbin2 uri=file://`pwd`/$SAMPLE1 video-sink="mfw_isink axis-top=0 axis-left=0   disp-width=$WIDTH disp-height=$HEIGHT" \ playbin2 uri=file://`pwd`/$SAMPLE2 video-sink="mfw_isink axis-top=0 axis-left=`expr $WIDTH + $SEP` disp-width=$WIDTH disp-height=$HEIGHT" \ playbin2 uri=file://`pwd`/$SAMPLE3 video-sink="mfw_isink axis-top=`expr $HEIGHT + $SEP` axis-left=0   disp-width=$WIDTH disp-height=$HEIGHT" \ playbin2 uri=file://`pwd`/$SAMPLE4 video-sink="mfw_isink axis-top=`expr $HEIGHT + $SEP` axis-left=`expr $WIDTH + $SEP` disp-width=$WIDTH disp-height=$HEIGHT" Basic rotation, (2 x 1, normal and inverted) gst-launch \ playbin2 uri=file://`pwd`/$SAMPLE1 video-sink="mfw_isink axis-top=0 axis-left=0   disp-width=$WIDTH disp-height=$HEIGHT rotation=0" \ playbin2 uri=file://`pwd`/$SAMPLE2 video-sink="mfw_isink axis-top=`expr $HEIGHT + $SEP` axis-left=0 disp-width=$WIDTH disp-height=$HEIGHT rotation=3"
Use the EPDC Unit Test to exercise your EPD panel 1.  Introduction The i.MX 6Solo/6DualLite and i.MX 6SoloLite processors contain an Electrophoretic Display Controller (EPDC) designed to drive E-INK(TM) EPD panels supporting a wide variety of TFT backplanes. Detailed information about the EPDC is available in the i.MX 6Solo/6DualLite and i.MX 6SoloLite Reference Manuals. Information about programming the EPDC and integrating support for a particular panel into the Linux BSP is provided in the Linux BSP Reference Manual. Now that you have your panel hardware and software integrated, how can you tell if it is all working? Or perhaps you're using a standard panel that doesn't seem to be working - how can you test it at the lowest level? The answer is the EDPC Unit Test. 2. Running the EPDC Unit Test Most i.MX processor hardware modules have an associated unit test in the unit_tests directory of the root file system image. The basics of running the EPDC unit test are simple. After logging in as root on the debug console, change to the unit tests directory and execute the test as follows: $ cd /unit_tests $ ./mxc_epdc_fb_test.out Without any arguments, the unit test runs thirteen individual tests and takes about 8 minutes to complete. You will get a lot of output on the panel and it may not be obvious that the images are displaying correctly. Running the unit test with no arguments is a nice automated way to make sure your panel is at least displaying something, and is not causing a crash or hang. But to really verify correctness each test should be run individually and the output carefully reviewed. This can be accomplished by passing a test number option as follows: $ ./mxc_epdc_fb_test.out -n 1 To show a list of all the available options, as well as a list of the individual test numbers, use the "-h" argument. The helpful output is shown below. $ ./mxc_epdc_fb_test.out -h EPDC framebuffer driver test program. Usage: mxc_epdc_fb_test [-h] [-a] [-p delay] [-u s/q/m] [-n ]         -h        Print this message         -a        Enabled animation waveforms for fast updates (tests 8-9)         -p        Provide a power down delay (in ms) for the EPDC driver                   0 - Immediate (default)                   -1 - Never                    ms - After ms milliseconds         -u        Select an update scheme                   s - Snapshot update scheme                   q - Queue update scheme                   m - Queue and merge update scheme (default)         -n        Execute the tests specified in expression                   Expression is a set of comma-separated numeric ranges                   If not specified, tests 1-13 are run Example: ./mxc_epdc_fb_test.out -n 1-3,5,7 EPDC tests: 1 - Basic Updates 2 - Rotation Updates 3 - Grayscale Framebuffer Updates 4 - Auto-waveform Selection Updates 5 - Panning Updates 6 - Overlay Updates 7 - Auto-Updates 8 - Animation Mode Updates 9 - Fast Updates 10 - Partial to Full Update Transitions 11 - Test Pixel Shifting Effect 12 - Colormap Updates 13 - Collision Test Mode 14 - Stress Test In addition to "-n" to run individual tests, the "-a" and "-u" options are provided to set animation mode and waveform used, respectively. These options make sense only for some of the individual tests, as noted in the next section. The "-u s" (snapshot) mode purposely does not allow pending updates, so some of the tests will cause the driver to issue the following warning in snapshot mode: imx_epdc_fb: No free intermediate buffers available. 
The "-p" (power down delay) option allows you to specify how soon the device driver automatically powers down the controller after all pending updates have completed. The default is 0 (zero), meaning power down immediately. Use "-1" to disable power down (i.e. never power down). Sidebar: You can use another unit test, "dump-clocks.sh", to view the state of the EPDC clocks (i.e. the power state of the module). In the example below, a "1" in the third column indicates that the clock is running; a "0" indicates the clock is gated: $ ./dump-clocks.sh | grep epdc epdc_pix_clk             pll5_video_main_clk       1   26666667 epdc_axi_clk             pll2_pfd2_400M             1  198000000 3.  Individual Test Details Important! The notes below assume you are running the version of the unit test binary in the tar file attached to this How-to. Please extract the file mxc_epdc_fb_tests.out from the tar file into your rootfs unit_tests directory before running the test. Most tests begin with a screen blank operation (all white pixels). Each test works with all three update schemes (snapshot, queue, queue and merge) unless otherwise noted. 3.1 Test 1: Basic Updates Draws the following patterns: Crosshatch Squares Text Ginger image Colorbar 3.2 Test 2: Rotation Updates Draws text, squares, crosshatch, and square outlines in all rotation modes: No rotation Clockwise (90 degrees) Upside down (180 degrees) Counterclockwise (270 degrees) The square outlines are drawn first in RGB format and then in Y8 format. 3.3 Test 3: Grayscale Framebuffer Updates Draws a top half black screen and then a colorbar, rotated clockwise. First draws in normal Y8 format and then in inverted Y8 format. 3.4 Test 4: Auto-waveform Selection Updates Draws several small squares using auto-waveform selection. Note: unlike the i.MX508 EPDC driver, the i.MX 6 driver does not report the final waveform selection to user-level applications. This test also verifies that squares at non-8-bit aligned pixels can be drawn. 3.5 Test 5: Panning Updates Draws a colorbar to an off-screen region. Draws the Ginger image to the frame buffer. Sets the focus (pan) to the colorbar, then updates portions of the screen. You should see the colorbar poke through. Slowly pans from Ginger to the entire colorbar. Use pan to flip between black and white buttons. Flashes buttons using pixel inversion. Flashes buttons using panning. 3.6 Test 6: Overlay Updates Switches between the frame buffer (FB) and the alternate (overlay) frame buffer as follows: Draws Ginger to the FB. Draws the colorbar to the alternate frame buffer. Shows the FB (Ginger). Shows the alternate FB (colorbar). Shows FB again (Ginger). Shows half FB, half alternate FB. Shows cutout region of alternate FB. Shows cutout in upper corner. Shows black screen. Shows clockwise-rotated text overlay in center. 3.7 Test 7: Auto-Updates Important! The auto-update mechanism must be enabled in the kernel configuration for this test to work. Enable it using the LTIB Kernel configuration menu item "Device Drivers->Graphics support->E-Ink Auto-update Mode Support." Also, please check the Linux BSP Release Notes for any issues with this feature. 3.8 Test 8: Animation Mode Updates Shows how normal (gray level) and monochrome (black and white) updates compare in appearance and performance. Quickly flashes back and forth between black and white screens. Draws normal squares. Draws black and white squares. Draws Ginger in gray scale and monochrome. Draws colorbar in gray scale and monochrome. 
Draws normal Y8 colorbar and monochrome colorbar. Draws inverted normal Y8 colorbar and inverted monochrome colorbar. You can run this test in animation mode using the "-a" command line option. Animation mode only updates monochrome pixels, so you will notice a drawing speed improvement. Also, you will see the implementation of one of the "rules" of using this mode: The screen must be blanked (all white or all black) when switching in and out of animation mode. 3.9 Test 9: Fast Updates Animates a square across the screen and down one side. This test can also be run in animation mode using "-a". See the animation mode notes in the Test 8 description above. 3.10 Test 10: Partial to Full Update Transitions Draws small gray squares using separate updates in partial update mode (only pixels that change are updated). Then re-draws the entire screen in one update using full update mode. In full update mode, you will notice the entire screen transition from black to the final grey value. 3.11 Test 11: Test Pixel Shifting Effect Draws a short, two pixel line and then sends two updates one pixel apart in distance and 5 seconds apart in time. Nothing much to see here; just verifies that a one pixel update shift works. 3.12 Test 12: Colormap Updates Creates a colormap and uses it to draw full screen blanks and to draw a color bar. In this test, you should follow along with the text printed in the debug console and verify each state: Screen should be black. Screen should be white now. Screen should still be white. Should be inverted color bar (white to black, left to right). Colorbar again, with no CMAP (black to white, left to right). Posterized colorbar. In the above output, "CMAP" means colormap and "Posterized colorbar" is a colorbar drawn from only black and white components. 3.13 Test 13: Collision Test Mode Draws two overlapping rectangles. Tests for collision on the first rectangle, which should result in the message: Collision test result = 0 Then draws the same overlapping rectangles, this time testing for collision on the second rectangle. The result should be: Collision test result = 1 Note: This test cannot be run using the snapshot scheme. If you try to use "-u s" it will print a message and return. 3.14 Test 14: Stress Test Draws thousands of random rectangles on the screen in different rotations. Runs for about 8 minutes. This test must be explicitly specified on the command line; it does not run by default. For example: $ ./mxc_epdc_fb_test.out -n 14 Note: This test cannot be run using the snapshot scheme. If you try to use "-u s" it will print a message and return. 4. Customizing A great way to know what's really going on in each test is to look at the source code. You may want to customize the source as well - say to add a new test that exercises some cool feature of your panel. Fortunately, the source is included in the BSP distribution so you can extract, customize, and rebuild it as needed. The magic of LTIB is beyond the scope of this article, but here are some hints: # Extract the unit test source from the imx-test package: $ ./ltib -m prep -p imx-test # Source is now in rpm/BUILD/imx-test-12.08.00/test/mxc_fb_test/mxc_epdc_fb_test.c (your package version number my be different). 
# Build from source:
$ ./ltib -m scbuild -p imx-test
# Deploy all unit test binaries to the ltib rootfs directory:
$ ./ltib -m scdeploy -p imx-test
# Alternatively, you can copy just the EPDC unit test binary as follows:
$ sudo cp rpm/BUILD/imx-test-12.08.00/platform/IMX6S/mxc_epdc_fb_test.out rootfs/unit_tests/
As noted in the Individual Test Details section, you should replace the file mxc_epdc_fb_test.c referenced above with the one found in the tar file attached to this How-to.
5. Something is wrong!
Of course there are many things that can go wrong that are beyond the scope of this article, but assuming your hardware is working and the panel options are correctly configured in the BSP (again, beyond our scope), here are some things to check:
Is the kernel configured correctly for EPDC support? Check that the following item is enabled in the LTIB Kernel configuration menu: "Device Drivers->Graphics support->E-Ink Panel Framebuffer."
Are the U-Boot kernel command line arguments set correctly for EPDC? For example, "video=mxcepdcfb:E060SCM consoleblank=0". The "consoleblank=0" argument is useful to prevent entering low-power suspend mode while you are testing.
Does the "Tux" image display correctly on your panel after system boot? If so, the EPDC driver was at least able to perform the initial framebuffer update to your panel. Note: for Tux to display at boot, you need to disable LCDIF support at "Device Drivers->Graphics support->Support MXC ELCDIF framebuffer."
Are there any EPDC-related error messages in the kernel log after boot? You can check with "dmesg | grep epdc".
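Finally, as a quick reference for the options described in section 2, the test-selection, update-scheme, animation and power-down options can be combined in a single run; the values below are only an illustration:
$ ./mxc_epdc_fb_test.out -n 8-9 -a -u q -p 1000
This runs only the animation-related tests (8 and 9) in animation mode, using the queue update scheme and a 1000 ms power-down delay.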
Introduction LVDS display panel driving data flow: Display quality: To get the best display quality for 24bit LVDS display panel in Android, we should use 32bit framebuffer, make IPUv3 display Engine and LDB output 24bit pixels, since RGB component information is aligned from source to destination.  2 stages to enable display: Uboot splash screen and Kernel framebuffer Guidelines Uboot splash screen:    Change should be done in board file, like board/freescale/mx6q_sabresd/mx6q_sabresd.c:    1. Set video mode in struct fb_videomode according to the new 24bit LVDS display panel’s spec(please, refer to the example at the end of this doc).    2. Set up pwm, iomux/display related clock trees in lcd_enable(). Note that these should be aligned with Kernel settings to support smooth UI transition        from Uboot splash screen to Kernel framebuffer.    3. Set the output pixel format of IPUv3 display engine and LDB to IPU_PIX_FMT_RGB24 when calling ipuv3_fb_init().    4. Set pixel clock according to the new 24bit LVDS display panel’s spec when calling ipuv3_fb_init().    5. If dual LDB channels are needed to support tough display video mode(high resolution or high pixel clock frequency), we need to enable both of the two LDB        channels and set LDB to work at split mode. LDB_CTRL register should be set accordingly in lcd_enable(). Kernel framebuffer:    As we may add ‘video=‘  and ‘ldb=’ options in kernel bootup command line, Kernel code is more flexible to handle different LVDS display panels with various display color depth than Uboot code. For detail description of ‘video=’ and ‘ldb=’ option, please refer to MXC Linux BSP release notes and Android User Guide. Some known points are:    1. Add a video mode in struct fb_videomode in drivers/video/mxc/ldb.c according to the new 24bit LVDS display panel’s spec(please, refer to the example at        the end of this doc).    2. Set up pwm backlight/display related iomux in platform code.   3. Set appropriate ‘video=‘ option in kernel bootup command line, for example:        video=mxcfb0:dev=ldb,LDB-NEW,if=RGB24,fbpix=RGB32     4. Set appropriate ‘ldb=‘ option in kernel bootup command line if dual LDB channels are needed to support tough display video mode, for example:        ldb=spl0 (IPUv3 DI0 is used)  or  ldb=spl1 (IPUv3 DI1 is used)    5. Set appropriate ‘fbmem=‘ option in kernel bootup command line to reserve enough memory for framebuffer. For example, if we use 1280x800 LVDS panel        for fb0 and fb0 is in RGB32 pixel format, then ‘fbmem=12M’ should be used, since the formula is:        fbmem= width*height*3(triple buf)*Bytes_per_pixel= 1280*800*3*4B=12MB An Example to Set struct fb_videomode:    Let’s take a look at the timing description quoted from a real 1280x800@60 24bit LVDS panel spec: And, standard linux struct fb_videomode definition in include/linux/fb.h: struct fb_videomode {         const char *name;       /* optional */         u32 refresh;            /* optional */         u32 xres;         u32 yres;         u32 pixclock;         u32 left_margin;         u32 right_margin;         u32 upper_margin;         u32 lower_margin;         u32 hsync_len;         u32 vsync_len;         u32 sync;         u32 vmode;                u32 flag; };    What we need to do is to set every field of struct fb_videomode correctly according to the timing description of LVDS display panel’s spec:     1. name: we can set it to ‘LDB-WXGA’.    2. refresh: though it’s optional, we can set it to typical value, that is, 60(60Hz refresh rate).    3. 
xres: the active width, that is, 1280.
   4. yres: the active height, that is, 800.
   5. pixclock: calculated with the formula pixclock = (10^12)/clk_freq. For this example, pixclock = (10^12)/71100000 = 14065.
   6. left_margin/right_margin/hsync_len: these correspond to HS Back Porch (HBP), HS Front Porch (HFP) and HS Width (HW) in the spec. Since the spec only tells us that typically HBP+HFP+HW=160, we may set left_margin=40, right_margin=40, hsync_len=80.
   7. upper_margin/lower_margin/vsync_len: similar to the horizontal timing, the vertical values can be set to upper_margin=10, lower_margin=3, vsync_len=10.
   8. sync: the timing chart tells us that hsync/vsync are active low, so we don't need to set FB_SYNC_HOR_HIGH_ACT or FB_SYNC_VERT_HIGH_ACT. Moreover, clock polarity and data polarity are not specified, so we set sync to zero here.
   9. vmode: this is a progressive video mode, so set vmode to FB_VMODE_NONINTERLACED.
   10. flag: the video mode is provided by the driver, so set flag to FB_MODE_IS_DETAILED.
A complete initializer assembled from these values is sketched below.
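Putting the ten fields together, the resulting entry would look like the following sketch; the mode name and the exact porch split are assumptions based on the values chosen above:

/* Illustrative fb_videomode entry for the 1280x800@60 24-bit LVDS panel discussed above */
static struct fb_videomode ldb_wxga_mode = {
        .name         = "LDB-WXGA",
        .refresh      = 60,
        .xres         = 1280,
        .yres         = 800,
        .pixclock     = 14065,                  /* (10^12)/71100000 */
        .left_margin  = 40,                     /* HS back porch  */
        .right_margin = 40,                     /* HS front porch */
        .upper_margin = 10,                     /* VS back porch  */
        .lower_margin = 3,                      /* VS front porch */
        .hsync_len    = 80,
        .vsync_len    = 10,
        .sync         = 0,
        .vmode        = FB_VMODE_NONINTERLACED,
        .flag         = FB_MODE_IS_DETAILED,
};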
Graphics is a big topic on the Android platform, covering the Java/JNI graphic framework and the 2D/3D graphic engines (skia, OpenGL ES, RenderScript). This document describes the general Android graphic stack and UI features on Freescale devices.

1. Android Graphic Stacks
All Android 3D apps and games use the 3D graphic stack. The Android system UI and all app UIs follow the 2D graphic stack, where the hardware renderer accelerates the Android 2D UI with the GPU hardware (OpenGL ES 2.0) to improve overall UI performance. Hardware acceleration can be disabled on i.MX6 in device/fsl/imx6/soc/imx6dq.mk:
USE_OPENGL_RENDERER := false
Then rebuild frameworks/base/core/jni and replace libandroid_runtime.so.
SurfaceFlinger is responsible for composing all surface layers and then generating the framebuffer pixmap for the display devices; these graphic surface layers come from 2D/3D apps. Hwcomposer is an alternative to SurfaceFlinger composition with OpenGL ES: it is used to compose the specific surface layers supported by specific vendor devices. Freescale i.MX6 devices use the GPU 2D core to compose most surface layers, and system power can be reduced by using the GPU 2D instead of the GPU 3D; the typical power-saving case is video playback. Hwcomposer with GPU 2D can also offload the GPU 3D when running games and benchmarks, and it has been shown to improve overall system performance by about 20%.

2. Performance measurement
Show FPS for Android system performance: for NFS boot you can set "debug.sf.showfps" to 1 in init.freescale.rc ("setprop debug.sf.showfps 1") and then reboot the system. For SD or eMMC boot, you can issue the command "setprop debug.sf.showfps 1" in the console, then find the system_server thread with top and kill it to reset the system.
Graphic benchmarks for 3D capability measurement:
Quadrant: full test benchmark covering CPU, memory, I/O, 2D and 3D
GLBenchmark http://www.glbenchmark.com/
NenaMark2 https://market.android.com/details?id=se.nena.nenamark2
An3DBench http://www.androidzoom.com/android_applications/tools/an3dbench_hnog.html
AnTuTu http://www.antutu.com/software.html
3DMark http://www.futuremark.com/benchmarks/3dmark06/introduction/
Browser benchmarks:
http://www.webkit.org/perf/sunspider/sunspider.html
http://v8.googlecode.com/svn/data/benchmarks/current/run.html
http://www.craftymind.com/guimark2/
http://www.craftymind.com/factory/guimark/GUIMark_HTML4.html
http://themaninblue.com/writing/perspective/2010/03/22/

3. Android UI features
Dual display with the same content: this feature is supported in the default image in the Android i.MX6 release package. LVDS panel and HDMI output can be driven simultaneously; it is only enabled when an HDMI TV is connected to the board.
Overscan for TV devices: some TVs may not display the contents in the overscan area. To avoid losing content in the overscan area, the common approach is to underscan with an adjustable black border and let the viewer adjust the width of the border. The underscan operation is done by SurfaceFlinger when it composes surfaces through hardware OpenGL ES; there is no performance impact since all the work is done by the GPU. Overscan can be configured in the display settings in visual mode.
32-bit color depth: a 32bpp UI can be enabled by adding "bpp=32" in U-Boot, for example: setenv bootargs '… video=mxcdi1fb:RGB666,XGA,bpp=32 …'; it can also be configured in the display settings.
When the 32bpp frame buffer is enabled, application surface buffers are allocated in RGBA8888 format instead of the default RGB565 format, which means more system memory is used. After enabling 32bpp, if some applications still do not show better UI quality, check whether they hard-code a request for an RGB565 surface (they should request RGBA8888 to get better quality). Sample code is attached to test 32bpp (left is 16bpp, right is 32bpp).
Display Visual Setting: the display setting is an add-on feature of the FSL Android release. It makes it convenient for end users to change display properties, mainly the following: dual display enablement; display color depth setting (16bpp, 32bpp); overscan adjustment in the horizontal and vertical directions.

4. Issue Diagnosis
Application compatibility: some Android applications may not run correctly on some Android releases. This may be an application compatibility problem, so check the application on other platforms. For example, Neocore and Asphalt 5 run on Eclair, Froyo, and Gingerbread, but do not run correctly on Honeycomb.
GPU compatibility: some game UIs may not display correctly on our Android release. When encountering this kind of issue, check whether it is caused by the game using an OpenGL extension that our GPU does not support; downloading another data package (for example, one that does not use the extension) is a way to check.
Others: enlarge the GPU memory if the UI displays abnormally after running an application for a while. Some applications need a Wi-Fi connection, so monitor the console log to see whether there are any error reports.
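As a quick sanity check after changing the color depth described in section 3, the active framebuffer depth can be read back from the standard fbdev sysfs attribute (the fb index may differ on your board); with 32bpp active it should report:
$ cat /sys/class/graphics/fb0/bits_per_pixel
32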
Multiple-Display means video playback on multiple screens. In case playback needs to be on a single screen, the mfw_isink element must be used; some pipeline examples can be found at this link: GStreamer i.MX6 Multi-Overlay.

# Set these shell variables before running the pipelines
alias gl=gst-launch
SINK_1="\"mfw_v4lsink device=/dev/video17\""
SINK_2="\"mfw_v4lsink device=/dev/video18\""
SINK_3="\"mfw_v4lsink device=/dev/video20\""
media1=file:///root/media1
media2=file:///root/media2
media3=file:///root/media3

2 displays, HDMI + LVDS
Kernel parameters: video=mxcfb0:dev=hdmi,1920x1080M@60,if=RGB24 video=mxcfb1:dev=ldb,LDB-XGA,if=RGB666
Pipeline: gl playbin2 uri=$media1 video-sink=$SINK_1 playbin2 uri=$media2 video-sink=$SINK_2

2 displays, LVDS + LVDS
Kernel parameters: video=mxcfb0:dev=ldb,LDB-XGA,if=RGB666 video=mxcfb1:dev=ldb,LDB-XGA,if=RGB666
Pipeline: gl playbin2 uri=$media1 video-sink=$SINK_1 playbin2 uri=$media2 video-sink=$SINK_2

2 displays, LCD + LVDS
Kernel parameters: video=mxcfb0:dev=lcd,800x480M@55,if=RGB565 video=mxcfb1:dev=ldb,LDB-XGA,if=RGB666
Pipeline: gl playbin2 uri=$media1 video-sink=$SINK_1 playbin2 uri=$media2 video-sink=$SINK_2

3 displays, HDMI + LVDS + LVDS
Kernel parameters: video=mxcfb0:dev=hdmi,1920x1080M@60,if=RGB24 video=mxcfb1:dev=ldb,LDB-XGA,if=RGB666 video=mxcfb2:dev=ldb,LDB-XGA,if=RGB666
Pipeline: gl playbin2 uri=$media1 video-sink=$SINK_1 playbin2 uri=$media2 video-sink=$SINK_2 playbin2 uri=$media3 video-sink=$SINK_3
Video, bad performance:
gst-launch filesrc location=test.mp4 typefind=true ! aiurdemux ! vpudec ! mfw_v4lsink
Video, better performance:
gst-launch filesrc location=sample.mp4 typefind=true ! aiurdemux ! queue max-size-time=0 ! vpudec ! mfw_v4lsink
# typefind=true allows GStreamer to 'type find' the source file before negotiating
# max-size-time=0 tells the queue to ignore possible blocking issues
# In case of ASF files
gst-launch filesrc location=sample.asf typefind=true ! aiurdemux ! queue max-size-time=0 ! mfw_wmvdecoder ! mfw_v4lsink
Audio:
gst-launch filesrc location=sample.mp3 typefind=true ! beepdec ! audioconvert ! 'audio/x-raw-int, channels=2' ! alsasink
Audio with visualization:
gst-launch filesrc location=sample.mp3 typefind=true ! beepdec ! tee name=t ! queue ! audioconvert ! 'audio/x-raw-int, channels=2' ! alsasink t. ! queue ! audioconvert ! goom ! autovideoconvert ! autovideosink
Video/Audio, long version:
gst-launch filesrc location=sample.avi typefind=true ! aiurdemux name=demux demux. ! queue max-size-buffers=0 max-size-time=0 ! vpudec ! mfw_v4lsink demux. ! queue max-size-buffers=0 max-size-time=0 ! beepdec ! audioconvert ! 'audio/x-raw-int, channels=2' ! alsasink
# queue properties max-size-buffers=0 and max-size-time=0 allow a smoother playback; type 'gst-inspect queue' for more info
Video/Audio, short version (gplay):
gplay sample.avi
Video/Audio, short version (playbin2):
gst-launch playbin2 uri=file://<full path to sample file>
Video rendering:
gst-launch videotestsrc ! mfw_v4lsink
Audio rendering:
gst-launch audiotestsrc ! alsasink
WAV audio rendering:
gst-launch filesrc location=test.wav ! wavparse ! alsasink
Video rendering selecting caps:
gst-launch videotestsrc ! capsfilter caps='video/x-raw-yuv,format=(fourcc)I420' ! mfw_v4lsink
gst-launch videotestsrc ! 'video/x-raw-yuv,format=(fourcc)I420' ! mfw_v4lsink
gst-inspect is a tool to get documentation about GStreamer elements.
Check installed GStreamer elements:
gst-inspect | tail -1
Check installed FSL GStreamer elements:
gst-inspect | grep imx
Element documentation:
gst-inspect <gst element>
gst-launch is the tool to execute GStreamer pipelines.
Looking at caps:
gst-launch -v <gst elements>
Enable log:
gst-launch --gst-debug=<element>:<level>
gst-launch --gst-debug=videotestsrc:5 videotestsrc ! filesink location=/dev/null
Fast GPU Image Processing in the i.MX 6x by Guillermo Hernandez, Freescale Introduction Color tracking is useful as a base for complex image processing use cases, like determining what parts of an image belong to skin is very important for face detection or hand gesture applications. In this example we will present a method that is robust enough to take some noise and blur, and different lighting conditions thanks to the use of OpenGL ES 2.0 shaders running in the i.MX 6X  multimedia processor. Prerequisites This how-to assumes that the reader is an experienced i.mx developer and is familiar with the tools and techniques around this technology, also this paper assumes the reader has intermediate graphics knowledge and experience such as the RGBA structure of pictures and video frames and programming OpenGL based applications, as we will not dig in the details of the basic setup. Scope Within this paper, we will see how to implement a very fast color tracking application that uses the GPU instead of the CPU using OpenGL ES 2.0 shaders. Step 1: Gather all the components For this example we will use: 1.      i.MX6q ARD platform 2.      Linux ER5 3.      Oneric rootfs with ER5 release packages 4.      Open CV 2.0.0 source Step 2: building everything you need Refer to ER5 User´s Guide and Release notes on how to build and boot the board with the Ubuntu Oneric rootfs. After you are done, you will need to build the Open CV 2.0.0 source in the board, or you could add it to the ltib and have it built for you. NOTE: We will be using open CV only for convenience purposes, we will not use any if its advanced math or image processing  features (because everything happens on the CPU and that is what we are trying to avoid), but rather to have an easy way of grabbing and managing  frames from the USB camera. Step 3: Application setup Make sure that at this point you have a basic OpenGL Es 2.0 application running, a simple plane with a texture mapped to it should be enough to start. (Please refer to Freescale GPU examples). Step 4: OpenCV auxiliary code The basic idea of the workflow is as follows: a)      Get the live feed from the USB camera using openCV function cvCapture() and store into IplImage structure. b)      Create an OpenGL  texture that reads the IplImage buffer every frame and map it to a plane in OpenGL ES 2.0. c)      Use the Fragment Shader to perform fast image processing calculations, in this example we will examine the Sobel Filter and Binary Images that are the foundations for many complex Image Processing algorithms. d)      If necessary, perform multi-pass rendering to chain several image processing shaders  and get an end result. First we must import our openCV relevant headers: #include "opencv/cv.h" #include "opencv/cxcore.h" #include "opencv/cvaux.h" #include "opencv/highgui.h" Then we should define a texture size, for this example we will be using 320x240, but this can be easily changed to 640 x 480 #define TEXTURE_W 320 #define TEXTURE_H 240 We need to create an OpenCV capture device to enable its V4L camera and get the live feed: CvCapture *capture; capture = cvCreateCameraCapture (0); cvSetCaptureProperty (capture, CV_CAP_PROP_FRAME_WIDTH,  TEXTURE_W); cvSetCaptureProperty (capture, CV_CAP_PROP_FRAME_HEIGHT, TEXTURE_H); Note: when we are done, remember to close the camera stream: cvReleaseCapture (&capture); OpenCV has a very convenient structure used for storing pixel arrays (a.k.a. 
images) called IplImage IplImage *bgr_img1; IplImage *frame1; bgr_img1 = cvCreateImage (cvSize (TEXTURE_W, TEXTURE_H), 8, 4); OpenCV has a very convenient function for capturing a frame from the camera and storing it into a IplImage frame2 = cvQueryFrame(capture2); Then we will want to separate the camera capture process from the pos-processing filters and final rendering; hence, we should create a thread to exclusively handle the camera: #include <pthread.h> pthread_t camera_thread1; pthread_create (&camera_thread1, NULL, UpdateTextureFromCamera1,(void *)&thread_id); Your UpdateTextureFromCamera() function should be something like this: void *UpdateTextureFromCamera2 (void *ptr) {       while(1)       {             frame2 = cvQueryFrame(capture);             //cvFlip (frame2, frame2, 1);  // mirrored image             cvCvtColor(frame2, bgr_img2, CV_BGR2BGRA);       }       return NULL;    } Finally, the rendering loop should be something like this: while (! window->Kbhit ())       {                         tt = (double)cvGetTickCount();             Render ();             tt = (double)cvGetTickCount() - tt;             value = tt/(cvGetTickFrequency()*1000.);             printf( "\ntime = %gms --- %.2lf FPS", value, 1000.0 / value);             //key = cvWaitKey (30);       }       Step 5: Map the camera image to a GL Texture As you can see, you need a Render function call every frame, this white paper will not cover in detail the basic OpenGL  or EGL setup of the application, but we would rather focus on the ES 2.0 shaders. GLuint _texture; GLeglImageOES g_imgHandle; IplImage *_texture_data; The function to map the texture from our stored pixels in IplImage is quite simple: we just need to get the image data, that is basically a pixel array void GLCVPlane::PlaneSetTex (IplImage *texture_data) {       cvCvtColor (texture_data, _texture_data, CV_BGR2RGB);       glBindTexture(GL_TEXTURE_2D, _texture);       glTexImage2D (GL_TEXTURE_2D, 0, GL_RGB, _texture_w, _texture_h, 0, GL_RGB, GL_UNSIGNED_BYTE, _texture_data->imageData); } This function should be called inside our render loop: void Render (void) {   glClearColor (0.0f, 0.0f, 0.0f, 0.0f);   glClear (GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);   PlaneSetTex(bgr_img1); } At this point the OpenGL texture is ready to be used as a sampler in our Fragment Shader  mapped to a 3D plane Lastly,  when you are ready to draw your plane with the texture in it: // Set the shader program glUseProgram (_shader_program); … // Binds this texture handle so we can load the data into it /* Select Our Texture */ glActiveTexture(GL_TEXTURE0); //Select eglImage glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, g_imgHandle); glDrawArrays (GL_TRIANGLES, 0, 6); Step 6: Use the GPU to do Image Processing First we need to make sure we have the correct Vertex Shader and Fragment shader, we will  focus only in the Fragment Shader, this is where we will process our image from the camera. 
Below you will find the simplest fragment shader; this one only colors pixels from the sampled texture:

const char *planefrag_shader_src =
      "#ifdef GL_FRAGMENT_PRECISION_HIGH                   \n"
      "  precision highp float;                            \n"
      "#else                                               \n"
      "  precision mediump float;                          \n"
      "#endif                                              \n"
      "                                                    \n"
      "uniform sampler2D s_texture;                        \n"
      "varying  vec3 g_vVSColor;                           \n"
      "varying  vec2 g_vVSTexCoord;                        \n"
      "                                                    \n"
      "void main()                                         \n"
      "{                                                   \n"
      "    gl_FragColor = texture2D(s_texture,g_vVSTexCoord);    \n"
      "}                                                   \n";

Binary Image

The simplest image filter is the binary image. It converts a source image to black/white output; to decide whether a color should be black or white we need a threshold: everything below that threshold will be black, and any color above it will be white. The shader code (the source of g_strRGBtoBlackWhiteShader) is as follows:

#ifdef GL_FRAGMENT_PRECISION_HIGH
  precision highp float;
#else
  precision mediump float;
#endif
varying  vec2 g_vVSTexCoord;
uniform sampler2D s_texture;
uniform float threshold;

void main() {
    vec3 current_Color = texture2D(s_texture,g_vVSTexCoord).xyz;
    float luminance = dot(vec3(0.299,0.587,0.114), current_Color);
    if(luminance > threshold)
        gl_FragColor = vec4(1.0);
    else
        gl_FragColor = vec4(0.0);
}

You can see that the main operation is to get a luminance value for the pixel; to achieve that we take the dot product of a known vector (obtained empirically) with the current pixel, then we simply compare that luminance value with the threshold. Anything below the threshold will be black, and anything above it will be considered a white pixel.

SOBEL Operator

Sobel is a very common filter, since it is used as a foundation for many complex image processing operations, particularly in edge detection algorithms. The Sobel operator is based on convolutions; the convolution is made with a particular mask, often called a kernel (in common terms, usually a 3x3 matrix). The Sobel operator calculates the gradient of the image at each pixel, so it tells us how the image changes from the pixels surrounding the current pixel, meaning how it increases or decreases (darker to brighter values).
The shader is a bit long, since several operations must be performed; we shall discuss each of its parts below. First we need to get the texture coordinates from the vertex shader:

const char* plane_sobel_filter_shader_src =
#ifdef GL_FRAGMENT_PRECISION_HIGH
  precision highp float;
#else
  precision mediump float;
#endif
varying  vec2 g_vVSTexCoord;
uniform sampler2D s_texture;

Then we should define our kernel; as stated before, a 3x3 matrix should be enough, and the following values have been tested with good results:

mat3 kernel1 = mat3 (-1.0, -2.0, -1.0,
                      0.0,  0.0,  0.0,
                      1.0,  2.0,  1.0);

We also need a convenient way to convert to grayscale, since we only need grayscale information for the Sobel operator; remember that to convert to grayscale you only need an average of the three colors:

float toGrayscale(vec3 source) {
    float average = (source.x+source.y+source.z)/3.0;
    return average;
}

Now we get to the important part: actually performing the convolutions. Remember that, per the OpenGL ES 2.0 spec, neither recursion nor dynamic indexing is supported, so we need to do our operations the hard way, by defining vectors and multiplying them. See the following code:

float doConvolution(mat3 kernel) {
    float sum = 0.0;
    float current_pixelColor = toGrayscale(texture2D(s_texture,g_vVSTexCoord).xyz);
    float xOffset = float(1)/1024.0;
    float yOffset = float(1)/768.0;
    float new_pixel00 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x-xOffset,g_vVSTexCoord.y-yOffset)).xyz);
    float new_pixel01 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x,g_vVSTexCoord.y-yOffset)).xyz);
    float new_pixel02 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x+xOffset,g_vVSTexCoord.y-yOffset)).xyz);
    vec3 pixelRow0 = vec3(new_pixel00,new_pixel01,new_pixel02);
    float new_pixel10 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x-xOffset,g_vVSTexCoord.y)).xyz);
    float new_pixel11 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x,g_vVSTexCoord.y)).xyz);
    float new_pixel12 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x+xOffset,g_vVSTexCoord.y)).xyz);
    vec3 pixelRow1 = vec3(new_pixel10,new_pixel11,new_pixel12);
    float new_pixel20 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x-xOffset,g_vVSTexCoord.y+yOffset)).xyz);
    float new_pixel21 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x,g_vVSTexCoord.y+yOffset)).xyz);
    float new_pixel22 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x+xOffset,g_vVSTexCoord.y+yOffset)).xyz);
    vec3 pixelRow2 = vec3(new_pixel20,new_pixel21,new_pixel22);
    vec3 mult1 = (kernel[0]*pixelRow0);
    vec3 mult2 = (kernel[1]*pixelRow1);
    vec3 mult3 = (kernel[2]*pixelRow2);
    sum = mult1.x+mult1.y+mult1.z+mult2.x+mult2.y+mult2.z+mult3.x+mult3.y+mult3.z;
    return sum;
}

Looking at the last part of the function, you can see that we add the component-wise multiplication results into a sum; this sum tells us how much the pixel varies with respect to its neighbors.
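Note that main() in the next section also uses a second matrix, kernel2, for the vertical gradient. Its definition is not shown in the original excerpt, but a plausible counterpart (an assumption, simply the transpose of kernel1) would be:

mat3 kernel2 = mat3 (-1.0, 0.0, 1.0,
                     -2.0, 0.0, 2.0,
                     -1.0, 0.0, 1.0);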
The last part of the shader is where we use all our previous functions. It is worth noting that the convolution needs to be applied both horizontally and vertically for the technique to be complete; kernel2 (not shown above) is simply the second Sobel kernel, the transpose of kernel1, which covers the other direction:

void main() {
    float horizontalSum = 0.0;
    float verticalSum = 0.0;
    float averageSum = 0.0;
    horizontalSum = doConvolution(kernel1);
    verticalSum = doConvolution(kernel2);
    if ((verticalSum > 0.2) || (horizontalSum > 0.2) ||
        (verticalSum < -0.2) || (horizontalSum < -0.2))
        averageSum = 0.0;
    else
        averageSum = 1.0;
    gl_FragColor = vec4(averageSum, averageSum, averageSum, 1.0);
}

Conclusions and future work

At this point, if you have your application up and running, you can see that image processing can be done quite fast, even with images larger than 640x480. This approach can be extended to a variety of techniques such as tracking, feature detection and face detection. However, these techniques are out of scope for now, because such algorithms need multiple rendering passes (face detection, for example): we perform an operation, write the result to an offscreen buffer, use that buffer as the input for the next shader, and so on. Freescale is planning to release an Application Note in Q4 2012 that will expand this white paper and cover these techniques in detail.
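The multi-pass techniques mentioned above rely on render-to-texture. As a rough illustration only (not taken from the planned Application Note), an offscreen buffer can be created with standard OpenGL ES 2.0 framebuffer objects along these lines:

/* Sketch of an OpenGL ES 2.0 offscreen render target for multi-pass
 * filtering (illustrative only): render one pass into fboTexture, then
 * bind fboTexture as s_texture for the next pass. */
#include <GLES2/gl2.h>

static GLuint create_offscreen_target(GLsizei width, GLsizei height,
                                      GLuint *fboTexture)
{
    GLuint fbo;

    /* Texture that will receive the rendered output of one pass. */
    glGenTextures(1, fboTexture);
    glBindTexture(GL_TEXTURE_2D, *fboTexture);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    /* Framebuffer object with the texture attached as its color buffer. */
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, *fboTexture, 0);

    if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE)
        return 0; /* caller should fall back to single-pass rendering */

    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    return fbo;
}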
Dithering Implementation for Eink Display Panel
by Daiyu Ko, Freescale

1. Dithering

a. Dithering in digital image processing

Dithering is a technique used in computer graphics to create the illusion of color depth in images with a limited color palette (color quantization). In a dithered image, colors not available in the palette are approximated by a diffusion of colored pixels from within the available palette. The human eye perceives the diffusion as a mixture of the colors within it (see color vision). Dithered images, particularly those with relatively few colors, can often be recognized by a characteristic graininess or speckled appearance.

Figure 1. Original photo; note the smoothness in the detail.
Figure 2. Original image using the web-safe color palette with no dithering applied. Note the large flat areas and loss of detail. (http://en.wikipedia.org/wiki/File:Dithering_example_undithered_web_palette.png)
Figure 3. Original image using the web-safe color palette with Floyd–Steinberg dithering. Note that even though the same palette is used, the application of dithering gives a better representation of the original. (http://en.wikipedia.org/wiki/File:Dithering_example_dithered_web_palette.png)

b. Applications

Display hardware, including early computer video adapters and many modern LCDs used in mobile phones and inexpensive digital cameras, shows a much smaller color range than more advanced displays. One common application of dithering is to display graphics containing a greater range of colors than the hardware is capable of showing. For example, dithering might be used to display a photographic image containing millions of colors on video hardware that is only capable of showing 256 colors at a time. The 256 available colors would be used to generate a dithered approximation of the original image. Without dithering, the colors in the original image might simply be "rounded off" to the closest available color, resulting in a new image that is a poor representation of the original. Dithering takes advantage of the human eye's tendency to "mix" two colors in close proximity to one another.

For an Eink panel, which displays grayscale images only, we can use dithering to reduce the grayscale level, even down to black/white only, and still get good visual results.

c. Algorithm

There are several algorithms designed to perform dithering. One of the earliest, and still one of the most popular, is the Floyd–Steinberg dithering algorithm, developed in 1975. One of its strengths is that it minimizes visual artifacts through an error-diffusion process; error-diffusion algorithms typically produce images that more closely represent the original than simpler dithering algorithms.

(Example images: original, simple threshold, Bayer ordered dithering.)

Example (error diffusion): Error-diffusion dithering is a feedback process that diffuses the quantization error to neighboring pixels.

Floyd–Steinberg dithering only diffuses the error to the immediately neighboring pixels. This results in very fine-grained dithering.

Jarvis, Judice, and Ninke dithering also diffuses the error to pixels one step further away. The dithering is coarser, but has fewer visual artifacts. It is slower than Floyd–Steinberg dithering because it distributes the error among 12 nearby pixels instead of the 4 nearby pixels used by Floyd–Steinberg.

Stucki dithering is based on the above, but is slightly faster. Its output tends to be clean and sharp.
(Example images: Floyd–Steinberg, Jarvis-Judice & Ninke, Stucki.)

Error-diffusion dithering (continued):

Sierra dithering is based on Jarvis dithering, but it is faster while giving similar results.

Filter Lite is an algorithm by Sierra that is much simpler and faster than Floyd–Steinberg, while still yielding similar (according to Sierra, better) results.

Atkinson dithering, developed by Apple programmer Bill Atkinson, resembles Jarvis dithering and Sierra dithering, but it is faster. Another difference is that it does not diffuse the entire quantization error, but only three quarters of it. It tends to preserve detail well, but very light and dark areas may appear blown out.

(Example images: Sierra, Sierra Lite, Atkinson.)

2. Eink display panel characteristics

a. Low resolution
Eink has only a few resolution modes for display:
     DU        (1 bit, black/white)
     GC4       (2 bit, grayscale)
     GC16      (4 bit, grayscale)
     A2        (1 bit, black/white, fast update mode)

b. Slow update time
For an 800x600 panel size (per frame):
     DU        300 ms
     GC4       450 ms
     GC16      600 ms
     A2        125 ms

3. Effect of dithering on the Eink display panel

a. Low resolution with better visual quality
By dithering the original grayscale image we get a better-looking result. Even if the image becomes pure black and white, the dithering algorithm still gives the impression of a grayscale image.

b. Faster updates with Eink's animation waveform
Since the DU/A2 modes update the Eink panel faster than the grayscale modes, with dithering we get not only a better-looking result, but we can also use the DU/A2 fast update modes to show animation or even normal video files.

4. Our current dithering implementation

a. Choose a simple and effective algorithm
Considering the Eink panel's characteristics, we compared a couple of dithering algorithms and decided to use the Atkinson dithering algorithm. It is simple and the result is better, especially for the Eink black/white display case (a minimal sketch of the algorithm is given at the end of this article).

b. Heavy optimization so that it does not affect the update time too much
Thanks to the simplicity of the Atkinson dithering algorithm, we were also able to put a lot of effort into optimization in order to reduce the dithering processing time and make it practical for actual use.

c. Current algorithm performance and result
Currently, with the Atkinson dithering algorithm, our processing time is about 70 ms.

5. Availability

a. We implemented both Y8->Y1 and Y8->Y4 dithering with the same dithering algorithm.
b. Implemented in our EPDC driver with the i.MX6SL Linux 3.0.35 release.
c. Also implemented in our Video for Eink demo.

6. References

a. Part of the dithering introduction is from www.wikipedia.org
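For illustration, the following is a minimal C sketch of Atkinson dithering from 8-bit grayscale (Y8) to black/white (Y1). It shows the 1/8-per-neighbor error diffusion described above; it is not the optimized implementation inside the EPDC driver, and the function and buffer names are assumptions.

/* Atkinson dithering, 8-bit grayscale to black/white (illustrative sketch,
 * not the optimized EPDC driver code). Only 3/4 of the quantization error
 * is diffused: 1/8 to each of six neighbors, which preserves mid-tone
 * detail but lets very light and very dark areas blow out. */
#include <stdint.h>

static int clamp_byte(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

static void atkinson_y8_to_y1(uint8_t *img, int width, int height)
{
    /* Offsets of the six neighbors that each receive 1/8 of the error:
     * two to the right, three on the next row, one two rows below. */
    static const int dx[6] = { 1, 2, -1, 0, 1, 0 };
    static const int dy[6] = { 0, 0,  1, 1, 1, 2 };

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int old = img[y * width + x];
            int quant = old < 128 ? 0 : 255;   /* threshold to black or white */
            int err = (old - quant) / 8;       /* 1/8 of the error per neighbor */
            img[y * width + x] = (uint8_t)quant;

            for (int i = 0; i < 6; i++) {
                int nx = x + dx[i];
                int ny = y + dy[i];
                if (nx >= 0 && nx < width && ny < height)
                    img[ny * width + nx] =
                        (uint8_t)clamp_byte(img[ny * width + nx] + err);
            }
        }
    }
}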
Note: All these GStreamer pipelines have been tested on an i.MX6Q board with kernel version 3.0.35-2026-geaaf30e.

Tools:
gst-launch
gst-inspect

FSL Pipeline Examples:
GStreamer i.MX6 Decoding
GStreamer i.MX6 Encoding
GStreamer Transcoding and Scaling
GStreamer i.MX6 Multi-Display
GStreamer i.MX6 Multi-Overlay
GStreamer i.MX6 Camera Streaming
GStreamer RTP Streaming

Other plugins:
GStreamer ffmpeg
GStreamer i.MX6 Image Capture
GStreamer i.MX6 Image Display

Misc:
Testing GStreamer
Tracing GStreamer Pipelines
GStreamer miscellaneous