Toño Hernández

Computer Vision might be the largest part of AI

Discussion created by Toño Hernández on Aug 3, 2017

Computer Vision (CV) is an interdisciplinary science that enables machines to see, identify, and process images in a way similar to human vision, and then to produce an appropriate output: to interpret, analyze, recognize, or act on what they see. In short, CV aims not only to view an image but to extract useful results from it.


“Seeing” is one of the most complicated biological processes in the human body: by some estimates, visual processing involves about two-thirds of the brain, drawing on perception on many other fronts along with a great deal of analysis. To gain the capability of sight and deliver accurate insights for smarter homes, optimized businesses, or safer cars, computers must combine several data-gathering technologies.


Computer Vision research began in the 1950s, following three separate lines that mimicked the eye, the visual cortex, and the rest of the brain. Supposedly, in 1966, AI pioneer Seymour Aubrey Papert asked MIT undergraduate Gerald Sussman and his team to plug a camera into a computer and make it describe what it was seeing; this Summer Vision Project marked the beginning of systematic Computer Vision research.

Computer Vision might be the largest part of AI, and it can be explained from three perspectives:
- A scientific perspective, related to the theory behind extracting information from images. 
- A technological perspective, applying models to construct Computer Vision systems.  
- An engineering perspective, which deals with how computers gain high-level understanding from digital images or videos, with the aim of automating tasks that the human visual system can do.


Computer Vision includes well-known sub-domains such as object recognition, scene reconstruction & image restoration, video tracking, event detection & motion estimation, 3D pose estimation, indexing, and learning. CV work is commonly divided into the following areas:

- Imaging: concentrating on the process of producing images for further processing/analysis.

- Image Processing/Analysis: focusing on how to transform one 2D image into another through pixel manipulation, such as contrast enhancement, edge extraction, noise removal, or geometric transformations. It does not require assumptions about, or interpretation of, the image content.

- 3D analysis from 2D images: analyzing a 3D scene projected onto one or several images, for example reconstructing its structure or other information about the scene from those images. It requires fairly complex assumptions about the scene represented in the image.

- Pattern Recognition: using statistical algorithms and Artificial Neural Networks to extract information from signals in general.

- Machine Vision (MV): applying several methods to provide imaging-based automatic analysis, process control, and robot guidance for industrial applications. It requires image-sensor technology and control theory integrated with efficient real-time processing of image data, implemented in both hardware and software.
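To make the Image Processing/Analysis bullet more concrete, here is a minimal sketch of its four pixel-level operations on a tiny grayscale image. It assumes the image is a plain list of rows of 0-255 integers, and all function names are hypothetical; the specific techniques (linear contrast stretch, 3x3 median filter, horizontal pixel differences, left-right mirroring) are just one simple choice for each operation, not the only or "standard" way to do them. Note that none of them needs to know what the image depicts.

```python
# Minimal sketch: a grayscale image is assumed to be a list of rows of 0-255 ints.
# All function names here are illustrative, not from any library.

def stretch_contrast(img):
    """Contrast enhancement: linearly rescale so the darkest pixel is 0 and the brightest 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [row[:] for row in img]  # flat image: nothing to stretch
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def median_denoise(img):
    """Noise removal: replace each interior pixel with the median of its 3x3 neighbourhood.

    Border pixels are copied unchanged to keep the sketch short.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # the middle of the 9 sorted values
    return out


def horizontal_edges(img):
    """Edge extraction: absolute difference between horizontally adjacent pixels."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)] for row in img]


def flip_horizontal(img):
    """Geometric transformation: mirror the image left to right."""
    return [row[::-1] for row in img]


if __name__ == "__main__":
    image = [
        [50, 50, 100, 100],
        [50, 200, 100, 100],  # 200 is an isolated "noisy" pixel
        [50, 50, 100, 100],
    ]
    clean = median_denoise(image)
    print(clean[1])                    # [50, 50, 100, 100]
    print(stretch_contrast(clean)[0])  # [0, 0, 255, 255]
    print(horizontal_edges(clean)[0])  # [0, 50, 0]
    print(flip_horizontal(clean)[0])   # [100, 100, 50, 50]
```

A median filter is used here because it removes isolated outliers without smearing the step between the 50 and 100 regions the way simple averaging would; practical systems would normally rely on an optimized image library rather than Python loops.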
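The 3D-analysis bullet says that recovering a 3D scene from 2D images requires assumptions about the scene. One basic reason can be sketched with an ideal pinhole camera, an assumed textbook model used here for illustration (the function names are hypothetical): projection divides by depth, so every point along the same viewing ray lands on the same image point, and the depth can only be recovered once it is assumed or measured by other means.

```python
# Minimal sketch of an ideal pinhole camera (an assumed model; names are illustrative).
# A 3D point (X, Y, Z) in camera coordinates maps to the image point (f*X/Z, f*Y/Z).

def project(point, focal_length=1.0):
    """Project a 3D point in front of the camera (Z > 0) onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def back_project(pixel, depth, focal_length=1.0):
    """Invert the projection; only possible once a depth Z is assumed or measured."""
    u, v = pixel
    return (u * depth / focal_length, v * depth / focal_length, depth)


if __name__ == "__main__":
    near = (1.0, 2.0, 4.0)
    far = (2.0, 4.0, 8.0)  # twice as far away, on the same viewing ray
    print(project(near))  # (0.25, 0.5)
    print(project(far))   # (0.25, 0.5)
    # The image alone cannot tell these points apart; assuming a depth resolves it.
    print(back_project((0.25, 0.5), depth=4.0))  # (1.0, 2.0, 4.0)
```

Techniques such as stereo vision or structure from motion can be seen as different ways of supplying that missing depth information, for example by combining several images of the same scene.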


CV applications include Agriculture, Robotics, Biometrics, Gesture Analysis & Face Recognition, Character Recognition, Industrial Quality Inspection & Process Control, Geoscience, Augmented Reality (AR), Image Restoration, Medical Image Analysis & Forensics, Remote Sensing, Security & Surveillance, Autonomous Vehicles & Transport, and Pollution Monitoring.


It is still not possible for CV to see objects as humans do, but given the scale of the task, it is surprising how much computers can do with what they see. Today, Computer Vision is found in cameras that recognize faces, in factory robots that monitor for problems and work around human co-workers, and in autonomous cars that read traffic signs and watch out for pedestrians.