Computer vision, often abbreviated as CV, is the field of study that aims to develop techniques to help computers “see” and understand the content of digital images such as photographs and videos. The concept has been around for more than 50 years, but only recently has there been a resurgence of interest in how machines ‘see’ and how that ability can be used to build products for businesses and consumers.
Computer Vision is a field of Artificial Intelligence and Computer Science. By giving computers a visual understanding of the world, we can apply the technology to real-world applications such as autonomous vehicles, facial recognition, and Google Lens.
Sub-domains of computer vision include scene reconstruction, event detection, video tracking, object recognition, 3D pose estimation, learning, indexing, motion estimation, and image restoration, some of which will be covered in this article.
Machine Learning is largely used to develop computer vision, so if you haven’t checked out our article on machine learning, we recommend you do so to have some background information for this article.
The internet consists largely of text and images. Indexing and searching text is relatively straightforward, but when it comes to images, computers need to know what the images contain. Until recently, the content of images and video was best described using the meta descriptions provided by the user who uploaded the media. To expand the amount of information we can extract from images, we need computers to ‘see’ an image and understand its content. For humans this is a relatively easy task, but for computers it is an entirely different challenge, one that has taken decades of development since the idea was first suggested in the 1950s.
What is Computer Vision?
Computer vision is a field of study focused on the problem of helping computers to see:
“At an abstract level, the goal of computer vision problems is to use the observed image data to infer something about the world.”
Computer vision is a multidisciplinary field that could be considered a subfield of artificial intelligence and machine learning, both of which involve specialised methods and make use of learning algorithms. These learning algorithms allow a system to become progressively better at its task. In the simplest terms, CV is the discipline within the broad area of Artificial Intelligence that teaches machines to see much like we do. Its goal is to extract meaning from pixels.
Figure 1: Computer Vision can be considered a subfield of artificial intelligence and machine learning
The goal of computer vision is to understand the content of digital images. Typically, this involves developing methods that attempt to reproduce the capability of human vision. Understanding the content of digital images may involve extracting information from an image, which might be an object, a text description, a three-dimensional model, and so forth.
The goal of Computer Vision is to replicate human vision using digital images through three main processing components in consecutive order:
Image analysis and understanding
One of our main strengths as humans is our ability to make decisions and make sense of what we see in the real world. Providing machines and computers with this kind of visual understanding would give them the same strength.
Object Classification involves the training of a model on a dataset of specific objects, and the model classifies new objects as belonging to one or more training categories.
Object Identification is where a model will recognise a specific instance of an object.
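To make the idea of object classification more concrete, here is a minimal sketch in Python. It is not how production systems work (those typically rely on neural networks); it uses a nearest-centroid classifier as a simple stand-in. The function names, the labels "dark" and "bright", and the tiny 2x2 "images" are all hypothetical and chosen purely for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def train_centroids(samples):
    """Average the pixel vectors of each category.

    samples: list of (pixels, label) pairs, where pixels is a flat list of
    grayscale values. Returns {label: mean pixel vector}.
    """
    sums, counts = {}, {}
    for pixels, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(centroids, pixels):
    """Return the training category whose centroid is closest to pixels."""
    return min(centroids, key=lambda label: dist(centroids[label], pixels))


# Hypothetical toy dataset: 2x2 grayscale images flattened row by row
# (0 = black, 255 = white).
training_set = [
    ([0, 10, 5, 0], "dark"),
    ([20, 0, 15, 10], "dark"),
    ([250, 240, 255, 245], "bright"),
    ([230, 255, 240, 250], "bright"),
]

centroids = train_centroids(training_set)
prediction = classify(centroids, [200, 220, 210, 190])
print(prediction)  # bright
```

The "training" step here is just averaging, and the "model" is the set of averages. Real classifiers learn far richer features, but the workflow is the same: fit on labelled examples, then assign new images to one of the training categories.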
A classical application of CV is handwriting recognition for digitising handwritten content. Other methods of analysis include:
Video motion analysis uses computer vision to estimate the velocity of objects in a video, or the camera itself.
In image segmentation, algorithms partition an image into multiple segments, each a set of pixels.
Scene reconstruction creates a 3D model of a scene from input images or video.
In image restoration, degradation such as noise or blurring is removed from photos using Machine Learning based filters.
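Of the analysis methods listed above, image segmentation is perhaps the easiest to sketch in a few lines. The example below is a simplified illustration, assuming a grayscale image stored as a list of rows and a hand-picked brightness threshold. It groups bright pixels that touch each other (up, down, left, right) into numbered segments. The function name `segment` and the sample image are hypothetical; practical systems use far more sophisticated, often learned, methods.

```python
from collections import deque


def segment(image, threshold):
    """Partition bright pixels into connected segments (4-connectivity).

    image: list of rows of grayscale values (0-255).
    Returns (labels, count), where labels has the same shape as image,
    0 marks background and 1..count mark the segments found.
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for row in range(height):
        for col in range(width):
            # Skip background pixels and pixels that already have a label.
            if image[row][col] <= threshold or labels[row][col] != 0:
                continue
            count += 1
            labels[row][col] = count
            queue = deque([(row, col)])
            # Breadth-first search spreads the label to touching bright pixels.
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (
                        0 <= nr < height
                        and 0 <= nc < width
                        and image[nr][nc] > threshold
                        and labels[nr][nc] == 0
                    ):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count


# Hypothetical 4x5 image containing two separate bright blobs.
image = [
    [200, 210,   0,   0,   0],
    [190,   0,   0, 220, 230],
    [  0,   0,   0, 240,   0],
    [  0,   0,   0,   0,   0],
]

labels, count = segment(image, threshold=128)
print(count)  # 2
for row in labels:
    print(row)
```

Each pixel ends up assigned to exactly one set: the background or one of the numbered segments. That assignment is the essence of segmentation, even though this sketch only looks at brightness.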
Why is Computer Vision important?
Computer vision has enabled a fast-growing variety of applications that can improve the overall quality of life, enable new technologies, streamline processes, and improve safety. Here are some of the applications that derive from Computer Vision:
Facial recognition: Computer Vision allows computers to recognise and differentiate between human faces in great detail. The iPhone’s Face ID technology makes it possible for people to secure their phones with just their face. Facebook and Snapchat also use face-detection algorithms to apply filters and to recognise you in the photos you are tagged in.
Surveillance cameras: Computer vision in surveillance can enable better detection of suspicious behaviour and wanted criminals, and improve the overall monitoring of the surrounding areas.
Smart vehicles: Computer Vision remains the main method of information extraction for detecting traffic signs, traffic lights, and other indications of the applicable traffic rules.
Image Retrieval: Google Images uses content-based queries to search for relevant images. The algorithms look at the content of the query image and return results based on the best matches.
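To give a feel for what a content-based query might involve, here is a toy sketch in Python. It is not Google's algorithm, which the article does not describe. It assumes each image is a flat list of grayscale values and compares images by their brightness histograms using histogram intersection. The function names and the small image library are hypothetical.

```python
def histogram(pixels, bins=4):
    """Normalised brightness histogram of a flat list of 0-255 values."""
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [c / len(pixels) for c in counts]


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for no overlap."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def search(query_pixels, library, top_k=2):
    """Return the names of the top_k library images most similar to the query."""
    query_hist = histogram(query_pixels)
    scored = [
        (similarity(query_hist, histogram(pixels)), name)
        for name, pixels in library.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical image library: each entry is a tiny flattened grayscale image.
library = {
    "night_sky": [5, 10, 20, 30, 15, 25, 240, 10],
    "snow_field": [250, 245, 255, 230, 240, 235, 220, 250],
    "dusk": [60, 70, 90, 100, 80, 120, 40, 110],
}

results = search([0, 12, 18, 35, 22, 8, 250, 30], library)
print(results)  # ['night_sky', 'dusk']
```

The query image is mostly dark with one bright pixel, so the mostly dark "night_sky" entry ranks first. Real retrieval systems compare much richer descriptions of image content, but the pattern of "describe each image, then rank by similarity to the query" carries over.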