October 7, 2025

How AI Sees the World: A Glimpse into Computer Vision

Practically everything we know about the world comes to us through our senses, and vision dominates: it is often estimated that roughly two-thirds of the information our brain receives about the world arrives through our eyes. The human eyes are specialized organs of sight designed to detect and respond to patterns of light. By detecting these patterns, the eyes allow us to receive information from our surroundings and to see and interpret the world around us. Human vision begins when the eyes capture light and convert it into nerve signals, which travel along the optic nerves to the brain. The brain then processes and interprets these signals into the images we see.

Although a computer can mimic certain aspects of the human visual system, it handles visual information quite differently, relying on algorithms and hardware to analyze digital images and videos. For instance, a computer might detect a stop sign by analyzing its octagonal shape and red color. However, unlike the human eye, the computer does not immediately understand what the stop sign means or how to respond to it. For that, it requires a computer vision system.
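To make the stop-sign example concrete, here is a minimal sketch of classical shape-and-color detection using OpenCV. The file name, color thresholds, and area cutoff are illustrative assumptions, not a production detector.

```python
import cv2

# Hypothetical input file; any photo containing a stop sign would do.
image = cv2.imread("stop_sign.jpg")

# HSV makes it easier to isolate "red" than raw BGR values.
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Red wraps around the hue axis, so combine low and high hue ranges.
lower_red = cv2.inRange(hsv, (0, 100, 100), (10, 255, 255))
upper_red = cv2.inRange(hsv, (160, 100, 100), (180, 255, 255))
mask = cv2.bitwise_or(lower_red, upper_red)

# Among the red regions, look for a contour that is roughly octagonal.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
    perimeter = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.02 * perimeter, True)
    if len(approx) == 8 and cv2.contourArea(contour) > 1000:
        print("Possible stop sign at", cv2.boundingRect(contour))
```

A real detector would also need to cope with occlusion, perspective, and changing light, which is where learned models take over from hand-tuned thresholds like these.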

Computer vision is a field of artificial intelligence (AI) that enables computers to extract, analyze, and interpret meaningful data from images and videos. In its early stages, computer vision could only detect simple shapes, edges, and basic patterns in the matrix of image pixels. Later, using machine learning, researchers built more advanced computer vision models on top of hand-crafted features. More recently, the adoption of deep learning has not only improved computer vision significantly in accuracy and efficiency but also given models the ability to learn features automatically from raw image data. Figure 1 shows the Stanford Cart, one of the earliest mobile robots to use computer vision for autonomous navigation.

Figure 1: The Stanford Cart, 1973. Source: “Principles of Sensing for Autonomy, EE259,” Stanford University, Stanford, CA. [Online]. Available: https://ee259.stanford.edu.
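For a flavor of that early, pixel-level era, the sketch below applies a hand-crafted Sobel kernel to a tiny synthetic image to locate an edge. The image values and kernel are standard textbook choices, not taken from any particular system.

```python
import numpy as np
from scipy.ndimage import convolve

# A tiny synthetic grayscale image: a dark left half and a bright right half.
image = np.array([[0, 0, 0, 255, 255, 255]] * 6, dtype=float)

# A hand-crafted Sobel kernel that responds to horizontal intensity changes.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Convolving the pixel matrix with the kernel produces large responses
# exactly where the vertical edge sits, and near-zero responses elsewhere.
edges = convolve(image, sobel_x)
print(np.abs(edges).argmax(axis=1))  # the edge column stands out in every row
```

Deep learning replaced this step by learning such kernels directly from data rather than having engineers design them by hand.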

Although human vision excels at learning from limited examples, the same is not true of computer vision systems. For instance, a radiologist may recognize a malignant brain tumour in a magnetic resonance imaging (MRI) scan after seeing only a few cases. A computer vision model, by contrast, typically must be trained on thousands of annotated MRI images before it can learn the relevant patterns and predict at a comparable level of accuracy. Convolutional Neural Networks (CNNs), Residual Networks (ResNets), and Vision Transformers (ViTs) are among the most commonly used computer vision models. The tasks they perform range from image classification and image segmentation to more advanced applications such as image generation, object detection, and object tracking. Figure 2 shows the use of a CNN model for classifying brain tumours from MRI scans.

Figure 2: Example slices of MRI images from different views used for brain tumour classification with a CNN model. Source: D. Reyes et al., “Performance of convolutional neural networks for the classification of brain tumours using magnetic resonance imaging,” Heliyon, vol. 10, no. 3, p. e25468, Mar. 2024. doi: 10.1016/j.heliyon.2024.e25468.
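For a sense of what such a model looks like in code, here is a minimal PyTorch sketch of a CNN classifier. The layer sizes, the 128×128 input, and the two-class output are illustrative assumptions; this is not the architecture from the study cited in Figure 2.

```python
import torch
import torch.nn as nn

class TumourCNN(nn.Module):
    """A deliberately small CNN for two-class tumour classification."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1 channel: a grayscale MRI slice
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 128x128 -> 64x64
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 64x64 -> 32x32
        )
        self.classifier = nn.Linear(32 * 32 * 32, 2)     # logits: tumour vs. no tumour

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# A random tensor stands in for a batch of one preprocessed 128x128 slice.
model = TumourCNN()
logits = model(torch.randn(1, 1, 128, 128))
print(logits.shape)  # torch.Size([1, 2])
```

In practice, a network like this would be trained with a cross-entropy loss over the thousands of annotated scans described above.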

Today, advances in computer vision technology are helping companies in industries such as manufacturing, retail, agriculture, and healthcare solve real problems. In manufacturing, where worker safety is a top priority, the technology can verify that workers are wearing appropriate safety equipment, track their movements, and notify management of unexpected events along the assembly line. Industrial scanners are also used to monitor production lines, detecting defects and performing quality checks on every finished product with greater accuracy and at a much faster rate than manual inspection. In predictive maintenance, analyzing images and videos of equipment can reveal early signs of wear, overheating, or misalignment before they lead to failure, helping plants avoid unexpected breakdowns and disruptions.

In retail, computer vision is transforming the way businesses operate and interact with customers. With cashier-less checkout systems, customers simply pick up the grocery items they want while the store’s camera system automatically captures images of each product. Algorithms trained to identify the products and their prices then charge customers the correct amount as they leave the store, allowing for a faster shopping experience. Inventory management systems built on computer vision can also monitor shelves, detect when items are running low, and issue replenishment alerts, helping prevent customers from leaving the store empty-handed and buying from competitors instead.

Computer vision also plays a crucial role in improving productivity and sustainability in agriculture. Farmers can use object detection systems in fruit fields to estimate yields, enabling them to allocate resources efficiently and optimize harvesting schedules. Automated harvesting machinery can further reduce dependence on manual labor by selectively picking mature produce based on its color, size, and shape. Pest detection, which traditionally relies on manual inspection that is time-consuming and prone to human error, is another application: computer vision allows farmers to spot leaf discoloration or structural deformities in their crops early, promoting healthier yields and reducing the need for pesticides.

In healthcare, computer vision has revolutionized the way medical professionals diagnose, monitor, and treat patients, offering improved accuracy, faster assessments, and fewer errors. For instance, manual interpretation of MRI scans by radiologists can be time-consuming and prone to oversight. With AI-driven vision systems, irregular tissue growth or tumours can be detected more quickly, allowing radiologists to prioritize cases that require immediate attention. These systems can also improve patient monitoring by tracking movement in rehabilitation settings, alerting nurses before a potential fall when elderly patients or those with mobility issues attempt to leave their beds unsupervised. Furthermore, AI cameras can monitor hand hygiene compliance among doctors and nurses, ensuring that they wash their hands before and after patient contact to reduce the risk of contamination in healthcare settings.

Although computer vision has proven useful across industries, the technology has limitations, the most important being its dependence on high-quality data. A system trained on insufficiently diverse data will struggle to adapt to new or slightly different images. For instance, a self-driving car system trained on images taken in sunny weather might fail in bad weather, because the lighting, composition, and appearance of vehicles all change with conditions. Training on data that underrepresents certain groups can also introduce bias: a skin cancer detection model trained mostly on images of light-skinned patients may fail to detect cancer in patients with darker skin tones. Additionally, training and deploying complex computer vision models requires powerful computing infrastructure, and budget constraints may limit smaller organizations’ access to the graphics processing units (GPUs) needed for training.

In conclusion, computer vision has enabled computers to analyze and interpret images and videos to improve safety, productivity, and decision-making across industries such as manufacturing, retail, agriculture, and healthcare. Continuing advances in artificial intelligence and processing power will expand both its capabilities and its adoption. At the same time, successful adoption of computer vision systems depends on factors such as diverse, high-quality training data and access to powerful computational resources. Last but not least, integrating explainable AI and edge computing can make computer vision models more transparent, trustworthy, efficient, secure, and cost-effective, particularly in critical areas like healthcare, security, and the legal system.
