CAIRO UTM Interns webinar on Real-Time Object Detection

Authored By: Ali Mahmoud Mohammed Madani, Shafishuhaza Sahlan

The webinar on Real-Time Object Detection with Deep Learning and OpenCV, organised by CAIRO UTM Interns, was delivered by Mr. Jason Koh Ye Sheng, the co-founder of InsightX Technology, a provider of cutting-edge technologies and services that include data capture, predictive analytics and visualization. Mr. Jason graduated from Universiti Sains Malaysia (USM) in 2018 with a Bachelor's degree in Mechatronics Engineering and, in the same year, won first place in the Intel track of the Innovate Malaysia Design Competition (IMDC) with a project entitled “Stroke Rehabilitation Machine with EEG Monitoring System”. He then worked for two years as an R&D engineer at ViTrox, a company that specializes in designing and developing automated vision inspection and equipment testers for the semiconductor and electronic packaging industries, before continuing his studies at UTM, where he earned a Master of Philosophy (MPhil) in Electrical and Electronics Engineering.

The CAIRO UTM Interns webinar was moderated by Ali Mahmoud Mohammed Madani, a fellow intern undergoing industrial training at the Centre for Artificial Intelligence and Robotics (CAIRO) UTM. CAIRO UTM Interns is an industrial training group supervised by CAIRO UTM members Associate Professor Dr. Yeong Che Fai and Ir. Dr. Shafishuhaza Sahlan.

Mr. Jason began by describing the difference between artificial intelligence (AI), machine learning and deep learning. AI is an umbrella term for methods that make computers mimic human behavior. Machine learning allows machines to improve over time using a statistical approach. Deep learning refers to using multi-layer neural networks (structures that mimic how the human brain works) for decision making and inference. Deep learning is a subset of machine learning, which in turn is a subset of AI.

Mr. Jason then explained object detection as a two-step process. The first step is image classification, which identifies the object in an image. The second step is image localization, which locates the object within the image. He went on to list the three most common types of object detection algorithms: Region-based Convolutional Neural Network (R-CNN), Single Shot Detector (SSD) and You Only Look Once (YOLO). R-CNN is an earlier method of object detection, while SSD and YOLO are newer and more popular.

The YOLO algorithm, Mr. Jason explained, works by dividing an image into a grid of equally sized cells, each covering a portion of the original image. The grid of cells is fed into a neural network that returns two things: a bounding box prediction and a class probability map. The bounding box prediction represents the predicted location of the object within the image, while the class probability map represents the prediction of what the object is (i.e. the item that was photographed). “Class” here refers to the type of object, Mr. Jason emphasized, as a separate class is defined for each kind of object. Combining the two gives the final bounding box, which identifies the object in the image as well as its location.
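
The grid and the combining step described above can be illustrated with a small Python sketch. It is not the code shown in the webinar: the names `CellPrediction`, `responsible_cell` and `best_detection` are hypothetical, the 7 x 7 grid in the usage note is only an example value, and the scoring rule (box confidence multiplied by class probability) follows the original YOLO formulation, which later YOLO versions refine.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class CellPrediction:
    """What one grid cell predicts (illustrative structure, not a real API)."""
    box: Tuple[float, float, float, float]  # (x_center, y_center, width, height), normalized to 0..1
    box_confidence: float                   # how confident the cell is that the box holds an object
    class_probs: Dict[str, float]           # class probability map for this cell


def responsible_cell(x_center: float, y_center: float, grid_size: int) -> Tuple[int, int]:
    """Return the (row, col) of the grid cell that contains a normalized point."""
    col = min(int(x_center * grid_size), grid_size - 1)  # clamp points lying on the right edge
    row = min(int(y_center * grid_size), grid_size - 1)  # clamp points lying on the bottom edge
    return row, col


def best_detection(pred: CellPrediction) -> Tuple[str, float, Tuple[float, float, float, float]]:
    """Combine the box prediction with the class probability map."""
    label, prob = max(pred.class_probs.items(), key=lambda item: item[1])
    score = pred.box_confidence * prob  # class-specific confidence
    return label, score, pred.box
```

For example, on a 7 x 7 grid an object centred in the middle of the image, `responsible_cell(0.5, 0.5, 7)`, falls in cell `(3, 3)`. A cell with box confidence 0.9 and class probabilities of 0.8 for "dog" and 0.1 for "cat" would report a dog with a score of about 0.72.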

When it comes to training a model for object detection, Mr. Jason stated that there are two steps: image annotation (or labelling) and training. Image annotation is the process of labelling a set of images so that the model can learn from them, and it is a required step before training. Training can then be performed either on a PC or on an online platform. Training on a PC is more code-oriented and gives complete control over the process, but it has several drawbacks: because training is power- and resource-intensive, it is often impractical on a typical PC. Online platforms are a better solution, and several free resources are available, including Teachable Machine and Google Colab. Google Colab is also more code-oriented and limits the training run-time. A demonstration of how to use Teachable Machine, which can be used for image annotation as well as training, is presented in Figure 1 and Figure 2.


Figure 1 Image annotation


Figure 2 Teachable Machine demo
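
To give a sense of what an annotation looks like once it is saved to disk, the following Python sketch converts one labelled box into a line of the Darknet-style YOLO text format, which is one class index followed by the box centre, width and height, all normalized by the image size. The webinar did not state which annotation format was used, so this format is an assumption, and the function name `to_yolo_label` is hypothetical.

```python
def to_yolo_label(class_id, box, image_size):
    """Convert a pixel box to a Darknet-style YOLO annotation line.

    box        -- (x_min, y_min, x_max, y_max) in pixels
    image_size -- (width, height) of the image in pixels
    """
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size

    # YOLO labels store the box centre and size as fractions of the image size.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h

    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

For a 640 x 480 image with an object of class 0 labelled from pixel (100, 200) to pixel (300, 400), the function returns the line `0 0.312500 0.625000 0.312500 0.416667`. An annotation tool typically writes one such line per object into a text file that accompanies each image.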

To get started with object detection, users must install Python as well as OpenCV for Python, a widely used library for real-time computer vision. They may then download a pre-trained model such as YOLO and load it in their code. Mr. Jason went on to demonstrate object detection using YOLOv4-tiny, as shown in Figure 3.


Figure 3 Object detection using YOLOv4-tiny
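
The setup described above can be sketched in Python as follows. This is a minimal illustration rather than the code used in the demonstration. It assumes OpenCV 4.4 or later (installed, for example, with `pip install opencv-python`) and that the YOLOv4-tiny configuration, weights and class-name files have already been downloaded. The function names `label_text` and `run_webcam_demo`, the 416 x 416 input size and the 0.4 thresholds are all illustrative choices.

```python
try:
    import cv2  # OpenCV for Python
except ImportError:  # the pure helper below still works without OpenCV
    cv2 = None

CONF_THRESHOLD = 0.4  # minimum detection score to keep (assumed value)
NMS_THRESHOLD = 0.4   # overlap threshold for non-maximum suppression (assumed value)


def label_text(class_names, class_id, score):
    """Build the caption drawn next to a box, e.g. 'person: 0.91'."""
    if 0 <= class_id < len(class_names):
        name = class_names[class_id]
    else:
        name = f"class {class_id}"  # fallback when the names file is too short
    return f"{name}: {score:.2f}"


def run_webcam_demo(cfg_path, weights_path, names_path, camera_index=0):
    """Run a pre-trained Darknet model on webcam frames until 'q' is pressed."""
    if cv2 is None:
        raise RuntimeError("OpenCV is required: pip install opencv-python")

    with open(names_path) as f:
        class_names = [line.strip() for line in f if line.strip()]

    # Load the pre-trained network through OpenCV's DNN module.
    net = cv2.dnn.readNetFromDarknet(cfg_path, weights_path)
    model = cv2.dnn_DetectionModel(net)
    model.setInputParams(size=(416, 416), scale=1 / 255, swapRB=True)

    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # camera unavailable or stream ended
                break
            class_ids, scores, boxes = model.detect(frame, CONF_THRESHOLD, NMS_THRESHOLD)
            for class_id, score, box in zip(class_ids, scores, boxes):
                x, y, w, h = (int(v) for v in box)
                text = label_text(class_names, int(class_id.item()), float(score.item()))
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
                cv2.putText(frame, text, (x, max(y - 5, 15)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
            cv2.imshow("YOLOv4-tiny", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

With the model files saved in the working directory under their usual names, the demo would be started with `run_webcam_demo("yolov4-tiny.cfg", "yolov4-tiny.weights", "coco.names")`. The file names are assumptions and should be adjusted to match the files actually downloaded.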

The webinar was very insightful and demonstrated the broad capabilities of machine vision. Although Mr. Jason was only able to introduce a few applications during this session, his contribution has paved the way for more machine vision enthusiasts to explore and discover the field for themselves.