August 21, 2025

How AI Understands Us: A Look into Natural Language Processing

Human language can be defined as a system of communication through speaking, writing, or making signs in a way that can be understood. Such a definition, however, fails to capture how difficult it is for machines to understand human language. Computers struggle with human language because its words are often ambiguous. In English, for instance, a word can have multiple meanings depending on context: a person who speaks English can easily distinguish "light" meaning brightness from "light" meaning not heavy, but a computer cannot do so without additional context.
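To make the ambiguity concrete, the sketch below is a minimal illustration (not part of any system described in this article), assuming the Hugging Face transformers and PyTorch libraries are installed. It shows how a contextual model such as BERT gives the word "light" different vector representations depending on the sentence it appears in.

```python
# Minimal sketch: contextual embeddings separate the two senses of "light".
# Assumes the Hugging Face transformers and PyTorch libraries are installed.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def light_vector(sentence: str) -> torch.Tensor:
    """Return the contextual embedding of the token 'light' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # shape: (tokens, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index("light")]

brightness = light_vector("Please turn on the light in the kitchen.")
weight = light_vector("This suitcase is light enough to carry.")

# A similarity well below 1.0 indicates the two uses get distinct representations.
similarity = torch.nn.functional.cosine_similarity(brightness, weight, dim=0)
print(f"Similarity between the two senses of 'light': {similarity.item():.2f}")
```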

Natural language processing (NLP), an application of artificial intelligence (AI), plays a key role in the complex task of enabling machines such as computers to understand and communicate in human language. As a subfield of AI, NLP allows computers to interpret and generate human language by combining statistical methods with machine learning (ML) and deep learning (DL) techniques.

Over the years, the contextual-understanding capabilities of natural language processing (NLP) have advanced substantially. A large language model (LLM) such as ChatGPT is trained on massive amounts of natural human language data so that it can predict subsequent words or phrases and generate natural-sounding, meaningful responses on a wide variety of topics. As a result, ChatGPT can be used to brainstorm ideas for creative projects, troubleshoot errors in Python programs, write essays on climate change, translate text into multiple languages, and create lesson plans for a history class. Such capabilities were far more limited in earlier rule-based and statistical NLP systems. Figure 1 shows an example of contextual understanding by ChatGPT in sentiment analysis.

Figure 1: Contextual understanding by ChatGPT in sentiment analysis.
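As a rough illustration of the kind of sentiment analysis shown in Figure 1, the sketch below assumes the Hugging Face transformers library and uses its default pretrained sentiment model rather than ChatGPT itself.

```python
# Minimal sketch of sentiment analysis with a pretrained model.
# Assumes the Hugging Face transformers library is installed.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default pretrained model

reviews = [
    "The room was spotless and the host was incredibly helpful.",
    "The check-in process was confusing and nobody answered my messages.",
]

for review, result in zip(reviews, classifier(reviews)):
    print(f"{result['label']:>8} ({result['score']:.2f})  {review}")
```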

Audio signals, web content, social media, documents, and databases are common sources of raw human language data, which are typically unstructured. For natural language processing to work effectively, this raw data must be converted into a structured format that computers can analyze and interpret. NLP involves two main types of analysis: syntactical and semantic. Syntactical analysis examines the structure and grammar of the language, whereas semantic analysis is concerned with understanding the meaning of a text by analyzing individual words and the text as a whole. Table 1 presents the difference between the two.

Table 1: Difference between syntactical analysis and semantic analysis.

Syntactical analysis
Example: "The student wrote an essay."
  Subject: The student
  Verb: wrote
  Object: an essay

Semantic analysis
Example: "Amazon is hiring in New York."
  Semantic analysis tags "Amazon" as an organization and "New York" as a location.
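The examples in Table 1 can be reproduced with off-the-shelf tools. The sketch below is a minimal illustration, assuming the spaCy library and its small English model (en_core_web_sm) are installed: a syntactic pass over sentence structure and a semantic pass over named entities.

```python
# Minimal sketch contrasting the two kinds of analysis in Table 1.
# Assumes spaCy and the en_core_web_sm English model are installed.
import spacy

nlp = spacy.load("en_core_web_sm")

# Syntactical analysis: part of speech and grammatical role of each word.
doc = nlp("The student wrote an essay.")
for token in doc:
    print(f"{token.text:<10} {token.pos_:<6} {token.dep_}")

# Semantic analysis: named entities and their types.
doc = nlp("Amazon is hiring in New York.")
for ent in doc.ents:
    print(f"{ent.text:<10} -> {ent.label_}")
```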

Currently, various NLP applications have been adopted in industries such as healthcare, business, and media and publishing. In the healthcare industry, for instance, a large amount of biopharmaceutical data is produced in multiple languages, and timely, accurate access to the right information is essential. The research-based biopharmaceutical company AbbVie has partnered with Intel to use a Transformer NLP model, powered by Intel Xeon processors and the Intel oneAPI Deep Neural Network Library, to develop its own language translation service, called Abbelfish, and a question-and-answer-based search tool named AbbVie Search. Abbelfish provides accurate translations of biomedical terminology across multiple languages, while AbbVie Search scans research articles for answers to scientific questions and returns relevant results, supporting the discovery of new treatments and manufacturing methods for patients.
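For a sense of what a Transformer-based translation service involves, the sketch below is a generic illustration, not AbbVie's Abbelfish system; it assumes the Hugging Face transformers library and the public Helsinki-NLP/opus-mt-en-fr English-to-French model.

```python
# Minimal sketch of machine translation with a Transformer model.
# Generic illustration only; assumes transformers and the opus-mt-en-fr model.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

sentence = "The patient reported mild side effects after the second dose."
result = translator(sentence)
print(result[0]["translation_text"])
```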

In the business sector, Airbnb is one of the companies that has successfully leveraged natural language processing to enhance its customer experience. NLP allows Airbnb to improve users' search results by displaying relevant listings based on their search history, past bookings, and travel preferences. Since guests and hosts may come from different countries and speak different languages, NLP also facilitates communication through automated translation tools. Additionally, Airbnb uses NLP to analyze guest reviews, helping it identify potential issues and areas for improvement.

In the media and publishing industry, the news agency Reuters has also benefited from natural language processing in its news reporting and distribution. For instance, Reuters developed Lynx Insight, a tool that automatically drafts short stories. These drafts are reviewed before publishing, enabling faster alerts and improved story discovery. Journalists at Reuters also use an NLP-powered translation tool to translate non-English sources. Moreover, the agency employs "website watchers" that scan websites for specific text or data and instantly disseminate it to clients and journalists in real time, ensuring they receive the most up-to-date information.

While natural language processing offers numerous benefits across various sectors, it also has limitations. The accuracy and robustness of NLP systems depend on access to high-quality and diverse training data, and the availability of data for indigenous languages, minority languages, and languages spoken in remote or underdeveloped regions is limited. For instance, languages such as Swahili or Bengali have far less digital text available than English or Chinese, making it difficult to build accurate and robust NLP systems for them. Furthermore, human language is not static but constantly evolving through cultural interaction; English, for example, has borrowed "ballet" from French and "sudoku" from Japanese. NLP systems therefore need continual retraining to keep up with words and phrases that did not exist before.

In conclusion, since its introduction, natural language processing (NLP) has become a transformative technology that enables computers to understand and interact with our languages, and it has been successfully adopted in industries such as healthcare, business, and media and publishing. Although recent large language models (LLMs) have significantly expanded the capabilities of NLP systems, their effectiveness still relies heavily on the availability of high-quality, well-generalized training data and on the systems' ability to adapt to the dynamic nature of human language. As AI technology advances, NLP systems are expected to become more accurate, context aware, and capable of delivering seamless human-computer interactions.
