Artificial intelligence (AI) has grown substantially in capability and adoption over the years. It has been applied across various sectors, including law enforcement, human resource management, healthcare, and finance. The AI models developed in these areas vary in complexity and performance, but as they become more intricate, understanding how they arrive at their decisions becomes increasingly challenging.
A black box AI model is a model whose internal processes are not transparent to its users: the inputs and outputs can be observed, but the decision-making process that connects them remains opaque. Traditional models, such as linear regression and k-nearest neighbors (KNN), are far more interpretable than advanced models such as neural networks (NNs) and large language models (LLMs).
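To make the distinction concrete, the sketch below contrasts the two on synthetic data; the feature values and model choices are illustrative assumptions, not drawn from any study cited here. A linear model's coefficients can be read directly as reasons for a prediction, whereas a small neural network offers no comparably readable account of its output.

```python
# A minimal, self-contained sketch (synthetic data; all names are illustrative).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                       # three made-up features
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# White box: each coefficient states how a one-unit change in a feature
# moves the prediction, so the model's reasoning is directly readable.
linear = LinearRegression().fit(X, y)
print("linear coefficients:", linear.coef_)

# Black box: the same relationship is spread across layers of weights with
# no direct human-readable meaning; only inputs and outputs are observable.
mlp = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0).fit(X, y)
print("MLP prediction for one input:", mlp.predict(X[:1])[0])
```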
As depicted in Figure 1, AI models differ in their levels of accuracy and interpretability. While more sophisticated AI models often achieve higher accuracy, their black box nature raises concerns regarding trust and transparency. This lack of trust and clarity can hinder the adoption of these models in critical areas, such as crime prevention, job recruitment, mortgage approval, and medical diagnosis.

Inaccurate results from an AI model most often stem from poor-quality training data, such as incomplete or unrepresentative inputs, and from bias built into the algorithms themselves. Algorithmic bias has long been a challenge in training AI models, producing skewed outcomes along lines of race, gender, age, and location. It is now receiving increased attention because of the origins and growing complexity of modern AI models such as DeepSeek V3, Qwen 2.5-Max, and Gemini 2.0.
Biased models can produce skewed insights and harmful decisions. For example, a study reported by MIT revealed that predictive policing tools exhibit racial bias against Black individuals. This bias has contributed not only to higher arrest rates among Black people but also to the misallocation of police patrols and poor identification of crime hotspots in certain neighborhoods.
Another study, released by Harvard University, indicated that many companies now use automated applicant tracking systems in recruitment, and that the algorithms behind these systems can exhibit gender bias. In 2018, Reuters reported that Amazon discontinued its AI recruiting tool after discovering that the algorithm was biased against women candidates.
Bias has also been observed in healthcare applications, particularly in medical diagnosis. According to a study by MIT researchers, AI models can accurately predict the race of patients from their chest X-rays alone. However, these models were later found to deliver less accurate diagnostic results for certain demographic groups, especially women and people of color.
Similarly, mortgage approval applications are vulnerable to algorithmic bias. An article in Forbes highlighted significant disparities in mortgage approval rates among racial groups, reporting that Black applicants are more likely to be denied mortgages than white applicants with similar financial backgrounds. This has contributed to inequalities in homeownership opportunities in the United States.
Although AI models have these shortcomings, explainable AI (XAI) has the potential to address them, particularly the tension between accuracy and interpretability. XAI encompasses processes and techniques that help users understand and trust AI-generated decisions. For instance, white box AI, also known as glass box AI, exposes more of a model's inner workings. Figure 2 compares black box AI and white box AI.

In recent years, researchers have focused on developing XAI techniques that improve the interpretability of AI models. Two of the most widely used are SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). Figure 3 illustrates the SHAP technique applied to predicting the risk of chronic kidney disease.
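As a rough illustration of how SHAP is typically applied, the sketch below computes per-feature contributions for a tree-based model on synthetic data. The dataset, feature count, and model choice are assumptions for illustration only; they do not reproduce the chronic kidney disease example in Figure 3. LIME follows a similar workflow, but fits a simple local surrogate model around each prediction instead of computing Shapley values.

```python
# Minimal SHAP sketch on synthetic tabular data (requires: pip install shap scikit-learn).
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                      # four made-up features
y = X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=300)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])         # shape: (5 samples, 4 features)

# For one sample, each SHAP value is that feature's contribution to the
# prediction; the base value plus the contributions recovers the model output.
base = float(np.ravel(explainer.expected_value)[0])
print("per-feature contributions:", shap_values[0])
print("base + contributions:", base + shap_values[0].sum())
print("model prediction:    ", model.predict(X[:1])[0])
```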

For advanced AI models like large language models (LLMs), techniques such as attention-based explanations and neuron activation explanations are being utilized. However, evaluating the accuracy and reliability of these explanation methods continues to be a significant challenge in AI research.
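As one hedged example of an attention-based explanation, the sketch below inspects which input tokens a pretrained transformer attends to. The model name, example sentence, and head-averaging scheme are illustrative assumptions, and attention weights are only a rough proxy for a model's reasoning, which is part of the evaluation challenge noted above.

```python
# Attention-inspection sketch (requires: pip install transformers torch).
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "distilbert-base-uncased"              # illustrative choice of model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

text = "The loan application was denied despite a strong credit history."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one tensor per layer, shaped (batch, heads, tokens, tokens).
# Average the last layer over heads and query positions for a rough
# per-token attention score.
last_layer = outputs.attentions[-1]                 # (1, heads, seq, seq)
token_scores = last_layer.mean(dim=1).mean(dim=1).squeeze(0)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, score in zip(tokens, token_scores):
    print(f"{tok:>12s}  {float(score):.3f}")
```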
In conclusion, as AI models evolve, their interpretability becomes crucial for ensuring transparency, fairness, and broader adoption in critical decision-making applications. While several explainable AI (XAI) techniques have already been introduced, further research is necessary to validate their effectiveness in accurately assessing the reasoning processes of AI models.