AI & Art with Cali Rezo (4): Explainability and uncertainty of AI models

In the third article of the series, we discussed how to apply AI to art analysis. Even if our results were not as conclusive as we’d hoped, they still raised a few questions that we will tackle today: what is really happening inside these black box models that are neural networks? And to what extent can we assess how certain a model is of its predictions?

Today, let’s have a small interlude where we don’t directly work on the AI & Art project with Cali Rezo but instead ponder upon some issues that it has raised and that are currently being investigated by AI research teams.

Here, I will focus on two big questions with current AI models: their explainability and their uncertainty. Both of these would require a full article to dive into the details, so the point of this article is more to mention them and give you a rough feeling of what these words mean in the context of AI – and why they are worth worrying about – than to explicitly state everything about them!

I will use well-known machine learning toy datasets (like the iris dataset, the diabetes dataset…) to get cleaner and clearer results, because unfortunately, as we saw last time, our analytical models did not manage to get much out of Cali’s images.

A. Explainability

What does it mean that a model is explainable?

Today, neural networks and AI in general are regarded as one of the most promising directions for the future of programming. On the other hand, these networks are black boxes that are hard to interpret: internally, they adjust hundreds or thousands of tiny knobs to approximate a complex function as accurately as possible, and staring at this bunch of weights is often not very helpful in understanding how the model makes its predictions. Explainable AI, or XAI, is an ensemble of techniques that aim at fighting this lack of insight and possibly revealing flaws in what the model has learnt.

Now, you might be thinking: is it really an issue? In a way, isn’t it the goal of AI to have computers work by themselves? Well, the thing is that it would still be nice to have an idea of how an input is processed into a prediction and how our network modeled the situation at hand. In particular, in the industry, as we start to use AI more and more, we are faced with the issue of reporting on our results and justifying that we can use the model’s predictions for a commercial product. If you have a great network that reaches top-notch accuracy but you have no idea how it does so, then it is harder to convince your investors your tool is reliable.

In addition, identifying how your model works can be key in preventing it from learning relationships that do not hold in general and from picking up incorrect biases. For example, if your training dataset is imbalanced somehow, your network might consider that this imbalance is part of the problem itself and reproduce this behavior later on. Or perhaps the network has learnt “cheat codes” and “shortcuts” that work well only on this specific data.

It is worth saying that, sometimes, it is better to have a model that is slightly less accurate in its predictions but can be explained better. This depends on your context, of course, but losing a few percentage points of accuracy while being able to clearly map how inputs are transformed into predictions can be a huge gain. Explainability is particularly relevant when a human agent is asked to cooperate with an AI to solve a task.

Note: a quick note on AI transparency, though, is that too much can be as bad as too little. In other words, being flooded with information can actually stop an agent from taking a decision, or at least reduce their ability to do so. This issue with information overload has been nicknamed “info-besity”… and in this day and age where there is so much data everywhere, I feel like this problem extends to a lot more areas than just AI!

Deep learning does not lend itself well to interpretation. Conversely, simpler models like decision trees are much easier to understand. Linear and logistic regressions, 2 very basic models for regression and classification problems respectively, are also great in that sense: just by looking at a few variables after training, you can immediately identify which features are the most important and how they influence the predictions.

2 simple explainable models: linear and logistic regressions

Linear regression

The linear regression is a very basic machine learning method to model a linear relationship between one numerical output variable and one or more numerical input features. To put it simply, the idea is to assign and tune a coefficient for each feature (plus one independent bias called the “intercept” to represent a global translation from the origin), and then sum all the contributions together to get the output value; this way, when you are given a new item, you simply plug its features into your equation to automatically get your prediction.

Mathematically, for $latex N$ items in your dataset and $latex K$ features, you end up with a formula like this one:

$latex \hat y_i = \beta + \sum\limits_{k=0}^{K-1} \alpha_k\cdot x_{ki},\quad\forall i \in \{1,\dots,N\}$

where $latex \hat y$ is the predicted output value, $latex \beta$ is the intercept and the $latex \alpha_k$ are the coefficients of each feature. The great thing about linear regression is that, once you have trained your model, just by looking at the values of the coefficients and the intercept you can get an intuition of which features are relevant, how much each one contributes to the prediction and even whether they contribute positively or negatively.

For example, a famous machine learning dataset used to study regression is the diabetes dataset. In this toy dataset, we have various measurements for 442 patients, from which we can learn to predict the disease progression over one year. In total, there are 10 features plus this target value.

For now, let’s restrict ourselves to only 2 features – this will make the explanations easier, even if of course everything extends to the full 10. So we will see how we can predict the diabetes progression looking only at:

  • the average blood pressure (ABP)
  • the HDL cholesterol measurement (HDL)

If we use the great ML Python library scikit-learn to train a linear regression on this dataset, we end up with 3 valuable pieces of information: the coefficient for the 1st feature (ABP), the coefficient for the 2nd feature (HDL) and the intercept. In other words, if we refer to the equation shown above, we have values for $latex \alpha_0$, $latex \alpha_1$ and $latex \beta$.
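As a quick sketch, fitting this model with scikit-learn could look like the following (in the bundled diabetes dataset, column 3 holds the average blood pressure and column 6 the HDL measurement; the exact coefficient values depend on how the data and target are scaled, so they may differ slightly from the ones quoted below):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

# Load the 442-patient diabetes dataset bundled with scikit-learn
X, y = load_diabetes(return_X_y=True)

# Keep only average blood pressure (column 3, "bp")
# and HDL cholesterol (column 6, "s3")
X2 = X[:, [3, 6]]

model = LinearRegression().fit(X2, y)
alpha_0, alpha_1 = model.coef_   # one coefficient per feature
beta = model.intercept_          # the intercept

print(alpha_0, alpha_1, beta)
```

Whatever the scaling, the signs of the two coefficients are what carry the interpretation: one positive, one negative.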

Here are the values we get:

$latex \alpha_0 \approx 0.38, \qquad \alpha_1 \approx -0.18, \qquad \beta \approx 0$

From this, we can deduce a few things:

  • the 1st feature (ABP) contributes positively to the prediction whereas the 2nd one (HDL) contributes negatively
  • the ABP feature is about twice as important as the HDL feature in our prediction

If we take some new test data for these features and ask our linear regression to predict the estimated progression of diabetes over a year given this information, we then get a nice rainbowy plot like this one:

Predicted progression of diabetes for a grid of test data

It essentially boils down to the same conclusion: if you have a high ABP and a low HDL, then the predicted value is high (red on the graph). Conversely, the more you increase the HDL value and decrease the ABP value, the smaller the prediction gets (blue being the lowest predicted output).

This sounds quite logical and is very easy to understand! We have effectively checked that linear regressions, although they are too simple for complex datasets (mainly because they assume a linear relationship between the features and the predicted output), are very easy to interpret.

Logistic regression

A similar algorithm for classification is the logistic regression – it basically works the same, except that it outputs a categorical variable (to do so, we apply the logistic function – the inverse of the logit – to our sum of weighted contributions, which turns it into a probability). And once again, the nice thing with this model is that it is easy to understand how it makes its predictions: it is quite explainable.

Let’s see how this works on the iris dataset – a common toy dataset for classification. In this dataset, we have 150 plants separated into 3 classes (of 50 flowers each) for which we know the sepal length, sepal width, petal length and petal width. Just as before, we’ll stick with the first 2 features (the sepal measurements); after training our logistic regression, we can plot our predictions like we did in the last articles (here the predictions are the background color and the real classes of our training data are the inner colors of the circles):

Plot of the 3 regions – one for each class – as predicted by our logistic regression (the inner color of circles is their real class and the background color is their predicted class)

This plot helps us understand how each of these 2 features contributes to the prediction (e.g.: the longer the sepal, the more likely the flower is to end up in class 2 or 3).
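As a rough sketch, training such a model with scikit-learn only takes a few lines (note that the library stores the iris columns in the order sepal length, sepal width, petal length, petal width):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X2 = X[:, :2]   # keep only sepal length and sepal width

clf = LogisticRegression(max_iter=1000).fit(X2, y)

# coef_ holds one row of coefficients per class: the sign of each entry
# tells you whether increasing that feature pushes a flower towards
# (positive) or away from (negative) that class.
print(clf.coef_)
print("training accuracy:", clf.score(X2, y))
```

With only the two sepal features the classes overlap a bit, so the accuracy is decent but not perfect – and the coefficients remain directly readable.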

But how can we explain neural networks?

So far, so good. We are able to explain simple models like linear and logistic regressions. But this is easy: we can look at the coefficients directly to know how features influence the prediction.

In a neural network, each node is a small knob that is tuned during training. In the end, we have turned all those knobs to get the best predictions possible, but it is kind of hard to decipher how an input flows through the network up to the output nodes. If you have a dense layer of 500 neurons, what does it actually mean that the 126th one has a negative coefficient whereas the 301st has a positive one?

As AI models explainability has become a growing concern in the industry, people have started to develop tools to try and peek into these black boxes.

SHAP: A tool to visualize feature importance

SHAP (SHapley Additive exPlanations) is a great Python tool for explaining black box models like networks. The Github project mainly relies on papers by Scott M. Lundberg, Gabriel G. Erion and Su-In Lee and dates back to the end of 2016. It now offers various functions for visualizing information about our features and how they influence the prediction computation.

To illustrate this, we are going to look at a third famous machine learning dataset: the Boston house prices dataset. This dataset contains 506 houses for which we have 13 features and a median price value (this last number is usually the one we want to predict).

Just for the sake of argument, we will now try to use a simple feed-forward network to predict this median value on a test sub-dataset. The model (whose architecture is based on this Kaggle article, by the way) has an input layer, then 2 dense hidden layers of 64 neurons each and finally an output layer with a single neuron for the predicted value. So, after training it, we have two choices:

  • either we look at the weights and biases of the dense layers (the first hidden layer alone has 13 × 64 weights plus 64 biases, for several thousand parameters in total) and try to make sense out of all those seemingly “random” numbers, at least for a human eye:
[[-3.28192651e-01,  3.82912830e-02,  2.34939635e-01,
  -2.27540042e-02,  2.54860818e-01, -3.68398190e-01,
  -4.40444089e-02, -2.81749070e-01, -1.13208875e-01,
  -2.20769420e-01, -2.36565366e-01, -2.64305808e-02,
  ...]]
  • or we use the SHAP library to better understand what is going on

To begin with, we can see how each feature contributes to the final prediction; the following barplot shows the importance of each of the 13 variables we have for a house when predicting its median value:

This indicates that, for our model, NOX (the nitric oxides concentration) and RAD (the index of accessibility to radial highways) are paramount in determining the final median value, whereas INDUS (the proportion of non-retail business acres per town) or ZN (the proportion of residential land) are not very relevant.

Note: of course, this depends on the model that has been trained. It is quite interesting to see that, in the README of the SHAP library on their Github repository, on the same Boston dataset but with a tree model, they come up with a different order for features’ importance.
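To give a rough feeling of how such attributions can be computed, here is a minimal, hypothetical sketch of the Monte Carlo sampling idea behind Shapley values (this is not SHAP’s actual, much more optimized implementation): we explain a toy linear model, for which the exact Shapley values are known and can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "model" to explain: a simple linear function. For a linear
# model with independent features, the exact Shapley value of feature i
# is w_i * (x_i - E[x_i]), so we can verify our estimate against it.
w = np.array([2.0, -1.0, 0.5])
def f(z):
    return z @ w

background = rng.normal(size=(1000, 3))   # data used for "absent" features
x = np.array([1.0, 2.0, -1.0])            # the instance to explain

def shapley_mc(f, x, background, n_samples=5000):
    """Estimate Shapley values by sampling random feature orderings."""
    n = len(x)
    phi = np.zeros(n)
    for _ in range(n_samples):
        order = rng.permutation(n)
        z = background[rng.integers(len(background))].copy()
        prev = f(z)
        for i in order:
            z[i] = x[i]               # "switch on" feature i
            cur = f(z)
            phi[i] += cur - prev      # marginal contribution of feature i
            prev = cur
    return phi / n_samples

phi = shapley_mc(f, x, background)
exact = w * (x - background.mean(axis=0))
print(phi, exact)
```

The estimated attributions should land close to the exact ones; SHAP builds on the same additive-attribution idea, with clever shortcuts for trees and deep networks.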

Another cool visual we can get is, for a single test example, a plot that shows how each feature modified the model’s prediction from the base value (of 21.21) for that particular item. In red, you can see the features that increased the output value and in blue, the ones that lowered it – down to a final predicted score of 7.03:

This matches our previous assessment that the RAD and NOX features are significant in computing our prediction, since both of them had a huge impact on guessing the output for this item. An additional piece of information we get here is that RAD is actually a feature that contributes positively (it increases the predicted value) while NOX contributes negatively (it decreases the predicted value).

Finally, this third graph is basically stacking each of the individual explanation plots we talked about just before to explain the predictions for several test examples (here, there are 10).

Of course, the SHAP library is not a magic bullet: if your problem and/or your model is complex, you will still have trouble figuring out exactly what is happening under the hood. But for a lot of basic use cases, it does a pretty good job and really helps explaining your models.

B. Uncertainty

Note: Many thanks to Eyal Kazin, data scientist at LabGenius (and currently my internship tutor!), for bringing this topic to my attention and digging up some really cool resources like this paper by Alex Kendall and Yarin Gal: “What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?” (published on Arxiv in 2017) that will guide my discussion in this section.

Uncertainty in AI: what is it and why is it important?

Apart from explainability, another nice-to-have with complex black box models like neural networks is a measure of how confident they are when they make a prediction. As mentioned in my last article, suppose your network must choose between two categories for an image; it is quite different if it tells you that it is category n°1 with 99% certainty, or if there is roughly a 50-50 chance of it being either one and the network just picked one at random – because you forced it to decide on only one category. In the first case, you would have very low uncertainty (or alternatively, very high confidence) and in the second case you would have very high uncertainty.

Measuring the uncertainty of AI models is still mostly at the research stage, but more and more people are starting to take an interest in it. Even though teams have worked on this since the late 80s, trying out various tools and mathematical constructs, these techniques had not been adapted to deep learning and the neural networks most AI engineers use nowadays. The importance of uncertainty in the context of deep learning was brought to light recently by two situations where a model made an over-confident prediction that resulted in a bad outcome:

  • in May 2016, an autonomous car confused a bright sky with the white side of a trailer leading to a collision and the death of the driver; the NHTSA report has become quite (in)famous since then
  • a 2015 version of Google Photos application identified two African Americans as “gorillas” which raised ethical questions and many concerns about underlying racial discrimination in the algorithm (here is an article by J. Guynn on this event)

In both these examples, had the model been able to report a high uncertainty alongside its predictions, the system might have reacted differently and avoided these tragic consequences.

So, among other things, having your model recognize that it wasn’t trained on the kind of data you are feeding it now and therefore has no idea what to output can be paramount to avoid falsely confident predictions and bad responses: if your network learnt to distinguish between a cat and a dog, you don’t want it to predict anything with 100% confidence if you show it a car. Also, knowing the uncertainty of your model can help you identify that something is wrong either with your model or your data and correct it more easily.

As explained in Yarin Gal’s thesis (pp. 9-13), model uncertainty could become crucial for AI safety or efficient exploration in reinforcement learning. AI safety is a broad term that refers to all the new situations where we are starting to apply AI not to toy datasets but to real-life problems that might be life-threatening to humans – so we have to be more aware of the risks of unexpected behavior from our model! This can be diagnosing a patient, predicting actions for the stock market, driving an autonomous car… in all these situations, an error from the system could result in a tragic outcome. On the other hand, reinforcement learning (RL) is a domain of machine learning where an agent trains to accomplish a specific task by exploring its environment and gradually learning the best moves. In order for the agent to progress, it must be able to take chances and allow some uncertainty in its decisions, hence the need for this measure in RL.

Epistemic and aleatoric uncertainties

Now the question is: where can this uncertainty in your model come from? What can make an AI unsure about its predictions?

In their paper, Kendall and Gal mention two types of uncertainty – which together make up the predictive uncertainty of the model:

  • the epistemic uncertainty: also called “model uncertainty”, this is the uncertainty that comes from the model itself (be it because you chose the wrong type of network altogether, or because your weights could be tuned in lots of different ways that predict your data equally well) and it can be reduced by training on more data; it is quite useful for spotting out-of-distribution examples, because the model will output a prediction but also a very high uncertainty for these – effectively warning the user that this prediction is really a far-fetched extrapolation that shouldn’t be blindly trusted
  • the aleatoric uncertainty: this corresponds to the noise in the data, for example due to measurements; it can be either a noise that is the same across the whole dataset (homoscedastic) or one that is specific to each item (heteroscedastic) and, if we take it into account, we can teach the model to pay less attention to noisy inputs and thus learn a controlled attenuation on them (we are essentially telling the model to “ignore”, or at least “put less trust in”, the noisy data)
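This learned attenuation for aleatoric uncertainty corresponds, in Kendall and Gal’s paper, to a modified regression loss where the network outputs both a prediction $latex \hat y_i$ and a noise level $latex \sigma(x_i)$:

```latex
% Heteroscedastic regression loss (Kendall & Gal, 2017)
\mathcal{L}_i = \frac{\lVert y_i - \hat y_i \rVert^2}{2\,\sigma(x_i)^2}
              + \frac{1}{2}\log \sigma(x_i)^2
```

A high predicted noise level shrinks the residual term (the model “pays less attention” to that noisy point), but the log term penalizes it, which prevents the model from simply predicting infinite uncertainty everywhere.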

Their paper focuses on image analysis with the classic CIFAR10, CamVid and NYUv2 datasets. Here are some of their results on the latter – which contains more than 1400 images with 40 different classes of objects to identify:

Results of Kendall and Gal’s model for depth regression on the NYUv2 dataset (from “What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?”, 2017). From left to right: input image, ground truth, depth regression, aleatoric uncertainty and epistemic uncertainty.

We notice several interesting things that help us grasp these 2 types of uncertainty:

  • the aleatoric uncertainty is quite low for close and visible objects, but it is high on the contours of objects and on reflective surfaces like the floor on the left of the first image
  • the epistemic uncertainty is higher where we have abnormal patterns in the image like the person standing at the door in the last example (most of the dataset contains images of indoor scenes without any human)

The cool thing with these epistemic and aleatoric uncertainties is that, in their paper, Kendall and Gal also offer relatively simple implementations for them – that essentially rely on Monte Carlo sampling. Most importantly, they don’t require you to alter your neural network in a profound way (only modeling the aleatoric uncertainty requires changing your loss function, and adapting it is not too difficult). This means that we can add these new measurement tools to complex deep learning networks that have been optimized by years of research for specific tasks like image segmentation, text analysis, sound recognition…

For the curious: some technical details on computing uncertainty

For those who are a bit more into maths, here are some details on how Kendall and Gal propose to actually compute uncertainty. The idea they develop in the paper (and that was already a core contribution of Gal’s thesis, see pp. 47-54) is to perform a given number of independent stochastic forward passes through the network (each pass uses a different random dropout mask, effectively switching off a random subset of weights) and average the results.

  • For regression, we thus perform a Monte Carlo approximation for the integrals that appear in our loss functions (in particular, in the log-likelihood function that is now a common tool for regression in networks) so that we have an estimate of the mean and variance.
  • For classification, we can either:
    • evaluate variation ratios, which tell us how “spread out” around the most frequent label the sampled predictions are
    • or take the path of information theory and compute the predictive entropy (which is maximal when all classes are equally probable – the model is very uncertain – and zero when a single class gets all the probability – the model is very confident) or the mutual information (which is maximal for the inputs where the model is very uncertain on average)

Note: to get a better intuition of these different measures of uncertainty, I suggest you take a look at Yarin Gal’s thesis, p. 54.
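To make this more concrete, here is a toy numpy sketch of the Monte Carlo dropout procedure on a hypothetical, untrained classifier (the weights here are random – the point is the uncertainty computation, not the model):

```python
import numpy as np

rng = np.random.default_rng(42)

# A tiny toy network with one hidden layer; the weights are arbitrary.
W1 = rng.normal(size=(4, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 3)); b2 = np.zeros(3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def stochastic_forward(x, p_drop=0.5):
    """One forward pass with dropout left ON at test time."""
    h = np.maximum(0, x @ W1 + b1)
    mask = rng.random(h.shape) > p_drop   # random dropout mask
    h = h * mask / (1 - p_drop)           # inverted dropout scaling
    return softmax(h @ W2 + b2)

def mc_dropout_predict(x, T=200):
    """Average T stochastic passes; entropy of the mean = uncertainty."""
    probs = np.stack([stochastic_forward(x) for _ in range(T)])
    mean = probs.mean(axis=0)                        # predictive distribution
    entropy = -np.sum(mean * np.log(mean + 1e-12))   # predictive entropy
    return mean, entropy

mean, H = mc_dropout_predict(rng.normal(size=4))
print(mean, H)
```

Each pass is independent, which is exactly why this scheme parallelizes well, as noted below.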

A nice thing with this technique is that every sample computation is independent, so they can be parallelized easily and aggregated afterwards, therefore avoiding too much of a slow down on the network training process.

This approach is neat because it is easy to implement. However, Gal warns us of some issues with it in his thesis: firstly, despite parallelization, this can lead to a longer training time for the model; secondly, the uncertainty is not “calibrated”, meaning that for example it might be bigger for inputs that have big values compared to the rest of the dataset; thirdly, although they seem to work well in practice, these formulae can underestimate the predictive uncertainty.

Moreover, other researchers in the AI community like Ian Osband have questioned whether this technique truly captures uncertainty, or if it rather gives the risk associated with your model.


This article just gave a brief overview of two topics of AI that are now under scrutiny by more and more research teams: model explainability and model uncertainty.

Although I glossed over many details, I hope this helped you get a feel for what these concepts are and why they are important. I’d also like to point out that they are not only relevant to machine learning but to science in general. In my opinion, understanding the tools you are working with and knowing how reliable they are is always a good concern to have!

Next time, we’ll hear what Cali thought of our project, what she learnt from it and what future developments she imagined for this collaboration.

  1. Cali Rezo’s website:
  2. Eyal Kazin’s LinkedIn:
  3. Scikit-learn’s page on ML datasets:
  4. Association for Uncertainty in Artificial Intelligence’s website:
  5. Wikimedia Foundation, “Explainable artificial intelligence”, April 2019. [Online; last access 11-May-2019].
  6. Wikimedia Foundation, “Information overload”, March 2019. [Online; last access 11-May-2019].
  7. Wikimedia Foundation, “Linear regression”, April 2019. [Online; last access 8-May-2019].
  8. Wikimedia Foundation, “Logistic regression”, April 2019. [Online; last access 9-May-2019].
  9. Wikimedia Foundation, “Logit”, January 2019. [Online; last access 9-May-2019].
  10. A. Kendall and Y. Gal, “What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?”, October 2017. [Online; last access 11-May-2019].
  11. Scott M. Lundberg’s, Gabriel G. Erion’s and Su-In Lee’s Arxiv profiles
  12. NHTSA report on the self-driving car collision (May 2016):
  13. J. Guynn, “Google Photos labeled black people as ‘gorillas’”, July 2015. [Online; last access 10-May-2019].
  14. Y. Gal’s thesis: “Uncertainty in Deep Learning”, September 2016. [Online; last access 11-May-2019].
  15. Wikimedia Foundation, “Reinforcement learning”, May 2019. [Online; last access 11-May-2019].
  16. Wikimedia Foundation, “Monte Carlo method”, May 2019. [Online; last access 11-May-2019].
  17. I. Osband, “Risk versus Uncertainty in Deep Learning: Bayes, Bootstrap and the Dangers of Dropout”, 2016. [Online; last access 12-May-2019].
