The bad news: in the era of neural networks, the old, “clear” notion of explainability is now a thing of the past. Even transparency has had better days.
Let’s first define what is meant by these two terms. Although they largely overlap, the two concepts carry slightly different nuances and connotations.
Transparency refers to clarity and understanding of the entire process and logic involved in an AI model, from its creation to output generation.
Explainability refers to the ability to explain what caused the model to generate a certain output – in the sense of understanding what is “semantically” responsible for the output.
We shall examine these two definitions in more detail.
- Let’s start with transparency.
Creating a transparent AI model boils down to three fundamental elements.

Dataset – transparency, as we have said, concerns the entire process and the logic involved. The first step towards a transparent model is access to the data it was trained with. A model’s results depend heavily on its training data.
In our case, the data is important because we find traces of it in the results. Generative AI models, for example, produce stereotyped images. The models are not making mistakes; they simply generate data based on what they saw during training. To understand the outputs better, it is useful to know, and have transparency about, the data used for training. For most mainstream generative AI models in use today – ChatGPT, Gemini, Claude 3 – we do not know which datasets were used for training. We can speculate, but we have no official confirmation (see the legal case filed by the New York Times against OpenAI). Even various open-source models such as Llama 3 do not provide information about the dataset used.
Architecture – it is essential to know the architecture to understand how the algorithm works and the logic it follows. The type of network used, the number of parameters, the activation functions, the number of layers, any additional heuristic logic – all of these are necessary to obtain transparency on the flow of data during inference, i.e. when an output is generated. Here, too, we have only partial knowledge of the major models on the market.
Testing and monitoring – for transparency purposes, it is important to have information on the relevant tests, benchmarks and metrics run before the model is made publicly available, as well as reports from continuous monitoring once it is operational in the world.
In summary, we can say that an AI model is transparent when it is accompanied by:
- the data on which it was trained
- a detailed description of the different parts of the system
- testing and monitoring.
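The three elements above can be pictured as the minimum record a vendor would publish alongside a model. Here is a minimal sketch of such a record as a Python data class; the field names and example values are hypothetical, not taken from any official standard.

```python
from dataclasses import dataclass, field

# A minimal sketch of the three transparency elements as a single record.
# All names and values below are invented for illustration.
@dataclass
class TransparencyReport:
    training_datasets: list          # 1. the data the model was trained on
    architecture: dict               # 2. network type, layers, parameters...
    evaluations: list = field(default_factory=list)  # 3. tests and monitoring

report = TransparencyReport(
    training_datasets=["public-corpus-v1"],
    architecture={"type": "transformer", "layers": 48, "parameters": "7B"},
    evaluations=["pre-release benchmark suite", "monthly drift monitoring"],
)
print(report.architecture["layers"])  # -> 48
```

A model published without one of these fields filled in is, by the definition above, not fully transparent.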

- Now let’s look at explainability.
With explainability, we place more emphasis on what is “semantically” responsible for the output. We have complete explainability when we can reason backward and establish clearly which symbol caused the output. The following two examples clarify this point.
Case (1)
It is 1998. Franco Rossi goes to the doctor because he has a fever and is coughing, so he believes he has pneumonia. The doctor relies on technological tools and, after examining the patient, enters his data into the software at his disposal.
The software declares the following result: negative. No pneumonia.
Question: why was this output given? Which “variable” (feature) was responsible for it?
In 1998, the doctor is using a program written with rules. In this software, there is a rule that states: “if the fever is greater than x, if he coughs at least y times per minute, and if he has shortness of breath (a spirometry value of less than z), then he has pneumonia.”
In our hypothetical case, we’ll assume that Franco Rossi has a spirometry value greater than z. The rule is therefore not triggered, for this precise reason (the rule requires a value below z).
This is an example of complete explainability. We know exactly why the software returns the negative output and which symbols are responsible for that result. Mr. Rossi does not have pneumonia because his spirometry value is higher than z.
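The 1998-style rule-based checker can be sketched in a few lines. The thresholds, the function name and the exact rule wording below are hypothetical; the point is that every outcome traces back to an explicit rule.

```python
# Toy sketch of the rule-based software from Case (1).
# Thresholds standing in for x, y and z are invented for illustration.
FEVER_THRESHOLD = 38.5       # x: degrees Celsius
COUGH_THRESHOLD = 3          # y: coughs per minute
SPIROMETRY_THRESHOLD = 2.0   # z: litres

def diagnose(fever, coughs_per_minute, spirometry):
    """Return (diagnosis, reason): every output is traceable to one rule."""
    if fever <= FEVER_THRESHOLD:
        return "negative", "fever not above x"
    if coughs_per_minute < COUGH_THRESHOLD:
        return "negative", "cough rate below y"
    if spirometry >= SPIROMETRY_THRESHOLD:
        return "negative", "spirometry not below z (no shortness of breath)"
    return "pneumonia", "all three rules triggered"

# Franco Rossi: fever and cough present, but spirometry above z
print(diagnose(39.0, 5, 2.4))
# -> ('negative', 'spirometry not below z (no shortness of breath)')
```

Backward reasoning is trivial here: the returned reason is literally the rule that fired.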
Case (2)
It is 2024. Franco Rossi goes to the doctor because he has a fever and is coughing, so he believes he has pneumonia. The doctor relies on new technological tools and, after examining the patient, enters his data into the software at his disposal.
The software declares the following result: negative. No pneumonia.
Question: why was this output given? Which “variable” (feature) was responsible for it?
In 2024, the doctor has access to a model that uses a neural network. In this case, there are no rules and we cannot reason “backward” as we could before. What we can do, however, since the model was trained on the same three variables (fever, cough rate and spirometry), is run an analysis to understand how much “impact” each had on the final output.
This hypothetical investigation would show lower activation of the nodes related to the shortness-of-breath variable, so that variable could be considered responsible for the result.
That’s right. We can interpret the lower levels of activation related to that feature as the reason why the model output was ‘negative, no pneumonia’.
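One common way to approximate this kind of impact analysis is occlusion: zero out each input in turn and measure how much the output changes. Here is a minimal sketch against a toy stand-in model; the weights, feature names and the sigmoid scoring function are all invented for illustration, not taken from any real diagnostic system.

```python
import math

# Toy stand-in for the 2024 model: a tiny fixed "network" whose internals
# we treat as opaque. All weights below are invented.
WEIGHTS = {"fever": 1.2, "cough": 0.8, "spirometry": -2.0}
BIAS = -0.5

def model(features):
    """Return a probability-like score for 'pneumonia'."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1 / (1 + math.exp(-z))

def occlusion_impact(features):
    """Estimate each feature's impact by zeroing it and measuring the change."""
    base = model(features)
    return {k: base - model({**features, k: 0.0}) for k in features}

patient = {"fever": 1.0, "cough": 1.0, "spirometry": 1.0}  # normalised inputs
impacts = occlusion_impact(patient)
# The feature with the largest absolute impact is the best candidate
# for "what was responsible for the output".
print(max(impacts, key=lambda k: abs(impacts[k])))  # -> spirometry
```

Note what this gives us: an *interpretation* of relative impact, not a rule we can point to. That distinction is exactly the explainability gap discussed next.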
So why do we have an explainability problem? After all, even in case (2), we can interpret which variable is mostly responsible for the output!
The reason is this:
- The models that are used today have a multitude of features. For example, OpenAI’s tokenizer has 50,257 tokens (each token is a word or part of a word);
- These data become vectors, i.e. numerical sequences that have semantic value (only) within the network;
- These vectors pass through the neural network, which has a very deep architecture: dozens and dozens of layers, each with hundreds and hundreds of nodes and an activation function, before arriving at the final dense layers.
In a nutshell, once that datum, that word (or piece of a word), is transformed into a vector, it goes through a very long series of transformations.
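The first step of that journey, from word pieces to vectors, can be sketched like this. The vocabulary, token ids and embedding values below are all invented; real tokenizers (such as the 50,257-token one mentioned above) and embedding tables work the same way in spirit, just at vastly larger scale.

```python
# Toy illustration of the path a word takes before entering the network.
# Vocabulary, ids and vectors are invented for illustration.
VOCAB = {"pneu": 101, "monia": 102}           # a word split into sub-word tokens
EMBEDDINGS = {101: [0.21, -0.73, 0.05],       # one small vector per token id
              102: [-0.44, 0.12, 0.90]}

def embed(word_pieces):
    """Word pieces -> token ids -> vectors: the last step a human can read."""
    ids = [VOCAB[p] for p in word_pieces]
    return [EMBEDDINGS[i] for i in ids]

vectors = embed(["pneu", "monia"])
# From here on, dozens of layers transform these numbers over and over;
# the link between each value and the original word is progressively lost.
print(vectors[0])  # -> [0.21, -0.73, 0.05]
```

At this stage the vector still has a traceable relationship with the word. It is everything that happens afterwards, inside the deep stack of layers, that severs that link.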
The initial vector still has a relationship with the world, but then all contact is lost. During the journey, it becomes a purely numerical sequence which does have a semantic value within the vector space represented by the neural network, but for us observers it is only a number in the network. This is why neural networks are said to be “black boxes”.
Today, various attempts are being made to deal with this situation and there have been steps forward. Furthermore, it goes without saying that smaller and leaner models are less obscure. And if we then take simpler machine learning algorithms such as Random Forest, then it is possible to interpret results more consistently. However, the vast majority of AI models we deal with today have complex structures that have lost the clarity and explainability that we had in past decades.
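To see why simpler models stay interpretable, consider a hand-written decision tree whose prediction path can be read out directly (Random Forests are ensembles of such trees). The thresholds and variables below are invented for illustration, echoing the pneumonia example.

```python
# Toy hand-written decision tree whose prediction path is fully readable,
# illustrating why simpler models remain easier to interpret.
# Thresholds are invented for illustration.
def predict_with_path(fever, spirometry):
    """Return (label, path): the path is the explanation."""
    path = []
    if fever > 38.5:
        path.append("fever > 38.5")
        if spirometry < 2.0:
            path.append("spirometry < 2.0")
            return "pneumonia", path
        path.append("spirometry >= 2.0")
        return "negative", path
    path.append("fever <= 38.5")
    return "negative", path

label, path = predict_with_path(fever=39.0, spirometry=2.4)
print(label, "because", " and ".join(path))
# -> negative because fever > 38.5 and spirometry >= 2.0
```

Each prediction comes with the exact sequence of decisions that produced it, which is precisely what a deep network cannot give us.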
Perhaps, when referring to the current scenario, it would be more correct to speak of interpretability rather than explainability.
- Let’s talk about business
What does all this mean for business?
There are two aspects to consider: practical and legislative.
From a practical point of view, we must accept that it is impossible to obtain fully clear and transparent explanations of why certain algorithms produce certain outputs. We need to take a new look at causal links because, depending on the type and complexity of the model utilised, only a partial understanding of what is responsible for the output is possible.
As a consequence, it is essential to understand which domain we are operating in and weigh the seriousness (cost) of an error accordingly. A predictive model advising on the best dress to wear to a dinner party carries very different costs from a model tasked with helping a doctor choose the most appropriate treatment for a patient. Where and when can we accept that opacity?
On the regulatory front, further to the already well-known GDPR, we are entering the era of the EU AI Act. To meet the required transparency criteria, all systems classified as high-risk must provide detailed reports on the AI model used. For example, a thorough description must be provided of various elements pertaining to the system, including its development, monitoring, the model’s functionalities, and the changes it has undergone over time.
Conclusions
In this new era, we can still demand that the process logic of the AI models we use be as transparent as possible. As far as output explainability is concerned, however, we must come to terms with a new perspective: “semantic” responsibility – that which tells us “this output was caused by these symbols” – is now largely lost, giving way to a dynamic process of interpreting causes.

