“Why Should I Trust You?” Explaining the Predictions of Any Classifier

https://arxiv.org/pdf/1602.04938.pdf

A vital concern about machine learning models is whether to trust an individual prediction, or the model itself. Verifying whether a model can be trusted is critical when deploying ML-based solutions in the wild. For example, when using machine learning for medical diagnosis or terrorism detection, predictions cannot be acted upon on blind faith, as the consequences may be catastrophic.

The most common metric currently used in practice for measuring the trustworthiness of an ML model is validation accuracy. However, this metric can fail in multiple ways. On one hand, real-world data can differ significantly from the training and validation sets; this is called data shift. Another problem is data leakage: the unintentional leakage of a signal into the training data that would not be available at deployment, for example a patient ID that happens to be highly correlated with some medical condition in the training and validation sets. Because of these issues, validation accuracy can be high even though the model cannot be trusted on real-world data. One way to improve the credibility of a model is to inspect individual predictions and their explanations.

This paper addresses two problems: explaining individual predictions and explaining the model as a whole. The authors propose LIME (Local Interpretable Model-agnostic Explanations), an algorithm that can explain the individual predictions of any classifier or regressor. They extend this approach with SP-LIME, which selects a set of representative instances and their explanations to assess how trustworthy the model is.

Explaining a prediction

Explaining a prediction lends it credibility. For example, in the medical diagnosis case above, presenting the symptoms that led the model to predict a particular disease would help the doctor decide whether to trust the prediction. Based on such examples, an explanation should have certain characteristics:

  • Interpretable: the explanation should be human readable, like a list of symptoms as opposed to the corresponding feature vector used by the model. In the case of image classification it could be super-pixels, i.e. contiguous regions of the image.
  • Local fidelity: the explanation must be locally faithful, meaning it must correspond to how the model behaves in the vicinity of the instance being predicted. Ideally, the explainer would provide globally faithful explanations, but this is hard to achieve.

The LIME algorithm provides interpretable, locally faithful explanations for the predictions of any model. At a high level, the key idea is simple. LIME defines an explanation as an interpretable model g, for example a sparse linear model or a decision tree. The input to g is a binary vector over an interpretable representation: in text classification the binary vector encodes the presence or absence of words, and in images the presence or absence of super-pixels. The task is to explain the prediction f(x), where x is an input and f is the black-box model. Given x and f, LIME samples points z in the vicinity of x, converts each one into its interpretable binary representation z’, and trains g on the pairs (z’, f(z)), weighting each sample by its proximity to x.
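Below is a minimal sketch of this procedure for text classification. It assumes a black-box function `predict_proba` that returns the probability of the class of interest for a single string; the word-dropping perturbation scheme and the ridge-regularised linear surrogate are illustrative choices, not the paper's reference implementation.

```python
# A minimal LIME sketch for text, under the assumptions stated above.
import numpy as np
from sklearn.linear_model import Ridge

def explain_instance(text, predict_proba, num_samples=500, kernel_width=0.75, seed=0):
    """Fit a locally weighted linear surrogate g around one instance x."""
    rng = np.random.default_rng(seed)
    words = text.split()
    d = len(words)

    # 1. Sample perturbations around x by randomly dropping words.
    #    Each z' is a binary vector: 1 = word kept, 0 = word removed.
    z_prime = rng.integers(0, 2, size=(num_samples, d))
    z_prime[0] = 1  # keep the original instance itself

    # 2. Map each binary vector back to raw text and query the black box f(z).
    perturbed_texts = [" ".join(w for w, keep in zip(words, row) if keep)
                       for row in z_prime]
    labels = np.array([predict_proba(t) for t in perturbed_texts])

    # 3. Weight samples by proximity to x (exponential kernel on the
    #    fraction of words removed).
    distances = 1.0 - z_prime.mean(axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)

    # 4. Fit the interpretable model g on (z', f(z)) pairs, weighted by proximity.
    g = Ridge(alpha=1.0)
    g.fit(z_prime, labels, sample_weight=weights)
    return words, g.coef_
```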

For example, in the toy figure from the paper, the complex model's decision function is represented by the pink/blue background, and the bold red cross is the instance being explained. The explanation model here is a linear model (g = w·z’). LIME trains the linear model (the dashed line) by sampling points near the input (the smaller crosses and circles); sampling approximates a local exploration around the instance. The sampled points are converted to binary vectors and used to fit the linear model. The explanation then consists of the features corresponding to the top-K weights in the weight vector w, where K is chosen by the user.
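Continuing the sketch above, selecting the explanation then amounts to ranking the surrogate's coefficients and keeping the K largest in magnitude (a hypothetical helper; K is the user's budget):

```python
# Turn the fitted surrogate's weights into a human-readable explanation.
def top_k_features(words, coef, k=6):
    order = np.argsort(np.abs(coef))[::-1][:k]
    return [(words[i], float(coef[i])) for i in order]

# Usage (illustrative): words, coef = explain_instance(text, predict_proba)
#                       print(top_k_features(words, coef))
```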

The paper gives an example with two algorithms that classify whether a document is about Christianity or Atheism. Both make the correct prediction, but the explanations show that algorithm 2 bases its prediction on the wrong features.

Explaining the model

Although the explanation of a single prediction gives the user some insight into the reliability of the classifier, it is not sufficient to assess trust in the model as a whole. To provide a global understanding of the model, SP-LIME first computes explanations for the data points in the training set.

Continuing with the linear-model explanation, the corresponding figure in the paper shows an explanation matrix: one weight vector per input instance (row), with grey shading indicating the explanation features selected for that instance. The goal of SP-LIME is to pick a small number of instances whose explanations together cover the important features, and present them to the user for inspection, while avoiding instances with redundant, similar explanations. In short, the authors propose a greedy algorithm that picks the B instances (B being the user's budget) whose explanations best cover the important features of the model.
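A minimal sketch of such a greedy pick, following the paper's idea of maximising weighted feature coverage: `W` is the explanation matrix of per-instance surrogate weights and `B` the inspection budget, both following the paper's notation, while the code itself is an illustrative assumption rather than the authors' implementation.

```python
# Greedy submodular-pick sketch: choose B instances whose explanations
# together cover the globally important features.
import numpy as np

def submodular_pick(W, B):
    W = np.abs(W)
    importance = np.sqrt(W.sum(axis=0))          # global importance of each feature
    n = W.shape[0]
    chosen, covered = [], np.zeros(W.shape[1], dtype=bool)
    for _ in range(min(B, n)):
        gains = np.full(n, -np.inf)
        for i in range(n):
            if i not in chosen:
                # Coverage achieved if instance i were added to the selection.
                gains[i] = ((covered | (W[i] > 0)) * importance).sum()
        best = int(np.argmax(gains))
        chosen.append(best)
        covered |= W[best] > 0
    return chosen
```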

