Interpretability of Neural Network Models Used in Data Analysis

Interpretability is an integral part of every machine learning model. Without it, the chance that our model will be used in domains where the cost of error is high, such as medicine, is non-existent. How can experts in these fields trust that our model behaves correctly and according to their assumptions if we do not provide them with explanations of its decisions? Neural network models are regarded as among the hardest to interpret. This difficulty is exacerbated by recent advances that allow networks to become much deeper, using an increasingly large number of layers. On the other hand, these models can capture data complexities more precisely than other models and therefore perform much better on specific tasks.

Our goal is to develop a method that assigns an importance score to each feature for a specific decision and that can take into consideration the interactions present in the data. The main focus of this work is on developing such a method for text data represented by word embeddings.
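
To make the notion of per-feature importance for a specific decision concrete, the sketch below computes a gradient-times-input saliency score for each word of a single input sentence. This is only a generic, well-known baseline used for illustration, not the method developed in this work; the tiny classifier, vocabulary size, and token ids are hypothetical, and the sketch assumes PyTorch.

```python
# A minimal sketch, assuming PyTorch: one importance score per word for a
# single prediction, via gradient-times-input on the word embeddings.
# The model, vocabulary size, and sentence below are purely illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyTextClassifier(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=16, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, embedded):
        # Average the word embeddings, then classify (deliberately simple).
        return self.classifier(embedded.mean(dim=1))


model = TinyTextClassifier()
model.eval()

token_ids = torch.tensor([[3, 17, 42, 8]])   # one hypothetical sentence
embedded = model.embedding(token_ids)        # shape: (1, seq_len, embed_dim)
embedded.retain_grad()                       # keep the gradient on the embeddings

logits = model(embedded)
predicted_class = logits.argmax(dim=1).item()

# Back-propagate the score of the predicted class to the embeddings.
logits[0, predicted_class].backward()

# Gradient-times-input, summed over the embedding dimensions, gives one
# importance score per word for this specific decision.
word_importance = (embedded.grad * embedded).sum(dim=-1).squeeze(0)
print(word_importance)
```

Such gradient-based scores treat each word embedding independently, which is exactly the limitation motivating a method that also accounts for interactions between features.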