Interpretability of machine learning models created by clustering algorithms

Machine learning has become almost a default choice for solving many research problems, as well as problems from everyday practice. The difference between these two areas is that for research-oriented problems it is not strictly necessary to be able to explain the created model and its decisions. However, if we want our model to be used in another domain, and want those using it to trust its decisions, we must be able to explain even complex models.

In our work, we focus on clustering, a field that has been less researched from this angle. Specifically, we focus on finding the differences between two segments of data based on feature importance. As tools for this task, we use topological data analysis for segmentation and regularization of linear models for determining feature importance. More specifically, we use logistic regression with L1 regularization as a surrogate model, which outputs a sparse vector of attribute weights that we in turn use to interpret the clustering (segmentation) model.
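The surrogate-model idea above can be sketched as follows. This is a minimal, hypothetical illustration, not the thesis implementation: the thesis uses topological data analysis for segmentation, while here KMeans stands in as a simple way to obtain two segments, and the synthetic data, feature counts, and regularization strength are all assumptions for the example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 6 features, only the first 2 actually separate the segments.
rng = np.random.RandomState(0)
X_informative, _ = make_blobs(n_samples=300, centers=2, n_features=2,
                              random_state=0)
X_noise = rng.normal(size=(300, 4))
X = StandardScaler().fit_transform(np.hstack([X_informative, X_noise]))

# Segment the data into two clusters (a stand-in for the TDA segmentation).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Surrogate model: L1-regularized logistic regression trained to tell the
# two segments apart; the L1 penalty drives uninformative weights to zero.
surrogate = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
surrogate.fit(X, labels)

# The sparse coefficient vector highlights which features distinguish the
# two segments; this is the interpretation signal described above.
coefs = surrogate.coef_.ravel()
important = np.nonzero(coefs)[0]
print("non-zero coefficients at features:", important)
```

The sparsity of the coefficient vector is what makes the interpretation tractable: instead of ranking all features, the analyst only inspects the few attributes the surrogate kept.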