Prediction of Perceived Text Difficulty

Reading text on various kinds of devices has become a substantial part of everyday life. When evaluating the difficulty of these texts, we distinguish between perceived and actual difficulty. There are many ways to estimate actual text difficulty in advance; for perceived difficulty, however, there are almost none. In this work, we propose a method for predicting perceived text difficulty based on psychological traits and gaze data.

We treat this problem as a classification task and propose a solution based on machine learning. To maximize the accuracy of our model, we combine several advanced ensemble learning methods. To assess the suitability of our solution, we evaluate it on a complex dataset that reflects the real-life reading process. The proposed model achieves an overall improvement of several percent over the chosen baseline in most of the measured performance metrics.
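
To make the classification framing concrete, the following is a minimal sketch of an ensemble compared against a baseline. It is not the model proposed in this work: the specific ensemble methods are not named here, so plain hard voting over three threshold rules stands in for them. The feature names (mean fixation duration, regressions per line, a trait score), the thresholds, and all data values are hypothetical and invented purely for illustration.

```python
from collections import Counter

# Hypothetical toy samples (all values invented for illustration):
# (mean_fixation_ms, regressions_per_line, trait_score) -> label,
# where 1 = text perceived as difficult, 0 = not perceived as difficult.
SAMPLES = [
    ((310.0, 0.9, 0.2), 1),
    ((295.0, 0.7, 0.4), 1),
    ((220.0, 0.2, 0.8), 0),
    ((205.0, 0.3, 0.7), 0),
    ((280.0, 0.8, 0.6), 1),
    ((230.0, 0.6, 0.3), 0),
    ((215.0, 0.1, 0.9), 0),
    ((300.0, 0.4, 0.1), 1),
    ((225.0, 0.3, 0.6), 0),
]


def make_stump(feature_index, threshold, above_label):
    """Return a one-feature classifier: above_label if the feature exceeds the threshold."""
    def predict(x):
        return above_label if x[feature_index] > threshold else 1 - above_label
    return predict


# Three weak classifiers with hand-picked (assumed) thresholds.
ENSEMBLE = [
    make_stump(0, 250.0, 1),  # long fixations -> difficult
    make_stump(1, 0.5, 1),    # frequent regressions -> difficult
    make_stump(2, 0.5, 0),    # high trait score -> not difficult
]


def ensemble_predict(x):
    """Hard voting: return the label chosen by most ensemble members."""
    votes = Counter(clf(x) for clf in ENSEMBLE)
    return votes.most_common(1)[0][0]


# Baseline: always predict the most frequent label in the data.
MAJORITY_LABEL = Counter(y for _, y in SAMPLES).most_common(1)[0][0]


def baseline_predict(x):
    return MAJORITY_LABEL


def accuracy(predict, samples):
    """Fraction of samples for which the prediction matches the label."""
    return sum(predict(x) == y for x, y in samples) / len(samples)


if __name__ == "__main__":
    print("baseline accuracy:", accuracy(baseline_predict, SAMPLES))
    print("ensemble accuracy:", accuracy(ensemble_predict, SAMPLES))
```

The sketch scores both predictors on the same toy samples the thresholds were chosen for, so the numbers it prints only illustrate the comparison and say nothing about real performance. A real evaluation would use held-out data and more than one metric.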