Learning Video Representations for Generating Descriptions

Eye-tracking is an effective way to enhance the user experience. It can do so directly, as a new way for users to control applications, or indirectly, when interface designers and application creators use it in usability testing to improve the usability and efficiency of their applications. The problem is that eye-tracking requires specialized gaze-capturing devices, which are not easily accessible to ordinary users and are found only in specialized environments. Web cameras, however, are present in almost every mobile device.
In our work, we propose a solution that uses ordinary webcams for eye-tracking. To achieve this, we use neural networks, which cope well with noisy or low-quality data. The architecture of the network we designed is based on existing techniques and models from the field of AI, namely the Inception modules used in convolutional neural networks. We train and evaluate our solution on one of the UX datasets created as part of the projects carried out at ÚISI.
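
To make the idea of an Inception module concrete, the following is a minimal sketch in plain Python of the generic block with dimensionality reduction: four parallel branches (1x1, 1x1 then 3x3, 1x1 then 5x5, and 3x3 max pooling then 1x1) whose outputs are concatenated along the channel axis. It is not the architecture proposed in this work. The function names, branch widths, random weights, and the 3-channel 6x6 input are all hypothetical and chosen only for illustration.

```python
import random


def conv2d_same(x, w):
    """Naive stride-1 convolution with zero padding, followed by ReLU.

    x is a feature map [C][H][W]; w holds filters [O][C][k][k] with odd k.
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    p = k // 2
    out = []
    for filt in w:
        plane = [[0.0] * wd for _ in range(h)]
        for i in range(h):
            for j in range(wd):
                s = 0.0
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            ii, jj = i + di - p, j + dj - p
                            # Positions outside the map count as zero padding.
                            if 0 <= ii < h and 0 <= jj < wd:
                                s += filt[c][di][dj] * x[c][ii][jj]
                plane[i][j] = max(s, 0.0)  # ReLU
        out.append(plane)
    return out


def max_pool3_same(x):
    """3x3 max pooling, stride 1, keeping the spatial size unchanged."""
    h, wd = len(x[0]), len(x[0][0])
    return [[[max(ch[ii][jj]
                  for ii in range(max(0, i - 1), min(h, i + 2))
                  for jj in range(max(0, j - 1), min(wd, j + 2)))
              for j in range(wd)]
             for i in range(h)]
            for ch in x]


def random_weights(rng, out_ch, in_ch, k):
    """Hypothetical helper: small random filters of shape [O][C][k][k]."""
    return [[[[rng.uniform(-0.1, 0.1) for _ in range(k)]
              for _ in range(k)]
             for _ in range(in_ch)]
            for _ in range(out_ch)]


def make_params(rng, in_ch):
    """Branch widths below are illustrative assumptions, not tuned values."""
    return {
        "1x1": random_weights(rng, 2, in_ch, 1),
        "3x3_reduce": random_weights(rng, 2, in_ch, 1),
        "3x3": random_weights(rng, 3, 2, 3),
        "5x5_reduce": random_weights(rng, 1, in_ch, 1),
        "5x5": random_weights(rng, 2, 1, 5),
        "pool_proj": random_weights(rng, 1, in_ch, 1),
    }


def inception_module(x, params):
    """Run the four branches in parallel and concatenate their channels."""
    b1 = conv2d_same(x, params["1x1"])
    b2 = conv2d_same(conv2d_same(x, params["3x3_reduce"]), params["3x3"])
    b3 = conv2d_same(conv2d_same(x, params["5x5_reduce"]), params["5x5"])
    b4 = conv2d_same(max_pool3_same(x), params["pool_proj"])
    return b1 + b2 + b3 + b4  # list concatenation == channel concatenation


rng = random.Random(0)
x = [[[rng.random() for _ in range(6)] for _ in range(6)] for _ in range(3)]
out = inception_module(x, make_params(rng, in_ch=3))
print(len(out), len(out[0]), len(out[0][0]))  # channels, height, width
```

With these assumed widths the output has 2 + 3 + 2 + 1 = 8 channels and the same 6x6 spatial size as the input. The 1x1 reductions placed before the larger filters keep the cost of the 3x3 and 5x5 branches low, which is the main design motivation behind this kind of block. A practical implementation would rely on a deep learning framework rather than nested Python loops.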