Students’ Research Works – Spring 2017: Data Analysis, Mining and Machine Learning (PeWe.Data)

Tomáš Bako: User behavior similarity identification for the task of predicting the leaving of the session
Patrik Berger: Predicting User Retention in an Online Environment
Martin Borák: Detection of Anti-social Behavior in Online Communities
Matúš Cimerman: Stream analysis of incoming events using different data analysis methods
Natália Čuláková: Web Site Users’ Behavioral Trends Analysis
Juraj Flamík: Recognition of Similarities in User Behavior in Data Stream
Patrik Gajdošík: Learning Video Representations for Generating Descriptions
Tomáš Chovaňák: Web User Behavioral Patterns Recognition in Online Time for Personalized Recommendation
Martin Jakubík: Using Machine Learning for Predicting User Behaviour
Jakub Janeček: Executable documents on data analysis
Ondrej Kaššák: User Modelling for Session End Intent Prediction
Michal Kren: Ensuring Robustness against Changes in Web Sites During Data Extraction
Lukáš Marták: Modelling Music Structure using Artificial Neural Networks
Jakub Mrocek: Text reading analysis
Martin Olejár: Conflict detection and visualization in software models
Adam Rafajdus: Generative Adversarial Networks

User behavior similarity identification for the task of predicting the leaving of the session

Tomáš Bako
bachelor study, supervised by Ondrej Kaššák

Abstract. Clustering is one way to analyze large amounts of data. However, clustering on its own is not very effective for this task. Therefore, a new approach was proposed: clustering over a data stream. Clustering over a data stream analyzes continuous data, and each data item is processed only once; after that, the item is discarded and never used again.
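
To illustrate the one-pass constraint, here is a minimal sketch of sequential (MacQueen-style) k-means in Python. It is a generic single-pass technique, not CluStream and not necessarily what this project uses; the function name `sequential_kmeans` and the seeding from the first `k` points are illustrative assumptions. Each point updates its nearest centroid as a running mean and is then dropped, so memory stays at `k` centroids regardless of how long the stream is.

```python
from typing import Iterable, List, Sequence


def sequential_kmeans(stream: Iterable[Sequence[float]], k: int) -> List[List[float]]:
    """One-pass k-means: each point is read once, used to update the
    nearest centroid, and then discarded (hypothetical helper)."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    for point in stream:
        x = [float(v) for v in point]
        if len(centroids) < k:
            # Assumption: the first k points seed the centroids.
            centroids.append(x)
            counts.append(1)
            continue
        # Nearest centroid by squared Euclidean distance.
        j = min(range(k), key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])))
        counts[j] += 1
        # Running-mean update: move the centroid 1/n of the way towards the point.
        centroids[j] = [c + (a - c) / counts[j] for a, c in zip(x, centroids[j])]
        # x is not stored anywhere, so the raw point is gone after this update.
    return centroids
```

For example, `sequential_kmeans([(0, 0), (10, 10), (0, 1), (10, 11), (1, 0)], k=2)` ends with centroids at about (1/3, 1/3) and (10, 10.5), which are the exact means of the two groups even though no point was kept.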

In this project, we are looking for a solution for clustering data from the ALEF system using an algorithm that works over a data stream. The chosen algorithm is CluStream, one of the common algorithms for clustering over data streams. It consists of two parts: an online micro-clustering part and an offline macro-clustering part. Micro-clustering processes the stream fast enough to keep up with it, while macro-clustering produces the final clusters, which can be requested at any time during clustering. The main goal of this project is to find a suitable configuration of the preprocessed dataset, the algorithm's input settings and the algorithm itself that produces clusters of sufficiently high quality, each representing similar users of the ALEF system.
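
The two CluStream phases can be sketched in Python as follows. This is a simplified, hypothetical illustration, not the project's implementation: micro-clusters keep only the point count and the per-dimension linear and squared sums (CluStream additionally stores time-stamp sums and takes snapshots in a pyramidal time frame), the online phase only merges the two closest micro-clusters when the hypothetical limit `max_mcs` is exceeded (CluStream may instead delete an outdated one), and the offline phase runs a weighted k-means with a naive initialization from the first `k` micro-clusters. The names `MicroCluster`, `online_update`, `offline_macro` and the default `boundary_factor=2.0` are assumptions.

```python
import math
from typing import List, Sequence


class MicroCluster:
    """Cluster-feature summary: point count n, per-dimension linear sum ls and
    squared sum ss. (CluStream also keeps time-stamp sums, omitted here.)"""

    def __init__(self, point: Sequence[float]) -> None:
        self.n = 1
        self.ls = [float(x) for x in point]
        self.ss = [float(x) * x for x in point]

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def rms_radius(self) -> float:
        # RMS deviation of absorbed points from the centroid, computed from ls and ss only.
        var = sum(s2 / self.n - (s1 / self.n) ** 2 for s1, s2 in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))

    def absorb(self, point: Sequence[float]) -> None:
        self.n += 1
        self.ls = [a + x for a, x in zip(self.ls, point)]
        self.ss = [a + x * x for a, x in zip(self.ss, point)]

    def merge(self, other: "MicroCluster") -> None:
        # Cluster features are additive, so merging is element-wise addition.
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]


def online_update(mcs: List[MicroCluster], point: Sequence[float],
                  max_mcs: int, boundary_factor: float = 2.0) -> None:
    """Online phase: absorb the point into the nearest micro-cluster or open a new one."""
    if mcs:
        nearest = min(mcs, key=lambda m: math.dist(point, m.centroid()))
        if nearest.n > 1:
            boundary = boundary_factor * nearest.rms_radius()
        else:
            # A singleton has no spread yet: use the distance to the closest other
            # micro-cluster (with no other one yet, only an identical point is absorbed).
            others = [math.dist(nearest.centroid(), m.centroid()) for m in mcs if m is not nearest]
            boundary = min(others) if others else 0.0
        if math.dist(point, nearest.centroid()) <= boundary:
            nearest.absorb(point)
            return
    mcs.append(MicroCluster(point))
    if len(mcs) > max_mcs:
        # Bound memory by merging the two closest micro-clusters.
        i, j = min(((a, b) for a in range(len(mcs)) for b in range(a + 1, len(mcs))),
                   key=lambda p: math.dist(mcs[p[0]].centroid(), mcs[p[1]].centroid()))
        mcs[i].merge(mcs[j])
        del mcs[j]


def offline_macro(mcs: List[MicroCluster], k: int, iters: int = 10) -> List[List[float]]:
    """Offline phase: k-means over micro-cluster centroids, weighted by their counts."""
    points = [m.centroid() for m in mcs]
    weights = [m.n for m in mcs]
    dim = len(points[0])
    centers = [list(p) for p in points[:k]]  # naive initialization (assumption)
    for _ in range(iters):
        groups: List[list] = [[] for _ in centers]
        for p, w in zip(points, weights):
            j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[j].append((p, w))
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                total = sum(w for _, w in g)
                centers[j] = [sum(p[d] * w for p, w in g) / total for d in range(dim)]
    return centers
```

Feeding the points (0, 0), (0.2, 0.1), (10, 10), (10.1, 9.9), (0.1, 0.2), (9.9, 10.2) through `online_update` with `max_mcs=3` leaves three micro-clusters, and `offline_macro(mcs, k=2)` returns centers near (0.1, 0.1) and (10.0, 10.03). Because the cluster features are additive, the macro step works only on the micro-cluster summaries and never needs the discarded raw points.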

