Students’ Research Works – Autumn 2016: Data Analysis, Mining and Machine Learning (PeWe.Data)

Identification of users’ similar behaviour in order to predict leaving the site

Tomáš Bako
bachelor study, supervised by Ing. Ondrej Kaššák

Abstract. The behaviour of web users is highly individual. It is affected by the user’s goal, what they are searching for, and their previous experience. However, since a user has only a limited set of possible actions on a website, there should in general be some similarities in how users use it; in other words, some users may be similar to each other. The way they use the website can therefore be analyzed and possibly clustered. Clustering is motivated by its wide applicability in, for example, modeling, creating personas, recommendation, or various predictions.

My main objective is to analyze clustering over data streams and to implement a program that clusters various data sets describing how users use the website, arriving in the form of a data stream.
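
As a rough illustration of one-pass clustering over a stream, the sketch below uses sequential (MacQueen-style) k-means, one of the simplest stream clustering methods; it is not necessarily the method this work will use. The session features and the choice of k = 2 are hypothetical.

```python
import math
from typing import List, Sequence


class OnlineKMeans:
    """Sequential (MacQueen-style) k-means: every point is seen once and
    immediately moves its nearest centroid, so memory does not grow with the stream."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def update(self, x: Sequence[float]) -> int:
        """Assign one incoming point to a cluster and return the cluster index."""
        if len(self.centroids) < self.k:
            # The first k points seed the centroids.
            self.centroids.append([float(v) for v in x])
            self.counts.append(1)
            return len(self.centroids) - 1
        # Nearest centroid by Euclidean distance.
        j = min(range(self.k), key=lambda i: math.dist(x, self.centroids[i]))
        self.counts[j] += 1
        rate = 1.0 / self.counts[j]  # running-mean update of the centroid
        c = self.centroids[j]
        for d in range(len(c)):
            c[d] += rate * (x[d] - c[d])
        return j


# Hypothetical session features: (pages viewed, minutes on site).
stream = [(2, 1.0), (30, 25.0), (3, 2.0), (28, 20.0), (1, 0.5)]
model = OnlineKMeans(k=2)
labels = [model.update(session) for session in stream]
print(labels)  # [0, 1, 0, 1, 0]
print(model.centroids)
```

Each centroid is the running mean of the points assigned to it, so the model needs only k centroids and k counters regardless of how long the stream runs.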

Predicting User Retention in an Online Environment

Patrik Berger
master study, supervised by Michal Kompan

Abstract. Recent market growth and technological advancement have increased the number of competitors providing online services to their users. Under these circumstances, acquiring a new user is several times more expensive than keeping an existing one. This makes user retention one of the key metrics of success for online services (e-shops, banking services, insurance companies, etc.). Successfully predicting the churn of a specific user provides an opportunity to change their decision, for instance by giving them a special offer. Such prevention, together with identifying the reasons for churn, is a strong motivation to explore this area.

In our work, we focus on identifying a set of features to create a user model for churn prediction. In the first stage, we plan to build a user model in a selected domain and explore the possibilities of automatic feature extraction from the data. Next, we want to select classifiers and build the structure of a learning ensemble. Finally, we plan to test our model on a nontrivial dataset from the selected domain.
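
A minimal sketch of the ensemble idea, assuming hypothetical churn features and using majority voting over decision stumps (one stump per feature) as a stand-in for the classifiers and ensemble structure that the work has yet to select:

```python
from typing import List, Sequence, Tuple

# A decision stump: (feature index, threshold, predict churn when value > threshold?)
Stump = Tuple[int, float, bool]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int], f: int) -> Stump:
    """Pick the threshold and direction on feature f with the fewest training errors."""
    best: Stump = (f, float(X[0][f]), True)
    best_err = len(y) + 1
    for t in sorted({row[f] for row in X}):
        for above in (True, False):
            err = sum(((row[f] > t) == above) != bool(label) for row, label in zip(X, y))
            if err < best_err:
                best, best_err = (f, t, above), err
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    f, t, above = stump
    return int((row[f] > t) == above)


def fit_ensemble(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[Stump]:
    # One stump per feature, so each base learner sees a different view of the user.
    return [fit_stump(X, y, f) for f in range(len(X[0]))]


def predict_ensemble(stumps: List[Stump], row: Sequence[float]) -> int:
    votes = sum(predict_stump(s, row) for s in stumps)
    return int(2 * votes > len(stumps))  # strict majority vote


# Hypothetical features: days since last visit, visits in last 30 days, complaints.
X = [[30, 1, 2], [2, 20, 0], [25, 2, 1], [1, 15, 0], [20, 3, 0], [3, 12, 1]]
y = [1, 0, 1, 0, 1, 0]  # 1 = user churned
stumps = fit_ensemble(X, y)
print(predict_ensemble(stumps, [28, 1, 0]))  # 1
print(predict_ensemble(stumps, [2, 18, 1]))  # 0
```

The complaints stump alone misclassifies the second test user, but the other two stumps outvote it, which is the basic benefit an ensemble offers over a single weak classifier.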

Detection of anti-social behavior in online communities

Martin Borák
master study, supervised by Ivan Srba

Abstract. Online communities have recently been gaining importance and popularity on the Web, mainly in places such as social networks, CQA (community question answering) systems, online games, and news or entertainment portals. Immediate communication with an unlimited number of people on an enormous range of topics has become part of everyday life for hundreds of millions of people worldwide.

Given the huge number of members in these communities, the content of their communication is often rather diverse. There are often users who try to disrupt these discussions, whether by posting pointless messages, sharing links to irrelevant sites, using uncalled-for sarcasm, or through actual aggressive behavior and rude verbal attacks. One of the most common types of such users are haters, who spread hate through rude, vulgar, and often hurtful content aimed at people or things they dislike. Such behavior degrades the quality of discussion, discouraging other users from reading and contributing to it and, eventually, from visiting the portal. It can also lead to legal issues.

In our work, we focus on the analysis of antisocial behavior on the Web and on the automatic detection of haters’ comments on YouTube, a multimedia content portal known for a high concentration of haters in the discussion sections of videos.
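
As a point of reference, a common baseline for classifying comments is multinomial naive Bayes over word counts. The sketch below is such a baseline, not the detection method proposed in this work; the comments and labels are made-up toy examples.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesText:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: Sequence[str], labels: Sequence[str]) -> "NaiveBayesText":
        self.doc_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = {c: Counter() for c in self.doc_counts}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total = sum(counts.values())
            score = math.log(self.doc_counts[label] / n_docs)  # log prior
            for word in tokenize(doc):
                # Smoothed log likelihood; unseen words get a small nonzero probability.
                score += math.log((counts[word] + 1) / (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Toy, invented training data.
comments = [
    "you are an idiot and this video is garbage",
    "great video thanks for sharing",
    "worst channel ever you idiot",
    "really helpful explanation thanks",
]
labels = ["hater", "ok", "hater", "ok"]
clf = NaiveBayesText().fit(comments, labels)
print(clf.predict("what an idiot, garbage"))        # hater
print(clf.predict("thanks for the helpful video"))  # ok
```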

Stream analysis of incoming events using different data analysis methods

Matúš Cimerman
master study, supervised by Jakub Ševcech

Abstract. Nowadays, there is an emerging need to analyze data as they occur. Processing and analyzing data streams is a complex task; in particular, the solution needs to provide low latency and fault tolerance.

In our work, we focus on proposing a set of tools that will help domain experts in the process of data analysis without requiring detailed knowledge of analytical models. A similar approach is popular for analyzing static collections, e.g., funnel analysis. We study the possibilities of using well-known methods for static data analysis in the domain of data stream analysis. Our goal is to apply such a method to data streams, with a focus on ease of use and interpretability of results. Meeting these requirements is essential for domain experts, since it means they do not need detailed knowledge of domains such as machine learning or statistics. We evaluate our solution using a software component that implements the chosen method.
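
To make the funnel analysis example concrete for a stream, a funnel can be maintained incrementally: each incoming event either advances a user to the next step or is ignored. The step names and event format below are hypothetical, and this is a sketch, not the component used for evaluation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class StreamingFunnel:
    """Incremental funnel: counts users who completed each step in order."""

    def __init__(self, steps: Sequence[str]) -> None:
        self.steps = list(steps)
        self.progress: Dict[str, int] = defaultdict(int)  # user -> steps completed so far
        self.reached = [0] * len(self.steps)

    def process(self, user: str, event: str) -> None:
        done = self.progress[user]
        # Only the next expected step advances the user; other events are ignored.
        if done < len(self.steps) and event == self.steps[done]:
            self.progress[user] = done + 1
            self.reached[done] += 1

    def report(self) -> List[Tuple[str, int]]:
        return list(zip(self.steps, self.reached))


funnel = StreamingFunnel(["view_product", "add_to_cart", "checkout"])
events = [
    ("u1", "view_product"), ("u2", "view_product"), ("u1", "add_to_cart"),
    ("u3", "add_to_cart"), ("u1", "checkout"), ("u2", "checkout"),
    ("u3", "view_product"),
]
for user, event in events:
    funnel.process(user, event)
print(funnel.report())  # [('view_product', 3), ('add_to_cart', 1), ('checkout', 1)]
```

The report can be read at any moment without reprocessing past events, which keeps the result interpretable for a domain expert while data keep arriving.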

Analysis of trends in the behaviour of users of the web site

Natália Čuláková
bachelor study, supervised by Ondrej Kaššák

Abstract. Frequent pattern mining is a well-known and widely used area of data mining, but most approaches focus on the precision of the result and therefore cannot be used on fast streaming data. For such data, we need to consider one-pass algorithms that are able to work in real time.

In this work, I focus on the analysis of user behaviour on a web site. I will try to use one-pass algorithms to mine frequent patterns in streaming data and possibly even predict users’ next behaviour based on statistics of past actions.
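
One well-known one-pass algorithm of this kind is Lossy Counting (Manku and Motwani), which keeps approximate counts within an error bound ε using bounded memory. The sketch below applies it to single page visits, a simplification of full frequent pattern mining; the page names are hypothetical.

```python
import math
from typing import Dict, Hashable, List


class LossyCounter:
    """Lossy Counting: one pass, counts underestimated by at most epsilon * N."""

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0
        self.entries: Dict[Hashable, List[int]] = {}  # item -> [count, max. undercount]

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # At a bucket boundary, drop items that cannot be frequent.
            self.entries = {k: v for k, v in self.entries.items() if v[0] + v[1] > bucket}

    def frequent(self, support: float) -> Dict[Hashable, int]:
        """Items whose true frequency may reach support * N (no false negatives)."""
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}


lc = LossyCounter(epsilon=0.2)
for page in ["home", "news", "home", "sport", "home",
             "news", "home", "contact", "home", "news"]:
    lc.add(page)
print(lc.frequent(0.3))  # {'home': 5, 'news': 2}
```

The reported count for "news" is 2 although it occurred 3 times; the undercount stays within εN = 2, which is the price paid for a single pass with bounded memory.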

Learning Video Representations for Generating Descriptions

Patrik Gajdošík
master study, supervised by Márius Šajgalík

Abstract. Eye-tracking is a great way to enhance the user experience, either directly, as a new way for users to control applications, or indirectly, when interface designers and application creators use it in usability testing to increase the usability and efficiency of their applications. The problem with eye-tracking is that it requires specialized devices, eye-trackers, that capture the eye gaze. Eye-trackers are not widespread among ordinary users and are usually accessible only in specialized environments. Web cameras, however, are present in almost every mobile device.

In our work, we propose a solution that uses web cameras to perform eye-tracking. For this, we decided to use neural networks, which cope well with noisy or low-quality data. We want to design a neural network architecture that takes the video captured by a web camera and outputs the coordinates of the user’s gaze. We also want to extend the model so that, in addition to the gaze, it recognizes some of the simple patterns that can appear in gaze recordings.

Recognition of Web user’s behavioural patterns

Tomáš Chovaňák
master study, supervised by Ondrej Kaššák

Abstract. Behavioural patterns can be understood as typical, repeating features of a user’s behaviour while visiting a web site. Web logs, which record users’ actions, can be transformed into a transactional dataset in which transactions represent user sessions. In this work, we represent behavioural patterns as frequent itemsets: sets of pages frequently visited together in user sessions.
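
The transformation and mining steps can be sketched as follows, assuming a 30-minute inactivity timeout to split sessions (a common heuristic, not taken from this work) and the classic Apriori algorithm as one of the many possible frequent itemset methods. The log records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed heuristic)


def sessionize(log: Iterable[Tuple[str, int, str]]) -> List[FrozenSet[str]]:
    """Turn (user, timestamp, page) records, sorted by time, into session transactions."""
    last_seen: Dict[str, int] = {}
    session_no: Dict[str, int] = defaultdict(int)
    sessions: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for user, ts, page in log:
        if user in last_seen and ts - last_seen[user] > SESSION_TIMEOUT:
            session_no[user] += 1  # gap too long: start a new session
        last_seen[user] = ts
        sessions[(user, session_no[user])].append(page)
    return [frozenset(pages) for pages in sessions.values()]


def apriori(transactions: List[FrozenSet[str]], min_support: float) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose relative support is at least min_support."""
    n = len(transactions)
    candidates = [frozenset([item]) for item in {i for t in transactions for i in t}]
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates with an infrequent subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


log = [
    ("u1", 0, "home"), ("u2", 30, "home"), ("u1", 60, "products"),
    ("u2", 90, "products"), ("u1", 120, "cart"), ("u1", 5000, "home"),
    ("u1", 5060, "blog"), ("u2", 6000, "home"), ("u2", 6050, "products"),
    ("u2", 6100, "cart"),
]
transactions = sessionize(log)
patterns = apriori(transactions, min_support=0.5)
print(patterns[frozenset({"home", "products", "cart"})])  # 2
```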

A number of methods have been proposed for finding frequent itemsets in transactional datasets. The discovered behavioural patterns can be used, for example, to create recommendations, to predict user intentions (which can be used to cache predicted pages), or to adapt the structure and design of a web site. The whole process of processing web logs, finding behavioural patterns, and analyzing them is known as Web Usage Mining. Most existing Web Usage Mining methods search for behavioural patterns common to the whole set of web site users. In this work, we propose a solution that combines patterns mined from the whole set of users with patterns mined for specific user groups, and we examine the hypothesis that this leads to behavioural patterns of better quality. The quality of behavioural patterns is derived implicitly from the results of applying them to recommendation and user exit intent prediction.

Predicting User Activity on Twitter

Martin Jakubík
bachelor study, supervised by Ing. Michal Kompan, PhD.

Abstract. The activity of users on the Internet grows hand in hand with the need to personalize and adapt, or simply to understand and predict, user behavior.

I will try to predict the activity of Twitter users using a dataset of tweets, Twitter’s short messages, each containing the user’s name, a timestamp, and the message text.
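
A very simple baseline for predicting when a user will be active, assuming tweets are available as (user, ISO timestamp, text) tuples, is a per-user hourly activity histogram with Laplace smoothing. The data layout and the smoothing parameter are assumptions, and the tweets below are invented.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def hourly_profiles(tweets: Iterable[Tuple[str, str, str]]) -> Dict[str, Counter]:
    """Count tweets per user and hour of day."""
    profiles: Dict[str, Counter] = defaultdict(Counter)
    for user, timestamp, _text in tweets:
        profiles[user][datetime.fromisoformat(timestamp).hour] += 1
    return profiles


def most_active_hours(profile: Counter, k: int = 3) -> List[int]:
    # Highest counts first; ties broken by the earlier hour.
    return [h for h, _ in sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def activity_probability(profile: Counter, hour: int, alpha: float = 1.0) -> float:
    """Smoothed probability that one of the user's tweets falls in the given hour."""
    total = sum(profile.values())
    return (profile[hour] + alpha) / (total + 24 * alpha)


tweets = [
    ("alice", "2016-10-03T08:15:00", "..."),
    ("alice", "2016-10-03T21:05:00", "..."),
    ("alice", "2016-10-04T08:40:00", "..."),
    ("alice", "2016-10-05T08:55:00", "..."),
    ("bob", "2016-10-03T13:00:00", "..."),
]
profiles = hourly_profiles(tweets)
print(most_active_hours(profiles["alice"], k=2))  # [8, 21]
print(activity_probability(profiles["alice"], 8))  # 4/28, about 0.143
```

Smoothing keeps hours with no observed tweets from getting zero probability, which matters for users with only a few tweets.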

Executable documents on data analysis

Jakub Janeček
bachelor study, supervised by Jakub Ševcech

Abstract. Data analysis has risen to prominence over the last two decades. It is important for students to become familiar with data analysis, as it is a very useful field of study in the modern era, when we are overwhelmed with data. In this work, we concentrate on the basics of data analysis, especially classification with ensemble learning. We cover the theoretical differences between ensemble learning methods and compare basic classifiers with ensembles. We also shed some light on more concrete aspects of classification, such as what overfitting is and how to prevent it, feature engineering, and evaluation.

These topics will also be covered by executable documents in the form of Jupyter notebooks.
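
A notebook cell in this spirit might demonstrate overfitting by contrasting training accuracy with k-fold cross-validated accuracy. The sketch below uses a 1-nearest-neighbour classifier on a tiny invented one-dimensional dataset with two deliberately mislabelled points.

```python
from typing import List, Sequence


def nn1_predict(train_x: Sequence[float], train_y: Sequence[int], x: float) -> int:
    """1-nearest-neighbour prediction for one-dimensional features."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def training_accuracy(xs: Sequence[float], ys: Sequence[int]) -> float:
    return sum(nn1_predict(xs, ys, x) == y for x, y in zip(xs, ys)) / len(xs)


def k_fold_accuracy(xs: Sequence[float], ys: Sequence[int], k: int) -> float:
    """Each point is predicted by a model trained on the other k-1 folds."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        tx: List[float] = [xs[i] for i in train]
        ty: List[int] = [ys[i] for i in train]
        correct += sum(nn1_predict(tx, ty, xs[i]) == ys[i] for i in test)
    return correct / len(xs)


xs = [0.0, 1.1, 2.3, 3.6, 10.0, 11.2, 12.5, 13.9, 2.9, 11.7]
ys = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]  # the last two labels are deliberate noise
print(training_accuracy(xs, ys))   # 1.0
print(k_fold_accuracy(xs, ys, 5))  # 0.5
```

The classifier fits the training data perfectly but reaches only 0.5 under 5-fold cross-validation, because it memorises the noisy labels; this gap is exactly what evaluation on held-out data is meant to reveal.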

User short-term behaviour modelling

Ondrej Kaššák
doctoral study, supervised by Mária Bieliková

Abstract. Modelling user behaviour on a web site is a relatively well-known topic. Existing approaches, however, focus mostly on capturing a user’s typical behaviour and preferences from a long-term perspective. Such information is typically used to personalize the web site or to recommend interesting content.

When we want to identify the user’s actions in the very near future, for example their next step within the site (the next visited page, or whether they will remain on the site), we also need to consider information about the user’s short-term behaviour: their current preferences, intent, context, current trends, etc. Short-term behaviour, however, typically cannot be identified as clearly as long-term preferences, because it is often very noisy and individual preferences are difficult to separate on a short time scale. This problem can be eased by modelling the user over several time scales, which makes it possible to combine the stability of long-term preferences with the timeliness and flexibility of short-term preferences.
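
One way to model a user on several time scales, sketched here under our own assumptions (two exponential decay half-lives and a fixed mixing weight, none of which come from this work), is to keep decayed interest scores per content category at each scale and blend them.

```python
from collections import defaultdict
from typing import Dict, Optional


class MultiScaleProfile:
    """Interest scores decayed at two time scales (half-lives in seconds)."""

    def __init__(self, short_half_life: float, long_half_life: float,
                 short_weight: float = 0.5) -> None:
        self.half_lives = {"short": short_half_life, "long": long_half_life}
        self.short_weight = short_weight
        self.scores: Dict[str, Dict[str, float]] = {s: defaultdict(float) for s in self.half_lives}
        self.last_t: Optional[float] = None

    def observe(self, t: float, category: str) -> None:
        for scale, half_life in self.half_lives.items():
            if self.last_t is not None:
                # Older evidence fades; it fades much faster on the short scale.
                decay = 0.5 ** ((t - self.last_t) / half_life)
                for c in self.scores[scale]:
                    self.scores[scale][c] *= decay
            self.scores[scale][category] += 1.0
        self.last_t = t

    def share(self, scale: str, category: str) -> float:
        total = sum(self.scores[scale].values())
        return self.scores[scale].get(category, 0.0) / total if total else 0.0

    def preference(self, category: str) -> float:
        w = self.short_weight
        return w * self.share("short", category) + (1 - w) * self.share("long", category)


profile = MultiScaleProfile(short_half_life=60, long_half_life=86400)
for t, category in [(0, "sport"), (10, "sport"), (20, "sport"),
                    (3600, "travel"), (3610, "travel")]:
    profile.observe(t, category)
print(round(profile.preference("travel"), 2))  # 0.7
```

After three "sport" page views followed an hour later by two "travel" page views, the short-term share of "travel" is almost 1 while the long-term share is about 0.4, so the blended preference leans toward the current interest without discarding the long-term one.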

Modelling users’ short-term behaviour currently has great potential, because it makes it possible, for example, to predict the user’s future steps within the web site. If we know which page the user will most probably visit next, or if we can predict that the user will leave the site soon, we can react proactively, e.g., by personalizing the content for the user, maximizing their user experience, or maximizing the web site provider’s goals.
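
Next-page and exit prediction can be illustrated with a first-order Markov chain over page transitions, a standard baseline rather than the model proposed here; the exit marker and the sessions below are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

EXIT = "<exit>"  # hypothetical marker for "the user left the site"


def fit_transitions(sessions: List[List[str]]) -> Dict[str, Counter]:
    """Count page -> next-page transitions; the last page of a session leads to EXIT."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:] + [EXIT]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts: Dict[str, Counter], page: str) -> Tuple[Optional[str], float]:
    following = counts.get(page)
    if not following:
        return None, 0.0
    nxt, n = following.most_common(1)[0]
    return nxt, n / sum(following.values())


def exit_probability(counts: Dict[str, Counter], page: str) -> float:
    following = counts.get(page, Counter())
    total = sum(following.values())
    return following[EXIT] / total if total else 0.0


sessions = [["home", "products", "cart"], ["home", "products"],
            ["home", "blog"], ["home", "products", "cart"]]
model = fit_transitions(sessions)
print(predict_next(model, "home"))          # ('products', 0.75)
print(exit_probability(model, "products"))  # one exit out of three visits
```

A first-order chain only looks at the current page, which is exactly the kind of short-term signal the abstract describes; longer context or the user’s long-term profile would have to be added on top.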

Ensuring Robustness against Changes in Web Sites during Data Extraction

Michal Kren
master study, supervised by Ivan Srba

Abstract. The amount of data on the Web increases exponentially, so we need efficient solutions for extracting it, since manual approaches are no longer viable. Intensive research is being done in the field of automatic data extraction from the Web, with the aim of eliminating, or at least reducing, human input. In this work, we focus on web wrappers, and more specifically on their maintenance. The structure of a web site changes over time, and even the smallest change can cause a wrapper to fail. Our goal is to analyze and improve existing solutions for automatic wrapper maintenance to ensure robustness against structural changes of the page over time.
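
To illustrate why robustness matters, the sketch below shows a hypothetical wrapper that tries several fallback locators (an id, then a class name, then a text pattern) instead of a single fixed path. It is a simple heuristic built on Python’s standard html.parser module, not one of the existing maintenance methods the work analyzes; the locators and pages are invented.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Tuple

Element = Tuple[str, Dict[str, Optional[str]], str]  # (tag, attributes, direct text)
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class _TextElements(HTMLParser):
    """Collects every element that directly contains non-blank text."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[Tuple[str, Dict[str, Optional[str]]]] = []
        self.elements: List[Element] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed elements.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack and data.strip():
            tag, attrs = self.stack[-1]
            self.elements.append((tag, attrs, data.strip()))


# Hypothetical locators for a product price, from most to least specific.
PRICE_LOCATORS: List[Callable[[str, Dict[str, Optional[str]], str], bool]] = [
    lambda tag, attrs, text: attrs.get("id") == "price",
    lambda tag, attrs, text: "price" in (attrs.get("class") or "").split(),
    lambda tag, attrs, text: text.endswith("€"),
]


def extract_price(html: str) -> Optional[str]:
    parser = _TextElements()
    parser.feed(html)
    parser.close()
    for locator in PRICE_LOCATORS:  # fall back to looser rules when stricter ones fail
        for tag, attrs, text in parser.elements:
            if locator(tag, attrs, text):
                return text
    return None


print(extract_price('<div><span id="price">19.99 €</span></div>'))
print(extract_price('<div class="box"><p class="product-price price">24.50 €</p></div>'))
print(extract_price("<ul><li>Shipping: free</li><li>12.00 €</li></ul>"))
```

All three page versions yield a price even though the id, tag, and nesting differ; looser fallbacks trade some precision for robustness, so a real wrapper would also need to detect when a fallback extracts the wrong value.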

Modelling Music Structure using Artificial Neural Networks

Lukáš Marták
master study, supervised by Márius Šajgalík

Abstract. In the era of digital technologies, we can see a dramatic evolution of the music industry together with a radical growth of music content. Libraries are full of music, ready to stream compressed but still high-quality audio tracks on demand. As the richness of music content grows, it is crucial to have new methods to describe this content for various purposes. Music Information Retrieval is the interdisciplinary science of retrieving information from music. Various tasks have been identified within the field, aiming to solve different real-world problems.

In this work, we approach the task of Automatic Music Transcription, the process of retrieving musical notation from an audio recording of music. The main subproblem to be solved here is Multiple Fundamental-Frequency Estimation. In the past, it has been approached mostly by signal processing experts, using handcrafted features to extract information from the signal. We approach this problem in the context of the emerging field of machine learning, focusing on deep learning methods. To effectively model the structure of musical content within the audio signal, we need to design a deep neural network architecture and optimize it to achieve this modelling capacity.
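
For contrast with the deep learning approach, the sketch below shows a classical handcrafted technique for the simpler single-pitch case: estimating the fundamental frequency from the lag with the highest autocorrelation and mapping it to a MIDI note number (A4 = 440 Hz = note 69). It does not handle several simultaneous notes, which is precisely what makes Multiple Fundamental-Frequency Estimation hard; the search range of 50 to 1000 Hz is an assumption.

```python
import math
from typing import Sequence


def estimate_f0(samples: Sequence[float], sample_rate: int,
                fmin: float = 50.0, fmax: float = 1000.0) -> float:
    """Single-pitch estimate: the lag with the highest autocorrelation in [1/fmax, 1/fmin]."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        r = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


def hz_to_midi(freq: float) -> int:
    """Nearest MIDI note number, using A4 = 440 Hz = note 69."""
    return round(69 + 12 * math.log2(freq / 440.0))


rate = 8000
tone = [math.sin(2 * math.pi * 440 * i / rate) for i in range(400)]  # 50 ms of A4
f0 = estimate_f0(tone, rate)
print(round(f0, 1), hz_to_midi(f0))  # 444.4 69
```

Because the lag is an integer number of samples, the 440 Hz test tone is estimated as about 444 Hz, which still rounds to the correct note; handcrafted limitations like this are part of the motivation for learned models.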

Visualization of user activity in an interactive application

Lukáš Meňhert
bachelor study, supervised by Mária Bieliková

Abstract. It has been shown that opening up the learner model to students may further enhance their learning experience. Our project therefore aims to make the learner model explicit to the users of an adaptive learning framework through graphical visualization. Different types of users may find different types of visualizations interesting, so we visualize data from two viewpoints. The first is the viewpoint of a student, whose primary goal may be to get the highest possible score on a test, and for whom we want to make the learning process more enjoyable. The second is the viewpoint of a teacher, whose primary goal may be to review the questions that are most problematic for students and then adjust the material explained in the next lecture.

Analysis of interactions in the domain is therefore essential for creating visualizations that add value to the learning process from both the students’ and the teachers’ perspective.
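
For the teacher’s viewpoint, a minimal sketch might rank questions by their error rate and render them as a text bar chart. The answer-log format (student, question id, correctness) is a hypothetical assumption, and a real visualization would use a graphical library.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def problematic_questions(answers: Iterable[Tuple[str, str, bool]],
                          min_attempts: int = 1) -> List[Tuple[str, float]]:
    """Questions sorted by error rate, highest first."""
    attempts = defaultdict(int)
    wrong = defaultdict(int)
    for _student, question, correct in answers:
        attempts[question] += 1
        if not correct:
            wrong[question] += 1
    rates = {q: wrong[q] / attempts[q] for q in attempts if attempts[q] >= min_attempts}
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))


def text_bars(rows: List[Tuple[str, float]], width: int = 20) -> str:
    return "\n".join(f"{q:<4} {'#' * round(rate * width):<{width}} {rate:.0%}" for q, rate in rows)


answers = [
    ("s1", "q1", True), ("s2", "q1", False), ("s3", "q1", False),
    ("s1", "q2", True), ("s2", "q2", True), ("s3", "q2", False),
    ("s1", "q3", False), ("s2", "q3", False), ("s3", "q3", False),
]
rows = problematic_questions(answers)
print(text_bars(rows))
```

The same aggregation, computed per student instead of per question, would feed the student’s viewpoint.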

Text reading analysis

Jakub Mrocek
master study, supervised by Róbert Móro

Abstract. Our project aims to detect parts of a text that are not comprehensible to a human reader, using an eye-tracking device to monitor the reader’s gaze. The amount of text we read on screens every day is rising because of changes in our lifestyle. The main goal of a writer is to encode information in a text in a way that the reader can easily decode. Even though a text may seem to be written in a clear and comprehensible way, the reader may still have trouble understanding it correctly. This phenomenon may be caused by differences in the mental or intellectual development of the writer and the reader. Comprehension problems also naturally vary among different users.

However, it is not trivial to verify whether the reader was able to understand the text. Fortunately, modern technologies provide increasingly sophisticated ways of recording human-computer interaction. Our method analyzes data recorded while users read a text and tries to determine the areas that might have been difficult to read or even incomprehensible to the reader.
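
A common first step in analysing such recordings is detecting fixations. The sketch below implements the dispersion-threshold (I-DT) algorithm and then sums fixation durations per text line. Treating a long total dwell time on a line as a hint of difficulty is our assumption for illustration, not the method of this work; the thresholds and gaze samples are invented.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, float, float]          # (time in ms, x, y)
Fixation = Tuple[float, float, float, float]  # (start ms, end ms, centre x, centre y)


def _dispersion(window: Sequence[Sample]) -> float:
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(samples: Sequence[Sample], max_dispersion: float,
                  min_duration: float) -> List[Fixation]:
    """Dispersion-threshold (I-DT) fixation detection on time-ordered gaze samples."""
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i < n:
        # Grow a window that spans at least min_duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= n:
            break
        if _dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the points stay close together.
            while j + 1 < n and _dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # drop the first point (part of a saccade) and retry
    return fixations


def dwell_per_line(fixations: Sequence[Fixation], line_height: float) -> Dict[int, float]:
    """Total fixation time per text line, assuming lines are horizontal bands."""
    dwell: Dict[int, float] = defaultdict(float)
    for start, end, _cx, cy in fixations:
        dwell[int(cy // line_height)] += end - start
    return dict(dwell)


# Invented 50 Hz recording: two fixations on one line, a saccade, one on the next line.
samples = ([(t, 100, 50) for t in range(0, 201, 20)] + [(220, 250, 51)]
           + [(t, 400, 52) for t in range(240, 601, 20)]
           + [(t, 120, 90) for t in range(620, 801, 20)])
fixations = idt_fixations(samples, max_dispersion=20, min_duration=100)
print(fixations)
print(dwell_per_line(fixations, line_height=40))  # {1: 560.0, 2: 180.0}
```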

Conflict detection and visualization in software models

Martin Olejár
master study, supervised by Karol Rástočný

Abstract. The development of software systems involves creating different versions of models of the developed system, which continuously undergo significant changes. For development to progress effectively, it is necessary to detect and identify changes in the models. Furthermore, many model changes can be conflicting. Conflicting changes arise mainly when a development team works in parallel, and they must be resolved before model versions are synchronized. The problem is not only detecting these conflicting changes, but also detecting all model parts affected by them. Correct detection and visualization of all conflicts in the models would provide a very good precondition for resolving them and successfully synchronizing model versions.

In our work, we would like to detect and visualize differences between two UML model versions. We plan to use these differences to propose a method for the detection, visualization, and resolution of model conflicts.
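
Conflicts between two versions are commonly detected with a three-way comparison against their common ancestor. The sketch below applies this idea to a deliberately simplified model representation (element id mapped to properties), which is an assumption; real UML models would first need to be parsed, for example from XMI, and dependent parts would need separate analysis.

```python
from typing import Dict, List, Optional, Tuple

# Simplified model: element id -> {property: value}.
Model = Dict[str, Dict[str, str]]
Change = Optional[Dict[str, str]]  # None means the element was deleted


def changes(base: Model, version: Model) -> Dict[str, Change]:
    """Elements added, modified, or deleted in `version` relative to `base`."""
    return {
        eid: version.get(eid)
        for eid in base.keys() | version.keys()
        if base.get(eid) != version.get(eid)
    }


def conflicts(base: Model, mine: Model, theirs: Model) -> List[Tuple[str, Change, Change]]:
    """Elements changed in both versions, but not in the same way."""
    a, b = changes(base, mine), changes(base, theirs)
    clashing = [(eid, a[eid], b[eid]) for eid in a.keys() & b.keys() if a[eid] != b[eid]]
    return sorted(clashing, key=lambda c: c[0])


base = {"Order": {"kind": "class", "name": "Order"},
        "Customer": {"kind": "class", "name": "Customer"},
        "Invoice": {"kind": "class", "name": "Invoice"}}
mine = {"Order": {"kind": "class", "name": "PurchaseOrder"},
        "Customer": {"kind": "class", "name": "Customer"},
        "Invoice": {"kind": "class", "name": "Bill"},
        "Item": {"kind": "class", "name": "Item"}}
theirs = {"Order": {"kind": "class", "name": "SalesOrder"},
          "Customer": {"kind": "class", "name": "Client"}}  # Invoice deleted
print([c[0] for c in conflicts(base, mine, theirs)])  # ['Invoice', 'Order']
```

Changes made on only one side (the renamed Customer, the added Item) merge cleanly; only elements touched differently on both sides, including a rename against a deletion, are reported as conflicts.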

Generative Adversarial Networks

Adam Rafajdus
master study, supervised by Márius Šajgalík

Abstract. Machine learning, more specifically deep learning, has received a massive boost in popularity and usability in many domains in recent years, thanks not only to increased computing power and large amounts of data, but also to new neural network architectures.

Generative adversarial networks are a new neural network architecture in which two models, a generator and a discriminator, are trained simultaneously by competing against each other: the generator tries to produce samples that the discriminator cannot distinguish from real data, and both improve through this competition.
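
The competition between the two models was formalized in the original GAN paper (Goodfellow et al., 2014) as a minimax game over a value function V, where the discriminator D estimates the probability that a sample is real and the generator G maps noise z to samples. The second line gives the optimal discriminator for a fixed generator, where p_g is the distribution of generated samples:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```

When the generator reproduces the data distribution exactly (p_g = p_data), the optimal discriminator outputs 1/2 everywhere, meaning it can no longer tell real and generated samples apart.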

This architecture has already been combined with convolutional neural networks in tasks such as generating images that resemble those in a training dataset, and it could in principle be applied to fields such as natural language processing, video processing (e.g., frame prediction), or processing and generating speech. We would like to focus on processing (and generating) text and images, with the intention of combining features of both (for example, generating text descriptions of images) by utilizing the advantages of generative adversarial networks.

Using Deep Learning for Achievement of Advancement in Machine Learning and Artificial Intelligence

Metod Rybár
doctoral study, supervised by Mária Bieliková

Abstract. Our research focuses on using deep learning techniques and architectures to improve the performance and optimization of machine learning and artificial intelligence systems. To do this, we use cutting-edge methods and techniques and try to improve, combine, and apply them to create new types of artificially intelligent systems.

Processing Time Series as Symbols

Jakub Ševcech
doctoral study, supervised by Mária Bieliková

Abstract. When processing time series data, we face several challenges. One of them is the comparison of very long time series, where we are no longer interested in the similarity of values but in the similarity of their inner structure. Symbolic representations of time series can be used to accentuate the inner structure and to reduce the dimensionality of the data, and they allow a plethora of methods from string and text processing to be applied to time series data.
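
A widely used symbolic representation is SAX (Symbolic Aggregate approXimation): the series is z-normalized, averaged over equal-length segments (Piecewise Aggregate Approximation), and each average is mapped to a letter using breakpoints that split the standard normal distribution into equally probable regions. The sketch below shows SAX for illustration; it is not the shape-based representation proposed in this work, and it assumes the series length is divisible by the number of segments.

```python
import statistics
from statistics import NormalDist
from typing import Sequence


def sax(series: Sequence[float], n_segments: int, alphabet: str = "abcd") -> str:
    """Symbolic Aggregate approXimation of a series whose length divides evenly."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    z = [(v - mean) / std for v in series] if std > 0 else [0.0] * len(series)
    size = len(z) // n_segments
    # Piecewise Aggregate Approximation: the mean of each equal-length segment.
    paa = [statistics.fmean(z[i * size:(i + 1) * size]) for i in range(n_segments)]
    # Breakpoints split N(0, 1) into len(alphabet) equally probable regions.
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


print(sax(list(range(8)), 4))                # abcd
print(sax([0, 0, 10, 10, 0, 0, 10, 10], 4))  # adad
```

The seasonal series becomes "adad", so a repeating shape turns into a repeating substring that ordinary string methods can detect and compare.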

In our work, we propose a symbolic time series representation that transforms seasonal data into a sequence of repeating shapes. We study the possibility of transforming various types of data into sequences of symbols, and we explore methods for their analysis. The applications we focus on are stream state classification, anomaly detection, and forecasting in domains such as electrical energy consumption or other production/consumption processes. Our paramount goal is to facilitate parallel analysis of multiple data streams and multiple metrics computed over them.

Visualizations of eye tracking data

Ronald Demeter
bachelor study, supervised by Jakub Šimko

Abstract. Text of the first paragraph…
