Students’ Research Works – Autumn 2017: Data Analysis, Mining and Machine Learning (PeWe.Data)

Prediction of users’ personality traits based on task solving on the Web

Veronika Balážová
master study, supervised by Róbert Móro

Abstract: People use web applications and systems more than ever before, whether to find information, use services, or socialize. How a user works with an application, however, depends not only on its user interface and user experience, but also on the user and their personality characteristics.
Personality characteristics can help explain how users look at a web page and how they work with it. Automatic prediction of personality characteristics could be useful, for example, for personalizing applications. In the e-shop domain specifically, it could be used to recommend suitable products to each user based on their personality characteristics.
In our work, we build a user model based on the user’s interactions with a web page. We use this model as input for automatic classification of the user’s personality characteristics. We work with a dataset from a Slovak e-shop. It contains users’ interactions with the e-shop, their purchases, product ratings and views, and many other actions. The dataset also includes questionnaires completed by these users, which reveal their personalities according to the Big Five model (five dimensions of personality).
Besides interaction features, we plan to extend this model with eye-tracking data and thus improve the predictions. Basic eye-tracking metrics (number of fixations, time to first fixation, …) will serve as new inputs to the classification.
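The two eye-tracking metrics named above can be illustrated with a minimal sketch. It assumes that fixations have already been detected and are available as timestamped screen coordinates, and that a rectangular area of interest (for example, a product image) is known. The names Fixation, AreaOfInterest and aoi_metrics are hypothetical and are not taken from the work itself.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    start_ms: float      # onset, relative to the moment the page was shown
    duration_ms: float
    x: float
    y: float


@dataclass
class AreaOfInterest:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fixation):
        """True if the fixation point lies inside this rectangle."""
        return (self.left <= fixation.x <= self.right
                and self.top <= fixation.y <= self.bottom)


def aoi_metrics(fixations, aoi):
    """Compute basic per-area metrics usable as classifier features."""
    hits = [f for f in fixations if aoi.contains(f)]
    return {
        "fixation_count": len(hits),
        # None when the user never looked at the area
        "time_to_first_fixation_ms": min((f.start_ms for f in hits), default=None),
        "total_fixation_duration_ms": sum(f.duration_ms for f in hits),
    }


# Made-up example data: three fixations, two of them on the product image.
fixations = [
    Fixation(start_ms=120, duration_ms=200, x=50, y=40),
    Fixation(start_ms=340, duration_ms=180, x=410, y=300),
    Fixation(start_ms=540, duration_ms=250, x=430, y=310),
]
product_image = AreaOfInterest(left=400, top=280, right=600, bottom=400)
metrics = aoi_metrics(fixations, product_image)
```

Each value in the resulting dictionary could be appended to the interaction features of the same user before classification.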

Predicting User Retention in Online Environment

Patrik Berger
master study, supervised by Michal Kompan

Abstract: Recent market growth and technological advancement have increased the number of competitors providing online services. In these circumstances, acquiring a new user is many times more expensive than keeping an existing one. That makes user retention one of the key success metrics for online services (e-shops, banking services, insurance companies, etc.). Successfully predicting the churn of a specific user provides an opportunity to change their decision, for instance by giving them a special offer. This kind of prevention, together with the identification of churn reasons, creates a strong motivation to explore this area. In our work, we focus on identifying a set of features to create a user model for churn prediction. In the first stage, we plan to build a user model in a selected domain and explore the possibilities of automatic feature extraction from the data. As a next step, we want to select classifiers and build the structure of a learning ensemble. Finally, we plan to test our model on a nontrivial dataset from the selected domain.

Prediction of Project Success on Crowdfunding Portals

Patrik Blanárik
bachelor study, supervised by Ivan Srba

Abstract: Crowdfunding is a way of funding a project with the support of a large number of people, each contributing a small amount of money. For some people, crowdfunding is a simple way to obtain the capital needed to bring their creative projects to life. Crowdfunding portals such as Kickstarter and Indiegogo give these people an opportunity to achieve their goals. Naturally, some projects are successfully funded and others are not.
Our task is to predict whether a project created on Kickstarter will get funded, which could serve as useful feedback for the project creator. To do so, we will use data about Kickstarter projects recorded in different time periods together with relevant data gathered from Facebook, assuming that social networks may have a great impact on project success. Among other methods, we will use NLP to extract features from structured texts (e.g. project description).

Personalized Recommendation Taking into Account Visual Impacts

Martin Černák
master study, supervised by Michal Kompan

Abstract: Recommender systems are an important part of the Web nowadays. The amount of information on the Web is too large to search through, and finding useful information or keeping track of interesting topics is too time-consuming. Well-placed and well-selected recommendations can improve the user experience, which often translates into revenue.
Lists of results generated by recommenders are often accompanied by graphical elements such as images. In some domains, these elements and their attributes significantly influence user preferences. Although state-of-the-art approaches focus on processing the textual content of items, or on observing and modeling behavioral relationships and interactions between users, they ignore graphical elements.
In our work, we aim to propose a novel recommendation method that takes into account the influence of graphical elements, and to analyze its viability.
We decided to use this method to recommend hotels. This domain has some specific problems, such as insufficient data for individual users, but there are also plenty of images for individual hotels. We believe that our method can ease the problem of insufficient data.

Recognition of similarities in user behavior in data stream

Juraj Flamík
master study, supervised by Ondrej Kaššák

Abstract: It might seem that the behavior of each web site user is highly unique and different from the behavior of other users. It is based on the user’s current intention and previous experience with the web site. However, the web site itself offers only a finite number of ways in which users can behave. Thanks to this, we can find users who behave similarly and use this information in tasks such as personalization, user modeling, recommendation, or prediction.
In our work, we analyze the possibilities of clustering user behavior. Because we work with large amounts of data from web sites with dynamically changing content, we focus on clustering in a data stream. We address subtasks such as feature engineering and the measurement of distance and cluster quality. We then want to use the obtained clusters (behavior similarities) to improve recommendation. Finally, we want to test our method on a nontrivial real dataset and show that clustering can help achieve better recommendation results.
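To make the idea of clustering in a data stream concrete, the following minimal sketch uses sequential (online) k-means, in which each incoming behavior vector updates the nearest centroid and is then discarded. This is only one possible stream clustering technique and is not necessarily the one used in the work. The class name, the Euclidean distance, the seeding with the first k points, and the two example features (pages per session, average seconds per page) are all assumptions made for illustration.

```python
import math


class SequentialKMeans:
    """Online k-means: centroids are updated one point at a time."""

    def __init__(self, k):
        self.k = k
        self.centroids = []   # one list of floats per cluster
        self.counts = []      # number of points assigned to each cluster

    def update(self, point):
        """Assign the point to a cluster, move that centroid, return its index."""
        point = [float(value) for value in point]
        # Simplification: the first k points seed the centroids.
        if len(self.centroids) < self.k:
            self.centroids.append(point)
            self.counts.append(1)
            return len(self.centroids) - 1
        nearest = min(range(self.k),
                      key=lambda i: math.dist(self.centroids[i], point))
        self.counts[nearest] += 1
        rate = 1.0 / self.counts[nearest]   # running-mean step size
        self.centroids[nearest] = [
            c + rate * (p - c) for c, p in zip(self.centroids[nearest], point)
        ]
        return nearest


# Made-up stream of (pages per session, average seconds per page).
stream = [(2, 30), (20, 5), (3, 28), (22, 6), (1, 35), (19, 4)]
model = SequentialKMeans(k=2)
assignments = [model.update(point) for point in stream]
```

With the running-mean step size, each centroid equals the mean of the points assigned to it so far, so no past sessions need to be stored.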

Personalized Hybrid Recommendation enhanced by Visual Features

Peter Gašpar
doctoral study

Abstract: Recommender systems have become an essential part of the Web in various domains. They suggest to users which things to read, which products to buy, or which movies to watch. They try to tailor their output to user preferences in order to improve the overall user experience. However, in some scenarios we need to deal with an insufficient amount of information about users, items, or their interactions.
Images are another essential part of many Web portals. They play an important role in extending or even replacing information about items (for example, movie posters or product photos). Users’ decision process may be led by visual stimuli, and thus important information about the item is hidden in the image. Moreover, these images may contain features that are useful for user modelling and recommendation.
In our work, we study the problem of incorporating features extracted from images into recommender systems. We examine hybrid recommendation approaches that combine basic recommender system techniques and incorporate images when ranking the output recommendations.

Scalable Personalized Recommendation

Mário Hunka
master study, supervised by Michal Kompan

Abstract: Do you know that feeling when you don’t know which movie to watch? The more movies are on offer, the harder it is to find a suitable one. Personalized recommender systems try to fit users’ needs and give them relevant options to choose from, but many things can influence what is relevant to you at a particular moment, e.g. other users’ opinions, context, popularity of actors, category…
In our work, we want to determine whether quality is relevant for users most of the time. We therefore examine how users are influenced by the opinions of experts, in this case movie critics. We use a collaborative filtering approach that boosts or decreases the final rating estimate based on the correlation between user and critic ratings.
The final method should be applied only to the best-performing user segments, selected by evaluating the algorithm across multiple segments. Segments can be formed from many features, such as demographics or number of reviews. Combining this with other classic methods can result in a robust and scalable movie recommender.
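One way to read the boosting step is sketched below: a base rating estimate (for example, from collaborative filtering) is shifted toward or away from the critics’ opinion in proportion to how strongly the user’s past ratings correlate with critic ratings. The exact formula, the weight of 0.5, the 1 to 5 rating scale, and the function names are assumptions made for illustration and are not the author’s published method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def adjust_with_critics(base_estimate, user_ratings, critic_ratings, movie,
                        weight=0.5, low=1.0, high=5.0):
    """Boost or decrease a base estimate using user-critic correlation."""
    shared = sorted(set(user_ratings) & set(critic_ratings))
    corr = pearson([user_ratings[m] for m in shared],
                   [critic_ratings[m] for m in shared])
    critic_mean = sum(critic_ratings.values()) / len(critic_ratings)
    # Positive correlation: follow the critics. Negative: move the other way.
    adjusted = base_estimate + weight * corr * (critic_ratings[movie] - critic_mean)
    return max(low, min(high, adjusted))


# Made-up ratings: critics rate movie "D" above their average.
critics = {"A": 4.5, "B": 3.0, "C": 1.5, "D": 4.0}
agrees_with_critics = {"A": 5, "B": 3, "C": 1}
disagrees_with_critics = {"A": 1, "B": 3, "C": 5}

boosted = adjust_with_critics(3.5, agrees_with_critics, critics, "D")
lowered = adjust_with_critics(3.5, disagrees_with_critics, critics, "D")
```

A user with no ratings in common with the critics gets a correlation of 0.0, so the base estimate is returned unchanged.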

Interpretability of Machine Learning Models created by Clustering Algorithms

Jakub Janeček
master study, supervised by Jakub Ševcech

Abstract: Interpretability is a key characteristic of a machine learning model if our goal is to persuade experts from the domain for which we propose the model to accept and use it. The better we can explain the behaviour of our model, the greater the chance it will be accepted.
That is why it is important to strive not just for better results of our model, but also for its interpretability. One without the other has little value. Sometimes a model with worse results may even have a better chance of being accepted if we can explain it in a way that people can comprehend.
We focus on clustering and on models created by clustering algorithms. Our goal is to use the feature importance of these models to improve their interpretability.

Predicting Customer Satisfaction Based on Help Desk Data

Kamil Janeček
bachelor study, supervised by Eduard Kuric

Abstract: Customer satisfaction is critical for any business. Good customer care is very important and can lead to increased sales and profit. Customers with problems contact customer support in the hope of getting their problems resolved. Unfortunately, after finishing their conversation with support, customers tend not to leave any feedback, which makes it hard to evaluate the effectiveness of customer support. In this work, we propose a method that should predict whether or not a customer’s problem was successfully solved during an online chat session.

Prediction of website user churn rate

Tomáš Jendrejčák
bachelor study, supervised by Ondrej Kaššák

Abstract: Customer return is a problem that is often addressed in various domains. Getting a lost user to return is often unsuccessful, which is why it is important to design a method that can reveal customers with a higher risk of churning early. If we manage to identify such users early, we can take steps to prevent their churn.
This work deals with predicting a user’s return to a web site using machine learning.

Purchase prediction in e-shop

Matúš Kalafut
bachelor study, supervised by Ondrej Kaššák

Abstract: The topic of our bachelor thesis is predicting a user’s behaviour during a visit to an e-shop, specifically whether or not the user will buy something by the end of the visit. Our task is to create a method for this prediction.
We can predict a user’s behaviour based on their past behaviour or on their behaviour during the current session. This type of information can be very useful for merchants and can open the way to personalized advertising in their e-shops. For example, they will be able to create personalized newsletters for groups of customers who showed interest in a particular product category in the past.
To solve this type of problem, we use machine learning, which gives computers the ability to learn without being explicitly programmed. To train our machine learning model, we need to extract from the data the features on which it will learn. Features can describe customers, products, or sessions.
We train models with multiple algorithms and compare the results. Afterwards, we modify the features and try to boost the models to get better results.
For training and validating our model, we use data provided by the discount portal zlavadna.sk.

User Modelling for Session End Intent Prediction

Ondrej Kaššák

Abstract: User behaviour on a web site can be modelled from two basic points of view. The first is short-term behaviour, which reflects the user’s current intent, preferences, goals, etc. It captures the user’s most recent behaviour and actions, but it is typically very noisy because of the influence of the user’s current context, mood, and other unpredictable conditions.
The second point of view, long-term behaviour, identifies more stable preferences and captures the user’s typical habits. On the other hand, this kind of behaviour adapts poorly to changes; it picks up trends and hot topics in user behaviour only after a longer period of time.
To model user preferences and predict future behaviour, it is suitable to combine both data sources and consider them when estimating the user’s next actions. In our research, we focus on the task of predicting the user’s intent to end a session. This task requires recognizing subtle changes in user behaviour compared with previous behaviour in different time periods, as well as the characteristics of the current session.

Purchase prediction in e-shop

Matej Končál
bachelor study, supervised by Ondrej Kaššák

Abstract: Shopping in an e-shop takes place in virtual space rather than in a company’s physical stores, where customer behavior can be influenced directly. There is therefore a need to monitor customer behavior during a visit to the store’s web site in order to positively affect the outcome of the visit. Supervised machine learning is currently one of the most widely used methods for predicting customer behaviour. The goal of this thesis is to analyze input data and use it to train a machine learning model. Two independent algorithms will be used for model training: gradient-boosted decision trees and logistic regression. After training, the results will be compared and the better model will be picked. The selected model will be experimentally tested on data from the discount portal.

Prediction of user return to website

Júlia Krajčoviechová
bachelor study, supervised by Ondrej Kaššák

Abstract: In this work, we address the prediction of user return to a web site. At present, casual and one-time users represent an untapped potential for increasing the number of a web site’s visits. The main goal is to characterize the behaviour of an individual user and then use these findings to predict whether the user will return to the site or has lost interest in its services. An important part of this work is using a relatively large amount of data to analyze and track user behaviour on the web site and to look for similarities between the behaviour of users. It is then necessary to design a custom method, building on existing methods for predicting user return, whose output is the answer to whether or not the user will visit the site again in the future. Our method could also be helpful in practice, as it can detect deficiencies that cause a web site to lose its users, or it can identify so-called occasional visitors, who need to be approached in a different way to make them visit the site again.

Improving Robustness Against Websites’ Changes During Web Data Extraction

Michal Kren
master study, supervised by Ivan Srba

Abstract: The general approach to dealing with changes in a website’s structure during web data extraction is to optimize the XPath expressions before executing the wrapper. We propose a novel approach to wrapper robustness based on machine learning, applied during or, more precisely, after the extraction. When an XPath expression fails as a result of a change to the web page’s structure, we apply binary classification to identify the desired HTML element. Based on this element, a new XPath expression is generated. We will evaluate our method on a series of snapshots of selected web pages, measuring not only the accuracy of our classifier but also the time until our self-repairing wrapper definitively fails.
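The repair loop described above can be sketched with the Python standard library. The sketch assumes a well-formed page that xml.etree.ElementTree can parse, and it replaces the trained binary classifier with a hand-written stand-in predicate. The functions build_xpath, repair_xpath and is_price are hypothetical names, and the purely positional XPath is only one simple way to generate a new expression.

```python
import xml.etree.ElementTree as ET


def build_xpath(root, target):
    """Build a positional path from root to target (ElementTree XPath subset)."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [child for child in parent if child.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "./" + "/".join(reversed(steps)) if steps else "."


def repair_xpath(root, old_xpath, is_target):
    """Keep the old expression while it works; otherwise regenerate it."""
    if root.find(old_xpath) is not None:
        return old_xpath
    for element in root.iter():
        if is_target(element):       # stand-in for the binary classifier
            return build_xpath(root, element)
    return None                      # the wrapper could not be repaired


def is_price(element):
    # Assumption: a trained classifier would use many features, not one attribute.
    return element.get("class") == "price"


# Made-up snapshot in which the price moved from the first div to the second.
new_page = ET.fromstring(
    "<html><body>"
    "<div><p>Intro</p></div>"
    "<div><span>Name</span><span class='price'>9.99</span></div>"
    "</body></html>"
)
old_xpath = "./body/div[1]/span[2]"
new_xpath = repair_xpath(new_page, old_xpath, is_price)
price_text = new_page.find(new_xpath).text
```

Real pages are rarely well-formed XML, so a practical wrapper would need an HTML parser with fuller XPath support; the control flow would stay the same.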

Aspect Based Sentiment Analysis

Rastislav Krechňavý
master study, supervised by Marián Šimko

Abstract: The amount of text data produced on social networks like Facebook is continuously increasing. Reading and evaluating these posts manually is very time-consuming, so our research focuses on analyzing the sentiment of these texts. Our goal is to determine the sentiment of the aspects (topics) discussed in comment sections on social networks.
This will be useful in data and marketing analysis, for identifying positive and negative aspects of a product or finding the strong and weak points of a company. We will create a method that identifies aspects, measures their sentiment, and provides results suitable for further research.

Predicting customer satisfaction based on data from customer support centre

Michaela Kolesíková
bachelor study, supervised by Ivan Srba

Abstract: My project is aimed at predicting customer satisfaction based on the conversation a customer had with a customer centre agent. The rating left by the customer is affected by a number of factors that can be learned from the conversation. Conversations with bad ratings share common features that distinguish them from conversations where customers were satisfied. Since only a small share of customers (around 15%) leave feedback, we will try to use machine learning algorithms to find out what ratings the other, unrated tickets would have.
Supervised learning, specifically binary classification, is appropriate for this task because we already have some labels: 1 for negative feedback and 10 for positive feedback. Working with all the data and machine learning features, it should be possible to predict ratings for unrated tickets, which can help a customer centre improve its services.

Interpretability and explainability of machine learning models

Ľubomír Koprla
bachelor study, supervised by Jakub Ševcech

Abstract: Once a machine learning algorithm is trained, it can be difficult to understand why it gives a particular response to a set of data inputs. This can be a disadvantage, because when it is not clear how the algorithm makes its decisions, people do not trust it. The objective of interpreting and explaining a machine learning algorithm is to be able to say: “The algorithm makes this decision because these features are the most important.”
In our work, we divide interpreting into two parts: feature selection methods and explaining methods. Feature selection methods choose the features that are most important for training the model and making predictions, which makes the model easier to understand. Explaining methods explain individual predictions of the classifier by showing which features were important for each prediction. In addition to interpreting and explaining, we also focus on classifier performance. We demonstrate that feature selection can improve model performance and that it can distinguish randomly generated data from real data.
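The claim that feature selection can separate real from randomly generated data can be illustrated with a minimal sketch. It assumes a simple univariate filter that scores each feature by the absolute Pearson correlation with the label, which is only one of many feature selection methods and not necessarily the one used in the work. The synthetic data, the seed, and the names rank_features, real_feature and random_feature are made up for the example.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(features, labels):
    """Return (name, score) pairs sorted from most to least relevant."""
    scores = {name: abs(pearson(values, labels))
              for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


rng = random.Random(0)                      # fixed seed for repeatability
real_feature = [rng.random() for _ in range(200)]
labels = [1 if value > 0.5 else 0 for value in real_feature]
random_feature = [rng.random() for _ in range(200)]   # unrelated to the label

ranking = rank_features(
    {"real_feature": real_feature, "random_feature": random_feature},
    labels,
)
```

The feature that actually determines the label is ranked first, while the randomly generated one receives a much lower score and could be dropped by a threshold.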

User’s behaviour prediction in eshop

Dung Lam Tuan
bachelor study, supervised by Michal Kompan

Abstract: The ability to predict a user’s behaviour is nowadays immensely important for many areas of our life. In the e-commerce domain, it is popular because it allows sellers to better understand customers and their behaviour.
In this work, we analyze different ways of predicting customer behaviour using machine learning. The goal was to create a model that makes it possible to tell whether a customer will make a purchase in an e-shop.
The result of this work is an evaluation of the accuracy of different prediction models and their comparison.

Providing feedback in the domain of programming

Tomáš Matlovič
master study, supervised by Jozef Tvarožek

Abstract: Feedback can be very valuable for students using Intelligent Tutoring Systems. It helps them by explaining mistakes, motivating them, and leading them to a correct solution. However, an Intelligent Tutoring System needs to know about the process of solving the problem before it can provide useful feedback. Modeling such a process is especially difficult in domains like programming, where it is not easy to decompose the problem into a sequence of independent steps. In our work, we analyze existing research on providing feedback in the domain of programming. Based on the results of this analysis, we propose and plan to implement a system for providing feedback. We created a dataset by performing dynamic analysis on solutions to programming exercises. Using this dataset, we will propose a method for clustering solutions that were produced by the same or a similar process. Individual clusters will provide useful feedback to teachers. We will also detect mistakes in correct solutions by comparing solutions within individual clusters, and in this way provide students with feedback on their solutions. Our system has the potential to make the use of Intelligent Tutoring Systems easier and more effective for both teachers and students by providing relevant feedback.

Reconstruction of text for Slovak language

Peter Ocelík
bachelor study, supervised by Marián Šimko

Abstract: Each of us has been in a situation where we found a mistake in our own text. Mistakes are not made only because of illiteracy; a little oversight is enough. If the mistake goes unnoticed, it can lead to a very unpleasant situation for the author, especially in a formal text such as a thesis or curriculum vitae. That is why there are text correctors that help prevent mistakes. The aim of this thesis is to design and implement a solution that offers the user a quick check of Slovak text in the Chrome web browser. The solution will be based on an existing tool called LanguageTool, which supports the Slovak language. In our tool, we will use a statistical method that can detect context-based mistakes.

Personalized recommendation considering visual influences

Marek Roštár
master study, supervised by Michal Kompan

Abstract: Recommender systems are a typical solution to users’ information overload. Recommendations created by these systems can contain the most popular items, but it has been found that personalizing recommendations for users is a better solution. The output of these systems typically takes the form of a list of items that we want to recommend to the current user. Items on this list are usually accompanied by graphical elements such as pictures. Unfortunately, the influence of these elements is disregarded when the recommendation is made. In this work, we explore possible approaches to recommender systems, the analysis of visual inputs from graphical elements, and the inclusion of these inputs in the process of creating recommendations. We are currently implementing a prototype and looking into different recommendation approaches and different ways of extracting visual influences from images.

User Segmentation for Personalization of Newsletters in CQA Systems

Matúš Salát
master study, supervised by Ivan Srba

Abstract: CQA systems are a common resource for sharing knowledge and obtaining information. Many questions and answers are created every day. One way to inform users about news in the system is a newsletter. Nevertheless, in a system such as Stack Overflow, newsletters are not yet personalized and show randomly generated content to every user. Newsletters are currently not very popular because their content has no value for most users.
We propose to create user segments that group users with common interests. For every group, a newsletter will be generated with specific content that depends on the group’s attributes. After that, we want to modify a specific section of the newsletter to personalize it for the individual user. Personalization of newsletters can help both new and established users find the right content. We want to evaluate the proposed methods on available data from an existing CQA system.

Recommendation Taking into Account the Time Aspects of Users and Items

Elena Štefancová
master study, supervised by Ivan Srba

Abstract: The task of recommender systems is to predict a user’s interest in new items (e.g., e-commerce products) based on the user’s previous activity. Most systems, however, do not take into account contextual aspects such as time, place, or the company of other people. Time aspects in particular can play an important role in recommendation. In the case of a user, time may change their preferences. In the case of an item, the relevant aspect may be its recency (for example, when creating a weekly bulletin) or its regularity (for example, when recommending seasonal items).
We analyze existing systems that take time aspects into account and then design our own recommender system that considers the time aspects of both users and recommended items to create more accurate recommendations. The proposed solution will be experimentally verified in a selected domain.

Analysis of source code reading

Matúš Tundér
bachelor study, supervised by Jozef Tvarožek

Abstract: The main aim of the thesis is to determine how the order of functions in source code affects program comprehension, as well as the way of reading itself. We have formulated several hypotheses, which will be accepted or rejected based on data obtained from experiments. In the experiments, programmers read two types of source code: one where the main function is at the top and one where the main function is at the bottom. The experiments take place in the UX lab at the faculty. The effect of the function layout is determined by recording gaze data while the source code is being read. To make the programmers study the whole source code, they were given the task of finding errors in it.
