Students’ Research Works – Autumn 2015

Data Analysis, Mining and Machine Learning

Usability and User Experience

User Modelling and Implicit Feedback

Recommenders

Semantics Acquisition and Domain Modeling

Text Processing and Search


Doctoral Staff


  • Mária Bieliková: web personalization, user/user groups and contexts modelling, usability and user experience (and HCI in general)
  • Michal Barla: user modeling, implicit user feedback, virtual communities, collaborative surfing
  • Jozef Tvarožek: social intelligent learning, collaborative learning, semantic text analysis, natural language processing
  • Marián Šimko: domain modelling, ontologies, folksonomies, semantic text analysis, Web-based Learning 2.0
  • Jakub Šimko: crowdsourcing, games with a purpose, semantics acquisition, Web-based learning
  • Michal Kompan: single user and group recommendation, satisfaction modeling
  • Tomáš Kramár: user modelling, personalized web search



ModerateIT – Moderate using IT (Imagine Cup)


Jakub Adam, Monika Filipčiková, Andrej Švec, Filip Vozár
bachelor study, supervised by Jakub Šimko

Abstract. As the Internet grows, people have ever more opportunities to connect through a wide range of devices: computers, smartphones and tablets. The freedom of communication is losing its borders. People enjoy a sense of confidentiality and thus become more open. Sometimes they are so open that they behave impolitely and rudely to others. Communication and user content on the Web have become largely unregulated. Comments and open discussions easily turn into places of hate and offensive behaviour: vulgarisms, ad hominem attacks, offensive words and many other forms of malicious behaviour. All this leads to discussions being closed, to companies paying extra money to manage communication on their websites, or to the social degradation of people. We think that communication has to be moderated, not removed.

Our goal is to rid online discussions of poisonous and impolite comments. Dealing with this issue costs companies, such as news portals, considerable sums of money. Our automated solution helps human moderators quickly detect problematic comments and keep the discussion focused on a given topic. It will also prevent unnecessary conflicts between discussion participants.

Since many aspects can describe a single comment, we will use multiple detectors, each rating a comment based on one particular aspect, and then join their outputs together. This will result in a single number describing the likelihood that a comment is inappropriate. Examples of such detectors include: measuring the relatedness of a comment to the article using RAKE (Rapid Automatic Keyword Extraction), TF-IDF and Elasticsearch; detecting swear words; and estimating the likelihood of an inappropriate comment based on the author’s past behaviour. We will also use statistics and machine learning to help us tune thresholds and parameters for specific detectors.
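As a rough illustration of the intended detector fusion, the sketch below combines two toy detectors (a swear-word ratio and a naive off-topic check) with a logistic-regression combiner. The detector logic, word lists and training labels are invented for the example and are not part of the project.

```python
# Hypothetical sketch of detector fusion: each detector scores one aspect
# of a comment in [0, 1]; a logistic-regression combiner turns the scores
# into a single probability that the comment is inappropriate.
import numpy as np
from sklearn.linear_model import LogisticRegression

def swear_word_score(comment: str) -> float:
    """Fraction of tokens found in a (tiny, illustrative) swear-word list."""
    swears = {"idiot", "stupid"}
    tokens = comment.lower().split()
    return sum(t in swears for t in tokens) / max(len(tokens), 1)

def off_topic_score(comment: str, article_keywords: set) -> float:
    """1.0 when the comment shares no keywords with the article."""
    tokens = set(comment.lower().split())
    return 0.0 if tokens & article_keywords else 1.0

# Feature matrix: one row per labelled comment, one column per detector.
X = np.array([[swear_word_score(c), off_topic_score(c, {"election"})]
              for c in ["you idiot", "the election results are in"]])
y = np.array([1, 0])  # 1 = inappropriate (moderator label)

combiner = LogisticRegression().fit(X, y)
print(combiner.predict_proba(X)[:, 1])  # probability of being inappropriate
```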

Usability of Information Visualizations in the Process of Knowledge Discovery


Adam Bacho
master study, supervised by Róbert Móro

Abstract. At a time when more and more people, both experts and lay users, are interested in data mining and its subsequent analysis, it is necessary to develop techniques for its simplified representation. Decision trees are among the most popular machine learning methods in the process of knowledge discovery, but their visualization can be hard to interpret due to the large number of nodes and rules. This complexity increases even further when the random forest method is used instead, where trees are typically combined into ensembles of tens to hundreds. Similar problems also occur with other, less intuitive prediction models.

In this work, we explore and analyze the limitations of existing approaches to visualizing various models in the process of knowledge discovery. Our goal is to propose a visualization method for one of the investigated models that will address the shortcomings of the existing approaches, such as poor interpretability of the models and of their visualizations (especially for large data sets) or the learnability of these visualizations. We plan to verify the proposed method in a user study and to explore the impact of the experience and cognitive characteristics of different groups of users on their ability to work effectively with the proposed visualization. We also consider automating the process of visualization adaptation for different groups of users.

User Experience on the Web

Veronika Balážová
bachelor study, supervised by Róbert Móro

Abstract. Nowadays, when almost everything can be found on the Web, it is important for websites to be as usable and user-friendly as possible, and to let users easily find what they need. In addition, it is now true more than ever before that “the competition is just one click away”. If a website does not give users exactly what they need, they leave the page and look for the information elsewhere, and the experience leaves them dissatisfied, angry, or frustrated.

A website’s user interface determines how users will use the site – whether they reach their goals effectively, less effectively, or not at all. This may also depend on what kind of person the user is.

In our work, we focus on user experience on the Web, especially in the banking domain. Firstly, we conduct a formative study in order to test the usability of a website, mainly its navigation. The main goal is to identify usability issues in the interface and then propose improvements. Secondly, we plan to carry out a summative study comparing the original version of the website with a version addressing the shortcomings found in the formative study, using appropriate metrics. At the same time, our goal in the summative study is to verify whether there is a relationship between the number of detected errors and users’ personality types (level of openness to experience, neuroticism, extraversion, etc., based on the standardized Big Five test).

Search engine keyword prediction based on user need derived from eye tracking


Jozef Balún
bachelor study, supervised by Eduard Kuric

Abstract. Today there is a saying: it is not necessary to know everything, you just need to know how and where to find it. It confirms that searching for good information is a key human ability.

In this project we introduce an improvement to query prediction in search engines. Suggesting queries plays an important role in search tools. Much research has focused on improving query estimation based on mouse movement and clicking, scrolling, and time spent reading a document. Our method, however, aims to improve prediction by deriving queries directly from the user’s current gaze, so that suggestions can be offered in real time. The user’s gaze can be tracked by an eye tracker, which reports eye fixations in real time. Based on the collected fixations, our method predicts the user’s intention and suggests queries derived from the context of the material the user is reading. These queries can then be offered directly as suggestions for the user’s next search.
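A minimal sketch of one possible fixation-to-query step is shown below: each word receives the total fixation duration it attracted, and the top-weighted words become keyword candidates. The fixation format and screen layout are assumptions for illustration only.

```python
# Illustrative sketch (not the project's actual pipeline): turn eye-tracker
# fixations into query keyword candidates by weighting each fixated word
# with the total fixation duration it received.
from collections import defaultdict

def suggest_keywords(fixations, word_boxes, top_k=3):
    """fixations: list of (x, y, duration_ms); word_boxes: word -> (x0, y0, x1, y1)."""
    weight = defaultdict(float)
    for x, y, dur in fixations:
        for word, (x0, y0, x1, y1) in word_boxes.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                weight[word] += dur
    ranked = sorted(weight, key=weight.get, reverse=True)
    return ranked[:top_k]

# Hypothetical screen layout and fixation stream.
boxes = {"eye": (0, 0, 50, 20), "tracking": (55, 0, 130, 20), "the": (135, 0, 160, 20)}
fixes = [(20, 10, 300), (90, 10, 450), (95, 12, 200)]
print(suggest_keywords(fixes, boxes))  # ['tracking', 'eye']
```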

Text Documents Clustering

Peter Belai
bachelor study, supervised by Michal Barla

Abstract. People encounter great amounts of text every day, be it in newspapers, books, or on the Internet, and can easily be overwhelmed by it. An ideal solution to this problem is to group these texts, or documents, according to a selected keyword and subsequently choose only those that interest us. The Slovak language, like many others, is rich in words that share the same spelling but mean different things. These words are called homonyms. There are a number of approaches to this problem, but almost none of them has been applied to Slovak.

The main goal of this bachelor thesis is to provide a solution to the problem of discovering the individual senses of homonyms in text, using techniques of word sense induction. In the first phase of this work, different algorithms that address this problem are introduced and one of them is chosen. The second phase consists of designing a solution to this problem; finally, this solution will be implemented as a web application.
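To make the word sense induction idea concrete, the following toy sketch clusters the contexts of an ambiguous word (English examples for readability; a Slovak solution would follow the same pattern with language-specific preprocessing):

```python
# A minimal word-sense-induction sketch: represent each occurrence of an
# ambiguous word by its surrounding context, then cluster the contexts;
# each cluster approximates one induced sense.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

contexts = [
    "deposited money at the bank before noon",      # financial sense
    "the bank approved the loan application",       # financial sense
    "fishing on the bank of the river",             # river sense
    "the river bank was muddy after the rain",      # river sense
]
X = TfidfVectorizer(stop_words="english").fit_transform(contexts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # occurrences with the same label share an induced sense
```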

Analyzing Problem Solving in Education

Zuzana Beníčková
bachelor study, supervised by Jozef Tvarožek

Abstract. Modern technology enables us to record numerous inputs during interaction with applications. Analysis of educational interactions can bring new insights into various fields of knowledge, such as cognitive psychology, user experience, and neuroscience. This research focuses on measuring the capacity of working memory with a memory game based on math problems. Given that working memory plays a crucial role in reasoning, comprehension, learning, and memory updating, its analysis can provide a valuable source of information for professors about their students, for experiment leaders about their participants, etc. There is currently only one freely available test of working memory of this type. The aim of this research is to develop a more exciting, gamified, educational, and shorter game that measures working memory.

The math memory game is a memory game in which pairs consist of a mathematical problem and its solution. I designed four different versions of the memory game which load working memory in different ways and at different levels. Subsequently, the most accurate version was chosen according to the extent to which it correlates with results from the operation span task. The operation span task is a proven method of measuring working memory, and this research attempts to verify the hypothesis by comparing the results from both tests. The analysis of my results will focus on the gaze patterns of respondents with different working memory capacities, and on fixation lengths during tasks with different working memory loads.
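The validation step described above boils down to a correlation test; a minimal sketch with invented per-participant scores might look as follows:

```python
# Hedged sketch of the validation step: correlate scores from one
# memory-game variant with operation span scores (made-up data).
from scipy.stats import pearsonr

game_scores = [12, 15, 9, 18, 14, 11, 16, 10]    # hypothetical per-participant scores
ospan_scores = [38, 45, 30, 52, 41, 35, 47, 33]  # hypothetical operation span results

r, p = pearsonr(game_scores, ospan_scores)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")  # a high r supports the game variant
```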

Mind-Controlled Application

Patrik Berger
bachelor study, supervised by Róbert Móro

Abstract. Electroencephalography (EEG) is a relatively new technique that records electric potentials from the brain. It is mainly used for medical purposes. Recently, several portable and affordable EEG headsets have been developed, which has sparked a lot of interest in the commercial use of EEG as a brain-computer interface (BCI). One of the problems of EEG, especially with commercial devices, is a low signal-to-noise ratio (SNR), which has many causes: biological artifacts such as eye blinks and muscle movements, low hardware quality, and, last but not least, the high complexity of the brain signal itself.

Our goal is to implement a successful signal processing and classification algorithm. We use a standard dataset from a BCI competition for its preliminary evaluation and prototype validation. After that, we plan to conduct an experiment in which an Emotiv EPOC device will be used to collect data from participants using a P300 speller application. We will use the collected data to find out how accurate the proposed algorithm is and, furthermore, whether the data from the Emotiv EPOC is good enough for detecting the P300.
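A common P300 baseline, and one plausible shape for the planned algorithm, is band-pass filtering followed by linear discriminant analysis. The sketch below runs this pipeline on synthetic epochs; it is illustrative only and makes no claims about the actual competition data:

```python
# Illustrative P300 pipeline sketch (synthetic data, not the competition set):
# band-pass filter single-trial epochs, then classify target vs. non-target
# epochs with linear discriminant analysis, a common P300 baseline.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

fs = 128  # sampling rate in Hz
rng = np.random.default_rng(0)

def make_epoch(target: bool) -> np.ndarray:
    """One second of one-channel EEG; targets get a P300-like bump at ~300 ms."""
    t = np.arange(fs) / fs
    signal = rng.normal(0, 1, fs)
    if target:
        signal += 3 * np.exp(-((t - 0.3) ** 2) / 0.002)
    return signal

b, a = butter(4, [1, 12], btype="band", fs=fs)  # 1-12 Hz band-pass
epochs = [make_epoch(i % 2 == 0) for i in range(200)]
X = np.array([filtfilt(b, a, e) for e in epochs])
y = np.array([i % 2 == 0 for i in range(200)], dtype=int)

clf = LinearDiscriminantAnalysis().fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```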

Analyzing Problem Solving in Education

Peter Bobovský
bachelor study, supervised by Jozef Tvarožek

Abstract. Gamification has been used to stimulate students’ motivation for a while now. Using gamification as an introduction to programming and algorithmization is an important step in teaching these subjects. In this project we study which specific gamification and motivation elements will keep players engaged and drive them to complete more problems.

Our goal is to distinguish which elements boost the player’s motivation and which diminish it. These elements range from simple rewards and time limits to scoring and social elements. With web applications now being the standard, this project will be accessible through a web browser and easily usable even by unskilled students.

Detection of anti-social behavior in online communities

Martin Borák
master study, supervised by Ivan Srba

Abstract. Lately, online communities have been gaining importance and popularity on the Web, mainly in places such as social networks, CQA systems, online games, and news or entertainment portals. Immediate communication with an unlimited number of people on an enormous number of topics has become a part of everyday life for hundreds of millions of people around the world.

Considering the huge number of members of these communities, the content of such communication is often rather diverse. There are often users who try to disrupt these communities, whether by posting pointless messages, sharing links to irrelevant sites, using uncalled-for sarcasm, or through outright aggressive behaviour and rude verbal attacks. The most notable type of such users are so-called trolls, who at first pretend to be regular members of the community, but then try to disrupt it by annoying people and starting arguments. Such behaviour degrades the quality of discussion, discouraging other users from reading and contributing to it and, inevitably, from visiting the portal. It can also give rise to legal issues.

In our work we will focus on the analysis of antisocial behaviour on the Web and on the automatic detection of trolls and their posts on portals that are home to online communities. One possible candidate is YouTube, which mediates multimedia content and is known for a high concentration of trolls in the discussion sections of videos.

Source code similarity

Juraj Brilla
bachelor study, supervised by Michal Kompan

Abstract. Nowadays, people have many possibilities to share information and acquire knowledge from the Internet. Because of the availability of information on the Web, it is easier to plagiarize a document or source code. This is the reason for creating systems which detect plagiarism and draw attention to it.

Several methods to detect plagiarism in source code exist today. In this bachelor thesis we analyze the possible changes to code, which vary with the skill of the plagiarist. We split the detection of plagiarism in source code into two levels. The first level uses an abstract syntax tree, which decomposes the source code into nodes that together form a syntax tree. The second level is represented by an n-gram method, which relies on comparing small parts of the code.

These methods are complemented by an algorithm which compares the processed data. The program will work with source code written in the C, Java and C# programming languages and will be implemented in Java.
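As a sketch of the second (n-gram) level, the snippet below compares token trigrams of two code fragments with Jaccard similarity; the fragments are invented and any thresholding is left out:

```python
# A small sketch of the n-gram level (illustrative): tokenize two code
# fragments, build token n-grams, and compare them with Jaccard similarity.
import re

def token_ngrams(code: str, n: int = 3) -> set:
    tokens = re.findall(r"[A-Za-z_]\w*|\S", code)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

original = "int sum = 0; for (int i = 0; i < n; i++) sum += a[i];"
suspect  = "int total = 0; for (int j = 0; j < n; j++) total += a[j];"
# Renamed identifiers lower the raw score; the AST level is meant to catch that.
print(jaccard(token_ngrams(original), token_ngrams(suspect)))
```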

Pupil dilatation and stress in user studies

Matej Červenka
bachelor study, supervised by Mgr. Martin Krupa

Abstract. Analysis of user testing records consumes much of the study moderator’s time. If we want to find a problem in a user interface, it is necessary to watch the whole recording of the test and pay attention to all its outputs.

Forms are one of the most problematic features of a user interface. We assume that problematic forms cause stress, cognitive overload and negative emotions in users. These phenomena are manifested by mydriasis (pupil dilation), which we are able to measure with an eye tracker. Our primary metric is pupil size. Additionally, we will monitor emotions (Noldus FaceReader) and skin conductance (GSR).

If we find a correlation between mydriasis and the stress caused by problematic areas of forms, we will be able to determine the times at which the participant had a problem and pinpoint where exactly in the form the problematic areas are (by using AOIs). The goal is to develop a tool which determines the times of problematic areas and saves the moderator the time he would otherwise spend finding them.
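One simple way to obtain candidate “problem times”, assuming a clean baseline period, is to flag time windows whose mean pupil size deviates from the baseline by more than a chosen z-score; the sketch below uses simulated data:

```python
# Illustrative sketch of flagging stressful moments: compare pupil size in
# each time window against a baseline period and report windows whose mean
# exceeds the baseline by a chosen threshold (all values are made up).
import numpy as np

def flag_windows(pupil_mm, fs, baseline_s=5, win_s=2, z_thresh=2.0):
    """pupil_mm: pupil diameter samples; fs: samples per second."""
    baseline = pupil_mm[: baseline_s * fs]
    mu, sigma = baseline.mean(), baseline.std()
    flagged = []
    win = win_s * fs
    for start in range(baseline_s * fs, len(pupil_mm) - win, win):
        z = (pupil_mm[start:start + win].mean() - mu) / sigma
        if z > z_thresh:
            flagged.append(start / fs)  # time in seconds
    return flagged

rng = np.random.default_rng(1)
signal = rng.normal(3.5, 0.1, 60 * 30)  # 60 s at 30 Hz, ~3.5 mm pupil
signal[900:1050] += 0.5                  # simulated dilation around t = 30 s
print(flag_windows(signal, fs=30))       # flags windows around t = 29-33 s
```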

Automatic Text-Checking for Slovak Language

Ondrej Čičkán
bachelor study, supervised by Marián Šimko

Abstract. We encounter text-checking in almost every word processor, web browser and many other applications. Late detection of spelling mistakes in a curriculum vitae, book or thesis can be unpleasant for the author. The function of a text-checking tool is to automatically detect these errors and propose corrections. It may also be useful in other programs which require that the input text be written correctly.

Our goal is to offer a tool that automatically checks text in the Slovak language and detects the largest possible percentage of errors. We decided to use a statistical method based on a language model and an error model. These models help us choose the correct word from a list of suggested corrections for a misspelled word. This method also allows us to correct real-word errors.

Our solution is based on an existing text-checking tool developed at Charles University in Prague by Michal Richter. We will have to create a language model and an error model for Slovak. The success of our solution will be tested on our own data and compared to the results achieved by existing tools for automatic text-checking of Slovak.
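The language-model/error-model combination is essentially a noisy-channel decoder. A toy sketch of the candidate-scoring step (English words and invented probabilities, purely for illustration):

```python
# Noisy-channel correction sketch: score each candidate correction by
# P(word) * P(typo | word) and pick the best-scoring one.
language_model = {"there": 0.004, "three": 0.002, "the": 0.02}  # P(word)
error_model = {                                                  # P(typo | word)
    ("thre", "there"): 0.3,
    ("thre", "three"): 0.4,
    ("thre", "the"): 0.05,
}

def correct(typo: str) -> str:
    candidates = [w for (t, w) in error_model if t == typo]
    return max(candidates, key=lambda w: language_model[w] * error_model[(typo, w)])

print(correct("thre"))  # 'there': 0.004*0.3 = 0.0012 beats 0.002*0.4 = 0.0008
```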

Stream analysis of incoming events using different data analysis methods


Matúš Cimerman
master study, supervised by Jakub Ševcech

Abstract. Nowadays we can see an emerging need for analyzing data as they arrive. Processing and analysis of data streams is a complex task: above all, we need to provide a low-latency and fault-tolerant solution.

In our work we focus on proposing a set of tools which will help domain experts in the process of data analysis. A domain expert does not need to have detailed knowledge of analytical models. A similar approach is popular when analyzing static collections, e.g. funnel analysis. We study the possibility of using well-known methods for static data analysis in the domain of data stream analysis. Our goal is to apply such a method in the domain of data streams, with a focus on simplicity of use and interpretability of results. Meeting these requirements is essential for domain experts, because they will then not need detailed knowledge of domains such as machine learning or statistics. We evaluate our solution using a software component implementing the chosen method.

Predicting Interest in Information Sources on the Internet using Machine Learning

Martin Číž
master study, supervised by Michal Barla

Abstract. The most important goal of each content publisher on the Web is to capture the reader’s interest, so that the reader becomes a returning customer. Although it is useful to evaluate previously published articles, there is an opportunity to estimate an article’s potential to become popular before it is even published, or shortly after. Many attributes may decide whether an article has such potential, including its title, content, author, source, topic, freshness and credibility.

To find out more about the content we use topic modeling, a method for retrieving a set of topics from text documents. A special case of topic modeling in which we see how topics evolve over time is called dynamic topic modeling. The created model can then be correlated with an article’s visits over time, which could reveal interesting patterns in the popularity of the article’s topics.

To predict the popularity of an article based on its attributes, we will use machine learning regression.
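A hedged sketch of this pipeline: derive per-article topic proportions with LDA and fit a regression from topics to visit counts. The corpus, the counts and the model choice are all illustrative assumptions:

```python
# Toy topic-features-to-popularity pipeline: LDA topic proportions as
# regression inputs, made-up page-view counts as the target.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LinearRegression

articles = [
    "election results and government coalition talks",
    "football league final score and champions",
    "new government budget and parliament vote",
    "hockey team wins the championship final",
]
visits = [5400, 9100, 4800, 8700]  # hypothetical page views

counts = CountVectorizer().fit_transform(articles)
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)

model = LinearRegression().fit(topics, visits)
print(model.predict(topics[:1]))  # predicted visits for the first article
```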

Evaluation of user experience by eye tracking and emotion analysis

Mária Dragúňová
bachelor study, supervised by Mária Bieliková

Abstract. Measuring target findability on a web page is often used to evaluate the usability of web page designs. Eye tracking provides several measures, such as the time or the number of fixations prior to the first fixation on the target. However, people differ, and therefore so do the measured values. Our work builds on the natural diversity of human visual search abilities, a subject that attracts many psychologists around the world. We will evaluate participants’ visual search ability not only with standard visual search tests, but also with tests we have developed ourselves, which contain typical icons from the web environment. This set of tests will be created according to the results of an experiment, choosing stimuli with significant variance in individuals’ response times.

We will evaluate our solution by computing the correlation between our assessment of visual search ability and the number of fixations prior to selecting the target element in a search task on a chosen web page. If the correlation is confirmed, we will be able to enhance the target findability measure by assigning a weight to each participant in a user study according to his visual search ability. A participant with poor visual search ability is expected to reach worse results in search tasks, so his worse result does not necessarily indicate a problem in the user interface, but merely a poorer human ability.

Learning representations of video for generating its description

Patrik Gajdošík
master study, supervised by Márius Šajgalík

Abstract. Nowadays, the task of classifying static pictures is handled by convolutional neural networks (CNNs), which achieve very good precision. The newest approaches are capable of describing an image in natural language, with output not restricted to just a few phrases. The area of video classification is also quite active. Here we can also use CNNs, extended with additional modules that take into account the extra information videos offer compared to static images, such as motion and audio. By processing this information, we can achieve more precise results. Some work has already been done on the classification of short videos using different neural architectures. However, that work usually focused on simple categorization and often omitted some of the semantic information that video offers.

Our goal is to build a neural network architecture (a module) that combines the work already done in this field, to find out which parts of the semantic information in videos could be used to improve video classification results. We also aim to focus on categorizing longer videos, not just the short clips that have been the main point of interest in work done so far. Additionally, we would like to generate descriptions for each part of a video, whether as simple keywords or as natural language descriptions.

Automatic Computation of Textual Similarity

Ladislav Gallay
master study, supervised by Marián Šimko

Abstract. A great deal of information useful to people is stored in natural language in vast collections of documents. Processing natural language and transforming the information into a form machines can understand, making work with the information more advanced and efficient, is a demanding process. That is mainly because of the weak formal structure and informality of natural language. Automated detection of the meaning of words and sentences is the key to machine understanding of language and to improving many subtasks, such as recommendation, retrieval, enrichment and personalization of web content.

In our approach, we explore the possibility of automatic comparison of texts in a web environment based on the analysis of textual similarity. We analyse existing approaches and their combinations, and apply the results in the domain of community question answering systems. Our goal is to design a method for determining the semantic similarity between questions and answers, and between the questions themselves. Given a question and a list of pre-filtered similar questions, the algorithm ranks the questions by their relevance to the original question. Similarly, given a question and a list of answers, the algorithm ranks the relevance of the answers. We aim to help users quickly find answers to questions using methods more advanced than the standard bag-of-words, such as word vector models and neural networks. Finally, the method is tested using real data from the Stack Overflow and Askalot systems.
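For illustration, the snippet below ranks answers by the cosine similarity of averaged word vectors; the vectors are tiny hand-made stand-ins for embeddings that would be trained on a real corpus:

```python
# Toy sketch of word-vector ranking: embed each text as the average of its
# word vectors, then rank answers by cosine similarity to the question.
import numpy as np

vectors = {
    "sort":  np.array([0.9, 0.1, 0.0]), "order": np.array([0.8, 0.2, 0.1]),
    "list":  np.array([0.7, 0.0, 0.2]), "cook":  np.array([0.0, 0.9, 0.3]),
    "pasta": np.array([0.1, 0.8, 0.4]),
}

def embed(text: str) -> np.ndarray:
    vecs = [vectors[w] for w in text.split() if w in vectors]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

question = "sort list"
answers = ["order list", "cook pasta"]
ranked = sorted(answers, key=lambda a: cosine(embed(question), embed(a)), reverse=True)
print(ranked)  # ['order list', 'cook pasta']
```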

Linking Multimedia Metadata by Using Microblogging Network

Peter Gašpar
master study, supervised by Jakub Šimko

Abstract. With the huge growth in usage of the Web, information has become an even more important part of people’s lives. Whether we are interested in a picture, a video, a document, or a status on a social networking service (SNS), there is always an intention to discover something new. Many researchers are trying to find the best way to characterize information. In our study we focus on metadata in the domain of multimedia and television.

A big potential for building a metadata database is hidden in SNSs. In past years they have become an irreplaceable companion on the Web for most people. They provide nearly unlimited space to spread ideas and opinions. Moreover, many television companies use them to promote their programmes with articles and backstage photographs and videos. SNSs are also one of the most straightforward ways to get in touch with a TV audience. People’s activity on public statuses creates an opportunity to reveal other interesting content.

In our approach, we propose an innovative method to interlink TV and SNSs. Our main goal is to extract and capture metadata from the available content shared by the audience and TV companies. These metadata are meant to enrich existing databases and offer an attractive extension to a huge audience. In our research we analyze the Facebook Pages of popular Slovak and worldwide TV channels.

Automatic reconstruction of Slovak texts using context

Jakub Gedera
master study, supervised by Marián Šimko

Abstract. Every one of us is an active user of the Internet. Many people enrich the Internet with their experiences by writing blogs and comments, and they have adapted to the culture that rules it. Communication is informal and differs significantly from formal writing. Processing such informal texts is difficult and can be a problem.

In this diploma thesis we plan to create a context-sensitive spell-checker for the Slovak language. Slovak spell-checkers already exist, but most of them do not use context. A traditional spell-checker corrects you when a word is missing from the dictionary. But what if a word is in the dictionary and is still incorrect in its context? We also plan to deal with transforming informal text into a more formal form that is easier for computer processing. Both problems require various models used in natural language processing. We focus on Slovak, so we will probably have to train our own models due to the low level of support for the Slovak language.

Support of student activity in an e-learning system

Veronika Gondová
bachelor study, supervised by Mária Bieliková

Abstract. Motivation is one of the most important factors affecting a person’s performance. A positively motivated person can organize their time more effectively, concentrate on the most important activities, and take their duties more seriously. Students’ motivation in web-based educational systems is particularly important.

In recent years the concept of motivation has often been associated with games. The main objective of a game is to keep the player’s attention and engagement, and for this reason games use a variety of motivational elements. Gamification is the concept of applying game mechanics and game design techniques to motivate people to achieve their goals in non-game contexts. Gamification has found application in the domain of education, which is the theme of our work. The aim of our work is to support student activity in a web-based educational system.

We will motivate students through game mechanics that complement the educational system ALEF. The new generation of ALEF is an adaptive learning framework that provides students with a number of questions for each week of the term. We support game principles through levels, which represent sets of questions. One of these principles states that levels should proceed from simpler to more complex. Levels in ALEF implement this principle via personalized recommendation and item response theory.
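For concreteness, the 1PL (Rasch) model from item response theory gives the probability of a correct answer as a function of student ability and question difficulty; the sketch below orders questions for a hypothetical student (all parameter values are invented, not ALEF’s actual parameters):

```python
# Minimal item-response-theory sketch (Rasch / 1PL model), illustrating how
# question difficulty estimates can drive easy-to-hard level ordering.
import math

def p_correct(ability: float, difficulty: float) -> float:
    """Rasch model: probability that a student answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

student_ability = 0.2
questions = {"q1": -1.0, "q2": 0.0, "q3": 1.5}  # difficulty estimates

# Order questions from easiest to hardest for this student, as a level would.
for q, diff in sorted(questions.items(), key=lambda kv: kv[1]):
    print(q, f"P(correct) = {p_correct(student_ability, diff):.2f}")
```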

User Behavior in the Digital Space of the Web

Patrik Hlaváč
doctoral study, supervised by Mária Bieliková

Abstract. The current state of the informatization of the world and the presence of information technology in almost every area of our lives requires users to adapt to different environments – system interfaces. The ability to work with such an environment depends on the time and duration of interaction, on functionality, and also on information architecture. It also reflects differences in the information behaviour of users, which is very individual and differs depending on experience, knowledge, goals, location and social context. We focus on the evaluation of user sessions, mainly on individual participant differences, in a quantitative study.

The topic of user behaviour on a site is studied by a number of disciplines. The relations and interactions between human and computer are addressed by the discipline of HCI, which is closely related to user experience (UX) and the user interface (UI). When identifying methods and metrics, we draw on knowledge from these disciplines.

Michal Holub
doctoral study, supervised by Mária Bieliková

Abstract. The idea of the Semantic Web presumes structured, machine-readable data published on the Web in one of the open standards. This would allow the emergence of new applications which could leverage these data to automate actions for humans. Presently, such data are already being created, published and linked together, thus forming the Linked Data cloud, or the Web of Data. Currently, there are a few hundred interconnected datasets covering a wide range of domains (e.g. music, libraries, biology).

However, the number of links between entities is orders of magnitude smaller than the number of published entities, so there are still broad opportunities for research into methods for automatic link discovery. Moreover, the utilization of these data is far from advanced. There are numerous possibilities for improving the precision of current recommender systems, as well as search and personalization algorithms, by utilizing Linked Data.

In our research we focus on discovering relationships between entities published in the Linked Data cloud and using these relationships to aid adaptive web-based applications. Mainly, we are interested in finding similar and identical entities, either within one dataset or across several datasets, and linking them together. This method has a variety of uses: 1) in deduplication algorithms (for data cleaning and processing tasks), 2) in similarity detection (for search and recommendation tasks), and 3) in data enrichment and integration tasks.
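A deliberately naive sketch of the candidate-matching step: compare entity labels across two datasets with Jaccard similarity over character trigrams. Real link-discovery methods use much richer evidence; the entity names below are just examples:

```python
# Naive cross-dataset entity matching: character-trigram Jaccard similarity
# between labels proposes candidate identical entities.
def trigrams(s: str) -> set:
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

dataset_a = ["The Beatles", "Antonín Dvořák"]
dataset_b = ["Beatles, The", "Antonin Dvorak", "Johann Sebastian Bach"]

for ea in dataset_a:
    best = max(dataset_b, key=lambda eb: jaccard(trigrams(ea), trigrams(eb)))
    print(f"{ea!r} -> candidate match {best!r}")
```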

Supporting Online Student Communities by Utilization of Questions and Answers Archives

Adrián Huňa
master study, supervised by Ivan Srba

Abstract. In recent years, many online student communities have emerged. Thanks to the rising popularity of Massive Open Online Courses (MOOCs), these communities tend to be large and exhibit specific dynamics. Students in these communities usually have access to communication tools which should help them with the learning process. MOOC providers offer discussion forums that are used to ask questions about the learning material, to discuss interesting topics, or as a place to introduce oneself. This is a great help for students; however, because of the large community, course instructors and teaching assistants do not have enough time to answer all student questions.

The question answering process in online student communities has some specific features. Firstly, questions usually recur periodically, as the course repeats over time. Secondly, students’ knowledge is limited in comparison to course instructors’, although it improves as they advance in the course. These specifics provide a means to support these communities. Archives of questions and answers offer a way towards automatic question answering, as students’ problems tend to repeat. Moreover, the matching between a new question and answers from the past can be improved by utilizing specific user features, such as the role of the user (i.e. student, instructor) or the student’s performance in the course assignments and exams. With automatic question answering, course instructors will have more time to address the more difficult and complex student problems that usually cannot be answered by peers.

Analysis of User Activities in Web Browser

Mário Hunka
bachelor study, supervised by Martin Labaj

Abstract. Web browsers are today’s access points through which users browse Web content. They offer the user many functionalities; the one we are concerned with is parallel browsing. Since it was added to web browsers, it has changed the way we browse the Web. Many actions can be accomplished in different ways, and many questions can be asked: where is the user looking? How does he switch between tabs? Does he use tabs more than multiple windows?

Many studies have investigated tabbed browsing. Furthermore, we have datasets from our faculty available, which can be used as well. We analyze these studies and datasets to learn from their results and draw inspiration for our own experiment. In our thesis, we aim to design and carry out an experiment in the UX lab with a certain group of people who will perform a specific task.

Finally, we propose an analysis of this work, which should lead to a better understanding of parallel browsing behaviour. This can be achieved by interviewing the respondents, which can support and clarify the results derived from the data, or by comparing those results with existing ones and finding similarities and differences. From the conclusions we should be able to suggest relevant improvements for future web browsers.

Analysis of User Activities in Web Browser

Miroslav Hurajt
bachelor study, supervised by Martin Labaj

Abstract. Web use is a recurrent activity, and a key aspect of the use of web browsers is revisitation mechanisms and the page revisitation associated with them. In the past, many studies focused on gathering log data about revisitation mechanisms and page revisitation. There are also various external datasets, as well as datasets from our faculty, where we can find basic data. However, these data do not provide a closer view of what behaviour preceded the use of revisitation mechanisms and page revisitation.

In our thesis, we offer an analysis of the work of a group of users by monitoring their view and areas of interest on websites and in parts of the web browser. Further, we carry out an experiment analyzing ordinary work in the web browser by a group of participants representing a programming user group, for which we have proposed a method of simulating ordinary work in a web browser. The experiment will offer a detailed view and a closer analysis of behaviour in short-term page revisitation and the usage of revisitation mechanisms.

The main goal of this thesis is to perform a quality analysis leading to possible implementation improvements, or to a basis for a better understanding of the web browser behaviour of users representing a programming group.

Evaluating Web Application Usability Through Gaze Tracking

Martin Janík
master study, supervised by Mária Bieliková

Abstract. Usability of an application, also known as quality of use, is a feature which can fundamentally influence the success of an interactive application. Evaluation of usability depends on the type of application and on the person who uses it. For web applications, we often do not know the set of their users. The only thing we may know are specific users of each end group, but they usually represent an open and sometimes dynamically changing community. By researching the implicit feedback gained from user-application interaction, we can evaluate usability. For example, we can detect adverse behaviour, resolve it and improve the use of the application.

Basic usability testing provides a sufficient amount of data to help us evaluate the design of an application. Gaze tracking brings a new aspect to evaluating usability: it offers information about which objects attract attention and why. By following the order in which objects are gazed at, we can tell how users search through web applications, forming specific gaze patterns. In the book “How to Conduct Eyetracking Studies”, Jakob Nielsen and Kara Pernice claim that using heatmaps to analyze gaze tracking data requires 30 users per heatmap, which makes gaze tracking one of the most expensive research methods.

We aim to create a method for usability testing in a specific application domain, such as e-learning systems or content management systems, on the basis of implicit feedback, particularly gaze tracking. We want to investigate the possibility of generalizing usability tests for web applications from the same domain. Our goal is also to answer the question: “Is it possible to create a method which reduces the number of users needed for usability testing while preserving the value of the acquired data for evaluation?”

Search engine keyword prediction based on user need derived from eye tracking

Patrik Januška
bachelor study, supervised by Eduard Kuric

Abstract. The Internet has become an integral part of everyday human life, and millions of users interact with various search engines on a daily basis. A user searches web pages for required information. Queries characterizing the wanted information are entered into the search interface, and the search engine then returns a list of relevant pages from its own index. Users visit these pages, spend some time on them, click on ads, modify queries, and perform other actions. The query is a key part of information retrieval; in this context, a query is defined as a word or group of words describing or characterizing the retrieved information. The biggest problem we face is creating a query whose execution yields relevant information and thus retrieval success. The main goal of a search engine is to provide the user with the most relevant information in the query result. However, the user does not always find the resulting information relevant.

Recently, a wide variety of studies on information retrieval (IR) have focused on tracking users’ eye movements, and the use of high-performance cameras or eye-trackers has made application of this technique much easier than before. The method we propose in this work can be regarded as a type of implicit relevance feedback because it estimates a user’s search intent implicitly from data about where the user looked while browsing Web pages.

Cognitive Overload Identification during Human-Computer Interaction

Tomáš Juhaniak
bachelor study, supervised by Mária Bieliková

Abstract. This study focuses on analyzing the possibilities of evaluating applications’ cognitive demands based on observing pupil dilation while the user interacts with the application. Building on current research on cognitive overload effects, which demonstrated that cognitive overload is measurable through pupil dilation, we assume that we can measure the cognitive overload induced by complex stimuli such as computer applications. With complex stimuli, such as content changing over time or significant chroma differences between stimulus fragments, the pupil size in the neutral state, i.e. under minimal cognitive load, changes over time, so existing methods cannot be applied directly. The biggest problem lies in acquiring an adequate baseline of neutral pupil size, which current research on simple stimuli substitutes with a single reference value.

The main goal of this study is to develop a method and a tool which let us eliminate the problems caused by interface complexity and then apply knowledge from cognitive psychology to assess the cognitive demands of computer applications.

User short-term behaviour modelling

Ondrej Kaššák
doctoral study, supervised by Mária Bieliková

Abstract. Modelling user behaviour on a website is a relatively well-known topic. Existing approaches, however, focus mostly on capturing the user’s typical behaviour and preferences from a long-term perspective. Typical usage of such information is personalization of the website or recommendation of interesting content.

The moment we want to identify the user’s actions in the very near future, for example his next step within the site (e.g., the next visited page, or whether he will remain on the site), we also need to consider information about the user’s short-term behaviour: his current preferences, intent, context, current trends, etc. In the case of short-term behaviour, however, these factors typically cannot be identified as clearly as in the long-term case. The reason is that short-term behaviour is influenced by all of the above-mentioned factors concurrently, which appears as a biased, random tangle of user actions. This problem can be eased by identifying the individual factors and modelling them, which, however, is a non-trivial problem.

User short-term behaviour modelling has great potential nowadays, since it allows us, for example, to predict the user’s future steps within the website. If we know which page the user will most probably visit next, or if we are able to predict that the user will leave the site soon, we can react proactively – e.g., personalize the content for the user, maximize his user experience, or further the website provider’s aims.
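One simple baseline for next-page prediction is a first-order Markov model over past sessions, sketched below with invented session data:

```python
# Illustrative first-order Markov sketch of next-page prediction: count
# page-to-page transitions in past sessions, then predict the most likely
# next page from the current one.
from collections import Counter, defaultdict

sessions = [
    ["home", "news", "article", "home"],
    ["home", "news", "news", "exit"],
    ["home", "article", "exit"],
]

transitions = defaultdict(Counter)
for session in sessions:
    for current, following in zip(session, session[1:]):
        transitions[current][following] += 1

def predict_next(page: str) -> str:
    return transitions[page].most_common(1)[0][0]

print(predict_next("home"))  # 'news' (2 of 3 transitions from 'home')
```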

Game-based support of online learning of programming

Peter Kiš
master study, supervised by Jozef Tvarožek

Abstract. Students’ lack of motivation is one of the main barriers to efficient learning. In online learning, natural human and social aspects are also suppressed, so the lack of motivation leads to even worse results. Therefore, research keeps looking for new ways to increase students’ motivation for learning in an online environment. Games and gaming principles add entertainment and increase the overall involvement of students, and both are increasingly used in the online environment. Using games, game principles, graphical visualization, and entertaining content for teaching programming opens the way to exploring the impact of these elements on the learning process, the speed of acquiring new knowledge, and the ability to select the most appropriate procedures for solving algorithmic problems.

As writing source code according to a given assignment is already fully covered in teaching programming at the faculty, we decided to focus our work on the use of existing code that students produced over the past years. For new students, we want to prepare a diverse range of tasks based on this code, aimed at understanding, analyzing and describing code, refactoring it, and using best practices in programming.

Support for Domain Model Authoring

Matej Kloska
master study, supervised by Marián Šimko

Abstract. Our work aims at supporting domain model authoring. Today, more than ever before, it is advisable to personalize and intelligently search for relevant data in the digital space. A potentially helpful solution to this problem is to use a domain model.

The main interest of our work is to propose a method by which it will be possible to easily author a domain model. The method covers three areas related to creating and managing a domain model: annotation, linking, and versioning. With a well-designed, user-friendly annotation interface, we strive to make navigation and smart search in models easier from the user’s perspective. The management side of domain model creation is supported by assembling and versioning models, aiming to promote and accelerate the collective creation of a domain model.

Software Developer’s Activity Recognition with Eye Tracking

Martin Konôpka
doctoral study, supervised by Pavol Návrat

Abstract. Monitoring a software developer’s activity at the level of interactions in tools provides a detailed data source about the process of software development compared to traditional source code metrics. However, in certain cases it may still be incomplete if we do not know exactly what the developer is focusing on during work. Eye tracking technology lets us look at source code through a developer’s eyes and see how she interacts with the source code and what interests her the most. We may then evaluate her behaviour, reasoning, or knowledge better than by using interaction events in tools alone. We may extend existing approaches and methods for evaluating her activity with eye tracking data, e.g., when untangling code changes, or when identifying source code structure and connections in source code.

In this work we focus on the analysis of existing approaches to monitoring and recognizing a software developer’s activity. We propose to extend existing methods for understanding source code and developer activity with eye tracking data. At the Faculty of Informatics and Information Technologies we have access to eye tracking devices as well as to the infrastructures of research projects that we may use to complete this task. We plan to evaluate our work with students and possibly with professional developers as well. Partial results of this work have already been presented at international software engineering conferences and workshops.

Streamlining Web Browsing based on User Task Identification

Michal Korbeľ
master study, supervised by Martin Labaj

Abstract. Nowadays, most web browsers support browsing the Web in a parallel way, that is, using multiple open tabs in the same browser window at once. However, great disorder can arise among the tabs when users browse the Web inconsistently, and they can become disoriented. Still, parallel browsing with an adequate number of tabs seems to be a better strategy for browsing the Web than using the back button or finding a link in the browser history. Users can often save time and increase their work productivity with this browsing strategy.

Nevertheless, with an increasing number of tabs in the web browser, the user’s orientation decreases and it becomes harder to remember the tabs’ positions. One consequence is more frequent switching among tabs, because there is a higher probability that users will not choose the right tab when switching back to one of the already opened tabs. In many cases, users switch to an unwanted tab and instead divert their attention, e.g. by watching a video; this can greatly decrease their concentration and, consequently, their productivity. For making recommendations on the Web, it is important to know the users’ knowledge, interests and motivation for browsing, which enables adaptation aimed at increasing user productivity by identifying their objectives in the individual tabs. We are designing a model which will reorganize the tabs in the web browser by detecting which ones are work-related and which are not, based on the user’s browsing characteristics.

Sentiment Analysis in Slovak Text

Rastislav Krchňavý
bachelor study, supervised by Marián Šimko

Abstract. Social networks have been widely used in the last few years. Users not only communicate with other users, but also discuss various topics. Our task is to determine whether a user’s post (comment, tweet, review, …) is positive or negative, and to what degree. Tools that automate this process are called sentiment analysis tools.

Our solution will work with the Slovak language. English and Slovak differ in many ways, for example in word inflection, double negation and diacritics. Besides these, our analyzer should deal with emoticons, stop words, unnecessary punctuation and much more.

We will implement a sentiment analysis tool for Slovak based on the naive Bayes algorithm. Naive Bayes is an algorithm which calculates the probability that a certain text belongs to some category. Our solution has five categories (strongly positive, positive, neutral, negative, and strongly negative). An important step is classifier training, so we need a dataset for every category. After training the classifier, we can use it to determine the sentiment of texts from our test set and measure its accuracy. Our goal is to reach accuracy similar to existing solutions for other languages.
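A minimal sketch of the five-way naive Bayes classifier (toy English texts for readability; a real classifier needs a large labelled Slovak corpus):

```python
# Tiny naive Bayes sentiment sketch: bag-of-words counts feed a multinomial
# naive Bayes classifier over the five sentiment categories.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["absolutely amazing, love it", "pretty good overall", "it is a phone",
         "rather disappointing", "utterly terrible, avoid"]
labels = ["strongly positive", "positive", "neutral", "negative", "strongly negative"]

classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(texts, labels)
print(classifier.predict(["love it, amazing"]))  # ['strongly positive']
```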

Tool to Assign Badges in CQA System Askalot

Michal Kren
bachelor study, supervised by Ivan Srba

Abstract. Community question-answering (CQA) systems are widely used platforms for sharing knowledge and information. An especially interesting aspect is the utilization of these systems in the educational domain – one example is our faculty-wide community question answering system Askalot. The quality of such a system is determined by its users, and more importantly by their activity. Intensive research is being done on how to increase user engagement, productivity and motivation in online communities. One group of approaches that has become extremely popular in recent years is gamification. It refers to using game mechanics in a non-game context to create a “gameful experience” and increase user motivation. This can be achieved through various game elements, such as storytelling, progress through levels, leaderboards, achievements or badges. Badges are particularly popular in online communities because they represent one’s achievements, skills or effort, and can help boost users’ confidence or identify the more “skilled” members of the community.

In our work, we plan to create a tool for assigning badges in Askalot. Students can earn regular badges for a number of activities, like answering questions or upvoting. In addition, a number of top students each week will be awarded special weekly badges based on their activity in a certain subject. Our main goal is to use badges to increase user engagement in Askalot and thus improve the quality of education in our school.

Automatic Estimation of Developer’s Expertise

Eduard Kuric
doctoral study, supervised by Mária Bieliková

Abstract. Evaluating the expertise of developers is critical in software engineering, in particular for effective code reuse. In a software company, the technical and expert knowledge of employees is not usually represented in a unified manner, and it is difficult to measure or observe directly. The level of a developer’s expertise is problematic to determine or estimate automatically. For example, to show exactly a developer’s expertise with a technology (library), we would need to give the developer a test to solve. However, it is often a problem to motivate developers to take a test, and people have different standards for judging the degree of expertise. Therefore, our approach is based on the automatic estimation of relative expertise, taking other developers into consideration and comparing them with each other within the company.

Our new idea is to automatically establish a developer’s expertise based on monitoring his or her working activities while coding in an integrated development environment (IDE), and on analyzing and evaluating the resulting source code he or she creates and commits to the local repository. By applying our approach we are able to observe and evaluate different indicators. For example, we can spot a developer who often copies and pastes source code from an external source (the Web). The source code contributions of such a developer can be put in relation to the software project; moreover, they can reveal the reason for his frequent mistakes or low productivity in comparison with other developers.

Analysis of User Behavior Patterns in Parallel Web Browsing

Martin Kyseľ
master study, supervised by Martin Labaj

Abstract. Nowadays, web browsing is a frequent activity affecting more and more people. Every web user has his own favourite practices and activities during browsing; based on them, the user acquires experience and improves his web-browsing skills.

Perhaps the biggest improvement of the last few years is browsing the Web in tabs, called parallel browsing. There is linear browsing, where a newly opened page is displayed in the same window and replaces the previous content, and parallel browsing, where a newly opened page is displayed in the same window but in a different tab. Users browse the Web with many opened tabs, compare tabs, use tabs as reminders, and so on.

If we want to improve the user experience and help the user improve his skills, we need to analyze the user’s behaviour and activities during parallel browsing. We can monitor the frequency of opening tabs, the maximum and average number of opened tabs, back-button usage and many other metrics. The findings and suggestions will be used to adjust an existing web extension or to create a new one.
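For illustration, a few of these metrics computed from a hypothetical browser-event log (the event format is an assumption made for the example):

```python
# Illustrative computation of parallel-browsing metrics from an event log.
events = [  # (timestamp_s, event_type)
    (0, "tab_open"), (5, "tab_open"), (12, "tab_switch"),
    (20, "tab_open"), (31, "back_button"), (40, "tab_close"),
]

open_tabs, max_tabs, opens, backs = 1, 1, 0, 0  # session starts with one tab
for _, kind in events:
    if kind == "tab_open":
        open_tabs += 1
        opens += 1
        max_tabs = max(max_tabs, open_tabs)
    elif kind == "tab_close":
        open_tabs -= 1
    elif kind == "back_button":
        backs += 1

print(f"tabs opened: {opens}, max open at once: {max_tabs}, back-button uses: {backs}")
```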

Gathering Tabbed Browsing Behaviour as User Feedback on the Adaptive Web

Martin Labaj
doctoral study, supervised by Mária Bieliková

Abstract. Everyday activity on the Web includes everything from grocery shopping, education, employment, communication and learning to entertainment. Web systems that support such aspects of human activity are becoming more adaptive than before. A web system first needs to know its individual users through their actions in order to facilitate any adaptation. In our work, we focus on observing, logging, analysing and utilizing both implicit and explicit user feedback. Apart from explicit feedback questions presented to the user at appropriate moments to obtain better and more extensive explicit evaluations, one particular area of our research lies in observing the user’s movement across web pages – parallel browsing.

Before we can analyse and model parallel browsing behaviour and use it for user modelling, improving domain models, or recommending resources to users, we need to capture it. In one approach, using a tracking script, we can easily observe every user of our application without any additional steps on the user’s part, but only visits to, and switches between, pages from a limited set are observed. We used this approach to recommend learning objects relevant to exercises solved in a learning system.

In another approach, when we observe the user’s browser, for example through an extension, we see the user’s every step across various web applications, even when the user leaves our application to look for additional information in other web systems, but we only see the actions of the limited group of users who chose to participate. We previously used this approach to automatically enrich learning content with external resources.

Evaluating the Usability of Applications Using Gaze Tracking

Vladimír Ľalík
master study, supervised by Jakub Šimko

Abstract. At present, the success of software depends not only on functional aspects; equally important is the experience the application provides to users while they effectively and efficiently achieve specified goals. Usability evaluation is therefore an important part of software development. When we need to determine the level of usability of an application, the best way is to let users interact with the application while we watch their behaviour, ask questions, and record their activity.

Usability testing with users costs time and money, so our effort is to obtain as much information as possible from each user. We can now use gaze tracking to learn more about how users think while they interact with an application. These data can provide a different perspective on what attracts users’ attention or where they were trying to find information. We can obtain a large set of data from gaze tracking, but analyzing these data is extremely time-consuming, because the process is not sufficiently automated.

Our goal is to design a method which automates the evaluation of interfaces using the data obtained from an eye tracker during usability testing. We analyzed recent studies, which provide us with metrics and patterns in gaze-tracking data that indicate specific usability problems. We conducted an experiment in which we validated some of the patterns identified in the literature. On the basis of these metrics and patterns, we designed a method which automatically analyzes gaze-tracking data from usability testing and identifies usability problems of web interfaces.

Personalized Scalable Recommender System

Adam Lieskovský
master study, supervised by Michal Kompan

Abstract. Personalized recommendation is present on almost all major sites on the Web, regardless of their domain. It helps to reduce information overload and brings added value to a site’s services. Nowadays, the requirements for the scalability and reusability of recommender systems are much more demanding than in the past. Recommendations usually have to be delivered in real time, which, with large datasets and user bases, increases the demands on system resources.

We propose a scalable and flexible hybrid method utilizing content, context and users’ behaviour. We evaluate our method with a system prototype on streaming datasets, considering quantitative aspects, in the domain of newspaper articles.
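
A hybrid of this kind is often realized as a weighted combination of per-source scores. The following is a toy sketch of that idea only; the weights, scorers and data structures are our own illustrative assumptions, not the proposed method:

```python
def jaccard(a, b):
    """Overlap between two term sets, in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

def hybrid_score(user, article, w=(0.5, 0.3, 0.2)):
    """Weighted hybrid of content, behaviour and context scores."""
    content = jaccard(user["interests"], article["terms"])
    behaviour = article["popularity"]          # stand-in for a collaborative score
    context = 1.0 if user["time_of_day"] in article["peak_hours"] else 0.0
    return w[0] * content + w[1] * behaviour + w[2] * context

user = {"interests": {"politics", "economy"}, "time_of_day": "morning"}
article = {"terms": {"economy", "banks", "eu"}, "popularity": 0.7,
           "peak_hours": {"morning", "noon"}}
print(hybrid_score(user, article))  # 0.5*0.25 + 0.3*0.7 + 0.2*1.0 = 0.535
```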

Content Recommendation from Archives of Questions Answered in Communities

Viktória Lovasová
master study, supervised by Ivan Srba

Abstract. Community Question Answering (CQA) sites such as Yahoo! Answers or Stack Overflow have become valuable platforms to create, share, and seek a massive volume of human knowledge. The task of question retrieval in CQA aims to resolve one’s query directly by finding the most relevant questions (together with their answers) from an archive of past questions.

Archives of questions, however, provide another potential which has not been fully explored yet – recommending solved questions that may be useful for users (e.g., to expand their current knowledge or to cover topics they are interested in). We propose a method for personalized recommendation of solved questions considering the user’s interests – the questions a user frequently views, answers, asks, comments on, rates and marks as favourite. More specifically, we try to predict the questions which a user would mark as favourite.

We implement and evaluate the method on data from Stack Overflow – a CQA system for professional programmers – which allows us an extensive offline verification. To precisely reproduce activities from the real environment of a CQA system, we use the infrastructure of the Askalot system, which is developed by students at the Faculty of Informatics and Information Technologies in Bratislava.
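
A simple baseline for such a prediction is to build a textual profile from the questions a user has interacted with and rank candidate questions by cosine similarity. A hedged sketch using scikit-learn (the question texts are invented examples, and the approach is only a baseline, not the proposed method):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Questions the user has viewed, answered or favourited (illustrative data).
user_history = ["How to sort a list in Python?",
                "Fastest way to sort a dictionary by value"]
candidates = ["Sorting a list of tuples by the second element",
              "How to configure nginx as a reverse proxy"]

vec = TfidfVectorizer()
matrix = vec.fit_transform(user_history + candidates)
profile = np.asarray(matrix[:len(user_history)].mean(axis=0))  # user profile

scores = cosine_similarity(profile, matrix[len(user_history):])[0]
for question, score in zip(candidates, scores):
    print(f"{score:.2f}  {question}")
```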

Recommendation of New Questions in Online Student Communities

Jakub Mačina
master study, supervised by Ivan Srba

Abstract. Community question answering (CQA) systems are popular on the Web and in enterprise environments. With the increasing popularity of Massive Open Online Courses (MOOCs), there is an opportunity for CQA systems to help students in online learning communities as well.

However, existing CQA systems have difficulties with an increasing proportion of questions that remain unanswered and with attracting helpers to address the posted problems. In online student communities this is an even bigger problem, as it can lead to student dropout. Because of the specifics of the educational domain, our aim is to propose a new approach to the recommendation of new questions (question routing) designed specifically for CQA systems employed in educational settings.

Our main objective is to provide better education-specific recommendations in comparison to general recommendation approaches. We plan to take students’ motivation and expertise into account, considering both QA and non-QA data, e.g., course grades or assignment grades. By routing questions based on a student’s expertise, we can optimize the knowledge utilization of users in online learning communities by engaging new students or students with a low level of QA activity. We plan to use the so-called knowledge gap phenomenon for question-user expertise matching (i.e., more expert users tend to select more difficult questions, while the opposite is true for less experienced users). In other words, we can utilize the fact that questions asked by users with a particular level of expertise are usually answered by other users with the same level of expertise.
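
The knowledge-gap idea can be sketched as matching a question’s estimated difficulty to answerers with a similar expertise level. All scores below are illustrative placeholders for the models the thesis would learn from QA and non-QA data:

```python
def route_question(question_difficulty, candidates, k=3):
    """Rank candidate answerers by closeness of expertise to difficulty.

    candidates: dict of user -> expertise score on the same 0..1 scale
    as the question difficulty.
    """
    ranked = sorted(candidates.items(),
                    key=lambda item: abs(item[1] - question_difficulty))
    return [user for user, _ in ranked[:k]]

experts = {"alice": 0.9, "bob": 0.55, "carol": 0.35, "dan": 0.6}
print(route_question(0.5, experts))  # ['bob', 'dan', 'carol']
```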

Mind-Controlled Application

Tomáš Matlovič
bachelor study, supervised by Róbert Móro

Abstract. The study of emotions in human-computer interaction has intensified in recent years. With successful classification of emotions, we could get instant feedback from users, make systems more empathic and better understand human behaviour during the use of information technologies. These are the reasons why measuring and classifying emotions is important. Many methods exist to achieve this goal, but one of them is not so well known in the field of informatics, namely EEG (electroencephalography).

In our approach, we aim to evaluate the Emotiv EPOC EEG device and classify emotions from the data it captures. Firstly, we analyze methods for the classification of emotions and apply them to an existing dataset. Then, we plan to conduct an experiment in which participants will watch music videos while we use the EPOC to capture the electrical signals from their brains. Lastly, we will verify the potential of the Emotiv EPOC device for emotion classification by comparing it with an existing tool, namely Noldus FaceReader.
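
A common pipeline for such classification computes band-power features from the EEG signal and feeds them to a standard classifier. A minimal sketch with synthetic data; the channel count, frequency bands, labels and classifier are assumptions for illustration, not the experiment’s actual setup:

```python
import numpy as np
from sklearn.svm import SVC

def band_power(signal, fs, low, high):
    """Mean spectral power of one EEG channel in a frequency band."""
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    return power[(freqs >= low) & (freqs < high)].mean()

def features(epoch, fs=128):
    """Alpha and beta band power for every channel of one epoch."""
    return [band_power(ch, fs, lo, hi)
            for ch in epoch for lo, hi in ((8, 13), (13, 30))]

rng = np.random.default_rng(0)
epochs = rng.normal(size=(40, 14, 256))   # 40 epochs, 14 channels (EPOC-like)
labels = rng.integers(0, 2, size=40)      # e.g. low/high valence
X = [features(e) for e in epochs]
clf = SVC().fit(X[:30], labels[:30])
print("held-out accuracy:", clf.score(X[30:], labels[30:]))
```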

Source Code Search Acknowledging Reputation of Developers

Martin Měkota
bachelor study, supervised by Eduard Kuric

Abstract. Newcomers in big software development teams can be assigned difficult tasks right from the start. From a new member’s perspective, finding the right person to ask for advice may prove both time-consuming and challenging, since they might not be acquainted with the other team members.

In our work, we attempt to solve this problem by gathering and analyzing information from version control and issue tracking systems and presenting reputable experts. The end result of our work will recommend experts for particular parts of the source code, so that new members spend less time finding them and more time discussing the problem. The reputation of the experts will be based on their activity in the issue tracking system.
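
Such a reputation computation could, for instance, weight different kinds of issue-tracker activity. A toy sketch with made-up event types and weights (not the thesis’ final model):

```python
# Illustrative weights for issue-tracker events (our own assumption).
WEIGHTS = {"issue_fixed": 5, "comment_accepted": 3, "issue_reported": 1}

def reputation(activity):
    """activity: dict of event type -> count for one developer."""
    return sum(WEIGHTS.get(event, 0) * count
               for event, count in activity.items())

developers = {
    "eva": {"issue_fixed": 12, "comment_accepted": 4, "issue_reported": 7},
    "jan": {"issue_fixed": 3, "comment_accepted": 10, "issue_reported": 2},
}
ranked = sorted(developers, key=lambda d: reputation(developers[d]), reverse=True)
print(ranked)  # developers ordered by reputation: ['eva', 'jan']
```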

Learning Text Representation for Generating Descriptions

Dalibor Mészáros
master study, supervised by Márius Šajgalík

Abstract. Nowadays, deep learning is experiencing rapid growth compared to the past. The reasons are the size of today’s datasets, today’s computing power and improvements to neural network architectures, which are the core of deep learning. Deep learning shows promising results in many domains (graphics, music), notably in the domain of text processing, which this work addresses.

In this work, we use existing, accepted deep learning methods to build our methods for generating descriptions of input texts. In the first phase, we focus on generating simple forms of description, such as tags and keywords, which describe the content of the text. Later, we focus on more complex descriptions generated by neural networks, notably phrases, short text descriptions or even abstracts of the document content.

Voice control of computer

Martin Mokrý
bachelor study, supervised by Jakub Ševcech

Abstract. The means most people use to control a computer are those which are the most effective, but also the most natural for people. For a long time, the number one choice has been the keyboard and mouse. Thanks to the continual growth of computing power, it is now possible to control a computer in a more comfortable way – by the human voice. There are different types of applications, not only for computers but also for mobile phones, which can receive voice commands. Most of them accept commands in the form of whole words.

The main goal of my bachelor thesis is to build an application that is controlled not by whole words, but by commands in the form of short sounds generated by the vocal tract or hands. It may not look like a more comfortable way, but it is surely a more effective one. The first part of my project is testing features and classifiers that are commonly used in the classification of speech and general audio. The second part consists of proposing and implementing an interactive application.
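
A typical baseline for the feature-testing part is MFCC features with a conventional classifier. A sketch assuming the `librosa` library and labelled WAV snippets; the file names, labels and nearest-neighbour classifier are placeholders, not the thesis’ chosen setup:

```python
import librosa
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def mfcc_features(path):
    """Average MFCC vector of one short sound recording."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)  # one 13-dim vector per clip

# Placeholder file lists: short vocal/hand sounds with their command labels.
train_files = ["click1.wav", "hiss1.wav", "click2.wav", "hiss2.wav"]
train_labels = ["click", "hiss", "click", "hiss"]

X = np.array([mfcc_features(f) for f in train_files])
clf = KNeighborsClassifier(n_neighbors=1).fit(X, train_labels)
print(clf.predict([mfcc_features("unknown.wav")]))
```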

Exploratory Search and Navigation in Digital Libraries

Róbert Móro
doctoral study, supervised by Mária Bieliková

Abstract. The information need of users is often ill-defined at the beginning and tends to change in the light of new information gathered during the search. Thus, users’ search tasks tend to be open-ended and more exploratory in their essence; the term exploratory search was coined for this type of search.

In order to support exploratory search and navigation, we have proposed an approach of exploratory navigation using navigation leads. Conceptually, navigation leads are important words automatically extracted from the documents present in the information space. They are a generalization and extension of tag-based navigation; they differ from tags in being automatically extracted as opposed to being added by users, in the method of their selection, as well as in their placement. We distinguish two types of navigation leads: global navigation leads, which provide a global overview of the domain, and local navigation leads, which highlight terms (keywords) relevant in the context of a single document (search result) as well as the terms with the highest navigational value.

Although our proposed approach can be applied in any domain, in our work we focus on its application in the domain of digital libraries. More specifically, we are interested in the researcher novice scenario. We evaluate our approach by means of a synthetic experiment as well as a user study in the bookmarking service Annota.
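
A rough approximation of lead selection is to take a document’s top TF-IDF terms as candidate local leads and the corpus-level top terms as candidate global leads. The sketch below covers only this term-weighting step and ignores the navigational-value component of the actual method; the documents are invented:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["exploratory search in digital libraries",
        "navigation leads extracted from documents",
        "tag based navigation in bookmarking services"]

vec = TfidfVectorizer()
tfidf = vec.fit_transform(docs).toarray()
terms = np.array(vec.get_feature_names_out())

# Local leads: highest-weighted terms of a single document (search result).
local = terms[np.argsort(tfidf[0])[::-1][:3]]
# Global leads: terms with the highest summed weight over the whole corpus.
globl = terms[np.argsort(tfidf.sum(axis=0))[::-1][:3]]
print("local:", local, "global:", globl)
```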

Analysis of Text Reading

Jakub Mrocek
master study, supervised by Róbert Móro

Abstract. With the rise of information technologies, the amount of text that we read in digital form on a screen is rapidly increasing, and everything indicates that this trend will continue in the future. The informative value of a text depends on its clarity; a text which is very rich in information may have only little benefit for readers if it is written inappropriately for them. The clarity of a text is not objective, and therefore different users (or groups) may understand the same text differently (depending on their previous experience or other characteristics).

In our work, we analyze how users read text, using a device which tracks the user’s gaze. We mainly focus on identifying reading patterns, and thus on whether the user read the text in detail or just briefly went over the key parts of the documents. We are trying to design a method which would identify the critical parts of a text in terms of their clarity for different groups of readers. We aim for accuracy while developing such a method, so that it can be used to increase the quality of texts in real use.

Keyword Extraction in Slovak Language

Marek Mura
bachelor study, supervised by Márius Šajgalík

Abstract. Nowadays, we often find ourselves in need of processing large quantities of textual data and finding what is most relevant to our interests. Analysing such a vast amount of text documents becomes easier if we have a set of keywords which summarize the main themes and concepts of each document. Knowing these allows us to easily categorize the documents and eventually retrieve the most relevant ones with greater speed and precision.

My aim is to implement a recurrent neural network capable of extracting keywords from Slovak documents and correctly assigning categories to them (namely articles from the Slovak Wikipedia). I will also focus on using additional information stored in each document, such as links to other articles and their PageRank values.

Software modelling support for small teams

Martin Olejár
bachelor study, supervised by Karol Rástočný

Abstract. The software modelling process is one of the crucial parts of software development. The creation of a high-quality software model containing as few defects as possible is a big prerequisite for a successful project. Besides large teams, small teams also participate in software modelling and have to face many problems during model creation. Small teams need specific support to solve these problems.

In this thesis, we analyse the work of small teams learning the basics of software modelling. We focus our attention primarily on fast detection, identification and correction of defects in models and on model synchronization, and we offer an overview of existing algorithms and solutions. Facilitating high-quality model verification and validation and parallel collaboration during model creation can considerably raise the work efficiency of small teams and prevent defects in source code created on the basis of the model.

The main goal of this thesis is to develop an optimal method for solving these problems and to implement this method as an add-in for the well-known tool Enterprise Architect.

Identification of Important Places in Source Code using Eye Tracking of a Programmer

Barbora Pavlíková
bachelor study, supervised by Martin Konôpka

Abstract. Eye tracking finds its use in various research areas. Besides usability studies, we can find it in psychology or user studies. Apart from that, it is also popular in studies of eye movements in programming. Many such works are about the reading of code fragments by programmers of various experience levels, or about searching for links between source code elements.

When programming, we look at various parts of source code, and those which we pay the most attention to are in some way important for us. The word important means different things to everyone. It could be a part of the code which contains the main or difficult logic, or something that is hard for us to understand.

In our work, we try to identify these difficult parts of code solely by tracking a programmer’s gaze. The difference from other studies is that we track gaze during programming, so we record the whole process of creating code. This information could then be used in several ways. A team leader would be able to identify the weaknesses of an employee and arrange for improving their skills. Another use may be found in programming education: tracking students during work on their tasks would help to understand which sections are problematic for them, so teachers would be able to adapt lessons according to the results.

Relationship extraction using word embeddings

Matúš Pikuliak
master study, supervised by Marián Šimko

Abstract. Finding semantic relations between words and phrases in text is one of the tasks of Natural Language Processing (NLP). Harris’ hypothesis says that we can observe relationships between words using statistical analysis of a text corpus written in a natural language. Harris claims that semantic relations have certain patterns in the co-occurrences of words. For example, the pairs Paris-France and Rome-Italy should exhibit the same pattern.

In our work, we try to utilize word embeddings to find these relations. Word embeddings are a state-of-the-art statistical language model based on deep learning algorithms. Neural networks are used to generate a short vector for every word in the vocabulary from statistical information about the neighbourhoods of the given word in a text corpus. These vectors are composed only of latent features – features that do not have any meaning by themselves. However, by comparing the vectors of different words, we can assess their similarity. It has been empirically shown that word embeddings can preserve attributional as well as relational similarity between words. This means that we can find patterns in the word embedding vector space between pairs of words with the same semantic relation.

We research and analyze such patterns in our work. We designed our own method for extracting semantic relations. First, we gather knowledge about a given relation from training data. These data are a sample set of instances of the relation – an array of pairs of words. The information extracted from this set is used to explore models of entire languages and to identify new pairs of words with the same relation as in the initial set. This method can be used to automatically enrich existing knowledge structures – such as ontologies – with new relations.
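
The pattern exploited here is the well-known vector-offset property of embeddings: average the offsets of the training pairs and search for word pairs with a similar offset. A minimal sketch with toy vectors; a real setup would load pretrained embeddings (e.g. word2vec) instead:

```python
import numpy as np

# Toy embeddings; in practice these come from a pretrained model.
emb = {
    "paris": np.array([1.0, 0.2]), "france": np.array([0.1, 1.0]),
    "rome": np.array([0.9, 0.3]), "italy": np.array([0.0, 1.1]),
    "berlin": np.array([1.1, 0.1]), "germany": np.array([0.2, 0.9]),
}

# Average offset of the known capital -> country training pairs.
train = [("paris", "france"), ("rome", "italy")]
offset = np.mean([emb[b] - emb[a] for a, b in train], axis=0)

def best_match(word):
    """Find the word whose vector is nearest to word + relation offset."""
    target = emb[word] + offset
    return min((w for w in emb if w != word),
               key=lambda w: np.linalg.norm(emb[w] - target))

print(best_match("berlin"))  # ideally "germany"
```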

Keyword Extraction in Slovak Language

Adam Rafajdus
bachelor study, supervised by Márius Šajgalík

Abstract. Natural language processing is a fast-evolving and important field of artificial intelligence which tries to advance the connection between computers and human language. In recent years, thanks among other things to new methods of learning vector representations of words, such as word2vec, and to progress in the area of neural networks, we can improve processes that we could only dream about in the past.

I will be utilising these tools to implement a method whose main purpose is to extract keywords from Slovak texts, applying interesting properties of word vectors and modern neural network architectures. These keywords, as a word-level representation of the text, should be useful in further stages of text processing, such as text categorisation. Since this task has not been solved for the Slovak language before, I will consider the special features of Slovak in order to execute the task better and thus obtain better results.

Universal Tool to Assign Badges in Online Communities

Michal Randák
bachelor study, supervised by Ivan Srba

Abstract. Nowadays, it is common to use game elements and mechanics in many different software systems. Most of all, they are used in online communities like Stack Overflow or Khan Academy, but their use is much wider. Badges, reputation and other elements help motivate users to use the system and thus increase their activity. The goal of our bachelor thesis is to create a universal tool that can effectively evaluate which badges should be granted to users based on their activity in the system and on predefined rules. The communication between the system and our tool will work through a simple REST API. The tool will be implemented as a web service in Java. The effectiveness of this tool will be evaluated using a dataset from an existing system (e.g., Askalot), or using randomly generated events.
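
The rule evaluation at the core of such a tool can be as simple as comparing per-user event counts against thresholds. A Python sketch of the idea only (the planned tool itself is a Java web service, and the rule format here is our own illustration):

```python
# Each rule: badge name, required event type, and the minimum count.
RULES = [
    ("Commentator", "comment", 10),
    ("Good Answer", "answer_upvote", 25),
    ("Curious", "question", 5),
]

def badges_to_grant(activity, already_granted):
    """activity: dict of event type -> count for one user."""
    return [badge for badge, event, threshold in RULES
            if activity.get(event, 0) >= threshold
            and badge not in already_granted]

user_activity = {"comment": 14, "answer_upvote": 3, "question": 5}
print(badges_to_grant(user_activity, already_granted={"Curious"}))
# ['Commentator']
```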

Utilization of Information Tags in Software Engineering

Karol Rástočný
doctoral study, supervised by Mária Bieliková

Abstract. Information tags are a subset of descriptive metadata that assign some structured information to other information artefacts (e.g., to a webpage paragraph or to a source code line). In general, information tags have been proposed to model properties of tagged information (e.g., webpages). Additionally, the information tag model is based on the standardized Open Annotation Data Model, so information tags can be shared among software systems. Due to these properties of information tags, we utilize them for modelling source code files and to provide the first basic tools which utilize a source code model based on information tags.

For modelling source code files and for the supporting tools, we utilize all categories of information tags: (i) user information tags, which support the code review process (TODO, FIXME, CODEREVIEW, REFACTOR, …); (ii) content-based information tags, obtained by analysis of source code via SonarQube; (iii) user activity information tags, created by analysis of developers’ activities, e.g., implicit dependencies; (iv) aggregating information tags, which aggregate information from multiple information tags, e.g., facet tags for supporting source code search.

Currently, we are refactoring and finalizing the implementation of an architecture for collecting developers’ activities and for enriching source code with information tags. The refactored architecture makes it possible to deploy the architecture to multiple organizations, to collect cleaner datasets and to run experiments effectively.

Universal Tool to Assign Badges in Online Communities

Martina Redajová
bachelor study, supervised by Ivan Srba

Abstract. The application of gamification is becoming a widely used technique for motivating activity, not only in the learning process in educational domains, but also in domains with no such purpose. The main goal of various gamification mechanics is to motivate users to visit a system, be active in it and have another reason to come back regularly. There are many types of game elements used to achieve this, such as leaderboards, storytelling, achievements and the application of levels or badges. Assigning badges is one of the most promising elements of gamification, because it rewards users for their activity with no need for a permanent focus on progress. It has been shown that a user’s ambition to earn a badge is driven by the natural human desire to own something. However, choosing the right activities to reward, and correctly defining the activity boundaries for assigning badges, often seems to be a problem for web site creators, and only a few tools have been developed to solve this problem.

In our work, we are going to create a front-end for a universal badge-assigning tool, focusing on the creation of correct rules for assigning badges and on the design of these badges. The correct choice of activities users should be rewarded for, and the correct definition of boundaries for assigning particular badges, will be supported by a specifically proposed visualization of users’ activities provided by our tool.

Optimizing Map Navigation Using Contextual Information

Roman Roštár
master study, supervised by Dušan Zeleník

Abstract. With the increasing trend of ubiquitous computing, intelligent mobile devices in the form of smart assistants are becoming a common part of our lives. We can also witness rapid growth of mobile GPS navigation usage on an almost daily basis, whether for personal needs, in terms of navigating from point A to point B, or for work (taxi drivers, couriers, etc.). With this growing trend, the quality requirements for personal navigation rise, and it is often necessary to know or estimate the traffic conditions ahead of time.

In our work, we experiment with time and weather context to augment information about real-life GPS traces obtained from dozens of food delivery couriers working in the cities of Bratislava, Prague and Vienna. Given the hypothesis that we can deliver more accurate predictions when we account for contextual information, we extract speed information from raw GPS traces and map it to specific streets in the aforementioned cities, along with the contextual information in which each GPS measurement happened. We can thus build a prediction model on a part of such pre-processed data and perform an offline test to verify our hypothesis.
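
Extracting speed from raw traces essentially means taking the haversine distance between consecutive fixes and dividing by their time difference. A sketch (the coordinates are illustrative, roughly around Bratislava):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS fixes in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371000 * 2 * asin(sqrt(a))

def speeds_kmh(trace):
    """trace: list of (timestamp_s, lat, lon) fixes from one courier."""
    out = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(trace, trace[1:]):
        if t2 > t1:
            out.append(haversine_m(la1, lo1, la2, lo2) / (t2 - t1) * 3.6)
    return out

trace = [(0, 48.1486, 17.1077), (10, 48.1490, 17.1085), (20, 48.1497, 17.1092)]
print(speeds_kmh(trace))  # segment speeds in km/h
```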

Similarities in source code

Marek Roštár
bachelor study, supervised by Michal Kompan

Abstract. With the increasing number of programming languages, the problem of finding similar parts of source code written in different languages is growing in importance. Detecting such similarities can be useful for improving source code quality or for identifying potential plagiarism.

Nowadays, there are multiple ways of identifying similarities in source code, or in text in general. The best known is the text/token-based method, which can be improved in some respects by stronger preprocessing. In my work, I focus mainly on identifying similarities using the abstract syntax tree. Since I have not found any attempts to use stronger abstraction and preprocessing together with abstract syntax trees, I will test whether applying stronger abstraction to source codes and then comparing them through their abstract syntax trees is more effective than doing so without the abstraction.

Since I will probably not be able to test it on large-scale source codes, I will focus on medium- and small-sized source codes. My program should also be able to compare source codes in multiple programming languages, so I will primarily focus on C and Java and then add support for other languages if time permits.
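
To illustrate the AST idea (on Python sources, since the standard `ast` module keeps the example self-contained; the thesis itself targets C and Java), one strong abstraction is to erase identifier names and constant values before comparing the trees:

```python
import ast

class Normalize(ast.NodeTransformer):
    """Erase identifier and constant details, keeping only the tree shape."""
    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)
    def visit_Constant(self, node):
        return ast.copy_location(ast.Constant(value=0), node)

def shape(source):
    return ast.dump(Normalize().visit(ast.parse(source)))

a = "total = 0\nfor x in items:\n    total += x"
b = "s = 0\nfor item in values:\n    s += item"
print(shape(a) == shape(b))  # True: same structure despite renamed identifiers
```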

Explicit User Input Quality Determination Based on Implicit User Input

Metod Rybár
master study, supervised by Mária Bieliková

Abstract. Online questionnaires are widely used in many different fields today, from research data collection to filling out tax information for the government. Most of the time, however, we are either unable to tell whether the information provided by users is correct, or we can decide this only by cross-examination within the questionnaire or by using other methods to check the data. Especially when we ask about the user’s opinion, there is no sure way to estimate whether the user is telling us the truth.

If we had a method that could tell us whether a user’s answer is truthful, we would be able to obtain better information from questionnaires. The online environment and various technologies allow us to collect implicit data from the user, which can be used to predict whether an answer is truthful or not.

In our work, we are training an automated model which should be able to tell us, with a certain precision, whether the answer from the user is truthful or not. To do this, we use various metrics collected by mouse and eye tracking. Our partial results yielded several statistically significant metrics, and our first model provides an accuracy of up to 70 percent.

Extracting Keywords in Latent Feature Vector Space to Model User Interests

Márius Šajgalík
doctoral study, supervised by Mária Bieliková

Abstract. User modelling includes modelling various characteristics like user goals, interests, knowledge, background and much more. However, evaluation of each of these characteristics can be very difficult, since every user is unique, and objective evaluation of each modelled feature often requires a huge amount of training data. That requirement cannot be easily satisfied in a public research environment, where personal information is too confidential to be publicly accessible. In a common research environment, we are confronted with training the model on only a small sample of data, which mostly requires humans to evaluate the model manually – an often very subjective and time-consuming process.

We examine a novel approach to evaluating user interests by formulating an objective function on the quality of the model. We focus on modelling user interests in the form of keywords aggregated over web pages from the user’s browsing history. By treating users as categories, we can formulate an objective function to extract user interests represented as discriminative words, which can be used to discriminate the user within a given community; this effectively avoids extracting words that are too generic.

Supporting Learning Using Recommendation of Study Activities

Matúš Salát
bachelor study, supervised by Jozef Tvarožek

Abstract. Learning at university can be troublesome for first-year students. Teaching methods different from what they are used to from high school, and the sheer quantity of learning, may force them to drop out. Students need to organize their deadlines: when they should start learning for a midterm test, or when their project deadlines are. The many sources of information often make students misunderstand what teachers want from them, which frequently leads to forgotten obligations and subsequent failure. Students currently only seldom share with their classmates useful tricks and hints on how to plan their time effectively. Good time organization is the key to successful and balanced learning.

In our project, we want to predict the preparation duration for every event (exam, task) for each student and then give advice on what the student should address with higher priority (a simple recommendation). Every student will see what should be done, by when, and with what priority. Having all deadlines and current progress in one place helps students make the right decisions. Events will be visualized for every student in their own interface. Predicting event preparation duration, together with visualization of all term events, is the primary goal of our work.

Automated evaluation of website usability in terms of user experience

Matej Schwartz
bachelor study, supervised by Eduard Kuric

Abstract. With the evolution of the Internet and its use in everyday life, we come into contact with websites in increasing numbers. Searching for information, communicating with loved ones and doing business are the reasons we use the Web every day, from educational purposes to grocery shopping online. When searching or using services provided on the Web, we need to verify whether a website is functional and simple from the user’s point of view. When a website fails to comply with certain conventions, it may discourage users and reduce its visit rate.

Website usability testing is mostly done manually, by a group of testers who focus on design errors. The testers are ordinary users who try to perform a given task. During execution, they face the problems which could discourage real users from accomplishing their tasks. This form of testing is very time-consuming and demanding on human resources. Using modern technology, we arrive at automated evaluation of website usability.

Websites are used by people in different countries, of different ages, and by people with disabilities. In the context of interacting with websites, disability often means blindness or other visual impairments. Our goal is to automate the evaluation of the usability of a web site from the perspective of people with impaired colour vision, and then to adapt the interface so that it can be used by them. The aim is to give users with this type of impairment the ability to use services as well as users with no disability.
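
One automatable check related to colour-vision accessibility is the WCAG contrast ratio between text and background colours. A sketch of the standard formula (the example colours are illustrative):

```python
def channel(c):
    """Linearize one sRGB channel (0-255) per the WCAG definition."""
    c /= 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(rgb):
    r, g, b = (channel(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    l1, l2 = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# Grey text on white: the ratio must be >= 4.5 for normal text (WCAG AA).
ratio = contrast_ratio((119, 119, 119), (255, 255, 255))
print(f"{ratio:.2f}", "passes AA" if ratio >= 4.5 else "fails AA")
```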

Stream Data Processing

Jakub Ševcech
doctoral study, supervised by Mária Bieliková

Abstract. In recent years, interest in the domain of stream data processing has been building up. Methods for stream data processing are used in every domain where results have to be provided in real time or under tight time constraints.

In our work, we focus on the processing of repeating data streams, where reoccurring sequences can be used to compress the stream and to enable the application of various text processing methods by transforming real-valued data streams into streams of symbols. We study the possibility of transforming metrics computed over data streams into sequences of symbols, and we explore methods for their analysis. The applications we focus on are stream state classification, anomaly detection and forecasting in domains such as electrical energy consumption or other production/consumption processes. Our paramount goal is to facilitate parallel analysis of multiple data streams and multiple metrics computed over them.
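
A standard way to turn a real-valued stream into symbols is SAX (Symbolic Aggregate approXimation): z-normalize, average over windows (PAA), then map each average to a letter using Gaussian breakpoints. A compact sketch (segment count and alphabet size are arbitrary choices here, and we do not claim this is the thesis’ exact transformation):

```python
import numpy as np
from scipy.stats import norm

def sax(series, segments=8, alphabet="abcd"):
    """Symbolic Aggregate approXimation of one real-valued series."""
    x = (series - series.mean()) / series.std()      # z-normalize
    paa = x.reshape(segments, -1).mean(axis=1)       # piecewise aggregate means
    # Breakpoints splitting N(0, 1) into equiprobable regions, one per symbol.
    cuts = norm.ppf(np.linspace(0, 1, len(alphabet) + 1)[1:-1])
    return "".join(alphabet[np.searchsorted(cuts, v)] for v in paa)

signal = np.sin(np.linspace(0, 2 * np.pi, 64))       # toy consumption curve
print(sax(signal))                                   # e.g. "cddcbaab"
```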

Promoting Sustainability and Transferability of Community Question Answering

Ivan Srba
doctoral study, supervised by Mária Bieliková

Abstract. In situations when Internet users are not able to find the required information by means of standard information retrieval systems (especially web search engines), they have the possibility to ask their questions in popular Community Question Answering (CQA) systems such as Yahoo! Answers or Stack Overflow. The main goal of CQA systems is to harness the knowledge potential of the whole community to provide the most suitable answers to recently posted questions in the shortest possible time.

In the first part of our project, we focus on providing novel adaptive support that contributes to the long-term sustainability of CQA systems. We conducted a case study on Stack Overflow in which we analyzed the recent negative evolution of the community. Consequently, we proposed suggestions on how to preserve the success of CQA ecosystems by proposing adaptive support methods that are answerer-oriented and involve the whole community.

Secondly, we consider CQA systems as innovative learning environments. We developed a CQA system named Askalot, which is designed specifically for universities, where students can take advantage of the learning aspect of the question answering process. Currently, we focus on the application of CQA concepts in MOOCs (Massive Open Online Courses), such as those provided by Harvard University on the edX platform.

Analysis of Reading Difficulty in Web Environment

Martin Štrbák
master study, supervised by Marián Šimko

Abstract. The success of Web applications depends on their ability to recognize user characteristics (interests, knowledge, personality type, etc.) and apply them to improve the offered services. Mostly, these characteristics are acquired through analysis of the user’s interaction with the concrete Web application (where they click or look, what they write). It is more difficult to determine whether textual web content is interesting to them.

One way to identify whether a reader is satisfied with the content of a document is to ask them. This approach has its limitations, because people are not always honest, or they have their own perception of what counts as boring or entertaining. A questionnaire is a bit more objective.

Another methodology might be to analyze the user’s brain waves with an electroencephalograph (EEG) while reading and compare them with the signal when their neural activity is idle. Most commercially available EEGs are equipped with software for estimating emotions. We will use it to decide whether the reader is bored, concentrated, confused or delighted.

To get more precise information, more genres of text documents should be taken into account.

Presentation of personalized recommendations via web

Martin Svrček
master study, supervised by Michal Kompan

Abstract. Nowadays, personalized recommendations are widely used and very popular. We can see many systems in various fields which use recommendations for different purposes. However, one of the basic problems is users’ distrust of recommender systems; users consider them an intrusion into their privacy. Therefore, it is important to make recommendations transparent and understandable to users. Our main goal is to propose several methods for presenting the results of recommendations. In this context, we are creating a recommender system which uses a standard implementation of some recommendation technique.

On the one hand, we want to propose several methods of explaining recommendations to the end user. Here, we try to find the best method or approach for explaining recommendations without requiring knowledge of the underlying recommendation technique. This allows us to explain each item differently, using information about the user and their preferences, while remaining independent of the recommendation technique. On the other hand, we also want to know more about the user, because we need this kind of information for our explanatory method. In this context, we focus on different approaches to obtaining data about users. This information allows us to provide the most suitable explanations.

We also want to verify the proposed approaches in the selected application domain, newspaper articles, in order to obtain statistically significant results through the use of implicit and/or explicit feedback.

Analysis and Measurement of User Behaviour in a Web Application

Peter Truchan
master study, supervised by Mária Bieliková

Abstract. In this thesis, we discuss the behaviour of users visiting a web application, along with the most important metrics and characteristics of that behaviour. We put advanced artificial intelligence algorithms to use.

We measured user behaviour in a banking application and Internet banking. The main measured metrics are visited pages, time spent on a page, interest in products, exit page, hour of the day, the channel by which the visitor came to the application, and many other metrics about every visit. We measured the behaviour of approximately 250,000 people.

These data are then joined with data from an internal database, which contains the age and sex of each registered visitor. After cleaning and deeply understanding the data, we use singular value decomposition (SVD) to find the correlations between metrics and their information value. With the amount of data reduced, we try to use BIRCH to cluster the data. The goal is to predict the user’s behaviour.
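
The dimensionality-reduction step can be sketched with NumPy’s SVD: keep the components with the largest singular values and project the standardized visit-metric matrix onto them (the matrix contents and the 90% variance threshold below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 12))           # visits x metrics (illustrative)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each metric

U, s, Vt = np.linalg.svd(X, full_matrices=False)
explained = (s ** 2) / (s ** 2).sum()     # share of variance per component
k = np.searchsorted(np.cumsum(explained), 0.9) + 1  # keep ~90% of variance
reduced = X @ Vt[:k].T                    # data projected onto k components
print(k, reduced.shape)                   # input for e.g. BIRCH clustering
```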

Application of Machine Learning for Sequential Data

Peter Uherek
master study, supervised by Michal Barla

Abstract. This thesis deals with the problem of using sequential data in machine learning methods, especially in recurrent neural networks. We work with user data originating from the paywall of a foreign news portal. The aim of this thesis is to propose and design a method to verify several hypotheses.

We want to research the possibilities of prediction based on the user’s web browsing history. We focus on different approaches to using sequential data for prediction. We have three main objects of interest: first, predicting the articles a user will visit; second, predicting the user’s payments for articles; and last, predicting the popularity of articles.

We focus on recurrent neural networks with the Long Short-Term Memory (LSTM) architecture. The LSTM architecture consists of several memory cells which help when there are very long time lags of unknown size between important events. This approach can offer better solutions than commonly used machine learning algorithms. Our goal is to find possibilities for using recurrent neural networks with user data from a news portal.
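
A minimal Keras-style sketch of the kind of model involved: an LSTM over a user’s sequence of visited article IDs predicting the next article. The layer sizes, vocabulary size and framework choice are placeholders, not the thesis’ configuration:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense

n_articles = 10000   # size of the article vocabulary (placeholder)
seq_len = 20         # how many past visits the model sees

model = Sequential([
    Input(shape=(seq_len,)),
    Embedding(n_articles, 64),                # article id -> dense vector
    LSTM(128),                                # summarize the visit sequence
    Dense(n_articles, activation="softmax"),  # next-article distribution
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()
# model.fit(past_sequences, next_article_ids, ...) on the real browsing logs
```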

Analysis of Human Concentration during Work Using Web Applications

Lubomir Vnenk
master study, supervised by Mária Bieliková

Abstract. Full concentration during work time is really hard to maintain for many reasons: the task is boring or too difficult, or motivation is missing. We dedicate our work to distinguishing computer activities connected with work from other activities, such as personal life or fun. We use this information to help users stay focused while working and to relax as much as possible when they are on fun or personal sites.

We capture user’s activity using multi-platform activity trackers. They do not capture just metadata, no private info, so data analysis is a bit harder. We focus on applications’ connections through theirs links and user’s application switches. Data analysis is made by machine learning algorithms. For application categorizing we use Naive Bayes classifier that performed on training dataset with 90% R2 accuracy. For dividing into work and non-work activities we use SVM.

Modeling Programmer’s Expertise Based on Software Metrics

Pavol Zbell
master study, supervised by Eduard Kuric

Abstract. Knowledge of programmers’ expertise is utilized in various ways during software development. Correct identification of task assignees contributes to effective task resolution. More targeted team formation improves a team’s potential. Personalization of recommendation or search in source code assists programmers. Knowledge of programmers’ expertise thus impacts the quality of development and of the software itself. Determining a programmer’s expertise in the context of a particular need in software development may not be trivial. A programmer’s experience is usually suggested by the projects on which they worked; these alone, however, say nothing about the areas in which the programmer is an expert. A more accurate source of information about expertise is the source code in which the programmer participated, ideally supplemented by information about their activity during software development. A wrong determination of an expert may have negative consequences, for example more difficult software maintenance.

In our project, we therefore propose a method which models a programmer’s expertise for a desired development concept up to a certain point in time, based on the programmer’s tasks, source code and interactions with it. The method is evaluated on real software projects from Eclipse repositories, supported by Bugzilla task data and Mylyn interaction data. We argue that the proposed method can contribute to the quality of software development.