Students’ Research Works – Spring 2015

Proceedings of Spring 2015 PeWe Workshop

Data Analysis, Mining and Machine Learning (PeWe.Data)

Recommenders (PeWe.Rec)

Semantics Acquisition and Domain Modeling (PeWe.DM)

Text Processing and Search (PeWe.Text)

User Experience and Implicit Feedback (PeWe.UX)

Proceedings Template



Doctoral Staff


  • Mária Bieliková: web personalization, user/user groups and contexts modelling, usability and user experience (and HCI in general)
  • Michal Barla: user modeling, implicit user feedback, virtual communities, collaborative surfing
  • Jozef Tvarožek: social intelligent learning, collaborative learning, semantic text analysis, natural language processing
  • Marián Šimko: domain modelling, ontologies, folksonomies, semantic text analysis, Web-based Learning 2.0
  • Jakub Šimko: crowdsourcing, games with a purpose, semantics acquisition, Web-based learning
  • Michal Kompan: single user and group recommendation, satisfaction modeling
  • Tomáš Kramár: user modelling, personalized web search
  • Dušan Zeleník: context modelling, recommendation



Recommendation Using Graph Data Structures

Ondrej Antl
bachelor study, supervised by Dušan Zeleník

Abstract. Personalised recommendation is a feature that is everywhere around us. Many people do not even notice it, but it is there. It is all about relationships: between a person and an item, and also between persons themselves. In our case we want to use the properties of graphs, which are composed of nodes and edges representing some kind of relationship between two nodes. One of the best attributes of graphs is that from any node we can directly reach every related node.

We built a graph database of movies and keywords on which we experiment with queries that recommend a list of movies similar to one specific title using their mutual relationships. The similarity depends on how a particular movie relates to a particular keyword which is also related to another movie. We also have to face the problem of determining which movie-to-movie relationships are relevant and which are not.
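To make the shared-keyword idea concrete, here is a minimal sketch assuming a small in-memory graph built with the Python networkx library; the actual project stores the graph in a graph database, and the movie and keyword names below are purely illustrative.

import networkx as nx
from collections import Counter

# Small illustrative bipartite graph of movies and keywords
# (in the project this lives in a graph database).
G = nx.Graph()
G.add_edge("The Matrix", "dystopia")
G.add_edge("The Matrix", "artificial intelligence")
G.add_edge("Blade Runner", "dystopia")
G.add_edge("Blade Runner", "artificial intelligence")
G.add_edge("Toy Story", "toys")

def similar_movies(graph, movie, top_n=5):
    """Rank other movies by the number of keywords shared with `movie`."""
    counts = Counter()
    for keyword in graph.neighbors(movie):
        for other in graph.neighbors(keyword):
            if other != movie:
                counts[other] += 1
    return counts.most_common(top_n)

print(similar_movies(G, "The Matrix"))  # [('Blade Runner', 2)]

A real query would additionally weight the keyword relationships, which is where the relevance question mentioned above comes in.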

Prediction of Citation Counts of Journal Articles

Adam Bacho
bachelor study, supervised by Róbert Móro

Abstract. Nowadays, there are dozens of new scientific articles published each day. For researchers it is harder and harder to determine which articles they should read next and use in their further work. Researchers want to base their scientific work on the best available articles by other authors. One of the most popular features used for designating the quality of an article is its citation count, i.e., how many times the article was cited by other authors. Thus, in this work we designed a predictive model based on article and journal characteristics which tend to be good predictors of citation counts.

Our model is trained and evaluated on a subset of the PubMed Central dataset, which stores more than 900,000 open access articles. After cleaning the whole dataset, only articles with the same publication year were used. Besides features such as the title length or the number of authors, which have already been evaluated in related works with sometimes contradictory results, we include new features, the most prominent of them being the Eigenfactor Score, which can be obtained from ISI Web of Knowledge.
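As an illustration of how such a predictive model can be trained, the following sketch fits a regularised linear regression on a handful of made-up articles; the feature names and values are assumptions for the example and do not come from the actual PubMed Central subset.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Illustrative feature matrix, one row per article:
# [title_length, n_authors, n_references, eigenfactor_score]
X = np.array([
    [12, 3, 25, 0.8],
    [7, 5, 40, 1.5],
    [15, 2, 18, 0.3],
    [9, 8, 60, 2.1],
    [11, 4, 33, 1.0],
    [6, 1, 12, 0.2],
])
y = np.array([10, 35, 4, 70, 22, 1])  # made-up citation counts

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
model = Ridge(alpha=1.0).fit(X_train, y_train)
print("R^2 on held-out articles:", r2_score(y_test, model.predict(X_test)))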

Personalised Support for Publication Reports

Andrej Biro
bachelor study, supervised by Michal Kompan

Abstract. The aim of this work is to design and implement a system for personalized management of publication outputs. Researchers often have to manage a large number of their publications. Besides tracking their own works, they also have to keep a list of papers in which these works are cited. Both they and the institution where they work are evaluated on the basis of this number, which makes it important.

Our model draws on Google Scholar, Scopus and ISI Web of Knowledge, the biggest citation databases in the world. After removing duplicate data about authors and citations, the remaining information is saved and graphically displayed to the user.

Researchers spend a lot of time dealing with forms about their publications when they want to save information about them into the school's database. We implemented a system which helps with filling in these forms.

Predicting Content Quality in Community Question Answering

Martin Borák
bachelor study, supervised by Ivan Srba

Abstract. The information age makes it easy for people to seek information on various topics. Everyone who has access to the Internet can just type whatever they are curious about into a search engine and receive thousands of results. However, sometimes people have more complex questions or need very specific answers, which are hard to find using traditional methods. Community Question Answering (CQA) systems are systems where users can publicly ask questions and let other users answer them.

The quality of questions and answers in these systems is not always optimal; therefore, content quality prediction and evaluation is in order. It is beneficial for users to be able to distinguish between content of high quality and that of low quality. It enables them to solve their problems more quickly and comfortably, and it also helps them produce better content in the future.

In my project, I specifically focus on answer quality prediction. Working with Stack Exchange datasets, I extract various features from answers and user profiles and use machine learning methods to predict how many votes a given answer will eventually receive. Using linear regression, I am able to predict just that, and also to determine the impact that each individual feature has on answer quality.
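A minimal sketch of this idea follows, assuming a few hand-made answers described by illustrative features; inspecting the learned coefficients is one simple way to read off the impact of each feature.

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative features per answer:
# [answer_length, n_code_blocks, answerer_reputation, minutes_after_question]
X = np.array([
    [300, 1, 5000, 10],
    [80, 0, 120, 300],
    [650, 3, 15000, 5],
    [150, 0, 900, 60],
    [400, 2, 7000, 20],
])
y = np.array([12, 0, 45, 2, 18])  # made-up final vote counts

model = LinearRegression().fit(X, y)
for name, coef in zip(["length", "code_blocks", "reputation", "delay"], model.coef_):
    # Each coefficient estimates how strongly the feature contributes
    # to the predicted vote count.
    print(f"{name}: {coef:.4f}")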

Explanations of Personalised Recommendations

Matej Čaja
master study, supervised by Michal Kompan

Abstract. Recommender systems have become widely used in many different areas and on many Internet sites such as Amazon, Netflix or Pandora. These systems are designed to learn from users' behavior and help them find what they are looking for, or offer recommendations about information they might find useful. Despite their usefulness and popularity, there are various reasons why users might find it hard to use these systems, or to trust and rely on the recommendations they offer. The lack of knowledge about the recommendation process tends to leave users feeling unsafe. Privacy concerns are often raised about which information was kept, which was not, and why. Hence, these systems are usually seen as black boxes in which users have no choice but to trust the recommendations. This discourages users and makes them doubt the recommendations.

One of many approaches to making these systems more transparent and more credible is to present recommendations with explanations. Explanations such as the one offered by Amazon (Customers who bought this item also bought…) are not only able to increase transparency and credibility, but can also help users make decisions or find what they seek even faster.

Our goal is to analyze different approaches to presenting and explaining recommendations, with the aim of creating a system able to present and explain recommendations to users. The system, together with the collected data, will later be evaluated using common approaches and methods.

Automated Syntactic Analysis of Natural Language

Dominika Červeňová
master study, supervised by Marián Šimko

Abstract. Natural language, as one of the most common means of expression, is also used for storing information on the Web. The information is often hidden within large texts, which are difficult to process effectively without computers, but there are usually no tags or any other form of semantics included that would make the text easily machine-processable. Therefore, we need to transform information from plain text into a structure understandable to computers; this process is called natural language processing. Syntactic analysis, as a part of natural language processing, discovers formal relations between syntagms (words, groups of words, punctuation) in a sentence and assigns them syntactic roles. A discovery of proper relations is needed for later semantics acquisition.

There are many approaches to automatic syntactic analysis, and their accuracy depends on the complexity of the chosen language. In general, Slavic languages are among the most difficult languages to parse, as they are highly inflective, grammatically complex and usually have free word order in sentences. Although there is a lot of research focused on parsing Slavic languages such as Czech or Russian, parsing of the Slovak language still falls behind.

We propose a hybrid method for Slovak language parsing. It consists of two classifiers, which both predict syntactic roles and relations based on morphological information (such as grammatical categories, POS tags, lemmas, etc.). The first one, the base classifier, is an existing dependency parser which parses all given sentences. The second classifier, called the oracle, is developed and trained specifically for the Slovak language. It helps to improve the classifications from the base parser and achieve higher accuracy of Slovak language parsing.

Source Code Review Recommendation

Matej Chlebana
master study, supervised by Karol Rástočný

Abstract. Systems for managing and processing information create metadata that help them analyse the managed content. One type of metadata is information tags, which contain structured information associated with a particular piece of content. In the domain of information systems development, information tags can be used to store information about developers' activities and the properties of source code artifacts. Information tags allow abstracting from computationally complex direct processing of code. Their use can thus help to solve problems connected with the development of information systems, such as optimisation of the development process by supporting the assessment of source code.

Current approaches to source code review require manual inspection of all code changes by code reviewers. Reviewers are therefore often overloaded, which directly degrades the quality of the source code. The main objective of this work is to facilitate the time-consuming work of source code reviewers and thereby contribute to the quality of the developed software. We analyse developers' activity during the development of a software product by using process mining techniques, observing development processes which are delimited by commits. We collect annotations for individual processes and, according to these annotations, we determine the character of each process. We build process models and compare annotated process models with new process models that have not yet been annotated. We consider processes with high similarity to be processes with the same character, and in this way we are able to characterise processes without any annotation. If a new process has an error-prone character, we create a code review recommendation for the code which was written in that process.

Data Stream Analysis

Matúš Cimerman
bachelor study, supervised by Jakub Ševcech

Abstract. Nowadays we can see Big Data processing and analysis in many domains. The increasing volume, variety and velocity of data has initiated a growing interest in Big Data. There are domains where it is essential to process large amounts of data, for example data produced by sensors, computer networks or social media such as Twitter. These potentially infinite sources of data, generating data at rapidly changing velocity and volume, are simply called sources of data streams. Processing and analysis of data streams is a complex issue: a stream needs to be processed with low latency, and a solution must be fault-tolerant and horizontally scalable.

We are building a software architecture to process data streams with near real-time latency. This architecture provides valuable outputs based on user queries over data streams, where near real-time processing is essential. A routinely used method for Big Data processing is batch processing using the MapReduce model. However, this approach is not applicable to data streams because of their characteristics (infinite flow, high volume, velocity, etc.) and the requirement to obtain filtered or analysed outputs in near real time. To meet these requirements, we need to use a different approach, called data stream processing.

In our project, we analyse existing solutions and frameworks for processing data streams. We provide a performance evaluation of our proposed topology set up using the Storm framework. The proposed topology is designed to process high-volume data streams and to be highly horizontally scalable and fault-tolerant. We satisfied the requirement to process data streams in near real time by using the proposed topology and incremental algorithms.
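The following sketch illustrates only the incremental-algorithm part of the idea (not the Storm topology itself): per-key state is updated in constant time for each incoming tuple instead of being recomputed over the whole stream.

from collections import defaultdict

class IncrementalMean:
    """Running mean per key without storing the whole stream -- the kind
    of O(1) per-tuple update a stream-processing bolt can apply."""
    def __init__(self):
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def update(self, key, value):
        self.count[key] += 1
        self.mean[key] += (value - self.mean[key]) / self.count[key]
        return self.mean[key]

stats = IncrementalMean()
for key, value in [("sensor-1", 3.0), ("sensor-1", 5.0), ("sensor-2", 10.0)]:
    print(key, stats.update(key, value))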

Predicting Interest in Information Sources on the Internet using Machine Learning

Martin Číž
master study, supervised by Michal Barla

Abstract. The most important goal of each Internet content provider is to capture the reader's interest so that the reader becomes a returning customer. Although it is useful to evaluate previously published articles, there is an opportunity to estimate an article's potential to become popular before it is even published. There are many attributes that may decide whether an article has such potential, including its title, content, author, source, topic, freshness and credibility.

To predict the popularity of an article based on these attributes we will use regression in machine learning. The popularity of an article cannot be expressed without a selected time frame; for example, popularity one hour after publishing might differ from popularity one day later. Each article usually has a peak of maximum readings, after which the number quickly decreases to an average value, so we have established that the popularity of a news article on the Internet can be defined as the number of times the article was visited in a short period after publishing, for example one day. Our goal is to cover the maximum number of readings in this period. We exclude the special case when an article has periodic peaks of readings (for example, an article about autumn clothes might be popular every autumn), as we are interested only in new articles that become popular after being published.

To make the prediction as accurate as possible we need a large amount of data, which we will have to pre-process specifically for the Slovak language: remove stop words and reduce keywords to their root or lexical (lemma) form.

Recommendations for Developers to Prevent Errors in Source Code

Bálint Csóka
master study, supervised by Eduard Kuric

Abstract. These days, the number of developers working on projects in open code repositories is growing. By continuously tracking their activities during work it is possible to create a model of activities, from which we can determine which fragments of code were modified and what types of activities the developer performed to achieve these changes. With the help of these data mappings it is possible to anticipate parts of source code which are susceptible to errors.

When developing a larger-scale software system, maintaining code and fixing errors may take more time. We could help developers avoid writing problematic code by showing relevant information about error-prone fragments on the fly. It would also be possible to recommend already used (and tested) parts of source code, to avoid rewriting code for the same problem domain.

Our goal is to analyse the existing methods of collecting and processing data from open repositories and to explore ways of tracking the developer in order to effectively anticipate the error-susceptible parts of code by combining these metrics. The aim is to create a plugin that integrates into the development environment and warns or recommends based on the current activities of the user. This behaviour can be achieved by generating an activity model of the developer that is bound to the latest repository of source code, including errors. Our solution will evaluate and update this model based on the latest actions of the user. The functionality and effectiveness of the module will be compared to existing solutions.

Personalized Recommendation of TV Programmes

Jakub Ďaďo
bachelor study, supervised by Mária Bieliková

Abstract. The large amount of information found on the Internet means that users cannot keep track of it all, or rather cannot find the proper information. Recommender systems were created to improve this situation. These systems recommend items based on the user's preferences or on the similarity of items to each other. In this work we focus on the domain of TV programmes.

We propose a web application together with a recommender system for a user who wants to watch TV right now. We focus on the user's preferences and make recommendations accordingly. We especially focus on the sparsity problem in collaborative filtering, because it decreases effectiveness and accuracy. We propose and implement a method which first mitigates the sparsity problem by imputing ratings of similar items into the matrix used in collaborative filtering, and only afterwards makes the recommendations.

Methodics of Game Evaluation Based on Implicit Feedback

Peter Demčák
master study, supervised by Jakub Šimko

Abstract. As a part of the development and testing phase of game development, different prototypes of the game are produced and subsequently tested for various qualities that the developers aim for – general usability of the game interface, performance, or just the plain entertainment value of the game. Playtesting is a popular methodology used to evaluate certain properties of a game. During playtests, the object of evaluation is the experience of the players themselves. With a particular gameplay feature in mind, the playtesting players are allowed to interact with the game as naturally as possible. However, playtesting does have shortcomings. It is disruptive to the player, so we can no longer tell whether we are measuring the exact experience real players are going to have with our game, and it is also incapable of collecting all of the information about the player experience.

The objective of our method is to provide an additional source of feedback from playtesting, which diminishes the aforementioned shortcomings when integrated into the playtest. Our method is based on implicit feedback, namely gaze tracking, thus, it is less disruptive for the player and uses a different source of information about the player than just the observation and the explicit feedback.

Our method further concentrates on the evaluation of gameplay learnability. Learnability is especially crucial for games, because in order for a game to be entertaining, the player has to be able to get a quick grasp of the basic game mechanics and dynamics.

TV Program Guide Metadata Acquisition

Peter Dubec
bachelor study, supervised by Mária Bieliková

Abstract. Even today, there are still many people watching television and using a TV programme guide when deciding what to watch. It is thus important to describe the entities of a TV programme guide with relevant and high-quality metadata. The quality of such services directly depends on the metadata we have: the higher the quality of the metadata, the higher the quality of these services can be. The enriched metadata can also later be used to improve the recommendation of these items.

The Internet is full of information about movies and TV shows, but this information is not in the correct form or is spread across various sources. In our work we propose a method for automatic acquisition of metadata for a TV programme guide from the online movie databases CSFD and IMDB and from the Linked Open Data space. The method is designed to extract metadata in English and map them onto entities of a TV programme guide described in Slovak. Part of our method is also the automatic categorization of documentary movies for the purpose of enriching the TV programme guide with our own metadata, meaning that documentary movies are classified into subcategories based on their content (description).

TV Program Recommendation

Erik Dzurňak
bachelor study, supervised by Mária Bieliková

Abstract. Personalized recommendation is the main focus of many systems which are based on content, items' attributes and properties, and the similarity between them. The demand for recommender systems is driven by society's desire to keep up with news, and one of the easiest ways for humans to take in information is through an audio-visual representation of data. Wherever we are, TV broadcasting is the most common temptation for our sight as well as for our hearing.

We aim at recommending items included in a TV programme guide according to the items' attributes. We propose an improvement of item-based collaborative filtering that focuses on the airtime and duration of the programme, which may be a significant factor for recommendation. Our method is implemented and tested as a web application, and the results were compared to an already existing solution in this area.
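For illustration, a minimal item-based collaborative filtering sketch is shown below on a made-up user-programme rating matrix; the airtime/duration factor from our method is not included here.

import numpy as np

# Illustrative rating matrix (rows: users, columns: TV programmes), 0 = not rated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def item_similarity(R):
    """Cosine similarity between item columns."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0
    return (R.T @ R) / np.outer(norms, norms)

def predict(R, sim, user, item):
    """Similarity-weighted average of the user's existing ratings."""
    rated = R[user] > 0
    weights = sim[item, rated]
    if weights.sum() == 0:
        return 0.0
    return float(weights @ R[user, rated] / weights.sum())

sim = item_similarity(ratings)
print(predict(ratings, sim, user=0, item=2))  # predicted rating of programme 2 for user 0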

Automatic Detection and Attribution of Quotations

Richard Filipčík
master study, supervised by Marián Šimko

Abstract. Although text is one of the oldest means of preserving and exchanging information, it still plays the key role in these tasks. Nowadays, thanks to the Internet, there are far easier ways of publishing and spreading textual works all over the globe. The only problem with this easy way of spreading information is that we are often overloaded by it; hence, it is almost impossible to get only the information we are currently looking for.

Natural language processing (NLP) is the field where we can look for help. There are a lot of issues NLP can help us with; one of them, little explored but still interesting, is automatic detection and attribution of quotations. These processes can be very useful in many domains, since the output data can be used for additional post-processing in a range of ways.

The aim of our work is to propose a method for automatic detection and attribution of direct, and possibly even indirect, quotations in Slovak texts coming from Internet sources such as newspaper articles. The output of our method should consist of a list of quotations extracted from the unstructured input text, together with the names of their attributed originators.

Imagine Cup Analysis

Monika Filipčiková, Andrej Švec
bachelor study, supervised by Jakub Šimko

Abstract. As we know, the world is permanently changing; it is the nature of the human race. People are constantly discovering something new, wanting to understand the world more and to know new things. People struggle to find new solutions to problems they suffer from, and some create solutions that can make the world better and move it forward. We also want to make the world a better place, and that is why we are going to participate in the international software development competition Imagine Cup.

Imagine Cup offers two areas you can focus on. The first is innovation, where people can work on a new solution, or look around and find things that people have problems with. It can be a small everyday-life trouble that affects a large part of mankind, or we can try to solve, for example, a health problem of a specific group of people whose lives we can make more comfortable.

One of our opportunities is to build our project with the eye trackers that we have at our university. The eye tracker is a technology that is getting cheaper and cheaper these days and enables us to get more information about people from their gaze. It is possible to find out what mood a person is in, which can be useful in different spheres of life. On the other hand, this way we can create something helpful for people who suffer from a disability. Our first idea was a trolley for the disabled: some of them cannot move, and our trolley would be controlled by the eyes, so they could control it even if they could not move their limbs or head.

To sum up, we are currently looking for an area that can be improved or is interesting to explore, and we are mainly focusing on eye-tracking technology.

Sitting Posture Quality Evaluation Using Depth Camera

Jozef Gáborík, Matej Leško, Jakub Mačina, Jozef Staňo
bachelor study, supervised by Jakub Šimko

Abstract. Human bodies are not designed to sit; we evolved to walk and move at a constant pace. It turns out that nowadays people sit for a huge part of the day. Moreover, most of this time they are sitting in a wrong way. Our goal is to prevent future health problems of computer users caused by wrong sitting habits.

We propose a real-time posture tracking application that notifies the user in case of a wrong posture. The application monitors the user's activity on the computer and can therefore show notifications at the right time. Different notifications are shown according to the severity of the user's sitting posture – a taskbar icon, a change of the window's border, or the brightness of the screen.

We use three different approaches to achieve the highest accuracy in detecting the quality of the user's posture. We extract features from depth and RGB camera streams using the Histogram of Oriented Gradients algorithm; these features are the input for neural networks. We also detect points on different parts of the user's body and compare their depths with the user's calibrated image. The last method computes various depth-image features from a region of interest, the user's body, and these features are then fed into a neural network for posture classification. Experiments show that the depth comparison method achieved the best results. Some of the methods that use neural networks are good, but they depend on the quality of our training samples.
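A minimal sketch of the HOG-plus-neural-network pipeline follows; random arrays stand in for camera frames and the labels are made up, so it only shows the shape of the approach, not the actual classifier.

import numpy as np
from skimage.feature import hog
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def extract_features(frame):
    """Histogram of oriented gradients over one grayscale frame."""
    return hog(frame, orientations=8, pixels_per_cell=(16, 16), cells_per_block=(1, 1))

# Random 64x64 frames stand in for depth/RGB captures; 0 = good posture, 1 = bad posture.
frames = [rng.random((64, 64)) for _ in range(20)]
labels = rng.integers(0, 2, size=20)

X = np.array([extract_features(f) for f in frames])
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, labels)
print(clf.predict(X[:3]))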

Slovak Web-based Encyclopedia

Patrik Gajdošík
bachelor study, supervised by Michal Holub

Abstract. The Semantic Web is a big topic these days. Structured data provide a much better way of finding relations between entities that could never have been thought of before. The Slovak Web also contains a certain amount of information, and the goal is to make it accessible through the Semantic Web as well. One of the big players in the Web of Data is DBpedia, an ontology focused on extracting data from Wikipedia, which already has a solid ground for internationalization.

We propose a method that allows DBpedia to extract information from other language mutations of Wikipedia by automatically creating the mappings that DBpedia needs for the extraction process. Our method creates mappings of articles to the right classes and then mappings for the individual attributes found in infoboxes. For this task we use the already existing mappings of different language mutations. By using different metrics we focus on the quality rather than the quantity of the results.

Utilizing Vector Models for Processing Text on the Web

Ladislav Gallay
bachelor study, supervised by Marián Šimko

Abstract. Text processing is an important part of a plethora of tasks and is necessary for machines to understand content in order to provide advanced functionality such as recommendation or intelligent search. Our goal is to improve the lemmatization process in any given language by utilizing the Word2vec tool by Google. This tool represents words as vectors learned by a neural network from plain text. In our work we focus on using a large dataset consisting of plain text in the selected language, together with a small amount of prepared data, to create a successful word lemmatizer.

The contribution of our work will be the normalization of words in any given language while knowing almost nothing about the language. Current approaches require extracting a lot of plain text, cleaning the data, and analyzing and optimizing the process. We believe that utilizing word2vec will improve the lemmatization process and help in understanding any language just by using meaningful plain text in the given language. Currently we have the model trained on the Slovak National Corpus and are adjusting our method to achieve the best results.
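The sketch below shows the underlying idea on a toy English corpus, assuming gensim 4.x: the vector offset between a known inflected form and its lemma is applied to an unseen form to suggest its lemma. The real model is trained on the Slovak National Corpus.

from gensim.models import Word2Vec

# Tiny toy corpus; the real model is trained on a large plain-text corpus.
sentences = [
    ["cats", "are", "sleeping", "on", "the", "mat"],
    ["the", "cat", "sleeps", "on", "a", "mat"],
    ["dogs", "are", "barking", "in", "the", "yard"],
    ["the", "dog", "barks", "in", "a", "yard"],
] * 50

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=1, epochs=20)

# A known (inflected form, lemma) pair acts as the seed; its vector offset
# is applied to an unseen form to guess that form's lemma.
print(model.wv.most_similar(positive=["cat", "dogs"], negative=["cats"], topn=3))
# ideally 'dog' ranks near the top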

Linking Multimedia Metadata by Using Microblogging Network

Peter Gašpar
master study, supervised by Jakub Šimko

Abstract. With the huge usage of the Web, information has become an even more important part of people's lives. Whether we are interested in a picture, a video, a document, or a status on a social networking service (SNS), there is always an intention to discover something new. Many researchers are trying to find the best way to characterize information. In our study we focus on metadata in the domain of multimedia and television.

A big potential for building a metadata database is hidden in SNSs. In past years they have become an irreplaceable companion on the Web for most people. They provide nearly unlimited space to spread ideas and opinions. Moreover, many television companies use them to promote their programmes with articles and backstage photographs and videos. SNSs are also one of the most straightforward ways to get in touch with the TV audience. People's activity on public statuses creates an opportunity to reveal other interesting content.

In our approach, we propose an innovative method to interlink TV and SNSs. Our main goal is to extract and capture metadata from available content shared by the audience and TV companies. These metadata are supposed to enrich existing databases and provide an attractive extension for a huge audience. In our research we analyze the Facebook Pages of popular Slovak and worldwide TV channels.

Automatic Diacritics Reconstruction in Slovak Texts

Jakub Gedera
bachelor study, supervised by Marián Šimko

Abstract. There are a lot of Slovak texts on the Web written without diacritics. This complicates various tasks related to intelligent information processing, such as text categorization or metadata extraction. Also, many people are annoyed when they have to write with diacritics, so they simply ignore them. The aim of this work is to create a web application that automatically reconstructs diacritics in text.

The Slovak language contains words that allow more than one way of diacritic reconstruction. In solving this task we rely on context: using an n-gram language model, we look for the most likely variant of the sentence.
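A toy sketch of this idea: the candidate diacritic variants of each word are combined and the variant sequence with the best n-gram score wins. The candidate sets and bigram counts below are made up for the example.

from itertools import product

# Each ASCII word maps to its possible diacritic variants
# (in the real system these come from a dictionary of Slovak word forms).
candidates = {
    "mam": ["mám", "mam"],
    "rad": ["rád", "rad"],
    "caj": ["čaj"],
}

# Toy bigram counts standing in for the n-gram language model.
bigram_counts = {("mám", "rád"): 50, ("mám", "rad"): 2, ("rád", "čaj"): 30, ("rad", "čaj"): 1}

def score(variant):
    """Score a variant sentence by its bigram counts (add-one smoothing)."""
    s = 1.0
    for a, b in zip(variant, variant[1:]):
        s *= bigram_counts.get((a, b), 0) + 1
    return s

words = "mam rad caj".split()
best = max(product(*(candidates[w] for w in words)), key=score)
print(" ".join(best))  # expected: "mám rád čaj"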

This web service will be available as a web browser extension serving as a support tool for users. This way we can collect the texts reconstructed by users. We can then improve our database of words and also incrementally update the language model in order to improve diacritic reconstruction next time.

Support of Student’s Activity in an Educational System

Veronika Gondová
bachelor study, supervised by Mária Bieliková

Abstract. Motivation is one of the most important factors affecting a person's performance. When a person is positively motivated, they can organize their time more effectively, concentrate on the most important activities and take their duties more seriously. Students' motivation in web-based educational systems is particularly important.

It is in both the students' and the teachers' interest to use the time spent in such a system as effectively as possible. Feedback is very important in learning. For students it is very useful when they can check their knowledge through questions, and even more so when they can discuss them.

It happens frequently that students do have the opportunity to discuss topics and questions in the educational system, but they rather choose to use social networks and other channels instead. The reasons for not using the educational system vary; many worry that they would ask their questions in a wrong way and that it could have a negative impact on their final grades. In our project we want to improve students' motivation to give feedback. There are many methods which help to raise people's motivation. We focus on monitoring students' activity in the system and on using it for motivation and feedback. We will motivate students via an improved score which reflects their activity. The results will be evaluated in the educational system ALEF.

Adaptive Collaboration Support in Community Question Answering

Marek Grznár
master study, supervised by Ivan Srba

Abstract. With the development of Web 2.0, there is a novel option to obtain required information by asking a community in Community Question Answering (CQA) systems, such as Yahoo! Answers and Stack Overflow.

The existing CQA systems, despite their increasing popularity, fail to answer a significant number of questions in the required time. One option for supporting cooperation in CQA systems is the recommendation of questions to users who are suitable candidates for providing correct answers (so-called question routing). Various methods have been proposed so far to find answerers for questions in CQA systems, but almost all studies depend heavily on users' previous activities in the system (QA data).

In our work, we focus on utilizing users' context as a way to support question routing. We propose a question routing method which analyses users' non-QA activities from external services and platforms, such as blogs, micro-blogs and social networks, in order to better identify suitable users for answering new questions. This solution allows us to also involve users with no or minimal previous activity in the system.

Crowdsourcing for Large Scale Texts Annotation

Jozef Harinek
master study, supervised by Marián Šimko

Abstract. There is a huge amount of information stored in natural language on the Web and in various text documents. In order to process it better, we need to pre-process the text into a machine-understandable form. To achieve this, we can annotate the texts at various layers, starting from the morphological layer and going up to the contextual layer.

In our work we focus on syntactic annotation of large-scale texts by employing crowdsourcing principles. First experiments showed promising results: we were able to obtain the correct solution for about 85 percent of sentences, even with a relatively small number of annotations per sentence – about 17 on average (ranging from 12 to 20 per sentence). The proposed method is verified in a software prototype designed especially for this purpose. The aim of our work is to explore the possibilities crowdsourcing offers in creating syntactic annotations for the Slovak language, to assess the quality of annotations created by our method, to identify the power of the crowd in this field, and to explore the possible use cases that can benefit from such a dataset.

Analysing User Gaze on the Web

Patrik Hlaváč
master study, supervised by Marián Šimko

Abstract. We propose a method of reading detection from gaze data. Eye-tracking devices provide irreplaceable information about a user's gaze. This work deals with the possibilities of identifying user interaction in an educational system. Our algorithm takes into account the user's fixation data and maps their coordinates onto individual word elements. These are then processed with respect to their relative word distance.

The rule-based solution works by considering the sequences in the order of their occurrence. Unlike studies that calculate distances in screen points that the eyes moved across, we consider the distance between words in the sequence of words.
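A minimal sketch of the word-distance rule follows; the word bounding boxes and thresholds are illustrative only, not the actual parameters of our method.

# Map fixations to word indices via bounding boxes, then flag the sequence
# as "reading" when most consecutive jumps advance by a small number of words.
words = [  # (index, x_min, x_max, y_min, y_max) for one line of text
    (0, 0, 40, 0, 20), (1, 45, 90, 0, 20), (2, 95, 150, 0, 20), (3, 155, 210, 0, 20),
]

def word_at(x, y):
    for idx, x0, x1, y0, y1 in words:
        if x0 <= x <= x1 and y0 <= y <= y1:
            return idx
    return None

def is_reading(fixations, max_jump=3, min_ratio=0.7):
    indices = [w for w in (word_at(x, y) for x, y in fixations) if w is not None]
    jumps = [b - a for a, b in zip(indices, indices[1:])]
    if not jumps:
        return False
    forward = sum(1 for j in jumps if 1 <= j <= max_jump)
    return forward / len(jumps) >= min_ratio

print(is_reading([(10, 10), (60, 12), (110, 9), (180, 11)]))  # True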

Identification of Similar Entities in the Web of Data

Michal Holub
doctoral study, supervised by Mária Bieliková

Abstract. Publishing structured, machine-readable data on the Web is the foundation of the Semantic Web vision. It also promotes the usage of open standards so that intelligent applications can consume and process the data easily. Today, many vendors provide such data. Moreover, it is actively being linked together, thus creating the Linked Data Cloud, also known as the Web of Data. Currently, there are around 500 interlinked datasets covering a wide range of domains.

In our research we focus on discovering relationships between entities in these datasets. Usually, entities represent real-world objects. Discovering relationships aims at aiding adaptive web-based applications. We are mainly interested in finding similar and identical entities. These can be either in one dataset (where the issue is cleaning the data before publishing it on the Web), or between two or more datasets (where we can benefit from additional information about the same entity found in other datasets). Our method has various uses: 1) deduplication of data, 2) discovering similar entities (usable in search and recommendation tasks), or 3) data enriching and integration tasks.

Our method is based on comparing the values of attributes using various algorithms, comparing whole entities using graph algorithms, and putting it all together in order to compute the similarity between a given pair of entities. Moreover, the method uses machine learning so that it can set appropriate weights for its individual components. It can also automatically determine the right similarity threshold based on the underlying data, so it is able to adapt itself to various datasets.
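To illustrate the attribute-comparison part only, here is a small sketch with a hand-picked string similarity and hand-set weights; in the method itself the weights and the threshold are learned from the data.

from difflib import SequenceMatcher

def attr_similarity(a, b):
    """String similarity of two attribute values (0..1)."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()

def entity_similarity(e1, e2, weights):
    """Weighted combination of per-attribute similarities."""
    score = sum(w * attr_similarity(e1.get(attr, ""), e2.get(attr, "")) for attr, w in weights.items())
    return score / sum(weights.values())

e1 = {"name": "Bratislava", "country": "Slovakia", "population": "417000"}
e2 = {"name": "Bratislava (city)", "country": "Slovak Republic", "population": "415000"}
weights = {"name": 0.5, "country": 0.3, "population": 0.2}  # illustrative; learned in the real method

sim = entity_similarity(e1, e2, weights)
print(sim, sim > 0.7)  # compared against an (illustrative) similarity threshold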

User Reputation in Community Question Answering

Adrián Huňa
bachelor study, supervised by Ivan Srba

Abstract. Community Question Answering systems (e.g. Yahoo! Answers, Stack Overflow) have gained growing popularity in recent years. With the increasing amount of user-generated content, a problem has arisen of identifying which community members are skilled and reliable users. The existing approaches estimate user reputation mainly according to the overall user activity in the system, while the quality of users' contributions is considered only secondary.

To address this drawback, our main goal is to estimate user reputation with a focus on the quality of users' contributions. We proposed a reputation schema that takes into consideration the amount of carried-out activity as well as expertise. Our method works with an objective measure of quality – question difficulty based on the time until the first answer is added – as well as with community feedback – the utility of questions and answers. To take into account differences between various topics in CQA systems, all the values are normalized.

The calculated reputation was compared with four baseline methods in an experiment on a dataset from the Stack Exchange platform. The experimental results showed a higher precision achieved by our approach and confirm the important role of contribution quality in the estimation of user reputation. Interestingly, the results suggest that we should completely eliminate the factor of user activity, as this variant performed best.

Web User Behaviour Prediction

Ondrej Kaššák
doctoral study, supervised by Mária Bieliková

Abstract. The task of web user behaviour modelling and prediction is based on mining and pre-processing a large amount of data describing the user, the web site and the user's interaction with the site. These data are then used to learn information about the user's behaviour and habits. Information about web browsing can be extracted from different sources of various complexity and, depending on the actual source, it often needs to be pre-processed. There exist three basic kinds of data describing user behaviour – web structure, page content and user sessions. To be able to improve and potentially personalize the system content, we have to know as much as possible about users' characteristics and browsing habits and model the user's behaviour.

The possibility to estimate a user's future behaviour can be very helpful in various situations, because it gives us the advantage of reacting to imminent consequences in advance. As an example, consider the situation when a user is about to leave the web site very soon. If we are able to predict this state, we can offer him/her interesting content to keep him/her in the system longer. In the case of commercial systems such as e-shops, this increases the chance that the user will buy something.

Game-based Support of Online Learning of Programming

Peter Kiš
master study, supervised by Jozef Tvarožek

Abstract. Students' lack of motivation is one of the main barriers to efficient learning. In the case of online learning, natural human and social aspects are also suppressed, so the lack of motivation causes even worse results. Therefore, research is still looking for new ways to increase students' motivation for learning in an online environment. Games and gaming principles improve entertainment and increase the overall involvement of students, and both are increasingly used in the online environment. The use of games and game principles, graphical visualization and entertaining content for teaching programming opens the way to exploring the impact of these elements on the learning process, the speed of acquiring new knowledge, and the ability to select the most appropriate procedures for solving algorithmic problems.

In our work we explore existing solutions that use games as a tool to support online learning in educational systems. We analyze the effectiveness of game-supported online learning compared to traditional teaching methods. We are working on a method to support online learning using games and to improve the current process of teaching programming at the faculty. The proposed solution will be verified with a software prototype – a game intended to encourage the learning process of basic programming in C.

Supporting Domain Model Authoring

Matej Kloska
master study, supervised by Marián Šimko

Abstract. Nowadays, we can see the Web as a huge repository of information. The quantity of information is so huge that we generally speak about information overload. In order to effectively bring people to the information on the Web, several approaches were created, one of which is adaptation (of content, presentation/form, etc.). Adaptation is now almost everywhere on the Web – in email, on web sites, on tablets and on mobile phones. Adaptation relies on models used for storing the data and on various methods of processing the information they contain.

The basic problems which may be encountered when working with models are unfamiliarity with the domain model, unclear relationships in the model, flooding the user with automatically generated entities, cumbersome navigation, and a non-intuitive user interface for working with models. The aim of our work is to propose a method for supporting domain model authoring. The output of our method should be a framework for a user-friendly interface addressing all of the above-mentioned issues.

Personalized Recommendation Using Context for New Users

Róbert Kocian
master study, supervised by Michal Kompan

Abstract. The very first user interaction within a personalized system is crucial from the recommender system and user modeling point of view. This activity is critical because in these moments the user forms a relationship and an opinion, which are important for his or her further use of the system. If the user is new to the system, we have no information about his or her preferences, and thus no, or only trivial, personalized recommendations can be made.

Today we are experiencing a huge increase in social networks, where users are grouped according to their interests, work or relationships. These attributes can be used for recommendations to new users. We can obtain a huge amount of information from related or similar users or from other systems, which can be used to increase the quality of recommendations for the new user. We can also consider the social context obtained from other systems and applications.

In our work we analyze the current approaches to the new user problem in the context of different types of personalized recommendation. We explore the possibility of obtaining additional information about the new user's context from other systems. The aim of our work is to design methods for personalized recommendation with an emphasis on solving the new user problem, enhanced by user context from related and similar users and documents. The context used in our work to tackle matrix sparsity consists of the user's related school works and context documents from the school library system Annota.

Extracting Keywords from Movie Subtitles

Matúš Košút
bachelor study, supervised by Marián Šimko

Abstract. In our work we aim at keyword extraction from movie subtitles. Keywords and key phrases, although lacking context, can be very helpful in finding, understanding, organising and recommending content. Generally, they are used by search engines to help find relevant information. With the rising amount of information available on the Web, keywords are becoming more and more important, yet it is ever harder to determine keywords for all content manually, so we focus on automatic keyword extraction.

Movies and video content are becoming massively available and widespread. The ability to automatically describe and classify videos has a vast domain of application. In our work we use movie subtitles as a source of information, which seems to be more efficient compared with video and audio analysis. The main goal of our work is to design a method able to exploit the specifics of subtitles. The first part of the method focuses on pre-processing, which handles timing, information for hearing-impaired persons and tags included in the subtitles. The second part divides the subtitles into conversations according to the speed of speech (words per minute) and the gaps detected between conversations. Scored conversations are then used for keyword extraction. In this way we create a set of keywords that can be used by recommendation and search engines.
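The sketch below shows only the conversation-splitting and speech-rate scoring steps on a few made-up subtitle cues; the gap threshold is an assumption for the example.

# Subtitle cues as (start seconds, end seconds, text); grouped into conversations
# whenever the gap between consecutive cues exceeds a threshold, then each
# conversation is scored by speech rate (words per minute).
cues = [
    (0.0, 2.0, "Where are we going"),
    (2.5, 4.0, "To the old harbour"),
    (20.0, 23.0, "The ship leaves at dawn"),
    (23.5, 26.0, "Then we have no time to lose"),
]

def split_conversations(cues, max_gap=5.0):
    conversations, current = [], [cues[0]]
    for prev, cue in zip(cues, cues[1:]):
        if cue[0] - prev[1] > max_gap:
            conversations.append(current)
            current = []
        current.append(cue)
    conversations.append(current)
    return conversations

def words_per_minute(conversation):
    words = sum(len(text.split()) for _, _, text in conversation)
    duration = conversation[-1][1] - conversation[0][0]
    return 60.0 * words / duration if duration > 0 else 0.0

for conversation in split_conversations(cues):
    print(round(words_per_minute(conversation), 1), [text for _, _, text in conversation])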

Peter Krátky
doctoral study, supervised by Daniela Chudá

Abstract. Mouse data representing cursor movements or button clicks are employed in applications such as user interface evaluation methods, implicit feedback gathering methods and, recently, mouse-based biometric systems. Especially the last-mentioned application requires high-quality mouse data for its underlying methods. In our work, we analyze the nature of mouse data for usage in various precision-demanding applications. No method can be designed well if the nature of the data is not known to the researcher.

We provide an overview of what these data logs look like, what the expected log size is, and what issues can be expected in the data. In particular, mouse movement generates a great amount of data with coarse positions. We propose pre-processing techniques in order to enhance the movement position data. Mouse events are very low-level information that should be aggregated into higher-level actions, so we suggest an aggregation level of events before calculating movement features.

Automatic Estimation of Developer’s Expertise

Eduard Kuric
doctoral study, supervised by Mária Bieliková

Abstract. Evaluating the expertise of developers is critical in software engineering, in particular for effective code reuse. In a software company, the technical and expert knowledge of employees is not usually represented in a unified manner and is difficult to measure or observe directly. The level of a developer's expertise is problematic to determine or estimate automatically. For example, to show a developer's expertise with a technology (library) exactly, we would need to have the developer solve a test. However, it is often a problem to motivate developers to take a test, and people have different standards for judging the degree of expertise. Therefore, our approach is based on the automatic estimation of relative expertise, taking other developers into consideration and comparing them with each other within the company.

Our new idea is to automatically establish a developer's expertise based on monitoring his or her working activities while coding in the integrated development environment (IDE), and on analyzing and evaluating the resulting source code he or she creates and commits to the local repository. By applying our approach we are able to observe and evaluate different indicators. For example, we can spot a developer who often copies and pastes source code from an external source (the Web). The source code contributions of such a developer can be put in relation to the software project; moreover, this can reveal a reason for his or her frequent mistakes or low productivity in comparison with other developers.

Gathering Tabbed Browsing Behaviour as User Feedback on the Adaptive Web

Martin Labaj
doctoral study, supervised by Mária Bieliková

Abstract. Everyone's daily activity is gradually being more and more supported on the Web, if not entirely performed through its means – it includes everything from grocery shopping, through education, employment, communication and learning, to entertainment. Web systems supporting such aspects of human daily activity are becoming even more adaptive than before. Any web system first needs to know the individual users through their actions in order to facilitate the adaptation. In our work, we focus on observing, logging, analysing and utilizing both implicit and explicit user feedback, both within the boundaries of a single web-based system and on the open Web across heterogeneous systems. Apart from explicit feedback questions presented to the user at appropriate moments during his or her work, aimed at obtaining better and more extensive explicit evaluations, one particular area of our research lies in observing the user's movement across web pages – parallel browsing.

Before we can even analyse and model this behaviour and use it for modelling the user, improving domain models, or recommending resources to users, we need to capture it. In one approach, using a tracking script, we can easily observe every user of our application without additional engagement on his or her side, but only the visits to, and switches between, pages stemming from a limited set are observed. We used this approach to recommend learning objects relevant to exercises being currently solved in a learning system.

In another approach, observing the user's browser, for example through an extension, we see the user's every step across various web applications, even when the user leaves our application to look for additional information in other web systems, but we only see the actions of a limited group of users who choose to participate. We previously used such data for automatically enriching learning content with links to external resources.

Evaluating Usability of Applications by Eye Tracking

Vladimír Ľalík
master study, supervised by Jakub Šimko

Abstract. At the present time, the success of software depends not only on functional aspects; equally important is the experience the application provides to users while they effectively and efficiently achieve specified goals. Therefore, usability evaluation is an important part of software development. When we need to determine the level of usability of an application, the best way is to let users interact with the application while we watch their behavior, ask questions and record their activity.

Usability testing with users is time- and money-consuming; therefore, our effort is to obtain as much information as possible from users. Nowadays we can use eye tracking to obtain more information on what draws users' attention or where users search for information in the application. These data can provide a different perspective on what attracts the attention of users or where they were trying to find information. We can obtain a large set of data from eye tracking, but the analysis of these data is extremely time-consuming, because no established correlation scheme exists to link eye-tracking patterns to specific usability problems.

We analyze the automation of the process of evaluating user interfaces with the data obtained from eye tracking. The usability of software is a complex issue full of different influences; therefore, we select one aspect of usability for which we design an automated evaluation method. This method can save a lot of the time that experts must otherwise spend on manual analysis of eye-tracking data.

Personalized Scalable Recommendation System

Adam Lieskovský
master study, supervised by Michal Kompan

Abstract. Personalized recommendation is nowadays widespread and employed all over the Internet. Thanks to these systems, users encounter ads which actually interest them, and they can discover new music or movies personalized to their taste. This greatly helps to improve user satisfaction and reduces information overload and complexity for users. The requirements for scalability of recommender systems grow hand in hand with the continuous growth of Internet traffic generated by users and their data.

In our work we focus on the question of a suitable architecture and software design for scalable recommender systems. We plan to analyze and choose the best approaches to user modeling, data storage, work distribution and real-time recommendation generation. We intend to create a universal and modular recommender system which can be further enhanced and configured for use in particular domains. We plan to evaluate our method and system first with offline experiments using standard datasets and secondly with deployment in a production environment.

Content Recommendation from Archives of Questions Answered in Communities

Viktória Lovasová
master study, supervised by Ivan Srba

Abstract. Community Question Answering (CQA) sites such as Yahoo! Answers or Stack Overflow have become valuable platforms to create, share, and seek a massive volume of human knowledge. The task of question retrieval in CQA aims to resolve one’s query directly by finding the most relevant questions (together with their answers) from an archive of past questions.

Archives of questions, however, provide another potential which has not yet been fully explored – recommending solved questions that may be useful for users (e.g. to expand their current knowledge or to cover topics they are interested in). We propose a method for personalized recommendation of solved questions considering the user's interests – the questions he or she frequently views, answers, asks, comments on and rates.

We implement and evaluate the method on data from Stack Overflow – a CQA system for professional programmers – and in the system Askalot, which is used by students of the Faculty of Informatics and Information Technologies in Bratislava and which will allow us not only an offline verification but also a live experiment with students.

Game for Connection of Metadata and Sources in the Domain of Multimedia

Aleš Mäsiar
bachelor study, supervised by Jakub Šimko

Abstract. At present, gathering metadata is necessary because of its use for effective information processing on the Web. There are three particular approaches to metadata acquisition: expert work, crowd work and automated machine-based methods. In our work we focus on crowdsourcing, because it can produce high quantities of results which are still of reasonable quality. We use the concept of games with a purpose, which serve as a means of entertainment and relaxation for the player while producing useful output for the computer.

We present a game with a purpose which acquires metadata for videos already existing on the Web. The player's goal in the game is to use the provided search engine to find the video which is playing on the screen without any describing information displayed. Our goal is to make the game challenging and entertaining for the player while producing quality metadata for the videos.

Determining the Parts of Speech in Slovak Language

Dalibor Mészáros
bachelor study, supervised by Márius Šajgalík

Abstract. In the last few years, machine learning has made big progress and the majority of approaches have moved from manually created rule-based systems to data-driven approaches. That is also the case for part-of-speech tagging, which globally maintains 97.3% accuracy, with a strong tendency for this to be surpassed in the near future. Unfortunately, these results are reported mainly for English, the most common language in the world. There are still only a few solutions and methods for other languages, especially for Slovak, for which this field is nowadays weakly represented.

In this project, we focus on creating a software solution for part-of-speech tagging of the Slovak language. We try to adapt the current state-of-the-art approaches for English, which are based on conditional random fields. We evaluate our approach on an annotated dataset obtained from the Slovak Academy of Sciences. The main goal of this project is to train a sufficiently accurate model using this dataset and a basic feature set supported with vector representations of Slovak words.
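
For illustration, a minimal sketch of a typical CRF feature template for an inflective language such as Slovak follows; the concrete features (suffixes, capitalization, neighbouring words, optional embedding dimensions) are assumptions, not the project's actual feature set, and the resulting feature dictionaries would be passed to a CRF toolkit together with the gold tags from the annotated corpus.

```python
def word_features(sentence, i, word_vectors=None):
    """Feature dictionary for the i-th token of a sentence (a hedged sketch of
    a common CRF feature template, not the project's actual feature set)."""
    word = sentence[i]
    features = {
        "word.lower": word.lower(),
        "suffix3": word[-3:],        # Slovak is inflective; suffixes carry POS cues
        "suffix2": word[-2:],
        "is_capitalized": word[0].isupper(),
        "is_digit": word.isdigit(),
        "prev_word": sentence[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": sentence[i + 1].lower() if i < len(sentence) - 1 else "<EOS>",
    }
    if word_vectors is not None and word.lower() in word_vectors:
        # Optionally add dimensions of a (hypothetical) Slovak word embedding.
        for d, value in enumerate(word_vectors[word.lower()]):
            features[f"emb_{d}"] = value
    return features

def sentence_features(sentence, word_vectors=None):
    # The resulting list of feature dicts would be fed to a CRF implementation
    # (e.g. CRFsuite) together with the gold tags from the annotated dataset.
    return [word_features(sentence, i, word_vectors) for i in range(len(sentence))]
```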

Activity-based Search Session Segmentation

Samuel Molnár
master study, supervised by Tomáš Kramár

Abstract. Identifying the search goal behind a user's query is a challenging task for search engines. Not only does it improve personalisation, it also helps to make sense of user queries and interests. Knowledge of the user's goal and of all queries supporting it helps the search engine to adjust the ordering of search results, recommend useful ads or guide the user towards meaningful documents. To improve goal identification, the engine combines further features of the user's search context in order to identify the user's preferences and interests. However, most features utilized for goal identification involve only lexical analysis of the user's queries and time windows represented as short periods of user inactivity.

In our work, we focus on utilizing user activity during search to extend the existing lexical and time features. By analyzing search activity such as clicks and dwell time on search results, we better understand which results are relevant for the user's current information need. Thus, we utilize the user's implicit feedback to determine relevance between queries through the search results that are significant for the user: the longer the user spends browsing a selected search result, the more relevant the result should be. Therefore, we propose an approach to search session segmentation that utilizes lexical features of queries and clicked results along with the time the user spent browsing them.
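
A minimal sketch of such a segmentation decision, combining lexical query similarity, dwell-time-weighted overlap of clicked results and the time gap between queries, might look as follows; the weights and the threshold are illustrative assumptions, not tuned values.

```python
def lexical_similarity(query_a, query_b):
    """Jaccard overlap of query terms (a simple stand-in for the lexical features)."""
    a, b = set(query_a.lower().split()), set(query_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def result_similarity(clicks_a, clicks_b):
    """Overlap of clicked results weighted by dwell time (seconds).

    `clicks_*` map result URLs to dwell times; longer dwell -> stronger signal.
    """
    shared = set(clicks_a) & set(clicks_b)
    total = sum(clicks_a.values()) + sum(clicks_b.values())
    if total == 0:
        return 0.0
    return sum(clicks_a[u] + clicks_b[u] for u in shared) / total

def same_session(q1, q2, time_gap_s, w_lex=0.4, w_res=0.4, w_time=0.2,
                 max_gap_s=1800, threshold=0.3):
    """Decide whether two consecutive queries belong to the same session.

    Queries are dicts like {"text": "...", "clicks": {url: dwell_seconds}}.
    """
    time_score = max(0.0, 1.0 - time_gap_s / max_gap_s)
    score = (w_lex * lexical_similarity(q1["text"], q2["text"])
             + w_res * result_similarity(q1["clicks"], q2["clicks"])
             + w_time * time_score)
    return score >= threshold
```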

Róbert Móro
doctoral study, supervised by Mária Bieliková

Abstract. The term exploratory search was coined for searches that are open-ended, start with ill-defined information needs which can change over time as the users get more acquainted with the domain and the problem at hand, and which require the use of different search strategies. A prime example of this type of search, often conducted by researcher novices, is exploration of a new domain and search for the state of the art.

In order to support exploratory search and navigation on the Web and in digital libraries of research articles, we have proposed a method of exploratory navigation based on navigation leads, i.e., important terms (automatically) extracted from the documents that help the users to filter the information space. We define a notion of the navigational value of a keyword, which reflects the information subspace covered by the lead, i.e., the size of the subspace, its relevance for the user (his current query) as well as how well the term represents the subspace, and we use it in the process of lead selection. We employ clustering based on topic modeling using LDA (Latent Dirichlet Allocation) in the computation of the navigational value and evaluate our proposed approach in the web-based bookmarking system Annota by means of a quantitative user study.
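
The following sketch only illustrates how the three factors named above (coverage of the subspace, relevance to the current query and representativeness of the term) could be combined into a single navigational value; the multiplicative combination and the log-scaled coverage are assumptions, not the exact formula of the proposed method.

```python
import math

def navigational_value(subspace_size, corpus_size,
                       relevance_to_query, representativeness):
    """Combine the three factors into one score for lead selection.

    subspace_size       - number of documents the candidate lead covers
    corpus_size         - number of documents in the whole collection
    relevance_to_query  - e.g. similarity of the term to the current query, in [0, 1]
    representativeness  - e.g. probability of the term in its LDA topic, in [0, 1]
    """
    coverage = math.log(1 + subspace_size) / math.log(1 + corpus_size)
    return coverage * relevance_to_query * representativeness

# Candidate terms would then be ranked by this value and the top-k picked as leads.
```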

Ontology Learning from the Web

Matúš Pikuliak
master study, supervised by Marián Šimko

Abstract. Mining the content of the Web is a field of study that is gaining the attention of researchers around the world. The vast corpus of natural language data found there contains most of human knowledge. Natural language processing (NLP) is the task of understanding the meaning behind texts created by humans. One of the approaches to this problem is creating contextual maps, also called ontologies. Ontologies are knowledge-based structures consisting of concepts and relations between them.

In our work we are trying to discover new relations between concepts in ontologies. This task is usually referred to as relationship extraction. Current state-of-the-art ontologies consist of millions of concepts and relations. Some of these ontologies were built automatically from the structured and semi-structured data on the Web. Our goal is to refine the relationship layers of some of these ontologies using statistics-based methods. Our objective is to use unstructured data from reliable Web sources, such as article texts from Wikipedia, to find relations that are not present in current ontologies.

The approach we primarily pursue in our research is based on pair-pattern matrices built from continuous-space language models. These matrices are best suited for measuring semantic similarity between relations. Possible applications of pair-pattern matrices in ontology learning are, for example, relation classification, relational search, analogical mapping or measuring relational similarity. Our main goal is to enhance existing ontologies with new relations found in our corpus of natural language data.
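
As a hedged illustration of the underlying idea, a relation between two concepts can be represented as the offset between their word vectors in a continuous-space language model and compared to other relations by cosine similarity; the embedding source and the example pairs below are assumptions, not part of the described method.

```python
import numpy as np

def relation_vector(embeddings, word_a, word_b):
    """Represent the relation between two concepts as an embedding offset."""
    return embeddings[word_b] - embeddings[word_a]

def relational_similarity(embeddings, pair_1, pair_2):
    """Cosine similarity of two relation vectors, e.g. ("Paris", "France")
    vs. ("Bratislava", "Slovakia"); a high value suggests the pairs stand
    in a similar relation."""
    r1 = relation_vector(embeddings, *pair_1)
    r2 = relation_vector(embeddings, *pair_2)
    return float(np.dot(r1, r2) / (np.linalg.norm(r1) * np.linalg.norm(r2)))

# `embeddings` is assumed to be a dict mapping words to numpy vectors obtained
# from a continuous-space language model (e.g. trained on Wikipedia texts).
```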

Metadata Generation with a Game with a Purpose in the Multimedia Domain

Martin Polakovič
bachelor study, supervised by Jakub Šimko

Abstract. Data management and manipulation requires machines to understand the content and its context. For this to be possible, a homogeneous metadata layer must first be generated on top of the content, identifying its properties and relationships. Acquiring metadata is no easy task, as it paradoxically requires intricate knowledge and understanding of the very content in the first place.

Metadata acquisition can be driven by experts, the crowd or machines. The crowd-driven area also includes games with a purpose (GWAPs), which have players play and thereby indirectly solve a problem transformed into game mechanics. GWAPs are now an acknowledged solution to metadata acquisition.

We want to build an engaging GWAP solution for semantics acquisition over specific image datasets. To gather image semantics, we devise a 2D board game: a minimalistic multiplayer output-agreement GWAP. By having players manipulate the positions of images on a 2D board, we allow them to input image semantics into the game. Players inadvertently validate each other's actions by playing on the same board simultaneously. Estimating image similarity and semantics is then possible from the relative image positions at the end of a game: by analyzing the state of the board, we can draw conclusions about the relationships between the image resources.

Extracting Pure Text from Web Pages

Helmut Posch
bachelor study, supervised by Michal Kompan

Abstract. Extraction of pure text from web pages is the first step towards successful analysis and further processing of web page content. The pure text of a web page can be useful for recommendation or search systems. It can also be displayed on mobile devices, which do not have enough screen space for irrelevant information.

Nowadays, extraction of pure text from web pages is a nontrivial task. A human can recognize relevant content by understanding the text. This is not achievable in machine processing of web pages because of the diversity of web page layouts: each web page can have a different structure and content. Web page developers insert elements into their pages which are not interesting for the user, such as advertisements, navigation bars, headers and footers.

My approach to extracting pure text from web pages is based on the nature of the text, especially on the intensity of punctuation occurrence. According to my own statistics, the main content of a web page contains a dominant amount of punctuation (especially commas). Many approaches use the ratio of anchor text to normal text to detect relevant content in a part of the web page. My approach tries to improve on this: instead of looking for anchor text, it looks for special characters (punctuation). After evaluating the parts of the web page, I choose the parts with an above-average score. The last step is verifying the position of the chosen parts in the structure of the website.
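
A minimal sketch of this scoring scheme, assuming the page has already been split into candidate blocks with their text and anchor text, could look as follows; the exact punctuation set and the way blocks are obtained from the page are left open here.

```python
import re

PUNCTUATION = re.compile(r"[,.;:!?]")

def score_block(text, anchor_text):
    """Score one candidate block: reward punctuation density (commas above all)
    and penalize a high share of link (anchor) text."""
    words = max(len(text.split()), 1)
    punct_density = len(PUNCTUATION.findall(text)) / words
    link_ratio = len(anchor_text.split()) / words
    return punct_density * (1.0 - min(link_ratio, 1.0))

def extract_main_content(blocks):
    """Keep blocks scoring above the page average.

    `blocks` is a list of (text, anchor_text) pairs, e.g. one pair per
    DOM element of the page.
    """
    scores = [score_block(t, a) for t, a in blocks]
    if not scores:
        return []
    avg = sum(scores) / len(scores)
    return [t for (t, _), s in zip(blocks, scores) if s > avg]
```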

Utilization of Information Tags in Software Engineering

Karol Rástočný
doctoral study, supervised by Mária Bieliková

Abstract. Information tags are a subset of descriptive metadata that assign structured information to other information artefacts (e.g., to a webpage paragraph or to a source code line). In general, information tags have been proposed to model properties of tagged information (e.g., webpages). Additionally, the information tag model is based on the standardized Open Annotation Data Model, so information tags can be shared among software systems. Due to these properties, we utilize information tags for modelling source code files and provide the first basic tools which utilize a source code model based on information tags.

For modelling source code files and supporting tools, we utilize all categories of information tags: (i) user information tags, which support the code review process (TODO, FIXME, CODEREVIEW, REFACTOR, …); (ii) content-based information tags, obtained by analysis of source code via SonarQube; (iii) user activity information tags, created by analysis of developers' activities, e.g. implicit dependencies; and (iv) aggregating information tags, which aggregate information from multiple information tags, e.g. facet tags for supporting source code search.

Currently, we are finalizing the last implementations of prototypes that support software development by utilizing information tags. In future work we plan to conduct a set of experiments to demonstrate the usability of information tags in software engineering.

Optimizing Map Navigation Using Contextual Information

Roman Roštár
master study, supervised by Dušan Zeleník

Abstract. When solving and optimizing logistic problems, most widely used methods focus on attributes like the distance or duration between two points on a map. By using services like the Google Distance Matrix, we can enrich the optimization algorithm with information on current traffic in the form of durations that increase when there are traffic jams caused by peak hours or accidents. All of this information is based on observations made in real time. Although it can be precise in larger cities, where there is enough traffic to cover the map in real time, there are many places in the world where the traffic simply cannot be identified this way. Moreover, when travel times need to be predicted ahead of time, it is impossible to rely only on the current state of traffic.

The idea of our project is to augment the duration or distance data between points on a map with contextual information about the environment. The theory is that traffic jams, and traffic dynamics overall, show up in patterns. We want to analyse weather conditions, location and time to build a traffic dynamics model which could improve the relevance of traffic prediction. To confirm our hypothesis, we will use a realistic dataset of car positions from a delivery company and compare the results predicted by our model with the duration data retrieved from the Google Distance Matrix service and with the actual delivery arrival times.
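
Purely as an illustration, a contextual traffic model of this kind could be sketched as a regression over features such as hour, weekday, weather and route length; the feature names and the choice of a random forest are assumptions, not the project's final design.

```python
from sklearn.ensemble import RandomForestRegressor

def build_features(records):
    """records: list of dicts like
    {"hour": 8, "weekday": 1, "temperature": 4.0, "rain_mm": 1.2,
     "route_km": 12.5, "duration_s": 1830} (hypothetical field names)."""
    X = [[r["hour"], r["weekday"], r["temperature"], r["rain_mm"], r["route_km"]]
         for r in records]
    y = [r["duration_s"] for r in records]
    return X, y

def train_traffic_model(history):
    """Fit a regression model predicting travel duration from context."""
    X, y = build_features(history)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X, y)
    return model

# Predictions for future deliveries could then be compared with durations from
# the Google Distance Matrix service and with the actual arrival times.
```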

Evaluation of User Explicit Feedback Based on Implicit Feedback Measurement

Metod Rybár
master study, supervised by Mária Bieliková

Abstract. Implicit feedback can provide us with information that helps us evaluate online questionnaires. Using this information, we could reduce the amount of necessary explicit feedback and better evaluate the results. This would allow us to simplify the questionnaires and also improve the quality of the results. Explicit information from the user may be incomplete or misleading; this is currently dealt with by complicated questionnaires and forms that ask the same question multiple times in different ways to avoid obtaining misleading information.

Using implicit measures such as pupil dilation, eye tracking, galvanic skin response or skin temperature, we could predict whether the user is trying to deceive us or is lying. Using these implicit measures while users fill out online questionnaires would allow us to reduce the number of questions needed to ensure the required quality of the results, and would also significantly improve the results of the questionnaires by eliminating deceptive and false answers from our result set.

Modelling User Interests in Latent Feature Vector Space based on Document Categorisation

Márius Šajgalík
doctoral study, supervised by Mária Bieliková

Abstract. User modelling includes modelling various characteristics like user goals, interests, knowledge, background and much more. However, evaluation of each of these characteristics can be very difficult, since every user is unique and objective evaluation of each modelled feature often requires a huge amount of training data. That requirement cannot be easily satisfied in a public research environment, where personal information is too confidential to be publicly accessible. In a common research environment, we are confronted with training the model on only a small sample of data, which mostly requires humans to evaluate the model manually, which is often very subjective and time-consuming.

We examine a novel approach to evaluating user interests by formulating an objective function on the quality of the model. We focus on modelling user interests in the form of keywords aggregated over web pages from the user's browsing history. By treating users as categories, we can formulate an objective function to extract user interests represented as discriminative words that distinguish the user within a given community, which effectively avoids extracting words that are too generic.
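
A rough sketch of this idea follows: each user's aggregated browsing history is treated as one "category", and words are scored by how frequent they are for the user and how rare they are across other users. This TF-IDF-like scoring is a stand-in for the objective function described above, not its actual formulation.

```python
import math
from collections import Counter

def discriminative_words(user_docs, top_n=20):
    """Extract discriminative interest words per user.

    `user_docs` maps a user id to the list of tokens aggregated over the
    web pages from that user's browsing history.
    """
    n_users = len(user_docs)
    # In how many users' histories does each word occur?
    user_freq = Counter()
    for tokens in user_docs.values():
        user_freq.update(set(tokens))

    interests = {}
    for user, tokens in user_docs.items():
        tf = Counter(tokens)
        # High score: frequent for this user, rare among the other users.
        scores = {w: tf[w] * math.log(n_users / user_freq[w]) for w in tf}
        ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
        interests[user] = [w for w, _ in ranked[:top_n]]
    return interests
```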

Time-recommendation System for Students

Matúš Salát
bachelor study, supervised by Jozef Tvarožek

Abstract. Learning at university can be troublesome for first-year students. Teaching methods different from what they are used to from high school, together with the amount of learning, may force them to drop out. Students need to organize their deadlines: when they should start studying for a midterm test, or when their project is due.

The many sources of information often make students misunderstand what teachers want from them. This often leads to forgotten obligations and subsequent failure. Students currently only seldom share with their classmates useful tricks and hints on how to plan their time effectively. Good time planning is the key to successful and balanced learning.

In our project, we want to improve these factors by visualizing students' plans and by sharing information with others. Every student will see what should be done, when, and what has the highest priority in learning. Having all deadlines and current requirements in one place helps students make the right decisions. Time recommendation based on the student's own experience, effectively visualized data and a sense of community is the primary goal of our work.

Local User Model: Towards Decentralized User Modelling

Jakub Senko
bachelor study, supervised by Michal Kompan

Abstract. Adaptation and personalization are a basic part of the modern Web. At the same time, more and more users value their privacy. Privacy issues get even more attention nowadays, when several worldwide scandals touching online privacy have been revealed. There is a trade-off between personalization and online privacy, though.

To provide a user with personalized content, a user model which represents the user's preferences has to be maintained. For a recommendation system to be accurate, the more data about the user is usually available, the better the generated recommendations. This, however, conflicts with the user's privacy. To let systems provide personalized content while users keep their privacy, we focus on modeling user preferences locally, on the user's computer, instead of in one centralized user model.

Currently, however, there is no way to display personalized content to the user without sending the local user model to the system or fetching a great amount of data to be filtered locally. The first breaks the idea of private data not being shared by the user; the second uselessly transfers a vast amount of data, the majority of which will never be presented to the user. In our work, we aim to discover more efficient approaches to local, decentralized user modeling.

Stream Data Processing

Jakub Ševcech
doctoral study, supervised by Mária Bieliková

Abstract. Over the last years, many time series representations have been proposed in the search for a representation that highlights important aspects of the processed data while reducing its dimensionality, handles the noise present in the data and, at the same time, is easy to compute. With various specialized applications, other requirements for time series representations emerge. In our work we focus on easy interpretability of the data, to allow its direct presentation to the data consumer, and on an incremental transformation process. The second property is an essential requirement for application in stream data processing, which is the main focus of our work.

We proposed a time series representation based on transforming repeating patterns in the course of the time series into a sequence of symbols. We use an incremental clustering algorithm in the process of symbol creation to allow transformation of an incoming stream of time series data.
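
A minimal sketch of such an incremental symbolization follows; the nearest-centroid assignment with a fixed distance threshold is an illustrative stand-in for the incremental clustering algorithm used in the work, not its actual implementation.

```python
import numpy as np

class StreamSymbolizer:
    """Turn a stream of fixed-length time series windows into symbols using a
    simple online clustering scheme: a new window is assigned to the nearest
    existing centroid, or opens a new cluster (symbol) if it is too far away."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.centroids = []   # one centroid (numpy array) per symbol
        self.counts = []

    def symbolize(self, window):
        w = np.asarray(window, dtype=float)
        if self.centroids:
            dists = [np.linalg.norm(w - c) for c in self.centroids]
            best = int(np.argmin(dists))
            if dists[best] <= self.threshold:
                # Move the centroid slightly towards the new window (running mean).
                self.counts[best] += 1
                self.centroids[best] += (w - self.centroids[best]) / self.counts[best]
                return best
        self.centroids.append(w.copy())
        self.counts.append(1)
        return len(self.centroids) - 1

# Usage: feed consecutive (possibly overlapping) windows of the incoming stream
# and collect the returned symbol indices into the symbolic representation.
```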

Processing and Comparing of Sequential Data Using Machine Learning

Miroslav Šimek
master study, supervised by Michal Barla

Abstract. Among the many approaches to machine learning, multilayered self-teaching neural networks (also known as Deep Belief Networks) that use unsupervised learning are gaining popularity nowadays. They were not accepted and were largely ignored by most experts in the machine learning community for almost 40 years, one of the reasons being simply the limited computational power of the technology available at the time. Today, however, they already produce interesting results, for example in computer vision, speech recognition and text processing.

One of the crucial attributes of software is usability. Devices like eye trackers are already available to help us monitor the user's activity, but they produce large streams of unlabeled data which need to be evaluated. Current approaches rely upon manually defined areas of interest and, optionally, also manually designed features fed as input to some machine learning algorithm.

Our approach is based on unsupervised machine learning, specifically the Restricted Boltzmann Machine (RBM), which is presented with fragments of user sessions in the form of heat maps capturing spatial (pixel coordinates) and temporal (pixel intensity) information. The RBM is able to find its own features to make an efficient session abstraction in the context of other user sessions, which is especially suitable for comparing, clustering and categorizing user eye-tracking sessions. Our goal is to improve the possibilities of automated evaluation of eye-tracking data sequences by comparing user sessions with each other, comparing user sessions with the expected usage of the application captured by the eye tracker, or detecting outliers for closer manual inspection.
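
For illustration, a session pipeline of this kind could be sketched with an off-the-shelf RBM: heat maps are flattened, the RBM learns hidden features, and the hidden activations are then clustered. The heat map size, the hyperparameters and the k-means clustering step are assumptions, not the project's actual setup.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.cluster import KMeans

def session_features(heatmaps, n_components=64):
    """heatmaps: array of shape (n_sessions, height, width) with values in [0, 1],
    where position encodes screen coordinates and intensity encodes time."""
    X = np.asarray(heatmaps, dtype=float).reshape(len(heatmaps), -1)
    rbm = BernoulliRBM(n_components=n_components, learning_rate=0.05,
                       n_iter=20, random_state=0)
    rbm.fit(X)
    return rbm.transform(X)          # hidden-unit activations per session

def cluster_sessions(heatmaps, n_clusters=5):
    """Group sessions by their learned features for comparison or outlier inspection."""
    features = session_features(heatmaps)
    return KMeans(n_clusters=n_clusters, random_state=0).fit_predict(features)
```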

Answerer-oriented Adaptive Support in Community Question Answering

Ivan Srba
doctoral study, supervised by Mária Bieliková

Abstract. In situations when Internet users are not able to find required information by means of standard information retrieval systems (especially web search engines), they have the possibility to ask their questions in popular Community Question Answering (CQA) systems, such as Yahoo! Answers or Stack Overflow. The main goal of CQA systems is to harness the collective intelligence of the whole community to provide the most suitable answers to recently posted questions in the shortest possible time.

In our project, we tackle two open problems that are present in current CQA systems and state-of-the-art approaches to collaboration support. The first one is related to question routing, which refers to the recommendation of new questions to potential answerers. The majority of approaches to question routing can be characterized as asker-oriented: they take primarily the asker's goals and expectations into consideration and utilize mainly expert users with a high level of knowledge, regardless of the real difficulty of the routed questions. Therefore, our first research goal is to propose a novel answerer-oriented adaptive support which focuses specifically on answerers and their preferences.

The second open problem we tackle in our project is the employment of CQA concepts in the educational domain. To address it, we proposed a concept of an organization-wide educational CQA system. In order to evaluate the feasibility of this concept, we created the CQA system Askalot, the first university-wide CQA system that takes into consideration educational specifics (e.g. the presence of a teacher) as well as organizational ones (e.g. common familiarity of students). Askalot is used as a supplementary tool to the formal educational process at the Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava.

Analysis of Methods of Interactive Problem Solving

Ján Stibila
bachelor study, supervised by Jozef Tvarožek

Abstract. There are many more or less suitable ways of solving interactive problems. The main aim of our work is to explore approaches to solving the game 2048 and to compare the quality of the different strategies which players have developed.

This will require analyzing the game itself as well as recorded game-play of many players. Game-play records will be enhanced with eye-tracking records, which allows us to get an overview of which parts of the game board players focus on more in different situations.

For that, we need to develop not only tools for recording game-play, but also methods and tools for analyzing the huge amount of gathered data. In conclusion, we will derive specific solving strategies from the most successful players, identify their strengths and weaknesses and enhance them to create an even better strategy. We also hope to discover a correlation between a player's strategy and their personal profile, if there is any.

Implicit Feedback-based Estimation of Student’s Knowledge

Veronika Štrbáková
master study, supervised by Mária Bieliková

Abstract. Nowadays, there are a number of adaptive educational systems which adapt the content automatically for a student by monitoring his or her activities. Inaccurate information obtained by evaluating implicit feedback based on the student's behavior has an impact on the accuracy of recommendation. With the increasing options for monitoring activities, such as signals from an eye-tracking camera, we can evaluate the implicit feedback more accurately and interpret the various activity signals that constitute it.

Our research aims to monitor a student during study in a web-based educational system and to estimate the active time spent on learning objects. We monitor various signals of user activity while working with the educational system, specializing mostly in those that help us determine the time of active work with a learning object. However, it is also important to know how complex the learning object is to read and understand. In our method we use the Automated Readability Index and the LIX formula to determine these characteristics of the learning object.
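
Both readability measures have standard formulations: ARI = 4.71 · (characters/words) + 0.5 · (words/sentences) − 21.43, and LIX = words/sentences + 100 · (long words/words), where long words have more than six characters. A sketch over plain whitespace-tokenized text follows; a real implementation would of course use proper tokenization of the learning object's text.

```python
import re

_SENTENCE_END = re.compile(r"[.!?]+")

def automated_readability_index(text):
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43"""
    words = [w.strip(".,;:!?") for w in text.split()]
    n_words = max(len(words), 1)
    n_chars = sum(len(w) for w in words)
    n_sentences = max(len(_SENTENCE_END.findall(text)), 1)
    return 4.71 * n_chars / n_words + 0.5 * n_words / n_sentences - 21.43

def lix(text):
    """LIX = (words / sentences) + 100 * (long words / words),
    where long words have more than six characters."""
    words = [w.strip(".,;:!?") for w in text.split()]
    n_words = max(len(words), 1)
    n_sentences = max(len(_SENTENCE_END.findall(text)), 1)
    n_long = sum(1 for w in words if len(w) > 6)
    return n_words / n_sentences + 100.0 * n_long / n_words
```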

The characteristics of the learning object, along with the observed student's actions, can help us estimate the level of the student's knowledge. By comparing knowledge acquisition, evaluated from the student's results in the pretest and posttest of the Functional and Logical Programming course, with the time of the student's active work on the learning objects, we can find the dependence between the time of active work with a learning object and the student's knowledge. Based on this, we estimate the level of the student's knowledge of concepts related to the given learning objects.

Presentation of Personalized Recommendations via Web

Martin Svrček
master study, supervised by Michal Kompan

Abstract. Nowadays, personalized recommendations are widely used and very popular. We can see many systems in various fields which use recommendations for different purposes. However, one of the basic problems is users' distrust of recommendation systems; users consider them an intrusion into their privacy. Therefore, it is important to make recommendations transparent and understandable to users.

In this context, we want to propose several methods for presenting the results of the recommendations. Our aim is to use a standard recommendation technique and focus on different approaches to obtaining data about users, visualization and explanation of recommendations.

We also want to verify the suggested approaches in the selected application domain in order to obtain statistically significant results through the use of implicit and/or explicit feedback.

Using Parallel Web Browsing Patterns on Adaptive Web

Martin Toma
master study, supervised by Martin Labaj

Abstract. The possibility of using browser tabs as a tool for parallel web browsing is definitely not new. In recent years, however, more and more people tend to use this feature every day. Despite that, little research has been done to deeply analyze why and, more importantly, how people use this browsing mechanism. Even less research has aimed to utilize this information in a way that would further enhance the browsing experience for the Web browser users themselves.

Our first goal was to design and implement a method for collecting accurate browser usage data with the primary focus on browser tabs. Existing solutions like Brumo lack accuracy in some cases because they do not utilize modern browser APIs. Collecting accurate data will lead to more accurate analysis of parallel web browsing. We have implemented a Web browser extension called TabRec; using the Chrome API, we are able to confidently detect eight different tab events. TabRec is already live and is capturing parallel browsing activity produced by about 25 users. Utilization of the most common patterns via recommendation of browser actions is our next goal. We have decided to use the Generalized Sequential Pattern algorithm to find the most frequent sequences.
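
For illustration, a much-simplified stand-in for this mining step is sketched below: it merely counts contiguous sub-sequences of tab events above a support threshold, whereas the Generalized Sequential Pattern algorithm proper also handles gaps and time constraints. The event names and thresholds are assumptions.

```python
from collections import Counter

def frequent_subsequences(sessions, max_len=4, min_support=5):
    """Count contiguous sub-sequences of tab events across user sessions.

    `sessions` is a list of event sequences, e.g.
    [["tab_open", "tab_switch", "tab_close"], ...] (hypothetical event names).
    Returns sub-sequences occurring at least `min_support` times.
    """
    counts = Counter()
    for events in sessions:
        for length in range(2, max_len + 1):
            for i in range(len(events) - length + 1):
                counts[tuple(events[i:i + length])] += 1
    return {seq: c for seq, c in counts.items() if c >= min_support}
```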

We will utilize the most common patterns and sequences we discover to enhance users' browsing experience. This can be achieved by recommending the most probable tab action in a specific situation or by detecting characteristic sequences. Currently, we focus on identifying the characteristics of such situations and detecting the corresponding sequences in our data. As a next step, the TabRec extension will be enhanced to perform real-time detection of selected sequences and will be able to notify users and execute the appropriate action.

Analysis and Measurement of User Behaviour in a Web Application

Peter Truchan
master study, supervised by Mária Bieliková

Abstract. In this thesis, we would like to identify, analyse and summarise the best metrics that can be used for tracking user behaviour in web applications. Based on these metrics, we want to evaluate users' behaviour. Nowadays, there are some very advanced commercial technologies for site tracking and analysis, e.g. Google Analytics or Adobe Analytics. They collect data about the user and his behaviour, but in the end they only present data about a concrete web page or website.

We would like to focus more on the visitor of the web application and on the evaluation of his path through the website. If we can identify the most important metrics which tell us whether the user is on the right path to success, we can also tell whether the user is lost and needs help to achieve his goal. We would like to analyse the data and build a model of this user behaviour.

The first step will be to obtain and process the measured data. Then we would like to design a model for evaluating these data. The goal is not only to identify bad pages, but also to identify groups of people and their common characteristics, which tell us whether they are unfamiliar with, or lost in, our application. If we achieve this, we will be capable of improving bad pages or helping these people with a special approach. This thesis will discuss and try to improve the modern way of developing web applications through testing, evaluating, improving and targeting the content of a website.

Application of Machine Learning for Sequential Data

Peter Uherek
master study, supervised by Michal Barla

Abstract. Nowadays, every website is analysed for readership and visit rates in order to keep as many readers on the website as possible for as long as possible. Every day, operators of portals that provide extensive and dynamic content (typically online editions of newspapers and magazines) collect large amounts of sequential data capturing the behavior of users and the history of their browsing on the website. They also store information about their articles: titles, topics and release dates. From these data, it is not only possible to retrospectively evaluate the readership of individual articles, but also, for example, to predict the popularity of articles or topics, which may have an effect on the web layout. It can also help decide whether an article should be paid or not.

Our goal is to analyze existing machine learning methods for prediction and classification using sequential data. The main focus is placed on the possibilities offered by artificial neural networks for preprocessing and better representation of sequential data. The aim is to design a method to predict the popularity of articles from the sequential data of web portals with the support of machine learning methods.

Citations and Co-citations as a Source of Keyword Extraction in Digital Libraries

Máté Vangel
master study, supervised by Róbert Móro

Abstract. Keyword extraction is usually done from the text of the document, but in digital libraries there are other options for extracting relevant words from research articles. One of these possibilities is to use citations, which can be considered an important source of keywords because they characterize the article and can also highlight different but relevant aspects of the cited article that matter to other researchers. We aim to extract keywords from research articles via citations, using specific metrics and attributes of citations, for example the citation network, distance between citations, multiplicity and co-citations.

Our proposed method works with three sources of text related to a given research article, from which we extract important words separately and subsequently combine the relevance of each word, coming from the multiple sources, into a single value. The sources of text are: the text of the analyzed research article, citation contexts and co-citation contexts. We evaluate the proposed method in the domain of digital libraries using explicit feedback from the users of the web-based bookmarking system Annota.

Analysis of Human Concentration During Work in Information Space of the Web

Ľubomír Vnenk
master study, supervised by Mária Bieliková

Abstract. Full concentration on a job is a hard task, especially if the job is boring, unpleasant or too hard to do. We unintentionally seek distraction, an occasion to do something different, something more fun. However, breaking our concentration repeatedly results in lower efficiency and worse results.

In the computer environment we are able to analyse the user's behaviour. By analysing the user's workflow and the applications he is using, or by using an eye tracker to monitor changes in the user's saccades and a webcam to monitor changes in sitting position, we may precisely find the moments when the user interrupts his workflow to do something different, not associated with his actual goal.

We also plan to use motivation as a driving force to bring the user back to productive work. However, we need to analyse and choose a proper form of activity recommendation: we should not force the user to get back to work, but rather just suggest it, or perhaps a firmer push towards the right action will work better. Experiments will show which approach is best.

Modeling Programmer’s Expertise Based on Software Metrics

Pavol Zbell
master study, supervised by Eduard Kuric

Abstract. Knowledge of programmers' expertise and their activities in the environment of a software house enables effective task resolution (by identifying experts), better team forming, effective communication between programmers, and personalized recommendation or search in source code, and thus indirectly improves overall software quality. The process of modeling a programmer's expertise (building the knowledge base) usually expects as its input some information about the programmer's activities during software development, such as interactions with source code (typically fine-grained actions performed in the IDE), interactions with ITS (issue tracking) and RCS (revision control) systems, activities on the Web or any other interaction with external documents.

In our research, we focus on modeling a programmer's expertise based on software metrics such as software complexity and source code authorship. We assume that a programmer's expertise is related to the complexity of the source code she is interacting with as well as to her degree of authorship of that code. In the case of software complexity, our idea is to explore alternatives to LOC (lines of code) based metrics, such as weighted AST (abstract syntax tree) node counting or call graph based metrics. With source code authorship, we expect programmers who wrote some code to be experts on that particular code, but we need to consider only partial degrees of authorship as the code evolves and is changed by other programmers over time. Information acquisition for programmer modeling in our work is based on activity logs from the programmer's IDE. We plan to implement our method as an extension to the Eclipse IDE for Java programmers and evaluate it on data from an academic environment or (preferably) a real software house environment.
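
As an illustration of weighted AST node counting (sketched here with Python's ast module as an analogy, since the project itself targets Java source code in Eclipse), nodes could carry weights reflecting their assumed contribution to complexity, and the result could be combined with the degree of authorship. All weights and the combination rule below are assumptions.

```python
import ast

# Illustrative weights: control-flow and call nodes are assumed to contribute
# more to complexity than plain expressions (the actual weights and node types
# for Java ASTs would differ).
NODE_WEIGHTS = {
    ast.If: 2.0, ast.For: 2.0, ast.While: 2.0, ast.Try: 2.0,
    ast.FunctionDef: 1.5, ast.Call: 1.2,
}

def weighted_ast_complexity(source_code):
    """Sum weights over all AST nodes of a source file (default weight 1.0)."""
    tree = ast.parse(source_code)
    return sum(NODE_WEIGHTS.get(type(node), 1.0) for node in ast.walk(tree))

def expertise_score(complexity, authorship_degree):
    """Combine code complexity with the programmer's degree of authorship
    (e.g. the fraction of the file's current lines written by the programmer);
    the product is one simple way to relate the two factors."""
    return complexity * authorship_degree
```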

Evaluating the Impact of Developer’s Reputation in Search Driven Development

Peter Zimen
bachelor study, supervised by Eduard Kuric

Abstract. Nowadays, reuse of source code is an effective way to increase developers' productivity. There are various open source code repositories on the Web, and these repositories are often the initial source of information that developers use to solve their development tasks. Relevance of source code results with respect to the developer's query is of course paramount, but trust in the results is just as important. Recent research indicates that developers prefer social cues over technical cues when evaluating source code candidates.

In our work we analyze the current state of source code search. We propose a metric that considers both the relevance of source code and the developer's reputation. We focus on evaluating the relationship between relevance and reputation: giving more importance to reputation could obscure very relevant results, so where is the best compromise?
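
A minimal sketch of such a combined metric, assuming both relevance and reputation are normalized to [0, 1], is a simple interpolation whose mixing parameter expresses exactly the compromise in question; the parameter value below is arbitrary and not the proposed metric itself.

```python
def combined_score(relevance, reputation, alpha=0.7):
    """Interpolate between source code relevance and developer reputation;
    alpha = 1.0 ignores reputation entirely, alpha = 0.0 ignores relevance."""
    return alpha * relevance + (1.0 - alpha) * reputation

def rank_results(results, alpha=0.7):
    """results: list of dicts like
    {"snippet": "...", "relevance": 0.9, "reputation": 0.4} (hypothetical fields)."""
    return sorted(results,
                  key=lambda r: combined_score(r["relevance"], r["reputation"], alpha),
                  reverse=True)
```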