Proceedings of Spring 2014 PeWe Workshop
Personalized Recommendation and Search (PeWe.REC)
- Rastislav Detko: Users’ Relationship Analysis
- Eduard Fritscher: Group Recommendation of Multimedia Content
- Ondrej Kaššák: Group Recommendation for Smart TV
- Róbert Kocian: Personalized Recommendation Using Context for New Users
- Peter Krátky: User Identification Based on Web Browsing Behaviour
- Samuel Molnár: Activity-based Search Session Segmentation
- Matúš Tomlein: Method for Novelty Recommendation Using Topic Modelling
- Juraj Višňovský: Evaluating Context-aware Recommendation Systems Using Supposed Situation
Traveling in Digital Space (PeWe.DS)
- Dominika Červeňová: Automated Syntactic Analysis of Natural Language
- Ľuboš Demovič: Linking Slovak Entities from Educational Materials with English DBpedia
- Peter Dubec: Semantics Acquisition for Electronic Program Guide
- Ladislav Gallay: Utilizing Vector Model of Words for Text Processing on the Web
- Michal Holub: Utilization of Linked Data in Domain Modeling Tasks
- Adam Lieskovský: Query by Multiple Examples Considering Pseudo-Relevant Feedback
- Martin Lipták: Researcher Modeling in Personalized Digital Library
- Róbert Móro: Using Navigation Leads for Exploratory Search in Digital Libraries
- Martin Plank: Collocation Extraction on the Web
- Ondrej Proksa: Discovering Identity Links between Entities on the Semantic Web
- Márius Šajgalík: Exploring Multidimensional Continuous Feature Space to Extract Relevant Words
- Jakub Ševcech: Anomaly Detection in Stream Data
- Miroslav Šimek: Processing and Comparing of Data Streams Using Machine Learning
- Máté Vangel: Determining the Relevancy of Important Words in Digital Library using the Citation Sentences
- Tomáš Vestenický: Using Tags for Query by Multiple Examples
- Ľubomír Vnenk: Web Search Employing Activity Context
- Jakub Mačina: Application for International Competition
User Experience (PeWe.UX)
- Peter Demčák: Methodology of Game Evaluation Based on Implicit Feedback
- Patrik Hlaváč: Analysis of User Behaviour on the Web
- Martin Janík: Web Applications Usability Testing by means of Eyetracking
- Filip Mikle, Matej Minárik, Juraj Slavíček, Martin Tamajka: Low-Cost Acquisition of 3D Interior Models for Online Browsing
- Veronika Štrbáková: Implicit Feedback-based Discovery of Student Interests and Educational Object Properties
Software Development Webification (PeWe.PerConIK)
- Karol Balko: Keeping Information Tags Valid and Consistent
- Matej Chlebana: Source Code Review Recommendation
- Martin Konôpka: Identifying Hidden Source Code Dependencies from Developer’s Activity
- Jakub Kříž: Context-based Improvement of Search Results in Programming Domain
- Eduard Kuric: Modeling Developer’s Expertise
- Karol Rástočný: Employing Information Tags in Software Development
- Jana Podlucká: Assessing Code Quality and Developer’s Knowledge
- Richard Sámela: Personalised Search in Source Code
- Andrea Šteňová: Browsing Information Tags Space
- Pavol Zbell: Modeling Programmer’s Expertise Based on Software Metrics
Technology Enhanced Learning (PeWe.TEL)
- Tomáš Brza: Student Motivation in Interactive Online Learning
- Peter Dulačka: Finding and Harnessing Experts for Metadata Generation in GWAPs
- Richard Filipčík: Gamification of Web-based Learning System for Supporting Motivation
- Martin Gregor: Facilitating Learning on the Web
- Marek Grznár: Adaptive Collaboration Support in Community Question Answering
- Jozef Harinek: Natural Language Processing by Utilizing Crowds
- Peter Kiš: Analysis of Interactive Problem Solving
- Matej Kloska: Keyword Map Visualisation
- Martin Labaj: Observing and Utilizing Tabbed Browsing Behaviour
- Marek Láni: Acquisition and Determination of Correctness of Answers in Educational System Using Crowdsourcing
- Viktória Lovasová: Recommendation in Adaptive Learning System
- Michal Račko: Automatic Web Content Enrichment Using Parallel Web Browsing
- Ivan Srba: Adaptive Support for Collaborative Knowledge Sharing
- Martin Svrček: Collaborative Learning Content Enrichment
- Martin Toma: Using Parallel Web Browsing Patterns on Adaptive Web
Doctoral Staff
- Mária Bieliková: web personalization, user and user group modelling, context modelling, usability and user experience (and HCI in general)
- Michal Barla: user modeling, implicit user feedback, virtual communities, collaborative surfing
- Jozef Tvarožek: social intelligent learning, collaborative learning, semantic text analysis, natural language processing
- Marián Šimko: domain modelling, ontologies, folksonomies, semantic text analysis, Web-based Learning 2.0
- Jakub Šimko: crowdsourcing, games with a purpose, semantics acquisition, Web-based learning
- Michal Kompan: single user and group recommendation, satisfaction modeling
- Tomáš Kramár: user modelling, personalized web search
- Dušan Zeleník: context modelling, recommendation
Keeping Information Tags Valid and Consistent
Karol Balko
master study, supervised by Karol Rástočný
Abstract. Metadata are becoming an inseparable part of modern software systems. In our field of research, metadata – structured information describing information resources – are used to describe features of source code, such as the number of whitespace characters or markers of copied code.
Our research focuses on the metadata used within the PerConIK project, where metadata are referred to as information tags. Information tags are stored in a dedicated repository and are connected to the source code through links pointing into it. However, when the source code is modified or refactored, an information tag may become obsolete: its link may point to a location that no longer exists, or it may describe a feature of the code that is no longer valid.
The thesis aims to create a method for checking the consistency and validity of these information tags. Based on an analysis of the problem, we have proposed a method that allows us to perform such checks. We have also analysed the general approaches used in our solution, as well as metadata maintenance approaches from related fields of research.
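A minimal illustration of the idea (the tag structure and the content-hash anchoring are our assumptions, not the actual PerConIK representation): anchoring a tag to a hash of the tagged fragment makes a changed fragment detectably invalidate the tag.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class InformationTag:
    file_path: str    # file the tag is anchored to
    start_line: int   # anchored code fragment (1-based, inclusive)
    end_line: int
    feature: str      # described feature, e.g. "copied-code"
    anchor_hash: str  # hash of the fragment at tagging time

def fragment_hash(lines: list[str]) -> str:
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()

def is_valid(tag: InformationTag, current_lines: list[str]) -> bool:
    """A tag stays valid only if its anchored fragment is unchanged."""
    fragment = current_lines[tag.start_line - 1 : tag.end_line]
    return fragment_hash(fragment) == tag.anchor_hash
```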
Student Motivation in Interactive Online Learning
Tomáš Brza
bachelor study, supervised by Jozef Tvarožek
Abstract. In today’s age of modern technology, alternative ways of education are becoming popular. Interactive online learning systems are slowly starting to complement, and sometimes even replace, studying in a traditional school. This method of learning not only improves students’ understanding of the underlying concepts, but also provides a more natural learning process in which students set their own pace and move to the next topic when they feel ready. However, students often lose the motivation to study on their own.
In our project we examine different aspects and types of motivation, together with the existing evidence, in order to design elements of an interactive learning system that motivate students to use it more often. We focus on extrinsic motivation using gamification and on social elements that create a feeling of community and introduce competition among students, motivating them to learn new things and solve more tasks. We also try to eliminate aspects that could cause frustration or anger and to replace them with more motivating counterparts.
Automated Syntactic Analysis of Natural Language
Dominika Červeňová
master study, supervised by Marián Šimko
Abstract. Natural language, as one of the most common means of expression, is also used for storing information on the Web. Its processing is, however, difficult, because natural language is informal and poorly structured. Syntactic analysis, as a part of natural language processing, discovers formal relations between syntagms in a sentence and assigns them syntactic roles. This can make natural language, and the information stored in it, more machine-processable.
We work on a method that automates the syntactic analysis of text as much as possible. Currently, we focus on analyzing existing tools for various languages. There are already parsers that can perform syntactic analysis of languages that are simpler and easier to formalize (e.g., English), but we also explore options for Slavic languages (e.g., Russian, Czech or Slovak), where automated syntax recognition is a nontrivial problem with many competing approaches. Machine learning appears helpful here: with enough training data – e.g., a corpus of annotated sentences for a specific language – it is possible to train a parser to recognize syntagms with state-of-the-art accuracy.
We plan to evaluate our method using a software prototype that analyzes sentences. As a gold standard we plan to use syntactic annotations from the Slovak National Corpus project at the Ľudovít Štúr Institute of Linguistics.
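For illustration, the kind of output such a parser produces, here using spaCy and its small English model as a stand-in (the work itself targets Slovak, for which no comparable off-the-shelf model is assumed):

```python
import spacy

# Requires the model once: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("The parser assigns a syntactic role to every word.")

for token in doc:
    # token.dep_ is the syntactic relation, token.head its governor
    print(f"{token.text:10} {token.dep_:10} <- {token.head.text}")
```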
Source Code Review Recommendation
Matej Chlebana
master study, supervised by Karol Rástočný
Abstract. Assessing software quality is a complex but essential activity for project success. Early detection of errors can alert company management to issues that might otherwise prolong the project or lead to its complete cancellation. Besides detecting errors, it also helps maintain the consistency of the code and reduces the risks associated with the departure of a team member. Reviewing developers’ code is time-consuming, especially in larger companies where new versions of the code are produced at short intervals. Our aim is to separate “good” source code from potentially hazardous code and thus eliminate the need to review the entire code base. In this work we try to facilitate the work of code reviewers.
Currently we are trying to model the processes by which developers introduce errors, which would allow us to answer the question of why an error occurred. We analyze developer activity data recorded by the PerConIK system and test tools for discovering these processes.
Methodology of Game Evaluation Based on Implicit Feedback
Peter Demčák
master study, supervised by Jakub Šimko
Abstract. It is the objective of every game – whatever its purpose may be – to create an authentic game experience and to capture (and keep) the interest of its players. Game experience builds on basic cognitive skills such as modeling, focus, imagination and empathy. One aspect of game experience is immersion: the degree to which the player is invested in playing the game. If a game is to reach its goals, achieving as high a level of immersion as possible is essential. For creating the intended gaming experience, the ability to evaluate immersion is therefore highly valuable. However, because of the subjective character of experience, immersion can only be fully grasped with feedback from the players.
The limitations of explicit feedback stem from the difficulty of observing the player’s mental state in detail without disturbing it. Hence the importance of implicit feedback, which recognizes the user’s mental state from their natural behavior. One means of gathering such implicit feedback is eye tracking. Mapping eye movements to cognitive functions shows promise, even for the evaluation of game immersion. Possible approaches include identifying the fitting and disturbing elements of gameplay, finding the game passages that are the most and the least immersive, or recognizing the player’s states of presence and gameflow.
Our goal is to design a set of reusable principles for game evaluation based on eye-tracking information. We then plan to apply these principles to several games with different user groups and different kinds of gameplay in order to verify our method.
Linking Slovak Entities from Educational Materials with English DBpedia
Ľuboš Demovič
master study, supervised by Michal Holub
Abstract. The Web provides a large amount of knowledge and has the potential to become the largest source of information in the world. Data published on the Web are, however, largely unstructured and intended for people, without a clear definition of entities, their meaning, or the relationships between them. Linked Data is a method for publishing structured data on the Web so that the data are interconnected and thereby become more useful. Moreover, Linked Data contains various kinds of links between entities, which makes it possible to create a graph describing a selected domain. This promotion of data to first-class importance represents the next stage of Web development, referred to as Web 3.0.
We analyze automated machine processing of data on the Web in order to identify and extract entities and facts from Web content. We also explore the possibility of automatically creating datasets from the extracted entities and facts, following the Linked Data principles. Datasets generated in this way would be helpful for fast retrieval, translation, personalization, context enrichment, recommendation, and navigating the user to the desired information.
The aim of our work is a method for the automatic extraction of entities and facts. The acquired Slovak entities are subsequently linked to the English version of DBpedia. We focus on processing unstructured data from Slovak Web content; the selected domain is education, where a large amount of learning material is available. We will verify the proposed method experimentally by implementing a software tool that exploits the resulting knowledge base to enrich context with new information.
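A minimal sketch of the linking step (illustrative: the SPARQLWrapper-based label lookup stands in for the actual matching, which for Slovak entities would additionally require translation or interlanguage links):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

def find_dbpedia_resources(english_label: str) -> list[str]:
    """Look up DBpedia resources carrying the given English label."""
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        SELECT ?resource WHERE {{
            ?resource rdfs:label "{english_label}"@en .
        }} LIMIT 5
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [b["resource"]["value"] for b in results["results"]["bindings"]]

print(find_dbpedia_resources("Photosynthesis"))
```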
Users’ Relationship Analysis
Rastislav Detko
bachelor study, supervised by Michal Kompan
Abstract. Nowadays, many people actively use social network services, usually to share ideas and to communicate with other people. Information about users’ interactions can be useful in various tasks that adapt the Web to the user (e.g., recommendation or information filtering).
In our work we analyze data about users and their activities in a social network service and propose a method that estimates the likelihood of influence between users. The method takes a public activity log containing the interactions of a user with his or her whole subnetwork (i.e., the user’s neighbours), and determines which users reacted to the user’s public activities and which users influence others.
Subsequently, we model the influence flow in the network, which helps us verify the proposed method by simulating a real event: our method assigns weights to the edges of the network, and the model uses these weights to monitor influence spread and user activity. The next step is influence maximization, i.e., minimizing the set of users activated at the beginning of the simulation while maximizing the number of users activated in the end; we monitor whether the activation spreads through the whole network or stops during the simulation.
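A minimal sketch of such a spread simulation, assuming the well-known independent cascade model (the abstract does not commit to a particular diffusion model; graph[u][v] stands in for the estimated influence likelihood on the edge u→v):

```python
import random

def independent_cascade(graph: dict[str, dict[str, float]],
                        seeds: set[str]) -> set[str]:
    """Simulate influence spread; graph[u][v] is the chance u activates v."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v, p in graph.get(u, {}).items():
            if v not in active and random.random() < p:
                active.add(v)      # v is activated by u
                frontier.append(v)
    return active

net = {"a": {"b": 0.8, "c": 0.3}, "b": {"c": 0.6}, "c": {"d": 0.5}}
print(independent_cascade(net, {"a"}))  # users reached from seed "a"
```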
Semantics Acquisition for Electronic Program Guide
Peter Dubec
bachelor study, supervised by Mária Bieliková
Abstract. Nowadays more and more people watch TV shows and movies. It is very easy to watch something: there are hundreds of TV channels, and people can also watch TV shows and movies online. The Internet is full of information about movies and TV shows – there are many websites where users can find it – and these data are necessary for TV show and movie recommendation.
Our goal is to enrich the existing metadata served by the Electronic Program Guide (EPG) in order to improve personalized recommendation of TV shows and movies. This also includes discovering relationships between particular TV shows, or between TV shows and real-world entities, such as a place mentioned in a particular show or movie. To do so we need to find as many data sources as possible. Online movie databases are a very good source of data, but we can also use Linked Data to find further information about TV shows and movies, along with additional facts such as locations, actors or images.
Finding and Harnessing Experts for Metadata Generation in GWAPs
Peter Dulačka
master study, supervised by Jakub Šimko
Abstract. Games with a purpose (GWAPs) have been used to acquire and validate metadata for various media types over the last couple of years. In our project CityLights we focused on validating existing music metadata, motivated by the poor quality of user-entered data in online databases. Since the game failed at detecting false positives and many players had to take part to reach a decision, we realized that expert opinions, or weighting of players’ actions, would speed the process up considerably. The problem was that although individual experts were playing, they were outshouted by the rest of the crowd. We would like to recognize these experts through simple tasks and give them more power to affect the dataset by playing the game.
We propose a game with a purpose aimed primarily at domain expert finding and secondarily at metadata generation. Once experts are recognized, they can be treated differently during the metadata generation process – and even a posteriori, to obtain more accurate data. Our work focuses on acquiring music metadata, and we would like to find experts in various music domains. We want to combine the game with an online radio: by listening to the radio and answering questions about the currently playing songs, players reveal whether they are experts. Once experts are recognized, we can experiment with expert-only tasks and observe their success ratio, or compare the success rate of the expert group with non-experts and measure how much faster metadata generation becomes when players’ actions are weighted.
Gamification of Web-based Learning System for Supporting Motivation
Richard Filipčík
bachelor study, supervised by Mária Bieliková
Abstract. The greatest driving force in learning or work is undoubtedly motivation. Motivation affects performance, efficiency, and the amount of time a person is willing to invest. There are many ways to motivate a student using a web-based learning system; scoring and point systems are among the most frequently used. Educational systems use various methods of score computation – from the simplest, which merely count the amount of performed activity, to advanced ones that take many factors into account.
We propose a new method for score regulation which takes into account the current state of the system and the changes affecting it. The method relies on several factors important for score regulation. Its aim is to make score computation more dynamic and arguably fairer, and to balance the ratio between the different activities performed in the system. At the same time, we want to use it to motivate students to use the learning system more frequently as a support for their studies. We have integrated our method for dynamic score regulation into the Adaptive Learning Framework, where we plan to evaluate its effectiveness.
Group Recommendation of Multimedia Content
Eduard Fritscher
master study, supervised by Michal Kompan
Abstract. Nowadays it is important that web pages and applications not only store information but also communicate with the user. With the growth of the World Wide Web, the amount of information stored online has increased dramatically. Recommendation techniques and methods were invented to cope with this information burst, but the way people access the Internet has changed as well: people collaborate with each other more often. At a time when the most visited pages in the world are social networks, recommendation techniques have to adapt to this trend of collaboration between users. The answer to this need is group recommendation.
We therefore propose a method that extracts information about users from social networks for recommendation generation. From the extracted information we build a Big Five personality model for each member of the group. We then apply an aggregation strategy that incorporates the personality models into the recommendation method. We use a graph-based approach to recommend content, which ensures that our method is applicable to a wide range of domains. The method will be evaluated in the domain of movie recommendation.
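A minimal sketch of the aggregation step (illustrative: the per-member scores and the personality-derived weights are placeholders, and weighted averaging and least misery are common baseline strategies rather than the proposed strategy itself):

```python
def aggregate_group_scores(member_scores: dict[str, dict[str, float]],
                           weights: dict[str, float],
                           strategy: str = "weighted_average") -> dict[str, float]:
    """Combine per-member item scores into one group score per item.

    member_scores: member -> {item -> predicted rating}
    weights: member -> weight derived from a personality trait
    """
    items = {i for scores in member_scores.values() for i in scores}
    group = {}
    for item in items:
        ratings = [(scores.get(item, 0.0), weights[m])
                   for m, scores in member_scores.items()]
        if strategy == "least_misery":        # the weakest member decides
            group[item] = min(r for r, _ in ratings)
        else:                                 # personality-weighted average
            total_w = sum(w for _, w in ratings)
            group[item] = sum(r * w for r, w in ratings) / total_w
    return group
```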
Utilizing Vector Model of Words for Text Processing on the Web
Ladislav Gallay
bachelor study, supervised by Marián Šimko
Abstract. Text processing is an important part of a plethora of tasks; machines must understand content in order to provide advanced functionality such as recommendation or intelligent search.
Our goal is to improve the lemmatization process in any given language by utilizing the word2vec tool by Google. The tool represents words as vectors learned by a neural network from plain text. In our work we combine a large dataset of plain text in the selected language with a small set of prepared data to create a working word lemmatizer. The contribution of our work is the normalization of words in any given language while knowing almost nothing about the language. Current approaches require extracting large amounts of plain text, cleaning the data, and analyzing and optimizing the process. We believe that utilizing word2vec will improve lemmatization and help in processing any language, given only a sufficient amount of meaningful plain text in that language.
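A minimal sketch of the underlying idea using gensim’s word2vec implementation (the toy corpus and word pairs are illustrative): if vector offsets capture inflection, the lemma of an unknown word form can be approximated by analogy with a known (inflected form, lemma) pair.

```python
from gensim.models import Word2Vec

# sentences: an iterable of tokenized sentences in the target language
sentences = [["dogs", "chase", "cats"], ["a", "dog", "chases", "a", "cat"]]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)

# Analogy: "cats" relates to "cat" as "dogs" relates to ?
candidates = model.wv.most_similar(positive=["dogs", "cat"],
                                   negative=["cats"], topn=3)
print(candidates)  # with a real corpus, "dog" should rank near the top
```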
Facilitating Learning on the Web
Martin Gregor
master study, supervised by Marián Šimko
Abstract. Web browsing is one of our everyday activities. Web content enrichment, with its potential to improve access to information, is therefore an easy way to deliver new knowledge. There are several ways to enrich web content. Personalizing the enrichment is important for adapting system behaviour to individual user needs. In adaptive e-learning systems, personalization is realized based on data in a user model. The problem is the inaccuracy of user modelling, which causes inappropriate personalization and, in turn, inefficient web content enrichment.
We propose a method that models the user’s knowledge based on feedback collected about the user’s behaviour on the Web. The method collects implicit feedback in the form of mouse clicks, mouse moves, text selections, time spent over elements, the time each element is visible, and the number of entries into areas of the document. From these we predict a read level for the whole document and for each knowledge concept covered by it, and use the read level for user knowledge modelling. We implement the method as a JavaScript library in the domain of language learning: it evaluates the viewing and translation of foreign-language terms and uses the results for user knowledge modelling. We evaluate our approach as a web browser extension with which the user learns the vocabulary of a foreign language.
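A minimal sketch of how such signals might be combined into a read level (the features and weights are illustrative assumptions, not the proposed model):

```python
def read_level(events: list[dict], visible_seconds: float) -> float:
    """Score 0..1 estimating how thoroughly an element was read."""
    clicks = sum(e["type"] == "click" for e in events)
    selections = sum(e["type"] == "select" for e in events)
    enters = sum(e["type"] == "enter" for e in events)
    # Illustrative weighting; real weights would be learned from data.
    score = (0.02 * visible_seconds + 0.2 * clicks
             + 0.3 * selections + 0.1 * enters)
    return min(score, 1.0)

events = [{"type": "enter"}, {"type": "select"}, {"type": "click"}]
print(read_level(events, visible_seconds=12.0))
```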
Adaptive Collaboration Support in Community Question Answering
Marek Grznár
master study, supervised by Ivan Srba
Abstract. Nowadays, users are lost in the great amount of information available on the Internet. They often find themselves in a situation where the information they seek cannot easily be found with traditional search engines. With the development of Web 2.0, such information can instead be obtained by asking a community. Systems based on this kind of mutual knowledge sharing have become widely used; one type is Community Question Answering (CQA), with Yahoo! Answers and Stack Overflow as typical examples.
Existing CQA systems, despite their increasing popularity, fail to answer a significant number of questions in the required time. One option for supporting cooperation in CQA systems is recommending a question to a user who is a suitable candidate for providing the correct answer (question routing). Various methods have been proposed to help find answerers for a question, but almost all of them depend heavily on users’ previous activities in the system (QA data).
In our work, we focus on utilizing users’ non-QA data to support question routing. We examine different types of non-QA data and existing question-routing methods. By analysing users’ non-QA activities, such as blogs, micro-blogs or friendships, we can better identify a suitable user for answering a specific question.
Natural Language Processing by Utilizing Crowds
Jozef Harinek
master study, supervised by Marián Šimko
Abstract. The amount of information stored in natural language on the Web is huge and still growing. To make better use of this information, we need to process the natural language and transform it into a form that machines can understand.
However, natural language processing (NLP) is a difficult task. One has to deal not only with parsing the text and cleaning it of unnecessary words (stop words), but also with representing the semantics of the processed text. Typically, several modules analyze the text in turn, from the phonological to the semantic layer. Such analysis is even more demanding in languages like Slovak that do not have a fixed word order.
In our work we plan to employ crowdsourcing principles to better annotate a given text corpus. We are creating a system in which students complete their assigned homework and, by doing so, annotate the underlying corpus.
Analysis of User Behaviour on the Web
Patrik Hlaváč
master study, supervised by Marián Šimko
Abstract. The development of web systems has changed considerably in recent years, and with it the importance of estimating user characteristics for further recommendation. Analyzing user behaviour on the Web and interaction with a web browser is a nontrivial matter, where the improving availability of technology and equipment has recently opened up new solutions. While existing solutions monitor behaviour directly in the browser through peripheral devices, we now also have the possibility of directly monitoring the user’s gaze and the blocks of content on a website the user focuses on.
One objective of this work is to propose a user model suitable for collecting data from interactions in a Web environment. The primary task is to gather information through an eye-tracking sensor to identify which areas of content on the screen the user attends to, together with other devices (mouse, keyboard, microphone) that allow the acquisition of implicit feedback. These interaction data are then processed into the user model.
Utilization of Linked Data in Domain Modeling Tasks
Michal Holub
doctoral study, supervised by Mária Bieliková
Abstract. The idea of the Semantic Web has found many followers among web researchers. Many publicly available datasets follow the Linked Data principles. These may be a perfect source of additional metadata usable in various tasks of web personalization, recommendation, information retrieval, and data processing. However, only a few works actually pursue the wider adoption of such datasets in these tasks. The aim of our work is to use Linked Data to create domain models usable for various web personalization tasks.
We propose a method for discovering relationships among concepts, forming a concept map that serves as the basis of these domain models. For this purpose we use unstructured data from the Web, which we transform into concepts, discovering links between them. This can be used, e.g., for recording the technical knowledge and skills of software engineers, or the research interests of scientists (especially in the domain of information technologies). We examine the utilization of such concept maps in order to 1) improve navigation in a digital library based on what the user has already visited, 2) find similarities between scientists and authors of research papers and recommend them to visitors of a digital library, 3) analyze Linked Data graphs and find identities between various entities, 4) enhance displayed articles and recommend additional interesting information to the reader, and 5) enable the user to search for particular information using natural language queries (in English).
We evaluate the models and the methods of their creation directly, by comparing them to existing ones or by having domain experts assess facts drawn from them. Moreover, we evaluate the models indirectly by incorporating them into adaptive personalized web-based systems and measuring the improvement in user experience (i.e., better recommendations, search results, etc.).
Web Applications Usability Testing by means of Eyetracking
Martin Janík
master study, supervised by Mária Bieliková
Abstract. Usability, also known as quality of use, is a feature that can fundamentally influence the success of an interactive application. Evaluation of usability depends on the type of application and on the person using it. For web applications, we often do not know the full set of users; we may know the specific end-user groups, but these usually form an open and sometimes dynamically changing community. By analyzing implicit feedback gained from user–application interaction, we can evaluate usability: for example, we can detect adverse behaviour, resolve it, and improve the use of the application.
Basic usability testing provides a sufficient amount of data to help us evaluate the design of an application. Gaze tracking brings a new aspect to usability evaluation: it tells us which objects attract attention and why. By following the order in which objects are gazed at, we can tell how users search through a web application, forming specific gaze patterns, and from these identify users’ patterns of behaviour. Among them are unwanted patterns that the owners of web applications would like to eliminate: aimless movement across the application, long periods of inactivity, or a user repeatedly visiting the same element.
We aim to create a method for identifying unwanted patterns of behaviour in the domain of content management systems. Identification will be based on implicit feedback, particularly from gaze tracking. Besides identifying unwanted patterns, our goal is to recommend a set of steps for resolving the problem that likely causes a specific pattern. By further improving our method, we would like to predict undesirable patterns of behaviour so that their creation can be avoided.
Group Recommendation for Smart TV
Ondrej Kaššák
master study, supervised by Michal Kompan
Abstract. The domain of multimedia content is among the most popular fields of human interest. This kind of content brings people knowledge, fun, relaxation and so on. The great interest is matched by a great supply, which brings an information overload problem that grows with the increasing availability of data.
Multiple ways of helping users with this problem have been devised. One of the most effective is personalized recommendation, based on the automated selection of the most interesting content individually for each user. There are several known types of personalized recommendation: collaborative, content-based, demographic, and knowledge-based. These pure approaches are well explored, and many researchers have described their strengths and weaknesses.
A less explored approach in personalized recommendation is hybrid recommendation, where “hybrid” denotes any method that somehow combines multiple pure recommendation methods. In our research we focus on a mixed hybrid method combining the collaborative and the content-based approach. In the first step of our recommendation process, suitable items are chosen by a collaborative method, which harnesses the power of the user community and is commonly known as a very effective way of finding items to recommend. In the second step, a content-based approach reorders the collaborative results according to their similarity to items the user has watched before. For single users, our proposed method reaches higher recommendation precision than pure collaborative and content-based methods. Our next aim is to confirm the proposed concept for user groups, which are more common than single users in the target domain.
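A minimal sketch of the two-step process (the collaborative scores are assumed given, and the genre-overlap similarity is an illustrative stand-in for the content-based model):

```python
def hybrid_recommend(collab_scores: dict[str, float],
                     item_genres: dict[str, set[str]],
                     watched: list[str],
                     top_n: int = 10) -> list[str]:
    """Step 1: collaborative candidates; step 2: content-based reordering."""
    # Step 1: take the 2*top_n best items by collaborative score.
    candidates = sorted(collab_scores, key=collab_scores.get,
                        reverse=True)[:2 * top_n]
    # Build a simple content profile from the user's watching history.
    profile = set()
    for item in watched:
        profile |= item_genres.get(item, set())

    def similarity(item: str) -> float:  # Jaccard overlap with profile
        genres = item_genres.get(item, set())
        union = genres | profile
        return len(genres & profile) / len(union) if union else 0.0

    # Step 2: reorder candidates by similarity to previously watched items.
    return sorted(candidates, key=similarity, reverse=True)[:top_n]
```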
Analysis of Interactive Problem Solving
Peter Kiš
bachelor study, supervised by Jozef Tvarožek
Abstract. Just as personality affects behavior in everyday life, it is also reflected in a player’s gaming style. Game developers try to balance their games to fit the widest possible range of players by creating a huge variety of activities and tasks. Consequently, every type of player also encounters much that is unattractive to them, which raises the question: can we identify a player’s personality from gameplay and personalize the game accordingly?
The main objective of this work is to examine the relationship between players’ personality traits and their in-game expressions, and thus to create a characteristic model of gameplay behavior for each characteristic group of players. For this purpose we designed Hexa, a casual browser-based turn-based game with logical and strategic elements. While the game is played, we track the player’s in-game interactions (mouse movement and clicking) and possibly also gaze and physiological signals. From mouse movement and clicking we can determine the optimality of each move, the efficiency of mouse movement, etc. From gaze recordings we can analyze how the player evaluates the game map and decides on the next move. Physiological data can indicate the player’s emotional state during gameplay, such as happiness or stress. We correlate the acquired data with the player’s personality traits, identified using the Big Five personality test. The result of this project is a model that can help us and game developers make new games or applications with interactive content better balanced for every targeted player or user.
Keyword Map Visualisation
Matej Kloska
bachelor study, supervised by Marián Šimko
Abstract. Nowadays, people create more digital documents – data elements – than ever before. If we want to search and navigate these documents, we need an efficient way to interpret the relations between them. Information visualisation has become a large field in which various subfields are beginning to emerge. A key question is whether there is an inherent relation among the data elements to be visualised: creating proper relations and visualising them well leads to a high success rate when looking for information in documents.
When working with a large number of data elements, we often need an efficient representation of their content – e.g., describing each document with an appropriate set of keywords. Keyword sets alone do not guarantee the quality of maps and search results; it depends strongly on how keywords are interconnected across documents. There are several techniques for creating keyword connections, most of them based on clustering.
The second problem is how to properly visualise the keywords and the relations between them. Supporting good user experience is also crucial and has to be kept in mind: the more foolproof the interface, the more valuable the maps.
Personalized Recommendation Using Context for New Users
Róbert Kocian
master study, supervised by Michal Kompan
Abstract. A user’s very first interaction with a personalized system is crucial from the recommender-system and user-modeling point of view. It is in these moments that the user forms a relationship with and an opinion of the system, which influence its further use. If the user is new to the system, we have no information about his or her preferences, and thus no, or only trivial, personalized recommendations can be made.
Today, social networks are growing enormously, and users in them cluster into groups according to their interests, work or relationships. These attributes can be used for recommendations to new users: we can obtain a large amount of information from related or similar users and use it to increase the quality of recommendations for the new user. We can also consider the social context obtained from other systems and applications.
In our work we analyze current approaches to the new-user problem in the context of different types of personalized recommendation. We explore the possibility of obtaining additional information about a new user from social networks and other systems. The aim of our work is to design methods for personalized recommendation with an emphasis on solving the new-user problem, enhanced by the user’s context from related and similar users.
Identifying Hidden Source Code Dependencies from Developer’s Activity
Martin Konôpka
master study, supervised by Mária Bieliková
Abstract. Monitoring and evaluating a software project is important for its management and development. Traditionally, we use source code metrics to identify code complexity, code smells or other problematic places in the source code. However, source code is the result of a developer’s work, which opens up space for using information about the developer’s activity to evaluate the resulting code.
In our work we propose a method for identifying hidden implicit dependencies in source code from logs of developer activity. A developer interacts with source code files during development, studies their contents, gains knowledge about existing solutions, or copies fragments of code. From the monitored activity we can infer dependencies in the source code that were important during the completion of a particular task, i.e., the development task context.
The existing dependency graph of source code is created by syntactic analysis. We enhance this graph with implicit dependencies, thus enlarging the space in which dependent software components are searched for. Implicit dependencies may reveal relationships that are not explicitly stated in the source code but exist during development or even at runtime of the developed software.
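A minimal sketch of one possible inference rule (the log format, the time window and the support threshold are our assumptions): files repeatedly touched shortly after one another become candidate implicit dependencies.

```python
from collections import Counter

def implicit_dependencies(log: list[tuple[float, str]],
                          window: float = 60.0,
                          min_support: int = 3) -> set[tuple[str, str]]:
    """log: time-ordered (timestamp, file) events from the developer's IDE.

    Returns file pairs touched within `window` seconds of each other
    at least `min_support` times.
    """
    pair_counts = Counter()
    for i, (t1, f1) in enumerate(log):
        for t2, f2 in log[i + 1:]:
            if t2 - t1 > window:
                break
            if f1 != f2:
                pair_counts[tuple(sorted((f1, f2)))] += 1
    return {pair for pair, n in pair_counts.items() if n >= min_support}
```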
User Identification Based on Web Browsing Behaviour
Peter Krátky
doctoral study, supervised by Daniela Chudá
Abstract. Every person is unique in the way he or she uses a computer: the operating system, programs, and even input devices such as the keyboard or mouse. Patterns produced by the usage of standard input devices could be utilized to identify an unauthorized user. Adaptive and personalized systems accessed by multiple users might especially benefit from this, as tailoring the content depends heavily on the user’s previous activity.
In our research we focus on patterns in web browsing, a very common activity nowadays. The goal of our work is to design a method that identifies persons based solely on their behaviour, rather than on the machine or browser they use to access websites. We propose a system that implicitly acquires data about the user’s activity, extracts characteristics, and selects the identity associated with the user.
While browsing, the computer mouse is used most of the time. We systematically analyze characteristics of mouse usage, such as movement velocity or click duration, seeking the characteristics most suitable for the identification task. We compare the distinctiveness of the characteristics, their steadiness over time, and their dependency on the hardware used.
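A minimal sketch of the characteristics extraction (the event format and the two features shown are illustrative; a real identification pipeline would feed many such features into a classifier):

```python
import math

def mouse_features(events: list[dict]) -> dict[str, float]:
    """Extract simple behavioural features from a stream of mouse events.

    events: dicts like {"t": seconds, "x": px, "y": px,
    "type": "move" | "down" | "up"}, ordered by time.
    """
    moves = [e for e in events if e["type"] == "move"]
    speeds = []
    for a, b in zip(moves, moves[1:]):
        dt = b["t"] - a["t"]
        if dt > 0:
            dist = math.hypot(b["x"] - a["x"], b["y"] - a["y"])
            speeds.append(dist / dt)  # pixels per second
    downs = [e["t"] for e in events if e["type"] == "down"]
    ups = [e["t"] for e in events if e["type"] == "up"]
    clicks = [u - d for d, u in zip(downs, ups) if u >= d]
    return {
        "mean_velocity": sum(speeds) / len(speeds) if speeds else 0.0,
        "mean_click_duration": sum(clicks) / len(clicks) if clicks else 0.0,
    }
```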
Context-based Improvement of Search Results in Programming Domain
Jakub Kříž
master study, supervised by Tomáš Kramár
Abstract. When programming, the programmer does many things besides writing the actual source code: opening various applications and other source files, searching them, writing notes, and, not least, searching the Internet for solutions to problems or errors encountered. All these tasks are done in a context, usually that of solving a problem. By identifying this context we can understand the meaning behind the programmer’s actions, and this understanding can be used to make Internet search results more accurate and relevant to the current task.
In this work we analyse existing methods of context mining in other domains and their usability in the programming domain. An important source of contextual information is the source code the programmer is currently working on; we therefore analyse methods for extracting metadata from source code files, software projects, and documents in general. Based on this analysis we design methods for mining the context of the programming domain from source code, building a context model which we then use to improve search results. The designed methods are evaluated experimentally using logs and via live experiments.
Modeling Developer’s Expertise
Eduard Kuric
doctoral study, supervised by Mária Bieliková
Abstract. Estimating a developer’s expertise allows, e.g., developers to locate an expert on a particular library or part of a software system (someone who knows a component or an application interface). In a software company, estimating developers’ expertise allows managers and team leaders to look for specialists with the desired abilities, form working teams, or compare candidates for certain positions. It can also support so-called “search-driven development”. Relevance of software artifacts is of course paramount; however, trustability is just as important. When a developer reuses a software artifact from an external source, he has to trust the work of an external developer who is unknown to him. If the target developer could easily see that a developer with a good level of expertise participated in writing the software artifact, he would be more likely to consider reusing it.
Our approach is based on investigating the software artifacts the developer creates and the way the artifacts were created. In other words, we take into account (1) the developer’s source code contributions, their complexity, and how the contributions to a software artifact were created (e.g., copy/paste actions from external resources such as a web browser); (2) the persistence of the developer’s know-how about a software artifact; and (3) technological know-how – how well, i.e., how broadly and effectively, the developer knows the libraries used.
Observing and Utilizing Tabbed Browsing Behaviour
Martin Labaj
doctoral study, supervised by Mária Bieliková
Abstract. We focus on observing, analysing, logging and utilizing tabbed browsing (parallel browsing, tabbing, etc.), both within adaptive web-based systems and on the open Web. Adaptive systems in general make use of information about users, content, etc. Implicit user actions expressed as tabbing could be used to improve user models and domain models, or to aid recommendation.
Tabbed browsing is currently regarded as a more representative notion of browsing actions than previous models, which considered visits to resources in a linear fashion and disregarded the possibility of a user having multiple pages opened at once and returning to them without repeating the page load. Users browsing the Web do browse in tabs, and they do so for various reasons in different scenarios: keeping a page opened as a reminder to do or read something later, finding additional information about a topic on a given page, etc.
The parallel browsing behaviour, however, cannot be reliably inferred from typical server logs. It can be observed with the aid of client-side scripts embedded within web pages (observing all users of a single web application) or from a browser extension (observing tabbing across all web applications visited in the augmented browser, but only within the smaller group of users who have the extension installed). We previously realized a tabbing logger within the ALEF adaptive learning system and, on the open Web, within the Brumo browser extension. We are currently modifying the single-application logger to consider tab-switch delays and to log in the new format, and we also propose a browser extension, Tabber, which allows users to view and analyse their usage of browser tabs and whose data can serve as a dataset of open Web browsing.
Acquisition and Determination of Correctness of Answers in Educational System Using Crowdsourcing
Marek Láni
master study, supervised by Jakub Šimko
Abstract. In past years, the Web has come to be used largely for educational purposes. Many technology-enhanced learning (TEL) and community question answering (CQA) portals and websites are used to gain knowledge and information. These systems are beneficial to their users, but the users can be beneficial to the systems too – a win-win relationship. The benefit for the systems comes from the content, which is often crowdsourced, i.e., generated by the users themselves. Since not everyone who creates this content is an expert in the given area, the content has to be filtered. The problem is that such filtering is time consuming and should be automated.
There are several approaches to this, and the aim of our work is to take, combine and modify some of them to achieve satisfactory results in filtering answers to questions within a TEL system. Our approach focuses on user-created evaluations of answers. We aim to use advanced methods of interpreting the crowd’s evaluations and to determine whether the crowd is capable of evaluating answers similarly to an expert. We have already experimented with several interpretation methods, namely detecting and filtering outliers, and weighting an evaluation based on the distribution of the evaluating user’s other evaluations; in new interpretation methods we also plan to use machine learning and to create a user model based on the concepts assigned to every question. The interpretation methods are evaluated by comparison with an expert evaluation assigned to every answer.
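A minimal sketch of the outlier-filtering step (the z-score cut-off is an illustrative choice):

```python
import statistics

def crowd_score(ratings: list[float], z_cut: float = 1.5) -> float:
    """Filter outlier ratings, then return the mean of the rest."""
    if len(ratings) < 3:
        return statistics.mean(ratings)
    mu, sigma = statistics.mean(ratings), statistics.stdev(ratings)
    kept = [r for r in ratings
            if sigma == 0 or abs(r - mu) / sigma <= z_cut]
    return statistics.mean(kept or ratings)

print(crowd_score([4, 5, 4, 4, 1]))  # the lone "1" is discarded
```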
To collect a sufficient amount of data, we developed our own system, which includes some of the key features of common CQA systems.
Query by Multiple Examples Considering Pseudo-Relevant Feedback
Adam Lieskovský
bachelor study, supervised by Róbert Móro
Abstract. Digital libraries offer many different ways to form a query and search for articles. The most common is keyword search, which has long been popular among users, and most of them can by now use it efficiently. However, when searching for documents in a domain we do not know well, keyword search may lead to unsatisfying results; moreover, users can become confused because they do not know the right terms with which to construct their queries.
One method that reduces the need for explicit query formulation (the user still has to know enough to make the initial query) is query by example (QBE), which has proved successful in content-based image retrieval and the multimedia domain as a whole.
We propose a QBE method based on metadata similarity that uses explicit relevance feedback; in addition, the initial results reflect pseudo-relevance feedback applied to a keyword query. This way, even novice or inexperienced users can interactively select relevant articles and improve their initial results with each iteration. We plan to evaluate our approach in the bookmarking system Annota by means of a user study.
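A minimal sketch of the feedback step, assuming the classic Rocchio formulation over term-weight vectors (the abstract does not name Rocchio; it serves here as the standard relevance feedback baseline):

```python
from collections import defaultdict

def rocchio(query: dict[str, float],
            relevant: list[dict[str, float]],
            alpha: float = 1.0, beta: float = 0.75) -> dict[str, float]:
    """Move the query vector toward the centroid of relevant documents."""
    updated = defaultdict(float)
    for term, w in query.items():
        updated[term] += alpha * w
    for doc in relevant:
        for term, w in doc.items():
            updated[term] += beta * w / len(relevant)
    return dict(updated)

# Pseudo-relevance feedback: treat the top-k results as "relevant".
q = {"exploratory": 1.0, "search": 1.0}
top_docs = [{"search": 0.8, "navigation": 0.5}]
print(rocchio(q, top_docs))
```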
Researcher Modeling in Personalized Digital Library
Martin Lipták
master study, supervised by Mária Bieliková
Abstract. Researchers use digital libraries either to find solutions to particular problems or to keep track of the latest trends in their domains of interest. To cope with the amount of information in digital libraries and to improve the researcher experience, some parts of the digital library (such as the recommender, search or navigation) are personalized. Behind every personalization there is a user model – in our case a researcher model. We propose a researcher model that comprises the researcher’s various activities in the digital library – papers stored in her or his personal library, authored papers, browsing, search, tagging, categorization and annotation.
The final researcher model is represented as a vector of the terms most relevant to the researcher. To allow extensibility, the model is internally represented as a graph of intermediate results that lead to the resulting model. We implement the researcher model in the Annota digital library. We evaluate it by determining the correlation between the researcher model’s terms and what the researchers themselves consider their research interests. The experiment uses a game with a purpose to examine which terms are relevant. We then perform a thorough study with the experiment subjects to provide further qualitative evaluation of the proposed model of researcher interests.
Recommendation in Adaptive Learning System
Viktória Lovasová
bachelor study, supervised by Martin Labaj
Abstract. Recommender systems are an important part of educational systems, where they help students, for example, decide what to learn next. Recommendation methods use explicit and implicit feedback. With explicit ratings, users express their opinion in various ways – stars, thumbs up/down, scales from 1 to 10, and others. With implicit feedback, users express their opinion without even being aware of it: their actions are monitored – where they click, what they buy, what they browse and so on. Parallel browsing can be a type of implicit feedback, including actions such as switching between websites, time spent on a site, and opening links in new tabs or reusing the current one. In our research, we take these indicators into account in recommendation.
We propose a recommendation method based on parallel browsing in the ALEF adaptive learning system, recommending learning objects based on student activity. We capture switch pairs of related learning objects. In this notion, one learning object rates another according to the time spent on the object and the time it takes the user to switch between them, possibly including time spent on other sites before arriving at the object being switched to. For every user, we maintain a table of his or her switches between learning objects. When a student is at a specific learning object, a corresponding object is then recommended according to other users’ switches.
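A minimal sketch of this notion of switch-based rating (the dwell-time scoring is an illustrative simplification of the described rating):

```python
from collections import defaultdict

def build_switch_scores(switches: list[tuple[str, str, float]]
                        ) -> dict[str, dict[str, float]]:
    """switches: (from_object, to_object, dwell_seconds_on_target).

    Longer dwell after a switch counts as a stronger endorsement.
    """
    scores: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for src, dst, dwell in switches:
        scores[src][dst] += dwell
    return scores

def recommend(current: str, scores: dict[str, dict[str, float]],
              top_n: int = 3) -> list[str]:
    ranked = sorted(scores.get(current, {}).items(),
                    key=lambda kv: kv[1], reverse=True)
    return [obj for obj, _ in ranked[:top_n]]
```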
We evaluate the method’s accuracy in a closed experiment, comparing students who obtain recommendations based on their parallel browsing with students who obtain recommendations from a sequence recommender. Students with parallel-browsing-based recommendations should achieve better learning results than students with the standard method. The method could also be used in other domains where content analysis is costly, e.g., in systems serving video or image content, since switching between pages may indicate similarity.
Application for International Competition
Jakub Mačina
bachelor study, supervised by Jakub Šimko
Abstract. We are going to compete in an international competition with an application aimed at solving real-world problems. One of our many ideas is presented below. Nowadays, more and more people suffer from diseases such as obesity and a wide range of cardiovascular diseases. This is mostly caused by the food we eat, which may contain harmful additives, preservatives or pesticides, or may be genetically modified.
Our goal is to support local agriculture and local farmers and to encourage people to buy home-made products from nearby farms instead of supermarket products. To reach this goal we are building a web application along with a smartphone application for farmers, distributors and customers. Local farmers can present themselves and their work, control the distribution of products to their customers, or cooperate with other farmers nearby. A customer is able to find farm products near his or her location and verify the quality of the offered products. The application contains social elements and gamification.
Low-Cost Acquisition of 3D Interior Models for Online Browsing
Filip Mikle, Matej Minárik, Juraj Slavíček, Martin Tamajka
bachelor study, supervised by Jakub Šimko
Abstract. Building a dense 3D model in a usual 3D editing tool requires a lot of time and significant skill. Using our solution, everyone, even a non-technically skilled person, is able to model real places in a few minutes. Modeling interiors this way feels much more like filming than 3D modeling: obtaining a 3D model takes only a few minutes more than taking a set of pictures of the interior.
Our solution has two parts: a scanning application designed for real estate agents, and a browsing web application designed for ordinary people. Customers of a real estate agency can easily walk through interiors scanned earlier; the site also provides information about our project and for real estate agents. When scanning is finished, the scanning application processes the data and generates a group of object files. The agent then uploads these files through our web application, and the model is ready to be browsed.
Activity-based Search Session Segmentation
Samuel Molnár
master study, supervised by Tomáš Kramár
Abstract. Automatic search goal identification is an important feature of a personalized search engine. Knowing the search goal and all the queries supporting it helps the engine understand a query and adjust the ranking of relevant web pages or other documents according to the current information need. To improve goal identification, the engine combines various factors of the user’s search context with different relevance weights. However, most factors utilized for goal identification involve only lexical analysis of the user’s queries and time windows represented as short periods of user inactivity.
In our work, we focus on utilizing user activity during search to extend the existing lexical and time factors. By analysing search activity such as clicks and dwell time on search results, we better understand which results are relevant to the user’s current information need. We thus utilize the user’s implicit feedback to determine the relatedness of queries through the search results they share: the similarity of two queries is measured by the number of shared links, adjusted by the implicit feedback. Semantic analysis of queries and result snippets is another factor we introduce for clustering queries into sessions; encyclopedias like Wikipedia and Freebase can help understand the concepts and the user’s intention behind a query and thus provide a further clustering factor. We plan to integrate our model of weighted factors, utilizing user activity and semantic analysis, into existing search engines or servers such as Elasticsearch.
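A minimal sketch of the shared-results factor (the dwell-time weighting and normalization are illustrative choices):

```python
def query_similarity(clicks_a: dict[str, float],
                     clicks_b: dict[str, float]) -> float:
    """Similarity of two queries from the results they share.

    clicks_x: clicked URL -> dwell time in seconds. Shared links vote
    with the smaller of the two dwell times.
    """
    shared = set(clicks_a) & set(clicks_b)
    if not shared:
        return 0.0
    weight = sum(min(clicks_a[u], clicks_b[u]) for u in shared)
    total = sum(clicks_a.values()) + sum(clicks_b.values())
    return 2 * weight / total

q1 = {"example.org/a": 30.0, "example.org/b": 5.0}
q2 = {"example.org/a": 20.0, "example.org/c": 8.0}
print(query_similarity(q1, q2))  # high value: likely the same session goal
```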
Using Navigation Leads for Exploratory Search in Digital Libraries
Róbert Móro
doctoral study, supervised by Mária Bieliková
Abstract. Searching for relevant information and navigating the information space of a digital library when researching a new domain can be challenging even for a seasoned researcher, and more so for novices, such as starting master’s or doctoral students. They can have a hard time formulating keyword queries because they lack the needed domain overview and knowledge. Digital libraries therefore provide various means of exploratory search and navigation, such as faceted search or a tag cloud. However, the most natural way of navigation seems to be browsing, which does not force users to split their attention between the navigation interface and the search results.
In order to emulate this behavior and support users' exploration of the domain, we provide them with navigation leads, i.e. links to relevant documents, with which we enrich the documents' summaries (or abstracts). We focus on the problem of identifying information artifacts (leads, i.e. keywords) in the summaries that are suitable for further exploration. From these we recommend and visualize only those that are relevant for the users based on their current context (represented by their search query or a set of previous queries). In the process of lead recommendation, other aspects, such as novelty and diversity of the leads, are considered as well.
In addition, we examine the influence of different visualizations of navigation leads on the users' performance in exploratory search tasks in the digital libraries domain. We compare three types of lead visualization, namely visualization in the text (of a document's abstract or summary), under the text, and in a term cloud next to the list of search results. We evaluate our approach on the scenario of a novice researcher by conducting a user study in the web-based bookmarking system Annota.
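The abstract states that relevance, novelty and diversity of leads are all considered. One standard way to trade relevance against redundancy is maximal marginal relevance (MMR), used below purely as an illustration of such a selection step, not as the authors' actual method; all names and the lambda weight are assumptions.

def select_leads(candidates, relevance, similarity, k=5, lam=0.7):
    """candidates: list of keywords; relevance: keyword -> score in [0, 1];
    similarity: (kw1, kw2) -> score in [0, 1]."""
    candidates = list(candidates)
    selected = []
    while candidates and len(selected) < k:
        def mmr(kw):
            # Penalize keywords too similar to already selected leads.
            redundancy = max((similarity(kw, s) for s in selected), default=0.0)
            return lam * relevance[kw] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected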
Collocation Extraction on the Web
Martin Plank
master study, supervised by Marián Šimko
Abstract. The main topic of our work is the extraction of collocations in natural language. Natural language processing is an important issue connected with metadata acquisition, and it often involves identification of collocations in the text. The information whether words form a collocation is metadata, too. There are various methods that allow us to identify collocations automatically.
Our work contains an analysis of existing methods for automatic collocation extraction. We characterize the issue of collocations in general and the properties of collocations that can be utilized for the task of collocation extraction. One of these properties is the limited modifiability of the collocation's components. We propose a method for collocation extraction based on this property. Preliminary experiments show that the performance of this method is comparable to other methods used in this area. Further evaluation and improvement of the method is the subject of our future work.
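An illustrative sketch of the limited-modifiability property: a true collocation (e.g. "strong tea") resists substitution of its components ("powerful tea" is rare). The counts below are hypothetical corpus frequencies, and the ratio is our reading of the property, not the proposed method itself.

def modifiability_score(bigram_freq, variant_freqs):
    """bigram_freq: corpus frequency of the candidate bigram;
    variant_freqs: frequencies of bigrams with one component substituted
    by a synonym. Values near 1 suggest low modifiability, i.e. a likely
    collocation."""
    return bigram_freq / (bigram_freq + sum(variant_freqs))

print(modifiability_score(950, [3, 1, 0]))   # "strong tea"  -> ~0.996
print(modifiability_score(120, [300, 240]))  # free combination -> ~0.18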
Assessing Code Quality and Developer’s Knowledge
Jana Podlucká
bachelor study, supervised by Dušan Zeleník
Abstract. We encounter the term quality very often in our lives, and software projects are no exception: quality, or its absence, is one of the main criteria for evaluating source code. The significance of this evaluation is, however, not limited to the customer; the programmer himself should be interested in it, as it may form the basis for improving his work.
Since the creation of programming languages, many techniques have been developed to detect bugs in code automatically. Most of them rely on formal methods and sophisticated analysis of the code, yet these techniques do not discover every bug. Logical bugs, especially, can often be discovered only during testing.
We approach this issue from the perspective of the programmer's context. We reconstruct the context using logs of real software developers and their activities. We also have bug reports at our disposal, in which we try to find correlations between faultiness and the conditions under which the code was created. Based on these relations, we are able to identify parts of the code with a high risk of bugs.
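A hedged sketch of the idea: each code change is described by the context in which it was created, and linked bug reports provide labels, so a simple classifier can estimate bug risk. The feature set, data and choice of logistic regression are illustrative assumptions, not the authors' actual model.

from sklearn.linear_model import LogisticRegression

# Hypothetical context features per change:
# [hour of day, minutes of continuous work, files switched in last 10 min]
X = [[10, 25, 2], [23, 170, 9], [14, 40, 3], [2, 200, 12], [11, 30, 1]]
y = [0, 1, 0, 1, 0]  # 1 = a bug report was later linked to this change

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[22, 150, 8]])[0][1])  # estimated bug risk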
Discovering Identity Links between Entities on the Semantic Web
Ondrej Proksa
master study, supervised by Michal Holub
Abstract. Currently, millions of web pages are being created on the World Wide Web, most of them published in an unstructured form. Linked Data are structured data, available on the Web, containing entities and the relationships among them. Some of the datasets are created by automatic processing of publicly available data, and they have various uses in personalizing web pages, in search, or in deducing new knowledge. When new datasets are created, their entities are usually connected to widely familiar datasets, but further connections are lacking. One of the main problems is thus detecting and creating relations and connections among existing datasets.
The aim of this work is to analyze the problem of creating new relations and to propose a method which enriches the LOD (Linked Open Data) graph with new relations among existing datasets. LOD is a big graph of entities and relationships; the main aim of our method is to find the similarity between two vertices (entities) in this graph. If two vertices are sufficiently similar, there is an owl:sameAs relationship between the entities they represent. The similarity of graph nodes is based on the similarity of the nodes' properties and the similarity of the nodes' relationships.
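A minimal sketch of the similarity idea described above: two LOD entities are compared by the overlap of their property values and of their neighbours in the graph. The 50/50 weighting and the decision threshold are illustrative assumptions, not the method's actual parameters.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def entity_similarity(props1, props2, neighbours1, neighbours2, w=0.5):
    """props*: sets of (property, value) pairs; neighbours*: linked entity URIs."""
    return w * jaccard(props1, props2) + (1 - w) * jaccard(neighbours1, neighbours2)

e1 = {("name", "Berlin"), ("type", "City")}
e2 = {("name", "Berlin"), ("type", "Settlement")}
sim = entity_similarity(e1, e2, {"dbpedia:Germany"}, {"dbpedia:Germany"})
print(sim > 0.6)  # if sufficiently similar, emit an owl:sameAs link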
The proposed method must be sufficiently generic and yet functional on the LOD, so it will be experimentally tested on multiple use cases. We plan to evaluate it on existing LOD datasets, GOV data and the DBLP – Computer Science Bibliography.
Automatic Web Content Enrichment Using Parallel Web Browsing
Michal Račko
master study, supervised by Martin Labaj
Abstract. In the domain of education, it is important to know relevant information about learning objects and the relationships among them. Adaptive learning systems increasingly personalize learning to the needs of individual users. Assuming that user behavior is the same when following the same goal, e.g. searching for additional information on a given topic, we propose a method for automatic web content enrichment based on users' actions in the domain of the open Web.
Currently, a large number of web browsers allow tabbed browsing; all of them support it in their latest versions. Some of them can persistently maintain the selected tabs, or restore their state even after the user closes the application. Various researchers have found that users employ tabs in a large number of ways, among them as temporary lists, for parallel examination of search results, or as a way to return to a previous page, and the majority of these usages were never explicitly planned for.
The aim of this project is to discover relationships between sites frequently visited by users in multiple tabs: what the relations between them are, and whether the sites can be linked together based on the way the user accessed them. These links need not depend on physical hyperlinks between the sites; they can be based purely on the user's browsing behaviour. User actions are preprocessed, transformed into sessions and loops, and classified according to defined behaviour models for further information extraction. The proposed method is evaluated in the ALEF adaptive learning system, where the aim is to simplify or automate the addition of external resources to learning objects.
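An illustrative sketch of one preprocessing step consistent with the approach: from a simplified log of tab events, count how often two pages are open in parallel; frequently co-occurring pairs become candidate relations. The log format and field names are assumptions of the sketch.

from itertools import combinations
from collections import Counter

events = [  # (time, tab_id, action, url) -- hypothetical activity log
    (0, 1, "open", "wiki/Graph"), (5, 2, "open", "wiki/Tree"),
    (9, 3, "open", "news/Sport"), (12, 2, "close", None),
]

open_tabs, cooccur = {}, Counter()
for _, tab, action, url in sorted(events):
    if action == "open":
        for other in open_tabs.values():
            cooccur[frozenset((url, other))] += 1  # pages open in parallel
        open_tabs[tab] = url
    else:
        open_tabs.pop(tab, None)

print(cooccur.most_common(3))  # candidate page relations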
Employing Information Tags in Software Development
Karol Rástočný
doctoral study, supervised by Mária Bieliková
Abstract. Management of the software development process is a crucial part of software engineering, on which the success of software projects depends. This management mostly relies upon the quality and freshness of software metrics and upon analyses over these metrics. Software metrics can be based on source code or on empirical data about software developers. Code-based metrics are well known and many approaches based on them have been proposed. Empirical software metrics, however, are still an unexplored part of software engineering, even though they contain important information about the software development process and can be used, e.g., for forecasting significant trends, similarly to empirical data (e.g., implicit user feedback) in web engineering. The reason for this state is that collecting empirical data is time-consuming and error-prone. We have proposed a solution to these problems based on collecting, storing and maintaining developer-oriented empirical data abstracted into information tags, along with empirical software metrics.
We are currently working on the proposal and evaluation of methods for automatic generation of information tags from a stream of events and for automatic maintenance of information tags. As their core, we proposed an information tag generator which queries the stream of events in RDF format and executes tagging rules after successful query evaluation. These tagging rules can be defined manually or learned automatically by analyzing modifications in the information tag space.
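A minimal sketch of rule-based information tag generation from a stream of developer events. Real events are queried in RDF; here they are plain dicts, and both the event fields and the example rule are assumptions of the sketch.

def rule_frequent_edits(event):
    """Tagging rule: flag a file edited in many bursts within an hour."""
    if event["type"] == "edit" and event["edits_last_hour"] > 30:
        return {"tag": "unstable-code", "anchor": event["file"]}
    return None

TAGGING_RULES = [rule_frequent_edits]  # manually defined or learned rules

def process_stream(events):
    for event in events:            # evaluate every rule on every event
        for rule in TAGGING_RULES:
            tag = rule(event)
            if tag:
                yield tag           # stored in the information tag space

stream = [{"type": "edit", "file": "Parser.java", "edits_last_hour": 42}]
print(list(process_stream(stream)))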
Exploring Multidimensional Continuous Feature Space to Extract Relevant Words
Márius Šajgalík
doctoral study, supervised by Mária Bieliková
Abstract. With growing amounts of text data, descriptive metadata become more and more crucial for its efficient processing. One kind of such metadata are keywords, which we encounter e.g. in everyday browsing of web pages. Such metadata can serve various purposes, like usage in web search or content-based recommendation.
In our work we focus on vector representation of words to simulate the understanding of word semantics. Each word is represented as a vector in an N-dimensional space, which brings the advantages of various vector operations, like easy similarity measurement between pairs of words, or vector addition and subtraction to compose the meaning of longer phrases. We can also find the most similar words by finding the closest vectors, or find a vector that encodes a relationship between a pair of words, e.g. a vector transforming singular into plural. With word vectors, we can encode many semantic and also syntactic relations.
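A toy illustration of the vector operations mentioned above, with made-up 3-dimensional vectors; real models use hundreds of dimensions and pretrained embeddings such as word2vec.

import numpy as np

vec = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "man":   np.array([0.7, 0.2, 0.1]),
    "woman": np.array([0.7, 0.2, 0.9]),
    "queen": np.array([0.8, 0.9, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similarity between a pair of words:
print(cosine(vec["king"], vec["queen"]))

# Composing a relationship: king - man + woman should land near queen.
target = vec["king"] - vec["man"] + vec["woman"]
print(max(vec, key=lambda w: cosine(vec[w], target)))  # -> "queen"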
We research the computation of keywords in vector space. This perspective on the keyword extraction problem brings new and interesting challenges, and many open problems remain unsolved. So far, we have developed a method that extracts relevant words grouped into clusters of words with similar meaning. Each cluster contains a keyword, but is cluttered with other similar words that often correspond to less common synonyms or their misspelled alternatives, so the computed data is still noisy and needs to be cleaned. This could be achieved using frequency statistics, which is our next task to complete.
Personalised Search in Source Code
Richard Sámela
master study, supervised by Eduard Kuric
Abstract. Programmers always try to solve their development problems as easily and quickly as they can. Most of them use the Internet or source code repositories to find the right solution. There are a lot of examples, tutorials and other resources from which a programmer can obtain lines of code. The most efficient solution is to reuse existing source code instead of creating new code. The problem, however, is to find the source code which best fits the development problem at hand.
We will analyse the options for recommending source code. This can be done by creating the programmer's user model, based on implicit and explicit feedback. Implicit feedback contains information about the programmer, the source code fragments they implemented, and the technologies they used in particular projects. Explicit feedback contains information added manually. From the user model we will then be able to recalculate a knowledge score for every programmer, which will be useful for personalised recommendation of source code.
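A hedged sketch of how such a knowledge score might be recalculated from the user model; the feature set, scales and weights are illustrative assumptions only.

def knowledge_score(model, technology):
    """model: a programmer's user model aggregated from implicit feedback
    (code they wrote) and explicit feedback (self-reported skills)."""
    implicit = model["fragments_written"].get(technology, 0)
    explicit = model["declared_skill"].get(technology, 0)  # 0..5 scale
    return 0.7 * min(implicit / 100.0, 1.0) + 0.3 * explicit / 5.0

programmer = {
    "fragments_written": {"java": 240, "sql": 15},
    "declared_skill": {"java": 4, "sql": 2},
}
print(knowledge_score(programmer, "java"))  # used to rank recommended code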
Anomaly Detection in Stream Data
Jakub Ševcech
doctoral study, supervised by Mária Bieliková
Abstract. During the last few years, the buzzword Big Data has been all around us. Its definition is rather fuzzy, but one of the most frequent is that it is a common name for techniques for processing data characterized by large volume, high velocity and/or high variability. The most common technique for dealing with such data is batch processing. In many applications, however, this type of processing is not viable, mainly due to the delays caused by batch job processing time. When we require real-time processing, we have to reach for stream processing.
In our work we focus on processing streams of data, where we are working on methods for anomaly detection. The main challenge is to be able to process a large number of various metrics computed over the streamed data, and to do so in a single pass through the data. Our aim is to create and evaluate (for precision and performance) a method for anomaly detection in stream data based on pattern detection in the individual metrics computed over the data, and on classification of the stream state using the detected patterns and supervised learning.
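A single-pass illustration of the constraint described above: an online mean and variance (Welford's algorithm) flag values far from the running mean. The proposed method works with patterns over many metrics; this only demonstrates the one-pass style of computation it requires, with an assumed 3-sigma threshold.

class OnlineAnomalyDetector:
    def __init__(self, threshold=3.0):
        self.n, self.mean, self.m2, self.threshold = 0, 0.0, 0.0, threshold

    def update(self, x):
        """Consume one value; return True if it is anomalous."""
        anomalous = False
        if self.n > 1:
            std = (self.m2 / (self.n - 1)) ** 0.5
            anomalous = std > 0 and abs(x - self.mean) > self.threshold * std
        self.n += 1                      # Welford's online update
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

d = OnlineAnomalyDetector()
print([d.update(x) for x in [10, 11, 9, 10, 12, 10, 55]])  # last one flagged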
Processing and Comparing of Data Streams Using Machine Learning
Miroslav Šimek
master study, supervised by Michal Barla
Abstract. Among the many approaches to machine learning, multilayered self-teaching neural networks (a.k.a. Deep Belief Networks) based on unsupervised learning are gaining popularity nowadays. For almost 40 years they were not accepted and were largely ignored by most experts in the machine learning community, one of the reasons being simply the insufficient computational power of the technology available at the time. Today, however, they are already producing interesting results, for example in computer vision.
Neural networks with one hidden layer use this layer to find patterns and features in the input layer. These features are much more useful for deciding what the output should look like than the raw input data alone. Multilayered neural networks take this approach to higher levels of abstraction: the first hidden layer finds features in the input layer, the second hidden layer finds patterns and features among the features in the first hidden layer, and so on. This approach is also somewhat closer to how our brain works, with multiple levels of abstraction. The problem with multilayered neural networks is that the usually very powerful backpropagation algorithm used in supervised learning does not work well here, as it loses power with every additional layer. This is where unsupervised learning becomes useful: the hidden layers are pre-trained one by one, each learning the patterns and features of the layer underneath. After this stage of unsupervised learning, the backpropagation algorithm is once again useful for fine-tuning the model.
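A compact sketch of greedy layer-wise pre-training, using scikit-learn's BernoulliRBM as the building block. Layer sizes and the random data are placeholders; a full implementation would also backpropagate through the pre-trained layers during fine-tuning, whereas here, for brevity, only a classifier on top is trained.

import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

X = np.random.rand(200, 64)          # unlabeled data, e.g. image patches
y = np.random.randint(0, 2, 200)     # labels used only for fine-tuning

# Unsupervised stage: each RBM learns features of the layer underneath.
rbm1 = BernoulliRBM(n_components=32, n_iter=10).fit(X)
h1 = rbm1.transform(X)
rbm2 = BernoulliRBM(n_components=16, n_iter=10).fit(h1)
h2 = rbm2.transform(h1)

# Supervised stage: fine-tune on top of the pre-trained features.
clf = LogisticRegression().fit(h2, y)
print(clf.score(h2, y))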
Our goal is to find methods and new ways of training to utilize the potential of multilayered neural networks and unsupervised learning to process and compare large streams of unlabeled data like data from eye tracker or sound recordings.
Adaptive Support for Collaborative Knowledge Sharing
Ivan Srba
doctoral study, supervised by Mária Bieliková
Abstract. Information retrieval systems (e.g. web search engines or digital libraries) provide users with powerful tools for effectively identifying valuable information and knowledge in the vast information space of the current Web. However, sometimes it is quite difficult to describe a very specific information need, especially when it can be expressed only by keywords, as is the case with search engines. In these situations, Internet users can ask their questions in popular Community Question Answering (CQA) systems such as Yahoo! Answers or Stack Overflow.
The main goal of CQA systems is to harness the knowledge potential of the whole community to provide the most suitable answers to recently posted questions in the shortest possible time. We assume that searching for the answer to a question is actually also a specific way of learning. Therefore, in our project we present a novel perspective on CQA systems as collaborative learning environments. We reflect this perspective in the proposal of a question routing method, probably the most important part of each CQA system: it recommends potential answerers who are most likely to provide an appropriate answer to a newly posted question. The proposed method promotes diversity of routed questions to maximize the learning potential of the CQA process.
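A hedged sketch of question routing: candidate answerers are scored by how well a new question matches their answering history, with a small penalty for users who have been routed many recent questions, a crude stand-in for the diversity promotion mentioned above. Profiles, weights and field names are assumptions of the sketch.

def route_question(question_terms, users, top_k=3, load_penalty=0.1):
    """users: list of dicts with a term profile built from past answers
    and a count of recently routed questions."""
    def score(user):
        overlap = len(set(question_terms) & user["profile"])
        return overlap / len(question_terms) - load_penalty * user["recent_routed"]
    return sorted(users, key=score, reverse=True)[:top_k]

users = [
    {"name": "alice", "profile": {"python", "regex"}, "recent_routed": 0},
    {"name": "bob", "profile": {"python", "regex", "parsing"}, "recent_routed": 5},
]
print(route_question({"python", "regex", "split"}, users, top_k=1))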
In addition, when we consider CQA systems as innovative learning environments, we suppose that their potential for supporting organizational knowledge sharing and collaborative learning is only beginning to be discovered. We are developing a CQA system named Askalot, designed specifically for universities, where students can take advantage of the learning aspect of the question answering process. Askalot will also provide us with the possibility to study the concepts of CQA in an organizational environment and consequently to apply the proposed method in a live experiment.
Browsing Information Tags Space
Andrea Šteňová
master study, supervised by Karol Rástočný
Abstract. Software systems create various types of metadata that describe specific parts of structured content, provide information about what the data are, or give us additional information about the user who created them and how they were created. Metadata can also allow a better understanding of the data and show how the data change over time.
Information tags, one type of metadata, contain structured information associated with a particular piece of content, such as the number of clicks on a link in a page, a keyword characterizing a paragraph of an article, or the number of lines of a method's source code. Metadata are usually generated and processed by machines, and their amount and machine-oriented representation make it difficult for people to read and understand them effectively.
In our work we propose a method to help users browse the space of information tags connected to source code. We show the source code to users as a tree using a fisheye view. We create a map of information tags over the source code, which displays the associated information tags and visualizes their different values. We aggregate the values of information tags into the individual nodes of the tree. Users are able to filter the information tag space using a faceted browser and browse the data using multiple views. Our method supports access to source code with specific properties, which represent possible values of information tags. We plan to verify our solution in the domain of the PerConIK project.
Implicit Feedback-based Discovery of Student Interests and Educational Object Properties
Veronika Štrbáková
master study, supervised by Mária Bieliková
Abstract. At present, search and recommendation on the Web are becoming more and more common. Whether it concerns search and recommendation of articles in news sites and digital libraries, of study materials, or of products in e-shops, it is essential to know the characteristics of the objects being recommended and the characteristics of the person manipulating these web objects. These characteristics are collected via implicit feedback, and inaccurate information collected from its evaluation has a significant influence on the accuracy of recommendation. With the increasing possibilities of monitoring users on the Web, such as signals from an eye tracking camera or from blood pressure, body temperature and pulse sensors, we gain the ability to evaluate implicit feedback with great accuracy, and with it the corresponding interpretation of various activity signals in different domains.
Despite the existing methods for evaluating various implicit signals of user activity, which aim to explore their characteristics, there is still room for improvement. Our research is aimed at the attributes of users and educational objects derived from implicit feedback indicators, and at their interpretation for use in the domain of education. By researching the chosen implicit feedback indicators, individually and in combination, we will explore their mutual relations.
The goal of this work is to improve the user model based on the gained interpretations of the implicit feedback indicators, and to propose a method for using the collected information in the domain of web-based education. Further, we plan to experimentally verify our findings in the context of educational objects and in the context of user modeling.
Collaborative Learning Content Enrichment
Martin Svrček
bachelor study, supervised by Marián Šimko
Abstract. Collaborative learning is a situation in which two or more people learn or attempt to learn something together. Many studies show that collaborative learning is (in many cases) better than traditional or individual learning. In the context of collaborative learning, the Web can become a medium in which students ask for information, evaluate one another's ideas and monitor one another's work, regardless of their physical locations.
We want to enrich the learning content using a new type of annotation – the definition (within the educational system ALEF). On the one hand, definitions can help students find the most important keywords and their explanations; the whole information (both keyword and explanation) is provided together. On the other hand, we can enlarge the conceptual metadata, which can be used to improve services in the system (e.g. search or recommendation).
We can then present web page data in a way that is understood by computers. In the context of definitions we face problems such as synonyms or different explanations of one definition. Therefore, we also want to evaluate the definitions: rating them will enable us to show students the most accurate information. There are many factors that can influence the rating of a definition (e.g. student reputation, number of similar explanations, …). By solving these problems we can both help students and improve the information processing and presentation services in the educational system.
Using Parallel Web Browsing Patterns on Adaptive Web
Martin Toma
master study, supervised by Martin Labaj
Abstract. The possibility to use tabs as a tool for parallel web browsing is definitely not new. In recent years, however, more and more people have come to use this feature every day. Despite that, little research has been done to deeply analyze how and why people use such a parallel browsing mechanism, and even less has aimed to utilize this information in a way that is helpful to an average web browser user. We focus on identifying patterns in parallel web browsing activities and connecting them into meaningful actions.
After recognizing these actions, we aim to provide appropriate recommendations. At this stage of research, we are considering recommendation options including (1) web content recommendation (in the ALEF domain or on the open Web), or (2) browser action recommendation (bookmark this page, etc.).
The main output of this work, a web browser extension, will capture parallel browsing activity and also provide recommendations. There already is the Brumo project, a Mozilla browser extension able to capture this kind of parallel web browsing activity, and we are currently using the data captured by Brumo to identify browsing patterns. Furthermore, we plan to implement a Google Chrome extension that will provide additional value to browser users via its recommendation features.
Method for Novelty Recommendation Using Topic Modelling
Matúš Tomlein
master study, supervised by Jozef Tvarožek
Abstract. Web content has a very dynamic nature: it frequently changes and spreads over various information channels on the Web. The huge amount of content on the Web makes it difficult to find web pages that provide novel information.
In our work, we aim to model the information value of web content in order to recommend novel articles to the user. To this end, we use topic modelling to work with the information in web content at a higher level. We focus our method on the domain of news articles. We introduce a method for clustering articles based on their relevancy, and also a method for ranking topics based on their novelty.
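A hedged sketch of the topic-modelling step: articles are mapped to topic distributions with LDA (gensim), and an article's novelty is estimated as its distance from the topics the user has already seen. The novelty measure here (cosine distance to a history centroid) is our simplification, not the proposed ranking method.

import numpy as np
from gensim import corpora, models

docs = [["election", "vote", "party"], ["vote", "party", "results"],
        ["rocket", "launch", "orbit"]]          # tokenized articles
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=1)

def topic_vector(bow, k=2):
    v = np.zeros(k)
    for topic, p in lda.get_document_topics(bow, minimum_probability=0.0):
        v[topic] = p
    return v

history = np.mean([topic_vector(corpus[0]), topic_vector(corpus[1])], axis=0)
candidate = topic_vector(corpus[2])             # the unseen space article
novelty = 1 - history @ candidate / (np.linalg.norm(history) * np.linalg.norm(candidate))
print(novelty)  # high value -> topically novel for this user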
We evaluate our method against two baseline methods for novelty detection. The evaluation shows that articles recommended by our method have much improved relevancy to the interests of the user while maintaining novelty comparable to the baseline methods.
Determining the Relevancy of Important Words in Digital Library using the Citation Sentences
Máté Vangel
master study, supervised by Róbert Móro
Abstract. The number of articles and publications in digital libraries is huge and persistently increasing; keeping everything organized in the world of digital libraries has therefore become impossible without automating some of the processes. These issues can be resolved in different ways, but the main approach is realized by methods of text mining. The main purpose of text mining is to identify, extract and link specified or relevant information (objects) in texts. Text mining is a very important discipline, which can be used to fulfill various kinds of tasks, like domain modelling, automatic text summarization or navigation in a cloud of keywords. Keyword extraction is usually done from the text of the document itself, but in digital libraries there are other options for extracting relevant words. One of these possibilities is to use the available information related to the article, i.e. its metadata.
There are many kinds of metadata in digital libraries, for instance keywords provided by the author, the year of publishing, the category in which the article is located, and tags associated by users. Citations can also be considered an important source of keywords, because they characterize the article; they can also highlight different but relevant aspects of the analyzed article, which can be of interest to other researchers. There are various solutions for keyword extraction; however, they mainly use a single source of available information, usually the main text of the article. In our work, we aim to extract keywords with the help of document metadata, and we consider citation sentences as the main source (input) for keyword extraction.
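A minimal sketch of extracting keyword candidates from citation sentences with TF-IDF (scikit-learn). The sentences are hypothetical, and the actual method combines this source with other document metadata rather than using TF-IDF alone.

from sklearn.feature_extraction.text import TfidfVectorizer

citation_sentences = [  # sentences that cite the analyzed article
    "Smith et al. propose a scalable keyword extraction method.",
    "Their keyword extraction improves navigation in digital libraries.",
    "The method was evaluated on a large digital library corpus.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(citation_sentences)

# Aggregate scores over all citation sentences of the cited article
# and keep the strongest terms as its keyword candidates.
scores = tfidf.sum(axis=0).A1
terms = vectorizer.get_feature_names_out()
print(sorted(zip(terms, scores), key=lambda t: -t[1])[:5])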
Using Tags for Query by Multiple Examples
Tomáš Vestenický
bachelor study, supervised by Róbert Móro
Abstract. Nowadays, the most widely used approach to searching for information on the Web is keyword-based. Its main disadvantage is that users are not always efficient in their choice of keywords. This is why we aim to help users with query formulation, or to offer them a different, more natural approach.
Our method builds the query from positive example documents, or from a typed keyword-based query as a starting point, and then narrows the information radius of the results as the user selects further positive or negative examples, utilizing explicit relevance feedback as a query refinement method. We use additional metadata (user-added tags and author keywords) to determine document similarity. Users add tags to documents, which aids the search process, because users choose what is relevant for them within the particular topic. Therefore, tags can be used for more fine-grained query refinement by enabling users to see and/or remove tags from the documents selected as positive or negative examples. In cases where documents have not yet been tagged, we use keywords provided by the document's author.
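A hedged sketch of the tag-based similarity at the core of such a method: each example document is represented by its tag set (or author keywords when untagged), and candidates are ranked by attraction to positive examples minus attraction to negative ones. The scoring scheme is an illustrative assumption.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def rank(candidates, positives, negatives):
    """All arguments map document id -> set of tags."""
    def score(tags):
        pos = max((jaccard(tags, p) for p in positives.values()), default=0.0)
        neg = max((jaccard(tags, n) for n in negatives.values()), default=0.0)
        return pos - neg
    return sorted(candidates, key=lambda d: score(candidates[d]), reverse=True)

pos = {"d1": {"recommender", "evaluation"}}
neg = {"d2": {"hardware"}}
cands = {"d3": {"recommender", "user-study"}, "d4": {"hardware", "gpu"}}
print(rank(cands, pos, neg))  # d3 ranks above d4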
We focus on the domain of digital libraries of research articles, and we evaluate our proposed method in the bookmarking service Annota by means of a user study.
Evaluating Context-aware Recommendation Systems Using Supposed Situation
Juraj Višňovský
master study, supervised by Dušan Zeleník
Abstract. In the age of information overflow, we witness an increase in the popularity of personalized systems. Among the many solutions coping with the adaptation of content to users' needs, recommendation systems seem to stand above all of them. The main purpose of a recommendation system is to deliver relevant information to the user and thus simplify his navigation through the data overflow.
In the process of item selection, users are usually affected by various factors (e.g., a user's willingness to spend money in e-shops can be influenced by his wealth or by forthcoming Christmas). These factors, also known as contexts, describe the situation and the environment of the user. Including contexts in the recommender system makes it possible to generate more specific recommendations, which may lead to an enhancement of recommendation quality.
We propose a novel approach to evaluating context-aware recommendation systems using supposed situations. We focus on increasing the number of recommendations covered by the user study evaluation approach and thus reducing its costs. To this end, supposed situations are used to simulate various context-aware situations and to simulate the behaviour of a bigger set of users with only a standard number of user study participants. In our experiments we use a dataset from a simple event recommendation system. To evaluate the proposed approach, we compare the outputs of a user study using supposed situations with those of a standard user study.
Web Search Employing Activity Context
Ľubomír Vnenk
bachelor study, supervised by Mária Bieliková
Abstract. Too much information is available on the Web, so it is easy to get lost in this amount of data. When a user tries to find something valuable, the best current way is to use web search. However, even though web search engines are advanced, they cannot really know what the user is trying to find, mainly because the average query length is 2-3 words. The main purpose of my research is to make the query more specific by extending it with the user's context.
To get user’s context, we developed activity logger. It captures user’s activity and interaction inside browser and also outside browser, at desktop applications. It records applications name, copy / paste between applications and time when user switched to another application. It also get keywords of actually writing document and web pages. All this data are user’s activity context and we need to give priorities to each event and information to get the most precise context.
We hypothesise that an application the user was recently using is connected with the user's query: the user's intention to search for something is rooted in an application's context, and the specific meaning of a query that may look ambiguous can be found in the application's content. Therefore, finding the connection between the query and the application is crucial, and we try to find it by examining the interactions between the query and every application. We then extend the query with the most relevant keywords of the application's context; they must be the most relevant both to the query and to the application itself. We believe this helps the user find valuable information faster and more easily.
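An illustrative sketch of the final step: extending a short query with the most relevant keywords from the context of a recently used application. The relevance scores are assumed to come from the logger's analysis of application content; names and weights are hypothetical.

def expand_query(query, app_keywords, max_terms=2):
    """app_keywords: keyword -> relevance score for the active application,
    e.g. extracted from the document the user is currently writing."""
    candidates = {k: s for k, s in app_keywords.items() if k not in query.split()}
    top = sorted(candidates, key=candidates.get, reverse=True)[:max_terms]
    return query + " " + " ".join(top)

context = {"eclipse": 0.9, "java": 0.8, "lunch": 0.1}  # from the activity logger
print(expand_query("build path error", context))
# -> "build path error eclipse java"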
Modeling Programmer’s Expertise Based on Software Metrics
Pavol Zbell
master study, supervised by Eduard Kuric
Abstract. Knowledge of programmers' expertise and activities in the environment of a software house is a prerequisite for effective task resolution (by identifying experts), better team formation, effective communication between programmers, and personalized recommendation or search in source code, thus indirectly improving overall software quality. The process of modeling a programmer's expertise (building the knowledge base) usually expects as its input information about the programmer's activities during software development, such as interactions with source code (typically fine-grained actions performed in the IDE), interactions with ITS (issue tracking) and RCS (revision control) systems, activities on the Web, or any other interactions with external documents.
In our research, we focus on modeling a programmer's expertise based on software metrics such as software complexity and source code authorship. We assume that a programmer's expertise is related to the complexity of the source code she is interacting with, as well as to her degree of authorship of that code. In the case of software complexity, our idea is to explore alternatives to LOC (lines of code) based metrics, such as weighted AST (abstract syntax tree) node counting or call graph based metrics. With source code authorship, we expect programmers who wrote some code to be experts on that particular code, but we need to consider varying degrees of authorship, as the code evolves and is changed by other programmers over time. Information acquisition for programmer modeling in our work is based on activity logs from the programmer's IDE. We plan to implement our method as an extension to the Eclipse IDE for Java programmers and evaluate it on data from an academic environment or (preferably) a real software house environment.
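A hedged sketch of one alternative complexity metric mentioned above, weighted AST node counting, shown here for Python via the standard ast module (the actual work targets Java in Eclipse). The node weights are illustrative assumptions.

import ast

WEIGHTS = {ast.If: 2, ast.For: 3, ast.While: 3, ast.FunctionDef: 1, ast.Call: 1}

def weighted_ast_complexity(source):
    """Sum weights over all AST nodes; unlisted node types count as 0."""
    tree = ast.parse(source)
    return sum(WEIGHTS.get(type(node), 0) for node in ast.walk(tree))

code = """
def f(xs):
    total = 0
    for x in xs:
        if x > 0:
            total += g(x)
    return total
"""
print(weighted_ast_complexity(code))  # FunctionDef 1 + For 3 + If 2 + Call 1 = 7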