Students’ Research Works – Spring 2013

Personalized Search and Recommendation

Personalized Navigation

User Modeling, Virtual Communities and Social Networks

Domain Modeling, Representation and Maintenance

Semantics Discovery

Proceedings Template

Doctoral Staff

Mária Bieliková
o collaborative/personalized/context search and navigation
o recommendation for adaptive social web
o reasoning in information space with semantics
o user/user groups and contexts modelling

Michal Barla
o user modeling
o implicit user feedback
o virtual communities
o collaborative surfing

Jozef Tvarožek
o social intelligent learning
o collaborative learning
o semantic text analysis
o natural language processing

Marián Šimko
o domain modelling
o semantic text analysis
o Web-based Learning 2.0
o ontologies, folksonomies

Keeping Information Tags Valid and Consistent


Karol Balko
master study, supervised by Karol Rástočný

Abstract. Information tags, a type of descriptive metadata semantically connected to the marked content, represent an opportunity to enrich objects (the marked content) with additional information and to connect objects in a wide information space (for example, the Web). This makes it possible to form a space of structured, machine-readable information about content and an interesting infrastructure of connections. It is thus a topic with potential for the future of the Web as such, since information tags could help solve open issues in the cooperation of software systems by making human-readable information and knowledge about content machine-processable.

In our work we deal with methods for keeping information tags valid and consistent, focusing on information tags that enrich resources in a wide information space (for example, the Web or PerConIK). We want to propose methods for automated detection of the timeliness and consistency of these information tags.

User Interest Modelling based on Microblog Data


Miroslav Bimbo
master study, supervised by Marián Šimko

Abstract. A user model is a digital representation of a real user, necessary for providing personalized behaviour in information systems. Creating a user model based on microblog data has great potential due to the amount and nature of the data users produce. On the other hand, it is a nontrivial problem due to the shortness of microblog posts and the specific language used in them. It is necessary to cope with these problems by linking posts to external sources using various matching criteria, since it is easier to extract user-related information from “richer” and well-structured external sources than from the original post. We proposed a method for user interest modelling that utilizes several enrichment methods and aggregates their output.

Our method for creating a user interest model is based on the intuition that a better user model can be built by aggregating the results of several different enrichment methods. The model is represented as a vector of pairs: interests (Semantic Web entities) and their relevance to the user. We investigate the importance of the weights given by the enrichment methods, by the Semantic Web entity extractor, and by a classifier that computes how interest-related particular posts are. Furthermore, we propose methods for aggregating the same interest found multiple times in one post and the same interest found in multiple posts.
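The aggregation step could be sketched in Python as follows. This is only an illustration under stated assumptions: each detection carries three weights (enrichment method, entity extractor, post classifier) that are simply multiplied, repeated detections within one post are merged with max, and detections across posts are summed. These choices, the tuple layout, and the function name are hypothetical, not the author's method.

```python
from collections import defaultdict

def build_interest_model(detections):
    """Aggregate interest detections into a normalized (entity, relevance) vector.

    detections: iterable of (post_id, entity, method_weight, extractor_conf, post_score).
    Multiplying the three weights and the max/sum aggregation are assumptions.
    """
    # Within one post, keep only the strongest evidence for each entity.
    per_post = defaultdict(dict)
    for post_id, entity, w_method, w_extractor, w_post in detections:
        score = w_method * w_extractor * w_post
        per_post[post_id][entity] = max(per_post[post_id].get(entity, 0.0), score)
    # Across posts, sum the evidence so that recurring interests grow stronger.
    totals = defaultdict(float)
    for entities in per_post.values():
        for entity, score in entities.items():
            totals[entity] += score
    # Normalize relevances to sum to 1 and sort from most to least relevant.
    norm = sum(totals.values()) or 1.0
    return sorted(((e, s / norm) for e, s in totals.items()), key=lambda pair: -pair[1])
```

Taking the maximum inside a post keeps one post from dominating just because several enrichment methods found the same entity, while summing across posts rewards interests the user mentions repeatedly.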

Personalized Web Documents Organization through Facet Tree


Roman Burger
master study, supervised by Mária Bieliková

Abstract. With the vast amount of relevant information and resources accessible through the Web, one may start looking for effective ways of archiving and organizing resources. Keeping track of how resources are organized and structured then becomes a tedious task. Users need to manage their resource collections manually, which produces significant overhead and requires a lot of cognitive effort. The main problem, however, is that existing personal information management solutions support only very specific use cases. It is not uncommon for users to abandon information management altogether due to inconvenience and lack of effectiveness.

We propose a method for personal information management based on a faceted view of the personal information structure. The structure is displayed as a tree, so by chaining facets we can create a structure of any depth and thus specify any context of a resource. The available facets are defined in advance, so relevant metadata can be extracted automatically from new resources. Archiving a new resource is then as simple as one click. For users who find it hard to work with the facet tree, we enhance the method with automatic extraction of clusters of similar resources. Each cluster receives context in the form of a color. The colors then serve as values of a special facet, keeping the clusters consistent with the facet tree.
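Facet chaining can be illustrated with a small recursive grouping. The sketch below assumes each resource is a dict of already extracted facet values plus a "title"; the facet names and the "(none)" bucket for missing values are hypothetical. Treating the cluster color as just another facet shows how it can stay consistent with the tree.

```python
from collections import defaultdict

def facet_tree(resources, facet_chain):
    """Build a nested tree with one level per facet in facet_chain.

    resources: list of dicts holding a "title" and extracted facet values.
    Leaves are lists of resource titles; a missing facet value is grouped under "(none)".
    """
    if not facet_chain:
        return [r["title"] for r in resources]
    facet, rest = facet_chain[0], facet_chain[1:]
    groups = defaultdict(list)
    for r in resources:
        groups[r.get(facet, "(none)")].append(r)
    # Recurse into each group with the remaining facets of the chain.
    return {value: facet_tree(members, rest) for value, members in sorted(groups.items())}
```

For example, the chain ["type", "year"] yields a two-level tree, while the chain ["color"] shows the automatically found clusters as a single facet level.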

Emotion-Aware Movie Recommender Based on Genre Impact Analysis


Dominika Červeňová
bachelor study, supervised by Dušan Zeleník

Abstract. Using context-enriched data when creating recommendations gives us a new perspective and a chance to better understand users and their needs. Mood and emotions are known to strongly influence people’s decisions and behavior; therefore, we assume there is a relation between a user’s current mood and the genres she might like.

We propose a context-aware, knowledge-based method that recommends movies using the user’s current emotions. As the first step, we defined binding rules between genres and the type of mood obtained from the user. The rules are based on research we conducted with web users and a psychologist, as well as on data mining from a database of movies, users, and contextual information about them.

Our method applies post-filtering on top of an existing recommendation service: we transform the list of recommended movies into a list of movies relevant at the moment, taking the defined rules into account. In this way we differentiate between what the user might like in general and what she might find interesting at the moment.
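Post-filtering of this kind can be sketched as a re-ranking step applied to the output of any recommender. The mood-to-genre rules below are invented placeholders (the real rules came from the study with users and a psychologist), and the fixed bonus for favored genres is an assumption.

```python
# Hypothetical mood-to-genre rules; the actual rules came from the authors' study.
MOOD_RULES = {
    "sad": {"boost": {"comedy", "family"}, "suppress": {"horror", "war"}},
    "happy": {"boost": {"adventure", "comedy"}, "suppress": set()},
}

def post_filter(recommendations, mood, rules=MOOD_RULES, bonus=1.0):
    """Re-rank a general recommendation list for the user's current mood.

    recommendations: list of (title, score, set of genres) from an existing recommender.
    """
    rule = rules.get(mood)
    if rule is None:  # unknown mood: keep the general-purpose ranking
        return [title for title, _, _ in recommendations]
    kept = []
    for title, score, genres in recommendations:
        if genres & rule["suppress"]:
            continue  # drop movies whose genres clash with the current mood
        kept.append((score + (bonus if genres & rule["boost"] else 0.0), title))
    kept.sort(key=lambda pair: -pair[0])
    return [title for _, title in kept]
```

Because the underlying recommender is untouched, the general long-term preferences stay in its scores and only the momentary context is applied afterwards.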

To evaluate our method, we conducted experiments with the database of movies, users, and explicitly acquired context, which largely confirmed our hypothesis. In addition, we plan to conduct qualitative experiments with real users.

Improving Speech Therapy by Motivational Home Exercises


Peter Demčák, Ondrej Galbavý, Miroslav Šimek, Veronika Štrbáková
bachelor study, supervised by Michal Barla

Abstract. The ability to speak has an important place in human life. Nowadays, many people, especially children, suffer from speech disorders. These disorders have a negative effect on an individual’s quality of life. It is essential that children perfect their ability to speak correctly before they begin attending primary school, because after the age of seven, correcting habitually incorrect speech becomes lengthy and rather difficult.

Speech therapists have only about 20 minutes to spend with a child per session, which is not enough time to achieve any significant improvement. Therefore, they need to prescribe various types of exercises for children to do at home. In reality, many children refuse to exercise and view the exercises as punishment because they are tiring and take up a lot of their time. Parents also tend to underestimate the importance of exercising, they may lack the time and energy needed to help their children exercise, or, lacking information, they may simply guide their child to exercise in the wrong way.

Our proposed solution aims to improve home exercises and rests on two main cornerstones. First, we created specialized methods for controlling a computer that can be used to support speech therapy. We integrate the resulting technologies into our game application, which motivates children to exercise and provides feedback on the correctness of the exercise through gameplay.

Second, each child who suffers from a speech disorder is a unique case with special needs. This is why our solution is a whole platform that enables the speech therapist to review information about the children’s exercising and to configure the exercises to help solve complex speech problems.

Linked Data on the Web in order to Improve Recommendations


Ľuboš Demovič
master study, supervised by Michal Holub

Abstract. Currently, the Web provides a large amount of knowledge and has the potential to become the largest source of information in the world. Data published on the Web are largely unstructured and intended for people, without a clear definition of entities, their meaning, or the relationships between them. Linked Data describes a method of publishing structured data on the Web so that the data are interlinked, which makes them more useful. In addition, Linked Data contains various types of links between entities, which make it possible to create a graph describing the selected domain. Emphasizing the importance of data represents the next stage of Web development, referred to as Web 3.0.

We analyze automated machine processing of Web data in order to identify and extract entities and facts from Web content. We also explore the possibility of automatically creating datasets from the extracted entities and facts using Linked Data principles. Datasets generated in this way would be helpful for fast retrieval, translation, personalization, recommendation, and navigating the user to the desired information. We focus on processing unstructured English and Slovak Web content.

The aim of our work is to propose a method that allows automated identification and extraction of entities and facts about them using lightweight semantics. The obtained facts will be used to create a Linked Data dataset describing knowledge of the selected domain. We will verify the proposed method experimentally by implementing a software tool that exploits the knowledge base for recommendations.
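The final step, turning extracted facts into a Linked Data dataset, can be sketched by serializing them as RDF N-Triples. The extraction itself is not shown. The example.org namespaces are placeholders, and the naive IRI construction (spaces to underscores) is an assumption; a real dataset would use its own URIs, reuse existing vocabularies, and percent-encode other special characters.

```python
# Hypothetical namespaces; a real dataset would use its own URIs or existing vocabularies.
RESOURCE_NS = "http://example.org/resource/"
ONTOLOGY_NS = "http://example.org/ontology/"

def iri(name, namespace):
    """Turn an extracted name into an IRI reference in N-Triples syntax."""
    return "<" + namespace + name.strip().replace(" ", "_") + ">"

def literal(text):
    """Quote a plain string literal, escaping characters N-Triples does not allow raw."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'

def to_ntriples(facts):
    """facts: iterable of (subject, predicate, object, object_is_entity) tuples."""
    lines = []
    for subject, predicate, obj, is_entity in facts:
        o = iri(obj, RESOURCE_NS) if is_entity else literal(obj)
        lines.append(f"{iri(subject, RESOURCE_NS)} {iri(predicate, ONTOLOGY_NS)} {o} .")
    return "\n".join(lines)
```

Objects that are themselves entities become IRIs, so facts about the same entity link up into a graph; plain values stay literals.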

Combining the Power of Crowd and Knowledge of Experts in GWAP


Peter Dulačka
master study, supervised by Jakub Šimko

Abstract. Games with a purpose have been a great tool for acquiring and validating metadata in the last couple of years. Our latest project, City Lights, was able to rule out wrong crowdsourced metadata with a very good success rate, but failed at detecting false positives. One of the big issues of crowdsourcing is the suppression of expert opinions, which are “outshouted” by the rest of the crowd, so that advanced data that could be acquired from experts are tagged as wrong.

We would like to tackle “the outshouting problem” and create a game with a purpose that recognizes expert players as they play and handles their actions differently. In this way we would be able to combine the power of the crowd with the knowledge of individual experts.

Therefore, we propose a game for song relation discovery based solely on players’ experience and taste, which might uncover relations not traceable through similar metadata. Both non-expert and expert players will be able to confirm which songs relate to each other best (e.g., which songs should be close to each other in playlists). In addition, expert players will be used to discover relations between two songs that might not be obviously related. The game will evaluate players’ knowledge by asking them about facts of the song they are listening to (e.g., the record cover or lyrics). Since the score will be based on these answers, players are motivated to answer correctly, which is promising for expert discovery as well as for song relation discovery.
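One way to let experts avoid being outshouted is to weight votes by estimated expertise. The sketch below assumes expertise is the smoothed share of correctly answered song-fact questions and that players above a threshold get a fixed higher vote weight; the prior, threshold, and weight are hypothetical parameters, not values from the project.

```python
def expertise(correct, asked, prior_correct=1, prior_asked=2):
    """Smoothed share of correctly answered song-fact questions (the prior is an assumption)."""
    return (correct + prior_correct) / (asked + prior_asked)

def relation_support(votes, players, expert_threshold=0.8, expert_weight=3.0):
    """Weighted share of players confirming that two songs are related.

    votes: list of (player_id, agrees); players: {player_id: (correct, asked)}.
    Players whose expertise reaches expert_threshold get expert_weight votes,
    so a few experts are not simply outshouted by the crowd.
    """
    yes = no = 0.0
    for player_id, agrees in votes:
        correct, asked = players[player_id]
        weight = expert_weight if expertise(correct, asked) >= expert_threshold else 1.0
        if agrees:
            yes += weight
        else:
            no += weight
    return yes / (yes + no) if yes + no else 0.0
```

With one expert agreeing and two casual players disagreeing, plain majority voting gives 1/3 support, while the weighted variant gives 0.6. The smoothing prior keeps a player who answered one question correctly from immediately counting as an expert.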

Recognizing User’s Emotion in Information System


Máté Fejes
master study, supervised by Jozef Tvarožek

Abstract. Human emotions and their signs are innate characteristics of all people, regardless of the particular person. Thanks to that, they can serve as implicit feedback from users in information systems. Facial expressions are unconscious signs of people’s psychological reactions. According to a number of studies in the area of psychofeedback, different facial movements are common to all people, so we can derive their cause: emotions. In education systems, we can estimate from facial expressions what users think of the text they have read, so we are able to find out knowledge, interests, mood, and other attributes that could not be identified through traditional ways of gaining feedback.

In this project we deal with acquiring, representing, and utilizing user emotions while the user works with a web-based education system. To find out what is on the subject’s mind, we need a camera that records the user’s face. To extract emotions from video, we can use a number of existing tools that are able to recognize the human face and its facial features. Facial features are important points of the human face (e.g., the edges of the mouth or eyes). Their location depends on the movements of facial muscles and therefore on the emotions of the user.

Our aim is to propose a method for user modeling based on emotions evoked during work in a web-based education system. Our method will be based on the results of an experiment we plan to carry out in a real environment with a number of users. During the experiment, we will track the users with a webcam while they work in the selected system. By comparing the extracted emotions with users’ activities, we will try to find relations between mental states and the actions executed in the user interface. The goal of the experiment is to identify activities, or groups of activities, specific to users in a certain emotional state. In this way, we will be able to infer the emotions evoked by given content from traditional types of feedback.
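Relating extracted emotions to interface actions could start with a simple association measure. This sketch assumes the webcam output and activity log have already been aligned into (emotion, activity) pairs per time window, and uses lift as the measure; both the pairing and the choice of lift are assumptions. A lift above 1 marks an activity that is more frequent than usual in that emotional state.

```python
from collections import Counter

def activity_lift(events):
    """Lift of each (emotion, activity) pair: P(activity | emotion) / P(activity).

    events: list of (emotion, activity) pairs observed in the same time window.
    """
    total = len(events)
    pair_counts = Counter(events)
    emotion_counts = Counter(emotion for emotion, _ in events)
    activity_counts = Counter(activity for _, activity in events)
    lift = {}
    for (emotion, activity), n in pair_counts.items():
        p_activity_given_emotion = n / emotion_counts[emotion]
        p_activity = activity_counts[activity] / total
        lift[(emotion, activity)] = p_activity_given_emotion / p_activity
    return lift
```

Pairs with a high lift are candidates for activities that could later signal an emotional state even without a camera.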

Group Recommendation of Multimedia Content


Eduard Fritscher
master study, supervised by Michal Kompan

Abstract. Nowadays it is very important that web pages and applications not only store information but can also communicate with the user in certain ways. With the growth of the World Wide Web, the amount of information stored online has increased. Recommendation techniques and methods were invented to solve the problem of this information explosion, but as the world changes, access to the Internet has changed too. People collaborate with each other more often. At a time when the most visited pages in the world are social networks, recommendation techniques have to adapt to these new trends, mainly collaboration between users. The answer to this need is group recommendation.

Therefore, we propose a method that extracts information about users through social networks for generating recommendations. We also propose strategies for aggregating the data collected from the users. These aggregation strategies will determine the common interests and taste of the group. The last step is to create a recommendation technique for the domain in which the proposed methods will be tested, because before we can recommend anything to groups, we need to be able to recommend to individuals. The experiment will take place in the domain of multimedia content, more precisely movie recommendation.
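Aggregation strategies turn individual predictions into one group ranking. The sketch below shows three strategies commonly used in the group recommendation literature (average, least misery, most pleasure); which strategies the proposed method will use is not stated, so these are illustrative, and the input is assumed to be per-member predicted ratings from an individual recommender.

```python
from statistics import mean

# Three widely used group aggregation strategies (illustrative choice).
STRATEGIES = {
    "average": mean,       # maximize overall satisfaction
    "least_misery": min,   # nobody should strongly dislike the choice
    "most_pleasure": max,  # follow the most enthusiastic member
}

def group_ranking(predicted, strategy="average"):
    """predicted: {user: {item: predicted rating}} for every member and candidate item."""
    aggregate = STRATEGIES[strategy]
    # Only items with a prediction for every member can be compared fairly.
    items = set.intersection(*(set(ratings) for ratings in predicted.values()))
    scores = {item: aggregate([r[item] for r in predicted.values()]) for item in items}
    # Sort by group score, breaking ties alphabetically for a stable result.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

The strategies can disagree: least misery avoids a movie one member would rate very low, even when the average favors it.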

User Modeling for Facilitating Learning on the Web


Martin Gregor
master study, supervised by Marián Šimko

Abstract. Personalized text enrichment, with its potential to improve access to information, is an easy way to obtain new knowledge. Text enrichment is a process covering analysis of the text, enrichment of the text based on the user’s knowledge, and finally obtaining feedback about the appropriateness of the enrichment. Everyday web browsing is a normal routine for each of us, and through the experiences users gain, it can be used as an effective way to support learning, especially e-learning.

Our aim is to gain feedback from every possible web browsing action, transform this feedback into knowledge about the user, store this knowledge in a user model, and finally enrich the text of the web page the user is viewing according to the user’s goals calculated from the user model. We take user forgetting into account, supported by a scrutable user model, and we try to increase the user’s motivation to learn and to work with the features of our method. We will evaluate our approach in the e-learning domain. The analysis of user modeling in adaptive e-learning web systems, behavior tracking on the web, and facilitating learning on the web poses many challenges.
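The path from browsing feedback to a user model and then to enrichment can be sketched as an overlay model with one knowledge estimate per concept. The action names, their evidence weights, the update rule, and the enrichment threshold below are all hypothetical.

```python
# Hypothetical evidence weights for browsing actions; the real mapping is not given.
ACTION_WEIGHTS = {"read": 0.1, "highlight": 0.2, "answer_correct": 0.4, "answer_wrong": -0.3}

def update_knowledge(model, concept, action):
    """Move the estimated knowledge of a concept towards 1 or 0 based on one action."""
    level = model.get(concept, 0.0)
    w = ACTION_WEIGHTS[action]
    if w >= 0:
        level += w * (1.0 - level)  # positive evidence closes part of the gap to 1
    else:
        level += w * level          # negative evidence removes part of what is known
    model[concept] = level
    return level

def concepts_to_enrich(model, page_concepts, threshold=0.5):
    """Concepts on the page that the user probably does not know yet get enriched."""
    return [c for c in page_concepts if model.get(c, 0.0) < threshold]
```

The update keeps each estimate within [0, 1], and because the model is a plain concept-to-level mapping, it is easy to show to the user, which supports scrutability.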

Adaptive Feedback in Web Systems


Marek Grznár
bachelor study, supervised by Martin Labaj

Abstract. Nowadays, users are lost in the great amount of information available on the Internet. Therefore, it is important to recommend useful information to them. To make good recommendations, a recommender system must know the users’ interests: what they like, dislike, and prefer. User feedback is one of the most important inputs for recommender systems.

The quantity of explicit user feedback is quite low. One of the problems of collecting explicit feedback is that users do not notice the element for providing it. Often a user does not answer an explicit feedback question because it would interrupt her work. On the other hand, the quality of explicit feedback is higher than that of implicit feedback.

We focus on difficulty ratings in an adaptive learning system for the purpose of learning object recommendation. In our research, we try to increase the quantity of explicit feedback by combining explicit and implicit feedback in adaptive explicit feedback elicitation. We monitor the user’s behavior through implicit feedback and ask her to rate the difficulty of a learning object at the right moments. For example, a window with rating buttons for the difficulty of the learning object can appear just after the user answers a question. The window also appears when the user asks for a hint.
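The "right moment" decision can be sketched as a small rule. The two trigger events come from the text (answering a question, asking for a hint). The rule of not asking twice for the same learning object and the cooldown between prompts are assumptions added to avoid nagging the user.

```python
TRIGGER_EVENTS = {"answer_submitted", "hint_requested"}

def should_ask(event, learning_object, already_rated, last_prompt_time, now, cooldown=120):
    """Decide whether to show the difficulty-rating window (cooldown in seconds is assumed)."""
    if event not in TRIGGER_EVENTS:
        return False  # only ask right after answering or requesting a hint
    if learning_object in already_rated:
        return False  # never ask twice about the same learning object
    if last_prompt_time is not None and now - last_prompt_time < cooldown:
        return False  # avoid prompts in quick succession
    return True
```

Tying prompts to moments when the user has just finished interacting with the learning object makes a rating more likely than asking at an arbitrary time.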

Extracting Keywords from Educational Content


Jozef Harinek
bachelor study, supervised by Marián Šimko

Abstract. Effective access to documents is an important task for document management systems. It is even more important for educational materials, where one wants to find relevant information with a reasonable level of precision. For this purpose, we can use a form of lightweight semantics: relevant domain terms. However, these terms must be extracted so that they describe the document as well as possible and at the same time distinguish it from other documents in the set, which is often a difficult task.

In the case of educational content, we can improve the results of relevant domain term acquisition by processing user-created annotations assigned to the documents (tags, highlights, comments). The annotations provide potentially useful information about the particular document and improve the results of basic Automatic Term Recognition (ATR) algorithms.

So far, our experiment has shown that the annotations are promising: preliminary results show an improvement of about 22 % in relevant domain term extraction. The next step is to experiment with the configuration of our method’s parameters to obtain the best results.
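As an illustration of an annotation-enhanced ATR baseline, the sketch below uses TF-IDF as the base algorithm, which favors terms frequent in the document but rare elsewhere, and multiplies each score by a factor that grows with how often the term appears in user annotations. The choice of TF-IDF and the linear boost formula are assumptions, not the configuration used in the experiment.

```python
import math
from collections import Counter

def term_scores(doc_tokens, corpus, annotation_tokens, boost=0.5):
    """Rank candidate terms of one document by TF-IDF, boosted by user annotations.

    corpus: list of token lists that includes doc_tokens; annotation_tokens: tokens from
    tags, highlights and comments on this document. The boost formula is an assumption.
    """
    doc_sets = [set(doc) for doc in corpus]
    tf = Counter(doc_tokens)
    annotated = Counter(annotation_tokens)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for doc in doc_sets if term in doc)
        idf = math.log(len(corpus) / df)  # terms present in every document score 0
        scores[term] = (count / len(doc_tokens)) * idf * (1 + boost * annotated[term])
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

The boost parameter is exactly the kind of setting the next step of parameter tuning would explore.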

Building a Domain Model using Linked Data Principles


Michal Holub
doctoral study, supervised by Mária Bieliková

Abstract. The Linked Data principles are used in many datasets published on the Web. The aim of our work is to use Linked Data to create models describing 1) the domain of software development, and 2) the domain of research in the field of software and web engineering. We use these models to represent the knowledge of IT professionals (analysts, programmers, testers), as well as the research interests of researchers in the respective fields.

We propose a method for the automatic construction of a concept map serving as the basis of our domain models. For this purpose we use unstructured data from the Web, which we transform into concepts and links between them. Using a concept map, we describe the knowledge of software developers as a set of technologies and principles they are familiar with. We also use a similar concept map to describe research areas, problems, principles, methods, and models studied by researchers at our faculty.

The domain models we create can be used as a basis in two adaptive systems. The first aims at capturing IT professionals’ knowledge and skills, deducing further technologies they might know, and enabling users to search for a suitable candidate for a certain task or project. The second allows users to bookmark, annotate, and collaborate over research papers in digital libraries, as well as other Web documents. Here, we also use the model to answer queries in pseudo-natural language.
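Deducing further technologies a professional might know can be illustrated as a one-step inference over a weighted concept map. The map format (technology to related technologies with link weights in [0, 1]), the threshold, and the use of the strongest incoming link are assumptions. The technology names are only examples.

```python
def infer_technologies(known, concept_map, min_weight=0.5):
    """Suggest technologies a developer might also know.

    known: set of technologies the developer is known to use.
    concept_map: {technology: {related technology: link weight in [0, 1]}}.
    A related technology is inferred when its strongest link from a known one reaches min_weight.
    """
    inferred = {}
    for tech in known:
        for related, weight in concept_map.get(tech, {}).items():
            if related not in known and weight >= min_weight:
                inferred[related] = max(weight, inferred.get(related, 0.0))
    # Most strongly supported inferences first.
    return dict(sorted(inferred.items(), key=lambda kv: -kv[1]))
```

A candidate search could then match a task's required technologies against both known and inferred ones, perhaps giving the inferred ones lower confidence.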

We evaluate the models and the methods of their creation directly, by comparing them to existing models or by having domain experts evaluate facts from them. Moreover, we evaluate the models indirectly by incorporating them into adaptive personalized web-based systems and measuring the improvement in user experience (e.g., users get better recommendations and search results).

Augmenting the Web for Facilitating Learning


Róbert Horváth
master study, supervised by Marián Šimko

Abstract. Every day, users spend a large amount of time browsing the Web to fulfil various needs, but they find it difficult to spare time for education. We think that the documents they go through and the time they spend could be used more effectively. Information technologies and text augmentation methods can provide the user with additional information during web browsing, which is helpful in learning processes such as learning new languages. However, web texts are written in natural language, which computers do not understand, and therefore web augmentation is a complicated task. Finding methods that allow augmentation of selected parts of web documents is a research challenge in the field of technology-enhanced learning.

In our work we propose a method for web augmentation during casual web browsing that supports foreign language learning. Our method replaces words on a web page with their foreign equivalents, so the user must pay attention to the foreign words in order to understand the meaning of the article. The potential of this approach is supported by experts’ agreement that vocabulary acquisition occurs incidentally and that even minimal mental processing of presented vocabulary can have memory effects. Our method is based on a user model and takes into account specifics of learning such as forgetting. To evaluate the proposed method, we have created a web browser extension, and we plan to conduct an experiment with a selected group of users.
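Word substitution guided by forgetting could look roughly like the sketch below. It assumes an Ebbinghaus-style exponential retention estimate R = exp(-t / S), repeats a known word only when its estimated retention drops below a threshold, and introduces at most a few new words per page. The threshold, initial strength, and strength doubling on repetition are hypothetical parameters.

```python
import math
import re

def retention(hours_since_seen, strength):
    """Exponential forgetting curve R = exp(-t / S), an assumed Ebbinghaus-style model."""
    return math.exp(-hours_since_seen / strength)

def augment(text, dictionary, user_model, now, threshold=0.6, max_new=1):
    """Replace selected native words with foreign equivalents and update the user model.

    dictionary: {native word: foreign word}; user_model: {word: (last_seen_hour, strength)}.
    Known words are repeated only when their estimated retention drops below threshold;
    at most max_new unseen words are introduced per page.
    """
    new_left = max_new

    def substitute(match):
        nonlocal new_left
        word = match.group(0)
        key = word.lower()
        if key not in dictionary:
            return word
        if key in user_model:
            last_seen, strength = user_model[key]
            if retention(now - last_seen, strength) >= threshold:
                return word  # probably still remembered, so leave it in the native language
            user_model[key] = (now, strength * 2)  # repetition slows forgetting (assumption)
        elif new_left > 0:
            new_left -= 1
            user_model[key] = (now, 24.0)  # initial memory strength in hours (assumption)
        else:
            return word
        return dictionary[key]

    return re.sub(r"[A-Za-z]+", substitute, text)
```

A word seen long ago is shown in the foreign language again, while a recently reinforced word stays in the native language. This keeps the page readable and spends the user's attention where forgetting is most likely.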

Group Recommendation for Smart TV


Ondrej Kaššák
master study, supervised by Michal Kompan

Abstract. The domain of multimedia content is an extensive area with a large number of items. This set is expanding dynamically, and it is impossible for a single person to know about all the potentially interesting content that exists, let alone watch all of it.

For this reason, it is necessary to help people filter multimedia content or to directly recommend suitable content that interests them. A specific feature of multimedia consumption, especially watching television, is that it quite often takes place in a group. This creates new challenges and changes the requirements for recommender systems.

For a group, it is necessary to select a single piece of content that attracts all members simultaneously. The aim is to achieve the highest possible satisfaction for all group members. For that, we need to know the interests and preferences of individuals. We create user models by monitoring the activities of individual users, such as their ratings of viewed content.

Besides modeling user profiles, we have to be able to identify the actual group members, since the composition of groups watching TV varies considerably from session to session. When recommending to a group, we must also be able to aggregate individual recommendations into a common recommendation for the whole group.
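Aggregating individual recommendation lists for whoever is currently watching can be illustrated with a Borda count over each present member's ranked list. Borda counting is one standard rank-aggregation choice, used here only as an example; the member detection itself is assumed to have already happened.

```python
def borda_aggregate(individual_rankings, present_members):
    """Merge per-member ranked lists into one group list with a Borda count.

    individual_rankings: {member: [item ranked 1st, item ranked 2nd, ...]}.
    Only members detected in the current session (present_members) take part.
    """
    points = {}
    for member in present_members:
        ranking = individual_rankings.get(member, [])
        n = len(ranking)
        for position, item in enumerate(ranking):
            points[item] = points.get(item, 0) + (n - position)  # top item earns n points
    # Highest total first; ties broken alphabetically for a deterministic result.
    return sorted(points, key=lambda item: (-points[item], item))
```

Since the rankings are aggregated per session, the same household gets different group recommendations depending on who is actually in front of the TV.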

User’s Satisfaction Modelling in Personalized Recommendations


Michal Kompan
doctoral study, supervised by Mária Bieliková

Abstract. Today’s approaches to personalized recommendation focus mainly on the user’s activity across various web portals. The user’s preferences depend not only on long-term history and preferences; the user’s current situation also plays a crucial role in forming them. Thus, an item the user likes in one context can be disliked in another.

Because of this variability, we explore a novel approach to modelling user satisfaction that incorporates the user’s current context while considering the user’s previous rating history. Such an approach reflects a natural characteristic of user context: various context settings can influence one another and ultimately change the user’s attitude to a specific item involved in the recommendation process.

Incorporating such preference modelling allows us to maximize user satisfaction during single-session recommendation by improving item rating predictions. Moreover, when a sequence of items is recommended to the user, the designed approach allows us to maximize a specific goal of the recommendation by choosing an appropriate order of the items presented.
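Choosing an order of items to maximize a goal can be illustrated with a toy carry-over model. In this model, the satisfaction with the previous item shifts how the next item is experienced. The linear carry-over formula, its parameters, and the exhaustive search over orders are assumptions made for illustration; they are not the satisfaction model proposed in the work.

```python
from itertools import permutations

def sequence_satisfaction(sequence, predicted, carry=0.3, neutral=3.0):
    """Total satisfaction of a sequence under an assumed carry-over effect.

    predicted: {item: context-free predicted rating}. Each item's experienced rating is
    shifted by carry * (satisfaction with the previous item - neutral).
    """
    total, previous = 0.0, neutral
    for item in sequence:
        experienced = predicted[item] + carry * (previous - neutral)
        total += experienced
        previous = experienced
    return total

def best_sequence(items, predicted, goal=sequence_satisfaction):
    """Exhaustively pick the order that maximizes the goal (fine for a handful of items)."""
    return max(permutations(items), key=lambda seq: goal(seq, predicted))
```

With predictions A = 5, B = 1, C = 3, ending on the weakest item and opening with the strongest scores best under this toy model. For longer lists, the exhaustive search would have to be replaced by a heuristic.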

Software Metrics Based on Developer’s Activity and Context of Software Development


Martin Konôpka
master study, supervised by Mária Bieliková

Abstract. Software development is an extensive process that needs to be monitored and evaluated. It is important to evaluate it not only from the perspective of developing the software product, but also from the management perspective, to assess whether the desired qualities and attributes of the process and the resulting product are achieved.

In our work we focus on evaluating software and its source code using information about the developer’s activity and the context of development. A developer performs various activities during the development process. Some of them are directly associated with development, e.g., programming or modeling components; others may be associated indirectly, e.g., searching for information, studying documentation, or communicating with team members. However, developers also perform activities not related to development, and they are influenced by their emotional state and the environment in which they work.

Traditionally, various metrics are used to evaluate the process and product of software development. Because product metrics evaluate software using only its source code, their interpretation is ambiguous and specific to each project. In our work we intend to take into account the developer’s activity during the development process and the context of software development in order to find connections with the resulting attributes of the created product. Existing source-code-based product metrics may be helpful for evaluating our approach.
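A first step toward finding such connections could be to correlate an activity-based feature with an existing product metric across components. The sketch computes a Pearson correlation; the record keys (for example, "context_switches" or "complexity") are hypothetical. Correlation, of course, only hints at a connection and does not establish one.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sd_x * sd_y)

def correlate(records, activity_key, metric_key):
    """Correlate one activity feature with one product metric across components."""
    return pearson([r[activity_key] for r in records], [r[metric_key] for r in records])
```

Each record would describe one component or file, combining features aggregated from the developer's activity log with metrics computed from its source code.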

Multiple Sources of Search Context, Their Influence and Applicability


Tomáš Kramár
doctoral study, supervised by prof. Mária Bieliková

Abstract. Web search began as a relatively simple process: a person types in a query in the form of keywords, the underlying database of documents is searched for matches using a given model (e.g., the vector space model), and the relevant documents are returned. An important concern not addressed by this process is the actual underlying goal the person is trying to fulfil by issuing the query. With semantic search not yet widely used in production, a whole generation of people has been trained to express their information needs in the form of keywords.

It has been recognized that Web search needs some form of implicitly acquired information that would help to understand the underlying intent. This information is collectively referred to as search context. Traditionally, in Web search, the context describes any information that can be used to infer the specific goal the searcher wants to fulfil by issuing a query. The concept of context per se is decoupled from its representation; one example of such a representation is a vector of weighted terms indicating the topics the user is interested in. The only important aspect of a search context is that it reveals information about the underlying search goal that can be used in the ranking phase of the search process to rank documents matching the goal higher.
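Using a weighted-term context in the ranking phase can be sketched as linear blending. The search engine's base score is mixed with the cosine similarity between the context vector and a document's term vector. The whitespace tokenization, the raw term counts, and the blending weight are simplifying assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rerank(results, context, weight=0.5):
    """results: list of (doc_id, base_score, text); context: {term: weight}."""
    rescored = []
    for doc_id, base, text in results:
        doc_vector = Counter(text.lower().split())
        rescored.append((doc_id, (1 - weight) * base + weight * cosine(context, doc_vector)))
    return [doc_id for doc_id, _ in sorted(rescored, key=lambda pair: -pair[1])]
```

For the ambiguous query "python", a context built from programming-related activity lifts the language tutorial above the page about snakes, while an empty context leaves the original order unchanged.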

It has been shown that a single source of context can dramatically improve search relevance, but whether multiple sources of context can improve relevance further has yet to be shown. Many questions need to be answered before multiple sources of context can be reliably used in Web search. In this work we analyze three sources of context: a long-running context of seasonality, a short-term activity-based context, and a social context. We monitor users’ clicking activity, extract many features from each search, and analyze which features influence which context. Our goal is to determine whether the applicability of a particular source depends on some of the external search features.

User Modeling Using Social and Game Principles


Peter Krátky
master study, supervised by Jozef Tvarožek

Abstract. Personality has an impact on users’ behaviour in information systems, and adaptive systems that model personality traits can provide a better user experience. In our project, we are interested in a generic type of user modelling based on personality traits. Our goal is to identify the user’s personality within information systems in general, and computer games in particular. Classic methods of user modelling based on questionnaires hinder a smooth user experience, especially in games, which should provide entertainment. We explore to what extent personality-based user modelling can be conducted unobtrusively in computer games. Games differ, and various game mechanics can work differently with players of different personality profiles.

To study the effects of players’ personality on games in general, we designed a feature-rich casual browser game in which different game mechanics can be turned on or off according to the experiment design. The game tracks both user interface actions and game actions, providing a complete footprint of the user’s personality in terms of the manifested gameplay. Correlating the activity logs with different personality measures (in our case, the Big Five and the Index of Learning Styles) reveals relationships between players’ personality and gameplay, and provides insights into which game mechanics work effectively with which personality profiles. To evaluate our solution, we have integrated the game into the educational system Peoplia at our faculty, which provides a potential base of experimental players.
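Switching game mechanics on and off according to an experiment design could be handled with deterministic condition assignment. The sketch assumes a full factorial design over three hypothetical mechanics and assigns each player by hashing the player identifier, so a returning player always sees the same configuration. Neither the mechanic names nor the design are taken from the project.

```python
import hashlib
from itertools import product

MECHANICS = ("leaderboard", "achievements", "time_pressure")  # hypothetical mechanic names

# Full factorial design: every on/off combination of the mechanics is one condition.
CONDITIONS = [dict(zip(MECHANICS, flags)) for flags in product((False, True), repeat=len(MECHANICS))]

def condition_for(player_id):
    """Deterministically assign a player to a condition so reloads keep the same setup."""
    digest = hashlib.sha256(player_id.encode("utf-8")).hexdigest()
    return CONDITIONS[int(digest, 16) % len(CONDITIONS)]
```

Logging the assigned condition alongside the UI and game actions later allows the personality-gameplay correlations to be compared across mechanic combinations.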

Context-based Improvement of Search Results in Programming Domain


Jakub Kříž
master study, supervised by Tomáš Kramár

Abstract. When programming, a programmer does many other things besides writing the actual source code. He opens various applications and other source files, searches them, writes notes and, last but not least, searches the Internet for solutions to problems or errors he has encountered. He performs all these tasks within a context, in order to achieve something, usually to solve a problem. By identifying this context, we can understand the meaning behind the programmer’s actions. This understanding can be used to make Internet search results more accurate and relevant to the current task.

In this work we analyze existing methods used to mine context in other domains and assess their usability in the programming domain. An important source of contextual information is the source code the programmer is currently working on, so we analyze methods used to extract metadata from source code files, software projects, and other documents in general. Based on this analysis, we design methods to mine context in the programming domain in order to create a context model, which we then use to improve search results. The designed methods are evaluated experimentally on a large dataset of logs of programmers’ activities.
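A very simple way to mine context from the code being edited is to split its identifiers into terms and add the most frequent ones to a web query. The tiny keyword stop list, the minimum term length, and the "append top-k terms" expansion strategy are assumptions for illustration.

```python
import re
from collections import Counter

KEYWORDS = {"def", "return", "class", "import", "for", "in", "if", "else", "self",
            "public", "static", "void", "int", "new", "private"}  # tiny illustrative stop list

def identifier_terms(source):
    """Split identifiers such as parseHttpHeader or read_config_file into lowercase terms."""
    terms = []
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        if ident in KEYWORDS:
            continue
        # Split camelCase and acronyms (XMLParser -> XML, Parser); underscores are skipped.
        for part in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", ident):
            if len(part) > 2:
                terms.append(part.lower())
    return terms

def expand_query(query, source, k=2):
    """Append the k most frequent code terms not already in the query (an assumed strategy)."""
    query_terms = set(query.lower().split())
    counts = Counter(t for t in identifier_terms(source) if t not in query_terms)
    return query + " " + " ".join(t for t, _ in counts.most_common(k))
```

Searching for an error message while editing HTTP parsing code would, for instance, get "header" appended. This nudges the search engine toward results about the programmer's current task.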

Activity-Based Programmer’s Knowledge Model for Personalized Search in Source Code


Eduard Kuric
doctoral study, supervised by Mária Bieliková

Abstract. Every day, a programmer needs to answer several questions in order to find solutions and make decisions. Answering them requires integrating different kinds of project (software system) information, and it depends on the programmer’s knowledge, experience, skills and inference.

The main advantage of search-driven development should be that programmers save time and resources by reusing (external) source code (components) in their software projects. To support search-driven development, it is not sufficient to implement a “mere” full-text search over a base of source code. When a programmer reuses source code, he has to trust the work of external programmers who are unknown to him.

This problem can be addressed with a trustability metric that helps programmers assess the quality of source code search results. Existing approaches are often based on collaborative filtering of programmers’ votes and on the project activity of programmers. It would be helpful for programmers to see not only activity statistics about software projects (components) but also a karma value for each author (programmer). If a target programmer can easily see in the search results that an experienced programmer with a good reputation has participated in writing the source code (component), the target programmer is more likely to consider reusing it. Thus, an author’s reputation can provide information to support programmers’ decisions.

Reputation can be a plausible criterion for ranking source code search results: if we determine programmers’ karma values, we can prefer software components based on the reputation of their authors. To model a programmer’s reputation (to calculate the programmer’s karma value), we need to investigate the software components he created. We propose a programmer’s knowledge model and methods for acquiring it automatically. Currently, we focus on two factors, namely, the calculation of a programmer’s know-how about the technologies used and the calculation of a programmer’s karma based on the importance of components.
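A minimal sketch of reputation-based re-ranking, assuming each result already carries a text relevance score and a list of authors, and that karma values lie in [0, 1]. The linear blend, the weight `alpha` and all names (`SearchResult`, `rerank_by_karma`) are hypothetical illustrations, not the proposed knowledge model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SearchResult:
    component: str
    relevance: float    # text-match score in [0, 1] from the underlying search engine
    authors: List[str]  # programmers who contributed to the component


def rerank_by_karma(results: List[SearchResult], karma: Dict[str, float],
                    alpha: float = 0.7) -> List[SearchResult]:
    """Re-rank results by blending relevance with the authors' mean karma.

    Authors missing from `karma` count as 0; `alpha` (hypothetical) weights
    text relevance against reputation.
    """
    def score(r: SearchResult) -> float:
        if r.authors:
            reputation = sum(karma.get(a, 0.0) for a in r.authors) / len(r.authors)
        else:
            reputation = 0.0
        return alpha * r.relevance + (1 - alpha) * reputation

    return sorted(results, key=score, reverse=True)


results = [
    SearchResult("parser-lite", 0.80, ["novice"]),
    SearchResult("robust-parser", 0.75, ["expert", "novice"]),
]
karma = {"expert": 0.9, "novice": 0.1}
print([r.component for r in rerank_by_karma(results, karma)])  # ['robust-parser', 'parser-lite']
```

With these numbers the component co-authored by the experienced programmer moves ahead despite its slightly lower text relevance; setting `alpha=1.0` falls back to plain relevance ordering.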

User Feedback in User/Domain Modelling and Adaptive Evaluation


Martin Labaj
doctoral study, supervised by Mária Bieliková

Abstract. User and domain models are essential components of adaptive web-based systems, as is the evaluation of such systems. In our research, we focus on user feedback used as a source for user and domain modelling, specifically on tabbed browsing (also called parallel browsing). We also work on the adaptive evaluation of adaptive web-based systems.

Tabbing is currently established as a more accurate representation of user browsing activities than the previous linear model. We model user tabbing behaviour from events sourced from a browser agent (extension) or from scripts included in a page, recognizing sequences of events (e.g., page load of a page P with referrer R, not preceded by page unload of R, and followed by blur of R and focus of P) as user actions (e.g., the user has opened the link in a new tab, then switched to it). From these actions, we discover tabbing scenarios (e.g., keeping a tab open as a reminder). The tabbing scenarios in which a tab participates are tracked for each tab during its lifetime, effectively putting open tabs into groups with various current or future levels of user interest and various user tasks and goals. These data serve as a basis for a stereotype-based user model of tab scenario usage and an overlay user model of interests, as well as a source of relations for domain model augmentation.
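The event-sequence rule above (page load of P with referrer R, no unload of R, then blur of R and focus of P) can be sketched as a simple pattern matcher. This assumes a flat, time-ordered list of browser events and, for simplicity, requires blur and focus to be the two events immediately after the load; real logs interleave more events, so the `Event` type and `detect_new_tab_switches` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    kind: str                       # "load", "unload", "blur" or "focus"
    page: str
    referrer: Optional[str] = None  # set only for "load" events


def detect_new_tab_switches(events: List[Event]) -> List[Tuple[str, str]]:
    """Find (R, P) pairs where link P was opened from R in a new tab and then switched to.

    Pattern: load(P, referrer=R), not directly preceded by unload(R),
    immediately followed by blur(R) and then focus(P).
    """
    matches = []
    for i, ev in enumerate(events):
        if ev.kind != "load" or ev.referrer is None:
            continue
        r, p = ev.referrer, ev.page
        prev = events[i - 1] if i > 0 else None
        if prev is not None and prev.kind == "unload" and prev.page == r:
            continue  # R was unloaded: the link replaced R in the same tab
        nxt = events[i + 1:i + 3]
        if (len(nxt) == 2
                and nxt[0].kind == "blur" and nxt[0].page == r
                and nxt[1].kind == "focus" and nxt[1].page == p):
            matches.append((r, p))
    return matches


new_tab = [Event("focus", "R"), Event("load", "P", "R"), Event("blur", "R"), Event("focus", "P")]
same_tab = [Event("unload", "R"), Event("load", "P", "R"), Event("blur", "R"), Event("focus", "P")]
print(detect_new_tab_switches(new_tab))   # [('R', 'P')]
print(detect_new_tab_switches(same_tab))  # []
```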

Another area of our research is the user-centered evaluation of adaptive web-based systems. We ask evaluation questions (EQs) during the user’s typical work in the system. The questions are adapted to the user and their actions and are asked at appropriate moments by the evaluation engine. In this way, evaluation feedback is collected even from users who would not otherwise actively seek to provide it (e.g., in a post-session questionnaire), and moreover, the data are more accurate because users are asked, and answer, right when they are working with the relevant parts of the system.

Personalised Recommendation of Learning Sources


Jozef Lačný
master study, supervised by Michal Kompan

Abstract. Nowadays, a large amount of information is offered to users via various information systems and e-shops. Therefore, selecting relevant information is very important for the user. Much work has been done on recommender systems that provide relevant information to the user: systems based on collaborative filtering, content analysis and many other approaches.

An interesting application domain for personalized recommendation is learning systems, where one of the challenges is to recommend appropriate learning sources to achieve the best study results and enhance learning efficiency.

We propose a content-based method for recommending learning resources to groups in e-learning environments. We based our method on existing research in this area and on methods developed at the Faculty of Informatics and Information Technology. We extend these approaches by using the users’ learning styles to enhance the suitability of recommended resources, adapting both the group creation process and the recommendation itself to the users’ preferences based on their knowledge and learning style. To evaluate our method, we plan to implement it in the existing e-learning system ALEF. There we will measure the benefits of recommending learning resources in collaborative learning and test our method with real users in an ongoing university course.

Acquisition of Learning Object Metadata Using Crowdsourcing


Marek Láni
master study, supervised by Jakub Šimko

Abstract. In recent years, the Web has come to be used extensively for educational purposes. There are many Technology Enhanced Learning (TEL) and question answering portals and websites that people use to gain knowledge and information. These systems are often designed not only to provide benefits for users but also to benefit from their users, which makes it a win-win relationship: the content of these systems is often crowdsourced, i.e., generated by the users themselves. But since it is not guaranteed that the people who create this content are experts in the given area, it is necessary to ensure the quality of their contributions somehow. Because manual validation is time-consuming, it should be automated. The question is: how?

There are several approaches to this, and the aim of our work is to take, combine and modify some of them to achieve satisfactory results in filtering answers to questions within a TEL system. We aim to analyse users’ profiles in order to determine their expertise level, which will help us estimate the probability that a user’s answer is correct. We will also take into account the votes that users give to each answer. With these measures, we will be able to sort the answers by their estimated correctness. We want to experiment with these measures and with different weighting parameters to find the combination that leads to the best results and to determine whether the crowd is capable of self-evaluation.
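One way to combine the two signals is a weighted sum, sketched below under explicit assumptions: a per-user expertise score in [0, 1] is taken as given (deriving it from profiles is the open part of the work), votes are turned into a smoothed share of positive votes (Laplace smoothing), and `w_expertise` stands for the kind of weighting parameter the experiments would tune. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Answer:
    author: str
    upvotes: int
    downvotes: int


def vote_score(a: Answer) -> float:
    """Share of positive votes with Laplace smoothing (no votes -> 0.5)."""
    return (a.upvotes + 1) / (a.upvotes + a.downvotes + 2)


def rank_answers(answers: List[Answer], expertise: Dict[str, float],
                 w_expertise: float = 0.5) -> List[Answer]:
    """Sort answers by an estimated correctness score (hypothetical linear model).

    Authors missing from `expertise` are treated as average (0.5).
    """
    def correctness(a: Answer) -> float:
        return (w_expertise * expertise.get(a.author, 0.5)
                + (1 - w_expertise) * vote_score(a))

    return sorted(answers, key=correctness, reverse=True)


answers = [Answer("alice", 2, 0), Answer("bob", 5, 1), Answer("carol", 0, 3)]
expertise = {"alice": 0.9, "bob": 0.4, "carol": 0.6}
print([a.author for a in rank_answers(answers, expertise)])  # ['alice', 'bob', 'carol']
```

Moving `w_expertise` toward 1.0 trusts profiles over the crowd (here carol then overtakes bob); comparing such orderings with teacher ratings is one way to test whether the crowd can evaluate itself.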

We plan to evaluate our work by comparing our results with the ratings a teacher assigned to each answer.

Researcher Modeling in Personalized Digital Library


Martin Lipták
master study, supervised by Mária Bieliková

Abstract. Researchers use digital libraries either to find solutions to particular problems concerning their current research or just to keep track of the newest trends in their areas of interest. However, the amount of information in digital libraries grows exponentially. This has two serious consequences. Firstly, many interesting works go unnoticed. Secondly, researchers spend too much time reading articles that turn out to be low-quality, unrelated to their current research or unrelated to their other interests. Nowadays, these kinds of problems are solved with recommendation systems or, more effectively, with personalized recommendation systems. The core of every personalized system is its user model.

Our aim is to design and implement a user model based on data from Annota. Our model will leverage the articles the user has read, the tags and folders she has used, the terms she has searched for, etc. Furthermore, user data from the Mendeley library organization service will be integrated. We will probably evaluate the model with a personalized article recommendation service for Annota; at this point, we are also considering other options, such as personalized search results or a personalized article recommendation service for Mendeley. Based on the available user data and evaluation options, we will seek a suitable representation and creation process for the researcher (user) model in the domain of digital libraries.

Preprocessing Linked Data in order to Answer Natural Language Queries


Peter Macko
master study, supervised by Michal Holub

Abstract. Searching for information on the Web is difficult because of its enormous growth. To make matters worse, most of the data published on the Web are in an unstructured format. However, more and more structured data are being published, which is also evident from the emergence of unifying initiatives like Linked Data. Nowadays, only a few search engines are able to search these data utilizing the full power of the provided semantics. The majority of search engines search for information using keywords. To utilize the full power of structured data, a special query language, like SPARQL, has to be used.

We aim to change this by creating a complex search engine that understands pseudo-natural human language. These queries will be transformed into SPARQL and executed on an ontological database. The main point of our method is preprocessing. In the preprocessing stage, we use WordNet to populate our vocabulary with synonyms of dataset entities and relations. The user can then search using his or her own vocabulary, and thanks to preprocessing, we can convert the user’s vocabulary to the dataset vocabulary. To understand user sentences, we use the Stanford Parser. Once the sentence skeleton is mapped by WordNet synonyms into the dataset vocabulary, we can execute the query in SPARQL form on the dataset.
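The vocabulary-mapping step can be sketched as below. A hand-written dictionary stands in for the WordNet-populated vocabulary, the parsed sentence skeleton is assumed to be already reduced to a subject and a relation word, and the `ex:` namespace is a placeholder dataset; none of these choices comes from the described method.

```python
# Hypothetical vocabulary: user words -> dataset predicates. In the described
# method, this table would be populated from WordNet synonyms of dataset relations.
VOCABULARY = {
    "author": "ex:author", "writer": "ex:author", "wrote": "ex:author",
    "birthplace": "ex:birthPlace", "born": "ex:birthPlace",
}

# Placeholder namespace for an example dataset.
PREFIX = "PREFIX ex: <http://example.org/vocab#>\n"


def build_query(subject: str, relation_word: str, vocabulary: dict) -> str:
    """Translate a (subject, relation word) skeleton into a SPARQL SELECT query."""
    predicate = vocabulary.get(relation_word.lower())
    if predicate is None:
        raise ValueError(f"no dataset term for {relation_word!r}")
    # Doubled braces produce literal { } in the f-string.
    return PREFIX + f"SELECT ?x WHERE {{ {subject} {predicate} ?x . }}"


print(build_query("ex:Hamlet", "wrote", VOCABULARY))
```

For a question such as "Who wrote Hamlet?", the user’s word "wrote" is replaced by the dataset predicate `ex:author`; a word with no entry is rejected instead of producing an invalid query.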

Querying Large Web Repositories


Matej Marcoňák
bachelor study, supervised by Karol Rástočný

Abstract. Nowadays, the amount of information on the Internet has reached a level at which we are not able to process it on a single machine/server. It is necessary to look for other approaches to processing large data. One solution is the MapReduce programming model, which is based on parallel data processing on clusters of computers.

The growing quantity of these data is related to the expanding use of semantics on the Web. Semantics allows us to create web pages or documents that are better suited to machine processing. Such data are often represented as RDF triples of subjects, predicates and objects (e.g., John, is friend of, Mathew) and organised in ontologies. Ontologies and RDF data are typically queried with SPARQL and its extensions.

One possible form of data storage is a triple-based RDF repository, but its efficiency is questionable. Therefore, we store domain-specific data in the NoSQL database MongoDB, whose data structure better suits our use. However, NoSQL databases do not support querying with SPARQL, so we decided to propose a MapReduce algorithm for evaluating SPARQL queries, including its advanced features. To implement the advanced features, we are going to use different techniques at different levels of the MapReduce programming model.
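The core of evaluating a SPARQL basic graph pattern with MapReduce is a join on shared variables. The sketch below simulates a reduce-side join in plain Python: the mapper emits each triple-pattern match keyed by the join variable, a shuffle groups by key, and the reducer combines compatible bindings. It assumes in-memory triples rather than MongoDB documents and a single join variable; it is not the proposed algorithm and covers none of the advanced SPARQL features.

```python
from collections import defaultdict
from itertools import product


def match(pattern, triple):
    """Return variable bindings if `triple` matches `pattern`, else None.

    Terms starting with '?' are variables."""
    binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.setdefault(p, t) != t:
                return None  # same variable bound to two different values
        elif p != t:
            return None
    return binding


def map_phase(triples, patterns, join_var):
    """Mapper: emit (value of join_var, (pattern index, bindings)) per match."""
    for triple in triples:
        for idx, pattern in enumerate(patterns):
            b = match(pattern, triple)
            if b is not None and join_var in b:
                yield b[join_var], (idx, b)


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, n_patterns):
    """Reducer: per key, merge one compatible binding from every pattern."""
    results = []
    for values in groups.values():
        per_pattern = [[b for i, b in values if i == k] for k in range(n_patterns)]
        for combo in product(*per_pattern):
            merged, ok = {}, True
            for b in combo:
                for var, val in b.items():
                    if merged.setdefault(var, val) != val:
                        ok = False
            if ok:
                results.append(merged)
    return results


triples = [
    ("John", "knows", "Mathew"),
    ("Mathew", "livesIn", "Bratislava"),
    ("John", "livesIn", "Kosice"),
    ("Anna", "knows", "John"),
]
# ?person knows ?friend . ?friend livesIn ?city
patterns = [("?person", "knows", "?friend"), ("?friend", "livesIn", "?city")]
rows = reduce_phase(shuffle(map_phase(triples, patterns, "?friend")), len(patterns))
print(rows)
```

Each reducer sees only the matches that share one value of `?friend`, which is what lets the join run in parallel across keys.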

Semantic Wiki for Research Groups


Martin Markech
bachelor study, supervised by Jakub Šimko

Abstract. With the proliferation of Web 3.0, or the Semantic Web, we are able to describe web resources with more semantics. Thanks to the Linked Data initiative, we can also link these data to each other across the Web. With linked data, it is easier to find related data and explore the web of data.

In our work, we focus on creating an easy-to-use method for adding semantics to our faculty’s wiki application without requiring the end user to write any RDF markup. We analyze and take into account the specific needs of PeWe group members, like organizing events or bibliography linking. We allow users to fill in templates that automate semantics creation: triples are created automatically with predefined values. This way, users can focus on the text rather than on how to create correct semantics. For each triple, we create a dereferenceable URI that allows users to browse our semantic store.

We use the Sesame RDF store to store RDF triples, and we are working on a web service that automates bibliography linking. As a result, we will obtain linked data that can be searched semantically through a semantic endpoint. These semantics can then be used in new projects created in our research group.

Discovering and Predicting Human Behaviour Patterns


Štefan Mitrík
master study, supervised by Mária Bieliková

Abstract. The fast development of advanced mobile technologies opens up new possibilities for the analysis of human behavior. We are able to track user movements and the physical activity performed during the day. We can analyze the measured data and discover users’ behavior patterns. Each pattern consists of visited locations and the routes among them. It also contains time and distance annotations that describe users’ behavior in more detail. A pattern location is enriched with additional semantic information. This transforms geographical points determined by latitude and longitude into more meaningful places with information about the place semantics, such as University or Restaurant.

Behavior patterns are naturally repetitive. We utilize this to predict users’ future actions. The ability to predict users’ actions is crucial in fields such as physical activity recommendation. The main goal of our method is to select and recommend transitions that are suitable for physical activity. User behavior is highly influenced by context; for example, a user might prefer walking when the weather is nice. We evaluate user ratings and preferences with respect to the weather and the day of the week. Our method successively learns from the user’s feedback about which transitions are convenient for physical activity and adjusts the recommendations according to recent actions.


Samuel Molnár
bachelor study, supervised by Mária Bieliková and Róbert Móro

Nowadays, the amount of information available on the web makes navigation with many common approaches and technologies difficult, which leaves users relying solely on the result lists provided by keyword-based search engines. Thus, over the past years, several novel approaches have been presented as alternative navigation support in search engines, such as tag clouds. Tag clouds mainly exploit different visual features of words, like font size, color or justification, to emphasize their relevance. By employing the visual features of tag clouds, we aid the user’s navigation with knowledge of how large the information space behind a specific word is. However, many approaches employing tag clouds do not exploit the user’s context or adapt the content of the tag cloud to provide personalized navigation.

We propose an approach to term cloud navigation that exploits navigation history as a source of metadata for personalized browsing of information. Apart from tags, we employ keywords extracted from documents as content for the term cloud. We consider the position of a word in a query a relevant factor and prefer words recently added to the query when sorting the resulting documents. To represent the user’s interests, we exploit history records from a particular period of time and consider the trending words in her history to be the closest expression of her interests in that period. We highlight the trending words in the cloud with a different color to emphasize their relevance according to the time of their last use in the user’s history.

We evaluate our approach in the domain of digital libraries. We implement our proposal as a module of Annota, a system for annotating web pages, which is being developed by several PeWe group members.

Exploratory Search Using Automatic Text Summaries


Róbert Móro
doctoral study, supervised by Mária Bieliková

Abstract. Nowadays, keyword search is a prevalent search paradigm on the Web. We use a set of keywords as a query describing our information need and get a simple list of results in return, where each result is usually represented by its title, URL and a short snippet. This approach works reasonably well for simple information retrieval tasks such as fact finding.

Selecting the relevant links and navigating among the documents can, however, be a difficult task if the information-seeking problem at hand is more complex and requires exploring multiple sources to find relevant information, such as when researching a new domain. These types of searches are exploratory in their essence. They start with ill-defined information needs and are often open-ended; moreover, they can span multiple search sessions and usually require employing different search strategies.

In order to support the users’ exploration of the domain, we provide them with navigation support using automatic text summaries. The summaries consist of sentences conveying the most important information of the document; they can reduce information overload by helping users decide whether the document is relevant to them and whether they should read the whole text. In our proposed approach, users can choose their own leads (keywords in the summaries) to filter the search results and follow potentially useful leads added by other searchers. We evaluate our approach in the domain of digital libraries in the scenario of a novice researcher, such as a young master’s or doctoral student, using the bookmarking system Annota.

Metadata Collection for Personal Multimedia Repositories Using GWAP


Balázs Nagy
master study, supervised by Jakub Šimko

Abstract. With the increasing number of personal albums and of the photos in them, owners have more and more problems organizing them. This is caused by a lack of descriptive data, the amount of which depends solely on the abilities and will of the owners. Tools for creating such data are available; the main problem is motivation, because tagging and annotating photos is usually boring and time-consuming. Other methods for obtaining metadata for general images also exist, with different advantages and disadvantages.

In addition to manual methods, we can retrieve metadata automatically. Automatic methods are effective but applicable only to certain types of data. They can recognize, for example, objects in the picture such as animals, plants, faces, emotions or weather. The problem is that for organizing and navigating personal albums, users need special types of descriptive data, like the people in the picture or the events, places and holidays related to the photo. This kind of data cannot be acquired automatically without the necessary annotations.

Another group of approaches are crowdsourcing methods, including games with a purpose (GWAP), which use the power of human computation to solve problems that are partially or completely unsolvable for computers. To produce results, these methods need a group of people working on the same task. A common shortcoming is poor-quality results that require additional processing. Classical crowdsourcing methods and GWAP are naturally useless in our case, because only a small number of people know the information required to describe personal photos.

We devised a game called PexAce, a GWAP for harvesting annotations of photos in general. Earlier experiments showed that people playing with their own photos are more interested in creating such annotations, and, as a side effect, these annotations are more precise and relevant for the owners of the photos. By combining our game with automatic extraction of metadata from text, we found an appropriate solution for creating metadata for personal photo albums. The main contribution of this work is a framework for processing annotations written for personal photos. This framework consists of modules called extractors, which work on the same tasks in different ways, making the framework extensible.

Recommendation based on Difficulty Ratings


Matej Noga
bachelor study, supervised by Martin Labaj

Abstract. Our main goal is to create a complete, user-friendly recommendation system in the learning system ALEF. We try to help students with their studies, mainly by recommending appropriate learning materials depending on the student’s current level of knowledge. Our goal is to recommend to the student a case whose solving would be a challenge and which the student would assess as moderately challenging. This kind of case has the greatest value for the student, because it is adequate to his knowledge and he can learn a lot from it. The case should not be too easy, otherwise the student might get bored and learn nothing from it; but neither should it be too hard, which could discourage the student from continuing to learn.

The method is based on the assumption that each object we want to recommend contains information about the concepts present in the object. The user model must also contain, for each concept, an estimate of how difficult it is for the user to deal with it. For each example that we might recommend to a given user, we find a set of similar users who have already rated the difficulty of that case and compute the average of their ratings. The example whose average value is closest to “moderately difficult” is recommended.
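A minimal sketch of the recommendation step, with several assumptions that the description leaves open: user similarity is cosine similarity over concept-knowledge vectors, difficulty ratings use a 1 to 5 scale with 3 meaning “moderately difficult”, and the k most similar users form the neighbourhood. `recommend` and the data layout are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse {concept: level} vectors."""
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(user, profiles, ratings, k=2, target=3.0):
    """Return the case whose mean difficulty, as rated by the k most similar
    users, is closest to `target`; None if no neighbour rated any case.

    profiles: user -> {concept: knowledge level}
    ratings:  case -> {user: difficulty rating on a 1..5 scale}
    """
    others = [u for u in profiles if u != user]
    neighbours = sorted(others, key=lambda u: cosine(profiles[user], profiles[u]),
                        reverse=True)[:k]
    best, best_gap = None, math.inf
    for case, by_user in ratings.items():
        votes = [by_user[u] for u in neighbours if u in by_user]
        if not votes:
            continue  # no similar user has rated this case yet
        gap = abs(sum(votes) / len(votes) - target)
        if gap < best_gap:
            best, best_gap = case, gap
    return best


profiles = {
    "me": {"loops": 0.8, "recursion": 0.2},
    "ann": {"loops": 0.9, "recursion": 0.3},
    "ben": {"loops": 0.7, "recursion": 0.1},
    "cid": {"loops": 0.1, "recursion": 0.9},
}
ratings = {
    "ex1": {"ann": 1, "ben": 2, "cid": 5},
    "ex2": {"ann": 3, "ben": 4, "cid": 1},
    "ex3": {"ann": 5, "ben": 5},
    "ex4": {"cid": 3},
}
print(recommend("me", profiles, ratings))
```

With `k=2` only ann and ben count, so `ex2` (mean rating 3.5) is recommended; widening the neighbourhood to `k=3` also admits cid, whose rating makes `ex4` an exact match. A real system would also skip cases the student has already solved.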

We are considering using the RECO platform, which deals with personalized recommendation, in the project. The method is currently being implemented in the learning system ALEF.

Extracting Word Collocations from Textual Corpora


Martin Plank
master study, supervised by Marián Šimko

Abstract. Natural language is the main means of communication between people. They use it to ask and answer questions, express opinions and beliefs, talk about events, etc. They communicate in natural language on the Web, too. However, the simplicity of creating Web content is not only an advantage of the Web but also a disadvantage: the content is expressed in natural language, which means that it is usually unorganized and unstructured. This makes processing Web content expressed in natural language difficult.

Difficulties in natural language processing are often connected with the ambiguity of language. Some words have a specific meaning when they are used together. This raises the problem of collocation extraction. Detecting collocations is important for various natural language processing tasks (word sense disambiguation, machine translation, keyword extraction, etc.). Many statistical methods, as well as other linguistic attributes (e.g., part of speech), are used to solve this task.

In our work, we focus on extracting collocations in the Slovak language. We analyze several methods for collocation extraction. Our goal is to adapt or improve existing methods and explore the properties of collocations in Slovak. An important choice is whether to focus on statistical methods, which measure co-occurrence between word n-grams, or on linguistic methods.
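As an illustration of the statistical family, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI), one common co-occurrence measure. It assumes pre-tokenized text with no lemmatization, which matters for a highly inflected language such as Slovak; the measure, the `min_count` threshold and the toy sentence are illustrative choices, not the method under development.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information.

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), with probabilities estimated
    from unigram and bigram counts. Pairs seen fewer than `min_count` times
    are dropped, because PMI overrates rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    scores = {}
    for (x, y), c in bigrams.items():
        if c < min_count:
            continue
        p_xy = c / n_bi
        p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


text = ("čierna diera je objekt . čierna diera pohlcuje svetlo . "
        "objekt je ťažký . svetlo je rýchle")
tokens = text.split()
print(pmi_bigrams(tokens))
```

On this toy input, only the pair čierna diera (“black hole”) occurs often enough to pass the threshold; on a real corpus, inflected forms of the same collocation would be counted separately unless the text is lemmatized first.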


Ondrej Proksa
master study, supervised by Michal Holub

Abstract. A few million unique websites appear on the Web every day. Information on them is usually published in an unstructured format. Linked Data are structured data, available on the Web, that contain entities and the relationships between them. Some datasets are created by automated processing of freely available data. These are useful for personalization, web search or knowledge deduction. One of the main problems is converting various unstructured datasets to a uniform format and linking the data to existing datasets.

In this work, we analyze the issue of mining structured data from various sources available on the Web and the issue of linking the mined data in order to create a domain knowledge base. We analyze various approaches to automated dataset creation, gathering information about named entities, linking entities, and integrating new datasets with existing ones. We design a method to automatically process chosen sources of unstructured data and create a structured knowledge base based on the Linked Data principles.

The designed method is experimentally evaluated on data from a chosen domain by implementing a software prototype that uses the knowledge base for a chosen problem from the field of Web personalization: search, navigation, or recommendation based on relationships between entities. We validate the created knowledge base by comparing it to other existing knowledge bases.

Automatic Web Content Enrichment Using Parallel Web Browsing


Michal Račko
master study, supervised by Martin Labaj

Abstract. Creating links between resources on the Internet is now an acute problem due to the large amount of diverse content it includes. In the past, this content could only be created by the authors of pages, but now, in the era of Web 2.0, users can add content themselves. This is the main cause of the poor structure of such content and of weak or missing links between similar sources.

Creating a clear and sustainable long-term structure of sites is important because it eases navigation when browsing or searching the Internet. The metadata and semantics of each page allow more accurate search and thus enable search engines to create a complete picture of the site area. Based on the information content, it is also possible to create links between similar resources without having to physically connect those sites.

The aim of our project is to propose and test a method that enriches the domain model of web systems with the ability to link external resources to current adaptive systems. The proposed method respects the user’s privacy while creating implicit links between resources on the basis of users’ behavioral models, taking parallel web browsing behaviour into account. The method will consider that the user uses browser tabs and will try to accommodate his habits when working with the Internet.

Information Tags Maintenance: Anchoring


Karol Rástočný
doctoral study, supervised by Mária Bieliková

Abstract. Current content processing and presentation systems create a lot of different metadata that contain valuable information, for example logs of users’ behavior or derived concepts. These metadata are closely related to their resources: data in the repositories of information spaces. However, these data are not static, and every modification affects the validity of the metadata, so the metadata have to be maintained. Because several types of metadata exist and each type probably needs a specialized maintenance approach, we have focused on information tags (descriptive metadata with semantic relations to tagged content), and we are working on an approach to automatic information tag maintenance and on an information tag representation suitable for effective maintenance.

The problem of metadata maintenance has no sufficient solution yet. However, it can be divided into two partly independent sub-problems. The first is the maintenance of anchoring, which can be solved by an accurate, robust position descriptor. The second is the maintenance of metadata bodies, which current approaches to metadata maintenance do not solve.

In our research project PerConIK, we use information tags mainly to tag parts of source code with information about programmers’ behaviour and about source code features. To deal with the problem of anchoring in source code, we proposed an approach that treats source code files as sequences of textual elements (lines or words). Our approach uses a location descriptor within a target (a source file or an AST element of a source file), which consists of two partial descriptors with different scopes of use: an index-based location descriptor and a context-based location descriptor. We interpret context-based location descriptors as sequences of textual elements. This allows us to break the time and memory complexity of approximate string matching into two smaller parts, comparing textual elements and local sequence alignment, which are processed separately.
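The local-alignment half of the approach can be sketched with the Smith-Waterman algorithm over lines. Here the element comparison uses `difflib.SequenceMatcher` with a similarity threshold, and the match, mismatch and gap scores are arbitrary; these are assumptions for illustration, and the sketch covers only the context-based descriptor, not the index-based one.

```python
from difflib import SequenceMatcher


def element_score(a: str, b: str, threshold: float = 0.8) -> int:
    """Compare two textual elements (lines): +2 if similar enough, else -1."""
    ratio = SequenceMatcher(None, a.strip(), b.strip()).ratio()
    return 2 if ratio >= threshold else -1


def locate(context, lines, gap=-1):
    """Smith-Waterman alignment of stored context lines against the current file.

    Returns (start, end) so that lines[start:end] is the best-matching region,
    or None when nothing aligns with a positive score.
    """
    n, m = len(context), len(lines)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, None
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + element_score(context[i - 1], lines[j - 1]),
                H[i - 1][j] + gap,  # context line missing from the new version
                H[i][j - 1] + gap,  # extra line inserted into the new version
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    if best_pos is None:
        return None
    i, j = best_pos
    end = j
    # Trace back along the optimal path until the score drops to zero.
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + element_score(context[i - 1], lines[j - 1]):
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return j, end


context = ["int total = 0;", "for (int x : xs) {", "    total += x;", "}"]
new_lines = ["// sum values", "int total = 0;", "// loop over values",
             "for (int x : xs) {", "    total += x;", "}", "return total;"]
print(locate(context, new_lines))  # (1, 6)
```

A comment line inserted inside the anchored region costs only a gap penalty, so the anchor is relocated to `new_lines[1:6]`. Keeping the element comparison in its own function is what allows it to be computed separately from the alignment.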

Using Site Specificity to Build Better User Model from Web Browsing History


Marius Šajgalík
doctoral study, supervised by Michal Barla and Mária Bieliková

Abstract. The ever-growing Web content enables users to satisfy more and more of their various needs on the Web alone. Users browse the Web to read news, work, play games, socialise, etc. Even news can cover several broader topics, such as politics, sport, or professional news. From the point of view of a developer who endeavours to build the best possible general user model (i.e., one not focused on a particular domain), not all of these are necessarily equally influential. As users browse websites, they may overlook or simply skip parts they are not interested in. When modelling user interests, we often cannot assume that the user has seen the whole page, let alone that all of it interested her. This particularly applies to general-interest websites like the aforementioned news portals. If there are multiple topics within a single website, it is highly probable that the user is not interested in all of them but chooses to read only a few.

To account for this topic variety, we evaluate the specificity of each site and use it to reduce the influence of less specific sites on the user model. To infer a user’s interest in a site, we propose calculating the site specificity. The fewer topics a site contains, the more specific it is and the more likely it is that the discovered topics are significant for the user’s interests. On the other hand, if there are multiple topics, only vaguely related if at all, the probability that the user is interested in all of them is very low.
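One possible way to quantify site specificity, assumed here rather than taken from the work, is one minus the normalized Shannon entropy of the topics found on a site’s pages; each visit then contributes to the user model in proportion to that value. The topic labels are assumed to come from some upstream topic detector, and the function names are hypothetical.

```python
import math
from collections import Counter


def site_specificity(page_topics):
    """Specificity in [0, 1]: 1 minus the normalized entropy of the site's topics.

    A single-topic site scores 1; a site spread evenly over many topics scores 0.
    A site with no detected topics scores 0.
    """
    counts = Counter(page_topics)
    if not counts:
        return 0.0
    if len(counts) == 1:
        return 1.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return 1.0 - entropy / math.log(len(counts))


def add_visit(interests, topic, site_pages):
    """Add one visit to the user model, weighted by the site's specificity."""
    interests[topic] = interests.get(topic, 0.0) + site_specificity(site_pages)
    return interests


news_portal = ["politics", "sport", "economy", "culture"]
python_blog = ["python", "python", "python", "python"]
model = {}
add_visit(model, "sport", news_portal)
add_visit(model, "python", python_blog)
print(model)
```

A visit to the evenly spread news portal adds (almost exactly) nothing, while a visit to the single-topic blog counts fully; intermediate mixes land in between.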

Personalized Search in Source Code


Richard Sámela
master study, supervised by Eduard Kuric

Programmers always try to solve their development problems as easily and quickly as they can. Probably most of them use the internet or source code repositories to find the right solution. Programmers can reach many examples, tutorials and other sources of code. The most efficient solution is to reuse existing source code instead of writing new code. The problem, however, is finding the source code that best fits the development problem.

We will analyze options for recommending source code. This could be done by creating a user model of the programmer based on implicit and explicit feedback. Implicit feedback should contain information about the programmer, the source code fragments the programmer implemented and the technologies the programmer used in projects. Explicit feedback will contain information added manually. We will then be able to recalculate a knowledge score for every programmer. The knowledge score will be calculated from the programmer’s user model and will be used for personalized recommendation of source code.

Web Navigation Based on Annotations


Jakub Sevcech
master study, supervised by Mária Bieliková

Abstract. We often use various services for creating bookmarks, tags, highlights and other types of annotations while surfing the Internet or reading electronic documents. We use these annotations to highlight important parts of documents and to note our thoughts in the margins. User-created annotations are commonly used to support navigation, text summarization, etc. We proposed a method for searching for documents related to the currently studied document. The method uses annotations created by the document's reader as indicators of the user's interest in particular parts of the document. It uses a spreading activation algorithm to identify the most important words in the text transformed into a graph. The text-to-graph transformation maps words to graph nodes and creates edges according to word neighborhood in the source document. Activation is spread from nodes with attached annotations and concentrates in the most important words. We use these words as a query for retrieving related documents.
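The pipeline above (text to word graph, activation spread from annotated words, top words as a query) can be sketched in Python as follows. The co-occurrence `window`, the `decay` factor, the even split of activation among neighbours and the fixed number of `iterations` are assumptions for illustration; the abstract does not state the parameters of the actual method.

```python
from collections import defaultdict

def build_word_graph(words, window=2):
    """Words become nodes; an edge links words occurring within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != word:
                graph[word].add(words[j])
                graph[words[j]].add(word)
    return graph

def spread_activation(graph, seeds, decay=0.5, iterations=3):
    """Start with activation 1.0 on annotated words; in each iteration every node
    passes a decayed share of its newly received activation, split evenly,
    to its neighbours."""
    activation = defaultdict(float)
    for seed in seeds:
        activation[seed] += 1.0
    frontier = dict(activation)
    for _ in range(iterations):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            neighbours = graph.get(node, ())
            if not neighbours:
                continue
            share = value * decay / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return activation

def query_terms(words, annotated, k=3):
    """Return the k most activated words, to be used as a search query."""
    activation = spread_activation(build_word_graph(words), annotated)
    ranked = sorted(activation.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]
```

Words that are well connected to the annotated passages accumulate activation from several paths, which is why they rise to the top of the query.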

To evaluate the proposed method, we created a service called Annota, which allows users to insert various types of annotations into web pages and PDF documents displayed in a web browser. We analyzed the properties of various types of annotations inserted by Annota users into documents. Based on these properties, we ran a simulation that generated annotations for a dataset created from Wikipedia pages. We compared the relevance of documents retrieved using a query created by the proposed method with that of documents retrieved using a query created by a TF-IDF-based method.
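For comparison, a TF-IDF baseline of the kind mentioned above picks the words of the document with the highest TF-IDF weight. The sketch below is an assumed variant (relative term frequency and a smoothed `log(N / (1 + df)) + 1` inverse document frequency); the abstract does not say which TF-IDF formulation was used.

```python
import math
from collections import Counter

def tfidf_query(document, corpus, k=3):
    """document: list of words; corpus: list of sets of words (one per document).
    Return the k words of `document` with the highest TF-IDF weight."""
    tf = Counter(document)
    n_docs = len(corpus)

    def idf(word):
        df = sum(1 for doc in corpus if word in doc)
        return math.log(n_docs / (1 + df)) + 1  # smoothed IDF (assumption)

    scores = {w: (count / len(document)) * idf(w) for w, count in tf.items()}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]
```

Unlike the annotation-based method, this baseline ignores which parts of the document the reader marked as interesting.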

Crowdsourcing in the Class


Jakub Šimko
doctoral study, supervised by Mária Bieliková

Abstract. The goal of our research is to develop and examine a novel crowd-based metadata acquisition approach within an online learning framework. The resources we target are question-answer learning objects, i.e., exam-like questions from a learning course for which free-text answers created by students exist. What does not exist, though, is metadata describing the validity of these answers.

We devised an online interactive exercise through which students validate free-text answers to questions. We build on our previous research, in which we showed that the aggregated answer of the student crowd can be correct to some extent. Nevertheless, extracting a more accurate crowd answer from individual student answers remains a challenge, and this is the focus of our work. Our aim is to measure and exploit information about each student's level of expertise (in the course domain) by marginalizing the answers of “bad” students and strengthening the answers of “good” students.
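One simple way to marginalize “bad” students and strengthen “good” ones is iterative agreement weighting: compute a weighted consensus verdict for each answer, re-estimate each student's weight as the share of their verdicts that agree with the consensus, and repeat. The Python sketch below illustrates this swapped-in heuristic under that assumption; it is not the method of our previous research, and the vote format is hypothetical.

```python
from collections import defaultdict

def weighted_consensus(votes, iterations=5):
    """votes: list of (student, answer_id, verdict), verdict True if the student
    judged the answer correct. Returns (consensus per answer, weight per student)."""
    weights = defaultdict(lambda: 1.0)  # every student starts with equal weight
    consensus = {}
    for _ in range(iterations):
        score = defaultdict(float)
        for student, answer, verdict in votes:
            score[answer] += weights[student] if verdict else -weights[student]
        consensus = {answer: s > 0 for answer, s in score.items()}
        # A student's new weight is their rate of agreement with the consensus.
        agree, total = defaultdict(int), defaultdict(int)
        for student, answer, verdict in votes:
            total[student] += 1
            agree[student] += verdict == consensus[answer]
        weights = defaultdict(lambda: 1.0,
                              {s: agree[s] / total[s] for s in total})
    return consensus, dict(weights)
```

A known limitation of such schemes is that a majority of consistently wrong students can still dominate; seeding weights with a few teacher-validated answers would be one way to counter it.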

Reciprocity as a Means of Support for Collaborative Knowledge Sharing


Ivan Srba
doctoral study, supervised by Mária Bieliková

Abstract. Knowledge management systems provide organizations of various types with many progressive ways to create, improve and share knowledge embedded in communities of their members. One type of such community is the knowledge building community, which focuses not only on knowledge sharing but also on learning new valuable practices. Knowledge building communities can be identified in many areas; the most common examples are classrooms, academic research teams and workplace teams.

Computer support has brought many advantages to the knowledge sharing process. On the other hand, new problems have appeared. One of them is a much wider diversity of users, which can influence the knowledge sharing process either positively or negatively. Positive motivators include reputation, centrality and reciprocity. According to several studies, reciprocity is one of the most important motivators, because users expect to receive from the community as much knowledge as they give to it. Therefore, we decided to propose an innovative model of an adaptive web-based system for collaborative knowledge building, aimed at supporting symmetry between receiving and producing knowledge among all members of a particular knowledge building community.
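To make “symmetry in receiving and producing knowledge” measurable, one could compute a per-member reciprocity balance. The sketch below assumes, for illustration, that producing is counted as answers given and receiving as answers received, and that members whose absolute balance exceeds a hypothetical `threshold` are the ones the adaptive system should nudge.

```python
def reciprocity(produced, received):
    """Balance in [-1, 1]: positive when a member gives more than they receive,
    negative when they mostly receive, 0 when the exchange is symmetric."""
    total = produced + received
    return 0.0 if total == 0 else (produced - received) / total

def members_to_support(activity, threshold=0.5):
    """activity: member -> (answers given, answers received).
    Return members whose exchange with the community is strongly asymmetric."""
    return sorted(member for member, (produced, received) in activity.items()
                  if abs(reciprocity(produced, received)) > threshold)
```

Members with a strongly negative balance could, for example, be offered questions they are able to answer, while strong producers could be encouraged to ask for help themselves.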

Browsing Information Tags Space


Andrea Šteňová
master study, supervised by Karol Rástočný

Abstract. Software systems create various types of metadata that describe specific parts of structured, addressable content. These metadata are a valuable source of information because they summarize the content (e.g., keywords) and provide additional information about it (e.g., the number of clicks on a link on a page). They are usually generated and processed by machines, and their amount makes it difficult for people to read and understand them effectively. Therefore, it is necessary to help users navigate the information tags space and to display the tags in a human-readable and browsable format.

In our work, we would like to propose a method that helps users browse the information tags space. We want to focus on structured metadata about user activity, as well as on the structure and content of data. One problem with metadata is that they change over time. We can present users either only the current data or also how the data changed over time.
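Showing how a tag changed over time requires keeping its versions. A minimal sketch, assuming versions of a tag arrive in chronological order with numeric timestamps, stores them in parallel lists and answers “what was the value at time t?” with a binary search from Python's `bisect` module; `TagHistory` is a hypothetical name.

```python
import bisect

class TagHistory:
    """Keeps every version of one information tag, so a browser can show
    either the current value or the value valid at an earlier time."""

    def __init__(self):
        self._times = []
        self._values = []

    def record(self, timestamp, value):
        # Assumes versions are recorded in chronological order.
        self._times.append(timestamp)
        self._values.append(value)

    def current(self):
        return self._values[-1] if self._values else None

    def at(self, timestamp):
        """Value valid at `timestamp`, or None if the tag did not exist yet."""
        i = bisect.bisect_right(self._times, timestamp)
        return self._values[i - 1] if i else None
```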

Another problem is the computational complexity of the algorithms used to display the data. If we do not display all the data to the user at once and computing their representation is demanding, we need a way to predict the user's actions in order to optimize computation time. Our solution must be understandable and easily readable by users; to support this, we will use some of the existing visualization solutions. We plan to verify our solution in the domain of the PerConIK project.
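Predicting the user's next action so that its representation can be precomputed could be as simple as a first-order Markov model over views. The sketch below assumes that browsing sessions are recorded as sequences of view identifiers (a hypothetical format); the `k` most likely next views would be the ones to precompute in the background.

```python
from collections import Counter, defaultdict

class NextViewPredictor:
    """First-order Markov model of which view a user opens after the current one."""

    def __init__(self):
        self.transitions = defaultdict(Counter)

    def observe(self, session):
        """Count transitions between consecutive views of one browsing session."""
        for current, following in zip(session, session[1:]):
            self.transitions[current][following] += 1

    def predict(self, current, k=2):
        """Return up to k views most often opened after `current`."""
        return [view for view, _ in self.transitions[current].most_common(k)]
```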

Modelling the Dynamics of Web Content


Matúš Tomlein
master study, supervised by Jozef Tvarožek

Abstract. Web content is highly dynamic: it changes frequently and spreads over various information channels on the Web. Observing and analysing the dynamics of Web content is a non-trivial process that also requires a lot of archived data. On the other hand, knowledge of the behaviour of Web content is useful in many areas of software engineering. It can improve search algorithms for Web content, provide a basis for recommending similar content, and be useful for updating caches on Web servers.

Our aim is to design and implement a method to effectively process Web content in order to be able to observe and analyse its behaviour in an archived data set. We plan to use the method on a sufficiently large data set of websites from various sources (e.g. blogs, social networks or news portals) to draw useful conclusions about the dynamics of such content.
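As a first step in processing an archive, one might derive a change profile for each URL from its snapshots. The sketch below assumes snapshots are chronologically ordered `(timestamp_seconds, html)` pairs and treats any difference in the SHA-1 digest as a change; real pages contain volatile parts (ads, timestamps), so a practical method would need to strip or tolerate such noise.

```python
import hashlib

def change_profile(snapshots):
    """snapshots: chronologically ordered (timestamp_seconds, html) pairs for one URL.
    Returns (number of detected changes, mean seconds between changes or None)."""
    changes = []
    previous_digest = None
    for timestamp, content in snapshots:
        digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
        if previous_digest is not None and digest != previous_digest:
            changes.append(timestamp)
        previous_digest = digest
    if len(changes) < 2:
        return len(changes), None
    gaps = [later - earlier for earlier, later in zip(changes, changes[1:])]
    return len(changes), sum(gaps) / len(gaps)
```

Aggregating such profiles per source type (blogs, social networks, news portals) would be one way to compare how quickly different kinds of content change.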

Method for Social Programming and Code Review


Michal Tomlein
master study, supervised by Jozef Tvarožek

Abstract. Peer code review is a powerful tool that can be used to significantly improve the quality of programming courses. In our work, we set out to explore a new kind of collaborative programming exercise, in which a group of students works on a chain of programming assignments with the goal that all of the students complete them.

To that end, we base our approach on a combination of peer review and social awareness. Each student is automatically assigned a reviewer, who then has a live view of the student's code and can provide feedback through the built-in messaging feature. Automating the selection of a suitable reviewer is a non-trivial problem not addressed by current collaboration and code review solutions.
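The abstract leaves the reviewer selection strategy open. Purely as a hypothetical baseline, the sketch below gives each student a reviewer who is further along the assignment chain, preferring the least loaded candidate and, among equally loaded ones, the most advanced; students furthest behind are assigned first. Progress is assumed to be the number of completed assignments.

```python
def assign_reviewers(progress):
    """progress: student -> number of assignments completed in the chain.
    Returns student -> reviewer; students with nobody ahead get no reviewer."""
    load = {student: 0 for student in progress}
    assignment = {}
    # Assign the students who are furthest behind first.
    for student in sorted(progress, key=lambda s: (progress[s], s)):
        candidates = [r for r in progress
                      if r != student and progress[r] > progress[student]]
        if not candidates:
            continue  # the leaders have nobody ahead of them
        # Fewest reviewees first, then the most advanced, then by name.
        reviewer = min(candidates, key=lambda r: (load[r], -progress[r], r))
        assignment[student] = reviewer
        load[reviewer] += 1
    return assignment
```

A real solution would likely also consider code similarity, past interactions and availability, which this greedy baseline ignores.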

Additionally, we raise the students’ awareness of their part in the overall progress of the group using a special group progress visualisation. In our initial experiment, this visualisation has met with positive reactions from the students, who found such information very useful and motivating.

Information Retrieval Using Short-term Context


Matus Vacula
master study, supervised by Dušan Zeleník

Abstract. The content of the Web is continuously expanding. With the growing amount of information accessible to end users, it becomes increasingly difficult to obtain search results relevant to the user's interests and intentions, and all modern search engines have to accommodate this. Search engine developers are trying to make search more efficient by taking context into account. Although the resources are available and some methods of using context are known, search engines still do not use these possibilities to their full potential.

Information about the user’s current activity could lead to more accurate search results, or at least to a less ambiguous interpretation of the query. This information is relatively easy to obtain from the web browser. Web users commonly multitask while surfing the Web, which means browsing multiple websites at a time and switching between them. Even a single look at the list of open websites allows us to assess, with considerable probability and precision, what kind of activity the user is performing.

Information about the user’s activity can be extracted by identifying keywords or topics of the viewed websites. We can use these keywords to enrich the user’s search query, making it more specific and the results more precise. This approach does not require tracking the long-term history of user activity, nor does it enclose the user in an imaginary “bubble” of preferred domains that would prevent him from obtaining information from other domains or areas.
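A minimal sketch of this query enrichment, assuming the texts of the currently open tabs are available: each word is counted once per tab (so a term shared by several tabs signals the current activity), a small hypothetical stopword list is removed, and the `k` most common words not already in the query are appended. Keyword extraction in the actual method may be more elaborate.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on", "is", "with"}

def tokens(text):
    """Lowercase alphabetic words without stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def expand_query(query, open_tab_texts, k=2):
    """Append the k keywords shared by most open tabs that are not in the query."""
    query_terms = tokens(query)
    counts = Counter()
    for text in open_tab_texts:
        counts.update(set(tokens(text)))  # count each word once per tab
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    extra = [w for w, _ in ranked if w not in query_terms][:k]
    return " ".join(query_terms + extra)
```

For example, with tabs about cars open, the ambiguous query “jaguar speed” is expanded towards the automotive meaning rather than the animal.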

Context-Aware Recommender Systems Evaluation


Juraj Višňovský
master study, supervised by Dušan Zeleník

In the age of information overload, it is impossible for users to handle or filter large amounts of data on their own. Therefore, there is a demand for recommender systems capable of selecting the most relevant information for the user.

In the process of item selection, users are usually affected by various factors (e.g., a user’s willingness to spend money in e-shops can be influenced by his wealth or by the upcoming Christmas). These factors describe the user’s current situation in detail and are called contexts. In most, if not all, domains it is appropriate to include contexts in recommender systems in order to improve recommendations, because user actions are known to correlate with contexts.
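One common way of including contexts is contextual pre-filtering: only ratings given in a context matching the current one are kept, and a conventional recommender is applied to them. The sketch below uses a deliberately simple “recommender” (mean rating per item) to keep the example short; the rating tuple format and context keys are hypothetical.

```python
from collections import defaultdict

def recommend_in_context(ratings, context, k=2):
    """ratings: list of (user, item, rating, context_dict).
    Keep ratings whose context matches every key of `context`,
    then rank items by their mean rating within that context."""
    def matches(ctx):
        return all(ctx.get(key) == value for key, value in context.items())

    totals, counts = defaultdict(float), defaultdict(int)
    for _user, item, rating, ctx in ratings:
        if matches(ctx):
            totals[item] += rating
            counts[item] += 1
    means = {item: totals[item] / counts[item] for item in totals}
    ranked = sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]
```

The same item can thus rank differently at Christmas than in autumn, which is exactly what makes evaluation in the original conditions costly.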

Because contexts vary, evaluating context-aware recommender systems is not a simple task. A naïve evaluation of a recommendation generated for a certain set of contexts would have to be performed in exactly the same conditions, which is very costly. In our work, we instead set up a hypothetical situation and simply ask users how they would act in the given conditions. When generating hypothetical situations, we have to bear in mind users’ nature. Some users could be more open-minded than others and could possibly be used to evaluate recommendations for different kinds of users (e.g., an empathic man could answer precisely what perfume a woman of his age would buy in a given situation).

User’s Activity Based Search in Digital Library

Lubomir Vnenk
bachelor study, supervised by Mária Bieliková

Almost everything is on the Internet, so it is easy to get lost in this amount of data. When users try to find something they need, they usually have no option other than a web search. However hard the search engine tries, it cannot really know what the user is trying to find. Nobody can; sometimes not even the user. We can, however, try to guess.

To make a precise guess, we need a model of the user: what he likes and what he does not. Most important, however, is information about his current state and the actions he has recently performed. This information should give us the right meaning of the keywords. We use this meaning to extend the user’s context or to filter the websites found. To obtain useful information about the user, we are going to monitor his activity, such as web browsing and application usage.
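Filtering found websites by recent activity could, for instance, re-rank search results by their similarity to a recency-weighted profile of what the user has just been doing. The sketch below assumes activity events are `(timestamp_seconds, text)` pairs, an exponential decay with a hypothetical `half_life` of 10 minutes, and cosine similarity between word counts of a result snippet and the profile; none of these choices come from the abstract.

```python
import math
import re
from collections import Counter

def words(text):
    return re.findall(r"[a-z]+", text.lower())

def activity_profile(events, now, half_life=600.0):
    """events: (timestamp_seconds, text) of recent activity (pages, applications).
    Each event's words are weighted by 0.5 ** (age / half_life)."""
    profile = Counter()
    for timestamp, text in events:
        weight = 0.5 ** ((now - timestamp) / half_life)
        for word in words(text):
            profile[word] += weight
    return profile

def rerank(results, profile):
    """results: (url, snippet) pairs, re-ordered by cosine similarity
    between the snippet's word counts and the activity profile."""
    profile_norm = math.sqrt(sum(v * v for v in profile.values()))

    def cosine(snippet):
        vec = Counter(words(snippet))
        dot = sum(count * profile[w] for w, count in vec.items())
        norm = math.sqrt(sum(v * v for v in vec.values())) * profile_norm
        return dot / norm if norm else 0.0

    return sorted(results, key=lambda r: cosine(r[1]), reverse=True)
```

Because older events decay quickly, the re-ranking follows the user’s current task rather than a long-term interest profile.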

The aim of our work is to help people find what they need quickly and without hassle.

Relationship Discovery from Educational Content


Petra Vrablecová
master study, supervised by Marián Šimko

Abstract. The domain model is an essential part of an adaptive learning system. It expresses the semantics of educational content in the form of metadata. We consider it to be a lightweight ontology, i.e., a set of terms and relations. Manual domain model building is a challenging task for teachers; hence there is an effort to automate it. We propose a method for automated acquisition of metadata from educational content, aimed at discovering relationships between terms. We exploit existing methods for relationship discovery from text and adapt them for the educational domain. Our work is a promising contribution to the growing field of automated domain model acquisition.

We decided to take advantage of the benefits of statistical methods for relationship discovery: language independence and no need for syntactic knowledge. We assume that educational content has a uniform vocabulary, which is a precondition for better results of statistical methods. We will use latent semantic analysis (LSA), term subsumption and graph algorithms applied to the structure of the educational content to discover relatedness and hierarchical relationships. For evaluation, we employ the system for educational content management.
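Term subsumption is commonly defined by co-occurrence: term x subsumes term y (x is the broader term) when P(x|y) ≥ 0.8 and P(y|x) < 1, with probabilities estimated from the documents the terms occur in. The sketch below applies that definition to learning objects represented as sets of terms; the 0.8 threshold is the conventional choice, not necessarily the one we will use.

```python
from itertools import permutations

def subsumption_pairs(documents, threshold=0.8):
    """documents: list of sets of terms (e.g. one set per learning object).
    Returns (broader, narrower) pairs where P(x|y) >= threshold and P(y|x) < 1."""
    occurrences = {}
    for index, terms in enumerate(documents):
        for term in terms:
            occurrences.setdefault(term, set()).add(index)
    pairs = []
    for x, y in permutations(sorted(occurrences), 2):
        both = len(occurrences[x] & occurrences[y])
        p_x_given_y = both / len(occurrences[y])
        p_y_given_x = both / len(occurrences[x])
        if p_x_given_y >= threshold and p_y_given_x < 1:
            pairs.append((x, y))
    return pairs
```

Each returned pair is a candidate hierarchical relationship that a teacher could confirm or reject.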

Concept Location Based on Programmer’s Activity


Pavol Zbell
bachelor study, supervised by Mária Bieliková and Eduard Kuric

Abstract. Search in source code is a necessary part of a programmer's daily work. Programmers often search and explore source code to enrich their existing knowledge of the workings and functionality of a software system, to get answers to questions about the software evolution tasks they are currently working on, or to find source code fragments they could reuse.

Existing search tools built into IDEs usually return a list of relevant functions, and programmers try to locate the desired concept by jumping through functions based on the function calls they see. We find this approach ineffective, as it is usually resource- and time-consuming.

In our work, we focus on searching source code in terms of concept location. We propose a method that takes into account changes made over time to fine-grained elements (functions) of the source code. We assume that changes made at a particular time are related and may represent a concept of the software system. The source code search required by our concept location method is based on the similarity between terms from the programmer’s query and terms extracted from the source code elements (identifiers and comments).
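The idea can be sketched in a few steps: split identifiers of changed functions into terms, group changes that happened close together in time into candidate concepts, and return the group whose terms best match the query. In the sketch below, the 30-minute grouping `window` and Jaccard similarity are assumptions, and comments are omitted for brevity although the method also extracts terms from them.

```python
import re

def split_identifier(identifier):
    """'parseHttpRequest' or 'parse_http_request' -> ['parse', 'http', 'request']."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)
    return [part.lower() for part in parts]

def group_changes(changes, window=1800):
    """changes: (timestamp_seconds, function_name) sorted by time. Changes less
    than `window` seconds after the previous one join the same group."""
    groups, current, last = [], [], None
    for timestamp, function in changes:
        if last is not None and timestamp - last > window:
            groups.append(current)
            current = []
        current.append(function)
        last = timestamp
    if current:
        groups.append(current)
    return groups

def locate_concept(query, changes):
    """Return the change group whose identifier terms best overlap the query
    terms (Jaccard similarity)."""
    query_terms = set(query.lower().split())

    def score(group):
        terms = {t for function in group for t in split_identifier(function)}
        return len(terms & query_terms) / len(terms | query_terms)

    return max(group_changes(changes), key=score)
```

Returning a whole group rather than a single function gives the programmer the set of functions that were changed together, which is the intuition behind treating such changes as one concept.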

We believe that developers using our method will gain a better tool for exploring the source code, and a better overview of the source code evolution and concepts of the software system. Our work partially contributes to the research project PerConIK. Several PeWe group members are currently working on it.

Beyond Code Review: Detecting Errors via Context of Code Creation


Dušan Zeleník
doctoral study, supervised by Mária Bieliková

Abstract. Our intention is to support the process of code review. The idea is to detect possible mistakes made during software development. Unlike syntactic analyses or code smell detection, we focus on human factors. We presume that developers are affected by a variety of conditions present during code creation. To detect conditions with a negative impact on code quality, we observe developers and their history in software development. Using source control management and bug reports, we discover relations between mistakes and conditions. For instance, working too long could be one reason for making a mistake. We used data on real software developers and their activities collected over almost one year. We analyse source control management data, activities on the PC and actions in the IDE.
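Relating mistakes to conditions can start with a simple comparison of bug rates. The sketch below assumes each commit has been labelled `buggy` when a later bug report was traced back to it and carries an `hours_worked` field; both fields and the six-hour cut-off in `long_session` are hypothetical.

```python
from collections import defaultdict

def bug_rate_by_condition(commits, condition):
    """commits: dicts with a boolean 'buggy' field; condition: function mapping
    a commit to a label. Returns label -> share of buggy commits."""
    buggy, total = defaultdict(int), defaultdict(int)
    for commit in commits:
        label = condition(commit)
        total[label] += 1
        buggy[label] += commit["buggy"]
    return {label: buggy[label] / total[label] for label in total}

def long_session(commit):
    # Hypothetical condition: more than 6 hours of work before the commit.
    return "long" if commit["hours_worked"] > 6 else "normal"
```

A markedly higher rate for one label only suggests a relation; with real data, a statistical test and control for confounders such as task difficulty would be needed.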

In our work, we focus on software developers and on their productivity and effectiveness, which are influenced by their surroundings. Programmers make many mistakes when they are not in good shape. We intend to verify this assumption by analyzing their behavior while developing software. We focus on two main aspects that affect the quality of a programmer’s outputs.

Continuity of work. The programmer works continuously and is not interrupted by external events. Interruptions can cause problems with refocusing on the task and reconstructing the situation. This leads to lost time and lost context, and eventually to mistakes and incomplete tasks.
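Continuity could be quantified by counting interruptions in the recorded activity stream. A minimal sketch, assuming chronologically ordered `(timestamp_seconds, application)` events: an interruption is counted when the developer switches from a work application to another one, or when no activity is recorded for longer than a hypothetical `idle_gap` of five minutes.

```python
def interruptions(events, idle_gap=300, work_apps=frozenset({"ide"})):
    """events: chronologically ordered (timestamp_seconds, application) pairs.
    Count switches away from work applications and idle gaps longer than idle_gap."""
    count = 0
    previous_time, previous_app = None, None
    for timestamp, app in events:
        if previous_time is not None:
            left_work = previous_app in work_apps and app not in work_apps
            idle = timestamp - previous_time > idle_gap
            if left_work or idle:
                count += 1
        previous_time, previous_app = timestamp, app
    return count
```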

Stereotyped work. When a programmer works in common situations or in a common environment, he is used to the conditions, which positively affects his outputs. This also means that a programmer needs some sort of stereotype in his work. Losing the stereotype brings anomalies into his behavior, which cause anomalies in his outputs, and these can turn into mistakes.
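A loss of stereotype could be detected as a deviation from the developer’s own usual behaviour. The sketch below uses a plain z-score on one behavioural feature (for instance, the hour at which a work session starts) with an assumed threshold of two standard deviations; a real detector would combine several features and handle, e.g., the wrap-around of hours at midnight.

```python
import statistics

def is_anomalous(history, current, threshold=2.0):
    """history: past values of one behavioural feature; current: today's value.
    Flags values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return current != mean
    return abs(current - mean) / stdev > threshold
```

For a developer who usually starts between 8 and 10 o’clock, a session starting at 22 o’clock would be flagged, and commits from that session could receive closer review.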