Students’ Research Works – Autumn 2013

Search and Recommendation

Information Analysis, Organization and Navigation

User Modeling, Collaboration and Social Networks

Domain Modeling, Semantics Discovery and Annotations



Doctoral Staff

Mária Bieliková
Professor
o collaborative/personalized/context search and navigation
o recommendation for adaptive social web
o implicit feedback
o usability
o user and context modelling

Michal Barla
Postdoc
o user modeling
o implicit user feedback
o virtual communities
o collaborative surfing
o linked data, data modeling

Jozef Tvarožek
Postdoc
o social intelligent learning
o collaborative learning
o semantic text analysis
o natural language processing

Marián Šimko
Postdoc
o domain modelling
o semantic text analysis
o Web-based Learning 2.0
o ontologies, folksonomies

Jakub Šimko
Postdoc
o human computation and games with a purpose
o crowdsourcing
o usability
o domain modeling and semantics acquisition
o exploratory information retrieval


Keeping Information Tags Valid and Consistent

Karol Balko
master study, supervised by Karol Rástočný

Abstract. Metadata are becoming an inseparable part of modern software systems. In our field of research, metadata – structured information describing information resources – serve to describe features of the source code, such as the number of whitespace characters or a marker denoting copied code.

Our research focuses on the metadata used within the PerConIK project, where metadata are called information tags. Information tags are stored in a dedicated repository and are anchored to the source code by links. However, when the source code is modified or refactored, an information tag can become invalid: its link may no longer point to the right place in the source code, or it may describe a feature of the code that no longer holds.

The thesis aims to create a method for maintaining the consistency and validity of these information tags. Our analysis of the problem revealed several possible directions the thesis may follow; we have analysed general data extraction techniques as well as approaches to metadata analysis used in related fields of research.

Student Motivation in Interactive Online Learning

Tomáš Brza
bachelor study, supervised by Jozef Tvarožek

Abstract. Interactive exercises help students better understand the underlying concepts, and time spent learning is the most important factor for success in learning. However, students often lose motivation to learn outside class.

In our project we explore ways to motivate students to work more in interactive online learning systems through learning analytics and social incentives. We compare different methods and, in the evaluation, measure the time students spend interacting with the learning elements (programming exercises).

Natural Language Processing on the Web Using Automated Syntactic Analysis

Dominika Červeňová
master study, supervised by Marián Šimko

Abstract. Natural language, one of the most common means of expression, is also used to store information on the Web. As the amount of such information is huge, we need to ensure that computers can understand it in order to process it effectively.

Natural language processing is, however, a difficult process because of the informality and loose structure of natural language. Processing typically consists of the sequential application of different analysis components, which address problems such as phonological, syntactic, or contextual ambiguity, homonymy and polysemy. Syntactic analysis parses a sentence, identifies sentence members and syntagms, and assigns them syntactic roles. The result is a set of formal relations that can make natural language, and the information it carries, more machine-processable.

We work on a method that automates the syntactic analysis of text as much as possible. Tools that perform such analysis already exist for languages that are simpler and easier to formalize (e.g., English), but for Slavic languages the problem is nontrivial. For Slovak, syntactic analysis is still better performed manually, because of the language's many specific features and the high error rate of automatic processing. Our goal is to analyze the possibilities of maximizing the automation of this process while minimizing manual human work. We plan to evaluate our method using a software prototype that analyzes sentences, with manually created syntactic annotations from the Slovak National Corpus at the Ľudovít Štúr Institute of Linguistics as a gold standard.
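For languages with mature tooling, the kind of output we are after can be illustrated directly. A minimal sketch using spaCy's English model (an illustration only; our method targets Slovak, which such models do not cover) prints each token's syntactic role and governing word:

```python
# Illustration: the token-level syntactic roles our method should
# produce; spaCy's English model stands in for a future Slovak parser.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed
doc = nlp("The student reads a difficult sentence.")

for token in doc:
    # token.dep_ is the syntactic role, token.head the governing word
    print(f"{token.text:10} {token.dep_:10} head={token.head.text}")
```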

Methodology of Game Evaluation Based on Implicit Feedback

Peter Demčák
master study, supervised by Jakub Šimko

Abstract. The objective of every game is to create an authentic game experience and capture the interest of its players. This experience draws on basic cognitive skills such as modeling, focus, imagination and empathy, and is fully realized in the state of “flow”, where the player’s attention is completely focused on the content of the game. To design games that create the intended experience, it is necessary to be able to evaluate the game experience of players. However, because experience is subjective, grasping it requires feedback from players.

Explicit feedback is limited by the difficulty of observing the player’s mental state in detail without disturbing it. Hence the importance of implicit feedback, which infers the user’s mental state from their natural behavior. One promising source of implicit feedback is eye tracking: mapping eye movements to cognitive functions shows promise even for evaluating game experience, where it can identify immersive and disturbing elements of gameplay, the passages that are the most and the least fun to play, or the presence of flow in the player.

Our goal is to design a set of reusable principles for game evaluation based on eye-tracking information. We plan to apply these principles to several games with different user groups and different kinds of gameplay to verify the results of our method.

Linking Data on the Web in order to Improve Recommendation

Ľuboš Demovič
master study, supervised by Michal Holub

Abstract. The Web provides a large amount of knowledge and has the potential to become the largest source of information in the world. Data published on the Web is, however, largely unstructured and intended for people, without a clear definition of entities, their meaning, or the relationships between them. Linked Data is a method for publishing structured data on the Web so that the data are interconnected and thereby more useful. The various kinds of links between entities make it possible to create a graph describing a selected domain. This promotion of the importance of data represents the next stage of Web development, referred to as Web 3.0.

We deal with automated machine processing of data on the Web. We propose a method for automatic identification and extraction of entities, facts, and links between entities, focusing on unstructured English and Slovak Web content. The selected domain of our work is education, where a large amount of learning material is available. We will use the obtained facts to build a Linked Data dataset describing the knowledge of the selected domain.

Subsequently, building on the principles and full potential of Linked Data, we will recommend related educational materials in the same category. Through the recommended materials we provide end users with extended, relevant content. We will verify the proposed method experimentally by implementing a software tool that exploits the knowledge base for recommendations.

The Analysis of Social Networks

Rastislav Detko
bachelor study, supervised by Michal Kompan

Abstract. Nowadays, many people actively use social networks, usually to share ideas and communicate with other people. This information can be useful in various tasks that improve the Web for users (such as recommendation, information filtering, etc.).

Finding influencers and their outreach is an effective, though not yet common, way to develop marketing. A few years ago marketers played a numbers game: they invested money to build a sizeable community in a social network. They then found out that a small network well covered by influencers and specialists is more valuable than a big, uncontrolled community.

In our work we analyze data mined from a social network and find members whom we call influencers: members with an increased ability to inspire others to take action. We also aim to determine each influencer’s outreach, i.e., the set of members who take some action after being inspired by the influencer. Our goal is to find how many influencers are needed to inspire a desired number of members to act, with a given overall probability, within a specific time.
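A minimal sketch of one possible heuristic, assuming the mined friendship graph is available as a networkx graph; the top-3 cut-off and the two-hop outreach limit are illustrative assumptions, not our final method:

```python
import networkx as nx

G = nx.karate_club_graph()  # stand-in for a mined friendship graph

# Rank members by degree centrality; take the top three as influencers.
centrality = nx.degree_centrality(G)
influencers = sorted(centrality, key=centrality.get, reverse=True)[:3]

for member in influencers:
    # Outreach approximated as everyone reachable within two hops.
    reached = nx.single_source_shortest_path_length(G, member, cutoff=2)
    print(member, "reaches", len(reached) - 1, "members within 2 hops")
```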

Finding and Harnessing Experts for Metadata Generation in GWAPs

Peter Dulačka
master study, supervised by Jakub Šimko

Abstract. Games with a purpose (GWAPs) have been used to acquire and validate metadata for various media types in the last couple of years. In our project CityLights we focused on validating existing music metadata, motivated by the poor quality of user-entered data in online databases. As the game failed to detect false positives and many players had to take part to reach a decision, we realized that expert opinions, or weighting of players’ actions, would speed the process up considerably. The problem was that individual experts were playing the game, but they were outshouted by the rest of the crowd. We would like to recognize these experts through simple tasks and give them more power to affect the dataset by playing the game.

We propose a game with a purpose aimed primarily at finding domain experts and secondarily at metadata generation. Once experts are recognized, they can be treated differently during the metadata generation process, and even a posteriori, to obtain more accurate data. Our work focuses on acquiring music metadata, and we would like to find experts in various music domains. We want to combine the game with an online radio: by listening to the radio and answering questions about the currently playing songs, players who are not experts are filtered out. After experts are recognized, we can experiment with special expert-only tasks and measure their success ratio, or compare the success rate of the expert group with non-experts and see how much faster metadata generation becomes with weighted players’ actions in GWAPs.

Gamification of Web-based Learning System for Supporting Motivation

Richard Filipčík
bachelor study, supervised by Mária Bieliková

Abstract. The greatest driving force in learning or work is undoubtedly motivation. Motivation affects performance, efficiency, and the amount of time a person is willing to invest. Several new theories on how to motivate a person have appeared in recent years; one of them is gamification. According to Wikipedia, gamification is “the use of game thinking and game mechanics to engage users in solving problems.” Elements of gamification are increasingly used in application development to support productivity.

Gamification defines several techniques and methods for increasing motivation while using an application. Mostly these use elements well known from video games, such as badges, achievements, point systems and leaderboards based on them. Various kinds of competitions are also very popular.

Our goal is to incorporate elements based on gamification theory into the existing infrastructure of a web-based learning system and use them to increase students’ motivation while using this system. Our second objective is to specify and apply new rules for rewarding students’ work using a dynamic scoring system that takes the weights of individual actions into account; the weights should be editable either by the teacher or by the system itself.

Group Recommendation of Multimedia Content

Eduard Fritcher
master study, supervised by Michal Kompan

Abstract. Nowadays it is important that web pages and applications not only store information but also communicate with the user. With the growth of the World Wide Web, the amount of information stored online has increased dramatically; recommendation techniques and methods were invented to cope with this information burst. But the way people access the Internet has changed too: people collaborate with each other more often. At a time when the most visited pages in the world are social networks, recommendation techniques have to adapt to these new trends, chief among them collaboration between users. The answer to this need is group recommendation.

We therefore propose a method that extracts information about users from social networks for recommendation generation. From the extracted information we build a Big Five personality model for each member of the group. We then apply an aggregation strategy to the personality models and include the result in the recommendation method. We use a graph-based approach for recommending content, which ensures that our method is applicable to a wide range of domains. The method will be tested in the domain of movie recommendation.

Facilitating Learning on the Web

Martin Gregor
master study, supervised by Marián Šimko

Abstract. Personalized text enrichment, with its potential to improve access to information, is an easy way to obtain new knowledge. Text enrichment is a process covering analysis of a text, enrichment of the text based on the user’s knowledge, and finally obtaining feedback about the appropriateness of the enrichment. Everyday web browsing is a normal routine for each of us, and the experience users gain during it can be used to effectively support learning, especially e-learning.

Our aim is to gain feedback from every possible web browsing action, transform this feedback into knowledge about the user, and store it in a user model in order to enrich the text of the user’s web page according to the user’s goals derived from the model. From all observable user actions we selected cursor movement during the browsing session and scrolling behaviour. We will evaluate our approach in the e-learning domain. The analysis of user modeling in adaptive e-learning web systems, behavior tracking on the web, and facilitating learning on the web poses many challenges.

Building a Domain Model using Linked Data Principles

Michal Holub
doctoral study, supervised by Mária Bieliková

Abstract. The Linked Data principles are used in many datasets published on the Web. The aim of our work is to use Linked Data to create models describing 1) the domain of software development, and 2) the domain of research in the field of software/web engineering. We use these models to represent the knowledge of IT professionals (analysts, programmers, testers), as well as the research interests of researchers in the respective fields.

We propose a method for the automatic construction of a concept map serving as the basis of our domain models. For this purpose we use unstructured data from the Web, which we transform into concepts and links between them. Using a concept map we describe the knowledge of software developers as a set of technologies and principles they are familiar with. We also use a similar concept map to describe the research areas, problems, principles, methods and models studied by researchers at our faculty.

The domain models we create can be used as a basis for two adaptive systems. The first aims at capturing IT professionals’ knowledge and skills, deducing further technologies they might know, and enabling users to search for a suitable candidate for a certain task or project. The second allows users to bookmark, annotate, and collaborate on research papers in digital libraries, as well as other Web documents. Here, we also use the model to answer queries in pseudo-natural language.

Evaluating Web Application Usability Through Gaze Tracking

Martin Janik
master study, supervised by Mária Bieliková

Abstract. Usability, also known as quality of use, is a feature that can fundamentally influence the success of an interactive application. Usability evaluation depends on the type of application and on the person who uses it. For web applications, we often do not know the set of users; at best we know the specific users of each end group, but they usually form an open and sometimes dynamically changing community. By studying implicit feedback gained from user-application interaction we can evaluate usability – for example, detect adverse behaviour, resolve it, and improve the use of the application.

Basic usability testing provides a sufficient amount of data to evaluate the design of an application. Gaze tracking adds a new aspect to usability evaluation: it reveals which objects attract attention and why. By following the order in which objects are gazed at, we can tell how users search through web applications, forming specific gaze patterns. In the book How to Conduct Eyetracking Studies, Jakob Nielsen and Kara Pernice claim that analyzing gaze-tracking data with heatmaps requires 30 users per heatmap, which makes gaze tracking one of the most expensive research methods.

We aim to create a method for usability testing in a specific domain of applications, such as e-learning systems or content management systems, based on implicit feedback, particularly gaze tracking. We want to investigate the possibility of generalizing usability tests across web applications from the same domain. Our goal is also to answer the question: “Is it possible to create a method that reduces the number of users needed for usability testing while preserving the value of the acquired data for evaluation?”

Group Recommendation for Smart TV

Ondrej Kaššák
master study, supervised by Michal Kompan

Abstract. The importance of personalized recommendation for reducing information overload is well known. But there are many activities, such as viewing multimedia content, which people do together, and the recommendation process needs to be adapted to this fact.

Existing group recommendation solutions, however, often focus on the recommendation step only, without building on well-processed user models. Yet group recommendation, like individual recommendation, is a process that works mainly with users. To achieve the highest recommendation quality it is therefore necessary to take into account a number of attributes that affect group decision-making in reality. Each person is unique and has different preferences and requirements: one selects content according to its author, while for another this information is irrelevant because they choose by genre. People in groups interact with one another while deciding what content to watch together. There are also social relations between them; for example, they are not indifferent to the satisfaction of people close to them. Similarly, each person asserts their opinion with a different strength.

When recommending relevant content, it is appropriate to include all these attributes while keeping the complexity of the whole process acceptable; we believe this maximizes recommendation quality while still providing results to the group in real time. In our work we propose a mixed hybrid method that combines collaborative recommendation with content-based recommendation of unknown, new items. Our method is built on weighted user and group models considering individual preferences for genres, keywords, actors and directors. It also works with social connections inside groups and with group members’ personal ability to assert their own opinions.
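To illustrate the aggregation step, here is a minimal sketch of two classic strategies (average and least misery) over per-member predicted ratings; the assertiveness weights are an assumption standing in for our richer group model:

```python
def average_strategy(ratings, weights):
    """ratings: {member: predicted rating}; weights: {member: assertiveness}."""
    total = sum(weights[m] for m in ratings)
    return sum(ratings[m] * weights[m] for m in ratings) / total

def least_misery_strategy(ratings):
    """The group is only as satisfied as its least satisfied member."""
    return min(ratings.values())

group = {"anna": 4.5, "boris": 2.0, "clara": 3.5}
weights = {"anna": 1.0, "boris": 1.5, "clara": 1.0}
print(average_strategy(group, weights))  # ~3.14, boris pulls the score down
print(least_misery_strategy(group))      # 2.0
```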

Knowledge Sharing by Means of Graph-based Diagrams on the Web

Terézia Kazičková
bachelor study, supervised by Ivan Srba

Abstract. Nowadays, getting access to information or sharing it with others is no longer a problem. The question is how to find relevant information and share knowledge with the right audience, and how to do so quickly and effectively. Web search engines alone cannot meet these requirements, since solving more complex problems requires not only algorithms and computer intelligence but also human intelligence and discussion.

This is why Community Question Answering (CQA) systems have gained popularity. CQA systems let users find answers by asking a community of users willing to share their knowledge. Most CQA systems enable users to vote for the most accurate answer, which increases the probability of finding a relevant one. Popular CQA systems include Stack Overflow, Yahoo! Answers and Quora, to name a few. Although CQA systems play a fundamental role in knowledge sharing, we believe their potential is not fully exploited yet, and we would like to provide a more effective alternative for knowledge sharing within them. Since CQA systems allow questions and answers to be written only as unstructured text, expressing ideas, as well as understanding the ideas of others, can be complicated and lead to misunderstandings. Using diagrams alongside the text could improve understanding, especially in engineering fields.

Our goal is to create a web application based on the functionality of current CQA systems, enriched with the possibility to create graphical representations. With the main focus on software engineering, our priority is graphical representation using UML diagrams. Additionally, we would like to implement real-time synchronization, making the application also a tool for collaborative project work. We aim to make it a helpful learning and knowledge-sharing tool for students at our university.

Analysis of Interactive Problem Solving

Peter Kiš
bachelor study, supervised by Jozef Tvarožek

Abstract. Applications full of interactive content, such as puzzles or logical riddles, are very attractive and engaging for users. While such applications are used, we are able to follow users’ actions, better understand their capabilities, and eventually modify or adapt the activities to their interests and needs.

This project aims to analyze the different ways users solve interactive tasks in games. We would like to understand which parts of a game make it attractive and why. There are many methods and techniques for gathering gameplay data, such as logging game events, tracking user gaze with an eye tracker, or taking physiological measurements such as EKG or EMG. By combining multiple methods, we can obtain enough information about what is happening in the game, what the user sees, and what emotions they are experiencing. Analyzing this information, we can likely build a model linking the player’s personality to the way they solve tasks in the game. As a result, this link can be used to create new games or interactive applications that are better balanced for every targeted player or user.

Keyword Map Visualization

Matej Kloska
bachelor study, supervised by Marián Šimko

Abstract. At present, visualization of data in the form of large-scale maps is becoming more and more common. Any type of entity can be described by a set of keywords, which can be readily interpreted as a large-scale graph. The problem is how to navigate such graphs naturally, with respect to data changing over time and the working depth of the graph. This is why we aim to help users navigate and create such maps in a more natural way.

In this work we analyze different methods for visualizing large-scale maps and possibilities for creating them in less time. The results of our analysis are applied in the design of a new method. A significant characteristic of our method is keeping track of keywords during navigation; secondly, we focus on simplifying the creation process. The main idea is an intuitive user interface with high efficiency and a low error rate.

The designed method will be evaluated using data from the adaptive learning system ALEF via live experiments. We will evaluate its impact through an experiment in which users are asked to create several maps with the old method and then with the new one. We expect the results of the new method to be measurably better in general and users to be more satisfied.

Group Recommendations for Adaptive Social Web-based Applications

Michal Kompan
doctoral study, supervised by Mária Bieliková

Abstract. Personalized recommendation is a natural component of today’s Web. Users’ social activity is increasing, and group recommendation is therefore researched more and more. Several of our daily activities are not individual but group-based, and aspects of group recommendation become more visible as a larger part of our lives moves to social networks. Group recommendation is not limited to recommending items to a group of users. Several types of groups can be observed – temporary or permanent, forced or natural – and in the context of the Web we can consider virtual groups as well.

This brings us to the assumption that principles of group recommendation can be used in standard single-user recommendation tasks. Firstly, the aggregation strategies used for group preference aggregation can be applied to multicriteria problems. Similarly, groups can be useful when a new user interacts with the system, i.e., the cold-start problem. The task of group recommendation can thus be extended to standard single-user recommendation as well. Aggregating single-user profiles into one group profile combines user preferences and, in some settings, can introduce variety, which is interesting from the point of view of improving recommendations.

Software Metrics Based on Developer’s Activity and Context of Software Development

Martin Konôpka
master study, supervised by Mária Bieliková

Abstract. Monitoring and evaluating software development is important for managing it and reaching the desired goals, which has motivated the proposal of various software metrics. Code reviewers and managers traditionally use source-code metrics to identify code complexity, code smells, or other problematic places in the source code, or they find them manually by inspection. However, codebases of software projects tend to be large, which makes identifying places for inspection tedious and difficult. This opens up space for using information about developers’ activity to make identifying problematic places more efficient.

It is developers who create source code, so there should be a connection between the attributes of the software and the activity of developers together with its surrounding context. Empowering developers to annotate the code with semantic information gives us an even more valuable source of information about the code without costly source-code analysis. Both kinds of information, automatically collected activity and manually entered annotations, are abstracted into information tags anchored to the source code. Using just the information tags rather than the original raw data reduces the dimensionality of the information space for our method.

In our work we propose a method for discovering problematic places in source code to be reviewed by a code reviewer, using visualization of developers’ activity and context described with information tags, together with the ability to filter and search the space of available tags. Such a tool creates space for analyzing and summarizing developers’ activity, as well as for defining sequential patterns of activities that take on the role of software metrics. We believe our method can help bridge the gap between cause and consequence, i.e., between developers’ activity and context and the resulting attributes of the produced software.

Tomáš Kramár
doctoral study, supervised by Mária Bieliková

Abstract. It has been recognized that Web search needs some form of implicitly acquired information that would help to understand the underlying intent. This information is collectively referred to as the search context. Traditionally, in Web search the context is any information that can be used to infer the specific goal the searcher wants to fulfil by issuing a query. The concept of context per se is decoupled from its representation; one example of a representation is a vector of weighted terms indicating the topics the user is interested in. The essential aspect of a search context is that it reveals information about the underlying search goal, which can be used in the ranking phase of the search process to rank documents matching the goal higher.

We identify several sources of context: temporal context in the form of behavioral search patterns, activity-based context in the form of past queries, and social context in the form of user similarity. We analyze behavioral search patterns as a possible source of context and show that they can be used to build a temporal context. Past queries can be used to build an extremely short-term activity-based context; we introduce a method for finding related queries based on document metadata and implicit feedback and show that it outperforms existing methods. Finally, we analyze similarity between users as another source of context and show how such a model outperforms a mainstream search engine.

Context-based Improvement of Search Results in Programming Domain

Jakub Kříž
master study, supervised by Tomáš Kramár

Abstract. When programming, the programmer does many things besides writing the actual source code: he opens various applications and other source files, searches them, writes notes and, last but not least, searches the Internet for solutions to problems or errors he has encountered. He does all these tasks in a context, usually in order to solve a problem. By identifying this context we can understand the meaning behind the programmer’s actions, and this understanding can be used to make Internet search results more accurate and relevant to the current task.

In this work we analyze existing methods for mining context in other domains and their applicability to the programming domain. An important source of contextual information is the source code the programmer is currently working on, so we analyze methods for extracting metadata from source-code files, software projects, and other documents in general. Based on this analysis we design methods for mining the context of the programming domain from the source code, in order to create a context model that we then use to improve search results. The designed methods are evaluated experimentally using historical logs and data, and via live experiments.

Modeling Programmer’s Expertise in Software House Environment

Eduard Kuric
doctoral study, supervised by Mária Bieliková

Abstract. Evaluating the expertise of programmers is critical in software engineering, in particular for effective code reuse. In a software company, the technical and expert knowledge (experience) of employees is usually not represented in a unified manner, and it is difficult to measure or observe directly. The level of a programmer’s expertise is problematic to determine automatically: to establish a programmer’s “real experience” with a technology (library) exactly, we would need the programmer to take a test. However, it is often difficult to motivate programmers to take tests, and people have different standards for judging the degree of expertise. Therefore, we base our approach on automatic estimation of relative experience, comparing programmers with each other within the company.

Our new idea is to establish a programmer’s karma automatically, based on monitoring his working activities while coding in the integrated development environment (IDE) and on analyzing and evaluating the resulting code he creates and commits to the local repository. By applying our approach we are able to observe and evaluate different indicators. For example, we can spot a programmer who often copies and pastes code from an external source (the Web). The contribution of such a programmer can then be weighed relative to the project; moreover, it can reveal the reason for frequent mistakes or for low productivity in comparison with other programmers.

Tabbed Browsing Behaviour in Web Based Systems

Martin Labaj
doctoral study, supervised by Mária Bieliková

Abstract. In our research, we explore the possibilities of employing tabbed browsing (also called parallel browsing or tabbing) in adaptive web-based systems. Adaptive systems rely on several types of information about users, about the content presented to them, etc., including user and domain models. Implicit user actions expressed during tabbed browsing could be used to improve such models.

Tabbing is nowadays regarded as a more representative model of browsing actions than previous models, which considered visits to resources in a linear fashion and disregarded the possibility of a user having multiple pages open at once and switching between them. Users browsing the Web do browse in tabs, and they do so for various reasons in different scenarios: to keep a page open as a reminder to do or read something later, to find additional information about a topic on a given page, etc. Treating a user’s tabs as indicators of the user’s current and future interests can help improve the user model for better personalization. Users moving through the pages also express relations between the pages (and therefore between their content), possibly aiding domain model inference.

Parallel browsing behaviour, however, cannot be reliably inferred from typical server logs. It can be observed with the aid of client-side scripts embedded within web pages (observing all users of a single web application) or with a browser extension (observing tabbing across all web applications visited in the augmented browser, but only within the smaller group of users who have the extension installed). We propose a browser extension, Tabber, which allows users to view and analyse their usage of browser tabs and whose data can serve as a research dataset.

Acquisition of Learning Object Metadata Using Crowdsourcing

Marek Láni
master study, supervised by Jakub Šimko

Abstract. In recent years, the Web has come to be used largely for educational purposes. There are many technology-enhanced learning (TEL) and question answering portals and web sites used to gain knowledge and information. Often, not only do users benefit from these systems, but the systems also benefit from their users – a win-win relationship – because their content is often crowdsourced, i.e., generated by the users themselves. But as there is no guarantee that the people creating this content are experts in the given area, it is necessary to filter it, and this filtering should be automated because of how long it would otherwise take. Several approaches exist; the aim of our work is to take, combine and modify some of them to achieve satisfactory results in filtering answers to questions within a TEL system.

In our approach we focus on collecting user-created evaluations of answers to questions. Our aim is to use advanced methods of interpreting the crowd’s evaluations and to determine whether the crowd is capable of evaluating answers similarly to an expert. The interpretation methods we plan to use include detecting and filtering outliers, and determining the weight of an evaluation based on the distribution of the individual user’s evaluations. To collect a sufficient amount of data we are developing our own system, which includes some key features of common CQA systems. We will evaluate this work by comparing our interpreted values with the evaluation assigned to each answer by an expert.
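A minimal sketch of the outlier-filtering step under simplifying assumptions (a z-score rule; the threshold of two standard deviations is illustrative, not our final choice):

```python
from statistics import mean, pstdev

def interpret(ratings, z_threshold=2.0):
    """Drop ratings further than z_threshold std devs from the mean."""
    m, s = mean(ratings), pstdev(ratings)
    if s == 0:
        return m
    kept = [r for r in ratings if abs(r - m) / s <= z_threshold]
    return mean(kept)

print(interpret([4, 5, 4, 1, 5, 4]))  # the stray 1 is filtered out -> 4.4
```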

Supporting Query Formulation

Adam Lieskovský
bachelor study, supervised by Róbert Móro

Abstract. Digital libraries offer many different ways to form a query and search for articles. The most common is keyword search, which has been popular among users for some time, and, in contrast to the past, the majority of users can now use it efficiently. However, when searching for documents in a domain we do not know well, keyword search may lead to unsatisfying results; moreover, users can become confused because they are not aware of the right terms with which to construct their queries.

One underexploited method that eliminates the need for explicit query formulation (although the user still needs sufficient knowledge to make the initial query) is query by example (QBE), which has proved successful in content-based image retrieval and the multimedia domain as a whole.

We would like to propose a QBE method based on metadata similarity using relevance feedback, whether implicit or explicit, positive or negative. This way, even novice or inexperienced users can interactively select relevant articles and improve their initial query results with each iteration. We plan to evaluate our approach in the bookmarking system Annota.
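As an illustration of relevance feedback over article metadata, a minimal sketch under simplifying assumptions: articles are treated as bags of metadata terms, and positive/negative feedback shifts a query profile used to rank candidates (the term sets and rates here are hypothetical):

```python
from collections import Counter

def update_profile(profile, article_terms, positive=True, rate=1.0):
    """Shift the query profile toward (or away from) an article's metadata."""
    for term in article_terms:
        profile[term] += rate if positive else -rate

def score(profile, article_terms):
    """Rank candidates by the summed profile weight of their metadata terms."""
    return sum(profile[t] for t in article_terms)

profile = Counter()
update_profile(profile, {"recommendation", "digital-library"})  # marked relevant
update_profile(profile, {"image-retrieval"}, positive=False)    # marked irrelevant
print(score(profile, {"recommendation", "metadata"}))           # 1.0
```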

Researcher Modeling in Personalized Digital Library

Martin Lipták
master study, supervised by Mária Bieliková

Abstract. Researchers use digital libraries either to find solutions to particular problems in their current research or just to keep track of the newest trends in areas of interest. However, the amount of information in digital libraries grows exponentially, which has two serious consequences: many interesting works go unnoticed, and researchers spend too much time reading articles that turn out to be low-quality or unrelated to their current research or other interests. Such problems are nowadays addressed with recommendation systems, or more effectively with personalized recommendation systems, and the core of every personalized system is its user model.

Our aim is to design and implement a user model based on data from Annota. The model will leverage the articles the user has read, the tags and folders she has used, the terms she has searched for, etc. Furthermore, user data from the Mendeley library organization service will be integrated. A personalized article recommendation service for Annota will probably be used for evaluation, though we are also considering other options such as personalized search results or a personalized article recommendation service for Mendeley. Based on the available user data and evaluation options, we will seek a suitable representation and creation process for a researcher (user) model in the domain of digital libraries.

Recommendation in Adaptive Learning System

Viktória Lovasová
bachelor study, supervised by Martin Labaj

Abstract. Recommender systems are an important part of educational systems, where they help students, for example, decide what to learn next. The most common recommendation methods are collaborative, content-based and hybrid. Collaborative filtering predicts what users will like based on their similarity to other users; content-based filtering recommends items similar to those the user liked in the past; hybrid recommenders combine content-based and collaborative filtering or other methods. These methods use explicit and implicit feedback. With explicit rating, users express their opinion in various ways – stars, thumbs up/down, a scale from 1 to 10, and others. With implicit feedback, users express their opinion without even being aware of it: their actions are monitored – where they click, what they buy, what they browse, and so on.

Parallel browsing may represent a type of implicit feedback, covering actions such as switching between websites, time spent on a site, and opening links in new tabs or reusing the current one. In our research, we take these indicators into account in recommendation.

We propose a recommendation method based on parallel browsing in the adaptive learning system ALEF, recommending learning objects based on student activity. We will evaluate the method’s accuracy through an experiment comparing students who obtain recommendations based on their parallel browsing against students with recommendations from a standard recommender system. The students with parallel-browsing-based recommendations should achieve better learning results.

Innovative Application for the International Competition

Filip Mikle, Matej Minarik, Juraj Slavíček, Martin Tamajka

bachelor study, supervised by Jakub Šimko

Abstract. Imagine Cup is a prestigious worldwide competition organised by Microsoft, and this year we would like to be part of it. During the last few months we have worked our way through many different ideas. Unfortunately, many of them, as we discovered, had already been realised in previous Imagine Cup competitions or by companies and individuals.

Currently we are considering two ideas for further investigation. The first is based on OCR: we would like to recognize quantities, prices and products from users’ receipts (bills). Users will gain insight into their spending, and we could recommend new products, new stores and the best offers. How do we plan to achieve this? Our users will do it for us: based on the data from their receipts, we will be able to tell where to buy the cheapest chicken for a tasty Sunday lunch. In a later phase, we would like to get data from stores, too.

The second is an innovative way to go beyond the borders of the real world. The main idea rests on a simple assumption: the user does not just want to see virtual reality on a screen in front of his eyes – he wants to be an active part of it. Some solutions exist today, but (although some of them are really exciting) none lets you get genuinely tired carrying rocks, or walk around a fantasy world without all the odd joysticks, controllers, wires, etc. We want to let the user LIVE in the virtual world as if he lived in the real one. We would like to create a head-mounted display with gesture recognition to provide unprecedented immersion in virtual reality. With such hardware, you could walk through the Louvre, dive to the wreck of the Titanic, or hang out with your friends inside your favourite game environment.

Automated Search Goal Identification

Samuel Molnár
master study, supervised by Tomáš Kramár

Abstract. Automatic search goal identification is an important feature of a personalized search engine. Knowing the search goal and all the queries supporting it helps the engine understand a query and adjust the ordering of relevant web pages or other documents according to the current information need. To improve goal identification, the engine uses additional factors from the user’s search context and combines them with different relevance weights. However, most of the factors used for goal identification involve only lexical analysis of the user’s queries and time windows represented as short periods of user inactivity.

In our work, we focus on utilizing semantic relationships between search results and the user’s query. By applying existing approaches to semantic text analysis on the content of individual search results, we propose a better factor for determining coherence between queries. The semantic analysis of search results also identifies queries that deviate from the user’s current goal. We further improve the context factors with implicit feedback on the ranked search results, since the relevance of a search result to the user can be measured by the time she spent browsing its content or by the position of selected results in the list. We plan to integrate our model of weighted factors into existing search engines or servers such as Elasticsearch.
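A minimal sketch of how two such implicit signals, dwell time and result position, could be combined into a single relevance weight; the weights and normalization constants are illustrative assumptions:

```python
import math

def implicit_relevance(dwell_seconds, position, w_dwell=0.7, w_pos=0.3):
    dwell_score = min(dwell_seconds / 60.0, 1.0)  # saturate at one minute
    pos_score = 1.0 / math.log2(position + 1)     # discount lower-ranked results
    return w_dwell * dwell_score + w_pos * pos_score

print(implicit_relevance(dwell_seconds=45, position=1))  # 0.7*0.75 + 0.3*1.0 = 0.825
```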

Exploratory Navigation Based on Automatic Text Summarization

Róbert Móro
doctoral study, supervised by Mária Bieliková

Abstract. Considering the sheer amount of information available, searching for relevant information and navigating the information space of the Web, or more specifically that of a digital library, can be a challenging task even for a seasoned researcher, and more so for novices, such as starting master’s or doctoral students. They can have a hard time formulating keyword queries, because they lack (at least in the beginning) the needed domain overview and knowledge. Moreover, when researching a new domain, their goal is not to find specific facts but to learn about the problem or the given domain and investigate the topics and existing approaches as well as the gaps in the current state of knowledge. Their task is therefore exploratory in essence.

To support users’ exploration of the domain, we provide navigation support based on automatic text summarization. The summaries consist of sentences conveying the most important information in a document; they therefore play a crucial role in search systems and during navigation sessions, speeding the whole process up and diminishing the overall information load.

We focus on the problem of identifying information artifacts (leads, i.e., keywords) in the summaries. In our proposed approach, users can choose their own leads to filter the search results. In addition, we recommend potentially useful leads based on the users’ current context, considering other aspects such as the novelty and diversity of the leads. We evaluate our approach in the domain of digital libraries of research articles, on the scenario of a novice researcher using the bookmarking system Annota.

The Analysis of Social Networks

Tomáš Patoprstý
bachelor study, supervised by Michal Kompan

Abstract. The identification of various groups within social networks is an interesting research area. In our work we focus on the analysis of relationships and interconnections of users in social networks. Nowadays, using a social network is an essential part of life, whether it is Facebook, Twitter, LinkedIn or another online social network (OSN). Human beings naturally seek integration into a group, political entity or religion.

The Web has proved itself an effective source of information and a powerful communication tool in modern society. In the age of Web 1.0, the main aim was sharing documents, web pages and devices. Nowadays, in the age of Web 2.0, it is more than that: it is the connection of people, organizations, thoughts and concepts. The production and publication of content is in the hands of users, matched and interconnected on the Internet according to their interests.

In our work we analyze the network as a graph, based on automatic identification of the groups in which a user is a member. It will be a friendship graph for a social network environment, e.g., Facebook, built using existing algorithms for detecting groups in social networks. The method will then be tested in a software simulation processing real data of Facebook users.

Extracting Word Collocations from Textual Corpora

Martin Plank
master study, supervised by Marián Šimko

Abstract. Natural language is the main means of communication between people: they use it to ask and answer questions, express opinions and beliefs, and talk about events, and they communicate in natural language on the Web, too. However, the ease of creating Web content is both an advantage and a disadvantage of the Web. Content expressed in natural language is usually unorganized and unstructured, which makes processing it difficult.

Difficulties in natural language processing are often connected with the ambiguity of language. Some words have a specific meaning when used together in one sentence, which raises the problem of collocation extraction. Various methods allow us to identify collocations automatically: some are association measures based on word co-occurrences; others are based on linguistic properties of collocations. We focus on the property called limited modifiability, which means that collocations cannot be supplemented with additional lexical material (for example, the noun in ‘to kick the bucket’ cannot be modified as in ‘to kick the {holey/plastic/water} bucket’).

We introduce a method that compares the frequencies of n-grams to find out whether an n-gram is a collocation. For example, the frequencies of ‘to kick the bucket’ and ‘plastic bucket’ are compared with that of ‘to kick the plastic bucket’; if the last frequency is many times smaller, it is probable that the word combination ‘to kick the bucket’ is a collocation.
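A minimal sketch of this frequency comparison, with made-up corpus counts and an illustrative ratio threshold:

```python
def is_collocation(freq_ngram, freq_modified, ratio=10.0):
    """freq_ngram: corpus count of e.g. 'kick the bucket';
    freq_modified: count of the lexically extended variant,
    e.g. 'kick the plastic bucket'."""
    if freq_modified == 0:
        return True
    return freq_ngram / freq_modified >= ratio

print(is_collocation(freq_ngram=500, freq_modified=2))    # True: likely a collocation
print(is_collocation(freq_ngram=500, freq_modified=120))  # False: freely modifiable
```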

Evaluation of Code Quality and Programmer’s Knowledge Determination

Jana Podlucká
bachelor study, supervised by Dušan Zeleník

Abstract. In software projects, we put effort into avoiding the various errors a programmer can embed in code. These are not syntactic errors, which can be discovered easily with a debugger, but errors of a logical character. The aim of this bachelor thesis is to propose a method for discovering the parts of software that could contain such errors, based on examining the programmer’s own context.

For the purposes of this research, we use a data sample created by observing programmers working on multiple large-scale projects. We utilize records of PC usage and IDE usage, as well as records of keystrokes. From these records we can estimate interruptions or breaks in the programmer’s work. Such interruptions interest us because, afterwards, it takes some time to fully refocus on the interrupted task – and this is one possible cause of the unintentional introduction of errors into code. There are two types of interruption: external, caused by the environment (for example, a phone ringing); and internal, resulting from the psychological need for a break. Interruptions may also differ in importance. Our goal is to examine the effects of internal and external interruptions on task switching.
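A minimal sketch of the break estimation on made-up keystroke timestamps; the 120-second gap threshold is an illustrative assumption:

```python
def find_interruptions(timestamps, gap_threshold=120.0):
    """Return (last keystroke, next keystroke) pairs spanning a long gap."""
    return [
        (a, b)
        for a, b in zip(timestamps, timestamps[1:])
        if b - a > gap_threshold
    ]

keystrokes = [0.0, 1.2, 2.5, 300.0, 301.1, 900.0]  # seconds since session start
print(find_interruptions(keystrokes))  # [(2.5, 300.0), (301.1, 900.0)]
```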

Enriching the Linked Open Data Graph with New Relations

Ondrej Proksa
master study, supervised by Michal Holub

Abstract. Currently, millions of webpages are being created on the World Wide Web, most of them published in unstructured form. Linked Data are structured data, available through the Web, containing entities and the relationships among them. Some datasets are created by automatically processing publicly available data and have various uses in webpage personalization, search, or deducing new knowledge. When new datasets are created, their entities are usually connected to widely known datasets, but further connections are lacking. One of the main problems is thus detecting and creating relations among existing datasets.

The aim of this work is to analyze the creation of new relations and to propose a method that enriches the LOD (Linked Open Data) graph with new relations among existing datasets. LOD is a big graph of entities and relationships; the main aim of our method is to determine the similarity of two vertices (entities) in this graph. If two vertices are sufficiently similar, the entities they represent are connected by an owl:sameAs relationship. The similarity of graph nodes is based on the similarity of their properties and the similarity of their relationships.
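A minimal sketch of this node similarity under simplifying assumptions: Jaccard overlap of property sets combined with Jaccard overlap of linked neighbours, with an illustrative equal weighting and threshold:

```python
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def same_as_candidate(props1, props2, links1, links2, threshold=0.8):
    """Combine property overlap and neighbour overlap into one similarity."""
    similarity = 0.5 * jaccard(props1, props2) + 0.5 * jaccard(links1, links2)
    return similarity >= threshold, similarity

match, sim = same_as_candidate(
    props1={("name", "Bratislava"), ("country", "SK")},
    props2={("name", "Bratislava"), ("country", "SK"), ("pop", "413000")},
    links1={"dbpedia:Slovakia"},
    links2={"dbpedia:Slovakia"},
)
print(match, round(sim, 2))  # True 0.83 -> propose owl:sameAs
```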

The proposed method will be tested experimentally on multiple use cases, because it must be sufficiently generic yet functional on the LOD graph. We therefore plan to evaluate it on existing LOD datasets, GOV data, and the DBLP Computer Science Bibliography.

Automatic Web Content Enrichment Using Parallel Web Browsing

Michal Račko
master study, supervised by Martin Labaj

Abstract. Creating links between resources on the Internet is now a pressing problem due to the large amount of diverse content it contains. In the past this content could be created only by page authors, but now, in the Web 2.0 era, users can add content themselves. This is the main cause of the poor structure of such content and of the weak or missing links between similar resources.

Creating a clear and sustainable long-term structure of sites is important for easier navigation when browsing or searching the Internet. The metadata and semantics of each page allow more accurate search, letting search engines build a complete picture of a site’s area. Based on the information content, it is also possible to create links between similar resources without physically connecting those sites.

The aim of our project is to propose and test a method that enriches the domain model of web systems with the ability to link external resources to current adaptive systems. The proposed method respects the user’s privacy while creating implicit links between resources on the basis of behavioral models that take parallel web browsing into account. The method considers that the user uses browser tabs and tries to accommodate his habits when working with the Internet.

Employing Information Tags in Software Development

Karol Rástočný
doctoral study, supervised by Mária Bieliková

Abstract. Management of the software development process is a crucial part of software engineering, on which the success of software projects depends. This management relies mostly on the quality and freshness of software metrics and on analyses over these metrics. Software metrics can be based on source code or on empirical data about software developers. Code-based metrics are well known, and many approaches based on them have been proposed. Empirical software metrics, however, remain an uncovered part of software engineering, even though they contain important information about the development process and can be used, e.g., for forecasting significant trends, much like empirical data (e.g., implicit user feedback) in web engineering. The reason is that collecting empirical data is time-expensive and error-prone. We have proposed a solution to these problems based on collecting, storing and maintaining developer-oriented empirical data abstracted into information tags, and on empirical software metrics built over them.

We are now working on proposing and evaluating methods for automatic generation of information tags from a stream of events and for automatic maintenance of information tags. As their core we proposed an information tag generator, which queries the stream of events in RDF format and executes tagging rules after the queries evaluate successfully. The tagging rules can be defined manually or learned automatically by analyzing modifications in the information tag space.
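A much-simplified sketch of one tagging rule: the real generator queries RDF event streams, whereas here events are plain dictionaries and the rule and its threshold are illustrative assumptions:

```python
def copy_paste_rule(events):
    """Tag a file once several paste events from an external source occur."""
    pastes = [e for e in events if e["type"] == "paste" and e["external"]]
    if len(pastes) >= 3:
        return {"tag": "possible-copied-code", "anchor": pastes[0]["file"]}
    return None

events = [
    {"type": "paste", "external": True, "file": "Parser.cs"},
    {"type": "edit", "external": False, "file": "Parser.cs"},
    {"type": "paste", "external": True, "file": "Parser.cs"},
    {"type": "paste", "external": True, "file": "Parser.cs"},
]
print(copy_paste_rule(events))  # {'tag': 'possible-copied-code', ...}
```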

Learning User Interests from Browsed Web Content

sajgalikMarius Šajgalík
doctoral study, supervised by Mária Bieliková

Abstract. As the Web evolves, we are overflowed with huge amounts of information. Users interact with the Web more and more, which in turn creates additional data to be processed. As data gets bigger, it can speak for itself and uncover more hidden information. If we consider the data generated inside the web browser, we have access to information across all the web applications and web sites the user browses and interacts with. This has the potential to contain valuable information about the user that could not be discovered if we were limited to the data from a single server-side web application. Being on the client – within the web browser – enables us to monitor all the user’s activity while preserving the user’s privacy by performing all the computations locally inside the web browser.

In our work, we focus on analysing the data generated by the user’s interaction with the Web, such as her web browsing history and her activity within the web browser, to model user interests. We analyse these data mainly from the natural language processing perspective: we process the text content of the web pages the user has been browsing and enrich it with additional metadata available within the web browser (like web page title, visit time, etc.) and with connections between web pages in the web browsing history tree. We focus on identifying user interests in the form of simple keywords. Recently, we have started to research the use of distributed (vector) representations of words for this purpose.
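
The following is a minimal sketch of keyword-based interest extraction from browsed pages, assuming a simple TF-IDF weighting with extra weight for title terms. The tokenisation, stop-word list and weights are illustrative only; the actual work additionally explores distributed word representations.

    # A minimal sketch of interest keywords from a browsing history.
    import math, re
    from collections import Counter

    pages = [  # (title, text) of pages from the browsing history
        ("Deep learning intro", "neural networks learn representations of data"),
        ("Neural networks", "training deep neural networks with backpropagation"),
    ]
    STOP = {"of", "with", "the", "a"}

    def tokens(text):
        return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]

    tf = Counter()                     # term frequency across the history
    for title, text in pages:
        tf.update(tokens(title))
        tf.update(tokens(title))       # title terms counted twice (metadata)
        tf.update(tokens(text))

    df = Counter()                     # document frequency for IDF
    for title, text in pages:
        df.update(set(tokens(title + " " + text)))

    scores = {w: n * math.log(1 + len(pages) / df[w]) for w, n in tf.items()}
    print(sorted(scores, key=scores.get, reverse=True)[:5])  # top interests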

Personalized Search in Source Code

samelaRichard Sámela
master study, supervised by Eduard Kuric

Abstract. Programmers always try to solve their development problems as easily and quickly as they can. Most of them use the Internet or source code repositories to find the right solution. There are a lot of examples, tutorials and other sources from which a programmer can obtain fragments of source code. The most efficient solution is to reuse existing source code instead of creating new code. The problem, however, is to find the source code that best fits the development problem at hand.

We will analyze options for recommending source code. This could be done by creating the programmer’s user model, based on implicit and explicit feedback. Implicit feedback contains information about the programmer, the source code fragments the programmer implemented, and the technologies the programmer used in a project. Explicit feedback contains information added manually. From this, we will be able to recalculate a knowledge score for every programmer. The knowledge score will be calculated from the programmer’s user model and will be useful for personalized recommendation of source code.
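
A minimal sketch of such a knowledge score is given below. The feature names and weights are hypothetical; the thesis may combine the user model’s evidence differently.

    # A minimal sketch of a per-technology knowledge score from a user model.
    # Feature names and weights are illustrative assumptions.

    def knowledge_score(model, weights=None):
        """Weighted sum of normalised user-model features."""
        weights = weights or {"fragments_written": 0.5,
                              "projects_used_in": 0.3,
                              "explicit_rating": 0.2}
        return sum(weights[k] * model.get(k, 0.0) for k in weights)

    # implicit + explicit feedback for one programmer and one technology (Java)
    java_model = {"fragments_written": 0.8,   # normalised to [0, 1]
                  "projects_used_in": 0.6,
                  "explicit_rating": 0.9}
    print(f"knowledge score (Java): {knowledge_score(java_model):.2f}")  # 0.76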

Privacy Preserving Data Collection for Data Analysis Applications

JakubSevcech_fotoJakub Ševcech
doctoral study, supervised by Mária Bieliková

Abstract. Many applications we use daily collect various kinds of information about user activity. These logs of user activity are a very important source of information for improving application quality, for user interface evaluation, for marketing and advertising purposes, but also for research purposes such as data mining, machine learning, recommendation, personalization, research method evaluation and so on. The main sources of user activity logs are the user’s search activity, social activity, explicit feedback provided through various forms and surveys, and implicit feedback in the form of visit duration, scroll activity, text selection and many more. Commonly, the collected data are published in the form of APIs to ease service interconnection, or in the form of datasets to support research activities.

When collecting this information, it is common to care very little about the user’s privacy or about communicating to users that our application is collecting information about their activity. In most cases it is very hard to find out what information is being collected and how it is stored and processed. In our work we are interested in means of collecting user activity logs while respecting users’ privacy. To support anonymization and privacy preservation, we use methods such as randomization, noise introduction, aggregation and detail removal. We study methods for privacy preserving data collection and processing for various types of data, such as numerical values, text data, sparse matrix data, graph or transactional data, while conserving the data’s usability for common data mining applications.
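
As one concrete example of the noise-introduction technique mentioned above, the sketch below perturbs a numeric log value with Laplace noise before it is stored, in the spirit of differential privacy. The sensitivity and epsilon values are illustrative assumptions.

    # A minimal sketch of noise introduction for a numeric activity log value.
    import random

    def privatize(value, sensitivity=1.0, epsilon=0.5):
        """Add Laplace(0, sensitivity/epsilon) noise to a numeric value.

        The difference of two i.i.d. exponential variables is
        Laplace-distributed, which avoids edge cases of inverse-CDF sampling.
        """
        scale = sensitivity / epsilon
        noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
        return value + noise

    true_visit_duration = 42.0             # seconds spent on a page
    print(privatize(true_visit_duration))  # noisy value actually logged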

Processing and Comparing Data Streams Using Machine Learning

simekMiroslav Šimek
master study, supervised by Michal Barla

Abstract. Among the many approaches to machine learning, multilayered self-teaching neural networks (also known as Deep Belief Networks) using unsupervised learning are gaining popularity nowadays. For almost 40 years they were not accepted and were largely ignored by most experts in the machine learning community, one reason being simply the insufficient computational power of the technology available at the time. Today, however, they already produce interesting results, for example in computer vision.

Neural networks with one hidden layer use this layer to find patterns and features in the input layer. These features are much more useful for deciding what the output should look like than the raw input data alone. Multilayered neural networks take this approach to higher levels of abstraction: the first hidden layer finds features in the input layer, the second hidden layer finds patterns and features of the features in the first hidden layer, and so on. This approach is also somewhat closer to how our brain works, with multiple levels of abstraction. The problem with multilayered neural networks is that the otherwise very powerful backpropagation algorithm used in supervised learning does not work well here, as it loses power with every layer. This is where unsupervised learning becomes useful: it pre-trains the hidden layers one by one, each layer learning the patterns and features of the layer underneath. After this stage of unsupervised learning, the backpropagation algorithm is once again useful to fine-tune the model.
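
The sketch below illustrates this greedy layer-wise scheme, pre-training each layer as a tiny tied-weights autoencoder instead of an RBM. Layer sizes, the learning rate and epoch counts are illustrative assumptions.

    # A minimal sketch of greedy layer-wise pre-training; supervised
    # backpropagation would follow to fine-tune the stacked weights.
    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    def pretrain_layer(data, n_hidden, lr=0.1, epochs=200):
        """Pre-train one layer as a tied-weights autoencoder."""
        n_visible = data.shape[1]
        W = rng.normal(0.0, 0.1, (n_visible, n_hidden))
        b, c = np.zeros(n_hidden), np.zeros(n_visible)
        for _ in range(epochs):
            h = sigmoid(data @ W + b)        # encode
            r = sigmoid(h @ W.T + c)         # decode with tied weights
            err = r - data                   # output gradient (cross-entropy)
            dh = (err @ W) * h * (1.0 - h)   # backprop one step into encoder
            W -= lr * (data.T @ dh + err.T @ h) / len(data)
            b -= lr * dh.mean(axis=0)
            c -= lr * err.mean(axis=0)
        return W, b, sigmoid(data @ W + b)

    X = rng.random((100, 16))                # unlabeled input data
    layers, codes = [], X
    for n_hidden in (8, 4):                  # stack two hidden layers greedily
        W, b, codes = pretrain_layer(codes, n_hidden)
        layers.append((W, b))
    print([w.shape for w, _ in layers])      # [(16, 8), (8, 4)]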

Our goal is to find methods and new ways of training that utilize the potential of multilayered neural networks and unsupervised learning to process and compare large streams of unlabeled data, such as data from an eye tracker or sound recordings.

Adaptive Support for Collaborative Knowledge Sharing

srbaIvan Srba
doctoral study, supervised by Mária Bieliková

Abstract. Nowadays, it is possible to access almost unlimited sources of information thanks to ubiquitous information and communication technologies. However, sometimes it is difficult to find the required information with standard web search engines. In these situations, Internet users have the possibility to ask their questions in popular Community Question Answering (CQA) systems such as Yahoo! Answers or Stack Overflow. We are interested in providing a similar opportunity also to users in an intra-organizational context, and more specifically to students in an educational environment.

On the basis of analyses of existing approaches in standard CQA systems, our research concentrates on the open problem of how to adapt these approaches to match organizational specifics. In particular, we focus on question routing, which is probably the most important part of the proposed educational CQA system. Question routing refers to recommending potential answerers who are most likely to provide an appropriate answer to a newly posted question. We propose a method for question routing built on existing question routing methods while taking the specifics of the intra-organizational educational domain into consideration.
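
A minimal sketch of the question routing step is shown below: candidate answerers are scored by the overlap between the question’s terms and their answering profiles, weighted by course activity. The profile structure and the activity signal are illustrative assumptions about the educational setting.

    # A minimal sketch of question routing by profile overlap and activity.

    def route_question(question_terms, profiles, top_n=2):
        """Rank candidate answerers for a new question."""
        scores = {}
        for user, profile in profiles.items():
            overlap = len(question_terms & profile["terms"])
            # weight expertise overlap by recent activity in the course
            scores[user] = overlap * profile["activity"]
        return sorted(scores, key=scores.get, reverse=True)[:top_n]

    profiles = {
        "alice": {"terms": {"recursion", "java", "stack"}, "activity": 0.9},
        "bob":   {"terms": {"sql", "java"},                "activity": 0.4},
        "carol": {"terms": {"recursion", "python"},        "activity": 0.7},
    }
    print(route_question({"recursion", "java"}, profiles))  # ['alice', 'carol']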

We plan to evaluate the proposed method by employing a prototype of an educational CQA system. We will start with an experiment with a limited number of students. Afterwards, in a second, long-term experiment, we plan to involve a wider group of students, moving to a faculty-wide environment with the possibility to ask questions related to various topics across several courses.

Browsing Information Tags Space

stenovaAndrea Šteňová
master study, supervised by Karol Rástočný

Abstract. Software systems create various types of metadata that describe specific parts of structured content, provide information about what the data are, or give us additional information about the user who created them and how they were created. Metadata can also allow a better understanding of the data and show how the data change over time.

One type of metadata is information tags, which contain structured information associated with a particular piece of content. To enable their analysis and to ensure their readability and understandability, we will support the user’s navigation through this huge information space with the help of visualization.

In our work we would like to propose a method that helps users browse the information tags space. We want to focus on information tags connected to source code and support their browsing. Therefore, we will create a map over the source code which displays the associated information tags and visualizes their different values. The user will be able to zoom from the project level down to the method level and display different levels of information tags without losing context. Using a facet browser, we will allow the user to create queries and support exploratory search over project metadata. We plan to verify our solution in the domain of the project PerConIK, using existing source code and its information tags.
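
The facet-based narrowing could look like the following minimal sketch; the tag records and facet names (tag type, code level) are illustrative assumptions about the PerConIK tag space.

    # A minimal sketch of faceted filtering over information tags.

    tags = [
        {"type": "code-smell", "level": "method", "anchor": "Parser.parse"},
        {"type": "authorship", "level": "class",  "anchor": "Parser"},
        {"type": "code-smell", "level": "class",  "anchor": "Lexer"},
    ]

    def facet_filter(tags, **facets):
        """Keep tags whose fields match every selected facet value."""
        return [t for t in tags
                if all(t.get(k) == v for k, v in facets.items())]

    # the user narrows the map to code smells on the method level
    print(facet_filter(tags, type="code-smell", level="method"))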

Implicit Feedback-based Discovery of Student Interests and Learning Objects Attributes

strbakovaVeronika Štrbáková
master study, supervised by Mária Bieliková

Abstract. At present, search and recommendation on the Web are becoming more and more common. Whether it concerns search and recommendation of articles in news portals and digital libraries, of study materials, or of different products in e-shops, it is essential to know the characteristics of the objects being recommended and the characteristics of the person manipulating these web objects. These characteristics are collected via implicit feedback. Inaccurate information, collected by evaluating implicit feedback on human behavior, has a significant influence on the accuracy of recommendation. With the increasing possibilities of monitoring users on the Web, such as signals from an eye tracking camera or from blood pressure, body temperature and pulse sensors, we gain the ability to evaluate implicit feedback with great accuracy and, with that, to interpret various signals of activity in different domains.

Despite the existing methods for evaluating various implicit signals of user activity, there is still room for improvement. Our research is aimed at the attributes of users and learning objects inferred from implicit feedback indicators, and at their interpretation for use in the domain of education. By researching chosen implicit feedback indicators, individually and in combination, we will explore their mutual relations.
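
As a minimal sketch of studying such mutual relations, the snippet below computes the Pearson correlation between two hypothetical indicators, dwell time and gaze time; the values and the choice of indicators are illustrative only.

    # A minimal sketch of relating two implicit feedback indicators.
    from statistics import correlation  # available since Python 3.10

    # per-learning-object measurements for one student
    dwell_time = [12.0, 45.0, 30.0, 80.0, 5.0]   # seconds on the object
    gaze_time  = [4.0, 20.0, 11.0, 35.0, 1.0]    # seconds of fixations

    print(f"dwell/gaze correlation: {correlation(dwell_time, gaze_time):.2f}")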

The goal of this work is to propose a method for using the collected information in the domain of recommendation. Further, we plan to experimentally verify our findings on learning objects within the domain of Web-based education.

Collaborative Learning Content Enrichment

svrcek1Martin Svrček
bachelor study, supervised by Marián Šimko

Abstract. Collaborative learning is a situation in which two or more people learn or attempt to learn something together. Many studies show that collaborative learning is in many cases better than conventional individual learning. In the context of collaborative learning, the Web becomes a medium in which students can ask for information, evaluate one another’s ideas and monitor one another’s work, regardless of their physical location.

We want to enrich the learning content with a new type of annotation – the definition (within the educational system ALEF). On the one hand, definitions can help students find the most important keywords and their explanations; the whole information unit (both the keyword and the explanation) is provided together. On the other hand, we can enlarge the conceptual metadata, which can be used to improve services in the system (e.g., search, recommendation, …). We can then present web page data in a way that is understandable to computers.

In the context of definitions we face problems such as synonyms or different explanations of one definition. Therefore, we also want to evaluate these definitions: rating them will enable us to show students the most accurate information. There are many factors that can influence the rating of a definition (e.g., student reputation, number of similar explanations, …). By solving these problems we can both help students and improve the information processing and presentation services in the educational system.
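
A minimal sketch of combining such factors into a single rating follows; the weights, the vote smoothing and the saturation point are illustrative assumptions based on the factors named above.

    # A minimal sketch of rating a student-added definition.

    def definition_rating(author_reputation, n_similar, votes,
                          w_rep=0.4, w_sim=0.3, w_votes=0.3):
        """Combine normalised factors into a single [0, 1] rating."""
        sim = min(n_similar / 5.0, 1.0)  # saturate at 5 similar explanations
        vote_score = (votes["up"] + 1) / (votes["up"] + votes["down"] + 2)
        return w_rep * author_reputation + w_sim * sim + w_votes * vote_score

    rating = definition_rating(author_reputation=0.8, n_similar=3,
                               votes={"up": 7, "down": 1})
    print(f"definition rating: {rating:.2f}")  # 0.74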

Modelling the Dynamics of Web Content

matustomleinMatúš Tomlein
master study, supervised by Jozef Tvarožek

Abstract. Web content has a very dynamic nature: it frequently changes and spreads over various information channels on the Web. Observing and analysing the dynamics of web content is a non-trivial process and also requires a lot of archived data. On the other hand, knowledge of the behaviour of web content is useful in many areas of software engineering: it can improve search algorithms for web content, provide a basis for recommending similar content, and also help update caches on web servers.

The dynamic nature of web content is largely hidden from users. When reading about a topic on the Web, they usually have no way to see how the topic has evolved since the time the article was written. It might be useful for users to be able to see the latest developments of the topic they are reading about, especially in the domain of news.

We track the flow of information on the Web to recommend novel information to the user. We focus on recommending novel information in the domain of news from various news portals. Our goal is to create a system that enhances the news reading experience by recommending novel and relevant information that the user has not previously read.
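
The novelty filter implied above could, in a minimal sketch, recommend articles that overlap with the current article but not with anything already read. The Jaccard measure and both thresholds are illustrative assumptions.

    # A minimal sketch: recommend relevant-but-unseen news articles.

    def words(text):
        return set(text.lower().split())

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    def novel_recommendations(current, candidates, read_history,
                              min_rel=0.2, max_seen=0.6):
        cur, recs = words(current), []
        for cand in candidates:
            c = words(cand)
            relevant = jaccard(cur, c) >= min_rel
            seen = any(jaccard(c, words(r)) >= max_seen for r in read_history)
            if relevant and not seen:
                recs.append(cand)
        return recs

    print(novel_recommendations(
        "election results announced in slovakia",
        ["slovakia election turnout sets new record",
         "election results announced in slovakia today"],
        ["election results announced in slovakia"]))
    # only the first candidate is both relevant and novel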

Supporting Query Formulation

vestenickyTomáš Vestenický
bachelor study, supervised by Róbert Móro

Abstract. Nowadays, the most widely used approach to searching for information on the Web is keyword-based. The main disadvantage of this approach is that users are not always efficient in their choice of keywords. This is why we aim to help users with query formulation, or to offer them a different, more natural approach.

Our method builds the query from positive examples of documents as a starting point, and then adjusts the information radius of the results by selecting further positive or negative examples, utilizing explicit relevance feedback as a query refinement method. We plan to use various metadata (mainly user-added tags) to determine document similarity. Users add tags to documents, which aids the search process, because users choose what is relevant for them in a particular topic. Tags can therefore be used for more fine-grained query refinement by enabling users to see and/or remove tags from documents selected as positive examples. We plan to use machine learning for personalization and future improvement of result relevance.
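
A minimal sketch of this example-based scoring over tags is shown below, loosely in the spirit of Rocchio relevance feedback; the weights and the tag sets are illustrative assumptions.

    # A minimal sketch: score documents by tag overlap with positive
    # examples, penalising overlap with negative ones.

    def score(doc_tags, positive, negative, beta=1.0, gamma=0.5):
        pos = sum(len(doc_tags & p) for p in positive) / max(len(positive), 1)
        neg = sum(len(doc_tags & n) for n in negative) / max(len(negative), 1)
        return beta * pos - gamma * neg

    docs = {
        "paper-a": {"recommender", "evaluation", "news"},
        "paper-b": {"recommender", "privacy"},
        "paper-c": {"privacy", "encryption"},
    }
    positive = [{"recommender", "evaluation"}]   # documents the user liked
    negative = [{"privacy", "encryption"}]       # documents the user rejected

    ranked = sorted(docs, key=lambda d: score(docs[d], positive, negative),
                    reverse=True)
    print(ranked)  # ['paper-a', 'paper-b', 'paper-c']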

We focus on the domain of digital libraries of research articles and plan to evaluate the proposed method in the bookmarking service Annota.

Context-aware Recommender Systems Evaluation Using Supposed Situations

visnovskyJuraj Višňovský
master study, supervised by Dušan Zeleník

Abstract. In the age of information overflow, we witness an increase in the popularity of personalized systems. Among the many solutions coping with adapting content to users’ needs, recommender systems seem to stand above all of them. The purpose of recommender systems is to deliver relevant information to the user and thus simplify his navigation in the data overflow.

In the process of item selection, users are usually affected by various factors (e.g., a user’s willingness to spend money in e-shops can be influenced by his wealth or by forthcoming Christmas). These factors, called contexts, describe the user’s current situation and environment. In most, if not all, domains it is appropriate to include contexts in recommender systems in order to improve recommendations, since there is a known correlation between a user’s actions and the contexts influencing them.

Having a context-aware recommender system makes the evaluation process far more complicated and expensive. A naïve evaluation of a recommendation generated for a certain set of contexts would have to be performed under exactly the same conditions. One possible way to reduce this costly evaluation leans on supposed situations: it is no longer necessary to wait until the real context matches the one assumed by the recommender; instead, we rely on the assumption that the user is able to imagine a given situation and determine how he would act in it. When generating supposed situations, we have to bear users’ characteristics in mind. Some users are more open-minded than others, and they could possibly be used to evaluate recommendations intended for different kinds of users (e.g., an empathic man could precisely answer what kind of perfume a woman of his age would buy in a given situation).

Web Search Employing Activity Context

Ľubomír Vnenk
bachelor study, supervised by Mária Bieliková

Abstract. Too much information is available on the Web, so it is very easy to get lost in this amount of data. When a user tries to find something valuable, the best current way is to use web search. However, even though web search engines are advanced, they cannot really know what the user is trying to find, mainly because the average query length is 2–3 words. The main purpose of our research is to make the query more specific by extending it with the user’s context.

To get the user’s context, we developed an activity logger. It captures the user’s activity and interaction inside the browser and also outside the browser, in desktop applications. It records application names, copy/paste actions between applications, and the time when the user switched to another application. It also extracts keywords from the currently written document and from visited web pages. All these data form the user’s activity context, and we need to prioritize each event and piece of information to derive the most precise context.

We hypothesise that an application the user was recently using is connected with the user’s query: the user’s intention to search is rooted in an application context, and the specific meaning of the query can be found in the application content. Finding the connection between the query and the application is therefore crucial. We try to find it by examining the interaction between the query and every application. We then extend the query with the few most relevant pieces of context information from the application connected to the query.
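
A minimal sketch of this query extension step follows; the logger’s output format, the recency rule and the keyword lists are illustrative assumptions.

    # A minimal sketch of extending a query with application-context keywords.
    import time

    now = time.time()
    # (application, keywords extracted from its content, last-active time)
    activity_log = [
        ("IDE",     ["java", "nullpointerexception", "parser"], now - 30),
        ("Browser", ["holiday", "flights"],                     now - 600),
    ]

    def extend_query(query, log, max_terms=2):
        """Append context keywords from the most recently used application."""
        _, keywords, _ = min(log, key=lambda e: now - e[2])  # most recent app
        extra = [k for k in keywords if k not in query.split()][:max_terms]
        return (query + " " + " ".join(extra)).strip()

    print(extend_query("fix parser crash", activity_log))
    # fix parser crash java nullpointerexception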

Modeling Programmer’s Expertise Based on Software Metrics

zbellPavol Zbell
master study, supervised by Eduard Kuric

Abstract. Knowledge of programmers’ expertise and activities in the environment of a software house enables effective task resolution (by identifying experts), better team forming, effective communication between programmers, and personalized recommendation or search in source code, thus indirectly improving overall software quality. The process of modeling a programmer’s expertise (building the knowledge base) usually expects as its input information about the programmer’s activities during software development, such as interactions with source code (typically fine-grained actions performed in the IDE), interactions with ITS (issue tracking) and RCS (revision control) systems, activities on the Web, or any other interactions with external documents.

In our research, we focus on modeling a programmer’s expertise based on software metrics such as software complexity and source code authorship. We assume that a programmer’s expertise is related to the complexity of the source code she is interacting with, as well as to her degree of authorship of that code. In the case of software complexity, our idea is to explore alternatives to LOC (lines of code) based metrics, such as weighted AST (abstract syntax tree) node counting or call graph based metrics. With source code authorship, we expect programmers who wrote some code to be experts on that particular code, but we need to consider varying degrees of authorship, as the code evolves and is changed by other programmers over time. Information acquisition for programmer modeling in our work is based on activity logs from the programmer’s IDE. We plan to implement our method as an extension to the Eclipse IDE for Java programmers and to evaluate it on data from an academic environment or (preferably) a real software house environment.
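
The weighted AST node counting idea can be sketched as follows. The thesis targets Java in the Eclipse IDE; Python’s ast module and the node weights below are illustrative stand-ins used only to show the metric’s shape.

    # A minimal sketch of weighted AST node counting as a complexity metric.
    import ast

    WEIGHTS = {ast.If: 2.0, ast.For: 3.0, ast.While: 3.0,
               ast.FunctionDef: 1.5}  # unlisted node types weigh 1.0

    def weighted_ast_complexity(source):
        """Sum per-node weights over the whole abstract syntax tree."""
        tree = ast.parse(source)
        return sum(WEIGHTS.get(type(node), 1.0) for node in ast.walk(tree))

    code = """
    def find(items, target):
        for i, item in enumerate(items):
            if item == target:
                return i
        return -1
    """
    print(weighted_ast_complexity(code))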

Utilization of Behavioral Patterns for Code Quality Assessment

zelenikDušan Zeleník
doctoral study, supervised by Mária Bieliková

Abstract. Code review is a very frequent and integral process in software development. There are already tools providing techniques for automated code analysis. These techniques usually focus on design patterns and best practices; they are, however, focused on the code itself, which leads to revealing mostly code-related errors. On the other hand, there may be errors related to the process of code creation.

Our approach to code quality assessment lies in detecting problems while the programmer is working. We are not interested in the code itself; we look at the process of its creation. For instance, the programmer may have almost no experience with the domain he is working on, or he may be exhausted because he is working at night. These are examples of what we understand as negative aspects of code creation. We have named a few that are obvious, but by analyzing patterns in the programmer’s behavior we could extract hidden ones as well. To do so, we need to extract as much contextual information on user behavior as possible. These conditions, in relation to former errors (which we track), repeat themselves in patterns.

We face three problems: acquiring as much contextual information as possible, discovering patterns, and calculating the probability of error occurrence in specific code. By solving these issues we bring a new approach to code quality assessment and, in fact, to the code review process.
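
As a minimal sketch of the third problem, the snippet below estimates the probability of an error given a behavioural context from tracked observations, using Laplace smoothing; the contexts and counts are illustrative, and the actual method mines richer patterns.

    # A minimal sketch of P(error | behavioural context) from tracked data.
    from collections import Counter

    # (context, error_occurred) observations collected during development
    observations = [
        (("night", "new-domain"), True),
        (("night", "new-domain"), True),
        (("night", "known-domain"), False),
        (("day", "known-domain"), False),
        (("night", "new-domain"), False),
    ]

    seen, errors = Counter(), Counter()
    for ctx, had_error in observations:
        seen[ctx] += 1
        errors[ctx] += had_error

    def error_probability(ctx):
        """Laplace-smoothed estimate of P(error | context)."""
        return (errors[ctx] + 1) / (seen[ctx] + 2)

    print(f"{error_probability(('night', 'new-domain')):.2f}")  # 0.60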