Students’ Research Works – Autumn 2014

Search and Recommendation

Information Analysis, Organization and Navigation

User Modeling, Collaboration and Social Networks

Domain Modeling, Semantics Discovery and Annotations



Doctoral Staff


  • Mária Bieliková: web personalization, user/user groups and contexts modelling, usability and user experience (and HCI in general)
  • Michal Barla: user modeling, implicit user feedback, virtual communities, collaborative surfing
  • Jozef Tvarožek: social intelligent learning, collaborative learning, semantic text analysis, natural language processing
  • Marián Šimko: domain modelling, ontologies, folksonomies, semantic text analysis, Web-based Learning 2.0
  • Jakub Šimko: crowdsourcing, games with a purpose, semantics acquisition, Web-based learning
  • Michal Kompan: single user and group recommendation, satisfaction modeling
  • Tomáš Kramár: user modelling, personalized web search
  • Dušan Zeleník: context modelling, recommendation



Similarity in Graph Data Structures

Ondrej Antl
bachelor study, supervised by Dušan Zeleník

Abstract. Recommendation is an important element of every web site containing a large amount of data of similar or different types: based on information about the individual user, the system filters out potentially unwanted content or, seen from the other side, shows the user the data they desire. The user is thus not disturbed by a lot of uninteresting content. When recommending, it is necessary to explore the similarity of the stored data based on its attributes.

In my project I work on exactly this similarity of data, concretely the similarity of movies based on the keywords attached to each movie. My data will be stored in a graph database, which is nowadays a very popular way of storing data together with the relationships between them.

My goal is to research and test ways of getting a list of movies similar to a selected title based on the keywords the movies share. I want to find out how to obtain the required result in one graph database and then compare it using another one or two. There may be small differences in how the data is accessed, so we may see small differences in the results or in execution time.
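For illustration, a minimal sketch of the similarity computation involved, in plain Python with made-up movies and keywords; in a graph database such as Neo4j the same idea becomes a traversal from a movie through its keyword nodes to other movies:

```python
# Minimal sketch: rank movies by keywords shared with a selected title.
# Movies and keywords are illustrative; Jaccard similarity is one option.

movies = {
    "Alien":   {"space", "horror", "monster"},
    "Aliens":  {"space", "horror", "marines"},
    "Gravity": {"space", "survival"},
}

def jaccard(a, b):
    """Shared keywords normalized by the size of the combined keyword set."""
    return len(a & b) / len(a | b)

def similar_movies(title):
    keywords = movies[title]
    scores = {other: jaccard(keywords, movies[other])
              for other in movies if other != title}
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(similar_movies("Alien"))  # [('Aliens', 0.5), ('Gravity', 0.25)]
```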

Popularity Prediction of Scientific Publications

Adam Bacho
bachelor study, supervised by Ing. Robert Moro

Abstract. Many articles have already been written about how to measure the impact of scientific publications. This area is most important for researchers who want to know, shortly after publication, how popular their work will become. There are dozens of features which can be used for predicting the popularity and citation count of articles.

In my bachelor work I am going to introduce the ones which are most frequently used. I would also like to cover a new way of predicting citation counts called altmetrics. Besides traditional features, such as the journal impact factor (JIF) or the length of an article's title, altmetrics add alternative features like the number of tweets, the number of Mendeley readers, etc. Finally, the results of this work will be compared with the best results reported to date.
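To illustrate how such features might be combined, here is a sketch using scikit-learn; the feature values are fabricated and the plain linear regression is an assumption for illustration, not one of the methods the work compares:

```python
# Sketch: predicting citation counts from traditional + altmetric features.
# All feature values and citation counts below are fabricated.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: [journal impact factor, title length, tweets, Mendeley readers]
X = np.array([
    [2.1, 12,   5,  40],
    [6.8,  9, 120, 310],
    [1.3, 15,   0,  12],
    [4.0, 11,  30, 150],
])
y = np.array([14, 95, 3, 42])  # citation counts after some fixed period

model = LinearRegression().fit(X, y)
new_article = np.array([[3.5, 10, 60, 200]])
print(model.predict(new_article))  # predicted citation count
```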

Personalised Management of Publications

Andrej Bíro
bachelor study, supervised by Michal Kompan

Abstract. Researchers often have to deal with managing a large number of their publications. Besides tracking their own works, they also have to keep a list of the papers in which these works are cited. Therefore, personalised management of publication outputs is very important for scientists.

In my bachelor work I am going to analyse the currently existing tools for publication management, their advantages and disadvantages, and possible options for their improvement.

Further, researchers often have to fill in forms which serve to record these publications. In the second part of my work I am going to propose and implement a solution for the automatic filling of these forms.

Predicting Content Quality in Community Question Answering

Martin Borák
bachelor study, supervised by Ivan Srba

Abstract. The information age makes it easy for people to seek information on various topics. Everyone who has access to the Internet can simply type whatever they are curious about into a search engine and receive thousands of results. However, sometimes people have more complex questions or need very specific answers, which are hard to find using traditional methods. Community Question Answering (CQA) systems are systems where users can publicly ask questions and let other users answer them.

The quality of questions and answers in these systems is not always optimal; therefore, content quality prediction and evaluation is in order. It is beneficial for users to be able to distinguish between content of high quality and content of low quality. The main goal of my bachelor project is to develop a method that proves efficient in such evaluation.

Automated Syntactic Analysis of Natural Language

Dominika Cervenova
master study, supervised by Marián Šimko

Abstract. Natural language, as one of the most common means of expression, is also used for storing information on the web. Its processing is, however, difficult because of the informality and loose structure of natural language. Syntactic analysis, or parsing, as a part of natural language processing, discovers formal relations between syntagms in a sentence and assigns them syntactic roles. That can help make natural language, and the information stored in it, more machine-processable.

We work on a method that will be able to automate the syntactic analysis of the Slovak language. For inflectional Slavic languages like Slovak or Czech, a machine learning approach appears to be helpful. However, due to the many grammar exceptions and specific rules that are typical especially of the Slavic language family, even with enough training data it is not possible to train a parser to recognize syntagms with 100% accuracy. Still, machine learning represents a useful base which can be further improved by additional rule-based approaches.

As our aim is to parse Slovak sentences, we plan to create a hybrid method. For the machine learning part we will use a corpus of pre-annotated Slovak sentences and an existing dependency parser. The results will then be post-processed by a rule-based parser that should eliminate as many inaccuracies as possible. We plan to evaluate our method using a software prototype, and as a gold standard we plan to use syntactic annotations based on the Slovak National Corpus project at the Ľudovít Štúr Institute of Linguistics.

Source Code Review Recommendation

Matej Chlebana
master study, supervised by Karol Rástočný

Abstract. Evaluating data obtained from monitoring developers' activities is a challenging process. Every developer is different, with different experience and different strengths. Some developers create better code during the day, others at night when they are not disturbed by the noise of their surroundings. Circumstances related directly to the developer, such as illness or life situations, also have an impact on the quality of the code.

Identifying these special circumstances and characteristics of developers is difficult, and some environmental influences on a programmer's work cannot be detected at all (e.g., problems they are currently dealing with in their private lives). Although we are not able to fully identify these special circumstances, we can work with the information we do collect, for example through a system being developed within the research project PerConIK (Personalized Conveying of Information and Knowledge). The PerConIK system aims to support the analysis of business applications in a software house using empirical software metrics. It collects empirical data through software tools and extensions for development environments (Microsoft Visual Studio 2012 and Eclipse) and for the web browser (Mozilla Firefox) that are installed on developers' workstations. These data, however, need to be analysed in order to find metrics by which different source codes can be rated. Through process mining we will be able to create a model which can identify and select potentially risky source code, so that we can recommend the most suitable code for review by a designated reviewer.

Data Stream Analysis

Matúš Cimerman
bachelor study, supervised by Jakub Ševcech

Abstract. Nowadays we can see Big Data processing and analysis in many domains. As the amount of data grows, more people are focusing on this problem. Among the most affected domains are social media websites like Facebook or Twitter. Data from such sources arrive in huge volumes and change in real time; we call them data streams. We want to process and analyze data streams in real time to provide users with personalized and valuable outputs. The most common approach to handling large data sets is the map-reduce paradigm, i.e., batch data processing. Such methods do not meet our requirement to process data streams in real time. To achieve it, we need a different approach called data stream processing, built for instance on the Lambda Architecture.

Processing and analysis of big data streams is a complex task, because we need to provide a low-latency, scalable and fault-tolerant solution. In our project, we analyze existing solutions and frameworks for analyzing data streams and verify their characteristics on different kinds of tasks. Based on this, we propose an application for processing and analyzing big data streams (e.g., the Twitter data stream) which gives users valuable outputs that change in real time.
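As a toy illustration of the real-time side of such an application, the following sketch performs a single-pass, bounded-memory aggregation over a stream, counting hashtag mentions within a sliding time window; the data and window size are made up:

```python
# Sketch: single-pass sliding-window counting over a stream.
# Memory is bounded by the window size, not by the stream length.
from collections import Counter, deque

WINDOW = 60.0  # window length in seconds (illustrative)

events = deque()    # (timestamp, hashtag) pairs inside the current window
counts = Counter()  # hashtag -> occurrences within the window

def observe(timestamp, hashtag):
    """Process one stream element and expire elements older than WINDOW."""
    events.append((timestamp, hashtag))
    counts[hashtag] += 1
    while events and events[0][0] < timestamp - WINDOW:
        _, old = events.popleft()
        counts[old] -= 1

for t, tag in [(0, "news"), (10, "sport"), (30, "news"), (90, "news")]:
    observe(t, tag)

print(counts.most_common(2))  # current top hashtags in the last minute
```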

TV Program Recommendation

Jakub Ďaďo
bachelor study, supervised by Mária Bieliková

Abstract. Nowadays there is a lot of information on the Internet for a user to navigate. Recommender systems are here to help make decisions on the user's behalf. We know several kinds of recommender systems with different approaches: collaborative filtering, content-based, and hybrid (collaborative and content-based working together). Collaborative filtering uses user profiles to form groups of people with some similarity. Content-based approaches describe the items themselves and try to recommend items similar to the items the user liked in the past.

In my bachelor work we will focus on collaborative filtering, especially the cold-start problem in a TV program recommender system. The cold-start problem arises when a user registers on a website and the recommender does not yet have any information about them.

We try to model the user from additional information, e.g., from the ratings other users gave to items. We will use a few techniques and hopefully find some similarity between the new user and users who have already built a profile in the past. It might be a problem to get such additional information. A few other approaches to this problem exist; we might use them and try to make them work together.
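One of the simplest techniques of this kind, shown below as an assumed sketch rather than the method to be proposed, ranks items for a brand-new user by a damped average of other users' ratings, so that items with few ratings are pulled toward the global mean:

```python
# Sketch: non-personalized cold-start ranking by damped average rating.
# Ratings and the damping constant K are illustrative.
ratings = {  # item -> ratings from existing users
    "News at 6":  [5, 4, 5, 5],
    "Late Movie": [5],            # a single enthusiastic rating
    "Quiz Show":  [3, 3, 4, 3, 3],
}

all_ratings = [r for rs in ratings.values() for r in rs]
global_mean = sum(all_ratings) / len(all_ratings)
K = 3  # weight of the prior

def damped_mean(rs):
    """Average rating pulled toward the global mean for sparse items."""
    return (sum(rs) + K * global_mean) / (len(rs) + K)

for item in sorted(ratings, key=lambda i: -damped_mean(ratings[i])):
    print(item, round(damped_mean(ratings[item]), 2))
```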

Methodology of Game Evaluation Based on Implicit Feedback

Peter Demčák
master study, supervised by Jakub Šimko

Abstract. Learnability is an essential trait for most types of applications. In the case of games in particular, initial learnability has a significant impact on the whole game experience. A quick learning curve for game mechanics and basic game dynamics can make the difference between a game which quickly piques the player's interest, and is thus more likely to achieve its goals, and a game which confuses players, who then quickly lose interest and abandon the game. Learnability of games is usually evaluated through playtesting, which depends on explicit feedback methods to uncover usability issues.

The reliability of explicit feedback is limited, because it is impossible to use it for real-time observation of the player's mental state without disturbing that mental state. Hence the importance of implicit feedback, which is based on observing the user's natural behavior to infer their inner experience. One of the means of gathering interesting implicit feedback is gaze tracking. Mapping eye movements to cognitive functions shows promise, even for the evaluation of game experience.

Our goal is to design a method through which game designers can evaluate the learnability of their games. The game designers pre-define their own learning cases, i.e., the interactions through which players learn new game mechanics and game dynamics. Our method then matches these learning models against real player behavior, including gaze data collected during play, and reports learnability issues where the learning cases and the usage data diverge.

Cold-start Problem in Personalized Recommendation

Rastislav Dobšovič
master study, supervised by Michal Kompan

Abstract. When a new user comes to a system, it is very hard to propose appropriate recommendations to them. This is caused by the lack of information about the user: what they like, what they prefer, or what they are looking for. This problem is called the new user problem (or cold start), and it means the user does not get the important information they need; in commercial systems it can even cause financial loss. Basically, there are two approaches that can reduce this problem: one is to elicit as much additional information as possible directly from the user; the second is based on the simple idea that we can always find something out about the user without their effort.

Our research focuses on the domain of scientific article recommendation in an academic setting. We try to find as many usable sources of information about the user as possible (e.g., we use knowledge about students based on a connection to our academic information system). Based on the information we can get, we then propose recommendations that are as reliable as possible.

TV Program Metadata Acquisition

Peter Dubec
bachelor study, supervised by Mária Bieliková

Abstract. Nowadays more and more people are watching TV or online movies, and many of them still use the TV program when deciding what to watch. The quality of these services depends on the metadata available to describe movies and TV shows: the higher the quality of the metadata, the higher the quality of the services can be. The Internet is full of information about movies and TV shows, but this information is not in the correct form or is spread across various sources. There are many websites where users can find information about movies and TV shows, and these data are necessary for TV show and movie recommendation.

Our goal is to enrich the metadata served with the TV program, aiming to improve the personalized recommendation of TV shows and movies and to increase the quality of these services. We are trying to connect existing entities from the TV program to entities in the Linked Data space (DBpedia, Freebase, etc.), where we can find new and interesting metadata. This also includes discovering relationships between particular TV shows, or between TV shows and various things in the world, such as data about a place mentioned in a particular TV show or movie, as well as discovering entirely new metadata which can potentially enrich the content of the TV program.

TV Program Recommendation

Erik Dzurňak
bachelor study, supervised by Mária Bieliková

Abstract. Personalized recommendation nowadays touches all kinds of items whose attributes the person we are recommending to may like. The demand for recommender systems is driven by society and its desire to absorb as much information as it can process, and the easiest way for a human to handle information is through an audio-visual representation of the input data.

People meet many different types of audio-visual input every day, but wherever we are, TV broadcasting is the most common temptation, for our sight as much as for our hearing. That is the main reason I focus on the domain of personalized recommendation of TV programmes.

We aim at recommending items in the TV programme according to air time, genre and category of the programme. Air time, i.e., the slot time of a programme, is one of the unexplored areas which may be a significant factor for recommendation, because a lot of people live routine lives: when someone has time to watch TV two weeks in a row at 6 PM on a Tuesday, there is a high probability that either the time or the genre of the programme currently on air suits them.

Innovative Application for International Competition

Jozef Gáborík, Matej Leško, Jakub Mačina, Jozef Staňo
bachelor study, supervised by Jakub Šimko

Abstract. Imagine Cup is a worldwide technology competition for students. Its main goal is to encourage students to create innovative software applications solving real-world problems. After researching many potentially interesting areas, we became captivated by the idea of helping people to be healthy and more confident by improving their posture.

There is no doubt that computers make our work easier and more efficient. However, this has turned the majority of jobs into sedentary ones requiring minimal physical activity. Furthermore, we spend most of our leisure time online. In the long term, sitting in a bad posture while using computers causes back pain and headaches, which decreases productivity and can even lead to spine disorders.

We propose to solve this problem with our project: a real-time posture tracking application. It uses a webcam to constantly monitor the user's posture, that is, the position of the arms, shoulders, neck and spine. When needed, it intelligently alerts users to correct their body alignment, to switch positions, or to take other recommended action. Every notification is delivered at the right time and in a non-interruptive form. Our aim of improving bad posture habits is also supported by educating users, detailed graphs, and the ability to share results with a specialist.

Slovak Web-based Encyclopedia

Patrik Gajdošík
bachelor study, supervised by Michal Holub

Abstract. The Web contains a huge amount of unstructured or semi-structured information of varying quality. In order to search through it and use it effectively, the concept of the Semantic Web comes to aid by transforming this chaos into structured, machine-readable data. By creating these blocks of knowledge and finding the right relations between them, the cloud of Linked Data keeps growing. The need to evaluate all of the processed data also adds to the usability of the shared knowledge.

Since Slovak is not among the most used languages in the world, the Slovak web holds a lot of unstructured data (e.g., the Slovak Wikipedia contains around 195,000 articles that are not processed in any way, plus other sources offered by various institutions) that, when appropriately filtered, could considerably enrich the web of data. The goal is to create a method and a tool that make structuring the Slovak web (or any language mutation) easier and at the same time help with evaluating the extracted data.

Utilizing Vector Models for Processing Text on the Web

Ladislav Gallay
bachelor study, supervised by Marián Šimko

Abstract. Text processing is an important part of a plethora of tasks and is necessary for machines to understand content in order to provide advanced functionality such as recommendation or intelligent search. Our goal is to improve the lemmatization process in any given language by utilizing the word2vec tool by Google. This tool represents words as vectors, training a neural network over plain text. In our work we focus on using a large dataset of plain text in the selected language, together with a small amount of prepared data, to create a successful word lemmatizer.

The contribution of our work will be the normalization of words in any given language while knowing almost nothing about the language. Current approaches require extracting a lot of plain text, cleaning the data, and analyzing and optimizing the process. We believe that utilizing word2vec will improve the lemmatization process and help in understanding any language just from meaningful plain text in that language. Currently we are training the model on data from the Slovak Wikipedia and trying different calculations and metrics to improve the input and output accuracy.
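A sketch of the core idea using the gensim implementation of word2vec: a few known (inflected form, lemma) pairs define a direction in the vector space which is applied to unseen forms by analogy. The corpus path and the Slovak word pairs are illustrative assumptions:

```python
# Sketch: lemmatization via word2vec vector arithmetic (gensim 4.x API).
# A known (form, lemma) pair steers the analogy for an unseen form.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Assumed input: one tokenized sentence per line, e.g. a Wikipedia dump.
sentences = LineSentence("sk_wiki_plaintext.txt")
model = Word2Vec(sentences, vector_size=100, window=5, min_count=5)

def lemma_by_analogy(word, known_form, known_lemma, topn=3):
    """Candidates x such that x - word is close to known_lemma - known_form."""
    return model.wv.most_similar(positive=[word, known_lemma],
                                 negative=[known_form], topn=topn)

# "hradoch" should relate to its lemma "hrad" as "mestách" does to "mesto".
print(lemma_by_analogy("hradoch", known_form="mestách", known_lemma="mesto"))
```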

Diacriticizing Slovak Texts

Jakub Gedera
bachelor study, supervised by Marián Šimko

Abstract. Nowadays, we spend a lot of time chatting on the Internet. Many people are annoyed by having to write words with diacritics, so they simply leave them out. The aim of this work is to create a web application that automatically restores diacritics in such text. We are currently comparing different methods of solving this problem, which belongs to the field of natural language processing. A difficulty arises when a word without diacritics has several valid diacritical variants. We could choose a variant at random, but this is not very effective; instead, we can use statistical methods of choosing the word, such as n-grams or other techniques. We also have to consider both the accuracy and the time complexity of each method.
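A sketch of the selection step, assuming a simple unigram model: generate every diacritical variant of a word and keep the most frequent one. The variant table and corpus counts are illustrative, and a real system would score candidates in context, e.g. with n-grams:

```python
# Sketch: restoring diacritics by generating variants and scoring them.
# Variant table and frequencies are illustrative (subset of Slovak letters).
from itertools import product

VARIANTS = {"a": "aáä", "c": "cč", "e": "eé", "s": "sš", "z": "zž"}
unigram = {"čaj": 50, "caj": 1, "žena": 70, "zena": 0}  # corpus counts

def candidates(word):
    """Every way of adding diacritics to a word typed without them."""
    options = [VARIANTS.get(ch, ch) for ch in word]
    return ("".join(chars) for chars in product(*options))

def restore(word):
    return max(candidates(word), key=lambda w: unigram.get(w, 0))

print(restore("caj"), restore("zena"))  # čaj žena
```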

For the evaluation, we will take articles from the Web and remove their diacritics. Then we restore the diacritics and compare the two versions of each article to determine the precision of our method.

Adaptive Collaboration Support in Community Question Answering

Marek Grznár
master study, supervised by Ivan Srba

Abstract. Nowadays, users are lost in the great amount of information available on the Internet. Many times they find themselves in a situation where the information they seek cannot easily be found anywhere on the Internet using traditional search engines. With the development of Web 2.0, there is the option of obtaining such information by asking a community. Systems based on this kind of mutual knowledge sharing have become popular lately. One type of these systems is Community Question Answering (CQA). Typical examples of CQA systems are Yahoo! Answers and Stack Overflow.

The existing CQA systems, despite their increasing popularity, fail to answer a significant number of questions in the required time. One option for supporting cooperation in CQA systems is recommending a question to a user who is a suitable candidate for providing the correct answer (question routing). Various methods have been proposed to help find answerers for a question in CQA systems, but almost all existing work depends heavily on users' previous activities in the system (QA data).

Our goal is to create a method for question routing which utilizes users' non-QA data. We explore the possibility of obtaining users' non-QA data from other systems. By analysing users' non-QA activities, such as blogs, micro-blogs or friendships, we can better identify a suitable user to answer a specific question.

Natural Language Processing by Utilizing Crowds

Jozef Harinek
master study, supervised by Marián Šimko

Abstract. The amount of information stored in natural language on the web is huge and still growing. In order to process this information better, we need to process the natural language and transform it into a form that machines are capable of understanding.

However, natural language processing (NLP) is a difficult task. One has to deal not only with parsing the text and cleaning it of unnecessary words (stop words), but also with representing the semantics of the processed text. There are typically several modules that analyze a given text, from the phonological up to the semantic layer. Such analysis is even more demanding in languages like Slovak that do not have a fixed word order in the sentence.

In our work we plan to employ crowdsourcing principles in order to annotate a given text corpus better. We are creating a system in which students complete their assigned homework and, by doing so, also annotate the underlying corpus.

Analysis of User Behavior on the Web

Patrik Hlaváč
master study, supervised by Marián Šimko

Abstract. Existing solutions are based on monitoring behavior directly in the browser using peripheral devices; now we also have the possibility of directly monitoring the user's gaze and their focus on blocks of content on a website. One objective of this work is to propose a user model suitable for collecting data from interactions in a web environment. The primary task will be to gather information through an eye tracking sensor in order to identify the areas of the user's interest in the content on the screen, along with the use of other devices (mouse, keyboard) which allow the acquisition of implicit feedback. These interaction data will be processed in the user model. It should be particularly useful in extending the ALEF environment.

Identification of Similar Entities in the Web of Data

Michal Holub
doctoral study, supervised by Mária Bieliková

Abstract. The idea of the Semantic Web presumes structured, machine-readable data published freely on the web using open standards, so that new intelligent applications can emerge. Such data is constantly being created and linked together, forming the Linked Data cloud, or the Web of Data. Currently, there are a few hundred such datasets covering a wide range of domains.

In our research we focus on discovering relationships between entities published in the Linked Data cloud and on using these relationships to aid adaptive web-based applications. Mainly, we are interested in finding similar and identical entities, either within one dataset or across several datasets, and linking them together. This method has a variety of uses: 1) in deduplication algorithms (usable in data cleaning and processing tasks), 2) in similarity detection (usable in search and recommendation tasks), and 3) in data enrichment and integration tasks.

Building an adaptive web-based application on a domain model based on Linked Data enables us to utilize the relationships to recommend related entities (e.g., in the domain of learning materials), or to help the user navigate a large information space (e.g., large digital libraries containing millions of authors, papers and conferences, which may overwhelm the user). We can also use the relationships to help the user in the search process. Since the Linked Data cloud has the form of a large graph, we are able to answer complex queries which are difficult to solve using a traditional keyword-based approach.

User Reputation in Community Question Answering

Adrián Huňa
bachelor study, supervised by Ivan Srba

Abstract. Community Question Answering (CQA) websites have quickly become rich sources of knowledge on many topics, serving as an alternative when standard keyword-based search engines fail to provide an adequate answer: you can ask your question and real people will answer reasonably quickly. However, with many users comes the problem of identifying who is skilled, and thus provides valuable knowledge to others, and who is less skilled.

Identifying experts can be useful for question routing or for predicting answer quality, because users with high expertise are more likely to make a valuable contribution. My work aims to identify these experts by exploiting the network formed by interactions between users on a CQA site, together with their history of contributions.
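One way to exploit such an interaction network, sketched below as an assumption rather than the proposed method, is to draw an edge from each asker to the user who answered them and let a link analysis algorithm such as PageRank surface the users the community repeatedly relies on:

```python
# Sketch: expertise scores from the asker -> answerer interaction graph.
# The interaction data and the use of PageRank are illustrative.
import networkx as nx

G = nx.DiGraph()
# An edge (asker, answerer) means "asker received an answer from answerer".
interactions = [("ann", "bob"), ("carl", "bob"), ("ann", "carl"),
                ("dave", "bob"), ("bob", "eve")]
G.add_edges_from(interactions)

expertise = nx.pagerank(G, alpha=0.85)
for user, score in sorted(expertise.items(), key=lambda kv: -kv[1]):
    print(f"{user}: {score:.3f}")
```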

Evaluating Web Application Usability through Gaze Tracking

Martin Janík
master study, supervised by Mária Bieliková

Abstract. Usability, also known as quality of use, is a feature which can fundamentally influence the success rate of an interactive application. Evaluation of usability depends on the type of application and on the person who uses it. For web applications, we often do not know the set of users. The only thing we can know is the specific user groups, but they usually represent an open and sometimes dynamically changing community. Through research on implicit feedback gained from user-application interaction, we can evaluate usability; for example, we can detect adverse behaviour, resolve it and improve the use of the application.

Basic usability testing provides a sufficient amount of data to help us evaluate the design of an application. Gaze tracking brings a new aspect to usability evaluation: it offers information about which objects attract attention and why. By following the order in which objects are gazed at, we can tell how users search through web applications, forming specific gaze patterns. In the book "How to Conduct Eyetracking Studies", Jakob Nielsen and Kara Pernice claim that using heatmaps to analyze gaze tracking data requires about 30 users per heatmap, which makes gaze tracking a very expensive research method.

We aim to create a method for usability testing in a specific domain of applications, such as e-learning systems or content management systems, on the basis of implicit feedback, particularly gaze tracking. We want to investigate the possibility of generalizing usability tests across web applications from the same domain. Our goal is also to answer the question: "Is it possible to create a method which reduces the number of users needed for usability testing, while preserving the value of the acquired data for evaluation?"

Dynamic User Modeling and Behavior Prediction

Ondrej Kaššák
doctoral study, supervised by Mária Bieliková

Abstract. Users carry out different activities on the web and thereby leave a characteristic track there. Whether we consider the web as a whole or only selected parts of it, such as specific portals, it is necessary to filter the traces of user behavior appropriately and use them to represent the user's characteristic features.

With the present trend of increasing data volume and frequent updates, it is necessary to model user features efficiently and to respond dynamically to changes in the user's preferences in near real time. By capturing the appropriate user steps, it is possible to identify the characteristic traits of the user's behavior, measure their similarity to other users, and also predict their most probable future actions. Based on this knowledge, we can then personalize the displayed content or the functionality of the environment in which the user is currently moving on the web. Based on this information, we can also better estimate how the user will react to the personalization.

We plan to predict future user behavior based on behavioral patterns typical of the analyzed domains, or on frequent reactions of other users. Methods for trend prediction, finding correlations in the data, and decision-making currently tend to be used for these kinds of tasks.

Game-based Support of Online Learning

Peter Kiš
master study, supervised by Jozef Tvarožek

Abstract. Lack of student motivation is one of the main barriers to efficient learning. In the case of online learning, natural human and social aspects are suppressed as well, so the lack of motivation causes even worse results. Therefore, research keeps looking for new ways to increase students' motivation for learning in an online environment.

Games and gaming principles improve entertainment and increase the overall involvement of students, and both are increasingly used in the online environment. Using games, game principles, graphical visualization and entertaining content for teaching programming opens the way to exploring the impact of these elements on the learning process, the speed of acquiring new knowledge, and the ability to select the most appropriate procedures for solving algorithmic problems.

Personalized Recommendation for New Users Using Context

Robert Kocian
master study, supervised by Michal Kompan

Abstract. A user's very first interactions with a personalized system are crucial from the recommender system and user modeling points of view. These moments are critical because in them the user forms a relationship and an opinion which influence their further use of the system. If the user is new to the system, we have no information about their preferences, and thus no recommendations, or only trivial ones, can be made.

Today we are experiencing a huge growth of social networks, where users are grouped according to their interests, work or relationships. We can use these attributes for recommendations to new users. We can obtain a huge amount of information from related or similar users that can be used to increase the quality of recommendations for the new user. We can also consider the social context obtained from other systems and applications.

In our work we analyze the current approaches to the new user problem in the context of different types of personalized recommendation. We explore the possibility of obtaining additional information about the new user from social networks and other systems. The aim of our work is to design methods for personalized recommendation with an emphasis on solving the new user problem, enhanced by the user's context from related and similar users.

Recommendation Systems in Professional and End-User Software Engineering

Martin Konôpka
doctoral study, supervised by Pavol Návrat

Abstract. Software developers encounter many obstacles during development and maintenance because of a lack of experience with new technologies, missing documentation, or unsolved problems with third-party libraries. Inexperience becomes even more relevant in the case of novice developers struggling to come out with a possibly innovative mobile app. Recommendation systems find their application in software engineering by easing the solution of problems throughout the whole software development lifecycle. We study the areas of recommendation in software engineering: monitoring software development and developers' activity, and presenting the actual recommendations.

Moreover, we analyze possible applications of recommendation systems in software engineering for users without deep knowledge of software development. This role is taken by end users, who are able to customize given software to their own intent when provided with the right tools. End-user software development has become almost ubiquitous in recent years with the advancements on the Web and the smart device market. Therefore, we study possible applications of recommendation systems in both professional and end-user software engineering.

Extracting Keywords from Movie Subtitles

Matúš Košút
bachelor study, supervised by Marián Šimko

Abstract. Keywords and keyphrases, although missing the context, can be very helpful in finding, understanding and organizing content. They are generally used by search engines to help find relevant information. With the rising amount of information available on the Web, keywords are becoming more and more important, yet it is ever harder to assign keywords to all content manually, so we target automatic keyword extraction.

Movies and video content are becoming massively available and widespread. The ability to automatically describe and classify videos has a vast domain of application. In our work we aim at keyword extraction from movie subtitles, which appears to be more efficient compared with video and audio analysis. We propose to use the metadata included in subtitles in combination with a keyword extraction algorithm to get more accurate results.
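A sketch of such a pipeline: strip the cue numbers and timestamps of SRT files and rank the remaining words by TF-IDF against other subtitle files; the file names are made up, and scikit-learn's vectorizer merely stands in for the extraction algorithm to be proposed:

```python
# Sketch: TF-IDF keywords from SRT subtitles (file names illustrative).
import re
from sklearn.feature_extraction.text import TfidfVectorizer

TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> .*")

def subtitle_text(path):
    """Drop cue numbers and timestamps, keep only the spoken lines."""
    lines = open(path, encoding="utf-8").read().splitlines()
    return " ".join(l.strip() for l in lines
                    if l.strip() and not l.strip().isdigit()
                    and not TIMESTAMP.match(l.strip()))

corpus = [subtitle_text(p) for p in ["alien.srt", "gravity.srt", "up.srt"]]

vec = TfidfVectorizer(stop_words="english", max_features=5000)
tfidf = vec.fit_transform(corpus)
terms = vec.get_feature_names_out()

row = tfidf[0].toarray().ravel()                      # first movie's scores
print([terms[i] for i in row.argsort()[::-1][:10]])   # its top 10 keywords
```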

Modeling Developer’s Expertise in Software House Environment

Eduard Kuric
doctoral study, supervised by Maria Bielikova

Abstract. Evaluating the expertise of developers is critical in software engineering, in particular for effective code reuse. In a software company, the technical and expert knowledge of employees is not usually represented in a unified manner, and it is difficult to measure or observe directly. The level of a developer's expertise is problematic to determine or estimate automatically. For example, to establish a developer's expertise with a technology (library) exactly, we would need to have the developer solve a test. However, it is often a problem to motivate developers to take a test, and people have different standards for judging the degree of expertise. Therefore, our approach is based on automatically estimating relative expertise, taking other developers into consideration and comparing them with each other within the company.

Our idea is to establish a developer's expertise automatically by monitoring their working activities during coding in the integrated development environment (IDE), and by analyzing and evaluating the resulting source code they create and commit to the local repository. By applying our approach we are able to observe and evaluate different indicators. For example, we can spot a developer who often copies and pastes source code from an external source (the Web). The source code contributions of such a developer can be related back to the software project; moreover, this can reveal a reason for frequent mistakes or low productivity in comparison with other developers.

Observing and Utilizing Tabbed Browsing Behaviour

Martin Labaj
doctoral study, supervised by Mária Bieliková

Abstract. We focus on observing, analysing, logging and utilizing tabbed browsing, both within adaptive web-based systems and on the open Web. Adaptive systems in general use information about users, content, etc. Implicit user actions expressed through tabbing could be used to improve user models and domain models, or to aid recommendation. Tabbing is currently regarded as a more faithful notion of user actions during browsing than previous models, which considered visits to resources in a linear fashion and disregarded the possibility of having multiple pages opened at once, and thus returning to them without repeating the page load. Users do browse the Web in tabs, and they do so for various reasons in different scenarios: keeping a page opened as a reminder to do or read something later, finding additional information about a topic on a given page, etc.

Parallel browsing behaviour, however, cannot be reliably inferred from typical server logs. It can be observed with the aid of client-side scripts embedded within web pages (observing all users of a single web application) or with a browser extension (observing tabbing across all web applications visited in the augmented browser, but only within the smaller group of users who chose to install the tracking extension). We previously implemented a tabbing logger within the ALEF adaptive learning system, and on the open Web within the Brumo browser extension. We experimented with the single-application logger by considering tab switch delays and switch pairs for recommendation and for discovering content relations. We also propose the Tabber browser extension, which allows users to view and analyse their usage of browser tabs, and whose data can serve as a dataset of open Web browsing.

Personalized Scalable Recommendation System

Adam Lieskovský
master study, supervised by Michal Kompan

Abstract. Personalized recommendation is present on almost all major sites on the Web, regardless of their domain. It helps to reduce information overload and brings added value to a site's services. Nowadays the requirements for the scalability and reusability of recommender systems are much more demanding than in the past. Recommendations usually have to be delivered in real time, which, with large data sets and user bases, increases the demands on system resources.

We plan to analyze existing approaches and propose our own method focused on the scalability and performance aspects of recommender systems. We will evaluate our method with a system prototype on large data sets, considering both qualitative and quantitative aspects.

Content Recommendation from Archives of Questions Answered in Communities

Viktória Lovasová
master study, supervised by Ivan Srba

Abstract. Community Question Answering (CQA) sites such as Yahoo! Answers or Stack Overflow have become valuable platforms to create, share and seek a massive volume of human knowledge. The task of question retrieval in CQA aims to resolve one's query directly by finding the most relevant questions (together with their answers) in an archive of past questions.

Archives of questions, however, provide another potential which has not yet been fully explored: recommending solved questions that may be useful for users, e.g., to expand their current knowledge. We propose a method for the personalized recommendation of solved questions considering the user's interests. We will evaluate the method in an existing CQA system.

Game for Video Metadata Acquisition

Aleš Mäsiar
bachelor study, supervised by Jakub Šimko

Abstract. Over the past few years, the importance of metadata has grown quite rapidly. For people it has become natural to attach some metadata to the information they put on the web. However, in many cases this alone is not sufficient for efficient information processing, so the topic of metadata acquisition is still open to new ideas. Basically, there are three approaches to collecting semantics: expert work, crowd work, and machine-based approaches. In our work we focus on crowdsourcing because, in contrast to hired experts, crowds can produce much higher quantities of results, and unlike machine approaches, the results are still of reasonable quality.

The intent of our work is to create a game with a purpose which acquires metadata for videos that already exist on the web. The game has to be challenging and entertaining enough to retain its players, while still producing metadata of sufficient quality.

Our design is based on searching for videos that are playing on the screen using a provided search engine. The original name of the video is hidden from the players, so they need to type the keywords which, in their opinion, best describe the video they see. After several attempts they will probably find the video, and we can use their unsuccessful attempts as acquired metadata. The main principle of the game is quite entertaining in its nature, and the gaming experience is further strengthened by the pleasure of discovering new videos or just watching videos that players find funny or enjoyable.

Determining the Parts of Speech in the Slovak Language

Dalibor Mészáros
bachelor study, supervised by Márius Šajgalík

Abstract. Determining parts of speech is a human process performed almost subconsciously. Individuals learn this skill and hone it during early childhood at elementary school. Even in adulthood, when we do not need to determine parts of speech on an everyday basis, we still somehow intuitively know which word belongs to which category. That is where machine learning often gets stuck: in determining atomic processes and patterns which are absolutely trivial for humans, yet hard to explain.

Of course, there already are many implementations and techniques for facing this problem. They are known as part-of-speech (POS) taggers, but they mostly focus on languages which use simple patterns or universal rules, such as English. The most common POS taggers use dictionaries and sets of rules, which are highly limited. The whole problem of part-of-speech tagging stops being trivial for Slovak, a highly complex and morphologically rich language with many words and even more exceptions to its rules.

Our goal is to explore the possibilities of Conditional Random Fields (CRF), which are nowadays used in many solutions for other languages. They focus on the probability of a word's position in the sentence, so even if we cannot determine parts of speech with 100% accuracy, we can overcome many limitations of existing solutions. We will also implement a POS tagger for our native language.
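A sketch of how a CRF tagger is typically set up, here with the sklearn-crfsuite package; the feature set, the toy training pairs and the library choice are assumptions. Each token becomes a dictionary of features, and suffixes carry much of the morphological signal in Slovak:

```python
# Sketch: CRF POS tagging with per-token features (sklearn-crfsuite assumed).
import sklearn_crfsuite

def token_features(sent, i):
    """Surface features; suffixes matter in a morphologically rich language."""
    w = sent[i]
    return {
        "lower": w.lower(),
        "suffix2": w[-2:],
        "suffix3": w[-3:],
        "is_title": w.istitle(),
        "prev": sent[i - 1].lower() if i > 0 else "<s>",
    }

# Tiny illustrative training data: (tokenized sentence, one tag per token).
train = [(["Pekný", "deň"], ["ADJ", "NOUN"]),
         (["Starý", "hrad"], ["ADJ", "NOUN"])]

X = [[token_features(s, i) for i in range(len(s))] for s, _ in train]
y = [tags for _, tags in train]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)

test = ["Nový", "dom"]
print(crf.predict([[token_features(test, i) for i in range(len(test))]]))
```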

Activity-based Search Session Segmentation

Samuel Molnár
master study, supervised by Tomas Kramar

Abstract. Nowadays, identification of the user's search goal is an essential and also challenging task for search engines seeking to improve personalisation. Knowing the search goal and all the queries supporting it helps the engine understand our query and adjust the ranking of relevant web pages or other documents according to our current information need. To improve goal identification, the engine uses other features of the user's search context and combines them together in order to identify user preferences and interests. However, most factors utilized for goal identification involve only lexical analysis of the user's queries and time windows represented as short periods of the user's inactivity.

In our work, we focus on utilizing user activity during search to extend the existing lexical and time factors. By analyzing search activity such as clicks and dwell time on search results, we better understand which search results are relevant to the user's current information need. Thus, we utilize the user's implicit feedback to determine the relatedness of queries through the search results they share. Similar queries share a noticeable number of search results; therefore, we propose a method that utilizes shared links and implicit feedback (clicks, dwell time and skipped search results) as factors of query similarity. We plan to integrate our model of weighted factors, utilizing user activity and semantic analysis, into existing search engines or servers like Elasticsearch.
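A sketch of this similarity factor: each query is represented by the results the user clicked, weighted here by dwell time, and two queries are compared by a weighted Jaccard measure over those results (all data illustrative; skipped-result penalties are omitted):

```python
# Sketch: query relatedness from shared clicked results (weighted Jaccard).
# Weights are dwell times in seconds; all values are illustrative.
clicks = {  # query -> {result URL: dwell time}
    "python sort list":   {"docs/sorting": 120, "so/q/113090": 45},
    "sort list python 3": {"docs/sorting": 90,  "so/q/403421": 10},
    "cheap flights":      {"kayak.com": 200},
}

def weighted_jaccard(a, b):
    urls = set(a) | set(b)
    shared = sum(min(a.get(u, 0), b.get(u, 0)) for u in urls)
    total = sum(max(a.get(u, 0), b.get(u, 0)) for u in urls)
    return shared / total if total else 0.0

q1, q2 = "python sort list", "sort list python 3"
print(weighted_jaccard(clicks[q1], clicks[q2]))               # ~0.51
print(weighted_jaccard(clicks[q1], clicks["cheap flights"]))  # 0.0
```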

Exploratory Search and Navigation in Digital Libraries

Robert Moro
doctoral study, supervised by Mária Bieliková

Abstract. A typical task that researcher novices, such as master's or doctoral students, have to face is the exploration of a new domain. Their goal is not to find specific facts, but to learn about the problem or the given domain and to investigate the topics and existing approaches, as well as the gaps in the current state of knowledge. Digital libraries provide various means of supporting exploratory search and navigation, such as faceted search or tag clouds, which utilize documents' metadata and semantics. However, the most natural way of navigation seems to be browsing, which does not force users to split their attention between the navigation interface and the search results.

In our work, we propose a method of exploratory search and navigation using navigation leads automatically extracted from the research articles' summaries (or abstracts), which help to filter the information space of a digital library with the purpose of finding relevant documents. In the process of identifying and selecting the navigation leads, we consider a lead candidate's relevance to the document in which it is identified, together with its navigational relevance, i.e., its potential to lead to documents relevant to the particular user. Other aspects, such as the novelty and diversity of the leads, are considered as well. We evaluate our approach in the web-based bookmarking system Annota by means of a long-term user study.

Gamification of Metadata Generation for Multimedia Content

Martin Polakovic
bachelor study, supervised by Jakub Šimko

Abstract. With the ever-increasing adoption of Internet-related technologies facilitating the boom of user-generated content, there is a growing importance of searching, filtering, organizing and analyzing the data, which necessitates the acquisition of meta-information (metadata) in order to recognize and manipulate the data. At the intersection of crowdsourcing and entertainment lies a special approach to gathering data: games with a purpose (GWAPs), attempting to obtain data otherwise unreachable by machines.

Our aim is to generate metadata for existing multimedia content specifically, utilizing elements of gamification in the context of crowdsourcing. We intend to fuse gaming concepts with data gathering while maximizing both entertainment and data gathering efficiency. We hope to bring originality into GWAP design and push the boundaries of the GWAP phenomenon a little further.

Extracting Pure Text from Web Pages

Helmut Posch
bachelor study, supervised by Michal Kompan

Abstract. Nowadays, plenty of websites contain far more data than just the main content. Additional content on websites includes navigation bars, page headers and footers, privacy notices, and especially advertisements. Website owners add this content to earn money or to make the website better looking. For the user, this additional content is often useless, and they have to search for the main content of the website. The main text content of a website is also needed when designing text processing methods applied to the web.

My first goal is to survey existing methods for web content extraction with respect to their dependence on domain and language and their success rate. The next step is to implement my own method for website content extraction and compare its success rate with the others.

Tool Support for XML Documents Creation on the Web Portal

Martin Prekala
bachelor study, supervised by Marián Šimko

Abstract. There are many advantages to using XML for keeping data in text form. For instance, you may preserve data structures such as arrays or trees, you may define your own tags and their behavior when displayed, or you may use XML tag sets predefined by the well-known W3C consortium to simplify displaying the data in other applications. However, working with raw XML requires know-how that is not easy to learn, and it is neither simple nor user friendly, which reveals the need for a WYSIWYG XML editor.

My task is to analyze and review existing WYSIWYG editors, and then propose and implement a WYSIWYG XML editor for the web portal COME2T, a content management tool used at our faculty. This editor should make creating new texts simple, be easy to use in order to speed up the work, and provide versatile functionality for handling the text.

Employing Information Tags in Software Development

Karol Rástočný
doctoral study, supervised by Mária Bieliková

Abstract. Management of the software development process is a crucial part of software engineering, on which the success of software projects depends. This management mostly relies on the quality and freshness of software metrics and on analyses over these metrics. Software metrics can be based on source code or on empirical data about software developers. Code-based metrics are well known, and many approaches based on them have been proposed. But empirical software metrics are still an uncovered part of software engineering, even though they contain important information about the software development process and can be used, e.g., for forecasting significant trends, similarly to empirical data (e.g., implicit user feedback) in web engineering. The reason for this state is that collecting empirical data is time-consuming and error-prone. We proposed a solution to these problems based on collecting, storing and maintaining developer-oriented empirical data abstracted into information tags, together with empirical software metrics.

Nowadays we are working on the proposal and evaluation of methods for the automatic generation of information tags from a stream of events, and for the automatic maintenance of information tags. As the core of these methods, we proposed an information tag generator which queries the stream of events in RDF format and executes tagging rules after their queries evaluate successfully. These tagging rules can be defined manually or learned automatically by analyzing modifications in the information tag space.

Explicit User Input Quality Determination Based on Implicit User Input

Metod Rybár
master study, supervised by Mária Bieliková

Abstract. Collecting implicit input from users interacting with a system can provide us with valuable information. Tracking eye or cursor movement, sweating, or heart rate can tell us a lot about the physical and psychological experience users have when interacting with the system.

These implicit data can then be used to analyse the user's explicit input and to make assumptions and predictions about the quality of that input. This can be helpful when evaluating psychological tests or in interactions like age verification. In this work we will review existing methods and try to develop a new method for explicit input evaluation and quality prediction based on implicit user input.

Representation of Documents in Latent Feature Vector Space

Márius Šajgalík
doctoral study, supervised by Mária Bieliková

Abstract. We focus our research on modelling discriminative representations of text documents. In the past, we tried to employ WordNet to extract the key concepts of documents. Later, we studied unsupervised neural-network-based methods for learning latent features of words instead of hand-crafted concepts.

Recently, we have experimented with a novel keyword extraction method which is based on feature vectors of words and learns feature vectors of whole documents. We showed that it is possible to move a level up from representing words to representing documents in the same feature vector space.

Our next goal is to move another level higher and learn feature vectors of users based on the feature vectors of the documents they read. Successful completion of this last step will demonstrate the modular learning scheme of our approach.
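A sketch of what this final step could look like, under the assumption (ours, for illustration) that a user vector is simply the average of the vectors of the documents read; gensim's Doc2Vec stands in for the document representation, with a toy corpus:

```python
# Sketch: user vectors averaged from Doc2Vec document vectors (gensim 4.x).
# The corpus and the reading history are toy data.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(["neural", "networks", "learn", "features"], ["d0"]),
        TaggedDocument(["graph", "databases", "store", "edges"], ["d1"]),
        TaggedDocument(["deep", "learning", "word", "vectors"], ["d2"])]

model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)

read_by_user = ["d0", "d2"]  # documents this user has read
user_vector = np.mean([model.dv[d] for d in read_by_user], axis=0)

# The user lives in the same space as the documents, so unread documents
# can be ranked against the user vector for recommendation.
print(model.dv.most_similar([user_vector], topn=2))
```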

Stream Data Processing

Jakub Ševcech
doctoral study, supervised by Mária Bieliková

Abstract. During the last few years we have heard the buzzword Big Data all around us. The definition of this term is rather fuzzy, but one of the most frequent is that it is a common name for techniques for processing data characterized by large volume, high velocity and/or high variability. The most common technique for dealing with such data is batch processing. However, in many applications this type of processing is not viable, mainly due to the delays caused by batch job processing time. When we require real-time processing, we have to reach for stream processing.

In our work we focus on processing streams of data, developing various methods for data analysis and data mining. The main challenge is to be able to compute a large number of different metrics over the streamed data, and to do so in a single pass through the data under strict memory limitations. We proposed a time series representation based on transforming repeating patterns in the course of the time series into a sequence of symbols. We evaluate the applicability of this representation in various time series analysis tasks, focusing on the possibilities of such analysis not only over static collections of data, but mainly over streams of data.
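The following is a generic sketch of the pattern-to-symbol idea (not the proposed representation itself): cut the series into fixed-length windows, cluster the windows, and replace each window by its cluster label, so that a long numeric stream becomes a short string of symbols:

```python
# Sketch: turning repeating patterns of a time series into symbols.
# Fixed-length windows are clustered; each window becomes its cluster id.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
t = np.arange(400)
series = np.sin(t / 10.0) + 0.1 * rng.standard_normal(400)

W = 20  # window length (illustrative)
windows = series[: len(series) // W * W].reshape(-1, W)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(windows)
symbols = "".join(chr(ord("a") + label) for label in kmeans.labels_)

# Windows containing the same repeating pattern share a symbol.
print(symbols)
```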

Processing and Comparing of Data Streams Using Machine Learning

Miroslav Šimek
master study, supervised by Michal Barla

Abstract. Among the many different approaches to machine learning, multilayered self-teaching neural networks (also known as Deep Belief Networks) using unsupervised learning are gaining popularity nowadays. For almost 40 years they were not accepted and were largely ignored by most experts in the machine learning community, one reason being simply the insufficient computational power of the technology available at the time. Today, however, they are already producing interesting results, for example in computer vision.

One of the crucial attributes of software is usability. Devices like eye trackers are already here to help us monitor user activity, but they produce large streams of unlabeled data which need to be evaluated. Comparing these streams with the expected usage of the application, also captured by the eye tracker, may greatly improve this evaluation in many different ways. Our goal is to find methods and new ways of training that utilize the potential of multilayered neural networks and unsupervised learning to process and compare large streams of unlabeled data from an eye tracker.

Recommendation Using Graph Data Structures

Dávid Slezák
bachelor study, supervised by Dušan Zeleník

Abstract. Recommendation is the process of suggesting the best course of action to users. The subject of recommendation is an object which is suggested to the user (music, movies, articles, ...). These suggestions can be adapted to the user (personalised) or be the same for all users (non-personalised). There are two basic types of recommendation: collaborative filtering and content-based filtering. Whichever of these methods we dedicate ourselves to, we always work with similarity. In my project I will mostly use collaborative filtering, i.e., an approach that generates recommendations based on the preferences of other users.

A graph database is a database that uses graph structures with nodes, edges and properties to represent and store data. Graph databases can be better suited for representing the links between users and items, which form shallow, broad network graphs.

In a graph database, I will try to find articles to recommend based on data about what the readers of a selected article have also read. I want to find out how to get the required result in one graph database and then compare it with the results from another graph database. Since there may be small differences in the way the data is accessed, we may also get small differences in the results or in execution time.
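
The core traversal is "readers of this article also read …". A minimal in-memory sketch of that co-reading count (plain Python standing in for a graph database query; the toy data is an illustrative assumption):

```python
from collections import Counter

# toy graph: user -> set of articles they have read (illustrative data)
reads = {
    "u1": {"a1", "a2", "a3"},
    "u2": {"a1", "a2"},
    "u3": {"a1", "a4"},
}

def also_read(article, top=5):
    """Traverse article -> readers -> their other articles, count paths."""
    counts = Counter()
    for user, articles in reads.items():
        if article in articles:
            counts.update(articles - {article})
    return counts.most_common(top)

print(also_read("a1"))  # [('a2', 2), ('a3', 1), ('a4', 1)]
```

In a graph database the same logic corresponds to a two-hop traversal from the article node over the "read" edges, which is exactly the kind of query whose formulation and performance can be compared across databases.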

Tag Recommendation for Resources in Digital Libraries

Igor Špánik
bachelor study, supervised by Róbert Móro

Abstract. Recommender systems are a part of many websites nowadays. Whether a system is based on examining and comparing the similarity of users (Collaborative Filtering) or of content items (Content-based Filtering), or combines both approaches (Hybrid Filtering), recommender systems help users find appropriate music, photos, videos or other items. A subset of recommender systems is tag recommendation. Tags are used to describe the content of items and to organize resources. Tag recommendation serves to simplify the process of adding tags to items, as well as to support convergence of the emerging folksonomy.

In our work, we deal with issues related to effective real-time tag recommendation in the domain of digital libraries. We aim to propose and implement our own effective method that will take into account especially the users' context. We will evaluate our method in a system called Annota, which is used to organize and share bookmarks.
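
A minimal sketch of context-aware tag scoring (not the proposed method; using the user's own tagging history as a stand-in for context, with an illustrative linear mix and weight):

```python
from collections import Counter

def score_tags(candidates, global_counts, user_counts, alpha=0.7):
    """Blend how common a tag is overall with how often this user uses it."""
    g_total = sum(global_counts.values()) or 1
    u_total = sum(user_counts.values()) or 1
    return sorted(
        ((t, alpha * user_counts[t] / u_total
             + (1 - alpha) * global_counts[t] / g_total)
         for t in candidates),
        key=lambda x: x[1], reverse=True)

glob = Counter({"ml": 50, "nlp": 30, "web": 20})   # whole library
user = Counter({"nlp": 5, "web": 1})               # this user's history
print(score_tags(["ml", "nlp", "web"], glob, user))
```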

Answerer-oriented Adaptive Support in Community Question Answering

Ivan Srba
doctoral study, supervised by Mária Bieliková

Abstract. In situations when Internet users are not able to find required information by means of standard information retrieval systems (especially web search engines), they have the possibility to ask their questions in popular Community Question Answering (CQA) systems such as Yahoo! Answers or Stack Overflow. The main goal of CQA systems is to harness the knowledge potential of the whole community to provide the most suitable answers to recently posted questions in the shortest possible time. In our project, we present a novel perspective of answerer-oriented approaches to support knowledge sharing in CQA systems. We reflect this perspective in the proposal of a question routing method, probably the most important part of each CQA system: the recommendation of potential answerers who are most likely to provide an appropriate answer to a newly posted question. The proposed method takes into account the preferences and expectations of answerers, while state-of-the-art approaches focus primarily on askers.
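
A minimal sketch of the matching step underlying question routing: represent a new question and each answerer's past answers as bag-of-words vectors, rank answerers by cosine similarity, and scale by the answerer's stated preference. This is an illustrative baseline, not the proposed method:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def route(question, answerers, preference):
    """Rank answerers by similarity to the question, scaled by preference."""
    q = Counter(question.lower().split())
    profiles = {name: Counter(" ".join(txts).lower().split())
                for name, txts in answerers.items()}
    return sorted(((name, preference.get(name, 1.0) * cosine(q, prof))
                   for name, prof in profiles.items()),
                  key=lambda x: x[1], reverse=True)

answerers = {"alice": ["python pandas dataframe merge"],
             "bob": ["java spring dependency injection"]}
print(route("how to merge two pandas dataframes",
            answerers, preference={"alice": 1.0, "bob": 0.8}))
```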

In addition, we consider CQA systems to be innovative learning environments, whose potential for supporting organizational knowledge sharing and collaborative learning is yet to be discovered. We are developing a CQA system named Askalot, designed specifically for universities, where students can take advantage of the learning aspect of the question answering process. Askalot will also provide us with the possibility to study concepts of CQA in an organizational environment and, consequently, to apply the proposed method in a live experiment.

Analysis of Methods of Interactive Problem Solving

Ján Stibila
bachelor study, supervised by Jozef Tvarožek

Abstract. There are many more or less suitable ways of solving interactive problems. The main aim of my work is to explore approaches to solving the game 2048 and to compare the quality of the different strategies that players have developed.

This will require analysing the game itself as well as the recorded gameplay of many players. The gameplay records will be enhanced with eye-tracking records, which allows us to get an overview of which parts of the game board players focus on more in different situations.

For that, we need to develop not only tools for recording gameplay, but also methods and tools for analysing the huge amount of gathered data. In conclusion, we will derive specific solving algorithms from the most successful players' strategies and compare them to existing algorithms and heuristic solutions.
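
As a reference point for such comparisons, here is a minimal sketch of one common hand-crafted 2048 strategy: a greedy one-move lookahead that prefers boards with more empty cells and more monotone rows (the weights and the simplified move mechanics are illustrative assumptions):

```python
def slide_row(row):
    """Slide and merge one row to the left (the basic 2048 move step)."""
    tiles = [v for v in row if v]
    out, i = [], 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)    # merge equal neighbours once
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out))

def rotate(board):                      # rotate 90 degrees clockwise
    return [list(r) for r in zip(*board[::-1])]

def move(board, direction):
    """direction: 0=left, 1=right, 2=up, 3=down."""
    turns = {0: 0, 1: 2, 2: 3, 3: 1}[direction]
    for _ in range(turns):
        board = rotate(board)
    board = [slide_row(r) for r in board]
    for _ in range((4 - turns) % 4):
        board = rotate(board)
    return board

def heuristic(board):
    """Prefer boards with more empty cells and more monotone rows."""
    empty = sum(v == 0 for row in board for v in row)
    mono = sum(row == sorted(row) or row == sorted(row, reverse=True)
               for row in board)
    return 10 * empty + mono            # illustrative weights

board = [[2, 2, 4, 0], [0, 0, 2, 0], [0, 0, 0, 0], [0, 0, 0, 2]]
best = max(range(4), key=lambda d: heuristic(move(board, d)))
print("greedy move:", ["left", "right", "up", "down"][best])
```

Strategies derived from players' recordings can then be benchmarked against exactly this kind of baseline.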

Implicit Feedback-based Discovery of Student Interests and Learning Object Attributes

Veronika Štrbáková
master study, supervised by Mária Bieliková

Abstract. Web systems often store data about their visitors. This data is gathered through explicit or implicit feedback from the user, and it reflects the user's characteristics: their skills, goals, preferences and habits, their level of knowledge, the ways they access the concepts on the website, and their behavior during the visit. These characteristics form the basis of the user model, and the data important for deriving them may vary depending on the system's domain.

In this work, we concentrate on user modeling in the domain of education, which is why the user characteristics we use come from monitoring the users studying and working in an adaptive web system. This system – compared to a standard web system – is enriched with the ability to adapt to the needs of the user.

Our aim is to accurately determine the time a student actually spends studying, by observing them through a webcam. From the measured values, we plan to assess the student's knowledge of individual concepts more accurately, which will allow for more accurate recommendation of the content that the student should study.
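
A minimal sketch of the first step: reducing raw time on a page to attention-weighted study time from webcam-based attention samples, and letting the knowledge estimate grow with that instead of raw time. The sampling model, threshold and update rate are illustrative assumptions, not the proposed method:

```python
def effective_study_time(samples, period=1.0):
    """samples: per-second booleans from a webcam attention detector
    (True = student is looking at the learning material)."""
    return sum(samples) * period

def knowledge_update(prior, time_on_concept, effective, rate=0.003):
    """Scale the knowledge gain by the fraction of time actually spent
    studying rather than by raw time on page."""
    attention = effective / time_on_concept if time_on_concept else 0.0
    return min(1.0, prior + rate * time_on_concept * attention)

samples = [True] * 90 + [False] * 30            # 2 minutes, 75 % attentive
eff = effective_study_time(samples)
print(knowledge_update(prior=0.4, time_on_concept=120, effective=eff))
```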

Presentation of the Results of Personalized Recommendations

Martin Svrček
master study, supervised by Michal Kompan

Abstract. Nowadays, personalized recommendations are widely used and very popular. We can see many systems in various fields which use recommendations for different purposes. However, one of the basic problems is users' distrust of recommender systems, which they consider an intrusion into their privacy. Therefore, it is important to make recommendations transparent and understandable to users.

In this context, we want to propose several methods for presenting the results of recommendations. Our aim is to use a standard recommendation technique and focus on different approaches to obtaining data about users, and to the visualization and explanation of recommendations.

We also want to verify the suggested approaches in a selected application domain (news, movies, education, …) in order to obtain statistically significant results through the use of implicit and/or explicit feedback.

Using Parallel Web Browsing Patterns on Adaptive Web

Martin Toma
master study, supervised by Martin Labaj

Abstract. The possibility of using browser tabs as a tool for parallel web browsing is definitely not new. In recent years, however, more and more people have come to use this feature every day. Despite that, little research has been done to deeply analyze why, and more importantly how, people use this browsing mechanism. Few solutions have aimed to utilize this information to add value for web browser users, and fewer still have managed it.

We focus on identifying patterns in browser tab usage data and connecting them into meaningful actions. After recognizing these actions, we provide appropriate recommendations, focusing mainly on in-browser tab action recommendations that increase users' productivity. Possible actions and corresponding recommendations are: (1) Action: the user closed several correlated tabs within a small time interval. Recommendation: close the remaining correlated tabs automatically. (2) Action: the user opened another tab, increasing the number of tabs to a certain limit. Recommendation: close the tabs accessed the longest time ago.
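
A minimal sketch of detecting pattern (1) from a tab event log: if several tabs from one correlated group are closed within a short window, suggest closing the rest of the group. The group assignment, window length and threshold are illustrative assumptions:

```python
def suggest_close(events, groups, window=10.0, threshold=3):
    """events: (timestamp, action, tab_id) tuples in chronological order.
    groups: tab_id -> group label (e.g. tabs opened from one page)."""
    closes = [(t, tab) for t, a, tab in events if a == "close"]
    for i, (t0, tab0) in enumerate(closes):
        burst = [tab for t, tab in closes[i:]
                 if t - t0 <= window and groups.get(tab) == groups.get(tab0)]
        if len(burst) >= threshold:
            remaining = [tab for tab, g in groups.items()
                         if g == groups.get(tab0) and tab not in burst]
            if remaining:
                return f"Close remaining tabs too? {remaining}"
    return None

groups = {1: "shop", 2: "shop", 3: "shop", 4: "shop", 5: "news"}
events = [(0.0, "close", 1), (2.5, "close", 2), (4.0, "close", 3)]
print(suggest_close(events, groups))  # suggests closing tab 4
```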

The main output of this work is a web browser extension that captures parallel browsing activity and also provides action recommendations. The Brumo project, a Mozilla Firefox extension, currently allows capturing parallel web browsing activity, but the nature and structure of the logged data are not suited to the purpose of recommendation. We are therefore implementing TabRec, a Google Chrome extension which captures tab usage data best suited for the purpose of this work: recommendation of browser actions based on parallel web browsing.

Analysis of User Behaviour in Web Applications

Peter Truchan
master study, supervised by Mária Bieliková

Abstract. In this thesis, we would like to identify, analyse and summarise the best metrics that can be used for tracking user behaviour in web applications, and to evaluate users' behaviour based on these metrics. Nowadays there are some very advanced commercial technologies for site tracking and analysis, e.g. Google Analytics or Adobe Analytics. These collect data about the user and his behaviour, but in the end they only present data about a concrete web page or web site. We would like to focus more on the visitor of the web application and on evaluating his path through the website. If we can identify the most important metrics that tell us whether the user is on the right path to success, we can also tell whether the user is lost and needs help to achieve his goal.

We would like to analyse the data and build a model of user behaviour. The first step will be to obtain and process the measured data; then we would like to design a model for evaluating these data. The goal is not only to identify bad pages, but also to identify groups of people and their common characteristics, which will tell us whether they are unfamiliar with the application or lost in it. If we achieve this, we will be able to improve bad pages or help these people with a special approach. This thesis will discuss and try to improve the modern way of developing web applications through testing, evaluating, improving and targeting website content.
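
One basic path metric of this kind can be sketched as a funnel: given an expected path to success, count how many sessions reach each step; a large drop between steps hints at a problematic page. The funnel and session data below are illustrative assumptions:

```python
def funnel_dropoff(sessions, funnel):
    """sessions: lists of visited pages; funnel: expected path to success.
    Count sessions reaching each funnel step, in order."""
    reached = [0] * len(funnel)
    for path in sessions:
        pos = 0
        for page in path:
            if pos < len(funnel) and page == funnel[pos]:
                reached[pos] += 1
                pos += 1
    return list(zip(funnel, reached))

sessions = [["home", "search", "product", "cart", "order"],
            ["home", "search", "product"],
            ["home", "product", "cart"]]
print(funnel_dropoff(sessions, ["home", "search", "product", "cart", "order"]))
# [('home', 3), ('search', 2), ('product', 2), ('cart', 1), ('order', 1)]
```

Note that this strict variant ignores sessions that skip a step, which is itself a signal worth inspecting: the third session bypasses "search" entirely.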

Determining the Relevancy of Important Words in a Digital Library

Máté Vangel
master study, supervised by Róbert Móro

Abstract. Text mining is a very important discipline which can be used for various kinds of tasks, such as domain modelling, automatic text summarization or navigation in a cloud of keywords. The baseline technique of text mining is keyword extraction. Keywords are usually extracted from the text of the document itself, but in digital libraries there are other possible sources of relevant words for research articles. One of these is the information related to the article, called metadata.

There are many kinds of metadata in digital libraries, for instance keywords provided by the author, the year of publishing, the category in which the article is located, and tags associated by users. Citations can also be considered an important source of keywords, because they characterize the article. They can also highlight different but relevant aspects of the analyzed article which are of interest to other researchers. We aim to extract keywords from research articles via citations, using specific metrics and attributes of citations, for example the citation network, the distance between citations, multiplicity and co-citations.
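
A minimal sketch of the core idea: collect terms from the text surrounding each citation of the analyzed article in citing papers, and weight them by how often that citing paper cites the article (its multiplicity). The weighting scheme and the stop-word list are illustrative assumptions, not the proposed metrics:

```python
from collections import Counter

STOP = {"the", "a", "of", "in", "and", "for", "this", "work"}

def citation_keywords(contexts, top=5):
    """contexts: (citation_context_text, multiplicity) pairs, one per
    citing paper; multiplicity = citations of the article in that paper."""
    scores = Counter()
    for text, multiplicity in contexts:
        for term in set(text.lower().split()) - STOP:
            scores[term] += multiplicity
    return scores.most_common(top)

contexts = [("this work proposed symbolic time series representation", 3),
            ("a stream processing method for time series", 1)]
print(citation_keywords(contexts))
```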

Web Content Extraction

Ondrej Vlček
bachelor study, supervised by Ing. Michal Kompan, PhD.

Abstract. When you take a look at a modern website, you will probably see a beautifully designed piece with rich content. There is a logo somewhere at the top of the page with navigation underneath, a sidebar on the side and a footer at the bottom. The main content is usually in the middle. From the human point of view, all these blocks are necessary, as the visitor should feel comfortable and every piece of information should be just a few clicks away. But when you open the page as plain text, it is a mess. You cannot find any useful information at first glance; all you see is HTML tags mixed with content. This is how the computer "sees" it, and finding the main content of the webpage is much more difficult. Here we go: a challenge for a bored IT student.

Extracting the main content from the source code of a web page is the main task of my bachelor's work. There are some existing solutions; I will analyze them and test them on differently structured websites, measuring their efficiency. After this research, I will compose my own solution, improving on some specific problem of content extraction.
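
One family of existing solutions relies on text density: blocks with lots of visible text and little markup are likely the main content, while navigation and footers are markup-heavy. A minimal sketch of that heuristic (the density measure and thresholds are illustrative assumptions, far simpler than production extractors):

```python
import re

def text_density_blocks(html, threshold=0.5):
    """Split HTML into rough blocks and keep those where visible text
    outweighs markup - a crude main-content heuristic."""
    blocks = re.split(r"</?(?:div|p|td|section|article)\b[^>]*>", html)
    kept = []
    for block in blocks:
        if not block.strip():
            continue
        text = re.sub(r"<[^>]+>", "", block).strip()
        density = len(text) / len(block)
        if density > threshold and len(text) > 40:
            kept.append(text)
    return kept

html = ("<div><a href='/'>Home</a> <a href='/news'>News</a></div>"
        "<p>This is the long main article text of the page, which carries "
        "most of the information a reader actually came for.</p>"
        "<div>© 2014 footer</div>")
print(text_density_blocks(html))  # keeps only the article paragraph
```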

Analysis of Human Activities in Digital Space of the Web

Ľubomír Vnenk
master study, supervised by Mária Bieliková

Abstract. Full concentration on a job is a hard task, mainly if the job is boring, unpleasant or too hard. We unintentionally seek distraction, an occasion to do something different, something more fun. However, breaking our concentration repeatedly results in lower efficiency and worse results.

In the computer environment we are able to analyse the user's behaviour. We can try to detect the moments when the user interrupts his work flow to do something not associated with his actual goal. We plan to use motivation as a driving force to bring him back to productive work. However, we need to analyse and choose the proper form of activity recommendation. We should not force him to get back to work, but just suggest it. Or maybe we should force him to act the right way. Who knows? Experiments will show the best way.
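
A minimal sketch of detecting such interruption moments from an application-switch log (the app categories and the off-task duration threshold are illustrative assumptions):

```python
WORK_APPS = {"ide", "terminal", "docs"}        # illustrative categories

def interruptions(log, min_offtask=60):
    """log: chronological (timestamp, app) events. Yield moments when the
    user left a work app and stayed off-task for at least min_offtask s."""
    for (t0, app0), (t1, app1) in zip(log, log[1:]):
        if app0 in WORK_APPS and app1 not in WORK_APPS:
            # off-task until the next work app appears (or the log ends)
            back = next((t for t, a in log if t > t1 and a in WORK_APPS),
                        log[-1][0])
            if back - t1 >= min_offtask:
                yield t1, app1

log = [(0, "ide"), (300, "browser-social"), (900, "ide"), (1500, "docs")]
print(list(interruptions(log)))  # [(300, 'browser-social')]
```

Each detected moment is a candidate point at which a motivating suggestion to return to work could be shown.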

Modeling Programmer’s Expertise Based on Software Metrics

Pavol Zbell
master study, supervised by Eduard Kuric

Abstract. Knowledge of programmers' expertise and activities in the environment of a software house can be used for more effective task resolution (by identifying experts), better team forming, effective communication between programmers, and personalized recommendation or search in source code, thus indirectly improving overall software quality. The process of modeling a programmer's expertise (building the knowledge base) usually expects as its input some information about the programmer's activities during software development, such as interactions with source code (typically fine-grained actions performed in the IDE), interactions with ITS (issue tracking) and RCS (revision control) systems, activities on the Web, or any other interactions with external documents.

In our research, we focus on modeling a programmer's expertise based on software metrics such as software complexity and source code authorship. We assume that a programmer's expertise is related to the complexity of the source code she interacts with, as well as to her degree of authorship of that code. In the case of software complexity, our idea is to explore alternatives to LOC (lines of code) based metrics, such as weighted AST (abstract syntax tree) node counting or call graph based metrics. With source code authorship, we expect programmers who wrote some code to be experts on that particular code, but we need to consider only certain degrees of authorship, as the code evolves and is changed by other programmers over time. Information acquisition for programmer modeling in our work is based on activity logs from the programmer's IDE. We plan to implement our method as an extension to the Eclipse IDE for Java programmers, and to evaluate it on data from an academic environment or (preferably) a real software house environment.
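
A minimal sketch of how these two signals could combine: per file, expertise grows with the file's complexity and with the programmer's authorship share, which decays as others edit the file. The complexity stand-in, decay factor and combination are illustrative assumptions, not the proposed metrics:

```python
def authorship(initial_share, foreign_edits, decay=0.9):
    """Degree of authorship decays as other programmers edit the file."""
    return initial_share * (decay ** foreign_edits)

def expertise(files):
    """files: per-file (complexity, initial_share, foreign_edits) triples
    for one programmer; complexity could be a weighted AST node count."""
    return sum(c * authorship(share, edits) for c, share, edits in files)

alice = [(120, 1.0, 2),   # wrote the file; 2 later edits by others
         (300, 0.3, 0)]   # co-authored a more complex file
print(round(expertise(alice), 1))  # 120*0.81 + 300*0.3 = 187.2
```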

Search in Source Code Taking into Consideration its Authors' Reputation

Peter Zimen
bachelor study, supervised by Eduard Kuric

Abstract. Nowadays, searching in source code and its reuse are very popular. When programmers try to find the best components for their programs, they have many criteria to consider. One of these criteria is the reputation of the author of the source code. It depends on many factors and is very difficult to determine.

I will analyze these options for searching in source code. My goal is to design a method for ranking programmers.
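
A minimal sketch of one way such a ranking could work: combine a few normalized signals about an author into a single reputation score and use it to re-rank code search results. The chosen signals and weights are purely illustrative assumptions:

```python
WEIGHTS = {"accepted": 0.5, "reuse": 0.3, "tenure": 0.2}  # illustrative

def reputation(author):
    """author: dict of signals normalized to [0, 1], e.g. share of
    accepted commits, how often their code is reused, years active."""
    return sum(w * author.get(k, 0.0) for k, w in WEIGHTS.items())

def rerank(results, alpha=0.7):
    """results: (snippet, text_relevance, author_signals) tuples.
    Blend textual relevance with the author's reputation."""
    return sorted(results,
                  key=lambda r: alpha * r[1] + (1 - alpha) * reputation(r[2]),
                  reverse=True)

results = [("snippet A", 0.9, {"accepted": 0.2, "reuse": 0.1, "tenure": 0.3}),
           ("snippet B", 0.8, {"accepted": 0.9, "reuse": 0.8, "tenure": 0.7})]
print([r[0] for r in rerank(results)])  # B's author reputation wins
```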