Students’ Research Works – Autumn 2011

Search, Navigation and Recommendation

User Modeling, Virtual Communities and Social Networks

Domain Modeling, Semantics Discovery and Annotations

Doctoral Staff

Mária Bieliková
Professor
o collaborative/personalized/context search and navigation
o recommendation for adaptive social web
o reasoning in information space with semantics
o user/user groups and contexts modelling

Michal Barla
Postdoc
o user modeling
o implicit user feedback
o virtual communities
o collaborative surfing

Jozef Tvarožek
Postdoc
o social intelligent learning
o collaborative learning
o semantic text analysis
o natural language processing

Michal Tvarožek
Postdoc
o explorative search
o personalization – adaptation and generation of user interfaces
o information visualization
o navigation in information space

Automated Foreign Language Teacher

Michal Adda
bachelor study, supervised by Jozef Tvarožek

Abstract. The Internet is a great resource for learning a new language. There are many articles, pictures, videos, etc. that could be used for education. Our goal is to utilize these resources and create an application that will help people learn a second language.

The application is aimed at both beginners and more experienced students. Its main purpose will be to extend the student’s vocabulary. The application will take the most frequently used words in a language (extracted from Wikipedia frequency lists), select only certain parts of speech (such as nouns or verbs), and present them to the student in an accessible form (possibly with images, sample sentences, etc.).
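
The selection step described above can be sketched as follows. This is only an illustration in Python (the project itself targets C# and Silverlight); the `Entry` record, the part-of-speech labels and the sample data are assumptions, not part of the original work.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    word: str
    count: int  # occurrences in the source corpus
    pos: str    # part-of-speech label, e.g. "noun", "verb", "adj"


def select_vocabulary(entries, allowed_pos, limit):
    """Return the `limit` most frequent words whose part of speech is allowed."""
    kept = [e for e in entries if e.pos in allowed_pos]
    kept.sort(key=lambda e: e.count, reverse=True)
    return [e.word for e in kept[:limit]]


# Hypothetical sample; a real list would be loaded from a frequency file.
SAMPLE = [
    Entry("the", 1000, "det"),
    Entry("be", 900, "verb"),
    Entry("time", 500, "noun"),
    Entry("good", 450, "adj"),
    Entry("have", 800, "verb"),
    Entry("year", 400, "noun"),
]

print(select_vocabulary(SAMPLE, {"noun", "verb"}, 3))  # ['be', 'have', 'time']
```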

After getting acquainted with the selected words, the student can learn words similar to those just learned, or words that belong to the same category (apple, pear => fruit). The application will be developed for two foreign languages (English and Spanish). It will be implemented in C# using Silverlight technology.

Information Recommendation Using Context in a Specific Domain

Anton Benčič
master study, supervised by Mária Bieliková

Abstract. Analysts say that in 30 years we will no longer be able to buy a regular printed newspaper, because it will have been completely replaced by its modern equivalent, the internet newspaper, and this claim may have some grounds. Internet news services are much more convenient: they deliver content incomparably faster and, with the use of modern knowledge, can make it much more focused. That is why many people today opt for reading news on their mobile devices during their daily commute rather than buying the paper version. Mobile devices can, however, offer many more possibilities and thus can be a delivery platform rather than just a mediatory one.

Our aim is to create a news delivery service that can intelligently and autonomously push news articles onto a user’s mobile device. To achieve this, we are studying usage patterns, for example to determine how often a user checks for fresh articles, and then use this information to provide her with these articles autonomously. A number of studies also suggest that the type of news a user is interested in depends on the time of day. This might, however, be more closely bound to what the user is doing around that time.

Because many people still work rather consistently at one place, location can be another good indicator of what the user would currently be interested in. Our ultimate goal, however, is to determine what the user is doing, or better, is going to be doing, and in what kind of environment. To achieve this, we have to make use of the different context information and services available on the user’s mobile device, as well as aggregate them across groups of users.

Recommendation of (Application) Tools

Pavol Bielik
master study, supervised by Michal Barla

Abstract. In today’s information age, we often find ourselves overwhelmed by a vast amount of information and tools. The information space for many of our tasks has become too vast to comprehend and navigate effectively on a daily basis. To tackle this issue, many projects have been developed to help users find relevant content, whether it is a newspaper article, music, a movie, shopping or other.

At the same time, there is a shift towards mobile computing represented by smartphones and tablets, bringing us new opportunities to improve or adapt our current methods and models based on usage of these devices. In our work, we focus on modeling and subsequent recommendation of tools and usage of these tools for user groups with similar interests or behaviors, while using mobile devices as a source of information (along with existing web-based approaches). An example of a tool that could be recommended is a smartphone application.

We often hear that the true power and value of these devices lie in the applications, not in the phone itself. But how does a user find out about an application that is useful for him or her among more than 200 thousand applications in the Android Market alone? I believe that the answer is “hardly”. The main reason is that the user often doesn’t really know what application he wants, or has only a rough idea. To tackle this issue, a recommendation method combining the wisdom of the crowd, social networks and smartphone context (e.g. its usage) could be used.

Activity Recommendation in Research Based on Community Activities

Roman Burger
master study, supervised by Mária Bieliková

Abstract. When starting a new project, it is almost mandatory to research the subject to have the best chance of success. Even though this applies to all projects, it’s particularly important for university students. These students are studying new and usually complex principles, which is why they need thorough research. It’s not unusual for beginning researchers to struggle with doing proper research. They often get overwhelmed by the sources they find and have a hard time evaluating, sorting, categorizing and preserving them.

Existing solutions usually help organize sources. While it is important to organize them, this is not really our goal. Our goal is to transfer knowledge from the source to the student, ideally so that sources are processed faster than new ones emerge. Organized bookmarks (in existing solutions) may easily grow uncomfortably large and become daunting to process, and thus do not address the problem. In this project we will propose a method that will try to help transfer knowledge from sources to the student quickly and effectively.

Imagine Cup 2012 – Software Design

Ľuboš Demovič, Martin Konôpka, Marek Láni, Matúš Tomlein
bachelor study, supervised by Michal Barla

Abstract. Imagine Cup is an international competition that motivates young people to use technology to solve some of the world’s toughest problems. Organized by Microsoft, the Imagine Cup Worldwide Finals are held each year in a different city. Next year, they will be held in Sydney, Australia. The Imagine Cup 2012 theme remains the same as in previous years: “Imagine a world where technology helps solve the toughest problems”. Students participating in Imagine Cup are also encouraged to focus their projects on solving issues outlined in the United Nations Millennium Development Goals programme.

As we aim to take part in next year’s Imagine Cup, we understand that choosing the right theme for a participating project is crucial to our success in the competition. Technology can be used in many ways to help people all around the world in their everyday lives. It has the potential to help the environment, improve education and health care, reduce energy consumption or help fight deadly diseases. Aware of these possibilities, we challenge ourselves to come up with an innovative idea to present at the next Imagine Cup.

So far, we have come up with several ideas that fulfill the mentioned goals. We think that one of our ideas in particular has great potential. It is closely related to social networks and collaboration. We have already discussed it with some organizations that have experience with the problem area covered by our idea.

Acquiring Semantics and Metadata via Games with a Purpose

Peter Dulačka
bachelor study, supervised by Jakub Šimko

Abstract. In order to provide the most relevant search results on the Web, machines first need to acquire and process the information. However, not every piece of information can be processed automatically from the beginning, and human interaction is needed. This has turned out to be too expensive in numerous cases. In recent years, researchers have come up with various “Games with a purpose”, which combine such interaction with the fun that is needed to motivate people to do such tasks.

Our goal is to focus on music and create connections between songs, artists and other music-related facts. We want not only to categorize music, but also to cover the emotions and subjective point of view of the listener. This task has been targeted many times in the past, but automatic systems have not yet reached a usable success rate, and without an accurate base of usable data, machine learning systems could not be used either.

We would like to create a game with a purpose to address this lack of music data. Unlike other games with a purpose, which contain only a social connection mostly provided by multiplayer, we will have to handle an emotional person-song connection. Because of this, we should be very sensitive while designing the game. This connection could tie the player to the game for longer than regular games, but it could just as easily discourage him from playing.

Recognizing User’s Emotion in Information System

Máté Fejes
master study, supervised by Jozef Tvarožek

Abstract. Human emotions and their expressions are innate characteristics, independent of the specific person. Therefore they can serve as implicit feedback from users of information systems. In the case of educational systems we can use facial expressions and speech signals to estimate the user’s opinion on the text he or she is currently reading. With the help of these expressions we can derive the knowledge, interests and various other characteristics of users, and also obtain information about the content.

In this project the focus is on obtaining, representing and utilizing the emotions of users of a Web educational system. To find out what the subject has in mind, we need a camera to capture photos of his or her face. This is followed by the evaluation of expressions, for which we can use various existing methods and services. They usually divide the face into several parts (eyes, eyebrows, mouth, nose) or areas (upper, lower, middle, etc.). For each of them there is a set of information about various states that indicate one or more of the basic emotions: fear, anger, joy, sorrow, trust, surprise, etc. To represent emotions we can choose a conceptual data model. A relation involves a user, an object (content) that caused the emotion, and a set of concepts (with associated weights) that represent different emotions or stereotypes.
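
The relation described above (user, content object, weighted emotion concepts) could be represented roughly as in the sketch below. The class name, the field names and the weight range are illustrative assumptions; the abstract does not prescribe a concrete schema.

```python
from dataclasses import dataclass, field


@dataclass
class EmotionRecord:
    user_id: str
    content_id: str
    # Emotion concept -> weight; the [0, 1] range is an assumption.
    weights: dict = field(default_factory=dict)

    def dominant(self):
        """Return the emotion concept with the highest weight, or None if empty."""
        if not self.weights:
            return None
        return max(self.weights, key=self.weights.get)


def average_emotion(records, content_id, emotion):
    """Mean weight of one emotion over all records for a content item."""
    values = [r.weights.get(emotion, 0.0)
              for r in records if r.content_id == content_id]
    return sum(values) / len(values) if values else 0.0
```

Averaging one emotion over all readers of a text is one simple way such records could feed domain modeling, for example as a rough signal of how a learning material is received.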

The information obtained can be used for user modeling (knowledge, interests, similarity between users) and domain modeling (difficulty of learning materials, similarity between them). It can also be useful in adaptive education (choosing an appropriate level of explanation) and adaptive testing (choosing subsequent questions based on the answers to previous ones).

Educational Content Recommendation Based on Collaborative Filtering

Eduard Fritscher
bachelor study, supervised by Marián Šimko

Abstract. Nowadays it is very important that web pages and applications not only store information but also communicate with the user in certain ways, and in e-learning this is more important than anywhere else. Studying is a complex process, and if the recommendation of educational content to the user is insignificant or bad, then the process will end in failure. This signals the importance of good educational content recommendation.

One way of achieving this communication is to learn from users’ behavior and try to recommend content based on the patterns extracted from it, and this is where so-called collaborative filtering comes into the picture. Collaborative filtering has many types; to mention a few, there are memory-based, model-based and hybrid approaches. With this powerful methodology we can construct an accurate and stable educational recommendation system.

Our goal is to propose such a recommendation method for the adaptive learning system Alef, which will be able to help students in their learning process. When recommending articles, the method will consider not only the relations between users but also the actual progress of the user. This way the user will not get recommendations that are irrelevant to the current state of his learning process.
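
A minimal sketch of memory-based (user-based) collaborative filtering with a progress filter is shown below. It is not the method proposed for Alef; the cosine similarity, the `reachable` set standing in for the user's progress, and all names are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, user, reachable, top_n=3):
    """Score unseen items by similarity-weighted ratings of other users.

    `reachable` is the set of items that fit the user's current progress;
    anything outside it is skipped.
    """
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user] or item not in reachable:
                continue
            scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Filtering by `reachable` before scoring keeps items the student is not yet ready for out of the ranking entirely, rather than merely pushing them down the list.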

Discovering Relationships between Entities on Social Adaptive Web

Michal Holub
doctoral study, supervised by Mária Bieliková

Abstract. Information about various entities that can be found on the Web is mostly intended for processing by humans. Data is scattered across many web portals. The problem is that there are few explicitly expressed relationships between these entities so we have difficulties creating more intelligent web applications. Relationships bring semantics to the data which subsequently allows us to improve recommendations, results of text processing, etc. Search engines working with the Web of Data with semantics can provide more precise results for the queries, especially when asking questions about entities.

Relationships between entities and objects are also essential for their integration and for creating mashups of things. We can use them in exploratory search when we create an overview of the target domain from web objects. There are various types of relationships that we can find between entities. One type represents hierarchical links (e.g. isPartOf, isA, sameAs). The other type represents complex real-world relations (e.g. earthquake causes tsunami), which are more powerful but harder to discover than hierarchical relations. We can also connect objects based on the way people interact with them on the Web.

In our research we aim at discovery of relationships between entities belonging to various problem domains. This corresponds with the Linked Data initiative which we want to promote. We would like to discover predefined relationships to create domain models in the areas of digital libraries and knowledge of a software company. We also plan to discover new types of relationships using information extraction techniques. This introduces research challenges such as verification and validation of discovered relationships, their weighting and ranking. We would like to search for relationships between named entities as well as between web objects.

Automatic Text Processing on the Web

Róbert Horváth
master study, supervised by Marián Šimko

Abstract. The World Wide Web contains a great amount of data that is comprehensible only to us, people. Due to its sheer volume it becomes hard to process, but computing power could help us with easier understanding, faster information gathering or document categorization. Transforming data into a form usable by a computer is a complicated process, since documents on the web are written in natural language and do not follow strict rules or structures. Natural language processing consists of many tasks, from term identification and word sense disambiguation to the creation of concept maps.

Our goal is to analyze existing approaches to natural language processing and propose a method that will support the user while browsing. We aim to use not only browsing data but the user profile as well. To gather user data and provide real-time support, we propose integrating our solution into an existing proxy project. To evaluate our method, we plan to conduct an experiment with a selected group of users.

Gathering Information on the User Environment

Tomáš Jendek
bachelor study, supervised by Dušan Zeleník

Abstract. A user’s environment, such as location and time, provides additional information on the user’s behavior on the Internet. In combination with other contextual information, such as the user’s current mood and activities on the Web (posting status updates, writing blogs), it can provide an overall view of the user. This can be very useful for other purposes, such as recommendation and personalization.

Mobile devices and smartphones are nowadays widespread and can be used for gathering information on the user environment. Social networks like Facebook or Twitter create a kind of virtual life. People’s connections and friendships on these networks can be understood as social contexts, which can be combined with tracking the user’s position over time using smartphones. This can help us create context-sensitive applications that improve a person’s time management (keeping track of friends and their current activity, predicting a friend’s location).

The goal of our work is to define the possibilities of gathering contextual information on the user environment. We want to design a method to predict the user’s activities in time and place, considering social context. For experiments and evaluation, we will implement a mobile application that will use this information to weight connections and friendships. Since we want to focus on social context, we are going to compare the outputs of our method to existing methods for computing friendship intensity. To evaluate the prediction itself, we are considering a training/validation split over the interval of monitored user behavior.

Discovering Keyword Relations

Peter Kajan
master study, supervised by Michal Barla

Abstract. Relations among keywords may be used to form useful structures, such as a taxonomy. A taxonomy mainly contains relations of similarity and hierarchy, and it is often used for classification or for investigating similarities in the context of documents.

Several approaches that create a taxonomy by mining Web usage have been published. In this project we will analyze the logs of the PeWe proxy server. Our idea is to study the browsing history of the users: the streams of pages visited in one session. We assume that these pages concern a similar concept, which implies similarity between their keywords. The main issues we will be dealing with in the coming months are dividing the user’s browsing into sessions (consecutive pages concerning the same concept) and identifying homonyms and their meaning in the actual context. The goal of the project is to design an algorithm that creates the taxonomy and to evaluate it in the domain of personalized search.
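
One way to illustrate the session-splitting problem is a keyword-overlap heuristic: a new session starts when a page shares too few keywords with the pages already in the current session. The Jaccard measure, the threshold value and the input format below are assumptions for the sake of the sketch, not the algorithm designed in the project.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def split_sessions(pages, threshold=0.2):
    """Split an ordered list of (url, keywords) pairs into sessions.

    A new session starts when a page shares too few keywords with the
    keywords accumulated in the current session.
    """
    sessions = []
    current, seen = [], set()
    for url, keywords in pages:
        if current and jaccard(seen, keywords) < threshold:
            sessions.append(current)
            current, seen = [], set()
        current.append(url)
        seen |= keywords
    if current:
        sessions.append(current)
    return sessions


# Hypothetical browsing stream: two programming pages, then two travel pages.
stream = [
    ("a", {"python", "list"}),
    ("b", {"python", "dict"}),
    ("c", {"flight", "hotel"}),
    ("d", {"hotel", "paris"}),
]
print(split_sessions(stream))  # [['a', 'b'], ['c', 'd']]
```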

Personalised Recommendation of Places

Marcel Kanta
master study, supervised by Marián Šimko

Abstract. Microblogs are a common part of modern life. It is safe to say that every young person is part of one or more microblogs. They are evolving really fast, and this phenomenon is the object of many scientific studies these days.

Recommendation of objects is a useful use case that saves time and helps people find what they need, especially when there are too many objects. Personalized recommendation goes one step further. It takes the user’s personality into account, so that every user sees his or her own personalized recommendations.

Our approach to recommendation is to take places near the user and, based on the user’s personality and what he seeks, recommend places that he may like. To do this, we create a user model from his actions on Facebook and then, using databases such as Wikipedia, create a lightweight ontology that helps relate the user’s preferences to places. The ultimate goal is to create a semantic, personalized recommendation of places.

Named Entity Recognition for Slovak Language

Ondrej Kaššák
bachelor study, supervised by Michal Kompan

Abstract. Text preprocessing plays an important role in methods based on text content, such as recommendation or search tasks. Text analysis and the extraction of named entities (identification of persons, organizations, dates, etc.) can bring substantial improvement not only in terms of accuracy, but also in the efficiency and extensibility of recommender systems.

Named entity recognition is nowadays a well-studied area for languages like English, where nouns mostly have one form in all cases of inflection. Systems for such languages achieve relatively high accuracy of entity recognition in text, as well as precision in identifying the entity type. Our aim is to design a method for extracting named entities for the Slovak language. Slovak, as a representative of the inflected Slavic languages with variable word forms and free word order in sentences, is challenging for the automatic named entity recognition task. It is necessary to find potential entities in the text by comparing words against a language-dependent dictionary or a text corpus. Thereafter we determine the beginnings and endings of the entity candidates using decision trees, because entities often involve multi-word names.

The result of our named entity extraction method can be used in several text processing approaches. We can use the extracted features during similarity computation in content-based recommendation, or for the construction of term networks, etc.

Acquiring Semantics and Metadata via Games with a Purpose

Marek Kišš
bachelor study, supervised by Jakub Šimko

Abstract. There are many tasks that are very difficult for today’s computers and cannot be solved automatically with the required quality. Some of them are, however, relatively easy for humans, so human brainpower can be used to solve them. Creating annotations for web content, especially multimedia content, is a typical example of these tasks. This is not hard work for ordinary people, but it’s very time-consuming and we can’t rely on human altruism. So we need to motivate them to work for us. Paying people salaries is not a good solution, because for tasks like this one we would need contributions from a large number of them. We believe that the answer could be to stop thinking about this contribution as work and to make it fun.

Our goal is to design a game with a purpose (GWAP) that will use human computation to identify the meaning and feeling of pictures in a way that people will like and want to play again. In our game we focus on obtaining valuable annotations from an emotional point of view.

Group Recommendations for Adaptive Social Web-based Applications

Michal Kompan
doctoral study, supervised by Mária Bieliková

Abstract. Personalized recommendation is becoming an increasingly inherent part of today’s web-based systems. There are plenty of situations when users interact socially, i.e. when our behavior is influenced by the social context presented by other people. In some situations we want to interact socially (e.g., a dinner), but in other situations we are forced to participate in groups (e.g., mass transit). The question is why we have to act as a single user in our “virtual life” while in real life we tend to act socially. Our social interactions are limited by the borders of social networks. It is clear that in the context of a group, recommendation gains new dimensions. The single user’s preferences and needs are important; moreover, the preferences of others, and thus the group’s needs, become more visible.

One less typical use of group recommendation is to boost cooperation in on-line systems. For example, users interacting with an e-learning portal want to learn and solve problems together. Thanks to group recommendation we are able to choose exercises that are suitable for the current users, and thus balance the number of interacting users against the usefulness of the selected exercises for these users. A second example of such an application is the domain of a software company, where users need to cooperate.

By modeling user satisfaction we can predict a user’s preferences in the context of other group members. This is extremely important when not just one item but a sequence of items is recommended. Group recommendation approaches enhanced by the social context of group members can improve user satisfaction during the recommendation process.
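
For orientation, the sketch below shows two standard aggregation strategies from the group recommendation literature, averaging and least misery. They are named here as a generic illustration and are not the satisfaction model of this work; the input format (user -> item -> predicted rating) is an assumption.

```python
def group_scores(predictions, strategy="average"):
    """Aggregate per-user predicted ratings (user -> item -> rating) per item.

    Only items with a prediction for every group member are considered.
    """
    members = list(predictions.values())
    if not members:
        return {}
    items = set(members[0]).intersection(*members[1:])
    result = {}
    for item in items:
        values = [m[item] for m in members]
        if strategy == "average":
            result[item] = sum(values) / len(values)
        elif strategy == "least_misery":
            # The group is only as happy as its least satisfied member.
            result[item] = min(values)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return result
```

With predictions of 5 and 1 for one item and 3 and 3 for another, averaging rates both items equally, while least misery prefers the second. This difference is one reason why the satisfaction of individual members matters when a sequence of items is recommended.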

Identifying Personal Characteristics Using Microblog

Peter Korenek
master study, supervised by Marián Šimko

Abstract. Microblogs are already a part of everyday life for a large group of people around the world. A goal of microblogging is to share ideas and opinions with friends and relatives in a rapid but concise form of up to 140 characters. Microblogs have become very popular thanks to their simple usage. Microbloggers not only publish their ideas, but also read the microblogs of other users to obtain new and interesting information and ideas. It is always good if this information comes from users whose microblogs most people consider correct and verified. These users are called authorities, and the aim of this research is to identify the most significant authorities on the microblog social network.

Currently there are many approaches that try to use a large number of metrics to identify the authorities. Existing approaches, however, do not analyze the text of microblogs because of doubts arising from the very short length of the texts.

This research analyzes the text of microblogs in terms of the user’s characteristics and writing style. For each microblogger we analyze the linguistic features of his microblogs, focusing specifically on his reactions to other microblogs. Using linguistic analysis we try to identify emotions in microblog texts. We consider emotions one of the most important features in authority recognition, because they best express the microblogger’s attitudes towards the topic he writes about. The most significant authorities are microbloggers who have the best ratings from people who commented on their microblogs.

Tomáš Kramár
doctoral study, supervised by Mária Bieliková

Abstract. The idea of personalized search, a search that is not generic but adapts to the user’s needs, is as old as search itself and is the subject of long-term research. There are many methods and approaches that create an interest model of the user in order to influence the ranking and ordering of search results. Most of these methods deal only with the long-term interests of users and as a consequence mix the user’s current and past needs. We focus on short-term interests, their collection and usage in the process of personalized search.

We propose a method for modelling users’ interests with various automatically extracted document metadata and for combining the created user models based on the similarity of user interests. The enhanced user models, which combine metadata-based interests, communities of similar users and the interests of similar users, provide more data for personalization, leading to more accurate search results.

To deal with the problem of the long-term user model, we propose to use a short-term context, built only from data from the current search session and enhanced with the user models of users with similar short-term interests. This approach would have the advantage of a rich, enhanced user model, but would not contain the damaging long-term interests. The main challenge in this area is finding the moment of context switch, based on the vectors of document metadata. We also have to deal with parallel browsing sessions by maintaining a stack of short contexts.

To deal with the problem of multiple personas, we propose a layered, contextual user model, where each interest is constrained with its context – time, date, location or other available sources. When the user model is about to be used to provide search context, it would be possible to only consider those layers, which match the given constraints. Also, the created social network should be layered and when enhancing the user model, only some layers should be used.
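
The layered model can be pictured with the following sketch, in which each interest carries optional context constraints and only the matching layers contribute to the search context. The `Interest` class, the constraint encoding as (key, value) pairs and the additive merge are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interest:
    term: str
    weight: float
    # Constraints as (key, value) pairs, e.g. (("location", "work"),).
    context: tuple = ()


def active_interests(interests, current_context):
    """Merge weights of interests whose constraints all hold in `current_context`.

    An interest with no constraints belongs to every layer.
    """
    merged = {}
    for interest in interests:
        if all(current_context.get(k) == v for k, v in interest.context):
            merged[interest.term] = merged.get(interest.term, 0.0) + interest.weight
    return merged


# Hypothetical model of a user with distinct "work" and "home" personas.
model = [
    Interest("java", 0.9, (("location", "work"),)),
    Interest("jaguar car", 0.7, (("location", "home"),)),
    Interest("news", 0.3),
]
print(active_interests(model, {"location": "work"}))  # {'java': 0.9, 'news': 0.3}
```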

Modeling User with Computer Games

Peter Krátky
master study, supervised by Jozef Tvarožek

Abstract. Many web applications adapt their content to the user’s characteristics and preferences so that he finds what interests him the most. Hence the need for user modeling arises. The traditional approach to finding out the user’s characteristics is to ask him to fill in a special form. However, users do not like answering questions that might be too personal. The approach therefore has to be fun and appealing. Computer games meet these criteria, and the actions the user performs could tell much about him.

Our goal is to design a game that analyzes user actions and to define an appropriate set of actions to be tracked. The results of the analysis, based on emotional models proposed by psychologists, are used to determine user affect. The user affect is an important input for an adaptive system, not only to make the content personalized but also to make the whole process of interaction with the user more effective.

Collecting Implicit Feedback for the Needs of Search Engines

Jakub Kříž
bachelor study, supervised by Tomáš Kramár

Abstract. To improve the results of full-text search engines and make them more personalized, we need to take into account the interests of the particular user. Current user models used for recommender systems analyze the documents the user visited or the search results the user followed. To improve on this, it is better to consider which documents the user actually liked and read, and which documents he saw but was not interested in.

This interest, or feedback, can be collected implicitly by analyzing the user’s activities on a web page, such as time spent browsing, scrolling, mouse movements, number and position of clicks, selection of text and copying of text. Based on these factors we can estimate the user’s interest in the matters discussed in the document. To further improve the user model we can also analyze which paragraphs in an article interested him the most or which results of a search he liked and considered clicking.
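
As a rough illustration of how such signals might be combined, the sketch below computes a weighted sum of normalized indicators. The signal names and the weights are hypothetical; the abstract does not specify how the factors are combined, and real weights would have to be tuned experimentally.

```python
# Hypothetical weights; in practice they would be fitted or tuned experimentally.
WEIGHTS = {
    "time_ratio": 0.4,     # time spent / expected reading time
    "scroll_depth": 0.3,   # fraction of the page scrolled through
    "text_selected": 0.15, # 1.0 if the user selected text, else 0.0
    "text_copied": 0.15,   # 1.0 if the user copied text, else 0.0
}


def interest_score(signals):
    """Combine normalized signals (each expected in [0, 1]) into one score."""
    score = 0.0
    for name, weight in WEIGHTS.items():
        # Clip out-of-range values so one signal cannot dominate the score.
        value = min(max(signals.get(name, 0.0), 0.0), 1.0)
        score += weight * value
    return score
```

A page that was opened and immediately left would score close to 0, while a page that was read through, scrolled to the end and copied from would score close to 1.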

We are going to analyze and evaluate existing methods used to collect implicit feedback and, based on the analysis, design our own, which will be tested experimentally.

Modeling Temporal Dynamics of User Preferences in Recommenders

Eduard Kuric
doctoral study, supervised by prof. Mária Bieliková

Abstract. Adaptive applications for the social web require the design of models that give users effective access to information. These models are often adapted to the current users’ characteristics (e.g. knowledge) and context (e.g. time). Prediction and recommendation systems are generally based on models of users’ preferences. These systems analyze patterns in users’ preferences for items to provide personalized recommendations of items that match the user’s taste. However, these preferences are usually treated as constant over time, ignoring the temporal dynamics of user interests. Research in temporal information retrieval raises many issues and challenges, such as retrieving temporal summaries for documents, determining temporal similarity, temporal clustering, spatio-temporal information extraction and searching in time.

Recommendation systems based on temporal IR methods evaluate the temporal similarity of content, i.e. the goal is to retrieve temporally similar content that is contemporary with the target content (documents). For example, there is a need to identify whether one document is a subinterval of another document and to predict which content is likely to change over time. Other interesting issues include determining the lifespan of a main event and detecting trending events. User preferences shift as well: in a system where users rate items with stars, a user who used to indicate a neutral preference with a “3 stars” rating may suddenly indicate disappointment with the same “3 stars” rating.

The goal of our research is to create methods for modeling user preferences, which we expect to change over time, for use in building a recommender system. This includes new temporal information retrieval methods.
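One simple way to let a preference estimate follow such changes is to weight each rating by its age. The sketch below uses exponential decay with a half-life; it is only an illustration of the general idea, and the function name, the day-based timestamps and the 30-day half-life are assumptions, not the method this research proposes.

```python
import math


def decayed_preference(ratings, now, half_life_days=30.0):
    """Weighted mean of (timestamp_in_days, rating) pairs; recent ratings count more."""
    decay = math.log(2) / half_life_days
    weighted_sum = 0.0
    weight_total = 0.0
    for day, rating in ratings:
        # A rating that is one half-life old gets half the weight of a fresh one.
        weight = math.exp(-decay * (now - day))
        weighted_sum += weight * rating
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no ratings given")
    return weighted_sum / weight_total
```

With a rating of 5 given 90 days ago and a rating of 2 given today, the decayed estimate lies much closer to 2 than the plain mean of 3.5 does.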

Recommendation on Social Adaptive Web

labaj

Martin Labaj
doctoral study, supervised by Mária Bieliková

Abstract. More and more Web systems are becoming adaptive: they modify their behaviour to suit different users’ needs. Recommender systems are an important part of such adaptive Web systems. They benefit both users (e.g. a user sees more items he is interested in without navigating through the vast number of items available) and system owners (e.g. they sell more items).

Although in a technology enhanced learning (TEL) environment the goal is not to sell items but to aid users while learning or to aid a teacher while preparing a course, the key principles are the same. Following a recommendation does not cost money, but it still takes time to read the recommended resource. User tasks that have already been identified (annotation in context, find good items, etc.) are also meaningful in TEL in relation to learning items and are already supported by recommender systems. There are further tasks that could be supported by recommender systems – find novel resources, find peers, find good pathways. Some of them are also useful in similar applications which are not TEL systems themselves but share a similar feature: users navigating a digital space.

In order to infer recommendations for individual users, whether by content-based filtering or by collaborative filtering, we must have sufficient information about users’ needs and their responses to recommended items. In previous research we employed implicit feedback, including basic indicators such as mouse movement and gaze position, down to the level of fragments of learning texts.

In our research we aim to improve methods of recommendation in TEL systems and during browsing through digital space.

Automated Public Data Refining

mliptak

Martin Lipták
bachelor study, supervised by Ján Suchal

Abstract. Many government institutions now share some public data on the Internet. While various registers and bulletins are essential for proper business communication, other data enable public monitoring of our government. Although these data are publicly available on the Internet, their format and structure are often unsuitable for machine processing.

Our goal is to refine this data to make further processing possible. After downloading raw data from various public web sources (Business Register, The Trade Register, Statistical Office, Ministry of Justice, Central register of contracts, The Bratislava Stock Exchange, etc.) in many different formats (html, pdf, xls, images), we extract structured data (parsing, OCR) into a common format (relational database). Text clustering methods such as fingerprint or nearest neighbour methods, or other approaches, will then be used to filter out the numerous spelling mistakes and ambiguities.
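To make the fingerprint idea concrete, here is a minimal sketch of key-collision clustering, assuming that "fingerprint" refers to the common technique of normalising each value to a key and grouping values whose keys collide. The normalisation steps and the company names are made up for illustration and are not taken from the project.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Normalise a string so that trivially different spellings share one key."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop combining marks, so that "á" and "a" are treated as the same letter.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster_by_fingerprint(values):
    """Group values whose fingerprints collide; return only groups with variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]
```

This catches differences in case, diacritics, punctuation and word order. Genuine typos such as a swapped letter do not produce the same key, which is where a nearest neighbour method based on edit distance would be needed.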

Our results will be made available to third-party organizations like Fair Play Alliance or Transparency International. They will take care of data examination and the whole society can benefit from our effort.

Finding and Acquiring Metadata from Websites

milan

Milan Lučanský
master study, supervised by Marián Šimko

Abstract. The World Wide Web provides access to an enormous amount of data. Even though these data are freely available to everyone, they are hard to process because of the form in which they are stored and because of their sheer amount. There is potential for advanced data processing such as categorization, recommendation or contextual advertisement placement, but first we have to build the semantic layer that such advanced tools require. We need an automated method to extract relevant meta-information from web content. Several approaches to extracting information from web pages exist, but most of them are not suitable for the “dirty” World Wide Web environment.

We focus on extracting keywords that describe the content of a web page. Acquiring keywords is an important step towards obtaining concepts, which are essential for creating ontologies. Existing approaches are more or less successful in acquiring relevant keywords from text documents. We use these approaches, namely automatic term recognition (ATR) algorithms, to acquire terms from the plain text of a web page. These algorithms are based on statistical and probabilistic models and are part of research in the field of Information Retrieval (IR). Web pages have a great advantage over plain-text documents: they contain HTML tags and CSS markup, which can bring semantics to the plain text on the page. For example, thanks to HTML tags we can distinguish the title or headings from regular paragraphs, and CSS styling can change the appearance of words. Such words are likely to be important because the author of the web page wants to highlight them. The third element of the web we would like to utilize is the linking between web pages and the anchor texts of the links. Anchors are a very important and valuable source of potential keywords because they briefly summarize the content of the page they link to and, moreover, they come from outside the examined page, so they capture what someone else says about it.

We are combining all three resources (ATR, HTML + CSS, links) to produce better and more relevant keywords. Moreover, we would like to employ a thesaurus to discover synonyms and hyponyms, so that we can produce keywords that are not present on the web page or in the anchor texts.
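The combination of the three sources could look roughly like the sketch below. Note the simplifications: plain term frequency stands in for a real ATR algorithm, the tag and anchor weights are invented for the example, and no stop-word filtering is done. It is not the scoring scheme used in this work.

```python
import re
from collections import Counter

# Hypothetical boosts for terms found in emphasised markup or in incoming anchors.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "strong": 1.5, "p": 1.0}
ANCHOR_WEIGHT = 2.0


def tokenize(text):
    """Lower-case ASCII words only, for brevity."""
    return re.findall(r"[a-z]+", text.lower())


def rank_keywords(tagged_text, anchor_texts, top_n=5):
    """tagged_text: list of (tag, text) pairs; anchor_texts: texts of incoming links."""
    scores = Counter()
    for tag, text in tagged_text:
        weight = TAG_WEIGHTS.get(tag, 1.0)
        for term in tokenize(text):
            scores[term] += weight  # frequency weighted by the enclosing tag
    for text in anchor_texts:
        for term in tokenize(text):
            scores[term] += ANCHOR_WEIGHT  # evidence from outside the page
    return [term for term, _ in scores.most_common(top_n)]
```

A term that appears in the title, in the body and in an incoming anchor text rises to the top, while a word that occurs once in a paragraph stays near the bottom.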

Statistic Translation of Natural Text

p_macko

Peter Macko
master study, supervised by Dušan Zeleník (team project)

Abstract. We have become used to Google offering us programmers many powerful tools. One such tool is Google Translate, which is now used by many users and programmers around the world. Google, however, has decided that in the future programmatic access to text translation will be available only for a fee.

Translation of text plays a very important role in many projects at our faculty, and many of them therefore rely on Google Translate. This motivated us to create a similar product. Our goal is to create a translator that can translate text between different languages the way the human brain does. In the first phase, our translator will build on the translation of individual words in sentences. In the second phase we will proceed to translation supported by statistics. Thanks to the statistics we will know which translation of a selected part of the text is the best. With this method we would like to achieve the best possible output translation.

Many languages, such as Slovak, are quite complicated. For these languages we therefore want to do a deeper analysis: we break sentences down into constituents, translate each constituent in the right form and put the constituents back together into a sentence in the other language in the right order.

Context-Based Recommendation

mitrik

Štefan Mitrík
master study, supervised by Mária Bieliková

Abstract. We live in an age full of information, and automatic filtering or recommendation of information can help us get the most valuable items. We need to involve user preferences in this process. Most current solutions rely on long-term observation of the user, on the basis of which suitable information is recommended.

Smartphones and intelligent mobile devices are a great way to determine the current situation or context of the user and his or her needs. The context of a smartphone includes not only location but also other interesting information such as network connectivity, lighting conditions and more.

Our goal is to analyze existing solutions and propose a method, which will use context of the user for effective recommendation. We plan to evaluate our method in real-world experiments.

Personalized Text Summarization

moro

Róbert Móro
master study, supervised by Mária Bieliková

Abstract. One of the most serious problems of the present-day web is information overload. As we can find almost everything on the web, it has become very difficult to find relevant information. Moreover, the term “relevant information” is subjective because, as users of the web, we differ in our interests, goals and knowledge. Automatic text summarization aims to address the information overload problem. The idea is to extract the most important information from a document, which can help users decide whether the document is relevant for them and whether they should read the whole text. This approach can even relieve users of the need to read the text in its entirety, which can be especially useful when summarizing multiple documents.

We propose a personalized method of text summarization which, unlike the classical (generic) automatic text summarization methods, takes into account the different characteristics of users, i.e. their goals, interests or knowledge. Much information about users’ interests can be inferred from their browsing behavior. If a user reads a document about a particular topic, this serves as implicit feedback from which we can assume that he or she is interested in the topic. Also, with the arrival of Web 2.0, users are no longer passive consumers of Web content; they can create content and add metadata such as annotations, tags etc. Because annotations can indicate a user’s interest in particular parts of a document, we can use them as another important source of personalization.

We plan to evaluate our proposed method in the domain of e-learning in ALEF (Adaptive Learning Framework); however, the method itself will be designed and implemented as domain-independent.

Processing and Exploitation of Metadata Obtained via Crowdsourcing

nagy

Balázs Nagy
master study, supervised by Jakub Šimko

Abstract. For implementation of effective search and navigation in documents (files, web pages, photos) it is necessary to have enough descriptive metadata available (e.g., the subject of a document, the type of a page, what is in a picture). Automatic acquisition of metadata is technically difficult due to ambiguity of natural language or problems with the identification of objects in the multimedia content. One effective way to create metadata for content is the use of human computation – human intelligence, which could be encouraged for example through games.

In our project we want to create a tool or a game which helps us exploit the power of the human crowd and provides us with useful and valuable (meta)data. We want to focus on obtaining metadata for personal photo albums and use them to keep those albums organized. Using the metadata we can also search, order and filter photos much better than before.

Another potential benefit of this project is that we could combine it with many existing solutions such as photo browsers, galleries or album visualizations, and so create a comprehensive solution for organizing personal photo albums. Last but not least, the obtained metadata can also serve as a good resource for other projects or explorations.

Automated Cleaning of Public Data

ondrej-proksa

Ondrej Proksa
bachelor study, supervised by Ján Suchal

Abstract. A necessary precondition of transparency is the publication of data that allows greater control by the public. Automated post-processing, structuring and matching of data from various sources has become a huge problem. Public data are available on the Internet in very confusing and unstructured formats. Another problem is the lack of uniformity in how the available data are published. The data published by the public sector (insurance companies, ministries, central registry, domain registrar, public portals …) have different structures and forms. Some portals publish incomplete or ambiguous information, which may contain typos.

First, it will be necessary to locate all the available public data on the Internet and create an automated downloader that third parties can use through an API, for example nonprofit organizations that deal with public data or companies that need such data for commercial purposes. As data on the Internet are constantly changing, it is necessary to create a tool that updates the data, erases them if necessary and adds new records – when a new contract is published, a new company or domain is registered, an organization ceases its operation, a company changes its management etc.

The main aim of my work is to process this data. Data will be downloaded automatically, following regular disclosures and updates. I will analyze possible approaches to processing and cleaning data from various sources and deal with typos and ambiguous information, so that the data can then be paired with resources from other parts of the public sector. I have created a solution to verify specific selected data from the public sector – Všeobecná zdravotná poisťovňa, Sociálna poisťovňa, Obchodný register SR etc.

Maintenance of Annotated Content

Karol Rástočný
doctoral study, supervised by Mária Bieliková

Abstract. Annotations are a new and important dimension of knowledge, created collaboratively by users of information spaces, especially digital libraries. Annotations have many different forms and meanings, but all of them are strictly related to the annotated content and its parts. In most digital libraries this annotated content is static. However, the usability of annotations decreases in digital libraries with dynamic content, or when normally static content has to be updated. Because of the strong relation of annotations to content, annotations have to be updated whenever the annotated content is modified. In smaller systems this problem is currently solved by editors correcting annotations manually, but this is not feasible in larger systems with large numbers of annotations. We address this problem with a method for automatic correction of annotations after modification of the annotated content.

To propose this method we have to define an algorithm that provides the automatic corrections and an annotation model with which this algorithm works. To propose the algorithm, it is necessary to identify the types of annotations that users create in different types of content, their meanings, and how different types of modifications affect these annotations. The proposed annotation model has to respect the requirements of the proposed algorithm and also allow effective access for other system tools that may access annotations, e.g. recommenders.

Decentralised User Modelling and Personalisation

marius

Márius Šajgalík
master study, supervised by Michal Barla

Abstract. The most common way of performing personalisation is to use servers that gather data about users and create user models to predict their behaviour and adapt to their needs. The problem, however, is that almost every web-based application and service maintains its own user model and does not take advantage of the existing ones. This is also reflected in the quality of personalisation when the user data is insufficient. With the emergence of the web, however, the web browser has become the main work environment of the user. Therefore, it is reasonable to make use of it and gather the user data right in the browser, regardless of the user’s platform.

To achieve that, our project focuses on the development of a cross-browser personalisation platform built on top of the Crossrider framework. It brings personalisation to the user by creating the user model in the browser, taking advantage of the user’s whole browsing history and of any actions observable through JavaScript events, with the possibility of communication among users, thus opening up almost limitless possibilities of personalisation.

The whole system is easily configurable and extensible by personalisation extensions, which are simply pieces of JavaScript code with built-in support for the jQuery framework and our personalisation API; therefore, they can be developed even by the user.

JakubSevcech_foto

Jakub Ševcech
master study, supervised by Mária Bieliková

Abstract. There are many different tools that support manual creation of annotations on the web. However, only a few users actually use them. Users lack motivation to create annotations when viewing ordinary web pages: they rarely return to the same page, or the page is so short that there is no need to highlight the information it contains. At best, the user creates a bookmark for the page and assigns keywords to it, so that he can easily organize his bookmarks later.

I believe that if users had a better motivation for returning to the site, they would create more annotations. There may be enough motivation when viewing documents in different digital libraries. These documents are more complex and sufficiently long so that the user may be interested in creation of annotations. I would therefore focus on encouraging the creation of annotations in such documents. In this way users could create personal collections of interesting documents using annotations and bookmarks.

I would also like to work on discovering other kinds of motivation and other forms of annotations that users could create.

I would like to implement the solution as an extension to an existing proxy server. Such a solution will work not only on the portals of various digital libraries, but on the wider web as well.

Harnessing Manpower for Creating Semantics

simkoj

Jakub Šimko
doctoral study, supervised by Mária Bieliková

Abstract. To search the Web and utilize its content, we require a homogeneous layer of metadata for its resources so they can be easily processed by machines. The Semantic Web principles were created to provide a worldwide framework for creating rich web resource annotation. Although much work has been done in the field of automatic semantics acquisition, human work is still needed in semantics-building tasks.

In our research, we examine a family of human-oriented semantics acquisition approaches – the games with a purpose (GWAPs). These games harness human brainpower to solve computational problems which are hard or impossible for machines to solve. However, transforming an arbitrary problem into an appealing game is a non-trivial process and it is not clear how it should be done, since the design of a GWAP has to reconcile two opposing principles (work and fun) and solve several other issues such as cheating prevention or control of the player’s output. Our research aim is to explore best practices for designing GWAPs.

We also experiment with two of our own GWAPs for semantics acquisition. The first, the Little Search Game, is a game of web search query formulation in which the player’s task is to reach the lowest possible number of results retrieved by the search engine by using the negative search principle. The purpose of the game is the retrieval of term relationships; our next aims are to use the game within a specialized domain and to name the relationships between terms. The second game, PexAce, is a modification of the Concentration game (Pexeso) and retrieves tags for the images used in the game. We plan to modify the game to provide annotations for personal multimedia archives.

Automated Hierarchical Relationship Discovery for Educational Content Description

simkom

Marián Šimko
doctoral study, supervised by Mária Bieliková

Abstract. In order to make the learning process more effective, educational systems tailor learning material to user goals, needs and characteristics. Adequate adaptation requires a domain description that enables adaptation engines to perform at least basic reasoning. We focus on “lightweight” domain modeling that leverages relevant domain terms and the relationships between them. One of the many types of relationships is the hierarchical relationship (is-a), which forms the skeleton of a domain conceptualization.

As manual definition of relationships between relevant domain terms is a tedious and time-consuming task, we devise methods for automated semantics acquisition. In our current work we aim at automated discovery of hierarchical relationships in particular.

We proposed and evaluated a method for is-a relationship discovery which is based on statistical and linguistic processing of the underlying learning content. We particularly focus on lexico-syntactic patterns. The results obtained are reasonable: they show that automated processing can facilitate the authoring of an adaptive course for a teacher or domain expert.
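To illustrate what a lexico-syntactic pattern for is-a discovery can look like, the sketch below matches two well-known English patterns ("X such as A, B and C" and "A, B and other X") with regular expressions. The choice of patterns, the restriction to single words and the use of English are assumptions made for the example; the evaluated method may use different patterns and linguistic preprocessing.

```python
import re

# Two classic patterns; a real system would use many more and would work on
# parsed noun phrases rather than on single words.
PATTERNS = [
    # "structures such as lists, trees and graphs"
    re.compile(r"(?P<hyper>\w+) such as (?P<hypos>\w+(?:, \w+)*(?:,? (?:and|or) \w+)?)"),
    # "loops, conditions and other constructs"
    re.compile(r"(?P<hypos>\w+(?:, \w+)*),? (?:and|or) other (?P<hyper>\w+)"),
]


def extract_is_a(sentence):
    """Return a set of (hyponym, hypernym) pairs found in the sentence."""
    pairs = set()
    for pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            # Split the enumeration "a, b and c" into its individual members.
            hypos = re.split(r",? (?:and|or) |, ", match.group("hypos"))
            for hypo in hypos:
                pairs.add((hypo, match.group("hyper")))
    return pairs
```

The terms are returned exactly as they appear in the text, so a later step would still have to lemmatise them and filter the candidates, for example with the statistical evidence mentioned above.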

Information Retrieval using Short Text Similarity

sokol

Pavol Sokol
master study, supervised by Michal Barla

Abstract. The semantic web is today perhaps one of the most studied areas of computer science. Its basic elements are the explicit expression of meaning, knowledge representation and their connection to ontologies. According to its author, the idea is to be realized by building mutually communicating agents, each handling a small domain. Linking these domains gives rise to new semantics on the web, which can be made directly available to the end user or passed on to other agents. However, the explicit expression of meaning requires considerable effort to construct. In addition, the construction of such a base will never be completed, nor will it be used in many domains. Therefore we deal with an approach in which short text similarity is used to explore and discover new related information.

We present a method in the domain of information retrieval and short text similarity that looks for relations in the information space. We use the metaphor of the human brain: we create sets of information linked by weighted bonds. One set is a carrier of information and consists of a group of concepts. In the space defined in this way we look for new connections related to a predefined request.
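As a loose illustration of weighted bonds between pieces of information, the sketch below links short texts whose bag-of-words cosine similarity reaches a threshold and uses the similarity as the bond weight. Cosine similarity, the threshold value and the function names are assumptions made for the example; the abstract does not say which similarity measure the method uses.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def weighted_links(texts, threshold=0.2):
    """Return {(i, j): weight} for every pair of texts at least as similar as threshold."""
    bags = [bag_of_words(text) for text in texts]
    links = {}
    for i, j in combinations(range(len(texts)), 2):
        weight = cosine_similarity(bags[i], bags[j])
        if weight >= threshold:
            links[(i, j)] = weight
    return links
```

Texts that share most of their words end up connected by a strong bond, while unrelated texts stay unconnected; following the bonds from a given text is then one way to look for related information.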

Encouragement of Collaborative Learning Based on Dynamic Groups

srba

Ivan Srba
master study, supervised by Mária Bieliková

Abstract. Computer-Supported Collaborative Learning (CSCL) is an approach to learning that is based on the support of information and communication technologies (ICT). The main task of CSCL is to link two trends: the first is support for students’ collaboration during learning in small groups; the second is the increasing potential and availability of ICT infrastructure. CSCL systems solve many problems connected with collaborative learning. The tasks that lead to solutions of these problems can be divided into two groups – learning analysis and application design.

In our project we are concerned with application design tasks, especially with encouraging students to learn collaboratively by creating different types of study groups. To create these groups we propose an enhanced group technology (GT) method. The proposed method is able to take many student activities (characteristics) as inputs and create clusters of users characterized by similar activities. This similarity can be defined with different matrices, i.e. matrices of activities which complement each other or of dynamically adjusted relationships between activities. By employing several of these matrices we can obtain different clusters of students, which can be used to create different types of groups. The activities used as input to the proposed method will be determined from explicit feedback and from log records.

We will evaluate the proposed method in a collaboration platform that will implement several collaboration tools, such as a text editor and a graphic editor. In the experiment we hypothesize that groups created with the dynamic matrix or with the matrix of complementary activities will collaborate more successfully than groups created with a random-value matrix.

Feedback Acquisition from Webpage Visitors

stenova

Andrea Šteňová
bachelor study, supervised by Mária Bieliková

Abstract. Explicit feedback from website visitors is very important and its significance is constantly growing. Despite the high value of this information for providers, customers do not wish to be disturbed during their work and are therefore unwilling to provide feedback.

In e-learning systems it is useful to check whether students have understood the content of a page or to ask them to rate the quality of teaching. We should motivate students to provide meaningful and realistic feedback. The problem is that we should not disturb students during their studies, so feedback has to be requested at the right time. If this condition is met, we can avoid collecting wrong information.

The student himself decides whether or not to answer a question. As a result, most ratings on the internet are either highly positive or highly negative. It is important to motivate students with average opinions to provide feedback too, because their opinions are also very valuable.

Modeling a Tutor for E-Learning Support

svorada

Peter Svorada
master study, supervised by Jozef Tvarožek

Abstract. E-learning web systems allow students to educate themselves, for example by studying materials, solving tests and doing exercises. It has been shown that students learn more if they are involved in the learning process and this process is adapted to the needs of the individual student. Likewise, it is known that a student advances faster in the learning process if he is led by someone who acts more like a friendly tutor than a leading authority.

It has been confirmed that peer tutoring (a way of tutoring where one student tutors another) provides benefits for both tutor and tutee. But the success of this process depends on many aspects. If tutor and tutee are not successful in solving the given problems, they might become demotivated. Because of this, there have been several research projects on how to support the peer tutor in his tutoring tasks. One of the solutions is to provide the peer tutor with a computer tutor that gives him tips related to the current task. However, research shows that a peer tutor who is given non-interactive hints and a peer tutor who is supported by a computer tutor achieve similar results.

In that research, however, the functionality of the computer tutor was severely limited. In our research we try to broaden the abilities of the computer tutor. We want our tutor not just to sit and wait to be asked for help, but to provide support when the peer tutee makes a mistake that the peer tutor does not correct. In this way we try to raise the effectiveness of the learning process by substantially cutting down the time required to solve the given tasks.

Our current focus lies on creating a project which will implement such a tutor. This project should help students to learn how to create basic procedural algorithms and provide us with means to validate our hypothesis.

Method for Social Programming

tomlein-michal

Michal Tomlein
master study, supervised by Jozef Tvarožek

Abstract. Code review is an important part of quality software development. In programming courses, peer review has the potential to be an effective driving force behind the learning process. However, due to the significant amount of time reviews take, whether in software development or a programming course, they cannot be and are not done thoroughly in practice.

While collaboration solutions are widely available, their use is, by their nature, generally limited to larger projects and/or requires a certain discipline to be effective. Moreover, these solutions rarely take into account the strengths and other characteristics of individuals; instead, they rely on manual assignment or choice of reviewer. Automating the choice of the right reviewer is a non-trivial problem that present collaboration and code review solutions do not address.

In our work, we aim to make programming more effective through social interaction and peer reviews. We believe that by making it possible for students and software developers to collaborate more tightly and easily, we can speed up the development process and achieve higher quality overall.

Group Recommendation Based on Voting

trebula

Ján Trebuľa
bachelor study, supervised by Michal Kompan

Abstract. Every day thousands of people are overloaded by the amount of information, most of which is irrelevant to them. Therefore, we are still looking for new ways to deal with the information overload problem in several domains. One of the promising approaches to this problem is the recommendation of items to a group of users. Several approaches to group recommendation have been proposed. Voting, or preference elicitation, is one example of such an approach used in today’s systems: from all the users’ preferences, the system selects the one that best suits the whole group.
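As a concrete example of selecting one item from the preferences of several users, the sketch below applies the Borda count, one well-known voting rule. It is shown only for illustration; the abstract does not state which voting rule the project uses, and the function name and the alphabetical tie-breaking are assumptions.

```python
from collections import defaultdict


def borda_winner(rankings):
    """rankings: one list per group member, ordered from most to least preferred."""
    scores = defaultdict(int)
    for ranking in rankings:
        for position, item in enumerate(ranking):
            # The top item of n gets n - 1 points, the last one gets 0.
            scores[item] += len(ranking) - 1 - position
    # Ties are broken alphabetically so that the result is deterministic.
    return min(scores, key=lambda item: (-scores[item], item))
```

If three viewers rank their options as news, film, sport; film, sport, news; and sport, film, news, each option is the first choice of exactly one viewer, yet the film collects the most points because nobody ranks it last.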

A usual scenario for group recommendation is finding the most interesting and most preferred programs when watching TV. Today there are a lot of TV channels offering a large number of movies. The system helps to select a program which the users would like to watch together, following the preferences of the users actually present. Another example is when several users are online to learn together and vote for the topic they will learn.

Group recommendation is one of the active research areas today, and further improvements are still needed, including the development of less intrusive and more flexible recommender systems. Our aim is to design approaches that will recommend a vote for the group while taking into account the above-mentioned characteristics and principles.

Acquiring Metadata about Content and Relations on the Web

uhercik

Tomáš Uherčík
master study, supervised by Marián Šimko

Abstract. The World Wide Web has become one of the most important sources for sharing and searching for information. The amount of information on the web is so huge that searching can be done only by machines, yet the information is understandable only by humans. The Semantic Web is a vision in which this problem is solved by a layer of machine-processable metadata.

These metadata are not filled in as often as was assumed, because creating them is difficult and time-consuming work. The challenge is to obtain them automatically. There are many methods that can be used to acquire them from text. In many cases, however, the keywords we would use to annotate a text are not included in that text. The social web opens up wide possibilities for using user data for this purpose: people enrich web content by writing comments about entities of their interest.

There are a lot of useful metadata within socially-oriented data, but it is challenging to distinguish important information from noise. We can consider using the relationships between users and the categorization of users to gain metadata from a group of interest. The goal of this project is to find a way to enrich the annotation of entities on the web using socially-oriented data.


Combining Different Data-Sources for User Modeling in Personalized Learning

Maroš Unčík
master study, supervised by Mária Bieliková

Abstract. The use of adaptive systems in e-learning is steadily growing. These adaptive e-learning systems try to address the most crucial issues related to information overload. The main feature of such systems is personalization, which monitors the characteristics of individual users, including modeling of their skills, knowledge and/or interests. The performance of such systems depends on an important element, a user model, which helps users minimize error rates and learning time.

We proposed a user model that separates the collection of data about the user from the construction of the user model itself. We consider several sources of input. We also consider visualization of the user model from the user's point of view, which allows direct and explicit feedback from students to enrich the user model. The visualization helps the user to get an overview of the whole model, to see dependencies in the user model more clearly, and to adjust the sensitivity of the user model.

We will experiment with the proposed user model in the Adaptive Learning Framework in a real-world setting; the framework is used as an e-learning system in several courses at the Faculty of Informatics and Information Technologies.


Automatic Support for Academic Writing

Diana Vandlíková
bachelor study, supervised by Jozef Tvarožek

Abstract. Students of high schools and universities, teachers and experts in various fields often face the task of writing a thesis in various forms. It is important not only to obtain information but also to be able to process it and present it to the public. The crucial problem is that students are forced to gather information about the required form of a thesis on their own. This causes marked inconsistency among theses. There are international rules for writing a final thesis, which aim to improve and unify thesis quality.

My bachelor thesis will deal with the automatic checking of final theses. The basic requirements for every thesis are correctness, usefulness, topicality and correct form of the text. Each final thesis is divided into several parts: preface, introduction, analysis, design, implementation, conclusion, appendices. All important points and visual forms governed by the international standards mentioned above must be included in these parts. This helps to correct theses faster and more easily.
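One simple form of such an automatic check can be sketched as follows: given the headings found in a thesis, report which of the parts listed above are missing and whether the parts that are present appear in the expected order. This is only an illustrative sketch; the function name `check_structure` and the exact, case-insensitive matching of heading names are assumptions, and a real checker would have to follow the wording required by the relevant standard.

```python
# Parts of a final thesis as listed in the abstract, in the expected order.
REQUIRED_PARTS = [
    "preface", "introduction", "analysis", "design",
    "implementation", "conclusion", "appendices",
]


def check_structure(headings):
    """Return (missing_parts, in_order) for a list of heading strings.

    Matching is exact after trimming and lower-casing, which is an
    assumption made to keep the sketch short.
    """
    normalized = [h.strip().lower() for h in headings]
    missing = [part for part in REQUIRED_PARTS if part not in normalized]
    # Positions of the required parts that were found, in required order.
    positions = [normalized.index(p) for p in REQUIRED_PARTS if p in normalized]
    in_order = positions == sorted(positions)
    return missing, in_order


if __name__ == "__main__":
    found = ["Preface", "Introduction", "Design", "Analysis", "Conclusion"]
    print(check_structure(found))
```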


Discovering Rituals of a Web User

Juraj Višňovský
bachelor study, supervised by Dušan Zeleník

Abstract. In the last decade the Internet has become a big part of our lives. Thanks to the Web we can save a lot of time on common tasks, but we can also lose some. It is a place where many traces of a user's recent activity can be found. Some of this information is visible; some is hidden and has to be uncovered.

Human behavior, in general, is affected by many contexts. Current mood, weather, current occupation and many other things have a great influence on our actions. Many people have a daily routine that does not change no matter what. Some people are used to reading newspapers in the morning, some watch TV every evening and some work only at night. People most likely do not even realize these small rituals, but we are going to point them out.

The aim of our work is to discover connections between a user's activities and contexts. If there is a connection, we will use this information to predict future outcomes. We are going to track users' footprints obtained from the PeWe Proxy server, which means our possibilities to work with different types of contextual information will be somewhat limited (approximate location derived from IP, time, day of week, weather etc.). Based on this information we intend to propose a method for discovering a user's rituals. To achieve this we are going to split the user's browsing history into two parts. Based on discoveries from the first part we will make predictions, which will then be verified against real data from the second part.
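The split-and-verify procedure can be sketched roughly as follows. The record format (day of week, hour, activity label), the chronological split and the rule "the most frequent activity in a context is the ritual" are assumptions made for illustration; they are not the proposed method, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def split_history(history, ratio=0.5):
    """Split a chronologically ordered history into two parts."""
    cut = int(len(history) * ratio)
    return history[:cut], history[cut:]


def discover_rituals(records):
    """Map each (day, hour) context to its most frequent activity."""
    counts = defaultdict(Counter)
    for day, hour, activity in records:
        counts[(day, hour)][activity] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def verify(rituals, records):
    """Share of records with a known context whose activity was predicted."""
    covered = [(d, h, a) for d, h, a in records if (d, h) in rituals]
    if not covered:
        return 0.0
    hits = sum(1 for d, h, a in covered if rituals[(d, h)] == a)
    return hits / len(covered)


if __name__ == "__main__":
    # Hypothetical browsing records, oldest first.
    history = [
        ("Mon", 8, "news"), ("Mon", 8, "news"),
        ("Mon", 20, "video"), ("Tue", 8, "news"),
        ("Mon", 8, "news"), ("Mon", 20, "social"),
        ("Tue", 8, "news"), ("Wed", 8, "mail"),
    ]
    first, second = split_history(history)
    rituals = discover_rituals(first)
    print(rituals, verify(rituals, second))
```

Records from the second part whose context never occurred in the first part are left out of the score here; counting them as misses would be an equally defensible choice.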


Role of the Context in the User Behavior

Dušan Zeleník
doctoral study, supervised by Mária Bieliková

Abstract. User activities both on the Web and in the real world are affected by attributes of the environment and the user's state. We call these factors simply contexts. Many different types of contexts affect our behavior; however, in behavioral pattern discovery time is often used as the only context. Our aim is to use as many contexts as possible to discover behavioral patterns and build a better, more accurate prediction model. This also includes contexts which are often unknown or unavailable for a specific user (location context when there is no GPS device or it is disabled, emotional context when the user is unwilling to express mood).

We design a method for context-aware user behavior prediction. This includes context discovery, which we achieve by inferring contexts from correlations in users' behavior. Our basic idea is to represent a user's conditional actions as points in a multidimensional space. Every dimension in such a space is one of the conditions (contexts). This representation enables us to compare two users by the distance between specific actions. We project the space onto the intersecting dimensions (contexts known for both users). Even unknown contexts then become available (with a certain probability) by appending them from the representation of similar users. This eventually leads to a more accurate prediction model. Besides more accurate prediction, we also achieved a partial result in context discovery. We are able to discover present or future contexts for a specific user using his behavior and the behavior of similar users (assuming that users' contextual information is available). Moreover, besides named contexts, we are also able to discover unnamed contexts. These contexts are present, but only as a combination of several contexts (unknown to everyone, like emotional contexts).
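The representation can be illustrated with a minimal sketch in which an action is a mapping from context names to numeric values, and two actions are compared only on the contexts known for both. The numeric encoding of contexts, the Euclidean distance and the copying of missing contexts from the single nearest point are assumptions made for illustration; the sketch does not model the probability attached to inferred contexts, and the function names are hypothetical.

```python
import math


def projected_distance(p, q):
    """Euclidean distance over the contexts known for both points."""
    shared = p.keys() & q.keys()
    if not shared:
        return math.inf  # nothing to compare on
    return math.sqrt(sum((p[c] - q[c]) ** 2 for c in shared))


def fill_unknown_contexts(target, others):
    """Append contexts missing in `target` from its nearest neighbour.

    `others` is assumed to be a non-empty list of points.
    """
    nearest = min(others, key=lambda o: projected_distance(target, o))
    filled = dict(nearest)
    filled.update(target)  # values already known for target take precedence
    return filled


if __name__ == "__main__":
    # Hypothetical actions: context name -> encoded value.
    target = {"hour": 8.0, "weekday": 1.0}
    others = [
        {"hour": 9.0, "weekday": 1.0, "location": 3.0},
        {"hour": 20.0, "weekday": 6.0, "location": 7.0},
    ]
    print(fill_unknown_contexts(target, others))
```

Distances computed over different numbers of shared contexts are not directly comparable, so a fuller version would probably normalize by the number of shared dimensions.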
