Students' Research Works – Autumn 2010 | PeWe - Personalized Web Research Group

Search, Navigation and Visualization

Máté Fejes: Cloud-Based Navigation over Concepts
Róbert Horváth: Interpretation Support of Terms while Browsing in Slovak Language
Eduard Kuric: Automatic Photo Annotation: Photo-to-word Transformation based on Object Recognition
Michal Lohnický: Photo Album as a Collection of Memories and Experiences
Milan Lučanský: Acquiring Metadata from the Web
Peter Macko: Interactive Browser of Heterogeneous Web Content
Michal Noskovič: Personalized Search using Social Networks
Karol Rástočný: Semantic Web Browsing Based on Graph Visualization
Miroslav Soha: Information Synchronization for Honey Bees Searching on the Web
Peter Študent: Using Implicit Feedback in Searching Interesting Content
Michal Tvarožek: Exploratory Search in the Adaptive Social Semantic Web

Classification and Recommendation

Anton Benčič: Use of Context in Information Recommendation in a Specific Domain
Pavol Bielik, Peter Krátky, Štefan Mitrík, Michal Tomlein: Motivating Children to Increase Physical Activity by Means of Reward
Michal Holub: Creating Personalized Websites from Extracted Web Objects
Michal Kompan: Group Recommendation for Multiple Users
Martin Labaj: Recommendation and Collaboration through Implicit Identification in Social Context
Pavel Michlík: Combining Recommendation Methods in Web-based Educational Systems
Ján Suchal: New Approaches to Log Mining and Applications to Collaborative Filtering
Filip Sucháč: Social-based Recommendations on the Web
Martin Virik: Automated Recognition of Author’s Writing Style in Blogs
Dušan Zeleník: Context-Aware Recommending

User Modeling, Virtual Communities and Social Networks

Michal Barla: Semantic Refreshment: Combining Term-based User Model with Linked Data Cloud
Roman Burger: Human like Artificial Player in a Board Game
Marián Hönsch: Virtual Community Detection in Vast Information Spaces
Tomáš Kramár: Detecting Searcher’s Intent
Tomáš Majer: Microblog-Based Web Resource Ranking
Ivan Srba: The Influence of Group Creation on Collaboration
Márius Šajgalík: Decentralized User Modeling and Personalization
Jozef Tvarožek: Bootstrapping a Socially Intelligent Tutoring Strategy
Maroš Unčík: Long-term User Model Prediction

Domain Modeling, Semantics Discovery and Annotations

Peter Bartalos: Effective QoS and Pre-/Post-Condition aware Web Service Composition in Dynamic Environment
Martin Jačala: Named Entity Disambiguation Using Semantic Analysis
Tomáš Kuzár: Slovak Blog Clustering Enhancements
Vladimír Mihál: Relations Discovering in Educational Texts based on User Created Annotations
Róbert Móro: Personalized Text Summarization Based on Collaborative Annotation
Balázs Nagy: Collection and Creation of Metadata via Interactive Games
Jakub Ševcech: Automatic Web-page Annotation
Jakub Šimko: Games with a Purpose Aiding the Semantic Web
Marián Šimko: Automatic Semantics Discovery for Adaptive Web-based Learning 2.0


Semantic Refreshment: Combining Term-based User Model with Linked Data Cloud

Michal Barla
doctoral study, supervised by Mária Bieliková

Abstract. Last year, we proposed an approach that aims to significantly enhance the everyday surfing experience and to turn the whole Web into a social information space, where one can leverage the experience of others by employing open-corpus adaptive hypermedia techniques at the level of the “wild” Web. We achieve this by employing lightweight semantics of terms retrieved automatically from web documents, in combination with implicit interest indicators. We believe that such lightweight semantics, easily obtainable by machines, provide us with sufficient “meaning” to successfully build a user model.

The main drawbacks of the term-based approach are the polysemy of words and the lack of relationships between terms. One way to eliminate the polysemy issue is to incorporate additional semantics that help us distinguish among the different meanings of a term. This can be accomplished via web services for term extraction, which automatically create rich semantic metadata for the submitted content. For instance, OpenCalais resolves named entities (such as persons, companies, cities and countries), facts and events. It recognizes such named entities and assigns them to semantic categories using the Linked Data standard, thus connecting them to the Linking Open Data Cloud. A further benefit is that Linked Data naturally comes with various types of relationships among entities, which addresses the second drawback.

We propose to incorporate an additional layer into the term-based user model, which would serve to identify term types using URIs from the Linked Data Cloud. This allows us not only to deal with the ambiguity of terms, but also to broaden the set of terms present in the user model and to find new relationships between terms and thus between users. Such user-to-user relationships serve as a basis for virtual community detection and subsequent community-based personalization.
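One way to picture how shared Linked Data URIs could yield user-to-user relationships is a set overlap between the URI layers of two user models. The following is a minimal sketch, not the method of this work: the Jaccard measure, the function name `uri_similarity` and the example URIs are illustrative assumptions.

```python
def uri_similarity(uris_a, uris_b):
    """Jaccard overlap of two users' sets of Linked Data URIs (assumed measure)."""
    a, b = set(uris_a), set(uris_b)
    if not a and not b:
        return 0.0  # two empty models share nothing
    return len(a & b) / len(a | b)


# Hypothetical URI layers of two user models
alice = {
    "http://dbpedia.org/resource/Semantic_Web",
    "http://dbpedia.org/resource/Bratislava",
}
bob = {
    "http://dbpedia.org/resource/Semantic_Web",
    "http://dbpedia.org/resource/Linked_data",
}

# One shared URI out of three distinct URIs
print(uri_similarity(alice, bob))
```

Users whose similarity exceeds some threshold could then be treated as candidates for the same virtual community; choosing that threshold is outside the scope of this sketch.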


Effective QoS and Pre-/Post-Condition aware Web Service Composition in Dynamic Environment

Peter Bartalos
doctoral study, supervised by Mária Bieliková

Abstract. Web services are a topical research area that receives a lot of attention. One part of this research aims to propose solutions for the automatic composition of several web services into workflows that provide functionality no single service can offer. The desired goal of such a composition is described in the user query. Automatic web service composition has proved to be a challenging task. In recent years, research on service composition has tended to focus on issues related to QoS, pre-/post-conditions, user preferences (e.g. soft constraints), and service selection considering complex dependencies between services. Our work deals with the effectiveness and scalability of service composition that is aware of QoS and pre-/post-conditions. Moreover, we deal with dynamic changes in the service environment and with the continuous arrival of composition queries. Both must be handled effectively by the composition system.

Our approach is based on extensive preprocessing done before we respond to user queries. During preprocessing we create data structures that are used to answer queries quickly. Most importantly, we evaluate which services can be chained, i.e. which services produce the data and have the post-condition required by other services. The actual design of the composite service is realized as forward chaining. This process can become computationally demanding if the number of services that must be considered during the composition is high. To make our approach effective, we propose service space restriction. Our experiments show that this approach speeds up the composition by more than one order of magnitude. Overall, our work shows that performance issues in the context of web service composition can be managed even if the number of considered web services is high, and even if we take the pre-/post-conditions and the QoS of the web services into account.
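The forward chaining step can be illustrated with a small sketch. It is a simplification under stated assumptions: services are described only by the data they consume and produce, while QoS, pre-/post-conditions, the precomputed chaining structures and service space restriction are all left out. The names `Service` and `compose` and the example services are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    inputs: frozenset   # data the service needs
    outputs: frozenset  # data the service produces


def compose(services, available, goal):
    """Forward chaining: keep adding services whose inputs are already known."""
    known = set(available)
    remaining = list(services)
    plan = []
    while not goal <= known:
        runnable = [s for s in remaining if s.inputs <= known]
        if not runnable:
            return None  # the goal cannot be reached
        for s in runnable:
            plan.append(s.name)
            known |= s.outputs
            remaining.remove(s)
    return plan


services = [
    Service("geocode", frozenset({"address"}), frozenset({"coords"})),
    Service("weather", frozenset({"coords"}), frozenset({"forecast"})),
    Service("translate", frozenset({"text"}), frozenset({"translated"})),
]

print(compose(services, {"address"}, {"forecast"}))
```

This naive version adds every runnable service in each round, including ones that do not contribute to the goal; pruning them, and limiting which services are considered at all, is where the restriction of the service space would come in.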


Use of Context in Information Recommendation in a Specific Domain

Anton Benčič
beginning master study, supervised by Mária Bieliková

Abstract. There are many methods for personalization and recommendation on the Web today. However, regardless of the specific method used, there is always a need for accurate and relevant information about the user and his context, and the question remains where to find that information. One possible way is to analyze and process the data stored inside intelligent mobile devices, provided by technologies and services such as GPS, Facebook or e-mail, to name just a few. These can provide very helpful information for user modeling and subsequent personalization or recommendation, because mobile devices are the most personal computers people carry around today. Despite this, we do not see widespread use of this information beyond a set of guide applications that exploit positioning technologies to determine their context.

Our goal is to analyze the potential of mobile devices from the personalization perspective, as providers of valuable information about the user that can put his actions in context. Examples of such use would be analyzing documents, browsing history or user comments within a social network like Facebook, and using that data for content-based recommendation, for example in the domain of an internet newspaper. This recommendation can be further supported by using not only the user’s own data, but also the data of his friends in one of his social networks. The mobile device could even use position and time data from all of these users and automatically construct a set of the user’s interest groups on that basis, to further refine the recommendation process. Going even further, we could use the calendar data of the user and his classmates, for example to recommend articles from an educational system that the user has not seen yet but his classmates have.

Today mobile devices really offer a wide variety of technologies and services that are capable of providing valuable information about the user’s context that we often lack, and we see it as a research opportunity not only for us, but for the whole community as well.


Motivating Children to Increase Physical Activity by Means of Reward

Pavol Bielik, Peter Krátky, Štefan Mitrík, Michal Tomlein
bachelor study, supervised by Mária Bieliková, Michal Barla

Abstract. We address the lack of physical activity as a health problem of the entire young generation. We specifically aim at helping to reduce child obesity and promote a healthy lifestyle from early childhood. Our solution motivates children to participate in physical activity by making it a part of gameplay. It replaces in-game payment, which children currently see and make use of in a large number of popular games. Instead of having to pay for features and advancement in the game, children use points awarded to them for being physically active. The actual payment is performed by their parents, who are given the option to specify daily limits and the exact sum of money they wish to enter into the system. Apart from paid games, the system is also designed to extend free games. Developers of free games have the ability to require a certain level of health or a given number of points for a feature to be unlocked or in order for the player to advance to the next level.

We provide adaptive mechanisms to ensure that children engage in regular exercise, as opposed to occasional outbursts of activity which are unhealthy and dangerous. In particular, we have introduced the concept of health as a prerequisite for awarding points, in that the child is required to reach a given level of health to be awarded any.

To achieve the above goals we introduce two separate applications, one for children and one for parents. The application for children presents a personalized activity recommender which seeks to motivate and guide children in their activities. In order to recognize and assess physical activity, our application for mobile phones collects data from various sensors, such as GPS, accelerometer and gyroscope. The parents’ application makes it possible for parents to track, evaluate and compare their children’s physical activities.


Human like Artificial Player in a Board Game

Roman Burger
bachelor study, supervised by Jozef Tvarožek

Abstract. Since the computer world is becoming available to almost everyone these days, the main focus of development is shifting from expansion to human-computer interaction. It is important to conduct these interactions adaptively, because the base of computer and web users is so broad. An adaptive approach implies some form of artificial intelligence that interacts with the user.

The main problem with artificial intelligence is that it is artificial. Users can easily tell when they are interacting with artificial intelligence, and its behavior is predictable. This often makes the interaction boring and demotivating, so users lose interest in developing a relationship with the computer.

In this project we are going to develop a human-like artificial player for a board game. We chose the board game setting because it offers a natural environment for developing relationships. It is also usually accompanied by joy, which is one of the best motivations and is needed for further development of the relationship.

We pursue the development of human-like artificial intelligence in two ways. The first is to develop it in the right setup. The right setup means a board game that lets the artificial player express as much as possible with single actions. This can be done either by adding such features to an existing board game or by using a completely new, innovative board game; the best option would be the right combination of the two.

The second way is the development of the artificial intelligence algorithm itself. Since humans are usually smarter than algorithms, it is common to secretly give the artificial intelligence some advantage to put human and computer on an equal footing. We would like to avoid this and instead make the artificial intelligence less predictable and more “moody”. This could make the artificial intelligence interesting to play with. Another feature of the algorithm could be building a relationship with individual users, making the game a place to “meet” someone (a computer friend), not just to play a game.

We hope to gain enough information and experience in this field to contribute to future approaches to human-computer interaction and their development.


Cloud-Based Navigation over Concepts

Máté Fejes
bachelor study, supervised by Mária Bieliková

Abstract. The effectiveness of getting an overview of information can be improved by individualized access. In this context there are several methods and techniques of adaptation, which lead to personalized web applications. Navigation support for large amounts of information (e.g. professional literature about programming languages collected from the Internet) has significant value, as it enables effective access to the required content or to content that contributes to the user’s current goals and needs. If we assume that we have information about terms which could be interesting for the user, we can help him by visualizing keyword clouds in a given context.

Our solution to this problem is a navigation tool within an adaptive web-based learning framework. Utilizing features of the selected domain’s model, the tool determines a set of keywords that could be interesting for the user. The tool looks like a general tag cloud, which is commonly used to simplify navigation across web pages. The domain model contains learning objects, which are educational materials (explanations), questions and exercises. Concepts (keywords) are related to learning objects and serve as metadata. Furthermore, there is information about the activities of the web page’s users: reading educational materials, solving questions and exercises, correct or incorrect answers, etc.

The navigation has two sections. The first section contains related concepts, that is, concepts related to the currently viewed learning object. They enable quick switching among articles with similar content, as well as finding questions and exercises for an explanation that has already been read, and vice versa. By selecting a keyword from this section, the user gets to the learning objects related to that keyword.

The second section of the navigation contains suggested keywords, which are determined by a defined recommendation method. The recommendation method utilizes the user model and the history of users’ activities. With the help of this history, the recommendation algorithm determines the user’s potential knowledge deficiencies, based on his individual actions and on similarities between him and other users. The system responds to these actions by a defined method and selects concepts related to learning objects that could be interesting for the user.


Creating Personalized Websites from Extracted Web Objects

Michal Holub
beginning doctoral study, supervised by Mária Bieliková

Abstract. Every piece of information a person could need is on the Web. In fact, there is far more information than one can process. This brings the problem of choosing the right information, the information we are actually interested in. A computer system that knows our needs could make us more productive by doing much of the preprocessing for us. It could then recommend interesting web pages or objects to us. We see the Web as a large data provider. Various web services benefit from this data by mixing it together and creating so-called mashups. An example would be a map that is shared among various types of applications. Going one step further means not only taking data from data providers, but also deducing facts by preprocessing and analyzing various web portals.

As a result we could create more sophisticated personalized mashups. This requires solving many challenges. First we need to analyze web pages in order to extract interesting objects and annotate them. An ontology can help us discover various relationships among these objects and deduce further facts. The next phase is to select appropriate objects according to the user’s needs and combine them in a personal mashup. Finding patterns in the behavior of users and in their social networks can provide valuable information about their interests, which we can later use for recommendations.

We have shown in our previous work that we are able to estimate a user’s interest in a web page based on the actions he takes. We applied this method in a web-based application that extracts events from a web portal and creates a personalized calendar for every user. Now we plan to extend the method to other portals and create a mashup of various types of events. Next, we plan to combine it with other objects and information related to the events (information about people, places, etc.).


Virtual Community Detection in Vast Information Spaces

Marián Hönsch
master study, supervised by Michal Barla

Abstract. Collaborative filtering is present in current web-based systems in many forms. In the beginning, approaches were mostly either memory-based or model-based, but over time many hybrid approaches emerged, combining several techniques from multiple disciplines. However, the basic idea has always remained the same: use the past experiences of users to benefit an individual. We can imagine it as walking in the woods and taking the paths that others took before us. A significant enhancement of the basic idea is to use the past experiences not of all users, but only of those users who are similar to the user for whom we are generating recommendations. Based on this we can start to detect virtual communities of users.

In our work we focus on identifying communities of individuals based on their interests. A user can belong to several communities at the same time, and each community represents a part of his interests. Recommendations coming from such communities are more accurate than when we group users based on the whole user profile. First, we describe how to capture and identify the particular interests of each user. We chose to perform a latent analysis of the content that the users have viewed in the past. Depending on the time period, we can model short-term and long-term interests. Next, we describe the processes for creating and discovering virtual communities. To validate our solution, we decided to recommend articles on a newspaper portal to the groups we defined. Collaborative filtering is always tailored to a specific domain, and a newspaper portal is a highly dynamic domain with frequent changes and specific user behavior: nobody reads yesterday’s newspaper. In our work, we therefore also consider these fluctuating, time-dependent changes by incorporating the influence of volatile communities on recommendations.

In our work we assume that article categories are represented by different keywords, so an interest can be defined based on clusters of keywords corresponding to those categories. The novel approach is to create virtual communities based on these clusters. In our experiment we use a hybrid approach that layers and weights the results of the different methods we use for identifying communities.


Interpretation Support of Terms while Browsing in Slovak Language

Róbert Horváth
bachelor study, supervised by Marián Šimko

Abstract. When browsing the Web, we sometimes find it difficult to understand the page content. Many technical articles contain words whose meaning is unknown to us. Most users look up an explanation of a phrase in online dictionaries. To do so, they have to open a new window with a dictionary and enter the word manually, which is neither fast nor comfortable. Another possibility is to use a web browser extension that carries out the whole process automatically. Such an extension shows the main explanation of a word in a tooltip. The main problem of existing extensions is that they do not support the Slovak language.

Therefore, the goal of our work is to create a tool that provides explanations of Slovak words simply and quickly. We will achieve this by creating an extension for a web browser (Google Chrome). The process of explaining a word begins with a simple double click that selects the unknown word. The Slovak language poses some problems that we have to face to get the right meaning. The extension must be able to get an explanation even for words that are not in their base form. To allow this, we need a lemmatizer to obtain the lemma of the word. Lemmatization will be provided as a separate web service used by our extension, which will afterwards send the lemma to a dictionary. Another problem, connected with simplicity, is showing the right meaning of a word that has more than one meaning. The extension will take the context of the word into consideration and compare it with the explanations found. Only the most probable result will be shown to the user. To fully understand a word, synonyms can sometimes be very useful in addition to the explanation, so they will be displayed as well.
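The comparison of a word's context with the explanations found could, for instance, be a simple word overlap count. The sketch below assumes this overlap heuristic; the abstract does not say how the comparison is actually done. The function name `pick_meaning` and the dictionary entries for "koruna" are made up for illustration, and a real implementation would lemmatize both sides first.

```python
def pick_meaning(context_words, explanations):
    """Return the explanation sharing the most words with the context."""
    context = {w.lower() for w in context_words}

    def overlap(explanation):
        return len(context & set(explanation.lower().split()))

    return max(explanations, key=overlap)


# Hypothetical dictionary explanations of the Slovak word "koruna"
explanations = [
    "ozdoba hlavy panovníka",                # ornament worn by a monarch
    "horná časť stromu s konármi a listami",  # upper part of a tree
    "menová jednotka",                        # unit of currency
]

# Context of the double-clicked word: "birds sat in the crown of the tree"
context = "vtáky sedeli v korune stromu".split()

print(pick_meaning(context, explanations))
```

Here only the second explanation shares a word ("stromu") with the context, so it is the one selected.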

To evaluate the extension, we plan to conduct a small experiment with a selected group of users.


Named Entity Disambiguation Using Semantic Analysis

Martin Jačala
master study, supervised by Jozef Tvarožek

Abstract. Identification of named entities in unstructured, human-written text is a well-established subtask of natural language processing. However, marking each occurrence of a named entity in text with a class label is not sufficient, as the same word often denotes several different entities. A dedicated area of NLP, Named Entity Disambiguation, has been devised to solve this problem.

Such methods usually use a large external dictionary and link all occurrences of named entities to the appropriate dictionary keywords. Various approaches have been published to solve this problem; however, the majority of them are based on “bag of words” methods, such as simple cosine similarity, Latent Semantic Analysis or Explicit Semantic Analysis.

In our work, we employ Wikipedia as a source of background knowledge. Each page focuses on a single ‘instance’ of an entity with a distinctive meaning, so we can use Wikipedia as a disambiguation dictionary for entity occurrences in the evaluated text. We also use additional semantics provided by Wikipedia, such as disambiguation and redirect pages or links between documents.

Our method measures the semantic relatedness between the occurrence of an entity in the evaluated text and its possible meanings, according to surface forms extracted beforehand from Wikipedia data. The semantic relatedness is then computed using Explicit Semantic Analysis with respect to a semantic space built from the pages describing the entity’s possible meanings.

In this method, we build a smaller semantic space for disambiguation with respect to the evaluated entities, instead of using a large semantic space built from the entire Wikipedia. We are currently evaluating how to build the document vectors for the disambiguated entity and its possible meanings, and how to determine the size of the individual semantic spaces to achieve the best results.
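In Explicit Semantic Analysis, a text is represented as a weighted vector over concepts, and relatedness is the cosine of two such vectors. The sketch below shows only that final comparison step; the concept weights, the candidate names and the function names `cosine` and `disambiguate` are invented for illustration, and building the vectors from Wikipedia pages is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {concept: weight}."""
    dot = sum(w * v.get(concept, 0.0) for concept, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(mention_vector, candidates):
    """Pick the candidate meaning most related to the mention's context."""
    return max(candidates, key=lambda name: cosine(mention_vector, candidates[name]))


# Hypothetical concept vectors for a mention of "Jaguar" and two meanings
mention = {"Automobile": 0.8, "Engine": 0.5, "Cat": 0.1}
candidates = {
    "Jaguar Cars": {"Automobile": 0.9, "Engine": 0.4},
    "Jaguar (animal)": {"Cat": 0.9, "Rainforest": 0.6},
}

print(disambiguate(mention, candidates))
```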


Group Recommendation for Multiple Users

Michal Kompan
beginning doctoral study, supervised by Mária Bieliková

Abstract. The amount of data on the Web is a serious problem for the common user. The existence of information is of little value when no one can access or find it in an acceptable time. News portals (nytimes.com, reuters.com, etc.) are among the most relevant sources of information on the Web. Most users prefer large, renowned news meta-portals. These include thousands of news items added daily from the whole world, and there is no way for every user to access them quickly and comfortably. The only way to help the user is to filter the large amount of information and reduce it to an acceptable amount.

The main problem in content-based filtering is an effective and sufficiently expressive representation of items (or articles). This is often done by means of text summarization or keyword extraction. These techniques are commonly used in English-based systems and cannot be easily applied to other languages. Keyword extraction and summarization bring better results than other methods, but are more time-consuming. These methods cannot represent non-text documents without modification.

The recommendation task for a single user can be extended to group recommendation. With the growth of social networking, the recommendation of restaurants, accommodation, trips, TV shows or learning lessons will become more important. There are also several scenarios in which one information source is shared and accessed by a group of users (adverts, chat rooms, etc.). Several strategies inspired by Social Choice Theory have been proposed (e.g. Fairness, Most Pleasure, Voting, Least Misery). On the other hand, user group modeling can be used in standard recommendation tasks to overcome problems like “cold start” or “gray sheep”. One of the current problems in the field of recommender systems is explanations: understanding the recommendation process can increase trust and user satisfaction.
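Two of the aggregation strategies named above are easy to state in code. The following is a minimal sketch with made-up ratings: Least Misery scores an item by its lowest individual rating, Most Pleasure by its highest. The function names and the example data are illustrative assumptions.

```python
def least_misery(ratings):
    """Group score is the rating of the least satisfied member."""
    return min(ratings)


def most_pleasure(ratings):
    """Group score is the rating of the most satisfied member."""
    return max(ratings)


def recommend(group_ratings, strategy):
    """Return the item with the best group score under the given strategy."""
    return max(group_ratings, key=lambda item: strategy(group_ratings[item]))


# Hypothetical ratings (1 to 5) of three restaurants by three group members
group_ratings = {
    "pizzeria": [5, 5, 1],
    "bistro": [3, 4, 3],
    "sushi bar": [4, 2, 4],
}

print(recommend(group_ratings, least_misery))
print(recommend(group_ratings, most_pleasure))
```

The two strategies disagree here: Least Misery avoids the pizzeria because one member rated it 1, while Most Pleasure picks it because of its two top ratings.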


Detecting Searcher’s Intent

Tomáš Kramár
beginning doctoral study, supervised by Mária Bieliková

Abstract. One of the main problems of current search engines is their limited interaction with users. The whole interaction proceeds as a series of queries: the user inputs keywords and the search engine presents results. This model does not allow the search engine to understand the user’s needs, his context and the ultimate intent behind the search query.

One keyword may represent multiple search intents, but the problem is far more serious than just ambiguous keywords: the real problem is interpreting the query in a way that fulfills the searcher’s needs. We need to understand the answers to questions like “What is the goal of this search?” or “Why is the user searching?”. Queries are often just partial steps towards a final answer, which help the user to understand the domain, refine the query, complete the picture and reach the final goal.

Queries can generally be categorized into three main groups: navigational (searching for institution, company or celebrity pages), informational (answers to questions, information on a topic) and transactional (looking for recipes, music, articles). Each group has its own specifics and requires a different presentation of search results and different result ranking methods.

The goal of our research is to create methods that infer search intent from the user’s context, behavior and browsing history, and adapt the search results accordingly. This includes methods for indexing and storing information in a way that enables effective ranking using attributes other than just text similarity.


Automatic Photo Annotation: Photo-to-word Transformation based on Object Recognition

Eduard Kuric
master study, supervised by Mária Bieliková

Abstract. With the increasing popularity of digital and mobile phone cameras, there is a growing need for quick and exact searching. Content-based indexing of photos is more difficult than indexing of text documents because photos do not contain units like words. Searching is based on annotations and semantic keywords that are entered by a user and associated with photos. However, manual creation of annotations is very time-consuming and the results are often subjective. Therefore, automatic photo annotation is a very challenging task.

Traditional approaches to automatic annotation are based on combining keyword-based and content-based photo retrieval. One possible scenario is that the user enters a query consisting of a target photo and keywords, typically only a caption. The aim is to find the most similar photos in a corpus of well-annotated photos. After the retrieval process, related keywords are extracted from the candidates and associated with the target photo.

In our work, we propose a novel method for automatic photo annotation. In our scenario, the target photo is divided into a fixed number of sub-images (tiles), and descriptors of local features are computed for each tile. To capture local information, which is essential in object retrieval, we use the Scale Invariant Feature Transform (SIFT) in combination with a hash-based method known as Locality Sensitive Hashing. A SIFT detector transforms a tile into a collection of feature points that are invariant to photo scaling, translation and rotation, and partially invariant to illumination changes.

Our learning corpus contains well-annotated photos, also with extracted local and global features. For each tile of the target photo, the most similar photos according to these features are retrieved, and the tile inherits all keywords associated with these candidates. After this process, each tile is assigned the word with the highest frequency. Using the best candidates and local features, the bounds of the tiles are refined. The result is the annotated target photo, in other words, a photo with recognized and labeled objects.
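The keyword inheritance step for a single tile can be sketched as a frequency count. The sketch assumes the similar photos have already been retrieved (the SIFT and Locality Sensitive Hashing stages are not shown); the function name `label_tile` and the keyword lists are hypothetical.

```python
from collections import Counter


def label_tile(candidate_keywords):
    """Pick the most frequent keyword among a tile's candidate photos.

    candidate_keywords holds one list of keywords per retrieved photo.
    """
    counts = Counter(kw for keywords in candidate_keywords for kw in keywords)
    if not counts:
        return None  # no candidates, so the tile stays unlabeled
    return counts.most_common(1)[0][0]


# Hypothetical annotations of the three photos most similar to one tile
candidates = [
    ["dog", "grass"],
    ["dog", "ball"],
    ["grass", "dog"],
]

print(label_tile(candidates))
```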


Slovak Blog Clustering Enhancements

Tomáš Kuzár
doctoral study, supervised by Pavol Návrat

Abstract. There is a huge amount of unstructured user generated content (e.g. blogs, comments, statuses) available on the social web. In order to increase effectiveness the web search, in order to support context advertising or personalized recommendations we can cluster the data. We experimented with dataset which consists of blogs are written in Slovak language and online forum contributions which belong to these blogs. Our blog clustering model can be divided into three phases – preprocessing (term extraction), processing (LDA model based term selection) and postprocessing (user and content clustering). The quality of clusters is measured by traditional clustering metric called F-measure.

Transformation of unstructured information into structured representation is called preprocessing. Unstructured data needs to be put into structured representation in order to apply the machine learning techniques. In preprocessing phase of our model we evaluated the impact of different data preprocessing methods on success of blog clustering. We found out that applying various text data manipulation techniques in preprocessing can significantly improve the quality of clusters.

We experimented with several methods in the term extraction phase: lemmatization, taxonomy-based term extraction, building lexical classes and considering usage information. In another experimental setup, we translated the Slovak blogs into English using the Google Translate service, filtered out English stop words and applied the well-known Porter stemmer. We found that the English processing was more successful than the simplistic Slovak preprocessing methods. During the text processing phase, we employed a probabilistic model called latent Dirichlet allocation (LDA) to extract topics from the dataset.

To increase the quality of clusters even further, we added a postprocessing step that considers the social interactions of users in the dataset. We found that many web forum contributors preferred to comment on specific topics. Taking this into account, we were able to build more precise blog clusters.

As future work, we plan to analyze blog clustering based on the social interactions of users in more detail.

Recommendation and Collaboration through Implicit Identification in Social Context

Martin Labaj
master study, supervised by Mária Bieliková

Abstract. In the field of e-learning, identifying difficult and/or interesting parts of a learning text can be a useful feature for tasks such as rewriting the text, showing where to focus or offering help. In our work, we track implicit feedback and interest indicators, including user scrolling (read wear). Using a statistical approach and taking intersections and overlays of the data collected from many users, we can determine which part is the most time-consuming and therefore interesting and/or difficult. Other important data consist of mouse clicks (click heatmaps) and mouse movement (flowmaps).
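
As a rough illustration of overlaying read-wear data from many users, the sketch below assumes each user session has already been reduced to the number of seconds each text fragment was visible. Using the median as the statistic is our assumption, chosen so that one user who left the page open does not dominate. Fragment names and timings are invented.

```python
from collections import defaultdict
from statistics import median

def most_time_consuming_fragment(sessions):
    """Return the fragment with the highest median viewing time and all medians.

    sessions is a list of dicts mapping fragment id -> seconds on screen
    for one user (hypothetical input format).
    """
    per_fragment = defaultdict(list)
    for session in sessions:
        for fragment, seconds in session.items():
            per_fragment[fragment].append(seconds)
    typical = {fragment: median(times) for fragment, times in per_fragment.items()}
    return max(typical, key=typical.get), typical

# Illustrative data: three users reading the same learning text.
sessions = [
    {"intro": 20, "recursion": 95, "summary": 15},
    {"intro": 30, "recursion": 120, "summary": 10},
    {"intro": 25, "recursion": 400, "summary": 12},  # this user may have left the page open
]
fragment, typical = most_time_consuming_fragment(sessions)
print(fragment, typical)
```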

As in any method dealing with time-based user tracking, there is a possibility that the user is pursuing different activities during the evaluated time periods. We try to avoid this by using a low-cost webcam and employing physical user tracking: eye tracking, where the user’s gaze is evaluated. This way we can leave out time periods when the user is not directly using the computer, or when he is at the computer but working with a different part of the screen. Gaze detection also increases the precision of fragment identification.

Subsequently, the readily available data about the user’s active fragments and his work with the system can also be used to provide tips, advice or even explicit feedback questions adapted by means of implicit feedback. While explicit feedback adapted to some degree is common (e.g. asking for the reason for revisiting the downloads page), by employing mouse and gaze tracking we can ask a specific question – e.g. why the user has not used the recommended links at all – and determine whether he simply did not notice them or finds the recommender-picked links unappealing.

Another use lies in a social context. By augmenting the displayed content with an indication of the active fragments of other users, users see how they are doing compared to others. We also believe we can increase the level of user collaboration by providing messaging with an indication of where each user is currently working in the same content. When the user contacts friends who are learning the same part, he does not distract them from their current study and also obtains better advice from them.

The evaluation is carried out in multiple steps – independent testing of gaze tracking and of interesting fragment detection via implicit interest indicators, followed by a final evaluation in the ALEF learning system, with prospects for implementation through the Adaptive Proxy, which would also bring this concept to the open space of the Web.

Photo Album as a Collection of Memories and Experiences

Michal Lohnický
master study, supervised by Mária Bieliková

Abstract. Digital photography has existed since 1991. About ten years later the first mobile phone with an integrated camera was manufactured, and nowadays more than 90% of mobile phones have a built-in camera. In 2009, more than 3 billion photos were uploaded monthly to the biggest social network, Facebook. These numbers indicate that the vast majority of people in developed countries tend to carry a camera to any major event.

Moreover, photography is mostly considered to be a medium to save and share experiences and emotions from photographed events. This means that photo albums are collections of memories and emotions linked together via a user’s story. Photo album visualization is supposed to emphasize these elements and create an unforgettable user experience.

Our work is aimed at augmenting the user experience while browsing photos and other multimedia. We have proposed a special emotional chart as a fundamental navigation element which creates an overview of photographed events. The chart is meant to be an emotional histogram of the events, which employs minimal user personalization to preserve the atmosphere of the experiences. This is made possible by a combination of various methods and algorithms such as the fast Fourier transform, force-based layout, logarithmic normalization and photo clustering.

The main goal of our work is to provide innovative navigation and browsing in photo albums that supports storytelling and the collaborative creation of digitally saved experiences. By combining this with a proper amount of photo analysis and informational augmentation, we can create various views of a complex collection of photos in photo albums. Users can use this style of browsing to share photos in much higher quality, to find photos that are missing from their photo collections, to view places they intend to visit in various time periods, etc. It is also usable in the commercial sphere – in travel agencies, botanic monitoring, etc. When we add the direction in which the photos were taken to the location, we can create an ideal presentation tool for real-estate companies.

Acquiring Metadata from the Web

Milan Lučanský
beginning master study, supervised by Marián Šimko

Abstract. The World Wide Web provides access to an enormous amount of data. Even though the data are freely available to everyone, they are difficult to process because of the form in which they are stored and because of their sheer amount. There is potential to provide advanced data processing such as categorization or recommendation, but first we have to build the semantic layer that advanced tools require. We need an automated method to extract relevant meta-information from web content. There are some approaches to extracting information from web pages, but most of them are not suitable for the “dirty” World Wide Web environment.

We focus on extracting keywords that describe the content of a web page. Acquiring keywords is an important step before obtaining concepts, which are essential for creating ontologies, and an ontology is an important part of tools for advanced data processing. We build on our bachelor thesis (Web site content metadata acquisition using tags), which focuses on acquiring keywords from websites. We proposed a new method that improves contemporary algorithms for keyword extraction. In this method, automatic term recognition (ATR) algorithms are combined with HTML tags that have semantic potential (i.e. title, heading and anchor) to get more descriptive keywords from a web page. The main idea of our method is to treat as more relevant those keywords that are picked by the ATR algorithms and are also present in one of the HTML tags with semantic potential. We increase the weight of such keywords using TagIndex, a constant number assigned to each HTML tag.
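
The TagIndex idea can be sketched as follows. The abstract does not state the actual TagIndex values or how they are combined with ATR scores, so the constants in `TAG_INDEX`, the multiplicative boost and the choice of the largest applicable index are all assumptions made for illustration. The ATR scores are taken as given input, and only single-word keywords are handled.

```python
from html.parser import HTMLParser

# Hypothetical TagIndex constants; the source only says each tag gets a constant number.
TAG_INDEX = {"title": 3.0, "h1": 2.0, "h2": 1.5, "a": 1.2}

class TagWordCollector(HTMLParser):
    """Collects the words that occur inside the tags listed in TAG_INDEX."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.words_by_tag = {tag: set() for tag in TAG_INDEX}

    def handle_starttag(self, tag, attrs):
        if tag in TAG_INDEX:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            self.open_tags.remove(tag)

    def handle_data(self, data):
        words = {w.strip(".,;:!?").lower() for w in data.split()}
        words.discard("")
        for tag in self.open_tags:
            self.words_by_tag[tag] |= words

def boost_keywords(atr_scores, html):
    """Multiply each ATR score by the largest TagIndex of a tag containing the keyword."""
    collector = TagWordCollector()
    collector.feed(html)
    collector.close()
    boosted = {}
    for keyword, score in atr_scores.items():
        factor = max(
            (TAG_INDEX[tag] for tag, words in collector.words_by_tag.items()
             if keyword.lower() in words),
            default=1.0,  # keyword found in no semantic tag: weight unchanged
        )
        boosted[keyword] = score * factor
    return boosted

# Illustrative page and ATR scores (invented values).
html = ("<html><head><title>Python tutorial</title></head><body>"
        "<h1>Learning Python</h1><p>This tutorial covers loops and functions.</p>"
        "<a href='#'>functions</a></body></html>")
atr_scores = {"python": 0.5, "tutorial": 0.4, "functions": 0.4, "loops": 0.3}
boosted = boost_keywords(atr_scores, html)
print(sorted(boosted, key=boosted.get, reverse=True))
```

Here "loops" keeps its original score because it only appears in ordinary paragraph text.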

The index for each tag was estimated at the beginning and did not change during the experiment. This is where we see potential for our research. We suppose that finding a way to parameterize the value of TagIndex according to some variables could yield even more precise and descriptive keywords. Possible variables include the number of external links (anchors), the number of words in HTML tags and the diversity of words in HTML tags; alternatively, every ATR algorithm could have a different measure for setting its TagIndex. After creating a new method for dynamically assigning TagIndex, we will perform an experiment on the “wild” web.

Interactive Browser of Heterogeneous Web Content

Peter Macko
bachelor study, supervised by Michal Tvarožek

Abstract. Multimedia content is very important for people. Thanks to it we can have fun and relax, but it may also serve to improve education and skills. Just a few years ago it was extremely difficult to transmit the large amounts of data required by multimedia content over the Internet. The arrival of new technologies with higher data transmission speeds has changed this situation significantly in recent years, and multimedia content is therefore ever more visible on the Web. The main advantage, however, is that the quality of the transmitted content can keep increasing thanks to the constant growth of transmission speeds. Thanks to the features of today’s Internet, multimedia content is more accessible than it was in the past, and this is the main reason why I have focused my work on this area.

The main topic of my bachelor thesis is displaying and viewing multimedia content and improving existing photo browsers and viewers. Today’s browsers offer only browsing and searching of images based on specified criteria. The first theme of my project concerns the integration of a video player into an existing multimedia exploration solution, since videos will become an integral part of it. In addition, these videos will include metadata that will be useful for easy searching and navigation in video content. Videos displayed by the system will be transmitted over the Internet in the highest possible quality.

The second topic is the personalized presentation of images and videos. This means that if someone is interested in a person or object in a picture or video, he may ask the system to show other pictures or videos containing the chosen object. This will make the presentation much more interactive. The direction of the presentation will be determined not by the system or by the gallery owner but by the user. The user alone can decide what should be seen in the gallery.

Microblog-Based Web Resource Ranking

Tomáš Majer
master study, supervised by Marián Šimko

Abstract. In order to compute page rankings, search algorithms mainly leverage information coming from page content and its interconnections. The microblog, as a phenomenon of the “Web age”, often provides additional, potentially relevant information – user feedback on a page (or any web resource in general) and/or its contents. This is particularly valuable when considering microblogs as a source of news from around the world and the enormous number of users receiving that news at very short notice. Despite their spread, microblogs are still relatively poorly understood and are therefore suitable for further analysis and research.

The microblogging service Twitter has its own characteristics, such as followers who read a user’s posts. Followers can share these posts and publish them on their own profiles. This makes it possible to rank a user based on his followers with respect to the number of contributions, and to create an algorithm for evaluating resources on the Web. The references between users, posts and pages form a graph. We analyze this graph and apply various graph algorithms leveraging the notion of node centrality to derive a microblog-based resource ranking.
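
The abstract does not name the centrality measures used. As one possible illustration, the sketch below runs a plain PageRank power iteration over a small graph of users, posts and pages. The edge directions (follower to followed user, user to post, post to page), the node names and the damping factor are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """PageRank by power iteration; rank of nodes without out-links is spread uniformly."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Illustrative graph: alice has two followers and posts a link to page A,
# bob has none and posts a link to page B.
edges = [
    ("user:bob", "user:alice"),
    ("user:carol", "user:alice"),
    ("user:alice", "post:1"),
    ("user:bob", "post:2"),
    ("post:1", "page:A"),
    ("post:2", "page:B"),
]
rank = pagerank(edges)
ranked_pages = sorted((n for n in rank if n.startswith("page:")), key=rank.get, reverse=True)
print(ranked_pages)
```

In this toy graph the page linked by the better-followed user ends up ranked higher, which is the kind of signal a microblog-based ranking could add to a content-based one.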

We plan to evaluate our approach using web search. We compare our microblog-based ranking with traditional search rankings in order to assess the level of improvement in search results. Besides web search, we believe our method can also be used in online stores, where rankings of products can estimate user interests and opinions.

Combining Recommendation Methods in Web-based Educational Systems

Pavel Michlík
beginning doctoral study, supervised by Mária Bieliková

Abstract. In our previous work we designed a method for recommending exercises in an adaptive web-based learning system. In this method we consider the limited learning time and modify the recommendation accordingly. However, this behavior might not be desirable at the beginning of the course, long before the exam, when some other sequencing strategy could be more appropriate. Also, students prefer various learning styles and strategies, and to achieve optimal learning performance for every student we need to match the recommending/sequencing method to the particular learning style and/or context.

In the adaptive learning framework ALEF we have already implemented simple weighted hybrid recommending. In this method, the fixed weights for combining individual recommendations are typically calculated using machine learning. However, a training data set for evaluating the impact of recommendations on learning performance would be extremely difficult to obtain. In addition, fixed recommendation weights would not reflect the individual preferences of a particular student.

If we assume that every recommendation method represents a certain learning style, then the set of parameters (weights) for hybrid recommendation might become a model of the student’s individual learning style and preferences with regard to curriculum sequencing.
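
A minimal sketch of weighted hybrid recommendation with per-student weights is shown below. It is not the ALEF implementation; the method names, exercise ids, scores and weights are hypothetical, and the combination is a plain weighted sum.

```python
def hybrid_scores(recommendations, weights):
    """Combine per-method item scores into one score per item using a weighted sum."""
    combined = {}
    for method, scores in recommendations.items():
        weight = weights.get(method, 0.0)
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score
    return combined

def top_item(recommendations, weights):
    """Return the item with the highest combined score."""
    combined = hybrid_scores(recommendations, weights)
    return max(combined, key=combined.get)

# Two hypothetical recommendation methods scoring two exercises.
recommendations = {
    "time_limited": {"ex1": 0.9, "ex2": 0.2},
    "prerequisite_order": {"ex1": 0.1, "ex2": 0.8},
}
# The weight vector acts as a simple model of a student's preferred strategy.
student_a = {"time_limited": 0.8, "prerequisite_order": 0.2}
student_b = {"time_limited": 0.2, "prerequisite_order": 0.8}
print(top_item(recommendations, student_a), top_item(recommendations, student_b))
```

The same two methods produce different recommendations for the two students purely because their weight vectors differ.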

A proposal of a method for learning the student’s preferences has to face two main problems: 1) defining feedback for evaluating the positive or negative effect of a change in the student’s preference model and 2) developing a machine learning method that works well with a relatively low number of interaction and feedback cycles from one student. If we assume that there is not a great number of completely different learning styles, we can create groups of students whose preferences seem similar and use their feedback to optimize the model of the whole group, thus multiplying the number of interactions available for each student’s model. However, since the changes of the parameters during optimization do not always head towards the optimal point (especially at the beginning of the process), the clustering of the students has to be loose enough to allow a student to separate from a group if his feedback begins to differ significantly from that of the group.

Relations Discovering in Educational Texts based on User Created Annotations

Vladimír Mihál
master study, supervised by Mária Bieliková

Abstract. The existence of a good-quality domain model is essential for successful personalization within a learning course. However, manual creation of a domain model and its subsequent mapping onto learning content requires comprehensive knowledge of the domain and a substantial amount of time and effort. Therefore it is necessary to support the process of creating the domain model.

While studying a regular course, students usually search the Web for additional sources of information. Students bookmark the links they find and then share them with their friends. The most common additional (or external) sources for learning programming are reference manual pages, tutorials and code snippets.

Therefore we decided to provide students with the possibility to attach hyperlinks to external sources directly within the learning course. Students insert links to external sources as annotations into related learning content. Inserted links become available to all students and can be commented on or rated. We assume that external sources inserted into learning content by students characterize the learning content from the students’ viewpoint.

We have proposed a method of discovering relations from external sources attached to learning materials by students. It consists of two basic steps:

Annotation of external sources with concepts

Extraction of relationships according to external sources

In the first step we analyze the content of external sources and map already existing concepts in the domain model according to the new content. Discovering relations of the external source content to the existing concepts enables integration of external sources into a learning course. External sources can be then used in the learning process, e.g. adaptively recommended or filtered.

In the second step we create a set of weighted relations between the educational content and concepts according to the external sources attached to certain learning objects. For that purpose we use a graph representing relations between learning objects, external sources and concepts. We use a simple form of the spreading activation algorithm to find the relatedness of concepts to a certain learning object. After obtaining the set of relations, we apply them to the original model according to defined rules. As a result we obtain an enriched mapping between learning content and concepts from the domain model.
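
A simple form of spreading activation over such a graph might look like the sketch below. The decay factor, the number of steps, the equal split of energy among neighbours and all node names are assumptions. The graph is given as an undirected adjacency list linking a learning object to its external sources and the sources to concepts.

```python
from collections import defaultdict

def spread_activation(graph, start, decay=0.5, steps=2):
    """Spread activation from `start`; each node passes decay * energy, split among neighbours."""
    activation = defaultdict(float)
    activation[start] = 1.0
    frontier = {start: 1.0}
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            share = decay * energy / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)

# Illustrative graph: one learning object, two attached external sources, three concepts.
graph = {
    "lo:loops": ["src:for-tutorial", "src:while-reference"],
    "src:for-tutorial": ["lo:loops", "concept:for", "concept:iteration"],
    "src:while-reference": ["lo:loops", "concept:while", "concept:iteration"],
    "concept:for": ["src:for-tutorial"],
    "concept:while": ["src:while-reference"],
    "concept:iteration": ["src:for-tutorial", "src:while-reference"],
}
activation = spread_activation(graph, "lo:loops")
relatedness = {n: a for n, a in activation.items() if n.startswith("concept:")}
print(relatedness)
```

The concept reached through both external sources accumulates the most activation, so it would receive the strongest weighted relation to the learning object.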

Personalized Text Summarization Based on Collaborative Annotation

Róbert Móro
beginning master study, supervised by Mária Bieliková

Abstract. One of the most serious problems of the present-day web is information overload. As we can find almost everything on the web, it has become very difficult to find what we actually want or need, that is, relevant information. Moreover, the term “relevant information” is subjective, because as users of the web we differ in our interests, goals and knowledge.

Automatic text summarization aims to address the information overload problem. The idea is to extract the most important information from a document, which can help users decide whether it is relevant for them and whether they should read (study) the whole text. In other words, excerpts (abstracts) of documents can help users maximize the information gain while minimizing the time needed to acquire the relevant information.

The problem with classical automatic text summarization methods is that they do not take into account the different users’ goals, interests or knowledge. Our idea is to personalize the text summarization with the use of collaborative annotation.

Annotation of documents is a technique widely used by people, especially when reading printed documents. They highlight or underline the important parts of the text, add explanations or different formulations, or even references to other documents. In this way, annotations can indicate a reader’s (or, in the context of the web, a user’s) interest in a particular part of the document.

We can take into account not only the user’s own annotations but also those of similar users, thereby incorporating users’ collaboration into the process of text summarization. This is a largely unexplored and open area of research. Open problems include choosing the right subset of annotations to be used in the summarization process, incorporating users’ collaboration to get improved results, and multi-document summarization based on annotations in related documents.

We aim to propose a method for personalizing text summarization using collaborative annotation and to verify it in the domain of the e-learning system ALEF.

Collection and Creation of Metadata via Interactive Games

Balázs Nagy
bachelor study, supervised by Michal Tvarožek

Abstract. To implement effective search and navigation in documents (files, web pages, photos), it is necessary to have enough descriptive metadata available (e.g., the subject of a document, the type of a page, what is in a picture). Automatic acquisition of metadata is technically difficult due to the ambiguity of natural language and problems with identifying objects in multimedia content. One effective way to create metadata for content is the use of human computation – human intelligence, which could be encouraged, for example, through games.

Our goal is to create a game that is appealing to players, offers competition, and provides us with useful (meta)data. In our project we focus on obtaining detailed descriptive metadata for photos that are stored in an ontological database. We also work on more advanced games that deal with the acquisition of additional annotations, such as object types or locations in multimedia content.

Our specific game (implemented in Silverlight 4 and C# using Microsoft Visual Studio 2010) is similar to the familiar memory game Pexeso. At the beginning the player logs in, selects the difficulty of the game, and starts to play. On a board with a minimum of 8×8=64 cards, random images from the database are shown on the hidden, downward-facing sides. To make the game easier, players can mark a picture with words after they turn over a card. These words then appear when they hold the mouse over the cards, effectively adding annotations to images, which we store as metadata associated with the images. Players are rated according to the time required to find all the pairs and the number of attempts, and points are also awarded through additional bonuses.

Personalized Search using Social Networks

Michal Noskovič
master study, supervised by Pavol Návrat

Abstract. Nowadays, a lot of information is available through the web. This information can be accessed using search engines, which evaluate the specified keywords and return search results. These results are often irrelevant and do not contain the information the user was searching for. In some cases the user has to visit many of the offered links to find the information. A problem occurs if the search term has different meanings for different users.

An interesting approach is to use social networks, which have recently become popular. Social network users share information about themselves in their public profiles. This information defines their areas of interest and can therefore be used for searching. The information is stored in the user’s profile and used for further searches. The actual query can be extended with appropriate keywords to make searching more accurate. We extract keywords from the profile data that are related to the query and offer the corresponding results to the user. For example, if a user is interested in traveling, results for the query java will be about the island of Java instead of programming.
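
The query extension idea can be sketched as follows. The abstract does not describe how related keywords are found, so the hand-written `related_terms` table, the function name and the profile interests are hypothetical placeholders for whatever keyword extraction is actually used.

```python
def expand_query(query, profile_interests, related_terms):
    """Append keywords that link a query term to one of the user's profile interests."""
    expansions = []
    for term in query.lower().split():
        senses = related_terms.get(term, {})
        for interest in profile_interests:
            for extra in senses.get(interest, []):
                if extra not in expansions:
                    expansions.append(extra)
    return " ".join([query] + expansions)

# Hypothetical table: for an ambiguous term, the keywords that fit each interest area.
related_terms = {
    "java": {
        "traveling": ["island", "indonesia"],
        "programming": ["language", "jdk"],
    }
}
traveller_query = expand_query("java", ["traveling", "photography"], related_terms)
developer_query = expand_query("java", ["programming"], related_terms)
print(traveller_query)
print(developer_query)
```

A query with no matching interest is returned unchanged, so the expansion never removes anything the user typed.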

Another aim is to compare the results obtained using the user’s own profile with those obtained using the profiles of his friends. Everyone on a social network has unique interests and knowledge, and we want to make use of this. If a user is searching for a piece of information, why not recommend a friend to the user? If the friend is familiar with the search term, he may help the user more than dozens of links.

The main aim of this work is to analyze existing solutions and propose our own solution that uses social networks when searching the web, so that the user does not have to spend most of his time going through results, because they will be relevant to his interests.

Semantic Web Browsing Based on Graph Visualization

Karol Rástočný
master study, supervised by Michal Tvarožek

Abstract. The difficulty of finding relevant data on the Web is increasing as web repositories grow in size. We propose a novel approach for Semantic Web browsing, which can help users find relevant information in the Semantic Web, and enables them to browse similar and/or related data entities. We achieve this via view-based search within the Semantic Web using navigation in a two-dimensional graph.

Typical tools such as IsaViz or GOLEM employ graph-based approaches, which start with one central node representing the initial result and some nodes around it representing its facets. Users can navigate the graph by expanding nodes representing facets. We extend these approaches by preventing the graph from becoming unclear due to node expansion, which can result in many (irrelevant) nodes, by result clustering, facet marking, next action recommendation and the hiding of nodes and graph components.

To make navigation more understandable for conventional users (not only for specialists), we propose advanced zoom capability with six levels of abstraction:

Hierarchical clusters – this view shows RDF objects categorized in hierarchical clusters. We propose this view to simplify the identification of good starting points for further navigation in large numbers of results of faceted browsers.

One item view – displays basic information about the selected result.

Literal attributes graph – in this graph only literal attributes are visible. Objects are clustered by their types.

Object attributes graph – object attributes are added to the graph.

Restricted RDF graph – object attributes are expanded to RDF triples, where subject, predicate and object are displayed as standalone nodes with directed edges among them. This graph is restricted to non-schema objects and predicates.

Full RDF graph – users navigate in the complete RDF graph without any restrictions.

Decentralized User Modeling and Personalization

Márius Šajgalík
beginning master study, supervised by Michal Barla

Abstract. At present, web personalization is done almost entirely on web servers, i.e. in a centralized way. In the vast majority of cases it takes the form of a personalized web application or a personalization layer on top of an ordinary web application. A completely different approach is a decentralized model, in which each client (in essence an agent in a multi-agent system) keeps a model of its user, determines what will be shared with others and personalizes the content and navigation to the current needs of the user. This is called distributed or decentralized user modeling.

Distributed user modeling involves several basic requirements. A network of distributed components must be able to adapt itself, especially because the same communication partners and technology are not always available. Information should be able to move among multiple users and platforms without the need for centralized control. The communication infrastructure and technical details must be hidden from the user modeling components and also from the creators of these models. Software designers who create distributed applications usually have to meet the following non-functional requirements: scalability, openness, heterogeneity, fault tolerance and resource sharing.

The main issues to be addressed in the context of managing all the information are:

How to locate an agent who has the relevant model with regard to the context and the purpose for which this model is required?

How to make sense of potentially inconsistent, contradictory data?

In general, how to interpret models created by other agents?

How to ensure durability and the integrity of user models in this environment?

How to ensure user privacy?

Our goal is to examine possible approaches and subsequently design and implement an appropriate solution for distributed user modeling, and thus extend the current research and the use of our proxy server beyond the walls of our faculty.

Automatic Web-page Annotation

Jakub Ševcech
bachelor study, supervised by Mária Bieliková

Abstract. Everyone sometimes encounters the following problem: while studying or just reading some text, the reader gets to a point where he finds that he needs more information or needs the text to be explained better. Such a place could be a term that he does not understand and that therefore needs some kind of explanation. It may also be a concept about which the reader would like to find out more than just the information mentioned directly in the text. These concepts usually correspond to a particular word in the text, so the reader needs the additional information to be attached directly to this word.

If the reader encounters a term which he does not understand, he will need a definition of that word. If he gets to a point in the text where he needs some additional information, he will appreciate additional links to related texts, where he will find more information about the concept. In both cases it is necessary to add some kind of annotation that enriches the original text.

It would not be right to display the same information to every reader, because different users have already seen different parts of an annotation and therefore already know part of the information it contains. For an annotation to keep providing interesting information to students, the displayed information must be adjusted to their current needs and knowledge.

Our goal is to create a tool that makes it possible to annotate web pages automatically. To achieve this goal we are working on the following problems:

We are trying to create a method for determining which words in the text are appropriate for an annotation. Furthermore, we are creating a method to automatically generate annotations based on the context of the text and on the word they are linked to. Finally, we are trying to find a way to personalize an annotation so that the user is shown the information most appropriate for him.

Games with a Purpose Aiding the Semantic Web

Jakub Šimko
beginning doctoral study, supervised by Mária Bieliková

Abstract. The Semantic Web suffers from an insufficient workforce for creating its essential structures: domain models and resource annotations. Experts can deliver quality but become extremely costly for web-scale tasks. Automated approaches (like text mining) and collaborative approaches (image tagging, online bookmarking) are capable of delivering the quantity, but they either lack precision (due to the heterogeneity of the web) or are too general (user tags are usually not specific enough). Games with a Purpose (GWAP) have emerged in recent years as an alternative computational and reasoning force capable of delivering the necessary information and knowledge. By voluntarily playing a GWAP, the player solves an instance of a real problem as a side effect while being entertained by the game. The problem is “coded” in the rules of the game: the winning condition implies that the player solves it. As the player plays, the tactics and strategy he uses are in fact knowledge that solves the problem instance. If a GWAP is played simultaneously by many players, the required quantity can be delivered, while aggregating the solutions of the same game task from several players achieves the quality.

Unfortunately, the transformation of a real-world problem (like the lack of resource annotations) into a game is not straightforward. Our aim is to come up with a set of best practices for creating GWAPs for the domain of the Semantic Web. We believe that the way leads through formalizing the current problems of the Semantic Web and, using the same formalisms, identifying the low-level rule patterns of successful games in order to find possible junctures.

As an example of a GWAP, we developed the Little Google Game – a search query formulation game for discovering the term relationship network. The players of the game compete in reducing the number of results returned by a search engine for a given task term using negative terms. To be successful, the negative terms must be related to the task term. In this project we faced the general GWAP problem of game rule workarounds (cheating). We also carried out several experiments to assess the attractiveness and impact of the game.

Automatic Semantics Discovery for Adaptive Web-based Learning 2.0

Marián Šimko
doctoral study, supervised by Mária Bieliková

Abstract. In order to make the learning process more effective, educational systems tailor learning material to user goals, needs and characteristics. Adequate adaptation requires a domain description enabling adaptation engines to perform at least basic reasoning. A domain model of an adaptive course consists of interlinked concepts – domain knowledge elements related to learning content. The concepts are mutually interconnected, forming a structure resembling a lightweight ontology, also referred to as course metadata. Different types of relationships between concepts represent different semantics: e.g. concept relatedness, hyponymy or prerequisite.

The bottleneck of adaptive educational systems is the complexity of creating and updating the domain model. Identifying concepts and defining hundreds or even thousands of relationships between them is difficult and, at this scale, almost impossible for humans. The complexity of updating the domain model is especially visible in the case of student-generated content when considering Adaptive Web-based Learning 2.0.

In our work we aim at automatic semantics acquisition for adaptive courses. We propose a method for automatic discovery of domain relationships based on content processing (lexico-syntactic pattern analysis, resource-concept association processing, etc.) and graph algorithms (leveraging a notion of node centrality). We also focus on relationship type identification: concept relatedness, is-a relationship and prerequisite relationship. We evaluate our approach in the domain of learning programming by utilizing the automatically generated domain model to support course navigation with recommendation of learning objects. The method is being integrated into ALEF – Adaptive Learning Framework. We are currently conducting a long-term experiment in the Procedural programming course at the Faculty of Informatics and Information Technologies.
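To make the notion of node centrality concrete, here is a small Python sketch, assuming that concepts are linked whenever they are associated with the same learning object and that plain degree centrality is used. Both choices, and all names and data, are illustrative assumptions; the abstract does not specify which graph or centrality measure the method relies on.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(resources):
    """Link two concepts whenever they are associated with the same resource."""
    graph = defaultdict(set)
    for concepts in resources.values():
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neigh) / (n - 1) for node, neigh in graph.items()}


# Hypothetical learning objects with their associated concepts.
resources = {
    "lo1": ["loop", "variable"],
    "lo2": ["loop", "array"],
    "lo3": ["loop", "function", "variable"],
}
graph = cooccurrence_graph(resources)
centrality = degree_centrality(graph)
```

Here "loop" co-occurs with every other concept and is the most central node, which would make it a natural candidate for a hub concept in the domain model.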


Information Synchronization for Honey Bees Searching on the Web

Miroslav Soha
master study, supervised by Pavol Návrat

Abstract. Every day several thousand new pages and articles are placed on the Internet, so retrieving the desired information is becoming more and more complicated. Creating the optimal query is not the only problem: evaluating all the results returned by search engines can be very time consuming. This offers many opportunities for improvement.

The idea of using social insect models for optimization and search problems is not new. Although the metaphor of bees gathering food, applied to web search, cannot compete with other search engines in terms of performance and speed, its evaluation results and long-term observation capabilities are at least satisfactory.

This project explores the possibility of using bees that search for the requested information by focusing on only one attribute. All bees in the hive are divided into groups according to the attributes and values stated in the user query. Every group of bees searches for its own best result, and by combining the partial results we should be able to offer the overall best result to the user.

By dividing the user query into keywords, values and associated operators, we will be able to evaluate complex queries with simple bee behavior. If a bee evaluates only one keyword or value with its associated operator, there is no need to implement complex evaluation methods. This approach is based on the fact that an article which complies with the original query should comply with the set of derived terms as well. Propagation of sources in this model could be performed in separate dance rooms with subsequent synchronization of preferred sources, or we could utilize several “dance spots” within the same dance room. These possibilities are to be further investigated.
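The decomposition described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the query is taken to be a conjunction of (attribute, operator, value) conditions, each "group" simply filters articles by its single condition, and partial results are combined by set intersection. The operator table, function names and sample articles are hypothetical, and the bee dance and source propagation mechanics are left out.

```python
import operator

# Hypothetical operator table; each group needs only one simple test.
OPS = {
    "contains": lambda text, value: value.lower() in text.lower(),
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


def evaluate_condition(article, condition):
    """Evaluate a single (attribute, operator, value) condition."""
    attribute, op, value = condition
    return OPS[op](article[attribute], value)


def partial_result(articles, condition):
    """What one group of bees contributes: IDs satisfying its condition."""
    return {a["id"] for a in articles if evaluate_condition(a, condition)}


def combine(articles, conditions):
    """Combine partial results; intersection assumes a conjunctive query."""
    partial = [partial_result(articles, c) for c in conditions]
    return set.intersection(*partial) if partial else set()


articles = [
    {"id": 1, "text": "Honey bees and web search", "year": 2010},
    {"id": 2, "text": "Honey recipes", "year": 2008},
    {"id": 3, "text": "Web search engines", "year": 2010},
]
conditions = [
    ("text", "contains", "web"),
    ("year", ">=", 2010),
    ("text", "contains", "bees"),
]
result = combine(articles, conditions)
```

Each condition is trivial to evaluate on its own, which mirrors the claim that no complex evaluation method is needed per bee.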


The Influence of Group Creation on Collaboration

Ivan Srba
beginning master study, supervised by Mária Bieliková

Abstract. One of the most important aspects of Web 2.0 is collaboration. It is expressed mostly in the way users behave towards content published on web pages. While in the past users could only passively accept information, at present they can collaborate and actively participate in creating web page content. In spite of its many advantages, this trend also brings several important problems, namely that users follow different goals, have different knowledge of the problem area and mostly exist in different social contexts.

The problem of identifying social groups exists in many areas. One of them is e-learning, where we notice a gap between fast-developing social software and its application for the purposes of e-learning. Nowadays the main trends in e-learning are connected with the development of the second generation of the Web. Static learning materials have been replaced by dynamic social systems which support users’ communication and collaboration. For example, users are able to cooperate on solving problems and exercises.

It is important to identify an appropriate distribution of members among groups to achieve successful and effective collaboration. However, it is still an open question how this distribution influences the way of collaboration. Our idea is to propose a method for group creation which will consider social inputs. This method will allow us to create different types of groups (e.g. group members will be friends, experts or novices in the problem area). Then we will be able to observe how members in each group collaborate and cooperate. One of the most interesting results may be how members use collaborative tools. We will evaluate the proposed method in the e-learning system ALEF.


Using Implicit Feedback in Searching Interesting Content

Peter Študent
master study, supervised by Ján Suchal

Abstract. The amount of data on the Web is growing day by day, and it is therefore increasingly difficult to find content that is interesting for a specific user. This problem has a major impact especially on news portals, where content changes very rapidly and there is a high risk that the user misses an article that is important from his point of view.

The aim of our work is to analyze the possibility of using negative implicit feedback, based on user behavior, in the process of searching for interesting content on the Web and automatically generating recommendations for that content. The main advantage of implicit feedback over explicit feedback is that it does not burden users with additional operations. By introducing implicit negative feedback into the process of generating recommendations, we aim to increase the quality and speed of recommendation generation compared to traditional systems without this kind of feedback.

One of the negative feedback indicators we focus on most is the situation in which a user immediately leaves a page containing uninteresting content without reading the whole page. Currently we are analyzing different ways of incorporating negative feedback into the process of generating recommendations. One of our approaches is to identify groups of similar users based on negative feedback and make recommendations for these groups.
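A minimal Python sketch of this indicator follows. It assumes that a visit shorter than a fixed threshold counts as an immediate exit and that users are compared by the Jaccard similarity of the articles they left quickly. The 5-second threshold, the similarity measure, the names and the visit data are illustrative assumptions, not details taken from the project.

```python
from collections import defaultdict


def negative_feedback(visits, min_seconds=5.0):
    """Collect, per user, the articles left after less than min_seconds."""
    disliked = defaultdict(set)
    for user, article, seconds in visits:
        if seconds < min_seconds:
            disliked[user].add(article)
    return dict(disliked)


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical (user, article, seconds spent) records.
visits = [
    ("u1", "a1", 2.0), ("u1", "a2", 3.0), ("u1", "a3", 60.0),
    ("u2", "a1", 1.5), ("u2", "a2", 4.0),
    ("u3", "a3", 2.0),
]
disliked = negative_feedback(visits)
sim_u1_u2 = jaccard(disliked["u1"], disliked["u2"])
sim_u1_u3 = jaccard(disliked["u1"], disliked["u3"])
```

Users u1 and u2 abandoned the same articles and would fall into one group, while u3 shares no negative feedback with u1.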

The proposed ways of utilizing negative implicit feedback in creating recommendations are subject to experimental verification on a dataset from an existing web news portal.


Social-based Recommendations on the Web

Filip Sucháč
bachelor study, supervised by Michal Barla

Abstract. People in their everyday life often reproduce the behavior of other people to quickly adapt and orient themselves in new situations. They go where others are going and buy the food that others are buying. They behave socially and rely on the fact that someone before them has already solved their problem. This behavior is starting to appear on the Web as well. E-shops provide their customers with additional “social” information related to their products, such as what other products people buy with the current one or how much interest people show in the product. However, apart from e-shops, social behavior on the Web is supported only rarely.

The goal of our work is to support this social behavior on the whole Web by providing ubiquitous social recommendations. A user searching for some specific piece of information on the Web or just browsing the Web without any strict goal could benefit from being able to see the steps of other people who were accomplishing similar tasks on the Web in the past. It could make his search faster and more effective or bring him other useful pieces of information.

The recommendation will be realized as an overlay layer on top of the currently opened website, using JavaScript. This layer will contain aggregated information about other people who also came to the site, such as which sites they visited after this one, which sites they came from and how much time they spent on the site. It will be visualized in an easy-to-understand form such as a pie chart. A user following the recommended links could follow the path of other people and navigate easily to content of interest.
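The aggregation behind such an overlay could look roughly like the following Python sketch, which computes, for each site, the share of visitors who continued to each following site (the kind of data a pie chart would show). The session format, function name and example domains are assumptions for illustration; the overlay itself would be rendered in the browser and is not shown here.

```python
from collections import Counter, defaultdict


def next_site_shares(sessions):
    """For each site, the share of visits that continued to each next site."""
    transitions = defaultdict(Counter)
    for path in sessions:
        for current, following in zip(path, path[1:]):
            transitions[current][following] += 1
    shares = {}
    for site, counts in transitions.items():
        total = sum(counts.values())
        shares[site] = {nxt: n / total for nxt, n in counts.items()}
    return shares


# Hypothetical browsing sessions, each an ordered list of visited sites.
sessions = [
    ["news.example", "sport.example"],
    ["news.example", "sport.example"],
    ["news.example", "weather.example"],
    ["blog.example", "news.example", "sport.example"],
]
shares = next_site_shares(sessions)
```

For "news.example" this yields a 75 % / 25 % split between the two following sites, which maps directly onto pie chart slices.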

Our proposed solution is built on top of the adaptive proxy server developed at the Institute of Informatics and Software Engineering, STU in Bratislava, which provides us with all the data required for delivering personalized social recommendations, namely the web pages visited by users of the proxy server and the time they spent on them. At the same time, the proxy server allows us to deliver our recommendations to these users in an unobtrusive manner.


New Approaches to Log Mining and Applications to Collaborative Filtering

Ján Suchal
doctoral study, supervised by Pavol Návrat

Abstract. Our work focuses on two main goals: novel approaches to log mining and the potential usage of these implicit data in recommendation systems, especially collaborative filtering.

We present a method for mining sources and cascading graphs of viral visits from raw logs. Such information can be useful for detecting influencers and potential sources of viral traffic. We present the types of sites for which our method can be used and experiment on a real-world dataset containing the massively viral start of the foaf.sk service.

The second method focuses on mining negative interests of users from basic server logs in the domain of news articles. Such data can be used in addition to the positive interests that are normally used for generating recommendations.

Next we present a novel method for a linearly scalable nearest-neighbor-based collaborative recommender system using specially prepared fulltext indices. Evaluation is done on datasets from the largest Slovak news portal, sme.sk, and from the github.com recommendation contest. A comparison with a graph-based spreading activation recommendation method shows comparable relevance and superior scalability.
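The general idea of answering nearest-neighbor queries through an index can be sketched in Python as below. A plain in-memory inverted index (item to users) stands in for the specially prepared fulltext indices, whose actual structure the abstract does not describe; the overlap-count similarity, the function names and the reading histories are likewise assumptions for illustration.

```python
from collections import Counter, defaultdict


def build_index(history):
    """Inverted index: item -> set of users who viewed it."""
    index = defaultdict(set)
    for user, items in history.items():
        for item in items:
            index[item].add(user)
    return index


def nearest_neighbours(user, history, index, k=2):
    """Rank other users by the number of items shared with the given user."""
    overlap = Counter()
    for item in history[user]:
        for other in index[item]:
            if other != user:
                overlap[other] += 1
    return [u for u, _ in overlap.most_common(k)]


def recommend(user, history, index, k=2):
    """Score unseen items by how many of the k neighbours viewed them."""
    seen = history[user]
    scores = Counter()
    for neighbour in nearest_neighbours(user, history, index, k):
        for item in history[neighbour] - seen:
            scores[item] += 1
    return [item for item, _ in scores.most_common()]


# Hypothetical reading histories.
history = {
    "ann": {"a1", "a2", "a3"},
    "bob": {"a1", "a2", "a4"},
    "eva": {"a2", "a3", "a4"},
    "tom": {"a9"},
}
index = build_index(history)
recommended = recommend("ann", history, index)
```

Only users sharing at least one item with the query user are ever touched, which is the property that makes an index-based lookup attractive compared with comparing all user pairs.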


Bootstrapping a Socially Intelligent Tutoring Strategy

Jozef Tvarožek
doctoral study, supervised by Mária Bieliková

Abstract. Learning can be quite time consuming and unexciting. Even with some of today’s best computer-supported instructional technology, students engage in gaming behaviors associated with less learning. Time spent on task and motivation, as key factors of effective learning, need to be sustained, but contemporary tutoring systems seem to be failing in this respect; all too many students drop out due to low motivation. Can computer tutors build trust and respect with students that would motivate them to learn at all?

We present an approach for computer supported education in the form of a socially intelligent learning environment that is available online. It integrates problem solving and instructional materials into individual and group learning scenarios. A Wizard-of-Oz-driven computer tutor accompanies students to maintain their motivation within the learning environment. The agent can hold off-task conversations and guide students to appropriate learning opportunities. Its tutoring strategy is devised by a reinforcement learning control method that operates on socially motivated state and action spaces induced by the human wizard whose interface facilitates rapid prototyping of relevant states and taking appropriate actions. To make the learning algorithm feasible, states are grouped into equivalence classes according to wizard selected state features, and contextual and linguistic reflection is employed to adjust the immediate action to the current learner’s situation. The proposed approach is pedagogically and technologically robust and is well suited for home study, regular classroom use, as well as for both formative and summative assessment.
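The grouping of states into equivalence classes can be illustrated with a short Python sketch. Tabular Q-learning is used here only as a stand-in, since the abstract does not say which reinforcement learning control method is applied; the feature names, actions, reward and learning parameters are all hypothetical.

```python
from collections import defaultdict


def abstract_state(raw_state, selected_features):
    """Map a raw state to its equivalence class via the selected features."""
    return tuple(raw_state[f] for f in selected_features)


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """One tabular Q-learning update (a stand-in control method)."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


actions = ["encourage", "give_hint"]      # hypothetical tutor actions
features = ["mood", "on_task"]            # hypothetical wizard-selected features
q = defaultdict(float)

s1 = abstract_state({"mood": "bored", "on_task": False, "name": "A"}, features)
s2 = abstract_state({"mood": "bored", "on_task": False, "name": "B"}, features)
nxt = abstract_state({"mood": "happy", "on_task": True, "name": "A"}, features)

q_update(q, s1, "encourage", 1.0, nxt, actions)
```

Because both students map to the same abstract state, experience gathered with one immediately informs the action values used for the other, which is what keeps the state space small enough to learn from limited wizard sessions.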

The approach was evaluated by conducting experiments in the domain of middle school mathematics. To evaluate the robustness of assessments, students worked on tasks generated on-the-fly to discourage cheating, while the human wizard judged the answers. The feasibility study of the socially intelligent agent demonstrated that students who engaged with the agent liked the system more and attained higher learning gains. In a collaborative learning experiment, students solved problems in groups more efficiently when being socially motivated. Finally, the bootstrapping of the socially intelligent tutoring strategy was evaluated in simulated student scenarios. Evaluations suggest that our approach for using computers to support students in the learning process is technologically viable.


Exploratory Search in the Adaptive Social Semantic Web

Michal Tvarožek
doctoral study, supervised by Mária Bieliková

Abstract. Effective access to information on the Web, which has become vital to many users and to the whole society, is being hampered by information overload, unavailability of information, navigation issues and user diversity. We aim to facilitate the slow adoption of the Semantic Web by devising an enhanced faceted semantic browser with support for multi-paradigm exploration, personalized recommendation and adaptive view generation.

We employ a comprehensive multidisciplinary approach to facet visualization, generation and adaptation to provide users with advanced information exploration capabilities both in the Semantic Web and Legacy Web information spaces. We perform facet personalization via facet and restriction selection, ordering and annotation to address information overload and user guidance, adaptive view generation of personalized list-based, table-based and matrix-based result overviews, and interactive search result exploration via an incremental graph-based visualization of the information space to enable end-user grade exploration of semantic web content.

We have evaluated the proposed approach in multiple domains (job offers, scientific publications, digital images) with highly promising results based on several user studies performed with our browser prototypes, which confirmed the viability and practicality of our approach in terms of improved task times and user understanding of the explored information space.


Long-term User Model Prediction

Maroš Unčík
beginning master study, supervised by Mária Bieliková

Abstract. Teaching via the Web has been a growing trend for many years. In the first teaching systems, the aim was to build a clever teacher able to communicate with and advise the individual student. At present, the main work focuses on the learner exploring, designing and using adaptive systems as tools. Adaptive web systems track the specifics of the individual user: they collect and present the characteristics of the user in order to eliminate the following problems:

the presented information is inappropriate or uninteresting for the user,

the user does not know which way to proceed when viewing content,

the user gets lost in the content and forgets his/her original objectives.

Adaptive learning systems are built to give the learner greater control over and responsibility for all aspects of learning, and especially for the learner model. The core of such systems lies in user adaptation. Adaptive web systems used in the e-learning domain often represent the user characteristics by a user model. However, designing and building an appropriate user model is nontrivial, and a large amount of effort is therefore dedicated to it.

There are many trends in user modeling. A very promising one is the creation of long-term user models. The main idea of long-term user modeling is to use ubiquitous computing, which collects huge amounts of data. These data are about the user and belong to him/her. The collection of personal data offers potential benefits especially in the areas of knowledge distribution, recall and lifelong learning. The data are used to create a single model that characterizes the user throughout his/her life, and not only in relation to one specific domain.

In our work, we aim to examine life-long user modeling and to propose a method for it. We verify the method in the adaptive e-learning framework ALEF.


Automated Recognition of Author’s Writing Style in Blogs

Martin Virík
master study, supervised by Marián Šimko

Abstract. In the past decade it has become much easier to create web content, even for users with no experience with web technologies. Weblogs are the most typical and fastest-growing example of this trend. Thousands of bloggers use this hybrid genre to express their ideas, opinions and emotions, making blogs a rich space of topics and writing styles. Along with the increasing number of blogs, the number of efforts to improve blog search and recommendation algorithms has also grown. New requirements take the text quality of blog articles into account and consider individual writing style an important blog characteristic.

In our research we focus on linguistic characteristics of blog articles in order to recognize and classify the writing style of articles, blogs or even authors. We study the grammar, morphology and syntax of the selected language, Slovak, and the possibilities of computational linguistics for extracting the features of the document model necessary for further classification. So far we have partially developed methods for morphological and lightweight syntactic parsing focused on describing simple and compound sentences and discovering predicate candidates. We are also able to identify the word class of each word, and we believe that these methods will contribute to the recognition of the main article features.

We consider the distinction between informative and affective articles the most relevant for blogs, and for affective posts we have identified two further dimensions: reflection vs. story and emotional vs. rational. By crossing these two dimensions, we obtain four categories, which we will use for our classification. We have gathered a dataset of about 16 thousand blog posts for initial experiments and manually classified a small subset in a user study involving several participants.

In the current phase of our project we are experimenting with classifier algorithms and feature sets to develop a method giving the best possible results for our classification task.


Context-Aware Recommending

Dušan Zeleník
beginning doctoral study, supervised by Mária Bieliková

Abstract. Imagine an intelligent system able to suggest the perfect movie, one that will make you happy even if you are sad. Our aim is to propose a recommender system which considers your current situation (context). The current state of the user definitely affects his needs. Starting with simple attributes like time, location or weather, we should be able to recommend appropriate content. However, further analysis showed that the user and his needs are also affected by his mood, emotions and other attributes which are hardly visible to recommender systems. There are also other contexts, e.g. in news recommending it could be the set of articles already read or feedback provided earlier. Moreover, all of these contexts could be described using the same schema, and further relations between contexts could be discovered. If we are able to discover relations between contexts, we are able to guess some attributes, such as mood. For instance, suppose there is a strong connection between location, time and mood for a specific type of user. When he is at work on Friday after five, his mood is worse than at home on Saturday. To make him happy on Friday, we could recommend a joke which fits his context, and on Saturday we could recommend an article about science.

Discovering relations such as context-context, user-context and context-item is our main goal. So far, we have analyzed news reading logs and searched for patterns, mainly in the context of time. We used logs of reading sme.sk from March, April and May, restricted to users who on average read two or more articles per day. We identified around 1200 combinations of category and section into which authors placed articles. These combinations are used to identify similar articles. We aggregated the logs by hour of the day, part of the day (6 parts), day of the week and day of the month. For every user we identified several of these aggregated series, one for each combination of section and category. Since the distribution of articles across these combinations is not uniform, we calculated a weight for each combination to make the series comparable. Using this simple technique we discovered that many users follow stereotypical routines, so taking time into account is justified.
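The time-based aggregation can be sketched in Python as follows. Two points are assumptions made for illustration: the six parts of the day are taken to be consecutive 4-hour blocks, and the weight of a combination is taken to be the inverse of the number of articles published in it. The log records, combination names and article totals are made up; the abstract does not give the exact definitions used.

```python
from collections import Counter
from datetime import datetime


def time_buckets(ts):
    """Time contexts of one read; part_of_day assumes six 4-hour blocks."""
    return {
        "hour": ts.hour,
        "part_of_day": ts.hour // 4,
        "weekday": ts.weekday(),      # Monday is 0
        "day_of_month": ts.day,
    }


def aggregate(log, granularity):
    """Count reads per (user, section/category combination, time bucket)."""
    series = Counter()
    for user, combination, ts in log:
        series[(user, combination, time_buckets(ts)[granularity])] += 1
    return series


def weighted(series, combination_totals):
    """Normalize counts by the number of articles in each combination."""
    return {key: n / combination_totals[key[1]] for key, n in series.items()}


# Hypothetical log records: (user, combination, time of reading).
log = [
    ("u1", "sport/football", datetime(2010, 3, 1, 7, 30)),
    ("u1", "sport/football", datetime(2010, 3, 8, 7, 45)),
    ("u1", "economy/markets", datetime(2010, 3, 1, 20, 0)),
]
by_part = aggregate(log, "part_of_day")
by_weekday = aggregate(log, "weekday")
# Hypothetical numbers of articles published in each combination.
scores = weighted(by_part, {"sport/football": 200, "economy/markets": 50})
```

After weighting, a single read in a sparsely populated combination can count for more than two reads in a crowded one, which is the kind of correction needed when articles are unevenly distributed across combinations.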
