Students’ Research Works – Spring 2010

Search, Navigation and Visualization

Classification and Recommendation

User Modeling, Virtual Communities and Social Networks

Domain Modeling, Semantics Discovery and Annotations

Web Engineering, Semantic Web Services



Towards Social-based User Modeling

Michal Barla
doctoral study, supervised by Mária Bieliková

Abstract. Our work deals with enhancing individual-based user modeling and personalization in adaptive web-based systems with knowledge contained in social networks. A well-known problem of traditional user modeling is the cold-start problem: an adaptive system cannot provide any meaningful personalization to a new user, for whom it has no information stored in a user model yet. However, such a new user is probably the one who most deserves help and guidance from the system in order to become familiar with its interface, its functionality and the presented information space itself.
Our goal is to address the cold-start problem by leveraging social information (such as relationships between a new user and other, already present users, or the user’s membership in a virtual community). The approach is motivated by social behavior, which is inherent to most human beings. More precisely, the initial estimate of user characteristics is acquired as a weighted combination of the characteristics of other users connected to the new user by various types of relationships, which are acquired from various sources as well as derived from common navigational patterns of users. The advantage of this approach is that it produces a standard user model, which can be maintained by well-established user modeling approaches and easily used by classical personalization and adaptation techniques.
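The weighted combination described above can be sketched as follows. This is only an illustration under stated assumptions: user models are taken to be plain keyword-to-score dictionaries, each related user has a single relationship weight, and the function name `initial_user_model` is hypothetical. The abstract does not specify how the weights are derived or normalized.

```python
from collections import defaultdict


def initial_user_model(neighbour_models, relationship_weights):
    """Estimate a new user's keyword scores from the models of related users.

    neighbour_models: {user_id: {keyword: score}} (assumed representation)
    relationship_weights: {user_id: weight of the relationship to the new user}
    Returns a weighted average of the neighbours' keyword scores.
    """
    weight_sum = sum(relationship_weights.get(u, 0.0) for u in neighbour_models)
    if weight_sum == 0:
        return {}  # no usable relationships, the cold start remains

    totals = defaultdict(float)
    for user, model in neighbour_models.items():
        weight = relationship_weights.get(user, 0.0)
        for keyword, score in model.items():
            totals[keyword] += weight * score

    # Normalize so that the result stays on the same scale as the inputs.
    return {keyword: value / weight_sum for keyword, value in totals.items()}


neighbours = {
    "alice": {"web": 1.0},
    "bob": {"web": 0.0, "jazz": 1.0},
}
weights = {"alice": 3.0, "bob": 1.0}  # e.g. a close friend vs. a loose contact
estimate = initial_user_model(neighbours, weights)
```

Because the output has the same shape as an ordinary user model, it could be handed to existing personalization code unchanged, which is the advantage the abstract points out.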
We evaluate our method in the domain of information search, such as searching for documents in open information spaces like the Web or in closed but vast information spaces like digital libraries or electronic newspapers. We use a rather simple keyword-based (tag-based) user model representation built by various text analysis techniques applied to web pages visited by the user. Moreover, we acquire various relationships between tags by analyzing folksonomies and employing linguistic knowledge from WordNet in order to compare particular user characteristics or even whole user models. Our evaluation platform is an enhanced proxy server which, apart from logging the information gained by analyzing the traffic, can personalize either user requests (e.g., disambiguate the search keywords) or responses sent from a particular web server (e.g., annotate or re-rank search results).

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 43-44

QoS Aware Semantic Web Service Composition Approach Considering Pre/Postconditions

Peter Bartalos
doctoral study, supervised by Mária Bieliková

Abstract. The aim of web services is to provide functionality that can produce required data and effects in a widely accessible form that is easily discoverable by machines. Web service composition enhances the potential of single services: it combines several services to satisfy more complex needs. The web service composition problem is defined as follows: given a query describing the goal and providing some inputs, design a composite service (depicting the data and control flow) from the available services such that, when executed, it achieves the required goal.

In recent years, research on service composition has tended to focus on issues related to QoS (Quality of Service), pre-/post-conditions, user preferences (e.g., soft constraints) and service selection considering complex dependencies between services. Our work focuses on the effectiveness and scalability of service composition that is aware of QoS and pre-/post-conditions. In the context of QoS, we introduce a new process based on restricting the set of services that must be considered when looking for a solution, which shortens the composition time. In the context of pre-/post-conditions, we introduce a novel approach that handles more expressive conditions, including value restrictions. Another important issue is the dynamic character of the environment in which the composition system operates. In the real world, web services may be added to or removed from the service registry. The composition system must react flexibly to these changes and compose services based on the current situation, so updates of the service registry must be fast. In our approach, adding and removing services takes a relatively short time compared to the composition time. Hence, the updates do not delay the system’s response to a composition request.

To sum up, our work deals with QoS-aware semantic web service composition considering pre-/post-conditions and operating in a dynamic environment. In this context we deal with the performance and scalability of the composition.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 73-74

Connection of Social Networks and User’s Real Life Interests to Engage People in Helping Others

Anton Benčič
bachelor study, supervised by Mária Bieliková

Abstract. Our project aims at giving people the opportunity to help others and the environment by donating or lending their own assets. We use social networks to support this activity by spreading success stories to friends, followers or other people connected through social networks. We have chosen a map as the main navigation control that dominates the whole user interface. The map lets users view other people and their contributions, assets that they offer or request, and collections organized by charities. The positioning of these elements, as well as the final look of the map, is controlled by IntelliView, our positioning decision algorithm. IntelliView ensures that region-specific as well as user-relevant content is prioritized in the final output.

Apart from the static view, we provide a dynamic view of activities that introduces the dimension of time. The activities view works on top of IntelliView: every time an action is evaluated as relevant to the active user, it is shown on the map. An animation is displayed through a transition in which the opacity of static elements occupying the target area is significantly reduced while the animation appears on the map. The animation is then played on the map for a certain period of time, after which it is cleared by a reverse opacity transition, so that the static elements regain their former state.

These animations allow users to see the recent activity of their friends, people they follow or any other user they are interested in. Users can also watch the progress of a specific item, and the system supports showing activity in real time as well. All of this was designed to keep users in touch, support their own activity and thus reduce the risk of losing their interest over time.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 13-14

Enhancing the Web Experience by Freely Available Metadata

Peter Bugáň
bachelor study, supervised by Michal Barla

Abstract. Much of the information available on the Internet can be easily understood by people, but not at all by the computers that present the actual content of web pages. The problem is often solved by adding machine-understandable metadata. In addition to “lightweight” metadata in the form of the ‘meta’ HTML tag, which is intended mainly for web search engines, there is also more formal metadata defined within the Semantic Web initiative. The vision is a Web in which the meaning (semantics) of information and services is shared across web applications. Metadata contained within the Semantic Web form the so-called Linking Open Data cloud, which is growing every year, although slowly. Only interesting Semantic Web applications can persuade users to participate more actively in the Semantic Web initiative, which would hopefully accelerate the growth of available metadata until a self-sustaining threshold is reached.

When we read articles on the web, e.g., in online newspapers, we are often unable to understand them quickly (especially longer sections) and need to re-read them several times. We underline the most relevant keywords on web pages and annotate them with automatically generated content. The underlined keywords should help readers quickly recognize the main words of an article, while the content of the annotations should allow for better understanding of the underlined word. The content of annotations is tailored to the particular article, so if an article is about sport, the annotation will contain information relevant to sport. For instance, for the word ‘Lisbon’ in a political article, the annotations will contain information about the mayor of Lisbon; in a sport article, they will contain information about sport events which were or will be hosted in Lisbon, or about athletes coming from this city.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 57-58

Search Using Personal Profiles

Peter Cích
master study, supervised by Pavol Návrat

Abstract. For users of the Internet, as well as of other extensive networks that hold vast amounts of different kinds of information, it is becoming increasingly difficult to find the exact piece of information they need. The resulting information overload is a new, quickly spreading problem. Personalization of the web provides one way of facing it.

This project analyses several possible approaches to creating a personal user profile on the basis of which search results may be reordered, i.e., personalized. It describes the basic principles of these approaches and the use of the resulting user profile. In our proposed solution, both the user profile and the personalization of search results are built on the web pages the user has bookmarked. These pages can be easily distinguished from other pages because the user has voluntarily marked and saved them. We can therefore say that bookmarks represent the user’s long-term interests, and based on this information we can predict with higher probability what the user could be interested in in the near future.

Long-term interests are modeled as a set of bookmarks, i.e., a list of web pages that are presumably closer to the user than others. We also assume that if the user does not feel the need to add a page to his bookmarks, the page represents only a temporary interest rather than a long-term one.

Our personalization of search consists of modifying the returned search results. The standard returned list of pages matching the query is reordered by similarity to the pages stored as the user’s bookmarks, which form his profile. Pages similar to the profile are pushed to higher positions in the result list, which personalizes the results.
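The reordering step can be sketched as follows. The abstract does not say how similarity between a result page and the bookmarks is computed, so this sketch assumes a simple bag-of-words profile and cosine similarity. The function names `term_vector`, `cosine` and `rerank` are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words vector of a text (assumed page representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(results, bookmarks):
    """Order result texts by similarity to a profile built from bookmarks."""
    profile = Counter()
    for bookmark_text in bookmarks:
        profile.update(term_vector(bookmark_text))
    # sorted() is stable, so equally similar results keep the engine's order.
    return sorted(results,
                  key=lambda text: cosine(term_vector(text), profile),
                  reverse=True)


results = ["cooking pasta recipes", "python web framework"]
bookmarks = ["python tutorial", "web framework django"]
reranked = rerank(results, bookmarks)
```

Here the second result shares terms with the bookmarks and therefore moves to the top, while results unrelated to the profile keep their original relative order.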

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 77-78

Website Navigation Adaptation Based on Behavior of Users

Michal Holub
master study, supervised by Mária Bieliková

Abstract. Nowadays, web portals contain a large amount of information meant for various groups of visitors. To support effective navigation within the content, a website needs to “know” its users in order to provide personalized content. When surfing the web, users leave digital footprints that can be used for this personalization. From web usage data we can determine a user’s interests. Users who behave similarly can benefit from mutual recommendation of pages.

We propose a method for adaptive navigation support and link recommendation based on the analysis of users’ navigational patterns and behavior on web pages. We also mine the portal to extract interesting information, which is presented in a new fashion. Web pages of the selected portal are enriched with new sections containing links that might interest the user.

Each user takes a different approach when browsing a web portal. One user may follow links to a certain depth and then backtrack if he does not find the desired information. Another user may use a breadth-first approach, trying to visit all links from the menu and returning immediately to the main page. We discover four basic navigational patterns in the clickstreams of all users and group the users according to their prevailing patterns. Within each group, we compute the similarity of users more precisely by applying cosine similarity to their clickstreams. For a particular user, we recommend links from his top N most similar users.
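A minimal sketch of the last two steps, cosine similarity on clickstreams and recommendation from the top N similar users, is shown below. It assumes that a clickstream is a list of visited page paths and that pages are compared by visit counts. The grouping by navigational pattern is left out, and all names and paths are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two page-visit count vectors."""
    dot = sum(count * b[page] for page, count in a.items() if page in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n_similar(target, clickstreams, n):
    """Return the n users whose clickstreams are most similar to target's."""
    target_vector = Counter(clickstreams[target])
    scored = [(cosine_similarity(target_vector, Counter(pages)), user)
              for user, pages in clickstreams.items() if user != target]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [user for _, user in scored[:n]]


def recommend_links(target, clickstreams, n):
    """Recommend pages seen by the top-n similar users but not by target."""
    seen = set(clickstreams[target])
    recommended = []
    for user in top_n_similar(target, clickstreams, n):
        for page in clickstreams[user]:
            if page not in seen and page not in recommended:
                recommended.append(page)
    return recommended


clickstreams = {
    "u1": ["/home", "/news", "/events"],
    "u2": ["/home", "/news", "/courses"],
    "u3": ["/staff", "/contact"],
}
links = recommend_links("u1", clickstreams, n=1)
```

In the method described above, the similarity would be computed only within a group of users sharing the same prevailing navigational pattern. The sketch compares all users for brevity.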

The way a user behaves on a web page reflects his interest in the page. We monitor the actions he performs, which include time spent, scrolling events and copying text to the clipboard. Comparing these actions with the actions of other visitors to the same page indicates the degree of the user’s interest. We do the comparison using a collaborative filtering method. This enables us not only to compare users but also to predict a user’s interest in a web page he has not visited yet. For the prediction, we take the computed interest of his top N most similar users (according to the clickstream comparison) who have already visited the web page.
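The prediction step might look roughly like the sketch below. The abstract only says that the computed interest of the top N similar users is taken. The similarity-weighted average used here is an assumption, as are the data layout and the name `predict_interest`.

```python
def predict_interest(page, similar_users, interest):
    """Predict a user's interest in a page he has not visited yet.

    similar_users: list of (user_id, similarity) for the top N similar users
    interest: {user_id: {page: computed interest score}}
    Uses a similarity-weighted average (an assumption, not the source's
    stated formula) over those similar users who visited the page.
    Returns None when none of them visited it.
    """
    weighted_sum = 0.0
    similarity_sum = 0.0
    for user, similarity in similar_users:
        score = interest.get(user, {}).get(page)
        if score is None:
            continue  # this neighbour has not visited the page
        weighted_sum += similarity * score
        similarity_sum += similarity
    if similarity_sum == 0:
        return None
    return weighted_sum / similarity_sum


similar = [("a", 0.5), ("b", 0.5), ("c", 1.0)]
interest = {"a": {"/p": 1.0}, "b": {"/p": 0.0}, "c": {}}
predicted = predict_interest("/p", similar, interest)
```

Neighbours who never visited the page are skipped rather than counted as zero interest, so a missing visit does not lower the prediction.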

We apply the proposed method to our faculty web portal. First we analyze the portal’s web pages, and then for each visitor we create a personalized section of links. These can be links to interesting school events or pages with news that the user might like.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 27-28

Virtual Community Detection in Vast Information Spaces

Marián Hönsch
master study, supervised by Michal Barla

Abstract. Collaborative filtering is present in current web-based systems in many forms. Initially, the approaches were mostly either item-based or user-based, but over time many hybrid approaches combining techniques from multiple disciplines emerged. However, the basic idea has always remained the same: use the past experiences of users to benefit an individual. It is like walking in the woods and taking the paths that others took before us. A significant enhancement of the basic idea is to use not the past experiences of all users, but only those of users similar to the user for whom the recommendations are intended.

Based on this, we can start to detect virtual communities of users. As collaborative filtering is always tailored to a specific domain, we chose to focus on recommending news articles, a highly dynamic domain with frequent changes and specific user behavior. Nobody reads yesterday’s newspaper. By picking certain articles, we often indirectly express our opinions and preferences. In our work, we account for these fluctuating, time-dependent changes by incorporating the influence of volatile communities on recommendations. We use keyword-based layered user models, where layers represent different attributes (e.g., articles from a week ago, long-term preferences). For the semantics we use a latent semantic model or WordNet; the actual keywords are extracted from the plain text of articles. We assume that article categories are represented by different keywords, so a model can be partitioned based on clusters of keywords corresponding to those categories. The novel approach is to create virtual communities based on these clusters. A user can belong to several communities on the same hierarchical level. We expect recommendations coming from such communities to be more accurate than those obtained by grouping users based on their whole user profiles.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 45-46

Text Understanding and Analysis

Martin Jačala
master study, supervised by Jozef Tvarožek

Abstract. As the amount of available online content increases over time, we need methods for effective visualization, exploration and navigation of this data. A large portion of the published content consists of unstructured text, such as newswire articles, personal blogs, online discussions and many other documents. In these documents, we can find interesting information about famous or ordinary people, various events, organizations and the relationships between them. Many of the relationships span multiple articles and are thus not visible after reading just a few of them. However, by automatically processing such articles, we can trace these relationships across multiple documents. The discovered relationships help us to see all the entities interacting with each other; moreover, we can also gather interesting statistics from the source documents, such as polarity, topic, type of document, etc.

In our work, we primarily focus on the problem of text understanding and entity identification in open web content. The method not only extracts an entity from the source text, but can also disambiguate and uniquely identify entities with a common or similar name. For the disambiguation process, we are exploring usable sources of background knowledge, such as DBpedia, which provides semantic information about a large number of entities, or the Yahoo! Semantically Annotated Snapshot of English Wikipedia, which offers a large sample of semantically and morphologically annotated text useful for training various natural language processing algorithms and for providing the desired background knowledge. As a result, we would like to create an accessible user interface on top of the text processing mechanism. We hope to provide useful and interesting content for many web users; in turn, the interaction of users with the web interface can help us to evaluate and further refine the process.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 59-60

Personalized Recommendation of Interesting Texts

Michal Kompan
master study, supervised by Mária Bieliková

Abstract. The amount of data on the web is a serious problem for the common user. The existence of information is of little value when no one can access or find it in an acceptable time. News portals (nytimes.com, reuters.com, etc.) are among the most relevant sources of information on the web. Most users prefer large, renowned news meta-portals. These contain thousands of news items added daily from all over the world, and no user can go through them all in a fast and comfortable way. The only way to help the user is to filter the large amount of information and reduce it to an acceptable amount.

The main problem in content-based filtering is an effective and sufficiently expressive representation of items (or articles). This is often done by means of text summarization or keyword extraction. These techniques are commonly used in English-based systems and cannot be easily applied to other languages. Keyword extraction and summarization bring better results than other methods but are more time-consuming. These methods cannot represent non-text documents without modification.

Our method for similarity computation compresses the information value of an article into a short vector, which is used for fast similarity computation over a specific time window of articles. The vector represents the article effectively, so there is no need to store whole articles. These vectors can then be easily used for similarity computations or in special structures for recommendation, e.g., binary trees.

Fast similarity estimation plays a critical role in rapidly changing domains such as news portals. Because the information value of news degrades quickly, it is necessary to process a new article as fast as possible and start recommending it. The recommendation method uses server logs to collect user preferences implicitly (the user model). Based on this, the recommended list consists of two parts (“before recommended and visited” and “before not recommended and visited”), whose ratio is computed dynamically to adapt to the user’s current preferences.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 29-30

Leveraging Social Networks in Navigation Recommendation

Tomáš Kramár
master study, supervised by Michal Barla

Abstract. Finding a relevant document based on a few keywords is often difficult. Many keywords are ambiguous; their meaning varies from context to context and from person to person. Some words are ambiguous by nature, e.g., a coach might be a bus or a person. Other words became ambiguous only after being adopted for a particular purpose, such as English nouns which, apart from their natural meaning, also name a piece of software, a music band or some other entity. There are also words whose meaning depends on the person using them; clearly, architecture means different things to a processor designer than to an architect. Based on these observations, we might conclude that using short queries is not a good idea. Unfortunately, this is how we search.

Search engines work like databases: they crawl and index documents and respond to queries with a list of results. The order of documents depends on the adopted retrieval model and, more specifically, on the implemented relevance function. The most widely used search engine today, Google, uses a PageRank relevance function: the more a document is linked to, the more likely it is to appear at the top positions. This ordering, however, is not always compatible with the user’s information needs. A programmer searching for cucumber is probably not looking for a salad, and a programmer searching for c strings probably does not want to buy underwear.

We tackle the problem by implicitly inferring the context and modifying the user’s query to include it. The original query is enriched with additional keywords which capture the user’s focus. In the case of the programmer above, the resulting query might be cucumber testing, which yields much more valuable and relevant documents than the original query. We select the additional keywords according to the social network, or rather the virtual community, that the user belongs to within this network. The search thus becomes personalized: for a user from another community, the same query might become cucumber salad.
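The query enrichment can be illustrated with the sketch below. It assumes that each community is described by a table of keywords that co-occur with a query term in documents its members visit. That table, its weights and the name `enrich_query` are hypothetical, since the abstract does not describe how the community context is stored.

```python
def enrich_query(query, community_cooccurrence, extra=1):
    """Append the community's strongest context keywords to a query.

    community_cooccurrence: {query term: {context keyword: weight}},
    an assumed per-community structure.
    """
    terms = query.lower().split()
    scores = {}
    for term in terms:
        for keyword, weight in community_cooccurrence.get(term, {}).items():
            if keyword not in terms:
                scores[keyword] = scores.get(keyword, 0.0) + weight
    best = sorted(scores, key=lambda keyword: scores[keyword], reverse=True)
    return " ".join(terms + best[:extra])


# Illustrative weights only, not measured data.
programmers = {"cucumber": {"testing": 0.9, "salad": 0.1}}
cooks = {"cucumber": {"salad": 0.8, "testing": 0.05}}

query_for_programmer = enrich_query("cucumber", programmers)
query_for_cook = enrich_query("cucumber", cooks)
```

The same input query yields a different enriched query for each community, which mirrors the cucumber example in the text. A query with no known context is returned unchanged.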

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 47-48

Interactive Photo Retrieval Based on Semi-Automatic Annotation Using Visual Content and Folksonomies

Eduard Kuric
master study, supervised by Mária Bieliková

Abstract. Nowadays, web photo management systems provide users with features such as organizing, sharing and searching photos. With the increasing popularity of digital and mobile phone cameras, there is a growing need for quick and exact search. Content-based indexing of photos is more difficult than indexing of text documents because photos do not contain units like words. Searching is based on annotations and semantic keywords that are entered by a user and associated with photos. However, creating annotations manually is very time-consuming and the results are often subjective. Therefore, semi-automatic photo annotation is a most challenging task.

Traditional approaches to semi-automatic annotation are based on combining keyword-based and content-based photo retrieval. The user enters a query consisting of a target photo and keywords, typically only a caption. The aim is to find the most similar photos and to extract related keywords. After the retrieval process, the user selects the most relevant keywords and associates them with the target photo. The process usually takes place in three steps. First, a keyword-based technique is used to obtain a list of candidate photos that are also associated with the input caption. Second, a content-based photo retrieval technique is used to assemble a ranked list of visually related photos. Finally, a method is used to combine the ranked list into an annotation list which represents keyword proposals. These solutions employ global low-level features like color and texture for comparing the content of photos. However, the user query can include a full photo or just a part of the whole photo, which we call an object-of-interest.

In our work, we propose a novel method for annotating photos which extends existing solutions by searching for similar photos primarily according to objects-of-interest. Often, these objects form the foreground of a photo, which is less dominant than the background. Therefore, traditional approaches to content-based photo retrieval can ignore (rate low) foregrounds despite the fact that they often represent the most important elements of the photo. We use interactive photo segmentation to determine objects-of-interest. To capture local photo information, which is essential in object retrieval, we use scale-invariant features in combination with a hash-based method known as locality-sensitive hashing. Our solution does not require an input caption, but in the case of insufficient results it allows the input query to be extended with keywords. Thus, our solution allows objects in the photo to be identified, and by using relevance feedback the user can improve the performance of our content-based photo retrieval.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 7-8

Web Content Preprocessing for Clustering

Tomáš Kuzár
doctoral study, supervised by Pavol Návrat

Abstract. There is a huge amount of unstructured content (e.g., blogs, comments) available on the Internet. Unstructured data needs to be put into a structured representation before machine learning techniques can be applied. The transformation of unstructured information into a structured representation is called preprocessing.

We focus on the preprocessing phase of web content clustering. We evaluate the impact of different data preprocessing methods on the success of blog clustering. We found that applying various text manipulation techniques in preprocessing can significantly improve the quality of clusters. The quality of clusters is measured by a traditional clustering metric, the F-measure.

We use several methods in the term extraction phase: lemmatization, taxonomy-based term extraction, building of lexical classes and consideration of usage information. Taxonomy-based term extraction searches for Eurovoc terms in articles and, after an exact match, replaces the term in the article with a more general term from the Eurovoc hierarchy. The Slovak language uses a high number of suffixes. We created a method, which can be considered a basic stemmer for Slovak, for grouping lexically similar terms into one term. We calculate lexical similarity for terms longer than three characters. If two terms are equal on more than 75% of the term length, they are mapped to the same lexical term. During the information gathering process, we downloaded not only articles but also the number of discussion posts for each article. We suppose that the number of discussion posts can improve the quality of clusters, since some topics have a higher average discussion count than others. We added the discussion count to the article-topic matrix.
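The lexical grouping rule can be sketched as follows. The abstract does not define exactly how "equal on more than 75% of term length" is measured. This sketch assumes that the length of the common prefix is divided by the length of the longer term, which fits a suffix-heavy language. All function names are hypothetical.

```python
def common_prefix_length(a, b):
    """Number of leading characters shared by both terms."""
    length = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        length += 1
    return length


def lexically_similar(a, b, threshold=0.75, min_length=3):
    """True if both terms exceed min_length and share a long enough prefix.

    Assumption: the shared prefix is compared to the longer term's length.
    """
    if len(a) <= min_length or len(b) <= min_length:
        return False
    return common_prefix_length(a, b) / max(len(a), len(b)) > threshold


def group_terms(terms):
    """Map every term to the first lexically similar term seen before it."""
    representatives = []
    mapping = {}
    for term in terms:
        for representative in representatives:
            if lexically_similar(term, representative):
                mapping[term] = representative
                break
        else:
            # No similar representative found: the term starts its own class.
            representatives.append(term)
            mapping[term] = term
    return mapping


mapping = group_terms(["univerzita", "univerzity", "dom", "domy"])
```

In this example the two inflected forms of "univerzita" share nine of ten characters and fall into one lexical class, while "dom" is too short to be compared at all.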

Our experiments consisted of several steps: term extraction, LDA model-based term selection, clustering and F-measure-based evaluation. We used several combinations of term extraction methods and found that only lemmatization always enhanced the quality of clusters. As future work, we want to focus on other term extraction methods: term extraction based on named entity recognition, and term extraction enriched by extensive knowledge of the text source.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 79-80

Recommendation and Collaboration through Implicit Identification in Social Context

Martin Labaj
master study, supervised by Mária Bieliková

Abstract. In the field of e-learning, identifying difficult and/or interesting parts of a learning text can be a useful feature for various tasks such as rewriting the text, showing where to focus or offering help. However, methods that extract this information by interacting directly with the user, for example by asking him to rate his first-read comprehension, can distract from the learning process and require users to participate voluntarily and answer truthfully.

In our work, we track implicit feedback and interest indicators, including user scrolling, i.e., to which portion of the text (or any other scrollable content) the user has scrolled and how much time he spent there (the concept of read wear). Using a statistical approach and taking intersections and overlays of this data collected from many users, we can determine which part is the most time-consuming and therefore interesting and/or difficult. With enough users, details about various parts of the content can be obtained very precisely, even down to single words. Other important data consist of mouse clicks (click heatmaps) and mouse movement (flowmaps).
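The overlay of reading times from many users can be sketched as a simple aggregation. This is an illustration only: it assumes that tracking already yields (fragment, seconds) pairs per user, and it sums the times, whereas the statistical approach mentioned above is not specified in the abstract. The names `read_wear` and `most_time_consuming` are hypothetical.

```python
from collections import defaultdict


def read_wear(sessions):
    """Sum the time spent on each fragment across all users' sessions.

    sessions: iterable of per-user lists of (fragment_id, seconds) pairs,
    an assumed output of the scroll tracking.
    """
    totals = defaultdict(float)
    for session in sessions:
        for fragment, seconds in session:
            totals[fragment] += seconds
    return dict(totals)


def most_time_consuming(sessions):
    """Return the fragment with the highest accumulated reading time."""
    totals = read_wear(sessions)
    return max(totals, key=totals.get) if totals else None


sessions = [
    [("intro", 10.0), ("recursion", 60.0)],
    [("intro", 5.0), ("recursion", 45.0), ("summary", 8.0)],
]
hardest = most_time_consuming(sessions)
```

Finer fragments, down to single words, would only change the fragment identifiers and not the aggregation itself.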

As in any method dealing with time-based user tracking, there is a possibility that the user is pursuing other activities during the evaluated time periods. We try to avoid this by using a low-cost webcam and two-level physical user tracking: face tracking, which detects the presence of the user at the computer, and eye tracking (where possible), which evaluates the user’s gaze. These methods allow us to leave out time periods when the user is not directly using the computer or, respectively, when he is at the computer but working with a different part of the screen. Gaze detection also increases the precision of fragment identification.

Subsequently, readily available data about a user’s active fragments can be used not only for identification and recommendation, but also in a social context. By augmenting the displayed content with an indication of the active fragments of other users, users see how they are doing compared to others. We also believe we can increase user collaboration by providing messaging with an indication of where each user is currently working in the same content. When a user contacts friends who are learning the same part, he does not distract them from their current study and also obtains better advice.

While the main evaluation is through the ALEF learning system, one of the possibilities we are considering is an implementation through the Adaptive Proxy, which would also readily bring this concept to the open space of the Web.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 31-32

Spatial and Time Navigation in Multimedia

Michal Lohnický
master study, supervised by Mária Bieliková

Abstract. Almost everyone takes a camera on holidays or trips. But what is the purpose of this ritual? It is simple. A user wants to literally save his experience, and not just one experience but a collection of experiences which together create a coherent story. Most people are not alone on their holidays, so their friends also create further experiences connected to the same holidays. We must not forget that having a great time is essentially about evoking emotions; therefore, a proper visualization of photo albums should communicate these emotions.

Our work is aimed at augmenting user experience while browsing photos and other multimedia. This is mostly carried out by extending standard approaches with the use of two attributes of a photograph – geographical location and time stamp. The geographical location is one of the most valuable aspects of a photograph. The information about where the photo was taken says a lot about it even before the photo is viewed, because the location is a kind of connection between the photo and events in the area. For example, we can easily infer the character of a vacation (weather, beach, hiking, etc.) from the location.

The other important attribute of a photo is the time stamp. Digital photography has existed for nearly twenty years and the time stamp is becoming more and more essential in the archiving and browsing of photos. The time stamp is the first attribute by which photos are ordered in photo albums, but navigation based on it is still insufficient.

The main goal of our work is to provide innovative navigation and browsing in photo albums which supports storytelling and the collaborative creation of digitally saved experience. By combining this with a proper amount of photo analysis and informational augmentation, we can create various views of a complex collection of photos in photo albums. This style of browsing can be used for sharing photos in much higher quality, for finding photos which users miss in their photo collections, for viewing places where they intend to go in various time periods, etc. It is also usable in the commercial sphere – in travel agencies, botanic monitoring, etc. When we add the direction in which a photo was taken to the location, we can create an ideal presentation tool for real-estate companies.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 9-10

Does HTML Tags Improve Results of ATR Algorithms?

Milan Lučanský
bachelor study, supervised by Marián Šimko

Abstract. Our work focuses on mining relevant information from web sites. Information that can be retrieved from a web page can be divided into two groups. Information in the first group is stored in plain text. A visitor can see and read this plain text on the web page. The second group of information is hidden from visitors. It is stored in the source code of the page. This group is called meta-data as it is “data about data” on a web page. For both groups there are many methods for extracting the information.

The main goal of text mining is extracting relevant information from text. There exist several more or less successful methods for extracting keywords from plain text. Meta-data extraction from the source code is used by search engines. They evaluate different HTML tags from the web pages and give specific weight to terms marked by those tags. It seems promising to combine these techniques to create one measure for weighting extracted keywords. It should be possible to assign one value to each word considered to be relevant.

Our aim is to propose a method which combines known algorithms from web content mining (text mining) with meta-data gathered from source page tags. The method can crawl a specific web site, but not pages outside the main domain. It uses algorithms like C-value and TF-IDF to retrieve terms from the main text of a web page. It also uses other algorithms to retrieve the anchor text, the title of the web page, etc. from the source code. Finally, we use the proposed merging algorithm, which joins both sets of words and gives a weight to every word. In this way we create an index of pages, which can be used as a base for further research (e.g. for building a search engine).
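
The combination of text-based scores with tag-based weights can be illustrated with a minimal Python sketch. It is not the merging algorithm proposed in the work, which the abstract does not specify. The tag weights in TAG_WEIGHTS, the additive merge rule and the function names are assumptions made only for illustration, and C-value is left out for brevity.

```python
import math
from collections import Counter

# Hypothetical tag weights; the abstract does not give concrete values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "a": 1.5}

def tf_idf(docs):
    """Return one {term: tf-idf} dict per tokenized, non-empty document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return scores

def merge(text_scores, tagged_terms):
    """Add the weight of each tag a term occurs in to its plain-text score (assumed rule)."""
    merged = dict(text_scores)
    for tag, terms in tagged_terms.items():
        for term in terms:
            merged[term] = merged.get(term, 0.0) + TAG_WEIGHTS.get(tag, 0.0)
    return merged

docs = [["web", "mining", "web"], ["text", "mining"]]
scores = tf_idf(docs)
merged = merge(scores[0], {"title": ["web"], "a": ["crawler"]})
```

A term that appears only in a tag (here "crawler") still receives a weight, so both sets of words end up in one index.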


Improving Query Suggestion Capabilities Using Web Search Results

Ladislav Martinský
master study, supervised by Pavol Návrat

Abstract. Correct query formulation for web search is a very important prerequisite for successful retrieval of the desired results. A powerful tool for this purpose, used in most popular search engines, is query suggestion. The capabilities of these existing solutions are, however, limited to terms with high popularity. Our approach provides an alternative way to generate the suggested words, based on real-time analysis of search results. Its main advantage is the presence of suggestions for any meaningful query for which the user would get one or more relevant search results, not only for popular ones. Search results present small units of information about a particular query or context. These units consist of words with different levels of importance for the user.

Four main characteristics were chosen to distinguish which of these words are applicable as suggested words to enrich the query. The location in which a word appeared in a result (title, description, URL address, etc.) is measured by the Locality characteristic. Each part of a result has its own importance; the title, for example, is more important than the description. Uniqueness represents the number of different results in which a particular word appeared. This characteristic can help to distinguish different contexts. Popularity is the percentage of occurrences of the word relative to all other words. The last characteristic, Distance, is the count of words between the examined word and the query in one particular area of a result. The evaluation scale for every characteristic was set to 1-10, so one potential suggested word can have a maximum score of 40.
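
The scoring scheme above can be sketched in Python as follows. The sketch assumes that each characteristic has already been mapped to the 1-10 scale, since the abstract does not say how raw values are converted. The class name WordScores and the function rank are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class WordScores:
    """Four characteristics of a candidate word, each already on the 1-10 scale."""
    locality: int
    uniqueness: int
    popularity: int
    distance: int

    def total(self) -> int:
        """Sum of the four characteristics; the maximum is 40."""
        parts = (self.locality, self.uniqueness, self.popularity, self.distance)
        if not all(1 <= p <= 10 for p in parts):
            raise ValueError("each characteristic must be on the 1-10 scale")
        return sum(parts)

def rank(candidates):
    """Order candidate words by total score, best first."""
    return sorted(candidates, key=lambda word: candidates[word].total(), reverse=True)

candidates = {
    "hotel": WordScores(8, 6, 7, 9),
    "cheap": WordScores(3, 2, 4, 5),
}
ordered = rank(candidates)
```

Summing the four values with equal weight follows the stated maximum of 40; any other weighting would be a different design choice.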

Experimental results of this core function of the application have shown good potential to choose relevant and helpful suggested words for the user. However, a query can have many meanings for different users. Presenting suggested words in general is very helpful, but the amount of help depends on each user’s particular needs. This problem is addressed by the second part of the work – personalization. Every action of the user towards the application (click on a suggested word, click on a result, etc.) represents his own needs and requirements and is stored in a personal profile. This profile information is important for a fifth, added characteristic, Personalization, representing the importance of a particular word according to the user’s past actions. The final result is the ability to promote words important for a particular user, which can potentially offer more relevant help to him or her.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 11-12

Towards Semi-automated Design of Enterprise Integration Solutions

Pavol Mederly
doctoral study, supervised by Pavol Návrat

Abstract. Our aim is to reduce the effort needed to create enterprise integration solutions by fully or partially automating the process of technical design of such solutions. This process includes choosing the overall architecture style, selecting an appropriate integration platform, and designing the solution components so that all the functional and non-functional requirements are met. Functional requirements are typically given in the form of an abstract description of the workflow and data transformations the solution has to implement. On the other hand, typical non-functional requirements cover areas of availability, reliability, performance, security, logging and auditing, and maintainability of the solution, as well as ensuring compatibility with message formats, protocols and application programming interfaces used by individual systems being integrated.

We work primarily with integration solutions that utilize a standardized message-oriented middleware infrastructure for communication between their components. So far we have developed two methods that create designs of such solutions, taking into account a subset of the non-functional requirements categories described above.

The first method uses an action-based planning approach, representing properties of message flows present in the integration solution as the planner’s states of the world and potential solution components as planning operators. It encodes an integration problem as an action-based planning problem, executes a planner, and then interprets the plan found by the planner as a description of the integration solution.

The second method achieves similar goals using constraint programming. It encodes an integration problem as a Constraint Satisfaction Problem (CSP), representing properties of business and integration services and message flows between them as CSP variables, and the laws of messaging-based integration as constraints over these variables. Then it executes a CSP solver and interprets the solution found as a description of the integration solution.
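
To make the CSP encoding more concrete, the following Python sketch solves a toy problem with plain backtracking. It is not the encoding or the solver used in the work: the three variables, their domains and the single constraint are hypothetical and were chosen only to show how a solver's assignment can be read as a solution design.

```python
def solve(variables, domains, constraints, assignment=None):
    """Plain backtracking search; returns the first consistent assignment or None."""
    assignment = assignment or {}
    if len(assignment) == len(variables):
        return assignment
    # Pick the first variable that has no value yet.
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        candidate = {**assignment, var: value}
        if all(check(candidate) for check in constraints):
            result = solve(variables, domains, constraints, candidate)
            if result is not None:
                return result
    return None

def formats_match(a):
    """The consumer must receive the message format it expects (toy constraint)."""
    if not {"out_format", "transformer", "in_format"} <= a.keys():
        return True  # cannot be violated until all three variables are set
    delivered = "json" if a["transformer"] == "xml_to_json" else a["out_format"]
    return delivered == a["in_format"]

variables = ["out_format", "transformer", "in_format"]
domains = {
    "out_format": ["xml"],
    "transformer": ["none", "xml_to_json"],
    "in_format": ["json"],
}
solution = solve(variables, domains, [formats_match])
```

In this toy case the solver has to insert the transformer, because a provider sending XML cannot otherwise feed a consumer expecting JSON.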

Current results show that these methods, especially the second one, are able to find solutions for practically-sized integration problems within reasonable time (seconds). Yet what is needed is to broaden the set of design issues tackled by the methods, e.g. to find the optimal placement of logical data elements in messages, to deal with security issues, to support diverse message transport protocols, and to deploy solution components into ESB containers. We also plan to more precisely define the notion of solution optimality. We expect that in order to achieve these goals we would have to involve the developer in the solution-finding process. Finally, we plan to provide a code-generation module for selected integration platforms.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 75-76

Geographically-oriented Donating Using Social Networks and Its Visualization via Maps

Roman Mészároš
bachelor study, supervised by Mária Bieliková

Abstract. People nowadays have many items that they find useless or do not actually need. On the other hand, there are many people in need who would like to have some things either given or lent to them. The idea is to give people a better overview of all the things that are free to be used by somebody other than their current owner. We built a system for this purpose. We present the information via geographical maps. To let everybody use our system we develop it as a web application, and to make it handy we develop a mobile application as well. We use existing social networks instead of creating a new one. Social networks are used mainly for spreading the idea of our system via short messages.

Our solution provides intelligent evaluation of items which are going to be presented to others. The system should decide which items to display. The evaluation is divided into four parts: item evaluation, user evaluation, event evaluation and action evaluation. These parts are combined to produce an overall ranking of items. The items are visualized as a graph, so the evaluation process uses ideas from graph algorithms like spreading activation or PageRank.

The evaluation process is very important because it is used for recommendations for users and for the evaluation of groups and regions. Recommendation for the user is a key part, because if the system recommends things that a user is not interested in, he finds the system boring and will not use it any more. Evaluation of users and groups based on their activity within the system is used to motivate the users to use the system and to compete with each other in helping others.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 13-14

Collaborative Tagging for Word Relationships Mining

Tomáš Michálek
bachelor study, supervised by Marián Šimko

Abstract. The huge amount of data on the Internet demands an effective way of searching it. Computers are unable to process unstructured text, which makes up most of the data on the Internet. Nowadays, searching is heavily based on looking up keywords. It is like playing darts and hoping to hit the right words. Even then we do not get the information we are looking for; instead, we get a number of links referring to resources (articles, pages, etc.) which will hopefully contain the answer. The problem is that we have to look it up ourselves. Even answering simple questions can be a long process.

We need to find a way for computers to understand and process the information on the Internet, which means describing relations between words. Then we can use automated systems to process text, extract information and maybe reveal new facts. A system describing these relations between words is called an ontology – a formal, explicit specification of a shared conceptualization. However, building this set of concept definitions is a very time-consuming process which requires a high level of expertise. Moreover, these systems are still built by people, which predisposes them not to be formal and shared.

The fastest growing part of the Internet today is social networks, which hide a great potential of human computing power, part of which can be used for building ontologies. We describe existing social networks and collaboration systems like delicious.com, flickr.com or citeULike.com. Each of these web sites is a collaborative tagging system offering the opportunity to use its tags in the process of finding relations between words. This can lead to building a basic skeleton, a taxonomy, of large ontologies. We focus mainly on describing the nature of folksonomies – the intersection between taxonomies and social networks – and on using these systems for word relation mining.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 63-64

Personalized Exercises Recommending for Limited Time Learning

Pavel Michlík
master study, supervised by Mária Bieliková

Abstract. The objective of adaptive navigation is to help a student choose the best topics or learning objects to focus on, in order to maximize learning efficiency. We focus on exercises as an important part of preparing for an exam. Since the time for preparation is limited by the date of the exam or midterm, it might not be sufficient for learning all required concepts perfectly, especially for students who started preparing late.

Our goal is to help the student achieve as good an exam result as possible. A strategy used by many students is going through all topics in the course very quickly and learning them at least to some extent, rather than learning a few topics in detail. Our method for recommendation is designed to help students prepare for the exam using the former strategy.

To achieve a proper distribution of learning time between all required concepts, we attempt to determine optimal knowledge levels of all concepts at the end of the learning time which are achievable at the current learning speed. Using the overall knowledge level increase since the start of learning, we estimate the knowledge level increase from the present time to the end of learning and divide it between all concepts in such a way that the final estimated knowledge levels of concepts correspond to the concepts’ importance given by the teacher.

To make a recommendation for a student, we compute an appropriateness value for each exercise. Then, a predefined number of exercises with the largest appropriateness values are recommended. Three criteria are used for exercise evaluation: concept appropriateness (decides whether the student should learn the concepts covered by the particular exercise, considering the estimated achievable knowledge levels), exercise difficulty appropriateness (prevents the student from being uninterested or discouraged) and time since the last attempt to solve the exercise (suppresses the recommendation of recently viewed exercises).
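
The selection step can be sketched in Python as follows. The abstract does not state how the three criteria are combined, so the sketch assumes that each criterion is a value in [0, 1] and that the values are multiplied. The function names and the sample data are hypothetical.

```python
import heapq

def appropriateness(concept, difficulty, recency):
    """Combine the three criteria, each assumed to lie in [0, 1], by multiplication."""
    return concept * difficulty * recency

def recommend(exercises, n):
    """Return the names of the n exercises with the largest appropriateness values."""
    return heapq.nlargest(n, exercises, key=lambda name: appropriateness(*exercises[name]))

# name -> (concept appropriateness, difficulty appropriateness, recency factor)
exercises = {
    "e1": (0.9, 0.8, 1.0),
    "e2": (0.9, 0.8, 0.1),  # suitable, but attempted very recently
    "e3": (0.2, 0.5, 1.0),
}
recommended = recommend(exercises, 2)
```

With multiplication, a low value in any single criterion (such as a recent attempt) is enough to push an exercise down the list.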

Our solution is currently evaluated in controlled experiments within the Functional and logic programming course. Experiments consist of a pre-test, a learning session and a post-test to verify the adaptive navigation impact on students’ learning performance.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 33-34

Exploring the Possibilities of Annotations in Learning Content

Vladimír Mihál
master study, supervised by Mária Bieliková

Abstract. Modern web-based educational systems employ principles of Web 2.0, utilizing tools that allow students to actively contribute to the learning content and personalizing the content and presentation for them. One of the techniques allowing contribution to the learning content is user annotation. Besides the contribution and active participation of the student, annotations offer several other possibilities for use within the learning domain.

Using annotations for commenting on the learning content by students or teachers can be extended from leaving a short text comment to a conversation between all users. Conversations and discussions embedded in the annotations will be spread through the learning content, providing students with information about upcoming events or current activities. Discussions will be similar to forums or microblogs, which are already well known and used amongst students, which will partly serve as motivation.

When several students select a certain word or phrase within the learning document, it indicates that the phrase is important within the context of the document, or even that it is one of the important keywords characterizing the given document. These words or phrases can also identify concepts described in the document, allowing us to create a community-based domain model. Furthermore, the advantage of such a model is its dynamic nature, since students can annotate learning content during the whole course. Such a model will not only reflect relations between documents and concepts but also include the current interests of students.

By creating annotations, the student provides us with information about his current activities and progress within the course. Combining this information with the results of mid-term tests, we can discover parts of the document which were studied by students who were more or less successful on the test. By displaying these parts of the learning content to students, we can recommend texts, questions or exercises which helped other students to achieve their success on the test.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 65-66

Visualization of Welfare on Maps and its Personalization

Roman Panenka
bachelor study, supervised by Mária Bieliková

Abstract. There are many social networks spread around the world nowadays. The common approach to visualization of these networks is a text or graph interface. We decided to provide map interface support for social network visualization. Our solution is aimed at graphical representation of social networks. To illustrate this idea we have built a system for lending and donating old and useless items. Our system visualizes external social networks on the map; we do not build our own social network, but we treat item donating and lending data as a social network of our own.

We integrated social networks and personalization of users’ views. Currently, we focus on the biggest social network – Facebook. Since our system is designed to be open to other social networks, it is easy to integrate them in the future and to extend system availability. The integration of a social network is two-way. Users can authenticate with their social network credentials (currently Facebook, Twitter and Windows Live) and start using our system with the same friends and the same profile information without any registration. Furthermore, they can link more existing accounts and benefit from better usability.

In the other direction, the system integrates with the user’s Facebook wall. Our system provides triggered updates to users’ walls, including photos of things they have provided, things they have borrowed or lent, events they have been involved in, and regular updates with user statistics, social impact information and posts about success stories. All these updates are controlled and adjustable in order to reduce the amount of spam.

In addition to common statistics of users’ activity, there are statistics based on users’ geographical location. They include the amount of welfare done in their location, statistics of users from their neighborhood and especially a comparison with neighboring regions and countries. This indirectly leads to geo competition and greater interest in our system.

To provide better item recommendations, we use intelligent algorithms to offer personalized and interesting views to users. To achieve this, our system works with keywords of the current item and their synonyms, with the similarity of its categories, with the age of items in our system and most of all with the geographical location of items. Users and items are also evaluated by users, which gives us a better picture of users’ interests.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 13-14

Improving Social Skills Using the Social Exchange Framework

Jana Pazúriková
bachelor study, supervised by Jozef Tvarožek

Abstract. Children and youngsters today have problems with socializing. The main reason seems to be their awkwardness in the initial phase: they do not know how to act when meeting new people or how to interact with them smoothly.

The main aim of this project is to develop a method that would improve young people’s social skills. We devise a strategic game that offers them the opportunity to examine the process of making new friends and to socialize as many times as they need without the fear of failure. In the game, the user first specifies the essential characteristics of their personality and some characteristics of their dream friend, somebody they would like to meet and relate to. The goal of the game is to become a friend of that person. This can be accomplished by spending time together doing various activities – by exchanging benefits within the social exchange framework.

As communication is crucial for establishing, maintaining and improving a relationship, we pay special attention to the dialogs during social interactions. They are text-based and both the users and the system select utterances from a set of available options. Dialogs can progress in several directions depending on the personal characteristics of the interlocutors, their moods and the influence of past conversations. Certain questions and answers are chosen according to those circumstances.

The game gives children the opportunity to simulate the process of socializing in the real world, gain experience and, eventually, enhance their own social capabilities. The software prototype is web-based and is implemented in Silverlight 3.0 and the C# programming language.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 49-50

Karol Rástočný
master study, supervised by Michal Tvarožek

Abstract. The difficulty of finding relevant data on the Web is increasing as web repositories grow. Therefore, we propose an approach for browsing the Semantic Web which can help users find relevant results, i.e., how to find data in the Semantic Web and how to browse similar and/or related data entities.

We extend the faceted browser Factic, which displays results in thumbnail matrix/list view with an additional view where results are categorized in hierarchical clusters. This view helps users to browse large numbers of results in a more organized way. We propose an approach to hierarchical cluster creation and labeling using semantic similarity computed from meta-data.

Faceted browsers can help users to find data and information, but it often happens that users want to explore resources similar to an already found result. We employ view-based search within the Semantic Web via navigation in a 2D graph. The process starts with one central node, which represents the initial result, and some nodes around it that represent its facets. After that, the user can browse the graph by expanding nodes representing facets. To prevent the graph from becoming cluttered due to node expansion, which can result in many (irrelevant) nodes, we propose the clustering of results and the marking of facets. Clusters are created from results that have the same facets displayed on the graph and behave in the same way as results. This means that clusters are connected to facets and users can display their facets. Users can filter new results by marking facets as:

  • Wanted – new results should have direct connections to this facet.
  • Unwanted – new results are not allowed to have direct connections to this facet.
  • Possible – new results may have direct connections to this facet. This mark can be used as a default state.
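
The three marks translate directly into a filter over new results. The following Python sketch shows one way to express it; the function name matches, the lowercase mark strings and the sample facets are hypothetical and do not come from Factic.

```python
def matches(result_facets, marks):
    """Decide whether a new result passes the facet marks.

    result_facets: set of facets the result is directly connected to.
    marks: {facet: "wanted" | "unwanted" | "possible"}.
    """
    for facet, mark in marks.items():
        if mark == "wanted" and facet not in result_facets:
            return False
        if mark == "unwanted" and facet in result_facets:
            return False
    return True  # "possible" facets never exclude a result

marks = {"author:Smith": "wanted", "year:2009": "unwanted", "topic:web": "possible"}
kept = matches({"author:Smith", "topic:web"}, marks)
dropped = matches({"author:Smith", "year:2009"}, marks)
```

Facets that do not appear in marks are treated like Possible, which corresponds to using that mark as the default state.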

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 15-16

Online Gathering of Information from Text Sources

Štefan Sabo
master study, supervised by Anna Bou Ezzeddine

Abstract. The project deals with the gathering of information from text sources available on the web. The information would not be gathered from databases or already processed sources, but from random sources in natural language. These sources may come in many forms, like consumer forums, articles, user comments or various blogs. The aim of this project would be to create a system capable of locating, harvesting and subsequently analyzing such sources. This analysis would be able to provide us with information about various products, events, etc. The first idea would be to implement the mining of opinions on consumer products such as cell phones, notebooks or other electronics. However, this is just a first notion and could be subject to change with further progress.

Similar systems dealing with opinion mining are not a new idea. However, this system would differ in the means of locating and gathering information. To achieve reasonable reliability we need to analyze a rather broad database. In order to do so, I would like to utilize the social insect model, namely the metaphor of bees gathering food. The bees would be implemented as web crawlers. These crawlers would be able to search the web and download pieces of information for further processing. The advantage of this approach lies in the ability of web crawlers to make decisions and to prefer the more suitable sources over the ones that are less interesting. New sources tend to be linked more often than older ones, so the web crawlers searching the web are more likely to locate such a source, just like a bee stumbles upon a quality food source. These sources would subsequently be harvested, meaning that the text information would be downloaded to a database and be ready for analysis. This is another advantage of the model, as more bees will be allocated to harvest the better sources.

The subsequent analysis of the downloaded data would provide us with opinions about the chosen products. After the implementation, a series of tests would be performed. The aim of these tests would be to compare different strategies or parameters and to find out how good the produced results are.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 67-68

Geonavigation in Social Networks

Márius Šajgalík
bachelor study, supervised by Mária Bieliková

Abstract. The basic idea of our project is to provide everybody with a way to help others within their own means. Everyone possesses a lot of things, but only a few of them are in active use. Thus, people can help by giving or lending things to those who are in need or who can simply make use of them. To accomplish this goal, we integrate social networking as a popular means of social contact. To be as accessible as possible, we integrate existing networks such as Facebook, Twitter and Windows Live, which enable users to remain in their original social environment without the need to recreate it in another place.

Due to the material nature of item migration, we integrated a new way of navigation in social networking through the use of a map. The basic concept of presentation is to visualize as much information as possible on the map. Every user, item, event and action is visualized on the map. The map also represents the basic interface for interaction between users. Therefore, it is very important to preserve clarity in the visualization.

To meet the specific needs of our system, we proposed our own method of visualization. Items are iterated over in order of priority and each item is checked for a collision with items already displayed. If there is no collision, the item is displayed. If there is a collision, the item is displayed with less importance: in order not to overlay the items already displayed (which evidently have higher priority), its size is reduced and its opacity decreased. Ordered by importance, items are grouped into several levels of importance. Visually, displayed items differ based on the level in which they are included. To maintain a clear view, the remaining items are pushed to the last level, which is not displayed. Thus, visual differences among levels can be less significant. The presence of hidden items in the last level is represented by the background color of the appropriate topmost item. In this manner, density can be viewed as well. The use of several levels of item visualization also makes the map more attractive for a common user.
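
The collision-driven assignment of items to levels can be sketched in Python as follows. This is a simplified reading of the description, not the implemented method: items are modeled as circles, the radii in LEVEL_RADII are hypothetical, opacity is omitted, and the background-color indication of hidden items is left out.

```python
import math

# Hypothetical radius per visible level; items that fit at no level are hidden.
LEVEL_RADII = [10.0, 6.0, 3.0]

def assign_levels(items):
    """items: list of (name, x, y, priority).

    Returns {name: level}; level == len(LEVEL_RADII) means the hidden last level.
    """
    placed = []  # (x, y, radius) of items already displayed
    levels = {}
    # Higher priority first, so important items claim space before others.
    for name, x, y, _ in sorted(items, key=lambda item: item[3], reverse=True):
        for level, radius in enumerate(LEVEL_RADII):
            # Two circles collide when their centers are closer than the sum of radii.
            if all(math.hypot(x - px, y - py) >= radius + pr for px, py, pr in placed):
                placed.append((x, y, radius))
                levels[name] = level
                break
        else:
            levels[name] = len(LEVEL_RADII)  # no level fits: hidden
    return levels

items = [("a", 0, 0, 5), ("b", 14, 0, 4), ("c", 1, 0, 3), ("d", 100, 0, 1)]
levels = assign_levels(items)
```

Here "b" collides with "a" at full size and is drawn at the smallest visible size, while "c" sits almost on top of "a" and ends up in the hidden level.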

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 13-14

Enhancing Exploratory Search: Graphs, User Modeling and Search History

Jakub Šimko
master study, supervised by Michal Tvarožek

Abstract. The focal point of our research is exploratory search. This paradigm is oriented towards helping users during learning and investigating search tasks, where the user’s goal is not a single web document or fact. It tries to fight the “information space invisibility problem”, best described by the typical example of web search, where the user has to guess search keywords rather than pick a query from a list. To do this, exploratory search tries to provide the user with clues or suggestions at each step of his search session. Clues are produced based on extraction and analysis of space semantics and presented with respect to the user’s personal interests. For the presentation of clues and other properties of the information space, exploratory search uses various visual structures (including graphs).

We facilitate exploratory search in several ways. First, we develop a novel approach intended to reduce the user effort required to retrieve and/or revisit previously discovered information by exploiting web search and navigation history. We collect streams of user actions during search and navigation sessions, identify individual user agendas, and construct and persistently store visual trees representing session history. We provide users with a History Map – a scrutable graph of semantic terms and web documents with full-text search capability over individual history entries, constructed by merging individual session history trees and the associated web documents. The Map semantically organizes a user’s browsing history (with the help of the Delicious taxonomy) and enables him to quickly recall information distributed over several documents and/or sessions.

Our interest also lies in discovering relationships among web documents and terms that are useful for providing navigation clues or for constructing structures like the History Map. Based on user search sessions, we map their initial queries to end results and refine suggestions for future occurrences of such queries. We also discover relationships between terms themselves through a special web search game in which users have to formulate search queries in a specific format to minimize the number of returned results. The query format forces players to use terms that are related to each other. Multiple occurrences of the same term combination result in the creation of a relationship.
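
The last step, creating a relationship once the same term combination has occurred several times, can be sketched in Python as follows. The sketch assumes that a combination is a pair of terms used in the same query and that a fixed threshold decides when a relationship is created; the name related_terms and the default threshold of 2 are hypothetical.

```python
from collections import Counter
from itertools import combinations

def related_terms(queries, min_count=2):
    """Return term pairs that occur together in at least min_count queries."""
    pair_counts = Counter()
    for terms in queries:
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        pair_counts.update(combinations(sorted(set(terms)), 2))
    return {pair for pair, count in pair_counts.items() if count >= min_count}

queries = [
    ["jaguar", "car"],
    ["car", "jaguar", "speed"],
    ["jaguar", "cat"],
]
relations = related_terms(queries)
```

Only the pair used by two different queries passes the threshold; pairs seen once are kept as counts but do not yet form a relationship.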

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 17-18

Lightweight Semantic Search Based on Heterogeneous Sources of Information

Marián Šimko
doctoral study, supervised by Mária Bieliková

Abstract. To satisfy a user’s information needs, the most accurate results for the entered search query need to be returned. Traditional approaches based on comparing Bag-Of-Words models of the query and the resources have been surpassed. In order to yield better search results, the role of semantic search is increasing. However, semantic data is not as common as is needed for search improvement. Although there are initiatives to make resources on the Web semantically richer, it is demanding to appropriately describe (annotate) every single resource manually. Furthermore, it is almost impossible to do it consistently. The current major problem of semantic search is the lack of available semantics for the resources, especially when considering search on the Web.

To overcome this drawback, we propose an approach leveraging lightweight semantics of resources. It relies on resource meta-data – a model representing resource content. The model consists of interlinked concepts and of relationships connecting concepts to resources (the subjects of the search) or to other concepts. Concepts represent domain knowledge elements (e.g. keywords or tags) related to the resource content (e.g. web pages or documents). Both resource-to-concept and concept-to-concept relationships are weighted. The weights determine the degree to which a concept is related to a resource or to another concept, respectively. The interlinked concepts form a structure resembling a lightweight ontology, which allows automated generation (we have already performed several experiments with promising results in the e-learning domain).

Having the domain model described above, we examine the possibilities of search improvement. We propose two variants of so-called concept scoring computation. With concept scoring we extend the baseline state-of-the-art approaches to query scoring computation, expecting an improvement of the search. Utilizing the meta-data, we are able to assign the query to a particular topic (a set of concepts) and yield more accurate search results with respect to related resources. Currently we are working on the evaluation of the proposed approach.
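
The abstract does not define the two concept scoring variants, so the following is only a hypothetical sketch of how weighted resource-to-concept and concept-to-concept relationships could extend a baseline term score. The function names, the linear combination and the `alpha` parameter are assumptions, not the authors' method.

```python
def baseline_score(query_terms, resource_terms):
    """Fraction of query terms that occur in the resource (a stand-in baseline)."""
    if not query_terms:
        return 0.0
    return sum(1 for t in query_terms if t in resource_terms) / len(query_terms)


def expand_query(query_terms, concept_links):
    """Map the query to a weighted set of concepts (its assumed 'topic').

    concept_links: concept -> {related concept: weight in [0, 1]}.
    """
    expanded = {}
    for term in query_terms:
        expanded[term] = max(expanded.get(term, 0.0), 1.0)
        for related, weight in concept_links.get(term, {}).items():
            expanded[related] = max(expanded.get(related, 0.0), weight)
    return expanded


def concept_score(query_concepts, resource_concepts):
    """Sum of products of query-concept and resource-to-concept weights."""
    return sum(w * resource_concepts.get(c, 0.0) for c, w in query_concepts.items())


def combined_score(query_terms, resource_terms, resource_concepts,
                   concept_links, alpha=0.5):
    """Hypothetical linear blend of the baseline and the concept score."""
    topic = expand_query(query_terms, concept_links)
    return ((1 - alpha) * baseline_score(query_terms, resource_terms)
            + alpha * concept_score(topic, resource_concepts))


links = {"recursion": {"lisp": 0.5}}  # concept-to-concept weights (made up)
score = combined_score(["recursion"], {"lisp", "lists"}, {"lisp": 0.8}, links)
```

In this made-up example the resource never mentions "recursion", so the baseline score is zero, yet the resource still receives a non-zero score through the related concept "lisp".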

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 19-20

Tracing Strength of Relationships in Social Networks

Ivan Srba
bachelor study, supervised by Mária Bieliková

Abstract. The current Web is known as a space with constantly growing interactivity among its users. It is changing from a place for data storage into a social place, where people not only search for interesting information but also communicate and collaborate with each other. Obviously, the best and biggest places for such interaction are social networks, where people can arrange and explicitly express their relationships. If we know whether the relationship between two users is strong or weak, we can provide actions that depend on the actual context of deployment. For example, we can provide adaptive recommendation. Users can also monitor the evolution of their relationships with friends. If some of these relationships tend to weaken, we can notify the user and propose a way to change the current state (e.g., send a message or a virtual gift).

We have proposed a method for analyzing the evolution of a user’s relationships and evaluate it by means of a web-based application that we developed, which approximates the user’s relationships with other users over time. This approximation is based on various user activities performed in social networks, for example sending a message or uploading a shared photograph. We denote such an activity as a rate factor. The partial intensity expressed by one rate factor depends on the rate factor’s weight, the number of occurrences of the rate factor and, for some types of activities, also on the time when the activity happened and the duration of its influence. All these effects are included in the calculation sequence.
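
The abstract names the inputs of a partial intensity (weight, number of occurrences, time and duration of influence) but not the formula. The sketch below is therefore an assumption: it uses exponential decay with a per-factor half-life as one plausible way to model the duration of influence, and all names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RateFactor:
    name: str
    weight: float                            # hypothetical importance of the activity
    half_life_days: Optional[float] = None   # None: influence does not fade


def partial_intensity(factor, event_ages_days):
    """Intensity contributed by one rate factor.

    event_ages_days: age in days of each occurrence of the activity.
    Assumption: a time-sensitive occurrence loses half of its
    influence every `half_life_days`.
    """
    if factor.half_life_days is None:
        return factor.weight * len(event_ages_days)
    return factor.weight * sum(
        0.5 ** (age / factor.half_life_days) for age in event_ages_days
    )


def relationship_intensity(activity):
    """Sum of partial intensities; activity maps RateFactor -> list of ages."""
    return sum(partial_intensity(f, ages) for f, ages in activity.items())


message = RateFactor("message", weight=2.0, half_life_days=30.0)
photo = RateFactor("shared photo", weight=5.0)
total = relationship_intensity({message: [0, 30], photo: [100, 200]})
```

Here a message sent today counts fully and one sent 30 days ago counts half, while the two shared photographs keep their full weight regardless of age.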

Many sources of user activities can be used to evaluate the proposed method. For our experiment we chose the well-known and popular social portal Facebook. We developed a web system, Intensity Relationship Analyzer & Presenter, to realize the proposed method. The application uses a wrapper to connect to Facebook and to mine rate factors via the Facebook API. In the experiment we hypothesize three results. First, the calculated intensity will show a distribution of user interaction among friends similar to that reported in related work. Second, the calculated intensity will show an evolution of relationships over time similar to that described in other research. Third, the method will calculate the relationship intensity for the ten best friends with 80% reliability.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 51-52

Improving Search Using Graphs and Implicit Feedback

Ján Suchal
doctoral study, supervised by Pavol Návrat

Abstract. With the coming era of the Semantic Web, large, structured and linked datasets are becoming common. Unfortunately, current search engines mostly see the Web only as a graph of pages linked together by hyperlinks, and are thus insufficient for users searching in such new, structured and multidimensional data. When dealing with multidimensional data, identifying the relations and attributes that are important for users to achieve their search goals becomes crucial. Furthermore, every user can have different priorities and different goals, which can even change over time.

One of the goals of this work is the extension of existing graph algorithms to multidimensional data, where tensor algebra and multigraphs can be useful, in contrast with the currently preferred matrix algebra. Such an extension of graph algorithms could increase the relevance and quality of search, and even enable a new quality of query formulation.

The relevance and quality of search can be evaluated by gathering implicit feedback (e.g. quality can be measured just by monitoring user interactions with the system). Another goal of this work is to exploit the feedback gathered from users (implicit or explicit) not only to evaluate the underlying system, but also to analyze user behavior, thus opening possibilities for adaptation and personalization.

The main goal of this work is the usage of implicit feedback in search engines dealing with large multidimensional data, to improve search result quality and relevance of results.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 35-36

Cooking a Socially Intelligent Tutoring Platform

Jozef Tvarožek
doctoral study, supervised by Mária Bieliková

Abstract. Educational technology has moved from tools that automate repetitive tasks such as grading tests to intelligent tools that provide personalized instruction. Intelligent tutoring systems give direct one-on-one instruction and feedback to students during problem solving. Students, however, often engage in off-task behaviors that diminish learning gains. Maintaining sustained student motivation is therefore important for effective learning, yet providing motivational feedback is often at odds with cognitive scaffolding. Various approaches for improving motivation have been proposed. The affective support for learning seems difficult to realize and therefore remains limited, while the narrative-centered story-based approaches are not directly applicable to traditional domains such as mathematics and computer science.

In this work, we propose to enhance computer-supported learning systems with a virtual conversational agent that employs a socially intelligent dialog strategy to increase student motivation and guide students to instructional activities appropriate for their current context. The activities (problem solving, course notes) are augmented by social features (synchronous group work, annotations, asynchronous discussions, etc.), which are subsequently used by the tutoring agent to facilitate a socially encouraging learning path for individual students. The dialog strategy is induced by a reinforcement learning method on Wizard-of-Oz natural language data collected online with the help of domain experts. The tutoring agent does not directly participate in learning activities with students, and its dialog capability can be developed separately from the domain content. In the process, we redesigned numerous techniques used in pseudo-tutor learning environments and tailored them to the socially intelligent tutoring context. The problems for students to solve are scripted in a template language that generates complex problems with hints, while optionally being semantically adapted to a student’s individual preferences. Decisions are made on the server; natural language dialogs and collaborative features within the client’s interface are synchronized in near real time.

We apply these methods to increase motivation and learning gains in a learning system for middle school mathematics. Some 54% of students engage with the tutor quite naturally, while the others seem to require more tangible benefits. Students in the socially engaged group liked the system and the tutor more, and they were also more successful in solving problems within the tutoring environment. The reinforcement learning strategy lets us create a working dialogue capability rapidly, without tedious dialogue scripting. We envision that advanced users (students and teachers) can put their expertise into their own virtual presence, adding new virtual tutors capable of directly helping others in learning.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 53-54

Exploratory Search in the Adaptive Social Semantic Web

Michal Tvarožek
doctoral study, supervised by Mária Bieliková

Abstract. The evolution of the Web as a dynamic global virtual socioeconomic space has resulted in many issues that affect both individual users and the entire human society. We need not only to address information overload and the navigation problem but also to accommodate novel trends in web use such as the push towards exploratory search, more interactivity, involvement and personalization. To achieve these objectives, we need to take advantage of principles and approaches outlined in present web initiatives – the Adaptive Web, the Semantic Web and the Social Web.

In our work, we combine and extend several existing approaches in order to create an advanced exploratory search browser for both the semantic and legacy web taking advantage of personalization, social wisdom and semantics. Ultimately, our goal is to provide users with a seamless exploration experience within a common Adaptive Social Semantic Web environment.

Our approach is based on a faceted browser which is extended with support for semantic information spaces to facilitate exploratory search, automated user interface generation to accommodate web dynamics, user modeling and personalization to address information overload and navigation issues, and collaborative content/meta-data creation to harness the power of social wisdom. We integrate our browser with additional approaches that support exploration, such as history tracking and tree visualization, graph visualization and incremental navigation in the information space, and custom content rendering tools to facilitate content exploration.

We evaluate our approach both via synthetic experiments and user studies in several application domains – digital images, job offers, scientific publications. Our initial findings have shown promising results with respect to individual approaches, while a comprehensive evaluation of the whole integrated browser is under way.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 21-22

Collaborative Acquisition and Evaluation of Questions by Learners

Maroš Unčík
bachelor study, supervised by Mária Bieliková

Abstract. The classical educational environment consists of passive consumers (students) and creators (teachers) of learning content. Our idea is to make the passive participants of the learning process active. There are two views on this problem. In the first view, students are creators, which improves their understanding of educational texts and provides an opportunity for collaborative learning. Students are not only made to think; they also create content that is useful for their peers. The second view concerns gaining new, high-quality content thanks to the collaboration. The content of learning texts constantly evolves: it is dynamic and reflects current changes.

The most common way of enriching content is adding annotations, either text-based, such as notes, explanations, comments, hints and tags, or graphic-based, such as underlining, highlighting and graphic labels. One less obvious form of annotation is the question. Questions relate to the content and summarize its significant information. Our main idea is to let students themselves enrich educational materials with questions, thereby helping them to understand learning texts faster, better and with less effort.

Although the questions are added by students, they need to reach a level of quality similar to that of an expert’s questions. Our idea is to evaluate the quality of questions based on the explicit feedback of students in conjunction with the actions that students perform in the educational system, and also on the evaluation by an expert (a teacher). For this purpose we proposed a model for rating questions and a model for rating a user’s ability to create questions.

The rating of a question derives from its explicit rating by students and from its implicit rating, which is based on the actions of students in the system (the user rating model). The tracked actions are creating a question, answering a question, explicitly rating a question and rating a question similarly to other students. Our approach also includes a competitive element of motivation in the form of gaining points in the overall assessment.
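
The abstract does not state how the explicit ratings, the user rating model and the teacher's evaluation are combined. As one hypothetical illustration, the sketch below computes a weighted average in which each student's rating is weighted by that student's ability score and the teacher's rating, if present, carries a larger fixed weight. All names, the rating scale and the weights are assumptions.

```python
def question_rating(student_ratings, rater_ability,
                    teacher_rating=None, teacher_weight=3.0):
    """Weighted average rating of one question, or None if nobody rated it.

    student_ratings: student id -> explicit rating in [0, 1]
    rater_ability:   student id -> ability weight from the (assumed)
                     user rating model; unknown students get weight 1.0
    """
    total = 0.0
    weight_sum = 0.0
    for student, rating in student_ratings.items():
        weight = rater_ability.get(student, 1.0)
        total += weight * rating
        weight_sum += weight
    if teacher_rating is not None:
        total += teacher_weight * teacher_rating
        weight_sum += teacher_weight
    return total / weight_sum if weight_sum else None


ratings = {"anna": 1.0, "boris": 0.0}
ability = {"anna": 3.0}  # anna has rated or created questions well before
student_only = question_rating(ratings, ability)
with_teacher = question_rating(ratings, ability,
                               teacher_rating=0.0, teacher_weight=4.0)
```

In this example the opinion of the more reliable student dominates the student-only rating, and a negative teacher evaluation pulls the rating down.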

To evaluate our approach, we have designed a component for adding questions as part of the web-based educational framework ALEF, which we use for experiments in the domain of functional and logic programming. We plan to conduct experiments with this adaptive system in a real learning process over a period of one week. The students will create questions related to educational materials for the programming language Prolog.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 69-70

Information Search Considering the User’s Interest and Groups of Similar Users

Matej Valčuha
master study, supervised by Pavol Návrat

Abstract. Due to the constantly expanding amount of information on the Internet, it has become increasingly difficult to find the information we need. We therefore use a search engine, into which we enter a query. Usually we provide one or two words to specify the item or the term. For the same query, standard search engines give every user the same results. Different users might, however, be looking for completely different things even though they use the same query. Therefore, the results of a standard search engine might contain a lot of irrelevant links. Such problems can be solved by personalized search, which takes into account the particular interests of the user. Based on the user’s profile, personalized search can modify the query before passing it to the standard search engine, or change the order of the links returned by the standard search engine. It can also combine both options to achieve results more relevant to the user’s interests.

To start such a search, it is necessary to find a large source of information about the user from which areas of interest can be derived. Social networks have great potential for this purpose. People disclose their hobbies, update their status, join different groups and become fans of their favorite movies, musicians and athletes. Groups have the advantage that even when the user is inactive, other users can contribute relevant facts. All such information, like membership in a group or messages, can be used to create a profile of the user’s interests.

The users of social networks can also express their views on any topic or question. This could be used in determining the relevancy of results. Users would be given an opportunity to express their views on the results. These results would be shown to others with similar interests, who would be able to increase the ranking of a relevant result or decrease the ranking of an irrelevant one. The user of the personalized search should be allowed to choose which of his or her groups are added to the search profile, since membership in a group does not have to reflect the user’s current or permanent interest.
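
The re-ordering option mentioned in this abstract can be illustrated with a minimal sketch. The scoring rule is an assumption, not the author's method: each result keeps a base score from its original position, and a bonus is added for every term it shares with the interest profile (for example, terms taken from the groups the user selected). The names `rerank`, `profile` and `boost` are hypothetical.

```python
def rerank(results, profile, boost=1.0):
    """Re-order search results according to a user's interest profile.

    results: list of (url, set of terms) in the engine's original order
    profile: term -> interest weight
    """
    n = len(results)

    def score(item):
        position, (_url, terms) = item
        base = (n - position) / n  # earlier original results score higher
        interest = sum(profile.get(term, 0.0) for term in terms)
        return base + boost * interest

    ranked = sorted(enumerate(results), key=score, reverse=True)
    return [url for _position, (url, _terms) in ranked]


results = [
    ("cars.example/jaguar", {"jaguar", "car"}),
    ("zoo.example/jaguar", {"jaguar", "animal"}),
    ("cats.example", {"cat"}),
]
profile = {"animal": 1.0}  # e.g. derived from a wildlife group membership
personalized = rerank(results, profile)
```

For a user whose profile contains "animal", the zoo page moves to the top; with an empty profile the original order is preserved.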

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 23-24

Automated Recognition of Author’s Writing Style in Blogs

Martin Virík
master study, supervised by Marián Šimko

Abstract. In the past decade it has become much easier to create web content, even for users with no experience with web technologies. Weblogs are the most typical and the fastest growing example of this trend. Thousands of bloggers use this hybrid genre to express their ideas, opinions and emotions, making blogs a rich space of topics and writing styles. Along with the increasing number of blogs, the number of efforts to improve blog search and recommendation algorithms has also grown. New requirements take into account the text quality of blog articles and consider individual writing style an important blog characteristic.

In our research we focus on linguistic characteristics of blog articles in order to recognize and classify the writing style of articles, blogs or even authors. We study the grammar and morphology of a selected language and the possibilities that computational linguistics offers for extracting the features of the document model needed for further classification. In the first phase of our research we have been studying basic text mining and classification methods and work related to the analysis of the linguistic quality of blog articles. Apart from user profiling, such as gender or personality profiling, a great effort has been expended on distinguishing between informative and affective articles. This and other genre-based research has shown a large predominance of affective blogs, especially diaries. Methods analyzing reading difficulty are closely related to weblog classification. We identified room for building models for multiple factors, such as measures of syntactic complexity or the prior knowledge of the reader.

We expect to arrive at a definition of writing style classes that are meaningful for weblogs and described by a set of measurable features. In order to recognize these classes automatically, we are developing a method based on well-known linguistic analyses and classifiers, or their improved variations. To evaluate the results of our research, we will apply our method to test articles selected from live blogs.

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 37-38

Effective Representation for Content-Based News Recommendation

Dušan Zeleník
master study, supervised by Mária Bieliková

Abstract. Our work builds on the advantages that can be achieved by a hierarchical representation of similar entities. Since we work with news, we focus on representing similarities among text documents. Our method for similarity search composes a tree of news articles based on content similarity. The hierarchy, together with our incremental approach, is effective for growing datasets and dynamically changing domains because of the logarithmic complexity of storing and retrieving similar articles.

We apply our solution to the news domain, where this complexity is important, since articles are continuously added and are time sensitive. Similarity is kept in the hierarchy, and the set of the most similar articles can be retrieved from it. We evaluate the suitability of our representation by comparing it with brute-force similarity search; it achieves relatively high precision for the top similar articles. Its main benefit is the short time needed to store and retrieve articles.

We also assume that recommendations for users often need to be generated in real time. Otherwise, the user quickly loses patience: users demand correct and relevant results but are willing to wait only briefly. Complications occur especially in domains where the subject of recommendation is time sensitive and the dataset grows. Our representation, when used for recommendation, addresses these drawbacks. One of its advantages is the ability to generate content-based, personalized recommendations of news. We use our incrementally composed hierarchy of similar articles also as a hierarchy of the user’s interest stereotypes. Each stereotype is a tree node with a set of ancestors – similar articles. Since the user reads specific types of articles, we presume that his or her interest stereotypes can be located in our representation and ordered by relevance. The ratio of articles read to articles not displayed is the criterion for this ordering. As a result, the recommendation consists of articles from the more relevant stereotypes, so as to cover all of the user’s interests. The content similarity is then effectively used to recommend newly added articles if they are relevant to the specified interests of the reader.
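
The abstract does not describe how the tree of articles is built, so the following is only a sketch of one possible incremental scheme, under stated assumptions: articles are bags of terms, every internal node stores the summed term counts of the articles below it, and both insertion and retrieval descend greedily towards the most similar child by cosine similarity. The depth of this simple tree is not guaranteed to be logarithmic, and the greedy descent is approximate; all names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Node:
    def __init__(self, vector, article=None):
        self.vector = Counter(vector)  # summed term counts of the subtree
        self.article = article         # article id, set only on leaves
        self.children = []


def insert(root, article_id, terms):
    """Insert an article and return the (possibly new) root."""
    leaf = Node(Counter(terms), article_id)
    if root is None:
        return leaf
    node, parent = root, None
    while node.article is None:            # descend through internal nodes
        node.vector.update(leaf.vector)    # keep subtree summaries current
        parent = node
        node = max(node.children, key=lambda c: cosine(c.vector, leaf.vector))
    # Replace the reached leaf with an internal node holding both articles.
    merged = Node(node.vector + leaf.vector)
    merged.children = [node, leaf]
    if parent is None:
        return merged
    parent.children[parent.children.index(node)] = merged
    return root


def most_similar(root, terms):
    """Greedy descent to the leaf that looks most similar to `terms`."""
    query = Counter(terms)
    node = root
    while node.article is None:
        node = max(node.children, key=lambda c: cosine(c.vector, query))
    return node.article


root = None
root = insert(root, "a1", ["football", "goal", "match"])
root = insert(root, "a2", ["election", "vote", "party"])
root = insert(root, "a3", ["football", "match", "league"])
closest = most_similar(root, ["goal", "football"])
```

In this toy example the two football articles end up as siblings under one internal node, so a query about football is routed past the election article without comparing against every stored article.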

In Proc. of Spring 2010 PeWe Workshop, Smolenice, Slovakia, pp. 39-40