Students' Research Works – Spring 2011

Search, Navigation and Visualization

Classification and Recommendation

User Modeling, Context Modeling, Virtual Communities and Social Networks

Domain Modeling, Semantics Discovery and Annotations



Using Social Relationships for Searching Relevant Information

Anton Balucha
master study, supervised by Anna Bou Ezzeddine

Abstract. The Web changes continuously. Today it is a complex tool for storing, searching and processing information. Besides working with information, the Web has in recent years also become a significant platform for creating and maintaining relationships.

People publish their personal data, present their opinions, show their photos and write about their experiences. Such data, available only to a small group of people a few years ago, is now available to a very wide range of Internet users. People search for information on the Web. The results are provided by search engines, which always leave room for improvement. The social relationships mentioned above can help to obtain better search results. Many people turn to their friends because they are close to them, so it is possible to use the friends' published opinions, experiences and other information to learn more about the searched area. We study resources and possibilities of person identification, we try to obtain information about social relationships, and we study the possibilities of using social relationships to retrieve relevant information. We suggest ways and methods of obtaining selected information about persons and their social relationships. We present this information about people in an integrated form and use it for searching.

We study information search algorithms inspired by the behaviour of social insects. In this work we focus on searching with fuzzy ant algorithms.

In Proc. of Spring 2011 PeWe Workshop, pp. 3-4

Information Recommendation with the Use of Context in a Specific Domain

Anton Benčič
master study, supervised by Mária Bieliková

Abstract. Our every action, wish or preference is shaped by our context, be it short-term or long-term. Studies of mobile devices, which have long ceased to be just mobile phones, show that people use them and the connected services they offer even when a computer is within reach. Besides basic context such as location, orientation, browsing history or installed applications, which can tell a lot about the user, we can access the user’s data from social networks like Facebook or Twitter to obtain even more context.

The vast amount of information about the user, her environment and current whereabouts has given rise to a number of applications that make use of it. The problem with most of them, however, is that they choose only a small subset of this context and act upon it. This subset is most often only the location of the user, which, while useful in certain scenarios, cannot stand in for the whole picture; pattern matching based on it, for example, is often infeasible or annoying for the user in practice because of its inaccuracy. A few methods today attempt to create a context fusion system that takes multiple inputs, such as mobile context, social context and smart space sensor context, and fuses them together. By leveraging more than location or time alone, these fused contexts are much more capable of recognizing patterns in the user’s behavior.

Our aim is to create a context framework that gives applications access to the user’s context through a unified interface, allowing them to choose the recommendation method as well as the context subset they want to use. With the framework at hand, we will experiment in a specific domain, such as an internet newspaper or a digital library, to see which context subsets are relevant in that domain and in combination with specific recommendation methods.
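
As a rough illustration of what a unified context interface could look like, the following minimal sketch lets an application request only the context subset it wants. The class and method names (ContextFramework, register, snapshot) and the sample sources are hypothetical; the abstract does not describe the actual interface.

```python
from typing import Callable, Dict, Iterable


class ContextFramework:
    """Hypothetical registry that exposes many context sources behind one interface."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], object]] = {}

    def register(self, name: str, source: Callable[[], object]) -> None:
        # Each source is a zero-argument callable returning the current value.
        self._sources[name] = source

    def snapshot(self, subset: Iterable[str]) -> Dict[str, object]:
        # An application asks only for the context subset it wants to use;
        # names without a registered source are silently skipped.
        return {name: self._sources[name]() for name in subset if name in self._sources}


# Made-up sources standing in for mobile, temporal and social context.
framework = ContextFramework()
framework.register("location", lambda: (48.15, 17.07))
framework.register("time_of_day", lambda: "morning")
framework.register("social", lambda: ["friend_a", "friend_b"])

# A news recommender might decide that location and time are enough.
news_context = framework.snapshot(["location", "time_of_day"])
```

Keeping the sources behind callables means a recommender can be tried with different context subsets without changing how the context is collected.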

In Proc. of Spring 2011 PeWe Workshop, pp. 21-22

Motivating Children to Physical Activity by Means of Reward with the Help of Personalized Recommendation and a Context-Aware Avatar

Pavol Bielik, Peter Krátky, Štefan Mitrík, Michal Tomlein
bachelor study, supervised by Michal Barla

Abstract. Over the last few decades, overweight and obesity have become a problem of global proportions. An insufficient amount of physical activity and unhealthy dietary habits have been cited as the primary causes. While the measures that need to be taken have been known for a long time, their execution has been hindered by a lack of motivation present in all age groups.

Tracking activity is an important step towards improvement. Most current activity tracking solutions rely on stand-alone single-purpose measurement and tracking devices. However, the current generation of smartphones has made it possible to track movement using built-in sensors such as GPS receivers, accelerometers and gyroscopes. Using a device people already carry, we are now able to collect data and provide the user with statistics and recommendations, which is a powerful source of motivation. A different approach to motivation is necessary for children, who are not easily motivated by figures and charts. Our solution is based on rewarding, a natural and effective way to motivate.

Being physically active requires a certain extent of knowledge on the right kind and amount of activity. Not enough activity may result in no improvement while too much activity may be harmful. A wrong choice of the type of activity may also have undesired consequences. It is therefore vital that users are advised on these matters. Our solution provides the user with advice and recommendations.

Much like motivation, advising children requires some additional consideration. Animated pedagogical agents have proven to be effective in capturing children’s attention. With the help of such agents, good results have generally been achieved. We propose an animated character (avatar), which is fully customizable in terms of appearance.

While the general concept of our solution is not limited to a specific age group, we have chosen children as our primary target group. There are a number of differences in the approach that needs to be taken with younger users. Moreover, experience suggests that if any measurable progress is to be achieved, the involvement of parents in the process is necessary. For that reason, we specify means for parents to oversee their children’s progress and intervene when and where appropriate. Parents are also the primary source of rewards, which provide children with the necessary motivation.

In Proc. of Spring 2011 PeWe Workshop, pp. 23-24

Concept-Cloud Navigation in Educational Web-Based System

Máte Fejes
bachelor study, supervised by Mária Bieliková

Abstract. In this project we focus on improving navigation within the web-based educational system ALEF. The project’s goal is to propose a method of content recommendation that displays essential key words representing the content elements recommended to the user by the system. The method focuses on effectively detecting and eliminating gaps in students’ knowledge as well as on the continuous acquisition of new information. The content of ALEF consists of so-called learning objects. They are accessible from the main menu, which is divided into three sections: text explanations, questions and exercises.

Since the goal is to design navigation based on key words (concepts), we chose to organize the concepts in the form of tag clouds. Two categories of concepts are displayed in the concept cloud, divided into two boxes within the cloud.

The first type is Related concepts – all the concepts linked to the currently opened learning object. They serve for quick switching among similar learning objects. Related concepts give the student quick access to questions and examples for the learning material being read, or to learning materials that explain an unsolved task.

We recommend – the second part of the cloud. It consists of concepts recommended to the user. Using the history of the user’s activities, the recommendation algorithm determines the user’s potential knowledge deficiencies inferred from his individual actions. The system responds to these actions by a defined method and selects concepts related to learning objects that could be interesting for the user.

We can move among learning objects on the site with the help of the main menu. If we select a learning object, we get to a new state in which the particular information is displayed. The tag cloud, however, works differently. It does not navigate the user to a concrete learning object but recommends topics he/she should deal with. If we click on any concept in the tag cloud, the related learning objects are highlighted in the main menu. In this way we indicate the proper direction to the user, but the final decision to select one of the recommended objects is left to the user.

In Proc. of Spring 2011 PeWe Workshop, pp. 5-6

Information Integration on Social Adaptive Web

Michal Holub
doctoral study, supervised by Mária Bieliková

Abstract. Today almost everything can be found on the Web. The problem is that information is scattered across many sources, and no single web page contains everything. Around 66% of searches are triggered after visiting some web page. This means that a user requires additional information to supplement what he reads on a web page. If we can anticipate this behavior and foresee what the user will search for, we can perform the search on his behalf, process the results and integrate them into the current web page. The user then sees an enriched page with more information.

Search engines are effective in locating the documents that contain the keywords from a query. It is then up to the user to look through these documents, find relevant pieces of information and join them together in order to get a broader view of the problem. Search engines fail to present relationships among the information contained in the returned documents. We try to accomplish this by integrating information across many websites, thus creating a mashup. We are working on a method for combining various web objects extracted from different websites. The combination should answer a user’s query, or it should broaden the information provided by a single website.

We have already done some work on integrating events automatically found on a web portal into a user’s personal calendar. We used implicit feedback to estimate the user’s interest. Then we combined recommended events (determined using collaborative filtering) into a calendar. We have also tried to group news articles about the same topic acquired from various portals. We extracted keywords from the articles using various web services and then computed the intersections of these keywords for each pair of articles. Based on this work, we try to describe the properties of web objects that can be used for their integration. These should be as domain independent as possible. We are also creating a methodology for this integration, the use of which will result in new applications for end users.
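
The pairwise keyword intersection used for grouping news articles can be sketched as follows. This is only an illustration: the article identifiers, the keywords and the threshold of two shared keywords are assumptions, and the keyword extraction itself (done by web services in the work described) is not shown.

```python
from itertools import combinations


def pairwise_keyword_overlap(articles):
    """Return {(id_a, id_b): shared keywords} for every pair of articles."""
    overlaps = {}
    for id_a, id_b in combinations(sorted(articles), 2):
        overlaps[(id_a, id_b)] = set(articles[id_a]) & set(articles[id_b])
    return overlaps


# Hypothetical articles with already extracted keywords.
articles = {
    "portal1/a": {"election", "parliament", "vote"},
    "portal2/b": {"election", "vote", "coalition"},
    "portal3/c": {"hockey", "league"},
}

overlaps = pairwise_keyword_overlap(articles)

# Assumed rule: two articles cover the same topic if they share at least two keywords.
same_topic = [pair for pair, shared in overlaps.items() if len(shared) >= 2]
```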

In Proc. of Spring 2011 PeWe Workshop, pp. 25-26

Detecting User Communities Based on Latent and Dynamic Interest on a News Portal

Marián Hönsch
master study, supervised by Michal Barla

Abstract. In our work we identify communities of individuals based on their interests. One of the drawbacks of today’s community-based collaborative recommender systems is that they group users based only on their aggregated similarity. Only users whose profiles completely match the profile of a community are assigned to it. This prevents them from using the wisdom of the crowd coming from users who match only parts of their interests. A user can belong to several communities at a time, where each community represents a part of his interests. We assume that communities addressing this issue can significantly improve the quality of recommendations.

We describe how to record and identify the particular interests of each user. Interests emerge from an analysis of the resources that the user has viewed in the past and are defined as clusters of keywords that are densely interconnected by relatedness. The identified interests are specific to the domain corpus and therefore content dependent. Based on the time period, we can model short-term and long-term interests. The novel approach is to create virtual communities based on these interests. We address the negative impact of aggregated similarity on user grouping by proposing a method consisting of two main steps: first, collect and detect users’ interests; second, create a relatedness graph between interests and detect virtual communities.

Our target is to find communities that are each defined by one particular interest. Such a community should include all users who share this interest. To evaluate our approach, we built an article recommender for a newspaper portal and perform an experiment to confirm our interest comparison strategy. As recommender systems are always tailored to a specific domain, we also adapted our approach slightly to better fit the newspaper portal domain, which is highly dynamic and changes frequently. We account for these fluctuating, time-dependent changes by weighting the influence of volatile communities on recommendations. To counter overly influential short-term interests, we also take users’ past community memberships into account when computing current community results.
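
The two steps can be illustrated with a small sketch. It substitutes a much simpler technique for the dense-cluster detection described above: keywords are grouped into connected components of a thresholded relatedness graph. The relatedness scores, the threshold and the function names are all assumptions made for the example.

```python
from collections import defaultdict


def keyword_clusters(relatedness, threshold):
    """Group keywords into connected components of the thresholded graph."""
    graph = defaultdict(set)
    for (a, b), weight in relatedness.items():
        if weight >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def communities(clusters, user_keywords):
    """Map each interest (cluster index) to the users whose keywords touch it."""
    result = defaultdict(set)
    for user, keywords in user_keywords.items():
        for index, cluster in enumerate(clusters):
            if cluster & set(keywords):
                result[index].add(user)
    return dict(result)


# Made-up relatedness scores between keywords of viewed articles.
relatedness = {
    ("goal", "match"): 0.9,
    ("match", "league"): 0.8,
    ("budget", "tax"): 0.7,
    ("goal", "tax"): 0.1,
}
clusters = keyword_clusters(relatedness, threshold=0.5)
members = communities(clusters, {"u1": ["goal", "tax"], "u2": ["league"], "u3": ["budget"]})
```

In this toy data user u1 ends up in both communities, which mirrors the idea that a user may belong to several communities, each representing a part of his interests.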

In Proc. of Spring 2011 PeWe Workshop, pp. 39-40

Interpretation Support of Terms while Browsing in Slovak Language

Róbert Horváth
bachelor study, supervised by Marián Šimko

Abstract. Today, the Web is an inseparable part of our everyday life. When accessing web pages, we need to fully understand their meaning in order to take full advantage of them. Many technical articles contain words whose meaning is unknown to us. Most users look up an explanation of a phrase in online dictionaries. To do so, they have to open a new window with a dictionary and enter the word manually. That is neither fast nor comfortable. Another possibility is to use a web browser extension that performs the whole process automatically. Such an extension shows the main explanation of a word in a tooltip. The main problem with these extensions is that they do not support the Slovak language.

Therefore, the goal of our work is to create a tool that provides explanations of Slovak words simply and quickly. We achieve this by creating an extension for a web browser (Google Chrome). The explanation process begins with a simple double click that selects the unknown word. The Slovak language poses some problems we have to face to get the right meaning. The extension must be able to find an explanation even for words that are not in their base form. To allow this, we use a lemmatizer to obtain the lemma of the word. Lemmatization runs as a separate web service used by our extension, which afterwards sends the lemma to a dictionary. Another problem, connected with simplicity, is showing the right meaning of a word that has more than one meaning. The extension takes the context of the word into consideration and compares it with the explanations found. Only the most probable result, based on a cosine similarity computation, is shown to the user. To fully understand a word, synonyms can sometimes be as useful as the explanation, so they are displayed as well. To evaluate the extension, we plan to conduct a small experiment with a selected group of users.
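
Choosing the most probable meaning by cosine similarity might look like the sketch below. It compares plain bag-of-words vectors of the surrounding context and of each dictionary explanation. The tokenization by whitespace, the English example and the function names are assumptions; the extension itself works with Slovak text and lemmas.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pick_meaning(context, explanations):
    """Return the explanation whose words are closest to the context."""
    context_vector = Counter(context.lower().split())
    return max(explanations, key=lambda e: cosine(context_vector, Counter(e.lower().split())))


# Illustrative English data for the ambiguous word "mouse".
context = "the mouse pointer moved across the computer screen"
explanations = [
    "a small rodent with a long tail",
    "a hand-held device that moves the pointer on a computer screen",
]
best = pick_meaning(context, explanations)
```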

In Proc. of Spring 2011 PeWe Workshop, pp. 7-8

Named Entity Disambiguation using Wikipedia

Martin Jačala
master study, supervised by Jozef Tvarožek

The constantly growing amount of human-written textual content available on the web is a source of interesting and up-to-date information about persons, organisations or places. One of the problems we face when analysing or querying such content is name ambiguity. Does the word jaguar mean the sports car, the jungle animal or something different? Which Michael Jordan does the text refer to?

Proper names comprise approximately 10% of the text of news articles, and many of them are ambiguous. In our work we propose an approach to answering these questions by disambiguating named entities using explicit semantics extracted from web-based corpora serving as background knowledge. We follow the distributional hypothesis of Miller and Charles, which states that similar entities appear in similar contexts, even across multiple documents.

Our approach combines a named entity recogniser with an additional disambiguation stage. The disambiguation is based on a semantic similarity measure between a fragment of the analysed document and articles extracted from Wikipedia. We use explicit semantics defined in Wikipedia, such as the link structure, page-to-page redirects and disambiguation pages, to extract relevant documents. The similarity measure is computed as the cosine similarity between document vectors transformed into a high-dimensional semantic space created with Explicit Semantic Analysis. This method is similar to Latent Semantic Analysis, but it treats each input document as a concept instead of trying to discover hidden, latent concepts in large unstructured and uncategorised text corpora.

Our experiments show promising results compared to a baseline method that uses cosine similarity alone, without the transformation into the semantic space. We plan further improvement through an additional measure based on the category structure of the encyclopedia data: disambiguated entities are mapped to categories, which are compared with the categories of the retrieved articles. Previous research indicates that such measures can improve precision. Additionally, we are reconsidering the implementation so that it provides results in real time, as we would like to present our method as a web-based demo.
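
A toy version of the similarity computation may make the idea more concrete. In the sketch each "concept" is one short text standing in for a Wikipedia article, a text is mapped to a vector of concept weights through TF-IDF, and candidates are ranked by cosine similarity in that concept space. The three concept texts, the whitespace tokenization and all function names are assumptions; a real system would build the index from the whole of Wikipedia.

```python
import math
from collections import Counter


def build_index(concepts):
    """Return {word: {concept: tf-idf weight}} for a {concept: text} corpus."""
    tf = {name: Counter(text.lower().split()) for name, text in concepts.items()}
    df = Counter(word for counts in tf.values() for word in counts)
    index = {}
    for name, counts in tf.items():
        for word, count in counts.items():
            index.setdefault(word, {})[name] = count * math.log(len(concepts) / df[word])
    return index


def esa_vector(text, index):
    """Represent a text as a weighted vector over concepts."""
    vector = Counter()
    for word, count in Counter(text.lower().split()).items():
        for concept, weight in index.get(word, {}).items():
            vector[concept] += count * weight
    return vector


def cosine(a, b):
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Made-up miniature "encyclopedia"; each entry plays the role of one concept.
concepts = {
    "Jaguar (animal)": "jaguar cat jungle predator animal",
    "Jaguar Cars": "jaguar car engine sports vehicle",
    "Amazon rainforest": "jungle rainforest river animal tree",
}
index = build_index(concepts)

fragment = esa_vector("the jaguar hunts in the jungle", index)
candidates = ["Jaguar (animal)", "Jaguar Cars"]
best = max(candidates, key=lambda name: cosine(fragment, esa_vector(concepts[name], index)))
```

Because the fragment shares the word "jungle" with a second concept, the animal sense gains weight in concept space even though both candidates contain the word "jaguar".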

In Proc. of Spring 2011 PeWe Workshop, pp. 53-54

Discovering Keyword Relations

Peter Kajan
master study, supervised by Michal Barla

Abstract. Analysis of web structure and usage reveals new semantic knowledge about users or content. One popular method is to build a knowledge base using “collective intelligence”. The main idea is that by exploring the web usage of a large number of users, it is possible to discover relations between the users, the visited pages or their attributes (e.g., keywords). In our work we are interested in exploring relations between keywords of the visited pages. These relations could be used to organize keywords into ontologies, which could be applied in the area of the personalized web (e.g., personalized navigation, content filtering).

Many research papers deal with building taxonomies or even ontologies by exploring relations between keywords. Common approaches use algorithms based on statistics, clustering or set theory to group keywords. Few works use WordNet or another lexical database to find synonyms or multiple meanings of a keyword. In our work we would like to analyze the possibilities that Linked Data brings for enriching the knowledge base. For example, the Open Calais web service is able to disambiguate a keyword from its context, which can be helpful when dealing with ambiguous keywords. Another idea is to analyze the dynamics of web usage, for instance the users’ click-streams.

Our idea is to combine the mentioned methods to build a knowledge base and to enrich it by finding new kinds of relations between keywords and their properties. The results will be evaluated in the domain of personalized web search.

In Proc. of Spring 2011 PeWe Workshop, pp. 55-56

Group Recommendation for Multiple Users

Michal Kompan
doctoral study, supervised by prof. Mária Bieliková

Abstract. Group recommendation is an interesting research area nowadays. Several activities are carried out in a social rather than an individual manner, and in such situations individual recommender systems cannot be applied. Watching TV or going to a cinema, restaurant or pub are only a few examples. These activities usually take place after some agreement within the group. We also distinguish situations in which we cannot choose, such as music played in a gym or in a vehicle. There are two basic group recommendation approaches: aggregation of individuals’ preferences and aggregation of individuals’ recommendations.

When recommending, usually more than one item is recommended. Sequence recommendation is an interesting task, especially in group recommendation. We have to consider not only the order of the sequence for a single user but also its influence on other group members. This is strongly connected to designing satisfaction functions, which should model the satisfaction level across the group.

As we try to model real-life group characteristics, it is important to incorporate users’ personality. It has been shown that a user’s mood and personality can have a significant influence on the feelings of other group members. In other words, when a respected extrovert is unsatisfied, other members will probably share her feelings, even if they were partially satisfied.

Group recommendation can also be used to solve problems of standard single-user recommendation. Multiple criteria are a common complication when a recommended item consists of several attributes. Merging strategies can be used to handle multiple criteria with only some modifications (e.g., not considering fairness). Most of today’s recommenders suffer from the cold start problem: a new user comes and wants to interact with the system, but there is not enough information about her. In this case, we can apply a group recommender (with a group consisting of the new user and all or several representative existing users of the system) to solve the problem. More research and evaluation are needed to verify these principles.
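
The two basic approaches can be contrasted in a short sketch. The concrete strategies used here (average and least misery for merging preferences, a Borda count for merging ranked recommendations) are common choices in the literature and are assumptions of this example, as are the ratings and names; the abstract does not say which strategies are used.

```python
from collections import defaultdict


def aggregate_preferences(ratings, strategy="average"):
    """Merge individual ratings into one group score per item, then rank."""
    items = set().union(*(r.keys() for r in ratings.values()))
    scores = {}
    for item in items:
        values = [r[item] for r in ratings.values() if item in r]
        if strategy == "least_misery":
            scores[item] = min(values)  # the least happy member decides
        else:
            scores[item] = sum(values) / len(values)
    return sorted(scores, key=lambda i: (-scores[i], i))


def aggregate_recommendations(ranked_lists):
    """Merge individual ranked lists with a Borda count."""
    points = defaultdict(int)
    for ranking in ranked_lists.values():
        for position, item in enumerate(ranking):
            points[item] += len(ranking) - position
    return sorted(points, key=lambda i: (-points[i], i))


# Made-up ratings of two group members on a 1-5 scale.
ratings = {
    "ann": {"comedy": 5, "horror": 1, "drama": 4},
    "bob": {"comedy": 3, "horror": 5, "drama": 4},
}
by_average = aggregate_preferences(ratings)
by_least_misery = aggregate_preferences(ratings, strategy="least_misery")

# Made-up individual recommendation lists, best item first.
by_borda = aggregate_recommendations({
    "ann": ["comedy", "drama", "horror"],
    "bob": ["drama", "horror", "comedy"],
})
```

With these numbers the average strategy puts comedy first, while least misery prefers drama because nobody dislikes it.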

In Proc. of Spring 2011 PeWe Workshop, pp. 27-28

Improving the Personalized Search on the Web

Tomáš Kramár
doctoral study, supervised by prof. Mária Bieliková

Abstract. In our daily contact with the Web, we are faced with a vast amount of documents, and their number increases every moment. The situation is in large part caused by the rise of the social Web, where users are encouraged not only to passively consume but also to actively create content. The amount of new content is hard to track, but according to State of the Blogosphere, an annual report by the social media tracker Technorati, there has been a steady rise in its count recently. The amount of new content created every day is tremendous, and blogs contribute only a small part. It is obvious that accessing an information base this large requires powerful methods for content analysis and retrieval.

To facilitate easy access to this knowledge base, search engines were created. From a highly simplified point of view, search engines enable finding a document by entering a set of keywords that describe the user’s intent: if these keywords are found in the text of the document, it is retrieved and presented to the user. However, full-text-only search is often incapable of handling user queries satisfactorily and has several known disadvantages.

To mitigate this problem, several approaches to search personalization have been researched, each with the ultimate aim to help the user find the relevant content, without trying to change how humans think, or work. There is relevance feedback, query expansion, search intent detection, alternative ranking schemes and many others.

In the domain of user modeling, the term context traditionally refers to the attributes of the environment (i.e. the user’s location, time, her mood, etc.). In the domain of personalized search, the term context is commonly used to describe the user’s needs, goals and intent. But there is a difference between personalized search and search in context. Personalized search deals with adapting the search to a user’s “personality”, while contextual search is concerned with adapting the results to a particular context. The difference is in the scale: personalized search deals with the user’s long-term preferences, while contextual search focuses more on the intermediate needs. While many personalization approaches exist, contextual search is still a largely unexplored area. In this work we describe various existing approaches to search personalization and discuss the possibilities of making search more contextual.

In Proc. of Spring 2011 PeWe Workshop, pp. 9-10

Automatic Photo Annotation Based on Visual Content Analysis

Eduard Kuric
master study, supervised by prof. Mária Bieliková

Abstract. Automatic photo annotation is the process by which a computer system automatically assigns metadata to a target photo. With the increasing popularity of digital and mobile phone cameras, there is a growing need for quick and exact searching, for example by general category or with a focus on a specific object. Manual creation of annotations by a user is very time-consuming, and the results are often subjective. Automatic photo annotation is therefore a most challenging task.

Generally, approaches to automatic annotation fall into two categories: learning-based methods, which primarily focus on determining complex categories or groups of specific objects, and web-based methods, which use crawled web image data to obtain relevant annotations.

In learning-based methods, a statistical model is built to train a classifier. Automatic face recognition in a target photo is a good example of automatic annotation of specific objects. One possibility is for the retrieval process to use a robust dictionary of visual terms to identify people. Similarity can be evaluated by comparing local descriptors, which are computed over local features such as edges or small patches around points of interest. Local descriptors are much more precise and discriminating than global descriptors. When searching for a specific object this property is welcome, but when searching for complex categories it can be an obstacle. Another obstacle is the need to store a huge number of extracted features.

The Web provides an unlimited vocabulary for web-based methods. However, the main problem of web-based approaches is the initial query: there is a lack of information about the target photo. Without further information such as a key caption, searching the web for photos similar to the target photo is like finding a needle in a haystack. Other important drawbacks of these approaches are their performance and the often noisy annotations they produce.

In our method, we combine local and global features to retrieve the best results. With this combination, we are able to ensure the robustness and generalization needed by complex queries. We place great emphasis on working in real time. To cope with the huge number of extracted features, we implemented disk-based locality-sensitive hashing to index descriptors. We focus on photo analysis in terms of the probability that the retrieved photos contain the right keywords for the target photo. The method is designed for versatile use, e.g. identifying key objects in specific photo albums, complex automatic photo annotation in large web photo galleries, and searching for similar photos according to objects of interest (query by image content).
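
To show the principle behind locality-sensitive hashing of descriptors, here is a small in-memory sketch based on random hyperplanes: vectors pointing in similar directions tend to fall into the same bucket, so only that bucket has to be searched. It is not the disk-based index described above; the class name, the number of bits and the three-dimensional "descriptors" are assumptions made for illustration.

```python
import random
from collections import defaultdict


class HyperplaneLSH:
    """Toy in-memory LSH index using random hyperplanes (sign of dot products)."""

    def __init__(self, dim, bits=8, seed=0):
        rng = random.Random(seed)
        # One random hyperplane per hash bit.
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
        self.buckets = defaultdict(list)

    def _key(self, vector):
        # Each bit records on which side of a hyperplane the vector lies.
        return tuple(
            sum(p * v for p, v in zip(plane, vector)) >= 0 for plane in self.planes
        )

    def add(self, label, vector):
        self.buckets[self._key(vector)].append(label)

    def candidates(self, vector):
        # Only descriptors from the same bucket are returned for exact comparison.
        return list(self.buckets.get(self._key(vector), []))


lsh = HyperplaneLSH(dim=3, bits=8, seed=42)
lsh.add("photo1", [1.0, 2.0, 3.0])

same_direction = lsh.candidates([2.0, 4.0, 6.0])     # scaled copy of the descriptor
opposite = lsh.candidates([-1.0, -2.0, -3.0])        # points the other way
```

A practical index would use several hash tables to reduce missed neighbours and would verify candidates with an exact distance computation.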

In Proc. of Spring 2011 PeWe Workshop, pp. 57-58

Recommendation and Collaboration through Implicit Feedback

Martin Labaj
master study, supervised by Mária Bieliková

Abstract. In the field of e-learning, identifying difficult and/or interesting parts of a learning text can be a useful feature for tasks such as rewriting the text, showing where to focus or offering adaptive help to the student. In our work, we track implicit feedback (interest indicators), including scrolling (read wear). Using these data collected from many users, we can determine which fragment of the document is the most time-consuming and therefore interesting and/or difficult. As in any method dealing with time-based user tracking, there is a possibility that the user is pursuing different activities during the assessed time periods. We try to avoid this by using a low-cost webcam and employing physical user tracking, namely gaze tracking. This way we can leave out time periods when the user is not directly using the computer, or when he is at the computer but working with a different application. Gaze detection also increases the precision of fragment identification as an additional implicit interest indicator. We then combine the interest indicators into an attention index.
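
One simple way to combine several interest indicators into a single attention index is a normalized weighted sum, sketched below. The indicator names, the weights and the formula itself are assumptions for illustration; the abstract does not specify how the index is actually computed.

```python
def attention_index(fragments, weights):
    """Combine per-fragment indicators into one score between 0 and 1."""
    # Normalize each indicator by its maximum over all fragments (1.0 if all are zero).
    maxima = {name: max(f[name] for f in fragments.values()) or 1.0 for name in weights}
    total_weight = sum(weights.values())
    return {
        fragment: sum(weights[n] * values[n] / maxima[n] for n in weights) / total_weight
        for fragment, values in fragments.items()
    }


# Hypothetical measurements: seconds a fragment was visible on screen
# and seconds the tracked gaze actually rested on it.
fragments = {
    "intro": {"visible_seconds": 20, "gaze_seconds": 5},
    "proof": {"visible_seconds": 60, "gaze_seconds": 50},
}
# Assumed weights: gaze counts twice as much as mere visibility.
weights = {"visible_seconds": 1.0, "gaze_seconds": 2.0}

index = attention_index(fragments, weights)
most_attended = max(index, key=index.get)
```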

Collected interest data can be used in various scenarios:

  • Visualization and summarization of interesting fragments, where only the fragments with the highest attention index are highlighted or selected. Users can quickly scan through the document either on first read or on revisit.
  • Adaptive guide to a (learning) application, where the user’s work with the web system is evaluated and adaptive hints are provided. If the user notices recommended items with his gaze but does not use them, different advice is provided than when he did not notice the recommendations at all. Explicit feedback questions can be asked in the same way. Hints or questions adapted to the current situation should make the user respond more easily.
  • Augmented instant message communication, where users are shown the positions of their peers in the same document. As students see who is possibly stuck on the same fragments, cooperation should be encouraged.

We implement gaze tracking (using the open source gaze tracker OpenGazer) as a standalone desktop application because of webcam access and the required processing power. We collect interest indicators via a custom extension of the Firefox web browser. This extension is connected with the gaze tracking application via local client-server communication using sockets. To save resources of the web application server, we filter and process the collected feedback on the client side in the extension, but unprocessed feedback is also stored on an independent server for offline analysis and review.

We have already partially evaluated the gaze tracking alone and incremental parts of the implementation. Currently we are working towards the evaluation of the complete solution within ALEF (Adaptive LEarning Framework) and possibly on the open Web.

In Proc. of Spring 2011 PeWe Workshop, pp. 29-30

Photo Album Visualization as a Collection of Memories and Experiences

Michal Lohnický
master study, supervised by Mária Bieliková

Abstract. Digital photography has existed since 1991. About ten years later the first mobile phone with an integrated camera was manufactured and nowadays more than 90% of mobile phones have a built-in camera. In 2009, more than 3 billion photos were uploaded monthly to the biggest social network Facebook. These numbers only prove the fact that the vast majority of people in developed countries tends to carry the camera at any major event.

Moreover, photography is mostly considered to be a medium to save and share experiences and emotions from photographed events. This means that photo albums are collections of memories and emotions linked together via a user’s story. Photo album visualization is supposed to emphasize these elements and create unforgettable user experience.

Our work is aimed at augmenting user experience while browsing photos and other multimedia. We have proposed a special emotional chart as a fundamental navigation element which creates an overview of photographed events. The chart is meant to be an emotional histogram of the events, which employs minimal user’s personalization to save the atmosphere of the experiences. This is allowed by combination of various methods and algorithms like fast furrier transformation, force based layout, logarithmic normalization and photo clustering.

The main goal of our work is to provide innovative navigation and browsing in photo albums that supports storytelling and collaborative creation of digitally saved experiences. By combining this with an appropriate amount of photo analysis and informational augmentation, we can create various views of a complex collection of photos in photo albums. Users can use this style of browsing to share photos in much higher quality, to find photos missing from their collections, to view places they intend to visit in various time periods, etc. It is also usable in the commercial sphere, for example in travel agencies or botanical monitoring. When we add the direction in which a photo was taken to its location, we can create an ideal presentation tool for real-estate companies.

In Proc. of Spring 2011 PeWe Workshop, pp. 11-12

Acquiring Metadata from the Web

Milan Lučanský
master study, supervised by Marián Šimko

Abstract. The World Wide Web provides access to an enormous amount of data. Even though the data are freely available to everyone, they are hard to process because of both the form in which they are stored and their sheer amount. There is potential to provide advanced data processing such as categorization or recommendation, but we first have to build the semantic layer that such advanced tools require. We need an automated method to extract relevant meta-information from web content. There are some approaches to extracting information from web pages, but most of them are not suitable for the “dirty” World Wide Web environment.

We focus on extracting keywords describing the content of a web page. Acquiring the keywords is an important step before getting the concepts, which are essential for creating ontologies, and an ontology is an important part of tools for advanced data processing. We build our work on a bachelor thesis (Web site content metadata acquisition using tags), which focuses on acquiring keywords from websites. We proposed a new method which improves contemporary algorithms for keyword extraction. In this method, automatic term recognition (ATR) algorithms are combined with HTML tags that have semantic potential (i.e. title, heading and anchor) to get more descriptive keywords from a web page. The main idea of our method is to treat as more relevant those keywords which are picked by ATR algorithms and are also present in one of the HTML tags with semantic potential. We increase the weight of such keywords using TagRel, a constant number assigned to each HTML tag.

The TagRel value for each tag was estimated at the beginning and did not change during the experiment. This is where we see potential for our research. We suppose that finding a way to parameterize the value of TagRel according to some variables could yield even more precise and descriptive keywords. Possible variables include the number of external links (anchors), the number of words in HTML tags, or the diversity of words in HTML tags; alternatively, every ATR algorithm could have its own measure for setting TagRel. After creating a new method for dynamically assigned TagRel, we will perform an experiment on the “wild” web.
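The idea of boosting ATR keywords by a per-tag constant can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how TagRel is combined with the ATR score, so the multiplicative boost, the rule of taking the strongest matching tag, and all numeric values are assumptions.

```python
def weighted_keywords(atr_scores, tag_terms, tag_rel):
    """Rank terms by ATR score, boosted when a term also occurs in a tag.

    atr_scores: dict term -> score from some ATR algorithm (assumed given)
    tag_terms:  dict tag name -> set of terms found inside that tag
    tag_rel:    dict tag name -> TagRel constant (hypothetical values)
    """
    boosted = {}
    for term, score in atr_scores.items():
        weight = 1.0  # terms outside the listed tags keep their ATR score
        for tag, terms in tag_terms.items():
            if term in terms:
                # Assumption: use the strongest tag the term appears in.
                weight = max(weight, tag_rel.get(tag, 1.0))
        boosted[term] = score * weight
    return sorted(boosted.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example data.
example_scores = {"ontology": 0.4, "page": 0.5, "keyword": 0.3}
example_tags = {"title": {"ontology"}, "a": {"keyword"}}
example_tag_rel = {"title": 3.0, "a": 1.2}
ranking = weighted_keywords(example_scores, example_tags, example_tag_rel)
```

A dynamically assigned TagRel, as proposed above, would replace the constant `tag_rel` lookup with a function of page features such as the number of anchors.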

In Proc. of Spring 2011 PeWe Workshop, pp. 13-14

Interactive Browser of Heterogeneous Web Content

Peter Macko
bachelor study, supervised by Michal Tvarožek

Abstract. Multimedia content is very important for people. It lets us have fun and relax, but it may also serve to improve education and skills. Just a few years ago it was extremely difficult to transmit the large amounts of data required by multimedia content over the Internet. The arrival of new technologies with higher data transmission speeds has changed this situation significantly in recent years, and multimedia content is therefore ever more visible on the Web. The main advantage is that, thanks to constantly increasing transmission speeds, the quality of the transmitted content can keep increasing. Thanks to the features of today’s Internet, multimedia content is more accessible than it was in the past, and this is the main reason why I have focused my work on this area.

The main topic of my bachelor thesis is displaying and viewing multimedia content and improving existing photo browsers and viewers. Today’s browsers offer only browsing and searching of images based on specified criteria. The first theme of my project concerns the integration of a video player into an existing multimedia exploration solution, since videos will become an integral part of it. In addition, these videos will include metadata useful for easy searching and navigation in video content. Videos displayed by the system will be transmitted over the Internet in the highest possible quality.

The second topic is the personalized presentation of images and videos. This means that if a user is interested in some person or object in a picture or video, they may ask the system to show other pictures or videos with the chosen object. This will make the presentation much more interactive. The direction of the presentation will be determined not by the system or by the gallery owner but by the user, who alone decides what should be seen in the gallery.

In Proc. of Spring 2011 PeWe Workshop, pp. 15-16

Leveraging Microblogs for Resource Ranking

Tomáš Majer
master study, supervised by Marián Šimko

Abstract. In order to compute page rankings, search algorithms mainly leverage information coming from page content and its interconnections. Microblogging, a phenomenon of the “Web age”, often provides additional, potentially relevant information: user feedback on a page (or any web resource in general) and/or its contents. This is particularly valuable when considering microblogs as a source of news from around the world, with an enormous number of users receiving that news at very short notice. Despite their spread, microblogs are still relatively poorly understood and are therefore a suitable subject for further analysis and research.

The microblogging service Twitter has its own characteristics, such as followers who read a user’s posts. Followers can share posts and publish them on their own profiles. This makes it possible to rank a user based on his followers with respect to the number of contributions, and to create an algorithm for evaluating resources on the Web. References between users, posts and pages form a graph. We analyze the graph and apply various graph algorithms leveraging the notion of node centrality to deduce a microblog-based resource ranking.
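To make the notion of centrality-based ranking concrete, here is a minimal sketch using PageRank, one well-known centrality measure. The abstract does not name the specific algorithms used, so PageRank is a stand-in, and the edge directions (follower to followed user, user to post, post to page) and the tiny example graph are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical graph: followers point to the user they follow,
# users point to their posts, posts point to the pages they mention.
example_edges = [
    ("bob", "alice"), ("carol", "alice"),
    ("alice", "post1"), ("post1", "pageA"),
    ("bob", "post2"), ("post2", "pageB"),
]
example_rank = pagerank(example_edges)
```

In this toy graph the page mentioned by the user with more followers ends up with the higher score, which is the intuition behind ranking resources through the users who post them.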

We plan to evaluate our approach using web search. We compare our microblog-based ranking with traditional search rankings in order to assess the level of search results improvement. Besides web search, we believe our method can be used also in online stores where rankings of products can estimate user interests and opinions.

In Proc. of Spring 2011 PeWe Workshop, pp. 31-32

Relations Discovery in Educational Texts based on User Created Annotations

Vladimír Mihál
master study, supervised by prof. Mária Bieliková

Abstract. To increase the effectiveness of learning, web-based e-learning systems employ tools that personalize content and navigation and allow students to collaborate and contribute to educational content. To utilize such tools it is necessary to maintain a domain model. However, manual creation of a domain model requires expert knowledge and a great amount of time and effort. It is also necessary to link the newly created domain model to the learning content. Therefore it is essential to support the process of creating the domain model and enriching it with additional relations.

E-learning systems exploit user-created annotations as a tool to support the participation of students within a learning course and to enhance collaboration between students. Annotations can also be used to guide students and encourage them to perform interactive and collaborative learning tasks, thus improving their learning experience. One notable type of annotation is a link to an external information source. Students typically search for additional information sources on the Web, collect links to the most interesting and helpful content and share them among their friends. Inserting such links into related learning texts as annotations helps organize links to resources and makes sharing them easier. Teachers may also insert links to provide students with additional learning resources. Users can also rate the usefulness of inserted external sources to help us evaluate their quality. As a result, the learning content becomes enriched with external sources.

Besides enriching the learning content, external sources characterize the portions of the document where they are inserted. We use this information to derive relations between learning objects and concepts from the existing domain model. We analyze the content of external sources to link them with appropriate concepts and construct a graph from learning objects, external sources and concepts according to known relations. We apply a spreading activation algorithm on the graph to compute the similarity of entities (learning objects or concepts) to each learning object. According to the computed similarity we create weighted relations and merge them into the existing domain model, resulting in a new hybrid domain model consisting of expert knowledge and the collective wisdom of students.
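The spreading activation step can be illustrated with a small sketch. It shows one common variant of the algorithm under stated assumptions; the decay factor, the firing threshold, the step limit and the example graph are not taken from the abstract.

```python
from collections import defaultdict


def undirected(edges):
    """Build an adjacency map from (node_a, node_b, weight) triples."""
    graph = defaultdict(dict)
    for node_a, node_b, weight in edges:
        graph[node_a][node_b] = weight
        graph[node_b][node_a] = weight
    return graph


def spreading_activation(graph, source, decay=0.5, threshold=0.01, max_steps=10):
    """Spread energy from `source`; return the activation of every other node."""
    activation = defaultdict(float)
    frontier = {source: 1.0}
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbors = graph.get(node, {})
            total = sum(neighbors.values())
            for neighbor, weight in neighbors.items():
                # Energy is split by edge weight and damped at every hop.
                passed = energy * decay * weight / total
                if passed >= threshold:
                    next_frontier[neighbor] += passed
        if not next_frontier:
            break
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    activation.pop(source, None)  # the source itself is not a similarity result
    return dict(activation)


# Hypothetical graph of learning objects (lo*), external sources (ext*) and concepts.
example_graph = undirected([
    ("lo1", "ext1", 1.0), ("ext1", "conceptA", 1.0), ("lo2", "ext1", 1.0),
    ("lo3", "ext2", 1.0), ("ext2", "conceptB", 1.0),
])
similarity_to_lo1 = spreading_activation(example_graph, "lo1")
```

The resulting activation values can be read as the weights of new relations from `lo1` to the reached concepts and learning objects; nodes that are never reached get no relation.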

In Proc. of Spring 2011 PeWe Workshop, pp. 59-60

Personalized Text Summarization

Róbert Móro
master study, supervised by Mária Bieliková

Abstract. One of the most serious problems of the present-day web is information overload. As we can find almost everything on the web, it has become very difficult to find what we actually want or need, that is, to find relevant information. Moreover, the term “relevant information” is subjective, because as users of the web we differ in our interests, goals and knowledge. Automatic text summarization aims to address the information overload problem. The idea is to extract the most important information from a document, which helps users decide whether the document is relevant for them and whether they should read the whole text.

The problem with classical (generic) automatic text summarization methods is that they do not take into account the different goals, interests or knowledge of users. Our idea is to personalize text summarization. Much information about users’ interests can be inferred from their browsing behavior. If a user reads a document about a particular topic, this serves as implicit feedback from which we can assume that he or she is interested in the topic. Also, with the arrival of Web 2.0, users are no longer passive consumers of Web content; they can create content and add metadata such as annotations and tags. These can be used as another important source of personalization.

Annotation of documents is a technique widely used by people especially when reading printed documents. They highlight or underline the important parts of text, add explanations or different formulations or even references to other documents. This way, annotations can indicate reader’s (or user’s in the context of the web) interest in that particular part of the document. We can take into account not only user’s annotations but also those of similar users, including the users’ collaboration into the process of text summarization. Tags, as a special type of annotations, can also be considered. They are usually generalized descriptions of the topics contained in a document and directly reflect the users’ vocabulary and their understanding of the document.

We plan to evaluate our proposed method in the domain of e-learning in ALEF (Adaptive Learning Framework); however, the method itself will be designed and implemented as domain-independent.

In Proc. of Spring 2011 PeWe Workshop, pp. 61-62

Acquisition of Semantic Metadata via Interactive Game

Balázs Nagy
bachelor study, supervised by Michal Tvarožek

Abstract. Metadata play an important role in today’s world, which is based on the World Wide Web. Our goal is to create a game that will facilitate the acquisition of this kind of data by using the power of human computation.

The benefit of our game is the automatic acquisition of metadata for photos obtained from an ontological database. The basis of our project is the familiar memory game PEXESO, which we remodeled to suit our purpose. We added an aid to the game: players can tag turned-over images. The next step was to persuade the players to use this option, which we did by increasing the size of the board up to 10×10. Now they do not need to memorize the exact location of a picture; it is enough to know its rough position, and they can find the pair using the given tags.

To get meaningful results as soon as possible, we narrowed the number of used photos from 8000 to 400, and in each new game pictures are selected randomly from this narrowed set. After getting basic annotations, we can select photos with similar content to make the game more difficult and produce more precise tags for the given photos.

With the prototype of the game PexAce we gained 49 registered users, plus some users who played our game as guests. The game currently works with the first 400 photos of our database, which were annotated with 1059 annotations. The annotations contained 1603 words altogether, which means that we obtained on average 2.65 annotations and 4 words per photo.
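The reported averages follow directly from the counts above; as a quick check, rounded to two decimals:

```latex
\frac{1059\ \text{annotations}}{400\ \text{photos}} \approx 2.65,
\qquad
\frac{1603\ \text{words}}{400\ \text{photos}} \approx 4.01
```

The second value is the one the abstract rounds to 4 words per photo.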

Today’s results and research suggest that we should not underestimate the potential of Games With A Purpose (GWAPs). Using them we can utilize much of the time otherwise spent playing ordinary games. There are many areas of science which can be supported by GWAPs.

In Proc. of Spring 2011 PeWe Workshop, pp. 63-64

Semantic Web Navigation Based on Adaptive Views

Karol Rástočný
master study, supervised by Michal Tvarožek

Abstract. The amount of information in web repositories is growing exponentially, which increases the number of results identified for users’ queries and can decrease the relevance of these results. Users can increase the success rate of their query when they can describe the required result exactly with keywords. This problem is solved quite successfully by keyword-based query expansion and by approaches based on exploratory search. However, it often happens that a user has already found the required result and now wants to find similar and/or related results. One such service is provided by Google (Google Similar), but its results are displayed only in a list view without any information on why they were evaluated as similar. In our approach we address this problem via view-based search within the Semantic Web using navigation in a two-dimensional graph.

Our graph navigation approach is based on the bachelor work of Adrian Rakovský, which proposed the concept of web browsing based on graph visualization. The main benefit of this approach is that users can see dependencies between the resource (original result) and new results visualized in the graph. On the other hand, we identified a considerable shortcoming: the visualized graph can quickly grow to an enormous size, so it becomes unclear and unusable for ordinary users. To avoid this we extend the approach with result clustering, facet marking, adaptation to the user’s interests, next-action recommendation and abstract zoom. The first three extensions aim to reduce the number of nodes displayed in the graph, and the last two help users orient themselves in the graph and understand it.

Another problem is that finding information on the Web consists of three activities: lookup, learn and investigate. These activities cannot easily be separated into independent steps, although they often are. When a user starts looking for information, he does not know the information space he is searching very well, so he should be able to redefine his query while he learns about the identified results. Once the user has found the wanted result and learned basic information about it, he typically wants to know more about related resources (e.g. a user who found a painting may want to know more about its author), so he needs support for navigating among them. We address this problem via navigation based on adaptive views, which help the user move from searching to browsing among related and/or similar results.

In Proc. of Spring 2011 PeWe Workshop, pp. 17-18

Devising Secure Communication for Decentralized Environment

Márius Šajgalík
master study, supervised by Michal Barla

Abstract. Decentralized user modelling requires a wide range of issues to be addressed. Our research should not focus on each of them in detail, since a significant amount of work has already been done. Instead, we must analyse and choose the most appropriate existing methods and helper tools. That will allow us to focus on the main goal of our work: to build the decentralized client proxy.

Privacy is one of the inherent issues of decentralized distribution. In a decentralized environment the information resides on the client side, and individual users might decide to restrict access to their information by setting permissions that specify which users are authorized to access and use their personal information. Users’ trust in secure information usage can lead to more confident, relaxed and extensive interaction, hence to more and better data about the user, and thus to better personalization. This issue has already been addressed in several works, and the P3P protocol has been developed and officially recommended by the World Wide Web Consortium.

Another work our research could build on is the LoudVoice infrastructure. It is an efficient multi-agent communication platform based on the concept of channelled multicast. Messages are sent on a channel and received by all agents that “tune” into it. Channelled multicast reduces the amount of communication needed when more than two agents are involved in a task. Moreover, LoudVoice offers several other features, including the ability to distinguish streams of messages by their theme and to address agents by their characteristics. Multi-agent systems can benefit from the possibility of broadcasting messages to a wide audience. The audience may include overhearing agents which, unknown to senders, observe conversations and, among other things, pro-actively send suggestions.

LoudVoice has been designed to support the notion of implicit organizations. An implicit organization is a group of agents playing the same role on a given channel and willing to coordinate their actions for the sake of delivering a service. The term “implicit” highlights the fact that there is no need for a group formation phase, since joining an organization is a matter of tuning into a channel. By definition, implicit organizations are formed by agents able to play the same role. LoudVoice allows senders to address messages either to specific agents or to all agents that offer a certain service on a channel, for example providers of a particular type of information.
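The concept of channelled multicast with role-based addressing and overhearing can be pictured with a small in-memory sketch. This is not the LoudVoice API; the class names, method names and message fields below are all hypothetical and only mirror the behaviour described in the text.

```python
from collections import defaultdict


class Agent:
    """An agent that plays one role and may be tuned into several channels."""

    def __init__(self, name, role):
        self.name = name
        self.role = role
        self.addressed = []  # messages meant for this agent or its role
        self.overheard = []  # other messages seen on a tuned channel

    def receive(self, message):
        broadcast = message["to_agent"] is None and message["to_role"] is None
        mine = (broadcast
                or message["to_agent"] == self.name
                or message["to_role"] == self.role)
        if mine:
            self.addressed.append(message)
        else:
            self.overheard.append(message)


class ChannelBus:
    """Delivers every message to all agents tuned into its channel."""

    def __init__(self):
        self._tuned = defaultdict(list)

    def tune(self, channel, agent):
        self._tuned[channel].append(agent)

    def send(self, channel, sender, body, to_agent=None, to_role=None):
        message = {"channel": channel, "sender": sender, "body": body,
                   "to_agent": to_agent, "to_role": to_role}
        for agent in self._tuned[channel]:
            if agent.name != sender:
                agent.receive(message)
        return message
```

Joining an implicit organization here amounts to calling `tune` with an agent whose `role` matches: a message sent with `to_role="weather"` reaches every tuned agent playing that role, while all other tuned agents merely overhear it.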

In Proc. of Spring 2011 PeWe Workshop, pp. 43-44

Automatic Web Content Annotation

Jakub Ševcech
bachelor study, supervised by Mária Bieliková

Abstract. Today we are facing information overload. It is thus important to provide information in a way that lets it be used efficiently and be as useful as possible. Annotations attached to documents are frequently used for organizing and providing additional information. Annotations can be created in two ways: manually, by a reader of the document, or automatically. There are many tools supporting manual creation of annotations, but with the current amount of available documents it is impossible to annotate them in both sufficient quality and quantity with the help of readers alone. It is therefore necessary to devise methods that create annotations automatically and attach them to documents.

We proposed a method for automatic creation of annotations for important words in Slovak text. The created annotations are intended to define the word or phrase to which they are attached, or to provide additional information about it. As a first step in creating annotations, it is necessary to find the words to which annotations should be attached. For this purpose, we use a service for extracting keywords and named entities. Since machine analysis of text achieves satisfactory results only for English text, we translate the analyzed text into English. Subsequently it is necessary to connect the extracted words back to their equivalents in the original text. For this step, we propose a method for mapping equivalent words between the text and its translation. The method is based on a bilingual dictionary. To account for different word forms, we compare words using the Levenshtein distance.
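The mapping step can be sketched as follows: a standard dynamic-programming Levenshtein distance, plus a helper that looks for the original word closest to a dictionary translation of an extracted keyword. The helper name, the relative-distance threshold and the toy dictionary are assumptions made for illustration, not details from the abstract.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def map_keyword(english_keyword, original_words, dictionary, max_relative=0.4):
    """Return the original-text word closest to a translation of the keyword.

    dictionary maps an English word to a list of base forms in the original
    language; inflected forms in the text are matched by edit distance.
    """
    best_word, best_score = None, None
    for base_form in dictionary.get(english_keyword, []):
        for word in original_words:
            distance = levenshtein(base_form.lower(), word.lower())
            relative = distance / max(len(base_form), len(word))
            if relative <= max_relative and (best_score is None or relative < best_score):
                best_word, best_score = word, relative
    return best_word


# Hypothetical example: the inflected form "knihy" maps to the base form "kniha".
toy_dictionary = {"book": ["kniha"]}
match = map_keyword("book", ["Čítam", "knihy", "doma"], toy_dictionary)
```

Normalizing the distance by word length keeps short words from matching too easily; the cut-off of 0.4 is an arbitrary choice for this sketch.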

The information needed to fill the annotations is gathered through publicly available information retrieval services. The created annotations take the form of keyword definitions and links to related resources.

We proposed a method for personalizing annotations based on the user’s interaction with annotations and their content. Personalization takes the form of hiding annotations and rearranging the links to related resources that form the content of an annotation.

The proposed method was implemented in two parts: a web service creating annotations attached to keywords in text, and a module for the teaching system ALEF, where the annotations are presented and personalized.

In Proc. of Spring 2011 PeWe Workshop, pp. 65-66

Harnessing Manpower for Semantics Acquisition

Jakub Šimko
doctoral study, supervised by prof. Mária Bieliková

Abstract. Nowadays, the amount of information on the web grows extremely fast. In order to be able to search the web and utilize its content, we require scalable methods for acquiring information about individual resources. This information, in contrast to the heterogeneity of web resources, must be homogeneous to be easily processed by machines. Today, the role of such a homogeneous meta-layer above the web is played by the keyword search indexes of web search engines.

The semantic web principles were created to provide a worldwide framework for creating richer web resource annotations than keyword indexes. The semantic web can be seen as a meta-layer of the common web: a collection of web resource annotations unified under universal and widely accepted domain models. With such a structure the web invisibility problem is easy to solve: the structure can be used to create various forms of web abstractions to browse. Answering sophisticated queries also becomes trivial: besides the structured information and knowledge, the semantic web standards provide logical reasoning frameworks suitable for question answering. Although much work has been done in the field of automatic semantics acquisition, human work on building the semantics is still needed. In response, we study approaches for creating semantics on the web with an emphasis on games with a purpose, which are computer games that transform a human intelligence task into an entertaining game.

In our previous research, we introduced the Little Google Game (LGG), a game with a purpose that acquires a network of general term relationships. We have validated the semantic soundness of the relationships within the network and shown that the game can discover relationships that remain hidden to automated corpora mining methods. Now we look at the LGG network as a potential source of ontological facts. We conducted experiments to find out how many of the relationships are present in the major knowledge base of Wikipedia, using the Wikipedia Miner tool, and what kinds of relationships are most common in the network, by evaluating them manually and by comparing them to the facts in the ConceptNet knowledge base. We now propose two methods for naming the LGG network relationships: first, a modification of the Little Google Game itself that makes players disclose predicate-like terms related to existing bigrams, and second, automated sentence mining with web search engines to gather relevant sentences.

In Proc. of Spring 2011 PeWe Workshop, pp. 67-68

Automated Metadata Extraction for Adaptive Social Learning

Marián Šimko
doctoral study, supervised by Mária Bieliková

Abstract. In order to make the learning process more effective, educational systems tailor learning material to user goals, needs and characteristics. Adequate adaptation requires a domain description enabling adaptation engines to make at least basic reasoning. A domain model of an adaptive course consists of interlinked concepts – domain knowledge elements related to learning content. The concepts are mutually interconnected forming a structure resembling a lightweight ontology, also referred to as course metadata. Different types of relationships between concepts represent different semantics: e.g. concept relatedness, hyponymy or prerequisite.

The bottleneck of adaptive educational systems is the complexity of domain model creation and update. Identifying concepts or defining hundreds or even thousands of relationships between them is difficult and almost impossible for humans. The complexity of domain model update is visible especially in the case of student-generated content when considering Adaptive Web-based Learning 2.0. To the best of our knowledge, there are only few works related to automatic metadata acquisition in adaptive educational web-based systems (or even adaptive web-based systems at all). State-of-the-art approaches rely mostly on domain experts or teachers, who supply an adaptive system with necessary semantic descriptions.

In our work we aim for an unsupervised approach requiring teacher assistance for fine-tuning the automatically generated domain model only. We consider heterogeneous sources of information to process and extract relevant domain descriptions. We particularly focus on learning objects (created by a teacher), social annotations (created by students) and links between them. The method we propose consists of relevant domain term (RDT) extraction and relationship discovery steps. In our approach we employ methods and techniques of text mining (statistical, linguistic processing) and graph analysis. We proposed several variants of relationship discovery. Each provides a unique view of the actual domain model state. We integrated the method into the ALEF (Adaptive Learning Framework) and evaluated it in several real-world experiments.

In Proc. of Spring 2011 PeWe Workshop, pp. 69-70

Encouragement of Collaborative Learning Based on Dynamic Groups

Ivan Srba
master study, supervised by prof. Mária Bieliková

Abstract. Web 2.0 principles have become very successful and brought a lot of energy into the development of web applications. One of the new trends is so-called social software. It uses the web as a broker which allows users to collaborate, communicate or share content and opinions. Typical examples of social software are wikis, blogs or social portals. As a result of the rising popularity of these applications, many users with different interests and social contexts are connected via common applications. If we want these users to collaborate effectively, we need to know how to identify user groups successfully and help users find appropriate collaborators. This problem is especially important in the domain of Computer-Supported Collaborative Learning (CSCL). There are several methods which solve this problem, but they usually use only one source of information about users and do not consider the current context.

Our main goal is to propose a method for creating different types of study groups and to observe their dynamic aspects. The method will be able to take many user characteristics as inputs, e.g. interests, friendship with other students, knowledge of learning objects etc. The proposed method will also consider information about the student’s current position in the learning system. The output of the method will be various types of groups, whose members will be, for example, friends, experts or novices in the problem area. In order to create these groups we will employ several methods (e.g. the latent jigsaw method or methods for creating homogeneous, heterogeneous and mixed groups). Students in the created groups will be able to communicate and cooperate using all available collaborative tools. We will observe dynamic aspects of the created groups, especially how students use these tools to achieve their goals. The result of our observation will be behavior patterns which can be used as an additional input for the proposed method.
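The difference between homogeneous and heterogeneous groups can be shown with a simple sketch based on a single characteristic. This is only an illustration of the two grouping styles, not the proposed method: using one numeric knowledge score per student and these particular splitting rules are assumptions.

```python
def homogeneous_groups(scores, size):
    """Group students with similar scores: sort, then cut into chunks."""
    ranked = sorted(scores, key=scores.get)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def heterogeneous_groups(scores, size):
    """Mix score levels: sort, then deal students round-robin into groups."""
    ranked = sorted(scores, key=scores.get)
    group_count = -(-len(ranked) // size)  # ceiling division
    groups = [[] for _ in range(group_count)]
    for index, student in enumerate(ranked):
        groups[index % group_count].append(student)
    return groups


# Hypothetical knowledge scores for six students.
example_scores = {"s1": 1, "s2": 2, "s3": 3, "s4": 4, "s5": 5, "s6": 6}
similar = homogeneous_groups(example_scores, 3)
mixed = heterogeneous_groups(example_scores, 3)
```

A method that also uses interests, friendships and the student's current position would replace the single score with a richer similarity measure, but the contrast between the two group types stays the same.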

We will evaluate the proposed method in the e-learning system ALEF (Adaptive LEarning Framework). The result of the experiment will be recommendations and observed behavior patterns which can be used in any e-learning system to encourage students’ cooperation with respect to the available collaboration tools.

In Proc. of Spring 2011 PeWe Workshop, pp. 69-70

Using Implicit Feedback in Recommendation Systems

Peter Študent
master study, supervised by Ján Suchal

Abstract. The amount of data on the web is growing day by day, and it is therefore ever more difficult to find content that is interesting for a specific user. This problem has a major impact especially on news portals, where content changes very rapidly and there is a big risk that the user misses an article that is important from his point of view.

The aim of our work is to analyze the possibility of using negative implicit feedback based on user behavior in the process of searching for interesting content on the web and generating automatic recommendations for that content. The main advantage of implicit feedback over explicit feedback is that it does not burden users with additional operations. The aim of introducing implicit negative feedback into the process of generating recommendations is to increase the quality and speed of generating recommendations compared to traditional systems without this kind of feedback.

One of the negative feedback indicators we focus on is information about situations when a user immediately leaves a page containing uninteresting content without reading the whole page. Currently we are analyzing different ways of incorporating negative feedback into the process of generating recommendations. One of our approaches is to identify groups of similar users based on negative feedback and make recommendations for these groups.
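A minimal sketch of this indicator and of grouping users by it might look as follows. The five-second threshold, the Jaccard similarity measure and the input format are assumptions chosen for illustration; the abstract does not specify them.

```python
def negative_feedback(visits, min_seconds=5.0):
    """Collect, per user, the articles left almost immediately.

    visits: iterable of (user, article, seconds_on_page) tuples.
    """
    rejected = {}
    for user, article, seconds in visits:
        if seconds < min_seconds:
            rejected.setdefault(user, set()).add(article)
    return rejected


def jaccard(first, second):
    """Overlap of two sets relative to their union."""
    union = first | second
    return len(first & second) / len(union) if union else 0.0


def similar_users(rejected, user, min_similarity=0.5):
    """Users whose rejected articles overlap strongly with those of `user`."""
    return sorted(other for other, articles in rejected.items()
                  if other != user
                  and jaccard(rejected[user], articles) >= min_similarity)


# Hypothetical visit log: (user, article, seconds spent on the page).
example_visits = [
    ("u1", "a1", 2.0), ("u1", "a2", 40.0), ("u1", "a3", 2.5),
    ("u2", "a1", 1.5), ("u2", "a3", 3.0),
    ("u3", "a4", 1.0),
]
example_rejected = negative_feedback(example_visits)
```

Users who reject the same articles end up in the same group, and a recommender could then avoid suggesting to that group the kind of content its members tend to leave quickly.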

The proposed ways of utilizing negative implicit feedback in creating recommendations are subject to experimental verification on a dataset from an existing web news portal. We have run synthetic tests of the described method, which showed that recommendation generation can be sped up by 5% at the same quality compared to traditional recommendation systems.

In Proc. of Spring 2011 PeWe Workshop, pp. 35-36

New Approaches to Log Mining and Applications to Collaborative Filtering

Ján Suchal
doctoral study, supervised by Pavol Návrat

Abstract. Our work focuses on two main goals: novel approaches to log mining and the potential usage of these implicit data in recommendation systems, especially collaborative filtering exploiting implicit negative feedback data.

First we present a method for mining sources and cascading graphs of viral visits from raw logs. Such information can be useful for detecting influencers and potential sources of viral traffic. We present types of sites for which our method can be used and experiment on a real-world dataset containing the massively viral start of a service.

The second approach focuses on mining users' negative interests from basic server logs in the domain of news articles. We propose two methods for mining negative feedback: the first is based on time-based identification of articles that users do not read; the second is based on detecting articles that users have probably seen but did not click. Such data can be used in addition to the positive interests that are normally used for generating recommendations. We also show that incorporating such feedback into a collaborative filtering recommender yields 8.5% higher click-through rates and lowers the recommendation rejection rate by 5%.
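
The "seen but not clicked" idea can be illustrated with a small Python sketch. The heuristic used here is an assumption made for illustration, not the method of this work: the user is taken to have scanned a list of displayed articles from the top down at least as far as the lowest clicked position. The function name `seen_but_not_clicked` is hypothetical.

```python
def seen_but_not_clicked(displayed, clicked):
    """Return displayed articles the user probably saw but skipped.

    `displayed` lists article ids in the order they appeared on the page;
    `clicked` is the set of ids the user opened. Assumption: the user read
    the list top-down at least as far as the lowest clicked position.
    """
    clicked_positions = [i for i, a in enumerate(displayed) if a in clicked]
    if not clicked_positions:
        return []  # no click, so no evidence of how far the user read
    last = max(clicked_positions)
    return [a for a in displayed[:last] if a not in clicked]

print(seen_but_not_clicked(["a1", "a2", "a3", "a4", "a5"], {"a2", "a4"}))
```

In this example the articles `a1` and `a3` are treated as negative feedback, while `a5` is not, since nothing suggests the user scrolled that far.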

Finally, we present a novel method for a linearly scalable nearest-neighbor collaborative recommender system that uses specially prepared full-text indices. The evaluation is done on datasets from the largest Slovak news portal and from a recommendation contest. A comparison with a graph-based spreading activation recommendation method shows comparable results in terms of relevance, with superior scalability characteristics.
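
To give a rough idea of how an index lookup can drive nearest-neighbor recommendation, the following Python sketch uses a plain in-memory inverted index (article to readers) as a stand-in for a full-text index. This is a simplified illustration under that assumption, not the method presented in this work; the names `build_index` and `recommend` and the parameters `k` and `n` are hypothetical.

```python
from collections import Counter, defaultdict

def build_index(visits):
    """Map each article to the set of users who read it (an inverted index)."""
    index = defaultdict(set)
    for user, article in visits:
        index[article].add(user)
    return index

def recommend(user, visits, k=2, n=3):
    """Recommend up to n unread articles using the k most similar users."""
    index = build_index(visits)
    history = {a for u, a in visits if u == user}

    # Neighbors are the users sharing the most articles with `user`;
    # they are found by looking up each read article in the index.
    overlap = Counter()
    for article in history:
        for other in index[article]:
            if other != user:
                overlap[other] += 1
    neighbors = {u for u, _ in overlap.most_common(k)}

    # Score unread articles by how many neighbors read them.
    scores = Counter()
    for u, a in visits:
        if u in neighbors and a not in history:
            scores[a] += 1
    return [a for a, _ in scores.most_common(n)]

visits = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "b"), ("u2", "c"),
    ("u3", "b"), ("u3", "d"),
    ("u4", "e"),
]
print(recommend("u1", visits, k=1))
```

With `k=1`, the closest neighbor of `u1` is `u2` (two shared articles), so the only recommendation is article `c`.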

In Proc. of Spring 2011 PeWe Workshop, pp. 33-34

Modeling a Tutor for E-Learning Support

Peter Svorada
master study, supervised by Jozef Tvarožek

Abstract. E-learning web systems allow students to educate themselves, for example by studying materials, taking tests, and doing exercises. Students take actions (reading text, entering answers, sending text messages, etc.) that change the learning environment in which they work. It has been shown that students learn more when they are actively involved in the learning process and when this process is adapted to the needs of the individual student. Likewise, it is known that a student advances faster in the learning process when led by someone who acts more like a friendly tutor than a commanding authority.

The focus of this research is on finding possible improvements to the tutoring model of the intelligent tutoring system (ITS) architecture. This model receives input from both the student model and the domain model and makes decisions about tutoring strategies and actions. These decisions determine whether to intervene, when the best time is, and how to do so (for example, by giving feedback on the correctness of individual steps rather than only on final answers, or by providing error-specific feedback). Picking the next steps in the learning process is also part of the tutoring model's functionality: this means choosing the content and the right form of its presentation, since many ITSs present content as plain text, whereas some students might prefer other forms such as video or self-solved exercises with narration. Other responsibilities of the tutoring model include offering context-sensitive next-step hints; these are currently provided mainly at the student's request, which presents an opportunity for improvement and automation. Because we are working on an application that provides means for student collaboration, the tutor can support and guide this collaboration to achieve better results by dividing students into suitable groups and encouraging them to help each other in a way that is useful for all of them. We also think that tutors can provide technical support, making the final application more user-friendly. The tutor's communication and feedback should feel natural to students, which creates further room for improvement, especially when different languages are used.

The goal of this research is to pick the most suitable improvements, design them, and implement a tutoring model that uses them in an existing application. This application is built on Silverlight technology and focuses mainly on mathematics. The application with our tutoring model will then be tested by high school students.

In Proc. of Spring 2011 PeWe Workshop, pp. 45-46

Combining Different Data-Sources for User Modeling in Personalized Learning

Maroš Unčík
master study, supervised by prof. Mária Bieliková

Abstract. The use of e-learning systems is growing steadily, and the opportunities that Web 2.0 provides are huge. Nowadays, e-learning systems offer more and richer content and enable communication and collaboration among users. The growing use of these systems has caused a flood of information. Adaptive e-learning systems try to address the most crucial issues related to this overload: they (1) present the user only with information that is appropriate or interesting for him/her at the moment, (2) help the user decide how to proceed when viewing content, and (3) prevent the user from getting lost in the content or forgetting his/her original objectives. To enable such personalization, personalized web-based e-learning systems have emerged that monitor the characteristics of individual users, including models of their skills, knowledge, and/or interests. The performance of such systems depends on one important element, the user model, which helps users minimize error rates and learning time.

Many problems have been identified in the domain of user modeling. Combining several different inputs for the user model and using information about the user from outside the adaptive system to enrich the user model are just two of them. Another problem is the visibility of the user model to users, to which the term scrutability is closely related. In most systems, the user cannot directly access the user model and cannot provide explicit feedback that could otherwise be taken into account. Much work has been devoted to resolving these problems, but none of the solutions is sufficient.

Our aim is to design a transparent user model in which collecting data about the user is separated from constructing the user model itself. We consider several sources of input to process. We also consider visualizing the user model from the user's point of view, which allows direct and explicit feedback from students to enrich the user model. The visualization can also answer the question of what the system models as true and what it models as false, and reveal relationships between these beliefs, if any exist. It brings further benefits as well: since many real user models are likely to be large, the visualization helps the user:

  1. to get an overview of the whole model,
  2. to get a clearer overview of dependencies in the user model,
  3. to adjust the sensitivity of the user model.

We will experiment with the proposed user model in the Adaptive Learning Framework in a real-world setting, where it is used as an e-learning system in several courses at the Faculty of Informatics and Information Technologies.

In Proc. of Spring 2011 PeWe Workshop, pp. 47-48

Automated Recognition of Writing Style in Blogs

Martin Virík
master study, supervised by Marián Šimko

Abstract. On the current web, blogs represent a genre that stands between static pages and live forums. They have become a tool for ordinary users to share information, ideas, or even emotions, creating a heterogeneous mass of user-generated content filled with unique information related to individual as well as society-wide issues. Since blog articles are typically weakly structured, extracting valuable information from them is becoming increasingly difficult. Even though there are no restrictions on content or form in blogs, users seem to spontaneously create writing styles and genres that reflect their intention and current emotional state. Modern search services are aware of these genres and consider them highly valuable for several tasks, for example for filtering out articles with low information value (such as diaries), which enables search engines to focus only on relevant sources, or alternatively for recognizing sentiment about a specified object.

In our research we follow the distinction between informative and affective articles, and for affective articles we suggest two dichotomies for further differentiation: reflective vs. narrative and emotional vs. rational. Combining them yields four categories, which we use for our classification. We have gathered a dataset of about 16 thousand blog posts for initial experiments and manually classified a small subset in a user study with several participants. In our work we focus on the linguistic characteristics of blog articles. We propose a novel method for classifying Slovak blogs that considers not only word usage and lexical and morphological attributes (typical for state-of-the-art approaches), but also more complex features such as sentence syntax or text structure obtained during a pre-processing step. We propose and implement a method for morphological and lightweight syntactic parsing focused on describing simple and compound sentences and discovering predicate candidates and their attributes.
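
The label scheme described above can be written down compactly. The following Python sketch only enumerates the categories and assigns a label from already known attributes; it does not perform any classification, and the names `STYLES`, `TONES`, and `label` as well as the `style/tone` label format are illustrative assumptions.

```python
from itertools import product

STYLES = ("reflective", "narrative")
TONES = ("emotional", "rational")

# The four affective categories are the Cartesian product of the two dichotomies.
AFFECTIVE_CATEGORIES = [f"{style}/{tone}" for style, tone in product(STYLES, TONES)]

def label(informative, style=None, tone=None):
    """Return 'informative' or one of the four affective category labels."""
    if informative:
        return "informative"
    if style not in STYLES or tone not in TONES:
        raise ValueError("affective articles need a valid style and tone")
    return f"{style}/{tone}"

print(AFFECTIVE_CATEGORIES)
print(label(False, "narrative", "emotional"))
```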

In the current phase of our project we are gathering a larger training set of classified articles and improving our lightweight syntactic parsing methods and our methods for feature selection. We are experimenting with classification algorithms and classifier committees in order to boost classification accuracy.

In Proc. of Spring 2011 PeWe Workshop, pp. 71-72

Modelling Context Relations To Discover Hidden Contexts

Dušan Zeleník
doctoral study, supervised by Mária Bieliková

Abstract. Situational attributes such as location, time, and weather are now available to devices whose only goal is to help us. We call these situational attributes simply contexts. Contextual information is not a new topic for researchers; it has already been explored in many areas of interdisciplinary research. It all started, however, with monitoring users on the web. Nowadays, research on contexts is on the rise again because of the computing power around us (generally called pervasive computing). Context is a new dimension in information retrieval, as it is in recommendation. Contexts such as location or part of the day are widely used to personalize systems. Contexts, and not only simple ones like location and time, are becoming more and more available, which gives research in the field of user modelling a new and attractive perspective.

The web is a very specific place for contexts. We are able to recognize users' interests by monitoring their interaction with websites. We can easily obtain information about a user's current location (from the IP address), but there are other contexts that are not so visible. For now, we set aside the tools of pervasive computing, since we are not able to determine low-level contexts using an ordinary computer. There is a new trend of browsing the web on smart devices, but even so we are never sure about the complete set of the user's current contexts.

Some work has been done on sentiment classification and opinion mining. It leads us to the idea of the web as a store of users' opinions. Furthermore, people spend many hours contributing to the web, which opens up new possibilities. By monitoring their contributions we could even discover contexts describing their current state in general. More specifically, our intention is to extract emotion from these contributions. People use Twitter, Facebook, and other tools to express their emotions and other contexts, such as what is new for them, where they are, what they are doing, and how they feel. People write blogs and tweets, comment, and like. This is all a matter of natural language processing and teaching the computer to understand humans. To this end, we introduce the Web of Emotions. With it, we want to start a new approach to understanding emotions and how they affect user behaviour and interests on the web. For instance, a sad person might like to watch a comedy or play a game to forget about the sadness.

In Proc. of Spring 2011 PeWe Workshop, pp. 49-50