Students' Research Works – Autumn 2009

Imagine Cup

  • Eduard Kuric, Vladimír Mihál, Karol Rástočný, Róbert Sopko: Imagine Cup – Game Design 2010 Competition
  • Anton Benčič, Roman Meszároš, Roman Panenka, Marius Šajgalík: Imagine Cup – Software Design 2010 Competition

Visualization of the Semantic Web

Jakub Baloga
master study, supervised by Michal Tvarožek

The semantic web is an evolving extension of the World Wide Web. The difference between the two is that all information in the semantic web (pictures, publications, audio, video, etc.) has its semantics defined. This allows better search and orientation in the information for both humans and machines. A machine can automatically select the information we are interested in and filter out similar but irrelevant information. A person can use the semantics to orient better in the information space and reduce the time required to find the information sought. The main problem is how to visualize the content of the semantic web so as to maximize the amount of information and semantics displayed while maintaining clarity and ease of navigation. Another goal is to visualize relationships between the displayed information.

Simple access, understanding and navigation across the semantic web will be possible only with a well-designed and comprehensible user interface. To accelerate the development and use of the semantic web, tools must be developed that meet all the requirements mentioned above. There are many ways of visualizing information and its semantics. I concentrate on graphic and hybrid graphic-and-text visualization, since these types offer the best possibilities for visualizing relationships between the displayed information.

Another task is to devise a means of visualizing information even when its structure is unknown, and to specify the properties, functionality and methods of interaction of a user interface for visualizing the semantic web. I want to use this knowledge to create an approach to searching, exploring and visualizing the semantic web, and to test the approach by implementing its various parts and experimenting with them. I want to concentrate on the possibilities of visualizing the search process and search results, while taking into account the approaches and methodologies of HCI (Human-Computer Interaction). Multiple methods of visualization open up the possibility of customizing the style of visualization according to user preferences, creating a personalized web.

Towards Social-based User Modeling

Michal Barla
doctoral study, supervised by Mária Bieliková

The project deals with enhancing individual-based user modeling and personalization in adaptive web-based systems with knowledge encompassed within social networks. One problem of traditional user modeling is the cold-start problem: an adaptive system cannot provide any meaningful personalization to a new user for whom it has no information stored in the user model yet. However, such a new user is probably the one who most deserves help and guidance from the system in order to become familiar with its interface, the functionality it provides and the presented information space itself.

Our goal is to contribute to solving the cold-start problem by leveraging social relationships between a new user and other, already present users. The approach is motivated by social behavior, which is inherent to most human beings. More precisely, the initial estimate of user characteristics is acquired as a weighted combination of the characteristics of other users, who are connected to the new user by various types of relationships acquired from various sources as well as from common navigational patterns of users. The advantage of this approach is that it produces a standard user model, which can be maintained by well-established user modeling approaches and easily used by classical personalization and adaptation techniques.
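
The weighted combination described above could look roughly like the following minimal sketch. The function name `initial_user_model`, the representation of a user model as a keyword-to-score dictionary and the normalization by the sum of weights are assumptions made for illustration; the text does not specify how relationship weights are derived or combined.

```python
from collections import defaultdict


def initial_user_model(neighbours):
    """Estimate a new user's model from related users.

    neighbours: list of (relationship_weight, {keyword: score}) pairs.
    Both the weights and the keyword scores are hypothetical inputs.
    """
    weight_sum = sum(weight for weight, _ in neighbours)
    if weight_sum == 0:
        return {}  # no usable relationships, nothing to estimate
    totals = defaultdict(float)
    for weight, model in neighbours:
        for keyword, score in model.items():
            totals[keyword] += weight * score
    # Normalize so that the result is a weighted average per keyword.
    return {keyword: total / weight_sum for keyword, total in totals.items()}


# Two equally weighted related users with keyword-based models.
friends = [
    (0.5, {"semantic web": 1.0, "python": 0.5}),
    (0.5, {"python": 1.0}),
]
estimate = initial_user_model(friends)
```

Because the result has the same shape as an ordinary keyword-based model, it could then be updated by the usual modeling techniques once the new user starts interacting with the system.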

We plan to evaluate our method in the domain of information search, such as searching for documents in open information spaces like the Web, or in closed but vast information spaces like digital libraries or electronic newspapers. We use a rather simple keyword-based (tag-based) user model representation obtained by applying various text analysis techniques to web pages viewed by the user. Moreover, we acquire various relationships between tags by analyzing folksonomies and employing linguistic knowledge from WordNet in order to compare particular user characteristics or even whole user models. Our evaluation platform is an adaptive proxy server capable (apart from logging the information gained by analyzing the traffic) of personalizing either user requests (e.g., disambiguating the search keywords) or responses sent from a particular web server (e.g., annotating or re-ranking search results).

Automatic Semantic Web Service Composition

Peter Bartalos
doctoral study, supervised by Mária Bieliková

Our research is oriented towards semantic web service composition. We have proposed a novel method for automatic web service composition and execution. Our approach brings promising results in the following aspects:

  1. Consideration of the pre/postconditions of web services.
  2. Constraint back-propagation.
  3. Scalability regarding the number of used web services.

The pre/postconditions of a web service describe the conditions which must hold before/after the execution of the service. These conditions are expressed as logical statements and must be considered during web service chaining, the process in which one service’s output data are used as input data for another service. Two services can be chained only if the postcondition of the ancestor service logically implies the precondition of the successor service. To decide whether this holds, our approach exploits current relational database management systems and their capability to process logical statements. The logical formulae expressing the pre/postconditions are normalized to disjunctive normal form to make the decision simpler.
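
To illustrate why disjunctive normal form simplifies the chaining decision, here is a minimal sketch of a purely syntactic implication check. It is not the database-backed procedure mentioned above: it assumes conditions built only from positive atomic facts, represents a DNF formula as a list of sets of atoms, and the atom names are hypothetical. Under these assumptions, a postcondition implies a precondition when every conjunction of the postcondition contains all atoms of at least one conjunction of the precondition.

```python
def dnf_implies(post, pre):
    """Return True if the DNF formula `post` implies the DNF formula `pre`.

    Each formula is a list of frozensets; every frozenset is one conjunction
    of positive atoms. The test is sufficient in general and also necessary
    for formulae without negation.
    """
    return all(
        any(pre_term <= post_term for pre_term in pre)  # subset test
        for post_term in post
    )


# Hypothetical conditions of two services that we would like to chain.
postcondition = [
    frozenset({"paid", "in_stock"}),
    frozenset({"paid", "preordered"}),
]
precondition = [frozenset({"paid"})]

can_chain = dnf_implies(postcondition, precondition)
reverse_holds = dnf_implies(precondition, postcondition)
```

Here `can_chain` is true because both alternatives of the postcondition guarantee `paid`, whereas the reverse direction fails because `paid` alone guarantees neither `in_stock` nor `preordered`.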

The pre/postconditions of web services are also used to identify the relations between the parameters of web services and the user-defined goal of the composition. In our approach we perform value constraint back-propagation from the user goal to the web services used in the workflow. As a result of this process, the related parameters are set to the values from the goal constraint. This helps to create workflows that better fit the user goal and may also yield a definition of the input data.

One problem with workflow composition in general is that it is NP-hard, i.e. finding all the possible composite services leading to the user-defined goal is an extremely computationally demanding process. Several approaches have been designed to overcome this problem. We deal with it by extensive preprocessing during which efficient data structures are created. The second means of making the approach fast is the use of relational database management systems to realize a part of the composition process. As a result, our approach is able to create a correct composite service in an acceptable time (milliseconds) even when the number of web services rises to tens of thousands.

Sharing Information about Events on the Web

Pavol Fábik
master study, supervised by Mária Bieliková

Conferences are crucial in the professional life of academics. They provide a platform for them to present the results of their research, learn from their colleagues, and network. Therefore, academics need to find the conferences that may help further their careers.

The main objective of my project is to put in place the processes that will allow personalized recommendations to be generated. Conference bookmarking provides a solution for inserting and sharing information about conferences based on social bookmarking. Users, documents (conference websites), keywords and the relationships between them are the backbone of social bookmarking. The important processes that need to be conceived and developed are:

  1. Creating personal lists of conferences based on users’ interests.
  2. Automatically inserting new conferences based on analysis of conference lists from different sources (currently we would like to use the DBworld server).
  3. Recommending conference bookmarks based on keywords, the user’s personal conference list and interests.

Model Based Improvement of Website Structure

Michal Holub
master study, supervised by Mária Bieliková

The constant growth of the amount of information available on the Web represents a big challenge. We need the means to browse websites effectively without being overwhelmed by irrelevant data. The answer to this challenge lies in personalization of website content. Systems capable of providing this personalization are called adaptive web-based systems. They perform two main tasks: adaptive content presentation (including layout adaptation) and adaptive navigation support. In our work we focus on improving website structure by adapting navigation based on observed user behaviour. We chose the website of our faculty for experiments.

The key input is the model of a user. By monitoring the user's interaction with a web page we can deduce which links he presumably is and is not interested in. Moreover, we can reconstruct the whole path of his navigation through a website. After collecting the navigation paths (called clickstreams) of several users we can start to group them based on their similarity. Our method takes all clickstreams of a user and produces a single vector that describes him. We then compare users by computing the cosine similarity of their vectors and put similar users into one group. This is a prerequisite for later content adaptation of the website.
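
The comparison step can be sketched as follows. How the user vector is built from clickstreams is not detailed in the text, so this sketch assumes the simplest option, a count of visits per page; the function names and the example paths are hypothetical.

```python
import math
from collections import Counter


def user_vector(clickstreams):
    """Build a sparse page-visit-count vector from a user's clickstreams.

    Counting visits per page is an assumption made for this sketch.
    """
    return Counter(page for stream in clickstreams for page in stream)


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dictionaries."""
    dot = sum(a[page] * b[page] for page in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


alice = user_vector([["/", "/study"], ["/", "/study"]])
bob = user_vector([["/", "/study"]])
carol = user_vector([["/research"]])

alice_bob = cosine_similarity(alice, bob)      # same pages, different counts
alice_carol = cosine_similarity(alice, carol)  # no pages in common
```

Users whose similarity exceeds some chosen threshold could then be placed in the same group; the threshold itself would have to be tuned experimentally.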

With users divided into groups we can start recommending useful links to members of the same group. These are displayed in separate menus around the original web page. Optionally, we can modify the original menu (sort the links differently, highlight or remove some links, or remove the whole menu) and let users navigate with the personalized version.

The second aspect of our work focuses on the body of the web page rather than on menus, and therefore complements the first method. We noticed that most of the news items displayed on our faculty’s website announce upcoming events. We therefore assume that a user who clicks on such a news item is interested in that particular event. We analyze the content and look for the date of the event. Afterwards we recommend the event to other users by showing it in their personal calendars. The personalized calendar is another improvement of the structure of the original website.

Our method of content personalization is based on the man-in-the-middle principle. The software system that implements our method sits between the user and the web server. When the user requests a web page from the server, the system captures the page and modifies it before displaying it to the user. For our method to work properly we need to know the structure of the website, which we acquire manually. We developed a special format that describes it (e.g. which part of the website is the menu, the main text, etc.). The webmaster of each site is asked to provide a file in this format for better personalization.

Virtual community detection in wide space

Marián Hönsch
master study, supervised by Michal Barla

In the past few years, the effectiveness of simple information retrieval with traditional search engines, even those based on field-tested indexing algorithms, has suffered from the rapidly growing amount of content on the Web. Users are willing to spend less and less time and patience on refining search queries or browsing through large numbers of results. To improve this situation, the trend is to combine classical methods of information retrieval with personalization techniques. Some attempts to personalize search results have been made, but users rate some of them as insufficient or hard to use: the system is not accurate enough, it takes too long before it can provide truly personalized results, and users tire of entering large amounts of explicit profile data. The personalization effect is also often limited to a single server or portal. One promising technique is collaborative filtering, in which information items are recommended between users with similar interests or profiles. The profiles are modeled implicitly, based on user behavior, or with the help of explicit input from the user. A possible next step is to detect virtual communities of users with similar profiles and base the recommendation on the group profile. A user can then receive personalized results based on community behavior and experience. A virtual community can hold only a limited number of members if the results are to remain sufficiently personalized for them. Maintaining and updating user profiles is therefore an ongoing task for every adaptive system. Our aim is to deal with all these aspects of virtual communities and improve information filtering.

Our main goals are to determine and analyze techniques for user profiling and to develop a method that can detect virtual communities and adapt content during searching or browsing. Possible options for user modeling are keyword-based, feature-based and stereotype-based user models and their hybrids. All of these can be modeled with the overlay approach with layers. The method can be evaluated in the environment of a news portal. Information will be gathered without requiring users to log in. The user profiles are built from the articles that the user reads.

Interactive Web-Based Photo Gallery

Filip Hrkeľ
bachelor study, supervised by Michal Tvarožek

The main objective of this bachelor’s thesis is to develop essential elements of interactivity that will improve the existing photo gallery in a faceted browser. We were inspired by the existing gallery on the Facebook web portal, which works on similar principles.

One of the main elements of interactivity is the option for the user to mark objects located in a picture. A marked object may be a person or a thing shown in the photo. This functionality allows the user to browse the images in which the object appears, which better matches the needs of the browsing user. The user will also be able to create a browsing story. We will try to organize photo metadata more logically using an ontological network defining the relations between them, which will be used in the labeling.

Another element of interactivity is the possibility of adding comments to individual photos and albums. The user's nickname will identify his or her comments.

Semantic Analysis of Unstructured Text

Martin Jačala
master study, supervised by Jozef Tvarožek

The amount of information available in various electronic formats is growing every day. As the volume grows, the need for effective navigation and visualisation becomes more important. A great amount of the content published daily is written in unstructured, human-readable form, so-called natural language. In such text we can find various information about persons, events or relationships. Beyond the directly stated information, it is interesting to explore hidden relationships or other correlations, which we can reveal through automatic semantic analysis of news reports, blogs, e-mail communication or discussions.

Our aim is to design and implement a method for automatic semantic analysis of unstructured text. In the first phase, we need to build a large knowledge base with semantic metadata. The knowledge base is composed from various sources, such as the Wikipedia project, DBpedia and DMOZ, and can be further extended with data already extracted by our method. The community of users can also be involved and helpful at various stages. Unknown entities can be annotated by users if we provide an attractive user interface, e.g. as a game. The results of our method can also be further revised and refined by the users.

We are implementing a web-based application accessible to a wide range of users. This web portal serves as an interface between our method of text analysis and the user community, allowing users to use, search and extend the knowledge base, and also serves for evaluation of the method.

Models, Patterns for Adaptive Web-Based Applications and Usability

Branislav Kokavec
doctoral study, supervised by Mária Bieliková

Automatic building of the user interface of adaptive web-based systems during their lifetime may violate basic principles of usability such as user control and consistency. Attention should therefore be paid to building and evaluating adaptive web-based systems in practical terms.

Usability engineering is a systematic process of developing interfaces that are easily usable. There are various methods to guarantee that the interface of the finished product is effective, including guidelines, heuristics and a user-centered orientation.

Content-Based News Recommendation

Michal Kompan
master study, supervised by Mária Bieliková

The amount of data accessible on the Internet is growing day by day. The number of people dependent on working effectively with these data is enormous. Web content, especially the content of news portals, changes every day, and the value of this information decreases every minute. There is no chance of facing this problem without machine support based on recommender systems.

The main goals of our work are:

  1. analyzing widely used methods of finding similar texts
  2. categorizing articles and finding similar ones based on their content
  3. recommending these articles to specific users

The recommendation is based on both a user behavior model and a community behavior model. We gain information about users without requiring them to log in to the system, using various methods (e.g., cookies). The “rating” of an article is measured by positive or negative user evaluation, or by the time the user spends reading the article in proportion to the length of its text. One of the biggest problems in similarity finding is keyword extraction, especially in the Slovak language. For every article we create an article characteristic vector. These vectors consist of keywords extracted from the title of the article, from the article content and finally from the category of the article. Because of the large number of articles added daily, these vectors must be substantially reduced. After this reduction it is possible to find similar articles using various similarity measures.
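
The implicit part of the rating, reading time in proportion to text length, might be computed along the lines of the sketch below. The assumed reading speed of 4 words per second, the cap at 1.0 and the mapping of explicit evaluations to 1.0 and 0.0 are illustrative choices, not values taken from our method.

```python
def article_rating(seconds_spent, word_count, explicit=None,
                   words_per_second=4.0):
    """Rate an article in the range 0.0 to 1.0.

    explicit: True/False for a positive/negative user evaluation, or None
    when the user gave no evaluation. The reading speed is a hypothetical
    constant used to estimate how long a full read would take.
    """
    if explicit is not None:
        return 1.0 if explicit else 0.0  # explicit feedback takes priority
    if word_count <= 0:
        return 0.0
    expected_seconds = word_count / words_per_second
    # Time beyond a full read adds no further evidence of interest.
    return min(seconds_spent / expected_seconds, 1.0)


skimmed = article_rating(seconds_spent=50, word_count=400)
read_fully = article_rating(seconds_spent=500, word_count=400)
disliked = article_rating(seconds_spent=500, word_count=400, explicit=False)
```

Normalizing by the expected reading time keeps long and short articles comparable, which a raw time measurement would not.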

Validation of the proposed solution will be based on integrating the methods into an existing web news portal; alternatively, a “gadget-like” application could be created and distributed to users.

Leveraging Social Networks in Navigation Recommendation

Tomáš Kramár
master study, supervised by Michal Barla

Web search engines generally treat user queries in isolation. The response to a query, a list of documents matching the query, is the same for all users and does not reflect the individual user's preferences or needs, i.e. his context. Search engines work similarly to databases: they index documents, extract keywords, rank documents and provide sorted results. This model works well for finding relevant documents, but there is a wide gap between relevance and usability. Imagine searching for “Jaguar”; a document about the carnivore would be relevant, but not usable if you wanted information about the car.

There are multiple causes of this problem: some words are ambiguous (jaguar, latex), others have different meanings for different users (architecture). For some queries, the problem might be addressed by narrowing the search with additional keywords, although research has shown that users tend to write shorter queries to minimize cognitive load. For other keywords, narrowing the query might be very difficult.

Our main goal is to personalize search results by disambiguating the keywords. The idea is built on the simple premise that keywords have similar meanings for similar users. The proposed method leverages a dynamically built social network created by analyzing users’ activity on the Web. The evaluation platform will be an adaptive proxy server which will log all requests for further analysis and modify the search results page to accommodate the recommended documents.

Combining Content-Based and Visual-Based Approaches for Web-Based Photo Retrieval

Eduard Kuric
master study, supervised by Mária Bieliková

Nowadays web photo albums provide users with features such as organizing, sharing and searching photos. All of these features are the subject of large-scale research. With the increasing number of photos in albums, there is a need for quick and exact searching. Content-based indexing of photos is more difficult than indexing of text documents, because photos do not contain units like words. The photos in most albums are therefore annotated with descriptive text, such as captions and keywords (tags) that identify photo elements (e.g. there is a green apple in the photo).

Creating annotations is very time-consuming and the results are often subjective. Much current research in web photo retrieval is focused on developing fully automatic, objective methods for indexing photos. In general there are two approaches. The first is based on the text associated with the web photos and the second utilizes the visual similarity of photos. Visual identification of objects in a photo is highly non-trivial and needs special algorithms for recognizing specific kinds of objects (e.g. apples). Therefore, we usually utilize low-level visual features of photos, such as color distribution or a texture sample.

These two approaches are often applied independently. There is a relatively new model called “multiplied refinement” which allows the approaches to be combined. For example, a user enters keywords (e.g. Slovakia, castle) and is then expected to express his requirement in visual form by drawing a simple sketch, e.g. a navy blue sky at the top of the photo.

Photos annotated in this way can be categorized and associated. For example, imagine there is a castle in a photo. The user selects the castle area (e.g. with a rectangle) in the photo, and this selection is associated with photos showing the interior of the castle. Thus, we can create visual associations and space for social relations.

Our goals are to analyze the mentioned approaches and models for web photo retrieval and to devise a robust solution for searching photos on the web using these approaches.

Imagine Cup – Game Design 2010 competition

Eduard Kuric, Vladimír Mihál, Karol Rástočný, Róbert Sopko
team project (master study), supervised by Michal Tvarožek

The topic of this year's Imagine Cup competition is “Imagine a world where technology helps solve the toughest problems”. As inspiration, participants were given the eight United Nations Millennium Development Goals, covering serious material, health, economic and social problems. Since we cannot solve all of these problems, we focused on one. The biggest problem we identified is lack of interest. Many people in developed countries just don’t care about the problems of others; they see only their own lives and their own problems. However, even with negligible effort, people can understand other people’s problems and help to solve them.

To address the identified problem we propose a real-time strategy game. We set the game in places in the world where people suffer from serious problems. The main task of a player is to help people solve their problems: to educate and heal them and to secure good jobs for them. The game will touch on real-world problems and will be linked to real foundations and their ongoing projects. We attempt to entertain the players and, at the same time, inform them about the most serious problems people face today. Moreover, the game will be massively multiplayer, stimulating players to cooperate with each other by sharing infrastructure, technology and valuable specialists. After completing various tasks, players will be rewarded with special items or awards, which can be used to improve game progress, given to a friend to help him, or put on the player’s ‘mantelpiece’ to show other players the progress and achievements made within the game.

Besides all the entertainment, fun and education, we offer a meaningful business model for collecting real funds for foundations. Players have the option to buy ‘upgrades’, e.g. to unlock special items in the game without the effort otherwise needed to acquire them. The money raised from these payments will be donated to foundations and spent on real-world projects. Players will also be informed that by buying an upgrade they really help someone somewhere in the world.

Identification of Interesting Text Fragments with Social Context

Martin Labaj
master study, supervised by Mária Bieliková

In the field of e-learning, identifying difficult and/or interesting parts of a learning text can be a useful feature for various tasks such as rewriting the text, showing where to focus or offering help. However, methods which extract this information by directly interacting with the user, for example by asking him to rate his first-read comprehension, can distract from the learning process and require that users participate voluntarily and answer truthfully.

By tracking which portion of the text the user has scrolled to, and therefore which part of the text he currently sees, together with the time he has spent at this position, we can determine which part is the most time-consuming and therefore interesting or difficult. With enough users, details about various parts can be obtained very precisely, even down to a single line of text, by statistically taking into account the intersections and overlays of their viewports. As with any method based on time-based tracking of user actions, there is a possibility that the user is pursuing a different activity during the evaluated period. This can be avoided by tracking whether the user is focused on the text, detecting where on the screen he is looking and considering only active time. Information about where the user is looking also allows better identification of the active part of the text. To some extent, this can be done using a simple webcam (which is prevalent nowadays) pointed towards the user.
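
The overlay of viewports can be illustrated by a minimal sketch that credits the time of each viewport sample to every line visible in it and then looks for the lines with the highest total. The sample format `(first_line, last_line, seconds)` and the function name are assumptions for this example; gaze detection and the filtering of inactive time are left out.

```python
from collections import Counter


def time_per_line(viewport_samples):
    """Sum the viewing time of each text line over all viewport samples.

    viewport_samples: iterable of (first_line, last_line, seconds) tuples,
    possibly collected from many users; line ranges are inclusive.
    """
    totals = Counter()
    for first_line, last_line, seconds in viewport_samples:
        for line in range(first_line, last_line + 1):
            totals[line] += seconds
    return totals


# Three hypothetical samples whose viewports overlap on lines 8 to 10.
samples = [(1, 10, 5.0), (5, 15, 20.0), (8, 12, 10.0)]
totals = time_per_line(samples)
longest = max(totals.values())
hot_lines = sorted(line for line, t in totals.items() if t == longest)
```

Although each viewport spans many lines, the overlaps narrow the most time-consuming region down to a few lines, and with more samples the resolution improves further.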

Subsequently, tracking of the active part of the text can be used not only for statistical purposes and features like indicating parts where the user should concentrate, but also in a social context. By displaying where other users are currently reading, the user can get an overview of how he is doing. A displayed augmented social network, showing friends who are currently focused on or near the same part of the text, gives the user an option to contact friends who are currently thinking about the same portion he has a problem with. As the user asks friends who are learning the same part, he does not distract them from their current study and he also obtains better advice. Therefore, this concept encourages collaboration.

While we are considering an e-learning environment as the evaluation platform, the concept is not limited to this particular area or to a closed information space. It can easily be adapted and used in an open space like the Web.

When and Where or Augmented Photo Gallery

Michal Lohnický
master study, supervised by Mária Bieliková

Photography is defined as the process of producing images (often with artistic qualities) on a sensitized surface by the action of radiant energy, especially light. However, studies show that for most users photography represents a collection of events, locations, subjects and times, i.e. it is a kind of documentation of various situations. In spite of that, most photo galleries are just collections of ordered photos. Almost all photo galleries focus on presenting a single photo, not on presenting a photo album as a complex collection.

Our work is aimed at augmenting the informational value of photo albums. This is done by extending existing approaches with two attributes of a photograph: geographical location and timestamp. The geographical location is one of the most valuable aspects of a photograph. Knowing where a photo was taken says a lot about it even before it is viewed, because the location connects the photo with events in the area. For example, we can easily infer the character of a vacation (weather, beach, mountains, hiking etc.) from the location.

The other important attribute of a photo is its timestamp. Digital photography has existed for 19 years and the timestamp is becoming more and more essential in archiving and browsing photos. The timestamp is the primary attribute by which photos are ordered in photo albums, but navigation along it is insufficient.

The main goal of our work is to propose innovative navigation and browsing in photo albums according to timestamps and geographical locations. This kind of navigation, in combination with proper photo analysis and metadata discovery, can create various views of a complex collection of photos in photo albums. Users can use this style of browsing to share photos in much higher quality, to find photos missing from their collections, to view places they intend to visit in various time periods, etc. It is also usable in the commercial sphere, e.g. in travel agencies or botanical monitoring. If we add the direction in which a photo was taken to its location, we can create an ideal presentation tool for real-estate companies.

Web site content metadata acquisition using tags

Milan Lučanský
bachelor study, supervised by Marián Šimko

Our work focuses on mining relevant information from websites. Information that can be retrieved from a web page can be divided into two groups. Information in the first group is stored in plain text, which a visitor can see and read on the web page. Information in the second group is hidden from visitors and stored in the source code of the page. This group is called meta-information, because it is “information about information” on a web page. For both groups there are many methods of extracting the information.

The main goal of text mining is to extract relevant information from text. There are many more or less successful methods of extracting keywords from plain text. Meta-information extraction from source code is used by search engines: they evaluate different HTML tags in web pages and give a specific weight to the terms marked by those tags. It is interesting to combine these techniques into one measure for weighting extracted keywords, so that a single value can be assigned to each word considered relevant.

Our aim is to create a method which combines known algorithms from web content mining (text mining) with meta-information gathered from the tags in the page source. The method should be able to crawl a specific website, but not pages outside the main domain. It should use algorithms like C-value and TF-IDF to retrieve terms from the main text of a web page. It will also use other algorithms to retrieve the anchor text, the title of the web page, etc. from the source code. Finally, we use a proposed merging algorithm, which joins both sets of words and assigns a weight to every word. In this way we will create an index of pages which can be used as a basis for further research (e.g. for building a search engine).
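
One possible shape of such a merge is sketched below, assuming a plain TF-IDF score for the main text and an additive boost for terms found in selected tags. The tag weights, the additive rule and all names are hypothetical placeholders; the proposed merging algorithm itself is not specified here, and C-value is omitted.

```python
import math
from collections import Counter

# Hypothetical boosts for terms found inside particular HTML tags.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "a": 1.5}


def tf_idf(doc_tokens, corpus):
    """TF-IDF of each term in doc_tokens; corpus must include the document."""
    counts = Counter(doc_tokens)
    scores = {}
    for term, count in counts.items():
        doc_freq = sum(1 for doc in corpus if term in doc)
        scores[term] = (count / len(doc_tokens)) * math.log(len(corpus) / doc_freq)
    return scores


def merge_weights(text_scores, tagged_terms):
    """Join text-based scores with tag-based boosts into one weight per word."""
    merged = dict(text_scores)
    for tag, terms in tagged_terms.items():
        for term in terms:
            merged[term] = merged.get(term, 0.0) + TAG_WEIGHTS.get(tag, 1.0)
    return merged


corpus = [["semantic", "web", "web"], ["web", "page"]]
text_scores = tf_idf(corpus[0], corpus)
merged = merge_weights(text_scores, {"title": ["web"]})
top_term = max(merged, key=merged.get)
```

In this toy corpus "web" occurs in every document, so TF-IDF alone gives it a score of zero; the title boost restores its weight, which shows why combining both sources can be useful.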

Collaborative Tagging Systems

Tomáš Michálek
bachelor study, supervised by Marián Šimko

To enable correct computer processing of web content, we need to represent it in a way that computers understand. I am searching for a new way of semantically representing web content, also known as metadata. My approach is to describe existing solutions such as Delicious, Flickr, StumbleUpon, etc., identify their weak points and propose a new method which will be able to suggest better tags, relevant to the web site or resource, to users. For this purpose I will try to use social networks with folksonomies and collaborative tagging to involve users in the process of making the content of web sites semantic. In general, tagging is a manual process limited by the user’s experience. Retrieving good tags can significantly increase the quality of search results.

Criteria for good tags are:

  • High coverage of multiple facets
  • High popularity
  • Least-effort
  • Uniformity (normalization)

Recommending Exercises in Adaptive Educational Web-Based System

Pavel Michlík
master study, supervised by Mária Bieliková

When using an educational system that provides adaptive navigation, students will not access and study all the available learning materials in the same order. There is no predefined sequence of topics like chapters in a book. Therefore, exercises which are provided to support educational texts and to allow students to test their knowledge cannot be accessed in a predefined order either. This is because most exercises are related to more than one topic. In addition, the educational system can contain more exercises than the students need in order to learn the topic.

Our goal is to design a method for selecting the next exercise for the student to solve (or recommending a few exercises from which the student can choose). The method should consider:

  • The student’s current knowledge of related concepts (or topics) which can be obtained from various sources: materials that he studied, test questions, previously accessed exercises and his feedback on them etc. If the student’s knowledge of a particular concept is very high, we do not want to focus on that concept in future exercises, because the student would not learn anything new in such case.
  • Relations between concepts, such as similarity or prerequisites. The student must meet the new concept’s prerequisites before we present the new concept to him. Otherwise he probably would not be able to understand it.
  • Difficulty of the exercise should match the student’s level of knowledge. If the exercises presented are too easy, the student does not learn as much as he could. Exercises that are too hard usually cannot be understood by beginners and tend to discourage students.
  • Learning goal: Perfect knowledge of every concept in the course is not always required and/or possible. For example, a mid-term exam requires knowledge of only a subset of all concepts. So before this exam, students usually focus on these concepts and do not want to learn other ones, unless their knowledge of the required concepts is already perfect. Therefore the designed recommendation method should allow selecting the concepts which the students should focus on. In addition, it should also allow selecting the desired knowledge level of these concepts. If a student has very little knowledge of several key concepts and only a little time left for learning, he will probably not be able to learn all of these concepts perfectly. In this case the student wants to learn at least some basics of all required concepts.
  • History of exercises previously presented to the student and the student’s feedback on them. If the student did not understand an exercise, we do not want to recommend the same exercise immediately. Either a similar or an easier exercise should be preferred.
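
The five considerations above can be combined into a single ranking function. The following is a minimal sketch under stated assumptions, not the designed method: the `Exercise` fields, the 0..1 knowledge scale, the prerequisite threshold of 0.5 and the way the factors are multiplied are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Exercise:
    name: str
    concepts: tuple      # concepts the exercise practises
    difficulty: float    # assumed scale: 0.0 (easy) .. 1.0 (hard)


def score(ex, knowledge, prerequisites, goals, failed):
    """Higher score means a better candidate; 0.0 means 'do not recommend now'."""
    # Relations between concepts: all prerequisites must be met (assumed threshold 0.5).
    for concept in ex.concepts:
        if any(knowledge.get(p, 0.0) < 0.5 for p in prerequisites.get(concept, ())):
            return 0.0
    # History: do not immediately repeat an exercise the student did not understand.
    if ex.name in failed:
        return 0.0
    # Learning goal and current knowledge: reward the gap to the desired level.
    gaps = [max(0.0, goals[c] - knowledge.get(c, 0.0)) for c in ex.concepts if c in goals]
    if not gaps:
        return 0.0
    gain = sum(gaps) / len(ex.concepts)
    # Difficulty should match the student's level on the exercise's concepts.
    level = sum(knowledge.get(c, 0.0) for c in ex.concepts) / len(ex.concepts)
    fit = 1.0 - abs(ex.difficulty - level)
    return gain * fit


def recommend(exercises, knowledge, prerequisites, goals, failed=(), k=3):
    """Return the names of up to k exercises with a positive score, best first."""
    scored = [(score(e, knowledge, prerequisites, goals, failed), e.name) for e in exercises]
    ranked = sorted((item for item in scored if item[0] > 0), key=lambda item: -item[0])
    return [name for _, name in ranked[:k]]
```

Setting `goals` to a subset of concepts with moderate target levels models the mid-term situation described above, where the student wants at least the basics of all required concepts.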

User Annotations in Educational System

Vladimír Mihál
master study, supervised by Mária Bieliková

User annotations in an educational system can provide two valuable benefits: space for content-based communication and explicit feedback from students. The first benefit makes the system more attractive and useful; the second can provide us with interesting data about students’ interests and knowledge.

The idea of user annotation is to let users of the system annotate a web document and contribute to or comment on its content by inserting an annotation. The advantage of this kind of contribution is that all questions, ideas and thoughts related to the document remain in their context within the document. This is exceptionally useful for educational purposes, because students’ questions can be spread throughout the textbook and the answers to these questions are shared with everyone reading the textbook. In this way the most current information about the topic of the course can be better organized and visualized.

Modern educational systems utilize various methods to adapt system behavior to every student, including adaptive visualization and adaptive navigation. To successfully adapt an educational system to the student, it is essential to acquire information about the student. User annotations can supply this information, since by creating an annotation the student expresses the nature of his interest in the topic described in the annotated text. Moreover, typed annotations, e.g. questions, corrections or simple notes, provide us with more specific information about the student, for instance about his knowledge of the current learning concept.

Recommendations in multidimensional data

Michal Oláh
master study, supervised by Ján Suchal

Recommender systems are becoming a standard part of Internet shops and search engines. They use information gathered from users to predict future behavior and to estimate user interests. One way recommender systems work is based on representing objects (users, products etc.) and their relations using graphs. These graphs are evaluated by various algorithms (e.g. PageRank, HITS) that increase the accuracy of a prediction. Current methods of recommendation and search on the web work with a graph that has only one type of edge. Graphs with multiple types of edges can reveal new information not available with prior methods. This approach enables much more effective search and recommendation than is currently possible.

The goal of the project is to create a recommender that uses multidimensional data to make its recommendation. It is also important to make it scalable to very large graphs with millions of edges. Using a decomposition technique called HOSVD I successfully tested a multidimensional recommender that recommends real-estate to people based on user preferences/searches. The next step is to use the same technique to recommend based on attributes of items that change in time. People tend to have very specialized requirements when buying real estate such as: Flat cannot be on the top/first floor; Has to be located near the center, but not in a “bad” neighborhood; Price changed recently etc.
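
As an illustration of the decomposition mentioned above, here is a minimal sketch of a truncated HOSVD (higher-order singular value decomposition) using NumPy. It is not the tested recommender: the function names are hypothetical, and a real system would need sparse and scalable routines for graphs with millions of edges.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize the tensor: rows are indexed by `mode`, columns by all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along the given mode."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def hosvd(tensor, ranks):
    """Truncated HOSVD: returns a core tensor and one factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Left singular vectors of each unfolding span the dominant subspace of that mode.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Map the core back to the original space; entries act as smoothed scores."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out
```

Under the assumption that the data is arranged as a users × items × attributes tensor (an illustrative layout, not one stated in the source), the entries of `reconstruct(*hosvd(tensor, ranks))` with reduced `ranks` can be read as recommendation scores for combinations that were never observed directly.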

Based on the requests and the multidimensionality of the data, I can make very accurate suggestions as to what people should and should not buy. Even though I used real estate as an example, the recommender should not be tied to one domain and should be usable in various domains.

Life is just a game

Jana Pazúriková
bachelor study, supervised by Jozef Tvarožek

Children and young people today have problems with socializing: meeting new people, interacting with them smoothly, making friends and nurturing the relationships they have. A human being is a sociable creature, and socializing is a crucial need at a young age; its lack can cause many difficulties and can have a great impact on later life.

The main aim of this project is to develop a method (and a software prototype) that would improve young people’s social skills. Children learn everything faster and better if it is presented in an interesting and playful manner. That is why we decided to devise a strategic game. In the game, users first specify the essential characteristics of their personality and some characteristics of their dream friend, somebody they would like to meet and relate to. The goal of the game is to become a friend of that person. This can be done by living their own virtual life, satisfying their psychological needs and, above all, socializing. They maintain relationships by talking and spending time together, while friends can introduce them to their friends, to the friends of their friends, etc. Communication between friends is given special attention in our project. Dialogs during social interactions can progress in several directions depending on the personal characteristics of the interlocutors, their moods and the influence of past conversations. Particular questions and answers are chosen according to those circumstances.

The game gives children the opportunity to simulate the process of socializing in the real world, gain experience and, eventually, enhance their own social capabilities. The software prototype is web-based and is implemented in Silverlight 3.0 and the C# programming language.

Browsing similar data entities by breadth-first search in the semantic Web

Karol Rástočný
master study, supervised by Michal Tvarožek

As Web repositories grow, the effort to propose natural data search is intensifying. There are many approaches to web browsing. Most research in this problem area is oriented towards finding unknown or not very well known data. We look at the problem from a quite different viewpoint. We propose a method of browsing for the case when the user has already found a data entity, for example a picture, which is very similar to what he is looking for, and he wants to find other possibilities.

We realize this idea by utilizing view-based search within the semantic Web. First, the user sees a simple semantic graph with a basic data entity in the center and its facets around it. After that, the user can browse the graph by expanding nodes representing facets. However, simple node expansion adds many candidate data entities, which can clutter the graph even though the user does not want to see most of them at all. This problem can be partially addressed if the user sets the state of each facet to one of the following before expansion:

  1. Wanted – new data elements should have direct connections to this facet.
  2. Unwanted – new data elements are not allowed to have direct connections to this facet.
  3. Possible – new data elements may have direct connections to this facet. This state can be used as a default state.
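
The three facet states can be read as a filter applied to candidate entities before they are added to the graph. The sketch below assumes a strict reading in which "Wanted" facets are required; the data layout and the names `FacetState` and `expand` are hypothetical.

```python
from enum import Enum


class FacetState(Enum):
    WANTED = 1
    UNWANTED = 2
    POSSIBLE = 3   # default for facets the user did not set


def expand(candidates, states):
    """Return the entities that may be added to the graph, sorted by name.

    candidates: {entity: set of facets it is directly connected to}
    states:     {facet: FacetState}; facets missing here count as POSSIBLE
    """
    wanted = {f for f, s in states.items() if s is FacetState.WANTED}
    unwanted = {f for f, s in states.items() if s is FacetState.UNWANTED}
    return sorted(entity for entity, facets in candidates.items()
                  if wanted <= facets and not (unwanted & facets))
```

With no states set, every candidate passes, which corresponds to using "Possible" as the default state.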

Unfortunately, many other problems remain. How should simple facets be displayed in a semantic graph, where RDF triples are displayed and not only data entities with their facets? How should results be displayed in the most natural way? What tools can be useful for working with this type of graph? Which algorithm is best for organizing the nodes in the graph area? Is a 2D or 3D graph better?

Graphs in service of exploratory search

Jakub Šimko
master study, supervised by Michal Tvarožek

Our aim is to research the possibilities of utilizing graphic visualization tools in faceted browsers and the exploratory search domain. These tools mainly include displays of graph structures (such as an ontology representing the conceptual scheme of the search domain), which enable visual navigation across the information space and support layout of search results based on the hierarchy or semantic distance of given objects.

We propose to improve search through more effective navigation in a broader result set. Result sets often contain a number of irrelevant results (left to the user to deal with), which make the search slower. A faceted browser displays the result set based on the settings of facets. Facets represent additional criteria for further reduction and organization of the set and are often represented as lists of options. The task for the graph tools is to replace the classic list view. Graphs make similar concepts visible (using connections or layout in two dimensions) and therefore reveal more relationships to the user. Besides facets, a graph can also be used to represent the result set itself.

The user is well aware of what he wants, but also of what he does not want. We propose an option for the user to manually remove an unwanted object or concept from the graph. Removal or suppression of objects (concepts) similar to the removed one may also follow. A removal action also affects other parts of the browser and has to be accompanied by strong feedback for the user. Search history representation is another possible use of graphs, particularly trees. In a classic browser, history provides a basic back-step possibility, but it becomes unclear when it comes to different search branches, when the user is interested in several of the objects found. A continuously created, always visible tree of queries, visited objects or other actions performed during a session can give the user a clearer view of his own work and results.

Some objects are hard to find, and users often start searching for them with ineffective queries. After a sequence of attempts they improve the queries and reach the desired goal. We propose an approach for discovering mappings of “wrong queries” to “right objects” to create possible shortcuts for users. For this we need to determine the start query of a session and what its result was. After multiple occurrences of such query-result pairs, we can establish a mapping and start proposing shortcuts to “irrelevant” results. This is hardly possible in classic web search applications, because they lack the ability to track user actions. Using applets with an emphasis on session opening and closing makes it more feasible.
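
The query-to-object mapping can be sketched as a simple counting procedure over recorded sessions. The session format, the query normalisation and the `min_support` threshold below are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def shortcuts(sessions, min_support=2):
    """Map a session's start query to the object users most often ended up at.

    sessions:    iterable of (start_query, final_object) pairs, one per session
    min_support: assumed number of occurrences required before a shortcut is offered
    """
    counts = defaultdict(Counter)
    for query, final_object in sessions:
        # Normalise trivially so that "Big cat " and "big cat" count as one query.
        counts[query.strip().lower()][final_object] += 1
    result = {}
    for query, objects in counts.items():
        final_object, occurrences = objects.most_common(1)[0]
        if occurrences >= min_support:
            result[query] = final_object
    return result
```

A query seen only once produces no shortcut, which reflects the requirement of multiple occurrences of the same query-result pair before a mapping is established.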

An important challenge is to validate our approach. The need for user participation makes validation rather difficult. We plan to organize testing with a group of ordinary users given search tasks. The search domain may be either the web or a closed information space with a richer semantic structure (such as a research publications base). We remain open to any other proposals for validation test designs.

Improving Search Results with Lightweight Semantic Search

Marián Šimko
doctoral study, supervised by Mária Bieliková

To satisfy a user’s information needs, the most accurate results for the entered search query need to be returned. Traditional approaches based on comparing Bag-of-Words models of the query and the resource have been surpassed. In order to yield better search results, the role of semantic search is increasing. However, semantic data is not as common as search improvement requires. Although there are initiatives to make resources on the Web semantically richer, it is demanding to appropriately describe (annotate) every single resource manually. Furthermore, it is almost impossible to do so coherently. The current major problem of semantic search is the lack of available semantics for resources, especially when considering search on the Web.

To overcome this drawback, we propose an approach leveraging lightweight semantics of resources. It relies on resource metadata, a model representing resource content. It consists of interlinked concepts and relationships connecting concepts to resources (subjects of the search) or to other concepts. Concepts represent domain knowledge elements (e.g. keywords or tags) related to the resource content (e.g. web pages or documents). Both resource-to-concept and concept-to-concept relationship types are weighted. Weights determine the degree of relatedness of a concept to a resource or to another concept, respectively. Interlinked concepts result in a structure resembling a lightweight ontology, thus allowing automated generation (we have already performed several experiments with promising results in the e-learning domain).

Having the domain model described above, we examine the possibilities for search improvement. We propose two variants of so-called concept scoring computation. With concept scoring we extend the baseline state-of-the-art approaches to query scoring computation, expecting an improvement of the search. Utilizing metadata, we are able to assign the query to a particular topic (set of concepts) and yield more accurate search results with respect to related resources. Currently we are working on the evaluation of the proposed approach.
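
To make the weighted metadata model concrete, the following sketch shows one possible way a concept-based bonus could extend a baseline query score. It does not reproduce either of the two proposed variants: the one-step spreading of activation, the parameter `beta` and all names are assumptions.

```python
def concept_scores(query_terms, concept_links):
    """Activate concepts named in the query and, more weakly, their direct neighbours.

    concept_links: {concept: {related concept: weight}} (concept-to-concept relationships)
    """
    scores = {}
    for term in query_terms:
        if term in concept_links:
            scores[term] = 1.0
            for other, weight in concept_links[term].items():
                scores[other] = max(scores.get(other, 0.0), weight)
    return scores


def rank(query_terms, baseline, resource_concepts, concept_links, beta=0.5):
    """Order resources by baseline score plus an assumed concept-based bonus.

    baseline:          {resource: score from a Bag-of-Words style comparison}
    resource_concepts: {resource: {concept: weight}} (resource-to-concept relationships)
    """
    activated = concept_scores(query_terms, concept_links)
    final = {}
    for resource, base in baseline.items():
        bonus = sum(weight * activated.get(concept, 0.0)
                    for concept, weight in resource_concepts.get(resource, {}).items())
        final[resource] = base + beta * bonus
    return sorted(final, key=final.get, reverse=True)
```

In this sketch a resource that never mentions the query term can still rise in the ranking when it is strongly linked to a concept related to the query, which is the effect of assigning the query to a topic rather than to isolated words.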

Tracing Strength of Relationships between Users in Social Networks

Ivan Srba
bachelor study, supervised by Mária Bieliková

The current web is known as a space with constantly growing interactivity among users. It is changing from a place for data storage into a social place where people not only search for new information but also communicate with each other. Obviously, the best and biggest places for interaction are social networks, where people can arrange and explicitly express many kinds of relationships. There are many other places suitable for tracing users’ relationships, for example web news, forums and emails. The strength of these relationships between users varies and can change rapidly over time.

Our aim is to analyze the evolution of a user’s relationships and develop a web-based application which will approximate the user’s relationships with other users over time. This approximation will be based on various user activities. The application should be demonstrated on a real source of user data.

We have decided to use two independent sources of user activities. The first source is the well-known and popular social network Facebook and the second is the user’s email communication. We designed the application so that it uses a special wrapper for each source and predefined factors for all of the user’s friends. Each of these factors has a predefined value. The strength of a relationship is computed as the sum, over all factors, of the number of occurrences of the factor multiplied by its value. The final result will be presented as a comparison of relationship strengths during a specified time interval.
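
The strength computation described above, a sum of factor counts multiplied by factor values, can be sketched as follows. The factor names and values in `FACTOR_VALUES` and the event format are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical predefined factors and their values.
FACTOR_VALUES = {"wall_post": 2.0, "photo_tag": 3.0, "email": 1.5}


def strength(events, friend, start, end):
    """Sum over factors of (number of occurrences in the interval) * (factor value).

    events: iterable of (friend, factor, day) tuples delivered by the source wrappers
    """
    counts = Counter(factor for who, factor, day in events
                     if who == friend and start <= day <= end)
    return sum(n * FACTOR_VALUES.get(factor, 0.0) for factor, n in counts.items())
```

Calling `strength` for consecutive intervals (for example, month by month) yields the series of values that can be compared to show how a relationship changes over time.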

If we can approximate a user’s relationships with other users, we have valuable information about whether these users share the same interests, the same ways of spending free time, etc. The proposed approach can therefore be used in many areas where we want to know more about our users, for example on web pages which change their content according to the current user’s attributes.

Improving Search in Multidimensional Data Using Graphs and Implicit Feedback

Ján Suchal
doctoral study, supervised by Pavol Návrat

With the coming era of the semantic web, large, structured and linked datasets are becoming common. Unfortunately, current search engines mostly see the web only as a graph of pages linked together by hyperlinks, and are thus becoming insufficient for users searching in such new, structured and multidimensional data. When dealing with multidimensional data, identifying the relations and attributes that are important for users to achieve their search goals becomes crucial. Furthermore, every user can have different priorities and different goals, which can even change over time.

One of the goals of this work is the extension of existing graph algorithms for multidimensional data, where the usage of tensor algebra and multigraphs can be useful, in contrast with currently preferred matrix algebra. Such extension of graph algorithms would be able to increase relevance and quality of search, and even enable new quality of query formulations.

Evaluation of the relevance and quality of search can be done by gathering implicit feedback (e.g. quality can be measured just by monitoring user interactions with the system). Another goal of this work is the exploitation of gathered (implicit or explicit) feedback from users not only to evaluate the underlying system, but also to analyze users’ behavior, thus opening possibilities for adaptation and personalization.

The main goal of this work is the usage of implicit feedback in search engines dealing with large multidimensional data, to improve search result quality and relevance of results.

Detecting implicit feedback in web browsing

Michal Šušoliak
bachelor study, supervised by Michal Barla

This bachelor thesis brings a new view of feedback in web browsing. The main point of the thesis is detecting implicit feedback. We will monitor the behavior of visitors to web pages, which helps us better understand what a visitor really needs. The results are mainly of use to web page programmers, web administrators, shopkeepers and principals who have submitted their website for evaluation.

With the help of implicit feedback, we can recognize whether a visitor is not interested in the content of the page or whether the graphic interface is poor. We can also tell whether he is interested in the web page. Such indications are obtained by analyzing the gathered implicit feedback.

We will use data mining methods to capture the behavior of a visitor on the website. This information will be stored in a database, with one record per visitor. The database will contain the visitor’s MAC address, which will serve as his nickname. Other information in the database includes, for example, the number of clicks or the time the visitor spent on the website.

With the help of implicit feedback we can characterize a visitor’s behavior and, in turn, his reaction to the website. In this thesis we will define how information about visitor behavior can be evaluated and used.

Virtual Communities Identification for Adaptive Web Systems

Jozef Tomek
master study, supervised by Michal Barla

With the constantly increasing amount of information available on the Internet, searching for relevant information is becoming a bigger problem every day. Even though current search engines are still fairly effective tools in the hands of an expert, an ordinary user is often flooded with results he is not interested in. They are therefore becoming insufficient for the vast majority of the still growing number of internet users, which makes these search engines less usable.

One possible solution to this problem is personalization of these systems, an approach that has received a lot of research attention during the last couple of years. Behind this term is basically an effort to adjust the behavior of the system to the individual needs of a particular user.

A human is a social being by nature, so he tries to integrate himself into a community of other people with whom he shares some common attribute, interest or attitude. The world of the internet is no exception. People belonging to an internet community may be characterized by the same information interests. They search for and collect the same type and quality of information while browsing the web environment. Given this fact, two questions arise: how to identify these communities and how to assign a particular user to a particular community?

The objective of my diploma thesis is to analyze existing approaches to identifying virtual communities on the web based on observing and analyzing user behavior, focusing especially on those that are applicable in real time, i.e. approaches that adjust the user model after every user interaction. The main goal is to design a method that can identify communities in real time while taking already existing user models into account. The proposed method will then be tested in a real-world scenario using test data that represent real users’ interactions with the web space. These test data will be collected by an adaptive proxy server, which is being developed at the faculty. The outcome of this work will be a model of communities extracted from the input data set and a plugin for the adaptive proxy server, which will apply the proposed method in real time.

Intelligent Social Learning for Everyone

Jozef Tvarožek
doctoral study, supervised by Mária Bieliková

In our project, we are interested in how socializing the process of learning can improve motivation and learning, and we devise computational models and methods to realize this concept. For this, we are building a prototype learning system that adds socially intelligent tutoring to our previous work, which is a typical pseudo-tutor assessment system enhanced with free-text answering, on-the-fly question generation and adaptive selection. In evaluating our prototype, we are ultimately interested in how the addition of a socially intelligent tutor, the tutoring friend, affects students’ in-class motivation and off-class system use. Additionally, we want to investigate the effect of an earlier anonymous collaboration between students that, in case of mutually positive evaluations, is transformed into an acquaintance by the tutoring friend, thus allowing the particular students a possible future encounter.

To endow the computer tutor with at least the appearance of human-level social intelligence, the tutoring friend extracts instances of student’s social behavior such as past events the student attended in real life, future plans and opinions on interesting topics, and peer evaluations and group outcomes of collaborative activities performed by the student in the social learning system. The challenge is to gather descriptions of student’s events, plans and opinions during the off-task (socializing) dialogs with the tutor, and for this we employ a slot-filling dialogue manager that extracts (from student’s utterances) relevant attributes such as duration, location, and other participants.

Personalized Faceted Exploratory Search in the Semantic Web

Michal Tvarožek
doctoral study, supervised by Mária Bieliková

Exploratory search is one of the current approaches to information retrieval and content navigation in information spaces such as the Web or the Semantic Web. Exploratory search also signifies a shift in search and navigation approaches from traditional “closed” fact retrieval tasks towards more open-ended tasks where users do not know beforehand what it is they actually seek. Typical examples of exploratory search include learning, exploration of an information space and the understanding of its structure, or analysis, comparison and forecasting. The aim of our research is the extension and enhancement of exploratory search approaches with support for personalized browsing of an information space using an enhanced faceted browser. We focus on search, visualization and user interaction with structured information spaces described via ontologies.

We extend classical faceted browsers with:

  • dynamic facet generation using the structure of the information space (metadata),
  • automatic adaptation of facet ordering and visualization to user preferences,
  • dynamic generation of both textual and graphical views of the information space based on estimated user preferences,
  • integration of keyword-based, view-based and content-based search approaches,
  • support for collaboration and evaluation of users’ social relationships.

We evaluate our approach by conducting a user study in the digital library domain using heterogeneous content – digital images, publications and student projects. The proposed approach allows users to browse and search in arbitrary (structured) information spaces represented via ontologies (e.g., in OWL). We see several opportunities for the integration of our method with other approaches, mostly for input acquisition and information organization (e.g., classification or clustering). The integration with approaches for discovery of users’ social networks, the acquisition and maintenance of user models, and the integration with advanced data mining and visualization approaches appears of great interest.

Annotating Texts in Educational Web-Based System

Maroš Unčík
bachelor study, supervised by Mária Bieliková

People naturally annotate educational texts while learning. Reading often involves not just looking at words on a page, but also underlining, commenting and highlighting important facts (where possible). Combining annotation and reading with critical thinking and learning helps to increase the level of understanding of study texts. This paper-document annotation metaphor could also improve learning in web-based environments. However, it is odd that although the Web has existed for many years, a general solution for the domain of annotation still does not exist.

Our aim is to design and implement a web-based system that allows annotations. The most important aspect of annotation that we deal with is the collaboration of students and their teachers.

The web-based system that we propose involves the concept of creating and evaluating questions. The idea we started from is to make students understand what is important in a study text and to make them think about it. Two levels must be taken into account: first, students receive feedback on their questions from other students; second, they should share the questions with their colleagues.

In similar systems, questions are added by an expert. However, it is sometimes difficult for the expert to propose a question at an appropriate level for students. Moreover, the content of study texts is too extensive. In our system, questions are created by students. Students add questions to a document at places they consider important. A question asks only about the content of the text to which it is bound. Other students can find the answer to the question in the selected text, so they can easily identify the main ideas of the text.

Evaluation of questions is based on ratings by other students and the teacher. The rating is subjective and is based on the difficulty, intelligibility and relevance of the question and other factors. Evaluation of questions allows us to decide whether to keep, modify, or replace a question in the system. We also care about student motivation, so the system involves a competitive element which will encourage students to create and rate questions.

Homophily and Relational Classification

Peter Vojtek
doctoral study, supervised by Mária Bieliková

The increasing complexity and structure of the treated data revealed limitations of attribute-based (content) classification, which is based solely on the content of the data objects themselves. In the search for advanced methods capable of exploiting the structure of interconnected data instances more intensively, relational classification originated as a more efficient alternative to content classification.

Methods which utilize the relations between classified instances are well suited for domains where instances have variable number of attributes (e.g., actors in a movie), attribute values are very sparsely distributed and inadequately correlate with classes, or instances have very few attributes but many relations (e.g., person in a social network identified by its nickname only but connected to many other people via friendship relation).

Our work extends the current state of the art in relational classification in the area of classifier design. We apply classification to graph-based datasets, mainly graphs which capture social networks. This kind of data exhibits a phenomenon named homophily, defined as the tendency of individuals (vertices in a graph) to associate and bond with similar others (via edges).

We defined how to measure homophily in a graph and designed a relational classifier capable of employing this knowledge and benefiting from it.
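
As an illustration only, homophily can be estimated as the fraction of edges that connect vertices of the same class, and a very simple relational classifier can label a vertex by the majority class of its neighbours. These are common textbook formulations, not necessarily the measure or the classifier defined in this work; all names are hypothetical.

```python
from collections import Counter


def edge_homophily(edges, labels):
    """Fraction of edges whose two labelled endpoints share the same class."""
    labelled = [(u, v) for u, v in edges if u in labels and v in labels]
    if not labelled:
        return 0.0
    return sum(labels[u] == labels[v] for u, v in labelled) / len(labelled)


def neighbour_vote(node, edges, labels):
    """Classify `node` by the most frequent class among its labelled neighbours."""
    votes = Counter()
    for u, v in edges:
        if u == node and v in labels:
            votes[labels[v]] += 1
        elif v == node and u in labels:
            votes[labels[u]] += 1
    return votes.most_common(1)[0][0] if votes else None
```

The higher the measured homophily, the more a neighbour-based classifier of this kind can be trusted; in a graph with low homophily the neighbours' classes carry little information about the vertex itself.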

Homophily is present in many datasets capturing the web and its users, and at the same time, relational classifiers are successfully employed in this domain. Other classification tasks where relations are important and should be captured by the classifiers are tax fraud detection, hypertext document classification and social network analysis.

Dynamics in Hierarchic Document Classification

Dušan Zeleník
master study, supervised by Mária Bieliková

The aim of our work is to provide a method which is able to determine similarity between documents. We designed a different representation of relations between documents. The structure (a binary tree) enables online expansion of the document set without large amounts of recalculation. The hierarchic structure is ready for on-demand cluster computation simply by pruning the tree. Similarly to decision trees, we proposed how user interest stereotypes can be located in the tree. Once an interest is located, documents suited to the user’s personality can be recommended.
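
The idea of a binary tree that grows online and yields clusters by pruning can be sketched as follows. This is an assumed, simplified construction: documents are represented as term sets, similarity is the Jaccard coefficient, and a new document descends towards the more similar child; none of these choices is stated in the source.

```python
class Node:
    """Leaf (doc is set) or internal node (doc is None) of the document tree."""

    def __init__(self, terms, doc=None, left=None, right=None):
        self.terms = set(terms)   # for internal nodes: union of all terms below
        self.doc = doc
        self.left = left
        self.right = right


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def insert(root, doc, terms):
    """Add a document by touching only the nodes on one root-to-leaf path."""
    terms = set(terms)
    if root is None:
        return Node(terms, doc=doc)
    if root.doc is not None:
        # A leaf is split into an internal node holding the old and the new document.
        return Node(root.terms | terms, left=root, right=Node(terms, doc=doc))
    root.terms |= terms
    if jaccard(root.left.terms, terms) >= jaccard(root.right.terms, terms):
        root.left = insert(root.left, doc, terms)
    else:
        root.right = insert(root.right, doc, terms)
    return root


def leaves(node):
    if node.doc is not None:
        return [node.doc]
    return leaves(node.left) + leaves(node.right)


def clusters(node, depth):
    """Prune the tree at `depth`: every remaining subtree becomes one cluster."""
    if node.doc is not None or depth == 0:
        return [leaves(node)]
    return clusters(node.left, depth - 1) + clusters(node.right, depth - 1)
```

Pruning at a smaller depth gives fewer, coarser clusters, while a larger depth gives more, finer ones, without rebuilding the tree.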

A different issue, present in every document set, is keeping it up to date. Removing old documents is as simple as adding them, but we proposed a method which suppresses aging documents iteratively by changing their weight. The decision procedure is then smoothly affected by the date of document creation.
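
Iterative suppression of aging documents can be illustrated with a simple multiplicative decay. The decay factor of 0.9, the initial weight of 1.0 and the way the weight enters the ranking are assumptions for this sketch.

```python
def add_document(weights, doc):
    """A new document starts with full weight (assumed to be 1.0)."""
    updated = dict(weights)
    updated[doc] = 1.0
    return updated


def age_step(weights, decay=0.9):
    """One iteration of aging: every stored document loses part of its weight."""
    return {doc: w * decay for doc, w in weights.items()}


def rank(similarity, weights):
    """Order documents by similarity scaled by their current (aged) weight."""
    return sorted(similarity, key=lambda d: similarity[d] * weights[d], reverse=True)
```

Because the weight shrinks a little in every iteration, an old document is never cut off abruptly; it gradually loses influence on the decision relative to newer documents.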

We use several strategies to extract information from a document to build up the structure. We focus on content-based strategies, which means that the content of the document is used for discovering relations. Alternatives such as normalization by lemmas, synonyms or a statistical stemmer are subjects for further experiments.

The environment of web news is a suitable place to use the proposed method and test its features. An important part of the solution is also its experimental evaluation. As soon as it is provided to a specific set of users, we can measure its popularity, which shows how relevant the solution is for the users. Another option for evaluation is to compare its results with those calculated by different existing methods, or with results created manually. We consider evaluation by comparison the most important, because users tend to react positively to new features regardless of their added value.
