Students’ Research Works – Spring 2016: Semantics Acquisition and Domain Modeling (PeWe.DM)

Peter Gašpar: Linking Multimedia Metadata by Using Microblogging Network
Michal Holub: Identification of Similar Entities in the Web of Data
Matej Kloska: Support for Domain Model Authoring
Samuel Pecár: Ontology learning from text
Matúš Pikuliak: Relationship Extraction Using Word Embeddings
Karol Rástočný: Utilization of Information Tags in Software Engineering

Linking Multimedia Metadata by Using Microblogging Network

Peter Gašpar
master study, supervised by Jakub Šimko

Abstract. With the expansion of the open Web, information overload has become a serious problem. Many information retrieval approaches try to deal with the retrieval, representation, searching and storage of large datasets. Metadata are widely used to describe instances, especially in the domain of multimedia.

Social networking services (SNSs) hold great potential for building a metadata database. In recent years they have become an irreplaceable companion on the Web for most people. They provide nearly unlimited space to spread ideas and opinions. Moreover, many television companies use them to promote their programmes with articles and backstage photographs and videos. SNSs are also one of the most straightforward ways to get in touch with the TV audience. People’s activity on public statuses creates an opportunity to reveal other interesting content.

In our work we analysed the behaviour of users on microblogs. We identified and highlighted several features that authors of microblog posts use: named entities, hashtags and external links. Based on this behaviour we proposed a new method for mapping shows from the TV schedule to microblog posts. To evaluate our solution, we carried out several experiments, whose results are promising.
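
The abstract does not describe the mapping method in detail, so the following is only a minimal sketch of how hashtags and title mentions might be used to link a post to a show. The function names, the scoring weights and the sample schedule are all assumptions made for illustration and are not taken from the original work.

```python
import re

def normalize(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def extract_hashtags(post):
    """Return the set of normalized hashtags found in a post."""
    return {normalize(tag) for tag in re.findall(r"#(\w+)", post)}

def match_score(post, show_title):
    """Score how strongly a post refers to a show (weights are hypothetical)."""
    score = 0.0
    if normalize(show_title) in extract_hashtags(post):
        score += 1.0  # a hashtag equals the show title written without spaces
    if show_title.lower() in post.lower():
        score += 0.5  # the title is mentioned verbatim in the post text
    return score

def best_show(post, schedule):
    """Return the schedule entry with the highest non-zero score, or None."""
    scored = [(match_score(post, title), title) for title in schedule]
    if not scored:
        return None
    top_score, top_title = max(scored)
    return top_title if top_score > 0 else None

# Hypothetical schedule and post, for illustration only.
schedule = ["Game of Thrones", "Top Gear"]
print(best_show("Can't wait for tonight! #GameOfThrones", schedule))
```

A real system would presumably also use named entities and external links, which this sketch leaves out.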

In Proc. of Spring 2016 PeWe Workshop, pp. 53-54

Identification of Similar Entities in the Web of Data

Michal Holub
doctoral study, supervised by Mária Bieliková

Abstract. The Semantic Web promotes machine-readable data that are freely available on the web in open standards so that autonomous applications can use them. Such data are constantly being created and linked together, thus forming the Linked Data Cloud, or the Web of Data. Currently, there are a few hundred datasets in this cloud, covering a wide range of domains.

In our research we focus on discovering relationships between named entities published on the Web of Data. We then use such data in autonomous adaptive web-based applications. Mainly, we are interested in finding similar and identical entities, either within one dataset or across multiple datasets, and linking them together. This method has a variety of uses: 1) in deduplication algorithms (usable in data cleaning and processing tasks), 2) in similarity detection (usable in search and recommendation tasks), or 3) in data enrichment and integration tasks.
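
As a rough illustration of similarity detection between entities, the sketch below combines label similarity with the overlap of property-value pairs. The entity layout (a dictionary with `label` and `props`), the weighting and the sample entities are assumptions made for this example; the abstract does not specify the actual similarity measure used.

```python
from difflib import SequenceMatcher

def jaccard(a, b):
    """Jaccard overlap of two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def entity_similarity(e1, e2, label_weight=0.5):
    """Weighted mix of label similarity and shared property-value pairs."""
    label_sim = SequenceMatcher(
        None, e1["label"].lower(), e2["label"].lower()
    ).ratio()
    prop_sim = jaccard(e1["props"], e2["props"])
    return label_weight * label_sim + (1 - label_weight) * prop_sim

# Hypothetical entities from two datasets.
a = {"label": "Bratislava", "props": {("type", "City"), ("country", "Slovakia")}}
b = {"label": "bratislava", "props": {("type", "City"), ("country", "Slovakia")}}
c = {"label": "Vienna", "props": {("type", "City"), ("country", "Austria")}}

print(entity_similarity(a, b))  # candidates for an "identical" link
print(entity_similarity(a, c))  # lower score, at most a "similar" link
```

Pairs scoring above a chosen threshold could then be treated as duplicates, while moderately similar pairs could feed search or recommendation.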

Building an adaptive web-based application using a domain model based on Linked Data enables us to utilize the relationships to recommend related entities (e.g. in the domain of learning materials), or to help the user navigate in a large information space (e.g. in large digital libraries containing millions of authors, papers and conferences, which may overwhelm the user). We can also use the relationships to help the user in the search process. Since the Linked Data cloud has the form of a large graph, we are able to answer complex queries that are difficult to solve using a traditional keyword-based approach.

In Proc. of Spring 2016 PeWe Workshop, pp. 55-56

Support for Domain Model Authoring

Matej Kloska
master study, supervised by Name Surname

Abstract. We live in a time when people use information technology to produce large amounts of information. This is amplified by the Internet as a universal tool for communication. As a result, there is a need for intelligent and fast search, visualization and, last but not least, navigation across the digital space created from the data. Semantics appears to be one approach to this problem. Ontologies, as representatives of semantics, are often seen as a response to the need for interoperable semantics in modern information systems. In many cases, they act as an important tool for the organization and representation of knowledge in context, particularly in scientific research and in organizations with specific requirements.

The aim of our work is to support the creation of domain models and thereby facilitate many other processes of everyday life, among which we can include technology-enhanced learning (TEL). In this work we propose a method for supporting domain model creation. The rest of the paper describes significant related work, the proposal of our method, a customer product survey and the results of our work.

In Proc. of Spring 2016 PeWe Workshop, pp. 57-58

Utilization of Information Tags in Software Engineering

Karol Rástočný
doctoral study, supervised by Mária Bieliková

Abstract. Information tags are a subset of descriptive metadata that assign structured information to other information artefacts (e.g., to a webpage paragraph or to a source code line). In general, information tags have been proposed to model properties of tagged information (e.g., webpages). Additionally, the information tags model is based on the standardized Open Annotation Data Model, so information tags can be shared among software systems. Due to these properties, we use information tags to model source code files and to provide first basic tools that utilize a source code model based on information tags.

For modeling source code files and supporting tools we utilize all categories of information tags: (i) User information tags, which support the code review process (TODO, FIXME, CODEREVIEW, REFACTOR, …), (ii) Content-based information tags, obtained by analysis of source code via SonarQube, (iii) User activity information tags, created by analysis of developers’ activities, e.g. implicit dependencies, (iv) Aggregating information tags, which aggregate information from multiple information tags, e.g. facet tags that support source code search.
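
The four categories can be pictured with a small data model. The sketch below is an assumption-laden illustration: the class and field names (`InformationTag`, `kind`, `file`, `line`, `body`) are hypothetical and loosely follow the target/body split of annotation models, and the aggregation function is only a simple stand-in for facet tags, not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    USER = "user"
    CONTENT_BASED = "content-based"
    USER_ACTIVITY = "user activity"
    AGGREGATING = "aggregating"

@dataclass(frozen=True)
class InformationTag:
    category: Category
    kind: str       # e.g. "TODO" or "FIXME" (hypothetical field)
    file: str       # target: the tagged source file
    line: int       # target: the tagged line in that file
    body: str = ""  # structured or free-text payload

def facet_counts(tags):
    """Aggregate tags into per-kind counts, a simple stand-in for facet tags."""
    return Counter(tag.kind for tag in tags)

# Hypothetical tags attached to source code lines.
tags = [
    InformationTag(Category.USER, "TODO", "Parser.java", 42, "handle empty input"),
    InformationTag(Category.USER, "FIXME", "Parser.java", 87, "off-by-one"),
    InformationTag(Category.USER, "TODO", "Lexer.java", 10),
]
print(facet_counts(tags))
```

Such counts could back a faceted source code search, for example listing files by the number of open review tags.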

Currently we are refactoring and finalizing the implementation of an architecture for collecting developers’ activities and for enriching source code with information tags. The refactored architecture allows us to deploy it to multiple organizations, to collect clearer datasets and to conduct experiments effectively.


Relationship Extraction Using Word Embeddings

Matúš Pikuliak
master study, supervised by Marián Šimko

Abstract. Natural languages are a natural means of communication for people, but they are virtually incomprehensible to machines. Understanding the meaning of a given text is an extremely difficult task for current algorithms concerned with the acquisition of semantics. Full understanding of any natural language as we understand it is, however, still just a vision. Natural language processing today is concerned with much smaller and easier tasks such as relationship extraction.

Relationship extraction is the process of discovering new instances of semantic relations within a set of lexical units. For example, extracting the relation between a country and its capital from text corpora should yield pairs of words with this relation, such as France-Paris or Italy-Rome. We use a state-of-the-art technique for statistical processing of text corpora called word embeddings. This technique consists of projecting lexical units from a text corpus, usually words, into a high-dimensional vector space using a deep learning algorithm while preserving semantic similarity between words.

We have designed and implemented our own method for discovering new instances of relationships in this space. We define a class of pairs with a certain relationship using a small set of example pairs. We examine how these pairs are projected in our vector space and try to recognize the patterns that certain classes form in this space. We apply pattern recognition solutions such as PU learning to discover new pairs with the relation defined by the examples.
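
To make the idea of patterns formed by pairs in the vector space more concrete, here is a minimal sketch that ranks candidate pairs by how closely their vector offset matches the mean offset of a few seed pairs. This is a deliberately simple nearest-prototype heuristic, not PU learning and not the method described above, and the two-dimensional embeddings are invented toy values rather than trained vectors.

```python
import math

def subtract(u, v):
    """Component-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity, 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def rank_candidates(embeddings, seed_pairs, candidate_pairs):
    """Sort candidates by similarity of their offset to the mean seed offset."""
    offsets = [subtract(embeddings[b], embeddings[a]) for a, b in seed_pairs]
    prototype = mean_vector(offsets)
    scored = [
        (cosine(subtract(embeddings[b], embeddings[a]), prototype), (a, b))
        for a, b in candidate_pairs
    ]
    return sorted(scored, reverse=True)

# Toy 2-D embeddings (made up for illustration).
embeddings = {
    "france": [1.0, 0.0], "paris": [1.0, 1.0],
    "italy": [2.0, 0.0], "rome": [2.0, 1.1],
    "germany": [3.0, 0.2], "berlin": [3.1, 1.2],
    "banana": [0.5, 2.0],
}
seeds = [("france", "paris"), ("italy", "rome")]
candidates = [("germany", "banana"), ("germany", "berlin")]
ranked = rank_candidates(embeddings, seeds, candidates)
print(ranked[0][1])
```

With real embeddings, a classifier trained on the seed pairs as positives and all other pairs as unlabeled data would replace the single prototype used here.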

With only several dozen examples we were able to discover new pairs with the expected relation. Our method can be used in a variety of tasks related to knowledge engineering, such as ontology population. It could also facilitate other natural language processing tasks. Our work also deepens our understanding of word embeddings created by neural networks and the patterns they create.

In Proc. of Spring 2016 PeWe Workshop, pp. 59-60

Ontology learning from text

Samuel Pecár
master study, supervised by Marián Šimko

Abstract. Ontology learning from text is the extensive process of creating ontologies from text corpora. This process consists of several major subtasks, such as term extraction, concept discovery and relation learning.

Currently, we focus on an analysis of the state of the art and identify various types of approaches to taxonomy learning. Taxonomy learning is a very important part of ontology learning and can be divided into several subtasks, such as relation discovery, taxonomy construction and taxonomy cleaning. Taxonomies are very useful tools and provide valuable input for many complex tasks such as question answering and textual entailment.

A task from the International Workshop on Semantic Evaluation (SemEval) is concerned with automatically extracting hierarchical relations from text corpora and with subsequent taxonomy construction. The main goal of this task is the extraction of hypernym-hyponym relations; the task is not concerned with any relation indicating subordination between terms.
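
As a concrete picture of hypernym-hyponym extraction, the sketch below applies a single lexical pattern of the form "X such as A, B and C". Pattern-based extraction is a well-known baseline and is used here purely for illustration; it is not the method proposed in this work, and the regular expression handles only single-word terms in this one construction.

```python
import re

# One lexical pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
PATTERN = re.compile(
    r"(\w+),? such as (\w+(?:(?:, (?:and |or )?| and | or )\w+)*)"
)
SEPARATOR = re.compile(r", (?:and |or )?| and | or ")

def extract_hypernym_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the "such as" pattern."""
    pairs = []
    for match in PATTERN.finditer(sentence):
        hypernym = match.group(1).lower()
        hyponyms = SEPARATOR.split(match.group(2))
        pairs.extend((hyponym.lower(), hypernym) for hyponym in hyponyms)
    return pairs

def build_taxonomy(pairs):
    """Group hyponyms under their hypernym as a simple parent-to-children map."""
    taxonomy = {}
    for hyponym, hypernym in pairs:
        taxonomy.setdefault(hypernym, []).append(hyponym)
    return taxonomy

pairs = extract_hypernym_pairs("We study fruits such as apples, pears and plums.")
print(build_taxonomy(pairs))
```

A full pipeline would add many more patterns or statistical evidence, and a cleaning step that removes cycles and redundant edges from the resulting graph.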

Our aim is to propose a method for taxonomy extraction and construction using state-of-the-art approaches from other ontology learning tasks.
