Students’ Research Works – Autumn 2018: Data Analysis, Mining and Machine Learning (PeWe.Data)

Filip Dresto: Characterizing Fake News and Its Spread by Data Analysis
Marek Drgoňa: Aspect detection for sentiment analysis
Peter Gašpar: Personalized Hybrid Recommendation enhanced by Visual Features
Florian Chmelár: Hybrid recommendation
Tomáš Mizera: Linking and Cleaning of Open Government Data
Branislav Pecher: Interpretability of Neural Network Models Used in Data Analysis
Matej Schwartz: User behavior on the Web: prediction of retention
Elena Štefancová: Recommendation Taking into Account the Time Aspects of Users and Items
Peter Tibenský: Personalized Product Recommendation for Users
Miroslav Valčičák: Predicting Offer Popularity in E-commerce Environment
Marek Wallner: Analysis of User Feedback

Characterizing Fake News and Its Spread by Data Analysis

Filip Dresto

Abstract: Information now spreads over the Internet at great speed. Its sources include not only news portals but also increasingly popular social networks. However, the volume and speed of dissemination also bring disadvantages. Fake news is a growing problem not only worldwide but also in Slovakia. Fake news contains false or inaccurate information and facts. Because information generally affects people’s opinions, attitudes, and actions, fake news can have a negative impact not only on individuals but also on wider society. Since fake news tends to pose as reliable news, people find it difficult to identify.

After analyzing the problems with fake news, we have focused on the differences between real and fake news in terms of content, corpus, and language. Our goal is to better understand the characteristics of fake news and to identify what distinguishes it from real news.


Aspect detection for sentiment analysis

Marek Drgoňa

Abstract: On web portals (such as blogs, forums, and e-shops) and social networks, users often express their opinions on certain topics or products. These opinions carry sentiment (either positive or negative), and its analysis is crucial in many domains in terms of business and marketing. Opinions are often summarized and evaluated manually; however, manual evaluation is time-consuming and laborious. A more efficient approach is automatic sentiment analysis.

In our work we will mainly focus on aspect-oriented sentiment analysis, specifically on automatic aspect detection. First, we will describe the state of the art in sentiment analysis and compare existing methods for aspect extraction. Second, we will create our own method for the Slovak language, using machine learning algorithms.


Personalized Hybrid Recommendation enhanced by Visual Features

Peter Gašpar

Abstract: Recommender systems have become an essential part of the Web in various domains. They suggest to users which things to read, which products to buy, or which movies to watch. They try to tailor these suggestions closely to user preferences in order to improve the overall user experience. However, in some scenarios, we need to deal with an insufficient amount of information about users, items, or their interactions.
Images are another essential part of many Web portals. They play an important role in extending or even replacing information about items (such as movie posters or product photos). Users’ decision process may be led by visual stimuli, and thus important information about an item is hidden in its image. Moreover, these images may also contain features that can be useful during user modelling and recommendation.
In our work, we study how to incorporate features extracted from images into recommender systems. We examine hybrid recommendation approaches that combine basic recommender system techniques and incorporate images when ranking the output recommendations.


Hybrid recommendation

Florian Chmelár

Abstract: Recommendation is a research field whose importance grows with the number of options a user has when choosing a product in any given domain. In the information age, recommender systems are very helpful, because an ordinary person experiences considerable information overload. They are equally helpful for sellers, because they offer customers products that the customers are more likely to actually like.

The most common recommendation technique is collaborative recommendation, which rates a product based on the ratings of similar users. However, it cannot generate quality suggestions in an environment where only a few interactions between users and products have been recorded. This is called the sparsity problem.
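To make the idea concrete, the following is a minimal sketch of user-based collaborative recommendation and of how sparsity affects it. It is not the method of the thesis: the toy ratings, the user and item names, and the choice of cosine similarity over co-rated items are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}. Missing items are unrated.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no similar user has rated the item, which is
    what happens more and more often as the data becomes sparse.
    """
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += sim
    return num / den if den > 0 else None
```

With these toy ratings, predict(ratings, "alice", "D") returns a value between the two known ratings of item D, weighted towards the more similar user. A user with no co-rated items gets None, which is the situation a content-based component is meant to cover.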

In this thesis, I want to create a hybrid recommender system that combines a collaborative recommender with the second most common type, the content-based recommender. A content-based recommender outputs products that share features with those the user has rated positively. I will analyze which features are worth combining with a standard collaborative recommender and then try to demonstrate their usefulness by implementing and testing a hybrid recommender system in a classic domain.


Linking and Cleaning of Open Government Data

Tomáš Mizera

Abstract: Open Government Data (OGD) is gaining popularity around the world. Governments and state institutions work hard to adopt OGD principles into their legislation and to disclose datasets to the public.

However, the process of disclosing government data is not automated and requires manual processing, which often introduces errors spread across datasets. Also, although there is a centralized platform for publishing raw datasets, some institutions tend to publish datasets on their own sites.

The goal of our project is to provide an intermediary platform where messy OGD published on several government platforms can be viewed in a human-friendly form. Moreover, we hope to increase the popularity of OGD and create a more transparent relationship between the government and its citizens.


Interpretability of Neural Network Models Used in Data Analysis

Branislav Pecher

Abstract: Interpretability is an integral part of every machine learning model. Without it, the chance that our model will be used in domains where the cost of error is huge, such as medicine, is nonexistent. How can the experts in these fields really trust that our model behaves correctly and according to their assumptions if we do not provide them with explanations of our model’s decisions? Neural network models are regarded as among the hardest models to interpret. This is exacerbated by recent advances that allow networks to become deeper and deeper, with a large number of layers. On the other hand, these models can capture the complexities of data more precisely than other models and therefore perform much better on specific tasks.

Our goal is to develop a method that assigns an importance factor to each feature for a specific decision and that takes into consideration the interactions present in the data. The main focus of this work is on developing this kind of method for text data that uses word embeddings.


User behavior on the Web: prediction of retention

Matej Schwartz

Abstract: Predicting customer behavior plays a significant role in the dynamic web environment. Knowing how satisfied customers are with services, content, or goods allows merchants to respond and take action to increase customer satisfaction. Conversely, if we know that a customer is definitely leaving, we do not need to spend more money on persuading them or providing them with services.
Through web applications, we can record user activity along with a variety of information that can be used to predict user behavior at various levels, whether that is user churn, subscription to a service, or the purchase of additional goods.
Analyze machine learning approaches that can be used to predict customer behavior, with an emphasis on the use of web application data. Explore the features that influence customer decision-making and maintain interest in the content offered in the selected application domain. Identify significant characteristics that influence customer decision-making, and/or analyze the content of the elements with which customers interact. Design a prediction method based on the identified attributes. Verify the proposed solution in the selected domain on a non-trivial sample of data (e-commerce, etc.).


Recommendation Taking into Account the Time Aspects of Users and Items

Elena Štefancová

Abstract: This work deals with recommender systems, namely those that focus on the integration of contextual aspects. Our aim is to improve recommender systems by incorporating temporal aspects as one of the important attributes in generating recommendations.
To this end, we have proposed a method that addresses two different domains in which recommendation is to be extended with temporal aspects. In addition, we aim to try out a specific style of recommendation that recommends collections of items rather than individual items. We believe that this approach can add value to current methods.


Personalized Product Recommendation for Users

Peter Tibenský

Abstract: This work deals with personalized recommendation in the domain of multimedia content, focusing on the adaptive selection of a recommendation method for a particular user or group of users. Conventional implementations of recommender systems use methods that are optimized for all users at once, which, given the diversity of users, can limit the resulting accuracy of the recommendations. The quality of recommendations in the field of commerce is key to product sales as well as user experience.
In this work, we focus on creating a hybrid recommendation method that chooses the most appropriate recommendation method for a user or a group of users with similar characteristics, in order to achieve better results than a single conventional recommendation method. Within the domain of multimedia content, we specialize in movie recommendation using a publicly available data set. In the course of this work we analyse the various recommendation techniques, the state of the art, and evaluation metrics, and we design and implement our own adaptive recommendation method based on switching hybrid recommendation, including the selection of suitable attributes that serve as the input for our method.
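As a rough illustration of the switching idea, the sketch below selects, for each user, whichever candidate method has the lowest error on that user's held-out ratings. This is only one possible selection criterion and is not necessarily the one used in the thesis. The two stand-in recommenders, the held-out ratings, and all names are hypothetical.

```python
# Hypothetical held-out ratings per user: user -> {movie: true rating}.
held_out = {
    "u1": {"m1": 5.0, "m2": 4.0},
    "u2": {"m1": 2.0, "m3": 1.0},
}


# Two stand-in recommenders (assumptions for illustration only).
def global_mean(user, movie):
    """Always predicts a fixed average rating."""
    return 3.0


def optimistic(user, movie):
    """Always predicts a high rating."""
    return 4.5


METHODS = {"global_mean": global_mean, "optimistic": optimistic}


def mean_abs_error(method, user, truth):
    """Mean absolute error of `method` on one user's held-out ratings."""
    return sum(abs(method(user, m) - r) for m, r in truth.items()) / len(truth)


def select_method(user, truth, methods=METHODS):
    """Switching step: name of the method with the lowest held-out error."""
    return min(methods, key=lambda name: mean_abs_error(methods[name], user, truth))


def recommend_score(user, movie, chosen):
    """Score a movie with the method previously selected for this user."""
    return METHODS[chosen[user]](user, movie)


# Per-user selection; the same step could be run per group of similar users.
chosen = {user: select_method(user, truth) for user, truth in held_out.items()}
```

Here u1, who rates highly, is routed to the optimistic predictor, while u2 is routed to the global mean. In a realistic setting the candidates would be full recommenders and the selection would be driven by user attributes rather than by two constant functions.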


Predicting Offer Popularity in E-commerce Environment

Miroslav Valčičák

Abstract: The increasing popularity of buying goods online is causing rapid growth in the number of e-shops. Many different sellers often offer very similar products that differ only in a few details, and thus buyers often make decisions based on less “product-related” and more “offer-related” attributes, such as the length of the description or the quality of illustrative photos. Usually, it takes some time to find out whether potential buyers consider an offer attractive, and meanwhile competitors may have offered a very similar product in a more attractive way. This means that whoever knows what attributes a popular offer should have, and when it should be published, rules the market. There are many works on predicting the popularity of items after they are published. In our work, we focus on creating a model that predicts a product’s popularity prior to its publication. First, we want to choose a suitable classifier, and then we want to provide recommendations for modifying the offer as well.


Analysis of User Feedback

Marek Wallner

Abstract: A user working on the web leaves traces behind. Traces can be conscious (explicit feedback) or unconscious (implicit feedback). As explicit feedback we consider assessments of items, for example like/unlike, percentage of satisfaction, or a comment. As implicit feedback we consider clicks, time spent on a web page, or saving an item to favorites. In our work we will focus on implicit feedback and its analysis. Several studies have focused on the processing and subsequent use of implicit feedback and found that although implicit feedback data are robust, they are often distorted by various influences. These influences include, for example, the position of an item on the page, highlighted words, and the summary of an item. In our work we select one of these influences and study it on data from an e-shop. The output of the work will be a quantitative assessment of the impact of such distortions on user behavior.
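One simple way to start quantifying such a distortion is sketched below, assuming for illustration that the selected influence is the position of an item on the page. The impression log and its format are hypothetical. A click-through rate per position alone does not separate position bias from genuine item relevance, so this is only a first descriptive step, not the analysis performed in the work.

```python
from collections import defaultdict

# Hypothetical impression log: (position on page, was the item clicked?).
log = [
    (1, True), (1, True), (1, False),
    (2, True), (2, False), (2, False),
    (3, False), (3, False),
]


def ctr_by_position(log):
    """Click-through rate for each page position found in the log."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for position, was_clicked in log:
        shown[position] += 1
        clicked[position] += int(was_clicked)
    return {pos: clicked[pos] / shown[pos] for pos in sorted(shown)}


ctr = ctr_by_position(log)
```

In this toy log the click-through rate falls as the position number grows, which is the kind of pattern a position effect would produce if the items shown at each position were comparable.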
