Students’ Research Works – Spring 2019: Data Analysis, Mining and Machine Learning (PeWe.Data)

Patrik Blanárik: Meta-recommending: Adaptive Selection of Personalized Recommendation Algorithms
Filip Dresto: Characterizing Fake News and Its Spread by Data Analysis
Marek Drgoňa: Aspect detection for sentiment analysis
Peter Gašpar: Personalized Hybrid Recommendation enhanced by Visual Features
Florian Chmelár: Hybrid recommendation
Jakub Janeček: Interpretability of machine learning models created by clustering algorithms
Tomáš Mizera: Linking and Cleaning of Open Government Data
Branislav Pecher: Interpretability of Neural Network Models Used in Data Analysis
Matej Schwartz: User behavior on the Web: prediction of retention
Elena Štefancová: Recommendation Taking into Account the Time Aspects of Users and Items
Peter Tibenský: Personalized Product Recommendation for Users
Nikolas Tomaštík: Personalized Web Recommendations
Miroslav Valčičák: Predicting Offer Popularity in E-commerce Environment
Marek Wallner: Analysis of User Feedback

Meta-recommending: Adaptive Selection of Personalized Recommendation Algorithms

Patrik Blanárik

Abstract: Recommender systems have become an essential part of the Web in many domains. Research on this topic over the last few years has led to the design of a large variety of algorithms capable of generating personalized recommendations. With so many algorithms available, however, selecting the one to use for generating recommendations in a specific situation can be a challenging task. Common practice is to try multiple algorithms and then choose the one that performs best. However, this approach requires a non-trivial amount of time and effort.

In our work, we study the algorithm selection problem in the recommender systems domain. Our goal is to design a solution that recommends which algorithm to use in a given situation. Understanding what affects the performance of different algorithms is crucial for our approach. Therefore, our first task is to find out which data, domain, and user characteristics cause some algorithms to succeed while others fail.


Characterizing Fake News and Its Spread by Data Analysis

Filip Dresto

Abstract: Nowadays, information spreads over the Internet at great speed. However, the amount of information and the speed of its dissemination also bring disadvantages. Fake news is becoming a growing problem not only worldwide but also in Slovakia. Fake news contains false or inaccurate information and facts. Given that information generally affects the opinions, attitudes and actions of people, fake news can have a negative impact not only on the individual but also on the wider society.

After analyzing the problem of fake news, our work focused on the differences between articles coming from reliable and unreliable sources. Based on the analysis, we have designed a method that looks for differences in their sentiment, readability, and number of social network shares. We verified the method on a dataset of articles from both reliable and unreliable sources. We have found that there are statistically significant differences in sentiment and readability between articles from reliable and unreliable sources.


Aspect detection for sentiment analysis

Marek Drgoňa

Abstract: Nowadays, users are accustomed to expressing their opinions on various topics or products. Sentiment analysis is the process of summarizing and analyzing reviews, which is crucial from the perspective of business and marketing. In this paper, we focus on aspect-level sentiment analysis, which consists of several steps.

Our main goal is to propose a method for automatic aspect extraction in the Slovak language. First, we implement a simple method based on TF-IDF statistics. Second, we employ a machine learning algorithm. We evaluate our method on data from student feedback on education at FIIT STU.
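
To illustrate the TF-IDF step, the following is a minimal sketch of how candidate aspect terms might be ranked; it is not the author's implementation. The function name `tfidf_top_terms`, the whitespace tokenization, and the English sample reviews are assumptions made for illustration only (the actual work targets Slovak student feedback).

```python
import math
from collections import Counter


def tfidf_top_terms(documents, top_k=3):
    """Rank terms by their highest TF-IDF score across all documents."""
    # Naive tokenization (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term occurs in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    best = {}
    for tokens in tokenized:
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            best[term] = max(best.get(term, 0.0), tf * idf)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]


# Hypothetical feedback snippets, not taken from the FIIT STU data.
reviews = [
    "the lectures were clear and the lectures were useful",
    "the exam was hard and the exam was long",
    "the labs were useful and the teacher was helpful",
]
top_terms = tfidf_top_terms(reviews, top_k=2)
print(top_terms)
```

Terms that occur in every document, such as "the", receive an IDF of zero and drop to the bottom of the ranking, which is why frequent but uninformative words are not proposed as aspects.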


Personalized Hybrid Recommendation enhanced by Visual Features

Peter Gašpar

Abstract: Recommender systems have become an essential part of the Web in various domains. They provide suggestions for users about which things to read, which products to buy, or which movies to watch. They try to tailor suggestions closely to user preferences in order to improve the overall user experience. However, in some scenarios, we need to deal with an insufficient amount of information about users, items, or their interactions.
Images are another essential part of many Web portals. They play an important role in extending or even replacing information about items (such as movie posters or product photos). Users’ decision process may be led by visual stimuli, and thus important information about the item is hidden in the image. Moreover, these images may also contain features that can be useful during user modelling and recommendation.
In our work, we study the problem of incorporating features extracted from images in the recommender systems domain. We examine hybrid recommendation approaches that combine basic recommender system techniques and incorporate images when ranking the output recommendations.


Hybrid recommendation

Florian Chmelár

Abstract: Recommendation is a research field whose importance grows with the number of options a user has when choosing a product in any given domain. In the information age, recommender systems are very helpful because the average person experiences considerable information overload. They are similarly helpful for sellers, because they offer customers products with a higher probability of being liked.

The most common recommendation technique is collaborative recommendation, which rates a product based on the ratings of similar users. However, it cannot generate quality suggestions in an environment where only a few interactions between users and products have been recorded. This is known as the sparsity problem.

In this thesis, I want to create a hybrid recommender system that combines a collaborative recommender with the second most common type of recommender, the content-based one. A content-based recommender outputs products that share features with those the user has rated positively. I will analyze which features are worth combining with a standard collaborative recommender and then try to demonstrate their usefulness by implementing and testing a hybrid recommender system on a classic domain.
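
As a small illustration of the sparsity problem, the sketch below measures the fraction of user-item pairs that carry no rating. The function name `rating_sparsity` and the toy rating triples are hypothetical and are not part of the thesis.

```python
def rating_sparsity(ratings, n_users, n_items):
    """Return the fraction of user-item pairs without a recorded rating."""
    # Count each (user, item) pair once, even if it was rated repeatedly.
    observed = len({(user, item) for user, item, _ in ratings})
    return 1.0 - observed / (n_users * n_items)


# Hypothetical (user_id, item_id, rating) triples: 4 users, 5 items.
ratings = [(0, 0, 5), (0, 2, 3), (1, 1, 4), (2, 3, 2)]
sparsity = rating_sparsity(ratings, n_users=4, n_items=5)
print(f"sparsity = {sparsity:.2f}")
```

Here only 4 of the 20 possible user-item pairs are rated, so a collaborative recommender has very little overlap between users to work with.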


Interpretability of machine learning models created by clustering algorithms

Jakub Janeček

Abstract: Machine learning has become a standard approach to solving many research problems as well as real-life problems. The difference between these two areas is that, for research-oriented problems, it is not strictly necessary to be able to explain the created model and its decisions. However, if we want our model to be used in other areas and want its users to trust its decisions, we must be able to explain even complex models.

In our work, we focus on clustering, a field that has been less researched from this angle. Specifically, we focus on finding differences between two segments of data based on feature importance. For this task, we use topological data analysis as a segmentation tool and regularization of linear models as a tool for determining feature importance. More specifically, we use logistic regression with L1 regularization as a surrogate model; it yields a sparse vector of feature weights, which we in turn use to interpret the clustering (segmentation) model.
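
The surrogate idea can be sketched as follows, assuming the two segments have already been found (the topological data analysis step is not reproduced here). This is a plain-Python proximal gradient fit written for illustration, not the author's code; the function names, the hyperparameters `l1`, `lr`, and `steps`, and the toy data are all assumptions.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a value towards zero; small values become exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(rows, labels, l1=0.2, lr=0.5, steps=500):
    """Fit L1-regularized logistic regression by proximal gradient descent."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(steps):
        # Predicted probability of belonging to segment 1 for every row.
        probs = [
            1.0 / (1.0 + math.exp(-(bias + sum(w * x for w, x in zip(weights, row)))))
            for row in rows
        ]
        errors = [p - y for p, y in zip(probs, labels)]
        bias -= lr * sum(errors) / n  # the bias is not regularized
        for j in range(len(weights)):
            grad = sum(e * row[j] for e, row in zip(errors, rows)) / n
            # Gradient step followed by the L1 proximal (shrinkage) step.
            weights[j] = soft_threshold(weights[j] - lr * grad, lr * l1)
    return weights, bias


# Hypothetical data: feature 0 separates the segments, feature 1 is weak noise.
rows = [(-1.0, 0.10), (-1.0, -0.05), (-0.8, 0.02),
        (1.0, -0.10), (0.9, 0.08), (1.1, 0.03)]
segments = [0, 0, 0, 1, 1, 1]
weights, bias = fit_l1_logistic(rows, segments)
print(weights)
```

The L1 penalty drives the weight of the uninformative feature to exactly zero, so the remaining non-zero weight points to the feature that distinguishes the two segments.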


Linking and Cleaning of Open Government Data

Tomáš Mizera

Abstract: Open Government Data (OGD) is gaining more and more popularity around the world. Governments and state institutions work hard to adopt OGD principles into their legislation and disclose datasets to the public.

However, the process of disclosing government data is not automated and requires manual processing, which often introduces errors spread across datasets. Also, although there is a centralized platform for publishing raw datasets, some institutions tend to publish datasets on their own sites.

The goal of our project is to provide an intermediary platform where messy OGD published on several government platforms can be viewed in a human-friendly form. Moreover, we hope it will increase the popularity of OGD and create a more transparent relationship between the government and its citizens.


Interpretability of Neural Network Models Used in Data Analysis

Branislav Pecher

Abstract: Interpretability is an integral part of every machine learning model. Without it, the chance that our model will be used in domains where the cost of error is huge, such as medicine, is non-existent. How can the experts in these fields really trust that our model behaves correctly and according to their assumptions if we do not provide them with explanations of the model's decisions? Neural network models are regarded as some of the hardest models to interpret. This is exacerbated by recent advances that allow the networks to become much deeper, using an ever higher number of layers. On the other hand, these models can capture the complexities of data more precisely than other models and therefore perform much better on specific tasks.

Our goal is to develop a method that assigns an importance factor to each feature for a specific decision and that takes into consideration the interactions present in the data. The main focus of this work is on developing such a method for text data represented using word embeddings.


User behavior on the Web: prediction of retention

Matej Schwartz

Abstract: Predicting customer behavior plays a significant role in the dynamic web environment. Knowing how satisfied customers are with services, content, or goods helps merchants respond and take action to increase customer satisfaction. Conversely, if we know that a customer is definitely leaving, we do not need to spend more money on persuading them or providing services.

Through web applications, we can record user activity along with a variety of information that can be used to predict user behavior at several levels: user churn, subscription to a service, or the purchase of additional goods.

Analyze machine learning approaches that can be used to predict customer behavior with an emphasis on the use of web application data. Explore the features that influence customer decision-making and maintain interest in the content offered in the selected application domain. Identify significant characteristics that influence customer decision-making and/or content analysis of the elements with which they interact. Design a prediction method based on identified attributes. Verify the suggested solution in the selected domain on a non-trivial sample of data (e-commerce, etc.).


Recommendation Taking into Account the Time Aspects of Users and Items

Elena Štefancová

Abstract: This work deals with time-aware recommender systems in the domain of location-based social networks, such as Yelp or Foursquare. We propose a novel method for recommending Points-of-Interest (POIs) that considers their seasonality and long-term trends. In contrast to existing methods, we model these temporal aspects specifically for individual geographical areas instead of globally.

In addition, a geographical post-filter method is used for creating personal regions of users. The preliminary results show that consideration of locality-specific seasonality and long-term trends in categories’ popularity can improve the performance of the proposed recommender system.


Personalized Product Recommendation for Users

Peter Tibenský

Abstract: This work deals with the issue of personalized recommendations to users in the domain of multimedia content, focusing on the adaptive selection of a recommendation method for a particular user or group of users. Conventional implementations of recommender systems use methods that are optimized for all users, which, due to the diversity of users, can limit the resulting accuracy of the recommendation. The quality of recommendations in the field of commerce is key to product sales as well as user experience.

In this work, we focus on creating a hybrid recommendation method that chooses the most appropriate recommendation method for a user or a group of users with similar characteristics, in order to achieve better results than a single conventional recommendation method. In the domain of multimedia content, we specialize in movie recommendation using a publicly available dataset. In the course of this work, we analyse various recommendation techniques, the state of the art, and evaluation metrics, and we design and implement our own adaptive recommendation method based on switching hybrid recommendation, including the selection of suitable attributes that serve as input for our method.
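
A switching hybrid can be sketched as a selector that routes each user to one of several recommenders. The example below is only an illustration under stated assumptions: the switching attribute (number of ratings), the threshold `min_ratings`, the two toy recommenders, and the sample data are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def popularity_recommender(user, ratings, k):
    """Recommend the most frequently rated items the user has not rated."""
    seen = ratings.get(user, {})
    counts = Counter(item for rated in ratings.values() for item in rated)
    candidates = [(-count, item) for item, count in counts.items() if item not in seen]
    return [item for _, item in sorted(candidates)[:k]]


def neighbour_recommender(user, ratings, k):
    """Recommend unseen items from the user sharing the most rated items."""
    seen = ratings[user]
    others = [other for other in ratings if other != user]
    neighbour = max(others, key=lambda o: (len(set(ratings[o]) & set(seen)), o))
    unseen = [(-score, item) for item, score in ratings[neighbour].items()
              if item not in seen]
    return [item for _, item in sorted(unseen)[:k]]


def switching_hybrid(user, ratings, k=2, min_ratings=2):
    """Pick a recommender per user based on how many ratings the user has."""
    if len(ratings.get(user, {})) < min_ratings:
        return "popularity", popularity_recommender(user, ratings, k)
    return "neighbour", neighbour_recommender(user, ratings, k)


# Hypothetical movie ratings on a 1-5 scale.
ratings = {
    "ann": {"Alien": 5, "Brazil": 4, "Casablanca": 2},
    "bob": {"Alien": 4, "Brazil": 5, "Dune": 5},
    "eve": {"Alien": 3},
}
print(switching_hybrid("eve", ratings))
print(switching_hybrid("ann", ratings))
```

In this sketch, a user with too few ratings falls back to popularity, while a user with enough history is served by the neighbour-based method; a real selector would be driven by the attributes chosen in the work.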


Personalized Web Recommendations

Nikolas Tomaštík

Abstract: The development of the Internet and technology has caused a huge increase in the amount of data, which has become unmanageable for the user. Solving this information overload has become the main motivation for creating recommender systems. Recommender systems make users’ decision-making easier by replacing their information discovery process. A recommender suggests personalized recommendations based on behavior and personality, resulting in items that could be interesting to the user. Such recommendation filters out a large amount of data that is irrelevant to the user.

Our method focuses on the Yelp.com domain, where people rate businesses and places. This domain suffers from data sparsity because users tend not to leave ratings for many places. We try to reduce this sparsity and improve recommendation with a hybrid system based on clustering with the k-Nearest Neighbours algorithm.


Predicting Offer Popularity in E-commerce Environment

Miroslav Valčičák

Abstract: The increasing popularity of buying goods online causes rapid growth in the number of e-shops. Many different sellers often offer similar products that differ only in a few details, and thus buyers often make decisions based on less “product-related” and more “offer-related” attributes such as the length of the description or the quality of illustrative photos.

Usually, it takes some time to find out if an offer is considered attractive by potential buyers and meanwhile some competitors could have offered a very similar product in a more attractive way. This means the one who knows what attributes a popular offer should have and when it should be published rules the market.

There are many works on predicting the popularity of items after they are published. In our work, we focus on creating a model that predicts a product's popularity prior to its publication. First, we want to choose a proper classifier, and then we want to provide recommendations for modifying an offer as well.


Analysis of User Feedback

Marek Wallner

Abstract: A user working on the Web leaves traces behind. Traces can be conscious (explicit feedback) or unconscious (implicit feedback). As explicit feedback we consider item assessments, for example like/unlike, percentage of satisfaction, or a comment. As implicit feedback we consider clicks, time spent on a web page, or saving an item to favorites. In our work, we focus on implicit feedback and its analysis.

Several studies focused on the processing and subsequent use of implicit feedback and found that although implicit feedback data are robust, they are often distorted by various influences. Examples of such influences include the position of an item on the page, highlighted words, and the item summary.

In our work, we select one of these influences and apply it to data from an e-shop. The output of the work will be a quantitative assessment of the impact of these distortions on the behavior of the user.
