{"id":102,"date":"2016-04-05T10:02:05","date_gmt":"2016-04-05T08:02:05","guid":{"rendered":"http:\/\/www.pewe.sk\/datalys\/?p=102"},"modified":"2016-04-05T13:47:56","modified_gmt":"2016-04-05T11:47:56","slug":"a-survey-on-big-data-technologies-2016","status":"publish","type":"post","link":"https:\/\/www.pewe.sk\/datalys\/2016\/04\/05\/a-survey-on-big-data-technologies-2016\/","title":{"rendered":"A survey on Big Data technologies 2016 by Mat\u00fa\u0161 Cimerman"},"content":{"rendered":"<p style=\"text-align: center;\"><span style=\"font-weight: 400;\">Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it\u2026<\/span><\/p>\n<p style=\"text-align: right;\"><span style=\"font-weight: 400;\">(Dan Ariely)<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This post summarizes several survey articles about the current state of Big Data technologies; the list is included at the end of the article. In some places, I add my own observations and comments.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">I have met two groups of people. One says: &#8220;We are doing Big Data!&#8221; and the other: &#8220;There is no Big Data (in Slovakia<\/span><span style=\"font-weight: 400;\">)&#8221;. This led me to two fundamental questions. Why and how do you think you are doing Big Data? Why do you think there is no Big Data (in Slovakia)? I am not sure about the first question. But there certainly is Big Data in Slovakia. Just think of a telecommunications operator as an example.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Looking at Big Data as a whole, the early years were driven by a set of large, mainly Internet, companies, which were also the creators of the core Big Data technologies. 
For example, Google&#8217;s papers on MapReduce and the Google File System inspired <\/span><a href=\"http:\/\/hadoop.apache.org\/\"><span style=\"font-weight: 400;\">Hadoop<\/span><\/a><span style=\"font-weight: 400;\"> (created by Doug Cutting and Mike Cafarella, developed largely at Yahoo and now maintained under Apache), a framework for the distributed processing of large data sets across clusters of computers using simple programming models. Many of the best engineers from these companies went out on their own and established their own Big Data startups. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">This interesting and massive domain spans three core topics: statistics, machine learning and data mining. Each of these is an independent and extensive area with its own research problems. A key thing to understand is that Big Data is about assembling a set of technologies and processes. You need to capture data, store it, clean it, query it, analyze it and visualize it. Today, the main challenge is capturing and storing data. With massive amounts of data (e.g. terabytes or more per day) coming from data firehoses<\/span><span style=\"font-weight: 400;\">, storing all of it in raw form is a growing problem.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">I recall what some people predicted for today: that the last few years would be the years of natural language processing and image processing or recognition (using traditional methods). Were they? Certainly not. Just look at venture capital: Big Data startups received $6.64B in venture capital investment in 2015, 11% of total tech VC<\/span><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As for technologies, 2015 was the year of <\/span><a href=\"http:\/\/spark.apache.org\/\"><span style=\"font-weight: 400;\">Spark<\/span><\/a><span style=\"font-weight: 400;\"> without a doubt. Several indicators of Spark usage grew faster than linearly. 
Spark is an open-source framework whose core provides in-memory processing. Other exciting frameworks continue to gain momentum, such as Samza, Flink, Kudu, Mesos or Heron (not open-sourced yet), Twitter&#8217;s successor to Storm. In the world of databases there are also emerging technologies like Neo4j, CockroachDB or InfluxDB. Most of these technologies are released as open source by large Internet companies, just as at the beginning of the Big Data era. Startups and rising companies often build their business on top of these technologies. Figure 1 groups the popular technologies for 2016. Another <\/span><a href=\"http:\/\/dfkoz.com\/big-data-landscape\/\"><span style=\"font-weight: 400;\">full list of Big Data technologies<\/span><\/a><span style=\"font-weight: 400;\"> by Matt Turck lists the most important technologies to look into.<\/span><\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-large wp-image-103\" src=\"https:\/\/www.pewe.sk\/datalys\/wp-content\/uploads\/sites\/3\/2016\/04\/matt_turck_big_data_landscape_v11-1-1024x770.png\" alt=\"matt_turck_big_data_landscape_v11 (1)\" width=\"700\" height=\"526\" srcset=\"https:\/\/www.pewe.sk\/datalys\/wp-content\/uploads\/sites\/3\/2016\/04\/matt_turck_big_data_landscape_v11-1-1024x770.png 1024w, https:\/\/www.pewe.sk\/datalys\/wp-content\/uploads\/sites\/3\/2016\/04\/matt_turck_big_data_landscape_v11-1-300x226.png 300w, https:\/\/www.pewe.sk\/datalys\/wp-content\/uploads\/sites\/3\/2016\/04\/matt_turck_big_data_landscape_v11-1-768x577.png 768w, https:\/\/www.pewe.sk\/datalys\/wp-content\/uploads\/sites\/3\/2016\/04\/matt_turck_big_data_landscape_v11-1-200x150.png 200w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><\/p>\n<p style=\"text-align: center;\"><span style=\"font-weight: 400;\">Figure 1. 
Big Data landscape companies in 2016 (original available here <\/span><a href=\"http:\/\/mattturck.com\/wp-content\/uploads\/2016\/03\/Big-Data-Landscape-2016-v18-FINAL.png\"><span style=\"font-weight: 400;\">http:\/\/mattturck.com\/2016\/02\/01\/big-data-landscape\/<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The trend of the last few months is a focus on artificial intelligence\/machine learning, to build better analyses and predictions from massive amounts of data. In the business world, the goal is simply to deliver revenue or predictive market insights ahead of rivals. We can see this trend even in the most rapidly adopted Big Data framework, Spark, to which a machine learning library was recently added. We are still in an early, evolving phase of the Big Data phenomenon. AI\/machine learning and Big Data are now converging. This combination will drive innovation and research across various industries. From that perspective, the opportunity hidden in Big Data is far greater than we thought.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Finally, here are some predictions for Big Data in 2016. Many people say it will be the year of the Internet of Things and the corresponding Big Data analytics. But, for example, <\/span><span style=\"font-weight: 400;\">Gregory Piatetsky<\/span><span style=\"font-weight: 400;\">, President of <\/span><a href=\"http:\/\/www.kdnuggets.com\/\"><span style=\"font-weight: 400;\">KDNuggets<\/span><\/a><span style=\"font-weight: 400;\"> (I personally recommend subscribing to their newsletter if you are interested in Big Data, including all three domains mentioned at the top of the article), says:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u201c2016 will be the year of deep learning. 
It will move from experimental to deployed technology in image recognition, language understanding, and exceed human performance in many areas.\u201d<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">This article was written as a review of the following articles:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><a href=\"http:\/\/mattturck.com\/2016\/02\/01\/big-data-landscape\/\"><span style=\"font-weight: 400;\">Is Big Data Still a Thing? (The 2016 Big Data Landscape)<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\"><a href=\"http:\/\/www.kdnuggets.com\/2015\/12\/22-big-data-science-experts-predictions-2016.html\"><span style=\"font-weight: 400;\">22 Big Data &amp; Data Science experts predictions for 2016<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\"><a href=\"http:\/\/www.kdnuggets.com\/2016\/03\/3-telecom-developments-which-impact-iot-analytics.html\"><span style=\"font-weight: 400;\">3 Telecom Developments Which impact IoT Analytics<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\"><a href=\"http:\/\/www.kdnuggets.com\/2016\/02\/spark-tipping-point.html\"><span style=\"font-weight: 400;\">Why Spark Reached the Tipping Point in 2015<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\"><a href=\"http:\/\/www.kdnuggets.com\/2016\/03\/top-big-data-processing-frameworks.html\"><span style=\"font-weight: 400;\">Top Big Data Processing Frameworks<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\"><a href=\"http:\/\/www.kdnuggets.com\/2015\/12\/spark-deep-learning-training-with-sparknet.html\"><span style=\"font-weight: 400;\">Spark + Deep Learning: Distributed Deep Neural Network Training with SparkNet<\/span><\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, 
[&hellip;]<\/p>\n","protected":false},"author":15,"featured_media":110,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[7],"tags":[],"_links":{"self":[{"href":"https:\/\/www.pewe.sk\/datalys\/wp-json\/wp\/v2\/posts\/102"}],"collection":[{"href":"https:\/\/www.pewe.sk\/datalys\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pewe.sk\/datalys\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pewe.sk\/datalys\/wp-json\/wp\/v2\/users\/15"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pewe.sk\/datalys\/wp-json\/wp\/v2\/comments?post=102"}],"version-history":[{"count":5,"href":"https:\/\/www.pewe.sk\/datalys\/wp-json\/wp\/v2\/posts\/102\/revisions"}],"predecessor-version":[{"id":112,"href":"https:\/\/www.pewe.sk\/datalys\/wp-json\/wp\/v2\/posts\/102\/revisions\/112"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pewe.sk\/datalys\/wp-json\/wp\/v2\/media\/110"}],"wp:attachment":[{"href":"https:\/\/www.pewe.sk\/datalys\/wp-json\/wp\/v2\/media?parent=102"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pewe.sk\/datalys\/wp-json\/wp\/v2\/categories?post=102"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pewe.sk\/datalys\/wp-json\/wp\/v2\/tags?post=102"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}