Abstract
The purpose of this briefing paper is to look at Big Data Analytics at several levels: its evolution and its popularity in recent media, the main definitions from the key players in the field, and the most commonly heard technology terms, explained in simple terms. Lastly, it touches on the challenges that Big Data brings to the business world, our personal lives and society.
Table of Contents
Where is Big Data coming from?
Why has Big Data become so popular?
What are the main characteristics of Big Data?
What are the key technologies behind Data Analytics?
What are the three key challenges faced by businesses?
What are the three paradoxes of Big Data?
Conclusion
References
According to Facebook's fourth amendment to its S-1 filing with the US Securities and Exchange Commission (SEC) in April 2012, Facebook had 901 million monthly active users as at Q1 2012, an increase of 33% compared with 680 million as at Q1 2011 (Facebook, Inc., 2012). Facebook's ability to analyse all of its weblogs and clicks has gained it a very large customer portfolio. This is most likely the biggest reason why so many companies want to invest in big data.
Volume: refers to the size of the data. With technological advancements and the growing use of social media, the amount of data has grown rapidly. Data volumes now range from terabytes to petabytes. Facebook recently revealed that the new system it is developing will be able to process its 250 PB of data (Novet, 2013).

Velocity: refers to the speed at which the data is being generated. Different applications output data at different rates, and in today's business world decision makers want information to be ready as quickly as possible. As an extreme case, the Large Hadron Collider experiments at CERN involve about 150 million sensors delivering data 40 million times per second (CERN, 2008, p. 45).

Variety: refers to the different formats in which data is stored. Data formats can be classified into two broad categories:

Structured data: has a pre-defined data model, schema or structure and is often relational in nature. The main advantage of structured data is that it can be managed easily. Examples: data in relational databases, data from CRM systems, XML files, etc.

Unstructured data: does not have a well-defined data model or does not fit well into the relational world. Examples: flat files, spreadsheets, Word documents, emails, images, audio files, video files, feeds, PDF files, scanned documents, etc.
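The three characteristics above can be made concrete with a little arithmetic and a toy comparison. The following is a minimal Python sketch, not taken from any of the cited sources. It assumes decimal units (1 PB = 1,000 TB) and reads the CERN figure as every sensor reporting at the full rate, which is a simplification. The customer record, its field names, the sample message and the `mentions` helper are all hypothetical.

```python
import json

# Volume: the 250 PB figure quoted above, expressed in terabytes.
# Assumption: decimal units, 1 PB = 1,000 TB.
PETABYTES = 250
terabytes = PETABYTES * 1_000

# Velocity: about 150 million sensors, each assumed to deliver
# data 40 million times per second (a simplifying assumption).
SENSORS = 150_000_000
READINGS_PER_SENSOR_PER_SECOND = 40_000_000
readings_per_second = SENSORS * READINGS_PER_SENSOR_PER_SECOND

# Variety, structured case: the record follows a known schema,
# so a field can be read directly by name.
# The record and its field names are hypothetical.
structured = json.loads('{"customer_id": 42, "country": "AU", "spend": 19.99}')
country = structured["country"]

# Variety, unstructured case: free text has no schema, so even a
# simple question needs ad hoc parsing of the raw content.
unstructured = "Hi, this is customer 42 writing from Australia about my last order."


def mentions(text: str, keyword: str) -> bool:
    """Naive case-insensitive keyword search, standing in for real text analytics."""
    return keyword.lower() in text.lower()


print(f"{terabytes:,} TB")
print(f"{readings_per_second:.1e} readings per second")
print(country, mentions(unstructured, "australia"))
```

The structured record answers "which country?" with a single lookup, while the free-text message needs a search that can easily miss or misread the answer. That difference is why unstructured data is harder to manage.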
A snapshot from the Big Data Tools Hadoop presentation by S. S. Mulay (2013) shows the many components that can be considered part of the Hadoop ecosystem.

Global organisations use Hadoop in different ways:
Amazon and eBay use Hadoop for search engine optimisation and research.
Facebook uses Hadoop for reporting, analytics and machine learning.
The University of North Carolina (UNC) uses Hadoop to analyse Next Generation Sequencing data produced for The Cancer Genome Atlas (TCGA) project.

Finally, many vendors offer their own Hadoop distributions, including:
CDH: Hadoop distribution from Cloudera
Greenplum: Hadoop distribution from EMC
HDP: Hortonworks Data Platform
InfoSphere: Hadoop distribution from IBM
Conclusion
In this briefing paper on Big Data and Analytics, I have covered the following:
The evolution of big data from structured to unstructured data.
The Big Data hype: its increased popularity following the significant drop in storage costs and technological advances in business intelligence.
The Facebook example of attracting many customers and thereby substantially increasing profit, which has become one of the biggest driving forces for businesses to enter Big Data Analytics.
Definitions from the key researchers in the field and the 3Vs of Big Data: Volume, Variety and Velocity.
Simple explanations of common technology terms used in the field, such as Hadoop and Hive, with the components of the Hadoop ecosystem displayed in a chart.
The key vendor variations of Hadoop and popular uses of Hadoop by international organisations.
Three big challenges faced by businesses when moving into the Big Data Analytics field.
Three paradoxes of big data that can affect our everyday lives at a personal and public level.
References
Domo, 2013. Data Never Sleeps. [Art] Available at: http://www.domo.com/blog/2012/06/how-much-data-is-created-every-minute/ [viewed 11 January 2014].
CERN, 2008. LHC the guide, s.l.: CERN. [viewed 12 January 2014].
Facebook, Inc., 2012. Amendment No. 4 to S-1, Washington D.C.: United States Securities and Exchange Commission. [viewed 15 January 2014].
Gantz, J. & Reinsel, D., 2011. Extracting Value from Chaos, IDC. [viewed 12 January 2014].
Richards, N. M. & King, J. H., 2013. Three Paradoxes of Big Data. Stanford Law Review Online, 3 September, 66(41), p. 6.
Novet, J., 2013. Facebook unveils Presto engine for querying 250 PB data warehouse. [Online] Available at: http://gigaom.com/2013/06/06/facebook-unveils-presto-engine-for-querying-250-pb-data-warehouse/ [viewed 15 January 2014].
O'Reilly Radar Team, 2012. Planning for Big Data. California: O'Reilly Media.
SAS, 2011. Big Data Meets Big Data Analytics, United States: SAS.