
BIG DATA ANALYTICS

Assignment 1 - Briefing Paper

Abstract
The purpose of this briefing paper is to look at Big Data Analytics from different angles: its evolution and its popularity in recent media, the main definitions from the key players in the field, and the most commonly heard technology terms, explained in simple terms. Lastly, it touches on the challenges that Big Data brings to the business world, our personal lives and society.

Begum Bolu 6623433


Bachelor of Information Communication Technology Swinburne University of Technology

Table of Contents
Where is Big Data coming from?
Why did Big Data become so popular?
What are the main characteristics of Big Data?
What are the key technologies behind Data Analytics?
What are the three key challenges faced by businesses?
What are the three paradoxes of Big Data?
Conclusion
References

Where is Big Data coming from?


Traditional medium to large corporations usually have Business Intelligence and Analytics functions. They process business data that comes from transactional systems such as Billing and Customer Management Systems, Logistics and Shipping Systems, or combinations of those systems such as ERP. As technology evolved, more kinds of data came into the picture: web logs, clickstreams, videos, images, sensor readings and geo-tags are just a few examples, many of which come from social media channels. Indeed, social media giants (Facebook, Twitter, YouTube, Instagram, et al.) are the biggest content generators through their millions of users around the world. In one minute, Google receives over 2,000,000 search queries, Facebook users share 684,478 pieces of content, Twitter users send over 100,000 tweets and Instagram users share 3,600 new photos.

Source: Data Never Sleeps 2013 by Domo
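To put the per-minute figures into perspective, they can be scaled up to a full day. The short Python sketch below does only that arithmetic. It assumes the quoted rates stay constant over 24 hours, which is a simplification, and the names PER_MINUTE and per_day are illustrative only.

```python
# Per-minute figures quoted above (Domo, "Data Never Sleeps").
PER_MINUTE = {
    "Google search queries": 2_000_000,
    "Facebook content items shared": 684_478,
    "Tweets sent": 100_000,
    "Instagram photos shared": 3_600,
}

MINUTES_PER_DAY = 24 * 60  # 1,440 minutes


def per_day(per_minute):
    """Scale per-minute counts to one day, assuming a constant rate (a simplification)."""
    return {name: count * MINUTES_PER_DAY for name, count in per_minute.items()}


if __name__ == "__main__":
    for name, total in per_day(PER_MINUTE).items():
        print(f"{name}: {total:,} per day")
```

Under that assumption, Google alone would handle close to 2.9 billion search queries per day.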

Why did Big Data become so popular?


Big data is a buzzword that seemed to appear from nowhere. In reality, big data isn't new. Instead, it is something that is moving into the mainstream and grabbing large amounts of attention for good reason. Gantz J. and Reinsel D. from IDC [2011, p. 6] identified that big data is enabled by inexpensive storage, improvements in sensor and data capture technology, increasing connections to information through the cloud, virtualized storage infrastructures, and innovative software and analysis tools. Big data is not a "thing" but rather a dynamic or activity that crosses many IT borders.

Five years ago, only big businesses such as Walmart, Google or large financial institutions could afford to profit from big data. Today, thanks to an open source project called Hadoop, commodity Linux hardware and cloud computing, this power is within reach for all. [Edd Dumbill 2012, p. 8] Most companies now recognize that they have opportunities to use data and analytics to raise productivity, improve decision making and gain competitive advantage. Facebook, for example, analysed its user logs to identify the top two actions which, once a user has completed them, make that user 95% likely to log back in to Facebook.

According to Facebook's fourth amendment to its S-1, filed with the US Securities and Exchange Commission (SEC) in April 2012, Facebook had 901 million monthly active users as at Q1 2012, an increase of 33% compared to 680 million as at Q1 2011. (Facebook, Inc., 2012) Facebook's ability to analyse all of its weblogs and clicks has gained it a very large customer base. This is most likely the biggest reason why so many companies want to invest in big data.
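The growth figure can be checked from the two user counts. The Python sketch below is only an illustration of the calculation, and percent_growth is a hypothetical helper name. On the rounded figures quoted above the result is about 32.5%, which is in line with the 33% reported in the filing.

```python
def percent_growth(previous, current):
    """Return the percentage change from previous to current."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous * 100


# Monthly active users in millions, as quoted above.
mau_q1_2011 = 680
mau_q1_2012 = 901

growth = percent_growth(mau_q1_2011, mau_q1_2012)
print(f"Year-on-year growth: {growth:.1f}%")
```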

What are the main characteristics of Big Data?


According to SAS (2011), big data is a relative term describing a situation where the volume, velocity and variety of data exceed an organization's storage or compute capacity for accurate and timely decision making. Gantz J. and Reinsel D. from IDC [2011, p. 6] define big data this way: "Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis." When we analyse both definitions, we can see that they touch on the same three characteristics: a large volume of data, a high velocity of data and a wide variety of data.

Volume: refers to the size of the data. With technological advancements and the growing use of social media, the amount of data has grown rapidly. Data volumes now range from terabytes to petabytes. Facebook recently revealed that the new system it is developing will be able to process its 250 PB of data. (Novet, 2013)

Velocity: refers to the speed at which the data is generated. Different applications output data at different rates, and in today's business world decision makers want information ready as quickly as possible. As an extreme case, the experiments at CERN's Large Hadron Collider involve about 150 million sensors delivering data 40 million times per second. [CERNBrochure-FAQ-LHC-the guide, 2008, p.45]

Variety: refers to the different formats in which data is stored. Data formats can be classified into two broad categories:

Structured data: has a pre-defined data model, schema or structure and is often relational in nature. The main advantage of structured data is that it can be easily managed. Examples: data in relational databases, data from CRM systems, XML files, etc.

Unstructured data: doesn't have a well-defined data model or does not fit well into the relational world. Examples: flat files, spreadsheets, Word documents, emails, images, audio files, video files, feeds, PDF files, scanned documents, etc.
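The practical difference between the two categories can be illustrated with a small Python sketch. The sample data, field names and helper functions below are all hypothetical and only meant to show the idea: a structured record can be queried by field name straight away, while free text first needs some form of extraction, and even then the results require interpretation.

```python
import csv
import io
import re

# Structured: every row follows a schema known in advance (hypothetical CRM export).
STRUCTURED = """customer_id,name,city
101,Alice,Melbourne
102,Bob,Sydney
"""

# Unstructured: free text with no pre-defined fields (hypothetical email body).
UNSTRUCTURED = "Hi team, Alice from Melbourne called on 12 January about her invoice."


def read_structured(text):
    """Parse rows by column name; the header line supplies the schema."""
    return list(csv.DictReader(io.StringIO(text)))


def find_capitalised_words(text):
    """Crude extraction from free text: every capitalised word is a candidate entity."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)


rows = read_structured(STRUCTURED)
print(rows[0]["city"])                        # direct lookup by field name
print(find_capitalised_words(UNSTRUCTURED))   # candidates that still need interpretation
```

The second call returns greetings and month names alongside the customer and city, which shows why unstructured data is harder to manage than data with a schema.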

What are the key technologies behind Data Analytics?


At the heart of many big data solutions is Apache Hadoop. Hadoop is a system for distributing computation among commodity servers. It is often used with the Hadoop Hive project, which layers data warehouse technology on top of Hadoop, enabling ad-hoc analytical queries. [Edd Dumbill, 2012, p. 24] Essentially Hadoop houses a collection of technologies that can perform the processing of mammoth amounts of data.
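To give a feel for what distributing computation means, the sketch below imitates the MapReduce pattern, the programming model Hadoop is best known for, although it is not named in the text above. This is an assumption-laden illustration: it runs in a single Python process, does not use any Hadoop API, and all function names are hypothetical. On a real cluster each chunk of text would be processed on a different commodity server.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the values collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Here the chunks are mapped one after another in a loop;
    # in a cluster each chunk could be mapped on a separate server.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


chunks = ["big data is big", "data is everywhere"]
counts = word_count(chunks)
print(counts)
```

Because each chunk is mapped independently, more servers can be added as the data grows, which is what makes commodity hardware sufficient.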

A quick snapshot from the Big Data Tools Hadoop presentation by [S.S.Mulay, 2013] shows the many components that can be considered part of the Hadoop ecosystem.

Global organisations use Hadoop in different ways. Amazon and eBay use Hadoop for search engine optimization and research. Facebook uses Hadoop for reporting, analytics and machine learning. The University of North Carolina (UNC) uses Hadoop for analysing Next Generation Sequencing data produced for The Cancer Genome Atlas (TCGA) project.

Finally, many vendors have integrated Hadoop into their own distributions, including:
CDH - Hadoop distribution from Cloudera
Greenplum - Hadoop distribution from EMC
HDP - Hortonworks Data Platform
InfoSphere - Hadoop distribution from IBM

What are the three key challenges faced by businesses?


Big Data and Advanced Analytics will become one of the top issues for business leaders around the world and will define the difference between the winners and losers going forward. To become a data-driven decision maker in your field, you must be willing to change the way you decide; otherwise all of the data and analytics will not solve any problems, says Tim McGuire, a McKinsey director. [Making data analytics work: Three key challenges, 2013]

According to McKinsey, there are three big challenges when moving into Big Data Analytics:
1. Determining which data to analyse: Businesses have not only internal data from customer and billing systems but also geographic and web data collected from online marketing tools. Combining these sources makes the first question, which data to use, more complex.
2. Analytics modelling and staff up-skilling: Building analytics models is a maths-intensive exercise, and getting the right skilled people to do the modelling is part of the challenge.
3. Transforming the business: Business operations and decision making have to be transformed based on data analytics. There is no point in using analytics if the business is not going to change the way it decides.

What are the three paradoxes of Big Data?


While the challenges above are issues that need to be addressed by business leaders, the Stanford Law Review Online [Richards & King, 2013] has published the following three paradoxes of Big Data, which raise concerns in the public eye:
1. The Transparency Paradox
2. The Identity Paradox
3. The Power Paradox

First, while big data persistently collects all kinds of private information, the operations of big data itself are almost entirely covered in legal and commercial secrecy. This is called the Transparency Paradox.

Second, big data seeks to identify, but it also threatens identity. With even the most basic access to a combination of big data pools such as phone records, surfing history, buying history and social networking posts, "I am" and "I like" risk becoming "you are" and "you will like". Every Google user is already influenced by Google's tailored search results, which risk producing individuals' own echo chambers of thought. This is called the Identity Paradox.

Third, big data is characterized by its power to transform society. It is advertised as a powerful tool that enables its users to view a sharper and clearer picture of the world. For example, many Arab Spring protesters and commentators credited social media with helping protesters to organize. Yet big data will create winners and losers, and it is likely to benefit the institutions that use its tools over the individuals being mined, analysed and sorted. This is called the Power Paradox.

Conclusion
In this briefing paper on Big Data and Analytics, I have covered the following:
The evolution of big data from structured to unstructured data.
The Big Data hype: its increased popularity following the significant drop in storage costs and technological advances in business intelligence. Facebook's example of attracting many users, and thereby increasing profit substantially, has become one of the biggest driving forces for businesses to enter Big Data Analytics.
Definitions from the key researchers in the field and the 3Vs of Big Data: Volume, Variety and Velocity.
Simple explanations of the common technology terms used in the field, such as Hadoop and Hive, and the components of the Hadoop ecosystem displayed in a chart.
The key vendor variations of Hadoop and popular uses of Hadoop by international organisations.
The three big challenges faced by businesses when moving into the Big Data Analytics field.
The three paradoxes of big data that can affect our everyday lives at a personal and public level.

References
CERN, 2008. LHC the guide, s.l.: CERN. [viewed 12 January 2014].
Domo, 2013. Data Never Sleeps. [Art] Available at: http://www.domo.com/blog/2012/06/how-much-data-is-created-every-minute/ [viewed 11 January 2014].
Facebook, Inc., 2012. Amendment No. 4 to S-1, Washington D.C.: United States Securities and Exchange Commission. [viewed 15 January 2014].
Gantz, J. & Reinsel, D., 2011. Extracting Value from Chaos, IDC. [viewed 12 January 2014].
Novet, J., 2013. Facebook unveils Presto engine for querying 250 PB data warehouse. [Online] Available at: http://gigaom.com/2013/06/06/facebook-unveils-presto-engine-for-querying-250-pbdata-warehouse/ [viewed 15 January 2014].
O'Reilly Radar Team, 2012. Planning for Big Data. California: O'Reilly Media.
Richards, N. M. & King, J. H., 2013. Three Paradoxes of Big Data. Stanford Law Review Online, 3 September, 66(41), p. 6.
SAS, 2011. Big Data Meets Big Data Analytics, United States: SAS.
