Did you know that 90% of the data in the world today was created in the last two years? Data will continue to grow as the cost of storage decreases.
[Figure: rate of growth of data from 2005 to 2015 (forecast). Source: IDC Research]
companies were combined to form the Computing-Tabulating-Recording Company, which is now International Business Machines (IBM).
First, we must understand how data is measured. A byte (8 bits) is a unit of measure of digital information. It is important to understand the units below as we start looking at Big Data. Big Data is certainly not a measurement, but we should understand how much data is considered big. In the table below, the terabyte range is considered the starting point of what is referred to as Big Data.
Estimations
4.7 Gigabytes: A single DVD
1 Terabyte: About two years' worth of non-stop MP3s (assumes one megabyte per minute of music)
10 Terabytes: The printed collection of the U.S. Library of Congress
1 Petabyte: The amount of data stored on a stack of CDs about 2 miles high, or 13 years of HD-TV video
20 Petabytes: The storage capacity of all hard disk drives created in 1995
1 Exabyte: One billion gigabytes
5 Exabytes: All words ever spoken by mankind
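The units above can be checked with a little arithmetic. Below is a minimal sketch, assuming decimal units (1 KB = 1000 bytes), which is consistent with "1 Exabyte: One billion gigabytes". The helper names `to_bytes`, `humanize` and `mp3_years` are made up for this illustration.

```python
# Decimal (SI) byte units, smallest to largest. Assumption: 1 KB = 1000 bytes.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def to_bytes(value, unit):
    """Convert a value expressed in `unit` to a number of bytes."""
    return value * 1000 ** UNITS.index(unit)


def humanize(num_bytes):
    """Render a byte count using the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


def mp3_years(terabytes, mb_per_minute=1):
    """Years of non-stop music that fit in `terabytes`, at `mb_per_minute`."""
    minutes = to_bytes(terabytes, "TB") / to_bytes(mb_per_minute, "MB")
    return minutes / (60 * 24 * 365)


# One exabyte expressed in gigabytes: one billion.
print(to_bytes(1, "EB") // to_bytes(1, "GB"))
# One terabyte of music at one megabyte per minute: roughly two years.
print(round(mp3_years(1), 1))
# A single DVD, given as a raw byte count.
print(humanize(4.7e9))
```

At one megabyte per minute, one terabyte holds one million minutes of music, which works out to a little under two years and matches the estimate in the table.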
Big Data can be described by the following characteristics:
(i) Volume - The quantity of data generated is very important in this context. The size of the data determines its value and potential, and whether it can be considered Big Data at all. The name Big Data itself refers to size, hence this characteristic.
(ii) Variety - The next aspect of Big Data is its variety. Data analysts need to know which category the data belongs to. This helps the people who work closely with the data to use it effectively to their advantage.
(iii) Velocity - In this context, velocity refers to the speed at which data is generated and processed to meet demand.
(iv) Variability - This refers to the inconsistency that data can show at times. It can be a problem for those who analyze the data, because it hampers handling and managing the data effectively.
(v) Complexity - Data management can become a very complex process, especially when large volumes of data come from multiple sources. These data need to be linked, connected, and correlated in order to grasp the information they are supposed to convey. This situation is therefore termed the complexity of Big Data.
Data falls into two broad types: 1. Structured, and 2. Unstructured (there is also semi-structured data, e.g. XML).
Structured data has semantic meaning attached to it, whereas unstructured data has no predefined structure or meaning attached. Most of the data growth we are referring to is unstructured data. Below are a few examples of unstructured data:
1. Calls, texts, tweets, web browsing across various websites each day, and messages exchanged by several means.
2. Social media usage by several million people exchanging data in various forms.
3. Card transactions for various payments, made in large numbers every second across the world.
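To make the difference between structured, semi-structured, and unstructured data concrete, here is a small sketch using only the Python standard library. The sample records (a payment of 25.50 USD by customer 1001) are invented for illustration and are not taken from any real dataset.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, so every value has a known name and position.
structured = "customer_id,amount,currency\n1001,25.50,USD\n"
rows = list(csv.DictReader(io.StringIO(structured)))
csv_amount = float(rows[0]["amount"])

# Semi-structured: tags describe the data, but there is no rigid table schema.
semi_structured = (
    "<payment><customer>1001</customer>"
    "<amount currency='USD'>25.50</amount></payment>"
)
amount_node = ET.fromstring(semi_structured).find("amount")
xml_amount = float(amount_node.text)
xml_currency = amount_node.get("currency")

# Unstructured: free text. Nothing labels the amount, so we have to guess
# where it is, here with a simple pattern that looks for a dollar sign.
unstructured = "Just paid $25.50 for lunch, card worked fine this time!"
match = re.search(r"\$(\d+(?:\.\d+)?)", unstructured)
text_amount = float(match.group(1)) if match else None

print(csv_amount, xml_amount, xml_currency, text_amount)
```

The same fact is present in all three forms, but only the first two tell the program where to find it. With free text, the extraction rule is a guess that breaks as soon as the wording changes, which is part of why unstructured data is harder to analyze.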
Hope this post gave you enough information about Big Data. In future posts, we will look at applications of Big Data (i.e. Big Data Analytics), careers in Big Data (from Software Engineer to Data Scientist), and Hadoop and its applications.