
The Art of Big Data

Rahul Beakta
Computer Science & Engineering Department
Baddi University of Emerging Sciences & Technology,
Baddi, Himachal Pradesh, India
rahulbeakta93@gmail.com

Shalini Chauhan
Computer Science & Engineering Department
Baddi University of Emerging Sciences & Technology,
Baddi, Himachal Pradesh, India
shalinee@baddiuniv.ac.in

Abstract: In this paper we discuss various ways in which big data can improve the digital world. Big data is now a cost-effective approach to dealing with high-volume and complex data. Massive data sets hide useful information and patterns that cannot be processed by conventional database systems. We discuss new strategies for managing data in order to convert big data into smart data. This paper aims to present the advantages of big data analytics and to discuss the strategies used to manage big data.

Keywords: big data; analytics; characteristics; Hadoop; IoT.

I. INTRODUCTION

Nowadays everyone is talking about big data, but what exactly is it? Big data is similar to small data, only much larger in volume. There is no exact definition of big data. Data that has extra-large volume, comes from a variety of sources in a variety of formats, and arrives at great velocity is normally referred to as big data [1]. Big data can be structured, semi-structured or unstructured, and it cannot be processed by conventional techniques and tools. Because the volume of data is large and grows rapidly from many different sources, it is very difficult to capture, process and analyze [2].

Fig. 1. Characterization of Big Data: Volume, Velocity, Variety.

Big Data characteristics:

Data velocity: Data is generated very quickly from different locations.

Data variety: Data is stored in structured, semi-structured and unstructured form [6].

Data volume: Data sizes are in the range of terabytes or petabytes.

Data complexity: Data is stored and managed in different locales, data centers, or cloud geo-zones.

II. BIG DATA TOOLS

A. NoSQL
A NoSQL database provides data storage and retrieval using data models other than the rows and columns of relational tables. NoSQL systems are also called "Not Only SQL" systems, which signals that many of them also support SQL-like query languages. NoSQL databases are increasingly used in big data applications, and companies such as Facebook and Google use NoSQL.
Why NoSQL?

Handle both unstructured and semi-structured data.

Adapt to changing data and requirements over time.

Support large numbers of concurrent users.

Deliver better experiences to globally distributed users.

Remain always available.

B. MapReduce
MapReduce is a programming model designed for processing large volumes of data in parallel. It divides the work into a set of independent tasks [3] and is used to run programs in Hadoop [4].
It has two functions:

Map: The master node takes the input, divides it into smaller sub-problems and distributes them to worker nodes [5].

Reduce: The master node collects the answers to all the sub-problems and combines them to form the output [7].
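To illustrate the two functions, the following is a minimal single-process sketch of the classic word-count job. It assumes all input fits in memory and simulates the map phase, the grouping of intermediate pairs by key (the "shuffle"), and the reduce phase with plain Python; the function names are hypothetical and are not part of Hadoop's API. In a real Hadoop job the framework would run map and reduce tasks on different worker nodes and move intermediate data between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between the two phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # Each split could be mapped by a different worker; here they run sequentially.
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["big data needs big tools", "hadoop runs MapReduce on big data"]
    print(word_count(splits))
```

Because each call to map_phase depends only on its own split, and each call to reduce_phase only on the values for one key, both phases can be parallelized independently, which is the property the model relies on.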

C. Hadoop
Hadoop is an open-source software framework that stores and processes huge data sets on large clusters of commodity hardware [13]. It delivers distributed processing power at remarkably low cost, making it an effective complement to a traditional enterprise data infrastructure [8].
It consists of:
1) Hadoop Distributed File System (HDFS).
2) MapReduce Programming Paradigm.
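To make the storage side concrete, the sketch below shows, under stated assumptions, how a file system in the style of HDFS can split one large file into fixed-size blocks and keep several replicas of each block on different commodity nodes. The 128 MB block size and replication factor of 3 are assumed values (commonly used HDFS defaults, configurable through the dfs.blocksize and dfs.replication properties); the round-robin placement and the function names are hypothetical simplifications, not HDFS's actual rack-aware placement policy.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (128 MB)
REPLICATION = 3                  # assumed number of copies of each block


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each block; the last block may be smaller than block_size."""
    n_blocks = (file_size + block_size - 1) // block_size  # integer ceiling division
    sizes = [block_size] * n_blocks
    if n_blocks and file_size % block_size:
        sizes[-1] = file_size % block_size
    return sizes


def place_replicas(n_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Hypothetical round-robin placement: each block's replicas go to distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(n_blocks)
    }


if __name__ == "__main__":
    file_size = 1024 * 1024 * 1024 + 10 * 1024 * 1024  # 1 GB plus 10 MB
    blocks = split_into_blocks(file_size)
    print(len(blocks), "blocks; raw storage:", sum(blocks) * REPLICATION, "bytes")
    print(place_replicas(len(blocks), ["node1", "node2", "node3", "node4"]))
```

For a file of 1 GB plus 10 MB this yields nine blocks, the last holding the remaining 10 MB, and replication triples the raw storage used. Losing one node therefore does not lose any block, which is part of why cheap commodity machines are acceptable.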
III. STRATEGIES FOR MANAGING BIG DATA

Managing big data requires applying a range of strategies and tools. Data can take many forms, for example structured or unstructured. To manage these various forms of data, new and old skills are applied together with data management tools [11].
Big data makes it possible to extract important value from large data sets. To understand big data strategies and techniques, we have designed a framework (Fig. 2). The first dimension of the framework is the data type. Data that is structured and operational is called transactional data; data that comes from unstructured sources (e.g., social media) is known as non-transactional data.
The second dimension is the business objective: when applying big data capabilities, organizations either measure or experiment. When experimenting, organizations look for scientific tools to apply. Together, the two dimensions result in four quadrants, as shown in Fig. 2.
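The framework's two dimensions can be read as a lookup from a (data type, business objective) pair to one of the four strategies discussed in this section. The sketch below is only an illustrative encoding of Fig. 2; the class and function names are hypothetical, and assigning social analytics to measurement and decision science to experimentation follows how the strategies are described in this section rather than an explicit statement in the figure.

```python
from enum import Enum


class DataType(Enum):
    TRANSACTIONAL = "transactional"          # structured, operational data
    NON_TRANSACTIONAL = "non-transactional"  # e.g. social media content


class Objective(Enum):
    MEASUREMENT = "measurement"
    EXPERIMENTATION = "experimentation"


# Each (data type, objective) pair identifies one quadrant of the framework.
QUADRANTS = {
    (DataType.TRANSACTIONAL, Objective.MEASUREMENT): "Performance Management",
    (DataType.TRANSACTIONAL, Objective.EXPERIMENTATION): "Data Exploration",
    (DataType.NON_TRANSACTIONAL, Objective.MEASUREMENT): "Social Analytics",
    (DataType.NON_TRANSACTIONAL, Objective.EXPERIMENTATION): "Decision Science",
}


def strategy_for(data_type: DataType, objective: Objective) -> str:
    """Return the name of the big data strategy for the given quadrant."""
    return QUADRANTS[(data_type, objective)]


if __name__ == "__main__":
    print(strategy_for(DataType.NON_TRANSACTIONAL, Objective.MEASUREMENT))
```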

Fig. 2. Big data framework.

Big Data Strategies:

1) Performance Management:
It uses predetermined queries and multidimensional analysis. Transactional and operational data is used, for example, to understand a customer's purchase activity on a particular e-commerce website. Managers can use the results for short-term business decisions as well as longer-term planning [12]. Analysts can also query the data and filter the output.
2) Data Exploration:
It uses statistical methods to obtain experimental results that may not have been available previously. This approach is predictive: it forecasts user behaviour based on users' previous transactions in the database. Organizations use data mining techniques to target customers based on their previous experience on the organization's portal.
The rise of statistical techniques can give organizations faster and more direct results. However, there is a shortage of qualified data scientists and statisticians in the market.
3) Social Analytics:
It covers all non-transactional data. Most social data comes from social media websites such as Facebook, Twitter and Instagram. This data is unstructured and cannot be processed by conventional database systems.
Other techniques are used to manage this data and uncover useful patterns in it. For example, sentiment analysis can be performed on this data and can give organizations valuable insights.
4) Decision Science:
It involves analysis of non-transactional data. New experiments are performed on the basis of previous data analysis. Consumers give reviews and product suggestions on online platforms, and these can be analyzed to make better decisions.
Decision scientists explore social big data as a form of field research, for example to assess feedback on an idea and its value, validity and feasibility before putting it into action.
IV. BIG DATA AND INTERNET OF THINGS
The Internet of Things (IoT) is a network of devices connected through software and sensors. IoT is one of the most important topics in technology and engineering [9], and many conferences and news reports describe it as an "IoT revolution".
The Internet of Things is becoming more popular with the evolution of big data. IoT networks generate far more data, and this data contains a great deal of hidden, useful information. Big data is all about data, whereas the Internet of Things is about data, devices and connectivity [10].
Impact of Internet of Things on Big Data:

1) Data Storage:
The Internet of Things sends large amounts of data to organizations' data centers. This heterogeneous data must be handled carefully. It is itself big data, and hidden information can be extracted from it.
2) Big Data Technologies:
It is clear that the Internet of Things will create huge amounts of data, and better technologies are needed to handle it. Big data technologies such as Hadoop and NoSQL are applied to analyze this data.
3) Data Security:
Many different devices will produce data, and each device may produce different kinds of data. IoT security is new to security professionals, and their lack of experience can increase the risk. A multilayered security system should therefore be implemented.

4) Big Data Analytics:
Big data and IoT are essentially two sides of the same coin. Nowadays organizations face many problems in extracting information from IoT data. Not all of this data is important, so a proper infrastructure should be developed for analytics.
The growth of the Internet of Things has brought a revolution with it. IoT produces a huge amount of data and big data comes along with it, so new strategies and management techniques should be implemented.
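Because not all IoT data is important, one simple way to reduce what reaches the analytics infrastructure is to filter readings close to the source. The sketch below uses a deadband filter, a technique swapped in here for illustration rather than one this paper prescribes: a reading is kept only when it differs from the last kept value of the same device by more than a threshold. The Reading type, its fields and the threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Reading:
    device_id: str
    timestamp: int   # hypothetical unit: seconds since the stream started
    value: float


def deadband_filter(readings: Iterable[Reading], threshold: float) -> List[Reading]:
    """Keep a reading only if it differs from the last kept value of the same
    device by more than `threshold`; the first reading of each device is kept."""
    last_kept: Dict[str, float] = {}
    kept: List[Reading] = []
    for r in readings:
        prev = last_kept.get(r.device_id)
        if prev is None or abs(r.value - prev) > threshold:
            kept.append(r)
            last_kept[r.device_id] = r.value
    return kept


if __name__ == "__main__":
    stream = [
        Reading("sensor-1", 0, 21.0),
        Reading("sensor-1", 1, 21.1),  # small change: dropped
        Reading("sensor-1", 2, 23.5),  # large change: kept
        Reading("sensor-2", 0, 50.0),  # first reading of another device: kept
    ]
    for r in deadband_filter(stream, threshold=0.5):
        print(r)
```

Comparing against the last kept value, rather than the immediately preceding reading, prevents a slow drift from being silently discarded step by step.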

Fig. 3. Big Data and Internet of Things.

Big data and IoT are linked to each other: all IoT devices are connected to the internet and constantly generate data that can be used to discover information and hidden patterns. The Internet of Things and big data are both important, and both build on recent technological improvements.

V. CONCLUSION
In this paper we have examined big data and the opportunities it offers to extract hidden information and patterns from massive data. We are living in an information age in which a large volume of data is produced daily, and within this data lie patterns and hidden knowledge that should be analyzed, extracted and utilized. We must encourage big data research if we want to benefit from this huge amount of data.
We believe that the Internet of Things is of great significance for big data: it can provide a great deal of data, which should be properly examined and extracted, and better strategies should be applied to get more benefit from big data.

References
[1] Rahul Beakta, Big Data and Hadoop: A Review Paper, International Journal of Computer Science & Information Technology, RIEECE 2015, BUEST, Volume 2, Spl. Issue 2, 2015.
[2] Harshawardhan S. Bhosale, Prof. Devendra P. Gadekar, A Review Paper on Big Data and Hadoop, International Journal of Scientific and Research Publications, Volume 4, Issue 10, October 2014, ISSN 2250-3153.
[3] Shital Suryawanshi, Prof. V. S. Wadne, Big Data Mining using Map Reduce: A Survey Paper, IOSR Journal of Computer Engineering (IOSR-JCE), e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov.-Dec. 2014), pp. 37-40.
[4] K. V. Shvachko and A. C. Murthy, Scaling Hadoop to 4000 Nodes at Yahoo!, Yahoo! Developer Network Blog, 2008.
[5] Shalini Jain, Satendra Sonare, Ashok Verma, Big Data Analysis using HDFS, C-MEANS and MapReduce, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 5, Issue 4, 2015, ISSN: 2277-128X.
[6] IBM Big Data & Analytics Hub, www.ibmbigdatahub.com/infographic/four-vs-big-data.
[7] Shilpa, Manjit Kaur, BIG Data and Methodology: A Review, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 10, October 2013, ISSN: 2277-128X.
[8] Serge Blazhievsky, Introduction to Hadoop, MapReduce and HDFS for Big Data Applications, Storage Networking Industry Association, 2013.
[9] Internet of Things, Wikipedia, https://en.wikipedia.org/wiki/Internet_of_Things.
[10] Tamara Dull, Big Data and the Internet of Things: Two Sides of the Same Coin, http://www.sas.com/en_us/insights/articles/big-data/big-data-andiot-two-sides-of-the-same-coin.html.
[11] Philip Russom, Managing Big Data, TDWI Best Practices Report, Fourth Quarter 2013, TDWI Research.
[12] Salvatore Parise, Bala Iyer, Four Strategies to Capture and Create Value from Big Data, Ivey Business Journal, July/August 2012.
[13] Ms. Gurpreet Kaur, Ms. Manpreet Kaur, Review Paper on Big Data using Hadoop, International Journal of Computer Engineering & Technology (IJCET), Volume 6, Issue 12, Dec. 2015, pp. 65-71, Article ID: IJCET_06_12_008.
