
Technical Seminar Report on

HUMANIZED BIG DATA


Submitted to

Jawaharlal Nehru Technological University Hyderabad


In partial fulfillment of the requirements for the

award of the degree of

BACHELOR OF TECHNOLOGY
in

COMPUTER SCIENCE AND ENGINEERING


By
Mr. Thirunahari Pravarshi
(14VE1A05B4)
Under the Guidance of
Dr. M. Purushotham, M.Tech, M.S, PhD

SREYAS INSTITUTE OF ENGINEERING AND TECHNOLOGY


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
(Affiliated to JNTUH, Approved by A.I.C.T.E and Accredited by NAAC, New Delhi)
Bandlaguda, Beside Indu Aranya, Nagole,
Hyderabad-500085, Ranga Reddy Dist.
(2014 – 2018)
SREYAS INSTITUTE OF ENGINEERING AND TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

This is to certify that the Technical Seminar Report on “Humanized Big Data”
submitted by Mr. Thirunahari Pravarshi bearing Hall ticket No. 14VE1A05B4 in
partial fulfillment of the requirements for the award of the degree of Bachelor of
Technology in Computer Science And Engineering from Jawaharlal Nehru
Technological University, Kukatpally, Hyderabad for the academic year 2017-18 is a
record of bonafide work carried out by him / her under our guidance and Supervision.

Name of the Guide Head of the Department

Dr. M Purushotham Dr. DVSS.Subrahmanyam


M.Tech, M.S, PhD AMIETE (CSE)., M.Tech (CSE).,
Ph.D (CSE).M.Sc(Industrial Maths),
M.Sc (Maths)., M.Phil (Maths).,

External Examiner Director


(Prof S. Venkateswarlu)
SREYAS INSTITUTE OF ENGINEERING AND TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

DECLARATION

I, Mr. Thirunahari Pravarshi, bearing Hall Ticket No. 14VE1A05B4, hereby declare that the Technical Seminar titled "HUMANIZED BIG DATA", done by me under the guidance of Dr. M Purushotham, which is submitted in partial fulfillment of the requirements for the award of the B.Tech degree in Computer Science and Engineering at Sreyas Institute of Engineering & Technology for Jawaharlal Nehru Technological University, Hyderabad, is my original work.

THIRUNAHARI PRAVARSHI
(14VE1A05B4)
ACKNOWLEDGEMENT

The successful completion of any task would be incomplete without mention of the people who made it possible; their guidance and encouragement crowned all the efforts with success.
I take this opportunity to acknowledge with thanks and deep sense of gratitude to
Dr. M Purushotham, M.Tech, M.S, PhD, Department of Computer Science
and Engineering for his constant encouragement and valuable guidance during the
Technical Seminar work.
A special note of thanks to Dr. DVSS. Subrahmanyam, Professor and Head of the Department, Computer Science and Engineering, who has been a source of continuous motivation and support. He took time and effort to guide and correct me all through the span of this work.
I owe very much to Professor S. Venkateswarlu, Director, and the Management, who made my term at Sreyas a stepping stone for my career. I treasure every moment I spent in the college.
Last but not least, my heartiest gratitude to my parents and friends for their
continuous encouragement and blessings. Without their support this work would not have
been possible.

Mr. Thirunahari Pravarshi


(14VE1A05B4)
CONTENTS

1. INTRODUCTION 2-6

2. TECHNOLOGY 7-9

3. DEVELOPMENT 10-11

4. ROLE OF HUMANIZED BIG DATA 12-13

5. APPLICATIONS 14-20

6. FUTURE SCOPE 21-23

7. CONCLUSION 23-26

1. INTRODUCTION
The term Big Data has been coined to refer to the gargantuan bulk of data that cannot be dealt with by traditional data-handling techniques. Big Data is still a novel concept, and in the following literature we intend to elaborate it in a palpable fashion. The report commences with the concept of the subject itself, along with its properties and the two general approaches to dealing with it. The study then elucidates the applications of Big Data across diverse aspects of the economy and everyday life. The use of Big Data Analytics, integrated with digital capabilities to secure business growth, and its visualization to make it comprehensible to non-technical business analysts, are discussed in depth. Aside from this, the incorporation of Big Data to improve population health, to benefit the finance, telecom and food industries, and to support fraud detection and sentiment analysis is delineated. The challenges hindering the growth of Big Data Analytics are also accounted for, segregated into two arenas: the practical challenges faced and the theoretical ones. The hurdles of securing data and democratizing it are elaborated, among several others such as the difficulty of finding enough skilled data professionals and of building software capable of processing data at high velocity. Throughout, the notions are deciphered in an intelligible manner, with several use cases and illustrations embedded in the text.

Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. Such a colossal amount of continuously produced data is what can be coined Big Data. Big Data decodes previously untouched data to derive new insights that get integrated into business operations. However, as the amount of data increases exponentially, current techniques are becoming obsolete. Dealing with Big Data requires comprehensive coding skills, domain knowledge and statistics. Despite being Herculean in nature, Big Data applications are almost ubiquitous, from marketing to scientific research to customer interests and so on. We can witness Big Data in action almost everywhere today: from Facebook, which handles over 40 billion photos from its user base, to CERN's Large Hadron Collider (LHC), which generates 15 PB a year, to Walmart, which handles more than 1 billion customer transactions in an hour. Over a year ago, the World Bank organized the first WBG Big Data Innovation Challenge, which brought forward several unique ideas applying Big Data, such as using big data to predict poverty, for climate-smart agriculture, and for user-focused identification of road infrastructure condition and safety. Big Data can be simply defined by explaining the 3 Vs (volume, velocity and variety), which are the driving dimensions of Big Data quantification. Gartner analyst Doug Laney introduced the famous 3 Vs concept in his 2001 Meta Group publication, "3D Data Management: Controlling Data Volume, Velocity and Variety".

Fig 1.0: Big Data 3v’s

A. Volume: This essentially concerns the large quantities of data that are generated continuously. Initially, storing such data was problematic because of high storage costs. With decreasing storage costs, this problem has been kept somewhat at bay for now; however, this is only a temporary solution, and better technology needs to be developed. Smartphones, e-commerce and social networking websites are examples where massive amounts of data are being generated. This data can be broadly distinguished as structured, unstructured or semi-structured.

B. Velocity: In what now seems like prehistoric times, data was processed in batches. However, this technique is only feasible when the incoming data rate is slower than the batch processing rate and the delay is not much of a hindrance. At present, the speed at which such colossal amounts of data are generated is unbelievably high. Take Facebook for example: it generates 2.7 billion like actions and 300 million photo uploads per day, amongst others, roughly amounting to 2.5 billion pieces of content each day, while Google processes over 1.2 trillion searches per year worldwide.

C. Variety: From documents to databases to spreadsheets to pictures, videos and audio in hundreds of formats, data is now losing structure. Structure can no longer be imposed as before for the analysis of data. Generated data can be of any type: structured, semi-structured or unstructured. The conventional form of data is structured data, for example text in a database table. Unstructured data can be generated from social networking sites, sensors and satellites.
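To make the three shapes of data concrete, the following Python sketch reads the same kind of information in structured, semi-structured and unstructured form. The sample values are invented for illustration only:

```python
import csv
import io
import json

# Structured: rows with a fixed schema (e.g. a relational table exported as CSV).
structured = io.StringIO("id,amount\n1,250\n2,400\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing but with a flexible schema (e.g. JSON from an API).
semi = json.loads('{"user": "alice", "tags": ["sensor", "gps"]}')

# Unstructured: free text with no schema; any structure must be inferred.
unstructured = "Great product, fast delivery!"
tokens = unstructured.lower().rstrip("!").split()

print(rows[0]["amount"], semi["tags"][0], tokens[0])
```

The structured rows can be queried by column name directly, the JSON must be navigated by key, and the free text yields only tokens until further analysis imposes meaning on it.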
Implementing Big Data is a mammoth task given the large volume, velocity and variety. "Big Data" is a term encompassing the use of techniques to capture, process, analyse and visualize potentially large datasets, in a reasonable timeframe, that are not accessible to standard IT technologies. By extension, the platform, tools and software used for this purpose are collectively called "Big Data technologies". Currently, the most commonly implemented technology is Hadoop. Hadoop is the culmination of several other technologies, such as the Hadoop Distributed File System (HDFS), Pig, Hive and HBase. However, even Hadoop and other existing techniques will be highly incapable of dealing with the complexities of Big Data in the near future. The following are a few cases where standard processing approaches will fail due to Big Data:
• Large Synoptic Survey Telescope (LSST): over 30 thousand gigabytes (30 TB) of images will be generated every night during the decade-long LSST sky survey.
• There is a corollary to Parkinson's Law that states: data expands to fill the space available for storage. This is no longer true, since the data being generated will soon exceed all available storage space.
• 72 hours of video are uploaded to YouTube every minute.

There are at present two general approaches to Big Data:

a. Divide and Conquer, using Hadoop: the huge data set is broken into smaller parts and processed in a parallel fashion using many servers.
b. Brute Force, using technology on the likes of SAP HANA: one very powerful server with massive storage is used to compress and process the data set as a single unit.
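The divide-and-conquer approach can be sketched in miniature. The following Python fragment is a toy illustration, not actual Hadoop code: it splits a small data set into chunks, maps a word count over each chunk (as separate servers would), and reduces the partial counts into a final result:

```python
from collections import Counter
from functools import reduce

def map_chunk(chunk):
    """Map step: count words in one chunk of the data set."""
    return Counter(word for line in chunk for word in line.split())

def merge_counts(a, b):
    """Reduce step: merge two partial counts."""
    return a + b

# A tiny "data set" split into chunks, as Hadoop would split a file into blocks.
data = ["big data big", "data velocity", "big volume"]
chunks = [data[:2], data[2:]]

partials = [map_chunk(c) for c in chunks]          # each would run on a separate server
totals = reduce(merge_counts, partials, Counter())
print(totals["big"])  # → 3
```

The key property is that the map step touches each chunk independently, so adding servers scales the work, while the reduce step only handles the much smaller partial results.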

WHAT IS BIG DATA?

Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making, and better decisions can mean greater operational efficiency, cost reductions and reduced risk.

Analysis of data sets can find new correlations, to "spot business trends, prevent diseases, and
combat crime and so on." Scientists, practitioners of media and advertising and governments alike
regularly meet difficulties with large data sets in areas including Internet search, finance and
business informatics. Scientists encounter limitations in e-Science work, including meteorology,
genomics, connect omics, complex physics simulations, and biological and environmental
research.

Work with big data is necessarily uncommon; most analysis is of "PC size" data, on a desktop PC or notebook that can handle the available data set.

Relational database management systems and desktop statistics and visualization packages often
have difficulty handling big data. The work instead requires "massively parallel software running
on tens, hundreds, or even thousands of servers". What is considered "big data" varies depending
on the capabilities of the users and their tools, and expanding capabilities make Big Data a moving
target. Thus, what is considered to be "Big" in one year will become ordinary in later years. "For
some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to
reconsider data management options. For others, it may take tens or hundreds of terabytes before
data size becomes a significant consideration."

WHAT IS HUMANIZED BIG DATA?

Big data has been a big topic for the past five years or so, when it started making headlines as
a buzzword. The idea is that mass quantities of gathered data which we now have access to can
help us in everything from planning better medical treatments to executing better marketing
campaigns. But big data's greatest strength, its quantitative, numerical foundation, is also a weakness. In 2019, we'll see advancements to humanize big data, seeking more empathetic and qualitative bits of data and projecting them in a more visualized, accessible way.

The past few years have seen Data Science and Big Data moving forward dramatically. While
Big Data may not technically be a technology it is going to be a massive disruptor. The entire
business world has been transformed and will continue to be so in the upcoming years.
Humanized big data is a part of that.

At the heart of things, humanizing big data seems like something that would be the opposite of
productive and certainly counterintuitive. The reality is that big data must start and end with
humanity. Because people are the source of big data, they must also be heavily involved in the
processing and the interpretation.

“Basically, what we’re seeing is that new elements of behavior are affecting data, such as
politics, opinions and agents interacting with and influencing each other. So, in addition to
looking at data in the traditional way, we must now consider political and social structures, and
how people learn from and influence each other; we must consider how ideas flow through
social networks, what motivates people to contribute to discussions, and the consequences of
engagement.” Information should be processed in such a way that non-data scientists can also derive clear answers ('actionable insights') from big data analyses and use them as a basis for decision making.

This requires an approach that is more qualitative than quantitative, as well as a high degree of
visualization of the data. The idea of humanizing data may seem counterintuitive on its face,
but it’s really not. As our experts point out, data starts with humans. Therefore, at some point,
humans must also be involved in the processing of data.

Humanizing Big Data argues that data isn’t really something that can be fully automated.
Rather than handing the entire processing task over to super-smart machines, the human
element, so integral to understanding the results, can’t be removed from the analytics process.

2. TECHNOLOGY

Big Data is a collection of data sets so large and complex that it is difficult to process using traditional applications and tools; it is data exceeding terabytes in size. Because of the variety of data that it encompasses, big data always brings a number of challenges relating to its volume and complexity. A recent survey says that 80% of the data created in the world is unstructured. One challenge is how this unstructured data can be structured before we attempt to understand and capture the most important data. Another challenge is how we can store it.

Artificial Intelligence for Big Data

While the concept of artificial intelligence (AI) has been around nearly as long as there have
been computers, the technology has only become truly usable within the past couple of years.
In many ways, the big data trend has driven advances in AI, particularly in two subsets of the
discipline: machine learning and deep learning.

The standard definition of machine learning is that it is technology that gives "computers the
ability to learn without being explicitly programmed." In big data analytics, machine learning
technology allows systems to look at historical data, recognize patterns, build models and
predict future outcomes. It is also closely associated with predictive analytics.
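As a minimal sketch of "looking at historical data and predicting future outcomes", the fragment below fits a straight line to hypothetical monthly sales figures (the numbers are invented for illustration) and extrapolates one month ahead:

```python
# Hypothetical historical data: monthly sales figures (illustrative values only).
months = [1, 2, 3, 4, 5, 6]
sales = [10.0, 12.0, 14.0, 15.0, 17.0, 19.0]

# "Learn" from history: ordinary least-squares fit of a straight line.
n = len(months)
mean_x = sum(months) / n
mean_y = sum(sales) / n
slope_num = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))
slope_den = sum((x - mean_x) ** 2 for x in months)
slope = slope_num / slope_den
intercept = mean_y - slope * mean_x

# Predict a future outcome: expected sales in month 7.
predicted = slope * 7 + intercept
print(round(predicted, 1))  # → 20.6
```

Real machine-learning systems use far richer models, but the pattern is the same: parameters are estimated from historical records, then applied to unseen inputs.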

Deep learning is a type of machine learning technology that relies on artificial neural networks
and uses multiple layers of algorithms to analyse data. As a field, it holds a lot of promise for
allowing analytics tools to recognize the content in images and videos and then process it
accordingly.

Fig 2.0: AI for Big Data
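A toy forward pass shows what "multiple layers" means in practice: each layer multiplies its input by a weight matrix and applies a nonlinearity. The weights below are arbitrary illustrative values, not a trained model:

```python
def relu(v):
    """Nonlinearity applied between layers."""
    return [max(0.0, x) for x in v]

def dense(weights, v):
    """One layer: multiply the input vector by a weight matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

x = [1.0, 2.0]                            # input features
W1 = [[0.5, -1.0], [0.25, 0.75]]          # layer 1 weights (illustrative)
W2 = [[0.6, 0.4]]                         # layer 2 weights (illustrative)

hidden = relu(dense(W1, x))               # layer 1: [-1.5, 1.75] -> [0.0, 1.75]
output = dense(W2, hidden)[0]             # layer 2: 0.6*0.0 + 0.4*1.75
print(round(output, 2))  # → 0.7
```

Deep networks for images or video stack many such layers, with the weights learned from data rather than written by hand.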


AI and machine learning are a set of technologies that empower connected machines and computers to learn, evolve and improve upon their own learning by reiterating and consistently updating the data bank through recursive experiments and human intervention. As a matter of fact, technological giants, corporations and data scientists worldwide already foresee big data making a huge difference in the overall AI and machine learning landscape.

Big Data to Enhance Artificial Intelligence

Inherently, machine learning is an advanced application of AI in interconnected machines and peripherals: it grants them access to databases and makes them learn new things from the data on their own, in a programmed manner. As the size of big data grows continuously, and as new ground is broken in analysing its implications, it becomes more meaningful and contextually relevant for machines to understand their functions with the help of big data analysis.

For example, the automation infrastructure of a leather garments plant based in Bangladesh that exports its products to the entire European market will be able to judge market requirements for the coming winter season in a much more accurate and insightful manner if it is able to access and analyse big data reports about the market, financial and weather conditions of that area throughout the year.

Big Data to Help Expand AI & Machine Learning Workforce

One of the major concerns people have about AI today is that it will minimize the need for humans in all job sectors, as most of the work will be done by robots and AI-based computers in the future. The truth is far from it once the role played by big data enters the picture. Sentimental and emotional big data analysis will always require human intelligence, as machines lack emotional intelligence and the ability to make decisions based on sentiment.

For example, a data scientist analysing the big data of a pharmaceutical giant that caters to the South-East Asian market may be able to sense which pharmaceutical prescriptions should be launched in those areas, keeping the local inhibitions and reservations in mind, while purely computer-based big data analysis can never yield such contextually sensitive results.

Hence, the increasing collaboration between AI, machine learning and big data will only make way for talented and capable human data scientists to consistently evolve and rise in the market; by all means, they will be required in much greater numbers as the applications of these technologies gain momentum.

AI & Machine Intelligence Solution Providers to Benefit from Big Data

Right now, the overall size of the global market for machine learning and artificial intelligence based solutions is quite limited. With the increasing proliferation of big data analysis into artificial intelligence and machine learning procedures, devices and machines will get smarter and able to perform better. This will lead to consistent improvement, enhancement and advancement in AI solutions, which will boost their market adoption and give rise to a sharp increase in their market demand.

In an analytical piece, James Canton, a well-known Big Data and AI expert, suggests that in the near future even the network nodes, chips, sensors and software programs that run IoT networks will be AI-enabled, via the cloud or at the chip or infrastructure level. The massive global IoT network of tomorrow, which will collect, process and distribute big data, will be impossible for us humans to run without a digital brain: an AI-based solution smart enough to do it.

For example, big data analysis of how AI-based learning fares in the schools of rural areas and smaller towns of Latin America, ten years hence, will help in the consistent modification and revision of AI-based academic solutions, so that more schools and educational institutions will be encouraged to adopt artificial intelligence in their training and teaching methods, and the market sector will witness considerable growth accordingly.

3. DEVELOPMENT

Big Data truly came of age in 2013, when the Oxford English Dictionary introduced the term "Big Data" for the first time. The story of how data became big starts many years before the current buzz around big data. Already seventy years ago we encounter the first attempts to quantify the growth rate in the volume of data, or what has popularly been known as the "information explosion" (a term first used in 1941, according to the Oxford English Dictionary). Research on the effective usage of information and communication technologies for development (also known as ICT4D) suggests that big data technology can make important contributions, but also presents unique challenges, to international development.

Advancements in big data analysis offer cost-effective opportunities to improve decision-


making in critical development areas such as health care, employment, economic productivity,
crime, security, and natural disaster and resource management. Additionally, user-generated
data offers new opportunities to give the unheard a voice. However, longstanding challenges
for developing regions such as inadequate technological infrastructure and economic and
human resource scarcity exacerbate existing concerns with big data such as privacy, imperfect
methodology, and interoperability issues.

Big Data has been described by some Data Management pundits (with a bit of a snicker) as "huge, overwhelming, and uncontrollable amounts of information." In 1663, John Graunt dealt with "overwhelming amounts of information" as well, while he studied the bubonic plague, which was then ravaging Europe. Graunt used statistics and is credited with being the first person to use statistical data analysis. In the early 1800s, the field of statistics expanded to include collecting and analysing data.

The evolution of Big Data includes a number of preliminary steps for its foundation, and while
looking back to 1663 isn’t necessary for the growth of data volumes today, the point remains
that “Big Data” is a relative term depending on who is discussing it. Big Data to Amazon or
Google is very different than Big Data to a medium-sized insurance organization, but no less
“Big” in the minds of those contending with it.

Such foundational steps to the modern conception of Big Data involve the development of
computers, smart phones, the internet, and sensory (Internet of Things) equipment to provide
data. Credit cards also played a role, by providing increasingly large amounts of data, and
certainly social media changed the nature of data volumes in novel and still developing ways.
The evolution of modern technology is interwoven with the evolution of Big Data.

1996 • The price of digital storage falls to the point where it is more cost-effective than paper.
1997 • Google launches its search engine, which will quickly become the most popular in the world. • Michael Lesk estimates the digital universe is increasing tenfold in size every year.
1999 • First use of the term "Big Data" in an academic paper: Visually Exploring Gigabyte Datasets in Real-time (ACM). • First use of the term "Internet of Things", in a business presentation by Kevin Ashton to Procter and Gamble.
2001 • The three "Vs" of Big Data (Volume, Velocity, Variety) are defined by Doug Laney.
2005 • Hadoop, an open-source Big Data framework now developed by Apache, is created. • The birth of "Web 2.0", the user-generated web.
2008 • Globally, 9.57 zettabytes (9.57 trillion gigabytes) of information is processed by the world's CPUs. • An estimated 14.7 exabytes of new information is produced this year.
2009 • The average US company with over 1,000 employees is storing more than 200 terabytes of data, according to the report Big Data: The Next Frontier for Innovation, Competition and Productivity by the McKinsey Global Institute.
2010 • Eric Schmidt, executive chairman of Google, tells a conference that as much data is now being created every two days as was created from the beginning of human civilization to the year 2003.
2011 • The McKinsey report states that by 2018 the US will face a shortfall of between 140,000 and 190,000 professional data scientists, and warns that issues including privacy, security and intellectual property will have to be resolved before the full value of Big Data is realised.
2014 • Mobile internet use overtakes desktop for the first time. • 88% of executives responding to an international survey by GE say that big data analysis is a top priority.
2015 • Data volumes are exploding; more data has been created in the past two years than in the entire previous history of the human race.

Fig 3.0: Development


4. ROLE OF HUMANIZED BIG DATA

Conversing with Machines

There’s no getting away from the fact that big data is increasingly automated and mechanized. As
machines become smarter, they’re becoming more adept at collecting, analyzing, and learning
from data trends. The challenge will be engaging humans and machines in conversation about
that data. Machines may be able to better handle information than human beings ever can, but
it still takes a human touch to interpret it.

Quality over Quantity

Big data has focused, to a large degree, on volume. The sheer amount of information that can
be collected in short order is a defining feature of the field. But companies are realizing
something as they come up against increasing information overload: In most cases, quality is
more important than quantity. While plenty of data can be collected, businesses will
increasingly emphasize quality data: Actionable data that provides important insights, over
mere quantity.

True 360-degree Views

The sheer amount of data collected through big data processes and technologies has enabled
companies to see big pictures: Trends are laid out and mapped with ease, and it’s easy to see
changes. But what about seeing more complete pictures? As companies collect more data about
each and every one of their customers, they’ll be able to put together true 360-degree views of
individual consumers, enabling customization of services and products on an unforeseen scale.

Employing Common Sense

One common complaint about big data is that it lacks a human touch, and this lack of humanization is precisely what causes problems. Take, for example, an insurance company with a policy requiring all customers to have their cholesterol tested before being eligible for health insurance. While the numbers point to this as a smart move in determining a person's risk and eligibility, there are some groups, such as infants, for whom testing doesn't make sense. Simply looking at the numbers isn't enough; the people behind the scenes are needed to interpret those numbers and to apply common sense to the trends they see.

Seeing Opportunity

Humanizing big data will increasingly be viewed not as a challenge but as an opportunity.
Companies that successfully humanize their big data operations will interact with their clients
and customers more efficiently and effectively, delivering truly superior products and services.
Those who stick solely to crunching the numbers won’t survive very long in the new
environment, as their methods will become increasingly outdated and out of touch with
consumer desires.

Real Interactions, Real Impacts

Marketing and retail operations have been at the forefront of big data analysis, since
understanding consumer behaviour helps them refine their products and services, and target
particular demographics. Now their challenge is using data to foster real interactions between people: their customers and their employees. But big data can also be used to support other industries, and it can have real impacts on people's everyday lives. Take healthcare providers using fitness tracker information to help patients live better lives or engage in preventative healthcare. It's not entirely inconceivable; fitness trackers and wearable tech are becoming big business, and these devices collect plenty of information about their users, some of which could be useful for physicians and other healthcare practitioners.

Fig 4.0: Role of Big Data

5. APPLICATIONS

Big Data Applications and Enhanced Cyber Security


Cyber security should be first on the to-do list of all enterprise IT and cyber security practitioners. When you discuss big data and security, it's about the ability to gather massive amounts of data in order to discover insights that predict and help prevent cyber-attacks. The opportunity for incredible results was always there, but there have now been huge leaps forward in technology. There are now tools and techniques that enable enterprises to stay ahead of the perpetrators. It is the combination of big data analytics with specific security technologies that yields today's strongest cyber defense posture.

Get the Most Advantageous View of Your Customers


Construct a fuller 360° view of your customers by adding more data sources: internal, external, proprietary, open source. Paint a fuller picture, allowing the organization to better understand customers and find advantageous means of communicating with them. Understand what, when and why they buy, why they don't, and what they might buy next time.

Improving the Data Warehouse to Improve Business Insights


Use big data applications to improve decision making. Data stored in many different systems
can be brought together for greater access and better decision making. Fold in big data and
leverage advanced data warehouse capabilities to increase operational efficiency – and enable
new forms of analysis. Use new technologies like big data specific platforms to create the
opportunity for analysis of disparate data types. More data and broader data sources yield
insights for stronger competitive advantages.

Big Sensor Data and Big Advantages Using Big Data Applications
Consider the opportunities that analyzing things like machine, sensor or operational data can create for improving customer service and overall business results. The boom in, and current pervasiveness of, IT machine data, sensors, meters, GPS devices and myriad more requires analysis and combination with pertinent internal and external data sources. By employing not-so-complicated big data analytics, organizations can gain real-time visibility into operations, mechanical conditions, customer experiences, transactions and behavior. NCR, for example, now receives telematics data from devices around the globe to determine the health of its equipment. The benefit? NCR sends digital repair instructions remotely, or sends technicians with the correct equipment, to the right device, at the right time. Downtime can be planned or even prevented.

The benefits of big data analytics and tailored big data applications are very real. These are just four of the top uses for the new wealth of data. Many organizations have found many advantages in their explorations with big data.
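The "real-time visibility" into machine condition described above often reduces to a simple statistical check: compare each incoming sensor reading against a rolling baseline and flag sharp deviations. The sketch below uses invented temperature readings; the function name and threshold are illustrative assumptions, not any vendor's actual method:

```python
from statistics import mean, stdev

def flag_anomalies(readings, window=5, threshold=3.0):
    """Flag readings that deviate sharply from the rolling baseline."""
    flags = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Flag when the reading is more than `threshold` deviations from baseline.
        flags.append(abs(readings[i] - mu) > threshold * sigma)
    return flags

# Hypothetical temperature readings from a machine sensor (°C); the spike
# at the end might indicate a part about to fail.
temps = [40.1, 40.3, 39.9, 40.2, 40.0, 40.1, 55.0]
print(flag_anomalies(temps))  # → [False, True]
```

In a production telematics pipeline the same comparison would run continuously on streaming data, triggering a repair instruction or technician dispatch when a flag is raised.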

Fig 5.0: Applications of Big Data

Big Data is slowly becoming ubiquitous. Every arena of business, health or general living standards can now implement big data analytics. To put it simply, Big Data is a field which can be used in any zone whatsoever, given that this large quantity of data can be harnessed to one's advantage. The major applications of Big Data are listed below.

The Third Eye- Data Visualization

Organizations worldwide are steadily recognizing the importance of big data analytics. From predicting customer purchasing patterns, to influencing customers to make purchases, to detecting fraud and misuse (a task that until very recently was out of reach for most companies), big data analytics is a one-stop solution. Business experts should be able to question and interpret data according to their business requirements, irrespective of the data's complexity and volume. To achieve this, data scientists need to visualize and present the data efficiently and in a comprehensible manner. Giants like Google, Facebook, Twitter, eBay, and Wal-Mart have adopted data visualization to ease the complexity of handling data, and it has shown immense positive outcomes in these organizations. By implementing data analytics and data visualization, enterprises can finally begin to tap into the immense potential that Big Data possesses and ensure greater return on investment and business stability.
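As a toy illustration of the idea (real deployments use dedicated visualization tools, not shown here), aggregated figures can be rendered so an analyst can grasp them at a glance. The regions and sales counts below are invented for the example:

```python
# Hypothetical daily sales counts by region -- values are illustrative
sales = {"North": 1200, "South": 450, "East": 800, "West": 950}

def ascii_bar_chart(data, width=30):
    """Render a quick text bar chart so an analyst can eyeball the data."""
    peak = max(data.values())
    lines = []
    for label, value in sorted(data.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(width * value / peak)
        lines.append(f"{label:<6} {bar} {value}")
    return "\n".join(lines)

print(ascii_bar_chart(sales))
```

Even this crude chart makes the outlier (the weak South region) obvious in a way a raw table of numbers does not, which is precisely the argument for visualization above.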

Integration- An exigency of the 21st century

Integrating digital capabilities into an organization's decision-making is transforming enterprises. By transforming their processes, such companies are developing the agility, flexibility, and precision that enable new growth. Gartner described the confluence of mobile devices, social networks, cloud services, and big data analytics as the "nexus of forces". Using social and mobile technologies to alter the way people connect and interact with organizations, and incorporating big data analytics into this process, is proving to be a boon for the organizations implementing it. Using this concept, enterprises are finding ways to leverage data better, either to increase revenues or to cut costs, even if most of the effort is still focused on customer-centric outcomes. While such customer-centric objectives remain the primary concern of most companies, there is a gradual shift toward integrating big data technologies into background operations and internal processes.

Fig 5.2: Analysis as generated by IBM institute of Business Value 2014 Analytics Study

Big Data in Healthcare:

Healthcare is one of the arenas in which Big Data ought to have the maximum social impact. From the diagnosis of potential health hazards in an individual to complex medical research, big data is present in all aspects of it [12]. Devices such as the Fitbit [13], Jawbone [14], and the Samsung Gear Fit [15] allow users to track and upload health data. Soon enough, such data will be compiled and made available to doctors to aid in diagnosis. Several partnerships, like the Pittsburgh Health Data Alliance, have been established. The Pittsburgh Health Data Alliance [16] is a collaboration of Carnegie Mellon University, the University of Pittsburgh, and UPMC. On their website, they state [16]: "The health care field generates an enormous amount of data every day. There is a need, and opportunity, to mine this data and provide it to the medical researchers and practitioners who can put it to work in real life, to benefit real people... The solutions we develop will be focused on preventing the onset of disease, improving diagnosis and enhancing quality of care... Further, there is the potential to lower health care costs, one of the greatest challenges facing our nation. And the Alliance will also drive economic growth in Pittsburgh, attracting hundreds of companies and entrepreneurs, and generating thousands of jobs, from around the world." A patient's diagnosis will be analyzed and compared with the symptoms of others to discover patterns and ensure better treatment. IBM has taken large-scale initiatives to implement big data in healthcare systems, be it in its collaboration with healthcare giant Fletcher Allen or with the Premier healthcare alliance, to change the way unstructured but useful clinical data is made available to medical practitioners and thereby improve population health. Big Data can also be used in major clinical trials, such as the search for cures for various forms of cancer, and in developing tailor-made medicines for individual patients according to their genetic makeup. To summarize, Sundar Ram of Oracle stated: "Big Data solutions can help the industry acquire, organize and analyze this data to optimize resource allocation, plug inefficiencies, reduce cost of treatment, improve access to healthcare and advance medicinal research."
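A minimal sketch of how tracker uploads might be compiled into a clinician-facing summary. The record format, dates, and the 5,000-step threshold are assumptions made for illustration, not any vendor's actual schema:

```python
from datetime import date

# Hypothetical uploads from a fitness tracker: (day, steps, resting heart rate)
uploads = [
    (date(2017, 5, 1), 8200, 64),
    (date(2017, 5, 2), 3100, 71),
    (date(2017, 5, 3), 9500, 62),
]

def weekly_summary(records):
    """Compile raw tracker uploads into the kind of summary a
    clinician might review: averages plus low-activity days."""
    steps = [s for _, s, _ in records]
    hrs = [h for _, _, h in records]
    return {
        "avg_steps": sum(steps) / len(steps),
        "avg_resting_hr": sum(hrs) / len(hrs),
        "low_activity_days": [d.isoformat() for d, s, _ in records if s < 5000],
    }

print(weekly_summary(uploads))
```

Compiling raw device streams into compact, comparable summaries like this is the step that would let such data be shared with doctors and compared across patients, as the paragraph above anticipates.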

Big Data and the World of Finance:

Big Data can be a very useful tool in analyzing incredibly complex stock market moves and can aid in making global financial decisions. For example, intelligent and extensive analysis of the big data available through Google Trends can aid in forecasting the stock market. Though this is not a fool-proof method, it definitely is an advancement in the field. A research study [19] by the Warwick Business School drew on records from Google, Wikipedia, and Amazon Mechanical Turk over the period 2004-2012 and analyzed the link between Internet searches on politics or business and stock market moves. In the paper, the authors state: "We draw on data from Google and Wikipedia, as well as Amazon Mechanical Turk. Our results are in line with the intriguing possibility that changes in online information-gathering behavior relating to both politics and business were historically linked to subsequent stock market moves... Our results provide evidence that for complex events such as large financial market moves, valuable information may be contained in search engine data for keywords with less-obvious semantic connections to the event in question. Overall, we find that increases in searches for information about political issues and business tended to be followed by stock market falls." Big Data is also being implemented in a field called quantitative investing, where data scientists with negligible financial training incorporate computing power into predicting securities prices by drawing on sources like newswires, earnings reports, weather bulletins, Facebook, and Twitter.
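The study's core idea, relating changes in search volume to subsequent market moves, can be sketched with a plain Pearson correlation. The weekly numbers below are invented to mimic the reported pattern (search spikes preceding falls); they are not the study's data:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical weekly search-volume changes for a finance keyword,
# paired with the *following* week's market return (illustrative numbers)
search_change = [0.10, -0.05, 0.20, 0.02, -0.12, 0.15]
next_week_return = [-0.8, 0.4, -1.5, -0.1, 0.9, -1.1]

print(round(pearson(search_change, next_week_return), 2))
```

A strongly negative coefficient on data like this would correspond to the study's finding that rises in politics- and business-related searches tended to precede market falls; real analyses of course control for many more factors.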

Fig 5.5: Wall Street Journal summarizes the above concept.

One very interesting avenue for Big Data in finance is sentiment extraction from news articles. Market sentiment refers to investors' irrational beliefs about cash-flow returns. Heston and Sinha's application of machine learning provides the probability of an article being 'positive', 'negative', or 'neutral', alongside two other popular methods, one of which uses the Harvard IV dictionary. In general, big data is set to revolutionize the landscape of finance and economics. Several financial institutions are adopting big data policies in order to gain a competitive edge. Complex algorithms are being developed to execute trades based on all the structured and unstructured data gained from these sources. The methods adopted so far are not yet fully mature; however, ongoing research points to the growing dependence of stock markets, financial organizations, and economies on big data analytics.
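A lexicon-based classifier in the spirit of the Harvard IV approach can be sketched in a few lines. The word lists here are tiny invented stand-ins; real sentiment dictionaries contain thousands of entries:

```python
# Tiny hand-made word lists standing in for a sentiment dictionary
# such as Harvard IV (the real lexicons are far larger).
POSITIVE = {"gain", "growth", "profit", "strong", "beat"}
NEGATIVE = {"loss", "decline", "weak", "fraud", "miss"}

def article_sentiment(text):
    """Classify a news article as positive / negative / neutral by
    counting dictionary hits -- the simplest lexicon-based approach."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(article_sentiment("Quarterly profit shows strong growth"))   # positive
print(article_sentiment("Revenue decline and weak outlook"))       # negative
```

Machine learning methods like the one attributed to Heston and Sinha go further by producing class probabilities rather than a single label, but the dictionary count above is the usual baseline they are compared against.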

Big Data in Fraud Detection:

Forensic Data Analytics (FDA) has been an intriguing area of interest over the past decade. However, very few companies actually use FDA to mine big data. The reasons for this unfortunate situation range from a deficit of expertise and awareness, and difficulty developing the right tools to mine big data, to a lack of appropriate technology and an inability to handle such humongous quantities of data. Ernst & Young's 2014 Global Forensic Data Analytics Survey found: "Our survey finds that 42% of companies with revenues between US$100 million to US$1 billion are reviewing less than 10,000 records. And 71% of companies with more than US$1 billion in sales report examining just one million records or fewer... Companies know there are high risk numbers in book entries, such as round thousands or duplicates, but they're only just starting to analyze descriptions for those book entries. Looking at both the numbers and words can mean the difference between uncovering fraud, and falling victim to it." The combination of appropriate data and big data analytics can help combat fraudulent activities. Though several companies are mining big data for this purpose, there are still limitations [26] in their approach: they either keep the data siloed, limiting the analysis that can be performed, or consider only structured data, thus capturing only a subset of the information. A more holistic approach to implementing big data analytics is required. Companies such as Pactera are developing solutions that process massive amounts of structured and unstructured data and build varied models and algorithms to find patterns of fraud and anomalies and to predict customer behavior.
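The two red flags the survey mentions, round-thousand amounts and duplicate entries, are easy to screen for once journal entries are in analyzable form. The ledger below is invented for illustration:

```python
from collections import Counter

def flag_suspicious(entries):
    """Flag book entries matching two classic red flags:
    round-thousand amounts and duplicated amounts."""
    amounts = Counter(amount for _, amount in entries)
    flags = []
    for entry_id, amount in entries:
        reasons = []
        if amount % 1000 == 0:
            reasons.append("round thousand")
        if amounts[amount] > 1:
            reasons.append("duplicate amount")
        if reasons:
            flags.append((entry_id, reasons))
    return flags

# Hypothetical journal entries: (id, amount)
ledger = [("JE-01", 4000), ("JE-02", 1317), ("JE-03", 1317), ("JE-04", 250)]
print(flag_suspicious(ledger))
```

As the survey notes, numeric screens like this are only half the picture; a fuller FDA workflow would also analyze the free-text descriptions attached to each entry.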

6. FUTURE SCOPE
Today, Big Data is influencing the IT industry like few technologies have before. The massive data generated by sensor-enabled machines, mobile devices, cloud computing, social media, and satellites helps organizations improve their decision-making and take their business to another level.

"Big data absolutely has the potential to change the way governments, organizations, and academic institutions conduct business and make discoveries, and it's likely to change how everyone lives their day-to-day lives," said Susan Hauser, corporate vice president at Microsoft. Data is arguably the biggest thing to hit the industry since the invention of the PC.

As mentioned earlier in this paper, data is generated so rapidly every day that traditional databases and other data storage systems gradually fail at storing, retrieving, and finding relationships among data. Big data technologies have addressed the problems of this new data revolution through the use of commodity hardware and distribution. Companies like Google, Yahoo!, General Electric, Cornerstone, Microsoft, Kaggle, Facebook, and Amazon are investing heavily in Big Data research and projects.

IDC estimated the value of the Big Data market at about $6.8 billion in 2012, growing almost 40 percent every year to $17 billion by 2015. By 2017, Wikibon's Jeff Kelly predicts, the Big Data market will top $50 billion: "Demand is so hot for solutions that all companies are exploring big data strategies. The problem is that the companies lack internal expertise and best practices... the side effect is that there is a services and consulting boom in big data. It's a perfect storm of product and services."

Recently it was announced that the Indian Prime Minister's office is using Big Data analytics to understand Indian citizens' sentiments and ideas through the crowdsourcing platform www.mygov.in and through social media, to get a picture of common people's thoughts and opinions on government actions. Google has launched the Google Cloud Platform, which enables developers to build a range of products, from simple websites to complex applications. It allows users to launch virtual machines, store huge amounts of data online, and much more.

Basically, it will be a one-stop platform for cloud-based applications, online gaming, mobile applications, and the like. All of these require huge amounts of data processing, where Big Data plays an immense role.

The predictions from the IDC FutureScape for Big Data and Analytics are:

1. Visual data discovery tools will grow 2.5 times faster than the rest of the Business Intelligence (BI) market. By 2018, investing in this enabler of end-user self-service will become a requirement for all enterprises.
2. Over the next five years, spending on cloud-based Big Data and analytics (BDA) solutions will grow three times faster than spending on on-premise solutions. Hybrid on/off-premise deployments will become a requirement.
3. The shortage of skilled staff will persist. In the U.S. alone there will be 181,000 deep-analytics roles in 2018, and five times that many positions requiring related skills in data management and interpretation.
4. By 2017, a unified data platform architecture will become the foundation of BDA strategy. The unification will occur across information management, analysis, and search technology.
5. Growth in applications incorporating advanced and predictive analytics, including machine learning, will accelerate in 2015. These apps will grow 65% faster than apps without predictive functionality.
6. 70% of large organizations already purchase external data, and 100% will do so by 2019. In parallel, more organizations will begin to monetize their data by selling it or providing value-added content.
7. Adoption of technology to continuously analyze streams of events will accelerate in 2015 as it is applied to Internet of Things (IoT) analytics, which is expected to grow at a five-year compound annual growth rate (CAGR) of 30%.
8. Decision management platforms will expand at a CAGR of 60% through 2019 in response to the need for greater consistency in decision making and for retention of decision-process knowledge.
9. Rich media (video, audio, image) analytics will at least triple in 2015 and emerge as the key driver for BDA technology investment.

10. By 2018, half of all consumers will interact with services based on cognitive computing on a regular basis.

Big data isn't new, but it has now reached critical mass as people digitize their lives. "People are walking sensors," said Nicholas Scotland, a project manager within NASA's Human Adaptation and Countermeasures Division of the Space Life Sciences Directorate. Averaging the figures suggested by leading big data market analysts and research firms, it can be concluded that approximately 15 percent of all IT organizations will move to cloud-based service platforms, and that between 2015 and 2021 this service market is expected to grow by about 35 percent.
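As a sanity check on the growth figures quoted above, the compounding arithmetic is straightforward; this is plain arithmetic on the cited numbers, not a forecast of our own:

```python
def compound(value, rate, years):
    """Project a market size forward at a constant annual growth rate."""
    return value * (1 + rate) ** years

# Using the figures quoted above: a $6.8 billion market in 2012
# growing about 40% per year, projected three years to 2015.
projected_2015 = compound(6.8, 0.40, 3)
print(round(projected_2015, 1))  # about 18.7, in the ballpark of the quoted $17B
```

The slight gap between the computed $18.7 billion and IDC's $17 billion simply reflects that "almost 40 percent" was itself a rounded growth rate.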

7. CONCLUSION
From a historical perspective, big data can be viewed as the latest generation in the evolution of decision support data management [Watson and Marjanovic 2013c]. The need for data to support computer-based decision making has existed at least since the early 1970s and the first decision support systems (DSS). This period can be thought of as the first generation of decision support data management. It was very application-centric, with data organized to support a single decision or a set of related decisions.
By the 1990s, there was a need to support a wide variety of BI and analytic applications (e.g.,
reporting, executive information systems) with data. Having separate databases (i.e.,
independent data marts) for each application was costly, resulted in data inconsistencies across
applications, and failed to support enterprise-wide applications.

The outcome was the emergence of enterprise data warehouses (the second generation), which
represented a data-centric approach to data management. The next generation (the third) was
real-time data warehousing. Technology had improved by 2000 so that it was possible to
capture data in real time and trickle feed it into the data warehouse.

The significance of this evolution is that it changed the paradigm for what kinds of decisions could be supported. With real-time data, operational decisions and processes could be supported. Big data is the fourth generation of decision support data management.

The ability to capture, store, and analyze high-volume, high-velocity, and high-variety data is
allowing decisions to be supported in new ways. It is also creating new data management
challenges. For many years, companies developed data warehouses as the focal point for data
to support decision making.

This is changing, as new data sources, platforms, and cloud-based services have emerged. As
a result, data is becoming more federated; that is, data is stored and accessed from multiple
places. Adding to this trend are business units such as finance and marketing that have the
business need, resources, and political clout to acquire their own platforms, services, and tools.
In many organizations, IT is losing some control over data management.

This is not bad if it leads to more agility and better organizational performance. The downside,
however, includes data silos that don’t share data, data inconsistencies, inefficiencies in storing
data, and duplication of resources. Organizations are accepting that data federation is going to
exist, at least in the short to medium term, and are instituting greater controls over their data
management practices.

Some are putting more emphasis on data governance (e.g., data stewards, metadata
management, and master data management). They are also creating BI or analytics centers of
excellence to provide strategic direction for the use of data and analytics, prioritize projects,
provide shareable resources, establish guidelines and standards, participate in tool selection,
troubleshoot problems, and more.

Organizations are gaining unprecedented insights into customers and operations because of the
ability to analyze new data sources and large volumes of highly detailed data [Russom, 2012].
This data is bringing more context and insight to organizational decision making. Success with
big data is not guaranteed, however, as there are specific requirements that must be met.
Organizations should start with specific, narrowly defined objectives, often related to better
understanding and connecting with customers and improving operations.

There must be strong, committed sponsorship. Depending on the project(s), the sponsorship
can be departmental or at the senior executive level. The CIO is typically responsible for
developing and maintaining the big data infrastructure. For some companies (e.g., Google),
alignment between the business and IT strategies is second nature because big data is what the
business is all about.

For others, careful consideration needs to be given to organization structure issues; governance;
the skills, experiences, and perspectives of organizational personnel; how business needs are
turned into successful projects; and more. There should be a fact-based decision-making culture
where the business is “run by the numbers” and there is constant experimentation to see what
works best.

The creation and maintenance of this culture depends on senior management. Big data has
spawned a variety of new data management technologies, platforms, and approaches. These
must be blended with traditional platforms (e.g., data warehouses) in a way that meets
organizational needs cost effectively. The analysis of big data requires traditional tools like
SQL, analytical workbenches (e.g., SAS Enterprise Miner), and data analysis and visualization
languages like R. All of this is for naught, however, unless there are business users, analysts,
and data scientists who can work with and use big data. As organizations make greater use of
big data, it is likely that there will be increased concerns and legislation about individual
privacy issues.
