
Big Data Use Cases

Big data can help you address a range of business activities, from customer experience to
analytics. Here are just a few. (More use cases can be found at Oracle Big Data Solutions.)

Product Development
Companies like Netflix and Procter & Gamble use big data to anticipate customer demand.
They build predictive models for new products and services by classifying key attributes of
past and current products or services and modeling the relationship between those attributes
and the commercial success of the offerings. In addition, P&G uses data and analytics from
focus groups, social media, test markets, and early store rollouts to plan, produce, and
launch new products.

Predictive Maintenance
Factors that can predict mechanical failures may be deeply buried in structured data, such as
the equipment year, make, and model of a machine, as well as in unstructured data that
covers millions of log entries, sensor data, error messages, and engine temperature. By
analyzing these indications of potential issues before the problems happen, organizations
can deploy maintenance more cost effectively and maximize parts and equipment uptime.
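
A minimal sketch of the idea, assuming a single hypothetical temperature sensor and a simple trailing-window rule (one of many possible techniques, not one the text above prescribes): a reading is flagged when it lies far outside the range of the readings just before it. The function name, window size, and threshold are illustrative assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate strongly from the trailing window."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Skip perfectly flat histories to avoid dividing by zero.
        if sigma > 0 and abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical engine temperatures (degrees Celsius), one per minute.
temps = [90.1, 90.4, 89.8, 90.2, 90.0, 90.3, 89.9, 97.5, 90.1, 90.2]
print(flag_anomalies(temps))
```

In practice such a rule would be only a first filter; the structured attributes mentioned above (year, make, model) would normally feed a trained model rather than a fixed threshold.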

Customer Experience
The race for customers is on. A clearer view of customer experience is more attainable now
than ever before. Big data enables you to gather data from social media, web visits, call logs,
and other data sources to improve the interaction experience and maximize the value
delivered. Start delivering personalized offers, reduce customer churn, and handle issues
proactively.

Fraud and Compliance
When it comes to security, it’s not just a few rogue hackers; you’re up against entire expert
teams. Security landscapes and compliance requirements are constantly evolving. Big data
helps you identify patterns in data that indicate fraud and aggregate large volumes of
information to make regulatory reporting much faster.

Machine Learning
Machine learning is a hot topic right now. And data—specifically big data—is one of the
reasons why. We are now able to teach machines instead of programming them. The availability
of big data to train machine-learning models makes that happen.

Operational Efficiency
Operational efficiency may not always make the news, but it’s an area in which big data is
having the most impact. With big data, you can analyze and assess production, customer
feedback and returns, and other factors to reduce outages and anticipate future demands.
Big data can also be used to improve decision-making in line with current market demand.

Drive Innovation
Big data can help you innovate by studying interdependencies between humans, institutions,
entities, and processes, and then determining new ways to use those insights. Use data insights
to improve decisions about financial and planning considerations. Examine trends and what
customers want to deliver new products and services. Implement dynamic pricing. There are
endless possibilities.

https://www.bernardmarr.com/default.asp?contentID=1076

BREAKING DOWN 'Big Data'

The increase in the amount of data available presents both opportunities and problems. In
general, having more data on one’s customers (and potential customers) should allow
companies to better tailor their products and marketing efforts in order to create the highest
level of satisfaction and repeat business.

Companies that are able to collect large amounts of data are provided with the opportunity to
conduct deeper and richer analysis. This data can be collected from publicly shared
comments on social networks and websites, voluntarily gathered from personal electronics
and apps, through questionnaires, product purchases, and electronic check-ins. The
presence of sensors and other inputs in smart devices allows for data to be gathered across
a broad spectrum of situations and circumstances.

Challenges of Using Big Data

While better analysis is a positive, big data can also create overload and noise. Companies
have to be able to handle larger volumes of data, all the while determining which data
represents signals compared to noise. Determining what makes the data relevant becomes a
key factor. Furthermore, the nature and format of the data can require special handling
before it is acted upon. Structured data, consisting of numeric values, can be easily stored
and sorted. Unstructured data, such as emails, videos, and text documents, may require
more sophisticated techniques to be applied before it becomes useful.

Big data is most often stored in computer databases and is analyzed using software
specifically designed to handle large, complex data sets. Many Software-as-a-Service
(SaaS) companies specialize in managing this type of complex data. Data analysts look at
the relationship between different types of data, such as demographic data and purchase
history, to determine whether a correlation exists. Businesses often rely on such experts'
assessments of big data to turn it into actionable information. Such assessments may be done
in-house within a company or externally by a third party that focuses on processing big data
into digestible formats.
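
As a rough illustration of checking whether a relationship exists between a demographic attribute and purchase history, the sketch below computes a Pearson correlation coefficient by hand. The customer ages and spending figures are invented for the example, and Pearson correlation is only one possible measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical customers: age (demographic) and yearly spend (purchase history).
ages = [22, 30, 41, 53, 64]
spend = [180, 260, 340, 450, 520]
r = pearson(ages, spend)
print(f"r = {r:.3f}")
```

A value of r near 1 or -1 suggests a strong linear relationship, and a value near 0 suggests none; neither establishes that one variable causes the other.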

Nearly every department in a company can utilize findings from data analysis, from human
resources and technology to marketing and sales. The goal of big data is thus to increase the
speed at which products get to market, to reduce the amount of time and resources required
to gain market adoption, and to ensure that customers remain satisfied.

https://www.investopedia.com/terms/b/big-data.asp

https://www.ibmbigdatahub.com/infographic/where-does-big-data-come photo

The three primary sources of Big Data

Social data comes from the Likes, Tweets & Retweets, Comments, Video Uploads,
and general media that are uploaded and shared via the world’s favorite social media
platforms. This kind of data provides invaluable insights into consumer behavior and
sentiment and can be enormously influential in marketing analytics. The public web is
another good source of social data, and tools like Google Trends can be used to
good effect to increase the volume of big data.

Machine data is defined as information which is generated by industrial equipment,
sensors that are installed in machinery, and even web logs which track user behavior.
This type of data is expected to grow exponentially as the Internet of Things grows
ever more pervasive and expands around the world. Sensors such as medical devices,
smart meters, road cameras, satellites, games and the rapidly growing Internet of
Things will deliver high velocity, value, volume and variety of data in the very near
future.

Transactional data is generated from all the daily transactions that take place both
online and offline. Invoices, payment orders, storage records, delivery receipts – all
are characterized as transactional data. Yet data alone is almost meaningless, and
most organizations struggle to make sense of the data that they are generating and
of how it can be put to good use.

https://www.cloudmoyo.com/what-is-big-data-and-where-it-comes-from/

Sources of big data

MEDIA AS A BIG DATA SOURCE


Media is the most popular source of big data, as it provides valuable insights on
consumer preferences and changing trends. Since it is self-broadcasted and crosses
all physical and demographic barriers, it is the fastest way for businesses to get an
in-depth overview of their target audience, draw patterns and conclusions, and
enhance their decision-making. Media includes social media and interactive
platforms, like Google, Facebook, Twitter, YouTube, Instagram, as well as generic
media like images, videos, audio, and podcasts that provide quantitative and
qualitative insights on every aspect of user interaction.

CLOUD AS A BIG DATA SOURCE


Today, companies have moved beyond traditional data sources by shifting
their data to the cloud. Cloud storage accommodates structured and unstructured
data and provides businesses with real-time information and on-demand insights. The
main attributes of cloud computing are its flexibility and scalability. As big data can be
stored and sourced on public or private clouds, via networks and servers, the cloud
makes for an efficient and economical data source.

THE WEB AS A BIG DATA SOURCE

The public web constitutes big data that is widespread and easily accessible. Data on
the Web or ‘Internet’ is commonly available to individuals and companies alike.
Moreover, web services such as Wikipedia provide free and quick informational
insights to everyone. The enormity of the Web ensures its diverse usability and is
especially beneficial to start-ups and SMEs, as they don’t have to wait to develop
their own big data infrastructure and repositories before they can leverage big data.

IOT AS A BIG DATA SOURCE


Machine-generated content or data created from IoT constitute a valuable source of
big data. This data is usually generated from the sensors that are connected to
electronic devices. The sourcing capacity depends on the ability of the sensors to
provide real-time accurate information. IoT is now gaining momentum and includes
big data generated, not only from computers and smartphones, but also possibly
from every device that can emit data. With IoT, data can now be sourced from
medical devices, vehicular processes, video games, meters, cameras, household
appliances, and the like.

DATABASES AS A BIG DATA SOURCE


Businesses today prefer to use an amalgamation of traditional and modern
databases to acquire relevant big data. This integration paves the way for a hybrid
data model and requires low investment and IT infrastructural costs. Furthermore,
these databases are deployed for several business intelligence purposes as well.
These databases can then provide for the extraction of insights that are used to drive
business profits. Popular databases include a variety of data sources, such as MS
Access, DB2, Oracle, SQL, and Amazon Simple, among others.

Extracting and analyzing data from extensive big data sources is a complex process
that can be frustrating and time-consuming. These complications can be resolved if
organizations weigh all the necessary considerations of big data, take into account
relevant data sources, and deploy them in a manner that is well tuned to their
organizational goals.

https://www.allerin.com/blog/top-5-sources-of-big-data

Big data is often boiled down to a few varieties including social data, machine data, and
transactional data. Social media data is providing remarkable insights to companies on
consumer behavior and sentiment that can be integrated with CRM data for analysis,
with 230 million tweets posted on Twitter per day, 2.7 billion Likes and comments added to
Facebook every day, and 60 hours of video uploaded to YouTube every minute (this is what
we mean by velocity of data).
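
To make the velocity figures above easier to compare, the short calculation below converts them to per-second rates. It uses only the numbers quoted in the paragraph; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the paragraph above.
tweets_per_day = 230_000_000
facebook_actions_per_day = 2_700_000_000
youtube_hours_per_minute = 60

tweets_per_second = tweets_per_day / SECONDS_PER_DAY
facebook_actions_per_second = facebook_actions_per_day / SECONDS_PER_DAY
# Hours of video arriving for every hour of wall-clock time.
youtube_hours_per_hour = youtube_hours_per_minute * 60

print(f"{tweets_per_second:,.0f} tweets per second")
print(f"{facebook_actions_per_second:,.0f} Likes and comments per second")
print(f"{youtube_hours_per_hour:,} hours of video per hour")
```

In other words, video arrives roughly 3,600 times faster than it could be watched in real time, which is why such data cannot be reviewed manually.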

Here are some of the many sources of big data:


1. Sensors/meters and activity records from electronic devices: This kind of
information is produced in real time. The number and periodicity of the
observations vary: sometimes they depend on a lapse of time, sometimes on the
occurrence of some event (for example, a car passing through a camera's field of
view), and sometimes on manual manipulation (which, strictly speaking, is the same
as the occurrence of an event). The quality of this kind of source depends mostly on
the capacity of the sensor to take accurate measurements in the way it is expected
to.

2. Social interactions: This is data produced by human interactions through a network
such as the Internet. The most common is the data produced in social networks. This
kind of data has qualitative and quantitative aspects that are of interest to measure.
Quantitative aspects are easier to measure than qualitative ones. The former involve
counting the number of observations grouped by geographical or temporal
characteristics, while the quality of the latter relies mostly on the accuracy of the
algorithms applied to extract the meaning of the contents, which are commonly found
as unstructured text written in natural language. Examples of analyses made from
this data are sentiment analysis, trending-topic analysis, etc.

3. Business transactions: Data produced as a result of business activities can be
recorded in structured or unstructured databases. When it is recorded in structured
databases, the most common obstacle to analyzing that information and deriving
statistical indicators is the sheer volume of information and the frequency of its
production, because this data is sometimes produced at a very fast pace: thousands
of records can be produced in a second when big companies like supermarket
chains are recording their sales. But this kind of data is not always produced in
formats that can be stored directly in relational databases. An electronic invoice is an
example: it has more or less a structure, but to put the data it contains into a
relational database we need to apply some process that distributes that data across
different tables (in order to normalize it according to relational database theory),
and it may not even be in plain text (it could be a picture, a PDF, an Excel record,
etc.). One problem here is that this process takes time and, as previously said, the
data may be produced too fast, so we would need different strategies for using the
data: processing it as it is without putting it in a relational database, discarding
some observations (by which criteria?), using parallel processing, etc. The quality of
information produced from business transactions is tightly related to the capacity to
get representative observations and to process them.

4. Electronic files: These are unstructured documents, statically or dynamically
produced, which are stored or published as electronic files, such as Internet pages,
videos, audio, PDF files, etc. They can have contents of special interest that are
difficult to extract; different techniques can be used, such as text mining, pattern
recognition, and so on. The quality of our measurements will mostly rely on the
capacity to extract and correctly interpret all the representative information from
those documents.

5. Broadcasts: This mainly refers to video and audio produced in real time. Getting
statistical data from the contents of this kind of electronic data is for now too complex
and demands a great deal of computational and communications power. Once the
problems of converting “digital-analog” contents to “digital-data” contents are solved,
we will face complications in processing it similar to the ones found with social
interactions.
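
The invoice case described under business transactions can be sketched as follows. This is a minimal illustration, assuming the invoice has already been parsed into a nested Python dictionary; the table layout, field names, and sample values are hypothetical. It uses Python's built-in sqlite3 module to distribute one invoice across normalized tables.

```python
import sqlite3

# Hypothetical invoice, already parsed from its original format (PDF, XML, ...).
invoice = {
    "number": "INV-001",
    "customer": "Acme Ltd",
    "lines": [
        {"product": "widget", "quantity": 3, "unit_price": 2.50},
        {"product": "gadget", "quantity": 1, "unit_price": 10.00},
    ],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        number TEXT UNIQUE,
        customer_id INTEGER REFERENCES customer(id)
    );
    CREATE TABLE invoice_line (
        invoice_id INTEGER REFERENCES invoice(id),
        product TEXT,
        quantity INTEGER,
        unit_price REAL
    );
""")


def store_invoice(conn, inv):
    """Distribute one nested invoice across the three normalized tables."""
    conn.execute(
        "INSERT OR IGNORE INTO customer (name) VALUES (?)", (inv["customer"],)
    )
    customer_id = conn.execute(
        "SELECT id FROM customer WHERE name = ?", (inv["customer"],)
    ).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO invoice (number, customer_id) VALUES (?, ?)",
        (inv["number"], customer_id),
    )
    invoice_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO invoice_line VALUES (?, ?, ?, ?)",
        [
            (invoice_id, line["product"], line["quantity"], line["unit_price"])
            for line in inv["lines"]
        ],
    )
    conn.commit()


store_invoice(conn, invoice)
total = conn.execute(
    "SELECT SUM(quantity * unit_price) FROM invoice_line"
).fetchone()[0]
print(total)
```

Each insert here is a separate statement against a single database, which is exactly the per-record cost that becomes a bottleneck when invoices arrive thousands at a time.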

http://www.hadoopadmin.co.in/sources-of-bigdata/

Big data is a term used for a collection of data sets so large and
complex that it is difficult to process using traditional applications/tools. It is
data exceeding terabytes in size. Because of the variety of data that it
encompasses, big data always brings a number of challenges relating to its
volume and complexity. A recent survey says that 80% of the data created
in the world is unstructured. One challenge is how this unstructured data
can be structured before we attempt to understand and capture the most
important data. Another challenge is how we can store it. Here are the top
tools used to store and analyse big data. We can categorise them into two groups
(storage and querying/analysis).
1. Apache Hadoop
Apache Hadoop is a Java-based free software framework that can
effectively store large amounts of data in a cluster. This framework runs in
parallel on a cluster and allows us to process data across all
nodes. Hadoop Distributed File System (HDFS) is the storage system of
Hadoop, which splits big data and distributes it across many nodes in a cluster.
It also replicates data in the cluster, thus providing high availability.
2. Microsoft HDInsight
It is a Big Data solution from Microsoft powered by Apache Hadoop which is
available as a service in the cloud. HDInsight uses Windows Azure Blob
storage as the default file system. This also provides high availability with
low cost.
3. NoSQL
While traditional SQL can be effectively used to handle large amounts of
structured data, we need NoSQL (Not Only SQL) to handle unstructured
data. NoSQL databases store unstructured data with no particular schema.
Each row can have its own set of column values. NoSQL can give better
performance in storing massive amounts of data. There are many
open-source NoSQL DBs available to analyse big data.
4. Hive
This is a distributed data management system for Hadoop. It supports a SQL-like
query language, HiveQL (HQL), to access big data. It can be used primarily
for data mining, and it runs on top of Hadoop.
5. Sqoop
This is a tool that connects Hadoop with various relational databases to
transfer data. This can be effectively used to transfer structured data to
Hadoop or Hive.
6. PolyBase
This works on top of SQL Server 2012 Parallel Data Warehouse (PDW) and
is used to access data stored in PDW. PDW is a data warehousing appliance
built for processing any volume of relational data and provides
integration with Hadoop, allowing us to access non-relational data as well.
7. Big data in Excel
As many people are comfortable doing analysis in Excel, a popular tool
from Microsoft, you can also connect to data stored in Hadoop using Excel
2013. Hortonworks, which focuses on providing enterprise
Apache Hadoop, offers an option to access big data stored in its
Hadoop platform using Excel 2013. You can use the Power View feature of
Excel 2013 to easily summarise the data.
Similarly, Microsoft’s HDInsight allows us to connect to big data stored in
the Azure cloud using a Power Query option.
8. Presto
Facebook has developed and recently open-sourced its query engine
(SQL-on-Hadoop), named Presto, which is built to handle petabytes of data.
Unlike Hive, Presto does not depend on the MapReduce technique and can
retrieve data quickly.
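
The splitting and replication attributed to HDFS in item 1 can be pictured with a toy simulation. This is only a sketch under simplifying assumptions: the function names are invented, the blocks are a few hundred bytes, and replicas are placed round-robin, whereas real HDFS uses far larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + k) % len(nodes)] for k in range(replication)
        ]
    return placement


def surviving_blocks(placement, failed_node):
    """Return the block ids that still have at least one replica after a failure."""
    return [
        block_id
        for block_id, holders in placement.items()
        if any(node != failed_node for node in holders)
    ]


data = b"x" * 1000
blocks = split_into_blocks(data, 256)
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(len(blocks), nodes, replication=3)

for block_id, holders in placement.items():
    print(block_id, len(blocks[block_id]), holders)
print(surviving_blocks(placement, "node-b"))
```

Because every block lives on three different nodes, losing any single node leaves all blocks readable, which is the sense in which replication provides high availability.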
https://bigdata-madesimple.com/top-big-data-tools-used-to-store-and-analyse-data/

1. INTRODUCTION
In today’s world, every tiny gadget is a potential data source,
adding to the huge data bank. Every bit of data generated is of practical value,
be it enterprise data or personal data, historical or transactional data. The data
generated through large customer transactions and social networking sites is varied,
voluminous and rapidly generated. All this data poses a storage and processing crisis
for enterprises. The data being generated by massive web logs, healthcare data
sources, point-of-sale data and satellite imagery needs to be stored and handled well.
Yet this huge amount of data proves to be a very useful knowledge bank if
handled carefully. Hence big companies are investing heavily in researching and
harnessing this data. Given all the enthusiasm today for Big Data, one can easily
regard Big Data technology as the next best thing to learn. All the attention it has been
getting over the past decade is due to the overwhelming need for it in the industry.
5. CONCLUSION AND FUTURE WORK
Due to the gargantuan increase in the amount of data in various fields, handling the
data efficiently becomes a major challenge. Thus, to come up with plausible solutions
to these challenges, one needs to understand the concept of big data and its handling
methodologies, and furthermore improve the approaches to analyzing big data. With
the advent of social media, the need for handling big data has increased
monumentally. If Facebook, WhatsApp and Twitter produce data that keeps
increasing exponentially every year (or every few years), then such huge data must
be handled efficiently. We will need solutions to such issues that do not compromise
the quality of the results. Hence we attempt to showcase basic concepts of big data
that can be used as an easy reference for a literature survey of the topic.

The Apache Hadoop software library is a framework that allows
for the distributed processing of large data sets across clusters
of computers using simple programming models. It is designed
to scale up from single servers to thousands of machines, each
offering local computation and storage. Rather than rely on
hardware to deliver high-availability, the library itself is designed
to detect and handle failures at the application layer, so
delivering a highly-available service on top of a cluster of
computers, each of which may be prone to failures.
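
One programming model commonly used with Hadoop is MapReduce, which the tool list above mentions in connection with Hive and Presto. The sketch below imitates its three steps (map, shuffle, reduce) in a single Python process to count words. It does not use any Hadoop API, and the function names are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(document: str):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


documents = ["big data is big", "data about data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

On a real cluster the map calls would run on the nodes that hold each piece of input and the reduce calls on whichever nodes receive each key, but the logic of each step stays this simple.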
