
MCA III SEM

INTRODUCTION

The term "big data" has been in use since the 1990s, with some crediting John Mashey with
popularizing it. Big data usually includes data sets with sizes beyond the ability of
commonly used software tools to capture, curate, manage, and process data within a tolerable
elapsed time. Big data philosophy encompasses unstructured, semi-structured and structured
data; however, the main focus is on unstructured data. Big data "size" is a constantly moving
target, as of 2012 ranging from a few dozen terabytes to many zettabytes of data. Big data
requires a set of techniques and technologies with new forms of integration to reveal insights
from datasets that are diverse, complex, and of a massive scale.
"Big data" is a field that treats ways to analyze, systematically extract information from, or
otherwise deal with data sets that are too large or complex to be dealt with by traditional data-
processing application software. Data with many cases (rows) offer greater statistical power,
while data with higher complexity (more attributes or columns) may lead to a higher false
discovery rate. Big data challenges include capturing data, data storage, data analysis, search,
sharing, transfer, visualization, querying, updating, information privacy and data source. Big data
was originally associated with three key concepts: volume, variety, and velocity. Other concepts
later attributed to big data are veracity (i.e., how much noise is in the data) and value.
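
The link between more attributes and more spurious findings can be illustrated with a small calculation. The sketch below is not a formal false discovery rate computation; it assumes that every attribute is tested independently at a significance level `alpha` and that none of them has a real effect. The function names and the default `alpha = 0.05` are illustrative assumptions, not part of the original text.

```python
def expected_false_positives(num_attributes: int, alpha: float = 0.05) -> float:
    """Expected number of spurious 'significant' results when every attribute
    is tested independently and none of them has a real effect."""
    return num_attributes * alpha


def chance_of_any_false_positive(num_attributes: int, alpha: float = 0.05) -> float:
    """Probability that at least one of the independent tests is a false positive."""
    return 1.0 - (1.0 - alpha) ** num_attributes


if __name__ == "__main__":
    # More columns means more tests, and therefore more chances to be fooled.
    for m in (1, 10, 100, 1000):
        print(m, expected_false_positives(m), chance_of_any_false_positive(m))
```

Under these assumptions, the expected number of false positives grows linearly with the number of attributes tested, which is one reason wide data sets call for multiple-testing corrections.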


Current usage of the term big data tends to refer to the use of predictive analytics, user behavior
analytics, or certain other advanced data analytics methods that extract value from data, and
seldom to a particular size of data set. "There is little doubt that the quantities of data now
available are indeed large, but that's not the most relevant characteristic of this new data
ecosystem." Analysis of data sets can find new correlations to "spot business trends, prevent
diseases, combat crime and so on." Scientists, business executives, practitioners of medicine,
advertising and governments alike regularly meet difficulties with large data sets in areas
including Internet searches, fintech, urban informatics, and business informatics. Scientists
encounter limitations in e-Science work, including meteorology, genomics, connectomics,
complex physics simulations, biology and environmental research.
Relational database management systems, desktop statistics, and software packages used to
visualize data often have difficulty handling big data. The work may require "massively parallel
software running on tens, hundreds, or even thousands of servers". What qualifies as being "big
data" varies depending on the capabilities of the users and their tools, and expanding capabilities
make big data a moving target. "For some organizations, facing hundreds of gigabytes of data for
the first time may trigger a need to reconsider data management options. For others, it may take
tens or hundreds of terabytes before data size becomes a significant consideration."


E-COMMERCE

E-commerce businesses and retailers with an online presence operate in a competitive and
fast-paced environment dominated by price and online advertisements. Conventional approaches are
fragmented, with individual analytics tools for separate data sources and information needs.
Bringing the data together, if it is done at all, is typically a manual, human-driven process.
Decisions are consequently suboptimal because of the volume, variety, and velocity of data.
These three dimensions are the focus of Big Data, which promises superior, data-driven results.
This article describes the typical challenges and the technologies that address them, to guide
businesses in the strategic decision on adopting Big Data.
An e-commerce business has numerous data sources to consider when it decides how to set
prices and compete for advertising space. The data sources vary in type and, as a result,
were until recently difficult to integrate. Relevant data include competitor offers,
product cost and price, stock levels and stocking costs, sales, advertising campaigns and prices,
and customer sentiment.

The different data sources are available in a number of forms and require varying levels of effort
to integrate:
* Competitor offers can be obtained through services such as price comparison websites, which may
sell datasets stored as files or offer API access. Alternatively, crawling services or programs can
regularly retrieve data for analysis. The resulting data will be either in a text format, e.g. JSON or
CSV, or in a database. Either SQL or NoSQL stores can be used, with the latter typically being
easier to scale for large data sets.
* Product cost and price, stock levels, stocking costs, and sales are usually available in a
business's SQL data store or can be derived from information in it. Smaller businesses sometimes
work with Excel and CSV files.
* Advertising campaign and pricing data can be collected through APIs from the providers
(and then stored in text files or databases) or exported as text files.
* Customer sentiment, often available only anecdotally to the people involved in the analytics,
can be captured in statistically relevant data sets from social media. The data can be collected
raw, e.g. Tweets, via APIs and then processed and stored. Alternatively, services may be used to
receive aggregated results, potentially already sentiment-scored. Again, depending on the service,
the data may be available as text files or from APIs (and stored in databases if desired).
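
As a rough illustration of what integrating two of these sources involves, the sketch below joins competitor offers delivered as JSON with a business's own product data delivered as CSV, using only the Python standard library. The field names (`sku`, `competitor`, `price`, `cost`, `stock`), the sample records, and the function names are hypothetical and chosen only for the example; real feeds will differ.

```python
import csv
import io
import json

# Hypothetical sample inputs; real feeds would come from files, APIs, or databases.
COMPETITOR_OFFERS_JSON = """
[
  {"sku": "A100", "competitor": "shop-x", "price": 18.50},
  {"sku": "A100", "competitor": "shop-y", "price": 19.99},
  {"sku": "B200", "competitor": "shop-x", "price": 7.25}
]
"""

OWN_PRODUCTS_CSV = """sku,cost,price,stock
A100,12.00,19.50,40
B200,5.00,6.99,0
C300,3.00,4.50,15
"""


def lowest_competitor_prices(offers_json: str) -> dict[str, float]:
    """Return the lowest competitor price seen for each SKU."""
    lowest: dict[str, float] = {}
    for offer in json.loads(offers_json):
        sku, price = offer["sku"], float(offer["price"])
        if sku not in lowest or price < lowest[sku]:
            lowest[sku] = price
    return lowest


def integrate(products_csv: str, offers_json: str) -> list[dict]:
    """Join own product rows with the lowest competitor price per SKU."""
    lowest = lowest_competitor_prices(offers_json)
    rows = []
    for product in csv.DictReader(io.StringIO(products_csv)):
        rows.append({
            "sku": product["sku"],
            "cost": float(product["cost"]),
            "price": float(product["price"]),
            "stock": int(product["stock"]),
            # None when no competitor offer was found for this SKU.
            "lowest_competitor_price": lowest.get(product["sku"]),
        })
    return rows


if __name__ == "__main__":
    for row in integrate(OWN_PRODUCTS_CSV, COMPETITOR_OFFERS_JSON):
        print(row)
```

Even in this small case, the join key, type conversions, and handling of missing matches have to be decided by hand, which hints at why manual integration across many sources becomes costly.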


The discussion so far has focused on data input and output without defining the processing steps.
The available technologies are numerous and can be perplexing when not put in the context of a
use case. At the core of big data processing stands Hadoop, a platform for scalable, inexpensive
processing. Hadoop processing traditionally meant writing Java MapReduce programs, which are
intricate and verbose, and integrating multiple data sources required significant development
effort and cost.
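
To make the map-reduce idea concrete, here is a minimal single-process sketch of the programming model in plain Python. It does not use Hadoop or its Java API; the input format (`product_id,quantity` lines), the sample data, and all function names are assumptions made for illustration. In a real Hadoop job, the grouping step would be performed by the framework across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical input records: "product_id,quantity" lines, e.g. from a sales log.
SALES_LINES = ["A100,2", "B200,1", "A100,3", "C300,5", "B200,4"]


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: parse one input record and emit (key, value) pairs."""
    product_id, quantity = line.split(",")
    yield product_id, int(quantity)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(SALES_LINES))
```

Because each map call sees only one record and each reduce call sees only one key, the work can be split across machines; that property is what the model trades verbosity for.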
