BUYERS GUIDE
TABLE OF CONTENTS
Why Buy?
Figure: deployment timeline. Tasks shown: data loading, analyze data, visualize data, export data, secure data, monitoring, scheduling and dependencies. Timelines shown: 14+ months versus 3 months (BUY).
JOB TITLE                  BAY AREA          NEW YORK
IT Project Manager         $ 140,000.00      $ 126,000.00
System Administrator       $ 117,000.00      $ 105,000.00
Network Administrator      $ 119,000.00      $ 107,000.00
Database Administrator     $ 125,000.00      $ 119,000.00
IT Security Manager        $ 116,000.00      $ 104,000.00
(title missing)            $ 137,000.00      $ 133,000.00
Data Scientist             $ 138,000.00      $ 133,000.00
Java Developer             $ 136,000.00      $ 133,000.00
QA Engineer                $ 120,000.00      $ 114,000.00
Total                      $ 1,148,000.00    $ 1,074,000.00
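The totals above can be checked with a few lines of Python. This is only an arithmetic sketch over the figures listed; the variable names are illustrative, and one entry is labeled "(untitled row)" because its job title is missing from the source.

```python
# Annual salaries from the figures above, as (Bay Area, New York) pairs.
# Names are illustrative; "(untitled row)" is the row whose title is missing.
salaries = {
    "IT Project Manager": (140_000, 126_000),
    "System Administrator": (117_000, 105_000),
    "Network Administrator": (119_000, 107_000),
    "Database Administrator": (125_000, 119_000),
    "IT Security Manager": (116_000, 104_000),
    "(untitled row)": (137_000, 133_000),
    "Data Scientist": (138_000, 133_000),
    "Java Developer": (136_000, 133_000),
    "QA Engineer": (120_000, 114_000),
}

# Sum each column separately.
bay_area_total = sum(bay for bay, _ in salaries.values())
new_york_total = sum(ny for _, ny in salaries.values())

print(f"Bay Area: ${bay_area_total:,}")
print(f"New York: ${new_york_total:,}")
```

Both sums match the totals stated in the guide.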
Agree on Use Case (1 month)
Qualify Solutions (1 month)
Validate (1.5 months)
Purchase (0.5 month)
JOB TITLE           BAY AREA     NEW YORK
Project Manager     52,500       47,250
(title missing)     51,375       49,875
IT Administrator    44,625       40,125
Total               $ 148,500    $ 137,250
Powerful Analytics
Business Infographics
Interactive spreadsheet UI
Built-in analytic functions
Macros and function plug-in API
In this guide, we'll discuss how you can expedite the deployment and
selection process by:
1. Defining decision criteria
2. Agreeing on a use case
3. Qualifying solutions
4. Validating the solution
STEP 1
DEFINE DECISION CRITERIA
Data Integration
Does the system support native connectors to
unstructured and semi-structured data sources (e.g.
log files, social, SaaS, machine data)? Does it support
flexible partitioning of the data so that it is easy to work
with large amounts of data? Does the solution provide
data quality functions so that the data can be quickly
normalized and transformed?
Administration
Does the system support flexible security integration
with LDAP or Active Directory?
Analytics
Does the solution provide an intuitive environment (e.g.
spreadsheet) that business users can quickly use?
Does the solution include pre-built analytic functions?
Does the system provide a preview to validate analysis
and show data lineage for auditing data flows? Do
data models have to be defined before insights can be
gained? Do analysts need to know what they want to
do before they have had a chance to look at the data?
Can analysts look at the data, iterate, make the
changes they need and analyze without involving IT?
Visualizations
Does the solution support complete freeform
visualization? Or is it just a combination of reports and
dashboards?
Extensibility
Does the solution support open APIs for custom data
connections and custom visualizations?
Architecture
Does the solution run natively on Hadoop? Is a
separate cluster required (ideally no separate cluster
should be required)? Is the product limited by the
availability of the memory within the nodes? The ideal
solution should have no memory constraints. Does the
solution provide a job planner and optimizer to ensure
the lowest number of MapReduce jobs is executed?
Vendor requirements
Is the system proven, with numerous major
releases? How widely is it used in enterprises? Has it
analyzed substantial data volumes (terabytes, petabytes, or
exabytes)?
STEP 2
AGREE ON USE CASES
The ideal Big Data Analytics solution enables users to integrate, analyze, and visualize data to discover insights.
Users look across not only transactions but also interactions to gain deeper, more precise insights, predict
behavior, and even make recommendations about future behavior.
In the following pages, you'll see examples of Datameer's customer use cases. These use cases detail the variety,
volume and velocity of data.
The goal of depicting these use cases is to help you identify potential use cases of your own. Bottom line: faster
time to value.
Historical
Inventory
Pricing
Transaction
Variety
Volume
3000 TB
Velocity
Log Files
Firewall
Feeds
Variety
50+ different feeds, MySQL and JSON data from message queues and flat files, and log files
Volume
As their customer base grew, data volumes outstripped the capacity of their existing
RDBMS-based system.
Velocity
To maintain SLAs within hours, the company had to rapidly analyze data across growing
volumes of customer data.
BEHAVIORAL ANALYTICS
Improve game flow and increase number of paying customers
The game for gaming companies is to increase customer acquisition, retention and monetization. This means
getting more users to play, play more often and longer, and pay. First, analysts use Datameer to identify common
characteristics of users. As a result, gaming companies can target these users better with the right advertising
placement and content. To increase retention, analysts use Datameer to understand what gets a user to play
longer. A user who plays longer and interacts with other players makes the overall gaming experience better.
User Profile
Variety
Game event logs, user profile data, social interaction data captured during games
between players
PREDICTIVE SUPPORT
Identify operational failures and address them before they are reported
A couple of hours of downtime in a store or production environment means lost revenue, sometimes in the
millions of dollars. The clues to where downtime may occur are spread across devices around a store or facility
including WLAN controllers, mobile devices, routers and firewall devices. For this customer, these devices are
used to run operations such as tracking inventory. Each network device generates enormous amounts of
machine-generated data.
Store X
Store Y
Store Z
Variety
Device data from WLAN controllers, mobile devices, routers and firewall devices
FRAUD DETECTION
Identify potential fraud
Credit card fraud has changed. Instead of stealing a credit card and using it to buy big ticket items, some credit
card thieves have become more sophisticated. For example, they can now make numerous small transactions
that are seemingly benign. But if Joe is making 100 $5 margarita transactions at various locations, something is
wrong. By analyzing point of sale, geolocation, authorization, and transaction data with Datameer, this financial
customer was able to identify fraud patterns in historical data.
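The "100 $5 margarita transactions" pattern can be illustrated with a simple rule. The sketch below is not Datameer's fraud model; it is a hypothetical threshold check in plain Python, and the function name, thresholds, and sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def flag_small_transaction_bursts(transactions, max_amount=10.0, min_count=50):
    """Return card IDs with many small transactions spread over several locations.

    transactions: iterable of (card_id, amount, location) tuples.
    max_amount and min_count are hypothetical thresholds, not tuned values.
    """
    counts = defaultdict(int)
    locations = defaultdict(set)
    for card_id, amount, location in transactions:
        if amount <= max_amount:
            counts[card_id] += 1
            locations[card_id].add(location)
    return sorted(
        card_id
        for card_id, count in counts.items()
        if count >= min_count and len(locations[card_id]) > 1
    )


# Made-up data: "joe" makes 100 $5 purchases across four locations,
# while "ann" makes three ordinary purchases.
transactions = [("joe", 5.00, f"bar_{i % 4}") for i in range(100)]
transactions += [
    ("ann", 42.50, "grocery"),
    ("ann", 8.00, "cafe"),
    ("ann", 120.00, "airline"),
]

flagged = flag_small_transaction_bursts(transactions)
print(flagged)
```

A real analysis would also draw on the geolocation and authorization data mentioned above; this rule only counts small purchases per card.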
Point of Sale
Geo-location
Authorization
Transaction
Variety
DEVICE ANALYTICS
Enable business analysts or non-technical users through a spreadsheet
interface to analyze and do big data discovery
This enterprise hardware company was generating and collecting data that was doubling every 15 months. In
addition to the rapidly growing data volumes, there were hundreds of different semi-structured and unstructured
log formats. Before Datameer, analysts were forced to write ad hoc Perl code to parse a subset of the log files and
store data locally. By using Datameer, the company was able to derive valuable insights that helped virtually every
group: Support, Development, Marketing, and Services. For example, Support was able to send out a
replacement part before the component actually failed. Sales was able to look at usage patterns to improve
forecasting and renewal negotiations.
Log Files
Data Store
Transaction Data
Variety
STEP 3
QUALIFY SOLUTIONS
Data Loading
Software has to be developed to load data from many different data sources. This system needs to deal with
the distributed nature of Hadoop on one side and the non-distributed nature of the data sources on the other. The system
also needs to handle corrupted records and provide monitoring services.
Data Parsing
Most data sources provide data in a specific format that needs to be parsed into the Hadoop system. For example,
let's consider parsing a log file into records. Some formats, like JSON, are complicated to parse because a record can
span many lines of text rather than one line per record.
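To make the multi-line record problem concrete, here is a minimal sketch in Python using the standard library's json.JSONDecoder.raw_decode. It assumes the input is a sequence of JSON objects separated only by whitespace; the function name and the sample text are hypothetical, and this is not how any particular product parses its inputs.

```python
import json


def iter_json_records(text):
    """Yield JSON values from text in which one record may span many lines."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here
        # (this includes the newlines between and inside records).
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed value and the index where it ended.
        record, pos = decoder.raw_decode(text, pos)
        yield record


# Hypothetical log excerpt: two records, neither fits on a single line.
sample = '''{"user": "a",
 "event": "login"}
{"user": "b", "event": "purchase",
 "amount": 5}
'''

records = list(iter_json_records(sample))
print(len(records))
print(records[1]["amount"])
```

A line-by-line reader would fail on this input because no single line is a complete record. A corrupted record would raise json.JSONDecodeError here, so a production loader would need to catch and log it.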
Data Analytics
In order for data to be properly analyzed, a big data analytics solution needs to support rapid iterations.
Data Visualization
In order for an analyst to see the insights, data needs to be visualized. Integrating visualization is difficult because
middleware needs to be built to deliver the data out of Hadoop and into the visualization layer.
Scheduling
All the items discussed above need to be orchestrated and scheduled. Scheduling needs to be easy to configure. In
addition, scheduling needs monitoring services to notify administrators of jobs that fail.
Dependency Management
There are complex dependencies that must be managed. For example, certain data sets have to be loaded before
certain jobs in Hadoop can be run.
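The "load before run" constraint is a dependency graph, and a valid run order is a topological sort of that graph. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the job names are hypothetical and stand in for whatever loads and jobs a real deployment has.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each job maps to the set of jobs it depends on.
dependencies = {
    "load_orders": set(),
    "load_customers": set(),
    "join_orders_customers": {"load_orders", "load_customers"},
    "export_report": {"join_orders_customers"},
}

# static_order() yields every job after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

If the graph contains a cycle, TopologicalSorter raises graphlib.CycleError, which is a useful early check on a misconfigured schedule.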
Data Synchronization
Data often needs to be pushed from Hadoop into a data store such as a database or in-memory system.
Monitoring API
Every aspect of a big data analytics solution needs to be monitored. Things that need to be monitored include who
has access to the system, job health, performance, and data throughput.
Management UI
A management user interface is critical for ease of configuration and monitoring.
Security Integration
For security purposes, it is important to be able to integrate with Kerberos and LDAP.
These capabilities map to the steps of the big data analytics process, including:
integration
functional analytics
visualization
smart analytics
We'll walk through the things you'll want to look for to address each of these steps in the following pages of this section.
Big Data Analytics Buyers Guide
Visualize
Integrate
Deploy
Pageviews
The most popular pages, represented in the tag cloud by page views this month (tags shown include business_news and community_forum).
Click paths
Decision Trees
Datameer's decision trees (random forest algorithm) help you understand the different combinations of data
attributes that result in a desired outcome. Decision trees are often used when enriching a dataset with additional
data sources to optimize a process for a better outcome. The structure of the decision tree reflects the structure
that is possibly hidden in your data.
For example, find out what common attributes influence:
Disease risk
Fraud risk
Customer churn
Purchases
Online signups
Root-cause
Product conversions
Column Dependencies
Want to know how strongly a single data attribute, like age, location, or gender, relates to other data attributes, like
income, college degree, or credit score? The column dependency algorithm automatically compares every possible
data attribute combination and visually ranks the strengths of those relationships so you can instantly see where to
focus further. Those relationships are important in themselves and are often used to help target further analysis.
For example, see the relationship between:
Title and purchase amount
Transaction type and frequency
Location and product selection
Average session length and virtual goods purchase
Age and disease type
Account age and product type
Age and number of SMS messages
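The idea of comparing every pair of attributes and ranking the strength of each relationship can be sketched as follows. This is not Datameer's column dependency algorithm, whose details the guide does not give; it substitutes the absolute Pearson correlation, which only works for numeric columns (categorical attributes such as location or gender would need a different measure). The function names and the sample data are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_column_pairs(columns):
    """Return (strength, col_a, col_b) for every column pair, strongest first."""
    scored = [
        (abs(pearson(columns[a], columns[b])), a, b)
        for a, b in combinations(columns, 2)
    ]
    return sorted(scored, reverse=True)


# Made-up sample: income tracks age closely, SMS volume does not.
columns = {
    "age": [25, 32, 47, 51, 62],
    "income": [30, 42, 60, 65, 80],
    "sms_per_day": [40, 12, 33, 8, 25],
}

ranked = rank_column_pairs(columns)
for strength, a, b in ranked:
    print(f"{a} vs {b}: {strength:.2f}")
```

In this sample the age and income pair ranks first, which is the kind of signal that would direct further analysis.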
Recommendation Engine
Datameer's recommendation engine automatically predicts the interests of a person based on historical observations of
similar people's interests so you can increase engagement, recommend more relevant choices, increase customer
satisfaction, and more.
For example, predict interest in:
Music
Movies
Content
Services
Products
Documents
Applications
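Predicting interests from similar people's interests is commonly done with collaborative filtering. The sketch below is a deliberately small user-based variant using Jaccard similarity. It is an assumption about the general technique, not a description of Datameer's engine, and the names and sample data are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two sets: size of the overlap divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes, top_n=3):
    """Suggest items liked by similar people that the target has not seen."""
    scores = {}
    for other, items in likes.items():
        if other == target:
            continue
        similarity = jaccard(likes[target], items)
        if similarity == 0:
            continue  # ignore people with nothing in common
        for item in items - likes[target]:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Made-up listening history.
likes = {
    "ana": {"jazz", "blues", "soul"},
    "ben": {"jazz", "blues", "funk"},
    "cy": {"metal", "punk"},
}

print(recommend("ana", likes))
```

Here "ana" shares two interests with "ben" and none with "cy", so only "funk" is suggested.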
Support annotations?
STEP 4
VALIDATE SOLUTION
Figure: TCO ($$$) versus Return ($$$).
TCO: Hardware, Software, Integration, People, Operations, Logistics
Return: Identify Fraud, Lower Customer Acquisition Costs, Increase Retention, Increase Conversion Rate, Lower IT Costs
Other labels: Time, Flexibility
Step 1: Estimate ROI
Step 2: Measure ROI
Step 3: Update Model
MEASURE ROI
Typical TCO (total cost of ownership) and ROI (return on investment) analyses show hardware savings, software
savings, and productivity gains. In addition, the Datameer analysis captures business benefits, such as an increased
customer conversion rate that leads to $20M in new revenue. This business benefit is orders of magnitude larger than the IT
savings. Take the time to find the business benefits and become the big data hero. IT savings are significant, but
business benefits are greater. Datameer can help with this.
Have a big vision. Start small. Iterate. Build on your successes.
Once you've measured the ROI for one project, you can reuse that ROI metric (e.g. % increase in customer
conversion rate) to estimate ROI gains for other projects within your organization that have similar use cases.
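Reusing a measured metric for a new project is simple arithmetic, sketched below. Every number is hypothetical, the function names are illustrative, and the sketch assumes that revenue scales in proportion to the conversion rate, which may not hold in practice. ROI is computed with the common definition (gain minus cost) divided by cost.

```python
def estimated_revenue_gain(baseline_revenue, conversion_uplift):
    """Revenue gain, assuming revenue scales in proportion to conversion rate."""
    return baseline_revenue * conversion_uplift


def roi(gain, cost):
    """Return on investment as a fraction: (gain - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


# Hypothetical: project 1 measured a 25% increase in conversion rate.
measured_uplift = 0.25

# Hypothetical project 2 with a similar use case.
project_2_baseline_revenue = 5_000_000
project_2_cost = 250_000

gain = estimated_revenue_gain(project_2_baseline_revenue, measured_uplift)
project_2_roi = roi(gain, project_2_cost)
print(f"Estimated gain: ${gain:,.0f}, ROI: {project_2_roi:.0%}")
```

Once project 2 has been measured, its actual uplift would replace the estimate, matching the estimate, measure, and update cycle this section describes.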
Figure: ROI for Project 1, Project 2, and Project 3, broken down into hardware savings, software savings, productivity, and business benefit.
INDUSTRY
BENEFIT
Software Security
Behavioral Analytics
Online Gaming
2X Revenue
Customer Segmentation
Financial Services
Predictive Support
Enterprise Storage
Retail
Software Security
Fraud Detection
Financial Services
Pricing Optimization
You can use these returns in your estimation of return from a big data analytics project.
In the following pages you will see examples of our previous use cases and how Datameer customers have
captured ROI.
Conversion
60%
BEHAVIORAL ANALYTICS
Improve game flow and increase number of paying customers
The game for gaming companies is to increase customer acquisition, retention and monetization. This means
getting more users to play, play more often and longer, and pay. First, analysts use Datameer to identify common
characteristics of users. As a result, gaming companies can target these users better with the right advertising
placement and content. To increase retention, analysts use Datameer to understand what gets a user to play
longer. A user who plays longer and interacts with other players makes the overall gaming experience better. To
increase monetization, analysts use Datameer to identify the group of users most likely to pay based on common
characteristics. As a result of this analysis the company was able to double their revenue to over $100M.
User Profile
Revenue
2x
Increase revenue by 2x
PREDICTIVE SUPPORT
Identify operational failures and address them before they are reported
A couple of hours of downtime in a store or production environment means lost revenue, sometimes in the
millions of dollars. The clues to where downtime may occur are spread across devices around a store or facility
including WLAN controllers, mobile devices, routers and firewall devices. For this customer, these devices are
used to run operations such as tracking inventory. Each network device generates enormous amounts of
machine-generated data. By using Datameer to analyze all network data, this company was able to detect
potential network failures faster. As a result, they were able to reduce the number of network failures by 30%.
Store X
Store Y
Store Z
Failure
30%
VALIDATION CRITERIA
Historical
Inventory
Pricing
Transaction
12 weeks to 3 days
Log Files
Firewall
Feeds
Predict
threat
FRAUD DETECTION
Identify potential fraud
Credit card fraud has changed. Instead of stealing a credit card and using it to buy big ticket items, some credit
card thieves have become more sophisticated. For example, they can now make numerous small transactions
that are seemingly benign. But if Joe is making 100 $5 margarita transactions at various locations, something is
wrong. By analyzing point of sale, geolocation, authorization, and transaction data with Datameer, this financial
customer was able to identify fraud patterns in historical data. This analysis helped the firm identify $2B in fraud.
By applying the fraud model to new transactions, the company was able to identify potential fraud and proactively
notify customers.
Point of Sale
Geo-location
Authorization
Transaction
Prevent
fraud
KEY TAKEAWAYS
1. Have you defined your decision criteria?
(See Step 1: Define Decision Criteria)
2. Have you defined and identified your big data use cases?
(See Step 2: Agree on Use Cases)
3. Have you qualified the solutions so that you have only one or two solutions to validate?
(See Step 3: Qualify Solutions)
4. Have you validated the solution and created the ROI/TCO analysis to compare solutions?
(See Step 4: Validate Solution)