
Big Data Overview & Hadoop for DBAs

Satyendra Pasalapudi
Associate Practice Director
Apps Associates LLC

© Copyright 2016. Apps Associates LLC. 1


About Me
Satyendra Kumar Pasalapudi
Associate Practice Director – Infrastructure/Cloud Practice at Apps Associates
Co-Founder & President of All India Oracle Users Group (AIOUG)

@pasalapudi



www.ora-search.com



History of Data Management Systems

[Figure: timeline of data management technologies by decade, 1940-50 through 2000-2010]
Pre-computer technologies: printing press, Dewey decimal system, punched cards
Storage and access: magnetic tape, “flat” (sequential) files, magnetic disk, Indexed-Sequential Access Mechanism (ISAM)
Data models: hierarchical model, network model, relational model defined
Systems shown on the timeline: IMS, IDMS, ADABAS, System R, Oracle V2, Ingres, DB2, dBase, Informix, Sybase, SQL Server, Access, Postgres, MySQL, Hadoop, Cassandra, Vertica, HBase, Dynamo, MongoDB, Redis, VoltDB, Neo4J, Riak, Aerospike, Hana


Advantages of Cloud



Generational Change for Enterprise (IT)
• Cloud supports mission critical workloads
– 87% of enterprises use cloud for mission critical applications
• Cloud use in the enterprise continues to grow
– Half of enterprises say they will use cloud for at least 75% of their workloads by 2018
• No one cloud fits all
– More than half (53%) of enterprises use two to four cloud providers
Source: Verizon 2016 State of the Market: Enterprise Cloud report


Cloud – Probable to Inevitable
• GE is undergoing the most important transformation in its 140-year history
– 9000 applications to AWS, and down to 4000 applications
– 300 ERPs (two years back) to a more manageable number
– 34 data centers to 4 data centers
• By 2020: US$15b of software revenue
• Changes
– Service Management
– Network Perimeter
– Risk Based Security Controls
– Self Service and Automation
– Financial Transparency
– People: reduce outsourcing
– Technology: build approach for things that matter
– 20% of applications in cloud as of today
– 70% of applications in cloud by 2020
Source: AWS 2015 Keynote – Oct 6 2015; OOW Keynote with Mark Hurd – Oct 26 2015


What is Cloud



The Role of Data is Changing
Until now, the questions you asked drove the data model
The new model is to collect as much data as possible – “Data-First Philosophy”

Data is the new raw material for any business, on par with capital, people, labor


Characteristics of Big Data



Big Data Challenge
Cost-effectively manage and analyze all available data in its native form: unstructured, structured, streaming
Data sources: Website, Social Media, Billing, ERP, RFID, Network Switches, CRM

Hybrid Cloud Framework

[Figure: hybrid cloud framework covering HR, FIN, SCOM, SALES, PLANNING, DW / BI, PROCUREMENT]


Big data Eco System



Not Easy to Get Analytic Value at Fast Enough Pace

Data Uncertainty: 80% of effort is typically spent on evaluating and preparing data
• Not familiar and overwhelming
• Potential value not obvious
• Requires significant manipulation

Tool Complexity: overly dependent on scarce and highly skilled resources
• Early Hadoop tools only for experts
• Existing BI tools not designed for Hadoop
• Emerging solutions lack broad capabilities
Source: Oracle


Key Challenges in Managing Big Data

Addressed by Oracle Big Data Discovery

Informatica Study May 2013



Sample of Big Data Use Cases Today
AUTOMOTIVE: Auto sensors reporting location, problems
COMMUNICATIONS: Location-based advertising
RETAIL / CPG: Sentiment analysis, Hot products, Optimized Marketing
FINANCIAL SERVICES: Risk & portfolio analysis, New products
EDUCATION & RESEARCH: Experiment sensor analysis
HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality, Warranty analysis
LIFE SCIENCES: Clinical trials, Genomics
MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness, Cross Sell
ON-LINE SERVICES / SOCIAL MEDIA: People & career matching, Web-site optimization
HEALTH CARE: Patient sensors, monitoring, EHRs, Quality of care
OIL & GAS: Drilling exploration sensor analysis
GAMES: Adjust to player behavior, In-Game Ads
TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows, Customer sentiment
UTILITIES: Smart Meter analysis for network capacity
LAW ENFORCEMENT & DEFENSE: Threat analysis - social media monitoring, photo analysis

What is the main difference in this data? Volume, Velocity, Variety
These Characteristics Challenge Your Existing Architecture


Big Data Verticals

Media/Advertising: Targeted Advertising, Image and Video Processing
Oil & Gas: Seismic Analysis
Retail: Recommend, Transactions Analysis
Life Sciences: Genome Analysis
Financial Services: Monte Carlo Simulations, Risk Analysis
Security: Anti-virus, Fraud Detection, Image Recognition
Social Network/Gaming: User Demographics, Usage analysis, In-game metrics


Sample Enterprise Big Data Architecture
[Figure: components of a sample enterprise big data architecture]
• Web Server
• Web DBMS (MySQL, Mongo, Cassandra)
• Operational RDBMS (Oracle, SQL Server, …)
• ERP & in-house CRM
• Hadoop
• In-memory processing (Spark)
• Data Warehouse RDBMS (Oracle, Teradata …)
• In-memory Analytics (HANA, Exalytics …)
• Analytic/BI software (SAS, Tableau)


Enterprise Data Hub / Data Lake / Data Reservoir



We Need Tools Built Specifically for Big Data
Hadoop and its Ecosystem
• Scale out Easily
• Parallel Computing
• Commodity Hardware
• Solves some Problems
• Complex to Run
• Special Skills to Maintain
Cassandra



ETL for Unstructured Data



ETL for Structured Data



Hadoop Design Principles
• System shall manage and heal itself
– Automatically and transparently route around failure
– Speculatively execute redundant tasks if certain nodes are detected to be slow
• Performance shall scale linearly
– Proportional change in capacity with resource change
• Compute should move to data
– Lower latency, lower bandwidth
• Simple core, modular and extensible



Hadoop History
• Dec 2004 – Google MapReduce paper published
• July 2005 – Nutch uses MapReduce
• Feb 2006 – Starts as a Lucene subproject
• Apr 2007 – Yahoo! on 1000-node cluster
• Jan 2008 – An Apache Top Level Project
• Jul 2008 – A 4000 node test cluster
• May 2009 – Hadoop sorts Petabyte in 17 hours



Google Software Architecture (circa 2005)
Google Applications
Map Reduce | BigTable
Google File System (GFS)

Map Reduce
[Figure: MapReduce data flow. Start, then many Map tasks running in parallel, then Reduce]
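The Map Reduce flow named on this slide can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle and reduce stages on a word count; the function names and the sample input are assumptions made for the sketch, and nothing here uses the Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped on a different worker; here it runs in one loop.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big cluster", "data node"])
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the shuffle moves data across the network before the reduce step.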
Hadoop Ecosystem
Hue
Client Access: Hive (SQL), Pig (PL/SQL)
Data Access: Sqoop, Flume
Data Mining: Mahout
Orchestration: Oozie
Networking
Chukwa (Monitoring)
ZooKeeper (Coordination)
MapReduce (Job Scheduling/Execution System)
(Streaming/Pipes APIs)
HBase (key-value store)
HDFS (Hadoop Distributed File System)
Java Virtual Machine
OS – Redhat, Suse, Ubuntu, Windows
Commodity Hardware

Hadoop – Simplified View

Controller Worker Nodes

• MPP (Massively Parallel) hardware running database-like software


• “Data” is stored in parts, across multiple worker nodes
• “Work” operates in parallel, on the different parts of the table
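
As a rough illustration of the two bullets above, the sketch below splits a list of rows into parts and sums each part in parallel before combining the partial results. Threads stand in for worker nodes here, and `partition` and `parallel_sum` are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, workers):
    """Store the data in parts: one slice of the rows per worker."""
    return [rows[i::workers] for i in range(workers)]


def parallel_sum(rows, workers=4):
    """Run the same work on every part in parallel, then combine the results."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, parts))
    return sum(partials)


total = parallel_sum(list(range(1, 101)))
```

The controller only sees the small partial results, which is the same reason Hadoop prefers to move compute to the data.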



HDFS Architecture



HDFS Architecture

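[Figure: HDFS architecture. A client sends metadata ops to the Namenode, which holds the metadata (name, replicas, ..; for example /home/foo/data,6. ..). Clients read blocks from and write blocks to Datanodes on Rack1 and Rack2. The Namenode issues block ops to the Datanodes, and blocks are replicated across racks.]

HDFS – Highly Available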

Head Node Data 1 Data 2 Data 3 Data 4


MYFILE.TXT
..block1 -> block1
..block2 -> block2
..block3 -> block3



Namenode and Datanodes

• Master/slave architecture
• An HDFS cluster consists of a single Namenode, a master server that manages the file system namespace and regulates access to files by clients.
• There are a number of DataNodes, usually one per node in the cluster.
• The DataNodes manage storage attached to the nodes that they run on.
• HDFS exposes a file system namespace and allows user data to be stored in files.
• A file is split into one or more blocks, and these blocks are stored in DataNodes.
• DataNodes serve read and write requests and perform block creation, deletion, and replication upon instruction from the Namenode.
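
The split of responsibilities described above can be pictured with a toy namespace: the Namenode only keeps metadata (which blocks make up a file, and which DataNodes hold each block), while the data itself lives on the DataNodes. `ToyNameNode` and the node names are assumptions made for this sketch, not HDFS classes.

```python
class ToyNameNode:
    """Keeps metadata only: file -> blocks, and block -> DataNodes holding it."""

    def __init__(self):
        self.file_blocks = {}      # path -> [block_id, ...] in file order
        self.block_locations = {}  # block_id -> [datanode_name, ...]

    def add_file(self, path, placements):
        """placements: list of (block_id, [datanode, ...]) in file order."""
        self.file_blocks[path] = [block_id for block_id, _ in placements]
        for block_id, datanodes in placements:
            self.block_locations[block_id] = list(datanodes)

    def locate(self, path):
        """Answer a client's metadata request; the client then reads from DataNodes."""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


namenode = ToyNameNode()
namenode.add_file("MYFILE.TXT", [
    ("block1", ["data1", "data2", "data3"]),
    ("block2", ["data2", "data3", "data4"]),
    ("block3", ["data1", "data3", "data4"]),
])
locations = namenode.locate("MYFILE.TXT")
```

A client that asks for MYFILE.TXT gets back the ordered block list with candidate DataNodes and then contacts those nodes directly for the bytes.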



Hadoop 1 – Job & Task Trackers
Master Node - The majority of Hadoop deployments consist of several master node
instances. Having more than one master node helps eliminate the risk of a single
point of failure.

NameNode - These processes are charged with storing a directory tree of all files
in the Hadoop Distributed File System (HDFS). They also keep track of where the
file data is kept within the cluster. Client applications contact NameNodes when
they need to locate a file, or add, copy, or delete a file.

DataNodes - The DataNode stores data in HDFS and is responsible for
replicating data across the cluster. DataNodes interact with client applications once
the NameNode has supplied the DataNode's address.

WorkerNode - Unlike master nodes, whose numbers we can count on one hand, a
representative Hadoop deployment consists of dozens or hundreds of worker
nodes, which provide enough processing power to analyze from a few hundred
terabytes all the way up to one petabyte. Each worker node includes
a DataNode as well as a Task Tracker.
Map Reduce

Job Tracker / MapReduce Workload Management Layer - This process is assigned
to interact with client applications. It is responsible for distributing
MapReduce tasks to particular nodes within a cluster. This engine coordinates
all aspects of Hadoop, such as scheduling and launching jobs.

Task Tracker - This is a process in the cluster that is capable of receiving
tasks (including Map, Reduce, and Shuffle) from a Job Tracker.
Data Replication Similar to that of ASM

• HDFS is designed to store very large files across machines in a large cluster.
• Each file is a sequence of blocks.
• All blocks in the file except the last are of the same size.
• Blocks are replicated for fault tolerance.
• Block size and replication factor are configurable per file.
• The Namenode receives a Heartbeat and a BlockReport from each DataNode in the cluster.
• A BlockReport lists all the blocks on a DataNode.

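One way to see why the Namenode collects BlockReports: by counting how many DataNodes report each block, it can spot blocks that have fewer replicas than the replication factor. The sketch below is a simplified illustration; the function name, node names and block ids are made up for the example.

```python
from collections import Counter


def under_replicated(block_reports, replication_factor=3):
    """Return {block_id: live_replica_count} for blocks below the target.

    block_reports maps a DataNode name to the list of blocks it reported.
    """
    counts = Counter(
        block for blocks in block_reports.values() for block in set(blocks)
    )
    return {b: n for b, n in counts.items() if n < replication_factor}


reports = {
    "datanode1": ["blk_1", "blk_2"],
    "datanode2": ["blk_1", "blk_2"],
    "datanode3": ["blk_1"],
}
missing = under_replicated(reports)
```

Here `blk_2` has only two live replicas, so a Namenode following this logic would schedule one more copy on another DataNode.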


Replica Placement & Rack Aware
• The placement of the replicas is critical to HDFS reliability and performance.
• Optimizing replica placement distinguishes HDFS from other distributed file systems.
• Rack-aware replica placement:
– Goal: improve reliability, availability and network bandwidth utilization
– Many racks; communication between racks goes through switches.
– Network bandwidth between machines on the same rack is greater than between machines in different racks.
– The Namenode determines the rack id for each DataNode.
• Replicas are typically placed on unique racks
– Simple but non-optimal
– Writes are expensive
• Replication factor is 3
– Replicas are placed: one on a node in the local rack, one on a different node in the local rack, and
one on a node in a different rack.
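
The placement rule on this slide (replication factor 3: local rack, a different node in the local rack, a node in a different rack) can be sketched as below. It follows the slide's wording rather than any particular HDFS release, takes the writer's own node as the first replica (an assumption), and assumes every rack has at least two nodes and that there are at least two racks.

```python
def place_replicas(writer, topology):
    """Pick three DataNodes for a block written from `writer`.

    topology maps a rack id to the list of DataNode names on that rack.
    """
    local_rack = next(rack for rack, nodes in topology.items() if writer in nodes)
    first = writer                                                 # local rack
    second = next(n for n in topology[local_rack] if n != writer)  # same rack, other node
    remote_rack = next(rack for rack in topology if rack != local_rack)
    third = topology[remote_rack][0]                               # different rack
    return [first, second, third]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
replicas = place_replicas("dn1", topology)
```

Two of the three copies stay on the writer's rack, so only one copy crosses the slower inter-rack switches, while the third copy still survives the loss of a whole rack.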



Replica Selection

• Replica selection for READ operation: HDFS tries to minimize the bandwidth
consumption and latency.
• If there is a replica on the Reader node then that is preferred.
• HDFS cluster may span multiple data centers: replica in the local data center
is preferred over the remote one.
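
A minimal sketch of this preference order, assuming each location is described by a hypothetical (data center, rack, node) tuple: the reader picks the replica with the smallest distance.

```python
def distance(reader, replica):
    """Smaller is closer: 0 same node, 1 same rack, 2 same data center, 3 remote."""
    if reader == replica:
        return 0
    if reader[:2] == replica[:2]:
        return 1
    if reader[0] == replica[0]:
        return 2
    return 3


def choose_replica(reader, replicas):
    """Return the replica location that is cheapest to read from."""
    return min(replicas, key=lambda replica: distance(reader, replica))


reader = ("dc1", "rack1", "dn1")
replicas = [
    ("dc2", "rack9", "dn7"),
    ("dc1", "rack2", "dn4"),
    ("dc1", "rack1", "dn2"),
]
best = choose_replica(reader, replicas)
```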



Hadoop Components

• Hadoop is bundled with two independent components


– HDFS (Hadoop Distributed File System)
• Designed for scaling in terms of storage and IO bandwidth
– MR framework (MapReduce)
• Designed for scaling in terms of performance



Understanding file structure

[Figure: a 1 GB file is split into blocks; each block is typically 64MB]
Each block is stored as two files – one holding the data and a second holding
metadata (checksum)
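The arithmetic behind the 1 GB example is easy to check. The helper below is a hypothetical sketch that returns the block sizes for a given file size and block size, using the slide's 64MB figure.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def split_into_blocks(file_size, block_size=64 * MB):
    """Return the sizes of the blocks that a file of file_size bytes occupies."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


one_gb_blocks = split_into_blocks(1 * GB)   # 16 full blocks of 64MB
odd_blocks = split_into_blocks(200 * MB)    # 3 full blocks plus one 8MB block
```

Only the last block can be smaller than the block size, and a partially filled block takes only the space it needs.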
Hadoop Processes
• Processes running on Hadoop
– NameNode
– DataNode
– Secondary NameNode
– Task Tracker
– Job Tracker



NameNode

• Single point of contact
• HDFS master
• Holds meta information
– List of files and directories
– Location of blocks
• Single node per cluster
– Cluster can have thousands of DataNodes and tens
of thousands of HDFS clients.



DataNode
• Can execute multiple tasks concurrently
• Holds actual data blocks, checksum and generation stamp
• If a block is half full, it needs only half of the space of a full block
• At start-up, connects to the NameNode and performs a handshake
• No binding to IP address or port; uses a Storage ID (for example, Storage ID: XYZ001)
• Sends heartbeat to NameNode



Communication
Heartbeat (DataNode to NameNode): every 3 seconds, “I AM ALIVE”
• Total Storage Capacity
• Fraction of storage in use
• No of data transfers currently in progress

Reply (NameNode to DataNode): instructs the DataNode to
• Replicate block to other node
• Remove local block replica
• Send immediate block report
• Shut down the node

No heartbeat for 10 minutes: the NameNode considers the DataNode out of service.

[Figure: one NameNode exchanging heartbeats and replies with DataNodes with Storage IDs XYZ001, XYZ002 and XYZ003]
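The heartbeat timing on this slide can be sketched as a small bookkeeping class on the NameNode side. The class and method names are assumptions for illustration; the 3 second interval and the 10 minute timeout come from the slide.

```python
HEARTBEAT_INTERVAL = 3   # seconds between heartbeats, from the slide
DEAD_AFTER = 10 * 60     # seconds without a heartbeat, from the slide


class HeartbeatMonitor:
    """Tracks the last heartbeat time per DataNode Storage ID."""

    def __init__(self):
        self.last_seen = {}

    def heartbeat(self, storage_id, now):
        """Record an "I AM ALIVE" message received at time `now` (seconds)."""
        self.last_seen[storage_id] = now

    def dead_nodes(self, now):
        """Storage IDs that have been silent for DEAD_AFTER seconds or more."""
        return sorted(
            sid for sid, seen in self.last_seen.items() if now - seen >= DEAD_AFTER
        )


monitor = HeartbeatMonitor()
monitor.heartbeat("XYZ001", now=0)
monitor.heartbeat("XYZ002", now=0)
monitor.heartbeat("XYZ002", now=300)
silent = monitor.dead_nodes(now=600)
```

Once a node is reported here, the blocks it held become candidates for re-replication on the remaining DataNodes.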
Coordination in a distributed system
• Coordination: An act that multiple nodes must perform together.
• Examples:
– Group membership
– Locking
– Publisher/Subscriber
– Leader Election
– Synchronization
• Getting node coordination correct is very hard!
Introducing ZooKeeper

ZooKeeper allows distributed processes to coordinate with each other through a
shared hierarchical name space of data registers.
- ZooKeeper Wiki

ZooKeeper is much more than a distributed lock server!
What is ZooKeeper?
• An open source, high-performance coordination service for
distributed applications.
• Exposes common services in simple interface:
– naming
– configuration management
– locks & synchronization
– group services
… developers don't have to write them from scratch
• Build your own on it for specific needs.
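To make one of these services concrete, here is an in-memory sketch of leader election in the style of the well-known ZooKeeper recipe: every process joins with an increasing sequence number, and the member with the lowest number is the leader. `ToyCoordinator` is a stand-in written for this example; it is not the ZooKeeper client API and does no networking.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service (not the ZooKeeper API)."""

    def __init__(self):
        self._next_seq = 0
        self._members = {}  # sequence number -> process name

    def join(self, name):
        """Register a process and hand back its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._members[seq] = name
        return seq

    def leave(self, seq):
        """Remove a process, as would happen when its session ends."""
        self._members.pop(seq, None)

    def leader(self):
        """The member with the lowest sequence number leads the group."""
        return self._members[min(self._members)] if self._members else None


group = ToyCoordinator()
a = group.join("worker-a")
b = group.join("worker-b")
first_leader = group.leader()
group.leave(a)
second_leader = group.leader()
```

The same membership table also answers group membership queries, which is why one coordination service can back several of the services listed above.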
HDFS Distributions



Real Time BI

• Speed, agility, and intelligence are competitive advantages that nearly all
organizations seek.

• Existing Traditional Reporting Systems provide information after 24 – 36 hours.

• To support Operational Users and influence what should happen next, the data
should be available in real time to know what is happening now.



Hadoop 2.0



MR-279: YARN
Hadoop 2 & YARN
[Figure: timeline 2006, 2009, October 23, 2013]
Hadoop w/ MapReduce: largely batch processing. MapReduce runs directly on HDFS (Hadoop Distributed File System) across nodes 1 ... N.
– Silo’d clusters
– Largely batch system
– Difficult to integrate
Hadoop 2 & YARN based architecture: batch (MapReduce), interactive and real-time workloads run on YARN, the data operating system, over HDFS across nodes 1 ... N.
– Enabled the modern data architecture

Hadoop 2.0

Multi Use Data Platform
Batch, Interactive, Realtime, Online, Streaming, …

HADOOP 2 (layers, top to bottom)
Standard Query Processing (Hive) | Online Data Processing | Real Time Stream Processing | Others
Batch (MapReduce) | Interactive (Tez)
Efficient Cluster Resource Management & Shared Services (YARN)
Redundant, Reliable Storage (HDFS)


Hadoop 2.0 with YARN



Resource Manager/Node Manager Components



Problems with this approach in Hadoop 1.0
• It limits scalability: the JobTracker runs on a single machine doing several tasks, such as
1) Resource management
2) Job and task scheduling and
3) Monitoring
Although there are many machines (DataNodes) available, they are not used for
this work. This limits scalability.
• Availability issue: in Hadoop 1.0, the JobTracker is a single point of failure.
This means that if the JobTracker fails, all jobs must restart.
• Distinct map slots and reduce slots
• Limitation in running non-MapReduce applications



Yarn Architecture
• Resource Manager: arbitrates the division of resources among all the
applications in the system. The Resource Manager has a pluggable
scheduler component, which is responsible for allocating resources
to the various running applications.
• Node Manager: a per-machine slave, running on the slave nodes, that is
responsible for launching the applications’ containers, monitoring
their resource usage (CPU, memory, disk, network), and reporting
the same to the Resource Manager.
• Application Master: negotiates appropriate resource containers from the
Scheduler, tracks their status and monitors progress.
• Container: unit of allocation incorporating resource elements
such as memory, CPU, disk, network etc., to execute a specific task of
the application (similar to map/reduce slots in MRv1).



Yarn - Execution Sequence
1) A client program submits the application.
2) The ResourceManager allocates a specified container to start the
ApplicationMaster.
3) The ApplicationMaster, on boot-up, registers with the
ResourceManager.
4) The ApplicationMaster negotiates with the ResourceManager for
appropriate resource containers.
5) On successful container allocations, the ApplicationMaster
contacts the NodeManager to launch the container.
6) Application code is executed within the container, and the
execution status is then reported back to the ApplicationMaster.
7) During execution, the client communicates directly with the
ApplicationMaster or ResourceManager to get status, progress
updates etc.
8) Once the application is complete, the ApplicationMaster
unregisters with the ResourceManager and shuts down, allowing
its own container to be repurposed.
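
The sequence above can be walked through with a toy simulation. The classes and names below are hypothetical and heavily simplified (containers are just counters and tasks are plain Python callables); step 7, the client polling for status, is left out.

```python
class ToyResourceManager:
    """Hands out containers from a fixed pool and tracks registered applications."""

    def __init__(self, total_containers):
        self.free = total_containers
        self.registered = set()

    def allocate(self, wanted):
        granted = min(wanted, self.free)
        self.free -= granted
        return granted

    def release(self, count):
        self.free += count


def run_application(rm, app_id, tasks):
    """Follow steps 1-6 and 8 for one application; return results and an event log."""
    log = [f"client submits {app_id}"]                   # step 1
    rm.allocate(1)                                       # step 2: container for the AM
    log.append("RM allocates container for ApplicationMaster")
    rm.registered.add(app_id)                            # step 3
    log.append("AM registers with RM")
    granted = rm.allocate(len(tasks))                    # step 4
    log.append(f"AM negotiates {granted} containers")
    results = [task() for task in tasks[:granted]]       # steps 5 and 6
    log.append(f"NodeManagers ran {len(results)} containers")
    rm.release(granted)
    rm.registered.discard(app_id)                        # step 8
    rm.release(1)
    log.append("AM unregisters and its container is released")
    return results, log


rm = ToyResourceManager(total_containers=4)
results, log = run_application(rm, "app-1", [lambda: 1 + 1, lambda: 2 * 3])
```

After the run every container is back in the pool, which is the point of step 8: the ApplicationMaster's own container becomes available to other applications.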



Operational vs. Analytical Databases



A New Technology



No Means Yes!
Use Cases



Brewer's CAP Theorem



Brewer's CAP Theorem



NoSQL Technology Spectrum



BigTable Data Model
Relational lookup tables:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

Counters by name and site:

Name  Site             Counter
Dick  Ebay             507,018
Dick  Google           690,414
Jane  Google           716,426
Dick  Facebook         723,649
Jane  Facebook         643,261
Jane  ILoveLarry.com   856,767
Dick  MadBillFans.com  675,230

The same counters in normalized relational form:

NameId  SiteId  Counter
1       1       507,018
1       2       690,414
2       2       716,426
1       3       723,649
2       3       643,261
2       4       856,767
1       5       675,230

BigTable form (one row per name, one column per site visited):

Id  Name  Ebay     Google   Facebook  (other columns)  MadBillFans.com
1   Dick  507,018  690,414  723,649   . . .            675,230

Id  Name  Google   Facebook  (other columns)  ILoveLarry.com
2   Jane  716,426  643,261   . . .            856,767
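The step from the relational rows to the BigTable-style rows can be shown with nested dictionaries: each row key maps to its own, possibly different, set of columns. This is only a sketch of the data model using the slide's sample values; `to_wide_rows` is a name made up for the example.

```python
counters = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot (name, site, counter) rows into one sparse wide row per name."""
    wide = {}
    for name, site, counter in rows:
        wide.setdefault(name, {})[site] = counter
    return wide


wide_rows = to_wide_rows(counters)
```

Jane's row simply has no "Ebay" column; nothing is stored for it, which is what makes very wide, sparse rows practical in this model.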
Document databases

• Structured documents – XML and JSON (JavaScript Object Notation) become more
prevalent within applications
• Web programmers start storing these in BLOBS in MySQL
• Emergence of XML and JSON databases


[Figure: NoSQL database families]
Key Value: MemcacheDB, Oracle NoSQL, Voldemort, Dynamo, DynamoDB, Riak
Document (JSON based): MongoDB, CouchDB, RethinkDB
Document (XML based): MarkLogic, BerkeleyDB XML
Table Based: Cassandra, HBase, BigTable, HyperTable, Accumulo
Graph Database: Neo4J, Infinite Graph, FlockDB
Multiple Data Stores

Hadoop – Change the Business
• Scale-out, low cost store
• Collect any data
• Map-reduce, SQL
• Analytic applications

NoSQL – Scale the Business
• Scale-out, low cost store
• Collect key-value data
• Find data by key
• Web applications

Relational – Run the Business
• Scale-out and scale-up
• Collect any data
• SQL
• Transactional and analytic applications for the enterprise
• Secure and highly available


Data Analytics Challenge
Separate silos of information to analyze



Data Analytics Challenge
Separate data access interfaces



SQL on Hadoop is Obvious

Stinger



Data Analytics Challenge
No comprehensive SQL interface across Oracle, Hadoop and NoSQL



Oracle Big Data Management System
Rich, comprehensive SQL access to all enterprise data

NoSQL



What Does Unified Query Mean for You?
Data Science
Before After

PhD Anyone

???



What Does Unified Query Mean for You?
Application Development
Before After



A New Hadoop Processing Engine
Processing Layer: MapReduce and Hive, Spark, Impala, Search, Big Data SQL
Resource Management (YARN)
Storage Layer: Filesystem (HDFS), NoSQL Databases (Oracle NoSQL DB, HBase)


Big Data SQL

SELECT w.sess_id, c.name
FROM web_logs w, customers c
WHERE w.source_country = 'Brazil'
AND w.cust_id = c.customer_id;

[Figure: the relevant SQL runs on the BDA nodes of the Hadoop cluster, where Big Data SQL scans tens of gigabytes of WEB_LOGS data. Only the columns and rows needed to answer the query are returned to the Oracle Database, which holds CUSTOMERS.]
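The idea that only the needed columns and rows leave the Hadoop side can be sketched in plain Python. The rows, customer names and the `scan_with_pushdown` helper are invented for this illustration; this is not how Oracle Big Data SQL is implemented, only the shape of the technique.

```python
def scan_with_pushdown(rows, predicate, columns):
    """Filter and project where the data lives; yield only what the query needs."""
    for row in rows:
        if predicate(row):
            yield {column: row[column] for column in columns}


# Hypothetical contents of WEB_LOGS on the Hadoop cluster.
web_logs = [
    {"sess_id": 1, "cust_id": 10, "source_country": "Brazil", "url": "/a"},
    {"sess_id": 2, "cust_id": 11, "source_country": "India", "url": "/b"},
    {"sess_id": 3, "cust_id": 10, "source_country": "Brazil", "url": "/c"},
]
# Hypothetical contents of CUSTOMERS in the database: customer_id -> name.
customers = {10: "Ana", 11: "Ravi"}

shipped = list(
    scan_with_pushdown(
        web_logs,
        predicate=lambda row: row["source_country"] == "Brazil",
        columns=["sess_id", "cust_id"],
    )
)
result = [(row["sess_id"], customers[row["cust_id"]]) for row in shipped]
```

Only two narrow rows cross from the scan to the join, instead of the whole log table.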


Big Data SQL

SQL Push Down in Big Data SQL
• Hadoop Scans on Unstructured Data
• WHERE Clause Evaluation
• Column Projection
• Bloom Filters for Better Join Performance
• JSON Parsing, Data Mining Model Evaluation
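A Bloom filter helps a join by letting the scanning side discard rows whose join key cannot match. The sketch below is a generic textbook Bloom filter with assumed parameters (1024 bits, 3 hash functions derived from SHA-256); it says nothing about the filter Oracle actually uses.

```python
import hashlib


class BloomFilter:
    """Compact set membership test: no false negatives, occasional false positives."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = True

    def might_contain(self, item):
        return all(self.bits[position] for position in self._positions(item))


# Build the filter from the join keys on the database side (hypothetical ids).
customer_filter = BloomFilter()
for customer_id in (10, 42):
    customer_filter.add(customer_id)

# Apply it during the scan so that most non-matching rows are never shipped.
log_cust_ids = [10, 11, 42, 99]
candidates = [cid for cid in log_cust_ids if customer_filter.might_contain(cid)]
```

Every real match survives the filter; a few non-matching keys may slip through as false positives and are removed by the actual join.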


Oracle Big Data SQL
Query All Data without Application Change or Data Conversion



High Level Architecture

Stages: INGEST, PROCESS, STORE, VISUALIZE, ANALYZE

Fast Pace Innovation

Dec 18th 2015

http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at



BDD Value Proposition

Note: company logos and images are for illustration purposes only. Not a real use case for the company.



Oracle BDD - Technical Innovation on Hadoop

Hadoop Cluster (BDA or Commodity Hardware)
– Nodes: BDD node, name node, data nodes
– Hadoop 2.x: Metadata (HCatalog), Workload Mgmt (YARN), Filesystem (HDFS)

Oracle Big Data Discovery Workloads
– Studio
• Web UI: Find, Explore, Transform, Discover, Share
– In-Memory Discovery Indexes
• DGraph: Search, Guided Navigation, Analytics
– Data Processing, Workflow & Monitoring
• Profiling: catalog entry creation, data type & language detection, schema configuration
• Sampling: dgraph (index) file creation
• Transforms: >100 functions
• Enrichments: location (geo), text (cleanup, sentiment, entity, key-phrase, whitelist tagging)
– Self-Service Provisioning & Data Transfer
• Personal Data: Upload CSV and XLS to HDFS

Other Hadoop Workloads
– MapReduce, Spark, Hive, Pig
– Oracle Big Data SQL (BDA only)




How to transition into a Cloud Consultant

Cloud Consultant = Core Skills (50%) + Cloud Knowledge (20%) + Tools & Automation (20%) + Integration (10%)


https://community.oracle.com/groups/aioug-social-group

Thank You!

Satyendra.pasalapudi@appsassociates.com


@pasalapudi
www.ora-search.com

