

com USA: +1 772 777 1557 & UK: +44 702 409 4077


Introduction to Hadoop
What Is a Distributed File System?
Problems with Traditional Large-Scale Systems
Introduction to Hadoop
Brief history of Hadoop
RDBMS/SQL vs. Hadoop
Data Warehouse (DWH) vs. Hadoop
Scaling with Hadoop
Introduction to the Hadoop Ecosystem
Business Use Cases in the Health Care and Banking Industries

Assignment 1
Hadoop 2.0 Cluster Setup
Hadoop Installation & Configuration
Setting Up a Standalone System
Setting Up a Pseudo-Distributed Cluster
Installing Hadoop in Pseudo-Distributed Mode; Understanding the Important Configuration Files, Their Properties, and the Daemon Processes
Hadoop Daemon Addresses and Ports, Other Hadoop Properties
SSH Configuration
Basic Unix/Linux Commands Hands-On

Assignment 2
HDFS Deep Dive
Significance of HDFS in Hadoop
Features of HDFS
HDFS Architecture
Daemons of Hadoop
Name Node and its functionality
Data Node and its functionality
Secondary Name Node and its functionality
Hadoop 2.0 New Features: Name Node High Availability
HDFS Federation
Resource Manager
Node Manager
App Manager
Name Space
Block Pool
Job Tracker and its functionality
Task Tracker and its functionality
Data Flow (Anatomy of a File Read, Anatomy of a File Write, Coherency Model)
Heartbeats, Data Node commissioning/decommissioning
Rack Awareness, Block Scanner, Balancer, Trash, Health Check
Exploring the HDFS Web UI
Parallel Copying with DistCp
Hadoop Archives

Hands-On with Hadoop Commands in a Live Environment

Assignment 3
Map Reduce (YARN)
The Map Reduce Flow
Hadoop Data Types
Functional Concepts: Mappers and Reducers
Basic Map Reduce API Concepts
Writing Map Reduce Drivers, Mappers and Reducers in Java
The Execution Framework
Shuffle and Sort
Speculative Execution
Speeding Up Hadoop Development by Using Eclipse
Hands-On Exercise: Writing a Map Reduce Program
Differences between the Old and New Map Reduce APIs
Exploring the Map Reduce Web UI
Creating Input and Output Formats in Map Reduce Jobs
Text Input Format
Key Value Input Format
Sequence File Input Format
Debugging Map Reduce Jobs in Local and Pseudo-Distributed Mode
Output Formats (Text Output, Binary Output, Multiple Outputs)
Joining Data sets in Map Reduce
Delving Deeper Into the Hadoop API
More Advanced Map Reduce Programming
Error Handling, Tuning
Advanced Map Reduce
Fair and Capacity Schedulers
Programming in YARN
Running MRv1 in YARN
Upgrading Existing Code to MRv2
Advanced Map Reduce Programming and Error Handling

Assignment 4
Pigs Eat Anything
What Is Pig?
Pig Use Cases
How Pig Works
Installing and Configuring Pig
Pig Latin and the Grunt shell
Modes of Execution in Pig
Local Mode
Map Reduce (Distributed) Mode
Loading data
Data types and schemas
Pig Latin details: structure, functions, expressions, relational operators

Introduction to User-Defined Functions and Scripts
Writing Pig Scripts
Advanced Pig Latin, Eval and Filter Functions, Pig and the Ecosystem
Real-World Use Cases in the Health Care Industry
Hands on Exercise: Using Pig for ETL Processing

Assignment 5
Hive for Structured Data
Hive Introduction
Hive Architecture
Hive Meta Store
Comparison with Traditional Databases (Schema on Read versus Schema on Write, Updates, Transactions, and Indexes)
Hive Schema and Data Storage
Hive Setup and Configuration
Hive vs Pig
HiveQL and Hive Shell
Creating Hive Tables
Loading Data into Hive
Retrieving Data with the SELECT Command
Joining Tables
Storing Query Results in HDFS
Partitioning Data
Bucketing Data
Hive Variables
The Hive CLI
Hive and Thrift
Hive Transform
Hands-On Exercises: Querying Large Datasets Extensively
Debugging and Troubleshooting Hive
User-Defined Functions
Appending Data to an Existing Hive Table
Custom Map/Reduce in Hive
Overview of Text Processing
Important String Functions
Using Regular Expressions in Hive
Sentiment Analysis and N-Grams
Hands-On Exercise

Assignment 6
Real-time I/O with HBase
HBase Introduction
HBase Architecture
HBase versions and origins
HBase vs. RDBMS
HBase Master and Region Servers
Data Modeling
Column Families and Regions

Bloom Filters and Block Indexes
Write Pipeline / Read Pipeline
Catalog Tables
The HBase Shell
Running the Shell
Creating the Tables
Accessing Data in Tables
HBase Administration
Managed Operations
Capacity Planning
Map Reduce Integration

Assignment 7
Introduction to ETL Concepts
Introduction to Sqoop
Setup and Configuration of Sqoop
MySQL Client and Server Installation
Connecting to a Relational Database Using Sqoop
Sqoop Import
Connecting to a Database Server
Selecting the Data to Import
Free-form Query Imports
Controlling Parallelism
Controlling the Import Process
Controlling Type Mapping
Incremental Imports
File Formats
Importing Data into Hive
Importing Data into HBase
Hands-On Exercise
Working with Imported Data
Importing Large Objects
Sqoop Export
Inserts vs Updates
Exports and Transactions
Hands-On Exercise

Assignment 8
What is Flume?
Setup and Configuration of Flume
Flume Architecture
How It Works

Assignment 9
The Zookeeper Service (Data Model, Operations, Implementation, Consistency, Sessions, States)
Building Applications with Zookeeper (Zookeeper in Production)

Assignment 10


Health Care Project: A dataset covering a health care system over a period of time, which you use to analyze
member policy logins, provider services, methadone treatment abstracts, early dropout abstracts, and
payment processing to providers and agents.
Additional Features
Cloudera Hadoop Developer/Admin Certification Guidance
Hadoop Installation and Configuration Process
Comprehensive Materials Covering the Hadoop Ecosystem, UNIX, and Java
Separate Java and UNIX Training for Beginners
24x7 Support