
Agenda

Need for a new processing platform (BigData)

Origin of Hadoop
What is Hadoop & what it is not?
Hadoop architecture
Hadoop components (Common/HDFS/MapReduce)
Hadoop ecosystem
When should we go for Hadoop?
Real world use cases

Questions

Need for a new processing platform (Big Data)

What is Big Data?
- Twitter (over ~7 TB/day)
- Facebook (over ~10 TB/day)
- Google (over ~20 PB/day)

Where does it come from?

Why take so much pain?

- Information everywhere, but where is the knowledge?
- Existing systems (vertical scalability)

Why Hadoop (horizontal scalability)?

Origin of Hadoop

Seminal whitepapers by Google in 2004 on a new programming paradigm to handle data at internet scale
Hadoop started as a part of the Nutch project
In Jan 2006 Doug Cutting started working on Hadoop at Yahoo
Factored out of Nutch in Feb 2006

First release of Apache Hadoop in September 2007


Jan 2008 - Hadoop became a top-level Apache project

Hadoop distributions

Amazon
Cloudera
MapR
HortonWorks
Microsoft Windows Azure
IBM InfoSphere BigInsights
Datameer
EMC Greenplum HD
Hadapt

What is Hadoop?

Flexible infrastructure for large-scale computation & data processing on a network of commodity hardware
Completely written in Java
Open source & distributed under the Apache license
Hadoop Common, HDFS & MapReduce

What Hadoop is not

A replacement for existing data warehouse systems
A file system
An online transaction processing (OLTP) system
A replacement for all programming logic
A database

Hadoop architecture

High-level view: NameNode (NN), DataNode (DN), JobTracker (JT), TaskTracker (TT)

HDFS (Hadoop Distributed File System)



Default storage for the Hadoop cluster
NameNode/DataNode
The file system namespace (similar to our local file system)

Master/slave architecture (1 master, 'n' slaves)

Virtual, not physical
Provides configurable replication (user-specified; see the sketch below)
Data is stored as chunks (64 MB by default, but configurable) across all the nodes
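To make the replication and block-size settings concrete, here is a minimal sketch using the HDFS FileSystem API; the class name, NameNode address, file path and per-file values are assumptions for a single-node setup, not values from the slides.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed single-node NameNode address; adjust for your cluster
        conf.set("fs.default.name", "hdfs://localhost:9000");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/sample.txt"); // hypothetical path

        // create(path, overwrite, bufferSize, replication, blockSize)
        FSDataOutputStream out = fs.create(file, true, 4096,
                (short) 2,           // keep 2 copies of this file's blocks
                64L * 1024 * 1024);  // 64 MB blocks, the Hadoop 1.x default
        out.writeUTF("hello hdfs");
        out.close();
        fs.close();
    }
}

Per-file values passed to create() override the cluster-wide dfs.replication and dfs.block.size defaults.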

HDFS architecture

Data replication in HDFS

Rack awareness

Typically, large Hadoop clusters are arranged in racks, and network traffic between nodes within the same rack is much more desirable (cheaper) than network traffic across racks. In addition, the NameNode tries to place replicas of a block on multiple racks for improved fault tolerance. A default installation assumes that all nodes belong to the same rack.

MapReduce

Framework provided by Hadoop to process large amounts of data in parallel across a cluster of machines
Comprises three classes: the Mapper class, the Reducer class and the Driver class

TaskTracker / JobTracker
The reducer phase starts only after all mappers are done
Takes (k, v) pairs and emits (k, v) pairs; e.g. in word count, the mapper turns the line "to be or not to be" into (to, 1), (be, 1), (or, 1), (not, 1), (to, 1), (be, 1), and after the shuffle the reducer receives (be, [1, 1]) and emits (be, 2)

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Word-count Mapper: emits (word, 1) for every token of each input line
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one); // emit (word, 1)
        }
    }
}
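The slides show only the Mapper; for completeness, here is a minimal sketch of a matching Reducer and Driver. The enclosing class name WordCount and the job name are illustrative assumptions, not taken from the slides.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Reducer: sums the 1s emitted by the Mapper for each word
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum)); // emit (word, total count)
    }
}

// Driver: wires the Mapper, Reducer and I/O paths into a Job and submits it
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class); // hypothetical enclosing class
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

Packaged into a jar, the job is typically submitted with the hadoop jar command, passing the input and output paths as arguments.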

MapReduce job flow

Modes of operation
Standalone mode
Pseudo-distributed mode (see the configuration sketch below)
Fully-distributed mode
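As a rough illustration of how the modes differ, the sketch below sets the classic Hadoop 1.x properties in code; the class name and the localhost host/port values are assumptions for a single-machine setup (normally these settings live in core-site.xml and mapred-site.xml).

import org.apache.hadoop.conf.Configuration;

public class ModeConfig {
    // Standalone (local) mode: a single JVM using the local file system, no daemons
    public static Configuration standalone() {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "file:///");
        conf.set("mapred.job.tracker", "local");
        return conf;
    }

    // Pseudo-distributed mode: all daemons run on one machine over localhost
    public static Configuration pseudoDistributed() {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000");
        conf.set("mapred.job.tracker", "localhost:9001");
        return conf;
    }

    // Fully-distributed mode points the same two properties at the cluster's
    // real NameNode and JobTracker hosts.
}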

Hadoop ecosystem

When should we go for Hadoop?


Data is too huge
Processes are independent
Online analytical processing (OLAP)
Better scalability
Parallelism
Unstructured data

Real world use cases


Clickstream analysis
Sentiment analysis
Ad targeting
Recommendation engines
Search quality

What I have been doing


Seismic Data Management & Processing
WITSML Server & Drilling Analytics
Permission Map management for Orchestra Search
SDIS (just started)

Next steps: Get your hands dirty with code in a workshop on


Hadoop configuration
HDFS data loading
MapReduce programming
HBase
Hive & Pig

QUESTIONS?
