
KDnuggets Home » News » 2017 » May » Tutorials, Overviews » HDFS vs. HBase : All you need to know ( 17:n19 )

HDFS vs. HBase: All you need to know

Tags: Big Data, Hadoop, HBase, HDFS

Hadoop Distributed File System (HDFS) and HBase (Hadoop's database) are key components of the Big Data ecosystem. This blog explains the difference between HDFS and HBase, with real-life use cases where each is the best fit.

By Alex Mailajalam, Noah Data


The sudden increase in the volume of data, from the order of gigabytes to zettabytes, has created the need for a more organized file system for storing and processing data. This demand has brought Hadoop into the limelight, making it one of the biggest players in the industry. Hadoop Distributed File System (HDFS), Hadoop's well-known file system, and HBase (Hadoop's database) are among the most topical and advanced data storage and management systems available in the market.

What are HDFS and HBase?

HDFS is fault-tolerant by design and supports rapid data transfer between nodes even during system failures. HBase is a non-relational, open-source Not-Only-SQL (NoSQL) database that runs on top of Hadoop. In terms of the CAP (Consistency, Availability, and Partition Tolerance) theorem, HBase is a CP system.

HDFS is most suitable for batch analytics. However, one of its biggest drawbacks is its inability to perform real-time analysis, an increasingly common requirement in the IT industry. HBase, on the other hand, can handle large data sets but is not well suited to batch analytics. Instead, it is used to read and write data in Hadoop in real time.

Both HDFS and HBase are capable of processing structured, semi-structured, and unstructured data. HDFS has no in-memory processing engine; analysis relies on plain old MapReduce, which slows it down. HBase, by contrast, has an in-memory processing engine that drastically increases the speed of reads and writes.
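
To make the in-memory idea concrete, here is a minimal sketch of a store that accepts writes into a memory buffer and flushes them in sorted batches. It is a toy model under stated assumptions, not HBase's API: the class name `TinyBufferedStore`, the `flush_threshold` parameter, and the list standing in for files on disk are all hypothetical, and HBase's real write path (write-ahead log, MemStore, HFiles) is considerably more involved.

```python
from typing import Dict, List, Optional, Tuple


class TinyBufferedStore:
    """Toy key-value store (hypothetical): writes land in memory first."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold
        # In-memory write buffer: recent writes are served from here.
        self.buffer: Dict[str, str] = {}
        # Stand-in for immutable, sorted files on disk (oldest first).
        self.segments: List[List[Tuple[str, str]]] = []

    def put(self, key: str, value: str) -> None:
        """Write to memory; flush once the buffer reaches the threshold."""
        self.buffer[key] = value
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Move the buffer into a new sorted, read-only segment."""
        if self.buffer:
            self.segments.append(sorted(self.buffer.items()))
            self.buffer = {}

    def get(self, key: str) -> Optional[str]:
        """Check memory first, then segments from newest to oldest."""
        if key in self.buffer:
            return self.buffer[key]
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None


store = TinyBufferedStore(flush_threshold=2)
store.put("row1", "a")
store.put("row2", "b")   # buffer reaches the threshold and is flushed
store.put("row1", "c")   # newer value for row1 stays in memory
print(store.get("row1"), store.get("row2"))
```

The newest value for a key always wins because the buffer is consulted before any flushed segment.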

HDFS is very transparent in its execution of data analysis. HBase, on the other hand, being a NoSQL database in tabular format, keeps values sorted by key and fetches them by looking up those keys.
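
The key-sorted layout can be illustrated with a small in-memory sketch. It assumes string row keys kept in lexicographic order; the class `SortedRowStore` and the sample row keys are hypothetical and only mimic the idea of point lookups and range scans over sorted keys, not the HBase client API.

```python
import bisect
from typing import List, Optional, Tuple


class SortedRowStore:
    """Toy table (hypothetical) that keeps rows ordered by row key."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._values: List[str] = []

    def put(self, row_key: str, value: str) -> None:
        """Insert or overwrite a row while keeping keys sorted."""
        i = bisect.bisect_left(self._keys, row_key)
        if i < len(self._keys) and self._keys[i] == row_key:
            self._values[i] = value
        else:
            self._keys.insert(i, row_key)
            self._values.insert(i, value)

    def get(self, row_key: str) -> Optional[str]:
        """Point lookup by binary search on the sorted keys."""
        i = bisect.bisect_left(self._keys, row_key)
        if i < len(self._keys) and self._keys[i] == row_key:
            return self._values[i]
        return None

    def scan(self, start: str, stop: str) -> List[Tuple[str, str]]:
        """Return rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


table = SortedRowStore()
table.put("user2#001", "login")
table.put("user1#002", "logout")
table.put("user1#001", "login")
print(table.scan("user1", "user2"))  # all rows whose key starts with "user1"
```

Because related keys sit next to each other, a range scan touches one contiguous slice instead of the whole table, which is why row key design matters in a key-sorted store.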

Enhanced Understanding with Use Cases for HDFS & HBase

Use Case 1 – Cloudera optimization for European bank using HBase

HBase is ideally suited for real-time environments, and this is best demonstrated by the example of our client, a renowned European bank. To derive critical insights from application and web server logs, we implemented a solution using Apache Storm and Apache HBase together. Given the high velocity of the data, we opted for HBase over HDFS, as HDFS does not support real-time writes. The results were overwhelming: query time dropped from 3 days to 3 minutes.
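
To put the quoted reduction into perspective, the speedup factor can be worked out directly. This is only arithmetic on the two figures stated above; the helper name `speedup` is our own.

```python
from datetime import timedelta


def speedup(before: timedelta, after: timedelta) -> float:
    """Ratio of the old duration to the new one."""
    return before / after


# Figures quoted for the bank use case: 3 days down to 3 minutes.
bank_query_speedup = speedup(timedelta(days=3), timedelta(minutes=3))
print(bank_query_speedup)  # 1440.0
```

In other words, the reported query time fell by a factor of 1,440.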

Use Case 2 – Analytics solution for global CPG player using HDFS & MapReduce

With our global beverage client, the primary objective was to perform batch analytics to gain SKU-level insights, which involved recursive/sequential calculations. HDFS and the MapReduce framework were better suited to this than complex Hive queries on top of HBase. MapReduce was used for data wrangling and to prepare data for subsequent analytics. Hive was used for custom analytics on top of the data processed by MapReduce. The results were impressive: the time taken to generate custom analytics dropped drastically, from 3 days to 3 hours.
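
For readers unfamiliar with the MapReduce pattern mentioned in this use case, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a SKU-level aggregation. The records, SKU names, and function names are made up for illustration; this is not the client's pipeline, and real MapReduce jobs run distributed over data stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sales records: (sku, units_sold).
RECORDS: List[Tuple[str, int]] = [
    ("cola-330ml", 10),
    ("juice-1l", 4),
    ("cola-330ml", 5),
    ("juice-1l", 1),
]


def map_phase(records: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Emit one (key, value) pair per input record."""
    for sku, units in records:
        yield sku, units


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values that share the same key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


units_per_sku = reduce_phase(shuffle(map_phase(RECORDS)))
print(units_per_sku)
```

Each phase only needs a full pass over its input, which is why this style of processing fits batch workloads over large files rather than low-latency lookups.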

To offer a reasonable comparison between HDFS and HBase, the following points need to be emphasized:

| HDFS | HBase |
| --- | --- |
| HDFS is a Java-based file system utilized for storing large data sets. | HBase is a Java-based Not-Only-SQL database. |
| HDFS has a rigid architecture that does not allow changes. It doesn't facilitate dynamic storage. | HBase allows for dynamic changes and can be utilized for standalone applications. |
| HDFS is ideally suited for write-once, read-many-times use cases. | HBase is ideally suited for random writes and reads of data that is stored in HDFS. |
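
The last row of the comparison, write-once/read-many versus random read/write, can be sketched with two toy classes. Both are hypothetical in-memory stand-ins written for this illustration; they model the access patterns only and use neither the HDFS nor the HBase API.

```python
from typing import Dict, List, Optional


class WriteOnceFile:
    """Toy file (hypothetical): records are written once, then only read."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._closed = False

    def append(self, record: str) -> None:
        """Add a record; refused once the file has been closed."""
        if self._closed:
            raise ValueError("file is closed; write a new file instead")
        self._records.append(record)

    def close(self) -> None:
        """Seal the file so its contents can no longer change."""
        self._closed = True

    def read_all(self) -> List[str]:
        """Sequential read of the whole file, as a batch job would do."""
        return list(self._records)


class RandomAccessTable:
    """Toy table (hypothetical): any row can be read or rewritten by key."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, row_key: str, value: str) -> None:
        self._rows[row_key] = value

    def get(self, row_key: str) -> Optional[str]:
        return self._rows.get(row_key)


log = WriteOnceFile()
log.append("2017-05-01 order=17")
log.close()
print(log.read_all())        # the sealed file can be read many times

table = RandomAccessTable()
table.put("order#17", "pending")
table.put("order#17", "shipped")  # update a single row in place
print(table.get("order#17"))
```

Changing one record in the write-once model means producing a new file, whereas the table model updates a single row by key.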

Original. Reposted with permission.

Bio: Alex Mailajalam is a Big Data Architect at Noah Data with strong expertise in optimizing custom Big Data applications built on Cloudera / Hortonworks, migrating legacy systems to Big Data platforms, and building enterprise-grade integrated Big Data and analytics solutions.

Related:

7 Steps to Understanding NoSQL Databases


Hadoop Key Terms, Explained
Hadoop is Not Failing, it is the Future of Data
