
The Pathologies of Big Data

The author of this paper defines big data as working with any sufficiently large set of data that cannot be managed using the standard procedures and technologies of the day. From this definition, the author proceeds to highlight some problems with the current set of technologies and techniques that need to be addressed in order to succeed in a big data setting.
The first problem lies in extracting and working with data stored in traditional relational databases: although moving data into a relational database can be done efficiently, getting it back out for analysis is much harder. Additionally, relational databases tend to be very inefficient at storing data with some sort of sequencing or ordering to it. For example, time series data is not stored in a manner that allows it to be queried easily. Since in practice users often need time series or other sequenced datasets, we need a new database model that can query this type of data more efficiently.
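To make the ordering point concrete, the sketch below is a hypothetical Python illustration (not code from the article): a time-range query over unordered rows forces a full scan, while the same data kept sorted by timestamp can be binary-searched and then read as one contiguous, sequential slice.

    import bisect
    import random

    # Hypothetical data: one million (timestamp, value) observations.
    random.seed(0)
    n = 1_000_000
    unordered_rows = [(random.uniform(0, 1e6), random.random()) for _ in range(n)]

    # With no ordering, a time-range query must scan every row.
    def range_query_scan(rows, t_start, t_end):
        return [v for t, v in rows if t_start <= t <= t_end]

    # Keeping the same data sorted by timestamp lets us binary-search the
    # window boundaries and read one contiguous slice -- sequential access only.
    ordered_rows = sorted(unordered_rows)
    timestamps = [t for t, _ in ordered_rows]

    def range_query_sorted(t_start, t_end):
        lo = bisect.bisect_left(timestamps, t_start)
        hi = bisect.bisect_right(timestamps, t_end)
        return [v for _, v in ordered_rows[lo:hi]]

    # Both queries return the same values, but the sorted version touches only
    # the rows inside the window instead of all n rows.
    assert sorted(range_query_scan(unordered_rows, 1000.0, 2000.0)) == \
           sorted(range_query_sorted(1000.0, 2000.0))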
The second problem arises when we run out of memory while processing large datasets. When many applications exhaust primary memory, they move data to secondary memory, such as a hard drive, and the program then runs a great deal slower. This problem is compounded further if the data is accessed randomly instead of sequentially. Other applications, however, simply stop working once they run out of primary memory. Since Jacobs wrote this article in 2009, hardware developers have made strides toward addressing specific problems surrounding big data. For example, NVIDIA has developed a line of graphics cards called Tesla, designed for working with and processing big data on servers. The Tesla series includes far more primary memory and allows much larger bandwidth for data transfers, which facilitates better performance [1].
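As a concrete sketch of working within limited primary memory, the Python example below (the file name, size, and dtype are assumptions for illustration, not details from the article) streams an on-disk dataset sequentially in fixed-size chunks, so the working set stays small and access stays sequential; touching the same file in random order would pay the cost of seeks and defeated prefetching that the article warns about.

    import os
    import tempfile
    import numpy as np

    # Assumed demo file: ~160 MB of float64 values written once to disk.
    path = os.path.join(tempfile.gettempdir(), "bigdata_demo.bin")
    n = 20_000_000
    if not os.path.exists(path):
        np.arange(n, dtype=np.float64).tofile(path)

    chunk = 1_000_000          # roughly 8 MB held in memory at any moment
    total = 0.0
    with open(path, "rb") as f:
        while True:
            block = np.fromfile(f, dtype=np.float64, count=chunk)
            if block.size == 0:
                break
            total += block.sum()   # each chunk is read sequentially, then discarded

    print(f"sum of {n} values, holding at most one chunk in memory: {total:.3e}")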
The third and final bottleneck discussed is working in a distributed environment. There are several problems with distributed computing in big data, including that not all operations can be distributed and that it may be hard to balance the load across all nodes. Also, if computations require a lot of communication between nodes, then you might not see any improvement from distributing the work, although the author admits that this problem has many simple solutions. Another problem is that if some part of the distributed system fails, then the whole database or program running on it is in jeopardy. Since the time of writing, new database models have been developed that work with large datasets more easily, such as Hadoop. Hadoop is explicitly designed to work with large datasets over a distributed computing system and addresses many of the problems that Jacobs discusses in the article; for example, if there is a hardware failure somewhere in the distributed system, Hadoop can handle it automatically [2]. Hadoop also takes greater advantage of data locality than parallel relational databases do, by automatically separating data into blocks and spreading them across the distributed system [3].
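To illustrate the programming model that lets Hadoop spread work across blocks and re-run failed tasks, here is a minimal word-count sketch for Hadoop Streaming in Python; the script name and the map/reduce dispatch are assumptions for illustration, not something the article prescribes.

    #!/usr/bin/env python3
    # Minimal word count for Hadoop Streaming. Hadoop splits the input into
    # blocks, runs the mapper on nodes that already hold those blocks (data
    # locality), sorts the mapper output by key, and feeds it to the reducer;
    # if a node fails, the framework reschedules its tasks elsewhere.
    import sys
    from itertools import groupby

    def mapper():
        for line in sys.stdin:
            for word in line.split():
                print(f"{word}\t1")            # emit (word, 1) pairs

    def reducer():
        # Mapper output arrives sorted by key, so all counts for one word
        # appear together and can be summed in a single pass.
        pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
        for word, group in groupby(pairs, key=lambda kv: kv[0]):
            print(f"{word}\t{sum(int(count) for _, count in group)}")

    if __name__ == "__main__":
        mapper() if sys.argv[1:] == ["map"] else reducer()

A job like this would typically be submitted with the hadoop-streaming jar, passing -input, -output, -mapper, and -reducer options; Hadoop then takes care of block placement, scheduling, and retrying tasks on healthy nodes.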

References:
[1] NVIDIA Tesla K80, http://www.nvidia.com/object/tesla-k80.html
[2] Apache Hadoop, http://hadoop.apache.org/
[3] Data locality: HPC vs. Hadoop vs. Spark, http://www.datascienceassn.org/content/data-locality-hpc-vs-hadoop-vs-spark
