Data Analysis in the Cloud: Models, Techniques and Applications

Ebook, 225 pages, 2 hours

Author: Domenico Talia
ISBN: 9780128029145

By Domenico Talia, Paolo Trunfio and Fabrizio Marozzo

About this ebook

Data Analysis in the Cloud introduces and discusses models, methods, techniques, and systems to analyze the large number of digital data sources available on the Internet using the computing and storage facilities of the cloud.

Coverage includes scalable data mining and knowledge discovery techniques together with cloud computing concepts, models, and systems. Specific sections focus on map-reduce and NoSQL models. The book also includes techniques for conducting high-performance distributed analysis of large data on clouds. Finally, the book examines research trends such as Big Data pervasive computing, data-intensive exascale computing, and massive social network analysis.

Introduces data analysis techniques and cloud computing concepts
Describes cloud-based models and systems for Big Data analytics
Provides examples of the state-of-the-art in cloud data analysis
Explains how to develop large-scale data mining applications on clouds
Outlines the main research trends in the area of scalable Big Data analysis

Language: English
Publisher: Elsevier Science
Release date: Sep 15, 2015
ISBN: 9780128029145

Author

Domenico Talia

Domenico Talia is a professor of computer engineering at the University of Calabria and a partner in two startups, DtoK Lab and Exeura. His research interests include parallel and distributed data mining algorithms, cloud computing, social data analysis, distributed knowledge discovery, mobile computing, green computing systems, peer-to-peer systems, and parallel programming. He is the author of several books, including Service-Oriented Distributed Knowledge Discovery (CRC 2012) and Grid Middleware and Services: Challenges and Solutions (Springer 2010), and of more than 300 papers in international conference proceedings and archival journals such as CACM, IEEE TKDE, ACM Computing Surveys, FGCS, Parallel Computing, and IEEE Internet Computing. He is a member of the editorial boards of many journals, including IEEE Transactions on Cloud Computing, Future Generation Computer Systems, Journal of Cloud Computing, and the International Journal on Web and Grid Services.

Facility Management
Article
Building Trends, Building Momentum
Oct 14, 2019
3 min read
The Cloud Is All Around Us
MoneyWeek
Article
The Cloud Is All Around Us
Mar 17, 2023
The ways the cloud can be used in our day-to-day lives is unlimited, as these examples help to illustrate. Within entertainment, whether it’s Disney+ or Netflix, the television shows and films we watch are stored in the cloud so that millions can sim
2 min read
Safer Cyber
Cosmos Magazine
Article
Safer Cyber
Mar 14, 2024
3 min read
How To Train Computers Faster For ‘Extreme’ Datasets
Futurity
Article
How To Train Computers Faster For ‘Extreme’ Datasets
Dec 12, 2019
4 min read
Facilities Systems
Facility Management
Article
Facilities Systems
Oct 21, 2018
5 min read
Prototype Paves Way For ‘Computer-on-a-chip’
Futurity
Article
Prototype Paves Way For ‘Computer-on-a-chip’
Feb 22, 2019
2 min read
Herd In The Cloud
Linux Format
Article
Herd In The Cloud
Sep 21, 2021
Matt Yonkovit is Percona’s Head of Open Source Strategy and a member of SHA (Silly Hats Anonymous). “Going ‘cloud native’ involves building applications in new ways. Traditional applications are generally designed with a two- or three-tier architectu
1 min read
Is Artificial Intelligence Permanently Inscrutable?: Despite new biology-like tools, some insist interpretation is impossible.
Nautilus
Article
Is Artificial Intelligence Permanently Inscrutable?: Despite new biology-like tools, some insist interpretation is impossible.
Sep 1, 2016
Dmitry Malioutov can’t say much about what he built. As a research scientist at IBM, Malioutov spends part of his time building machine learning systems that solve difficult problems faced by IBM’s corporate clients. One such program was meant for a
13 min read
Business applications For Quantum computing
Rotman Management
Article
Business applications For Quantum computing
May 1, 2022
COMPUTERS DO ARITHMETIC. Underlying every amazing application of computers today is math, calculated using binary digits or ‘bits.’ The original computers of the early 1950s could perform about 465 multiplications per second — much faster than the ‘h
11 min read
Tensor Flow 101
APC
Article
Tensor Flow 101
Jan 27, 2020
4 min read
Is Artificial Intelligence Permanently Inscrutable?
Nautilus
Article
Is Artificial Intelligence Permanently Inscrutable?
Sep 1, 2016
Dmitry Malioutov can’t say much about what he built. As a research scientist at IBM, Malioutov spends part of his time building machine learning systems that solve difficult problems faced by IBM’s corporate clients. One such program was meant for a
13 min read
What Tech Can Learn from the Fruit Fly’s Search Algorithm
Nautilus
Article
What Tech Can Learn from the Fruit Fly’s Search Algorithm
Nov 13, 2017
5 min read

Related categories

Skip carousel

Reviews for Data Analysis in the Cloud

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Data Analysis in the Cloud - Domenico Talia

Data Analysis in the Cloud

Models, Techniques and Applications

Domenico Talia

Paolo Trunfio

Fabrizio Marozzo

Cover

Title page

Copyright

Dedication

Preface

Chapter 1: Introduction to Data Mining

Abstract

1.1. Data mining concepts

1.2. Parallel and distributed data mining

1.3. Summary

Chapter 2: Introduction to Cloud Computing

Abstract

2.1. Cloud computing: definition, models, and architectures

2.2. Cloud computing systems for data-intensive applications

2.3. Summary

Chapter 3: Models and Techniques for Cloud-Based Data Analysis

Abstract

3.1. MapReduce for data analysis

3.2. Data analysis workflows

3.3. NoSQL models for data analytics

3.4. Summary

Chapter 4: Designing and Supporting Scalable Data Analytics

Abstract

4.1. Data analysis systems for clouds

4.2. How to design a scalable data analysis framework in clouds

4.3. Programming workflow-based data analysis

4.4. Data analysis case studies

4.5. Summary

Chapter 5: Research Trends in Big Data Analysis

Abstract

5.1. Data-intensive exascale computing

5.2. Massive social network analysis

5.3. Key research areas

5.4. Summary

Copyright

Elsevier

Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands

The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK

225 Wyman Street, Waltham, MA 02451, USA

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

ISBN: 978-0-12-802881-0

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data

A catalog record for this book is available from the Library of Congress

For information on all Elsevier publications visit our website at http://store.elsevier.com/

Dedication

To my beloved parents and to my darling family.

Domenico Talia

To my little daughter, Iris, who joined Erika, Thomas, and me along the way.

Paolo Trunfio

To Laura and to my family.

Fabrizio Marozzo

Preface

The massive amount of digital data currently being generated across all human activities is a precious source of knowledge for both business and science. However, handling and analyzing huge datasets requires very large storage resources and scalable computing facilities. In fact, the wide availability of big data sources demands efficient data analysis tools and techniques for finding and extracting useful knowledge from them. Big data analysis today can be performed by storing data and running compute-intensive data mining algorithms on cloud computing systems to extract value from data in reduced time. Cloud computing systems can be used to run complex applications on dynamic computing servers and deliver them as services over the Internet. Owing to their elastic nature, cloud computing infrastructures can serve as effective platforms for addressing the computational and data storage needs of most big data analytics applications being developed today. Coping with and gaining value from cloud-based big data, however, requires novel software tools and advanced analysis techniques. Indeed, advanced data mining techniques and innovative tools can help users understand and extract what is useful in large and complex datasets, and the knowledge extracted from big data sources today is vital for making informed decisions in many business and scientific applications. This process, which forms the basis for analyzing big data sources and repositories, must be implemented by combining big data analytics and knowledge discovery techniques with scalable computing systems such as clouds.

All these issues are discussed in this book. In fact, the main goal of the book is to introduce and present models, methods, techniques, and systems useful for analyzing large digital data sources by using the computing and storage facilities of cloud computing systems. This book includes, as key topics, scalable data mining and knowledge discovery techniques, together with cloud computing concepts, models, and systems. After introducing these fields, this book focuses on scalable technologies for cloud-based data analysis such as MapReduce, workflows, and NoSQL models, and discusses how to design high-performance distributed analysis of big data on clouds. Finally, this book examines research trends such as big data exascale computing and massive social network analysis.

This book is for graduate students, researchers, and professionals in cloud computing, big data analysis, distributed data mining, and data analytics. Both readers who are new to the subjects and those experienced in the cloud computing and data mining domains will find many topics of interest. Researchers will find some of the latest achievements in the area, along with significant technologies and examples of the state of the art in cloud-based data analysis and knowledge discovery. Furthermore, graduate students and young researchers will learn useful concepts related to parallel and distributed data mining, cloud computing, data-intensive applications, and scalable data analysis.

Besides introducing the key concepts and systems in the area of cloud-based data analysis, this book presents real case studies that provide a useful guide for developers on issues, prospects, and successful approaches in the practical use of cloud-based data analysis frameworks. The chapters are presented so that the book can also be used as a reference text in graduate and postgraduate courses on parallel/distributed data mining and on cloud computing for big data analysis.

We would like to thank the people at the publisher, Elsevier, particularly Lindsay Lawrence, for their support and work during the book publication process.

We hope readers will find this book’s content interesting, attractive, and useful, as we found it stimulating and exciting to write.

Domenico Talia

Paolo Trunfio

Fabrizio Marozzo

Chapter 1

Introduction to Data Mining

Abstract

In this chapter we introduce the main concepts of data mining. This scientific field, together with cloud computing, discussed in Chapter 2, is a basic pillar on which the contents of this book are built. Section 1.1 explores the main notions and principles of data mining, introducing readers to this scientific field and giving them the information they need on the sequential data mining techniques and algorithms that will be used in other sections and chapters of this book. Section 1.2 outlines the most important parallel and distributed data mining strategies and techniques.

Keywords

data mining

classification

clustering

association rules

parallel data mining

distributed data mining

meta-learning

collective data mining

ensemble learning

1.1. Data mining concepts

Computers were created to help humans execute long and complex operations automatically. One of the main effects of the invention of computers is the huge amount of digital data that is now stored in computer memory. These data volumes can be used to know and understand facts, behaviors, and natural phenomena, and to make decisions on the basis of them. Researchers have investigated methods for instructing computers to learn from data. In particular, machine learning is a scientific discipline that deals with the design and implementation of models, procedures, and algorithms that can learn from data. Such techniques are able to build a predictive model from input data, to be used for making predictions or decisions. More recently, data mining has been defined as an area of computer science where machine learning techniques are used to discover previously unknown properties in large data sets. More formally, data mining is the analysis of data sets to find interesting, novel, and useful patterns, relationships, models, and trends. Data mining tasks include methods at the intersection of artificial intelligence, machine learning, statistics, mathematics, and database systems. The overall practical goal of a data mining task is to extract information from a data set and transform it into an understandable structure for further use. Data mining is also considered the central step of the knowledge discovery in databases (KDD) process, which aims at discovering useful patterns and models for making sense of data. The additional steps in the KDD process are data preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and interpretation of the results of mining. Together with the data mining step, they are essential to ensure that useful knowledge is derived from the data to be analyzed.
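The KDD steps described above can be sketched in a few lines of code. The following is only an illustrative sketch, not a method taken from this book: the records, the field names, and the helper functions (select, clean, mine, predict, interpret) are hypothetical, and the mining step is assumed to be a deliberately simple one-threshold model placed midway between the two class means.

```python
from statistics import mean

# Hypothetical raw records: (client_id, monthly_spend, visits, label).
# None marks a missing value; all values are invented for illustration.
RAW = [
    ("c1", 120.0, 4, "high"),
    ("c2", 15.0, 1, "low"),
    ("c3", None, 3, "high"),
    ("c4", 95.0, 5, "high"),
    ("c5", 22.0, 2, "low"),
    ("c6", 30.0, None, "low"),
]


def select(records):
    """Data selection: keep only the fields relevant to the task."""
    return [(spend, label) for _, spend, _, label in records]


def clean(rows):
    """Data cleaning: drop rows whose selected field is missing."""
    return [(x, y) for x, y in rows if x is not None]


def mine(rows):
    """Data mining: learn a threshold midway between the class means."""
    high = [x for x, y in rows if y == "high"]
    low = [x for x, y in rows if y == "low"]
    return (mean(high) + mean(low)) / 2


def predict(threshold, spend):
    """Use the learned model to predict the class of a new instance."""
    return "high" if spend >= threshold else "low"


def interpret(threshold):
    """Interpretation: express the model as a human-readable rule."""
    return f"clients spending at least {threshold:.2f} are predicted 'high'"


threshold = mine(clean(select(RAW)))
print(interpret(threshold))
print(predict(threshold, 80.0))
```

Each function stands in for one KDD step, so a real application would replace the body of each with its own selection criteria, cleaning rules, and mining algorithm while keeping the same overall flow.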

Many data mining algorithms have been designed and implemented in several research areas such as statistics, machine learning, mathematics, artificial intelligence, pattern recognition, and databases, each of which uses specialized techniques from the respective application field. The most common types of data mining tasks include:

• Classification: the goal is to classify a data set into one or more predefined classes. This is done by models that implement a mapping from a vector of values to a categorical variable. In this way, using classification we can predict the membership of a data instance in a given class from a set of predefined classes. For instance, a set of outlet store clients can be grouped into three classes (high spending, average spending, and low spending clients), or a set of patients can be classified according to a set of diseases. Classification techniques are used in many application domains such as financial services, bioinformatics, document classification, multimedia data, text processing, and social network analysis.

• Regression: this is a predictive technique that associates a data set with a quantitative variable and predicts the value of that variable. There are many applications of regression, such as assessing the likelihood that a patient will get sick from the results of diagnostic tests, or predicting the margin of victory of a sports team based on results and technical data of previous matches. Regression is often used in economics, environmental studies, market trends, meteorology, and epidemiology.

• Clustering: this data mining task aims to identify a finite set of categories or groupings (clusters) to describe the data. Clustering techniques are used when no class to be predicted is available a priori and data instances are to be divided into groups of similar instances. The groups can be mutually exclusive and exhaustive, or consist of a more extensive representation, such as in the case of hierarchical categories. Examples of clustering applications include finding homogeneous subsets of clients in a database of commercial sales, or groups of objects with similar shapes and colors. Among the application domains where clustering is used are gene analysis, network intrusion, medical imaging, crime analysis, climatology, and text mining. Unlike classification, in which classes are predefined, in clustering the classes must be derived from data, looking for clusters based on metrics of similarity between data without the assistance of users.

• Summarization: this data mining task provides a compact description of a subset of data. Summarization methods for unstructured data usually involve text classification that groups together documents sharing similar characteristics. An example of summarization of quantitative data is the tabulation of the mean and standard deviation of each data field. More complex functions involve summary rules and the discovery of functional relationships between variables. Summarization techniques are often used in the interactive analysis of data and the automatic generation of reports.

• Dependency modeling: this task consists of finding a model that describes significant dependencies between variables. Here the goal is to discover how some data values depend on other data values. Dependency models exist at two levels: the structural level of the model specifies which variables are locally dependent on each other, while the quantitative level specifies the strength of the dependencies using a numeric scale. Dependency modeling approaches are used in retail, business process management, software development, and assembly line optimization.

• Association rule discovery: this task aims at finding sets of items that occur together in records of a data set and the relationships among those items in order to derive multiple correlations that meet the specified thresholds. It is intended to identify strong rules discovered in
