Computation and Storage in the Cloud: Understanding the Trade-Offs
By Dong Yuan, Yun Yang and Jinjun Chen
About this ebook
Computation and Storage in the Cloud is the first comprehensive and systematic work investigating the issue of computation and storage trade-off in the cloud in order to reduce the overall application cost. Scientific applications are usually computation and data intensive, where complex computation tasks take a long time for execution and the generated datasets are often terabytes or petabytes in size. Storing valuable generated application datasets can save their regeneration cost when they are reused, not to mention the waiting time caused by regeneration. However, the large size of the scientific datasets is a big challenge for their storage. By proposing innovative concepts, theorems and algorithms, this book will help bring the cost down dramatically for both cloud users and service providers to run computation and data intensive scientific applications in the cloud.
- Covers cost models and benchmarking that explain the necessary tradeoffs for both cloud providers and users
- Describes several novel strategies for storing application datasets in the cloud
- Includes real-world case studies of scientific research applications
Dong Yuan
Dong Yuan is currently a research fellow in the School of Software and Electrical Engineering at Swinburne University of Technology, Melbourne, Australia. His research interests include data management in parallel and distributed systems, scheduling and resource management, and grid and cloud computing.
Preface
Scientific research increasingly relies on information technology: communities of researchers use large-scale, high-performance computing systems (e.g. clusters, grids and supercomputers) to run their applications. Scientific applications are usually computation and data-intensive, where complex computation tasks take a long time to execute and the generated data sets are often terabytes or petabytes in size. Storing valuable generated application data sets saves the cost of regenerating them when they are reused, as well as the waiting time that regeneration causes. However, the large size of scientific data sets makes their storage a big challenge.
In recent years, cloud computing has emerged as the latest distributed computing paradigm, providing redundant, inexpensive and scalable resources on demand to meet system requirements. It offers researchers a new way to deploy computation and data-intensive applications (e.g. scientific applications) without any infrastructure investment. Large generated application data sets can be flexibly stored, or deleted and regenerated whenever needed, since in theory unlimited storage and computation resources can be obtained from commercial cloud service providers.
With the pay-as-you-go model, the total application cost for generated data sets in the cloud depends chiefly on the method used for storing them. For example, storing all the generated application data sets in the cloud may result in a high storage cost, since some data sets may be large but seldom used; but if we delete all the generated data sets and regenerate them every time they are needed, the computation cost may also be very high. Hence, there is a trade-off between computation and storage in the cloud. To reduce the overall application cost, a good strategy is to strike a balance: selectively store some popular data sets and regenerate the rest when needed. This book focuses on cost-effective data set storage for scientific applications in the cloud, which is currently a leading-edge and challenging topic. By investigating the computation and storage trade-off, we (1) propose a new cost model for data set storage in the cloud; (2) develop novel benchmarking approaches to find the minimum cost of storing the application data; and (3) design innovative runtime storage strategies to store the application data in the cloud.
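The trade-off described above can be illustrated for a single data set. The following is only a minimal sketch under assumed names and prices (`storage_price`, `compute_price` and the monthly billing period are hypothetical, not taken from the book): it compares the recurring cost of keeping a data set in storage with the recurring cost of regenerating it each time it is used.

```python
def storage_cost_rate(size_gb, storage_price):
    """Cost per month of keeping the data set stored (assumed price per GB-month)."""
    return size_gb * storage_price


def regeneration_cost_rate(compute_hours, compute_price, uses_per_month):
    """Cost per month of deleting the data set and recomputing it on every use."""
    return compute_hours * compute_price * uses_per_month


def cheaper_option(size_gb, storage_price, compute_hours, compute_price, uses_per_month):
    """Return ("store" or "regenerate", monthly cost) for one isolated data set."""
    store = storage_cost_rate(size_gb, storage_price)
    regen = regeneration_cost_rate(compute_hours, compute_price, uses_per_month)
    return ("store", store) if store <= regen else ("regenerate", regen)


# A large, seldom-used data set versus a small, frequently used one
# (all numbers are illustrative assumptions).
print(cheaper_option(1000, 0.10, 20, 0.5, 1))
print(cheaper_option(10, 0.10, 20, 0.5, 4))
```

This sketch treats each data set in isolation; in practice the cost of regenerating one data set may depend on which of its predecessors are stored.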
We start by introducing a motivating example from astrophysics and analysing the problems of the computation and storage trade-off in the cloud. Based on the requirements identified, we propose the novel concept of a Data Dependency Graph (DDG) and an effective cost model for data set storage in the cloud. The DDG is based on data provenance, which records the generation relationships among all the data sets. With the DDG, we know how to regenerate data sets in the cloud and can therefore calculate their generation costs. The total application cost for the generated data sets includes both their generation cost and their storage cost.
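To make the idea of a dependency graph concrete, here is a minimal sketch, not the book's formal DDG definition. It assumes that each data set records its direct predecessors and the cost of computing it from them, and that regenerating a deleted data set also requires regenerating every deleted predecessor back to the nearest stored ones. The names `DataSet`, `compute_cost` and `generation_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    compute_cost: int      # assumed cost of computing it from its direct parents
    parents: tuple = ()    # names of the data sets it is generated from


def generation_cost(target, datasets, stored):
    """Cost of regenerating `target`, assuming it is not currently available.

    Walks back through the graph, stopping at stored data sets, and sums the
    compute cost of every data set that has to be recomputed (each counted once).
    """
    needed = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        for parent in datasets[name].parents:
            if parent not in stored:
                stack.append(parent)
    return sum(datasets[name].compute_cost for name in needed)


# Illustrative chain: raw -> a -> b -> c (costs are made-up numbers).
chain = {
    "raw": DataSet("raw", 0),
    "a": DataSet("a", 10, ("raw",)),
    "b": DataSet("b", 20, ("a",)),
    "c": DataSet("c", 5, ("b",)),
}
print(generation_cost("c", chain, stored={"raw", "a"}))
print(generation_cost("c", chain, stored={"raw"}))
```

Storing an intermediate data set shortens the chain of recomputation for everything derived from it, which is why storage decisions for different data sets interact.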
Based on the cost model, we develop novel algorithms which can calculate the minimum cost for storing data sets in the cloud, i.e. the best trade-off between computation and storage. This minimum cost is a benchmark for evaluating the cost-effectiveness of different storage strategies in the cloud. For different situations, we develop different benchmarking approaches with polynomial time complexity for a seemingly NP-hard problem, where (1) the static on-demand approach is for situations in which only occasional benchmarking is requested; and (2) the dynamic on-the-fly approach is suitable for situations in which more frequent benchmarking is requested at runtime.
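The notion of a minimum cost benchmark can be illustrated by exhaustive search on a tiny example. This is explicitly not one of the book's polynomial-time algorithms: the sketch below enumerates every store/delete combination for a linear chain of data sets, which takes exponential time and is only feasible for a handful of data sets. The cost formula (storage rate for stored data sets, regeneration cost times usage frequency for deleted ones) and all numbers are assumptions made for illustration.

```python
from itertools import product


def chain_cost_rate(datasets, stored_flags):
    """Total cost per period for a linear chain under one storage strategy.

    `datasets` is a list of (storage_rate, compute_cost, uses_per_period),
    ordered so that each data set is generated from the previous one.
    """
    total = 0.0
    regen_run = 0.0  # compute cost accumulated since the nearest stored predecessor
    for (storage_rate, compute_cost, uses), is_stored in zip(datasets, stored_flags):
        if is_stored:
            total += storage_rate
            regen_run = 0.0
        else:
            regen_run += compute_cost
            total += regen_run * uses
    return total


def min_cost_benchmark(datasets):
    """Brute-force search over all 2**n strategies; returns (min cost, flags)."""
    best = min(
        product([False, True], repeat=len(datasets)),
        key=lambda flags: chain_cost_rate(datasets, flags),
    )
    return chain_cost_rate(datasets, best), best


# Three data sets: cheap to store, expensive to store, cheap but heavily used.
example = [(2, 10, 1), (100, 20, 1), (1, 5, 4)]
print(min_cost_benchmark(example))
```

Any practical storage strategy can then be judged by how close its cost comes to this minimum; the exhaustive search merely shows what the benchmark measures, not how to compute it efficiently.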
We also develop novel cost-effective storage strategies for users to apply at runtime in the cloud. These differ from the minimum cost benchmarking approaches; moreover, users may sometimes have preferences regarding the storage of particular data sets for reasons other than cost, e.g. guaranteeing immediate access to certain data sets. Hence, users’ preferences should also be considered in a storage strategy. Based on these considerations, we develop two cost-effective storage strategies for different situations: (1) the cost-rate-based strategy is highly efficient with fairly reasonable cost-effectiveness; and (2) the local-optimisation-based strategy is highly cost-effective with very reasonable time complexity.
To the best of our knowledge, this book is the first comprehensive and systematic work investigating the computation and storage trade-off in the cloud in order to reduce the overall application cost. By proposing innovative concepts, theorems and algorithms, this book helps bring the cost down dramatically for both cloud users and service providers running computation and data-intensive scientific applications in the cloud.
1 Introduction
This book investigates the trade-off between computation and storage in the cloud. This is a new and significant issue for deploying applications under the pay-as-you-go model in the cloud, especially computation and data-intensive scientific applications. The novel research reported in this book helps both cloud service providers and users reduce the cost of storing large generated application data sets in the cloud. A suite consisting of a novel cost model, benchmarking approaches and storage strategies is designed and developed with the support of new concepts, solid theorems and innovative algorithms. Experimental evaluation and a case study demonstrate that our work helps bring the cost down dramatically for running computation and data-intensive scientific applications in the