International e-Journal For Technology And Research-2018
A Survey: Hybrid Job-Driven Meta Data Scheduling for Data Storage with Internet Approach

Ms. BHANUPRIYA S V 1, Mrs. SHRUTHI G 2
Department of Computer Science and Engineering
1 M-Tech Student, DBIT, Bengaluru, India
2 Guide and Professor, DBIT, Bengaluru, India
1. ABSTRACT
Cloud computing is a promising computing model that enables convenient and on-demand network access to a shared pool of configurable computing resources. The first offered cloud service is moving data into the cloud: data owners let cloud service providers host their data on cloud servers, and data consumers can access the data from the cloud servers. This new paradigm of data storage service also introduces new security challenges, because data owners and data servers have different identities and different business interests with map and reduce tasks in different jobs. Therefore, an independent auditing service is required to make sure that the data is correctly hosted in the Cloud. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations are further introduced to separately achieve a better map-data locality and a faster task assignment. We conduct extensive experiments to evaluate and compare the two variations with current scheduling algorithms. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant overhead. In addition, the two variations are separately suitable for different MapReduce workload scenarios and provide the best job performance among all tested algorithms in cloud computing data storage.

Keywords: Cloud Computing, Communication System, IaaS, Scheduling Process, Auditing Process.

2. INTRODUCTION
Cloud computing is a promising computing model that enables convenient and on-demand network access to a shared pool of computing resources. Cloud computing offers a group of services, including Software as a Service, Platform as a Service, and Infrastructure as a Service. Cloud storage is an important service of cloud computing, which allows data owners to move data from their local computing systems to the Cloud. More and more data owners are choosing to host their data in the Cloud. By hosting their data in the Cloud, data owners can avoid the initial investment in expensive infrastructure setup, large equipment, and daily maintenance costs. The data owners only need to pay for the space they actually use, e.g., a cost-per-gigabyte-stored model. Another reason is that data owners can rely on the Cloud to provide more reliable services, so that they can access data from anywhere and at any time. Individuals or small companies usually do not have the resources to keep their servers as reliable as the Cloud
IDL - International Digital Library | Copyright@IDL-2017
IDL - International Digital Library Of Technology & Research Available at: www.dbpublications.org
does. However, hosting data in the Cloud introduces new security challenges.

Firstly, users must be authorized to store data in the cloud according to the job scheduling over the connected Internet. Secondly, data owners may worry that their data could consume excessive storage in the cloud or be lost. This is because data loss can happen in any infrastructure, no matter what highly reliable measures the cloud service providers take. Some recent data loss incidents are the Sidekick Cloud Disaster in 2009 and the breakdown of Amazon's Elastic Compute Cloud (EC2) in 2010. Sometimes the cloud service providers may be dishonest: they may discard data that has not been accessed, or has been rarely accessed, to save storage space, or keep fewer replicas than promised. Moreover, the cloud service providers may choose to hide data loss and claim that the data are still correctly stored in the Cloud. As a result, data owners need to be convinced that their data are correctly stored in the Cloud.

Checking on retrieval is a common method for checking data integrity: data owners check the integrity when accessing their data. This method has been used in peer-to-peer storage systems, network file systems, long-term archives, web-service object stores, and database systems. However, checking on retrieval is not sufficient to verify the integrity of all the data stored in the Cloud. There is usually a large amount of data stored in the Cloud, but only a small percentage is frequently accessed, and there is no guarantee for the data that are rarely accessed. An improved method was proposed that generates virtual retrievals to check the integrity of rarely accessed data, but this causes heavy I/O overhead on the cloud servers and high communication cost due to the data retrieval operations.

Therefore, it is desirable to have a storage auditing service to assure data owners that their data are correctly stored in the Cloud. Data owners are not willing to perform such auditing themselves due to the heavy overhead and cost. In fact, it is not fair to let either side, the cloud service providers or the data owners, conduct the auditing, because neither of them can be guaranteed to provide an unbiased and honest auditing result. Data storage auditing is also a very resource-demanding operation in terms of computational resources, memory space, and communication cost.

3. SURVEYS

3.1 "Data storage auditing service in cloud computing: challenges, methods and opportunities"
In this survey the authors state that cloud computing is a promising computing model that enables convenient and on-demand network access to a shared pool of configurable computing resources. The first offered cloud service is moving data into the cloud: data owners let cloud service providers host their data on cloud servers, and data consumers can access the data from the cloud servers. This new paradigm of data storage service also introduces new security challenges, because data owners and data servers have different identities and different business interests. Therefore, an independent auditing service is required to make sure that the data is correctly hosted in the Cloud. In this survey paper, they investigate this problem and give an extensive survey of storage auditing methods in the literature. First, they give a set of requirements for the auditing protocol for data storage in cloud computing. Then, they introduce some existing auditing schemes and analyze them in terms of security and performance. Finally, some challenging issues in the design of an efficient auditing protocol for data storage in cloud computing are introduced.

3.2 "Efficient Public Integrity Checking for Cloud Data Sharing with Multi-User Modification"
In past years a body of data integrity checking techniques has been proposed for securing cloud data services. Most of these works assume that only the data owner can modify cloud-stored data. Recently a few attempts started considering more realistic scenarios by allowing multiple cloud users to
modify data with integrity assurance. However, these attempts are still far from practical due to the tremendous computational cost on cloud users. Moreover, collusion between misbehaving cloud servers and revoked users is not considered. This paper proposes a novel data integrity checking scheme characterized by multi-user modification, collusion resistance, and a constant computational cost of integrity checking for cloud users, based on a novel design of polynomial-based authentication tags and proxy tag update techniques. The scheme also supports public checking and efficient user revocation and is provably secure. Numerical analysis and extensive experimental results show the efficiency and scalability of the proposed scheme.

3.3 "Hybrid Job-Driven Meta Data Scheduling for BigData with MapReduce Clusters and Internet Approach"
It is cost-efficient for a tenant with a limited budget to establish a virtual MapReduce cluster by renting multiple virtual private servers (VPSs) from a VPS provider. To provide an appropriate scheduling scheme for this type of computing environment, the authors propose a hybrid job-driven scheduling scheme (JoSS for short) from a tenant's perspective. JoSS provides not only job-level scheduling, but also map-task-level and reduce-task-level scheduling. JoSS classifies MapReduce jobs based on job scale and job type and designs an appropriate scheduling policy to schedule each class of jobs. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of JoSS are further introduced to separately achieve a better map-data locality and a faster task assignment. Extensive experiments evaluate and compare the two variations with current scheduling algorithms supported by Hadoop. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant overhead. In addition, the two variations are separately suitable for different MapReduce workload scenarios and provide the best job performance among all tested algorithms.

3.4 "Hybrid Job-Driven Scheduling for Virtual MapReduce Clusters"
It is cost-efficient for a tenant with a limited budget to establish a virtual MapReduce cluster by renting multiple virtual private servers (VPSs) from a VPS provider. To provide an appropriate scheduling scheme for this type of computing environment, the authors propose a hybrid job-driven scheduling scheme (JoSS for short) from a tenant's perspective. JoSS provides not only job-level scheduling, but also map-task-level and reduce-task-level scheduling. JoSS classifies MapReduce jobs based on job scale and job type and designs an appropriate scheduling policy to schedule each class of jobs. The survey's goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of JoSS are further introduced to separately achieve a better map-data locality and a faster task assignment. Extensive experiments evaluate and compare the two variations with current scheduling algorithms supported by Hadoop. The results show that the two variations outperform the other tested algorithms in terms of map-data locality, reduce-data locality, and network overhead without incurring significant overhead. In addition, the two variations are separately suitable for different MapReduce workload scenarios and provide the best job performance among all tested algorithms.

3.5 "A new approach to Internet host mobility," ACM Computer Communication
This paper describes a new approach to Internet host mobility. The authors argue that by separating local and wide-area mobility, the performance of existing mobile host protocols (e.g., Mobile IP) can be significantly improved. They propose Cellular IP, a new lightweight and robust protocol that is optimized to support local mobility but efficiently interworks with Mobile IP to provide wide-area mobility support. Cellular IP shows great benefit in
comparison to existing host mobility proposals for environments where mobile hosts migrate frequently, which, the authors argue, will be the rule rather than the exception as Internet wireless access becomes ubiquitous. Cellular IP maintains distributed caches for location management and routing purposes. A distributed paging cache coarsely maintains the position of 'idle' mobile hosts in a service area. Cellular IP uses this paging cache to quickly and efficiently pinpoint 'idle' mobile hosts that wish to engage in 'active' communications. This approach is beneficial because it can accommodate a large number of users attached to the network without overloading the location management system. A distributed routing cache maintains the position of active mobile hosts in the service area and dynamically refreshes the routing state in response to the handoff of active mobile hosts. These distributed location management and routing algorithms lend themselves to a simple and low-cost implementation of Internet host mobility, requiring no new packet formats, encapsulations, or address space allocation beyond what is present in IP.
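The split between the two caches described above can be mimicked with a small model: a coarse paging cache for idle hosts and a routing cache that is refreshed on each handoff of an active host. The class and method names below are invented for this sketch; real Cellular IP state is distributed, per-node soft state with timeouts, not a single in-memory object.

```python
class CellularIPNode:
    """Toy model of Cellular IP's two location caches (illustrative only)."""

    def __init__(self):
        self.paging_cache = {}    # idle mobile host -> coarse service area
        self.routing_cache = {}   # active mobile host -> current base station

    def register_idle(self, host: str, area: str) -> None:
        # Idle hosts are tracked only coarsely, so exact routing state is dropped.
        self.paging_cache[host] = area
        self.routing_cache.pop(host, None)

    def handoff(self, host: str, base_station: str) -> None:
        # An active host moved: refresh the routing state immediately.
        self.routing_cache[host] = base_station

    def locate(self, host: str):
        """Exact route for active hosts; otherwise page the coarse area."""
        if host in self.routing_cache:
            return ("route", self.routing_cache[host])
        return ("page", self.paging_cache[host])


node = CellularIPNode()
node.register_idle("mh-1", "area-3")
print(node.locate("mh-1"))    # ('page', 'area-3') -- paged within its area
node.handoff("mh-1", "bs-7")  # host becomes active; routing cache refreshed
print(node.locate("mh-1"))    # ('route', 'bs-7') -- exact forwarding state
```

The design point this illustrates is the one the paper makes: idle hosts never consume per-hop routing state, so a large user population can attach to the network without overloading the location management system.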
4. CONCLUSION
In this paper, we investigate the auditing problem for data storage in cloud computing and propose a set of requirements for designing third-party auditing protocols. We apply a two-phase job scheduling process, which helps to compare all types of processing issues in the storage system with respect to data and time. Finally, we introduce some challenging issues in the design of efficient auditing protocols for data storage in cloud computing.
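The independent-auditing idea that runs through this paper, a third party spot-checking that the cloud still correctly hosts the data without retrieving the whole file, can be sketched as a toy challenge-response protocol. This is a simplified illustration with invented names (Owner, CloudServer, audit), not any surveyed scheme; practical protocols replace per-block hashes with homomorphic authenticators so the auditor keeps only constant state.

```python
import hashlib
import random

BLOCK_SIZE = 4  # deliberately tiny blocks, for illustration only

def split_blocks(data: bytes) -> list:
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

class Owner:
    """Data owner: keeps one salted hash per block before uploading."""
    def __init__(self, data: bytes):
        self.salt = b"demo-salt"
        self.tags = [hashlib.sha256(self.salt + blk).digest()
                     for blk in split_blocks(data)]

class CloudServer:
    """Cloud server: stores the blocks and answers block challenges."""
    def __init__(self, data: bytes):
        self.blocks = split_blocks(data)

    def respond(self, index: int) -> bytes:
        return self.blocks[index]

def audit(owner: Owner, server: CloudServer, challenges: int) -> bool:
    """Independent auditor: spot-checks randomly chosen blocks instead of
    retrieving the whole file (avoiding checking-on-retrieval overhead)."""
    for index in random.sample(range(len(owner.tags)), challenges):
        if hashlib.sha256(owner.salt + server.respond(index)).digest() \
                != owner.tags[index]:
            return False  # server no longer holds this block intact
    return True

data = b"file hosted in the cloud"
owner, server = Owner(data), CloudServer(data)
print(audit(owner, server, challenges=3))                # honest server passes
server.blocks[0] = b"XXXX"                               # silent data loss
print(audit(owner, server, challenges=len(owner.tags)))  # full check fails
```

Random sampling is what makes auditing cheap: if a fraction of blocks is corrupted, the chance of escaping detection shrinks exponentially with the number of challenged blocks, so each audit round only needs a small random subset rather than the whole file.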