LITERATURE SURVEY

Introduction:
With the significant advances in Information and Communications Technology (ICT), computing is being transformed into a model consisting of services that are commoditized and delivered in a manner similar to traditional utilities such as water, electricity, gas, and telephony. In such a model, users access services based on their requirements, without regard to where the services are hosted or how they are delivered. Several computing paradigms have promised to deliver this utility computing vision, including cluster computing, Grid computing, and now Cloud computing. At present, it is common to access content across the Internet without reference to the underlying hosting infrastructure. This infrastructure consists of data centers that are monitored and maintained around the clock by content providers. Cloud computing extends this paradigm: the capabilities of business applications are exposed as sophisticated services that can be accessed over a network. Cloud service providers are incentivized by the profits to be made by charging consumers for access to these services. Consumers, such as enterprises, are attracted by the opportunity to reduce or eliminate the costs associated with "in-house" provision of these services. However, since cloud applications may be crucial to the consumers' core business operations, it is essential that consumers obtain guarantees on service delivery from providers. Typically, these are provided through Service Level Agreements (SLAs) brokered between providers and consumers.

CLUSTER: A cluster is a type of parallel and distributed system which consists of a collection of inter-connected stand-alone computers working together as a single integrated computing resource.

GRID: A Grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed 'autonomous' resources dynamically at runtime, depending on their availability, capability, performance, cost, and users' quality-of-service requirements.

CLOUD: A Cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resource(s), based on service-level agreements established through negotiation between the service provider and consumers.

A set of characteristics that helps distinguish cluster, Grid, and Cloud computing systems is listed in Table 1 of [imp3]. The resources in clusters are located in a single administrative domain and managed by a single entity, whereas in Grid systems resources are geographically distributed across multiple administrative domains, each with its own management policies and goals. Another key difference between cluster and Grid systems arises from the way application scheduling is performed. The schedulers in cluster systems focus on enhancing the overall system performance and utility, as they are responsible for the whole system. The schedulers in Grid systems, called resource brokers, instead focus on enhancing the performance of a specific application in such a way that its end-users' QoS requirements are met. Cloud computing platforms possess characteristics of both clusters and Grids, along with their own special attributes and capabilities, such as strong support for virtualization, dynamically composable services with Web Service interfaces, and strong support for creating third-party, value-added services by building on Cloud compute, storage, and application services. Thus, Clouds promise to provide services to users without reference to the infrastructure on which these services are hosted.

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.

On-demand self-service. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed, automatically, without requiring human interaction with each service provider.

Broad network access. Capabilities are available over the network.

Resource pooling. The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control over or knowledge of the exact location of the provided resources, but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, network bandwidth, and virtual machines.

Rapid elasticity. Capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out, and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.

Measured service. Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and the consumer of the utilized service.
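As a minimal sketch of how such metering feeds a usage-based bill, the following Python snippet aggregates hypothetical meter readings into a monthly charge. The resource names and unit rates are illustrative assumptions, not taken from any particular provider.

```python
# Illustrative sketch of usage-based billing from metered readings.
# Resource names and unit rates are hypothetical examples.

RATES = {
    "compute_hours": 0.10,     # $ per VM-hour (assumed)
    "storage_gb_month": 0.02,  # $ per GB-month (assumed)
    "egress_gb": 0.05,         # $ per GB transferred out (assumed)
}

def monthly_bill(meter_readings):
    """Sum metered usage * unit rate for each resource type."""
    total = 0.0
    for resource, units in meter_readings.items():
        total += units * RATES[resource]
    return total

usage = {"compute_hours": 720, "storage_gb_month": 500, "egress_gb": 40}
print(f"Monthly charge: ${monthly_bill(usage):.2f}")  # 72 + 10 + 2 = $84.00
```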

Service Models:

Cloud Software as a Service (SaaS). The capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin-client interface such as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

Cloud Platform as a Service (PaaS). The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, or storage, but has control over the deployed applications and possibly the application hosting environment configurations.

Cloud Infrastructure as a Service (IaaS). The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources on which the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure, but has control over operating systems, storage, and deployed applications, and possibly limited control of select networking components (e.g., host firewalls).

Deployment Models:

Private cloud. The cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on premise or off premise.

Community cloud. The cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be managed by the organizations or a third party and may exist on premise or off premise.

Public cloud. The cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.

Hybrid cloud. The cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

Cloud computing has particular characteristics that distinguish it from classical resource and service provisioning environments:
- Infinitely (more or less) scalable
- Cost saving / less capital expenditure
- Higher resource utilization
- Business agility
- Disaster recovery and backup
- Device and location independence

At the same time, there are various concerns that caution against the adoption of cloud computing.

Security
Security issues have played the most important role in hindering the acceptance of cloud computing. The security issues possible in cloud computing include availability, integrity, confidentiality, data access, data segregation, privacy, recovery, accountability, multi-tenancy issues, and so on. Solutions to these issues range over cryptography, particularly public key infrastructure (PKI), the use of multiple cloud providers, standardization of APIs, improved virtual machine support, and legal support (a minimal client-side encryption sketch appears at the end of this section).

Difficulty of migration
It is not easy to move applications from an enterprise to a cloud computing environment, or even between different cloud computing platforms, because different cloud providers support different application architectures, which also differ from enterprise application architectures [12].

Internet dependency, performance, and availability
Cloud computing services rely fully on the availability, speed, quality, and performance of the Internet, since it acts as the carrier between consumer and service provider.

Downtime and service level
In business applications, downtime is a common concern, because every minute of downtime is a minute in which important business operations cannot be performed, degrading both the performance and the reputation of the organization.

In a cloud computing environment, the traditional role of the service provider is divided in two: infrastructure providers, who manage cloud platforms and lease resources according to a usage-based pricing model, and service providers, who rent resources from one or many infrastructure providers to serve the end users. The emergence of cloud computing has made a tremendous impact on the Information Technology (IT) industry over the past few years: large companies such as Google, Amazon, and Microsoft strive to provide more powerful, reliable, and cost-efficient cloud platforms, while business enterprises seek to reshape their business models to benefit from this new paradigm. Indeed, cloud computing provides several compelling features that make it attractive to business owners.
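As one illustration of the cryptographic mitigations mentioned in the Security discussion above, the sketch below encrypts data on the client before it ever reaches a cloud store, so the provider only ever sees ciphertext. It is a minimal example assuming the third-party Python `cryptography` package; the `cloud_store.upload` step is a hypothetical placeholder, not a real provider API.

```python
# Minimal sketch: client-side symmetric encryption before cloud upload.
# Assumes the third-party 'cryptography' package; 'cloud_store' is hypothetical.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # kept by the consumer, never sent to the provider
cipher = Fernet(key)

plaintext = b"sensitive business record"
ciphertext = cipher.encrypt(plaintext)

# cloud_store.upload("records/1", ciphertext)  # hypothetical upload call:
# the provider stores only ciphertext; the consumer decrypts after download.
assert cipher.decrypt(ciphertext) == plaintext
```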

Architectural design of data centers

A data center, which is home to the computation power and storage, is central to cloud computing and contains thousands of devices such as servers, switches, and routers. Proper planning of this network architecture is critical, as it heavily influences application performance and throughput in such a distributed computing environment. Further, scalability and resiliency features need to be carefully considered. Currently, a layered approach is the basic foundation of network architecture design, and it has been tested in some of the largest deployed data centers. The basic layers of a data center consist of the core, aggregation, and access layers, as shown in Fig. 3. The access layer is where the servers in racks physically connect to the network. There are typically 20 to 40 servers per rack, each connected to an access switch with a 1 Gbps link. Access switches usually connect to two aggregation switches for redundancy with 10 Gbps links (a fully loaded rack of 40 servers, for example, shares 2 x 10 Gbps of uplink capacity against 40 x 1 Gbps of server bandwidth, a 2:1 oversubscription ratio).

Distributed file system over clouds

Google File System (GFS) is a proprietary distributed file system developed by Google and specially designed to provide efficient, reliable access to data using large clusters of commodity servers. Files are divided into chunks of 64 megabytes and are usually appended to or read; they are only extremely rarely overwritten or shrunk. Compared with traditional file systems, GFS is designed and optimized to run on data centers, to provide extremely high data throughput and low latency, and to survive individual server failures. Inspired by GFS, the open source Hadoop Distributed File System (HDFS) stores large files across multiple machines. It achieves reliability by replicating the data across multiple servers. Similarly to GFS, data is stored on multiple geo-diverse nodes. The file system is built from a cluster of data nodes, each of which serves blocks of data over the network using a block protocol specific to HDFS. Data is also available over HTTP, allowing access to all content from a web browser or other types of client. Data nodes can talk to each other to rebalance data distribution, to move copies around, and to keep the replication of data high.
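To make the chunking-plus-replication idea concrete, the following sketch splits a file into fixed-size chunks and assigns each chunk to several distinct data nodes. It illustrates the general scheme only: the 64 MB chunk size and replication factor of 3 follow common GFS/HDFS defaults, but the node names and round-robin placement policy are simplifying assumptions, not the real HDFS placement logic or API.

```python
# Illustrative sketch of GFS/HDFS-style chunking and replica placement.
# Chunk size (64 MB) matches the text; the round-robin placement policy
# is a simplifying assumption, not the real HDFS policy.
import itertools

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as in GFS
REPLICATION = 3                # common default replication factor

def place_chunks(file_size, nodes):
    """Map each chunk index to REPLICATION distinct data nodes."""
    n_chunks = (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE  # ceiling division
    placement = {}
    ring = itertools.cycle(range(len(nodes)))
    for chunk in range(n_chunks):
        start = next(ring)
        replicas = [nodes[(start + i) % len(nodes)] for i in range(REPLICATION)]
        placement[chunk] = replicas
    return placement

nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(place_chunks(200 * 1024 * 1024, nodes))  # a 200 MB file -> 4 chunks
```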

Distributed application framework over clouds

HTTP-based applications usually conform to some web application framework such as Java EE. In modern data center environments, clusters of servers are also used for computation- and data-intensive jobs such as financial trend analysis or film animation. MapReduce is a software framework introduced by Google to support distributed computing on large data sets across clusters of computers. MapReduce consists of one Master, to which client applications submit MapReduce jobs. The Master pushes work out to available task nodes in the data center, striving to keep the tasks as close to the data as possible. The Master knows which node contains the data and which other hosts are nearby. If the task cannot be hosted on the node where the data is stored, priority is given to nodes in the same rack. In this way, network traffic on the main backbone is reduced, which also helps to improve throughput, as the backbone is usually the bottleneck. If a task fails or times out, it is rescheduled. If the Master fails, all ongoing tasks are lost; the Master therefore records what it is up to in the file system, and when it starts up it looks for any such data so that it can restart work from where it left off. The open source Hadoop MapReduce project is inspired by Google's work, and currently many organizations are using Hadoop MapReduce to run large data-intensive computations.

Cloud computing, a business model built on a pool of resources, provides an effective paradigm for this purpose. In this context, cloud computing is a distributed computing paradigm that enables large datasets to be sliced and assigned to available computing nodes where the data can be processed locally, avoiding network-transfer delays. This makes it possible for people to understand and utilize the trillions of rows of information in a data center. We strongly believe that cloud computing can be an effective platform for data mining. The association-rule-based algorithm Apriori can be used as an example of how data mining algorithms can be adjusted to fit the growing demand for the parallel computing environment of the cloud. Association rule mining aims to extract interesting correlations, patterns, and associations among sets of items in a transaction database or other data repository, and the Apriori algorithm is the most widely used algorithm for this task. The input data of Apriori is usually quite large and distributed in nature, so a cloud could be an ideal platform for this algorithm. However, the classical Apriori algorithm was not designed for the parallel environment of the cloud, because its iterative approach to computing the frequent itemsets causes repeated scans of the disk; this high I/O overhead makes running the algorithm in clouds impractical. Although some parallelized association rule algorithms exist, implementing them is difficult because programmers have to deal with the challenges of process communication and synchronization, and it is especially difficult to handle the failures that occur frequently in data-intensive computing. In our work, we have not only revised Apriori into the MapReduce format, but also improved its parallel performance at all stages.
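As a rough illustration of what casting one Apriori pass into the MapReduce format looks like, the sketch below counts candidate itemsets with a map function that emits (itemset, 1) pairs per transaction and a reduce function that sums the counts and applies the minimum-support threshold. The map/shuffle/reduce driver is simulated in-process in plain Python; this is a conceptual sketch of the general technique, not the actual revised algorithm described above and not the Hadoop API.

```python
# Conceptual sketch: one candidate-counting pass of Apriori in MapReduce style.
# The in-process 'map'/'shuffle'/'reduce' driver stands in for a real cluster.
from itertools import combinations
from collections import defaultdict

def map_phase(transaction, k):
    """Emit (candidate k-itemset, 1) for every k-subset of the transaction."""
    for itemset in combinations(sorted(transaction), k):
        yield itemset, 1

def reduce_phase(itemset, counts, min_support):
    """Sum the counts for one itemset; keep it only if frequent."""
    total = sum(counts)
    return (itemset, total) if total >= min_support else None

transactions = [{"bread", "milk"}, {"bread", "butter", "milk"}, {"milk", "butter"}]
k, min_support = 2, 2

groups = defaultdict(list)               # the shuffle step: group values by key
for t in transactions:
    for key, value in map_phase(t, k):
        groups[key].append(value)

frequent = [r for key, vals in groups.items()
            if (r := reduce_phase(key, vals, min_support))]
print(frequent)  # [(('bread', 'milk'), 2), (('butter', 'milk'), 2)]
```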
