
National Level Conference On Information Processing 2011

CONTENT DISTRIBUTION NETWORKS


Trupti V.G [1], Rekha V [2]
[1] M.Tech (CE), 1st sem, [2] Asst. Prof., Dept. of Computer Science & Engineering, SJB Institute of Technology, Bangalore
[1] truptivg.nayak@gmail.com, [2] rekha.vgs@yahoo.com

Abstract- This paper introduces Content Delivery Networks (CDNs) and shows how traffic is reduced by distributing content across several surrogate servers, making it highly accessible and available. Success in Internet applications involves user interactions whose quality is mainly affected by application response time. CDNs have recently emerged as a distributed solution for serving content faster than contacting a centralized server. The paper also specifies the CDN architecture, in which the client, the origin server and the surrogate servers form the CDN topology: a set of surrogate servers (distributed around the world) that cache the origin server's content; routers and network elements that deliver content requests to the optimal location and the optimal surrogate server; and an accounting mechanism that provides logs and information to the origin servers. Under a CDN, the client-server communication is replaced by two communication flows: one between the client and the surrogate server, and another between the surrogate server and the origin server. This split into two flows reduces congestion (particularly at popular servers) and increases content distribution and availability. This paper describes a CDN from a different point of view, paying particular attention to the implementation process. More specifically, CDNs maintain multiple Points of Presence (PoPs) with clusters of (so-called surrogate) servers that store copies of identical content, such that user requests are satisfied by the most appropriate site.

Keywords- surrogate servers, Multicast, Computer Networking

I INTRODUCTION

With the growth of the World Wide Web, traffic increased at many popular websites, i.e.
the number of client requests to be serviced by the server grew. These websites have a motivation to provide better services, and CDNs came into existence to provide them. A Content Delivery Network (CDN) is an overlay network on top of the Internet which pushes content closer to end users. This is achieved by strategically placing servers, called surrogates, next to these users and serving them the desired content. The surrogates typically act as intelligent and transparent proxy caches that retrieve content from the origin server before responding. As the origin server is accessed less, backbone traffic is reduced and network bandwidth is used efficiently. Besides, load can be balanced among the servers. Work on CDNs has primarily focused on techniques for efficiently redirecting user requests to appropriate surrogates, to reduce request latency and balance load, and on placement strategies that position server replicas so as to achieve better performance. Many CDN service providers, such as Akamai, offer overview whitepapers but keep the real implementation a trade secret and a fundamental key of their business success. This paper tries to provide some implementation hints for establishing the basis of a CDN. It also presents a CDN model architecture with a description of the main building blocks from an implementation point of view. A CDN attempts to reduce network latency by avoiding congested paths. CDNs maintain multiple Points of Presence (PoPs) with clusters of (so-called surrogate) servers that store copies of identical content, such that user requests are satisfied by the most appropriate site.

II INSIGHT AND PERSPECTIVES OF CDNs

A Content Delivery Network (CDN) is an overlay network on top of the Internet which pushes content closer to end users. It is achieved by strategically placing servers, called surrogates, next to these users and serving them the desired content.
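As a rough illustration of the surrogate-as-proxy-cache behaviour described above, the sketch below caches origin responses so that repeated requests never reach the origin server. The `fetch_origin` callback and the TTL value are assumptions made for this example; they are not part of any real CDN interface.

```python
import time

class SurrogateCache:
    """Minimal sketch of a surrogate acting as a transparent proxy cache."""

    def __init__(self, fetch_origin, ttl_seconds=300.0):
        self.fetch_origin = fetch_origin  # stands in for an HTTP request to the origin
        self.ttl = ttl_seconds
        self._cache = {}                  # url -> (content, expiry timestamp)

    def get(self, url):
        entry = self._cache.get(url)
        if entry is not None and entry[1] > time.time():
            return entry[0]                       # hit: the origin is not contacted
        content = self.fetch_origin(url)          # miss: pull from the origin server
        self._cache[url] = (content, time.time() + self.ttl)
        return content
```

Because only cache misses reach `fetch_origin`, backbone traffic toward the origin shrinks as the cache warms up, which is the load-reduction effect the paragraph above describes.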
A. CDN Topology

A CDN topology involves: a set of surrogate servers (distributed around the world) that cache the origin server's content; routers and network elements that deliver content requests to the optimal location and the optimal surrogate server; and an accounting mechanism that provides logs and information to the origin servers. Under a CDN, the client-server communication is replaced by two communication flows: one between the client and the surrogate server, and another between the surrogate server and the origin server. To maintain (worldwide) distributed copies of identical content, the practice for a CDN is to locate its surrogate servers within strategic data centers (relying on multiple network

Dept of ISE, AMCEC

providers), over a globally distributed infrastructure. This split into two communication flows reduces congestion (particularly at popular servers) and increases content distribution and availability. Among the content providers, Akamai Technologies, the leading one, has more than 20000 servers over 10000 networks; Inktomi, a Yahoo company, provides services for load balancing and streaming media.

B. Content Delivery Networks Worldwide

Organizations offering content to a geographically distributed and potentially large audience (such as the Web) are attracted to CDNs, and the trend for them is to sign a contract with a CDN provider and offer their site's content over this CDN. CDNs are widely used in the Web community, but a fundamental problem is that the costs involved are quite high.

Fig 1. Content Delivery Networks Overview

C. Issues involved in CDNs

Surrogate Server Placement: Choosing the best location for each surrogate server is important for any CDN infrastructure, since the location of the surrogate servers affects key issues in the content delivery process. Determining the best network locations for CDN surrogate servers (known as the Web server replica placement problem) is critical for content outsourcing performance and the overall content distribution process. The CDN topology is built such that client-perceived performance is maximized and the infrastructure's cost is minimized. Effective surrogate server placement may therefore reduce the number of surrogate servers needed and the size of the content replicated on them, in an effort to combine high quality of service with low CDN prices. In this context, several placement algorithms have been proposed, such as Greedy, which incrementally places replicas; Hot Spot, which places replicas near the clients generating the greatest load; and tree-based replica placement. These algorithms specify the locations of the surrogate servers in order to achieve improved performance at low infrastructure cost. Earlier experimentation has shown that the greedy placement strategy can yield close-to-optimal performance.

Content Selection: The choice of the content that should be outsourced in order to meet customers' needs is another important issue, the content selection problem. An obvious choice is to outsource the entire set of the origin server's objects to the surrogate servers (so-called entire replication). The greatest advantage of entire replication is its simplicity; however, such a solution is not feasible or practical because, although disk prices are continuously dropping, the sizes of Web objects increase as well (such as audio or video on demand). Moreover, the problem of updating such a huge collection of Web objects is unmanageable. Therefore, the challenge of the content selection problem is to find a sophisticated management strategy for the replication of Web content. A typical way is to group Web content based on either correlation or access frequency and then replicate objects in units of content clusters. Two types of content clustering have been proposed:

Users' sessions-based: The content of the Web log files is exploited in order to group together sets of users' navigation sessions showing similar characteristics. Clustering users' sessions is useful for discovering both groups of users exhibiting similar browsing patterns and groups of pages having related content, based on how often URL references occur together across sessions.

URL-based: Web content is clustered using the Web site topology (considered as a directed graph), where Web pages are vertices and hyperlinks are arcs. The Web pages (URLs) are clustered by eliminating arcs between dissimilar pages. The most popular objects from a Web site are identified (the so-called hot data) and replicated in units of clusters, where the correlation distance between every pair of URLs is based on a certain correlation metric. Furthermore, several coarse-grain dynamic replication schemes are used. By using these replication schemes, the performance of Web services can be significantly improved.

D. CDN Pricing

Commercial-oriented Web sites turn to CDNs to contend with high traffic while providing high data quality and increased security for their clients, in order to increase their
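The greedy placement strategy mentioned above can be sketched as follows. This is an illustrative reconstruction under the usual formulation (minimise the total cost of serving every client from its nearest replica), assuming a known client-to-site cost matrix `distance[c][s]`; it is not the exact algorithm evaluated in the literature.

```python
def greedy_placement(distance, num_replicas):
    """Greedy replica placement sketch.

    distance[c][s] is the cost (e.g. measured latency) from client c to
    candidate site s. Each round places one replica at the site that most
    reduces the total cost over all clients.
    """
    clients = range(len(distance))
    sites = range(len(distance[0]))
    chosen = []
    best = [float("inf")] * len(distance)  # best[c]: cost for client c so far

    for _ in range(num_replicas):
        def total_with(s):
            # total cost if site s were added to the current placement
            return sum(min(best[c], distance[c][s]) for c in clients)
        s_star = min((s for s in sites if s not in chosen), key=total_with)
        chosen.append(s_star)
        for c in clients:
            best[c] = min(best[c], distance[c][s_star])
    return chosen
```

Each iteration is locally optimal; as the text notes, this simple heuristic has been observed to come close to the optimal placement in practice.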

profit and popularity. CDN providers charge their customers (owners of Web sites) according to the traffic delivered by the surrogate servers to the clients. The most indicative factors affecting the pricing of CDN services include: bandwidth cost; variation in traffic distribution; size of the content replicated over surrogate servers; number of surrogate servers; reliability and stability of the whole system; and security issues of outsourcing content delivery.

E. Meeting CDN User Preferences

Meeting user preferences is crucial for CDNs, which adopt a content management task by which the content is personalized to meet the specific needs of each individual user. User preferences are learnt from Web usage data by using data mining techniques. Some indicative objectives of content personalization over CDNs are: deliver the appropriate content to the interested users in a timely, scalable, and cost-effective manner; increase the quality of the published content by ensuring it is accurate, authorized, updated, easily searched and retrieved, as well as personalized for various users and audiences; manage the content throughout its entire life cycle, from creation, acquisition, or migration to publication and retirement; and meet security requirements, since introducing content personalization on CDNs raises issues such as authentication, signing, encryption, access control, auditing, and resource control for ensuring content security and users' privacy.

III. CONTENT DISTRIBUTION ARCHITECTURE

The architecture comprises six basic elements. The relationships between the blocks are as follows. The origin server delegates its URI namespace to the request routing system (1), and publishes content (2) to be distributed to the remote surrogates (3) by the distribution system. The client requests content from what he perceives to be the origin server, but his request is treated by the request routing system (4), which redirects him to the optimum surrogate server (5). The surrogate servers periodically send information to the accounting system (6), which summarizes it into detailed statistics and sends it as feedback to the origin server (7) and the request routing system.

Fig 2. General Architecture of CDN

A. Sequence of actions taken during a content transaction

1. The client connects to a portal, e.g. www.portal.com, through a web browser. A portal consists of a set of surrogates that together build a CDN.
2. The request is processed by the authoritative DNS server, which is responsible for mapping the name www.portal.com into at least one IP address. This is the best point at which to introduce the Request Routing System, and is the one mostly used by current CDN companies. In fact, the DNS server is nothing but an interface: another process, call it the Redirector, is the one in charge of determining the optimal surrogate. The Redirector is mainly composed of an algorithm that accepts input parameters and produces a response, typically a list of IP addresses.
3. The choice of an appropriate server depends on client proximity, server overhead and network congestion. Server overhead and network congestion imply some type of continuous monitoring of the system, for example through SNMP. This is addressed by another process, say the SNMP Monitor, responsible for capturing periodic information about the servers and the network.
4. The client retrieves a list of IP addresses ordered decreasingly by the performance estimated by the Redirector process.
5. Once the client enters the portal through one of the surrogates, it has to select a content item. This content is typically in a multimedia format and is delivered as a stream by a media server, so we need both a web server and a media server.
6. Once the desired content is selected by the user, a new resolution phase is needed, as the selected content supposes a new input
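The Redirector's response (a list of surrogate IP addresses in decreasing order of estimated performance) can be sketched as below. The scoring formula, the field names and the overload threshold are illustrative assumptions for this example, not values taken from the paper.

```python
def rank_surrogates(candidates):
    """Sketch of the Redirector's ranking step.

    candidates maps each surrogate IP to measurements assumed to come
    from the monitoring process: 'rtt_ms' (client proximity) and
    'load' (server utilisation in [0.0, 1.0]).
    """
    MAX_LOAD = 0.9  # assumed limit: overloaded surrogates are excluded

    def score(ip):
        m = candidates[ip]
        # lower is better: proximity plus a penalty for server load
        return m["rtt_ms"] + 200.0 * m["load"]

    usable = [ip for ip in candidates if candidates[ip]["load"] <= MAX_LOAD]
    return sorted(usable, key=score)
```

The client would try the first address in the returned list and fall back to the next ones on failure, which matches the decreasingly ordered list described in step 4.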

parameter. It is also important to note that the target web surrogates could be different from the target media surrogates. The resolution phase takes place at HTTP level, with the first contacted surrogate acting as the interface.
7. In order to distribute the content in a streamed multimedia format, some kind of plug-in is required inside the browser, such as RealPlayer, QuickTime or, in an open way, a simple Java applet. This plug-in connects to the media server in order to retrieve the content.

B. A Look at the Components

Some of the components of the architecture are described below.

DNS server: The function of our DNS is simply to map CDN site names into CDN identifiers. Once a client request for a certain website arrives at the DNS server, it filters it depending on the content: if the site is associated with a certain CDN, the DNS server obtains the corresponding CDN identifier and resends the request to the Redirector module. Otherwise, the request is forwarded to a local DNS server following the hierarchical DNS operation.

Redirector module: The Redirector is a key process of the whole system, as it is the one in charge of deciding an adequate surrogate for each client request. There are two different, though similar, functional modes, related to the number of input parameters that the included algorithm supports. In the first mode, which takes place at the DNS resolution phase, the Redirector module retrieves the CDN identifier and a client IP address. The latter parameter (the IP address) is unnecessary at this stage if only scalability is targeted. The second mode takes place after the client has selected the content. This time the surrogate that is serving him has to interact in the background with the Redirector module to retrieve an optimal surrogate for serving this content; the client IP address is a key parameter in the selection strategy. It is also important to serve content from a nearby surrogate in order to obtain a low response time; therefore, client proximity is estimated and taken into account. If the CDN environment remains local (iCDN) and the number of surrogates is not considerable, a simple way of calculating proximity consists of sending pings from each surrogate to the client. If several surrogates are similarly loaded within a time interval, each of them will serve a client request with the same probability; if a surrogate is overloaded above an established limit, it will not be considered in the algorithm.

SNMP Monitor: The SNMP Monitor captures status information from the surrogates. This information is of two types: on the one hand, the monitor stores data about available resources in each portal or surrogate (memory, CPU utilization and number of connections); on the other hand, it tracks information about the network status between clients and portals. Whereas the first type of data is read periodically, the second type is requested asynchronously by the Redirector module each time a client issues a request to the CDN.

Portals: The surrogates or portals act as CDN entry points for the clients and are in charge of serving them the desired content. The portals store static content (web pages) and generate dynamic content. Once a portal receives a client request for streaming media content, it first interacts with the Redirector module to obtain an optimum surrogate IP address. After that, the portal generates an applet that contains a media player and sends it to the client, including the IP address of the optimum server. The client then initiates the applet and reproduces the multimedia content.

Fig 3. Data Exchange between the modules

CDN Manager: The CDN Manager is responsible for initializing all CDN parameter values, as well as managing how and where to store content according to a certain policy. That includes cache time control, content transfers between portals, content inclusion, content deletion, etc.

C. Database Design

Any system that stores and bases its behavior (at least partially) on stored data must
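The load-balancing rule described for the Redirector (similarly loaded surrogates are chosen with equal probability, overloaded ones are excluded) can be sketched as below. The `limit` and `band` thresholds are assumed values for illustration.

```python
import random

def pick_portal(loads, limit=0.9, band=0.05, rng=random):
    """Sketch of the Redirector's load rule.

    loads maps surrogate IP -> utilisation in [0.0, 1.0]. Surrogates
    whose load is within `band` of the least-loaded one are treated as
    equivalent and chosen uniformly at random; any surrogate above
    `limit` is excluded from the algorithm.
    """
    usable = {ip: l for ip, l in loads.items() if l <= limit}
    if not usable:
        return None  # no server available: the caller reports an error
    floor = min(usable.values())
    tied = [ip for ip, l in usable.items() if l - floor <= band]
    return rng.choice(tied)
```

Returning `None` when every surrogate is overloaded corresponds to the empty-list and error-message case mentioned in the data exchange description.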

include an effective design of its database structure. The database design is highly dependent on the content to be published. In the case of our CDN, there are several important databases associated with the different modules of the architecture. There is a global content database that includes three data tables:

table_lessons: includes reference information (the title of the lesson and the corresponding subject, faculty and teacher).
lessons_CDN: associates a lesson with a portal.
copies_lessons: indicates which surrogate has an available copy of a certain lesson.

The first two data tables of the content database are remotely replicated on each surrogate, so that each surrogate has local knowledge of the content available in the CDN. The SNMP Monitor has its own database to store all the information obtained by the SNMP agents, either periodically (CPU usage, used memory and connections) or asynchronously (pings and network hops). Note that ping mechanisms may pose a problem if a client sits behind a firewall that rejects ICMP messages. The redirection algorithm, as part of the Redirector module, also has its own database to store values of server load and server proximity.

D. Data Exchange between the Components

Good performance of a CDN depends significantly on the correct communication between the processes of the system. This communication takes place in the form of messages, whose exchange is illustrated in Fig. 3. Two different routes can be distinguished: a DNS resolution phase, which redirects a client to a portal using a load-balancing algorithm (4 steps); and a portal resolution phase, when the client has already entered a portal and is going to select a streamed multimedia content item from a list of available ones (7 steps). If no server is available, an empty list is sent and an error message is forwarded to the client.
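The three content tables named above can be sketched as follows, using SQLite purely for illustration (the paper does not name a DBMS); the column names beyond the table names are assumptions.

```python
import sqlite3

# In-memory sketch of the global content database described in the text.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE table_lessons (          -- reference information per lesson
    lesson_id INTEGER PRIMARY KEY,
    title     TEXT,
    subject   TEXT,
    faculty   TEXT,
    teacher   TEXT
);
CREATE TABLE lessons_CDN (            -- associates a lesson with a portal
    lesson_id INTEGER REFERENCES table_lessons(lesson_id),
    portal    TEXT
);
CREATE TABLE copies_lessons (         -- which surrogate holds a copy
    lesson_id INTEGER REFERENCES table_lessons(lesson_id),
    surrogate TEXT
);
""")

# Hypothetical sample data for the example.
db.execute("INSERT INTO table_lessons VALUES (1, 'Routing', 'Networks', 'CSE', 'Teacher A')")
db.execute("INSERT INTO copies_lessons VALUES (1, '10.0.0.3')")

# The kind of lookup the Redirector needs: which surrogates hold lesson 1?
rows = db.execute(
    "SELECT surrogate FROM copies_lessons WHERE lesson_id = 1").fetchall()
```

Since `table_lessons` and `lessons_CDN` are replicated on every surrogate, this lookup can be answered locally, matching the "local knowledge" property described above.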
Besides the messages that occur in a content transaction, there are additional ones related to management tasks, such as content transfers, cache control, etc.

IV. CONCLUSIONS

In this paper we described Content Delivery Networks: how content is distributed worldwide to provide better accessibility of data for clients. CDNs are still at an early stage of development and their future evolution remains an open issue. The challenge is to strike a delicate balance between costs and customer satisfaction. In this framework, caching-related practices, content personalization processes, and data mining techniques seem to offer an effective roadmap for the further evolution of CDNs. CDNs can deliver data to a multicast group, so that clients can join the group at any time when they need particular content by sending session messages. This technique saves network bandwidth and makes the system scalable.

V. FUTURE WORK

The client-server communication flow is replaced in a CDN by two communication flows, namely one between the origin server and the surrogate server and the other between the surrogate server and the client. Congestion control can thus be applied to the communication flow between the client and the surrogate server; since CDNs support streaming media content, congestion can occur there and can be reduced further.

REFERENCES

[1] Baruffa, G., Femminella, M., Frescura, F., Micanti, P., Parisi, A. and Reali, G., "Multicast Distribution of Digital Cinema", NEM Summit, September 2008.
[2] Byers, J. and Kwon, G., "STAIR: Practical AIMD Multirate Multicast Congestion Control", 3rd Int'l Workshop on Networked Group Communication, 2001, pp. 100-112.
[3] Floyd, S., Jacobson, V., Liu, C., McCanne, S. and Zhang, L., "A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing", IEEE/ACM Transactions on Networking, 1997, pp. 784-803.
[4] Pallis, G. and Vakali, A., "Insight and Perspectives for Content Delivery Networks", Communications of the ACM, 49(1), 2006, pp. 101-106.
[5] Molina, B., Palau, C., Esteve, M., Alonso, I. and Ruiz, V., "On content delivery network implementation", Computer Communications, vol. 29, no. 12, pp. 2396-2412, September 2006.
[6] Matrawy, A. and Lambadaris, I., "A Survey of Congestion Control Schemes for Multicast Video Applications", IEEE Communications Surveys & Tutorials, 2004, pp. 22-31.
[7] Akamai Technologies, www.akamai.com.
