Submitted To: Department of Information Technology G.H.Patel College of Engineering & Technology
Few Statistics
Google.com receives about 30 million hits a day. The eCommerce portal Amazon.com has thousands of transactions occurring at any given instant. Hotmail keeps track of about 2 million mail sessions at any given instant.
How are these sites able to stand against such a huge number of hits without ever going down?
Do they have supercomputers behind them to accomplish this task? Or is there a cluster of computers working in coordination to do the job?
High Availability
Minimum Response time
Content Security without unnecessary overheads
eCommerce
Computer Analogy
Use faster hardware, e.g. reduce the time per instruction (clock cycle)
Optimized algorithms and techniques
Multiple computers to solve the problem, that is, increase the number of instructions executed per clock cycle, i.e. cluster computing
What is a cluster?
A cluster is a type of parallel or distributed processing system that consists of a collection of interconnected standalone (complete) computers working together cooperatively as a single, integrated computing resource. Each of these autonomous machines contributes its resources in the form of:
CPU cycles
Memory
Network Bandwidth
Disk
Surveys show that utilization of CPU cycles on desktop workstations is typically <10%.
Performance of workstations and PCs is rapidly improving.
As performance grows, percent utilization will decrease even further.
Organizations are reluctant to buy large supercomputers because of the large expense and short useful life span.
The collection of autonomous machines acting as web servers are the workhorses of the cluster. All requests are distributed evenly among these servers. An optimum load balancing technique is used to decide which server will handle the next request.
Server load should be evenly distributed among the machines.
If any server fails, the failure should be masked by distributing the requests to the remaining servers without bringing down the service.
The design should have the capability to add more machines to the cluster without bringing down the service.
Efficient load balancing techniques should be used to distribute the load among the slave nodes.
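The two availability requirements above (masking a failed server and adding machines without stopping the service) can be illustrated with a minimal sketch. The `ServerPool` class and its method names are hypothetical, chosen only for this example, and Python is used for brevity; the slides do not show how the project implements this.

```python
class ServerPool:
    """Hypothetical pool of slave web servers behind a load balancer."""

    def __init__(self, servers):
        self._servers = list(servers)  # all known servers, in rotation order
        self._down = set()             # servers currently marked as failed
        self._next = 0                 # rotating cursor into _servers

    def add(self, server):
        """Add a machine while the pool keeps serving requests."""
        if server not in self._servers:
            self._servers.append(server)

    def mark_down(self, server):
        """Record a failure so the server is skipped from now on."""
        self._down.add(server)

    def mark_up(self, server):
        """Return a recovered server to the rotation."""
        self._down.discard(server)

    def pick(self):
        """Return the next healthy server, skipping failed ones."""
        n = len(self._servers)
        for _ in range(n):
            server = self._servers[self._next % n]
            self._next = (self._next + 1) % n
            if server not in self._down:
                return server
        raise RuntimeError("no healthy servers available")
```

With servers `a`, `b`, `c` and `b` marked down, successive calls to `pick()` alternate between `a` and `c`, so clients never see the failure as long as one server remains healthy.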
HTTP-redirection approach
Client-side approach
Server-side DNS approach
The server-side single-IP-image approach
Application Gateway System
Goals of Project
High availability
Design an efficient load balancing algorithm
Increase performance
Scalability
Efficient monitoring of the system performance and adequate logging mechanism
Balancing Algorithms
Round Robin
Algorithm Based on Probability Distribution
Proposed Algorithm (Dynamic Feedback Algorithm)
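The first two algorithms can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's code: the function names are hypothetical, and the probability-based variant is assumed to mean "pick a server at random, in proportion to an assigned weight".

```python
import itertools
import random


def round_robin(servers):
    """Return an iterator that yields servers in a fixed repeating order."""
    return itertools.cycle(servers)


def pick_by_probability(weights, rng=random):
    """Pick one server with probability proportional to its weight.

    `weights` maps a server name to a non-negative number; `rng` is the
    `random` module or a `random.Random` instance (useful for seeding).
    """
    servers = list(weights)
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]
```

Round robin ignores server load entirely, while the probability-based choice lets a stronger machine receive a larger share of requests, for example `pick_by_probability({"s1": 3, "s2": 1})` sends roughly three quarters of the requests to `s1`.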
Optimizing Relative And Weight Factors
Sampling Delay
Optimizing Number of Slaves
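The slides name the tuning knobs of the dynamic feedback algorithm but do not give its formula. The sketch below is therefore only one plausible reading, stated as an assumption: each slave reports utilization metrics in the range 0 to 1, the weight factors combine them into a single load score, and servers with more spare capacity receive a larger share of new requests. The metric names (`cpu`, `memory`, `connections`) and both function names are hypothetical.

```python
def load_score(sample, factors):
    """Weighted sum of utilization metrics, each expected in [0, 1]."""
    return sum(factors[name] * sample[name] for name in factors)


def feedback_weights(samples, factors):
    """Turn per-server load samples into normalized selection weights.

    Assumption for this sketch: a server's weight is proportional to its
    spare capacity, i.e. 1 minus its normalized load score.
    """
    total_factor = sum(factors.values())
    spare = {
        server: max(0.0, 1.0 - load_score(sample, factors) / total_factor)
        for server, sample in samples.items()
    }
    total = sum(spare.values())
    if total == 0:
        # Every server is saturated: fall back to an even split.
        return {server: 1.0 / len(spare) for server in spare}
    return {server: value / total for server, value in spare.items()}
```

In such a scheme the load balancer would recompute the weights once per sampling delay. A short delay tracks load closely but adds monitoring traffic, and a long delay risks routing on stale data, which is presumably why the delay appears as a tuning parameter.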
Implementation Issues
Request Forwarding
1. Network Address Translation (NAT)
2. IP Tunneling
3. Distributed Packet Rewriting (DPR)
4. Request forwarding and rewriting response
Site of Implementation
1. DNS Server
2. Router
Single vs. Multiple Master (Load Balancer)
Scalability
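Of the request forwarding options listed above, NAT is the easiest to illustrate: the balancer rewrites the destination address of each inbound packet to a chosen real server and rewrites the source address of each reply back to the public (virtual) address, so the client only ever sees one address. The sketch below simulates this with plain objects rather than real packets; `Packet`, `NatBalancer`, and the addresses are hypothetical, and the round-robin choice of server is an assumption made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str      # "ip:port" of the sender
    dst: str      # "ip:port" of the receiver
    payload: str


class NatBalancer:
    """Simulated NAT-style load balancer with a single virtual address."""

    def __init__(self, virtual_addr, real_servers):
        self.virtual_addr = virtual_addr
        self.real_servers = list(real_servers)
        self._next = 0
        self._connections = {}  # client address -> chosen real server

    def inbound(self, packet):
        """Rewrite the destination of a client packet to a real server."""
        server = self._connections.get(packet.src)
        if server is None:
            server = self.real_servers[self._next % len(self.real_servers)]
            self._next += 1
            self._connections[packet.src] = server
        return replace(packet, dst=server)

    def outbound(self, packet):
        """Rewrite the source of a server reply to the virtual address."""
        return replace(packet, src=self.virtual_addr)
```

Both directions pass through the balancer in this scheme, which is the same traffic pattern that makes a single load balancer a potential bottleneck.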
Working of System
HTTP request redirection
Graphical User Interface
Performance monitoring and sampling of system parameters
Agents at slave node
Performance Panel
Java (JDK 1.4)
Visual C++ 6.0
XML version 1.0
Java Web Server 2.0
Limitations
All responses to requests are served through the load balancer itself. This adds the unnecessary overhead of the load balancer rewriting each response.
The system has been designed at the application layer using request forwarding and response rewriting.
The system is designed with only a single load balancer, which could become a bottleneck, and if the load balancer fails, the system as a whole fails.
Future Enhancements
Direct response to the client by the slave node, without rewriting of the response at the load balancer.
The system can be implemented at a lower layer in the TCP/IP suite using NAT or IP tunneling.
Using multiple load balancers.
Back-up Slides
IP Tunneling
Main Panel
Status Panel
Log Panel
Configuration Panel
Cluster Computing
A collection of autonomous computers contributing their resources to accomplish a particular computing task. Resources can be in the form of:
CPU cycles
Memory
Network Bandwidth
Disk