
Microsoft Clustering GSL

Produced by: Kingsley Bell

Distributed Operations Windows

Windows Server Support Contact Information

Windows Regional Services EMEA:
  Hotline: *448 6868
  Email address: # IT TIS RDO EMEA DO Windows
  Manager: Aleet Kavia, *448 7753
  Back Office team lead: Edwin Broersma, *443 9606
  Front Office team lead: Barry Roberts, *448 5483

Windows Production Services (Global):
  Hotline: *650 8888
  Email address: # IT TIS RDO Windows Prod Svcs
  Manager: Tejendra Dhiman, *650 8860
  Remedy GIM / RFC queue: TIS_RDO_DO_WIN_PROD_SVCS

Remedy GIM / RFC queues (EMEA):
  EMEA Asset Management: TIS_RDO_EMEA_DO_WIN_ASSET_MGT
  EMEA Equities & PrimeServices: TIS_RDO_EMEA_DO_WIN_EQ_PS
  EMEA Fixed Income & Deriv.: TIS_RDO_EMEA_DO_WIN_FID_DRV
  EMEA Back Office: TIS_RDO_EMEA_DO_WIN_IBO_BO


Contents

- What is MSCS?
- Cluster Overview
- Cluster Groups
- Resources
- Credit Suisse Naming Standards
- Failover
- Disaster Recovery
- Load Balancing
- Questions & Answers


What is MSCS?
A cluster consists of two or more computers working together to provide a higher level of availability, reliability, and scalability than can be obtained by using a single computer. Microsoft cluster technologies guard against three specific types of failure:

- Application and service failures, which affect application software and essential services.
- System and hardware failures, which affect hardware components such as CPUs, drives, memory, network adapters, and power supplies.
- Site failures in multisite organizations, which can be caused by natural disasters, power outages, or connectivity outages.

The ability to handle failure allows server clusters to meet requirements for high availability, which is the ability to provide users with access to a service for a high percentage of time while reducing unscheduled outages.

In a server cluster, each server owns and manages its local devices and has a copy of the operating system and the applications or services that the cluster is managing. Devices common to the cluster, such as disks in common disk arrays and the connection media for accessing those disks, are owned and managed by only one server at a time. For most server clusters, the application data is stored on disks in one of the common disk arrays, and this data is accessible only to the server that currently owns the corresponding application or service.

Server clusters are designed so that the servers in the cluster work together to protect data, keep applications and services running after failure on one of the servers, and maintain consistency of the cluster configuration over time.


Cluster Overview


Cluster Groups

Cluster groups bring together all the resources required to run an application or instance.

- A cluster group can run on only one physical node at a time; no other node can access its resources (e.g. disks) while another node owns the group.
- Multiple cluster groups can run simultaneously on the same node.
- When a cluster group is moved to another node, all resources in that group are taken offline and brought back online on the other node.
- An Active/Active cluster is one in which two cluster groups are running on two different physical nodes at the same time.
- If a node fails, the cluster service automatically starts the whole cluster group on another available node.
- The first cluster group is used to operate the cluster; no other resources should be placed in this group.
- Cluster groups have several configuration options: preferred owners, failover threshold and period, and failback options.
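The group behaviour above can be illustrated with a minimal Python sketch, assuming a simplified in-memory model. `ClusterGroup` and all of its fields are hypothetical and are not the MSCS API; the threshold/period rule only approximates the real FailoverThreshold and FailoverPeriod group settings. It shows that the whole group moves as one unit, that preferred owners are tried first, and that too many failovers inside the period leave the group offline.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClusterGroup:
    """Hypothetical model of a cluster group; not a real MSCS API."""
    name: str
    resources: list[str]
    preferred_owners: list[str]
    failover_threshold: int = 10        # illustrative value only
    failover_period_hours: float = 6.0  # illustrative value only
    owner: Optional[str] = None         # a group runs on at most one node
    online: set[str] = field(default_factory=set)
    failover_times: list[float] = field(default_factory=list)

    def bring_online(self, node: str) -> None:
        self.owner = node
        self.online.update(self.resources)

    def take_offline(self) -> None:
        self.online.clear()

    def fail_over(self, available_nodes: list[str], now_hours: float) -> Optional[str]:
        """Move the whole group to another node; return the new owner or None."""
        # Only failovers inside the current period count towards the threshold.
        self.failover_times = [
            t for t in self.failover_times
            if now_hours - t < self.failover_period_hours
        ]
        # Try preferred owners first, then any other available node.
        candidates = [n for n in self.preferred_owners
                      if n in available_nodes and n != self.owner]
        candidates += [n for n in available_nodes
                       if n not in candidates and n != self.owner]
        # Every resource in the group goes offline before the move.
        self.take_offline()
        if len(self.failover_times) >= self.failover_threshold or not candidates:
            self.owner = None  # group is left offline in a failed state
            return None
        self.bring_online(candidates[0])
        self.failover_times.append(now_hours)
        return self.owner
```

For example, a group created with `failover_threshold=1` moves once to the surviving node, and a second failure within the same period leaves it offline rather than bouncing between nodes.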


Resources

- Resources reside in cluster groups; all resources required for a specific function should be grouped together.
- Once a resource is in a cluster, it should be administered only through the cluster.
- Typically, each cluster group has a network name and an associated IP address through which its resources are accessed.
- A large number of resource types can be created, which together can provide a complete clustered application ecosystem. Common types include:
  IP Address
  Network Name
  File Share
  Generic Service
  Physical Disk
- Some resources have required dependencies, e.g. a Network Name requires an IP Address.
- You can create your own dependencies, e.g. a service cannot start until a file share is online.
- Each resource has a number of configuration options.
- Some applications create new cluster resources, e.g. Microsoft SQL Server.
- If an application is not cluster-aware, the Generic Service or Generic Application resource can be used for a roll-your-own solution.
- Resources must be available on every node that may own them: cluster-aware applications install the required binaries on all nodes at install time, whereas generic applications need the required binaries installed onto each node manually.
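Dependencies decide the order in which resources start and stop. A minimal Python sketch, assuming the dependencies form a directed graph: bringing a group online is a topological sort (dependencies first), and taking it offline is the reverse. The Network Name to IP Address edge comes from the text above; the other edges and the `DEPENDENCIES` map are hypothetical examples, and this is an illustration of the ordering idea rather than how the cluster service is implemented.

```python
# graphlib is in the standard library from Python 3.9.
from graphlib import CycleError, TopologicalSorter

# Each resource maps to the resources it depends on. Only
# Network Name -> IP Address is a required dependency; the rest are
# hypothetical user-defined dependencies for a generic service.
DEPENDENCIES: dict[str, set[str]] = {
    "Physical Disk": set(),
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "File Share": {"Physical Disk"},
    "Generic Service": {"File Share", "Network Name"},
}


def online_order(deps: dict[str, set[str]]) -> list[str]:
    """Order in which to bring resources online: dependencies first.

    Raises CycleError if the dependencies form a loop, which could never start.
    """
    return list(TopologicalSorter(deps).static_order())


def offline_order(deps: dict[str, set[str]]) -> list[str]:
    """Take dependents offline before the resources they rely on."""
    return online_order(deps)[::-1]


print(online_order(DEPENDENCIES))
```

Any valid order is acceptable here: the disk and IP address may start in either order, but the service always comes last.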


Resource Configuration Options

- Should the resource be restarted automatically if it goes offline?
- Should the resource affect its cluster group (i.e. trigger a failover of the group) if it fails a specified number of times within a certain time period?
- Is the process running? (basic health check)
- Is it actually working? (thorough health check)
- How long may the process take to start before the resource goes into a failed state (timeout)?
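These questions loosely correspond to MSCS resource settings such as the restart action, restart threshold and period, LooksAlive and IsAlive polling, and the pending timeout. The following is a minimal Python sketch of how such a policy could be evaluated. `ResourcePolicy`, `RestartTracker`, `healthy` and `start_state` are hypothetical names, the numbers are illustrative, and nothing here is the cluster service's real code.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class ResourcePolicy:
    """Illustrative per-resource settings (example values only)."""
    restart: bool = True              # restart automatically when it fails?
    restart_threshold: int = 3        # restarts allowed ...
    restart_period_s: float = 900.0   # ... within this many seconds
    affect_group: bool = True         # escalate to a group failover when exceeded?
    pending_timeout_s: float = 180.0  # max time allowed to finish starting


def healthy(looks_alive: Callable[[], bool],
            is_alive: Callable[[], bool], thorough: bool) -> bool:
    """Cheap check ("is the process running?") on every poll; the thorough
    check ("is it actually working?") only when requested."""
    return looks_alive() and (not thorough or is_alive())


def start_state(started: bool, elapsed_s: float, policy: ResourcePolicy) -> str:
    """A resource still starting after the pending timeout counts as failed."""
    if started:
        return "online"
    return "failed" if elapsed_s > policy.pending_timeout_s else "pending"


class RestartTracker:
    """Decides what happens after each failure of a single resource."""

    def __init__(self, policy: ResourcePolicy) -> None:
        self.policy = policy
        self.failures: deque[float] = deque()

    def on_failure(self, now_s: float) -> str:
        if not self.policy.restart:
            return "leave_failed"
        # Forget failures that happened outside the restart period.
        while self.failures and now_s - self.failures[0] >= self.policy.restart_period_s:
            self.failures.popleft()
        self.failures.append(now_s)
        if len(self.failures) <= self.policy.restart_threshold:
            return "restart"
        return "fail_over_group" if self.policy.affect_group else "leave_failed"
```

With the defaults, three failures within 15 minutes are each answered with a local restart, and a fourth escalates to a failover of the whole group; clearing "affect the group" would instead leave just that resource failed.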


Credit Suisse Naming Standards


Physical machines
- Nodes begin with X, e.g. XNYC19P11013A.
- The first node ends with A, the second with B, and so on.

Cluster groups
- The first cluster group begins with C and is based on the server name, e.g. CNYC19P11013. This is usually the default cluster group and is used to operate the cluster; no other resources should be placed in this group.
- SQL-based cluster groups begin with M.
- Application-based cluster groups begin with C.
- The first group ends with A, the second with B, e.g.:
  MNYC19P11013A  first group (SQL-based)
  CNYC19P11013B  second group (application-based)
- Cluster groups are often referred to as virtual servers (not to be confused with VMware). The cluster group name is typically the name assigned to the Network Name resource of the virtual server; this is the name clients should use to access its resources.
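The naming rules can be captured in a few small helpers. This Python sketch assumes a field breakdown inferred only from the examples above (role letter, a site code such as NYC19, one letter, a 5-digit number, and an optional A/B/... suffix); the regular expression and every function name are hypothetical, not an official Credit Suisse tool.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the examples only:
# role letter + base name (e.g. NYC19P11013) + optional A/B/... suffix.
NAME_RE = re.compile(
    r"^(?P<role>[XCM])(?P<base>[A-Z]{3}\d{2}[A-Z]\d{5})(?P<suffix>[A-Z]?)$"
)
ROLES = {"X": "physical node", "C": "cluster group", "M": "SQL cluster group"}


class ParsedName(NamedTuple):
    role: str
    base: str               # e.g. NYC19P11013, shared by nodes and groups
    ordinal: Optional[int]  # 1 for A, 2 for B, ...; None if there is no suffix


def parse_name(name: str) -> ParsedName:
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"does not follow the naming standard: {name}")
    suffix = m["suffix"]
    ordinal = ord(suffix) - ord("A") + 1 if suffix else None
    return ParsedName(ROLES[m["role"]], m["base"], ordinal)


def letter(n: int) -> str:
    return chr(ord("A") + n - 1)  # 1 -> A, 2 -> B, ...


def node_name(base: str, n: int) -> str:
    return f"X{base}{letter(n)}"


def group_name(base: str, n: int, sql: bool) -> str:
    return f"{'M' if sql else 'C'}{base}{letter(n)}"


def default_group_name(base: str) -> str:
    return f"C{base}"  # operates the cluster; no other resources go here
```

For instance, `group_name("NYC19P11013", 1, sql=True)` yields `MNYC19P11013A`, matching the first SQL-based group in the example.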


Failover
- MSCS does not provide seamless failover: in case of failure, resources are shut down on one node and then brought up on another node, so service is interrupted while this happens.
- Careful consideration should be given to resource parameters, e.g. "Affect the group".
- Cluster resources should not be overcommitted, so that capacity remains available for a node failure. If a cluster group requires 80% of a node's computing power, that much spare capacity must always be available in the cluster. For example, if two cluster groups each need 55% of a node, they cannot share one node (110%), so the cluster needs three nodes: one for each group plus one spare.
- Keep all nodes in a cluster to the same specification.
- Individual resource failures can initiate a cluster failover.
- A node failure will initiate a cluster failover.
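The capacity rule can be checked mechanically. This Python sketch assumes each group's demand can be expressed as a single percentage of one node and that all nodes share one specification; it brute-forces every placement, so it is only suitable for the small clusters discussed here. The function names are hypothetical.

```python
from itertools import product


def fits(loads: list[int], nodes: int) -> bool:
    """True if every group (load in % of one node) can be placed on one of
    `nodes` identical nodes without any node exceeding 100%."""
    for assignment in product(range(nodes), repeat=len(loads)):
        used = [0] * nodes
        for load, node in zip(loads, assignment):
            used[node] += load
        if all(u <= 100 for u in used):
            return True
    return False


def survives_node_failure(loads: list[int], nodes: int) -> bool:
    # Nodes share one specification, so losing any node leaves nodes - 1.
    return nodes >= 2 and fits(loads, nodes - 1)


def min_nodes(loads: list[int]) -> int:
    """Smallest cluster that still runs every group after one node fails."""
    if any(load > 100 for load in loads):
        raise ValueError("a single group needs more than one whole node")
    n = 2
    while not survives_node_failure(loads, n):
        n += 1
    return n


print(min_nodes([55, 55]))  # 3
print(min_nodes([80]))      # 2
```

The 55% + 55% case reproduces the three-node answer from the text, while two groups at 40% each could survive on a two-node cluster.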


Disaster Recovery
- DR nodes should be installed with just enough resources to run the cluster, e.g. 3 production nodes require 2 DR nodes. Typical production + DR configurations are 2+1, 3+2 and 4+3.
- DR nodes typically have the cluster service disabled, or run only the default cluster group with all other cluster groups offline.
- DR nodes need to have the application installed.
- When the production configuration changes, the DR configuration needs to be updated to match.
- Credit Suisse uses the following third-party vendor technologies to aid DR failover:
  EMC SRDF (Symmetrix Remote Data Facility)
  Cisco LAM (Local Area Mobility)
- When using SRDF, the cluster disk resources cannot be brought online unless the disks are in a split state.
- The IP Address and Network Name resources cannot be brought online if they are already in use elsewhere in the estate.
- The DR node naming standard mirrors the production nodes:
  XNYC19P11013A -> XNYC19B11013A
  CNYC19P11013 -> CNYC19B11013
- With LAM, however, the production virtual server names can be brought online in a DR scenario (CNYC19P11013A).
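The DR rules above can be expressed as a name mapping plus a precondition check. This is a minimal Python sketch under stated assumptions: the P-to-B substitution is inferred from the two examples, `can_bring_online` and its parameters are hypothetical, and the LAM rule (production virtual server identities need LAM to come online in DR) is an interpretation of the text, not vendor behaviour.

```python
import re

# Production names carry P after the site code; DR names carry B instead,
# as in XNYC19P11013A -> XNYC19B11013A (inferred from the examples).
PROD_NAME = re.compile(r"([XCM][A-Z]{3}\d{2})P(\d{5}[A-Z]?)")


def to_dr_name(name: str) -> str:
    m = PROD_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not a production name: {name}")
    return f"{m.group(1)}B{m.group(2)}"


def can_bring_online(resource_type: str, *, srdf_split: bool, in_use: bool,
                     production_identity: bool, lam_enabled: bool) -> bool:
    """Hypothetical DR precondition check for one cluster resource."""
    if resource_type == "Physical Disk":
        # SRDF-replicated disks only come online once the pair is split.
        return srdf_split
    if resource_type in ("IP Address", "Network Name"):
        if in_use:
            return False  # the address or name is still live elsewhere
        # Assumption: production virtual server identities need LAM in DR.
        return lam_enabled or not production_identity
    return True
```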


MSCS Clusters DR
Collection of clustered Windows servers with shared disks, IP addresses, network names and SQL resources, in 2+1 or 3+2 configurations.

[Diagram labels: Slough, Global Switch, Prod A, Prod B, Corp, DR, LAM, non-routed heartbeat VLAN, SRDF between Shared Storage PROD and Shared Storage DR.]

DR failover steps:
1. Enable LAM in DR.
2. Stop production resources.
3. Split storage.
4. Import disk groups in DR.
5. Start network and storage cluster resources in DR.
6. Start SQL resources in DR.
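The DR failover steps can be modelled as an ordered runbook in which each step checks that the earlier steps have happened. This Python sketch is hypothetical throughout: `DrSite`, the step functions and the specific preconditions (for example, splitting storage only after production resources are stopped) are assumptions drawn from the order of the steps, and the functions only update in-memory flags rather than driving SRDF, LAM or the cluster service.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DrSite:
    """Hypothetical state tracked while failing over to DR."""
    lam_enabled: bool = False
    production_running: bool = True
    storage_split: bool = False
    disk_groups_imported: bool = False
    online: list[str] = field(default_factory=list)


def require(condition: bool, message: str) -> None:
    if not condition:
        raise RuntimeError(message)


def enable_lam(s: DrSite) -> None:
    s.lam_enabled = True


def stop_production(s: DrSite) -> None:
    s.production_running = False


def split_storage(s: DrSite) -> None:
    require(not s.production_running, "stop production resources before splitting")
    s.storage_split = True


def import_disk_groups(s: DrSite) -> None:
    require(s.storage_split, "SRDF storage must be split first")
    s.disk_groups_imported = True


def start_network_and_storage(s: DrSite) -> None:
    require(s.disk_groups_imported, "disk groups not imported")
    require(s.lam_enabled, "LAM must be enabled to reuse production names")
    s.online += ["Physical Disk", "IP Address", "Network Name"]


def start_sql(s: DrSite) -> None:
    require("Network Name" in s.online, "network and storage must be online first")
    s.online.append("SQL Server")


RUNBOOK: list[Callable[[DrSite], None]] = [
    enable_lam, stop_production, split_storage,
    import_disk_groups, start_network_and_storage, start_sql,
]


def run(steps: list[Callable[[DrSite], None]], site: DrSite) -> DrSite:
    for step in steps:
        step(site)  # a failed precondition raises and halts the runbook here
    return site
```

Running the full `RUNBOOK` leaves disk, network and SQL resources online in DR, whereas skipping a step (for example importing disk groups before the split) stops the run with an error.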


Load Balancing
The service is provided by the NOC using Cisco GSS (Global Site Selector).

- Round-robin or weighted balancing
- Session-aware
- End-point node checking (ping)
- End-point port checking, e.g. ports 80, 21, 443
- Can also query website connectivity, e.g. to detect 404 (Page Not Found) errors
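To illustrate the balancing and checking features listed above, here is a minimal Python sketch of weighted round robin that skips nodes failing a health check, plus a TCP port check. It is not GSS configuration (GSS is configured on the appliance itself); the class and function names are hypothetical, health checks run inline on each pick rather than periodically, and session stickiness is not modelled.

```python
from __future__ import annotations

import socket
from typing import Callable


def port_check(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """End-point port check: can a TCP connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


class WeightedRoundRobin:
    """A node with weight w gets w turns per cycle; equal weights give
    plain round robin."""

    def __init__(self, weights: dict[str, int],
                 is_healthy: Callable[[str], bool]) -> None:
        self._ring = [node for node, w in weights.items() for _ in range(w)]
        if not self._ring:
            raise ValueError("at least one node needs a positive weight")
        self._pos = 0
        self._is_healthy = is_healthy

    def pick(self) -> str:
        # Try at most one full cycle, skipping nodes that fail their check.
        for _ in range(len(self._ring)):
            node = self._ring[self._pos]
            self._pos = (self._pos + 1) % len(self._ring)
            if self._is_healthy(node):
                return node
        raise RuntimeError("no healthy nodes")
```

A usage sketch with hypothetical host names: `WeightedRoundRobin({"web1": 2, "web2": 1}, lambda n: port_check(n, 443))` sends two requests to `web1` for every one to `web2`, and routes everything to `web2` if `web1` stops accepting connections on port 443.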


Questions & Answers

Clustering solutions would be expected to support:
- Automatic failover of application processes/services in the event of node failure
- Automatic restart of application processes/services in the event of process failure
- Automatic load balancing of peer processes/services in a cluster
- Automatic reallocation of processes/services to ensure best utilisation of the cluster
- Management, software deployment and provisioning at the cluster level rather than at the individual node level

Given these reasonable expectations of a clustering solution, a number of facets of the GSL / SlatePlus clustering arrangements don't seem to quite match them:
- Why is it necessary to install services explicitly on every machine, rather than installing a service to the cluster and letting the clustering solution manage its deployment to the cluster nodes?
- What role does GSL play in providing the clustering solution, rather than relying on the Microsoft product? For instance, is GSL Gateway necessary, or could it be replaced in whole or in part by Microsoft components?
- What 'templates' exist for stateless services, where we can run multiple instances of a service concurrently in the cluster?
- What 'templates' exist for stateful services, where we want only one instance of a particular service to run at a time but still want the benefits of the cluster, i.e. automatic failover to another node and automatic restart?

