
Project Implementation Architecture (PIA) Document

SCOM Design

Synopsis:

System Center Operations Manager (SCOM) is an end-to-end service monitoring solution that can monitor clients, events, services, applications and network devices. It integrates with Microsoft products such as Live Communications Server 2005, MS SQL and SharePoint.

Segment:

EMEA

Authors: Mr X.
Contributors: Mr Y.

PIA Document Version: 2.3
Document Status: Draft
Date: 24/11/2011
Document Status: 1. Definition Phase Draft

Current Project Phase: Definition
Authorised by:

No Copyright 2012 ACME Limited

PIA - SCOM

V2.3 Draft

Contents
1. Project Summary
    1.1 References
    1.2 Change History
    1.3 Glossary
2. Business Context
    2.1 SCOM Pilot
    2.2 Other Key Benefits
    2.3 Scope
3. SCOM Architectural Design
    3.1 Original Design in ASMB
    3.2 Proposed Design
    3.3 System Center Operations Manager Data Flow Diagram
    3.3a Agent to server Communication
    3.4 Previous MOM installation in Interchange
    3.5 Microsoft's Recommendations
    3.6 Deployment Plan
    3.7 Testing Schedule
4. Project Requirement Analysis
    4.1 Requirements
    4.2 Admin Requirements
    4.3 Equipment List
5. Database Sizing
    5.1 Database Sizing and Design
6. Network Recommendations
7. Storage Requirements
8. Security
    8.1 McAfee Exclusions for System Center Operations Manager
    8.2 Operations Manager 2007 (management servers and agents)
    8.3 Exclusion of File Type by Extensions
    8.4 Considerations
    8.5 Access Control Model
9. Capacity Forecast & Schedule Availability via PAWZ
10. Disaster Recovery Backup & Restore
    10.1 Backup SCOM Databases
    10.2 Backup software Requirements
    10.3 List of servers to be backed up
    10.4 Clustered SQL Server
11. Maintenance and Best Practices
12. Additional Costs
13. Risks & Issues


1. Project Summary
SCOM is a Microsoft software solution which will provide end-to-end service monitoring of X messaging environments. It integrates with Microsoft products such as Live Communications Server 2005, MS SQL, Interchange and SharePoint.

1.1 References
Document Date Author

1 2 3 4 5

1.2 Change History
Ver Date Author Key Changes

0.1
0.2
0.3
0.4


1.3 Glossary

Term: Definition
SCOM: System Center Operations Manager
Collab: Collaboration
LCS: (Microsoft) Live Communications Server
RMS: Root Management Server
ASMB: Assembly Blue
TDP: Tivoli Data Protector
IOPS: Input/Output Operations Per Second
ASM: Assembly Environment
Prod: Production Environment
VM: Virtual Machine
LAN: Local Area Network


2. Business Context
GMI is the solution which currently provides monitoring of the Messaging environment; however, there are some limitations on its monitoring capabilities. GMI cannot report at the granularity required for critical business systems, nor can it provide a comprehensive view of the health of the environment. SCOM provides tools to manage the environment in its entirety and can integrate with other existing tools such as GMI.

Key benefits:

Improve Service Health
Improve service health while driving alignment with business SLAs. SCOM provides easy-to-use reporting and authoring capabilities. There is full visibility of service health, together with the ability to monitor more proactively and so reduce the likelihood of service-impacting issues.

Unify Management of Complete Messaging Environment
Visibility across platforms, applications and components. SCOM provides a single view, including application and infrastructure components, across Windows and non-Windows environments.

Dynamically Respond to Changes
SCOM can respond dynamically to changes with automated actions to ensure continued service performance and availability. Actions can be taken to remediate a service directly from the console, making it easy to restore the service to full health in an operationally efficient manner.

2.1 SCOM Pilot

An initial SCOM Pilot within the ASMB environment produced the following findings:
- SCOM utilises a hierarchical, overall-health philosophy in which alerts are associated with each other to assist in identifying the underlying issue. This is not currently offered by GMI.
- SCOM alerts on minor and major issues that have not yet become critical or have not interrupted the normal operation of a configuration item.
- Alerts are visible at the top level of the hierarchy, and it is possible to drill down to the offending constituent parts causing the alert. Identifying these broken items allows a failure to be pre-empted before it becomes service impacting.


2.2 Other Key Benefits

Integration with GMI
Using Microsoft Orchestrator / Opalis, SCOM is able to integrate with GMI Webtop. SCOM is able to take alerts and pass them to GMI, eliminating the need for an additional monitor.

Automation
SCOM has native automation capabilities, which are further enhanced by the presence of Microsoft Orchestrator / Opalis.

Economy of implementation
The implementation uses existing, non-cutting-edge hardware. Through lessons learnt and bugs that have been resolved, the third implementation of the software should be accelerated and mature.

Diverse monitoring capability
SCOM can make use of third-party management packs to monitor many third-party products such as Solaris OS, Linux and Cisco switches.

Built-in knowledgebase
The built-in knowledgebase contains common problems and how to resolve them, as well as common fixes. This knowledgebase is also expandable and easily accessible through the SCOM Console interface.

2.3 Scope

In scope:
- SCOM monitoring solution in ASMB
- SCOM monitoring solution in PROD

Out of scope:
- SCOM via Opalis to GMI
- Handover to 2nd Level
- Handover to TPH
- SCOM monitoring solution in ASMB DR
- SCOM monitoring solution in PROD DR
- That which is not in scope


3. SCOM Architectural Design


3.1 Original Design in ASMB

The above diagram illustrates the original design concept (see Figure 1). This configuration uses a single Root Management Server virtual machine and a single physical database server, which was chosen for ease of implementation in conjunction with the Lync project. This design failed to meet the requirements for implementation in the production environment. The design presented the following flaws:

Single points of failure
The ASMB concept had two single points of failure: the database and the RMS. Neither had any inbuilt resiliency, e.g. clustering or additional management servers. This has been addressed in the new design.

Little scope for growth
There was very little scope to grow the number of systems monitored, as the RMS performed poorly and did not seem able to handle the load efficiently. (Please see section three)

This Pilot will remain in ASMB for further analysis during the design & planning stage of the project. Once this stage is complete, the pilot will be replaced with a proposed design which eliminates the flaws identified in the original design.


3.2 Proposed Design

The new design illustrated above considers:

High Availability
The clustered configuration of the new design provides extra redundancy in the event of a failure; the present installation in Assembly Blue does not. The single Root Management Server, the single database and the single Opalis installation are all single points of failure, which is unacceptable for a production installation.

Resilience
The new design also has two extra management servers which provide monitoring redundancy. Active Directory integration makes it possible for agents to fail over automatically to another management server in the management group if their assigned management server fails.

Use of VMware
The Gateway Server is a virtual machine and has the full resiliency offered by the VMware vSphere infrastructure. The Gateway Server will communicate with the management server through certificate-based authentication. If the management server that communicates with the gateway server fails, the gateway server will fail over to the other management server, which then takes over monitoring systems on the untrusted domain.


3.3 System Center Operations Manager Data Flow Diagram

Figure 4: Data Flow Diagram of the SCOM Environment


3.3a Agent to server Communication


The Data Flow diagram above shows the data flow which SCOM uses to monitor managed systems and communicate with the Operations Manager database. The RMS monitors managed systems through the Management Server Action Account and communicates with the database via the SDK Config Account.

SCOM utilises Management Packs that define Key Performance Indicators for the various types of software that it monitors, for example SharePoint or Active Directory. Within the Active Directory Management Pack, for instance, Microsoft has placed thresholds which it considers normal and thresholds which it considers not normal.

The Agent communicates with the RMS through a heartbeat which beats three times per minute. As soon as the agent detects that a system has fallen outside its normal operating window, it generates an event which is reported back to the management server. The nature of the event could range from software compliance to configuration errors. These events may not stop the system from running but may impact performance later on.

3.4 Previous MOM installation in Interchange

The current MOM configuration does not offer the level of monitoring required to examine the environment and provide a proactive level of support. A number of issues may be going unnoticed and subsequently impacting service. However, an analysis of the previous MOM installation in Interchange shows that parts of the old MOM installation can be salvaged to form the new SCOM design. The MOM management server will no longer be required, as Interchange will be monitored via the SCOM Gateway and Root Management Servers. Agents will need to be reinstalled, as there is no upgrade path available from MOM 2005 to SCOM 2007. The previous MOM configuration is then to be decommissioned.

3.5 Microsoft's Recommendations

SCOM via VM
This is not a solution recommended by Microsoft. The present pilot setup includes one Root Management Server on a VM and one Database Server on a physical machine. Microsoft recommends against using VMs for either server component for performance reasons.

Sluggish performance on a VM
The use of VMs has an impact on response times, even though the network is more than able to support SCOM and the database had been tuned to use separate physical disks. The reason is that the SCOM Root Management Server has a high IOPS requirement which, according to Microsoft, makes it unsuitable for VMs.

Fault tolerance


The recommendation from Microsoft is to add separate Management Servers specifically for the function of passing data to the database, to ease the load on the Root Management Server. We have also opted to cluster both the SCOM Database and Root Management Servers for added fault tolerance. Opalis will be installed on the Root Management Server as per the old design; being on a cluster, Opalis will benefit from the added fault tolerance which is missing from the old model.

Untrusted Domain
There is no firewall between Interchange and RM, but because Interchange is on another domain a gateway server will be required. The native SCOM Gateway will provide secure, encrypted communication between Messaging and Interchange via mutual authentication certificates using port 5723.
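
The gateway link described above depends on TCP port 5723 being reachable from the gateway server. The following is a minimal sketch of a reachability check, assuming Python is available on the host being tested. The function name `port_is_open` and the default timeout are illustrative assumptions and not part of the design. The check only confirms that a TCP connection can be opened; it does not validate the mutual authentication certificates.

```python
import socket


def port_is_open(host: str, port: int = 5723, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    5723 is the port named in this document for gateway communication.
    Certificate-based authentication is NOT tested here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `port_is_open("management-server.example.local")` could be run from the gateway server, where the host name is a placeholder for the real management server.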

3.6 Deployment Plan

Stage: Initiation
- Objectives
- Scope
- Stakeholders
- PIA Document
Days: 5
Start Date: 21/11/2011
End Date: 25/11/2011

Stage: Planning & Design
- H/W & S/W Requirements
- High Level Design
- Capacity
- Maintenance
- Storage
- Database
- Network
- Security
- Access Control Model
- Test Plan
Days: 25
Start Date: 28/11/2011
End Date: 04/01/2012

Stage: PHASE 1: Implementation ASMB
- Base Build
- Installation
- Configuration
- Testing
Days: 30
Start Date: 03/01/2012
End Date: 10/02/2012

Stage: PHASE 2: Implementation PROD
- Base Build
- Installation
- Configuration
- Testing
Days:
Start Date: 09/01/2012
End Date: 24/02/2012

Stage: PHASE 3: SCOM via Opalis to GMI, Handover to 2nd Level, Handover to TPH
Days: Out of Scope
Start Date: TBC
End Date: TBC

Stage: ASMB DR Build, PROD DR Build
Days: Out of scope
Start Date: TBC
End Date: TBC

Stage: Closure
- Training
- Documentation
- DR Documentation / Update
- Project Closure Doc
Days: 10
Start Date: March 2012
End Date: April 2012

3.7 Testing Schedule

Test: Verify RMS and SQL Cluster Failover
Reason For Test: To test cluster failover and check service availability
Expected Result: There should be no interruption to services on the RMS server and the SQL server
Result achieved (Success/Failure):

Test: Agent failover on monitored systems
Reason For Test: Testing to see if monitored systems fail over to an alternate management server if the primary management server goes down
Expected Result: Monitored systems should fail over to the secondary management server when the first management server is switched off
Result achieved (Success/Failure):

Test: Cumulative update process for clustered RMS
Reason For Test: Testing the cumulative update process for the RMS and verifying that the RMS is at the correct patch level
Expected Result: Cumulative update 5 should be applied to the RMS cluster
Result achieved (Success/Failure):

Test: Analyse events from managed systems
Reason For Test: Comparing events generated in SCOM with events from GMI in a side-by-side comparison
Expected Result: There should be more granularity with the SCOM events and more proactive monitoring
Result achieved (Success/Failure):


4. Project Requirement Analysis


4.1 Requirements

System Requirements
001 The system SHALL provide end-to-end service monitoring of the messaging environments.
002 The system SHALL integrate with existing monitoring tools.
003 The system SHALL see a reduction in alerts from Microsoft products such as Live Communications Server 2005, MS SQL and SharePoint.
004 The system SHALL see an improvement in the maintenance life cycle of all associated configuration items.
005 The system SHALL replace MOM with SCOM in Interchange ASMB and Production environments.
006 The system SHALL provide better visibility of issues before they become critical.

Operational Requirements
007 The system SHALL provide a high-availability solution.
008 The system SHALL provide resilience.
009 The system SHALL have external support agreements.
010 The system SHALL have backup and restore processes.

Environment Requirements
011 The system SHALL adhere to X hardware and software standards.
012 The system SHALL include a test / non-production environment.

Strategic Requirements
013 The system SHALL be scalable for X growth.

4.2 Admin Requirements

There are no direct administration requirements to the system other than predefined static configuration files that are part of the overall install package. All user details and access should be managed by external capabilities (Collaboration Admin Portal / Siebel / eSpresso / AD).

4.3 Equipment List
ASMB

LCS Production


Root Management Server (Clustered)
- 2x HP DL360 G5
- 16 GB RAM Min.
- 60GB native Hard disks
- SAN Storage 136 GB

DB Server (Clustered)
- 2x HP DL585 G7
- 32GB RAM
- 60GB native Hard disks
- SAN Storage 500 GB (see sizing based on 30 days data and 500 agents)

Management Servers (Non Clustered)
- 2x HP DL360 G5 or equivalent
- 60GB native Hard disks
- 12 GB RAM Min.

Gateway Server
- 1x Virtual Machine
- 8 GB RAM
- 60 GB HDD Min.

Root Management Server (Clustered/Non Clustered)
- 2/1x HP DL360 G5
- 16 GB RAM Min.
- 60GB native Hard disks
- SAN Storage 136 GB

DB Server (Clustered/Non Clustered)
- 2/1x HP DL585 G5
- 32GB RAM
- 60GB native Hard disks
- SAN Storage 500 GB (see sizing based on 30 days data and 500 agents)

Management Servers (Non Clustered)
- 2x HP DL360 G5 or equivalent
- 60GB native Hard disks
- 12 GB RAM Min.

Gateway Server
- 1x Virtual Machine
- 8 GB RAM
- 60 GB HDD Min.


5. Database Sizing
5.1 Database Sizing and Design
The following table shows the amount of space used by the same SCOM SQL databases in the present proof of concept in ASMB. These figures were used to work out how much space would be required for a SCOM environment with 500 Agents.

Database disk space estimation: 277 physical / 125 VMs + 26 Interchange, plus room for error / growth. We will work with the number of 500 Agents, but the actual number will be nearer 420 Agents to allow for growth.

Columns: DB Component; Grooming Interval (days); Present conf. disk space (MB) for 40 Agents; Estimated disk space (GB) for 500 Agents (40 Agents x 12.5 = 500)

Database
1) Ops Mgr DW DB; 90; 26796/16110 MB; 201375+80560 (+40%) = 282 GB
2) Ops Mgr DB; 7; 2800 MB; 35000+14000 (+40%) = 49 GB
3) Opalis DB; N/A; 355 MB; 4437.5+1775 (+40%) = 7 GB
4) Reports Server DB; 7; 8.87 MB; 0.10GB
5) Reports Server Temp DB; 7; 3.5 MB; 0.04GB
SAN Mount Point 1: 340+68 (+20%) = 408 GB

6) System Master DB; N/A; 4.0 MB
7) System Model DB; N/A; 1.3 MB
8) MSDBData DB; N/A; 13 MB
9) System Temp DB; N/A; 235 MB
Estimated disk space for the system databases: 5GB 25GB
SAN Mount Point 2: 30+6 (+20%) = 36 GB

Transaction Logs
10) Ops Mgr DW DB Transaction Logs
11) Ops Mgr DB Transaction Logs
12) Opalis DB Transaction Logs
13) Reports Server DB Transaction Logs
14) Reports Server Temp DB Transaction Logs
15) Master DB Transaction Logs
16) Model DB Transaction Logs
17) Temp DB Transaction Logs
18) MSDB Data DB Transaction Logs
Grooming interval for 10) to 18): Autogrow 200MB each
Present conf. disk space, in the order listed: 30000 MB, 435 MB, 22 MB, 35 MB, 1 MB, 0.5 MB, 77 MB, 7 MB
Estimated disk space: 130 GB


SAN Mount Point 3: 130+26 (+20%) = 156 GB
Quorum Drive (SAN Mount Point 4): 0.5GB
Required disk space total: Circa 600GB

http://www.simple-talk.com/sql/database-administration/estimating-disk-space-requirements-for-databases/
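
The mount point totals in the sizing table can be cross-checked with a short calculation. This is a sketch only: the base figures (340, 30 and 130 GB), the 20% headroom and the 0.5 GB quorum drive are taken from the table above, while the dictionary layout and function names are illustrative assumptions.

```python
# Base sizes in GB and headroom in percent, as listed in the sizing table.
MOUNT_POINTS = {
    "SAN Mount Point 1": (340, 20),
    "SAN Mount Point 2": (30, 20),
    "SAN Mount Point 3": (130, 20),
}
QUORUM_GB = 0.5  # SAN Mount Point 4 (quorum drive)


def with_headroom(base_gb: float, headroom_pct: float) -> float:
    """Return the base size plus the given percentage of headroom."""
    return base_gb + base_gb * headroom_pct / 100


def total_required_gb() -> float:
    """Sum all mount points (with headroom) plus the quorum drive."""
    sized = (with_headroom(base, pct) for base, pct in MOUNT_POINTS.values())
    return sum(sized) + QUORUM_GB


if __name__ == "__main__":
    for name, (base, pct) in MOUNT_POINTS.items():
        print(f"{name}: {with_headroom(base, pct):.0f} GB")
    print(f"Total: {total_required_gb()} GB")
```

The three mount points come to 408, 36 and 156 GB, and the total of 600.5 GB agrees with the "circa 600GB" figure in the table.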

6. Network Recommendations
Network usage measurements and response times were analysed against a snapshot of data taken over a period of one hour in the pilot deployment of SCOM in ASMB. It is recommended that two devices share the same access switch so that traffic flows are carried on the switch fabric and not across the LAN via the distribution switch. Network latency was shown to impact performance, which presents an element of risk; co-locating the devices would eliminate this risk. For details of findings and network requirements analysis, please refer to the following document.

7. Storage Requirements
We assume the disk subsystems have at least the following capability: 125 random I/O operations per second per drive.

The database disks were sized based on the number of SCOM Agents, file sizes, 5 months of data, projected growth and data retention values in the present Pilot in ASMB. For example, the Operations Manager Data Warehouse file is currently 26796MB. Since July we have collected about 150 days (30*5) worth of data, and we actually only wish to keep 90 days worth of data.

26796/150 = 179MB (data per day)
90*179 = 16110MB (90 days worth of data)

In order to support the number of 500 Agents we need to multiply this number by 12.5:

16110*12.5 = 201375MB

Microsoft recommend adding 40% to this number for operations such as indexing etc.

201375/100 = 2014 (1%)
2014*40 = 80560 (40% of 201375)
201375 + 80560 = 281935MB

This final figure of 282GB is the projected disk space needed for the SCOM Data Warehouse.
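
The Data Warehouse calculation above can be reproduced step by step. This is a minimal sketch, not a sizing tool: the function and parameter names are illustrative assumptions, the inputs are the figures quoted in this section, and the intermediate values are rounded to whole MB at the same points as the worked figures (which is why the 40% overhead comes out as 80560 rather than the exact 80550).

```python
def estimate_dw_mb(current_mb: float = 26796,
                   days_collected: int = 150,
                   retention_days: int = 90,
                   agent_scale: float = 12.5,
                   overhead_pct: int = 40) -> float:
    """Project Data Warehouse size in MB, following the worked example."""
    per_day = round(current_mb / days_collected)   # 179 MB per day
    retained = per_day * retention_days            # 16110 MB for 90 days
    scaled = retained * agent_scale                # 201375 MB for 500 agents
    one_pct = round(scaled / 100)                  # 2014 MB (1%, rounded)
    overhead = one_pct * overhead_pct              # 80560 MB (40%)
    return scaled + overhead                       # 281935 MB


if __name__ == "__main__":
    total_mb = estimate_dw_mb()
    # Assumption: 1 GB is treated as 1000 MB, matching the 282GB figure.
    print(f"{total_mb:.0f} MB, about {round(total_mb / 1000)} GB")
```

Changing `agent_scale` or `retention_days` shows how sensitive the estimate is to the agent count and the grooming interval.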


In Assembly Blue there is generally less activity than in Production, but we have no yardstick to measure this by, so I would like to propose adding a further 20% buffer on top of the final total SAN disk space required (600GB). Our DBA was also consulted regarding the disk space requirements for the database files and transaction logs, along with the following websites:
http://www.simple-talk.com/sql/database-administration/estimating-disk-space-requirements-for-databases/
http://technet.microsoft.com/en-us/library/bb735402.aspx

8. Security
8.1 McAfee Exclusions for System Center Operations Manager

In order for System Center Operations Manager to run effectively, the following exclusions need to be made on the McAfee ePolicy Orchestrator server. They apply to managed (monitored) systems as well as to the management servers and the RMS server.

8.2 Operations Manager 2007 (management servers and agents):

These include the queue and log files used by Operations Manager. Both of these need to be excluded:
- C:\Program Files\System Center Operations Manager 2007
- D:\Program Files\System Center Operations Manager 2007\Health Service State\Health Service Store
- D:\Program Files\System Center Operations Manager 2007

8.3 Exclusion of File Type by Extensions

SQL Database Servers:
These include the SQL Server database files used by Operations Manager components as well as system database files for the master database and tempdb. Examples: MDF, LDF

Operations Manager 2007 (management servers and agents):
These include the queue and log files used by Operations Manager. Examples: EDB, CHK, LOG

SQL Database Servers:


These include the SQL Server database files used by Operations Manager components as well as system database files for the master database and tempdb. To exclude these by directory, exclude the directory for the LDF and MDF files. Examples:
- C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data
- D:\MSSQL\DATA
- E:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Log
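
To illustrate how the directory and extension exclusions combine, the sketch below checks whether a given file path would be covered by the lists in sections 8.2 and 8.3. It is an illustration of the intent only and an assumption about how the rules are read: it does not reproduce McAfee's own matching logic, and the names `EXCLUDED_DIRS`, `EXCLUDED_EXTENSIONS` and `is_excluded` are hypothetical.

```python
from pathlib import PureWindowsPath

# Directories listed in sections 8.2 and 8.3 of this document.
EXCLUDED_DIRS = [PureWindowsPath(p) for p in (
    r"C:\Program Files\System Center Operations Manager 2007",
    r"D:\Program Files\System Center Operations Manager 2007",
    r"C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data",
    r"D:\MSSQL\DATA",
    r"E:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Log",
)]

# File extensions listed in section 8.3.
EXCLUDED_EXTENSIONS = {".mdf", ".ldf", ".edb", ".chk", ".log"}


def is_excluded(path: str) -> bool:
    """Return True if the path matches an excluded extension or directory."""
    p = PureWindowsPath(path)
    if p.suffix.lower() in EXCLUDED_EXTENSIONS:
        return True
    # PureWindowsPath comparisons ignore case, as Windows paths do.
    return any(d == p or d in p.parents for d in EXCLUDED_DIRS)
```

For example, `is_excluded(r"D:\MSSQL\DATA\OperationsManager.mdf")` is covered both by its directory and by its extension, whereas a file elsewhere with an unlisted extension is not covered.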

8.4 Considerations

A group policy will be required to place the agent action account into the local administrators group.

Testing Guide
- Test and verify cluster failover configuration on both the SCOM database and the RMS servers
- Test agent failover configuration
- Test cumulative update process for clustered Root Management Server (please refer to the patching guide for further details)
- Analyse events from managed systems

8.5 Access Control Model

Required Accounts to be Created including Group Policy and Management Packs

The table below shows the accounts, permissions and group memberships that are required, including any Group Policy changes for the SCOM deployment. Management Pack account requirements are also included.

SCOM Account: Management Server Action Account (MSAA), SCOMMSAA
Permissions: Local Admin Access, Allow Log On Locally (Globally via Group Policy)
Group Memberships and Extra rights: Local Users group, Local Performance Users group. Password never expires, User cannot change password
Use: Collect information and run tasks on managed systems

SCOM Account: System Center Data Access, SCOMDA
Permissions: Local Admin Access, Allow Log On Locally (Globally via Group Policy)
Group Memberships and Extra rights: Local Users group, Local Performance Users group. Register the SPN with Active Directory: grant the service account's SELF property the right to register and update SDK. Password never expires, User cannot change password
Use: Collect information and run tasks on managed systems


SCOM Account: System Center Management Configuration (data access and config), SCOMCDA
Permissions: Local Admin Access globally (via Group Policy)
Group Memberships and Extra rights: Password never expires, User cannot change password
Use: Runs services and writes data to Operational Database

SCOM Account: Data Reader, SCOMDR
Permissions: Local Admin on SCOM DB
Group Memberships and Extra rights: Password never expires, User cannot change password
Use: Query Reporting Services database

SCOM Account: Data Warehouse Write Action, SCOMDWWA
Permissions: Local Admin on SCOM DB
Group Memberships and Extra rights: Password never expires, User cannot change password
Use: Writes data to the data warehouse databases

SCOM Account: MS SQL Server Action Account, SCOMSQL
Permissions: Local Admin Access globally (via Group Policy)
Group Memberships and Extra rights: Password never expires, User cannot change password
Use: Collect information and run tasks on managed SQL systems

SCOM Account: SharePoint Agent Action Account, SCOMSPAA
Permissions: Local Admin Access
Group Memberships and Extra rights: Password never expires, User cannot change password
Use: Collect information and run tasks on managed SharePoint systems. To be confirmed

SCOM Account: SharePoint Pool, SCOMSPP
Permissions: Local Admin Access
Group Memberships and Extra rights: Password never expires, User cannot change password
Use: Advanced tasks for SharePoint. To be confirmed

SCOM Account: SharePoint Config, SCOMSPC
Permissions: Local Admin Access
Group Memberships and Extra rights: Password never expires, User cannot change password
Use: Runs services and writes data to Operational Database. To be confirmed

SCOM Account: Operations Manager Administrators accounts, SCOMADMIN
Permissions: On all SCOM servers (via Group Policy)
Group Memberships and Extra rights:
Use: Administrators accounts. Specific accounts to be confirmed

The following table shows the groups and members that are required by SCOM, including any Group Policy changes necessary.

SCOM Groups: Operations Manager Administrators Group
Members: SCOMADMIN, and any individual accounts for COLLAB 2nd level
Local Admin Access: On all SCOM Servers
Miscellaneous:


The following table shows Active Directory requirements for the SCOM deployment.

Additional: OPSMGR
Function: Organisational Unit in the Active Directory for general
Notes: Required

Additional: OPSMGRSQL
Function: Organisational Unit in the Active Directory for MS SQL
Notes: To be confirmed

Additional: OPSMGRSP
Function: Organisational Unit in the Active Directory for SharePoint
Notes: To be confirmed

Collaboration Support Team 2nd Level Access to the SCOM Administrator Console

For now, access to SCOM for the Collaboration Support Team will be facilitated via RDTABs, as with all other monitoring tools. All Collaboration Support Team members will have full SCOM administrator access and all the rights that go with this level of access.

Additional Management Packs to be imported

Management Packs covering a range of functions are installed by default; these are out of scope for this document. The following is a list of additional Management Packs to be imported into SCOM. The requirements for these Management Packs, if any, are covered in the previous section, Required Accounts to be Created including Group Policy and Management Packs:
- Microsoft SharePoint 2010 Products
- Microsoft SharePoint Foundation 2010
- Microsoft.Office.LiveCommunicationsServer.2005
- Microsoft.SQLServer.2008.Monitoring
- SQL Server 2005 (Monitoring)
- Windows Server 2003 Operating System
- Windows Server 2008 Operating System
- Windows Server 2008 Cluster Management
- Windows Server 2003 Cluster Management
- Windows Server Internet Information Services 2003
- Respective Core Files and Libraries on which all of the above depend


9. Capacity Forecast & Schedule Availability via PAWZ


Capacity management is implemented through the standard X PAWZ trend analysis tool. Metrics should be reviewed monthly through the standard under watch process. This should equate to a monthly report issued by the Global capacity management group and a subsequent review session to discuss any negative trends and the resulting mitigation activities. All servers will have at least 5GB free allocated for performance logs. This section is a work in progress, as the old SCOM Pilot in the ASMB environment cannot be used to collect accurate metrics to fill in the Capacity Risk Proforma. Accurate metrics will be available as soon as the SCOM deployment in ASMB is finished.
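
The 5GB free-space allowance for performance logs can be spot-checked on a server with a few lines of script. This is a sketch under stated assumptions: it is not part of the PAWZ tooling, the function name `has_min_free` is hypothetical, and "GB" is read here as 1024^3 bytes.

```python
import shutil

GIB = 1024 ** 3  # assumption: 1 GB is treated as 1024^3 bytes


def has_min_free(path: str, min_free_gb: float = 5) -> bool:
    """Return True if the volume holding `path` has at least min_free_gb free."""
    return shutil.disk_usage(path).free >= min_free_gb * GIB


if __name__ == "__main__":
    # "." checks the volume holding the current working directory.
    print("5GB free for performance logs:", has_min_free("."))
```

On a Windows server the `path` argument would be the drive that holds the performance logs, for example `"D:\\"`.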

10. Disaster Recovery Backup & Restore


This section outlines the process required to back up and restore the clustered Root Management Server and the clustered SQL database. Tivoli Storage Manager (TSM) will be used to back up the SCOM environment. The backup requirements for a clustered SCOM are as follows:

10.1 Backup SCOM Databases


OperationsManager (Operational Database)
OperationsManagerDW (Data Warehouse Database)
Ops Mgr DB Transaction Logs
Ops Mgr DW DB Transaction Logs
ReportServer (Reporting Server Database)
ReportServerTempDB (Reporting Server Temporary Database)
Master (SQL Server Master Database)
MsDbData (Msdb database)

Other components

Internet Information Services (IIS) 7.0 Metabase
Internet Information Services (IIS) 7.0 configuration
Root Management Server Encryption Key
List of all Management Packs installed (stored on shared network drive)
Backup of unsealed (customised) Management Packs
SCOM Registry Keys

10.2 Backup software Requirements


Each server requires the Backup Archive TSM client to be installed. The clustered SQL database servers and clustered RMS servers also require the TSM SAN storage client to be installed.

10.3 List of servers to be backed up


Clustered Root Management Server x2 servers


Clustered SQL Server x2 servers

Backup requirements for the Root Management Cluster

The RMS cluster has shared resources on a shared drive. The following items need to be backed up:

Root Management Server encryption key: this key allows the management group to function.
All Management Packs which are currently installed: the PowerShell command Get-ManagementPack | Export-Csv c:\ManagementPackList-Nov-2011.csv will generate a comprehensive list of the management packs that SCOM is currently using. All unsealed management packs need to be backed up as well; these can be exported to the network shared drive within the cluster on a regular basis.
Gateway server certificate: this should be exported to the shared network drive so that it can be backed up.
SCOM registry key Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\: export the registry key to a folder on the shared network drive within the cluster. The registry key can then be backed up by TSM with the rest of the files and restored to a new installation of SCOM if necessary.
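Before TSM backs up the shared drive, it may be useful to confirm that every exported item is actually present. The sketch below is illustrative only: the file names in REQUIRED_ARTIFACTS are hypothetical placeholders (this document names the artefact types, not the files), and the folder argument stands for the export folder on the shared network drive.

```python
from pathlib import Path

# Hypothetical file names; only the artefact types come from this section.
REQUIRED_ARTIFACTS = [
    "rms-encryption-key.bin",    # Root Management Server encryption key
    "ManagementPackList.csv",    # exported list of installed management packs
    "gateway-certificate.pfx",   # exported gateway server certificate
    "operations-manager.reg",    # exported SCOM registry key
]


def missing_artifacts(folder, required=REQUIRED_ARTIFACTS):
    """Return the required file names that are not present in `folder`."""
    base = Path(folder)
    return [name for name in required if not (base / name).is_file()]
```

An empty list means all expected exports are in place; any names returned point to an export step that still has to be run.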

10.4 Clustered SQL Server


All the SCOM databases will need to be backed up by TSM. The list in section 10.1 shows all the databases that need to be backed up.

Management Servers

The management servers can be rebuilt fairly quickly, so no backups of these servers are required. Active Directory integration will be used to allow agent failover, so if a management server goes down the secondary server takes over the monitored systems that the failed server was monitoring.

Gateway Server

The gateway server does not require backing up, as it will be a virtual machine and will carry all the inbuilt resiliency of VMware. If, however, a restoration of the gateway server is required, then the certificate stored on the root management server will need to be imported to any new gateway server.

X Backup Strategy for SCOM as Recommended by Database Administrator

Weekly: full backup of all SCOM databases
Daily: differential backup of all SCOM databases
Hourly: log backup in between database differential backups

Please refer to the xxx for further details on the specific backup process.
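The weekly, daily and hourly tiers can be expressed as a simple selection rule. This is a sketch under stated assumptions: the document does not say on which day or at what hour the full and differential backups run, so Sunday and midnight are hypothetical choices, as is the function name.

```python
from datetime import datetime

FULL_WEEKDAY = 6  # assumption: weekly full backup on Sunday (Monday is 0)
DAILY_HOUR = 0    # assumption: full and differential jobs run at midnight


def backup_type(run_time: datetime) -> str:
    """Return which backup an hourly job should take at `run_time`."""
    if run_time.hour == DAILY_HOUR:
        if run_time.weekday() == FULL_WEEKDAY:
            return "full"          # weekly full backup
        return "differential"      # daily differential backup
    return "log"                   # hourly log backup in between
```

For example, backup_type(datetime(2011, 11, 24, 13)) selects a log backup, because 13:00 falls between two daily runs.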

11. Maintenance and Best Practices


System Centre Operations Manager 2007 will start to accumulate large amounts of data after the system has been deployed into the X Messaging Environment.


To limit service interruption and protect the Operations Manager environment, we will further develop and implement a comprehensive and effective maintenance plan. This plan is based mainly on best practices from Microsoft, external website sources and our own experience, ensuring effective ongoing maintenance of our Operations Manager environment to improve performance and minimise the chances of failure. Our maintenance will include the following:

Regular monitoring of both software and hardware.
Frequent backups of databases and other critical data so that they can be restored later in case of failure.

Please refer to the xxx for further details on the maintenance schedule and best practice recommendations by Microsoft.

12. Additional Costs


Name of item                                               Cost per unit   Total
Server RAM Upgrades for 8 x DL360 (HP Part# 397415-B21)    500             4000

13. Risks & Issues


One virtual Gateway server will be deployed to take advantage of built-in VMware resiliency, specifically dynamic virtual hardware provisioning and the ability to create snapshots. These features will provide the ability to recover rapidly from most failures, and running a single Gateway server is regarded by us as an accepted risk.
Annual leave of staff due to the time of year may delay the delivery of key milestones.
BAU takes priority over project work; in the event of a major incident, resources may be required to assist in service restoration.
