SCOM Design
Synopsis:
System Center Operations Manager (SCOM) is an end-to-end service monitoring solution that can monitor clients, events, services, applications and network devices. It integrates with Microsoft products such as Live Communications Server 2005, MS SQL and SharePoint.
Segment:
EMEA
Authors: Mr X.
Contributors: Mr Y.
PIA - SCOM
V2.3 Draft
Contents
1. Project Summary
1.1 References
1.2 Change History
1.3 Glossary
2. Business Context
2.1 SCOM Pilot
2.2 Other Key Benefits
2.3 Scope
3. SCOM Architectural Design
3.1 Original Design in ASMB
3.2 Proposed Design
3.3 System Center Operations Manager Data Flow Diagram
3.3a Agent to Server Communication
3.4 Previous MOM Installation in Interchange
3.5 Microsoft's Recommendations
3.6 Deployment Plan
3.7 Testing Schedule
4. Project Requirement Analysis
4.1 Requirements
4.2 Admin Requirements
4.3 Equipment List
5. Database Sizing
5.1 Database Sizing and Design
6. Network Recommendations
7. Storage Requirements
8. Security
8.1 McAfee Exclusions for System Center Operations Manager
8.2 Operations Manager 2007 (management servers and agents)
8.3 Exclusion of File Type by Extensions
8.4 Considerations
8.5 Access Control Model
9. Capacity Forecast & Schedule Availability via PAWZ
10. Disaster Recovery Backup & Restore
10.1 Backup SCOM Databases
10.2 Backup Software Requirements
10.3 List of Servers to Be Backed Up
10.4 Clustered SQL Server
11. Maintenance and Best Practices
12. Additional Costs
13. Risks & Issues
1. Project Summary
SCOM is a Microsoft software solution which will provide end-to-end service monitoring of X messaging environments. It integrates with Microsoft products such as Live Communications Server 2005, MS SQL, Interchange and SharePoint.
1.1 References
No. | Document | Date | Author
1 | | |
2 | | |
3 | | |
4 | | |
5 | | |
1.2 Change History
Ver | Date | Author | Key Changes
0.1 | | |
0.2 | | |
0.3 | | |
0.4 | | |
1.3 Glossary
Term | Definition
SCOM | System Center Operations Manager
Collab | Collaboration
LCS | (Microsoft) Live Communication Server
RMS | Root Management Server
ASMB | Assembly Blue
TDP | Tivoli Data Protector
IOPS | Input/Output Per Second
ASM | Assembly Environment
Prod | Production Environment
VM | Virtual Machine
LAN | Local Area Network
2. Business Context
GMI is the solution that currently monitors the Messaging environment; however, its monitoring capabilities are limited. GMI cannot report at the granularity required for critical business systems, nor can it provide a comprehensive view of the health of the environment. SCOM provides tools to manage the environment in its entirety and can integrate with existing tools such as GMI.
Key benefits:
Improve Service Health
Improve service health while driving alignment with business SLAs. SCOM provides easy-to-use reporting and authoring capabilities. It gives full visibility of service health and allows more proactive monitoring, reducing the likelihood of service-impacting issues.
Unify Management of Complete Messaging Environment
Visibility across platforms, applications and components. SCOM provides a single view, including application and infrastructure components, across Windows and non-Windows environments.
Dynamically Respond to Changes
SCOM can respond dynamically to changes with automated actions to ensure continued service performance and availability. Actions to remediate a service can be taken directly from the console, making it easy to restore the service to full health in an operationally efficient manner.
2.1 SCOM Pilot
An initial SCOM pilot within the ASMB environment produced the following findings:
- SCOM uses a hierarchical, overall-health model in which related alerts are associated with each other to help identify the underlying issue. GMI does not currently offer this.
- SCOM alerts on minor and major issues that have not yet become critical or interrupted the normal operation of a configuration item.
- Alerts are visible at the top level of the hierarchy, and it is possible to drill down to the constituent parts causing the alert. Identifying these failing items early allows a failure to be pre-empted before it becomes service impacting.
2.2 Other Key Benefits
Integration with GMI
Using Microsoft Orchestrator / Opalis, SCOM is able to integrate with GMI Webtop. SCOM can take alerts and pass them to GMI, eliminating the need for an additional monitor.
Automation
SCOM has native automation capabilities, which are further enhanced by the presence of Microsoft Orchestrator / Opalis.
Economy of implementation
The solution uses existing, non-cutting-edge hardware. Drawing on lessons learnt and bugs already resolved, the third implementation of the software should be faster and more mature.
Diverse monitoring capability
SCOM can use third-party management packs to monitor many third-party products, such as Solaris, Linux and Cisco switches.
Built-in knowledgebase
The built-in knowledgebase contains common problems and their resolutions, as well as common fixes. The knowledgebase is expandable and easily accessible through the SCOM Console interface.
2.3 Scope
In scope:
- SCOM monitoring solution in ASMB
- SCOM monitoring solution in PROD
Out of scope:
- SCOM via Opalis to GMI
- Handover to 2nd Level
- Handover to TPH
- SCOM monitoring solution in ASMB DR
- SCOM monitoring solution in PROD DR
3. SCOM Architectural Design
3.1 Original Design in ASMB
The above diagram illustrates the original design concept (see Figure 1). This configuration uses a single Root Management Server virtual machine and a single physical database server, chosen for ease of implementation in conjunction with the Lync project. The design did not meet the requirements for implementation in the production environment. It had the following flaws:
Single points of failure
The ASMB concept had two single points of failure, the database and the RMS. Neither had any inbuilt resiliency, such as clustering or additional management servers. This has been addressed in the new design.
Little scope for growth
There was very little scope to grow the number of systems monitored, as the RMS performed poorly and did not seem able to handle the load efficiently. (Please see section three.)
This pilot will remain in ASMB for further analysis during the design and planning stage of the project. Once this stage is complete, the pilot will be replaced with a proposed design which eliminates the flaws identified in the original design.
3.2 Proposed Design
The new design illustrated above considers:
High Availability
The clustered configuration of the new design provides extra redundancy in the event of a failure; the present installation in Assembly Blue does not. Its single Root Management Server, single database and single Opalis installation are all single points of failure, which is unacceptable for a production installation.
Resilience
The new design also has two extra management servers, which provide monitoring redundancy. Active Directory integration makes it possible for agents to fail over automatically to another management server in the management group if their assigned management server fails.
Use of VMware
The Gateway Server is a virtual machine and has all the resiliency offered by the VMware vSphere infrastructure. The Gateway Server will communicate with the management server using certificate-based authentication. If the management server that communicates with the gateway server fails, the gateway server will fail over to the other management server, which then takes over monitoring of systems on the untrusted domain.
3.3 System Center Operations Manager Data Flow Diagram
3.4 Previous MOM Installation in Interchange
The current MOM configuration does not offer the level of monitoring required to examine the environment and provide a proactive level of support. A number of issues may be going unnoticed and subsequently impacting service. However, an analysis of the previous MOM installation in Interchange shows that parts of it can be salvaged to form the new SCOM design. The MOM management server will no longer be required, as Interchange will be monitored via the SCOM Gateway and Root Management Servers. Agents will need to be reinstalled, as there is no upgrade path available from MOM 2005 to SCOM 2007. The previous MOM configuration is then to be decommissioned.
3.5 Microsoft's Recommendations
SCOM via VM
Running SCOM on VMs is not a solution Microsoft recommends. The present pilot setup includes one Root Management Server on a VM and one Database Server on a physical machine. Microsoft recommends against using VMs for either server component for performance reasons.
Sluggish performance on a VM
Response times suffer when VMs are used, even though the network is more than able to support SCOM and the database has been tuned to use separate physical disks. The SCOM Root Management Server has a high IOPS requirement which, according to Microsoft, makes it unsuitable for VMs.
Fault tolerance
Microsoft recommends adding separate management servers dedicated to passing data to the database, to ease the load on the Root Management Server. We have also opted to cluster both the SCOM Database and Root Management Servers for added fault tolerance. Opalis will be installed on the Root Management Server as in the old design; being on a cluster, it will benefit from the fault tolerance that is missing from the old model.
Untrusted Domain
There is no firewall between Interchange and RM, but because Interchange is on another domain a gateway server will be required. The native SCOM Gateway will provide secure, encrypted communication between Messaging and Interchange using mutual certificate authentication over port 5723.
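Before deploying the gateway it can be useful to confirm that port 5723 is reachable from the gateway host. The following is a minimal sketch in Python, not part of the SCOM tooling: the function name `port_is_reachable` and the host name in the usage note are hypothetical, and the check only confirms that a TCP connection can be opened. It does not validate the certificates or the mutual authentication itself.

```python
import socket


def port_is_reachable(host: str, port: int = 5723, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This is a plain connectivity probe; it does not perform the
    certificate-based mutual authentication used by the SCOM Gateway.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

A call such as `port_is_reachable("management-server.example")` (a placeholder host name) would return False if the port is blocked or the host cannot be resolved.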
3.6 Deployment Plan
STAGE NAME
Initiation
- Objectives
- Scope
- Stakeholders
- PIA Document
Planning & Design
- H/W & S/W Requirements
- High Level Design
- Capacity
- Maintenance
- Storage
- Database
- Network
- Security
- Access Control Model
- Test Plan
Days 5
25
28/11/2011
04/01/2012
PHASE 1: Implementation ASMB
- Base Build
- Installation
- Configuration
- Testing
PHASE 2: Implementation PROD
- Base Build
- Installation
- Configuration
- Testing
PHASE 3: SCOM via Opalis to GMI
Handover to 2nd Level
Handover to TPH
ASMB DR Build
PROD DR Build
30
03/01/2012
10/02/2012
09/01/2012
24/02/2011
Out of Scope
TBC
TBC
Out of scope
TBC
TBC
10
March 2012
April 2012
3.7 Testing Schedule
Test 1
Reason for test: To test cluster failover and check service availability.
Expected result: There should be no interruption to services on the RMS server and the SQL server.
Result achieved (Success/Failure):
Test 2
Reason for test: Testing to see if monitored systems fail over to an alternate management server if the primary management server goes down.
Expected result: Monitored systems should fail over to the secondary management server when the first management server is switched off.
Result achieved (Success/Failure):
Test 3
Reason for test: Testing the cumulative update process for the RMS and verifying that the RMS is at the correct patch level.
Expected result: Cumulative Update 5 should be applied to the RMS cluster.
Result achieved (Success/Failure):
Test 4
Reason for test: Comparing events generated in SCOM systems.
Result achieved (Success/Failure):
4. Project Requirement Analysis
4.1 Requirements
Operational Requirements
- The system SHALL provide a high-availability solution.
- The system SHALL provide resilience.
- The system SHALL have external support agreements.
- The system SHALL have backup and restore processes.
Environment Requirements
- ID 011: The system SHALL adhere to X hardware and software standards.
- ID 012: The system SHALL include a test / non-production environment.
4.2 Admin Requirements
There are no direct administration requirements for the system other than the predefined static configuration files that are part of the overall install package. All user details and access should be managed by external capabilities (Collaboration Admin Portal / Siebel / eSpresso / AD).
4.3 Equipment List
ASMB
LCS Production
Root Management Server (Clustered)
- 2x HP DL360 G5
- 16 GB RAM min.
- 60 GB native hard disks
- SAN storage 136 GB
Root Management Server (Clustered/Non Clustered)
- 2/1x HP DL360 G5
- 16 GB RAM min.
- 60 GB native hard disks
- SAN storage 136 GB
DB Server (Clustered/Non Clustered)
- 2/1x HP DL585 G5
- 32 GB RAM
- 60 GB native hard disks
- SAN storage 500 GB (see sizing based on 30 days data and 500 agents)
Management Servers (Non Clustered)
- 2x HP DL360 G5 or equivalent
- 60 GB native hard disks
- 12 GB RAM min.
Gateway Server
- 1x Virtual Machine
- 8 GB RAM
- 60 GB HDD min.
DB Server (Clustered)
- 2x HP DL585 G7
- 32 GB RAM
- 60 GB native hard disks
- SAN storage 500 GB (see sizing based on 30 days data and 500 agents)
Management Servers (Non Clustered)
- 2x HP DL360 G5 or equivalent
- 60 GB native hard disks
- 12 GB RAM min.
Gateway Server
- 1x Virtual Machine
- 8 GB RAM
- 60 GB HDD min.
5. Database Sizing
5.1 Database Sizing and Design
The following table shows the amount of space used by the same SCOM SQL databases in the present proof of concept in ASMB. These figures were used to work out how much space would be required for a SCOM environment with 500 Agents.
Database disk space estimation
Agent count: 277 physical + 125 VMs + 26 Interchange, plus room for error and growth. We will work with the number of 500 Agents to allow for growth, but the actual number will be nearer 420 Agents.
Table columns: DB Component; Grooming Interval (days); Present conf. disk space (MB) for 40 Agents; Estimated disk space (GB) for 500 Agents (40 Agents x 12.5 = 500)
Databases (SAN Mount Point 1)
1) Ops Mgr DW DB: 201375 + 80560 (+40%) = 282 GB
2) Ops Mgr DB: 35000 + 14000 (+40%) = 49 GB
3) Opalis DB: 4437.5 + 1775 (+40%) = 7 GB
4) Reports Server DB: 0.10 GB
5) Reports Server Temp DB: 0.04 GB
SAN Mount Point 1 total: 340 + 68 (+20%) = 408 GB
System databases (SAN Mount Point 2)
6) System Master DB
7) System Model DB
8) MSDBData DB
9) System Temp DB
Transaction logs (SAN Mount Point 3), each set to autogrow by 200 MB
10) Ops Mgr DW DB Transaction Logs
11) Ops Mgr DB Transaction Logs
12) Opalis DB Transaction Logs
13) Reports Server DB Transaction Logs
14) Reports Server Temp DB Transaction Logs
15) Master DB Transaction Logs
16) Model DB Transaction Logs
17) Temp DB Transaction Logs
18) MSDB Data DB Transaction Logs
Quorum Drive (SAN Mount Point 4)
Required disk space total
http://www.simple-talk.com/sql/database-administration/estimating-disk-space-requirements-for-databases/
6. Network Recommendations
Network usage and response times were analysed from a one-hour snapshot of data in the pilot deployment of SCOM in ASMB. It is recommended that two devices share the same access switch so that traffic flows are carried on the switch fabric and not across the LAN via the distribution switch. Network latency was shown to affect performance, so co-locating the devices would eliminate this risk. For details of the findings and the network requirements analysis, please refer to the following document.
7. Storage Requirements
We assume the disk subsystems have at least the following capability: 125 random I/O operations per second per drive.
The database disks were sized based on the number of SCOM Agents, file sizes, 5 months of data, projected growth and the data retention values in the present pilot in ASMB. For example, the Operations Manager Data Warehouse file is currently 26796 MB. Since July we have collected about 150 days (30*5) of data, and we only wish to keep 90 days of data.
26796 / 150 = 179 MB (data per day, rounded)
90 * 179 = 16110 MB (90 days of data)
To support 500 Agents we need to multiply this number by 12.5:
16110 * 12.5 = 201375 MB
Microsoft recommends adding 40% to this number for operations such as indexing:
201375 / 100 = 2014 (1%, rounded)
2014 * 40 = 80560 (40% of 201375)
201375 + 80560 = 281935 MB
This final figure of 282 GB is the projected disk space needed for the SCOM Data Warehouse.
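The calculation above can be repeated for the other databases with a small helper. The following is a minimal sketch in Python; the function name `estimate_db_size_mb` and its parameter names are illustrative and not taken from any SCOM tool. It deliberately follows the same rounding steps as the worked example, so the intermediate figures match the ones shown.

```python
def estimate_db_size_mb(current_mb, days_collected, retention_days,
                        agent_multiplier, overhead_percent):
    """Project database size in MB using the steps of the worked example."""
    # Average data collected per day, rounded to a whole MB (26796 / 150 -> 179).
    per_day_mb = round(current_mb / days_collected)
    # Data held for the retention period (179 * 90 -> 16110).
    retained_mb = per_day_mb * retention_days
    # Scale from the pilot agent count to the target count (16110 * 12.5 -> 201375).
    scaled_mb = retained_mb * agent_multiplier
    # One percent of the scaled figure, rounded as in the document (-> 2014).
    one_percent_mb = round(scaled_mb / 100)
    # Overhead for indexing and similar operations (2014 * 40 -> 80560).
    overhead_mb = one_percent_mb * overhead_percent
    return int(scaled_mb + overhead_mb)


# Data Warehouse figures from the pilot: 26796 MB over 150 days,
# 90 days retention, 40 -> 500 agents (x12.5), 40% overhead.
dw_mb = estimate_db_size_mb(26796, 150, 90, 12.5, 40)
print(dw_mb)  # 281935
```

Dividing the result by 1000 gives the rounded 282 GB figure used in the sizing table.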
In Assembly Blue there is generally less activity than in Production, but we have no yardstick to measure this by, so I would like to propose adding a further 20% buffer on top of the final total SAN disk space required (600GB). Our DBA was also consulted regarding the disk space requirements for the database files and transaction logs, along with the following websites:
http://www.simple-talk.com/sql/database-administration/estimating-disk-space-requirements-for-databases/
http://technet.microsoft.com/en-us/library/bb735402.aspx
8. Security
8.1 McAfee Exclusions for System Center Operations Manager
For System Center Operations Manager to run effectively, the following exclusions need to be made on the McAfee ePolicy Orchestrator server. They apply to managed (monitored) systems as well as to the management servers and the RMS server.
8.2 Operations Manager 2007 (management servers and agents)
These include the queue and log files used by Operations Manager. The following directories need to be excluded:
C:\Program Files\System Center Operations Manager 2007
D:\Program Files\System Center Operations Manager 2007\Health Service State\Health Service Store
D:\Program Files\System Center Operations Manager 2007
8.3 Exclusion of File Type by Extensions
SQL Database Servers: These include the SQL Server database files used by Operations Manager components as well as system database files for the master database and tempdb. Examples: MDF, LDF
Operations Manager 2007 (management servers and agents): These include the queue and log files used by Operations Manager. Examples: EDB, CHK, LOG
SQL Database Servers (exclusion by directory): These include the SQL Server database files used by Operations Manager components as well as system database files for the master database and tempdb. To exclude these by directory, exclude the directories that hold the LDF and MDF files. Examples:
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data
D:\MSSQL\DATA
E:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Log
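To sanity-check an exclusion list before it is entered into the antivirus policy, the directory and extension rules above can be expressed as a simple matcher. The following is a minimal sketch in Python under stated assumptions: it is not McAfee configuration syntax, the names `EXCLUDED_DIRS`, `EXCLUDED_EXTENSIONS` and `is_excluded` are hypothetical, and only some of the example paths from this section are included.

```python
from pathlib import PureWindowsPath

# Example directories taken from this section (an illustrative subset).
EXCLUDED_DIRS = [
    r"C:\Program Files\System Center Operations Manager 2007",
    r"D:\Program Files\System Center Operations Manager 2007",
    r"D:\MSSQL\DATA",
]

# Extensions listed in this section, in lower case for comparison.
EXCLUDED_EXTENSIONS = {".edb", ".chk", ".log", ".mdf", ".ldf"}


def is_excluded(path: str) -> bool:
    """Return True if the path matches a directory or extension exclusion."""
    candidate = PureWindowsPath(path)
    # Extension rule: compare the file suffix case-insensitively.
    if candidate.suffix.lower() in EXCLUDED_EXTENSIONS:
        return True
    # Directory rule: the file is excluded if it is the directory itself
    # or lies anywhere beneath it. PureWindowsPath compares without
    # regard to case, as Windows paths do.
    for directory in EXCLUDED_DIRS:
        base = PureWindowsPath(directory)
        if candidate == base or base in candidate.parents:
            return True
    return False
```

For example, `is_excluded(r"d:\mssql\data\OperationsManager.mdf")` matches on both the directory and the extension, while `is_excluded(r"C:\Windows\notepad.exe")` matches neither rule.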
8.4 Considerations
A group policy will be required to place the agent action account into the local administrators group.
Testing Guide
- Test and verify the cluster failover configuration on both the SCOM database and the RMS servers
- Test the agent failover configuration
- Test the cumulative update process for the clustered Root Management Server (please refer to the patching guide for further details)
- Analyse events from managed systems
8.5 Access Control Model
Required Accounts to be Created including Group Policy and Management Packs
The table below shows the accounts, permissions and group memberships that are required for the SCOM deployment, including any Group Policy changes. Management Pack account requirements are also included.
Table columns: SCOM Account; Permissions; Group Memberships and Extra rights; Use
Local Users group, Local Performance Users group.
Password never expire, User cannot change password
Local Users group, Local Performance Users group. Register the SPN with Active Directory grant service accounts SELF property the right to register and update SDK
Password never expire, User cannot change password
Collect information and run tasks on managed systems
Local Admin Access, Allow Log On Locally (Globally via Group Policy)
Local Admin Access, Allow Log On Locally (Globally via Group Policy)
System Center Management Configuration (data access and config): SCOMCDA
Data Reader: SCOMDR
Data Warehouse Write Action: SCOMDWWA
MS SQL Server Action Account: SCOMSQL
Sharepoint Agent Action Account: SCOMSPAA
Sharepoint Pool: SCOMSPP
Sharepoint Config: SCOMSPC
Password never expire, User cannot change password Password never expire, User cannot change password Password never expire, User cannot change password
Local Admin Access globally (Via Group Policy) Local Admin Access
Collect information and run tasks on managed Sharepoint systems. To be confirmed Advanced tasks for Sharepoint. To be confirmed Runs services and write data to Operational Database. To be confirmed Administrators accounts. Specific accounts to be confirmed
Password never expire, User cannot change password Password never expire, User cannot change password
The following table shows the groups and members that are required by SCOM, including any Group Policy changes necessary.
SCOM Group: Operations Manager Administrators
Group Members: SCOMADMIN, and any individual accounts for COLLAB 2nd level.
Miscellaneous: Local Admin Access on all SCOM Servers
The following table shows Active Directory requirements for the SCOM deployment.
Additional | Function | Notes
OPSMGR | Organisational Unit in the Active Directory for general | Required
OPSMGRSQL | Organisational Unit in the Active Directory for MS SQL | To be confirmed
OPSMGRSP | Organisational Unit in the Active Directory for Sharepoint | To be confirmed
So far, access to SCOM for the Collaboration Support Team will be provided via RDTABs, as with all other monitoring tools. All Collaboration Support Team members will have full SCOM administrator access and all the rights that go with this level of access.
Additional Management Packs to be imported
A range of default Management Packs is installed with SCOM; these are out of scope for this document. The following is a list of additional Management Packs to be imported into SCOM. The requirements for these Management Packs, if any, are covered in the previous section, Required Accounts to be Created including Group Policy and Management Packs:
- Microsoft SharePoint 2010 Products
- Microsoft SharePoint Foundation 2010
- Microsoft.Office.LiveCommunicationsServer.2005
- Microsoft.SQLServer.2008.Monitoring
- SQL Server 2005 (Monitoring)
- Windows Server 2003 Operating System
- Windows Server 2008 Operating System
- Windows Server 2008 Cluster Management
- Windows Server 2003 Cluster Management
- Windows Server Internet Information Services 2003
- Respective core files and libraries on which all of the above depend
Clustered SQL Server x2
Backup requirements for the Root Management Server cluster
The RMS cluster has shared resources on a shared drive. The following need to be backed up:
- Root Management Server encryption key. This key allows the management group to function.
- All management packs currently installed. The PowerShell command Get-ManagementPack | Export-Csv c:\ManagementPackList-Nov-2011.csv generates a comprehensive list of the management packs that SCOM is currently using. All unsealed management packs need to be backed up as well; these can be exported to the network shared drive within the cluster on a regular basis.
- Gateway server certificate. This should be exported to the shared network drive so that it can be backed up.
- The SCOM registry key Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\. Export the registry key to a folder on the shared network drive within the cluster. It can then be backed up by TSM with the rest of the files and restored to a new installation of SCOM if necessary.
11. Maintenance and Best Practices
To limit service interruption and protect the Operations Manager environment, we will further develop and implement a comprehensive and effective maintenance plan. This plan is based mainly on best practices from Microsoft, external website sources and our own experience. It ensures effective ongoing maintenance of our Operations Manager environment, improving performance and minimising the chance of failure. Our maintenance will include the following:
- Regular monitoring of both software and hardware.
- Frequent backups of databases and other critical data so that they can later be restored in case of failure.
Please refer to the xxx for further details on the maintenance schedule and best practice recommendations by Microsoft.