David M. DiQuinzio, P.E., Strategic Facilities, Inc.
Kathleen A. Lucey, FBCI, Montague Technology Management, Inc.
Session Agenda
Kathleen:
Introduction and ground rules
Physical Access Security
Incident Management
Introduction to SFI
David:
Reliability vs. Availability
The Players
Risk Assessment
Case Studies
GROUND RULES
Please interrupt immediately if you...
Can't hear
Can't see or read the slides
Find the presentation confusing
Let's address the situation ASAP!
Introduction
It's about working together to avoid the interruption and minimize both its recurrence and its impact...
Who are the players... and how do we work together?
Making the right decisions for design, detection, and response
Managing the incident to minimize impact and deter recurrence
BCARE SOLUTIONS
Continuity Availability Reliability
Engineering
Process Dysfunctions
What is a Perimeter?
Controlled border
External: Public / First Level. May be outside of building.
Second: Building Access. May include elevators and stairways.
Multiple interior: authorization related to function-based need to know
Defensive Depth
Multiple barriers to breach: make an intruder work harder
Multiple levels, multiple techniques
Multiple levels of monitoring and detection
Introduce random supplemental checks
Universal Application
Every time
Every person
Every control point
Weekdays, nights and weekends
Especially no official piggybacking
Why: keeps the bright line between authorized and unauthorized
Monitoring/Detection/Response
Monitoring: what conditions, when
Detection: manual, automatic, alarms; who is notified?
Response:
  Who, what, when
  How contacted
  Logistics and SLA
"I have a delivery for Mr./Ms. X."
Concealment within interior protected areas
Exploitation of known system flaws
Incident Management:
How to Get a High ROI
Incident Management
Players
Response Management
Debriefing and Documentation
Follow-up: Implementing Adjustments
Players
BC should be taking the LEAD
IT
Facilities: Internal, Building Management, + vendors, contractors
Physical Security: Internal, Building Management, + external contractors
Notify the most knowledgeable person for this case within the appropriate time interval
Eliminate single points of failure in the response through cross-coverage and training
Analysis and problem resolution lead to the design of the fix. The fix is then applied by the appropriate party, but...
THE FIX DOES NOT END HERE!
BREAK!!
Sjogren
11+ years at UPS as IT Facilities Director
Ramapo Ridge Data Center, Mahwah, NJ
Windward Data Center, Alpharetta, GA
Projects
Site Capacity, Reliability & Ops Analyses
Critical Systems Testing & Commissioning
Operating Procedures & Programs
New Critical Systems Technology Studies
Serve as Interim Facilities Department
Hughes
State Street
Cingular
Safeco
1st National Bank of Omaha
Salt River Project
PART 2
Facilities, Security, IT & BCP - Risk & Reliability as Common Threads
UNTIL NOW
...AND SO DO REGULATORS
Copyright 2004 Strategic Facilities Inc. All rights reserved
AERIAL VIEW
ACCEPTANCE CURVE
MORE LESSONS
RELIABILITY
What is the probability that a system will operate correctly?
Over what mission time?
Severity of failure is part of the risk conversation, not the reliability conversation
Duration of failure is also a separate variable
Duration is also part of the risk conversation and also NOT part of the reliability conversation
EMPIRICAL LIFETIME
THE GOAL...
WORTHWHILE? MAYBE...
MAYBE NOT...
LESSONS III
MORE RELIABILITY
Can be expressed as Mean Time To Failure (MTTF)
MTTF is OK, but lacks mission time context
Probability of success over mission time does a better job of depicting the situation
Probability of failure = 1 - (Probability of success)
Duration of failure known as Mean Time To Restore, or MTTR
Probability of success or failure of an individual system does not depend on MTTR
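A minimal sketch of why mission time matters, assuming a constant failure rate (the exponential model, which the slides do not specify). The function name and the 10-year MTTF are hypothetical values chosen for illustration only.

```python
import math

def reliability(mission_time_yrs: float, mttf_yrs: float) -> float:
    """Probability of success (no failure) over the mission time.

    Assumes a constant failure rate of 1/MTTF (exponential model);
    this is an illustrative assumption, not a claim from the slides.
    """
    return math.exp(-mission_time_yrs / mttf_yrs)

mttf = 10.0  # hypothetical MTTF in years
for t in (1, 5, 10):
    p_success = reliability(t, mttf)
    p_failure = 1.0 - p_success  # Probability of failure = 1 - (Probability of success)
    print(f"{t:>2}-year mission: P(success) = {p_success:.3f}, P(failure) = {p_failure:.3f}")
```

The same MTTF gives very different odds depending on the mission time, and MTTR does not appear anywhere in the calculation.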
LESSONS IV
AVAILABILITY
Different concept entirely
Comparison of MTTF & MTTR
Mathematically: Availability = MTTF / (MTTF + MTTR)
Grossly misused throughout industry in the form of "nines"; usually, MTTF >> MTTR
Misuse due to two-dimensional nature
Does not mean that MTTR and Availability do not matter
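To illustrate the two-dimensional nature of availability, here is a small sketch using the formula above. The two systems and their MTTF/MTTR values are hypothetical, picked only to show that one availability figure can hide very different failure behavior.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical system 1: fails often, restored quickly.
frequent_short = availability(mttf_hours=999.0, mttr_hours=1.0)

# Hypothetical system 2: fails ten times less often, but each outage lasts ten times longer.
rare_long = availability(mttf_hours=9990.0, mttr_hours=10.0)

print(f"Frequent, short outages: {frequent_short:.4f}")
print(f"Rare, long outages:      {rare_long:.4f}")
```

Both systems report the same "three nines", yet one has a one-hour outage roughly every 1,000 hours and the other a ten-hour outage roughly every 10,000 hours. Which is preferable depends on the mission, which the single number cannot express.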
AVAILABILITY - IT DEPENDS
System B
4 failures, avg. 1/2.5 yrs
Down 5 min each time
Reliability: MTTF = 9 yrs; only 1 sample
Availability: 90%
More reliable (?), less available
Less certain
LESSONS V
HOW SYSTEMS FAIL
Independently due to internal, local failure
Due to a common cause effect; that is, something that affects entire system at once
Natural or man-made disaster, for example; tend to be high severity, low frequency
Human error is most frequent common-cause failure mode; often less severe than disasters
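The difference between independent and common-cause failure can be sketched with basic probability. This example assumes a pair of redundant units where either one can carry the load; the redundancy, the function name, and all the numbers are hypothetical and are not taken from the slides.

```python
def redundant_pair_failure(p_unit: float, p_common: float) -> float:
    """Probability that a redundant pair fails over the mission time.

    p_unit:   chance that one unit fails on its own (independent, local failure)
    p_common: chance of a common-cause event that takes out both units at once
    Assumes the common-cause event is independent of the local failures.
    """
    both_fail_independently = p_unit ** 2
    return p_common + (1.0 - p_common) * both_fail_independently

independent_only = redundant_pair_failure(p_unit=0.01, p_common=0.0)
with_common_cause = redundant_pair_failure(p_unit=0.01, p_common=0.001)

print(f"Independent failures only: {independent_only:.6f}")
print(f"With common-cause event:   {with_common_cause:.6f}")
```

Under these assumed numbers, even a small common-cause probability dominates the result, because it bypasses the redundancy entirely.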
WWII to explain V2 rocket failures
Brought to USA by Wernher von Braun and his associates
Refined by many over the years since
PRA ACCOMPLISHMENTS
Aviation: Odds of you NOT getting off a commercial airliner in one piece are now less than one in one million
Nuclear Power: USA output is up 20% and reportable incidents down despite older fleet and no new plants since early '80s
Slow and steady improvement, not gee-whiz breakthroughs
Very limited application in Facilities arena
PART 3
Case Studies:
Using Risk & Reliability Language to Improve Coordination among Facilities, Security, IT & BCP
ANY SUGGESTIONS?
Q+A
Contact us at:
David M. DiQuinzio (973) 903-3699
DaveD@strategicfacilities.com
LATER, DUDES!!!