Risk Analysis in Water Drinking Systems

Techneau, June 2009
Methods for risk analysis of drinking water systems from source to tap
- Guidance report on Risk Analysis
© 2009 TECHNEAU. TECHNEAU is an Integrated Project funded by the European Commission under the Sixth Framework Programme, Sustainable Development, Global Change and Ecosystems Thematic Priority Area (contract number 018320). All rights reserved. No part of this book may be reproduced, stored in a database or retrieval system, or published, in any form or in any way, electronically, mechanically, by print, photoprint, microfilm or any other means without prior written permission from the publisher.
Colophon
Title: Methods for risk analysis of drinking water systems from source to tap - Guidance report on Risk Analysis
Authors: P. Hokstad (1), J. Røstum (1), S. Sklet (1), L. Rosén (2), T.J.R. Pettersson (2), A. Linde (2), S. Sturm (3), R. Beuken (4), D. Kirchner (6), C. Niewersch (6)
Affiliations: (1) SINTEF, (2) Chalmers University of Technology, (3) TZW, (5) KWR, (6) RWTH Aachen
Quality Assurance: By LNEC and KWR
Deliverable number: D 4.2.4
This report is: PU = Public
Contents
Summary
1 Introduction
  1.1 Objective and scope
  1.2 Content of the report
  1.3 The TECHNEAU generic framework for risk management
  1.4 Definitions
  1.5 Abbreviations
2 Risk analysis of drinking water systems - From source to tap
  2.1 Initiation and organisation of a complete risk analysis
  2.2 Relevant decision situations for water utilities
  2.3 System description
  2.4 Hazardous events
  2.5 Safety barriers
  2.6 Causes and consequences of hazardous events
  Risk estimation
3 Coarse risk analysis of water supply systems
  3.1 Identification of hazardous events
    3.1.1 Various approaches for hazard identification
    3.1.2 TECHNEAU Hazard Data Base (THDB)
  3.2 Risk estimation in Coarse Risk Analysis (CRA)
  3.3 Tool for Coarse Risk Analysis (CRA)
4 Quantification of risk
  4.1 The dimensions of risk and various ways to quantify risk
  4.2 Qualitative versus quantitative expressions for risk
  4.3 Risk measures for loss of water quality
  4.4 Risk measures for loss of water quantity (supply)
  4.5 Risk measured in monetary units
5 Data for risk analysis
  5.1 Introduction
  5.2 Data needs
  5.3 Data sources
    5.3.1 Types of data sources
    5.3.2 Failure event data bases
6 More advanced risk analysis methods for water supply systems
  6.1 Risk modelling and choice of a risk analysis method
  6.2 HAZOP
  6.3 Failure Modes, Effects and Criticality Analysis (FMECA)
  6.4 Removal efficiency of the water treatment system
  6.5 Fault Tree Analysis (FTA)
  6.6 Reliability Block Diagram (RBD)
  6.7 Human Reliability Analysis (HRA)
  6.8 Markov Analysis
  6.9 Cause-effect relations - Bayesian Networks
  6.10 Event Tree Analysis (ETA) - analysing consequences
  6.11 Methods for estimation of risk to human health (QMRA and QCRA)
  6.12 Methods for risk analysis of water quantity (supply)
  6.13 GIS as a tool in risk analysis
    6.13.1 Introduction
    6.13.2 GIS in catchment risk management
    6.13.3 GIS Assisted Risk Analysis - Description and application
    6.13.4 Main requirements
    6.13.5 Concluding comment
7 Summary of risk analysis methods
8 References
Appendix A. Main steps of a risk analysis
Appendix B. DALY and a generalisation
  Appendix B.1. DALY: An overall risk measure of health effects
  Appendix B.2. Generalisation of DALY. Combined measure for water quantity and quality
Appendix C. Two examples of FTA
  C.1 A Fault Tree Analysis of a UV system
  C.2 Integrated risk analysis: Fault-tree analysis to investigate causes of failures
Appendix D. Procedure and example of FMECA
Appendix E. Analyses to establish treatment and monitoring system
Appendix F. Some fundamental reliability concepts
Summary
This report is a deliverable of Work Area 4 (WA4), Risk Assessment and Risk Management, in the TECHNEAU project. The main objective of WA4 is to integrate risk assessments of the separate parts of drinking water supplies into a comprehensive decision support framework for cost-efficient risk management in safe and sustainable drinking water supply.

The present report gives an introduction to risk analysis methods for water supply, from source to tap. It describes which problems the various methods can address, and presents their capabilities, limitations and typical results. The report aims to inform staff of water utilities about the available methods for risk analysis. It refers to the TECHNEAU case studies and other literature for details on the various risk analysis methods, and also to the TECHNEAU Hazard Data Base (THDB) and to the TECHNEAU report, Generic framework and method for integrated risk management in water safety plans.

Various decision situations where the use of risk analysis is relevant are presented, demonstrating typical objectives for carrying out risk analyses. There could for instance be a need for:
- An initial analysis, prior to the start-up of a water utility, as a basis for design of the supply system;
- Cost/benefit considerations to identify the best risk-reducing alternative;
- An analysis driven by identified problems related to water quality or water availability;
- An analysis imposed by rebuilding or operational changes of the utility.
The main steps of a complete risk analysis are discussed:
- Scope / analysis objective
- System description
- Identification of hazards and hazardous events
- Estimation of risk (probabilities and consequences).
Consequences with respect to both water quality and water quantity are considered.
The report describes a Coarse Risk Analysis (CRA), which is a complete risk analysis including hazard identification and risk estimation, with the results presented in a risk matrix. This is a relatively simple analysis, and could be carried out by most water utilities with some assistance from risk analysts. Several of the other risk analysis methods, however, require deeper knowledge of and experience with risk analysis techniques, and should be carried out by professionals in close cooperation with water utility personnel. Several of these more complex analyses are described in the last part of the report, together with their possible objectives. These can for instance be to analyse the causes of hazardous/undesired events or the consequences of these events, to assess the availability of water to various consumers (distribution network analysis), to analyse the effect of human errors on system reliability, or to perform maintenance optimisation. A summary of all risk analysis methods is presented in Chapter 7.

The report discusses various ways to quantify relevant aspects of risk for water supply. The data needed to perform risk analyses are also described. The report does not treat the risk acceptance and risk evaluation steps of risk assessment in any detail.
1 Introduction
1.1 Objective and scope
The main objective of Work Area 4 (WA4), Risk Assessment and Risk Management, in TECHNEAU is [1]: to integrate risk assessments of the separate parts into a comprehensive decision support framework for cost-efficient risk management in safe and sustainable drinking water supply. A further goal of WA4 is to provide tools and guiding documents for water utilities carrying out risk assessment and risk management. A number of the tools are also tested in case studies and disseminated through training seminars. A conceptual schematic of the framework, guidance reports (like this one) and tools that are to be produced in WA4 is presented in Figure 1; the present guidance report corresponds to Methods for Analysing Risks.
Figure 1. A conceptual schematic of the framework, guides and tools produced in WA4, with this report, Methods for Analysing Risks, put into context.

This report is based on the TECHNEAU report [2], Generic Framework and Methods for Risk Management in Water Safety Plans, in which risk management is discussed and some risk analysis methods are described. The present report provides a more complete overview of risk analysis methods for water utilities. The main target groups of the report are management and operational personnel of water utilities with some basic understanding of the main concepts of risk management.

The present report aims at demonstrating the application of various risk analysis methods for water utility systems. Thus, it can serve as a guide to the management of a water utility regarding which analyses are most relevant in various decision situations. The report is intended to give insight into the capabilities, the use and the potential results of the various methods for risk analysis. However, the report will not give all details of the various methods, as this would require a full textbook, which is beyond the scope here. Thus, the water utilities (at least the smaller ones) may need support from risk analysts/consultants in order to apply many of the methods described here.

Within TECHNEAU we define the term water safety as "Water supply that protects water availability and human health with a high degree of practical certainty". This comprises both water quality and water quantity, and the report covers both these aspects of risk. The report focuses on risk analysis and will not go into any detail on risk evaluation or risk reduction and control; these will be presented in later reports. Risk evaluation has already been briefly discussed within WA4 [2].

Within WA4 six case studies are carried out at different water supply systems. In these case studies the applicability of various methods for risk analysis (RA) is tested. The results of these case studies have been integrated into this report and are described in more detail in the various case study reports, notably:
- Bergen: Coarse Risk Analysis (CRA) [65].
- Göteborg: Quantitative and probabilistic method based on a fault tree analysis [31].
- Amsterdam: Network simulation model and Bayesian belief networks [73].
- Freiburg-Ebnet: GIS (Geographic Information System) Assisted Risk Analysis (GARA method) [63].
- Březnice: CRA and a Failure Modes and Effect Analysis (FMEA) [74].
- Upper Nyameni: CRA and the South African Risk Evaluation Guidelines [75].
1.2 Content of the report
Figure 2 gives an overview of the various topics described in this report (Section/Chapter no. in parentheses). First some basic background information is given (see top box in Figure 2):
- The TECHNEAU framework for risk management, showing that risk analysis is an integrated part of risk management (Section 1.3).
- Definitions of risk analysis terms (Section 1.4).
- Abbreviations used (Section 1.5).

Further, the main tasks of risk analysis are described (middle boxes in Figure 2):
- Chapter 2 gives an overview of the integrated approach to a complete risk analysis of a water utility. First some relevant decision situations for carrying out a risk analysis are given. The analysis will include a description of the total system, identification of hazardous events and finally risk estimation.
- Chapter 4 describes various ways to quantify and measure different risks to a water utility. (In a risk analysis we may apply risk measures to quantify, i.e. express, the risk; thus, here we are not referring to measurements.)
- Chapter 5 discusses the data needed to carry out a risk analysis.

The lower box in Figure 2 lists the various risk analysis methods described in the report. First, Chapter 3 describes a complete coarse risk analysis (CRA), including both hazard identification and risk estimation. In a CRA the estimation is often restricted to a semi-quantitative risk estimate, giving probability and consequence categories. A CRA is often the first risk analysis to be carried out for a utility, and when a CRA has been carried out, the water utility should have a rough picture of the main risks. The CRA may, however, also reveal a need for more detailed analyses of critical events/subsystems.
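The semi-quantitative estimate of a CRA, as described above, can be illustrated with a small sketch. The category names, the 1-4 scales, the score thresholds and the example events are all hypothetical assumptions made for illustration; they are not taken from the TECHNEAU CRA tool, and a water utility would define its own categories and matrix.

```python
# Hypothetical 1-4 category scales (assumption, not the TECHNEAU categories).
PROBABILITY = {"rare": 1, "unlikely": 2, "likely": 3, "frequent": 4}
CONSEQUENCE = {"small": 1, "medium": 2, "large": 3, "catastrophic": 4}


def risk_level(probability: str, consequence: str) -> str:
    """Map a probability and a consequence category to a risk matrix level."""
    score = PROBABILITY[probability] + CONSEQUENCE[consequence]
    if score <= 3:      # hypothetical threshold
        return "low"
    if score <= 5:      # hypothetical threshold
        return "medium"
    return "high"


# Illustrative hazardous events: (description, probability, consequence).
events = [
    ("Faecal contamination of raw water source", "likely", "large"),
    ("Power failure at treatment plant", "unlikely", "medium"),
    ("Pipe burst in distribution network", "frequent", "small"),
]

# Rank the events so that the highest risks are listed first.
ranked = sorted(
    events,
    key=lambda e: PROBABILITY[e[1]] + CONSEQUENCE[e[2]],
    reverse=True,
)

for name, p, c in ranked:
    print(f"{risk_level(p, c):6s} {name}")
```

The ranking gives the rough picture of the main risks that a CRA is meant to provide; events ending up in the highest level would be candidates for the more detailed analyses.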
Figure 2. Overview of topics covered in the report (relevant Section/Chapter no. in parenthesis).
In Chapter 6 we describe several of the more advanced risk analysis methods that could be relevant for a more detailed investigation of the risk related to a water utility. Chapter 6 can thus be seen as Part II of the report, the advanced level of risk analysis, for readers who have some additional background in risk and reliability concepts. These more advanced techniques are therefore not aimed directly at ordinary water utility personnel, but at the interested reader. The advanced methods are (illustrated in the lower part of Figure 2):
- Hazard and operability (HAZOP) analysis, a rather elaborate method for hazard identification (Section 6.2).
- Failure Mode, Effects and Criticality Analysis (FMECA), a systematic way to identify and document the failure modes (and consequences of failure) of a specified system (Section 6.3).
- Analysis of the efficiency of treatment systems, in order to identify a proper method for water treatment applying FMECA (Section 6.4).
- Fault Tree Analysis (FTA), a systematic approach to break down a failure event into its causes/contributors (Section 6.5). Appendix C gives two examples of the use of FTA.
- Reliability Block Diagram (RBD), which gives very much the same information as a fault tree, but with a different graphical presentation of the result (Section 6.6).
- Human Reliability Analyses (HRA), various analyses to identify human errors and assess the consequences of these (Section 6.7).
- Markov Analysis, a somewhat detailed analysis that can be carried out for a system that can pass through various (performance) states. It can be relevant for maintenance analyses (Section 6.8).
- Bayesian networks, a mathematical technique used to analyse dependencies between variables. It can be used, for instance, to assess the effect of factors influencing the risk (Section 6.9).
- Event Tree Analysis (ETA), used to evaluate various outcomes (consequences) of an undesired (hazardous) event (Section 6.10).
- QMRA/QCRA, analysis techniques to estimate the effects on human health of microbiological or chemical hazards (Section 6.11).
- Distribution network analyses, methods to assess the performance of a distribution network. This type of analysis is mostly relevant for analysing water quantity, although in TECHNEAU (WA 5) research is carried out on modelling water quality (Section 6.12).
- GIS tools, allowing geographical representation and analysis of infrastructure assets and the tracking of associated hazards/risks (Section 6.13).
Note that a couple of these methods (Treatment efficiency and GIS tools) are usually not considered as risk analysis methods, but are included here as being important in risk analysis of water utilities. Conclusions, including an overview of the various risk analysis methods and the applicability of these methods, are given in Chapter 7. Finally note that an overview of the main steps of a risk analysis is presented in Appendix A; for each step giving references to the relevant sections of the report.
1.3 The TECHNEAU generic framework for risk management
The risk management process is illustrated in Figure 3. This presents the TECHNEAU generic framework for integrated risk management (see [2]), which includes the following main components:
- Risk analysis: In a risk analysis the various hazardous events related to the water utility are identified, and the corresponding risks are estimated. This is done by estimating e.g. the frequency of hazardous events and the various consequences of these events.
- Risk evaluation: The risk evaluation requires that a risk acceptance/tolerability criterion is defined (by the water utility). The estimated risk is then compared with this acceptance criterion in order to decide whether the risk is acceptable (tolerable), see [2]. Further, various risk reduction options are considered to evaluate their cost-effectiveness.
- Risk control: Risk reduction options have to be decided on and then implemented. In particular, risks above the acceptance criterion must be treated. Further, the risk is monitored during operation of the utility. (This activity will essentially be treated in a forthcoming TECHNEAU report, WP4.3.)
[Figure 3 content. Risk Analysis: define scope; identify hazards; estimate risks (qualitative, quantitative). Risk Evaluation: define tolerability criteria (water quality, water quantity); analyse risk reduction options (ranking, cost-efficiency, cost-benefit). Risk Reduction/Control: make decisions; treat risks; monitor. Supporting activities: get new information; update; analyse sensitivity; develop supporting programmes; document and assure quality; report and communicate; review, approve and audit.]
Figure 3. The main components of the TECHNEAU generic framework for integrated risk management in WSP [2].
The first component of this framework, risk analysis (the scope of this report), includes the following three steps:
1. Definition of scope of risk analysis. A complete risk analysis will start by defining the scope of the analysis (see Section 2.2 below). For a water utility the objective of the analysis could be related to one or more of the following topics:
   - Water quality,
   - Water quantity (and availability),
   - Economy,
   - Environmental impact,
   - Consumer trust.
   The present report focuses on water quality and water quantity. System definition/description and limitations of the analysis are also given in this initial step. Further, an appropriate team needs to be assembled according to the scope of the analysis.
2. Hazard identification. The next step is the identification of all hazards and hazardous events. A hazard is usually given as a source of potential harm (e.g. the existence of a farming or industrial activity in the catchment area). A hazardous event is an event which can cause harm (e.g. the existence of hazardous agents in the drinking water source). Various methods exist for identifying hazards and hazardous events, e.g. checklists [20], experience from the past and expert judgements.
3. Risk estimation. Many methods exist for modelling and estimating the various risks to a water utility. A proper method needs to be selected with respect to the specific scope of the risk analysis. Important considerations are whether qualitative, semi-quantitative or quantitative measures of risk are needed, and whether the risk analysis comprises the complete water utility or some subsystem(s) of it.
Various activities are required in order to carry out a risk analysis, risk evaluation and risk control. These are indicated in the rightmost box of Figure 3.
1.4 Definitions
The following definitions of terms are applied in the TECHNEAU project, cf. [2]:
- Hazard is a source of potential harm or a situation with a potential of harm.
- Hazardous agent is for example a biological, chemical, physical or radiological agent that has the potential to cause harm.
- Hazardous event is an event which can trigger a hazard and cause harm.
- Hazard identification is the process of recognizing that a hazard exists and defining its characteristics.
- Risk is a combination of the frequency, or probability (see footnote 2), of occurrence and the consequences of a specified hazardous event.
- Risk analysis is the systematic use of available information to identify hazards and to estimate the risk to individuals or populations, property or the environment.
- Risk estimation is the process used to produce a measure of the level of risk being analysed. Risk estimation consists of the following steps: frequency analysis, consequence analysis, and their integration.
- Risk evaluation is the process in which judgements are made on the tolerability of the risk on the basis of risk analysis and taking into account factors such as socioeconomic and environmental aspects.
- Risk assessment is the overall process of risk analysis and risk evaluation.
- Risk management is the systematic application of management policies, procedures and practices to the tasks of analysing, evaluating and controlling risk.
- Water safety is defined (within TECHNEAU) as: Water supply that protects water availability and human health with a high degree of practical certainty.

Footnote 2: When we give the mean number of events during a fixed period of time (e.g. per year), we talk about a frequency, f. For instance, f = 3/year. We can also give the probability that the event will occur during one year. The probability, p, is dimensionless and is always a number between 0 and 1. For instance, p = 0.1 means that on average the event will occur in one out of 10 years (and the frequency is f = 0.1/year). Likelihood can be used as a common word for probability and frequency.
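The relation between frequency and probability in footnote 2 can be made concrete with a short sketch. It assumes that events occur as a Poisson process with a constant rate; this is an assumption made for illustration and is not stated in the report. Under that assumption a rare event (f = 0.1/year) has an annual probability close to its frequency, whereas for a frequent event (f = 3/year) the frequency exceeds 1 while the probability stays below 1.

```python
import math


def probability_of_at_least_one(frequency_per_year: float, years: float = 1.0) -> float:
    """Probability of at least one event in the period.

    Assumes a Poisson process with constant rate (illustrative assumption).
    """
    return 1.0 - math.exp(-frequency_per_year * years)


# Rare event: probability is close to the frequency value.
p_rare = probability_of_at_least_one(0.1)

# Frequent event: the frequency is 3 per year, but the probability cannot exceed 1.
p_frequent = probability_of_at_least_one(3.0)

print(f"f = 0.1/year -> p = {p_rare:.3f}")
print(f"f = 3.0/year -> p = {p_frequent:.3f}")
```

This is why a frequency is the natural measure for events that may occur several times per year, while a probability is convenient for rare events.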
1.5 Abbreviations
ALARP   As Low As Reasonably Practicable
BN      Bayesian Network
CCP     Critical Control Points
CML     Customer Minutes Lost
CRA     Coarse Risk Analysis
DALY    Disability Adjusted Life Years
ETA     Event Tree Analysis
FTA     Fault Tree Analysis
FMEA    Failure Modes and Effect Analysis
FMECA   Failure Modes, Effects and Criticality Analysis
GARA    GIS Assisted Risk Analysis
GIS     Geographic Information System
HACCP   Hazard Analysis and Critical Control Points
HAZID   Hazard Identification
HAZOP   Hazard and Operability analysis
HCI     Hydraulic Criticality Index
HIA     Health Impact Assessment
HRA     Human Reliability Assessment
MTTF    Mean Time To Failure
MTTR    Mean Time To Repair
PFD     Probability of Failure on Demand
PHA     Preliminary Hazard Analysis
PSF     Performance Shaping Factors
QCRA    Quantitative Chemical Risk Assessment
QMRA    Quantitative Microbiological Risk Assessment
RBD     Reliability Block Diagram
ROS     Risk and Vulnerability Analysis (ROS is the Norwegian abbreviation)
RPN     Risk Priority Number
SSM     Substandard Supply Minutes
THDB    TECHNEAU Hazard Database
WHO     World Health Organisation
WSP     Water Safety Plans
WSS     Water Supply Structure
YLD     Years Lived with Disability
YLL     Years of Life Lost
YLQ     Years of Life with water supply of bad Quality
YLS     Years of Life without water Supply
2 Risk analysis of drinking water systems - From source to tap

This chapter describes the overall structure and main elements of the risk analysis process, including the motivation for carrying out a risk analysis. The process (also briefly described in Section 1.3) includes the following main steps:
1. Scope definition, including study initiation/organisation, system description and assembling a team.
2. The identification of hazardous events.
3. The risk estimation.
Some details of the process are given below. Further, Appendix A presents a summary of the various steps of the risk analysis.
2.1 Initiation and organisation of a complete risk analysis
A risk analysis should start from a general objective of reducing the risk for the public or the water utility. Further, a clear scope of the specific analysis should always be formulated (e.g. see [4]). When assembling the risk (analysis) team, relevant stakeholders are to be identified, e.g. water utility owners, safety managers, consumers, municipalities and health authorities. These decide whether any restrictions should be imposed on the work; for instance whether only a subsystem of the utility should be considered, or whether to include only specific types of hazardous events or risk reduction options. Critical stakeholders, for example hospitals, kindergartens and schools, have to be identified and given special attention during the analysis.

As discussed below, in Section 2.2, the risk analysis could be initiated by making an overview of the overall risk situation within the supply system for the specific decision situations. If the water utility is in a decision situation it should consider the questions:
- What is the problem?
- What are the alternatives?
- Who is affected by the decision?
- Who is making the decision? (For whom shall we carry out the analysis?)
- Which aspects are considered when making the decision?
- What are the wishes and priorities of the various stakeholders?

When risk analyses are used as decision support there are several ways to express (quantify) the various aspects of the risk. If various benefits and losses (potential consequences) are involved, the comparison of these benefits/losses may raise (ethical) problems, which must be handled by the decision makers. One typical difficulty is how to give a value to human life.

Further, an analysis team must be selected; i.e. it must be decided who shall participate in the analysis work: risk analyst(s), various experts and generalists. The team should consist of water works experts (operators, planners, laboratory personnel etc.) and some outside specialists (e.g. researchers, consultants etc.) who may introduce new perspectives into the risk analysis process. The working process must also be organised as a combination of meetings (with information gathering and evaluation) and analysis work. Thus the initial part of the analysis process is to organise and make a plan for the work. In this respect it is important to have commitment from all professional categories of the water company in order to achieve real risk reductions as a result of the work.
2.2 Relevant decision situations for water utilities
The scope of a risk analysis should describe the purpose of the analysis and the problems that initiated it. Below we list some typical decision situations for water companies which could initiate risk analysis work. Practical examples are included.

- Initial risk analyses, required prior to the start-up of a plant/water utility (or modifications, such as rebuilding or operational changes). Drinking water supply is subject to many different risks, and it is important to focus the risk control on the most important areas. Relevant objectives to initiate a risk analysis could simply be a need to:
  o Identify and rank all hazards (in order to control risk);
  o Estimate the risk to identify any need for additional Critical Control Points (CCP);
  o Evaluate cost/benefit of risk reduction options to achieve an acceptable risk.
  Examples of typical (specific) questions that could launch a risk analysis exercise could be:
  o Which out of all chemical and microbial substances are most critical to the health of drinking water consumers?
  o How do we compare the risks of shipping petroleum products on the raw water source with cattle grazing next to it?
- Analyses carried out to optimise operational maintenance and emergency procedures. The protection against water-borne diseases by implementing new barriers in the plant may be a long-term action for many water utilities, and it may take some years before the required barrier function is implemented. So, in the meantime:
  o How can the protection be improved by optimizing the present treatment?
  o Which risks can be reduced by process optimization?
  o How important are periods with suboptimal performance?
- Analyses initiated by a specific operational problem. The water utility may have a deviation reporting system that gives support to the handling of specific problems. Such a system gives information on the acute actions. It is also designed to sort out the need for improvements in order to avoid similar events or to reduce their consequences. For instance, experience has shown that deviations related to a very rainy autumn can include simultaneous pollution in the main raw water source and the back-up water source. This can be combined with humic contents in the raw water reservoir and inadequate sludge removal in the treatment. So, relevant questions are:
  o What is the likelihood of such combinations in the future?
  o How can they be detected and avoided, or how can the consequences be reduced?
  More generally, risk analyses could be initiated by problems like:
  o Delivered water is observed not to comply with required quality standards (e.g. unacceptable level of some bacteria)
  o Reduced availability of water delivery observed (to some group of users)
  o Observed security problems
  o Occurrence of an unwanted event (accident investigation)
- Analyses to update initial risk analyses, in order to include possible new hazards. For instance, a water treatment plant could be designed to have multi-barrier protection, while according to new knowledge formerly unknown microbial agents are pointed out as an important hazard. For instance, the parasitic protozoan Cryptosporidium was not described as infecting humans until the 1970s [8], and was not recognized as a waterborne infection until 1984. The recognition of new hazards can result in new risk reduction options, e.g. improved barriers against Cryptosporidium. So relevant questions to initiate further analyses could be:
  o Are the barriers in the water treatment plant sufficient for emerging microbial contamination?
  o Does the raw water contain micro-organisms that can harm human health?
  o How will the present treatment cope with the predicted climate change?
- Analysis to obtain acceptable risk with respect to supply (major delivery failures). Water utilities may have acceptance levels for interruption of supply that take into consideration the number of consumers without water and the time without water. The risk of limited delivery failures can be calculated from statistical data, but little information is available on the larger failures. Relevant problems:
  o Is it the raw water, the treatment or the distribution system, or a combination of these, which is the limiting factor to achieve acceptable risk? Where are the bottlenecks?
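An acceptance level that combines the number of consumers without water and the time without water, as mentioned above, can be sketched as follows. The measure used here is average minutes without water per connected consumer and year, in the spirit of Customer Minutes Lost (CML, see Section 1.5); the exact formula, the interruption log and the acceptance level are assumptions made for illustration, not values from the report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interruption:
    consumers_affected: int     # number of consumers without water
    duration_minutes: float     # time without water


def customer_minutes_lost(interruptions: List[Interruption], total_consumers: int) -> float:
    """Average minutes without water per connected consumer (assumed definition)."""
    lost = sum(i.consumers_affected * i.duration_minutes for i in interruptions)
    return lost / total_consumers


# Hypothetical one-year log of supply interruptions.
year_log = [
    Interruption(consumers_affected=500, duration_minutes=120),
    Interruption(consumers_affected=2000, duration_minutes=30),
    Interruption(consumers_affected=50, duration_minutes=600),
]

ACCEPTANCE_LEVEL = 10.0  # hypothetical: minutes per consumer and year

cml = customer_minutes_lost(year_log, total_consumers=100_000)
acceptable = cml <= ACCEPTANCE_LEVEL

print(f"CML = {cml:.2f} minutes per consumer and year; acceptable: {acceptable}")
```

Such a log covers the limited delivery failures for which statistical data exist; the larger failures, for which little information is available, would need a model-based estimate instead.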
It is observed that the above questions could be related to various life cycle phases, (e.g. design or operational phase), and the questions can be related both to strategic and operational decisions.
2.3 System description
One of the first tasks is to provide a system description, which also describes the functions of the various subsystems. Each water supply system is unique, and a description of the system is therefore an important part of a risk analysis. The description should include both illustrations (drawings) and written text. Important documents are rules and regulations, standards, maps, statistics, operating procedures, drawings, etc. The system description should include detailed knowledge of the following three subsystems (in case the total system is analysed):
1. Water source (groundwater and/or surface water) and the catchment area.
2. Water treatment systems and monitoring systems.
3. Distribution network, including plumbing system and consumers.
As an example, Figure 4 illustrates a water supply system from source to tap, drawn according to the WSP guidance. An important aspect is that the risk analysts become familiar with the analysis object.
[Figure 4 content. Columns: Source water, Treatment, Distribution. Elements: Tank, WTP 1, WTP 2, Reservoir (three).]

Figure 4. Illustration of system flowchart, from source to tap.

The system description should include a description of the system boundaries, the technical systems, operational conditions and the environment. For the identification of hazardous events it is also important to point out the support systems on which the water utility depends for successful operation (e.g. power supply, supply of chemicals, IT systems, training and employment of personnel). Some generic information is also seen as part of the system description, such as the total number of consumers linked to the distribution system and their consumption demand.

The system description illustrates a normal operational situation, after the treatment process and control points have been decided. Specification of this normal operational situation is therefore an important part of the system description. In particular, it is specified which concentrations of various contaminants the treatment system is designed to handle.

Many risk analysis methods require some structured way to break down the system into manageable parts. A common way to do this is a hierarchical model which reflects how the system is designed. The system should be broken down into suitable subsystems that can be handled effectively in an analysis (i.e. splitting Figure 4 into subsystems like source, treatment and distribution). Each subsystem can further be broken down into modules, and each module into components etc.
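The hierarchical breakdown described above can be represented as a simple tree. The sketch below is only an illustration: the three subsystems follow the text, while the module and component names are hypothetical examples and not a description of any actual water supply system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """A node in the hierarchy: system, subsystem, module or component."""
    name: str
    children: List["Element"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the lowest-level elements, i.e. the units to be analysed."""
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


# Module and component names below are hypothetical.
system = Element("Water supply system", [
    Element("Source", [
        Element("Catchment area"),
        Element("Raw water intake"),
    ]),
    Element("Treatment", [
        Element("WTP 1", [
            Element("Filtration"),
            Element("Disinfection"),
        ]),
    ]),
    Element("Distribution", [
        Element("Reservoir"),
        Element("Pipe network"),
    ]),
])

for leaf in system.leaves():
    print(leaf)
```

Walking the tree in this way gives a list of manageable parts for which hazardous events can be identified one by one.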
2.4 Hazardous events
Step 2 (on page 15) of the risk analysis is to identify hazardous events in the various parts of the system. A hazardous event is an event which can cause harm. In principle all types of unwanted events should be included. Identification of hazardous events is described in detail in the TECHNEAU Hazard Database (THDB) [20]. The following groups of hazards are normally considered:
- Biological
- Chemical
- Radiological or physical
- Unavailability (insufficient availability of water supply to consumers)
- Safety (safety to personnel)
- External damage (external damage to third parties, incl. liability)
A typical example of a hazardous event is the presence of a contamination (hazard) in the source of a drinking water supply system. The Microrisk project (see footnote 3; for additional information see [5]) refers to the following types of hazardous events from a microbiological point of view:
1. Primary faecal contamination: Events that cause significant contamination of the source water, with microbial levels much above normal levels (which should in principle be handled by the existing treatment system), and events that result in biological/chemical agents entering the drinking water source which the existing treatment system is not designed to handle.
2. Water treatment is malfunctioning: Treatment system having reduced ability to handle normal contamination or to detect contamination above normal levels.
3. Secondary faecal contamination, i.e. faecal contamination that does not originate from the source water (e.g. wastewater intrusion due to cross-connections or backflow).
In TECHNEAU we add the following type of hazardous event:
4. Events causing insufficient water supply to consumers.
Note that if the objective of the analysis is to assess the total risk caused by all identified hazardous events, there is a risk of counting hazardous events twice. For instance, failure of the treatment system to handle/detect Giardia is one hazardous event. The presence of Giardia in the water source is another. However, it is not correct to add the frequencies of these events to obtain the total risk, because both events must occur at the same time in order for the drinking water to be contaminated. In this case, it is suggested to start from the above four categories of events, to avoid such double counting.
3 The EU project Microrisk (contract EVK1-CT-2002-00123), see www.microrisk.com, resulted in a number of reports on QMRA, see http://217.77.141.80/clueadeau/microrisk/uploads/microrisk_how_to_implement_qmra.pdf . QMRA was applied to 12 systems across Europe and Australia.
Different approaches for identifying hazardous events are discussed in Section 3.1.
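The double-counting remark above can be illustrated numerically. The following is a minimal sketch, assuming hypothetical probabilities and assuming that the treatment failure figure is conditional on Giardia being present; none of the numbers come from the report.

```python
# Hypothetical figures, chosen only to illustrate the arithmetic.
p_giardia_in_source = 0.10  # P(Giardia present in the source water in a given year)
p_treatment_fails = 0.02    # P(treatment fails to handle/detect Giardia, given presence)

# Misleading: adding the figures treats each event as contaminating
# the drinking water on its own.
naive_sum = p_giardia_in_source + p_treatment_fails

# Contamination at the tap requires both events, so the probabilities multiply.
p_contaminated_tap = p_giardia_in_source * p_treatment_fails

print(f"naive sum:          {naive_sum:.3f}")
print(f"joint (both occur): {p_contaminated_tap:.3f}")
```

With these assumed inputs the naive sum overstates the probability of contaminated drinking water by a wide margin, which is the reason for starting from the four event categories instead.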
2.5 Safety barriers - Causes and consequences of hazardous events
When hazardous events are identified we may want to analyse both causes and possible consequences. The so-called Bow-Tie diagram can be used to illustrate this; see the example in Figure 5, with the hazardous event "Giardia in water source". The chain of events goes from left to right, with the causes (and hazards) on the left, the hazardous event in the middle and the consequences on the right.
Figure 5. Bow-Tie diagram and barriers. A practical example.
Figure 5 also introduces some safety barriers, which are implemented to reduce the risk. In the left part of the bow-tie diagram we have barriers (1, 2, 3) that prevent the hazardous event from occurring or mitigate the hazardous event; to the right we see barriers (4, 5) for preventing or reducing unwanted consequences (e.g. people being infected by contaminated water). In general, safety barriers can either:
- prevent the undesired event from occurring (reduce probability), e.g. by introducing restrictions on the use of the catchment area, or
- reduce the consequences by water treatment systems, thus preventing contaminated water from being delivered to consumers.
So introducing a barrier actually means implementing a risk reducing option. Figure 6 illustrates the concept of hygienic (safety) barriers in a water supply system considering all elements from source to tap, see also the Bergen Case study [65].
Figure 6. Illustration of hygienic barriers in a water supply system from source to tap (modification of a figure taken from: SA Water - Drinking water quality report 2004-2005).
2.6 Risk estimation
The risk estimation can be carried out at various levels of detail. An analysis of the hazardous events should include estimation of likelihood (probability) and consequence. Often a semi-quantitative approach is chosen, just giving categories of likelihood and consequence. The combined likelihood-consequence categories can then be inserted in a risk matrix; see the example in Figure 7. As an example we here indicate corresponding risk values ranging from 1 (likelihood = rare; consequence = insignificant) to 9 (likelihood = almost certain; consequence = catastrophic). This is just an example of how to rank the risks related to the various hazardous events. The categories (e.g. catastrophic) can be defined in various ways (see Figure 11).
                     Severity of consequences
Likelihood           Insignificant  Minor  Moderate  Major  Catastrophic
Almost certain       5              6      7         8      9
Likely               4              5      6         7      8
Moderately likely    3              4      5         6      7
Unlikely             2              3      4         5      6
Rare                 1              2      3         4      5
Figure 7. An example of a Risk matrix.
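The example values in Figure 7 follow a simple additive pattern: each value equals the rank of the likelihood category plus the rank of the severity category, minus one. A minimal sketch of that lookup is given below; the function and list names are illustrative only and are not part of the report.

```python
# Categories ordered from lowest to highest rank, as in Figure 7.
LIKELIHOOD = ["Rare", "Unlikely", "Moderately likely", "Likely", "Almost certain"]
SEVERITY = ["Insignificant", "Minor", "Moderate", "Major", "Catastrophic"]


def risk_score(likelihood: str, severity: str) -> int:
    """Return the example risk value (1-9) of the Figure 7 matrix."""
    likelihood_rank = LIKELIHOOD.index(likelihood) + 1
    severity_rank = SEVERITY.index(severity) + 1
    return likelihood_rank + severity_rank - 1


for likelihood, severity in [("Rare", "Insignificant"),
                             ("Likely", "Moderate"),
                             ("Almost certain", "Catastrophic")]:
    print(likelihood, "/", severity, "->", risk_score(likelihood, severity))
```

Other scoring rules are equally possible; as the text notes, the values are only one example of how to rank risks.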
One should make a separate risk matrix for loss of quality, and another for loss of quantity (and possibly one for e.g. economic losses). When performing the risk estimation, one should note the possible links between poor water quality and reduced water quantity. Water quality problems can occur due to low pressure, for example as a result of a pipe burst (a water quantity problem). Further, a drought (a quantity problem) will often also result in a decrease of water quality.
In more advanced analyses risk can be fully quantified, see Chapter 6. Often the input data to these quantifications are rather uncertain, and the results involve considerable uncertainty. Then it is recommended to carry out a sensitivity analysis; i.e. calculating risk with various input values to demonstrate the range of probable results.
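A sensitivity analysis of the kind recommended above can be sketched as follows. The sketch assumes risk expressed as frequency times consequence and uses made-up low, base and high input values; both the measure and the numbers are assumptions for illustration.

```python
from itertools import product


def annual_risk(frequency_per_year: float, persons_affected: float) -> float:
    """Risk as mean loss: expected number of persons affected per year."""
    return frequency_per_year * persons_affected


# Hypothetical low / base / high input values.
frequencies = {"low": 0.01, "base": 0.05, "high": 0.2}
consequences = {"low": 50, "base": 200, "high": 1000}

# Evaluate the risk for every combination of input values.
results = {
    (f_name, c_name): annual_risk(f, c)
    for (f_name, f), (c_name, c) in product(frequencies.items(), consequences.items())
}

print(f"base case: {results[('base', 'base')]:.1f} persons/year")
print(f"range:     {min(results.values()):.1f} to {max(results.values()):.1f} persons/year")
```

Reporting the range alongside the base case makes the uncertainty of the input data visible in the result.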
3 Coarse risk analysis of water supply systems

The Coarse Risk Analysis (CRA) is a method for semi-quantitative risk analysis. The scope of an overall CRA, including risk evaluation and risk control, typically consists of the following steps (see [18, 21, 22]):
1. Identify hazardous events related either to the total water supply system, or to a specific part (or in general to some category of undesired events). (Section 3.1)
2. Risk estimation, i.e. estimate the probability and consequence for each hazardous event. (Section 3.2)
3. Present these risks in risk matrices, and possibly compare to risk acceptance criteria.
4. Rank the hazardous events with respect to their risk.
5. Assess the need for risk reduction options or more detailed analyses.
The first two steps relate to risk analysis. In this chapter we first describe the hazard identification of an overall CRA (Section 3.1). Next, the risk estimation in a CRA is discussed (Section 3.2). Finally an example is given (Section 3.3).
3.1 Identification of hazardous events
There are various approaches for the identification of hazardous events, e.g. brainstorming, experience from the past, and checklists (Section 3.1.1). Checklists may be databases, such as the TECHNEAU Hazard database (THDB, Section 3.1.2). HAZOP is another commonly used method (Section 6.2). A general discussion is given first; then the THDB is briefly described.
3.1.1 Various approaches for hazard identification
There are various techniques for identification of hazards or hazardous events within a system. Hazard Identification (HAZID) is a collective term often used for such techniques. A brief description of some of the methods is presented in this section. The descriptions are primarily based on [18] and [19].
Brainstorming is a method of problem solving or idea generation in which members of a group contribute ideas spontaneously. In this case, the task is to identify hazards or hazardous events in a water supply system. What-if analysis is a specific and effective brainstorming approach [2].
Experience from the past, i.e. accident and reliability data, may also be used to identify potential problem areas and to provide input into frequency analysis (probability estimation). Experience from the past is often used as input to the methods described in this chapter.
A traditional checklist comprises a list of specific items to identify known types of hazards and potential accident scenarios associated with a system. Checklists may vary widely in level of detail. Checklists are limited by their authors' knowledge and experience and should be viewed as living documents, reviewed regularly and updated when necessary.
Experience from the past could be experience from the actual (or similar) water utility, provided by operational personnel of these utilities. One could then go through the total system and record operational problems and concerns that are experienced. This method is rather similar to the brainstorming session. One could also utilise statistics and data on events that have been recorded in various data sources, cf. Chapter 5.
A checklist is easy to use and is a cost-effective way to identify common and customarily recognized hazards. Checklists can be applied at any stage of the life-cycle of a water supply system and can be used to evaluate conformance with codes and standards. The TECHNEAU Hazard Database [20] presents a comprehensive list of hazards and hazardous events that can serve as a checklist for water utilities, see below. A list of generic hazardous events can be formulated by considering characteristics such as [18]:
- Materials used or produced and their reactivity
- Equipment employed
- Operating environment
- Layout
- Interfaces among system components, etc.
Based on the general hazardous events identified, a more specific list may be described for the various parts of the system. A form, shown in Table 1, may be used for doing this.
Table 1. Specific list of hazardous events.
Hazardous event: Tanker containing 20 m³ of gasoline tips over near intersection XX, polluting the water source near the inlets
Cause: - Sudden illness of tank driver - Slippery conditions
Vulnerable locality: Intersection XX
Possible effects:
3.1.2 TECHNEAU Hazard Data Base (THDB)
The THDB [20] applies a holistic view on hazard identification within the water supply system and provides a list of hazards and hazardous events for each element in the water system. The hazards identified in the THDB are both internal and external. Internal hazards are mostly related to functional failures or the absence of infrastructure. External hazards are, for instance, source water contamination, degradation of mains due to aggressive soils, or terrorist actions. The objective of the database is to help water supply utilities with the identification of relevant hazards by providing a catalogue of potential hazards of technical, geographical or human origin for all parts of the system. The database has a generic set-up. It does not cover all possible specific operational hazards, but should be regarded as a checklist to assess possible risks of the supply system.
The water supply system is subdivided into 12 sub-systems, of which 10 are physical sub-systems representing the infrastructure, one is a non-physical sub-system representing organizational aspects and one is a sub-system representing future hazards. The hazard database presents the identified hazards in a table at the sub-system level. The tables are divided into components and elements. At component level the most important elements are given, and for each element the most relevant hazards are given in combination with a description of the cause of the hazard, the hazard type and the consequences. The THDB focuses on both water quality and water quantity. The hazard database uses the definitions given in Table 2. Examples of hazardous events from the THDB are shown in Table 3.
Table 2. Definitions applied in the TECHNEAU database (THDB), [20].
Element: Lowest level of the system at which hazards are described.
Hazard: A source of potential harm or a situation with a potential of harm (e.g. a biological, chemical, physical or radiological agent or undesired event that has the potential to have a negative effect on the supply of safe and sufficient water).
Ref.: Reference number (id.) of the hazard.
Hazardous event: An event which can cause harm (e.g. an incident or situation that can lead to the presence of a hazard, what can happen and how).
Type of hazardous event: Indication of the origin of the hazardous event.
- D: design-related
- O: operation-related
- E: external-related
- OS: consequence of a hazard in other sub-system
- Ref. OS: Reference of other sub-system
Type of hazard: Indication of the type of hazard.
- Biolog.: biological
- Chemic.: chemical
- Rad./phys.: radiological or physical (including turbidity)
- Unavail.: insufficient availability of water supplied to consumers
- Safety: safety to personnel
- External damage: external damage to third parties, including liability
Consequence description: Description of potential consequences of the hazard to other sub-systems and to consumers.
Consequence to sub-system: Reference of the sub-system affected by the hazard.
Rel. system: Column to be used by the end-user for marking the identified hazards.
Table 3. Examples of hazardous events in the TECHNEAU hazard database (THDB) [20]. Hazard id. in brackets.
Source/Catchment:
- Industrial discharges of chemicals. (1.1.1)
- Industrial discharge of biological matter. (1.1.2)
- Emissions during accidents (fire or explosions), e.g. industrial accidents or forest fire. (1.1.3)
Water treatment plant:
- Improper coagulant mixing and/or flocculation; inappropriate flocculant or flocculation agent; improper pH control. (6.4.3)
- Decrease of UV lamp performance due to ageing or colour sediments on quartz tube. Electrical disruptions. (6.6.7)
Distribution and plumbing:
- Poor hygiene during repair. (8.1.2)
- Malfunctioning valves, connections to different water qualities (industrial water, sewers). (8.1.14)
3.2 Risk estimation in Coarse Risk Analysis (CRA)
It is a rather common situation that a water utility wants a coarse overview of the main risks for its activities, in order to identify the most serious threats and then to set the right priorities for implementing risk reduction options. In such a situation water utilities can carry out a Coarse Risk Analysis (CRA); this method is similar to the Preliminary Hazard Analysis (PHA). In Norway this type of analysis is commonly applied, and is referred to as a ROS (Risk and Vulnerability) analysis. These analyses are often carried out early in the development of a utility, or at the launch of a WSP implementation in an existing system. At that stage there is little information on design details and operating procedures, and the analysis can be a precursor to further studies. However, they are also used for analysing existing systems or specific subsystems. The CRA can also be used to prepare emergency preparedness plans for the water supply companies.
The main objective of the CRA is to identify hazardous events (as described above) and the causes of each event, and to make a coarse evaluation of the likelihoods (probabilities) and consequences of these events. The results are normally displayed in a list of hazardous events (in a worksheet form). Several variations of this form are used. One example of a worksheet used to document the results of the analysis is shown in Table 4. Each hazardous event identified is inserted in the list and analysed. As an example we have taken the hazardous event 6.6.7 in the THDB [20].
The risk estimation in a CRA is usually restricted to presenting categories of probability and consequence. The probability categories are denoted e.g. P1-P4, and similarly the consequence categories C1-C4; cf. Table 4. These pairs of values are later inserted in the appropriate cell of the risk matrix. Note that the consequences can be evaluated with respect to several dimensions, e.g. water quality, water quantity (supply) or reputation/economic loss.
Table 4. Example of a CRA worksheet.
System: Treatment
Operating mode: Normal operation
Analyst: NN
Date: 2008-10-10

Ref.: 6.6.7
Hazard: Pathogen in water source
Hazardous event: Too low UV dose
Causes: Ageing or colour sediments on quartz tube
Probability: P2 1)
Consequence: C2 2)
Preventive actions: Online measurement of UV intensity to verify correct intensity
Comments:

1) Probability category
2) Consequence category
Based on the resulting risk score of the various hazardous events in the risk matrix, the most serious hazardous events are identified. Risk reduction options to prevent the hazardous event or to neutralize its consequences are identified. The effort needed (in terms of costs, time, organization, training, etc.) and the risk reduction achieved by the various risk reduction options are roughly evaluated. Finally, a priority list for risk reduction options (with deadlines) is formulated.
In summary, a CRA is a rather simple semi-quantitative risk analysis method. However, the CRA requires good information and knowledge about the system, including its surroundings. Hazard identification is usually based on some kind of expert judgement, e.g. using experience from the past, checklists, or a combination of these. If statistics about hazards are not available, the CRA will rely on expert judgement to estimate the risk and define appropriate risk reduction options. No detailed modelling or calculations are needed; the analysis may be carried out by professionals with good system knowledge and does not require computational skills. Normally a CRA is not very time consuming, although this depends on the size and complexity of the system to be analysed. Note that the CRA does not provide a score of the total risk of the water utility. The main focus is on identifying major hazardous events, and then ranking these with respect to their contribution to risk.
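The ranking step can be sketched as follows. This is a minimal illustration, not the procedure of the report: the additive score and the tie-break on consequence are assumptions, and the categories given for events other than 6.6.7 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorksheetRow:
    ref: str          # THDB hazard id.
    event: str        # hazardous event
    probability: int  # 1-4, i.e. P1-P4
    consequence: int  # 1-4, i.e. C1-C4

    @property
    def score(self) -> int:
        # Assumed additive score; a utility may define its own rule.
        return self.probability + self.consequence - 1


rows = [
    WorksheetRow("6.6.7", "Too low UV dose", 2, 2),                     # P2/C2 as in Table 4
    WorksheetRow("1.1.1", "Industrial discharges of chemicals", 1, 4),  # hypothetical categories
    WorksheetRow("8.1.2", "Poor hygiene during repair", 3, 3),          # hypothetical categories
]

# Highest score first; ties are broken in favour of the more severe consequence.
ranked = sorted(rows, key=lambda r: (r.score, r.consequence), reverse=True)
for row in ranked:
    print(f"{row.ref:6} P{row.probability} C{row.consequence} score={row.score}  {row.event}")
```

The output is a ranked list of events, not a total risk figure, in line with the remark that the CRA does not score the total risk of the utility.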
3.3 Tool for Coarse Risk Analysis (CRA)
A tool for carrying out a coarse risk analysis was developed. The tool is applicable to small, medium and large water companies. The tool itself is also an aid for organizing the data generated as part of the coarse risk analysis. The tool is structured as a database, which facilitates future updating. The user interface for carrying out the analysis is shown in Figure 8. Clicking on the acronym of a potential hazardous event opens the corresponding risk registration dialog box, as shown in this figure. The various fields of the interface are explained in Table 5.
Figure 8. User interface for the CRA tool (registration of potential hazardous events).
Likelihood (probability) and consequence are given as categories. The consequence classes can be specified by two dimensions. In the example below (in Figure 9) duration and exposure are chosen. Duration of e.g. illness or lack of supply can, for example, be classified as:
1. 0-6 hrs
2. 6-24 hrs
3. 1-7 days
4. 1-4 weeks
5. 1-6 months
6. > 6 months
Exposure, i.e. number of affected persons, can be given as:
1. 1-10
2. 10-100
3. 100-1 000
4. 1 000-10 000
5. 10 000-100 000
6. > 100 000
As seen in Figure 9, this can be used to define four consequence categories (from Low to Very high).
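The duration and exposure classes listed above can be turned into a simple lookup. In the sketch below the class limits are taken from the lists, while three things are assumptions: a value exactly on a limit is assigned to the higher class, a month is taken as 30 days, and the rule that combines the two classes into C1-C4 is purely hypothetical (the actual mapping is the one defined in Figure 9).

```python
from bisect import bisect_right

# Upper limits of duration classes 1-5 in hours (class 6 is open-ended).
# Assumption: 1 week = 7 days, 1 month = 30 days.
DURATION_LIMITS_H = [6, 24, 7 * 24, 4 * 7 * 24, 6 * 30 * 24]
# Upper limits of exposure classes 1-5 in persons (class 6 is open-ended).
EXPOSURE_LIMITS = [10, 100, 1_000, 10_000, 100_000]


def duration_class(hours: float) -> int:
    """Duration class 1-6; a value on a limit goes to the higher class (assumption)."""
    return bisect_right(DURATION_LIMITS_H, hours) + 1


def exposure_class(persons: int) -> int:
    """Exposure class 1-6; a value on a limit goes to the higher class (assumption)."""
    return bisect_right(EXPOSURE_LIMITS, persons) + 1


def consequence_category(hours: float, persons: int) -> str:
    """Hypothetical combination rule, standing in for the mapping of Figure 9."""
    total = duration_class(hours) + exposure_class(persons)  # 2..12
    return f"C{(total - 2) // 3 + 1}"                        # C1..C4


print(duration_class(48), exposure_class(500), consequence_category(48, 500))
```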
Table 5. Description of user interface (CRA tool) in Figure 8.
Waterworks: The user selects which waterworks the analysis belongs to. One water company might have several waterworks and some hazardous events might be unique for one of the waterworks.
Analysis object: Describes which element in the water supply system is analysed (e.g. catchment, source, intake, water treatment plant). The user must select analysis object from a drop down text.
Detailed: Detailed description of the analysis object (e.g. for treatment plant the following detailed elements might be analysed: coagulation, filtration, chlorination, UV, CO2, pH). The user must select detailed analysis object from a drop down text.
Undesired event/hazardous event: Description of the undesired event or hazardous event. A check list of possible events is available from the Hazard database developed as a part of Techneau. The user must select event from a drop down text.
Cause: The underlying cause for undesired event. A checklist for possible causes can be found in the Hazard database developed in Techneau. The user must select cause from a drop down text.
Probability: The probability for the undesired event to occur. The probability must be estimated by the user either based on available data or expert evaluation. Some guidance on assessing the probability is given within the tool. The probabilities of occurrence are defined as small (P1), medium (P2), large (P3) or very large (P4).
Cause description: A more detailed description of the underlying cause of event can be given.
Vulnerability: Description of how vulnerable the system is if the analysed elements fails (e.g. if the water company has alternative sources of backup supply the water supply will be less vulnerable). Might also be used indirectly for assessing the consequences.
Components: A description of the components (e.g. two pumps in parallel)
Description: A more detailed description of the consequences of the event might be given here. It will also serve as a justification for the assessed consequences making it easier to review the estimated values.
Consequences (quality, delivery/quantity, reputation/economic): The possible consequences resulting from the event are described as small (C1), medium (C2), large (C3) and very large (C4). The consequences consist of 3 elements: quantity/delivery, water quality and loss of reputation/direct economic loss. For the terms quality and delivery/quantity the duration and the number of involved persons
Barriers: Identification of barriers (c.f Bow-tie diagram) reducing both the probability and consequences for the event. The barriers can be existing barriers and possible future barriers. Assessing whether the barriers reduces the Probability (P) or the Consequences (C) might be useful.
Manageability: Description on how the risk can be managed i.e. how and what can be modelled and/or measured to control the process; (e.g. measuring remaining pipe wall thickness by non-destructive testing)
Risk reduction options /CCP: Based on the resulting risk matrixes, the need for risk reduction options for each of the undesired events might be introduced. These can either be physical options or implementation of critical control points (CCP) for controlling the risk in real time.
Figure 9. Specification of the consequence categories by duration and exposure (CRA tool).
So, one outcome of the analysis of a specific hazardous event is the risk, given by probability (likelihood), P, and consequence, C. This pair (P, C) is to be inserted in the risk matrices, see Figure 10. There can for instance be one matrix for quality (life and health), one for quantity (delivery) and one for reputation/economic loss, i.e. one for each dimension of risk considered. Each hazardous event is shown in the risk matrices by a symbol. Note that the colour coding within the risk matrix in the CRA tool is defined by the user by editing the individual cells of the risk matrix. In Figure 10, green indicates that the risk is tolerable and there is no need for risk reduction options. Yellow indicates that the need for risk reduction options should be discussed, while red indicates that the risk is not tolerable and that risk reduction options are needed. The distribution of the red, yellow and green areas of the risk matrix reflects the risk acceptance criteria applied by the water utility. Decisions on these acceptance criteria are to be taken by the management. By splitting the outcomes into three categories, red, yellow and green, we adopt the ALARP principle, ref. [2].
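The colour lookup can be sketched as a table of user-defined cells. The cell colours below are a made-up example of one possible set of acceptance criteria; they do not reproduce Figure 10 or the CRA tool.

```python
GREEN, YELLOW, RED = "green", "yellow", "red"

# Hypothetical user-defined colouring.
# Keys: probability category (P1-P4); list positions: consequence C1..C4.
CELL_COLOURS = {
    4: [YELLOW, RED, RED, RED],
    3: [YELLOW, YELLOW, RED, RED],
    2: [GREEN, YELLOW, YELLOW, RED],
    1: [GREEN, GREEN, YELLOW, YELLOW],
}

# Interpretation of the colours as described in the text.
ACTION = {
    GREEN: "tolerable, no risk reduction needed",
    YELLOW: "discuss need for risk reduction",
    RED: "not tolerable, risk reduction needed",
}


def cell_colour(p: int, c: int) -> str:
    """Colour of the matrix cell for probability category p and consequence category c."""
    return CELL_COLOURS[p][c - 1]


colour = cell_colour(2, 2)
print(colour, "->", ACTION[colour])
```

Keeping the colours in a plain table mirrors the idea that management, not the analyst, decides where the red, yellow and green areas lie.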
Figure 10. The risk matrix for water quantity (delivery). Identical matrices are given for quality (life and health) and loss of reputation/economy (CRA tool).
An example of the application of the tool is given in the Bergen case study report, [65]. In that case the following undesired events, which might take place in the distribution system, were identified:
I Failures in hygienic barriers (water quality)/intrusion of contaminated water into network:
- Contamination in water tanks (water surface)
- Intrusion due to low pressure/non-pressurised network
  o Operational and maintenance situations (e.g. valve operations)
  o Power failure
  o Work on non-pressurised network (e.g. repair, rehabilitation, construction)
  o Fire (huge water demands might lead to low pressure)
  o Water mains failure (might lead to non-pressurised system)
  o Incorrect operation of valves
  o Failure at pumping stations in zones without water tanks
  o Water hammer
  o Pipe fracture, valve closes without intention
  o Water tanks emptied due to communication error
  o Extraordinary water demand/tapping
  o In-pipe processes
- Cross-connection/backflow
  o Unintended backflow from building
  o Sabotage (intended backflow from building)
II Failures of water delivery/quantity:
  o Operational and maintenance situations (e.g. valve operations)
  o Pipe failures
  o Rockslides/rockfall in tunnel
  o Water tanks emptied due to communication error
  o Failure at pumping stations
  o Failure of equipment (e.g. valves)
4 Quantification of risk
Depending on which aspects of risk are considered, risk can be quantified in various ways. In this chapter various ways to quantify (measure) risk are discussed.
4.1 The dimensions of risk and various ways to quantify risk
The TECHNEAU project applies the very common definition of risk (Section 1.4) as a combination of the probability (frequency) of the occurrence of specified hazardous events and the consequence(s) of these events. Risk is often expressed in terms of these probabilities and consequences. The estimated risk of the various hazardous events can also be aggregated, in order to give an expression of the total risk of a water supply system.
Several types of potential consequences can be considered in a risk analysis of a water supply system. These are referred to as the various dimensions of risk, representing the different types of consequences, and each of these risk dimensions can be quantified. For the consumer it is important to be supplied with water of good quality, but there should also be enough water. The TECHNEAU project therefore focuses on the quality and quantity of the water supply, both aspects being essential for the consumers' risk.
In order to quantify risk related to water quality, the complete water supply chain should be considered. Some examples of risk measures for water quality are:
1. Probability of a specific degree of contamination/pollution of the water source.
2. Probability of a specific failure of the treatment system, resulting in contaminated water entering the distribution network.
3. Probability that one litre of drinking water at tap contains a certain parasite (meaning that contaminated water is delivered to the consumer).
4. Mean number of consumers getting adverse health effects caused by drinking water (due to a certain hazardous event).
Risk related to water quality is not necessarily measured in terms of the quality of water delivered to consumers (item 3 on the list), or as the actual health effects for the consumers (item 4). For a water utility it can also be useful to estimate the risk of a water source being polluted or of a failure of the treatment system (items 1 and 2).
In item 4 ("Mean number of consumers getting adverse health effects"), probability and consequence are combined into one figure. This means that a rather traditional definition of risk as the mean loss (probability x consequence; see footnote 4) is applied. Various measures for water quality are discussed in Section 4.3. When risk related to water quantity is to be quantified, it should be noted that loss with respect to water quantity/availability depends on (see Section 4.4):
- frequency of interruptions of water supply,
- duration of the interruption,
- exposure, i.e. number of consumers being affected.
4 Note that some fundamental probability concepts are discussed in Appendix F.
Note that even without interruption of the water flow at the consumer's tap, the water may be delivered with a pressure which is too low (e.g. for appliances to work). So excessively low water pressure is also a risk to water quantity (and excessively high pressure is a hazard, potentially causing leakage or bursts in plumbing installations).
Thus, loss of water quality and loss of water quantity are the two most important dimensions of risk for a water utility. But note that if the analysis of water quality is restricted to effects on human health, then environmental impact is another dimension of the risk. This risk can also be measured in various ways, e.g. in terms of the frequency of polluting events and the exposure (e.g. number of affected species/animals). In addition, the water utility can experience loss of reputation (consumer trust), which is more difficult to measure, but these losses can also have economic consequences. Further, consumers (e.g. certain industries) and the water utility itself may experience economic losses, which are most reasonably expressed in monetary units (e.g. Euro). In principle, it is possible to measure all losses, related to water quality, quantity and environment, as economic losses, and in this way give an overall measure of the total risk.
Finally, we mention societal risk, which is the risk related to major events, e.g. causing main functions of society to be at risk. This is certainly relevant for a major infrastructure like the water supply (either lack of water or polluted water, affecting many consumers or an institution like a hospital). Specific risk measures could be designed to express these risks as well. However, the present report focuses on risks for the consumers and for the water utility. Various ways to measure (quantify) risk are discussed in more detail below. Some measures are common, i.e. can be used for various dimensions of risk, and others are related to a specific dimension, such as quality or quantity.
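Two of the measures mentioned in this section can be sketched numerically. The first function is the mean loss (probability x consequence) of item 4. The second combines frequency, duration and exposure by simple multiplication; that product form is an assumption made here for illustration, and all input values are hypothetical.

```python
def expected_cases(probability_of_event: float, cases_given_event: float) -> float:
    """Mean number of consumers with adverse health effects (probability x consequence)."""
    return probability_of_event * cases_given_event


def expected_lost_supply(interruptions_per_year: float,
                         mean_duration_hours: float,
                         consumers_affected: float) -> float:
    """Consumer-hours without supply per year (assumed product form)."""
    return interruptions_per_year * mean_duration_hours * consumers_affected


# Hypothetical inputs for one hazardous event of each kind.
quality_risk = expected_cases(probability_of_event=0.01, cases_given_event=500)
quantity_risk = expected_lost_supply(interruptions_per_year=2,
                                     mean_duration_hours=4,
                                     consumers_affected=1500)

print(f"quality:  {quality_risk:.1f} expected cases per year")
print(f"quantity: {quantity_risk:.0f} consumer-hours without supply per year")
```

The two results are in different units, which is one reason the text treats quality and quantity as separate dimensions of risk unless everything is converted to monetary units.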
4.2 Qualitative versus quantitative expressions for risk
As stated above, risk is usually measured by the severity of some unwanted consequence, C, and the likelihood (i.e. probability, p, or frequency, f) that this consequence occurs. Various types of consequences (losses) can be considered. Often we want to rank various risks, and so the C- and p-values are quantified to give an overall measure of the risk, e.g. R = p x C. This quantification can be time consuming. Also note that a risk expressed in detailed numbers suggests an exactness that may not exist, because it has been derived from assumed probabilities or from ranges of numbers described in the literature. So there is a danger of creating a false sense of precision in the result. A ranking can also be carried out qualitatively, without specifying p- and C-values for each risk. One possibility is to apply paired ranking, i.e. comparing pairs of risks: each risk is compared to every other risk, specifying which of the two is greater [7]. This should give an explicit weighting, but again with the danger of giving a false sense of precision. The
process could also be very time consuming and complicated, because experts are not always consistent in their evaluations of paired comparisons. A common qualitative approach is to apply a classification of risk. Probabilities and consequences are divided into categories. For probability, categories such as "rare" and "frequent" are used. Consequences could be categorised as "small", "medium" and "catastrophic". These categories are a ranking of likelihood and consequences. The categories can also be defined by intervals; for instance, the probability category "rare" could be defined as less than once a month. Similarly, the consequence category "small" with respect to health effects could be defined as at most 10 consumers with minor health effects, etc. In this case the term semi-quantitative approach is used (not fully quantitative but placed into pre-determined categories). Based on categories for probability and consequence, a risk matrix can be made; for an example based on WHO [8], see Figure 11. In this figure risk categories are given (1-9) (note that this is an example). Also observe that the WHO definition of the likelihood (probability) category "Almost certain" equals "Once per day". Events that frequent are rarely included in a risk analysis.
Likelihood           Severity of consequences
                     Insignificant   Minor   Moderate   Major   Catastrophic
Almost certain       5               6       7          8       9
Likely               4               5       6          7       8
Moderately likely    3               4       5          6       7
Unlikely             2               3       4          5       6
Rare                 1               2       3          4       5
Examples of definitions of likelihood (probability) and severity (consequence) categories that can be used in risk scoring

Likelihood category   Definition
Almost certain        Once per day
Likely                Once per week
Moderately likely     Once per month
Unlikely              Once per year
Rare                  Once every 5 years

Severity category     Definition
Catastrophic          Mortality expected from consuming water
Major                 Morbidity expected from consuming water
Moderate              Major aesthetic impact possibly resulting in use of alternative but unsafe water sources
Minor                 Minor aesthetic impact possibly resulting in use of alternative but unsafe water sources
Insignificant         Not detectable impact
Figure 11. Example of a risk matrix and definitions of likelihood (probability) and severity (consequence) categories to be used in risk scoring in WSP (WHO, [76]). Suggested risk categories, 1-9, are added here as an example; (not included in the WHO report).
A risk matrix is the most common way to present risk when a semi-quantitative approach is chosen, see Chapter 3. In particular it is used to prioritize and distinguish between important and less important hazardous events, as the risk for each hazardous event is assessed and inserted in the risk matrix. As there are various dimensions of risk, we can design one matrix for each dimension; e.g. one for health effects (loss of quality) and one for water availability (loss of quantity).
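As an illustration of how such a matrix can be used for ranking, the following minimal Python sketch looks up the example risk categories of Figure 11. It relies on the observation that the example categories 1-9 equal the likelihood rank plus the severity rank minus one; this additive rule only reproduces the example and is not a rule stated by WHO. The listed hazardous events and their ratings are hypothetical.

```python
# Category labels from Figure 11, ordered from lowest to highest rank.
LIKELIHOOD = ["Rare", "Unlikely", "Moderately likely", "Likely", "Almost certain"]
SEVERITY = ["Insignificant", "Minor", "Moderate", "Major", "Catastrophic"]


def risk_category(likelihood: str, severity: str) -> int:
    """Return the example risk category (1-9) for one cell of the matrix."""
    likelihood_rank = LIKELIHOOD.index(likelihood) + 1  # 1..5
    severity_rank = SEVERITY.index(severity) + 1        # 1..5
    # Assumption: reproduces the example scores shown in Figure 11.
    return likelihood_rank + severity_rank - 1


def rank_events(events):
    """Sort (name, likelihood, severity) tuples by decreasing risk category."""
    return sorted(events, key=lambda e: risk_category(e[1], e[2]), reverse=True)


# Hypothetical hazardous events, for illustration only.
events = [
    ("Pipe burst", "Unlikely", "Major"),
    ("Raw water contamination", "Moderately likely", "Catastrophic"),
    ("Odour complaint", "Likely", "Minor"),
]

for name, likelihood, severity in rank_events(events):
    print(name, risk_category(likelihood, severity))
```

A separate matrix, and hence a separate ranking, could be kept for each risk dimension (e.g. quality and quantity).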
4.3 Risk measures for loss of water quality
Considering the total system, from source to tap, there could be various quantifications related to loss of water quality, i.e. the measures could be related to:
1. Quality of source water, treatment technology and distribution network.
2. Health effects for consumers.
3. Effects on the consumers' acceptability.
4. Effects on the distribution and plumbing systems and equipment (e.g. corrosion).
Here the measures related to 1 are of interest only insofar as they say something about the potential to avoid the consequences 2, 3 and 4. Some examples are given below.
1. Quality of water source, treatment technology and distribution network. For a water utility it can be useful to estimate the risks related e.g. to pollution of raw water or to treatment failures, and so the following are examples of risk measures:
- Probability (frequency) of specific degrees of contamination/pollution of the water source.
- Probability of failure of specific treatment systems.
- Probability of one litre of treated water containing a certain parasite.
- Probability of pollution entering the distribution network.
2. Health effects for consumers. The risk of contaminated water to human health can be characterised in a number of ways. For instance one can give the risk per person and then in addition the number of persons exposed. The risk per person can be described by a probability distribution, and the measure could be given by the mean, the median or e.g. the 95th percentile; cf. the Microrisk project (www.microrisk.com). Thus, the following are some risk measures related to health effects for consumers:
- Mean number of consumers who during one year have serious health effects caused by bad drinking water.
- Frequency, f, of events resulting in at least N consumers getting ill (adverse health effects), with, say, N = 1000.
Finally, note that a general risk measure for overall health effects is DALY (= Disability Adjusted Life Years), which is the measure of health effects used by WHO. The use of
DALY is discussed in Appendix C. A generalisation is also introduced there, which gives an overall measure for loss of both water quality and water quantity. However, DALY is usually too complex to be used by water utilities.
3. Effects on the consumers' acceptability. These may arise from the occurrence of taste, odour, colour or turbidity, and may further carry economic risks (e.g. loss of reputation). Some risk measures are:
- Probability that water delivered to the consumer has unacceptable odour/smell.
- Substandard Supply Minutes (SSM), i.e. the number of minutes the average consumer is supplied with drinking water that does not comply with existing quality and/or quantity standards.
4. Effects on the distribution and plumbing system equipment. There could for instance be an effect on pumps and appliances; e.g. due to water aggressiveness (e.g. corrosion) or hardness (e.g. incrustations). The risk measure should quantify this damage.
4.4 Risk measures for loss of water quantity (supply)
Generally, there should be a high availability of the water supply (i.e. a high probability of every consumer being supplied), and further, water should be supplied with proper flow and pressure. In addition to the consequences (i.e. how deficient flow and pressure are), risk measures related to water quantity could consider the number of affected consumers, the frequency and the duration, or a combination of these. For instance, the mean number of days without water supply, aggregated over all consumers, could be such a measure of risk. The frequency of interruption of supply (e.g. affecting at least 1000 consumers) is another. To give some examples, loss of water quantity can be measured e.g. as:
- Probability (fraction of the time) that an arbitrary consumer is without water supply (or supply is insufficient).
- Frequency of events resulting in failure to supply water to at least 1000 consumers.
- Volume of water missing (when supply is insufficient).
- Mean number of consumers affected by shortage (when supply is insufficient).
- Customer Minutes Loss (CML), i.e. the average number of minutes that drinking water is not delivered to an average consumer.
In general, the average water unavailability (fraction of time without water) for a consumer should be a reasonable measure for water quantity. However, one long delivery interruption does not necessarily represent the same risk as ten small interruptions, even if the total time without supply is the same. For specific types of industries a short interruption might have approximately the same consequences as a longer one, even though the short interruption's contribution to the yearly unavailability can be small compared to that of a more long-lasting stop. A similar argument could apply for residential consumers: 500 persons losing water supply for one month (30 days) may be considered worse than 15 000
losing water for 1 day, even if both events give the same contribution to overall water unavailability. Thus, measuring water quantity by the average unavailability of supply may not be sufficient, and then both the frequency and the durations of interruptions should be given. So, in more advanced approaches we could distinguish between long and short durations of the interruptions in water supply. The unavailability of water should be evaluated with respect to both planned and unplanned activities. Examples of unplanned activities are pipe bursts, wrong valve operation, etc.
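The point that equal unavailability can hide very different interruption patterns can be checked with a small Python sketch. It computes the average unavailability together with the frequency and the longest duration of interruptions for the two cases mentioned above (500 persons for 30 days versus 15 000 persons for 1 day). The total number of consumers (100 000), the one-year period and all function and field names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Interruption:
    consumers_affected: int
    duration_days: float


def supply_measures(events, total_consumers, period_days=365.0):
    """Summarise a list of supply interruptions over one observation period."""
    # Consumer-days without supply, aggregated over all consumers.
    consumer_days_lost = sum(e.consumers_affected * e.duration_days for e in events)
    return {
        # Fraction of time an arbitrary consumer is without supply.
        "unavailability": consumer_days_lost / (total_consumers * period_days),
        "frequency_per_period": len(events),
        "max_duration_days": max((e.duration_days for e in events), default=0.0),
    }


# Hypothetical utility with 100 000 consumers, observed for one year.
long_case = supply_measures([Interruption(500, 30.0)], total_consumers=100_000)
short_case = supply_measures([Interruption(15_000, 1.0)], total_consumers=100_000)

print(long_case)
print(short_case)
```

Both cases give the same unavailability, but the maximum duration differs by a factor of 30, which is why frequency and duration are reported alongside the average.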
4.5 Risk measured in monetary units
Risks and risk reduction can be valued in monetary units in order to (1) express all risks in a common unit and (2) facilitate economic analyses, e.g. cost-effectiveness or cost-benefit analyses, for prioritising between risk reduction options. An overview of different valuation principles and methods is presented in a TECHNEAU report on risk management [2]. Economic valuation of market goods, i.e. goods traded in the common market, does not usually pose any large problems. Economic valuation of non-market goods, such as the reduced risks to human health from drinking-water consumption, is generally more problematic. Several studies [10-13] provide detailed and extensive information on economic valuation methods for non-market goods. Three groups of valuation methods for non-market goods can be distinguished [14]:
1. Revealed preference methods (RPM)
2. Stated preference methods (SPM)
3. Methods that are less strongly founded in economic theory
We will describe the first two groups. First, the revealed preference methods (RPMs) are based on individuals' actual behaviour on an existing market, making use of a relation between the market and the non-market goods. Thus, the relationship between goods on a market and, for example, the reduced health risk from consuming drinking water is used for indirect valuation of the risk reduction. RPMs include (a) the production function method, (b) the travel cost method, (c) the hedonic price method, and (d) the replacement cost method and the restoration cost method. An example of a revealed preference valuation is to investigate the decrease in sales of bottled water after installing a new treatment system to decrease the health risks of the drinking water. A shortcoming of the RPMs is that they are capable of valuing only parts of the total economic value (TEV) of e.g. a risk reduction.
In the second group we describe the stated preference methods (SPMs) which are capable of measuring the TEV, including both use and non-use values (see e.g. [15]). The principle of SPMs is that a scenario is presented for a randomly selected group of individuals. Each individual has to decide on the scenario through interviews or questionnaires. The most common SPM is the contingent valuation method, where the individuals are asked about their willingness to pay (WTP) for a suggested change in the scenario. A closely related SPM is choice experiments where the individuals have to make a choice among different
situations, e.g. how much they are willing to pay for different well-defined levels of drinking water safety. Based on these experiments, it is possible to derive a willingness-to-pay (WTP) model. Economic valuation of non-market goods is still to some extent controversial. However, extensive research and applications in the field of environmental economics over the last decades have resulted in greatly increased knowledge regarding the possibilities and limitations of valuations of e.g. saving a statistical life and ecological improvements. For example, important improvements have been made to various types of SPMs, see e.g. [16]. In the drinking-water sector, economic valuation is increasingly being used in order to achieve cost-effective asset management. Economic valuation is especially common in the UK, where the drinking-water industry is privatised. A successful example often referred to is the Yorkshire Water utility, which uses economic valuation of risks, based on stated preference surveys, as an integral part of its asset management (see [17]).
5 Data for risk analysis

5.1 Introduction
Available and accurate data are essential for achieving reliable results from a risk analysis. Data are needed for the system description, hazard identification, risk estimation, and the identification and implementation of risk reduction options. The level of detail of the needed data depends on the methods used for risk analysis, the required level of detail of the analysis and the required accuracy of the results. Data requirements for a coarse qualitative risk analysis differ from the requirements for a detailed quantitative risk analysis. Some relevant types of data are listed below.
- Technical data are needed to understand the functions of the technical systems and to identify the barriers. Information about the specific layout of the system is essential to establish a system model and to gain an understanding of the system as a whole.
- Environmental and geographical data are necessary to identify possible hazards and to obtain an understanding of the environment where the water system is located. This information can allow evaluation of the dose and frequency of a contamination of the source, and identification of possible contamination points.
- Operational and maintenance data are needed to determine availability and reliability of components, subsystems or the entire system. Specific data about the reliability of barriers in the system are essential. The treatment systems will be of special importance.
- Knowledge about the effects of the identified hazard on consumers is also required.
- Guideline values and national standards.
- Knowledge about removal efficiencies.
5.2 Data needs
The various types of data needed can be considered in three categories:
- Generic data: Data from external data sources (not data from the water utility under investigation). Data could relate to e.g. effectiveness of different types of water treatment or the effect that different types of pollutants have on humans (cf. dose-response results).
- System data: Data describing the entire system of the water supply (from source to tap) in question, e.g. raw water sources, layout of the plant, water treatment methods being used, number of consumers and local conditions relevant to adjust the generic failure data.
- Event data: Monitored data of hazardous events or system failures that have occurred in the past.
Some data needs for risk analysis are summarized in Table 6, following this categorisation.
Table 6. Data needed for risk analysis.

Generic data
Type of data:
- Data on health effects of various doses of various pollutants on humans; cf. dose-response (QMRA)
- Effectiveness of treatment systems for various types of contamination
- Weights to be used in DALY calculations
Use:
- Efficiency of treatment systems (i.e. level of contamination in source being unacceptable)
- Calculations of risk in terms of DALY
Data sources:
- Microrisk website (www.microrisk.com)
- WHO website
- Databases available on USEPA websites, which provide additional information (e.g. for health risk assessment) in comparison to the WHO or Microrisk websites

System data
Type of data:
- Geographical data
- Layout of the catchment area and source
- Possible hazards in the catchment area, water source and the distribution system
- GIS data on hazards
- Environmental data
- Treatment systems
- Water distribution network
- Number and types of consumers connected to water utility
- Volume of water consumed per consumer (per day)
Use:
- System description is used throughout risk analysis to assess e.g. hazards, hazardous events, treatment system reliability, and exposure and consequences to water quality and human health
Data sources:
- Maps
- Water utility/plant data: technical drawings, layout drawings, asset databases, maintenance systems
- Municipality, water utility (GIS maps, water distribution networks etc.)
- Local knowledge
- On-site inspection

Event data
Type of data:
- Failure data for various subsystems (treatment systems / barriers)
- Data on erroneous operation (human errors)
- Events that have resulted in contaminated water
- Preventive and corrective maintenance data
Use:
- Reliability and failure rate of equipment and systems
- Type and frequency of hazardous events
Data sources:
- Failure data base of water utility
- Maintenance system
- Generic failure data bases
- Vendor information (e.g. on failures)
- Reporting system for hazardous/undesired events
- Local knowledge (e.g. maintenance personnel)
5.3 Data sources
5.3.1 Types of data sources
There are different sources that can be utilized to obtain data. These can be grouped in different categories [67]:
1. External data sources
2. Internal data sources
3. Expert judgement
4. Test data
5. Literature and publications
External data sources can be used for the reliability of technical components and systems and for obtaining the effect that different types of pollutants have on humans. For component reliability this type of data source can give valuable data, because the operational time over which failures are registered is often extensive. The effect that different pollutants have on humans is in most cases independent of local conditions, but the structure and sensitivity of the population supplied may be site specific (e.g. a hospital or baby sanatorium connected to the network). It is important to consider the relevance of the data for the specific system in question before utilizing external data sources. Similar systems or barriers might have different external conditions and maintenance, which may affect the reliability (and effectiveness) of the barriers. Internal data sources can be data monitored in a CMMS (Computerized Maintenance Management System) or a SCADA (Supervisory Control And Data Acquisition) system, which can be important sources for reliability data. Expert judgement and testing can come from either external or internal sources. They are used where no reliable data are available. Systems can be tested either in operation or in laboratories.
5.3.2 Failure event data bases
Information about reliability of equipment is defined by Rausand and Høyland (2004) [23] as information about the failure/error modes and time to failure distributions for hardware, software and humans. The reliability of specific systems or components is often not site specific. It is therefore possible to collect reliability data from different sites (systems) into a common database. This is already done in industries other than the drinking water industry, for example in the offshore industry through OREDA (Offshore REliability DAta) [68]. Such a database should contain the following information:

- (Hazardous) events
- Failures of various components/equipment (incl. failure mode, repair time etc.)
- Inventory of equipment, giving number of various types of components, operational times, etc.
- Various environmental and operational data that are (assumed) relevant for the performance of the systems/components.
Important tasks when collecting data are to:
- Establish a common format for such a database (making it easy to transfer data)
- Encourage exchange of data across water utilities (and countries)
- Develop analysis techniques to better utilise the information provided by such a database.
Such a database will make risk analyses more reliable, and once a database is established the risk analysis will be less costly and time consuming.
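As a simple illustration of how pooled records from such a database might be used, the following Python sketch estimates a failure rate as the total number of failures divided by the total operational time. This estimator assumes a constant failure rate and comparable equipment and operating conditions across sites; both are assumptions of the sketch, and the records shown are hypothetical.

```python
def pooled_failure_rate(records):
    """Estimate failures per year from (failures, operating_years) pairs.

    Each pair is one site's record for the same type of component.
    Assumes a constant failure rate and comparable conditions across sites.
    """
    total_failures = sum(failures for failures, _ in records)
    total_years = sum(years for _, years in records)
    if total_years <= 0:
        raise ValueError("total operational time must be positive")
    return total_failures / total_years


# Hypothetical records for one pump type at three utilities.
records = [(2, 40.0), (1, 25.0), (0, 35.0)]
rate = pooled_failure_rate(records)
print(f"Estimated failure rate: {rate:.3f} per year")
```

The inventory and operational-time fields listed above are what make the denominator available; without them only failure counts could be reported.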
6 More advanced risk analysis methods for water supply systems

This chapter gives an overview of various methods for risk assessments of a water supply system. The methods presented here give analyses of different level of detail and complexity, but they are more advanced than the simple CRA described in Chapter 3. Most risk analysis methods described in this chapter are quantitative, but there will also be qualitative analyses. The risk analysis could be a detailed analysis of a specific subsystem or process, and could also include evaluation of human and operational activities. Further, an overall analysis to assess the total risk of the water utility can be carried out, possibly with respect to both quality and quantity. First a general introduction to some main models and approaches is given; next the various methods are described in some detail. The objective is to explain the capabilities of the various methods and the situations where they can be beneficially applied.
6.1 Risk modelling and choice of a risk analysis method
In order to investigate consequences, event trees (Section 6.10) are often used. A simple example is found in Figure 12. Here the hazardous event "microbiological contamination of source water" means: "Either there is an unacceptable concentration of this contamination in source water, or there is a microbiological hazard that the treatment system is not designed for". In this example there are two barriers to reduce consequences to consumers: disinfection and monitoring. A branch of the event tree indicates whether the preceding barrier functions or not (the first branching point of the event tree corresponds to "Is the disinfection system OK?"; if yes, the upper branch is chosen). Depending on whether the barriers are effective or not, there are then three possible consequences for the consumers:
- Contaminated water is disinfected (no effect for consumers).
- Disinfection does not work, but contamination is revealed by monitoring, resulting in corrective action.
- Neither disinfection nor monitoring works; thus, contaminated water is delivered to consumers.
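The three consequences of this event tree can be quantified once failure probabilities are assigned to the two barriers. The Python sketch below is a minimal illustration that assumes the two barriers fail independently of each other; the probability values are hypothetical and not taken from the report.

```python
def event_tree_outcomes(p_disinfection_fails, p_monitoring_fails):
    """Probabilities of the three outcomes, given that the hazardous event occurs.

    Assumes the two barriers fail independently of each other.
    """
    return {
        "disinfected, no effect": 1.0 - p_disinfection_fails,
        "detected, corrective action": p_disinfection_fails * (1.0 - p_monitoring_fails),
        "contaminated water delivered": p_disinfection_fails * p_monitoring_fails,
    }


# Hypothetical barrier failure probabilities.
outcomes = event_tree_outcomes(p_disinfection_fails=0.1, p_monitoring_fails=0.2)
for name, probability in outcomes.items():
    print(f"{name}: {probability:.3f}")
```

Multiplying each outcome probability by the frequency of the hazardous event would give the frequency of each consequence.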
The causes of the hazardous event or the failure of a barrier can be analysed using fault trees (Section 6.5). Also this is illustrated in Figure 12, here considering a failure of the barrier disinfection. The fault tree uses specific symbols to break down the causes of such an event, and in this example it is assumed that there are two systems for disinfection that both must fail in order to cause water not being disinfected. Further, there are two events (Event 1 and Event 2) that both can cause failure of system 1 (similar for system 2).
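The fault tree logic of this example (either of two events fails a disinfection system, and both systems must fail for the barrier to fail) can be sketched in Python as OR and AND gates. The sketch assumes that all basic events are independent; the probabilities are hypothetical.

```python
from math import prod


def or_gate(*probabilities):
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def and_gate(*probabilities):
    """Output event occurs only if all independent input events occur."""
    return prod(probabilities)


# Hypothetical probabilities of Event 1 and Event 2 for each system.
p_system_1_fails = or_gate(0.1, 0.2)
p_system_2_fails = or_gate(0.1, 0.2)

# Disinfection fails only if both systems fail.
p_disinfection_fails = and_gate(p_system_1_fails, p_system_2_fails)
print(f"P(disinfection fails) = {p_disinfection_fails:.4f}")
```

If the two systems share a common cause of failure (e.g. loss of power), the independence assumption does not hold and the result would be too optimistic.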
Note: in this section we refer to some analysis techniques that are properly defined later in the chapter.
Figure 12 Two simple examples of risk analysis methods
An overall risk model related to a hazardous event can then be provided, as illustrated in Figure 13:
- Analysing the causes of the hazardous event, see left part of the figure (here using a fault tree).
- Analysing the consequences of the undesired event, see right part of the figure (using an event tree).
- Analysing one or more of the safety barriers of the event tree (see Barrier 3, again using a fault tree).
So during the analysis we consider events that may cause a malfunctioning water supply system, and also the effectiveness of different barrier systems for treatment or detection. In Figure 13 the undesired (hazardous) event is the top event in a fault tree, describing the causes of the undesired event. The causes could be a combination of hazards, hazardous events and malfunctioning barriers. After the undesired event has occurred, the further development of the event chain depends on the functioning of safety barriers. This is illustrated by an event tree in the right part of Figure 13. Each safety barrier in the event tree may also be influenced by a set of hazards or hazardous events, as illustrated for Barrier 3, where a malfunction of the barrier is represented by the top event in a fault tree.
Figure 13. Outline of an overall risk model, including cause analysis and consequence (C) analysis
The direct consequences are given at the rightmost part of the figure, in five categories of increasing severity (C1-C5); the most serious consequence (C5) is usually located at the bottom. Various types of consequences can be considered: degree of contamination of water to consumer, health effect for consumers, lack of delivery to consumer, etc. Note that the overall risk model in the figure above illustrates the bow-tie model from Figure 5 in Section 2.5. The following are some other examples where the use of more advanced risk analysis methods can be required:
- If a thorough investigation/identification of all hazards is required for a complex system, the use of a Hazard and Operability Analysis (HAZOP) could be relevant (Section 6.2).
- If a system is identified as critical during design, or is observed to have a negative development during operation, it should be analysed to identify possible failure modes and their causes and effects. This could be carried out by a Failure Modes, Effects, and Criticality Analysis (FMECA) (Section 6.3).
- If a component/system goes through various stages of degradation, then a Markov analysis can be carried out to identify a cost-effective preventive maintenance program (Section 6.8).
- If there is a public discussion about the risk to consumers' health due to inadequate water quality, an assessment of health impacts of microbiological threats in drinking water could be required, e.g. a QMRA (Quantitative Microbiological Risk Assessment) (Section 6.11).
6.2 HAZOP
A hazard and operability (HAZOP) study is a detailed and systematic technique for identifying hazards and operability problems throughout an entire treatment plant or facility. All parts of the system are evaluated to see how deviations can occur and whether they can cause problems. A HAZOP analysis is particularly useful in identifying unforeseen hazards designed into facilities due to lack of information, or introduced into existing facilities due to changes in process conditions or operating procedures. The basic objectives of the analysis are to:
a. Provide a full description of the facility or process, including the intended design conditions;
b. Reveal how deviations from the intention of the design can occur;
c. Decide whether these deviations can lead to hazards or operability problems.
The approach is briefly described by the following steps:
1. Split the system or process into study nodes.
2. At each study node specify a relevant set of process parameters, such as: temperature, pressure, flow level, and chemical composition.
3. Use all of the process parameters together with a set of predefined guide words (see Table 7) to review the process in a systematic way, in order to identify possible deviations that may affect water quantity or quality.
The steps of a HAZOP analysis are illustrated in Figure 14.
Table 7. HAZOP guidewords to review processes
Terms         Definitions (and examples)
No or not     No part of the intended result is achieved (e.g. no flow)
More          Quantitative increase (e.g. high pressure)
Less          Quantitative decrease (e.g. low pressure)
As well as    Qualitative increase (e.g. additional material)
Part of       Qualitative decrease (e.g. only one or two components in a mixture)
Reverse       Opposite (e.g. backflow)
Other than    No part of the intention is achieved, something completely different happens (e.g. flow of wrong material)
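Step 3 of the approach, combining every process parameter at a study node with every guide word of Table 7, can be sketched in Python as below. The sketch only generates the list of candidate deviations for the study team to review; judging which combinations are meaningful, and finding causes and consequences, remains expert work. The node name and parameter list are hypothetical.

```python
from itertools import product

# Guide words as listed in Table 7.
GUIDE_WORDS = ["No or not", "More", "Less", "As well as", "Part of", "Reverse", "Other than"]


def deviation_prompts(node, parameters, guide_words=GUIDE_WORDS):
    """Return one candidate deviation per (parameter, guide word) pair."""
    return [
        f"{node}: {guide_word} / {parameter}"
        for parameter, guide_word in product(parameters, guide_words)
    ]


# Hypothetical study node with two process parameters.
prompts = deviation_prompts("Chlorination", ["flow", "pressure"])
for prompt in prompts:
    print(prompt)
```

Each generated line would correspond to one candidate row of a HAZOP worksheet; rows for combinations that are not physically meaningful are simply discarded.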
Figure 14. Flow diagram for the HAZOP analysis
The HAZOP study is documented in a HAZOP worksheet. An example for a water treatment system (chlorination for water disinfection) is given in Table 8. Table 8. Example of HAZOP analysis (use of guidewords) for a water treatment system.
Process unit: Water treatment; chlorination
1. Process parameter: Flow (of chlorine)

Guide word   Deviation                    Causes                               Consequences                           Action/solution
No           No flow                      1. Chlorine supply is empty          Water not disinfected
                                          2. Leaking pipe or tank
                                          3. Valve failed in closed position
More         More (too much) flow         Miscalibration of equipment          High chlorine concentration in water
Less         Less flow                    1. Limited supply                    Water not disinfected
                                          2. Miscalibration of equipment
Reverse      Flow in opposite direction
A HAZOP study may highlight specific deviations for which risk reduction options need to be developed (and implemented). This risk analysis method is most suited to be applied to the treatment system and distribution network of a water supply system.
6.3 Failure Modes, Effects and Criticality Analysis (FMECA)
A good understanding of the functioning of the various subsystems or modules is a prerequisite for safe operation of any system. Therefore a Failure Modes, Effects and Criticality Analysis (FMECA) is often a first step in a reliability analysis. It can also be very useful as part of a risk analysis. It can be carried out for the whole system or restricted to some subsystems or modules. In a FMECA, subsystems or modules are reviewed to identify failure modes of components (i.e. ways in which they can fail) and the causes and effects of these failures. The FMECA is often carried out during the design phase of a system in order to reveal weaknesses and potential failures at an early stage. The method can also be used to evaluate redesign and extension of water supply systems, e.g. when a new consumer with special requirements for water quality or quantity is connected to an existing system. The results of a FMECA are risk reduction options of various types, e.g. redesign, new procedures, better maintenance. The results from the FMECA may also be useful during modifications of the system and for maintenance planning. The FMECA is mainly a qualitative analysis. The analysis aims to answer the following questions:
1. How can each part of the system conceivably fail?
2. What mechanisms might produce these modes of failure?
3. What could the effects be if the failure occurs? How critical is it?
4. Is the failure in the safe or unsafe direction?
5. How is the failure detected?
6. What inherent provisions are provided in the design to compensate for the failure?
The system should first be broken down to a suitable level, to adjust the level of detail of the analysis to the purpose and the resources available. The FMECA can be expanded down to a level of detail where estimates of a failure rate can be obtained. The analysis results are recorded in a specific FMECA worksheet (see example in Table 9). There are many variations of FMECA sheets, but this is a rather typical example.
Table 9. Example of FMECA worksheet

System: Treatment system; Membrane filtration
Ref. drawing no.: xx
Performed by: NN
Date: 2008-10-10
Page: 1 of 4

Analysed unit (module)
  Ref. no.:                     xxy
  Function:                     Pump water
  Operational mode:             Running
Description of failure
  Failure mode:                 Stop while running
  Failure cause or mechanism:   Degradation, Corrosion
  Detection of failure:         Alarm
Effect of failure
  On the module:                Pressure loss
  On (sub)system:               Reduced water quality from treatment
Failure rate:                   1 per 10 years
Consequence:                    Medium
Criticality (risk):             Medium
Risk reduction options:         Maintenance, Stand by pump, Condition monitoring
In Table 9 the heading gives the name of the system or subsystem, and other general information. Then information is successively provided for each unit (module) of the relevant subsystem. The following information is given; see columns in the worksheet:
Ref. no.: Reference to e.g. a drawing. This could also just be the unit's name or tag number if it exists.
Function: The function of the unit is explained (e.g. pumping water, disinfectant addition).
Operational mode: The unit can have different operational modes. A pump can e.g. be running or be in standby. This can affect the possible failure modes.
Failure mode: All failure modes should be recorded. The failure mode should be formulated as failure to perform a main function (e.g. failure to start, to pump required flow etc.). One line for each failure mode should be inserted in the form.
Failure cause and mechanisms: The possible failure mechanisms and/or events that may cause the identified failure modes are recorded (e.g. corrosion, erosion, fatigue etc.).
Detection of failure: The way in which the failure mode is detected is recorded.
Effect of failure on unit/module: The local effect of the failure mode is recorded.
Effect of failure on the system or subsystem function: The effects that the failure mode has on the system are recorded.
Failure rate: The rate of failure (frequency) is recorded for each failure mode.
Consequence: The consequence category (severity) of the failure mode is recorded.
Criticality: The (combined) evaluation of failure rate and consequence related to the failure mode.
Risk reduction option: Possible actions to reduce the consequence or the frequency of the failure mode are recorded.
In addition there is usually also a Comments field in the form. The FMECA method is a form of structured brainstorming. The method is quite easy to understand and requires little training. It is however recommended to have a facilitator who is familiar and experienced with the FMECA method. The most important competence of the people involved in the analysis is knowledge about the system in question. If a FMECA is to be performed for a complete water supply system, from source to tap, the need for man-hours will be extensive. The FMECA can be documented in a spreadsheet as shown earlier, but many different FMECA software packages also exist that help in structuring and executing the analysis. A more thorough introduction to and description of FMECA is given in Appendix D. An introduction to FMECA is also given e.g. in [23]. More detailed presentations of the FMECA method can be found in several standards [77, 78, 79].
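The combined evaluation in the Criticality column can be illustrated with a small Python sketch. The report does not prescribe how failure rate and consequence are combined, so everything numerical here is an assumption made for illustration: the thresholds that turn a failure rate into a Low/Medium/High category, and the additive rule that merges the two categories. With these assumptions the pump example of Table 9 (1 failure per 10 years, Medium consequence) comes out as Medium.

```python
# Hypothetical three-level scale; not defined in the report.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


def rate_category(failures_per_year):
    """Map a failure rate to a category (thresholds are assumptions)."""
    if failures_per_year >= 1.0:
        return "High"
    if failures_per_year >= 0.01:
        return "Medium"
    return "Low"


def criticality(failures_per_year, consequence):
    """Combine failure rate category and consequence category."""
    score = LEVELS[rate_category(failures_per_year)] + LEVELS[consequence]
    # Assumed rule: 2 -> Low, 3-4 -> Medium, 5-6 -> High.
    if score >= 5:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"


# Pump example from Table 9: 1 failure per 10 years, Medium consequence.
print(criticality(failures_per_year=0.1, consequence="Medium"))
```

In practice the categories and the combination rule would be agreed by the analysis team before the worksheet is filled in, so that all failure modes are ranked on the same scale.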
6.4 Removal efficiency of the water treatment system
When the hazards and hazardous events are identified during the design phase of a drinking water treatment plant, one should also specify the requirements for the plant's treatment system. In particular, there should be an analysis to identify the needed removal efficiency of the treatment system with respect to the hazardous agents identified. This analysis is described in more detail in Appendix E and a short review is given below. An important step of this analysis is to carry out a FMECA in the design phase of the treatment and monitoring system, and then give a list of failure modes of the planned system. The system functionality is expressed in terms of removal efficiencies for each planned treatment step, and for a fixed set of predetermined water quality parameters. First the removal efficiencies are assessed for each treatment step during normal operation. Then the reduced removal efficiencies can be identified for each failure mode, each treatment step and each parameter. Based on the removal efficiencies of the various treatment steps, we can calculate the overall removal efficiency for the whole system for each water quality parameter and each failure mode. The final objective of the method is to obtain the probability that the drinking water quality is insufficient (despite the drinking water treatment system); i.e. we obtain the probability that the guideline concentrations of certain parameters are exceeded in the produced drinking water. Therefore guideline concentration values are determined for each parameter of a predefined list of relevant parameters (e.g. E. coli, lead, turbidity etc.). Data has to be
provided about the probability of the occurrence of certain concentrations of these parameters in the raw water. When the removal efficiencies are determined we can calculate the raw water parameter concentrations that must not be exceeded in order to comply with guideline or threshold values of the produced drinking water. Then we should estimate the probability that these parameter concentrations in the raw water are exceeded. Combining the two sets of probabilities, i.e. the occurrence of failure modes and the exceedance of the raw water concentrations, gives the final probability that the concentration thresholds in the drinking water are not complied with for certain parameters, cf. the illustration in Figure 15. In the design phase this approach can help to evaluate options for the design of the treatment system. The method is based on published studies [24], [25], and described in more detail in Appendix E.
Figure 15. Event combinations for the parameters pesticides and bacteria.
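The calculation chain described in this section can be sketched in a few lines of Python. This is an illustration only, not the method of Appendix E: all numbers (removal efficiencies, failure mode probabilities, guideline value and the raw water exceedance function) are hypothetical, and the sketch assumes that the treatment steps act in series on the remaining concentration, that at most one failure mode is active at a time, and that failure modes are independent of the raw water quality.

```python
from math import prod

def overall_removal(step_efficiencies):
    """Overall removal of treatment steps in series (fractions between 0 and 1)."""
    return 1.0 - prod(1.0 - r for r in step_efficiencies)

def max_raw_concentration(guideline, removal):
    """Highest raw water concentration that still meets the guideline value."""
    if removal >= 1.0:
        return float("inf")
    return guideline / (1.0 - removal)

def prob_guideline_exceeded(modes, guideline, raw_exceedance):
    """Sum over operating modes of P(mode) * P(raw water exceeds its limit)."""
    total = 0.0
    for p_mode, efficiencies in modes.values():
        limit = max_raw_concentration(guideline, overall_removal(efficiencies))
        total += p_mode * raw_exceedance(limit)
    return total

def raw_exceedance(concentration):
    """Hypothetical P(raw water concentration > given value)."""
    if concentration < 5.0:
        return 0.2
    if concentration < 50.0:
        return 0.05
    if concentration < 500.0:
        return 0.005
    return 0.0

# Hypothetical operating modes: (probability, removal efficiency per step).
modes = {
    "normal": (0.97, [0.9, 0.99]),
    "FM1": (0.02, [0.0, 0.99]),   # first treatment step fails completely
    "FM2": (0.01, [0.9, 0.0]),    # second treatment step fails completely
}
guideline = 0.1  # hypothetical guideline value, same unit as the raw water data

p_exceed = prob_guideline_exceeded(modes, guideline, raw_exceedance)
print(f"P(guideline exceeded) = {p_exceed:.5f}")
```

In a real study the exceedance function would be estimated from raw water monitoring data, and one such calculation would be made for each water quality parameter.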
6.5 Fault Tree Analysis (FTA)
As part of a total risk analysis we may need to carry out a dedicated analysis of the causes of some hazardous events. There could for instance be a critical safety barrier, which needs to be analysed in more detail, in order to investigate the possible events causing it to fail. For example it could be needed to analyse the effectiveness of a specific treatment system (UV disinfection, filtration, CO2, etc.) in relation to possible failures of its elements. Further, we could look at the entire water utility, to get an overall view of all possible ways that this system can fail (or experience a specific type of failure). In all these cases we can apply a Fault Tree Analysis (FTA). A FTA [23] is particularly suited to identify and analyse systematically the various failure causes of a (sub)system. The specific strength of this method is that it identifies any combination of single failures, which alone are not critical, but together are critical (causing system failure). The main part of a FTA is to construct a fault tree related to a specified undesired (hazardous) event. The fault tree is a logic diagram, which provides a model of the failure causes. It displays the interrelationships between the undesired event and the causes of this event. Any event can be broken down step by step, starting with the main undesired
event, which in the FTA is referred to as the top event. The fault tree applies specific symbols, in particular AND-gates and OR-gates (Figure 16).
Figure 16. AND and OR gates of a fault tree.
In Figure 16 the AND-gate is used to model that both Event 1 and Event 2 must occur in order for Event A to occur. Similarly, the OR-gate is used to show that Event B occurs if either Event 1 or Event 2 occurs (or both). A fault tree analysis is normally carried out in five steps:
1. Definition of the problem, the top event and boundary conditions
2. Construction of the fault tree
3. Identification of minimal cut sets (a cut set is a combination of basic events that, if they all occur, will cause the top event to occur)
4. Qualitative analysis of the fault tree (e.g. evaluating the criticality of the cut sets)
5. Quantitative analysis of the fault tree (estimating the probability of the top event)
We will present a simple FTA where four of the five steps above are considered. In this example, there are two redundant pumps; i.e. it is sufficient that one is functioning in order to avoid the undesired event. First (step 1), we define the top event, i.e. the undesired/hazardous event: Failure to pump water (at a specific location). The next step (step 2) is to construct the fault tree. There are three possible events that can cause the top event to occur (see Figure 17):
- The pumps do not receive water (no input)
- Neither of the two pumps is working
- The common motor of the pumps fails (including loss of power to the motor)
The second event is broken down further into:
- Pump 1 fails to pump water
- Pump 2 fails to pump water
At this stage we stop breaking the fault tree down further, and at the bottom of the fault tree we have four basic events, identified as:
NI = No input to pumps
P1 = Pump 1 fails
P2 = Pump 2 fails
CP = Common motor/power failure, causing both pumps to fail
The further analysis will be based on these basic events.
Figure 17. Example of a simple fault tree.
First there can be a qualitative analysis, identifying the cut sets (step 3). For the top event to occur it is sufficient that the NI event or the CP event occurs alone. In addition, the top event occurs if both P1 and P2 occur. So in the above example there are three (minimal) cut sets:
S1 = {NI}
S2 = {CP}
S3 = {P1, P2}
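For larger trees the cut sets are normally found by software. The following minimal sketch shows one way to do this for the pump example; the nested-tuple representation of the tree and the function names are assumptions made for illustration, not part of the report or of any specific FTA tool.

```python
from itertools import product

def cut_sets(node):
    """Return the cut sets of a fault tree node as a set of frozensets."""
    if isinstance(node, str):            # a basic event
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":                     # any cut set of any child is sufficient
        return set().union(*child_sets)
    if gate == "AND":                    # one cut set from every child is needed
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {gate}")

def minimal(sets):
    """Keep only the cut sets that do not strictly contain another cut set."""
    return {s for s in sets if not any(t < s for t in sets)}

# Fault tree of the pump example: top event = NI OR (P1 AND P2) OR CP.
tree = ("OR", ["NI", ("AND", ["P1", "P2"]), "CP"])
minimal_cut_sets = minimal(cut_sets(tree))
for cs in sorted(minimal_cut_sets, key=sorted):
    print(sorted(cs))
```

The `minimal` step matters when the same basic event appears in several branches, since a cut set such as {A, B} is then redundant if {A} alone already causes the top event.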
Next (step 4) the cut sets are evaluated. If the probabilities of all basic events can be assessed, the fault tree can also be used to approximate the probability that the top event occurs. In the above example, for independent basic events with small probabilities:
$$P(\text{top event}) \approx P(\text{NI}) + P(\text{CP}) + P(\text{P1}) \cdot P(\text{P2})$$
A full quantitative analysis (step 5) is not described here, but an example is presented in Appendix C. In summary, a fault tree is a logic diagram that displays the interrelationships between an undesired event in a system and the causes of this event. The causes may be technical failures, human errors, normal events, and environmental conditions. A properly constructed fault tree provides a good illustration of the various combinations of (component) failures, human errors, normal events, and environmental factors that may result in a critical event for the system. A more extensive example is given in Appendix C. The undesired/hazardous event is called the top event of the fault tree. The various events in a fault tree are connected through logic gates, and the events on the lowest level are called basic events. A fault tree may be broken down to the preferred level of resolution.
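To indicate how good the sum-of-cut-sets expression is, the sketch below compares it with the exact top event probability obtained by enumerating all combinations of basic event states. The basic event probabilities are hypothetical, and the basic events are assumed to be independent.

```python
from itertools import product

def rare_event_approximation(cut_sets, p):
    """Sum over cut sets of the product of their basic event probabilities."""
    total = 0.0
    for cs in cut_sets:
        term = 1.0
        for event in cs:
            term *= p[event]
        total += term
    return total

def exact_top_probability(cut_sets, p):
    """Exact P(top event) for independent basic events, by full enumeration."""
    events = sorted(p)
    total = 0.0
    for states in product([False, True], repeat=len(events)):
        failed = {e for e, s in zip(events, states) if s}
        if any(cs <= failed for cs in cut_sets):   # some cut set has fully occurred
            weight = 1.0
            for e, s in zip(events, states):
                weight *= p[e] if s else 1.0 - p[e]
            total += weight
    return total

# Minimal cut sets of the pump example and hypothetical probabilities.
cut_sets = [{"NI"}, {"CP"}, {"P1", "P2"}]
p = {"NI": 0.001, "CP": 0.002, "P1": 0.05, "P2": 0.05}

approx = rare_event_approximation(cut_sets, p)
exact = exact_top_probability(cut_sets, p)
print(f"approximation = {approx:.6f}, exact = {exact:.6f}")
```

With small basic event probabilities the two values are close, and the approximation is slightly on the conservative (high) side. Full enumeration is only practical for small trees.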
The FTA is seen as a rather advanced method. The fault trees can be quite complex for big systems. The fault tree construction should be carried out in co-operation between risk analysts and utility personnel who are well acquainted with the operation of the system. The analysis based on the fault tree can also be quite complex. Various software packages are available to draw fault trees, identify cut sets and perform quantifications. Use of fault trees is described e.g. in [23, 26-29]. A more comprehensive example is given in Appendix C (example C1). The FTA is also the main approach of the Göteborg case study, which is an overall analysis of a water utility, see Appendix C (example C2) and [3, 30, 31].
6.6 Reliability Block Diagram (RBD)
A reliability block diagram (RBD) is an alternative to a fault tree. The way n components are interconnected to fulfil a specified system function may be illustrated by a RBD, see Figure 18. Each component is illustrated by a block in the diagram. There can be various ways to connect the end points a and b. When a component fails it breaks the connection at that point. As long as there is at least one connection between the end points a and b, the system functions. In Figure 18 we see that the system fails if either:
- Input water is not available.
- Both water pumps have failed.
- The pump motor has failed.
It is usually an easy task to convert a fault tree into a RBD. The RBD in Figure 18 below corresponds to the fault tree (Figure 17) in the previous section. The cut sets of the top event (see previous section) are easily seen from Figure 18. Further, the probability of the system failing, i.e. P(water pump has failed to pump water), can be derived either from the FTA of the previous section or from the RBD, using P(water pump is functioning) = 1 - P(water pump has failed).
Figure 18. Illustration of a simple reliability block diagram.
Two important structures of a reliability block diagram are a series structure and a parallel structure. These are illustrated in Figure 19 below. A system that is functioning if and only
if all of its n components are functioning, is called a series structure. A parallel structure is a system that is functioning if at least one of its n components is functioning.
Figure 19. Illustration of a series structure (left) and parallel structure (right) of a RBD.
Analysis of RBD is presented e.g. in [23]. A detailed description of RBD is found in the IEC standard [80].
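The series and parallel structures translate directly into two small functions. The sketch below applies them to the pump system of Figure 18, read here as the input in series with two parallel pumps and the common motor; the component reliabilities are hypothetical and the components are assumed to fail independently.

```python
from math import prod

def series(reliabilities):
    """A series structure functions only if all components function."""
    return prod(reliabilities)

def parallel(reliabilities):
    """A parallel structure functions if at least one component functions."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical component reliabilities (probability of functioning).
r_input, r_pump1, r_pump2, r_motor = 0.999, 0.95, 0.95, 0.998

r_system = series([r_input, parallel([r_pump1, r_pump2]), r_motor])
print(f"P(system functions) = {r_system:.6f}")
print(f"P(system fails)     = {1.0 - r_system:.6f}")
```

More complex diagrams can be evaluated by nesting the two functions in the same way, as long as each component appears only once in the diagram.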
6.7 Human Reliability Analysis (HRA)
It is well known that human errors are very important (often the most important) sources of failures. Human errors occur both during operation and maintenance, and it is an important task for any operator to reduce the number of these errors. Human reliability assessment (HRA) deals with the impact of human operators and maintainers on system performance and can be used to evaluate human error influences on water quality and water quantity in the water supply system. HRA can for instance be used for analysing work processes carried out by human operators. HRA is a collective term for various methods (see e.g. [32] for descriptions of HRA methods). The main steps of HRA methods are:
1. Task analysis
2. Human error identification
3. Human reliability quantification
Task analysis is the study of what an operator (or team of operators) is required to do, in terms of actions and/or cognitive processes, to achieve a system goal [33]. Task analysis methods can also document the information and control facilities used to carry out the task. Task analysis covers a range of techniques used to describe, and in some cases to evaluate, the human-machine and human-human interaction in systems.
The objective of the task analysis is to describe and characterize the task to be analysed in sufficient detail to perform human error identification and/or human error quantification.
The human error identification identifies and describes possible erroneous actions, while the human reliability quantification estimates the probability of erroneous actions. There exist several methods to carry out task analysis. The techniques for task analysis are divided into five groups [33]:
- Techniques for the collection of task data on human-system interactions.
- Task description techniques which structure the information collected into a systematic format.
- Task simulation methods which are aimed at compiling data on human involvements to create a more dynamic model of what actually happens during the execution of a task.
- Task behaviour assessment methods which are largely concerned with system performance evaluation, usually from a safety perspective.
- Task requirement evaluation methods which are utilized to assess the adequacy of the facilities which the operator(s) have available to support the execution of the task, and directly describe and assess the interface (displays, controls, tools, etc.) and documentation such as procedures and instructions.
An example of a top level hierarchical task analysis for water sampling is shown in Figure 20.
0. Water sample
1. Warning about water sample
2. Prepare for water sample
3. Carry out water sample
4. Transport sample to laboratory
5. Analyse water sample
6. Report result from water sample
Figure 20. Hierarchical task analysis water sample.
The task analysis is followed by the human error identification. As a minimum, the human error identification should consider the following types of error [32]:
- Error of omission, i.e., acts omitted or not carried out.
- Error of commission, i.e., acts carried out inadequately, in wrong sequence, too late or too early.
- Extraneous act, i.e., wrong (avoidable) act performed.
- Error-recovery opportunities (possibility to correct errors before a critical event has occurred).
The human error identification is usually documented in a tabular task analysis. An example is illustrated in Table 10. There are several types of error recovery:
- Internal recovery; the operator having committed an error realises this immediately, or later, and corrects the situation.
- External recovery; the operator having committed an error is prompted by a signal from the environment (e.g. an alarm or an error message).
- Independent human recovery; another operator monitors the first operator, detects the error and either corrects it or brings it to the attention of the first operator, who corrects it.
- System recovery; the system itself recovers from the human error. This implies a degree of error tolerance, or of error detection and automatic recovery.
Table 10. Human error identification - some examples.

| Task | Human error |
|---|---|
| 1. Warning about water sample | Warning not sent |
| | Warning sent too late |
| | Warning not understood |
| 2. Prepare for water sample | Test tube not disinfected |
| 3. Carry out water sample | Action omitted |
| | Action not carried out correctly (according to rules) |
| | Test tube not sealed |
| 4. Transport sample to laboratory | Sample sent to wrong address |
| | Sample not properly packed |
| 5. Analyse water sample | Action omitted |
| | Analysis not carried out correctly |
| 6. Report result from water sample | Results misinterpreted |
Depending on the purpose of the analysis, it may be possible to quantify the likelihood of the errors involved and then determine the overall effect of human error on system safety or reliability. Human reliability quantification techniques all quantify the human error probability (HEP), which is the metric of human reliability assessment. The human error probability is defined as follows:
$$\mathrm{HEP} = \frac{\text{Number of errors occurred}}{\text{Number of opportunities for error}}$$
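As a small illustration of this definition, the sketch below estimates a HEP from observed counts and combines the HEPs of several task steps into the probability of at least one error. The counts and HEP values are hypothetical, and the combination assumes that errors in different steps are independent and that no error recovery takes place; established HRA methods treat dependence and recovery in more detail.

```python
def human_error_probability(errors, opportunities):
    """HEP = number of errors occurred / number of opportunities for error."""
    if opportunities <= 0:
        raise ValueError("the number of opportunities must be positive")
    return errors / opportunities

def prob_at_least_one_error(heps):
    """P(at least one error) over independent task steps without recovery."""
    p_no_error = 1.0
    for hep in heps:
        p_no_error *= 1.0 - hep
    return 1.0 - p_no_error

# Hypothetical record: 3 wrongly sealed test tubes in 1000 water samples.
hep_sealing = human_error_probability(3, 1000)

# Hypothetical HEPs for three steps of the water sampling task.
p_any_error = prob_at_least_one_error([hep_sealing, 0.01, 0.005])
print(f"HEP (sealing) = {hep_sealing:.4f}, P(at least one error) = {p_any_error:.4f}")
```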
Further description of the quantification process is given in [32]. Although numerous HRA quantification techniques have been developed and applied over the years, there does not exist one universally accepted methodology with a firm theoretical basis [34]. According to [34], three approaches to quantification can be distinguished:
(1) Decomposition or Database Techniques, involving decomposition of tasks to a level for which some reference data are available and can be adjusted according to the specifics of the task.
(2) Time Dependent Methods, assuming that human error probability is a function of the time available to respond to an event.
(3) Expert Judgement Based Techniques, utilising expert knowledge.
Some known and recognized HRA methods are:
- THERP: Technique for Human Error Rate Prediction [35].
- SLIM: Success Likelihood Index Method [36].
- HEART: Human Error Assessment and Reduction Technique [37].
- CREAM: Cognitive Reliability and Error Analysis Method [38].
- ATHEANA: A Technique for Human Event Analysis [39].
- MERMOS: Méthode d'Evaluation de la Réalisation des Missions Opérateur pour la Sûreté [40].
- SPAR-H: Standardized Plant Analysis Risk HRA Method [41].
The following knowledge is necessary to carry out human reliability assessments:
- In-depth knowledge about the work process in order to carry out the task analysis.
- Knowledge about human factors and human errors in order to identify human errors.
- In-depth knowledge about HRA methods and knowledge about human reliability data in order to carry out a quantitative human reliability analysis.
The time consumption of a HRA depends on the scope of the analysis. It may be time consuming to analyse quantitatively all work processes involving human actions in the water supply system. Access to relevant human reliability data as a basis for quantitative human reliability analyses may be a problem. If no specific human reliability data for water supply systems are known, the analyses will probably have to be based on generic human reliability data from other types of industries.
6.8 Markov Analysis
There are situations where a more detailed analysis of the reliability of a system (or component) should be carried out. For instance, a preliminary analysis may have identified a critical component, or serious problems may actually have been observed with the operation of a specific subsystem. As a consequence, an analysis should be carried out to identify the right level of redundancy and/or preventive maintenance for the system (component). In a Markov analysis we define various performance states (or levels of deterioration) for the system.
Example A: In order to carry out a certain operation, two pumps are needed. One can decide either to install just two pumps, or to have a third pump in stand-by. A Markov analysis can then be carried out to compare the performance of these two options. The analysis results for each option are e.g.:
- The probability (fraction of time) that it is necessary to operate with just one pump.
- The mean number of times (during one year) when it is necessary to operate with just one pump (or the mean number of times that both pumps fail).
Thus, a Markov analysis will in combination with an analysis of costs help us to make a choice between the two options. Example B: Markov analysis can also be used to model deterioration processes; for example consider wastewater pipes. Equipment for direct measurement of the remaining pipe wall thickness exists for water networks (e.g. in WP 5.6 in the TECHNEAU project). Based on such measurements one can identify the state of degradation for various pipe sections. A Markov model can then be developed to describe transitions between deteriorating states,
and such a model can help us to analyse and predict the time until the occurrence of pipe failure (i.e. leaks of a certain size). This can help us to plan repairs and replacements of pipes. In general the analysis will start by formulating a Markov model for the system in question. The analyst will, in co-operation with personnel who are familiar with the system, define various states corresponding to various levels of performance/deterioration. For instance, we can allow the system to have N + 1 states, and let the set of all possible states equal S = {0, 1, ..., N}. Here state 0 could denote a perfect system (i.e. system being as good as new), and state N could represent a completely failed system. These states, and the possible transitions between them, are illustrated by a Markov diagram; see Figure 21 for an example with N = 2 (i.e. three states).
Figure 21. Markov state diagram for example C (illustrating a Markov model).
In this specific diagram it is possible to make transitions between state 0 and state 1 (both ways), and between state 1 and state 2 (both ways), but not between states 0 and 2. The Markov diagram in Figure 21 could illustrate the model for a redundant system of two components (say two pumps). Note that in this example it is assumed to be sufficient that one component is working for the system to be OK. The states of the model correspond to:
State 0: Both components (pumps) are working.
State 1: One component (pump) has failed and the other is working (but the system is still working).
State 2: Both components have failed (and so the system has failed).
Transitions between the states here occur according to the following constant rates:
$\lambda$ = Failure rate of a component (that is operating); (failures per unit time)
$\mu_1$ = Repair rate when one component has failed; (repairs per unit time)
$\mu_2$ = Repair rate when both components have failed; (repairs per unit time)
This means that the Mean Time To Failure (MTTF) of one component equals MTTF = $1/\lambda$ (Appendix F). When the system is in state 0, both components can fail. Thus, the total rate out of state 0 equals $2\lambda$ (and the mean time until a failure occurs equals $1/(2\lambda)$). Similar results apply for repair times. The repair time of a single component has a mean MTTR = $1/\mu_1$. When both components have failed, the mean time until one repair is accomplished equals MTTR = $1/\mu_2$. Note that these MTTR are interpreted as the total time elapsing from when a failure occurs until a component is fully restored. Now assume that an analysis should be carried out to derive the repair strategy to follow when both components have failed.
The basic task of a Markov analysis is to derive the probabilities $p_0$, $p_1$ and $p_2$ that the system is in state 0, 1 and 2 respectively. That is, $p_x$ is the probability that the system (process) is in state x (x = 0, 1, 2). Obviously $p_0 + p_1 + p_2 = 1$, and by using some equilibrium equations all three probabilities can be derived. For simplicity assume $\mu_2 = \mu_1 = \mu$ (this actually means that just one component is repaired at a time, also when both have failed). Then the following probabilities are derived:
$$p_0 = \frac{\mu^2}{\mu^2 + 2\lambda\mu + 2\lambda^2}, \qquad p_1 = \frac{2\lambda\mu}{\mu^2 + 2\lambda\mu + 2\lambda^2}, \qquad p_2 = \frac{2\lambda^2}{\mu^2 + 2\lambda\mu + 2\lambda^2}$$
Here it is particularly interesting to find the probability, $p_2$, that the system is failed. A numerical example is given: Let time be measured in years, and assume that experience shows that the average lifetime (MTTF) of a component equals 2 years. Thus, $\lambda = 0.5\ \text{year}^{-1}$, and then $p_2 = 1/(2\mu^2 + 2\mu + 1)$, which shows how the probability to be in the failed state ($p_2$) depends on the repair rate $\mu$ (given the value $\lambda = 0.5$). One strategy could be not to repair a failed component before the next overhaul. If there is an overhaul every year, this could imply that on average MTTR = 0.5 years; i.e. $\mu = 2\ \text{year}^{-1}$. This gives $p_2 \approx 0.077$; that is, the system is in the failed state 7.7% of the time. Changing the repair strategy and reducing MTTR to 0.1 year ($\approx$ 37 days), i.e. $\mu = 10\ \text{year}^{-1}$, will give $p_2 \approx 0.0045 \approx 0.5\%$. Reducing MTTR further to 0.01 year ($\approx$ half a week), i.e. $\mu = 100\ \text{year}^{-1}$, gives $p_2 \approx 5 \cdot 10^{-5}$. This type of result can help us to choose a sensible MTTR to use for this system. One could also calculate other parameters, like the frequency (rate) of system failures, i.e. $\mu p_2$, and demonstrate how this rate depends on $\mu$.
In summary, the Markov analysis is seen as a rather advanced method, and it will require skilled personnel to carry out such an analysis. Some experience is required both to define the states and the relevant transitions between these. The need for data (i.e. transition rates) is significant. Also note that if the system has several states (say more than 3-4), the analysis can become rather complex and time consuming, and the use of a software tool is recommended. Such tools are commercially available. Finally it is pointed out that a Markov analysis is based on two specific assumptions:
- All transition rates are constant in time, and will not depend on the time elapsed since the current state was entered. This implies that all involved failure times and repair times are assumed to have an exponential distribution (cf. Appendix F).
- The Markov model is based on the system having a lack of memory, meaning that all transition rates depend only on the current state, and not on the previous history of the system.
So in particular, the transition rates will be independent of how long the system has been in the current state. Rausand and Høyland [23] provide a good introduction to the use of Markov analyses in reliability and risk analyses. There is also an IEC standard for the method which could be helpful for a concise description [81].
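The steady-state probabilities of the two-pump model can also be computed numerically, which is how larger models are handled in practice. The sketch below is an illustration under the same assumptions as the example (failure rate 0.5 per year, a single repair rate used in both failed states); the function names are chosen for illustration. It exploits the fact that transitions only occur between neighbouring states, so that the balance equations can be solved state by state.

```python
def birth_death_steady_state(up_rates, down_rates):
    """Steady-state probabilities of a chain with neighbour transitions only.

    up_rates[k] is the rate from state k to k+1,
    down_rates[k] is the rate from state k+1 back to k.
    """
    weights = [1.0]
    for up, down in zip(up_rates, down_rates):
        # Balance between neighbouring states: p[k] * up = p[k+1] * down.
        weights.append(weights[-1] * up / down)
    total = sum(weights)
    return [w / total for w in weights]

def two_pump_state_probabilities(failure_rate, repair_rate):
    """States 0, 1, 2 = number of failed pumps; one repair at a time."""
    return birth_death_steady_state(
        up_rates=[2 * failure_rate, failure_rate],
        down_rates=[repair_rate, repair_rate],
    )

failure_rate = 0.5  # per year, i.e. MTTF = 2 years
for repair_rate in (2.0, 10.0, 100.0):
    p0, p1, p2 = two_pump_state_probabilities(failure_rate, repair_rate)
    print(f"repair rate {repair_rate:6.1f} per year: p2 = {p2:.6f}")
```

For the three repair rates the computed values of p2 agree with the figures quoted in the numerical example (about 0.077, 0.0045 and 5 x 10^-5). Models with transitions between non-neighbouring states require solving the full set of equilibrium equations instead.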
6.9 Cause-effect relations - Bayesian Networks
The Bayesian Network is an advanced analysis technique to model how various factors affect the performance of relevant systems and thereby the resulting risk.
Introduction
A probabilistic network is a graphical and qualitative representation of a problem, consisting of parameters represented by nodes and their interactions represented by connectors [42-44]. The graphical representation is useful for intuitively defining dependencies and independencies of complex problems and for communicating about these problems. Probabilistic networks have become an increasingly popular tool for reasoning under uncertainty. Bayesian networks are a specific subclass of probabilistic networks where the connectors represent quantitative probabilistic dependencies between variables (cause-effect relations) in one specific direction. A specific condition for Bayesian networks is that they must be directed acyclic graphs, meaning that the network may contain no path that leads from a node through other nodes back to itself. The direction of the dependency defines the hierarchy between nodes. If there is a dependency from node A to node B, B is described as a child of A and A as a parent of B.
Bayesian networks contain nodes that represent uncertain variables (chance nodes). Such a variable can be continuous or discrete. A discrete variable is one with a well defined finite set of possible values (called states), such as the numbers 1 to 6 on a die or a statement which is either true or false. A continuous variable is one which can take on any value between two given values, such as indoor temperature or volume of consumed water. Bayesian networks can be augmented with decision nodes or utility nodes. A decision node represents a variable (or choice) that is under the control of the decision maker. A utility node represents the expected value that is to be maximized while searching for the best decision rule for each of the decision nodes.
Bayesian networks containing decision nodes or utility nodes are called influence diagrams or decision networks. In Figure 22 an example of an influence diagram is given.
Figure 22. A simple influence diagram [45] and node types.
Figure 22 contains three types of nodes: chance nodes represented by an ellipse, a decision node represented by a rectangle and a utility node represented by a flattened hexagon. The chance node Weather represents whether or not it actually rains during the day (states: rain or no_rain). The chance node Forecast represents the weather forecast in the morning (states: sunny, cloudy or rainy). The decision node stands for the decision whether or not to take an umbrella, and at the utility node the decision maker's level of satisfaction with the result is calculated. There is a link from Forecast to Decide_Umbrella indicating that the decision maker will know the forecast when he makes the decision, but no link from Weather to Decide_Umbrella; if he knew for certain what the weather was going to be, it would be easy to decide whether or not to take the umbrella.
Use of Bayesian networks in practice
Bayesian networks (footnote 7) result in a quantitative outcome, representing the probability that a certain occurrence will happen. The graphical lay-out (the nodes and the dependencies) is drawn first. Especially for complex problems a certain amount of analytical knowledge is required. The probabilities of the dependencies can be based on:
- raw data collected by direct measurement,
- raw data (mostly perceptions) collected through stakeholder elicitation,
- output from models,
- expert opinions based on theoretical calculation or best judgement.
All nodes have an underlying Conditional Probability Table (CPT) containing the probabilities of occurrence. For non-modifying nodes (i.e. nodes without parents) the CPT describes the probability of occurrence of the given states (e.g. rain or no_rain). For modifying nodes (i.e. nodes with parents) the CPT describes the probability of the occurrence of a state, given the state of its parent(s). The most important advantages of Bayesian networks are:
- A subtle modelling approach is possible, as probabilities can be obtained from various types of data and various states can be differentiated per variable (node) (e.g. a pipe burst can be characterized as No_burst, Small_burst or Big_burst).
- The model can be updated with new data, and become a learning model.
- Adapting Bayesian networks to new situations is relatively easy.
- The network configuration and its calculation are integrated into one model, facilitating the communication of the approach and results.
- A large freedom of programming, especially compared to fault and event trees. Both tree types can be integrated into a Bayesian network, with the possibility to link different branches of a tree or to link variables in a fault tree directly to variables in an event tree.
- The possibility of making sensitivity and what-if analyses (see Section 3.1.1).
- Specialist skills for building Bayesian networks are not required because well documented software packages are available.
An important disadvantage of Bayesian networks is that with increasing complexity the amount of input will grow exponentially. Nodes with multiple parents and states will have a large CPT. This high demand for data may require a significant effort, especially when data is obtained through stakeholder elicitation or expert knowledge. Bayesian network software can import data from other software, such as spreadsheets and Matlab. In such software, CPTs can be generated by applying predefined rules. These rules can be based on statistical analysis or expert knowledge.
Footnote 7: Where reference is made to Bayesian networks in the remainder of this paragraph, this is also valid for influence diagrams.
For the calculation of Bayesian networks different commercial software packages exist. Most of them have free demo versions. Two well-known packages are Netica (www.norsys.com) and Hugin (www.hugin.dk). For a comprehensive list of Bayesian network packages see: www.cs.ubc.ca/~murphyk/Bayes/bnsoft.html.
Example of a Bayesian network
In Figure 23 a Bayesian network is given for a system containing a power source feeding two bulbs.
Figure 23. Bayesian network of a system consisting of a power source and two bulbs.
The network consists of three non-modifying nodes (Power source, Bulb 1 and Bulb 2) and one modifying node (Light) with a result depending on the condition of its three parents. Each node has two states (fails and works), meaning that the CPT of the node Light must cover eight (2^3 = 8) combinations of parent states. These conditional probabilities have to be defined in a CPT. In this example it is assumed that there is no light if either the power source fails or both bulbs fail. Table 11 gives the CPT for the node Light.
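Table 11. Conditional probability table for node Light.

| Power source | Bulb 1 | Bulb 2 | no light | light |
|---|---|---|---|---|
| works | works | works | 0 | 1 |
| works | works | fails | 0 | 1 |
| works | fails | works | 0 | 1 |
| works | fails | fails | 1 | 0 |
| fails | works | works | 1 | 0 |
| fails | works | fails | 1 | 0 |
| fails | fails | works | 1 | 0 |
| fails | fails | fails | 1 | 0 |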
In the CPT the Bayesian network is modelled for an uncertain condition (the availability of light). It is also possible to model uncertain relations between conditions. If the states in the node Light are replaced by insufficient_light and sufficient_light, and it is assumed that the probability of sufficient light given that only one bulb is working is 0.7, then the CPT of the node Light looks like the one represented by Table 12. Figure 22 and Figure 23 show simple Bayesian networks with a limited number of nodes. More complex networks can be made, such as in Figure 24, where an example of a Bayesian network is given for diagnosing the probability that a car starts.
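Table 12. Conditional probability table for node Light.

| Power source | Bulb 1 | Bulb 2 | insufficient light | sufficient light |
|---|---|---|---|---|
| works | works | works | 0 | 1 |
| works | works | fails | 0.3 | 0.7 |
| works | fails | works | 0.3 | 0.7 |
| works | fails | fails | 1 | 0 |
| fails | works | works | 1 | 0 |
| fails | works | fails | 1 | 0 |
| fails | fails | works | 1 | 0 |
| fails | fails | fails | 1 | 0 |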
Figure 24. A Bayesian network for diagnosing the probability a car starts [45].
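For a network as small as the bulb example, the calculations performed by Bayesian network software can be reproduced by simple enumeration. The sketch below uses the CPT of Table 12 together with hypothetical prior failure probabilities for the three parent nodes (these priors are assumptions, not values from the report). It computes the probability of sufficient light and, as an example of diagnostic reasoning, the probability that the power source has failed given that the light is insufficient.

```python
from itertools import product

# Hypothetical prior failure probabilities of the three parent nodes.
P_FAIL = {"power": 0.01, "bulb1": 0.05, "bulb2": 0.05}

def p_sufficient(power_ok, bulb1_ok, bulb2_ok):
    """CPT of Table 12: P(sufficient light | states of the parents)."""
    if not power_ok:
        return 0.0
    working_bulbs = int(bulb1_ok) + int(bulb2_ok)
    return {2: 1.0, 1: 0.7, 0: 0.0}[working_bulbs]

def joint_terms():
    """Yield (parent states, prior probability) for all 2^3 combinations."""
    for states in product([True, False], repeat=3):
        prior = 1.0
        for name, ok in zip(P_FAIL, states):
            prior *= (1.0 - P_FAIL[name]) if ok else P_FAIL[name]
        yield states, prior

# Marginal probability of sufficient light.
p_light = sum(prior * p_sufficient(*s) for s, prior in joint_terms())
p_dark = 1.0 - p_light

# Bayes' rule: P(power source failed | insufficient light).
p_power_failed_and_dark = sum(
    prior * (1.0 - p_sufficient(*s)) for s, prior in joint_terms() if not s[0]
)
p_power_failed_given_dark = p_power_failed_and_dark / p_dark

print(f"P(sufficient light) = {p_light:.5f}")
print(f"P(power source failed | insufficient light) = {p_power_failed_given_dark:.4f}")
```

Enumeration grows exponentially with the number of nodes, which is why dedicated software is used for networks of realistic size.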
6.10 Event Tree Analysis (ETA) - analysing consequences
Event tree analysis (ETA) is the most commonly used method for analysing the progression of a hazardous event from being initiated to the final consequences. The hazardous event considered in an ETA, is in the analysis often denoted as the initiating event. An event tree is a logic tree diagram that starts from the initiating event and provides a systematic coverage of the time sequence of event propagation to its potential consequences. The event sequence is influenced by safety barriers (or control measures) and the consequences are determined by assuming failure or success of the existing safety barriers (or control measures).
Each barrier outcome in the tree will be conditional on the occurrence of the previous safety barrier outcomes in the event propagation. The outcomes of the barriers are most often assumed to be binary (the barrier is either functioning or not), but may also include multiple outcomes (e.g. the barrier fully functions, partly functions or fails). A simple event tree was illustrated in Figure 12, section 2.6. ETA is a method used to analyse the consequences of hazardous events (Right part of the overall risk model outlined in Figure 13, section 2.6). ETA can be carried out both qualitatively and quantitatively. The qualitative part of the event tree analysis is usually carried out in the following steps: 1. Identification of relevant initiating event that may give rise to unwanted consequences. 2. Identification of safety barriers provided to stop or mitigate the unwanted consequences (discussed in section 2.5). 3. Construction of the event tree. 4. Description of the resulting consequences. The initiating event may be identified by other risk analysis methods presented in this chapter like FMECA, PHA or HAZOP. The event tree displays the chronological development of event chains (from left to right), starting with the initiating event and proceeding through successes and/or failures of the safety barriers that respond to the initiating event. At every barrier the event tree splits into one upper and one lower branch, (Figure 12). If the barrier functions successfully, follow the upper branch, if not follow the lower branch. Hence, the consequences will be ranked in an increasing order, with the worst consequence lowest on the list. Safety functions, or barriers, are provided to stop or mitigate the consequences of the hazardous events. The safety functions may comprise technical equipment, human interventions, emergency procedures, and combinations of these. 
If experience data are available for the initiating event and for all the relevant safety barriers and hazards, a quantitative analysis of the event tree may be carried out to give probabilities or frequencies of the resulting consequences. For each safety barrier, the conditional probability that it functions properly, given that the previous event sequence has occurred, must be estimated. The reliability of a safety function may be estimated by e.g. an FTA (see Figure 13, section 6.1, where failure of barrier 3 is a top event in a fault tree). The probability of each consequence category for the specified hazardous event is then estimated by multiplying all probabilities along the event sequence, from the initiating event to the consequence class under consideration. The frequency of each consequence is the product of the frequency of the initiating event and the probability of the consequence category. Finally, by quantifying the consequence categories, the total risk can be evaluated.
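The multiplication rule described above can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: the barrier names, probabilities and initiating-event frequency are hypothetical, every barrier is binary, each conditional probability is taken as the same on all preceding branches, and all branches are enumerated even though a real event tree may stop after a barrier succeeds.

```python
from itertools import product


def event_tree(init_freq, barriers):
    """Enumerate all branches of a binary event tree.

    init_freq: frequency of the initiating event (events per year).
    barriers:  list of (name, p_success) in chronological order, where
               p_success is the conditional probability that the barrier
               functions given the preceding event sequence (assumed here
               to be the same on every preceding branch).
    Returns a list of (outcomes, probability, frequency) tuples, where
    outcomes is a tuple of booleans (True = barrier functions).
    """
    branches = []
    for outcomes in product([True, False], repeat=len(barriers)):
        prob = 1.0
        for (_, p_ok), ok in zip(barriers, outcomes):
            # Multiply the branch probabilities along the event sequence.
            prob *= p_ok if ok else 1.0 - p_ok
        # Consequence frequency = initiating frequency x branch probability.
        branches.append((outcomes, prob, init_freq * prob))
    return branches


# Hypothetical illustration values, not taken from the report.
barriers = [("treatment", 0.95), ("disinfection", 0.90)]
for outcomes, prob, freq in event_tree(0.5, barriers):
    label = ", ".join(
        f"{name} {'works' if ok else 'fails'}"
        for (name, _), ok in zip(barriers, outcomes)
    )
    print(f"{label}: P = {prob:.4f}, frequency = {freq:.4f} per year")
```

The branch probabilities sum to one, which is a useful consistency check when barrier probabilities are entered by hand.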
6.11 Methods for estimation of risk to human health (QMRA and QCRA)
There are general approaches for assessing various risks to human health, for instance Health Impact Assessment and Health Risk Assessment. These methods aim to assess the actual health effects on consumers of undesired events in the water supply system. They are referred to in [2], and only a short review is given here. QMRA is a specific tool for risk assessment of the microbiological quality of drinking water (QMRA is considered here to be a subtype of Health Impact Assessment or Health Risk Assessment); see the TECHNEAU report [46] for details. QMRA is derived from the chemical risk assessment (QCRA) paradigm, which encompasses four basic elements:
- A characterisation of the problem setting (system description), including identification of hazards (pathogens) and hazardous events.
- Exposure assessment (e.g. duration of event, number of consumers affected).
- Effect assessment (dose-response curves for specific pathogens).
- Risk characterisation.
The principles behind QMRA have been described in [47] and have been further developed and applied in recent years in relation to risk management [48-50]. The EU project MicroRisk resulted in a number of reports [51] describing the use of QMRA. This project included pathogen sampling and risk assessments involving 12 systems across Europe and in Australia. The available dose-response data have been obtained mainly from studies using healthy adult volunteers.
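The exposure, effect and risk characterisation steps can be sketched in a few lines of Python. This is an illustration only: it assumes the exponential dose-response form, which is one commonly used model and not necessarily the one applied in [46] or [51], and the pathogen concentration, water consumption and parameter r below are hypothetical values.

```python
import math


def daily_infection_probability(concentration_per_litre, litres_per_day, r):
    """Exponential dose-response model: P = 1 - exp(-r * dose).

    r is a pathogen-specific parameter (hypothetical value in this sketch).
    """
    dose = concentration_per_litre * litres_per_day  # exposure assessment
    return 1.0 - math.exp(-r * dose)                 # effect assessment


def annual_infection_probability(p_daily, exposure_days=365):
    """Probability of at least one infection over independent daily exposures."""
    return 1.0 - (1.0 - p_daily) ** exposure_days    # risk characterisation


# Hypothetical inputs: 0.001 organisms per litre, 1 litre per day, r = 0.01.
p_day = daily_infection_probability(0.001, 1.0, 0.01)
p_year = annual_infection_probability(p_day)
print(f"daily: {p_day:.2e}, annual: {p_year:.2e}")
```

Treating the daily exposures as independent is itself an assumption; an event-based assessment would use the duration of the hazardous event instead of 365 days.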
6.12 Methods for risk analysis of water quantity (supply)
The loss of water supply due to bursts and leaks can be analysed by a combination of hydraulic and reliability models. One example of such a model, denoted CARE-W REL, was developed in the CARE-W project (see http://care-w.unife.it/ and [82]). Here a hydraulic network simulation model, i.e. EPANET, is combined with a routine that forecasts the probability of failure for each pipe. The joint model considers the probability of leaks or bursts as well as the reduction of flow capacity, and the consequences for water supply measured as flow and pressure to the consumers. The model identifies reliability bottlenecks in the network. It has been used and tested in several cities to assess the reliability of water supply to sensitive consumers and to entire water supply districts. Figure 25 shows an example of a CARE-W REL analysis, where the bottlenecks are marked red. A Hydraulic Criticality Index (HCI) equal to zero means that the pipe has no effect on reliability (with respect to water supply), either because the forecast failure rate of the pipe is zero (MTTF = ∞), or because its unavailability (during repair of a failure, for instance) has no effect on interruptions of supply to consumers. An HCI equal to 1 means that the pipe is totally unavailable and that its unavailability will result in supply interruptions for all consumers served. Even if this value cannot be reached, the 0-1 scale allows pipes belonging to different, hydraulically independent networks to be compared. The pipe availability is also considered, integrating failure rate and mean time to repair (MTTR).
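How failure rate and MTTR combine into a pipe availability can be shown with a small Python sketch. It assumes a constant failure rate and the standard steady-state relation U = MTTR / (MTTF + MTTR); the function name and the input values are hypothetical, and the sketch does not reproduce the HCI calculation of CARE-W REL, which is not given in this report.

```python
HOURS_PER_YEAR = 8760.0


def pipe_unavailability(failure_rate_per_year, mttr_hours):
    """Steady-state unavailability U = MTTR / (MTTF + MTTR).

    A failure rate of zero corresponds to MTTF = infinity, so U = 0.
    """
    if failure_rate_per_year == 0:
        return 0.0
    mttf_hours = HOURS_PER_YEAR / failure_rate_per_year
    return mttr_hours / (mttf_hours + mttr_hours)


# Hypothetical pipe: 0.2 failures per year, 12 hours to repair.
u = pipe_unavailability(0.2, 12.0)
print(f"unavailability = {u:.6f}, availability = {1.0 - u:.6f}")
```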
Figure 26 provides an example of a map displaying the hydraulic criticality of water mains in a particular network.
Figure 25. Finding reliability bottlenecks of water through combination of hydraulic capacity and failure probability (Example from CARE-W REL).
Figure 26. Map displaying the hydraulic criticality of water mains: bold lines refer to a high criticality, (example from CARE-W).
Models other than CARE-W REL exist. In the Netherlands all water companies have for many years applied a reliability analysis of the total system. This has been integrated into the national policy.
6.13 GIS as a tool in risk analysis
6.13.1 Introduction
Risk analysis strategies and techniques for application in the water utility sector have been surveyed at the strategic, program, and operational levels of decision making by [52]. A comprehensive review has been presented, and the analyses show that Geographical Information Systems (GIS) have the potential to become part of the risk assessment portfolio [53]. GIS assisted risk analyses have a broad application, not only in public health protection but also in asset management and in assessing potential threats to the security of supplies. The risk analysis portfolio using GIS techniques is compiled in Table 13. GIS is used at the program risk analysis level to optimise the total cost of owning and operating the infrastructure assets of a water utility. Because risk has a spatial context, utilities benefit from using GIS technologies to convert data displayed on paper maps into digital format. On that basis, GIS provides visualisation of infrastructure assets and tracking of their associated risk factors [53]. Furthermore, GIS technologies offer the capability to analyse data spatially, for example to examine infrastructure deterioration induced by spatially variable risk factors (e.g. [54] [55] [56]).
Table 13. Program and Operational level Risk portfolio including the use of GIS (extracted from [53]).
Risk hierarchy | Context | Tool / Technique | Application
--- | --- | --- | ---
Program risk analysis level | Asset management | GIS risk tracking | Infrastructure risk-tracking, visualisation and communication
Program risk analysis level | Asset management | GIS spatial analysis | Risk-mapping of infrastructure
Program risk analysis level | Asset management | GIS risk simulation | Evaluating degradation risk
Program risk analysis level | Catchment management | GIS risk mapping | Mapping areas of catchment critical to water quality
Program risk analysis level | Catchment management | Contaminant flow/transport modelling | Projecting degradation patterns / assessing risk of water quality violation
Program risk analysis level | Catchment management | Kriging | Projecting degradation patterns with limited sample data (e.g. groundwater)
Program risk analysis level | Catchment management | GIS risk simulation | Quantified risk mapping over space and time
Operational risk analysis level | Public Health and Compliance Risk | GIS simulation | Assessing risk of distribution system water quality degradation
6.13.2 GIS in catchment risk management
Because of the spatial context of risk and the classical capabilities of Geographical Information Systems for spatial data analysis, many GIS applications deal with catchment or watershed management. The catchment is often referred to as the first barrier within the so-called multi-barrier system. Risk analysis of water supply at catchment scale has to consider a multitude of possible sources of hazardous events, caused by natural or human factors such as wild animals, erosion, land use, industry, traffic, and recreational activities. The natural properties and the anthropogenic land-use patterns in the catchment play a decisive role for the raw water quality. GIS can support the identification of hazardous events in the catchment area by querying the natural boundary conditions and land-use patterns which may lead to the release of hazardous substances or pathogens. GIS is therefore an appropriate instrument for preventive protection and for the management of risks in the catchment area. GIS techniques in catchment management [53] include:
- Mapping of data and attributes that are spatially variable in nature and considered to play a significant role in pollutant transport (e.g. geology, rainfall, soil type, agricultural activities, etc.).
- Spatial risk-ranking methodologies for these attributes, according to predefined formulas (e.g. weighted runoff-potential index).
- Map-overlay techniques to identify areas critical to catchment water quality and to inform the prioritisation of catchment management activities, specifically monitoring programs.
- Geo-statistical inference (kriging methods) to characterise the extent and severity of source contamination (e.g. contaminant concentration).
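The risk-ranking and map-overlay techniques listed above can be sketched on a small raster grid in Python. This is a minimal illustration, not the formula used in [53]: the layer names, the ranks (1 = low, 5 = high), the weights and the threshold are all hypothetical, and a real analysis would use the overlay tools of a GIS package.

```python
def weighted_overlay(layers, weights):
    """Combine equally sized raster layers into one index grid.

    layers:  dict name -> grid (list of rows) of ranked attribute values.
    weights: dict name -> weight of that attribute in the index.
    """
    names = list(weights)
    n_rows = len(layers[names[0]])
    n_cols = len(layers[names[0]][0])
    return [
        [sum(weights[n] * layers[n][r][c] for n in names) for c in range(n_cols)]
        for r in range(n_rows)
    ]


def critical_cells(index_grid, threshold):
    """Return (row, column) of all cells whose index reaches the threshold."""
    return [
        (r, c)
        for r, row in enumerate(index_grid)
        for c, value in enumerate(row)
        if value >= threshold
    ]


# Hypothetical 2 x 2 catchment grid with three ranked attribute layers.
layers = {
    "soil": [[1, 5], [3, 2]],
    "rainfall": [[2, 4], [3, 1]],
    "land_use": [[1, 5], [2, 2]],
}
weights = {"soil": 0.5, "rainfall": 0.3, "land_use": 0.2}
index = weighted_overlay(layers, weights)
print(critical_cells(index, threshold=4.0))
```

Cells flagged in this way would be candidates for prioritised monitoring; the choice of weights is a matter of expert judgement.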
The hydraulic and geological settings determine the natural protective function of the catchment, as the first barrier to water supply. For groundwater resources the degree of this protective function can be expressed as intrinsic vulnerability, which, unlike specific vulnerability, is independent of the chemical or physical properties of the specific hazards [57]. Groundwater contamination risk assessments and mappings that use GIS are often based on vulnerability maps. Such approaches have been applied by [58] [59], and for karst groundwater by [60] [61] [62].
6.13.3 GIS Assisted Risk Analysis - Description and application
During the German TECHNEAU case study in Ebnet-Freiburg, TZW researchers used a GIS based approach for catchment risk assessment, combining hazard mapping, hazard ranking and groundwater vulnerability mapping (GIS Assisted Risk Analysis, GARA-method); see the case study report [63]. A short summary is given below. The TECHNEAU Hazard Database THDB [20] is applied as the basis for hazard identification. Information on the exact location of hazards (including map coordinates) and their land take (areal extent) is added, using the GIS attribute table. It is then possible to display the hazards and assess them using a Geographical Information System.
Figure 27. Schematic illustration of the procedure of hazard mapping and hazard ranking used in Cost Action 620 (2004) [60]
The hazard identification is supported by brainstorming, an analysis of earlier incidents, and a field survey. During the field survey the identified hazardous events in the catchment area are revised and described more closely. Each hazard is weighted in the GIS attribute table by assigning a score ranging from 0 to 100, expressing the harmfulness of the hazard. The weighting is based on literature data [60] and on the expert judgement and experience of the water utility's personnel. In porous aquifers, the degree of groundwater contamination depends on the travel time and the dilution with uncontaminated groundwater. A second ranking procedure therefore modifies the score slightly, reducing or increasing it according to the distance to the water abstraction wells. Each groundwater protection zone is assigned a ranking factor ranging from 0.8 (outer protection zone) to 1.2 (inner catchment zone). By overlaying the GIS hazard layer and the GIS protection zone layer, the final Hazard Level can be calculated. To account for overlapping hazards, the single weighted hazard scores are added in the overlapping area. The result is a semi-quantitative hazard ranking, expressed as a catchment hazard map (Figure 27).
The next step is the digital mapping of the intrinsic groundwater vulnerability, based on soil data and the hydrogeological setting of the catchment area (such as groundwater recharge, infiltration of surface waters, depth to groundwater table etc.). The PI-method for groundwater vulnerability assessment [61] was applied and adapted in the TECHNEAU case study in Freiburg in Germany [63]. The final product, the Risk Intensity map, is the result of overlaying the vulnerability map and the hazard map with the GIS.
The Risk Intensity Index is calculated in the GIS by multiplying the reciprocal of the Hazard Level by the PI-factor, which expresses the vulnerability level. The result is displayed in a map that visualises the risk associated with the hazardous events, depending on the hazards' properties and their location within the catchment, and on the natural protective function of the catchment. This Risk Intensity map can be the basis for future considerations on risk management and risk reduction options in the catchment area. Figure 28 demonstrates how the hazard level and the vulnerability factor can be combined to express the risk intensity.
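The scoring arithmetic described in this section can be sketched for a single map cell in Python. This is a simplified illustration, not the GIS workflow of the case study [63]: the hazard scores and the PI-factor are hypothetical, only the two zone factors stated in the text are used, and returning None for cells without a mapped hazard is an assumption made here to avoid division by zero.

```python
# End points of the ranking factor range given in the text.
ZONE_FACTOR = {"outer": 0.8, "inner": 1.2}


def hazard_level(hazard_scores, zone):
    """Sum the zone-weighted scores (0-100 each) of hazards overlapping a cell."""
    return sum(score * ZONE_FACTOR[zone] for score in hazard_scores)


def risk_intensity_index(hazard_lvl, pi_factor):
    """Reciprocal of the Hazard Level multiplied by the PI-factor."""
    if hazard_lvl == 0:
        return None  # assumption: no index where no hazard is mapped
    return (1.0 / hazard_lvl) * pi_factor


# Hypothetical cell: two overlapping hazards in the inner zone, PI-factor 2.0.
level = hazard_level([50, 30], "inner")
print(level, risk_intensity_index(level, 2.0))
```

With this formulation a higher Hazard Level gives a smaller index for the same PI-factor, so the index values only become meaningful through the class boundaries shown in Figure 28.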
Figure 28. Diagram of risk intensity index with five different classes assigned to build risk classes. [64]
6.13.4 Main requirements
In the German case study [63] a commercially available GIS software (Desktop ArcGIS 9.2, ESRI) was used. Other GIS software, both commercial and open source, is available. Applying GIS analysis requires a certain degree of competence and training, but the decisive factor for the use of any GIS is the availability of appropriate information or digital data. In the case study the utility's CAD datasets on water supply structures, well locations etc. were used. Maps and aerial pictures were used as the topographic base. Land-use patterns already existed in GIS shape file format, owing to the catchment management activities of the water utility. Information on soil structure, hydrogeology, groundwater recharge and infiltration of surface waters was derived from the local hydraulic groundwater model and transformed to GIS shape files. Tabular data, for example additional information on hazards from the THDB [20], were linked to the GIS attribute table using the Join function. In some cases, tabular, CAD or GIS data may be available from authorities. If no digital data exist, they have to be created by geo-referencing paper maps and digitising information obtained from maps and field surveys. Digitising can take place at a desktop PC or, GPS-assisted, with tablet PCs directly in the field. Depending on the complexity of the project's objective and the available data basis, some effort may be necessary to gather the data needed.
6.13.5 Concluding comment
The use of GIS Assisted Risk Analysis (GARA-method) in the case study [63] illustrated that risk management is an iterative process of continuous updating as new information becomes available and as the preconditions change. This outcome is in line with the TECHNEAU Generic framework [2]. By storing and updating all spatial information in a GIS, the entire risk management process can be repeated regularly: from the identification of hazards and the estimation of risks, through the evaluation of risk tolerability and the identification of potential risk reduction options, to the selection and implementation of appropriate risk reduction and monitoring measures. The plain visualisation of Risk Intensity in a map is a vivid and convenient tool for communicating risks among the various stakeholders involved.
7 Summary of risk analysis methods

The report provides an overview of the main risk analysis methods for a water utility. The objective is to describe the tasks of a risk analysis and to demonstrate the applicability and capabilities of the various methods, and thus to support the implementation of the Generic framework and methods for integrated risk management in water safety plans [2]. Risk analyses give management useful tools for controlling the variety of hazards and hazardous events facing the water utility, for example:
- Contamination of the catchment area (biological/chemical);
- Failure of the treatment systems (technical/human);
- Failure of the distribution system (water leakage out; intrusion of contaminated water into pipes).
The risk picture for a water utility is therefore quite complex, including e.g. technical, biological and human aspects of a large and diverse system. It is also necessary to consider and balance the risks related to both water quality and water quantity, which further increases the complexity. The main steps of a risk analysis are (see Appendix A for details):
1. Define objectives and scope of the risk analysis. Make a system description and plan the work.
2. Identify hazards and hazardous events.
3. Estimate risk.
A risk analysis is an important means of identifying the hazards and hazardous events. In this respect the THDB is a useful tool. However, its application must be supplemented by a systematic use of experts with a thorough knowledge of the water supply system. The Coarse Risk Analysis (CRA) will often be the basic risk analysis method for water utilities. In addition to supporting the identification of hazards and hazardous events, this method gives a rough estimate of the related risks, as a basis for identifying and implementing effective risk reduction options. It is expected that most water utilities have the competence to carry out a CRA with some assistance from risk analysts.
Most of the other risk analysis methods, which are required in various situations, are rather complex and should most often be carried out by risk analysts in close cooperation with personnel from the water utility. Table 14 gives an overview of some applications of risk analysis methods for several decision situations. In the column "Analysis complexity" the following symbols are used:
H: High complexity of the analysis.
M: Medium complexity of the analysis.
L: Low complexity of the analysis.
This classification is given as an indication only, and it refers mainly to modelling complexity. For instance, FMECA is given complexity L, but this method also requires extensive system knowledge.
Table 14. Overview of risk analysis methods.
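Life cycle phase | Decision / Purpose of analysis | Method | Analysis complexity | Comments / Examples | Section
--- | --- | --- | --- | --- | ---
Design and development | Select type of water treatment | HAZOP/Hazid | M/L | Hazards to water source/catchment area | 6.2, 3.1
Design and development | Select type of water treatment | FMECA | L | Reliability of treatment systems | 6.3
Design and development | Select type of water treatment | Removal efficiency | H | Specification of treatment system | 6.4
Design and development | Select/design distribution system (capacity, redundancy) | Network model | H | For distribution only | 6.12
Design and development | Identification of control points | CRA (HACCP) | M | Establish monitoring system. Primarily for source & treatment | 3.2
Design and development | Hazard identification | Hazid/HAZOP | L/M | Identify need for risk reduction options | 3.1, 6.2
Design and development | Hazard identification | FMECA | L | Technical failures; (primarily for treatment?) | 6.3
Design and development | Plan for risk reduction/avoidance | FTA | H | E.g. to investigate redundant systems | 6.5
Design and development | Plan for risk reduction/avoidance | RBD | M | E.g. to investigate redundant systems | 6.6
Design and development | Plan for risk reduction/avoidance | HRA | H | Analyse potential for human errors causing maloperation | 6.7
Design and development | Plan for risk reduction/avoidance | QMRA/QCRA | H | Analyse (effects of) microbial/chemical contaminations | 6.11
Design and development | Develop emergency plans | Could be based on CRA | M | Plans for warning consumers, obtain substitute of delivery, recovery, ... | 3.2
Production and/or construction | Avoid construction work to pollute water source | CRA | L | Analyse hazardous events of construction | 3.2
Production and/or construction | Avoid construction work to pollute water source | HAZOP | M | Identify hazards / hazardous events for water source | 6.2
Operation | Protect against undesired events | CRA (HACCP) | L/M | Prioritise risk reduction options | 3.2
Operation | Protect against undesired events | HRA | H | Improve procedures | 6.7
Operation | Protect against undesired events | FTA | H | Identify causes of failure events | 6.5
Operation | Extend risk analyses to cope with specific problems | ETA | M | Consequences of undesired events | 6.10
Operation | Extend risk analyses to cope with specific problems | Bayes Network | H | Effect of risk influencing factors | 6.9
Operation | Extend risk analyses to cope with specific problems | GIS | H | More complete picture of hazards/vulnerabilities | 6.13
Operation | Changes in network capacity or reliability | Network model | H | Optimise water availability for consumers | 6.12
Operation | Changes in network capacity or reliability | FTA | H | Causes of network failures | 6.5
Operation | New (type of) users to be connected | HAZOP/Hazid | M/L | E.g. food industry, hospital, ... | 6.2
Operation | Unreliable equipment observed | Markov | H | Maintenance optimisation | 6.8
Operation | Security problems; new threats | HAZOP/Hazid | M/L | Identify threats and vulnerable points | 6.2
Operation | Changes in environment of source | Hazid | L | New buildings, roads, animals, etc. | 3.1
Operation | Modifications / Life extension | Hazid/HAZOP | L/M | New hazards appear? | 3.1
Operation | Modifications / Life extension | FTA | H | Identify new failure causes | 6.5
Operation | Modifications / Life extension | RBD | M | | 6.6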
A major problem in performing risk analyses is the scarcity or lack of relevant data, for instance regarding failure events. Water utilities should therefore design their own databases and record undesired events together with their causes and consequences, so that the various failure probabilities and rates can be estimated. Generic data are less useful and may not be available. It would, however, be useful if water utilities applied a similar design for their databases and allowed exchange of data with other water utilities.
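How such an event record supports the estimation of failure rates can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the asset groups, the event list and the exposure figures (here pipe length multiplied by observation time, in km-years) are hypothetical, and the estimate is the simple ratio of recorded failures to exposure.

```python
from collections import Counter


def failure_rates(recorded_events, exposure):
    """Estimate failure rates per asset group as failures / exposure.

    recorded_events: list with one asset-group name per recorded failure.
    exposure:        dict group -> exposure, e.g. km-years of pipe observed.
    """
    counts = Counter(recorded_events)
    # Groups without recorded failures get a rate of zero.
    return {group: counts[group] / exposure[group] for group in exposure}


# Hypothetical record: three failures observed over the stated exposure.
events = ["cast iron", "cast iron", "PVC"]
exposure_km_years = {"cast iron": 100.0, "PVC": 200.0, "PE": 50.0}
print(failure_rates(events, exposure_km_years))
```

With few recorded events such estimates are very uncertain, which is one reason why a common database design and exchange of data between utilities would be valuable.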
8 References
[1] TECHNEAU, Technology enabled universal access to safe water, Annex I "Description of work", Contract no. 018320-02, 2005.
[2] Rosén, L., Hokstad, P., Lindhe, A., Sklet, S., Røstum, J., Generic framework and methods for integrated risk management in water safety plans, 2007.
[3] Lindhe, A., Pettersson, T. J. R., Rosén, L., Norberg, T., Åström, J., Bondelind, M., Røstum, J., Hokstad, P., Sklet, S., Sturm, S., Kiefer, J., Ball, T., Öfverström, B., Törnqvist, M., Swartz, C. D., Kožíšek, F., Weyessa Gari, D., Pumann, P., Runštuk, J., Šašek, J., Tuhovčák, L., Ručka, J., Beuken, R., Bosch, A. and Reinoso, M., Summary report: Risk assessment case studies. Deliverable Number D4.1.5g, TECHNEAU, 2008.
[4] Aven, T., Vinnem, J. E. and Wiencke, H. S., A decision framework for risk management, with application to the offshore oil and gas industry, Reliability Engineering & System Safety. 92 (2007) 433-448.
[5] Petterson, S., Signor, R., Ashbolt, N. and Roser, D., QMRA methodology, MICRORISK, 2006.
[6] EC, The Drinking Water Directive, EU Council Directive 98/83/EC, 1998.
[7] van Mil, B. P. A., Dijkzeul, A. E. and van der Pennen, R. M. A., A View on risks. Risk modelling handbook, Berenschot Process Management, Utrecht, 2006.
[8] World Health Organization (WHO), Guidelines for Drinking-water Quality, Geneva, Switzerland, 2004.
[9] Norwegian Technology Centre, Norsok Z-013, Risk and emergency preparedness analysis, 2001.
[10] Freeman III, A. M., The Measurement of Environmental and Resource Values: Theory and Methods, Resources for the Future, Washington DC, 2003.
[11] US_EPA, Guidelines for Preparing Economic Analyses, EPA 240-R-00-003, 2000.
[12] Johansson, P.-O., Cost-Benefit Analysis of Environmental Change, Cambridge University Press, Cambridge, 1993.
[13] Johansson, P.-O., Evaluating Health Risks: An Economic Approach, Cambridge University Press, Cambridge, 1995.
[14] Rosén, L., Söderqvist, T., Soutukorva, Å., Back, P.-E., Grahn, L. and Eklund, H., Risk valuation in selection of remedial strategies. Description of methods and examples (summary in English), Report 5537, Swedish Environmental Protection Agency, Stockholm, 2006.
[15] NRC, Valuing Ground Water. Economic Concepts and Approaches, National Research Council, National Academy Press, Washington DC, 1997.
[16] Carson, R. T., Flores, N. E. and Meade, N. F., Contingent Valuation: Controversies and Evidence, Environmental and Resource Economics. 19 (2001) 173-210.
[17] Smith, A., Capital maintenance: a good practice guide. Leading Edge Asset Decisions Assessment (LEADA), Water Asset Management International. 15, 1.1 (2005).
[18] IEC, Dependability management - Part 3: Application guide, Section 9: Risk analysis of technological systems, IEC 300-3-9: 1995, IEC, 1995.
[19] USDOE, DOE Handbook chemical process hazard analysis, DOE-HDBK-1100-2004, US Department of Energy, Washington D.C., 2004.
[20] Beuken, R., Reinoso, M., Sturm, S., Kiefer, J., Bondelind, M., Åström, J., Lindhe, A., Rosén, L., Pettersson, T.J.R., Machenbach, I., Melin, E., Thorsen, T., Eikebrokk, B., Hokstad, P., Røstum, J., Niewersch, C., Kirchner, D., Kozisek, F., Gari, D. W., Swartz, C., and Menaia, J., Identification and description of hazards for water supply systems - A catalogue of today's hazards and possible future hazards, updated Version August 2008. Deliverable D4.1.1, 4.1.2 and 4.1.4, TECHNEAU, 2008.
[21] ISO 14121-1. Safety of machinery - Risk assessment. Part 1: Principles, International Standardization Organization, 2007.
[22] NFSA, Økt sikkerhet og beredskap i vannforsyningen - Veiledning (Improved safety and emergency preparedness in water supply - Guidance) (In Norwegian), Norwegian Food Safety Authority, Oslo, 2006.
[23] Rausand, M., Høyland, A., System reliability theory. Models, statistical methods, and applications, Wiley-Interscience, New Jersey, U.S.A., 2004.
[24] Laîné, J. M., Démotier, S., Odeh, K., Schön, W. and Charles, P., Risk assessment for drinking water production: assessing the potential risk due to the presence of Cryptosporidium oocysts in water, Water Science and Technology: Water Supply. 2, 3 (2002) 55-63.
[25] Démotier, S., Odeh, K., Schön, W., Charles, P., Footohi, F. and Allioux, J.-F., Risk assessment for drinking water production process, In proceedings of European Conference on System Dependability and Safety, Lyon, France, 2002.
[26] Risebro, H. L., Doria, M. F., Andersson, Y., Medema, G., Osborn, K., Schlosser, O. and Hunter, P. R., Fault tree analysis of the causes of waterborne outbreaks, Journal of Water and Health. 5, 1 (2007) 1-18.
[27] Rosén, L. and Steier, K., Risk assessment of water quantity, Gothenburg (in Swedish), Consulting report, Contract no. 1310786000, SWECO VIAK AB, Gothenburg, Sweden, 2006.
[28] Lindhe, A., Rosén, L., Norberg, T., Pettersson, T. J. R., Åström, J. and Bondelind, M., Fault-tree analysis for integrated and probabilistic risk analysis of drinking-water systems. In prep., Chalmers, Göteborg, Sweden, 2008.
[29] Rosén, L., Bergstedt, O., Lindhe, A., Pettersson, T. J. R., Johansson, A. and Norberg, T., Comparing raw water options to reach water safety targets using an integrated fault tree model, International Water Association Conference, Water Safety Plans: Global Experiences and Future Trends, Lisbon, 2008.
[30] Lindhe, A., Rosén, L., Norberg, T., Pettersson, T. J. R., Bergstedt, O., Åström, J. and Bondelind, M., Integrated risk analysis from source to tap: Case study Göteborg, Nordic Drinking Water Conference, Oslo, 2008.
[31] Lindhe A., Rosén L., Norberg T., Åström J., Bondelind M., Pettersson T.J.R. and Bergstedt O., Risk assessment case study Göteborg, Sweden, Deliverable 4.1.5a, TECHNEAU, 2008.
[32] Kirwan, B., A guide to practical human reliability assessment, Taylor & Francis, London, 1994.
[33] Kirwan, B., Ainsworth, L. K., A guide to task analysis, Taylor & Francis, London, 1992.
[34] Hirschberg, S., Human Reliability Analysis in Probabilistic Safety Assessment for Nuclear Power Plants, CSNI Technical Opinion Papers No. 4, OECD, Nuclear Energy Agency, 2004.
[35] Swain, A. D. and Guttmann, H. E., Handbook of human reliability analysis with emphasis on nuclear power plant applications: Final report NUREG CR-1278, SAND80-200, Sandia National Laboratories Statistics Computing and Human Factors Division, Albuquerque, 1983.
[36] Embrey, D. E., Humphreys, P., Rosa, E. A., Kirwan, B. and Rea, K., SLIM-MAUD: An approach to assessing human error probabilities using structured expert judgment, Department of Energy, USA, 1984.
[37] Williams, J. C., HEART - a proposed method for assessing and reducing human error, The 9th Advances in Reliability Technology Symposium, University of Bradford, 1996.
[38] Hollnagel, E., Cognitive reliability and error analysis method: CREAM, Elsevier, Oxford, 1998.
[39] NRC, Technical Basis and Implementation Guidelines for A Technique for Human Event Analysis (ATHEANA), NUREG-1624, U.S. Nuclear Regulatory Commission, 2000.
[40] Bieder, C., Le Bot, P., Desmares, E., Cara, F. and Bonnet, J. L., MERMOS: EdF's new advanced HRA method, The 4th international conference on probabilistic safety assessment and management (PSAM 4), New York, 1998.
[41] Gertman, D., Blackman, H., Marble, J., Byers, J. and Smith, C., The SPAR-H Human Reliability Analysis Method, NUREG/CR-6883, Idaho National Laboratory, 2005.
[42] Cain, J., Planning improvements in natural resources management, guidelines for using Bayesian networks to support the planning and management of development programmes in the water sector and beyond, Centre for Ecology & Hydrology, Crowmarsh Gifford, Wallingford, ISBN 0903741009, 2001.
[43] Kjaerulff, U. B. and Madsen, A. L., Probabilistic networks for practitioners - A guide to construction and analysis of Bayesian networks and influence diagrams, Aalborg University, 2006.
[44] Wiel, v. d. W., Risicoanalysemethoden ten behoeve van infrastructuur voor drinkwaterdistributie, TNO-D-R0355/A, Delft, 2007.
[45] Netica, Netica Helpfile, Netica version 3.25, www.norsys.com, 2007.
[46] Kirchner, D., Niewersch, C. and Wintgens, T., Application of risk assessment in the drinking water sector. Literature review, internal report TECHNEAU, 2006.
[47] Haas, C. N., Rose, J. B. and Gerba, C. P., Quantitative Microbial Risk Assessment, John Wiley & Sons, Inc, New York City, USA, 1999.
[48] Medema, G. and Ashbolt, N. J., QMRA: its value for risk management, Available: <http://www.microrisk.com/uploads/microrisk_value_of_qmra_for_risk_management.pdf>. Accessed on 2008-04-29, Microrisk, 2006.
[49] Smeets, P., Rietveld, L., Hijnen, W., Medema, G. and Stenström, T.-A., Stochastic modelling of drinking water treatment in quantitative microbial risk assessment, Delft University of Technology, Delft, 2008.
[50] Westrell, T., Microbial risk assessment and its implications for risk management in urban water systems, Linköping University, Linköping, 2004.
[51] MicroRisk, Microbiological risk assessment: a scientific basis for managing drinking water safety from source to tap. Available: <http://www.microrisk.com/publish/cat_index_6.shtml>. Accessed on 2008-04-29, The Microrisk consortium, 2006.
[52] Pollard, S., Strutt, J., MacGillivray, B., Hamilton, P. and Hrudey, S., Risk analysis and management in the water utility sector, Process Safety and Environmental Protection. 82, B6 (2004) 453-462.
[53] MacGillivray, B., Hamilton, P., Strutt, J. and Pollard, S., Risk Analysis Strategies in the Water Utility Sector: An Inventory of Applications for Better and More Credible Decision Making, Critical Reviews in Environmental Science and Technology. (2006) 85-139.
[54] Doyle, G. and Grabinsky, M., Applying GIS to a water main corrosion study, Journal AWWA. 95, 5 (2003) 90-104.
[55] Lindley, T. and Buchberger, S., Assessing intrusion susceptibility in distribution systems, Journal AWWA. 94, 6 (2002) 66-79.
[56] Lipponen, A., Applying GIS to assess the vulnerability of the Päijänne water-conveyance tunnel in Finland, Environmental Geology, Springer-Verlag (2007).
[57] Zwahlen, F., Vulnerability and Risk Mapping for the Protection of Carbonate (Karst) Aquifers. Cost Action 620, Final Report, 2004.
[58] Civita, M. and De Maio, M., Assessing Groundwater contamination risk using ArcInfo via GRID function, http://gis.esri.com/library, 2007.
[59] Ducci, D., GIS techniques for mapping groundwater contamination risk, Natural Hazards. 20, 2-3 (1999) 279-294.
[60] De Ketelaere, D., Hötzl, H., Neukum, C., Civita, M. and Sappa, G., Hazard Analysis and Mapping. Vulnerability and Risk Mapping for the Protection of Carbonate (Karst) Aquifers. Cost Action 620, Final Report, 2004.
[61] Goldscheider, N., Klute, M., Sturm, S. and Hötzl, H., The PI method - a GIS-based approach to mapping groundwater vulnerability with special consideration of karst aquifers, Z. angew. Geol. 46, 3 (2000) 157-166.
[62] Nguyet, V. and Goldscheider, N., A simplified methodology for mapping groundwater vulnerability and contamination risk, and its first application in a tropical karst area, Vietnam, Hydrogeology Journal. 14 (2006) 1666-1675.
[63] Sturm S., Kiefer J. and Ball T., Risk assessment case study Freiburg-Ebnet, Germany, Report no. D 4.1.5d, TECHNEAU, 2008.
[64] Hötzl, H., Delporte, C., Liesch, T., Malik, P., Neukum, C. and Svasta, J., Risk mapping. In: Zwahlen, F. (ed.), Vulnerability and Risk Mapping for the Protection of Carbonate (Karst) Aquifers. Cost Action 620, Final Report, 113-120, 2004.
[65] Røstum J. and Eikebrokk B., Risk assessment case study Bergen, Norway, Report no. D 4.1.5b, TECHNEAU, 2009.
[66] Norberg, T., Rosén, L. and Lindhe, A., Added value in fault tree analyses, European Safety and Reliability Association and Society for Risk Analysis Europe Conference, Valencia, 2008.
[67] Rausand, M., Risk analysis (In Norwegian: Risikoanalyse), Tapir Forlag, 1991.
[68] SINTEF, OREDA, Offshore Reliability Data, 2002.
[69] Stouthart, M. E. A., Essink-Bot, M.-L., Bonsel, G.J., Disability weights for diseases: a modified protocol and results for a Western European region, Eur. Journal of Public Health. 10 (2000) 24-30.
[70] Melse, J. M., Essink-Bot, M-L., Kramers, P.G.N., Hoeymans, N., A National Burden of Disease Calculation: Dutch Disability-Adjusted Life-Years, American Journal of Public Health. 90, 8 (2000) 1241-1247.
[71] International Burden of Disease Network (IBDN), Global Burden of Disease (GBD) Study (GBD 2000 project), 2000.
[72] DIN EN 60812. Analysetechniken für die Funktionsfähigkeit von Systemen - Verfahren für die Fehlzustandsart- und -auswirkungsanalyse (FMEA) (IEC 60812:2006); Deutsche Fassung EN 60812:2006, 2006.
[73] Beuken R., Meerkerk M., Bosch A., Reinoso M., and Mesman G., Risk assessment case study Amsterdam, The Netherlands, Deliverable 4.1.5c, TECHNEAU, 2008.
[74] Kožíšek F., Weyessa Gari D., Pumann P., Runštuk J., Šašek J., Tuhovčák L., Ručka J. and Papírník V., Risk assessment case study Březnice, Czech Republic, Deliverable 4.1.5e, TECHNEAU, 2008.
[75] Törnqvist M., Öfverström B. and Swartz C., Risk assessment case study Upper Mnyameni, South Africa, Deliverable 4.1.5f, TECHNEAU, 2009.
[76] WHO, Water Safety Plans - Managing drinking-water quality from catchment to consumer, World Health Organization, Geneva, 2005.
[77] SAE ARP 5580. Recommended Failure Modes and Effects Analysis (FMEA) Practices for Non-Automobile Applications, 2005.
[78] IEC 60812. Analysis techniques for system reliability - Procedures for failure mode and effect analysis (FMEA), 2006.
[79] BS5760-5: 1991 - Reliability of systems, equipment and components. Guide to failure modes, effects and criticality analysis (FMEA and FMECA), 1991.
[80] IEC 61078. Analysis techniques for dependability - Reliability block diagram and boolean methods, 2006.
[81] IEC 61165. Application of Markov techniques, 2006.
[82] CARE-W. Description of technical tools for failure forecasting and network reliability (WP2) (Deliverable D3). http://www.sintef.no/upload/24472/D03%20Models_Description.pdf
85
86
Appendix A: Main steps of a risk analysis

There are three main steps of a risk analysis, cf. Section 1.3. Some details of these steps are given in Figure 29, together with references to the relevant parts of the report.

1. Definition of scope of analysis. In defining the scope of the risk analysis, the following steps should be included:
   a. Select a team to involve in the analysis, including both water utility experts and outside professionals (e.g. risk analysts); organise the working process. (Section 2.1).
   b. Define the scope of the risk analysis. Why is it carried out? Is it aimed at a limited part of the system, or is it an integrated risk analysis in accordance with the Water Safety Plan, comprising a source-to-tap approach? (Sections 2.1, 2.2). Are there restrictions related to scope, for instance with respect to
      - hazardous events to consider (e.g. exclude terrorist acts?), or
      - types of consequences to investigate (e.g. only consider risks to water quality)? (Section 2.1).
      Which dimensions of risk shall be treated? (Section 4.1).
   c. Describe the system to be analysed and its main functions. Specify system boundaries. (Section 2.3).
2. Identify hazards and hazardous events. This should include:
   a. Collect available data on hazards/hazardous events (i.e. experience from the past) (Sections 2.4, 3.1):
      - generic data, e.g. using THDB, and
      - site-specific data.
   b. Perform expert sessions to identify a list of site-specific hazards and hazardous events: brainstorming and/or use of checklists. (Section 3.1).
   c. Document the results. (Section 3.3 gives an example).
Main steps of a risk analysis, with main references:

1. Define scope
   1a. Select team to involve, organise working process.
   1b. Define scope of risk analysis.
       - Why is it carried out?
       - Are there restrictions, e.g. on type of hazardous events, or type of consequences to investigate?
   1c. Describe system, subsystem and main functions. Give system boundaries.
   Main references: Section 2.1 Initiation and organisation; Section 2.2 Decision situations; Section 2.3 System description.

2. Identify hazards and hazardous events
   2a. Collect available event data.
   2b. Perform expert sessions to identify site-specific hazards/events.
   2c. Document results (form).
   Main references: Section 2.4 Hazardous events; Section 3.1 Identification of hazardous events.

3. Risk estimation
   3a. Identify safety barriers. Assess causes and consequences of hazardous events.
   3b. Decide on qualitative (semi-quantitative) or quantitative analysis.
   3c. Estimate risk, either by
       - qualitative analysis (using risk matrix), or
       - quantitative analysis.
   Main references: Section 2.5 Safety barriers. Causes and consequences; Chapter 4 Quantification of risk; Chapter 5 Data for risk analysis; Section 2.6 Risk estimation; Section 3.2 Risk estimation in CRA.

Figure 29. Main steps of a risk analysis.
3. Risk estimation. This should include:
   a. Qualitative analysis: identification of safety barriers; specifying causes and possible consequences of hazardous events (Section 2.5).
   b. Decision on whether to carry out a qualitative (semi-quantitative) or quantitative analysis (Section 4.2).
   c. Perform risk estimation, either
      - qualitative (semi-quantitative): use of risk matrix (Section 2.6, Section 3.2), or
      - quantitative (Section 2.6; Sections 4.3-4.5). Data requirements: Chapter 5.
Note that the main references above refer to semi-quantitative risk analyses (covering the entire system), which are less advanced and can be carried out without specialist competence in risk analysis. In addition, Chapter 6 describes a number of more advanced and detailed risk analysis methods. Several of these are quantitative, and risk quantification is described in Chapters 4 and 5.
Appendix B. DALY and a generalisation

This appendix focuses on DALY, which is the measure for health effects suggested by WHO. In TECHNEAU the DALY concept has been further developed by also including water quantity aspects (Appendix B.2). However, the DALY approach is most likely to be used at national level and not at the scale of a water utility or company.
Appendix B.1. DALY: an overall risk measure of health effects

Disability-adjusted life years (DALY)

In their Guidelines for Drinking-water Quality, WHO [8] applies the Disability-Adjusted Life Years (DALYs) as a risk measure. The basic principle of the DALY is to weight each health effect for its severity from 0 (normal good health) to 1 (death). Thus, the DALY combines in one measure the time lost due to premature mortality and the time lived with disability. The DALY measure combines information on
1. years of life lost due to premature death
2. years lost due to disability and other non-fatal consequences

The DALY is calculated for a specific population, related to a specific disease or health condition or, in our case, to a specific event related to water safety. The DALY for the population in question is calculated as the sum of
1. the years of life lost due to premature mortality (YLL) in the population, and
2. the equivalent years lost due to disability (YLD) of the health condition, i.e. years in states of less than full health.

That is,

DALY = YLL + YLD

DALY could in principle be calculated for each undesired (hazardous) event of a water utility. However, the data needed to carry out such an exercise would hardly be available. By adding the DALY of all relevant hazardous events, we get the overall DALY, which is then a measure of the risk related to the water quality for this utility.

Years of Life Lost (YLL)

The years of life lost (YLL) basically corresponds to the number of deaths multiplied by the standard life expectancy at the age at which death occurs. The basic formula for YLL (for a given cause) is the following:

YLL = N x L

where
N = number of deaths
L = standard life expectancy minus age of death, in years.

So, one DALY can be thought of as one lost year of healthy life. In order to calculate the YLL, we may split the population according to factors like age (groups) and sex.

Years Lived with Disability (YLD) (= years of life lost due to disability)

To estimate YLD for a particular event, the number of incident cases in the period considered is multiplied by the average duration of the disease/disability and a weight factor that reflects the severity of the disability on a scale from 0 (perfect health) to 1 (dead). The basic formula for YLD is the following:

YLD = I x DW x L

where
I = number of affected persons
DW = disability weight
L = average duration of the case until remission or death (years)

So YLD is a measure of the burden of disability, given as a measurement of the gap between current health status (due to a hazardous event) and an ideal situation where everyone lives into old age free of disease and disability. There may of course be various contributions to YLD: let I1 be the number of affected persons getting a disability corresponding to the weight DW1, with an average duration L1. Similarly, let I2 be the number of affected persons having disability weight DW2, with average duration L2. Then in total

YLD = I1 x DW1 x L1 + I2 x DW2 x L2

Disability Weights (DW)

A crucial point in this approach is the determination of the DW corresponding to various disabilities. A large amount of work has been carried out to assess these, e.g. [69, 70]. As described there, a large number of disease stages were evaluated by panels of medical experts. Then an average disability weight of a disease was calculated, to which the stage disability weights contributed according to their share in the disease prevalence. The required distribution of the prevalence over the disease stages was obtained through consulting experts for each disorder. Some examples of DW given in [70] are:

Upper respiratory infections       DW = 0.003
Influenza                          DW = 0.01
Intestinal infectious diseases     DW = 0.02
Lower respiratory infections       DW = 0.04
STD (bacterial only)               DW = 0.07
Contact eczema                     DW = 0.07
Inflammatory bowel disease         DW = 0.20
Tuberculosis                       DW = 0.23
Stomach cancer                     DW = 0.33
Parkinson's disease                DW = 0.68

Such weights must necessarily be rather controversial. In effect, they imply a relative value of life (VOL), which always results in controversy. Here, the judgement is that 50 persons having an inflammatory bowel disease (DW = 0.2) of duration 1 year corresponds to one person losing 10 years of healthy life (in both cases DALY = 10).
This is a rational (not emotional) simplification, but it is not nonsense: inflammatory bowel disease is a serious disease, not ordinary diarrhoea (intestinal infectious diseases have DW = 0.02, i.e. ten times lower than inflammatory bowel disease).

TECHNEAU use of DALY

An important input to DALY is life expectancy (expected time until death at various ages). Note that WHO explicitly built egalitarian principles into the DALY, and the Global Burden of Disease (GBD) Study [71] used the same values for all regions of the world, although age and sex were taken into consideration when lost years of healthy life were calculated. The GBD study also used a 3% time discounting and non-uniform age weights (i.e. giving less weight to years lived at young and older ages).

Note that the objective of using DALY in a risk assessment study within a TECHNEAU framework will not be the same as in the global WHO studies. The WHO has an objective, for example, to identify main health problems and then set priorities worldwide. Therefore an egalitarian principle with a fixed life expectancy is appropriate. However, when a water utility carries out a risk assessment, the objective would rather be to identify the specific hazards and to prioritise risk reducing measures relevant for this specific water utility. Consequently, if DALY is used in such a study, one should rather apply the life expectancy of the relevant country (or of the public to which the utility delivers water). When a water utility (or national authority) is carrying out the analysis, it is suggested that the values used both for the life expectancy and the DW are adapted to the country in question.

The use of discounting has been somewhat controversial even in WHO applications [69]. So in order to keep the approach as simple as possible, we suggest not applying such discounting. We also suggest that all ages have equal weight (corresponding to the formulas given above).

Still, it can be hard for a water utility to apply the method, in particular to choose appropriate weights. Values of DW used for the most typical diseases caused by drinking water could be collected. However, the question is how applicable DALY is for drinking water producers, due to difficulties in communicating DALY and in weighting different types of consequences. Most likely, DALY would not be used as a routine tool in risk assessments of water supply made by a utility.
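As an illustration of the formulas in Appendix B.1, the following minimal Python sketch computes YLL, YLD and DALY without discounting or age weighting, as suggested above. The function names and the life expectancy of 80 years are assumptions made for this example only; the disability weight 0.2 and the "50 persons for 1 year" case are taken from the text.

```python
def yll(deaths, life_expectancy, age_at_death):
    """YLL = N x L, where L = standard life expectancy minus age at death."""
    return deaths * (life_expectancy - age_at_death)


def yld(groups):
    """YLD = sum of I x DW x L over groups of affected persons.

    Each group is a tuple (I, DW, L): number of affected persons,
    disability weight, and average duration in years.
    """
    return sum(i * dw * duration for i, dw, duration in groups)


def daly(yll_value, yld_value):
    """DALY = YLL + YLD."""
    return yll_value + yld_value


# 50 persons with inflammatory bowel disease (DW = 0.2) for 1 year ...
yld_ibd = yld([(50, 0.2, 1.0)])
# ... compared with one person losing 10 years of healthy life
# (hypothetical life expectancy 80, death at age 70).
yll_one = yll(1, 80, 70)

print(yld_ibd, yll_one)
print(daly(yll_one, yld_ibd))
```

Both contributions come out at 10 years, which is the equivalence implied by the disability weight of 0.2.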
Appendix B.2. Generalisation of DALY. Combined measure for water quantity and quality.
For the consumer it is important both to have water of good quality and to have enough water. We could now extend the main idea behind DALY to also include loss of water quantity, i.e. situations where:
- water is known to be seriously polluted, and cannot be used for drinking
- water delivery is interrupted for some time

The risk measures on quantity are typically calculated for the distribution network. Then the number of consumers affected (exposure) is estimated. A risk measure could now be obtained by adding terms taking into account
- years lived with water supply of bad quality (YLQ)
- years lived without water supply (YLS).

These parameters are in their simplest form obtained by multiplying the number of consumers affected by the duration (number of years) of the bad supply/interruption and by a weighting factor. That is

YLQ = J x QW x LQ
YLS = K x SW x LS

where
J = number of people receiving water of bad quality (undrinkable)
QW = weight of inconvenience of receiving bad water
LQ = duration of delivery of bad water (in years)
K = number of people not receiving water
SW = weight of inconvenience of not receiving water
LS = duration of lack of delivery of water (in years)

LQ and LS should be given in years, in order to make the measure compatible with DALY. To give a simple numerical example: assume a total of K = 500 000 people do not receive water during a period of LS = 0.02 years (approximately 1 week). Assume (as an illustration) the weight SW = 0.0003. This is not a recommended value, but just used as an example. Then

YLS = 500 000 x 0.0003 x 0.02 = 3

If we want to combine this with DALY to give an overall measure of risk, the main problem is the weight SW. But if it is agreed that a weight SW = 0.0003 is comparable with the weights DW, we could add this YLS to DALY and get a total measure of risk. We could also include the adverse effects for industries dependent on water delivery. In this way we can combine all losses to consumers in one single measure.

Such a measure would of course have to be specified further. However, as the use of DALY is probably not very relevant for a water utility, this is not pursued here. From the point of view of the water works, the risk is to fail national or internal standards (water quality and quantity). Water quality standards set up by health authorities are already existing health-based targets, so it is not the water utility's duty to calculate these values. TECHNEAU should support the end user (i.e. the water utility) in assessing the risk of failing its own or legal requirements, expressed in quality standards, consumer trust etc.
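The numerical example for YLS can be reproduced with a few lines of Python. This is only a sketch: the function name `weighted_years` is an assumption made here, and the weight SW = 0.0003 is the illustrative value from the text, not a recommended one.

```python
def weighted_years(people, weight, duration_years):
    """Generic form shared by YLQ = J x QW x LQ and YLS = K x SW x LS."""
    return people * weight * duration_years


# Example from the text: K = 500 000 people, SW = 0.0003 (illustrative only),
# LS = 0.02 years (approximately one week).
yls = weighted_years(500_000, 0.0003, 0.02)
print(round(yls, 6))
```

If the weights QW and SW are accepted as comparable with the disability weights DW, the combined measure would simply be the sum DALY + YLQ + YLS.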
Appendix C. Two examples of FTA

Fault Tree Analysis (FTA) was discussed in Section 6.5. We here give two examples of a fault tree analysis: first an analysis of one specific system (UV treatment), and then an overall system analysis.
C.1 A Fault Tree Analysis of a UV system
The first example is taken from the Bergen case (Section 3.3). Here an FTA was applied to analyse the UV system, which is one of the treatment systems. The fault tree is illustrated in Figure 30. The top event (i.e. the unwanted event) is defined as "UV-system delivers water that is not treated according to requirements". For this top event to occur, both of the following must hold:
- UV applies a too low dose, and
- automatic shut-down fails (there are sensors that shall activate a valve to shut down water production if the UV dose is too low).

Figure 30. Fault tree for the top event "UV-system delivers water that is not treated according to requirements".

This is shown by an AND-gate directly above these two events. These two events can then be evaluated further, as indicated in the figure. Too low UV dose can occur if either:
- sensors measuring the UV dose give too high values, or
- water flow is too high (so that water does not stay long enough to get sufficient UV radiation), or
- UV intensity is too low.

These three events can then be evaluated further, as indicated in the figure. The automatic shut-down fails if either:
- the shut-down system is disabled by the operator (e.g. after a wash), or
- the sensor measures a too high UV intensity, or
- the control system fails to activate the shut-down valve, or
- the shut-down valve fails to shut down when required by the sensor/control system.

These events could also be developed further (not shown in the fault tree). The fault tree can be further analysed to give the cut sets, i.e. combinations of the basic events that result in the top event. This is helpful to identify the most critical failures. We observe that sensor(s) giving too high values can result in both "too low UV dose applied" and "automatic shut-down fails". The sensors therefore appear to be a critical component. Various software tools allow us to analyse this fault tree further, through both
- a qualitative analysis: find the minimum combinations of events that result in the top event (cut sets), and
- a quantitative analysis: assess the probability of the top event.
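The qualitative analysis can be sketched in a few lines of Python. The code below is a minimal illustration and not the software used in the Bergen case: the tree is typed in by hand from the description above, the event labels are paraphrased, and treating "sensor reads too high" as one shared basic event under both branches is an assumption based on the observation in the text.

```python
from itertools import product

# Fault tree as nested (gate, children) tuples; strings are basic events.
# Assumption: the same sensor failure appears under both branches.
SENSOR = "sensor reads too high"
TREE = ("AND", [
    ("OR", [SENSOR, "water flow too high", "UV intensity too low"]),
    ("OR", ["shut-down disabled by operator", SENSOR,
            "control system fails", "shut-down valve fails"]),
])


def minimal_cut_sets(node):
    """Return the minimal cut sets of a node as a set of frozensets."""
    if isinstance(node, str):              # basic event
        return {frozenset([node])}
    gate, children = node
    child_sets = [minimal_cut_sets(child) for child in children]
    if gate == "OR":                       # any child's cut set is sufficient
        candidates = set().union(*child_sets)
    elif gate == "AND":                    # one cut set from every child is needed
        candidates = {frozenset().union(*combo) for combo in product(*child_sets)}
    else:
        raise ValueError(f"unknown gate: {gate}")
    # Absorption: drop every set that strictly contains another candidate.
    return {s for s in candidates if not any(other < s for other in candidates)}


cut_sets = minimal_cut_sets(TREE)
for cs in sorted(cut_sets, key=lambda s: (len(s), sorted(s))):
    print(sorted(cs))
```

Under these assumptions the sensor failure is the only cut set of size one, which matches the remark that the sensors are a critical component. All other minimal cut sets pair one cause of too low dose with one cause of shut-down failure.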
C.2 Integrated risk analysis: Fault-tree analysis to investigate causes of failures
This section briefly presents an example (from the city of Göteborg, Sweden) of an integrated risk analysis of the water supply system, from source to tap. To carry out the analysis, an integrated and probabilistic fault tree method was used, see [28, 66]. The entire system was considered, and water quantity as well as quality aspects were included. The analysis is further described in [31] and [30].

The method is based on the fault tree technique, and a Markovian approach was applied in order to calculate not only the probability of failure, but also the failure rate and mean down time of the system. By including the number of people affected by different events in the fault tree, the risk levels could be calculated in terms of Customer Minutes Lost (CML). The estimated levels of risk were compared to politically established performance targets that can be considered as acceptable levels of risk.

Method

The main failure event studied in the analysis was supply failure, defined as including: (1) quantity failure, i.e. no water is delivered to the consumer; and (2) quality failure, i.e. water is delivered, but is unfit for human consumption according to existing water-quality standards.

The drinking-water system was modelled as a supply chain composed of the following main sub-systems: raw water, treatment and distribution. Failure events in any of the three sub-systems may cause supply failure. However, the system has an inherent ability to compensate for failure. For example, failure of the treatment plant to produce drinking water may be compensated for by reservoirs in the distribution system.

To construct the fault tree, four types of logic gates were used. The logic gates illustrate the interaction between different events. The following logic gates were used:
- OR-gate: it is sufficient that one of the input events occurs to cause system failure.
- AND-gate: all input events have to occur to cause the system to fail.
- 1st variant of AND-gate: one or several events may compensate for one failure event during a limited time period.
- 2nd variant of AND-gate: similar to the 1st variant, but with the important difference that the compensating event may recover and start to compensate again after it has failed.
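For the two traditional gates, the probability of the output event follows directly from the input probabilities if the input events are assumed to be independent. The sketch below illustrates only this static calculation. It does not reproduce the Markovian approach or the two AND-gate variants used in the Göteborg analysis, and all numbers are hypothetical.

```python
from math import prod


def or_gate(probabilities):
    """P(at least one input event occurs), assuming independent inputs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def and_gate(probabilities):
    """P(all input events occur), assuming independent inputs."""
    return prod(probabilities)


# Hypothetical values: treatment fails to produce water (0.02) AND
# the distribution reservoirs fail to compensate (0.1).
p_treatment = and_gate([0.02, 0.1])

# Hypothetical top event: failure in any of the three sub-systems.
p_supply = or_gate([0.001, p_treatment, 0.005])
print(p_treatment, p_supply)
```

The compensation variants additionally depend on how long a failure lasts compared with how long the compensating sub-system can hold out, which is why a dynamic (Markovian) treatment is needed there.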
Figure 31 illustrates a schematic fault tree structure including the main events. The figure shows that the drinking water system was divided into its three main sub-systems. The entire fault tree for the Göteborg system included 116 basic events, 100 intermediate events and 101 logic gates.
[Figure 31: fault tree diagram. Top event: Supply failure, developed into Raw water failure, Treatment failure and Distribution failure. Each of these is split into a quantity failure (Q = 0) and a quality failure (Q > 0, C'). Raw water failures are combined with the events "Treatment fails to compensate" and "Distribution fails to compensate"; treatment failures are combined with "Distribution fails to compensate". Gate symbols shown: OR-gate and 1st variant of AND-gate.]
Q = flow (Q = 0, no water is delivered to the consumer; Q > 0, water is delivered)
C' = the drinking water does not comply with existing water-quality standards
Figure 31. Schematic fault tree including the main events. All events were further developed in the analysis, also using the traditional AND-gate and the second variant of it. The distribution was assumed not to be able to compensate for quality failure in previous sub-systems, but has been included here to illustrate the general thinking.
By using a Markovian approach, the probability of failure as well as the failure rate and mean down time of the system could be calculated for the top event as well as for all intermediate events. By defining the number of people affected by the different main types of events in the fault tree, the risk levels could be calculated in terms of CML. The risk levels related to both quantity and quality failures were expressed using CML. However, they were presented separately in order to retain transparency. The input data to the fault tree model were based on hard data (e.g. measurements and statistics on events), expert judgements and combinations of these. Since a probabilistic approach was used, uncertainties were considered by defining all estimates by means of probability distributions. The calculations were performed using Monte Carlo simulations. Hence, the uncertainties of the results could be analysed.

Analysis procedure

To facilitate the analysis work, a team of water utility personnel and researchers was set up, i.e. people with different knowledge about the system and the risk analysis method. Analysing the drinking water system by means of the fault tree method was an iterative process. The main analysis steps were: scope definition, system description, hazard identification, fault-tree construction, evaluation of available data, expert judgements, risk estimation, uncertainty analysis and evaluation of results.

Results

For the top event as well as all intermediate events, the fault tree analysis provides information on the probability of failure, failure rate and mean down time, for quantity and quality failure. Quantity and quality related CML are also estimated for the entire system as well as its three main sub-systems. All parameters are estimated as probability distributions, which means that information on the uncertainties is provided.

The CML illustrates the level of risk, while the probability of failure, failure rate and mean down time provide information on the dynamic behaviour of the system. Two sub-systems may cause the same number of CML, but have different failure rates and mean down times, and affect different numbers of people. Some parts of the results were compared to performance targets defined by the City of Göteborg. These targets were considered as acceptable levels of risk, and one of the targets was: "Duration of interruption in delivery to the average consumer shall, irrespective of the reason, totally be less than 10 days in 100 years." This criterion was translated to 144 annual CML for the average consumer. When compared to the results of the fault tree analysis (mean value: 608 annual CML), it was concluded that the probability of exceeding the criterion was 0.84.
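The translation of the performance target into annual CML, and the way an exceedance probability can be read off a set of Monte Carlo results, can be sketched as follows. The function names are assumptions made here, and the sample values are purely hypothetical. They are not the Göteborg results and are not meant to reproduce the reported probability of 0.84.

```python
MINUTES_PER_DAY = 24 * 60


def annual_cml_target(days, years):
    """Convert 'at most <days> days of interruption in <years> years' to minutes per year."""
    return days * MINUTES_PER_DAY / years


def exceedance_probability(samples, limit):
    """Fraction of simulated annual CML values that exceed the limit."""
    return sum(1 for value in samples if value > limit) / len(samples)


target = annual_cml_target(days=10, years=100)
print(target)  # 144.0

# Hypothetical Monte Carlo output (annual CML per simulation run).
simulated_cml = [90, 150, 300, 700, 1800]
print(exceedance_probability(simulated_cml, target))
```

In a real analysis the sample would contain thousands of simulated values drawn from the fault tree model rather than five hand-picked numbers.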
Appendix D. Procedure and example of FMECA

Scope and Overview

The Failure Modes and Effects Analysis (FMEA) and the Failure Modes, Effects and Criticality Analysis (FMECA) are both risk analysis techniques used to identify possible failure modes, with their causes and effects on a system, as well as measures to avoid or reduce the failures. A system can be software, hardware or a process, but FMEA/FMECA can also be used to identify human failure modes and effects. The FMECA is a further development of the FMEA and additionally contains the criticality analysis, with a combined measure for risk that is used to evaluate the risks and prioritise the risk reduction options. The FMEA also classifies the severity of the effects of a certain failure mode and estimates the probability of occurrence of that failure mode, but the two are not combined into a measure of risk. Therefore the FMEA is a qualitative technique, whereas the FMECA can be qualitative, semi-quantitative or quantitative, depending on the kind of data used. Qualitative as well as quantitative results from an FMEA/FMECA can be an input to other analysis techniques such as fault tree analysis. The application of FMEA/FMECA is advantageous in an early design phase of a process or product, since reducing failures at this stage is cost efficient. But the techniques can also be applied during the normal operation phase of a process.

Procedure

- Making a reliability plan containing the definition of the analysed system, the specific purposes of the analysis, its scope and objectives, as well as all conducted actions and measures.
- Assembling a team that is sufficiently qualified to identify and assess the effects of events or failures and their severity.
- Definition and comprehension of the analysed system, including all information about characteristics, performances, roles and functions of all considered system elements at all system levels, the logical connections between the elements, their redundancy, the inputs and outputs of the system, as well as the set-up of the system with the corresponding operating conditions.
- Breaking down the system into its elements and structuring it from the highest to the lowest system level with the help of block diagrams (IEC 61078). This consideration of the system levels is important because the effect of one failure mode on a lower level can be the cause of a failure mode at the next or highest level (Figure 32). The FMEA/FMECA always begins at the lowest level. The system structure, with the functional connections between elements, input and output information and redundancies, should be illustrated in the block diagram in order to reconstruct function failures.
[Figure 32: block diagram. The System consists of Subsystems 1-5; Subsystem 4 consists of Modules 1-4; Module 3 consists of Parts 1-5; Part 2 has Failure modes 1-3; Failure mode 3 has Causes 1-3. At each level the effect is passed upwards: Failure mode 3 of Part 2, failure of Part 2, failure of Module 3, failure of Subsystem 4.]
Figure 32. Block diagram for the connections between failure modes and effects in a system [72]. For instance: Causes 1 and 2 lead to the effect that failure mode 3 occurs. Failure mode 3 causes the failure of part 2. The effect of the failure from part 2 is the failure of module 3. The failure of module 3 in turn is the cause of the failure of the subsystem 4.
- Analysis (Figure 33):
  1. Selection of a component of the system for the analysis.
  2. Identification of all failure modes for the selected component. Very general categories for failure modes could for instance be the following: failure during operation, failure to operate at a prescribed time, failure to cease operation at a prescribed time, premature operation. More detailed failure modes can be identified by thinking about the function of the component, its performance specification, and its function and stresses under appropriate test conditions.
  3. Selection of the failure mode to be analysed.
  4. Identification of the immediate effect of the failure mode and the possible final effect on a higher level. An effect is the consequence of one failure mode regarding the operation, the function or the status of a system. An effect occurs due to one or more failure modes of one or more components. On each level, the effect of a failure mode on the next higher level should be assessed.
  5. Determination of the severity of the final effect. The severity is the assessment of the significance of the effect on the system operation. The classification of the effects depends on the kind of FMEA implementation. It is often qualitative and similar to the risk matrices proposed by the WSP.
  6. Identification of the potential causes of that failure mode. These causes have to be independent of each other. They can be determined by expert opinions or by analysis of field failures and failures in test units. It is not always useful to describe all causes; this is determined by the severity of the corresponding effects.
  7. Estimation of the frequency or probability that the failure mode occurs during the pre-determined period. This step is important for the evaluation of the failure mode. The probability can be estimated with the help of data from life testing of the components, from available databases containing failure rates, from field failure data and from data on failures of similar components. The probability has to be estimated for a time period, for instance the warranty period or the predetermined life period. For an FMECA, a special analysis is carried out to assess the criticality as the combination of the severity of an effect and the probability of occurrence. The procedure of an FMECA is described further below (see "The Criticality Analysis" in this appendix).
  8. Proposal of an appropriate risk reduction method, corrective measure or compensating provision if this failure mode has to be avoided or reduced.
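The analysis steps above amount to a loop over components and their failure modes, with one worksheet row per failure mode. The Python sketch below illustrates this structure only. The class and function names, the four-level severity and probability classes, and the action limits are all hypothetical choices made for this example and are not prescribed by the FMEA standards cited in this report.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    name: str
    final_effect: str
    severity: int      # hypothetical class: 1 (negligible) to 4 (catastrophic)
    probability: int   # hypothetical class: 1 (rare) to 4 (frequent)


def needs_action(mode, severity_limit=3, probability_limit=3):
    """Step 8 trigger: does severity and/or probability warrant action?"""
    return mode.severity >= severity_limit or mode.probability >= probability_limit


def run_fmea(modes):
    """Return worksheet rows: (component, failure mode, action required?)."""
    return [(m.component, m.name, needs_action(m)) for m in modes]


# Hypothetical entries using two of the general failure mode categories above.
modes = [
    FailureMode("pump", "failure to operate at a prescribed time", "no feed flow", 3, 2),
    FailureMode("pump", "premature operation", "brief pressure surge", 1, 1),
]
worksheet = run_fmea(modes)
for row in worksheet:
    print(row)
```

Severity and probability are deliberately kept as two separate criteria here, since an FMEA, unlike an FMECA, does not combine them into one measure of risk.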
[Figure 33: flowchart]
1. Initiate FMEA or FMECA of an item.
2. Select a component of the item to analyse.
3. Identify failure modes of the selected component.
4. Select the failure mode to analyse.
5. Identify immediate effect and the final effect of the failure mode.
6. Determine severity of the final effect.
7. Identify potential causes of that failure mode.
8. Estimate frequency or probability of occurrence for the failure mode during the predetermined time period.
9. Do severity and/or probability of occurrence warrant the need for action?
   - Yes: propose mitigation method, corrective actions or compensating provisions. Identify actions and responsible personnel.
   - No: continue.
   Document notes, recommendations, actions and remarks.
10. Are there more of the component failure modes to analyse? Yes: return to step 4. No: continue.
11. Are there other components for analysis? Yes: return to step 2. No: continue.
12. Complete FMEA. Determine the next revision date as appropriate.
Figure 33. Procedure of the FMEA/FMECA analysis [72].
Compilation of a report containing all analysed details of the system with the source of data, assumptions made in the analysis, the procedure of the FMEA/FMECA with its elaborated system diagrams, worksheets and risk matrices as well as all recommendations for further analyses, design changes etc.
102
Review of the system by conducting a new FMECA to assess the failure modes after implementing the risk reduction options. The Criticality Analysis One possible quantitative measure in the Criticality Analysis is the Risk Priority Number (RPN), a calculation based on the equation of risk R = S P where S is the severity (consequence) and P is the probability: RPN = S * O * D where O = probability of the occurrence of the failure during a given period D = an estimate for time needed to detect the failure In the case of available real data the RPN is a quantitative measure, otherwise it is a semiquantitative calculation by using ranks for each parameter. The failure mode with the highest RPN is the prioritized failure. Another possible measure is a so called Criticality Number, which includes a combination of the failure rate for one failure mode and the operation time of the system. Basing on this measure a special approach is to calculate the probability of occurrence of one failure mode with the following equation:
P_i = 1 - e^{-C_i}
where
P_i = the probability of occurrence of failure mode i
C_i = the Criticality Number for failure mode i
One way to illustrate the risk (criticality) is to combine the severity and the probability of occurrence of a failure mode in a matrix analogous to the risk matrix described in the WSP.
Example for the application of the FMEA procedure: As an example, the FMEA risk assessment method is applied to a drinking water treatment system with the treatment steps roughing filters, flocculation, membrane filtration and disinfection. The first step of the FMEA is to build a block diagram of the whole treatment system to be analysed (Figure 34). It has to be compiled for all subsystems and all modules. In the following, the structure of such a block diagram, used to identify all possible failure modes, is shown for the subsystem membrane filtration with the chosen module "membrane module". Subsequently one part of the chosen module has to be selected to start the identification of failure modes. In this example the part "membrane capillaries" has been selected.
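The two criticality measures described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the report: the function names, the 1 to 10 rank scale and the ranks assigned to the placeholder failure modes "mode A" to "mode C" are all assumptions made for the example.

```python
import math

def risk_priority_number(severity, occurrence, detection):
    """RPN = S * O * D, using either real data or ranks for each factor."""
    return severity * occurrence * detection

def occurrence_probability(criticality_number):
    """P_i = 1 - exp(-C_i) for a failure mode with Criticality Number C_i."""
    return 1.0 - math.exp(-criticality_number)

# Hypothetical (S, O, D) ranks on an assumed 1-10 scale; not from the report.
ranks = {
    "mode A": (8, 3, 4),
    "mode B": (4, 7, 2),
    "mode C": (6, 2, 5),
}

# Semi-quantitative RPN per failure mode; the highest RPN is prioritised.
rpn = {mode: risk_priority_number(*r) for mode, r in ranks.items()}
prioritised = max(rpn, key=rpn.get)

# Probability of occurrence for an assumed Criticality Number of 0.01.
p_small = occurrence_probability(0.01)
```

For small Criticality Numbers the probability is close to C_i itself, which is why the two quantities are often used interchangeably when failures are rare.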
[Figure 34: block diagram with four levels]
System: Treatment
  Subsystems: Roughing filters; Flocculation; Membrane filtration (ultrafiltration); Disinfection
Subsystem 4: Membrane filtration (effect: failure of subsystem 4)
  Modules: Feed pipe; Pump; Membrane modules; Permeate tube; Cleaning system
Module 3: Membrane module (effect: failure of module 3)
  Parts: Feed junction; Resin sealing; Membrane capillaries; Permeate junction; Module mantling
Part 3: Membrane capillaries (effect: failure of part 3)
Figure 34. Compilation of a block diagram for an exemplary drinking water treatment process.
The identification of failure modes which can occur in membrane capillaries is conducted by using expert judgements and data available for the selected part. In this example three failure modes are identified (Table 15). The result of this step can be illustrated by integrating the failure modes in the block diagram (Figure 35).
Table 15. Identified failure modes for the part membrane capillaries.
Failure mode 1: Fibre breakage
Failure mode 2: Membrane fouling
Failure mode 3: Membrane material imperfections
[Figure 35: block diagram extended with the failure modes]
Subsystem 4: Membrane filtration (effect: failure of subsystem 4)
  Modules: Feed pipe; Pump; Membrane modules; Permeate tube; Cleaning system; Monitoring
Module 3: Membrane module (effect: failure of module 3)
  Parts: Feed junction; Resin sealing; Membrane capillaries; Permeate junction; Module mantling
Part 3: Membrane capillaries (effect: failure of part 3)
  Failure modes: Failure mode 1; Failure mode 2; Failure mode 3
Selected for further analysis: Failure mode 3
Figure 35. Integration of the identified failure modes in the block diagram.
In the next step one failure mode is chosen for further analysis. In this example failure mode 3, "membrane material imperfections", is analysed. The immediate and final effects are identified and the severity of the final consequences is determined. After all possible causes of the failure mode have been identified, the frequency or probability with which the failure mode occurs during the predetermined period is estimated. A risk matrix can then be applied to assess the risk that the failure mode occurs and causes harm. The risk matrix usually combines the severity with the probability or frequency of a failure mode. Using the results of the risk matrix, the failure modes can be ranked according to their risk, and it has to be determined which risks are acceptable. Risk reduction options, corrective actions or compensating provisions can then be proposed and established for the failure modes with an unacceptable risk (Table 16). In the final block diagram (Figure 36) the identified causes are integrated. For a complete analysis all the steps above have to be repeated for each part of the system and for each failure mode.
Table 16. Identification of causes of a failure mode, assessment of severity and probability of occurrence, and proposal of risk reduction options.
Failure mode: Membrane material imperfections (failure mode 3)
Immediate effect: membrane capillaries have imperfections
Final effect: contaminants in the filtered water
Estimation of severity: 3
Possible causes | Estimation of frequency | Result risk matrix | Risk reduction options
Failure in the manufacturing process | 0.1 × 10^-6 | |
Exceeding of pressure | 0.5 × 10^-6 | |
Ageing | 1 × 10^-6 | |
Harmful chemicals in the water before the filtration | 0.2 × 10^-6 | |
[Figure 36: block diagram extended with the causes of failure mode 3]
Subsystem 4: Membrane filtration (effect: failure of subsystem 4)
  Modules: Feed pipe; Pump; Membrane modules; Permeate tube; Cleaning system; Monitoring
Module 3: Membrane module (effect: failure of module 3)
  Parts: Feed junction; Resin sealing; Membrane capillaries; Permeate junction; Module mantling
Part 3: Membrane capillaries (effect: failure of part 3)
  Failure modes: Failure mode 1; Failure mode 2; Failure mode 3
Causes for failure mode 3 (effect: failure mode 3)
  Cause 1; Cause 2; Cause 3; Cause 4
Figure 36. Integration of the identified failure mode causes in the block diagram.
Appendix E. Analyses to establish treatment and monitoring system

During a design phase of a drinking water treatment plant, after the identification of hazards and hazardous events, there is a need to determine the requirements of the treatment system depending on the identified hazards.
One possible approach for establishing the treatment steps is a risk assessment method described in the literature [24, 25]. The method was developed to calculate the probability that water quality parameters exceed certain threshold values in an existing drinking water treatment and monitoring system, but it is also suitable for specifying a new treatment and monitoring system. The procedure is described below, with focus on assessment in the planning phase of a drinking water supply.
The basis is the definition of the parameters relevant to drinking water quality and of the corresponding threshold values that must not be exceeded if the drinking water quality is to be sufficient. Lan et al. [24] mention 63 parameters to be taken into account in drinking water supply, but the person responsible for the risk assessment can choose the parameters and values individually. The thresholds can be based on national guideline values as well as on internal standards of the water utility.
For existing treatment systems, removal efficiencies are identified for each treatment step in normal operation (= nominal mode). With these removal efficiencies a transfer function is defined for each treatment step; the transfer functions are then inverted and finally combined into one overall inverse transfer function:
C_{Crt}(FM_i; P_j) = \frac{C_{GV}}{\prod_{k} (1 - r_k)}
where the product is taken over all treatment steps k and
FM_i = failure mode i
FM_0 = nominal mode
P_j = parameter j
C_{GV} = guideline value
r_k = reduction factor for treatment step k
C_{Crt} = critical concentration in the raw water that may not be exceeded to comply with the guideline values in the drinking water
With this function, critical raw water concentrations are calculated. These concentrations must not be exceeded in the raw water if the drinking water is to comply with the thresholds while the treatment works in the normal operation mode. To establish a treatment and monitoring system, optional treatment steps can be assessed with this method to support the process development. For this purpose the removal efficiencies have to be determined for the optional and possibly suitable treatment methods.
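The relation between the forward transfer function and the combined inverse transfer function can be sketched as follows. This is only an illustration under the assumption that each treatment step k passes the fraction (1 - r_k) of the incoming concentration; the function names, the two reduction factors and the guideline value are hypothetical and not taken from the report.

```python
from math import prod

def treated_concentration(raw_concentration, reduction_factors):
    """Forward transfer: each treatment step k passes the fraction (1 - r_k)."""
    return raw_concentration * prod(1.0 - r for r in reduction_factors)

def critical_raw_water_concentration(guideline_value, reduction_factors):
    """Inverse transfer: C_Crt = C_GV / prod_k (1 - r_k)."""
    if any(not 0.0 <= r < 1.0 for r in reduction_factors):
        raise ValueError("each reduction factor must satisfy 0 <= r < 1")
    return guideline_value / prod(1.0 - r for r in reduction_factors)

# Hypothetical treatment train: step 1 removes 90 %, step 2 removes 50 %.
steps = [0.9, 0.5]
c_gv = 0.01  # hypothetical guideline value

c_crt = critical_raw_water_concentration(c_gv, steps)

# Feeding the critical concentration through the treatment should return
# exactly the guideline value (up to floating-point rounding).
round_trip = treated_concentration(c_crt, steps)
```

A reduction factor of 1 is rejected here because it would make the critical concentration unbounded; whether that case needs special handling in practice is a design choice left open.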
To assess the probability of exceeding thresholds in the case of a deviation from the nominal mode (= failure mode), a Failure Mode, Effects and Criticality Analysis (FMECA) has to be performed for the treatment system. FMECA is especially suitable for application in the design phase of a process and is therefore appropriate for a treatment and monitoring system that is yet to be established. The result of the FMECA is a set of failure mode arrays covering all treatment steps and all identified failure modes. The arrays contain the kind and a description of each failure mode as well as an estimated probability of occurrence of that failure mode. In addition, the reduced removal efficiencies are determined, i.e. the removal efficiencies that apply when a specific failure mode occurs. With these reduced removal efficiencies, critical raw water concentrations can be calculated in the same way as for the nominal mode. These critical raw water concentrations must not be exceeded if the drinking water is to comply with the thresholds while the corresponding failure mode is present.
To calculate the probability that the threshold for a certain parameter is exceeded in a certain operational mode, the combination of the events "failure mode or nominal mode" and "exceedance of the critical raw water concentration" is considered (Figure 37). The event combinations for the exceedance of one specific parameter in any operational mode (Figure 37a), and for the exceedance of any parameter in one specific mode (Figure 37b), can be illustrated by fault trees.
[Figure 37: two fault trees.
(a) Top event "exceedance of the guideline value for the parameter k", combining the events FMi and eik over the modes i = 0, ..., n.
(b) Top event "exceedance of any guideline value due to failure mode i", combining the event FMi with the events eik over the parameters k.
with: FMi = failure mode i; FM0 = nominal mode without failure; eik = exceedance of the critical concentration for FMi in the raw water for parameter k]
Figure 37. Fault trees for the event combination failure modes/nominal mode and exceeding critical raw water concentration
Example for the application of this method: In a simple example, the method is applied to the design of a drinking water supply with a simple treatment consisting of two processes: activated carbon filtration and disinfection. For the nominal mode, Table 17 shows the calculation of the critical raw water concentrations that must not be exceeded if the drinking water is to comply with the threshold values of the drinking water standards. Reduction factors (removal efficiencies) have to be defined for each treatment step, in this example for the two parameters pesticide x and bacteria y. Using these reduction factors and the threshold values for drinking water, the inverse transfer function gives critical raw water concentrations of 0.3 mg/L and 5 × 10^-6 1/L respectively.
Table 17. Calculation of the critical raw water concentration C for the nominal mode.
 | P1 | P2
Parameter | Pesticide x | Bacteria y
Threshold | 0.03 | 1.0E-6
Activated carbon filter: reduction factor r1 | 0.9 | 0
Activated carbon filter: transfer function | Coutput = Cinput × (1-0.9) | Coutput = Cinput × (1-0)
Activated carbon filter: inverse transfer function | Cinput = Coutput/(1-0.9) | Cinput = Coutput/(1-0)
Disinfection: reduction factor r2 | 0 | 0.8
Disinfection: transfer function | Coutput = Cinput × (1-0) | Coutput = Cinput × (1-0.8)
Disinfection: inverse transfer function | Cinput = Coutput/(1-0) | Cinput = Coutput/(1-0.8)
Inverse transfer function disinfection (threshold): Cinput(threshold) = threshold/(1-r2) | 0.03 | 5.0E-6
Combination of both inverse transfer functions: Craw water(threshold) = (threshold/(1-r2))/(1-r1) | 0.3 | 5.0E-6
The application of the FMECA method leads to the identification of failure modes. One example for a failure mode and the information deriving from the FMECA method is shown in Table 18. It is necessary to assess the reduced reduction factors for both parameters and both treatment steps in the FMECA.
Table 18. Example of failure mode array for the activated carbon filter.
Failure mode 1 (FM1) for activated carbon filter
Failure | Activated carbon filter failure
Failure rate (1/h) | 0.002
Latency (h) | 0.0002
Unavailability UFM1 | 4.0E-7
r1d1 Removal efficiency for P1 | 0.6
r1d2 Removal efficiency for P2 | 0
r2d1 Removal efficiency for P1 | 0
r2d2 Removal efficiency for P2 | 0.8
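The unavailability listed in Table 18 is numerically equal to the product of the failure rate and the latency. The report does not state this relation explicitly, so the sketch below should be read as an assumption that happens to reproduce the tabulated value; the function name is hypothetical.

```python
def unavailability_from_rate(failure_rate_per_hour, latency_hours):
    """Assumed relation: unavailability = failure rate * latency."""
    return failure_rate_per_hour * latency_hours

# Values for failure mode 1 (FM1) of the activated carbon filter, Table 18.
u_fm1 = unavailability_from_rate(0.002, 0.0002)
```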
The critical raw water concentrations in the case of a failure can be calculated in the same way as for the nominal mode, using the reduced reduction factors from the failure mode array. For each failure mode the inverse transfer function is compiled. The critical concentrations in the raw water can then be determined from the thresholds for the drinking water and the reduced reduction factors. This calculation is shown for two exemplary failure modes in Table 19 and Table 20.
Table 19. Calculation of the critical raw water concentration C for Failure Mode 1 in the activated carbon filter.
Failure mode FM1 (for activated carbon filter)
 | P1 | P2
Parameter | Pesticide x | Bacteria y
Threshold | 0.03 | 1.0E-6
Activated carbon filter: reduction factor r1 | 0.6 | 0
Activated carbon filter: transfer function | Coutput = Cinput × (1-0.6) | Coutput = Cinput × (1-0)
Activated carbon filter: inverse transfer function | Cinput = Coutput/(1-0.6) | Cinput = Coutput/(1-0)
Disinfection: reduction factor r2 | 0 | 0.8
Disinfection: transfer function | Coutput = Cinput × (1-0) | Coutput = Cinput × (1-0.8)
Disinfection: inverse transfer function | Cinput = Coutput/(1-0) | Cinput = Coutput/(1-0.8)
Cinput(threshold) = threshold/(1-r2) | 0.03 | 5.0E-6
Craw water(threshold) = (threshold/(1-r2))/(1-r1) | 0.075 | 5.0E-6
Table 20. Calculation of the critical raw water concentration C for Failure Mode 2 in the disinfection.
Failure mode FM2 (for disinfection)
 | P1 | P2
Parameter | Pesticide x | Bacteria y
Threshold | 0.03 | 1.0E-6
Activated carbon filter: reduction factor r1 | 0.9 | 0
Activated carbon filter: transfer function | Coutput = Cinput × (1-0.9) | Coutput = Cinput × (1-0)
Activated carbon filter: inverse transfer function | Cinput = Coutput/(1-0.9) | Cinput = Coutput/(1-0)
Disinfection: reduction factor r2 | 0 | 0.5
Disinfection: transfer function | Coutput = Cinput × (1-0) | Coutput = Cinput × (1-0.5)
Disinfection: inverse transfer function | Cinput = Coutput/(1-0) | Cinput = Coutput/(1-0.5)
Cinput(threshold) = threshold/(1-r2) | 0.03 | 2.0E-6
Craw water(threshold) = (threshold/(1-r2))/(1-r1) | 0.3 | 2.0E-6
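The critical raw water concentrations of Tables 17, 19 and 20 can be reproduced with a short script. The thresholds and reduction factors below are the ones given in those tables; the dictionary layout and the names are assumptions made for this sketch.

```python
from math import prod

# Drinking water thresholds for the two parameters of the example.
THRESHOLDS = {"pesticide": 0.03, "bacteria": 1.0e-6}

# Reduction factors (r1 activated carbon filter, r2 disinfection)
# per operational mode and parameter, read from Tables 17, 19 and 20.
REDUCTION = {
    "nominal": {"pesticide": (0.9, 0.0), "bacteria": (0.0, 0.8)},
    "FM1": {"pesticide": (0.6, 0.0), "bacteria": (0.0, 0.8)},
    "FM2": {"pesticide": (0.9, 0.0), "bacteria": (0.0, 0.5)},
}

def critical_concentration(threshold, factors):
    """C_raw(threshold) = threshold / ((1 - r1) * (1 - r2))."""
    return threshold / prod(1.0 - r for r in factors)

# Critical raw water concentration for every mode and parameter.
critical = {
    mode: {
        name: critical_concentration(THRESHOLDS[name], factors)
        for name, factors in params.items()
    }
    for mode, params in REDUCTION.items()
}

for mode, values in critical.items():
    print(mode, {name: f"{c:.3g}" for name, c in values.items()})
```

Keeping the reduction factors in one table per mode makes it easy to add further failure modes from the FMECA without touching the calculation itself.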
Another part of the method is the assessment of the probability that the calculated critical concentrations occur in the raw water, as shown in Table 21. In this table the probabilities that the calculated concentration values occur in the raw water have to be estimated for each parameter from measurements, expert judgement or information from the literature.
Table 21. Estimated probabilities for the parameters pesticide and bacteria in nominal and failure mode.
 | Parameter 1: Pesticide, critical value | Parameter 1: Pesticide, assessed probability | Parameter 2: Bacteria, critical value | Parameter 2: Bacteria, assessed probability
Nominal mode | 0.3 | 1.0E-6 | 5.0E-6 | 2.0E-5
Failure mode 1 | 0.075 | 1.7E-6 | 5.0E-6 | 2.0E-5
Failure mode 2 | 0.3 | 1.9E-6 | 2.0E-6 | 2.0E-5
The last step of this method is a fault tree used to combine the events "failure mode" and "exceedance of critical concentrations in the raw water". If such an event combination occurs, the given threshold concentrations for the drinking water will be exceeded. Figure 38 shows these possibilities of event combinations leading to such an exceedance.
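[Figure 38: two fault trees.
Top event "Pesticide > 0.03", reached by any of: raw water concentration > 0.3 in the nominal mode; FM1 combined with raw water concentration > 0.075; FM2 combined with raw water concentration > 0.3.
Top event "Bacteria > 1 × 10^-5", reached by any of: raw water concentration > 5 × 10^-6 in the nominal mode; FM1 combined with raw water concentration > 5 × 10^-6; FM2 combined with raw water concentration > 2 × 10^-6.]
Figure 38. Fault trees with event combinations for the parameters pesticides and bacteria.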
The calculation of the probability for the occurrence of the event combinations is shown in Table 22.
Table 22. Calculation of the probability of the event combinations.
Overview of the possible events with their probability values | P1 | P2
Raw water event in the nominal mode | 0.000001 | 0.00002
Raw water event in the failure mode 1 | 0.0000017 | 0.00002
Event of failure mode 1 | 0.0000004 | 0.0000004
Combination of raw water event in FM1 and event FM1 | 6.8E-13 | 8E-12
Raw water event in the failure mode 2 | 0.0000019 | 0.00002
Event of failure mode 2 | 0.00000025 | 0.00000025
Combination of raw water event in FM2 and event FM2 | 4.75E-13 | 5E-12
Calculation of the final probabilities | 1E-06 | 0.00002
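The combination of events in Table 22 can be checked with the sketch below. It assumes that the failure mode events and the raw water events are independent (so their probabilities multiply) and that the final probability is obtained with a rare-event approximation, i.e. by adding the nominal-mode contribution and the failure-mode contributions. The report does not spell out these assumptions; they are stated here because they reproduce the tabulated values. The names are hypothetical.

```python
# Probability that the raw water exceeds the critical concentration
# of each mode (Table 21 and Table 22).
RAW_WATER_EVENT = {
    "pesticide": {"nominal": 1.0e-6, "FM1": 1.7e-6, "FM2": 1.9e-6},
    "bacteria": {"nominal": 2.0e-5, "FM1": 2.0e-5, "FM2": 2.0e-5},
}

# Probability that the plant is in each failure mode (Table 22).
FAILURE_MODE_EVENT = {"FM1": 4.0e-7, "FM2": 2.5e-7}

def combined_events(raw_water_event):
    """AND gate: failure mode i and exceedance of its critical concentration."""
    return {
        mode: p_mode * raw_water_event[mode]
        for mode, p_mode in FAILURE_MODE_EVENT.items()
    }

def exceedance_probability(raw_water_event):
    """OR gate over all modes, using the rare-event approximation (sum)."""
    return raw_water_event["nominal"] + sum(combined_events(raw_water_event).values())

combined = {name: combined_events(ev) for name, ev in RAW_WATER_EVENT.items()}
final = {name: exceedance_probability(ev) for name, ev in RAW_WATER_EVENT.items()}
```

In this example the failure-mode contributions are several orders of magnitude smaller than the nominal-mode term, so the final probabilities are dominated by the raw water quality rather than by treatment failures.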
Appendix F: Some fundamental reliability concepts

Here we present some fundamental probabilistic concepts. When the reliability of a system (or component) is quantified, it is most often given as the probability that the system fails within a given period of time. The reliability is determined directly by the failure rate, often denoted λ, or rather λ(t), since this rate can depend on the age of the barrier. This is the rate at which failures occur for a barrier of a given age. A mental model for the failure rate that is common in the literature is the bathtub curve: very often the failure rate shows a bathtub-like behaviour, as illustrated in Figure 39.
[Figure 39: failure rate (vertical axis) plotted against time (horizontal axis), with the numbered periods marked along the curve.]
Figure 39. Bath tub curve (failure rate as a function of age).
The model indicates that many systems and components have a high failure rate in the initial period (1), where there are often start-up problems; a useful life period (2), where the failure rate is constant and only random failures occur; and a wear-out period (3), where the failure rate increases because of wear and tear of the system components. Many mathematical models exist for different types of fault development (for a thorough introduction see [23]). In this report the barrier is assumed to be in its useful life period, so a constant failure rate is assumed.
The mean time to failure (MTTF) for a barrier with a constant failure rate λ (i.e. a rate independent of the age t) is
MTTF = 1/λ
If the probability of failure is to be determined, a time interval needs to be defined. The probability that a barrier survives a time period t is given by the reliability
R(t) = e^{-λt}
where t is the elapsed time period (which can be interpreted as the age). The probability of failure within time t is P(failure) = 1 - R(t).
We often want to determine the unavailability of the barrier. For this we also need the time during which the barrier is not functioning each time it has failed. The mean time from the failure of the barrier until it is operational again is denoted the mean time to repair (MTTR).
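The constant-failure-rate relations above can be written out as a short sketch. The function names are assumptions, and the failure rate of 0.1 per year is chosen only for illustration.

```python
import math

def mttf(failure_rate):
    """Mean time to failure for a constant failure rate: MTTF = 1 / lambda."""
    return 1.0 / failure_rate

def reliability(failure_rate, t):
    """Probability of surviving the period t: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * t)

def failure_probability(failure_rate, t):
    """Probability of failure within the period t: 1 - R(t)."""
    return 1.0 - reliability(failure_rate, t)

lam = 0.1  # illustrative failure rate, per year

mean_life_years = mttf(lam)                      # 10 years
p_fail_one_year = failure_probability(lam, 1.0)  # about 0.095
p_fail_at_mttf = failure_probability(lam, mean_life_years)  # 1 - 1/e
```

Note that the probability of having failed by t = MTTF is about 63 %, not 50 %: with an exponential lifetime the mean is larger than the median.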
If failures are detected immediately (not requiring a test), the mean fraction of time during which the barrier is in a failed state (i.e. is unavailable) equals:
U ≈ MTTR/(MTTF + MTTR)
If a barrier is a sensor or detector that gives an alarm when a certain level (of concentration) is too high or too low, it can usually not be decided whether the barrier functions without testing it. The barrier can thus have a dormant failure, and its performance is measured by the probability of failure on demand (PFD; see footnote 9). If there is a constant failure rate λ, the PFD approximately equals:
PFD ≈ λτ/2
where τ is the test interval. So, if λ = 0.1 per year and the component is tested every six months (τ = 0.5 year), we get PFD ≈ 0.025 (= 2.5 %). If the barrier consists of a redundant system of components, the calculation becomes more complex (see [23]).
Within the water sector a lot of effort has been put into modelling the bathtub curve (failure rate for repairable systems). Special focus has been on the right-hand end of the curve, i.e. the deterioration phase, where failures occur more and more frequently. The models have developed from simple regression models to more advanced stochastic models. One example of tools for forecasting pipe breaks in a water network comes from the project CARE-W (see http://care-w.unife.it/). The tools use knowledge of past, observed and recorded failures. A failure is defined in this case as a break or detected leak that has necessitated repair of the pipe.
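The unavailability and PFD formulas above can be sketched as follows. The MTTF and MTTR values in the first example are hypothetical. The function pfd_exact is not taken from this report: it is the standard average of 1 - e^{-λt} over one test interval for a single component, included only to indicate how close the approximation λτ/2 is.

```python
import math

def unavailability(mttf, mttr):
    """Immediately detected failures: U = MTTR / (MTTF + MTTR)."""
    return mttr / (mttf + mttr)

def pfd_approx(failure_rate, test_interval):
    """Dormant failures found only by testing: PFD ~ lambda * tau / 2."""
    return failure_rate * test_interval / 2.0

def pfd_exact(failure_rate, test_interval):
    """Average of 1 - exp(-lambda * t) over one test interval (not from the report)."""
    x = failure_rate * test_interval
    return 1.0 - (1.0 - math.exp(-x)) / x

# Hypothetical barrier: one failure per year on average, 24 h to repair.
u_example = unavailability(mttf=8760.0, mttr=24.0)

# Example from the text: lambda = 0.1 per year, tested every six months.
pfd_example = pfd_approx(0.1, 0.5)
pfd_reference = pfd_exact(0.1, 0.5)
```

For the example in the text the two PFD values differ by less than 2 %, which suggests the approximation is adequate as long as λτ is small.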
Footnote 9: Also often denoted Mean Fractional Dead Time (MFDT). This is the probability that the barrier will not function when needed (i.e. at an actual random demand).