Design Considerations For Fault Management in Wireless Sensor Networks

Muhammad Z Khan, Madjid Merabti, Bob Askwith
School of Computing and Mathematical Sciences,
Liverpool John Moores University
Byrom St. Liverpool, L3 3AF, UK
M.Zahid-Khan@2008.ljmu.ac.uk
Abstract- Wireless Sensor Networks (WSNs) are envisioned as densely
deployed tiny sensors, left unattended to monitor and interact with
physical and environmental phenomena. Faults and failures are
inevitable in WSNs due to the inhospitable environment and unattended
deployment. In this paper, we survey fault management in WSNs, and
review and categorize current approaches and techniques dealing with
faults and failures in WSNs at different levels. The categorization is
based on the different phases of fault management, i.e. fault
detection, fault diagnosis and fault recovery. Based on the literature
survey we elaborate different issues and problems in existing
approaches for fault management. We attest that most of these
approaches are application specific and address faults only at a
certain level. Therefore, there is no guarantee that a protocol
developed for one specific application can carry over directly to
another application. Finally, we outline design criteria for
developing an application independent fault management architecture,
which can provide extensive fault management for all types of faults
and failures with a more holistic approach to enable a wider adoption
of WSN applications and technology. This survey is a part of our
ongoing research to develop application independent fault management.
We are currently investigating mechanisms to inject application
knowledge into the large computing management infrastructure of WSNs.
Application knowledge is the driving force that directs its
operations, so that they can be tailored from the special needs of one
application to those of another.

I. INTRODUCTION

Recent advances in wireless networking and communication, the
development of MEMS (Micro-Electro-Mechanical Systems) and its
integration with embedded microprocessors have enabled a new breed of
sensor networks suitable for a wide range of civil, commercial, and
military applications. Modern WSNs are made up of a collection of
densely deployed, inexpensive, tiny sensor devices that are networked
through low power wireless communication to cooperatively monitor
physical or environmental phenomena. Figure 1 [1] is an example of a
general WSN, where sensor nodes are scattered into a sensor field,
perform sensing and send results back to the end user (performing
local monitoring/remote monitoring) through the sink node. Proposed
applications of WSNs include environmental monitoring, habitat
monitoring, structure monitoring, healthcare, disaster prediction and
management, enemy tracking in the battlefield, security surveillance,
home appliances and entertainment. The core reasons for their
popularity are low price and ease of deployment; such networks are
particularly useful in hazardous and inaccessible environments, where
there is little or no human accessibility, e.g. battlefields and
chemically polluted areas.

Figure 1. Wireless Sensor Network

Sensor nodes in WSNs are expected to operate autonomously for a long
period of time and may not be easily approachable for battery
replacement and maintenance due to their physical deployment location.
Therefore, faults and failures are a normal fact of life in WSNs.
Thus, in order to guarantee the network quality of service and
performance, it is essential for WSNs to be able to detect faults and
failures and to perform something akin to healing and recovering from
events that might cause faults or misbehaviour in the network. A set
of functions or applications designed specifically for this purpose is
called a fault-management platform. Most of the existing fault
management approaches for WSNs have been integrated with application
requirements. The main reason for this is that WSNs are energy and
resource constrained, and direct application of traditional fault
management techniques incurs a significant overhead. Therefore, to
design an application independent and efficient fault management
architecture, we must take into account a wide variety of sensor
applications with diverse needs, different sources of faults, and
various network configurations. In addition, scalability, mobility,
and timeliness may have to be considered [2].

In this paper, we discuss faults and fault management in WSNs. We
categorize and compare existing fault management approaches based on
their classification into three main phases: fault detection, fault
diagnosis and fault recovery. From the literature survey, we attest
that most of the existing fault management approaches are tightly
application specific and address faults only at a certain level. We
also mention issues and problems in the existing approaches, and
attest that there is a need for a fault management architecture with a
more holistic
approach to enable a wider adoption of WSN applications and
technology. We also outline some design criteria for developing an
application independent architecture for WSNs. The rest of the paper
is organized as follows: Section II defines faults, sources of faults
and types of faults in WSNs. Section III explains fault management in
WSNs. In Section IV we survey and categorize state of the art fault
management approaches for WSNs and mention different issues and
problems in them. Section V describes design criteria for an
application independent fault management architecture for WSNs, and
finally the paper concludes in Section VI.

II. SOURCES OF FAULTS IN WSNs

A fault is any kind of defect that leads the system to failure, and a
failure is a situation in which the system deviates from its
specification and cannot deliver its intended functionality.
Koushanfar et al. [3] categorized faults into three types:
• Permanent faults – These faults are continuous and stable in nature,
e.g. hardware faults within a component of a sensor node.
• Intermittent faults – An intermittent fault has an occasional
manifestation due to the unstable characteristics of the hardware, or
as a consequence of software being in a particular subset of space.
• Transient faults – Transient faults are the result of some temporary
environmental impact on otherwise correct hardware, e.g. the impact of
cosmic radiation on the sensing enclosure of a sensor node.
Faults in WSNs can occur for various reasons. Some of the prominent
sources of faults mentioned in [2, 4] are:
1) Node Level Faults: Nodes are fragile; they may fail due to the
depletion of batteries, node hardware/software malfunction and the
external impact of harsh environmental conditions (direct contact with
water causing a short circuit, a node crushed by a falling tree, etc.).
2) Network Level Faults: Instability of the links between nodes,
causing network partitions and dynamic changes in network topology,
leads to network level faults.
3) Sink Level Faults: Failure of the sink leads to a massive failure
of the network. At the sink level, the software that stores and
processes data is subject to bugs, which can lead to loss of data
within the period when the fault occurs.
4) Faults caused by adversaries: Since WSNs are often deployed for
critical applications, attacks by adversaries may cause node faults
and consequently lead the network to failure. The lack of
infrastructure and the broadcast nature of the wireless medium enable
adversaries to intrude into the network and disrupt the whole
functionality (e.g. routing, aggregation etc.) of an individual sensor
node [5].

III. FAULT MANAGEMENT IN WSNs

Fault management is a very important component of network management
concerned with detecting, diagnosing, and resolving faults in the
network. Proper implementation of fault management can keep the
network running at an optimum level and minimize the risk of failure,
and consequently make the network more fault tolerant [6]. Important
functions of fault management include:
• Constant monitoring of system status and usage level
• General diagnostics
• Tracing the location of potential and actual failures
• Auto-recovery and self-healing in the event of failure
A sensor network management system can be categorized according to the
approaches taken for monitoring and control. From the management
system organization perspective, there are two main categories of
network monitoring [7]: passive vs. active monitoring, and pro-active
vs. reactive monitoring.
• Passive monitoring – The passive model triggers alarms when a fault
is detected.
• Active monitoring – Sensor nodes continuously send keep-alive or
update messages to the control centre to inform it of their existence.
• Pro-active monitoring – A management system actively collects and
analyzes the present network states to detect past events and to
predict future events in order to maintain the performance of the
network.
• Reactive monitoring – A management system gathers information about
the network states to detect whether events of interest have occurred
and then takes some adaptive measures to re-configure the network.
Fault management in WSNs can be classified according to its network
management system architecture [8]: centralized, distributed or
hierarchical.
1) Centralized Architecture: The base station or central manager has
rich and unlimited resources. Therefore, it performs complex
management tasks and controls the whole network.
2) Distributed Architecture: Instead of having a single central
controller, a distributed architecture employs multiple manager
stations throughout the whole network. Each manager controls a
sub-region of the network and may communicate directly with other
manager stations in a co-operative manner in order to perform
management functions.
3) Hierarchical Architecture: It is a hybrid between the centralized
and distributed architectures. Sub-controllers or managers are
distributed throughout the network in a tree-shaped hierarchy with
lower and higher levels. These managers, referred to as intermediate
managers, each manage a sub-section of the network and perform the
management functions, but they do not communicate directly with each
other.

IV. FAULT MANAGEMENT APPROACHES

Fault management in WSNs is different from that in traditional
networks. Recently, researchers have developed various techniques and
approaches to deal with various types of faults at different layers of
the network. To provide resilience in faulty situations, three main
actions (fault detection, fault diagnosis and fault recovery) must be
performed [2, 9]. We categorize the existing approaches according to
the different phases of the fault management architecture, i.e. fault
detection, fault diagnosis and fault recovery. In this section,
we will discuss these phases and state of the art approaches to
perform these functions. We also highlight different issues and
problems in the proposed fault management approaches for WSNs.

A. Fault Detection (First Phase)

In the fault management of WSNs, fault detection is the first phase,
where faults and failures in the network are properly identified by
the management system. The aim of fault detection is to ensure that
the services being provided are functioning properly, and in some
cases to predict whether they will continue to function properly in
the future. Generally, there are two types of failure detection:
explicit detection and implicit detection; for more details see [9].
Implicit detection is normally carried out with a passive or active
model. Recent research has investigated automatic fault detection
techniques for WSNs, because visual observation and manual
intervention are unsuitable for networks deployed in inaccessible and
hostile environments. Existing fault detection approaches are mainly
classified into two types: centralized and distributed approaches.
1) Centralized Approaches: In centralized approaches most of the
management and monitoring tasks are performed by the central manager
or base station, which has powerful resources (e.g. energy, computing
and memory). The central manager generally adopts an active monitoring
model to detect faults, the state of network performance, and the
lifetime of individual sensor nodes. A centralized sink location based
scheme, Sympathy [10], provides a debugging tool to detect and
localize faults that may occur due to the interactions between
different sensor nodes in the network. Sympathy has two main types of
nodes: Sympathy-sink and Sympathy-node. The Sympathy-sink makes
requests to Sympathy-nodes using a message-flooding technique to pool
event data and current states (metrics) of the network. The Sympathy
management system actively monitors and dynamically collects run-time
node states and flow information towards the Sympathy-sink. In
addition, it detects possible faults by analyzing node states together
with network performance [11]. MANNA [12] is a policy-based
centralized approach that uses the concept of external managers to
detect faults in the network. MANNA assigns different roles (Managers
or Agents) to various sensor nodes depending on the network
characteristics (homogeneous vs. heterogeneous) and topology. These
distinguished nodes exchange request and response messages with each
other for management purposes. MANNA performs centralized fault
detection based on the analysis of gathered WSN data. The MANNA
architecture requires manual configuration and human intervention to
set up agents, which is not practical for sensor networks deployed in
inaccessible terrain. Similarly, the agent-based fault detection
mechanism of [13], based on data aggregation at the sink node, is an
efficient and fast fault detection approach with minimum energy
consumption.
By comparing the current or historical states of sensor nodes against
the overall network state model (i.e. topology map, energy map,
coverage map etc.), the central manager in WinMS [14] detects and
prevents potential failures in the network. WinMS has a lightweight
TDMA protocol design that provides energy-efficient management, data
transport and local repairs. However, the initial setup cost for
creating a data gathering tree and node schedule depends on the
network density [7]. Staddon et al. [15], while tracing failed nodes
in the network, proposed a similar centralized management approach,
whereby the manager monitors the health of individual sensor nodes to
detect node failures in the network. The base station constructs the
whole map of the network topology with the help of the nodes' routing
update messages, providing a method for recovering corrupted routes.
Existing centralized approaches suffer from problems such as
insufficient scalability, availability and flexibility when the
network becomes more distributed.
2) Distributed Approaches: Distributed approaches employ the concept
of local decision making and distribute management functions
throughout the network. The more decisions a local node can take by
itself, the fewer messages need to be delivered to the central
manager. These approaches conserve a lot of sensor node energy and
ensure the longevity of the network [9]. Hsin and Liu [16] proposed an
efficient distributed two-phase self-monitoring mechanism (TP) for
fault detection. In TP, the health of individual nodes is monitored to
detect malfunctioning nodes and intrusions that can result in the
destruction of nodes. Each node monitors its own health and its
neighbours' health to provide local fault detection. TP performs
either explicit or implicit fault detection, based on a two-phase
timer scheme for local co-ordination and information exchange among
nodes. This approach requires the network to be pre-configured and
each sensor to have a unique ID. Failure detection through neighbour
co-ordination is used in a number of different schemes [17, 18]. In
these approaches, nodes co-ordinate and exchange messages and
information with their neighbours to detect and identify network
faults before contacting the central node. Cheng et al. [19] proposed
a distributed mechanism for fault and anomaly detection to identify
failed or misbehaving nodes in event-driven WSN applications.
Clustering has become an emerging technology for building scalable and
energy-balanced applications for WSNs. Recent research has used the
clustering approach to evenly distribute fault management tasks in the
network. Clustering divides the whole network into groups of sensor
nodes called clusters, where one node is selected as a Cluster Head
(CH) and its associated sensor nodes are called cluster members. The
CH executes different fault management functions to detect faults
inside the cluster. Cluster based approaches for distributed fault
detection are proposed by Venkataraman et al. [20] and Yao-Chang et
al. [21]. The approach adopted in these mechanisms is the exchange of
messages between the CH and its member sensor nodes. If a node is
failing due to energy depletion, it sends a fail report message to its
neighbours and its CH. The CH in this way can detect the potential
fault and invoke the fault recovery mechanism to keep the cluster
connected. Clouqueur et al. [22] use the concept of decision fusion,
whereby sensors co-ordinate with each other to obtain the same global
network state information. It can detect suspicious nodes, if
they then send inconsistent information to the decision fusion center.
WSNMP [23] is a hierarchical network management system based on
clustered formation. WSNMP provides a method to monitor the network
states by collecting management data, and accordingly to control and
maintain the network resources.
Most recently, researchers have taken an interest in statistical
analysis for fault detection algorithms. These mechanisms are simpler
to operate, and perform as well as other techniques. A Mobile Agent
(MB) based approach proposed by Al-Kasassbeh et al. [24] may offer a
reasonable new technology that will help to achieve distributed
management. The proposed technique uses a statistical method based on
the Wiener filter to capture the abnormal behaviour of the MIB
variables. The mobile agent migrates from one node to another,
accessing an appropriate subset of the MIB from each node and
analyzing it locally to perform fault detection. The method is shown
to be more scalable and efficient, and does not require lengthy
preparation.
Distributed approaches provide a major shift in the design of fault
management architecture for WSNs. Management responsibilities are
transferred more towards the sensor nodes, instead of a central
manager, which ultimately makes the network more reliable and
self-managed.

B. Fault Diagnosis (Second Phase)

In a fault management architecture, fault diagnosis is the next phase
after fault detection. After the detection of faults (alarms), the
management system starts to identify the real causes of the faults. In
this way detected faults are properly identified and distinguished
from irrelevant false alarms. The accuracy and correctness of a
detected fault have already been partly achieved by the various fault
detection methods already proposed. However, there is still a need for
a more comprehensive model of faults in sensor networks to support
accurate fault diagnosis [9]. In WSNs, when a sink node does not
receive messages from a specific region of the routing tree, it is
unknown whether this is due to the failure of a key routing node or
the failure of all nodes in the region. Staddon et al. [15] proposed a
fault tracing protocol to differentiate these two types of faults. The
protocol enables the sink node to construct the complete topology of
the network (each individual node piggybacks its neighbour nodes' IDs
along with its own reading). Failed nodes can then be traced by using
a divide-and-conquer strategy based on adaptive route update messages.
This approach is unsuitable for large scale WSNs, because if there are
constant failures, the sink would be frequently broadcasting routing
update messages to the nodes, which will incur significant overhead.
To overcome this problem, Liu et al. [11] proposed a probabilistic
diagnosis (PAD) approach for inferring the root cause of abnormal
behavior. PAD uses a packet marking algorithm for efficiently
constructing and dynamically maintaining the inference model. The
algorithm does not incur extra overhead for information gathering and
provides an on-line diagnosis of an operational sensor network system,
passively observing the network symptoms from the sink. A distributed
diagnosis algorithm for isolating faulty sensor nodes in WSNs is
presented in [17]. The algorithm diagnoses transient faults:
communication faults and incorrect sensor reading faults. Faulty nodes
are simply isolated by identifying fault-free nodes.
Sympathy [10] is a diagnosis tool for detecting and debugging node
self, path and sink faults. Sympathy monitors regular network traffic
generated by each healthy node, i.e. sensor readings, routing update
messages, synchronization beacons etc. Sympathy detects faults when
nodes are not delivering sufficient data to the sink, and treats the
absence of monitored traffic as an indication of faults. Sympathy
identifies whether the root cause of failure is node health, a
connectivity problem, or a problem at the sink by using an empirical
decision tree [2]. Chen et al. [18] and Koushanfar et al. [3] focus on
sensor hardware and actuator faults respectively, which are more prone
to malfunction. Intermittent faults are also an important class of
failures. Khilar et al. [25] presented a probabilistic approach to
fault diagnosis in WSNs, considering intermittent faults in sensor
nodes and permanent faults in wireless links. Clouqueur et al. [22],
however, considered nodes that are faulty due to harsh environmental
conditions.

C. Fault Recovery (Third Phase)

We have discussed different techniques for fault detection and fault
diagnosis; we next discuss how faults can be treated. Fault recovery
is the final phase of the fault management architecture, whereby the
sensor network is re-configured and restructured in such a way that
faults and failures do not impact further on the network performance
[9]. WinMS [14] is a centralized architecture that analyses the
network state to detect and predict potential failures and takes
corrective and preventive actions accordingly. In WinMS there is a
schedule period during which local nodes listen to activity in their
environment and can self-configure in the event of failure without
prior knowledge of the full network topology. WinMS uses a pro-active
technique to instruct nodes to send data less frequently to conserve
energy. The main advantage of WinMS is that it adaptively adjusts the
network by providing local and central recovery mechanisms. A
distributed localized Cluster-Based approach for fault detection and
network connectivity recovery
is proposed by Venkataraman et al. [20]. The scheme is energy
efficient and responsive; however, it considers only permanent faults,
which occur mainly due to energy depletion and ultimately lead to the
loss of connectivity and coverage in the network. To improve the
robustness and efficiency of the cluster-based scenario, Lai and Chen
[26] proposed the CMATO (Cluster-Member-based fAult Tolerant
mechanism) algorithm. CMATO views the cluster as a whole and takes
advantage of the inter-cluster monitoring of nodes to detect faults.
When cluster members detect a fault caused by the cluster head, they
act co-operatively to select a new cluster head to replace the failed
one.
Koushanfar et al. [3] proposed a heterogeneous back-up scheme for
tolerating and recovering from sensor node hardware malfunction. They
argued that a single type of hardware resource can back up different
types of resources. The proposed protocol focuses on five types of
resources: sensing, computing, storage, communication and actuating,
which can replace each other through suitable changes in system and
application software. WSNMP [23] is a hierarchical network management
architecture based on clustering formation. The protocol monitors the
network with minimum overhead, collects the management data and
finally re-configures the network periodically to recover it from
failure. The protocol describes the algorithm which generates the
topology of the entire network; once the topology is modeled, the
central manager (CM) can reconfigure the network with minimum overhead
in the event of any node or link failure. The protocol focuses on
applications that provide management schemes in terms of monitoring
and controlling the WSN. It also detects network faults by identifying
non-responsive nodes and, if required, re-configures the routing path.
WSNMP does well in static homogeneous WSNs, but provides no solution
for dynamically changing topologies.
Algorithms like LEACH (Low Energy Adaptive Clustering Hierarchy) and
HEED (Hybrid Energy-Efficient Distributed Clustering) mainly focus on
balanced energy consumption and efficient cluster formation. They hold
that recovery through a neighbouring cluster head is better than
through a gateway node. For example, Asim et al. [27] proposed a
distributed fault detection and recovery architecture for homogeneous
WSNs. The scheme performs local detection and recovery through mutual
co-ordination between nodes. The network is divided into a virtual
grid instead of clusters, which is more energy-efficient and
lightweight, has minimum communication cost, and provides better
reliability. However, the scheme only considers permanent faults.
Most of the schemes (centralized and distributed) discussed here are
not fully adaptive and self-managed. Fault management and recovery are
carried out by exchanging excessive messages between the central
manager and nodes, or between the CH and member nodes. To overcome
this problem, Yu et al. [28] proposed a biologically inspired
self-managed fault management architecture for WSNs. The proposed
self-managed hierarchical architecture fully distributes the
management tasks among different sensor nodes in the network. The
scheme introduces more self-managing functions to the sensor nodes,
which encourages them to rely on monitoring their own status instead
of frequently consulting their cluster-head. In addition, the authors
give a solution for faulty node replacement in a self-configurable
WSN. The paper particularly examines self-management capabilities that
adapt to various requirements (e.g. sensor node failure) in a rapidly
changing and hostile environment. Instead of the stereotypical
distributed clustering technique, the authors introduce a new
management layer between the cluster-head and its leaf nodes. This
makes the sensor nodes more self-managed (local computation instead of
message transmission in sensor networks).
Table I shows the overall classification and comparison of existing
fault management approaches and architectures. The table describes the
approaches with their operation organization and the types of faults
they detect, diagnose and recover from.

D. Problems and Issues

In this section, we highlight different issues and problems that exist
in already proposed fault management approaches for WSNs. We believe
that there is a need for an application independent fault management
architecture with a more holistic approach. It is evident from the
literature survey that different approaches for fault management
suffer from the following problems:
• Due to the application specific nature of WSNs, it is very
challenging to carry an existing fault management architecture over
from one application to another.
• Most existing approaches [16, 29] mainly focus on failure detection.
However, there is still no comprehensive solution for fault management
in WSNs from the management architecture perspective.
• Different mechanisms proposed for fault recovery [3] are not
directly relevant to fault recovery in respect of network system level
management (e.g. network connectivity and network coverage area etc.).
• Fault recovery mechanisms are mainly application specific (e.g.
gateway recovery, common node recovery etc.) and focus on a small
region or individual nodes, and thereby are not fully scalable.
• Some decentralized approaches, e.g. Hsin et al. [16], require the
network to be pre-configured, which is very costly for resource
constrained WSNs.
• Some management frameworks require an external human manager to
monitor the network management functionalities, e.g. TinyDB, MOTE-VIEW
and sNMP.
• Some schemes [20, 27] only consider permanent faults and ignore
intermittent and transient faults.
• Most existing approaches in WSNs isolate [30] failed or misbehaving
nodes directly from the network communication, but there is no
adequate fault recovery procedure available.

V. APPLICATION INDEPENDENT ARCHITECTURE

In general, WSNs are tightly application-dependent. When WSNs are
deployed, applications are not stand-alone but are integrated into the
management infrastructure. The design of applications and management
architecture in WSNs is also dependent on application semantics (e.g.
application specific data processing combined with data routing).
Therefore, unlike in traditional networks, the resource constraints of
WSNs limit the ability of sensor nodes to accommodate a wide variety
of applications. Furthermore, application designers have to develop
complex and special protocols and algorithms for specific sensor
applications [9].
From the above discussion, we can attest that there is a need for
application independent fault management for WSNs to improve their
robustness and reliability and to enable a wider adoption of WSN
applications and technology.
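
To make the notion of application independence more concrete, the
following Python sketch separates a generic three-phase pipeline
(detection, diagnosis, recovery) from the application knowledge that
parameterizes it. It is only an illustration under stated
assumptions: the names AppKnowledge and FaultManager, the keep-alive
timeout rule, the parent-based diagnosis heuristic and the recovery
action strings are all hypothetical and are not taken from any of the
surveyed schemes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AppKnowledge:
    """Hypothetical application-supplied parameters (an assumption)."""
    heartbeat_period_s: float   # how often a node is expected to report
    missed_beats_allowed: int   # tolerance before a node is suspected


@dataclass
class FaultManager:
    """Generic manager; only AppKnowledge changes between applications."""
    knowledge: AppKnowledge
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node_id: str, now: float) -> None:
        # Active monitoring: a node announces that it is alive.
        self.last_seen[node_id] = now

    def detect(self, now: float) -> List[str]:
        # Phase 1: suspect nodes whose keep-alive messages stopped.
        limit = (self.knowledge.heartbeat_period_s
                 * self.knowledge.missed_beats_allowed)
        return sorted(n for n, t in self.last_seen.items() if now - t > limit)

    def diagnose(self, suspects: List[str],
                 parent: Dict[str, str]) -> Dict[str, str]:
        # Phase 2 (simplified heuristic): a silent node whose routing
        # parent is also silent may merely be cut off; a silent node
        # with a live parent is treated as a root cause.
        silent = set(suspects)
        return {n: "upstream-silent" if parent.get(n) in silent
                else "root-cause" for n in suspects}

    def recover(self, diagnosis: Dict[str, str]) -> List[str]:
        # Phase 3: emit one illustrative action per root cause.
        return [f"re-route around {n}"
                for n, cause in sorted(diagnosis.items())
                if cause == "root-cause"]


fm = FaultManager(AppKnowledge(heartbeat_period_s=10.0, missed_beats_allowed=3))
fm.heartbeat("a", now=0.0)
fm.heartbeat("b", now=0.0)
fm.heartbeat("c", now=95.0)
suspects = fm.detect(now=100.0)
diagnosis = fm.diagnose(suspects, parent={"a": "sink", "b": "a", "c": "sink"})
actions = fm.recover(diagnosis)
```

In this sketch a different application would supply a different
AppKnowledge instance while the three phases stay unchanged, which is
the separation the discussion above argues for.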
TABLE I
FAULT MANAGEMENT APPROACHES CATEGORIZATION

Schemes | Management System Organization | Types of faults & failures addressed | Action taken
--------+--------------------------------+--------------------------------------+-------------
Sympathy [10] | Centralized Hierarchical, Pro-active monitoring | Node self, Network faults, Sink fault, Crash & time-out omission failures | Fault Detection & Diagnosis
MANNA [12] | Centralized + Distributed, Passive monitoring | Node faults | Detection, Diagnosis & Recovery
WinMS [14] | Centralized + Distributed (Hierarchical), Pro-active monitoring | Node faults (weak or faulty) | Detection & Recovery
WSNMP [23] | Centralized + Distributed (Hierarchical Clustering based) | Node faults, Network faults | Detection & Recovery
Cluster-Based approach [20, 21] | Centralized + Distributed | Node faults (energy failures), Network faults (network connectivity), Permanent faults | Detection & Recovery
Passive Diagnosis of WSNs [11] | Centralized + Hierarchical, Probabilistic approach, Passive monitoring | Node faults, Network faults, Transient faults | Detection, Diagnosis & Recovery
Efficient Tracing of failed nodes [15] | Centralized, Active monitoring | Node faults, Route Faults | Detection, Diagnosis & Recovery
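
The categorization in Table I can also be queried programmatically.
The Python sketch below encodes only the "Schemes" and "Action taken"
columns of the table; the type name Scheme, the helper covering and
the lowercase phase labels are assumptions introduced here for
illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Scheme:
    name: str
    phases: FrozenSet[str]  # phases listed under "Action taken"


DET, DIA, REC = "detection", "diagnosis", "recovery"

# Rows transcribed from Table I (scheme name and action taken only).
TABLE_I: List[Scheme] = [
    Scheme("Sympathy [10]", frozenset({DET, DIA})),
    Scheme("MANNA [12]", frozenset({DET, DIA, REC})),
    Scheme("WinMS [14]", frozenset({DET, REC})),
    Scheme("WSNMP [23]", frozenset({DET, REC})),
    Scheme("Cluster-Based approach [20, 21]", frozenset({DET, REC})),
    Scheme("Passive Diagnosis of WSNs [11]", frozenset({DET, DIA, REC})),
    Scheme("Efficient Tracing of failed nodes [15]", frozenset({DET, DIA, REC})),
]


def covering(needed: Set[str]) -> List[Scheme]:
    """Return the schemes whose listed actions include every needed phase."""
    return [s for s in TABLE_I if needed <= s.phases]


all_three = [s.name for s in covering({DET, DIA, REC})]
no_diagnosis = [s.name for s in TABLE_I if DIA not in s.phases]
```

Here all_three lists the schemes that Table I credits with all three
phases, and no_diagnosis lists those for which no diagnosis step is
recorded.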
More specifically, the architecture should have the following capabilities:
• The unique characteristics and restrictions of WSNs must be taken into account when proposing a fault management architecture for WSNs.
• The fault management architecture should be application independent, with a holistic approach that tackles faults at a number of different levels with low overhead in terms of computing, bandwidth, reliability and energy consumption.
• It should reduce the uncertainty associated with WSN operations through fault detection, diagnosis and recovery.
• The architecture should be context aware, adaptive, self-organized and distributed, so that the use of sensor network resources is kept to a minimum while performing fault management responsibilities.
• The architecture should be lightweight in terms of design and operations. For this purpose a layered system structure may be used. In a layered system structure each functional component is designed and programmed separately for various sensor applications and management functions.
• In order to provide continuous support for fault management in various applications of WSNs, a generic common interface should be provided.
• To improve resilience against failures and make the network more fault-tolerant, the management architecture needs to reconfigure its operation and functions in response to changes in environment and circumstances. In other words, the fault management architecture should be self-configured and self-organized so that it can continuously monitor the network for faults and technical problems without too much human intervention.
This survey is a part of our ongoing research to develop an application-independent fault management architecture for WSNs. We verified that most of the proposed schemes and approaches for fault management are tightly application specific. Therefore, we outline design criteria for developing an application-independent fault management architecture with a more holistic approach. We integrate application knowledge into the MIB (Management Information Base) of the management infrastructure. The application knowledge is the driving force that directs its operation so that it can be tailored to the special needs of each application. Application knowledge may contain information about the application's network topology, its deployment scenario (indoor/outdoor), data generation and traffic, the nature of the sensed phenomenon, and the nodes' power consumption. Integrating the application knowledge into the MIB of the management infrastructure provides the basis to develop an application-independent fault management architecture with a more generic and holistic approach, which can easily be carried over from one application to another.
The above discussion of the issues considered and outlined is by no means exhaustive or complete. There are several other factors and design considerations to be tackled, including node deployment versus placement, synchronization, coverage, and security, before we can design and develop an application-independent fault management architecture for WSNs.

VI. CONCLUSION

In this paper we presented a survey on fault management in WSNs, and reviewed current approaches dealing with faults and failures in WSNs at different levels. We surveyed state-of-the-art protocols, algorithms and techniques applied for fault management in WSNs. Based on our literature survey we can verify that current approaches to fault management provide solutions for faults and failures only in specific applications and scenarios. We also discussed problems and issues in the existing management approaches and attest that there is a need for an application-independent fault management architecture that can provide extensive fault management solutions for all types of faults and failures in WSNs. Finally, we proposed some design criteria that need to be considered when designing an application-independent fault management architecture for WSNs with a more holistic approach. Integrating the application knowledge into the management infrastructure
provides the basis to develop an application-independent fault management architecture. We are further investigating how to inject application knowledge into the MIB of the management infrastructure.

REFERENCES

[1] http://www.alicosystems.com/Wireless%20Sensor%20Netw.
[2] L. Paradis and Q. Han, "A Survey of Fault Management in Wireless Sensor Networks," Journal of Network and Systems Management, Springer Science + Business Media, LLC, vol. 15, pp. 171-190, June 2007.
[3] F. Koushanfar, M. Potkonjak, and A. Sangiovanni-Vincentelli, "Fault tolerance techniques for wireless ad hoc sensor networks," in Proceedings of IEEE Sensors, 2002, vol. 2, pp. 1491-1496.
[4] M. Ding, D. Chen, K. Xing, and X. Cheng, "Localized fault-tolerant event boundary detection in sensor networks," in INFOCOM 2005, 24th Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 2, D. Chen, Ed., 2005, pp. 902-913.
[5] R. Linnyer Beatrys, G. S. Isabela, B. e. O. Leonardo, W. Hao Chi, Jos, S. N. Marcos, and A. F. L. Antonio, "Fault management in event-driven wireless sensor networks," in Proceedings of the 7th ACM international symposium on Modeling, analysis and simulation of wireless and mobile systems, Venice, Italy: ACM, 2004, pp. 149-156.
[6] J. Suhonen, M. Kohvakka, M. Hannikainen, and T. D. Hamalainen, "Embedded Software Architecture for Diagnosing Network and Node Failures in Wireless Sensor Networks," in Embedded Computer Systems: Architectures, Modeling, and Simulation, vol. 5114/2008: Springer Berlin / Heidelberg, July 18, 2008, pp. 258-267.
[7] W. L. Lee, A. Datta, and R. Cardell-Oliver, Network Management in Wireless Sensor Networks: Handbook on Mobile Ad Hoc and Pervasive Communications, American Scientific Publishers, 2006.
[8] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "A Survey on Sensor Networks," IEEE Communications Magazine, pp. 102-114, August 2002.
[9] Y. Mengjie, H. Mokhtar, and M. Merabti, "Fault Management in Wireless Sensor Networks," IEEE Wireless Communications, vol. 14, pp. 13-19, 2007.
[10] N. Ramanathan, E. Kohler, L. Girod, and D. Estrin, "Sympathy: a debugging system for sensor networks [wireless networks]," in 29th Annual IEEE International Conference on Local Computer Networks, 2004, pp. 554-555.
[11] K. Liu, M. Li, Y. Liu, M. Li, Z. Guo, and F. Hong, "Passive diagnosis for wireless sensor networks," in Proceedings of the 6th ACM conference on Embedded network sensor systems, Sensys'08, Raleigh, NC, USA: ACM, 2008, pp. 113-126.
[12] L. B. Ruiz, J. M. Nogueira, and A. A. F. Loureiro, "MANNA: a management architecture for wireless sensor networks," Communications Magazine, IEEE, vol. 41, pp. 116-125, 2003.
[13] S. Elhadi, X. Xinyu, and Z. Haiyi, "Agent-based Fault Detection Mechanism in Wireless Sensor Networks," in Proceedings of the 2007 IEEE/WIC/ACM International Conference on Intelligent Agent Technology: IEEE Computer Society, 2007.
[14] W. L. Lee, A. Datta, and R. Cardell-Oliver, "WinMS: Wireless Sensor Network-Management System, An Adaptive Policy-Based Management for Wireless Sensor Networks," School of Computer Science & Software Engineering, The University of Western Australia, CSSE Technical Report UWA-CSSE-06-001, June 2006.
[15] S. Jessica, B. Dirk, and D. Glenn, "Efficient tracing of failed nodes in sensor networks," in Proceedings of the 1st ACM international workshop on Wireless sensor networks and applications, Atlanta, Georgia, USA: ACM, 2002, pp. 122-130.
[16] C. Hsin and M. Liu, "A Two-Phase Self-Monitoring Mechanism for Wireless Sensor Networks," Journal of Computer Communications special issue on Sensor Networks, vol. 29, pp. 462-476, February 2006.
[17] L. Myeong-Hyeon and C. Yoon-Hwa, "Distributed diagnosis of wireless sensor networks," in IEEE Region 10 Conference, TENCON'07, 2007, pp. 1-4.
[18] J. Chen, S. Kher, and A. Somani, "Distributed fault detection of wireless sensor networks," in Proceedings of the 2006 workshop on Dependability issues in wireless ad hoc networks and sensor networks, Los Angeles, CA, USA: ACM, 2006.
[19] S.-T. Cheng, S.-Y. Li, and C.-M. Chen, "Distributed Detection in Wireless Sensor Networks," in Seventh IEEE/ACIS International Conference on Computer and Information Science, ICIS'08, 2008, pp. 401-406.
[20] G. Venkataraman, S. Emmanuel, and S. Thambipillai, "A Cluster-Based Approach to Fault Detection and Recovery in Wireless Sensor Networks," in 4th International Symposium on Wireless Communication Systems, ISWCS'07, 2007, pp. 35-39.
[21] C. Yao-Chung, L. Zhi-Sheng, and C. Jiann-Liang, "Cluster based self-organization management protocols for wireless sensor networks," Consumer Electronics, IEEE Transactions on, vol. 52, pp. 75-80, 2006.
[22] C. Thomas, K. S. Kewal, and R. Parameswaran, "Fault Tolerance in Collaborative Sensor Networks for Target Detection," IEEE Trans. Comput., vol. 53, pp. 320-333, 2004.
[23] M. M. Alam, M. Mamun-Or-Rashid, and C. S. Hong, "WSNMP: A Network Management Protocol for Wireless Sensor Networks," in 10th International Conference on Advanced Communication Technology (ICACT'08), vol. 1, 2008, pp. 742-747.
[24] M. Al-Kasassbeh and M. Adda, "Network fault detection with Wiener filter-based agent," Journal of Network and Computer Applications, vol. 32, pp. 824-833, 2009.
[25] P. M. Khilar and S. Mahapatra, "Intermittent Fault Diagnosis in Wireless Sensor Networks," in Information Technology (ICIT 2007), 10th International Conference on, 2007, pp. 145-147.
[26] L. Yongxuan and C. Hong, "Energy-Efficient Fault-Tolerant Mechanism for Clustered Wireless Sensor Networks," in Proceedings of 16th International Conference on Computer Communications and Networks, ICCCN'07, 2007, pp. 272-277.
[27] M. Asim, H. Mokhtar, and M. Merabti, "A Fault Management Architecture for Wireless Sensor Network," in International Wireless Communications and Mobile Computing Conference, IWCMC '08, 2008, pp. 779-785.
[28] M. Yu, H. Mokhtar, and M. Merabti, "Self-Managed Fault Management in Wireless Sensor Networks," in The Second International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies, UBICOMM '08, 2008, pp. 13-18.
[29] A. Perrig, R. Szewczyk, J. D. Tygar, V. Wen, and D. E. Culler, "SPINS: Security Protocols for Sensor Networks," in ACM MobiCom '01, Rome, Italy, 2001, pp. 189-199.
[30] S. Marti, T. J. Giuli, K. Lai, and M. Baker, "Mitigating routing misbehavior in mobile ad hoc networks," in Proceedings of the 6th annual international conference on Mobile computing and networking, Boston, Massachusetts, United States: ACM, 2000, pp. 255-265.
