
Budapest University of Technology and Economics Department of Telecommunications

Privacy enhancing protocols for wireless networks


Ph.D. Dissertation of

Tamás Holczer

Supervisor: Levente Buttyán, Ph.D.

TO BE ON THE SAFE SIDE

Budapest, Hungary 2012

I, the undersigned Tamás Holczer, hereby declare that I prepared this Ph.D. dissertation myself and that I used only the sources given. Every part that was quoted word for word, or that was taken over from another source with the same content but rephrased, is explicitly marked with a reference to its source.

The reviews of the dissertation and the record of the thesis defense are available at the Dean's Office of the Faculty of Electrical Engineering and Informatics of the Budapest University of Technology and Economics.

Budapest, . . . . . . . . . . . . . . . . . . . . . . . . Holczer Tamás


Abstract

Wireless networks are part of our everyday life. We use them to call each other, to download our emails at home, or to enter a building with a proximity card. In the near future, wireless networks will be used in many new fields, such as vehicular ad hoc networks or critical infrastructure protection. The use of wireless networks instead of wired networks opens up new research challenges. These challenges include mobility, coping with unreliable links, resource constraints, and the security and privacy aspects of wireless communication. In this thesis, some privacy aspects of different wireless networks are investigated.

In Chapter 2, private authentication methods are proposed and analyzed for radio frequency identification (RFID) systems, where the provers are low-cost RFID tags and the number of tags can potentially be very large. I study the problem of private authentication in RFID systems and propose two methods: key-tree based authentication with optimized trees, and group based authentication. The first key-tree based private authentication protocol was proposed by Molnar and Wagner as a neat way to efficiently solve the problem of privacy preserving authentication based on symmetric key cryptography. However, in the key-tree based approach, the level of privacy provided by the system to its members may decrease considerably if some members are compromised. In this thesis, I analyze this problem and show that careful design of the tree can help to minimize this loss of privacy. First, I introduce a benchmark metric for measuring the resistance of the system to a single compromised member. This metric is based on the well-known concept of anonymity sets. Then, I show how the parameters of the key-tree should be chosen in order to maximize the system's resistance to single member compromise under some constraints on the authentication delay. In the general case, when any member can be compromised, I give a lower bound on the level of privacy provided by the system, and I present simulation results showing that this lower bound is quite sharp. The results of Chapter 2 can be directly used by system designers to construct optimal key-trees in practice. In the second part of Chapter 2, I propose a novel group based authentication scheme similar to the key-tree based method. This scheme is also based on symmetric-key cryptography, and therefore it is well-suited to resource constrained applications in large scale environments. I analyze the proposed scheme and show that it is superior to the previous key-tree based approach for private authentication both in terms of privacy and efficiency.

In Chapter 3, I analyze the privacy consequences of inter-vehicle communication. The promise of vehicular communications is to make road traffic safer and more efficient. However, besides the expected benefits, vehicular communications also introduce a privacy risk by making it easier to track the physical location of vehicles. One approach to solve this problem is for the vehicles to use pseudonyms that they change with some frequency. In this chapter, I study the effectiveness of this approach. I define a model based on the concept of the mix zone, characterize the tracking strategy of the adversary in this model, and introduce a metric to quantify the level of privacy enjoyed by the vehicles. I also report on the results of an extensive simulation in which I used my model to determine the level of privacy achieved in realistic scenarios. In particular, in my simulation I used a rather complex road map, generated traffic with realistic parameters, and varied the strength of the adversary by varying the number of her monitoring points. My simulation results provide information about the relationship between the strength of the adversary and the level of privacy achieved by changing pseudonyms. The first half of Chapter 3 shows that untraceability of vehicles is an important requirement in future vehicle communication systems. Unfortunately, heartbeat messages used by many safety applications provide a constant stream of location data, and without any protection measures, they make tracking of vehicles easy even for a passive eavesdropper. Considering a global attacker, changing pseudonyms is effective only if some silent period is kept during the pseudonym change and several vehicles change their pseudonyms nearly at the same time and at the same location. Unlike other works that proposed explicit synchronization between a group of vehicles and/or required pseudonym change in a designated physical area (i.e., a static mix zone), I propose a much simpler approach that needs neither explicit cooperation between vehicles nor infrastructure support. My basic idea is that vehicles should not transmit heartbeat messages when their speed drops below a given threshold, and they should change pseudonym during each such silent period. This ensures that vehicles stopping at traffic lights or moving slowly in a traffic jam will all refrain from transmitting heartbeats and change their pseudonyms nearly at the same time and location. Thus, my scheme ensures both silent periods and synchronized pseudonym change in time and space, but it does so in an implicit way. I also argue that the risk of a fatal accident at slow speed is low, and therefore my scheme does not seriously impact safety of life. In addition, refraining from sending heartbeat messages when moving at low speed also relieves vehicles of the burden of verifying a potentially large number of digital signatures, and thus makes it possible to implement vehicle communications with less expensive equipment.

In Chapter 4, I propose protocols that increase the dependability of wireless sensor networks, which are potentially useful building blocks in cyber-physical systems. Wireless sensor networks can be used in many critical applications, such as military or critical infrastructure protection scenarios. In such a critical scenario, the dependability of the monitoring sensor network can be crucial. One interesting aspect of the dependability of a network is how the network can hide its nodes with specific roles from an eavesdropping or active attacker. In this problem field, I propose protocols which can hide some important nodes of the network. More specifically, I propose two privacy preserving aggregator node election protocols, a privacy preserving data aggregation protocol, and a corresponding privacy preserving query protocol for sensor networks that allow for secure in-network data aggregation by making it difficult for an adversary to identify and then physically disable the designated aggregator nodes. The basic protocol can withstand a passive attacker, while my advanced protocols resist strong adversaries that can physically compromise some nodes. The privacy preserving aggregator election protocol allows electing aggregator nodes within the network without leaking any information about the identity of the elected node. The privacy preserving aggregation protocol helps the elected aggregator nodes collect data without leaking who is actually collecting the data. The privacy preserving query protocol enables an operator to collect the aggregated data from the unknown and anonymous aggregators without leaking the identity of the aggregating nodes.


Kivonat

Wireless networks are part of everyday life. We use such networks, for example, for making phone calls, for accessing services available on the Internet, or in contactless card based access control systems. In the near future, the fields of application will expand significantly: among others, vehicles will communicate with each other in this way, and wireless networks will also play a role in the protection of critical infrastructures. The widespread use of wireless networks raises new research problems, such as mobility, the handling of unreliable links, the problems and challenges arising from scarce resources, and questions of data protection and data security. In this dissertation, I investigate the privacy questions of different wireless networks.

In the first chapter of the dissertation, I investigate private authentication methods for radio frequency identification problems. A typical field of application is RFID systems, where potentially a huge number of users authenticate themselves to a reader with the help of cheap RFID tags. The two authentication methods are key-tree based and group based identification. The first key-tree based private authentication protocol was proposed by Molnar and Wagner. This method is an efficient symmetric key based private authentication protocol, and it works very well as long as no user's secret keys are compromised. When that happens, not only does the compromised user enjoy less anonymity, but the anonymity of all the other users is harmed as well. In Chapter 2 of the dissertation, I analyze how the careful choice of the tree parameters can minimize the lost anonymity. First, I define a metric that measures the effect of one user of the system being compromised. This metric builds on the well-known notion of the anonymity set. Then I show how the parameters of the key-tree have to be chosen so that the loss caused by the compromise, measured by the metric defined above, is minimal, subject to certain external constraints. In the general case, where not only one but several users may be compromised, I give a lower bound on the level of anonymity provided by the system. I show by simulations that this lower bound is typically an accurate estimate. The results of the chapter can be used directly in system design, when the key-tree best suited to the task has to be found. In the second part of Chapter 2, I propose a new group based private authentication method. This method is also based on symmetric keys, so it can be applied well even on resource constrained devices. In the chapter, I analyze the proposed solution and show that in certain typical cases it performs better than the key-tree based method introduced at the beginning of the chapter.

In Chapter 3, I analyze the privacy consequences of inter-vehicle communication. Vehicle-to-vehicle communication, to be realized in the near future, enables safer and more efficient transportation, but at the same time it also makes it easier to track vehicles, which may significantly violate the privacy of drivers. One possible solution to the problem is if the vehicles do not use permanent identifiers in their communication but pseudonyms, which they can change frequently. In this chapter, I analyze the effectiveness of this solution. First, I construct a mix zone based model. In this model, I define the tracking strategy of the attacker, and I define the metric that measures how trackable the individual vehicles are. Then I examine the model in a detailed simulation, in which vehicles move realistically on a complex map, and I examine the effect of the traffic and of the strength of the attacker on trackability. As the first part of Chapter 3 shows, the trackability of vehicles is an important aspect of inter-vehicle communication. Unfortunately, as we have seen, the continuously transmitted position reports make vehicles easy to track. A common solution to the problem is for the vehicles to keep changing their identifiers. This change, of course, can only be effective if at least a short period elapses between the use of the two different identifiers, during which the vehicle does not transmit at all, and several nearby vehicles change identifiers at the same time. While most solutions require complicated synchronization or allow the change only at statically designated places, my solution is much simpler. In this solution, there is no need for explicit cooperation or external infrastructure: the vehicles simply stop transmitting below a certain speed, and when they cross this threshold speed again, they resume transmitting, but already with the new identifier. As a result, vehicles waiting at a traffic light or crawling in a traffic jam stay silent and change identifiers at the same time. Thus, this method simply guarantees the necessary silent period and realizes a change synchronized in space and time without explicit synchronization. The method is advantageous on the one hand because at low speed the chance of a severe accident is small, so the vehicle stops transmitting exactly when there is no real need for it, and on the other hand because vehicles crawling close to each other would generate a very large amount of data to process, which can also be avoided this way.

In Chapter 4 of the dissertation, I propose protocols that can increase the dependability of wireless sensor networks. Wireless sensor networks can also be used for critical tasks, such as military applications or critical infrastructure protection. In such critical tasks, protecting and hiding the nodes with special roles from attackers can be very important. Within this problem area, I propose protocols that can hide the identity of the key devices. More precisely, I propose two private aggregator election protocols, a private aggregation protocol, and a private query protocol, with which attackers cannot identify the aggregator devices in a sensor network. Of the two solutions, the simpler protocol provides protection against passive eavesdropping, while the more complex protocol also provides protection against active attacks.


Acknowledgement

First of all, I would like to express my gratitude to my supervisor, Professor Levente Buttyán, Ph.D., Department of Telecommunications, Budapest University of Technology and Economics. He gave me guidance in selecting problems to work on, helped in elaborating the problems, and pushed me to publish the results. All three steps were needed to finish this thesis. I am also grateful to the current and former members of the CrySyS Laboratory: Boldizsár Bencsáth, László Czap, László Csík, László Dóra, Amit Dvir, Gergely Kótyuk, Áron Laszka, Gábor Pék, Péter Schaffer, Vinh Thong Ta, and István Vajda for the illuminating discussions on the different technical problems that I encountered during my research. They also provided a pleasant atmosphere which was a pleasure to work in. I would also like to thank Petra Ardelean, Naim Asaj, Gildas Avoine, Danny De Cock, Stefano Cosenza, Amit Dvir, László Dóra, Julien Freudiger, Albert Held, Jean-Pierre Hubaux, Frank Kargl, Antonio Kung, Zhendong Ma, Michael Müter, Panagiotis Papadimitratos, Maxim Raya, Péter Schaffer, Elmar Schoch, István Vajda, Andre Weimerskirch, William Whyte, and Björn Wiedersheim for our joint efforts and publications. The financial support of the Mobile Innovation Centre (MIK) and the support of the SEVECOM (FP6-027795) and WSAN4CIP (FP7-225186) EU projects are gratefully acknowledged. And last but not least, my thanks go to my wife Nóra, who accepted me as being a PhD student. I know sometimes it was not easy.


Contents

1 Introduction
   1.1 Introduction to RFID systems
   1.2 Introduction to Vehicular Ad Hoc Networks
   1.3 Introduction to Wireless Sensor Networks

2 Private Authentication
   2.1 Introduction to private authentication
   2.2 Resistance to single member compromise
   2.3 Optimal trees in case of single member compromise
   2.4 Analysis of the general case
   2.5 The group-based approach
   2.6 Analysis of the group based approach
   2.7 Comparison of the group and the key-tree based approach
   2.8 Related work
   2.9 Conclusion
   2.10 Related publications

3 Location Privacy in VANETs
   3.1 Introduction
   3.2 Model of local attacker and mix zone
       3.2.1 The concept of the mix zone
       3.2.2 The model of the mix zone
       3.2.3 The operation of the adversary
       3.2.4 Analysis of the adversary
       3.2.5 The level of privacy provided by the mix zone
   3.3 Simulation of mix zone
       3.3.1 Simulation settings
       3.3.2 Simulation results
   3.4 Global attacker
   3.5 Framework for location privacy in VANETs
   3.6 Attacker Model and the SLOW algorithm
   3.7 Analysis of SLOW
       3.7.1 Privacy
       3.7.2 Effects on safety
       3.7.3 Effects on computation complexity
   3.8 Related work
   3.9 Conclusion
   3.10 Related publications

4 Anonymous Aggregator Election and Data Aggregation in WSNs
   4.1 Introduction
   4.2 System and attacker models
   4.3 Basic protocol
       4.3.1 Protocol description
       4.3.2 Protocol analysis
       4.3.3 Data forwarding and querying
   4.4 Advanced protocol
       4.4.1 Initialization
       4.4.2 Data aggregator election
       4.4.3 Data aggregation
       4.4.4 Query
       4.4.5 Misbehaving nodes
   4.5 Related work
   4.6 Conclusion
   4.7 Related publications

5 Application of new results

6 Conclusion

List of Figures

2.1 Illustration of a key-tree
2.2 Illustration of single member compromise
2.3 Illustration of several members compromise
2.4 Simulation results for branching factor vectors
2.5 System comparison based on approximation
2.6 Operation of the group-based private authentication scheme
2.7 Tree and group based authentication
2.8 Simulation results
3.1 Mix and observed zone
3.2 Simplified map of Budapest generated for the simulation
3.3 Success probabilities of the adversary
3.4 Results of the simulation
3.5 Success rate of a tracking attacker
3.6 Example intersection
3.7 Success rate of the simple attacker
3.8 Success rate of the simple attacker
3.9 Number of signatures to be verified
4.1 Result of aggregator election protocol
4.2 Probability of being cluster aggregator
4.3 Probability of being cluster aggregator
4.4 Result of balancing
4.5 Entropy of the attacker
4.6 Connected dominating set
4.7 Aggregation example
4.8 Query example
4.9 Graphical representation of the suitable intervals
4.10 Misbehavior detection algorithm for the query protocol

List of Tables

2.1 Illustration of the operation of the recursive function f
3.1 Notation in SLOW
4.1 Estimated time of the building blocks on a Crossbow MICAz mote
4.2 Optimal values
4.3 Summary of complexity of the advanced protocol

List of Algorithms

1 Optimal branching factor generating algorithm
2 Basic private cluster aggregator election algorithm

Chapter 1

Introduction
In this dissertation, privacy enhancing protocols for wireless networks are proposed. In this chapter, a brief overview is given of the wireless networks to which the work presented in this dissertation is related, namely Radio Frequency Identification (RFID) systems, Vehicular Ad Hoc Networks (VANETs), and Wireless Sensor Networks (WSNs). The privacy consequences of the usage of such networks and some related problems are sketched. The main reason for choosing these networks is that they are, or will potentially be, used by billions of users, so solving a problem related to these networks can have an effect on the privacy of an extremely large number of users.

Wireless technology is a truly revolutionary paradigm shift, enabling multimedia communications between people and devices from any location. It also enables exciting applications such as sensor networks, smart homes, telemedicine, and automated highways. Comprehensive introductions to wireless networks can be found in [Goldsmith, 2005; Rappaport, 2001]. The security and privacy problems of wireless networks form a well studied field; however, there are many open questions worth working on. Overviews of security and privacy in wireless networks can be found in [Buttyán and Hubaux, 2008; Juels, 2006; Raya and Hubaux, 2007; Akyildiz et al., 2002].

A wireless network consists of nodes that can communicate through wireless channels, such as Infra Red (IR) or Radio Frequency (RF) channels. From the security point of view, the main difference between wireless and traditional wired networks is that a passive attacker can easily eavesdrop on the wireless channel without detection, while this is harder in wired networks. Harder here means that such attacks require physical access to the network (cables or network elements), and the lack of physical protection in wireless networks makes these attacks easier to carry out. An active attacker can inject, modify, and delete messages in the air with some knowledge of the network and wireless technologies, while again this is harder in a traditional wired network.

In information technology, privacy is defined as the right of an entity to choose which information is revealed about the entity, what information is collected and stored, and how that information is used, shared, or published, and also the right to keep control over that information (e.g., the right to delete data from a database if the user wishes to do so). Privacy has two facets: data control and data protection. One way to keep control is to keep data secret, e.g., to remain anonymous. According to [Pfitzmann and Köhntopp, 2001], anonymity is the state of being not identifiable within a set of subjects, the anonymity set. In the remaining part of the dissertation, I will use privacy with this information centric meaning; decisional privacy [1] and intentional privacy [2] will not be discussed.
[1] This conception of privacy addresses issues related to an individual's authority to make decisions that affect the individual's life and body, and that of the individual's family members, such as end-of-life issues. [ITLaw]
[2] This conception of privacy addresses issues related to intimate activities or characteristics that are publicly visible. [ITLaw]

In the remainder of this chapter, the three wireless networks I worked with in this dissertation are introduced.

1.1 Introduction to RFID systems

The following description of RFID systems and their security and privacy problems is based on [Juels, 2006; Langheinrich, 2009; Peris-Lopez et al., 2006]. The interested reader can get a broader view and deeper understanding of RFID systems by reading the cited papers instead of relying only on this short introduction.

RFID (Radio Frequency Identification) is a technology for automated identification of objects or people. An RFID system consists of simple tags, readers, and backend servers. The tags carry unique identifiers, which are read by nearby readers over radio communication. The readers send the obtained identifiers to the backend servers. The goal of an RFID system is the unique identification of the holders of the tags. Example applications of RFID systems include smart appliances, shopping, interactive objects, and medication compliance. This list can be expanded to hundreds of scenarios [Wu et al., 2009; RFID, 2012].

The main threats to privacy in RFID systems are tracking and inventorying. A tracking attacker can eavesdrop on message exchanges in different parts of the network. If the system is not defended against such attacks, the attacker can link different message exchanges of the same user, and hence can track the user. This is a very important concern in RFID systems, which is why this problem is discussed in Chapter 2 (the problem of tracking is actually not unique to RFID systems, and I will study it in a different context, namely vehicular networks, in Chapter 3). Inventorying is a specific attack against RFID systems. It relies on the assumption that in the near future, most of our objects will be tagged with remotely readable RFID tags. An attacker carrying out an inventorying attack can learn exactly what a user wears or has in her pockets or bag, without the consent of the user. In Chapter 2, two private authentication methods are given, which make it difficult for an attacker to carry out tracking and inventorying attacks.

Another important field of security problems regarding RFID is the authenticity of the tags. In short, the privacy problem is related to malicious readers, while the authenticity problem is related to malicious tags. The main problem is that illegitimate tags can be counterfeited to obtain the same rights as the legitimate tag holds. In the following, I will assume the presence of malicious readers, but no malicious tags are considered. Regarding the capabilities of RFID tags, the tags on the market can be classified into two main categories: basic tags with no real cryptographic capabilities, and advanced tags with some symmetric key cryptography capabilities.

Basic tags
Basic RFID tags lack the resources to perform true cryptographic operations. The lack of cryptography in basic RFID tags is a big impediment to security design; cryptography, after all, is the main building block of data security. The main approaches to provide privacy to basic tags are the following: killing, sleeping, renaming, proxying, distance measurement, blocking, and legislation.

Killing and sleeping are very similar approaches. The basic idea is that an authenticated command can reversibly or permanently switch off the tag. Another approach is to divide the identifier space into two separate parts by a modifiable privacy bit [Juels et al., 2003; Juels and Brainard, 2004]. The two parts are the private and the public part. A blocker device can make the scanning of private tags infeasible, and the tags can be moved between the public and private zone on demand. Another device based solution is proxying, where the holder of the tag can use some equipment (like a mobile phone) to enforce privacy [Floerkemeier et al., 2005; Juels et al., 2006; Rieback et al., 2005].

The tracking problem stems from the fact that tags use static identifiers. Some proposals suggest that readers rename the tags [Instruments, 2005], or the tag itself can rotate pseudonyms [Juels, 2005a] to make tracking harder. In distance measurement, the tags can roughly measure their distance to the reader by measuring the signal-to-noise ratio of the channel [Fishkin et al., 2005]. This can be used to avoid distant aggressive scanning. A non-technical approach is legislation: there are some efforts to regulate the usage of RFID tags from the privacy point of view [Kelly and Erickson, 2005], but these efforts are far from completion. Ultimately, this approach may be more effective and cost efficient than any other (e.g., from an economic aspect, tracking is not worthwhile if the tracker can go to jail for doing so). The authentication of basic tags is as hard as providing privacy for them. There is some work [Juels, 2005b] on how the kill PIN can be used to authenticate the tags.

Advanced tags
Advanced tags are capable of simple symmetric key operations. However, weak cryptographic algorithms are targets of successful attacks [Bono et al., 2005]. Another attack type against cryptographically enabled tags is the man-in-the-middle (MiM) attack. In a MiM attack, the attacker relays messages between the tag and the reader, and by doing so, he can modify, delete, and inject messages in their communication. This can also be done if the tag and the reader are not in each other's vicinity [Hancke, 2005; Kfir and Wool, 2005]. The privacy of advanced tags is deeply analyzed in Chapter 2. In short, the problem is that the tag is not allowed to send its identifier in order to avoid tracking, and therefore the reader needs a lot of trials to find the right decryption key. The computational burden on the reader can be partly alleviated with key-trees [Molnar and Wagner, 2004], synchronization [Ohkubo et al., 2004], or time-memory trade-offs [Avoine et al., 2005; Avoine and Oechslin, 2005]. However, all known mitigation techniques lead to degradation of privacy or efficiency. The degradation of privacy is analyzed in Chapter 2, where efficient solutions are also proposed.
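To make the reader-side cost of this naive symmetric-key approach concrete, the following sketch (my own illustration, not taken from the cited works; the class and helper names are hypothetical, and HMAC-SHA256 stands in for whatever primitive a real tag would use) shows a reader that stores one key per tag and must try the keys one by one until a response verifies, i.e., O(N) cryptographic operations per authentication in the worst case.

```python
import hmac, hashlib, os

def mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

class NaiveTag:
    """The tag never sends its identifier, only a keyed response."""
    def __init__(self, key: bytes):
        self.key = key
    def respond(self, challenge: bytes) -> bytes:
        return mac(self.key, challenge)

class NaiveReader:
    """The reader stores one key per tag and searches linearly."""
    def __init__(self, keys):                  # keys: {tag_id: key}
        self.keys = keys
    def identify(self, challenge, response):
        for tag_id, key in self.keys.items():  # O(N) MACs in the worst case
            if hmac.compare_digest(mac(key, challenge), response):
                return tag_id
        return None

# Toy run with 1000 tags: the reader recovers the tag identity
# without the tag ever transmitting it in the clear.
keys = {i: os.urandom(16) for i in range(1000)}
reader = NaiveReader(keys)
tag = NaiveTag(keys[42])
challenge = os.urandom(16)
assert reader.identify(challenge, tag.respond(challenge)) == 42
```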

1.2 Introduction to Vehicular Ad Hoc Networks

The following description of Vehicular Ad Hoc Networks and their security and privacy properties is based on [Raya and Hubaux, 2005; Raya and Hubaux, 2007; Lin et al., 2008; Blum et al., 2004b; Dötzer, 2006]. The interested reader can get a broader view and deeper understanding of VANETs by reading the cited papers instead of relying only on this short introduction.

The main motivations for using VANETs are to enhance traffic safety and traffic efficiency, to give assistance to drivers, and to enable infotainment applications. A VANET consists of vehicles equipped with On Board Units (OBUs) and wireless communication equipment, Road Side Units (RSUs), and backend infrastructure. The vehicles regularly exchange messages with each other and with the infrastructure using wireless communication to achieve the main goals, such as safer roads.

The main vulnerabilities in VANETs come from the wireless nature of the communication and from the sensitive information, such as the location of users, used by the network. One major vulnerability comes from the wireless nature of the system: the communication can be jammed easily and messages can be forged. Another problem related to the wireless communication is that nodes can modify the messages they relay; this is called in-transit traffic tampering. Yet another problem is that vehicles can impersonate other vehicles with higher privileges, such as emergency vehicles, to gain extra privileges. The problem most relevant to this dissertation is that the privacy of the drivers of the vehicles can be violated. This vulnerability is analyzed in Chapter 3. In general, an attacker can achieve her goals by tampering with an OBU, an RSU, sensor readings, or the wireless channel.

Traditional mechanisms cannot deal with the vulnerabilities discussed above because of the new challenges in VANETs. One such challenge is the high network volatility caused by the highly mobile, very large scale network.

Another challenge is that the network must offer liability and privacy at the same time in an efficient way, as the applications are delay sensitive. To make things even worse, the network is very heterogeneous: different vehicles can have different equipment and abilities, so no single solution can solve every problem.

When defining the key vulnerabilities and challenges of vehicular ad hoc networks, it is crucial to first define and characterize the possible attackers. In many papers [Raya and Hubaux, 2007; Hu et al., 2005] the attacker is characterized along the following dimensions:

- Insider vs. Outsider: The key difference between an insider and an outsider attacker is that an insider possesses legitimate and valid cryptographic credentials, while an outsider does not have any valid credentials. Obviously, an insider attacker can mount stronger attacks than an outsider.
- Malicious vs. Rational: The main goal of a malicious attacker is to disrupt the normal operation of the network without any further goal, while a rational attacker wants to make some profit with his attack. In general, it is easier to handle a rational attacker, because his steps can be foreseen more easily.
- Active vs. Passive: A passive attacker only eavesdrops on the messages of the vehicles, while an active attacker can send, modify, or delete messages.
- Local vs. Global: A local attacker mounts his attack in a small area (or in some non-contiguous small areas), while a global attacker has influence over broader areas.

In the following, some basic and more sophisticated attacks are presented to give the reader an idea of the threats in vehicular ad hoc networks. An insider attacker can diffuse bogus information to affect the behavior of other drivers; the source of the information can be a cheated sensor reading or modified location data. In wireless networking, the wormhole attack [Hu et al., 2006] consists of tunneling packets between two remote nodes. Similarly, in VANETs, an attacker that controls at least two entities remote from each other and a high speed communication link between them can tunnel packets broadcast in one location to another, thus disseminating erroneous (but correctly signed) messages in the destination area.

According to [Kroh et al., 2006], the following security concepts must be used in a vehicular ad hoc network to handle most of the possible attacks: identification and authentication concepts, privacy concepts, integrity concepts, and access control and authorization concepts. These concepts are introduced in Section 3.8, with special attention to providing privacy to the users of the system. In Chapter 3, the privacy of VANETs is analyzed, especially the privacy provided by pseudonyms, considering outsider, rational, passive, local attackers. A pseudonym change algorithm is also provided, considering an outsider, rational, passive, global attacker.

1.3 Introduction to Wireless Sensor Networks

The following description of Wireless Sensor Networks (WSNs) and the related security problems is based on [Akyildiz et al., 2002; Chan and Perrig, 2003; Li et al., 2009; Lopez, 2008; Perrig et al., 2004; Sharma et al., 2012; Yick et al., 2008]. The interested reader can get a broader view and deeper understanding of WSNs by reading the cited papers instead of relying only on this short introduction.

A sensor network is composed of a large number of sensor nodes, which are typically densely deployed. A sensor node consists of sensor circuits that can measure some environmental variable, a central processing unit which is typically a microcontroller, and a radio circuit which enables communication with other nearby nodes. A wireless sensor network can serve one of many applications: military applications (e.g., battlefield surveillance), environmental applications (e.g., forest fire detection), critical infrastructure protection (e.g., surveillance of water pipes), health applications (e.g., drug administration in hospitals), and home applications (e.g., smart environments).

Some important security challenges in WSNs are secure routing, secure key management, efficient (broadcast) authentication, secure localization, and secure data aggregation. A good introduction to these problems and some countermeasures can be found in [Lopez, 2008]. The privacy related challenges can be categorized into two main groups [Li et al., 2009]: data-oriented and context-oriented challenges. In data-oriented protection, the confidentiality of the measured data must be preserved. Context-oriented protection covers the location privacy of the source and of some significant nodes, such as the base station or the aggregator nodes:
- Data-oriented privacy protection: Data-oriented privacy protection focuses on protecting the privacy of data content. Here data refer not only to sensed data collected within a WSN but also to queries posed to a WSN by users.

  - Privacy protection during data aggregation: Data aggregation is designed to substantially reduce the volume of traffic transmitted in a WSN by fusing or compressing data in intermediate sensor nodes (called aggregators). It is an important technique for preserving resources (e.g., energy) in a WSN. Interestingly, it is also a common and effective method to protect private data against an external adversary, because the process compresses large inputs into small outputs at the intermediate sensor nodes. On the other hand, a malicious aggregator can modify the measurements of many nodes in one step, or can learn the individual measurements of individual nodes. Some countermeasures are proposed in [He et al., 2007; Zhang et al., 2008]:
    - Cluster-based private data aggregation (CPDA): The basic idea of CPDA [He et al., 2007] is to add noise to the raw data sensed in a WSN, such that an aggregator can obtain accurate aggregated information but not the individual data points.
    - Slice-mixed aggregation (SMART): The main idea of SMART [He et al., 2007] is to slice original data into pieces and recombine them randomly. This is done in three phases: slicing, mixing, and aggregation (a toy sketch of the slicing idea is given after this list).
    - Generic privacy-preservation solutions for approximate aggregation (GP2S): The basic idea of GP2S [Zhang et al., 2008] is to generalize the values of data transmitted in a WSN, such that although individual data content cannot be decrypted, the aggregator can still obtain an accurate estimate of the histogram of the data distribution, and thereby approximate the aggregates.
  - Private data query: The query issued to a WSN (to retrieve the collected data) is often also of critical privacy concern. To address this challenge, a target-region transformation technique was proposed in [Carbunar et al., 2007] to fuzz the target region of the query according to predefined transformation functions.
- Context-oriented privacy protection: Context-oriented privacy protection focuses on protecting contextual information, such as the location and timing information of traffic transmitted in a WSN. Location privacy concerns may arise for special sensor nodes such as the data sources and the base station. Timing privacy, on the other hand, concerns the time when sensitive data is created at a data source, collected by a sensor node, and transmitted to the base station.

  - Location privacy: A major challenge for context-oriented privacy protection is that an adversary may be able to compromise private information even without the ability to decrypt the transmitted data. In particular, since hop-by-hop transmission is required to cope with the limited transmission range of sensor nodes, an adversary may derive the locations of important nodes and data sources by observing and analyzing the traffic patterns between different hops.

    - Location privacy of the data source: In event driven networks, an event is generated if something interesting happens in the vicinity of a node. In some networks, the only data sent to the base station is the occurrence of the event, so the presence of communication itself reveals the location of the event. In some situations, this must be hidden from an attacker. Some approaches are described in the following:
      - Baseline and probabilistic flooding mechanisms: The basic idea of baseline flooding is that each sensor broadcasts the data it receives from one neighbor to all of its other neighbors. The premise of this approach is that all sensors participate in the data transmission, so it is unlikely that an attacker can track a transmission path back to the data source [Kamat et al., 2005]. This can be further optimized if not every node rebroadcasts the message, only a probabilistically chosen set of them.
      - Random walk mechanisms: According to [Kamat et al., 2005], a random walk can be performed before the probabilistic flooding to further increase the uncertainty of the attacker. To improve on the simple random walk, a two-way greedy random walk (GROW) scheme was proposed in [Xi et al., 2006].
      - Dummy data mechanism: To further protect the location of the data source, fake data packets can be introduced to perturb the traffic patterns observed by the adversary. In particular, a simple scheme called Short-lived Fake Source Routing was proposed in [Kamat et al., 2005], in which each sensor sends out a fake packet with a pre-determined probability.
      - Fake data sources mechanism: The basic idea of fake data sources is to choose one or more sensor nodes to simulate the behavior of a real data source in order to confuse the adversary [Mehta et al., 2007].
    - Location privacy of the base station: In a WSN, the base station is not only in charge of collecting and analyzing data, but is also used as the gateway connecting the WSN with an outside wireless or wired network. Consequently, destroying or isolating the base station may lead to the malfunction of the entire network. This can be prevented if the location of the base station is unknown to the adversary.
      - Defense against local adversaries: The location information or identifier of the base station is sent in the clear in many protocols. This information must be hidden from an eavesdropper, which can be done by traditional cryptographic techniques (encryption). Another problem is that the attacker may follow the path of packets from the source towards the base station. This can be mitigated by changing the data appearance by re-encryption [Deng et al., 2006a; Dingledine et al., 2004], routing with multiple parents [Deng et al., 2005; Deng et al., 2006a], routing with random walks [Jian et al., 2007], or decorrelating the parent-child relationship by randomly selecting the sending time [Deng et al., 2006a].
      - Defense against global adversaries: The techniques discussed above are inefficient against a global attacker. To fight a global attacker, the traffic patterns of the whole network must be modified. This can be done by hiding the traffic pattern by controlling the transmission rate [Deng et al., 2006a], or by propagating dummy data [Deng et al., 2005; Deng et al., 2006a].
  - Temporal privacy problem: When an adversary eavesdrops on a message, it can deduce the sending time of the message from the time of eavesdropping and the TTL value. In some applications this information must be hidden, which can be done by randomly delaying the messages at the relaying nodes [Kamat et al., 2007].
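As a toy illustration of the slicing idea behind SMART, mentioned in the list above (my own simplification, not the protocol of [He et al., 2007]: encryption, hop limits, and the real message exchange are omitted, and the node names are made up), every node splits its reading into slices that sum to the reading, hands most slices to random peers, and reports only the sum of the slices it ends up holding; the aggregate of the reports still equals the true sum, while no single report reveals an individual reading.

```python
import random

def smart_round(readings, num_slices=3):
    """Slice-and-mix aggregation, greatly simplified: returns the sum that
    an aggregator would compute from the per-node reports."""
    nodes = list(readings)
    held = {n: 0.0 for n in nodes}                  # slices held by each node
    for n in nodes:
        # Split the private reading into num_slices values that sum to it.
        slices = [random.uniform(-1.0, 1.0) for _ in range(num_slices - 1)]
        slices.append(readings[n] - sum(slices))
        held[n] += slices[0]                        # keep one slice locally
        for s in slices[1:]:                        # in SMART these go out encrypted
            peer = random.choice([m for m in nodes if m != n])
            held[peer] += s
    reports = held                                  # what each node reveals
    return sum(reports.values())

readings = {"s1": 21.5, "s2": 19.0, "s3": 22.3, "s4": 20.1}
assert abs(smart_round(readings) - sum(readings.values())) < 1e-9
```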
As can be seen from the discussion above, a considerable amount of work has been done in the field of privacy in wireless sensor networks. However, the particular problem of the location privacy of aggregator nodes has received less attention. Therefore, in Chapter 4, I study this problem and propose two anonymous aggregator election protocols, which can hide the identity of the aggregator nodes.


The remainder of the dissertation is organized as follows. In Chapter 2, I propose two private authentication schemes for resource limited systems, such as RFID systems. The results presented in Chapter 2 have been published in [Buttyan et al., 2006a; Buttyan et al., 2006b; Avoine et al., 2007]. In Chapter 3, I analyze the privacy achieved by pseudonym changing techniques in vehicular ad hoc networks, and propose a pseudonym changing algorithm for VANETs. All results of Chapter 3 have been published in [Buttyan et al., 2007; Papadimitratos et al., 2008; Holczer et al., 2009; Buttyan et al., 2009]. In Chapter 4, I analyze how an aggregator node can be elected and used in wireless sensor networks without revealing its identity. All results of Chapter 4 have been published in [Buttyán and Holczer, 2009; Buttyán and Holczer, 2010; Holczer and Buttyán, 2011; Schaffer et al., 2012]. The possible applications of the new results can be found in Chapter 5, while Chapter 6 concludes the dissertation.

Chapter 2

Private Authentication in Resource Constrained Environments


2.1 Introduction to private authentication

Entity authentication is the process whereby a party (the prover) corroborates its identity to another party (the verifier). Entity authentication is often based on authentication protocols in which the parties pass messages to each other. These protocols are engineered in such a way that they resist various types of impersonation and replay attacks [Boyd and Mathuria, 2003]. However, less attention is paid to the requirement of preserving the privacy of the parties (typically that of the prover) with respect to an eavesdropping third party. Indeed, in many of the well-known and widely used authentication protocols (e.g., [ISO, 2008; Kohl and Neuman, 1993]) the identity of the prover is sent in cleartext, and hence it is revealed to an eavesdropper.

One approach to solve this problem is based on public key cryptography: it consists of encrypting the identity information of the prover with the public key of the verifier, so that no one but the verifier can learn the prover's identity [Abadi and Fournet, 2004]. Another approach, also based on public key techniques, is that the parties first run an anonymous Diffie-Hellman key exchange and establish a confidential channel, through which the prover can send its identity and authentication information to the verifier in a second step. An example of this second approach is the main mode of the Internet Key Exchange (IKE and IKEv2) protocol [Harkins and Carrel, 1998; Black and McGrew, 2008].

While it is possible to hide the identity of the prover by using the above mentioned approaches, they provide an appropriate solution to the problem only if the parties can afford public key cryptography. In many applications, such as low cost RFID tags and contactless smart card based automated fare collection systems in mass transportation, this is not the case, while at the same time the provision of privacy (especially location privacy) in those systems is strongly desirable. The problem with using symmetric key encryption to hide the identity of the prover is that the verifier does not know which symmetric key it should use to decrypt the encrypted identity, because the appropriate key cannot be retrieved without the identity. The verifier may try all possible keys in its key database until one of them properly decrypts the encrypted identity [1], but this would increase the authentication delay if the number of potential provers is large. Long authentication delays are usually not desirable; moreover, in some cases, they may not even be acceptable. As an example, let us consider again contactless smart card based electronic tickets in public transportation: the number of smart cards in the system (i.e., the number of potential provers) may be very large in big cities, while the time needed to authenticate a card should be short in order to ensure a high throughput of passengers and avoid long queues at entry points.
[1] This of course requires redundancy in the encrypted message so that the verifier can determine whether the decryption was successful.

Some years ago, Molnar and Wagner proposed an elegant approach to privacy protecting authentication [Molnar and Wagner, 2004] that is based on symmetric key cryptography while still ensuring short authentication delays. More precisely, the complexity of the authentication procedure in the Molnar-Wagner scheme is logarithmic in the number of potential provers, in contrast with the linear complexity of the naive key search approach.

The main idea of Molnar and Wagner is to use key-trees (see Figure 2.1 for illustration). A key-tree is a tree where a unique key is assigned to each edge. The leaves of the tree represent the potential provers, which are called members in the sequel. Each member possesses the keys assigned to the edges of the path starting from the root and ending in the leaf that corresponds to the given member. The verifier knows all keys in the tree. In order to authenticate itself, a member uses all of its keys, one after the other, starting from the first level of the tree and proceeding towards the lower levels. The verifier first determines which first level key has been used. For this, it needs to search through the first level keys only. Once the first key is identified, the verifier continues by determining which second level key has been used; however, for this, it needs to search through only those second level keys that reside below the already identified first level key in the tree. This process is continued until all keys are identified, which at the end identify the authenticating member. The key point is that the verifier can reduce the search space considerably each time a key is identified, because it only has to consider the subtree below the most recently identified key.
Figure 2.1: Illustration of a key-tree. There is a unique key assigned to each edge. Each leaf represents a member of the system that possesses the keys assigned to the edges of the path starting from the root and ending in the given leaf. For instance, the member that belongs to the leftmost leaf in the figure possesses the keys k1, k11, and k111.
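A minimal sketch of the level-by-level search described above may help; it is only an illustration of the idea (the challenge-response format, the use of HMAC, and all helper names are my own assumptions, not the Molnar-Wagner message formats).

```python
import hmac, hashlib, os
from itertools import product

def mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def build_tree(branching):
    """Assign a random key to every edge; keys are indexed by the path
    (tuple of child indices) leading to that edge."""
    keys = {}
    for level in range(1, len(branching) + 1):
        for path in product(*[range(b) for b in branching[:level]]):
            keys[path] = os.urandom(16)
    return keys

def member_keys(keys, leaf):
    # A member holds the keys on the path from the root to its leaf.
    return [keys[leaf[:i + 1]] for i in range(len(leaf))]

def respond(path_keys, challenge):
    # The member answers with one MAC per level, one key per level.
    return [mac(k, challenge) for k in path_keys]

def identify(keys, branching, challenge, responses):
    """At each level only the children of the already identified prefix are
    tried, so the search costs b1 + b2 + ... MACs instead of b1 * b2 * ..."""
    prefix = ()
    for b, resp in zip(branching, responses):
        for child in range(b):
            candidate = prefix + (child,)
            if hmac.compare_digest(mac(keys[candidate], challenge), resp):
                prefix = candidate
                break
        else:
            return None            # no key matched at this level
    return prefix                  # the identified leaf, i.e., the member

branching = (10, 10, 10)           # 1000 members, at most 30 MACs per search
keys = build_tree(branching)
leaf = (3, 1, 4)
challenge = os.urandom(16)
assert identify(keys, branching, challenge,
                respond(member_keys(keys, leaf), challenge)) == leaf
```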

The problem with the above described tree-based approach is that the upper level keys in the tree are used by many members, and therefore, if a member is compromised and its keys become known to the adversary, then the adversary gains partial knowledge of the keys of other members too [Avoine et al., 2005]. This obviously reduces the privacy provided by the system to its members, since by observing the authentication of an uncompromised member, the adversary can recognize the usage of some compromised keys, and therefore its uncertainty regarding the identity of the authenticating member is reduced (it may be able to determine which subtree the member belongs to).

One interesting observation is that the naive, linear key search approach can be viewed as a special case of the key-tree based approach, where the key-tree has a single level and each member has a single key. Regarding the above described problem of compromised members, the naive approach is in fact optimal, because compromising a member does not reveal any key information of other members. At the same time, as described above, the authentication delay is the worst in this case. On the other hand, in the case of a binary key-tree, the compromise of a single member strongly affects [2] the privacy of the other members, while at the same time, the binary tree is very advantageous in terms of authentication delay.
[2] The precise quantification of this effect is the topic of this chapter and will be presented later.


Thus, there seems to be a trade-off between the level of privacy provided by the system and the authentication delay, which depends on the parameters of the key-tree, but it is far from obvious how the optimal key-tree should look. In this chapter, I address this problem and show how to find optimal key-trees. After finding the optimal key-tree, I go further and present a novel symmetric-key private authentication scheme, called the group based approach, that provides a higher level of privacy and achieves better efficiency than the key-tree based approach. More precisely, the complexity of the group based scheme for the reader can be set to O(log N) (i.e., the same as in the key-tree based approach), while the complexity for the tags is always constant (in contrast to the O(log N) of the key-tree based approach). Hence, the group based scheme is better than the key-tree based scheme both in terms of privacy and efficiency, and therefore it is a serious alternative to the key-tree based scheme to be considered by the RFID community. The main contributions are the following:
• I propose a benchmark metric for measuring the resistance of the system to a single compromised member based on the concept of anonymity sets. To the best of my knowledge, anonymity sets have not been used in the context of private authentication yet. I prove that this simply defined metric is equivalent to a metric widely used in cryptography that has a much more complex definition. The real contribution of the metric is that its definition simplifies the usage of the metric without losing any details of the more complex metric.

• I introduce the idea of using different branching factors at different levels of the key-tree; the advantage is that the system's resistance to single member compromise can be increased while still keeping the authentication delay short. To the best of my knowledge, key-trees with variable branching factors have not been proposed yet for private authentication.

• I present an algorithm for determining the optimal parameters of the key-tree, where optimal means that the resistance to single member compromise is maximized, while the authentication delay is kept below a predefined threshold.

• In the general case, when any member can be compromised, I give a lower bound on the level of privacy provided by the system, and present some simulation results that show that this lower bound is quite sharp. This allows me to compare different systems based on their lower bounds.

• I introduce a group based approach, which is superior to the tree-based approach in many properties.

• In summary, I propose practically usable techniques for designers of RFID based authentication systems.

The outline of the chapter is the following: in Section 2.2, I introduce my benchmark metric to measure the level of privacy provided by key-tree or group based authentication systems, and I illustrate, through an example, how this metric can be used to compare systems with different parameters. By the same token, I also show that key-trees with variable branching factors can be better than key-trees with a constant branching factor at every level. In Section 2.3, I formulate the problem of finding the best key-tree with respect to my benchmark metric as an optimization problem, and I present an algorithm that solves that optimization problem. In Section 2.4, I consider the general case, when any number of members can be compromised, and I derive a useful lower bound on the level of privacy provided by the system. After finding the optimal key-tree, I describe the operation of my group based scheme in Section 2.5, and I quantify the level of privacy that it provides in Section 2.6. I compare the group based scheme to the key-tree based approach in Section 2.7. Finally, in Section 2.8, I report on some related work, and in Section 2.9, I conclude the chapter.

2.2 Resistance to single member compromise

There are different ways to measure the level of anonymity provided by a system [Diaz et al., 2002; Serjantov and Danezis, 2003]. Here the concept of anonymity sets [Chaum, 1988] is used. The


anonymity set of a member v is the set of members that are indistinguishable from v from the adversary's point of view. The size of the anonymity set is a good measure of the level of privacy provided for v, because it is related to the level of uncertainty of the adversary, if all members of the set are equally probable (otherwise an entropy based metric can be used). Clearly, the larger the anonymity set is, the higher the level of privacy is. The minimum size of the anonymity set is 1, and its maximum size is equal to the number of all members in the system. In order to make the privacy measure independent of the number of members, one can divide the anonymity set size by the total number of members, and obtain a normalized privacy measure between 0 and 1. Such normalization makes the comparison of different systems easier. Now, let us consider a key-tree with ℓ levels and branching factors b1, b2, ..., bℓ at the successive levels, and let us assume that exactly one member is compromised (see Figure 2.2 for illustration). Knowledge of the compromised keys allows the adversary to partition the members into subsets P0, P1, P2, ..., where
• P0 contains the compromised member only,
• P1 contains the members the parent of which is the same as that of the compromised member, and that are not in P0,
• P2 contains the members the grandparent of which is the same as that of the compromised member, and that are not in P0 ∪ P1,
• etc.

Members of a given subset are indistinguishable for the adversary, while it can distinguish between members that belong to different subsets. Hence, each subset is the anonymity set of its members.


Figure 2.2: Illustration of what happens when a single member is compromised. Without loss of generality, it is assumed that the member corresponding to the leftmost leaf in the figure is compromised. This means that the keys k1, k11, and k111 become known to the adversary. This knowledge of the adversary partitions the set of members into anonymity sets P0, P1, ... of different sizes. Members that belong to the same subset are indistinguishable to the adversary, while it can distinguish between members that belong to different subsets. For instance, the adversary can recognize a member in subset P1 by observing the usage of k1 and k11 but not that of k111, where each of these keys is known to the adversary. Members in P3 are recognized by not being able to observe the usage of any of the keys known to the adversary.

The level of privacy provided by the system can be characterized by the level of privacy provided to a randomly selected member, or in other words, by the expected size of the anonymity set of a randomly selected member. By definition, the expected anonymity set size is:

    \bar{S} = \sum_{i=0}^{\ell} \frac{|P_i|}{N} |P_i| = \frac{1}{N} \sum_{i=0}^{\ell} |P_i|^2        (2.1)


where N is the total number of members, and |Pi|/N is the probability of selecting a member from subset Pi. The resistance to single member compromise, denoted by R, is defined as the normalized expected anonymity set size, which can be computed as follows:

    R = \frac{\bar{S}}{N} = \frac{1}{N^2} \sum_{i=0}^{\ell} |P_i|^2
      = \frac{1}{N^2} \left( 1 + (b_\ell - 1)^2 + ((b_{\ell-1} - 1) b_\ell)^2 + \ldots + ((b_1 - 1) b_2 b_3 \cdots b_\ell)^2 \right)
      = \frac{1}{N^2} \left( 1 + (b_\ell - 1)^2 + \sum_{i=1}^{\ell-1} (b_i - 1)^2 \prod_{j=i+1}^{\ell} b_j^2 \right)        (2.2)

where it is used that

    |P_0| = 1
    |P_1| = b_\ell - 1
    |P_2| = (b_{\ell-1} - 1) b_\ell
    |P_3| = (b_{\ell-2} - 1) b_{\ell-1} b_\ell
    ...
    |P_\ell| = (b_1 - 1) b_2 b_3 \cdots b_\ell

As its name indicates, R characterizes the loss of privacy due to the compromise of a single member of the system. If R is close to 1, then the expected anonymity set size is close to the total number of members, and hence, the loss of privacy is small. On the other hand, if R is close to 0, then the loss of privacy is high, as the expected anonymity set size is small. R is used as a benchmark metric based on which different systems can be compared. This metric can be seen as a little ad hoc, but actually the same metric is used in other papers, such as [Avoine et al., 2005], with a different, more complex definition:

Theorem 1. The expected anonymity set size based metric (R) is the complement of the one-tag-tampering based metric (M) defined in [Avoine et al., 2005].

Proof. The metric M used in [Avoine et al., 2005] is defined in that paper as follows:

1. The attacker has one tag T0 (e.g., her own) she can tamper with and thus obtain its complete secret. For the sake of calculation simplicity, we assume that T0 is put back into circulation. When the number of tags in the system is large, this does not significantly affect the results.
2. She then chooses a target tag T. She can query it as much as she wants, but she cannot tamper with it.
3. Given two tags T1 and T2 such that T ∈ {T1, T2}, we say that the attacker succeeds if she definitely knows which of T1 and T2 is T. We define the probability to trace T as being the probability that the attacker succeeds. To do that, the attacker can query T1 and T2 as many times as she wants but, obviously, cannot tamper with them.

In the following, P1, ..., Pk are the subsets of the tags after the compromise of some tags (\sum_{i=1}^{k} |P_i| = N). In the third step, the attacker can be successful if (and only if) T1 and T2 belong to different subsets. The probability of the attacker's success is the probability that two randomly chosen tags belong to two different subsets. This probability can be calculated as follows:

    M = 1 - \Pr(T_1, T_2 \text{ are in } P_1) - \ldots - \Pr(T_1, T_2 \text{ are in } P_k) = 1 - \sum_{i=1}^{k} \left( \frac{|P_i|}{N} \right)^2

This is the complement of the metric R (M + R = 1).


Obviously, a system with greater R is better, and therefore, one would like to maximize R (and at the same time minimize M). However, there are some constraints. The maximum authentication delay, denoted by D, is defined as the number of basic operations needed to authenticate any member in the worst case. The maximum authentication delay in case of key-tree based authentication can be computed as D = \sum_{i=1}^{\ell} b_i. In most practical cases, there is an upper bound Dmax on the maximum authentication delay allowed in the system. For instance, in the specification for electronic ticketing systems for public transport applications in Hungary [Berki, 2008], it is required that a ticket validation transaction should be completed in 250 ms. Taking into account the details of the ticket validation protocol, one can derive Dmax for electronic tickets from such specifications. Therefore, in practice, the designer's task is to maximize R under the constraint that D ≤ Dmax. This problem is addressed in Section 2.3. In the remainder of this section, I illustrate how the benchmark metric R can be used to compare different systems. This exercise will also lead to an important revelation: key-trees with varying branching factors at different levels can provide a higher level of privacy than key-trees with a constant branching factor, while having the same or even a shorter authentication delay.

Example: Let us assume that the total number N of members is 27000 and the upper bound Dmax on the maximum authentication delay is 90. Let us consider a key-tree with a constant branching factor vector B = (30, 30, 30), and another key-tree with branching factor vector B′ = (60, 10, 9, 5). Both key-trees can serve the given population of members, since 30³ = 60·10·9·5 = 27000. In addition, both key-trees ensure that the maximum authentication delay is not longer than Dmax: for the first key-tree, we have D = 3·30 = 90, whereas for the second one, we get D = 60 + 10 + 9 + 5 = 84. Using (2.2), we can compute the resistance to single member compromise for both key-trees. For the first tree, we get R ≈ 0.9355, while for the second tree we obtain R ≈ 0.9672. Thus, we can arrive at the conclusion that the second key-tree with variable branching factors is better, as it provides a higher level of privacy, while ensuring a smaller authentication delay. At this point, several questions arise naturally: Is there an even better branching factor vector than B′ for N = 27000 and Dmax = 90? What is the best branching factor vector for this case? How can we find the best branching factor vector in general? I give the answers to these questions in the next section.
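The numbers in the example are easy to reproduce. The short Python sketch below (my own illustration; the function name is hypothetical) evaluates formula (2.2) for an arbitrary branching factor vector.

    from math import prod

    def resistance(branching):
        # Resistance to single member compromise, formula (2.2).
        # branching = (b1, ..., bl) is the branching factor vector of the key-tree.
        N = prod(branching)
        # |P0|^2 = 1, plus one term ((b_i - 1) * b_{i+1} * ... * b_l)^2 per level i
        total = 1 + sum(((branching[i] - 1) * prod(branching[i + 1:])) ** 2
                        for i in range(len(branching)))
        return total / N ** 2

    print(round(resistance((30, 30, 30)), 4))    # 0.9355
    print(round(resistance((60, 10, 9, 5)), 4))  # 0.9672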

2.3 Optimal trees in case of single member compromise

The problem of finding the best branching factor vector can be described as an optimization problem as follows: Given the total number N of members and the upper bound Dmax on the maximum authentication delay, find a branching factor vector B = (b1, b2, ..., bℓ) such that R(B) is maximal subject to the following constraints:
    \prod_{i=1}^{\ell} b_i = N        (2.3)

    \sum_{i=1}^{\ell} b_i \leq D_{max}        (2.4)

This optimization problem is analyzed through a series of lemmas that will lead to an algorithm that solves the problem. The first lemma states that we can always improve a branching factor vector by ordering its elements in decreasing order, and hence, in the sequel, only ordered vectors are considered:

Lemma 1. Let N and Dmax be the total number of members and the upper bound on the maximum authentication delay, respectively. Moreover, let B be a branching factor vector and let B′ be the vector that consists of the sorted permutation of the elements of B in decreasing order. If B satisfies the constraints of the optimization problem defined above, then B′ also satisfies them, and R(B′) ≥ R(B).


Proof. B′ has the same elements as B, therefore the sum and the product of the elements of B′ are the same as those of B, and so if B satisfies the constraints of the optimization problem, then B′ does so too. Now, let us assume that B′ is obtained from B with the bubble sort algorithm. The basic step of this algorithm is to swap two neighboring elements if they are not in the right order. Let us suppose that b_i < b_{i+1}, and thus, the algorithm swaps b_i and b_{i+1}. Then, using (2.2), we can express ΔR = R(B′) − R(B) as follows:

    \Delta R = \frac{1}{N^2} \left( (b_{i+1}-1)^2 b_i^2 \prod_{j=i+2}^{\ell} b_j^2 + (b_i-1)^2 \prod_{j=i+2}^{\ell} b_j^2 \right) - \frac{1}{N^2} \left( (b_i-1)^2 b_{i+1}^2 \prod_{j=i+2}^{\ell} b_j^2 + (b_{i+1}-1)^2 \prod_{j=i+2}^{\ell} b_j^2 \right)
             = \frac{\prod_{j=i+2}^{\ell} b_j^2}{N^2} \left( (b_{i+1}-1)^2 b_i^2 + (b_i-1)^2 - (b_i-1)^2 b_{i+1}^2 - (b_{i+1}-1)^2 \right)
             = \frac{\prod_{j=i+2}^{\ell} b_j^2}{N^2} \left( (b_{i+1}-1)^2 (b_i^2-1) - (b_i-1)^2 (b_{i+1}^2-1) \right)
             = \frac{\prod_{j=i+2}^{\ell} b_j^2}{N^2} (b_i-1)(b_{i+1}-1) \left( (b_{i+1}-1)(b_i+1) - (b_i-1)(b_{i+1}+1) \right)

Since b_i ≥ 2 for all i, ΔR is non-negative if

    \frac{b_i+1}{b_i-1} \geq \frac{b_{i+1}+1}{b_{i+1}-1}        (2.5)

But (2.5) must hold, since the function f(x) = (x+1)/(x−1) is a monotone decreasing function and, by assumption, b_i < b_{i+1}. This means that when sorting the elements of B, we improve R(B) in every step, and thus, R(B′) ≥ R(B) must hold.

The following lemma provides a lower bound and an upper bound for the resistance to single member compromise:

Lemma 2. Let B = (b1, b2, ..., bℓ) be a sorted branching factor vector (i.e., b1 ≥ b2 ≥ ... ≥ bℓ). We can give the following lower and upper bounds on R(B):

    \left( 1 - \frac{1}{b_1} \right)^2 \leq R(B) < \left( 1 - \frac{1}{b_1} \right)^2 + \frac{4}{3 b_1^2}        (2.6)

Proof. By definition,

    R = \frac{1}{N^2} \left( 1 + (b_\ell-1)^2 + \sum_{i=1}^{\ell-1} (b_i-1)^2 \prod_{j=i+1}^{\ell} b_j^2 \right)
      = \left( \frac{b_1-1}{b_1} \right)^2 + \frac{1}{N^2} \left( 1 + (b_\ell-1)^2 + \sum_{i=2}^{\ell-1} (b_i-1)^2 \prod_{j=i+1}^{\ell} b_j^2 \right)        (2.7)

where it is used that N = b1 b2 ··· bℓ. The lower bound in the lemma³ follows directly from (2.7).

³ Note that we could also derive the slightly better lower bound of ((b1 − 1)/b1)² + 1/N² from (2.7); however, we do not need that in this chapter.


In order to obtain the upper bound, we can write b_i instead of (b_i − 1) in the sum in (2.7):

    R < \left( \frac{b_1-1}{b_1} \right)^2 + \frac{1}{N^2} \left( 1 + \sum_{i=2}^{\ell} \prod_{j=i}^{\ell} b_j^2 \right)
      = \left( \frac{b_1-1}{b_1} \right)^2 + \frac{1}{b_1^2} \left( 1 + \sum_{i=2}^{\ell} \prod_{j=2}^{i} \frac{1}{b_j^2} \right)

Since b_i ≥ 2 for all i, we can write 2 in place of b_i in the sum, and we obtain:

    R < \left( \frac{b_1-1}{b_1} \right)^2 + \frac{1}{b_1^2} \left( 1 + \sum_{i=2}^{\ell} \prod_{j=2}^{i} \frac{1}{4} \right)
      = \left( \frac{b_1-1}{b_1} \right)^2 + \frac{1}{b_1^2} \left( 1 + \sum_{i=2}^{\ell} \left( \frac{1}{4} \right)^{i-1} \right)
      < \left( \frac{b_1-1}{b_1} \right)^2 + \frac{1}{b_1^2} \left( 1 + \sum_{i=2}^{\infty} \left( \frac{1}{4} \right)^{i-1} \right)
      = \left( \frac{b_1-1}{b_1} \right)^2 + \frac{1}{b_1^2} \cdot \frac{1}{1 - \frac{1}{4}}

and this is the upper bound in the lemma.

Let us consider the bounds in Lemma 2. Note that the branching factor vector is ordered, therefore, b1 is not smaller than any other bi. We can observe that if we increase b1, then the difference between the upper and the lower bounds decreases, and R(B) gets closer to 1. Intuitively, this implies that in order to find the solution to the optimization problem, b1 should be maximized. The following lemma confirms this intuition formally:

Lemma 3. Let N and Dmax be the total number of members and the upper bound on the maximum authentication delay, respectively. Moreover, let B = (b1, b2, ..., bℓ) and B′ = (b′1, b′2, ..., b′ℓ′) be two sorted branching factor vectors that satisfy the constraints of the optimization problem defined above. Then, b1 > b′1 implies R(B) ≥ R(B′).

Proof. First, we can prove that the statement of the lemma is true if b′1 ≥ 5. We know from Lemma 2 that

    R(B') < \left( 1 - \frac{1}{b'_1} \right)^2 + \frac{4}{3 b_1'^2}

and

    R(B) \geq \left( 1 - \frac{1}{b_1} \right)^2 \geq \left( 1 - \frac{1}{b'_1+1} \right)^2

where we used that b1 > b′1 by assumption. If we can prove that

    \left( 1 - \frac{1}{b'_1} \right)^2 + \frac{4}{3 b_1'^2} \leq \left( 1 - \frac{1}{b'_1+1} \right)^2        (2.8)

then we also proved that R(B) ≥ R(B′). Indeed, a straightforward calculation yields that (2.8) is true if b′1 ≥ 2 + \sqrt{15/2} ≈ 4.74, and since b′1 is an integer, we are done.

Next, we can make the observation that a branching factor vector A = (a1, ..., ak, 2, 2) that has at least two 2s at the end can be improved by joining two 2s into a 4 and obtaining A′ = (a1, ..., ak, 4). It is clear that neither the sum nor the product of the elements changes with this transformation. In addition, we can use the definition of R to get

    N^2 R(A) = ((a_1-1) a_2 \cdots a_k \cdot 2 \cdot 2)^2 + \ldots + ((a_k-1) \cdot 2 \cdot 2)^2 + ((2-1) \cdot 2)^2 + (2-1)^2 + 1

and

    N^2 R(A') = ((a_1-1) a_2 \cdots a_k \cdot 4)^2 + \ldots + ((a_k-1) \cdot 4)^2 + (4-1)^2 + 1

Thus, R(A′) − R(A) = (1/N²)(9 − 4 − 1) > 0, which means that A′ is better than A. Now, it is proven that the lemma is also true for b′1 ∈ {2, 3, 4}:

• b′1 = 2: Since B′ is an ordered vector where b′1 is the largest element, it follows that every element of B′ is 2, and thus, N is a power of 2. From Lemma 2, R(B′) < (1 − 1/2)² + 4/(3·2²) = 7/12, and R(B) ≥ (1 − 1/b1)². It is easy to see that (1 − 1/b1)² ≥ 7/12 if b1 ≥ 1/(1 − \sqrt{7/12}) ≈ 4.23. Since b1 > b′1, the remaining cases are b1 = 3 and b1 = 4. However, b1 = 3 cannot be the case, because N is a power of 2. If b1 = 4, then B can be obtained from B′ by joining pairs of 2s into 4s and then ordering the elements. However, according to the observation above and Lemma 1, both operations improve the vector. It follows that R(B) ≥ R(B′) must hold.

• b′1 = 3: From Lemma 2, R(B′) < (1 − 1/3)² + 4/(3·3²) = 16/27, and R(B) ≥ (1 − 1/b1)². It is easy to see that (1 − 1/b1)² ≥ 16/27 if b1 ≥ 9/(9 − 4\sqrt{3}) ≈ 4.34. Since b1 > b′1, the only remaining case is b1 = 4. In this case, the vectors are as follows:

    B = (4, ..., 4, 3, ..., 3, 2, ..., 2)  with i fours, j threes, and k twos,
    B′ = (3, ..., 3, 2, ..., 2)  with j threes and 2i + k twos,

where i, j ≥ 1 and k ≥ 0. This means that B can be obtained from B′ by joining i pairs of 2s into 4s and then ordering the elements. However, as we saw earlier, both joining 2s into 4s and ordering the elements improve the vector, and thus, R(B) ≥ R(B′) must hold.

• b′1 = 4: Since B′ is an ordered vector where b′1 is the largest element, it follows that N is not divisible by 5. From Lemma 2, R(B′) < (1 − 1/4)² + 4/(3·4²) = 31/48, and R(B) ≥ (1 − 1/b1)². It is easy to see that (1 − 1/b1)² ≥ 31/48 if b1 ≥ 1/(1 − \sqrt{31/48}) ≈ 5.09. Since b1 > b′1, the remaining case is b1 = 5. However, b1 = 5 cannot be the case, because N is not divisible by 5.

Lemma 3 states that given two branching factor vectors, the one with the larger first element is always at least as good as the other. The next lemma generalizes this result by stating that given two branching factor vectors whose first j elements are equal, the vector with the larger (j+1)-st element is always at least as good as the other.

Lemma 4. Let N and Dmax be the total number of members and the upper bound on the maximum authentication delay, respectively. Moreover, let B = (b1, b2, ..., bℓ) and B′ = (b′1, b′2, ..., b′ℓ′) be two sorted branching factor vectors such that bi = b′i for all 1 ≤ i ≤ j for some j < min(ℓ, ℓ′), and both B and B′ satisfy the constraints of the optimization problem defined above. Then, b_{j+1} > b′_{j+1} implies R(B) ≥ R(B′).


Proof. By definition,

    R(B) = \frac{1}{N^2} \left( 1 + (b_\ell-1)^2 + \sum_{i=1}^{\ell-1} (b_i-1)^2 \prod_{j=i+1}^{\ell} b_j^2 \right)
         = \left( \frac{b_1-1}{b_1} \right)^2 + \frac{1}{b_1^2} \cdot \frac{1}{(N/b_1)^2} \left( 1 + (b_\ell-1)^2 + \sum_{i=2}^{\ell-1} (b_i-1)^2 \prod_{j=i+1}^{\ell} b_j^2 \right)
         = \left( \frac{b_1-1}{b_1} \right)^2 + \frac{1}{b_1^2} R(B_1)

where B1 = (b2, b3, ..., bℓ). Similarly,

    R(B') = \left( \frac{b'_1-1}{b'_1} \right)^2 + \frac{1}{b_1'^2} R(B'_1)

where B′1 = (b′2, b′3, ..., b′ℓ′). Since b1 = b′1, R(B) ≥ R(B′) if and only if R(B1) ≥ R(B′1). By repeating the same argument for B1 and B′1, we get that R(B) ≥ R(B′) if and only if R(B2) ≥ R(B′2), where B2 = (b3, ..., bℓ) and B′2 = (b′3, ..., b′ℓ′). And so on, until we get that R(B) ≥ R(B′) if and only if R(Bj) ≥ R(B′j), where Bj = (b_{j+1}, ..., bℓ) and B′j = (b′_{j+1}, ..., b′ℓ′). But from Lemma 3, we know that R(Bj) ≥ R(B′j) if b_{j+1} > b′_{j+1}, and we are done.

I will now present an algorithm that finds the solution to the optimization problem. However, before doing that, we need to introduce some further notation. Let B = (b1, b2, ..., bℓ) and B′ = (b′1, b′2, ..., b′ℓ′). Then

• Σ(B) denotes \sum_{i=1}^{\ell} b_i;
• Π(B) denotes \prod_{i=1}^{\ell} b_i;
• {B} denotes the set {b1, b2, ..., bℓ} of the elements of B;
• B′ ⊆ B means that {B′} ⊆ {B};
• if B′ ⊆ B, then B \ B′ denotes the vector that consists of the elements of {B} \ {B′} in decreasing order;
• if b is a positive integer, then b|B denotes the vector (b, b1, b2, ..., bℓ).

The algorithm is defined as a recursive function f, which takes two input parameters, a vector B of positive integers and another positive integer d, and returns a vector of positive integers. In order to compute the optimal branching factor vector for a given N and Dmax, f should be called with the vector that contains the prime factors of N, and with Dmax. For instance, if N = 27000 and Dmax = 90 (the same parameters as in the example in Section 2.2, so that the naïve and the algorithmic results can be compared), then f should be called with B = (5, 5, 5, 3, 3, 3, 2, 2, 2) and d = 90. Function f will then return the optimal branching factor vector. Function f is defined in Algorithm 1. The operation of the algorithm can be described as follows: The algorithm starts with a branching factor vector consisting of the prime factors of N. This vector satisfies the first constraint of the optimization problem by definition. If it does not satisfy the second constraint (i.e., it does not respect the upper bound on the maximum authentication delay), then no solution exists. Otherwise, the algorithm successively improves the branching factor vector by maximizing its elements, starting with the first element, and then proceeding to the next elements, one after the other. Maximization of an element is done by joining as yet unused prime factors until the resulting divisor of N cannot be further increased without violating the constraints of the optimization problem.


Algorithm 1 Optimal branching factor generating algorithm f(B, d)
    if Σ(B) > d then
        exit (no solution exists)
    else
        find B′ ⊆ B such that Π(B′) + Σ(B \ B′) ≤ d and Π(B′) is maximal
    end if
    if B′ = B then
        return (Π(B))
    else
        return Π(B′) | f(B \ B′, d − Π(B′))
    end if
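The following Python sketch is a direct transcription of Algorithm 1 (my own code, written for clarity rather than efficiency: the maximal divisor is found by brute-force enumeration of subsets of the remaining prime factors, which is fine for the small factor lists that occur in practice).

    from itertools import combinations
    from math import prod

    def f(B, d):
        # B: list of as yet unused prime factors of N, d: remaining delay budget.
        if sum(B) > d:
            raise ValueError("no solution exists")
        # find a sub-multiset best of B such that prod(best) + sum(B \ best) <= d
        # and prod(best) is maximal
        best = None
        for r in range(1, len(B) + 1):
            for subset in combinations(B, r):
                rest = list(B)
                for x in subset:
                    rest.remove(x)
                if prod(subset) + sum(rest) <= d and (best is None or prod(subset) > prod(best)):
                    best = subset
        rest = list(B)
        for x in best:
            rest.remove(x)
        if not rest:                       # best = B
            return [prod(B)]
        return [prod(best)] + f(rest, d - prod(best))

    # Example from the text: N = 27000, Dmax = 90
    print(f([5, 5, 5, 3, 3, 3, 2, 2, 2], 90))   # [72, 5, 5, 5, 3]

Note that since the elements are prime factors, any subset achieving the maximal product consumes the same multiset of primes, so the choice among ties does not matter.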

Theorem 2. Let N and Dmax be the total number of members and the upper bound on the maximum authentication delay, respectively. Moreover, let B be a vector that contains the prime factors of N. Then, f(B, Dmax) is an optimal branching factor vector for N and Dmax.

Proof. I will give a sketch of the proof. Let B′ = f(B, Dmax), and let us assume that there is another branching factor vector B″ ≠ B′ that also satisfies the constraints of the optimization problem and R(B″) > R(B′). I will show that this leads to a contradiction, hence B′ must be optimal. Let B′ = (b′1, b′2, ..., b′ℓ′) and B″ = (b″1, b″2, ..., b″ℓ″). Recall that B′ is obtained by first maximizing the first element of the vector, therefore, b′1 ≥ b″1 must hold. If b′1 > b″1, then R(B′) ≥ R(B″) by Lemma 3, and thus, B″ cannot be a better vector than B′. This means that b″1 = b′1 must hold. We know that once b′1 is determined, the algorithm continues by maximizing the next element of B′. Hence, b′2 ≥ b″2 must hold. If b′2 > b″2, then R(B′) ≥ R(B″) by Lemma 4, and thus, B″ cannot be a better vector than B′. This means that b″2 = b′2 must hold too. By repeating this argument, we finally arrive at the conclusion that B″ = B′ must hold, which is a contradiction.

Table 2.1 illustrates the operation of the algorithm for B = (5, 5, 5, 3, 3, 3, 2, 2, 2) and d = 90. The rows of the table correspond to the levels of the recursion during the execution. The column labeled B′ contains the prime factors that are joined at a given recursion level. The optimal branching factor vector can be read out from the last column of the table (each row contains one element of the vector). From this example, we can see that the optimal branching factor vector for N = 27000 and Dmax = 90 is B = (72, 5, 5, 5, 3). For the key-tree defined by this vector, we get R ≈ 0.9725 and D = 90.

Table 2.1: Illustration of the operation of the recursive function f when called with B = (5, 5, 5, 3, 3, 3, 2, 2, 2) and d = 90. The rows of the table correspond to the levels of the recursion during the execution.

    recursion level | B                           | d  | B′              | Π(B′)
    1               | (5, 5, 5, 3, 3, 3, 2, 2, 2) | 90 | (3, 3, 2, 2, 2) | 72
    2               | (5, 5, 5, 3)                | 18 | (5)             | 5
    3               | (5, 5, 3)                   | 13 | (5)             | 5
    4               | (5, 3)                      | 8  | (5)             | 5
    5               | (3)                         | 3  | (3)             | 3



2.4 Analysis of the general case

So far, we have studied the case of a single compromised member. This already proved to be useful, because it allowed us to compare different key-trees and to derive a key-tree construction method. However, one may still be interested in what level of privacy is provided by a system in the general case, when any number of members could be compromised. In this section, I address this problem.
[Figure 2.3: a key-tree whose non-leaf vertices are labelled ⟨1⟩, ⟨2⟩, ⟨3⟩, ⟨11⟩, ⟨12⟩, ..., ⟨33⟩; the anonymity sets P⟨11⟩ and P⟨2⟩ are indicated in the figure.]
Figure 2.3: Illustration of what happens when several members are compromised. Just as in the case of a single compromised member, the members are partitioned into anonymity sets, but now the resulting subsets depend on the number of the compromised members, as well as on their positions in the tree. Nevertheless, the expected size of the anonymity set of a randomly selected member is still a good metric for the level of privacy provided by the system, although, in this general case, it is more difficult to compute.

In what follows, we will need to refer to the non-leaf vertices of the key-tree, and for this reason, I introduce the labelling scheme that is illustrated in Figure 2.3. In addition, we need to introduce some further notation. I call a leaf compromised if it belongs to a compromised member, and I call a non-leaf vertex compromised if it lies on a path that leads to a compromised leaf in the tree. If vertex v is compromised, then
• Kv denotes the set of the compromised children of v, and kv = |Kv|;
• Pv denotes the set of subsets (anonymity sets) that belong to the subtree rooted at v (see Figure 2.3 for illustration); and
• S̄v denotes the average size of the subsets in Pv.

We are interested in computing S̄. We can do that as follows:

    \bar{S} = \sum_{P \in \mathcal{P}_{\langle\rangle}} \frac{|P|}{b_1 b_2 \cdots b_\ell} |P| = \sum_{P \in \mathcal{P}_{\langle\rangle}} \frac{|P|^2}{b_1 b_2 \cdots b_\ell}
            = \frac{((b_1 - k_{\langle\rangle}) b_2 \cdots b_\ell)^2}{b_1 b_2 \cdots b_\ell} + \sum_{v \in K_{\langle\rangle}} \sum_{P \in \mathcal{P}_v} \frac{|P|^2}{b_1 b_2 \cdots b_\ell}
            = \frac{((b_1 - k_{\langle\rangle}) b_2 \cdots b_\ell)^2}{b_1 b_2 \cdots b_\ell} + \frac{1}{b_1} \sum_{v \in K_{\langle\rangle}} \bar{S}_v        (2.9)

In general, for any vertex ⟨i1, ..., ij⟩ such that 1 ≤ j < ℓ − 1:

    \bar{S}_{\langle i_1, \ldots, i_j \rangle} = \frac{((b_{j+1} - k_{\langle i_1, \ldots, i_j \rangle}) b_{j+2} \cdots b_\ell)^2}{b_{j+1} \cdots b_\ell} + \frac{1}{b_{j+1}} \sum_{v \in K_{\langle i_1, \ldots, i_j \rangle}} \bar{S}_v        (2.10)


Finally, for vertices ⟨i1, ..., i_{ℓ−1}⟩ just above the leaves, we get:

    \bar{S}_{\langle i_1, \ldots, i_{\ell-1} \rangle} = \frac{(b_\ell - k_{\langle i_1, \ldots, i_{\ell-1} \rangle})^2}{b_\ell} + \frac{k_{\langle i_1, \ldots, i_{\ell-1} \rangle}}{b_\ell}        (2.11)

Expressions (2.9)–(2.11) can be used to compute the expected anonymity set size in the system iteratively, in case of any number of compromised members. However, note that the computation depends not only on the number c of the compromised members, but also on their positions in the tree. This makes the comparison of different systems difficult, because for a comprehensive analysis, all possible allocations of the compromised members over the leaves of the key-tree should be considered. Therefore, such a formula is preferred that depends solely on c, but characterizes the effect of compromised members on the level of privacy sufficiently well, so that it can serve as a basis for comparison of different systems. In the following, such a formula is derived based on the assumption that the compromised members are distributed uniformly at random over the leaves of the key-tree. In some sense, this is a pessimistic assumption, as the uniform distribution represents the worst case, which leads to the largest amount of privacy loss due to the compromised members. Thus, the approximation that is derived can be viewed as a lower bound on the expected anonymity set size in the system when c members are compromised. Let the branching factor vector of the key-tree be B = (b1, b2, ..., bℓ), and let c be the number of compromised leaves in the tree. We can estimate k⟨⟩ for the root as follows:

    k_{\langle\rangle} \approx \min(c, b_1) = k_0        (2.12)

If a vertex ⟨i⟩ at the first level of the tree is compromised, then the number of compromised leaves in the subtree rooted at ⟨i⟩ is approximately c/k0 = c1. Then, we can estimate k⟨i⟩ as follows:

    k_{\langle i \rangle} \approx \min(c_1, b_2) = k_1        (2.13)

In general, if vertex ⟨i1, ..., ij⟩ at the j-th level of the tree is compromised, then the number of compromised leaves in the subtree rooted at ⟨i1, ..., ij⟩ is approximately c_{j-1}/k_{j-1} = c_j, and we can use this to approximate k⟨i1,...,ij⟩ as follows:

    k_{\langle i_1, \ldots, i_j \rangle} \approx \min(c_j, b_{j+1}) = k_j        (2.14)

Using these approximations in expressions (2.9)–(2.11), we can derive an approximation for S̄, which is denoted by S̄0, in the following way:

    \bar{S}_{\ell-1} = \frac{(b_\ell - k_{\ell-1})^2}{b_\ell} + \frac{k_{\ell-1}}{b_\ell}        (2.15)
    \ldots
    \bar{S}_j = \frac{((b_{j+1} - k_j) b_{j+2} \cdots b_\ell)^2}{b_{j+1} \cdots b_\ell} + \frac{k_j}{b_{j+1}} \bar{S}_{j+1}        (2.16)
    \ldots
    \bar{S}_0 = \frac{((b_1 - k_0) b_2 \cdots b_\ell)^2}{b_1 \cdots b_\ell} + \frac{k_0}{b_1} \bar{S}_1        (2.17)
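As an illustration of the recursion, the following Python sketch (my own code; the names are hypothetical) evaluates S̄0 for a given branching factor vector and a given number c of compromised members.

    from math import prod

    def estimate_S0(branching, c):
        # Lower-bound approximation S0 of the expected anonymity set size,
        # following (2.12)-(2.17); the c compromised members are assumed to be
        # spread uniformly over the leaves.
        l = len(branching)
        ks, cj = [], float(c)
        for b in branching:            # k_j = min(c_j, b_{j+1}), c_{j+1} = c_j / k_j
            kj = min(cj, b)
            ks.append(kj)
            cj = cj / kj
        # backward recursion (2.15)-(2.17)
        S = (branching[-1] - ks[-1]) ** 2 / branching[-1] + ks[-1] / branching[-1]
        for j in range(l - 2, -1, -1):
            S = ((branching[j] - ks[j]) * prod(branching[j + 1:])) ** 2 / prod(branching[j:]) \
                + ks[j] / branching[j] * S
        return S

    # Normalized value for the key-tree (30, 30, 30) with a single compromised member:
    print(estimate_S0((30, 30, 30), 1) / 27000)   # ~0.9355, matching R from Section 2.2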

Note that expressions (2.15)–(2.17) do not depend on the positions of the compromised leaves in the tree, but they depend only on the value of c. In order to see how well S̄0 estimates S̄, some simulations were run. The simulation parameters are the following:
• total number of members N = 27000;
• upper bound on the maximum authentication delay Dmax = 90;
• two branching factor vectors are considered: (30, 30, 30) and (72, 5, 5, 5, 3);
• the number c of compromised members is varied between 1 and 270 with a step size of one.


For each value of c, I ran 100 simulations⁴. In each simulation run, the c compromised members were chosen uniformly at random from the set of all members. The exact value of the normalized expected anonymity set size S̄/N was computed using expressions (2.9)–(2.11). Finally, the obtained values were averaged over all simulation runs. Moreover, for every c, I also computed the estimated value S̄0/N using expressions (2.15)–(2.17). The simulation results are shown in Figure 2.4. The figure does not show the confidence intervals, because they are very small (in the range of 10⁻⁴ for all simulations) and thus they would be hardly visible. As we can see, S̄0/N approximates S̄/N quite well, and in general it provides a lower bound on the normalized expected anonymity set size.
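Such a simulation run can be reproduced with a few lines of code. The sketch below (my own illustration, not the original Matlab code referenced in the footnote) computes the exact partition directly rather than through the recursion (2.9)–(2.11): uncompromised members that share the same longest compromised key prefix are indistinguishable, and every compromised member is a singleton.

    import itertools
    import random
    from collections import Counter
    from math import prod

    def exact_S(branching, compromised):
        # Exact expected anonymity set size for a given set of compromised leaves.
        # A leaf is represented by its path, i.e., a tuple of child indices.
        N = prod(branching)
        compromised = set(compromised)
        comp_prefixes = {p[:i] for p in compromised for i in range(1, len(branching) + 1)}
        sizes = Counter()
        for path in itertools.product(*[range(b) for b in branching]):
            if path in compromised:
                continue                      # compromised members are singletons
            j = 0
            while j < len(branching) and path[:j + 1] in comp_prefixes:
                j += 1
            sizes[path[:j]] += 1              # group by the longest compromised prefix
        return (len(compromised) + sum(s * s for s in sizes.values())) / N

    # One simulation run with c = 10 randomly compromised members:
    branching = (72, 5, 5, 5, 3)
    leaves = list(itertools.product(*[range(b) for b in branching]))
    print(exact_S(branching, random.sample(leaves, 10)) / prod(branching))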
Figure 2.4: Simulation results for branching factor vectors (30, 30, 30) (left hand side) and (72, 5, 5, 5, 3) (right hand side). As we can see, S̄0/N approximates S̄/N quite well, and in general it provides a lower bound on it.

In Figure 2.5, the value of S̄0/N is plotted as a function of c for different branching factor vectors. This figure illustrates how different systems can be compared using the approximation S̄0/N of the normalized expected anonymity set size. On the left hand side of the figure, we can see that the value of S̄0/N is greater for the vector B′ = (72, 5, 5, 5, 3) than for the vector B = (30, 30, 30) not only for c = 1 (as we saw before), but for larger values of c too. In fact, B′ seems to lose its superiority only when the value of c approaches 60, but in that range, the systems provide nearly no privacy in any case. Thus, we can conclude that B′ is a better branching factor vector, yielding more privacy than B in general. We can make another interesting observation on the left hand side of Figure 2.5: S̄0/N starts decreasing sharply as c starts increasing; however, when c gets close to the value of the first element of the branching factor vector, the decrease of S̄0/N slows down. Moreover, almost exactly when c reaches the value of the first element (30 in case of B, and 72 in case of B′), S̄0/N seems to turn into a constant, but at a very low value. We can conclude that, just as in the case of a single compromised member, in the general case too, the level of privacy provided by the system essentially depends on the value of the first element of the branching factor vector. The plot on the right hand side of the figure reinforces this observation: it shows S̄0/N for two branching factor vectors that have the same first element but that differ in the other elements. As we can see, the curves are almost perfectly overlapping. Thus, a practical design principle for key-tree based private authentication systems is to maximize the branching factor at the first level of the key-tree. Further optimization by adjusting the branching factors of the lower levels may still be possible, but the gain is not significant; what really counts is the branching factor at the first level.

⁴ All computations have been done in Matlab, and for the purpose of repeatability, the source code is available on-line at http://www.crysys.hu/holczer/PET2006


Figure 2.5: The value of S̄0/N as a function of c for different branching factor vectors. The figure illustrates how different systems can be compared based on the approximation S̄0/N. On the left hand side, we can see that the value of S̄0/N is greater for the vector (72, 5, 5, 5, 3) than for the vector (30, 30, 30) not only for c = 1 (as we saw earlier), but for larger values of c too. On the right hand side, we can see that S̄0/N is almost the same for the vector (60, 5, 5, 3, 3, 2) as for the vector (60, 30, 15). We can conclude that S̄0/N is essentially determined by the value of the first element of the branching factor vector.

2.5 The group-based approach

In the group based authentication scheme, the set of all tags is divided into groups of equal size, and all tags of a given group share a common group key. Since the group keys do not enable the reader to identify the tags uniquely, every tag also stores a unique identifier. Keys are secret (each group key is known only to the reader and the members of the corresponding group), but identifiers can be public. To avoid impersonation of a tag by another tag from the same group, every tag has a unique secret key as well. This key is shared only between the tag and the reader. To reduce the storage demands on the reader side, the pairwise key can be generated from a master key using the identifier of the tag. In order to authenticate a tag, the reader sends a single challenge to the tag. The answer of the tag has two parts. In the first part, the tag answers the reader by encrypting, with the group key, the reader's challenge concatenated with a nonce picked by the tag and with the tag's identifier. In the second part, the tag encrypts the challenge concatenated with the nonce using its own secret key. Encrypting the identifier is needed, since the key used for encryption does not identify the tag uniquely. Upon reception of the answer, the reader identifies the tag by trying all the group keys until the decryption succeeds. Then it checks, using the second part, that the message was indeed produced by that tag. Without the second part, every tag could impersonate every other tag in the same group. The operation of the group-based private authentication scheme is illustrated in Figure 2.6. The complexity of the group-based scheme for the reader depends on the number of the groups. In particular, if there are γ groups, then, in the worst case, the reader must try γ keys. Therefore, if the upper bound on the worst case complexity is given as a design parameter, then γ is easily determined. For example, to get the same complexity as in the key-tree based scheme with a constant branching factor, one may choose γ = (b·log_b N) − 1, where N is the total number of tags and b is the branching factor of the key-tree. The minus one accounts for the decryption of the second part of the message. An immediate advantage of the group-based scheme with respect to the key-tree based approach is that the tags need to store only two keys and an identifier. In contrast to this, in the key-tree based scheme, the number of keys stored by the tags depends on the depth of the tree. For instance, in the case of the Molnar-Wagner scheme, the tags must store log_b N keys. Moreover, by using only two keys, this scheme also has a smaller complexity for the tag in terms of computation and communication. Besides its advantages with respect to complexity, the group-based scheme provides a higher



[Figure 2.6 shows the message flow:
Reader R: pick R1; send R1 to the tag.
Tag T: pick R2; send E_K(R1|R2|ID) and E_KID(R1|R2) to the reader.
Reader R: try all group keys until K is found; then check the tag's own key K_ID.]

Figure 2.6: Operation of the group-based private authentication scheme. K is the group key stored by the tag, K_ID is the tag's own secret key, ID is the identifier of the tag, R1 and R2 are random values generated by the reader and the tag, respectively, | denotes concatenation, and E_K() denotes symmetric-key encryption with K.
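A runnable sketch of this message flow is given below. This is my own illustration: AES-GCM from the third-party cryptography package stands in for E_K(), the tag's own key is derived from a reader-side master key and the identifier with HMAC, and all names and message lengths are hypothetical.

    import hashlib
    import hmac
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def E(key, data):                        # E_K(): authenticated encryption
        nonce = os.urandom(12)
        return nonce + AESGCM(key).encrypt(nonce, data, None)

    def D(key, blob):                        # raises an exception under a wrong key
        return AESGCM(key).decrypt(blob[:12], blob[12:], None)

    def tag_key(master, ident):              # per-tag key derived from the identifier
        return hmac.new(master, ident, hashlib.sha256).digest()[:16]

    class Tag:
        def __init__(self, group_key, master, ident):
            self.k, self.kid, self.ident = group_key, tag_key(master, ident), ident
        def respond(self, r1):
            r2 = os.urandom(16)
            return E(self.k, r1 + r2 + self.ident), E(self.kid, r1 + r2)

    class Reader:
        def __init__(self, group_keys, master):
            self.group_keys, self.master = group_keys, master
        def identify(self, r1, part1, part2):
            for k in self.group_keys:         # at most gamma trial decryptions
                try:
                    plain = D(k, part1)
                except Exception:
                    continue                  # wrong group key, try the next one
                r2, ident = plain[16:32], plain[32:]
                if plain[:16] == r1 and D(tag_key(self.master, ident), part2) == r1 + r2:
                    return ident              # group key found, tag's own key verified
            return None

    # gamma = 4 groups; authenticate one tag of group 2
    master, group_keys = os.urandom(32), [os.urandom(16) for _ in range(4)]
    tag, reader = Tag(group_keys[2], master, b"tag-0042"), Reader(group_keys, master)
    r1 = os.urandom(16)
    print(reader.identify(r1, *tag.respond(r1)))   # b'tag-0042'

A production implementation would, of course, also handle a failing second decryption gracefully and use the usual hardening measures; the point of the sketch is only to show the two-part response and the reader-side trial decryption loop of at most γ group keys plus one further decryption.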

Figure 2.7: On the left hand side: The tree-based authentication protocol uses a tree, where the tags correspond to the leaves of the tree. Each tag stores the keys along the path from the root to the leaf corresponding to the given tag. When authenticating itself, a tag uses all of its keys. The reader identifies which keys have been used by iteratively searching through the keys at the successive levels of the tree. On the right hand side: In the group-based authentication protocol, the tags are divided into groups. Each tag stores its group key and its own key. When authenticating itself, a tag uses its group key first, and then its own key. The reader identifies which group key has been used by trying all group keys, then it checks the tag's own key.

level of privacy than the key-tree based scheme when some of the tags are compromised. I will show this in Section 2.7.

2.6 Analysis of the group based approach

The metric proposed in Section 2.2 is based on the observation that when some tags are compromised, the set of all tags becomes partitioned such that the adversary cannot distinguish the tags that belong to the same subset, but she can distinguish the tags that belong to different subsets. Hence, the subsets are the anonymity sets of their members. The level R of privacy provided by the scheme is then characterized as the average anonymity set size normalized with the total number N of the tags. Formally,

    R = \frac{1}{N} \sum_{i} \frac{|P_i|}{N} |P_i| = \frac{1}{N^2} \sum_{i} |P_i|^2        (2.18)

where |Pi| denotes the size of subset Pi and |Pi|/N is the probability that a randomly chosen tag belongs to subset Pi. In the group-based scheme, a similar kind of partitioning can be observed when tags become compromised. In particular, when a single tag is compromised, the adversary learns the group key of that tag, which allows her to distinguish the tags within this group from each other (since


the tags use their identifiers in the protocol) and from the rest of the tags in the system. This means that each member of the compromised group forms an anonymity set of size 1, and the remaining tags form another anonymity set. In general, when more tags are compromised, we can observe that the partitioning depends on the number C of the compromised groups, where a group is compromised if at least one tag that belongs to that group is compromised. More precisely, when C groups are compromised, we get nC anonymity sets of size 1 and an anonymity set of size n(γ − C), where γ is the number of groups and n = N/γ is the size of a group. This results in the following expression for the level R of privacy according to the metric (2.18):

    R = \frac{1}{N^2} \left( nC + (n(\gamma - C))^2 \right)        (2.19)

If tags are compromised randomly, then C, and hence R, are random variables, and the level of privacy provided by the system is characterized by the expected value of R. In order to compute that, we must compute the expected value of C and that of C². This can be done as follows: let us denote by Ai the event that at least one tag from the i-th group is compromised, and let I_{Ai} be the indicator function of Ai. The probability of Ai can be calculated as follows:

    P(A_i) = 1 - \frac{\binom{N-n}{c}}{\binom{N}{c}}        (2.20)
           = 1 - \prod_{j=0}^{c-1} \left( 1 - \frac{n}{N-j} \right)        (2.21)

The expected value of C is the expected value of the sum of the indicator functions:

    E[C] = E\left[ \sum_{i=1}^{\gamma} I_{A_i} \right] = \sum_{i=1}^{\gamma} P(A_i)        (2.22)
         = \gamma \left( 1 - \prod_{j=0}^{c-1} \left( 1 - \frac{n}{N-j} \right) \right)        (2.23)

Similarly, the second moment of C can be computed as follows:

    E[C^2] = E\left[ \left( \sum_{i=1}^{\gamma} I_{A_i} \right)^2 \right]        (2.24)
           = E\left[ \sum_{i=1}^{\gamma} I_{A_i} \right] + E\left[ \sum_{i \neq j} I_{A_i \cap A_j} \right]        (2.25)
           = E[C] + 2 \binom{\gamma}{2} P(A_i \cap A_j)        (2.26)

Finally, the probability P(A_i ∩ A_j) can be computed in the following way:

    P(A_i \cap A_j) = 1 - P(\bar{A}_i \cup \bar{A}_j)        (2.27)
                    = 1 - 2 P(\bar{A}_i) + P(\bar{A}_i \cap \bar{A}_j)        (2.28)

    P(\bar{A}_i \cap \bar{A}_j) = \frac{\binom{N-2n}{c}}{\binom{N}{c}}        (2.29)
                                = \prod_{j=0}^{c-1} \left( 1 - \frac{2n}{N-j} \right)        (2.30)

or, equivalently,

    P(\bar{A}_i \cap \bar{A}_j) = P(\bar{A}_i \mid \bar{A}_j) P(\bar{A}_j)        (2.31)
                                = \prod_{j=0}^{c-1} \left( 1 - \frac{n}{N-n-j} \right)        (2.32)
                                  \cdot \prod_{j=0}^{c-1} \left( 1 - \frac{n}{N-j} \right)        (2.33)

Based on the above formulae, the expected value of R is computed as a function of c for N = 2^14 and γ = 64. The results are plotted on the left hand side of Figure 2.8. The same plot also contains the results of a Matlab simulation with the same parameters, where the c compromised tags were chosen uniformly at random. For each value of c, 10 simulations were run; the exact values of the average anonymity set size were computed using (2.19) directly, and the results were averaged. As can be seen in the figure, the analytical results match the results of the simulation. I performed the same verification for several other values of N and γ, and in each case, I obtained the same matching results.
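For reference, the analytical curve can be reproduced with the short Python sketch below (my own code; it evaluates (2.19)–(2.30) and uses the linearity of expectation, i.e., E[R] = (n·E[C] + n²(γ² − 2γ·E[C] + E[C²]))/N²).

    from math import prod

    def expected_R_group(N, gamma, c):
        # Expected level of privacy R of the group-based scheme with N tags,
        # gamma groups of size n = N / gamma, and c randomly compromised tags.
        n = N // gamma
        p_no_Ai = prod(1 - n / (N - j) for j in range(c))            # P(group i untouched)
        p_no_Ai_no_Aj = prod(1 - 2 * n / (N - j) for j in range(c))  # P(groups i and j untouched)
        p_Ai_and_Aj = 1 - 2 * p_no_Ai + p_no_Ai_no_Aj                # (2.27)-(2.28)
        EC = gamma * (1 - p_no_Ai)                                   # (2.23)
        EC2 = EC + gamma * (gamma - 1) * p_Ai_and_Aj                 # (2.26)
        return (n * EC + n ** 2 * (gamma ** 2 - 2 * gamma * EC + EC2)) / N ** 2

    # Left hand side of Figure 2.8: N = 2**14, gamma = 64, for a few values of c
    print([round(expected_R_group(2 ** 14, 64, c), 3) for c in (1, 10, 50)])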

2.7 Comparison of the group and the key-tree based approach

In this section, I compare the group-based scheme to the key-tree based scheme. The methodology is the following: for a given number N of tags and a given upper bound on the worst case complexity for the reader, I determine the optimal key-tree using the algorithm proposed in Section 2.3. Then, I compare the level of privacy provided by this optimal key-tree to that provided by the group-based scheme with γ groups and N tags. The comparison is performed by means of simulations. A simulation run consists in randomly choosing c compromised tags, and computing the resulting normalized average anonymity set size R for both the optimal key-tree and the group-based scheme. For the former, we can use the formulae (2.15)–(2.17), while for the latter, I use formula (2.19) directly. For each value of c, I ran several simulation runs, and averaged the results. The simulation parameters were the following: for the number N of tags, only powers of 2 are considered, because in practice, that number is related to the size of the identifier space, and identifiers are usually represented as binary strings. Thus, in the simulations, N = 2^x, and x is varied between 10 and 15 with a step size of 1. The values for the worst case complexity (which coincides with the number γ of groups in the group-based scheme) were 64, 128, and 256. Finally, the number c of compromised tags is varied from 1 to 3γ. For each combination of these values, 100 simulation runs were performed. The right hand side of Figure 2.8 shows the results that we can obtain for N = 2^10 and γ = 64. The plots corresponding to the other simulation settings are not included, because they are very similar to the one in Figure 2.8. As we can see, the group-based scheme provides a higher level of privacy when the number of compromised tags does not exceed a threshold. Above the threshold, the key-tree based scheme becomes better, however, in this region, both schemes provide virtually


Figure 2.8: On the left hand side: The analytical results obtained for the expected value of R match the averaged results of ten simulations. The parameters are: N = 2^14 and γ = 64. On the right hand side: Results of the simulation aiming at comparing the key-tree based scheme and the group-based scheme. The curves show the level R of privacy as a function of the number c of the compromised tags. The parameters are: N = 2^10 and γ = 64. The confidence intervals are not shown, because they are in the range of 10⁻³, and therefore, they would be hardly visible. As we can see, the group-based scheme achieves a higher level of privacy when c is below a threshold. Above the threshold, the key-tree based approach is slightly better, however, in this region, both schemes provide virtually no privacy.

no privacy. Thus, for any practical purposes, the group-based scheme is better than the key-tree based scheme (even if optimal key-trees are used).

2.8 Related work

The problem of private authentication has been extensively studied in the literature recently, but most of the proposed solutions are based on public key cryptography. One example is Idemix, which is a practical anonymous credential system proposed by Camenisch and Lysyanskaya in [Camenisch and Lysyanskaya, 2001]. Idemix allows for unlinkable demonstration of the possession of various credentials, and it can be used in many applications. However, it is not applicable in resource constrained scenarios, such as low-cost RFID systems. For such applications, solutions based on symmetric key cryptography seem to be the only viable options. A comprehensive bibliography of RFID related privacy problems is maintained by Avoine in [Avoine, 2012]. A recent survey of RFID privacy approaches is published by Langheinrich [Langheinrich, 2009], where he overviews 60 papers in this field. Another important paper, by Syamsuddin et al., surveys the hash chain based RFID authentication protocols [Syamsuddin et al., 2008]. In the following, I focus on the methods similar to the ones described in this thesis, and encourage the reader to consult the aforementioned surveys for a broader view. The key-tree based approach for symmetric key private authentication has been proposed by Molnar and Wagner in [Molnar and Wagner, 2004]. However, they use a simple b-ary tree, which means that the tree has the same branching factor at every level. Moreover, they do not analyze the effects of compromised members on the level of privacy provided. They only mention that the compromise of a member has a wider effect than in the case of public key cryptography based solutions. An entropy based analysis of key trees can be found in [Nohara et al., 2005]. Nohara et al. prove that their K-steps ID matching scheme (which is very similar to [Molnar and Wagner, 2004]) is secure against one compromised tag, if the number of nodes is large enough. They consider only b-ary trees, with no variable branching factors.


Avoine et al. analyze the effects of compromised members on privacy in the key-tree based approach [Avoine et al., 2005]. They study the case of a single compromised member, as well as the general case of any number of compromised members. However, their analysis is not based on the notion of anonymity sets. In their model, the adversary is first allowed to compromise some members, then it chooses a target member that it wants to trace, and it is allowed to interact with the chosen member. Later, the adversary is given two members such that one of them is the target member chosen by the adversary. The adversary can interact with the given members, and it must decide which one is its target. The level of privacy provided by the system is quantified by the success probability of the adversary. Beye and Veugen go a little further and analyze what happens if the attacker has access to side channel information and adapts the attack dynamically [Beye and Veugen, 2012]. They analyze the case of key trees described in this chapter as well, and generalize the problem by setting only a minimum on N in [Beye and Veugen, 2011].

2.9 Conclusion

Key-trees provide an efficient solution for private authentication in the symmetric key setting. However, the level of privacy provided by key-tree based systems decreases considerably if some members are compromised. This loss of privacy can be minimized by the careful design of the tree. Based on the results presented in this chapter, we can conclude that a good practical design principle is to maximize the branching factor at the first level of the tree such that the resulting tree still respects the constraint on the maximum authentication delay in the system. Once the branching factor at the first level is maximized, the tree can be further optimized by maximizing the branching factors at the successive levels, but the improvement achieved in this way is not really significant; what really counts is the branching factor at the first level. In the second part of this chapter, I proposed a novel group based private authentication scheme. I analyzed the proposed scheme and quantified the level of privacy that it provides. I compared the group based scheme to the key-tree based scheme originally proposed by Molnar and Wagner, and later optimized by me in the first half of this chapter. I showed that the group based scheme provides a higher level of privacy than the key-tree based scheme. In addition, the complexity of the group based scheme for the verifier can be set to be the same as in the key-tree based scheme, while the complexity for the prover is always smaller in the group based scheme. The primary application area of the schemes is that of RFID systems, but they can also be used in applications with similar characteristics (e.g., in wireless sensor networks).

2.10 Related publications

[Buttyan et al., 2006a] Levente Buttyan, Tamas Holczer, and Istvan Vajda. Optimal key-trees for tree-based private authentication. In Proceedings of the International Workshop on Privacy Enhancing Technologies (PET), June 2006. Springer.

[Buttyan et al., 2006b] Levente Buttyan, Tamas Holczer, and Istvan Vajda. Providing location privacy in automated fare collection systems. In Proceedings of the 15th IST Mobile and Wireless Communication Summit, Mykonos, Greece, June 2006.

[Avoine et al., 2007] Gildas Avoine, Levente Buttyan, Tamas Holczer, and Istvan Vajda. Group-based private authentication. In Proceedings of the International Workshop on Trust, Security, and Privacy for Ubiquitous Computing (TSPUC 2007). IEEE, 2007.


Chapter 3

Location Privacy in Vehicular Ad Hoc Networks


3.1 Introduction

In this chapter, I investigate what level of privacy a driver can achieve in Vehicular Ad Hoc Networks (VANETs). More specifically, in the first half of this chapter, I investigate how a local eavesdropping attacker can trace the vehicles based on their frequently sent status information. In the second half of this chapter (from Section 3.4), I go a little further in terms of the strength of the attacker, and examine what a global eavesdropping attacker can do. After realizing its broad capabilities, I suggest an algorithm which can greatly reduce the attacker's success rate. Recently, initiatives to create safer and more efficient driving conditions have begun to draw strong support in Europe [COM], in the US [VSC], and in Japan [ASV]. Vehicular communications will play a central role in this effort, enabling a variety of applications for safety, traffic efficiency, driver assistance, and entertainment. However, besides the expected benefits, vehicular communications also have some potential drawbacks. In particular, many envisioned safety related applications require that the vehicles continuously broadcast their current position and speed in so called heart beat messages. This allows the vehicles to predict the movement of other nearby vehicles and to warn the drivers if a hazardous situation is about to occur. While this can certainly be advantageous, an undesirable side effect is that it makes it easier to track the physical location of the vehicles just by eavesdropping these heart beat messages. One approach to solve this problem is that the vehicles broadcast their messages under pseudonyms that they change with some frequency [Raya and Hubaux, 2005]. The change of a pseudonym means that the vehicle changes all of its physical and logical addresses at the same time. Indeed, in most of the applications, the important thing is to let other vehicles know that there is a vehicle at a given position moving with a given speed, but it is not really important which particular vehicle it is. Thus, using pseudonyms is just as good as using real identifiers as far as the functionality of the applications is concerned. Obviously, these pseudonyms must be generated in such a way that a new pseudonym cannot be directly linked to previously used pseudonyms of the same vehicle. Unfortunately, only changing pseudonyms is largely ineffective against a global eavesdropper that can hear all communications in the network. Such an adversary can predict the movement of the vehicles based on the position and speed information in the heart beat messages, and use this prediction to link different pseudonyms of the same vehicle together with high probability. For instance, if at time t, a given vehicle is at position p and moves with speed v, then after some short time τ, this vehicle will most probably be at position p + vτ. Therefore, the adversary will know that the vehicle that reports itself at (or near to) position p + vτ at time t + τ is the same vehicle as the one that reported itself at position p at time t, even if in the meantime, the vehicle changed


pseudonym. This problem can be solved with some silent periods. This is discussed in the second part of this chapter (from Section 3.4). On the other hand, the assumption that the adversary can eavesdrop all communications in the network is a very strong one. In many situations, it is more reasonable to assume that the adversary can monitor the communications only at a limited number of places and only in a limited range. In this case, if a vehicle changes its pseudonym within the non-monitored area, then there is a chance that the adversary loses its trace. My goal in the first half of the chapter is to characterize this chance as a function of the strength of the adversary (i.e., its monitoring capabilities). In the second part of the chapter, I assume a relatively small area, where a global eavesdropper is reasonable. I analyze what a global attacker can do, and suggest a simple algorithm to reduce the capabilities of a global attacker. In particular, my main contributions are the following:
• I define a model in which the effectiveness of changing pseudonyms can be studied. I emphasize that while changing pseudonyms has already been proposed in the literature as a countermeasure against tracking vehicles [Raya and Hubaux, 2005], to the best of my knowledge, the effectiveness of this method has never been investigated rigorously in this context. My model is based on the concept of the mix zone. This concept was first introduced in [Beresford and Stajano, 2003], but again, to the best of my knowledge, it has not been used in the context of vehicular networks so far. I characterize the tracking strategy of the adversary in the mix zone model, and I introduce a metric to quantify the level of privacy provided by the mix zone.

• I report on the results of an extensive simulation where I used my model to determine the level of privacy achieved in realistic scenarios. In particular, in my simulation, I used a rather complex road map, generated traffic with realistic parameters, and varied the strength of the adversary by varying the number of her monitoring points. As expected, my simulation results confirm that the level of privacy decreases as the strength of the adversary increases. However, in addition to this, my simulation results provide detailed information about the relationship between the strength of the adversary and the level of privacy achieved by changing pseudonyms.

• In Section 3.5, I provide a breakdown of the requirements that a system must address in order to provide privacy. The aim is to provide an analytical framework that future researchers can use to concisely state which aspects of privacy a new proposal does or does not address.

• In Section 3.6, I propose an approach for implementing mix zones that requires neither extensive RSU support nor complex communication between vehicles, and that does not endanger safety-of-life to any significant extent, while providing both syntactic mixing and semantic mixing (in the language of Section 3.5). To my knowledge, this is the first proposal that provides for semantic mixing while at the same time addressing the safety-of-life concerns that naturally arise when a vehicle tries to obscure its path. The key insights are simply that vehicles traveling at a low speed are less likely to cause fatal accidents, and that vehicles will be traveling at a low speed at natural mix points such as signalled intersections. The main body of experimental work in Section 3.6 is therefore an investigation of the consequences for the untraceability of vehicles if they stop sending heartbeat messages when their speed drops below a certain threshold and change all their identifiers after such silent periods. I call my scheme SLOW, which stands for silence at low speeds. (I note that SLOW is of course not a full solution to untraceability, as it does not cover the safe use of silent periods at high speeds; other techniques will need to be used to provide untraceability in this case.)

The organization of the chapter is the following: in Section 3.2, I introduce the mix zone model, I define the behavior of the adversary in this model, and I introduce my privacy metric. In Section 3.3, I describe my simulation setting and the simulation results for the mix zones. In Section 3.4, I introduce the global attacker scenario. Then, I introduce my overall analytical framework in Section 3.5. Next, in Section 3.6, I introduce my attacker model and my proposed


solution, and in Section 3.7, I present the results of my experiments showing that my approach does indeed make tracing of vehicles hard for the attacker, and that it is usable in the real world. Finally, I report on some related work in Section 3.8, and conclude the chapter in Section 3.9.

3.2 Model of local attacker and mix zone

3.2.1 The concept of the mix zone

I consider a continuous part of a road network, such as a whole city or a district of a city. I assume that the adversary has installed some radio receivers at certain points of the road network with which she can eavesdrop on the communications of the vehicles, including their heartbeat messages, in a limited range. On the other hand, outside the range of her radio receivers, the adversary cannot hear the communications of the vehicles. Thus, the road network is divided into two distinct regions: the observed zone and the unobserved zone. Physically, these zones may be scattered, possibly consisting of many observing spots and a large unobserved area, but logically, the scattered observing spots can be considered together as a single observed zone. This is illustrated on the left hand side of Figure 3.1.


Figure 3.1: On the left hand side: The figure illustrates how a road network is divided into an observed and an unobserved zone in the model. In the figure, the observed zone is grey, and the unobserved zone is white. The unobserved zone functions as a mix zone, because the vehicles change pseudonyms and mix within this zone, making it difficult for the adversary to track them. On the right hand side: The figure illustrates how the road network on the left can be abstracted as a single mix zone with six ports.

Note that the vehicles do not know where the adversary installed her radio receivers, or in other words, when they are in the observed zone. For this reason, we can assume that the vehicles continuously change their pseudonyms.¹ In this part of the chapter, we can abstract away the frequency of the pseudonym changes, and we can simply assume that it is high enough so that every vehicle surely changes pseudonym while in the unobserved zone. I intend to relax this assumption in my future work. Since the vehicles change pseudonyms while in the unobserved zone, that zone functions as a mix zone for vehicles (see the right hand side of Figure 3.1 for illustration). A mix zone [Beresford and Stajano, 2003; Beresford and Stajano, 2004] is similar to a mix node of a mix network [Chaum, 1981], which changes the encoding and the order of messages in order to make it difficult for the adversary to link message senders and message receivers. In my case, the mix zone makes it difficult for the adversary to link the vehicles that emerge from the mix zone to those that entered it earlier. Thus, the mix zone makes it difficult to track vehicles. On the other hand, based on the observation that I made in the Introduction, I assume that the adversary can track the physical location of the vehicles while they are in the observed zone, despite the fact that they may change pseudonyms in that zone too.
¹ Otherwise, if the vehicles knew when they were in the unobserved zone, then it would be sufficient to change their pseudonyms only once while they are in the unobserved zone.


Since the vehicles move on roads, they cannot cross the border between the mix zone and the observed zone at any arbitrary point. Instead, the vehicles cross the border where the roads cross it. We can model this by assuming that the mix zone has ports, and the vehicles can enter and exit the mix zone only via these ports. For instance, on the right hand side of Figure 3.1, the ports are numbered from 1 to 6.

3.2.2 The model of the mix zone

While the adversary cannot observe the vehicles within the mix zone, we can assume that she still has some knowledge about the mix zone. This knowledge is subsumed in a model that consists of a matrix Q = [q_ij] of size M × M, where M is the number of ports of the mix zone, and M² discrete probability density functions f_ij(t) (1 ≤ i, j ≤ M). q_ij is the conditional probability of exiting the mix zone at port j given that the entry point was port i, and f_ij(t) describes the probability distribution of the delay when traversing the mix zone between port i and port j. We can assume that time is slotted, which is why f_ij(t) is a discrete function. I note here that it is unlikely that an attacker achieves such a comprehensive knowledge of the mix zone; however, with comprehensive real-world measurements it is not impossible to approximate the needed probabilities and functions. In the rest of the chapter, the worst case is considered (as is advisable in the field of security): the attacker knows the model of the mix zone.

3.2.3 The operation of the adversary

The adversary knows the model of the mix zone and she observes events, where an event is a pair consisting of a port (port number) and a time stamp (time slot number). There are entering events and exiting events corresponding to vehicles entering and exiting the mix zone, respectively. Naturally, an entering event consists of the port where the vehicle entered the mix zone, and the time when this happened. Similarly, an exiting event consists of the port where the vehicle left the mix zone, and the time when this happened. The general objective of the adversary is to relate exiting events to entering events. More specifically, in the model, the adversary picks a vehicle v in the observed zone and tracks its movement until it enters the mix zone. In the following, I denote the port at which v entered the mix zone by s. Then, the adversary observes the exiting events for a time T such that the probability that v leaves the mix zone before T is close to 1 (i.e., Pr{t_out < T} = 1 - ε, where ε is a small number, typically in the range of 0.005-0.01, and t_out is the random variable denoting the time at which the selected vehicle v exits the mix zone). For each exiting vehicle v′, the adversary determines the probability that v′ is the same as v. For this purpose, she uses her observations and the model of the mix zone. Finally, she decides which exiting vehicle corresponds to the selected vehicle v. The decision algorithm used by the adversary is intuitive and straightforward: the adversary knows that the selected vehicle v entered the mix zone at port s and in timeslot 0. For each exiting event k = (j, t) that the adversary observes afterwards, she can compute the probability p_jt that k corresponds to the selected vehicle as p_jt = q_sj · f_sj(t) (i.e., the probability that v chooses port j as its exit port given that it entered the mix zone at port s, multiplied by the probability that it covers the distance between ports s and j in time t). The adversary decides for the vehicle for which p_jt is maximal. The adversary is successful if the decided vehicle is indeed v. Indeed, the decision algorithm described above realizes the Bayesian decision (see Section 3.2.4 for more details). The importance of this fact is that the Bayesian decision minimizes the error probability, thus, it is in some sense the ideal decision algorithm for the adversary.
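To make the decision rule concrete, the following sketch shows how the adversary could score the observed exiting events. (The original simulation tooling was written in Perl; this is an illustrative Python sketch, and the function and variable names are my own, not part of the thesis.)

    def pick_exit_event(s, exit_events, Q, F):
        """Return the exiting event most likely to belong to the target.

        s           -- entry port of the target (which entered at timeslot 0)
        exit_events -- observed exiting events as (port j, timeslot t) pairs
        Q           -- Q[i][j] = probability of exiting at port j given entry at port i
        F           -- F[i][j][t] = probability of delay t between ports i and j
        """
        best_event, best_p = None, -1.0
        for (j, t) in exit_events:
            p = Q[s][j] * F[s][j][t]      # p_jt = q_sj * f_sj(t)
            if p > best_p:
                best_event, best_p = (j, t), p
        return best_event, best_p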

3.2.4 Analysis of the adversary

In this section, I show that the decision algorithm of the adversary described in Subsection 3.2.3 realizes a Bayesian decision. The following notations are used:




- k is an index of a vector. Every port-timeslot pair can be mapped to such an index, and k can be mapped back to a port-timeslot pair. Therefore, indices and port-timeslot pairs are interchangeable, and in the following discussion, I always use the one which makes the presentation simpler.
- k ∈ {1, ..., M·T}, where M is the number of ports, and T is the length of the attack measured in timeslots.
- C = [c_k] is a vector, where c_k is the number of cars leaving the mix zone at k during the attack.
- N is the number of cars leaving the mix zone before timeslot T (i.e., N = Σ_{k=1}^{M·T} c_k).
- p_s(k) is the probability of the event that the target vehicle leaves the mix zone at k (port and time) conditioned on the event that it enters the zone at port s at time 0. The attacker knows exactly which port is s. Probability p_s(k) can be computed as p_s(k) = q_sj · f_sj(t), where port j and timeslot t correspond to index k.
- p(k) is the probability of the event that a vehicle leaves the mix zone at k (port and time). This distribution can be calculated from the input distribution and the transition probabilities: p(k) = Σ_{s=1}^{M} p_s(k).
- Pr(k|C) is the conditional probability that the target vehicle left the mix zone at the time and port defined by k, given that the attacker's observation is vector C.

We must determine the k for which the probability Pr(k|C) is maximal. Let us denote this k by k*. The probability Pr(k|C) can be rewritten using the Bayes rule:

    Pr(k|C) = Pr(C|k) p_s(k) / Pr(C)

Then k* can be computed as:

    k* = argmax_k Pr(C|k) p_s(k) / Pr(C) = argmax_k Pr(C|k) p_s(k)

Pr(C|k) has a multinomial distribution with the condition that at least one vehicle (the target of the attacker) must leave the mix zone at k:

    Pr(C|k) = N! / (c_1! ... c_{k-1}! (c_k - 1)! c_{k+1}! ... c_{MT}!) · p(k)^{c_k - 1} · Π_{j=1, j≠k}^{MT} p(j)^{c_j}

Pr(C|k) can be multiplied and divided by c_k p(k) to simplify the expression:

    Pr(C|k) = [ N! / (c_1! ... c_{MT}!) · Π_{j=1}^{MT} p(j)^{c_j} ] · c_k / p(k)

where the bracketed part is a constant, which does not have any effect on the maximization, thus it can be omitted:

    k* = argmax_k (c_k / p(k)) p_s(k) = argmax_k (c_k / (N p(k))) p_s(k) = argmax_k (p̂_k / p(k)) p_s(k)

where p̂_k is the empirical distribution of k (i.e., p̂_k = c_k / N). If the number of vehicles in the mix zone is large enough, then p̂_k / p(k) ≈ 1. Thus the correctness of the intuitive algorithm described in Subsection 3.2.3 holds:

    k* = argmax_k p_s(k)


This means that if many vehicles are traveling in the mix zone, then the attacker must choose the vehicle with the highest probability p_s(k).

3.2.5 The level of privacy provided by the mix zone

There are various metrics to quantify the level of privacy provided by the mix zone (and by the fact that the vehicles continuously change pseudonyms). A natural metric in the model is the success probability of the adversary when making her decision as described above. If the success probability is large, then the mix zone and changing pseudonyms are ineffective. On the other hand, if the success probability of the adversary is small, then tracking is difficult and the system ensures location privacy. We can note that the level of privacy is often measured using the anonymity set size as the metric [Chaum, 1988]; however, in this case, this approach cannot be used. The problem is that, as described above, with probability ε the selected vehicle v is not in the set V of vehicles exiting the mix zone during the experiment of the adversary, and therefore, by definition, V cannot be the anonymity set for v. Although the size of V could be used as a lower bound on the real anonymity set size, there is another problem with the anonymity set size as a privacy metric. Namely, it is an appropriate privacy metric only if each member of the set is equally likely to be the target of the observation; however, as we will see in Section 3.3, this is not the case in my model. Obviously, the success probability of the adversary is very difficult to determine analytically due to the complexity of the model. Therefore, I ran simulations to determine its empirical value in realistic situations. The simulation setting and parameters, as well as the simulation results, are described in the next section.

3.3 Simulation of mix zone

The purpose of the simulation is to get an estimation of the success probability of the attacker in realistic scenarios. In this section, I first describe the simulation settings, and then I present the simulation results.

3.3.1 Simulation settings

The simulation was carried out in three main phases. In the first phase, I generated a realistic map, on which the vehicles moved during the simulation. This map was generated by MOVE [Karnadi et al., 2005], a tool that allows the user to quickly generate realistic mobility models for vehicular network simulations. My map is illustrated in Figure 3.2. In fact, it is a simplified map of Budapest, the capital of Hungary, and it contains the main roads of the city. I believe that despite the simplifications, this map is still complex enough to get realistic traffic scenarios. The second phase of the simulation was to generate the movement of the vehicles on the generated map. This was done by SUMO [Krajzewicz et al., 2002], which is an open source micro-traffic simulator, developed by the Center for Applied Informatics (ZAIK) and the Institute of Transport Research at the German Aerospace Center. SUMO dumps the state of the simulation in every time step into files. This state dump contains the location and the velocity of every vehicle during the simulation. In the third phase of the simulation, I processed the state dump generated by SUMO, and simulated the adversary. This part of the simulation was written in Perl, because Perl scripts can easily process the XML files generated by SUMO. Note that for the purpose of repeatability, I made the source code available on-line at http://www.crysys.hu/holczer/ESAS07. I implemented the adversary as follows. First, I defined the observation spots (position and radius) of the adversary in a configuration file. Then, I let the adversary build her model of the mix zone (i.e., the complement of her observation spots) by allowing her to track the vehicles as if they did not change their pseudonyms. In effect, the adversary's knowledge is represented by a set of two dimensional tables. Each table K^(i) corresponds to a port i of the mix zone, and contains empirical



Figure 3.2: Simplified map of Budapest generated for the simulation.

probabilities. More specifically, the entry K^(i)_jt of table K^(i) contains the empirical probability that a vehicle exits the mix zone at port j in time t given that it entered the mix zone at port i at time 0. The size of the tables is M × T, where M is the number of the ports of the mix zone and T is the duration of the learning procedure, defined as the time until which every observed vehicle left the mix zone. Once the adversary's knowledge is built, she can use it for making decisions as described above in Section 3.2. I executed several simulation runs in order to get an estimation for the success probability of the adversary. Experiments with adversaries of different strength were made, where the strength of the adversary depends on the number of her eavesdropping receivers. In the simulations, all receivers were deployed in the middle of the junctions of the roads. The eavesdropping radius of the receivers was set to 50 meters. The number of receivers varied between 5 and 59 with a step size of 5 (note that the map contains 59 junctions). The junctions with the highest traffic were always chosen as the observation spots of the adversary (for instance, when the adversary had ten receivers, I chose the ten junctions with the largest traffic). In addition to the strength of the adversary, the intensity of the traffic was varied. More specifically, I simulated three types of traffic: low, medium, and high. Low traffic means that in each time step 250 vehicles are emitted into the traffic flow, medium traffic means 500 emitted vehicles, and high traffic means 750 emitted vehicles. For each simulation setting (strength of the adversary and intensity of the road traffic), 100 simulations were performed.
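As an illustration of this learning phase, the following sketch builds the empirical tables K^(i) from logged mix zone traversals, where each traversal is an (entry port, exit port, delay) triple. (The actual implementation was a set of Perl scripts; this Python sketch and its names are hypothetical.)

    def build_adversary_model(traversals, M, T):
        """Build K[i][j][t] = empirical Pr(exit at port j in slot t | entry at port i)."""
        counts = [[[0] * T for _ in range(M)] for _ in range(M)]
        entries = [0] * M
        for (i, j, t) in traversals:          # observed while pseudonym changes are ignored
            if t < T:
                counts[i][j][t] += 1
                entries[i] += 1
        return [[[counts[i][j][t] / entries[i] if entries[i] else 0.0
                  for t in range(T)]
                 for j in range(M)]
                for i in range(M)]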


3.3.2 Simulation results

Figure 3.3 contains the resulting success probabilities of the adversary as a function of her strength. The different curves belong to different traffic intensities. The results are quite intuitive: we can conclude that the stronger the adversary, the higher her success probability. Note, however, that above a given strength, the success probability saturates at about 60%. Higher success probabilities cannot be achieved, because the order of the vehicles may change between junctions without the adversary being capable of tracking that. Note also that the saturation point is


reached when only half of the junctions are controlled. The intensity of the traffic is a much less important parameter than the strength of the attacker: above a given attacker strength, the success probability of the attacker is nearly independent of the traffic intensity.

[Figure 3.3 plot: x axis: number of attacker antennas; y axis: success probability of an attack [%]; three curves for low, medium, and high traffic.]

Figure 3.3: Success probabilities of the adversary as a function of her strength. The three curves represent three different scenarios (the darker the line, the more intensive the traffic).

The dark bars in Figure 3.4 show how the size of the set V of the vehicles that exit the mix zone during the observation period, and from which the adversary has to select the target vehicle, varies with the strength of the adversary. The three sub-figures are related to the three different traffic situations (low traffic left, medium traffic middle, high traffic right). While the size of V seems to be large (which seemingly makes the adversary's decision difficult), it is also interesting to examine how uniform this set V is in terms of the probabilities assigned to the vehicles in V. Recall that the adversary computes a probability p_jt for each vehicle v′ in V, which is the probability of v′ = v. These probabilities can be normalized to obtain a distribution, and the entropy of this distribution can be computed. From this entropy, I computed the effective size of V (i.e., the size to which V can be compressed due to the non-uniformity of the distribution over its members), and the light bars in the figure illustrate the obtained values. As we can see, the effective size of V is much smaller than its real size, which means that the distribution corresponding to the members of V is highly non-uniform. This is the reason why the adversary can be successful.
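The effective size of V referred to above can be computed as 2^H, where H is the entropy of the normalized probabilities assigned to the vehicles in V. A minimal Python sketch of this calculation (the scores and function name are illustrative):

    import math

    def effective_set_size(scores):
        """Effective anonymity set size 2^H for the normalized score distribution."""
        total = sum(scores)
        probs = [s / total for s in scores if s > 0]
        H = -sum(p * math.log2(p) for p in probs)
        return 2 ** H

    # Example: four candidates with very unequal scores yield an effective size well below 4.
    print(effective_set_size([0.7, 0.2, 0.05, 0.05]))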

3.4 Global attacker

In the following part of this chapter, I assume a global eavesdropping attacker instead of a local attacker. A global eavesdropping attacker can hear all of the messages sent by the vehicles. Defending against such an attacker is a more challenging task compared to the local attacker scenario. My work is inspired by the work of [Freudiger et al., 2007]; however, the scheme of [Freudiger et al., 2007] requires the use of significant infrastructure. By replacing the cryptographic mix zones of [Freudiger et al., 2007] with zones of silence, I address semantic mixing and infrastructure requirements simultaneously. In the following, in Section 3.5, I give a framework in which the minimal requirements for providing privacy for vehicles are analyzed. Next, in Section 3.6, I introduce my attacker model and my proposed solution, and in Section 3.7, I present the results of my experiments showing that my approach does indeed make tracing of vehicles hard for the attacker, and that it is usable in the real world.



Figure 3.4: The dark bars show how the size of the set V of the vehicles that exit the mix zone during the observation period varies with the strength of the adversary (y axis: number of attacker antennas). The three sub-figures are related to the three different traffic situations (low traffic left, medium traffic middle, high traffic right). The light bars illustrate the effective size of V. As we can see, the effective size is much smaller than the real size, which means that the distribution corresponding to the members of V is highly non-uniform.

3.5 Framework for location privacy in VANETs

Any system that aims to provide privacy for vehicles must address the following areas²:

Syntactic privacy. In brief, all vehicles that use pseudonyms must change those pseudonyms from time to time. This area includes:

N1 Pseudonymity: An identifier that is available to an eavesdropper must not be directly linkable to the vehicle (for example, it must not contain the VIN, the driver's name, or anything else an eavesdropper might know).
N2 Change of identifiers: Identifiers must change with some frequency³.
N3 Local synchronization of change of identifiers: All identifiers, up and down the network stack, must change simultaneously. (This is not a communications issue as such, but a local engineering issue; however, it must be addressed.)
N4 Cooperative synchronization of change of identifiers or syntactic mixing: A vehicle in an observed area must change its identifier at the same time as at least one other vehicle, and the two (or more) changing vehicles must do so in a way that allows semantic privacy as defined below⁴.
N5 Pseudonym use: This covers two intermingled areas:
² This section is mainly based on the work of my coauthor, William Whyte [Buttyan et al., 2009].
³ The frequency of change that provides privacy to the level expected by a user will in practice often depend on local regulation.
⁴ Otherwise, an attacker who sees, for instance, identifiers (A, B, C, D) at time t and (A, B, C, E) at time t + 1 will know that D and E refer to the same vehicle.


N5.1 Pseudonym format: What cryptographic mechanism is used by pseudonym owners to authenticate that they are valid units within the system?
N5.2 Pseudonym issuance and renewal: How are pseudonyms issued? How does a vehicle avoid running out of them? (The answer to this may involve the identifier change frequency, N2.) What assumptions are necessary about the infrastructure to ensure that a vehicle is not left without pseudonyms?

Semantic privacy. This captures the idea that vehicles must not be traceable by reconstructing the trajectories implied by their heartbeat messages. This area includes:

M1 Semantic unlinkability: A vehicle's stream of heartbeat messages must be interrupted at some frequency for some period of time.
M2 Semantic mixing: Semantic unlinkability is valuable mainly in so far as it creates ambiguity for an attacker about whether a resumed stream of heartbeats comes from vehicle A or vehicle B.

Robust privacy. This captures how misbehaving entities within the system may affect privacy and security. This area includes:

R1 Privacy-preserving bad-actor removal: How is a misbehaving entity removed? Does this removal affect the privacy of its transmissions before it began to misbehave? Does its removal affect the privacy of other entities in the system?
R2 Privacy against insider attacks: How is privacy protected against bad actors in Law Enforcement or at a Certificate Authority (CA)?

This part of the chapter explicitly contributes in the areas of syntactic mixing (N4), semantic mixing (M2), and semantic unlinkability (M1). The results are based on the assumption that pseudonyms are changed whenever the criteria are met. This will be fairly frequent, on the order of once every few minutes for urban driving, implicitly addressing N2. An identifier change frequency this high may require frequent reissuance of pseudonyms, limiting the choices possible in areas N5.1 and N5.2. To the best of my understanding, the following proposal is compatible with any reasonable solution for N1, N3, R1, or R2.

3.6 Attacker Model and the SLOW algorithm

A global attacker is assumed who can achieve mass coverage. Conceptually, the attacker might be the RSU network operator that has access to messages received by all RSUs, or the attacker might have set up a network covering an entire city⁵. This is clearly an extremely powerful attack model, perhaps too powerful to be plausible, but we can use it because if the system is secure in the face of this attacker, it will be secure in the face of other, weaker attackers too. The attacker can use two basic mechanisms to link transmissions from a vehicle: (1) linking pseudonyms or other identifiers between heartbeat messages (syntactic linking), and (2) using the position and velocity information in the heartbeat messages to reconstruct the trajectory of the vehicle (semantic linking). We assume no supporting infrastructure in terms of an RSU network; therefore, vehicles must have a strategy to create their own mix zones, and that strategy must work even in the case where the attacker has 100% coverage. The defender's mechanism is to turn off radio transmissions (to make semantic linking difficult) and to change pseudonyms while the radio is turned off (to make syntactic linking difficult), without endangering safety of life.

More precisely, the proposed solution, which is called SLOW for Silence at LOW speeds, works as follows. We can choose a threshold speed vT, say vT = 30 km/h. A vehicle will not broadcast any heartbeat message, or any other message containing location or trajectory data in the clear, if it is traveling below speed vT, unless this is necessary for safety-of-life reasons. If the vehicle has not sent a message for a certain period of time, then it changes pseudonyms (identifiers at all layers of the network stack and the related certificates) before the next transmission. Traffic signals in a crowded urban area seem like an ideal location for such a pseudonym change: whenever a crowd of vehicles stops at a traffic signal, they may go into one of several lanes, they may choose to turn or not turn, and so on. Thus, mix zones are created at the points where there is maximum uncertainty about exactly where a vehicle is and exactly what it is going to do next. This is also a safe set of circumstances under which to stop transmitting. Only 5% of pedestrians struck by a vehicle at 20 km/h die [Leaf and Preusser, 1999], while at 50 km/h the figure is 40%. Presumably, vehicle-to-vehicle collisions where both cars are traveling at 30 km/h result in even fewer fatalities.

Situations can be defined as exceptions. For instance, if vehicle A is stopped at a signal, but vehicle B coming up behind it emits a heartbeat that lets vehicle A know that there is a risk of a collision, then vehicle A can send out a heartbeat to warn vehicle B to brake. We can note that the simulations do not include this exception case, because in practice these cases come up only rarely. Future research based on SLOW will investigate this exception case in greater detail. We can also note that an attacker can abuse exception cases to break the silent period, but such an attacker (unless it is an inside attacker) can be tracked down by standard methods and revoked.

Besides being very simple to implement, SLOW has other advantages. Traffic jams and slow traffic lead to a large number of vehicles in transmission range and therefore require extensive processing power to verify the digital signatures of all incoming heartbeat messages. By refraining from sending heartbeat messages, SLOW avoids the necessity of extensive signature verifications in traffic jams and slow traffic, and thus reduces hardware cost. A more detailed analysis of the impact on computation complexity, as well as of the level of privacy and safety provided by the scheme, is presented in the next section.

⁵ Fraunhofer Institute has established that the hardware cost (ignoring the backhaul connections) to set up receivers covering all 900 km² of Berlin is about 250,000 Euros.
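A minimal sketch of the per-vehicle SLOW logic is given below (Python pseudocode; the threshold value, the minimal silent period, and the function names are my own illustrative assumptions, not part of the original specification):

    V_T = 30 / 3.6        # assumed threshold speed: 30 km/h expressed in m/s
    MIN_SILENCE = 5.0     # assumed minimal silent period (seconds) before a pseudonym change

    def change_all_identifiers():
        """Placeholder: atomically switch the pseudonym and all lower-layer identifiers."""
        pass

    class SlowState:
        def __init__(self):
            self.last_tx_time = 0.0
            self.silent = False

        def on_beacon_timer(self, now, speed, safety_exception=False):
            """Called at every heartbeat opportunity; returns True if a heartbeat is sent."""
            if speed < V_T and not safety_exception:
                self.silent = True                   # below threshold: stay silent
                return False
            if self.silent and now - self.last_tx_time >= MIN_SILENCE:
                change_all_identifiers()             # change identifiers after the silent period
            self.silent = False
            self.last_tx_time = now
            return True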

3.7 Analysis of SLOW

3.7.1 Privacy

It must be intuitively clear that a vehicle frequently sending out heartbeat messages is easy to trace, but to the best of my knowledge, no accurate experiment confirms this statement in VANET settings. As field experiments cannot be done due to the lack of the envisioned VANET infrastructure, simulations were carried out to measure the level of traceability in an urban setting. The SUMO [Krajzewicz et al., 2002] simulation environment was used, as it is a realistic, microscopic urban traffic simulator. SUMO was set to use a 100 Hz frequency for the internal update of vehicle positions and velocities, and every N-th position (N depending on the heartbeat frequency) was considered to be available to the attacker as a heartbeat. Note that tracing vehicles in an urban setting is essentially a multitarget tracking problem, which has an extensive literature, however mostly related to radar development in the fields of aviation and sailing [Gruteser and Hoh, 2005]. Yet, the following tracking approach, consisting of three steps, can be adopted to the vehicular setting too: first, the actual position and speed of the targets are recorded by eavesdropping the heartbeat messages. Based on the position and speed information, a predicted new position is calculated, which can be further refined with the help of side information such as the layout of the streets, lanes, etc. At the next heartbeat, the new positions are eavesdropped and matched with the predicted positions. I implemented an attacker that tracked the vehicles in the SUMO output based on the tracking approach described above. The attacker uses the last two heartbeats to calculate the acceleration of the vehicles, making the prediction of the next position more accurate. The vehicles are tracked from their departure to their destination. Tracking is considered successful if the attacker has not lost the target through its entire journey. The results of the tracking of 50 vehicles are shown in Figure 3.5. As we can see, if the beaconing frequency is 5-10 Hz, which is needed for most of the safety applications, then 75-80%



[Figure 3.5 plot: x axis: beacon frequency [1/s]; y axis: success rate of tracing [%].]

Figure 3.5: Success rate of an attacker performing vehicle tracking by semantic linking of heartbeat messages when no defense mechanisms are in use.

of the vehicles are tracked successfully. By evaluating the unsuccessful cases, we can observe that the target vehicles were lost at their destinations. More precisely, in the vast majority of the unsuccessful cases, when the target vehicle V1 arrived at its destination and stopped sending messages, if another vehicle V2 was in its vicinity, then the attacker continued tracking V2 as if it was V1. I counted this as an unsuccessful case, because the attacker erroneously determined the destination of the target vehicle (i.e., it concluded that the destination of V1 was that of V2, and those two destinations have virtually never been the same). However, during the movement of the target vehicles (i.e., before they reached their destination), the attacker was able to track them with a remarkable 99% success rate. This confirms that semantic linking is a real problem. In any case, from a privacy point of view, a system where the users are traceable with probability 0.75-0.8 is not acceptable. My proposed silent period scheme, where the vehicles stop sending heartbeat messages below a given speed, mitigates this problem. It must be clear that the tracking algorithm described above does not work when the vehicles regularly stop sending heartbeats. Yet, the attacker may use other side information, such as the probability of turning in a given direction at an intersection, to improve the success probability of tracking despite the absence of the heartbeats. Thus, we need a new attacker model that also accounts for such side knowledge of the attacker. We can formalize the knowledge of the attacker as follows (for a summary of the notation the reader is referred to Table 3.1): first, each intersection is modeled with a binary matrix J, where each row corresponds to an ingress lane and each column corresponds to an egress lane of the intersection, and Jij (the entry in the i-th row and j-th column) is 1 if it is possible to traverse the intersection by arriving in ingress lane i and leaving in egress lane j. As an example, consider the intersection shown in Figure 3.6 and its corresponding matrix J defined in (3.1):

    J = [ 0 0 0 1 1
          0 0 1 0 0
          1 1 0 1 1        (3.1)
          0 0 1 0 0
          1 1 0 0 0 ]
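To illustrate how this a priori knowledge might be represented, the following sketch encodes the example matrix J from (3.1) together with a lane-choice vector T, and derives the egress-lane distribution for a vehicle arriving on a given road. (This is a Python illustration; the data structures, the grouping of lanes into roads, and the function name are my own assumptions, not taken from the thesis.)

    J = [[0, 0, 0, 1, 1],     # matrix (3.1): J[i][j] = 1 if ingress lane i can reach egress lane j
         [0, 0, 1, 0, 0],
         [1, 1, 0, 1, 1],
         [0, 0, 1, 0, 0],
         [1, 1, 0, 0, 0]]
    T = [0.6, 0.4, 1.0, 0.8, 0.2]                   # Pr(ingress lane i | road containing lane i)
    ROADS = {"A": [0, 1], "B": [2], "C": [3, 4]}    # hypothetical grouping of ingress lanes into roads

    def egress_distribution(road):
        """Pr(egress lane j | vehicle enters on the given road), assuming uniform egress choice."""
        n = len(J[0])
        dist = [0.0] * n
        for i in ROADS[road]:
            feasible = [j for j in range(n) if J[i][j]]
            for j in feasible:
                dist[j] += T[i] / len(feasible)
        return dist

    print(egress_distribution("A"))   # lane 1 -> egress lanes 4 and 5 equally; lane 2 -> egress lane 3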



Table 3.1: Notation in SLOW

    vT     threshold speed
    J      junction descriptor matrix
    m      number of lanes towards the junction
    n      number of lanes from the junction
    T      probability distribution of the target's lanes
    W      number of waiting vehicles per lane
    w      number of waiting vehicles in the junction
    L      list of egress events
    ℓ_D    decision of the attacker
    ℓ      the target's real egress event
    L_S    list of suspect events

Second, we can assume that the accuracy of GPS receivers does not permit deciding with certainty which lane of a road a given vehicle is using. Therefore, we can also assume that the attacker knows on which road a target vehicle enters the intersection, but it does not know which ingress lane it is using. Nevertheless, the attacker may have some a priori knowledge of the probability of an incoming vehicle choosing a given ingress lane on a given road in a given intersection; such knowledge may be acquired by visually observing the traffic in that intersection for some time. These probabilities can be arranged in an m-dimensional vector T, where the i-th element Ti is the probability of choosing ingress lane i when entering the intersection on the road that contains ingress lane i. As an example, consider the intersection in Figure 3.6, and the vector T = (0.6, 0.4, 1, 0.8, 0.2). This would mean that vehicles arriving at the intersection on the road that contains ingress lanes 1 and 2 choose lane 1 with probability 0.6 and lane 2 with probability 0.4. Note that vehicles arriving on the road that contains only ingress lane 3 have no choice, hence T3 in this example is 1.

Third, when multiple possible egress lanes correspond to a given ingress lane (i.e., there is more than one 1 in a given row of matrix J), we can assume that vehicles choose any of those egress lanes uniformly at random. For example, a vehicle arriving in ingress lane 1 of the intersection in Figure 3.6 can leave the intersection in egress lane 4 or 5 with equal probability.

Finally, when the target vehicle arrives at an intersection, there may already be some other vehicles waiting or moving below the threshold speed in that intersection. The number of such silent vehicles in ingress lane i is denoted by Wi, and the m-dimensional vector containing all Wi values is denoted by W. Note that due to the previous assumption that the attacker is not always able to precisely determine the ingress lane used by an incoming vehicle, it is also unable to determine the exact values of all the Wi's; nevertheless, it can use its experimental knowledge of the probabilities of choosing a given lane, represented by vector T, to at least estimate the Wi values.

Let us denote by L the list of vehicles that leave the intersection (and thus restart sending heartbeats) after the target entered the intersection (and thus stopped sending heartbeats). More precisely, each element Lk of list L is a (timestamp, road) pair (t, r) that represents a vehicle reappearing on road r at time t. The objective of the attacker is to decide which Lk corresponds to the target vehicle. Let us denote by ℓ_D the list element chosen by the attacker, and let ℓ be the list element that really corresponds to the target vehicle. The attacker is successful if and only if ℓ_D = ℓ. In theory, the optimal decision is the following:

    ℓ_D = argmax_k Pr(Lk | J, T, W, L)

where Pr(Lk | J, T, W, L) is the probability of Lk being the right decision given all the knowledge of the attacker. However, it seems to be difficult to calculate (or estimate) all these conditional



Figure 3.6: An example intersection; the corresponding matrix is given in (3.1).

probabilities, as they would have to be determined for every possible intersection (J), number of waiting vehicles in the intersection (W), and observation of egress events (L). Hence, I assume a more simplistic attacker that uses the following tracking algorithm: let us denote by w the total number of silent vehicles in the intersection when the target vehicle arrives and stops sending heartbeats. The attacker decides on the w-th element of L, unless that entry surely cannot correspond to the target (e.g., it is not possible to leave the intersection on the road in the w-th element of L given the road on which the target arrived at the intersection). When the w-th element of L must be excluded, the attacker chooses the next element on the list L that cannot be excluded. This simple attacker model essentially assumes that traffic at an intersection follows the FIFO (First In First Out) principle. While this is clearly not the case in practice, the attacker still achieves a reasonable success rate in a single intersection, as shown in Figure 3.7. One can see, for instance, that when the total number of vehicles is 100, the attacker can still track a target vehicle through a single intersection with probability around 1/2. Figure 3.8 shows the success rate of the attacker in the general case, when the target traverses multiple intersections between its starting and destination points. As expected, the tracking capabilities of the attacker in this case are worse than in the single intersection case. The quantitative results of the simulation experiments suggest that only around 10% of the vehicles can be tracked fully by the attacker when the threshold speed is larger than 22 km/h (approximately 6 m/s). The effectiveness of the attacker depends on the threshold speed vT and on the density of the vehicles. In general, the higher the threshold speed at which vehicles stop sending heartbeats, the higher the chance that the attacker loses the target (i.e., the lower the chance of successful tracking). Moreover, in a dense network, it is more difficult to track vehicles. Note, however, that there is an important difference in practice between the traffic density and the threshold speed, namely, that the threshold speed can be influenced by the owner of the vehicle, while the traffic density cannot be.
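The FIFO-based decision described above can be sketched as follows (Python; the representation of egress events, the feasibility test, and the exact indexing convention are illustrative assumptions):

    def fifo_decision(w, egress_events, feasible):
        """Simple FIFO attacker: pick the w-th egress event unless it is infeasible.

        w             -- number of silent vehicles in the intersection when the target arrives
        egress_events -- the chronologically ordered list L of (timestamp, road) pairs
        feasible      -- predicate returning False for events that surely cannot be the target
                         (e.g., an impossible turn given the target's ingress road)
        """
        idx = w                      # assumed 0-based indexing: the w vehicles ahead leave first
        while idx < len(egress_events) and not feasible(egress_events[idx]):
            idx += 1                 # skip excluded entries, otherwise keep the FIFO order
        return egress_events[idx] if idx < len(egress_events) else None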



[Figure 3.7 plot: x axis: threshold speed [m/s]; y axis: success rate of tracing [%]; curves for 50, 100, 150, and 200 vehicles.]

Figure 3.7: Success rate of the simple attacker in a single intersection. Different curves belong to different experiments, with the total number of vehicles given in the legend.

[Figure 3.8 plot: x axis: threshold speed [m/s]; y axis: success rate of tracing [%]; curves for 50, 100, 150, and 200 vehicles.]

Figure 3.8: Success rate of the simple attacker in the general case, when the target traverses multiple intersections between its starting and destination points. Different curves belong to different experiments, with the total number of vehicles given in the legend.



3.7.2 Effects on safety

The main objective of vehicular communications is to increase road safety. However, refraining from sending heartbeat messages may seem to be in contradiction with this objective. Note, however, that I propose to refrain from sending heartbeats only below a given threshold speed, and I argue below that this may not endanger the objective of road safety. According to [Leaf and Preusser, 1999], only 5% of pedestrians struck by a vehicle at 20 km/h die, while this figure is 40% at 50 km/h. In [Kloeden et al., 1997], it is shown that in a 60 km/h speed limit area, the risk of involvement in a casualty crash doubles with each 5 km/h increase in traveling speed above 60 km/h. In [Baruya, 1998], it is shown that a 1 km/h change in speed can influence the probability of an accident by 3.45%. The statistical figures above show that at lower speed the probability of an accident is lower too. This is because vehicles usually go at lower speed in areas where the drivers need to be more careful (hence the speed limit). Thus, it makes sense to rely more on the awareness of the drivers to avoid accidents at lower speeds. On the other hand, at higher speeds, accidents can be more severe, and warnings from the vehicular safety communication system can play a crucial role in avoiding fatalities.

3.7.3 Effects on computation complexity

A great challenge in V2V communication deployment is the processing power of the vehicles [Kargl et al., 2008]. The most demanding task of the On Board Unit (OBU) is the verification of the signatures on the received heartbeat messages. This problem can be partially handled by not attaching certificates to every heartbeat message [Calandriello et al., 2007], but that does not solve the problem of verifying the signatures on the messages. In principle, the heavier the traffic, the more vehicles are in each other's communication range. More vehicles send more heartbeats, overwhelming each other. The number of vehicles in communication range depends on the average speed of the traffic, assuming that the vehicles keep a safety distance between each other depending on their speed. In Figure 3.9, the results of some simple calculations can be seen, showing the number of signature verifications performed as a function of the average speed. In this calculation, vehicles are assumed to follow each other within 2 seconds. The communication range is assumed to be 100 m and the heartbeat frequency is 10 Hz. It can be seen in the figure that, in a traffic jam on an 8-lane road, each vehicle must verify as many as approximately 8,000 signatures per second. If SLOW is used with a threshold speed of around 30 km/h (approximately 8 m/s), then the vehicles never need to verify more than 1,000 signatures per second (assuming all other parameters are the same as before). This approach also works well in combination with congestion control, where the transmission power is reduced in high density traffic scenarios. My approach therefore makes the hardware requirements of the OBU much lower and enables the use of less expensive devices.
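The order of magnitude of these numbers can be reproduced with a back-of-the-envelope calculation along the following lines (Python sketch; the 2-second headway, 100 m range, and 10 Hz beacon rate are taken from the text, while the minimum spacing used to cap the jam density is my own assumption):

    COMM_RANGE = 100.0      # m, radio range
    HEADWAY = 2.0           # s, assumed following-time rule
    BEACON_FREQ = 10.0      # Hz, heartbeat frequency
    MIN_SPACING = 2.5       # m, assumed minimal bumper-to-bumper spacing in a jam

    def verifications_per_second(speed, lanes):
        """Approximate number of signature verifications per second at a given average speed (m/s)."""
        spacing = max(speed * HEADWAY, MIN_SPACING)      # distance between consecutive vehicles
        per_lane = (2 * COMM_RANGE) / spacing            # neighbors within +/- COMM_RANGE per lane
        return lanes * per_lane * BEACON_FREQ

    # roughly 6,000-8,000 verifications/s near standstill, about 1,000/s at 8 m/s, on an 8-lane road
    for v in (1.0, 8.0, 30.0):
        print(v, round(verifications_per_second(v, lanes=8)))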

3.8 Related work

The privacy of VANETs is a recent topic. Many authors have addressed VANETs and their security and privacy (for example in [Aoki and Fujii, 1996; Luo and Hubaux, 2004; McMillin et al., 1998; Chisalita and Shahmehri, 2002; El Zarki et al., 2002; Dötzer, 2006; Hubaux et al., 2004; Raya and Hubaux, 2005; Raya and Hubaux, 2007; Gerlach, 2006; Ma et al., 2010; Wiedersheim et al., 2010]). A good online bibliography for the security of VANETs can be found in [Lin and Lu, 2012]. The problem of providing location privacy for VANETs is categorized into classes in [Gerlach, 2006]; the difference between the classes is the goal and the strength of the attacker. In [Choi et al., 2005], Choi et al. investigate how to obtain a balance between privacy and audit requirements in vehicular networks using only symmetric primitives. Ren et al. analyze the location privacy problems in VANETs with attack trees in [Ren et al., 2011]. Many privacy preserving techniques have been suggested for on-line transactions (for example in [Chaum, 1988; Gulcu and Tsudik, 1996]). Mainly they are based on mix networks [Kesdogan



[Figure 3.9 plot: x axis: speed [m/s]; y axis: number of signatures to be verified [1/s]; curves for 2, 4, 6, and 8 lanes.]

Figure 3.9: Number of signatures to be verified as a function of the average speed. The communication range is 100 m, and the heartbeat frequency is 10 Hz. The safety distance between the vehicles depends on their speed.

et al., 1998; Reiter and Rubin, 1998], a concept originally proposed by Chaum in 1981 [Chaum, 1981]. A single mix collects messages, mixes them, and sends them towards their destinations. A mix network consists of single mixes which are linked together; in a mix network, some misbehaving mixes cannot break the anonymity of the senders/receivers. An evident extension of mix networks to the off-line world is the mix zone, proposed by Beresford et al. in [Beresford and Stajano, 2003; Beresford and Stajano, 2004]. A mix zone is a place where the users of the network are mixed, so that after leaving the mix zone, they cannot be distinguished from each other. The problem of providing location privacy in wireless communication is well studied by Hu and Wang in [Hu and Wang, 2005]. They built a transaction-based wireless communication system in which transactions are unlinkable, and give detailed simulation results. Their solution can provide location privacy for real-time applications as well. To qualify the operation of mix zones, the offered anonymity must be measured. The first metric, proposed by Chaum [Chaum, 1988], was the size of the anonymity set. It is a good metric only if every user leaving the mix zone is the target with the same probability. If the probabilities are different, then an entropy-based metric should be used. Entropy-based metrics were suggested by Díaz et al. [Diaz et al., 2002] and Serjantov et al. [Serjantov and Danezis, 2003] at the same time. To the best of my knowledge, one of the most relevant works with respect to SLOW is that of Sampigethaya et al. [Sampigethaya et al., 2005; Sampigethaya et al., 2007]. In these papers, they study the problem of providing location privacy in VANETs in the presence of a global adversary. A location privacy scheme called CARAVAN is also proposed. The main idea of the scheme is that random silent periods [Huang et al., 2005] are used in the communication to avoid continuous traceability. The solution is evaluated only in a freeway model and in a randomly generated Manhattan street model. Lu et al. arrive at similar conclusions as SLOW in [Lu et al., 2012], namely, that the pseudonyms should be changed at intersections with high traffic. The main difference between the two approaches is that in their paper, the vehicles are aware of the possible zones from a predefined


map, so the mix zones are defined a priori. They use a game theoretic approach to analyze their model. The change of pseudonyms may also have a detrimental effect, especially on the efficiency of routing and the packet loss ratio. In [Schoch et al., 2006], Schoch et al. investigated this problem and proposed some approaches that can guide system designers to achieve both a given level of privacy protection and a reasonable level of performance. Another proposed approach provides multiple certificates in vehicles based on the combination of group signatures and multiple self-issued certificates [Calandriello et al., 2007; Armknecht et al., 2007]. The disadvantage is that On Board Units (OBUs) need to perform expensive group signature verification operations, and that OBUs are empowered to mount Sybil attacks. [Studer et al., 2008] uses group signatures to request temporary certificates from a CA in an anonymous manner without the disadvantages of the previous scheme, but at the cost of an available connection to the CA. My solution suggested in Section 3.6 accounts for a global attacker without the support of the RSU infrastructure.

3.9 Conclusion

In the first half of this chapter, from Section 3.2, I studied the effectiveness of changing pseudonyms to provide location privacy for vehicles in vehicular networks. The approach of changing pseudonyms to make location tracking more difficult was proposed in prior work, but its effectiveness had not been investigated before. In order to address this problem, I defined a model based on the concept of the mix zone. I assumed that the adversary has some knowledge about the mix zone, and based on this knowledge, she tries to relate the vehicles that exit the mix zone to those that entered it earlier. I also introduced a metric to quantify the level of privacy enjoyed by the vehicles in this model. In addition, I performed extensive simulations to study the behavior of the model in realistic scenarios. In particular, in the simulation, I used a rather complex road map, generated traffic with realistic parameters, and varied the strength of the adversary by varying the number of her monitoring points. My simulation results provided detailed information about the relationship between the strength of the adversary and the level of privacy achieved by changing pseudonyms. I abstracted away the frequency with which the pseudonyms are changed, and I simply assumed that this frequency is high enough so that every vehicle surely changes pseudonym while in the mix zone. Changing the pseudonyms frequently has some advantages, as frequent changes increase the probability that the pseudonym is changed in the mix zone. On the other hand, the higher the frequency, the larger the cost that the pseudonym changing mechanism induces on the system in terms of the management of cryptographic material (keys and certificates related to the pseudonyms). In addition, if for a given frequency the probability of changing pseudonym in the mix zone is already close to 1, then there is no sense in increasing the frequency further, as it will no longer increase the level of privacy while it will still increase the cost. Hence, there seems to be an optimal value for the frequency of the pseudonym change. Unfortunately, this optimal value depends on the characteristics of the mix zone, which is ultimately determined by the observing zone of the adversary, which is not known to the system designer.

In the second half of the chapter, from Section 3.4, I proposed a simple and effective privacy preserving scheme, called SLOW, for VANETs. SLOW requires vehicles to stop sending heartbeat messages below a given threshold speed (this explains the name SLOW, which stands for silence at low speeds) and to change all their identifiers (pseudonyms) after each such silent period. By using SLOW, the vicinity of intersections and traffic lights becomes a dynamically created mix zone, as there are usually many vehicles moving slowly at these places at any given moment in time. In other words, SLOW implicitly ensures a synchronized silent period and pseudonym change for many vehicles both in time and space, and this makes it effective as a location privacy enhancing scheme. Yet, SLOW is remarkably simple, and it has further advantages. For instance, it relieves vehicles of the burden of verifying a potentially large number of digital signatures when the vehicle density is large, as this usually happens when the vehicles move slowly in a traffic jam or stop at


intersections. Finally, the risk of a fatal accident at a slow speed is low, and therefore, SLOW does not seriously impact safety-of-life. I evaluated SLOW in a specific attacker model that seems to be realistic, and it proved to be effective in this model, reducing the success rate of tracking a target vehicle from its starting point to its destination down to the range of 10-30%. As a conclusion of this chapter, I analyzed what a local and a global eavesdropping attacker can do when trying to trace vehicles in VANETs, and gave an efficient countermeasure against the stronger global attacker.

3.10 Related publications

[Buttyan et al., 2007] Levente Buttyán, Tamás Holczer, and István Vajda. On the effectiveness of changing pseudonyms to provide location privacy in VANETs. In Proceedings of the Fourth European Workshop on Security and Privacy in Ad hoc and Sensor Networks (ESAS 2007). Springer, 2007.

[Papadimitratos et al., 2008] Panagiotis Papadimitratos, Antonio Kung, Frank Kargl, Zhendong Ma, Maxim Raya, Julien Freudiger, Elmar Schoch, Tamás Holczer, Levente Buttyán, and Jean-Pierre Hubaux. Secure vehicular communication systems: design and architecture. IEEE Communications Magazine, 46(11):100-109, 2008.

[Holczer et al., 2009] Tamás Holczer, Petra Ardelean, Naim Asaj, Stefano Cosenza, Michael Müter, Albert Held, Björn Wiedersheim, Panagiotis Papadimitratos, Frank Kargl, and Danny De Cock. Secure vehicle communication (SeVeCom). Demonstration. MobiSys, June 2009.

[Buttyan et al., 2009] Levente Buttyán, Tamás Holczer, André Weimerskirch, and William Whyte. SLOW: A practical pseudonym changing scheme for location privacy in VANETs. In Proceedings of the IEEE Vehicular Networking Conference, pages 1-8. IEEE, October 2009.


Chapter 4

Anonymous Aggregator Election and Data Aggregation in Wireless Sensor Networks

4.1 Introduction

Wireless sensor and actuator networks are potentially useful building blocks for cyber-physical systems. Those systems must typically guarantee high-confidence operation, which induces strong requirements on the dependability of their building blocks, including the wireless sensor and actuator network. Dependability means resistance against both accidental failures and intentional attacks, and it should be addressed at all layers of the network architecture, including the networking protocols and the distributed services built on top of them, as well as the hardware and software architecture of the sensor and actuator nodes themselves. Within this context, in this chapter, I focus on the security aspects of aggregator node election and data aggregation protocols in wireless sensor networks. Data aggregation in wireless sensor networks helps to improve the energy efficiency and the scalability of the network. It is typically combined with some form of clustering. A common scenario is that sensor readings are first collected in each cluster by a designated aggregator node that aggregates the collected data and sends only the result of the aggregation to the base station. In another scenario, the base station may not be present permanently in the network, and the aggregated data must be stored by the designated aggregator node in each cluster temporarily until the base station can eventually fetch the data. In both cases, the amount of communication, and hence, the energy consumption of the network can be greatly reduced by sending aggregated data, instead of individual sensor readings, to the base station. While data aggregation in wireless sensor networks is clearly advantageous with respect to scalability and efficiency, it introduces some security issues. In particular, the designated aggregator nodes that collect and store aggregated sensor readings and communicate with the base station are attractive targets of physical node destruction and jamming attacks. Indeed, it is a good strategy for an attacker to locate those designated nodes and disable them, because he can prevent the reception of data from the entire cluster served by the disabled node. Even if the aggregator role is changed periodically by some election process, some security issues remain, in particular in the case when the base station is off-line and the aggregator nodes must store the aggregated data temporarily until the base station goes on-line and retrieves them. More specifically, in this case, the attacker can locate and attack the node that was the aggregator in a specific time epoch before the base station fetches its stored data, leading to permanent loss of data from the given cluster in the given epoch. In order to mitigate this problem, I introduced the concept of private aggregator node election, and I proposed the first private aggregator node election protocol. Briefly, the first protocol ensures that the identity of the elected aggregator remains hidden from an attacker who observes


the execution of the election process. However, this protocol ensures protection only against an external eavesdropper that cannot compromise sensor nodes, and it does not address the problem of identifying the aggregator nodes by means of traffic pattern analysis after the election phase. In the second protocol, I addressed the shortcomings of the first scheme: I proposed a new private aggregator node election protocol that is resistant even to internal attacks originating from compromised nodes, and I also proposed a new private data aggregation protocol and a new private query protocol which preserve the anonymity of the aggregator nodes during the data aggregation process and when they provide responses to queries of the base station. In the second private aggregator node election protocol, each node decides locally in a probabilistic manner whether to become an aggregator or not, and then the nodes execute an anonymous veto protocol to verify whether at least one node became an aggregator. The anonymous veto protocol ensures that non-aggregator nodes learn only that there exists at least one aggregator in the cluster, but they do not learn any information on its identity. Hence, even if such a non-aggregator node is compromised, the attacker learns no useful information regarding the identity of the aggregator. The protocols can be used to protect sensor network applications that rely on data aggregation in clusters, and where locating and then disabling the designated aggregator nodes is highly undesirable. Such applications include high-confidence cyber-physical systems where sensors and actuators monitor and control the operation of some critical physical infrastructure, such as an energy distribution network, a drinking water supply system, or a chemical pipeline. A common feature of these systems is that they have a large geographical span, and therefore, the sensor network must be organized into clusters and use in-network data aggregation in order to ensure scalability and energy efficient operation. Moreover, due to the mission critical nature of these applications, it is desirable to prevent the identification of the aggregator nodes in order to limit the impact of a successful attack against the sensor network. The first protocol, which resists only an external eavesdropper, is less complex than the second protocol, which works in a stronger attacker model. Hence, the first protocol can be used in case of strong resource constraints or when the risk of compromising sensor nodes is limited (e.g., it may be difficult to obtain physical access to the nodes). The second protocol is needed when the risk of compromised and misbehaving nodes cannot be eliminated by other means. The remainder of the chapter is organized as follows: in Section 4.2, I introduce my system and attacker models. In Section 4.3, I present my basic aggregator election protocol, which can withstand external attacks, while in Section 4.4, I introduce my advanced protocols, which can withstand internal aggregator identification and scamming attackers as well. In Section 4.5, I give an overview of some related work, and in Section 4.6, I conclude the chapter and sketch some future research directions.

4.2 System and attacker models

A sensor network consists of sensor nodes that communicate with each other via wireless channels. Every node can generate sensor readings, and store them or forward them to another node. Each node can directly communicate with the nodes within its radio range; those nodes are called the (one-hop) neighbors of the node. In order to communicate with distant nodes (outside the radio range), the nodes use multi-hop communication. The sensor network has an operator as well, who can communicate with some of the nodes through a special node called the base station, or can communicate directly with the nodes if the operator moves close to the network.

Throughout the chapter, a data driven sensor network is envisioned, where every sensor node sends its measurement to a data aggregator regularly. Such data driven networks are used for the regular inspection of monitored processes, notably in critical infrastructures. Event driven networks can be used for reporting special, usually dangerous but infrequent, events, like a fire in a building. There is no need for clustering and data aggregation in event based systems, thus private cluster aggregator election and data aggregation are not applicable there. The third kind of network is the query driven network, where the operator sends a query to the network, and the network sends a


response. This kind of functionality can be used with data driven networks, and can have privacy consequences, e.g., the identity of the answering node should remain hidden.

In the following, it is assumed that time is slotted, and one measurement is sent to the data aggregator in each time slot. The time synchronization between the nodes is not discussed here, but a comprehensive survey can be found in [Faizulkhakov, 2007]. It is assumed that every node shares some cryptographic credentials with the operator. These credentials are unique for every node, and the operator can store them in a lookup table, or they can be generated on demand from a master key and the node's identifier. The exact definition of the credentials can be found in Section 4.3.1 and in Section 4.4.1.

The nodes may be aware of their geographical locations, and they may already be partitioned into well defined geographical regions. In this case, these regions are the clusters, and the objective of the aggregator election protocol is to elect an aggregator within each geographical region. We call this approach location based clustering; an example would be the PANEL protocol [Buttyán and Schaffer, 2010]. A kind of generalization of the position based election is the preset case, where the nodes know the cluster ID they belong to before any communication. Here the goal of the election is to elect one node in every preset cluster. This approach is used in [Buttyán and Holczer, 2010]. Alternatively, the nodes may be unaware of their locations or cluster IDs, and know only their neighbors. In this case, the clusters are not pre-determined, but they are dynamically constructed in parallel with the election of the aggregators. Basically, any node may announce itself as an aggregator, and the nodes within a certain number of hops on the topology graph may join that node as cluster members. We call this approach topology based clustering; an example would be the LEACH protocol [Heinzelman et al., 2000]. The location based and the topology based approaches are illustrated in Figure 4.1.


Figure 4.1: Result of a location based (left) and a topology based (right) one-hop aggregator election protocol. Solid dots represent the aggregators, and empty circles represent cluster members.

Both approaches may use controlled flooding of broadcast messages. In case of location based or preset clustering, the scope of a flood is restricted to a given geographic region or preset cluster. Nodes within that region re-broadcast the message to be flooded when they receive it for the first time. Nodes outside of the region or having different preset cluster IDs simply drop the message. In case of topology based clustering, it is assumed that the broadcast messages have a Time-to-Live (TTL) field that controls the scope of the flooding. Any node that receives a broadcast message with a positive TTL value for the first time will automatically decrement the TTL value and rebroadcast the message. Duplicates and messages with TTL smaller than or equal to zero are silently discarded. When I say that a node broadcasts a message, I mean such a controlled flooding (either location based, preset or topology based, depending on the context). In Section 4.4, connected dominating sets (CDS) are used to implement efficient broadcast messaging. The concept of CDS will be introduced there.


We can call the set of nodes which are (in the location based and the preset case) or can potentially be (in the topology based case) in the same cluster as a node S the cluster peers of S. Hence, in the location based case, the cluster peers of S are the nodes that reside within the same geographic region as node S. In the preset case, the cluster peers are the nodes sharing the same cluster ID. In the topology based case, the set of cluster peers of S usually consists of its n-hop neighborhood, for some parameter n. The nodes may not explicitly know all their cluster peers. The main functional requirement of any clustering algorithm is that either node S or at least one of the cluster peers of S will be elected as aggregator. The leader of each cluster is called cluster aggregator, or simply aggregator. In the following, I will use aggregator, cluster aggregator and data aggregator interchangeably.

As mentioned in Section 4.1, an attacker can gain much more information by attacking an aggregator node than by attacking a normal node. To attack a data aggregator node either physically or logically, the attacker must first identify that node. In this chapter, I assume that the attacker's goal is to identify the aggregator (which means that simply preventing, jamming or confusing the aggregation is not the goal of the attacker). In Section 4.4.5, I go a little further, and analyze what happens if a compromised node does not follow the proposed protocols in order to mislead the operator. An attacker who wants to discover the identity of the aggregators can eavesdrop on the communication between any nodes, can actively participate in the communication (by deleting, modifying and inserting messages), and can physically compromise some of the nodes. A compromised node is under the full control of the attacker: the attacker can fully review the inner state of that node, and can control the messages sent by that node. Compromising a node is a much harder challenge for an attacker than simply eavesdropping on the communication. It requires physical contact with the node and some advanced knowledge; however, it is far from impossible for an attacker with a good electrical and laboratory background [Anderson and Kuhn, 1996]. Therefore, I propose two solutions. The first, basic protocol can fully withstand a passive eavesdropper, but a compromising attacker can gain some knowledge about the identities of the cluster aggregators. The second, advanced protocol can withstand a compromising attacker as well, leaking information only about the compromised nodes themselves.

In case of a passive adversary, a rather simple solution could be based on a common shared global key. Using that shared global key as the seed of a pseudo random number generator, every node can construct locally (without any communication) the same pseudo randomly ordered list of all nodes. These lists will be identical for every node because all nodes use the same seed and the same pseudo random number generator. Then, the first A nodes of the list are elected as aggregators, such that every node can communicate with a cluster aggregator and no proper subset of the A elected nodes covers the whole system. An illustration of the result of this algorithm can be seen in Figure 4.1 for location based and topology based cluster aggregator election. The problem with this solution is that it is not robust: compromising a single node would leak the common key, and the adversary could compute the identifiers of all cluster aggregators.
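To make the weakness concrete, the following minimal Python sketch (not part of the original protocol; the key, the node identifiers and the parameter A are placeholders) shows how every node could locally derive the same pseudo-random ordering from the shared global key and take its first A entries as aggregators; any single node that leaks global_key therefore reveals the whole aggregator list.

import hashlib
import random

def aggregator_list(global_key, node_ids, a):
    # Every node runs this locally: same key -> same seed -> same ordering.
    seed = int.from_bytes(hashlib.sha256(global_key).digest(), "big")
    rng = random.Random(seed)
    ordering = sorted(node_ids)       # canonical starting order
    rng.shuffle(ordering)
    return ordering[:a]               # the first A nodes act as aggregators

# Identical output on every node, without any communication:
print(aggregator_list(b"shared-global-key", range(100), a=5))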
While I do not want to fully address the problem of compromised nodes in the first protocol, I still aim at a more robust solution than the one described above. In particular, the system should not collapse when just a single node or a few nodes are compromised. The second protocol can withstand the compromise of some nodes without any degradation of the privacy of the cluster aggregators. This protocol meets the following goals and has the following limitations:
- The identity of the non-compromised cluster aggregators remains secret even in the presence of passive and active attackers or compromised nodes.
- The attacker can learn whether a compromised node is an aggregator.
- An attacker can force a compromised node to be an aggregator, but does not learn anything about the existence or identity of the other aggregators.
- The attacker cannot achieve that no aggregator is elected in the cluster; however, all the elected aggregator(s) may be compromised nodes.


The main difference between the first and the second protocol is the following. The first protocol is very simple, but not perfect, as a compromised node can reveal the identity of the aggregators. The second protocol requires more complex computations, but offers anonymity in case of node compromise as well. In some cases such complex computations are outside the capabilities of the nodes (or the probability of compromise is low), but anonymity is still required by the system. In these cases I suggest using the first protocol. If the probability of node compromise is not negligible, then the use of the second protocol is recommended.

4.3 Basic protocol

In this section, I describe the basic protocol that I propose for private aggregator node election. First, I give a brief overview of the basic principles of the protocol, and present the details later. After that, some important details of this basic protocol are presented in Section 4.3.2, where I also describe how to set the parameters of the protocol.

4.3.1 Protocol description

I assume that the nodes are synchronized (see [Faizulkhakov, 2007] for a survey on time synchronization mechanisms for sensor networks), and each node starts executing the protocol roughly at the same time. The protocol terminates after a predefined, fixed amount of time. During the execution of the protocol, any node that has not received any aggregator announcement yet may decide to become an aggregator, in which case it broadcasts an aggregator announcement message announcing itself as a cluster aggregator. This message is broadcast among the cluster peers of the node sending the announcement (see Section 4.2). Upon reception of a cluster aggregator announcement, any node that has neither announced itself as a cluster aggregator nor received any such announcement yet will consider the sender of the announcement as its cluster aggregator.

In order to prevent an external observer from learning the identity of the cluster aggregators, all messages sent in the protocol are encrypted such that only the nodes to whom they are intended can decrypt them. For this, it is assumed that each node shares a common key with all of its cluster peers (an overview of available key establishment mechanisms for sensor networks can be found in [Lopez and Zhou, 2008]). In addition, in order to avoid that message originators are identified as cluster aggregators, the nodes that will be cluster members are required to send dummy messages that cannot be distinguished from the announcements by the external observer (i.e., they are encrypted and disseminated in the same way as the announcements). Note that the proposed basic protocol considers only either pairwise keys between the neighboring nodes or group keys shared between sets of neighboring nodes, so no global key is assumed. Such pairwise or group keys can be established by the techniques proposed in [Lopez and Zhou, 2008]. The key establishment can be based on randomly selected key sets. In such a protocol, the probability that neighboring nodes share a common key is high, and the unused keys are deleted [Chan et al., 2003]. The key establishment can also be based on a common key which is deleted after some short time, when the neighbors have been discovered [Zhu et al., 2003]. Any node that owns the common key can generate a pairwise key with a node which owns or previously owned the common key. The basic method for exchanging a group/cluster key with the neighboring nodes is to send the same random key to each neighbor encrypted with the previously exchanged pairwise keys.

The pseudo-code of the protocol is given in Algorithm 2, and a more detailed explanation of the protocol's operation is presented below. The protocol consists of two rounds, where the length of each round is τ. The nodes are synchronized, they all know when the first round begins, and what the value of τ is. At the beginning, each node starts two random timers, T1 and T2, where T1 expires in the first round (uniformly at random) and T2 expires in the second round (uniformly at random). Each node also initializes at random a binary variable, called announFirst, that determines in which round the node would like to send a cluster aggregator announcement.



Algorithm 2 Basic private cluster aggregator election algorithm

start T1, expires in rand(0, τ)       // timer, expires in round 1
start T2, expires in rand(τ, 2τ)      // timer, expires in round 2
announFirst = (rand(0,1) <= α)
CAID = -1                             // ID of the cluster aggregator of the node
while T1 NOT expired do
    if receive ENC(announcement) AND (CAID = -1) then
        CAID = ID of sender of announcement
    end if
end while
// T1 expired
if announFirst AND (CAID = -1) then
    broadcast ENC(announcement); CAID = ID of node itself;
else
    broadcast ENC(dummy);
end if
while T2 NOT expired do
    if receive ENC(announcement) AND (CAID = -1) then
        CAID = ID of sender of announcement
    end if
end while
// T2 expired
if (NOT announFirst) AND (CAID = -1) then
    broadcast ENC(announcement); CAID = ID of node itself;
else
    broadcast ENC(dummy);
end if
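The following is a compact, non-normative Python simulation of Algorithm 2 for a single, fully connected cluster; the radio channel, the encryption and the timers are abstracted away (a random sending order stands in for T1 and T2), and alpha corresponds to the system parameter α introduced below.

import random

def basic_election(num_nodes, alpha, rng):
    # CAID of every node; None plays the role of -1 in the pseudo-code.
    ca_id = {s: None for s in range(num_nodes)}
    announce_first = {s: rng.random() < alpha for s in range(num_nodes)}
    for first_round in (True, False):
        order = list(range(num_nodes))
        rng.shuffle(order)                      # models the random timers
        for s in order:
            if announce_first[s] == first_round and ca_id[s] is None:
                ca_id[s] = s                    # s announces itself
                for r in range(num_nodes):      # all cluster peers hear the announcement
                    if r != s and ca_id[r] is None:
                        ca_id[r] = s
            # otherwise s sends a dummy message, which changes nothing
    return [s for s in range(num_nodes) if ca_id[s] == s]

rng = random.Random(1)
wins = [0] * 10
for _ in range(5000):
    for a in basic_election(10, 0.167, rng):
        wins[a] += 1
print(wins)   # roughly uniform: the node identity itself carries no information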



Table 4.1: Estimated time of the building blocks on a Crossbow MICAz mote

Algorithm                                  Generation [ms]   Verification [ms]
SHA-1 [Ganesan et al., 2003]               1.4               -
RSA 1024 bit [Piotrowski et al., 2006]     12040             470
RC4 [Ganesan et al., 2003]                 0.1               0.1
RC5 [Ganesan et al., 2003]                 0.4               0.4

The probability that announFirst is set to the first round is α, which is a system parameter. The setting of α is elaborated in Section 4.3.2. In the first round, every node S waits for its first timer T1 to expire. If S receives an announcement before T1 expires, then the sender of the announcement will be the cluster aggregator of S. When T1 expires, S broadcasts a message as follows: if announFirst is set to the first round and S has not received any announcement yet, then S sends an announcement, in which it announces itself as a cluster aggregator. Otherwise, S sends a dummy message. In both cases, the message is encrypted (denoted by ENC() in the algorithm) such that only the cluster peers of S can decrypt it. The second round is similar to the first round. When T2 expires, S broadcasts a message as follows: if announFirst is set to the second round and S has not received any announcement yet, then S sends an announcement; otherwise, S sends a dummy message. In both cases, the message is encrypted.

It is easy to see that at the end of the second round each node is either a cluster aggregator or it is associated with a cluster aggregator whose ID is stored in the variable CAID. Without the second round, a node could remain unassociated if it sent and received only dummy messages in the first round. In addition, a passive observer only sees that every node sends two encrypted messages, one in each round. This makes it difficult for the adversary to identify who the cluster aggregators are (see also more discussion on this in the next section). In addition, if a node is compromised, the adversary learns only the identity of the cluster aggregators whose announcements have been received by the compromised node.

In WSNs, it must be analyzed what happens if some messages are delayed or lost in the noisy, unreliable channel. Two cases must be analyzed: dummy messages and announcements. If a dummy message is delayed or not delivered successfully to all recipients, then the result of the protocol is not modified, as dummy messages serve only to cover the announcements. If an announcement is delayed or not delivered to a node, then the recipient will not select the sender as cluster aggregator. It will select a node who sent an announcement later, or it elects itself and sends an announcement. Message loss may modify the resulting set of cluster aggregators, but it neither harms the anonymity of the elected aggregators, nor the original goal of cluster aggregator election (a node must either be a cluster aggregator or a cluster aggregator must be elected from the node's cluster peers). Note that two neighboring nodes can send an announcement at the same time with some small probability. Actually, this is not a problem in the protocol. The only result is that both nodes will be cluster aggregators independently. As this does not conflict with the original goal of cluster aggregator election, this infrequent situation does not need any special attention.

The overhead introduced by the basic protocol is sending two encrypted messages for each election round. Other protocols [Buttyán and Schaffer, 2010; Heinzelman et al., 2000] use one (or zero) unencrypted messages to elect an aggregator. So the number of messages sent in the election phase is slightly larger compared to other solutions. The symmetric encryption also causes some extra overhead (for details, see Table 4.1, rows RC4 and RC5).



4.3.2 Protocol analysis

In this section, the previously proposed basic protocol is analyzed. As defined in Section 4.2, the main goal of the attacker is to reveal the identity of the cluster aggregators. To do so, the attacker can eavesdrop on, modify, and delete messages, and can capture some nodes. First, logical attacks are analyzed, where the attacker does not capture any nodes; then, the consequences of node capture are discussed. The attacker's main goal is to reveal the identity of the cluster aggregators. As all the inter-node communication is encrypted and authenticated, the attacker cannot get any information from the messages themselves, but it can get some side information from simple traffic and topology analysis.

Density based attack

Thanks to the dummy messages and the encryption in the basic protocol, an external observer cannot trivially identify the cluster aggregators; however, it can still use side information and suspect some nodes to be cluster aggregators with higher probability than some other nodes. Such side information is the number of cluster peers of the nodes. This number correlates with the local density of the nodes, which is why this attack is called the density based attack. Indeed, the probability of becoming a cluster aggregator depends on the number of cluster peers of the node. For instance, if a node does not have any cluster peers, it will be a cluster aggregator with probability one. On the other hand, if the node has a larger number of cluster peers, then the probability of receiving an announcement from a cluster peer is large, and hence, the probability that the node itself becomes a cluster aggregator is small. Note also that the number of cluster peers can be deduced from the topology of the network, which may be known to the adversary. The probability of becoming a cluster aggregator is approximately inversely proportional to the number of cluster peers:

Pr(CA(S)) = 1 / D(S)    (4.1)

where CA(S) is the event of S being elected cluster aggregator, and D(S) is the number of cluster peers of node S. Figure 4.2 illustrates this proportionality, where the curve belongs to Equation 4.1 and the plotted dots correspond to simulation results (100 nodes, random deployment, one-hop communication, topology based clustering). It can be seen that Equation 4.1 is quite sharp; it is very close to the simulated results.

Two approaches can be used to mitigate this problem. One is to take the number of cluster peers of the nodes into account when generating the random timers for the protocol. The second is to balance the logical network topology in such a way that every node has the same number of cluster peers. In the following, a possible solution for both approaches is introduced.

The first approach is the fine-tuning of the distributions. It is not analyzed here deeply, because it can only slightly modify the probabilities of being cluster aggregator, so it has no large effect. An example can be seen in Figure 4.3, where the 10th power of D(S) is used as a normalizing factor when α (the probability of sending an announcement in the first round) is computed. The coefficients of the polynomial are set such that the resulting curve is the closest to the uniform distribution. It can be seen that modifying α on a per-node basis does not eventually reach its goal; the normalized distribution is far from uniform. Actually, by modifying α, the other attack discussed in the next section can be mitigated, so here I propose a solution which does not set the α parameter.

The second approach modifies the number of cluster peers of each node to reach a common value; let us denote this value by δ. In theory, this common value can be anything between 1 and the total number N of the nodes in the network. In practice, it should be around the average number of cluster peers, which can be estimated locally by the nodes. For example, assuming one-hop communication (meaning that the cluster peers are the radio neighbors), the following formula can be used:




Figure 4.2: Probability of being cluster aggregator as a function of the number of cluster peers.

δ = (N − 1) · (R²π / A) + 1 ≈ E(D(S))    (4.2)

where R is the radio range, and A is the size of the total area of the network. The formula is based on the fact that the number of cluster peers is proportional to the ratio between the radio coverage and the total area. Similar formulae can be derived for the general case of multi-hop communication.

If a node S has more than δ cluster peers, it can simply discard the messages from D(S) − δ randomly chosen cluster peers. If S has fewer than δ cluster peers, it must get new cluster peers with the help of its actual cluster peers (if S has not got any cluster peers originally, then it will always become a cluster aggregator). The new cluster peers can be selected from the set of cluster peers of the original cluster peers. To explore the potential new cluster peers, every node can broadcast its list of cluster peers within its few-hop neighborhood before running the basic protocol. From the lists of the received cluster peers, every node can select its δ − D(S) new cluster peers uniformly at random. Then, the basic aggregator election protocol can be executed using the balanced set of cluster peers, as sketched below. An example for this balancing is shown in Figure 4.4 (70 nodes, random deployment, one-hop communication, topology based clustering). After running the balancing protocol, every node can approach the envisioned value δ. The advantage of the balancing protocol is that even though an attacker can gather information about the number of cluster peers, this number is effectively balanced after the protocol. The drawback of this solution is that it requires the original cluster peers to relay messages between distant nodes. One can imagine this solution as selectively increasing the TTL of protocol messages, creating much larger neighborhoods.
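A local-view sketch of this balancing step is given below (hypothetical data structures, not the message-level protocol itself): each node either trims its peer set to δ entries or extends it with randomly chosen two-hop candidates advertised by its current peers.

import random

def balance_cluster_peers(peers, delta, rng):
    # peers[s] is the current cluster-peer set of node s (without s itself).
    balanced = {}
    for s, own in peers.items():
        own = set(own)
        if len(own) > delta:
            # keep only delta randomly chosen peers, ignore the others
            own = set(rng.sample(sorted(own), delta))
        elif len(own) < delta:
            # two-hop candidates learned from the peers' advertised lists
            candidates = set().union(*(peers[p] for p in own)) - own - {s}
            extra = min(delta - len(own), len(candidates))
            own |= set(rng.sample(sorted(candidates), extra))
        balanced[s] = own
    return balanced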




Figure 4.3: Probability of being cluster aggregator as a function of the number of cluster peers. The analytical values come from Equation 4.1, while the simulation values come from simulations in which the probabilities are normalized with the number of cluster peers of the nodes.

Order based attack

Another important piece of side information an attacker can use is the order in which the nodes send messages in the first round of the protocol. Indeed, the sender of the i-th message will be cluster aggregator if none of the previous i − 1 messages are announcements (but dummies) and the i-th message is an announcement. Thus, the probability Pi that the sender of the i-th message becomes cluster aggregator depends on i and on the parameter α:

Pi = α (1 − α)^(i−1),   1 ≤ i ≤ n

The (n + 1)-th element of the distribution is the probability that no announcement is sent in the first round:

Pn+1 = (1 − α)^n

in which case the sender of the first message of the second round must be a cluster aggregator. The entropy of this distribution characterizes the uncertainty of the attacker who wants to identify the cluster aggregator using the order information. Assuming that the number of cluster peers has already been balanced, this entropy can be calculated as follows:


Figure 4.4: Result of balancing. The 70 nodes are represented on the x axis. The number of cluster peers before (left) and after (right) the balancing is represented on the y axis.



H = − Σ_{i=1}^{n+1} Pi log Pi
  = − Σ_{i=1}^{n} α (1 − α)^(i−1) ( log α + (i − 1) log(1 − α) ) − (1 − α)^n log (1 − α)^n    (4.3)

where α is the probability of sending an announcement in the first round and n is the balanced number of cluster peers.
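Assuming the logarithm in formula (4.3) is taken base 2 (which matches the Hmax = log2 n column of Table 4.2), the optimal α can be found numerically, for example with the following simple grid-search sketch:

import math

def entropy(alpha, n):
    # Attacker uncertainty H from formula (4.3), in bits, for n balanced cluster peers.
    p = [alpha * (1 - alpha) ** (i - 1) for i in range(1, n + 1)]
    p.append((1 - alpha) ** n)          # no announcement in the first round
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def best_alpha(n, steps=10000):
    # Plain grid search; the nodes could run the same kind of local estimation.
    return max((i / steps for i in range(1, steps)), key=lambda a: entropy(a, n))

for n in (10, 25, 50, 100):
    a = best_alpha(n)
    print(n, round(a, 3), round(entropy(a, n), 3), round(math.log2(n), 3))
# The printed values should reproduce, up to grid resolution, the optima reported in Table 4.2.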


Figure 4.5: Entropy of the attacker as a function of the probability of sending an announcement in the first round (α). Number of nodes in one cluster: 10.

In Figure 4.5, I plotted formula (4.3). If α is large, then the uncertainty of the attacker is low, because one of the first few senders will become the cluster aggregator with very high probability. If α is very small, then the uncertainty of the attacker is small again, because no cluster aggregator will be elected in the first round with high probability, and therefore the first sender of the second round will be the cluster aggregator. The ideal α value corresponds to the maximum entropy, which can be easily computed by the nodes locally from formula (4.3). For instance, Table 4.2 shows some ideal α values for different numbers of nodes in one cluster. The fifth row (Hmax) shows the maximal entropy (uncertainty) that any kind of election protocol can achieve with the given number of nodes. This is achieved if every node is equiprobably elected from the viewpoint of the attacker. This value is closely approached by H(α*), where α* is very close to the optimal solution (the difference between the found value and the optimal value can be arbitrarily small, and depends on the number of iterations the estimation algorithm uses). Using the found α* value, the order of the messages has no meaning for the attacker.

Node capture attacks

If an attacker can compromise a node, it can reveal some sensitive information, even when the system uses the local key based protocol. If the compromised node is a cluster aggregator, then all the previously stored messages can be revealed. The attacker can decide to demolish the node, modify the stored values, simply use the captured data, or modify the aggregation functions.



Table 4.2: Optimal values (α*) for different numbers of nodes in one cluster, the achieved entropy H(α*), and the maximal entropy Hmax = log2 n

n        10      25      50      100
α*       0.167   0.082   0.049   0.027
α* n     1.67    2.05    2.45    2.7
H(α*)    3.281   4.410   5.312   6.218
Hmax     3.322   4.644   5.644   6.644

If the compromised node is not a cluster aggregator, then the attacker can reveal the cluster aggregator of that node, which can result in the same situation described in the previous paragraph.

4.3.3 Data forwarding and querying

The problem of forwarding the measured data to the aggregators without revealing the identity of the aggregators is a well known problem in the literature, called anonymous routing [Seys and Preneel, 2006; Zhang et al., 2006; Rajendran and Sreenaath, 2008]. Anonymous routing lets us route packets in the network without revealing the destination of the packet. A short overview of anonymous routing can be found in Section 4.5. With anonymous routing, any node can send its measurements to the aggregators without revealing their identity. An operator can query the aggregator with the help of an ordinary node which uses anonymous routing towards the aggregator. Anonymous routing introduces significant overhead in the traffic; however, this can be partially mitigated by synchronizing the data transmissions. Instead of suggesting such an approach, in this chapter I elaborate, in Section 4.4.3, a more challenging situation where the identity of the aggregators is unknown to the cluster members as well. The clear advantage is that even if a node is compromised, its aggregator cannot be identified.

4.4 Advanced protocol

The advanced private data aggregation protocol is designed to withstand the compromise of some nodes without revealing the identities of the aggregators. The protocol consists of four main parts. The first part is the initialization, which provides the required communication channel. The second part is needed for the data aggregator election. This subprotocol must ensure that the cluster does not remain without a cluster aggregator. This must be done without revealing the identity of the elected aggregator. The third part is needed for the data aggregation. This subprotocol must be able to forward the measured data to the aggregator without knowing its identifier. The last part must support the queries, where an operator queries some stored aggregated data.

In the following, the description of each subprotocol follows the same pattern. First, the goal and the requirements of the subprotocol are discussed, then the subprotocol itself is presented. After the presentation of the subprotocol, I analyze how it achieves its goal even in the presence of an attacker, and what data and services it provides to the next subprotocol. At the end of this section, misbehavior is analyzed: I discuss what an attacker can achieve if its goal is not to identify the aggregators of the cluster, but to confuse the operation of the protocols.

In the following, it is assumed that every node knows which cluster it belongs to. The protocol descriptions consider only one cluster, and separate instances of the protocol are run in different clusters independently. The complexity of each subprotocol is summarized in Table 4.3. This table gives an overview of the message complexity of the subprotocols, so the bandwidth requirements can be calculated from it. It can be seen that the rarely used election protocol has the highest complexity, and the frequently used aggregation is the most lightweight protocol in use.



Table 4.3: Summary of complexity of the advanced protocol. N is the number of nodes in the cluster

                          Election    Aggregation    Query
Message complexity        O(N^2)      O(N)           O(N)
Modular exponentiations   4N [1]      0              0
Hash computations         0           0              1

[1] 4 exponentiations for generating the two messages with knowledge proofs and 4N−4 exponentiations for checking the received knowledge proofs.

4.4.1 Initialization

The initialization phase is responsible for providing the medium for authenticated broadcast communication. In the following, I shortly review the approaches of broadcast authentication in wireless sensor networks, and give some efficient methods for broadcast communication. The initialization relies on some data stored on each node before deployment. Each node has some unique cryptographic credentials to enable authentication, and is aware of the cluster identifier it belongs to. In the following, without further mentioning it, it is assumed that each message contains the cluster identifier. Every message addressed to a cluster different from the one a node belongs to is discarded by the node. First, I briefly review the state of the art in broadcast authentication, then I propose a connected dominating set based broadcast communication method, which fits well to the following aggregation and query phases.

Broadcast authentication

Broadcast authentication enables a sender to broadcast authenticated messages efficiently to a large number of potential receivers. In the literature, this problem is solved with either digital signatures or hash chains. In this section, I review some solutions from both approaches. For the sake of completeness, Message Authentication Codes (MAC) must also be mentioned here [Preneel and Oorschot, 1999]. MACs are based on symmetric cryptographic primitives, which enable very efficient computation. Unfortunately, the verifier of a MAC must also possess the same cryptographic credential the generator used for generating the MAC. This means that every node must know every credential in the network to verify every message broadcast to the network. This full knowledge can be exploited by an attacker who compromises a node. The attacker can impersonate any other honest node, which means that if only one node is compromised, message authenticity can no longer be ensured. One solution to node compromise is the hop-by-hop authentication of packets. In hop-by-hop authentication, every packet's authentication information is regenerated by every forwarder. In this case, it is enough to have a shared key only with the direct neighbors of a node. In case of node compromise, only the node itself and the direct neighbors can be impersonated. Such a neighborhood authentication is provided by Zhu et al. in LEAP [Zhu et al., 2003], where it is based on so called cluster keys.

To make the authentication scheme robust against node compromise, one approach is the usage of asymmetric cryptography, namely digital signatures. Digital signatures are asymmetric cryptographic primitives, where only the owner of a private key can compute a digital signature over a message, but any other node can verify that signature. Computing a digital signature is a time consuming task for a typical sensor node, but there exist some efficient elliptic curve based approaches in the literature [Liu and Ning, 2008; Szczechowiak et al., 2008; Oliveira et al., 2008; Xiong et al., 2010]. One of the first publicly available implementations was the TinyECC module written by Liu and Ning [Liu and Ning, 2008]. A more efficient implementation is the NanoECC module, proposed by Szczechowiak et al. [Szczechowiak et al., 2008], which is based on the MIRACL cryptographic library [mir, ]. Up to now, to the best of my knowledge, the fastest implementations are the


TinyPBC by Oliveira et al. [Oliveira et al., 2008], which is based on the RELIC toolkit [rel, ], and the TinyPairing proposed by Xiong et al. in [Xiong et al., 2010]. Another approach is proposed for broadcast authentication in wireless sensor networks by Perrig et al. in [Perrig et al., 2002]. The TESLA scheme is based on the delayed release of hash chain values used in MAC computations. The scheme needs secure, loose time synchronization between the nodes. The TESLA scheme is efficient if it is used for authenticating many messages, but inefficient if the messages are sparse. Consequently, if only the rarely sent election messages must be authenticated, then the time synchronization itself can cause a heavier workload than simple digital signatures. If the aggregation messages must also be authenticated, then TESLA can be an efficient solution. A DoS resistant version specially adapted for wireless sensor networks is proposed by Liu et al. in [Liu et al., 2005]. A faster but less secure modification is proposed by Huang et al. in [Huang et al., 2009]. In the following, it is assumed that an efficient broadcast authentication scheme is used, without indicating it explicitly.

Broadcast communication

Broadcast communication is a method that enables sending information from one source to every other participant of the network. In wireless networks, it can be implemented in many ways, like flooding the network or using a sequence of unicast messages. A natural question is why broadcast communication is so important to the advanced protocol. The reason is that only broadcast communication can hide the traffic patterns of the communication, thus not revealing any information about the aggregators. An efficient way of implementing broadcast communication in wireless sensor networks is the usage of a connected dominating set (CDS). A connected dominating set S of a graph G is defined as a subset of G such that every vertex in G \ S is adjacent to at least one member of S, and S is connected. A graphical representation of a CDS can be found in Figure 4.6. The minimum connected dominating set (MCDS) is a connected dominating set with minimum cardinality. Finding an MCDS in a graph is an NP-hard problem; however, there are some efficient solutions which can find a close to minimal CDS in WSNs. For a thorough review of the state of the art of CDS in WSNs, the interested reader is referred to [Blum et al., 2004a] and [Jacquet, 2004]. In the following, it is assumed that a connected dominating set is given in each cluster, and a minimum spanning tree is generated between the nodes in the CDS. Finding a minimum spanning tree in a connected graph has been a well known problem for decades; efficient polynomial algorithms are suggested in [Kruskal, 1956; Prim, 1957]. This kind of two layer communication architecture enables the efficient implementation of different kinds of broadcast-like communication, which are required for the following protocols. The spanning tree is used in the aggregation protocol in Section 4.4.3. The simple all-node broadcast communication can be implemented simply: if a node sends a packet to the broadcast address, then every node in the CDS forwards this message to the broadcast address. The CDS members are connected and every non-CDS member is connected to at least one CDS member by definition, so the message will be delivered to every recipient in the network.
This approach is more ecient than simple ooding as only a subset of the nodes forwards the message, but the properties of the CDS ensures that every node in the cluster will eventually receive the broadcast information. Here, the notion of CDS parent (or simply parent) must be introduced. The CDS parent of node A is a node, which is in communication distance with A and is a member of the CDS. The complexity of such a broadcast communication is O(N ), but actually it takes |S | messages to broadcast some information, where |S | is the number of nodes in the connected dominating set. If the CDS algorithm is accurate, than it can be very close to the minimum number of nodes required to broadcast communication. In the following, broadcast communication is used frequently to avoid that an attacker can gain some knowledge about the identity of the aggregators from the trac patterns inside the network. Obviously not every message is broadcast in the network, because that would shortly lead to




Figure 4.6: Connected dominating set. Solid dots represent the dominating set, and empty circles represent the remaining nodes. The connections between the non-CDS nodes of the network are not displayed in the figure.

Instead of automatically broadcasting every message, as much information as possible is aggregated in each message to preserve energy. In the following sections, I will use the given CDS in different ways, and each particular usage will be described in the corresponding section. The communication patterns used are closely related to and inspired by the Echo algorithm published by Chang in [Chang, 2006]. The Echo algorithm is a Wave algorithm [Tel, 2000], which enables the distributed computation of an idempotent operator in trees. It can be used in arbitrary connected graphs, and generates a spanning tree as a side result.
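For illustration only, the following rough Python sketch builds a (small, but not minimum) connected dominating set of a connected graph with a greedy heuristic; the protocols themselves only assume that some CDS construction, for example one of those surveyed in [Blum et al., 2004a], is available.

def greedy_connected_dominating_set(adj):
    # adj maps each node to the set of its neighbors; the graph is assumed connected.
    start = max(adj, key=lambda v: len(adj[v]))
    cds = {start}
    covered = {start} | adj[start]
    while len(covered) < len(adj):
        # Only nodes adjacent to the current CDS are considered, so the CDS stays connected.
        frontier = {v for u in cds for v in adj[u]} - cds
        best = max(frontier, key=lambda v: len(adj[v] - covered))
        cds.add(best)
        covered |= adj[best] | {best}
    return cds

example = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 4}, 3: {1}, 4: {2}}
print(greedy_connected_dominating_set(example))   # e.g. {1, 2}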

4.4.2 Data aggregator election

The main goal of the aggregator node election protocol is to elect a node that can store the measurements of the whole cluster in a given epoch, but in such a way that its identity remains hidden. The election is successful if at least one node is elected. The protocol is unsuccessful if no node is elected, and thus no node stores the data. In some cases, electing more than one node can be advantageous, because the redundant storage can withstand the failure of some nodes. In the following, I propose an election protocol where the expected number of elected aggregators can be determined by the system operator, and the protocol ensures that at least one aggregator is always elected.

The election process relies on the initialization subprotocol discussed in Section 4.4.1. It requires an authenticated broadcast channel among the cluster members, which is exactly what the initialization part offers. The election process consists of two main steps: (i) Every node decides whether it wants to be an aggregator, based on some random values. This step does not need any communication; the nodes compute the results locally. (ii) In the second step, an anonymous veto protocol is run, which reveals only the information that at least one node elected itself to be an aggregator node. If no aggregator is elected, it will be clear to every participant, and every participant can run the election protocol again.


Step (i) can be implemented easily. Every node elects itself aggregator with a given probability p. The result of the election is kept secret; the participants only want to know that the number c of aggregators is not zero, without revealing the identity of the cluster aggregators. This is advantageous, because in case of node compromise, the attacker learns only whether the compromised node is an aggregator, but nothing about the identity or the number of the other aggregators. Let us denote the random variable representing the number of elected aggregators by C. It is easy to see that the distribution of C is binomial (N is the total number of nodes in one cluster):

Pr(C = c) = (N choose c) p^c (1 − p)^(N−c)

The expected number of aggregators after the first step is cE = N p. So if on average c cluster aggregators are needed, then p should be c/N (this formula will be slightly modified after considering the results of the second step). The probability that no cluster aggregator is elected is (1 − p)^N.

To avoid the anarchical situation when no node is elected, the nodes must run step (ii), which proves that at least one node is elected as aggregator node, while the identity of the aggregator remains secret. This problem can be solved by an anonymous veto protocol. Such a protocol is suggested by Hao and Zieliński in [Hao and Zielinski, 2006]. Hao and Zieliński's approach has many advantageous properties compared to other solutions [Brandt, 2006; Chaum, 1988], such as requiring only 2 communication rounds. The anonymous veto protocol requires knowledge proofs. Informally, a knowledge proof allows a prover to convince a verifier that he knows the solution of a hard-to-solve problem without revealing any useful information about that knowledge. A detailed explanation of the problem can be found in [Camenisch and Stadler, 1997]. A well known example of a knowledge proof is given by Schnorr in [Schnorr, 1991]. The proposed method gives a non-interactive proof of knowledge of a logarithm without revealing the logarithm itself. The operation can be described briefly as follows. The proof of knowledge of the exponent of g^xi consists of the pair {g^v, r = v − xi·h}, where h = H(g, g^v, g^xi, i) and H is a secure hash function. This proof of knowledge can be verified by anyone by checking whether g^v and g^r · (g^xi)^h are equal.

The operation of the anonymous veto protocol consists of two consecutive rounds (G is a publicly agreed group of order q with generator g):

1. First, every participant i selects a secret random value xi ∈ Zq. Then g^xi is broadcast with a knowledge proof. The knowledge proof is needed to ensure that the participant knows xi without revealing the value of xi. Without the knowledge proof, the node could choose g^xi in a way that influences the result of the protocol (it is widely believed that for a given g^xi (mod p) it is hard to find xi (mod p); this problem is known as the discrete logarithm problem). Then every participant checks the knowledge proofs, and computes a special product of the received values:

   g^yi = ( Π_{j=1}^{i−1} g^xj ) / ( Π_{j=i+1}^{N} g^xj )

2. g^(ci·yi) is broadcast with a knowledge proof (the knowledge proof is needed to ensure that the node cannot influence the election maliciously afterwards). ci is set to xi for non-aggregators, and to a random value ri for aggregators.

The product P = Π_{i=1}^{N} g^(ci·yi) equals 1 if and only if no cluster aggregator is elected (nobody vetoed the question: "Is the number of elected cluster aggregators zero?"). If no aggregator is elected, then this will be clear to all participants, and the election can be done again. If P differs from 1, then some nodes have announced themselves to be cluster aggregators, and this is known by all the nodes.
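The following toy Python sketch illustrates the two rounds of the anonymous veto computation; the knowledge proofs and all networking are omitted, and the group parameters are deliberately tiny and insecure (p = 1019 is a safe prime, g = 4 generates the subgroup of prime order q = 509).

import random

p, q, g = 1019, 509, 4

def anonymous_veto(veto_flags, rng):
    # veto_flags[i] is True if participant i elected itself aggregator (i.e. vetoes).
    n = len(veto_flags)
    x = [rng.randrange(1, q) for _ in range(n)]      # round 1 secrets x_i
    gx = [pow(g, xi, p) for xi in x]                 # broadcast values g^{x_i}
    gy = []
    for i in range(n):
        num, den = 1, 1
        for j in range(i):
            num = num * gx[j] % p
        for j in range(i + 1, n):
            den = den * gx[j] % p
        gy.append(num * pow(den, -1, p) % p)         # g^{y_i}
    # Round 2: broadcast g^{c_i y_i}, where c_i = x_i (no veto) or a fresh random r_i (veto).
    second = [pow(gy[i], rng.randrange(1, q) if veto_flags[i] else x[i], p) for i in range(n)]
    product = 1
    for v in second:
        product = product * v % p
    return product != 1                              # True iff at least one participant vetoed

rng = random.Random(7)
print(anonymous_veto([False, False, False, False], rng))   # no aggregator: False
print(anonymous_veto([False, True, False, False], rng))    # one hidden aggregator: True (w.h.p.)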


If we consider the effect of the second step (a new election is run if no aggregator is elected), the expected number of aggregators is slightly higher than in the purely binomial case. The expected number of aggregators is:

cE = N p / (1 − (1 − p)^N)
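A small numeric sketch of this correction (the parameter values below are only examples):

def expected_aggregators(p, n):
    # Expected number of aggregators when the election is repeated until at least one node elects itself.
    return n * p / (1 - (1 - p) ** n)

# For a cluster of 20 nodes and p = 0.05, conditioning on a successful
# election raises the expectation from N*p = 1.0 to about 1.56:
print(expected_aggregators(0.05, 20))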

The anonymity of the election subprotocol depends on its building blocks. Obviously, the random number generation does not leak any information about the identity of the aggregator nodes if the random number generator is secure. A cryptographically secure random number generator, called TinyRNG, is proposed in [Francillon and Castelluccia, 2007] for wireless sensor networks. Using a secure random number generator, it is unpredictable who elects itself to be an aggregator node. The anonymity analysis of the anonymous veto protocol can be found in [Hao and Zielinski, 2006]. The anonymity is based on the decisional Diffie-Hellman assumption, which is considered to be a hard problem.

The message complexity of the election is O(N^2), which is acceptable as the election is run infrequently (N is the number of nodes in the cluster). If this overhead, together with the 4 modular exponentiations (see Table 4.3 for the complexities and Table 4.1 for the estimated running times; note that RSA is based on modular exponentiation), is too big for the application, then the basic protocol described in Section 4.3.1 can be used, where only symmetric key encryption is needed. In wireless sensor networks, the links in general are not reliable, and packet losses occur from time to time. Reliability can be introduced by the link layer or by the application. As it is crucial to run the election protocol without any packet loss, it is required to use a reliable link layer protocol for this subprotocol. Such protocols are suggested in [Iqbal and Khayam, 2009; Wan et al., 2002] for wireless sensor networks.

As a summary, after the election subprotocol every node is equiprobably an aggregator node. The election subprotocol ensures that at least one aggregator is elected and that the elected node(s) are aware of their status. An outside attacker does not know the identity of the aggregators or even the actual number of the elected aggregator nodes. An attacker who has compromised one or more nodes can decide whether the compromised nodes are aggregators, but cannot be certain about the other nodes.

4.4.3 Data aggregation

The main goal of the WSN is to measure some data from the environment, and store the data for later use. This section describes how the data is forwarded to the aggregator(s) without the explicit knowledge of the identifier(s) of the aggregator(s). The data aggregation and storage procedure uses the broadcast channel. If the covered area is so small or the radio range is so large that every node can reach each other directly, then the aggregation can be implemented simply: every node broadcasts its measurement to the common channel, and the cluster aggregator(s) can aggregate and store the measurements. If the covered area is bigger (which is the more realistic case), a connected dominating set based solution is proposed. In each timeslot, each ordinary node (not a member of the CDS) sends its measurement to one neighboring CDS member (its parent) by unicast communication. When the epoch has elapsed and all the measurements from the nodes have been received, the CDS nodes aggregate the measurements and use a modification of the Echo algorithm on the given spanning tree to compute the gross aggregated measurement in the following way: each CDS member waits until all but one CDS neighbor has sent its subaggregate to it, and after some random delay it sends the aggregate to the remaining neighbor. This means that the leaf nodes of the tree start the communication, and then the communication wave is propagated towards the root of the spanning tree. This behavior is the same as the second phase of the Echo algorithm. When one node has received the subaggregates from all of its neighbors, and thus cannot send it to anyone, it can compute the gross aggregated value of the network.





Figure 4.7: Aggregation example. The subfigures from left to right represent the consecutive steps of an average computation: (i) The measured data is ready to send; it is stored in the format of (actual average; number of measurements). Non-CDS nodes send the average to their parents. (ii) The CDS nodes start to send the aggregated values to their parents. (iii) A CDS node receives an aggregate from all of its neighbors, and starts to broadcast the final aggregated value. Nodes willing to store the value can do so. (iv) The other CDS nodes receiving the final value rebroadcast it. Nodes willing to store the value can do so.

Then, this value is distributed among the cluster members by every CDS member broadcasting it. This second phase is needed so that every member of the cluster can be aware of the gross aggregated value, and the anonymous aggregators can store it, while the others can simply discard it. The stored data includes the timeslot in which the aggregate was computed, and the environmental variables if more than one variable (e.g., temperature and humidity) is recorded besides the value itself. The aggregation function can be any statistical function of the measured data. Some easily implementable and widely used functions are the minimum, maximum, sum or average. In Figure 4.7, the aggregation protocol is visualized with five nodes and two aggregators, using the average as the aggregation function.

The anonymity analysis of the aggregation subprotocol is quite simple. After the aggregation, every node possesses the same information as an external attacker can get. This information is the aggregated data itself, without knowing anything about the identity of the aggregators. If the operator wants to hide the aggregated data, it can use some of the techniques discussed in Section 4.5. The message complexity of the aggregation is O(N), where N is the number of nodes in the cluster. This is the best complexity achievable, because to store all the measurements at a single aggregator, all nodes must send their measurements towards the aggregator, which already leads to O(N) message complexity. In terms of latency, the advanced protocol doubles the time until the aggregated measurement arrives at the aggregator compared to a naive system, where the identity of the aggregators is known to every participant. This latency is acceptable, as in most WSN applications the time between the measurements is much longer than the time required to aggregate the data.

As mentioned for the election subprotocol, the protocol must be prepared for packet losses due to the nature of wireless sensor networks. In the aggregation subprotocol, two kinds of packet loss can be envisioned: a packet can be lost before or after the final aggregate is computed. Both cases can be detected by timers, and a resend request can be sent. If the resend is unsuccessful a few times, the aggregation must be run without those messages. If the lost message contains a measurement or a subaggregate, then the final aggregate will be computed without that data, leading to an inaccurate measurement. If the lost message contained the gross aggregate, then some nodes will not receive the gross aggregate. Here it is very useful that the network can have multiple aggregators, because if at least one aggregator receives the data, the data can be queried by the operator.
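As a small illustration of the aggregation step, the following sketch merges (average; count) pairs the way the CDS nodes do in Figure 4.7; the concrete measurement values are only an example that is consistent with the final aggregate (2.6; 5) shown in the figure.

def merge(a, b):
    # Merge two (average, count) sub-aggregates into one.
    avg_a, n_a = a
    avg_b, n_b = b
    n = n_a + n_b
    return ((avg_a * n_a + avg_b * n_b) / n, n)

# Example measurements 1, 2, 3, 3, 4 merged pairwise in an arbitrary order:
agg = (1.0, 1)
for m in (2.0, 3.0, 3.0, 4.0):
    agg = merge(agg, (m, 1))
print(agg)   # (2.6, 5)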



4.4.4 Query

The ultimate goal of the sensor network is to make the measured data available to the operator upon request. While the aggregation subprotocol ensures that the measured data is stored by the aggregators, the goal of the query subprotocol is to provide the requested data to the operator and, at the same time, keep the aggregators' identity hidden. One solution would be that the operator visits all the nodes and connects to them by wire. While this solution would leak no information about the identities of the aggregators to any eavesdropping attacker, its execution would be very time consuming and cumbersome. Moreover, the accessibility of some nodes may be difficult or dangerous (for example, in a military scenario). Therefore, I propose a solution where it is sufficient for the operator to get into wireless communication range of any one of the nodes. This node does not need to be an aggregator, as actually no one, not even the operator, knows who the aggregator nodes are.

As a first step, the operator authenticates itself to the selected node O using the key kO. After that, node O starts the query protocol by sending out a query, obtains the response to the query from the cluster, and makes the response available to the operator. In the following, it is assumed that O is not a CDS node. (If it is indeed a CDS node, then the first and last transmission of the query protocol can be omitted.) Node O broadcasts the query data Q with the help of the CDS nodes in the cluster. This is done by sending Q to the CDS parent, and then every CDS member rebroadcasts Q as it is received. The query Q describes what information the operator is interested in. It includes a variable name, a time interval, and a field for collecting the response to the query. It also includes a bit, called aggregated, which will later be used in the detection of misbehaving nodes. For the details of misbehaving node detection, the reader is referred to Section 4.4.5; here we assume that the aggregated bit is always set, meaning that aggregation is enabled.

The idea of the query protocol is that each node i in the cluster contributes to the response by a number Ri, which is computed as follows:

Ri = h(Q|ki)          for non-aggregators
Ri = h(Q|ki) + M      for aggregators                    (4.4)

where M is the stored measurement (available only if the node is an aggregator), h is a cryptographic hash function, and ki is the key shared by node i and the operator. Thus, non-aggregators contribute with a pseudo-random number h(Q|ki) computed from the query and the key ki, which can later also be computed by the operator, while aggregator nodes contribute with the sum of a pseudo-random number and the requested measurement data. The sum is normal fixed point addition, which can overflow if the hash is a large value. The goal is that the querying node O receives back the sum of all these Ri values. For this reason, when the query Q is received by a non-CDS node from its CDS parent, it computes its Ri value and sends it back to the CDS parent in the response field of the query token. When a CDS parent receives back the query tokens with the updated response field from its children, it computes the sum of the received Ri values and its own and, after inserting the identifiers of the responding nodes, sends the result back to its parent. This is repeated until the query token reaches back to the CDS parent of node O, which can forward the response R = Σi Ri and the list of responding nodes to node O; the sum is again computed by normal fixed point addition. This operation is illustrated in Figure 4.8.
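A minimal sketch of the per-node contribution and of the sum that travels back to node O is given below; the word size, the query encoding and the per-node keys are toy values, and a truncated SHA-256 stands in for the keyed hash h(Q|ki).

import hashlib

L = 2 ** 16   # word size taken from the numerical example later in this section

def h(query, key):
    # keyed pseudo-random mask, truncated to the word size
    return int.from_bytes(hashlib.sha256(query + key).digest(), "big") % L

def response_share(query, key, stored=None):
    # R_i from equation (4.4); addition wraps around modulo L (fixed point addition).
    return (h(query, key) + (stored if stored is not None else 0)) % L

query = b"temperature|epoch-42"
keys = [bytes([i]) * 16 for i in range(5)]      # toy per-node keys k_i
stored = [None, 2050, None, None, 2050]         # two anonymous aggregators storing M = 2050
R = sum(response_share(query, k, s) for k, s in zip(keys, stored)) % L
print(R)   # the individual shares and the sum look random without the keys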
When receiving R from O, the operator can calculate the stored data as follows. First of all, the operator can regenerate each hash value h(Q|k_i), because it stores (or can compute from a master key on-the-fly) each key k_i, and it knows the original query data Q. The operator can subtract the hash values from R (note that the list of responding nodes is present in the response), and it gets a result R′ = cM, where c is the actual number of aggregators in the cluster². Unfortunately, this number c is unknown to the operator, as it is unknown to everybody else.
² Note that each aggregator contributed the measurement M to the response; that is why, at the end, the response will be c times M, where c is the number of aggregators.

Figure 4.8: Query example. The subfigures from left to right represent the consecutive steps of a query: (i) The operator sends the query Q to node O. This node forwards it to its CDS parent. The CDS parent broadcasts the query. (ii) The CDS nodes broadcast the query, so every node in the network is aware of Q. (iii) Every non-CDS node (except O) sends its response to its parent. (iv) The sum of the responses is propagated back to the parent of O (including the list of responding nodes, not shown in the figure), who forwards it to the operator through O.

Nevertheless, if M is restricted to lie in an interval [A, B] such that the intervals [iA, iB] for i = 1, 2, ..., N are non-overlapping, then cM can fall only into the interval [cA, cB], and hence, c can be uniquely determined by the operator by checking which interval R′ belongs to. Then, dividing R′ by c gives the requested data M. More specifically, and for practical reasons, the following three criteria need to be satisfied by the interval [A, B] for my query scheme to work: (i) as we have seen before, for unique decoding of cM, the intervals [iA, iB] for i = 1, 2, ..., N must be non-overlapping, (ii) in order to fit in the messages and to avoid integer overflow³, the highest possible value of cM, i.e., NB, must be smaller than a pre-specified number L, and (iii) it must be possible to map a pre-specified number D of different values into [A, B].

The first criterion (i) is met if the lower end of each interval is larger than the higher end of the preceding interval:

0 < iA − (i − 1)B = i(A − B) + B,   i = 1, ..., N

Note that if the above inequality holds for i = N, then it holds for every i, because A − B is a negative constant and B is a positive constant. So it is enough to consider only the case of i = N:

0 < N(A − B) + B   ⟺   B < (N / (N − 1)) A    (4.5)

The second criterion (ii) means that

NB < L   ⟺   B < L / N    (4.6)

while the third criterion (iii) can be formalized as

D < B − A   ⟺   B > A + D    (4.7)

Figure 4.9 shows an example for a graphical representation of the three criteria, where the crossed area represents the admissible (A, B) pairs. It can also be easily seen in this figure that a solution exists only if the B coordinate of the intersection of inequalities (4.5) and (4.7) meets criterion (4.6), or in other words

ND < L / N

³ In case of overflow, the result is not unique.
Figure 4.9: Graphical representation of the suitable intervals.

As a numerical example, let us assume that we want to measure at least 100 different values (D = 99), the micro-controller is a 16-bit controller (L = 2^16), and we have at most 20 nodes in each cluster (N = 20). Then a suitable interval that satisfies all three criteria would be [A, B] = [2000, 2100]. Checking that this interval indeed meets the requirements is left for the interested reader. Finally, note that any real measurement interval can easily be mapped to this interval [A, B] by simple scaling and shifting operations, and my solution requires that such a mapping is performed on the real values before the execution of the query protocol.

Our proposed protocol has many advantageous properties. First, the network can respond to a query if at least one aggregator can successfully participate in the subprotocol. Second, the operator does not need to know the identity of the aggregators, thus even the operator cannot leak that information accidentally (although, after receiving the response, the operator learns the actual number of aggregator nodes). Third, the protocol does not leak any information about the identity of the aggregators: an attacker can eavesdrop on the query information Q and the pseudo-random numbers R_i, but cannot deduce from them the identity of the aggregators. Finally, the message complexity of the query is O(N), where N is the number of nodes in the cluster. This is the best complexity achievable when the originator of the query does not know the identity of the aggregator(s). The latency of the query protocol depends on the longest path of the network rooted at node O. As mentioned for the previous subprotocols, the protocol must be prepared for packet losses due to the nature of wireless sensor networks. Due to the packet losses, the final sum R is the sum of the responses of the responding nodes, which is a subset of all nodes. That is why the identifiers must be included in the responses. The operator can calculate cM independently of the actual subset of responders. If at least one response from an aggregator gets to the operator, it can calculate M in the previously described way. If cM = 0, then it is clear to the operator that every aggregator's response is lost.
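The operator-side decoding and the three criteria can be illustrated with the following Python sketch, which plugs in the numerical example above (D = 99, L = 2^16, N = 20, [A, B] = [2000, 2100]). It reuses the illustrative hash helper h from the earlier sketch; the function and variable names are assumptions made for the sake of the example, not part of the dissertation.

```python
N, L, D = 20, 2 ** 16, 99      # cluster size, word size, number of different values
A, B = 2000, 2100              # candidate measurement interval

# The three criteria on [A, B] from the text:
assert (N - 1) * B < N * A     # (4.5): the intervals [iA, iB] do not overlap
assert N * B < L               # (4.6): N*B fits into the word size, no integer overflow
assert B > A + D               # (4.7): more than D different values fit into [A, B]

def decode(R, query, keys, responders):
    """Recover (c, M) from the aggregated response R and the list of responding nodes."""
    r_prime = R
    for node_id in responders:                       # subtract the pseudo-random parts
        r_prime = (r_prime - h(query, keys[node_id])) % L
    if r_prime == 0:
        return 0, None                               # every aggregator's response was lost
    for c in range(1, N + 1):                        # find the unique interval [cA, cB]
        if c * A <= r_prime <= c * B:
            return c, r_prime // c                   # c aggregators, each contributed M
    raise ValueError("response lies outside every decodable interval")
```

Dividing by c recovers M exactly only because criterion (4.5) guarantees that at most one interval [cA, cB] can contain R′.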

4.4.5 Misbehaving nodes

In this section, I look beyond my initial goal, and briefly analyze what happens if a compromised node deviates from the protocol to achieve some goal other than just learning the identity of the aggregators. In the election process, a compromised node may elect itself to be an aggregator in every election. This can be a problem if this node is the only elected aggregator, because a compromised node may not store the aggregated values.


Unfortunately, this situation cannot be avoided in any election protocol, because an aggregator can be compromised after the election, and the attacker can erase the memory of that node. Actually, my protocol is partially resistant to this attack, because more than one aggregator may be elected with some probability, and the attacker cannot be sure whether the compromised node is the single aggregator node in the cluster.

During the aggregation, a misbehaving node can modify its readings, or modify the values it aggregates. The modification of others' values can be prevented by some broadcast authentication schemes discussed in Section 4.4.1. The problem of reporting false values can be handled by statistical approaches discussed in [Buttyán et al., 2006; Wagner, 2004; Buttyán et al., 2009].

The most interesting subprotocol from the perspective of misbehaving nodes is the query protocol. In this protocol, a compromised node can easily modify the result of the query in the following way. A compromised node can add an arbitrary number X to the hash in Equation (4.4) instead of using 0 or M. It is easy to see that if X is selected from the interval [A, B], then after subtracting the hashes, the resulting sum R′ will be an integer in the interval [(c+1)A, (c+1)B] (c is the actual number of aggregators; c+1 nodes act like aggregators: the c aggregators and the compromised node). A compromised node can further increase its influence by choosing X from the interval [iA, iB]. This means that the resulting sum R′ will be in the interval [(c+i)A, (c+i)B]. If X is not selected from an interval [jA, jB], j = 1 ... N, then the result can be outside of the decodable intervals. This can be immediately detected by the operator (see Figure 4.10). If the result is in a legitimate interval (there is a j such that R′ ∈ [jA, jB]), then the operator can further check the consistency by calculating R′ mod j. If the result is zero, then it is possible that no misbehaving node is present in the network. If the result is non-zero, the operator can be sure that, apart from the zeros and Ms, some node sent a different value, thus a misbehaving node is present in the network. It is hard for the attacker to guess j, because it neither knows the actual number of aggregators, nor can it calculate R′ from R by subtracting the unknown hashes.

If the modulus is zero, but the operator is still suspicious about the result, it can further test the cluster for misbehaving nodes with the help of the aggregated bit in the queries. This further testing can be done regularly, randomly, or upon receiving suspicious results. If the aggregated bit is cleared in a query Q, then the CDS nodes do not sum the incoming replies, but forward them towards the agent node O as they are received. So if the operator wants to check whether a misbehaving node is present in the network, it can run a query Q with the aggregated bit set, and then run the same query with the aggregated bit cleared. If the two results are different, then the operator can be sure that a node wants to hide its malicious activity from the operator. If the two sums are equal, then the operator can further check the results from the second round. If the values are all equal after subtracting the hashes (not considering the zero values), then no misbehavior is detected; otherwise some node(s) misbehave in the cluster.
Note here that this algorithm does not find every misbehavior, but the misbehaviors not detected by this algorithm do not influence the operator. For example, two nodes can misbehave such that the first adds S to its hash and the second adds −S. It is clear that this misbehavior does not affect the result computed by the operator, because S − S = 0. Another misbehavior not detected by the algorithm is when a compromised non-aggregator node sends M instead of 0. This is not detected by the algorithm, but it does not modify the result the operator computes (the operator simply decodes (c+1)M and still recovers M). The operation of the misbehavior detection algorithm is depicted in Figure 4.10. This algorithm only detects whether some misbehavior occurred in the cluster, but does not necessarily find the misbehaving node. I leave the elaboration of this problem for future work.
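The two checks just described can be summarized in the following Python sketch of the operator-side logic behind Figure 4.10. It reuses the illustrative constants and helpers (N, A, B, L, h) of the earlier sketches; the structure and the names are assumptions made for this example only, not the dissertation's own implementation.

```python
def check_aggregated_response(r_prime):
    """First round: R' must lie in some interval [jA, jB] and be a multiple of j."""
    if r_prime == 0:
        return True                          # no aggregator responded, nothing to check
    for j in range(1, N + 1):
        if j * A <= r_prime <= j * B:
            return r_prime % j == 0          # R' should equal j * M for some value M
    return False                             # outside every decodable interval

def check_unaggregated_round(individual_responses, query, keys, r_prime):
    """Second round (aggregated bit cleared): the individual R_i values are inspected."""
    contributions = [(r - h(query, keys[i])) % L for i, r in individual_responses.items()]
    if sum(contributions) % L != r_prime:
        return False                         # a node hid misbehavior behind the aggregation
    nonzero = [c for c in contributions if c != 0]
    return len(set(nonzero)) <= 1            # every non-zero contribution must be the same M
```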

4.5 Related work

A survey on privacy protection techniques for WSNs is provided in [Li et al., 2009], where they are classified into two main groups: data-oriented and context-oriented protection. In this section, I briefly review these techniques, with an emphasis on those solutions that are closely related to my work. In data-oriented protection, the confidentiality of the measured data must be preserved. It is also a research direction how the operator can verify whether the received data is correct.


Figure 4.10: Misbehavior detection algorithm for the query protocol. (The flowchart checks, in order: R′ = R − Σ_{i=1..N} h(Q|k_i); whether there is a j with R′ ∈ [jA, jB]; whether R′ mod j = 0; and, in the unaggregated round, R_i′ = R_i − h(Q|k_i) with Σ R_i′ = R′ and each R_i′ being either 0 or M.)


The main focus is on confidentiality in [He et al., 2007], while the verification of the received data is also ensured in [Sheng and Li, 2008]. According to [Li et al., 2009], context-oriented protection covers the location privacy of the source and the base station. Source location privacy is mainly a problem in event-driven networks, where the existence and location of the event is the information that must be hidden. The location privacy of the base station is discussed in [Deng et al., 2006b]. The main difference between hiding the base station and hiding the in-network aggregators is that a WSN regularly contains only one base station, which is a predefined node, while at the same time there are several in-network aggregators used in one network, and the nodes used as aggregators are periodically changed.

The problem of private cluster aggregator election in wireless sensor networks is strongly related to anonymous routing in WSNs. The main difference between anonymous routing and anonymous aggregation is that anonymous routing supports any traffic pattern and generally handles external attackers, while anonymous aggregation supports aggregation-specific traffic patterns and can handle compromised nodes as well. In [Seys and Preneel, 2006] an efficient anonymous on-demand routing scheme called ARM is proposed for mobile ad hoc networks. For the same problem, another solution is given in [Zhang et al., 2006] (MASK), where a detailed simulation is also presented for the proposed protocol. A more efficient solution is given in [Rajendran and Sreenaath, 2008], which uses low cryptographic overhead and addresses some drawbacks of the two papers above. In [Choi et al., 2007] a privacy preserving communication system (PPCS) is proposed. PPCS provides a comprehensive solution to anonymize communication endpoints, keep the location and identifier of a node unlinkable, and mask the existence of communication flows. The security of different aggregator node election protocols is surveyed in [Schaffer et al., 2012]. Most protocols either do not aim at securing the election at all, or they aim at the non-manipulability of the election. Such protocols can withstand passive attacks [Kuhn et al., 2006], or active attacks as well [Sirivianos et al., 2007; Gicheol, 2010].

4.6 Conclusion

In wireless sensor networks, in-network data aggregation is often used to ensure scalability and energy efficient operation. However, as we saw, this also introduces some security issues: the designated aggregator nodes that collect and store aggregated sensor readings and communicate with the base station are attractive targets of physical node destruction and jamming attacks. In order to mitigate this problem, in this chapter, I proposed two private aggregator node election protocols for wireless sensor networks that hide the elected aggregator nodes from the attacker, who, therefore, cannot locate and disable them. My basic protocol provides fewer guarantees than my advanced protocol, but it may be sufficient in cases where the risk of physical compromise of nodes is low. My advanced protocol hides the identity of the elected aggregator nodes even from insider attackers, thus it handles node compromise attacks too. I also proposed a private data aggregation protocol and a corresponding private query protocol for the advanced version, which allow the aggregator nodes to collect sensor readings and respond to queries of the operator, respectively, without revealing any useful information about their identity. My aggregation and query protocols are resistant to both external eavesdroppers and compromised nodes participating in the protocol. The communication in the advanced protocol is based on the concept of connected dominating set, which suits wireless sensor networks well.

In this chapter I went beyond the goal of only hiding the identity of the aggregator nodes. I also analyzed what happens if a malicious node wants to exploit the anonymity offered by the system, and tries to mislead the operator by injecting false reports. I proposed an algorithm that can detect if any of the nodes misbehaves in the query phase. I only detect the fact of misbehavior and leave the identification of the misbehaving node itself for future work.

In general, my protocols increase the dependability of sensor networks, and therefore, they can be applied in mission critical sensor network applications, including high-confidence cyber-physical systems where sensors and actuators monitor and control the operation of some critical physical infrastructure.

4.7 Related publications

[Buttyán and Holczer, 2009] Levente Buttyán and Tamas Holczer. Private cluster head election in wireless sensor networks. In Proceedings of the Fifth IEEE International Workshop on Wireless and Sensor Networks Security (WSNS 2009), pages 1048-1053. IEEE, 2009.

[Buttyán and Holczer, 2010] Levente Buttyán and Tamas Holczer. Perfectly anonymous data aggregation in wireless sensor networks. In Proceedings of The 7th IEEE International Conference on Mobile Ad-hoc and Sensor Systems (WSNS 2010), San Francisco, November 2010. IEEE.

[Holczer and Buttyán, 2011] Tamas Holczer and Levente Buttyán. Anonymous aggregator election and data aggregation in wireless sensor networks. International Journal of Distributed Sensor Networks, page 18, 2011. Article ID 828414.

[Schaffer et al., 2012] Péter Schaffer, Károly Farkas, Ádám Horváth, Tamás Holczer, and Levente Buttyán. Secure and reliable clustering in wireless sensor networks: A critical survey. Elsevier Computer Networks, 2012.


Chapter 5

Application of new results


In this dissertation three different wireless network based systems are considered: Radio Frequency Identification Systems, Vehicular Ad Hoc Networks, and Wireless Sensor Networks. In this chapter, a brief overview is given of where these systems are used, and how my new results fit in them.

Radio Frequency Identification Systems. The application of RFID is very widespread; some application areas are [Wu et al., 2009; RFID, 2012]:

- Payment by mobile phones: Many companies like MasterCard or Nokia are working on mobile phones with embedded RFID capabilities to enable payment by such devices.
- Inventory systems: RFID systems can provide accurate knowledge of the current inventory, which helps save labor cost, and enables self checkout in shops.
- Access control: RFID tags can be used as identification badges to enable access control in office buildings, or can be used as tickets in automated fare collection systems.
- Transportation and logistics: In transportation, RFID tags can help identify cargo, its owner or destination.
- Passport: Many countries include RFID tags in passports, to speed up passport control at the borders, and to make illegitimate replication harder.
- Hospitals and healthcare: Hospitals began implanting patients with RFID tags and using RFID systems, usually for workflow and inventory management [Fisher, 2006].
- Libraries: Libraries are using RFID to replace the barcodes on library items. An RFID system may replace or supplement bar codes and may offer another method of inventory management and self-service checkout by patrons [Molnar and Wagner, 2004].

Any usage of RFID systems where the holder of the tag is a human being might breach the privacy of the holder. The solutions proposed in Chapter 2 can be used in such situations. An example application is automated fare collection, where the pass for the mass transportation system can contain an RFID tag. In such a system, the system designer might consider the usage of key-trees or group based private authentication, in particular if the legal environment requires the usage of some kind of privacy enhancing technology.

Vehicular Ad Hoc Networks. The application of Vehicular Ad Hoc Networks is very widespread, but can be categorized into three main categories: safety related applications, transport efficiency, and information/entertainment applications [Hartenstein and Laberteaux, 2008; Willke et al., 2009]. Hundreds of possible applications can be envisioned or are under construction.


Such an application is cooperative forward collision warning, which helps avoid rear-end collisions with the use of beacon messages. Traffic efficiency, for example, can be increased by a traffic light optimal speed advisory application, which can assist the driver to arrive during a green phase. An example of the information gathering applications is the ability of remote wireless diagnosis, which makes the state of the vehicle accessible for remote diagnosis. Most of the safety and traffic efficiency related applications are based on the beacon messages, which are frequent messages containing the location, heading, identifier, and some other attributes of the vehicle. These messages can enable the tracking of individual vehicles, which is an undesirable side effect of the usage of VANETs. This side effect is analyzed in Chapter 3, and a countermeasure is proposed as well. The countermeasure algorithm is compatible with the framework proposed by the Car 2 Car Communication Consortium [Consortium, 2012]. Most of the results of Chapter 3 were parts of the results of the SeVeCom¹ European Commission funded project. The results were delivered to and accepted by the European Commission.

Wireless Sensor Networks. Wireless sensor networks can be used in many scenarios. In Chapter 4 I proposed two anonymous aggregation schemes, which hide the identity of the aggregator node. In the following, a few applications are given based on [Akyildiz et al., 2002], with special attention to the possible need for hiding some special nodes: wireless sensor networks can be an integral part of military command, control, communications, computing, intelligence, surveillance, reconnaissance and targeting (C4ISRT) systems, where there is a clear motivation for an attacker to disturb the normal functioning of the network by eliminating some special nodes. Another example can be the protection of critical infrastructure. The problem is that some critical infrastructure, like electrical lines or drinking water pipes, is so large in scale that it is impossible to protect it with traditional methods. WSNs can be a possible protection and surveillance system, where the disturbance of normal operation by the elimination of aggregator nodes must be avoided. In the above mentioned applications, there is a clear need for aggregation, and the loss of the aggregator might have undesirable consequences. Hence, in these applications, the anonymous aggregator election, aggregation, and query schemes proposed in Chapter 4 can be used. The goal of the Wireless Sensor and Actuator Networks for Critical Infrastructure Protection project (WSAN4CIP²), funded by the European Commission, was to make critical infrastructure more dependable by the use of WSNs. Some of the results of Chapter 4 were an integral part of that project.

In summary, it can be seen that the results of Chapters 2-4 can be used in real applications, and the problems discussed in the chapters are important for society.

¹ http://www.sevecom.org/

² http://www.wsan4cip.eu


Chapter 6

Conclusion
In this thesis, I proposed several privacy enhancing protocols for wireless networks. I dealt with three different types of networks, namely RFID systems, vehicular ad hoc networks, and wireless sensor networks.

In Chapter 2 I proposed a key-tree and a group based private authentication protocol for RFID systems. Both approaches use only symmetric key based cryptographic primitives, which suit resource limited RFID systems well. Key-trees provide an efficient solution for private authentication; however, the level of privacy provided by key-tree based systems decreases considerably if some members are compromised. This loss of privacy can be minimized by the careful design of the tree. Based on my results presented in this dissertation, I can conclude that a good practical design principle is to maximize the branching factor at the first level of the tree such that the resulting tree still respects the constraint on the maximum authentication delay in the system. Once the branching factor at the first level is maximized, the tree can be further optimized by maximizing the branching factors at the successive levels, but the improvement achieved in this way is not really significant; what really counts is the branching factor at the first level.

In the second part of Chapter 2, I proposed a novel group based private authentication scheme. I analyzed the proposed scheme and quantified the level of privacy that it provides. I compared my group based scheme to the key-tree based scheme. I showed that the group based scheme provides a higher level of privacy than the key-tree based scheme. In addition, the complexity of the group based scheme for the verifier can be set to be the same as in the key-tree based scheme, while the complexity for the prover is always smaller in the group based scheme. The primary application area of my schemes is that of RFID systems, but they can also be used in applications with similar characteristics (e.g., in wireless sensor networks). Some possible future work is the usage of different metrics, like the entropy based metric, or the usage of different constraints, like the minimal size of the anonymity sets, when selecting a structure like the groups for the users. These new metrics or constraints can make the resulting optimization problem complex, which can require heuristic solutions as well. A general framework that could solve the optimization problem for different metrics and constraints could be a future research direction. The most criticized part of any key-tree or group based solution is the difficulty of the key update. Hence, a challenging future work could be the implementation of a key update scheme in a tree based solution.

In the first half of Chapter 3, I studied the effectiveness of changing pseudonyms to provide location privacy for vehicles in vehicular networks. The approach of changing pseudonyms to make location tracking more difficult was proposed in prior work, but its effectiveness had not been investigated yet. In order to address this problem, I defined a model based on the concept of the mix zone. I assumed that the adversary has some knowledge about the mix zone, and based on this knowledge, she tries to relate the vehicles that exit the mix zone to those that entered it earlier.


I also introduced a metric to quantify the level of privacy enjoyed by the vehicles in this model. In addition, I performed extensive simulations to study the behavior of my model in realistic scenarios. In particular, in my simulation, I used a rather complex road map, generated traffic with realistic parameters, and varied the strength of the adversary by varying the number of her monitoring points. My simulation results provided detailed information about the relationship between the strength of the adversary and the level of privacy achieved by changing pseudonyms. I abstracted away the frequency with which the pseudonyms are changed, and I simply assumed that this frequency is high enough so that every vehicle surely changes pseudonym while in the mix zone. It seems that changing the pseudonyms frequently has some advantages, as frequent changes increase the probability that the pseudonym is changed in the mix zone. On the other hand, the higher the frequency, the larger the cost that the pseudonym changing mechanism induces on the system in terms of management of cryptographic material (keys and certificates related to the pseudonyms). In addition, if for a given frequency the probability of changing pseudonym in the mix zone is already close to 1, then there is no sense in increasing the frequency further, as it will no longer increase the level of privacy, while it will still increase the cost. Hence, there seems to be an optimal value for the frequency of the pseudonym change. Unfortunately, this optimal value depends on the characteristics of the mix zone, which is ultimately determined by the observing zone of the adversary, which is not known to the system designer.

In the second half of Chapter 3, I proposed a simple and effective privacy preserving scheme, called SLOW, for VANETs. SLOW requires vehicles to stop sending heartbeat messages below a given threshold speed (this explains the name SLOW, which stands for silence at low speeds) and to change all their identifiers (pseudonyms) after each such silent period. By using SLOW, the vicinities of intersections and traffic lights become dynamically created mix zones, as there are usually many vehicles moving slowly at these places at a given moment in time. In other words, SLOW implicitly ensures a synchronized silent period and pseudonym change for many vehicles both in time and space, and this makes it effective as a location privacy enhancing scheme. Yet, SLOW is remarkably simple, and it has further advantages. For instance, it relieves vehicles of the burden of verifying a potentially large amount of digital signatures when the vehicle density is large, as this usually happens when the vehicles move slowly in a traffic jam or stop at intersections. Finally, the risk of a fatal accident at a slow speed is low, and therefore, SLOW does not seriously impact safety-of-life. I evaluated SLOW in a specific attacker model that seems to be realistic, and it proved to be effective in this model, reducing the success rate of tracking a target vehicle from its starting point to its destination down to the range of 10-30%. Some future work could be a detailed analysis of the effect of SLOW on the safety of vehicles, or the analysis of the exceptional cases where the vehicles are forced to send a beacon message below the threshold speed.

In Chapter 4 I proposed two private aggregation algorithms for wireless sensor networks.
In wireless sensor networks, in-network data aggregation is often used to ensure scalability and energy efficient operation. However, this also introduces some security issues: the designated aggregator nodes that collect and store aggregated sensor readings and communicate with the base station are attractive targets of physical node destruction and jamming attacks. In order to mitigate this problem, I proposed two private aggregator node election protocols for wireless sensor networks that hide the elected aggregator nodes from the attacker, who, therefore, cannot locate and disable them. My basic protocol provides fewer guarantees than my advanced protocol, but it may be sufficient in cases where the risk of physically compromising nodes is low. My advanced protocol hides the identity of the elected aggregator nodes even from insider attackers, thus it handles node compromise attacks too. I also proposed a private data aggregation protocol and a corresponding private query protocol for the advanced version, which allow the aggregator nodes to collect sensor readings and respond to queries of the operator, respectively, without revealing any useful information about their identity. My aggregation and query protocols are resistant to both external eavesdroppers and compromised nodes participating in the protocol.


The communication in the advanced protocol is based on the concept of connected dominating set, which suits wireless sensor networks well. At the end of Chapter 4 I went beyond the goal of only hiding the identity of the aggregator nodes. I also analyzed what happens if a malicious node wants to exploit the anonymity offered by the system, and tries to mislead the operator by injecting false reports. I proposed an algorithm that can detect if any of the nodes misbehaves in the query phase. I only detect the fact of misbehavior and leave the identification of the misbehaving node itself for future work. A more challenging future work is the reduction of the message or computational complexity of the election subprotocol.


List of Acronyms

CA - Cluster Aggregator
CDS - Connected Dominating Set
CH - Cluster Head
DSRC - Dedicated Short-Range Communications
ID - IDentifier
IR - Infrared
MAC - Message Authentication Code
OBU - On Board Unit
RF - Radio Frequency
RFID - Radio Frequency IDentification
RSA - Rivest Shamir Adleman algorithm
RSU - Road Side Unit
SEVECOM - Secure Vehicular Communication
SLOW - Silence at LOW speeds
SUMO - Simulation of Urban MObility
TTL - Time To Live
VANET - Vehicular Ad Hoc Network
VIN - Vehicle Identification Number
WSAN4CIP - Wireless Sensor and Actuator Networks for Critical Infrastructure Protection
WSN - Wireless Sensor Network


List of publications

[Avoine et al., 2007] Gildas Avoine, Levente Buttyan, Tamas Holczer, and Istvan Vajda. Group-based private authentication. In Proceedings of the International Workshop on Trust, Security, and Privacy for Ubiquitous Computing (TSPUC 2007). IEEE, 2007.

[Buttyán and Holczer, 2009] Levente Buttyán and Tamas Holczer. Private cluster head election in wireless sensor networks. In Proceedings of the Fifth IEEE International Workshop on Wireless and Sensor Networks Security (WSNS 2009), pages 1048-1053. IEEE, 2009.

[Buttyán and Holczer, 2010] Levente Buttyán and Tamas Holczer. Perfectly anonymous data aggregation in wireless sensor networks. In Proceedings of The 7th IEEE International Conference on Mobile Ad-hoc and Sensor Systems (WSNS 2010), San Francisco, November 2010. IEEE.

[Buttyan et al., 2004] Levente Buttyan, Tamas Holczer, and Peter Schaffer. Incentives for cooperation in multi-hop wireless networks. Híradástechnika, LIX(3):30-34, March 2004. (In Hungarian.)

[Buttyan et al., 2005] Levente Buttyan, Tamas Holczer, and Peter Schaffer. Spontaneous cooperation in multi-domain sensor networks. In Proceedings of the 2nd European Workshop on Security and Privacy in Ad-hoc and Sensor Networks (ESAS), Visegrád, Hungary, July 2005. Springer.

[Buttyan et al., 2006a] Levente Buttyan, Tamas Holczer, and Istvan Vajda. Optimal key-trees for tree-based private authentication. In Proceedings of the International Workshop on Privacy Enhancing Technologies (PET), June 2006. Springer.

[Buttyan et al., 2006b] Levente Buttyan, Tamas Holczer, and Istvan Vajda. Providing location privacy in automated fare collection systems. In Proceedings of the 15th IST Mobile and Wireless Communication Summit, Mykonos, Greece, June 2006.

[Buttyan et al., 2007] Levente Buttyan, Tamas Holczer, and Istvan Vajda. On the effectiveness of changing pseudonyms to provide location privacy in VANETs. In Proceedings of the Fourth European Workshop on Security and Privacy in Ad hoc and Sensor Networks (ESAS 2007). Springer, 2007.

[Buttyan et al., 2009] Levente Buttyan, Tamas Holczer, Andre Weimerskirch, and William Whyte. SLOW: A practical pseudonym changing scheme for location privacy in VANETs. In Proceedings of the IEEE Vehicular Networking Conference, pages 1-8. IEEE, October 2009.

[Dora and Holczer, 2010] Laszlo Dora and Tamas Holczer. Hide-and-lie: Enhancing application-level privacy in opportunistic networks. In Proceedings of the Second International Workshop on Mobile Opportunistic Networking (ACM/SIGMOBILE MobiOpp 2010), Pisa, Italy, February 22-23, 2010.

[Dvir et al., 2011] Amit Dvir, Tamas Holczer, and Levente Buttyán. VeRA - version number and rank authentication in RPL. In Proceedings of the 7th IEEE International Workshop on Wireless and Sensor Networks Security (WSNS 2011). IEEE, 2011.


[Holczer and Buttyán, 2011] Tamas Holczer and Levente Buttyán. Anonymous aggregator election and data aggregation in wireless sensor networks. International Journal of Distributed Sensor Networks, page 18, 2011. Article ID 828414.

[Holczer et al., 2009] Tamas Holczer, Petra Ardelean, Naim Asaj, Stefano Cosenza, Michael Müter, Albert Held, Björn Wiedersheim, Panagiotis Papadimitratos, Frank Kargl, and Danny De Cock. Secure vehicle communication (SeVeCom). Demonstration. MobiSys, June 2009.

[Papadimitratos et al., 2008] Panagiotis Papadimitratos, Antonio Kung, Frank Kargl, Zhendong Ma, Maxim Raya, Julien Freudiger, Elmar Schoch, Tamas Holczer, Levente Buttyán, and Jean-Pierre Hubaux. Secure vehicular communication systems: design and architecture. IEEE Communications Magazine, 46(11):100-109, 2008.

[Schaffer et al., 2012] Péter Schaffer, Károly Farkas, Ádám Horváth, Tamás Holczer, and Levente Buttyán. Secure and reliable clustering in wireless sensor networks: A critical survey. Computer Networks, 2012.


Bibliography

[Abadi and Fournet, 2004] M. Abadi and C. Fournet. Private authentication. Theoretical Computer Science, 322(3):427476, 2004. [Akyildiz et al., 2002] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. Wireless sensor networks: a survey. Computer networks, 38(4):393422, 2002. [Anderson and Kuhn, 1996] R. Anderson and M. Kuhn. Tamper resistance: a cautionary note. In Proceedings of the 2nd conference on Proceedings of the Second USENIX Workshop on Electronic Commerce-Volume 2, page 1. USENIX Association, 1996. [Aoki and Fujii, 1996] M. Aoki and H. Fujii. Inter-vehicle communication: Technical issues on vehicle control application. Communications Magazine, IEEE, 34(10):9093, 1996. [Armknecht et al., 2007] F. Armknecht, A. Festag, D. Westho, and K. Zeng. Cross-layer privacy enhancement and non-repudiation in vehicular communication. In 4th Workshop on Mobile Ad-Hoc Networks (WMAN), 2007. [ASV, ] Advanced safety vehicle program. http://www.ahsra.or.jp/demo2000/eng/demo_e/ ahs_e7/iguchi/iguchi.html. [Avoine and Oechslin, 2005] G. Avoine and P. Oechslin. A scalable and provably secure hash-based rd protocol. In Pervasive Computing and Communications Workshops, 2005. PerCom 2005 Workshops. Third IEEE International Conference on, pages 110114. IEEE, 2005. [Avoine et al., 2005] G. Avoine, E. Dysli, and P. Oechslin. Reducing time complexity in rd systems. In Proceedings of the 12th Annual Workshop on Selected Areas in Cryptography (SAC05), pages 291306. Springer, 2005. [Avoine, 2012] Gildas Avoine. Bibliography on security and privacy in rd systems. http://www.ep.ch/*gavoine/rd/, 2012. [Baruya, 1998] A. Baruya. Speed-accident relationship on dierent kinds of european roads. MASTER Deliverable 7, September 1998. [Beresford and Stajano, 2003] A.R. Beresford and F. Stajano. Location privacy in pervasive computing. Pervasive Computing, IEEE, 2(1):4655, 2003. [Beresford and Stajano, 2004] A.R. Beresford and F. Stajano. Mix zones: User privacy in locationaware services. In Pervasive Computing and Communications Workshops, 2004. Proceedings of the Second IEEE Annual Conference on, pages 127131. IEEE, 2004. [Berki, 2008] Z. Berki. Development of Trac Models on the basis of Passanger Demand Surveys Thesis of the PhD dissertation. PhD thesis, Budapest University of Technology and Economics, 2008.


BIBLIOGRAPHY [Beye and Veugen, 2011] M. Beye and T. Veugen. Improved anonymity for key-trees? Technical report, Cryptology ePrint Archive, Report 2011/395, 2011. [Beye and Veugen, 2012] M. Beye and T. Veugen. Anonymity for key-trees with adaptive adversaries. Security and Privacy in Communication Networks, pages 409425, 2012. [Black and McGrew, 2008] David L. Black and David A. McGrew. The internet key exchange (ikev2) protocol. 2008. [Blum et al., 2004a] Jeremy Blum, Min Ding, Andrew Thaeler, and Xiuzhen Cheng. Connected dominating set in sensor networksand manets. In D.-Z. Du and P. Pardalos, editors, Handbook of Combinatorial Optimization, pages 329369. Kluwer Academic Publishers, 2004. [Blum et al., 2004b] J.J. Blum, A. Eskandarian, and L.J. Homan. Challenges of intervehicle ad hoc networks. Intelligent Transportation Systems, IEEE Transactions on, 5(4):347351, 2004. [Bono et al., 2005] S. Bono, M. Green, A. Stubbleeld, A. Juels, A. Rubin, and M. Szydlo. Security analysis of a cryptographically-enabled rd device. In 14th USENIX Security Symposium, volume 1, page 16, 2005. [Boyd and Mathuria, 2003] C. Boyd and A. Mathuria. Protocols for authentication and key establishment. Springer Verlag, 2003. [Brandt, 2006] F. Brandt. Ecient cryptographic protocol design based on distributed El Gamal encryption. Lecture Notes in Computer Science, 3935:32, 2006. [Butty an and Holczer, 2010] Levente Butty an and Tamas Holczer. Perfectly anonymous data aggregation in wireless sensor networks. In Proceedings of the Sixth IEEE International Workshop on Wireless and Sensor Networks Security (WSNS10). IEEE, IEEE, 2010. [Butty an and Hubaux, 2008] Levente Butty an and Jean Pierre Hubaux. Security and Cooperation in Wireless Networks. Cambridge University Press, 2008. [Butty an and Peter Schaer. Panel: Position-based aggrean and Schaer, 2010] Levente Butty gator node election in wireless sensor networks. International Journal of Distributed Sensor Networks, 2010. [Butty an et al., 2006] Levente Butty an, Peter Schaer, and Istv an Vajda. Ranbar: Ransac-based resilient aggregation in sensor networks. In In Proceedings of the Fourth ACM Workshop on Security of Ad Hoc and Sensor Networks (SASN), Alexandria, VA, USA, October 2006. ACM Press. [Butty an et al., 2009] Levente Butty an, Peter Schaer, and Istv an Vajda. Cora: Correlation-based resilient aggregation in sensor networks. Elsevier Ad Hoc Networks, 7(6):10351050, 2009. [Calandriello et al., 2007] Giorgio Calandriello, Panos Papadimitratos, Jean-Pierre Hubaux, and Antonio Lioy. Ecient and robust pseudonymous authentication in vanet. In VANET 07: Proceedings of the fourth ACM international workshop on Vehicular ad hoc networks, pages 1928, New York, NY, USA, 2007. ACM. [Camenisch and Lysyanskaya, 2001] J. Camenisch and A. Lysyanskaya. An ecient system for non-transferable anonymous credentials with optional anonymity revocation. Advances in Cryptology-EUROCRYPT 2001, pages 93118, 2001. [Camenisch and Stadler, 1997] Jan Camenisch and Markus Stadler. Proof systems for general statements about discrete logarithms. Technical report, Department of Computer Science, ETH Z urich, 1997.


Bibliography [Carbunar et al., 2007] B. Carbunar, Y. Yu, L. Shi, M. Pearce, and V. Vasudevan. Query privacy in wireless sensor networks. In Sensor, Mesh and Ad Hoc Communications and Networks, 2007. SECON07. 4th Annual IEEE Communications Society Conference on, pages 203212. IEEE, 2007. [Chan and Perrig, 2003] H. Chan and A. Perrig. Security and privacy in sensor networks. Computer, 36(10):103105, 2003. [Chan et al., 2003] H. Chan, A. Perrig, and D. Song. Random key predistribution schemes for sensor networks. In IEEE Symposium on Security and Privacy, pages 197215. IEEE Computer Society, 2003. [Chang, 2006] E.J.H. Chang. Echo algorithms: Depth parallel operations on general graphs. Software Engineering, IEEE Transactions on, (4):391401, 2006. [Chaum, 1981] D.L. Chaum. Untraceable electronic mail, return addresses, and digital pseudonyms. Communications of the ACM, 24(2):8490, 1981. [Chaum, 1988] D. Chaum. The dining cryptographers problem: Unconditional sender and recipient untraceability. Journal of Cryptology, 1(1):6575, 1988. [Chisalita and Shahmehri, 2002] L. Chisalita and N. Shahmehri. A peer-to-peer approach to vehicular communication for the support of trac safety applications. In Intelligent Transportation Systems, 2002. Proceedings. The IEEE 5th International Conference on, pages 336341. IEEE, 2002. [Choi et al., 2005] J.Y. Choi, M. Jakobsson, and S. Wetzel. Balancing auditability and privacy in vehicular networks. In Proceedings of the 1st ACM international workshop on Quality of service & security in wireless and mobile networks, pages 7987. ACM, 2005. [Choi et al., 2007] H. Choi, P. McDaniel, and TF La Porta. Privacy Preserving Communication in MANETs. In 4th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks, pages 233242, 2007. [COM, ] Communications for esafety. http://www.comesafety.org/. [Consortium, 2012] Car 2 Car Communication Consortium. 2012. http://www.car-to-car.org,

[Deng et al., 2005] J. Deng, R. Han, and S. Mishra. Countermeasures against traffic analysis attacks in wireless sensor networks. In Security and Privacy for Emerging Areas in Communications Networks, 2005. SecureComm 2005. First International Conference on, pages 113-126. IEEE, 2005. [Deng et al., 2006a] J. Deng, R. Han, and S. Mishra. Decorrelating wireless sensor network traffic to inhibit traffic analysis attacks. Pervasive and Mobile Computing, 2(2):159-186, 2006. [Deng et al., 2006b] J. Deng, R. Han, and S. Mishra. Decorrelating wireless sensor network traffic to inhibit traffic analysis attacks. Pervasive and Mobile Computing, 2(2):159-186, 2006. [Diaz et al., 2002] C. Diaz, S. Seys, J. Claessens, and B. Preneel. Towards measuring anonymity. In Proceedings of the 2nd international conference on Privacy enhancing technologies, pages 54-68. Springer-Verlag, 2002. [Dingledine et al., 2004] R. Dingledine, N. Mathewson, and P. Syverson. Tor: The second-generation onion router. Technical report, DTIC Document, 2004.

[Dötzer, 2006] F. Dötzer. Privacy issues in vehicular ad hoc networks. In Privacy Enhancing Technologies, pages 197-209. Springer, 2006.


BIBLIOGRAPHY [El Zarki et al., 2002] M. El Zarki, S. Mehrotra, G. Tsudik, and N. Venkatasubramanian. Security issues in a future vehicular network. In European Wireless, volume 2, 2002. [Faizulkhakov, 2007] Ya. R. Faizulkhakov. Time synchronization methods for wireless sensor networks: A survey. Programming and Computing Software, 33(4):214226, 2007. [Fisher, 2006] J.A. Fisher. Indoor positioning and digital management. Surveillance and security: Technological politics and power in everyday life, page 77, 2006. [Fishkin et al., 2005] K. Fishkin, S. Roy, and B. Jiang. Some methods for privacy in rd communication. Security in ad-hoc and sensor networks, pages 4253, 2005. [Floerkemeier et al., 2005] C. Floerkemeier, R. Schneider, and M. Langheinrich. Scanning with a purposesupporting the fair information principles in rd protocols. Ubiquitous Computing Systems, pages 214231, 2005. [Francillon and Castelluccia, 2007] Aur elien Francillon and Claude Castelluccia. TinyRNG: A cryptographic random number generator for wireless sensors network nodes. In Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks and Workshops, 2007. WiOpt 2007. 5th International Symposium on, pages 17, April 2007. [Freudiger et al., 2007] J. Freudiger, M. Raya, M. Felegyh azi, P. Papadimitratos, and J.-P. Hubaux. Mix-zones for location privacy in vehicular networks. In Proceedings of the 1st International Workshop on Wireless Networking for Intelligent Transportation Systems (WiN-ITS 07), 2007. [Ganesan et al., 2003] P. Ganesan, R. Venugopalan, P. Peddabachagari, A. Dean, F. Mueller, and M. Sichitiu. Analyzing and modeling encryption overhead for sensor network nodes. In Proceedings of the 2nd ACM international conference on Wireless sensor networks and applications, Sep. 2003. [Gerlach, 2006] M. Gerlach. Assessing and improving privacy in vanets. ESCAR Embedded Security in Cars, 2006. [Gicheol, 2010] W. Gicheol. Secure cluster head election using mark based exclusion in wireless sensor networks. IEICE transactions on communications, 93(11):29252935, 2010. [Goldsmith, 2005] Andrea Goldsmith. Wireless Communications. Cambridge University Press, New York, NY, USA, 2005. [Gruteser and Hoh, 2005] M. Gruteser and B. Hoh. On the anonymity of periodic location samples. In Proceedings of the Second International Conference on Security in Pervasive Computing, pages 179192. Springer, 2005. [Gulcu and Tsudik, 1996] C. Gulcu and G. Tsudik. Mixing e-mail with babel. In Network and Distributed System Security, 1996., Proceedings of the Symposium on, pages 216. IEEE, 1996. [Hancke, 2005] G.P. Hancke. A practical relay attack on iso 14443 proximity cards. Technical report, University of Cambridge Computer Laboratory, 2005. [Hao and Zielinski, 2006] F. Hao and P. Zielinski. A 2-round anonymous veto protocol. In Proceedings of the 14th International Workshop on Security Protocols, Cambridge, UK, 2006. [Harkins and Carrel, 1998] D. Harkins and D. Carrel. The internet key exchange (ike)protocol. 1998. [Hartenstein and Laberteaux, 2008] H. Hartenstein and K.P. Laberteaux. A tutorial survey on vehicular ad hoc networks. Communications Magazine, IEEE, 46(6):164 171, June 2008.


Bibliography [He et al., 2007] W. He, X. Liu, H. Nguyen, K. Nahrstedt, and T. Abdelzaher. Pda: Privacypreserving data aggregation in wireless sensor networks. In Proceedings of Infocom, pages 2045 2053. IEEE, 2007. [Heinzelman et al., 2000] WR Heinzelman, A. Chandrakasan, and H. Balakrishnan. Energyecient communication protocol for wireless microsensor networks. In Proceedings of the 33rd Annual Hawaii International Conference onSystem Sciences., page 10, 2000. [Hu and Wang, 2005] Y.C. Hu and H.J. Wang. A framework for location privacy in wireless networks. In ACM SIGCOMM Asia Workshop. Citeseer, 2005. [Hu et al., 2005] Y.C. Hu, A. Perrig, and D.B. Johnson. Ariadne: A secure on-demand routing protocol for ad hoc networks. Wireless Networks, 11(1-2):2138, 2005. [Hu et al., 2006] Y.C. Hu, A. Perrig, and D.B. Johnson. Wormhole attacks in wireless networks. Selected Areas in Communications, IEEE Journal on, 24(2):370380, 2006. [Huang et al., 2005] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki. Enhancing wireless location privacy using silent period. In Wireless Communications and Networking Conference, 2005 IEEE, volume 2, pages 11871192. IEEE, 2005. [Huang et al., 2009] Y. Huang, W. He, and K. Nahrstedt. ChainFarm: A Novel Authentication Protocol for High-rate Any Source Probabilistic Broadcast. In Proc. of The 6th IEEE International Conference on Mobile Ad-hoc and Sensor Systems (IEEE MASS), 2009. [Hubaux et al., 2004] J.P. Hubaux, S. Capkun, and J. Luo. The security and privacy of smart vehicles. Security & Privacy, IEEE, 2(3):4955, 2004. [Instruments, 2005] Texas Instruments. Securing the pharmaceutical supply chain with rd and public-key infrastructure (pki) technologies. texas instruments white paper, june 2005, 2005. [Iqbal and Khayam, 2009] Adnan Iqbal and Syed Ali Khayam. An energy-ecient link layer protocol for reliable transmission over wireless networks. EURASIP J. Wirel. Commun. Netw., 2009:28:128:10, January 2009. [ISO, 2008] Iso 9798-2. mechanisms using symmetric encipherment algorithms. 2008. [ITLaw, ] ITLaw. Right of privacy. http://itlaw.wikia.com/wiki/Right_of_privacy. [Jacquet, 2004] Philippe Jacquet. Performance of connected dominating set in olsr protocol. Technical Report RR-5098, INRIA, 2004. [Jian et al., 2007] Y. Jian, S. Chen, Z. Zhang, and L. Zhang. Protecting receiver-location privacy in wireless sensor networks. In INFOCOM 2007. 26th IEEE International Conference on Computer Communications. IEEE, pages 19551963. Ieee, 2007. [Juels and Brainard, 2004] A. Juels and J. Brainard. Soft blocking: Flexible blocker tags on the cheap. In Proceedings of the 2004 ACM workshop on Privacy in the electronic society, pages 17. ACM, 2004. [Juels et al., 2003] A. Juels, R.L. Rivest, and M. Szydlo. The blocker tag: Selective blocking of rd tags for consumer privacy. In Proceedings of the 10th ACM conference on Computer and communications security, pages 103111. ACM, 2003. [Juels et al., 2006] A. Juels, P. Syverson, and D. Bailey. High-power proxies for enhancing rd privacy and utility. In Privacy Enhancing Technologies, pages 210226. Springer, 2006. [Juels, 2005a] A. Juels. Minimalist cryptography for low-cost rd tags. Security in Communication Networks, pages 149164, 2005.


BIBLIOGRAPHY [Juels, 2005b] A. Juels. Strengthening epc tags against cloning. In Proceedings of the 4th ACM workshop on Wireless security, pages 6776. ACM, 2005. [Juels, 2006] A. Juels. Rd security and privacy: A research survey. Selected Areas in Communications, IEEE Journal on, 24(2):381394, 2006. [Kamat et al., 2005] P. Kamat, Y. Zhang, W. Trappe, and C. Ozturk. Enhancing source-location privacy in sensor network routing. In Distributed Computing Systems, 2005. ICDCS 2005. Proceedings. 25th IEEE International Conference on, pages 599608. IEEE, 2005. [Kamat et al., 2007] P. Kamat, W. Xu, W. Trappe, and Y. Zhang. Temporal privacy in wireless sensor networks. In Distributed Computing Systems, 2007. ICDCS07. 27th International Conference on, pages 2323. IEEE, 2007. [Kargl et al., 2008] Frank Kargl, Antonio Kung, Albert Held, Giorgo Calandriello, Ta Vinh Thong, Bj orn Wiedersheim, Elmar Schoch, Michael M uter, Levente Butty an, Panagiotis Papadimitratos, and Jean-Pierre Hubaux. Secure vehicular communication systems: implementation, performance, and research challenges. IEEE Communications Magazine, 46(11):110118, 2008. [Karnadi et al., 2005] F.K. Karnadi, Z.H. Mo, and K. Lan. Rapid generation of realistic mobility models for vanet. In Wireless Communications and Networking Conference, 2007. WCNC 2007. IEEE, pages 25062511. IEEE, 2005. [Kelly and Erickson, 2005] E.P. Kelly and G.S. Erickson. Rd tags: commercial applications v. privacy rights. Industrial Management & Data Systems, 105(6):703713, 2005. [Kesdogan et al., 1998] D. Kesdogan, J. Egner, and R. B uschkes. Stop-and-go-mixes providing probabilistic anonymity in an open system. In Information Hiding, pages 8398. Springer, 1998. [Kr and Wool, 2005] Z. Kr and A. Wool. Picking virtual pockets using relay attacks on contactless smartcard. In Security and Privacy for Emerging Areas in Communications Networks, 2005. SecureComm 2005. First International Conference on, pages 4758. IEEE, 2005. [Kloeden et al., 1997] C.N. Kloeden, A.J. McLean, V.M. Moore, and G. Ponte. Travelling speed and the risk of crash involvement. NHMRC Road Accident Research Unit, The University of Adelaide, 1997. [Kohl and Neuman, 1993] J. Kohl and C. Neuman. Rfc 1510: The kerberos network authentication service (v5). Published Sep, 1993. [Krajzewicz et al., 2002] Daniel Krajzewicz, Georg Hertkorn, Christian R ossel, and Peter Wagner. Sumo (simulation of urban mobility); an open-source trac simulation. In A Al-Akaidi, editor, Proceedings of the 4th Middle East Symposium on Simulation and Modelling (MESM2002), pages 183187, Sharjah, United Arab Emirates, September 2002. SCS European Publishing House. [Kroh et al., 2006] Rainer Kroh, Antonio Kung, and Frank Kargl. Vanets security requirements nal v ersion. Sevecom D1.1, 2006. [Kruskal, 1956] Jr. Kruskal, Joseph B. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7(1):pp. 4850, 1956. [Kuhn et al., 2006] F. Kuhn, T. Moscibroda, and R. Wattenhofer. Fault-tolerant clustering in ad hoc and sensor networks. In Distributed Computing Systems, 2006. ICDCS 2006. 26th IEEE International Conference on, pages 6868. IEEE, 2006. [Langheinrich, 2009] M. Langheinrich. A survey of rd privacy approaches. Personal and Ubiquitous Computing, 13(6):413421, 2009.


Bibliography [Leaf and Preusser, 1999] W.A. Leaf and D.F. Preusser. Literature review on vehicle travel speeds and pedestrian injuries. National Highway Trac Safety Administration, http://www.nhtsa.dot.gov/people/injury/research/ pub/HS809012.html, October 1999. [Li et al., 2009] N. Li, N. Zhang, S.K. Das, and B. Thuraisingham. Privacy preservation in wireless sensor networks: A state-of-the-art survey. Ad Hoc Networks, 2009. [Lin and Lu, 2012] Xiaodong Lin and Rongxing Lu. Bibliography on secure vehicular communications. http://bbcr.uwaterloo.ca/ rxlu/sevecombib.htm, 2012. [Lin et al., 2008] X. Lin, R. Lu, C. Zhang, H. Zhu, P.H. Ho, and X. Shen. Security in vehicular ad hoc networks. Communications Magazine, IEEE, 46(4):8895, 2008. [Liu and Ning, 2008] An Liu and Peng Ning. Tinyecc: A congurable library for elliptic curve cryptography in wireless sensor networks. In Proceedings of the 7th International Conference on Information Processing in Sensor Networks (IPSN 2008), pages 245256, April 2008. [Liu et al., 2005] D. Liu, P. Ning, S. Zhu, and S. Jajodia. Practical broadcast authentication in sensor networks. In Mobile and Ubiquitous Systems: Networking and Services, 2005. MobiQuitous 2005. The Second Annual International Conference on, pages 118129, 2005. [Lopez and Zhou, 2008] J. Lopez and J. Zhou. Wireless Sensor Network Security. Cryptology and Information Security Series, IOS Press, 2008. [Lopez, 2008] J. Lopez. Wireless sensor network security, volume 1. Ios Pr Inc, 2008. [Lu et al., 2012] R. Lu, X. Li, T.H. Luan, X. Liang, and X. Shen. Pseudonym changing at social spots: An eective strategy for location privacy in vanets. Vehicular Technology, IEEE Transactions on, 61(1):8696, 2012. [Luo and Hubaux, 2004] J. Luo and J.P. Hubaux. A survey of inter-vehicle communication. Lausanne, Switzerland, Tech. Rep, IC/2004/24, 2004. [Ma et al., 2010] Z. Ma, F. Kargl, and M. Weber. Measuring long-term location privacy in vehicular communication systems. Computer Communications, 33(12):14141427, 2010. [McMillin et al., 1998] B. McMillin, J. Sirois, R. Mahoney, and F. Budd. Fault-tolerant and secure intelligent vehicle highway system software a safety prototype. In IEEE International Conference on Intelligent Vehicles. IEEE, 1998. [Mehta et al., 2007] K. Mehta, D. Liu, and M. Wright. Location privacy in sensor networks against a global eavesdropper. In Network Protocols, 2007. ICNP 2007. IEEE International Conference on, pages 314323. IEEE, 2007. [mir, ] http://www.shamus.ie/. [Molnar and Wagner, 2004] D. Molnar and D. Wagner. Privacy and security in library rd: Issues, practices, and architectures. In Proceedings of the 11th ACM conference on Computer and communications security, pages 210219. ACM, 2004. [Nohara et al., 2005] Y. Nohara, S. Inoue, K. Baba, and H. Yasuura. Quantitative evaluation of unlinkable id matching schemes. In Proceedings of the 2005 ACM workshop on Privacy in the electronic society, pages 5560. ACM, 2005. [Ohkubo et al., 2004] M. Ohkubo, K. Suzuki, and S. Kinoshita. Ecient hash-chain based rd privacy protection scheme. In International Conference on Ubiquitous ComputingUbicomp, Workshop Privacy: Current Status and Future Directions, 2004.


[Oliveira et al., 2008] Leonardo B. Oliveira, Michael Scott, Júlio López, and Ricardo Dahab. TinyPBC: Pairings for Authenticated Identity-Based Non-Interactive Key Distribution in Sensor Networks. In Proceedings of the 5th International Conference on Networked Sensing Systems (INSS'08), pages 173–179, Kanazawa, Japan, June 2008. IEEE.
[Peris-Lopez et al., 2006] P. Peris-Lopez, J. Hernandez-Castro, J. Estevez-Tapiador, and A. Ribagorda. RFID systems: A survey on security threats and proposed solutions. In Personal Wireless Communications, pages 159–170. Springer, 2006.
[Perrig et al., 2002] Adrian Perrig, Ran Canetti, J.D. Tygar, and Dawn Song. The TESLA Broadcast Authentication Protocol. RSA CryptoBytes, 5(Summer), 2002.
[Perrig et al., 2004] A. Perrig, J. Stankovic, and D. Wagner. Security in wireless sensor networks. Communications of the ACM, 47(6):53–57, 2004.
[Pfitzmann and Köhntopp, 2001] A. Pfitzmann and M. Köhntopp. Anonymity, unobservability, and pseudonymity – a proposal for terminology. In Designing Privacy Enhancing Technologies, pages 1–9. Springer, 2001.
[Piotrowski et al., 2006] K. Piotrowski, P. Langendoerfer, and S. Peter. How public key cryptography influences wireless sensor node lifetime. In Proceedings of the fourth ACM workshop on Security of ad hoc and sensor networks, pages 169–176, Nov. 2006.
[Preneel and Oorschot, 1999] B. Preneel and P. van Oorschot. On the security of iterated message authentication codes. IEEE Transactions on Information Theory, 45(1):188–199, 1999.
[Prim, 1957] R.C. Prim. Shortest connection networks and some generalizations. Bell System Technical Journal, 36(6):1389–1401, 1957.
[Rajendran and Sreenaath, 2008] T. Rajendran and K.V. Sreenaath. Secure anonymous routing in ad hoc networks. In Proceedings of the 1st Bangalore Annual Computer Conference. ACM New York, 2008.
[Rappaport, 2001] Theodore Rappaport. Wireless Communications: Principles and Practice. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2nd edition, 2001.
[Raya and Hubaux, 2005] M. Raya and J.P. Hubaux. The security of vehicular ad hoc networks. In Proc. of the Third ACM Workshop on Security of Ad Hoc and Sensor Networks (SASN 2005). ACM, 2005.
[Raya and Hubaux, 2007] M. Raya and J.P. Hubaux. Securing vehicular ad hoc networks. Journal of Computer Security, 15(1):39–68, 2007.
[Reiter and Rubin, 1998] M.K. Reiter and A.D. Rubin. Crowds: Anonymity for web transactions. ACM Transactions on Information and System Security (TISSEC), 1(1):66–92, 1998.
[rel, ] http://code.google.com/p/relic-toolkit/.
[Ren et al., 2011] D. Ren, S. Du, and H. Zhu. A novel attack tree based risk assessment approach for location privacy preservation in the VANETs. In Communications (ICC), 2011 IEEE International Conference on, pages 1–5. IEEE, 2011.
[RFID, 2012] Wikipedia. Radio-frequency identification. http://en.wikipedia.org/wiki/Radio-frequency_identification, 2012.

[Rieback et al., 2005] M. Rieback, B. Crispo, and A. Tanenbaum. RFID Guardian: A battery-powered mobile device for RFID privacy management. In Information Security and Privacy, pages 259–273. Springer, 2005.


[Sampigethaya et al., 2005] K. Sampigethaya, L. Huang, M. Li, R. Poovendran, K. Matsuura, and K. Sezaki. CARAVAN: Providing location privacy for VANET. In Embedded Security in Cars (ESCAR), 2005.
[Sampigethaya et al., 2007] K. Sampigethaya, M. Li, L. Huang, and R. Poovendran. AMOEBA: Robust location privacy scheme for VANET. IEEE Journal on Selected Areas in Communications, 25(8):1569–1589, 2007.
[Schnorr, 1991] C.P. Schnorr. Efficient signature generation by smart cards. Journal of Cryptology, 4(3):161–174, 1991.
[Schoch et al., 2006] E. Schoch, F. Kargl, T. Leinmüller, S. Schlott, and P. Papadimitratos. Impact of pseudonym changes on geographic routing in VANETs. Security and Privacy in Ad-Hoc and Sensor Networks, pages 43–57, 2006.
[Serjantov and Danezis, 2003] A. Serjantov and G. Danezis. Towards an information theoretic metric for anonymity. In Privacy Enhancing Technologies, pages 259–263. Springer, 2003.
[Seys and Preneel, 2006] S. Seys and B. Preneel. ARM: Anonymous routing protocol for mobile ad hoc networks. In 20th International Conference on Advanced Information Networking and Applications (AINA), pages 133–137. IEEE, 2006.
[Sharma et al., 2012] S. Sharma, A. Sahu, A. Verma, and N. Shukla. Wireless sensor network security. Advances in Computer Science and Information Technology. Computer Science and Information Technology, pages 317–326, 2012.
[Sheng and Li, 2008] B. Sheng and Q. Li. Verifiable privacy-preserving range query in two-tiered sensor networks. In Proceedings of Infocom, pages 46–50. IEEE, 2008.
[Sirivianos et al., 2007] M. Sirivianos, D. Westhoff, F. Armknecht, and J. Girao. Non-manipulable aggregator node election protocols for wireless sensor networks. In Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks and Workshops, 2007. WiOpt 2007. 5th International Symposium on, pages 1–10. IEEE, 2007.
[Studer et al., 2008] A. Studer, E. Shi, F. Bai, and A. Perrig. TACKing Together Efficient Authentication, Revocation, and Privacy in VANETs. Technical report, Carnegie Mellon CyLab, 2008.
[Syamsuddin et al., 2008] Irfan Syamsuddin, Tharam Dillon, Elizabeth Chang, and Song Han. A survey of RFID authentication protocols based on hash-chain method. In Convergence and Hybrid Information Technology, ICCIT '08, volume 2, pages 559–564. IEEE, 2008.
[Szczechowiak et al., 2008] Piotr Szczechowiak, Leonardo B. Oliveira, Michael Scott, Martin Collier, and Ricardo Dahab. NanoECC: Testing the limits of elliptic curve cryptography in sensor networks. In Proceedings of the European Conference on Wireless Sensor Networks (EWSN'08), 2008.
[Tel, 2000] Gerard Tel. Introduction to Distributed Algorithms (2nd ed.). Cambridge University Press, 2000.
[VSC, ] Vehicle safety communications project. http://www-nrd.nhtsa.dot.gov/pdf/nrd-12/CAMP3/pages/VSCC.htm/.
[Wagner, 2004] David Wagner. Resilient aggregation in sensor networks. In Proceedings of the 2nd ACM workshop on Security of ad hoc and sensor networks, SASN '04, pages 78–87, New York, NY, USA, 2004. ACM.
[Wan et al., 2002] C.Y. Wan, A.T. Campbell, and L. Krishnamurthy. PSFQ: a reliable transport protocol for wireless sensor networks. In Proceedings of the 1st ACM international workshop on Wireless sensor networks and applications, pages 1–11. ACM, 2002.


[Wiedersheim et al., 2010] B. Wiedersheim, Z. Ma, F. Kargl, and P. Papadimitratos. Privacy in inter-vehicular networks: Why simple pseudonym change is not enough. In Wireless On-demand Network Systems and Services (WONS), 2010 Seventh International Conference on, pages 176–183. IEEE, 2010.
[Willke et al., 2009] T.L. Willke, P. Tientrakool, and N.F. Maxemchuk. A survey of inter-vehicle communication protocols and their applications. IEEE Communications Surveys & Tutorials, 11(2):3–20, 2009.
[Wu et al., 2009] D.L. Wu, W.W.Y. Ng, D.S. Yeung, and H.L. Ding. A brief survey on current RFID applications. In Machine Learning and Cybernetics, 2009 International Conference on, volume 4, pages 2330–2335. IEEE, 2009.
[Xi et al., 2006] Y. Xi, L. Schwiebert, and W. Shi. Preserving source location privacy in monitoring-based wireless sensor networks. In Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International, 8 pp. IEEE, 2006.
[Xiong et al., 2010] Xiaokang Xiong, Duncan S. Wong, and Xiaotie Deng. TinyPairing: A Fast and Lightweight Pairing-based Cryptographic Library for Wireless Sensor Networks. In Proceedings of the IEEE Wireless Communications & Networking Conference. IEEE, 2010.
[Yick et al., 2008] J. Yick, B. Mukherjee, and D. Ghosal. Wireless sensor network survey. Computer Networks, 52(12):2292–2330, 2008.
[Zhang et al., 2006] Y. Zhang, W. Liu, W. Lou, and Y. Fang. MASK: Anonymous on-demand routing in mobile ad hoc networks. IEEE Transactions on Wireless Communications, 5(9):2376–2385, 2006.
[Zhang et al., 2008] W. Zhang, C. Wang, and T. Feng. GP²S: Generic privacy-preservation solutions for approximate aggregation of sensor data (concise contribution). In Pervasive Computing and Communications, 2008. PerCom 2008. Sixth Annual IEEE International Conference on, pages 179–184. IEEE, 2008.
[Zhu et al., 2003] Sencun Zhu, Sanjeev Setia, and Sushil Jajodia. LEAP: Efficient security mechanisms for large-scale distributed sensor networks. In Proceedings of the 10th ACM conference on Computer and communications security, pages 62–72. ACM Press, 2003.

