
Universidade Estadual de Campinas

Instituto de Computação

  
 
 
 
  

Master's thesis project proposal: A software for format-preserving encryption

Master's Candidate: Hansen David González Sastoque


Adviser: Prof. Dr. Julio César López Hernández

Abstract
Format-preserving encryption (FPE) has emerged as a relevant cryptographic tool: it encrypts a message with a precise format into a ciphertext with the same format. It is already used for credit card numbers (CCNs) and Social Security numbers (SSNs) in the US. The objective of this project is to study the analogous case for Brazil, the Cadastro de Pessoas Físicas (CPF), and to produce software that performs the encryption and decryption process. This proposal introduces the state of the art and the tools planned for the development of the project.

Contents
1 Introduction
2 Feistel Networks
3 Tweak
4 Cryptol
5 Algorithms for format-preserving encryption
5.1 FFX Standard
5.2 BPS algorithm
6 Objectives
7 Methodology
8 Schedule

Introduction

Format-preserving encryption (FPE) arose to deal with sensitive information whose format must be preserved. Well-known blockciphers such as the Advanced Encryption Standard (AES) operate only on the set $\{0,1\}^{128}$ and produce a ciphertext in the same set. More generally, most common blockciphers have a signature of the form $E : \{0,1\}^k \times \{0,1\}^n \to \{0,1\}^n$, where $k$ is the key length and $n$ is the block length. Unfortunately, a number of applications cannot work under these conditions, because they need the ciphertext to stay within their own message space.
FPE makes encryption available to applications that cannot tolerate a change in the structure of their data. In database applications, for instance, changing the format of a field could require chaotic re-engineering of the code, or might not be possible at all. Another example is data sanitization, which consists of hiding the personally identifiable information in a database; this is a legitimate concern nowadays, and FPE appears to be the most efficient approach because the encryption is transparent to the application. Finally, FPE applies to financial-information security, social security numbers (SSNs), primary account numbers (PANs), credit card numbers (CCNs), 512-byte disk sectors, postal addresses, and even files of a common type such as JPEG or MPEG.
As Bellare et al. highlight, what makes FPE an interesting and powerful idea is that the notion reaches far beyond blockciphers [1]; FPE can be considered a more powerful abstraction of blockciphers that solves a wide set of problems. Apparently the first researchers to describe the FPE problem were Brightwell and Smith (1997) [4], although the name they chose was datatype-preserving encryption. They gave a very informal description of the problem, which involved a database entry formed by a social security number (SSN) that must be encrypted as another SSN, and they suggested a solution that lacked a formal mathematical foundation.
Terence Spies coined the term format-preserving encryption around 2003; however, the roots of FPE can be traced to 1981, when the US National Bureau of Standards (NBS) published its guidelines for implementing and using the NBS Data Encryption Standard [9]. Unfortunately, the scheme described there was completely flawed, and the publication was withdrawn in 2005 by NIST (formerly NBS). Moreover, as mentioned in [2], Black and Rogaway raised the problem in the cryptographic community in 2002. Some time later Bellare, Rogaway and Spies prepared a more elaborate proposal and submitted it to NIST. Finally, in March 2016, NIST published the standard under the name Recommendation for Block Cipher Modes of Operation: Methods for Format-Preserving Encryption [5].
A common issue is reported by Rogaway: "It is my view that the cryptographic literature already contains good solutions for FPE but that, at least until quite recently, the ideas were scattered about, not widely known, and not cohesively described" [11]. It is therefore a challenge to unify the information gathered around FPE and to disseminate it more widely.
Diverse approaches can be used to tackle the FPE problem. The two most straightforward are to create a completely new scheme from scratch, or to take commonly accepted schemes and adjust them to work under the constraints of the FPE problem. The first approach yields a more natural solution designed exclusively for FPE; on the other hand, designing a new scheme is laborious work that depends on the acceptance and help of the security community. The second approach is easier to get adopted, as Rogaway notes: "The only practical way that one will get a widely-accepted and widely-used FPE scheme is to base it on a conventional and widely accepted blockcipher" [11].
It is therefore worth presenting a blockcipher that is frequently used in the construction of FPE schemes because of its strong reputation: the Advanced Encryption Standard (AES), one of the safest and most widely used and studied algorithms today, and available for public use. The National Security Agency (NSA) has approved it for protecting classified information up to the highest level. In 1997 the National Institute of Standards and Technology (NIST), formerly the National Bureau of Standards (NBS), announced the search for a successor to the DES encryption standard. An algorithm called "Rijndael", developed by the Belgian cryptologists Joan Daemen and Vincent Rijmen, stood out in security, performance and flexibility. It beat several competitors, was officially presented as the new encryption standard in 2001, and became effective in 2002. The algorithm is based on substitutions, permutations and linear transformations.
At its core, an FPE scheme is a keyed permutation on the message space that an adversary must not be able to distinguish from a random permutation or extract any information from; this notion is the well-known pseudorandom permutation (PRP). Techniques developed for building PRPs are therefore useful for the FPE problem.
FPE schemes are distinguished by the size of the message space, denoted here by $N = |M|$. Spaces with $N \le 2^{10}$ are considered tiny, and for them algorithms that spend time and memory proportional to $N$ are acceptable; examples are the Knuth shuffle [10], permutation numbering and the prefix cipher [3]. Even so, these schemes receive little attention in the literature, because the algorithms for larger spaces also work gracefully here, and the literature is devoted to those. The next case covers spaces with $N \le 2^w$, where $w$ is the block size of a blockcipher; these are considered small spaces, and AES, the most famous blockcipher (with a block of 128 bits), sets their limit. The SSN and CCN examples given before fit here. A vast collection of schemes tackles this case, but the most common are based on Feistel networks, which are illustrated in the next section. The most relevant algorithms are FIPS 74 scrambling [9], which, as mentioned, is the first algorithm found in the literature; FFSEM [12], a former NIST submission based on classical Feistel networks and cycle walking; and FFX [2], which replaced FFSEM, is the latest scheme proposed, and was recently accepted as part of the NIST standard in 2016 [5].
Finally, the case $N > 2^w$ with $w = 128$ is considered large-space FPE; 512-byte disk sectors and images fall into this category. Again a variety of schemes exists, most of them based on hashing. Only the BPS scheme [6] is discussed here: it was recently accepted as part of the NIST standard on methods for format-preserving encryption [5] and, in contrast with its competitors, it is flexible and can encipher both short and long strings.
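For the tiny-space case, the prefix cipher [3] can be sketched in a few lines of Python. The idea is to compute a keyed pseudorandom weight for every point of the message space and to encrypt each point to the rank of its weight. This is only an illustrative sketch: HMAC-SHA256 is used here as an assumed stand-in for the blockcipher call, and the function names and the demonstration key are hypothetical.

```python
import hashlib
import hmac


def build_prefix_cipher(key: bytes, n: int) -> list[int]:
    """Return a table where table[p] is the ciphertext of plaintext p in range(n)."""
    def weight(point: int) -> bytes:
        # HMAC-SHA256 is an assumed stand-in for the blockcipher call E_K(point).
        return hmac.new(key, point.to_bytes(4, "big"), hashlib.sha256).digest()

    ranked = sorted(range(n), key=weight)  # points ordered by their keyed weight
    table = [0] * n
    for rank, point in enumerate(ranked):
        table[point] = rank  # a point is encrypted to the rank of its weight
    return table


def invert_table(table: list[int]) -> list[int]:
    """Build the decryption table from the encryption table."""
    inverse = [0] * len(table)
    for plaintext, ciphertext in enumerate(table):
        inverse[ciphertext] = plaintext
    return inverse


key = b"demo key, not for real use"
encrypt_table = build_prefix_cipher(key, 1000)  # message space: 000..999
decrypt_table = invert_table(encrypt_table)
ciphertext = encrypt_table[123]
recovered = decrypt_table[ciphertext]
```

The table costs time and memory proportional to N, which is why this approach is only reasonable for tiny spaces.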

Feistel Networks

A particular case of iterated block ciphers is the family of Feistel ciphers. In this construction a block of text is split into two halves; a round function is applied to one of the halves, and the result is combined with the other half through an exclusive-or (XOR); the two halves are then swapped for the next round. An advantage of such algorithms is that encryption and decryption are structurally identical: decryption simply reverses the round order. These networks are named after the German-born physicist and cryptographer Horst Feistel, who did his research while working for IBM.
The security of a Feistel network is determined by the round function F and by the number of rounds, and the number of rounds required to resist an attack depends on the properties of F. A useful property of Feistel networks is that they always produce an invertible function, even when F itself is not invertible; this is an ideal property for building blockciphers.
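The round structure described above can be sketched in Python as follows. This is a minimal illustration rather than a secure cipher: the round function built from HMAC-SHA256, the number of rounds and the function names are assumptions made for the sketch. It shows that decryption uses the same non-invertible round function, applied in reverse round order.

```python
import hashlib
import hmac


def round_function(key: bytes, round_index: int, half: bytes) -> bytes:
    # Assumed F: HMAC-SHA256 truncated to the length of one half.
    digest = hmac.new(key, bytes([round_index]) + half, hashlib.sha256).digest()
    return digest[: len(half)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def check_block(block: bytes) -> int:
    # The block must split evenly, and F yields at most 32 bytes per half.
    if len(block) % 2 != 0 or len(block) > 64:
        raise ValueError("block must have an even length of at most 64 bytes")
    return len(block) // 2


def feistel_encrypt(key: bytes, block: bytes, rounds: int = 8) -> bytes:
    half = check_block(block)
    left, right = block[:half], block[half:]
    for i in range(rounds):
        # (L, R) -> (R, L xor F(R)): transform one half, then swap.
        left, right = right, xor_bytes(left, round_function(key, i, right))
    return left + right


def feistel_decrypt(key: bytes, block: bytes, rounds: int = 8) -> bytes:
    half = check_block(block)
    left, right = block[:half], block[half:]
    for i in reversed(range(rounds)):
        # Undo one round: (L', R') -> (R' xor F(L'), L').
        left, right = xor_bytes(right, round_function(key, i, left)), left
    return left + right


ciphertext = feistel_encrypt(b"demo key", b"12345678")
recovered = feistel_decrypt(b"demo key", ciphertext)
```

Note that the ciphertext here is an arbitrary byte string of the same length, so this balanced XOR network preserves the length but not a decimal format; that gap is what the variants below address.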
Often the message has a length that cannot be split into two equal halves, so variants of the original construction are needed to remove this limitation; doing so has interesting implications for the design of secure ciphers. Three approaches are considered. The first relies on cycle walking: the message is enciphered on a slightly larger domain, and whenever the result falls outside the intended domain it re-enters the cipher until it falls inside it. This point must be watched carefully, because the data-dependent number of iterations could reveal information through side-channel attacks. The second approach is the unbalanced Feistel network, in which the left-hand side has a fixed length different from that of the right-hand side. Again one part enters the F function, but in contrast with the usual Feistel network it no longer makes sense to apply XOR, so an addition is used instead, usually characterwise addition or blockwise addition (explained in the FFX section). Finally the results are concatenated, leaving a block ready to be divided again. The last approach starts from the following observation. Suppose that, in a message split with left = 2 and right = 3, one round produces the intermediate result 123 || 45; the next round must regroup it as 12 || 345. This small detail generates extra work in the implementation. The alternating Feistel network avoids that extra work at the cost of having two kinds of round function that are used alternately: one that expands and one that contracts.
The three kinds of Feistel network are shown in Figure 1.
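Cycle walking can be illustrated with a short Python sketch. A balanced Feistel permutation on integers of 2h bits plays the role of the cipher on the larger domain, and values that fall outside the range 0..n-1 are enciphered again until they land inside it. The round function based on HMAC-SHA256, the helper names and the parameters are assumptions made for this illustration.

```python
import hashlib
import hmac


def _f(key: bytes, rnd: int, value: int, bits: int) -> int:
    # Assumed round function: HMAC-SHA256 truncated to `bits` bits (bits <= 64).
    data = bytes([rnd]) + value.to_bytes(8, "big")
    digest = hmac.new(key, data, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << bits) - 1)


def permute(key: bytes, x: int, half_bits: int, rounds: int = 8, inverse: bool = False) -> int:
    """Balanced Feistel permutation on integers of 2 * half_bits bits."""
    mask = (1 << half_bits) - 1
    left, right = x >> half_bits, x & mask
    if inverse:
        for i in reversed(range(rounds)):
            left, right = right ^ _f(key, i, left, half_bits), left
    else:
        for i in range(rounds):
            left, right = right, left ^ _f(key, i, right, half_bits)
    return (left << half_bits) | right


def cycle_walk(key: bytes, x: int, n: int, inverse: bool = False) -> int:
    """Encipher x in range(n) by re-applying the larger permutation until it lands in range(n)."""
    if not 0 <= x < n:
        raise ValueError("x is outside the message space")
    half_bits = max(1, ((n - 1).bit_length() + 1) // 2)  # 2**(2*half_bits) >= n
    y = permute(key, x, half_bits, inverse=inverse)
    while y >= n:  # the iteration count depends on the data
        y = permute(key, y, half_bits, inverse=inverse)
    return y


key = b"demo key"
n = 1000  # message space: three decimal digits
ciphertexts = [cycle_walk(key, x, n) for x in range(n)]
```

The loop in `cycle_walk` is the point that deserves care in practice, since its running time varies with the input.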

Tweak

Tweakable blockciphers can be considered a new cryptographic primitive. They are meant to be easy to design, and the extra cost of making a blockcipher tweakable should be small. It is important to highlight that the tweak is not a second key: its purpose is to provide variability, not additional secrecy, and introducing a tweak does not make the underlying blockcipher stronger.

Figure 1: Classes of Feistel networks (taken from [7])


Liskov, Rivest and Wagner define the objective of a tweakable block cipher more precisely: "a tweakable block cipher should have the property that changing the tweak should be less costly than changing the key" [8]. This matters because every change of key involves a key-setup process that usually implies considerable work and time. Another goal is variability, which is usually achieved by modifying the input before it enters the block cipher; the common block cipher is not designed for this, so a new kind of block cipher must be analyzed and proposed, and here the tweakable block cipher arises.
Another property is that keeping the tweak hidden is not needed for the security of the scheme. As Liskov, Rivest and Wagner put it, "each fixed setting of the tweak gives rise to a different, apparently independent, family of standard block cipher encryption operators" [8]. Therefore, even an adversary who controls the tweak gains little toward breaking the cryptosystem.
Tweakable blockciphers are critical for FPE when the plaintext to be enciphered is very short, because it then becomes an easy target for dictionary or brute-force attacks; with SSNs, for example, the message space is small and vulnerable to such attacks.
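The two properties discussed in this section, cheap tweak changes and one apparently independent permutation per tweak, can be illustrated with a small Python sketch. The construction below is a hypothetical one made for illustration, not the construction of [8]: the tweak is simply fed into an HMAC-based Feistel round function, and the key setup is done once in the constructor. The class name and the record identifiers used as tweaks are assumptions.

```python
import hashlib
import hmac


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class TweakableFeistel:
    """Toy tweakable cipher on byte blocks of even length (at most 64 bytes)."""

    def __init__(self, key: bytes, rounds: int = 8):
        # Key setup happens once; changing the tweak later does not repeat it.
        self._keyed = hmac.new(key, digestmod=hashlib.sha256)
        self._rounds = rounds

    def _f(self, tweak: bytes, round_index: int, half: bytes) -> bytes:
        mac = self._keyed.copy()  # reuse the keyed state for every call
        mac.update(len(tweak).to_bytes(2, "big") + tweak + bytes([round_index]) + half)
        return mac.digest()[: len(half)]

    def _split(self, block: bytes):
        if len(block) % 2 != 0 or len(block) > 64:
            raise ValueError("block must have an even length of at most 64 bytes")
        half = len(block) // 2
        return block[:half], block[half:]

    def encrypt(self, tweak: bytes, block: bytes) -> bytes:
        left, right = self._split(block)
        for i in range(self._rounds):
            left, right = right, _xor(left, self._f(tweak, i, right))
        return left + right

    def decrypt(self, tweak: bytes, block: bytes) -> bytes:
        left, right = self._split(block)
        for i in reversed(range(self._rounds)):
            left, right = _xor(right, self._f(tweak, i, left)), left
        return left + right


cipher = TweakableFeistel(b"demo key")
# The same plaintext under two public tweaks (hypothetical record identifiers).
ct_row1 = cipher.encrypt(b"row-1", b"12345678")
ct_row2 = cipher.encrypt(b"row-2", b"12345678")
```

Because each tweak selects a different permutation, an attacker who builds a dictionary of plaintext and ciphertext pairs under one tweak cannot reuse it under another, which is what makes tweaks valuable for short messages.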

Cryptol

Cryptol is a domain-specific language designed for people who work in cryptography. It does not require deep knowledge of memory management or control structures, which makes it a good tool for producing readable code quickly. It is built as a formal, functional language and therefore has no imperative control structures. As a formal language it can be used for validation and verification. As a platform for code generation, Cryptol is declarative and platform neutral.

It was developed by Galois, Inc., a private company specialized in security services. After the AES competition, a group of experts analyzed the algorithms of the five finalists, extracted a set of characteristics that they share, and decided to create a language that covers all these points; thus began Cryptol, a language with well over a decade of work behind it.
As cryptography becomes more ubiquitous, the need arises to deploy it on more platforms. Many of the most interesting platforms for cryptography are embedded processors and other specialized hardware with a wide variety of requirements. Cryptol is designed to cover the whole life cycle needed to take a simple mathematical abstraction to a finished product. Cryptol was originally developed for use by the National Security Agency (NSA). It is now a free software project, however, and is used by private firms such as the American company Rockwell Collins, which provides security services to the aerospace and defense sectors of the United States.
In Cryptol, cryptographic concepts can be expressed directly and formally, independently of the details of a particular hardware platform; for example, it does not matter whether the algorithm will be deployed on an FPGA, an embedded device or a notebook. Cryptol can be seen as a language for cryptography: it uses high-level abstractions to express the same concepts and idioms as those found in published algorithms, so developers can quickly implement new algorithms by building on existing ones. Programmers can therefore focus on the cryptography itself without being distracted by machine-level details such as word size.
Cryptol can be used to validate new cryptographic implementations by generating test vectors, including randomized ones. It is also a robust tool for verification: it uses Z3 as its default solver, although other solvers can be used, to formally verify properties of programs. Cryptol is intended for use across embedded systems, smart cards and FPGAs.
Cryptol can serve as the basis of a standard library of cryptographic specifications. Moreover, it can capture complete, highly parameterized designs. Block ciphers, for example, have parameters such as block size, key size and number of rounds, but in practice they are specified only for certain fixed standard sizes. Cryptol was designed to be well suited to capturing the complete parameterized design.
One of the core objectives of Cryptol is that the generated code is designed for readability rather than efficiency. Code generated by Cryptol does not need to be highly optimized, since it is responsible for only a tiny portion of the runtime. Specifications are inherently platform agnostic. A Cryptol specification also avoids unnecessary sequentiality, so it can take advantage of the highly parallel nature of hardware. When a new algorithm is published, the paper usually comes with a reference implementation written in C, C++ or pseudocode; unfortunately, these are not suitable as the basis for a specification. A procedural specification hides the mathematics underlying a cryptographic algorithm and needlessly imposes a sequential order on it.
Numbers in Cryptol are represented by bit vectors and, as is typical of cryptographic algorithms, arithmetic on them is modular. Cryptol also supports polynomial arithmetic, which occurs in some advanced cryptographic algorithms such as AES and Twofish. In addition to these operators, sequence comprehensions allow sequences to be specified element by element. Cryptol has no imperative control flow, which helps to avoid side-channel attacks such as timing attacks.

Algorithms for format-preserving encryption

5.1 FFX Standard

The name FFX is meant to suggest Format-preserving, Feistel-based encryption. It was proposed by Mihir Bellare, Phillip Rogaway and Terence Spies [2]; the X indicates that the mode accepts multiple parameter choices. FFX has its origin in the FFSEM specification developed by Terence Spies [12], but it was improved and is now more general, adding support for tweaks, non-binary alphabets and non-balanced splits. FFX is intended for strings of any length over any alphabet. It uses a pseudorandom function as its cryptographic primitive, and the authors recommend building it from AES.
There is little to add about the FFX algorithm itself, because it is basically a Feistel network of one of the types described in the previous section: unbalanced, alternating or balanced. The balanced case can be regarded as an unbalanced network with zero imbalance. What matters is the set of nine parameters required for an implementation; they must all be fixed in order to define a specific scheme. For instance, the scheme FFX-A2 encrypts binary strings of 8 to 128 bits and FFX-A10 encrypts strings of 4 to 36 decimal digits. The nine parameters are the following:
- Radix: A number greater than or equal to 2 that specifies the alphabet, which is taken to be the integers from 0 to Radix - 1, because a bijection can be established between the chosen alphabet and this sequence.
- Lengths: The set of permitted message lengths.
- Keys: The key space, a set of binary strings.
- Tweaks: The tweak space. The tweak selects a distinct mapping and is especially important for short strings; the concept was explored in the Tweak section.
- Addition: It has two options. With 0, characterwise addition is used: a modular sum applied digit by digit, modulo the radix. With 1, blockwise addition is used instead: the strings are interpreted as numbers in base radix and added modulo $\mathrm{radix}^{n}$, where $n$ is the string length.
- Method: Only two possibilities are considered: 1 for unbalanced Feistel networks and 2 for alternating Feistel networks.
- Split: Determines the imbalance of the chosen Feistel network.
- Rounds: A function that takes a length in Lengths and returns an even number of rounds. It is chosen on performance and security grounds.
- Function: Finally, the round function takes the key, the tweak, the permitted length, the round number and an input string, and generates a string over the alphabet.
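To make the role of these parameters concrete, the following Python sketch implements an alternating Feistel network (method 2) over decimal digits with both kinds of addition. It is an FFX-like toy written under stated assumptions, not the FFX specification: the round function based on HMAC-SHA256, the fixed split at the middle of the string, the round count and all function names are hypothetical choices for illustration, and radix and length are assumed to be below 256.

```python
import hashlib
import hmac


def to_digits(value: int, length: int, radix: int) -> list[int]:
    digits = [0] * length
    for pos in range(length - 1, -1, -1):
        value, digits[pos] = divmod(value, radix)
    return digits


def from_digits(digits: list[int], radix: int) -> int:
    value = 0
    for digit in digits:
        value = value * radix + digit
    return value


def add(a, b, radix, blockwise=False, sign=1):
    """Addition parameter: characterwise (0) or blockwise (1); sign=-1 subtracts."""
    if blockwise:
        total = from_digits(a, radix) + sign * from_digits(b, radix)
        return to_digits(total % radix ** len(a), len(a), radix)
    return [(x + sign * y) % radix for x, y in zip(a, b)]


def round_function(key, tweak, n, rnd, digits, out_len, radix):
    # Assumed PRF: HMAC-SHA256 over (tweak, radix, length, round, input digits).
    message = tweak + b"|" + bytes([radix, n, rnd]) + bytes(digits)
    value = int.from_bytes(hmac.new(key, message, hashlib.sha256).digest(), "big")
    return to_digits(value % radix ** out_len, out_len, radix)


def ffx_like(key, tweak, digits, radix=10, rounds=10, blockwise=False, decrypt=False):
    n = len(digits)
    if n < 2:
        raise ValueError("need at least two symbols")
    split = n // 2  # Split parameter: size of the left part
    a, b = list(digits[:split]), list(digits[split:])
    sign = -1 if decrypt else 1
    schedule = reversed(range(rounds)) if decrypt else range(rounds)
    for rnd in schedule:
        if rnd % 2 == 0:  # even rounds update the left part from the right
            f_out = round_function(key, tweak, n, rnd, b, len(a), radix)
            a = add(a, f_out, radix, blockwise, sign)
        else:             # odd rounds update the right part from the left
            f_out = round_function(key, tweak, n, rnd, a, len(b), radix)
            b = add(b, f_out, radix, blockwise, sign)
    return a + b


plaintext = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 9]  # 11 decimal digits, the length of a CPF
ciphertext = ffx_like(b"demo key", b"tweak", plaintext)
recovered = ffx_like(b"demo key", b"tweak", ciphertext, decrypt=True)
```

As a small worked example of the Addition parameter with radix 10, adding 01 to 99 gives 90 characterwise (each digit is summed modulo 10) and 00 blockwise (99 + 1 modulo 100).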

5.2 BPS algorithm

BPS is a generic format-preserving symmetric encryption algorithm designed to encipher short or long strings of characters from any given set. It also offers tweak capability, a quite useful tool when the user wants to encipher very small strings of data.
The authors emphasize that unbalanced Feistel networks, the common scheme used to solve the FPE problem, carry a considerable cost because of the multiple calls to the underlying cipher. The BPS algorithm arises as a proposal for a better approach. As stated in [6]: "However, we are looking here for elegant constructions that are not based on any engineering trick and that produce ciphertexts with strictly no expansion." Here "strictly no expansion" means that the format restriction is not broken.
As mentioned before, it is of interest to build a block cipher capable of solving the FPE problem from already standardized primitives such as AES or SHA-2, and accordingly BPS can use any standard primitive. The BPS cryptosystem is composed of an internal block cipher (BC) and a mode of operation. Encryption with the internal block cipher consists of w straightforward Feistel network rounds, and decryption is similar. The mode of operation resembles cipher-block chaining (CBC), but where CBC applies the XOR function, BPS uses a modular sum. Finally, referring to the Feistel networks used by Black and Rogaway, the authors state in [6]: "We believe that their original proposal is very elegant and is a major step for format-preserving constructions, but we made some adaption in order to smoothly support any string length and add tweak capability."
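The chaining idea, CBC with the XOR replaced by a modular sum, can be sketched in Python as follows. This is not the BPS mode itself: the internal block cipher is replaced by a toy lookup table produced with a seeded shuffle (which is not cryptographically secure), blocks are treated as integers, only full blocks are handled and the tweak is omitted. All names and parameters are assumptions made for the illustration.

```python
import random


def make_toy_block_cipher(seed: int, radix: int = 10, block_len: int = 2):
    """Toy stand-in for the internal block cipher: a fixed permutation table."""
    size = radix ** block_len
    forward = list(range(size))
    random.Random(seed).shuffle(forward)  # NOT secure; for illustration only
    backward = [0] * size
    for plain, cipher in enumerate(forward):
        backward[cipher] = plain
    return forward, backward, size


def chain_encrypt(blocks: list[int], forward: list[int], size: int) -> list[int]:
    out, previous = [], 0
    for block in blocks:
        # CBC-like chaining, with modular addition in place of XOR.
        previous = forward[(block + previous) % size]
        out.append(previous)
    return out


def chain_decrypt(blocks: list[int], backward: list[int], size: int) -> list[int]:
    out, previous = [], 0
    for block in blocks:
        out.append((backward[block] - previous) % size)
        previous = block
    return out


forward, backward, size = make_toy_block_cipher(seed=2016)
plaintext_blocks = [12, 34, 56, 78]  # the digit string 12345678 in two-digit blocks
ciphertext_blocks = chain_encrypt(plaintext_blocks, forward, size)
recovered_blocks = chain_decrypt(ciphertext_blocks, backward, size)
```

Because the sum is taken modulo the block space size, every chained value is again a valid two-digit block, so the ciphertext keeps the decimal format with no expansion.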

Objectives
- Contribute to format-preserving encryption for the Brazilian case, the CPF (Cadastro de Pessoas Físicas);
- Produce software capable of encrypting and decrypting a CPF using different algorithms, and report their efficiency;
- Develop and improve an efficient implementation of the studied algorithms for solving the FPE problem.

Methodology

First, the most common FPE algorithms described above will be studied in depth. The NIST standard [5] will receive special attention because of its importance; the algorithms described there are the FFX schemes discussed above, and the standard specifies two particular instances of them. Subsequently, these algorithms will be implemented in C in order to gain experience.
Next, the algorithms will be implemented in Cryptol, aiming for an implementation that keeps a clear and concise model of each algorithm, in line with the Cryptol philosophy. In addition, a module for verifying the correctness of the algorithms will be sought and tested. Finally, the two implementations will be compared in terms of lines of code (LOC) and efficiency, in order to measure the advantages of each approach.
Afterwards, the Intel and ARM platforms will be studied in order to create implementations for these architectures. In particular, the ARMv8 architecture will be explored; it adds 64-bit support with a focus on power efficiency while maintaining compatibility with existing 32-bit software. Versions for these architectures will then be implemented and compared with the earlier C implementation, and during this process possible improvements to the implementation will be researched.
Finally, once the implementations and improvements are complete, the Brazilian analogue of the US Social Security number (SSN), the Cadastro de Pessoas Físicas (CPF), will be studied. Software capable of encrypting CPFs with the FPE approach will then be produced, and it will be discussed which implementation is more suitable and why.

Schedule

Before entering the master's degree program, the student took the following courses as a special student:
- MO401 - Arquitetura de Computadores I;
- MO417 - Complexidade de Algoritmos I.
During the first semester and the winter break, the following courses were taken:
- MO421 - Introdução à Criptografia
- MO644 - Programação Paralela
- MO850 - Tópicos Avançados em Ciência da Computação I


Finally, the activities planned for the development of the project are listed below:

Activity A Course MO422 - Algoritmos Criptográficos.
Activity B Study of the FFX and BPS algorithms.
Activity C Basic implementation of the FFX and BPS algorithms in C.
Activity D Implementation of the algorithms in Cryptol.
Activity E Implementation of the algorithms on ARM and Intel architectures.
Activity F Performance report.
Activity G Research on possible improvements to the implementation.
Activity H Production of software to encrypt CPFs using the implemented algorithms.
Activity I Writing of the master's dissertation.
Activity J Thesis defense.

Schedule chart (activities by month). 2016: activities A, B, C and D. 2017: activities D, E, F, G, H, I and J.

References
[1] Bellare, M., Ristenpart, T., Rogaway, P., and Stegers, T. Format-preserving encryption. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009, pp. 295-312.
[2] Bellare, M., Rogaway, P., and Spies, T. The FFX mode of operation for format-preserving encryption. Manuscript (standards proposal) submitted to NIST (2010).
[3] Black, J., and Rogaway, P. Ciphers with arbitrary finite domains. Springer Berlin Heidelberg, Berlin, Heidelberg, 2002, pp. 114-130.
[4] Brightwell, M., and Smith, H. Using datatype-preserving encryption to enhance data warehouse security. In 20th NISSC Proceedings (1997).
[5] Dworkin, M. Recommendation for block cipher modes of operation: Methods for format-preserving encryption. NIST Special Publication 800-38G (2016).
[6] Brier, E., Peyrin, T., and Stern, J. BPS: a format-preserving encryption proposal. Submitted to NIST (2010).
[7] Hoang, V. T., and Rogaway, P. On generalized Feistel networks. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 613-630.
[8] Liskov, M., Rivest, R. L., and Wagner, D. Tweakable block ciphers. Journal of Cryptology 24, 3 (2011), 588-613.
[9] National Bureau of Standards (USA). Guidelines for implementing and using the NBS data encryption standard. FIPS Pub 74 (Apr. 1981).
[10] Durstenfeld, R. Algorithm 235: Random permutation. CACM 7, 7 (Jul. 1964).
[11] Rogaway, P. A synopsis of format-preserving encryption, 2010.
[12] Spies, T. Feistel finite set encryption. NIST submission, Feb. 2008.
