
Ministry of Higher Education and Scientific Research

University of Carthage
*-*-*-*-*

Engineering School of Statistics and Data Analysis

End of Studies Project


For obtaining the

National Diploma of Engineering in Statistics and Data Analysis

Temporal Data Analysis and Machine Learning
for Decision Support Applications

Performed by
Slimaine Ben Attia
Hosting Company: Integration Objects

ESSAI tutor: Mrs. Fatma CHAKER-KHARRAT
Company supervisor: Mrs. Imen Majed
Supported by: Mr. Mohamed Mehdi Sidommou

Academic Year: 2013/2014

To my beloved mother, for her prayers, who emphasized the
importance of education and helped me with my lessons
throughout her life.
To my father, the first to teach me and who makes me want to
be a better person.
To my brother and my sister who have been my emotional
anchors through not only the vagaries of graduate school, but
my entire life.
To Zeineb, for her support throughout these five years and each
step of my way.
To my friends, for their presence and encouragement.

To all of you, I dedicate this work


Slimaine

Acknowledgments

I would like to take this opportunity to express my gratitude to everyone who contributed to the realization of this project.

First, I owe thanks to Mr. Samy Achour, CEO of Integration Objects; it has been a great privilege to carry out this internship at his company.

I would like to thank Mrs. Imen Majed and Mr. Mehdi Sidommou, my supervisors
from Integration Objects, for providing all facilities and support to meet our project
requirements.

I would like to express my deepest gratitude to my tutor from ESSAI, Mrs. Fatma
Chaker, for her help, guidance and willingness to share her vast knowledge.

My thanks go also to the highly esteemed members of the jury, Mrs. Hella Ouaili Mallek and Mrs. Ben Slama Nawel, for accepting to evaluate this project.

Furthermore, I would like to thank my colleagues at Integration Objects for providing a friendly environment, which helped me in the achievement of this work.

Table of Contents

Acknowledgments
Table of Contents
List of Tables
List of Figures
General Introduction
Chapter I GENERAL PRESENTATION
  1 Hosting Company: Integration Objects
    1.1 Overview
    1.2 Expertise
    1.3 Industry Participation and Certification
    1.4 Technical Department
  2 Project Overview
    2.1 Functional Scope
    2.2 Project Challenges
    2.3 Project Goals
    2.4 Project Planning
Chapter II Preliminary Study
  1 State of the Art
    1.1 EVIEWS
    1.2 IBM SPSS
    1.3 GRETL
    1.4 R
    1.5 SAS
  2 Comparative Table
  3 Statistical Frameworks
    3.1 R.NET
    3.2 SAS Integration
    3.3 Integrate Accord.NET framework
Chapter III Requirements Analysis and Specification
  1 General Specifications
    1.1 User characteristics
    1.2 Design and implementation constraints
  2 System features
    2.1 Transformation
      2.1.1 LAG Transformation
      2.1.2 LEAD Transformation
      2.1.3 Power Transformation
      2.1.4 Integrate Transformation
      2.1.5 Seasonal Differencing
      2.1.6 Box-Cox Transformation
      2.1.7 Exponential Smoothing
    2.2 Statistical Test
      2.2.1 Dickey-Fuller Test
      2.2.2 Jarque-Bera Test
      2.2.3 Shapiro-Wilk Test
    2.3 Models and Prediction
      2.3.1 Temporal PLS
      2.3.2 ARMAX model
      2.3.3 ARIMA model
      2.3.4 Linear Prediction
    2.4 Graph
      2.4.1 Box-Plot
      2.4.2 ACF Graph
      2.4.3 PACF Graph
  3 Use Case Model
    3.1 Global Use Case
    3.2 Manage Project
    3.3 Missing Values Use Case
Chapter IV Design
  1 Global Architecture of the System
  2 System Diagrams
    2.1 Package diagram
    2.2 Class diagram
    2.3 Sequence diagram
      2.3.1 Load Data
      2.3.2 Apply algorithm
Chapter V Implementation and Test
  1 Development environment
    1.1 Software Environment
    1.2 Hardware environment
  2 Achieved Work
    2.1 Management of missing values
    2.2 Data description
    2.3 Transformation
    2.4 Modeling
  3 Performance Tests
Conclusion and Perspectives
Bibliography
Netography

List of Tables

Table 1. Comparative table
Table 2. Inputs/Outputs LAG
Table 3. Inputs/Outputs LEAD
Table 4. Inputs/Outputs Power
Table 5. Inputs/Outputs Integrate
Table 6. Inputs/Outputs Seasonal Differencing
Table 7. Inputs/Outputs Box-Cox
Table 8. Inputs/Outputs SES
Table 9. Inputs/Outputs HS
Table 10. Parameters of Winters Smoothing
Table 11. Inputs/Outputs WS
Table 12. Inputs/Outputs ADF
Table 13. Inputs/Outputs Jarque-Bera
Table 14. Inputs/Outputs Shapiro-Wilk
Table 15. Inputs/Outputs PLS
Table 16. Inputs/Outputs ARMAX
Table 17. Inputs/Outputs ARIMA
Table 18. Inputs/Outputs Linear Prediction
Table 19. Inputs/Outputs Box Plot
Table 20. Inputs/Outputs ACF
Table 21. Inputs/Outputs PACF
Table 22. Hardware Environment
Table 23. Performance Tests

List of Figures

Figure 1. IO Services Manufacturing Operation Management [N1]
Figure 2. KnowledgeNet Architecture [N2]
Figure 3. Modeling cycle
Figure 4. Project Planning
Figure 5. EVIEWS interface [N3]
Figure 6. SPSS interface [N4]
Figure 7. GRETL interface [N5]
Figure 8. R interface [N6]
Figure 9. SAS interface [N7]
Figure 10. Time Series
Figure 11. Integrated Time Series
Figure 12. Global Use Case
Figure 13. Manage Project Use Case
Figure 14. Missing Values Use Case
Figure 15. Global architecture of the system
Figure 16. Package diagram
Figure 17. Class diagram
Figure 18. Load Data
Figure 19. Select method
Figure 20. Microsoft Office Project Logo
Figure 21. Accord.Net Logo [N11]
Figure 22. MVS Logo [N12]
Figure 23. Enterprise Architect Logo [N13]
Figure 24. DevExpress Logo [N14]
Figure 25. Main Interface
Figure 26. File bar
Figure 27. Home Interface
Figure 28. Data bar
Figure 29. Summary Interface
Figure 30. Impute Interface
Figure 31. Methods of Impute
Figure 32. Descriptive Statistics Impute Interface
Figure 33. Data Description menu
Figure 34. Line and Bar Chart
Figure 35. Correlogram chart
Figure 36. Box Plot chart
Figure 37. Descriptive Statistics Interface
Figure 38. Shapiro-Wilk Test Interface
Figure 39. ADF Test Interface
Figure 40. Transformation menu
Figure 41. Integrate Interface
Figure 42. Smoothing Interface
Figure 43. Models menu
Figure 44. PLS main interface
Figure 45. Factors Interface
Figure 46. Projection Interface
Figure 47. Regression Interface
Figure 48. Forecast menu
Figure 49. Linear Regression Interface
Figure 50. Holt's Smoothing Interface

General Introduction

Knowledge discovery is one of the most recent and fastest growing fields of research in computer science. It combines techniques from machine learning and database technology to uncover meaningful knowledge from large, real-world databases. However, most real-world data are time based: for example, stock prices, dairy cow milk production figures or meteorological data, and especially data from the process industry. Most current knowledge discovery systems use similarity-based machine learning methods (learning from examples) which do not generally suit this type of data. Time-series analysis techniques are used extensively in signal processing and sequence identification applications such as speech recognition, but have not often been considered for knowledge discovery tasks.
The popularity of time-series databases in many applications has created an increasing demand for performing data-mining tasks (description, transformation, modeling, etc.) on time-series data. Currently, however, almost no single system or library exists that specializes in providing efficient implementations of data-mining techniques for time-series data, supports the necessary concepts of representations, statistical tests and forecasting, and can be used by both experts and non-experts in statistics.
Integration Objects deals with heterogeneous types of temporal data coming from different equipment such as sensors, data feeds, etc. This large amount of time series data challenges the way it is analyzed, interpreted, modeled and predicted, and calls for models that are both accurate and user-friendly.
For these reasons, our project, developed within the Integration Objects company, is a solution that can perform analysis of temporal data. It aims to offer a rich environment that meets the standards and the expectations of the company's customers, which was the scope of our end of studies project.
The following report details the different steps we went through in our project. It comprises five main chapters.

In the first chapter, we introduce the project environment by presenting the hosting
company, the project challenges and goals as well as the project management methodology
applied during the project lifecycle.
In the second chapter, we present the state of the art based on the concepts of time
series analysis and a description of the competitors.
The specification and analysis of the requirements are presented in the third chapter, in which the functional and non-functional requirements, as well as the design of these needs, are described in detail.
The fourth chapter covers the architecture and design phase of the solution. The fifth chapter details the aspects of the implementation, illustrated by a real case example. Finally, we complete this report with a conclusion and present the project perspectives.

Chapter I
GENERAL PRESENTATION


Introduction
In this chapter, we start by covering the internship environment and by presenting the hosting
company. Then, we focus on the project, by detailing its environment, goals and challenges.

1 Hosting Company: Integration Objects


This section presents Integration Objects, describing its profile, expertise and activities.

1.1 Overview
Integration Objects is a software development firm created in 2002, based in Tunisia with
sales representatives in Houston, Texas and Genoa. It is a world leading systems integrator
and solutions provider for knowledge management, advanced analytics, automation, plant
information management, root cause analysis, performance management and decision support
applications for the process industry.

1.2 Expertise
Integration Objects specializes in the development of software solutions for the industry and energy sectors, including oil and chemicals. The software developed by Integration Objects focuses on Manufacturing Operation Management, whose objective is the management and optimization of production under operational constraints: the safety of staff and assets, production goals, and costs.

Figure 1. IO Services Manufacturing Operation Management [N1]

Integration Objects offers highly scalable and reliable solutions that allow real-time data
collection from multiple plant systems and various enterprise networks.

This enables companies to turn data, information, and knowledge into operational intelligence, thereby
optimizing their business and manufacturing processes.
One of these solutions is KnowledgeNet™ (KNet). It is an innovative intelligent framework
application specialized in collecting real-time data, detecting abnormal conditions, automating
root cause analysis, and applying best practices through the workflow engine.

Figure 2: KnowledgeNet Architecture [N2]


KNet is primarily used to empower operations in the chemical, oil and gas, power, and utilities industries in making timely business decisions to increase production uptime and safety. Users may include operators, shift supervisors, process engineers, and plant managers.

1.3 Industry Participation and Certification


As an active member of the OPC Foundation, MIMOSA, and ISA, Integration Objects is
dedicated to providing products and services that incorporate industry standards and enable
interoperability between different applications, systems, and vendors. Its quality and
management standards are reflected in its status as an ISO 9001:2008 certified company.
Its customers are located on five continents and include the largest industrial companies in the world, such as ExxonMobil, Chevron, Saudi Aramco and Solvay.

1.4 Technical Department


To ensure the best performance and results, the Integration Objects technical department is divided into three main teams:
- The development team: This team is responsible for the design, development and maintenance of the software solutions provided by Integration Objects for the process industry, including plug-and-play connectivity products and knowledge management products.
- The automation team: This team is responsible for all automation, installation and deployment activities at customer sites. Automation engineers act as end users for the products delivered by the development team and are thus responsible for the testing and validation of Integration Objects software products.
- The process team: This team deals with more advanced applications used in the process industry, such as data validation and reconciliation applications, oil movement applications, expert systems, diagnosis applications, etc.

2 Project Overview
2.1 Functional Scope
Time series analysis comprises methods for analyzing time series data in order to extract
meaningful statistics and other characteristics of data. Time series forecasting is the use of
a model to predict future values based on previously observed values, while regression
analysis is often employed in such a way as to test theories that the current values of one or more
independent time series affect the current value of another time series.


Time series data have a natural temporal ordering. This makes time series analysis distinct
from other common data analysis problems, in which there is no natural ordering of the
observations (explaining people's wages by reference to their respective education levels,
where the individuals' data could be entered in any order).
Time series analysis is also distinct from spatial data analysis where the observations typically
relate to geographical locations (accounting for house prices by the location as well as the
intrinsic characteristics of the houses). A stochastic model for a time series will generally
reflect the fact that observations close together in time will be more closely related than
observations further apart.
In addition, time series models will often make use of the natural one-way ordering of time so
that values for a given period will be expressed as deriving in some way from past values,
rather than from future values.

2.2 Project Challenges


Our project tries to find an efficient way to build an application for decision support systems. By providing friendly interfaces and several algorithms, our solution offers its users functions to find out the degree of dependence between the values of a time series, to discover trends (seasonal or not), to apply specific pre-treatments, and finally to build predictive models such as the autoregressive moving average (ARMA) variants.


Figure 3. Modeling cycle


Our solution also allows users to take explanatory variables into account through a linear model using Partial Least Squares (PLS), a statistical method that tries to find the multidimensional direction in the X space that explains the maximum multidimensional variance direction in the Y space.

2.3 Project Goals


There are two main goals of our application:
- Identifying the nature of the phenomenon represented by the sequence of observations.
- Forecasting (predicting future values of the time series variable).
Both of these goals require the time series pattern to be identified and formally described.
Our project consists in designing and implementing an analytics module allowing non-specialist users to apply several analysis algorithms in order to better treat their time series according to their needs.


2.4 Project Planning


The figure below presents our project planning.

Figure 4. Project Planning


This schedule was updated gradually during the project period. This approach helped us to estimate each part of the project and to optimize the working time in order to deliver the project milestones on time.
Conclusion
In this chapter we have presented the host company as well as the general context of the
project. In the next chapter, we are going to present the preliminary study that will allow a
better understanding of our goal.


Chapter II
Preliminary study


Introduction
In this chapter, we start by defining the time series analysis concept. We continue by presenting the principal market players and our proposed solution. Finally, we present the statistical frameworks.

1 State of the Art


In order to develop a time series analysis application, we need to review the best-known solutions on the market. The solutions we present in the next sections are: EVIEWS, GRETL, IBM SPSS, R, and SAS.

1.1 EVIEWS
EVIEWS (Econometric Views) is a statistical package for Windows, used mainly for time-series-oriented econometric analysis. It is developed by Quantitative Micro Software (QMS), now a part of IHS. Version 1.0 was released in March 1994 and replaced MicroTSP. The current version of EVIEWS is 8.0, released in March 2013.
EVIEWS can be used for general statistical analysis and econometric analyses, such as cross-section and panel data analysis and time series estimation and forecasting.

Figure 5. EVIEWS interface [N3]



1.2 IBM SPSS


SPSS Statistics (Statistical Package for the Social Sciences) is a software package used for statistical analysis. Long produced by SPSS Inc., it was acquired by IBM in 2009. The current versions (2014) are officially named IBM SPSS Statistics.
Companion products in the same family are used for survey authoring and deployment (IBM
SPSS Data Collection), data mining (IBM SPSS Modeler), text analytics, and collaboration
and deployment (batch and automated scoring services).
SPSS is a widely used program for statistical analysis in social science. It is also used by
market researchers, health researchers, survey companies, government, education researchers,
marketing organizations, data miners, and others.
Statistics included in the base software:
- Descriptive statistics: cross tabulation, frequencies, descriptives, explore, descriptive ratio statistics
- Bivariate statistics: means, t-test, ANOVA, correlation (bivariate, partial, distances), nonparametric tests
- Prediction for numerical outcomes: linear regression
- Prediction for identifying groups: factor analysis, cluster analysis (two-step, K-means, hierarchical), discriminant analysis

Figure 6. SPSS interface [N4]



1.3 GRETL
Gretl is an open-source statistical package, mainly for econometrics. The name is an acronym for GNU Regression, Econometrics and Time-series Library. It has a graphical user interface and can be used together with X-12-ARIMA, TRAMO/SEATS, R, Octave, and Ox. It is written in C, uses GTK as the widget toolkit for its GUI, and uses gnuplot for generating graphs. As a complement to the GUI, it also has a command-line interface.

Figure 7. GRETL interface [N5]

1.4 R
R is a free software programming language and environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls and surveys of data miners show that R's popularity has increased substantially in recent years.
R provides a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others.


Figure 8. R interface [N6]

1.5 SAS
SAS (Statistical Analysis System) is a software suite developed by SAS Institute for advanced
analytics, business intelligence, data management, and predictive analytics. It is the largest
market-share holder for advanced analytics.
SAS is a software suite that can mine, alter, manage and retrieve data from a variety of
sources and perform statistical analysis on it. SAS provides a graphical point-and-click user
interface for non-technical users and more advanced options through the SAS programming
language. SAS programs have a DATA step, which retrieves and manipulates data, usually
creating a SAS data set, and a PROC step, which analyzes the data.

Figure 9. SAS interface [N7]



2 Comparative Table
A brief study of these five solutions led us to prepare the comparative table below, which shows clearly the features provided by each package.

                               EVIEWS  GRETL  SAS  SPSS  R

Transformation
  LAG/LEAD                     Yes     Yes    Yes  Yes   Yes
  Box-Cox                      No      No     Yes  Yes   Yes
  Smoothing                    Yes     No     Yes  Yes   Yes
  Holt's Smoothing             No      No     No   Yes   Yes
  Seasonal Differencing        Yes     Yes    No   Yes   No
  Integrate                    No      No     No   No    No

Models
  ARMAX                        Yes     No     Yes  No    No
  Linear Regression            Yes     No     Yes  Yes   Yes
  Partial Least Squares (PLS)  No      No     Yes  Yes   No

Statistical Test
  Augmented Dickey-Fuller      Yes     Yes    No   No    No
  Shapiro-Wilk                 Yes     Yes    No   No    No
  Mean test                    Yes     No     No   Yes   No

Charts
  ACF (Correlogram)            Yes     Yes    Yes  Yes   Yes
  PACF (Correlogram)           Yes     Yes    Yes  Yes   Yes
  Box plot                     Yes     Yes    Yes  Yes   No
  Bar                          Yes     Yes    Yes  Yes   Yes
  Line                         Yes     Yes    Yes  Yes   Yes
  Points                       Yes     Yes    Yes  Yes   Yes

Missing Values
  Summary                      No      No     No   Yes   No
  Linear prediction            No      No     No   Yes   No
  K-Nearest Neighbors          No      No     No   No    No
  Descriptive analysis         No      No     No   Yes   Yes

Table 1. Comparative table

3 Statistical Frameworks
For a better design and development of our project, we went through a research phase to identify the best statistical framework. Our application will perform computations on large amounts of data, which is why we need a tool that provides a rich set of mathematical functions. This research phase showed that our solution could be built either by integrating statistical software such as R or SAS, or by integrating the Accord.NET framework.

3.1 R.NET
R.NET enables the .NET Framework to interoperate with the R statistical language in the
same process. [N8]

3.2 SAS Integration


SAS Integration Technologies, in combination with other SAS software and solutions,
enables you to make information delivery and decision support a part of the information
technology architecture for your enterprise.
SAS Integration Technologies provides you with the enabling software to build a secure
client-server infrastructure on which to implement SAS distributed processing solutions. With
SAS Integration Technologies, you can integrate SAS with other applications in your
enterprise; provide proactive delivery of information from SAS throughout the enterprise;
extend the capabilities of SAS to meet your organization's specific needs; and develop your
own distributed applications that leverage the analytic and reporting powers of SAS. [N9]

3.3 Integrate Accord.NET framework


The Accord.NET Framework is a complete framework for building machine learning,
computer vision, computer audition, signal processing and statistical applications. Sample
applications provide a fast start to get up and running quickly, and an extensive
documentation helps fill in the details. [N10]
Conclusion
In this chapter, we have presented some basic concepts that are necessary for the
understanding of our project and its context. We have also presented some of the existing
solutions. The next chapter describes the specification phase we went through.


Chapter III
Requirements Analysis and Specification


Introduction
In this chapter, we describe the global characteristics of the solution. Then, we analyze the
functional and non-functional requirements of the solution, and identify the different use cases
of the application.

1 General Specifications
1.1 User characteristics
Our solution can be used by both experts and non-experts in statistics, such as chemists, industrial engineers and automation engineers.

1.2 Design and implementation constraints


All application software shall be modularized into classes using object-oriented design
principles.
The application has to provide users with an easy way to apply several analysis algorithms in
order to better treat their time series according to their needs.

2 System features
2.1 Transformation
2.1.1 LAG Transformation
Description
In time series analysis, the lag (or backshift) operator operates on an element of a time series to produce the previous element. For example, given some time series

X = {X_1, X_2, ...}

then

L X_t = X_{t-1}    (1)

or, equivalently,

X_t = L X_{t+1}    (2)

where L is the lag operator. Sometimes the symbol B (for backshift) is used instead. Note that the lag operator can be raised to arbitrary integer powers, so that

L^{-1} X_t = X_{t+1}

and

L^k X_t = X_{t-k}    (3)

Inputs:
- Initial series
- Order of lag

Outputs:
- Backward series

Table 2. Inputs/Outputs LAG

2.1.2 LEAD Transformation


It is an operator that forwards the series by a specified order: L^{-k} X_t = X_{t+k}.

Inputs:
- Initial series
- Order of lead

Outputs:
- Forwarded series

Table 3. Inputs/Outputs LEAD
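To make the two operators concrete, here is a minimal illustrative sketch in Python (the solution itself is implemented on the .NET platform; the function names lag and lead are ours):

```python
import numpy as np

def lag(series, order=1):
    """Backshift: position t of the result holds series[t - order];
    the first `order` positions have no predecessor and become NaN."""
    out = np.full(len(series), np.nan)
    out[order:] = series[:-order]
    return out

def lead(series, order=1):
    """Forward shift: position t of the result holds series[t + order]."""
    out = np.full(len(series), np.nan)
    out[:-order] = series[order:]
    return out

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(lag(x, 1))   # [nan  1.  2.  3.  4.]
print(lead(x, 1))  # [ 2.  3.  4.  5. nan]
```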

2.1.3 Power Transformation


In statistics, a power transform belongs to a family of functions applied to create a rank-preserving transformation of data using power functions. It is a useful data transformation technique used to stabilize variance and make the data more normal-distribution-like.

Inputs:
- Initial series
- Power degree

Outputs:
- New series (variance stabilized)

Table 4. Inputs/Outputs Power



2.1.4 Integrate Transformation


Most real time series are not stationary, and their average level varies over time. The first figure below shows a series with a clearly decreasing trend, which is therefore not stationary.

Figure 10. Time Series

The second figure shows the first difference of this series, that is, the series of variations from one week to the next. We see that the values of this new series oscillate around a constant mean and seem to correspond to a stationary series.

Figure 11. Integrated Time Series

We conclude that the original series seems to be an integrated series, which is transformed into a stationary one by means of differencing. We say then that it is integrated of order one, the number of differences needed to obtain a stationary process being the order of integration.

Inputs:
- Non-stationary series

Outputs:
- Stationary series
- Order of integration

Table 5. Inputs/Outputs Integrate

2.1.5 Seasonal Differencing

The seasonal difference of a time series is the series of changes from one season to the next. For monthly data, in which there are 12 periods in a season, the seasonal difference of Y at period t is

Y_t - Y_{t-12}

Inputs:
- Initial series
- Order of differencing
- Order of seasonality

Outputs:
- Series without seasonality

Table 6. Inputs/Outputs Seasonal Differencing
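As an illustration of both the Integrate transformation and seasonal differencing, the following Python sketch (ours, not the product code) removes a seasonal pattern of period 12 from a synthetic monthly series:

```python
import numpy as np

def difference(x, order=1):
    """Ordinary differencing applied `order` times (the Integrate transform)."""
    return np.diff(x, n=order)

def seasonal_difference(x, s=12):
    """Seasonal differencing: y_t - y_{t-s}."""
    return x[s:] - x[:-s]

# Linear trend plus a seasonal pattern repeating every 12 periods
t = np.arange(36, dtype=float)
y = t + 10 * np.sin(2 * np.pi * t / 12)
print(seasonal_difference(y, 12))  # constant 12.0: the seasonality is removed
```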

2.1.6 Box-Cox Transformation


The Box-Cox transformation maps non-normally distributed data to a set of data that is approximately normally distributed. It is a family of power transformations indexed by a parameter λ:

If λ ≠ 0, then y(λ) = (y^λ - 1) / λ    (4)
If λ = 0, then y(λ) = log(y)    (5)

The logarithm is the natural logarithm (log base e). The algorithm calls for finding the value of λ that maximizes the log-likelihood function (LLF).

Inputs:
- Initial series
- Lambda parameter

Outputs:
- New series (normal distribution)

Table 7. Inputs/Outputs Box-Cox
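As a hedged illustration of the transformation and of the λ search (using SciPy here, rather than the routines actually used in the product):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=0.8, size=200)  # positive, right-skewed data

# With no lambda supplied, boxcox searches for the value maximizing the LLF
y_bc, lam = stats.boxcox(y)
print(f"lambda maximizing the log-likelihood: {lam:.3f}")

# The same transform with a fixed, user-supplied lambda
y_fixed = stats.boxcox(y, lmbda=0.5)
```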

2.1.7 Exponential Smoothing


Smoothing is a technique that can be applied to time series data, either to produce smoothed
data for presentation, or to make forecasts. The time series data themselves are a sequence of
observations. The observed phenomenon may be an essentially random process, or it may be
an orderly, but noisy, process. Whereas in the simple moving average the past observations
are weighted equally, exponential smoothing assigns exponentially decreasing weights over
time.
2.1.7.1 Simple Exponential Smoothing
Exponential smoothing is commonly applied to financial market and economic data, but it can be used with any discrete set of repeated measurements. The raw data sequence is often represented by {x_t}, and the output of the exponential smoothing algorithm is commonly written as {s_t}, which may be regarded as a best estimate of what the next value of x will be. When the sequence of observations begins at time t = 0, the simplest form of exponential smoothing is given by the formulas:

s_0 = x_0
s_t = α x_t + (1 - α) s_{t-1},  t > 0    (6)

where α is the smoothing factor, and 0 < α < 1.

Inputs:
- Data
- Smoothing parameter α

Outputs:
- Smoothed series

Comments:
- No trend
- No seasonality

Table 8. Inputs/Outputs SES
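Formula (6) translates directly into a short recursive routine; the following Python sketch is illustrative only (the product implements this in .NET):

```python
import numpy as np

def simple_exponential_smoothing(x, alpha):
    """Apply formula (6): s_0 = x_0, s_t = alpha*x_t + (1-alpha)*s_{t-1}."""
    s = np.empty_like(x, dtype=float)
    s[0] = x[0]
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1.0 - alpha) * s[t - 1]
    return s

x = np.array([3.0, 10.0, 12.0, 13.0, 12.0, 10.0, 12.0])
print(simple_exponential_smoothing(x, alpha=0.5))
```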


2.1.7.2 Holt Smoothing
Holt (1957) extended simple exponential smoothing to allow forecasting of data with a trend. This method involves a forecast equation and two smoothing equations (one for the level and one for the trend):

Forecast equation:  F_{t+h} = L_t + h b_t    (7)
Level equation:     L_t = α y_t + (1 - α)(L_{t-1} + b_{t-1})    (8)
Trend equation:     b_t = β (L_t - L_{t-1}) + (1 - β) b_{t-1}    (9)

where L_t denotes an estimate of the level of the series at time t, b_t denotes an estimate of the trend (slope) of the series at time t, α is the smoothing parameter for the level, 0 ≤ α ≤ 1, and β is the smoothing parameter for the trend, 0 ≤ β ≤ 1.

Inputs:
- Data
- Level smoothing parameter α
- Trend smoothing parameter β

Outputs:
- Smoothed series

Comments:
- With trend
- No seasonality

Table 9. Inputs/Outputs HS


2.1.7.3 Winters Smoothing


The Winters exponential smoothing model is the second extension of the basic exponential smoothing model. It is used for data that exhibit both trend and seasonality. It is a three-parameter model that extends Holt's method: an additional equation adjusts the model for the seasonal component.
The four equations necessary for Winters' multiplicative method are:

The exponentially smoothed series (level):
L_t = α y_t / S_{t-s} + (1 - α)(L_{t-1} + b_{t-1})    (10)

The trend estimate:
b_t = β (L_t - L_{t-1}) + (1 - β) b_{t-1}    (11)

The seasonality estimate:
S_t = γ y_t / L_t + (1 - γ) S_{t-s}    (12)

Forecast m periods into the future:
F_{t+m} = (L_t + m b_t) S_{t-s+m}    (13)

- L_t = level of the series
- α = smoothing constant for the data
- y_t = new observation or actual value in period t
- β = smoothing constant for the trend estimate
- b_t = trend estimate
- γ = smoothing constant for the seasonality estimate
- S_t = seasonal component estimate
- m = number of periods in the forecast lead time
- s = length of seasonality (number of periods in the season)
Table 10. Parameters of Winters Smoothing

Inputs:
- Data
- Level smoothing parameter α
- Trend smoothing parameter β
- Seasonality smoothing parameter γ

Outputs:
- Smoothed series

Comments:
- No trend, with seasonality
- With trend, with seasonality

Table 11. Inputs/Outputs WS
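For reference, equations (7)-(13) correspond to the additive-trend, multiplicative-seasonality variant available in common libraries; a hedged illustration with statsmodels (not the product's own implementation):

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly series with an additive trend and multiplicative seasonality
rng = np.random.default_rng(1)
t = np.arange(120)
y = (50 + 0.5 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12)) + rng.normal(0, 1, 120)

model = ExponentialSmoothing(y, trend="add", seasonal="mul", seasonal_periods=12)
fit = model.fit()        # alpha, beta, gamma chosen by the optimizer
print(fit.params)        # includes the fitted smoothing constants
print(fit.forecast(12))  # m-step-ahead forecasts, as in formula (13)
```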

2.2 Statistical Test


2.2.1 Dickey-Fuller Test
Description
A simple AR(1) model is

y_t = ρ y_{t-1} + u_t    (14)

where y_t is the variable of interest, t is the time index, ρ is a coefficient, and u_t is the error term. A unit root is present if ρ = 1; the model would be non-stationary in this case.
The regression model can be written as

Δy_t = δ y_{t-1} + u_t    (15)

where Δ is the first difference operator. This model can be estimated, and testing for a unit root is equivalent to testing δ = 0 (where δ = ρ - 1). Since the test is done over the residual term rather than raw data, it is not possible to use the standard t-distribution to provide critical values. Therefore this statistic has a specific distribution, simply known as the Dickey-Fuller table.
There are three main versions of the test:

- Test for a unit root:
  Δy_t = δ y_{t-1} + u_t    (16)
- Test for a unit root with drift:
  Δy_t = a_0 + δ y_{t-1} + u_t    (17)
- Test for a unit root with drift and deterministic time trend:
  Δy_t = a_0 + a_1 t + δ y_{t-1} + u_t    (18)

Inputs:
- Initial series
- Order of lag

Outputs:
- F-statistic
- P-value

Table 12. Inputs/Outputs ADF
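An illustrative use of the test via statsmodels (the product exposes the same inputs and outputs through its own interface):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
random_walk = np.cumsum(rng.normal(size=500))  # unit-root (non-stationary) process

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(random_walk)
print(f"ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}")
# A large p-value means the unit-root null hypothesis cannot be rejected,
# so the series is treated as non-stationary.
```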

2.2.2 Jarque-Bera Test

In statistics, the Jarque-Bera test is a goodness-of-fit test of whether sample data have the skewness and kurtosis matching a normal distribution.
The test statistic JB is defined as

JB = (n / 6) (S^2 + (K - 3)^2 / 4)    (19)

where n is the number of observations (or degrees of freedom in general), S is the sample skewness, and K is the sample kurtosis.

Inputs:
- Initial series

Outputs:
- Jarque-Bera test statistic
- Kurtosis
- Mean
- Skewness
- Standard deviation
- Variance
- Variance MLE

Table 13. Inputs/Outputs Jarque-Bera
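Formula (19) can be checked directly against a library implementation; a small illustrative sketch in Python:

```python
import numpy as np
from scipy import stats

def jarque_bera_stat(x):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), following formula (19)."""
    n = len(x)
    s = stats.skew(x)
    k = stats.kurtosis(x, fisher=False)  # raw kurtosis: K is about 3 for normal data
    return n / 6.0 * (s**2 + (k - 3.0) ** 2 / 4.0)

x = np.random.default_rng(3).normal(size=1000)
print(jarque_bera_stat(x))      # close to 0 for normal data
print(stats.jarque_bera(x)[0])  # SciPy's value, for comparison
```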


2.2.3 Shapiro-Wilk Test


The Shapiro-Wilk test is a test of normality in frequentist statistics. It uses the null hypothesis principle to check whether a sample (x_1, ..., x_n) came from a normally distributed population. The test statistic is:

W = (Σ_{i=1}^{n} a_i x_{(i)})^2 / Σ_{i=1}^{n} (x_i - x̄)^2    (20)

where x_{(i)} is the i-th order statistic (the i-th smallest value in the sample) and x̄ is the sample mean. The constants a_i are given by

(a_1, ..., a_n) = m^T V^{-1} / (m^T V^{-1} V^{-1} m)^{1/2}

where m = (m_1, ..., m_n)^T are the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution, and V is the covariance matrix of those order statistics. The user may reject the null hypothesis if W is below a predetermined threshold.

Inputs:
- Initial series

Outputs:
- Shapiro-Wilk test statistic
- Kurtosis
- Mean
- Skewness
- Standard deviation
- Variance
- Variance MLE

Table 14. Inputs/Outputs Shapiro-Wilk
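In practice the constants a_i and the rejection threshold are handled by library routines; an illustrative call with SciPy:

```python
import numpy as np
from scipy import stats

x = np.random.default_rng(4).normal(loc=5.0, scale=2.0, size=200)
w, pvalue = stats.shapiro(x)
print(f"W = {w:.4f}, p-value = {pvalue:.4f}")
# A small p-value leads to rejecting the null hypothesis of normality.
```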

2.3 Models and Prediction


2.3.1 Temporal PLS
Description

Partial least squares regression (PLS regression) is a statistical method that bears some relation to principal components regression; instead of finding hyperplanes of minimum variance between the response and independent variables, it finds a linear regression model by projecting both the predicted variables and the observable variables to a new space.
As in multiple linear regression, the main purpose of partial least squares regression is to
build a linear model, Y=XB+E, where Y is an n cases by m variables response matrix, X is an
n cases by p variables predictor (design) matrix, B is a p by m regression coefficient matrix,
and E is a noise term for the model which has the same dimensions as Y.
For establishing the model, partial least squares regression produces a p by c weight matrix W
for X such that T=XW, i.e., the columns of W are weight vectors for the X columns
producing the corresponding n by c factor score matrix T. These weights are computed so that
each of them maximizes the covariance between responses and the corresponding factor
scores. Ordinary least squares procedures for the regression of Y on T are then performed to
produce Q, the loadings for Y (or weights for Y) such that Y=TQ+E. Once Q is computed, we
have Y=XB+E, where B=WQ, and the prediction model is complete.
One of the most important steps in the application of the PLS regression is the determination
of the correct number of dimensions to use in order to avoid over-fitting, and therefore to
obtain a robust predictive model.
Comparison between PCR and PLS
Principal components regression and partial least squares regression differ in the methods
used in extracting factor scores. In short, principal components regression produces the
weight matrix W reflecting the covariance structure between the predictor variables, while
partial least squares regression produces the weight matrix W reflecting the covariance
structure between the predictor and response variables.
Temporal approach
The aim of this work is to propose a new technique for the application of PLS regression to
time series. This technique is based on the Exponential smoothing of the loadings weights
vectors (w) obtained at each iteration step. This smoothing progressively displaces the random
or quasi-random variations from earlier (most important) to later (less important) PLS latent
variables.

39
`

Chapter III

Requirements Analysis and Specification

Inputs:
- Data of predictors
- Response variable

Outputs:
- Estimators

Table 15. Inputs/Outputs PLS
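A hedged sketch of the (non-temporal) PLS regression step using scikit-learn; the temporal variant described above additionally smooths the weight vectors w at each iteration:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 6))                                      # predictors (n x p)
Y = X @ rng.normal(size=(6, 2)) + 0.1 * rng.normal(size=(100, 2))  # responses (n x m)

# The number of latent components is the key choice to avoid over-fitting
pls = PLSRegression(n_components=3).fit(X, Y)
print(pls.coef_.shape)   # the regression coefficient matrix B in Y = XB + E
print(pls.predict(X[:5]))
```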

2.3.2 ARMAX model


ARMAX models are useful when you have dominating disturbances that enter early in the process, such as at the input. For example, a wind gust affecting an aircraft is a dominating disturbance early in the process.
ARMAX modeling treats the given signals x, y, z as an Auto-Regressive Moving-Average with eXtra/eXternal input (ARMAX) process, according to a model of the form

y_t = Σ_{i=1}^{P} a_i y_{t-i} + Σ_{j=0}^{Q} b_j x_{t-j} + Σ_{k=0}^{R} c_k z_{t-k} + r_t    (21)

where x is the input signal (usually a noise signal), y is the output signal and z is the external input signal. The model coefficients of the given orders are estimated and the residual r (the estimation error) is returned. Input parameters are the order P of the AR process, the order Q of the MA process (choose Q = 0 for an ARX model) and the order R of the eXternal process.

Inputs:
- Estimation data
- Order P
- Order Q
- Order R

Outputs:
- Identified ARMAX structure (polynomial model)

Table 16. Inputs/Outputs ARMAX

2.3.3 ARIMA model


ARIMA(p,d,q): ARIMA models are, in theory, the most general class of models for
forecasting a time series which can be stationarized by transformations such as differencing
and logging.


The acronym ARIMA stands for "Auto-Regressive Integrated Moving Average." Lags of the
differenced series appearing in the forecasting equation are called "auto-regressive" terms,
lags of the forecast errors are called "moving average" terms, and a time series which needs to
be differenced to be made stationary is said to be an "integrated" version of a stationary
series. Random-walk and random-trend models, autoregressive models, and exponential
smoothing models are all special cases of ARIMA models.
A non-seasonal ARIMA model is classified as an "ARIMA(p,d,q)" model, where:
- p is the number of autoregressive terms,
- d is the number of non-seasonal differences, and
- q is the number of lagged forecast errors in the prediction equation.

Inputs:
- Estimation data
- Order p
- Order d
- Order q

Outputs:
- Identified ARIMA structure (polynomial model)

Table 17. Inputs/Outputs ARIMA
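An illustrative fit with statsmodels on a simulated ARIMA(1,1,0) series (the product identifies the structure through its own interface):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(6)
# An AR(1) process integrated once gives an ARIMA(1,1,0) series
eps = rng.normal(size=300)
ar = np.zeros(300)
for t in range(1, 300):
    ar[t] = 0.6 * ar[t - 1] + eps[t]
y = np.cumsum(ar)

fit = ARIMA(y, order=(1, 1, 0)).fit()
print(fit.params)        # estimated AR coefficient and innovation variance
print(fit.forecast(10))  # 10-step-ahead forecast
```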

2.3.4 Linear Prediction


Linear prediction is a mathematical operation where future values of a discrete-time signal are estimated as a linear function of previous samples. The most common representation is

x̂(n) = Σ_{i=1}^{p} a_i x(n - i)

where x̂(n) is the predicted signal value, x(n - i) the previous observed values, and a_i the predictor coefficients. The error generated by this estimate is

e(n) = x(n) - x̂(n)

where x(n) is the true signal value.

Inputs:
- Initial series
- Horizon

Outputs:
- Predicted series

Table 18. Inputs/Outputs Linear Prediction
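The predictor coefficients a_i can be estimated by ordinary least squares over lagged samples; a minimal illustrative sketch:

```python
import numpy as np

def fit_predictor(x, p):
    """Estimate coefficients a_1..a_p by least squares on lagged samples."""
    # Each row holds [x(n-1), ..., x(n-p)]; the target is x(n)
    rows = [x[n - p:n][::-1] for n in range(p, len(x))]
    A, b = np.array(rows), x[p:]
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

def predict_next(x, coeffs):
    """One-step prediction: x_hat(n) = sum_i a_i * x(n - i)."""
    p = len(coeffs)
    return float(coeffs @ x[-1:-p - 1:-1])

x = np.sin(0.3 * np.arange(100))  # a predictable signal
a = fit_predictor(x, p=4)
print(predict_next(x, a), np.sin(0.3 * 100))  # prediction vs. true next value
```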

2.4 Graph
2.4.1 Box-Plot
A box plot is a convenient way of graphically depicting groups of numerical data through
their quartiles.

Inputs:
- Series

Outputs:
- Box plot graph

Table 19. Inputs/Outputs Box Plot

2.4.2 ACF Graph


Autocorrelation can detect regularities and repeated patterns in a signal, such as a periodic signal disturbed by a lot of noise, or the fundamental frequency of a signal that does not actually contain that fundamental but several of its harmonics. The autocorrelation at lag k is

r_k = Σ_{t=1}^{n-k} (x_t - x̄)(x_{t+k} - x̄) / Σ_{t=1}^{n} (x_t - x̄)^2    (22)

where x̄ is the average of the n observations.

Inputs:
- Series
- Number of lags

Outputs:
- Table of autocorrelations
- Correlogram

Table 20. Inputs/Outputs ACF
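Formula (22) in a few lines of illustrative Python:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelations r_1..r_nlags following formula (22)."""
    xc = x - x.mean()
    denom = np.sum(xc**2)
    return np.array([np.sum(xc[:-k] * xc[k:]) / denom
                     for k in range(1, nlags + 1)])

x = np.sin(0.5 * np.arange(200)) + np.random.default_rng(7).normal(0, 0.3, 200)
print(np.round(acf(x, 5), 3))
```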


2.4.3 PACF Graph


In time series analysis, the partial autocorrelation function (PACF) plays an important role in data analyses aimed at identifying the extent of the lag in autoregressive models. The partial autocorrelations φ_kk can be computed recursively from the autocorrelations r_k:

φ_11 = r_1
φ_kk = (r_k - Σ_{j=1}^{k-1} φ_{k-1,j} r_{k-j}) / (1 - Σ_{j=1}^{k-1} φ_{k-1,j} r_j),  k = 2, ..., n
φ_kj = φ_{k-1,j} - φ_kk φ_{k-1,k-j},  j = 1, 2, ..., k-1    (23)

Inputs:
- Series
- Number of lags

Outputs:
- Matrix of partial autocorrelations
- Correlogram

Table 21. Inputs/Outputs PACF

3 Use case Model


A use case diagram, at its simplest, is a representation of a user's interaction with the system, depicting the specifications of the use cases. A use case diagram can portray the different types of users of a system and the various ways in which they interact with the system.

3.1 Global Use Case


This use case presents the global interactions between the system and the actors.


Figure 12. Global Use Case


Description

UC: Interact with application


Scope: TSAnalytics
Actor: User
Pre-Condition: Application executed
Main Scenario
1. Create new project.
2. Import data.
3. Choose method.
4. Save/Exit without Save/Choose another method.

Alternative Scenario
1. Open project.
2. Choose method.
3. Save/Exit without Save/Choose another method.

3.2 Manage Project


The system allows users to manage workspace by creating new projects, to save and load
projects.

Figure 13. Manage Project Use Case


Description
UC: Manage project
Scope: TSAnalytics
Actor: User
Pre-Condition: Application executed

Main Scenario
1. The user requests to manage a project.
2. The user chooses to open an existing project or to create a new one.
Post-Condition
Existence of a project

3.3 Missing values Use Case


The system allows users to summarize the missing values and to complete the data with several statistical methods.

Figure 14. Missing Values Use Case


Description
UC: Impute missing values
Scope: TSAnalytics
Actor: User
Pre-Condition: Missing data

Main Scenario
1. Choose to describe the missing data.
2. Choose to impute the missing data.
3. Choose the imputation method.
4. Save the completed data.
Post-Condition:
Completed data
Conclusion
Throughout this chapter, we have detailed the functional and non-functional requirements of the solution as well as the use cases. In the next chapter, we begin the analysis and design of these specifications.


Chapter IV
Design

Introduction
Design is a creative process and a crucial phase of a development project. Supporting this phase with appropriate techniques and tools is important to produce a high-quality application. To present our design, we begin this section by giving a global view of our solution's architecture; after that, we detail our design choices through the package, class and sequence diagrams.

1 Global Architecture of the System


An application architecture describes the structure and behavior of applications used in a
business, focused on how they interact with each other and with users. It is focused on the
data consumed and produced by applications rather than their internal structure.
This involves defining the interaction between application packages, databases, and middleware systems in terms of functional coverage. This helps identify any integration problems or gaps in functional coverage. For our application, we opted for a three-layer architecture:

Figure 15.Global architecture of the system

These main layers are:

Human Machine Interaction: the HMI layer aims to improve the interactions between users and computers by making the application more usable and responsive to users' needs. It is divided here into two sub-layers: the graphical interface and the controls.
Algorithms: this layer is composed of the services requested by the user and contains all the functional requirements. It consists of four packages: Transformation, Models, Test and Graph.
Data Source: this layer allows our solution to communicate with other systems and applications. It supports two types of access: databases and files.
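As a purely illustrative C# sketch of this separation (the interface and class names below are our assumptions, not the solution's actual types), each layer can be expressed as a small contract consumed by the layer above:

```csharp
using System.Collections.Generic;

// Data Source layer: uniform access to databases and files.
public interface IDataSource
{
    // Loads a named series, e.g. from a CSV file or a database table.
    IReadOnlyList<double> LoadSeries(string name);
}

// Algorithms layer: one contract shared by the Transformation,
// Model, Test and Graph services.
public interface IAlgorithm
{
    string Name { get; }
    IReadOnlyList<double> Run(IReadOnlyList<double> input);
}

// HMI layer: a control asks the algorithm layer to run on data
// fetched from the data source layer, then displays the result.
public sealed class AlgorithmControl
{
    private readonly IDataSource _source;
    private readonly IAlgorithm _algorithm;

    public AlgorithmControl(IDataSource source, IAlgorithm algorithm)
    {
        _source = source;
        _algorithm = algorithm;
    }

    public IReadOnlyList<double> Execute(string seriesName)
        => _algorithm.Run(_source.LoadSeries(seriesName));
}
```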

2 System Diagrams
2.1 Package diagram
Package diagram is UML structure diagram which shows packages and dependencies between
the packages. Our application is composed by five packages:
The package is the one that interact with the different other packages in order to flow the
execution from data into visualization.

Figure 16.Package diagram

TSAnalytics
Contains the form that hosts the main window of the solution and graphical charts like the box plot, the correlogram, etc.
Transformation
Provides the transformation algorithms requested by users, like Lag, Integrate and Exponential Smoothing.
Test
Provides the test algorithms requested by users, like Dickey-Fuller and Shapiro-Wilk.
Model
Provides the modeling algorithms requested by users, like ARMAX, ARIMA and PLS.
Graph
Provides the graph algorithms requested by users, like Box Plot and Correlogram.

2.2 Class diagram


The class diagram is a type of static structure diagram that describes the structure of a system
by showing the classes of the system, their attributes, operations (or methods), and the
relationships among objects.
This section presents the class diagrams for the different modules of our solution.

Figure 17.Class diagram


2.3 Sequence diagram

A sequence diagram is an interaction diagram that shows the order in which objects interact with one another. It describes the objects and classes involved in the scenario and the sequence of messages exchanged between them to carry out the functionality of the scenario. Sequence diagrams are typically associated with use case realizations in the Logical View of the system under development.
In this part, we present some sequence diagrams to describe the interactions between the user and the application.

2.3.1 Load Data

Figure 18. Load Data


The user requests to load data, and the main interface opens the file dialog for the user to choose a file. Once the data is selected, the data source loads the selected file and then requests to display it.

2.3.2 Apply algorithm

Figure 19.Select method


The user selects a method; the main interface loads the corresponding user control, and the user configures the selected method. Once the settings are complete, the user control runs the algorithm and retrieves the result to display it.

Conclusion
Throughout this chapter, we have presented a conceptual view of our project and detailed the software architecture of the solution in the form of modules. In the fifth and final chapter, we describe the project implementation step.


Chapter V
Implementation and Test

Introduction
In this chapter, we devote the first part to the presentation of the development environment; we then focus on the presentation of the implemented solution and the performed tests.

1 Development environment
1.1 Software Environment

Microsoft Office Project Professional

Microsoft Office Project Professional 2007 is a mature software package that includes features for project management. It allows projects to be monitored by supporting tasks such as scheduling and job tracking.

Figure 20.Microsoft Office Project Logo

Accord.NET

We had to choose a third-party library for the analytics algorithms. This phase led us to the scientific computing framework Accord.NET. We chose this framework for its performance and for the possibility of configuring and adapting it to our needs during the implementation of the solution. Accord.NET is based on the mathematical framework AForge.NET. It is composed of a variety of libraries covering statistics, machine learning, pattern recognition, etc.

Figure 21.Accord.Net Logo [N11]


Microsoft Visual Studio

Visual Studio is an integrated development environment (IDE) providing a set of tools and services to develop desktop, web, or mobile applications. It supports several languages such as C#, C++, J# and F#. It is used to develop and test our solution.

Figure 22.MVS Logo [N12]

Enterprise Architect

"Enterprise Architect is a comprehensive UML analysis and design tool for UML, SysML,
BPMN and many other technologies. Covering software development from requirements
gathering through to the analysis stages, design models, testing and maintenance.

Figure 23.Entreprise Architect Logo [N13]

DevExpress

DevExpress is a software development toolset for .NET developers. It includes a complete range of controls and libraries for all major Microsoft platforms, including WinForms, ASP.NET, WPF, Silverlight, and Windows 8.

Figure 24.DevExpress Logo [N14]

1.2 Hardware environment


During the development of our application we have used the hardware environment described
in the table below.
CPU: Intel Core i5-4200U, 1.6 GHz
Memory: 6 GB
OS: Windows 7, 64-bit

Table 22. Hardware Environment

2 Achieved Work
In this section, we are going to present our solution.
Main Interface
When the end user launches the solution, he is led to the main screen presented in the figure below.

Figure 25.Main Interface

Load interface

Figure 26: File bar


When the user loads a dataset, it is displayed automatically.

Figure 27.Home Interface


Our solution is composed of four main categories that correspond to the basic and important phases of time series analysis: management of missing values, data description, transformation, and modeling/forecasting.

2.1 Management of missing values


If the data contains missing values or non-numeric values, our solution offers the possibility to produce a "Summary" of these values and to perform their "Imputation" by one of three methods (a sketch of the simplest method follows the list):
- Statistical description (min, max, mean, 1st quartile, median, 3rd quartile)
- Linear Prediction
- K-Nearest Neighbors
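The following C# sketch illustrates mean imputation, one plausible variant of the statistical-description method; the helper name and the choice of the mean are our assumptions:

```csharp
using System;
using System.Linq;

static class ImputationSketch
{
    // Replaces missing entries (encoded here as double.NaN) in one column
    // by the mean of the observed values.
    public static double[] ImputeWithMean(double[] column)
    {
        double[] observed = column.Where(v => !double.IsNaN(v)).ToArray();
        if (observed.Length == 0)
            throw new ArgumentException("Column contains no observed values.");

        double mean = observed.Average();
        return column.Select(v => double.IsNaN(v) ? mean : v).ToArray();
    }
}
```

The median or a quartile can be substituted for the mean in exactly the same way.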
The next figure presents the "Treatment of missing values" tools in the Data menu bar:

Figure 28.Data bar


Summary interface
The next figure presents the "Summary of missing values" interface, which describes the incomplete data through a summary table and a bar plot. It provides the user with the number and percentage of missing values for each column.

Figure 29.Summary Interface

Impute interface
The next figure shows the "Impute missing values" interface.

Figure 30.Impute Interface

The user can choose one of the following three methods for imputation:

Figure 31.Methods of Impute

The next figure shows the "Impute missing values" interface for the descriptive-statistics method. The user can choose the given method for each column or for all columns; clicking on the Impute button leads the user to a table that no longer contains missing values.

Figure 32.Descriptive Statistics Impute Interface

2.2 Data description


The figure below presents the description menu. We can describe the data by plotting it, computing descriptive statistics, or running statistical tests to identify the behavior and structure of the series.

Figure 33. Data Description menu


Chart
Line and bar interface

Figure 34. Line and bar Chart


Correlogram
The next figure presents the correlogram, which plots the autocorrelation function.

Figure 35.Correlogram chart

Box plot interface


Figure 36.Box Plot chart


Descriptive statistics interface
This figure presents the descriptive statistics for a chosen variable.

Figure 37.Descriptive Statistics Interface

Statistical Test interface


The next figure presents the result of the "Shapiro-Wilk" test for checking the normality of a series:
Figure 38.Shapiro Wilk Test Interface


The next figure presents the result of the "Augmented Dickey-Fuller" test for checking the stationarity of a series:

Figure 39.ADF Test Interface
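For reference, the Augmented Dickey-Fuller test fits a regression of the following standard form (the textbook formulation, not reproduced from the solution's code) and tests the null hypothesis \(\gamma = 0\), i.e. the presence of a unit root and hence non-stationarity:

\[
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t
\]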

2.3 Transformation
The next figure presents the Transformation menu, which contains several transformations that can be applied to a series.

Figure 40.Transformation menu


Integrate interface

65
`

Chapter V

Implementation and Test

To make a series stationary with a single transformation and find the necessary order of differencing, our solution offers the "Integrate" transformation:

Figure 41.Integrate Interface
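A minimal C# sketch of the underlying operation, first-order differencing applied d times (the names are illustrative, not the solution's code):

```csharp
static class DifferencingSketch
{
    // Applies first-order differencing d times: y_t = x_t - x_{t-1}.
    // Each pass shortens the series by one observation.
    public static double[] Difference(double[] series, int d)
    {
        double[] current = series;
        for (int pass = 0; pass < d; pass++)
        {
            var next = new double[current.Length - 1];
            for (int t = 1; t < current.Length; t++)
                next[t - 1] = current[t] - current[t - 1];
            current = next;
        }
        return current;
    }
}
```

In practice, the necessary order d is the smallest value for which the differenced series passes a stationarity check such as the ADF test shown above.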

Smoothing interface
For smoothing and forecasting we can use "Simple Exponential Smoothing":

Figure 42.Smoothing Interface
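Simple exponential smoothing follows the recursion \(s_t = \alpha x_t + (1 - \alpha) s_{t-1}\); a minimal C# sketch, where the initialization choice is our assumption:

```csharp
static class SmoothingSketch
{
    // Simple exponential smoothing with smoothing factor alpha in (0, 1].
    public static double[] Smooth(double[] series, double alpha)
    {
        var s = new double[series.Length];
        s[0] = series[0]; // initialize with the first observation
        for (int t = 1; t < series.Length; t++)
            s[t] = alpha * series[t] + (1 - alpha) * s[t - 1];
        return s;
    }
}
```

The one-step-ahead forecast is then simply the last smoothed value.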

2.4 Modeling
The next figure presents the models menu, which contains several modeling methods. These methods can be applied to univariate or multivariate series:

Figure 43.Models menu


Temporal Partial Least Squares

Main interface

When we first launch the PLS control, we are led to the home screen presented in the figure below:

Figure 44.PLS main interface


The PLS algorithm performed by our solution provides users with the following results (the underlying decomposition is sketched after the list):
- Factors
- Loadings matrix
- Weights matrix
- Model
- Projection
- Regression
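These outputs correspond to the standard PLS decomposition; the notation below is the usual textbook one, given for orientation rather than taken from the solution's code:

\[
X = T P^{\top} + E, \qquad Y = U Q^{\top} + F
\]

where T holds the factors (scores) extracted from the predictors X through the weights matrix W, P and Q are the loadings matrices, and E, F are the residuals; the regression step then predicts Y from the scores T.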

Factors interface

Figure 45.Factors Interface

Projection interface

Figure 46.Projection Interface


Regression interface

Figure 47.Regression Interface


2.5 Forecasting
The next figure presents the forecast menu, which contains two algorithms: the linear model and Holt's smoothing.

Figure 48.Forecast menu


For the two prediction methods, we provide a friendly interface that helps users easily change the inputs and outputs. The results are displayed both as charts and in a data table.

Linear regression interface

Figure 49: Linear Regression Interface
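The linear model fits a trend line to the series by ordinary least squares; the following is the standard formulation (our notation, not the solution's code):

\[
\hat{y}_t = a + b\,t, \qquad
b = \frac{\sum_{t=1}^{n} (t - \bar{t})(y_t - \bar{y})}{\sum_{t=1}^{n} (t - \bar{t})^2}, \qquad
a = \bar{y} - b\,\bar{t}
\]

Forecasts are obtained by evaluating the fitted line at the future time indices t = n+1, ..., n+h.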

Holt's Smoothing interface

Figure 50: Holt's Smoothing Interface
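Holt's smoothing extends simple exponential smoothing with a trend component; a minimal C# sketch under the usual two-parameter formulation (the initialization choices are our assumptions):

```csharp
static class HoltSketch
{
    // Holt's linear (double) exponential smoothing; requires series.Length >= 2.
    // alpha: level smoothing factor, beta: trend smoothing factor, both in (0, 1].
    public static double[] Forecast(double[] series, double alpha, double beta, int horizon)
    {
        double level = series[0];
        double trend = series[1] - series[0]; // simple trend initialization

        for (int t = 1; t < series.Length; t++)
        {
            double prevLevel = level;
            level = alpha * series[t] + (1 - alpha) * (prevLevel + trend);
            trend = beta * (level - prevLevel) + (1 - beta) * trend;
        }

        // h-step-ahead forecasts: last level plus h times the last trend.
        var forecasts = new double[horizon];
        for (int h = 1; h <= horizon; h++)
            forecasts[h - 1] = level + h * trend;
        return forecasts;
    }
}
```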


3 Performance Tests
After the implementation phase, we went through a testing phase of the application. The test phase is needed to detect anomalies and validate our application. It ensures that our solution reacts as intended and that the quality of the code is in line with expectations.
We performed some stress tests to check the performance and response time of our application. The next table presents some of the executed stress tests.
Test case | Inputs | Duration
Load data | Table with 70 columns and 13,500 rows | 2.5 seconds
Load data | Table with 40 columns and 5,300 rows | 1 second
Transformation | Table with 70 columns and 13,500 rows | 1 second
Description Analysis | Table with 1 column and 13,500 rows | 1 second
Partial Least Squares | Table with 40 columns and 5,300 rows | 9 seconds
Partial Least Squares | Table with 1 column and 13,500 rows | 5.5 seconds
Linear Prediction | Table with 70 columns and 13,500 rows | 10 seconds
Exponential Smoothing | Table with 70 columns and 13,500 rows | 6 seconds
ARMAX model | Table with 1 column and 13,500 rows | 7 seconds
Augmented Dickey-Fuller test | |

Table 23. Performance Tests

Conclusion
In this chapter, we have presented the implementation phase of the solution. We started by describing the different tools and libraries used throughout the project. Then, we presented the most important features offered by our application by showing its most important interfaces. Finally, we performed some tests to validate our application.


Conclusion and Perspectives

Traditionally, data mining and time series analysis have been seen as separate approaches to analyzing enterprise data. However, much of the data used by business processes is time-stamped. Time series analysis is a mixture of forecasting and traditional data mining techniques that uses time dimensions and predictive analytics to make better business decisions.


Our project, developed within the Integration Objects company, is a data mining
solution that can enhance the capabilities of the user in the area of time series analysis and
data preparation. Finding time series that exhibit similar statistical characteristics allows
analysts to easily identify customer or process behaviors of interest in large volumes of time
series data. With the wealth of enterprise data stored in time series, the power to integrate this
data into analysis workflows will help users to easily build valuable models.

In our project, we started by focusing on the understanding of the discipline by studying the
concept of time series analysis and reviewing the existing tools. The next step was to study
and analyze the features to design and implement in our solution and to bring out the functional and non-functional requirements of our project. We then proceeded with the design phase, detailing the architecture of our application as well as its static and dynamic design through the development of package and class diagrams.
Finally, we concluded the report by presenting the implementation and test phase of our project. That chapter describes the tools and frameworks used to achieve our solution and exposes the work done through screenshots covering the most important features of the solution.
Much of the data that are used in the operational side of a business have a built-in time
dimension. One of the challenges of developing this solution is the complexity of handling a
large number of time series.

In addition to the acquired technical knowledge, this internship has been an opportunity for me to adapt to and integrate into a professional environment, and to improve my communication and collaboration skills with the Integration Objects team.
To conclude, we have met the initial objectives, but the project remains open to
several enhancements. Firstly, our application can be easily extended with new modeling and
forecasting algorithms.
In addition, one of the enhancements that can be applied to our application is the optimization of the current algorithms, in order to improve the response time and to find better methods for handling and loading big data.



Netography
[N1] http://www.integrationobjects.com/services.php
[N2] http://www.integrationobjects.com/knowledgenet.php
[N3] http://www.eviews.com/home.html
[N4] http://www-01.ibm.com/software/analytics/spss/
[N5] http://gretl.sourceforge.net/
[N6] http://www.r-project.org/
[N7] http://www.sas.com/en_us/software/analytics.html
[N8] http://rdotnet.codeplex.com/
[N9] http://www.sas.com/en_us/software/integration-technologies.html
[N10] http://accord-framework.net/intro.html
[N11] https://code.google.com/p/accord/
[N12] http://www.microsoft.com/visualstudio/fra
[N13] http://www.sparxsystems.com.au/
[N14] http://www.devexpress.com/
