University of Carthage
*-*-*-*-*
Academic Year: 2013/2014
Acknowledgments
First, I owe thanks to Mr. Samy Achour, CEO of Integration Objects. It has been a
great privilege to complete this internship at this company.
I would like to thank Mrs. Imen Majed and Mr. Mehdi Sidommou, my supervisors
from Integration Objects, for providing all facilities and support to meet our project
requirements.
I would like to express my deepest gratitude to my tutor from ESSAI, Mrs. Fatma
Chaker, for her help, guidance and willingness to share her vast knowledge.
My thanks also go to the highly esteemed members of the jury, Mrs. Hlla Ouaili
Mallek and Mrs. Ben Slama Nawel, for agreeing to evaluate this project.
Table of Contents
Acknowledgments
Table of Contents
List of Tables
List of Figures
General Introduction
Chapter I General Presentation
    1.1 Overview
    1.2 Expertise
Chapter II Preliminary study
    1.1 EVIEWS
    1.3 GRETL
    1.4 R
    1.5 SAS
    3.1 R.NET
Chapter III Requirements Analysis and Specification
    1 General Specifications
    2.1 Transformation
    2.4 Graph
Chapter IV Design
Chapter V Implementation and Test
    1.1 Software Environment
    2.2 Data description
    2.3 Transformation
    2.4 Modeling
List of Tables
Table 1. Comparative table
Table 2. Inputs/Outputs LAG
Table 3. Inputs/Outputs LEAD
Table 4. Inputs/Outputs Power
Table 5. Inputs/Outputs Integrate
Table 6. Inputs/Outputs Seasonal Differencing
Table 7. Inputs/Outputs Box Cox
Table 8. Inputs/Outputs SES
Table 9. Inputs/Outputs HS
Table 10. Parameters of Winters Smoothing
Table 11. Inputs/Outputs WS
Table 12. Inputs/Outputs ADF
Table 13. Inputs/Outputs Jarque-Bera
Table 14. Inputs/Outputs Shapiro-Wilk
Table 15. Inputs/Outputs PLS
Table 16. Inputs/Outputs ARMAX
Table 17. Inputs/Outputs ARIMA
Table 18. Inputs/Outputs Linear Prediction
Table 19. Inputs/Outputs Box Plot
Table 20. Inputs/Outputs ACF
Table 21. Inputs/Outputs PACF
Table 22. Hardware Environment
Table 23. Performance Tests
List of Figures
Figure 12.Integrated
General Introduction
Knowledge discovery is one of the most recent and fastest-growing fields of
for example stock prices, dairy cow milk production figures or meteorological data and
especially in the process industry. Most current knowledge discovery systems use similarity-based machine learning methods (learning from examples), which are not generally suited to
this type of data. Time-series analysis techniques are used extensively in signal processing
and sequence identification applications such as speech recognition, but have not often been
considered for knowledge discovery tasks.
The popularity of time-series databases in many applications has created an increasing
demand for performing data-mining tasks (description, transformation, modeling, etc.) on
time-series data. Currently, however, almost no single system or library specializes
in providing efficient implementations of data-mining techniques for time-series data,
supports the necessary concepts of representation, statistical testing and forecasting, and
can be used by both experts and non-experts in statistics.
Integration Objects deals with heterogeneous types of temporal data coming from
different equipment such as sensors and data feeds. The sheer volume of this time-series
data makes it challenging to analyze, interpret, model and predict, and calls for
models that are both accurate and user-friendly.
For these reasons, our project, developed within Integration Objects, is a
solution for the analysis of temporal data. It aims to offer a rich environment that
meets the standards and expectations of the company's customers; this was the scope of
our end-of-studies project.
The following report details the different steps we went through in our project. It is
organized into five main chapters.
In the first chapter, we introduce the project environment by presenting the hosting
company, the project challenges and goals as well as the project management methodology
applied during the project lifecycle.
In the second chapter, we present the state of the art based on the concepts of time
series analysis and a description of the competitors.
The specification and analysis of every requirement is presented in the third chapter, in
which the functional and non-functional requirements, as well as the design of these needs, are
described in detail.
The fourth chapter covers the architecture and design phase of the solution. The fifth
chapter details the aspects of the implementation illustrated by the establishment of a real case
example. Finally, we complete this report with a conclusion and present the project
perspectives.
Chapter I
GENERAL PRESENTATION
Introduction
In this chapter, we start by covering the internship environment and by presenting the hosting
company. Then, we focus on the project, by detailing its environment, goals and challenges.
1.1 Overview
Integration Objects is a software development firm created in 2002, based in Tunisia with
sales representatives in Houston, Texas and Genoa. It is a world-leading systems integrator
and solutions provider for knowledge management, advanced analytics, automation, plant
information management, root cause analysis, performance management and decision support
applications for the process industry.
1.2 Expertise
Integration Objects specializes in the development of software solutions for the industry
and energy sectors, including oil and chemicals. Software developed by Integration Objects
focuses on Manufacturing Operation Management, whose objective is the management and
optimization of production under operational constraints: the safety of staff and assets,
production goals and costs.
Integration Objects offers highly scalable and reliable solutions that allow real-time data
collection from multiple plant systems and various enterprise networks.
This enables
companies to turn data, information, and knowledge into operational intelligence, thereby
optimizing their business and manufacturing processes.
One of these solutions is KnowledgeNet™ (KNet). It is an innovative intelligent framework
application specialized in collecting real-time data, detecting abnormal conditions, automating
root cause analysis, and applying best practices through the workflow engine.
interoperability between different applications, systems, and vendors. Its quality and
management standards are reflected in its status as an ISO 9001:2008 certified company.
Its customers are located on five continents and include some of the largest industrial
companies in the world, such as ExxonMobil, Chevron, Saudi Aramco and Solvay.
The development team: This team is responsible for design, development and
maintenance of software solutions provided by Integration Objects for the process
industry including plug and play connectivity products and knowledge management
products.
The automation team: This team is responsible for all automation, installation and
deployment activities at customer sites. Automation engineers act as end users of the
products delivered by the development team and are therefore responsible for testing and
validating Integration Objects software products.
The process team: The process team deals with more advanced applications used in
the process industry such as data validation and reconciliation applications, oil
movement applications, expert systems, diagnosis applications, etc.
2 Project Overview
2.1 Functional Scope
Time series analysis comprises methods for analyzing time series data in order to extract
meaningful statistics and other characteristics of data. Time series forecasting is the use of
a model to predict future values based on previously observed values, while regression
analysis is often employed in such a way as to test theories that the current values of one or more
independent time series affect the current value of another time series.
Time series data have a natural temporal ordering. This makes time series analysis distinct
from other common data analysis problems, in which there is no natural ordering of the
observations (e.g. explaining people's wages by reference to their respective education levels,
where the individuals' data could be entered in any order).
Time series analysis is also distinct from spatial data analysis where the observations typically
relate to geographical locations (e.g. accounting for house prices by the location as well as the
intrinsic characteristics of the houses). A stochastic model for a time series will generally
reflect the fact that observations close together in time will be more closely related than
observations further apart.
In addition, time series models will often make use of the natural one-way ordering of time so
that values for a given period will be expressed as deriving in some way from past values,
rather than from future values.
Both of these goals require the time series pattern to be identified and formally described.
Our project consists of designing and implementing an analytics module that allows non-expert
users to apply several analysis algorithms in order to process their time series according to
their needs.
Chapter II
Preliminary study
Introduction
In this chapter, we start by defining the concept of time series analysis. We continue by
presenting the principal market players and our proposed solution. Finally, we present the
statistical frameworks.
1.1 EVIEWS
EVIEWS (Econometric Views) is a statistical package for Windows, used mainly for time-series-oriented econometric analysis. It is developed by Quantitative Micro Software (QMS),
now a part of IHS. Version 1.0 was released in March 1994, and replaced MicroTSP. The
current version of EVIEWS is 8.0, released in March 2013.
EVIEWS can be used for general statistical analysis and econometric analyses, such as cross-section and panel data analysis and time series estimation and forecasting.
Prediction for identifying groups: factor analysis, cluster analysis (two-step, K-means, hierarchical), discriminant
1.3 GRETL
Gretl is an open-source statistical package, mainly for econometrics. The name is an acronym
for Gnu Regression, Econometrics and Time-series Library. It has a graphical user interface
and can be used together with X-12-ARIMA, TRAMO/SEATS, R, Octave, and Ox. It is
written in C, uses GTK as its widget toolkit for creating its GUI, and uses gnuplot for generating
graphs. As a complement to the GUI, it also has a command-line interface.
1.4 R
R is a free software programming language and software environment for statistical
computing and graphics. The R language is widely used among statisticians and data
miners for developing statistical software and data analysis. Polls and surveys of data
miners show that R's popularity has increased substantially in recent years.
R provides a wide variety of classical statistical analysis techniques.
1.5 SAS
SAS (Statistical Analysis System) is a software suite developed by SAS Institute for advanced
analytics, business intelligence, data management, and predictive analytics. It is the largest
market-share holder for advanced analytics.
SAS is a software suite that can mine, alter, manage and retrieve data from a variety of
sources and perform statistical analysis on it. SAS provides a graphical point-and-click user
interface for non-technical users and more advanced options through the SAS programming
language. SAS programs have a DATA step, which retrieves and manipulates data, usually
creating a SAS data set, and a PROC step, which analyzes the data.
2 Comparative Table
A brief study of these five tools led us to prepare the comparative table below, which shows
clearly the features provided by each package.
Feature                        EVIEWS   GRETL   R     SAS   SPSS
Transformation
  LAG/LEAD                     Yes      Yes     Yes   Yes   Yes
  Box Cox                      No       No      Yes   Yes   Yes
  Smoothing                    Yes      No      Yes   Yes   Yes
  Holt's Smoothing             No       No      No    Yes   Yes
  Seasonal Differencing        Yes      Yes     No    Yes   No
  Integrate                    No       No      No    No    No
Models
  ARMAX                        Yes      No      Yes   No    No
  Linear Regression            Yes      No      Yes   Yes   Yes
  Partial Least Squares (PLS)  No       No      Yes   Yes   No
Statistical Test
  Augmented Dickey-Fuller      Yes      Yes     No    No    No
  Shapiro-Wilk                 Yes      Yes     No    No    No
  Mean test                    Yes      No      No    Yes   No
Charts
  ACF (Correlogram)            Yes      Yes     Yes   Yes   Yes
  PACF (Correlogram)           Yes      Yes     Yes   Yes   Yes
  Box plot                     Yes      Yes     Yes   Yes   No
  Bar                          Yes      Yes     Yes   Yes   Yes
  Line                         Yes      Yes     Yes   Yes   Yes
  Points                       Yes      Yes     Yes   Yes   Yes
Missing Values
  Summary                      No       No      No    Yes   No
  Linear prediction            No       No      No    Yes   No
  K-Nearest Neighbors          No       No      No    No    No
  Descriptive analysis         No       No      No    Yes   Yes
Table 1. Comparative table
3 Statistical Frameworks
To design and develop our project well, we had to go through a research phase to find the
most suitable statistical framework. Our application performs a large amount of computation
on data, which is why we need a tool that provides a wide range of mathematical functions.
After this research phase, we found that our solution could be built either by integrating
statistical software such as R or SAS, or by integrating the Accord.NET framework.
3.1 R.NET
R.NET enables the .NET Framework to interoperate with the R statistical language in the
same process.[N8]
Chapter III
Requirements Analysis and Specification
Introduction
In this chapter, we describe the global characteristics of the solution. Then, we analyze the
functional and non-functional requirements of the solution, and identify the different use cases
of the application.
1 General Specifications
1.1 User characteristics
Our solution can be used by both experts and non-experts in statistics, such as chemists and
industrial and automation engineers.
2 System features
2.1 Transformation
2.1.1 LAG Transformation
Description
In time series analysis, the lag operator or backshift operator operates on an element of a time
series to produce the previous element.
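For example, given some time series
$X = \{X_1, X_2, \dots\}$
then
$L X_t = X_{t-1}$ for all $t > 1$ (1)
or equivalently
$X_t = L X_{t+1}$ for all $t \geq 1$ (2)
where L is the lag operator.
Sometimes the symbol B for backshift is used instead. Note that the lag operator can be
raised to arbitrary integer powers so that
$L^{-1} X_t = X_{t+1}$
and
$L^{k} X_t = X_{t-k}$ (3)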
Inputs
- Initial series
- Order of lag
Outputs
- Backward series
Table 2. Inputs/Outputs LAG
Inputs
- Initial series
- Order of lead
Outputs
- Forward series
Table 3. Inputs/Outputs LEAD
Inputs
- Initial series
- Power degree
Outputs
-
Table 4. Inputs/Outputs Power
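The three transformations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `lag`, `lead` and `power` are hypothetical, and padding the shifted positions with `None` is our choice, not something the report specifies.

```python
from typing import List, Optional

Series = List[Optional[float]]


def lag(series: Series, order: int = 1) -> Series:
    """Backward series: out[t] = series[t - order] (L^order applied to the series)."""
    if order < 0:
        raise ValueError("order must be non-negative")
    if order == 0:
        return list(series)
    # The first `order` positions have no predecessor, so they are padded.
    return [None] * min(order, len(series)) + list(series[:-order])


def lead(series: Series, order: int = 1) -> Series:
    """Forward series: out[t] = series[t + order] (L^-order applied to the series)."""
    if order < 0:
        raise ValueError("order must be non-negative")
    if order == 0:
        return list(series)
    # The last `order` positions have no successor, so they are padded.
    return list(series[order:]) + [None] * min(order, len(series))


def power(series: Series, degree: float) -> Series:
    """Raise every observed value to the given power degree."""
    return [None if x is None else x ** degree for x in series]
```

For instance, `lag([1, 2, 3, 4], 1)` shifts every value one step back in time, while `lead` does the opposite.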
Figure 10. Time Series
The figure shows the first difference of this series, that is, the series of variations in market
share from one week to the next. If we let
values oscillate around a constant mean and seem to correspond to a stationary series.
Figure 11. Integrated Time Series
stationary one by means of differencing. We then say that it is integrated of order one, the
number of differences needed to obtain a stationary process being the order of integration.
Inputs
- Non-stationary series
Outputs
- Stationary series
- Order of integration
Table 5. Inputs/Outputs Integrate
Inputs
- Initial series
- Order of differencing
- Order of seasonality
Outputs
-
Table 6. Inputs/Outputs Seasonal Differencing
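As an illustration of ordinary and seasonal differencing, the sketch below applies the operator (1 - L^s) a given number of times. The function name `difference` and its parameters are hypothetical; with `season=1` it performs the plain differencing used by the Integrate transformation, and with `season=s` it performs seasonal differencing.

```python
from typing import List


def difference(series: List[float], order: int = 1, season: int = 1) -> List[float]:
    """Apply (1 - L^season) to the series `order` times.

    Each pass shortens the series by `season` observations.
    """
    if order < 0 or season < 1:
        raise ValueError("order must be >= 0 and season must be >= 1")
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - season] for t in range(season, len(out))]
    return out


# A quadratic trend becomes constant after two ordinary differences,
# so such a series would be called integrated of order two.
quadratic = [t * t for t in range(6)]
twice_differenced = difference(quadratic, order=2)
```

A series with a repeating pattern of period 2, such as `[10, 20, 11, 21, 12, 22]`, loses that pattern after one seasonal difference with `season=2`.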
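$y^{(\lambda)} = \dfrac{y^{\lambda} - 1}{\lambda}$ if $\lambda \neq 0$ (3)
$y^{(\lambda)} = \log(y)$ if $\lambda = 0$ (4)
The logarithm is the natural logarithm (log base e). The algorithm calls for finding the value
of $\lambda$ that maximizes the Log-Likelihood Function (LLF).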
Inputs
- Initial series
Outputs
- Lambda parameter
Table 7. Inputs/Outputs Box Cox
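The search for the lambda that maximizes the LLF can be sketched as follows. This assumes the standard Box-Cox definition and the usual profile log-likelihood under a normality assumption; the grid search over [-2, 2] and the function names are illustrative choices, not the method used in the report.

```python
import math
from typing import List, Optional


def boxcox(values: List[float], lam: float) -> List[float]:
    """Standard Box-Cox transform; requires strictly positive data."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def boxcox_llf(values: List[float], lam: float) -> float:
    """Profile log-likelihood of lambda (up to an additive constant)."""
    n = len(values)
    z = boxcox(values, lam)
    mean = sum(z) / n
    variance_mle = sum((x - mean) ** 2 for x in z) / n
    return -0.5 * n * math.log(variance_mle) + (lam - 1.0) * sum(math.log(v) for v in values)


def best_lambda(values: List[float], grid: Optional[List[float]] = None) -> float:
    """Return the grid value of lambda with the highest log-likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # -2.0, -1.9, ..., 2.0
    return max(grid, key=lambda lam: boxcox_llf(values, lam))
```

A finer grid or a one-dimensional optimizer would give a more precise estimate; the coarse grid keeps the example short.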
(6)
Inputs
- Data
- Smoothing parameter
Outputs
- Smoothed series
Comments
- No trend
- No seasonality
Table 8. Inputs/Outputs SES
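Forecast equation
$F_{t+m} = L_t + m b_t$ (7)
Level equation
$L_t = \alpha y_t + (1 - \alpha)(L_{t-1} + b_{t-1})$ (8)
Trend equation
$b_t = \beta (L_t - L_{t-1}) + (1 - \beta) b_{t-1}$ (9)
where $L_t$ denotes an estimate of the level of the series at time t, $b_t$ denotes an estimate
of the trend of the series at time t, $\alpha$ is the smoothing parameter for the level,
$0 \leq \alpha \leq 1$, and $\beta$ is the smoothing parameter for the trend, $0 \leq \beta \leq 1$.
Inputs
- Data
- Two smoothing parameters: one for the level, another for the trend
Outputs
- Smoothed series
Comments
- With trend
- No seasonality
Table 9. Inputs/Outputs HS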
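A minimal sketch of simple exponential smoothing and Holt's smoothing follows. The initial values (first observation as the starting level, first difference as the starting trend) are a common convention that we assume here; the report does not state how the recursion is initialized, and the function names are hypothetical.

```python
from typing import List, Tuple


def simple_exponential_smoothing(data: List[float], alpha: float) -> List[float]:
    """Smoothed series for data with no trend and no seasonality."""
    smoothed = [data[0]]  # assumed initialization: first observation
    for y in data[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


def holt_smoothing(
    data: List[float], alpha: float, beta: float, horizon: int = 1
) -> Tuple[List[float], List[float]]:
    """Return (smoothed levels, forecasts for 1..horizon steps ahead)."""
    level = data[0]            # assumed initial level
    trend = data[1] - data[0]  # assumed initial trend
    levels = [level]
    for y in data[1:]:
        previous_level = level
        # Level equation: blend the new observation with the previous projection.
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        # Trend equation: blend the latest level change with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
        levels.append(level)
    # Forecast equation: extrapolate the last level along the last trend.
    forecasts = [level + m * trend for m in range(1, horizon + 1)]
    return levels, forecasts
```

On perfectly linear data such as `[2, 4, 6, 8]`, Holt's method reproduces the line and extrapolates it, whereas simple exponential smoothing would lag behind the trend.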
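$L_t = \alpha \dfrac{y_t}{S_{t-s}} + (1 - \alpha)(L_{t-1} + b_{t-1})$ (10)
$b_t = \beta (L_t - L_{t-1}) + (1 - \beta) b_{t-1}$ (11)
$S_t = \gamma \dfrac{y_t}{L_t} + (1 - \gamma) S_{t-s}$ (12)
Forecast m periods into the future:
$F_{t+m} = (L_t + m b_t) S_{t-s+m}$ (13)
- $L_t$ = level of series.
- $\alpha$ = smoothing constant for the data.
- $y_t$ = new observation or actual value in period t.
- $\beta$ = smoothing constant for trend estimate.
- $b_t$ = trend estimate.
- $\gamma$ = smoothing constant for seasonality estimate.
- $S_t$ = seasonal component estimate.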
Inputs
- Data
- Three smoothing parameters: one for the level, another for the trend, the last for the seasonality
Outputs
- Smoothed series
Comments
- No trend, with seasonality
- With trend, with seasonality
Table 11. Inputs/Outputs WS
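Winters smoothing with a multiplicative seasonal component can be sketched as below. The initialization (level from the first season, trend from the change between the first two season averages, seasonal indices from the first season) is an assumption of this example, as are the function and parameter names; at least two full seasons of data are required.

```python
from typing import List


def winters_forecast(
    data: List[float],
    season: int,
    alpha: float,
    beta: float,
    gamma: float,
    horizon: int,
) -> List[float]:
    """Multiplicative Winters smoothing; returns forecasts for 1..horizon steps ahead."""
    if len(data) < 2 * season:
        raise ValueError("need at least two full seasons of data")
    first, second = data[:season], data[season:2 * season]
    level = sum(first) / season                     # assumed initial level
    trend = (sum(second) / season - level) / season  # assumed initial trend
    seasonal = [y / level for y in first]            # assumed initial seasonal indices
    for t in range(season, len(data)):
        y = data[t]
        previous_level = level
        old_index = seasonal[t - season]
        level = alpha * y / old_index + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal.append(gamma * y / level + (1 - gamma) * old_index)
    n = len(data)
    # Reuse the most recent seasonal index for each position in the cycle.
    return [
        (level + m * trend) * seasonal[n - season + (m - 1) % season]
        for m in range(1, horizon + 1)
    ]
```

On a purely seasonal series with no trend, such as `[10, 20]` repeated four times with `season=2`, the forecasts continue the same alternating pattern.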
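A simple AR(1) model is
$y_t = \rho y_{t-1} + u_t$ (14)
where $y_t$ is the variable of interest, t is the time index, $\rho$ is a coefficient, and
$u_t$ is the error term. A unit root is present if $\rho = 1$; the model is non-stationary in
this case.
The regression model can be written as
$\Delta y_t = (\rho - 1) y_{t-1} + u_t = \delta y_{t-1} + u_t$ (15)
where $\Delta$ is the first difference operator. This model can be estimated, and testing for a
unit root is equivalent to testing $\delta = 0$ (where $\delta = \rho - 1$). Since the test is
done over the residual term rather than raw data, it is not possible to use the standard
t-distribution to provide critical values. Therefore this statistic has a specific distribution,
known as the Dickey-Fuller table. There are three main versions of the test:
- Test for a unit root:
$\Delta y_t = \delta y_{t-1} + u_t$ (16)
- Test for a unit root with drift:
$\Delta y_t = a_0 + \delta y_{t-1} + u_t$ (17)
- Test for a unit root with drift and deterministic time trend:
$\Delta y_t = a_0 + a_1 t + \delta y_{t-1} + u_t$ (18)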
Inputs
- Initial series
- Order of lag
Outputs
- F-statistic
- P-value
Table 12. Inputs/Outputs ADF
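To make the regression behind the test concrete, the sketch below estimates the simplest Dickey-Fuller regression (no constant, no trend, no lagged differences, i.e. lag order 0) by ordinary least squares and returns the coefficient on the lagged level with its t-ratio. It does not compute a p-value: the statistic must be compared with Dickey-Fuller critical values, which are not reproduced here. The function names and the simulated AR(1) data are assumptions of this example.

```python
import math
import random
from typing import List, Tuple


def simulate_ar1(rho: float, n: int, seed: int = 0) -> List[float]:
    """Generate y_t = rho * y_{t-1} + u_t with standard normal noise (illustrative data)."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y


def dickey_fuller_statistic(series: List[float]) -> Tuple[float, float]:
    """OLS fit of diff(y)_t = delta * y_{t-1} + u_t; returns (delta, t-ratio of delta)."""
    lagged = series[:-1]
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]
    sum_sq = sum(x * x for x in lagged)
    delta = sum(x * d for x, d in zip(lagged, diffs)) / sum_sq
    residuals = [d - delta * x for x, d in zip(lagged, diffs)]
    dof = len(diffs) - 1  # one estimated parameter
    sigma2 = sum(r * r for r in residuals) / dof
    std_err = math.sqrt(sigma2 / sum_sq)
    return delta, delta / std_err


# A stationary AR(1) series should give a clearly negative statistic.
delta_hat, t_ratio = dickey_fuller_statistic(simulate_ar1(0.5, 300))
```

The augmented version of the test adds lagged differences as extra regressors, which requires a multiple regression and is left out of this sketch.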
Inputs
- Initial series
Outputs
- Kurtosis
- Mean
- Skewness
- Standard deviation
- Variance
- Variance MLE
Table 13. Inputs/Outputs Jarque-Bera
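The outputs listed above can be computed directly from the sample moments. The sketch below also computes the Jarque-Bera statistic using its standard definition, JB = n/6 (S^2 + (K - 3)^2 / 4), which we assume is the one intended since the corresponding formula is not shown in the text; the function names are hypothetical.

```python
import math
from typing import Dict, List


def describe(series: List[float]) -> Dict[str, float]:
    """Moments reported by the normality tests (kurtosis is not excess kurtosis)."""
    n = len(series)
    mean = sum(series) / n
    m2 = sum((x - mean) ** 2 for x in series) / n  # MLE (biased) variance
    m3 = sum((x - mean) ** 3 for x in series) / n
    m4 = sum((x - mean) ** 4 for x in series) / n
    variance = m2 * n / (n - 1)                     # unbiased sample variance
    return {
        "mean": mean,
        "variance": variance,
        "variance_mle": m2,
        "standard_deviation": math.sqrt(variance),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


def jarque_bera(series: List[float]) -> float:
    """Jarque-Bera statistic; large values suggest a departure from normality."""
    stats = describe(series)
    n = len(series)
    return n / 6.0 * (stats["skewness"] ** 2 + (stats["kurtosis"] - 3.0) ** 2 / 4.0)
```

Under the null hypothesis of normality the statistic is asymptotically chi-squared with two degrees of freedom, so a p-value would be obtained from that distribution.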
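The Shapiro-Wilk test tests the null hypothesis that a sample ($x_1, \dots, x_n$) came from
a normally distributed population. The test statistic is
$W = \dfrac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ (20)
where $x_{(i)}$ is the i-th order statistic and $\bar{x}$ is the sample mean. The constants
$a_i$ are given by
$(a_1, \dots, a_n) = \dfrac{m^{T} V^{-1}}{(m^{T} V^{-1} V^{-1} m)^{1/2}}$
where $m = (m_1, \dots, m_n)^{T}$ and $m_1, \dots, m_n$ are the expected values of the order
statistics of independent and identically distributed random variables sampled from the
standard normal distribution, and V is the covariance matrix of those order statistics. The user
may reject the null hypothesis if W is below a predetermined threshold.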
Inputs
- Initial series
Outputs
- Kurtosis
- Mean
- Skewness
- Standard deviation
- Variance
- Variance MLE
Table 14. Inputs/Outputs Shapiro-Wilk
Partial least squares regression (PLS regression) is a statistical method that bears some
relation to principal components regression; instead of finding hyperplanes of maximum
variance between the response and independent variables, it finds a linear regression model by
projecting the predicted variables and the observable variables to a new space. Because both
the X and Y data are projected to new spaces, the PLS family of methods are known as
bilinear factor models.
As in multiple linear regression, the main purpose of partial least squares regression is to
build a linear model, Y=XB+E, where Y is an n cases by m variables response matrix, X is an
n cases by p variables predictor (design) matrix, B is a p by m regression coefficient matrix,
and E is a noise term for the model which has the same dimensions as Y.
For establishing the model, partial least squares regression produces a p by c weight matrix W
for X such that T=XW, i.e., the columns of W are weight vectors for the X columns
producing the corresponding n by c factor score matrix T. These weights are computed so that
each of them maximizes the covariance between responses and the corresponding factor
scores. Ordinary least squares procedures for the regression of Y on T are then performed to
produce Q, the loadings for Y (or weights for Y) such that Y=TQ+E. Once Q is computed, we
have Y=XB+E, where B=WQ, and the prediction model is complete.
One of the most important steps in the application of the PLS regression is the determination
of the correct number of dimensions to use in order to avoid over-fitting, and therefore to
obtain a robust predictive model.
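The chain T = XW, Y = TQ + E, B = WQ can be illustrated with a single latent component and a single response variable. This is a deliberately reduced sketch under those assumptions (one factor, centered data, plain Python lists); it is not the Accord.NET implementation, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def pls1_one_component(X: List[List[float]], y: List[float]) -> Tuple[List[float], float]:
    """One-component PLS regression; returns (coefficients B, intercept)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector: the unit vector w maximizing the covariance between Xw and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # Factor scores T = XW (a single column here).
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]

    # Loading for Y: ordinary least squares regression of y on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    # Regression coefficients B = WQ, mapped back to the uncentered scale.
    b = [wj * q for wj in w]
    intercept = y_mean - sum(bj * mj for bj, mj in zip(b, x_means))
    return b, intercept
```

With more than one component, each further weight vector is computed on deflated data and the number of components becomes the tuning choice discussed above.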
Comparison between PCR and PLS
Principal components regression and partial least squares regression differ in the methods
used in extracting factor scores. In short, principal components regression produces the
weight matrix W reflecting the covariance structure between the predictor variables, while
partial least squares regression produces the weight matrix W reflecting the covariance
structure between the predictor and response variables.
Temporal approach
The aim of this work is to propose a new technique for the application of PLS regression to
time series. This technique is based on the exponential smoothing of the loading weight
vectors (w) obtained at each iteration step. This smoothing progressively displaces the random
or quasi-random variations from earlier (most important) to later (less important) PLS latent
variables.
Inputs
- Data of predictors
- Response variable
Outputs
- Estimators
Table 15. Inputs/Outputs PLS
(21)
where x is the input signal (usually a noise signal), y is the output signal and z is the external
input signal. The model coefficients of the given orders are estimated and the residual r (the
estimation error) is returned. Input parameters are order P of the AR process, order Q of the
MA process (choose Q=0 for an ARX model) and order R of the eXternal process.
Inputs
- Estimation data
- Order P
- Order Q
- Order R
Outputs
-
Table 16. Inputs/Outputs ARMAX
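The roles of the three orders P, Q and R can be illustrated by simulating an ARMAX process. The exact form of the model equation is not recoverable from the text, so the sketch assumes one common convention, stated in the docstring; the coefficient names and sign conventions are assumptions of this example.

```python
from typing import List


def simulate_armax(
    ar: List[float],
    ma: List[float],
    ex: List[float],
    noise: List[float],
    external: List[float],
) -> List[float]:
    """Simulate, under an assumed convention,

        y_t = sum_i ar[i-1] * y_{t-i}      (AR part, order P = len(ar))
            + x_t + sum_j ma[j-1] * x_{t-j}  (MA part, order Q = len(ma))
            + sum_k ex[k-1] * z_{t-k}        (external part, order R = len(ex))

    where x is the noise input and z the external input. Terms that would
    reach before the start of the series are dropped.
    """
    y: List[float] = []
    for t in range(len(noise)):
        value = noise[t]
        value += sum(a * y[t - i] for i, a in enumerate(ar, start=1) if t - i >= 0)
        value += sum(c * noise[t - j] for j, c in enumerate(ma, start=1) if t - j >= 0)
        value += sum(d * external[t - k] for k, d in enumerate(ex, start=1) if t - k >= 0)
        y.append(value)
    return y
```

Setting `ma=[]` (Q = 0) gives an ARX model, as mentioned in the description above; estimating the coefficients from data is the inverse problem and is not covered by this sketch.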
The acronym ARIMA stands for "Auto-Regressive Integrated Moving Average." Lags of the
differenced series appearing in the forecasting equation are called "auto-regressive" terms,
lags of the forecast errors are called "moving average" terms, and a time series which needs to
be differenced to be made stationary is said to be an "integrated" version of a stationary
series. Random-walk and random-trend models, autoregressive models, and exponential
smoothing models are all special cases of ARIMA models.
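The claim that exponential smoothing is a special case of ARIMA can be checked numerically. The sketch below compares one-step forecasts from simple exponential smoothing with those of an ARIMA(0,1,1) model without constant, assuming the sign convention forecast(t+1) = y(t) - theta * e(t), under which the two coincide when alpha = 1 - theta. Both functions and the shared initialization are illustrative assumptions.

```python
from typing import List


def ses_one_step(data: List[float], alpha: float) -> List[float]:
    """out[t] is the forecast of data[t + 1] made by simple exponential smoothing."""
    forecast = data[0]  # assumed initialization
    out = [forecast]
    for y in data[1:]:
        forecast = forecast + alpha * (y - forecast)
        out.append(forecast)
    return out


def arima011_one_step(data: List[float], theta: float) -> List[float]:
    """out[t] is the forecast of data[t + 1] from ARIMA(0,1,1) without constant."""
    forecast = data[0]  # same initialization as above
    out = [forecast]
    for y in data[1:]:
        error = y - forecast           # last one-step forecast error
        forecast = y - theta * error   # random-walk forecast corrected by the MA term
        out.append(forecast)
    return out


series = [3.0, 5.0, 4.0, 6.0, 8.0]
smoothing = ses_one_step(series, alpha=0.3)
arima = arima011_one_step(series, theta=0.7)
```

With `theta=0` the MA correction vanishes and the forecast is simply the last observation, which is the random-walk model ARIMA(0,1,0).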
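A non-seasonal ARIMA model is classified as an "ARIMA(p,d,q)" model, where:
- p is the number of autoregressive terms,
- d is the number of non-seasonal differences needed for stationarity,
- q is the number of lagged forecast errors in the prediction equation.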
Inputs
- Estimation data
- Order p
- Order d
- Order q
Outputs
-
Table 17. Inputs/Outputs ARIMA
and the
Inputs
- Initial series
- Horizon
Outputs
- Predicted series
Table 18. Inputs/Outputs Linear Prediction
2.4 Graph
2.4.1 Box-Plot
A box plot is a convenient way of graphically depicting groups of numerical data through
their quartiles.
Inputs
- Series
Outputs
-
Table 19. Inputs/Outputs Box Plot
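The numbers a box plot needs can be computed with the Python standard library. The sketch below uses `statistics.quantiles` with the inclusive method for the quartiles and the common 1.5 x IQR rule for whiskers and outliers; both choices are assumptions, since the report does not say which quartile convention the chart control uses.

```python
import statistics
from typing import Dict, List


def box_plot_summary(data: List[float]) -> Dict[str, object]:
    """Quartiles, whisker ends and outliers for a box plot."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # lowest value within the fences
        "whisker_high": max(inside),   # highest value within the fences
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }
```

For `[1, 2, 3, 4, 5, 6, 7, 8, 100]` the value 100 lies far above the upper fence and is reported as an outlier, while the upper whisker stops at 8.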
(22)
Inputs
- Series
- Number of lags
Outputs
- Table of AC
- Correlogram
Table 20. Inputs/Outputs ACF
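The table of autocorrelations behind the correlogram can be sketched as follows, assuming the usual sample estimator in which every lag is normalized by the total sum of squared deviations. The function name is hypothetical.

```python
from typing import List


def autocorrelation(series: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_0, r_1, ..., r_max_lag (r_0 is always 1)."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    total = sum(d * d for d in deviations)
    return [
        sum(deviations[t] * deviations[t - k] for t in range(k, n)) / total
        for k in range(max_lag + 1)
    ]
```

Plotting these values against the lag gives the correlogram; for a short trending series such as `[1, 2, 3, 4, 5]` the autocorrelations decay from 1 as the lag grows.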
$k = 2, \dots, n$, $j = 1, 2, \dots, k-1$
$k = 3, \dots, n$ (23)
Inputs
- Series
- Number of lags
Outputs
- Matrix of PAC
- Correlogram
Table 21. Inputs/Outputs PACF
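Partial autocorrelations are commonly obtained from the autocorrelations with the Durbin-Levinson recursion, whose index ranges (k over the lags, j from 1 to k-1) match the fragments above. We assume this is the recursion meant; the sketch takes the autocorrelations as input, and the function name is hypothetical.

```python
from typing import List


def partial_autocorrelation(acf: List[float]) -> List[float]:
    """Durbin-Levinson recursion.

    acf[k] is the autocorrelation at lag k (acf[0] == 1). Returns the partial
    autocorrelations phi_kk for k = 0..len(acf) - 1, with the lag-0 value set to 1.
    """
    pacf = [1.0]
    previous: List[float] = []  # phi_{k-1, 1}, ..., phi_{k-1, k-1}
    for k in range(1, len(acf)):
        numerator = acf[k] - sum(previous[j - 1] * acf[k - j] for j in range(1, k))
        denominator = 1.0 - sum(previous[j - 1] * acf[j] for j in range(1, k))
        phi_kk = numerator / denominator
        # Update the lower-order coefficients: phi_kj = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}.
        current = [previous[j - 1] - phi_kk * previous[k - j - 1] for j in range(1, k)]
        current.append(phi_kk)
        pacf.append(phi_kk)
        previous = current
    return pacf
```

For autocorrelations that decay geometrically, as in an AR(1) process, the partial autocorrelations cut off after lag 1, which is the pattern used to identify the autoregressive order.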
Alternative Scenario
1. Open project.
2. Choose method.
3. Save/Exit without Save/Choose another method.
Main Scenario
1. The application prompts the user to manage a project
2. The user chooses to open or create a new project
Post-Condition
Existence of a project
Main Scenario
1. Choose to describe the missing data
2. Choose to impute the missing data
3. Choose the imputation method
4. Save the completed data
Post-Condition:
Completed data
Conclusion
Throughout this chapter, we have detailed the functional and non-functional requirements of
the solution as well as the use cases. In the next chapter we begin the analysis and design of
these specifications.
Chapter IV
Design
Introduction
Design is a creative process and a crucial phase of project development. Supporting this phase
with appropriate techniques and tools is important to produce a high-quality application. To
present our design, we begin this chapter with a global view of our solution's architecture;
we then detail our design choices through the package, class and sequence
diagrams.
2 System Diagrams
2.1 Package diagram
A package diagram is a UML structure diagram which shows packages and the dependencies between
them. Our application is composed of five packages:
The package is the one that interacts with the other packages in order to drive the
execution flow from data to visualization.
TSAnalytics
Contains the form that hosts the main window of the solution and graphical charts such as the
box plot, the correlogram, etc.
Transformation
Provides the transformation algorithms requested by users, such as Lag, Integrate and
Exponential Smoothing.
Test
Provides the test algorithms requested by users, such as Dickey-Fuller and Shapiro-Wilk.
Model
Provides the modeling algorithms requested by users, such as ARMAX, ARIMA and PLS.
Graph
Provides the graph algorithms requested by users, such as the box plot and the correlogram.
2.3 Sequence diagram
A sequence diagram is an interaction diagram that shows the order in which classes interact
with each other. It describes the objects and classes involved in the scenario and the
sequence of messages exchanged between the objects needed to carry out the functionality of
the scenario. Sequence diagrams are typically associated with use case realizations in the
Logical View of the system under development.
In this part, we present some sequence diagrams to describe interactions between the user and
the application.
Conclusion
Throughout this chapter, we have presented a conceptual view of our project and
detailed the software architecture of the solution in the form of modules. In the fifth and final
chapter, we describe the implementation of the project.
Chapter V
Implementation and Test
Introduction
In this chapter, we devote the first part to presenting the development environment,
and then we focus on the implemented solution and the tests performed.
1 Development environment
1.1 Software Environment
Microsoft Office Project Professional 2007 is a mature software package that includes features
for project management. It allows projects to be monitored by tracking the completion of
scheduled tasks and jobs.
Accord.NET
We had to choose a third-party library for the analytics algorithms. This led us to choose
the scientific computing framework Accord.NET. We chose this framework for its performance
and because it can be configured and adapted to our needs during the implementation of the
solution. Accord.NET is based on the mathematical framework "AForge.NET". It is composed of
a variety of libraries covering statistics, machine learning, pattern
recognition, etc.
Visual Studio is an integrated development environment (IDE) providing a set of tools and
services to develop desktop, web or mobile applications. It supports several languages
such as C#, C++, J# and F#. It is used to develop and test our solution.
Enterprise Architect
Enterprise Architect is a comprehensive UML analysis and design tool for UML, SysML,
BPMN and many other technologies, covering software development from requirements
gathering through to the analysis stages, design models, testing and maintenance.
DevExpress
Memory: 6 GB
OS: Windows 7, 64-bit
Table 22. Hardware Environment
2 Achieved Work
In this section, we are going to present our solution.
Main Interface
When the end user launches the solution, the main screen presented in the figure below
is displayed.
Load interface
The next figure presents "Treatments of missing values" tools in Data menu bar:
Impute interface
The next figure is the "Impute Missing Values" interface.
The user can choose one of the following three methods for imputation:
The next figure is the "Impute Missing Values" interface by Descriptive Statistics. The user
can choose the given method for each column or for all columns; clicking on the Impute
button leads the user to a table that no longer contains missing values.
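Imputation by descriptive statistics can be sketched as replacing each missing value in a column with a summary statistic of the observed values. The set of statistics offered here (mean, median, mode), the use of `None` to mark a missing value and the function name are assumptions of this example; the report does not list the statistics available in the interface.

```python
import statistics
from typing import List, Optional


def impute(column: List[Optional[float]], method: str = "mean") -> List[float]:
    """Fill missing values (None) with a descriptive statistic of the observed ones."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if method == "mean":
        fill = statistics.mean(observed)
    elif method == "median":
        fill = statistics.median(observed)
    elif method == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if v is None else v for v in column]


# Applying the same method to every column of a small table.
table = {"temperature": [20.0, None, 22.0], "pressure": [1.0, 1.0, None]}
completed = {name: impute(values, "mean") for name, values in table.items()}
```

Choosing a different method per column only requires passing a different `method` argument for each call.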
Chart
Line and bar interface
2.3 Transformation
The next figures present the Transformation menu, which contains several transformations that
can be applied to series.
To make a series stationary with a single transformation and find the necessary order of
differencing, our solution offers the "Integrate Transformation":
Smoothing interface
For smoothing and forecasting we can use "Simple Exponential Smoothing":
2.4 Modeling
The next figures present the Models menu, which contains several modeling methods. These
methods can be applied to univariate or multivariate series:
Main interface
When we first launch the PLS control, we are taken to the home screen presented
in the figure below:
- Factors
- Loadings matrix
- Weights matrix
- Model
- Projection
- Regression
Factors interface
Projection interface
Regression interface
For the two prediction methods, we provide a user-friendly interface that lets users
easily change inputs and outputs. The results are displayed in both charts and a data table.
Performance Tests
After the end of the implementation phase we have to go through a testing phase of the
application. The test phase is needed to detect anomalies and validate our application. It
ensures that our solution will react as intended and that the quality of the code is in line with
expectations.
We have performed some stress tests to check the performance and response time of our
application. The next table presents some stress tests executed.
Test case
Load data
Transformation
Inputs
Duration
2.5 seconds
1 second
1 second
Description Analysis
1 second
9 seconds
5.5 seconds
Linear Prediction
10 seconds
Exponential Smoothing
6 seconds
ARMAX model
7 seconds
Augmented Dickey
Fuller test
Conclusion
In this chapter, we have presented the implementation phase of the solution. We started
by describing the different tools and libraries we used throughout the project. Then,
we presented the most important features offered by our application by showing its
most important interfaces. Finally, we performed some
tests to validate our application.
Traditionally, data mining and time series analysis have been seen as separate
approaches to analyzing enterprise data. However, much of the data used by
business processes is time-stamped. Time series analysis is a mixture of
forecasting and traditional data mining techniques that uses the time dimension.
In our project, we started by focusing on the understanding of the discipline by studying the
concept of time series analysis and reviewing the existing tools. The next step was to study
and analyze the features to design and implement in our solution and bring out the functional
and non-functional requirements of our project. We then proceeded with the design phase, by
detailing the architecture of our application as well as its static and dynamic design through
the development of package and class diagrams.
Finally, we concluded the report by presenting the implementation and test phase of our
project. This chapter describes the tools and frameworks used to achieve our solution, and
exposes the work done through screenshots which cover the most important features of the
solution.
Much of the data that are used in the operational side of a business have a built-in time
dimension. One of the challenges of developing this solution is the complexity of handling a
large number of time series.
Bibliography
Bastien P., Esposito Vinzi E., Tenenhaus M. (2005) PLS generalised linear regression,
Computational Statistics and Data Analysis, 48, 17-46.
Ait Saidi, A., Ferraty, F. and Kassa, R. (2005) Single functional index model for a time
series. Rev. Roumaine Math. Pures Appl. 50 (4) 321-330.
Netography
[N1] http://www.integrationobjects.com/services.php
[N2] http://www.integrationobjects.com/knowledgenet.php
[N3] http://www.eviews.com/home.html
[N4] http://www-01.ibm.com/software/analytics/spss/
[N5] http://gretl.sourceforge.net/
[N6] http://www.r-project.org/
[N7] http://www.sas.com/en_us/software/analytics.html
[N8] http://rdotnet.codeplex.com/
[N9] http://www.sas.com/en_us/software/integration-technologies.html
[N10] http://accord-framework.net/intro.html
[N11] https://code.google.com/p/accord/
[N12] http://www.microsoft.com/visualstudio/fra
[N13] http://www.sparxsystems.com.au/
[N14] http://www.devexpress.com/