
Examensarbete 30 hp December 2011

Using Ensemble Learning To Improve Classification Accuracy in Medical Data

Behzad Oskooi

Abstract


Supervisor: Saied Rahati
Reviewer: Olle Gällmo
Examiner: Anders Jansson

Contents
List of Figures
List of Tables
1 Introduction
   1.1 Project Overview
   1.2 Background (difficulties, motivations)
   1.3 Block diagram representation (concept level)
   1.4 Project structure
2 Problem Description
   2.1 What is Reiki?
      2.1.1 Reiki's History
      2.1.2 Reiki's applications
   2.2 Signal representation (EEG during Reiki)
      2.2.1 Time domain features
      2.2.2 Frequency domain features
      2.2.3 Time-frequency domain features
      2.2.4 Non-linear analyzing methods
      2.2.5 Input Vectors Features
   2.3 Literature survey (concept level, comparative study)
      2.3.1 Pattern Classification Using Support Vector Machine Ensemble
      2.3.2 On Predicting Rare Classes with SVM Ensembles in Scene Classification
3 Technical solution (Methodology)
   3.1 Classifiers
   3.2 SVM and its extensions
   3.3 Ensemble learning
      3.3.1 Bagging
      3.3.2 Boosting
      3.3.3 AdaBoost
      3.3.4 Methods for aggregating support vector machines
4 Empirical Evaluation
   4.1 Experiment 1
   4.2 Experiment 2
   4.3 Benchmark Data
5 Conclusion (qualitative results, comparative) and Future Works
References
Appendix

List of Figures
Figure 1: Representation block diagram of the project
Figure 2: The 24-channel system to register brain signals (2) (7)
Figure 3: Chakras Body System (7)
Figure 4: Frequency bands of EEG signals (10)
Figure 5: Architecture of the SVM ensembles (13)
Figure 6: The two features of lightness and width for sea bass and salmon (14)
Figure 7: A more complicated model for classification of fishes (14)
Figure 8: The decision boundary shown might represent the optimal tradeoff between performance on the training set and simplicity of classifier (14)
Figure 9: The AdaBoost Algorithm (17)

List of Tables
Table 1: The EEG Frequency Spectrum
Table 2: Input Vectors Features (2)
Table 3: "The correct classification rates of UCI hand-written digit recognition" (12)
Table 4: Binary-class fraud detection (12)
Table 5: Multi-class fraud detection (12)
Table 6: "Classification results using different method for tackling the rare class problem" (13)

1 Introduction
1.1 Project Overview
Nowadays, medical instruments are widely used in hospitals, polyclinics and doctors' offices to gather vital information about patients' bodies. This information can be ultrasonic or X-ray images, brain or heart signals, and so on. Medical data are used by professionals to identify the cause of illnesses, and the electroencephalogram (EEG) (1) is a well-known test for this purpose. EEG is a painless test that records the brain's signals; several sensors are used to gather these bio-signals. EEG signals have many features, so the volume of data increases dramatically as the number of samples or patients grows. Interpreting this huge amount of data is not easy and takes a long time. Applying neural networks can reduce the time needed to interpret medical data and can also achieve higher precision. This thesis is designed to take a large amount of medical data with many input-vector features and, by using ensemble learning methods, to produce an output with accuracy above 85% in differentiating real Reiki from placebo.

1.2 Background (difficulties, motivations)
This thesis continues earlier investigations related to Reiki carried out by Ms. Sahar Ghobadi at the Islamic Azad University of Iran, Mashhad Branch, as a Master's thesis. The study was done on male and female subjects aged between 21 and 29. Two different sets of experiments were performed on the same people in three different states: rest, during Reiki, and recovery. The first experiment was conducted by a person who was an expert in Reiki, and the other by a regular person who merely imitated the activities of a Reiki expert. Brain signals were gathered by EEG equipment in both experiments, and the volunteers were asked about the different emotional states they experienced during the experiments. Emotional states registered in the questionnaires included anxiety, stress, anger, fear and so on. After gathering enough samples, features were extracted in different domains: time/statistical, frequency, time-frequency, and non-linear methods such as fractal dimensions, phase-space diagrams, the Lyapunov exponent and entropy. As more than 25 features were extracted in that investigation, an individual SVM cannot cover the whole target space with high accuracy, so this thesis is designed to use other machine learning methods, such as ensemble learning, to reach accuracy above 85% in differentiating Reiki from placebo in the experimental results. In this study it is assumed that the extracted feature space describes the data well, so the focus is on the classifier.

1.3 Block diagram representation (concept level)
The emphasis of this thesis project is on implementing an ensemble learning classifier, using a combination of SVMs, to classify input vectors and distinguish real Reiki from placebo. To achieve this goal, the medical data gathered during Reiki and placebo sessions with an EEG instrument in (2) are first prepared to be usable as input vectors for single SVMs. Next, a single SVM is implemented in MATLAB Version 7.10 (Release 2010a) as a pilot to test the input vectors. Then, in a pre-training phase, different combinations of input vectors are presented to the SVM to explore various target spaces. Finally, several SVMs are applied to implement the ensemble learning classifier and increase the accuracy of distinguishing Reiki from placebo. Again, I assume that the medical information gathered in (2) by Sahar Ghobadi is correct, and I accept it completely. Figure 1 shows the steps of this thesis project as a block diagram.

Figure 1: Representation block diagram of the project

1.4 Project structure
The aim of this thesis project is to apply ensemble learning to brain signals gathered by an EEG instrument in order to distinguish between Reiki (3) and placebo (4) with accuracy higher than 85%. Section 2 describes the problems regarding Reiki and signal representation, as well as some papers and documents used in this thesis. Section 3 describes the methodology used in this project, including classification methods, SVM and its extensions, and ensemble learning methods such as boosting and bagging. Section 4 presents a quantitative evaluation of the experiments designed for the thesis and compares them with benchmark results. Section 5 concludes the project by describing the qualitative results of the experiments and outlines future work in this field of study.

2 Problem Description
In the classification of medical data, bio-signals are first collected by sensory instruments. Thereafter, in a preprocessing step, appropriate features are extracted from them. If the feature space is large, the data dimensionality is reduced in order to avoid the curse of dimensionality. Finally, classification methods like neural networks or the support vector machine (SVM) are used to classify the data. In some applications, such as cognitive science, the accuracy of these classifiers is below 85% due to the complexity of the problem. A low accuracy rate may be caused by an inappropriate feature space or by the classifier's inability to generalize. Poor generalization ability can have a variety of causes, and finding them is a very complex task. On the other hand, although some classifiers are not trained well enough to perform well globally, they can at least achieve acceptable local performance. In this study I intend to improve classifier performance through ensemble learning over a group of classifiers. Usually, in addition to the accuracy rate, sensitivity, specificity and ROC curves are used for evaluation.

2.1 What is Reiki?
Reiki is a healing practice applied to patients who need complementary treatment. Reiki is classified as complementary and alternative medicine (CAM) according to the definition of the National Center for Complementary and Alternative Medicine (NCCAM) (5). CAM practices fall into broad categories such as natural products, mind-body medicine, manipulative and body-based practices, movement therapies, traditional healers, energy therapies and so on. Reiki belongs to the energy therapy class, which is a natural way to achieve awareness, relaxation and balance. Reiki can balance the emotional and spiritual aspects of the body by causing the body to naturally treat itself. Reiki is also a therapeutic technique in which healing energy is guided through the hands into the body or energy field of the healing receiver. To perform Reiki, the practitioner places his or her hands lightly on, or just above, the body of the person who needs treatment. The healing receiver can learn to perform Reiki on himself or herself, which is convenient for those who need to continue treatment at home for longer periods.
2.1.1 Reiki's History

Reiki originated in Japan, and the word Reiki is derived from two Japanese words: rei, or universal, and ki, or life energy (3). The idea behind Reiki is that there is a universal energy that supports the body's innate healing abilities. Practitioners try to access this energy and allow it to flow into the body to facilitate healing. Current Reiki practice is based on the spiritual teachings of Mikao Usui, a Japanese Christian educator in Kyoto who lived in the early 20th century (6). His teachings included meditative techniques and healing practices. Later, Chujiro Hayashi, one of his students, developed Reiki further, placing more emphasis on the healing practices. Reiki was introduced to Western culture in the late 1930s by an American, Hawayo Takata, who learned Reiki from Hayashi in Japan. Since then, several variations of Reiki have been developed and are currently practiced in various schools. The type of Reiki taught and practiced by Hayashi and Takata may be considered traditional Reiki.
2.1.2 Reiki's applications

Applying Reiki has many reported advantages in the treatment of illnesses: after Reiki you feel more relaxed, have more control over your own health care and have more energy; it is also said to increase well-being and self-awareness, strengthen the immune system, help digestion, improve sleep, and relieve migraines, back pain and other discomfort. Reiki is also said to give heightened intuition and spiritual awareness and to reduce (or completely eliminate) symptoms of hopelessness and anxiety. Because of these advantages, Reiki has in recent years increasingly been used in health centers such as medical centers, emergency rooms, surgical operating rooms, organ transplantation care, obstetrics and gynecology wards, neonatal care units, pediatric wards, HIV/AIDS patient care units and so on (2). Reiki has been found useful in treating problems including sports injuries, cuts, burns, post-surgical healing, migraine headaches, chronic pain or inflammation, internal diseases, emotional disorders, and many stress-related illnesses. Although Reiki can be considered a complementary form of treatment that could be very useful to many people, it cannot be regarded as a panacea (6). Although Reiki was developed during the last century, it is still an active topic for research, and NCCAM has funded several studies investigating the following topics (3):

- How Reiki might work
- Whether Reiki is effective and safe for treating the symptoms of fibromyalgia
- Reiki's possible impact on the well-being and quality of life of people with advanced AIDS
- The possible effects of Reiki on disease progression and/or anxiety in people with prostate cancer
- Whether Reiki can help reduce nerve pain and cardiovascular risk in people with type 2 diabetes

2.2 Signal representation (EEG during Reiki)
Feature extraction is very important in any neural network study, so to fulfill the aim of this thesis, distinguishing Reiki from placebo with accuracy above 85%, the features of the brain signals gathered by Sahar Ghobadi in (2) were analyzed and evaluated by various tests such as the t-test and receiver operating characteristic (ROC)1 diagrams. Different domains of the feature space, such as time, frequency, time-frequency and non-linear methods (fractal dimensions, phase-space diagrams, Lyapunov exponent and entropy), were also studied in (2) to ensure that the most appropriate features were extracted for signal representation. To extract appropriate features, brain signals were recorded by a 24-channel device, of which only 19 channels, namely FP1, T3, T5, FZ, F3, F7, T4, C4, CZ, C3, F8, T6, P4, PZ, FP2, F4, O2, O1, were used on an electrode cap according to the standard 10-20 system (7), with the ear connections used as references. The EEG registration device and the electrode cap are shown in figure 2.

Figure 2: The 24-channel system to register brain signals (2) (7)

Brain signals were recorded in (2) for both groups, real Reiki and imitated Reiki (placebo), in two different situations: during Reiki and at rest. The studies were done on 30 male and female
1

Evaluation of classification results depends on two factors, sensitivity and specificity. Sensitivity is the probability of correctly recognizing the target (the correct class) in an experiment, i.e. the true positive rate. Specificity is the probability of correctly recognizing the non-target, i.e. the true negative rate.

volunteer students, aged between 24 and 28 for real Reiki and between 22 and 29 for imitated Reiki. None of the volunteers had any heart, vascular or nerve problems or any chronic illnesses, and they did not take any drugs. None of the volunteers were familiar with Reiki; they were informed about Reiki and the conditions of the experiment by a brochure given to them just before the experiment. The volunteers' emotional status was registered before and after the experiment on printed forms for use in further studies. Brain signals were measured for each student on seven chakras during Reiki, one minute per chakra, and for two minutes in the rest state (2). Signal registration for all volunteers was done under the same conditions of time, place and registration instrument, and to prepare the volunteers for the measurements they were asked not to drink caffeinated drinks and not to smoke for 24 hours before the experiment. The volunteers were also asked not to speak, not to sleep, and not to move their bodies much during the experiments. The Reiki treatment was performed by placing the Reiki practitioner's hands just above the bodies of the volunteers without touching them. Reiki was first performed on three chakra points on the head, and then on four other points on the body (2). Chakra is a Sanskrit word that literally translates as "wheel" or "disk". It is an eastern concept of Indian origin in which chakras are treated as "energy vortexes" whose balance is considered crucial to physical, mental, emotional and spiritual well-being. There are a total of seven chakras in the chakra system, each located in a different area of the body (8). These points are shown in figure 3.

Figure 3: Chakras Body System (7)

2.2.1 Time domain features

Statistical features are usually used for bio-signals. The statistical features below, which are extracted directly from the time domain, are introduced in (2), and their mathematical definitions can be found in (9).

- Mean
- Variance
- Skewness: the degree of deviation from the symmetry of a normal (Gaussian) distribution is measured by the skewness parameter. The skewness of a normal distribution is zero; it is positive for a distribution skewed to the right and negative for a distribution skewed to the left.
- Kurtosis: this parameter indicates whether a distribution is flat or peaked. Its value for a normal distribution is zero, which is called mesokurtic. If the value is higher than zero the distribution is called leptokurtic and has a sharper peak than the normal distribution; if the value is less than zero it is called platykurtic and is flatter than the normal distribution.
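For illustration, these four time-domain features can be computed directly from one EEG window. The following minimal Python sketch uses NumPy and SciPy (tools chosen here only for illustration; the thesis itself uses MATLAB), and the sampling rate in the example is an assumption:

```python
import numpy as np
from scipy.stats import skew, kurtosis

def time_domain_features(x):
    """Return mean, variance, skewness and (excess) kurtosis of one EEG window."""
    x = np.asarray(x, dtype=float)
    return {
        "Mean-EEG": np.mean(x),
        "Var-EEG": np.var(x),
        "Skewness-EEG": skew(x),       # zero for a symmetric (Gaussian) signal
        "Kurtosis-EEG": kurtosis(x),   # Fisher definition: zero for a Gaussian
    }

# Example: a two-second window sampled at 256 Hz (sampling rate assumed for illustration)
rng = np.random.default_rng(0)
window = rng.standard_normal(2 * 256)
print(time_domain_features(window))
```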
2.2.2 Frequency domain features

Frequency domain features have shown relatively good performance in the processing of brain signals. Several methods for estimating the power spectrum have been developed, each of which usually offers good estimates for specific types of signals. These methods include traditional (Fourier-based) methods and parametric methods such as auto-regressive (AR) models. EEG signals are divided into different frequency bands, including Delta, Theta, Slow Alpha, Fast Alpha, Beta and Gamma, which change differently under different psychological and mental conditions. Changes in the frequency spectrum of EEG signals are very important, and appropriate features can be extracted by studying these changes. The different frequency bands of EEG signals can be seen in figure 4.

Figure 4: Frequency bands of EEG signals (10)

As can be seen in figure 4, the regular spectrum of EEG signals lies between 1 and 70 Hertz and is divided into 5 bands. In (2) the Alpha band is divided into two sub-bands, Alpha 1 (7-10 Hertz) and Alpha 2 (10-13 Hertz), to study the brain's signal spectrum more precisely. One of the extracted features is the relative power of each band, which is calculated by dividing the power of that band by the total power over the signal's spectrum frequencies. More details about the EEG frequency spectrum can be found in table 1.
Table 1: The EEG Frequency Spectrum
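As an illustrative sketch of the relative band power feature, the Python fragment below estimates the power spectral density with Welch's method and divides each band's power by the total power in the 1-70 Hz range. The Alpha1/Alpha2 split follows the text; the remaining band edges and the 256 Hz sampling rate are assumptions made only for this example:

```python
import numpy as np
from scipy.signal import welch

# Band edges in Hz; Alpha1/Alpha2 follow (2), the other edges are conventional values (assumption).
BANDS = {"Delta": (1, 4), "Theta": (4, 7), "Alpha1": (7, 10),
         "Alpha2": (10, 13), "Beta": (13, 30)}

def relative_band_power(x, fs=256.0):
    """Relative power per band: band power divided by the total 1-70 Hz power."""
    freqs, psd = welch(x, fs=fs, nperseg=min(len(x), 512))
    in_range = (freqs >= 1) & (freqs <= 70)
    total = np.trapz(psd[in_range], freqs[in_range])
    rel = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        rel[name] = np.trapz(psd[mask], freqs[mask]) / total
    return rel
```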

2.2.3 Time-frequency domain features

For a better representation, signals can be represented in both the time and frequency dimensions simultaneously. Time-frequency analysis is particularly important for non-stationary signals, because different sections of the signal have different frequency content. Among the methods used to analyze brain signals, the short-term Fourier transform (STFT) and the wavelet transform can be mentioned (11). One application of the wavelet transform is analyzing non-stationary signals with different frequency features, while the Fourier transform is used to analyze stationary signals. Considering that brain signals contain transient, non-stationary components, the Fourier transform alone does not seem appropriate, whereas the wavelet transform is able to provide good features in the time-frequency domain. In contrast to the Fourier transform, in which time and frequency precision are traded off against each other, the wavelet transform uses variable-length windows: narrow windows for high frequencies and wide windows for low frequencies.
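A minimal sketch of how such wavelet-band features could be computed with the PyWavelets package is shown below. The choice of the 'db4' mother wavelet and five decomposition levels is an assumption for illustration; (2) does not specify these implementation details here.

```python
import numpy as np
import pywt

def wavelet_band_features(x, wavelet="db4", level=5):
    """Mean absolute value, mean and variance of wavelet coefficients per band.

    With 5 levels, the approximation A5 and details D5, D4, D3 roughly cover the
    Delta, Theta, Alpha and Beta bands (exact ranges depend on the sampling rate).
    """
    coeffs = pywt.wavedec(np.asarray(x, dtype=float), wavelet, level=level)
    named = {"A5": coeffs[0], "D5": coeffs[1], "D4": coeffs[2], "D3": coeffs[3]}
    feats = {}
    for band, c in named.items():
        feats[f"{band}-AVE"] = np.mean(np.abs(c))   # average absolute coefficient
        feats[f"{band}-POW"] = np.mean(c)           # average coefficient
        feats[f"{band}-VAR"] = np.var(c)            # coefficient variance
    return feats
```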
2.2.4 Non-linear analyzing methods

There are other approaches for analyzing EEG signals, such as non-linear and chaotic views of the signals. Recent progress in non-linear dynamics theory has introduced ways to analyze signals produced by non-linear living systems, and non-linear methods are able to describe processes produced by biological systems more effectively. With this view, non-linear dynamic descriptors such as fractal dimensions, phase-space diagrams, the Lyapunov exponent and entropy are used in (2).
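Of the non-linear descriptors mentioned above, the Higuchi fractal dimension is among the simplest to compute. The sketch below is an illustrative Python implementation of Higuchi's algorithm; the choice of k_max is an assumption, not a value taken from (2).

```python
import numpy as np

def higuchi_fd(x, k_max=10):
    """Estimate the Higuchi fractal dimension of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lengths = []
    for k in range(1, k_max + 1):
        lk = []
        for m in range(k):
            idx = np.arange(m, n, k)                 # sub-sampled curve X_k^m
            if len(idx) < 2:
                continue
            diff = np.abs(np.diff(x[idx])).sum()
            norm = (n - 1) / ((len(idx) - 1) * k)    # Higuchi's normalisation factor
            lk.append(diff * norm / k)
        lengths.append(np.mean(lk))
    # The fractal dimension is the slope of log(L(k)) versus log(1/k)
    k_vals = np.arange(1, k_max + 1)
    slope, _ = np.polyfit(np.log(1.0 / k_vals), np.log(lengths), 1)
    return slope
```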
2.2.5 Input Vectors Features

Many features could be used in the input vectors; however, Sahar Ghobadi emphasizes only the 27 most important features in (2). Since the choice of window length is very sensitive and can affect the analysis output, the linear features are calculated in two-second windows with 25% overlap, and the non-linear features are calculated in twenty-second windows without overlap (a small windowing sketch follows after Table 2). The selected features are grouped into four groups, time, frequency, time-frequency and non-linear features, as described in the previous sections. The naming system for the features reflects their relation to parameters in the different domains; for example, the features in the time (statistical) group are specified in relation to the mean, variance, skewness and kurtosis parameters described in section 2.2.1. The frequency features are named according to the frequency bands and their relative power, plus the energy of the EEG signal. The next group, the time-frequency (wavelet) features, are expressed as the average absolute value, average and variance of the wavelet coefficients for the Delta, Theta, Alpha and Beta frequency bands. The final group, the non-linear features, are named after the methods used: fractal dimensions, phase-space diagram, Lyapunov exponent and entropy. Table 2 shows the type, name and symbol of each feature; the symbols are used instead of the feature names in the rest of this thesis report.
Table 2: Input Vectors Features (2)

Row  Type of Feature                     Name of Feature                                             Symbol
1    Time Features (statistical)         Mean of EEG Signal                                          Mean-EEG
2                                        Variance of EEG Signal                                      Var-EEG
3                                        Skewness of EEG Signal                                      Skewness-EEG
4                                        Kurtosis of EEG Signal                                      Kurtosis-EEG
5    Frequency Features                  Delta Band Relative Power                                   D-Relative Power
6                                        Theta Band Relative Power                                   T-Relative Power
7                                        Alpha1 Band Relative Power                                  A1-Relative Power
8                                        Alpha2 Band Relative Power                                  A2-Relative Power
9                                        Beta Band Relative Power                                    B-Relative Power
10                                       Energy of EEG Signal                                        Energy-EEG
11   Time-Frequency Features (Wavelet)   Average absolute wavelet coefficients for Delta Band (A5)   A5-AVE
12                                       Average absolute wavelet coefficients for Theta Band (D5)   D5-AVE
13                                       Average absolute wavelet coefficients for Alpha Band (D4)   D4-AVE
14                                       Average absolute wavelet coefficients for Beta Band (D3)    D3-AVE
15                                       Average wavelet coefficients for Delta Band (A5)             A5-POW
16                                       Average wavelet coefficients for Theta Band (D5)             D5-POW
17                                       Average wavelet coefficients for Alpha Band (D4)             D4-POW
18                                       Average wavelet coefficients for Beta Band (D3)              D3-POW
19                                       Variance of wavelet coefficients for Delta Band (A5)         A5-VAR
20                                       Variance of wavelet coefficients for Theta Band (D5)         D5-VAR
21                                       Variance of wavelet coefficients for Alpha Band (D4)         D4-VAR
22                                       Variance of wavelet coefficients for Beta Band (D3)          D3-VAR
23   Non-Linear Features                 Average Lyapunov Exponent                                    Lyapunov
24                                       Fractal Dimension (Higuchi)                                   Fractal
25                                       Approximate Entropy                                           ApEn-Entropy
26                                       Shannon Wavelet Entropy                                       ShEn-Entropy
27                                       Logarithmic Energy Wavelet Entropy                            LeEn-Entropy
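The windowing scheme described above (two-second windows with 25% overlap for the linear features, twenty-second windows without overlap for the non-linear ones) could be implemented along the following lines; the sampling rate and the feature functions referenced in the comments are placeholders for illustration only, not the thesis implementation.

```python
def sliding_windows(x, fs, win_sec, overlap=0.0):
    """Yield successive windows of win_sec seconds with the given fractional overlap."""
    win = int(win_sec * fs)
    step = max(1, int(win * (1.0 - overlap)))
    for start in range(0, len(x) - win + 1, step):
        yield x[start:start + win]

# Illustrative use (fs = 256 Hz and `eeg_channel` are assumptions):
#   linear features:    2-s windows, 25% overlap
#   non-linear features: 20-s windows, no overlap
# for w in sliding_windows(eeg_channel, fs=256, win_sec=2, overlap=0.25):
#     feats = {**time_domain_features(w), **relative_band_power(w, fs=256)}
```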

2.3 Literature survey (concept level, comparative study)
In the previous sections, the goal of the thesis and some key concepts such as Reiki, EEG and chakras were introduced, and the features that can be used as input vectors in the thesis project were described. In the current section, some papers related to classification methods, SVM and ensemble learning, which are used as references for this project, are discussed.
2.3.1 Pattern Classification Using Support Vector Machine Ensemble

The first paper used as a reference is written by Hyun-Chul Kim and colleagues. In their paper, Pattern Classification Using Support Vector Machine Ensemble (12), UCI hand-written digit recognition and fraud detection are studied using SVMs and different ensemble learning methods such as boosting and bagging, and methods like majority voting, LSE-based weighting and double-layer hierarchical combining are used for aggregating the support vector machines.
2.3.1.1 UCI Hand Written Digit Recognition

The first experiment in (12) was run on the UCI hand-written digit data using an ensemble of 10 multi-class SVMs; each multi-class SVM used the one-against-one multi-classification method and therefore consisted of 45 binary SVMs. The training set contained 3,828 samples and the test set 1,797. The size of each digit image was originally 32 x 32 pixels, which was reduced to 8 x 8 pixels. Each SVM had a 2D polynomial kernel. They used 1,000 samples for both the boosting and the bagging methods. The results for the classification of the hand-written digits were taken over 10 independent simulation runs. According to the results, a single normal SVM (without boosting or bagging) achieved a correct classification rate of 96.02%, while the SVM ensembles, with bagging and boosting, achieved higher correct classification rates than the single normal SVM. Table 3 shows the results of the first experiment.
Table 3:" The correct classification rates of UCI hand-written digit recognition" (12)

                                  Normal    Bagging   Boosting
Single SVM                        96.02%
SVM E. (Majority voting)                    96.85%    97.15%
SVM E. (LSE-based weighting)                97.27%    97.61%
SVM E. (Hierarchical combining)             97.38%    97.83%

As can be seen, the classification rate for the SVM ensembles is between 0.83 and 1.81 percentage points higher than for a single normal SVM.
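The effect reported in Table 3 can be reproduced in spirit on a similar handwritten-digit dataset. The sketch below is not the authors' code; it uses scikit-learn and the digits set bundled with it rather than the exact UCI split, and it compares a single polynomial-kernel SVM with a bagged ensemble whose predictions are combined by voting.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier

X, y = load_digits(return_X_y=True)                     # 8x8 digit images, 10 classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

single = SVC(kernel="poly", degree=2).fit(X_tr, y_tr)   # one-against-one multi-class is built in
print("single SVM  :", single.score(X_te, y_te))

# Bagging: each SVM sees a bootstrap sample; predictions are combined by voting
ensemble = BaggingClassifier(SVC(kernel="poly", degree=2),
                             n_estimators=10, random_state=0).fit(X_tr, y_tr)
print("SVM ensemble:", ensemble.score(X_te, y_te))
```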
2.3.1.2 Fraud Detection

In the second experiment, Hyun-Chul Kim et al. applied the proposed SVM ensemble to mobile telecommunication payment fraud detection. The database they used was obtained from a mobile telecom company and contained one year of activity records for 53,696 customers. Eight prominent features were extracted to evaluate customer payment behavior. Seventy percent of the data were used as the training set and thirty percent as the test set. Two classification scenarios can be considered for fraud detection: a binary or a multi-class classification problem. In the binary classification case, customers are divided into two classes, fraud or non-fraud. In the multi-class case, customers are divided into more than two classes; in (12) the number of classes is four, specified according to the confidence grade of the customers. The fraud detection simulation was performed for both the binary and the multi-class formulations, for both bagging and boosting. Eleven SVMs were used in the ensembles for both the binary and the multi-class cases, and two different kernels, a third-degree polynomial or an RBF, were used for each type of classification. Results for the normal SVM and for the bagging and boosting ensembles were measured during the experiment. The LSE-based method was not used for aggregation because of its high computational complexity. Table 4 shows the correct classification rates for the binary-class fraud detection and table 5 shows the correct classification rates for the multi-class fraud detection.
Table 4: Binary-class fraud detection (12) Table 5: Multi-class fraud detection (12)

Normal Single SVM (Poly.) Single SVM (RBF) 84,95% 83.97% Bagging Majority Voting (RBF) Majority Voting (Poly.) Hierarchical SVM (Poly.) 93.49% 95.75% 84.38%

Boosting 89.91% 89.83% Boosting 96.92% 97.28% 86.97% Majority Voting (RBF) Majority Voting (Poly.) Hierarchical SVM (Poly.) Single SVM (Poly.) Single SVM (RBF)

Normal 75.53% 79.78% Bagging 88.89% 93.52% 81.15%

Boosting 87.18% 88.68% Boosting 89.65% 96.43% 82.08%

As can be seen in tables 4 and 5, the fraud detection rates with the boosting and bagging ensembles are higher than with single SVMs. At the end of their paper, the authors conclude that an SVM ensemble can improve on a single SVM in many real applications, in particular for problems with large amounts of data, problems with high-dimensional data, and multi-class classification problems.
2.3.2 On Predicting Rare Classes with SVM Ensembles in Scene Classification

The second paper used as a reference (13) was written by Rong Yan et al. at Carnegie Mellon University in 2003. The focus of this paper is on scene classification as an important method for inferring sophisticated semantic scene labels from low-level visual features. The authors propose SVM ensembles to deal with the rare class problem, which degrades the performance of many classifiers because, for many scenes in the real world, the positive2 data may be rare. Although the SVM ensemble idea is not new, the SVM ensembles proposed in this paper differ from preceding investigations in two ways: first, they are applied to tackle the rare class problem, and second, the sampling scheme used in the paper is different from other work and various combination strategies are studied. Although SVMs are not particularly sensitive to the distribution of training samples in each class, they still get stuck when the class distribution is too skewed, so the effect of the training distribution is analyzed in the paper to show

Positive data

how varying the training distribution can help to improve the prediction performance for rare classes. Two fundamental methods proposed in the paper to address the rare class problem by modifying the class distribution are over-sampling and under-sampling. Over-sampling replicates the data in the minority class, and under-sampling throws away part of the data in the majority class. Although both over-sampling and under-sampling can diminish the imbalance in the dataset, both have known disadvantages. Under-sampling may discard some potentially valuable information, and consequently the performance of the classifiers can be reduced. Over-sampling, on the other hand, increases the training set size and the training time. The training time complexity of an SVM is roughly quadratic in the number of training samples, and even cubic in the worst case, so increasing the training set size is a more serious concern for SVMs than for other ordinary classifiers. Moreover, over-fitting is more likely to occur with replication of minority samples.
2.3.2.1 Effect of training distribution and architecture of SVM ensembles

The effect of varying the class distribution is demonstrated in (13) by applying over-sampling with a single SVM to the scene classification data, where the minority class proportion is altered from 10% to 60%. The results in (13) show that varying the class distribution can improve the prediction accuracy. To eliminate the disadvantages of over-sampling and under-sampling, the SVM ensembles proposed in the paper to deal with the rare class problem face two major issues: the overall architecture and the combination strategies. The architecture of the SVM ensembles used in (13) is shown in figure 5.

Figure 5: Architecture of the SVM ensembles (13)

As can be seen in figure 5, the training data contain positive and negative3 data, and the negative data are divided into K partitions, one for each SVM. To obtain the advantages of SVM ensembles, to use all the information available in the data sets, and to avoid the limitations of SVMs such as their high computational complexity, Rong Yan et al. used two strategies. First, the negative samples are divided into K partitions, where K depends on the number of positive samples; then all positive samples are combined with each negative partition to form an individual subset. Next, SVMs are trained independently on every training subset, and at the end all constituent SVMs are combined by various strategies, such as majority voting when only class labels are considered, or the Sum Rule when continuous-valued outputs like posterior probabilities are available, to aggregate their results in a suitable combination approach. Another combination strategy used in the paper is hierarchical SVMs, which tackle the rare class problem by utilizing another SVM to aggregate the outputs of the SVMs of the previous stage. A crucial issue for the rare class problem is that the training distribution may differ from the test distribution. To tackle this issue, it is necessary to have a held-out set within the local area of the test set. In (13) the nearest neighbors of each test sample are chosen from the training set, where the distance is measured by cosine similarity, to obtain a better estimate of the test set distribution than stochastic sampling.
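A minimal sketch of this partitioning scheme (not the authors' implementation) is given below: the negative samples are split into K parts, each part is paired with all positive samples, one SVM is trained per subset, and the outputs are combined by majority voting. The rule for choosing K from the class ratio is an assumption for illustration.

```python
import numpy as np
from sklearn.svm import SVC

def rare_class_svm_ensemble(X, y, k=None, random_state=0):
    """Train one SVM per (all positives + one negative partition); return the models."""
    rng = np.random.default_rng(random_state)
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    k = k or max(1, len(neg) // max(1, len(pos)))   # K chosen from the class ratio (assumption)
    rng.shuffle(neg)
    models = []
    for part in np.array_split(neg, k):
        idx = np.concatenate([pos, part])           # all positives + one negative partition
        models.append(SVC(kernel="rbf").fit(X[idx], y[idx]))
    return models

def majority_vote(models, X):
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```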
2.3.2.2 Sample data and images features

More than 100,000 images were labeled manually from 23 hours of video containing a wide range of subjects, such as natural scenes, cartoons and man-made objects, to be used as samples in a database. The extracted images were labeled as cityscape/no cityscape and landscape/no landscape in the database. Later, to reduce computational costs, the image data were sampled randomly for each classification task to form the final dataset. In this experiment, color and texture are used as the two low-level features, and the final input vectors have 144 features. The final input vectors are computed by using separate color channels, applying different Gabor filters to the texture feature, using 6 different angles for the Gabor filters, and so on (13).
2.3.2.3 Classification results

Six different methods were used to tackle the rare class problem in the classification of images: over-sampling (OverSamp), under-sampling (UnderSamp), SVM ensembles with majority voting (SVM-MV), SVM ensembles with the Sum Rule (SVM-Sum), SVM
3

Negative data

ensembles with a neural network gater with 10-200 hidden units (SVM-NN), and hierarchical SVMs where the top-level SVM has a linear kernel (SVM-SVM). For each classification method, the training distribution was varied so that the rare class proportion took eleven values for each training set: 10%, 15%... 55%, 60%. Table 6 shows, for each method, the classification results at the training distribution that achieves its best F1 performance in tackling the rare class problem for cityscape and landscape images. The performance of the over-sampling and under-sampling methods is used as a baseline, since they are more popular in the literature, and F1 is chosen as the main performance metric, defined as follows:

$$F_1 = \frac{2PR}{P + R}$$

where P is precision4 and R is recall5.
Table 6: "Classification results using different method for tackling the rare class problem" (13)

For Each Method, The Best Training Distribution for Rare Class (Best Dist.), Precision (Prec), Recall (Rec), F1 Metric (F1) And Training Time in Seconds (Time) are reported. (13)
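For reference, the quantities reported in Table 6 (precision, recall and F1), together with the sensitivity and specificity mentioned earlier, can be computed directly from the confusion-matrix counts, as in the small illustrative sketch below.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall (= sensitivity), specificity and F1 from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0            # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "F1": f1}
```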

As illustrated in table 6, the performance of over-sampling is always better than that of under-sampling, but at a higher computational cost; also, the hierarchical SVMs almost always do better than the other methods. In terms of the F1 measure, hierarchical SVMs improve on over-sampling by 11% on the cityscape dataset and by 5% on the landscape dataset. Moreover, hierarchical SVMs can be trained much faster than over-sampling, taking less than half of the training time for cityscapes and one third for landscapes. SVM-MV and SVM-Sum do not achieve remarkable results; their performance lies between over-sampling and under-sampling. The authors speculate that assigning equal weights to each classifier is the cause of the low performance of these two combiners compared with the other methods. SVM-NN produces relatively unstable outcomes, i.e. the best on the landscape data set but poorer on the cityscape.

4 Precision
5 Recall

This suggests that the SVM-NN method is sensitive to subtle characteristics of different datasets. Hence, considering both performance and computational cost, hierarchical SVMs are ideal for tackling the rare class problem.

3 Technical solution (Methodology)


In the previous sections, Reiki was introduced and the features that can be useful for achieving higher precision in interpreting EEG signals to identify real Reiki from placebo were described. In the current section, classifiers, SVM and its extensions are described; the rest of the section explains different ensemble learning methods such as bagging, boosting and AdaBoost, and also the different aggregation methods that can be applied to combine the outputs of different SVMs.

3.1 Classifiers
Abilities like recognizing a face, understanding spoken words, reading handwritten characters and identifying things such as car keys in our pocket by feel are known as pattern recognition. More precisely, according to Duda et al., pattern recognition is "the act of taking in raw data and taking an action based on the category of the pattern" (14); it has been vital for humankind's survival over the past tens of millions of years, and humans have evolved extremely sophisticated neural and cognitive systems for such activities. It is therefore natural to want to design and build machines that can recognize patterns. In applications such as speech recognition, fingerprint identification, optical character recognition and DNA sequence identification, reliable and precise pattern recognition by machine would obviously be highly valuable, and constructing such systems is essential for solving innumerable problems. A deeper understanding can be gained by studying the methods used in nature to solve pattern recognition problems, and this understanding in turn influences the design and use of special-purpose hardware.
An example of a classifier is the application of pattern recognition to separating two types of fish, e.g. salmon and sea bass, on a conveyor belt in a packing factory. A camera installed above the conveyor belt takes sample pictures of the fish, which can be used to identify the type of each fish according to special features, in this case length, lightness, width, the number and shape of the fins, the position of the mouth, and so on. In addition to these features, noise or variation in the pictures (differences in lighting, the position of the fish on the conveyor, even static due to the electronics of the camera itself) must be taken into account. To obtain a good prototype for implementing a classifier, the images taken by the camera should be preprocessed to simplify subsequent operations without losing important data. In particular, a segmentation process may be applied in which the pictures of different fish are somehow isolated from each other and from the background. Next, the data of a particular fish are passed to a feature extractor, whose purpose is to reduce the data by measuring certain features. The values of these features are then sent to a classifier that assesses the evidence presented and makes a final decision about the type. In classification there is always a trade-off between cost and performance: if too few features are used to classify instances, undesirable fish may end up in some cans, which can dissatisfy customers; on the other hand, if extra features are used, the performance of the classifier decreases because of the increased computational cost. If only the single feature lightness is used to classify salmon versus sea bass, it will not be easy to separate the two types, because measuring lightness alone is not sufficient to distinguish them, so the cost of undesired fish in cans will increase. If the number of features is increased, e.g. width is used as a second feature, the precision of the classification increases, because sea bass are typically wider than salmon, which means the cost of wrong classification decreases, while performance degrades because of the increased computation time for processing more information. Figure 6 shows the two features, lightness and width, for sea bass and salmon.

Figure 6: The two features of lightness and width for sea bass and salmon (14)

As can be seen in figure 6, the black line serves as the decision boundary of the classifier explained earlier. According to this plot, the fish are separated by the following rule: a fish is classified as sea bass if its feature vector falls below the decision boundary (black line) and as salmon otherwise. As shown in figure 6, the overall classification error on the data shown is acceptable, but there are still some errors. The classification results could be improved if, in addition to lightness and width, other parameters such as the vertex angle of the dorsal fin or the position of the eyes were included. Although using more features might help to increase the precision of the classification, some features might be redundant because they do not improve the performance considerably. Assume that using other features in the approach described above is too costly or difficult to evaluate, or offers only slight improvement (or perhaps even degrades the performance), and that it is therefore necessary to make the decision based on the two features in figure 6. If the model were very complex, the classifier would have a decision boundary more complex than a simple straight line, and all the training examples would be separated perfectly, as shown in figure 7.

Figure 7: A more complicated model for classification of fishes (14)

Although excessively complex models can lead to perfect classification of the training data, they tend to lead to poor performance on future samples. The novel test point marked "?" is evidently most likely a salmon, whereas the complex decision boundary shown leads it to be misclassified as a sea bass (14). If all training samples are fully separated, as shown in figure 7, it is too soon to be satisfied with the results, because the main purpose of designing a classifier is to propose actions when new samples are presented to it, in this case fish that have not been seen yet. This problem is referred to as the generalization issue. The complicated decision boundary shown in figure 7 is unlikely to provide acceptable generalization, because it seems to be tuned to the particular training patterns rather than being shaped by some fundamental characteristics or a proper model of all the salmon and sea bass that will need to be classified. Logically, one method for addressing the generalization issue would be to use additional training patterns to obtain a better approximation of the true underlying characteristics, for example the probability distributions of the categories; however, it is not easy to obtain such amounts of data in most pattern recognition problems. Moreover, if the classifier shown in figure 7 is used, then even with a huge number of training samples in a continuous feature space the classifier will produce an awfully complicated decision boundary and will probably not work well on unseen samples. If, then, a very complicated recognizer is unlikely to generalize well, how can this preference for simpler classifiers be quantified and supported? How would a system automatically decide that the simple curve in figure 8 is preferable to the obviously simpler straight line in figure 6 or the complex boundary in figure 7? Assuming that this trade-off can somehow be optimized, can we predict how well the system will generalize to novel samples? These questions are some of the essential problems of statistical pattern recognition6 (14).

Figure 8:" The decision boundary shown might represent the optimal tradeoff between performance on the training set and simplicity of classifier. (14)

Since classification is, at base, the task of recovering the model that generated the patterns, different classification techniques are useful depending on the type of candidate models themselves. In statistical pattern recognition the focus is on the statistical properties of the patterns. Model for a pattern may be a single specific set of features, though the actual pattern sensed has been corrupted by some form of random noise. (14)

If the classifier receives the same incoming samples, it is possible to apply a considerably different cost function, which will in general lead to different actions. Different decision tasks may require different features and yield boundaries quite different from those useful for the original classification problem. It is rather obvious that decisions are necessarily task- or cost-specific, and it is a deeply hard challenge to build a single general-purpose artificial pattern recognition device that works precisely across a broad variety of tasks. In pattern recognition we are looking for a representation in which patterns that lead to the same action are in some way close to each other, yet far from those that require a different action. The degree to which an appropriate representation is produced or learned, and how "near" and "far" are measured, will determine the success of the pattern classifier. Several additional characteristics are desirable for the representation. For instance, a small number of features is preferable, since it leads to simpler decision regions and a classifier that is easier to train. It is also desirable to have robust features that are fairly insensitive to noise or other errors. Besides, in realistic applications, classifiers that act rapidly, or that use a small number of electronic components, little memory or few processing cycles, may be required.

3.2 SVM and its extensions
The support vector machine is originally based on the structural risk minimization (SRM) approach and deals with binary-class problems (15) (16) (17). SVMs are used to create an optimal separating hyper-plane with high classification precision. SVM is introduced only briefly in this section; readers are referred to (18) for supplementary information. A data set $\{(x_i, y_i)\}_{i=1}^{N}$ is given, where $N$ is the total number of examples, $x_i \in \mathbb{R}^p$ (i.e., $x_i$ is a $p$-element real vector) and $y_i \in \{-1, +1\}$ is its class label. One of the methods that can be used for linear classification is the soft-margin method, whose constrained optimization model is as follows:

Minimize
$$\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i \qquad \text{Equation (1)}$$

Subject to
$$y_i(w\cdot x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,N \qquad \text{Equation (2)}$$

In formulas (1) and (2) the $\xi_i$ are slack parameters that measure the degree of misclassification of the examples; the error penalty is denoted by $C$, and the non-zero $\xi_i$ are penalized by the term $C\sum_i \xi_i$. The scalar $b$ indicates the bias of the hyper-plane, and the weight vector $w$ defines a direction perpendicular to the hyper-plane. As can be seen in figure 9, the support vectors are circled and the optimization problem is a compromise between minimizing the training errors and maximizing the margin.

Figure 9: A geometric interpretation of the SVM classification for a non-separable data set with two classes (16). The support vectors are circled (18)

In particular, when the data are perfectly linearly separable, then $\xi_i = 0$. In this case, the hyper-plane that separates the examples of the different classes with the largest distance between the plane and the closest data points (i.e., the maximum margin, equal to $2/\|w\|$) is called the optimal separating hyper-plane. In general, the described model is a classical convex optimization problem (16) with a quadratic objective. To reduce the computations, the problem can be converted into the equivalent Lagrangian dual problem. The primal Lagrangian is

$$L_P = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i - \sum_{i=1}^{N}\alpha_i\big[y_i(w\cdot x_i + b) - 1 + \xi_i\big] - \sum_{i=1}^{N}\mu_i\xi_i \qquad \text{Equation (3)}$$

which is minimized with respect to $w$, $b$ and $\xi_i$ and maximized with respect to the Lagrange multipliers $\alpha_i \ge 0$ and $\mu_i \ge 0$. By taking the partial derivatives of $L_P$ with respect to $w$ and $b$, equation 3 can be solved so that the following saddle-point equations are obtained:

$$w = \sum_{i=1}^{N}\alpha_i y_i x_i \qquad \text{Equation (4)}$$

$$\sum_{i=1}^{N}\alpha_i y_i = 0 \qquad \text{Equation (5)}$$

By substituting Equations 4 and 5, the quadratic optimization problem of Equation 3 reduces to its dual form, Equation 6, which is maximized with respect to the $\alpha_i$ subject to Equation 7:

$$L_D = \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j\,(x_i\cdot x_j) \qquad \text{Equation (6)}$$

$$0 \le \alpha_i \le C,\qquad \sum_{i=1}^{N}\alpha_i y_i = 0 \qquad \text{Equation (7)}$$

The Karush-Kuhn-Tucker complementarity condition (18) states that the solution of the dual optimization problem obtained above must satisfy Equation 8:

$$\alpha_i\big[y_i(w\cdot x_i + b) - 1 + \xi_i\big] = 0 \qquad \text{Equation (8)}$$

The meaning of equation 8 is that, for each given $i$, either $\alpha_i = 0$ or the constraint is active, i.e. $y_i(w\cdot x_i + b) = 1 - \xi_i$. The training data vectors with $\alpha_i > 0$ are called the support vectors (SVs); the red circled data points in figure 9 are the support vectors. Equation 9 represents the optimal separating hyper-plane in terms of the support vectors and, by considering equation 9, the result for a testing data vector $x$ in upcoming tests is given by Equation 10:

$$w^* = \sum_{i\in SV}\alpha_i^* y_i x_i \qquad \text{Equation (9)}$$

$$f(x) = \operatorname{sign}\Big(\sum_{i\in SV}\alpha_i^* y_i\,(x_i\cdot x) + b^*\Big) \qquad \text{Equation (10)}$$

Until now, what has been described is only applicable to linear classification with binary-class labels. Non-linear classification tasks can be solved by applying a mapping function $\Phi$ to the training data to map them from the input space into a feature space with more dimensions. Using higher dimensions lets the SVM fit the maximum-margin hyper-plane in the new feature space. The final decision function for non-linear classification is formally the same as Equation 10, except that every dot product in Equation 10 is substituted by a dot product of the non-linear mapping function. A kernel function $K$, through what is called the kernel trick (19), is employed to replace the dot product of $\Phi$, the mapping function (refer to Equation 12). Equation 11 shows the difference between the linear and non-linear classification equations:

$$f(x) = \operatorname{sign}\Big(\sum_{i\in SV}\alpha_i^* y_i\,\Phi(x_i)\cdot\Phi(x) + b^*\Big) = \operatorname{sign}\Big(\sum_{i\in SV}\alpha_i^* y_i\,K(x_i, x) + b^*\Big) \qquad \text{Equation (11)}$$

$$K(x_i, x_j) = \Phi(x_i)\cdot\Phi(x_j) \qquad \text{Equation (12)}$$

A kernel function can be any function that satisfies Mercer's theorem (20). The advantage of using a kernel function is that classification can proceed in the feature space without precise information about the structure of the mapping. Various classical SVM kernels are the linear function $K(x_i, x_j) = x_i\cdot x_j$, the polynomial function $K(x_i, x_j) = (x_i\cdot x_j + 1)^d$ with degree $d$, and the RBF (Gaussian radial basis function) $K(x_i, x_j) = \exp(-\gamma\|x_i - x_j\|^2)$ with $\gamma > 0$ related to the kernel width. In the linear and polynomial functions, the input vector with the biggest average in the training samples will dominate all samples. The Gaussian RBF kernel exploits the distance between vectors, so it does not depend on the location of the data. However, it cannot be concluded that an RBF kernel performs better than linear and polynomial kernels on every data set in acquiring an optimized separating hyper-plane; hence, all three functions are tried in (16). To extend a basic SVM to the multi-class classification problem, one-against-one, one-against-all and directed acyclic graph are popular schemes. Hsu & Lin (21) carried out a complete comparison of these three multi-class SVM classification schemes and proposed that the one-against-one method is the most appropriate for realistic applications. For that reason, in (16) one-against-one SVMs are employed as the ordinary classifiers using the LIBSVM software (22). Readers who are beginners in SVM can use a simple applet provided by Chang & Lin to demonstrate SVM classification and regression (23). As described earlier, the SVM is trained to maximize the margin and to achieve good generalization ability by exploiting a separating hyper-plane. Up to the present, SVMs have been employed profitably in face and hand-written digit recognition as well as in data mining. However, the SVM has two disadvantages. First, it is not applicable by itself to multi-class classification, because it is initially designed for two-class classification problems, so a combination of SVMs has to be used for multi-class classification. Second, training an SVM on a huge number of samples consumes a lot of time, so approximate algorithms have to be used to overcome this problem; however, approximate algorithms (e.g. decomposition methods and the sequential minimal optimization algorithm), because of their high complexity in time and space, degrade the performance of the classification.
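As a concrete illustration of the kernels discussed in this section, the sketch below trains soft-margin SVMs with linear, polynomial and RBF kernels on a synthetic two-class problem using scikit-learn; the values of C and the kernel parameters are illustrative only, not those used in the cited works.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

kernels = {
    "linear": SVC(kernel="linear", C=1.0),
    "poly":   SVC(kernel="poly", degree=3, C=1.0),      # K(x, x') = (gamma x.x' + coef0)^3
    "rbf":    SVC(kernel="rbf", gamma="scale", C=1.0),  # K(x, x') = exp(-gamma ||x - x'||^2)
}
for name, clf in kernels.items():
    clf.fit(X_tr, y_tr)
    print(f"{name:6s} kernel: test accuracy = {clf.score(X_te, y_te):.3f}")
```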

Ensemble learning methods such as bagging or boosting can be applied to overcome this limitation of the classification performance of SVMs (12). Ensemble learning methods are described in the following sections.

3.3 Ensemble learning
Ensemble learning is the idea of utilizing multiple classifiers and combining their decisions. There is no unique categorization of ensemble learning. For example, a list of eighteen classifier combination schemes is covered by Jain, Duin and Mao (2000); four ensemble classifiers are introduced by Witten and Frank (2000): bagging, boosting, stacking and error-correcting output codes; while seven methods for combining multiple classifiers are proposed by Alpaydin (2004): voting, error-correcting output codes, bagging, boosting, mixtures of experts, stacked generalization and cascading (24). The story of ensemble learning begins in 1988, when Wittner and Denker discussed approaches for training layered neural networks for classification tasks. Schapire introduced boosting in 1990 (see section 3.3.3). A general method was also introduced in 1990 by Kleinberg, who proposed to use stochastic processes to separate points in multidimensional spaces. This method, called stochastic discrimination (SD), mainly receives low-level results as input and produces high-level results. Stacked generalization was introduced in 1992 by Wolpert as an idea for decreasing the generalization error rate of one or several generalizers. AdaBoost was introduced in 1996 by Freund and Schapire (see section 3.3.3), and bagging was introduced by Breiman in the same year (see section 3.3.1). The application of majority voting to pattern recognition was analyzed by Lam and Suen in 1997. Later, in 1998, Schapire et al. suggested an explanation for the effectiveness of voting techniques. According to their investigation, this effectiveness is related to the distribution of the margins of the training samples with regard to the produced voting classification rule, where the margin of a sample is simply the difference between the number of correct votes and the maximum number of votes received by any wrong label. The boosting algorithm AdaBoost was consolidated in 1999 by Schapire, and the fundamental theory of boosting was explained. Bagging and two boosting methods, AdaBoost and arching8, were compared in 1999 by Opitz and Maclin. According to their studies, in low-noise conditions boosting does better than bagging,
8 Arching (adaptive reweighting and combining) is a generic term that refers to reusing or selecting data in order to improve classification. (24)
An ensemble of classifiers is a set of multiple classifiers whose individual decisions are combined to classify test samples (12). A combination of several individual classifiers that form an ensemble can give much better performance than the individual classifiers themselves. In the following paragraphs, following the paper of Hyun-Chul Kim et al. (12), it is explained why ensemble learning can perform better than individual classifiers.

Hyun-Chul Kim et al. considered a test sample x and an ensemble of n classifiers {f_1, f_2, ..., f_n}. When all classifiers are identical, the ensemble shows the same performance as the individual classifiers, because the classifiers are wrong or correct for the same data. However, when the classifiers are different and their errors are uncorrelated, then whenever one classifier f_i is incorrect, the majority of the other classifiers can still be correct, and consequently the majority vote can be correct as well. To be more precise, if the error of an individual classifier is ε and the errors are independent, then the probability E that the majority voting result is incorrect is

E = Σ_{k > n/2} C(n, k) ε^k (1 − ε)^(n−k).   (12)

When the number of classifiers n is large (and ε < 1/2), the probability E becomes extremely small. Hence, an ensemble of SVMs can be expected to overcome the performance degradation of a single SVM that occurs when approximate learning algorithms are used or when SVMs are combined to build multi-class classifiers. The most significant point in building an ensemble of SVMs is that each single SVM should be as different from the other SVMs as possible. This requirement can be met by training each individual SVM on a different training set of input vectors.
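As a small worked illustration (the numbers here are chosen only for illustration and are not taken from the experiments), consider n = 5 independent classifiers, each with error ε = 0.2. The majority vote is wrong only when at least three of them err:

E = Σ_{k=3}^{5} C(5, k) (0.2)^k (0.8)^(5−k) = 0.0512 + 0.0064 + 0.00032 ≈ 0.058,

which is already much smaller than the individual error of 0.2. The assumption of independent errors is of course an idealization, which is why the diversity of the individual SVMs is emphasized below.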
3.3.1 Bagging

Bagging is derived from bootstrap aggregation. It was the first effective technique of ensemble learning and is one of the most straightforward methods of arching. Bagging is a meta-algorithm that can be viewed as a special case of averaging; it was initially devised for classification and is typically used with decision tree models, although it can be applied to any kind of model for classification or regression.

Bagging uses sampling with replacement, a technique that creates several versions of a training set by using the bootstrap. Different models are trained on these data sets; in the case of regression the outputs of the models are combined by averaging, and in the case of classification by voting, to produce a single output. It is important to note that Bagging is only successful when unstable9 non-linear models are used (24). SVMs are widely used to implement the Bagging method: each single SVM is trained separately, via the bootstrap method, on training samples selected at random, and their outputs are combined using an appropriate aggregation method (12). Typically a single training set TR = {(x_i, y_i) | i = 1, ..., l} is available, but for building an SVM ensemble with K independent SVMs, K training data sets are needed. Based on statistical arguments, to obtain a larger improvement from the aggregation, the diversity of the training data sets should be increased as much as possible. Therefore, K replicated training data sets TR_k are constructed from the known training set TR by repeated re-sampling with substitution (replacement). Each instance (x_i, y_i) in TR can appear repeatedly, or not at all, in any particular replicated training data set. Each replicated training data set is used to train a particular SVM (15).
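A minimal MATLAB sketch of this procedure is given below. It is only an illustration and not the thesis implementation: the names X, y, Xtest and K are assumptions, numeric class labels are assumed, and the svmtrain/svmclassify functions of the Bioinformatics Toolbox and randsample of the Statistics Toolbox are used.

% Illustrative bagging of K SVMs (assumed names: X = features, y = labels).
K = 10;
n = size(X, 1);
models = cell(K, 1);
for k = 1:K
    idx = randsample(n, n, true);              % bootstrap re-sampling with replacement
    models{k} = svmtrain(X(idx, :), y(idx));   % train the k-th SVM on its replicate set
end
votes = zeros(size(Xtest, 1), K);
for k = 1:K
    votes(:, k) = svmclassify(models{k}, Xtest);   % decision of each SVM on the test data
end
yhat = mode(votes, 2);                         % aggregate the K decisions by majority voting

Each SVM sees a different bootstrap replicate of TR, which provides the diversity that the aggregation step relies on.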
3.3.2 Boosting

The other meta-algorithm, which can also be considered a model-averaging method, is called Boosting. It is widely employed as an ensemble method and is one of the most powerful learning techniques introduced in the last two decades. It was initially proposed for classification, but it can also be used for regression. To implement Boosting, a weak classifier with accuracy slightly better than random guessing is first created on the training set. A series of models is then built iteratively, each trained on a data set in which points misclassified by the preceding model are given more weight. Finally, all of the successive models are weighted according to their success, and the final model is obtained by combining their outputs, by voting for classification and by averaging for regression. In early research, the first boosting algorithm produced a strong learner by combining three weak learners (24). Similar to Bagging, SVMs are used to build Boosting ensembles, and each SVM is trained on different training samples. In Boosting, a probability distribution over the training samples is used to select the training set of each individual SVM, and this distribution is updated in proportion to the degree of error on each sample. As in Bagging, the outputs of the individually trained SVMs are combined into a collective decision using methods such as majority voting, LSE (least-squares estimation)-based weighting, or double-layer hierarchical combining (12). AdaBoost is the representative boosting algorithm and is described in section 3.3.3.

9 Unstable, i.e. a small change in the training set can cause a major change in the model. (24)

3.3.3 AdaBoost

AdaBoost (adaptive boosting) is the most conventional boosting algorithm. The number of learners can be arbitrary, and the training sets can be small, because it is possible to use the same training samples several times (24). Although each SVM is trained on different training samples, the selection scheme for the training samples in the AdaBoost technique is rather different from the Bagging technique (12).

An algorithm implementing the AdaBoost method is described in (15) as follows. In the beginning, a training set TR = {(x_i, y_i) | i = 1, ..., l} is considered and the same weight value is assigned to each sample in TR. The k-th training set TR_k, used to train the k-th SVM classifier, is built by choosing samples from the entire data set TR according to the weight values obtained at the (k-1)-th iteration. The k-th SVM classifier is trained on these training samples. Afterwards, the classification performance of the k-th trained SVM classifier is evaluated by applying it to the entire training set TR. Updated weight values for the training samples in TR are obtained according to the errors produced on them: the weight values of the samples that were classified incorrectly are increased, while the weight values of the samples that were classified correctly are reduced. This means that samples that are hard to classify are chosen more frequently. The (k+1)-th SVM classifier is then trained on a set TR_{k+1} drawn with the updated weight values. The described sampling procedure is repeated until K training data sets have been built. Figure 10 depicts the pseudo-code used in (15) to describe the AdaBoost algorithm.

Figure 10 : The AdaBoost Algorithm (15)
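A rough MATLAB sketch of the sampling-and-reweighting loop described above is shown below. It is an illustration under the same naming assumptions as before, not the thesis implementation or the exact algorithm of Figure 10, and the weight update uses the standard AdaBoost form.

% Illustrative AdaBoost-style construction of K SVMs (X = features, y = labels).
K = 10;
n = size(X, 1);
w = ones(n, 1) / n;                            % equal initial weight on every sample of TR
models = cell(K, 1);
alpha = zeros(K, 1);
for k = 1:K
    idx = randsample(n, n, true, w);           % draw the k-th training set according to w
    models{k} = svmtrain(X(idx, :), y(idx));   % train the k-th SVM classifier
    pred = svmclassify(models{k}, X);          % evaluate it on the whole training set TR
    wrong = (pred ~= y);
    err = sum(w .* wrong);                     % weighted training error
    alpha(k) = 0.5 * log((1 - err) / err);     % importance of the k-th classifier
    w = w .* exp(alpha(k) * (2 * wrong - 1));  % increase weights of misclassified samples
    w = w / sum(w);                            % renormalize to a probability distribution
end

The weights alpha(k) can later be used when the K decisions are combined, for example by weighted voting.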

3.3.4 Methods for aggregating support vector machines

The outputs of several independently trained SVMs need to be aggregated by a proper combination method. Two types of aggregation methods exist: linear and non-linear combination techniques. LSE-based weighting is a linear combination technique, described in section 3.3.4.2, that is regularly employed for bagging and boosting ensembles. Majority voting and the double-layer hierarchical method are two non-linear techniques for combining several SVMs; they are described in sections 3.3.4.1 and 3.3.4.3.
3.3.4.1 Majority Voting

The most straightforward technique for aggregating the outputs of several SVMs is majority voting. Let f_k(x) denote the decision function of the k-th SVM in the SVM ensemble, let c_j be the label of the j-th class, and let N_j = |{ k | f_k(x) = c_j }| be the number of SVMs whose decision is the j-th class. Then the aggregated decision of the SVM ensemble due to majority voting for a given test vector x is

f_maj(x) = arg max_j N_j.

3.3.4.2 LSE-Based Weighting

In the LSE-based weighting method, different weights are assigned to the SVMs in the SVM ensemble. Regularly, the weight of each SVM is specified in proportion to its classification accuracy. The LSE method for learning the weights is as follows. Let f_k denote the decision function of the k-th SVM in the ensemble, and let TR' = {(x_i, y_i) | i = 1, ..., L} be a replicate data set used for training the SVM ensemble. Collecting the decisions into a matrix A with entries A_ik = f_k(x_i) and the labels into a vector y = (y_1, ..., y_L)^T, the weight vector can be calculated as the least-squares solution W = (A^T A)^(-1) A^T y. After calculating the weight vector W, the aggregated decision of the SVM ensemble for a particular test vector x owing to the LSE-based weighting is

f_agg(x) = sign( Σ_k w_k f_k(x) ).
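A minimal MATLAB sketch of this weight computation is given below, assuming the K SVMs have already been trained (cell array models), that a replicate set Xval with labels yval coded as ±1 is available, and that all names are illustrative rather than taken from the thesis code.

A = zeros(size(Xval, 1), K);
for k = 1:K
    A(:, k) = svmclassify(models{k}, Xval);    % k-th column: decisions of SVM k on TR'
end
W = pinv(A) * yval;                            % least-squares weights, W = (A'A)^(-1) A'y
fk = zeros(K, 1);
for k = 1:K
    fk(k) = svmclassify(models{k}, xtest);     % decisions of the K SVMs on one test vector
end
yhat = sign(W' * fk);                          % LSE-weighted combination, thresholded at 0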

3.3.4.3 Double-layer Hierarchical Combining

In this method an SVM is used to combine the outputs of the SVMs in the SVM ensemble. The double-layer hierarchical combining method has a lower layer that consists of several SVMs, and the outputs of these SVMs are fed into an SVM in the upper layer, called the super SVM. The final decision for a given test vector x owing to this method can be calculated as

f_agg(x) = F( f_1(x), f_2(x), ..., f_K(x) ),

where f_k is the decision function of the k-th SVM in the lower layer of the ensemble, K is the number of SVMs in that layer, and F is the decision function of the super SVM in the upper layer.
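Under the same illustrative assumptions as in the previous sketches (names such as models, Xval, yval, xtest are not from the thesis code), the two layers can be sketched in MATLAB as follows:

Z = zeros(size(Xval, 1), K);
for k = 1:K
    Z(:, k) = svmclassify(models{k}, Xval);    % lower-layer outputs become new features
end
superSVM = svmtrain(Z, yval);                  % train the upper-layer ("super") SVM
zk = zeros(1, K);
for k = 1:K
    zk(k) = svmclassify(models{k}, xtest);     % lower-layer decisions for one test vector
end
yhat = svmclassify(superSVM, zk);              % final decision of the SVM ensemble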

4 Empirical Evaluation
In the second phase of the thesis project, a single SVM is implemented and trained to work as an expert that analyzes the EEG signals obtained from the benchmark studies. The outputs of this phase are several SVMs, obtained by setting up the SVM with different numbers of features as input, which work as local SVMs. (25) In the third phase, by applying the knowledge gathered in phases one and two, ensemble learning is used to improve the accuracy rate in distinguishing Reiki from placebo. MATLAB Version 7.10 (Release 2010a) is used to implement the SVMs and the different ensemble learning extensions such as boosting and bagging. Section 4 presents a quantitative evaluation of the experiments designed for the thesis, and the results are compared with the benchmark data.

4.1 Experiments Devices and Tools


After finishing the introductory studies of phase 1 and becoming familiar with SVMs and different ensemble learning methods, in the second phase the information from Ghobadi's Master thesis project was studied and prepared as text files of features suitable as input for the SVMs, and MATLAB version 7.10 (Release 2010a) was used to implement a single SVM. To implement a single SVM, several papers were studied, including Rajabzadeh's paper (25) on advanced learning methods in feed-forward networks, as well as MATLAB's help, and it turned out to be easiest to follow MATLAB's help. The hardware used for the calculations is a Fujitsu Siemens laptop with an Intel Core 2 Duo CPU at 2.00 GHz and 4.00 GB RAM running the Windows 7 32-bit operating system.

Two free MATLAB tools, MATLABArsenal (26) and MILL (27) (Multiple Instance Learning Library), were studied, and it was found that MATLABArsenal offers more ensemble classifiers than MILL and that its help is more complete. Therefore, MATLABArsenal was selected to apply ensemble methods such as Bagging, AdaBoost and the adaptive and hierarchical combiners to the EEG signals to classify the Reiki/Rest and Reiki/Placebo data in section 4. MATLABArsenal is rather easy to use: the data is arranged in one of the three formats known to the package, and after specifying an ensemble method and adjusting its parameters it classifies the data according to the given features. To test the correctness of the package, MATLABArsenal was used to classify the Iris samples given in MATLAB's help documents, gathered by Fisher (28). The example below is a simple MATLAB program that uses MATLABArsenal to classify 150 samples of 3 types of Iris with 4 features; the results are shown as well.

load fisheriris;
groups = ismember(species, 'setosa');
data = [meas(:,1), meas(:,2), meas(:,3), meas(:,4), groups(:,1)];
save tst.txt data -ASCII
Arsenal('classify -t tst.txt -sf 1 -- LibSVM -Kernel 0 -CostFactor 3');

Results:
Message: Train-Test Split, Boundary: -2, Classification, Shuffled Data Number: 150, Feature Number: 4, Data per Class: (1, 50) (0, 100)
Classifier: LibSVM, Parameters: -Kernel: 0 -CostFactor: 3
YY:24, YN:0, NY:0, NN:51, Prec:1.000000, Rec:1.000000, Err:0.000000, Baseline=0.680000
Error = 0.000000, Precision = 1.000000, Recall = 1.000000, F1 = 1.000000, MAP = 0.000000, MBAP = 0.000000, Time = 2.015000

As can be seen, MATLABArsenal classifies the different types of Iris very well, with an accuracy of 100%, the same as the result of the SVM example in the MATLAB documentation; in addition it gives more useful measures such as calculation time, precision, recall and F1.

4.2 Samples Structure


It is very important to know how the data was obtained and how it is stored in files. According to my investigations there were 19 volunteers in Ghobadi's experiments, of whom 12 were treated by a Reiki practitioner and 7 by a regular person. Samples were taken in 20-second windows, and each sample contained the information of 19 separate EEG channels for each volunteer. There are two different classes of data:
Class 1: Two minutes of Rest (6 EEG samples)
Class 2: Seven minutes of Reiki or Placebo (21 EEG samples)
These samples were stored in text files, where files 1 to 12 contain the Reiki data and files 13 to 19 contain the Placebo data, in the following order:

Volunteer 1, channel 1 (6 samples for Rest + 21 samples for Reiki or Placebo)
Volunteer 1, channel 2 (6+21)
..............................
Volunteer 1, channel 19 (6+21)
Volunteer 2, channel 1 (6+21)
Volunteer 2, channel 2 (6+21)
..............................
Volunteer 2, channel 19 (6+21)
..............................
Volunteer 19, channel 1 (6+21)
Volunteer 19, channel 2 (6+21)
..............................
Volunteer 19, channel 19 (6+21)

Later, during experiments one and two, the order above was changed by using the shuffle parameter to present different orderings of the inputs to the SVM and ensemble methods. Ghobadi's programs were altered so that the data described above could be collected from the different EEG files into a single file named total_data.mat.

4.3 Input vector Features order

Each line in the above file contains 21 numbers; numbers 1 to 20 are the features described earlier in section 2.2.5 in Table 2, and the last one is the class of the data described in section 4.2. The original file contains the following features in each row. The frequency features take positions 1 to 5, as below:
Delta, Theta, Alpha1, Alpha2, Beta

Wavelet features take positions number 6 to 17 as follows: A5-AVE, D5-AVE, D4-AVE, D3-AVE, A5-POW, D5-POW, D4-POW, D3-POW, A5-VAR, D5-VAR, D4-VAR, D3-VAR

Non-linear features are placed in positions 18 to 20 in the following order: Fractal, Lyapanov, Antropi. The class of the data takes position number 21.

Linear features were not used in experiments one and two because they did not contain any remarkable difference between the Placebo and Reiki data according to Ghobadi's thesis report (2). The total_data.mat file has 9747 rows, which represent all the data needed for the SVMs and ensembles.

4.4 Experiment one

In section 4.4, the information in total_data.mat is first presented to a single SVM to classify the Reiki/Rest and Reiki/Placebo data, as a reference for benchmarking. Next, the effect of shuffling (different orderings of the rows) is studied, and then different ensemble methods are applied to the data to find out to what extent they can improve the correction classification rate and what their effect is on calculation time. Finally, different numbers and orders of features are studied.
4.4.1 Single SVM

A simple program containing eight single SVMs was implemented in MATLAB using the crossvalind, svmtrain and svmclassify functions to classify four different groups of features: Frequency, Wavelet, Non-Linear and All features. The SVMs were trained and tested with cross validation, and a linear kernel was used to classify the two pairs of classes, Reiki/Rest and Reiki/Placebo. Table 7 shows the results of the single-SVM experiment.
Table 7: Results for a single SVM to classify different group of features for Reiki/Rest and Reiki/Placebo

Feature group    Reiki/Rest                                  Reiki/Placebo
                 Correction Rate   Running Time (Seconds)    Correction Rate   Running Time (Seconds)
Frequency        0.7778            6.27                      0.6934            895.72
Wavelet          0.7706            6.85                      0.6762            184.08
Non-Linear       0.7778            5.93                      0.644             4.92
All Features     0.7789            21.7                      0.7187            1830.14

As can be seen, the best correction classification rate is obtained with the All-Features group, which is 0.7789 for Reiki/Rest and 0.7187 for Reiki/Placebo, but the calculation time for this group is remarkably higher than for the other three groups. Especially for Reiki/Placebo the difference is 1825.22 seconds, which is very high.
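A simplified sketch of the evaluation procedure described in section 4.4.1 is given below; it is an illustration, not the exact thesis program, and the number of folds and the names X and y are assumptions (X holds one of the feature groups, y the class labels).

cv = crossvalind('Kfold', y, 10);              % 10-fold cross-validation indices
correct = 0; total = 0;
for fold = 1:10
    test = (cv == fold); train = ~test;
    model = svmtrain(X(train, :), y(train), 'Kernel_Function', 'linear');
    pred  = svmclassify(model, X(test, :));    % classify the held-out fold
    correct = correct + sum(pred == y(test));
    total  = total + sum(test);
end
correctionRate = correct / total;              % correction classification rate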
4.4.2 Shuffled versus no-shuffled

This experiment is designed to find out the effect of shuffling on the correction classification rate and the calculation time (running time), using the MATLABArsenal toolbox. MATLABArsenal uses text files as input, and this experiment uses twenty files, each containing the information of only one feature. The correction rate and running time are calculated for each of these 20 features separately with the LibSVM classifier using an RBF kernel with the same kernel parameters, KernelParam 0.01 and CostFactor 3. Cross validation was used to train the LibSVM classifier, and the results can be seen in Diagrams 1 and 2.
Diagram 1: Shuffled versus no-shuffled correction classification rate per feature (all twenty features) for Reiki/Rest and Reiki/Placebo, Arsenal SVM with RBF kernel.

As seen in Diagram 1, there is no fluctuation in the Reiki/Rest correction rate, but the correction rate for the no-shuffled Reiki/Placebo data is surprisingly lower than for the shuffled Reiki/Placebo data. Diagram 2 illustrates that there is no remarkable difference in running time between no-shuffled and shuffled data for either Reiki/Rest or Reiki/Placebo, so to obtain better results for the Reiki/Placebo data, shuffling was used in all following experiments.

Diagram 2: Shuffled versus no-shuffled running time per feature (all twenty features) for Reiki/Rest and Reiki/Placebo, Arsenal SVM with RBF kernel.

For detailed results, please refer to Table 9 in Appendix A. Regarding this experiment, there was no extraordinary difference between the correction classification rates of the twenty features, so all features were used in the further classification processes.

4.4.3 Different Features' Orders

Considering experiments 4.4.1 and 4.4.2, another experiment was designed to discover the effect of different feature orders on both measures, correction rate and running time. As seen in Table 8, the twenty features are sorted in different orders in groups one to four. Groups 1 to 3 have the same number of features, but Group 4 has only sixteen features, those that show a significant difference between Reiki/Rest and Reiki/Placebo. The Beta, A5-AVE, A5-POW and A5-VAR features are removed from Group 4 according to Ghobadi's thesis report (2).
Table 8: Different features' orders

Position   Group 1    Group 2    Group 3    Group 4
1          Delta      A5-AVE     A5-AVE     D3-POW
2          teta       D3-POW     A5-POW     D4-VAR
3          alpha1     D4-VAR     A5-VAR     teta
4          alpha2     teta       Delta      Antropi
5          beta       Antropi    Fractal    D4-POW
6          A5-AVE     D5-AVE     D5-AVE     D3-VAR
7          D5-AVE     D4-POW     D5-POW     alpha1
8          D4-AVE     D3-VAR     D5-VAR     Fractal
9          D3-AVE     alpha1     teta       D4-AVE
10         A5-POW     Fractal    Lyapanov   D5-VAR
11         D5-POW     D4-AVE     D4-AVE     Lyapanov
12         D4-POW     A5-POW     D4-POW     alpha2
13         D3-POW     D5-VAR     D4-VAR     D3-AVE
14         A5-VAR     alpha2     alpha1     D5-POW
15         D5-VAR     Lyapanov   Antropi    Delta
16         D4-VAR     D3-AVE     D3-AVE     D5-AVE
17         D3-VAR     D5-POW     D3-POW     *
18         Fractal    A5-VAR     D3-VAR     *
19         Lyapanov   Delta      beta       *
20         Antropi    beta       alpha2     *

This experiment is divided into two parts. First, Bagging and AdaBoost, which are ensemble methods based on sub-sampling the training examples, were applied to these groups; afterwards a multi-feature-set classification method (the MCWithMultiFSet wrapper) and a hierarchical classification method with a meta-classifier on top (the MCHierarchyClassify wrapper) were used to classify the groups.

4.4.3.1 Bagging and AdaBoost

Diagrams 3 and 4 show the results of the Bagging and AdaBoost ensemble methods for Groups 1 to 4 using the LibSVM classifier. An RBF kernel was used with kernel parameters KernelParam 0.9 and CostFactor 1, and cross validation was used for training and testing the SVMs. According to Diagram 3, the best correction rate for Reiki/Rest is 0.8972, obtained for Group 4 with the Bagging method, and for Reiki/Placebo it is 0.8517 for both Groups 1 and 3 with the AdaBoost method and 0.8479 with the Bagging method.
Diagram 3: Correction classification rate for the AdaBoost and Bagging ensemble methods, Groups 1 to 4, Reiki/Rest and Reiki/Placebo.

From Diagram 4, the minimum running time for Reiki/Rest is 442.91 seconds, which belongs to Group 4, and the lowest running time for Reiki/Placebo is 699.72 seconds, which also belongs to Group 4. It is no wonder that the running time for Group 4 is the lowest, since it contains the smallest number of features.
Diagram 4: Calculation time for the AdaBoost and Bagging ensemble methods, Groups 1 to 4, Reiki/Rest and Reiki/Placebo.

Considering that Groups 1 and 3 give the same correction rate with the AdaBoost method, Group 1 is suggested for classification of the Reiki/Placebo data because of its lower running time compared with the other groups. Group 4 is suggested for classifying both Reiki/Rest and Reiki/Placebo with the Bagging method. Detailed results are given in Table 10 in Appendix A.

4.4.3.2 Stacking classification method (MCWithMultiFSet wrapper) and a Hierarchical classification method (MCHierarchyClassify)

In the stacked generalization methods, the output of the ensemble serves as a feature vector to a meta-classifier. [Bishop 1995]
Figure 11:Stacked generalization (MCHierarchyClassify)

Adaptive Combiners: Hierarchical classification on multiple groups of features, use sum rule or
majority voting to combine. [Rong Yan 2006]


Figure 12: Adaptive Combiner (MCWithMultiFSet)

Stacked Generalization and Adaptive Combiners

The MCHierarchyClassify wrapper classifier implements stacked generalization, while the MCWithMultiFSet wrapper classifier implements adaptive combiners. MCWithMultiFSet combines several SVMs, each trained on a different set of features, by voting or by the sum rule; MCHierarchyClassify uses SVM_LIGHT SVMs as base classifiers and a LibSVM SVM as the meta-classifier. In this experiment MCHierarchyClassify was run with an RBF kernel, KernelParam=0.9 and CostFactor=1, and MCWithMultiFSet with LibSVM, an RBF kernel, KernelParam=0.9, CostFactor=1 and voting. The correction classification rates and running times for Groups 1 to 4 are shown in the diagrams and table below.

Diagram: Correction classification rate of MCWithMultiFSet and MCHierarchyClassify, Groups 1 to 4, Reiki/Rest and Reiki/Placebo.
Diagram: Running time (sec) of MCWithMultiFSet and MCHierarchyClassify, Groups 1 to 4, Reiki/Rest and Reiki/Placebo.

Table: Cross-validation results (correction rate, run time in seconds, precision, recall and F1) of MCWithMultiFSet and MCHierarchyClassify for Groups 1 to 4, Reiki/Rest and Reiki/Placebo. Meta-classifier: LibSVM, RBF kernel, KernelParam=0.9, CostFactor=1; base classifier: SVM_LIGHT, RBF kernel, KernelParam=0.9, CostFactor=1.

The voting and sum rule combiners of MCWithMultiFSet were also compared, using LibSVM with an RBF kernel, KernelParam=0.9 and CostFactor=1. The correction rates and running times for Groups 1 to 3 are shown in the following diagrams and table; for both Reiki/Rest and Reiki/Placebo the voting combiner gives higher correction rates than the sum rule, while the running times of the two combiners are similar.
Diagram: Correction rate of the Arsenal adaptive combiners (MCWithMultiFSet) with the Voting and Sum Rule combiners, Groups 1 to 3, Reiki/Rest and Reiki/Placebo.
Diagram: Running time (sec) of the Arsenal adaptive combiners (MCWithMultiFSet) with the Voting and Sum Rule combiners, Groups 1 to 3, Reiki/Rest and Reiki/Placebo.

Table: Voting versus Sum Rule combiners of MCWithMultiFSet (LibSVM, RBF kernel, KernelParam=0.9, CostFactor=1, cross validation).

Combiner   Group    Reiki/Rest: Corr. Rate / Run Time (Sec) / Prec / Rec / F1       Reiki/Placebo: Corr. Rate / Run Time (Sec) / Prec / Rec / F1
Voting     Group 1  0.6912 / 237.82 / 0.6740 / 0.9900 / 0.8019                      0.6930 / 253.30 / 0.6750 / 0.9910 / 0.8031
Voting     Group 2  0.6694 / 372.67 / 0.6571 / 0.9967 / 0.7920                      0.6679 / 375.99 / 0.6557 / 0.9983 / 0.7915
Voting     Group 3  0.6832 / 358.88 / 0.6673 / 0.9940 / 0.7985                      0.6793 / 354.59 / 0.6649 / 0.9925 / 0.7963
Sum Rule   Group 1  0.5854 / 277.84 / 0.7226 / 0.5579 / 0.6296                      0.5932 / 253.77 / 0.7297 / 0.5655 / 0.6371
Sum Rule   Group 2  0.5932 / 369.58 / 0.7401 / 0.5481 / 0.6298                      0.5964 / 384.37 / 0.7466 / 0.5464 / 0.6310
Sum Rule   Group 3  0.6272 / 356.84 / 0.7772 / 0.5746 / 0.6607                      0.6234 / 365.63 / 0.7755 / 0.5683 / 0.6559

The following diagrams compare the single SVM (RBF kernel, sigma=0.7) with the ensemble methods (AdaBoost, Bagging, MCWithMultiFSet and MCHierarchyClassify) in terms of correction classification rate and running time.
Diagram: Single SVM versus ensemble methods, correction classification rate, Reiki/Rest, Groups 1 to 4.
Diagram: Single SVM versus ensemble methods, correction classification rate, Reiki/Placebo, Groups 1 to 4.
Diagram: Single SVM versus ensemble methods, running time (sec), Reiki/Rest, Groups 1 to 4.
Diagram: Single SVM versus ensemble methods, running time (sec), Reiki/Placebo, Groups 1 to 4.

4.5 Experiment 2
Experiment two studies different kernels (Linear, Polynomial, RBF, Sigmoid), different kernel parameters and cost factors, and the Precision, Recall and F1 metrics. Its aims are:
Specify the best kernel (Linear, Polynomial, RBF or Sigmoid) for the correction classification rate with minimum calculation time.
Specify the best values for the KernelParm and CostFactor parameters.
Compare the correction classification rate with Precision, Recall and F1.

In this experiment the ensemble wrappers of MATLABArsenal (AdaBoost, Bagging, MultiFSet and the Hierarchical Classifier) are evaluated with the Linear, Polynomial, RBF and Sigmoid kernels and with different KernelParm and CostFactor values, and precision, recall and F1 are reported in addition to the correction rate. The kernel comparison is run on feature Groups 1 to 4, and the parameter tuning on Group 4 for the Reiki/Rest data and Group 3 for the Reiki/Placebo data.

Correction rate of the ensemble methods

Bagging

Bagging was applied with the LibSVM classifier and the Linear, Polynomial, RBF and Sigmoid kernels (KernelParm=0.1, CostFactor=1) to Groups 1 to 4 for both Reiki/Rest and Reiki/Placebo. The best correction rate for Reiki/Rest, 0.8657, is obtained with the RBF kernel for Group 4, and the best correction rates for Reiki/Placebo (0.7996-0.7999) are also obtained with the RBF kernel. The correction rates and running times for all kernels are shown in the following diagrams and in the corresponding table in Appendix A.

Diagram: Correction rate of Bagging with LibSVM and the Linear, Polynomial, RBF and Sigmoid kernels (KernelParm=0.1, CostFactor=1), Groups 1 to 4, Reiki/Rest and Reiki/Placebo.
Diagram: Running time of Bagging with LibSVM and the Linear, Polynomial, RBF and Sigmoid kernels (KernelParm=0.1, CostFactor=1), Groups 1 to 4, Reiki/Rest and Reiki/Placebo.

AdaBoost

AdaBoost was applied with the same four kernels (LibSVM, KernelParm=0.1, CostFactor=1) to Groups 1 to 4. The best correction rate for Reiki/Rest, 0.8725, is obtained with the RBF kernel for Group 4, and the best correction rate for Reiki/Placebo, 0.8159, is also obtained with the RBF kernel. The correction rates and running times for all kernels are shown in the following diagrams and in the corresponding table in Appendix A.
Diagram: Correction rate of AdaBoost with the Linear, Polynomial, RBF and Sigmoid kernels (KernelParm=0.1, CostFactor=1), Groups 1 to 4, Reiki/Rest and Reiki/Placebo.
Diagram: Running time of AdaBoost with the Linear, Polynomial, RBF and Sigmoid kernels (KernelParm=0.1, CostFactor=1), Groups 1 to 4, Reiki/Rest and Reiki/Placebo.

MultiFset

The MultiFset wrapper was applied with the same four kernels (LibSVM, KernelParm=0.1, CostFactor=1) to Groups 1 to 4. The best correction rate for Reiki/Rest, 0.8093, is obtained with the RBF kernel for Group 4, and the best correction rate for Reiki/Placebo, 0.6963, with the RBF kernel; the running times (for example 134.53 seconds for Reiki/Rest and 210.96 seconds for Reiki/Placebo at those points) are considerably lower than those of Bagging and AdaBoost. The correction rates and running times for all kernels are shown in the following diagrams and in the corresponding table in Appendix A.
Diagram: Correction rate of MultiFset with the Linear, Polynomial, RBF and Sigmoid kernels (KernelParm=0.1, CostFactor=1), Groups 1 to 4, Reiki/Rest and Reiki/Placebo.
Diagram: Running time of MultiFset with the Linear, Polynomial, RBF and Sigmoid kernels (KernelParm=0.1, CostFactor=1), Groups 1 to 4, Reiki/Rest and Reiki/Placebo.

Hierarchical classifier

The Hierarchical wrapper was applied with the same four kernels (LibSVM, KernelParm=0.1, CostFactor=1) to Groups 1 to 4; its PosRatio parameter was set to 0.7 for the Reiki/Rest data and 0.8 for the Reiki/Placebo data. The best correction rate for Reiki/Rest, 0.8671, is obtained with the RBF kernel for Group 4 (running time 476.92 seconds). The correction rates and running times for all kernels are shown in the following diagrams and in the corresponding table in Appendix A.
Diagram: Correction rate of the Hierarchical classifier with the Linear, Polynomial, RBF and Sigmoid kernels (KernelParm=0.1, CostFactor=1), Groups 1 to 4, Reiki/Rest and Reiki/Placebo.
Diagram: Running time of the Hierarchical classifier with the Linear, Polynomial, RBF and Sigmoid kernels (KernelParm=0.1, CostFactor=1), Groups 1 to 4, Reiki/Rest and Reiki/Placebo.

Since the RBF kernel gives the best correction rates, the influence of its KernelParm and CostFactor parameters is studied next.

Correction rate and running time of the RBF kernel: tuning KernelParm

The Bagging, AdaBoost, MultiFset and Heirachy wrappers were run with the LibSVM classifier and an RBF kernel (CostFactor=1) for KernelParm values 0.05, 0.1, 0.3, 0.5, 0.7, 0.9 and 0.95, on Group 4 for Reiki/Rest and Group 3 for Reiki/Placebo. The correction classification rates and running times are shown in the following diagrams; the correction rates of the Bagging, AdaBoost and Heirachy wrappers improve as KernelParm grows, while the MultiFSet wrapper remains clearly below the other three.
Diagram: Correction rate versus KernelParm, Reiki/Rest, Group 4.
Diagram: Correction rate versus KernelParm, Reiki/Placebo, Group 3.
Diagram: Running time versus KernelParm, Reiki/Rest, Group 4.
Diagram: Running time versus KernelParm, Reiki/Placebo, Group 3.

For Reiki/Rest (Group 4) the best correction rates are 0.8986 for the Heirachy wrapper (KernelParm=0.95), 0.8981 for AdaBoost (KernelParm=0.90), 0.8980 for Bagging (KernelParm=0.95) and 0.8212 for MultiFSet (KernelParm=0.90), with running times of 498.05, 715.55, 407.66 and 168.48 seconds respectively. For Reiki/Placebo (Group 3) the best correction rates are 0.8492 for Heirachy (KernelParm=0.70), 0.8484 for AdaBoost (KernelParm=0.95), 0.8480 for Bagging (KernelParm=0.70) and 0.7126 for MultiFSet (KernelParm=0.95), with running times of 928.34, 1185.93, 758.79 and 311.52 seconds respectively.

Table: Best correction rates and running times of the Bagging, AdaBoost, MultFset and Heirachy wrappers over the KernelParm values 0.05 to 0.95 (LibSVM, RBF kernel, CostFactor=1), Reiki/Rest Group 4 and Reiki/Placebo Group 3. The complete results for all KernelParm values are given in the corresponding table in Appendix A.

Tuning CostFactor

The Bagging, AdaBoost, MultiFset and Heirachy wrappers were then run with the LibSVM classifier and an RBF kernel for CostFactor values 1, 2, 3, 4, 5 and 6, again on Group 4 for Reiki/Rest and Group 3 for Reiki/Placebo. The correction classification rates and running times are shown in the following diagrams; for most wrappers the best correction rate is obtained with CostFactor=1.
Diagram: Correction rate versus CostFactor, Reiki/Rest, Group 4.
Diagram: Correction rate versus CostFactor, Reiki/Placebo, Group 3.
Diagram: Running time versus CostFactor, Reiki/Rest, Group 4.
Diagram: Running time versus CostFactor, Reiki/Placebo, Group 3.

Table: Correction rate, running time, precision, recall and F1 of the Bagging, AdaBoost, MultiFset and Heirachy wrappers for CostFactor values 1 to 6 (LibSVM, RBF kernel), Reiki/Rest Group 4 and Reiki/Placebo Group 3.

The precision and recall measures and the F-measure (balanced F-score), F = 2 * (Precision * Recall) / (Precision + Recall), defined in section 4.5.1 below, were also computed for the ensemble wrappers over the KernelParm values studied above (RBF kernel, CostFactor=1).
Diagram: F1 of the Bagging, AdaBoost, MultiFset and Heirachy wrappers versus KernelParm (0.05 to 0.95), Reiki/Rest Group 4 and Reiki/Placebo Group 3.

4.5.1 Precision, Recall, F1

Precision: a measure of exactness or quality.
Recall: a measure of completeness or quantity.
F1 (F-Measure): combines precision and recall; it is their harmonic mean:

F1 = 2 * (Precision * Recall) / (Precision + Recall)
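As a small numerical illustration (the numbers are chosen only for illustration, not taken from the experiments): if a classifier marks 30 samples as positive, 20 of them correctly, and the data contains 40 positive samples in total, then Precision = 20/30 ≈ 0.667, Recall = 20/40 = 0.5 and F1 = 2 × 0.667 × 0.5 / (0.667 + 0.5) ≈ 0.571.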

4.6 Benchmark Data


The inputs for the SVMs were gathered by Ms. Sahar Ghobadi using a medical EEG instrument in several experiments; the input data and the feature space are therefore taken as trusted benchmark data.


Conclusion (qualitative results, comparative) and Future Works


According to the results obtained from experiments one and two:

Shuffling the data before presenting it to the classifiers increases the correction rate.

Ensemble methods that use LibSVM as the classifier with an RBF kernel and the parameters KernelParm 0.95 and CostFactor 1 classify better.

The correction classification rates of the AdaBoost, Bagging and Hierarchy wrappers are very close to or higher than 0.85, so they can be used for both types of data, Reiki/Rest and Reiki/Placebo, while the MultiFSet wrapper reaches a precision higher than 0.85 only for Reiki/Rest.

Among the methods with precision higher than 0.85, Bagging has the minimum calculation time for classifying the Reiki/Placebo data, while for Reiki/Rest, MultiFSet has the best running time.

References
1. NHS choices. [Online] [Cited: 03 18, 2011.] http://www.nhs.uk/conditions/EEG/Pages/Introduction.aspx. 2. Evaluation of Dynamic Changes of Brain Signals During Reiki. Ghobadi, Sahar. s.l. : Islamic Azad University of Iran, Mashhad Branch, 2010. 3. U.S. Department of Health and Human Service, ,. Reiki: An Introduction. [Online] [Cited: 03 10, 2011.] http://nccam.nih.gov/health/reiki/D315_BKG.pdf. 4. Wikipedia. Placebo. [Online] [Cited: 03 18, 2011.] http://en.wikipedia.org/wiki/Placebo. 5. NCCAM. National Center for Complementary and Alternative Medicine. [Online] [Cited: 03 18, 2011.] http://nccam.nih.gov/.

6. http://www.reikimontana.com/reiki_history.html. Reiki Montana. [Online] [Cited: 03 18, 2011.] http://www.reikimontana.com/. 7. Wikipedia. [Online] [Cited: 03 26, 2011.] http://en.wikipedia.org/wiki/10-20_system_(EEG). 8. The Chakras System. [Online] [Cited: 03 23, 2011.] http://under-the-bodhi-tree.com/the-chakrasystem/. 9. Statistical Description of Data. [book auth.] William H. Press [et al]. NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING. 1992. 10. Neurobit Systems. [Online] [Cited: 03 24, 2011.] http://www.neurobitsystems.com/. 11. Comparisons of Discrete Wavelet Transform, Wavelet Transform and Stationary Wavelet Transform in Denoising PD Measurement Data. X. Zhou, C. Zhou, B.G. Stewart. s.l. : IEEE International Symposium on Electrical Insulation, 2006. 12. Hyun-Chul, Kim, et. al. Pattern Classification Using Support Vector Machine Ensemble. Pohang, Korea : IEEE, 2002. 13. Rong, Yan et. al. On Predicting rare Classes With SVM Ensembles In Scene Classification. Pittsburg : IEEE, 2003. 14. Duda, Richard O., Hart, Peter E. and Stork, David G. Pattern Classification, Second Edition. s.l. : Wiley-Interscience Publication. 15. Hyun-Chul, Kim et al. Constructing support vector machine ensemble. Pohang, South Korea : Elsevier Ltd., 2003. 16. Empirical Analisis of support vector machine ensemble classifiers. Shi-jin, Wang et al. s.l. : Expert Systems with Applications, 2008. 17. Boosting Prediction Accuracy on Imbalanced Datasets with SVM Ensemble. Yang, Liu et al. Torento : s.n., 2006. 18. A Tutorial on Support Vector Machines for Pattern. BURGES, CHRISTOPHER J.C. Boston : Kluwer Academic Publishers, 1998. 19. The Nature of Statistical Learning Theory. Vapnik, Vladimir N. Berlin : s.n., 2000. 20. An Introduction to Support Vector Machines: And Other Kernel-Based Learning Methods. Cristianini, Nello and John Shawe-Taylor. s.l. : Cambridge University Press, 2000. 21. A comparison of methods for multiclass supportvector machines. Hsu, C.-W., & Lin, C.-J. s.l. : IEEE , 2002.

22. Lin, Chih-Chung Chang and Chih-Jen. LIBSVM: A Library for Support Vector Machines. Taipei : s.n., Intial version 2001, Latest Update April,15th 2011. 23. LIBSVM -- A Library for Support Vector Machines. Chih-Chung Chang and Chih-Jen Lin. [Online] [Cited: 04 28, 2011.] http://www.csie.ntu.edu.tw/~cjlin/libsvm/. 24. Sewell, Martin. Ensemble Learning. s.l. : Department of Computer SCience, University College London, 2008. 25. Rajabzadeh, Taiebeh. Advanced Training Methods in Feed Forward Networks, Kernel Based Methods. Mashad : Azad Islamic University ( Mashad Branch), 2008. 26. Rong, Yan. MATLABArsenal, A MATLAB Package for Classification Algorithms. [Online] [Cited: 08 03, 2011.] http://www.informedia.cs.cmu.edu/yanrong/MATLABArsenal/MATLABArsenalDoc/. 27. Yang, Jun. MILL: A Multiple Instance Learning Library. [Online] [Cited: 11 11, 2011.] http://www.cs.cmu.edu/~juny/MILL/index.html. 28. Support Vector Machines, 36-350, Data Mining. [book auth.] Cosma Shalizi. Support Vector Machines. 2009.

Appendix A:
Table 9: Shuffled versus no-shuffled data, correction rate and running time, computed with the MATLABArsenal LibSVM classifier using an RBF kernel with the same kernel parameters, KernelParam 0.01 and CostFactor 3.

Feature     No-shuffled Reiki/Rest     No-shuffled Reiki/Placebo   Shuffled Reiki/Rest        Shuffled Reiki/Placebo
            Corr. Rate / Calc. Time    Corr. Rate / Calc. Time     Corr. Rate / Calc. Time    Corr. Rate / Calc. Time
Delta       0.7778 / 35.9              0.1988 / 38.13              0.7778 / 37.05             0.6316 / 37.77
teta        0.7778 / 35.07             0.1638 / 39.75              0.7778 / 39.83             0.6321 / 41.31
alpha1      0.7771 / 44.04             0.2992 / 39.8               0.7778 / 49.91             0.6316 / 35.9
alpha2      0.7778 / 40.46             0.0973 / 38.39              0.7778 / 44.8              0.6316 / 37.16
beta        0.7778 / 37.36             0.2236 / 39.2               0.7778 / 40.92             0.6316 / 37.78
A5-AVE      0.7778 / 50.5              0.2508 / 37.8               0.7778 / 50.7              0.6316 / 37.18
D5-AVE      0.7732 / 46.91             0.1658 / 37.57              0.7778 / 49.64             0.6316 / 35.85
D4-AVE      0.7778 / 40.47             0.2215 / 37.91              0.7778 / 42.12             0.6316 / 35.55
D3-AVE      0.7778 / 38.3              0.3901 / 36.38              0.7778 / 41.26             0.6316 / 37.32
A5-POW      0.7778 / 42.9              0.2653 / 37.27              0.7778 / 45.18             0.6313 / 34.73
D5-POW      0.7726 / 47.96             0.1401 / 37.41              0.7779 / 48.39             0.6316 / 36.26
D4-POW      0.7778 / 42                0.0462 / 36.66              0.7778 / 42.59             0.6316 / 33.95
D3-POW      0.7778 / 40.84             0.0797 / 37.25              0.7778 / 42.21             0.6316 / 36.29
A5-VAR      0.7778 / 45.16             0.2236 / 37.49              0.7778 / 45.04             0.6316 / 36.21
D5-VAR      0.7726 / 49.41             0.114 / 37.94               0.777 / 48.67              0.6316 / 35.35
D4-VAR      0.7778 / 42.43             0.0462 / 36.51              0.7778 / 44.8              0.6316 / 34.94
D3-VAR      0.7778 / 42.65             0.0769 / 35.87              0.7778 / 42.93             0.6316 / 35.09
Fractal     0.7778 / 39.5              0.1469 / 38.06              0.7778 / 42.62             0.6305 / 35.66
Lyapanov    0.7778 / 45.8              0.2556 / 43.03              0.7768 / 47.13             0.6316 / 37.08
Antropi     0.7778 / 41.36             0.1996 / 39.39              0.7778 / 43.2              0.6297 / 36.97

Table 10: Bagging and AdaBoost ensemble method results for Groups 1 to 4, using the LibSVM classifier with an RBF kernel, KernelParam 0.9 and CostFactor 1, and cross validation for training the SVMs.

Method     Group    Reiki/Rest: Corr. Rate / Run Time (Sec) / Prec / Rec / F1       Reiki/Placebo: Corr. Rate / Run Time (Sec) / Prec / Rec / F1
AdaBoost   Group 1  0.7648 / 775.46 / 0.4566 / 0.3094 / 0.3680                      0.8517 / 1120.22 / 0.8771 / 0.8899 / 0.8834
AdaBoost   Group 2  0.7643 / 757.83 / 0.4555 / 0.3094 / 0.3678                      0.8436 / 1127.96 / 0.8688 / 0.8863 / 0.8774
AdaBoost   Group 3  0.7604 / 899.02 / 0.4417 / 0.3018 / 0.3579                      0.8517 / 1387.35 / 0.8771 / 0.8899 / 0.8834
AdaBoost   Group 4  0.7865 / 608.06 / 0.5439 / 0.2456 / 0.3384                      0.8479 / 793.81 / 0.8572 / 0.9110 / 0.8833
Bagging    Group 1  0.7867 / 537.50 / 0.5461 / 0.2548 / 0.3452                      0.8447 / 772.42 / 0.8568 / 0.9056 / 0.8805
Bagging    Group 2  0.7865 / 519.06 / 0.5439 / 0.2456 / 0.3384                      0.8479 / 761.14 / 0.8572 / 0.9110 / 0.8833
Bagging    Group 3  0.7867 / 536.28 / 0.5461 / 0.2548 / 0.3452                      0.8447 / 815.02 / 0.8568 / 0.9056 / 0.8805
Bagging    Group 4  0.8972 / 442.91 / 0.9107 / 0.9622 / 0.9357                      0.8287 / 699.72 / 0.8424 / 0.8964 / 0.8686

Table: Bagging with LibSVM and the Linear, Polynomial, RBF and Sigmoid kernels (KernelParm=0.1, CostFactor=1), Groups 1 to 4.

libsvm kernel parm 0.1, cost factor 1 method Reiki/Rest Group4

different kernels linear Kernel polynomial Kernel Group1 Group2 Group3 Group4 0.7778 0.7778 0.7778 0.8086 476.00 487.94 472.67 453.24

Group Group1 Group2 Group3 Group4 Correction Rate takes a long time to run Calculation

Time Prec Rec F1 Correction Rate Calculation Reiki/Placebo Time Prec Group3 Rec F1 method RBF Kernel 0.0000 0.0000 0.0000 0.6925 1122.48 0.7208 0.8377 0.7748 0.0000 0.0000 0.0000 0.6957 1093.27 0.7206 0.8478 0.7787 0.0000 0.0000 0.0000 0.6957 1099.37 0.7206 0.8478 0.7787 0.8270 0.9534 0.8857 0.6998 1006.02 0.7397 0.8100 0.7731

Sigmoid kernel

Reiki/Rest Group4

Group Group1 Group2 Group3 Group4 Group1 Group2 Group3 Group4 Correction Rate 0.7833 0.7841 0.7841 0.8657 0.7778 0.7778 0.7778 0.7778 Calculation Time 519.17 503.63 511.40 344.70 512.03 500.17 512.88 478.16 Prec 0.5619 0.5732 0.5732 0.8750 0.0000 0.0000 0.0000 0.7778 Rec 0.1054 0.1148 0.1148 0.9651 0.0000 0.0000 0.0000 1.0000 0.1763 0.7999 772.25 0.8157 0.8826 0.8478 0.1899 0.7996 753.62 0.8156 0.8822 0.8476 0.1899 0.7996 747.18 0.8156 0.8822 0.8476 0.9179 0.7876 660.38 0.8045 0.8768 0.8391 0.0000 0.6316 1200.45 0.6316 1.0000 0.7742 0.0000 0.6316 1191.94 0.6316 1.0000 0.7742 0.0000 0.6316 1229.16 0.6316 1.0000 0.7742

F1 Correction Rate Calculation Reiki/Placebo Time Prec Group3 Rec F1

0.8750 0.6316 1106.39 0.6316 1.0000 0.7742

Table: AdaBoost with LibSVM and the Linear, Polynomial, RBF and Sigmoid kernels (KernelParm=0.1, CostFactor=1), Groups 1 to 4.

libsvm kernel parm 0.1, cost factor 1 method Group Correction Rate Calculation Time Reiki/Rest Prec Rec F1 Correction Rate Reiki/Placebo Calculation Time Prec

different kernels linear Kernel Group1 0.7778 2360.15 0.0000 0.0000 0.0000 0.7244 1202.37 0.7475 Group2 0.7776 2622.43 0.0000 0.0000 0.0000 0.7267 1357.61 0.7509 Group3 0.7778 2614.35 0.0000 0.0000 0.0000 0.7229 1426.69 0.7474 Group4 0.8335 578.23 0.8466 0.9599 0.8996 0.7165 935.91 0.7433 Group1 0.7778 839.25 0.0000 0.0000 0.0000 0.6953 713.61 0.7218 polynomial Kernel Group2 0.7778 704.47 0.0000 0.0000 0.0000 0.6933 825.52 0.7195 Group3 0.7778 698.23 0.0000 0.0000 0.0000 0.6936 1123.28 0.7206 Group4 0.8062 444.79 0.8260 0.9511 0.8841 0.6995 926.02 0.7412

Rec F1 method Group Correction Rate Calculation Time Reiki/Rest Prec Rec F1 Correction Rate Calculation Time Reiki/Placebo Prec Rec F1

0.8513 0.7960 Group1 0.7875 868.11 0.5744 0.1785 0.2721 0.8133 1273.48 0.8384 0.8728 0.8552

0.8493 0.7969 Group2 0.7755 887.61 0.4856 0.1748 0.2566 0.8159 1505.66 0.8399 0.8753 0.8572

0.8478 0.7944 Group3 0.7755 896.55 0.4856 0.1748 0.2566 0.8159 1302.81 0.8399 0.8753 0.8572

0.8421 0.7896 Group4 0.8725 749.94 0.8996 0.9411 0.9199 0.7953 1126.70 0.8169 0.8713 0.8432

0.8418 0.7772 Group1 0.7778 433.59 0.0000 0.0000 0.0000 0.6316 589.57 0.6316 1.0000 0.7742

0.8440 0.7765 Group2 0.7776 377.57 0.0000 0.0000 0.0000 0.6316 1098.07 0.6316 1.0000 0.7742

0.8408 0.7761 Group3 0.7778 451.56 0.0000 0.0000 0.0000 0.6316 388.83 0.6316 1.0000 0.7742

0.8060 0.7721 Group4 0.7778 420.70 0.7778 1.0000 0.8750 0.6316 316.20 0.6316 1.0000 0.7742

RBF Kernel

Sigmoid kernel

Table: MultiFset with LibSVM and the Linear, Polynomial, RBF and Sigmoid kernels (KernelParm=0.1, CostFactor=1), Groups 1 to 4.

libsvm kernel parm 0.1, cost factor 1 method Group Correction Rate Calculation Time Reiki/Rest Prec Group4 Rec F1 Correction Rate Calculation Time Reiki/Placebo Prec Group3 Rec F1 method Group Correction Rate Calculation Time Reiki/Rest Prec Group4 Rec F1 Reiki/Placebo Correction Rate

different kernels linear Kernel polynomial Kernel

Group1 Group2 Group3 Group4 Group1 Group2 Group3 Group4 0.7778 0.7778 0.7778 0.7778 0.7778 0.7778 0.7778 0.7778 795.84 688.90 297.56 365.46 96.39 131.66 127.87 125.33 0.0000 0.0000 0.0000 0.7778 0.0000 0.0000 0.0000 0.7778 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.8750 0.0000 0.0000 0.0000 0.8750 0.6688 0.6597 0.6354 0.6417 0.6321 0.6292 0.6342 0.6316 515.08 398.11 876.99 688.73 240.79 333.09 328.21 282.70 0.6629 0.6528 0.6345 0.6392 0.6339 0.6307 0.6333 0.6316 0.9679 0.9851 0.9966 0.9948 0.9883 0.9962 1.0000 1.0000 0.7868 0.7852 0.7754 0.7781 0.7723 0.7724 0.7754 0.7742 RBF Kernel Sigmoid kernel Group1 Group2 Group3 Group4 Group1 Group2 Group3 Group4 0.7788 0.7786 0.7789 0.8093 0.7778 0.7778 0.7778 0.7778 169.01 184.67 194.78 134.53 116.36 141.84 137.62 133.22 1.0000 0.6667 0.4714 0.8057 0.0000 0.0000 0.0000 0.7778 0.0044 0.0051 0.0089 0.9946 0.0000 0.0000 0.0000 1.0000 0.0087 0.0101 0.0174 0.8902 0.0000 0.0000 0.0000 0.8750 0.6963 0.6888 0.6851 0.6880 0.6316 0.6316 0.6316 0.6316

Group3

Calculation Time Prec Rec F1

210.96 0.6810 0.9766 0.8024

273.25 0.6764 0.9724 0.7979

260.63 0.6760 0.9630 0.7943

262.80 0.6783 0.9624 0.7958

290.77 0.6316 0.9996 0.7741

260.01 0.6316 1.0000 0.7742

273.61 0.6316 1.0000 0.7742

261.99 0.6316 1.0000 0.7742

Table: Hierarchical classifier with LibSVM and the Linear, Polynomial, RBF and Sigmoid kernels (KernelParm=0.1, CostFactor=1), Groups 1 to 4.

different kernels PosRatio 0.8 -- LibSVM -KernelParam 0.1 -CostFactor 1 -- LibSVM -KernelParam 0.1 -CostFactor 1 method Group Correction Rate Calculation Time Prec Rec Group1 0.7778 1311.87 0.0000 0.0000 0.0000 0.7214 1101.46 0.7309 0.8846 0.8004 Group1 0.7776 624.02 0.0000 0.0000 0.0000 0.7984 845.76 0.7991 0.9096 0.8507 linear Kernel Group2 0.7778 1438.96 0.0000 0.0000 0.0000 0.7151 1226.77 0.7368 0.8562 0.7913 Group2 0.7778 627.88 0.0000 0.0000 0.0000 0.7991 834.33 0.7999 0.9094 0.8511 Group3 0.7778 1436.86 0.0000 0.0000 0.0000 0.7177 1286.56 0.7337 0.8685 0.7953 Group3 0.7778 622.84 0.1667 0.0007 0.0014 0.6280 852.80 0.6308 0.9926 0.7710 Group4 0.8340 470.78 0.8511 0.9534 0.8993 0.7068 1279.38 0.7323 0.8450 0.7845 Group4 0.8671 476.92 0.8802 0.9599 0.9183 0.6260 825.48 0.6298 0.9899 0.7697 Group1 0.7778 617.15 0.0000 0.0000 0.0000 437999 904.00 0.6422 0.9697 0.7724 Group1 0.7778 1459.80 0.0000 0.0000 0.0000 0.7157 1158.69 0.7349 0.8622 0.7927 polynomial Kernel Group2 0.7778 623.71 0.0000 0.0000 0.0000 0.6533 851.72 0.6583 0.9426 0.7747 Group2 0.7778 1426.64 0.0000 0.0000 0.0000 0.6568 1163.72 0.6587 0.9677 0.7815 Group3 0.7778 618.52 0.0000 0.0000 0.0000 0.6283 858.84 0.6348 0.9687 0.7670 Group3 0.7778 1412.58 0.0000 0.0000 0.0000 0.7152 1155.92 0.7286 0.8745 0.7949 Group4 0.8041 498.72 0.8216 0.9557 0.8836 0.6420 840.07 0.6531 0.9291 0.7663 Group4 0.8213 475.99 0.8676 0.9089 0.8877 0.6925 1260.61 0.7400 0.7914 0.7647

Reiki/Rest Group4

F1 Correction Rate Calculation Reiki/Placebo Time Group3 Prec Rec F1 method Group Correction Rate Calculation Time Prec Rec

RBF Kernel

Sigmoid kernel

Reiki/Rest Group4

F1 Correction Rate Calculation Reiki/Placebo Time Group3 Prec Rec F1

Table: Bagging, AdaBoost, MultiFset and Heirachy wrappers with LibSVM and an RBF kernel (CostFactor=1) for KernelParm values 0.05, 0.1, 0.3, 0.5, 0.7, 0.9 and 0.95; Reiki/Rest Group 4 and Reiki/Placebo Group 3.
method Kernel Parm Correction Rate Calculation Time Prec Rec F1 Correction Rate Calculation Time Prec Rec F1 method Kernel Parm Reiki/Rest Group4 Correction Rate Calculation Time Prec Rec F1 Reiki/Placebo Group3 Correction Rate Calculation Time Prec Rec F1 0.05 0.8075 123.97 0.8051 0.9930 0.8891 0.6784 317.38 0.6697 0.9685 0.7918 0.1 0.8106 144.66 0.8069 0.9946 0.8909 0.6833 302.72 0.6746 0.9631 0.7934 0.3 0.8169 141.52 0.8135 0.9921 0.8939 0.6973 291.85 0.6846 0.9658 0.8011 0.05 0.8588 397.13 0.8678 0.9655 0.9141 0.7767 787.19 0.7948 0.8713 0.8313 0.1 0.8666 371.86 0.8750 0.9666 0.9185 0.7999 904.00 0.8157 0.8826 0.8478 0.3 0.8892 378.58 0.8999 0.9649 0.9313 0.8313 722.14 0.8467 0.8949 0.8701 Bagging 0.5 0.8944 374.15 0.9060 0.9643 0.9343 0.8445 737.77 0.8556 0.9068 0.8804 MultiFset 0.5 0.8173 155.83 0.8136 0.9925 0.8942 0.7047 307.13 0.6904 0.9654 0.8050 0.7 0.8202 159.67 0.8154 0.9938 0.8958 0.7057 291.91 0.6915 0.9641 0.8053 0.9 0.8212 168.48 0.8171 0.9921 0.8961 0.7119 301.61 0.6962 0.9651 0.8089 0.95 0.8207 163.00 0.8162 0.9931 0.8960 0.7126 311.52 0.6970 0.9641 0.8090 0.05 0.8559 508.74 0.8747 0.9509 0.9112 0.7780 838.92 0.7989 0.8671 0.8314 0.1 0.8683 527.14 0.8816 0.9595 0.9189 0.8062 891.85 0.8251 0.8797 0.8515 0.3 0.8900 496.37 0.9019 0.9635 0.9316 0.8358 837.58 0.8581 0.8866 0.8721 0.7 0.8951 411.06 0.9082 0.9624 0.9345 0.8480 758.79 0.8582 0.9097 0.8832 0.9 0.8972 403.98 0.9107 0.9622 0.9357 0.8479 895.51 0.8572 0.9110 0.8833 0.95 0.8980 407.66 0.9118 0.9620 0.9362 0.8478 803.96 0.8565 0.9118 0.8833 0.05 0.8588 797.21 0.8798 0.9483 0.9126 0.8453 1601.81 0.8683 0.8899 0.8789 0.1 0.8761 802.78 0.8994 0.9465 0.9224 0.8133 1393315 0.8384 0.8728 0.8552 0.3 0.8947 743.31 0.9213 0.9455 0.9332 0.7829 1289.96 0.8120 0.8540 0.8324 Adaboost 0.5 0.8964 742.53 0.9236 0.9449 0.9341 0.8466 1802.89 0.8717 0.8878 0.8796 Heirachy 0.5 0.8900 493.84 0.9106 0.9519 0.9308 0.8454 955.20 0.8673 0.8921 0.8793 0.7 0.8926 514.84 0.9179 0.9467 0.9321 0.8492 928.34 0.8754 0.8879 0.8815 0.7 0.8933 719.69 0.9228 0.9415 0.9321 0.8405 1419.37 0.8678 0.8817 0.8747

0.9 0.8981 715.55 0.9228 0.9484 0.9354 0.8436 1288.93 0.8688 0.8863 0.8774

0.95 0.8904 718.47 0.9189 0.9422 0.9304 0.8484 1185.93 0.8736 0.8887 0.8811

Reiki/Placebo Group3

Reiki/Rest Group4

0.9 0.8949 528.92 0.9220 0.9448 0.9332 0.8472 910.53 0.8639 0.9006 0.8816

0.95 0.8986 498.05 0.9220 0.9501 0.9358 0.8478 845.48 0.8748 0.8858 0.8802
