Int. J. Mach. Learn. & Cyber.
DOI 10.1007/s13042-015-0341-x

ORIGINAL ARTICLE

Improved student dropout prediction in Thai University using ensemble of mixed-type data clusterings

Natthakan Iam-On · Tossapon Boongoen

Received: 19 August 2014 / Accepted: 9 February 2015
© Springer-Verlag Berlin Heidelberg 2015

Abstract  Increasing student retention has been a common goal of many academic institutions, especially at the university level. The negative effects of student attrition are evident to students, parents, universities and society as a whole. First-year students are at the greatest risk of dropping out or not completing their degree on time. With this insight, a number of data mining methods have been developed for early detection of students at risk of dropout, hence the immediate application of assistive measures. Compared to western countries, this subject has attracted only a few studies in Thai universities, with educational data mining being limited to the use of conventional classification models. This paper presents the most recent investigation of student dropout at Mae Fah Luang University, Thailand, and the novel reuse of link-based cluster ensembles as a data transformation framework for more accurate prediction. The empirical study on a mixed-type data collection covering students' demographic details, academic performance and enrollment records suggests that the proposed approach is usually more effective than several benchmark transformation techniques, across different classifiers.

Keywords  Ensemble clustering · Classification · Feature transformation · Student dropout · Educational data mining

N. Iam-On (✉)
School of Information Technology, Mae Fah Luang University, Chiang Rai 57100, Thailand
e-mail: nt.iamon@gmail.com

T. Boongoen
Department of Mathematics and Computer Science, Royal Thai Air Force Academy, Bangkok 10220, Thailand
e-mail: tossapon_b@rtaf.mi.th

1 Introduction

Having operated in a sophisticated and highly competitive environment, modern universities commonly seek to analyze their performance, to identify their uniqueness, and to formulate a strategy from working experience and knowledge [4]. The recent growth of learning resources, educational software, and databases of course and student information has created large repositories of data [28]. These provide a goldmine that can be explored to understand students' learning behavior, preferences and performance [35]. In response, the application of data mining (DM) methodology in education, also known as educational data mining (EDM), has become a fast-growing interdisciplinary research field [3, 43]. Discovered knowledge is highly useful for better understanding how students learn and how different settings affect their achievement. This can help to improve educational outcomes and to gain insights into various educational phenomena.

In the EDM literature, recent research has focused on understanding student categories and targeted marketing [2, 44]. This is accomplished through, for instance, using predictive modeling to maximize student retention [60], developing enrollment prediction models based on admission data [58], and predicting student performance and dropout [30, 40]. More specifically, if accurate predictors of academic performance can be obtained, they can be used to gain an understanding of success and risk factors with respect to the curriculum [54]. Awareness of these issues by educational staff and management will help identify the risk group and determine the appropriate course of measures. In other words, at-risk students would be provided with academic and administrative support to increase their chance of staying on the course.

According to the study of Tinto [52], student dropout is a major problem in higher education, with around one fourth of students dropping out of college after their first year. It is suggested that determinants of completion include a student's family background, personal characteristics, and prior academic performance. Most dropout studies address the retention of first-year students [17, 39, 48], based on the assumption that early detection of vulnerable students can lead to the success of any retention strategy. To achieve this, several predictive models have been designed using different learning techniques, such as Naive Bayes [30], k-means [10], and decision trees [26, 51]. Unfortunately, this problem in Thai universities has attracted less attention, with only a few investigations being conducted in recent years. These include the social science studies at Prince of Songkla University [47] and King Mongkut University of Technology North Bangkok [50]. Yet, model development is limited in number, whilst restricted to the use of conventional classifiers like decision trees and neural networks [29].

Besides the findings with benchmark techniques, a tendency towards improving these predictive models with feature discrimination has emerged lately [14, 31, 33, 40]. Regarding the observation of West et al. [56], a few of the available data features may account for much of the data variation. With this characteristic, an effective prediction may not be achieved, due to the fact that any proximity or similarity metric exploited by a classification model becomes increasingly inaccurate as dimensionality increases [5]. To deal with this problem, one may reduce the dimension of the data by selecting a subset of interesting variables, or by generating meta components (i.e., combinations of variables); these approaches are widely known as feature selection and feature transformation, respectively. Following these approaches to dimensionality reduction, the classification procedure has been re-shaped and comprises two steps: data preprocessing by dimensionality reduction, and the classification process, in which samples are classified into categories by applying standard statistical or machine learning models [37]. Dimensionality reduction techniques transform data for the sake of easier computation, modeling and inference, by analysis of variable interdependence and inter-object similarity [7]. On the one hand, feature selection algorithms try to search heuristically for the most valuable feature subset under a certain predefined feature-subset evaluation criterion [8]. On the other, feature transformation methods like principal component analysis (PCA) and kernel principal component analysis (KPCA) transform the original features into new features, which are often difficult for human beings to interpret [9]. According to the research of Luan and Zhao [31], feature transformation is preferred for EDM. In particular, some features may have little significance to the overall prediction outcome, but they can be essential to a specific observation [6, 15, 16].

This paper presents EDM models and results of the project carried out at Mae Fah Luang University, Thailand. The research work aims to disclose interesting patterns, which could contribute to predicting student performance and dropout, based on their pre-university characteristics, admission details, and initial academic performance at university. In particular, a new feature transformation method is introduced to improve the accuracy of conventional classifiers. This is achieved through the use of the ensemble-information matrix created by link-based ensemble clustering (LCE; [22–24]). With respect to LCE, the refined sample-cluster association matrix can be considered as the representation of samples in the transformed space, which is discovered from multiple clusterings in the setting of the original features. The use of clustering information in addition to the original features has recently been reported to improve accuracy on the intrusion detection problem [38]. In particular, a fuzzy clustering technique is applied to the data under examination, and the resulting memberships of each sample to k clusters (where k is a user-defined number of clusters) are used to form k additional variables or features for the following classification step. Similar classification approaches of Nasierding et al. [36] and Sang-Woon [45] have combined cluster labels and conventional supervised algorithms for face recognition and image annotation, respectively. The proposed approach is largely different from the aforementioned clustering-oriented methods, which make use of the result from a single run of a clustering technique. On the contrary, this method concentrates on transforming the original data to another information matrix that represents associations within a cluster ensemble.

The rest of this paper is organized as follows. Section 2 provides background knowledge of ensemble clustering, to set the scene for the concepts and notations used throughout the paper. Following that, Sect. 3 introduces the ensemble-based data transformation framework, including the generation of a cluster ensemble and the ensemble-information matrix. Section 4 presents the description of the data related to the current study of student dropout. In addition, this section also includes the evaluation of the proposed predictive model as compared to several unsupervised transformation methods, using conventional classification techniques. The paper is concluded in Sect. 5 with the perspective of future research.

2 Preliminary of cluster ensemble

Fig. 1 Examples of (a) a cluster ensemble, (b) the label-assignment matrix, (c) the pairwise similarity matrix and (d) the binary cluster-association (BA) matrix. Note that $X = \{x_1, \ldots, x_6\}$, $\Pi = \{\pi_1, \pi_2, \pi_3, \pi_4\}$, $\pi_1 = \{C_1^1, C_2^1, C_3^1\}$, $\pi_2 = \{C_1^2, C_2^2\}$, $\pi_3 = \{C_1^3, C_2^3\}$ and $\pi_4 = \{C_1^4, C_2^4, C_3^4\}$

Ensemble clustering or cluster ensemble has become an attractive tool for analyzing modern data, such as those acquired from microarray experiments [22, 27]. This meta-learning approach is motivated by the fact that the quality of most conventional clustering techniques is highly data dependent. Based on a number of published studies [13, 57], a clustering algorithm may create an acceptable outcome on one dataset, but possibly becomes less accurate on others. To resolve this, researchers have commonly attempted to aggregate multiple standard clusterings into a single consensus decision, with possibly higher quality than the initial data partitions. It has been shown across different domains that ensemble clustering can provide more accurate and robust solutions [13, 22, 53]. The remainder of this section sets the scene for the theoretical concepts and notations discussed throughout the present work.

According to Iam-On et al. [22], the problem of ensemble clustering is defined as follows. Let $X = \{x_1, \ldots, x_N\}$ be a set of $N$ data points or samples, where each $x_i \in X$ is represented by a vector of $D$ feature or attribute values, i.e., $x_i = (x_{i,1}, \ldots, x_{i,D})$. Also, let $\Pi = \{\pi_1, \ldots, \pi_M\}$ be a cluster ensemble with $M$ base clusterings, each of which is referred to as an ensemble member. Each base clustering returns a set of clusters $\pi_g = \{C_1^g, C_2^g, \ldots, C_{k_g}^g\}$, such that $\bigcup_{t=1}^{k_g} C_t^g = X$, where $k_g$ is the number of clusters in the $g$th clustering. For each $x_i \in X$, $C^g(x_i)$ denotes the cluster label in the $g$th base clustering to which data point $x_i$ belongs, i.e., $C^g(x_i) = t$ (or $C_t^g$) if $x_i \in C_t^g$. The problem is to find a new data partition $\pi^* = \{C_1, \ldots, C_K\}$, where $K$ denotes the number of clusters in the final clustering result, of the data set $X$ that summarizes the information from the cluster ensemble $\Pi$. This meta-level methodology comprises three major tasks: (1) generating a cluster ensemble, (2) summarizing the ensemble information into a form suitable for the following analysis, and (3) producing the final partition, normally referred to as a consensus clustering function.

Having obtained an ensemble of base clustering results, a variety of consensus functions have been developed and made available for generating the ultimate data partition. Each consensus function utilizes a specific form of ensemble-information matrix, which summarizes the decisions of the base clusterings. Based on the cluster ensemble shown in Fig. 1a, three general types of such matrix can be constructed. Firstly, the label-assignment matrix (see Fig. 1b for an example), of size $N \times M$, represents the cluster labels that are assigned to each sample by the different base clusterings. Next, the pairwise similarity matrix (e.g., Fig. 1c), of size $N \times N$, summarizes co-occurrence statistics amongst samples. And finally, the binary cluster-association (BA) matrix, whose example is shown in Fig. 1d, provides a cluster-specific view of the original label-assignment matrix: the association degree of a sample belonging to a specific cluster is either 1 or 0. With these representation schemes, different consensus functions found in the literature make use of the aforementioned matrices to create the final decision. They can be categorized into direct approaches [12], feature-based approaches [53], pairwise-similarity approaches [13, 34], and graph-based algorithms [11, 49], respectively.
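To make these representations concrete, the short sketch below builds the label-assignment, BA and pairwise similarity matrices for a toy ensemble. It is only an illustration of the notation above, not code from the authors, and the three-clustering example is hypothetical.

```python
import numpy as np

# Toy ensemble over N = 6 samples: row g holds the cluster index that
# base clustering pi_g assigns to each sample.
labels = np.array([
    [0, 0, 1, 1, 2, 2],   # pi_1 with k_1 = 3 clusters
    [0, 0, 0, 1, 1, 1],   # pi_2 with k_2 = 2 clusters
    [0, 1, 1, 1, 0, 0],   # pi_3 with k_3 = 2 clusters
])
M, N = labels.shape

# Label-assignment matrix (N x M): one column per base clustering.
label_assignment = labels.T

# Binary cluster-association (BA) matrix (N x P), P = k_1 + ... + k_M:
# an entry is 1 iff the sample was assigned to that particular cluster.
ks = [int(labels[g].max()) + 1 for g in range(M)]
ba = np.zeros((N, sum(ks)))
offset = 0
for g in range(M):
    ba[np.arange(N), offset + labels[g]] = 1.0
    offset += ks[g]

# Pairwise similarity matrix (N x N): fraction of base clusterings in
# which a pair of samples falls into the same cluster.
pairwise = (ba @ ba.T) / M
```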

3 Proposed method

The use of ensemble clustering as a new feature transformation method for data classification has not been extensively investigated in the literature. This approach is motivated by several studies in recent years, which enrich the original set of features with additional variables acquired from a single data clustering [36, 38, 45]. In addition to their simplicity, these initial works have proven effective for improving the accuracy of real classification problems like face recognition and image annotation. Given these observations, it is hypothesized that the use of multiple clusterings may provide more useful information to the classification process, as compared to the previous exploitation of one data partition. This leads to a new method of unsupervised feature transformation, through the generation of an ensemble-information matrix that can be regarded as a deep-knowledge representation of its original counterpart. In particular, the framework adopts the link-based cluster ensemble (LCE) and includes two stages: (1) creating an ensemble $\Pi$, and (2) aggregating the base clusterings, $\pi_g \in \Pi, g = 1 \ldots M$, into a meta-level data matrix $H$. As such, the resulting matrix $H$ can be used as an input to any preferred classification model.
Note that the proposed method follows the initial attempt of using the $H$ matrix for classification with numerical features by Iam-On and Boongoen [19]. Unlike the previous study, where k-means is employed to create the base clusterings, the current research generates an ensemble of k-prototypes (Huang [18]), which is effective for clustering mixed-type numerical and categorical data, e.g., students' performance related data.

3.1 Creating ensemble of clusterings

For the present study, the following two types of cluster ensembles are investigated. Based on the original work of LCE [22] and its application to mixed-type data analysis [25], k-prototypes is used to form the base clusterings, each of which is initialized with a random set of cluster prototypes. A minimal sketch of this generation stage is given after the list.

– Fixed-k: Each clustering $\pi_g \in \Pi$ is created using the data set $X \in \mathbb{R}^{N \times D}$ with all $D$ attributes. The number of clusters in each base clustering is fixed to $k = \lceil \sqrt{N} \rceil$. To obtain a meaningful partition, $k$ becomes 50 if $\lceil \sqrt{N} \rceil > 50$.
– Random-k: Each $\pi_g$ is created using the data set with all attributes, and the number of clusters is randomly selected from $\{2, \ldots, \lceil \sqrt{N} \rceil\}$.

Note that both 'Fixed-k' and 'Random-k' generation strategies were initially introduced in the primary work [21].
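The paper clusters the raw mixed-type records with k-prototypes [18]; since a full k-prototypes routine is beyond this sketch, scikit-learn's k-means over an already-encoded numeric matrix is used below as a stand-in, and the function name and signature are illustrative assumptions only.

```python
import numpy as np
from math import ceil, sqrt
from sklearn.cluster import KMeans

def generate_ensemble(X, M=10, strategy="fixed", seed=0):
    """Create M base clusterings under the Fixed-k or Random-k strategy.
    X is an (N x D) numeric array; categorical attributes are assumed to
    have been encoded beforehand (the paper itself applies k-prototypes
    directly to the mixed-type data). Returns a list of label vectors."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    k_cap = min(ceil(sqrt(N)), 50)   # k is capped at 50, as in the text
    ensemble = []
    for _ in range(M):
        k = k_cap if strategy == "fixed" else int(rng.integers(2, k_cap + 1))
        # Random prototype initialization mirrors the random-restart spirit
        # of the base clusterings described above.
        km = KMeans(n_clusters=k, init="random", n_init=1,
                    random_state=int(rng.integers(1 << 30)))
        ensemble.append(km.fit_predict(X))
    return ensemble
```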

3.2 Aggregating multiple clustering results

Having acquired the cluster ensemble $\Pi$, the base clusterings are summarized into an information matrix $H \in [0,1]^{N \times P}$. Note that $P$ denotes the total number of clusters in the ensemble under examination, i.e., $P = k_1 + \ldots + k_M$. For each clustering $\pi_g \in \Pi$ and its corresponding clusters $C_1^g, \ldots, C_{k_g}^g$, a matrix entry $H(x_i, cl)$ that represents the association degree between data point $x_i \in X$ and each cluster $cl \in \{C_1^g, \ldots, C_{k_g}^g\}$ is estimated as follows:

$$H(x_i, cl) = \begin{cases} 1 & \text{if } cl = C^g(x_i) \\ sim(cl, C^g(x_i)) & \text{otherwise,} \end{cases} \qquad (1)$$

where $C^g(x_i)$ is the cluster label to which data point $x_i$ has been assigned. In addition, $sim(C_x, C_y) \in [0,1]$ denotes the similarity between any two clusters $C_x, C_y \in \pi_g$, which can be discovered using the link-based algorithms presented next. Note that $H$ can be considered as a refined variation of the crisp BA matrix, in which an element can only be either 1 or 0.

Weighted connected triple (WCT) algorithm [20–22]: this extends the Connected-Triple method [42] that has been used to find ambiguous author names within a publication repository. It was built for a social network represented as an undirected graph $G' = (V, E)$, where $V$ is the set of vertices, each corresponding to an author name, and $E$ is the set of unweighted edges, each standing for a co-authorship relation. With this representation scheme, the similarity of $v_x, v_y \in V$ is subjected to the number of Connected-Triples (i.e., triples) they are part of. Formally, a triple, $Triple = (V_{Triple}, E_{Triple})$, is a subgraph of $G'$ containing three vertices $V_{Triple} = \{v_x, v_y, v_k\} \subseteq V$ and two edges $E_{Triple} = \{e_{xk}, e_{yk}\} \subseteq E$, with $e_{xy} \notin E$. This simple counting might be sufficient for any indivisible object, e.g., a data point or an author. However, in the case of clusters, it is necessary to take into account their composite characteristic, i.e., the data members shared between the clusters under examination.

With this intuition, the WCT algorithm is established upon a weighted graph $G = (V, W)$, where $V$ is the set of vertices, each representing a cluster in $\Pi$, and $W$ is a set of weighted edges between clusters. The weight $|w_{xy}| \in [0,1]$ assigned to the edge $w_{xy} \in W$ that connects vertices $v_x, v_y \in V$ (corresponding to clusters $C_x, C_y \in \Pi$) is calculated by:

$$|w_{xy}| = \frac{|B_x \cap B_y|}{|B_x \cup B_y|}, \qquad (2)$$

where $B_z \subseteq X$ denotes the set of data points belonging to a cluster $C_z \in \Pi$. Note that $G$ is an undirected graph, such that $|w_{xy}| = |w_{yx}|, \forall v_x, v_y \in V$. Given a weighted graph, the WCT measure of $v_x, v_y \in V$ with respect to the center of a triple $v_z \in V$ is defined as

$$WCT^z_{xy} = \min(|w_{xz}|, |w_{yz}|), \qquad (3)$$

where $|w_{xz}|$ and $|w_{yz}|$ are the weights of the edges $w_{xz}, w_{yz} \in W$ connecting vertices $v_x$ and $v_z$, and vertices $v_y$ and $v_z$, respectively. The summation over all triples $(1 \ldots k)$ between vertices $v_x$ and $v_y$ can be found by the following equation:

$$WCT_{xy} = \sum_{z=1}^{k} WCT^z_{xy} \qquad (4)$$

The WCT algorithm can be summarized as follows:

ALGORITHM: WCT(G, C_x, C_y)
Input:  G = (V, W), a weighted graph, where C_x, C_y ∈ V;
        N_k ⊂ V, the set of adjacent neighbors of C_k ∈ V: C_z ∈ N_k when |w_kz| > 0.
Output: WCT_xy, the WCT measure of C_x and C_y.
(1) WCT_xy ← 0
(2) For each z ∈ N_x
(3)   If z ∈ N_y
(4)     WCT_xy ← WCT_xy + min(|w_xz|, |w_yz|)
(5) Return WCT_xy

Following that, the similarity $S_{WCT}(v_x, v_y)$ between $v_x$ and $v_y$ (or clusters $C_x$ and $C_y$) is defined as

$$S_{WCT}(v_x, v_y) = \frac{WCT_{xy}}{WCT_{max}} \times DC, \qquad (5)$$

with

$$WCT_{max} = \max_{\forall v_p, v_q \in V} WCT_{pq}, \qquad (6)$$

where $DC \in [0,1]$ is a constant decay factor (i.e., the confidence level of accepting two non-identical clusters as being similar). With this similarity metric, $S_{WCT}(v_x, v_y) \in [0,1]$ with $S_{WCT}(v_x, v_x) = 1, \forall v_x, v_y \in V$. It is also symmetric, such that $S_{WCT}(v_x, v_y)$ is equivalent to $S_{WCT}(v_y, v_x)$. It is noteworthy that a DC value around 0.8–0.9 is suggested by Iam-On and Garrett [20], based on experiments with simulated, UCI and published microarray data collections.

Weighted triple quality (WTQ) algorithm [24]: this aims to differentiate the significance of different triples and hence their contribution towards the similarity measure. It is inspired by the initial metric of Adamic and Adar [1], which is used to evaluate the association between personal home pages. In particular, features of the compared pages $p_a$ and $p_b$ are exploited to calculate their similarity score, i.e., $score(p_a, p_b)$, as follows:

$$score(p_a, p_b) = \sum_{\forall z_c \in Z} \frac{1}{\log(freq(z_c))}, \qquad (7)$$

where $Z$ is the set of features shared by home pages $p_a$ and $p_b$, and $freq(z_d)$ represents the number of times feature $z_d$ appears in the studied set of pages. Note that the method gives high weights to rare features and low weights to features that are common to most of the pages. Specific to WTQ, Eq. 7 has been modified to discriminate the quality of the triples shared between a pair of vertices in question. To this point, the quality of each vertex is determined by the rarity of the links connecting it to other vertices in the network. With the aforementioned weighted graph $G = (V, W)$, the WTQ measure of vertices $v_x, v_y \in V$ with respect to the center of a triple $v_z \in V$ is designed as

$$WTQ^z_{xy} = \frac{1}{\sum_{\forall v_t \in N_z} |w_{zt}|} \qquad (8)$$

Here $N_z \subseteq V$ denotes the set of vertices that are directly linked to the vertex $v_z$, such that $\forall v_t \in N_z, |w_{zt}| > 0$. The accumulative WTQ score over all triples $(1 \ldots k)$ between vertices $v_x$ and $v_y$ can be approximated by

$$WTQ_{xy} = \sum_{z=1}^{k} WTQ^z_{xy} \qquad (9)$$

The WTQ algorithm is summarized next.

ALGORITHM: WTQ(G, C_x, C_y)
Input:  G = (V, W), a weighted graph, where C_x, C_y ∈ V;
        N_k ⊂ V, the set of adjacent neighbors of C_k ∈ V;
        W_k = Σ_{∀C_t ∈ N_k} w_tk.
Output: WTQ_xy, the WTQ measure of C_x and C_y.
(1) WTQ_xy ← 0
(2) For each z ∈ N_x
(3)   If z ∈ N_y
(4)     WTQ_xy ← WTQ_xy + 1/W_z
(5) Return WTQ_xy

Then, the similarity $S_{WTQ}(v_x, v_y)$ between vertices $v_x$ and $v_y$ is

$$S_{WTQ}(v_x, v_y) = \frac{WTQ_{xy}}{WTQ_{max}} \times DC, \qquad (10)$$

provided that

$$WTQ_{max} = \max_{\forall v_p, v_q \in V} WTQ_{pq} \qquad (11)$$
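Putting Eqs. 1–11 together, the sketch below derives the refined matrix H from an ensemble of label vectors (such as those produced by the generation sketch in Sect. 3.1). It follows the WCT and WTQ definitions above, but it is an illustrative re-implementation rather than the authors' code, and the function and variable names are assumptions.

```python
import numpy as np
from itertools import combinations

def refined_matrix(ensemble, N, dc=0.9, mode="wct"):
    """Build H in [0,1]^(N x P) per Eq. 1, using WCT (Eqs. 2-6) or
    WTQ (Eqs. 8-11) as the cluster-to-cluster similarity sim()."""
    # Collect the member sets of every cluster across all clusterings.
    members, sizes = [], []
    for lab in ensemble:
        k_g = int(lab.max()) + 1
        sizes.append(k_g)
        for t in range(k_g):
            members.append(set(np.flatnonzero(lab == t)))
    P = len(members)

    # Eq. 2: Jaccard-style weights between every pair of clusters.
    w = np.zeros((P, P))
    for x, y in combinations(range(P), 2):
        inter = len(members[x] & members[y])
        if inter:
            w[x, y] = w[y, x] = inter / len(members[x] | members[y])

    # Eqs. 3-4 (WCT) or Eqs. 8-9 (WTQ), accumulated over the shared
    # neighbours z that form triples with the pair (x, y).
    score = np.zeros((P, P))
    strength = w.sum(axis=1)          # W_z: total edge weight at vertex z
    for x, y in combinations(range(P), 2):
        zs = np.flatnonzero((w[x] > 0) & (w[y] > 0))
        if zs.size:
            if mode == "wct":
                s = np.minimum(w[x, zs], w[y, zs]).sum()
            else:
                s = (1.0 / strength[zs]).sum()
            score[x, y] = score[y, x] = s

    # Eqs. 5-6 / 10-11: scale by the maximum score and decay factor DC.
    sim = dc * score / score.max() if score.max() > 0 else score
    np.fill_diagonal(sim, 1.0)

    # Eq. 1: 1 for the assigned cluster, sim() for its sibling clusters
    # within the same base clustering.
    H = np.zeros((N, P))
    offset = 0
    for g, lab in enumerate(ensemble):
        for i in range(N):
            assigned = offset + int(lab[i])
            H[i, assigned] = 1.0
            for t in range(sizes[g]):
                c = offset + t
                if c != assigned:
                    H[i, c] = sim[assigned, c]
        offset += sizes[g]
    return H
```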
Having achieved the transformed matrix $H = \{x_1, \ldots, x_N\}$, where $x_i = \{x_{i,1}, \ldots, x_{i,P}\}, i = 1 \ldots N$, it can be used to create a classifier, provided that the ground truth or classes of the $N$ data points are known. In other words, the matrix $H$ can be regarded as the input training set for a classification algorithm such as C4.5 (decision tree) or Naive Bayes. After the supervised-learning stage, the resulting classifier can be exploited to categorize an unseen sample $x_u$ into one of the pre-defined classes. Note that this new sample is initially presented with the original $D$ dimensions, i.e., $x_u = \{x_{u,1}, \ldots, x_{u,D}\}$. Thus, it is necessary to transform the aforementioned representation of $x_u$ to $\{x_{u,1}, \ldots, x_{u,P}\}$, which complies with that of the training set ($H$). To accomplish this, the knowledge of the ensemble $\Pi$ and the link-based similarity measures are re-used. Firstly, for each clustering $\pi_g \in \Pi$, the distances between $x_u$ and the centroids of all the clusters belonging to $\pi_g$ justify the crisp memberships or associations that $x_u$ has with these clusters. By repeating this for all clusterings in $\Pi$, a BA-like representation of $x_u$ is created. Then, the variables (corresponding to clusters) in the vector of $x_u$ with '0' values are modified using the similarity amongst clusters that was found during the previous generation of $H$. A minimal sketch of this mapping is given below.
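The sketch assumes that the cluster prototypes of every base clustering, the column offsets of H, and the cluster similarity matrix sim (computed when H was built) were stored at training time; the names are illustrative.

```python
import numpy as np

def transform_unseen(x_u, prototypes, offsets, sim):
    """Map an unseen sample (original D features) into the P-dimensional
    ensemble space. `prototypes[g]` is the (k_g x D) prototype matrix of
    clustering pi_g and `offsets[g]` its first column in H."""
    h_u = np.zeros(sim.shape[0])
    for g, cents in enumerate(prototypes):
        # Crisp membership: the nearest prototype of clustering g.
        nearest = int(np.argmin(np.linalg.norm(cents - x_u, axis=1)))
        assigned = offsets[g] + nearest
        h_u[assigned] = 1.0
        # Refine the zero entries of this clustering with the similarity
        # between each sibling cluster and the assigned one, as in H.
        for t in range(cents.shape[0]):
            c = offsets[g] + t
            if c != assigned:
                h_u[c] = sim[assigned, c]
    return h_u
```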
4 Application to student dropout prediction

This section presents the application of the proposed data-transformation approach to student dropout prediction, and the performance evaluation with different classification models, in comparison to various unsupervised feature transformation techniques.

4.1 Investigated educational dataset

The empirical study is conducted on a specific educational dataset obtained from the operational database system at Mae Fah Luang University, Chiang Rai, Thailand. It consists of 811 records, each belonging to a specific student who joined the university and graduated in the academic years of 2009 and 2012, respectively. The initial fact observed with this dataset is that 271 students (33.42 %) dropped out right after the first year or later. The proposed model is designed as a binary-class prediction model (Class1 = graduate, Class2 = dropout), which is to be exploited in two contexts: (1) identification of graduate/dropout before starting the first-year study, and (2) identification of graduate/dropout after the first year in the university. Table 1 summarizes the 21 features used in this empirical study, with respect to data type and involvement in the aforementioned application contexts. In particular to student's gender, the dataset comprises 253 male and 558 female students, who are originally from 72 different home provinces. Their university entries can be categorized into four types: RQ (Regional Quota), DA (Direct Admission), ADA (Additional Direct Admission), and CA (Conditional Admission, with school GPAX above 2.0). The numbers of students belonging to these types are 222 for RQ, 393 for DA, 178 for ADA, and 18 for CA. The dataset covers students from 26 academic departments of Mae Fah Luang University, e.g., Law, Business Administration, Information Technology, Nursing Science, and Public Health.

In addition to the aforementioned four nominal features, there are 17 numerical variables included in this examination. Five of these represent a student's pre-university academic capability: the overall school grade (S-GPAX) and the average grades across four subject groups (S-GPA1, S-GPA2, S-GPA3 and S-GPA4). The other 12 features, which are applicable only to the second problem context, regard the student's performance in the first year. These are encoded as ratios of the different grades achieved from the collection of registered subjects. Note that the academic assessment is subjected to two grading systems: (a) an 8-level setting, i.e., A, B+ and so on, and (b) a 2-level setting, i.e., S (Satisfied) and U (Unsatisfied). Let $S_i$ be the total number of subjects taken by the $i$th student in the first year, $S_i = S^a_i + S^b_i$, given that $S^a_i$ and $S^b_i$ denote the numbers of subjects with the 8-level and 2-level assessment systems, respectively. The student's ratio of subjects with grade $r \in \{A, B+, B, C+, C, D+, D, F\}$, $RT^r_i$, is estimated by

$$RT^r_i = \frac{r_i}{S^a_i}, \qquad (12)$$

where $r_i \in \{0, \ldots, S^a_i\}$ is the number of subjects with grade $r$ obtained by student $i$. Likewise, the student's ratio of subjects with grade $t \in \{S, U\}$, $RT^t_i$, is defined as

$$RT^t_i = \frac{t_i}{S^b_i}, \qquad (13)$$

where $t_i \in \{0, \ldots, S^b_i\}$ is the number of subjects with grade $t$ obtained by student $i$. In addition, the ratio of withdrawn subjects is

$$RT^W_i = \frac{W_i}{S_i}, \qquad (14)$$

given that $W_i \in \{0, \ldots, S_i\}$ is the number of subjects withdrawn by student $i$. Table 2 illustrates statistical details for each of these numerical features, in terms of value range, maximum, minimum and mean values. A small worked example of the ratio features follows.

Table 1 Description of the investigated dataset, with Context 1 and Context 2 denoting the two problem contexts of before- and after-first-year prediction (n/a = not applicable)

Feature | Data type | Context 1 | Context 2 | Description
Gender | Nominal | Applicable | Applicable | Student's gender
Province | Nominal | Applicable | Applicable | Name of student's home province
Type | Nominal | Applicable | Applicable | Student's type of university entry
Department | Nominal | Applicable | Applicable | Student's academic department
S-GPAX | Numerical | Applicable | Applicable | Student's school grade (overall)
S-GPA1 | Numerical | Applicable | Applicable | Student's school grade (subject type 1)
S-GPA2 | Numerical | Applicable | Applicable | Student's school grade (subject type 2)
S-GPA3 | Numerical | Applicable | Applicable | Student's school grade (subject type 3)
S-GPA4 | Numerical | Applicable | Applicable | Student's school grade (subject type 4)
GPAX | Numerical | n/a | Applicable | Student's university grade
A Ratio | Numerical | n/a | Applicable | Student's ratio of subjects with grade A
B+ Ratio | Numerical | n/a | Applicable | Student's ratio of subjects with grade B+
B Ratio | Numerical | n/a | Applicable | Student's ratio of subjects with grade B
C+ Ratio | Numerical | n/a | Applicable | Student's ratio of subjects with grade C+
C Ratio | Numerical | n/a | Applicable | Student's ratio of subjects with grade C
D+ Ratio | Numerical | n/a | Applicable | Student's ratio of subjects with grade D+
D Ratio | Numerical | n/a | Applicable | Student's ratio of subjects with grade D
F Ratio | Numerical | n/a | Applicable | Student's ratio of subjects with grade F
S Ratio | Numerical | n/a | Applicable | Student's ratio of subjects with grade S
U Ratio | Numerical | n/a | Applicable | Student's ratio of subjects with grade U
W Ratio | Numerical | n/a | Applicable | Student's ratio of withdrawn subjects

Table 2 Statistical details of the numerical features

Feature | Range | Max | Min | Mean
S-GPAX | [0.00–4.00] | 3.96 | 1.98 | 3.06
S-GPA1 | [0.00–4.00] | 4.00 | 1.60 | 3.09
S-GPA2 | [0.00–4.00] | 4.00 | 0.75 | 2.62
S-GPA3 | [0.00–4.00] | 4.00 | 0.00 | 2.78
S-GPA4 | [0.00–4.00] | 4.00 | 0.00 | 3.17
GPAX | [0.00–4.00] | 4.00 | 0.00 | 2.36
A ratio | [0.00–1.00] | 1.00 | 0.00 | 0.12
B+ ratio | [0.00–1.00] | 0.64 | 0.00 | 0.15
B ratio | [0.00–1.00] | 0.67 | 0.00 | 0.17
C+ ratio | [0.00–1.00] | 0.67 | 0.00 | 0.16
C ratio | [0.00–1.00] | 1.00 | 0.00 | 0.14
D+ ratio | [0.00–1.00] | 1.00 | 0.00 | 0.09
D ratio | [0.00–1.00] | 1.00 | 0.00 | 0.08
F ratio | [0.00–1.00] | 1.00 | 0.00 | 0.08
S ratio | [0.00–1.00] | 1.00 | 0.00 | 0.63
U ratio | [0.00–1.00] | 1.00 | 0.00 | 0.19
W ratio | [0.00–1.00] | 0.67 | 0.00 | 0.02
classification models from the original and investigated
4.2 Experimental design

A collection of compared methods includes BA, which can be considered as the baseline model of the proposed methods, and the following well-known techniques. Details of these algorithms are not provided here due to the available space.¹

– PCA, principal component analysis
– KPCA, kernel principal component analysis
– LPP, locality preserving projection [16]
– NPE, neighborhood preserving embedding [15]
– IsoP, isometric projection [6]

¹ An overview and implementation of these dimensionality reduction methods in Matlab are available at http://www.cad.zju.edu.cn/home/dengcai/Data/DimensionReduction.html.

Additional experimental settings are exhibited below; a sketch of the resulting evaluation loop is given after this list.

– For the proposed models, which will be referred to as WCT-T and WTQ-T, k-prototypes is employed to create the M base clusterings, each with a random value of the weighting parameter $\gamma \in [0.1, 1]$. In addition, the value of DC is set to 0.9 for the generation of the data-transformation matrix, with the target ensemble size $M \in \{10, 20, \ldots, 50\}$. For each dataset, the generation of the ensemble and the corresponding matrix is repeated for 20 trials.
– Four conventional techniques, decision tree (C4.5), Naive Bayes, KNN (K = 1, K = 2 and K = 3) and artificial neural network (ANN), are used to generate classification models from the original and investigated data matrices. As for ANN, the back-propagation algorithm is used to train the networks, with weight/bias values being initialized to small random values (between 0.001 and 0.0001) and the hidden layer size set to 10. The learning rate and momentum are set to 0.05 and 0.01, respectively. The training process stops after 2000 iterations, or when the mean squared error drops below 0.001. Given a data matrix, 10-fold cross validation is specifically exploited to determine the classification error rate $\in [0,1]$, where an error rate of 0 indicates the most accurate case with no false positives or false negatives.
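The sketch below illustrates that evaluation loop, using scikit-learn models as stand-ins for the classifiers named above (the original work used C4.5 and a custom back-propagation network, so the substitutions and the function name are assumptions).

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

def error_rates(H, y):
    """10-fold cross-validated error rate of each classifier family on a
    (transformed) data matrix H with binary class labels y."""
    models = {
        "C4.5-like tree": DecisionTreeClassifier(),
        "Naive Bayes": GaussianNB(),
        "KNN (K=1)": KNeighborsClassifier(n_neighbors=1),
        "KNN (K=2)": KNeighborsClassifier(n_neighbors=2),
        "KNN (K=3)": KNeighborsClassifier(n_neighbors=3),
        # Hidden layer of 10 units and up to 2000 iterations, per Sect. 4.2.
        "ANN": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000),
    }
    return {name: 1.0 - cross_val_score(m, H, y, cv=10).mean()
            for name, m in models.items()}
```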


4.3 Experimental results

Table 3 presents the results of the different classification models in the first application context, i.e., prediction of dropout at the beginning of the first-year study. Without actual students' performance at the university, this can be considered a difficult task, with most of the conventional techniques having an error rate of around 0.5. Both WCT-T and WTQ-T usually deliver the lowest error figures across the six classifiers under examination. With more refined information matrices, the proposed methods appear to be more accurate than the conventional BA. According to the table, the combination of WCT-T (Random-k) and ANN provides the most accurate predictive model, with an error rate of 0.09 (i.e., an accuracy level of 91 %).

According to Fig. 2, which illustrates the transformation technique-specific error rates (averages across the six classifiers), all transformation methods commonly improve the accuracy obtained on the original data, with the four ensemble-based matrices appearing to be the most effective. In addition, the results also suggest that PCA is the most accurate among the rest.

Table 4 shows the results with respect to the classification task at the end of the first-year study. As compared to the former, the second problem context is less difficult, with the additional performance related features. These variables appear to be informative towards the prediction of student dropout, given that the error rates with the original data matrix become generally less than 0.2. Again, the data matrices generated by WCT-T and WTQ-T often provide lower error rates than the other transformation techniques. The most accurate model is the combination of WCT-T (Fixed-k) and ANN, which reaches an error rate of 0.06 (i.e., an accuracy level of 94 %). Similar to the assessment framework presented in Fig. 2, it is shown in Fig. 3 that only the ensemble-based techniques can consistently improve the accuracy obtained with the original data. Note that the performance of PCA is comparable to the use of the original data matrix, while the rest cannot achieve the expected improvement. In fact, their accuracies are much lower than that of the original counterpart. For the first problem context, where the available features do not largely correlate with the predictive outcomes, the projection of the original data onto random planes or spaces can provide alternative interpretations that are usually useful to classification modeling. This is shown with the results of using the different data transformation techniques in the first problem context. However, as many more informative features are included in the second context, the aforementioned random embedding may not lead to the same observation. Hence, several methods assessed in this experiment fail to match the quality initially acquired from the original features. Unlike PCA and the similar models discussed previously, WCT-T and WTQ-T operate in the original data space, which is later transformed to data point-cluster associations. Regarding the second problem context, those informative variables are preserved and used to deliver a better data clustering, thus resulting in the lower error rates shown throughout this experimental report.
Table 3 The results for Problem Context 1: classification error rates of the original data matrix and nine different data transformation techniques, with the corresponding standard deviation given in parentheses

Transformation technique | C4.5 | Naive Bayes | KNN (K = 1) | KNN (K = 2) | KNN (K = 3) | ANN
Original data | 0.499 (0.011) | 0.476 (0.006) | 0.495 (0.006) | 0.492 (0.009) | 0.468 (0.006) | 0.503 (0.030)
PCA | 0.113 (0.004) | 0.109 (0.001) | 0.178 (0.002) | 0.187 (0.003) | 0.167 (0.002) | 0.108 (0.011)
KPCA | 0.346 (0.006) | 0.403 (0.002) | 0.346 (0.003) | 0.336 (0.002) | 0.325 (0.002) | 0.305 (0.007)
LPP | 0.340 (0.004) | 0.409 (0.002) | 0.343 (0.004) | 0.335 (0.006) | 0.325 (0.003) | 0.303 (0.006)
NPE | 0.338 (0.006) | 0.319 (0.002) | 0.355 (0.003) | 0.340 (0.005) | 0.325 (0.004) | 0.302 (0.010)
IsoP | 0.340 (0.004) | 0.318 (0.001) | 0.357 (0.004) | 0.339 (0.003) | 0.323 (0.003) | 0.301 (0.004)
BA | 0.118 (0.007) | 0.120 (0.009) | 0.134 (0.005) | 0.136 (0.007) | 0.130 (0.008) | 0.101 (0.008)
WCT-T (fixed-k) | 0.109 (0.005) | 0.092 (0.004) | 0.119 (0.007) | 0.114 (0.004) | 0.099 (0.006) | 0.096 (0.017)
WCT-T (random-k) | 0.104 (0.005) | 0.091 (0.007) | 0.126 (0.005) | 0.115 (0.004) | 0.103 (0.003) | 0.090 (0.003)
WTQ-T (fixed-k) | 0.110 (0.005) | 0.098 (0.005) | 0.120 (0.005) | 0.114 (0.004) | 0.100 (0.006) | 0.094 (0.007)
WTQ-T (random-k) | 0.106 (0.004) | 0.094 (0.005) | 0.126 (0.007) | 0.117 (0.005) | 0.104 (0.004) | 0.101 (0.016)

These results are obtained using 10-fold cross validation. The two lowest error rates for each investigated classifier are highlighted in boldface

Fig. 2 Error rates of the different transformation techniques for the first problem context, as averages across the six classification models

Table 4 The results for Problem Context 2: classification error rates of the original data matrix and nine different data transformation techniques, with the corresponding standard deviation given in parentheses

Transformation technique | C4.5 | Naive Bayes | KNN (K = 1) | KNN (K = 2) | KNN (K = 3) | ANN
Original data | 0.095 (0.006) | 0.085 (0.002) | 0.191 (0.004) | 0.184 (0.007) | 0.173 (0.004) | 0.087 (0.021)
PCA | 0.116 (0.006) | 0.102 (0.003) | 0.172 (0.005) | 0.186 (0.006) | 0.172 (0.005) | 0.077 (0.023)
KPCA | 0.296 (0.009) | 0.518 (0.005) | 0.288 (0.005) | 0.287 (0.010) | 0.298 (0.007) | 0.336 (0.012)
LPP | 0.323 (0.018) | 0.407 (0.005) | 0.291 (0.008) | 0.296 (0.004) | 0.300 (0.009) | 0.326 (0.036)
NPE | 0.335 (0.005) | 0.409 (0.006) | 0.341 (0.006) | 0.323 (0.007) | 0.322 (0.005) | 0.317 (0.014)
IsoP | 0.334 (0.008) | 0.407 (0.004) | 0.342 (0.008) | 0.326 (0.008) | 0.324 (0.009) | 0.320 (0.015)
BA | 0.092 (0.009) | 0.088 (0.007) | 0.119 (0.007) | 0.108 (0.004) | 0.100 (0.006) | 0.089 (0.009)
WCT-T (fixed-k) | 0.079 (0.009) | 0.063 (0.005) | 0.094 (0.009) | 0.073 (0.007) | 0.067 (0.006) | 0.060 (0.006)
WCT-T (random-k) | 0.074 (0.009) | 0.072 (0.005) | 0.103 (0.013) | 0.076 (0.005) | 0.069 (0.004) | 0.074 (0.019)
WTQ-T (fixed-k) | 0.078 (0.008) | 0.062 (0.003) | 0.097 (0.010) | 0.073 (0.006) | 0.064 (0.007) | 0.073 (0.014)
WTQ-T (random-k) | 0.075 (0.006) | 0.073 (0.006) | 0.105 (0.010) | 0.078 (0.006) | 0.069 (0.005) | 0.072 (0.023)

These results are obtained using 10-fold cross validation. The two lowest error rates for each investigated classifier are highlighted in boldface

As the examined dataset is imbalanced in terms of class distribution, it is important to investigate the performance with respect to each of the two classes. Specific to this point, Figs. 4 and 5 show the average error rates, across the six classifiers, for the first and the second problem context, respectively. These results suggest that the proposed methods are marginally more effective at predicting a dropout case than the other. They are generally robust to this binary problem, as compared to the other techniques.

In addition to the previous findings, it is also important to investigate the effect of the algorithmic parameter on the performance of the proposed method. The important variable of WCT-T and WTQ-T that may influence the resulting classification model is the size of the ensemble (M). With respect to WCT-T (Fixed-k), Figs. 6 and 7 summarize the associations between classification error rates and ensemble sizes ($M \in \{10, 20, \ldots, 50\}$) for the first problem context. According to these statistics, all classifiers become more accurate as the size of M increases. On the other hand, based on the results shown in Figs. 8 and 9, such an improvement is not as obvious with the second problem context. In the former case, where the accuracy of the data clustering is not certain, the quality of the ensemble can be enhanced with a large number of diverse members. On the other hand, base clusterings created from informative data are highly duplicated, thus the ensemble size becomes less significant. Note that similar findings have been observed with the other ensemble-based techniques.

Fig. 3 Error rates of the different transformation techniques for the second problem context, as averages across the six classification models

Fig. 4 Error rates of the different transformation techniques for the first problem context, with respect to each class

Fig. 5 Error rates of the different transformation techniques for the second problem context, with respect to each class

Based on the previous findings, WCT-T and WTQ-T are useful as a preprocessing tool for any classification model. They usually provide improved accuracy and a robust algorithmic variable setting. These methods are similarly based on two stages: creating an ensemble of M base clusterings, and summarizing the ensemble as a refined information matrix. The time complexity can be defined as the summation of these stages, $O(M\sqrt{N}ND) + O(NP + P^2)$. As for the first stage, the complexity is subjected to the ensemble size M and that of k-prototypes, $O(kND)$, with the number of clusters $k \approx \sqrt{N}$ in this research. The complexity of the second stage is that of creating the refined matrix (of size $N \times P$) and of the pairwise estimation of WCT or WTQ similarity among the P clusters. Note that P can be simplified as $M\sqrt{N}$, since the number of clusters in each of the M clusterings of the target ensemble is approximately $\sqrt{N}$.

5 Conclusion

This paper has proposed a new data transformation model, which is built upon the summarized data matrix of link-based cluster ensembles (LCE). Like several existing dimension reduction techniques such as PCA and KPCA, this method aims to achieve high classification accuracy by transforming the original data to a new form.

Fig. 6 Error rates of WCT-T (Fixed-k) for problem context 1, with respect to different ensemble sizes (M) and classifiers

Fig. 7 Error rates of WCT-T (Fixed-k) for problem context 1, with respect to different ensemble sizes (M) and classifiers

Fig. 8 Error rates of WCT-T (Fixed-k) for problem context 2, with respect to different ensemble sizes (M) and classifiers

Fig. 9 Error rates of WCT-T (Fixed-k) for problem context 2, with respect to different ensemble sizes (M) and classifiers

The use of cluster ensembles for such a purpose is novel, with the works found in the literature thus far concentrating on stacking a single clustering result with the original variables, and then feeding this combination to the classification process. According to the empirical findings obtained in two different application contexts of student dropout prediction, the proposed method performs better than several state-of-the-art dimensionality reduction techniques. Also, this performance is robust to the setting of the algorithmic parameter. Instead of projecting the original data onto any random or subjective space, WCT-T and WTQ-T simply provide another interpretation of the examined data, in terms of cluster-wise relations. This transformation makes use of the initial variables to disclose proximity and data clustering, without dropping them or mapping them onto new meta-features. With this strategy, information loss in transformation can thus be avoided.

The common limitation of these new techniques is their demanding time complexity, such that they may not scale up well to a very large dataset. Whilst WCT-T and WTQ-T are not quite suited to a highly time-critical application, they can be an attractive candidate for quality-led works, such as the identification of those students at risk of under-achievement. To further improve the dropout predictive model, other information types regarding students' family and university-activity involvement may also be examined. To this end, the application of different clustering techniques [32, 41, 59] for ensemble generation is to be investigated. In addition, future research also includes an approximation framework for these ensemble-based methods, which may reduce the underlying complexity and boost their applicability to a wider range of data sizes. Another alternative may be the use of a more efficient base clustering such as that proposed by Sarma et al. [46]. Recently, the use of data clustering to prepare multiple folds of data for classifier ensembles has been introduced by Verma and Rahman [55]. This has inspired the on-going work, which aims to examine the application of WCT-T and WTQ-T in such a context.

References

1. Adamic LA, Adar E (2003) Friends and neighbors on the web. Soc Netw 25(3):211–230
2. Antons C, Maltz E (2006) Expanding the role of institutional research at small private universities: a case study in enrollment management using data mining. New Dir Inst Res 131:69–81
3. Baepler P, Murdoch CJ (2010) Academic analytics and data mining in higher education. Int J Scholarsh Teach Learn 4(2):1–9
4. Bala M, Ojha DB (2012) Study of applications of data mining techniques in education. Int J Res Sci Technol 1:1–10
5. Boongoen T, Shang C, Iam-On N, Shen Q (2011) Extending data reliability measure to a filter approach for soft subspace clustering. IEEE Trans Syst Man Cybern Part B 41(6):1705–1714
6. Cai D, He X, Han J (2007) Isometric projection. In: Proceedings of AAAI Conference on Artificial Intelligence, pp 528–533
7. Carroll J, Green P, Chaturvedi A (1997) Mathematical tools for applied multivariate analysis. Academic Press, San Diego, CA
8. Dettling M, Buhlmann P (2003) Boosting for tumor classification with gene expression data. Bioinformatics 19:1061–1069
9. Duda RO, Hart PE, Stork DG (2012) Pattern classification. Wiley-Interscience, New York
10. Erdogan SZ, Timor M (2005) A data mining application in a student database. J Aeronaut Space Technol 2(2):53–57
11. Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proceedings of International Conference on Machine Learning, pp 36–43
12. Fischer B, Buhmann JM (2003) Bagging for path-based clustering. IEEE Trans Pattern Anal Mach Intell 25(11):1411–1415
13. Fred ALN, Jain AK (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850
14. Harb HM, Moustafa MA (2012) Selecting optimal subset of features for student performance model. Int J Comput Sci Issues 9(5):253–262
15. He X, Cai D, Yan S, Zhang HJ (2005a) Neighborhood preserving embedding. In: Proceedings of International Conference on Computer Vision, pp 1208–1213
16. He X, Yan S, Hu Y, Niyogi P, Zhang HJ (2005b) Face recognition using laplacianfaces. IEEE Trans Pattern Anal Mach Intell 27(3):328–340
17. Horstmanshof L, Zimitat C (2007) Future time orientation predicts academic engagement among first-year university students. Br J Educ Psychol 77(3):703–718
18. Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2:283–304
19. Iam-On N, Boongoen T (2013) Revisiting link-based cluster ensembles for microarray data classification. In: Proceedings of IEEE International Conference on Systems, Man and Cybernetics, pp 4543–4548
20. Iam-On N, Garrett S (2010) LinkCluE: A MATLAB package for link-based cluster ensembles. J Stat Softw 36(9):1–36
21. Iam-On N, Boongoen T, Garrett S (2008) Refining pairwise similarity matrix for cluster ensemble problem with cluster relations. In: Proceedings of Eleventh International Conference on Discovery Science, pp 222–233
22. Iam-On N, Boongoen T, Garrett S (2010) LCE: A link-based cluster ensemble method for improved gene expression data analysis. Bioinformatics 26(12):1513–1519
23. Iam-On N, Boongoen T, Garrett S, Price C (2011) A link-based approach to the cluster ensemble problem. IEEE Trans Pattern Anal Mach Intell 33(12):2396–2409
24. Iam-On N, Boongoen T, Garrett S, Price C (2012) A link-based cluster ensemble approach for categorical data clustering. IEEE Trans Knowl Data Eng 24(3):413–425
25. Iam-On N, Boongoen T, Garrett SM, Price C (2013) New cluster ensemble approach to integrative biological data analysis. Int J Data Min Bioinform 8(2):159–168
26. Kabra RR, Bichkar RS (2011) Performance prediction of engineering students using decision trees. Int J Comput Appl 36(11):8–12
27. Kim E, Kim S, Ashlock D, Nam D (2009) MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering. BMC Bioinform 10:260
28. Koedinger K, Cunningham K, Skogsholm A, Leber B (2008) An open repository and analysis tools for fine-grained, longitudinal learner data. In: Proceedings of First International Conference on Educational Data Mining, pp 157–166
29. Kongsakun K, Fung CC (2012) Neural network modeling for an intelligent recommendation system supporting SRM for universities in Thailand. WSEAS Trans Comput 11(2):34–44
30. Kotsiantis S, Pierrakeas C, Pintelas P (2004) Prediction of student's performance in distance learning using machine learning techniques. Appl Artif Intell 18(5):411–426
31. Luan J, Zhao CM (2006) Practicing data mining for enrollment management and beyond. New Dir Inst Res 31(1):117–122
32. Ma J, Tian D, Gong M, Jiao L (2014) Fuzzy clustering with non-local information for image segmentation. Int J Mach Learn Cybern 5(6):845–859
33. Miller LD, Soh LK (2013) Meta-reasoning algorithm for improving analysis of student interactions with learning objects using supervised learning. In: Proceedings of 6th International Conference on Educational Data Mining, pp 129–136
34. Monti S, Tamayo P, Mesirov JP, Golub TR (2003) Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1–2):91–118
35. Mostow J, Beck J (2006) Some useful tactics to modify, map and mine data from intelligent tutors. Nat Lang Eng 12:195–208
36. Nasierding G, Tsoumakas G, Kouzani AZ (2009) Clustering based multi-label classification for image annotation and retrieval. In: Proceedings of IEEE International Conference on System, Man and Cybernetics, pp 4514–4519
37. Nguyen DV, Rocke DM (2002) Tumor classification by partial least squares using microarray gene expression data. Bioinformatics 18:39–50
38. Nguyen HH, Harbi N, Darmont J (2011) An efficient fuzzy clustering-based approach for intrusion detection. In: Proceedings of IEEE International Conference on Data Mining, pp 607–612
39. Noble K, Flynn NT, Lee JD, Hilton D (2007) Predicting successful college experiences: evidence from a first year retention program. J Coll Stud Retent Res Theory Pract 9(1):39–60
40. Ramaswami M, Bhaskaran R (2010) A CHAID based performance prediction model in educational data mining. Int J Comput Sci 7(1):10–18
41. Rana S, Jasola S, Kumar R (2013) A boundary restricted adaptive particle swarm optimization for data clustering. Int J Mach Learn Cybern 4(4):391–400
42. Reuther P, Walter B (2006) Survey on test collections and techniques for personal name matching. Int J Metadata Semant Ontol 1(2):89–99
43. Romero C, Ventura S (2010) Educational data mining: a review of the state-of-the-art. IEEE Trans Syst Man Cybern Part C 40:601–618
44. Romero C, Ventura S (2013) Data mining in education. Wiley Interdisciplinary Reviews: Data Min Knowl Discov 3(1):12–27
45. Sang-Woon K (2010) A pre-clustering technique for optimizing subclass discriminant analysis. Pattern Recognit Lett 31(6):462–468
46. Sarma TH, Viswanath P, Reddy BE (2013) A hybrid approach to speed-up the k-means clustering method. Int J Mach Learn Cybern 4(2):107–117
47. Sittichai R (2012) Why are there dropouts among university students? Experiences in a Thai University. Int J Educ Dev 32:283–289
48. Strayhorn TL (2009) An examination of the impact of first-year seminars on correlates of college student retention. J First Year Exp Stud Transit 21(1):9–27
49. Strehl A, Ghosh J (2002) Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
50. Subyam S (2009) Causes of dropout and program incompletion among undergraduate students from the faculty of engineering, King Mongkut University of Technology North Bangkok. In: Proceedings of 8th National Conference on Engineering Education
51. Sung-Hyuk C, Tappert C (2009) Constructing binary decision trees using genetic algorithms. J Pattern Recognit Res 1:1–13
52. Tinto V (2006) Research and practice of student retention: What next? J Coll Stud Retent Res Theory Pract 8(1):1–20
53. Topchy AP, Jain AK, Punch WF (2005) Clustering ensembles: Models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell 27(12):1866–1881
54. Vandamme J, Meskens N, Superby J (2007) Predicting academic performance by data mining methods. Educ Econ 15(4):405–419
55. Verma B, Rahman A (2012) Cluster-oriented ensemble classifier: Impact of multicluster characterization on ensemble classifier learning. IEEE Trans Knowl Data Eng 24(4):605–618
56. West M, Blanchette C, Fressman H, Huang E, Ishida S, Spang R, Zuan H, Marks JR, Nevins JR (2001) Predicting the clinical status of human breast cancer using gene expression profiles. Proc Natl Acad Sci USA 98(20):11462–11467
57. Xue H, Chen S, Yang Q (2009) Discriminatively regularized least-squares classification. Pattern Recognit 42(1):93–104

58. Yadav SK, Pal S (2012) Data mining application in enrollment management: a case study. Int J Comput Appl 41(5):1–6
59. Yeung DS, Wang XZ (2002) Improving performance of similarity-based clustering by feature weight learning. IEEE Trans Pattern Anal Mach Intell 24(4):556–561
60. Yu C, Gangi SD, Jannasch-Pennell A, Kaprolet C (2010) A data mining approach for identifying predictors of student retention from sophomore to junior year. J Data Sci 8:307–325
