Documente Academic
Documente Profesional
Documente Cultură
Advanced Computing and Communication Techniques for High Performance Applications (ICACCTHPA-2014)
L. Arockiam, Ph.D.
Research Scholar,
Department of Computer Science,
St. Josephs College (Autonomous),
Tiruchirappalli, Tamilnadu, India.
Associate Professor,
Department of Computer Science,
St. Josephs College (Autonomous),
Tiruchirappalli, Tamilnadu, India.
ABSTRACT
Big Data Analytics is the new technology for extracting
hidden information from the large datasets or data deluge, due
to its volume, variety, and velocity. This paper presents the
overview of big data, its available technologies and tools and
discusses the open issues of big data. Big data plays a
significant role in education sector. Everything has become
digitalized in the educational institutions, which leads to store
and process enormous amount of data. Handling the huge
amount of data is complex. The main focus of this paper is to
propose a new approach to analyze the large streaming data
that produced by web server logs of educational institution.
The result represents the students web usage behavior, which
supports to make better decisions to improve the students
performance and suggest recommendations for their academic
perspectives.
Keywords
Big Data, Big Data Analytics, Web Usage Mining.
1. INTRODUCTION
The Internet Technology makes possible to share and access
data from anywhere at any time around the world. Data
generating by the users are massive and unpredictable.
Everything is digitalized and stored in the repository. The
growth of data will never stop. For example, the Yahoo
warehouse totaled 170 petabytes (8.5 times of all hard disk
drives created in 1995) [1]. The IDC survey reported that, the
digital universe will grow by a factor of 10 from 2013 to
2020, i.e., from 4.4 trillion gigabytes to 44 trillion. It more
than doubles every two years [2]. This massive growth of data
leads to the new technology innovation called Big Data. Big
Data is the buzz word, where it can be described with the
following characters such as Volume (size of data), Velocity
(streaming data), Variety (sources of data such as text,
images, videos, audios, etc.), Veracity (uncertainty data) and
Value (decision making). According to the leading research
team Gartner [3] defined big data as Big data is highvolume, high-velocity and high-variety information assets that
demand cost-effective, innovative forms of information
processing for enhanced insight and decision making. These
enormous amounts of data makes possible to analyze and
delivers better decision support. Thus big data becomes a hot
topic, since technologies such as modern data mining
technique, machine learning, and statistical models are help to
analyze the available data for the business intelligence. Big
data plays major role in all the fields, since processing of data
is growing day-by-day. The prominent applications of big
data include weather forecasting, social networks,
telecommunications, agriculture, e-governance, e-commerce,
and education, health care and so on. In the earlier days
organization generated data and public consumed the data.
Nowadays public generates and consume the data.
11
12
13
The web logs contain noisy data. Using the web usage mining
technique [9], the raw web logs are cleaned and user
segmentation will be identified. The cleaned data may further
convert into the required data format. For example, MOA
(Massive Online Analysis) provides various techniques and
algorithms to analyze the streaming data [10]. Using the
student web usage logs and implementing the stream mining
techniques, the log data may categorize by most viewed sites,
most listened lectures and accessed e-books. Through this we
could recommend or suggest useful information and websites
by using the recommendation analysis. We can also compare
the student performance based on their internet usage and the
marks obtained in the academic examination. This result can
improve the student academic career. Web usage mining
consists of three main steps [11]:
Preprocessing
Pattern Discovery
Pattern Analysis
Raw Data
Contents
User name
Path
Traversed
Time &
Date
Time stamp
Page last
visited
Bytes
User Agent
Request
type
Preproce
ssing
Pattern
Discovery
Pattern
Analysis
path analysis
decriptive and predictive techniques
14
Edu-cloud
E-content
Web server
Web logs
Preprocessing
Student Behavior
Reports
Students
Column-Oriented Databases.
Statistical Learning
Genetic Algorithms.
Machine Learning.
Visualization.
MapReduce.
15
The other major challenges [18] includes not only the issues
of scale, but also heterogeneity, lack of new data structure,
error-handling, security and privacy, timeliness, and
visualization, at all levels of the study from data acquisition to
result interpretation. These technological challenges are
common across diversity of application domains, and it is not
cost-effective to concentrate on the context of one domain
alone.
7. CONCLUSION
This paper discussed the impact of big data and cloud
computing in the education sector. On the other hand, many
technical challenges are discussed in this paper. This paper
has proposed a framework to improve the academic
performance of the student by analyzing the web logs. In
future, our work will focus on collaborating cloud services
and big data analytics in the educational sector, the
collaborative model will be cost-effective and supports better
decision making in the educational sector.
8. REFERENCES
[1] https://www.ida.gov.sg/~/media/Files/InfocommLandsca
pe/Technology/TechnologyRoadmap/BigData.pdf, as on
08-5-2014
[2] IDC. the 2011 Digital Universe Study: Extracting Value
from Chaos, http://www.emc.com/collateral/analystreports/idc-extracting-value-from-chaos-ar.pdf, as on 0805-2014.
[3] Gartner, http://www.gartner.com/it-glossary/bigdata
[4] Infocomm Development Authority of Singapore, Big
Data, 30, Nov 2012.
[5] Jiawei Han and Micheline Kamber, Data Mining
Concepts and Techniques, chapter - 1
[6] D. Pratiba, G. Shobha, Educational BigData Mining
Approach in Cloud: Reviewing the Trend, IJCA (0975
8887), Volume 92 No.13, April 2014.
[7] R. Sallam, M. Beyer, N. Heudecker, Key Trends in Big
Data Technologies, An Article from the Connected
Buisness, 2013.
[8]
http://httpd.apache.org/docs/current/logs.html,
source as on 07/05/2014.
internet
internet
16
17