
eChallenges e-2015 Conference Proceedings

Paul Cunningham and Miriam Cunningham (Eds)


IIMC International Information Management Corporation, 2015
ISBN: 978-1-905824-52-6

An Open Source Implementation of Business Intelligence Systems:
A Case of the Technical University of Kenya
Ishmael OBONYO1, Elisha OPIYO1, William O. OKELO1, Bernard MANDERICK2
1University of Nairobi, School of Computing, P.O. Box 30197-00100, Kenya,
Emails: ishmaelny@students.uonbi.ac.ke; opiyo@uonbi.ac.ke; wokelo@uonbi.ac.ke
2Free University of Brussels, Artificial Intelligence Lab, 1050 Brussels, Belgium
Email: bmanderi@vub.ac.be
Abstract: Recent advances in information and communication technology have
enabled organizations to develop innovative ways of intelligently collecting data that were not
possible before. This has led to an explosion of data and to unprecedented challenges in
making strategic and effective use of this data for reporting and decision making.
Business Intelligence (BI) systems fill the gap by providing timely, accurate, and actionable
information to the right person, enabling quick and correct decisions. This paper describes
an open source implementation of a BI system, taking the case of the Technical University of
Kenya. The implementation was achieved using a Hadoop cluster integrated with the R
statistical software. A data warehouse was developed. Various analytics were performed,
such as student admission trends and staff distribution patterns. Different user dashboards
were created. The study recommends a real-time BI system that incorporates
more data sources for improved analytics.

1. Introduction
To manage an organization effectively, timely access to the right information about
that organization by the right person is key; this enables monitoring of activities and
assessment of the organization's performance [1]. Access to such information still poses a
challenge in organizations; information systems collect and process vast amounts of data in
various forms that are not readily perceptible to decision makers [1]. Decisions in organizations
are made by human beings and not by management information systems. As a result,
presentation of data plays a very important role in any decision making process, and for a
strategy based on any decision to be successful, the information has to be comprehensive and
perceptive [2][3]. It is argued that Management Information Systems (MIS) have been
supporting organizations in their different tasks; however, today the majority of these systems have
undergone significant depreciation [4]. This is because such systems have not met decision
makers' expectations, such as: making decisions under pressure; monitoring competition;
possessing information on their organizations that includes different points of view; and
carrying out constant analyses of numerous data and considering different variants of
organization performance [4][5].
Business Intelligence (BI) systems come to the rescue of decision makers. Business
Intelligence is regarded as the process of taking large amounts of data, analyzing it, and
presenting a high-level set of reports that condense that data into the basis of business actions,
enabling management to make vital day-to-day business decisions [6]. The implementation of
business intelligence systems can contribute to improved information quality in various ways:
faster access to information, easier querying and analysis, a higher level of interactivity, and
improved data consistency due to data integration processes, among others [7]. The benefits of
having a Business Intelligence system cannot be overemphasized; universities, like other
organizations, also require such systems.
Copyright 2015 The Authors www.eChallenges.org Page 1 of 7
Today more and more organizations are turning towards BI for making better business
decisions. This is supported by Gartner's report, which states that BI applications have been
ranked the top technology priority for four years in a row [8]. In learning institutions
specifically, it is argued that new data analytics approaches are creating new ways of
understanding trends and behaviours in students that can be applied in improving learning
design, strengthening student retention, providing early warning signals concerning individual
students and helping to personalise the learner's experience [9].
It has been posited that closed and commercial BI tools have dominated the market, as opposed
to open source ones [10]. Despite the popularity of BI in industry, there has been little focus
on its implementation in institutions of higher learning, especially in Kenya.
This study describes an open source implementation of a Business Intelligence system, taking
the case of the Technical University of Kenya (TUK). TUK was chosen as the study area because one
of the authors had over three years of working experience at the University as a software developer.
The challenges that the University was facing with regard to access to valuable, correct, timely,
and actionable information depicting the whole business picture for effective decision making
are elaborated. The most challenging decisions included determining: the student
admission trend; qualified students for various scholarships in the university; the success factors of
the various programmes offered; budgets for various departments; the best performing faculty or school
in the university; the student retention level; staff employment legal requirements; needy students
for hostel accommodation; and expansion of the university, among others.
According to Gartner's maturity model for Business Intelligence [11], TUK could be said to
be at the tactical stage of BI maturity: the management seemed to be starting to invest in BI;
metrics were used only at departmental level; most of the data, tools, and applications were in
silos; and users seemed not skilled enough to take advantage of a BI system. In this regard,
issues relating to information access for decision making could be broadly grouped into: data
acquisition, storage, cleaning, integration, and summarization; and presentation and timely distribution of
the generated information to the right users.
The challenge in data acquisition was brought about by the presence of multiple data sources
held in silos across departments and existing in various formats. The acquisition process was
mainly manual, and in most cases was done by IT officers. Unstructured data existed in log files, web
sites, emails, social media, and official documents like memos; it was difficult to store, clean,
integrate and summarize this kind of data together with structured data from the MIS. The decision makers
needed information from these multiple disconnected data sources across all departments.
Presentation and timely distribution of information to the right users was also a challenge; too much time and money
was spent on generating the required reports, which were produced using spreadsheets.
Valuable team members were busy creating reports instead of making decisions on the report
data. On one occasion, a retreat of some senior managers had to be organized to collate data on staff
ethnic balance, which was required urgently by a Government agency. These reports were
often rigid rather than dynamic, preventing data analysis and drill-down. Data quality was poor and not
validated, and the data was therefore not fully trusted; concerns over data accuracy eroded the confidence needed to make
important decisions. Some reports were only available monthly or quarterly, whereas users
needed them on demand.

2. Objectives
The main aim of this study was to implement a Business Intelligence system for the Technical
University of Kenya using open source tools. The specific objectives were:
1. To investigate appropriate open source tools in the implementation of the BI system.
2. To design and develop the University data warehouse.
3. To perform analytics pertaining to the University.
4. To implement dashboards for different BI users.



3. Methodology
The implementation team consisted of a software developer who doubled as the project
manager, a BI consultant (one of the authors), and five TUK operational managers. First,
implementation tools were selected; the open source tools that were adopted included the
Hadoop ecosystem and the R statistical software. It is argued that open source BI has lower
development costs, has broader deployment methods, and has a huge community base for
continual improvement [12]. A data warehouse was required; the study adopted Kimball's
bottom-up dimensional modelling to design it. The approach to designing a data warehouse or BI system
depends on the business objectives of an organization, the nature of the business, the time and cost
involved, and the level of dependencies between various functions [13]. As a result, there
was first a need to capture the business requirements. The following operational managers of the University
were interviewed: the Academic Registrar, the Examinations and Certification Officer, the Director of
Student Support Services, the Admissions and Recruitment Officer and the Director of ICT
Services. Using the business needs gathered from these officers, a data warehouse was modelled.

Figure 1: System Architecture
[Layers, from bottom to top. Dispersed data: external data sources, Student MIS, Students Portal, E-Learning Portal. ETL: Sqoop, Hive MetaStore, Flume. Integrated data: University Data Warehouse (Hive, Impala). Information: data mining (R). Knowledge: BI web portal (powered by HUE) as a single source of truth, offering dashboards (Cloudera Search and Solr), online analysis (data visualization), and standard and ad hoc reports (Hive queries). Strategy: decision makers.]


Once the data warehouse had been modelled, implementation followed. This mainly
consisted of setting up the host machine (a single-node cluster), Hadoop installation and
configuration, setting up the programming environment, data warehouse development, data
extraction from sources, transformation and loading, dashboard creation, and data analysis.
The Hadoop cluster and R were installed. The Eclipse Integrated Development Environment (IDE) was
used to code and debug the BI system source code. The data schema was developed as a star
schema based on Kimball's dimensional modelling discussed above. Data from different
sources was transferred to the Hive data warehouse using Hive table MetaStores. Pig scripts were
also written for some extraction, transformation and loading procedures. Hadoop User
Experience (HUE) was configured and used to develop a web portal for accessing the data
warehouse. Different dashboards were created using Apache Solr and Cloudera Search. Data was
analysed to generate the required reports and to answer business questions, and
appropriate dashboards were generated using Apache Solr. Hive queries were also used to analyze and
visualize data. After the implementation of the user dashboards, use case tests were carried out
to get user feedback. These were carried out with a section of the operational staff members and a few of the top
managers interviewed earlier. A cost-benefit analysis was also carried out to justify the
introduction of the BI system into the University. The implementation process took about
seven months. The system architecture was based on the traditional Business Intelligence
architecture by Chaudhuri, Dayal & Narasayya [14].
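
To make the star schema idea concrete, the following is a minimal sketch, not the schema actually built at TUK. It uses Python's built-in sqlite3 module as a stand-in for Hive so that it runs anywhere; all table names, column names and rows (fact_admission, dim_programme, dim_county, dim_date, "County X", and so on) are hypothetical and purely illustrative. It shows one fact table joined to its dimension tables, a roll-up query (admissions per year) and a drill-down query (admissions per year and county).

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny star schema: one fact table referencing three dimensions.

    All names are hypothetical; the paper does not list its actual tables.
    """
    conn.executescript(
        """
        CREATE TABLE dim_programme (programme_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_county (county_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE fact_admission (
            programme_id INTEGER REFERENCES dim_programme(programme_id),
            county_id INTEGER REFERENCES dim_county(county_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            admitted INTEGER
        );
        """
    )


def load_sample_rows(conn):
    """Load made-up rows so the queries below have something to aggregate."""
    conn.executemany(
        "INSERT INTO dim_programme VALUES (?, ?)",
        [(1, "Programme A"), (2, "Programme B")],
    )
    conn.executemany(
        "INSERT INTO dim_county VALUES (?, ?)",
        [(1, "County X"), (2, "County Y")],
    )
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(1, 2013), (2, 2014)])
    conn.executemany(
        "INSERT INTO fact_admission VALUES (?, ?, ?, ?)",
        [
            (1, 1, 1, 40), (1, 2, 1, 25), (2, 1, 1, 30),
            (1, 1, 2, 35), (1, 2, 2, 30), (2, 2, 2, 20),
        ],
    )


def admissions_by_year(conn):
    """Roll-up: total admissions per year across all programmes and counties."""
    return conn.execute(
        """
        SELECT d.year, SUM(f.admitted)
        FROM fact_admission f JOIN dim_date d ON f.date_id = d.date_id
        GROUP BY d.year ORDER BY d.year
        """
    ).fetchall()


def admissions_by_year_and_county(conn):
    """Drill-down: the same measure broken down further by county."""
    return conn.execute(
        """
        SELECT d.year, c.name, SUM(f.admitted)
        FROM fact_admission f
        JOIN dim_date d ON f.date_id = d.date_id
        JOIN dim_county c ON f.county_id = c.county_id
        GROUP BY d.year, c.name ORDER BY d.year, c.name
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    load_sample_rows(connection)
    print(admissions_by_year(connection))
    print(admissions_by_year_and_county(connection))
```

The same join-and-group pattern carries over to a Hive warehouse, where the fact and dimension tables would be defined over files in HDFS rather than in an in-memory database.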

4. Technology Description
4.1 Big Data Analytics
In today's businesses, increasing standards, automation, and technologies have led to vast
amounts of data becoming available. Data warehouses and improved extract, transform and load
(ETL) and reporting technologies [15] enable Business Intelligence by sifting through large
amounts of data, extracting pertinent information, and turning that information into knowledge
upon which actions can be taken. It has lately been noted that the cost of data acquisition and
data storage has declined significantly, increasing the appetite of businesses to acquire
very large volumes of data in order to extract as much competitive advantage from it as possible [14].
This has led to the birth of the term Big Data, which denotes datasets whose size is beyond the
ability of typical database software tools to capture, store, manage, and analyze [16]. Big Data
Analytics is the process of analysing and examining large volumes of data of a variety of types
so as to uncover hidden patterns, unknown correlations and other useful information, with the
aim of helping organisations to make better decisions and possibly gain competitive advantages
[17]. Big Data Analytics can be carried out using these seven techniques: association rule
learning; classification tree analysis; genetic algorithms; machine learning; regression analysis;
sentiment analysis; and social network analysis [18]. In the education sector, Big Data Analytics
can be applied to predict student success or dropout from school by analysing data from
Learning Management Systems (LMS) and other student online systems; to assess e-book and mobile
device utilization and its effect on students by analysing book usage, course content, and content
presentation; and to track the finance and budgeting of educational institutions to identify new market
opportunities [19]. Furthermore, it is argued that institutions can apply data mining techniques
and analytics to gain an understanding of different topics, such as administrative and
instructional applications, recruitment, admission processing, financial planning, donor tracking,
and student performance monitoring [20].
4.2 Hadoop Framework
Hadoop is an Apache open source framework that allows for the distributed processing of large
data sets across clusters of computers using simple programming models. It is designed to scale up
from single servers to thousands of machines, each offering local computation and storage [21].
Hadoop is composed of two main components: the Hadoop Distributed File System (HDFS) and
MapReduce. HDFS is a distributed file system designed to
run on commodity hardware; it is highly fault-tolerant and is designed to be deployed on low-cost
hardware [22]. MapReduce is a programming model and an associated implementation for
processing and generating large datasets that is amenable to a broad variety of real-world tasks
[23]. Hadoop thus provides reliable distributed storage through HDFS and an analysis system through
MapReduce, with a high degree of fault tolerance [25]. Hadoop also includes an ecosystem of other
projects built on top of HDFS and MapReduce that help achieve certain operations on the
platform.
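
The MapReduce programming model described above can be illustrated with a small, single-process sketch in plain Python. This is not Hadoop code and does not use any Hadoop API; it only mimics the three logical steps (map, shuffle, reduce) on an in-memory list. The record layout (a dictionary with a "year" key) and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit (key, value) pairs; here one (admission year, 1) per record."""
    yield record["year"], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key; here a simple sum."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    # Hypothetical student records; only the "year" field is used.
    sample_records = [{"year": 2013}, {"year": 2014}, {"year": 2013}]
    print(run_job(sample_records))
```

In a real cluster the map and reduce functions run in parallel on different machines and the shuffle moves data between them over the network; the logic of each step is the same.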



4.3 R Statistical Software
R is a language and environment for statistical computing and graphics. It is a GNU
project similar to the S language and environment, which was developed at Bell Laboratories by
John Chambers and colleagues [26]. R falls short when it comes to analyzing large data, since the
data size that R can manipulate is dictated by the random access memory (RAM) of the
computer hosting R. Hadoop and R are therefore quite complementary in terms of visualization
and analytics of big data.

5. Results
The Hadoop framework and R were used in the implementation of the BI system. A data warehouse
was designed and developed using Apache Hive, one of the Apache projects under Hadoop.
Data was extracted from internal sources (such as the MIS and the students portal), official documents like
memos, and external sources. The total size of the data collected was a few gigabytes. Data was
loaded into HDFS in the form in which it was retrieved. Appropriate schemas on the data
warehouse were used in transferring data from HDFS to the data warehouse. With this, it was
possible to develop a fully working data warehouse using Apache Hive and Impala. Data
analysis was carried out on the data warehouse using ad hoc queries and data visualization. The
data warehouse powered by Hive was accessed through Hue. A number of analytics were
achieved. Among the interesting reports generated were those on student admission trends, student
neediness profiling for accommodation and scholarships, and staff composition. Using a
generalized linear model in R, the future student admission pattern could be predicted; because some
data were missing, the reason behind the admission decline was not clear. When student data
from the Student MIS was combined with other sources, like the students online portal and the financial system
(Sage Pastel), a near complete student profile was generated, which enabled the Directorate of
Student Support Services to determine the neediness levels of various students. This enabled
faster allocation of hostel rooms to students. The neediness level list was anticipated to be used
as a basis for the allocation of the various scholarships that are usually granted. On the admission trend,
information on regional distribution was revealed, whereby admissions from one of the counties thought to be
popular with the University seemed to be declining. Different dashboards were created for
different user needs. The staff composition report enabled the institution to submit an urgent
report required by the Auditor General and the Kenyan Senate; previously, this would have required the officers
involved to work overtime. The dashboards included those for course applications, student ranking by
neediness level, and student admissions. These dashboards were interactive enough to allow users
to drill down and roll up.
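
The admission trend prediction mentioned above was done with a generalized linear model in R; the paper does not state the model family, link function or predictors. As a hedged illustration only, the sketch below fits the simplest special case, a Gaussian family with identity link (ordinary least squares), of yearly admissions against year, written in plain Python rather than R. The function names and the numbers in the usage example are made up and are not results from the study.

```python
def fit_linear_trend(years, counts):
    """Fit counts = intercept + slope * year by ordinary least squares.

    This is the Gaussian, identity-link special case of a generalized linear
    model. Returns (intercept, slope).
    """
    if len(years) != len(counts):
        raise ValueError("years and counts must have the same length")
    n = len(years)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, year):
    """Predict the admission count for a given year from a fitted model."""
    intercept, slope = model
    return intercept + slope * year


if __name__ == "__main__":
    # Illustrative, made-up figures showing a declining trend.
    observed_years = [2011, 2012, 2013, 2014]
    observed_counts = [520, 500, 470, 455]
    model = fit_linear_trend(observed_years, observed_counts)
    print("slope per year:", model[1])
    print("predicted 2015 admissions:", predict(model, 2015))
```

A count-based family such as Poisson may be more appropriate for admission numbers; that choice, like the single predictor used here, is an assumption of this sketch rather than something reported in the study.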

6. Business Benefits
Understanding where the value of information technology lies, and how to measure that value,
has remained an important issue for both managers and academics [27]; hence it was crucial
to evaluate the business benefits of the BI implementation at TUK. Business intelligence (BI) is
one area of IT in which traditional evaluation techniques may perform poorly, as many of the
benefits are strategic and consequently not easily quantifiable [28]. Most benefits were thus
intangible. This study used the Process Model with six critical factors, shown in Table 1. It was chosen because it is a newer approach
that acknowledges traditional evaluation problems, and because its data warehousing focus is closely related
to BI systems and might therefore prove useful [29].
Table 1: Critical Factors for Evaluating Business Benefits
Critical Factor | Explanation | Perceived Evidence
Problems | Evaluating intangible benefits | Delays in report generation; difficulty in data consolidation due to multiple data sources available in different formats; rigid reports that were not dynamic, preventing data analysis and drill-down; poor data quality.
Economic environment analysis | Determined criticality of intangibles | Extent of holding expensive retreats for certain report generation; staff overtime payment for management staff to prepare reports; lack of confidence in making important decisions using certain reports; delays in making important decisions, thus losing business opportunities.
Information intensity analysis | Separated customer requirements from internal intangibles | Main customers are students, staff and stakeholders. Critical need for an accurate number of students, staff and suppliers; need for a complete business picture of students, such as demographics, academic performance, financial status, co-curricular activities, health status, and so on; a complete staff profile required for decision making in recruitment, staff appraisal, expansion, and so on.
Commitment and sponsorship | Showed high-level appreciation for importance of intangibles | Though the project was initiated by the authors after reviews, management officers showed enthusiasm and were ready to convince top management to adopt it; there was no BI system in place at the time; the management staff members involved were receptive and supported the whole implementation process.
Approach to evaluation | Categorized intangibles | Evaluation identified the following intangible benefits: consistent and centralized data; timeliness of key information for decision making; reduced cost of report generation; confidence in the reports; dynamic reports allowing drill-down and ad hoc queries.
Time scale of benefits | Managed time scale to yield quick wins | After the initial version, applicant and student data had been integrated into a data warehouse showing admission trends, where staff members could visualize reports as graphs and pie charts.
Appraisal techniques | Compliance | During preparation of the Institution's strategic plan, urgently required staff information was provided, which aided faster decisions; the staff composition report enabled a quick reply to the Auditor General and Senate; student profiling enabled quick allocation of rooms to students, which would previously take weeks.

7. Conclusions
This study implemented a BI system for the Technical University of Kenya using
open source tools, mainly Hadoop and R. Reports that could take weeks or
months to generate were now available at the click of a button. The student profiling enabled
analysis of students from various points of view through slicing, dicing and drill-downs.
Although the top management was not involved in the study, some tangible
results were still achieved. Due to constraints during the study, various departments were not
integrated into the data warehouse; it would be interesting to incorporate data from all these departments,
and even from external sources like social media. Although the authors were
interested in the open source technical implementation of the system, involving the top
management, such as the Vice Chancellor and the Deputy Vice Chancellors, in the study would likely
have improved the success of the BI implementation. It would also be interesting to make
the system near real-time, with the BI system integrated with the data sources.
The issues at TUK are evident in other typical organizations in Kenya. In effect, the BI
system implementation approach could be replicated in other organizations. Universities in
Kenya have fairly similar business processes; hence this implementation approach would
apply to a typical university in Kenya. Business questions such as admission trends and student
neediness profiling for accommodation and scholarships would be the same in other universities.
The implementation model can be applied in other sectors like health care, transport, cyber
security, national security, county governance, banking, and insurance, among others. Possible
analytics include: fraud detection by sifting through bank transaction logs; detection of
network intrusion by analysing network logs; reporting of possible terrorist attacks by analysing
bank transactions and videos from surveillance cameras; gauging public opinion on government services through
sentiment analysis of social media data; Internet of Things (IoT) analytics, given the presence of connected
devices (mobile phones, fridges, laptops, watches); and faster detection of loan defaulters by analysing
the borrowing history of applicants. The study experienced challenges such as: a long learning
curve for certain Hadoop products like Mahout and for R manipulation; difficult server configuration
and Hadoop administration; users not being available most of the time to provide immediate
feedback; lack of access to certain operational data; some operations of the University still being
manual; and difficulty in data extraction, transformation and loading processes.

References
[1] T. H. Davenport and L. Prusak, Working knowledge: How organizations manage what they know. Harvard
Business Press, 1998.
[2] J. P. Herring, "The role of intelligence in formulating strategy," Journal of Business Strategy, pp. 54-60, 1992.
[3] S. Malik, Enterprise dashboards design and best practices, 1st ed. New Jersey: Wiley, 2005.
[4] C. M. Olszak and E. Ziemba, "Approach to building and implementing business intelligence systems,"
Interdisciplinary Journal of Information, Knowledge, and Management, vol. 2, pp. 134-148, 2007.
[5] G. Muhammad, J. Ibrahim, Z. Bhatti, and A. Waqas, "Business Intelligence as a Knowledge Management
Tool in Providing Financial Consultancy Services," American Journal of Information Systems, vol. 2, no. 2,
pp. 26-32, 2014.
[6] R. Stackowiak, J. Rayman, and R. Greenwald, Oracle Data Warehousing and Business Intelligence Solutions.
Indianapolis: Wiley Publishing, Inc, 2007.
[7] A. Popovic, P. S. Coelho, and J. Jaklic, "The Impact of Business Intelligence System Maturity on Information
Quality," Information Research, p. aer417, 2009.
[8] Gartner Research. (2009, Jun.) Gartner Research. [Online]. http://www.gartner.com/newsroom/id/1017812
[9] N. Karen, J. A. Clark, I. D. Stoodley, and T. A. Creagh, "Establishing a framework for transforming student
engagement, success and retention in higher education institutions," Queensland University of Technology,
Sydney, Final Report, 2014.
[10] M. Golfarelli, Open source BI platforms: a functional and architectural comparison. Berlin Heidelberg:
Springer, 2009.
[11] B. Burton, "Results of Business Intelligence and Performance Management Maturity Survey," Gartner Inc.
Research, 2009.
[12] L. Wise, Using Open Source Platforms for Business Intelligence: Avoid Pitfalls and Maximize ROI. Newnes,
2012.
[13] R. Kimball and M. Ross, The data warehouse toolkit: the complete guide to dimensional modeling. John
Wiley & Sons, 2011.
[14] S. Chaudhuri, U. Dayal, and V. Narasayya, "An overview of business intelligence technology," Communications
of the ACM, vol. 54, no. 8, 2011.
[15] J. Ranjan, "Business intelligence: concepts, components, techniques and benefits," Journal of Theoretical and
Applied Information Technology, pp. 60-70, 2009.
[16] M. Minelli, M. Chambers, and A. Dhiraj, Big data, big analytics: emerging business intelligence and analytic
trends for today's businesses. John Wiley & Sons, 2012.
[17] S. Miller. (2013) Singapore Management University. [Online]. http://ink.library.smu.edu.sg/podcasts/8/
[18] Talegaon and A. P. Shubhada, "ANALYTICS OF BIG DATA," COMPUSOFT, An international journal of
advanced computer technology, vol. 3, no. 10, Oct. 2014.
[19] F. Kalota, "Applications of Big Data in Education," International Journal of Social, Behavioral, Educational,
Economic and Management Engineering, vol. 9, no. 5, 2015.
[20] A. G. Picciano, "The Evolution of Big Data and Learning Analytics in American Higher Education," Journal
of Asynchronous Learning Networks, vol. 6, no. 3, pp. 9-20, 2012.
[21] Apache Hadoop. (2015) Hadoop. [Online]. http://hadoop.apache.org/
[22] D. Borthakur. (2008, Jan.) Hadoop Apache Project. [Online].
http://hadoop.apache.org/common/docs/current/hdfs-design.pdf
[23] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the
ACM, vol. 51, no. 1, pp. 107-113, 2008.
[24] T. Condie, N. Conway, P. Alvaro, and J. Hellerstein, "MapReduce Online," NSDI, vol. 10, no. 4, p. 20, 2010.
[25] B. Oancea and R. M. Dragoescu, Integrating R and Hadoop for Big Data Analysis. arXiv preprint
arXiv:1407.4908., 2014.
[26] The R Project for Statistical Computing. (2015) The R Project for Statistical Computing. [Online].
http://www.r-project.org
[27] M. Davern and R. Kauffman, "Discovering Potential and Realizing Value from Information," Journal of
Management Information Systems Spring, pp. 121-143, 2000.
[28] Z. Irani and P. D. E. Love, "The Propagation of Technology Management Taxonomies for Evaluating,"
Journal of Management Information Systems, vol. 17, no. 3, pp. 161-177, 2001.
[29] M. Gibson, D. Arnott, and I. Jagielska, "Evaluating the Intangible Benefits of Business Intelligence:
Review & Research Agenda," in Proceedings of the 2004 IFIP International Conference on Decision Support
Systems (DSS2004): Decision Support in an Uncertain and Complex World, Prato, Italy, 2004.

