ABSTRACT
Objectives:
This research discusses the evolving use of Big Data Analytics to uncover the causes of disease
and the prerequisites for preventing it, examines some of the opportunities and challenges
surrounding its economic value in Public Health, and offers recommendations and conclusions.
Methods:
A non-systematic review of the literature was conducted to highlight the implications associated
with the use of Big Data Analytics in healthcare innovations and its applications to address
public health challenges in India. A thematic review of selected articles was performed. The paper
presents an architectural framework and methodology, describes examples reported in the literature,
briefly discusses the challenges, and offers conclusions.
Results:
The paper provides a broad overview of various applications of Big Data analytics in healthcare
for clinicians, public health practitioners, epidemiologists, policy makers and other health
experts.
Conclusions:
The concept of Big Data and its associated analytics should be taken seriously when approaching the
use of vast volumes of both structured and unstructured data in science and healthcare. Big Data
analytics in Public Health is evolving into a promising field for providing insight from very large
data sets and improving outcomes while reducing costs. Future exploration of issues surrounding
data privacy, confidentiality, and education is needed.
Page 1 of 34
2. INTRODUCTION
Public Health is not a new field: every successful civilization has recognized the health
implications of clean water and the efficient disposal of human waste. Today, the Public Health
agenda is defined and driven by national and international agencies such as the World
Health Organization (WHO), the National Health Service (NHS) and the Centers for Disease
Control and Prevention (CDC). Healthcare in India is government financed and government run,
but for many people living in many parts of the country, accessing primary healthcare is still
a challenge. For developing countries like India, strengthening the public health system is
one of the most important areas for emphasis, so as to provide better healthcare access to the
country's priceless human resources, which in turn can make India healthier.
The most effective public health interventions are typically preventative interventions and
policies that help stop a crisis before it starts. But predicting the next public health crisis has
historically been a challenge, as have preventing diseases, designing better diagnostic tools,
increasing access to healthcare and reducing its costs. Many experts, including researchers, policy
makers and practitioners, have identified a big gap in the knowledge about interventions
in Public Health delivery systems. The inefficiencies and inequities in Public Health in India
have pushed forward the need for creative thinking and innovative solutions to strengthen
it. The exponential growth of data over the last decade has introduced a new domain of needs
validation and analysis in which Big Data Analytics can be applied. Big Data Analytics has the
potential to provide the critical computing and analytical ability needed to process huge volumes of
transactional data.
Big data in healthcare is overwhelming not only because of its volume but also because of the
diversity of data types and the speed at which it must be managed. The totality of data related to
patient healthcare and wellbeing makes up big data in the healthcare industry. It includes
clinical data from CPOE and clinical decision support systems (physicians' written notes and
prescriptions, medical imaging, laboratory, pharmacy, insurance, and other administrative data);
patient data in electronic patient records (EPRs); machine generated/sensor data, such as from
monitoring vital signs; social media posts, including Twitter feeds (so-called tweets) [8], blogs
[9], status updates on Facebook and other platforms, and web pages; and less patient-specific
information, including emergency care data, news feeds, and articles in medical journals.
The potential applications of Big Data analytics in public health are 1) analyzing disease patterns
and tracking disease outbreaks and transmission to improve public health surveillance and speed
response; 2) faster development of more accurately targeted vaccines, e.g., choosing the annual
influenza strains; and 3) turning large amounts of data into actionable information that can be
used to identify needs, provide services, and predict and prevent crises, especially for the benefit
of populations. In addition, [14] suggests Big Data analytics in healthcare can contribute to
evidence-based medicine: combining and analyzing a variety of structured and unstructured data
(EMRs, financial and operational data, clinical data, and genomic data) to match treatments with
outcomes, predict patients at risk for disease or readmission, and provide more efficient care.
The current research project provides an overview of Big Data analytics in healthcare
as it emerges as a discipline. First, we define and discuss the various advantages
and characteristics of Big Data analytics in healthcare. Second, we describe the architectural
framework of Big Data analytics in healthcare. Third, the Big Data analytics application
development methodology is described. Fourth, we provide examples of Big Data analytics in
healthcare reported in the literature. Fifth, the challenges are identified. Lastly, we offer
conclusions and future directions.
Objectives:
The main objective of this dissertation was to gain new knowledge on how to bridge data mining
and Public Health communities to foster interdisciplinary works between the two communities.
The data collected were then used to achieve the following specific objectives:
1. To identify the benefits, risks and opportunities for Big Data in health and make
recommendations for the use of Big Data in the delivery of healthcare services in India.
2. To understand the gap between the healthcare delivery systems and public health.
3. To understand the spatial distribution of epidemiological outbreaks globally using
the Google Trends tool.
4. REVIEW OF LITERATURE
4.1 What is Big Data Analytics (BDA)?
Big Data is a term used by the IT industry to describe the voluminous amount of unstructured
data an organization creates. It represents information that has not been normalized or
harmonized, comes from many different sources, and in the past has been too expensive or not
practical operationally to normalize for typical online transactional processing (OLTP) or data
warehouse type data stores. Big Data (BD) has the characteristic of vast size that
exceeds the capability of traditional data management technologies and requires the use of new
capabilities and processes to source, process and manage it.
In simple terms, Big Data is a collection of large and complex data sets which are difficult to
process using common database management tools or traditional data processing applications.
Big Data also refers to the tools, processes and procedures that allow an organization to create,
manipulate, and manage very large data sets and storage facilities. Big Data is not just about
size: it finds insights from complex, noisy, heterogeneous, longitudinal, and voluminous data and
aims to answer questions that were previously unanswered.
Big Data is commonly described using the four Vs, which point to its four characteristics:
volume, variety, velocity, and veracity. The convergence of these four dimensions helps to define
Big Data:
Volume (the amount of data): it refers to the mass quantities of data that organizations
are trying to use to improve decision-making processes. Data volumes continue to
increase at an unprecedented rate. However, what counts as high volume varies, for example
by geography, and is often smaller than the petabytes and zettabytes commonly referenced.
Many companies consider datasets between one terabyte
and one petabyte to be Big Data. Still, everyone can agree that whatever is considered
high volume today will be even higher tomorrow.
Variety (different types of data and data sources): variety is about managing the
complexity of multiple data types, including structured, semi-structured and unstructured
data. Organizations need to integrate and analyze data from a complex array of both
traditional and nontraditional information sources, from within and outside the enterprise.
With the explosion of sensors, smart devices and social media technologies, data is being
generated in countless forms, including text, web data, tweets, sensor data, audio, video,
click streams, log files and more;
Velocity (data in motion): the speed at which data is created, processed and analyzed
continues to accelerate. Higher velocity is due to both the real-time nature of data
creation, and the need to incorporate streaming data into business processes. Today, data
is continually being generated at a rate that is impossible for traditional systems to
capture, store and analyze. For time-sensitive processes such as multi-channel instant
marketing, data must be analyzed in real time to be of value to the business;
Veracity (data uncertainty): it refers to the level of reliability associated with certain
types of data. The quest for high data quality is an important Big Data requirement and
challenge, but even the best data cleansing methods cannot remove the inherent
unpredictability of some data, like the weather, the economy, or a customer's buying
decisions. The need to acknowledge and plan for uncertainty is a dimension of Big Data
that has been introduced as executives try to better understand the uncertain world around
them.
Analytics, in turn, seeks to recognize inherent patterns, correlations and anomalies which are
discovered as a result of integrating vast amounts of data from different datasets.
Together, the term Big Data Analytics represents, across all industries, new data-driven
insights which are being used for competitive advantage over peer organizations to more
effectively market products and services to targeted consumers. Examples include real-time
purchasing patterns and recommendations back to consumers, and gaining better understandings
and insights into consumer preferences and perspectives through affinity to certain social groups.
The origin of Big Data Analytics (BDA) lies in web-based search engines such as Google and Yahoo,
the popularity of social media and social networking services such as Facebook and Twitter, and
data-generating sensors, telehealth and mobile devices. All have increased and generated new
data and opportunities for new insights on customer behaviours and trends. While BDA
frameworks have been in operation since 2005, they have only recently moved into other
industries and sectors including financial services firms and banks, online retailers and
healthcare.
For healthcare, Big Data represents opportunities to exploit personalized care, streamline health
operations, support clinical and policy decision making, and improve patient engagement.
Today, across all industries, the typical sources of Big Data include:
Internet transactions: By 2015, more than three billion people will be online. Billions
of online purchases, stock trades, social networking exchanges, Internet searches and
other transactions happen every day, including countless automated transactions. Each
creates a number of data points collected by retailers, banks, credit card issuers, credit
agencies, social networking and search engine service providers and others.
Mobile devices: There are more than 5.6 billion mobile phones in use worldwide. Each
call, text and instant message is generating data. The average teen texts 4,700 times per
month. Mobile devices, particularly smart phones and tablets, also make it easier to use
social networking and other data-generating applications. Mobile devices also collect and
transmit location data.
Social networking and media: There are currently more than 955 million active
Facebook users, 500 million Twitter users and 156 million public blogs. By 2015, more
than two billion videos will be watched over YouTube in one day. Each Facebook
update, tweet, blog post and comment creates multiple new data points (structured,
semi-structured and unstructured), sometimes referred to as "data exhaust".
Networked devices and sensors: Electronic devices of all sorts including servers and
other IT hardware, smart energy meters and temperature sensors, patient monitors and
aides all create semi-structured log data that record every action.
Genomic data: Represents significant amounts of new gene sequencing data being
made available through new investments, Big Data Analytics (BDA) capabilities and business models.
Streamed data: Home monitoring, telehealth, handheld and sensor-based wireless and
smart devices are new data sources and types. They represent significant amounts of real
time data available for use by the health system.
Web and social networking-based data: Web-based data comes from Google and
other search engines, consumer use of the Internet, as well as data from social networking
sites.
Health publication and clinical reference data: This includes text-based publications
(clinical research and medical reference material) and clinical text based reference
practice guidelines and health product (e.g., drug information) data.
Clinical data: Eighty per cent of health data is unstructured as documents, images,
clinical or transcribed notes. These semi-structured to unstructured clinical records and
documents represent new data sources.
Business, organizational and external data: Data which previously has not been
linked, such as financial, billing, scheduling, administrative, external and other
non-clinical and non-health data.
It is important to note that while there are many sources of Big Data within the health sector, it is
unrealistic to assume that all data can be put to use for Big Data due to a range of governance,
privacy, operational and technical considerations.
Gartner Group's analysis of Big Data shows that vendors are enabling Big Data with a wide
variety of new and old technologies, in different ways and at different rates. Overall, Gartner
depicts an IT market that is still fairly immature, with larger traditional DW/BI entities engaged
and investing millions of dollars, and smaller Big Data pure-players ramping up
go-to-market strategies focused purely on Big Data. Gartner's research points to a marketplace in
the early adopter phase, despite the large valuation of $5 billion (US).
Fig. 1. Steps of a Big Data project (Step 02: find the right ways, such as smart devices, the
Internet and hospitals, to collect your data).
To start a Big Data project, several steps are suggested, as shown in Fig. 1. First, the right
problem should be chosen. There are three kinds of problems. The first kind has
already been solved with traditional methods, so there is no need to use Big Data technologies.
The second kind cannot be solved with current technologies. We should
focus on the third kind, which is solvable with current Big Data technologies. Second,
after setting a practical goal, we need to generate the data with sensors, monitors or molecular
profiling, or extract the data from public databases and sources. Third, we need to pre-process
the data to obtain clean and meaningful data. Data pre-processing is a critical step for the
success of a Big Data project. A recent publication [5] showed that sample mis-alignment for
eQTL (expression Quantitative Trait Loci) and mQTL (methylation Quantitative Trait Loci)
studies will reduce the discovered associations by 27 folds. The quality control of data
essentially determines the upper bound of the data product, i.e. garbage in, garbage out. The clean
data are stored in a database for the next step of the analysis. Fourth, insight or knowledge is
discovered from the processed data through statistical analysis. Finally, the analytic results
are presented to the end user as a report, an online recommendation or a decision.
Visualization of data, such as networks/graphs and charts, makes the analytic results easy to
interpret and understand. If the results do not make sense, we need to reformulate our problem
and start the steps over again.
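The steps described above can be sketched as a small pipeline. This is only a minimal illustration under stated assumptions: the record fields (`patient_id`, `systolic_bp`), the plausibility range and all function names are hypothetical and are not taken from the cited literature.

```python
from statistics import mean

# Hypothetical raw readings, e.g. from home blood-pressure monitors.
RAW_READINGS = [
    {"patient_id": "p1", "systolic_bp": 128},
    {"patient_id": "p1", "systolic_bp": None},   # missing value
    {"patient_id": "p2", "systolic_bp": 141},
    {"patient_id": "p2", "systolic_bp": 900},    # implausible sensor error
    {"patient_id": "p2", "systolic_bp": 139},
]


def preprocess(readings, low=50, high=300):
    """Pre-processing: keep only readings with a plausible systolic value."""
    return [
        r for r in readings
        if r["systolic_bp"] is not None and low <= r["systolic_bp"] <= high
    ]


def analyze(clean):
    """Analysis: average systolic pressure per patient."""
    by_patient = {}
    for r in clean:
        by_patient.setdefault(r["patient_id"], []).append(r["systolic_bp"])
    return {pid: mean(values) for pid, values in by_patient.items()}


def report(summary):
    """Presentation: turn the analytic result into plain-text report lines."""
    return [
        f"{pid}: mean systolic {avg:.1f} mmHg"
        for pid, avg in sorted(summary.items())
    ]


def run_pipeline(readings):
    """Chain pre-processing, analysis and presentation."""
    return report(analyze(preprocess(readings)))


if __name__ == "__main__":
    for line in run_pipeline(RAW_READINGS):
        print(line)
```

Keeping each step as a separate function mirrors the point made in the text: if the final report does not make sense, the problem can be reformulated and individual steps replaced without rewriting the whole chain.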
In health sciences, there are many problems that can be addressed with Big Data technologies,
such as recommendation system in healthcare, Internet based epidemic surveillance, sensor
based health condition and food safety monitoring, Genome-Wide Association Studies (GWAS)
and expression Quantitative Trait Loci (eQTL), inferring air quality using Big Data and
metabolomics and ionomics for nutritionists.
To solve these problems, many advanced computational technologies will be used. We will
cover the following technological perspectives: (1) infrastructure of Big Data; (2) analysis of
Big Data results; and (3) visualization of Big Data results. The future perspectives of
health sciences in the era of Big Data will also be discussed.
2. Machine to machine data: readings from remote sensors, meters, and other vital sign devices
[6].
3. Big transaction data: healthcare claims and other billing records increasingly available in
semi-structured and unstructured formats [6].
4. Biometric data: finger prints, genetics, handwriting, retinal scans, x-ray and other medical
images, blood pressure, pulse and pulse-oximetry readings, and other similar types of data [6].
5. Human-generated data: unstructured and semi-structured data such as EMRs, physicians' notes,
email, and paper documents [6].
For the purpose of Big Data analytics, this data has to be pooled. In the second component the data is in
raw state and needs to be processed or transformed, at which point several options are available. A
service oriented architectural approach combined with web services (middleware) is one possibility [27].
The data stays raw and services are used to call, retrieve and process the data. Another approach is data
warehousing wherein data from various sources is aggregated and made ready for processing, although
the data is not available in real time. Via the steps of extract, transform, and load (ETL), data from diverse
sources is cleansed and readied. Depending on whether the data is structured or unstructured, several data
formats can be input to the Big Data analytics platform.
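The extract, transform, and load steps mentioned above can be illustrated with a minimal sketch. It assumes two hypothetical source extracts (a lab export and a claims export with different delimiters and ID conventions) and uses an in-memory SQLite database as a stand-in for a data warehouse; none of the table or column names come from the cited sources.

```python
import csv
import io
import sqlite3

# Hypothetical extracts from two source systems with different conventions.
LAB_CSV = "patient_id,test,value\nP001,glucose,5.4\np002,glucose,\nP003,glucose,7.9\n"
CLAIMS_CSV = "PatientID;Amount\nP001;120.50\nP003;80.00\n"


def extract(text, delimiter=","):
    """Extract: parse a delimited text export into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform_labs(rows):
    """Transform: normalise patient IDs and drop rows without a value."""
    cleaned = []
    for row in rows:
        if row["value"].strip():
            cleaned.append((row["patient_id"].upper(), row["test"], float(row["value"])))
    return cleaned


def transform_claims(rows):
    """Transform: normalise patient IDs and convert amounts to numbers."""
    return [(row["PatientID"].upper(), float(row["Amount"])) for row in rows]


def load(conn, labs, claims):
    """Load: write the cleansed rows into warehouse tables."""
    conn.execute("CREATE TABLE labs (patient_id TEXT, test TEXT, value REAL)")
    conn.execute("CREATE TABLE claims (patient_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO labs VALUES (?, ?, ?)", labs)
    conn.executemany("INSERT INTO claims VALUES (?, ?)", claims)
    conn.commit()


def build_warehouse():
    """Run the full ETL chain and return the populated connection."""
    conn = sqlite3.connect(":memory:")
    labs = transform_labs(extract(LAB_CSV))
    claims = transform_claims(extract(CLAIMS_CSV, delimiter=";"))
    load(conn, labs, claims)
    return conn


if __name__ == "__main__":
    warehouse = build_warehouse()
    query = (
        "SELECT l.patient_id, l.value, c.amount "
        "FROM labs l JOIN claims c ON l.patient_id = c.patient_id "
        "ORDER BY l.patient_id"
    )
    for row in warehouse.execute(query):
        print(row)
```

Once the data has been pooled this way, sources that previously could not be linked (here, lab values and claim amounts) can be queried together.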
In this next component in the conceptual framework, several decisions are made regarding the data input
approach, distributed design, tool selection and analytics models. Finally, on the far right, the four typical
applications of Big Data analytics in healthcare are shown.
These include queries, reports, OLAP, and data mining. Visualization is an overarching theme across the
four applications. Drawing from such fields as statistics, computer science, applied mathematics and
economics, a wide variety of techniques and technologies has been developed and adapted to aggregate,
manipulate, analyze, and visualize Big Data in healthcare.
The most significant platform for Big Data analytics is the open-source distributed data processing
platform Hadoop (Apache platform), initially developed for such routine functions as aggregating web
search indexes. It belongs to the class of NoSQL technologies (others include CouchDB and
MongoDB) that evolved to aggregate data in unique ways. Hadoop has the potential to process
extremely large amounts of data mainly by allocating partitioned data sets to numerous servers (nodes),
each of which solves a different part of the larger problem; the partial results are then
integrated into the final result [28-31].
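The partition, solve and integrate idea described above can be sketched in a few lines. This is not Hadoop's API: it is a single-machine illustration in which worker threads stand in for cluster nodes, and the visit records and function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical visit records: one diagnosis label per visit.
VISITS = ["flu", "asthma", "flu", "diabetes", "flu", "asthma", "malaria", "flu"]


def partition(records, n_parts):
    """Split the records into n_parts roughly equal slices."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_count(chunk):
    """Work done independently on each 'node': count labels in its slice."""
    return Counter(chunk)


def merge(left, right):
    """Integrate two partial results into one."""
    return left + right


def distributed_count(records, n_nodes=3):
    """Count labels by fanning slices out to workers and merging the results."""
    chunks = partition(records, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(map_count, chunks))
    return reduce(merge, partials, Counter())


if __name__ == "__main__":
    print(distributed_count(VISITS).most_common())
```

Because each slice is counted independently and the merge step is associative, the result does not depend on how many workers are used or in what order they finish.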
Hadoop can serve the twin roles of data organizer and analytics tool. It offers a great deal of potential in
enabling enterprises to harness the data that has been, until now, difficult to manage and analyze.
Specifically, Hadoop makes it possible to process extremely large volumes of data with various structures
or no structure at all. But Hadoop can be challenging to install, configure and administer, and individuals
with Hadoop skills are not easily found. For these reasons, it appears organizations are not
quite ready to embrace Hadoop completely.
The surrounding ecosystem of additional platforms and tools supports the Hadoop distributed platform
[30,31]. These are summarized in Table 1. Numerous vendors, including AWS, Cloudera, Hortonworks,
and MapR Technologies, distribute open-source Hadoop platforms [29]. Many proprietary options are
also available, such as IBM's BigInsights. Further, many of these platforms are cloud versions, making
them widely available. Cassandra, HBase, and MongoDB, described above, are used widely for the
database component.
While the available frameworks and tools are mostly open source and wrapped around Hadoop and
related platforms, there are numerous trade-offs that developers and users of Big Data analytics in
healthcare must consider. While the development costs may be lower since these tools are open source
and free of charge, the downsides are the lack of technical support and minimal security. In the healthcare
industry, these are, of course, significant drawbacks, and therefore the trade-offs must be addressed.
Additionally, these platforms/tools require a great deal of programming, skills the typical end-user in
healthcare may not possess. Furthermore, considering the only recent emergence of Big Data analytics in
healthcare, governance issues including ownership, privacy, security, and standards have yet to be
addressed. In the next section we offer an applied Big Data analytics in healthcare methodology to
develop and implement a Big Data project for healthcare providers.
6. RESULTS
6.1. Review of Big Data applications to Public Health:
Many countries are applying Big Data analytics to solve problems in healthcare. The benefits of
health-related Big Data have been demonstrated in three areas so far, namely to 1) prevent
disease, 2) identify modifiable risk factors for disease, and 3) design interventions for health
behavior change [9]. Organizations worldwide are recognizing the Big Data movement and
introducing new initiatives for knowledge discovery and data-driven decision-making. For
example, the National Institutes of Health (NIH) is establishing the Big Data to Knowledge (BD2K)
and Infrastructure Plus Program, which provides a shared computational environment
(e.g. data standards, ontologies, data catalogues, virtualized cloud computing) to facilitate
large-scale biomedical data analysis for the NIH community [10]. Specifically, the NIH US National
Library of Medicine hosts an impressive set of data sharing repositories [11], which primarily accept
submissions of biomedical data and other information sharing systems from NIH-funded
investigators. In addition, the United Nations (UN) is launching the Global Pulse project, which
advocates for the data philanthropy movement by asking organizations and individuals to
contribute data, resources, and skills to help understand the impact of UN development programs
and ways to improve their outreach on affected populations and regions [12].
In the United States, the Pillbox project results in an annual $500 million reduction in healthcare
costs through the application of Big Data analytics [3,4]. The San Francisco Police Department
has developed a Big Data system designed for crime prevention [3]. The UK is utilizing Big
Data through establishment and management of the Foresight Horizon Scanning Centre, which
serves as a countermeasure to various health and social problems such as obesity, potential risk
management (coastal erosion, climate change), and epidemics [5]. The EU is dealing with
uncertainty through the iKnow (Interconnect Knowledge) project, which provides opportunities
for research on earthquakes, tsunamis, terrorism, networking, and global crisis [15]. The OECD
adopted evaluating economic benefits of Big Data as an agenda for the 15th Working Party on
Indicators for the Information Society (WPIIS) by considering Big Data for business efficiency
[8].
Moreover, the Australian Government Information Management Office has saved time and
resources by developing an automated tool that can analyze, search, and reuse massive
information through government 2.0 [7]. In 2004, Singapore established the Risk Assessment
and Horizon Scanning (RAHS) to prepare for future uncertainty regarding terrorism and
epidemics [6].
Big Data streams in health can be broadly summarized into three categories [13]. Traditional
medical data originates primarily from the health system (e.g. EMRs, personal and family
health history, medication history, lab reports, pathology results); the objective of analyzing it
is to derive a better understanding of disease outcomes and their risk factors, reduce
health system costs, and improve efficiency [13]. Omics data refers to large-scale datasets in
the biological and molecular fields (e.g. genomics, microbiomics, proteomics, and
metabolomics); the aim of these analyses is to understand the mechanisms of diseases and
accelerate the individualization of medical treatments (e.g. precision medicine) [3, 6]. As
pointed out by Alice Whittemore at the Stanford Big Data in Biomedicine Conference (2013),
genomic testing and mapping could, for example, identify women at high risk of developing
breast cancer, which would allow preventive care to be allocated to them and reduce the need for
large-scale, potentially hazardous interventions for low-risk women [14]. Last but not least, data
from social media and the quantified-self movement essentially consist of signs and behaviors
showing how individuals (or groups of individuals) use the Internet, social media, mobile applications
(apps), sensor devices, wearable computing devices, or other technological and non-technological
tools to better inform and enhance their health.
This section presents examples of health-related Big Data projects, with an emphasis on data
from social media and the quantified-self movement (Table 1). For Big Data research related to
EMRs, digital enterprise, genetic data and omics sources, readers can refer to the following
reviews and perspectives conducted recently [15, 16, 17, 18, 19].
Table 1. Examples of health-related Big Data projects related to social media and the
quantified-self movement (columns: data type; examples). Entries include quantified-self data
(self-reporting or sensors), e.g. Asthmapolis [25]; location-based information, e.g. HealthMap [37];
Twitter (of English-language tweets, 16.6% relate to health [46]); PatientsLikeMe [47]; other social
networking sites (e.g. online discussion boards, Facebook); and web logs. Stated benefits include
longer follow-up periods than is currently possible using standard questionnaires [13].
Predictions made by Google Flu Trends were 7 to 10 days ahead of the official CDC networks, and
their results were consistent [11].
For Chinese users, Baidu disease trend (http://trends.baidu.com/disease/) provided a
province-city-county view of the prevalence of several diseases, including hepatitis, tuberculosis,
venereal disease and influenza. What is more, its Big Data Trend product is open to ordinary users,
so similar trends can be customized.
Twitter is a widely used social networking and news-sharing platform. Tweets reflect
people's opinions and judgments about public events, especially epidemic outbreaks [12].
Several methods have been developed to monitor people's reactions to epidemic outbreaks [12] and
early disease syndromes based on Twitter [13]. Tweets involving H1N1 activity can be
collected by searching for keywords such as "flu", "influenza" and "H1N1". Tweets involving
public concern can also be filtered using keywords like "travel", "flight" and "ship" for disease
transmission, and keywords like "wash", "hygiene" and "mask" for disease countermeasures. By studying
the sequence of tweets about H1N1 activity and public concern, the evolution pattern of public
countermeasures can be revealed [12]. Similarly, by analyzing early disease syndrome
keywords, the risks of diseases such as cancer, flu, depression, aches/pains, allergies, obesity and
dental disease can be estimated [13].
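The keyword-based filtering described above can be illustrated with a short sketch. The keyword groups are the ones named in the text; the sample messages, dates, group labels and function names are invented for illustration, and the exact-word matching used here is a simplifying assumption rather than the method of the cited studies [12, 13].

```python
import re
from collections import Counter

# Keyword groups taken from the text; the group labels are hypothetical.
KEYWORDS = {
    "h1n1_activity": {"flu", "influenza", "h1n1"},
    "transmission_concern": {"travel", "flight", "ship"},
    "countermeasures": {"wash", "hygiene", "mask"},
}

# Invented sample messages as (date, text) pairs.
SAMPLE_TWEETS = [
    ("2009-05-01", "Worried about H1N1, should I cancel my flight?"),
    ("2009-05-01", "Bought a mask today because of the flu"),
    ("2009-05-02", "Remember to wash your hands, influenza is spreading"),
    ("2009-05-02", "Great concert last night!"),
]


def categorize(text):
    """Return the set of keyword groups whose words appear in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {group for group, vocab in KEYWORDS.items() if words & vocab}


def daily_counts(tweets):
    """Count, per day, how many tweets fall into each keyword group."""
    counts = {}
    for day, text in tweets:
        bucket = counts.setdefault(day, Counter())
        bucket.update(categorize(text))
    return counts


if __name__ == "__main__":
    for day, bucket in sorted(daily_counts(SAMPLE_TWEETS).items()):
        print(day, dict(bucket))
```

Plotting these daily counts over time would give the kind of sequential view of disease activity and public concern that the text refers to. Exact-word matching misses inflected forms such as "washing", so a real system would need stemming or a richer vocabulary.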
7. DISCUSSIONS
Even though many benefits are expected from implementing Big Data in healthcare, certain
difficulties have unique characteristics that merit special analysis of the challenges faced in
applying Big Data and the ways they can be surmounted.
In this section six broad categories have been developed to organize the content; within each
domain, the difficulties that are common to all Big Data applications are mentioned, followed by
the challenges and the opportunities to overcome them.
7.1 Data Capture:
Data sets are becoming larger and more difficult to manage using traditional database tools. As a
result, organizations struggle to capture, store, manage, and analyze data in a
timely manner [15]. This situation creates new infrastructure needs and significant
economic costs. Fortunately, storage costs are decreasing. This allows for the capture of
useful data, such as location data, which permits the mapping of real-time events for
epidemiological surveillance.
The growing adoption of mobile phones, 80% of which are located in India [27], offers the
possibility to use the data they provide to improve development programs. For example, SMS for
Life uses a combination of mobile phones, SMS messages, the Internet, and electronic mapping
technology to track weekly stock levels of malaria drugs at public health facilities. This program
improved the distribution of malaria drugs in rural Tanzania, reducing facilities without stock
from 78% to 26% [28]. In 2013, this initiative encompassed several countries in sub-Saharan
Africa from Ghana to Kenya, with plans to increase the number of countries reached [29].
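As a rough illustration of how weekly stock reports sent by SMS could be turned into a stock-out indicator, consider the sketch below. The message format, facility identifiers and drug names are hypothetical, and counting a facility as "without stock" when any reported drug is at zero is an assumption made for this example, not the definition used by the SMS for Life programme.

```python
def parse_stock_sms(message):
    """Parse a hypothetical report of the form 'FACILITY_ID DRUG=COUNT ...'."""
    facility, *pairs = message.split()
    stock = {}
    for pair in pairs:
        drug, count = pair.split("=")
        stock[drug] = int(count)
    return facility, stock


def stockout_rate(messages):
    """Share of reporting facilities with at least one drug at zero stock."""
    reports = dict(parse_stock_sms(m) for m in messages)
    if not reports:
        return 0.0
    without_stock = sum(
        1 for stock in reports.values() if any(c == 0 for c in stock.values())
    )
    return without_stock / len(reports)


# Invented weekly reports from four facilities.
WEEK_REPORTS = [
    "F01 ACT=0 QUININE=12",
    "F02 ACT=30 QUININE=4",
    "F03 ACT=5 QUININE=0",
    "F04 ACT=18 QUININE=9",
]

if __name__ == "__main__":
    print(f"Facilities without stock: {stockout_rate(WEEK_REPORTS):.0%}")
```

Tracking this rate week by week is one simple way to monitor the kind of reduction in stock-outs reported for the programme.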
7.2 Infrastructure:
A robust physical infrastructure is key to the operation and scalability of a Big Data system. It is
based on a distributed model, where data can be physically stored in different places and
integrated through networks. The fundamental condition for taking advantage of this capacity lies in
the quality of telecommunications, which offer a gateway to Big Data.
Large Internet companies like Google, Microsoft, Yahoo, and Amazon use this architecture with
centers distributed throughout the world offering their services. All these changes in
infrastructure involve substantial costs, generating economies of scale that favor large Internet
companies [32], which take advantage of these barriers to provide infrastructure as a service
(IaaS) to organizations who cannot afford them [33].
In addition, apart from the hardware infrastructure, an additional component is required: the
software used to implement Big Data. The production, adoption, and adaptation of this software
are key ingredients for Big Data, and require a properly trained workforce [30].
Many developing countries lack the storage and communications infrastructure needed to
organize and integrate the amount of information that is generated in a Big Data system. Not only do
these countries lack these resources, but they also do not have the computing capacity to analyze them.
The vast majority of the necessary hardware resides in developed countries, and access to
information and resources is skewed by a very unequal distribution of telecommunication
capabilities to access them [30].
Regarding software used for organizing, integrating, and analyzing data, production is limited by
the lack of a trained workforce, and the possibility to purchase or license the necessary systems
is often not an option for developing countries. However, there are open source options with
strong communities that provide the necessary functionalities for free. The most outstanding
example is Apache Hadoop [42], a platform for processing large amounts of data distributed on
computer clusters used by companies like Yahoo and Facebook.
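Hadoop is built around the MapReduce programming model, in which data partitions are processed independently and the partial results are then grouped and combined. The following single-process Python sketch illustrates only that idea; it does not use the Hadoop API, and the partitions, district names, and diagnoses are invented for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical records: (district, diagnosis). Invented for illustration.
# Each inner list stands in for a data partition stored on a different node.
partitions = [
    [("north", "malaria"), ("north", "dengue"), ("south", "malaria")],
    [("south", "malaria"), ("north", "malaria")],
]


def map_phase(partition):
    """Emit a (key, 1) pair for every record in one partition."""
    return [((district, diagnosis), 1) for district, diagnosis in partition]


def shuffle(mapped):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


mapped = chain.from_iterable(map_phase(p) for p in partitions)
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because the map step touches one partition at a time, it can in principle run on many machines in parallel, which is the property that lets this model scale across a cluster.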
7.4 Organizational Changes and Workforce:
According to Villars et al., Big Data deployments require new IT administration and
application developer skill sets. Additionally, the people who possess these skills are a scarce
resource given the high market demand. Hal Varian, Google's chief economist, contends that
statisticians will have the job most in demand in the next decade.
To take advantage of the opportunity created by Big Data, trained human resources are needed,
with the ability to manage and analyze data, with knowledge in computer science, statistics, and
mathematics. Some developing countries are better positioned in this regard, including Brazil,
Russia, India and China (the BRIC countries). In 2008, 40% of the specialized resources were
trained in these countries [30].
As Internet and technological advances allow the outsourcing of infrastructures, there also exists
the possibility to recruit the human resources needed for a Big Data project over the web. As an
example, the Kaggle platform allows any organization to set a prize, and specialists from around
the world can compete to solve Big Data problems [45]. Ultimately, this possibility depends on
the economic resources that can be offered. One important example of a nonprofit organization is
DataKind, a group of data scientists who work with high-impact social organizations to improve
their decision-making processes [46].
7.5 Integration and Interoperability
One of the greatest challenges Big Data faces is to integrate data from many different sources.
The use of standards to achieve interoperability between systems is a core requirement to
effectively integrate information [47].
The major difficulty for achieving interoperability among multiple repositories of Big Data lies
in the differences in the metadata used in one repository with respect to other repositories.
Without standards for these metadata, the integration of data generated in Big Data projects will
be even more challenging [48].
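The metadata problem can be made concrete with a small sketch. The Python example below assumes two repositories that describe the same patient with different field names and maps both onto a shared schema; the field names, mappings, and records are hypothetical and do not reflect any particular system or standard.

```python
# Hypothetical field names for two repositories; neither reflects a real system.
REPO_A_TO_COMMON = {"pt_id": "patient_id", "dob": "birth_date", "dx": "diagnosis"}
REPO_B_TO_COMMON = {
    "PatientID": "patient_id",
    "BirthDate": "birth_date",
    "DiagnosisCode": "diagnosis",
}


def to_common(record, mapping):
    """Rename the fields of one record to the shared schema.

    Fields without a mapping are returned separately so they can be
    reviewed instead of being silently dropped.
    """
    common, unmapped = {}, {}
    for field, value in record.items():
        if field in mapping:
            common[mapping[field]] = value
        else:
            unmapped[field] = value
    return common, unmapped


rec_a = {"pt_id": "123", "dob": "1980-04-02", "dx": "B54"}
rec_b = {"PatientID": "123", "BirthDate": "1980-04-02",
         "DiagnosisCode": "B54", "Ward": "7"}

common_a, extra_a = to_common(rec_a, REPO_A_TO_COMMON)
common_b, extra_b = to_common(rec_b, REPO_B_TO_COMMON)
print(common_a == common_b, extra_b)
```

Without an agreed standard, a mapping like this has to be written and maintained for every pair of repositories, which is why shared metadata standards reduce the integration effort.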
Health information systems are often fragmented and isolated in information silos, hindering
analysis and improvements in healthcare assistance [49]. This problem requires a political rather
than a technological solution. In most cases, the required standards for systems to interoperate
already exist, and they are the same in developing countries as in developed countries [50]. It
is necessary to achieve consensus between government organizations, businesses, and
stakeholders in order to advance in the development of digital agendas.
Developed countries have made progress in spreading digital agendas in the last decade, and are
now better positioned than developing countries, although this gap has been narrowing lately.
According to the World Health Organization (WHO), since 2008 more than 20 developing
countries have been in the process of implementing strategic plans for eHealth [51].
The WHO and the International Telecommunications Union (ITU) published a document in
order to help countries in the process of generating a national eHealth vision and an action plan
(National eHealth Strategy Toolkit) [52]. These resources are especially useful for governments
in developing countries.
7.6 Privacy and Security
Some characteristics of Big Data, such as the relative lack of structure and the informal nature of
some data, can be a problem if the data are sensitive, with potential privacy, safety, or legal issues.
Traditional database management systems support granular security policies that protect data at
various levels. The software used in Big Data does not usually have these security measures [15].
Another important challenge includes the security infrastructure and privacy policies. It is crucial
to apply not only legal but also ethical considerations on the security of the data as soon as
possible. The development of strategies to report on how data are collected, how they are
protected, and how they will be used should be considered and recognized as a necessity [53].
Likewise, an action plan should be contemplated in case of possible data losses or security
breaches. Sharing information in a clear and careful way will help reduce concerns related to
security and privacy [54].
It is essential to ensure the privacy and confidentiality of personal data, especially with regard to
the use of Big Data in healthcare. These factors should be considered part of the structure of a
Big Data project from the beginning.
Whatever the data, when they are related to humans, safety concerns will inevitably arise. If the
goal is to share data, those who provide them have to be able to trust those who assume the
responsibility of caring for their information [57, 58]. This will only be achieved with an
appropriate regulatory framework.
7.7 Adoption
Data should be managed as a strategic asset within organizations. Existing barriers to the
adoption of Big Data are usually cultural. Many organizations do not implement Big Data
programs because they cannot appreciate the way in which data analysis can enhance their
businesses [15].
Defining objectives and expected outcomes is critical to establishing governance capable
of sustaining projects of this magnitude. A Big Data program should include the people,
processes, and policies needed [59].
The difficulties reviewed above (economic issues, poor infrastructure, and lack of
trained personnel) are common to most developing countries, and generate a gap in the adoption
of Big Data as compared to developed countries that is equivalent to the digital divide [30].
Some ways to accelerate the adoption of Big Data techniques in developing countries like India
are simple, such as sharing experiences and lessons learned [36]. Currently, developing countries
have more access to sources of scientific information, due to the increased penetration of the
Internet, the emergence of the Open Access movement, which allows free access to scientific
articles from prestigious publications, and the advent of new tools for searching scientific
literature, like Google Scholar. A recent paper shows that Google Scholar provides greater
access to free full-text articles than PubMed [60].
# | Trend | Description | Attribute
1 | Fragmented data | The separation of data among labs, hospital systems, and even clinical components such as financial IT and electronic health records is a key issue in healthcare. | Variety
2 | | Traditional analytics use ETL processes that upload data nightly or weekly to a data warehouse. The Big Data trend is moving toward real or near real-time decision support at the point-of-care. In traditional analytics, reporting focuses on the past, but with Big Data, it is more predictive. | Velocity, Value
3 | Data is driving the processes | Traditionally, processes pulled and pushed data whenever needed. In Big Data, processes access data to derive meaning from datasets, create clinical hypotheses, prevent fraud, reduce cost of care, reduce clinical errors, and improve outcomes. | Volume, Variety, Velocity
4 | Scale-up is shifting to scale-out | Traditionally, scale-up was the active choice. This led to replacing existing infrastructure with bigger servers, larger memory and more processing power. In Big Data, multiple nodes are leveraged. Systems need not be replaced, rather are modernized and leveraged to exchange and use information. | Value
5 | Software as a service (SaaS), Infrastructure as a Service (IaaS) | The exponential growth of data requires significant supporting infrastructure and complex software for [...] | Value
8. CONCLUSIONS
Big Data has the potential to string traditional and non-traditional data together to deliver
significant insights that can drive improvements in wide-ranging areas of healthcare, from clinical
research to care delivery to health policy and planning. Big Data is proving to be a huge asset in
tackling community healthcare issues, reducing the costs associated with emergency care and
making care prevention-focused. In clinical research and care delivery, Big Data can be leveraged as
a powerful tool to find solutions to Alzheimer's disease and certain types of cancer, and also to
provide a low-cost approach to personalized medicine. In health policy, planning, and
implementation, initiatives such as using cellphone data to track disease origination and spread
can lead to key insights on where to spend valuable economic resources to control diseases and
epidemics. Healthcare organizations need to evaluate Big Data needs as well as potential uses
and take a step towards a data-driven, hypothesis-generating approach to advance
clinical research frontiers. By leveraging Big Data, healthcare organizations can create value-based,
outcome-driven, efficient care delivery that benefits all stakeholders.
Recommendations:
Data capture
Infrastructure
Adoption
9. REFERENCES
1. Raghupathi W: Data Mining in Healthcare. In Healthcare Informatics: Improving Efficiency
and Productivity. Edited by Kudyba S. Taylor & Francis; 2010:211-223.
2. Burghard C: Big Data and Analytics Key to Accountable Care Success. IDC Health Insights;
2012.
3. Dembosky A: Data Prescription for Better Healthcare. Financial Times, December 12, 2012,
p. 19; 2012. Available from: http://www.ft.com/intl/cms/s/2/55cbca5a-4333-11e2-aa8f-00144feabdc0.html#axzz2W9cuwajK.
4. Feldman B, Martin EM, Skotnes T: Big Data in Healthcare Hype and Hope. October 2012.
Dr. Bonnie 360; 2012. http://www.west-info.eu/files/big-data-inhealthcare.pdf.
5. Fernandes L, O'Connor M, Weaver V: Big Data, bigger outcomes. J AHIMA 2012:38-42.
6. IHTT: Transforming Healthcare through Big Data Strategies for leveraging Big Data in the
healthcare industry; 2013. http://ihealthtran.com/wordpress/2013/03/iht%C2%B2-releases-bigdata-research-reportdownload-today/.
7. Frost & Sullivan: Drowning in Big Data? Reducing Information Technology Complexities and
Costs for Healthcare Organizations.
http://www.emc.com/collateral/analyst-reports/frost-sullivan-reducing-information-technologycomplexities-ar.pdf.
8. Bian J, Topaloglu U, Yu F, Yu F: Towards Large-scale Twitter Mining for Drug-related
Adverse Events. Maui, Hawaii: SHB; 2012.
9. Raghupathi W, Raghupathi V: An Overview of Health Analytics. Working paper; 2013.
10. Ikanow: Data Analytics for Healthcare: Creating Understanding from Big Data.
http://info.ikanow.com/Portals/163225/docs/data-analytics-for-healthcare.pdf.
11. jStart: How Big Data Analytics Reduced Medicaid Re-admissions. A jStart Case Study;
2012. http://www-01.ibm.com/software/ebusiness/jstart/portfolio/uncMedicaidCaseStudy.pdf.
12. Knowledgent: Big Data and Healthcare Payers; 2013.
http://knowledgent.com/mediapage/insights/whitepaper/482.
13. Explorys: Unlocking the Power of Big Data to Improve Healthcare for Everyone.
https://www.explorys.com/docs/data-sheets/explorys-overview.pdf.
14. IBM: IBM Big Data platform for healthcare. Solutions Brief; 2012.
http://publicdhe.ibm.com/common/ssi/ecm/en/ims14398usen/IMS14398USEN.PDF.
15. Intel: Leveraging Big Data and Analytics in Healthcare and Life Sciences: Enabling
Personalized Medicine for High-Quality Care, Better Outcomes; 2012.
http://www.intel.com/content/dam/www/public/us/en/documents/whitepapers/healthcareeveraging-big-data-paper.pdf.
16. IBM: Data Driven Healthcare Organizations Use Big Data Analytics for Big Gains; 2013.
http://www03.ibm.com/industries/ca/en/healthcare/documents/Data_driven_healthcare_organizations_use_big_data_analytics_for_big_gains.pdf.
17. Savage N: Digging for drug facts. Commun ACM 2012, 55(10):11-13.
18. Zenger B: Can Big Data Solve Healthcare's Big Problems? HealthByte, February 2012; 2012.
http://www.equityhealthcare.com/docstor/EH%20Blog%20on%20Analytics.pdf.
19. LaValle S, Lesser E, Shockley R, Hopkins MS, Kruschwitz N: Big Data, analytics and the
path from insights to value. MIT Sloan Manag Rev 2011, 52:20-32.
20. Core Techniques and Technologies for Advancing Big Data Science & Engineering
(BIGDATA) [Internet]. National Science Foundation; 2012. Available
at: http://www.nsf.gov/pubs/2012/nsf12499/nsf12499.pdf
21. MD Anderson Taps IBM Watson to Power Moon Shots Mission [Internet]. MD Anderson
Cancer Center. 2013 [cited 2013 Dec 17]. Available
at: http://www.mdanderson.org/newsroom/news-releases/2013/ibm-watson-to-power-moonshots-.html
22. Okun S, McGraw D, Stang P, Larson E, Goldmann D, Kupersmith J. Making the Case for
Continuous Learning from Routinely Collected Data [Internet]. IOM; 2013. Available
at: http://www.iom.edu/~/media/Files/Perspectives-Files/2013/Discussion-Papers/VSRTMakingtheCase.pdf
23. Davis DA, Chawla NV, Blumm N, Christakis N, Barabási A-L. Predicting individual disease
risk based on medical history. Proceedings of the 17th ACM conference on Information and
knowledge management. ACM; 2008. p. 769-78.
24. Davis DA, Chawla NV, Christakis NA, Barabási A-L. Time to CARE: a collaborative engine
for practical disease prediction. Data Min Knowl Discov 2010;20(3):388-415.
at: http://www.unglobalpulse.org/sites/default/files/BigDataforDevelopment-UNGlobal-PulseJune2012.pdf
28. Barrington J, Wereko-Brobby O, Ward P, Mwafongo W, Kungulwe S. SMS for Life: a pilot
project to improve anti-malarial drug supply management in rural Tanzania using standard
technology. Malar J 2010. Oct 27;9(1):298.
29. Novartis Malaria Initiative: SMS for Life [Internet]. [cited 2014 Mar 27]. Available
at: http://www.malaria.novartis.com/innovation/sms-for-life/
30. Hilbert M. Big Data for Development: From Information- to Knowledge Societies. Univ
South Calif Annenberg Sch Commun [Internet]. 2013; Available
at: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2205145
31. Barroso LA, Hölzle U. The Datacenter as a Computer: An Introduction to the Design of
Warehouse-Scale Machines. Synth Lect Comput Archit 2009. Jan;4(1):1-108.
32. Shapiro C, Varian HR. Information rules: a strategic guide to the network economy. Boston,
Mass: Harvard Business School Press; 1999.
33. Infrastructure as a Service (IaaS) [Internet]. Gartner IT Glossary. [cited 2013 Dec 10].
Available at: http://www.gartner.com/it-glossary/infrastructure-as-a-service-iaas
34. Latourette MT, Siebert JE, Barto RJ, Jr., Marable KL, Muyepa A, Hammond CA, et
al. Magnetic resonance imaging research in sub-Saharan Africa: Challenges and satellite-based
networking implementation. J Digit Imaging 2011;24(4):729-38.
35. Shiferaw F, Zolfo M. The role of information communication technology (ICT) towards
universal health coverage: The first steps of a telemedicine project in Ethiopia. Glob Health
Action 2012;5(1):1-5.
36. Simba DO. Application of ICT in strengthening health information systems in developing
countries in the wake of globalisation. Afr Health Sci 2004. Dec;4(3):194-8.
37. Gardiner B. Astrophysicist Replaces Supercomputer with a Cluster of Eight PlayStation
3s [Internet]. WIRED. 2007 [cited 2013 Dec 10]. Available
at: http://www.wired.com/techbiz/it/news/2007/10/ps3_supercomputer
38. Zyga L. US Air Force connects 1,760 PlayStation 3s to build supercomputer [Internet].
2010 [cited 2013 Dec 10]. Available
at: http://phys.org/news/2010-12-air-playstation-3s-super-computer.html
39. Amazon Web Services [Internet]. Amazon. [cited 2013 Dec 10]. Available
at: http://aws.amazon.com/
40. Google Compute Engine [Internet]. Google Cloud Platform. [cited 2013 Dec 10]. Available
at: https://cloud.google.com/products/compute-engine/
41. Purkayastha S, Braa J. Big Data Analytics for developing countries-Using the Cloud for
Operational BI in Health. Electron J Inf Syst Dev Ctries [Internet]. 2013 [cited 2014 Mar 25];59.
Available at: https://ejisdc.org/ojs2/index.php/ejisdc/article/view/1220
42. Apache Hadoop [Internet]. Hadoop. [cited 2013 Dec 10]. Available
at: http://hadoop.apache.org/
43. Lohr S. For Today's Graduate, Just One Word: Statistics. The New York Times [Internet].
2009. Aug [cited 2013 Dec 10]; Available
at: http://www.nytimes.com/2009/08/06/technology/06stats.html?_r=3&
44. Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, et al. Big Data: The next
frontier for innovation, competition, and productivity [Internet]. McKinsey Global Institute;
2011. Available
at: http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
45. Competitions Kaggle [Internet]. [cited 2014 Mar 27]. Available
at: https://www.kaggle.com/solutions/competitions
46. DataKind | DataKind [Internet]. [cited 2014 Mar 27]. Available at: http://www.datakind.org/
47. Hammond WE, Bailey C, Boucher P, Spohr M, Whitaker P. Connecting Information To
Improve Health. Health Aff (Millwood) 2010. Feb 1;29(2):284-8.
48. Searching for standards in Big Data [Internet]. FCW; 2012 [cited 2013 Dec 17]. Available
at: http://fcw.com/microsites/2012/snapshot-man-aging-big-data/05-establishing-big-datastandards.aspx
49. Glaser J. Interoperability: the key to breaking down information silos in health care. Healthc
Financ Manage 2011. Nov;65(11):44-6, 48, 50.
50. Luna D, García M, Nishioka A, Franco M. OPS - Revisión de estándares de interoperabilidad
para la e-salud en Latinoamérica y el Caribe. In Press. 2013;
51. Country health information systems: review of the current situation and
trends [Internet]. Geneva: World Health Organization; 2011 [cited 2013 Nov 1]. Available
at: http://www.who.int/healthmetrics/news/chis_report.pdf
52. National eHealth strategy toolkit. [Internet]. World Health Organization and International
Telecommunication Union; 2012. Available at: http://www.itu.int/pub/D-STR-E_HEALTH.052012/
53. Committee on the Role of Institutional Review Boards in Health Services Research Data
Privacy Protection. I of M. Protecting data privacy in health services research [Internet]. National
Academies Press.; 2000. Available at: http://www.nap.edu/openbook.php?isbn=0309071879
54. Meslin EM. Shifting Paradigms in Health Services Research Ethics. J Gen Intern Med 2006.
Mar;21(3):279-80.
55. Summary of the HIPAA Security Rule [Internet]. HHS. [cited 2013 Dec 17]. Available
at:http://www.hhs.gov/ocr/privacy/hipaa/understanding/srsummary.html
56. Summary of the HIPAA Privacy Rule [Internet]. HHS. [cited 2013 Dec 17]. Available
at:http://www.hhs.gov/ocr/privacy/hipaa/understanding/summary/index.html
57. Campbell AV. The Ethical Challenges of Genetic Databases: Safeguarding Altruism and
Trust. King's Law J 2007. Jan 1;18(2):227-45.
58. Chalmers D, Nicol D. Commercialisation of biotechnology: public trust and research. Int J
Biotechnol 2004. Jan 1;6(2):116-33.
59. Michele O, Fernandes L, Weaver V. Big Data, Bigger Outcomes. J AHIMA 2012;83(10):38-43.
60. Shariff SZ, Bejaimal SA, Sontrop JM, Iansavichus AV, Haynes RB, Weir MA, et
al. Retrieving clinical evidence: a comparison of PubMed and Google Scholar for quick clinical
searches. J Med Internet Res 2013;15(8):e164.
61. Big Data for Development: a primer. Harnessing Big Data For Real-Time Awareness
[Internet]. UN Global Pulse; 2013. Available
at: http://www.unglobalpulse.org/sites/default/files/Primer%202013_FINAL%20FOR%20PRINT.pdf
62. Vital Wave Consulting. Big Data, Big Impact: New Possibilities for International
Development [Internet]. World Economic Forum; 2012. Available
at: http://www3.weforum.org/docs/WEF_TC_MFS_BigDataBigImpact_Briefing_2012.pdf
63. New Data for Understanding the Human Condition: International Perspectives
[Internet]. OECD; 2013. Available
at: http://www.oecd.org/sti/scitech/new-data-for-understanding-the-human-condition.pdf.