Abstract: This paper examines the growing volume of health care data that is accumulated digitally every day. The healthcare industry is becoming very data intensive. Worldwide digital healthcare data is estimated at 500 petabytes (10^15 bytes) and is expected to reach 25 exabytes (10^18 bytes) in 2020 [6]. In this paper, heart disease is selected from among the variety of diseases in healthcare. The purpose of this work is to predict the diagnosis of heart disease with a reduced number of attributes. Each dataset stored in HDFS is classified based on its attributes. This prediction solution, using random forest on Apache Spark, gives health care analysts a massive opportunity to deploy it on an ever-changing, scalable big data landscape for insightful decision making.

Keywords: Spark, HDFS, Heart disease, Random forest, verification

1. INTRODUCTION

The health care system is rapidly adopting electronic health records (EHR), which will drastically increase the quantity of clinical data available digitally. Concurrently, fast progress is being made in clinical analytics, that is, techniques for analyzing large volumes of data and deriving new insights from that analysis, which is known as big data analytics. As a result, we can use the remarkable opportunities provided by big data to reduce the costs of health care as well as to diagnose diseases. In this paper, heart disease is selected from among the variety of diseases in healthcare. Heart disease is a general name for a variety of diseases, and its symptoms may vary depending on the specific type of heart disease.

Hospitals use hospital database systems to store and manage their patient data. These systems generate large volumes of data, but these data are rarely used to support insightful clinical decision making.

Using big data with data mining algorithms makes it possible to identify healthcare trends, prevent diseases, diagnose diseases, and so on.

2. OBJECTIVES

The purpose of this work is to predict the diagnosis of heart disease with a reduced number of attributes. Each
Fig 5.2: Accuracy graph of Random Forest

CONCLUSION AND FUTURE ENHANCEMENT

Utilizing big data analytics, the healthcare data being generated from time to time in the medical field can be processed faster for predicting diseases with none

[4] Jian Fu, Junwei Sun, Kaiyuan Wang, "SPARK - A Big Data Processing Platform for Machine Learning", 2016 International Conference on Industrial Informatics - Computing Technology, Intelligent Technology, Industrial Information Integration
[5] Patil R Priya, Kinariwala A S, "Automated Diagnosis of Heart Disease using Random Forest Algorithm", International Journal of Advance Research, Ideas and Innovations in Technology
[9] https://hortonworks.com/apache/hdfs/
[10] https://spark.apache.org/
[11] http://data-flair.training/blogs/hadoop-mapreduce-vs-apache-spark/
[12] https://archive.ics.uci.edu/ml/datasets/Heart+Disease
[13] https://www.stat.berkeley.edu/~breiman/RandomForests/