
Development of soft computing tools and IoT for improving the performance assessment of analysers in a clinical laboratory
Michael S Packianather
School of Engineering, Cardiff University
Queen’s Buildings, The Parade
Cardiff CF24 3AA, UK
PackianatherMS@cf.ac.uk

Nury Leon Munizaga
School of Engineering, Cardiff University
Queen’s Buildings, The Parade
Cardiff CF24 3AA, UK
leonnc@cardiff.ac.uk

Soha Zouwail
Medical Biochemistry and Immunology
Cardiff and Vale University Health Board
Cardiff CF14 4XW, UK
soha.zouwail@wales.nhs.uk

Mark Saunders
Medical Biochemistry and Immunology
Cardiff and Vale University Health Board
Cardiff CF14 4XW, UK
Mark.Saunders@wales.nhs.uk

Abstract—This paper presents a three-phase methodology to automate quality control in a healthcare clinical laboratory. The first phase consists of the automation of the performance assessment of the equipment in MS Excel. With the smart tools included in Excel, a macro was developed that not only saves the user time and makes the process more efficient, but also gives a clear idea of the quality of the test results. The second phase deals with the quality control management of the generated data through the application of manufacturing techniques; a code in Matlab was created that allows the user to visualise the current performance of the equipment against specified limits in Statistical Process Control (SPC) charts. This enables the user to select the relevant information to visualise by analysing the control levels and dates. In the final phase, a prediction algorithm applying data mining and machine learning techniques was developed, based on historical data, which is used as a small sample of the big data that could potentially be generated by IoT enabled equipment interconnected via the internet, enabling them to send and receive data. Using the K-Nearest Neighbour (KNN) classifier, a performance accuracy of 94% was achieved, which allows the user to predict the future behaviour of the equipment.

Keywords—Predictive Condition-Based Maintenance (CBM), Statistical Process Control (SPC), Data Mining, Machine Learning, K-Nearest Neighbour (KNN) classifier, Soft Computing, Internet of Things (IoT), Big Data, Industry 4.0.

I. INTRODUCTION

Predictive maintenance is becoming more and more important in today’s digital world. As we approach the 4th industrial revolution, known as Industry 4.0, the situation becomes more challenging with the promise of interconnecting many machines, processes, tools, methods and systems in an interdependent system of networks, so that data can be collected and used for intelligent decision making, which could potentially save vast amounts of money across industries. This paper focuses on the novel aspect of predictive maintenance in the healthcare industry, not only because its equipment is high-tech and expensive but because any failure would have significant consequences for patients’ lives and hospital reputation. This study aims to implement a predictive maintenance approach on the equipment used in the Clinical Chemistry and Immunology Department of the University Hospital of Wales, Cardiff, in order to improve its performance assessment.

The paper is organised as follows. Sections II and III describe Predictive Condition-Based Maintenance and predictive maintenance in medical equipment. The proposed methodology is outlined in Section IV. The results are discussed in Section V, and the conclusion is given in Section VI.

II. PREDICTIVE CONDITION-BASED MAINTENANCE (CBM)

Maintenance strategies are mainly divided into three categories: corrective, preventive, and predictive. Corrective strategies are failure-driven and focus on repairing a system that has already failed; preventive maintenance strategies are time-based and focus on scheduling maintenance before a failure occurs. Finally, predictive maintenance strategies are better at saving time and money by forecasting the failure of the system based on its condition or a known threshold [1].

Condition-Based Maintenance (CBM) can be defined in simple terms as an evolution of visual inspection in factories, which is one of the oldest and most commonly used methods available. It is essentially a smart technique that analyses and detects the symptoms of failure using sensors and algorithms [2]. Predictive CBM aims to predict the future condition of the equipment by using a recursive model that

considers the past operations profile together with the current behaviour of the system for more efficient decision-making [1].

In order to develop a smart predictive CBM model for machine fault diagnosis, it is necessary to rely on Artificial Intelligence and Soft Computing approaches, which tend to perform exceptionally well. Machine learning algorithms are built with the big data obtained from IoT enabled equipment and are trained to detect faulty and correlated patterns in this data. This goes beyond setting a simplistic threshold: behavioural patterns are analysed to predict future failures without the need for human involvement. Machine learning can be classified according to the way the algorithm learns, which can be supervised or unsupervised. Supervised learning is the most commonly used, and it requires known sets of input and output data. Basically, the algorithm iteratively learns the data and infers a function which is used for predicting outputs for new or unseen data. Unsupervised learning is related to the idea that a computer can learn without human guidance by finding hidden patterns and trends present in a dataset without a known output.

Supervised learning can also take different approaches, from classification to prediction (regression) models, according to whether the input data contains discrete or continuous values respectively. Commonly used algorithms include Artificial Neural Networks (ANNs), linear regression models, naïve Bayes, Support Vector Machines (SVMs), decision trees, and regression trees. The selection of the right algorithm will depend on factors such as the type of data, the speed of training, memory usage, and accuracy.

Regardless of the chosen learning type, it is essential to apply data mining techniques to the raw data before building a model or algorithm. Data mining is the process of discovering useful patterns in big data sets, and it plays an important role in the condition monitoring process. It involves a series of activities, from understanding the business context to selecting and cleaning vast amounts of data and refining the attributes and features that are meaningful for the design of the algorithm [3]. It can be the most time-consuming and challenging stage of the model building exercise, especially when there is a lack of domain knowledge or the input data comes in various formats.

III. PREDICTIVE MAINTENANCE IN MEDICAL EQUIPMENT

The increasing costs of maintaining high-tech and sophisticated medical equipment make it essential to embrace a smart predictive maintenance approach in the healthcare industry, in order for medical device manufacturers to grow their profits and maximise customer satisfaction. This could be facilitated by emerging technologies like the Internet of Things (IoT), which allows the connection and communication of physical objects through embedded sensors connected via the internet. A pioneer in the field of predictive maintenance for medical devices is Karvy Analytics, which uses IoT and advanced machine learning algorithms to analyse data already available on the devices and accurately predict when they are about to break down [4].

Besides the economic impact that medical equipment failure can represent, the importance of improving reliability is attached to several factors, such as the degree of danger for patients or medical personnel, or the function of the equipment in diagnosis or treatment processes [5]. The advantages of implementing predictive maintenance are numerous; for example, it improves customer satisfaction, leading to an increase in monetary return.

The first step towards predictive maintenance is ensuring the quality of the data collected, and for this purpose existing frameworks and procedures could be applicable within the medical context. This study focuses on the use of Statistical Process Control (SPC) tools for improving the management of quality control, with which it is possible to easily visualise inconsistencies in the data such as missing values and outliers. The quality of the data can be ensured by making use of data management techniques, process monitoring, and statistical analysis [6]. SPC can be defined as a basic set of statistical tools that allows ongoing improvement in data quality by monitoring process performance, identifying and eliminating sources of variability to produce consistent output. The focal point of this method is to use statistics to determine the causes of variation in the process, which can be observed as an unnatural or unwanted pattern in the process data. In addition, it tries to keep the average of the measured process output at a target value, thereby reducing the inherent unpredictability within the process [7].

SPC is just the first part of quality control management for ensuring the quality of the data collected in the laboratory. The next step is the application of quality control methods for measuring the capability of the process. The test results of the assays used in this study can be classified as highly capable, capable or incapable according to their performance, and further predictions can be made as to what is achievable given the current state of the assay. In this step, targets are set for, among others, bias, coefficient of variation and total allowable error. These targets depend on the type of performance required, classified as desirable, optimal or minimum performance.

The final step is the assessment of the acceptability of the quality control results by the application of Westgard multi-rule QC, which is designed to detect both systematic and random errors. These rules are used in order to achieve a high probability of error detection while minimising the likelihood of false rejection, and results are reviewed immediately following the analysis. When the quality control rules are violated, the assay must be taken off-line and samples should not be processed until QC falls within acceptable limits; the last 5 patient samples analysed before the violation should be reanalysed.
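As an illustration, a minimal MATLAB sketch of two of the common Westgard rules is given below. The variable names (results, targetMean, targetSD) are assumptions for the purpose of this sketch, not names taken from the paper's implementation.

    % Illustrative check of two common Westgard rules on a vector of QC
    % results; targetMean and targetSD are the assay's assigned target values.
    z = (results - targetMean) / targetSD;      % standardised deviations
    rule13s = any(abs(z) > 3);                  % 1-3s: one result beyond 3 SD
    sameSide = sign(z(1:end-1)) == sign(z(2:end));
    rule22s = any(abs(z(1:end-1)) > 2 & abs(z(2:end)) > 2 & sameSide);
    % 2-2s: two consecutive results beyond 2 SD on the same side of the mean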
The clinical laboratory considered in this study has 7 automated analysers, which are medical machines designed to measure different chemicals in biological samples rapidly and with minimal human intervention. The human factor is something that must always be considered in the performance of a process, regardless of the experience and skill level of those who perform the tasks, analyse the data and make decisions based on it. Human errors can be small and harmless to the process, but they can also have a tremendous economic impact on the health industry. The approach taken to address this issue is through the automation of the performance
assessment of the analysers and the development of a failure prediction model, making use of artificial intelligence and soft computing to analyse a small sample of the big data obtained from the IoT enabled equipment and devices, using the communication modes provided by the manufacturer.

IV. METHODOLOGY

In order to build a model useful for failure prediction, and to ease the transition for the users of the equipment from a corrective approach to a predictive one, the methodology used in this study has been divided into 3 phases:

A. First phase
Mainly focused on the automation of the calculations carried out in 2 different MS Excel workbooks for the quality control analysis.

B. Second phase
The application of manufacturing techniques to the clinical analysers. This phase consists of developing an algorithm in Matlab for visualising Statistical Process Control (SPC) charts for each specific assay in order to identify acceptable behaviour.

C. Third phase
Consists of the application of soft computing techniques such as data mining and machine learning for developing a model capable of predicting the process trend, indicating how long it will be before the process goes out of control heading towards consequent failure.

Each stage has its specific steps, requirements, and challenges, which are explained in the next section together with an explanation of how each of the above phases was deployed.

V. RESULTS AND DISCUSSION

A. First phase - Automation of calculations in MS Excel

The aim of the first phase of this study was to analyse, clean and organise the data automatically generated by the IoT enabled analysers used for the quality control analysis and, later on, to implement conditional formats and macros that allow this assessment to be done in an automated way. There were 2 important MS Excel files to work with: the first workbook was the one that shows the results from each analyser, and the other was the workspace for the quality control analysis itself. Each file had different automation requirements and conditions to be applied, and these were specified by the user.

Data collected from each analyser - .csv file

Each analyser produces a .csv report with information about its performance over a period of time set by the user (normally one month). This spreadsheet consists of 27 columns showing attributes of the data, which are briefly detailed in Table I. The importance of understanding each of these attributes lies in the fact that this data will be used to build the SPC charts and the failure prediction model in the following phases.

In this phase, a macro was built in order to automate the handling of the .csv file. The macro has the following features (a sketch of the underlying calculations follows the list):

• User interaction interface – It starts by asking the user whether she/he wants to analyse the data. This gives the macro a user-friendly interface offering a seamless experience.
• Selection of the data – The next step of the macro is to remove the attributes that are unnecessary at this stage and leave the important ones for this specific analysis. These are AssayName, ControlLevel, and Result.
• Creation of a pivot table – This table summarises the results by assay and control level, and it also calculates the mean and standard deviation of these results.
• Calculations – The final step of the macro is to copy the information from the pivot table and to perform extra calculations: CV% and EMU.
TABLE I. ATTRIBUTE DESCRIPTIONS FOR THE DATA GENERATED BY AN ANALYSER

ATTRIBUTE            DESCRIPTION
ControlName          Quality control name
SID                  Barcode associated with the quality control level
ControlLevel         Quality control level
ControlLotNumber     Quality control lot number
ControlComment       Any comment associated with the quality control
AssayName            Assay name identification
AssayNumber          Assay number identification
Module               Essentially, the analyser
SerialNumber         Unique identifying number, or group of numbers and letters, assigned to an individual piece of equipment
CalLot               Calibrator lot number
CalDateTime          Date when the assay was last calibrated
ReagentMasterLot     Lot number of the reagent batch in use
ReagentSerialNumber  Number of the individual kit
DateTimeCompleted    Final date when the assay was completed
Carrier              Rack in which the quality control was analysed
Position             Position in the rack
Result               Reported result
Units                Unit of measurement
MinRange             Set by the user (-2 SD)
MaxRange             Set by the user (+2 SD)
Dilution             Assays can be set by default to neat or to a specific dilution
Flags                Any flag associated with the assay, such as a Westgard rule
ReadValues           Raw absorbance data from the analyser's reaction cuvette
AbsorbanceCuvette    Blank absorbance value using water in the same cuvette
MvCuvette            Analog-to-digital signal conversion
OperatorName         Name of the person logged into the analyser at the time of analysis
SystemSerialNumber   Analyser serial number
“Quality Control Analysis” spreadsheet

In order to improve the quality control analysis in the General Chemistry spreadsheet, the following steps were taken:

Organising the data - The first step was to reorganise the data; the previous structure did not allow the user to access the information quickly, because it showed the data of the 7 analysers in 14 different tables according to the quality control level. Table II shows the tables for analyser 16200.

TABLE II. DATA FOR ANALYSER 16200 AT DIFFERENT QC LEVELS

After organising and re-allocating the data, the new structure is a single list with all the data, which can easily be copied from the results of the previous step. This new layout, shown in Table III, allows the user to filter the results by analyser, quality control level, analyte, mean, standard deviation, coefficient of variation or EMU according to the user’s needs.

TABLE III. THE NEW STRUCTURE SHOWING DATA AT EACH QC LEVEL

Application of conditional formats - The next step was carried out in the same “General Chemistry” spreadsheet in which the quality control analysis took place. Basically, 3 factors were each evaluated at 2 different control levels: the bias, coefficient of variation and total error at levels 1 and 3. The results were shown in 6 independent tables; each one had a section with the results of each analyser that were presented and evaluated, and also a column with the target value used as a reference for the evaluation. The 6 quality control analyses performed are the following:

• Quality Control Level 1 - Bias
• Quality Control Level 3 - Bias
• Quality Control Level 1 - Coefficient of Variation
• Quality Control Level 3 - Coefficient of Variation
• Quality Control Level 1 - Total Error
• Quality Control Level 3 - Total Error

Previously, this quality control assessment used to be done just by visual inspection, comparing each result to the target; this process was error-prone, repetitive and time-consuming for the operator. Therefore, conditional formats were implemented where this comparison between the target and the result is made automatically: if a value is higher than the target, Excel paints the cell containing the value red, as shown in Table IV. The cells in green mean that the analyser does not perform that specific analysis, and a value of 100 means that the result deviates 100% from the target.
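The arithmetic behind the conditional format can be sketched in MATLAB as follows; the variable names are assumptions for illustration only.

    % Percent deviation of each evaluated value from its target, and the
    % comparison Excel uses to decide which cells to paint red.
    deviationPct = 100 * (value - target) ./ target;  % 100 means 100% off target
    flagRed = value > target;                         % cells painted red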
TABLE IV. APPLICATION OF CONDITIONAL FORMATS FOR QC

B. Second phase - Application of Manufacturing Techniques

The aim of this phase was to develop an algorithm in Matlab able to plot SPC charts for each analyte at different control levels and for the different analysers. The approach taken was the following:

Importing the file: The user has to specify, in the Matlab browser, the exact location of the file to be imported. The file has to be imported as column vectors, selecting the option given by the software, as can be seen in Fig. 1.

Fig. 1. Import window in Matlab.

Providing a format for dates: When building the code, the first stage is to give the correct format to the variable “DateTimeCompleted” in order for Matlab to recognise it as dates. The vector should be input as text, and using the function datetime it can be changed to date format.
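A hedged sketch of the import and date-conversion steps is given below; the InputFormat string is an assumption and must match the analyser's actual export format.

    T = readtable('analyser_qc.csv', 'TextType', 'char');
    dates = datetime(T.DateTimeCompleted, 'InputFormat', 'dd/MM/yyyy HH:mm:ss');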
Creating a list of assays: The function listdlg was used, which takes the information in the vector “AssayName” and generates a list of the assays in the entire file.

User interface: The next stage of the code consists of asking the user which analyte she/he would like to examine, at which specific control level and for which exact dates. The options given to the user in each list are built from the uploaded database, and only the last interface (for date selection) allows multiple selection.
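A sketch of the assay-selection dialogue built with listdlg follows; the prompt text is illustrative.

    assays = unique(T.AssayName);
    [idx, ok] = listdlg('PromptString', 'Select an assay:', ...
                        'ListString', assays, 'SelectionMode', 'single');
    if ok
        selectedAssay = assays{idx};
    end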
Calculation of parameters: For the given input, the algorithm proceeds to calculate the mean, SD, CV, EMU, UCL, and LCL of the results. It also has a counter for the points that lie outside the UCL and LCL control limits; all these results are presented to the user as shown in Fig. 2. It is
important to note that Matlab does not recognise values containing special characters like > or <; therefore, the calculations are done without considering these values.
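A sketch of these parameter calculations is shown below. The 3-sigma control limits are an assumption, since the paper refers only to specified limits; unparsed values (e.g. results containing > or <) appear as NaN and are excluded.

    r   = results(~isnan(results));   % drop values Matlab could not parse
    mu  = mean(r);                    % mean
    sd  = std(r);                     % standard deviation
    cv  = 100 * sd / mu;              % CV%
    ucl = mu + 3*sd;                  % upper control limit
    lcl = mu - 3*sd;                  % lower control limit
    nOut = nnz(r > ucl | r < lcl);    % counter for points outside the limits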

Fig. 2. Results for the selected assay and control level.

Plotting the SPC chart: With the results vector and the calculated parameters, the algorithm plots a graph of Date vs. Result for the specified analyte, control level and dates, as shown in Fig. 3.

Fig. 3. SPC chart plotted in Matlab.

With this algorithm, the user can upload the .csv file of any of the analysers and, after selecting the assay, control level and dates to assess, will have a visual representation of how the analyser is performing.
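The plotting step can be sketched as follows, reusing the parameters computed above; yline requires MATLAB R2018b or later.

    figure
    plot(dates, results, 'o-')        % NaN entries are simply not drawn
    hold on
    yline(mu,  '-',  'Mean');
    yline(ucl, '--', 'UCL');
    yline(lcl, '--', 'LCL');
    xlabel('Date'); ylabel('Result')
    title('SPC chart for the selected assay and control level')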
C. Third phase - Application of Data Mining and Machine Learning Techniques

The goal of this phase was to build a prediction model from the historical small sample of big data from an analyser; the model is useful for predicting the future, unknown behaviour of the performance of the equipment. The chosen approach was supervised learning, because the expert mapped the known set of input data to known responses, and the category of supervised learning applied was classification, due to the type of data used for training.

The software used was the Classification Learner app included in the Machine Learning toolbox of Matlab, and in order to build a successful model it was necessary to combine the specific domain knowledge of the expert with this powerful tool. This tool allows the user to train different models, assess their performance, cross-validate them and choose the best one for the specific application. The basic workflow to deploy the algorithm is shown in Fig. 4.

Acquire data: The first step was to acquire the dataset in a .csv file. The collected data is representative of both the healthy operation and the fault conditions of the analyser, and was collected from Abbott link over a period of eight months. The data has been allocated into ensembles, which are collections of data sets of the operation of the system under specific conditions. The .csv file for the analyser UHL 16000 contained 35000 instances and 27 attributes.

Fig. 4. Workflow for deploying a prediction model.

Discretisation of data: Due to the type of data available, it was decided to build a classification model; most of the attributes could be classified as discrete nominal values, because they contained numbers or labels identifying different categories.

An essential step was to discretise the values of the results; this means that the continuous values were converted into nominal ones, using the criteria of “MinRange” and “MaxRange” to decide whether the value of the “Result” variable indicated that the process was “in control” or “out of control”. The new nominal attribute was called “Nominal Result”, and it was obtained by applying formulas in Excel.
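An equivalent MATLAB sketch of this discretisation (the paper performs it with Excel formulas) is shown below, using the MinRange and MaxRange columns from Table I.

    inControl = T.Result >= T.MinRange & T.Result <= T.MaxRange;
    T.NominalResult = categorical(inControl, [true false], ...
                                  {'in control','out of control'});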
Pre-processing data: Before importing the dataset, it is a priority to format the attributes, since we are dealing with numeric and nominal values such as codes, results, and dates. Matlab allows just 4 types of data (text, number, categorical and datetime).

The next step was to clean the data: outliers, missing values, and duplicate records were removed. There were 8485 and 18063 missing values in the attributes CalLot and OperatorName respectively. The approach taken in the first case was to remove the rows with missing values, as they were not representative of the whole sample. In the second case, it was decided to fill the empty spaces with the name “Other”, as it should not matter significantly if the analysis is done by someone else, given that the laboratory has strict procedures for its operators. It was also found that the operator Sian had entered their name in 2 different ways in the system (siân and SIAN), so the software recognised them as 2 different operators; the entries were standardised to SIAN. The same applied to the Administrator user. After this step, the dataset was reduced from 35000 to 26515 instances.
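A sketch of these cleaning steps is given below; the operator-name variants shown are illustrative.

    T = rmmissing(T, 'DataVariables', {'CalLot'});     % drop rows missing CalLot
    T.OperatorName(cellfun(@isempty, T.OperatorName)) = {'Other'};  % fill gaps
    T.OperatorName(ismember(T.OperatorName, {'siân','Sian'})) = {'SIAN'};
    T = unique(T);                                     % drop duplicate rows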
Identify principal components: After cleaning the data, it is crucial to reduce the dimensionality of the dataset, because this reduces the time the algorithms take to compute and helps to improve the visualisation of the data. This step was done by removing the attributes that are not relevant to the “Result” variable to be predicted. In order to choose the significant variables that influence it, two criteria were used: the Principal Component Analysis (PCA) function in Matlab and the knowledge of the expert. PCA mainly consists of finding a subset that captures the most significant amount of the variation in the data. After both analyses, it was decided to keep only
the following 8 independent attributes or features and 1 dependent attribute (Nominal Result): ControlLevel; AssayNumber; CalLot; CalDateTime; Position; ReagentMasterLot; ReagentSerialNumber; OperatorName; Nominal Result. Please note that the number of features used for modelling is greater than the number used for quality control.
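A hedged sketch of the PCA criterion follows: pca needs numeric input, so categorical attributes are assumed to be numerically encoded first (grp2idx is one simple option), and the chosen columns are illustrative.

    X = [grp2idx(T.ControlLevel), grp2idx(T.CalLot), grp2idx(T.OperatorName)];
    [~, ~, ~, ~, explained] = pca(zscore(X));
    disp(cumsum(explained))  % cumulative % of variance captured per component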

Train the model: Before training the model, it was critical to split the dataset into 2 parts (70% and 30%). This step ensured that the datasets used for building the model and for testing it had the same sampling distribution. One dataset was used for training the model, and the other was used later for testing it; this also ensures that the same data is not used both to build and to test the model, so the results are more realistic. The dataset used for training the models consists of 18860 instances and 9 attributes. Several models can be trained with the Classification Learner of Matlab, and it even gives the user the option of training, at once, a selection of classifiers that are fast to train. The classification models include decision trees, discriminant analysis, support vector machines, logistic regression, nearest neighbours, and ensemble classifiers. In order to select the best model, several factors were considered, such as speed of training, memory usage, accuracy and prediction speed. It is also important to mention that the validation scheme chosen was 5-fold cross-validation. The performance of the different classifiers trained is shown in Fig. 5.

Fig. 5. Performance of different classifiers.
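The training step can be sketched outside the Classification Learner app as follows; the app's medium KNN preset corresponds to 10 neighbours, and the variable names, as well as the direct use of fitcknn, are assumptions.

    cv = cvpartition(height(T), 'HoldOut', 0.3);   % 70%/30% split
    Ttrain = T(training(cv), :);
    Ttest  = T(test(cv), :);
    mdl   = fitcknn(Ttrain, 'NominalResult', 'NumNeighbors', 10);
    cvmdl = crossval(mdl, 'KFold', 5);             % 5-fold cross-validation
    acc   = 1 - kfoldLoss(cvmdl)                   % cross-validated accuracy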
The chosen classifier was a medium KNN with a classification accuracy of 94.4% and 10 neighbours (i.e. K=10). The performance of the model can be understood with a confusion matrix, which shows how the selected classifier performs in each class. As given in Fig. 6, the model was able to give true positive results 95% of the time (i.e. sensitivity) and true negative results 54% of the time (i.e. specificity).

Fig. 6. Confusion matrix for the selected classifier.
Deploy & Test: Once the model was trained and chosen, it The authors would like to thank School of Engineering at
was time to test it. In order to do so, Matlab has useful Cardiff University, ASTUTE and CAMSAC for their support.
functions that make this work more accessible. The function
yfit allows to make predictions with new data or just to fit the REFERENCES
model. The test dataset used for this purpose contained 7655 [1] S. Lu, Y. C. Tu, and H. Lu, 2007. “Predictive condition based
instances and 9 attributes including the one to be predicted. maintenance for continuously deteriorating systems,” Quality and
Reliability Engineering International, 23(1), pp.71-81, 2007.
A comparison of the real results versus the predicted ones by
[2] H. M. Hashemian, and W. C. Bean, 2011. “State-of-the-art predictive
the classifier was made and the calculated accuracy of the maintenance techniques,” IEEE Transactions on Instrumentation and
model after testing was found to be 94.98%. Measurement, 60(10), pp. 3480–3492, 2011.
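This step can be sketched with the model from the previous sketch, predicting on the held-out 30% and comparing against the known labels; the class ordering in the confusion matrix is an assumption.

    yfit = predict(mdl, Ttest);                    % predictions on unseen data
    testAcc = mean(yfit == Ttest.NominalResult)    % testing accuracy
    cm = confusionmat(Ttest.NominalResult, yfit);
    sensitivity = cm(1,1) / sum(cm(1,:));          % true positive rate
    specificity = cm(2,2) / sum(cm(2,:));          % true negative rate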
VI. CONCLUSION

In this paper, soft computing tools and IoT enabled equipment were used to improve the quality performance assessment of the analysers in a clinical laboratory. In the first phase, macros were developed in MS Excel for automating the performance assessment. In the second phase, manufacturing techniques were applied for improving the quality of the data and its graphical visualisation. In the third phase, the approach to predictive maintenance was applied and a classification model with 94% accuracy was developed using historical big data obtained from the IoT enabled equipment. This model allows the user to predict the future behaviour of the machine by introducing certain known parameters such as the assay name, control level and reagent lot. Future work will consider the combination of the SPC charts developed in phase 2 with the prediction model developed in phase 3 for further improvement, as well as time-series analysis based on Deep Learning.

ACKNOWLEDGMENT

The authors would like to thank the School of Engineering at Cardiff University, ASTUTE and CAMSAC for their support.

REFERENCES

[1] S. Lu, Y. C. Tu, and H. Lu, "Predictive condition-based maintenance for continuously deteriorating systems," Quality and Reliability Engineering International, 23(1), pp. 71-81, 2007.
[2] H. M. Hashemian and W. C. Bean, "State-of-the-art predictive maintenance techniques," IEEE Transactions on Instrumentation and Measurement, 60(10), pp. 3480-3492, 2011.
[3] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2016.
[4] C. Florin and A. Srivastava, "MedTech Prognostic IoT: Predictive Maintenance for Medical Devices," Karvy Analytics, http://www.karvyanalytics.com/MedTech-Prognostic-IoT-Predictive-Maintenance-for-Medical-Devices.pdf, accessed 25 Feb 2018.
[5] A. A. Toporkov, "Criteria and methods for assessing reliability of medical equipment. Part II: Special requirements for reliability of medical equipment and methods for improving reliability," Biomedical Engineering, 42(2), pp. 82-86, 2008.
[6] R. Hattemer-Apostel, S. Fischer, and H. Nowak, "Getting better clinical trial data: an inverted viewpoint," Drug Information Journal, 42(2), pp. 123-130, 2008.
[7] L. A. Morgan, "The importance of quality improvement," Perceived Quality, pp. 61-64, 1985.
