
BACHELOR THESIS

Computer vision based approach for analysing fluids inside transparent glass vessels

Jasper Busschers
September 9, 2018

Johan Loeckx, Promotor


Computer Science
Contents

1 Requirement analysis
  1.1 The act of extracting
  1.2 Requirements
    1.2.1 Vessel Region
    1.2.2 Liquid surface
    1.2.3 Liquid phases
    1.2.4 Evaluate changes in properties

2 Finding vessel region
  2.1 Edge Detection
    2.1.1 Roberts cross operators
    2.1.2 Prewitt operators
    2.1.3 Sobel operators
    2.1.4 Noise reduction
    2.1.5 Canny algorithm
  2.2 Edge integration

3 Finding liquid surface and phase separations
  3.1 Recognising liquid surface
    3.1.1 Generate all possible curves
  3.2 Phase recognition
    3.2.1 Determine number of surfaces
  3.3 Scoring curves
    3.3.1 Relative intensity change normal to the curve
    3.3.2 Difference between average edge density above and on curve
    3.3.3 Combination of the edge density and the gradient direction
    3.3.4 Including curvature

4 Segmentation of liquid phases
  4.1 Convolutional neural network
    4.1.1 Convolution layer
    4.1.2 Pooling layer
    4.1.3 Activation layer
    4.1.4 Fully connected layer
  4.2 GoogLeNet
  4.3 ResNet
  4.4 Fully convolutional neural network
    4.4.1 Valve filter approach
    4.4.2 Modular approach
  4.5 Datasets
    4.5.1 Sagi Eppel
  4.6 Implementation
    4.6.1 Tensorflow
    4.6.2 Theano

5 Determine changes based on extracted properties
  5.1 Comparing fill level
  5.2 Comparing distribution of different phases
  5.3 Comparing colour levels

6 Experiments
  6.1 Finding vessel region
  6.2 Finding liquid surface
  6.3 Segmenting phases
  6.4 Determining failure
    6.4.1 Candidate-elimination
    6.4.2 Decision tree

7 Conclusion

Bibliography
Abstract

In this bachelor thesis, we investigate which techniques from computer vision and machine learning are suitable for extracting physical properties of fluids inside transparent glass vessels from images. These properties are then used to compare two pictures of the same sample in order to detect how the sample changes over time. A sample whose properties do not change is said to be stable, while any change in properties indicates that a failure happened to the sample. For this thesis it is assumed that a failure can be identified by phase transitions that occurred inside the vessel. These changes are evaluated by comparing the values extracted from two images of the same sample. The properties used to identify them are the fill level, the colour and the distribution of the fluid across the different phases inside the vessel.
Chapter 1

Requirement analysis

1.1 The act of extracting


The act of extracting properties can be divided into multiple subproblems. These will be evaluated separately and later combined in order to specify the failures that occurred to a sample.
The first step when processing an image is to identify whether there is a vessel and, if so, to recognise its boundaries. Only then can we proceed to analyse the properties of the fluid inside the vessel.
The first property to be extracted is the fill level of the fluid, which can be used to identify failures happening to the sample. The failures to detect are separations inside the fluid and reductions in liquid level.
For the last property we will be looking at the representation of the different phases inside the vessel. Extracting these values allows us to identify phase transitions more accurately by looking at how each phase is represented inside the vessel. Each of these subproblems will be evaluated separately, and multiple approaches from both computer vision and machine learning will be compared in order to solve them.

1.2 Requirements
1.2.1 Vessel Region
Ever since The Summer Vision Project (1966) [1], the field of computer vision has been fascinated by ways to model vision, long before the computing power or data needed for such a problem existed. It contains the first mentions of boundary detection and object recognition as key features defining vision.
A clear distinction should be made between object detection and object recognition. Object detection gives us the general location where the object can be found (given as blobs). Object recognition, on the other hand, means finding the boundaries of the object in order to segment its shape from the background.
In this thesis it is assumed that all sample images contain a glass vessel. Therefore we will be discussing methods for finding the boundaries of the vessel. Doing this allows the neural networks used later to focus their attention only on the pixels that are relevant for extracting properties of the liquid.
Research on this topic has been done by Sagi Eppel and Tal Kachman from the Israel Institute of Technology. In their paper of 2014 [2] they describe a computer vision method based on edge density to find the vessel region. Their methods will be further discussed in the next chapter.

1.2.2 Liquid surface

Finding the liquid surface can be helpful in order to detect deviations in the fill level of the sample. A change in fill level can be the first indication of a phase transition happening and therefore of a failure. For example, a reduction in fill level can indicate evaporation. By scanning for multiple surface lines it is also possible to distinguish two separated liquids, for example a layer of oil on top of the liquid.
Finding the liquid surface is not just helpful for detecting failures in samples, but also for a wide range of other chemical and industrial processes such as bottle filling and distilling. In the past, this was usually achieved by using floating devices or laser technology [3].
In the 2014 paper of Sagi Eppel [2] an algorithm is proposed to find the surface level of a fluid inside a transparent glass vessel. There, 35 formulas are compared by their ability to detect the correct surface lines. The paper also discusses the difficulties in accurately determining the liquid surface: because the method looks at pixel intensity, the algorithm can misjudge reflections on the glass or features of the vessel as surface lines.
In 2015 they followed up their work, solving the problem where the algorithm misjudged features of the glass vessel as the liquid surface [4]. There they looked at the relation between the vessel shape and the curvature of the reflections to more accurately predict the surface of the fluid.

1.2.3 Liquid phases


The most accurate measurement of phase transitions is the representation of each phase within the vessel. Classical computer vision solutions are too limited for such a problem: they might be able to recognise boundaries between different phases, but classifying those phases would require very specific handcrafted methods.
Therefore, we will be looking at deep learning solutions that are able to segment the vessel region into different phases. Again this topic has been addressed by Sagi Eppel, in 2017 [5]. They discussed the implementation of a fully convolutional neural network (FCNN) used to separate and segment the different liquid phases inside a vessel.

1.2.4 Evaluate changes in properties


Once the properties are extracted from the image, they can be evaluated and compared in order to identify when a failure happens to a sample. In this work it is assumed that a failure can be identified by spotting phase transitions within the fluid. These reactions can be observed by looking at the change in the representation of the different phases inside the vessel over time.
To determine how a liquid has changed over time we look at three properties: the distribution of the different phases in the vessel, the colour of those phases and the fill level. To evaluate how the colour changes over time we will also need to handcraft a way of comparing the pixels of two images within a region of interest. One of the difficulties in comparing these two images is the assumption that they have been exposed to the same light, as different lighting can change how cameras perceive colours. This will be explained in more depth in section 5.3.
Chapter 2

Finding vessel region

It is well known that most non-redundant information in an image can be found by looking at its edges [6]. That is why much of the research in computer vision has revolved around detecting edges and using them to identify objects. The goal of this chapter is to recognise the position of the vessel in the image. We will evaluate several computer vision methods that could be applied to achieve this goal.
Most of these methods use changes in pixel intensity to recognise edges. A binary edge image is then created by applying some form of thresholding [7]: only the values above the threshold are kept, in our case keeping only the edges of the image. In this result we can start looking for shapes that may represent a vessel. Here we assume that all vessels are symmetric, which is the case for most if not all lab vessels. This assumption makes it a lot easier to recognise the shape of the vessel, since we can rate candidate shapes based on their symmetry.

2.1 Edge Detection


2.1.1 Roberts cross operators
The Roberts cross operator, proposed by Lawrence Roberts in 1963, is one of the first edge detectors that used masks to detect edges [8][9]. The operator uses two 2x2 convolution masks Gx and Gy, which calculate diagonal differences to detect edges.

    Gx = |  1   0 |        Gy = |  0  +1 |
         |  0  -1 |             | -1   0 |

These two masks respond maximally to diagonal edges and can be applied independently of each other. Afterwards the results of Gx and Gy are combined using the formula given below.

    |G| = sqrt(Gx^2 + Gy^2)

The Roberts cross is mainly used because of its fast computation time and its stability, as it only makes use of additions and subtractions. One of the biggest disadvantages of this method is its small mask size, which makes the operator very sensitive to noise [9][10].
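As an illustration, the sketch below applies the two Roberts cross masks to a grayscale image with SciPy and combines them into a gradient magnitude; the function and variable names are ours and not taken from the thesis implementation.

    import numpy as np
    from scipy.ndimage import convolve

    def roberts_cross(gray):
        """Gradient magnitude of a 2-D grayscale image using the 2x2 Roberts cross masks."""
        gray = gray.astype(float)
        gx_mask = np.array([[1.0, 0.0],
                            [0.0, -1.0]])
        gy_mask = np.array([[0.0, 1.0],
                            [-1.0, 0.0]])
        gx = convolve(gray, gx_mask)       # response to one diagonal
        gy = convolve(gray, gy_mask)       # response to the other diagonal
        return np.sqrt(gx ** 2 + gy ** 2)  # |G| = sqrt(Gx^2 + Gy^2)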

2.1.2 Prewitt operators


The Prewitt operators are very similar to the Roberts cross, as they also use masks to detect the edges of an image. The Prewitt operators were proposed in 1970 [11] as an operator specialised in detecting horizontal and vertical edges. They achieve this by using two 3x3 masks Gx and Gy that calculate the difference between opposite sides.

    Gx = | -1/3  -1/3  -1/3 |        Gy = | -1/3   0   1/3 |
         |   0     0     0  |             | -1/3   0   1/3 |
         |  1/3   1/3   1/3 |             | -1/3   0   1/3 |

This implementation is less sensitive to noise than the Roberts cross, but has a higher computational cost because of its bigger masks [10]. The results are combined in the same way as for the Roberts cross operators, using the square root of the sum of the two squares.

2.1.3 Sobel operators


The Sobel operators are a slight variation of the Prewitt operators that put more weight on the central coefficients of the masks [10][12]. It has been shown that this distribution of weights gives better image smoothing.

    Gx = | -1/4  -1/2  -1/4 |        Gy = | -1/4   0   1/4 |
         |   0     0     0  |             | -1/2   0   1/2 |
         |  1/4   1/2   1/4 |             | -1/4   0   1/4 |

The Sobel operators have the same complexity as the Prewitt operators, as both use 3x3 masks. Therefore it is preferable to use the Sobel operators, because edges are better localised. The images below show the result of the Sobel operator applied to a grayscale input image.

(Figure from [10]) a, b, c) From the results of the two masks Gx and Gy, the magnitude M can be computed using the formula discussed in section 2.1.1. d) This result is compared to a previously chosen threshold; if it is larger than the threshold, the pixel is added to the gradient module image. Lowering the threshold causes more edges to be detected and raising it fewer, so this value must be chosen neither too high nor too low.
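A minimal sketch of this gradient-module construction, assuming SciPy's built-in Sobel filter and an arbitrary threshold value that would need tuning per dataset:

    import numpy as np
    from scipy.ndimage import sobel

    def gradient_module(gray, threshold=100.0):
        """Sobel gradient magnitude, keeping only pixels above the threshold."""
        gray = gray.astype(float)
        gx = sobel(gray, axis=1)   # horizontal intensity differences
        gy = sobel(gray, axis=0)   # vertical intensity differences
        # SciPy uses the integer-weighted Sobel masks, a constant multiple of the ones above.
        magnitude = np.sqrt(gx ** 2 + gy ** 2)
        return magnitude > threshold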

2.1.4 Noise reduction


The operators discussed so far perform no pre-processing to reduce noise in the image. One of the most widely used operators to reduce noise is the Gaussian blur, which takes the size of the mask to apply as input. The operation replaces each pixel with a weighted average of the pixels inside the mask, with weights following a Gaussian distribution. The size of the mask defines how strongly the blur is applied [13].

2.1.5 Canny algorithm
Canny edge detection is a multi-stage algorithm based on the operators discussed above [14][15]. It takes a grayscale image as input and applies a 5x5 Gaussian blur to reduce noise. The blurred image can then be passed to one of the operators discussed above (Sobel, Prewitt, Roberts cross) to generate a gradient module image.
The goal of the Canny algorithm is to create thinner edges (1 px) and to distinguish edges from noise. It achieves this by iterating over the gradient module image. For each pixel along an edge, the direction is calculated using the following formula.

α = arctan(Gy/Gx)

The direction α of an edge specifies in what direction pixel intensity changes.

(Figure from [16])
If we know the direction of an edge, we can check whether the pixel is a local maximum, meaning that its gradient magnitude is larger than that of its neighbouring pixels along the gradient direction. All pixels that are a local maximum and also above the chosen upper threshold are placed in one of four bins, each bin representing one of the possible directions.
After marking all the strong edges, the algorithm uses the lower threshold to check for neighbouring edges, in order to reduce noise and find the main shapes. The algorithm loops over all found edges, checking whether their direct neighbours are above the lower threshold. If a neighbouring pixel is above the lower threshold and is also a local maximum according to the previous conditions, it is also added to one of the bins.
This process is repeated until no new edges can be found. Doing this allows us to include thinner edges that are part of the desired object without also including noise with similar properties.

The first step of the algorithm is implemented using convolutions. It is possible to implement convolutions in time O(n log n) [17], where n is the number of elements. If an image has dimensions m x n, then the computational cost of computing the Gaussian blur and applying one of the edge detection operators is O(m*n log(m*n)) [18].
In the final step of the algorithm, scanning the edges has a worst case of O(n*m). Therefore the overall time complexity is O(m*n log(m*n)) [18].
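For reference, OpenCV bundles all of these stages (blur, gradient, non-maximum suppression and hysteresis) behind a single call. A minimal sketch, where the two threshold values are arbitrary and would need tuning per dataset:

    import cv2

    def canny_edges(path, low=50, high=150):
        """Load an image, blur it and run Canny hysteresis edge detection."""
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)   # 5x5 Gaussian blur
        return cv2.Canny(blurred, low, high)          # lower and upper thresholds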

2.2 Edge integration


The previous sections showed how to detect all edges in the image, which may include background objects or noise. This section provides a procedure for removing all unwanted edges while keeping only the vessel shape. The procedure used to achieve this has been described and implemented by Sagi Eppel [19]. A couple of assumptions are made in order to correctly rate the vessel shape. First, we assume the vessel has a symmetric shape. Second, we assume the vessel is completely visible in the image. In case one of these assumptions proves to be wrong, some steps of the method have to be modified.

(Figure from [19])
The algorithm starts from the result of the Canny edge detection algorithm discussed in section 2.1.5. In the first step of the algorithm, all edges that touch the image border are removed, because we assume the vessel is completely visible in the image.

(Figure from [19])
A binary image is generated by taking the negative of the image obtained after the first step. In this result, every pixel that is not part of the background has value 1, while the background pixels have value 0. This mask can be multiplied with the original image in order to ignore the background region.

(Figure from [19])
The image is segmented again, keeping only the largest blob as the vessel shape.

(Figure from [19])
The next step removes the shape of the stand holding the vessel from the vessel shape. This step is optional, depending on the dataset you are working with. The operation looks for two parallel regions on the horizontal axis and removes the thinner of the two.

(Figure from [19])
Every vertical line of the blob is scanned and scored as a candidate symmetry axis. The candidate with the highest score is kept as the symmetry axis of the vessel. The score is calculated as the fraction of horizontal lines of the blob whose centre lies on the candidate axis. Then, for each horizontal line whose centre does not lie on the axis, either its left or its right boundary is adjusted so that its centre also lies on the symmetry axis.

(Figure from [19])

The result provides us with just the shape of the vessel as a binary image. Chapter 4 discusses how this pre-processing can help neural networks learn features about the fluids inside the vessel.
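A rough sketch of the symmetry-axis scoring described above, under our own simplifying assumptions (the blob is given as a binary NumPy mask and every candidate column is scored by how many rows have their span centred on it); this is an illustration, not Eppel's published implementation:

    import numpy as np

    def find_symmetry_axis(blob, tolerance=2):
        """Score every column of a binary blob mask as a candidate symmetry axis
        and return the best one. A row votes for a column when the centre of its
        filled span lies within `tolerance` pixels of that column."""
        rows, cols = np.nonzero(blob)
        centres = []
        for r in np.unique(rows):
            xs = cols[rows == r]
            centres.append((xs.min() + xs.max()) / 2.0)  # centre of this row's span
        centres = np.array(centres)

        best_axis, best_score = 0, -1.0
        for axis in range(blob.shape[1]):
            score = np.mean(np.abs(centres - axis) <= tolerance)  # fraction of rows centred here
            if score > best_score:
                best_axis, best_score = axis, score
        return best_axis, best_score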

Chapter 3

Finding liquid surface and phase separations

The previous chapter discussed methods that allow us to find the location of the vessel in the image. Using this vessel region, it is possible to process just the vessel area while disregarding the background. This considerably reduces the complexity of all upcoming operations, since they have to process a smaller area. In this chapter we discuss an algorithm for finding the surface of a fluid inside a transparent glass vessel.
The ability to recognise the surface of a liquid is known to be very useful for many chemical and industrial applications [3][20], such as bottle filling or distilling. The fill level can also be used to determine changes between two pictures of a sample: a fall in fill level can indicate that evaporation is happening. Sometimes there can also be multiple different fluids that are separated in the vessel; for example, a layer of oil on top of water creates two surfaces. For this reason we search for multiple surface lines. Doing this allows us not only to find the fill level of the fluid inside the vessel, but also to detect whether the fluid has separated. These separations will prove to be useful in chapter 5, where we discuss how these properties can be used to identify a failure.
Research on computer vision solutions for recognising separations between liquid and air has been very limited [2][21], and research on recognising separations between different liquid phases even more so [22][4]. The most significant research on this topic has been done by Sagi Eppel and Tal Kachman, whose work is briefly discussed in this chapter.

3.1 Recognising liquid surface
One of the biggest difficulties when trying to recognise the surface of a liquid is that the shape of the surface is not only bound by the vessel shape but also depends on the camera angle used to take the picture. Background objects and reflections on the glass bring their own difficulties, as their changes in pixel intensity can be mistaken for a surface.
In Sagi Eppel and Tal Kachman's paper of 2014 they describe an algorithm for finding the liquid surface [2]. The algorithm is given below in pseudo code.

Algorithm 1 Finding liquid surface

Data: y ← a horizontal outline in the vessel, starting at the top of the vessel
Result: res ← a list of pairs of curves that match surface lines in the image
while y not at vessel bottom do
    if y is a possible outline for a surface then
        curves ← GenerateAllPossibleCurves(y)
        counter ← 0
        highestScore ← 0
        bestCurve ← 0
        while counter is smaller than size(curves) do
            score ← matchToImage(curves[counter])
            if score is bigger than highestScore then
                highestScore ← score
                bestCurve ← curves[counter]
            end
            counter ← counter + 1
        end
        if highestScore above threshold then
            res ← res + bestCurve
        end
    end
    y ← next line towards the vessel bottom
end

The algorithm starts from the vessel region found using the techniques from chapter 2. From there it iterates over every horizontal line in the vessel in order to find possible outlines of the surface. It then generates all possible curves for each outline and rates the curves based on how well they match the image. The curves with the highest scores that pass a threshold are kept as surface lines.

3.1.1 Generate all possible curves
In the first stage of the algorithm, all horizontal lines within the vessel are scanned. For each one, the algorithm scans all possible curves centred around the horizontal line. Computing all possible curves can be a tremendously intensive task, but its complexity can be greatly reduced by assuming the camera is not tilted to the left or right.
Using this assumption, only two scenarios can occur. In the first case the camera points straight at the vessel; the surface line is then represented by a single straight line. The other scenario happens when the camera is tilted upwards; in that case the surface is represented by two curves that form a surface on or around the horizontal line.
The best implementation for finding a matching pair of curves was found to be matching and scoring the upper and lower curves separately, and picking the best of those results as the pair representing the surface.

(Figure from [2]) The viewing angle of the camera is a factor in determining the surface level of the fluid and therefore the fill level inside the vessel. a) In the ideal case, where the camera points straight at the vessel, the fill level of the fluid can be read directly from the horizontal surface line. b), c) In the other case, when the camera is tilted upwards, we can look at the relation between the shape of the surface lines and the angle at which the picture is taken.

We can further reduce the number of curves to be calculated by assuming the camera is never tilted more than a certain amount. The height h between the two surface lines can be expressed in terms of the surface width w and the tilt angle ϕ by the following formula.

    h = w * sin ϕ

This formula can be used to state the fill level of a fluid inside a transparent vessel more accurately. We can find the fill level by looking at the lower curve and comparing it to the vessel size.
After finding this surface, the algorithm continues searching for more surface lines. These can be used to tell whether the fluid has separated.

3.2 Phase recognition


3.2.1 Determine number of surfaces
Once the algorithm has finished searching for all surface pairs in the vessel, it has to decide which of them correspond to a separation between two phases. This is easy for samples where it is known beforehand how many phases they contain: in that case, rating every pair and keeping the best suitable pairs is the best solution. In this study, however, the number of phases occurring in a sample is not always known beforehand. This requires us to use some form of threshold for accepting curves as surface lines. This threshold has to be below the maximum score a curve can get, but high enough to accept only curves that separate two distinct phases. One of the problems that can occur when rating curves in this way is that multiple curves may fit the same surface level well. In that case multiple pairs of curves get chosen as surfaces, indicating too many phases. One way of solving this is to require a minimum distance between two pairs of curves [2].

3.3 Scoring curves


The score of a curve represents the correspondence between the image and the chosen curve; the highest scoring curves are chosen as surface lines. Using a good formula to match curves to the actual surface is the key to making accurate predictions.
In Sagi Eppel's paper [2], 35 different formulas are compared on their performance in finding the liquid surface. For this research we limit ourselves to the three formulas that were concluded to perform best at this job. All of these methods use a grayscale version of the image to score a curve based on its correspondence to the edges of the image.

3.3.1 Relative intensity change normal to the curve
Using just intensity as an indicator of surface levels would be too broad, as areas with high intensity could be falsely flagged as surface lines. Looking at the relative intensity change gives both humans and computers a good indication of surface boundaries, but gives weak results on surfaces with weak, blurry boundaries.
The relative intensity change normal to the curve was found to be the best indicator for liquid surfaces, because it balances the illumination effect and the boundary strength. However, its accuracy in distinguishing separations between two liquids, or between a liquid and an emulsion, was shown to be much lower.

3.3.2 Difference between average edge density above and on curve

Edge density can be another indicator for finding the surface boundaries; it can be calculated by taking the average of all the pixel values along the curve. The method discussed here compares the average edge density above the curve and on the curve in order to score how well the curve matches a surface line. This method was found to perform better at distinguishing separations between liquid and emulsion. On the other separations it performed comparably to the first formula.

3.3.3 Combination of the edge density and the gradient direction

This formula combines edge density and gradient direction in order to match curves to the surface line. Here it is assumed that all pixels along a curve representing a surface line should have the same gradient direction. Two points are said to have the same direction if their intensity changes in the same direction, as discussed in chapter 2.
The method works by calculating the scalar product between the curve and the gradient direction and multiplying this with the edge density to give a score per pixel. This method scored significantly better when detecting separations between two liquids or between liquid and emulsion. This accuracy comes at a cost, as it loses some accuracy when detecting separations between liquid and air [2].

    method                                      liquid-air   liquid-liquid   liquid-emulsion
    intensity change normal to the curve           99.3           77.3             63.2
    average edge density above and on curve        98.0           78.9             77.1
    edge density and the gradient direction        96.6           83.0             84.2
(Table from [2].) The table shows the accuracy of each method, in percent, when distinguishing between different phase separations. Choosing the best function is not trivial and depends on the goal: for determining the fill level we would want to use the first method, while if the goal is identifying when a fluid is separating into different liquids, we would prefer the last method.

3.3.4 Including curvature


All methods discussed above still suffer from reduced accuracy due to features of the vessel shape that may be recognised as surface lines. In 2015, Sagi Eppel proposed a solution to this [4] by adding a factor to the scoring formula that represents the curvature of the vessel at a certain point. This method relies on the assumption that lab vessels are cylindrical, meaning the curvature can be read from the gradient module image. The curvature at a point on the vessel border is calculated by following the border curve in both directions and averaging the angles that consecutive points make along the curve.

The total curvature at a certain point can be calculated using the following formula:

    θ(P) = (1/n) · Σ_{i=1}^{n} θ_i

This gives us the average curvature when looking n pixels far; the value of n to choose depends on the size of the image. These curvature values can be used to create a factor stating the curvature at a certain height along the vessel. A simple form of such a factor is F(P) = 1 + θ(P)/C, where θ(P) is the curvature at point P and C is a constant that controls the influence of the curvature on the factor.
The factor is applied by dividing the score given at each point along a curve by F. It is easy to see that if θ(P) = 0, meaning a straight border at that point, then F(P) = 1, meaning no change to the original scoring method. By using this method, errors caused by the shape of the vessel were reduced from 19 percent to 8 percent [4].
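A small sketch of this curvature penalty under our own assumptions (the border is given as an ordered list of (x, y) points and θ_i is taken as the absolute turning angle between consecutive segments); the names and the choice of C are illustrative and not taken from [4]:

    import numpy as np

    def curvature_factor(border, i, n=10, C=3.0):
        """Average turning angle around border point i, looking up to n points
        in both directions, turned into the factor F = 1 + theta / C."""
        pts = np.asarray(border, dtype=float)
        lo, hi = max(i - n, 0), min(i + n, len(pts) - 1)
        segment = pts[lo:hi + 1]
        vecs = np.diff(segment, axis=0)                # consecutive direction vectors
        angles = np.arctan2(vecs[:, 1], vecs[:, 0])
        turns = np.abs(np.diff(np.unwrap(angles)))     # turning angle at each interior point
        theta = turns.mean() if len(turns) else 0.0
        return 1.0 + theta / C                         # divide a curve's score by this factor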

Chapter 4

Segmentation of liquid phases

In the previous chapters, we examined different computer vision approaches that are capable of extracting properties from images of fluids inside transparent glass vessels. Chapter 2 provided methods to recognise the vessel location, and chapter 3 discussed methods for finding the fill level of a fluid and identifying multiple liquids that have separated.
This chapter investigates methods to further identify the different phases within the vessel. The approaches discussed so far can be interesting for analysing shapes, but they are not capable of providing a semantic segmentation of the liquid. Such a segmentation requires some kind of machine learning in order to learn the different appearances of liquid phases.
In 1998 LeCun [23] showed the first implementation of a multi-layer neural network performing gradient-based learning on text recognition. Many of the techniques used there later became key technologies for modelling convolutional neural networks, and were later used in AlexNet's deep convolutional neural network [24]. That work was the first example of a convolutional neural network outperforming all traditional approaches in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in 2012.
Going into the mathematical details or implementations of all these networks would go beyond the scope of this thesis, as they are provided by specialised machine learning libraries (Torch, Tensorflow, Caffe, ...). This chapter instead gives a general explanation of the building blocks of a convolutional neural network and of models proposed to solve different kinds of learning problems.

4.1 Convolutional neural network


In traditional neural networks, image processing was usually achieved by having a set of neurons as input layer, each taking one pixel of the image [25]. This input is then processed by fully connected hidden layers and combined into a result. The limitations of such a network are that it does not scale well to larger images and that neurons in the same layer work independently from one another without sharing connections. Convolutional neural networks work around this problem by using 3-dimensional layers (width, height and depth) [26].

(Figure from [26])
Using this architecture, neurons can be implemented as filters, so-called convolutions. Several of these filters together form a convolutional layer that produces a 3-dimensional result, where the depth is the number of filters applied in that layer. The size of this 3-dimensional output depends on the size of the convolution filters used, the stride (overlap) and the number of filters in the layer; more on this in the next section.
In most cases, the goal of a convolution layer is to expand the input into an output with a higher depth dimension in order to extract more advanced features, though reducing the dimensions is also possible. A convolution layer is most commonly followed by either a pooling layer or an activation layer.
The purpose of the activation layer is to add non-linearity by applying an activation function to the weighted result of each filter in the previous layer. The activation function should be non-linear (sigmoid, tanh, ReLU, ...); these will be discussed in section 4.1.3.
The activation layer leaves the dimensions of its input unchanged. In order to reduce the width and height, the pooling layer is used, whose main purpose is to combine neighbouring pixels, keeping only one value representing the combined area. One of the most commonly used pooling techniques is max pooling; more on this topic in section 4.1.2.

4.1.1 Convolution layer


Convolution is a mathematical operation originating from signal theory, whose goal is to combine multiple values together [25]. Each value has a weight stating how strongly it should count in the combined result. This is the main process in convolutional neural networks, where we are looking for the weights that represent a concept.
In neural networks, convolutions are defined by their filter size and a stride. The filter slides both horizontally and vertically over the image to generate an activation map stating how strongly the filter is triggered at each point. The stride of the filter defines the steps it skips when moving horizontally or vertically over the image. The size of the resulting activation map is defined as follows:

D = ((W − F + 2P )/S) + 1
Here D is the dimension of the output activation map, W the input dimension (width or height), F the filter size and S the stride. The dimension of the output can be manipulated by adding zero padding around the borders of the input; P states the amount of padding added around the border. It is common practice to make each filter the same size within a convolutional layer. A convolutional layer returns the results of all of these filters as a 3-dimensional matrix, where the depth is defined by the number of filters used in that layer (each filter spans the full depth of its input).
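As a quick illustration of the formula, a tiny helper of our own, purely for checking layer dimensions:

    def conv_output_size(w, f, p, s):
        """D = ((W - F + 2P) / S) + 1 for one spatial dimension."""
        return (w - f + 2 * p) // s + 1

    # Example: a 260x100 image, 3x3 filters, no padding, stride 1
    print(conv_output_size(260, 3, 0, 1), conv_output_size(100, 3, 0, 1))  # 258 98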

4.1.2 Pooling layer


It is common practice to use a pooling layer between two convolutional layers. The pooling layer is responsible for down-sampling the input dimensions in order to control over-fitting and the number of computations in the network. Pooling layers are usually implemented with a 2x2 filter and stride 2. The pooling layer takes a matrix as input with height H1, width W1 and depth D1 and produces an output with dimensions W2 x H2 x D2, defined by the following formulas.

    W2 = ((W1 − F)/S) + 1
    H2 = ((H1 − F)/S) + 1
    D2 = D1

Here F is the filter size and S the stride of the pooling layer. With a filter size of 2 and a stride of 2 it is easy to see that the original width and height are cut in half. The pooling layer works on each depth slice of the input independently, so the output depth stays the same as the input depth.

21
Max-Pooling Max-pooling is one of the most commonly used pooling techniques and typically uses a 2x2 filter with stride 2. It keeps the maximum of the 4 values the filter covers, returning a new matrix with half the height and width of its input, containing only the highest values.
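A minimal NumPy sketch of 2x2 max pooling with stride 2, assuming the input height and width are even:

    import numpy as np

    def max_pool_2x2(x):
        """x has shape (H, W, D); returns (H/2, W/2, D), keeping per-block maxima."""
        h, w, d = x.shape
        blocks = x.reshape(h // 2, 2, w // 2, 2, d)   # group pixels into 2x2 blocks
        return blocks.max(axis=(1, 3))                # maximum over each block

    x = np.random.rand(4, 6, 3)
    print(max_pool_2x2(x).shape)  # (2, 3, 3)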

4.1.3 Activation layer


In 1959 David H. Hubel studied the working of the visual cortex of the brain by showing different imagery to a cat and looking at the response in the visual cortex [27]. He discovered that neurons in the brain have a hierarchical structure, where the neurons activated first process simpler features such as light intensity, followed by neurons for more advanced features such as movement and orientation.
Activation functions are based on this principle, replicating neurons being triggered. They accept a vector x of inputs and compute the weighted sum using the following formula:

    z = Σ_i (W_i · X_i) + b
Here Xi and Wi are the values at the i-th position of the input vector and the weight vector, and b is the bias added to the result. The result z is passed to an activation function that, based on z, decides whether the neuron is triggered or not. A good choice of activation function can dramatically increase the learning speed of the network. ReLU, or rectified linear unit, is often named as the activation function to use for convolutional neural networks [25][26], because a ReLU has been shown to converge roughly 6 times faster than tanh or sigmoid [24]. The ReLU function also has a very low computational cost, as it only relies on a max operation.
The function is rather simple: it sets all negative inputs to zero and passes all positive values unchanged. The advantage of this is that the gradient never smooths out for positive inputs, so learning rates stay high. A problem can occur when the input is negative. To address this, several adaptations of the ReLU function have been proposed, for example leaky ReLU and exponential variants. These alternatives create a small slope on the negative side of the axis to improve learning on negative input.
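A short sketch of the weighted sum and two of the activation functions mentioned above (plain NumPy, names our own):

    import numpy as np

    def neuron(x, w, b):
        """Weighted sum z = sum_i(w_i * x_i) + b."""
        return np.dot(w, x) + b

    def relu(z):
        return np.maximum(0.0, z)             # passes positives, zeroes out negatives

    def leaky_relu(z, alpha=0.01):
        return np.where(z > 0, z, alpha * z)  # small slope on the negative side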

4.1.4 Fully connected layer


Fully connected layers are used at the end of most traditional convolutional neural networks. Their task is to reduce the space to a single array stating a score for each class. This is achieved by having a set of neurons that each take all activations of the previous layer as input; the number of neurons defines how many classes can be returned. Fully connected layers are known to heavily increase the number of parameters, and thus the memory usage, of the network. For this reason, these layers have fallen out of favour in more recent work [28][29].

4.2 GoogLeNet
In 2015, GoogLeNet presented a neural network without a fully connected layer, which heavily outperformed the previous state of the art, AlexNet [28]. In their paper, they present a new building block called the inception module, which combines multiple convolutional and pooling layers into a single module. By stacking multiple of these modules on top of each other they managed to reduce the error rate on the ILSVRC by almost 10 percent. By removing the fully connected layer from the architecture, their network also required 12 times fewer parameters than AlexNet. This allowed for a deeper network while keeping memory consumption low; deeper networks are generally said to perform better but require much more memory.

(Figure from [28])

Inception module In an inception module, multiple different convolutional and pooling layers are performed in parallel. The stride and zero padding are chosen so that each parallel branch outputs the same width and height. Afterwards, the results are combined, creating a 3-dimensional output whose depth is the sum of the depths of the individual results.

4.3 ResNet
GoogLeNet already showed that better accuracy can be reached using deeper networks. ResNet continued this idea in 2015, improving accuracy even further by using 152 layers (19 times more than AlexNet) [29]. Their model was the first to outperform human-level performance, with an error rate of just 3.57 percent. These networks made it clear that better accuracy can be achieved with deeper networks, but deeper networks are known to be more difficult to train. For this reason, they propose a new approach called residual learning, which allowed them to build far deeper networks than was possible before.

4.4 Fully convolutional neural network


The previous sections discussed the different layers used in a traditional convolutional neural network such as AlexNet [24]. Such neural networks have been shown to outperform traditional technologies on image classification and object detection tasks. However, such architectures are not well suited for image segmentation, as they return a list of classifications for the whole image.
In 2014 Jonathan Long et al. presented fully convolutional versions of the AlexNet, VGG16 and GoogLeNet models. These modifications allowed them to perform semantic segmentation on the ImageNet dataset [28][30]. The approach was called a fully convolutional neural network, as it no longer makes use of any fully connected layer to reach a result.
A fully convolutional neural network does not return a set of classifications per image like the previously discussed models. Instead, it provides a map with one or more labels for every pixel. For this, it relies on so-called deconvolutional layers, whose task is to increase the output dimensions again. Most often such a layer is implemented as a reversed pooling operation.
In 2015, SegNet proposed an encoder-decoder architecture for implementing fully convolutional neural networks [31]. They proposed a symmetric architecture that uses the reverse operations when upsampling the output.

(Figure from [31])

By using this symmetric approach, they were able to share the spatial information needed to upsample the image again; for max pooling this means storing the location where the maximum value was found. While the architecture seemed promising, the benchmark results were still too low for wide adoption.
Later, Fisher Yu et al. proposed a convolution-based layer for upsampling data, reaching far better results [32]. Such layers are called dilated convolutions and are used to map the input to a bigger field of view. The best results so far were reached in 2017 by DeepLab, who modified a ResNet model in such a way that it could be applied to image segmentation [33]. Fully convolutional neural networks were also shown to reach a classification 25 times faster than an approach making use of fully connected layers [34]. A disadvantage of fully convolutional neural networks is that they require more detailed labelling in order to learn segmentation, which can be more costly.

4.4.1 Valve filter approach


In 2017 Sagi Eppel published a paper in which he uses fully convolutional neural networks for segmenting liquid phases within a glass vessel [5]. The networks discussed in the paper are implemented using a modified version of the VGG net model [35]. The goal of the implementation is to create a segmentation for each phase within a glass vessel.

(Figure from [5])

Two different approaches are compared. The first only takes the full image as input to produce a pixel-wise labelling. The second approach is called the valve filter approach, as it combines two inputs: the image itself and the vessel region. The inputs are combined using a pixel-wise product that blacks out the background. This approach scored far better on every class than the naive approach.
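The pixel-wise product that blacks out the background can be sketched in a couple of lines; this only illustrates the masking step, not Eppel's full valve filter network:

    import numpy as np

    def mask_background(image, vessel_mask):
        """image: (H, W, 3) array; vessel_mask: (H, W) binary array with 1 inside the vessel.
        Returns the image with everything outside the vessel set to zero."""
        return image * vessel_mask[..., np.newaxis]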

4.4.2 Modular approach


Later in 2017, Sagi Eppel published a modular approach for segmenting liquid phases inside glass vessels, using a hierarchical modular neural network [36]. Modular neural networks divide a task into multiple smaller tasks, using a separate, independent neural network for each one. The advantage of a modular approach is that every net can be substituted without requiring retraining of the entire network. It also allows parts of the network to be reused in other applications.
This approach had mostly been used to combine the results of two specialised networks in a parallel manner, not hierarchically. In Sagi Eppel's approach, a hierarchical modular design is suggested, consisting of two fully convolutional neural networks. The first identifies the vessel region and passes its result to the second network, which segments and identifies the different liquid phases.

(Figure from [36])
Whereas the valve filter approach used the ground-truth vessel region as input, this modular approach uses a fully convolutional neural network to determine the vessel region. This can be seen in the accuracy, since errors made by the first network may get amplified when its output is used as input for the next network. Nevertheless, for recognising the different phases, the modular approach performed similarly to the valve filter approach.

4.5 Datasets
The fully convolutional networks discussed above are not able to reach high accuracy in distinguishing most gas and solid phases, because these phases were under-represented in the datasets used to train the networks. In order to successfully learn these classifications, we would require a more representative dataset. A new dataset will be provided by a chemistry lab that requested to remain anonymous; this will serve as the main source of data for this thesis. The dataset contains thousands of pictures of size 260x100 pixels. The classifications made in this dataset are not yet clear, but are assumed to be a set of labels per image, stating the failures that appeared. These failures can relate to a colour change or to different phase separations. The datasets used to train fully convolutional neural networks contain one or multiple labels for each pixel, whereas regular convolutional neural networks typically use one or more labels per image. Since the dataset is assumed to contain only image-wise labels, more labelling would have to be performed in order to use it for semantic segmentation.

4.5.1 Sagi Eppel
The fully convolutional networks discussed and implemented by Sagi Eppel have been trained on a dataset of materials inside glass vessels. The set contains 950 hand-labelled images where every pixel carries 4 labels. The first label states whether the pixel is part of the vessel region or of the background. The second label states whether the pixel represents a filled or empty part of the vessel, and the third label identifies whether the pixel is part of a solid material or a liquid. The last label distinguishes between different liquid phases and assigns one of 15 values to each pixel [37]. This dataset can be extended with new images in order to reach better accuracy on phase recognition.

4.6 Implementation
Implementing a neural network from scratch would be a tediously long process requiring knowledge in many different fields such as numerical stability, linear algebra and calculus. That is why many libraries exist to make neural networks more accessible by providing interfaces for different neural network architectures.
Kovalev et al. [38] presented a study in 2016 comparing the five best-known deep learning frameworks: Theano, Torch, Caffe, Tensorflow and Deeplearning4j. They studied the accuracy and complexity (in lines of code) of each framework when implementing a fully connected neural network. Their study was rather limited because of its use of fully connected neural networks, which do not translate well to real-world applications. Also, frameworks are constantly being updated, so results from 2016 may differ from the current situation. Later in 2016, Heehoon et al. [39] published a comparison using a convolutional neural network, which stated that performance differences are mainly caused by the choice of convolution algorithm.

4.6.1 Tensorflow
Tensorflow is an open source framework for machine learning applications, researched and produced at Google. Its architecture does not provide an option to choose a convolution algorithm; instead it executes all available algorithms in its first run and then chooses the fastest algorithm per layer for subsequent runs. The framework is supported by broad documentation, examples and tutorials for different learning problems. All of Sagi Eppel's implementations discussed in this chapter [36] use Tensorflow to implement their fully convolutional networks; the implementations can be found open source on GitHub [36].

4.6.2 Theano
Developed at the LISA lab at the University of Montreal, Theano is a Python framework supporting the mathematical operations needed to design a neural network. In Theano it is possible to specify the convolution algorithm to be used either globally or layer by layer. Doing this layer by layer usually results in less memory usage and thus faster training than a global declaration [39]. Theano is used as a backend for the deep neural network frameworks Pylearn2, Keras and Lasagne, which mainly provide a simpler interface for faster prototyping.

Chapter 5

Determine changes based on extracted properties

The previous chapters provided us with the region of the vessel, the surface lines separating different phases within the vessel and a labelling of those phases. In this chapter we discuss how to use these results to compare a sample to previously taken images and see whether and how the state of the fluid has changed over time. The comparisons made are based on the fill level of the fluid inside the vessel, the colour of the fluid and the distribution of the phases that appear in the vessel.
A reduction in fill level may signal a phase transition happening, for example evaporation. The distribution of the different phases can be used to specify what chemical reaction is happening to the fluid. Finally, by comparing the colour to previous samples, we are able to see how the sample degrades over time. Comparing colours in two images is not as trivial as it may seem. Every observed colour can be described as a product of two properties: the reflectance (the colour of the surface) and the illumination (the colour of the light striking the scene) [40]. This means that, in order to truly find the colour of the fluid, we first have to estimate the colour and intensity of the illumination and normalise the image before comparing two different samples. In this study it is assumed that images of the same sample are taken under the same light conditions. For this reason, we only cover the problem of colour constancy, while its implementation is left for further research. Being able to normalise the colours in the image is not required for this application, although it would significantly improve the generalisability of the methods discussed in this thesis.

5.1 Comparing fill level
Assuming the vessel region has been found using the methods from either chapter 2 or chapter 4, the total area of the vessel can be expressed as the number of pixels within the vessel region. Since we assume the vessel has a symmetric shape, this area can be computed quickly by scanning every vertical line within the vessel region and summing the number of vessel pixels on each line. Starting from the surface line found by the methods described in chapter 3 or 4, we can do the same; the result gives us the number of pixels within the liquid region. By comparing this to the previously computed vessel area, we find the percentage of the vessel that is filled with fluid.
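A small sketch of this computation under our own assumptions (the vessel and liquid regions are given as binary masks of the same shape):

    import numpy as np

    def fill_level(vessel_mask, liquid_mask):
        """Fraction of the vessel area that is occupied by liquid."""
        vessel_area = np.count_nonzero(vessel_mask)
        liquid_area = np.count_nonzero(liquid_mask & vessel_mask)
        return liquid_area / vessel_area if vessel_area else 0.0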

5.2 Comparing distribution of different phases


Chapter 4 described the implementation of a fully convolutional network used for segmenting the different phases within glass vessels. Its result is a pixel-wise labelling stating whether, and to which phase, each pixel belongs. This result can be scanned in a similar way: we scan the vessel region and count how many pixels are labelled with each possible phase. By again comparing the number of labels for each phase to the total vessel area, we can state the percentage of the vessel area that every phase occupies. These values can be used to identify chemical reactions that have happened, by comparing them to previously taken samples.
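Continuing the sketch above, and assuming the segmentation is an integer label map with 0 for the background:

    import numpy as np

    def phase_distribution(label_map, vessel_mask):
        """Return the fraction of the vessel area occupied by each phase label."""
        labels = label_map[vessel_mask > 0]   # labels of pixels inside the vessel
        counts = np.bincount(labels)          # pixel count per phase label
        return counts / counts.sum()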

5.3 Comparing colour levels


Comparing the colours of two images seems like a simple task in which one would naively compare the average pixel value of two areas. However, this naive comparison does not take the illumination into account, and this effect can cause colours to be displayed differently by a camera.
In recent decades, computer vision research has proposed many different approaches for predicting the effect of illumination on the colours in an image and using that prediction to normalise the colours [41]. The purpose of these approaches is to eliminate the impact of the light on the images, in order to obtain the light-independent colour of the surfaces in the image. According to the Lambertian reflectance model, the observed colour at a certain point can be expressed as a product of three factors: the colour of the light source, the surface reflectance and the camera sensitivity.

    ∫_θ e(λ) c(λ) dλ

In this formula e(λ) corresponds to the light source and c(λ) represents the camera sensitivity for wavelength λ, integrated over the visible spectrum θ [42]. This is an ill-posed problem, since both e(λ) and c(λ) are unknown, so it cannot be solved without making further assumptions. Algorithms like Grey-World, MaxRGB and White Patch try to solve it by making assumptions about the colour of the light source.
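As an example of such an assumption-based method, a minimal Grey-World correction: it assumes the average colour of the scene is grey and rescales each channel accordingly (a sketch, not a full colour-constancy solution):

    import numpy as np

    def grey_world(image):
        """image: float array (H, W, 3). Rescale each channel so the scene average becomes grey."""
        channel_means = image.reshape(-1, 3).mean(axis=0)   # average R, G, B over the whole image
        grey = channel_means.mean()
        return image * (grey / channel_means)                # per-channel gain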
Other approaches, like gamut mapping, rely on statistical data or physics-based models in order to predict either the illumination or the camera sensitivity [43]. The most accurate predictions of the illumination have been achieved using convolutional neural networks; the implementation of such a network was discussed by Simone Bianco et al. in 2015 [44][42].
For this thesis we assume that images of the same sample are exposed to the same light, though these methods may be interesting to cover in future research. Normalising the colours of the surfaces in the image would be a useful tool for further generalisation of the methods discussed in this thesis.

Chapter 6

Experiments

In the previous chapters, multiple solutions were discussed to meet the requirements defined in chapter 1. This chapter evaluates the discussed methods and proposes suitable experiments to conduct during the implementation of this bachelor thesis. We take the assumptions about the dataset discussed in section 4.5 and look at how the discussed methods can be applied to this dataset.

6.1 Finding vessel region


Chapter 2 discussed a computer vision implementation for finding the vessel region based on edge density, in which several edge detection operators were compared. The Sobel operator was found to be the best performing operator for detecting edges. Using this operator in combination with the Canny algorithm results in a binary image with thin edges. This result can be further processed to identify the vessel shape based on its symmetry, leaving us with a binary image of only the vessel shape. In section 4.4.2 a modular convolutional neural network was discussed that uses a fully convolutional network to recognise the vessel region; its accuracy was measured at 83.45 percent for the classification of the vessel region [36]. For the computer vision method of finding the vessel region, no definitive results were reported.

Experiments In the upcoming experiments, we will compare these two approaches on their ability to correctly identify the vessel region. In the first approach, we will use computer vision operations to extract the vessel shape from the images.
The second approach will be based on Sagi Eppel's modular neural network. For this, a fully convolutional neural network will be trained to segment the vessel shape from the background. The network model we will be using is DeepLab's fully convolutional version of the ResNet model [33], which achieved the best accuracy to date on image segmentation.
The accuracy of these two approaches will be computed by taking the average pixel-wise error with respect to the ground truth.
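A simple sketch of such a pixel-wise evaluation metric, assuming both segmentations are binary masks of the same shape:

    import numpy as np

    def pixel_error(predicted, ground_truth):
        """Fraction of pixels where the predicted mask disagrees with the ground truth."""
        return np.mean(predicted.astype(bool) != ground_truth.astype(bool))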

6.2 Finding liquid surface


Chapter 3 discussed a computer vision algorithm that can be used to find surface lines separating different phases within the vessel. It showed how the camera angle used to take the picture can influence the shape of the surface lines, and proposed assumptions about the maximum angle by which the camera can be tilted in order to handle this. The choice of scoring function was shown to heavily influence the accuracy. A combination of edge density and gradient direction was shown to offer the best results for predicting liquid-to-liquid and liquid-to-emulsion separations, although it loses a little accuracy on liquid-to-air separations. Section 3.3.4 discussed a further method to improve accuracy, by taking the curvature of the vessel into account in the form of a factor applied to the scoring formula.

Experiments  These methods will be tested and evaluated on the newly provided dataset.
The goal is to identify the fill level and the phase separations within the fluid.
After calibrating the methods for this dataset, they are expected to reach performance
similar to that reported in the original work. The detected separation lines will be
compared to the vessel height in order to express changes in fill level and separation.
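A minimal sketch of how a detected separation line could be expressed relative to the vessel height; the row indices are illustrative assumptions (image rows counted from the top of the picture).

def fill_level_percentage(surface_row, vessel_top_row, vessel_bottom_row):
    # Express the detected liquid surface as a percentage of the vessel height.
    # Rows are counted from the top of the image, so a surface close to the
    # vessel bottom corresponds to a low fill level.
    vessel_height = vessel_bottom_row - vessel_top_row
    return 100.0 * (vessel_bottom_row - surface_row) / vessel_height
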

6.3 Segmenting phases


Chapter 4 discussed multiple implementations of convolutional neural networks and
presented different models used to achieve high accuracy in classification and
detection problems. It further discussed how these models can be expanded to perform
semantic segmentation. Such an expansion requires a more detailed dataset: the dataset
that will be provided is assumed to contain only a single set of labels per image, and
in order to perform semantic segmentation on it, it would need to be relabelled.
Using the labels provided in the dataset, only a classification problem can be solved;
we propose an example of this in our experiments.

Experiment: Convolutional neural network  Here we assume that no relabelling will
happen and propose a solution that uses a single label per image, stating if and what
failure can be seen in the image. For this, we will design a convolutional neural
network that takes a 260x100x3 image as input. Since a traditional convolutional neural
network cannot handle inputs of varying size, cropping each image down to its vessel
region is not an option. Instead, the vessel region found in chapter 2 can be used to
black out the background, reducing its influence on the result. The resulting image is
then processed by the hidden layers, which learn to recognize the different failures.
At this point, it is not possible to predict the success of this method, as it depends
on the data and the classes to be distinguished.
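A minimal sketch of such a classifier using tf.keras; the layer sizes and the number of failure classes are placeholder assumptions, and the vessel mask is assumed to be a binary array with the same spatial dimensions as the image.

import numpy as np
import tensorflow as tf

NUM_FAILURE_CLASSES = 4  # placeholder: "no failure" plus three failure types

def mask_background(image, vessel_mask):
    # Black out every pixel outside the detected vessel region.
    return image * vessel_mask[..., np.newaxis]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(260, 100, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(NUM_FAILURE_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
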

Experiment: Fully convolutional neural network  Using fully convolutional neural
networks would require us to relabel the provided dataset with more detailed,
pixel-wise annotations. This would be a time-consuming procedure, but it would also
allow us to extract more detailed properties.
In our experiments, we propose a fully convolutional neural network based on the
modular approach discussed in section 4.4.2. In that work, a fully convolutional neural
network is used to recognize the vessel region, followed by another network responsible
for segmenting the vessel area into different phases. In our experiments, we will
compare the accuracy of the first fully convolutional neural network to the computer
vision approaches discussed in chapter 2. Since we use a modular approach, the network
for finding the vessel region can be replaced by another method without requiring
retraining of the second network; a sketch of how the two modules fit together is
given below.
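This sketch assumes each module is already trained and exposed as a function returning a per-pixel mask; vessel_net and phase_net are hypothetical placeholders, not implementations from this thesis.

import numpy as np

def segment_sample(image, vessel_net, phase_net):
    # Modular pipeline: the first module (a network or a computer vision
    # method) yields a binary vessel mask; the second module only needs to
    # segment the phases inside that region.
    vessel_mask = vessel_net(image)                      # H x W binary mask
    masked_image = image * vessel_mask[..., np.newaxis]  # hide the background
    phase_map = phase_net(masked_image)                  # H x W phase labels
    return np.where(vessel_mask > 0, phase_map, 0)       # 0 = background
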

6.4 Determining failure


The task of this thesis is not to determine the phase distribution in a vessel, but
rather to determine if and what failure happened by comparing two images of the same
sample. The exact details about every failure to be detected have not yet been shared.
That is why this thesis assumes that every failure can be expressed as a chemical
reaction, and that chemical reactions can be identified by phase transitions within the
vessel. Under these assumptions, failures can be identified simply by checking whether
the change in phase distribution exceeds a certain threshold. However, in case this
simple approach leaves us with indecisive results, the classifications must be found
using a more advanced method. In that case, we are dealing with a concept-learning
problem in which the goal is to learn when a failure occurs, based on the extracted
properties and the labels.
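The threshold check could look like the following sketch; the representation of a distribution as a mapping from phase name to percentage of the vessel area, and the 10 percent threshold, are illustrative assumptions rather than calibrated choices.

def failure_detected(phases_before, phases_after, threshold=10.0):
    # Each argument maps a phase name to its percentage of the vessel area.
    # A change larger than the (placeholder) threshold is flagged as a failure.
    all_phases = set(phases_before) | set(phases_after)
    return any(abs(phases_after.get(p, 0.0) - phases_before.get(p, 0.0)) > threshold
               for p in all_phases)
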

6.4.1 Candidate-elimination
The candidate elimination algorithm is used to find a hypothesis by which positive and
negative examples of a certain concept can be classified. The algorithm starts from the
most general and most specific hypothesis boundaries of the version space and then
narrows these boundaries by processing a set of training examples, updating them so
that only the positive examples remain covered. In the end, the algorithm should
provide a hypothesis by which it is possible to judge whether an example is positive;
in our case, whether a failure has occurred in the sample. The values on which this
hypothesis is based are the distributions of the different phases and the fill level of
the fluid inside the vessel. These values are expressed as percentages ranging from 0
to 100, which allows infinitely many values (when rational values are allowed) and
therefore an infinite version space. In standard approaches, such continuous values are
expressed in terms of intervals [45].
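A minimal sketch of such a discretisation; the interval width of 10 percentage points is an arbitrary placeholder.

def to_interval(value, width=10.0):
    # Map a continuous percentage to a fixed-width interval label, e.g.
    # 37.5 -> "[30, 40)", so that the hypothesis space becomes finite.
    lower = int(value // width) * width
    return f"[{lower:g}, {lower + width:g})"
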

6.4.2 Decision tree


One of the limitations of the candidate elimination algorithm is that noisy data can
corrupt the resulting hypothesis. Decision trees were introduced to reduce this
problem. They keep statistical values about the examples, choosing at each split the
attribute that best separates the examples, a quantity measured by the so-called
information gain. Again using intervals to express the continuous properties of the
fluid, a decision tree can be learned for each different failure case, providing a
hypothesis by which new examples can be judged [46].
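A small sketch of this idea with scikit-learn; the feature values and labels below are hypothetical toy numbers, used only to show the shape of the data (fill level and phase percentages as features, a failure label as target). Entropy is chosen as split criterion because it corresponds to information gain.

from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy samples: [fill level %, liquid phase %, emulsion phase %].
X = [[80.0, 95.0, 5.0],
     [78.0, 60.0, 40.0],
     [45.0, 90.0, 10.0],
     [82.0, 93.0, 7.0]]
y = [0, 1, 1, 0]  # 0 = no failure, 1 = failure

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3)
tree.fit(X, y)
print(tree.predict([[79.0, 58.0, 42.0]]))  # classify a new, unseen sample
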

Chapter 7

Conclusion

Though the field of computer vision has greatly expanded over the past decades, the
study of computer vision applications for analyzing liquids has been very limited.
The most significant research has been done by Sagi Eppel, who proposed methods for
finding the vessel region and the liquid surface. These methods will be tested in the
upcoming experiments in order to analyse failures in samples. By using these methods,
it would be possible to detect the fill level of a liquid inside a transparent glass
vessel. This fill level can then be tracked over time to indicate whether the fluid has
changed. If multiple surface lines are found, we can also conclude that the liquid has
separated.
Computer vision techniques can be too limited to detect other kinds of failures related
to gas or solid phases. In chapter 4, we discussed two kinds of convolutional networks.
The first type is the traditional convolutional neural network used for object
detection and classification; GoogLeNet and ResNet introduced new models that
considerably improved accuracy over earlier architectures. In our experiments we will
compare the accuracy of two different kinds of networks. The first will be a
convolutional neural network used to classify if and what failure happened to a sample:
it combines the image and the vessel region as input and, after evaluating this input,
classifies if and what failure has occurred. It is expected that this method will
require a lot of data in order to learn each failure.
The other architecture discussed is the fully convolutional neural network, which is
designed to provide a pixel-wise segmentation of the image instead of a single
classification. Multiple well-known models have been extended to allow for semantic
segmentation of images; DeepLab's modified version of ResNet was found to achieve the
highest accuracy to date.
Sagi Eppel released two approaches in which fully convolutional neural networks were
used to segment the different phases inside a transparent vessel. These approaches
scored very promising results in detecting liquid phases, though other phases were
classified with low accuracy.
The second architecture we will use in our experiments is based on their modular
approach, containing two fully convolutional neural networks. The first network is
responsible for finding the vessel region, while the second segments that region into
different phases. We will try to improve their accuracy by implementing the networks
with the more accurate ResNet model, as opposed to their implementation based on VGG16.
We will also expand their dataset with newly provided images that will be hand-labelled
to classify the different phases; for this, we will focus on samples that contain gas
or solid phases, as these were underrepresented.
The first architecture provides us with a label stating the failure that occurred in a
sample, while the second returns a segmented map of the image. In order to compare
these two results, the extracted properties must be reduced to a failure
classification; for this, concept learning will be applied.
By using all of these methods, we would be able to analyze the following properties of
fluids in transparent vessels: the fill level, the colour, and the distribution of the
different phases. Using these properties we should be able to identify all phase
changes happening within the vessel. For this to be possible, the accuracy in detecting
gas and solid phases has to be improved, and for that we rely on the dataset that will
be provided.

Bibliography

[1] S. Papert, “The summer vision project,” 1966.

[2] S. Eppel and T. Kachman, “Computer vision-based recognition of liquid surfaces and phase boundaries in transparent vessels, with emphasis on chemistry applications,” 2014.

[3] M. O’Brien, P. Koos, D. L. Browne, and S. V. Ley, “A prototype continuous-flow liquid-liquid extraction system using open-source technology,” Org. Biomol. Chem., vol. 10, pp. 7031–7036, 2012.

[4] S. Eppel, “Using curvature to distinguish between surface reflections and vessel contents in computer vision based recognition of materials in transparent vessels,” CoRR, vol. abs/1506.00168, 2015.

[5] S. Eppel, “Setting an attention region for convolutional neural networks using region selective features, for recognition of materials within glass vessels,” CoRR, vol. abs/1708.08711, 2017.

[6] D. Keren, M. Osadchy, and C. Gotsman, “Antifaces: A novel, fast method for image detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, p. 2001, 2001.

[7] “Thresholding.” http://homepages.inf.ed.ac.uk/rbf/HIPR2/threshld.htm. Accessed: 2017-11-18.

[8] L. G. Roberts, “Machine perception of three-dimensional solids,” MIT Press, vol. 2, p. 30, 1963.

[9] “Roberts cross edge detector.” http://homepages.inf.ed.ac.uk/rbf/HIPR2/roberts.htm. Accessed: 2017-11-18.

[10] H. Spontón and J. Cardelino, “A Review of Classic Edge Detectors,” Image Processing On Line, vol. 5, pp. 90–123, 2015. 10.5201/ipol.2015.35.
[11] J. M. S. Prewitt, “Object enhancement and extraction,” Academic Press, pp. 75–149, 1970.

[12] I. Sobel, “An isotropic 3×3 image gradient operator,” Presentation at Stanford A.I. Project 1968, 02 2014.

[13] “Smoothing images.” http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_filtering/py_filtering.html. Accessed: 2017-11-20.

[14] “Implementing canny from scratch.” http://aishack.in/tutorials/implementing-canny-edges-scratch/. Accessed: 2017-11-20.

[15] J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., June 1986.

[16] “The canny edge detector.” http://aishack.in/tutorials/canny-edge-detector/. Accessed: 2017-11-20.

[17] “Fast Fourier transform.” https://nl.wikipedia.org/wiki/Fast_Fourier_transform. Accessed: 2017-11-21.

[18] “Time complexity of canny edge detector.” https://stackoverflow.com/questions/17458237/time-complexity-of-canny-edge-detector. Accessed: 2017-11-21.

[19] “Find boundary of symmetric object in image.” https://nl.mathworks.com/matlabcentral/fileexchange/46887-find-boundary-of-symmetric-object-in-image?focused=3822426&tab=function. Accessed: 2017-11-06.

[20] S. Ley, R. Ingham, M. O’Brien, and D. Browne, “ChemInform abstract: Camera-enabled techniques for organic synthesis,” vol. 44, 10 2013.

[21] L. Yazdi, A. Prabuwono, and E. Golkar, Feature extraction algorithm for fill level and cap inspection in bottling machine, vol. 1, pp. 47–52. 2011.

[22] D. X. Hu, M. O’Brien, and S. V. Ley, “Continuous multiple liquid-liquid separation: Diazotization of amino acids in flow,” Organic Letters, vol. 14, no. 16, pp. 4246–4249, 2012.

[23] Y. LeCun, P. Haffner, L. Bottou, and Y. Bengio, “Object recognition with gradient-based learning,” in Shape, Contour and Grouping in Computer Vision, pp. 319–, Springer-Verlag, 1999.
[24] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” 2012.

[25] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[26] “CS231n: Convolutional neural networks for visual recognition.” http://cs231n.github.io/convolutional-networks/. Accessed: 2017-11-22.

[27] D. H. Hubel and T. N. Wiesel, Brain and Visual Perception. Oxford University Press, 2005.

[28] C. Szegedy, W. Liu, et al., “Going deeper with convolutions,” CoRR, vol. abs/1409.4842, 2014.

[29] K. He, X. Zhang, et al., “Deep residual learning for image recognition,” CoRR, vol. abs/1512.03385, 2015.

[30] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., 2017.

[31] V. Badrinarayanan, A. Kendall, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” CoRR, 2015.

[32] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” CoRR, 2015.

[33] L. Chen, G. Papandreou, et al., “Rethinking atrous convolution for semantic image segmentation,” CoRR, 2017.

[34] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., pp. 640–651, 2017.

[35] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014.

[36] S. Eppel, “Hierarchical semantic segmentation using modular convolutional neural networks,” CoRR, vol. abs/1710.05126, 2017.

[37] “Materials in vessels data set.” https://github.com/sagieppel/Materials-in-Vessels-data-set. Accessed: 2017-11-05.

[38] V. Kovalev, A. Kalinovsky, and S. Kovalev, “Deep learning with theano, torch, caffe, tensorflow, and deeplearning4j: Which one is the best in speed and accuracy?,” 10 2016.

[39] H. Kim, H. Nam, W. Jung, and J. Lee, “Performance analysis of cnn frameworks for gpus,” 2017.

[40] M. Ebner, Color Constancy. 2007.

[41] V. Agarwal et al., “An overview of color constancy algorithms,” 2006.

[42] “Results per method.” http://colorconstancy.com/?page_id=15. Accessed: 2018-01-06.

[43] J. Morovic, Color Gamut Mapping. 2008.

[44] S. Bianco, C. Cusano, and R. Schettini, “Color constancy using cnns,” CoRR, vol. abs/1504.04548, 2015.

[45] F. Divina, M. Keijzer, and E. Marchiori, A Method for Handling Numerical Attributes in GA-Based Inductive Concept Learners. Springer Berlin Heidelberg.

[46] T. M. Mitchell, Machine Learning. McGraw-Hill, Inc., 1997.
