Yudong Shen
UC Davis
ydshen@ucdavis.edu
ABSTRACT
In this paper, several machine learning methods, including k-nearest neighbors (KNN), support vector machines (SVM), Softmax, and neural networks, are used to predict shelter animal outcomes from information collected by the Austin Animal Center. To improve the classification results, re-sampling is also used to deal with the imbalanced data.
1. INTRODUCTION
Every year, approximately 7.6 million companion animals end up
in US shelters. Many animals are given up as unwanted by their
owners, while others are picked up after getting lost or taken out of
cruelty situations. Many of these animals find forever families to
take them home, but just as many are not so lucky. About 2.7
million dogs and cats are euthanized in the US every year.
Using a dataset of intake information including breed, color, sex, and age from the Austin Animal Center, it is possible to understand trends in animal outcomes and to predict the outcome for each animal.
The features included in the raw data are ID, Name, DateTime, Animal Type, Sex upon Outcome, Age upon Outcome, Breed, and Color; the possible outcomes are Died, Adoption, Euthanasia, Return to Owner, and Transfer. The goal is to use machine learning models to classify each record into one of these five outcomes. Table 1 shows an example of the raw data (an overview of the outcomes of all dogs and cats is available on the Kaggle forum); most of the animals end up in Adoption or Transfer, which is good news. A sketch of how the categorical features can be encoded follows the table.
Table 1. Example of the raw data

ID      | Name   | Date       | Type | Sex           | Age     | Breed                  | Color       | Outcome
A671945 | Summer | 2015-10-12 | Dog  | Neutered Male | 1 year  | Shetland Sheepdog Mix  | Brown/White | Return to owner
A699218 | Jimmy  | 2015-03-28 | Cat  | Intact Male   | 3 weeks | Domestic Shorthair Mix | Blue Tabby  | Transfer
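For illustration, here is a sketch of one simple way to turn these categorical features into numeric model inputs. The paper does not specify its encoding; pandas get_dummies is an assumed choice, and the column names follow Table 1.

```python
import pandas as pd

# The two example records from Table 1 (ID, Name, and Date omitted,
# since they are not useful as categorical predictors).
df = pd.DataFrame({
    "Type":  ["Dog", "Cat"],
    "Sex":   ["Neutered Male", "Intact Male"],
    "Breed": ["Shetland Sheepdog Mix", "Domestic Shorthair Mix"],
    "Color": ["Brown/White", "Blue Tabby"],
})
X = pd.get_dummies(df)       # one binary column per (feature, value) pair
print(X.columns.tolist())
```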
2. TECHNIQUES
There are many ways to build this kind of classifier; the two most popular are the multiclass Support Vector Machine (SVM) classifier [3] (especially the linear variant [2]) and the Softmax classifier.
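The two classifiers differ mainly in their loss function: the multiclass SVM uses the hinge loss, while Softmax uses cross-entropy. As a minimal sketch (assuming a score matrix produced by a linear model; this is illustrative NumPy, not code from the paper):

```python
import numpy as np

def svm_loss(scores, y, delta=1.0):
    # Multiclass SVM (hinge) loss, averaged over N examples.
    # scores: (N, C) array of class scores; y: (N,) integer labels.
    n = scores.shape[0]
    correct = scores[np.arange(n), y][:, None]            # score of the true class
    margins = np.maximum(0.0, scores - correct + delta)   # hinge margins
    margins[np.arange(n), y] = 0.0                        # true class contributes nothing
    return margins.sum() / n

def softmax_loss(scores, y):
    # Softmax cross-entropy loss, averaged over N examples.
    n = scores.shape[0]
    shifted = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(n), y].mean()

scores = np.array([[3.2, 5.1, -1.7], [1.3, 4.9, 2.0]])    # 2 examples, 3 classes
y = np.array([0, 1])
print(svm_loss(scores, y), softmax_loss(scores, y))
```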
3. EXPERIMENTS

3.1 K-Nearest Neighbor
The first experiment uses the k-nearest neighbors classifier [1] with different values of k.

Table 2. KNN

k        | 1      | 5      | 50     | 200    | 500
Accuracy | 44.16% | 49.08% | 52.59% | 52.11% | 51.76%
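As an illustration of the k sweep above, a minimal sketch using scikit-learn's KNeighborsClassifier (an assumed implementation, not code from the paper; synthetic data stands in for the encoded shelter features):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 5-class dataset standing in for the encoded shelter data.
X, y = make_classification(n_samples=3000, n_classes=5,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Sweep the same k values as Table 2 and report test accuracy.
for k in (1, 5, 50, 200, 500):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"k={k}: accuracy={knn.score(X_test, y_test):.4f}")
```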
3.2 Linear Classifier
The linear classifier gives the following result:

Training Accuracy | Test Accuracy | Processing Time
65.02%            | 63.74%        | ~2 min

3.3 Neural Network
The neural network [4] gives the following results as the architecture grows:

Training Accuracy | Test Accuracy | Processing Time
65.60%            | 64.20%        | ~4 min
68.08%            | 63.38%        | ~5 min
69.07%            | 64.35%        | ~5 min
83.00%            | 58.35%        | ~8 min
87.74%            | 57.37%        | ~17 min
88.43%            | 57.16%        | ~30 min

* Different activation functions do not noticeably affect the results.

Training accuracy keeps rising while test accuracy falls, so the larger networks are overfitting; a sketch of this architecture sweep follows.
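A rough sketch of how such an architecture sweep can be reproduced with scikit-learn's MLPClassifier. The hidden-layer sizes here are hypothetical, since the paper does not list the exact architectures used:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic 5-class dataset standing in for the encoded shelter data.
X, y = make_classification(n_samples=3000, n_classes=5,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Increasingly large networks: training accuracy should climb while
# test accuracy stagnates or drops, matching the overfitting trend above.
for hidden in [(32,), (64,), (128,), (256,), (256, 256)]:
    mlp = MLPClassifier(hidden_layer_sizes=hidden, max_iter=1000,
                        random_state=0).fit(X_train, y_train)
    print(hidden, f"train={mlp.score(X_train, y_train):.4f}",
          f"test={mlp.score(X_test, y_test):.4f}")
```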
4. EVALUATION
The last model uses a neural network. The results show roughly no difference from a simple linear classifier:

Training Accuracy | Test Accuracy | Processing Time
63.97%            | 63.19%        | ~7 min
60.97%            | 60.90%        | ~15 min
Here is the final result after applying SMOTE (Synthetic Minority Over-sampling Technique) [6] with the Softmax classifier, which shows a decent improvement over the results above; a sketch of the re-sampling step follows the table.
Table 6. Softmax after SMOTE

Name    | Training Accuracy | Test Accuracy | Processing Time
Softmax | 73.49%            | 71.24%        | ~2 min
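For reference, a minimal sketch of the re-sampling step using the third-party imbalanced-learn package (my choice of library; the paper does not name the implementation it used). Synthetic data stands in for the shelter features:

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Imbalanced 5-class dataset, mimicking the skewed outcome distribution.
X, y = make_classification(n_samples=5000, n_classes=5, n_informative=8,
                           weights=[0.40, 0.30, 0.15, 0.10, 0.05],
                           random_state=0)
print("before:", Counter(y))

# SMOTE synthesizes new minority-class points by interpolating between
# each minority sample and its nearest minority-class neighbors.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))
```

Note that re-sampling is applied only to the training set, so the test set keeps the real class distribution.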
5. CONCLUSION
Through this project I learned a lot about linear classifiers and neural networks. The shelter animal outcome classification problem is challenging and very different from what we did in class assignments. There we worked on hand-written digit recognition, an unstructured dataset with a balanced distribution (every digit has roughly the same number of samples), while this problem is more structured and has imbalanced data.

The SMOTE algorithm is very useful for dealing with an imbalanced dataset: it adds synthetic minority-class samples to the training set so that those classes are not ignored by the model. Also, because the data in this problem is structured, a neural network is not necessary; a linear classifier such as Softmax is accurate enough and fast to train.
6. REFERENCES
[1] Wikipedia. K-nearest neighbors algorithm. Wikipedia, the free encyclopedia, 2016.
[2] Wikipedia. Linear classifier. Wikipedia, the free encyclopedia, 2016.
[3] Yichuan Tang. Deep learning using linear support vector machines. arXiv preprint arXiv:1306.0239, 2013.
[4] Wikipedia. Artificial neural network. Wikipedia, the free encyclopedia, 2016.
[5] Tin Kam Ho. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832-844, 1998.
[6] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321-357, 2002.