PREDICTIVE MODELING
Hongyu Chen
Jing Li
Mubing Li
CS69/169 Mobile Health
March 2015
Motivation
Let's go further than StudentLife 1.0!
Standardized, normalized data set
Proof of concept
Scientific finding to our question:
Can we predict depression from a two-week window of StudentLife data collection?
Study Design
Data cleaning/parsing
Feature selection
Class determination
Predictive classifiers through supervised machine learning methods
Validation
Case study
Project Design/Workflow
[Flowchart; recoverable stages listed in order]
StudentLife Dataset
Data Preprocessing & Interpolation: Linear, Nearest Neighbour
EMA: Sleep, Mood, Stress, Social, Exercise, etc.
Sensor: Audio, Conversation, Activity, Dark, etc.
Concatenation
PCA
Feature Selection
PHQ9 Threshold
Class: Depressed whole time, Nondepressed whole time, Depression status changed (Case Study)
Feature, Class
Data separation by week
SVM, etc.
Prediction
N-fold CV
Accuracy, F statistics, Precision/Recall, Sensitivity/Specificity
Result analysis
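The validation measures named in the workflow (accuracy, precision/recall, sensitivity/specificity) can be computed from a confusion matrix. The sketch below is illustrative only: the function names are hypothetical, and it assumes the +1 (depressed) / -1 (not depressed) labels used on the classifier slide. It is not the project's actual evaluation code.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall (sensitivity), and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),       # same quantity as sensitivity
        "specificity": ratio(tn, tn + fp),
    }


# Made-up labels, for illustration only.
y_true = [1, 1, 1, -1, -1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1, 1, -1, 1]
print(binary_metrics(y_true, y_pred))
```

Recall and sensitivity are the same quantity, so the dictionary reports it once.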
Diagnosis (by PHQ-9 score)
0-4: No Depression
5-9: Mild Depression
10-14: Moderate Depression
15-19: Moderately Severe Depression
20-27: Severe Depression
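The score bands above map directly to a lookup. The sketch below shows one way to turn a PHQ-9 total into a severity label and a binary class. The cut-off of 10 in `phq9_class` is an assumption made for illustration; the slides do not state which threshold was used. The function names are hypothetical.

```python
# (low, high, label) bands exactly as listed on the slide.
PHQ9_BANDS = [
    (0, 4, "No Depression"),
    (5, 9, "Mild Depression"),
    (10, 14, "Moderate Depression"),
    (15, 19, "Moderately Severe Depression"),
    (20, 27, "Severe Depression"),
]


def phq9_severity(score):
    """Return the severity label for an integer PHQ-9 total (0 to 27)."""
    for low, high, label in PHQ9_BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"PHQ-9 total out of range: {score!r}")


def phq9_class(score, threshold=10):
    """Binary class: +1 (depressed) if score >= threshold, else -1.

    ASSUMPTION: threshold=10 is a placeholder, not the project's value.
    """
    phq9_severity(score)  # reuse the range check
    return 1 if score >= threshold else -1


print(phq9_severity(12), phq9_class(12))
```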
Nearest-Neighbor Interpolation
for Sensor Data
Now:
All 15 depression-related modalities have
One value per 24-hour period
Comparable scaling
A guarantee of good quality (279 samples removed)
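As a rough illustration of nearest-neighbor interpolation on a daily series, the sketch below fills each missing day with the value from the closest observed day, then rescales the series. Two things are assumptions here: ties are resolved toward the earlier day, and "comparable scaling" is shown as z-scoring, which the slides do not confirm. Function names are hypothetical.

```python
import math


def _is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_nearest(values):
    """Fill missing entries with the nearest observed value.

    Ties between an earlier and a later day go to the earlier day
    (an arbitrary choice for this sketch).
    """
    known = [i for i, v in enumerate(values) if not _is_missing(v)]
    if not known:
        raise ValueError("no observed values to interpolate from")
    filled = []
    for i, v in enumerate(values):
        if _is_missing(v):
            nearest = min(known, key=lambda k: (abs(k - i), k))
            filled.append(values[nearest])
        else:
            filled.append(v)
    return filled


def zscore(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


# One made-up modality with one value per day and some gaps.
daily = [None, 2.0, None, None, 8.0, None]
print(fill_nearest(daily))
```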
Feature Selection
Step 1:
Decide sliding window time frame
Two weeks
Balances enough time to make a diagnosis against keeping enough time points for testing
Step 2:
Feature aggregation
Step 3:
Dimensionality Reduction
We cannot use 105 dimensions to classify only a couple hundred cases!
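Steps 1 and 2 can be sketched as a two-week sliding window over the daily values, with each modality summarized per window. The slides do not say which aggregates were computed, so the mean and standard deviation used here are assumptions, as are the function name, the one-day step, and the modality names in the example.

```python
from statistics import mean, pstdev

WINDOW_DAYS = 14  # two-week window, as chosen in Step 1


def window_features(daily, window=WINDOW_DAYS, step=1):
    """Aggregate daily values into one feature row per sliding window.

    daily: dict mapping modality name -> list of daily values
           (all lists must have the same length, with no gaps).
    """
    lengths = {len(series) for series in daily.values()}
    if len(lengths) != 1:
        raise ValueError("all modalities need the same number of days")
    n_days = lengths.pop()

    rows = []
    for start in range(0, n_days - window + 1, step):
        row = {"start_day": start}
        for name, series in daily.items():
            chunk = series[start:start + window]
            # ASSUMPTION: mean and std stand in for the real aggregates.
            row[f"{name}_mean"] = mean(chunk)
            row[f"{name}_std"] = pstdev(chunk)
        rows.append(row)
    return rows


# Made-up data: 16 days of two modalities gives three 14-day windows.
example = {"sleep": list(range(16)), "stress": [1.0] * 16}
for row in window_features(example):
    print(row)
```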
[Figure: Proportion of Variance by Principal Component, PC1 to PC10 (two panels)]
[Figure: Random Forest (two panels); axis labels: Decision Trees, Number of features (10 to 30)]
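The dimensionality reduction step and the proportion of variance per principal component can be illustrated with a small NumPy sketch. This assumes a samples-by-features matrix and computes PCA through a singular value decomposition; the function name is hypothetical, and the random matrix merely stands in for the real 105-dimensional feature table.

```python
import numpy as np


def pca_explained_variance(X, n_components):
    """Project X onto its top principal components.

    Returns (scores, ratio): the projected data and the proportion of
    total variance explained by each kept component.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / (X.shape[0] - 1)
    ratio = variances / variances.sum()
    scores = centered @ Vt[:n_components].T
    return scores, ratio[:n_components]


# Synthetic stand-in data: 200 cases, 105 features (NOT the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 105))
X[:, 0] *= 10  # give one feature a much larger variance
scores, ratio = pca_explained_variance(X, n_components=10)
print(scores.shape)
print(ratio)
```

Plotting `ratio` against the component index gives a proportion-of-variance plot of the kind shown on the slide.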
Predictive classifier
Classes: not depressed (-1), depressed (+1)
Features: top features by PCA
Training set: all depressed samples (50%), selected not-depressed samples (50%)
SVM model
Cross Validation
Accuracy = 96.6667%
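The balanced training set and the cross-validation loop can be sketched in plain Python. To keep the example dependency-free, a nearest-centroid rule stands in for the classifier; it is explicitly not an SVM, and an SVM's fit/predict would be plugged in at the same point. All function names, the random seed, the fold count, and the toy data are assumptions for illustration, and the sketch does not reproduce the reported accuracy.

```python
import random


def balance_classes(X, y, seed=0):
    """Keep every +1 sample and draw an equal number of -1 samples."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    if len(neg) < len(pos):
        raise ValueError("fewer not-depressed than depressed samples")
    keep = pos + rng.sample(neg, len(pos))
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Stand-in classifier (NOT an SVM): pick the closer class mean."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c]))
            for x in X_test]


def cross_val_accuracy(X, y, k, fit_predict):
    """Pooled accuracy over k folds, each held out once."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(X)


# Toy, well-separated data: 10 depressed (+1), 30 not depressed (-1).
X = [[5.0 + 0.01 * i, 5.0] for i in range(10)] + \
    [[0.01 * i, 0.0] for i in range(30)]
y = [1] * 10 + [-1] * 30
Xb, yb = balance_classes(X, y)
print(cross_val_accuracy(Xb, yb, k=5, fit_predict=nearest_centroid_fit_predict))
```

Undersampling the majority class keeps the two classes at 50% each, matching the training-set description on the slide, at the cost of discarding some not-depressed samples.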
Case study
Participant No.16:
Beginning of the term: Depressed
End of the term: Not depressed
Future Directions
Why is this important?
1. Contributes (marginally) to existing medical literature about
depression
2. Proof of concept for possible interventions
Imagine an app that tells you when you could be depressed
and connects you with resources to help
3.