
E6893 Advanced Big Data Analytics:

Analysis of Motor Vehicle Accidents in NYC

Team Members: Xiaowen Zhang, Jimin Ge, Peiran Zhou


Group ID: 201512-41
Overview
New York City has one of the oldest and most extensive transportation
infrastructures in the country. However, NYC is also notorious for its
traffic conditions and its high rate of motor vehicle accidents.

Used the historical Motor Vehicle Collision data collected over the
past five years from NYC Open Data, which tracks details of
accidents including time, location, area and the contributing
factors.

Applied a Gaussian mixture model and latent Dirichlet allocation to
help people learn about traffic accidents and explore their causes.

Built a traffic map that pinpoints where and when accidents happen,
flagging particularly dangerous areas.
Dataset

Selecting the NYC Open Data website as our source

Choosing the most recent 5 years of Motor Vehicle Collision data
Over 700,000 data records
Using an HTML parser to preprocess the raw data
Gaussian mixture model
Assumption: the data follow a weighted sum of multiple Gaussians

The number of Gaussians must be set in advance

Variational Inference (VI)

Gaussian-Wishart conjugate prior distribution


Clustering

DATE

TIME

NUM INJURED

NUM KILLED
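The model on this slide can be written out as follows. This is the standard textbook form of a Gaussian mixture with a Gaussian-Wishart conjugate prior on each component; the hyperparameter symbols m_0, beta_0, W_0 and nu_0 are notation assumed here and do not appear in the deck.

```latex
% Mixture density: a weighted sum of K Gaussians (K is fixed in advance)
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(x \mid \mu_k, \Lambda_k^{-1}\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

% Gaussian-Wishart conjugate prior on the mean and precision of component k
p(\mu_k, \Lambda_k) =
  \mathcal{N}\!\left(\mu_k \mid m_0, (\beta_0 \Lambda_k)^{-1}\right)
  \, \mathcal{W}\!\left(\Lambda_k \mid W_0, \nu_0\right)
```

Variational inference then approximates the posterior over the cluster assignments and the parameters (mu_k, Lambda_k) with a factorized distribution, and each record is assigned to the component with the highest responsibility.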
pre-LDA: clean
Tokenization Stop words Stemming

Tokenization: segments a document into its atomic elements
Input: Description/Reasons/Vehicle information

Stop words: words that need to be removed from our token list

Stemming: reduces topically similar words to their root
E.g., "vehicular" to "vehicle"
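The three cleaning steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list and the suffix rules below are hypothetical stand-ins, since the deck does not say which stop-word list or stemmer was used, and a real pipeline would more likely use an established stemmer such as Porter's.

```python
import re

# Assumed, illustrative stop-word list; not the list used in the project.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "was", "by"}

# Toy suffix rules standing in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run tokenization, stop-word removal and stemming, in that order."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]
```

Calling `preprocess` on a free-text field such as a contributing-factor description yields the list of cleaned tokens that would then be passed to the topic model.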
Latent Dirichlet allocation
Topic modeling

Prior:
Doc: topic distribution
Tpc: word distribution

Posterior:
p(word|Tpc)
p(Tpc|Doc)
Word Topic Document
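For reference, the generative model behind the quantities on this slide can be written as below. This is the standard smoothed LDA formulation; the symbols alpha, beta, theta_d and phi_k are conventional notation assumed here, not taken from the deck. The per-document topic distribution theta_d corresponds to p(Tpc|Doc), and the per-topic word distribution phi_k corresponds to p(word|Tpc).

```latex
% Priors
\theta_d \sim \mathrm{Dir}(\alpha) \quad \text{(topic distribution of document } d\text{)}
\qquad
\phi_k \sim \mathrm{Dir}(\beta) \quad \text{(word distribution of topic } k\text{)}

% Generation of the n-th word of document d
z_{d,n} \mid \theta_d \sim \mathrm{Cat}(\theta_d),
\qquad
w_{d,n} \mid z_{d,n}, \phi \sim \mathrm{Cat}\!\left(\phi_{z_{d,n}}\right)
```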
Clustering Result
-Heat map by week

Normalization
No remarkable difference by week
among the three clusters
Clustering Result
-Heat map by month
Normalization
The frequency of months 8-12
in cluster 3 is high
The frequency of months 5-8 in
cluster 1 and cluster 2 is high
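The slides do not say which normalization was applied before drawing the heat maps. One plausible reading, sketched here as an assumption, is that the counts of each cluster are divided by that cluster's total so that clusters of different sizes become comparable. The function name and the row-per-cluster layout are hypothetical.

```python
def normalize_rows(counts):
    """Scale each row (one cluster) so its entries sum to 1.

    `counts` is a list of rows, one per cluster, each holding the number
    of collisions per month (or per week). All-zero rows are left as zeros.
    """
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized
```

After this step each cell holds the share of a cluster's collisions that falls in a given month or week, which is the kind of frequency the observations on the slide compare.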
Clustering Result
-Moving average by month

Collision numbers against time


3-point moving average
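A 3-point moving average over the monthly collision counts can be sketched as follows. The deck does not say whether the window is centered or trailing; this version simply averages every run of three consecutive values, so the output is two points shorter than the input. The function name is an assumption made for illustration.

```python
def moving_average(values, window=3):
    """Return the averages of every run of `window` consecutive values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(values) < window:
        return []
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

For example, `moving_average([3, 6, 9, 12])` averages (3, 6, 9) and (6, 9, 12), smoothing out month-to-month noise in the collision series.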
Clustering Result
LDA Result
num_topic = 4

vehicle, kill, injury

driver, distraction, inattention

passenger, traffic, pavement

station, sport, utility


Website
Google Maps JavaScript API
Import data into application
Display data on the map
Customize the map marker
Navigation Bar Multiple Layers
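The deck does not describe how the collision records were imported into the map. One possible approach, sketched here as an assumption, is to export the records as GeoJSON, which the Maps JavaScript API data layer can load (for example via `map.data.loadGeoJson`). The record keys `lat`, `lon`, `date` and `injured` are hypothetical and do not reflect the project's actual schema.

```python
import json


def to_geojson(records):
    """Convert collision records into a GeoJSON FeatureCollection.

    Each record is a dict with hypothetical keys: lat, lon, date, injured.
    """
    features = []
    for rec in records:
        features.append(
            {
                "type": "Feature",
                # GeoJSON stores coordinates as [longitude, latitude].
                "geometry": {
                    "type": "Point",
                    "coordinates": [rec["lon"], rec["lat"]],
                },
                "properties": {"date": rec["date"], "injured": rec["injured"]},
            }
        )
    return {"type": "FeatureCollection", "features": features}


def to_geojson_text(records):
    """Serialize the FeatureCollection to a JSON string for the web page."""
    return json.dumps(to_geojson(records))
```

The `properties` of each feature can then drive marker customization on the client side, for instance choosing a different icon when `injured` is greater than zero.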
Future Work

Create a predictive crash model to help drivers and the
government better prevent accidents.

Provide the data to drivers through mobile devices or
portable navigation systems.

Optimize the core algorithms, improving their accuracy and
running time.
Thank You!
