Abstract—In this work, we address the critical task of dynamically detecting the simultaneous behavior of driving and texting, using the smartphone as the sensor. We propose, design, and implement TEXIVE, which detects texting operations during driving by exploiting irregularities and rich micro-movements of users. Without relying on any external infrastructure or additional devices, and without requiring any modification to vehicles, TEXIVE successfully detects this dangerous behavior with good sensitivity, specificity, and accuracy by leveraging the inertial sensors integrated in regular smartphones. To validate our approach, we conduct extensive experiments involving a number of volunteers on various vehicles and smartphones. Our evaluation results show that TEXIVE achieves a classification accuracy of 87.18% and a precision of 96.67%.
Index Terms—Driving Detection, Smartphone Sensing, Mobile Applications, IoTs.
I. INTRODUCTION
One recent study indicates that at least 23% of all vehicle crashes in the US (1.3 million crashes) involve the use of cell phones (especially texting) during driving [1], although over 30 states and the District of Columbia have forbidden texting while driving [2]. This severe safety issue has stirred numerous research efforts and innovations on detecting and preventing texting while driving, and a number of detection approaches using different sensing or Internet of Things (IoT) devices have been proposed, e.g., mounting a camera to monitor the driver [3], [4], relying on acoustic ranging through car speakers [5], and leveraging sensors and cloud computing to recognize drivers' operations [6]. Meanwhile, some apps have been developed, such as Rode Dog [7] and TextBuster [8]. Although these techniques
The work is partially supported by the US National Science Foundation under Grant No. CNS-1319915, CNS-1343355, CNS-1526638, EECS-1247944, and CMMI-1436786, and the National Natural Science Foundation of China under Grant No. 61272426, 61379117, 61428203, 61520106007, and 61572347. (Co-corresponding authors: Xiang-Yang Li and Yu Wang.)
C. Bo is with the Department of Computer Science, University of North
Carolina at Charlotte, Charlotte, NC 28223, USA. E-mail: cbo1@uncc.edu.
X. Jian, T. Jung and J. Han are with the Department of Computer
Science, Illinois Institute of Technology, Chicago, IL 60616, USA. E-mail:
{xjian,tjung,jhan20}@hawk.iit.edu.
X.-Y. Li is with the School of Computer Science and Technology, University
of Science and Technology of China, Hefei, Anhui, 230026, China and the
Department of Computer Science, Illinois Institute of Technology, Chicago,
IL 60616, USA. E-mail: xli@cs.iit.edu.
X. Mao is with School of Software, Tsinghua University, Beijing, 100084,
China. E-mail: xufei.mao@gmail.com.
Y. Wang is with the Department of Computer Science, University of
North Carolina at Charlotte, Charlotte, NC 28223, USA and the College of
Information Engineering, Taiyuan University of Technology, Taiyuan, Shanxi,
030024, China. E-mail: yu.wang@uncc.edu.
[Figure residue: the detection workflow (entering detection, side detection, side-signal decision combination, driver vs. passenger outcome) and a timeline (Start, EnterCar, Driving) illustrating the detection delay.]
C. Entering Vehicles?
In this section, we focus on the detection of entering a vehicle, which is the initial stage of identification. We discuss it from three aspects: Activity Recognition, Entering Detection, and Accuracy Improvement.
a) Activity Recognition: In our system, we set the sampling rate to 20 Hz and collect successive sensory data, which represent continuous human activities. Two critical stages attract our attention. The first is to identify the starting point and the duration of an activity from a sequence of temporal data; the other is to employ effective feature extraction, which helps the later machine-learning stage. For the former, we store a small duration of sensory data in a buffer and set the sliding-window size for recognition to 4.5 seconds, based on the extensive evaluation results presented later. For the latter, we adopt the Discrete Cosine Transform (DCT) [35] to extract features from the linear acceleration that represent the specific action.
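As a hedged sketch of this windowing-plus-DCT step (the function names, the 10-coefficient cut-off, and the synthetic sample stream are our own illustrative choices, not TEXIVE's actual code; the 20 Hz rate and 4.5 s window come from the text):

```python
import numpy as np

FS = 20              # sampling rate from the text (Hz)
WIN = int(4.5 * FS)  # 4.5 s sliding window -> 90 samples

def dct2(x):
    """Type-II DCT of a 1-D signal (the transform cited as [35])."""
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))
                     for k in range(N)])

def dct_features(window, n_coeffs=10):
    """Keep the first low-frequency DCT coefficients as the feature vector."""
    return dct2(window - window.mean())[:n_coeffs]

# Slide the 4.5 s window over a stream of linear-acceleration magnitudes
# (random numbers here stand in for real sensor samples).
rng = np.random.default_rng(0)
stream = rng.normal(0.0, 1.0, 300)
windows = [stream[i:i + WIN] for i in range(0, len(stream) - WIN + 1, FS // 2)]
features = np.array([dct_features(w) for w in windows])
print(features.shape)   # one 10-dim feature vector per window
```

The low-frequency DCT coefficients compactly summarize the shape of the motion inside a window, which is why they are a common choice for activity features.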
Since our system works in the background and monitors multiple activities that may present similar patterns, our strategy is to analyze confusable activities from the perspective of machine learning and distinguish them from each other to improve accuracy. Therefore, we additionally examine activities such as walking, ascending stairs, and sitting down, which are reflected in the acceleration along the gravity direction as shown in Figures 3(a) and 3(b). Although the difference between walking and ascending stairs is relatively small in our experimental results, machine learning can still distinguish one from the other with high accuracy. Sitting down (Figure 3(c)) is another activity we take into account, not only because it is recorded multiple times throughout a day, but also because its pattern is sometimes similar to sitting down in a vehicle. We employ a naive Bayesian classifier [36] to detect and identify our preliminary collected data, containing 20 cases for each activity mentioned above and 100 other behaviors such as running and jumping, with accuracy as high as 91.25%.
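A minimal Gaussian naive-Bayes sketch in the spirit of [36] (hand-rolled here for self-containment; the synthetic two-class data stands in for the real activity features):

```python
import numpy as np

class GaussianNB:
    """Naive Bayes with per-class Gaussian feature models."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-6 for c in self.classes])
        self.prior = np.array([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # log P(c) + sum_j log N(x_j; mu_cj, var_cj), independence assumption
        ll = (np.log(self.prior)
              - 0.5 * np.sum(np.log(2 * np.pi * self.var), axis=1)
              - 0.5 * (((X[:, None, :] - self.mu) ** 2) / self.var).sum(axis=2))
        return self.classes[np.argmax(ll, axis=1)]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 5)),    # e.g. "walking" features
               rng.normal(3, 1, (20, 5))])   # e.g. "sitting down" features
y = np.array([0] * 20 + [1] * 20)
clf = GaussianNB().fit(X, y)
acc = (clf.predict(X) == y).mean()
print(acc)
```

The independence assumption makes training and prediction cheap, which matters for a classifier that runs continuously in the background of a phone.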
b) Entering Detection: Empirically, the activity of getting into a vehicle consists of five basic actions: walking towards the vehicle, opening the door, turning the body, entering, and sitting down. We extract features from the linear acceleration, converted into its horizontal-plane and gravity-direction components in the earth frame coordinate system.
A study [37] indicates that most users carry their phones in their trousers. Therefore we put our emphasis mainly on the trousers case.
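The horizontal/gravity decomposition mentioned above can be sketched as follows (a minimal illustration; the function name and the fixed gravity vector are our own assumptions, since real phones expose a fused gravity estimate at runtime):

```python
import numpy as np

def split_by_gravity(accel, gravity):
    """Split a device-frame acceleration into gravity-direction and
    horizontal-plane components."""
    g_hat = gravity / np.linalg.norm(gravity)   # unit vector along gravity
    vertical = np.dot(accel, g_hat)             # signed magnitude along gravity
    horizontal = accel - vertical * g_hat       # remainder lies in the plane
    return vertical, np.linalg.norm(horizontal)

v, h = split_by_gravity(np.array([0.0, 3.0, 4.0]), np.array([0.0, 0.0, 9.8]))
print(v, h)   # 4.0 along gravity, 3.0 in the horizontal plane
```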
[Figure residue: accelerometer traces in the horizontal plane and the ground direction over the time sequence.]
Fig. 4. Data extracted from the accelerometer in the horizontal plane and the ground direction when people enter vehicles.
[Table residue, reconstructed as a confusion matrix of entering-vehicle detection:]
                    Detected as entering   Detected as other
Entering Vehicles            38                    3
Other activities             46                   250
The filter is proposed based on the observation that, compared with a common sitting activity, the sensed ambient magnetic field fluctuates dramatically when approaching the vehicle, because of the special materials of the vehicle body and engine. We conducted comprehensive tests and plot the changes of the magnetic field in the two scenarios (results shown in Figure 5). The figures indicate that the significant difference lies in the stability of the magnetic field value. Neglecting the absolute error in the value, the sensory data vibrate clearly during the complete activity in the first case, while in the second case the value remains relatively stable.
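The fluctuation cue above can be sketched as a simple variance filter (our own illustration; the 5 uT threshold and the synthetic traces are assumed values, not taken from the paper):

```python
import numpy as np

def near_vehicle(mag_magnitudes, threshold_ut=5.0):
    """Flag 'near a vehicle' when the short-window standard deviation of the
    magnetic-field magnitude exceeds a threshold (illustrative value)."""
    return np.std(mag_magnitudes) > threshold_ut

rng = np.random.default_rng(2)
common_sitting = 24.0 + rng.normal(0, 0.3, 60)   # stable field (cf. Figure 5)
entering_car = 50.0 + rng.normal(0, 12.0, 60)    # fluctuating near body/engine
print(near_vehicle(common_sitting), near_vehicle(entering_car))  # False True
```

Using the spread rather than the absolute value matches the observation above that the absolute reading is unreliable while its stability is discriminative.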
[Figure residue: linear-acceleration traces (horizontal plane vs. ground direction), gyroscope rotation traces in Pitch and Roll (degrees), and front-row vs. back-row comparisons over the time sequence.]
The system conducts side detection simultaneously with entering detection so that the detection delay is minimized. We call the leg closer to the vehicle the inner leg. By looking into the collected data of our experimental cases, we make an interesting observation: when the smartphone is located in a pocket on the inner leg, the boarding motion leads to a large fluctuation in acceleration followed by a relatively smaller one, and the observation is exactly opposite when the smartphone is in a pocket on the other leg. Unfortunately, it is still hard to judge the boarding side of a user, since a user with the phone in the left pocket boarding from the right side produces a pattern similar to that of a user with the phone in the right pocket boarding from the left side. Thus a purely acceleration-based determination may fail at this stage.
Fortunately, we found another key factor: the direction of body rotation differs when entering from the two sides. Usually, in order to face front, the driver has to turn left while the passenger turns right before entering the car (for the car models used in the US), and this short action can be captured by the gyroscope in both Pitch and Roll, as shown in Figure 6. Although the orientation of the vehicle is unknown and unpredictable, this turning-based determination proved robust in our evaluation. We also adopt an extended Kalman filter to eliminate the internal mechanical noise of the sensors.
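An illustrative sketch of the turn-direction cue: integrate the rotation rate over the boarding moment and read its sign. The sign convention, the 15-degree dead band, and the sample trace are our own assumptions, not TEXIVE's parameters:

```python
import numpy as np

def boarding_side(yaw_rate_dps, dt=0.05):
    """Classify the boarding side from gyroscope rotation rate (deg/s),
    sampled every 50 ms. US car model: driver turns left, passenger right."""
    turn = np.sum(yaw_rate_dps) * dt          # net body rotation in degrees
    if turn > 15:       # left turn (counter-clockwise, by our convention)
        return "driver"
    if turn < -15:      # right turn
        return "passenger"
    return "unknown"

left_turn = np.full(40, 45.0)    # 40 samples at 50 ms: 45 deg/s for 2 s
print(boarding_side(left_turn))      # driver
print(boarding_side(-left_turn))     # passenger
```

Because only the sign of the net rotation is used, the rule stays valid regardless of which way the vehicle itself is oriented, matching the robustness claim above.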
[Figure residue: magnetic-field traces (uT) over time for the two scenarios; front-row vs. back-row acceleration traces; and PDF/CDF curves of the number of inputs between two typos for normal texting vs. texting while driving.]
G. Making the Decision
Now we can make the decision from three parts of information: front or back, left or right, and side information. The first part includes the bump and magnetic-field information: the bump information yields a highly accurate front-or-back decision, but it is not available before driving starts, since encountering a bump may require several minutes of driving. Therefore, at the beginning we use the magnetic-field information to decide front or back, and switch to the bump information as soon as the first bump is met. The left-or-right decision relies only on the pattern comparison. After these two parts, we know whether the user is a driver. The remaining side information is used to confirm and correct our decision: if all of this side information indicates a different decision within several texting actions, the system changes the decision to the opposite.
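The fusion order described above can be sketched as follows (our own pseudologic, hedged: the function and parameter names are illustrative, and the confirmation/correction step is omitted for brevity):

```python
def is_driver(magnetic_front, bump_front, on_left):
    """Fuse front/back and left/right evidence into a driver decision.
    bump_front is None until the first bump is encountered; once available,
    the more accurate bump evidence overrides the magnetic estimate."""
    front = bump_front if bump_front is not None else magnetic_front
    return front and on_left          # US model: driver sits front-left

# Before any bump: rely on the magnetic front/back estimate.
print(is_driver(magnetic_front=True, bump_front=None, on_left=True))    # True
# After a bump contradicts it, the bump evidence wins.
print(is_driver(magnetic_front=True, bump_front=False, on_left=True))   # False
```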
IV. REDUCING ENERGY CONSUMPTION
Energy consumption is a paramount concern in mobile devices, especially smartphones. TEXIVE is designed to
V. EVALUATIONS
In this section, we describe the extensive evaluation of TEXIVE component by component. All experiments are conducted using two different smartphones (Samsung Galaxy S3 and Galaxy Note II) in two different vehicles (a sedan and an SUV) by multiple users. The whole process is evaluated on streets in Chicago, except that the texting part is evaluated in a large parking lot in order to avoid accidents.
[Figure 10 residue: CDF of the time interval between two bumps (s).]
Assuming bumps and potholes arrive as a Poisson process, the probability of encountering k bumps or potholes in a time interval [t, t + τ] is P(k) = e^{-λτ}(λτ)^k / k!, where λ is a rate parameter. Thus, the probability of successfully detecting the k-th bump or pothole in the i-th detecting cycle is P_i^k = P^hit_i · P^miss_{i-1} = (1 - e^{-λw}) · w/(s + w), where w is the sampling-window length and s the sleeping interval of a duty cycle. Suppose the average power for sampling sensory data and running activity recognition per unit time is C; then the total energy consumption under the same circumstance is C((i - 1)(w + s) + t), where t is the time for identifying a bump or pothole in the i-th sampling, and the overall expected cost is E(k) = (1 - e^{-λw}) · w/(s + w) · C((i - 1)(w + s) + t). We tested a segment of road (over 5 miles) containing both local streets and highway. The bumps measured in our data are not only the regular speed bumps people experience: we treat any non-smooth part of a road segment that causes bump-like behavior as a bump, and record the time interval of driving through a bump or pothole as shown in Figure 10. The figure shows that the probability of a vehicle driving through a bump within 50 seconds is over 80%, so the method is feasible and reliable.
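As a numeric sanity check of the duty-cycling analysis above (the rate λ is our own guess chosen to match Figure 10's observation that a bump arrives within 50 s with probability over 80%; the geometric summation over cycles is our hedged reading of the expected-cost formula):

```python
import math

lam = 0.033          # bumps per second; P(wait <= 50 s) = 1 - e^(-50*lam)
print(1 - math.exp(-50 * lam))          # ~0.81, matching the 80% reading

def p_k_bumps(k, lam, tau):
    """Poisson probability of k bumps in an interval of length tau."""
    return math.exp(-lam * tau) * (lam * tau) ** k / math.factorial(k)

def expected_cost(lam, w, s, C, t, cycles=1000):
    """Expected sampling energy: a hit in cycle i costs C*((i-1)*(w+s)+t),
    with per-cycle hit probability (1 - e^(-lam*w)) * w/(s+w)."""
    p_hit = (1 - math.exp(-lam * w)) * w / (s + w)
    return sum(p_hit * (1 - p_hit) ** (i - 1) * C * ((i - 1) * (w + s) + t)
               for i in range(1, cycles + 1))

# 5 s sampling windows separated by 25 s sleeps (illustrative parameters).
print(round(expected_cost(lam, w=5, s=25, C=1.0, t=1.0), 1))
```

Longer sleeps save per-cycle energy but stretch the expected time to the first detected bump, which is exactly the trade-off the formula captures.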
[Figure 11 residue: (a) acceleration traces (horizontal plane vs. ground) with the walking-detected point marked; (b) precision, sensitivity, specificity, and accuracy (%) vs. window size (1.5 s to 4.5 s).]
(a) Detecting the first arriving signal (b) Recognition of entering vehicles
Fig. 11. Detecting entering vehicles.
A. Getting on the Vehicle
We first evaluate the performance of entering-activity detection, more specifically the capability of extracting the entering pattern from a series of sensory data and distinguishing it from other activities. We deployed TEXIVE in the background of multiple users' smartphones in the office for one week and collected the sensory data for analysis. Although a large amount of sensory data was gathered and many successful detections were made, the number of driving-related activities is much smaller than the others, because most of our volunteers drive only during commute time, which we consider a credible observation.
Figure 11(a) illustrates the successful detection of the first signal, reflected in the acceleration in both the horizontal and ground directions. The smartphone is put in a pocket; TEXIVE perceives a walking activity from the 112th time slot, and the entering signal arrives 2 seconds later (the 133rd time slot). Since the whole activity normally lasts approximately 5 to 6 seconds if the user is not interrupted, we slide the detection window with a 0.5 s step to detect the action. In this case, multiple entering activities are perceived, and only in this circumstance do we increase the credibility that the user is indeed entering the vehicle.
We collected 41 instances of entering behavior for both the SUV and the sedan, as well as 296 other activities in total, and evaluated the precision, sensitivity, specificity, and accuracy with respect to different window sizes. We set the window size from 1.5 s to 5 s and plot the results in Figure 11(b), which indicates that the performance generally improves as the window size increases. The system reaches its best performance when the window size is around 4 s to 4.5 s, with sensitivity over 90% in both cases. On the other hand, we have to admit that the precision in both cases is not as high as expected.
[Figure residue: accuracy (%) vs. window size for tests in the front and back rows; bump responses for bumps in front vs. back; left vs. right comparison; acceleration traces for phone placements (cup holder, hand, dashboard, windshield); and spike amplitude of the magnetic field (uT) at sampling points for SUV vs. sedan.]
TABLE II
BUMP ON THE ROAD.
D. Driver Detection
The decision of driver detection is made by fusing the
evidence from previous sub-processes. When doing real time
recognition, the system slides the window with step 0.5s to
match the stored activities through naive Bayes classifier. Since
the activity could be detected in multiple times because of
the sliding window, we consider a continuous same activity
recognition to be a successful recognition to increase the
credibility. And taking the acceleration into account as a filter,
the recognition could provide high level of credit for current
recognition.
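The consecutive-recognition rule above can be sketched as follows (our own illustration; the choice of k = 3 consecutive windows is an assumed value, not the paper's):

```python
def confirmed(labels, k=3):
    """Return the first label that appears k times in a row, else None.
    With a 0.5 s sliding step, one real activity spans several windows,
    so a repeated label is stronger evidence than a single hit."""
    run, prev = 0, None
    for lab in labels:
        run = run + 1 if lab == prev else 1
        prev = lab
        if run >= k:
            return lab
    return None

print(confirmed(["walk", "enter", "enter", "enter", "sit"]))   # enter
print(confirmed(["walk", "enter", "walk", "enter"]))           # None
```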
[28] C. Bo, L. Zhang, X.-Y. Li, Q. Huang, and Y. Wang, "SilentSense: Silent user identification via touch and movement behavioral biometrics," in Proc. of ACM MobiCom (Poster), 2013.
[29] C. Bo, L. Zhang, T. Jung, J. Han, X.-Y. Li, and Y. Wang, "Continuous user identification via touch and movement behavioral biometrics," in Proc. of IEEE IPCCC, 2014.
[30] C. Bo, T. Jung, X. Mao, X.-Y. Li, and Y. Wang, "SmartLoc: Sensing landmarks silently for smartphone-based metropolitan localization," EURASIP Journal on Wireless Communications and Networking, to appear.
[31] B. Guo, Z. Wang, Z. Yu, Y. Wang, N. Yen, R. Huang, and X. Zhou, "Mobile crowd sensing and computing: The review of an emerging human-powered sensing paradigm," ACM Computing Surveys, vol. 48, no. 1, Article 7, August 2015.
[32] Y. Wang, Y. Ge, W. Wang, and W. Fan, "Mobile big data meets cyber-physical system: Mobile crowdsensing based cyber-physical system for smart urban traffic control," in Proc. of 2015 Workshop for Big Data Analytics in CPS: Enabling the Move From IoT to Real-Time Control, Seattle, Washington, April 2015.
[33] L. Rabiner and B. Juang, "An introduction to hidden Markov models," IEEE ASSP Magazine, vol. 3, no. 1, pp. 4-16, 1986.
[34] F. Ichikawa, J. Chipchase, and R. Grignani, "Where's the phone? A study of mobile phone location in public spaces," in Proc. of ACM MobiSys, 2005, pp. 1-8.
[35] N. Ahmed, T. Natarajan, and K. Rao, "Discrete cosine transform," IEEE Transactions on Computers, vol. 100, no. 1, pp. 90-93, 1974.
[36] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proc. of AAAI, 1992, pp. 223-228.
[37] Y. Cui, J. Chipchase, and F. Ichikawa, "A cross culture study on phone carrying and physical personalization," in Proc. of UI-HCII, Springer, 2007, pp. 483-492.
[38] P. Mohan, V. Padmanabhan, and R. Ramjee, "Nericell: Rich monitoring of road and traffic conditions using mobile smartphones," in Proc. of ACM SenSys, 2008, pp. 323-336.
[39] J. Eriksson, L. Girod, B. Hull, R. Newton, S. Madden, and H. Balakrishnan, "The pothole patrol: Using a mobile sensor network for road surface monitoring," in Proc. of ACM MobiSys, 2008.
Xufei Mao received the Bachelor's degree in Computer Science from Shenyang University of Technology and the M.S. degree in Computer Science from Northeastern University in 1999 and 2003, respectively. He received the Ph.D. degree in Computer Science from the Illinois Institute of Technology, Chicago, in 2010. He is currently an associate research professor in the School of Software, Tsinghua University. His research interests span wireless ad hoc networks and the Internet of Things. He is a member of the IEEE.