PFlow: Reconstructing
People Flow Recycling
Large-Scale Social
Survey Data
Understanding people flow on a macroscopic scale requires
reconstructing it from various forms of existing fragmentary
spatiotemporal data. This article illustrates a process for
reconstructing such data using existing person-trip survey data.
Yoshihide Sekimoto, Ryosuke Shibasaki, Hiroshi Kanasugi, and Tomotaka Usui
University of Tokyo
Yasunobu Shimazaki
Pasco

Monitoring dynamic changes in people flow has become increasingly important to mitigate secondary disasters following earthquakes, fires, and other major events and to relieve congestion in mass transit systems. For instance, the accurate monitoring of people flow could have been helpful at a 2001 fireworks event in Akashi, Japan, where 247 of the 150,000 spectators were killed or injured when they rushed to a pedestrian bridge. Public facility managers also need a comprehensive grasp of people flow to design safe, comfortable spaces and to develop appropriate urban transport policies such as those for commuter trains. Consider, for example, Shinjuku Station, which with a daily ridership of approximately 4 million people, is the most crowded station in the world. Understanding people flow would also be useful in the commercial advertising field, where the pricing of outdoor advertising depends on each location’s traffic volume.

In technical terms, we can measure people flow by tracking various measurements, including

• the position of mobile objects, using GPS or personal handy systems (PHS);
• the number of stationary people, using a closed-circuit television (CCTV) camera;
• the number of passengers getting on and off at transportation facilities, using the number of integrated circuit (IC) tickets passing through automatic ticket gates;
• the number of stationary people, using the number of registered mobile phones at each base station, and
• the hourly number of visitors to department stores.

However, the scope of many of these methods doesn’t extend beyond data-acquisition technology. Such research doesn’t consider infrastructure data that can give an overview of the mass flow of people by integrating these various forms of data. (See the “Related Work in People-Flow Analysis” for a discussion of some of the work in this field.) This is true in terms of comprehensive qualities including spatial and temporal accuracy, acquisition and
objects.3,4 However, none of these studies show examples of large-scale realistic data. Furthermore, although some studies have proposed agent-based simulations,5,6 these require basic people-flow data to keep the simulations realistic. Meanwhile, studies on vehicle activity integrating mobile phone data and traffic-sensing data are increasing,7–9 and those on comprehensive human activity using mobile phones are beginning to emerge.10–12 However, no one has proposed reconstructing the total state.

References
1. L. Ralph, D. Frank, and R. Kurt, “Scalable Processing of Trajectory-Based Queries in Space-Partitioned Moving Objects Databases,” Proc. 17th ACM SIGSPATIAL Int’l Conf. Advances in Geographic Information System (ACMGIS 08), ACM Press, 2008, pp. 270–279.
2. O. Wolfson et al., “DOMINO: Databases for Moving Objects Tracking,” Proc. ACM Symp. Management of Data (SIGMOD 99), ACM Press, 1999, pp. 547–549.
3. P. Partsinevelos, P. Agouris, and A. Stefanidis, “Reconstructing Spatiotemporal Trajectories from Sparse Data,” Int’l J. Photogrammetry and Remote Sensing (ISPRS), vol. 60, no. 1, 2005, pp. 3–16.
4. D. Pfoser and Y. Theodoridis, “Generating Semantics-based Trajectories of Moving Objects,” Computers, Environment, and Urban Systems, vol. 27, no. 3, 2003, pp. 243–263.
6. Y. Ohkusa and T. Sunagawa, “Application of an Individual-Based Model with Real Data for Transportation Mode and Location to Pandemic Influenza,” J. Infection and Chemotherapy, vol. 13, no. 6, 2007, pp. 380–389.
7. J.C. Herrera and A.M. Bayen, “Traffic Flow Reconstruction Using Mobile Sensors and Loop Detector Data,” Transportation Research Board 87th Ann. Meeting, 2008; www.ce.berkeley.edu/~bayen/conferences/trb08.pdf.
8. P. Mohan, V.N. Padmanabhan, and R. Ramachandran, “Nericell: Rich Monitoring of Road and Traffic Conditions Using Mobile Smartphones,” Proc. 6th ACM Conf. Embedded Network Sensor Systems (SenSys), ACM Press, 2008, pp. 323–336.
9. J. Krumm and E. Horvitz, “Predestination: Inferring Destinations from Partial Trajectories,” Proc. 8th Int’l Conf. Ubiquitous Computing (Ubicomp), Springer-Verlag, 2006, pp. 243–260.
10. M. Gonzalez, C. Hidalgo, and A. Barabasi, “Understanding Individual Human Mobility Patterns,” Nature, no. 453, 2008, pp. 779–782.
11. R. Pulselli et al., “Computing Urban Mobile Landscapes through Monitoring Population Density Based on Cell-Phone Chatting,” Int’l J. Design & Nature and Ecodynamics, vol. 3, no. 2, 2008, pp. 121–134.
12. C. Ratti et al., “Mobile Landscapes: Using Location Data from Cell Phones for Urban Analysis,” Environment and Planning B: Planning and Design, vol. 33, no. 5, 2006, pp. 727–748.
process costs, and service value to the user.

To provide a useful overview of the mass flow of people, a dataset on people flow must meet the following requirements:

• Sufficiently large scale. The number of surveyed people should be appropriate and unbiased to estimate the actual total people flow in the real world.
• Temporal completeness. The data for an individual should contain spatiotemporal data based on a realistic minimum resolution (for example, 1 minute) to maintain a high query speed.
• Realistic spatial accuracy. The data for an individual should retain spatial details consistent with infrastructural map data to withstand microscale application. However, descriptions of spatial accuracy are also required because of the cost limitations for pursuing details.

Such a people-flow dataset could consist of an individual’s location at each minute. Moreover, this dataset could be reconstructed from fragmentary but large-scale spatiotemporal data using sufficient infrastructure data (such as detailed road or railway networks) and railway timetables. Source data could include call-logging data for mobile phones. For our work, we use person-trip data obtained from public transportation surveys because they’re widely available if used for the public’s benefit. Moreover, person-trip data let us estimate the total number of people because they’re obtained from unbiased surveys sampling data from all ages.

Person-Trip Data
According to the website of Japan’s Ministry of Land, Infrastructure, Transport, and Tourism (MLIT) (www.mlit.go.jp/crd/tosiko/pt/map_e.html), as of 2007, the MLIT and local governments had conducted person-trip surveys in 61 cities (122 times) in Japan over more than 40 years. These surveys were originally intended to capture the macroscopic aggregated flow in each area for analyzing transportation on an urban scale. Despite the data’s fragmentary nature due to the limited locations sampled (for example, residences, offices, and nearest stations), person-trip data are valuable because they document the flow of disaggregated people on a large scale.

Figure 1 shows a person-trip survey sheet distributed to approximately 10 percent of the residents in the Tokyo
[Figure 1 image: survey sheet with fields labeled Departure time, Arrival time, Trip object, Transportation mode, Travel time, and Transfer point.]
Figure 1. Main portion of the person-trip survey sheet (from the Tokyo Metropolitan Region Transportation Planning
Commission website, www.tokyo-pt.jp/data/file/tebiki.pdf). This questionnaire requires entries on each place visited and each
trip between two places. Based on privacy considerations, the instructions require a place to be recorded as a rough address so
as not to specify the complete location. Each trip is to include a departure time, arrival time, and purpose. Moreover, each trip
is to consist of several subtrips (unlinked trips) with individual transportation modes, travel times, and transfer points.
metropolitan area. This questionnaire requires entries on each place visited and each trip between two places, in addition to basic individual information such as gender, age, and occupation. Because of privacy considerations, the places are recorded as rough addresses to avoid specifying exact locations. Each trip must include a departure time, arrival time, and purpose. Moreover, each trip must consist of several subtrips (unlinked trips) with individual transportation modes, travel times, and transfer points. A transfer point can be a subtrip’s destination as well as the next subtrip’s origin.

The final number of people in the sample was 722,000, excluding error data from about 800,000 in the Tokyo metropolitan area in the 1998 survey. The total number of records was about 3.2 million; that is, each person provided data for about four to five subtrips. Furthermore, each described place was recorded by the smallest zone (that is, block zone) of the survey zone code. There were more than 20,000 zones, each of which covers several hundred to several thousand people in the Tokyo metropolitan area (Figure 2).

For our study, we used existing person-trip data at the block zone level as large-scale disaggregated fragmentary data to reconstruct people flow. This made it straightforward to scale the sample up to the real population, because the person-trip data had an average magnification factor of about 40 relative to the total number of people in each segment, defined by area, gender, and age using National Census data.

As of 2004, person-trip surveys had been conducted not only in Japanese cities but also in 52 cities in other countries under a Japan International Cooperation Agency (JICA) project,1 as Table 1 shows. This kind of survey has also been conducted by US and UK transportation departments in the form of a Household Travel Survey.2,3 Thus, because such public-sector person-trip surveys are carried out in many countries, our reconstruction approach could be applied to many cases.

Reconstruction Algorithm
The reconstruction process involves three steps. First, we convert place information to latitude and longitude (hereafter referred to as lat/lon) using an address-matching process. Second, each route is selected according to the origin and destination positions of the subtrip information based on road and railway topologies. Third, the spatiotemporal position is interpolated according to the form of the people-flow dataset based on detailed road and railway geometries.

Address Matching
The origin and destination positions are recorded by block zones and converted to representative points of each block zone for the lat/lon description. Thus, although fragmentary, spatiotemporal data can be acquired from the
[Figure 2 image: map of the survey area covering Ibaraki prefecture (southern region), Saitama prefecture, Chiba prefecture, Tokyo metropolitan government, and Kanagawa prefecture, with three nested zone levels:
Planning zone (zone covering about 60,000 people for metropolitan area planning). Total: 595
Smaller zone (zone covering about 15,000 people for regional planning). Total: 1,648
Block zone (smallest zone covering from several hundred to thousands for micro analysis). Total: 21,967]
Figure 2. Zone system used in the Tokyo person-trip survey. Each described place of survey data was recorded by the smallest
zone (that is, block zone). These zones, each of which covers several hundred to several thousands of people, totaled over 20,000
in the Tokyo metropolitan area.
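
The zone system in Figure 2 is what the address-matching step relies on: a place recorded as a block zone is replaced by one representative lat/lon point for that zone. The following is a minimal sketch of that lookup, not the authors' implementation. The zone codes, the boundary coordinates, and the choice of the vertex mean as the representative point are all assumptions made for illustration.

```python
from statistics import fmean

# Hypothetical block-zone table: zone code -> boundary vertices as (lat, lon).
# Codes and coordinates are invented for this example.
BLOCK_ZONES = {
    "0010101": [(35.680, 139.760), (35.680, 139.770),
                (35.690, 139.770), (35.690, 139.760)],
    "0010102": [(35.690, 139.700), (35.690, 139.710),
                (35.700, 139.710), (35.700, 139.700)],
}


def representative_point(vertices):
    """Return the mean of the boundary vertices as a simple representative point."""
    lats = [lat for lat, _ in vertices]
    lons = [lon for _, lon in vertices]
    return (fmean(lats), fmean(lons))


def match_address(zone_code):
    """Convert a block-zone code to (lat, lon), or None if the code is unknown."""
    vertices = BLOCK_ZONES.get(zone_code)
    return representative_point(vertices) if vertices else None


origin = match_address("0010101")
unknown = match_address("9999999")
```

Because every place inside a zone maps to the same point, positional error grows with zone size, which is consistent with the article's later observation that mismatches increase at finer mesh resolutions.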
TABLE 1
List of person-trip surveys.
Location | Number of cities and years | Survey examples
Japanese cities (as of 2007) | 61 cities (1967 and on) | Tokyo metropolitan area (1968, 1978, 1988, 1998, 2008); Osaka metropolitan area (1970, 1980, 1990, 2000, 2010); Nagoya metropolitan area (1971, 1981, 1991, 2001)
Other cities (as of 2004) | 52 cities (1966 and on) | Manila (1996), Damascus and Kuala Lumpur (1997), Bucharest and Managua (1998), Tripoli, Phnom Penh, Chengdu, and Belem (2000), Cairo (2001), Jakarta and Ho Chi Minh (2002), Hanoi, Nairobi, and Lima (2004)
data on origin and destination position (lat/lon) and time.

Route Selection and Spatiotemporal Interpolation
For the person-trip data, each subtrip has a unique transportation mode. Therefore, when the data are recorded in railway mode, the route can be estimated as the minimum-time route according to railway timetables. In walking or driving mode, the route can be estimated along road networks through a minimum-time route search based on the Dijkstra method. Clearly there are some risks in this process, such as discrepancies from the actual state when the share of vehicles is high and traffic is congested. This is because the route choices in this study are based solely on minimum time. Furthermore, we mainly use the time/position
[Figure 3 image: panels (a), (b), and (c); the railway topology is drawn as a green line.]
Figure 3. Spatiotemporal interpolation of position from origin and destination data (an example of one trip to the office for one
person). (a) One-trip data consisting of three subtrips for one person, (b) route selection from origin and destination data using
road and railway topologies (the latter linked specifically with timetables), and (c) spatiotemporal interpolation at 1-minute
intervals along detailed road and railway geometries.
data of the origin and destination position instead of the travel time because person-trip survey data in some metropolitan cities other than Tokyo lack travel times for subtrips.

After route estimation, lat/lon data are spatiotemporally interpolated along road and railway network geometries to allow interpolation at 1-minute intervals. Figure 3 illustrates each step of this process for the spatiotemporal interpolation of position from origin and destination data using one trip in central Tokyo. Figure 3a shows data for a trip consisting of three subtrips for an individual after the address-matching process. Figure 3b shows route selection from the origin and destination data of each subtrip using road and railway topologies (the latter linked specifically with the timetable). Figure 3c illustrates the spatiotemporal interpolation at 1-minute intervals along detailed road and railway geometries.

Improved Reconstruction Using Infrastructure Data
Using infrastructure data helps ensure we closely adhere to reality during the reconstruction process. We use railway timetables as the topology in route selection. Choosing railway routes is also necessary to reconstruct the total people flow, which isn’t restricted to vehicle flow. For railway timetables, we use the available time data between any two stations throughout Japan obtained from a relatively inexpensive API provided by Val Laboratory. The other use of infrastructure data is the interpolation along detailed network geometries. For road and railway networks, we use Sumitomo Electric System Solutions’ Digital Road Map (DRM) data, which include 4.67 million road network links throughout Japan. Although such advanced infrastructure data are limited to Japan, similar data are more widely available in other countries. For example, Open Street Map (OSM) can be used for road network data in Hanoi. Railway timetables could also be used, depending on the form they take in each country.
TABLE 2
Volume of infrastructure data.
Area | Road topology | Road geometry | Railway topology | Railway geometry
Tokyo metropolitan area | About 1,331,000 nodes and 1,913,000 links in Digital Road Map (DRM) | About 6,785,000 interpolation points in DRM links | 1,455 stations | 49,747 interpolation points in DRM geometry
Hanoi | About 5,600 nodes and 3,600 links in Open Street Map (OSM) | About 17,000 interpolation points in OSM | n/a | n/a
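
The geometry columns of Table 2 count the interpolation points that make up each link's shape. The sketch below shows one way a position could be sampled at 1-minute intervals along such a polyline between a departure and an arrival time. It is a hedged illustration only: it assumes constant speed along the route, measures length with plain planar distance in coordinate units rather than a geodesic, and uses a hypothetical function name and toy coordinates.

```python
import math


def interpolate_positions(polyline, depart_min, arrive_min, step_min=1):
    """Return [(minute, lat, lon)] at fixed steps along a polyline.

    Assumes constant speed and at least two polyline points; times are
    integer minutes since midnight.
    """
    # Cumulative planar length at each vertex of the polyline.
    cumulative = [0.0]
    for (lat1, lon1), (lat2, lon2) in zip(polyline, polyline[1:]):
        cumulative.append(cumulative[-1] + math.hypot(lat2 - lat1, lon2 - lon1))
    total = cumulative[-1]
    duration = arrive_min - depart_min

    samples = []
    segment = 0
    for minute in range(depart_min, arrive_min + 1, step_min):
        fraction = (minute - depart_min) / duration if duration > 0 else 0.0
        target = fraction * total
        # Advance to the segment that contains the target distance.
        while segment < len(polyline) - 2 and cumulative[segment + 1] < target:
            segment += 1
        span = cumulative[segment + 1] - cumulative[segment]
        local = (target - cumulative[segment]) / span if span > 0 else 0.0
        (lat1, lon1), (lat2, lon2) = polyline[segment], polyline[segment + 1]
        samples.append((minute,
                        lat1 + local * (lat2 - lat1),
                        lon1 + local * (lon2 - lon1)))
    return samples


# Toy L-shaped geometry (total length 7 units), 08:00 to 08:07.
ROUTE = [(0.0, 0.0), (0.0, 3.0), (4.0, 3.0)]
samples = interpolate_positions(ROUTE, depart_min=480, arrive_min=487)
```

With real lat/lon data, a geodesic or projected distance would be a better length measure than the planar one assumed here.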
Figure 5. People flow reconstructed in a 3D time-series representation with a 1-km² mesh. The height axis illustrates a drastic increase in people flow during peak hours.
compared our results with population data from the National Census. Specifically, we compared the data with aggregated mesh levels in the Tokyo area at the same time slice because National Census data include aggregate population data at the mesh level during the daytime and nighttime.

Figure 6 shows the correlation between the total number based on spatiotemporally interpolated person-trip data at 12:00 a.m. (multiplying the magnification factor we mentioned earlier) and the daytime population from the 2000 National Census. In particular, Figure 6a illustrates the correlation with a third-level mesh of about 1 km², and Figure 6b illustrates the correlation with a fourth-level mesh of about 500 m². Although both correlation coefficients are high (0.96 and 0.75), there is a clear difference resulting from the resolution of address matching or geocoding, because address matching is based on block zones and many more mismatches occur at mesh levels of higher resolution. Such a difference in the correlation coefficient (0.85 and 0.50) can also be seen for the nighttime comparison.

We also reconstructed the people flow in Hanoi from the JICA person-trip data obtained in 2004 (Table 1) using OSM for the infrastructure data (Table 2). Figure 7 illustrates the time-slice visualization of about 79,000 people, or 2.3 percent of the total, at three-hour intervals. The aqua dots representing two-wheeled motor vehicles are notable at commuting time because the sharing rate of a two-wheeled motor vehicle is about 70 percent in Hanoi.

Our proposed method, which reconstructs dense or continuous spatiotemporal positions from fragmentary data based on statistically unbiased samples, realizes an appropriate overview of people flow on an urban, macroscopic scale. In addition, our method can be applied to any existing
[Figure 6 plots: two panels with axes labeled "Number of people in each mesh"; panel annotations give the number of meshes as 1,222 and 3,422.]
Figure 6. Correlation between reconstructed people-flow data and National Census data. Comparison of the number of people in each mesh for mesh sizes of about (a) 1 km² (third-level mesh) and (b) 500 m² (fourth-level mesh).
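
The comparison in Figure 6 can be sketched in two steps: sum each sampled person's magnification factor within a mesh to estimate the mesh population, then compute a correlation coefficient against census counts for the same meshes. This is a minimal sketch; the mesh identifiers, factors, and census counts are invented, and the Pearson coefficient is an assumption, since the article doesn't state which correlation coefficient was used.

```python
import math
from collections import defaultdict


def expand_to_mesh_counts(samples):
    """Sum magnification factors per mesh; samples are (mesh_id, factor) pairs."""
    counts = defaultdict(float)
    for mesh_id, factor in samples:
        counts[mesh_id] += factor
    return dict(counts)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented sample: each tuple is one surveyed person located in a mesh at the
# chosen time slice, with that person's magnification factor.
SAMPLED_PEOPLE = [
    ("m1", 40.0), ("m1", 38.0),
    ("m2", 42.0),
    ("m3", 41.0), ("m3", 39.0), ("m3", 40.0),
]
# Invented census counts for the same meshes.
CENSUS = {"m1": 80.0, "m2": 45.0, "m3": 115.0}

estimated = expand_to_mesh_counts(SAMPLED_PEOPLE)
meshes = sorted(CENSUS)
r = pearson([estimated.get(m, 0.0) for m in meshes],
            [CENSUS[m] for m in meshes])
```

Running the same comparison at a finer mesh level only requires assigning finer mesh identifiers to the samples, which is where the address-matching resolution would start to limit the correlation.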
[Figure 7 maps: eight panels, each with a 2-km scale bar. Legend: Private vehicle, Pedestrian, Bicycle, Business vehicle, Two-wheeled motor vehicle.]
Figure 7. People flow in Hanoi reconstructed using JICA person trip data. The aqua dots representing two-wheeled motor
vehicles are notable at commuting time, because the sharing rate of a two-wheeled motor vehicle is about 70 percent in Hanoi.
The maps show the movement of people at three-hour intervals.
References
1. A. Nakamura et al., “Introduction of JICA Urban Transportation Development Survey Database,” Traffic Eng. (in Japanese), vol. 39, 2004, pp. 39–43.