
Physica A 391 (2012) 4031–4042

Contents lists available at SciVerse ScienceDirect

Physica A
journal homepage: www.elsevier.com/locate/physa

Building and analyzing the US airport network based on en-route location information

Tao Jia a,*, Bin Jiang b
a Division of Geoinformatics, Royal Institute of Technology (KTH), SE-100 44 Stockholm, Sweden
b Division of Geomatics, University of Gävle, SE-801 76 Gävle, Sweden

article info

Article history:
Received 11 December 2011
Received in revised form 20 February 2012
Available online 13 March 2012

Keywords:
En-route location information
US airport network (USAN)
Complex network
Traffic patterns

abstract

From a complex network perspective, this study sets out two aims around the US airport network (USAN) which is built from en-route location information of domestic flights in the US. First, we analyze the structural properties of the USAN with respect to its binary and weighted graphs, and second we explore the airport patterns, which have wide-ranging implications. Results from the two graphs indicate the following. (1) The USAN exhibits scale-free, small-world and disassortative mixing properties, which are consistent with the mainstream perspectives. Besides, we find (2) a remarkable power relationship between the structural measurements in the binary graph and the traffic measurements in the weighted counterpart, namely degree versus capacity and attraction versus volume. On the other hand, investigation of the airport patterns suggests (3) that all the airports can be classified into four categories based on multiple network metrics, which shows a complete typology of the airports. And it further indicates (4) that there is a subtle relationship between the airport traffic and the geographical constraints as well as the regional socioeconomic indicators.
© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Many kinds of transportation networks have been examined from the perspective of complex networks, e.g. the road network [1], maritime network [2,3], public transportation network [4], and airport network. As a crucial transportation infrastructure, an airport network plays an important role in local and global economic development, disease control, and human migration. It has therefore attracted attention from a range of studies [5–11]. These studies are based on datasets provided by air service departments; each dataset is composed of flight records with origin and destination airports. Generally, in their networks, a flight record corresponds to an edge, and an individual airport or the aggregated airports within a city are considered as a node. For the aggregated case, a network constructed in this way is problematic because nearer airports belonging to different cities are more connected with each other [12]. Hence, in this study, we build the US airport network (USAN) from the en-route location information of domestic flights in the US; a spatial clustering method is adopted to derive the airports.
The USAN can be seen from two aspects. On the one hand, it is considered as a binary graph with the node representing
the airport and the edge representing the route. This view is good at uncovering the topological structure of the network.
On the other hand, it is regarded as a weighted graph with the edge assigned a value representing the number of flights
during the observed period. Note that this is different from the previous studies [6,13] which took the number of passengers
or seats as the weighted value. To examine the two aspects quantitatively, we use several metrics to measure the structural properties of the USAN. For example, we adopt the metrics of degree, attraction, average clustering coefficient, and average path length to measure the topological structure, and we employ capacity as well as volume metrics to measure the traffic dynamics. Through the exploration of these metrics, some hidden properties of the USAN can be uncovered and compared with results of previous studies; this constitutes the initial purpose of this research.

* Corresponding author. Tel.: +46 26 64 8930; fax: +46 26 64 8828.
E-mail address: jiatao83@hotmail.com (T. Jia).

0378-4371/$ – see front matter © 2012 Elsevier B.V. All rights reserved.
doi:10.1016/j.physa.2012.03.006

Table 1
Terms of flight dataset for data processing.
Name                      Description
Flight time stamp (FTS)   As a valid record in the ASCII file, this is the basic unit of an en-route flight and contains several data elements, such as altitude, longitude, latitude, speed, time, etc.
FTS set                   This contains a set of FTSs of the same flight within the period of observation.
Flight                    Determined uniquely by the combination of airline code and flight number, this represents the entire route from the origin to the destination that is composed of a series of FTSs.
Route                     This is defined as the edge connecting two nodes (airports) in the binary graph; for example, one route may have one or more flights.

Bearing this purpose in mind, we explore the structural properties of the USAN with respect to the two graphs. Findings
from the analysis suggest that the USAN is a scale-free network in terms of both degree and betweenness distribution. We
also confirm the fact demonstrated in Ref. [6] that the most connected airport is not always the most centered one, taking
Anchorage for an example. The small-world property is subsequently examined through a comparison with the random
counterpart based on two metrics: the average clustering coefficient and the average shortest path. Moreover, the USAN is
found to exhibit a kind of disassortative mixing for both the binary graph and the weighted counterpart, and two implications
from this property are given subsequently. Importantly, to bridge the connection between the topological structure and the
traffic dynamics, we further examine the relationship between the structural measurements in the binary graph and the
traffic measurements in the weighted counterpart, namely degree versus capacity and attraction versus volume.
Apart from the exploration of structural properties of the entire USAN, little attention has thus far been paid to the
patterns exhibited by the airports [3]. The specific patterns of airports have wide-ranging implications in many issues, such
as air traffic planning, airport network modeling, global or local economic prediction, and even environmental capacity
[14–16]. Moreover, the heterogeneity of traffic patterns on the USAN implies a direct application of the head/tail division
rule [17] to rank the airports, which allows us to carry out a detailed analysis on the top ranked airports. Therefore,
considering the importance and the feasibility, we finally report our findings on the airport patterns from both overall and
individual perspectives.
In Section 2, we describe the dataset adopted in this study and the corresponding data processing procedure to extract the
individual flight details. In Section 3, we introduce the method to derive the airport and build the USAN; this is then followed
by some structural metrics. Structural analysis of the entire USAN is conducted in Section 4. In Section 5, we concentrate on
the exploration of the airport patterns. Finally, we draw a conclusion in Section 6.

2. Dataset and data processing

2.1. Dataset

The raw dataset adopted in this study, about 500 MB in size, contains a total of 7,685,948 records in ASCII files from the period 8–18 August 2010. Each record indicates the current status of a particular en-route flight and is updated once every five minutes. The updated information includes the location as well as other attributes, such as speed and time. The location information has three dimensions, namely longitude, latitude, and altitude, referenced to the World Geodetic System 1984 (WGS84). For simplicity of description, in Table 1, we define several terms from the perspective of data processing.

2.2. Flight extraction

To extract the flight information, the raw dataset has to be processed first to distinguish the valid records, known as
flight time stamps (FTSs; see Table 1), from the invalid (noisy) records. Invalid records are defined in the case of information
loss (which indicates one of the three-dimensional locations is lost) or information duplication (which indicates that two
consecutive records have the same time stamp). For the first case, we simply drop the records, and we merge the duplicate
records into one in the second case. The extracted FTSs are further filtered to obtain the domestic FTSs based on the
tag information (domestic or international). In other words, we exclude flights coming from foreign airports. Then, these
domestic FTSs are categorized into individual FTS sets, each of which is composed of a series of FTSs of the same flight within
the observation period.
Finally, each FTS set is labeled as regular or irregular based on the statistical analysis. The regular FTS set is chopped
sequentially into several parts, and the FTS in each part is connected sequentially to form a regular flight. Thus, a regular flight
is repeated several times within the observation period, which reflects the flight routine between the origin and destination
airports. On the other hand, an irregular FTS set does not need to be broken apart, and each FTS is connected consecutively to form one single irregular flight which usually reflects private or official usage. Below we illustrate the strategy adopted in this research.

Fig. 1. (Color online): Illustration of interval time ($\tau$) pattern for the regular FTS set and the irregular one.

Fig. 2. Demonstration of the flight extraction flow.

To differentiate a regular FTS set from an irregular FTS set (see Fig. 1), we employ the statistical information of the time interval of two consecutive FTSs within the FTS set: $\tau_i = t_i - t_{i-1}$. Theoretically, $\tau$ is equal to 5 min when the ground station retrieves the location of an en-route aircraft. Practically, $\tau$ fluctuates around 5 min because of noisy interruptions. Based on the fact that all the FTSs are captured when the aircraft is en route, a regular FTS set is one that has a majority of $\tau$ values fluctuating around 5 min and a minority of extremely large $\tau$ values (which represent a large time gap between two consecutive flights). On the other hand, the $\tau$ values of an irregular FTS set do not change much except for a small deviation from 5 min. Given the above difference, it is natural to build the rules (see A.1) to determine the regular FTS set from the irregular FTS set and to generate the respective flights.
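
The gap-based idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the rule set of A.1: the 60-minute threshold and the function names `split_fts_set` and `is_regular` are hypothetical choices made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical threshold (assumption): an interval far above the nominal
# 5 min is treated as the gap between two consecutive flights.
GAP_THRESHOLD = timedelta(minutes=60)


def split_fts_set(timestamps, gap=GAP_THRESHOLD):
    """Split the time stamps of one FTS set into flights.

    A new flight starts whenever the interval tau_i = t_i - t_(i-1)
    between consecutive time stamps exceeds `gap`.
    """
    flights = []
    current = []
    for t in sorted(timestamps):
        if current and t - current[-1] > gap:
            flights.append(current)
            current = []
        current.append(t)
    if current:
        flights.append(current)
    return flights


def is_regular(timestamps, gap=GAP_THRESHOLD):
    """Label an FTS set as regular if it contains at least one large gap."""
    return len(split_fts_set(timestamps, gap)) > 1
```

With this sketch, an FTS set whose intervals all stay near 5 min yields a single (irregular) flight, whereas a set containing large gaps is cut into several flights at those gaps.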
So far, the whole process to extract the flights has been illustrated; see [18]. Statistically, we have derived a total number
of 4,823,658 domestic FTSs from the original 7,685,948 records. These domestic FTSs are further assembled into 32,204 FTS
sets, of which 27,744 are regular FTS sets and 4460 are irregular FTS sets. As explained before, one irregular FTS set is
considered as one irregular flight, and hence we obtain 4460 irregular flights. On the other hand, given the fact that one
regular FTS set can be divided into several flights, we obtain 201,202 regular flights from the regular FTS sets. Therefore, a
total number of 205,662 flights including both regular and irregular ones are obtained in this study (see Fig. 2).

3. Methodology

3.1. USAN construction

In the USAN, the edges represent the routes and the nodes denote the airports. However, before setting up the network, we need to derive the airports. Every flight has an origin (O) and a destination (D), so in total we obtain 410,536 origin and destination locations, which is less than the theoretical number 205,662 × 2 = 411,324 because of the shared locations. Intuitively, the O/D locations should form several clusters across the entire US because the majority of them are located within a reachable distance of airports except for trivial anomalies. Therefore, the triangular irregular network (TIN) is adopted to model these locations.
A previous study found a bipartite power law distribution of triangle sizes of the O/D TIN [18]. This indicates (1) the heterogeneity of the distribution of the O/D locations, which suggests the use of a triangle clustering algorithm (TCA), similar to the city clustering algorithm (CCA) [19], to agglomerate the triangles into airports, and (2) the application of the head/tail division rule [17], which provides a useful solution to specify the threshold value used in the TCA process while avoiding the arbitrary radius selection in the traditional CCA process. Moreover, a spatial autocorrelation factor is considered to remove the noisy disruption in forming the airports. This spatial autocorrelation factor specifies that a triangle belongs to an airport if and only if its size as well as the sizes of all its neighbors satisfy the rule. To better understand this process, we describe it in A.2.
Finally, through the intersection operation between the airports and flights, we build the USAN. This network contains 732 nodes (airports) and 6086 edges (routes). Two kinds of network, namely binary and weighted, can be examined. Each edge in the weighted network carries a weight equal to the number of flights between the two airports, whereas an edge in the binary network only indicates that there is a connection between two airports; that is, any two airports are connected as long as there is at least one flight between them.
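
As a minimal sketch of this construction, assume each flight has already been reduced to a pair of airport identifiers (origin, destination). The function name `build_networks` and the decision to skip flights that start and end at the same airport are assumptions of the example, not details given in the text.

```python
from collections import Counter


def build_networks(flights):
    """Build the undirected binary and weighted networks from flights.

    `flights` is an iterable of (origin_airport, destination_airport)
    pairs, one per flight. Returns (edges, weights): `edges` is the set
    of routes (binary network) and `weights` maps each route to its
    number of flights (weighted network).
    """
    weights = Counter()
    for origin, dest in flights:
        if origin == dest:
            continue  # skip self-loops (assumption of this sketch)
        route = tuple(sorted((origin, dest)))  # undirected: A-B equals B-A
        weights[route] += 1
    edges = set(weights)
    return edges, dict(weights)
```

A route appears in the binary network as soon as one flight uses it, while the weighted network additionally records how many flights were observed on it.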

3.2. Metrics for the network

Several structural metrics are introduced in this section to characterize the network. For each metric, the fundamental concept in graph theory is given first, followed by its physical meaning in this context. Note that we take the airport network as an undirected one in this study. Thus, the binary network can be represented by its adjacency matrix as $\mathrm{Adj}(G) = \{a_{ij}, i \neq j\}$, where $a_{ij}$ is 1 when there exists a link between node $i$ and node $j$ and 0 when there is no link. The weighted network is simply generated by appending a weight to the corresponding edge, and its form can be described as $\mathrm{WAdj}(G) = \{a_{ij} w_{ij}, i \neq j\}$.

3.2.1. Degree
Degree ($D$) is defined as the number of nodes connected with the current node from the aspect of graph theory. As one of the fundamental metrics used to represent the structural property of a topological network, it reflects the importance of the current node [20]. Here, it refers to the number of airports with direct flights to the current airport; the degree of an airport $i$ can be calculated using the formula $D_i = \sum_j a_{ij}$.

3.2.2. Betweenness
To identify the most central node in a network, betweenness ($B$) is defined as the number of shortest paths between any two nodes that pass through the current node. As another fundamental metric to represent the structural property of the topological network, it reflects the extent to which the current node lies on the shortest paths between other nodes in the network [20]. Typically, a node acting as a bridge tends to have a high value of betweenness. Therefore, in this context, it is used to estimate the influence of the current airport on the transit of air traffic between other airports; the betweenness of airport $i$ can be calculated using the formula $B_i = \sum_{m \neq n \neq i} \mathrm{Path}(m, i, n)/\mathrm{Path}(m, n)$, where $\mathrm{Path}(m, i, n)$ is the number of shortest paths passing through airport $i$ from airport $m$ to $n$ and $\mathrm{Path}(m, n)$ is the number of shortest paths from airport $m$ to $n$.
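
The betweenness formula can be evaluated directly on a small graph. The sketch below is a brute-force illustration, assuming an adjacency dictionary `adj` (node to set of neighbors) and counting each unordered pair (m, n) once; whether the original sum runs over ordered or unordered pairs is not stated, so that choice is an assumption. It is meant for clarity rather than speed.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths
    from `source` to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Unnormalized betweenness: for each node i, the sum over unordered
    pairs (m, n), with m, n and i distinct, of Path(m, i, n) / Path(m, n)."""
    nodes = sorted(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    result = {i: 0.0 for i in nodes}
    for a, m in enumerate(nodes):
        dist_m, sigma_m = info[m]
        for n in nodes[a + 1:]:
            if n not in dist_m:
                continue  # m and n are not connected
            for i in nodes:
                if i == m or i == n or i not in dist_m:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a shortest m-n path iff the distances add up
                if n in dist_i and dist_m[i] + dist_i[n] == dist_m[n]:
                    result[i] += sigma_m[i] * sigma_i[n] / sigma_m[n]
    return result
```

In a star-shaped graph, for example, every shortest path between two leaves passes through the hub, so the hub collects all of the betweenness and the leaves none.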

3.2.3. Attraction
Different from the above two metrics, attraction ($A$) is proposed to measure the importance of an edge in the topological network. Inspired by the gravitation model, we define the attractive force for the edge connecting nodes $i$ and $j$ as $A_{ij} = (D_i \cdot D_j)/\mathrm{dist}(i, j)$, where $\mathrm{dist}(i, j)$ is the geometric distance between the two nodes. As shown in this formula, attraction is proportional to the product of the degrees of the connecting nodes, and is inversely proportional to the distance between the connecting nodes. In this context, it reflects the extent to which the route between two airports attracts air traffic. In other words, a busier route should have a higher value of attraction: $V_{ij} \sim A_{ij}$, where $V_{ij}$ is called the volume of the route from airport $i$ to airport $j$, and it is the number of flights observed within the period.
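
A small sketch of the attraction formula follows. The text only speaks of the geometric distance between two airports; using the great-circle (haversine) distance on a sphere of radius 6371 km is an assumption of this example, as are the function names.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def attraction(degree_i, degree_j, distance):
    """A_ij = (D_i * D_j) / dist(i, j) for one route."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return degree_i * degree_j / distance
```

For two airports with degrees 10 and 20 that are 100 distance units apart, the formula gives an attraction of 2; doubling the distance halves it.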

3.2.4. Capacity
Capacity ($C$) is suggested to measure the importance of the current node in the weighted network. It is defined as the total weight of its incident links. It is also called the node strength [21], and it is a generalized form of the degree measurement in the topological network. Specifically, it reflects the importance of the airport in general and accounts for the total number of flights handled by the current airport within the observed period in particular: $C_i = \sum_j a_{ij} w_{ij}$, where $w_{ij}$ is the volume ($V_{ij}$) between airports $i$ and $j$.
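
Degree and capacity can be computed side by side from the weighted edge list. The sketch assumes the weighted network is stored as a dictionary mapping each undirected route (i, j) to its weight; this storage layout and the function name are illustrative assumptions.

```python
def degree_and_capacity(weights):
    """Compute D_i and C_i from a dict {(i, j): w_ij} of undirected routes.

    D_i counts the routes incident to airport i (the sum of a_ij), and
    C_i sums the weights of those routes (the sum of a_ij * w_ij).
    """
    degree = {}
    capacity = {}
    for (i, j), w in weights.items():
        for node in (i, j):
            degree[node] = degree.get(node, 0) + 1
            capacity[node] = capacity.get(node, 0) + w
    return degree, capacity
```

When every weight equals 1, capacity reduces to degree, which is the sense in which capacity generalizes the degree measurement.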

3.2.5. Clustering coefficient


The clustering coefficient (CC) is used to characterize the local cohesiveness of the current node or the extent to which the nodes in the network are clustered together, which corresponds to two versions in graph theory, namely the local and the global, respectively. The global one is based on the number of closed triplets (three nodes connected by three edges) over the total number of triplets (three nodes connected by at least two edges) [22], whereas the local one refers to the fraction of the actual number of pairs of connected neighbors over the maximum number of possible pairs [23]. In the USAN, the clustering coefficient reflects the ease of air traffic among the neighboring airports of the current one. The CC of airport $i$ can be calculated with the formula $CC_i = R_i/(D_i (D_i - 1)/2)$, where $R_i$ is the actual number of routes among the neighboring airports of airport $i$ and $D_i$ is the degree value of airport $i$.
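
The local clustering coefficient can be sketched as follows, assuming an adjacency dictionary `adj` that maps each node to the set of its neighbors. Assigning 0.0 to nodes with fewer than two neighbors is a convention chosen for this example, since the formula is undefined there.

```python
from itertools import combinations


def local_clustering(adj, node):
    """CC_i = R_i / (D_i (D_i - 1) / 2) for one node of an undirected graph."""
    neighbors = adj[node]
    d = len(neighbors)
    if d < 2:
        return 0.0  # convention of this sketch: undefined cases count as 0
    # R_i: number of links that actually exist among the neighbors of i
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (d * (d - 1) / 2)


def average_clustering(adj):
    """Average of the local clustering coefficients over all nodes (ACC)."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)
```

For a triangle with one extra pendant node attached to a corner, that corner has three neighbors of which only one pair is linked, so its coefficient is 1/3, while the other two triangle nodes score 1.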

Fig. 3. (Color online): Plot of the scale-free property of the USAN and the anomalous airport. (Note: in (a), both the topological metrics, degree and
betweenness, obey a power law distribution with exponential cutoff, and (b) illustrates that the two metrics display a smooth bounded quadratic
relationship with r squared equal to 0.86.)

3.2.6. Average shortest path


The average shortest path (ASP) is another metric to characterize the efficiency of information circulating on the network. In graph theory, it is defined as the average shortest number of steps among all pairs of nodes [23]. In this context, it represents the average minimum number of flights that a passenger needs to take between any two airports, and it reveals the ease of air travel among cities. It can be formulated as $\mathrm{ASP} = 2 \sum_{i>j} d_{ij}/(N(N-1))$, where $d_{ij}$ is the shortest distance (steps) between airports $i$ and $j$, and $N$ is the total number of airports. Note that the maximum $d_{ij}$ is called the diameter of the network.
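
Both the ASP and the diameter follow from one breadth-first search per node. The sketch below assumes a connected, unweighted graph stored as an adjacency dictionary; the function names are illustrative.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: number of steps from `source` to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def asp_and_diameter(adj):
    """Return (ASP, diameter) for a connected undirected graph."""
    n = len(adj)
    total = 0
    diameter = 0
    for source in adj:
        dist = hop_distances(adj, source)
        if len(dist) != n:
            raise ValueError("graph must be connected")
        total += sum(dist.values())  # every unordered pair is counted twice
        diameter = max(diameter, max(dist.values()))
    # total equals 2 * sum_{i>j} d_ij, so this matches the ASP formula
    return total / (n * (n - 1)), diameter
```

On a three-node chain a-b-c, the pairwise distances are 1, 1 and 2, giving an ASP of 4/3 and a diameter of 2.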

4. Structural analysis of the USAN

Based on the metrics elaborated above, we investigate the structural properties of the USAN in this section. Findings
from the analysis are mainly focused on three aspects. First, the USAN exhibits a scale-free property, and a highly connected
airport is not always highly centered. Second, the properties of small world and disassortative mixing are observed for the
USAN. The last and most important aspect is that the air traffic is highly correlated with topological structure.

4.1. Scale-free property and anomalous airport

A scale-free network is a network whose degree follows a power law distribution [24]. To determine whether the USAN exhibits this property, we examine the distribution of degree and betweenness metrics and find that both metrics follow a power law distribution with an exponential cutoff (see Fig. 3(a)): $\Pr(D) = 0.18\, D^{-0.45} e^{-0.02 D}$ and $\Pr(B) = 0.07\, B^{-1.14} e^{-14.18 B}$ for degree and betweenness, respectively. This finding indicates the scale-free and finite-size growth of the USAN [25].
Consequently, it is reasonable to assume that there is a distinct relationship between the two metrics. Indeed, it is
observed that the larger the value of degree the greater the value of betweenness (see Fig. 3(b)). However, an anomalous
airport in Anchorage is identified since it has a low degree but a high betweenness, which is in agreement with the findings
in the literature [6]. This abnormal case could be explained from two perspectives: one is that Anchorage acts as a hub to
connect the population in Alaska to the continental US, and the other is that it lacks natural connections with its geographical
neighbor Canada because of political constraints [6]. Besides, both C and V have been found to obey a power law distribution
with exponent equal to 1.67 and 2.56, respectively, which suggests a small number of airports handle a huge amount of air
traffic and characterizes the heterogeneity of air traffic among the airports.

4.2. Small-world and disassortative mixing properties

In this section, we start by verifying whether the USAN exhibits a small-world property. In particular, we examine two structural metrics, namely the clustering coefficient (CC) and the shortest path (SP). The cumulative distribution of the CC can be approximated by a linear function (see Fig. 4(a)), which indicates that it follows a uniform distribution with average CC (ACC) equal to 0.58. On the other hand, a normal distribution is observed for the SP among any pair of airports (see Fig. 4(b)) with average SP (ASP) being 2.61. About 97% of SPs have a length less than or equal to 3, and the maximum value of the SP (diameter) is 5, which is in line with the six degrees of separation theory [26]. For comparison purposes, we further generate 100 Erdős–Rényi random networks with the same number of nodes and edges as the USAN. It is found that the ACC for the random networks is 0.02, which is much smaller than that of the USAN, and the ASP is about 2.65, which is almost the same as that of the USAN.
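
The random baseline can be sketched with the standard library alone. The example assumes the G(n, m) variant of the Erdős–Rényi model (exactly m edges drawn uniformly at random), which matches the description of keeping the number of nodes and edges fixed; the function names and the seeding scheme are illustrative.

```python
import random
from itertools import combinations


def random_graph(n, m, seed=None):
    """Erdős–Rényi G(n, m): n nodes and m distinct edges chosen uniformly."""
    rng = random.Random(seed)
    edges = rng.sample(list(combinations(range(n), 2)), m)
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d >= 2:
            links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
            total += links / (d * (d - 1) / 2)
    return total / len(adj)


def mean_random_acc(n, m, runs, seed=0):
    """Average ACC over `runs` independent random graphs."""
    return sum(average_clustering(random_graph(n, m, seed + r))
               for r in range(runs)) / runs
```

For such random graphs the expected clustering is approximately the edge density, here 2 × 6086/(732 × 731) ≈ 0.023, which is consistent with the reported ACC of 0.02 for the random counterpart.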

Fig. 4. (Color online): Demonstration of the small-world and disassortative mixing properties of USAN. Panels: (a) clustering coefficient; (b) shortest path distribution; (c) topological assortativity; (d) weighted assortativity. (Note: the small-world property is illustrated in (a) and (b), where the USAN (blue star) shows a high value of ACC and low value of ASP compared with the random counterpart (green circle); on the other hand, the disassortative mixing property is displayed in (c) and (d), where the topological USAN seems more disassortative than the weighted one.)

Next, we examine the correlation of adjacent airports, namely the average nearest neighbor degree ($\bar{D}_{nn}$, where $\bar{D}_{nn} = \sum_{D_{nn}} D_{nn} P(D_{nn}|D)$) and the degree, the average nearest neighbor capacity ($\bar{C}_{nn}$, where $\bar{C}_{nn} = \sum_{C_{nn}} C_{nn} P(C_{nn}|C)$) and the capacity, respectively (see Fig. 4(c) and (d)). It is found that the USAN reflects a pattern of disassortative mixing [27] with values of −0.3 and −0.2 in terms of degree and capacity, respectively. This is consistent with the findings in the area of technology and transportation [5,8,10], but differs from findings in sociology, where individuals with the same hobbies are more likely to be connected [28]. Two implications follow from this finding. First, airports with few connections or small capacity tend to be linked with airports with many connections or large capacity. This can be mainly attributed to the small airports extracted from our dataset, which are in fact connected to the nearest large airports for ease of business or life. Second, the weighted network has a weaker effect of disassortative mixing than the binary one. This can be explained by the high volume among the most connected airports, which indicates a degree of rich club effect [29].
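
The degree-based correlation can be illustrated by computing the average nearest neighbor degree for each degree class. The sketch assumes an adjacency dictionary and an illustrative function name; the capacity-based variant would replace degrees by capacities in the same way.

```python
from collections import defaultdict


def knn_by_degree(adj):
    """Average nearest neighbor degree as a function of degree D.

    For each node, the degrees of its neighbors are averaged; these
    values are then averaged over all nodes that share the same degree.
    """
    per_degree = defaultdict(list)
    for nbrs in adj.values():
        if not nbrs:
            continue  # isolated nodes have no neighbors to average over
        mean_nbr_degree = sum(len(adj[v]) for v in nbrs) / len(nbrs)
        per_degree[len(nbrs)].append(mean_nbr_degree)
    return {d: sum(vals) / len(vals) for d, vals in sorted(per_degree.items())}
```

A curve that decreases with D, as in a star where low-degree leaves attach to a high-degree hub, is the signature of disassortative mixing.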

4.3. Relationship between topological structure and air traffic

So far, we have found the heterogeneous characteristics for both the topological USAN and the weighted counterpart.
However, we have not gone into their relationship, which is important for understanding the evolution of air transport
system because there is a mutual interaction between the topological infrastructure and the air traffic. Therefore, to obtain
their relationship quantitatively, we examine two pairs of structural metrics from each side, namely degree (D) versus
capacity (C ) and attraction (A) versus volume (V ).
For the first pair of metrics, it is found that the average capacity of an airport ($\bar{C}$, where $\bar{C} = \sum_{C} C\, P(C|D)$) is strongly dependent on its degree, and the relationship can be expressed as $\bar{C} = 2.14\, D^{1.55}$, with r squared as high as 0.92 (see Fig. 5(a)). This relationship, consistent with the literature [10,21,30], indicates that capacity grows as a power of degree, faster than linearly. Note that the exponent value 1.55 is almost the same as the value (1.5) found by Barrat et al. [21]. On the other hand, for the second pair of metrics, a clear correlation is also obtained between the average traffic volume ($\bar{V}$, where $\bar{V} = \sum_{V} V\, P(V|A)$) and the attraction of each route. Interestingly, the correlation can again be approximated by a power relationship, $\bar{V} = 87.96\, A^{2.91}$, with r squared equal to 0.70 (see Fig. 5(b)). The large exponent value indicates a strong positive allometric growth [31] of the traffic volume with respect to the attraction.
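
A power relationship of the form y = a·x^b can be estimated by ordinary least squares on log-log axes. The text does not state which fitting procedure was used, so the method below is an assumption chosen for illustration, and the r squared it returns is computed in log space.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b on log-log axes.

    Returns (a, b, r_squared). All xs and ys must be positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    syy = sum((y - my) ** 2 for y in ly)
    b = sxy / sxx                  # slope in log space = exponent
    a = math.exp(my - b * mx)      # intercept in log space = log(a)
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return a, b, r_squared
```

Applied to synthetic points generated from 2.14·D^1.55, the fit recovers the prefactor and the exponent; on real data the points scatter around the line and r squared drops below 1.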

Fig. 5. (Color online): Plot for the relationship between two pairs of structural metrics from the topological USAN and the weighted counterpart. (Note:
(a) demonstrates the power relationship between the degree and the average capacity; (b) shows the power relationship between the attraction and the
average volume.)

Table 2
Network structural metrics table for the airport typologies (Note: Avg. stands for average).
Cluster  Number  Avg. D  Avg. B   Avg. CC  Avg. Z  Avg. P  Typology
1        25      148     0.05     0.26     4.14    0.73    Large
2        116     51      0.003    0.54     0.91    0.69    Medium
3        421     5       0.00007  0.83     0.32    0.35    Small
4        170     1       0.00002  0.03     0.44    0.03    Others

To fully understand the mechanism of the evolution of the USAN is not an easy job, but here we at least empirically explore it from two aspects, namely airport-based and route-based measurements. Indeed, the results obtained from both sides, namely $W_{ij} \sim A_{ij}$ and $C_i \sim D_i$, set up the relationship between the topological structure of the USAN and its observed traffic flow, and will benefit further research.

5. Airport patterns in the USAN

In this section, we explore the patterns of airports, which together with the above network structural analysis will benefit
a better understanding of the USAN. We start by examining the airport typologies based on multiple network structural
metrics. Subsequently, we analyze the traffic patterns of airports and their flight distance distributions.

5.1. Structural typology of airports

Different metrics describe the role that an airport plays in the USAN from different aspects, and we can obtain sufficient knowledge of an airport as long as we explore a sufficient number of metrics. To derive this knowledge, we explore five network structural metrics, namely degree, betweenness, clustering coefficient, within-module degree ($Z$), and participation coefficient ($P$) [32]. Note that the last two metrics are proposed in Ref. [32] and are used here to measure the extent to which an airport is connected with other airports in the same module and with airports in other modules [33]. However, it is hard to extract information directly from the five-dimensional metric space. Hence, we employ principal component analysis (PCA) to reduce the dimensionality and adopt the scores with respect to the first two components as the new measurements. Specifically, the first two components explain 84% of the total variance, which retains most of the information. Moreover, inspired by the idea that nodes with similar scores should play a similar role [34], we apply the k-means algorithm to classify the 732 airports into four categories according to their scores.
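
The two module-based metrics can be sketched from their commonly used definitions: $Z$ is the z-score of a node's within-module degree relative to the other nodes of its module, and $P$ is one minus the sum of the squared fractions of a node's links that go to each module. How the modules were detected is not described here, so the example assumes a given assignment `module_of`; the function name is hypothetical.

```python
import math


def module_roles(adj, module_of):
    """Within-module degree z-score (Z) and participation coefficient (P).

    `adj` maps node -> set of neighbors; `module_of` maps node -> module id
    (assumed to come from some community detection step).
    """
    # k_in[i]: number of links from i to nodes of its own module
    k_in = {i: sum(1 for v in adj[i] if module_of[v] == module_of[i])
            for i in adj}
    members = {}
    for i in adj:
        members.setdefault(module_of[i], []).append(i)

    z = {}
    for nodes in members.values():
        mean = sum(k_in[i] for i in nodes) / len(nodes)
        var = sum((k_in[i] - mean) ** 2 for i in nodes) / len(nodes)
        std = math.sqrt(var)
        for i in nodes:
            z[i] = (k_in[i] - mean) / std if std > 0 else 0.0

    p = {}
    for i in adj:
        k = len(adj[i])
        if k == 0:
            p[i] = 0.0
            continue
        counts = {}
        for v in adj[i]:
            counts[module_of[v]] = counts.get(module_of[v], 0) + 1
        p[i] = 1.0 - sum((c / k) ** 2 for c in counts.values())
    return z, p
```

A node whose links all stay inside its own module has $P = 0$, whereas a node that spreads its links evenly over many modules approaches $P = 1$; a large $Z$ marks a hub within its module.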
We find that the average value of each metric for the first three airport clusters is in decreasing order, except for that of the CC, which is in increasing order (see Table 2). This finding suggests a natural typology for the first three airport clusters: large, medium, and small. Note that the 25 large airports are roughly the major US airports reported by the Federal Aviation Administration (FAA) [35]. On the other hand, airports in the fourth cluster have an extremely small CC value, which suggests that they are different from the ones above, and we name them others. In fact, we can assume that the large airports act as the global hubs in the USAN in terms of their high capability of connecting with other airports as well as transferring the traffic flow, that the medium and small airports probably provide regional and local services, respectively, and that the others are more likely to be airports with irregular flights, such as those for private usage. Finally, we map all the airports with their categories in Fig. 6.

Fig. 6. (Color online): Map of the airport typologies.

Fig. 7. (Color online): Plot of airport traffic capacity with (a) population and (b) GDP.

5.2. Traffic patterns of airports

Apart from the structural typological analysis of airports, we explore the traffic patterns of airports in this section. An
important measurement of airport traffic is capacity (C ), which suggests the intensity of airport usage during the observed
period. We have reported its power law distribution, which indicates the heterogeneity of air traffic in the USAN, namely a
small number of airports have a huge number of flights. In this respect, there are two questions connected with the airport
traffic. (1) Is it associated with socioeconomic factors? (2) Can it support a robust USAN? Our answer to the first question is yes, because we find a remarkable linear relationship between the airport capacity and two socioeconomic indicators of the underlying metropolitan statistical area (MSA) [36], namely population and gross domestic product (GDP) (see Fig. 7).
To answer the second question, we first obtain the nine most important airports by applying the head/tail division
rule [17] recursively to the airport capacity. The nine top ranked airports are roughly in agreement with the official
report [37], and they belong to the large airport category of airport typologies. Then, we explore the local individual networks
of the nine top ones (here, a local individual network means one with nodes composed of a top ranked airport and its
immediate neighbors and the edges among these airports). From Table 3, we can conclude that the USAN is relatively robust
to attacks because each individual local network shares at least 80% of the total air traffic.
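
The recursive use of the head/tail division rule can be sketched as repeatedly keeping the values above the mean. The number of recursion rounds and the stopping rule behind the nine top ranked airports are not stated in the text, so the `rounds` parameter and the function name are assumptions of this example.

```python
def head_tail_top(values, rounds):
    """Apply the head/tail division `rounds` times and return the final head.

    Each round splits the current values at their mean and keeps only
    the head, i.e. the values strictly above the mean.
    """
    head = list(values)
    for _ in range(rounds):
        if len(head) < 2:
            break
        mean = sum(head) / len(head)
        new_head = [v for v in head if v > mean]
        if not new_head:
            break  # all values are equal: no further division is possible
        head = new_head
    return sorted(head, reverse=True)
```

Because capacity is heavy-tailed, each round discards the large majority of airports, so a handful of rounds is enough to isolate a small group of top ranked ones.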

5.3. Flight distance distribution of airports

Another important aspect to explore the traffic patterns of airport is to study the flight distance distribution, which
not only improves our understanding on how far each airport communicates with others, but also helps in its growth
modeling [16]. Practically, we extract all the flights departing from or arriving at an airport within the observed period,
T. Jia, B. Jiang / Physica A 391 (2012) 40314042 4039

Table 3
Top ranked airports in the USAN (Note: N.Y., L.A., and S.F. are the abbreviations for New York, Los Angeles, and San Francisco, respectively; the traffic
percentage is calculated as the proportion of traffic volume on the individual network to that on the entire network).
Rank   Name         C        Traffic (%)
1      N.Y.         23 042   87.7
2      Chicago      19 361   88.9
3      Atlanta      14 965   85.4
4      Washington   13 448   84.9
5      Dallas       12 673   87.8
6      Houston      12 017   83.6
7      L.A.         11 057   80.5
8      S.F.         10 477   81.0
9      Denver        9 514   82.8

Table 4
KS statistic table for the nine top ranked airports (Note (Color online): three categories, based on the similarity of members of each category, are colored
by red, green, and blue respectively).

Table 5
Test of the best model for the nine top ranked airports (Note: VTS stands for Vuong's test statistic, which is the normalized log likelihood ratio, and P is the
significance level; for more details refer to [40]).
Airport      Power law/Exp.    Power law/Lognormal    Exp./Lognormal    Model
             VTS      P        VTS      P             VTS     P
Atlanta      208.8    0.0      125.6    0.0           36.3    0.0       Lognormal
Chicago      160.4    0.0      141.1    0.0           25.1    0.0       Lognormal
Washington   122.8    0.0      117.9    0.0           16.2    0.0       Lognormal
Dallas       161.3    0.0      112.4    0.0           15.3    0.0       Lognormal
Denver       191.1    0.0       88.9    0.0           11.4    0.0       Lognormal
Houston      166.8    0.0      111.9    0.0           11.2    0.0       Lognormal
N.Y.         220.2    0.0      184.1    0.0           19.5    0.0       Lognormal
S.F.         178.6    0.0      125.6    0.0            0.7    0.4       None
L.A.         133.8    0.0      100.4    0.0            7.6    0.9       None

and here we obtain nine flight datasets with respect to the nine top ranked airports. First, we investigate how the number
of flights varies with increasing distance from the airport (see Fig. 8). In other words, we want to verify whether geographic
distance decay plays an important role [12]. This effect would imply a power law or exponential decay of the number of flights with
increasing distance. As expected, we find that the geographic constraint plays only a weak role for most airports, which suggests that
human activities are becoming more loosely connected with geographic distance [38].
Second, we examine whether two airports with more similar air traffic are geographically nearer to
each other, which is inspired by the conventional thinking that near things are more related to each other than
distant things [12]. We start by grouping the nine top ranked airports into different categories based on the flight distance
distribution, and then analyze whether the airports within the same group are closer to each other in space. The grouping process
is facilitated by the Kolmogorov-Smirnov (KS) statistic, which is regarded as a distribution-free and
nonparametric test [39]. From Table 4, it is observed that the top ranked airports are demarcated into three groups, namely
the Atlanta-Chicago-Washington group, the N.Y.-Dallas-Houston-Denver group, and the S.F.-L.A. group.
Furthermore, a similar classification can be derived by aggregating the airports that lie no more than 1050 km apart,
although the N.Y. airport is an outlier. Indeed, the geographical location of an individual airport does matter in shaping its traffic
patterns.
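
As a minimal sketch of the grouping step, the two-sample KS statistic between the flight distance samples of two airports is the largest absolute gap between their empirical cumulative distribution functions. The function names and the toy distances below are assumptions made for illustration, and the rule used to cut the pairwise statistics into three groups is not given in the text, so it is left out.

```python
from bisect import bisect_right
from itertools import combinations


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The supremum is attained at one of the observed data points.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def pairwise_ks(samples):
    """KS statistic for every pair of named samples (as in a KS table)."""
    return {(p, q): ks_statistic(samples[p], samples[q])
            for p, q in combinations(sorted(samples), 2)}


# Toy flight distances in km (hypothetical, not the observed data).
distances = {
    "X": [300, 500, 800, 1200],
    "Y": [350, 550, 900, 1300],
    "Z": [2000, 2500, 3000, 4000],
}
table = pairwise_ks(distances)
```

Smaller values indicate more similar flight distance distributions, so airports in the same group are expected to have small pairwise statistics.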
Finally, we establish the statistical model for each flight distance distribution. The best model is simply selected from
three alternatives, power law, exponential, and lognormal, based on Vuong's test statistic [40]. We find that the flight
distance of all airports except for S.F. and L.A. fits a lognormal distribution better than the other two alternatives (see Table 5).
However, it is difficult to select an optimal model for the flight distance of S.F. and L.A. Further inspection of their histograms
indicates that they can be better modeled as a Gaussian mixture distribution. Importantly, both the lognormal and the

Fig. 8. (Color online): Map for the flight distance distribution of the nine top ranked airports. (Note: based on the natural break algorithm, this figure
visualizes the flight distance pattern of each airport using five categories.)

Fig. 9. (Color online): Plot of flight distance distribution for the nine top ranked airports. (Note: as for the flight distance, the airports within the first and
second groups obey a lognormal distribution as shown in (a) and (b), whereas those within the third group follow a Gaussian mixture distribution with
four and three modes, as depicted in (c)).

Gaussian mixture distribution (see Fig. 9) show one or more peaks, which will benefit future studies on airport traffic
planning.
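
The model comparison can be sketched as follows, under stated assumptions. Vuong's statistic is the sum of pointwise log likelihood differences between two fitted models, divided by the square root of the sample size times the standard deviation of those differences [40]. The sketch fits continuous lognormal and exponential models by maximum likelihood on the full sample; the function names, the two-sided normal P value, and the synthetic data are illustrative choices, and the power law alternative is omitted for brevity.

```python
import math
import random


def lognormal_loglik(xs):
    """Pointwise log likelihoods under a lognormal fitted by MLE."""
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / len(logs))
    const = -math.log(sigma) - 0.5 * math.log(2 * math.pi)
    return [const - l - (l - mu) ** 2 / (2 * sigma ** 2) for l in logs]


def exponential_loglik(xs):
    """Pointwise log likelihoods under an exponential fitted by MLE."""
    lam = len(xs) / sum(xs)
    return [math.log(lam) - lam * x for x in xs]


def vuong(loglik_1, loglik_2):
    """Return (statistic, two-sided P); a positive statistic favors model 1."""
    n = len(loglik_1)
    diffs = [a - b for a, b in zip(loglik_1, loglik_2)]
    mean_d = sum(diffs) / n
    omega = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / n)
    stat = sum(diffs) / (math.sqrt(n) * omega)
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value


# Synthetic distances drawn from a lognormal (not the observed flights).
random.seed(0)
sample = [random.lognormvariate(6.0, 0.5) for _ in range(2000)]
stat, p_value = vuong(lognormal_loglik(sample), exponential_loglik(sample))
```

A statistic far from zero with a small P value indicates that one model is clearly preferred, whereas a P value that is not small means that neither model can be selected, as for S.F. and L.A. in the Exp./Lognormal comparison of Table 5.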

6. Conclusion

Thanks to the availability of massive en-route tracking location information, we show the procedure to extract individual
domestic flights and further build the US airport network (USAN) in this study. Analysis of the USAN is presented from
the perspectives of both binary and weighted graphs. Our results indicate that it exhibits network structural
properties similar to those examined in previous studies, although it is constructed in a totally different way. For
example, it is a scale-free network with a heavy-tailed distribution of degree or betweenness, and it is a small-world network
with large local clustering and small global separation. Besides, it exhibits disassortative mixing in both the binary
and the weighted graphs, and the effect is stronger in the binary graph than in its weighted counterpart. Importantly, there is a
power relationship between the metrics of the binary graph and those of the weighted counterpart. This finding hints at a mutual
relationship between the topological structure and the air traffic, which will help in the understanding of air transport
network evolution.
In addition, we explore the airport patterns from two perspectives: overall and individual. From an overall perspective,
we present a typological map of all airports based on multiple network metrics, which conforms well to conventional
thinking. We further report that the airport traffic is highly associated with regional socioeconomic indicators, and that it
supports a robust USAN. On the other hand, from an individual perspective, we investigate the traffic patterns of the nine top
ranked airports in terms of their flight distance distributions, which reveal the
effect of geographical constraints on the traffic patterns of individual airports. Specifically, we demarcate the nine top ranked
airports into three groups according to the similarity of traffic patterns and further establish the corresponding statistical
models, which will help in future studies of airport traffic planning.

Acknowledgments

We would like to thank FlyteComm Company for providing the en-route flight tracking data. Special thanks also go to
the anonymous referees for their constructive comments.

Appendix

In this appendix, two pieces of pseudo-code are given.

A.1. Pseudo-code for determining the regular FTS set from the irregular FTS set

If Mean ( ) >= Stdev ( ) Then
    Irregular FTS set case: Generate an irregular flight from the FTS set;
Else
    Regular FTS set case: Split the FTS set into individual flights with the threshold value defined as the sum of mean and standard deviation;
End If
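
The rule above can be sketched in Python under a loud assumption: the arguments of Mean ( ) and Stdev ( ) did not survive in the text, so this sketch takes them to be the time gaps between consecutive tracking points of an FTS set. That reading, the function name `split_fts`, and the use of the population standard deviation are hypothetical choices, not the authors' specification.

```python
from statistics import mean, pstdev


def split_fts(timestamps):
    """Split a sorted list of tracking timestamps into individual flights.

    Assumption: Mean and Stdev are taken over the gaps between
    consecutive timestamps (the original arguments are not recoverable).
    """
    gaps = [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]
    if not gaps:
        return [list(timestamps)]
    m, s = mean(gaps), pstdev(gaps)
    if m >= s:
        # Irregular case: the whole set is kept as one flight.
        return [list(timestamps)]
    # Regular case: cut wherever a gap exceeds mean + standard deviation.
    threshold = m + s
    flights = [[timestamps[0]]]
    for t, gap in zip(timestamps[1:], gaps):
        if gap > threshold:
            flights.append([])
        flights[-1].append(t)
    return flights


# Toy timestamps in minutes (hypothetical).
flights = split_fts([0, 1, 2, 3, 100, 101, 102, 103])
```

With the toy input, the single long gap between 3 and 100 exceeds the threshold, so the set is split into two flights.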

A.2. Pseudo-code for triangle clustering algorithm

Select any triangle with size less than mean value as current triangle;
Define a geometry collection (GC);
Recursive Function TCA (current triangle)
    Retrieve its three neighbor triangles;
    If (sizes of the three neighbor triangles < mean value) Then
        Add the three neighbor triangles into the triangle set;
    End If
    If (the triangle set = empty) Then
        Return;
    Else
        Remove the current triangle from the triangle set;
        Add the current triangle into the GC;
        Pick up any triangle from the triangle set as the current triangle;
    End If
    Call TCA (current triangle);
End Function
Merge the triangles in GC as a natural airport;
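
One possible reading of the pseudo-code is a region-growing procedure: starting from a small triangle, keep absorbing neighboring triangles whose size is below the mean. The Python sketch below follows that reading with an explicit stack instead of recursion and tests each neighbor individually, which is an interpretation rather than a literal translation. The data structures (`sizes`, `neighbors`) and the function name are assumptions, and the final geometric merge is left out.

```python
from statistics import mean


def cluster_triangles(sizes, neighbors, seed):
    """Grow a cluster of below-mean triangles from a seed triangle.

    sizes maps a triangle id to its size; neighbors maps a triangle id
    to the ids of its adjacent triangles (at most three).
    """
    threshold = mean(sizes.values())
    if sizes[seed] >= threshold:
        return set()
    cluster = set()      # plays the role of the geometry collection (GC)
    pending = [seed]     # plays the role of the triangle set
    while pending:
        current = pending.pop()
        if current in cluster:
            continue
        cluster.add(current)
        for nb in neighbors[current]:
            if nb not in cluster and sizes[nb] < threshold:
                pending.append(nb)
    return cluster


# Toy triangulation (hypothetical): triangle 2 is large and separates 3.
sizes = {0: 1, 1: 1, 2: 10, 3: 1}
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
cluster = cluster_triangles(sizes, neighbors, 0)
```

Triangle 3 is small but is not absorbed, because it can only be reached through the large triangle 2.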

References

[1] S.H.Y. Chan, R.V. Donner, S. Lämmer, Urban road networks - spatial networks with universal geometric features? a case study on Germany's largest
cities, The European Physical Journal B (2011) http://dx.doi.org/10.1140/epjb/e2011-10889-3.
[2] Y.H. Hu, D.L. Zhu, Empirical analysis of the worldwide maritime transportation network, Physica A 388 (10) (2009) 2061-2071.
[3] P. Kaluza, A. Kölzsch, M.T. Gastner, B. Blasius, The complex network of global cargo ship movements, Journal of the Royal Society Interface 7 (2010)
1093-1103.
[4] H. Soh, S. Lim, T.Y. Zhang, X.J. Fu, G.K.K. Lee, T.G.G. Hung, P. Di, S. Prakasam, L. Wong, Weighted complex network analysis of travel routes on the
Singapore public transportation system, Physica A 389 (24) (2010) 5852-5863.
[5] W. Li, X. Cai, Statistical analysis of airport network of China, Physical Review E 69 (4) (2004) 046106.
[6] R. Guimerà, S. Mossa, A. Turtschi, L.A.N. Amaral, The worldwide air transportation network: anomalous centrality, community structure, and cities'
global roles, Proceedings of the National Academy of Sciences of the United States of America 102 (22) (2005) 7794-7799.
[7] M. Guida, M. Funaro, Topology of the Italian airport network: a scale-free small-world network with a fractal structure, Chaos, Solitons and Fractals
31 (2007) 527-536.
[8] G. Bagler, Analysis of the airport network of India as a complex weighted network, Physica A 387 (12) (2008) 2972-2980.
[9] L.E.C.D. Rocha, Structural evolution of the Brazilian airport network, Journal of Statistical Mechanics: Theory and Experiment (2009)
http://dx.doi.org/10.1088/1742-5468/2009/04/P04020.
[10] D.D. Han, J.H. Qian, J.G. Liu, Network topology and correlation features affiliated with European airline companies, Physica A 388 (1) (2009) 71-81.
[11] J.E. Wang, H.H. Mo, F.H. Wang, F.J. Jin, Exploring the network structure and nodal centrality of China's air transport network: a complex network
approach, Journal of Transport Geography (2010) 0966-6923. http://dx.doi.org/10.1016/j.jtrangeo.2010.08.012.
[12] W. Tobler, A computer movie simulating urban growth in the Detroit region, Economic Geography 46 (2) (1970) 234-240.
[13] M. Barthelemy, A. Barrat, R. Pastor-Satorras, A. Vespignani, Characterization and modeling of weighted networks, Physica A 346 (2005) 34-43.
[14] P.J. Upham, Environmental capacity of aviation: theoretical issues and basic research directions, Journal of Environmental Planning and Management
44 (5) (2001) 721-734.
[15] P.J. Upham, C. Thomas, D. Gillingwater, D. Raper, Environmental capacity and airport operations: current issues and future prospects, Journal of Air
Transport Management 9 (3) (2003) 145-151.
[16] R. Guimerà, L.A.N. Amaral, Modeling the world-wide airport network, The European Physical Journal B 38 (2004) 381-385.
[17] B. Jiang, X. Liu, Scaling of geographic space from the perspective of city and field blocks and using volunteered geographic information, International
Journal of Geographical Information Science, x, xx-xx. Preprint http://arxiv.org/abs/1009.3635, 2011.
[18] B. Jiang, T. Jia, Exploring human mobility patterns based on location information of US flights, arXiv:1104.4578, 2011.
[19] H.D. Rozenfeld, D. Rybski, J.S. Andrade Jr., M. Batty, H.E. Stanley, H.A. Makse, Laws of population growth, Proceedings of the National Academy of
Sciences 105 (2008) 18702-18707.
[20] L.C. Freeman, Centrality in social networks: conceptual clarification, Social Networks 1 (3) (1979) 215-239.
[21] A. Barrat, M. Barthelemy, R. Pastor-Satorras, A. Vespignani, The architecture of complex weighted networks, Proceedings of the National Academy of Sciences
101 (11) (2004) 3747-3752.
[22] S. Wasserman, K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, Cambridge, 1994.
[23] D.J. Watts, S.H. Strogatz, Collective dynamics of 'small-world' networks, Nature 393 (1998) 440-442.
[24] A.L. Barabasi, E. Bonabeau, Scale-free networks, Scientific American 288 (60) (2003).
[25] R. Pastor-Satorras, A. Vespignani, Epidemic dynamics in finite size scale-free networks, Physical Review E 65 (3) (2002) 035108.
[26] D.J. Watts, Six Degrees: The Science of a Connected Age, Norton, New York, 2003.
[27] M.E.J. Newman, Assortative mixing in networks, Physical Review Letters 89 (2002) 208701.
[28] A. Grabowski, R. Kosinski, Mixing patterns in a large social network, Acta Physica Polonica B 39 (2008) 1291.
[29] V. Colizza, A. Flammini, M.A. Serrano, A. Vespignani, Detecting rich-club ordering in complex networks, Nature Physics 2 (2006) 110-115.
[30] R. Pastor-Satorras, A. Vázquez, A. Vespignani, Dynamical and correlation properties of the Internet, Physical Review Letters 87 (2001) 258701.
[31] S. Nordbeck, Urban allometric growth, Geografiska Annaler 53B (1971) 54.
[32] R. Guimerà, L.A.N. Amaral, Functional cartography of complex metabolic networks, Nature 433 (2005) 895-900.
[33] A.W. Rives, T. Galitski, Modular organization of cellular networks, Proceedings of the National Academy of Sciences 100 (2003) 1128-1133.
[34] R. Guimerà, L.A.N. Amaral, Cartography of complex networks: modules and universal roles, Journal of Statistical Mechanics: Theory and Experiment
(2005) P02001.
[35] Source: Federal Aviation Administration, available at: http://www.fly.faa.gov/flyfaa/usmap.jsp.
[36] Source: Bureau of Economic Analysis, U.S. Department of Commerce, available at: http://www.bea.gov/regional/index.htm.
[37] Source: Bureau of Transportation Statistics, T-100 International Market, available at:
http://www.bts.gov/press_releases/2010/bts039_10/html/bts039_10.html#table_12.
[38] H.J. Miller, Tobler's first law and spatial analysis, Annals of the Association of American Geographers 94 (2) (2004) 284-289.
[39] D.N. Shanbhag, C.R. Rao, Stochastic Processes: Theory and Methods, Elsevier Science B.V., Amsterdam, 2001.
[40] Q.H. Vuong, Likelihood ratio tests for model selection and non-nested hypotheses, Econometrica 57 (1989) 307-333.
