A Comparative Analysis of Web and P2P Traffic

Naimul Basher, Aniket Mahanti, Anirban Mahanti, Carey Williamson, and Martin Arlitt

WWW 2008, Beijing


INTRODUCTION
• In the past, a significant proportion of Internet traffic came from Web applications using HTTP.
• Web traffic is distinguished by small flows, short-lived connections, asymmetric flow volumes, and well-defined port usage.
• The advent of Peer-to-Peer (P2P) file-sharing applications has triggered a paradigm shift in Internet data exchange.
• P2P usage has grown steadily since its inception, and recent empirical studies report that Web and P2P dominate today’s Internet traffic.

WEB AND P2P CHARACTERIZATION
• We use recent packet traces collected at a large university (30,000 students and employees) to extensively characterize and compare traffic generated by Web and P2P applications.
• We primarily focus on characterizing behaviors of these applications at the flow-level and host-level.
• Our work develops flow-level distributional models that may be used to refine Internet traffic models for use in network simulations and emulation experiments.
• We also analyze and compare two P2P applications, BitTorrent and Gnutella.

PREVIEW OF RESULTS
Characteristics | Web | P2P
Flow size | Introduces many mice but few elephant flows. | Introduces many mice and elephant flows.
Flow IAT | Typically short IAT. | Typically long IAT.
Flow duration | Typically short-lived. | Typically long-lived.
Flow concurrency | Most hosts maintain more than one concurrent flow. | Many hosts maintain only one flow at a time.
Transfer volume | Large transfers are dominated by downstream traffic. | Large transfers happen in either upstream or downstream direction.
Geography | Most external hosts are located in the same geographic region. | External peers are globally distributed.

TRACE COLLECTION METHODOLOGY
• Full packet traces were collected using lindump from the 100 Mbps full-duplex commercial Internet connection of the University of Calgary.
• Since P2P applications frequently use random port numbers, we used payload signatures to identify applications.
• We used Bro, a network intrusion detection system, to perform payload signature matching and map network flows to traffic types.
• Due to storage limitations, we used non-contiguous 1-hour traces collected each morning and evening on Thursday through Sunday between April 6 and April 30, 2006.

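Payload-signature matching of the kind described on this slide can be sketched roughly as follows. This is only an illustration: the prefixes below are simplified assumptions based on the well-known opening bytes of each protocol's handshake, not the Bro rules used in the study, and `SIGNATURES` and `classify_payload` are hypothetical names.

```python
# Simplified, illustrative signatures (assumptions, not the study's Bro rules).
# Each application maps to byte prefixes expected at the start of a flow.
SIGNATURES = {
    "BitTorrent": (b"\x13BitTorrent protocol",),
    "Gnutella": (b"GNUTELLA CONNECT/", b"GNUTELLA/"),
    "Web": (b"GET ", b"POST ", b"HEAD ", b"HTTP/"),
}


def classify_payload(payload: bytes) -> str:
    """Label a flow from the first payload bytes, ignoring port numbers."""
    for app, prefixes in SIGNATURES.items():
        if payload.startswith(prefixes):
            return app
    return "Unknown"


print(classify_payload(b"GET /index.html HTTP/1.1\r\n"))
```

Because the port number never enters the decision, a P2P flow on a random port is labeled the same way as one on a default port, which is the motivation given on the slide.
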
TRACE SUMMARY
TCP Trace Statistics | Count
Number of Flows | 23 million
Number of Packets | 945 million
Data Volume | 585 GB

Internet Applications | Flows | Bytes
Web | 40% | 35%
P2P | 3% | 33%

P2P Applications | Flows | Bytes
Gnutella | 21% | 78%
BitTorrent | 61% | 17%

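The trace summary implies very different average flow sizes for Web and P2P. The short calculation below derives them from the flow and byte shares; it assumes decimal gigabytes and uses the rounded figures from the tables, so the results are only approximate. The helper name `mean_flow_size` is hypothetical.

```python
GB = 10**9  # assumption: the 585 GB figure uses decimal gigabytes

total_flows = 23_000_000
total_bytes = 585 * GB

# (share of flows, share of bytes) taken from the trace summary table
shares = {"Web": (0.40, 0.35), "P2P": (0.03, 0.33)}


def mean_flow_size(flow_share, byte_share):
    """Average bytes per flow for an application class."""
    return (byte_share * total_bytes) / (flow_share * total_flows)


for app, (flow_share, byte_share) in shares.items():
    print(f"{app}: ~{mean_flow_size(flow_share, byte_share) / 1000:.0f} KB per flow")
```

Under these assumptions the average P2P flow is roughly an order of magnitude larger than the average Web flow (about 280 KB versus about 22 KB).
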
CHARACTERIZATION METRICS
• Flow-level characterization metrics
  - Flow size – total bytes transferred during a connection. We label flows as mice if they transfer < 10 KB and elephants if they transfer > 5 MB.
  - Flow duration – the time between the start and the end of a TCP flow.
  - Flow inter-arrival time (IAT) – the time between two consecutive flow arrivals.
• Host-level characterization metrics
  - Flow concurrency – the maximum number of TCP flows a single host uses concurrently to transfer content.
  - Transfer volume – the total bytes transferred to (downstream) and from (upstream) a host.
  - Geographic distribution – the distribution of the shortest distance between hosts and our campus along the surface of the Earth.

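The flow-level metrics defined on this slide can be computed along the following lines. This is a minimal sketch: the `Flow` record and function names are hypothetical, and it assumes 1 KB = 1000 bytes and 1 MB = 10^6 bytes, which the slides do not specify.

```python
from dataclasses import dataclass

MICE_MAX_BYTES = 10 * 1000              # assumption: 1 KB = 1000 bytes
ELEPHANT_MIN_BYTES = 5 * 1000 * 1000    # assumption: 1 MB = 10^6 bytes


@dataclass
class Flow:
    start: float  # flow start time, in seconds
    end: float    # flow end time, in seconds
    size: int     # total bytes transferred


def size_class(flow):
    """Label a flow as a mouse (< 10 KB), an elephant (> 5 MB), or other."""
    if flow.size < MICE_MAX_BYTES:
        return "mouse"
    if flow.size > ELEPHANT_MIN_BYTES:
        return "elephant"
    return "other"


def duration(flow):
    """Time between the start and the end of the flow."""
    return flow.end - flow.start


def inter_arrival_times(flows):
    """Gaps between consecutive flow start times."""
    starts = sorted(f.start for f in flows)
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


flows = [Flow(0.0, 0.5, 2_000), Flow(0.25, 100.25, 6_000_000), Flow(1.0, 3.0, 50_000)]
print([size_class(f) for f in flows], inter_arrival_times(flows))
```
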
WEB AND P2P FLOW SIZES

P2P model: Hybrid Pareto and Weibull
Web model: Hybrid Pareto and Weibull

• P2P applications generate more small flows and more very large flows than Web applications.
• Three sources of small flows in P2P: extensive signaling, aborted transfers, and connection attempts to non-responsive peers.
• We also find a few very large P2P flows that are much larger than the occasional large Web transfers.

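For simulation use, a hybrid flow-size model could be sampled as sketched below. Several things here are assumptions rather than details from the slides: that "hybrid" means one distribution for the body and another for the tail, joined at a threshold; that the Weibull component models the body and the Pareto component the tail; and every parameter value. The function name `sample_hybrid_flow_size` is hypothetical.

```python
import random


def sample_hybrid_flow_size(rng, body_fraction=0.9, threshold=10_000.0,
                            weibull_scale=2_000.0, weibull_shape=0.7,
                            pareto_shape=1.2):
    """Draw one flow size in bytes from a body/tail hybrid model.

    All default parameters are placeholders, not fitted values.
    """
    if rng.random() < body_fraction:
        # Body: Weibull draw, redrawn until it falls below the threshold.
        while True:
            size = rng.weibullvariate(weibull_scale, weibull_shape)
            if size < threshold:
                return size
    # Tail: Pareto draw scaled so that its minimum equals the threshold.
    return threshold * rng.paretovariate(pareto_shape)


rng = random.Random(2008)
sizes = [sample_hybrid_flow_size(rng) for _ in range(10_000)]
print(f"share of flows below threshold: {sum(s < 10_000 for s in sizes) / len(sizes):.2f}")
```

Fitted parameters from the underlying study would replace the placeholders before such a sampler is used in a simulation.
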
GNUTELLA/BITTORRENT FLOW SIZES

BitTorrent model: Hybrid Lognormal and Pareto
Gnutella model: Hybrid Lognormal and Pareto

• Gnutella and BitTorrent generate a similar percentage of small flows, mostly control data exchanged between peers.
• Gnutella appears to generate more large flows than BitTorrent.
• BitTorrent uses file segmentation to split an object into multiple equal-sized pieces and downloads them using parallel flows. Gnutella typically downloads the entire object from a single peer.

MICE AND ELEPHANT PHENOMENON
Applications | Mice Flows | Mice Bytes | Elephant Flows | Elephant Bytes
Web | 76% | 9% | 0.04% | 15%
P2P | 93% | 0.5% | 1% | 93%
Gnutella | 83% | 0.1% | 3% | 93%
BitTorrent | 95% | 2% | 0.1% | 95%

• Web mice flows account for a larger share of total Web bytes than P2P mice flows do of total P2P bytes.
• P2P elephant flows are significantly larger than Web elephant flows.
• BitTorrent mice flows are, on average, larger than Gnutella mice flows because of BitTorrent’s intense signaling activity.
• BitTorrent elephant flows are, on average, larger than Gnutella elephant flows. Gnutella users share mostly audio files, while BitTorrent users share more video files.

WEB AND P2P INTER-ARRIVAL TIMES

P2P model: Hybrid Weibull and Pareto
Web model: Two-mode Weibull

• Web flow IATs are much shorter than those of P2P flows.
• Web traffic has a higher arrival rate (80 flows/sec) compared to P2P traffic (6 flows/sec).
• Another factor contributing to the lower arrival rate and the longer IAT values for P2P flows is the persistent nature of their TCP connections.

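The arrival rates quoted on this slide translate directly into mean inter-arrival times, since over an observation window the mean IAT is approximately the reciprocal of the average arrival rate. The helper `mean_iat_ms` is a hypothetical name used only for this illustration.

```python
def mean_iat_ms(flows_per_sec):
    """Mean inter-arrival time in milliseconds for a given average arrival rate."""
    return 1000.0 / flows_per_sec


for app, rate in (("Web", 80), ("P2P", 6)):
    print(f"{app}: {rate} flows/sec -> mean IAT of about {mean_iat_ms(rate):.1f} ms")
```

This gives about 12.5 ms for Web and about 167 ms for P2P. These are averages only; the fitted distributions named above describe how individual IATs spread around them.
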
WEB AND P2P FLOW DURATIONS

P2P model: Hybrid Weibull and Pareto
Web model: Two-mode Pareto

• Approx. 70% of Web flow durations are < 1 sec, indicating low response times for Web requests because of good Internet connectivity on our campus.
• Approx. 30% of P2P flows are shorter than 30 sec. These are failed, aborted, or signaling flows.
• There are a few long-duration P2P mice flows due to repeated unsuccessful connection attempts.
• Approx. 40% of P2P flow durations are between 20 and 200 sec. These reflect bandwidth-limited connections.

GNUTELLA/BITTORRENT FLOW DURATIONS

BitTorrent model: Hybrid Lognormal and Pareto
Gnutella model: Hybrid Lognormal and Pareto

• BitTorrent flows, on average, last longer than Gnutella flows.
• BitTorrent’s longer flows result from its protocol architecture: rarest-first piece selection, a fixed number of permitted uploads/downloads, and persistent connections.
• Gnutella can use a single flow for downloading an object and does not need to share bandwidth.

WEB AND P2P FLOW CONCURRENCY

• Surprisingly, many P2P hosts in our network maintain only a single TCP connection.
• A significant proportion of internal Web hosts maintain more than one concurrent TCP connection.
• Web browsers often initiate multiple concurrent connections to transfer content in parallel.
• A high degree of Web flow concurrency (> 30) is due to Web proxies and content distribution nodes.

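Flow concurrency, defined earlier as the maximum number of TCP flows a host uses at the same time, can be computed per host with a sweep over flow start and end events. This is a minimal sketch rather than the study's tooling; `max_concurrency` is a hypothetical name, and it assumes each flow is given as a `(start, end)` pair in seconds.

```python
def max_concurrency(intervals):
    """Peak number of overlapping (start, end) flow intervals for one host."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a flow opens
        events.append((end, -1))    # a flow closes
    # Tuples sort by time, then by delta, so a close at time t is processed
    # before an open at the same t: back-to-back flows do not count as concurrent.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# A host with three overlapping connections followed by a single one.
print(max_concurrency([(0, 10), (1, 9), (2, 8), (20, 30)]))
```
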
GNUTELLA/BT FLOW CONCURRENCY

• Most Gnutella hosts connect with only one host at a time.
• We observed a few Gnutella hosts with > 10 concurrent TCP connections. These hosts acted as super-peers in Gnutella’s peer hierarchy.
• Most BitTorrent hosts exhibit a high degree of flow concurrency, which is a natural occurrence in BitTorrent.

WEB AND P2P TRANSFER VOLUME

• Approx. 50% of Web and P2P hosts transfer small amounts of data (< 1 MB) and are typically active for < 100 sec.
  - P2P hosts that repeatedly yet unsuccessfully attempt to connect to peers.
  - Web hosts that browse the Web, widgets that periodically retrieve information from the Web, and downloads of small files.
• Approx. 35% of Web and 15% of P2P hosts transfer < 10 MB of data and are active for < 1000 sec.
  - P2P hosts that share small objects.
  - Web hosts that browse the Web for prolonged periods, download software/multimedia, or use HTTP-based streaming.

P2P TRANSFER SYMMETRY
System | Freeloader | Fair-share | Benefactor
Gnutella | 57% | 10% | 33%
BitTorrent | 10% | 40% | 50%

• Transfer symmetry is a major concern for P2P system developers, who want to encourage fair sharing among participating peers.
• We observe more fairness in BitTorrent and more freeloading in Gnutella.
• BitTorrent’s tit-for-tat mechanism encourages uploading for the opportunity to download.
• Gnutella host behavior appears to be dominated by extreme upstream and downstream transfers.

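One way to assign hosts to the three categories in the table is by the upstream share of each host's total transfer volume. The slides do not state the criteria actually used, so the 25% and 75% cut-offs below are purely illustrative assumptions, as is the function name `classify_peer`.

```python
def classify_peer(up_bytes, down_bytes, low=0.25, high=0.75):
    """Classify a P2P host by its upstream share (thresholds are assumptions)."""
    total = up_bytes + down_bytes
    if total == 0:
        return "inactive"
    up_share = up_bytes / total
    if up_share < low:
        return "freeloader"   # mostly downloads
    if up_share > high:
        return "benefactor"   # mostly uploads
    return "fair-share"


print(classify_peer(up_bytes=5_000_000, down_bytes=95_000_000))
```
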
WEB AND P2P HEAVY HITTERS

• Heavy hitters are the few hosts that account for much of the traffic volume transferred.
• Heavy hitters are present in both Web and P2P.
• Most P2P heavy hitters are either freeloaders or benefactors.
• The total amount of data transferred by the top 10% of Web and P2P hosts follows a power-law distribution.
• Top-ranked P2P hosts transfer an order of magnitude more data than top-ranked Web hosts.

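The weight of heavy hitters can be quantified as the share of total volume carried by the top-ranked hosts. The sketch below assumes a list of per-host transfer volumes in bytes; `top_share` is a hypothetical helper and the sample volumes are made up for illustration.

```python
import math


def top_share(volumes, fraction=0.10):
    """Fraction of total bytes transferred by the top `fraction` of hosts."""
    if not volumes:
        return 0.0
    ranked = sorted(volumes, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))  # always include at least one host
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Made-up volumes: one dominant host among ten.
print(f"{top_share([1000, 1, 1, 1, 1, 1, 1, 1, 1, 1]):.3f}")
```
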
WEB AND P2P GEOGRAPHIC DISTRIBUTION

• Approx. 75% of external Web hosts are in North America; Europe and Asia account for 10% each.
• A majority of our campus Web users are English-speaking, and thus are likely to visit Web sites located in predominantly English-speaking countries.
• Approx. 60% of P2P hosts are located outside North America.
• This indicates that connectivity between P2P hosts does not strongly rely on host locality; rather, it depends on resource availability during the connection establishment phase.

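The geographic metric, the shortest distance between a host and the campus along the surface of the Earth, is a great-circle distance. A common way to compute it is the haversine formula, sketched below under the assumption of a spherical Earth with mean radius 6371 km. How the study mapped IP addresses to coordinates is not covered here, and `great_circle_km` is a hypothetical name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates for Calgary and Beijing, used only as an example.
print(round(great_circle_km(51.05, -114.07, 39.90, 116.40)))
```
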
GNUTELLA/BT GEOGRAPHIC DISTRIBUTION

• Approx. 70% of Gnutella hosts are located in North America.
• This suggests either that Gnutella peers prefer to connect with hosts in close proximity or that Gnutella clients are widely used in North America for file sharing.
• Approx. 30% of BitTorrent hosts are located in North America and approx. 40% are located in Europe.
• We believe that the list of trackers is created based on host bandwidth availability in a swarm, and we see a bias towards regions with high broadband penetration.

NETWORK TRAFFIC MANAGEMENT

• At the University of Calgary, traffic is managed using a commercial packet shaping device.
• At the time of capture, the network policy was to group together all identified P2P flows and collectively limit their bandwidth to 56 Kbps.
• We do not observe a strong positive correlation between flow size and duration.
• Some P2P flows are indeed identified and limited by the traffic shaper; however, many other P2P flows escaped detection by the traffic shaper.
• Our results provide a snapshot of Web and P2P characteristics from a large edge network, and should be representative of other edge networks with similar user population and network management policies.

RESULT HIGHLIGHTS
Characteristics | Web | P2P
Flow size | Introduces many mice but few elephant flows. | Introduces many mice and elephant flows.
Flow IAT | Typically short IAT. | Typically long IAT.
Flow duration | Typically short-lived. | Typically long-lived.
Flow concurrency | Most hosts maintain more than one concurrent flow. | Many hosts maintain only one flow at a time.
Transfer volume | Large transfers are dominated by downstream traffic. | Large transfers happen in either upstream or downstream direction.
Geography | Most external hosts are located in the same geographic region. | External peers are globally distributed.

SUMMARY AND FUTURE WORK
• Our work presented an extensive characterization of Web and P2P traffic using full packet traces collected at a large edge network.
• We observed a number of contrasting features between Web and P2P traffic using flow-level and host-level metrics.
• Flow-level distributional models were developed for Web and P2P traffic, which can be used in network simulation and emulation experiments.
• Traffic from other networks should be studied to facilitate development of general models for Web and P2P traffic.
• The impact of other non-Web applications, such as P2P VoIP and P2P IPTV, on Web-based applications can also be studied.