Phenomenon
ABSTRACT
The academic community has long acknowledged the existence of malicious botnets; however, to date, very little is known about the behavior of these distributed computing platforms. To the best of our knowledge, botnet behavior has never been methodically studied, botnet prevalence on the Internet is mostly a mystery, and the botnet life cycle has yet to be modeled. Uncertainty abounds. In this paper, we attempt to clear the fog surrounding botnets by constructing a multifaceted and distributed measurement infrastructure. Throughout a period of more than three months, we used this infrastructure to track 192 unique IRC botnets of size ranging from a few hundred to several thousand infected end-hosts. Our results show that botnets represent a major contributor to unwanted Internet traffic: 27% of all malicious connection attempts observed from our distributed darknet can be directly attributed to botnet-related spreading activity. Furthermore, we discovered evidence of botnet infections in 11% of the 800,000 DNS domains we examined, indicating a high diversity among botnet victims. Taken as a whole, these results not only highlight the prominence of botnets, but also provide deep insights that may facilitate further research to curtail this phenomenon.

Categories and Subject Descriptors
D.4.6 [Operating Systems]: Security and Protection - Invasive Software

General Terms
Security, Measurement

Keywords
Botnets, Computer Security, Malware, Network Security

1. INTRODUCTION
Despite the fact that botnets first appeared several years ago, they have only recently sparked the interest of the research community. The term botnets is used to define networks of infected end-hosts, called bots, that are under the control of a human operator commonly known as a botmaster. While botnets recruit vulnerable machines using methods also utilized by other classes of malware (e.g., remotely exploiting software vulnerabilities, social engineering, etc.), their defining characteristic is the use of command and control (C&C) channels. The primary purpose of these channels is to disseminate the botmasters’ commands to their bot armies. These channels can operate over a variety of (logical) network topologies and use different communication mechanisms, from established Internet protocols to more recent P2P protocols. However, the vast majority of botnets today use the Internet Relay Chat (IRC) protocol [13], which was originally designed to form large social chat rooms.

While other classes of malware were mostly used to demonstrate technical prominence among hackers, botnets are predominantly used for illegal activities. These activities range from extortion of Internet businesses to e-mail spamming, identity theft, and software piracy. Unfortunately, even with the substantial increase in botnet activity witnessed over the past few years, little is known about the specifics of this malicious behavior. For instance, questions pertaining to the prevalence of botnet activity, the number of different botnet subspecies (and how they can be behaviorally categorized), and the evolution of a botnet over its lifetime, abound.

This paper presents the results of our effort to address these questions. We argue that a thorough and complete understanding of this problem calls for a multifaceted measurement approach. Furthermore, we believe that this approach must capture the behavior and impact of botnets from multiple viewpoints. In that regard, this paper makes two key contributions, namely (1) the development of a multifaceted infrastructure to capture and concurrently track multiple botnets in the wild, and (2) a comprehensive analysis of measurements reflecting several important structural and behavioral aspects of botnets. The infrastructure we developed synthesizes multiple data collection or “sensing” techniques, including distributed malware collection points to capture botnet binaries, IRC tracking to gain an insider perspective of the behavior of live botnets, and DNS cache probing to assess the global prevalence of botnets. We were able to observe more than two hundred botnets and actively track more than a hundred long-lived ones over a period of more than three months. By cross checking the multiple views gained by the different sensing techniques, we reveal a number of behavioral and structural features of botnets not previously reported in the literature.

The remainder of the paper is organized as follows: Section 2 provides background information on botnets and highlights the challenges associated with botnet detection and tracking. We present the approach we developed to infiltrate large numbers of botnets in Section 3, and describe a novel approach for extracting information from the malicious binaries collected using our distributed infrastructure. Section 4 presents the analysis of the collected data. Related work is presented in Section 5. We conclude in Section 6.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IMC’06, October 25–27, 2006, Rio de Janeiro, Brazil.
Copyright 2006 ACM 1-59593-561-4/06/0010 ...$5.00.

2. BACKGROUND
A botnet is a group of infected end-hosts under the command of a botmaster. Figure 1 illustrates the various stages in a typical botnet life-cycle. Botnets usually commandeer new victims by remotely exploiting a vulnerability of the software running on the victim. Botnets borrow infection strategies from several classes of malware, including self-replicating worms, e-mail viruses, etc. Infections can also be spread by convincing victims to run some form of malicious code on their machines (e.g., by executing an email attachment).

Once infected, the victim typically executes a script (known as shellcode) that fetches the image of the actual bot binary from a specified location (footnote 1). Upon completion of the download, the bot binary installs itself to the target machine so that it starts automatically each time the victim is rebooted.

Footnote 1: We observed that botnets using random scanning to spread usually serve the bot binary from the same machine that exploited the remote vulnerability in the first place.

[Figure 1 diagram. Entities: Botmaster, IRC Server(s), DNS Server, Botnet, Internet, Vulnerable Host. Steps: 1. Exploit, 2. Bot Download, 3. DNS Lookup (*), 4. Join, 5. Commands.]
Figure 1: The life-cycle of a typical botnet infection. Steps with enclosed asterisk are optional.

As mentioned earlier, the defining characteristic of botnets is manifested by the fact that individual bots are controlled via commands sent by the network’s botmaster. The communication channel used to issue commands can be implemented using a variety of protocols (e.g., HTTP, P2P, etc.). However, the majority of botnets today use the Internet Relay Chat (IRC) protocol [13]. The IRC protocol was specifically designed to allow for several forms of communication (point-to-point, point to multi-point, etc.) and data dissemination among a large number of end-hosts. The inherent flexibility of this protocol, as well as the availability of several open-source implementations, enables third parties to extend it in ways that suit their needs. These features make IRC the protocol of choice for botmasters, as it simplifies the botnet implementation and provides a high degree of control over the bots. Therefore, for the rest of this paper, we focus our attention on IRC-based botnets since IRC is the prominent botnet control mechanism in use today.

Upon initialization, each bot attempts to contact the IRC server address given in the executable. In many cases, this step requires resolving the DNS name of the IRC server (Step 3 in Figure 1). Using a DNS name instead of a hard-coded IP address allows the botmaster to retain control of her botnet in the event that the current IP address associated with the DNS name of the IRC server gets black-listed.

Once the IP address of the IRC server is available, the bot attempts to establish an IRC session with the server and joins the command and control channel specified in the bot binary. Generally speaking, the bot-to-IRC server communication requires any combination of three types of authentication. First, a bot needs to authenticate itself to the IRC server using the PASS message in order to successfully begin the IRC session with that server. Second, botmasters normally protect the command and control channel with a password and hence require the bot to authenticate itself before joining. The passwords corresponding to these two authentication phases are contained in the bot binary and authentication normally takes place in the clear. The third type of authentication, which is not part of the IRC protocol, requires the botmaster to authenticate herself to the bot population before she is able to issue any command. While the first two authentication steps are intended to thwart outsiders from joining the C&C channel, the last authentication phase aims to protect bots from being overtaken by other botmasters attempting to seize control of fledgling botnets.

Once the bot successfully joins the specified IRC channel, it automatically parses and executes the channel topic. The topic contains the default command that every bot should execute. Depending on the channel mode set by the botmaster, bots might be able to “hear” all messages exchanged on the channel. This broadcast behavior of IRC channels is a design feature that makes the IRC protocol suitable for supporting large-scale chat rooms. As we show later, broadcast via the C&C channel is an invaluable source of insider information about the activities and capabilities of some botnets. While convenient, one cannot rely on this feature for information extraction, since in some cases, IRC servers disable it either to reduce “chatter” or to limit communication overhead.

The steps described so far are shared among all IRC-based botnets. The degree of commonality, however, ends here; different botnets express the set of the commands and responses exchanged between the botmaster and her bots as extensions on top of the standard IRC protocol. While the syntax of these commands follows the same general structure, the commands do vary across different botnets. This variability is primarily a result of botmasters’ desire to “personalize” their bots, and in doing so, complicate the task of tracking these botnets in an intelligent manner. Moreover, the repertoire of available commands elicits a wide variety of responses, which in turn greatly complicates the classification of botnet behaviors.

3. MEASUREMENT METHODOLOGY
As Figure 2 illustrates, our data collection methodology encompasses three logically distinct phases: (1) malware collection, (2) binary analysis via gray-box testing, and (3) longitudinal tracking of IRC botnets through IRC and DNS trackers.

The goal of the malware collection phase is simply to collect as many bot binaries as possible. However, developing a scalable and robust infrastructure to achieve this goal is a challenging problem in its own right, and has been the subject of numerous research initiatives (e.g., [20, 23]). In particular, any malware collection infrastructure must support a wide array of data collection endpoints and should be highly scalable. Additionally, special measures must be implemented to prevent any part of the system from participating in malfeasance. Our system design and implementation draw on experience learned from earlier work [1, 2, 10, 19, 20, 23], but include several additions that are unique to our goals. In what follows, we discuss the specifics of our infrastructure.
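
An IRC tracker of the kind listed under phase (3) has to reproduce the bot-to-server handshake described in Section 2. As a rough sketch only, the first two authentication steps (server password via PASS, channel password via the JOIN key) map onto standard IRC client messages as shown below. The function name, nickname, channel, and passwords are hypothetical placeholders, not values from the paper, and the third authentication step (botmaster to bots) is outside the IRC protocol and is not modelled here.

```python
from typing import List, Optional


def build_join_sequence(
    nick: str,
    channel: str,
    server_password: Optional[str] = None,
    channel_key: Optional[str] = None,
) -> List[str]:
    """Return the IRC client messages for registering and joining a channel.

    Only message construction is shown; no network I/O is performed.
    A real client would normally wait for the server's welcome reply
    before sending JOIN.
    """
    lines = []
    if server_password:
        # Authentication step 1: password for the IRC server itself.
        lines.append(f"PASS {server_password}")
    lines.append(f"NICK {nick}")
    # USER <user> <mode> <unused> :<realname>; values here are placeholders.
    lines.append(f"USER {nick} 0 * :{nick}")
    join = f"JOIN {channel}"
    if channel_key:
        # Authentication step 2: key (password) protecting the channel.
        join += f" {channel_key}"
    lines.append(join)
    # IRC messages are terminated by CR-LF.
    return [line + "\r\n" for line in lines]


if __name__ == "__main__":
    for msg in build_join_sequence("tracker01", "#example", "srvpass", "chankey"):
        print(repr(msg))
```

Both passwords appear in the clear in these messages, which is consistent with the observation in Section 2 that authentication normally takes place in the clear.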
Figure 2: Overall data collection architecture. [Diagram labels: PlanetLab Node, Click NAT module, Nepenthes, Darknet Traffic.]

MALWARE COLLECTION. As we show later, a significant portion of botnet-related spreading activity is localized, targeting certain parts of the IP space. Any single vantage point is thus likely to miss substantial portions of such scanning activity. We attempt to minimize this undesired effect by deploying our collection architecture on a conglomeration of distributed darknets (footnote 2). This collec-
4.1.1 Botnet Traffic Share
Figure 4 shows a two week snapshot of the total incoming SYN packets to our local darknet versus those originating from known botnet spreaders. Simply speaking, a botnet spreader is any source

Figure 4: Time series of incoming SYN packets to the darknet. [Legend entry: All Traffic.]
                      Default Topic   Botmaster Command
Localized scanning    66%             15%
  - Class A           11%             18%
  - Class B           89%             82%
Targeted scanning     32%             87.4%
  - Class A           80%             88%
  - Class B           20%             12%
Uniform scanning      2%              0.3%
Figure 7: DNS views showing examples from multiple botnets with the three predominant growth patterns: (a) Semi-Exponential, (b) Staircase, (c) Linear. [x-axis: Date]
Figure 8: View from IRC tracker showing multiple botnets with the three predominant growth patterns: (a) Semi-Exponential, (b) Staircase, (c) Linear. [x-axis: Time]
served broadcast their messages to the C&C channel. From these messages we estimate the evolution of a botnet by counting the unique sources for messages broadcast to the channel (footnote 7). Figure 8 provides examples of the predominant growth patterns extracted from the tracker logs. For illustrative purposes, we only plot botnets of comparable sizes on a given plot. These trends confirm the heterogeneity observed from the DNS experiments. A closer look at the IRC tracker log shows that, for the most part, these growth patterns are a result of the aforementioned reasons.

4.3 Botnet Structures
Of the 318 total malicious binaries we collected, 60% were IRC bots; only a handful used HTTP for the command and control (footnote 8). Our tracker traces reveal four predominant IRC structures:

  [...] presence of bridged structures by examining the server status messages (from the tracker’s logs) and seeing whether multiple servers are in use. The discrepancy between local versus total online users can also disclose whether or not bridging is in place. Our analysis shows that 30% of the botnets were bridged on multiple servers, 50% of which were bridged between two servers only. Roughly 25% of the bridged servers were also known public servers.

  • Several seemingly unrelated botnets appear, on closer examination, to be very similar when compared in terms of their naming conventions, channel names, and operators’ user IDs. In many cases, these botnets seem to belong to the same botmaster(s).
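
The size estimate mentioned before Section 4.3 (counting the unique sources of messages broadcast to the C&C channel) can be illustrated with a minimal sketch. It assumes a tracker log stored as (timestamp, raw IRC line) pairs and treats the full `nick!user@host` prefix of each PRIVMSG as the source; both the log layout and the choice of key are assumptions made for illustration, not details taken from the paper.

```python
from typing import Iterable, List, Optional, Tuple


def source_of(raw_line: str) -> Optional[str]:
    """Return the sender prefix of a channel message, or None for other lines.

    IRC messages relayed by a server look like
    ':nick!user@host PRIVMSG #channel :text'.
    """
    if not raw_line.startswith(":"):
        return None
    prefix, _, rest = raw_line[1:].partition(" ")
    if not rest.startswith("PRIVMSG "):
        return None
    return prefix


def cumulative_unique_sources(
    log: Iterable[Tuple[int, str]]
) -> List[Tuple[int, int]]:
    """Map each broadcast message to the number of distinct sources seen so far."""
    seen = set()
    series = []
    for timestamp, line in log:
        src = source_of(line)
        if src is None:
            continue  # skip PINGs, JOINs, and other non-message lines
        seen.add(src)
        series.append((timestamp, len(seen)))
    return series


if __name__ == "__main__":
    sample_log = [  # hypothetical log entries
        (1, ":a!u@h1 PRIVMSG #c :hi"),
        (2, ":b!u@h2 PRIVMSG #c :x"),
        (3, ":a!u@h1 PRIVMSG #c :again"),
        (4, "PING :srv"),
    ]
    print(cumulative_unique_sources(sample_log))
```

Plotting the resulting series against time gives a growth curve of the kind shown in Figure 8, under the caveat that it only covers botnets whose servers broadcast channel messages.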
Figure 9: Evolution of effective size for three of the botnets in our study. [Axes: Date vs. Number of on-line Bots]
Figure 10: CCDF of bot channel occupancy. [Axes: Bot staying time (min.) vs. CCDF]
is (in general) significantly smaller than the botnet’s footprint. For instance, the average footprint for the botnets depicted in Figure 9 was greater than 10,000, while at most ≈ 3,000 bots were online at the same time. While the effective size has less impact on long term activities (i.e., executing commands posted as channel topics), it significantly affects the number of minions available to execute timely commands (e.g., DDoS attacks).

Another interesting observation is the strong 24 hour diurnal pattern in the botnet effective size. The notable synchronization in the peaks is the composition of online populations belonging to different time zones. Indeed, this observation confirms the pattern first noted and studied by Dagon et al. in their analysis of the connection attempts from Type-I bots to black-holed IRC servers [6].

4.5 Lifetime
The wide discrepancy between the footprint and the effective size is likely due to the relatively long lifetime of a typical botnet. Under these conditions, bot death rates (e.g., as a result of patching) can significantly impact a botnet’s effective size. Additionally, as we show shortly, the high churn rate of bots connected to the IRC channel significantly contributes to this disparity.

Figure 10 illustrates the distributions of channel occupancy times for a number of botnets. These results were inferred by tracking unique joins and quits for the IRC servers that broadcast such information. As the graph shows, botnet IRC channels exhibit high churn rates, implying that bots generally do not stay long on the IRC channel. The average staying time for all bots across the botnets we tracked is approximately 25 minutes, with 90% of them staying for less than 50 minutes.

Although we cannot explain with certainty what causes this high churn rate, some likely causes include client instability as a result of the infection, machine hibernation, and (as we have frequently seen) botmasters commanding bots to leave the channel. Lastly, inspection of the traces also shows that botmasters have the longest staying times among all users on the channel. Their fascination with keeping a close watch on the activities of the bots under their control, as well as the desire to keep their operator status on the channel, is probably the impetus behind their behavior.

With regard to botnet lifetimes, our data shows that botnets are generally long-lived. Those that were shut down remained active, on average, for about 47 days before ceasing operation. Among all the botnets we tracked, 84% of the IRC servers were still up at the end of the monitoring period, 55% of which were still actively scanning the Internet. A rather troubling finding is that among the botnets with the longest lifetimes were those that used static IP addresses rather than DNS names for their IRC servers; we tracked three such botnets for nearly three months with footprints exceeding 3,500 bots. Lastly, several botnets were available only intermittently, but as soon as they were re-activated, they quickly regained momentum due to the continual connection attempts of their zombies. These observations raise serious concerns about the effectiveness of current measures in curtailing the botnet problem.

4.6 Botnet Software Taxonomy
Of the 192 confirmed IRC-based bot executables (footnote 9), 138 responded to the probes of the IRC query engine (Phase 2 of the graybox testing process described in Section 3.1). The collected responses were used to assess the threads that the bots start after they are executed. Table 3 tabulates the percentage of binaries that reported threads of each type. For example, almost 50% of the bots run a utility thread that disables anti-virus and firewall processes running on the infected host (the so-called “AV/FW Killer”). Likewise, many bots run an identd [12] server which is used to identify the user on the client-end of a TCP connection; this feature is required by some IRC servers and is used to verify that only intended bots join a specific IRC channel. The registry monitor thread checks for modifications to the system’s registry and is presumably used to alert the bot of any attempts to disable it.

Utility Software Thread     Frequency (%)
AV/FW Killer                49
Identd Server               43
System Security Monitor     40
Registry Monitor            38

Table 3: The percentage of bots that launched the respective services on the victim machines.

The System Security Monitor is also related to bot self-defense in that it periodically calls the secure() function. This function is often called manually by the botmaster to perform rudimentary security “tightening” tasks, such as disabling DCOM services and file sharing. To our surprise, we witnessed newer versions of the secure command that actually patched the LSASS vulnerability.

Footnote 9: These binaries were unique based on their MD5 sums.
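
Footnote 9 states that binaries were considered unique based on their MD5 sums. A minimal sketch of that kind of deduplication follows, using Python's standard `hashlib`; the function names and the in-memory sample blobs are hypothetical, and the authors' actual tooling is not described in the text.

```python
import hashlib
from typing import Dict, Iterable


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a binary blob."""
    return hashlib.md5(data).hexdigest()


def unique_by_md5(samples: Iterable[bytes]) -> Dict[str, bytes]:
    """Keep one representative blob per distinct MD5 digest."""
    unique: Dict[str, bytes] = {}
    for blob in samples:
        # setdefault keeps the first blob seen for each digest.
        unique.setdefault(md5_hex(blob), blob)
    return unique


if __name__ == "__main__":
    captured = [b"sample-one", b"sample-two", b"sample-one"]  # placeholder data
    print(len(unique_by_md5(captured)))
```

Two binaries that differ in a single byte (for example, after repacking) hash to different MD5 values, so this notion of uniqueness counts byte-level variants separately.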
The number of reported exploit modules bundled within bot binaries varied from 3 to 29, with an average of 15 exploits per binary (footnote 10). The three most popular exploits were DCOM135, LSASS445, and NTPASS, all of which appear in over 75% of the binaries. We also note that several executables reported identical results, with the exception of their exploit capabilities. This is indicative of the modularity of bot software, in that new exploit modules can be added with relative ease.

                         Service Pack
OS version   % inf.   None   SP1   SP2   SP3+
Win XP       82.6     .47    .52   .01   n/a
Win 2000     16.1     .09    .05   .03   .83
Win Server   1.3      .57    .43   n/a   n/a

Table 4: Distribution of exploited hosts extracted from the IRC tracker logs.

Table 4 indicates that the botnets we tracked target a diverse set of operating systems. It is noteworthy that even the latest version of Windows XP (Service Pack 2) is not immune to attacks. Given the danger that bots pose to end-users, and to the Internet in general, we were interested to see if off-the-shelf anti-virus products offer protection against IRC bots. To answer this question, we subjected all confirmed IRC-bot binaries to virus scans using the Open Source ClamAV tool [4] and Norton’s anti-virus, each using their most recent definitions. The results were somewhat reassuring: ClamAV classified 137 of 192 binaries as malicious, and Norton AV detected 179.

Figure 11: Breakdown of captured IRC-bots based on ClamAV’s classification. [y-axis: Percentage of all captured executables; x-axis class labels not recoverable]

4.7 Insights from the “Insider’s view”
The tracker observed botmasters of varying skill levels, ranging from novices frustrated with their lack of success at managing a handful of bots to botmasters performing far more sophisticated behaviors. For example, our traces show that botmasters (i) share information regarding what prefixes they should not scan (footnote 12), (ii) tweak their bots to keep chatter on the C&C channel to a minimum, and (iii) actively probe selected bots to detect and isolate “misbehavers” (i.e., bots that do not seem to respond to their commands) and, similarly, look for “super-bots” with valuable resources (e.g., high bandwidth network links and storage capacities).

Our inside look has also afforded us the opportunity to witness several instances during which bots were commanded to migrate, either by being instructed to move to a new IRC channel/server or to download replacement software that pointed them to a different C&C server. Figure 12 provides a snapshot of one such migration instance captured by the IRC tracker. By simultaneously participating in two separate botnets, the tracker was able to witness the surge in membership in one botnet immediately after a migration command was issued in the other. Additionally, we observed several instances of cloning (i.e., instantiating multiple IRC sessions to a specified server). Many of the cloning events we witnessed can be attributed to attacks intended to overwhelm another IRC server; other events may be evidence of some form of bot “leasing”.

Figure 12: Migration of a botnet as observed by our IRC Tracker. [Axes: Time (hh:mm) vs. Number of on-line Bots; series: Population - Botnet I, Population - Botnet II]

Command Type   Frequency (%)
Control        33
Scanning       28
Cloning        15
Mining         7
Download       7
Attack         7
Other          3

Table 5: Relative frequency of commands observed across all tracked botnets.
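
A table such as Table 5 can be produced by tallying already-labelled commands. The sketch below assumes each observed command has been assigned one of the category labels beforehand; how the authors performed that labelling is not described here, and the sample labels are invented purely for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def relative_frequency(labels: Iterable[str]) -> Dict[str, float]:
    """Return the percentage share of each command category, rounded to 0.1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical labelled commands, not data from the paper.
    observed = ["Control", "Control", "Scanning", "Other"]
    for name, pct in sorted(relative_frequency(observed).items(),
                            key=lambda item: -item[1]):
        print(f"{name:10s} {pct:5.1f}")
```

Because the shares are computed over all commands pooled together, a few very chatty botnets can dominate the result; per-botnet tallies would be a natural refinement.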