NETWORK CHARACTERIZATION FOR BOTNET DETECTION USING
STATISTICAL-BEHAVIORAL METHODS
A Thesis
Submitted to the Faculty
in partial fulfillment of the requirements for the
degree of
Master of Science
by
ALEXANDER V. BARSAMIAN
Thayer School of Engineering
Dartmouth College
Hanover, New Hampshire
June, 2009
Examining Committee:
Chairman
(Vincent H. Berk)
Member
(George V. Cybenko)
Member
(Eugene Santos, Jr.)
Brian W. Pogue
Dean of Graduate Studies
(Signature of Author)
© 2009 Trustees of Dartmouth College
Abstract
This thesis presents a framework for characterizing network behavior on an Ethernet-
protocol network. We begin with the network traffic aggregated from packet series into
sessions and hypothesize that from this data we can characterize a variety of behaviors.
First, we note that a variety of summary measurements over these sessions together
imply a “behavioral signature” for the observed behavior of the network as a whole, as well
as for subsegments of the network, individual hosts, and services. The signature is seen to
be stable over time; as such, it is both characteristic and predictive of behavior. We develop
and evaluate a method which uses the Kullback-Leibler divergence to measure conformity
to the signature and detect changes in behavior. We also describe reliable methods for
detecting periodic and synchronous behavior based on K-means approximation.
Next, this thesis investigates an application of the proposed methods for detecting the
presence and operation of botnet-infected hosts on a network. Botnets are collections of
compromised computers running software under a common command-and-control infras-
tructure, usually used for malicious purposes.
The framework and methods presented here represent a practical and extensible research
platform for network behavior characterization, and the botnet detection application serves
as a valuable proof-of-concept demonstration of the framework.*

* This work results from a research program in the Institute for Security, Technology, and
Society at Dartmouth College, supported by the U.S. Department of Homeland Security under
Grant Award Number 2006-CS-001-000001. The views and conclusions contained in this document
are those of the authors and should not be interpreted as necessarily representing the official
policies, either expressed or implied, of the U.S. Department of Homeland Security.
Preface and Acknowledgements
I cannot precisely say when I began work on this thesis, although I am sure to mourn, a
little, its completion. It is the culmination of several years of work during which I have been
accompanied and supported by many people. It is my great pleasure to thank them here.
I would like to gratefully acknowledge the patient supervision and support of my advisor,
Dr. George V. Cybenko, who has been instrumental in ensuring my academic, professional,
and financial well-being for as long as I have known him. I enthusiastically thank Dr.
Eugene Santos for his participation on my thesis committee and Dr. Vincent H. Berk for
chairing it, as well as for his encouragement and vision, his help with NetSAW and numerous
other technical matters, and for the idea (among many others) of using the Kullback-Leibler
distance to compare probability distributions.
I also thank Dr. Sean W. Smith, who has the distinction of being one of the first to have
seen anything resembling an academic in me; Dr. John P. Murphy for assistance with Java
and Matlab and for his work on the NetFEE graphical frontend; Dr. Robert A. Savell for
faithfully keeping our lab’s sponsors informed and happy; Sarah White and Lexi Heywood
for meticulously proofreading this manuscript; and all of the PQS/HBM lab at Thayer for
their stimulation, motivation, and not least of all, their friendship.
I have never come to regard a place as my home until these nine years I have lived in
New Hampshire. Hanover became my home not merely on account of her beauty, but for
the personal and spiritual growth I experienced living here. The credit for that growth is
due almost entirely to the extraordinary people from all walks of life who have surrounded
me. Thanks to them, Hanover shall always be my home, the place where I grew up.
Thank you, my friends now scattered near and far, for loving me. Thank you, my
long-suffering housemates, past and present, for sharing your homes and your lives with
me. Thank you, my faith families at Christ Redeemer Church, Christian Impact, and the
Whaley community group, for being my anchors in the storm.
Thank you, Pastor Don Willeman: you have been like a father to me.
Thank you, my family, for your unrelenting support and love.
And thank you, all the little children in my life, but especially you: Jon and Jake, Phia
and Danny, Sydney, Benjamin. Years from now, if you are to somehow happen upon this
page, know that as you grew up before my eyes, your innocence and joy buoyed my spirit
in ways that words cannot express.
Finally, I give thanks to God Almighty, the Maker of Heaven and Earth, who has given
me life and reason. It is His hand I see everywhere in the lives of those around me and in
the awe-inspiring creation that it is my privilege to study.
Dedication
He named it Ebenezer, saying, “Thus far has the Lord helped us.”
—1 Samuel 7:12, Holy Bible, New International Version
Contents
Abstract ii
Preface and Acknowledgements iv
Dedication vi
Table of Contents vii
List of Tables ix
List of Figures x
1 Introduction 1
1.1 Computer Network Behavior 1
1.2 Network Traffic Challenges 1
1.2.1 Self-similarity 2
1.2.2 Dimensionality 3
1.2.3 Dynamism 4
1.3 Motivation 4
1.3.1 Botnets 5
1.4 Outline of Thesis 8
2 Background 10
2.1 Network Behavior Analysis Tools 10
2.1.1 Wireshark 10
2.1.2 Radial Traffic Analyzer 12
2.1.3 Network Analysis Visualization 12
2.1.4 PortVis 14
2.1.5 Commercial Tools 15
2.1.6 Discussion 18
2.2 Botnet Detection Strategies 19
2.2.1 Host-Based Detection 19
2.2.2 Network Signature-Based Detection 20
2.2.3 Network Statistical-Behavioral Approach 21
2.3 Botnet Behavior in Depth 22
2.3.1 Infection 22
2.3.2 Command-and-control Discovery 22
2.3.3 Command Channel 22
2.3.4 Spreading 24
2.3.5 Common Malicious Activities 24
3 Tools, Metrics, and Algorithms 25
3.1 Toolkit 25
3.1.1 NetSAW 25
3.1.2 NetFEE 26
3.2 First-Tier Analysis (Metrics) 27
3.3 Second-Tier Analysis (Algorithms) 33
3.3.1 Change Detection (Kullback-Leibler Divergence) 33
3.3.2 Periodicity Detection Using K-means 36
3.3.3 Synchronicity Detection Using K-means 37
4 Experimentation 39
4.1 Experimental Setup 39
4.1.1 Non-intrusiveness 41
4.1.2 Scalability 41
4.1.3 Software & Hardware in Use 43
4.2 Experiments 44
4.2.1 Detecting Changes 44
4.2.2 Detecting Beaconing 49
4.2.3 Detecting Synchronicity 53
5 Conclusions 54
5.1 Evaluation of Methods 55
5.2 Further Work 56
Bibliography 58
List of Tables
1.1 Partial Listing of Major Botnets [16] 9
2.1 Network Behavior Analysis Tool and Approach Summary 18
2.2 Easily Misclassified Behaviors 19
2.3 Botnet Behavior Overview by Phase 24
3.1 NetSAW SQL Record Format 26
3.2 NetFEE Command Line Parameters 30
4.1 KL Distances of Snapshots for Sudden Change Experiment 45
4.2 KL Distances of Snapshots for Gradual Change Experiment 47
List of Figures
1.1 Phases of botnet creation and maintenance 5
1.2 IRC-based botnet architecture 6
1.3 WWW-based botnet architecture 7
1.4 P2P-based botnet architecture 7
2.1 Wireshark user interface 11
2.2 Radial Traffic Analyzer view of a host's activity 12
2.3 Overview of NAV Interface 13
2.4 NAV Visualization of Portscan Behavior 14
2.5 PortVis Visualization of Portscan Behavior 16
2.6 Intent Modeling 19
3.1 NetFEE Graphical User Interface (1) 28
3.2 NetFEE Graphical User Interface (2) 28
3.3 NetFEE Graphical User Interface (3) 29
3.4 Session volume balance histogram 31
3.5 Session durations histogram 32
3.6 Hourly volume, day-by-day, on the test network over four weeks 33
4.1 NetSAW/NetFEE Experimental Architecture 40
4.2 Portrait of Apollo 43
4.3 Abrupt change in session volume balance 45
4.4 KL distances during "abrupt change" 46
4.5 Gradual change in session volume balance 47
4.6 KL distances during "gradual change" experiment 48
4.7 Periodicity detector labeling 50
Chapter 1
Introduction
1.1 Computer Network Behavior
Computer networks are critical to modern society. An extensive range of business, infrastructure,
and human needs, such as communications, utilities, banks, and leisure services, are now
provided by systems that rely on the secure and efficient operation of networks.
As networks increase in size and complexity, a thorough understanding of their behavior
is crucial to protect them from security threats. Both synthetic and real-world data on
networks are plentiful and fairly easy to obtain. However, for a variety of reasons, gaining
useful insight into network behavior by looking at network traffic is non-trivial.
1.2 Network Traffic Challenges
For the purposes of this thesis, we will consider Ethernet network traffic at the level of series
of packets between hosts, aggregated into sessions.
There is undeniable utility in inspecting individual packets (as evidenced by the success
of projects such as Snort [1] and Wireshark [2]) and even in considering lower levels of the
communications stack, where physical- and device-level security threats such as spoofing
can hide [3]; however, attempting such investigation on a large, high-throughput network is
extremely resource-intensive. Furthermore, we are interested in characterizing behavior of
hosts, processes, and users, not network interfaces and device drivers.
Therefore, we intentionally restrict our data source to the level of session data. The
purpose of this thesis is to explore insights gained by applying statistical-behavioral methods
to that level of traffic.
It turns out that even when so restricted, traffic at the session level is quite rich in
several ways.
1.2.1 Self-similarity
The first way network sessions are rich is in their self-similarity.
Crovella and Bestavros observed that traffic patterns on both local- and wide-area net-
works exhibit a fractal or self-similar property; that is, significant features, corresponding
to traffic burstiness, appear on a wide range of time scales [4]. They go on to say that the
“self-similarity of Web traffic is not a machine-induced artifact” and that it is likely the
result of user behaviors.
Therefore, without some a priori knowledge of the phenomenon in which we are inter-
ested, such as, say, office workers’ weekly routines or the hourly behavior of an automated
script, learning what kind of time scale will contain the features that characterize the be-
havior is difficult.
We are naturally limited in one direction by the resolution of the sensors used to capture
the data. It is impossible to search for features on a shorter time scale than the sample rate,
and we concede that our analysis may consequently be subject to certain selection effects.¹
However, in the other direction, there are no hard limits on the duration of a behavior
except those arbitrarily imposed. There is therefore no foolproof way to ascertain whether
an observed event is a repetition of a previous behavior, an entirely new behavior, or a part
of a composite behavior which has not yet been fully observed.
The principle of the self-similarity of network traffic must be considered when applying
statistical-behavioral methods and is one of the main challenges of this study.
1.2.2 Dimensionality
The next challenge presented by network traffic represented as flows is the dimensionality
of the data. For example, in the Cisco Systems NetFlow v9 specification [5], a compliant
exporter may export a combination of source address, destination address, source port
number, destination port number, the number of packets exchanged in each direction, the
number of bytes exchanged in each direction, a record of TCP flags, and more. Other
network flow representation standards, such as sFlow [6], are similarly rich.
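To make this concrete, a single session record of the kind just described might be represented
roughly as follows. This is an illustrative sketch only: the field names are our own shorthand
and do not reproduce the NetFlow v9 or sFlow field layouts.

    from dataclasses import dataclass

    @dataclass
    class SessionRecord:
        """One aggregated session between two endpoints, as reported by a flow exporter."""
        src_ip: str
        dst_ip: str
        src_port: int
        dst_port: int
        protocol: str        # e.g., "tcp", "udp", "icmp"
        start: float         # session start, epoch seconds
        end: float           # session end, epoch seconds
        packets_fwd: int     # packets sent source -> destination
        packets_rev: int     # packets sent destination -> source
        bytes_fwd: int       # bytes sent source -> destination
        bytes_rev: int       # bytes sent destination -> source
        tcp_flags: int       # bitwise OR of the TCP flags observed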
Once we have decided what to measure (e.g., bytes transferred) and what time scale
to measure it on (e.g., per minute), the question becomes where we are to measure it.
Should we consider the whole network (from the perspective of our sensor or sensors)? Or
conversations between two particular hosts? Or all traffic corresponding only to a particular
service (that is, a host and port)?
¹ Suppose that you use a net to catch one hundred fish from a pond, all of which are larger
than six inches. We might ask whether these data support the hypothesis that most fish in the
pond are larger than six inches, and we might conclude that they do. But if your net's holes
are too large to catch smaller fish, the data certainly do not and cannot support it. Limitations
of a data collection process restrict the scope of the reasonable conclusions that can be drawn
from the data.
Deciding which cross-section of records to consider when characterizing behavior is a
deep data-mining problem in its own right. It is the hope of the author that this study will
shed some light on this problem.
1.2.3 Dynamism
Whenever we talk about characterizing behavior, we necessarily imply a subject or subjects:
for example, a subnet, a user, a host, or a set of processes. The number of potential actors²
on even a small network is tremendous, and each such actor has a different goal, different
methods, and possibly even changing goals and methods over time. This results in significant
changes in user demands on the network over both time and space.
Furthermore, network-wide interactions, such as those induced by the four TCP conges-
tion control algorithms [7] or by the spread of a “viral video”, lead to complex underlying
processes among flows.
This internal adaptive behavior of flows, coupled with changes in user demands over
both time and space, creates elusive collective dynamics between flows that are still not
well-understood [8].
1.3 Motivation
Having discussed some of the challenges inherent to computer network behavior character-
ization, let us consider a strong motivation for attacking the problem.
² In this context, an actor might be a person or it might be an automated backup process; in
principle, anything that tries to achieve a task using the network may be modeled as having
intentions, behaviors, and observable events related to them.
1.3.1 Botnets
In their Botnet Research Survey [9], Zhu et al. define a botnet as a collection of software
“robots” that run on host computers autonomously and automatically, controlled remotely
by an attacker or attackers. They describe four phases of botnet creation and maintenance:
1. Initial Infection.
2. Secondary Injection.
3. Malicious Activity.
4. Maintenance and Upgrade.
Figure 1.1: Phases of botnet creation and maintenance
Initial infection is the exploit that an attacker uses to get the bot software running on
the host computer for the first time. In secondary injection, the bot running on the infected
host receives commands from the botmaster via the command-and-control network. It then
autonomously carries out whatever malicious activities the secondary injection specifies,
including spreading itself to vulnerable peers, and occasionally reports in to the botmaster
for maintenance and upgrades, that is, for updates to the mandate of the secondary injection.
Botnets use a variety of exploit methods for initial infection and a variety of communi-
cation protocols and architectures for the remaining phases, including Internet Relay Chat
(Figure 1.2), HTTP (Figure 1.3), and peer-to-peer (Figure 1.4). Broadly speaking, however,
certain behaviors are common to all botnets [10]. The detection of such behaviors shall be
our focus; for while a skilled bot author may use rootkits and other techniques to hide a
bot’s presence from the machine hosting it, there is little a bot author can do to conceal
the network behavior of his botnets except to exploit every host and router on a network.
Figure 1.2: IRC-based botnet architecture
Figure 1.3: WWW-based botnet architecture
Figure 1.4: P2P-based botnet architecture
Botnet Threats
Botnets are used by attackers for a wide variety of nefarious activities, which can be broadly
divided into three categories: bandwidth-oriented activity (1), proxied abuse (2), and
computing power (3) [11].
(1) One use of botnets is in coordinating massive distributed denial-of-service attacks [12],
where a large number of bots simultaneously engage in web-browsing behavior to
exhaust the resources of a particular target. Since the requests from the bots differ in
intent but not in content, it is hard to defend against them. Another is in launch-
ing viruses and worms; in 2004, the Witty worm compromised almost all vulnerable
hosts in forty-five minutes after being launched from many points simultaneously by
a botnet [13].
(2) Botnets have long been used to provide spammers with the address books, compute
horsepower, and organizational IP addresses they need to find their targets, distribute
their missives, and get them through spam filters [10]. They are also used in clickfraud
scams [14], to install adware [10], and to install keylogging or other spyware to steal
personal information from victims [15].
(3) Botnets are used as distributed supercomputers by attackers wishing to crack cryp-
tographic keys [11].
Computer and network disruption, cryptographic compromise, extortion, and identity
theft are therefore all among the threats posed or tactics aided by botnets.
Stewart maintains a list [16] of historical botnets, summarized in Table 1.1, and notes
that botnets and botnet subsets are illicitly traded between those with the skill and means
to create and maintain them and those who would use them for nefarious purposes.
1.4 Outline of Thesis
In chapter one, we set out the motivation for a behavioral understanding of computer
networks, and consider some of the challenges inherent in gaining such insight.
In chapter two, we conduct a survey of current work in network behavior analysis and
botnet detection.
Table 1.1: Partial Listing of Major Botnets [16]
Name         Est. Number of Bots   Spam Capacity     Aliases
Conficker    10,000,000+           10 billion/day    DownUp, DownAndUp, DownAdUp, Kido
Kraken       495,000               9 billion/day     Kracken
Srizbi       450,000               60 billion/day    Cbeplay, Exchanger
Bobax        185,000               9 billion/day     Bobic, Oderoor, Cotmonger, Hacktool.Spammer, Kraken
Rustock      150,000               30 billion/day    RKRustok, Costrat
Cutwail      125,000               16 billion/day    Pandex, Mutant
Storm        85,000                3 billion/day     Nuwar, Peacomm, Zhelatin
Grum         50,000                2 billion/day     Tedroo
Onewordsub   40,000                ?                 ?
Mega-D       35,000                10 billion/day    Ozdok
Nucrypt      20,000                5 billion/day     Loosky, Locksky
Wopla        20,000                600 million/day   Pokier, Slogger
Spamthru     12,000                350 million/day   Spam-DComServ, Covesmer, Xmiler
In chapter three, we describe some of the statistical-behavioral methods we have devel-
oped for characterizing computer network behavior.
In chapter four, we describe our experimental setup and experiments for the botnet
detection challenge.
Finally, we discuss the results of those experiments and the conclusions we have drawn
in chapter five.
Chapter 2
Background
2.1 Network Behavior Analysis Tools
A wide variety of tools and technologies have been developed to help understand complex
behaviors in networks. This section provides an overview of some of these technologies, as
well as references for further investigation. We cover these in an order that approximates
their level of abstraction from the traffic: from the “raw” packet traces on one extreme to
high-level summaries and visualizations on the other.
2.1.1 Wireshark
Wireshark [2] is a basic open-source packet sniffer and analyzer. It sniffs the network traffic
visible from an interface that has been placed in promiscuous mode, records it to disk, and
displays the packet traces in a linear format to the user. It has many built-in protocol
parsers to add some semantic information to the data displayed, but does little else.
Figure 2.1: Wireshark user interface
2.1.2 Radial Traffic Analyzer
The Radial Traffic Analyzer [17] provides an interesting visualization of the network traffic
with respect to a particular host. It builds a radial representation of the distribution of
some volume of communication along source and destination addresses and ports.
RTA is claimed to display portscan behavior “conspicuously.”
Figure 2.2: Radial Traffic Analyzer view of a host’s activity
In the RTA visualization, packets are grouped from inside to outside, and the four rings
are source IP, destination IP, source port, and destination port.
2.1.3 Network Analysis Visualization
Network Analysis Visualization (NAV) is a project of the University of British Columbia’s
Department of Computer Science with the aim of gaining a higher level of understanding
of network events than that provided by linear views of packet traces [18].
NAV provides a bipartite visual overview of traffic at the border of the network to
help administrators understand local hosts' interactions with remote ones. It is therefore
particularly well-suited to visualizing otherwise benign phenomena that gain the potential
for mischief when they are correlated across many hosts, such as (once again) portscan
behavior.
Figure 2.3: Overview of NAV Interface
Figure 2.4: NAV Visualization of Portscan Behavior
The utility of NAV and RTA together demonstrates and emphasizes the point that
simplifying visualizations of complex data can illustrate high-level behavior.
2.1.4 PortVis
McPherson et al., at the University of California, Davis, took the following approach [19]
to the problem of understanding vast quantities of finely detailed, high-dimensional data:
simply visualize an artificially coarse summary of the data and see what can be uncovered.
They chose to look at a summary of the hourly activity of each port over the entire network,
including the session count, number of unique source and destination addresses, number of
unique source-destination pairs, and number of unique source countries.
Each record is uniquely addressed by a key tuple:

r_(protocol, port, hour) = (c_s, c_src, c_dest, c_pairs, c_countries)    (2.1)
Studying PortVis was instructive because it demonstrated that the high dimensionality
of network data can be mitigated by selecting an appropriate cross-section, which is a key
precondition for our research.
Of course, phenomena that do not manifest themselves as changes in port activity “slip
through the net,” so to speak. An advantage of using flows as the basic data record in our
research is that a number of different cross-sections of the data can be defined; indeed, a
simple preprocessing step (see 3.1.2) can be used to transform raw flow records into cross-
sections like the ones used by PortVis. This led us to hypothesize that there might be significant
explanatory power in fusing many such cross-sections into higher-level conclusions.
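For illustration, the preprocessing idea can be sketched as follows: session records are rolled
up into the hourly per-port summaries of Equation 2.1. This is only a sketch under assumed
field names; it is neither the PortVis implementation nor the preprocessing step of Section 3.1.2.

    from collections import defaultdict

    def portvis_summary(sessions):
        """Aggregate session records into hourly per-(protocol, port) summaries.

        Each session is assumed to be a dict with at least the keys
        'proto', 'dst_port', 'start' (epoch seconds), 'src_ip', 'dst_ip', 'src_country'.
        Returns {(proto, port, hour): (n_sessions, n_src, n_dst, n_pairs, n_countries)}.
        """
        buckets = defaultdict(lambda: {"n": 0, "src": set(), "dst": set(),
                                       "pairs": set(), "countries": set()})
        for s in sessions:
            key = (s["proto"], s["dst_port"], int(s["start"] // 3600))
            b = buckets[key]
            b["n"] += 1
            b["src"].add(s["src_ip"])
            b["dst"].add(s["dst_ip"])
            b["pairs"].add((s["src_ip"], s["dst_ip"]))
            b["countries"].add(s["src_country"])
        return {k: (b["n"], len(b["src"]), len(b["dst"]),
                    len(b["pairs"]), len(b["countries"]))
                for k, b in buckets.items()}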
2.1.5 Commercial Tools
In addition to the freeware and scientific network behavior analysis tools, there are a number
of commercial products designed to help network administrators make sense of network
flows. This market is commonly called security information management. We provide a
brief overview of some of the major players in this market.
Packet Analytics’ Network Forensic Search Engine (Net/FSE) [20] is primarily a foren-
sics and data analysis tool. It indexes flows and other security event data, such as log entries,
and provides users with a unified graphical user interface to a search engine on collected
data, automatically performing some correlation and cross-reference tasks. From there,
analysts can apply various filters to try to understand, and determine the best response to,
an alert.
Figure 2.5: PortVis Visualization of Portscan Behavior
PortVis showing two distinct periods of portscan activity, visible in the timeline (center)
and on the left and right in detail.
Q1 Labs’ QRadar [21] is very similar to Net/FSE, providing search and forensic capa-
bilities, but expanding on their efforts by providing their own set of collectors suited to
special tasks. These include collectors for virtual machine network traffic and collectors
which examine the content, rather than just the headers, of packets for suspicious activity.
Riverbed Technology’s Cascade [22] and Tipping Point’s Threat Suppression Engine [23]
differ from other market offerings by performing simple statistical analysis on flows collected
and allowing network engineers to write rule-based responses. For instance, Tipping Point
allows the user to set a threshold for ICMP traffic, generating an alert when it exceeds
4Mbps and automatically reconfiguring routers to rate-shape it back to 2Mbps when it
exceeds 7Mbps, as a way of stemming the flow of worm-related traffic. In many ways, this
approach parallels the autonomic computing approach to server and network monitoring
described by Roblee et al. [24].
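A minimal sketch of such a rule-based response policy, using the ICMP thresholds from the
example above, might look like the following; the rule structure and names here are assumptions
for illustration, not any vendor's actual configuration format.

    # Illustrative threshold rules modeled on the ICMP example above.
    RULES = [
        {"metric": "icmp_bps", "threshold": 4_000_000, "action": "alert"},
        {"metric": "icmp_bps", "threshold": 7_000_000, "action": "rate_shape",
         "limit_bps": 2_000_000},
    ]

    def evaluate(readings):
        """Return the actions triggered by the current per-metric readings (bits/s)."""
        return [rule for rule in RULES
                if readings.get(rule["metric"], 0) > rule["threshold"]]

    # evaluate({"icmp_bps": 8_500_000}) triggers both the alert and the rate-shape rule.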
CISCO Systems’ MARS (Monitoring, Analysis, and Response System) [25] and Lancope
Solutions’ StealthWatch [26] both collect flow data and aggregate statistics on them. In
contrast to the products discussed in the previous paragraph, they take a classical machine
learning approach to anomaly detection; rather than using rule-based thresholding, they
attempt to automatically establish policies and thresholds.
2.1.6 Discussion
The low-level inspection tools such as Wireshark provide an analyst with a useful kit for
performing post-attack network forensics, protocol analysis, and the like. On the other
hand, the high-level visualization tools’ evaluations are often phrased in subjective terms
such as, “portscan activity was visually conspicuous.” The commercial tools discussed
are useful for finding gross anomalies over large networks, such as unexpected spikes in
particular types of traffic, but lack the type of fine-grained control and algorithms needed
to identify more subtle behavior, such as the botnet command-and-control behavior we are
using as our motivating scenario. While all of these types of reporting are useful, we hope to
provide both a deeper and a more qualitative insight into network flow behavior and botnet
detection by using statistical-behavioral techniques. See Table 2.1 for a brief summary of
the current approaches considered.
Table 2.1: Network Behavior Analysis Tool and Approach Summary
Tool          Signature   Visualization   Data     Statistical   Rule-based        Learning-based
              Detection                   Fusion   Analysis      Alert/Response    Alert/Response
RTA           No          Yes             No       No            No                No
NAV           No          Yes             No       No            No                No
PortVis       No          Yes             No       No            No                No
Net/FSE       No          No              Yes      No            No                No
QRadar        No          No              Yes      No            No                No
Cascade       No          Yes             No       Yes           Yes               No
TSE           Yes         Yes             No       Yes           Yes               No
MARS          No          Yes             No       Yes           No                Yes
StealthWatch  Yes         Yes             No       Yes           No                Yes
One final rub besets all network anomaly detection systems. The difficulty is that not
all anomalous behavior is malicious, and not all malicious behavior is anomalous. Table 2.2
gives some examples of such easily-misclassified behaviors, while Figure 2.6 depicts how
flow analysis fits within the lofty goal of understanding and modeling the intentions of the
actors on our network.
Table 2.2: Easily Misclassified Behaviors
              Normal                               Anomalous
Benign        Most things.                         A developer setting up a new service.
Malicious     An insider leaking sensitive data.   Portscans, infection attempts, etc.
Figure 2.6: Intent Modeling (layers: Bytes, Packets, Flows, Activities, Intent)
2.2 Botnet Detection Strategies
We now provide a survey of botnet detection strategies in use today, both in the commercial
world and in research.
2.2.1 Host-Based Detection
Host-based botnet detection is the most straightforward method of detecting when a host
is running bot software: it simply considers the bot software to be a virus, trojan, or other
malware, and detects it using the usual “anti-virus” methods. Anti-virus vendors derive
a binary signature of the bot software and compare running processes and binaries on the
disk to a catalog of such signatures.
For host-based detection to be effective, two things must be true:
1. The detector must have a positive signature for the bot software. This is a difficult
problem that will always be an arms race between malware authors and anti-virus vendors.
Sometimes having a signature is not possible, as for zero-day botnets. Rajab et al. estimate
that Norton Anti-Virus currently detects only 80% of botnets [27].
2. The bot must not be obscuring its host-based activity. Among other methods, botnets
have been observed in the wild hiding behind rootkits [28].
2.2.2 Network Signature-Based Detection
An approach which addresses the second limitation (above) of host-based botnet detection
is network signature-based detection. In this approach, the detector again draws upon a
catalog of botnet signatures. However, instead of being based on the bot software’s binary
image on disk or in memory, it is based on observed network traffic generated by the botnet
software [27]. An example of such a signature for an IRC bot would be the following tuple:

f_net = (Hosts, Ports, Nick, Pass, Channel)    (2.2)
That is, the fingerprint includes the collection of hosts and ports known to be asso-
ciated with a particular botnet’s command-and-control architecture, along with the IRC
nicknames, passwords, and channel join requests associated with that botnet. It there-
fore captures the network-observable details of the command-and-control interaction; once
these values are known for a particular botnet, these signatures can be adapted and fed
into signature-based network intrusion detection systems such as Snort, which monitors the
network traffic for the signature.
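For illustration only, a signature of this form could be checked against observed sessions and
their IRC commands roughly as follows; the field names follow Equation 2.2, while the example
values and helper function are assumptions rather than any real botnet's fingerprint or Snort's
actual rule syntax.

    # Hypothetical network signature for one botnet, following Equation 2.2.
    SIGNATURE = {
        "hosts": {"203.0.113.7", "198.51.100.23"},   # known C&C rendezvous points
        "ports": {6667, 7000},                        # IRC ports used by this botnet
        "nick_prefix": "rbot|",                       # nickname pattern used at login
        "password": "s3cret",
        "channel": "#cmd",
    }

    def matches(session, irc_lines):
        """Return True if a session plus its observed IRC commands match the signature."""
        if session["dst_ip"] not in SIGNATURE["hosts"]:
            return False
        if session["dst_port"] not in SIGNATURE["ports"]:
            return False
        joined = "\n".join(irc_lines)
        return (("NICK " + SIGNATURE["nick_prefix"]) in joined
                and ("JOIN " + SIGNATURE["channel"]) in joined)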
Two difficulties with this approach are apparent:
1. We need a positive signature for the bot software's network activity. This is analogous
to the situation with host-based detection.
2. Such a signature must exist. The outward indicators that are part of any network signature
(DNS, IPs, and so on) can all change as bot software receives continual maintenance and
upgrades. Additionally, some command-and-control architectures, such as P2P, may not produce
such indicators consistently.
The premier example of network signature-based botnet detection is BotHunter [29], a
project of SRI International sponsored by the Army Research Office. BotHunter correlates
custom Snort alerts designed to find evidence of command-and-control and spreading activ-
ity for known bots with host-based evidence of infection. It then uses a rule-based system
to declare a host infected when certain conditions have been met.
2.2.3 Network Statistical-Behavioral Approach
Our approach to botnet detection, in summary, is to consider the types of behaviors common
to all bot software and to consider ways we can detect those behaviors at the network level.
In short, we are looking for the network-based ways that bot-infected peers reveal themselves
and each other. Some of these behaviors will be easier to detect than others, and some will
be more (or less) reliable indicators of botnet infection than others. Such evidence can be
considered on its own or, more likely, fused with other evidence gathered from systems such
as BotHunter.
2.3 Botnet Behavior in Depth
In order to detect bot-infected hosts in this way, we must first enumerate the behaviors
in which we are interested. We do so in approximately chronological order, starting with
infection.
2.3.1 Infection
Infection cannot be easily characterized as a network behavior, as it may take only a few
packets or a single session to exploit a vulnerability on a host; the infection may also arrive
via otherwise normal traffic, such as email attachments. Initial infection and infection
attempts can be detected via signature-based intrusion detection systems, though, so we
note it here to suggest the potential for data fusion.
2.3.2 Command-and-control Discovery
All bot software must have at least one "hard-coded" initial command-and-control
discovery process, during which the infected host finds the “rendezvous point” that it needs
to contact to obtain a secondary injection. For bots that use protocols like IRC or HTTP
for C&C, detecting this behaviorally may be difficult, as with initial infection; we may
have to rely on bot-specific knowledge such as IP addresses or DNS requests known to be
associated with the malware. For bots that use a P2P command channel, discovery may
look like scanning behavior across a large number of peers.
2.3.3 Command Channel
In order to remain useful to a botmaster, bot software must periodically “phone-home” via
the command channel for updates to the secondary injection. Such behavior is some of the
most important in botnet detection.
For several reasons, we know that such requests must originate from the bots them-
selves and not the botmaster. First of all, most bot-infected hosts will be end-user ma-
chines which are not routable from outside the network either because of Network Address
Translation (NAT) or firewalls. Secondly, most bot-infected machines will be subject to the
constant churn of DHCP (Dynamic Host Configuration Protocol), making it impossible for
the botmaster to keep accurate records of the IP addresses of infected hosts. Finally, many
bot-infected machines will be switched off at a given time.
In this way we know that the botmaster cannot reliably “push” updates from his vantage
point; the bot-infected hosts must initiate the transaction. Since the bots cannot be prompted to check for updates on demand, we can conclude that all bot software must check for updates on its own initiative, and further note that to be useful to the botmaster, the checks must be periodic or otherwise predictable. We call this behavior beaconing, and detecting it reduces to
detecting communications that are periodic: for instance, a host initiating a session with
another host every 6 or 12 or 24 hours. There may be some randomization in the periodicity
of a host, which makes robust detection important.
Since we have the network as our vantage point, we can consider not only the periodicity
but the synchronicity of communication. In essence, botnet-infected hosts give each other
away by all "winking" at the same time, so to speak. If we detect an unusually high number of hosts all contacting another particular host within some small time window, we might suspect them all to be part of a botnet.
For completeness’ sake, we acknowledge that there are other peculiarities of command
channel communication in botnets, such as the use of dynamic DNS, that once again “slip
through the net” at the flow analysis level.
2.3.4 Spreading
Most bot software’s secondary injection mandates attempts to spread via the network. As
spreading is the mirror image of infection, there are once again some aspects of spreading
(such as automatic remote exploits) that are not easily called “behavior.”
However, there are several associated behaviors that may be detectable via the network
layer. They include scanning (methodically finding vulnerable peers), excessive file-sharing
protocol use (writing infected files to shared volumes), and sending emails with a malicious
payload.
2.3.5 Common Malicious Activities
As with spreading, not all bot software will behave in the same way, but most bot soft-
ware will at one time or another be engaging in certain malicious activities. These include
spamming (sending bulk unsolicited email, observable as inordinate SMTP activity), DDoS
(sending repeated web requests in rapid succession, detectable on port 80), and provid-
ing services (such as DNS, proxying, or web hosting) not normally provided by end-user
computers.
Consult Table 2.3 for an overview of botnet behavior by phase.
Table 2.3: Botnet Behavior Overview by Phase

Behavior                 Initial Infection   Secondary Injection   Malicious Activity   Maintenance
Scanning                 Yes                 Yes†                  Yes                  No
Periodicity              No                  No                    No                   Yes
Synchronicity            No                  No                    Yes                  Yes
SMTP                     No                  No                    Yes                  No
Inappropriate Services   No                  No                    Yes                  No
Exploits                 Yes                 No                    Yes                  No

† P2P botnets only
Chapter 3
Tools, Metrics, and Algorithms
In this chapter, we first describe the tools developed and used by our lab to perform data
collection, analysis, and visualization. Next, we will discuss in detail the first- and second-
tier metrics we will use in our experimental botnet detector.
3.1 Toolkit
Two programs, NetSAW and NetFEE, comprise the heart of our data capture and analysis
system. They are discussed in this section.
3.1.1 NetSAW
The Network Situational AWareness tool is a flow capture and export tool written in C
by Dr. Vincent H. Berk. It was conceived as a cost-effective way to gather flow data
at multiple points on a network. The program runs on Windows PCs and Linux/Unix
variants. It monitors TCP/UDP/ICMP packet headers on a selected network interface
using libpcap [30] and reconstructs the packet series into flows. It maintains flows in memory,
optionally exporting them in the Netflow v9 format [5] or to a PostgresQL database [31].
Table 3.1 describes the columns in the NetSAW PostgresQL database.
Table 3.1: NetSAW SQL Record Format
Field Name Type Description
keya bigint Session Key
keyb bigint Session Key
sid smallint Sensor ID, to distinguish multiple collectors
clientip inet Client Address
clientport integer Client Port
serverip inet Server Address
serverport integer Server Port
proto integer Protocol (integer enumeration of TCP, UDP, ICMP)
state integer Connection State (integer enumeration of NEW, ESTAB, CLOSED, RST, TMOUT)
clientpkt bigint Number of client-transmitted packets
clientbyts bigint Number of client-transmitted bytes
serverpkt bigint Number of server-transmitted packets
serverbyts bigint Number of server-transmitted bytes
starttime timestamp Start Time
lastupdate timestamp Time of last update
flags bit(32) Cumulative flags associated with a TCP session
We most heavily exercised the PostgresQL exporter, as NetFEE, the second component
in our data collection and analysis suite, relies on it as a backing store.
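To make the backing store concrete, the following minimal JDBC sketch pulls per-session volumes for a given server port directly from the NetSAW database. The table name sessions, the class name, and the connection details are illustrative assumptions (in practice NetFEE issues such queries on the user's behalf); the column names follow Table 3.1.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of querying the NetSAW backing store over JDBC. The table name
// "sessions" and the connection details are assumptions for illustration; the
// column names follow Table 3.1.
public final class NetsawQueryExample {

    public static List<Long> sessionVolumesForServerPort(
            String jdbcUrl, String user, String pass, int serverPort) throws Exception {
        List<Long> volumes = new ArrayList<Long>();
        Connection conn = DriverManager.getConnection(jdbcUrl, user, pass);
        try {
            PreparedStatement stmt = conn.prepareStatement(
                    "SELECT clientbyts, serverbyts FROM sessions WHERE serverport = ?");
            stmt.setInt(1, serverPort);
            ResultSet rs = stmt.executeQuery();
            while (rs.next()) {
                // Total session volume = client-transmitted bytes + server-transmitted bytes
                volumes.add(rs.getLong("clientbyts") + rs.getLong("serverbyts"));
            }
            rs.close();
            stmt.close();
        } finally {
            conn.close();
        }
        return volumes;
    }
}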
3.1.2 NetFEE
We noted in the Introduction two of the major challenges presented by network flow data:
dimensionality and self-similarity. A look at Table 3.1 brings the dimensionality challenge
into sharp relief. With this in mind, we developed NetFEE, the Network Feature Extraction
Engine. NetFEE is a general framework for querying the PostgresQL backing store popu-
lated by NetSAW and doing calculations on the retrieved data.
NetFEE was conceived to be a means of quickly performing the discovery phase of
mining our data; that is, we wrote it to help us efficiently discover the most interesting
and informative cross-sections and time intervals in the data. Our motivation was to create
a framework where a user could wonder aloud, “I wonder how many HTTP sessions have
been initiated each hour over the course of the last week?” or perhaps, “What is the session
volume balance (that is, the ratio of bytes transmitted to bytes exchanged) of a particular
server on March 22?” or any of a vast number of variations on the above theme, with
different measurements, time intervals, and deltas. Having imagined the metric, the user
can then configure NetFEE to calculate the answer without writing new Java or SQL code.
NetFEE provides the user with options for defining the desired cross-section, a number
of built-in metrics, and a way to import user-supplied metrics. This promotes code reuse
and makes it suitable for rapidly prototyping and evaluating new metrics and cross-sections.
NetFEE is written in Java. It has both a comprehensive command-line interface and a graphical front-end which reveals the most commonly-used functions. Table 3.2 describes the available command-line parameters. Figures 3.1 and 3.2 illustrate the GUI, and Program 1 shows an example metric, a snippet of Java code that calculates session volumes.
3.2 First-Tier Analysis (Metrics)
When we discuss first-tier analysis, we are referring primarily to simple measurements. In
this case, we are sampling flows to generate probability distributions over different cross-
sections of the network. These distributions are of the type that our hypothetical researcher
in 3.1.2 explored.
Beyond taking measurements, discretizing them into bins, and generating a histogram
that corresponds to the probability density function for the metric, we are doing very little
analysis or learning. This is why these methods are called “first-tier.” However, interesting
conclusions are still there to be found to the researcher who asks the right questions (that
!"#$%&'()"**+%,(-,#"&./0"(
)",*%&()"#12(3+,4%$(
Figure 3.1: NetFEE Graphical User Interface (1)
This is a screenshot from the NetFEE Sensor Setup window, used to configure NetSAW
collectors from NetFEE.
!"#$%&'()"**+%,(-,#"&./0"(
1/#/()233/&4(5+,6%$*(
Figure 3.2: NetFEE Graphical User Interface (2)
This is a screenshot of the NetFEE data summary window.
Figure 3.3: NetFEE Graphical User Interface (3)
This is a screenshot of the NetFEE histogram window, which graphs a histogram of the bytes in each session, with several options for viewing when new sessions were started as an indicator of when the client is active.
Program 1 NetFEE Example Metric. Takes a JDBC ResultSet, which contains the flow
records returned by the database in accordance with the user-defined cross-section, and
returns an array of long, which is written to disk.
private static long[] calculateSessionVolumes(ResultSet r) {
    Vector<Long> sessions = new Vector<Long>();
    try {
        while (r.next()) {
            // Session volume = client-transmitted bytes plus server-transmitted bytes
            long count = r.getLong("clientbyts") + r.getLong("serverbyts");
            sessions.add(count);
        }
    } catch (SQLException e) {
        System.out.println("Getting rows failed! Check output console");
        e.printStackTrace();
        return null;
    }
    // Unbox the vector into a primitive long[] so it can be written to disk
    long[] result = new long[sessions.size()];
    for (int i = 0; i < sessions.size(); i++) {
        result[i] = sessions.get(i);
    }
    return result;
}
Table 3.2: NetFEE Command Line Parameters
Parameter Description
--dbConn Set JDBC database connection string
--dbUser Set JDBC database user name
--dbPass Set JDBC database password
--sid Set NetSAW Sensor ID (integer)
--start Set start time (format: MM-dd-yyyy HH:mm:ss)
--end Set end time (format: MM-dd-yyyy HH:mm:ss)
--cport Search for flows on specific client port
--sport Search for flows on specific server port
--clientCIDR Search for flows on specific client CIDR block or
IP address
--serverCIDR Search for flows on specific server CIDR block or
IP address
--volgreater Search for flows with volume greater than some value
--volless Search for flows with volume lower than some value
--pktgreater Search for flows with packet count greater than some value
--pktless Search for flows with packet count lower than some value
--sql Use this option to specify your own SQL WHERE clause for
session filtering, rather than specifying filters using the
above options.
--metric Metric, one of BYTES, NEW_SESSIONS,
SESSION_DURATIONS, SESSION_PACKET_COUNT,
SESSION_VOLUME, SESSION_PACKET_BALANCE,
SESSION_VOLUME_BALANCE, AVERAGE_PACKET_SIZE, or
INTERARRIVAL_TIMES.
--delta Time delta, one of SECONDS, MINUTES, HOURS
or DAYS.
is, measures using the right interval, cross-section, and delta). It was here that NetFEE was
most helpful, enabling us to quickly ask a wide variety of questions and discover which questions were the "right" ones. We present here a few representative metric findings taken
on our test network with NetFEE.
Figure 3.4 depicts the session volume balance for a two-week period in February, 2009.
The balance is trimodal, implying correctly that the network environment is composed of
a mixture of clients (hosts mostly receiving bytes within a session), servers (hosts mostly
transmitting), and peers (hosts doing equal parts of both within a session). Several obser-
vations about this metric are:
Figure 3.4: Session volume balance histogram (x-axis: bytes in / bytes out; y-axis: number of sessions)
This is a histogram of the session volume balances for all sessions on a test network for a two-week period in February, 2009.
Stability. The histograms generated from other activity intervals are visually almost indistinguishable from the one in Figure 3.4. This property holds over changes in both
endpoints, and for most choices of interval length. However, there are scenarios where
a distribution only becomes stable with a wide-enough time window; for instance, a
single user’s traffic over HTTP may vary wildly in volume from day to day, but be
relatively stable from month to month. Selecting the right time window is often a
trade-off between stability and resolution.
Host characterization. Zeroing in on different hosts yielded quantitatively different dis-
tributions. We observed that file-server machines had a much smaller “left-peak,”
end-user machines had a smaller “right-peak,” and developer machines (running ad-
hoc client and server software) had profiles similar to the network overview.
Service characterization. Different services (such as telnet/ssh, HTTP, and so on) had
their own volume profiles, which varied from one another but not from host-to-host.
Figure 3.5: Session durations histogram (x-axis: session duration in seconds; y-axis: number of sessions)
This is a histogram of the session durations, in seconds, for all sessions on a test network during the same two-week period depicted in Figure 3.4.
Figure 3.5 depicts another metric, the histogram of session durations. It shares with the
volume balance metric the properties of stability and service characterization.
3.3 Second-Tier Analysis (Algorithms)
In this section, we describe a few statistical or machine-learning tools that we find useful
for analyzing first-tier data. They include the Kullback-Leibler divergence, an algorithm
for detecting quasi-periodic activity, and an algorithm for detecting synchronous behavior.
3.3.1 Change Detection (Kullback-Leibler Divergence)
Having gathered some first-tier data and wishing to draw behavioral conclusions from it, a
natural question to ask is, “how ‘predictable’ are these data?” Often we can gain intuition
by inspection. Figure 3.6 depicts the total volume of network traffic on the test network in
bytes per hour, for a 28-day period. Laid out this way, a few observations come to mind.
Figure 3.6: Hourly volume, day-by-day, on the test network over four weeks (x-axis: hour of day; y-axis: network volume in bytes)
This is a multiplot of the total volume per hour on the test network, day-by-day, for a 28-day period during January and February of 2009.
We noticed a relatively large peak every day around 4 AM, as predictable as clockwork.
We surmised that it was probably attributable to an automated backup process. We also
noticed that when network traffic was high apart from this spike, it was almost always
between the hours of 8 AM and 6 PM, and that during those times traffic volume was hard
to predict. Lastly, we saw that weekends (the fourth and fifth columns) had almost no daytime activity.
Many of our first-tier metrics can be interpreted as discrete, non-parametric (that is,
frequency) probability distributions of unknown type. Our goal with taking these metrics
is to characterize, predict, and detect changes in behavior of hosts or networks over time.
We noted in the previous section that many metrics appear to exhibit remarkable stability;
a notion of a distance measure between probability distributions would be a useful way to
quantify that intuition.
The Kullback-Leibler divergence [32] “measures the expected number of extra bits re-
quired to code samples from P when using a code based on Q, rather than using a code
based on P.” Formally, it is defined for discrete probability distributions as follows:
D_KL(P || Q) = Σ_i P(i) log( P(i) / Q(i) )    (3.1)
As defined, however, this measure is not symmetric, so it fails to meet our intuition of a distance measure. The symmetrized version of the measure is defined as follows:
D_KL(P || Q) + D_KL(Q || P)    (3.2)
Although it still does not satisfy the triangle inequality, and so is not a true metric, this symmetrized divergence is a natural measure of the distance between two probability distributions. We call this version of the divergence the Kullback-Leibler distance (or KL distance).
In the previous section, we noted that the histogram of session volume balance over
the whole network varied little over time; the KL distance between two such distributions
would be zero or nearly zero. But the same measurement varied from host-to-host; the
KL distance between the measurement taken on a server versus a client machine would
be distinctly non-zero. If a client machine was reconfigured to perform server activities
(generating a distribution more like the server distribution), the KL distance between it
and other hosts would change to reflect the reconfiguration.
Therefore, we can use the KL distance to measure how two hosts or services behave
differently, or to detect changes in a host’s behavior over time.
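To make the computation concrete, the following is a minimal Java sketch of the symmetric KL distance over two histograms that share the same binning; the class name and the EPSILON smoothing constant (used to avoid divisions by zero in empty bins) are assumptions of this sketch rather than details of our implementation.

// Minimal sketch of the symmetric Kullback-Leibler distance between two
// histograms with identical binning. The EPSILON smoothing for empty bins is
// an assumption of this sketch.
public final class KLDistance {

    private static final double EPSILON = 1e-9;

    // Normalizes raw bin counts into a probability distribution.
    private static double[] normalize(long[] counts) {
        double total = 0;
        for (long c : counts) {
            total += c + EPSILON;
        }
        double[] p = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            p[i] = (counts[i] + EPSILON) / total;
        }
        return p;
    }

    // D_KL(P || Q) for discrete distributions (Equation 3.1).
    public static double divergence(double[] p, double[] q) {
        double d = 0;
        for (int i = 0; i < p.length; i++) {
            d += p[i] * Math.log(p[i] / q[i]);
        }
        return d;
    }

    // Symmetric KL distance (Equation 3.2) over two raw histograms.
    public static double distance(long[] histP, long[] histQ) {
        double[] p = normalize(histP);
        double[] q = normalize(histQ);
        return divergence(p, q) + divergence(q, p);
    }
}

In practice, the two histograms would be first-tier metric snapshots of the kind described in the previous section.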
Alternatives
The KL distance is certainly not the only statistical method for comparing non-parametric
distributions; the area has been studied extensively. Alternative methods range from the
naïve (for example, normalizing the counts in the two distributions and summing the ab-
solute value of the differences between each count) to the sophisticated, such as Student’s
t-test [33] or the Kolmogorov–Smirnov test [34].
Each of these methods falls into the category of null hypothesis testing, a set of methods
which are meant to measure whether, or how well, a set of observations fits a model.
Naturally, they can easily be modified to compare two samples P and Q by simply calling
P the “model” and Q the “sample population.” Our approach, however, more closely
resembles statistical model selection [35] than it does hypothesis testing, in that we assume
the existence of a number of candidate models (e.g., (client, server, peer) or (student,
professor)), and we want to find the best model that explains the data.
Null hypothesis testing generates p-values, which could be used to select a model. How-
ever, while small p-values provide evidence against the null hypothesis, large p-values do
not necessarily comprise evidence in support of the null hypothesis. The KL distance, on
the other hand, may be understood as an estimate of the information lost when a particular
model is used to approximate reality, and as such is useful for selecting the best model; in
short, it is a distance measure, and it therefore generates a meaningful ranking.
3.3.2 Periodicity Detection Using K-means
One of the more challenging features to extract from flow data, but also a strong indicator
of potentially-infected hosts, is beaconing; that is, the periodic “phoning-home” that active
bot software must do to remain active. A serious difficulty in detecting beaconing is the
fact that observations may be missed, either as an artifact of the flow-capture method,
limited sensing resources, DHCP churn, or the infected host simply being switched off or
disconnected from the network. In addition, the data might be very noisy, either because
of sensor limitations or because of intentional obfuscation by the botmaster. Therefore,
we need a robust method of detecting quasi-periodic behavior. Clustering and pattern
recognition on time-series has been explored in depth [36]; we present here an adaptation
of a classical algorithm to the problem of periodicity detection.
K-means clustering [37] is a statistical technique for clustering data into classes with
a low in-class variance; that is, it partitions observations into clusters with the goal of
assigning each observation to the cluster with the nearest mean. This makes it well-suited
for finding periodic behavior; if we cluster flows from a host according to interflow time,
clustering should result in a large cluster with a low variance, with a basic period inferred
from the mean. However, it must be adapted somewhat to our domain to account for the
difficulties of noise and missed observations.
Our heuristic for detecting periodicity is as follows:
1. Calculate the interflow arrival time for a suspect channel.¹
2. Cluster the interflow times using a k-means algorithm², merging any clusters with means that are very close. (Such clusters likely exist only because the algorithm "over-optimized" when the initial k chosen was too large.)
3. Check for large clusters with means that are nearly multiples of each other. If so, regard them as evidence for the existence of a basic period (the smallest such multiple).
4. If there is a large cluster with a small variance, report the mean as the period of periodic behavior along the suspected channel.

¹ Defining suspect channels is another challenge, with various solutions existing on the continuum between signature-based and behavioral. For instance, we may narrowly define "IRC traffic to a particular host" a suspect channel, or broadly define "small flows" or "short-duration flows" to be suspect, or anywhere in between. IRC traffic is an obvious suspect channel that covers the architecture of many botnets.
² Selection of k is relatively unimportant, as clusters will later be merged; however, the actual initial seeds might be important. In practice we simply use k-means++ [38].
We elect to use a clustering method for detecting periodicity rather than the more typical
Fast Fourier Transform because it is better suited to our data collection method. FFT is
better-suited to a continuous signal sampled at a regular sample rate; on the other hand,
our method records start and end times of flows asynchronously. Rather than contrive
to somehow digitize the signal, we take advantage of the fact that an otherwise costly
calculation (interarrival time) is available trivially and use it as the input to a heuristic.
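The following is a rough Java sketch of the heuristic above, assuming the flow start times for a suspect channel have already been extracted. For brevity it seeds the one-dimensional k-means naively (the actual detector uses k-means++ [38]) and omits the cluster-merging and multiples checks; dominantFraction corresponds roughly to the parameter p discussed in Section 4.2.2, and maxRelStdDev is an assumed bound on how tight the dominant cluster must be.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Rough sketch of the periodicity heuristic: cluster interflow arrival times
// with a simple one-dimensional k-means and report the mean of a dominant,
// low-variance cluster as the basic period. The naive seeding and the two
// thresholds are illustrative assumptions; the actual detector seeds with
// k-means++ and also merges near-equal clusters and checks for multiples.
public final class PeriodicityDetector {

    // Returns the detected basic period (same units as the input), or -1 if none is found.
    public static double detectPeriod(double[] flowStartTimes, int k,
                                      double dominantFraction, double maxRelStdDev) {
        if (flowStartTimes.length < 3) {
            return -1;
        }
        double[] starts = flowStartTimes.clone();
        Arrays.sort(starts);
        double[] gaps = new double[starts.length - 1];
        for (int i = 1; i < starts.length; i++) {
            gaps[i - 1] = starts[i] - starts[i - 1];   // interflow arrival times
        }

        // Naive 1-D k-means: seed the cluster means evenly across the observed range.
        double min = gaps[0], max = gaps[0];
        for (double g : gaps) { min = Math.min(min, g); max = Math.max(max, g); }
        double[] means = new double[k];
        for (int j = 0; j < k; j++) {
            means[j] = min + (max - min) * (j + 0.5) / k;
        }
        int[] assign = new int[gaps.length];
        for (int iter = 0; iter < 50; iter++) {
            for (int i = 0; i < gaps.length; i++) {    // assignment step
                int best = 0;
                for (int j = 1; j < k; j++) {
                    if (Math.abs(gaps[i] - means[j]) < Math.abs(gaps[i] - means[best])) {
                        best = j;
                    }
                }
                assign[i] = best;
            }
            for (int j = 0; j < k; j++) {              // update step
                double sum = 0; int count = 0;
                for (int i = 0; i < gaps.length; i++) {
                    if (assign[i] == j) { sum += gaps[i]; count++; }
                }
                if (count > 0) means[j] = sum / count;
            }
        }

        // Look for a "large" cluster (at least dominantFraction of all gaps) with low variance.
        for (int j = 0; j < k; j++) {
            List<Double> members = new ArrayList<Double>();
            for (int i = 0; i < gaps.length; i++) {
                if (assign[i] == j) members.add(gaps[i]);
            }
            if (members.size() < dominantFraction * gaps.length) continue;
            double mean = 0, var = 0;
            for (double g : members) mean += g;
            mean /= members.size();
            for (double g : members) var += (g - mean) * (g - mean);
            var /= members.size();
            if (Math.sqrt(var) <= maxRelStdDev * mean) {
                return mean;                           // candidate basic period
            }
        }
        return -1;
    }
}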
3.3.3 Synchronicity Detection Using K-means
Another mark of a group of botnet-infected hosts is synchronicity. Synchronicity is simply
the phenomenon of a large number of hosts starting to do “the same thing” at the same
time (as in DDoS attacks or other aperiodic behavior), or a single host doing “the same
thing” to many hosts all at once (as in portscan or spreading behavior). Regardless of
the attribution, all that changes for detection is the channel monitored, which is either the
suspected command-and-control channel or the suspected attack channel.
Once the suspect channel is established, detecting synchronicity is a matter of finding
sessions in that channel that overlap each other in time and are categorically similar. Ses-
sions which overlap each other in time will have a negative interarrival time. We assume
that flows that are categorically similar will have, at least, a similar duration.
Therefore our approach to detection is as follows. We again consider the interarrival
times of flows, but this time we use K-means to find large clusters of flows with negative
interarrival times. Sessions in these clusters all start before the previous flow completes
(because of the negative interarrival time), and have similar duration (because they are in
a cluster, which implies a small variance). We then split these clusters whenever there is
a temporal discontinuity. Any large clusters remaining after the splitting are potentially
intervals of synchronous behavior. The magnitudes of the means of such clusters reflect the
durations of the basic session.
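As a simplified illustration (it skips the k-means step described above and does not check that overlapping flows have similar durations), the following sketch walks flows sorted by start time and reports runs of overlapping flows, that is, consecutive negative interarrival times, longer than a threshold as candidate synchronous intervals. The Flow fields and the minCount parameter are assumptions for illustration.

import java.util.ArrayList;
import java.util.List;

// Simplified sketch of synchronicity detection. Instead of the k-means step
// described above, this version walks flows sorted by start time and groups
// runs of flows whose interarrival time (start minus previous end) is negative,
// i.e., flows that overlap in time. Runs longer than minCount are reported as
// candidate synchronous intervals.
public final class SynchronicityDetector {

    public static class Flow {
        public final long start;  // epoch milliseconds
        public final long end;    // epoch milliseconds
        public Flow(long start, long end) { this.start = start; this.end = end; }
    }

    // Returns [startIndex, endIndex] pairs of candidate synchronous runs.
    public static List<int[]> findSynchronousRuns(List<Flow> flowsSortedByStart, int minCount) {
        List<int[]> runs = new ArrayList<int[]>();
        int runStart = 0;
        for (int i = 1; i <= flowsSortedByStart.size(); i++) {
            boolean overlaps = i < flowsSortedByStart.size()
                    && flowsSortedByStart.get(i).start < flowsSortedByStart.get(i - 1).end;
            if (!overlaps) {                       // temporal discontinuity: close the run
                if (i - runStart >= minCount) {
                    runs.add(new int[] { runStart, i - 1 });
                }
                runStart = i;
            }
        }
        return runs;
    }
}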
Chapter 4
Experimentation
In this chapter, we describe our experimental setup, network characterization, and botnet
feature-extraction and detection experiments.
4.1 Experimental Setup
The NetSAW data collection engine (see 3.1.1) is designed to collect flows at the edge of a
network. The overall architecture of the NetSAW/NetFEE system is depicted in Figure 4.1.
It would be too contrived to attempt to synthetically create both the background traffic
and the botnet-correlated flows for our experiments. However, we are aware of the danger
of experimenting with live botnet software on a network filled with bystanders. Therefore,
we collected background traffic based on real users’ true internet use, and we wrote a set
of small, custom programs which behave as benign botnets. We used those programs to
generate the traffic for the synchronicity, beaconing, and behavior experiments. They are
described in greater specificity in the sections corresponding to actual experimental runs.
Figure 4.1: NetSAW/NetFEE Experimental Architecture
This is a diagram of our data collection (NetSAW) and processing (NetFEE) systems.

For some of our experiments, particularly the preliminary ones described in the previous chapter, we collected flows at the border of our own testbed network, a small collection of about a dozen hosts running a variety of operating systems. For the experiments described
in this chapter, we monitor the border of a larger academic network, consisting of two
Class C subnets with a mixture of fixed IPs and DHCP-allocated hosts, clients and servers.
As of the time of this writing, over four months of data collection, we had recorded 9,432,362 sessions involving 455,655 unique IPs. Because only flow data are recorded, nothing personally identifiable was saved.
4.1.1 Non-intrusiveness
Our approach differs most distinctly from previous attempts at botnet detection in our data
source: we limit our analysis to flow summaries rather than actual packet content. Since
we do not do "deep packet inspection," our approach is suitable for use even at the ISP level, where both bandwidth and privacy concerns constrain such analysis.
4.1.2 Scalability
Our analysis platform can scale to much larger networks, although for networks with high
enough throughput, software-based flow aggregation (NetSAW) may be insufficient. How-
ever, there is a variety of commercially available hardware which performs session aggrega-
tion and export at line speed, so we consider that part of the scalability problem essentially
solved.
A significant strength of the system is in the architecture: most of the data filtering
heavy-lifting is handled by the PostgresQL relational database system and by instances of
NetFEE, which communicate via TCP. And since NetFEE requests only read and never
write to the database, PostgresQL’s replication facility can be used to easily increase data
processing capacity. All this is to say that PostgresQL and NetFEE are intrinsically distributed applications.¹
The hardware and software we used for data analysis (see below) was able to perform
quickly enough for our needs; it almost always provided a turnaround time of under 10
seconds for most NetFEE requests, even ones which involved months of data, and usually
accommodated requests in under a second. In our experimental setup, we only monitored one
metric at a time, but a reasonable live monitoring system might be configured to calculate a
few dozen such metrics per host every few minutes, plus a few dozen over the whole network
or important subsegments. Such a notional system would certainly be realizable given our
test network and test hardware.
The real question is how resource requirements scale with the size and throughput of
the network being monitored. All of the built-in NetFEE metrics have O(n) runtime in
n, the number of sessions processed (excluding selection time), and the PostgresQL back-
end typically performs selection and insertion operations in amortized time proportional to
O(n log(m)) in m, the total number of sessions recorded.² Naturally, the storage requirements grow in direct proportion to m as well.
One observation about n and m is that the number of records processed in a given
NetFEE query, n, is likely to be quite small compared to m, the total number of records
collected, after even a short interval of data collection. Another observation is that since
most NetFEE queries will be filtered to a single host or service, n is not likely to vary
wildly between large networks and small ones.³ So the O(log(m)) component, even though logarithmic, is likely to dominate the runtime requirements of NetFEE queries, and we can consider the linear-related-to-n component effectively constant.

¹ I am grateful to the Jargon File for the lighthearted observation that some algorithms are "embarrassingly parallel."
² The PostgresQL database is configured to maintain B-tree based indices [39] for the selection filters provided by NetFEE.
³ We are here supposing that, with access to a given edge bandwidth, hosts generate approximately the same number of sessions over the border on a small network as on a large one.
Given these observations, a network B with twice as many hosts as a network A generates
twice as many session records as A, and therefore requires computation proportional to
2log(m).
Over time, the number of sessions in the backing store might grow to the point where
even logarithmic compute time is too much. However, a periodic pruning or archiving of
historical session records will solve this problem, and will not affect the system’s effectiveness
as long as the metric results are retained.
4.1.3 Software & Hardware in Use
The NetSAW collector ran on the firewall hosts already in place for the networks on which
we experimented. Figure 4.2 is a portrait of Apollo, an IBM eServer xSeries 345 which
was the primary experimental workhorse for this research. Apollo is a 2U rackmount Quad
Intel Xeon 2.8Ghz server with 2.5G RAM and two 72G SCSI disks configured in a RAID 0.
It is running a PostgresQL database server and Sun Java 1.6.0_07-b06 on Ubuntu Linux
8.10.
Figure 4.2: Portrait of Apollo
This is a portrait of Apollo, apollo.ists.dartmouth.edu, who ran the PostgresQL
backing store and most instances of NetFEE.
The experimental console was simply a laptop issuing commands to Apollo via ssh. In
addition, it ran the custom programs which generated the traffic for our experiments.
4.2 Experiments
4.2.1 Detecting Changes
Our basic hypothesis regarding botnet behavior is that hosts behave in a statistically pre-
dictable manner, that bot actions invariably generate changes in that behavior, and that we
can observe, predict, and detect changes in that behavior with the right heuristic. Consult-
ing the table at the end of chapter two, it’s clear there are many ways to characterize those
suspect behaviors. We have designed this experiment around a particular pattern among
botnet-infected hosts.
The pattern is that botnet-infected hosts whose behavior previously resembled “client”
machines start behaving more like servers. Sometimes this is because they are actually
providing services (for instance, surreptitiously hosting illicit files). At other times, it is an
artifact of their spreading behavior, as they aggressively infect files on accessible file shares.
Whatever the attribution, we are attempting to observe the change in behavior.
Most approaches to detecting this kind of behavior center around binary signatures or
heuristic ones (such as port number). But, as with most client-server protocols, a port
number is merely a suggestion. And access to packet contents or useful signatures may
both be too much to assume. For our purposes, we assume little other than a change in
behavior from client to server at the network flow level.
For our experiment, therefore, we elected to measure the session volume balance metric
in NetFEE. We initially developed a simple model for behavior: in the uninfected state, the
host communicates on the suspect protocol, transmitting 1-to-100-bytes-per-session 90% of
the time, and 100-to-1000-bytes the rest of the time. In the infected state, it makes small
transmissions only 10% of the time and large transmissions otherwise.
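As a minimal sketch of the notional behavior model just described (our actual traffic-generation programs were more involved), the following draws one session size at a time from the two-state distribution; the class and method names are assumptions for illustration.

import java.util.Random;

// Minimal sketch of the notional bot's session-size model described above.
// In the dormant state, 90% of sessions carry 1-100 bytes and 10% carry
// 100-1000 bytes; in the spreading ("infected") state the proportions are
// reversed. This reproduces only the stated distribution.
public final class NotionalBotModel {

    private final Random rng = new Random();

    // Draws one session size in bytes for the given state.
    public int nextSessionBytes(boolean spreading) {
        double smallProbability = spreading ? 0.10 : 0.90;
        if (rng.nextDouble() < smallProbability) {
            return 1 + rng.nextInt(100);        // small transmission: 1-100 bytes
        }
        return 100 + rng.nextInt(901);          // large transmission: 100-1000 bytes
    }
}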
Abrupt Changes
We first attempted to detect the state transition of the notional botnet from dormant to
spreading by taking periodic snapshots of the SESSION_VOLUME_BALANCE. Figure 4.3 depicts
the ten snapshots taken immediately surrounding the state transition.
Figure 4.3: Abrupt change in session volume balance (x-axis: bytes out / bytes total)
This is a series of plots of the SESSION_VOLUME_BALANCE during our "Abrupt Changes" experiment that shows the abrupt transition in the distribution of session volumes as the notional botnet host transitions from dormant to spreading.
Visual inspection clearly yields the intuition that the distribution was initially stable,
changed abruptly, and then stabilized in a new state. We can quantify and confirm this
intuition using the Kullback-Leibler distance. If we measure the KL distance between
every successive pair of snapshots depicted above, we get the following sequence (reading
diagonally):
Table 4.1: KL Distances of Snapshots for Sudden Change Experiment
a b c d e f g h i j
a 0.0559 0.0640 0.0599 0.0612 0.6379 0.5710 0.6036 0.5814 0.5982
b 0.0072 0.0086 0.0016 0.4486 0.3935 0.4180 0.3873 0.4113
c 0.0150 0.0043 0.3981 0.3532 0.3707 0.3465 0.3620
d 0.0145 0.5348 0.4776 0.5035 0.4732 0.4925
e 0.4030 0.3533 0.3742 0.3461 0.3667
f 0.0072 0.0019 0.0070 0.0045
g 0.0061 0.0055 0.0147
h 0.0047 0.0036
i 0.0110
j
Clearly the 0.4030 is out-of-place, and corresponds to the change in behavior. Immedi-
ately following that, the KL distance between successive snapshots returns to its previous
level.
See Figure 4.4 for a graphical representation of these results.
Figure 4.4: KL distances during “abrupt change”
A plot of KL distances between successive snapshots during one of the abrupt change
experiments, with notional thresholds for variance, is shown in Figure 4.4. The outlying
observation corresponds to the interval which our intuition suggests had the change in
behavior.
This yields a simple heuristic for characterizing and detecting behavior:
1. Take periodic snapshots of the distribution in question, and calculate the KL distance
between each.
2. Maintain a moving average of the KL distances (the baseline variance).
3. If a new snapshot’s KL distance from the preceding snapshot is more than σ from the
baseline variance, consider the algorithm to have observed a change in behavior.
Every metric will have its own baseline variance, established during a training period,
which makes the algorithm appropriately sensitive to changes. The length of the training
period will determine the accuracy of the baseline variance and the sensitivity of the system.
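A minimal sketch of this heuristic follows; it relies on the KLDistance sketch from Section 3.3.1, and the window size and sigma multiplier are illustrative assumptions rather than values tuned in our experiments.

import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the change-detection heuristic: compare each new snapshot to the
// previous one using the symmetric KL distance (see the KLDistance sketch in
// Section 3.3.1), keep a moving window of recent distances as the baseline,
// and flag a change when a new distance exceeds the baseline mean by more than
// `sigmas` standard deviations.
public final class ChangeDetector {

    private final Deque<Double> window = new ArrayDeque<Double>();
    private final int windowSize;
    private final double sigmas;
    private long[] previousSnapshot;

    public ChangeDetector(int windowSize, double sigmas) {
        this.windowSize = windowSize;
        this.sigmas = sigmas;
    }

    // Feeds the next histogram snapshot; returns true if a change in behavior is flagged.
    public boolean observe(long[] snapshot) {
        boolean changed = false;
        if (previousSnapshot != null) {
            double d = KLDistance.distance(previousSnapshot, snapshot);
            if (window.size() >= 2) {
                double mean = 0.0, var = 0.0;
                for (double x : window) mean += x;
                mean /= window.size();
                for (double x : window) var += (x - mean) * (x - mean);
                double stdDev = Math.sqrt(var / window.size());
                changed = d > mean + sigmas * stdDev;
            }
            window.addLast(d);                        // update the baseline
            if (window.size() > windowSize) window.removeFirst();
        }
        previousSnapshot = snapshot;
        return changed;
    }
}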
Gradual Change
Our next experiment explores gradual changes in behavior. We configured the notional
botnet this time to shift from a particular distribution to another gradually over the full
duration of the experiment. We ran the experiment several times; Figure 4.5 is a represen-
tative subplot of ten snapshots.
Figure 4.5: Gradual change in session volume balance (x-axis: bytes out / bytes total)
This is a series of plots of the SESSION_VOLUME_BALANCE during one of our gradual change experiments that shows a gradual transition in the distribution of session volumes as the notional botnet host transitions from dormant to spreading.
Once again, a visual inspection shows that the distribution does change drastically from
start to finish. However, the sequence of KL distances between successive measurements
does not bear out any sudden change; reading along the diagonal, the KL distances are
about the same from snapshot to snapshot, while reading left-to-right, we see a gradual
divergence over time.
Table 4.2: KL Distances of Snapshots for Gradual Change Experiment
a b c d e f g h i j
a 0.0029 0.0263 0.0713 0.1093 0.1448 0.1861 0.3824 0.4469 0.5658
b 0.0160 0.0525 0.0890 0.1185 0.1627 0.3472 0.4112 0.5271
c 0.0123 0.0320 0.0496 0.0829 0.2183 0.2689 0.3658
d 0.0078 0.0170 0.0434 0.1388 0.1820 0.2666
e 0.0077 0.0152 0.0872 0.1213 0.1907
f 0.0182 0.0675 0.1002 0.1609
g 0.0431 0.0637 0.1114
h 0.0046 0.0228
i 0.0096
j
Figure 4.6: KL distances during the "gradual change" experiment (KL distance vs. time)
Each line is a plot of KL distances between successive snapshots during one of the gradual change experiments, using a different initial snapshot as the representative shot.

The heuristic described in the previous section will not detect a change in behavior, because the KL distance between successive snapshots is never sufficiently large. There are several possible workarounds to this problem.
1. The sensitivity could be adjusted, but not without a commensurate increase in the
false positive rate.
2. Alternatively, snapshots could be taken at a variety of different resolutions (once per
minute, once per hour, and so on) in the hope that changes which are gradual on one
time scale are abrupt on another.
3. The system could be trained in a supervised manner, with a labeled representative
distribution and baseline variance. Indeed, if we naïvely take the first measurement
(on the experimental run depicted in Figure 4.5) to be the representative distribution,
we find that the KL distance between the initial and final distributions is 0.5658.
This result illustrates an issue with learning systems in general: any system that "learns" behavior in an on-line sense suffers from the ability to be "taught" by a savvy-enough adversary.⁴ Over time, the system is trained to recognize progressively-worse behavior as normal, without ever crossing a detection threshold.⁵

⁴ We call this the "check engine" phenomenon, because it is analogous to the situation when the check engine light in an old car turns on. If the owner has seen the light illuminate many times without any noticeable problems to the car, he will ignore it more and more. Eventually, the illuminated warning light becomes "normal."
⁵ This brings up another concern about learning which is: "How do I learn when I only have positive examples labeled?" It may be easy to say when a host is botnet-infected; it is much harder to say that it is not, even in a tightly-controlled environment. Machine learning tasks are especially difficult when you do not have an oracle.
Since session volume balance behavior is most likely to change when our view is limited
to a particular application protocol, a real-world monitoring scenario would focus on the
behavioral profile of a particular host with respect to a suspect protocol such as SMB
(Windows File Sharing).
4.2.2 Detecting Beaconing
To evaluate our beaconing detection method as a proof-of-concept, we wrote a program
that simulated beaconing behavior and attempted to detect it using the heuristic described
in the previous chapter.
Our initial experimental setup for beaconing simulated ten hosts. Seven of them were
configured to initiate a session on the suspect channel periodically with a basic period of ten
seconds, and three were configured to initiate sessions randomly. Each host had a random
start time within ten seconds of the start of the experiment. In this scenario, all that was
necessary to achieve a good labeling of periodic versus non-periodic hosts was to ensure that the algorithm was allowed to consider a time window wide enough to observe enough
beacons to declare a host periodic.
To make things more realistic, we simulated both missed observations and two different
basic periods. To simulate missed observations, we modified the simulation for the host to
flip a coin every basic period, and only initiate a session on heads. We also made the basic
period for three of the hosts two seconds. We then ran the detector on all available data
every ten seconds.
Figure 4.7 shows the labeling results of a representative run of the above-described
experiment; on this particular run, an accurate labeling for all seven periodic hosts was
achieved after 14 ten-second time slices.
Figure 4.7: Periodicity detector labeling
The labeling results of the periodic behavior clustering algorithm (true positives, true negatives, false positives, and false negatives per time slice). p = 0.75, v = 0.1.
It took longer to achieve a good labeling for the hosts with the 10-second period than
the two-second period, and some of the random hosts at times were labeled periodic because
the uniform-random behavior looked, for a moment, enough like periodic behavior to make
the ruling, resulting in some false positives.
One detail of the algorithm is the parameter p, which specifies what proportion of the
population must be members of a cluster for it to be declared a “large” cluster, which is
the condition that must hold for a host to be declared periodic. Too small a p resulted in a
high number of false positives, while too large a p reduced or eliminated false positives but
stretched out the time window required to obtain an accurate labeling.
Another detail in the algorithm which became relevant on this modified experimental
run was how much variation v from a true multiple of the presumed basic period would
be tolerated when merging clusters that are considered multiples of one another; we found
that up to ten percent in either direction (for instance, merging a cluster with a mean of
4.19 with a cluster of mean 2) still gave good results.
Beaconing in the Wild
Our approach to detecting beaconing in the wild proceeds from the observation we made
during the data mining phase that hosts are likely to initiate repeat connections with a
relatively small number of hosts. That is, the session counts between a user machine and its email server, default search engine, "daily" websites, or botmaster are all likely to be high, while the session counts to most other hosts will be quite low.
We selected a host h (129.170.249.161) on the test network to check for beaconing
behaviors. We made a list of the 250 external hosts⁶ with which h initiated the most sessions between February and May of 2009. We ran the K-means++ [38] clustering algorithm on the sessions initiated from the analyzed host to each of the top-250 hosts and analyzed them for evidence of periodic behavior. We found strong evidence for periodic behavior between h and two of the 250 hosts on the list.

⁶ We note here one weakness of this approach, which is that if a pair of hosts communicate via different ports, some periodically and some at random, we might fail to detect it. A simple, but compute-intensive, workaround would be to do the same analysis broken down along suspect ports as well. Such a task would be easily scripted with NetFEE.
The first of the two hosts had an extremely large cluster with more than half of the
data set in membership. The cluster's mean was 898,061 ms, which we took as evidence for periodic behavior with a basic period of fifteen minutes. When we did a reverse-DNS lookup
of the remote host, we discovered it to be an email server. Fifteen minutes is a common
default interval in many mail clients for checking for new messages. Further investigation
and manual inspection of the session table for that host verified this intuition.
The second of the two hosts had far fewer sessions in total; it was ranked 240th in the top-250 table. Its largest cluster, again by far the largest, had a mean of about 20 hours. The only other cluster had a mean of about six days, related to periods we
identified when that machine was switched off (or assigned a different IP address). From the
large cluster, we inferred a basic period of 20 hours. A reverse-DNS lookup revealed the host
to be associated with Windows Update. Windows Update by default is configured to check
for updates once per day, but inspection of the session start times suggested that the actual
time-of-day of the check may be randomized. To explain why the mean of the periodic
cluster was shorter than 24 hours, we hypothesized that 1) the machine’s user probably
initiated updates manually on occasion, and 2) the smaller cluster became a catch-all for
any rapid-fire traffic, such as sessions related to actually downloading updates (as opposed
to merely checking for their availability). Since there were fewer than 100 sessions during
the investigated interval, we were able to verify these two hypotheses.
4.2.3 Detecting Synchronicity
Because the heuristic for detecting synchronicity so closely parallels the above algorithm for de-
tecting periodicity, we elected not to evaluate it in simulation. Instead, we attempted to
find synchronous behavior on the test network.
There are several types of synchronous behaviors for which we could search, including
distributed denial of service (many hosts all connecting to a target), port scanning (one host
connecting at once to a large number of ports on a particular machine to find vulnerable
services), and spreading (one infected machine attempting to connect to the same server
port on a large number of peers all at once, to try to exploit a vulnerable service). We
elected to search for spreading behavior.
Our investigation began as follows. We ranked the list of server ports by session count,
and clustered the top 250. We then clustered the interarrival times, this time looking for
large negative clusters. (Recall that large negative clusters in the interarrival time metric are a signal of sessions of similar duration initiated very close together in time.) On further
investigation, we found that on 2009-02-18 19:50:38, host 129.170.249.116 initiated
718 sessions over port 6667. While the behavior was indistinguishable from spreading, we
concluded based on two observations that it in fact was not: first, the connections were to
remote hosts, not local ones, and second, the connections were over a port commonly used
for IRC, which is not typically running as an exploitable service on user machines. Therefore,
we hypothesized that we were in fact observing malware checking in to a broad swath of
command-and-control servers. Checking a handful of the external hosts contacted against
the Emerging Threats malware blacklist [40] confirmed this hypothesis.
Chapter 5
Conclusions
During the course of this study, we built and documented a prototype network data collection
and behavior analysis system. The data collection system is suitable for exploring network
activity on arbitrarily fine- and coarse-grained scales, as well as generating input suitable
for both statistical and time-series analysis, as we have shown in the proof-of-concept ex-
perimentation. We learned a number of lessons about the practical implementation of such
a research platform, and have built a robust and stable system that we are confident will be
useful for further investigation of network behavior. We believe the metrics and algorithms
developed on the platform so far only begin to show the explanatory power of such an
approach.
Next, we explored several ways of characterizing and predicting network behavior in a
statistical sense, and applied change detection and time-series clustering methods to data
gathered.
Finally, we applied our research platform and methods to a real-world security problem,
developing methods for detecting subtle behavioral artifacts that, when observed together,
provide strong evidence for botnet infection and operation. We furthermore detected and
explained a few of those behaviors on a live network.
5.1 Evaluation of Methods
In the first set of experiments on change detection, we observed that sudden statistical
changes are detectable even in a near-zero-knowledge situation; a training period of fewer
than five observations was enough to make strong conclusions regarding sudden changes in
behavior. We also found that statistical “drift” was harder to detect, and noted several
difficulties with identifying slow shifts in behavior.
Next, we built detectors for beaconing and synchronization, two additional behaviors
typical to botnet-infected hosts; we traced out some of their strengths and weaknesses in
simulation, and tried them on the large dataset. We found that we needed to take particular
care in selecting the sensitivity parameters to the clustering heuristic to get good results,
but concluded that with some effort, time, and experience, reliable detectors could be built
for a number of botnet behavioral indicators.
One criticism of our approach specific to botnet detection is the contention that botnets
can evade detection by moving away from the IRC-based command-and-control architecture
we used as our model [41]. However, we believe that while IRC botnets may disappear, the
abstract model of tight, central control is very efficient and will survive. A change in the specific port numbers and protocols is immaterial, as what we are interested in is behavior.
Another possible evasion a botmaster can take would be reprogramming his bots to
beacon at randomized rather than periodic intervals. However, such a botnet would suffer
from decreased utility, as response time would be unpredictable. Also, while check-ins can
be made less periodic, most attacks, by nature, must be synchronous in order to work.
In 2.1.6, we briefly discussed the besetting difficulty of intent modeling; that is, the
idea that anomalous behavior and malicious behavior are not necessarily identical. As we
learned in the botnet detection challenge, each of the botnet-indicator features we identified
might stem from either benign or malicious intent; recall that both instances of beaconing
behavior we observed in the wild were in fact benign. Performing some kind of filtering (e.g., white-listing certain known-good classes of behavior) or inferencing (akin to Bayesian spam filtering: scoring a host based on the presence of a number of indicative features) would likely prove a good way to address the false positives that would be generated by a system that naïvely flagged all observations of botnet-like behavior.
Certainly, we have solved neither the problems of botnet detection nor network charac-
terization perfectly. We do believe that we have taken a significant, incremental step on the
spectrum from signature-based detection to behavior-based detection, and that is a step
we have taken away from the ultimately unwinnable “arms race” and towards an effective
solution.
5.2 Further Work
The true power of the network behavior analysis system we have made is its generality.
This study has opened a great many research questions about network and host behavior
in general. We present several ideas for further work.
Knowledge bases. The classical AI solution to the “drift” problem we encountered would
be to implement a knowledge-base representation of learned network behavior. With
our metrics easily machine-specified to NetFEE, an easy and fruitful next step would
be the automatic creation of a standard catalog of network indicators and a measured
baseline for them on the network monitored.
Automatic flow correlation. In the botnet detection application of this investigation,
we hand-explored the network space to find the particular network cross-sections and
metrics that worked for detecting botnet behavior. It was labor-intensive and full
of all manner of ad hockery. In order to cope with a bigger haystack and a smaller
needle, our methods would benefit from a method of automatically pruning flows
which are correlated to each other. Some work has already been done in this area;
Strayer et al. are clustering flows which are related by application, size, function, and
multicast [42]. Such methods would make the methods we have demonstrated here as
proof-of-concept fully viable.
Behavioral fingerprinting. We observed surprisingly stable, predictable behavior in the
session-based metrics we explored, and have already begun to speculate about other
application areas, such as behavioral host fingerprinting and comparing machine-
generated network traffic to human-generated traffic [43].
Bibliography
[1] http://www.snort.org.
[2] http://www.wireshark.org.
[3] Doug Madory. New methods of spoof detection in 802.11b wireless networking. Master’s
thesis, Dept. of Computer Science, Dartmouth College, June 2006.
[4] Mark Crovella and Azer Bestavros. Self-similarity in world wide web traffic: evidence
and possible causes. IEEE/ACM Trans. Netw., 5(6):835–846, 1997.
[5] B. Claise. Cisco systems netflow services export version 9. RFC 3954 (Informational),
October 2004.
[6] http://www.sflow.org.
[7] M. Allman, V. Paxson, and W. Stevens. Tcp congestion control. RFC 2581 (Proposed
Standard), April 1999. Updated by RFC 3390.
[8] Jian Yuan, Jian Wang, Zanxin Xu, and Bing Li. Time-dependent collective behavior
in a computer network model. Physica A: Statistical Mechanics and its Applications,
368(1):294 – 304, 2006.
[9] Zhaosheng Zhu, Guohan Lu, Yan Chen, Z.J. Fu, P. Roberts, and Keesook Han. Botnet
research survey. pages 967–972, July 28–Aug. 1, 2008.
[10] G.P. Schaffer. Worms and viruses and botnets, oh my! rational responses to emerging
internet threats. Security Privacy, IEEE, 4(3):52–58, May-June 2006.
[11] David Dagon, Guofei Gu, Christopher P. Lee, and Wenke Lee. A taxonomy of botnet
structures. Computer Security Applications Conference, Annual, 0:325–339, 2007.
[12] Srikanth Kandula, Dina Katabi, Matthias Jacob, and Arthur W. Berger. Botz-4-sale:
Surviving organized ddos attacks that mimic flash crowds. In 2nd Symposium on
Networked Systems Design and Implementation (NSDI), Boston, MA, May 2005.
[13] C. Shannon and D. Moore. The spread of the witty worm. Security Privacy, IEEE,
2(4):46–50, July-Aug. 2004.
[14] Neil Daswani and Michael Stoppelman. The anatomy of clickbot.a. In HotBots’07:
Proceedings of the first conference on First Workshop on Hot Topics in Understanding
Botnets, pages 11–11, Berkeley, CA, USA, 2007. USENIX Association.
[15] N. Ianelli and A. Hackworth. Botnets as a vehicle for online crime, December 2005.
[16] http://www.secureworks.com/research/threats/topbotnets.
[17] D.A. Keim, F. Mansmann, J. Schneidewind, and T. Schreck. Monitoring network traffic
with radial traffic analyzer. Symposium On Visual Analytics Science And Technology,
0:123–128, 2006.
[18] http://www.cs.ubc.ca/~spark343/NAV.pdf.
[19] Jonathan McPherson, Kwan-Liu Ma, Paul Krystosk, Tony Bartoletti, and Mar-
vin Christensen. Portvis: a tool for port-based detection of security events. In
VizSEC/DMSEC ’04: Proceedings of the 2004 ACM workshop on Visualization and
data mining for computer security, pages 73–81, New York, NY, USA, 2004. ACM.
[20] http://www.packetanalytics.com/.
[21] http://www.q1labs.com/.
[22] http://www.riverbed.com/products/cascade/.
[23] http://www.tippingpoint.com/.
[24] Christopher Roblee and George Cybenko. Implementing large-scale autonomic server
monitoring using process query systems. In ICAC ’05: Proceedings of the Second
International Conference on Automatic Computing, pages 123–133, Washington, DC,
USA, 2005. IEEE Computer Society.
[25] http://www.cisco.com/web/go/mars.
[26] http://www.lancope.com/products/.
[27] Moheeb Abu Rajab, Jay Zarfoss, Fabian Monrose, and Andreas Terzis. A multifaceted
approach to understanding the botnet phenomenon. In IMC ’06: Proceedings of the
6th ACM SIGCOMM conference on Internet measurement, pages 41–52, New York,
NY, USA, 2006. ACM.
[28] David Barroso. Botnets – the silent threat. Technical report, 2007.
[29] http://www.bothunter.net.
[30] http://www.tcpdump.org.
[31] http://www.postgresql.org.
[32] http://en.wikipedia.org/wiki/Kullback-Leibler_divergence.
[33] http://en.wikipedia.org/wiki/Student%27s_t-test.
[34] http://en.wikipedia.org/wiki/Kolmogorov-Smirnov_test.
[35] Tanya M Shenk and Alan B Franklin. Modeling in natural resource management :
development, interpretation, and application. Island Press, Washington, DC, 2001.
[36] T. Warren Liao. Clustering of time series data–a survey. Pattern Recognition, 38(11):1857–1874, November 2005.
[37] J. B. Macqueen. Some methods of classification and analysis of multivariate observa-
tions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and
Probability, pages 281–297, 1967.
[38] David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding.
Technical Report 2006-13, Stanford InfoLab, June 2006.
[39] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Intro-
duction to Algorithms, Second Edition. The MIT Press, September 2001.
[40] http://www.emergingthreats.net/rules/emerging-botcc.rules.
[41] Subhabrata Sen, Oliver Spatscheck, and Dongmei Wang. Accurate, scalable in-network
identification of p2p traffic using application signatures. In WWW ’04: Proceedings of
the 13th international conference on World Wide Web, pages 512–521, New York, NY,
USA, 2004. ACM.
[42] W. Timothy Strayer, Christine Jones, Beverly Schwartz, Sarah Edwards, Walter Mil-
liken, and Alden Jackson. Efficient multi-dimensional flow correlation. In LCN ’07:
Proceedings of the 32nd IEEE Conference on Local Computer Networks, pages 531–538,
Washington, DC, USA, 2007. IEEE Computer Society.
[43] Sally Floyd and Vern Paxson. Difficulties in simulating the internet. IEEE/ACM
Trans. Netw., 9(4):392–403, 2001.
This thesis brought to you by LaTeX and NoDoz.