Network Performance Measurement and Analysis
Measurement and Analysis Overview
• The size, complexity, and diversity of the Internet make it very difficult to understand cause-effect relationships
• Measurement is necessary for understanding current system behavior and how new systems will behave
  – How, when, where, and what do we measure?
• Measurement is meaningless without careful analysis
  – Analysis of data gathered from networks is quite different from work done in other disciplines
• Measurement and analysis enable models to be built that can be used to develop and evaluate new techniques effectively
  – Statistical models
  – Queuing models
  – Simulation models
Determining What to Measure
• Before any measurements can take place, one must determine what to measure
• There are many commonly used network performance characteristics
  – Latency
  – Throughput
  – Response time
  – Arrival rate
  – Utilization
  – Bandwidth
  – Loss
  – Routing
  – Reliability
Measurement Introduction
• Internet measurement is done either to analyze and characterize network phenomena or to test new tools, protocols, systems, etc.
• Measuring Internet performance is easier said than done
  – What does “performance” mean?
  – Workload selection (what and where you are measuring) is critical
• Reproducibility is often essential
• Many tools have been developed to measure and monitor general characteristics of network performance
  – traceroute and ping are two of the most popular
• These are examples of active measurement tools
  – Passive tools are the other major category
• Representative and reproducible workload generation will be a focus
Active Measurement Tools
• Send probe packet(s) into the network and measure a response
  – Ping: RTT and loss
    • Zing: one-way Poisson probes
  – Traceroute: path and RTT
  – Nettimer (Lai): latest bottleneck bandwidth using the packet-pair method
  – Pathchar: per-hop bandwidth, latency, and loss measurement
    • Pchar, clink: open-source reimplementations of pathchar
• Problem: measurement timescales vary widely
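Two of the ideas above can be sketched in a few lines: Poisson probe spacing (the idea behind Zing) and the packet-pair bandwidth estimate (the idea behind Nettimer). This is a minimal sketch, not code from either tool. The function names and parameters are hypothetical, and it only computes send times and an estimate; it does not send packets.

```python
import random


def poisson_probe_times(rate_per_s, duration_s, seed=None):
    """Return probe send times (seconds) with exponentially distributed gaps.

    Exponential gaps between probes yield a Poisson probing process.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)
    return times


def packet_pair_bandwidth(packet_size_bytes, dispersion_s):
    """Estimate bottleneck bandwidth (bits/s) from packet-pair dispersion.

    Two packets sent back to back are spread apart by the bottleneck link;
    the gap between their arrivals is roughly the time the bottleneck needs
    to transmit one packet.
    """
    if dispersion_s <= 0:
        raise ValueError("dispersion must be positive")
    return packet_size_bytes * 8 / dispersion_s
```

For example, 1500-byte packets arriving 1.2 ms apart suggest a bottleneck of about 10 Mbit/s. In practice cross traffic distorts the gap, so many pairs are needed and the estimates must be filtered.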
Passive Measurement Tools
• Passive tools: capture data as it passes by
  – Logging at the application level
  – Packet capture applications (tcpdump) use a packet capture filter (BPF, libpcap)
    • Requires access to the wire
    • Can have many problems (adds, deletes, reordering)
  – Flow-based measurement tools
  – SNMP tools
  – Routing looking glass sites
• Problems
  – LOTS of data!
  – Privacy issues
  – Getting packet scopes into the backbone of the network
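The flow-based idea above can be illustrated by grouping captured packets into flows keyed by the usual 5-tuple. This is a minimal sketch under stated assumptions: the packet record format (a dict with src, dst, sport, dport, proto, and length fields) is hypothetical and does not correspond to the output of any particular capture tool.

```python
from collections import defaultdict


def aggregate_flows(packets):
    """Group packet records into flows and count packets and bytes per flow.

    Each packet is assumed to be a dict with the keys
    src, dst, sport, dport, proto, and length (bytes).
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += pkt["length"]
    return dict(flows)
```

Keeping per-flow counters instead of every packet is one way to cope with the data volume problem, at the cost of losing per-packet timing.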
Workload Generation
• Local and/or wide-area experiments often require representative and reproducible workloads
• How do we select a workload?
  – Currently HTTP makes up the majority of Internet traffic
• Trace-based workloads
  – Capture traces and replay them
  – Black-box method
• Synthetic workloads
  – Abstraction of actual operation
  – May not capture all aspects of the workload
• Analytic workloads
  – Attempt to model the workload precisely
  – Very difficult
SURGE Web Workload Generator
• Scalable URL Reference Generator
  – Analytic workload generator
  – Based on 12 empirically derived distributions of Web browsing behavior
  – Explicit, parameterized models
  – Captures “heavy-tailed” (highly variable) properties of Web workloads
  – Widely used
• SURGE components:
  – Statistical distribution generator
  – Hypertext Transfer Protocol (HTTP) request generator
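To make "heavy-tailed" concrete, the sketch below draws synthetic file sizes from a Pareto distribution by inverse-transform sampling. It is an illustration only, not SURGE's generator: the function names, the shape parameter alpha = 1.2, and the minimum size of 1000 bytes are assumptions chosen for the example.

```python
import random
import statistics


def pareto_sample(alpha, x_min, rng):
    """Draw one Pareto(alpha, x_min) value by inverse-transform sampling."""
    u = 1.0 - rng.random()  # uniform in (0, 1], avoids division by zero
    return x_min / (u ** (1.0 / alpha))


def sample_file_sizes(n, alpha=1.2, x_min=1000.0, seed=0):
    """Return n synthetic file sizes (bytes) with a heavy upper tail."""
    rng = random.Random(seed)
    return [pareto_sample(alpha, x_min, rng) for _ in range(n)]


if __name__ == "__main__":
    sizes = sample_file_sizes(10_000)
    # In a heavy-tailed sample a few huge values dominate:
    # the maximum and the mean sit far above the median.
    print("median:", statistics.median(sizes))
    print("mean:  ", statistics.mean(sizes))
    print("max:   ", max(sizes))
```

With a shape parameter below 2 the distribution has infinite variance, which is why sample means from such workloads converge slowly and a handful of very large transfers can dominate the byte count.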
Workload characteristics captured in SURGE
SURGE Architecture
Analyzing Measured Data
• Analyzing measured data in networks is typically done using statistical methods
  – Selecting appropriate analysis method(s) is critical
    • Averaging
    • Dispersion (variability)
    • Correlations
    • Regression analysis
    • Distributional analysis
    • Frequency analysis
    • Principal-component analysis
    • Cluster analysis
• Each form of analysis has strengths and weaknesses
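The first three methods in the list (averaging, dispersion, correlation) can be sketched with the Python standard library. The helper names and the sample RTT values are made up for illustration; they are not measurements.

```python
import math
import statistics


def summarize(samples):
    """Return averaging and dispersion statistics for a list of samples."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation
    return {
        "mean": mean,
        "median": statistics.median(samples),
        "stdev": stdev,
        "cov": stdev / mean,  # coefficient of variation
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    rtts_ms = [10, 12, 11, 13, 200]  # hypothetical RTTs with one outlier
    print(summarize(rtts_ms))
```

The outlier shows one weakness of averaging: a single 200 ms sample pulls the mean to 49.2 ms while the median stays at 12 ms, so the choice of summary statistic changes the conclusion.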
Queuing Models
• One of the key modeling techniques for computer systems in general
  – Vast literature on queuing theory
  – Nicely suited for network analysis
  – Prof. Mary Vernon is our local expert
• Generally, queuing systems deal with a situation where jobs (of which there are many) wait in line for a resource (of which there are few)
  – Queuing theory can enable us to determine response time
  – Examples?
• Example: packets arriving at a router
  – How can we determine how long it takes for packets to be forwarded by the router?
• Characteristics necessary to specify a queuing system
  – Arrival process
  – Service time distribution
  – Number of servers
  – System capacity (number of buffers)
  – Population size
  – Service discipline
  – Kendall notation: A/S/m/B/K/SD
• Response time = waiting time + service time
• For stability, the mean arrival rate must be less than the mean service rate
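As a worked instance of the router example, the sketch below uses the simplest textbook case, an M/M/1 queue (Poisson arrivals, exponential service times, one server, unlimited buffers, first-come first-served). That model is an assumption made here for illustration; the slides do not specify one. For M/M/1 the mean response time is 1 / (service rate − arrival rate). The function name is hypothetical.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Return mean utilization, waiting, service, and response time for M/M/1.

    Rates are in jobs (packets) per second; times are in seconds.
    """
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        # Stability condition: arrivals must be slower than service.
        raise ValueError("unstable: arrival rate must be below service rate")
    service_time = 1.0 / service_rate
    response_time = 1.0 / (service_rate - arrival_rate)
    return {
        "utilization": arrival_rate / service_rate,
        "service_time": service_time,
        "waiting_time": response_time - service_time,
        "response_time": response_time,
    }


if __name__ == "__main__":
    # Hypothetical router: forwards 100 packets/s, receives 80 packets/s.
    print(mm1_metrics(80.0, 100.0))
```

With these made-up rates the router is 80% utilized and a packet spends 50 ms in the system on average, 40 ms of it waiting, which shows how quickly queuing delay dominates as utilization approaches 1.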
Little’s Law
• One of the most basic theorems in queuing theory (1961)
• Mean number of jobs in system = arrival rate × mean response time
  – Treats a system as a black box
  – Applies whenever the number of jobs entering the system equals the number of jobs leaving the system
    • No jobs created or lost inside the system
  – Can be extended to include systems with finite buffers
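The law can be checked on a small trace by computing both sides independently: the time-averaged number of jobs in the system, and arrival rate times mean response time. This is a minimal sketch; the function names and the three-job trace are made up, and the trace is chosen so that every job that enters also leaves within the observation window.

```python
def time_average_in_system(jobs, horizon):
    """Time-averaged number of jobs in the system over [0, horizon].

    jobs is a list of (arrival_time, departure_time) pairs.
    """
    events = []
    for arrival, departure in jobs:
        events.append((arrival, +1))
        events.append((departure, -1))
    events.sort()
    area = 0.0   # integral of (jobs in system) over time
    count = 0    # jobs currently in the system
    last = 0.0   # time of the previous event
    for t, delta in events:
        area += count * (t - last)
        count += delta
        last = t
    return area / horizon


def littles_law_sides(jobs, horizon):
    """Return (mean jobs in system, arrival rate * mean response time)."""
    arrival_rate = len(jobs) / horizon
    mean_response = sum(d - a for a, d in jobs) / len(jobs)
    return time_average_in_system(jobs, horizon), arrival_rate * mean_response


if __name__ == "__main__":
    trace = [(0, 2), (1, 4), (3, 5)]  # hypothetical (arrival, departure) times
    print(littles_law_sides(trace, horizon=5))
```

For this trace both sides come to 1.4 jobs: three jobs in 5 time units give an arrival rate of 0.6, and the mean response time is 7/3.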
Simulation Models
• Simulation is one of the most common and important methods of analysis and modeling
  – Typically an abstraction of the system under consideration
  – Can provide significant insight into a system's behavior
• Network simulation is difficult because of the different layers of operation and the complexity at each layer
• Simulation options: build your own or use someone else's
• The canonical network simulator is ns, developed at LBL
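As a small instance of the "build your own" option, the sketch below simulates a single first-come first-served server with exponentially distributed interarrival and service times. It is a deliberately tiny abstraction and has nothing to do with how ns works; the function name, parameters, and distributions are assumptions made for illustration.

```python
import random


def simulate_single_server(arrival_rate, service_rate, n_jobs, seed=0):
    """Simulate a FIFO single-server queue and return the mean response time."""
    rng = random.Random(seed)
    clock = 0.0           # arrival time of the current job
    server_free_at = 0.0  # time the server finishes the previous job
    total_response = 0.0
    for _ in range(n_jobs):
        clock += rng.expovariate(arrival_rate)    # next arrival
        start = max(clock, server_free_at)        # wait if the server is busy
        server_free_at = start + rng.expovariate(service_rate)
        total_response += server_free_at - clock  # waiting + service time
    return total_response / n_jobs


if __name__ == "__main__":
    # Hypothetical rates: 0.5 arrivals/s against a service rate of 1 job/s.
    print(simulate_single_server(0.5, 1.0, 200_000, seed=1))
```

Under these assumptions queuing theory predicts a mean response time of 1 / (service rate − arrival rate) = 2 seconds, so the simulated value should land close to 2. Comparing a simulator against a known analytic result like this is a common sanity check before trusting it on cases with no closed form.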