Class Notes 10/29/01
Last Time:
- Asymptotic Independence, using SiZer analysis
- relationship between Size, Time and "Rate" = Size/Time
- Tail index estimation via slope of log-log CCDF
- Long Range Dependence vs. ARIMA(1)?
- Zooming Periodogram Analysis?
Cascaded On-Off Process
Context: Individual TCP Flows (Session traces)
- Recall Size of data transferred vs. Time
- From Lecture9-12-01, recall Zooming Flow Graphic
- Shows "behaviors at variety of scales"
- Packet level
- TCP Windowing
- Longer "Congestion Delays"
Cascaded On-Off Process (cont.)
Today's Goal:
"Simulation Reconstruction", i.e.
Find a model for individual TCP flows, that:
1. “Looks right”
2. Gives “correct” statistical properties (dependence, …)
3. Aggregates “correctly” (scaling, dependence, …)
4. Fits easily into queueing analysis
5. Makes “physical” sense
Cascaded On-Off Process (cont.)
A "standard approach":
Conservative Cascades
a. Good simple visual performance (esp. inverse cascade)
b. Parameter estimation failed
c. Wrong type of dependence structure in data
d. No physical insight
e. Explored by Krishanu Maulik
Cascaded On-Off Process (cont.)
Motivating Ideas:
- each packet is a “rapid burst” (on times)
- waiting times (off times) in between are very diverse (orders of magnitude different)
Cascaded On-Off Process (cont.)
Mathematical Formulation
I. Independent On – Off Processes, where each process is
- “on” (i.e. ) for exponential times, with rate
- “off” (i.e. ) for exponential times, with rate
Graphical Example (top row)
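A minimal sketch of step I for one process, to make the definition concrete. The symbols on the slide did not survive, so the names `rate_on`, `rate_off` and `horizon` are hypothetical; the sketch assumes "on" periods are exponential with rate `rate_on`, "off" periods are exponential with rate `rate_off`, and the process starts "on".

```python
import random


def simulate_on_off(rate_on, rate_off, horizon, rng):
    """Return the list of (start, end) "on" intervals on [0, horizon].

    Assumption: "on" durations ~ Exp(rate_on), "off" durations ~ Exp(rate_off),
    and the process starts in the "on" state at time 0.
    """
    intervals = []
    t = 0.0
    on = True
    while t < horizon:
        duration = rng.expovariate(rate_on if on else rate_off)
        end = min(t + duration, horizon)
        if on:
            intervals.append((t, end))
        t = end
        on = not on
    return intervals


def fraction_on(intervals, horizon):
    """Fraction of [0, horizon] spent in the "on" state."""
    return sum(end - start for start, end in intervals) / horizon


rng = random.Random(0)
horizon = 10000.0
intervals = simulate_on_off(rate_on=2.0, rate_off=1.0, horizon=horizon, rng=rng)
observed_fraction = fraction_on(intervals, horizon)
# Long-run fraction "on" should be close to rate_off / (rate_on + rate_off).
print(observed_fraction)
```

With these rates the mean "on" time is 1/2 and the mean "off" time is 1, so the long-run fraction "on" is about 1/3.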
Cascaded On-Off Process (cont.)
Mathematical Formulation (cont.)
II. Vary the “gap distribution” by multiplying:
III. Normalize to keep overall expected value the same:
Graphical Example (lower left)
Cascaded On-Off Process (cont.)
Mathematical Formulation (cont.)
IV. Cumulative has “physical interpretations”:
- constant “on rates” (reflects “link capacity”)
- session is “off” unless “all nodes are on”
- wide range of “off times”
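A rough sketch of steps I to IV together, under stated assumptions: the on-off processes are i.i.d. with 0/1 values, the product is divided by its stationary mean so the expected rate is 1, and the cumulative process is a Riemann sum on a time grid. All names (`levels`, `rate_on`, `rate_off`, `dt`) are hypothetical, since the slide's own formulas were lost.

```python
import random


def on_off_states(rate_on, rate_off, n_steps, dt, rng):
    """Sample a 0/1 on-off path at grid times 0, dt, 2*dt, ...

    Holding times are exponential; the start state is drawn from the
    stationary distribution, P(on) = rate_off / (rate_on + rate_off).
    """
    on = rng.random() < rate_off / (rate_on + rate_off)
    next_switch = rng.expovariate(rate_on if on else rate_off)
    states = []
    for k in range(n_steps):
        t = k * dt
        while next_switch <= t:
            on = not on
            next_switch += rng.expovariate(rate_on if on else rate_off)
        states.append(1 if on else 0)
    return states


def cascaded_on_off(levels, rate_on, rate_off, n_steps, dt, rng):
    """Return (rate path, cumulative path) for the normalized product."""
    p_on = rate_off / (rate_on + rate_off)
    norm = p_on ** levels  # stationary mean of the 0/1 product
    product = [1] * n_steps
    for _ in range(levels):
        path = on_off_states(rate_on, rate_off, n_steps, dt, rng)
        product = [a * b for a, b in zip(product, path)]  # step II: multiply
    rate = [x / norm for x in product]  # step III: normalize to mean 1
    cumulative = []
    total = 0.0
    for r in rate:  # step IV: cumulative process
        total += r * dt
        cumulative.append(total)
    return rate, cumulative


rng = random.Random(1)
rate, cumulative = cascaded_on_off(
    levels=3, rate_on=1.0, rate_off=4.0, n_steps=200000, dt=0.05, rng=rng
)
mean_rate = sum(rate) / len(rate)
print(mean_rate)  # should be close to 1 after normalization
```

The product is "on" only when every level is "on", which is what produces the wide range of "off" times while the "on" rate stays constant.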
Cascaded On-Off Process (cont.)
Simple simulated examples
to build insight:
E.g. 1: Increasing the “on” rate:
- shorter on times
- more flat parts
E.g. 2: Increasing the “off” rate:
- shorter off times
- more “diagonal”
E.g. 3: Increasing the cascade level:
- steeper c.d.f. & more diverse “off’s”
Cascaded On-Off Process (cont.)
Fit model to data:
Idea: use
- “peak rate” =
- number of packets in trace
- time stamp (secs) of packet
- size (bytes) of packet
to estimate parameters: , &
Cascaded On-Off Process (cont.)
Microscopic view of model:
- “packet transmission time” gives peak rate,
- drives “on” and “off” distribution relationship
- does peak rate affect “macroscopic shape”???
Cascaded On-Off Process (cont.)
Parameter Estimation 1:
For a given value of the level
1. “Get total size right”, i.e. est. the “mean rate”, , by
2. “Make jumps right”, i.e. est. the “mean on time”, , by
   = "Prop'n On" × "Time/Packet"
3. “Time conservation” gives the “mean off time”, , as:
   = "Time/Packet" - "Mean on time"
4. Solve rate equations to get:
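The four steps above can be sketched as follows. The slide's own estimators and rate equations did not survive, so everything here is an assumption: mean rate = total bytes / trace duration, "Prop'n On" = mean rate / peak rate, "Time/Packet" = duration / number of packets, and step 4 assumes a product of `levels` i.i.d. on-off processes (product "on" periods are exponential with rate `levels * rate_on`, and the product is "on" with probability p ** levels). All function and variable names are hypothetical.

```python
def estimate_rates(times, sizes, peak_rate, levels):
    """Moment-style estimates for a cascaded on-off model (sketch only).

    times: packet time stamps (secs), increasing
    sizes: packet sizes (bytes)
    peak_rate: assumed known peak rate (bytes/sec)
    levels: cascade level, treated as given
    """
    n = len(times)
    duration = times[-1] - times[0]

    # Step 1: "get total size right".
    mean_rate = sum(sizes) / duration

    # Step 2: mean on time = "Prop'n On" x "Time/Packet".
    time_per_packet = duration / n
    prop_on = mean_rate / peak_rate
    mean_on = prop_on * time_per_packet

    # Step 3: time conservation.
    mean_off = time_per_packet - mean_on

    # Step 4 (assumed rate equations for a product of i.i.d. processes):
    #   mean_on = 1 / (levels * rate_on)
    #   prop_on = (rate_off / (rate_on + rate_off)) ** levels
    rate_on = 1.0 / (levels * mean_on)
    p_single = prop_on ** (1.0 / levels)
    rate_off = rate_on * p_single / (1.0 - p_single)

    return {
        "mean_rate": mean_rate,
        "mean_on": mean_on,
        "mean_off": mean_off,
        "rate_on": rate_on,
        "rate_off": rate_off,
    }


# Made-up toy trace: 5 packets of 100 bytes, one per second.
est1 = estimate_rates([0.0, 1.0, 2.0, 3.0, 4.0], [100] * 5, peak_rate=1000.0, levels=1)
est2 = estimate_rates([0.0, 1.0, 2.0, 3.0, 4.0], [100] * 5, peak_rate=1000.0, levels=2)
print(est1)
print(est2)
```

For `levels=1` the solved rates reduce to 1 / mean on time and 1 / mean off time, which is a quick sanity check on the assumed equations.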
Cascaded On-Off Process (cont.)
Real Data Application:
Recall 10 Individual Session traces: Graphic
I. Untruncated Analysis: Graphic
- clearly should “truncate”
II. Truncated Analysis: Graphic
- looks promising???
III. Better View: “show the jumps”
Compare "no jump view" with "jumps included view"
- Only microscopic differences, but better to show?
IV. Try several values of : Graphic
- Looks promising, if can estimate well.
V. Analyze On & Off Distributions: Graphic
- “Off” Family and SiZer: Most very small, few large
- “Off” on log scale: usually 2 clusters
- “Off” not Weibull or Pareto
- “On” mostly = biggest value
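One simple way to check a sample of "off" times against the Weibull family is a Weibull probability plot: for Weibull data, log(-log(1 - F(x))) is linear in log(x) with slope equal to the shape parameter. The sketch below is a standard technique swapped in for illustration, not the SiZer analysis used in the graphics; the function name and the plotting positions (i + 0.5)/n are assumptions.

```python
import math
import random


def weibull_plot_fit(sample):
    """Least-squares line through the Weibull probability plot.

    Returns (slope, r_squared). The slope estimates the Weibull shape;
    an r_squared well below 1 suggests the sample is not Weibull.
    """
    xs = sorted(x for x in sample if x > 0)
    n = len(xs)
    u = [math.log(x) for x in xs]
    v = [math.log(-math.log(1.0 - (i + 0.5) / n)) for i in range(n)]
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    slope = s_uv / s_uu
    r_squared = s_uv ** 2 / (s_uu * s_vv)
    return slope, r_squared


# Simulated check: Weibull sample with scale 1 and shape 2.
rng = random.Random(1)
sample = [rng.weibullvariate(1.0, 2.0) for _ in range(5000)]
slope, r_squared = weibull_plot_fit(sample)
print(slope, r_squared)
```

Two clusters on the log scale, as noted for the "off" times, would show up as a clear bend or step in this plot.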
Cascaded On-Off Process (cont.)
Basis of Comparison:
Simulation: 10 realizations from Cascaded On-Off, then analyze On and Off Distributions
: Graphic
- On and Off both nearly exponential
: Graphic
- On = exp., Off is sharper Weibull
: Graphic
- On = exp., Off almost non-Weibull
: Graphic
- On = exp., Off not Weibull, lighter tails
Interesting View: Corresponding Weibulls
Main lesson: is a “shape parameter”???
Idea: Base estimation of on 2nd moments?
Cascaded On-Off Process (cont.)
Estimation of Cascade Level
Idea: “variance matching”
1. Consider range of values.
2. Estimate , as before.
3. Compute
4. Choose to “match” with sample var
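The matching loop can be sketched generically. Which variance the slide computes in step 3 did not survive, so the model variance is passed in as a callable; `choose_level`, `toy_model_variance` and the data are all hypothetical and only illustrate the selection step.

```python
import statistics


def choose_level(candidate_levels, model_variance, sample_variance):
    """Pick the level whose model variance is closest to the sample variance.

    model_variance: callable level -> variance implied by the fitted model
    (in practice this might require re-estimating parameters, or simulating,
    for each candidate level).
    """
    return min(
        candidate_levels,
        key=lambda level: abs(model_variance(level) - sample_variance),
    )


def toy_model_variance(level):
    """Placeholder only: NOT the model's real variance formula."""
    return float(level)


observed = [1.0, 3.0, 2.0, 6.0, 4.0, 8.0]  # made-up data
sample_var = statistics.variance(observed)
best_level = choose_level(range(1, 13), toy_model_variance, sample_var)
print(sample_var, best_level)
```

If each evaluation of the model variance needs a simulation, the loop cost grows with the number of candidate levels, which would be consistent with long computation times.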
Cascaded On-Off Process (cont.)
Application to usual 10 traces: Graphic
- Looks great?
- Estimate is 6 – 12
- Computation time: few minutes – 4 hours
- Peak Session 1 is worst. How bad???
- Still miss “shape”, e.g. “TCP Slow Start”??