Course  OR 778

Class Notes   10/29/01




Last Time:

    -    Asymptotic Independence, using SiZer analysis

    -    relationship between Size, Time and "Rate" = Size/Time

    -    Tail index estimation via slope of log-log CCDF

    -    Long Range Dependence vs. ARIMA(1)?

    -    Zooming Periodogram Analysis?
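
A minimal sketch of the "slope of log-log CCDF" idea from the list above, assuming an ordinary least-squares line through the upper tail of the empirical CCDF. The function name and the `tail_fraction` parameter are illustrative choices, not taken from the earlier lecture.

```python
import numpy as np

def tail_index_from_ccdf(sample, tail_fraction=0.1):
    """Estimate a tail index as minus the slope of the log-log empirical CCDF.

    tail_fraction (an assumed tuning choice) is the share of the largest
    observations used in the straight-line fit.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    # Empirical P(X > x_i) at the sorted points; the largest point has CCDF 0.
    ccdf = (n - np.arange(1, n + 1)) / n
    k = max(int(n * tail_fraction), 2)
    # Use the k points just below the maximum, so that log(ccdf) is finite.
    xs, ps = x[-k - 1:-1], ccdf[-k - 1:-1]
    slope, _intercept = np.polyfit(np.log(xs), np.log(ps), 1)
    return -slope
```

The sample must be positive and contain more than k + 1 points. The estimate is sensitive to the choice of `tail_fraction`.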
 
 
 


Cascaded On-Off Process



Context:  Individual TCP Flows (Session traces)
 

    -    Recall Size of data transferred vs. Time
 

    -    From Lecture9-12-01, recall Zooming Flow Graphic
 

    -    Shows "behaviors at a variety of scales"

            -    Packet level

            -    TCP Windowing

            -    Longer "Congestion Delays"
 
 
 


Cascaded On-Off Process (cont.)



Today's Goal:    "Simulation Reconstruction",  i.e.
 


Find a model for individual TCP flows that:



1. “Looks right”
 

2. Gives “correct” statistical properties (dependence, …)
 

3. Aggregates “correctly” (scaling, dependence, …)
 

4. Fits easily into queueing analysis.
 

5. Makes “physical” sense
 
 
 


Cascaded On-Off Process (cont.)



A "standard approach":  Conservative Cascades
 

a. Good simple visual performance (esp. inverse cascade)
 

b. Parameter estimation failed
 

c. Wrong type of dependence structure in data
 

d. No physical insight
 

e. Explored by Krishanu Maulik
 
 
 


Cascaded On-Off Process (cont.)




Motivating Ideas:
 
 

    -    each packet is a “rapid burst” (on times)
 
 

    -    waiting times (off times) in between are very diverse (orders of magnitude different)










Cascaded On-Off Process (cont.)




Mathematical Formulation
 

I. Independent On-Off Processes,

where each process is


    “on” (i.e. equal to 1)  for exponential times, with an “on” rate parameter


    “off” (i.e. equal to 0)  for exponential times, with an “off” rate parameter
 

Graphical Example (top row)
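
A minimal simulation sketch of a single on-off process as described in I, assuming the process takes value 1 when "on" and 0 when "off" and is started from its stationary distribution. The names `rate_on`, `rate_off` and `horizon` are illustrative, not the notation of the notes.

```python
import numpy as np

def simulate_on_off(rate_on, rate_off, horizon, rng):
    """Simulate one on-off process on [0, horizon).

    "On" periods are exponential with rate rate_on (mean 1/rate_on),
    "off" periods are exponential with rate rate_off.
    Returns the switch times (starting at 0) and the state held from each time.
    """
    # Assumed stationary start: P(on) = mean on / (mean on + mean off).
    state = int(rng.random() < rate_off / (rate_on + rate_off))
    t = 0.0
    times, states = [t], [state]
    while True:
        rate = rate_on if state == 1 else rate_off
        t += rng.exponential(1.0 / rate)
        if t >= horizon:
            break
        state = 1 - state
        times.append(t)
        states.append(state)
    return np.array(times), np.array(states)
```

For example, `simulate_on_off(2.0, 0.5, 50.0, np.random.default_rng(1))` gives short "on" periods (mean 0.5) separated by longer "off" periods (mean 2).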
 
 
 


Cascaded On-Off Process (cont.)




Mathematical Formulation (cont.)
 

II.  Vary the “gap distribution” by multiplying:




III.  Normalize to keep overall expected value the same:



Graphical Example (lower left)
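
Steps II and III can be sketched as follows, under two assumptions that are not confirmed by the notes: every component process is 0/1 valued with the same pair of rates, and "the same expected value" means the expected value of a single component. All names are illustrative.

```python
import numpy as np

def on_off_on_grid(rate_on, rate_off, grid, rng):
    """0/1 values of one stationary on-off process at the sorted times in grid."""
    p_on = rate_off / (rate_on + rate_off)
    initial = int(rng.random() < p_on)
    state, t, switches = initial, 0.0, []
    while t <= grid[-1]:
        t += rng.exponential(1.0 / (rate_on if state == 1 else rate_off))
        switches.append(t)
        state = 1 - state
    # The state flips once per switch that has occurred by each grid time.
    n_switches = np.searchsorted(switches, grid, side="right")
    return (initial + n_switches) % 2

def cascaded_on_off(level, rate_on, rate_off, grid, rng):
    """Product of `level` independent on-off processes, rescaled (step III)."""
    p_on = rate_off / (rate_on + rate_off)
    product = np.ones(len(grid), dtype=np.int64)
    for _ in range(level):
        product *= on_off_on_grid(rate_on, rate_off, grid, rng)
    # Assumed normalization: E[product] = p_on**level, so dividing by
    # p_on**(level - 1) restores the single-process mean p_on.
    return product / p_on ** (level - 1)
```

With this normalization the "on" height grows with the level while the long-run mean stays fixed, so the product is "on" less often but at a higher value.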
 
 
 


Cascaded On-Off Process (cont.)




Mathematical Formulation (cont.)
 
 

IV.  The cumulative process has “physical interpretations”:
 

    -    constant “on rates” (reflects “link capacity”)
 

    -    session is “off” unless “all nodes are on”
 

    -    wide range of “off times”
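
A small sketch of the cumulative view, assuming the node states are sampled on an equally spaced time grid and that data flows at a constant `peak_rate` only while every node is on. The function and argument names are hypothetical.

```python
import numpy as np

def cumulative_transfer(node_states, peak_rate, dt):
    """Cumulative size transferred on a time grid with spacing dt.

    node_states: array of shape (n_nodes, n_steps) with 0/1 entries.
    The session sends at peak_rate only when all nodes are on.
    """
    node_states = np.asarray(node_states)
    all_on = node_states.min(axis=0)  # 1 only where every node is on
    return np.cumsum(all_on * peak_rate * dt)

# Two nodes, four time steps: both are on only at the first and last step.
states = [[1, 1, 0, 1],
          [1, 0, 1, 1]]
print(cumulative_transfer(states, peak_rate=2.0, dt=0.5))  # [1. 1. 1. 2.]
```

The flat stretch in the output is an "off time" of the session, and the steps of height peak_rate * dt are the constant-rate "on" parts.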
 
 
 


Cascaded On-Off Process (cont.)




Simple simulated examples to build insight:
 
 

E.g. 1:    Increasing the “on” rate:    shorter on times  →  more flat parts
 

E.g. 2:    Increasing the “off” rate:    shorter off times  →  more “diagonal”
 

E.g. 3:    Increasing the cascade level:    steeper c.d.f.  &  more diverse “off’s”
 
 
 


Cascaded On-Off Process (cont.)




Fit model to data:
 

Idea:  use

    -    “peak rate” = 

    -   number of packets in trace

    -   time stamp (secs) of the i-th packet

    -   size (bytes) of the i-th packet
 

to estimate the two model parameters
 
 
 


Cascaded On-Off Process (cont.)




Microscopic view of model:
 

Toy Graphic
 

    -    “packet transmission time” gives the peak rate


    -    peak rate drives the relationship between the “on” and “off” distributions
 

    -    does peak rate affect “macroscopic shape”???
 
 
 


Cascaded On-Off Process (cont.)




Parameter Estimation 1:
 

For a given value of the cascade level
 

1.    “Get total size right”,  i.e. estimate the “mean rate”  by



2.    “Make jumps right”,  i.e. estimate the “mean on time”  by

=  "Prop'n On"  ×  "Time/Packet"



3.    “Time conservation” gives the “mean off time”  as:

=  "Time/Packet"  -  "Mean on time"



4.    Solve rate equations to get:
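
Steps 1 to 3 above can be sketched as follows. The formulas are one possible reading of the verbal descriptions: it is assumed here that the mean rate is total bytes over trace duration, that "Prop'n On" is mean rate over peak rate, and that "Time/Packet" is trace duration over number of packets. Step 4 and the dependence on the cascade level are not reproduced, and all names are hypothetical.

```python
import numpy as np

def estimate_on_off_means(timestamps, sizes, peak_rate):
    """Return (mean_rate, mean_on_time, mean_off_time) for one trace.

    timestamps: packet time stamps in seconds, increasing.
    sizes: packet sizes in bytes.
    peak_rate: peak rate in bytes per second, supplied by the caller.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    duration = timestamps[-1] - timestamps[0]
    # Step 1 (assumed form): total size divided by total time.
    mean_rate = sizes.sum() / duration
    # Assumed: fraction of time the session is sending at the peak rate.
    prop_on = mean_rate / peak_rate
    time_per_packet = duration / len(sizes)
    # Step 2: mean on time = "Prop'n On" x "Time/Packet".
    mean_on = prop_on * time_per_packet
    # Step 3: time conservation.
    mean_off = time_per_packet - mean_on
    return mean_rate, mean_on, mean_off
```

For instance, four 100-byte packets over 4 seconds with a peak rate of 400 bytes/sec give a mean rate of 100, a mean on time of 0.25 and a mean off time of 0.75.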





Cascaded On-Off Process (cont.)



Real Data Application:
 

Recall 10 Individual Session traces:    Graphic
 
 

I.    Untruncated Analysis:    Graphic

    -    clearly should “truncate”
 
 

II.    Truncated Analysis:    Graphic

    -    looks promising???
 
 

III.    Better View: “show the jumps”
 

Compare "no jump view" with "jumps included view"

"Jump view over all traces"


    -    Only microscopic differences, but better to show?
 
 

IV.    Try several values of the cascade level:    Graphic
 

    -    Looks promising, if the cascade level can be estimated well.
 
 

V.    Analyze On & Off Distributions:    Graphic

    -    “Off” Family and SiZer:  Most very small, few large

    -    “Off” on log scale:  usually 2 clusters

    -    “Off” not Weibull or Pareto

    -    “On” mostly = biggest value
 
 
 


Cascaded On-Off Process (cont.)



Basis of Comparison:
 

Simulation:  10 realizations from Cascaded On-Off,
 

then analyze On and Off distributions



:    Graphic

    -    On  and  Off  both nearly exponential
 

:    Graphic

   -    On = exp.,   Off is sharper Weibull
 

:    Graphic

    -    On = exp.,   Off almost non-Weibull
 

:    Graphic

    -    On = exp.,   Off not Weibull, lighter tails
 
 

Interesting View:    Corresponding Weibulls
 
 

Main lesson:    the cascade level is a “shape parameter”???
 
 

Idea:    Base estimation of the cascade level on 2nd moments?
 
 
 


Cascaded On-Off Process (cont.)



Estimation of Cascade Level  
 

Idea:  “variance matching”
 

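1. Consider a range of cascade level values.
 

2. Estimate the two model parameters as before.
 

3. Compute the corresponding model variance.
 

4. Choose the cascade level to “match” the model variance with the sample variance.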
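
The search in steps 1 to 4 can be sketched generically as a grid search. The model variance is passed in as a function, since its formula is not given here, and "match" is assumed to mean smallest absolute difference. All names are hypothetical.

```python
import numpy as np

def choose_level(levels, model_variance, sample_variance):
    """Pick the candidate level whose model variance is closest to the sample one.

    levels: iterable of candidate cascade levels.
    model_variance: function mapping a level to the model-implied variance,
        (assumed to re-estimate the other parameters for that level).
    """
    levels = list(levels)
    gaps = [abs(model_variance(level) - sample_variance) for level in levels]
    return levels[int(np.argmin(gaps))]

# Toy illustration only: a made-up variance curve, not the model's formula.
print(choose_level(range(1, 11), lambda level: level ** 2, 30.0))  # 5
```

If the model variance has no closed form, `model_variance` could instead average the sample variance over simulated realizations at each level, at a much higher computational cost.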
 
 
 


Cascaded On-Off Process (cont.)



Application to usual 10 traces:    Graphic
 

    -    Looks great?
 

    -    Estimated cascade level is 6 – 12
 

    -    Computation time:  few minutes – 4 hours
 

    -    Peak Session 1 is worst. How bad???
 

    -    Still miss “shape”,   e.g. “TCP Slow Start”??