Course  OR 778

Class Notes   10/10/01



Last Time:

    -    in general context of:

Heavy tailed durations  =>  Long Range Dependence



    -    quick overview of extreme value theory

    -    revisited simple model assumptions

    -    Downey's argument for log normal

    -    "Important" open problem
 
 
 


Recall Simple Model







Intuitive basis:  Mice and Elephants plot from before:

graphic








Model:
 

a.    "line starts"  as homogeneous Poisson process
 

b.    durations (i.e. "line lengths") as i.i.d. from some dist'n
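
A minimal simulation sketch of parts a. and b., for intuition only. The rate, time horizon and the Pareto duration distribution are hypothetical choices (the notes only say "some dist'n"), and the function names are made up for illustration.

```python
import numpy as np

def simulate_simple_model(rate, horizon, draw_durations, rng):
    """Homogeneous Poisson line starts on [0, horizon] with i.i.d. line lengths."""
    n = rng.poisson(rate * horizon)                  # number of starts in the window
    starts = np.sort(rng.uniform(0.0, horizon, n))   # given n, starts are uniform
    durations = draw_durations(n, rng)               # i.i.d. durations
    return starts, durations

def active_count(starts, durations, t):
    """Number of lines that cover time t."""
    return int(np.sum((starts <= t) & (t < starts + durations)))

rng = np.random.default_rng(0)
# Assumed heavy tailed durations: Pareto with tail index 1.5 and minimum value 1
pareto = lambda n, rng: 1.0 + rng.pareto(1.5, n)
starts, durations = simulate_simple_model(rate=5.0, horizon=1000.0,
                                          draw_durations=pareto, rng=rng)
print(len(starts), active_count(starts, durations, 500.0))
```

Counting active lines on a grid of time points gives the "counts at discrete time points" variation.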
 
 

Possible variations:

    -    binned data

    -    counts at discrete time points

    -    discrete time

    -    Weibull process starts  (aggregate indep. Weibull inter-arr's)

    -    Clustered Poisson starts
 
 
 


Simple Model (cont.)







Nomenclature:  some names:
 

    -    "Model with M/G/∞ input"
 

    -    Fluid queue with Poisson input
 
 

Interesting survey paper:
 

Resnick, S. and Samorodnitsky, G. (2000) Fluid queues, on-off processes and teletraffic modelling with highly variable and correlated inputs, in Self Similar Network Traffic and Performance Evaluation, Eds: A. Erramilli, W. Willinger and C. Park, 171-192.
 
 


Deeper Exploration of Model Assumptions







Recall 2 views (shown earlier):
 

1. SiZer analysis of homogeneity (constant intensity) of flow starts

off-peak






    -    clearly not homogeneous (lots of red & blue regions)

    -    but not so "rough" as packet level analysis?

all packets off-peak

    -    homogeneous Poisson OK as an approximation????
 
 

Aside:    all of these have far more SiZer structure than a

simulated homogeneous Poisson process









Deeper Exploration of Model Assumptions (cont.)







2.    QQ investigation of interarrival times between session starts
 

2 a.    Exponential QQ, scale est'd by sample mean

Offpeak

    -    unacceptable fit

    -    especially slope

    -    suggesting wrong shape parameter

    -    these parameters were used in above SiZer simulation
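
A rough sketch of how an Exponential QQ plot with scale estimated by the sample mean can be computed. The plotting positions (i - 0.5)/n are an assumption (the notes do not say which were used), and the data here are simulated, not the trace data.

```python
import numpy as np

def exponential_qq(x):
    """Return (theoretical, empirical) quantiles for an Exponential QQ plot."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    p = (np.arange(1, n + 1) - 0.5) / n      # assumed plotting positions
    scale = x.mean()                         # scale est'd by sample mean
    theo = -scale * np.log1p(-p)             # Exponential quantile function
    return theo, x

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=5000)   # simulated stand-in for interarrivals
theo, data = exponential_qq(sample)
slope = np.polyfit(theo, data, 1)[0]             # near 1 when the exponential fits
print(slope)
```

For truly exponential data the points fall near the 45 degree line; a slope far from 1 is the kind of misfit noted above.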
 
 
 


Deeper Exploration of Model Assumptions (cont.)







2 b.    Weibull QQ, parameters est'd by quantile matching

Offpeak

    -    very good fit

    -    shape parameter 0.9 "close to Poisson 1.0"?

    -    provides a workable approximation????
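
One possible form of quantile matching, sketched under assumptions: two sample quantiles are matched to the Weibull quantile function Q(p) = scale * (-log(1 - p))^(1/shape). The choice of the 0.25 and 0.75 quantiles is hypothetical; the notes do not say which quantiles were matched.

```python
import numpy as np

def weibull_quantile_match(x, p1=0.25, p2=0.75):
    """Solve for (shape, scale) so the Weibull p1 and p2 quantiles match the sample."""
    q1, q2 = np.quantile(x, [p1, p2])
    a1 = np.log(-np.log1p(-p1))              # log(-log(1 - p1))
    a2 = np.log(-np.log1p(-p2))
    # log Q(p) = log(scale) + (1/shape) * log(-log(1 - p)), a line in a1, a2
    shape = (a2 - a1) / (np.log(q2) - np.log(q1))
    scale = q1 / (-np.log1p(-p1)) ** (1.0 / shape)
    return shape, scale

rng = np.random.default_rng(2)
sample = 0.015 * rng.weibull(0.9, size=20000)    # simulated Weibull(0.9) data
shape_hat, scale_hat = weibull_quantile_match(sample)
print(shape_hat, scale_hat)
```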
 
 

A possibility:    improve model by replacing "Poisson Process"

by "Weibull(0.9) waiting time process"









Weibull inter-arrival model







Repeat above analysis, for Weibull interarrival process,

parameters (0.8986, 0.01479) from above Q-Q







1. SiZer analysis of homogeneity  (graphic)

    -    No inhomogeneity visible

    -    i.e. no "strong evidence"    (null hypo. not rejected)

    -    Suggests Weibull interarrivals not an explanation

(of inhomogeneity)?

    -    Weibull interarrival process same as Poisson?

    -    or only "nearly same", since 0.90 ~ 1 ??

    -    smaller Weibull shape parameter will produce

"non-Poisson clumping"?









Weibull inter-arrival model (cont.)







Aside:    explore smaller (more extreme) Weibull shape parameter
 

Weibull(0.1)

    -    scale parameter chosen to "give approx. mean"

    -    here and below

    -    large amount of inhomogeneity

    -    so small enough Weibull shape clearly different from Poisson
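
A sketch of how the scale can be chosen to give a target mean: a Weibull with given shape and scale has mean scale * Gamma(1 + 1/shape). The target mean below assumes the fitted pair (0.8986, 0.01479) is ordered (shape, scale); that ordering is an assumption, as are the function names.

```python
import math
import numpy as np

def weibull_scale_for_mean(shape, target_mean):
    """Scale that makes a Weibull(shape) distribution have the target mean."""
    return target_mean / math.gamma(1.0 + 1.0 / shape)

def weibull_arrivals(shape, target_mean, n, rng):
    """Arrival times of a renewal process with Weibull inter-arrival times."""
    scale = weibull_scale_for_mean(shape, target_mean)
    gaps = scale * rng.weibull(shape, size=n)
    return np.cumsum(gaps)

rng = np.random.default_rng(3)
# Mean implied by the fitted parameters, assuming the order (shape, scale)
target_mean = 0.01479 * math.gamma(1.0 + 1.0 / 0.8986)
for shape in (0.1, 0.4, 0.45, 0.5):
    print(shape, weibull_scale_for_mean(shape, target_mean))

arrivals = weibull_arrivals(0.5, target_mean, 200000, rng)
print(arrivals[-1] / len(arrivals))   # average gap, close to target_mean
```

For very small shapes such as 0.1 the sample mean converges slowly, which may be why the notes only claim an approximate mean.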
 

Weibull(0.4)

    -    More similar amount of inhomogeneity

    -    still "too much"??
 

Weibull(0.45)

    -    better?

    -    still too much inhomogeneity at small scales??
 

Weibull(0.5)

    -    better at small scales?

    -    but now not enough at medium scales??

    -    appears to be "trade-off" between scales?

    -    "can't get this right"  -  suggests model wrong???

    -    too much splitting of hairs here????
 
 
 


Weibull inter-arrival model (cont.)







Close the circle:  could shape parameter 0.45 fit the data?

Q-Q plot






    -    clearly way off

    -    to get "correct inhomogeneity", need "wrong interarrivals"

    -    conclude Weibull interarrival process is poor fit
 
 
 

Question:    are there modifications of "simple model"

that better fit the data?









Cluster Poisson model







Idea:  At each homogeneous Poisson point,

generate a Poisson(λ) number of additional points

from the Triangular(0,τ) distribution
 

Explanations:

    -    For small λ  have nearly no clusters

    -    For large λ  have large clusters (very non-homogeneous?)

    -    Physical motivation:  web pages that call others

(e.g. additional ads/graphics)

    -    Triangle is "sensible model" for load generated calls?

    -    How to estimate τ?
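
A simulation sketch of the cluster Poisson idea. The symbol names are assumptions: λ (lam) is the Poisson mean number of added points and τ (tau) is the upper end of the triangular window. The shape of the triangle is not given in the notes, so the mode is left as a parameter, defaulting to a symmetric triangle; added points are placed at the parent point plus a triangular offset.

```python
import numpy as np

def cluster_poisson(rate, horizon, lam, tau, rng, mode=None):
    """Parent Poisson points, each with Poisson(lam) extra points at Triangular(0, tau) offsets."""
    mode = tau / 2.0 if mode is None else mode        # assumed symmetric triangle
    n_parents = rng.poisson(rate * horizon)
    parents = rng.uniform(0.0, horizon, n_parents)    # homogeneous Poisson starts
    n_children = rng.poisson(lam, n_parents)          # cluster size for each parent
    offsets = rng.triangular(0.0, mode, tau, size=n_children.sum())
    children = np.repeat(parents, n_children) + offsets
    return np.sort(np.concatenate([parents, children]))

rng = np.random.default_rng(4)
points = cluster_poisson(rate=10.0, horizon=1000.0, lam=4.0, tau=0.1, rng=rng)
print(len(points))        # roughly rate * horizon * (1 + lam)
```

Points from clusters near the end of the window can fall slightly past the horizon; trimming them is left out of this sketch.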
 
 
 


Cluster Poisson model (cont.)







Investigate choice of τ  by exploiting

"memoryless property" of exponential







Idea:    for any τ > 0,  and  X ~ Exponential

(X - τ | X > τ)   has the same Exponential distribution






Thus can "raise τ until QQ plot looks exponential",
 

which reflects elimination of effect of compact cluster dist'n
 
 

Memoryless Q-Q plot movie
 

    -    suggests reasonable τ is 0.01 - 0.1 sec.??

    -    very sensible since near "TCP round trip time"?

    -    thus start with  τ  =  0.1 (sec)

    -    try to "choose  λ  by eye"?
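
A sketch of the computation behind one frame of a memoryless Q-Q plot: keep only the interarrival times above a threshold, subtract the threshold, and compare the excesses with an exponential. The threshold values and the simulated exponential data are illustrative assumptions; with real interarrival times one would look for the threshold at which the plot straightens.

```python
import numpy as np

def excess_over(x, threshold):
    """Excesses X - threshold for the observations with X > threshold."""
    x = np.asarray(x, dtype=float)
    return x[x > threshold] - threshold

def memoryless_qq(x, threshold):
    """Exponential QQ pairs (theoretical, empirical) for the excesses."""
    excess = np.sort(excess_over(x, threshold))
    n = len(excess)
    p = (np.arange(1, n + 1) - 0.5) / n          # assumed plotting positions
    theo = -excess.mean() * np.log1p(-p)
    return theo, excess

rng = np.random.default_rng(5)
gaps = rng.exponential(scale=0.02, size=100000)  # simulated, exactly exponential
for threshold in (0.0, 0.01, 0.05):
    ex = excess_over(gaps, threshold)
    print(threshold, len(ex), ex.mean())         # mean stays near 0.02
```

For exactly exponential data the excess mean does not depend on the threshold, which is the property the movie exploits.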
 
 
 


Cluster Poisson model (cont.)







Try SiZer  analysis of homogeneity

recall real data analysis







Attempt 1:  λ  =  4

    -    liberal guess at "average # new requests generated"?

(liberal aimed at good chance of finding structure)

    -    Start process parameters give "similar range"

    -    has clusters, as desired (not homogeneous)

    -    but not enough clustering

    -    also reflected by peaks too small in blue curves

    -    can get more clustering by larger λ
 
 
 


Cluster Poisson model (cont.)







Attempt 2:  λ  =  8

    -    definite improvement

    -    not quite enough clustering?

    -    too much small scale structure??

    -    magnitude of wiggles in  blue curves  is "about right"?

    -    but  blue curve  wiggles are "too high in frequency"??
 
 

Attempt 3:  λ  =  16

    -    large scale structure "about right"?

    -    but too much "fine scale" structure?

    -    again magnitude of wiggles in  blue curves  is "about right"?

    -    but  blue curve  wiggles are "too high in frequency"??
 
 
 


Cluster Poisson model (cont.)







Attempt 4:  λ  =  4,  τ  =  20 (sec)
 

    -    now  τ  was chosen for visual impression

    -    overall visual impression is "really good"?

    -    SiZer  provides "delicate and precise visual instrument"??

    -    intuition behind 20 sec cluster time window????

    -    too slow for TCP round trip time?

    -    too fast for "user generated clicks"?

    -    maybe this effect is "nonstationarity" or

"long range dependence", not clustering????









Cluster Poisson model (cont.)







Complete this circle with a QQ analysis:
 

Weibull QQ with quantile matched parameters

    -    not bad for most of distribution

    -    << 0.1% is "twice as big as should be"?

    -    simple explanation of this???

    -    shape parameter estimated at 0.95 > previous 0.9
 
 
 


Cluster Poisson model (cont.)







log version of Weibull QQ

    -    really different view, as usual

    -    big values no longer so offensive

    -    pretty good fit for small values

    -    note significant "curvature" near 0.01 quantile

    -    not quite Weibull, but reasonable working approx'n?

    -    what about different shape parameter?
 
 
 


Cluster Poisson model (cont.)







Weibull(0.9) QQ

    -    now enforce shape parameter = 0.9

    -    scale estimated by MLE (given shape)

    -    curvature more visually apparent?

    -    maybe better overall fit than above??

    -    overall approximation looks better???

    -    could fine tune, using shape of cluster dist'n????

(not clear how much time this is worth)
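
A sketch of the scale MLE with the shape held fixed. For a Weibull with known shape k, setting the derivative of the log likelihood in the scale to zero gives scale^k = mean(x^k). The data below are simulated with assumed parameters, not the trace data.

```python
import numpy as np

def weibull_scale_mle(x, shape):
    """MLE of the Weibull scale when the shape is fixed: (mean(x^shape))^(1/shape)."""
    x = np.asarray(x, dtype=float)
    return np.mean(x ** shape) ** (1.0 / shape)

rng = np.random.default_rng(6)
sample = 0.015 * rng.weibull(0.9, size=50000)    # simulated Weibull(0.9), scale 0.015
scale_hat = weibull_scale_mle(sample, shape=0.9)
print(scale_hat)
```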









Cluster Poisson model (cont.)







Overall Conclusion (personal):

    -    Cluster Poisson is worthwhile modification of simple model

    -    time parametrization question worth more investigation

    -    this analysis seems publishable??

    -    worth trying other cluster distributions????
 
 

Open question:    worth trying this "Cluster Poisson" analysis

on all packet data (not just flow starts as here)?