Course  OR 778

Class Notes   10/10/01

Last Time:

    -    in general context of:

Heavy tailed durations  =>  Long Range Dependence

    -    quick overview of extreme value theory

    -    revisited simple model assumptions

    -    Downey's argument for log normal

    -    "Important" open problem

Recall Simple Model

Intuitive basis:  Mice and Elephants plot from before:



a.    "line starts"  as homogeneous Poisson process

b.    durations (i.e. "line lengths") as indep. of starts, i.i.d. from some dist'n
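
A minimal simulation sketch of assumptions a. and b., using only the Python standard library. The rate, the time horizon and the Pareto(1.5) duration distribution are hypothetical choices for illustration; the notes only say "some dist'n".

```python
import random

def simulate_simple_model(rate, horizon, draw_duration, seed=0):
    """Return (start, end) pairs: Poisson starts on [0, horizon), i.i.d. durations."""
    rng = random.Random(seed)
    flows = []
    t = rng.expovariate(rate)              # first start time
    while t < horizon:
        flows.append((t, t + draw_duration(rng)))
        t += rng.expovariate(rate)         # exponential gaps give a homogeneous Poisson process
    return flows

def active_count(flows, t):
    """Number of "lines" that cover time t."""
    return sum(1 for start, end in flows if start <= t < end)

# Hypothetical parameters: 50 starts/sec for 100 sec, heavy tailed Pareto(1.5) durations.
flows = simulate_simple_model(50.0, 100.0, lambda rng: rng.paretovariate(1.5))
print(len(flows), active_count(flows, 50.0))
```

Binning the start times, or evaluating active_count on a grid of time points, gives the "binned data" and "counts at discrete time points" variations.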

Possible variations:

    -    binned data

    -    counts at discrete time points

    -    discrete time

    -    Weibull process starts  (aggregate indep. Weibull inter-arr's)

    -    Clustered Poisson starts

Simple Model (cont.)

Nomenclature (some names):

    -    "Model with M/G/∞ input"

    -    Fluid queue with Poisson input

Interesting survey paper:

Resnick, S. and Samorodnitsky, G. (2000) Fluid queues, on-off processes and teletraffic modelling with highly variable and correlated inputs, in Self Similar Network Traffic and Performance Evaluation, Eds: A. Erramilli, W. Willinger and C. Park, 171-192.

Deeper Exploration of Model Assumptions

Recall 2 views (shown earlier):

1. SiZer analysis of homogeneity (constant intensity) of flow starts


    -    clearly not homogeneous (lots of red & blue regions)

    -    but not so "rough" as packet level analysis?

all packets off-peak

    -    homogeneous Poisson OK as an approximation????

Aside:    all of these have far more SiZer structure than a

simulated homogeneous Poisson process

Deeper Exploration of Model Assumptions (cont.)

2.    QQ investigation of interarrival times between session starts

2 a.    Exponential QQ, scale est'd by sample mean


    -    unacceptable fit

    -    especially slope

    -    suggesting wrong shape parameter

    -    these parameters were used in above SiZer simulation
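
A sketch of how the points of an Exponential QQ plot like 2 a. can be computed, with the scale estimated by the sample mean. The plotting positions (i + 0.5)/n are an assumption; the notes do not say which convention was used.

```python
import math
import random

def exponential_qq(gaps):
    """Pairs (theoretical quantile, sample quantile) for an Exponential QQ plot."""
    x = sorted(gaps)
    n = len(x)
    scale = sum(x) / n                                  # scale est'd by sample mean
    probs = [(i + 0.5) / n for i in range(n)]           # assumed plotting positions
    theo = [-scale * math.log(1.0 - p) for p in probs]  # Exponential quantile function
    return list(zip(theo, x))

# Synthetic check: truly exponential gaps should lie near the 45 degree line.
rng = random.Random(1)
gaps = [rng.expovariate(2.0) for _ in range(2000)]
pairs = exponential_qq(gaps)
print(pairs[1000])
```

For real interarrival data, a systematic departure of the points from the 45 degree line is what the "unacceptable fit" bullet describes.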

Deeper Exploration of Model Assumptions (cont.)

2 b.    Weibull QQ, parameters est'd by quantile matching


    -    very good fit

    -    shape parameter 0.9 "close to Poisson 1.0"?

    -    provides a workable approximation????

A possibility:    improve model by replacing "Poisson Process"

by "Weibull(0.9) waiting time process"

Weibull inter-arrival model

Repeat above analysis, for Weibull interarrival process,

parameters (0.8986, 0.01479) from above Q-Q

1. SiZer analysis of homogeneity  (graphic)

    -    No inhomogeneity visible

    -    i.e. no "strong evidence"    (null hypo. not rejected)

    -    Suggests Weibull interarrivals not an explanation

(of inhomogeneity)?

    -    Weibull interarrival process same as Poisson?

    -    or only "nearly same", since 0.90 ~ 1 ??

    -    smaller Weibull shape parameter will produce

"non-Poisson clumping"?

Weibull inter-arrival model (cont.)

Aside:    explore smaller (more extreme) Weibull shape parameter


    -    scale parameter chosen to "give approx. mean"

    -    here and below

    -    large amount of inhomogeneity

    -    so small enough Weibull shape clearly different from Poisson
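
A sketch of choosing the scale "to give approx. mean" for a given shape, and of generating the corresponding Weibull interarrival process. It uses the fact that a Weibull(shape, scale) variable has mean scale * Gamma(1 + 1/shape). The target mean below is a hypothetical value, not one taken from the data.

```python
import math
import random

def weibull_scale_for_mean(shape, target_mean):
    """Scale for which Weibull(shape, scale) has mean target_mean."""
    return target_mean / math.gamma(1.0 + 1.0 / shape)

def weibull_arrival_times(shape, scale, n, seed=0):
    """First n arrival times of a renewal process with Weibull gaps."""
    rng = random.Random(seed)
    times, t = [], 0.0
    for _ in range(n):
        t += rng.weibullvariate(scale, shape)   # stdlib argument order: (scale, shape)
        times.append(t)
    return times

target_mean = 0.0156                            # hypothetical mean gap (sec)
for shape in (0.9, 0.45):
    scale = weibull_scale_for_mean(shape, target_mean)
    times = weibull_arrival_times(shape, scale, 20000)
    print(shape, scale, times[-1] / len(times)) # last column: realized mean gap
```

Holding the mean fixed this way means that differences between shape parameters reflect clumping rather than a change in overall intensity.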


    -    More similar amount of inhomogeneity

    -    still "too much"??


    -    better?

    -    still too much inhomogeneity at small scales??


    -    better at small scales?

    -    but now not enough at medium scales??

    -    appears to be "trade-off" between scales?

    -    "can't get this right"  -  suggests model wrong???

    -    too much splitting of hairs here????

Weibull inter-arrival model (cont.)

Close the circle:  could shape parameter 0.45 fit the data?

Q-Q plot

    -    clearly way off

    -    to get "correct inhomogeneity", need "wrong interarrivals"

    -    conclude Weibull interarrival process is poor fit

Question:    are there modifications of "simple model"

that better fit the data?

Cluster Poisson model

Idea:  At each homogeneous Poisson point,

generate a Poisson(λ) number of additional points

from the Triangular(0, τ) distribution

(λ = mean cluster size,  τ = cluster time window)


    -    For small λ  have nearly no clusters

    -    For large λ  have large clusters (very non-homogeneous?)

    -    Physical motivation:  web pages that call others

(e.g. additional ads/graphics)

    -    Triangle is "sensible model" for load generated calls?

    -    How to estimate τ?
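
A minimal sketch of simulating the Cluster Poisson start process described above. The names cluster_mean and window are hypothetical labels for the Poisson mean and the width of the triangular distribution. The triangular offsets are taken to be symmetric (mode at the midpoint), which is an assumption; the notes do not specify the mode.

```python
import math
import random

def poisson_draw(rng, mean):
    """Poisson variate by Knuth's multiplication method (adequate for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def cluster_poisson(rate, horizon, cluster_mean, window, seed=0):
    """Parent points: homogeneous Poisson(rate) on [0, horizon).
    Each parent adds a Poisson(cluster_mean) number of points at
    Triangular(0, window) offsets after it (offspring may slightly exceed horizon)."""
    rng = random.Random(seed)
    points = []
    t = rng.expovariate(rate)
    while t < horizon:
        points.append(t)                                    # parent point
        for _ in range(poisson_draw(rng, cluster_mean)):
            points.append(t + rng.triangular(0.0, window))  # symmetric triangle assumed
        t += rng.expovariate(rate)
    return sorted(points)

# Hypothetical parent rate; cluster mean 4 and window 0.1 sec as tried in the notes.
pts = cluster_poisson(10.0, 1000.0, 4.0, 0.1, seed=3)
print(len(pts))
```

The differences of consecutive entries of pts are the interarrival times that would go into the QQ and SiZer analyses.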

Cluster Poisson model (cont.)

Investigate choice of τ (cluster time window) by exploiting

"memoryless property" of exponential

Idea:    for any t > 0,  and  X ~ Exponential,

(X - t | X > t)   has the same Exponential distribution

Thus can "raise t until QQ plot looks exponential",

which reflects elimination of effect of compact cluster dist'n
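
A sketch of the computation behind such a threshold diagnostic: keep only the interarrival times above a threshold, subtract the threshold, and form the Exponential QQ points of the excesses. The function names and the plotting positions are assumptions. The synthetic check uses purely exponential gaps, for which the mean excess should stay roughly constant as the threshold is raised.

```python
import math
import random

def excesses(gaps, t):
    """Excess (x - t) of every gap that exceeds the threshold t."""
    return [x - t for x in gaps if x > t]

def exponential_qq_pairs(sample):
    """(theoretical, empirical) quantile pairs, scale set to the sample mean."""
    x = sorted(sample)
    n = len(x)
    mean = sum(x) / n
    return [(-mean * math.log(1.0 - (i + 0.5) / n), x[i]) for i in range(n)]

# Synthetic check with purely exponential gaps (mean 1).
rng = random.Random(0)
gaps = [rng.expovariate(1.0) for _ in range(50000)]
for t in (0.0, 0.5, 1.0):
    exc = excesses(gaps, t)
    print(t, len(exc), sum(exc) / len(exc))     # threshold, count, mean excess
```

With clustered data, the QQ points from exponential_qq_pairs(excesses(gaps, t)) would be expected to straighten once t exceeds the width of the compact cluster distribution.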

Memoryless Q-Q plot movie

    -    suggests reasonable τ (cluster time window) is 0.01 - 0.1 sec.??

    -    very sensible since near "TCP round trip time"?

    -    thus start with  τ  =  0.1 (sec)

    -    try to "choose  λ  (mean cluster size) by eye"?

Cluster Poisson model (cont.)

Try SiZer  analysis of homogeneity

recall real data analysis

Attempt 1:   cluster mean λ  =  4

    -    liberal guess at "average # new requests generated"?

(liberal aimed at good chance of finding structure)

    -    Start process parameters give "similar range"

    -    has clusters, as desired (not homogeneous)

    -    but not enough clustering

    -    also reflected by peaks too small in blue curves

    -    can get more clustering by larger cluster mean λ

Cluster Poisson model (cont.)

Attempt 2:   cluster mean λ  =  8

    -    definite improvement

    -    not quite enough clustering?

    -    too much small scale structure??

    -    magnitude of wiggles in  blue curves  is "about right"?

    -    but  blue curve  wiggles are "too high in frequency"??

Attempt 3:   cluster mean λ  =  16

    -    large scale structure "about right"?

    -    but too much "fine scale" structure?

    -    again magnitude of wiggles in  blue curves  is "about right"?

    -    but  blue curve  wiggles are "too high in frequency"??

Cluster Poisson model (cont.)

Attempt 4:   cluster mean λ  =  4,   cluster time window τ  =  20 (sec)

    -    now  τ  was chosen for visual impression

    -    overall visual impression is "really good"?

    -    SiZer  provides "delicate and precise visual instrument"??

    -    intuition behind 20 sec cluster time window????

    -    too slow for TCP round trip time?

    -    too fast for "user generated clicks"?

    -    maybe this effect is "nonstationarity" or

"long range dependence", not clustering????

Cluster Poisson model (cont.)

Complete this circle with a QQ analysis:

Weibull QQ with quantile matched parameters

    -    not bad for most of distribution

    -    << 0.1% is "twice as big as should be"?

    -    simple explanation of this???

    -    shape parameter estimated at 0.95 > previous 0.9

Cluster Poisson model (cont.)

log version of Weibull QQ

    -    really different view as usual

    -    big values no longer so offensive

    -    pretty good fit for small values

    -    note significant "curvature" near 0.01 quantile

    -    not quite Weibull, but reasonable working approx'n?

    -    what about different shape parameter?

Cluster Poisson model (cont.)

Weibull(0.9) QQ

    -    now enforce shape parameter = 0.9

    -    scale estimated by MLE (given shape)

    -    curvature more visually apparent?

    -    maybe better overall fit than above??

    -    overall approximation looks better???

    -    could fine tune, using shape of cluster dist'n????

(not clear how much time this is worth)
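
A sketch of the scale MLE with the shape held fixed. For F(x) = 1 - exp(-(x/scale)^shape), setting the derivative of the log-likelihood in the scale to zero gives scale_hat = (mean of x^shape)^(1/shape). The parametrization is an assumption about what the notes use, and the sample below is synthetic.

```python
import random

def weibull_scale_mle(data, shape):
    """MLE of the Weibull scale for a given (fixed) shape."""
    n = len(data)
    return (sum(x ** shape for x in data) / n) ** (1.0 / shape)

# Synthetic check: random.weibullvariate takes (scale, shape) in that order.
rng = random.Random(0)
sample = [rng.weibullvariate(2.0, 0.9) for _ in range(20000)]
print(weibull_scale_mle(sample, 0.9))
```

With shape = 1 this reduces to the sample mean, i.e. the scale estimate used in the Exponential QQ plot.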

Cluster Poisson model (cont.)

Overall Conclusion (personal):

    -    Cluster Poisson is worthwhile modification of simple model

    -    time parametrization question worth more investigation

    -    this analysis seems publishable??

    -    worth trying other cluster distributions????

Open question:    worth trying this "Cluster Poisson" analysis

on all packet data (not just flow starts as here)?