Course OR 778

Class Notes 10/3/01

Last Time:

    -    in general context of:

Heavy tailed durations $\Rightarrow$ Long Range Dependence

    -    revisited protocol background

    -  Mice and Elephants graphic, with "one minute split" flows

    -    introduced simple model

    -    investigated modelling assumptions

            -    SiZer analysis:  not quite homogeneous intensity

            -    QQ analysis:  not quite exponential interarrivals

A quick overview of Extreme Value Theory

(Soundbite level introduction)


Resnick, S. I. (1987) Extreme values, regular variation and point processes, Springer.

Leadbetter, M. R., Lindgren, G. and Rootzén, H. (1983) Extremes and related properties of random sequences and processes, Springer.

A quick overview of Extreme Value Theory (cont.)

Context:    For i.i.d. random variables $X_1, \dots, X_n$

with cumulative distribution function $F$,

study the asymptotic (as $n \to \infty$) distribution of $M_n = \max(X_1, \dots, X_n)$
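The exact distribution for finite $n$ follows in one line from independence; writing $M_n = \max(X_1, \dots, X_n)$:

```latex
P(M_n \le x) = P(X_1 \le x, \dots, X_n \le x) = \prod_{i=1}^{n} P(X_i \le x) = F(x)^n
```

For any $x$ with $F(x) < 1$ this tends to 0, so $M_n$ just drifts off to the upper end point of $F$; a centering and scaling is needed to get a nondegenerate limit.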

Comments / Corrections welcome!

A quick overview of Extreme Value Theory (cont.)

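Analog:         Central Limit Theorem

describes the asymptotic distribution of the sample mean, as $n \to \infty$

Intuitive ideas:

    -    sample mean $\bar{X}_n$ "clusters around" population mean $\mu$

    -    gets closer as sample size grows

    -    gets closer at specific rate $1/\sqrt{n}$

    -    precise "normalization" gives limiting distribution:  $\sqrt{n}\,(\bar{X}_n - \mu)/\sigma \to N(0, 1)$, with $\sigma$ the population standard deviation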

A quick overview of Extreme Value Theory (cont.)

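Say "$F$ is in the domain of attraction of the distribution $G$"

if

      there are sequences $a_n > 0$ and $b_n$, so that $P\left( \frac{\max(X_1, \dots, X_n) - b_n}{a_n} \le x \right) \to G(x)$ as $n \to \infty$, at every continuity point $x$ of a nondegenerate $G$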
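A concrete instance of the definition, as a sketch: for standard Exponential(1) data (an assumed example) the choice $a_n = 1$, $b_n = \log n$ works. Since $P(\max_i X_i \le t) = F(t)^n$, the normalized CDF $F(x + \log n)^n = (1 - e^{-x}/n)^n$ can be evaluated exactly and compared with the Gumbel limit $\exp(-e^{-x})$, with no simulation.

```python
import math


def normalized_max_cdf(x: float, n: int) -> float:
    """Exact P((M_n - b_n) / a_n <= x) for the maximum M_n of n i.i.d.
    Exponential(1) variables, with a_n = 1 and b_n = log(n).
    Uses P(M_n <= t) = F(t)^n with F(t) = 1 - exp(-t) for t >= 0."""
    t = x + math.log(n)  # a_n * x + b_n
    if t <= 0.0:
        return 0.0       # F(t) = 0 at or below the lower end point 0
    return (1.0 - math.exp(-t)) ** n


def gumbel_cdf(x: float) -> float:
    """Standard Gumbel ("Extreme Value") limit G(x) = exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


grid = [i / 10 for i in range(-30, 61)]  # x from -3.0 to 6.0
for n in (10, 100, 10_000):
    gap = max(abs(normalized_max_cdf(x, n) - gumbel_cdf(x)) for x in grid)
    print(f"n = {n:>6}: largest |F(x + log n)^n - G(x)| on grid = {gap:.2e}")
```

The printed gaps shrink as $n$ grows, consistent with Exponential(1) being in the domain of attraction of the Gumbel distribution.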


A quick overview of Extreme Value Theory (cont.)

Limiting distributions, $G$:

Three types, depending on "upper end behavior" of $F$

1.    "Extreme Value", i.e. "Gumbel":  $G(x) = \exp(-e^{-x})$,  $-\infty < x < \infty$

    -    roughly happens for $F$ with "exponential tails"

    -    e.g. Gaussian, Weibull, log Normal

    -    could be bounded (but "little mass near end")

2.    "Fréchet":  $G(x) = \exp(-x^{-\alpha})$,  $x > 0$,  $\alpha > 0$

    -    roughly happens for $F$ with "polynomial tails"

    -    e.g. Cauchy, Pareto

3.    "Negative Weibull":  $G(x) = \exp(-(-x)^{\alpha})$,  $x < 0$,  $\alpha > 0$

    -    roughly "bounded from above" (with reasonable mass near end point)

    -    e.g. Uniform, negative of Exponential, Weibull or Pareto
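The three limits can be written in a standard form (location 0, scale 1, shape $\alpha > 0$; this parameterization is an assumption, other texts shift and scale them). A short numeric check of the property behind the "three types" classification: each is max-stable, i.e. $G(a_n x + b_n)^n = G(x)$ for suitable constants, so the rescaled maximum of $n$ independent copies has the same distribution again. The norming constants used are $b_n = \log n$ (type 1), $a_n = n^{1/\alpha}$ (type 2) and $a_n = n^{-1/\alpha}$ (type 3).

```python
import math


def gumbel(x: float) -> float:
    """Type 1, "Extreme Value" / Gumbel: G(x) = exp(-exp(-x)) for all real x."""
    return math.exp(-math.exp(-x))


def frechet(x: float, alpha: float) -> float:
    """Type 2, polynomial tails (usually called Fréchet): exp(-x^(-alpha)) for x > 0, else 0."""
    return math.exp(-x ** (-alpha)) if x > 0 else 0.0


def neg_weibull(x: float, alpha: float) -> float:
    """Type 3, Negative Weibull: exp(-(-x)^alpha) for x < 0, else 1."""
    return math.exp(-((-x) ** alpha)) if x < 0 else 1.0


def max_stability_gaps(n: int, alpha: float, xs) -> dict:
    """Largest |G(a_n x + b_n)^n - G(x)| over xs for each type, using
    b_n = log n (type 1), a_n = n^(1/alpha) (type 2), a_n = n^(-1/alpha) (type 3)."""
    return {
        "gumbel": max(abs(gumbel(x + math.log(n)) ** n - gumbel(x)) for x in xs),
        "frechet": max(abs(frechet(n ** (1 / alpha) * x, alpha) ** n - frechet(x, alpha))
                       for x in xs),
        "neg_weibull": max(abs(neg_weibull(n ** (-1 / alpha) * x, alpha) ** n
                               - neg_weibull(x, alpha)) for x in xs),
    }


grid = [i / 4 for i in range(-12, 13)]  # x from -3.0 to 3.0
print(max_stability_gaps(n=50, alpha=0.9, xs=grid))
```

All gaps should be at rounding-error level, whatever $n$ and $\alpha$ are used.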

A quick overview of Extreme Value Theory (cont.)


    -    can apply to minima as well as maxima, since $\min(X_1, \dots, X_n) = -\max(-X_1, \dots, -X_n)$

    -    which is useful for "inter-arrival times"

    -    since "time to next packet" is a minimum over flows

    -    thus expect Weibull distribution for inter-arrivals

    -    not all $F$ have such a limiting distribution for the maximum (e.g. the Poisson distribution does not)

    -    there exist precise mathematical characterizations

    -    elegant and fun mathematics
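A simulation sketch of the "minimum over flows" argument. Assumptions, all hypothetical: 100 independent flows, each with its own waiting time to its next packet, drawn from Beta$(a, 1)$, i.e. $P(T \le t) = t^a$ on $[0, 1]$, with $a = 0.9$ so the density has a pole at 0. Any $F$ with $F(t) \approx c\,t^a$ near 0 leads to the same Weibull$(a)$ limit for the suitably rescaled minimum; here $c = 1$, so the rescaling is $n^{1/a}$.

```python
import math
import random


def scaled_min_of_flows(n_flows: int, a: float, rng: random.Random) -> float:
    """n^(1/a) * min(T_1, ..., T_n) for hypothetical per-flow waiting times
    T_i ~ Beta(a, 1), i.e. P(T_i <= t) = t^a on [0, 1]; drawn by inversion
    as U^(1/a) with U ~ Uniform(0, 1)."""
    smallest = min(rng.random() ** (1.0 / a) for _ in range(n_flows))
    return n_flows ** (1.0 / a) * smallest


rng = random.Random(2001)
a = 0.9  # a < 1: per-flow density has a pole at 0
sample = [scaled_min_of_flows(100, a, rng) for _ in range(4000)]

for y in (0.25, 0.5, 1.0, 2.0):
    empirical = sum(s > y for s in sample) / len(sample)
    limit = math.exp(-y ** a)  # Weibull(shape a) survival function
    print(f"y = {y}: simulated P(> y) = {empirical:.3f}, Weibull limit = {limit:.3f}")
```

With $a = 1$ (density of constant height at 0) the limit is exponential, the Poisson process case.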

A quick overview of Extreme Value Theory (cont.)

An example:    the Beta$(\alpha, \beta)$ density:  $f(x) = \dfrac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}$,  $0 < x < 1$

    note:  $\beta$ allows study of range of "upper bound behavior" (near the end point $1$)

Can show:    domain of attraction is

Negative Weibull

with shape parameter:  $\beta$

Interesting cases:

    $\beta > 1$:

        -    $f$ small near $1$,

        -    Weibull shape parameter is $\beta > 1$

        -    i.e. the limit has lighter tail than exponential

        -    sensible, since "few observations near $1$"

    $\beta = 1$:

        -   $f$ has constant height near $1$,

        -    e.g. Uniform or negative of exponential

        -    Weibull shape parameter is $1$

        -    i.e. domain of attraction of exponential distribution

        -    special notes:

             -   exponential is the interarrival distribution of a Poisson Process

             -   other cases are departures from this

    $\beta < 1$:

        -   $f$ has a pole (infinite peak) near $1$,

        -    Weibull shape parameter is $\beta < 1$

        -    i.e. the limit has heavier tail than exponential

        -    sensible since "more observations near $1$"
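An exact check of the shape-parameter claim, under the simplifying assumption of the special case Beta$(1, \beta)$, with density $\beta(1 - x)^{\beta - 1}$ and closed-form CDF $F(x) = 1 - (1 - x)^{\beta}$ on $[0, 1]$. For the maximum $M_n$, $P(n^{1/\beta}(1 - M_n) > y) = F(1 - y\,n^{-1/\beta})^n = (1 - y^{\beta}/n)^n \to \exp(-y^{\beta})$, a Weibull with shape $\beta$; equivalently $n^{1/\beta}(M_n - 1)$ is asymptotically Negative Weibull.

```python
import math


def tail_of_scaled_gap(y: float, n: int, beta: float) -> float:
    """Exact P(n^(1/beta) * (1 - M_n) > y), where M_n is the maximum of n i.i.d.
    Beta(1, beta) variables with CDF F(x) = 1 - (1 - x)^beta on [0, 1]."""
    d = y * n ** (-1.0 / beta)        # the event is M_n < 1 - d
    if d >= 1.0:
        return 0.0                    # M_n < 0 is impossible
    return (1.0 - d ** beta) ** n     # F(1 - d)^n = (1 - y^beta / n)^n


def weibull_survival(y: float, shape: float) -> float:
    """Weibull limit: P(W > y) = exp(-y^shape)."""
    return math.exp(-y ** shape)


ys = [i / 20 for i in range(1, 81)]   # y from 0.05 to 4.0
for beta in (2.0, 1.0, 0.9):          # small near 1, constant height, pole at 1
    for n in (10, 1000):
        gap = max(abs(tail_of_scaled_gap(y, n, beta) - weibull_survival(y, beta)) for y in ys)
        print(f"beta = {beta}, n = {n:>4}: largest gap = {gap:.2e}")
```

The three values of `beta` correspond to the three interesting cases; in each, the gap to the Weibull limit with shape `beta` shrinks as `n` grows.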

Recall checking of assumptions of "simple model"

2 b.    Weibull QQ, parameters estimated by quantile matching


    -    very good fit

    -    shape parameter 0.9 "close to Poisson 1.0"?

    -    provides a workable approximation????

    -    now understand potential mechanism:

minimization of Beta r.v.'s with a pole at 0

    -    but "clustered Poisson" may be more likely??

    -    since can think of such a mechanism
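The notes do not spell out the quantile-matching step. One simple version, assumed here, matches two sample quantiles (the quartiles) to the Weibull quantile function $Q(p) = \lambda\,(-\log(1-p))^{1/k}$, which is linear on the log scale: $\log Q(p) = \log\lambda + \frac{1}{k}\log(-\log(1-p))$. The data are simulated Weibull values with shape 0.9 standing in for interarrival times, not the lecture's measurements, and the function names are illustrative.

```python
import math
import random


def sample_quantile(sorted_xs, p):
    """Linear-interpolation sample quantile of already-sorted data."""
    pos = p * (len(sorted_xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def weibull_quantile_match(xs, p1=0.25, p2=0.75):
    """Weibull (shape k, scale lam) whose p1 and p2 quantiles equal the sample's,
    using log Q(p) = log(lam) + (1/k) * log(-log(1 - p))."""
    s = sorted(xs)
    q1, q2 = sample_quantile(s, p1), sample_quantile(s, p2)
    w1, w2 = math.log(-math.log(1 - p1)), math.log(-math.log(1 - p2))
    k = (w2 - w1) / (math.log(q2) - math.log(q1))
    lam = q1 / math.exp(w1 / k)
    return k, lam


def weibull_qq_pairs(xs, k, lam):
    """(fitted Weibull quantile, observed value) pairs, plotting positions (i - 0.5)/n."""
    s = sorted(xs)
    n = len(s)
    return [(lam * (-math.log(1 - (i - 0.5) / n)) ** (1 / k), x)
            for i, x in enumerate(s, start=1)]


# Hypothetical interarrival times: Weibull with scale 1 and shape 0.9.
rng = random.Random(10301)
data = [rng.weibullvariate(1.0, 0.9) for _ in range(20000)]
k_hat, lam_hat = weibull_quantile_match(data)
print(f"shape = {k_hat:.3f}, scale = {lam_hat:.3f}")
pairs = weibull_qq_pairs(data, k_hat, lam_hat)
```

Plotting `pairs` gives the Weibull Q-Q plot; a fitted shape of exactly 1 would correspond to exponential interarrivals, i.e. a Poisson process.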

Recall checking of assumptions of "simple model" (cont.)

Recall poor Q-Q analysis for the Peak data:


    -    visual impression distorted by few large values

    -    very similar shape parameter 0.9

Fix for this (boundary effect) problem:

consider only flows that start after 0.2 of range:

Boundary Adjusted Peak

    -    now same result as off peak

    -    Weibull, with shape parameter ~0.9

    -    shows "very large values" before were at beginning

    -    can reject hypothesis of exponential (graphic)

    -    but still maybe OK as "first pass approximation"
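A minimal sketch of the "start after 0.2 of range" adjustment. It assumes the range is the observation window when that is known, and otherwise the span of the observed start times; keeping only starts strictly after the cutoff is an arbitrary choice. Interarrival times are then computed among the retained flows only. The function name is hypothetical.

```python
def boundary_adjusted_interarrivals(start_times, drop_fraction=0.2, window=None):
    """Gaps between consecutive flow start times, using only flows that start
    after the first `drop_fraction` of the range. `window` = (t_begin, t_end)
    of the observation period; if omitted, the observed min and max are used."""
    if not start_times:
        return []
    t0, t1 = window if window is not None else (min(start_times), max(start_times))
    cutoff = t0 + drop_fraction * (t1 - t0)
    kept = sorted(t for t in start_times if t > cutoff)
    return [later - earlier for earlier, later in zip(kept, kept[1:])]


print(boundary_adjusted_interarrivals([0, 1, 2, 5, 6, 10]))
```

For example, start times 0, 1, 2, 5, 6, 10 give a cutoff of 2, retained starts 5, 6, 10, and interarrivals [1, 4].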

An open problem:  which of the following can explain the strong mean changes observed in the start time SiZer analysis?

(Recall graphics:   Off Peak,   Peak)

a.    Independent Weibull(0.9) interarrivals?

b.    Poisson cluster process?
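One hypothetical way to start comparing candidates a. and b. is to simulate each and measure how variable the counts in fixed-width time bins are, which is the kind of local fluctuation a start time SiZer picks up. All parameter values below are assumptions, not estimates from the lecture's data, and the cluster mechanism is a standard Neyman-Scott form chosen for illustration. The index of dispersion (variance / mean of bin counts) is about 1 for a Poisson process.

```python
import math
import random


def poisson_draw(mean: float, rng: random.Random) -> int:
    """Poisson(mean) variate by Knuth's multiplication method (fine for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def weibull_renewal(T, shape, mean_gap, rng):
    """Event times on [0, T) with i.i.d. Weibull(shape) gaps, scaled to the given mean gap."""
    scale = mean_gap / math.gamma(1.0 + 1.0 / shape)  # Weibull mean = scale * Gamma(1 + 1/shape)
    times, t = [], rng.weibullvariate(scale, shape)
    while t < T:
        times.append(t)
        t += rng.weibullvariate(scale, shape)
    return times


def poisson_cluster(T, center_rate, mean_size, spread, rng):
    """Neyman-Scott style process: Poisson(center_rate) cluster centers,
    Poisson(mean_size) events per cluster, each an Exponential(mean spread)
    time after its center; events past T are dropped."""
    times = []
    c = rng.expovariate(center_rate)
    while c < T:
        for _ in range(poisson_draw(mean_size, rng)):
            t = c + rng.expovariate(1.0 / spread)
            if t < T:
                times.append(t)
        c += rng.expovariate(center_rate)
    return sorted(times)


def dispersion_index(times, T, bin_width):
    """Variance / mean of event counts in consecutive bins of width bin_width."""
    n_bins = int(T // bin_width)
    counts = [0] * n_bins
    for t in times:
        i = int(t // bin_width)
        if i < n_bins:
            counts[i] += 1
    mean = sum(counts) / n_bins
    var = sum((c - mean) ** 2 for c in counts) / (n_bins - 1)
    return var / mean


rng = random.Random(1003)
T, bin_width = 20_000.0, 20.0
renewal = weibull_renewal(T, shape=0.9, mean_gap=1.0, rng=rng)
clustered = poisson_cluster(T, center_rate=0.2, mean_size=5.0, spread=0.5, rng=rng)
print("Weibull(0.9) renewal dispersion:", round(dispersion_index(renewal, T, bin_width), 2))
print("Poisson cluster dispersion:     ", round(dispersion_index(clustered, T, bin_width), 2))
```

When bins are much wider than the clusters, clustering inflates the index to roughly 1 + (mean cluster size), while Weibull(0.9) gaps inflate it only modestly; a comparison like this narrows the question but does not settle it.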

Recall checking of assumptions of "simple model" (cont.)

An aside on the  SiZer  analysis:

recall the possible boundary effect observed in the Peak case

    -    downwards slope

Investigate by starting only after 20% of range:

Boundary Adjusted Peak

    -    looks much better

    -    confirms problem was mostly boundary effect

    -    but could still be some non-stationarity
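SiZer works with a whole family of kernel smooths indexed by bandwidth, rather than a single "best" smooth. The sketch below builds only that family for a set of start times, using a Gaussian kernel (an assumption); SiZer's further step, deciding at each location and bandwidth whether the slope is significantly positive or negative, is not implemented. The start times are hypothetical.

```python
import math


def gaussian_kde(data, grid, bandwidth):
    """Gaussian kernel density estimate of `data`, evaluated at each grid point."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in data)
            for g in grid]


def family_of_smooths(data, grid, bandwidths):
    """One density estimate per bandwidth: the scale-space family that SiZer
    summarizes (significance of slopes is not computed here)."""
    return {h: gaussian_kde(data, grid, h) for h in bandwidths}


# Hypothetical start times (seconds), evenly spread over [0, 4.9].
starts = [0.1 * i for i in range(50)]
grid = [-5.0 + 0.01 * j for j in range(1501)]   # -5.0 to 10.0 in steps of 0.01
family = family_of_smooths(starts, grid, bandwidths=(0.1, 0.5, 1.0))
```

Each curve in `family` can be plotted against `grid`; a boundary adjustment of the kind described amounts to dropping the early start times before smoothing.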

Revisitation of "heavy tails"

interesting "physical explanation" of

log normal file size distributions


Downey, A. B. (2000) The structural cause of file size distributions, Wellesley College Tech. Report CSD-TR25-2000.

Main results:

    -    Studies distributions of file sizes

    -    On individual computers

    -    Claims generally log Normal

    -    not Pareto

Revisitation of "heavy tails" (cont.)

Fundamental Premise:

Most files were created from other files

Copying:

downloads, software installations, backups, ...

Translating and filtering:

changing format, compiling, ...

Editing:

programming, word processing, ...

Major assumption:

Changes affect file sizes multiplicatively

    -    easy for copying: factor of 1

    -    very sensible for translating and filtering

    -    OK for most programming and word processing?

    -    wrong for:

            -    concatenated files

            -    text files "created from scratch"

Revisitation of "heavy tails" (cont.)

Simple stochastic model:

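for original file size $X_0$

and a random modification factor $R_1$

new file size is $X_1 = R_1 X_0$

Distribution of $R_1$?

Perhaps doesn't matter by "Central Limit Theorem argument"

(on log scale): after $m$ independent modifications, $\log X_m = \log X_0 + \sum_{i=1}^{m} \log R_i$ is a sum of many independent terms, so it is approximately Normal, i.e. $X_m$ is approximately log Normal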
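A quick simulation of the multiplicative model, as a sketch: every file starts from the same hypothetical size of 1000 bytes and goes through 30 independent modifications, each multiplying its size by a factor $R \sim$ Uniform(0.5, 2). That factor is deliberately not log Normal; the point of the "Central Limit Theorem argument" is that the log sizes come out close to Normal anyway.

```python
import math
import random
import statistics


def simulate_file_sizes(n_files, generations, rng, initial_size=1000.0):
    """Sizes after `generations` independent multiplicative modifications,
    each multiplying the current size by a hypothetical factor R ~ Uniform(0.5, 2)."""
    sizes = []
    for _ in range(n_files):
        size = initial_size
        for _ in range(generations):
            size *= rng.uniform(0.5, 2.0)
        sizes.append(size)
    return sizes


def skewness(xs):
    """Sample skewness (third standardized moment, population-style)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


rng = random.Random(7)
sizes = simulate_file_sizes(n_files=5000, generations=30, rng=rng)
logs = [math.log(size) for size in sizes]
print("skewness of sizes:    ", round(skewness(sizes), 2))
print("skewness of log sizes:", round(skewness(logs), 2))
```

The sizes themselves should be strongly right skewed, while the skewness of the log sizes should be near 0, as for a Normal sample.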

Revisitation of "heavy tails" (cont.)

Downey's conclusion:

Since the log normal is "light tailed"  (e.g. all moments exist),

we don't have:

Heavy tailed durations $\Rightarrow$ Long Range Dependence

I.e. something else must be creating the apparent LRD
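A worked version of "all moments exist" for $X = e^{Y}$ with $Y \sim N(\mu, \sigma^2)$, using the Normal moment generating function, with a Pareto tail (scale $x_m$, index $\alpha$) for contrast:

```latex
E\left[X^k\right] = E\left[e^{kY}\right] = \exp\left(k\mu + \tfrac{1}{2} k^2 \sigma^2\right) < \infty
\quad \text{for every } k > 0

P(X > x) = \left(\frac{x_m}{x}\right)^{\alpha},\ x \ge x_m
\quad\Longrightarrow\quad
E\left[X^k\right] = \frac{\alpha\, x_m^{k}}{\alpha - k} \ \text{ for } k < \alpha,
\qquad E\left[X^k\right] = \infty \ \text{ for } k \ge \alpha
```

So, for example, a Pareto tail with $1 < \alpha < 2$ has a finite mean but infinite variance, while every log normal has finite variance.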

A "really important" open problem

Can the lognormal lead to Long Range Dependence?

Simplest answer:

No, not for any fixed log-normal

Deeper question:

what if the lognormal changes during the asymptotics?

A precise (actually not very!) question:  for "simple model",

    for what sequences of log normal parameters $\mu$ and $\sigma$,

    do we get classical long range dependence

(in any sense defined in Lecture 9-19-01),

as ????
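The open question is not answered here, but a small numeric sketch shows one reason it is delicate. On a log-log plot the Pareto survival function $P(X > x) = c\,x^{-\alpha}$ is a straight line with slope $-\alpha$, while for the log normal the local slope $-d \log P(X > x) / d \log x = \phi(z) / (\sigma\,\bar\Phi(z))$, with $z = (\log x - \mu)/\sigma$, changes with $x$. The sketch measures how much it changes across a fixed window of four decades of $x$ (an arbitrary choice, as is $\mu = 0$) for increasing $\sigma$.

```python
import math


def lognormal_local_tail_index(x, mu, sigma):
    """-d log P(X > x) / d log x for X log normal(mu, sigma).
    A Pareto tail P(X > x) = c * x^(-alpha) would give the constant alpha."""
    z = (math.log(x) - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)   # standard Normal density
    upper = 0.5 * math.erfc(z / math.sqrt(2))               # standard Normal upper tail
    return phi / (sigma * upper)


xs = [10 ** (2 + 4 * i / 40) for i in range(41)]   # x from 10^2 to 10^6
for sigma in (1.0, 3.0, 10.0):
    slopes = [lognormal_local_tail_index(x, 0.0, sigma) for x in xs]
    print(f"sigma = {sigma:>4}: local index from {min(slopes):.3f} to {max(slopes):.3f}")
```

Larger $\sigma$ makes the local index vary less over the window (more Pareto-like there) but also drives it toward 0, so producing a particular Pareto-like index over a growing window would presumably require $\mu$ to move along with $\sigma$.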