Course  OR 778

Class Notes   12/5/01




Last Time :

    -    Large Variable Association

    -    Found way to make axes "commensurable"

    -    Main Problem:  tail indices not right?
 
 
 


Revisit: New Mathematics for "Heavy Tails"?


Recall Main Goal (from 10/22/01 and 10/24/01) :

Find "more realistic" version of

Heavy tailed durations  ⇒  Long Range Dependence


Motivating data:    HTTP Response Data, from UNC, 2001
 

Recall log-log CCDF: graphic

Suggests classical extreme value theory "hasn't kicked in yet"
 

Surprising point:

distributions very similar to each other,
across widely different time periods


Suggests "systematic", not "noise driven" behavior
 
 
 


Revisit: New Mathematics for "Heavy Tails"? (cont.)




A weakness of above analysis:

only done for 0.001, 0.002, ... 0.999 quantiles

    (since worked with summarized data)
 

Consequence:  for n ~ 1,000,000 - 7,000,000,

have "poor resolution of tails"

("chunkiness" at bottom)




Improved version:  "insert" raw data values, graphic

    -    chunkiness is gone

    -    distributional shape still surprisingly constant

    -    still suggests systematic nonlinear behavior

    -    still clear "classical tail theory" doesn't apply
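
The effect of "inserting" raw values can be sketched in a few lines. This is only an illustration under stated assumptions: the data are a synthetic Pareto sample (not the UNC HTTP response data), the function names are made up for this sketch, and only the upper tail (the "bottom" of the log-log CCDF plot) is refined.

```python
import math
import random

def loglog_ccdf_from_quantiles(sorted_x, probs):
    """(log10 x, log10 CCDF) pairs using only the quantiles at `probs`."""
    n = len(sorted_x)
    pts = []
    for p in probs:
        i = min(n - 1, int(p * n))          # index of the p-quantile
        pts.append((math.log10(sorted_x[i]), math.log10(1.0 - p)))
    return pts

def loglog_ccdf_with_raw_tail(sorted_x, probs):
    """Same summary, plus one point per raw value above the top quantile."""
    n = len(sorted_x)
    pts = loglog_ccdf_from_quantiles(sorted_x, probs)
    first_raw = min(n - 1, int(probs[-1] * n)) + 1
    for i in range(first_raw, n):
        # (n - i) / n is the fraction of observations >= sorted_x[i]
        pts.append((math.log10(sorted_x[i]), math.log10((n - i) / n)))
    return pts

rng = random.Random(0)
n = 100_000
# Synthetic stand-in for the real durations (assumption, for illustration only)
data = sorted(rng.paretovariate(1.5) for _ in range(n))
probs = [k / 1000 for k in range(1, 1000)]   # 0.001, 0.002, ..., 0.999

summary = loglog_ccdf_from_quantiles(data, probs)
refined = loglog_ccdf_with_raw_tail(data, probs)
lowest_summary = min(y for _, y in summary)  # summary stops near log10(0.001)
lowest_refined = min(y for _, y in refined)  # raw values reach log10(1/n)
```

With the summary alone the plot cannot go below a CCDF of 0.001, however large n is; the raw values extend it down to 1/n.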
 
 
 


Revisit: New Mathematics for "Heavy Tails"? (cont.)


Suggests need to update concept of:

Heavy tailed durations  ⇒  Long Range Dependence


since "classical" notions of "heavy tails" don't apply.
 

Recall earlier approach:

study "pointwise slopes" in log-log CCDF:  graphic

    -    based on difference quotients (truncated below)

    -    zero denominators caused problems

    -    but clear visual impression of "usually in heavy tail range"

(recall: tail index $\alpha \in (1, 2)$ leads to LRD)

(does $\alpha < 1$ lead to non-stationarity???)

    -    suggests possibility of updating above theory
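
A minimal sketch of the "pointwise slope" idea, assuming the CCDF is available at a list of x values. The function name and the `floor` argument (one possible reading of "truncated below") are hypothetical, and the input is an exact power law rather than real data.

```python
import math

def pointwise_slopes(xs, ccdf, floor=None):
    """Difference quotients of log10(CCDF) against log10(x) for neighbours.

    Pairs with equal x values (zero denominator) are skipped.
    If `floor` is given, slopes are truncated below at that value.
    """
    out = []
    for k in range(len(xs) - 1):
        dx = math.log10(xs[k + 1]) - math.log10(xs[k])
        if dx == 0.0:
            continue                      # tied values: quotient undefined
        slope = (math.log10(ccdf[k + 1]) - math.log10(ccdf[k])) / dx
        if floor is not None:
            slope = max(slope, floor)
        out.append((xs[k], slope))
    return out

# Exact power-law CCDF x^(-1.5), with one tied x value
xs = [1.0, 2.0, 2.0, 4.0, 8.0]
ccdf = [x ** -1.5 for x in xs]
slopes = pointwise_slopes(xs, ccdf)
truncated = pointwise_slopes(xs, ccdf, floor=-1.0)
```

For an exact power law the slope is the negative of the tail index at every point; with real data, ties and near-ties among x values make these quotients undefined or very noisy.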
 
 
 


Revisit: New Mathematics for "Heavy Tails"? (cont.)




Improved version of slopes analysis:

"insert" raw data values, graphic




    -    "0 denominator" problem becomes much worse

(since discrete quantiles did some smoothing)

    -    hence tried "grid based difference quotients"

    -  finer grid:  still "feels noise"

    -  coarser grid:  has "smoothed out noise"

    -    general shape is same for both

    -    same lesson as before about effective tail indices

    -    still seems worth updating above theory
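
One way to read "grid based difference quotients" is sketched here: evaluate the empirical CCDF on an equally spaced grid in log10(x), so the denominator is the fixed grid step and can never be zero. The grid sizes, function names and synthetic Pareto data are all assumptions for illustration, not the settings used for the graphic.

```python
import bisect
import math
import random

def empirical_ccdf(sorted_x, x):
    """Fraction of observations strictly greater than x."""
    return (len(sorted_x) - bisect.bisect_right(sorted_x, x)) / len(sorted_x)

def grid_slopes(sorted_x, lo, hi, n_cells):
    """Slopes of the log-log CCDF between neighbouring points of a log10 grid."""
    step = (math.log10(hi) - math.log10(lo)) / n_cells
    log_ccdf = []
    for k in range(n_cells + 1):
        c = empirical_ccdf(sorted_x, 10 ** (math.log10(lo) + k * step))
        log_ccdf.append(math.log10(c) if c > 0 else None)
    slopes = []
    for k in range(n_cells):
        if log_ccdf[k] is None or log_ccdf[k + 1] is None:
            continue                      # ran past the largest observation
        slopes.append((log_ccdf[k + 1] - log_ccdf[k]) / step)
    return slopes

rng = random.Random(1)
# Synthetic sample with tail index 1.5 (assumption, for illustration only)
data = sorted(rng.paretovariate(1.5) for _ in range(200_000))

fine = grid_slopes(data, 1.0, 100.0, 64)    # many narrow cells
coarse = grid_slopes(data, 1.0, 100.0, 4)   # few wide cells
fine_spread = max(fine) - min(fine)
coarse_spread = max(coarse) - min(coarse)
```

The grid width acts as a smoothing parameter: narrow cells follow sampling noise, wide cells average it out, and both fluctuate around the same underlying slope.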
 
 
 


Revisit: New Mathematics for "Heavy Tails"? (cont.)




Provides new motivation for definitions of "heavy tails":

(from 10/24/01)




Version 1:    For some ,




Which results in:

Open Problem 1:    For the simple Model (recall from 10/1/01),

with Version 1 tailed Duration Dist'n,   is

?




(i.e. Version 1 heavy tails results in index   LRD?)
 
 
 


Revisit: New Mathematics for "Heavy Tails"? (cont.)




Note:  Version 1 is not "completely right",

    but instead only holds "most of the time"
 

Improvement:  modified definition of "heavy tailed"
 

Version 2:    Reformulate, in terms of:    have

"most of the time"  (in some sense)?
 

Open Problem 2:    For the simple model (recall from 10/1/01),

with Version 2 tailed Duration Dist'n,

can we still have    (in a suitable sense)?
 

How do we modify version 2 to make this happen?
 
 
 
 


A "really important" open problem (from before)


Can lognormal durations lead to Long Range Dependence?
 

Simplest (widely held) answer:

No, not for any fixed log-normal


Deeper question:

what if the lognormal changes during the asymptotics?


Who cares?

    -    Common misperception about "infinite moments"
 

A precise question:  for a sequence of "simple models",

    with lognormal (parameters $\mu$ and $\sigma$) durations,

    under what conditions (on $\mu$ and $\sigma$) does classical L.R.D.

(in any sense defined in Lecture 9-19-01),

    result, as ????
 
 
 


A "really important" answer?


Hannig, J., Marron, J. S., Samorodnitsky, G. and Smith, F. D.

(2001) "Log-normal durations can give long range dependence".
 

For a sequence of "simple models" from Lecture 10-1-01,

    -    indexed by 

    -    Continuous time (simpler than discrete?)

    -    Homogeneous Poisson() "starting time" (for flows)

    -    Draw independent "duration time" (for each flow)

    -    Define    =  number of "active flows"
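
A minimal simulation sketch of one such "simple model", for a single fixed choice of parameters. The numerical values, function names and the fixed seed are assumptions made here for illustration; they are not the parameter sequence used in the paper.

```python
import math
import random

def simulate_flows(rate, mu, sigma, horizon, rng):
    """Flows with homogeneous Poisson(rate) start times on [0, horizon)
    and independent lognormal(mu, sigma) durations; returns (start, end) pairs."""
    flows = []
    t = rng.expovariate(rate)             # exponential gaps give Poisson starts
    while t < horizon:
        flows.append((t, t + rng.lognormvariate(mu, sigma)))
        t += rng.expovariate(rate)
    return flows

def active_flows(flows, t):
    """Number of flows that have started by time t and not yet ended."""
    return sum(1 for start, end in flows if start <= t < end)

rng = random.Random(2)
rate, mu, sigma = 50.0, 0.0, 0.5          # illustrative values only
flows = simulate_flows(rate, mu, sigma, horizon=200.0, rng=rng)

# Skip a warm-up period so the count is close to its stationary regime
counts = [active_flows(flows, t) for t in range(50, 200)]
average_active = sum(counts) / len(counts)
expected_active = rate * math.exp(mu + 0.5 * sigma ** 2)  # rate * mean duration
```

In the stationary regime the mean number of active flows is the arrival rate times the mean duration, which gives a quick sanity check on the simulation.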
 

Choose the durations to be log-normal (,), and choose

    -       (growing intensity)

    -          (lognormal mean goes to 0)

    -        (lognormal "spreads")
 

Note that for each  ,  the durations are "light tailed"

but "compressed towards 0", and "variance expanded",

so in , get "a few large values" and "many small values"
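
For reference, the standard lognormal moment formulas (textbook facts, not taken from the lecture; the symbols $D$, $\mu$, $\sigma$ and $k$ are notation chosen here) make precise the sense in which any fixed lognormal has all moments finite, while its mean and variance still depend strongly on the spread parameter:

```latex
\[
  \log D \sim N(\mu, \sigma^2)
  \quad\Longrightarrow\quad
  E[D^k] = \exp\!\left(k\mu + \tfrac{1}{2}k^2\sigma^2\right) < \infty
  \quad\text{for every } k \ge 1,
\]
\[
  \operatorname{median}(D) = e^{\mu}, \qquad
  E[D] = e^{\mu + \sigma^2/2}, \qquad
  \operatorname{Var}(D) = \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2}.
\]
```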
 

Is growing intensity "objectionable"?

    -    by some standard measures

    -    but internet traffic is growing rapidly...

    -    why do asymptotics?

            -    limiting process models something?    No!

            -    because everybody always has?    No!

            -    to illuminate important, underlying structure?    Yes!
 
 
 


A "really important" answer? (cont.)


Now measure "Long Range Dependence" through covariance:


Main result:  Given  and  ,

for any sequence    with  :


Intuitive Idea:   for a "wide range of lags ",

the autocovariance   

has a "polynomial rate of decay"  ,

which is a "common symptom of long range dependence"
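
The slow, polynomial-looking decay can be explored numerically. The sketch below assumes the standard stationary M/G/infinity formula, in which the autocovariance of the number of active flows at lag s equals the arrival rate times E[(D - s)^+]; for lognormal D this has a closed form. It is not the paper's theorem, and the parameter values and function names are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def lognormal_excess_mean(s, mu, sigma):
    """E[(D - s)^+] for log D ~ N(mu, sigma^2) and s > 0."""
    d = (mu - math.log(s)) / sigma
    return math.exp(mu + 0.5 * sigma ** 2) * norm_cdf(d + sigma) - s * norm_cdf(d)

def autocov(s, rate, mu, sigma):
    """Lag-s autocovariance of the active-flow count (assumed M/G/infinity form)."""
    return rate * lognormal_excess_mean(s, mu, sigma)

def local_slope(s, rate, mu, sigma, h=1e-3):
    """Numerical slope of log autocovariance against log lag at lag s."""
    up = autocov(s * math.exp(h), rate, mu, sigma)
    down = autocov(s * math.exp(-h), rate, mu, sigma)
    return (math.log(up) - math.log(down)) / (2.0 * h)

rate, mu, sigma = 1.0, 0.0, 3.0           # illustrative values only
lags = [1.0, 10.0, 100.0, 1000.0]
covs = [autocov(s, rate, mu, sigma) for s in lags]
slopes = [local_slope(s, rate, mu, sigma) for s in lags]
```

With a large spread parameter the local slope stays between -1 and 0 and changes only slowly over several orders of magnitude of lag, which is the kind of behavior the intuitive idea above describes.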
 
 
 


Some Speculations


Recall Mice and Elephants:

    -    Aggregation of Mice  ⇒  short range independence

(Cleveland, et al.)

    -    Aggregation of Elephants  ⇒  long range dependence
 

Deep question:  which will “eventually dominate”?

    -    Cleveland:   theory for mice swamping elephants

    -    Others (e.g. me):   elephants will continue to grow
 
 
 


Some Speculations (cont.)


Future Major Players(???):

Voice/Video over IP, and other applications

    -    Strong “Quality of Service” Demands

    -    Big $$$$ behind this
 

Industry response???
 

Short term future:

    -    Internet Service Providers:

        -    Financial incentive for over-provisioning
 

Long term future(????):  Interesting competition between:

    -    Over-provisioning (as above)

    -    "Differentiated service"

e.g.  marked and prioritized packets

    -    intermediate solutions
 
 
 
 


Kulkarni’s view of Internet Traffic Research


Ancient Indian Story:

Blind men describe an elephant


 View 1:     It is like a tree trunk …

 View 2:     It is like a rope …

 View 3:     It is like a long hose …

 View 4:     It is like a large fan …
 
 
 


Request for comments:




Cornell Evaluations
 
 

Personally Important Questions:




Q1:    HTML lectures vs. PowerPoint?
 

Q2:    How to encourage more student involvement?
 
 
 
 
 
 
 
 
 


Topics never covered:

Stationarity?  (scale based?)  (paper from Richard)
Cleveland & Cao and Ramanan asymptotics & mice vs. elephants
New "big trace" analysis
Shared Fate Visualization
Functional Data Analysis of Traces