Class Notes 12/5/01
Last Time:
- Large Variable Association
- Found way to make axes "commensurable"
- Main Problem: tail indices not right?
 
 
 
Revisit: New Mathematics for "Heavy Tails"?
Recall Main Goal (from 10/22/01 and 10/24/01) :
Find "more realistic" version of
Heavy tailed durations  
Long Range Dependence
Motivating data:   
HTTP Response Data, from UNC, 2001
 
Recall log-log CCDF: graphic
Suggests classical extreme value theory "hasn't kicked in yet"
 
Surprising point:
distributions very similar to each other, across widely different time periods
Suggests "systematic", not "noise driven" behavior
 
 
 
Revisit: New Mathematics for "Heavy Tails"? (cont.)
A weakness of above analysis:
only done for 0.001, 0.002, ..., 0.999 quantiles
    (since worked with summarized data)
 
Consequence: for n ~ 1,000,000 - 7,000,000,
have "poor resolution of tails"
("chunkiness" at bottom)
Improved version: "insert" raw data values, graphic
- chunkiness is gone
- distributional shape still surprisingly constant
- still suggests systematic nonlinear behavior
- still clear "classical tail theory" doesn't apply
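
A minimal sketch of how a log-log CCDF can be built directly from raw data values, so that the far tail is not limited to the 0.001, ..., 0.999 quantile grid. The function name `loglog_ccdf` and the seeded Pareto sample are assumptions made only for illustration; they stand in for the HTTP response data, which is not reproduced here.

```python
import math
import random

def loglog_ccdf(data):
    """Return (log10 x, log10 CCDF) pairs, one per positive raw data value."""
    xs = sorted(v for v in data if v > 0)
    n = len(xs)
    # At the i-th sorted value (0-based), n - i observations sit at that
    # position or later, so the plotted proportion (n - i) / n is never 0.
    return [(math.log10(x), math.log10((n - i) / n)) for i, x in enumerate(xs)]

# Hypothetical stand-in data (NOT the UNC trace): a seeded Pareto sample.
rng = random.Random(0)
sample = [rng.paretovariate(1.5) for _ in range(10_000)]
points = loglog_ccdf(sample)
```

Because every raw value contributes its own point, the lower right of the plot is resolved down to proportion 1/n instead of stopping at 0.001.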
 
 
 
Revisit: New Mathematics for "Heavy Tails"? (cont.)
Suggests need to update concept of:
Heavy tailed durations  
Long Range Dependence
since "classical" notions of "heavy tails" don't apply.
 
Recall earlier approach:
study "pointwise slopes" in log-log CCDF: graphic
- based on difference quotients (truncated below)
- zero denominators caused problems
- but clear visual impression of "usually in heavy tail range"
(recall   
leads to LRD)
(does  
lead to non-stationarity???)
- suggests possibility of updating above theory
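
The pointwise-slope idea can be sketched as below: difference quotients between neighboring points of the log-log CCDF, skipping pairs whose horizontal gap is zero. The name `pointwise_slopes` and the optional `floor` argument (one possible reading of "truncated below") are assumptions for illustration, not the code behind the graphic.

```python
def pointwise_slopes(points, floor=None):
    """Difference quotients between adjacent (log x, log CCDF) points.

    points: list of (log_x, log_ccdf) pairs sorted by log_x.
    floor:  hypothetical lower truncation value for the slopes (assumption).
    """
    slopes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx = x1 - x0
        if dx == 0:
            # Tied data values give a zero denominator; skip the pair.
            continue
        slope = (y1 - y0) / dx
        if floor is not None:
            slope = max(slope, floor)
        # Report the slope at the midpoint of the two log x values.
        slopes.append(((x0 + x1) / 2, slope))
    return slopes
```

For an exact power-law tail the points lie on a straight line, so every quotient equals the negative of the tail index.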
 
 
 
Revisit: New Mathematics for "Heavy Tails"? (cont.)
Improved version of slopes analysis:
"insert" raw data values, graphic
- "0 denominator" problem becomes much worse
(since discrete quantiles did some smoothing)
- hence tried "grid based difference quotients"
    - 
still "feels noise"
    - 
has "smoothed out noise"
- general shape is same for both
- same lesson as before about effective tail indices
- still seems worth updating above theory
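
One way to read "grid based difference quotients" is sketched here: evaluate the empirical CCDF on an equally spaced grid in log10 x, then difference adjacent grid points, so the denominator is always the grid spacing and never zero. The function `grid_slopes` and its `step` parameter are assumptions for illustration; the grid sizes used for the graphics are not recoverable from these notes.

```python
import bisect
import math

def grid_slopes(data, step):
    """Slopes of the log-log CCDF between equally spaced grid points in log10 x."""
    logs = sorted(math.log10(v) for v in data if v > 0)
    n = len(logs)
    lo, hi = logs[0], logs[-1]
    m = int((hi - lo) / step)
    # Keep only grid points at or below the largest observation, so each
    # grid point has at least one observation at or above it.
    grid = [g for g in (lo + j * step for j in range(m + 1)) if g <= hi]
    counts = [n - bisect.bisect_left(logs, g) for g in grid]
    slopes = []
    for j in range(len(grid) - 1):
        # log10(c1/n) - log10(c0/n) simplifies to log10(c1) - log10(c0).
        slope = (math.log10(counts[j + 1]) - math.log10(counts[j])) / step
        slopes.append((grid[j] + step / 2, slope))
    return slopes
```

A small `step` tracks individual observations in the sparse tail (and so "feels noise"), while a larger `step` averages over more data at the cost of resolution.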
 
 
 
Revisit: New Mathematics for "Heavy Tails"? (cont.)
Provides new motivation for definitions of "heavy tails":
(from 10/24/01)
Version 1:   
For some ,
Which results in:
Open Problem 1: For the simple Model (recall from 10/1/01),
with Version 1 tailed Duration Dist'n, is
(i.e. Version 1 heavy tails
results in index  
LRD?)
 
 
 
Revisit: New Mathematics for "Heavy Tails"? (cont.)
Note: Version 1 is not "completely right",
    but instead only holds "most of the time"
 
Improvement: modified definition of "heavy tailed"
 
Version 2: Reformulate, in terms of: have
Open Problem 2: For the simple model (recall from 10/1/01),
with Version 2 tailed Duration Dist'n,
can we still have   
(in a suitable sense)?
 
How do we modify Version 2 to make this happen?
 
 
 
 
A "really important" open problem (from before)
Can lognormal durations lead to Long Range Dependence?
 
Simplest (widely held) answer:
No, not for any fixed log-normal
Deeper question:
what if the lognormal changes during the asymptotics?
Who cares?
- Common misperception about "infinite moments"
 
A precise question: for a sequence of "simple models",
    with log-normal durations (parameters μ and σ),
    under what conditions (on μ and σ) does classical L.R.D.
    (in any sense defined in Lecture9-19-01) result, as ????
 
 
 
A "really important" answer?
Hannig, J., Marron, J. S., Samorodnitsky, G. and Smith, F. D. (2001)
"Log-normal durations can give long range dependence".
 
For a sequence of "simple models" from Lecture 10-1-01,
    -   
indexed by 
- Continuous time (simpler than discrete?)
   
-    Homogeneous Poisson()
"starting time" (for flows)
- Draw independent "duration time" (for each flow)
   
-    Define   
=  number of "active flows"
 
Choose the durations to be
log-normal (,
),
and choose
    -      
(growing intensity)
    -         
(lognormal mean goes to 0)
    -       
(lognormal "spreads")
 
Note that for each  , 
the durations are "light tailed"
but "compressed towards 0", and "variance expanded",
so in ,
get "a few large values" and "many small values"
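
A minimal simulation sketch of one "simple model" from the sequence: Poisson start times, independent log-normal durations, and a count of active flows. The function names and all numeric parameter values are hypothetical, chosen only to illustrate the construction; the specific sequences of intensity and log-normal parameters used in the paper are not reproduced here.

```python
import random

def simulate_flows(rate, mu, sigma, horizon, rng):
    """Return (start, duration) pairs for flows starting in [0, horizon).

    Start times form a homogeneous Poisson process with intensity `rate`;
    each duration is an independent log-normal(mu, sigma) draw.
    """
    flows = []
    t = rng.expovariate(rate)  # exponential gaps give Poisson start times
    while t < horizon:
        flows.append((t, rng.lognormvariate(mu, sigma)))
        t += rng.expovariate(rate)
    return flows

def active_count(flows, t):
    """Number of flows active at time t, i.e. start <= t < start + duration."""
    return sum(1 for start, dur in flows if start <= t < start + dur)

# Hypothetical parameter values (assumptions, not taken from the notes).
rng = random.Random(1)
flows = simulate_flows(rate=50.0, mu=-1.0, sigma=2.0, horizon=100.0, rng=rng)
counts = [active_count(flows, t) for t in range(10, 100, 10)]
```

The simulation starts with no active flows at time 0, so early counts are biased low; a warm-up period would be needed before treating the count series as stationary.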
 
Is growing intensity "objectionable"?
- by some standard measures
- but internet traffic is growing rapidly...
- why do asymptotics?
- limiting process models something? No!
- because everybody always has? No!
           
-    to illuminate important, underlying structure?   
Yes!
 
 
 
A "really important" answer? (cont.)
Now measure "Long Range Dependence" through covariance:
Main result:  Given 
and  
,
for any sequence   
with  
:
Intuitive Idea:  
for a "wide range of lags ",
the autocovariance   
has a "polynomial rate of
decay"  ,
which is a "common symptom
of long range dependence"
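
To check a "polynomial rate of decay" on data, one could estimate the autocovariance of an observed series and fit a straight line on log-log axes. This is a sketch under stated assumptions: `sample_autocov` is the standard sample autocovariance of a series, not the theoretical autocovariance of the main result, and `loglog_decay_slope` is a hypothetical helper name.

```python
import math

def sample_autocov(series, max_lag):
    """Sample autocovariance at lags 0..max_lag (divisor n at every lag)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]

def loglog_decay_slope(lags, values):
    """Least-squares slope of log(value) on log(lag); non-positive entries dropped."""
    pts = [(math.log(k), math.log(v))
           for k, v in zip(lags, values) if k > 0 and v > 0]
    m = len(pts)
    mx = sum(x for x, _ in pts) / m
    my = sum(y for _, y in pts) / m
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx
```

A fitted slope that stays roughly constant over a wide range of lags is the kind of symptom described above; the fit by itself does not establish long range dependence.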
 
 
 
Some Speculations
Recall Mice and Elephants:
- Aggregation of Mice -> short range independence (Cleveland, et al.)
- Aggregation of Elephants -> long range dependence
 
Deep question: which will “eventually dominate”?
- Cleveland: theory for mice swamping elephants
- Others (e.g. me): elephants will continue to grow
 
 
 
Some Speculations
Future Major Players(???):
Voice/Video over IP, and other applications
- Strong “Quality of Service” Demands
- Big $$$$ behind this
 
Industry response???
 
Short term future:
- Internet Service Providers:
    - Financial incentive for over-provisioning
 
Long term future(????): Interesting competition between:
- Above Overprovisioning
- "Differentiated service", e.g. marked and prioritized packets
- intermediate solutions
 
 
 
 
Kulkarni’s view of Internet Traffic Research
Ancient Indian Story:
Blind men describe an elephant
View 1: It is like a tree trunk …
View 2: It is like a rope …
View 3: It is like a long hose …
View 4: It is like a large fan …
 
 
 
Request for comments:
Cornell Evaluations
 
 
Personally Important Questions:
Q1: HTML lectures vs. PowerPoint?
 
Q2: How to encourage more student involvement?
 
 
 
 
 
 
 
 
 
Topics never covered:
Stationarity? (scale based?) (paper from Richard)
Cleveland & Cao and Ramanan asymptotics & mice vs. elephants
New "big trace" analysis
Shared Fate Visualization
Functional Data Analysis of Traces