Class Notes 12/5/01
Last Time:
- Large Variable Association
- Found way to make axes "commensurable"
- Main Problem: tail indices not right?
Revisit: New Mathematics for "Heavy Tails"?
Recall Main Goal (from 10/22/01 and 10/24/01):
Find "more realistic" version of
Heavy tailed durations
Long Range Dependence
Motivating data:
HTTP Response Data, from UNC, 2001
Recall log-log CCDF: graphic
Suggests classical extreme value theory "hasn't kicked in yet"
Surprising point:
distributions very similar to each other, across widely different time periods
Suggests "systematic", not "noise driven" behavior
Revisit: New Mathematics for "Heavy Tails"? (cont.)
A weakness of above analysis:
only done for 0.001, 0.002, ..., 0.999 quantiles (since worked with summarized data)
Consequence: for n ~ 1,000,000 - 7,000,000, have "poor resolution of tails"
("chunkiness" at bottom)
Improved version: "insert" raw data values, graphic
- chunkiness is gone
- distributional shape still surprisingly constant
- still suggests systematic nonlinear behavior
- still clear "classical tail theory" doesn't apply
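The gain in tail resolution from inserting raw data values can be illustrated with a small sketch. This is only an illustration on synthetic data: the Pareto sample is a stand-in for the HTTP response data, and the function names `raw_ccdf` and `summarized_ccdf` are hypothetical, not taken from the original analysis.

```python
import numpy as np

def raw_ccdf(x):
    """Empirical CCDF P(X > x), evaluated at every sorted raw data value."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    ccdf = 1.0 - np.arange(1, n + 1) / n
    # Drop the largest point, where the empirical CCDF is 0 (its log is undefined).
    return xs[:-1], ccdf[:-1]

def summarized_ccdf(x):
    """CCDF as seen only through the 0.001, 0.002, ..., 0.999 quantiles."""
    probs = np.arange(1, 1000) / 1000.0
    return np.quantile(x, probs), 1.0 - probs

rng = np.random.default_rng(0)
# Synthetic stand-in (assumption): Pareto sample with n = 1,000,000.
sample = rng.pareto(1.5, size=1_000_000) + 1.0

xs, cc = raw_ccdf(sample)
q, qcc = summarized_ccdf(sample)

print("smallest CCDF level, summarized:", qcc.min())
print("smallest CCDF level, raw data:  ", cc.min())
```

Plotting log10 of both coordinates gives the log-log CCDF in each case. The summarized version cannot go below the 0.001 level, while the raw version reaches levels of about 1/n.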
Revisit: New Mathematics for "Heavy Tails"? (cont.)
Suggests need to update concept of:
Heavy tailed durations
Long Range Dependence
since "classical" notions of "heavy tails" don't apply.
Recall earlier approach:
study "pointwise slopes" in log-log CCDF: graphic
- based on difference quotients (truncated below)
- zero denominators caused problems
- but clear visual impression of "usually in heavy tail range"
(recall leads to LRD)
(does lead to non-stationarity???)
- suggests possibility of updating above theory
Revisit: New Mathematics for "Heavy Tails"? (cont.)
Improved version of slopes analysis:
"insert" raw data values, graphic
- "0 denominator" problem becomes much worse
(since discrete quantiles did some smoothing)
- hence tried "grid based difference quotients"
- still "feels noise"
- has "smoothed out noise"
- general shape is same for both
- same lesson as before about effective tail indices
- still seems worth updating above theory
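The contrast between pointwise and grid based difference quotients can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the graphics: the data are a synthetic Pareto sample with tail index 1.5, tied values are collapsed before taking logs, the grid size of 20 is arbitrary, and the names `loglog_ccdf`, `pointwise_slopes` and `grid_slopes` are hypothetical.

```python
import numpy as np

def loglog_ccdf(x):
    """log10 of the distinct data values and of the empirical CCDF at each."""
    vals, counts = np.unique(np.asarray(x, dtype=float), return_counts=True)
    ccdf = 1.0 - np.cumsum(counts) / counts.sum()
    # Drop the largest value, where the empirical CCDF is 0.
    return np.log10(vals[:-1]), np.log10(ccdf[:-1])

def pointwise_slopes(lx, ly):
    """Difference quotients between consecutive points of the log-log CCDF."""
    dx, dy = np.diff(lx), np.diff(ly)
    keep = dx > 0  # guard against zero denominators
    return lx[1:][keep], dy[keep] / dx[keep]

def grid_slopes(lx, ly, n_cells):
    """Difference quotients over an equally spaced grid on the log10(x) axis."""
    grid = np.linspace(lx[0], lx[-1], n_cells + 1)
    gy = np.interp(grid, lx, ly)  # log-log CCDF interpolated onto the grid
    mid = 0.5 * (grid[:-1] + grid[1:])
    return mid, np.diff(gy) / np.diff(grid)

rng = np.random.default_rng(1)
sample = rng.pareto(1.5, size=100_000) + 1.0  # synthetic stand-in (assumption)

lx, ly = loglog_ccdf(sample)
px, ps = pointwise_slopes(lx, ly)
gx, gs = grid_slopes(lx, ly, 20)

print("spread of pointwise slopes:", np.std(ps))
print("spread of grid slopes:     ", np.std(gs))
print("median grid slope:         ", np.median(gs))
```

A finer grid follows the data more closely and so "feels" more of the noise, while a coarser grid smooths it out, at the cost of resolution along the log10(x) axis.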
Revisit: New Mathematics for "Heavy Tails"? (cont.)
Provides new motivation for definitions of "heavy tails":
(from 10/24/01)
Version 1:
For some ,
Which results in:
Open Problem 1: For the simple model (recall from 10/1/01),
with Version 1 tailed duration dist'n, is
(i.e. do Version 1 heavy tails result in index LRD?)
Revisit: New Mathematics for "Heavy Tails"? (cont.)
Note: Version 1 is not "completely right", but instead only holds "most of the time"
Improvement: modified definition of "heavy tailed"
Version 2: Reformulate, in terms of: have
Open Problem 2: For the simple model (recall from 10/1/01),
with Version 2 tailed duration dist'n, can we still have
(in a suitable sense)?
How do we modify Version 2 to make this happen?
A "really important" open problem (from before)
Can lognormal durations lead to Long Range Dependence?
Simplest (widely held) answer:
No, not for any fixed lognormal
Deeper question:
what if the lognormal changes during the asymptotics?
Who cares?
- Common misperception about "infinite moments"
A precise question: for a sequence of "simple models" with lognormal durations,
under what conditions on the two lognormal parameters does classical L.R.D.
(in any sense defined in Lecture 9-19-01) result in the limit????
A "really important" answer?
Hannig, J., Marron, J. S., Samorodnitsky, G. and Smith, F. D. (2001)
"Log-normal durations can give long range dependence".
For a sequence of "simple models" from Lecture 10-1-01,
- indexed by
- Continuous time (simpler than discrete?)
- Homogeneous Poisson "starting times" (for flows)
- Draw independent "duration time" (for each flow)
- Track the number of "active flows" over time
Choose the durations to be lognormal, and choose the parameters so that:
- the intensity grows
- the lognormal mean goes to 0
- the lognormal "spreads"
Note that for each fixed index, the durations are "light tailed",
but "compressed towards 0", and "variance expanded",
so get "a few large values" and "many small values"
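One member of such a sequence of "simple models" can be simulated directly. The following is a rough sketch under stated assumptions: the parameter values `lam`, `mu`, `sigma`, the time horizon and the burn-in of 50 time units are all hypothetical choices for illustration, and a single fixed parameter triple is used rather than the growing-intensity sequence described above. The function name `simulate_active_flows` is also hypothetical.

```python
import numpy as np

def simulate_active_flows(lam, mu, sigma, horizon, t_grid, rng):
    """Count active flows at the times in t_grid.

    Flows start at the points of a homogeneous Poisson process with
    intensity lam on [0, horizon]; each flow gets an independent
    lognormal(mu, sigma) duration. The system starts empty at time 0.
    """
    n_flows = rng.poisson(lam * horizon)
    # Given the count, Poisson points are uniformly distributed on the interval.
    starts = rng.uniform(0.0, horizon, size=n_flows)
    ends = starts + rng.lognormal(mean=mu, sigma=sigma, size=n_flows)
    starts.sort()
    ends.sort()
    # Active at time t = (flows started by t) - (flows ended by t).
    started = np.searchsorted(starts, t_grid, side="right")
    ended = np.searchsorted(ends, t_grid, side="right")
    return started - ended

rng = np.random.default_rng(0)
lam, mu, sigma = 50.0, 0.0, 1.0          # hypothetical parameter values
horizon = 200.0
t_grid = np.arange(50.0, horizon, 0.5)   # skip a burn-in period (assumption)

active = simulate_active_flows(lam, mu, sigma, horizon, t_grid, rng)

# In steady state the mean number of active flows is lam * E[duration].
print("simulated mean:  ", active.mean())
print("lam * E[duration]:", lam * np.exp(mu + 0.5 * sigma**2))
```

Raising `lam` while lowering `mu` and raising `sigma` moves in the direction of the sequence described above, giving many short flows together with a few very long ones.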
Is growing intensity "objectionable"?
- by some standard measures
- but internet traffic is growing rapidly...
- why do asymptotics?
- limiting process models something? No!
- because everybody always has? No!
- to illuminate important, underlying structure? Yes!
A "really important" answer? (cont.)
Now measure "Long Range Dependence" through covariance:
Main result: Given
and
,
for any sequence
with
:
Intuitive Idea:
for a "wide range of lags", the autocovariance has a "polynomial rate of decay",
which is a "common symptom of long range dependence"
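To get a feel for the autocovariance at different lags, one can use a standard identity for the stationary version of this kind of model (Poisson starts with intensity lam, independent durations D): the covariance of the active flow counts at lag s is lam times the integral of P(D > u) over u > s, which equals lam * E[(D - s)+]. The sketch below evaluates this in closed form for lognormal durations. It is an illustration only, not the main result above; the parameter values and the name `active_flow_autocov` are hypothetical.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, via erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def active_flow_autocov(s, lam, mu, sigma):
    """lam * E[(D - s)+] for D ~ lognormal(mu, sigma), at lag s > 0."""
    a = (mu - math.log(s)) / sigma
    # E[(D - s)+] = E[D; D > s] - s * P(D > s) for a lognormal D.
    mean_excess = (math.exp(mu + 0.5 * sigma**2) * std_normal_cdf(a + sigma)
                   - s * std_normal_cdf(a))
    return lam * mean_excess

lam, mu, sigma = 1.0, 0.0, 2.0            # hypothetical parameter values
lags = [0.1, 1.0, 10.0, 100.0]
cov = [active_flow_autocov(s, lam, mu, sigma) for s in lags]

# Local slope of log(autocovariance) against log(lag) between successive lags.
slopes = [
    (math.log(cov[i + 1]) - math.log(cov[i]))
    / (math.log(lags[i + 1]) - math.log(lags[i]))
    for i in range(len(lags) - 1)
]

for s, c in zip(lags, cov):
    print(f"lag {s:7.1f}  autocovariance {c:.4f}")
print("local log-log slopes:", slopes)
```

For a fixed lognormal the local log-log slope keeps steepening as the lag grows, so no single polynomial rate holds at all lags. That is consistent with the "simplest answer" above, and with the need to let the lognormal change during the asymptotics.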
Some Speculations
Recall Mice and Elephants:
- Aggregation of Mice: short range independence (Cleveland, et al.)
- Aggregation of Elephants: long range dependence
Deep question: which will “eventually dominate”?
- Cleveland: theory for mice swamping elephants
- Others (e.g. me): elephants will continue to grow
Some Speculations (cont.)
Future Major Players(???):
Voice/Video over IP, and other applications
- Strong “Quality of Service” Demands
- Big $$$$ behind this
Industry response???
Short term future:
- Internet Service Providers:
- Financial incentive for over-provisioning
Long term future(????): Interesting competition between:
- Above Overprovisioning
- "Differentiated service"
e.g. marked and prioritized packets
- intermediate solutions
Kulkarni’s view of Internet Traffic Research
Ancient Indian Story:
Blind men describe an elephant
View 1: It is like a tree trunk …
View 2: It is like a rope …
View 3: It is like a long hose …
View 4: It is like a large fan …
Request for comments:
Cornell Evaluations
Personally Important Questions:
Q1: HTML lectures vs. PowerPoint?
Q2: How to encourage more student involvement?
Topics never covered:
- Stationarity? (scale based?) (paper from Richard)
- Cleveland & Cao and Ramanan asymptotics & mice vs. elephants
- New "big trace" analysis
- Shared Fate Visualization
- Functional Data Analysis of Traces