Lecture12-5-01

Course OR 778

Class Notes 12/5/01

Last Time:

- Large Variable Association

- Found a way to make axes "commensurable"

- Main Problem: tail indices not right?

Revisit: New Mathematics for "Heavy Tails"?

Recall Main Goal (from 10/22/01 and 10/24/01):

Find "more realistic" version of

Heavy tailed durations ⇒ Long Range Dependence

Motivating data: HTTP Response Data, from UNC, 2001

Recall log-log CCDF: [Figure: log-log CCDF plot of the HTTP response data]

Suggests classical extreme value theory "hasn't kicked in yet"
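For reference, a minimal Python sketch of how log-log CCDF points can be computed from raw values. The function name log_log_ccdf is hypothetical (not from the lecture), ties are handled by keeping one point per distinct value, and the plotting step is omitted.

```python
import math
from bisect import bisect_right


def log_log_ccdf(data):
    """Points (log10 x, log10 P(X > x)) of the empirical CCDF.

    One point per distinct positive value; the largest value is dropped
    because its empirical CCDF is 0, which has no logarithm.
    """
    xs = sorted(x for x in data if x > 0)
    n = len(xs)
    points = []
    for x in sorted(set(xs)):
        frac_greater = (n - bisect_right(xs, x)) / n  # empirical P(X > x)
        if frac_greater > 0:
            points.append((math.log10(x), math.log10(frac_greater)))
    return points


# Usage on a tiny made-up sample (not the UNC data):
print(log_log_ccdf([1, 10, 100, 1000]))
```

Under classical heavy-tail theory these points would eventually fall on a straight line of slope -α; the curvature seen in the plot is what suggests that regime "hasn't kicked in yet".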

Surprising point:

distributions very similar to each other,
across widely different time periods

Suggests "systematic", not "noise driven" behavior


A weakness of above analysis:

only done for the 0.001, 0.002, ..., 0.999 quantiles

(since worked with summarized data)

Consequence: for n ~ 1,000,000 - 7,000,000,

have "poor resolution of tails"

("chunkiness" at bottom)
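The arithmetic behind the "poor resolution of tails" can be made concrete with a small Python sketch (the function name tail_resolution is hypothetical). A summary at levels 0.001, ..., 0.999 cannot display a CCDF below 0.001 (log10 = -3), while n raw values reach down to 1/n, so for n between 1,000,000 and 7,000,000 roughly 1,000 to 7,000 observations sit beyond the last plotted quantile.

```python
import math


def tail_resolution(n, top_level=0.999):
    """Compare how far down the log10 CCDF axis two kinds of plot can reach.

    Summary quantiles at levels up to top_level cannot show a CCDF below
    1 - top_level; raw data reach 1/n (at the second-largest value,
    assuming the top two values differ).
    """
    from_quantiles = math.log10(1.0 - top_level)
    from_raw = math.log10(1.0 / n)
    beyond_top = round(n * (1.0 - top_level))  # observations above the top quantile
    return from_quantiles, from_raw, beyond_top


for n in (1_000_000, 7_000_000):
    q, r, k = tail_resolution(n)
    print(f"n={n}: quantiles reach {q:.1f}, raw data reach {r:.2f}, "
          f"{k} values lie beyond the 0.999 quantile")
```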

Improved version: "insert" raw data values [Figure: log-log CCDF with raw data values inserted]

- chunkiness is gone

- distributional shape still surprisingly constant

- still suggests systematic nonlinear behavior

- still clear "classical tail theory" doesn't apply


Suggests need to update concept of:

Heavy tailed durations ⇒ Long Range Dependence

since "classical" notions of "heavy tails" don't apply.

Recall earlier approach:

study "pointwise slopes" in log-log CCDF: [Figure: pointwise slopes of the log-log CCDF]

- based on difference quotients (truncated below)

- zero denominators caused problems

- but clear visual impression of "usually in heavy tail range"

(recall leads to LRD)

(does lead to non-stationarity???)

- suggests possibility of updating above theory
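A minimal Python sketch of the pointwise-slopes idea, assuming the input is a quantile summary (value at level p, so the CCDF there is about 1 - p). Two details are assumptions, not taken from the lecture: "truncated below" is read as clipping each slope at a floor (default -10, an arbitrary choice), and a tied pair of quantiles, which is what produces a zero denominator, is reported as None. The hypothetical helper in_heavy_tail_range flags slopes between -2 and -1, i.e. a local tail index α = -slope in the range 1 < α < 2.

```python
import math


def quantile_slopes(quantiles, levels, floor=-10.0):
    """Pointwise slopes of the log-log CCDF from summary quantiles.

    quantiles[i] is the data value at probability level levels[i], so the
    CCDF there is about 1 - levels[i]. Each slope is a difference quotient
    between neighbouring points; it is None when the two quantiles are equal
    (zero denominator), and otherwise clipped below at `floor`.
    """
    xs = [math.log10(q) for q in quantiles]
    ys = [math.log10(1.0 - p) for p in levels]
    slopes = []
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x1 == x0:
            slopes.append(None)  # tied quantiles: difference quotient undefined
        else:
            slopes.append(max((y1 - y0) / (x1 - x0), floor))
    return slopes


def in_heavy_tail_range(slope):
    """True when the local tail index alpha = -slope satisfies 1 < alpha < 2."""
    return slope is not None and -2.0 < slope < -1.0


print(quantile_slopes([10, 100, 100, 1000], [0.9, 0.99, 0.991, 0.999]))
```

With discrete data (e.g. sizes in whole bytes) many adjacent quantiles coincide, which is why the None entries, and hence the gaps in the slope plot, are common.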


Improved version of slopes analysis:

"insert" raw data values [Figure: pointwise slopes with raw data values inserted]

- "0 denominator" problem becomes much worse

(since discrete quantiles did some smoothing)

- hence tried "grid based difference quotients"

- still "feels noise"

- has "smoothed out noise"

- general shape is same for both

- same lesson as before about effective tail indices

- still seems worth updating above theory
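A sketch of the "grid based difference quotients" idea in Python, under the assumption (not stated in the lecture) that the grid is equally spaced in log10 x; the function name grid_slopes and its parameters are hypothetical. Because every quotient divides by the same fixed step, no denominator is ever zero.

```python
import math
from bisect import bisect_right


def grid_slopes(data, lo, hi, m):
    """Slopes of the log10 empirical CCDF on an equally spaced grid in log10 x.

    The grid has m + 1 points from lo to hi (in log10 units); every
    difference quotient divides by the same fixed step. Returns None where
    the empirical CCDF is 0 at either end of a grid interval.
    """
    xs = sorted(data)
    n = len(xs)
    step = (hi - lo) / m

    def log_ccdf(t):
        frac = (n - bisect_right(xs, 10.0 ** t)) / n  # empirical P(X > 10**t)
        return math.log10(frac) if frac > 0 else None

    ys = [log_ccdf(lo + k * step) for k in range(m + 1)]
    return [(y1 - y0) / step if y0 is not None and y1 is not None else None
            for y0, y1 in zip(ys, ys[1:])]


# Synthetic check (not real traffic data): evenly spread Pareto(alpha = 1.5)
# quantiles, for which the true CCDF is x ** -1.5 for x >= 1.
n = 100_000
sample = [(1.0 - (i - 0.5) / n) ** (-1.0 / 1.5) for i in range(1, n + 1)]
print(grid_slopes(sample, 0.0, 2.0, 4))
```

A flat stretch of the CCDF now gives a slope of 0 rather than an undefined quotient; the grid spacing plays the role of a smoothing bandwidth, which is the trade-off between "feeling noise" and "smoothing out noise".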


Provides new motivation for definitions of "heavy tails":

(from 10/24/01)

Version 1: For some ,

Which results in:

Open Problem 1: For the simple model (recall from 10/1/01),

with a Version 1 heavy tailed duration dist'n, is

(i.e., do Version 1 heavy tails result in index LRD?)


Note: Version 1 is not "completely right",

but instead only holds "most of the time"

Improvement: modified definition of "heavy tailed"

Version 2: Reformulate, in terms of: have

"most of the time" (in some sense)?

Open Problem 2: For the simple model (recall from 10/1/01),

with a Version 2 heavy tailed duration dist'n,

can we still have (in a suitable sense)?

How do we modify Version 2 to make this happen?

A "really important" open problem (from before)

Can lognormal durations lead to Long Range Dependence?

Simplest (widely held) answer:

No, not for any fixed lognormal distribution
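One way to see the simplest answer, assuming the simple model is run in its stationary version (Poisson starts with rate λ, i.i.d. durations D with finite mean; λ is notation introduced here) and that LRD is taken in the sense of non-integrable autocovariance. A standard identity for this model gives the covariance at lag h, and integrating it over all lags (Fubini) gives a finite answer whenever E[D²] is finite, which every fixed lognormal satisfies:

```latex
\begin{aligned}
\operatorname{Cov}\bigl(X(t), X(t+h)\bigr)
  &= \lambda \int_h^\infty P(D > u)\,du
   = \lambda\, E\bigl[(D - h)^+\bigr], \qquad h \ge 0, \\
\int_0^\infty \operatorname{Cov}\bigl(X(t), X(t+h)\bigr)\,dh
  &= \lambda \int_0^\infty u\, P(D > u)\,du
   = \frac{\lambda}{2}\, E\bigl[D^2\bigr], \\
D \sim \text{lognormal}(\mu, \sigma)
  &\;\Rightarrow\; E\bigl[D^2\bigr] = e^{2\mu + 2\sigma^2} < \infty .
\end{aligned}
```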

Deeper question:

what if the lognormal changes during the asymptotics?

Who cares?

- Common misperception about "infinite moments"

A precise question: for a sequence of "simple models",

with lognormal (parameters μ and σ) durations,

under what conditions (on μ and σ) does classical L.R.D.

(in any sense defined in Lecture9-19-01),

result, as ????

A "really important" answer?

Hannig, J., Marron, J. S., Samorodnitsky, G. and Smith, F. D. (2001) "Log-normal durations can give long range dependence".

For a sequence of "simple models" from Lecture 10-1-01,

- indexed by

- Continuous time (simpler than discrete?)

- Homogeneous Poisson() "starting time" (for flows)

- Draw independent "duration time" (for each flow)

- Define = number of "active flows"
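The simple model can be simulated directly. Below is a rough Python sketch; the function and parameter names (rate, mu, sigma, burn_in, grid_step) are hypothetical, the lognormal durations match the duration choice for this sequence of models, and starting flows before time 0 is only an approximation to the stationary version. Time is discretized only for reporting the count of active flows.

```python
import random


def simulate_active_flows(rate, mu, sigma, t_end, grid_step, burn_in, seed=0):
    """Sketch of the "simple model": flows start at the points of a homogeneous
    Poisson(rate) process on [-burn_in, t_end], each flow lasts an independent
    lognormal(mu, sigma) duration, and the output is the number of active
    flows at times 0, grid_step, 2 * grid_step, ..., up to t_end.
    """
    rng = random.Random(seed)
    flows = []
    t = -burn_in
    while True:
        t += rng.expovariate(rate)  # i.i.d. exponential gaps give Poisson starts
        if t > t_end:
            break
        flows.append((t, t + rng.lognormvariate(mu, sigma)))
    times = [k * grid_step for k in range(int(t_end / grid_step) + 1)]
    # A flow is active at time u if it started by u and has not yet ended.
    return [sum(1 for start, end in flows if start <= u < end) for u in times]


counts = simulate_active_flows(rate=10.0, mu=0.0, sigma=1.0,
                               t_end=100.0, grid_step=1.0, burn_in=50.0)
print(sum(counts) / len(counts))
```

For comparison, the stationary mean number of active flows is rate · E[D] = rate · exp(mu + sigma²/2), about 16.5 for these arbitrary parameter values.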

Choose the durations to be lognormal (μ, σ), and choose

- (growing intensity)

- (lognormal mean goes 0)

- (lognormal "spreads")

Note that for each , the durations are "light tailed"

but "compressed towards 0", and "variance expanded",

so in , get "a few large values" and "many small values"
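The "compressed towards 0" and "variance expanded" effect can be seen from standard lognormal formulas. The Python sketch below uses a hypothetical parameter sequence, μ = -σ²/2 (which holds the mean at 1), chosen only to isolate the change in shape; it is not the sequence used in the paper. As σ grows the median goes to 0, the variance blows up, and the chance of exceeding the mean shrinks: many small values, and a few large ones carrying the mean.

```python
import math


def lognormal_summary(mu, sigma):
    """Median, mean, variance and P(D > mean) for D ~ lognormal(mu, sigma)."""
    median = math.exp(mu)
    mean = math.exp(mu + sigma ** 2 / 2)
    variance = (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)
    z = (math.log(mean) - mu) / sigma                 # log(mean) in standard units
    p_above_mean = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z), Z standard normal
    return median, mean, variance, p_above_mean


# Hypothetical sequence: mu = -sigma**2 / 2 keeps the mean at 1 while sigma grows.
for sigma in (1.0, 2.0, 3.0, 4.0):
    med, mean, var, p = lognormal_summary(-sigma ** 2 / 2, sigma)
    print(f"sigma={sigma}: median={med:.2e}, mean={mean:.1f}, "
          f"variance={var:.2e}, P(D > mean)={p:.3f}")
```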

Is growing intensity "objectionable"?

- yes, by some standard measures

- but internet traffic is growing rapidly...

- why do asymptotics?

- limiting process models something? No!

- because everybody always has? No!

- to illuminate important, underlying structure? Yes!


Now measure "Long Range Dependence" through covariance:

Main result: Given and ,

for any sequence with :

Intuitive Idea: for a "wide range of lags ",

the autocovariance

has a "polynomial rate of decay" ,

which is a "common symptom of long range dependence"
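The "polynomial rate of decay" can be checked numerically without simulation. Assuming the stationary version of the simple model, the autocovariance at lag h is λ E[(D - h)^+], which has a closed form for lognormal durations. The Python sketch below (function names and parameter values are illustrative choices, not taken from the paper) computes the local log-log slope d log Cov / d log h, which for this model equals -h P(D > h) / E[(D - h)^+], so the intensity cancels.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def lognormal_autocov(h, rate, mu, sigma):
    """Cov(X(t), X(t+h)) = rate * E[(D - h)^+] for the stationary simple model
    with Poisson(rate) starts and D ~ lognormal(mu, sigma), h > 0."""
    log_h = math.log(h)
    upper = math.exp(mu + sigma ** 2 / 2) * normal_cdf((mu + sigma ** 2 - log_h) / sigma)
    return rate * (upper - h * normal_cdf((mu - log_h) / sigma))


def local_decay_exponent(h, mu, sigma):
    """d log Cov / d log h = -h P(D > h) / E[(D - h)^+]; the rate cancels.
    A value in (-1, 0) looks locally like non-summable polynomial decay."""
    tail = normal_cdf((mu - math.log(h)) / sigma)  # P(D > h)
    return -h * tail / lognormal_autocov(h, 1.0, mu, sigma)


for sigma in (1.0, 3.0):
    exps = [local_decay_exponent(math.exp(k), 0.0, sigma) for k in range(1, 16, 2)]
    print(sigma, [round(e, 2) for e in exps])
```

With σ = 3 the local exponent stays between -1 and 0 for lags from e¹ to e¹⁵ (over six orders of magnitude), while with σ = 1 it is already below -1 at lag e and falls rapidly after that.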

Some Speculations

Recall Mice and Elephants:

- Aggregation of Mice ⇒ short range independence

(Cleveland et al.)

- Aggregation of Elephants ⇒ long range dependence

Deep question: which will "eventually dominate"?

- Cleveland: theory for mice swamping elephants

- Others (e.g. me): elephants will continue to grow


Future Major Players (???):

Voice/Video over IP, and other applications

- Strong "Quality of Service" demands

- Big $$$$ behind this

Industry response???

Short term future:

- Internet Service Providers:

- Financial incentive for over-provisioning

Long term future (????): Interesting competition between:

- Above Overprovisioning

- "Differentiated service"

e.g. marked and prioritized packets

- intermediate solutions

Kulkarni’s view of Internet Traffic Research

Ancient Indian Story:

Blind men describe an elephant

View 1: It is like a tree trunk …

View 2: It is like a rope …

View 3: It is like a long hose …

View 4: It is like a large fan …

Request for comments:

Cornell Evaluations

Personally Important Questions:

Q1: HTML lectures vs. PowerPoint?

Q2: How to encourage more student involvement?

Topics never covered:

- Stationarity? (scale based?) (paper from Richard)
- Cleveland & Cao and Ramanan asymptotics & mice vs. elephants
- New "big trace" analysis
- Shared Fate Visualization
- Functional Data Analysis of Traces