Course  OR 778

Class Notes   10/24/01







Last Time:

For new HTTP Response Size data:
    -    Studied Asymptotic Independence
    -    Hill Estimation of tail indices
    -    Used for "power renormalization"
    -    Considered Box-Cox family of power transformations
    -    Explored "ratio Hill estimation"
    -    log-log CCDF Tail Index Estimation
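
As a reminder of what Hill estimation computes, here is a minimal sketch in Python. It is not the code used for the plots in these notes. The function name, the choice of k (number of upper order statistics), and the simulated Pareto sample are all illustrative assumptions.

```python
import math
import random


def hill_estimate(data, k):
    """Hill estimate of the tail index from the k largest observations."""
    xs = sorted(data, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(data)")
    threshold = xs[k]  # the (k+1)-th largest value
    # Average log excess of the top k values over the threshold.
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    return 1.0 / mean_log_excess


# Hypothetical check on simulated Pareto data with true tail index 1.5.
random.seed(0)
sample = [random.paretovariate(1.5) for _ in range(20000)]
alpha_hat = hill_estimate(sample, k=2000)
print(round(alpha_hat, 2))
```

The estimate depends on k, which is why Hill plots show the estimate across a range of k rather than a single value.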
 
 
 


New HTTP Response Data




Data sources:  4 hour blocks of packet headers

        "Morning":   8:00-12:00

        "Afternoon":   13:00-17:00

        "Evening":    19:30-23:30
 
 

Gathered at UNC Main Link

During 7 days in April 2001
 
 
 


New Response Size Q-Q Plots




Another view of New Response Size Data:

Extreme Value Tail Index 




Recall Intuition:
 

    -    Shape parameter of Pareto  (polynomial power)
 

    -    Strong relation to Long Range Dependence

            -    in Mice and Elephants plots  (graphic)

            -   in Duration Distributions,

$1 < \alpha < 2$ implies Classical LRD in aggregated time series





    -    Strong relation to moments:

            -    for $\alpha \le 1$ have infinite mean

            -    for $1 < \alpha \le 2$ have finite mean but infinite variance

            -    for $\alpha > 2$ both mean and variance are finite

            -    similar for larger $\alpha$ and higher moments
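
The relation between the tail index and moments can be checked directly for an exact Pareto distribution. The parametrization below, with scale $x_0$ and shape $\alpha$, is an assumption made for this worked example, since the notes do not fix one.

```latex
% Assumed Pareto form: P(X > x) = (x / x_0)^{-\alpha} for x \ge x_0, with \alpha > 0.
% The density is f(x) = \alpha x_0^{\alpha} x^{-\alpha - 1} on [x_0, \infty).
\[
  E\left[X^{k}\right]
  = \int_{x_0}^{\infty} x^{k} \, \alpha x_0^{\alpha} x^{-\alpha - 1} \, dx
  = \begin{cases}
      \dfrac{\alpha \, x_0^{k}}{\alpha - k}, & k < \alpha, \\[2ex]
      \infty, & k \ge \alpha .
    \end{cases}
\]
% k = 1: the mean is finite only when \alpha > 1.
% k = 2: the second moment (hence the variance) is finite only when \alpha > 2.
```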
 
 
 


New Response Size Q-Q Plots (cont.)






Simple, straightforward estimation of $\alpha$:

Slope of CCDF (i.e. 1 - CDF) on log - log scale
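
One way to turn this idea into a number is a least-squares line through the upper part of the empirical log-log CCDF, with the tail index estimated by the negative slope. The sketch below is illustrative only. The function name, the `tail_fraction` cutoff, and the simulated data are assumptions and are not taken from the analysis in these notes.

```python
import math
import random


def loglog_ccdf_tail_index(data, tail_fraction=0.1):
    """Negative least-squares slope of the empirical log-log CCDF over the upper tail."""
    xs = sorted(data)
    n = len(xs)
    start = int(n * (1.0 - tail_fraction))
    # Empirical CCDF at the i-th order statistic (0-based) taken as (n - i) / n.
    log_x = [math.log10(xs[i]) for i in range(start, n)]
    log_ccdf = [math.log10((n - i) / n) for i in range(start, n)]
    m = len(log_x)
    mean_x = sum(log_x) / m
    mean_y = sum(log_ccdf) / m
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_ccdf))
    return -sxy / sxx


# Hypothetical check on simulated Pareto data with true tail index 1.5.
random.seed(1)
sample = [random.paretovariate(1.5) for _ in range(20000)]
print(round(loglog_ccdf_tail_index(sample), 2))
```

A single fitted slope only makes sense when the tail of the plot is close to linear, which is exactly what the curves discussed next call into question.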






Log-log CCDF:  graphic
 

    -    All 21 time blocks appear as thin blue lines

    -    Each individual time block labeled and highlighted in thick red

    -    Not very "linear"?

    -    Suggests classical extreme value theory

hasn't "kicked in" yet???

    -    Note "shapes" of curves surprisingly constant

    -    Suggests curvature is not "random phenomenon"!

    -    Instead something systematic about internet traffic?

    -    Point worth deeper statistical confirmation??

    -    Suggests enhancement of current mathematics????

    -    Friday evening an "extreme point"?  (least steep?)

    -    Many Resp. Sizes near 400 bytes???

(also for Friday afternoon, nowhere else?)

    -    Worth plotting data between 0.999 quantile and max???

(1,000 to 7,000 of these for each time block....)








New Response Size Q-Q Plots (cont.)






Now estimate "tail index" $\alpha$, by studying:

Slopes:  graphic





    -    Simply use difference quotients from log-log CCDF

    -    Numerical problem:  0 denominators

    -    Reset to bottom of plot

    -    Suggest ignoring those

    -    Could use fancier differentiation (e.g. over bigger range)

    -    But this "raw data" shows interesting structure

    -    "Almost always" have   (interesting for LRD)

    -    But no apparent "tail limit" for $\alpha$?

    -    So do not satisfy "classical heavy tail definition"?

    -    But still clearly "intuitively heavy tailed"?

    -    Worth exploring alternate definitions?
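
The difference quotients described above can be sketched as follows. This is a minimal illustration, not the code behind the Slopes graphic. It follows the suggestion to ignore zero denominators (tied sizes) rather than resetting them to the bottom of the plot, and the function name and toy data are assumptions.

```python
import math


def ccdf_difference_quotients(data):
    """Local slopes of the empirical log-log CCDF between consecutive order statistics.

    Returns (x, slope) pairs. Pairs of tied values give a zero denominator
    and are skipped.
    """
    xs = sorted(data)
    n = len(xs)
    slopes = []
    for i in range(n - 1):
        dx = math.log10(xs[i + 1]) - math.log10(xs[i])
        if dx == 0.0:
            continue  # tied sizes: zero denominator, ignored
        # Empirical CCDF taken as (n - i) / n at the i-th order statistic (0-based).
        dy = math.log10((n - i - 1) / n) - math.log10((n - i) / n)
        slopes.append((xs[i], dy / dx))
    return slopes


# Toy data with one tie (hypothetical sizes in bytes).
for x, slope in ccdf_difference_quotients([1, 1, 10, 100]):
    print(x, round(slope, 4))
```

The negative of each slope is a local value of the tail index at that size. Response sizes are integers, so ties are common in the body of the distribution and rarer in the far tail.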
 
 
 


New Mathematics for "Heavy Tails"?






Version 1:    For some ,






Open Problem 1:    For the simple Model,

with Version 1 tailed Duration Dist'n,   is

?





(i.e. have index   LRD)
 
 

Version 2:    Reformulate, in terms of:    have

"most of the time"  (in some sense)?
 
 

Open Problem 2:    For the simple model,

with Version 2 tailed Duration Dist'n,

can we still have    (in a suitable sense)?
 
 

How do we modify version 2 to make this happen?
 
 
 



 
 
 
 
 
 
 
 
 
 
 
 


A "really important" open problem (from before)








Can the lognormal lead to Long Range Dependence?
 

Simplest (widely held) answer:

No, not for any fixed log-normal
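
One way to see the intuition behind this answer is to look at the local slope of the lognormal log-log CCDF. With $z = (\log x - \mu)/\sigma$, the negative slope is $\phi(z) / (\sigma (1 - \Phi(z)))$, which keeps growing as $x$ increases, so a fixed lognormal has no finite limiting tail index. The sketch below evaluates this quantity. The function name and parameter values are illustrative assumptions.

```python
import math


def lognormal_local_tail_index(x, mu=0.0, sigma=1.0):
    """Negative local slope of the lognormal log-log CCDF at x > 0."""
    z = (math.log(x) - mu) / sigma
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf at z
    ccdf = 0.5 * math.erfc(z / math.sqrt(2.0))  # standard normal upper tail at z
    return density / (sigma * ccdf)


# The local index increases with x. A larger sigma slows the increase.
for x in (1.0, 10.0, 100.0, 1000.0):
    print(x, round(lognormal_local_tail_index(x, mu=0.0, sigma=2.0), 3))
```

Over a finite range of sizes the local index can still sit in a range that looks heavy tailed, and that range widens as sigma grows.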








Deeper question:

what if the lognormal changes during the asymptotics?








Who cares?
    -    Common misperception about "infinite moments"
 

A precise question:  for a sequence of "simple models",

    with log normal (parameters $\mu$ and $\sigma$) durations,

    under what conditions (on $\mu$ and $\sigma$) does classical L.R.D.

(in any sense defined in Lecture9-19-01),

result, as ????
 
 
 


A "really important" open problem (from before)








Suggested approaches:
 

    -    work with autocorrelation definition of L.R.D.
 

    -    since straightforward to compute for simple process
 

    -    mode of convergence (of autocorrelation function)?
 

            -    pointwise in lag?

                       (no, have uninteresting family of exact answers)

            -    uniform on compacts?

            -    over "sliding intervals", ?
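
To illustrate why the autocorrelation is straightforward to compute, the sketch below assumes the "simple model" is an M/G/infinity-type process: Poisson arrivals with i.i.d. lognormal durations, observed as the number of active connections. That identification is an assumption made here, not something stated in these notes. Under it, the stationary autocorrelation at lag $t$ is $E[(D - t)^{+}] / E[D]$, which has a closed form for lognormal $D$. The function names are hypothetical.

```python
import math


def std_normal_cdf(v):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-v / math.sqrt(2.0))


def lognormal_mginf_autocorrelation(lag, mu=0.0, sigma=1.0):
    """Autocorrelation at a given lag for the assumed M/G/infinity model.

    Uses rho(t) = E[(D - t)^+] / E[D] with D lognormal(mu, sigma).
    """
    if lag <= 0.0:
        return 1.0
    log_lag = math.log(lag)
    first = std_normal_cdf((mu + sigma * sigma - log_lag) / sigma)
    second = (lag * math.exp(-(mu + 0.5 * sigma * sigma))
              * std_normal_cdf((mu - log_lag) / sigma))
    return first - second


# Autocorrelation over a few lags for two hypothetical sigma values.
for sigma in (1.0, 3.0):
    print(sigma, [round(lognormal_mginf_autocorrelation(t, 0.0, sigma), 4)
                  for t in (1.0, 10.0, 100.0)])
```

For any fixed parameters this gives one exact curve in the lag. The question of interest is how such curves behave along a sequence of parameter values, and in which mode of convergence.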
 
 



 
 
 
 
 
 
 
 
 
 



 
 
 
 

General Current Direction:
 

QQ analysis, based on new Felix response sizes, towards "shape index curve" idea

shape index curve, towards LRD?
 
 

Back Burner, but interesting:

Another approach to why the Weibull process is wrong: study what happens when we try the "memoryless property" for the Weibull.

More sophisticated study of Cluster distribution in clustered Poisson?
 
 
 


Future Topics:

"Asymptotic Independence" scatterplots
Stationarity?  (scale based?)  (paper from Richard)
Heavy tails lead to long range dependence
Cleveland & Cao and Ramanan asymptotics & mice vs. elephants
How "Poisson - Gaussian"?
New "big trace" analysis?
Conservative Cascades
Cascaded On-Off Model
Shared Fate Visualization
Functional Data Analysis of Traces
"Speculations" from zoom stat talk
Kulkarni's view
Ask for comments  (HTML lectures vs. PowerPoint?)