Course  OR 778

Class Notes   10/15/01






Last Time:

    -    in general context of:

Heavy tailed durations  =>  Long Range Dependence





    -    revisited simple model assumptions

    -    considered modifications for start time process

    -    Weibull process:  an improvement, but not compelling

    -    Cluster Poisson process gave useful improvement?
 
 
 


Zooming Spectral Analysis






Recall "zooming autocorrelation" analysis:  graphic

    -    worked with binned data

    -    across wide range of scales

    -    i.e. binwidths 0.0003 - 0.3 secs

    -    expanding range of 10,000 bins

    -    looked like "white noise" at fine scales

    -    apparent Long Range Dependence at larger scales

    -    surprising "lifting" between scales

    -    explained by simple calculations
 
 
 


Zooming Spectral Analysis (cont.)






Bill Cleveland suggestion:

similar analysis, replacing autocorrelation by periodogram





    -    introduced while studying LRD

    -    essentially Fourier transform of autocovariance

    -    more "natural scale"?

    -    finds "periodicities" in data

    -    for LRD, linear near frequency 0 on log-log scale (slope measures LRD)
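
A minimal sketch of the periodogram and of the log-log slope near frequency 0, assuming NumPy and a 1/n normalization (one common convention, not necessarily the one behind the graphics). The names `periodogram` and `loglog_slope_near_zero` and the cutoff `n_low` are hypothetical choices for illustration only.

```python
import numpy as np

def periodogram(x):
    """Raw periodogram at the positive Fourier frequencies (radians per bin)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dft = np.fft.rfft(x - x.mean())                   # remove the mean first
    freqs = 2 * np.pi * np.arange(1, len(dft)) / n    # skip frequency 0
    power = np.abs(dft[1:]) ** 2 / n                  # assumed 1/n normalization
    return freqs, power

def loglog_slope_near_zero(freqs, power, n_low=50):
    """Least squares slope of log(power) on log(frequency), lowest n_low frequencies."""
    slope, _intercept = np.polyfit(np.log(freqs[:n_low]), np.log(power[:n_low]), 1)
    return slope

# White noise example: flat periodogram, so the slope should be near 0.
rng = np.random.default_rng(0)
white = rng.normal(size=10_000)
freqs, power = periodogram(white)
white_slope = loglog_slope_near_zero(freqs, power)
```

For binned traffic counts, `x` would be the vector of bin counts at one binwidth; a clearly negative slope at the lowest frequencies is what the LRD bullet describes.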
 
 
 


Zooming Spectral Analysis (cont.)






1st View:  conventional axes

power vs. frequency





    -    can't see anything at smallest scales (binwidths)

    -    see a little bit at largest scales

    -    all for very low frequencies

    -    clearly not correct view

    -    "improper use" of graph area

    -    logs seem indicated
 
 
 


Zooming Spectral Analysis (cont.)






2nd View:  log-log view
 

    -    stretches both ways

    -    so now "properly use graph area"

    -    clear systematic behavior

    -    explanation?

    -    where is "time" as zooming is done?

    -    purple lines show "highest frequency for each scale"

    -    move to right one step in each frame

    -    "total power" increases (recall larger data window)

    -    LRD (slope at 0) increases??
 
 
 


Zooming Spectral Analysis (cont.)





Simple Analysis:  study effect of "combining bins"

on Periodogram   (at a given frequency)




Assume underlying spectral density is continuous

(recall Periodogram is a crude estimate of spectral density)




Then can show:




    -    simple multiplicative relationship

    -    power is doubled

    -    "effective frequency" is doubled

    -    interesting to "control by scaling"
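
A small numerical check of the "power is doubled" bullet, under stated assumptions: white (Poisson) bin counts, one fixed stretch of data rather than the expanding window of the zooming analysis, and a 1/n periodogram normalization. In that setting, summing adjacent pairs of bins gives roughly f_Y(2w) = 2 f_X(w) at low frequencies w (radians per bin), which is one reading of the bullets above. The helper names `combine_bins` and `mean_power` are hypothetical.

```python
import numpy as np

def combine_bins(counts):
    """Sum adjacent pairs of bins, halving the number of bins."""
    counts = np.asarray(counts, dtype=float)
    m = len(counts) // 2
    return counts[:2 * m].reshape(m, 2).sum(axis=1)

def mean_power(x):
    """Average periodogram height over the positive Fourier frequencies."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.mean(np.abs(np.fft.rfft(x)[1:]) ** 2 / len(x))

rng = np.random.default_rng(1)
fine = rng.poisson(5.0, size=20_000).astype(float)   # assumed white-noise counts
coarse = combine_bins(fine)                          # binwidth doubled

# For white noise the periodogram is flat, so the ratio of average
# heights estimates the multiplicative factor from combining bins.
ratio = mean_power(coarse) / mean_power(fine)
```

The ratio should come out close to 2, which is why dividing by 2 in each frame is a natural normalization.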
 
 
 


Zooming Spectral Analysis (cont.)





3rd View:  normalized power
 

    -    i.e. divide by 2 (vertically) in each frame
 

    -    high frequency power stays constant
 

    -    as predicted by above theory
 

    -    low frequency power grows
 

    -    but only as purple lines move across
 

    -    i.e. extend into lower and lower frequencies
 
 
 


Zooming Spectral Analysis (cont.)





4th View:  normalized power and frequency

    -    rescale time so purple bars remain constant

    -    see increase happening at lowest frequencies

    -    Periodogram seems to follow:

            -    horizontal line at high frequencies

            -    sloped line at low frequencies

    -    consistent with mixture of white noise & LRD process

    -    could do parameter estimation?

    -    picture much more clear than zooming periodogram??

    -    maybe more on this later???
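
A rough sketch of the "horizontal line plus sloped line" shape, assuming the spectrum is a constant (white noise) plus a power law in frequency (LRD). The function names and all parameter values are hypothetical, chosen only to make the two regimes visible; nothing here is fitted to the traffic data.

```python
import numpy as np

def mixture_spectrum(freqs, noise_level, lrd_scale, lrd_exponent):
    """Constant white-noise floor plus a power law that blows up at frequency 0."""
    return noise_level + lrd_scale * freqs ** (-lrd_exponent)

def local_loglog_slope(freqs, values):
    """Pointwise slope of log(values) against log(freqs)."""
    return np.gradient(np.log(values), np.log(freqs))

freqs = np.pi * np.logspace(-4, 0, 200)        # low to high frequency
spec = mixture_spectrum(freqs, noise_level=1.0, lrd_scale=0.1, lrd_exponent=0.8)
slopes = local_loglog_slope(freqs, spec)

low_freq_slope = slopes[0]     # close to -lrd_exponent (sloped line)
high_freq_slope = slopes[-1]   # close to 0 (horizontal line)
```

Parameter estimation along these lines would amount to fitting the three constants to the normalized periodogram, which the notes leave open.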
 
 
 


Revisit Flow Duration Distributions





Recall Mice and Elephants plot:

graphic





Studied "distribution of lengths"

Q-Q plots showed Pareto gave reasonable fit?

off peak Q-Q plot                   peak Q-Q plot




Interesting point:

Estimated shape parameter 
 
 

Quite different from earlier HTTP Response Size Analysis,

where had shape parameters in range 
 
 
 


Revisit Flow Duration Distributions (cont.)





An explanation of the difference:  Theory of

Residual Life Time Distributions

also known as

Forward Recurrence Time Distributions





Idea: study Mice and Elephants line lengths (flow life times),

conditioned on surviving to a given point




Reason: big problems with "truncation" of flows

i.e. they could have started earlier







Revisit Flow Duration Distributions (cont.)





Theoretical (in tail limit) adjustment:
 

Reference:  Example 3.5.3, page 214, of Resnick, S. I. (1992)

Adventures in Stochastic Processes, Birkhäuser.
 
 

Soundbite summary:  for Pareto distribution,

for corresponding Residual Life distribution,

should reduce shape parameter by 1

    -    since integrate Pareto tail

    -    consistent with above values

    -    but observed reduction is smaller than 1  (only ~0.6-0.7)??

            -    driven by boundary effects?

            -    not full Residual Life setting???

            -    an artifact of crude parameter estimation?
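
A worked check of the "reduce shape parameter by 1" soundbite, assuming the simplest Pareto form P(X > x) = (x / x_min)^(-alpha) for x >= x_min with alpha > 1 (the parameterization used in the course may differ). The residual life tail is the integrated Pareto tail divided by the mean, which in this case has a closed form. Function names are hypothetical.

```python
import math

def pareto_tail(x, alpha, x_min=1.0):
    """P(X > x) for the assumed Pareto form."""
    return 1.0 if x < x_min else (x / x_min) ** (-alpha)

def residual_life_tail(x, alpha, x_min=1.0):
    """P(residual life > x) = (integral of the Pareto tail from x on) / mean."""
    mean = alpha * x_min / (alpha - 1.0)          # requires alpha > 1
    if x < x_min:
        return 1.0 - x / mean
    # integral from x to infinity of (u / x_min)^(-alpha) du, divided by the mean
    return (x / x_min) ** (-(alpha - 1.0)) / alpha

def loglog_tail_slope(tail, x_low, x_high, **params):
    """Slope of log(tail) against log(x) between two points."""
    return (math.log(tail(x_high, **params)) - math.log(tail(x_low, **params))) / (
        math.log(x_high) - math.log(x_low)
    )

alpha = 1.5
original_slope = loglog_tail_slope(pareto_tail, 10.0, 1000.0, alpha=alpha)
residual_slope = loglog_tail_slope(residual_life_tail, 10.0, 1000.0, alpha=alpha)
```

Here `original_slope` is -alpha and `residual_slope` is -(alpha - 1), so the full reduction of 1 holds only in this idealized setting; boundary effects and estimation error are not modeled.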
 
 
 


Revisit Flow Duration Distributions (cont.)





Try alternate (more dodgy?) parameter estimates

off peak alternate Q-Q              peak alternate Q-Q




    -    goal:  see how much difference this can make

    -    tried:  twiddling matched quantiles

    -    to "really different region"

    -    result:  not too much change in estimated shape parameter

    -    and went further in support of "residual life" idea

    -    conclude:  some residual life effect is present

(for these data)
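
A sketch of a Pareto fit by matching two quantiles, to make "twiddling matched quantiles" concrete. It assumes the Pareto form P(X > x) = (x / x_min)^(-alpha) and that two (probability, quantile) pairs are read off the data; the actual estimator behind the Q-Q plots may differ. Function names are hypothetical.

```python
import math

def pareto_quantile(p, alpha, x_min):
    """Quantile function of the assumed Pareto form."""
    return x_min * (1.0 - p) ** (-1.0 / alpha)

def fit_pareto_two_quantiles(p1, q1, p2, q2):
    """Solve for (alpha, x_min) so that the Pareto quantiles at p1, p2 equal q1, q2."""
    alpha = math.log((1.0 - p1) / (1.0 - p2)) / math.log(q2 / q1)
    x_min = q1 * (1.0 - p1) ** (1.0 / alpha)
    return alpha, x_min

# Round trip on exact Pareto quantiles: any pair of matched quantiles
# should return the same parameters.
true_alpha, true_x_min = 1.3, 2.0
pair_a = [(p, pareto_quantile(p, true_alpha, true_x_min)) for p in (0.5, 0.99)]
pair_b = [(p, pareto_quantile(p, true_alpha, true_x_min)) for p in (0.9, 0.999)]
fit_a = fit_pareto_two_quantiles(*pair_a[0], *pair_a[1])
fit_b = fit_pareto_two_quantiles(*pair_b[0], *pair_b[1])
```

With real data the two fits generally differ, and the size of that difference is one way to gauge how sensitive the estimated shape parameter is to the choice of matched quantiles.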







Revisit Flow Duration Distributions (cont.)





Variation:  recall Pareto vs. log-normal controversy
 
 

e.g.   Downey's motivation of log normal

off peak lognormal Q-Q        peak lognormal Q-Q





    -    fit is "about as good" as Pareto fits??

    -    none are all that good

    -    expected, because of previously described boundary effects
 
 
 


Revisit Flow Duration Distributions (cont.)





Open problems:
 

What is the Residual Life Distribution for the log normal?
 

Again log normal, or different shape?
 

Can this give new insights regarding the controversy?
 

If data are log normal, and a Pareto is fit to two quantiles, will the

residual life time distribution still have the correspondingly adjusted

Pareto fit?
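
A starting point for exploring the first open problem numerically, not an answer to it. The residual life tail is E[(X - x)^+] / E[X], which for a log normal can be written with the normal distribution function. The parameter values mu = 0, sigma = 1 and all function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def lognormal_residual_life_tail(x, mu=0.0, sigma=1.0):
    """P(residual life > x) = E[(X - x)^+] / E[X], where log X ~ N(mu, sigma^2)."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    d1 = (mu + sigma ** 2 - math.log(x)) / sigma   # from E[X; X > x]
    d2 = d1 - sigma                                # from P(X > x)
    return normal_cdf(d1) - x * normal_cdf(d2) / mean

def local_tail_index(x, mu=0.0, sigma=1.0, step=1.01):
    """Minus the local log-log slope of the residual life tail at x."""
    a = lognormal_residual_life_tail(x, mu, sigma)
    b = lognormal_residual_life_tail(x * step, mu, sigma)
    return -(math.log(b) - math.log(a)) / math.log(step)

index_at_2 = local_tail_index(2.0)
index_at_20 = local_tail_index(20.0)
```

The formula subtracts two nearly equal numbers far out in the tail, so it is only reliable for moderate x. Comparing `local_tail_index` at several points against a Pareto fitted to two quantiles would be one way to probe the last question.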