Course  OR 778

Class Notes   9/19/01







Introduction to Time Series Analysis
 

1.    Self-contained (so experts need not attend)
 

2.    Sound Bite

    -    only give "flavor" (enough for this course)

    -    follow references for more details
 
 
 
 


Two Main Parts




1.    Classical Theory

    -    usual in a 1st course in time series

    -    really a semester's (or year's) worth of material
 
 

2.    Long Range Dependence

    -    more advanced topic

    -    an active research area

    -    important for internet traffic???
 
 
 


1.    Classical Theory




Recommended Source:
Brockwell, P. J. and Davis, R. A. (1987) Time Series: Theory and Methods, Springer
 
 

A finite time series is a "sequence of random variables"   $X_1, X_2, \ldots, X_n$




Useful mathematical approximation:

    $\ldots,\; X_{-1},\; X_0,\; X_1,\; X_2,\; \ldots$

    -    doubly infinite sequence

    -    avoids "edge problems"
 
 
 


1.    Classical Theory (cont.)




Fundamental Classical Property:  Stationarity
 
 

Definition:  a time series  $\{X_t\}$  is stationary when:

    $\left(X_{t_1}, \ldots, X_{t_k}\right) \;\stackrel{d}{=}\; \left(X_{t_1 + h}, \ldots, X_{t_k + h}\right)$

    for all integers  $h$  and all time points  $t_1, \ldots, t_k$
 

    -    i.e. (joint multivar.) marginal dist'ns same over time

    -    in "sliding time window", indexed by lag 

    -    thus can hope to "accumulate information"

    -    only hope for classical statistical inference?

    -    at least for "consistent estimation"

    -    aside:    doubly infinite approx'n already useful

    -    confusing terminology:  some (e.g. B&D) call this

"strict stationarity"






1.    Classical Theory (cont.)




Simple dependence measures:
 

Variance,   $\mathrm{Var}(X_t) = E\left(X_t - E X_t\right)^2$

    -    measures "spread"

    -    in second moment terms
 

Covariance,   $\mathrm{Cov}(X_s, X_t) = E\left[(X_s - E X_s)(X_t - E X_t)\right]$

    -    sample and theoretical versions

(related intuitively and asymptotically)

    -    independence makes this 0

    -    but can be 0 without independence (i.e. covariance captures only linear dependence)
 

Correlation,   $\rho(X_s, X_t) = \mathrm{Cov}(X_s, X_t) \big/ \sqrt{\mathrm{Var}(X_s)\,\mathrm{Var}(X_t)}$

    -    normalized version of covariance

    -    makes it "shift and scale invariant"

    -    thus a better "measure of dependence"
 
 
 


1.    Classical Theory (cont.)




For a stationary time series, covariance only depends on lag,

thus sensible to define:
 

Autocovariance:   $\gamma(h) = \mathrm{Cov}\left(X_t,\, X_{t+h}\right)$


(depends also on  $t$  in the non-stationary case)
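
For concreteness, here is a minimal sketch of the sample versions in Python (numpy only; the function names are mine, not from B&D):

    import numpy as np

    def sample_acvf(x, h):
        """Sample autocovariance at lag h (divides by n, the usual convention)."""
        x = np.asarray(x, dtype=float)
        n, xbar = len(x), x.mean()
        return np.sum((x[:n - h] - xbar) * (x[h:] - xbar)) / n

    def sample_acf(x, h):
        """Sample autocorrelation: autocovariance normalized by the lag-0 value."""
        return sample_acvf(x, h) / sample_acvf(x, 0)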
 
 
 


1.    Classical Theory (cont.)




Variation on stationarity:
 

Since many statistical properties depend only on

Mean and covariance structure
 
 

Definition: a time series is "weakly stationary" when:

1.    Means are the same (over time):   $E X_t = \mu$   for all  $t$

2.    Variance is constant (over time):   $\mathrm{Var}(X_t) = \sigma^2$   for all  $t$

3.    $\mathrm{Cov}\left(X_t,\, X_{t+h}\right)$   is constant over   $t$   (for each lag  $h$)
 

    -    much weaker property than "stationary"
 

    -    since only about moments, not distributions
 

    -    but the two properties are the same for Gaussian data
 

    -    confusing terminology:  some (e.g. B&D) call this

"stationarity"






1.    Classical Theory (cont.)




Some famous classical time series models:
 
 

1.    White Noise:   $X_t = Z_t$,   where the  $Z_t$

are independent, identically distributed, with mean  $0$  and variance  $\sigma^2$




2.    ARMA (Auto Regressive Moving Average) Process:

Solution of the "linear equation"

    $X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} \;=\; Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$

based on the "driving process"   $\{Z_t\}$,   a white noise

    -    Right side is "Moving Average" part

    -    just "linear combo" of White Noise

    -    Left side is "AutoRegressive" part

    -    just "solve linear equation" with MA driver

    -    Convergence issues are critical

    -    to "limit" entailed in 

    -    essentially equivalent to "stationarity"

    -    depends on coefficients

    -    B&D definition:  only have "ARMA" when stationary
 
 
 


1.    Classical Theory (cont.)




Cool ARMA notation:
 
 

Basis:   Backshift operator:

    $B\, X_t = X_{t-1}$


Iterated Backshift operator (note "power notation"):

    $B^k\, X_t = X_{t-k}$


Compact version of ARMA equation:

    $\phi(B)\, X_t = \theta(B)\, Z_t$

based on "backshift polynomials"

    $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p,
    \qquad
    \theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q$






1.    Classical Theory (cont.)




Key to ARMA convergence (thus stationarity):
 
 

Roots of the AR polynomial  $\phi(z)$  not on the unit circle

(in the complex plane)





These polynomials also determine other properties:
 

    -    "causality"

    -    "invertibility"

    -    autocorrelation
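
The root condition is easy to check numerically; a sketch in Python/numpy (the example coefficient is mine). Note that np.roots expects coefficients ordered from the highest power down, so the polynomial $1 - \phi_1 z - \cdots - \phi_p z^p$ must be reversed:

    import numpy as np

    def ar_roots(phi):
        """Roots of the AR polynomial phi(z) = 1 - phi_1 z - ... - phi_p z^p."""
        coeffs = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))
        return np.roots(coeffs[::-1])   # np.roots wants highest power first

    roots = ar_roots([0.9])             # AR(1) with phi = 0.9
    print(np.abs(roots))                # ~[1.11]: off the unit circle, so stationary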
 
 
 


1.    Classical Theory (cont.)




Example:   AR(1) = ARMA(1,0)

Called an "AutoRegressive process":

    $X_t = \phi\, X_{t-1} + Z_t$


Can show   $\rho(h) = \phi^{h}$   (for lags  $h \ge 0$,  when  $|\phi| < 1$)

thus   $\log \rho(h) = h \log \phi$
 

So, have "behavior similar to AR(1)" when:

plot of  vs. 

        "looks linear"
 

And slope allows estimation of 

Thus of 
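
A sketch of that slope-based estimate in Python/numpy (function and parameter names are mine; max_lag is an arbitrary choice):

    import numpy as np

    def estimate_ar1_phi(x, max_lag=20):
        """Slope of log(sample acf) vs. lag estimates log(phi) for AR(1) data."""
        x = np.asarray(x, dtype=float)
        n, xbar = len(x), x.mean()
        gamma0 = np.sum((x - xbar) ** 2) / n
        lags = np.arange(1, max_lag + 1)
        acf = np.array([np.sum((x[:n - h] - xbar) * (x[h:] - xbar)) / (n * gamma0)
                        for h in lags])
        keep = acf > 0                  # log is only defined for positive acf
        slope, _ = np.polyfit(lags[keep], np.log(acf[keep]), 1)
        return np.exp(slope)            # estimate of phi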
 
 
 


1.    Classical Theory (cont.)




Earlier application of these ideas:
from Lecture Notes 9-12-01
 
 

Autocorrelation plot
 

    -    View 1:  Approximate as:     WhiteNoise + AR(1).

AR(1) part has  $\hat{\phi}$  near 1
(from slope of   $\log \hat{\rho}(h)$   vs. lag  $h$,
since   $\log \rho(h) = h \log \phi$)



    -    nearly “unit root”
 

    -    close to a nonstationary random walk
 

    -    i.e. "dancing near edge" of nonstationarity
 

    -    don't forget sampling variability
 
 
 
 


1.    Classical Theory (cont.)




General form of autocorrelation for (stationary) ARMA:
 


    $|\rho(h)| \le C\, r^{h}$   for some   $0 < r < 1$:   decreases exponentially in   $h$




(Can be shown using "backshift polynomial" rep'n)
 
 

Terminology:    "short range dependence"
 
 
 
 
 
 
 


1.    Classical Theory (cont.)




A classical venture into "nonstationarity":
 
 

ARIMA (AutoRegressive Integrated Moving Average)
 

Understood through a similar defining equation:

    $\phi(B)\,(1 - B)^{d}\, X_t = \theta(B)\, Z_t$

View 1:   ARMA, with AR polynomial:   $\phi(z)\,(1 - z)^{d}$

    -    has a unit root  (at  $z = 1$)

    -    thus non-stationary
 

View 2:   $(1 - B)^{d}\, X_t$   is a conventional (stationary) ARMA

    -    thus apply classical methods to differenced series
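
View 2 in a minimal Python/numpy sketch: a random walk is the simplest ARIMA ($d = 1$), and a single differencing step recovers a stationary series.

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.standard_normal(1000)
    x = np.cumsum(z)        # random walk: ARIMA(0,1,0), non-stationary
    dx = np.diff(x)         # (1 - B) X_t: recovers the white noise Z_t
    # dx is stationary, so the classical ARMA machinery applies to it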
 
 
 
 


1.    Classical Theory (cont.)





Another view of dependence:

Spectral (Fourier) Analysis



Basic tool:   Periodogram  (i.e. Power Spectrum)

    -    magnitude of Fourier transform of data

    -    indicates "power of signal" at "range of frequencies"

    -    behavior "near 0" is "low frequency behavior"

    -    provides possible definition of "long range dependence"

    -    smoothed versions estimate the "spectral density"

    -    related by Fourier transform to autocorrelation

    -    ARMA spectral density is bounded, even at low frequencies
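
A minimal periodogram sketch in Python, via numpy's FFT (normalization conventions vary across references; this one divides by the sample size):

    import numpy as np

    def periodogram(x):
        """Periodogram I(lambda_j) = |FFT of centered x|^2 / n,
        at the nonzero Fourier frequencies."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        I = np.abs(np.fft.rfft(x - x.mean())) ** 2 / n
        freqs = np.fft.rfftfreq(n)      # frequencies in cycles per time step
        return freqs[1:], I[1:]         # drop frequency 0 (the mean)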
 
 
 
 


2.    Long Range Dependence





Possible source:

Beran, J. (1994) Statistics for long-memory processes, Chapman & Hall.

    (careful about the math?   at least a good source of references)
 
 
 

Autocorrelation characterization:

    $\rho(k) \sim C\, k^{-\alpha}$,    $0 < \alpha < 1$,    as    $k \to \infty$

    -    i.e. "slow" polynomial decay at large lags

    -    recall "fast" exponential decay for ARMA
 
 
 

Spectral density characterization:

    $f(\lambda) \sim C\, \lambda^{-\beta}$,    $0 < \beta < 1$,    as    $\lambda \to 0$

    -    i.e.   $f$   has a pole at 0 (low frequencies)

    -    recall bounded for ARMA
 
 

These are asymptotically equivalent

(under some conditions)






2.    Long Range Dependence (cont.)




A model:    Fractional ARIMA
 

Similar to ARIMA, with defining equation:

    $\phi(B)\,(1 - B)^{d}\, X_t = \theta(B)\, Z_t$,    where    $-\tfrac{1}{2} < d < \tfrac{1}{2}$
 

    -    "less unit root-ish" than ARIMA
 

    -    for   $d = 0$   have:

exponentially decaying autocorrelation

i.e. "short range dependence"



    -    for   $0 < d < \tfrac{1}{2}$   have:

polynomially decaying autocorrelation

i.e. "long range dependence"



    -    generally have spectral density:

    $f(\lambda) \sim C\, \lambda^{-2d}$    as    $\lambda \to 0$
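
A simulation sketch for the simplest case, FARIMA(0, d, 0), in Python/numpy (the truncation length is an arbitrary choice of mine): expand $(1 - B)^{-d}$ as an infinite moving average with coefficients $\psi_0 = 1$, $\psi_j = \psi_{j-1} (j - 1 + d) / j$, and truncate.

    import numpy as np

    def simulate_farima(n, d=0.3, n_lags=1000, rng=None):
        """Approximate FARIMA(0,d,0): X_t = (1-B)^{-d} Z_t, truncated MA form."""
        rng = np.random.default_rng(rng)
        psi = np.empty(n_lags)
        psi[0] = 1.0
        for j in range(1, n_lags):           # psi_j = psi_{j-1} * (j-1+d) / j
            psi[j] = psi[j - 1] * (j - 1 + d) / j
        z = rng.standard_normal(n + n_lags)  # i.i.d. N(0,1) driving noise
        # X_t = sum_j psi_j Z_{t-j}; keep outputs where the full window applies
        return np.convolve(z, psi, mode="full")[n_lags : n_lags + n]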







2.    Long Range Dependence (cont.)




Hurst scaling properties:
 

    -    Exceptions to the Central Limit Theorem?
 
 

Recall (under some assumptions)

    $\bar{X}_n \;\approx\; N\!\left(\mu,\; \sigma^2 / n\right)$    as    $n \to \infty$

    -    useful in many ways  (classical statistics)

    -    one view:  "variability" decreases as   $1 / \sqrt{n}$

    -    assumptions are important when this is wrong....
 
 

An internet traffic example:
 

Expanding histogram graphic

    -    binning of 139,264 packet arrival time stamps

    -    measured at MCNC (from www.nlanr.org)

    -    Each layer decreases scale by 1/4

    -    variability increases as scale decreases

    -    so what?

    -    scaling is different from prediction by CLT?
 
 
 


2.    Long Range Dependence (cont.)




Hurst scaling properties (cont.)
 

Compare to I.I.D. Uniform(0,1) variables
 

Expanding histogram graphic

    -    Variability changes less across scale?

    -    similar at finest scale, less at large scale
 

Better view:  Standard Deviation as a function of scale

    -    Real data increases faster

    -    view is tricky because of exponential increase

    -    note sample size   $n$   is 4 to the power of the "histo level"
 

Better Scaling:  log(S.D.) vs. scale

    -    Uniforms have slope 1

    -    predicted by CLT   (since   $\mathrm{SD} \propto 1/\sqrt{n} = 2^{-\text{level}}$   when   $n = 4^{\text{level}}$)

    -    but Network data has smaller slope

    -    i.e.  non-CLT scaling properties

    -    called the "Hurst phenomenon"

    -    Red is a sim'd LRD process (no time now)
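
A sketch of this computation in Python/numpy (function names are mine, and I am assuming the plotted S.D. is that of block means when $4^{\text{level}}$ consecutive values are aggregated; the slope's sign depends on how the axes are oriented):

    import numpy as np

    def sd_of_block_means(x, n_levels):
        """S.D. of block means when 4**level consecutive values are averaged."""
        x = np.asarray(x, dtype=float)
        sds = []
        for level in range(n_levels):
            m = 4 ** level
            n_blocks = len(x) // m
            blocks = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
            sds.append(blocks.std())
        return np.array(sds)

    # i.i.d. Uniform(0,1) check: CLT says S.D. ~ 2**(-level), slope magnitude 1
    u = np.random.default_rng(0).uniform(size=4 ** 9)
    levels = np.arange(6)
    slope, _ = np.polyfit(levels, np.log2(sd_of_block_means(u, 6)), 1)
    print(slope)    # about -1 here; LRD data give a visibly shallower slope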
 
 
 

Acknowledgement:  this was motivated by similar analysis in:

Paxson, V. and Floyd, S. (1995) Wide area traffic: the failure of Poisson modeling, IEEE/ACM Transactions on Networking, 3, 226-244.
 
 
 


2.    Long Range Dependence (cont.)




3rd way to define "Long Range Dependence":

"Hurst Parameter", 

    (again there are asymptotic equivalences)
 
 

Hurst parameter and auto-correlation:

    $\rho(k) \sim C\, k^{2H - 2}$    as    $k \to \infty$    (i.e.   $\alpha = 2 - 2H$)



Hurst parameter and periodogram:

    $f(\lambda) \sim C\, \lambda^{1 - 2H}$    as    $\lambda \to 0$    (i.e.   $\beta = 2H - 1$)
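
A rough periodogram-based estimator in this spirit, in Python/numpy (the low-frequency cutoff fraction is an arbitrary choice of mine): regress $\log I(\lambda)$ on $\log \lambda$ near 0; since $f(\lambda) \sim C\, \lambda^{1 - 2H}$, a fitted slope $b$ gives $H = (1 - b)/2$.

    import numpy as np

    def hurst_log_periodogram(x, frac=0.1):
        """Rough Hurst estimate from the low-frequency periodogram slope."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        I = np.abs(np.fft.rfft(x - x.mean())) ** 2 / n   # periodogram
        freqs = np.fft.rfftfreq(n)
        m = max(int(frac * len(freqs)), 10)              # low frequencies only
        slope, _ = np.polyfit(np.log(freqs[1:m]), np.log(I[1:m]), 1)
        return (1.0 - slope) / 2.0                       # H = (1 - slope) / 2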







2.    Long Range Dependence (cont.)





Now can revisit Richard Smith's analysis

from Lecture Notes 9-12-01
 
 

Autocorrelation plot (cont.)
 

    -    View 2:   Hurst parameter  ~  0.86

(from slope of    $\log(\text{periodogram})$    vs.    $\log(\text{frequency})$)

Periodogram based C.I. is:  (0.82, 1.06),
based on analysis and graphics by Richard Smith





    -    $H \approx 0.86 > \tfrac{1}{2}$:   Long Range Dependent, “self similar”, …
 

    -    Consistent with above “heavy tail” theory