Course  OR 778

Class Notes   9/19/01

Introduction to Time Series Analysis

1.    Self Contained (so experts need not attend)

2.    Sound Bite

    -    only give "flavor" (enough for this course)

    -    follow references for more details

Two Main parts

1.    Classical Theory

    -    usual in a 1st course in time series

    -    really a semester's (or year's) worth of material

2.    Long Range Dependence

    -    more advanced topic

    -    an active research area

    -    important for internet traffic???

1.    Classical Theory

Recommended Source:
Brockwell, P. J. and Davis, R. A. (1987) Time Series: Theory and Methods, Springer

A finite time series is a "sequence of random variables" $X_1, \dots, X_n$

Useful mathematical approximation:

    -    doubly infinite sequence $\{X_t : t = \dots, -1, 0, 1, \dots\}$

    -    avoids "edge problems"

1.    Classical Theory (cont.)

Fundamental Classical Property:  Stationarity

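Definition:  a time series $\{X_t\}$ is stationary when:

    $(X_{t_1}, \dots, X_{t_k}) \overset{d}{=} (X_{t_1+h}, \dots, X_{t_k+h})$   for all integers $k \ge 1$, $t_1, \dots, t_k$ and $h$

    -    i.e. (joint multivar.) marginal dist'ns same over time

    -    in "sliding time window", indexed by lag $h$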

    -    thus can hope to "accumulate information"

    -    only hope for classical statistical inference?

    -    at least for "consistent estimation"

    -    aside:    doubly infinite approx'n already useful

    -    confusing terminology:  some (e.g. B&D) call this "strict stationarity"

1.    Classical Theory (cont.)

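Simple dependence measures:

Variance:   $\mathrm{var}(X) = E(X - EX)^2$

    -    measures "spread"

    -    in second moment terms

Covariance:   $\mathrm{cov}(X, Y) = E[(X - EX)(Y - EY)]$

    -    sample and theoretical versions

(related intuitively and asymptotically)

    -    independence makes this 0

    -    but can be 0 otherwise (i.e. only detects linear dependence)

Correlation:   $\mathrm{corr}(X, Y) = \mathrm{cov}(X, Y) / \sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}$

    -    normalized version of covariance

    -    makes it "shift and scale invariant"

    -    thus better "measure of dependence"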
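
A minimal Python sketch of the sample versions of these measures (standard library only). The data are simulated for illustration and are not from the course: one pair is an exact shift and scale of the other, and one pair is fully dependent but has essentially zero correlation.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def sample_cov(xs, ys):
    """Sample covariance (divisor n) of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def sample_corr(xs, ys):
    """Sample correlation: covariance normalized by both standard deviations."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

rng = random.Random(0)                      # fixed seed, illustration only
x = [rng.gauss(0.0, 1.0) for _ in range(20000)]
y_linear = [3.0 * xi + 5.0 for xi in x]     # shifted and scaled copy of x
y_square = [xi * xi for xi in x]            # determined by x, yet uncorrelated with it

corr_linear = sample_corr(x, y_linear)      # shift and scale invariance
corr_square = sample_corr(x, y_square)      # dependence without linear dependence
print(corr_linear, corr_square)
```

The second pair shows why zero covariance does not imply independence: the square is a function of x, but the relationship is not linear.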

1.    Classical Theory (cont.)

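For a stationary time series, covariance only depends on lag,

thus sensible to define:

    autocovariance:    $\gamma(h) = \mathrm{cov}(X_t, X_{t+h})$

    autocorrelation:   $\rho(h) = \gamma(h) / \gamma(0)$

(depends also on $t$ in non-stationary case)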
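
A sketch of the sample autocorrelation at lags 0 to `max_lag`, assuming the common divisor-n estimator. The function name `sample_acf` and the simulated white noise input are illustrative choices, not from the notes.

```python
import random

def sample_acf(xs, max_lag):
    """Sample autocorrelation rho_hat(h) for h = 0..max_lag (divisor n)."""
    n = len(xs)
    m = sum(xs) / n
    d = [x - m for x in xs]
    gamma0 = sum(v * v for v in d) / n
    acf = []
    for h in range(max_lag + 1):
        # Sample autocovariance at lag h: average product of deviations h steps apart
        gamma_h = sum(d[t] * d[t + h] for t in range(n - h)) / n
        acf.append(gamma_h / gamma0)
    return acf

rng = random.Random(1)                       # fixed seed, illustration only
white = [rng.gauss(0.0, 1.0) for _ in range(5000)]
acf_white = sample_acf(white, 5)
print([round(r, 3) for r in acf_white])
```

For independent data the values at nonzero lags should be small, of order 1/sqrt(n).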

1.    Classical Theory (cont.)

Variation on stationarity:

Since many statistical properties depend only on

Mean and covariance structure

Definition: a time series is "weakly stationary" when:

1.    Means are same (over time)

2.    Variance is constant (over time)

3.    $\mathrm{cov}(X_t, X_{t+h})$ is constant over $t$

    -    much weaker property than "stationary"

    -    since only about moments, not distributions

    -    but the two properties coincide for Gaussian data

    -    confusing terminology:  some (e.g. B&D) call this "stationarity"


1.    Classical Theory (cont.)

Some famous classical time series models:

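1.    White Noise:   $\{Z_t\}$ where

$Z_t$ are independent, identically distributed (typically with mean 0 and variance $\sigma^2$)

2.    ARMA (Auto Regressive Moving Average) Process:

Solution $\{X_t\}$ of "linear equation"

    $X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$

based on the "driving process", $\{Z_t\}$, a white noise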

    -    Right side is "Moving Average" part

    -    just "linear combo" of White Noise

    -    Left side is "AutoRegressive" part

    -    just "solve linear equation" with MA driver

    -    Convergence issues are critical

    -    to "limit" entailed in 

    -    essentially equivalent to "stationarity"

    -    depends on coefficients

    -    B&D definition:  only have "ARMA" when stationary
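
A minimal simulation sketch of the "solve the linear equation" view, assuming Gaussian white noise with unit variance and zero starting values that are discarded after a burn-in period. The function name, the burn-in length and the example coefficients are all assumptions made for illustration.

```python
import random

def simulate_arma(phi, theta, n, burn_in=500, seed=0):
    """Simulate X_t = sum_i phi[i]*X_{t-1-i} + Z_t + sum_j theta[j]*Z_{t-1-j}."""
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    x = [0.0] * p          # arbitrary start values, forgotten after burn-in
    z = [0.0] * q
    for _ in range(n + burn_in):
        z_t = rng.gauss(0.0, 1.0)                             # white noise driver
        ar = sum(phi[i] * x[-1 - i] for i in range(p))        # autoregressive part
        ma = sum(theta[j] * z[-1 - j] for j in range(q))      # moving average part
        x.append(ar + z_t + ma)
        z.append(z_t)
    return x[-n:]

series = simulate_arma([0.5], [0.4], 2000)   # an ARMA(1,1) with hypothetical coefficients
print(series[:5])
```

The burn-in only makes sense when the coefficients give a stationary solution; with an AR coefficient of 1 or more the recursion does not settle down.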

1.    Classical Theory (cont.)

Cool ARMA notation:

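Basis:   Backshift operator:   $B X_t = X_{t-1}$

Iterated Backshift operator (note "power notation"):   $B^j X_t = X_{t-j}$

Compact version of ARMA equation:   $\phi(B) X_t = \theta(B) Z_t$

based on "backshift polynomials"

    $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$,     $\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q$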

1.    Classical Theory (cont.)

Key to ARMA convergence (thus stationarity):

Roots of $\phi(z)$ not on unit circle

(in the complex plane)

These polynomials also determine other properties:

    -    "causality"

    -    "invertibility"

    -    autocorrelation
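
A small sketch of the root check for AR order at most 2, where the quadratic formula is enough. The function names and the tolerance are assumptions for illustration. Here "causal" means all roots of the AR polynomial lie outside the unit circle.

```python
import cmath

def ar_roots(phi1, phi2=0.0):
    """Roots of phi(z) = 1 - phi1*z - phi2*z**2 (AR order at most 2)."""
    if phi2 == 0.0:
        return [] if phi1 == 0.0 else [complex(1.0 / phi1)]
    disc = cmath.sqrt(phi1 * phi1 + 4.0 * phi2)
    # Quadratic formula applied to -phi2*z^2 - phi1*z + 1 = 0
    return [(phi1 + disc) / (-2.0 * phi2), (phi1 - disc) / (-2.0 * phi2)]

def classify(roots, tol=1e-9):
    """Classify an AR polynomial by the moduli of its roots."""
    moduli = [abs(r) for r in roots]
    if any(abs(m - 1.0) < tol for m in moduli):
        return "root on unit circle: no stationary solution"
    if all(m > 1.0 for m in moduli):
        return "stationary and causal"
    return "stationary but not causal"

print(classify(ar_roots(0.5)))        # AR(1), coefficient inside (-1, 1)
print(classify(ar_roots(1.0)))        # random walk
print(classify(ar_roots(0.5, 0.3)))   # an AR(2) with hypothetical coefficients
```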

1.    Classical Theory (cont.)

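Example:   AR(1) = ARMA(1,0):   $X_t - \phi X_{t-1} = Z_t$

Called an "AutoRegressive process"

Can show   $\rho(h) = \phi^{|h|}$   (when $|\phi| < 1$)


So, have "behavior similar to AR(1)" when:

plot of $\log \rho(h)$ vs. $h$

        "looks linear"

And slope allows estimation of $\log \phi$

Thus of $\phi$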
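
The slope idea can be sketched as follows, assuming an AR(1) with autocorrelation $\phi^h$ at lag $h$ and $0 < \phi < 1$, so that the log autocorrelation is a line through the origin with slope $\log \phi$. The simulated series, the choice of five lags and the function names are illustrative assumptions.

```python
import math
import random

def simulate_ar1(phi, n, seed=0, burn_in=200):
    """Simulate X_t = phi*X_{t-1} + Z_t with standard Gaussian white noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out

def sample_acf(xs, max_lag):
    """Sample autocorrelation for lags 0..max_lag (divisor n)."""
    n = len(xs)
    m = sum(xs) / n
    d = [x - m for x in xs]
    gamma0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t + h] for t in range(n - h)) / n / gamma0
            for h in range(max_lag + 1)]

def phi_from_log_acf(xs, max_lag=5):
    """Estimate phi from the slope of log rho_hat(h) against h."""
    acf = sample_acf(xs, max_lag)
    lags = range(1, max_lag + 1)
    # Least-squares slope through the origin; needs positive sample autocorrelations
    slope = sum(h * math.log(acf[h]) for h in lags) / sum(h * h for h in lags)
    return math.exp(slope)

phi_hat = phi_from_log_acf(simulate_ar1(0.8, 20000))
print(phi_hat)
```

The logarithm fails if a sample autocorrelation is zero or negative, so only small lags are used here.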

1.    Classical Theory (cont.)

Earlier application of these ideas:
from Lecture Notes 9-12-01

Autocorrelation plot

    -    View 1:  Approximate as:     WhiteNoise + AR(1).

AR(1) part has $\phi$ close to 1
(from slope of $\log \hat\rho(h)$ vs. $h$ (lag),
since $\log \rho(h) = h \log \phi$)

    -    nearly “unit root”

    -    close to nonstationary random walk

    -    i.e. "dancing near edge" of nonstationarity

    -    don't forget sampling variability

1.    Classical Theory (cont.)

General form of autocorrelation for (stationary) ARMA:

$\rho(h)$ decreases exponentially in $h$:   $|\rho(h)| \le C r^{|h|}$  for some $C > 0$, $0 < r < 1$

(Can be shown using "backshift polynomial" rep'n)

Terminology:    "short range dependence"

1.    Classical Theory (cont.)

A classical venture into "nonstationarity":

ARIMA (AutoRegressive Integrated Moving Average)

Understood through similar defining equation:   $\phi(B) (1 - B)^d X_t = \theta(B) Z_t$,   $d = 1, 2, \dots$

View 1:   ARMA, with AR polynomial:   $\phi(z) (1 - z)^d$

    -    has unit root

    -    thus non-stationary

View 2:  $(1 - B)^d X_t$ is a conventional (stationary) ARMA

    -    thus apply classical methods to differenced series
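
A minimal sketch of the differencing step, using a simulated random walk (the simplest unit-root example, chosen here for illustration). One difference recovers the white noise increments.

```python
import random

def difference(xs, d=1):
    """Apply (1 - B)^d: difference the series d times."""
    for _ in range(d):
        xs = [xs[t] - xs[t - 1] for t in range(1, len(xs))]
    return xs

rng = random.Random(2)                       # fixed seed, illustration only
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Random walk: cumulative sum of the noise, non-stationary
walk, level = [], 0.0
for z in noise:
    level += z
    walk.append(level)

recovered = difference(walk)                 # one observation is lost per difference
print(len(walk), len(recovered))
```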

1.    Classical Theory (cont.)

Another view of dependence:

Spectral (Fourier) Analysis

Basic tool:   Periodogram  (i.e. Power Spectrum)

    -    (squared) magnitude of Fourier transform of data

    -    indicates "power of signal" at "range of frequencies"

    -    behavior "near 0" is "low frequency behavior"

    -    provides possible definition of "long range dependence"

    -    smoothed versions estimate the "spectral density"

    -    related by Fourier transform to autocorrelation

    -    ARMA spectral density bounded, in particular for low frequencies
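
A sketch of a periodogram computed by a direct (slow) discrete Fourier transform at the Fourier frequencies. The normalization by n is an assumption; conventions differ by constant factors such as $2\pi$. The pure cosine input is an illustrative test signal.

```python
import cmath
import math

def periodogram(xs):
    """Pairs (w_k, I(w_k)) with I(w) = |sum_t x_t exp(-i*t*w)|^2 / n, w_k = 2*pi*k/n, k = 1..n//2."""
    n = len(xs)
    m = sum(xs) / n
    out = []
    for k in range(1, n // 2 + 1):
        w = 2.0 * math.pi * k / n
        dft = sum((x - m) * cmath.exp(-1j * w * t) for t, x in enumerate(xs))
        out.append((w, abs(dft) ** 2 / n))
    return out

n = 64
signal = [math.cos(2.0 * math.pi * 4 * t / n) for t in range(n)]   # 4 cycles in the record
spec = periodogram(signal)
peak_freq, peak_power = max(spec, key=lambda pair: pair[1])
print(peak_freq, peak_power)
```

All the power of the cosine lands at its own frequency. For long series a fast Fourier transform would replace the double loop.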

2.    Long Range Dependence

Possible source:

Beran, J. (1994) Statistics for long-memory processes, Chapman & Hall.

    (be careful about the math?   at least a good source of references)

Autocorrelation characterization:

    $\rho(h) \sim C h^{-\alpha}$  as $h \to \infty$,  for some $0 < \alpha < 1$

    -    i.e. "slow" polynomial decay at large lags

    -    recall "fast" exponential decay for ARMA

Spectral density characterization:

    $f(\lambda) \sim C' |\lambda|^{-\beta}$  as $\lambda \to 0$,  for some $0 < \beta < 1$

    -    i.e.  $f$ has a pole at 0 (low frequencies)

    -    recall bounded for ARMA

These are asymptotically equivalent

(under some conditions)

2.    Long Range Dependence (cont.)

A model:    Fractional ARIMA

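Similar to ARIMA, with defining equation:

    $\phi(B) (1 - B)^d X_t = \theta(B) Z_t$,   with fractional $d$,  $-1/2 < d < 1/2$

    -    "less unit root-ish" than ARIMA

    -    for $d = 0$ have:

exponentially decaying autocorrelation

i.e. "short range dependence"

    -    for $0 < d < 1/2$ have:

polynomially decaying autocorrelation

i.e. "long range dependence"

    -    generally have spectral density:

    $f(\lambda) = \frac{\sigma^2}{2\pi} \frac{|\theta(e^{-i\lambda})|^2}{|\phi(e^{-i\lambda})|^2} |1 - e^{-i\lambda}|^{-2d}$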
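
A sketch of what a fractional power of $(1 - B)$ means, with $B$ the backshift operator: the binomial expansion $(1 - B)^d = \sum_j \pi_j B^j$ has weights given by the recursion $\pi_0 = 1$, $\pi_j = \pi_{j-1} (j - 1 - d) / j$. The function name and the example value $d = 0.4$ are illustrative assumptions.

```python
def frac_diff_weights(d, n_weights):
    """First n_weights coefficients pi_j in (1 - B)^d = sum_j pi_j B^j."""
    w = [1.0]
    for j in range(1, n_weights):
        # Ratio of successive binomial coefficients of (1 - B)^d
        w.append(w[-1] * (j - 1 - d) / j)
    return w

print(frac_diff_weights(1, 4))      # ordinary differencing: weights stop after lag 1
print(frac_diff_weights(0.4, 6))    # fractional d: weights never vanish, decay slowly
```

For integer $d$ the expansion terminates, which gives ordinary differencing. For fractional $d$ every past value gets some weight, and the weights decay only polynomially.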


2.    Long Range Dependence (cont.)

Hurst scaling properties:

    -    Exceptions to the Central Limit Theorem?

Recall (under some assumptions)

    $\sqrt{n} (\bar{X}_n - \mu) \to N(0, \sigma^2)$   in distribution

    -    useful in many ways  (classical statistics)

    -    one view:  "variability" decreases as $n^{-1/2}$

    -    assumptions are important when this is wrong....
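
A simulation sketch of the classical scaling, assuming independent Uniform(0,1) data: multiplying the sample size by 4 should roughly halve the standard deviation of the sample mean. The sample sizes and replication count are arbitrary illustrative choices.

```python
import math
import random

def sd_of_sample_means(n, reps, rng):
    """Empirical S.D. of the mean of n i.i.d. Uniform(0,1) draws, over reps replications."""
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]
    m = sum(means) / reps
    return math.sqrt(sum((v - m) ** 2 for v in means) / (reps - 1))

rng = random.Random(3)                      # fixed seed, illustration only
sizes = [4, 16, 64, 256]
sds = [sd_of_sample_means(n, 2000, rng) for n in sizes]

# Under the CLT scaling, each ratio should be close to sqrt(4) = 2
ratios = [sds[i] / sds[i + 1] for i in range(len(sds) - 1)]
print(sds, ratios)
```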

An internet traffic example:

Expanding histogram graphic

    -    binning of 139,264 packet arrival time stamps

    -    measured at MCNC (from

    -    Each layer decreases scale by 1/4

    -    variability increases as scale decreases

    -    so what?

    -    scaling is different from prediction by CLT?

2.    Long Range Dependence (cont.)

Hurst scaling properties (cont.)

Compare to I.I.D. Uniform(0,1) variables

Expanding histogram graphic

    -    Variability changes less across scale?

    -    similar at finest scale, less at large scale

Better view:  Standard Deviation as a function of scale

    -    Real data increases faster

    -    view is tricky because of exponential increase

    -    note sample size  n  is 4 to power of "histo level"

Better Scaling:  log(S.D.) vs. scale

    -    Uniforms have slope 1

    -    predicted by CLT   (since S.D. scales as $n^{1/2}$)

    -    but Network data has smaller slope

    -    i.e.  non-CLT scaling properties

    -    called the "Hurst phenomenon"

    -    Red is a sim'd LRD process (no time now)
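
A sketch of the comparison case only, under loudly stated assumptions: arrival times are simulated i.i.d. Uniform(0,1), level k uses 4^k equal bins, bin heights are density-scaled, and the logarithm is base 2. This is one setup in which independent data give slope close to 1; the notes do not say it is the exact construction behind the graphic. The levels and sample size are arbitrary.

```python
import math
import random

def sd_of_bin_heights(points, level):
    """S.D. across bins of density-scaled histogram heights, with 4**level equal bins on [0, 1)."""
    bins = 4 ** level
    counts = [0] * bins
    for p in points:
        counts[min(int(p * bins), bins - 1)] += 1
    heights = [c * bins / len(points) for c in counts]
    m = sum(heights) / bins
    return math.sqrt(sum((h - m) ** 2 for h in heights) / (bins - 1))

def ls_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

rng = random.Random(4)                      # fixed seed, illustration only
points = [rng.random() for _ in range(100000)]
levels = [3, 4, 5, 6, 7]
log_sd = [math.log2(sd_of_bin_heights(points, k)) for k in levels]
slope = ls_slope(levels, log_sd)
print(slope)
```

Replacing `points` with real time stamps rescaled to [0, 1) would give the slope to compare against; a slope clearly below 1 is the non-CLT scaling described above.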

Acknowledgement:  this was motivated by similar analysis in:

Paxson, V. and Floyd, S. (1995) Wide Area traffic: the failure of Poisson modeling, IEEE/ACM Transactions on Networking, 3, 226-244.

2.    Long Range Dependence (cont.)

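3rd way to define "Long Range Dependence":

"Hurst Parameter", $H$,  with $1/2 < H < 1$

    (again there are asymptotic equivalences)

Hurst parameter and auto-correlation:   $\rho(h) \sim C h^{2H - 2}$  as $h \to \infty$

Hurst parameter and periodogram:   $f(\lambda) \sim C' |\lambda|^{1 - 2H}$  as $\lambda \to 0$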
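
The three characterizations can be lined up as below. The symbols are assumptions made here for illustration: $\alpha$ is the polynomial decay exponent of the autocorrelation, $\beta$ the exponent of the spectral pole at 0, and $d$ the fractional differencing parameter.

```latex
\[
\rho(h) \sim C\,h^{-\alpha}, \qquad
f(\lambda) \sim C'\,|\lambda|^{-\beta}, \qquad
H \;=\; 1 - \frac{\alpha}{2} \;=\; \frac{1 + \beta}{2} \;=\; d + \frac{1}{2}.
\]
% Worked example: H = 0.86 gives
\[
\alpha = 2 - 2H = 0.28, \qquad
\beta = 2H - 1 = 0.72, \qquad
d = H - \tfrac{1}{2} = 0.36 .
\]
```

So $H = 1/2$ matches the short range boundary ($d = 0$, no pole), and $H$ near 1 means very slow autocorrelation decay.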

2.    Long Range Dependence (cont.)

Now can revisit Richard Smith's analysis

from Lecture Notes 9-12-01

Autocorrelation plot (cont.)

    -    View 2:   Hurst parameter $H$  ~  0.86

(from slope of $\log \hat\rho(h)$ vs. $\log h$)

Periodogram based C.I. is: (0.82,1.06),
Based on analysis and graphics by Richard Smith

    -    $H \approx 0.86 > 1/2$:   Long Range Dependent, “self similar”, …

    -    Consistent with above “heavy tail” theory