Course  OR 778

Class Notes   9/19/01







Introduction to Time Series Analysis
 

1.    Self-contained (so experts need not attend)
 

2.    Sound Bite

    -    only give "flavor" (enough for this course)

    -    follow references for more details
 
 
 
 


Two Main Parts




1.    Classical Theory

    -    usual in a 1st course in time series

    -    really a semester's (or year's) worth of material
 
 

2.    Long Range Dependence

    -    more advanced topic

    -    an active research area

    -    important for internet traffic???
 
 
 


1.    Classical Theory




Recommended Source:
Brockwell, P. J. and Davis, R. A. (1987) Time Series: Theory and Methods, Springer
 
 

A finite time series is a "sequence of random variables"   $X_1, X_2, \ldots, X_n$




Useful mathematical approximation:

    $\ldots,\; X_{-1},\; X_0,\; X_1,\; X_2,\; \ldots$

    -    doubly infinite sequence

    -    avoids "edge problems"
 
 
 


1.    Classical Theory (cont.)




Fundamental Classical Property:  Stationarity
 
 

Definition:  a time series  $\{X_t\}$  is stationary when:

    $\left(X_{t_1}, \ldots, X_{t_k}\right) \;\stackrel{d}{=}\; \left(X_{t_1 + h}, \ldots, X_{t_k + h}\right)$

    for all integers  $h$  and all time points  $t_1, \ldots, t_k$
 

    -    i.e. (joint multivar.) marginal dist'ns same over time

    -    in "sliding time window", indexed by lag 

    -    thus can hope to "accumulate information"

    -    only hope for classical statistical inference?

    -    at least for "consistent estimation"

    -    aside:    doubly infinite approx'n already useful

    -    confusing terminology:  some (e.g. B&D) call this

"strict stationarity"






1.    Classical Theory (cont.)




Simple dependence measures:
 

Variance,   $\mathrm{Var}(X_t) = E\left(X_t - E X_t\right)^2$

    -    measures "spread"

    -    in second moment terms
 

Covariance,   $\mathrm{Cov}(X_s, X_t) = E\left[(X_s - E X_s)(X_t - E X_t)\right]$

    -    sample and theoretical versions

(related intuitively and asymptotically)

    -    independence makes this 0

    -    but can be 0 without independence (i.e. covariance captures only linear dependence)
 

Correlation,   $\rho(X_s, X_t) = \mathrm{Cov}(X_s, X_t) \big/ \sqrt{\mathrm{Var}(X_s)\,\mathrm{Var}(X_t)}$

    -    normalized version of covariance

    -    makes it "shift and scale invariant"

    -    thus a better "measure of dependence"
 
 
 


1.    Classical Theory (cont.)




For a stationary time series, covariance only depends on lag,

thus sensible to define:
 

Autocovariance:   $\gamma(h) = \mathrm{Cov}\left(X_t,\, X_{t+h}\right)$


(depends also on  $t$  in the non-stationary case)
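
For concreteness, here is a minimal sketch of the sample versions in Python (numpy only; the function names are mine, not from B&D):

    import numpy as np

    def sample_acvf(x, h):
        """Sample autocovariance at lag h (divides by n, the usual convention)."""
        x = np.asarray(x, dtype=float)
        n, xbar = len(x), x.mean()
        return np.sum((x[:n - h] - xbar) * (x[h:] - xbar)) / n

    def sample_acf(x, h):
        """Sample autocorrelation: autocovariance normalized by the lag-0 value."""
        return sample_acvf(x, h) / sample_acvf(x, 0)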
 
 
 


1.    Classical Theory (cont.)




Variation on stationarity:
 

Since many statistical properties depend only on

Mean and covariance structure
 
 

Definition: a time series is "weakly stationary" when:

1.    Means are the same (over time):   $E X_t = \mu$   for all  $t$

2.    Variance is constant (over time):   $\mathrm{Var}(X_t) = \sigma^2$   for all  $t$

3.    $\mathrm{Cov}\left(X_t,\, X_{t+h}\right)$   is constant over   $t$   (for each lag  $h$)
 

    -    much weaker property than "stationary"
 

    -    since only about moments, not distributions
 

    -    but the two properties are the same for Gaussian data
 

    -    confusing terminology:  some (e.g. B&D) call this

"stationarity"






1.    Classical Theory (cont.)




Some famous classical time series models:
 
 

1.    White Noise:   $X_t = Z_t$,   where the  $Z_t$

are independent, identically distributed, with mean  $0$  and variance  $\sigma^2$




2.    ARMA (Auto Regressive Moving Average) Process:

Solution of the "linear equation"

    $X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} \;=\; Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$

based on the "driving process"   $\{Z_t\}$,   a white noise

    -    Right side is "Moving Average" part

    -    just "linear combo" of White Noise

    -    Left side is "AutoRegressive" part

    -    just "solve linear equation" with MA driver

    -    Convergence issues are critical

    -    to "limit" entailed in 

    -    essentially equivalent to "stationarity"

    -    depends on coefficients

    -    B&D definition:  only have "ARMA" when stationary
 
 
 


1.    Classical Theory (cont.)




Cool ARMA notation:
 
 

Basis:   Backshift operator:

    $B\, X_t = X_{t-1}$


Iterated Backshift operator (note "power notation"):

    $B^k\, X_t = X_{t-k}$


Compact version of ARMA equation:

    $\phi(B)\, X_t = \theta(B)\, Z_t$

based on "backshift polynomials"

    $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p,
    \qquad
    \theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q$






1.    Classical Theory (cont.)




Key to ARMA convergence (thus stationarity):
 
 

Roots of the AR polynomial  $\phi(z)$  not on the unit circle

(in the complex plane)





These polynomials also determine other properties:
 

    -    "causality"

    -    "invertibility"

    -    autocorrelation
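
The root condition is easy to check numerically; a sketch in Python/numpy (the example coefficient is mine). Note that np.roots expects coefficients ordered from the highest power down, so the polynomial $1 - \phi_1 z - \cdots - \phi_p z^p$ must be reversed:

    import numpy as np

    def ar_roots(phi):
        """Roots of the AR polynomial phi(z) = 1 - phi_1 z - ... - phi_p z^p."""
        coeffs = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))
        return np.roots(coeffs[::-1])   # np.roots wants highest power first

    roots = ar_roots([0.9])             # AR(1) with phi = 0.9
    print(np.abs(roots))                # ~[1.11]: off the unit circle, so stationary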
 
 
 


1.    Classical Theory (cont.)




Example:   AR(1) = ARMA(1,0)

Called an "AutoRegressive process":

    $X_t = \phi\, X_{t-1} + Z_t$


Can show   $\rho(h) = \phi^{h}$   (for lags  $h \ge 0$,  when  $|\phi| < 1$)

thus   $\log \rho(h) = h \log \phi$
 

So, have "behavior similar to AR(1)" when:

plot of  vs. 

        "looks linear"
 

And slope allows estimation of 

Thus of 
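
A sketch of that slope-based estimate in Python/numpy (function and parameter names are mine; max_lag is an arbitrary choice):

    import numpy as np

    def estimate_ar1_phi(x, max_lag=20):
        """Slope of log(sample acf) vs. lag estimates log(phi) for AR(1) data."""
        x = np.asarray(x, dtype=float)
        n, xbar = len(x), x.mean()
        gamma0 = np.sum((x - xbar) ** 2) / n
        lags = np.arange(1, max_lag + 1)
        acf = np.array([np.sum((x[:n - h] - xbar) * (x[h:] - xbar)) / (n * gamma0)
                        for h in lags])
        keep = acf > 0                  # log is only defined for positive acf
        slope, _ = np.polyfit(lags[keep], np.log(acf[keep]), 1)
        return np.exp(slope)            # estimate of phi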
 
 
 


1.    Classical Theory (cont.)




Earlier application of these ideas:
from Lecture Notes 9-12-01
 
 

Autocorrelation plot
 

    -    View 1:  Approximate as:     WhiteNoise + AR(1).

AR(1) part has  $\hat{\phi}$  near 1
(from slope of   $\log \hat{\rho}(h)$   vs. lag  $h$,
since   $\log \rho(h) = h \log \phi$)



    -    nearly “unit root”
 

    -    close to a nonstationary random walk
 

    -    i.e. "dancing near edge" of nonstationarity
 

    -    don't forget sampling variability
 
 
 
 


1.    Classical Theory (cont.)




General form of autocorrelation for (stationary) ARMA:
 


    $|\rho(h)| \le C\, r^{h}$   for some   $0 < r < 1$:   decreases exponentially in   $h$




(Can be shown using "backshift polynomial" rep'n)
 
 

Terminology:    "short range dependence"
 
 
 
 
 
 
 


1.    Classical Theory (cont.)




A classical venture into "nonstationarity":
 
 

ARIMA (AutoRegressive Integrated Moving Average)
 

Understood through a similar defining equation:

    $\phi(B)\,(1 - B)^{d}\, X_t = \theta(B)\, Z_t$

View 1:   ARMA, with AR polynomial:   $\phi(z)\,(1 - z)^{d}$

    -    has a unit root  (at  $z = 1$)

    -    thus non-stationary
 

View 2:   $(1 - B)^{d}\, X_t$   is a conventional (stationary) ARMA

    -    thus apply classical methods to differenced series
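
View 2 in a minimal Python/numpy sketch: a random walk is the simplest ARIMA ($d = 1$), and a single differencing step recovers a stationary series.

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.standard_normal(1000)
    x = np.cumsum(z)        # random walk: ARIMA(0,1,0), non-stationary
    dx = np.diff(x)         # (1 - B) X_t: recovers the white noise Z_t
    # dx is stationary, so the classical ARMA machinery applies to it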
 
 
 
 


1.    Classical Theory (cont.)





Another view of dependence:

Spectral (Fourier) Analysis



Basic tool:   Periodogram  (i.e. Power Spectrum)

    -    magnitude of Fourier transform of data

    -    indicates "power of signal" at "range of frequencies"

    -    behavior "near 0" is "low frequency behavior"

    -    provides possible definition of "long range dependence"

    -    smoothed versions estimate the "spectral density"

    -    related by Fourier transform to autocorrelation

    -    ARMA spectral density is bounded, even at low frequencies
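
A minimal periodogram sketch in Python, via numpy's FFT (normalization conventions vary across references; this one divides by the sample size):

    import numpy as np

    def periodogram(x):
        """Periodogram I(lambda_j) = |FFT of centered x|^2 / n,
        at the nonzero Fourier frequencies."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        I = np.abs(np.fft.rfft(x - x.mean())) ** 2 / n
        freqs = np.fft.rfftfreq(n)      # frequencies in cycles per time step
        return freqs[1:], I[1:]         # drop frequency 0 (the mean)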
 
 
 
 


2.    Long Range Dependence





Possible source:

Beran, J. (1994) Statistics for long-memory processes, Chapman & Hall.

    (careful about the math?   at least a good source of references)
 
 
 

Autocorrelation characterization:

    $\rho(k) \sim C\, k^{-\alpha}$,    $0 < \alpha < 1$,    as    $k \to \infty$

    -    i.e. "slow" polynomial decay at large lags

    -    recall "fast" exponential decay for ARMA
 
 
 

Spectral density characterization:

    $f(\lambda) \sim C\, \lambda^{-\beta}$,    $0 < \beta < 1$,    as    $\lambda \to 0$

    -    i.e.   $f$   has a pole at 0 (low frequencies)

    -    recall bounded for ARMA
 
 

These are asymptotically equivalent

(under some conditions)






2.    Long Range Dependence (cont.)




A model:    Fractional ARIMA
 

Similar to ARIMA, with defining equation:

    $\phi(B)\,(1 - B)^{d}\, X_t = \theta(B)\, Z_t$,    where    $-\tfrac{1}{2} < d < \tfrac{1}{2}$
 

    -    "less unit root-ish" than ARIMA
 

    -    for   $d = 0$   have:

exponentially decaying autocorrelation

i.e. "short range dependence"



    -    for   $0 < d < \tfrac{1}{2}$   have:

polynomially decaying autocorrelation

i.e. "long range dependence"



    -    generally have spectral density:

    $f(\lambda) \sim C\, \lambda^{-2d}$    as    $\lambda \to 0$
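
A simulation sketch for the simplest case, FARIMA(0, d, 0), in Python/numpy (the truncation length is an arbitrary choice of mine): expand $(1 - B)^{-d}$ as an infinite moving average with coefficients $\psi_0 = 1$, $\psi_j = \psi_{j-1} (j - 1 + d) / j$, and truncate.

    import numpy as np

    def simulate_farima(n, d=0.3, n_lags=1000, rng=None):
        """Approximate FARIMA(0,d,0): X_t = (1-B)^{-d} Z_t, truncated MA form."""
        rng = np.random.default_rng(rng)
        psi = np.empty(n_lags)
        psi[0] = 1.0
        for j in range(1, n_lags):           # psi_j = psi_{j-1} * (j-1+d) / j
            psi[j] = psi[j - 1] * (j - 1 + d) / j
        z = rng.standard_normal(n + n_lags)  # i.i.d. N(0,1) driving noise
        # X_t = sum_j psi_j Z_{t-j}; keep outputs where the full window applies
        return np.convolve(z, psi, mode="full")[n_lags : n_lags + n]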







2.    Long Range Dependence (cont.)




Hurst scaling properties:
 

    -    Exceptions to the Central Limit Theorem?
 
 

Recall (under some assumptions)

    $\bar{X}_n \;\approx\; N\!\left(\mu,\; \sigma^2 / n\right)$    as    $n \to \infty$

    -    useful in many ways  (classical statistics)

    -    one view:  "variability" decreases as   $1 / \sqrt{n}$

    -    assumptions are important when this is wrong....
 
 

An internet traffic example:
 

Expanding histogram graphic

    -    binning of 139,264 packet arrival time stamps

    -    measured at MCNC (from www.nlanr.org)

    -    Each layer decreases scale by 1/4

    -    variability increases as scale decreases

    -    so what?

    -    scaling is different from prediction by CLT?
 
 
 


2.    Long Range Dependence (cont.)




Hurst scaling properties (cont.)
 

Compare to I.I.D. Uniform(0,1) variables
 

Expanding histogram graphic

    -    Variability changes less across scale?

    -    similar at finest scale, less at large scale
 

Better view:  Standard Deviation as a function of scale

    -    Real data increases faster

    -    view is tricky because of exponential increase

    -    note sample size   $n$   is 4 to the power of the "histo level"
 

Better Scaling:  log(S.D.) vs. scale

    -    Uniforms have slope 1

    -    predicted by CLT   (since   $\mathrm{SD} \propto 1/\sqrt{n} = 2^{-\text{level}}$   when   $n = 4^{\text{level}}$)

    -    but Network data has smaller slope

    -    i.e.  non-CLT scaling properties

    -    called the "Hurst phenomenon"

    -    Red is a sim'd LRD process (no time now)
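
A sketch of this computation in Python/numpy (function names are mine, and I am assuming the plotted S.D. is that of block means when $4^{\text{level}}$ consecutive values are aggregated; the slope's sign depends on how the axes are oriented):

    import numpy as np

    def sd_of_block_means(x, n_levels):
        """S.D. of block means when 4**level consecutive values are averaged."""
        x = np.asarray(x, dtype=float)
        sds = []
        for level in range(n_levels):
            m = 4 ** level
            n_blocks = len(x) // m
            blocks = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
            sds.append(blocks.std())
        return np.array(sds)

    # i.i.d. Uniform(0,1) check: CLT says S.D. ~ 2**(-level), slope magnitude 1
    u = np.random.default_rng(0).uniform(size=4 ** 9)
    levels = np.arange(6)
    slope, _ = np.polyfit(levels, np.log2(sd_of_block_means(u, 6)), 1)
    print(slope)    # about -1 here; LRD data give a visibly shallower slope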
 
 
 

Acknowledgement:  this was motivated by similar analysis in:

Paxson, V. and Floyd, S. (1995) Wide area traffic: the failure of Poisson modeling, IEEE/ACM Transactions on Networking, 3, 226-244.
 
 
 


2.    Long Range Dependence (cont.)




3rd way to define "Long Range Dependence":

"Hurst Parameter", 

    (again there are asymptotic equivalences)
 
 

Hurst parameter and auto-correlation:

    $\rho(k) \sim C\, k^{2H - 2}$    as    $k \to \infty$    (i.e.   $\alpha = 2 - 2H$)



Hurst parameter and periodogram:

    $f(\lambda) \sim C\, \lambda^{1 - 2H}$    as    $\lambda \to 0$    (i.e.   $\beta = 2H - 1$)
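
A rough periodogram-based estimator in this spirit, in Python/numpy (the low-frequency cutoff fraction is an arbitrary choice of mine): regress $\log I(\lambda)$ on $\log \lambda$ near 0; since $f(\lambda) \sim C\, \lambda^{1 - 2H}$, a fitted slope $b$ gives $H = (1 - b)/2$.

    import numpy as np

    def hurst_log_periodogram(x, frac=0.1):
        """Rough Hurst estimate from the low-frequency periodogram slope."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        I = np.abs(np.fft.rfft(x - x.mean())) ** 2 / n   # periodogram
        freqs = np.fft.rfftfreq(n)
        m = max(int(frac * len(freqs)), 10)              # low frequencies only
        slope, _ = np.polyfit(np.log(freqs[1:m]), np.log(I[1:m]), 1)
        return (1.0 - slope) / 2.0                       # H = (1 - slope) / 2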







2.    Long Range Dependence (cont.)





Now can revisit Richard Smith's analysis

from Lecture Notes 9-12-01
 
 

Autocorrelation plot (cont.)
 

    -    View 2:   Hurst parameter  ~  0.86

(from slope of    $\log(\text{periodogram})$    vs.    $\log(\text{frequency})$)

Periodogram based C.I. is:  (0.82, 1.06),
based on analysis and graphics by Richard Smith





    -    $H \approx 0.86 > \tfrac{1}{2}$:   Long Range Dependent, “self similar”, …
 

    -    Consistent with above “heavy tail” theory