Class Notes 9/19/01
Introduction to Time Series
1. Self-Contained (so experts need not attend)
2. Sound Bite
- only give "flavor" (enough for this course)
follow references for more details
Two Main parts
1. Classical Theory
- usual in a 1st course in time series
really a semester's (or year's) worth of material
2. Long Range Dependence
- more advanced topic
- an active research area
important for internet traffic???
1. Classical Theory
Recommended Source:
Brockwell, P. J. and Davis, R. A. (1987) Time Series: Theory and Methods, Springer
A finite time series is a "sequence of random variables": $X_1, X_2, \ldots, X_n$
Useful mathematical approximation:
- doubly infinite sequence: $\ldots, X_{-1}, X_0, X_1, \ldots$
avoids "edge problems"
1. Classical Theory (cont.)
Fundamental Classical Property:
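Definition: a time series $\{X_t\}$ is stationary when:
$(X_{t_1}, \ldots, X_{t_k}) \overset{d}{=} (X_{t_1+h}, \ldots, X_{t_k+h})$
for all $k$, all times $t_1, \ldots, t_k$ and all lags $h$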
- i.e. (joint multivar.) marginal dist'ns same over time
in "sliding time window", indexed by lag
- thus can hope to "accumulate information"
- only hope for classical statistical inference?
- at least for "consistent estimation"
- aside: doubly infinite approx'n already useful
- confusing terminology: some (e.g. B&D) call this
"strict stationarity"
1. Classical Theory (cont.)
Simple dependence measures:
Covariance: $\mathrm{cov}(X, Y) = E[(X - EX)(Y - EY)]$ (the variance when $Y = X$)
- measures "spread"
in second moment terms
- sample and theoretical versions
(related intuitively and asymptotically)
- independence makes this 0
but can be 0 otherwise (i.e. only linear indep.)
Correlation: $\mathrm{corr}(X, Y) = \mathrm{cov}(X, Y) / \sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}$
- normalized version of covariance
- makes it "shift and scale invariant"
thus better "measure of independence"
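A minimal Python sketch of the sample versions of covariance and correlation. The function names, the simulated Gaussian data, and the seed are illustrative assumptions, not part of the notes. It shows a pair that is fully dependent yet uncorrelated (X and X squared), and checks shift and scale invariance of the correlation.

```python
import math
import random

def sample_cov(x, y):
    """Sample covariance with the usual n - 1 divisor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Sample correlation: covariance normalized by both standard deviations."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

rng = random.Random(0)                      # fixed seed, only for reproducibility
x = [rng.gauss(0.0, 1.0) for _ in range(20000)]
y = [a * a for a in x]                      # determined by x, yet uncorrelated with it

# Positive shift-and-scale transforms of both variables
u = [5 + 2 * a for a in x]
v = [-1 + 10 * b for b in y]

print(sample_corr(x, y))                            # near 0 despite total dependence
print(sample_corr(x, [3 * a + 1 for a in x]))       # 1 up to rounding
print(abs(sample_corr(u, v) - sample_corr(x, y)))   # essentially 0: invariance
```

A negative scale factor would flip the sign of the correlation, so the invariance holds up to sign.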
1. Classical Theory (cont.)
For a stationary time series, covariance only depends on lag,
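thus sensible to define:
Autocovariance function: $\gamma(h) = \mathrm{cov}(X_t, X_{t+h})$
Autocorrelation function: $\rho(h) = \gamma(h) / \gamma(0)$
(depends also on $t$ in non-stationary case)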
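A sketch of the sample version of the lag-indexed autocorrelation, in Python. The name sample_acf, the divisor n at every lag, and the simulated white noise input are assumptions made for illustration. For independent data the values at lags 1 and beyond should be near 0.

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag (divisor n at every lag)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    gamma0 = sum(v * v for v in d) / n               # lag-0 autocovariance
    return [sum(d[t] * d[t + h] for t in range(n - h)) / n / gamma0
            for h in range(max_lag + 1)]

rng = random.Random(1)                               # fixed seed for reproducibility
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]
acf = sample_acf(noise, 5)
print([round(r, 3) for r in acf])                    # 1.0 at lag 0, then near 0
```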
1. Classical Theory (cont.)
Variation on stationarity:
Since many statistical properties depend only on
Mean and covariance structure
Definition: a time series is "weakly stationary" when:
1. Means are same (over time)
2. Variance is constant (over time)
3. Covariance $\mathrm{cov}(X_t, X_{t+h})$ is constant over $t$ (for each lag $h$)
much weaker property than "stationary"
since only about moments, not distributions
but both properties are same for Gaussian data
- confusing terminology: some (e.g. B&D) call this "stationarity"
1. Classical Theory (cont.)
Some famous classical time series models:
1. White Noise: $\{Z_t\}$ are independent, identically distributed
2. ARMA (Auto Regressive Moving Average) Process:
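Solution of "linear equation"
$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$
based on the "driving process", $\{Z_t\}$, a white noise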
- Right side is "Moving Average" part
- just "linear combo" of White Noise
- Left side is "AutoRegressive" part
- just "solve linear equation" with MA driver
- Convergence issues are critical
to the "limit" entailed in the (infinite sum) solution $X_t$
- essentially equivalent to "stationarity"
- depends on coefficients
B&D definition: only have "ARMA" when stationary
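A sketch of "solving the linear equation" by recursion, in Python: the series is built from a Gaussian white noise driver, with a burn-in period so the arbitrary start-up values are forgotten. The function names, the Gaussian noise, the coefficients 0.5 and 0.4, and the burn-in length are illustrative assumptions. The lag-1 check uses the standard ARMA(1,1) formula $\rho(1) = (1 + \phi\theta)(\phi + \theta) / (1 + 2\phi\theta + \theta^2)$.

```python
import random

def simulate_arma(phi, theta, n, burn_in=500, seed=0):
    """Simulate X_t = sum_i phi[i] X_{t-1-i} + Z_t + sum_j theta[j] Z_{t-1-j}."""
    rng = random.Random(seed)
    total = n + burn_in
    z = [rng.gauss(0.0, 1.0) for _ in range(total)]      # white noise driver
    x = [0.0] * total
    for t in range(total):
        ar = sum(c * x[t - 1 - i] for i, c in enumerate(phi) if t - 1 - i >= 0)
        ma = z[t] + sum(c * z[t - 1 - j] for j, c in enumerate(theta) if t - 1 - j >= 0)
        x[t] = ar + ma
    return x[burn_in:]                                    # drop start-up transient

def lag1_corr(x):
    """Sample lag-1 autocorrelation."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    return num / sum((v - m) ** 2 for v in x)

phi, theta = [0.5], [0.4]                                 # hypothetical coefficients
x = simulate_arma(phi, theta, n=20000)
theory = ((1 + phi[0] * theta[0]) * (phi[0] + theta[0])
          / (1 + 2 * phi[0] * theta[0] + theta[0] ** 2))
print(round(lag1_corr(x), 3), round(theory, 3))           # the two should be close
```

With coefficients that put the process near or beyond non-stationarity (for example an AR coefficient of 1 or more), the same recursion runs but the output wanders or explodes, which is the convergence issue in practice.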
1. Classical Theory (cont.)
Cool ARMA notation:
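Basis: Backshift operator: $B X_t = X_{t-1}$
Iterated Backshift operator (note "power notation"): $B^k X_t = X_{t-k}$
Compact version of ARMA equation: $\phi(B) X_t = \theta(B) Z_t$
based on "backshift polynomials":
$\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$, $\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q$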
1. Classical Theory (cont.)
Key to ARMA convergence (thus stationarity):
Roots of the AR polynomial $\phi(z)$ not on unit circle
(in the complex plane)
These polynomials also determine other properties:
- "causality"
- "invertibility"
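A sketch of the root check for the AR(2) case, where the AR polynomial $\phi(z) = 1 - \phi_1 z - \phi_2 z^2$ can be solved with the quadratic formula. The function names, the tolerance, and the example coefficients are illustrative assumptions; the classification follows the standard results (no root on the unit circle for a stationary solution, all roots outside it for causality), assuming the AR and MA polynomials share no roots.

```python
import cmath

def ar2_roots(phi1, phi2):
    """Roots of phi(z) = 1 - phi1*z - phi2*z**2 (degree 2, 1 or 0)."""
    if phi2 == 0:
        return [1 / phi1] if phi1 != 0 else []
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    return [(phi1 + disc) / (-2 * phi2), (phi1 - disc) / (-2 * phi2)]

def classify(phi1, phi2, tol=1e-9):
    """Describe the AR part from the moduli of the roots of phi(z)."""
    moduli = [abs(z) for z in ar2_roots(phi1, phi2)]
    if any(abs(m - 1) < tol for m in moduli):
        return "root on unit circle: no stationary solution"
    if all(m > 1 for m in moduli):
        return "stationary and causal"
    return "stationary but not causal"

print(classify(0.5, 0.3))   # both roots outside the unit circle
print(classify(1.0, 0.0))   # random walk: root at z = 1
print(classify(2.0, 0.0))   # root at z = 0.5, inside the unit circle
```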
1. Classical Theory (cont.)
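Example: AR(1) = ARMA(1,0)
$X_t = \phi X_{t-1} + Z_t$ (with $|\phi| < 1$, $Z_t$ white noise)
Called an "AutoRegressive process"
Can show autocorrelation is $\rho(h) = \phi^{|h|}$
So, have "behavior similar to AR(1)" when:
plot of $\log \rho(h)$ vs. lag $h$ (for $\phi > 0$)
"looks linear"
And slope allows estimation of $\log \phi$
Thus of $\phi$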
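A sketch of the slope idea in Python, assuming an AR(1) with positive coefficient whose autocorrelation at lag h is the coefficient raised to the power h: simulate, compute the sample autocorrelation, fit a least-squares line to its log against lag, and exponentiate the slope. The true coefficient 0.8, the lags 1 to 5, the Gaussian noise, and all function names are illustrative assumptions.

```python
import math
import random

def simulate_ar1(phi, n, burn_in=500, seed=2):
    """Simulate X_t = phi * X_{t-1} + Z_t with Gaussian white noise Z_t."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for t in range(n + burn_in):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            x.append(prev)
    return x

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    g0 = sum(v * v for v in d)
    return [sum(d[t] * d[t + h] for t in range(n - h)) / g0
            for h in range(max_lag + 1)]

def fit_slope(xs, ys):
    """Least-squares slope of ys on xs (with intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))

x = simulate_ar1(0.8, 20000)
acf = sample_acf(x, 5)
lags = [1, 2, 3, 4, 5]
slope = fit_slope(lags, [math.log(acf[h]) for h in lags])   # needs acf[h] > 0
phi_hat = math.exp(slope)
print(round(phi_hat, 3))                                    # should be near 0.8
```

The log requires positive sample autocorrelations, so this only makes sense at small lags and for clearly positive dependence; sampling variability grows at larger lags.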
1. Classical Theory (cont.)
Earlier application of these ideas, from Lecture Notes 9-12-01:
- View 1: Approximate as: WhiteNoise + AR(1).
nearly “unit root”
close to nonstationary random walk
i.e. "dancing near edge" of nonstationarity
don't forget sampling variability
1. Classical Theory (cont.)
General form of autocorrelation for (stationary) ARMA:
$|\rho(h)| \le C r^{|h|}$ for some constants $C > 0$ and $0 < r < 1$
i.e. decreases exponentially in lag $h$
(Can be shown using "backshift polynomial" rep'n)
"short range dependence"
1. Classical Theory (cont.)
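A classical venture into non-stationarity:
ARIMA (AutoRegressive Integrated Moving Average)
Understood through similar defining equation:
$\phi(B) (1 - B)^d X_t = \theta(B) Z_t$, for integer $d \ge 1$
(with $B$ the backshift operator, $Z_t$ white noise)
View 1: ARMA,
with AR polynomial: $\phi(z) (1 - z)^d$
- has unit root
thus non-stationary
View 2: $(1 - B)^d X_t$
is a conventional (stationary) ARMA
thus apply classical methods to differenced series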
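A sketch of View 2 in Python, using the simplest integrated model: a random walk (cumulative sum of white noise), whose first difference recovers the white noise exactly. The function name difference and the simulated data are illustrative assumptions.

```python
import random

def difference(x, d=1):
    """Apply (1 - B)^d: difference the series d times."""
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x

rng = random.Random(3)                          # fixed seed for reproducibility
z = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # white noise

# Random walk: non-stationary, unit root in the AR polynomial
walk, level = [], 0.0
for v in z:
    level += v
    walk.append(level)

recovered = difference(walk)                    # stationary again
print(max(abs(a - b) for a, b in zip(recovered, z[1:])))   # essentially 0
print(difference([1, 4, 9, 16], d=2))           # [2, 2]
```

Each difference shortens the series by one observation, which is why the comparison starts at the second noise value.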
1. Classical Theory (cont.)
Another view of dependence:
Spectral (Fourier) Analysis
Basic tool: Periodogram (i.e. Power Spectrum)
- squared magnitude of Fourier transform of data
- indicates "power of signal" at "range of frequencies"
- behavior "near 0" is "low frequency behavior"
- provides possible definition of "long range dependence"
- smooths estimate the "spectral density"
- related by Fourier transform to autocorrelation
ARMA spectral density bounded, for low frequencies
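A sketch of a periodogram in Python, computed with a direct (slow) discrete Fourier transform at the Fourier frequencies. The normalization by n is one common convention and is an assumption here (others also divide by 2 pi); the cosine test signal and function name are illustrative. A pure cosine at frequency index 5 should put all its power at that one frequency.

```python
import cmath
import math

def periodogram(x):
    """I(w_j) = |sum_t x_t exp(-i w_j t)|^2 / n at w_j = 2*pi*j/n, j = 1..n//2."""
    n = len(x)
    out = []
    for j in range(1, n // 2 + 1):
        w = 2 * math.pi * j / n
        dft = sum(x[t] * cmath.exp(-1j * w * t) for t in range(n))
        out.append((w, abs(dft) ** 2 / n))
    return out

n = 64
x = [math.cos(2 * math.pi * 5 * t / n) for t in range(n)]   # 5 cycles in the window
pg = periodogram(x)
peak = max(range(len(pg)), key=lambda k: pg[k][1])
print(peak + 1, round(pg[peak][1], 6))   # frequency index 5, height n/4 = 16
```

For long series a fast Fourier transform would replace the double loop; the direct sum is kept here only to make the definition visible.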
2. Long Range Dependence
Possible source:
Beran, J. (1994) Statistics for long-memory processes, Chapman & Hall.
(... about math's? at least a good source of references)
Autocorrelation characterization: $\rho(h) \sim C_\rho h^{-\alpha}$ as $h \to \infty$, for some $0 < \alpha < 1$
- i.e. "slow" polynomial decay at large lags
recall "fast" exponential decay for ARMA
Spectral density characterization: $f(\lambda) \sim C_f |\lambda|^{-\beta}$ as $\lambda \to 0$, for some $0 < \beta < 1$
- i.e. has a pole at 0 (low frequencies)
recall bounded for ARMA
These are asymptotically equivalent
(under some conditions)
2. Long Range Dependence (cont.)
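A model:
Fractional ARIMA
Similar to ARIMA, with defining equation:
$\phi(B) (1 - B)^d X_t = \theta(B) Z_t$, but with fractional $d$, $-1/2 < d < 1/2$
(with $B$ the backshift operator, $Z_t$ white noise)
"less unit root-ish" than ARIMA
- $d = 0$: exponentially decaying autocorrelation
i.e. "short range dependence"
- $0 < d < 1/2$: polynomially decaying autocorrelation, $\rho(h) \sim C h^{2d - 1}$
i.e. "long range dependence"
- generally have spectral density: $f(\lambda) \sim C |\lambda|^{-2d}$ as $\lambda \to 0$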
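A sketch of what a fractional power of the differencing operator means, in Python: expanding $(1 - B)^d$ as a power series in the backshift operator gives weights $\pi_0 = 1$, $\pi_k = \pi_{k-1} (k - 1 - d) / k$ (the binomial series). The function name and the example values of d are illustrative assumptions. For d = 1 this reduces to ordinary differencing; for fractional d the weights never vanish and die off slowly.

```python
def frac_diff_weights(d, n_terms):
    """First n_terms coefficients of (1 - B)^d as a power series in B."""
    w = [1.0]
    for k in range(1, n_terms):
        w.append(w[-1] * (k - 1 - d) / k)
    return w

print(frac_diff_weights(1.0, 4))    # ordinary differencing: weights 1, -1, then zeros
print(frac_diff_weights(0.5, 3))    # [1.0, -0.5, -0.125]
print([round(v, 4) for v in frac_diff_weights(0.3, 6)])   # slowly decaying weights
```

Applying these weights to a series is an infinite moving filter, truncated in practice, which is one way to see why the resulting dependence reaches far back in time.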
2. Long Range Dependence (cont.)
Hurst scaling properties:
Exceptions to the Central Limit Theorem?
Recall (under some assumptions): $\sqrt{n} (\bar{X} - \mu) \to N(0, \sigma^2)$ in distribution
- useful in many ways (classical statistics)
one view: "variability" decreases as $n^{-1/2}$
assumptions are important when this is wrong....
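A short worked calculation of why the assumptions matter, written for a stationary series with variance $\sigma^2$ and autocorrelation $\rho(h)$ at lag $h$ (notation assumed here). The first line is exact; the constants $C$, $C'$ and the exponent $\alpha$ in the last line are generic placeholders for a slowly decaying autocorrelation.

```latex
\begin{align*}
\mathrm{var}(\bar{X}_n)
  &= \frac{1}{n^2} \sum_{s=1}^{n} \sum_{t=1}^{n} \mathrm{cov}(X_s, X_t)
   = \frac{\sigma^2}{n} \sum_{|h| < n} \Bigl(1 - \frac{|h|}{n}\Bigr) \rho(h) \\
\text{independent: } & \rho(h) = 0 \ (h \ne 0)
   \;\Rightarrow\; \mathrm{var}(\bar{X}_n) = \frac{\sigma^2}{n}, \quad
   \mathrm{sd}(\bar{X}_n) \propto n^{-1/2} \\
\text{slow decay: } & \rho(h) \sim C h^{-\alpha},\ 0 < \alpha < 1
   \;\Rightarrow\; \mathrm{var}(\bar{X}_n) \sim C' n^{-\alpha}, \quad
   \mathrm{sd}(\bar{X}_n) \propto n^{-\alpha/2}
\end{align*}
```

When the autocorrelations are summable the sum stays bounded and the $n^{-1/2}$ rate survives; when they are not, averaging reduces variability more slowly than the CLT suggests.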
An internet traffic example:
- binning of 139,264 packet arrival time stamps
- measured at MCNC (from
- Each layer decreases scale by 1/4
- variability increases as scale decreases
- so what?
scaling is different from prediction by CLT?
2. Long Range Dependence (cont.)
Hurst scaling properties
Compare to I.I.D. Uniform(0,1)
- Variability changes less across scale?
similar at finest scale, less at large scale
Better view: Standard Deviation as a function of scale
- Real data increases faster
- view is tricky because of exponential increase
note sample size n is 4 to power of "histo level"
Better Scaling: log(S.D.) vs. scale
- Uniforms have slope 1
predicted by CLT (since S.D. $\propto n^{1/2}$ and $n = 4^{\text{level}}$, with base-2 logs)
- but Network data has smaller slope
- i.e. non-CLT scaling properties
- called the "Hurst phenomenon"
Red is a sim'd LRD process (no time now)
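A sketch of the i.i.d. Uniform(0,1) comparison in Python, under one reading of the plots that is an assumption here: at each level the unit interval is split into 4 to the power of the level bins, bin counts are put on the density (histogram height) scale, and the standard deviation across bins is plotted on a base-2 log scale against level. Only the count 139,264 comes from the notes; the levels used, the seed, and the function names are illustrative. Under this reading the slope for independent uniforms should be close to 1.

```python
import math
import random
import statistics

def height_sd(times, level):
    """S.D. across bins of histogram heights, with 4**level equal bins on [0, 1)."""
    n_bins = 4 ** level
    counts = [0] * n_bins
    for u in times:
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    heights = [c * n_bins / len(times) for c in counts]   # density scale
    return statistics.pstdev(heights)

def fit_slope(xs, ys):
    """Least-squares slope of ys on xs (with intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))

rng = random.Random(4)                                    # fixed seed
times = [rng.random() for _ in range(139264)]             # same count as the data
levels = [3, 4, 5, 6, 7]
log_sd = [math.log2(height_sd(times, k)) for k in levels]
slope = fit_slope(levels, log_sd)
print(round(slope, 2))                                    # close to 1 for i.i.d. uniforms
```

Running the same function on long range dependent arrival times would, on this reading, give a visibly smaller slope, which is the comparison the plots make.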
Acknowledgement: this was motivated by similar analysis in:
Paxson, V. and Floyd, S. (1995) Wide Area traffic: the failure of Poisson modeling, IEEE/ACM Transactions on Networking, 3, 226-244.
2. Long Range Dependence (cont.)
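3rd way to define "Long Range Dependence":
"Hurst Parameter", $H$, with long range dependence when $1/2 < H < 1$
(again there are asymptotic equivalences)
Hurst parameter and auto-correlation: $\rho(h) \sim C h^{2H - 2}$ as $h \to \infty$
Hurst parameter and periodogram: spectral density $f(\lambda) \sim C |\lambda|^{1 - 2H}$ as $\lambda \to 0$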
2. Long Range Dependence (cont.)
Now can revisit Richard Smith's analysis
plot (cont.)
- View 2: Hurst parameter ~ 0.86
Periodogram based C.I. is:
Based on analysis and graphics by Richard Smith
$H \approx 0.86 > 1/2$: Long Range Dependent, “self similar”, …
Consistent with above “heavy tail” theory