Class Notes 9/19/01
Introduction to Time Series Analysis
1. Self Contained (so experts need not attend)
2. Sound Bite
- only give "flavor" (enough for this course)
- follow references for more details
Two Main Parts
1. Classical Theory
- usual in a 1st course in time series
- really a semester's (or year's) worth of material
2. Long Range Dependence
- more advanced topic
- an active research area
- important for internet traffic???
1. Classical Theory
Recommended Source:
Brockwell, P. J. and Davis, R. A. (1987) Time Series: Theory and Methods, Springer
A finite time series is a "sequence of random variables" X_1, ..., X_n
Useful mathematical approximation:
- doubly infinite sequence ..., X_{-1}, X_0, X_1, ...
- avoids "edge problems"
1. Classical Theory (cont.)
Fundamental Classical Property: Stationarity
Definition: a time series {X_t} is stationary when:
(X_{t_1}, ..., X_{t_k}) has the same distribution as (X_{t_1 + h}, ..., X_{t_k + h}), for all integers t_1, ..., t_k and h
- i.e. (joint multivar.) marginal dist'ns same over time
- in "sliding time window", indexed by lag h
- thus can hope to "accumulate information"
- only hope for classical statistical inference?
- at least for "consistent estimation"
- aside: doubly infinite approx'n already useful
- confusing terminology: some (e.g. B&D) call this "strict stationarity"
1. Classical Theory (cont.)
Simple dependence measures:
Variance, Var(X_t)
- measures "spread"
- in second moment terms
Covariance, Cov(X_s, X_t)
- sample and theoretical versions (related intuitively and asymptotically)
- independence makes this 0
- but can be 0 otherwise (i.e. only linear indep.)
Correlation, Corr(X_s, X_t) = Cov(X_s, X_t) / (SD(X_s) SD(X_t))
- normalized version of covariance
- makes it "shift and scale invariant"
- thus better "measure of independence"
1. Classical Theory (cont.)
For a stationary time series, covariance only depends on the lag, thus sensible to define:
Autocovariance: gamma(h) = Cov(X_t, X_{t+h})
(depends also on t in the non-stationary case)
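A minimal sketch (not from the notes) of the sample versions of these measures, using the usual 1/n convention for the sample autocovariance:

# Sample autocovariance and autocorrelation at a given lag.
import numpy as np

def sample_autocovariance(x, lag):
    """gamma_hat(lag) = (1/n) * sum_t (x_t - xbar)(x_{t+lag} - xbar)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    return np.sum((x[:n - lag] - xbar) * (x[lag:] - xbar)) / n

def sample_autocorrelation(x, lag):
    """rho_hat(lag) = gamma_hat(lag) / gamma_hat(0)."""
    return sample_autocovariance(x, lag) / sample_autocovariance(x, 0)

# Example: white noise should give autocorrelations near 0 for lag >= 1.
rng = np.random.default_rng(0)
z = rng.normal(size=1000)
print([round(sample_autocorrelation(z, h), 3) for h in range(1, 4)])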
1. Classical Theory (cont.)
Variation on stationarity:
Since many statistical properties depend only on mean and covariance structure
Definition: a time series is "weakly stationary" when:
1. Means are the same (over time)
2. Variance is constant (over time)
3. Cov(X_t, X_{t+h}) is constant over t (depends only on the lag h)
- much weaker property than "stationary"
- since only about moments, not distributions
- but both properties are the same for Gaussian data
- confusing terminology: some (e.g. B&D) call this "stationarity"
1. Classical Theory (cont.)
Some famous classical time series models:
1. White Noise: X_t = Z_t, where the Z_t are independent, identically distributed
2. ARMA (Auto Regressive Moving Average) Process:
Solution X_t of the "linear equation"
X_t - phi_1 X_{t-1} - ... - phi_p X_{t-p} = Z_t + theta_1 Z_{t-1} + ... + theta_q Z_{t-q}
based on the "driving process" Z_t, a white noise
- Right side is "Moving Average" part
- just "linear combo" of White Noise
- Left side is "AutoRegressive" part
- just "solve linear equation" with MA driver
- Convergence issues are critical
- i.e. to the "limit" entailed in the infinite linear combination of the Z_t that solves the equation
- essentially equivalent to "stationarity"
- depends on coefficients phi_1, ..., phi_p, theta_1, ..., theta_q
- B&D definition: only have "ARMA" when stationary
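A hedged sketch of simulating a stationary ARMA(1,1) process with statsmodels; the coefficient values here are illustrative choices, not from the notes:

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# statsmodels convention: ar = [1, -phi_1, ..., -phi_p], ma = [1, theta_1, ..., theta_q]
phi, theta = 0.7, 0.4
arma = ArmaProcess(ar=[1, -phi], ma=[1, theta])

print(arma.isstationary)               # True: AR root 1/phi lies outside the unit circle
x = arma.generate_sample(nsample=500)  # driven by Gaussian white noise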
1. Classical Theory (cont.)
Cool ARMA notation:
Basis: Backshift operator: B X_t = X_{t-1}
Iterated Backshift operator (note "power notation"): B^k X_t = X_{t-k}
Compact version of ARMA equation: phi(B) X_t = theta(B) Z_t
based on "backshift polynomials" phi(z) = 1 - phi_1 z - ... - phi_p z^p and theta(z) = 1 + theta_1 z + ... + theta_q z^q
1. Classical Theory (cont.)
Key to ARMA convergence (thus stationarity):
Roots of the AR polynomial phi(z) not on the unit circle (in the complex plane)
These polynomials also determine other properties:
- "causality"
- "invertibility"
- autocorrelation
1. Classical Theory (cont.)
Example: AR(1) = ARMA(1,0): X_t = phi X_{t-1} + Z_t
Called an "AutoRegressive process"
Can show rho(h) = phi^h (for h >= 0), thus log rho(h) = h log phi
So, have "behavior similar to AR(1)" when:
plot of log(autocorrelation) vs. lag h "looks linear"
And the slope allows estimation of log phi, thus of phi
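A rough sketch (not from the notes) of this diagnostic: simulate an AR(1), check that log autocorrelation vs. lag looks linear, and recover phi from the slope. The phi value 0.8 is an illustrative choice.

import numpy as np

rng = np.random.default_rng(1)
phi_true = 0.8
n = 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()   # AR(1) recursion

def acf(x, max_lag):
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - h], x[h:]) / denom for h in range(1, max_lag + 1)])

lags = np.arange(1, 11)
rho = acf(x, 10)
slope = np.polyfit(lags, np.log(rho), 1)[0]   # log rho(h) ~ h * log(phi)
print(np.exp(slope))                          # should be near phi_true = 0.8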
1. Classical Theory (cont.)
Earlier application of these ideas: from Lecture Notes 9-12-01
- View 1: Approximate as: WhiteNoise + AR(1)
- nearly "unit root"
- close to nonstationary random walk
- i.e. "dancing near edge" of nonstationarity
- don't forget sampling variability
1. Classical Theory (cont.)
General form of autocorrelation for (stationary) ARMA:
rho(h) decreases exponentially in the lag h, i.e. |rho(h)| <= C r^h for some 0 < r < 1
(Can be shown using "backshift polynomial" rep'n)
Terminology: "short range dependence"
1. Classical Theory (cont.)
A classical venture into "nonstationarity":
ARIMA (AutoRegressive Integrated Moving Average)
Understood through a similar defining equation: phi(B) (1 - B)^d X_t = theta(B) Z_t, with integer d >= 1
View 1: ARMA, with AR polynomial phi(z) (1 - z)^d
- has a unit root (at z = 1)
- thus non-stationary
View 2: the differenced series (1 - B)^d X_t is a conventional (stationary) ARMA
- thus apply classical methods to the differenced series
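A minimal sketch of "View 2", on simulated (purely illustrative) data: an ARIMA(0,1,0) series is a random walk, and differencing it once recovers a stationary series.

import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(size=1000)
x = np.cumsum(z)            # ARIMA(0,1,0): a random walk, non-stationary

dx = np.diff(x)             # (1 - B) X_t : back to white noise (stationary ARMA)
print(x.var(), dx.var())    # differenced series has stable, modest variance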
1. Classical Theory (cont.)
Another view of dependence: Spectral (Fourier) Analysis
Basic tool: Periodogram (i.e. Power Spectrum)
- (squared) magnitude of Fourier transform of data
- indicates "power of signal" over a "range of frequencies"
- behavior "near 0" is "low frequency behavior"
- provides a possible definition of "long range dependence"
- smoothed versions estimate the "spectral density"
- related by Fourier transform to autocorrelation
- ARMA spectral density is bounded at low frequencies
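A sketch of the periodogram as described here: squared magnitude of the discrete Fourier transform of the mean-removed data, against frequency.

import numpy as np

def periodogram(x):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    dft = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n)          # frequencies in cycles per observation
    power = (np.abs(dft) ** 2) / n      # periodogram ordinates
    return freqs[1:], power[1:]         # drop frequency 0

# Low-frequency behavior: for ARMA the ordinates near frequency 0 stay bounded;
# under long range dependence they blow up.
freqs, power = periodogram(np.random.default_rng(3).normal(size=1024))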
2. Long Range Dependence
Possible source:
Beran, J. (1994) Statistics for Long-Memory Processes, Chapman & Hall.
(be careful about the mathematics? at least a good source of references)
Autocorrelation characterization: rho(h) ~ C h^(-alpha), for some 0 < alpha < 1, as h -> infinity
- i.e. "slow" polynomial decay at large lags
- recall "fast" exponential decay for ARMA
Spectral density characterization: f(lambda) ~ C lambda^(-beta), for some 0 < beta < 1, as lambda -> 0
- i.e. has a pole at 0 (low frequencies)
- recall bounded for ARMA
These are asymptotically equivalent (under some conditions)
2. Long Range Dependence (cont.)
A model: Fractional ARIMA
Similar to ARIMA, with defining equation: phi(B) (1 - B)^d X_t = theta(B) Z_t, where -1/2 < d < 1/2 (non-integer)
- "less unit root-ish" than ARIMA
- for d = 0 have: exponentially decaying autocorrelation, i.e. "short range dependence"
- for 0 < d < 1/2 have: polynomially decaying autocorrelation, i.e. "long range dependence"
- generally have spectral density: f(lambda) ~ C lambda^(-2d) as lambda -> 0
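A hedged sketch (not from the notes) of simulating FARIMA(0, d, 0) by truncating its MA(infinity) representation; the weights psi_j = Gamma(j+d) / (Gamma(j+1) Gamma(d)) are computed recursively, and the truncation length and d = 0.3 are illustrative choices.

import numpy as np

def farima_0d0(n, d, trunc=2000, rng=None):
    rng = rng or np.random.default_rng()
    psi = np.empty(trunc)
    psi[0] = 1.0
    for j in range(1, trunc):
        psi[j] = psi[j - 1] * (j - 1 + d) / j      # psi_j / psi_{j-1} = (j-1+d)/j
    z = rng.normal(size=n + trunc)                 # white noise driver
    # X_t = sum_j psi_j Z_{t-j}, truncated at trunc terms
    return np.array([psi @ z[t:t + trunc][::-1] for t in range(n)])

x = farima_0d0(n=2000, d=0.3)   # 0 < d < 1/2  =>  long range dependence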
2. Long Range Dependence (cont.)
Hurst scaling properties:
- Exceptions to the Central Limit Theorem?
Recall (under some assumptions): the sample mean satisfies Xbar_n ~ N(mu, sigma^2 / n), approximately, as n -> infinity
- useful in many ways (classical statistics)
- one view: "variability" decreases like 1/sqrt(n)
- assumptions are important when this is wrong....
An internet traffic example:
- binning of 139,264 packet arrival time stamps
- measured at MCNC (from www.nlanr.org)
- each layer decreases scale by 1/4
- variability increases as scale decreases
- so what?
- scaling is different from the prediction by the CLT?
2. Long Range Dependence (cont.)
Hurst scaling properties (cont.)
Compare to I.I.D. Uniform(0,1) variables
- Variability changes less across scale?
- similar at finest scale, less at large scale
Better view: Standard Deviation as a function of scale
- Real data increases faster
- view is tricky because of exponential increase
- note sample size n is 4 to the power of the "histo level"
Better Scaling: log(S.D.) vs. scale
- Uniforms have slope 1
- predicted by CLT (sqrt(n) scaling)
- but Network data has a smaller slope
- i.e. non-CLT scaling properties
- called the "Hurst phenomenon"
- Red is a sim'd LRD process (no time now)
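This is not the histogram construction used in the notes, but a standard "aggregated variance" check in the same spirit: the variance of block means scales like m^(2H - 2) in the block size m, so a log-log regression estimates the Hurst parameter H. For i.i.d. data H = 1/2 (the CLT rate); H > 1/2 indicates Hurst-type scaling.

import numpy as np

def hurst_aggregated_variance(x, block_sizes):
    x = np.asarray(x, dtype=float)
    log_m, log_v = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        means = x[:n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_v.append(np.log(means.var()))
    slope = np.polyfit(log_m, log_v, 1)[0]   # slope = 2H - 2
    return 1 + slope / 2

# i.i.d. Uniform(0,1) data should give H near 0.5.
u = np.random.default_rng(4).uniform(size=2 ** 16)
print(hurst_aggregated_variance(u, block_sizes=[4, 16, 64, 256, 1024]))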
Acknowledgement: this was motivated by similar analysis in:
Paxson, V. and Floyd, S. (1995) Wide Area Traffic: The Failure of Poisson Modeling, IEEE/ACM Transactions on Networking, 3, 226-244.
2. Long Range Dependence (cont.)
3rd way to define "Long Range Dependence": the "Hurst Parameter" H, with 1/2 < H < 1
(again there are asymptotic equivalences)
Hurst parameter and auto-correlation: rho(h) ~ C h^(2H - 2) as h -> infinity (i.e. alpha = 2 - 2H)
Hurst parameter and periodogram: f(lambda) ~ C lambda^(1 - 2H) as lambda -> 0 (i.e. beta = 2H - 1)
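A rough, GPH-style sketch (an assumption, not the exact analysis referenced below): since the spectral density behaves like lambda^(1 - 2H) near 0, regressing log periodogram on log frequency over the lowest frequencies gives slope roughly 1 - 2H, so H is roughly (1 - slope) / 2.

import numpy as np

def hurst_from_periodogram(x, n_freqs=None):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    power = np.abs(np.fft.rfft(x)) ** 2 / n
    freqs = np.fft.rfftfreq(n)
    if n_freqs is None:
        n_freqs = int(n ** 0.5)              # use only the lowest frequencies
    lf = np.log(freqs[1:n_freqs + 1])
    lp = np.log(power[1:n_freqs + 1])
    slope = np.polyfit(lf, lp, 1)[0]
    return (1 - slope) / 2

# White noise has a flat spectrum, so this should return H near 0.5.
print(hurst_from_periodogram(np.random.default_rng(5).normal(size=2 ** 14)))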
2. Long Range Dependence (cont.)
Now can revisit Richard Smith's analysis
Autocorrelation plot (cont.)
- View 2: Hurst parameter ~ 0.86
Periodogram based C.I. is: (0.82, 1.06)
Based on analysis and graphics by Richard Smith
- H ~ 0.86: Long Range Dependent, "self similar", ...
- Consistent with above "heavy tail" theory