Class Notes 10/10/01
Last Time:
- in general context of heavy tailed durations and long range dependence
- quick overview of extreme value theory
- revisited simple model assumptions
- Downey's argument for log normal
- "Important" open problem
Recall Simple Model
Intuitive basis: Mice and Elephants plot from before:
Model:
a. "line starts" as homogeneous Poisson process
b. durations (i.e. "line lengths") as i.i.d. from some dist'n
Possible variations:
- binned data
- counts at discrete time points
- discrete time
- Weibull process starts (aggregate indep. Weibull inter-arr's)
- Clustered Poisson starts
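A minimal sketch of the simple model, useful for experimenting with the assumptions: starts from a homogeneous Poisson process, durations i.i.d. The Pareto(1.5) durations, the rate of 50 starts per time unit, the horizon, and all function names are illustrative assumptions, not values from these notes.

```python
import random

def simulate_simple_model(rate, horizon, duration_sampler, rng):
    """Return (start, end) pairs: Poisson starts on [0, horizon), i.i.d. durations."""
    lines = []
    t = rng.expovariate(rate)  # exponential gaps give a homogeneous Poisson process
    while t < horizon:
        lines.append((t, t + duration_sampler(rng)))
        t += rng.expovariate(rate)
    return lines

def active_count(lines, time_point):
    """Number of lines covering time_point (the aggregate load at that instant)."""
    return sum(1 for start, end in lines if start <= time_point < end)

rng = random.Random(0)

def pareto_duration(r):
    # Assumed heavy-tailed duration: Pareto with tail index 1.5 (illustrative only).
    return r.paretovariate(1.5)

lines = simulate_simple_model(rate=50.0, horizon=100.0,
                              duration_sampler=pareto_duration, rng=rng)
counts = [active_count(lines, t) for t in range(10, 100, 10)]
print(len(lines), counts)
```

The variations listed above (Weibull or clustered starts) would replace only the loop that generates start times; the duration step stays the same.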
Simple Model (cont.)
Nomenclature: some names:
- "Model with M/G/∞ input"
- Fluid queue with Poisson input
Interesting survey paper:
Resnick, S. and Samorodnitsky, G. (2000) Fluid queues, on-off processes
and teletraffic modelling with highly variable and correlated inputs,
in Self Similar Network Traffic and Performance Evaluation,
Eds: A. Erramilli, W. Willinger and C. Park, 171-192.
Deeper Exploration of Model Assumptions
Recall 2 views (shown earlier):
1. SiZer analysis of homogeneity (constant intensity) of flow starts
- clearly not homogeneous (lots of red & blue regions)
- but not so "rough" as packet level analysis?
- homogeneous Poisson OK as an approximation????
Aside: all of these have far more SiZer structure than a
Deeper Exploration of Model Assumptions (cont.)
2. QQ investigation of interarrival times between session starts
2 a. Exponential QQ, scale est'd by sample mean
- unacceptable fit
- especially slope
- suggesting wrong shape parameter
- these parameters were used in above SiZer simulation
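A sketch of how the points of an exponential QQ plot can be computed, with the scale estimated by the sample mean as in 2 a. The plotting positions (i - 0.5)/n are one common convention and are an assumption here; the synthetic gaps stand in for the real interarrival times, which are not reproduced in these notes.

```python
import math
import random

def exponential_qq(sample):
    """Pairs (theoretical exponential quantile, sorted data value)."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n  # scale estimated by the sample mean
    # Exponential quantile function: Q(p) = -mean * log(1 - p)
    theo = [-mean * math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theo, xs))

rng = random.Random(1)
# Synthetic stand-in for interarrival times (rate 60 is an arbitrary choice).
gaps = [rng.expovariate(60.0) for _ in range(5000)]
pairs = exponential_qq(gaps)
```

Plotting the pairs against the 45 degree line gives the QQ view; for truly exponential data the points should track that line, so a wrong slope or curvature indicates a poor fit.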
Deeper Exploration of Model Assumptions (cont.)
2 b. Weibull QQ, parameters est'd by quantile matching
- very good fit
- shape parameter 0.9 "close to Poisson 1.0"?
- provides a workable approximation????
A possibility: improve model by replacing "Poisson Process"
by "Weibull(0.9) waiting time process"
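One way quantile matching for the Weibull can work: the Weibull quantile function is Q(p) = scale * (-log(1 - p))^(1/shape), so two matched quantiles determine both parameters. The sketch below assumes the 25% and 75% quantiles are the ones matched (the notes do not say which were used) and assumes (0.8986, 0.01479) means (shape, scale).

```python
import math
import random

def weibull_quantile_match(sample, p1=0.25, p2=0.75):
    """Solve for (shape, scale) so the Weibull quantiles at p1, p2 match the data."""
    xs = sorted(sample)
    n = len(xs)
    q1 = xs[int(p1 * n)]
    q2 = xs[int(p2 * n)]
    # log Q(p) = log(scale) + (1/shape) * log(-log(1 - p)) is linear in w(p)
    w1 = math.log(-math.log(1.0 - p1))
    w2 = math.log(-math.log(1.0 - p2))
    shape = (w2 - w1) / (math.log(q2) - math.log(q1))
    scale = q1 / (-math.log(1.0 - p1)) ** (1.0 / shape)
    return shape, scale

rng = random.Random(2)
# random.weibullvariate takes (scale, shape), in that order.
data = [rng.weibullvariate(0.01479, 0.8986) for _ in range(20000)]
shape_hat, scale_hat = weibull_quantile_match(data)
print(shape_hat, scale_hat)
```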
Weibull inter-arrival model
Repeat above analysis, for Weibull interarrival process,
parameters (0.8986, 0.01479) from above Q-Q
1. SiZer analysis of homogeneity (graphic)
- No inhomogeneity visible
- i.e. no "strong evidence" (null hypo. not rejected)
- Suggests Weibull interarrivals not an explanation
(of inhomogeneity)?
- Weibull interarrival process same as Poisson?
- or only "nearly same", since 0.9 ~ 1 ??
- smaller Weibull shape parameter will produce "non-Poisson clumping"?
Weibull inter-arrival model (cont.)
Aside: explore smaller (more extreme) Weibull shape parameter
- scale parameter chosen to "give approx. mean"
- here and below
- large amount of inhomogeneity
- so small enough Weibull shape clearly different from Poisson
- More similar amount of inhomogeneity
- still "too much"??
- better?
- still too much inhomogeneity at small scales??
- better at small scales?
- but now not enough at medium scales??
- appears to be "trade-off" between scales?
- "can't get this right" - suggests model wrong???
- too much splitting of hairs here????
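A sketch of choosing the scale to "give approx. mean" when the shape is changed, and of simulating the resulting Weibull interarrival process. It uses the fact that the Weibull mean is scale * Gamma(1 + 1/shape). The reading of (0.8986, 0.01479) as (shape, scale) and the function names are assumptions.

```python
import math
import random
from itertools import accumulate

def weibull_mean(shape, scale):
    return scale * math.gamma(1.0 + 1.0 / shape)

def weibull_scale_for_mean(target_mean, shape):
    """Scale that gives the requested mean for a given shape."""
    return target_mean / math.gamma(1.0 + 1.0 / shape)

def weibull_arrival_times(n, shape, scale, rng):
    """Arrival times as cumulative sums of i.i.d. Weibull gaps."""
    gaps = [rng.weibullvariate(scale, shape) for _ in range(n)]  # (scale, shape) order
    return list(accumulate(gaps))

# Assumes (0.8986, 0.01479) from the Q-Q fit are (shape, scale).
target = weibull_mean(0.8986, 0.01479)
rng = random.Random(3)
results = {}
for shape in (0.8986, 0.45):
    scale = weibull_scale_for_mean(target, shape)
    arrivals = weibull_arrival_times(20000, shape, scale, rng)
    results[shape] = (scale, arrivals[-1] / len(arrivals))  # (scale, observed mean gap)
print(results)
```

Holding the mean fixed keeps the overall arrival rate comparable across shapes, so differences in a homogeneity analysis reflect clumping rather than a change in intensity.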
Weibull inter-arrival model (cont.)
Close the circle: could shape parameter 0.45 fit the data?
- clearly way off
- to get "correct inhomogeneity", need "wrong interarrivals"
- conclude Weibull interarrival process is poor fit
Question: are there modifications of "simple model"
that better fit the data?
Cluster Poisson model
Idea: At each homogeneous Poisson point, generate a Poisson(cluster mean)
number of additional points from the Triangular(0, window) distribution
Explanations:
- For small cluster mean have nearly no clusters
- For large cluster mean have large clusters (very non-homogeneous?)
- Physical motivation: web pages that call others
(e.g. additional ads/graphics)
- Triangle is "sensible model" for load generated calls?
- How to estimate the window?
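A minimal sketch of the cluster construction described above. Several things here are assumptions: the additional points are placed as delays after their parent point, the triangular distribution is symmetric on (0, window) (the notes do not give the mode), and the names `cluster_mean` and `window` are labels chosen for this sketch.

```python
import math
import random

def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def cluster_poisson(rate, horizon, cluster_mean, window, rng):
    """Parent points: homogeneous Poisson(rate) on [0, horizon).
    Each parent adds Poisson(cluster_mean) points at triangular delays."""
    points = []
    t = rng.expovariate(rate)
    while t < horizon:
        points.append(t)
        for _ in range(poisson_draw(cluster_mean, rng)):
            # Symmetric triangular on (0, window): an assumed shape.
            points.append(t + rng.triangular(0.0, window))
        t += rng.expovariate(rate)
    points.sort()
    return points

rng = random.Random(4)
points = cluster_poisson(rate=10.0, horizon=200.0, cluster_mean=4.0, window=0.1, rng=rng)
print(len(points))
```

With a cluster mean near 0 this reduces to the plain Poisson process; the expected number of points is rate * horizon * (1 + cluster mean).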
Cluster Poisson model (cont.)
Investigate choice of window by exploiting "memoryless property" of exponential
Idea: for any threshold t > 0 and X ~ Exponential, the excess X - t
(given X > t) has the same Exponential distribution
Thus can "raise t until QQ plot looks exponential", which
reflects elimination of effect of compact cluster dist'n
- suggests reasonable window is 0.01 - 0.1 sec.??
- very sensible since near "TCP round trip time"?
- thus start with window = 0.1 (sec)
- try to "choose cluster mean by eye"?
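A sketch of the thresholding idea on synthetic data. In place of judging a QQ plot by eye, it uses the coefficient of variation (sd / mean) of the excesses, which equals 1 for an exponential; that summary is a substitute chosen for this sketch, not the method used in these notes. The data are an assumed mixture: exponential gaps plus short within-cluster gaps below 0.1.

```python
import random
import statistics

def excesses(gaps, threshold):
    """Amounts by which gaps exceed the threshold (gaps below it are dropped)."""
    return [g - threshold for g in gaps if g > threshold]

def coefficient_of_variation(values):
    return statistics.pstdev(values) / statistics.fmean(values)

rng = random.Random(5)
# Synthetic stand-in: exponential gaps (mean 1) plus compact cluster gaps < 0.1.
gaps = ([rng.expovariate(1.0) for _ in range(20000)]
        + [rng.uniform(0.0, 0.1) for _ in range(20000)])

cv_by_threshold = {}
for threshold in (0.0, 0.05, 0.1, 0.2):
    cv_by_threshold[threshold] = coefficient_of_variation(excesses(gaps, threshold))
print(cv_by_threshold)
```

Once the threshold passes the support of the compact cluster gaps, only the exponential part remains and the excesses look exponential again, which is the sense in which the threshold reveals the cluster window.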
Cluster Poisson model (cont.)
Try SiZer analysis of homogeneity
recall real data analysis
Attempt 1: cluster mean = 4
- liberal guess at "average # new requests generated"?
(liberal aimed at good chance of finding structure)
- Start process parameters give "similar range"
- has clusters, as desired (not homogeneous)
- but not enough clustering
- also reflected by peaks too small in blue curves
- can get more clustering by larger cluster mean
Cluster Poisson model (cont.)
Attempt 2: cluster mean = 8
- definite improvement
- not quite enough clustering?
- too much small scale structure??
- magnitude of wiggles in blue curves is "about right"?
- but blue curve wiggles are "too high in frequency"??
Attempt 3: cluster mean = 16
- large scale structure "about right"?
- but too much "fine scale" structure?
- again magnitude of wiggles in blue curves is "about right"?
- but blue curve wiggles are "too high in frequency"??
Cluster Poisson model (cont.)
Attempt 4: cluster mean = 4, window = 20 (sec)
- now window was chosen for visual impression
- overall visual impression is "really good"?
- SiZer provides "delicate and precise visual instrument"??
- intuition behind 20 sec cluster time window????
- too slow for TCP round trip time?
- too fast for "user generated clicks"?
- maybe this effect is "nonstationarity" or
"long range dependence", not clustering????
Cluster Poisson model (cont.)
Complete this circle with a QQ analysis:
Weibull QQ with quantile matched parameters
- not bad for most of distribution
- << 0.1% is "twice as big as should be"?
- simple explanation of this???
- shape parameter estimated at 0.95 > previous 0.9
Cluster Poisson model (cont.)
- really different view as usual
- big values no longer so offensive
- pretty good fit for small values
- note significant "curvature" near 0.01 quantile
- not quite Weibull, but reasonable working approx'n?
- what about different shape parameter?
Cluster Poisson model (cont.)
- now enforce shape parameter = 0.9
- scale estimated by MLE (given shape)
- curvature more visually apparent?
- maybe better overall fit than above??
- overall approximation looks better???
- could fine tune, using shape of cluster dist'n????
(not clear how much time this is worth)
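A sketch of estimating the Weibull scale by MLE with the shape held fixed. Setting the derivative of the log-likelihood to zero gives scale^shape = mean(x^shape), which has a closed form. The synthetic data and the scale of 0.02 are arbitrary stand-ins, not values from these notes.

```python
import random

def weibull_scale_mle(sample, shape):
    """MLE of the Weibull scale when the shape is known:
    scale_hat = (mean of x**shape) ** (1 / shape)."""
    n = len(sample)
    return (sum(x ** shape for x in sample) / n) ** (1.0 / shape)

rng = random.Random(6)
# random.weibullvariate takes (scale, shape), in that order.
data = [rng.weibullvariate(0.02, 0.9) for _ in range(20000)]
scale_hat = weibull_scale_mle(data, shape=0.9)
print(scale_hat)
```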
Cluster Poisson model (cont.)
Overall Conclusion (personal):
- Cluster Poisson is worthwhile modification of simple model
- time parametrization question worth more investigation
- this analysis seems publishable??
- worth trying other cluster distributions????
Open question: worth trying this "Cluster Poisson" analysis
on all packet data (not just flow starts as here)?