Links to Lectures (with summary of topics):
Lecture 12/7/01 Sam Steckley
Lecture 12/5/01 (Revisited Open Problems, new mathematics for heavy tails (with new graphics), solution for log-normal durations lead to long-range dependence, final "big picture" comments)
Lecture 12/3/01 (Large Variable Association, found way to make axes "commensurable" with respect to multiplicative rescaling, but are tail indices right?)
Lecture 11/28/01 {Actually given on 11/30/01} (Used toy examples and real data to explore variations (e.g. copulas) on Large Variable Association. Problem: how to make axes "commensurable")
Lecture 11/26/01 {Actually given on 11/28/01} (Asymptotic Independence, using SiZer analysis, saw hard to explain dependencies, introduced modification: "Large Variable Association")
Lecture 11/21/01 Jianghong Wang
Lecture 11/19/01 Stacey Tang
Lecture 11/14/01 Jörg Rothenbühler
Lecture 11/12/01 Krishanu Maulik
Lecture 10/31/01 (Cascaded On-Off Process: Quantitative Validation, autocorrelation, summary statistics, visual periodicities, quantiles)
Lecture 10/29/01 (Cascaded On-Off Process: Definition, Parameter Estimation, Visual Impression)
Lecture 10/24/01 (Asymptotic Independence, using SiZer analysis, relationship between Size, Time and "Rate" = Size/Time, Tail index estimation via slope of log-log CCDF, Long Range Dependence vs. ARIMA(1))
Lecture 10/22/01 (For new HTTP Response Size data: Studied Asymptotic Independence, log-log CCDF Tail Index Estimation)
Lecture 10/17/01 (For Simple Model: Investigated "Independence" part of Poisson process starts. Studied Flow Scatterplots, to investigate "asymptotic independence". Introduced new HTTP Response Size data, did first version of Scatterplots.)
Lecture 10/15/01 (Zooming spectral analysis, revisited flow duration distributions, from "Residual Life Time Distribution" viewpoint)
Lecture 10/10/01 (In context of heavy tailed durations imply LRD, revisited simple model assumptions, considered modifications for start time process, Weibull process improvement not compelling, Cluster Poisson process gave useful improvement)
Lecture 10/3/01 (In context of heavy tailed durations imply LRD, quick overview of extreme value theory, revisited simple model assumptions, Downey's argument for log normal, "Important" open problem)
Lecture 10/1/01 (In context of heavy tailed durations imply LRD, Mice and Elephants graphic, for "one minute split" flows, introduced simple model, investigated modelling assumptions)
Lecture 9/26/01 (Mice and Elephants View, showing how heavy tailed durations imply LRD, time windows gave truncation - length biasing, constructed simulated versions, careful look at IP, TCP, UDP, ...)
Lecture 9/24/01 (Finished SiZer background, zooming SiZer analysis)
Lecture 9/19/01 (Introduction to Time Series Analysis)
Lecture 9/17/01 (Zooming autocorrelation analysis, Heading toward zooming SiZer analysis, 1st doing SiZer background)
Lecture 9/12/01 (Finished (?) study of heavy tails, began study of "Long Range Dependence", via correlation analysis (sensible?), in context of Heavy tailed durations imply LRD? TCP connection zooming graphic)
Lecture 9/10/01 (Careful overview of Q-Q analysis, quantile matching, comparison with Complementary CDF analysis)
Lecture 9/5/01 (Detailed Q-Q for tail of Response Size Distributions, Pareto and log normal gave decent fit, how should we think about "heavy tails" in context of Heavy tailed durations imply LRD?)
Lecture 9/3/01 ("Big picture" of Internet traffic, Response Size Distributions: SiZer analysis, Q-Q plots, simulated envelope, suggest possible Pareto fits)
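The 10/22/01 and 10/24/01 lectures estimate the tail index from the slope of the log-log CCDF. Below is a minimal Python sketch of that idea, assuming a Pareto-type tail P(X > x) ≈ C x^(-α), so that log CCDF is roughly linear in log x with slope -α. The function names and the choice to fit the top 10% of CCDF points by ordinary least squares are illustrative assumptions, not the procedure used in lecture.

```python
import math
import random
from typing import List, Sequence, Tuple


def empirical_ccdf(sample: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (x, P(X > x)) for each distinct value x in the sample.

    The largest value is dropped because its empirical CCDF is 0,
    which has no logarithm.
    """
    xs = sorted(sample)
    n = len(xs)
    points = []
    i = 0
    while i < n:
        x = xs[i]
        j = i
        while j < n and xs[j] == x:  # skip over tied values
            j += 1
        tail = (n - j) / n  # fraction of observations strictly greater than x
        if tail > 0:
            points.append((x, tail))
        i = j
    return points


def tail_index_loglog(sample: Sequence[float], upper_fraction: float = 0.1) -> float:
    """Estimate alpha as minus the least-squares slope of log10 CCDF vs log10 x,
    using only the largest `upper_fraction` of the CCDF points
    (a hypothetical, arbitrary cutoff chosen for illustration)."""
    pts = empirical_ccdf([x for x in sample if x > 0])  # logs need x > 0
    if len(pts) < 2:
        raise ValueError("need at least two distinct positive values")
    k = min(len(pts), max(2, int(len(pts) * upper_fraction)))
    tail_pts = pts[-k:]
    lx = [math.log10(x) for x, _ in tail_pts]
    ly = [math.log10(p) for _, p in tail_pts]
    mean_x = sum(lx) / k
    mean_y = sum(ly) / k
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return -sxy / sxx  # CCDF ~ C * x**(-alpha), so the slope is about -alpha


if __name__ == "__main__":
    random.seed(0)
    # Simulated Pareto data with P(X > x) = x**(-1.2) for x >= 1
    sample = [random.paretovariate(1.2) for _ in range(20000)]
    print(round(tail_index_loglog(sample), 2))  # expected to land near 1.2
```

On real response sizes the estimate can move noticeably with `upper_fraction`, which is one reason the lectures ask whether the tail indices are right; plotting the fitted line over the log-log CCDF is a sensible visual check.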
Other Links:
Summary of References (with links)
Data Sets:
(kindly provided by Don Smith, David Ott, Felix Hernandez, and others, from the UNC Computer Science Distributed and Real-Time Systems Group)
Response Size Data: 734,814 HTTP Response Sizes (in bytes), gathered around 1998 (in plain ASCII text format, each line is one response size).
Updated Response Size Data, gathered in April 2001. These files contain only the responses that are larger than 100,000 bytes (in plain ASCII text format, each line is one HTTP response: 1st column is size in bytes, 2nd is starting time (sec), 3rd is finishing time (sec), 4th is time required for transmission). Individual files cover 4-hour blocks, 3 times a day, for each day of the week:
Monday morning, 8:00AM-12:00noon: 20010423_800.raw
Monday afternoon, 1:00PM-5:00PM: 20010423_1300.raw
Monday evening, 7:30PM-11:30PM: 20010423_1930.raw
Tuesday morning, 8:00AM-12:00noon: 20010424_800.raw
Tuesday afternoon, 1:00PM-5:00PM: 20010424_1300.raw
Tuesday evening, 7:30PM-11:30PM: 20010424_1930.raw
Wednesday morning, 8:00AM-12:00noon: 20010425_800.raw
Wednesday afternoon, 1:00PM-5:00PM: 20010425_1300.raw
Wednesday evening, 7:30PM-11:30PM: 20010425_1930.raw
Thursday morning, 8:00AM-12:00noon: 20010426_800.raw
Thursday afternoon, 1:00PM-5:00PM: 20010426_1300.raw
Thursday evening, 7:30PM-11:30PM: 20010426_1930.raw
Friday morning, 8:00AM-12:00noon: 20010420_800.raw
Friday afternoon, 1:00PM-5:00PM: 20010420_1300.raw
Friday evening, 7:30PM-11:30PM: 20010420_1930.raw
Saturday morning, 8:00AM-12:00noon: 20010421_800.raw
Saturday afternoon, 1:00PM-5:00PM: 20010421_1300.raw
Saturday evening, 7:30PM-11:30PM: 20010421_1930.raw
Sunday morning, 8:00AM-12:00noon: 20010429_800.raw
Sunday afternoon, 1:00PM-5:00PM: 20010429_1300.raw
Sunday evening, 7:30PM-11:30PM: 20010429_1930.raw
Summary Statistics (Excel Spreadsheet)
Quantiles (Excel Spreadsheet)
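The file layouts described under Data Sets can be read with a short Python sketch. It assumes the columns are separated by whitespace (the page does not say), and the names `Response`, `parse_raw_lines`, `load_raw` and `load_sizes` are hypothetical. The `rate` property mirrors the "Rate" = Size/Time quantity from the 10/24/01 lecture, taking Time to be the 4th column.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Response:
    """One line of an updated Response Size Data file (hypothetical field names)."""
    size_bytes: float
    start_sec: float
    finish_sec: float
    transmit_time: float  # 4th column: time required for transmission

    @property
    def rate(self) -> Optional[float]:
        """'Rate' = Size/Time; None when the transmission time is not positive."""
        if self.transmit_time <= 0:
            return None
        return self.size_bytes / self.transmit_time


def parse_raw_lines(lines: Iterable[str]) -> List[Response]:
    """Parse 4-column lines (assumed whitespace-separated), skipping blank lines."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) < 4:
            raise ValueError(f"line {lineno}: expected 4 columns, got {len(fields)}")
        size, start, finish, transmit = (float(f) for f in fields[:4])
        records.append(Response(size, start, finish, transmit))
    return records


def load_raw(path: str) -> List[Response]:
    """Read one updated file, e.g. 20010423_800.raw."""
    with open(path) as fh:
        return parse_raw_lines(fh)


def load_sizes(path: str) -> List[float]:
    """Read the 1998 Response Size Data: one size (bytes) per line."""
    with open(path) as fh:
        return [float(line) for line in fh if line.strip()]
```

For example, `responses = load_raw("20010423_800.raw")` followed by `[r.rate for r in responses if r.rate is not None]` gives the Size/Time values for the Monday morning block.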
Background Material:
Statistical Analysis and Modelling of Internet Traffic Data
Course Meetings:
Time: Mon. & Wed. 8:40 - 9:55
Room: Rhodes 471
Course Web Site:
http://www.orie.cornell.edu/~marron/OR778NetworkData/OR778home.html
It may be easier to follow the link from:
http://www.orie.cornell.edu/~marron/
Instructor: J. S. (Steve) Marron
Office: Rhodes 234
Office Hours: Mon. 10 - 11, Tues. 11 - 12
Phone: (607) 255-9147
Email: marron@stat.unc.edu
Course Email List:
Please add yourself by sending an email with "subscribe" as the subject to: or778-fa01-l-request@orie.cornell.edu
(useful for announcements, such as "notes now posted")
Course Work / Grading
Based on a presentation
Presentations:
- can be either a paper by others (you choose, or I suggest)
- or your own work
- let's discuss soon
Course Goals:
- Explore Internet Traffic from several viewpoints
- Highlight interesting open problems
- Promote possible joint research
- Maximize understanding by all class members