Class Notes 12/3/01
Last Time:
- Large Variable Association
- Toy Examples and Real Data
- Tried to make axes "commensurable"
- Used Probability Integral Transform on angles
- Used SiZer analysis to assess "association"
- Main Problem: axes not right?
Large Value Association
Note renaming (previously "Large Variable Association").
Recall Main Idea:
variation of "Asymptotic Dependence"
that is well defined for finite samples,
not just in the limit as n → ∞
Goals:
1. Indicate whether large values are "more (less) associated than usual".
2. Reduce to classical Asymptotic Independence in the limit as n → ∞
(in interesting cases)
Large Value Association (cont.)
Approach:
a. For "properly adjusted marginals"
(adjust extreme value distributions as above?)
(surely "make scales comparable", e.g. divide by median)
b. Consider "polar coordinates" versions of the data,
i.e. a radius and an angle for each point.
c. Transform data to "angularly equally spaced",
by replacing the sorted angles with an equally spaced grid
(essentially Probability Integral Transform)
d. Study association of "large values", by density of the angles
where the corresponding radius is large (> threshold)
e. Have "large value association" when "piled up in middle"
f. Have "large value disassociation" when "piled up at ends"
g. Use SiZer to study statistical significance
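A minimal Python sketch of steps a through d above, under stated assumptions: scales are made comparable by dividing by the median, the radius is Euclidean, the angle is measured from the first axis, and the threshold is an upper quantile of the radius. The names large_value_angles and top_fraction are hypothetical, and the histogram in the usage lines is only a crude stand-in for the SiZer analysis of step g.

```python
import numpy as np

def large_value_angles(x, y, top_fraction=0.1):
    """Return PIT-transformed angles, in (0, 1), of the points with the largest radii.

    Assumes x and y are positive, so that all angles lie in one quadrant.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step a (assumed form): make scales comparable by dividing by the median.
    xs = x / np.median(x)
    ys = y / np.median(y)
    # Step b (assumed form): Euclidean radius and angle from the first axis.
    radius = np.hypot(xs, ys)
    angle = np.arctan2(ys, xs)
    # Step c: replace the sorted angles by an equally spaced grid on (0, 1).
    ranks = np.argsort(np.argsort(angle, kind="stable"), kind="stable")
    pit_angle = (ranks + 0.5) / len(angle)
    # Step d: keep only angles whose radius exceeds the threshold.
    threshold = np.quantile(radius, 1.0 - top_fraction)
    return pit_angle[radius > threshold]

# Usage: a histogram of the retained angles (stand-in for SiZer).
rng = np.random.default_rng(0)
angles = large_value_angles(rng.lognormal(size=1000), rng.lognormal(size=1000))
counts, _ = np.histogram(angles, bins=10, range=(0.0, 1.0))
print(counts)
```

Counts concentrated in the end bins would correspond to step f (disassociation), counts concentrated in the central bins to step e (association).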
Large Value Association
Toy example summary:
- Uniform PIT large value angles piled up at endpoints
- clear large value disassociation
b. Independent log-normal(5.28,2.46)
- Uniform PIT large value angles piled up at endpoints
- clear large value disassociation
- Uniform PIT large value angles piled up at endpoints
- clear large value disassociation
d. Absolute Value of Standard Normal
- resulted in Uniform PIT large value angles
- "break even" point for association
e. Absolute Value of Correlated Normal
- Uniform PIT large value angles cluster in center
- clear large value association
- Uniform PIT large value angles cluster in center
- clear large value association
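Two of the toy examples can be simulated in a few lines, as a rough check of the summary above. This is a sketch under assumptions: (5.28, 2.46) is read as the log-scale mean and standard deviation, the correlation rho = 0.9 is a hypothetical choice (the notes do not give one), and the helper middle_fraction, which reports the share of large-value PIT angles in the middle third, is a crude substitute for looking at the full density. A uniform spread of angles would give about 1/3.

```python
import numpy as np

def middle_fraction(x, y, top_fraction=0.1):
    """Share of large-value PIT angles that fall in the middle third (1/3, 2/3)."""
    xs, ys = x / np.median(x), y / np.median(y)        # median rescaling
    angle = np.arctan2(ys, xs)
    pit = (np.argsort(np.argsort(angle)) + 0.5) / len(angle)   # angular PIT
    radius = np.hypot(xs, ys)
    large = radius > np.quantile(radius, 1.0 - top_fraction)   # large values
    return np.mean((pit[large] > 1 / 3) & (pit[large] < 2 / 3))

rng = np.random.default_rng(0)
n = 20000

# Toy example b: independent log-normal (parameter meaning is an assumption).
x_b = rng.lognormal(mean=5.28, sigma=2.46, size=n)
y_b = rng.lognormal(mean=5.28, sigma=2.46, size=n)

# Toy example e: absolute value of correlated normals (rho is hypothetical).
rho = 0.9
z1 = rng.standard_normal(n)
z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
x_e, y_e = np.abs(z1), np.abs(z2)

frac_b = middle_fraction(x_b, y_b)
frac_e = middle_fraction(x_e, y_e)
print(frac_b, frac_e)
```

A value well below 1/3 is in line with "piled up at endpoints", and a value well above 1/3 with "cluster in center".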
Large Value Association (cont.)
Comments on Toy Examples:
- main difference with "Asymptotic Independence" is no "extreme value normalization"
- makes Large Value Association more sensible??
(at least much easier to implement)
- when are these the same in ???
- perhaps in "heavy tailed case"?
Large Value Association (cont.)
Recall Real data experience:
- Poor job done of making "axes commensurate"
- "Angular Prob. Int. Trans." hasn't solved this
- Disappointed that now "everything looks independent"
- Recall we seemed to find interesting associations earlier
- What can be trusted with "inappropriate rescaling"?
- Did "asymp. indep. axis rescaling" address this?
Large Value Association (cont.)
Main Problem: how to rescale axes to "make commensurate"?
I.e. how to choose "rescalings" of the two axes,
to give an "appropriate LVA analysis" of the rescaled variables
Recall there are a number of choices, e.g.
- Rescale (e.g. by median)
- Prob. Int. Trans. on Angles
- Apply Prob. Int. Trans. to axes (Copula trans.)
- Prob. Int. Trans. on radii (polar coord. Copula?)
- mix and match
Problem: hard to keep track of (understand) all of these
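The choices above can be written as small interchangeable functions, which may make the mix-and-match combinations easier to keep track of. This is a sketch only: the function names are hypothetical, the PIT is the empirical (rank based) version, and the radius is assumed to be Euclidean.

```python
import numpy as np

def pit(values):
    """Empirical PIT: replace sorted values by an equally spaced grid on (0, 1)."""
    values = np.asarray(values, dtype=float)
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return (ranks + 0.5) / len(values)

def median_rescale(x, y):
    """Rescale each axis by its median."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return x / np.median(x), y / np.median(y)

def copula_transform(x, y):
    """PIT applied to each axis separately."""
    return pit(x), pit(y)

def polar(x, y):
    """Radius (assumed Euclidean) and angle of each point."""
    return np.hypot(x, y), np.arctan2(y, x)

def polar_pit(x, y):
    """PIT applied to both the radii and the angles."""
    radius, angle = polar(x, y)
    return pit(radius), pit(angle)

# Mix and match: median rescaling followed by PIT on the angles only.
x = np.array([1.0, 4.0, 2.0, 8.0])
y = np.array([3.0, 1.0, 2.0, 16.0])
radius, angle = polar(*median_rescale(x, y))
pit_angle = pit(angle)
```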
Large Value Association (cont.)
Approach: richer visualization
Setting: Thursday afternoon
Main conclusion: still had not found "appropriate rescaling"
Some Highlights:
- see axes not very commensurate
- Suggest Strong Association!?!
- seems to "choose wrong part of data"
- inappropriate modification???
- but axes look not commensurate?
- something wrong with this "commensurate" idea?
- this analysis better than "median rescaled" above?
- Still need better idea
Large Value Association (cont.)
Next Attempt: Improve "median rescaling"
to "tail quantile rescaling", using 200th from end
- Problem: still piled up on "time"
- Haven't found "correct rescaling"
- Other quantiles didn't help
Large Value Association (cont.)
Next attempt at rescaling: use "good visual aspect ratio"
Idea: For plotting a curve (set of line segments),
what is "best aspect ratio"?
Usual answer: "fill rectangular box"
S (S-plus) answer: make median slope = 1
Adaptation here:
define the "root median slope"
and use it to rescale both axes
(note this makes median slope = 1)
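One possible reading of the root median slope rescaling, as a hedged sketch: take the slope of each data point to be y / x (an assumption, since the notes do not say which slopes are used), and split the correction evenly by multiplying x and dividing y by the square root of the median slope. The function name rms_rescale is hypothetical.

```python
import numpy as np

def rms_rescale(x, y):
    """Rescale both axes so that the median of y / x becomes 1.

    Assumes positive data and that "slope" means y / x for each point.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    root_median_slope = np.sqrt(np.median(y / x))
    # New slope is (y / r) / (x * r) = (y / x) / r**2, whose median is 1.
    return x * root_median_slope, y / root_median_slope

# Usage on simulated positive data with very different scales.
rng = np.random.default_rng(1)
x = rng.lognormal(size=501)
y = 7.0 * rng.lognormal(size=501)
xs, ys = rms_rescale(x, y)
print(np.median(ys / xs))
```

With slopes defined this way, median slope = 1 means that half of the points lie above the 45 degree line and half below.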
- Good job of "spreading angles"
- But axes still don't look comparable
- Perhaps problem with "full population" vs. "tails"?
Large Value Association (cont.)
Another approach: replace "population median" with "top 200 median"
Problem: don't know "top" a priori
Solution: use the top 100 values on each axis
(count ties twice)
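A sketch of how the top 100 + 100 selection might be coded. Two assumptions are made here: "count ties twice" is read as "a point that is among the top values on both axes enters the median twice", and the rescaling itself uses the square root of the median of y / x over the selected points. The function names are hypothetical.

```python
import numpy as np

def top_k_indices(x, y, k=100):
    """Indices of the k largest x values followed by the k largest y values.

    A point that belongs to both sets appears twice in the result.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    top_x = np.argsort(x, kind="stable")[-k:]
    top_y = np.argsort(y, kind="stable")[-k:]
    return np.concatenate([top_x, top_y])

def top_k_rms_rescale(x, y, k=100):
    """Rescale both axes so the median slope over the selected points is 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = top_k_indices(x, y, k)
    root_median_slope = np.sqrt(np.median(y[idx] / x[idx]))
    return x * root_median_slope, y / root_median_slope

# Small usage example with k = 2; the point (10, 3) is in both top sets.
x = np.array([1.0, 2.0, 3.0, 10.0])
y = np.array([10.0, 1.0, 2.0, 3.0])
idx = top_k_indices(x, y, k=2)
xs, ys = top_k_rms_rescale(x, y, k=2)
```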
Top 100+100 RMS Rescaled:
- Scatterplots and Q-Q plots look very good
- Angles pile up at 0 instead of at 1
- Opposite to above
- Looks promising?
Large Value Association (cont.)
Try with Angular PIT:
- Looks good?
- Maybe "best so far"?
- Note mean angle is in middle
- So seems axes are "commensurate"
Now try for other data:
Thursday Evening, Time vs. Size:
- Not good: too many towards angle = 1
Sunday Morning, Time vs. Size:
- Similar lessons
Thursday Afternoon, Rate vs. Size:
- Again off-balance
Thursday Afternoon, Inverse Rate vs. Time:
- Even worse
Conclusion: need to do better job of:
- making axes "commensurate"
- can axis scaling work???
Large Value Association (cont.)
Another approach: choose the two axis scalings
to balance the "average of angles"
after the other transforms
Drawback: requires iterative algorithm:
1. Start with 100+100 RMS version above
2. Iterate:
a. Compute angular PIT
b. Take largest 200
c. Adjust scaling by average of angles
3. End when: |Average - 0.5| < 0.01 or 100 steps
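The iteration above might look roughly as follows in Python. Several pieces are assumptions rather than the method of the notes: the start is a plain median rescaling instead of the 100+100 RMS version, only the relative scale of y to x is adjusted, and the update rule in step 2c (a multiplicative correction with a hypothetical gain parameter) is one simple choice among many. Convergence is not guaranteed, which fits the numerical experience reported for the original.

```python
import numpy as np

def balance_mean_angle(x, y, n_large=200, tol=0.01, max_steps=100, gain=2.0):
    """Adjust the scale of y relative to x until the mean PIT angle of the
    n_large largest points is within tol of 0.5.

    Returns (scale, mean_angle, steps, converged); y * scale is the rescaled axis.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # 1. Assumed starting point: make the two medians equal.
    scale = np.median(x) / np.median(y)
    mean_angle = 0.5
    for step in range(1, max_steps + 1):
        ys = y * scale
        # 2a. Angular PIT: ranks of the angles on an equally spaced grid.
        angle = np.arctan2(ys, x)
        pit = (np.argsort(np.argsort(angle)) + 0.5) / len(angle)
        # 2b. Take the n_large points with the largest radius.
        largest = np.argsort(np.hypot(x, ys))[-n_large:]
        mean_angle = pit[largest].mean()
        # 3. Stopping rule from the notes.
        if abs(mean_angle - 0.5) < tol:
            return scale, mean_angle, step, True
        # 2c. Assumed update: shrink y if large values lean toward the y axis.
        scale *= np.exp(-gain * (mean_angle - 0.5))
    return scale, mean_angle, max_steps, False

# Usage on simulated data with very different axis scales.
rng = np.random.default_rng(0)
x = rng.lognormal(size=2000)
y = 50.0 * rng.lognormal(size=2000)
scale, mean_angle, steps, converged = balance_mean_angle(x, y)
print(scale, mean_angle, steps, converged)
```

Returning the converged flag together with the final mean angle lets a caller detect the oscillating cases instead of silently using a poor rescaling.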
Numerical experience:
- "often" converges in several steps
- sometimes took many steps
- sometimes seemed to "oscillate" (never converge)
Large Value Association (cont.)
First results: Thursday afternoon
- Looks like good rescaling
- not far from before
- but other examples suggest that was "lucky"
- showing mean angle builds confidence in rescaling
- huge improvement over previous
- shows very convincing Large Value Disassociation
- again mean angle gives confidence in rescaling
- again major improvement
- shows very convincing Large Value Association
- saw this before (with earlier Asy. Indep.)
- again mean angle very helpful
Large Value Association (cont.)
Results over all time slots:
- mean angle rescaling "seems right"?
- much better than before
- often have clear Large Value Disassociation
- mostly during peak times?
- occasional "significant spikes"
- suggest "important rates"?
- mostly at off peak times?
- can this be explained?
- mean angle rescaling "seems excellent"?
- far better than before
- all have clear Large Value Disassociation
- clear indication that size does not drive rate
- mean angle rescaling "seems excellent"?
- again far better than before
- this time see clear Large Value Association
- so this statistical technique becomes interesting
- clear "spikes of constant rate"
- not clear how they relate to peak vs. off peak?
- interesting to try to explain
Overall conclusions:
- mean angle rescaling is "way to do this"?
- lessons similar to earlier "Asy. Ind." analysis
- but now more convincingly made?
- need to think about explanations