Lecture12-3-01

Course OR 778

Class Notes 12/3/01

Last Time:

- Large Variable Association

- Toy Examples and Real Data

- Tried to make axes "commensurable"

- Used Probability Integral Transform on angles

- Used SiZer analysis to assess "association"

- Main Problem: axes not right?

Large Value Association

Note renaming (previously "Large Variable Association").

Recall Main Idea:

variation of "Asymptotic Dependence"

that is well defined for finite samples,

not just in limit as n → ∞

Goals:

1. Indicate whether large values are

"more (less) associated than usual".

2. Reduce to classical Asymptotic Independence

in limit as n → ∞ (in interesting cases)

Large Value Association (cont.)

Approach:

a. For "properly adjusted marginals"

(adjust extreme value distributions as above?)

(surely "make scales comparable", e.g. divide by median)

b. Consider "polar coordinates" versions (R, θ) of (X, Y),

where R = sqrt(X^2 + Y^2) and θ = arctan(Y / X).

c. Transform data to "angularly equally spaced",

by replacing sorted angles with an equally spaced grid

(essentially Probability Integral Transform)

d. Study association of "large values", by density of angles

where corresponding radius is large (> threshold)

e. Have "large value association" when "piled up in middle"

f. Have "large value disassociation" when "piled up at ends"

g. Use SiZer to study statistical significance
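Steps b. through d. above can be sketched in Python. This is only an illustrative sketch, not the code behind the analyses in these notes. The function name large_value_angles, the Euclidean radius, the grid values (k + 0.5) / n, and the use of the n_top largest radii in place of a fixed threshold are all assumptions made for the illustration.

```python
import math

def large_value_angles(xs, ys, n_top):
    """Return the PIT angles (in (0, 1)) of the n_top points with largest radius.

    Assumes the marginals have already been rescaled (step a.).
    """
    n = len(xs)
    radii = [math.hypot(x, y) for x, y in zip(xs, ys)]
    angles = [math.atan2(y, x) for x, y in zip(xs, ys)]

    # Angular PIT: the k-th smallest angle is replaced by the
    # equally spaced grid value (k + 0.5) / n (assumed grid).
    order = sorted(range(n), key=lambda i: angles[i])
    pit = [0.0] * n
    for rank, i in enumerate(order):
        pit[i] = (rank + 0.5) / n

    # Keep only the angles whose radius is among the n_top largest.
    top = sorted(range(n), key=lambda i: radii[i], reverse=True)[:n_top]
    return sorted(pit[i] for i in top)

# Tiny example: the two large points lie near opposite axes,
# so their PIT angles land at the two ends of (0, 1).
print(large_value_angles([1, 10, 1, 3], [1, 1, 10, 2], n_top=2))  # [0.125, 0.875]
```

A density estimate of the returned angles (e.g. via SiZer, step g.) would then show whether they pile up in the middle or at the ends.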

Large Value Association

Toy example summary:

a. Independent Pareto(1.5)

- Uniform PIT large value angles piled up at endpoints

- clear large value disassociation

b. Independent log-normal(5.28,2.46)

- Uniform PIT large value angles piled up at endpoints

- clear large value disassociation

c. Independent Exponential

- Uniform PIT large value angles piled up at endpoints

- clear large value disassociation

d. Absolute Value of Standard Normal

- resulted in Uniform PIT large value angles

- "break even" point for association

e. Absolute Value of Correlated Normal

- Uniform PIT large value angles cluster in center

- clear large value association

f. Independent Uniform

- Uniform PIT large value angles cluster in center

- clear large value association
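Three of the toy examples above can be tried out with a short simulation. This is a rough sketch under stated assumptions: the sample size 5000, the top 200 radii, the seed, and the summary "fraction of large value angles in the middle half" are illustrative choices, and the helper names are hypothetical. A kernel density estimate or SiZer would replace the crude middle fraction in a real analysis.

```python
import math
import random

def large_value_angles(xs, ys, n_top):
    """PIT angles in (0, 1) of the n_top points with largest radius."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: math.atan2(ys[i], xs[i]))
    pit = [0.0] * n
    for rank, i in enumerate(order):
        pit[i] = (rank + 0.5) / n  # equally spaced grid (assumed form)
    top = sorted(range(n), key=lambda i: math.hypot(xs[i], ys[i]), reverse=True)
    return [pit[i] for i in top[:n_top]]

def middle_fraction(angles):
    """Share of angles falling in the middle half [0.25, 0.75]."""
    return sum(0.25 <= a <= 0.75 for a in angles) / len(angles)

def simulate(draw, n=5000, n_top=200, seed=0):
    """Draw independent X and Y from `draw` and summarize the large value angles."""
    rng = random.Random(seed)
    xs = [draw(rng) for _ in range(n)]
    ys = [draw(rng) for _ in range(n)]
    return middle_fraction(large_value_angles(xs, ys, n_top))

pareto = simulate(lambda rng: rng.paretovariate(1.5))       # toy example a.
abs_normal = simulate(lambda rng: abs(rng.gauss(0.0, 1.0)))  # toy example d.
uniform = simulate(lambda rng: rng.random())                 # toy example f.
print(pareto, abs_normal, uniform)
```

If the summary above is right, one would expect a middle fraction near 0 for the Pareto case (piled up at endpoints), near 1/2 for the absolute normal case (break even), and near 1 for the uniform case (clustered in center).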

Large Value Association (cont.)

Comments on Toy Examples:

- main difference with "Asymptotic Independence" is no

"extreme value normalization"

- makes Large Value Association more sensible??

(at least much easier to implement)

- when are these the same in ???

- perhaps in "heavy tailed case"?

Large Value Association (cont.)

Recall Real data experience:

- Poor job done of making "axes commensurate"

- "Angular Prob. Int. Trans." hasn't solved this

- Disappointed that now "everything looks independent"

- Recall we seemed to find interesting associations earlier

- What can be trusted with "inappropriate rescaling"?

- Did "asymp. indep. axis rescaling" address this?

Large Value Association (cont.)

Main Problem: how to rescale axes to "make commensurate"?

I.e. how to choose "rescalings" of the two axes,

to give "appropriate LVA analysis" of the rescaled variables

Recall there are a number of choices, e.g.

- Rescale (e.g. by median)

- Prob. Int. Trans. on Angles

- Apply Prob. Int. Trans. to axes (Copula trans.)

- Prob. Int. Trans. on radii (polar coord. Copula?)

- mix and match

Problem: hard to keep track of (understand) all of these

Large Value Association (cont.)

Approach: richer visualization

Setting: Thursday afternoon

Main conclusion: still had not found "appropriate rescaling"

Some Highlights:

Med. RS & A. PIT & top 200:

- see axes not very commensurate

Copula & A. PIT & Top 200:

- Suggest Strong Association!?!

- seems to "choose wrong part of data"

- inappropriate modification???

A. PIT & top 200:

- but axes look not commensurate?

- something wrong with this "commensurate" idea?

- this analysis better than "median rescaled" above?

- Still need better idea

Large Value Association (cont.)

Next Attempt: Improve "median rescaling"

to "tail quantile rescaling", using 200th from end

n-200 QRS & A PIT & top 200:

- Problem: still piled up on "time"

- Haven't found "correct rescaling"

- Other quantiles didn't help

Large Value Association (cont.)

Next attempt at rescaling: use "good visual aspect ratio"

Idea: For plotting a curve (set of line segments)

what is "best aspect ratio" ( / )?

Usual answer: "fill rectangular box"

S (S-plus) answer: make median slope = 1

Adaptation here:

define: "root median slope"

and take: and

(note this makes median slope = 1)
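The formulas for the root median slope did not survive in these notes, so the following is only one reading that is consistent with the remark that the median slope becomes 1. It assumes positive data, takes the "slope" of a point to be y / x, sets rms = sqrt(median(y / x)), and rescales by x * rms and y / rms. The function name and this exact form are assumptions, not the definition used in the lecture.

```python
import math
import statistics

def root_median_slope_rescale(xs, ys):
    """Rescale so that the median of y / x equals 1 (assumed reading).

    Assumes all xs and ys are strictly positive.
    """
    slopes = [y / x for x, y in zip(xs, ys)]
    rms = math.sqrt(statistics.median(slopes))  # "root median slope"
    new_xs = [x * rms for x in xs]  # stretch x by rms
    new_ys = [y / rms for y in ys]  # shrink y by rms
    return new_xs, new_ys, rms

new_xs, new_ys, rms = root_median_slope_rescale([1, 2, 4], [8, 8, 4])
new_slopes = [y / x for x, y in zip(new_xs, new_ys)]
print(rms, statistics.median(new_slopes))
```

Splitting the correction evenly between the two axes (multiply one, divide the other) is a design choice of this sketch. Only the ratio of the two scalings affects the angles.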

Root Median Slope Rescaled:

- Good job of "spreading angles"

- But axes still don't look comparable

- Perhaps problem with "full population" vs. "tails"?

Large Value Association (cont.)

Another approach: replace "population median"

with "top 200 median"

Problem: don't know "top" a priori

Solution: use top 100 values of each variable

(count ties twice)
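A possible sketch of this "top 100 + 100" variant follows, again under assumptions. It reads "count ties twice" as "a point that is in both top lists contributes twice", and it reuses the assumed root median slope form sqrt(median(y / x)) for positive data. The function name and the parameter k are hypothetical.

```python
import math
import statistics

def top_rms_rescale(xs, ys, k=100):
    """Root median slope computed only from the k largest x and k largest y points.

    Assumes strictly positive data. A point in both top lists is counted twice.
    """
    n = len(xs)
    top_x = sorted(range(n), key=lambda i: xs[i], reverse=True)[:k]
    top_y = sorted(range(n), key=lambda i: ys[i], reverse=True)[:k]
    chosen = top_x + top_y  # an index in both lists appears twice
    slopes = [ys[i] / xs[i] for i in chosen]
    rms = math.sqrt(statistics.median(slopes))
    return [x * rms for x in xs], [y / rms for y in ys], rms

# Small example with k = 2 in place of 100.
scaled_xs, scaled_ys, top_rms = top_rms_rescale([1, 2, 3, 4], [4, 1, 2, 16], k=2)
print(top_rms)
```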

Top 100+100 RMS Rescaled:

- Scatterplots and Q-Q plots look very good

- Angles pile up at 0 instead of at 1

- Opposite to above

- Looks promising?

Large Value Association (cont.)

Try with Angular PIT,

TRMS RS & A PIT & top 200:

- Looks good?

- Maybe "best so far"?

- Note mean angle is in middle

- So seems axes are "commensurate"

Now try for other data:

Thursday Evening, Time vs. Size:

- Not good: too many towards angle = 1

Sunday Morning, Time vs. Size:

- Similar lessons

Thursday Afternoon, Rate vs. Size:

- Again off-balance

Thursday Afternoon, Inverse Rate vs. Time:

- Even worse

Conclusion: need to do better job of:

- making axes "commensurate"

- can axis scaling work???

Large Value Association (cont.)

Another approach: choose scalings of the two axes,

to balance "average of angles" after other transforms

Drawback: requires iterative algorithm:

1. Start with 100+100 RMS version above

2. Iterate:

a. Compute angular PIT

b. Take largest 200

c. Adjust scaling by average of angles

3. End when: |Average - 0.5| < 0.01 or 100 steps

Numerical experience:

- "often" converges in several steps

- sometimes took many steps

- sometimes seemed to "oscillate" (never converge)
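The iteration above could be sketched as follows. The notes do not say how the scaling is adjusted by the average angle, so the update used here (move the log of the ratio of the two scalings by gain * (mean angle - 0.5)) is an assumption, as are the function names and the gain parameter. The input is taken to be data already rescaled as in step 1.

```python
import math
import random

def pit_angles(xs, ys):
    """Angular PIT: replace sorted angles by the grid (k + 0.5) / n."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: math.atan2(ys[i], xs[i]))
    pit = [0.0] * n
    for rank, i in enumerate(order):
        pit[i] = (rank + 0.5) / n
    return pit

def mean_angle_rescale(xs, ys, n_top=200, tol=0.01, max_steps=100, gain=1.0):
    """Return (x_scale, y_scale, mean_angle, steps_used)."""
    log_ratio = 0.0  # log(y_scale / x_scale), starting from the supplied scaling
    for step in range(1, max_steps + 1):
        cx, cy = math.exp(-log_ratio / 2), math.exp(log_ratio / 2)
        sx = [cx * x for x in xs]
        sy = [cy * y for y in ys]
        pit = pit_angles(sx, sy)                             # step a.
        top = sorted(range(len(xs)),
                     key=lambda i: math.hypot(sx[i], sy[i]),
                     reverse=True)[:n_top]                   # step b.
        mean_angle = sum(pit[i] for i in top) / len(top)
        if abs(mean_angle - 0.5) < tol:                      # stopping rule 3.
            break
        # Step c. (assumed update): a mean angle above 0.5 means the large
        # values lean toward the y axis, so shrink y relative to x.
        log_ratio -= gain * (mean_angle - 0.5)
    return cx, cy, mean_angle, step

# Illustration: y is on a scale 50 times larger than x.
rng = random.Random(1)
xs = [rng.paretovariate(1.5) for _ in range(5000)]
ys = [50.0 * rng.paretovariate(1.5) for _ in range(5000)]
cx, cy, mean_angle, steps = mean_angle_rescale(xs, ys)
print(cx, cy, mean_angle, steps)
```

A larger gain takes bigger steps and is more prone to the oscillation mentioned above. A smaller gain is more stable but needs more steps.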

Large Value Association (cont.)

First results: Thursday afternoon

Time vs. Size:

- Looks like good rescaling

- not far from before

- but other examples suggest that was "lucky"

- showing mean angle builds confidence in rescaling

Rate vs. Size:

- huge improvement over previous

- shows very convincing Large Value Disassociation

- again mean angle gives confidence in rescaling

Inverse Rate vs. Time:

- again major improvement

- shows very convincing Large Value Association

- saw this before (with earlier Asy. Indep.)

- again mean angle very helpful

Large Value Association (cont.)

Results over all time slots:

Time vs. Size:

- mean angle rescaling "seems right"?

- much better than before

- often have clear Large Value Disassociation

- mostly during peak times?

- occasional "significant spikes"

- suggest "important rates"?

- mostly at off peak times?

- can this be explained?

Rate vs. Size:

- mean angle rescaling "seems excellent"?

- far better than before

- all have clear Large Value Disassociation

- clear indication that size does not drive rate

Inverse Rate vs. Time:

- mean angle rescaling "seems excellent"?

- again far better than before

- this time see clear Large Value Association

- so this statistical technique becomes interesting

- clear "spikes of constant rate"

- not clear how they relate to peak vs. off peak?

- interesting to try to explain

Overall conclusions:

- mean angle rescaling is "way to do this"?

- lessons similar to earlier "Asy. Ind." analysis

- but now more convincingly made?

- need to think about explanations