Class Notes:
Tuesday 8/20/02
What is "statistics"?
A personal definition:
"The science of gaining insights from data"
What are "data"?
Aside: Why say "are"? Because "data" is plural; the singular is "datum".
Answer: A "data set" is a collection of numbers
(Interesting data sets have useful meanings, and insights to explore)
A powerful approach to gaining insights: "Look" at the data
Careful:
Reading lists of numbers is rarely helpful
Example 1: 7,201 Family Incomes from Great Britain in 1975
Only about 10 x 36 = 360,
out of 7,201 (i.e. about 5%), are shown here.
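The percentage above is easy to check. This is a minimal sketch; the 10 x 36 layout is only the approximate screen size quoted in the notes:

```python
# Fraction of the 7,201 British family incomes visible on one screen.
shown = 10 * 36          # roughly 10 columns x 36 rows, per the notes
total = 7_201
fraction = shown / total
print(f"{shown} of {total} shown = {fraction:.1%}")
```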
Example 2:
6,870,022 Internet connection sizes and times,
gathered at UNC over 4 hours,
on Thursday, ??? 1:00 PM - 5:00 PM
Only about 36 are shown here:
a minuscule fraction,
but already overwhelming to "look at" as a list
Asides:
- will study both data sets in detail later on
- Internet data is a subject of active current research
Important General Lesson:
There are usually too many numbers to
"comprehend the structure" of the list
Conclusion: Need better methods of "looking at" data
Course Direction:
Will study some of these, and apply them to real data
But today look at some fun examples of how not to do this!
Source:
Wainer, H. (1984) "How to Display Data Badly", The American Statistician, 38, 137-147.
Big Picture:
An interesting article with examples of poor data visualization,
explanations of the problems, and suggestions for improvement.
Shown here:
a few personal favorites
Figure 5: Where is the deception?
The deception is in the vertical scale:
1970 looks 4 or 5 times as big,
but in fact it is only about 50% bigger
(from about 9 to 14)
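One way to quantify the vertical-scale deception is to compare bar heights measured from zero with bar heights measured from a raised baseline. A minimal sketch: the values 9 and 14 come from the notes, but the baseline of 7.5 is hypothetical, since the figure's actual axis start is not recorded here.

```python
def bar_height_ratio(old: float, new: float, baseline: float = 0.0) -> float:
    """Ratio of the two bar heights when the vertical axis starts at `baseline`."""
    if baseline >= min(old, new):
        raise ValueError("baseline must lie below both values")
    return (new - baseline) / (old - baseline)

old, new = 9.0, 14.0                       # values from Figure 5, per the notes
true_ratio = bar_height_ratio(old, new)    # honest axis starting at 0
# Hypothetical baseline, chosen to show how a truncated axis inflates the ratio.
apparent = bar_height_ratio(old, new, baseline=7.5)
print(f"true ratio {true_ratio:.2f}, apparent ratio {apparent:.2f}")
```

With that assumed baseline, the 1970 bar is drawn about 4.3 times as tall even though its value is only about 1.56 times as large.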
Mitigating factor:
the "wiggle" in the axis warns that it does not start at zero
Figure 12: Where is the deception?
Look carefully at the vertical axis:
a big chunk is missing, from 800,000 to 1,500,000
In fact, the News still has more than 2 times the circulation of the Post,
so the Post is not really all that
close to catching up
Nice irony:
the title suggests the change is based on "trust"!?!?
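The missing chunk can be sketched the same way: on a broken axis, any value above the gap is drawn shorter by the width of the gap. The gap limits come from the notes; the two circulation figures are hypothetical, chosen only so that the News is more than twice the Post.

```python
GAP_LOW, GAP_HIGH = 800_000, 1_500_000   # omitted stretch of the axis, per the notes

def plotted_height(value: float) -> float:
    """Drawn height on a broken axis that skips the range GAP_LOW..GAP_HIGH."""
    if value <= GAP_LOW:
        return value
    if value < GAP_HIGH:
        raise ValueError("value falls inside the omitted range")
    return value - (GAP_HIGH - GAP_LOW)

# Hypothetical circulations, NOT the figure's data.
post, news = 750_000, 1_600_000
print(f"true ratio     {news / post:.2f}")
print(f"apparent ratio {plotted_height(news) / plotted_height(post):.2f}")
```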
Figure 9: Where is the deception?
Don't look at the numbers, just the sizes. How much smaller is the smallest?
About 1/4 the size?
Now look at the numbers: it actually
is about half the size!
Reason: the "square law"
Recall the area of a rectangle, A = l x w (length times width).
If both length and
width are 1/2 the size, the area is 1/4 the size.
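The square law in a few lines. The rectangle's dimensions are arbitrary, and the second function (the side scaling needed for area to track a value) is an illustrative addition, not something taken from the article.

```python
import math

def area_ratio(side_scale: float) -> float:
    """Area ratio when BOTH length and width are multiplied by side_scale (A = l x w)."""
    return side_scale ** 2

def side_scale_for_area(area_scale: float) -> float:
    """Side scaling needed so that the AREA changes by area_scale."""
    return math.sqrt(area_scale)

l, w = 4.0, 3.0                               # arbitrary rectangle
small = (l / 2) * (w / 2)                     # halve both sides
print(small / (l * w))                        # 0.25, i.e. 1/4 the area
print(area_ratio(0.5))                        # 0.25
print(round(side_scale_for_area(0.5), 3))     # 0.707: half the area needs sides at ~71%
```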
Interesting fact about human perception:
when judging the size of "objects",
we respond more to area than to length
A more honest view of the same data:
Idea: return to a display of lengths, not areas.
Figure 10:
Now get correct visual impression
of relative sizes
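A length-based display needs very little machinery. A minimal text sketch using made-up values rather than the figure's data: each bar's length, not its area, is proportional to its value, so a value half as big gets a bar half as long.

```python
def text_bars(values: dict[str, float], width: int = 40) -> str:
    """One horizontal bar per label; bar LENGTH is proportional to the value.

    Assumes non-negative values with a positive maximum.
    """
    biggest = max(values.values())
    lines = []
    for label, value in values.items():
        n_chars = round(width * value / biggest)   # length scales linearly with value
        lines.append(f"{label:>6} | {'#' * n_chars} {value:g}")
    return "\n".join(lines)

# Hypothetical values: "large" is twice "small", so its bar is twice as long.
print(text_bars({"small": 1.0, "large": 2.0}, width=20))
```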
Figure 4: Where is the deception?
Very hard to see increase in Private Schools,
since everything is so small.
Better choices:
- Separate graphs for public and private
- Show percentages
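Showing percentages means dividing each group by the total for its year, so a small group's change is no longer dwarfed by the large one. A minimal sketch; the enrollment numbers are hypothetical, not the figure's data:

```python
def percent_private(public: float, private: float) -> float:
    """Private-school enrollment as a percentage of total enrollment."""
    return 100.0 * private / (public + private)

# Hypothetical enrollments (in millions) for two years, NOT the figure's data.
enrollment = {"year A": (40.0, 5.0), "year B": (40.0, 6.0)}
for year, (public, private) in enrollment.items():
    print(f"{year}: {percent_private(public, private):.1f}% private")
```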
Figure 3: Problem with this?
Too much "graph clutter", which obscures the main point.
"ddi" is "data density index", used to assess quality of display
ddi = 0.2 is "bad"
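As we understand Wainer's article, the ddi counts how many numbers a display conveys per square inch; assuming that definition, it is a one-line computation. The graph size and count below are hypothetical, picked to land on the 0.2 mentioned above.

```python
def data_density_index(n_numbers: int, width_in: float, height_in: float) -> float:
    """Numbers displayed per square inch (assumed definition of the ddi)."""
    return n_numbers / (width_in * height_in)

# Hypothetical: a 5 x 4 inch graph that conveys only 4 numbers.
print(data_density_index(4, 5.0, 4.0))   # 0.2
```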