Statistics 6D,   Visualizing Data

Class Notes:   Tuesday 8/20/02
 
 


What is "statistics"?

A personal definition:   "The science of gaining insights from data"
 
 


What are "data"?

Aside:    Why say "are"?   The singular is "datum"; the plural is "data".

Answer:    A "data set" is a collection of numbers

       (Interesting data sets have useful meanings, and insights to explore)
 
 


A powerful approach to gaining insights:    "Look" at the data

Careful:  Reading lists of numbers is rarely helpful
 

Example 1:    7,201 Family Incomes from Great Britain in 1975

Only about 10 x 36 = 360,  out of 7,201 (i.e. about 5%), are shown here.
 
 

Example 2:    6,870,022 Internet connection sizes and times,
gathered at UNC over 4 hours, on Thursday, ??? 1:00PM - 5:00 PM

Only about 36 shown here.

A minuscule fraction, but already overwhelming to "look at" as a list
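
Quick check of the fractions quoted in Examples 1 and 2 (a minimal Python sketch; the variable names are illustrative only, and the counts are the ones quoted in these notes):

```python
# Counts quoted in the notes for the two examples.
incomes_total = 7201            # family incomes, Example 1
incomes_shown = 10 * 36         # about 10 x 36 values shown
connections_total = 6_870_022   # Internet connections, Example 2
connections_shown = 36          # about 36 values shown

# Fraction of each data set that is actually visible in the list.
frac_incomes = incomes_shown / incomes_total
frac_connections = connections_shown / connections_total

print(f"Example 1: {frac_incomes:.1%} of the data shown")
print(f"Example 2: {frac_connections:.5%} of the data shown")
```

The first fraction comes out at about 5%; the second is smaller by several orders of magnitude, yet even that list is too long to read.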
 
 

Asides:

    -    will study both data sets in detail later on

    -    Internet data are the subject of active current research
 
 


Important General Lesson:    Usually too many numbers to
                                "comprehend structure" of the list

Conclusion:   Need better methods of "looking at" data

Course Direction:   Will study some of these, and apply them to real data
 


But today, we look at some fun examples of how not to do this!

Source:

Wainer, H. (1984) "How to display data badly", The American Statistician, 38, 137-147.
 

Big Picture:    an interesting article with examples of poor visualization of data, with explanations of the problems and suggestions for improvement.
 

Shown here:    a few personal favorites
 


Figure 5:    Where is the deception?


 

Deception is in the vertical scale:

1970 looks 4 or 5 times as big

in fact it only goes from about 9 to about 14   (i.e. a bit over 50% bigger)
 

Mitigating factor:    "wiggle in axis"
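
How a shifted vertical scale inflates the visual impression (a minimal Python sketch; the values 9 and 14 are the approximate ones quoted above, while the axis starting point of 7.5 is a hypothetical choice for illustration, not read from Wainer's figure):

```python
def apparent_ratio(small, large, axis_start):
    """Ratio of drawn bar heights when the vertical axis starts at axis_start."""
    return (large - axis_start) / (small - axis_start)

small, large = 9.0, 14.0        # approximate values quoted in the notes

# Honest comparison: bars measured from zero.
true_ratio = apparent_ratio(small, large, axis_start=0.0)

# Hypothetical truncated axis (7.5 is an assumption, not from the figure).
drawn_ratio = apparent_ratio(small, large, axis_start=7.5)

print(f"true ratio:  {true_ratio:.2f}")
print(f"drawn ratio: {drawn_ratio:.2f}")
```

With these assumed numbers the drawn bars differ by a factor of more than 4, although the values differ by a factor of about 1.6.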
 
 


Figure 12:    Where is the deception?


 

Look carefully at vertical axis

Big chunk missing from 800,000 to 1,500,000

In fact the News still has more than twice the circulation of the Post

The Post is not really close to catching up
 

Nice Irony:    Title suggesting change based on "trust"!?!?
 
 


Figure 9:    Where is the deception?

Don't look at numbers, just size.  How much smaller is smallest?

1/4 the size?

Now look at the numbers: it is actually about half the size!
 

Reason:    "square law"

Recall area of rectangle, A = l x w, length times width

If both length and width are 1/2 the size, area is 1/4 the size.
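
The "square law" in a few lines (a minimal Python sketch; the function name is illustrative only):

```python
def area_ratio(linear_scale):
    """Area of a rectangle whose length AND width are both scaled by linear_scale,
    relative to the original area (A = l x w)."""
    return linear_scale * linear_scale

# Scaling both sides by the data value shrinks the area by the square of it.
for scale in (1.0, 0.75, 0.5, 0.25):
    print(f"sides x {scale:.2f}  ->  area x {area_ratio(scale):.4f}")
```

So a picture drawn with both sides at half size shows only one quarter of the area, even though the number it represents is one half.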
 

Interesting fact about human perception:

When viewing "objects", the eye focuses more on area than on lengths
 

A more honest view of the same data:

Idea:    return to display of lengths, not areas.

Figure 10:

Now get correct visual impression of relative sizes
 


Figure 4:    Where is the deception?

Very hard to see increase in Private Schools,

since everything is so small.
 

Better choices:

    -    Separate graphs for public and private

    -    Show percentages
 
 


Figure 3:    Problem with this?

Too much "graph clutter", which obscures the main point.

"ddi" is "data density index", used to assess quality of display

ddi = 0.2 is "bad"
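
Computing a ddi (a minimal Python sketch, assuming the usual definition of the data density index as the number of numbers plotted divided by the area of the graphic in square inches; the function name and the example dimensions are hypothetical, not taken from Figure 3):

```python
def data_density_index(numbers_plotted, width_in, height_in):
    """Numbers plotted per square inch of graphic (assumed definition of ddi)."""
    area = width_in * height_in
    if area <= 0:
        raise ValueError("graphic area must be positive")
    return numbers_plotted / area

# Hypothetical graphic: 5 numbers spread over a 5 x 5 inch display.
ddi = data_density_index(5, 5.0, 5.0)
print(f"ddi = {ddi:.1f}")
```

Under this assumed definition, a large display carrying only a handful of numbers gets a low ddi, as in the hypothetical case here.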
 
 

