Statistics 6D,   Visualizing Data

Class Notes:  Tuesday 9/17/02
 


    -    Check new material on student pages (from Class Home Page)

    -    Progress with Histograms (Class Assignment 3)???
 


Earlier Line of Thought:    Random ("scientific"?!?) sampling,

motivated by history of polls for presidential elections...

1936:   "Just choose some" failed,  quota sample much better

1948:   Quota sampling failed, random sampling better

        Two big gains:

            -    Much smaller sample sizes   (1.4 mil   -->   50,000   -->   2000)

            -    Can quantify errors via probability theory
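
A rough illustration of "quantifying errors": the sketch below assumes a simple random sample and uses the standard worst-case 95% margin of error for a proportion, 1.96 * sqrt(0.25 / n). This formula is not in the original notes, and the function name margin_of_error is made up for this example.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample of size n; p = 0.5 is the worst case.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes taken from the random-sampling era mentioned above.
for n in (50_000, 2_000):
    moe = margin_of_error(n)
    print(f"n = {n:>6}: +/- {100 * moe:.1f} percentage points")
```

Under these assumptions a random sample of 2000 is already accurate to within roughly 2 percentage points, which suggests why much smaller samples can be enough.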
 
 


What is "random"?    Choose by "equally likely" mechanism
 

Can generate random numbers using Excel:

                                        Part 10 of Computational Tips
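
For readers without Excel, the same "equally likely" idea can be sketched in Python with the standard random module. This is only an analogue of the Excel approach, not the method from the Computational Tips; the seed and the ranges (1 to 6, 1 to 1000) are arbitrary choices for illustration.

```python
import random

rng = random.Random(2002)  # arbitrary seed so the run is repeatable

# Uniform number in [0, 1): every value in the interval is equally likely.
u = rng.random()

# Integer from 1 to 6 inclusive, each with chance 1/6 (like rolling a die).
die = rng.randint(1, 6)

# 25 distinct positions drawn without replacement from the numbers 1..1000.
positions = rng.sample(range(1, 1001), k=25)

print(u, die, sorted(positions))
```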
 


Fun example:    Estimate the proportion of male students at UNC

                                 (reflects "your chance of getting a date")
 

Question:    How to get the data?
 

Approach:    draw a sample, and use sample proportion as an "estimate"
 

Sample Size:    25    --->    reasonably large, but not too tedious
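
A minimal sketch of the approach, on a made-up population: the population size (20,000) is hypothetical, and the 43% male share borrows the "true proportion" reported later in these notes. Each run draws 25 students at random and reports the sample proportion as the estimate.

```python
import random

rng = random.Random(25)  # arbitrary seed so the run is repeatable

# Hypothetical population: 20,000 students, 43% male (True = male).
population = [True] * 8_600 + [False] * 11_400

def sample_proportion(pop, n, rng):
    """Draw n students at random without replacement; return the fraction male."""
    sample = rng.sample(pop, k=n)
    return sum(sample) / n

# Five independent samples of size 25 give five different estimates.
estimates = [sample_proportion(population, 25, rng) for _ in range(5)]
print(estimates)
```

The estimates differ from sample to sample, which is the variation the histograms below are meant to display.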
 

Deep exploration:    try both "dumb" and "smart" sampling methods
 

Method 1:    Take the 25 people "sitting nearest you in class"

Method 2:    Stand at a doorway, and "tally the first 25 people to walk through"

              (allowed to choose an "intelligent" or a "crazy" doorway, e.g. restroom door)

Method 3:    Write down the first 25 names that come into your head

                      (can know them, or else "famous people", e.g. athletes)

Method 4:    Choose a "random sample"

                                 (based on student telephone directory)

                        (sampling is important, because too many to count!)
 

Expectation:    first three are "dumb" (but for different reasons), last is "smart"
 
 


Data Analysis by Excel, from an earlier class project

                       (students actually drew samples using the above methods)

                           (we won't do this because our class is too small)

    -    Data as counts in first 4 columns

    -    Convert to proportions in next 4 columns

    -    Bin Grid for Histograms in Column J

    -    Histograms

            *    All look "mound" (bell) shaped

            *    But different "centerpoints"

            *    And different "spreads"

    -    Intuitive ideas:

            *    Q1  "moved to left":   since more females in class than at UNC

            *    Q1  "less spread":    since "less variation from smaller population"

                             (extreme case: sample size = population size)

            *    Q2  "much more spread":    reflects wide choice of doors

            *    Q3  "moved to right":    bias towards males when "thinking up names"?

            *    Q3  "very spread":    different people have different biases?

            *    Q4  "maybe about right"?
 
 


Deeper Look:

    -    Found "true proportion" = 0.43    (for that year)

    -    Can compare with sample means

            *    Q1:    0.39  <  0.43    (too small as indicated above)

            *    Q2:    0.47  >  0.43    (too big)

            *    Q3:    0.48  >  0.43   (too big, as indicated above)

            *    Q4:    0.42    Acceptably close?!?

                        (can analyze with more sophisticated statistical tools)

    -    Could do similar comparisons for "spread", as measured by "standard deviation"

    -    What "should the picture look like"?

                                   Useful statistical model:    "Binomial"

            *    Very smooth "mound shape"

            *    Looks like smooth version of others

            *    With "typical center" and "typical spread"

    -    See another statistics course for details (not done here)
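
As a preview of the Binomial picture, the sketch below computes the Binomial probabilities for a random sample of size 25 with true proportion 0.43 (both values from these notes) and prints a crude text histogram. The formulas for the center (p) and spread (sqrt(p(1-p)/n)) are standard results not derived here, and the helper name binom_pmf is made up for this example.

```python
from math import comb, sqrt

n, p = 25, 0.43  # sample size and "true proportion" from the notes

def binom_pmf(k, n, p):
    """Probability of exactly k males in a random sample of n."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

center = p                       # typical center of the sample proportion
spread = sqrt(p * (1 - p) / n)   # typical spread (standard deviation)
print(f"center = {center:.2f}, spread = {spread:.3f}")

# Crude text histogram: one '#' per 1% of probability (rounded).
for k, prob in enumerate(pmf):
    print(f"{k / n:4.2f} {'#' * round(100 * prob)}")
```

The printed bars form the smooth mound shape, centered near 0.43, that the random-sample histogram should roughly resemble.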
 
 


Class Discussion:    Devise a practical random sampling scheme

    -    based on UNC student telephone directory

    -    for sample of size 25

    -    just describe method, don't actually gather sample

    -    put description on your web site

    -    Hint:  consider random page  --->  random column   --->   random student
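
One possible way to turn the hint into a scheme, sketched on an entirely made-up directory: the layout (200 pages, 3 columns, 60 to 80 entries per column) and the names are hypothetical. The point to notice is that columns of unequal length need care: here a line number is drawn from the longest possible column, and blank lines are thrown away and redrawn, so that every student stays equally likely.

```python
import random

rng = random.Random(6)  # arbitrary seed so the run is repeatable

# Hypothetical directory: 200 pages x 3 columns, each column holding
# 60 to 80 entries (stand-ins for student names).
directory = [
    [[f"p{pg}-c{col}-s{i}" for i in range(rng.randint(60, 80))]
     for col in range(3)]
    for pg in range(200)
]

MAX_LINES = 80  # longest possible column in this made-up layout

def draw_sample(directory, n, rng):
    """Random page -> random column -> random line; redraw blanks and repeats."""
    chosen = set()
    while len(chosen) < n:
        page = rng.choice(directory)
        column = rng.choice(page)
        line = rng.randrange(MAX_LINES)
        if line < len(column):        # a blank line is thrown away and redrawn
            chosen.add(column[line])  # the set silently drops repeated students
    return sorted(chosen)

sample = draw_sample(directory, 25, rng)
print(len(sample), sample[:3])
```

Picking the line only among the entries actually present in the chosen column would be simpler, but it would slightly favor students listed in short columns.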

 

