Statistics 6D,   Visualizing Data

Class Notes:  Thursday 9/5/02
 
 


    -    Check new material on student pages (from Class Home Page)

    -    Give introduction to Excel and simple graphics (from Computing Tips)

    -    Review Histograms (from last meeting)

    -    Excel construction of histograms (from Computing Tips)
 
 


Analysis of Buffalo Snowfall data...

Background:    City of Buffalo, N.Y., known for heavy snows

Data:    Time series of annual accumulated snowfall (inches)
 


Recall Excel default histogram constructed in:  Toy Example Excel File
 

Comments:

    -    Excel chose binwidth  =  ~14

    -    Only 8 bins chosen: is the binwidth too large?

    -    Too few bins for "serious structure"?

    -    Note one year with unusually small snowfall
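As a rough illustration of what a histogram tool does, the sketch below counts how many values fall in each bin of a chosen width. It is a minimal Python sketch, not Excel's own algorithm: the function name histogram_counts and the origin parameter (left edge of the first bin) are hypothetical, and the bins here are left-closed, [a, a + h), whereas Excel's handling of a value that lands exactly on a bin edge may differ.

```python
import math

def histogram_counts(data, binwidth, origin=None):
    """Count values in left-closed bins [origin + k*h, origin + (k+1)*h).

    Hypothetical helper for illustration; origin defaults to min(data).
    Returns a list of (left_edge, count) pairs, one per bin.
    """
    if binwidth <= 0:
        raise ValueError("binwidth must be positive")
    if origin is None:
        origin = min(data)
    if origin > min(data):
        raise ValueError("origin must not exceed the smallest value")
    # Enough bins so that the largest value still lands in the last bin.
    nbins = math.floor((max(data) - origin) / binwidth) + 1
    counts = [0] * nbins
    for x in data:
        # Index of the bin containing x.
        k = math.floor((x - origin) / binwidth)
        counts[k] += 1
    return [(origin + k * binwidth, c) for k, c in enumerate(counts)]
```

For example, histogram_counts([1, 2, 2, 5, 9], 3) (made-up numbers, not the snowfall data) gives [(1, 3), (4, 1), (7, 1)]: three values in [1, 4), one in [4, 7), one in [7, 10).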
 
 


Binwidth deliberately "too small"

    -    Tried binwidth  =  3

    -    Requires many bins to include all the data

    -    Histogram looks "very bumpy"

    -    Hard to see "large-scale features of the distribution"
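Why a small binwidth needs many bins: assuming left-closed bins of width h, with the first bin starting at the smallest observation, the number of bins needed to cover all the data is

```latex
k = \left\lfloor \frac{x_{\max} - x_{\min}}{h} \right\rfloor + 1
```

so cutting h by a factor of 10 (e.g. from 30 to 3) multiplies the number of bins by roughly 10. With the same number of years spread over ten times as many bins, each bin holds only a few values, and chance fluctuations in those small counts make the histogram look bumpy.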
 
 


Binwidth "clearly too big"

    -    Tried binwidth  =  30

    -    10 times as big as above

    -    Averages taken over too big a range

    -    Obscures potential interesting population structure
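One way to see why a wide bin "averages" the data: if the bar heights are scaled so that the total area is 1 (a density-scale histogram), the height over bin B_j of width h is

```latex
\hat{f}(x) = \frac{n_j}{n\,h}, \qquad x \in B_j,
```

where n_j is the number of the n observations falling in B_j. This estimates the average of the underlying density over the whole bin, so any bump narrower than h is averaged together with its neighbors and can disappear. Excel's default chart shows the raw counts n_j instead, but with equal-width bins the two differ only by the constant factor n h, so the shape is the same.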
 
 


Binwidth "about right"???

    -    Tried binwidth  =  10

    -    "In between" the two above?

    -    Large enough to remove "sampling artifacts"?

    -    Small enough to suggest 3 modes?

    -    Interesting question:    are modes "important underlying structure"???
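To make "how many modes?" concrete, the sketch below counts local maxima in a list of bin counts. It is a crude, hypothetical helper (count_modes is not an Excel function): a flat-topped peak counts once, and the ends of the list are treated as lower than any bin. It only counts bumps at one binwidth; it cannot say whether a bump is "important underlying structure", and the answer can change when the binwidth changes.

```python
def count_modes(counts):
    """Count local maxima ("modes") in a sequence of histogram bin counts.

    A mode is a run of equal, positive counts that is strictly higher than
    the bins on either side; the ends of the sequence count as lower.
    Hypothetical helper for illustration only.
    """
    # Collapse runs of equal counts so a flat-topped peak is counted once.
    runs = []
    for c in counts:
        if not runs or runs[-1] != c:
            runs.append(c)
    modes = 0
    for i, c in enumerate(runs):
        left = runs[i - 1] if i > 0 else float("-inf")
        right = runs[i + 1] if i < len(runs) - 1 else float("-inf")
        if c > left and c > right and c > 0:
            modes += 1
    return modes
```

For example, count_modes([1, 4, 2, 5, 5, 1, 3, 0]) returns 3 (peaks at 4, at the flat 5, 5, and at 3).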
 


Again this highlights an important issue for histograms:   choice of binwidth
 
 

Recommendation:  try several binwidths

    Including both too big and too small
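The recommendation can be sketched outside Excel as a loop over binwidths, printing a crude text histogram for each one. This is a hypothetical Python sketch (text_histogram and compare_binwidths are illustrative names), using left-closed bins that all start at the same origin so the pictures differ only in binwidth.

```python
import math

def text_histogram(data, binwidth, origin):
    """Return a crude text histogram: one line per bin, one '*' per value."""
    if origin > min(data):
        raise ValueError("origin must not exceed the smallest value")
    nbins = math.floor((max(data) - origin) / binwidth) + 1
    counts = [0] * nbins
    for x in data:
        counts[math.floor((x - origin) / binwidth)] += 1
    lines = []
    for k, c in enumerate(counts):
        left = origin + k * binwidth
        lines.append(f"[{left:g}, {left + binwidth:g})  " + "*" * c)
    return "\n".join(lines)

def compare_binwidths(data, binwidths, origin=None):
    """Print one text histogram per binwidth, from smallest to largest."""
    if origin is None:
        origin = min(data)
    for h in sorted(binwidths):
        print(f"binwidth = {h}")
        print(text_histogram(data, h, origin))
        print()
```

For the snowfall example one might call compare_binwidths(snowfall, [3, 10, 30]), where snowfall is a hypothetical list holding the annual totals from the spreadsheet and 3, 10, 30 are the binwidths tried above.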
 
 


Third Class Assignment:    Explore a new data set with histograms

    -    Start with data in spreadsheet StudyHabitsIndexData.xls

            *    Numbers attempt to quantify "quality of study habits"

            *    Measured for 18 females and 20 males

            *    How do the populations compare???

    -    Address this question by an Excel analysis based on histograms

            *    Just try something, then we will compare and discuss

    -    Display your results and conclusions on a new web page

            *    Linked to your home page

            *    You select format and style of presentation

            *    But insert some graphics generated by Excel

    -    Some graphics ideas to consider:

            *    Look at two separate histos, or some "combined version"???

            *    I.e., a single graphic showing both "together" (experiment with Excel)

            *    Answers depend on binwidth: how can several be displayed effectively?

    -    Some additional questions (answer on your web page, w/ discussion):

            *    Which group "looks better on average"?

            *    Can you "quantify this idea"?  (e.g. give numerical measures)

            *    Which group "looks more spread out" (i.e., has "greater variation")?

            *    Quantify this idea by using the STDEV function in Excel

            *    Suppose you are an employer who must hire
                     somebody from one of the two groups.
                     Would you hire a female or a male, if:

                        +    You are forced to choose "at random"

                        +    You can carefully select from a large group of each type

                     Why?
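As a way to cross-check the spreadsheet numbers, here is a hypothetical Python sketch (summarize_groups is an illustrative name) that computes each group's mean and sample standard deviation and counts both groups on the same bins. Python's statistics.stdev uses the n - 1 divisor, which is the sample standard deviation that Excel's STDEV computes. Putting both groups on shared bins, starting at the smallest score overall, is one possible layout for a combined graphic, not the only one.

```python
import math
import statistics

def summarize_groups(groups, binwidth):
    """Summarize groups with mean, sample SD, and counts on shared bins.

    `groups` maps a label (e.g. "female", "male") to a list of scores;
    each group needs at least 2 scores for the sample SD.
    Bins are left-closed, of width `binwidth`, and start at the smallest
    score over all groups, so every group uses the same bins.
    Returns (summary_dict, list_of_bin_left_edges).
    """
    all_values = [x for values in groups.values() for x in values]
    origin = min(all_values)
    nbins = math.floor((max(all_values) - origin) / binwidth) + 1
    summary = {}
    for label, values in groups.items():
        counts = [0] * nbins
        for x in values:
            counts[math.floor((x - origin) / binwidth)] += 1
        summary[label] = {
            "n": len(values),
            "mean": statistics.mean(values),
            # Sample SD with divisor n - 1, like Excel's STDEV.
            "sd": statistics.stdev(values),
            "counts": counts,
        }
    edges = [origin + k * binwidth for k in range(nbins)]
    return summary, edges
```

A typical call would be summarize_groups({"female": female_scores, "male": male_scores}, binwidth=h), where female_scores and male_scores are hypothetical lists read from StudyHabitsIndexData.xls; repeating it for several values of h shows how much the histogram comparison depends on binwidth, while the means and SDs do not change.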
 
 

