Class Notes: Tuesday
11/12/02
- Check new material on student pages (from Class Home Page)
- Organize Class Lunch
Recall important Idea from histograms: Income Data
"Bin Width" controls amount of "smoothing":
- Too small: result is too "wiggly", "feels sampling variation"
- Too big: smooths away important structure
- Deep Question: Are the two bumps "really there"?
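A minimal sketch of the bin width effect, using made-up numbers (NOT the class income data). The helper names histogram_counts and count_bumps are hypothetical, chosen only for this illustration. It counts how many bumps a histogram shows as the bin width changes:

```python
import math

def histogram_counts(data, origin, width):
    """Counts per bin, where bin k covers [origin + k*width, origin + (k+1)*width)."""
    counts = {}
    for x in data:
        k = math.floor((x - origin) / width)
        counts[k] = counts.get(k, 0) + 1
    return [counts.get(k, 0) for k in range(min(counts), max(counts) + 1)]

def count_bumps(heights):
    """Number of local maxima; a flat run of equal heights counts once."""
    squeezed = [0]
    for h in list(heights) + [0]:
        if h != squeezed[-1]:
            squeezed.append(h)
    return sum(1 for i in range(1, len(squeezed) - 1)
               if squeezed[i - 1] < squeezed[i] > squeezed[i + 1])

# Synthetic stand-in data: two clusters of six points each.
data = [1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 3.1, 3.3, 3.4, 3.6, 3.7, 3.9]

bumps_by_width = {}
for width in (0.125, 1.0, 4.0):
    counts = histogram_counts(data, origin=0.0, width=width)
    bumps_by_width[width] = count_bumps(counts)
    print(width, counts, bumps_by_width[width])
```

On this toy data the smallest width gives 6 bumps (too wiggly), the middle width gives 2, and the largest lumps everything into 1 bin (structure smoothed away).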
Surprising point: Not only is "width" important, so is "location"
- Uses same binwidth all through
- But "slides grid along"
- Two bumps turn into one
- What is going on?
- Lesson: not only need to worry about binwidth
- Location can be important, too
- Effects smaller for smaller binwidth
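The "sliding grid" effect can be reproduced in a small sketch. The data below are invented for illustration (not the income data), and the helper names are hypothetical. The binwidth stays at 1.0 and only the grid location moves by half a bin:

```python
import math

def histogram_counts(data, origin, width):
    """Counts per bin, where bin k covers [origin + k*width, origin + (k+1)*width)."""
    counts = {}
    for x in data:
        k = math.floor((x - origin) / width)
        counts[k] = counts.get(k, 0) + 1
    return [counts.get(k, 0) for k in range(min(counts), max(counts) + 1)]

def count_bumps(heights):
    """Number of local maxima; a flat run of equal heights counts once."""
    squeezed = [0]
    for h in list(heights) + [0]:
        if h != squeezed[-1]:
            squeezed.append(h)
    return sum(1 for i in range(1, len(squeezed) - 1)
               if squeezed[i - 1] < squeezed[i] > squeezed[i + 1])

# Synthetic data: a narrow first peak near 1.5 and a broader second peak near 3.5.
data = [1.3, 1.4, 1.45, 1.55, 1.6, 1.7,
        2.2, 2.4, 2.6, 2.8,
        3.1, 3.2, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9,
        4.2, 4.4, 4.6, 5.3]

centered = histogram_counts(data, origin=0.0, width=1.0)  # first peak inside one bin
split = histogram_counts(data, origin=0.5, width=1.0)     # first peak split over two bins

print(centered, count_bumps(centered))  # [6, 4, 8, 3, 1] 2
print(split, count_bumps(split))        # [3, 5, 6, 6, 2] 1
```

Same data, same binwidth: two bumps when the first peak sits inside one bin, one bump when the grid edge cuts through it.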
Explanation: overlay "average of all shifts" (shown in green)
- See two clear peaks
- Histo shows 2 bumps, when 1st peak centered in a bin
- Histo shows 1 bump, when 1st peak split between two bins
- 1 or 2 bumps depends on luck of the draw???
- Casts doubt on histograms
- Better choice: use green curve for data analysis
- Called "kernel density estimate"
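A rough sketch of the "average of all shifts" idea, on made-up data and with hypothetical function names. It averages the histogram height at a point over many grid locations. As an added fact not stated in these notes: as the number of shifts grows, this average approaches a kernel density estimate with a triangle-shaped kernel, so the sketch prints both for comparison.

```python
import math

def hist_density(x, data, origin, width):
    """Histogram density at x for the grid with bin edges at origin + k*width."""
    k = math.floor((x - origin) / width)
    left = origin + k * width
    in_bin = sum(1 for d in data if left <= d < left + width)
    return in_bin / (len(data) * width)

def average_of_shifts(x, data, width, n_shifts=100):
    """Average the histogram density at x over n_shifts equally spaced grid origins."""
    total = 0.0
    for j in range(n_shifts):
        total += hist_density(x, data, j * width / n_shifts, width)
    return total / n_shifts

def triangle_kde(x, data, width):
    """Kernel density estimate with a triangle kernel of half-width `width`."""
    weights = (max(0.0, 1.0 - abs(x - d) / width) for d in data)
    return sum(weights) / (len(data) * width)

# Synthetic data (assumption, not the income data): narrow peak near 1.5, broad peak near 3.5.
data = [1.3, 1.4, 1.45, 1.55, 1.6, 1.7,
        2.2, 2.4, 2.6, 2.8,
        3.1, 3.2, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9,
        4.2, 4.4, 4.6, 5.3]

for x in (1.5, 2.5, 3.5):
    print(x, round(average_of_shifts(x, data, 1.0), 3), round(triangle_kde(x, data, 1.0), 3))
```

The averaged curve no longer depends on where the grid starts, and on this toy data it is higher at 1.5 and 3.5 than at 2.5, i.e. it keeps both peaks.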
Kernel Density Estimation: Alternate View
- Data: Chondrites
- Meteorites, i.e. meteors that hit surface of the earth
- Early question: from how many sources do they come?
- Interesting quantity: % silica
- Approach: make curve with area 1:
- tall where there are many data points
- low where there are few data points
- put small curve with area 1/n near each data point
- Add them up to make kernel density estimate
- Gives strong impression of 3 sources
- This was green curve for income data above
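The construction above (a small curve of area 1/n at each data point, then add them up) can be sketched as follows. Assumptions: a Gaussian shape for the small curves (the notes do not say which shape is used), a hand-picked window width h, and invented stand-in values rather than the actual chondrite measurements.

```python
import math

def small_curve(x, center, h, n):
    """Gaussian bump centered at one data point, scaled so its area is 1/n."""
    z = (x - center) / h
    return math.exp(-0.5 * z * z) / (n * h * math.sqrt(2.0 * math.pi))

def kde(x, data, h):
    """Kernel density estimate: the sum of the n small curves, total area 1."""
    n = len(data)
    return sum(small_curve(x, d, h, n) for d in data)

# Invented stand-in values (NOT the real % silica data).
data = [21.0, 21.5, 22.0, 27.0, 27.5, 28.0, 28.5, 33.0, 33.5, 34.0]
h = 0.8  # window width, chosen by hand for this sketch

# Riemann-sum check that the total area is close to 1.
step = 0.01
area = sum(kde(10.0 + i * step, data, h) for i in range(3501)) * step
print(round(area, 4))

# Tall where there are many data points, low where there are few.
print(kde(27.75, data, h), kde(24.5, data, h))
```

Because each small curve has area 1/n, the sum has area 1 no matter where the data points fall.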
Notes about Kernel Density Estimate:
- Still have to deal with "window width" (see Incomes Data)
- Too small: curve is too wiggly
- Too big: may smooth away important features
- About right: can find interesting structure
- Important Question: Which "bumps" are "really there"?
- I.e. important underlying structure, not sampling variation
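A small sketch of the window width effect, assuming a Gaussian kernel and made-up data (two clusters of four points each). The names kde and count_bumps are hypothetical helpers for this illustration. It counts the bumps in the estimate for three window widths:

```python
import math

def kde(x, data, h):
    """Gaussian kernel density estimate at x with window width h."""
    total = sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)
    return total / (len(data) * h * math.sqrt(2.0 * math.pi))

def count_bumps(heights):
    """Number of local maxima; a flat run of equal heights counts once."""
    squeezed = [0]
    for v in list(heights) + [0]:
        if v != squeezed[-1]:
            squeezed.append(v)
    return sum(1 for i in range(1, len(squeezed) - 1)
               if squeezed[i - 1] < squeezed[i] > squeezed[i + 1])

# Synthetic data (assumption): two clusters, around 1.75 and 6.75.
data = [1.0, 1.5, 2.0, 2.5, 6.0, 6.5, 7.0, 7.5]
grid = [-10.0 + 0.01 * i for i in range(3001)]  # evaluation points from -10 to 20

bumps_by_h = {}
for h in (0.05, 0.5, 4.0):
    bumps_by_h[h] = count_bumps([kde(x, data, h) for x in grid])
    print(h, bumps_by_h[h])
```

On this toy data the bump count drops from 8 (one per data point, too wiggly) to 2 (about right) to 1 (clusters smoothed together). Which of those bumps reflect real structure is exactly the question the notes raise.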