Data mining, truth, justice, the American Way, and the Giant Spaghetti Monster

Published in: Economy & Finance, Technology
1 Comment
  • Hi, sorry if my opinion seems crazy... but you added the FSM theory of creation and basically stated it's crazy or delusional... and dismissed the pirates vs. global warming as false (from what I understood)... I think it would be better if you swapped the FSM for something more believable and tried the same... how about the Catholic Church... not the same, huh!... how about Jews... or Buddhists... or ID believers... or Scientology... it's a matter of faith, my friend... just some Christian advice... don't judge people; what goes around comes around... or, the Buddhist way, remember karma...

    Sincerely,
    Apparently one who still believes in respect for others' beliefs
Transcript of "Data mining, truth, justice, the American Way, and the Giant Spaghetti Monster"

  1. Data Mining, Truth, Justice, the American Way, and the Flying Spaghetti Monster. tim@menzies.us, Ph.D., LCSEE, WVU, 20 Sept 2007
  2. Expose, and hose. "Part of education is to expose people to different schools of thought." - President George Bush, August 1, 2005. "Part of science is to expose people to the critical and continual (re)evaluation of ideas." - Some guy called Timm, September 20, 2007
  3. "Look up in the sky! It's a bird! It's a plane! It's Superman!" "Yes, it's Superman, strange visitor from another planet who came to Earth with powers and abilities far beyond those of mortal men. Superman, who can change the course of mighty rivers, bend steel in his bare hands; and who, disguised as Clark Kent, mild-mannered reporter for a great metropolitan newspaper, fights a never ending battle for truth, justice, and the American way." Why a never-ending battle? How to ensure justice? How to make lottsa $$? How to find truth?
  4. So, tonight: Notions of certainty. Standards for debate. Surprises: nothing is "truth", but many more things are false, and some things are useful. Implications for humility, and for justice.
  5. God gave me a brain. I take it (s)he wants me to use it. Mark of the rational: while not dead; do review and revise assumptions; done. Entertain a wide range of ideas, but don't necessarily accept them. Demand evidence that lets you repeat/refute/improve prior conclusions. But what of faith? That is another talk. There is room for the divine in my universe. But in my test tubes? Not too much.
  6. Data miners: agents that automate the creation and review of new ideas.

     Mountains of data:

       @relation weather.symbolic
       @attribute outlook {sunny, overcast, rainy}
       @attribute temperature {hot, mild, cool}
       @attribute humidity {high, normal}
       @attribute windy {TRUE, FALSE}
       @attribute play {yes, no}
       @data
       sunny,hot,high,FALSE,no
       sunny,hot,high,TRUE,no
       overcast,hot,high,FALSE,yes
       rainy,mild,high,FALSE,yes
       rainy,cool,normal,FALSE,yes
       rainy,cool,normal,TRUE,no
       overcast,cool,normal,TRUE,yes
       sunny,mild,high,FALSE,no
       sunny,cool,normal,FALSE,yes
       rainy,mild,normal,FALSE,yes
       sunny,mild,normal,TRUE,yes
       overcast,mild,high,TRUE,yes
       overcast,hot,normal,FALSE,yes
       rainy,mild,high,TRUE,no

     Tablespoons of knowledge:

       outlook = sunny
       |  humidity = high: no
       |  humidity = normal: yes
       outlook = overcast: yes
       outlook = rainy
       |  windy = TRUE: no
       |  windy = FALSE: yes
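The tree on slide 6 can be checked against the data it was learned from. The sketch below is a hand transcription of that tree into a Python function (the names `WEATHER`, `predict`, `rows`, and `correct` are illustrative, not from the talk), run over the 14 instances listed in the slide's @data section.

```python
# The 14 instances from the slide's @data section:
# outlook, temperature, humidity, windy, play.
WEATHER = """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""


def predict(outlook, temperature, humidity, windy):
    """Hand transcription of the slide's tree; temperature is never tested."""
    if outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    if outlook == "overcast":
        return "yes"
    # Remaining case: outlook == "rainy".
    return "no" if windy == "TRUE" else "yes"


rows = [line.split(",") for line in WEATHER.split()]
correct = sum(predict(*row[:4]) == row[4] for row in rows)
print(f"{correct}/{len(rows)} training instances classified correctly")
```

The tree reproduces every label in the table, but that is a score on the training data only, which says little about how it would do on new days.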
  7. Data doubling every 20 months: Internet, Radio Frequency Identification (RFID) tracking, on-line shopping (patterns of sales tracked at Amazon). So now we can automatically learn answers to many questions; e.g. What eggs to select for IVF? What will software cost to develop? What diseases does a patient have? Which loan applications to fund? What houses will have the best resale value? Which parts of the program need more inspection? What products are best to sell to what markets? What cows to keep and which to send to the abattoir? How to teach a satellite to distinguish between cloud shadows and oil spills? How much electricity will be needed in two hours, i.e. what coal-powered generators to fire up?
  8. More fundamentally, what can we say about the world, with any certainty? Same data, different data miners, different conclusions. Every miner is biased by: Evaluation bias. Language: what is the "shape" of the models we can learn (decision trees, equations, etc.)? Search: pruning the possibly infinite space of candidate models; what not to explore. Over-fitting avoidance: how to stop the learner fixating on noise, e.g. pruning back decision trees.
  9. Any learning scheme has many biases. • Bias lets us ignore "stuff". • Without it, we don't know what is important or dull; we can't summarize or generalize. • Without bias, we can't learn from the past. • Bias blinds us but lets us see the future. • But changing biases changes what we best believe. • No wonder truth is a never-ending battle.
  10. Generalizing from the past works. Sometimes, very clearly: heavy smokers have a 2000% to 3000% higher chance of lung cancer. Learned theories perform very well on new data. But ... the "best" learned theory can be a moveable feast.
  11. So, a relativistic soup? No certainty? No way to plan effective actions? No way to rule out absurd notions?
  12. I don't want to offend anyone, but… I think that once there were no cell phones or iPods, or clothes, or countries, or language, or human society, or 4-valved hearts, or homeostasis, or organs, or brains, or planets, or stars, or matter. Where the net energy in-flow is positive, the universe selects for self-perpetuating systems, an exponentially decreasing number of which are of exponentially increasing complexity. Should I even say this in a public place? "Part of education is to expose people to different schools of thought." - President George Bush, August 1, 2005. Shouldn't I have to give credence to all theories? Evolution, intelligent design, pirates cause global warming?
  13. The Church of the Flying Spaghetti Monster (FSM). Founded in 2005 by OSU physics graduate Bobby Henderson. A protest against the decision by the Kansas State Board of Education that required the teaching of intelligent design as an alternative to biological evolution. Henderson wrote to the board professing belief in a supernatural Creator called the Flying Spaghetti Monster. He demanded that his "Pastafarian" theory of creation be taught in science classrooms.
  14. FSM is not about religion. It is a mistake to view FSM as anti-religion. Rather, FSM is anti-anti-scientific rigor. No one in their right mind would ever believe this nonsense. And that's the point. Truth is a never-ending battle. We must have standards to assess scientific theories, to reject absurdities. Or any nonsense can be released on this world, e.g. "Global warming is caused by pirates."
  15. Wikipedia on FSM. FSM: an invisible, undetectable Flying Spaghetti Monster. Evidence for evolution planted by FSM to test Pastafarians' faith. FSM changes the results of measurements, like radiocarbon dating, via His Noodly Appendage. Heaven contains beer volcanoes and a stripper factory. Hell is similar, but with stale beer and diseased strippers. Pirates are "absolute divine beings" and the original Pastafarians. Their image as "thieves and outcasts" is misinformation spread by Christian theologians in the Middle Ages and Hare Krishnas. Pirates are "peace-loving explorers and spreaders of good will" who distributed candy to small children. Global warming, earthquakes, hurricanes, and other natural disasters are a direct effect of the shrinking numbers of pirates since the 1800s.
  16. FSM "proof" of the divinity of pirates. A case study on how not to present data: X-axis deliberately misleading. Crazy? Yes! But would you recognize such craziness if you saw it again?
  17. What is the "best" weight-loss diet?
  18. "How lucky for those in power that people don't think." - Adolf Hitler. I.e. people trying to sell you their diet book.
  19. What is the "best" programming language?
  20. To our peril, we trust old ideas too much. Columbia ice strike: size 1200 cubic inches, speed 477 mph (relative to vehicle). Certified as "safe" by the CRATER micro-meteorite model. A typical experiment in CRATER's test database: size, a 3 cubic inch piece of debris; speed, under 150 mph.
  21. Value of estrogen (NYT magazine, Sept 16, 2007). 1990s: American Heart Association recommends hormone replacement therapy for older women to ward off heart disease and osteoporosis. 2001: 15 million Americans filling H.R.T. prescriptions annually. 2002: estrogen therapy exposed as a hazard, not a benefit, for health. Failure of scientific method: benefits of estrogen were reported from large observational studies, not randomized trials. Repeated epidemiological finding: randomized trials rarely support conclusions from observational studies. So forget what you've read about anti-oxidants like vitamins E & C & beta carotene preventing heart disease, or fiber preventing colon cancer.
  22. So, why is FSM silly? And please, rest assured, it is very very silly stuff indeed. Theories need an entrance exam. Many possible theories, one for each bias. Demand that a theory has passed at least some operational test before we condone it, act on it. If no reason to accept the new, don't. Trust the most what has been challenged the most (Karl Popper).
  23. No things are "right", but some things are "useful". Sure, one data set supports many theories. But there are many many more theories that are unsupported. No model is right, but some things are useful (perform well on test data) (George Box). And many many many more ideas are useless: can't make predictions; not defined enough to support (possible) refutation.
  24. Wolfgang Pauli. The "conscience of physics", the critic to whom his colleagues were accountable. Scathing in his dismissal of poor theories, often labeling them ganz falsch, utterly false. But "ganz falsch" was not his most severe criticism. He hated theories so unclearly presented as to be untestable, unevaluatable. Worse than wrong because they could not be proven wrong. Not properly belonging within the realm of science, even though posing as such. Famously, he wrote of one such unclear paper: "This paper is not right. It is not even wrong."
  25. Believe those who seek the truth; doubt those who find it. - André Gide
  26. Don't test once on just the training data. Study more than the average performance. Also look at the variance. E.g. here, no significant on new data after X=8
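Slide 26's advice (hold data out, then report the spread as well as the mean) can be sketched with k-fold cross-validation. Everything below is an illustrative assumption rather than the talk's own experiment: the synthetic one-dimensional data, the toy learner `learn_threshold`, and the helper `k_fold_scores` are all hypothetical names.

```python
import random
import statistics


def k_fold_scores(rows, learn, k=5, seed=0):
    """Return one hold-out accuracy per fold; rows are (feature, label) pairs."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = learn(train)
        hits = sum(model(x) == y for x, y in test)
        scores.append(hits / len(test))
    return scores


def learn_threshold(train):
    """Toy learner (an assumption): cut at the midpoint of the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    cut = (statistics.mean(pos) + statistics.mean(neg)) / 2
    return lambda x: 1 if x > cut else 0


# Synthetic data: class 0 centred on 0, class 1 centred on 2.
rng = random.Random(1)
data = [(rng.gauss(0, 1), 0) for _ in range(50)]
data += [(rng.gauss(2, 1), 1) for _ in range(50)]

scores = k_fold_scores(data, learn_threshold, k=5)
mean = statistics.mean(scores)
spread = statistics.stdev(scores)
print(f"accuracy: mean={mean:.2f} stdev={spread:.2f} over {len(scores)} folds")
```

Two learners with the same mean accuracy can differ widely in `spread`; a difference in means that is smaller than the spread is weak evidence that one learner is better.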
  27. If something works, poke it till it breaks. i) Sort attributes on "infogain". ii) Learn using first N attributes. [Plots for four data sets: labor, soybean, diabetes, anneal.] A few variables are (often) enough.
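Step (i) of slide 27 can be sketched on the weather data from slide 6. The code below computes information gain for each attribute and sorts on it; step (ii) is shown only as a column filter (`first_n`), since the talk does not say which learner was run on the reduced data. All function and variable names here are illustrative.

```python
from collections import Counter
from math import log2

NAMES = ["outlook", "temperature", "humidity", "windy"]
ROWS = [line.split(",") for line in """
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
""".split()]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def info_gain(rows, col):
    """Class entropy minus the expected class entropy after splitting on col."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def first_n(rows, ranked, n):
    """Keep only the n best-ranked attribute columns, plus the class column."""
    cols = [NAMES.index(name) for name in ranked[:n]]
    return [[r[c] for c in cols] + [r[-1]] for r in rows]


gains = {name: info_gain(ROWS, i) for i, name in enumerate(NAMES)}
ranked = sorted(NAMES, key=gains.get, reverse=True)
for name in ranked:
    print(f"{name:12s} {gains[name]:.3f}")
reduced = first_n(ROWS, ranked, 2)
```

On this table the ranking comes out as outlook, humidity, windy, temperature, which matches the slide 6 tree: it never tests temperature, the lowest-ranked attribute.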
  28. Living with uncertainty. Check how training set size affects theory.
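One way to run the check on slide 28 is a learning curve: train on progressively larger prefixes of the training data and score each model on the same hold-out set. This is a minimal sketch on synthetic data with a toy threshold learner; `sample`, `learn_threshold`, and `learning_curve` are hypothetical names, not anything from the talk.

```python
import random
import statistics


def sample(n, rng):
    """Synthetic data (an assumption): classes alternate, class y is centred on 2*y."""
    return [(rng.gauss(2.0 * (i % 2), 1.0), i % 2) for i in range(n)]


def learn_threshold(train):
    """Toy learner: cut at the midpoint of the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    cut = (statistics.mean(pos) + statistics.mean(neg)) / 2
    return lambda x: 1 if x > cut else 0


def learning_curve(train, test, learn, sizes):
    """Return (training size, hold-out accuracy) pairs for growing prefixes of train."""
    curve = []
    for n in sizes:
        model = learn(train[:n])
        accuracy = sum(model(x) == y for x, y in test) / len(test)
        curve.append((n, accuracy))
    return curve


rng = random.Random(2)
train, test = sample(128, rng), sample(200, rng)
curve = learning_curve(train, test, learn_threshold, [4, 8, 16, 32, 64, 128])
for n, accuracy in curve:
    print(f"train size {n:4d}: hold-out accuracy {accuracy:.2f}")
```

If the curve is still climbing at the largest size, more data may change the theory; if it has flattened, more of the same data is unlikely to help.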
  29. Living with uncertainty. Launch learners with anomaly detection and repair tools.
  30. Living with uncertainty: count, alert, fix. An incremental discretizer + a Bayes classifier where all inputs are mono-classified. Track average max likelihood for data processed in "eras" of X instances. Count: stuff seen in past. Alert: if new counts different. Fix: find delta new to old (contrast set learning). Linear time inference, very, very fast, tiny memory footprint. And, it works [Orrego, 2004]: F15 simulator data [courtesy B. Cukic]. Five flights: a, b, c, d, e, each with a different off-nominal condition imposed at "time" 15. Off-nominal condition not present in prior data. In all cases, massive change detected.
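The "count" and "alert" steps of slide 30 can be sketched as follows. This is not the implementation from [Orrego, 2004]: it swaps the incremental discretizer for fixed-width bins, uses a single-class naive Bayes likelihood with Laplace smoothing, raises an alert when an era's average log-likelihood falls well below the previous era's, and leaves out the "fix" step entirely. The class `EraMonitor`, its parameters, and the synthetic stream are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


class EraMonitor:
    """Counts binned values per attribute and scores each era against the past."""

    def __init__(self, n_attrs, era_size=100, bin_width=0.5, drop=2.0):
        self.counts = [Counter() for _ in range(n_attrs)]
        self.seen = 0            # instances folded into the counts so far
        self.era_size = era_size
        self.bin_width = bin_width
        self.drop = drop         # alert if the average log-likelihood falls this far
        self.era = []            # binned instances of the era in progress
        self.baseline = None     # average log-likelihood of the previous scored era

    def _bins(self, row):
        # Fixed-width binning (a stand-in for the slide's incremental discretizer).
        return [math.floor(x / self.bin_width) for x in row]

    def _loglike(self, bins):
        # One class only: sum of per-attribute log frequencies, Laplace-smoothed.
        total = 0.0
        for counter, b in zip(self.counts, bins):
            total += math.log((counter[b] + 1) / (self.seen + len(counter) + 1))
        return total

    def add(self, row):
        """Buffer one instance; when an era completes, return True if it looks anomalous."""
        self.era.append(self._bins(row))
        if len(self.era) < self.era_size:
            return False
        era, self.era = self.era, []
        alert = False
        if self.seen:  # Alert: score the era only once some history exists
            avg = sum(self._loglike(b) for b in era) / len(era)
            alert = self.baseline is not None and avg < self.baseline - self.drop
            self.baseline = avg
        for bins in era:  # Count: fold the finished era into the history
            for counter, b in zip(self.counts, bins):
                counter[b] += 1
        self.seen += len(era)
        return alert


# Synthetic stream: three nominal eras, then one with both attributes shifted.
rng = random.Random(0)
monitor = EraMonitor(n_attrs=2)
alerts = []
for era in range(4):
    shift = 8.0 if era == 3 else 0.0
    for _ in range(100):
        flagged = monitor.add([rng.gauss(shift, 1), rng.gauss(shift, 1)])
    alerts.append(flagged)
print(alerts)
```

Each instance costs one counter lookup and one update per attribute, which is in keeping with the slide's claims of linear-time inference and a small memory footprint; the thresholds (`era_size`, `bin_width`, `drop`) would need tuning on real data.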
  31. Living with uncertainty. Life is a balance between Policy #1, exploration (tolerate the sub-optimal, a little; doing crazy things to learn new things) and Policy #2, exploitation (fix your theories and base your work on those fixed ideas). Kuhn: • most "science" is puzzle solving… • … within existing paradigms. • Sometimes the paradigm breaks down… • … prompting revolutionary research. Human young: • Do crazy things (take long trips). • Less craziness as we grow older.
  32. Tolerance of "exploration". Critical to the American way. America: history of tolerance and acceptance. 1945: 400 German rocket scientists choose to surrender to the Yankees, not the Russians. They choose their post-war life based on their perceptions of American ideology. Hence,
  33. Tolerance = hi-tech = $$$. R. Florida: The Economic Geography of Talent, Annals of the Association of American Geographers 92(4), 2002, pp. 743-755. Best predictor for hi-tech industry: R² 0.42 to "coolness"; R² 0.49 to cultural amenities; R² 0.50 to median house value; R² 0.77 to "diversity" index.
  34. Data Mining, Truth, Justice, the American Way & Flying Spaghetti Monsters. "Superman, fights a never ending battle for truth, justice, and the American way." To make $$, institutionalize exploration and tolerance. Old conclusions must be constantly re-assessed. No "truth": all is biased. A healthy hi-tech needs tolerance to support exploration, and that the FSM is silly, but would consider revising that view if new evidence emerges.
  35. Expose, and hose. "Part of education is to expose people to different schools of thought." - President George Bush, August 1, 2005. "Part of science is to expose people to the critical and continual (re)evaluation of ideas." - Some guy called Timm, September 20, 2007