Data mining, truth, justice, the American Way, and the Giant Spaghetti Monster
  • Hi, sorry if my opinion seems crazy... but you added the FSM theory of creation and basically stated it's crazy or delusional... and dismissed the pirates vs. global warming as false (from what I understood)... I think it would be better if you swapped the FSM for something more believable and tried the same... how about the Catholic church... not the same, huh!... how about Jews.. or Buddhists.. or ID believers.. or Scientology.. it's a matter of faith, my friend... just some Christian advice... don't judge people; what goes around comes around... or the Buddhist way, remember karma...

    Sincerely,
    Apparently one who still believes in respect for others' beliefs

Transcript

  • 1. Data Mining, Truth, Justice, the American Way, and the Flying Spaghetti Monster tim@menzies.us Ph.D. LCSEE, WVU, 20 Sept 2007
  • 2. Expose, and hose
    • "Part of education is to expose people to different schools of thought." - President George Bush, August 1, 2005
    • "Part of science is to expose people to the critical and continual (re)evaluation of ideas." - Some guy called Timm, September 20, 2007
  • 3. "Look up in the sky! It's a bird! It's a plane! It's Superman!" "Yes, it's Superman, strange visitor from another planet who came to Earth with powers and abilities far beyond those of mortal men." "Superman, who can change the course of mighty rivers, bend steel in his bare hands; and who, disguised as Clark Kent, mild-mannered reporter for a great metropolitan newspaper, fights a never ending battle for truth, justice, and the American way."
    • Why a never-ending battle?
    • How to ensure justice?
    • How to make lottsa $$?
    • How to find truth?
  • 4. So, tonight
    • Notions of certainty
    • Standards for debate
    • Surprises
      • Nothing is “truth”
      • but many more things are false
      • And some things are useful
    • Implications for humility
    • And for justice
  • 5. God gave me a brain. I take it (s)he wants me to use it.
    • Mark of the rational
      • while not dead; do Review and revise assumptions; Done
      • Entertain a wide range of ideas
        • But don’t necessarily accept them
      • Demand evidence
        • that lets you repeat/refute/improve prior conclusions
    • But what of faith?
      • That is another talk
      • There is room for the divine in my universe
    • But in my test tubes?
      • Not too much
  • 6. Data miners: agents that automate the creation and review of new ideas
    Mountains of data:
        @relation weather.symbolic
        @attribute outlook {sunny, overcast, rainy}
        @attribute temperature {hot, mild, cool}
        @attribute humidity {high, normal}
        @attribute windy {TRUE, FALSE}
        @attribute play {yes, no}
        @data
        sunny,hot,high,FALSE,no
        sunny,hot,high,TRUE,no
        overcast,hot,high,FALSE,yes
        rainy,mild,high,FALSE,yes
        rainy,cool,normal,FALSE,yes
        rainy,cool,normal,TRUE,no
        overcast,cool,normal,TRUE,yes
        sunny,mild,high,FALSE,no
        sunny,cool,normal,FALSE,yes
        rainy,mild,normal,FALSE,yes
        sunny,mild,normal,TRUE,yes
        overcast,mild,high,TRUE,yes
        overcast,hot,normal,FALSE,yes
        rainy,mild,high,TRUE,no
    Tablespoons of knowledge:
        outlook = sunny
        | humidity = high: no
        | humidity = normal: yes
        outlook = overcast: yes
        outlook = rainy
        | windy = TRUE: no
        | windy = FALSE: yes
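As a rough illustration of the slide's "tablespoons of knowledge", the learned tree can be written as a plain function and replayed against the 14 weather records. This is only a sketch: the names `WEATHER`, `tree`, and `accuracy` are invented for this example and do not come from the talk.

```python
# The 14 records from the slide: outlook, temperature, humidity, windy, play.
WEATHER = """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
""".strip().splitlines()


def tree(outlook, temperature, humidity, windy):
    """The decision tree from the slide, written as nested tests.

    Temperature is accepted but never consulted: the tree ignores it.
    """
    if outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    if outlook == "overcast":
        return "yes"
    # Remaining case: outlook == "rainy"
    return "no" if windy == "TRUE" else "yes"


def accuracy(rows):
    """Count how many records the tree labels correctly."""
    hits = 0
    for row in rows:
        *features, play = row.split(",")
        hits += tree(*features) == play
    return hits


print(accuracy(WEATHER), "of", len(WEATHER), "records match the tree")
```

The tree reproduces every record it was learned from, using only three of the four attributes. That says nothing yet about how it would fare on new data.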
  • 7. Data doubling every 20 months
    • Internet, Radio Frequency Identification (RFID) tracking, on-line shopping (patterns of sales tracked at Amazon)
    • So now we can automatically learn answers to many questions; e.g.
      • What eggs to select for IVF?
      • What will software cost to develop?
      • What diseases does a patient have?
      • Which loan applications to fund?
      • What houses will have the best resale value?
      • Which parts of the program need more inspection?
      • What products are best to sell to what markets?
      • What cows to keep and which to send to the abattoir?
      • How to teach a satellite to distinguish between cloud shadows and oil spills?
      • How much electricity will be needed in two hours, i.e. what coal-powered generators to fire up?
  • 8. More fundamentally, what can we say about the world, with any certainty?
    • Same data, different data miners: different conclusions
    • Every miner biased by
      • Evaluation bias
      • Language
        • What is the “shape” of the models we can learn?
        • Decision trees, equations, etc
      • Search
        • Pruning the possibly infinite space of candidate models
        • What not to explore
      • Over-fitting avoidance
        • How to stop the learner fixating on noise
        • E.g. pruning back decision trees
  • 9. Any learning scheme has many biases
    • Bias lets us ignore “stuff”.
    • Without it, we don’t know what is important or dull, we can’t summarize, generalize.
    • Without bias, we can’t learn from the past
    • Bias blinds us but lets us see the future
    • But changing biases changes what we best believe
    • No wonder truth is a never-ending battle
  • 10. Generalizing from the past works
    • Sometimes, very clearly
      • Heavy smokers have a 2000% to 3000% higher chance of lung cancer
    • Learned theories perform very well on new data
    • But ... the “best” learned theory can be a moveable feast.
  • 11. So, a relativistic soup?
    • No certainty?
    • No way to plan effective actions?
    • No way to rule out absurd notions?
  • 12. I don’t want to offend any one, but…
    • … I think that once there were no cell phones or iPods, or clothes, or countries, or language, or human society, or 4-valved hearts, or homeostasis, or organs, or brains, or planets, or stars, or matter
    • Where the net energy in-flow is positive…
      • the universe selects for self-perpetuating systems,
      • an exponentially decreasing number of which are of exponentially increasing complexity
    • … Should I even say this in a public place?
      • "Part of education is to expose people to different schools of thought." President George Bush, August 1, 2005
    • Shouldn’t I have to give credence to all theories?
      • Evolution
      • Intelligent design
      • Pirates cause global warming?
  • 13. The Church of the Flying Spaghetti Monster (FSM)
    • Founded in 2005
      • OSU physics graduate Bobby Henderson
    • A protest against the decision by the Kansas State Board of Education
      • to require the teaching of intelligent design as an alternative to biological evolution.
    • Henderson wrote to the board
      • professing belief in a supernatural Creator called the Flying Spaghetti Monster
      • Demanded that his "Pastafarian" theory of creation be taught in science classrooms.
  • 14. FSM is not about religion
    • It is a mistake to view FSM as anti-religion
      • Rather, FSM is anti-anti-scientific rigor
    • No one in their right mind would ever believe this nonsense
      • And that’s the point
    • Truth is a never-ending battle
      • We must have standards to assess scientific theories, to reject absurdities
      • Or any nonsense can be released on this world
      • E.g. “Global warming is caused by pirates.”
  • 15. Wikipedia on FSM
    • FSM: an invisible, undetectable Flying Spaghetti Monster
    • Evidence for evolution planted by FSM to test Pastafarians' faith
    • FSM changes the results of measurements, like radiocarbon dating, via His Noodly Appendage.
    • Heaven contains beer volcanoes and a stripper factory.
    • Hell is similar, but with stale beer and diseased strippers.
    • Pirates are "absolute divine beings" and the original Pastafarians.
    • Their image as "thieves and outcasts" is misinformation spread by Christian theologians in the Middle Ages and Hare Krishnas.
    • Pirates are "peace-loving explorers and spreaders of good will" who distributed candy to small children.
    • Global warming, earthquakes, hurricanes, and other natural disasters are a direct effect of the shrinking numbers of pirates since the 1800s.
  • 16. FSM “proof” of the divinity of pirates
    • A case study on how not to present data
    • X-axis deliberately misleading.
    • Crazy? Yes!
    • But would you recognize such craziness if you saw it again?
  • 17. What is the “best” weight-loss diet?
  • 18. How lucky for those in power that people don't think. - Adolf Hitler
    • i.e. people trying to sell you their diet book
  • 19. What is the “best” programming language?
  • 20. To our peril, we trust old ideas too much
    • Columbia ice strike:
      • Size: 1200 in³
      • Speed: 477 mph (relative to vehicle)
    • Certified as “safe” by the CRATER micro-meteorite model
    • A typical experiment in CRATER’s test database
      • Size: 3 in³ piece of debris
      • Speed: under 150 mph.
  • 21. Value of estrogen (NYT magazine, Sept 16, 2007)
    • 1990s:
      • American Heart Association recommends hormone replacement therapy for older women to ward off heart disease and osteoporosis.
    • 2001:
      • 15 million Americans filling H.R.T. prescriptions annually
    • 2002:
      • estrogen therapy exposed as a hazard, not a benefit, for health
    • Failure of scientific method
      • Benefits of estrogen reported from large observational studies, not randomized trials
    • Repeated epidemiological finding:
      • randomized trials rarely support conclusions from observational studies.
    • So forget what you’ve read about
      • Anti-oxidants like vitamins E & C & beta carotene preventing heart disease
      • Fiber prevents colon cancer
  • 22. So, why is FSM silly?
    • And please, rest assured,
      • it is very very silly stuff indeed.
    • Theories need an entrance exam
      • Many possible theories
        • one for each bias
      • Demand that a theory has passed at least some operational test before we condone it, act on it.
      • If no reason to accept the new, don’t
    • Trust the most what has been challenged the most
      • Karl Popper
  • 23. No things are “right”, but some things are “useful”
    • Sure, one data set supports many theories.
      • But there are many many more theories that are unsupported.
    • No model is right, but some things are useful
      • (perform well on test data)
      • George Box
    • And many many many more ideas are useless
      • Can’t make predictions
      • Not defined enough to support (possible) refutation
  • 24. Wolfgang Pauli
    • The "conscience of physics",
      • the critic to whom his colleagues were accountable.
    • Scathing in his dismissal of poor theories
      • often labeling them ganz falsch, utterly false.
    • But “ganz falsch” was not his most severe criticism,
      • He hated theories so unclearly presented as to be untestable
      • unevaluatable,
    • Worse than wrong because they could not be proven wrong.
      • Not properly belonging within the realm of science,
      • even though posing as such.
    • Famously, he wrote of one such unclear paper:
      • "This paper is not right. It is not even wrong."
  • 25. Believe those who seek the truth; doubt those who find it -Andre Gide.
  • 26. Don’t test once on just the training data
    • Study more than the average performance
    • Also look at the variance
    • E.g. here, no significant change on new data after X=8
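One way to act on this advice is to hold out part of the data several times and report the spread of the scores as well as their mean. The sketch below assumes a 7-fold cross-validation over the 14 weather records from slide 6. The learner `zero_r` (always predict the majority class) is only a stand-in, and the names `ROWS`, `zero_r`, and `cross_val` are invented for this example.

```python
import random
import statistics
from collections import Counter

# Weather records from slide 6; the last field is the class ("play").
ROWS = [line.split(",") for line in """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
""".strip().splitlines()]


def zero_r(train):
    """Stand-in learner: always predict the majority class of the training rows."""
    majority = Counter(row[-1] for row in train).most_common(1)[0][0]
    return lambda row: majority


def cross_val(rows, learner, folds=7, seed=1):
    """Return one accuracy per fold; each fold is tested on rows it never trained on."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    scores = []
    for i in range(folds):
        test = rows[i::folds]
        train = [r for j, r in enumerate(rows) if j % folds != i]
        model = learner(train)
        scores.append(sum(model(r) == r[-1] for r in test) / len(test))
    return scores


scores = cross_val(ROWS, zero_r)
print("mean  %.2f" % statistics.mean(scores))
print("stdev %.2f" % statistics.stdev(scores))
```

A learner worth keeping should beat this baseline by more than the fold-to-fold spread; a higher mean alone is weak evidence.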
  • 27. If something works, poke it till it breaks
    • i) Sort attributes on “infogain”
    • ii) Learn using first N attributes
    • [Figure: one panel per data set: labor, soybean, diabetes, anneal]
    • A few variables are (often) enough
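Steps (i) and (ii) can be sketched on the small weather data from slide 6 (not the labor, soybean, diabetes, or anneal sets plotted on the slide). The code ranks attributes by information gain, the usual entropy-based definition, and then keeps only the top N columns for a learner to use. The names `NAMES`, `ROWS`, `entropy`, `info_gain`, `rank`, and `first_n` are invented for this example.

```python
import math
from collections import Counter

NAMES = ["outlook", "temperature", "humidity", "windy"]
ROWS = [line.split(",") for line in """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
""".strip().splitlines()]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, col):
    """Entropy of the class minus its expected entropy after splitting on column col."""
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


def rank(rows):
    """Step (i): attribute names sorted from most to least informative."""
    gains = {name: info_gain(rows, i) for i, name in enumerate(NAMES)}
    return sorted(gains, key=gains.get, reverse=True)


def first_n(rows, ranked, n):
    """Step (ii): keep only the n best-ranked columns, plus the class."""
    keep = [NAMES.index(name) for name in ranked[:n]]
    return [[r[i] for i in keep] + [r[-1]] for r in rows]


ranked = rank(ROWS)
print(ranked)
print(first_n(ROWS, ranked, 2)[0])
```

Re-running a learner on `first_n(ROWS, ranked, n)` for growing `n`, and watching where performance stops improving, is one way to check the slide's claim that a few variables are often enough.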
  • 28. Living with Uncertainty
    • Check how training set size affects theory
  • 29. Living with Uncertainty
    • Launch learners with anomaly detection and repair tools
  • 30. Living with uncertainty: count, alert, fix
    • An incremental discretizer + a Bayes classifier where all inputs are all mono-classified
    • Track average max likelihood for data processing in “eras” of X instances
      • Count: stuff seen in past
      • Alert: if new counts different
      • Fix: find delta new to old (contrast set learning)
    • Linear time inference, very, very fast
    • Tiny memory footprint
    • And, it works [Orrego, 2004]
      • F15 simulator data [courtesy B. Cukic]
      • Five flights: a,b,c,d,e
        • each with different off-nominal condition imposed at “time” 15
      • Off-nominal condition not present in prior data
      • In all cases, massive change detected
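The "count" and "alert" steps can be approximated in a few lines. The sketch below is not the method of [Orrego, 2004]; it is a simplified stand-in that assumes already-discretized inputs, a single class, add-one smoothing, and a fixed alert threshold (`drop`, in natural-log units). The function names `loglike` and `monitor` and the synthetic stream are invented for this example, and the "fix" step (contrast set learning) is left out.

```python
import math


def loglike(row, counts, seen):
    """Smoothed log-likelihood of one row given value counts from earlier eras."""
    return sum(math.log((counts.get((i, v), 0) + 1) / (seen + 2))
               for i, v in enumerate(row))


def monitor(stream, era_size=10, drop=2.0):
    """Return one alert flag per era of era_size rows."""
    counts, seen = {}, 0          # Count: stuff seen in past eras
    history, alerts = [], []      # average log-likelihood of past quiet eras
    for start in range(0, len(stream), era_size):
        era = stream[start:start + era_size]
        alert = False
        if seen:
            avg = sum(loglike(row, counts, seen) for row in era) / len(era)
            # Alert: this era is far less likely than earlier eras were.
            if history and avg < sum(history) / len(history) - drop:
                alert = True
            else:
                history.append(avg)
        alerts.append(alert)
        # One pass over the era to update the counts (linear time, small memory).
        for row in era:
            for i, v in enumerate(row):
                counts[(i, v)] = counts.get((i, v), 0) + 1
        seen += len(era)
    return alerts


# Synthetic stream: four nominal eras, then one era of never-seen values.
nominal = [("low", "on"), ("low", "off")] * 20
off_nominal = [("high", "fault")] * 10
alerts = monitor(nominal + off_nominal)
print(alerts)
```

On this toy stream only the final era raises an alert, because its values never appeared in the earlier counts. Real data would need a threshold tuned to the normal era-to-era variation.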
  • 31. Living with uncertainty
    • Life is a balance between
      • Policy #1: exploration
        • Tolerate the sub-optimal, a little
        • Doing crazy things to learn new things
      • Policy #2: exploitation
        • Fix your theories and base your work on those fixed ideas.
    • Popper:
      • most “science” is puzzle solving…
      • … within existing paradigms.
      • Sometimes the paradigm breaks down…
      • … prompting revolutionary research
    • Human young:
      • Do crazy things (take long trips)
      • Less craziness as we grow older
  • 32. Tolerance of “exploration”
    • Critical to the American way
    • America: history of tolerance and acceptance
    • 1945:
      • 400 German rocket scientists choose to surrender to the Yankees, not the Russians
      • They chose their post-war life based on their perceptions of American ideology
    • Hence,
  • 33. Tolerance = hi-tech = $$$
    • R. Florida: The Economic Geography of Talent, Annals of the Association of American Geographers 92(4), 2002, pp. 743-755
    • Best predictor for hi-tech industry
      • R² 0.42 to “coolness”
      • R² 0.49 to cultural amenities
      • R² 0.50 to median house value
      • R² 0.77 to “diversity” index
  • 34. Data Mining, Truth, Justice, the American Way & Flying Spaghetti Monsters
    • "Superman fights a never ending battle for truth, justice, and the American way."
    • To make $$, institutionalize exploration and tolerance
    • Old conclusions must be constantly re-assessed
    • No “truth”, all is biased.
    • A healthy hi-tech needs tolerance to support exploration
    • ... and that the FSM is silly, but would consider revising that view if new evidence emerges
  • 35. Expose, and hose
    • "Part of education is to expose people to different schools of thought." - President George Bush, August 1, 2005
    • "Part of science is to expose people to the critical and continual (re)evaluation of ideas." - Some guy called Timm, September 20, 2007